KR20220101817A

KR20220101817A - Method and system of an artificial intelligence for predicting financial information

Info

Publication number: KR20220101817A
Application number: KR1020210003752A
Authority: KR
Inventors: 최희준
Original assignee: 최희준
Priority date: 2021-01-12
Filing date: 2021-01-12
Publication date: 2022-07-19
Also published as: KR102598430B1

Abstract

The present invention provides an artificial intelligence system for predicting financial information, comprising: a financial data collection unit that collects data to be analyzed by extracting finance-related meta information and character information by linking with at least one of a plurality of Internet media, media providing financial services, and news web pages of media companies through a communication network; a morpheme analysis unit that analyzes morphemes by dividing the financial-related character information into lexical meaning and grammatical meaning; a syntax analysis unit that constructs a parsing tree through syntax analysis for the data to be analyzed that has undergone the morpheme analysis; a sentence analysis unit that generates meaning for each word in the parsing tree and then assigns the meaning; a financial information extraction unit that extracts financial information based on the result of the sentence analysis unit; and a financial prediction information generation unit that generates financial prediction information through comparison between the extracted financial information and reference data managed by preset classification depending on fluctuations in stock prices and financial indices. Accordingly, the prediction accuracy of financial information can be increased.

Description

An artificial intelligence system for predicting financial information and a method for predicting financial information using the same

The present invention relates to an artificial intelligence system capable of predicting financial information based on natural language processing, and to a financial information prediction method using the same.

Recently, the rapid development of IT has expanded the spread of terminals such as smartphones. As a result, users can handle various financial tasks online with a terminal, without visiting a bank in person. In addition, financial transactions using electronic payment means such as credit cards or simple payment services are increasing relative to conventional cash transactions.

Owing to these changes in the financial transaction environment, financial institutions are developing and promoting many kinds of financial products. However, as payment means and financial products multiply, the related information is also growing exponentially, making it difficult for users to select the financial product that suits them.

Meanwhile, information related to the financial industry is divided into numerical data, such as financial indices and interest rates, and text data, such as economic analysis reports and industry news.

Of these, text data, unlike the numerical data that econometricians and mathematicians have studied actively, has received little scholarly research and is regarded as a domain of its own in which the intuition of securities analysts is what matters.

However, the gap between text data and mathematical models raises the probability of wrong decisions in the financial industry, where fast decisions such as terminating a derivative position are required. This can cause large losses for investors or companies.

Recently, as hardware and analysis technologies such as machine learning have advanced, it has become possible to find not-yet-reflected information in text data and incorporate it into models in quantitative form, and information analysis technologies based on this are being developed in various fields.

Korean Patent Application Publication No. 10-2019-0082151 (published July 9, 2019)

The present invention provides an artificial intelligence system for predicting financial information that collects data composed of various forms of finance-related text and extracts and predicts financial information based on the collected data, and a financial information prediction method using the same.

In addition, the present invention provides an artificial intelligence system capable of predicting financial information by comparing the extracted financial information with classification data in which fluctuations in stock prices and financial indices are sorted into preset classes according to their magnitude, and a financial information prediction method using the same.

As a technical means for achieving the above technical object, an artificial intelligence system for predicting financial information according to an embodiment of the present invention may include: a financial data collection unit that collects data to be analyzed by extracting finance-related meta information and text information in conjunction, through a communication network, with at least one of a plurality of Internet media, media providing finance-related services, and news web pages of media companies; a morpheme analysis unit that analyzes morphemes by dividing the finance-related text information into lexical meaning and grammatical meaning; a syntax analysis unit that constructs a parsing tree through syntax analysis of the data to be analyzed that has undergone the morpheme analysis; a sentence interpretation unit that generates a meaning for each word in the parsing tree and then assigns it; a financial information extraction unit that extracts financial information based on the result of the sentence interpretation unit; and a financial prediction information generation unit that generates financial prediction information by comparing the extracted financial information with reference data in which fluctuations in stock prices and financial indices are managed in preset classes according to their magnitude.

According to an embodiment of the present invention, the syntax analysis unit may construct the parsing tree by performing syntax analysis using a neural network dependency parser suited to the structure of the Korean language and a recursive algorithm.

According to an embodiment of the present invention, the sentence interpretation unit may analyze meaning in the syntax analysis result of the syntax analysis unit using the distributional hypothesis and a vector space model, interpret the sentence based on that analysis, and generate a vector sequence that embeds the sentence based on the interpretation.

According to an embodiment of the present invention, the financial information extraction unit may calculate a context vector corresponding to the financial information by applying an attention-based Transformer and pretraining to the embedded vector sequence.

As a technical means for achieving the above technical object, a financial information prediction method using an artificial intelligence system according to an embodiment of the present invention may include, in a financial information prediction apparatus: collecting data to be analyzed by extracting finance-related meta information and text information in conjunction, through a communication network, with at least one of a plurality of Internet media, media providing finance-related services, and news web pages of media companies; analyzing morphemes by dividing the finance-related text information into lexical meaning and grammatical meaning; constructing a parsing tree through syntax analysis of the data to be analyzed that has undergone the morpheme analysis; generating a meaning for each word in the parsing tree and then assigning it; extracting financial information based on the result of the sentence interpretation; and generating financial prediction information by comparing the extracted financial information with financial information in which fluctuations in stock prices and financial indices are managed in preset classes according to their magnitude.

According to an embodiment of the present invention, the assigning step may analyze meaning in the parsing tree using the distributional hypothesis and a vector space model, interpret the sentence based on the result of that analysis, and generate a vector sequence that embeds the sentence based on the interpretation.

According to an embodiment of the present invention, the step of extracting the financial information may calculate a context vector corresponding to the financial information by applying an attention-based Transformer and pretraining to the embedded vector sequence, and the step of generating the financial prediction information may apply the context vector to the equation below to determine which class the context vector belongs to, and generate the financial prediction information based on that determination.

According to the embodiments of the present invention described above, data composed of various forms of finance-related text is collected, and financial information is extracted and predicted based on the collected data. Financial prediction information can therefore be generated with text information reflected in it, which can increase the prediction accuracy for financial information.

FIG. 1 is a diagram illustrating an artificial intelligence system for predicting financial information according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the configuration of a parsing tree output from the syntax analysis unit of the artificial intelligence system according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a financial information prediction process in the artificial intelligence system according to an embodiment of the present invention.

Hereinafter, specific embodiments of the present invention will be described with reference to the drawings. The following detailed description is provided to aid a comprehensive understanding of the methods, apparatuses, and/or systems described herein. However, this is merely an example, and the present invention is not limited thereto.

In describing the embodiments of the present invention, a detailed description of known technology related to the present invention will be omitted where it is judged that it would unnecessarily obscure the gist of the invention. The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or custom of users and operators. Their definitions should therefore be based on the content of this specification as a whole. The terminology used in the detailed description serves only to describe embodiments of the present invention and should in no way be limiting. Unless clearly used otherwise, expressions in the singular include the plural. In this description, expressions such as "include" or "have" are intended to indicate certain features, numbers, steps, operations, elements, or parts or combinations thereof, and should not be construed to exclude the presence or possibility of one or more other features, numbers, steps, operations, elements, or parts or combinations thereof beyond those described.

Hereinafter, an artificial intelligence system for predicting financial information according to an embodiment of the present invention, and a financial information prediction method using the same, will be described with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating an artificial intelligence system for predicting financial information according to an embodiment of the present invention, and FIG. 2 is a diagram for explaining the configuration of a parsing tree output from the syntax analysis unit of the artificial intelligence system according to an embodiment of the present invention.

Before describing the embodiment of the present invention, it is noted that the artificial intelligence system for predicting financial information may operate on the basis of natural language processing. Natural language processing refers to technology for implementing a natural language interface through which people can communicate with various computing devices in human language, and may mean having a computing device simulate the processes of symbol understanding and symbol production that occur in the human brain.

As shown in FIG. 1, the artificial intelligence system for predicting financial information may be implemented as a natural language processing method built on a financial information algorithm that applies natural language processing algorithms using deep learning grounded in mathematical optimization theory.

The financial information prediction apparatus 100 of the artificial intelligence system for predicting financial information according to an embodiment of the present invention may operate based on a natural language processing method implemented in Python.

The financial information prediction apparatus 100 may calculate a vector sequence that embeds a sentence by applying a Skip-gram-based BERT (Bidirectional Encoder Representations from Transformers) model in PyTorch or TensorFlow. The data required for this may come from a language information database 150 built from Korean Wikipedia, Netflix, collected newspaper articles, and the like.

To this end, the financial information prediction apparatus 100 according to an embodiment of the present invention may collect text information through crawling, interpret sentences by performing morpheme analysis and syntax analysis on the collected text, and then extract financial information on that basis to generate financial prediction information.

For this purpose, the financial information prediction apparatus 100 may include a financial data collection unit 102, a morpheme analysis unit 104, a syntax analysis unit 106, a sentence interpretation unit 108, a financial information extraction unit 110, a financial prediction information generation unit 112, and the like.

First, the financial data collection unit 102 may collect financial data for natural language processing of text data.

In an embodiment of the present invention, the financial data collection unit 102 may collect financial data through a crawling procedure in order to gather a large volume of valuable unstructured data. Specifically, the financial data collection unit 102 includes a finance-related crawler 102a and may use it to crawl various Internet media and news web pages of media companies, extracting finance-related meta information and text information to collect the data to be analyzed.

The financial data collection unit 102 according to an embodiment of the present invention may collect the data to be analyzed by extracting finance-related meta information and text information in conjunction, through a communication network, with various Internet media, for example Internet media that provide financial information services, exchange rate services, stock price services, interest rate services, and the like, and media that provide news services.
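The extraction step performed by the crawler can be illustrated with a minimal sketch. It uses only Python's standard `html.parser` module and a hard-coded sample page; the class name `ArticleExtractor`, the function `extract_article`, and the sample HTML are assumptions made for illustration and are not taken from the source, which does not describe how the crawler 102a is implemented.

```python
from html.parser import HTMLParser


class ArticleExtractor(HTMLParser):
    """Collects <meta> name/content pairs and visible text from one HTML page."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.meta = {}          # meta information, e.g. publication date
        self.text_parts = []    # visible text fragments
        self._skip_depth = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key] = attrs["content"]
        elif tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_article(html):
    """Returns (meta information, text information) for one crawled page."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    return parser.meta, " ".join(parser.text_parts)


# Hypothetical page standing in for a crawled news article.
SAMPLE_PAGE = """
<html><head>
<meta name="date" content="2021-01-12">
<meta property="og:title" content="기준금리 동결">
<script>var tracking = 1;</script>
</head><body><p>한국은행이 기준금리를 동결했다.</p></body></html>
"""

meta, text = extract_article(SAMPLE_PAGE)
```

A real crawler would additionally fetch pages over the network, follow links, and respect each site's access policy; those parts are left out here.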

The morpheme analysis unit 104 may divide each symbol string between spaces in the text information of the data to be analyzed into a stem and an ending. Specifically, the morpheme analysis unit 104 may analyze morphemes by dividing the text information collected through crawling into minimal units of meaning, that is, into lexical meaning and grammatical meaning.

In particular, in an embodiment of the present invention, the morpheme analysis unit 104 may perform morpheme analysis on the data to be analyzed, taking part of speech and the like into account, using a morpheme analyzer module such as Hannanum, Kkma, Komoran, Mecab, or Twitter.
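The split into a lexical part and a grammatical part can be shown with a toy example. The analyzers named above rely on full dictionaries and part-of-speech models; the sketch below instead uses a tiny, hypothetical suffix table (`GRAMMATICAL_SUFFIXES`) and longest-suffix matching, so it only illustrates the idea and is not how any of those analyzers work internally.

```python
# Hypothetical, deliberately tiny table of grammatical morphemes
# (particles and endings); a real analyzer uses a full dictionary.
GRAMMATICAL_SUFFIXES = ("했다", "에서", "이", "가", "을", "를", "은", "는")


def split_eojeol(eojeol):
    """Splits one space-delimited token into (lexical part, grammatical part)."""
    # Try longer suffixes first so a two-syllable ending wins over a one-syllable one.
    for suffix in sorted(GRAMMATICAL_SUFFIXES, key=len, reverse=True):
        if eojeol.endswith(suffix) and len(eojeol) > len(suffix):
            return eojeol[: -len(suffix)], suffix
    return eojeol, ""


def analyze(sentence):
    """Returns a list of (lexical, grammatical) pairs for a sentence."""
    return [split_eojeol(token) for token in sentence.split()]


pairs = analyze("주가가 시장에서 상승했다")
```

Here `pairs` separates each token into a content-bearing stem and the particle or ending attached to it, which is the lexical/grammatical division the morpheme analysis unit 104 is described as producing.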

The syntax analysis unit 106 may perform syntax analysis on the data to be analyzed that has undergone morpheme analysis, and construct a parsing tree as shown in FIG. 2.

In addition, as a process of analyzing the constituent meaning or modal meaning of adjectives and verbs, the syntax analysis unit 106 may analyze the words that make up a sentence into constituents or semantic roles.

In an embodiment of the present invention, the syntax analysis unit 106 may perform syntax analysis on the data to be analyzed using a syntax analysis program (not shown) based on deep learning techniques such as an RNN (Recursive Neural Network), SU-RNN (Syntactically Untied Recursive Neural Network), CNN (Convolutional Neural Network), or LSTM (Long Short-Term Memory).

In addition, in an embodiment of the present invention, the syntax analysis unit 106 may apply a neural network dependency parser suited to the structure of the Korean language and perform syntax analysis using a recursive algorithm. Here, a recursive algorithm means a recursive-call scheme in which a function calls itself again from within its own body.
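The recursive-call scheme can be illustrated by building a tree from dependency heads. In this sketch the head indices are supplied by hand; in the described system they would come from the parser. The function names `build_tree` and `depth` and the example parse are assumptions for illustration only.

```python
def build_tree(words, heads, node):
    """Recursively builds the subtree rooted at index `node`.

    heads[i] is the index of the word that word i depends on, or -1 for the root.
    """
    children = [i for i, head in enumerate(heads) if head == node]
    return {
        "word": words[node],
        # The function calls itself once per dependent: this is the recursive step.
        "children": [build_tree(words, heads, child) for child in children],
    }


def depth(tree):
    """Height of the tree, also computed by recursion."""
    return 1 + max((depth(child) for child in tree["children"]), default=0)


# Hypothetical parse of a three-word sentence: both nouns depend on the final verb.
words = ["한국은행이", "기준금리를", "동결했다"]
heads = [2, 2, -1]
root = heads.index(-1)
tree = build_tree(words, heads, root)
```

The recursion ends at words with no dependents, where the list of children is empty.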

The neural network dependency parser may be run using a neural network consisting of: an input layer in one-hot form, in which, for the words and features extracted from the current state of the data to be analyzed, only the position of the given word or feature is represented as 1 and all others as 0; a projection layer that reduces the dimensionality of the input layer using word representations pre-trained with a neural network language model (NNLM); a hidden layer that applies a non-linear activation to the projection layer; and an output layer that computes the probability of each state.
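The four-layer stack described above can be sketched as a single forward pass. This is a minimal illustration under stated assumptions: the vocabulary, the three parser actions, and the layer sizes are hypothetical, and the weights are random rather than pre-trained or learned, so the resulting probabilities carry no linguistic meaning.

```python
import math
import random

random.seed(0)

VOCAB = ["<root>", "한국은행이", "기준금리를", "동결했다"]   # hypothetical vocabulary
ACTIONS = ["SHIFT", "LEFT-ARC", "RIGHT-ARC"]              # hypothetical parser states
EMBED_DIM, HIDDEN_DIM = 4, 5


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def one_hot(word):
    """Input layer: 1 at the word's index, 0 elsewhere."""
    return [1.0 if w == word else 0.0 for w in VOCAB]


# Untrained, randomly initialised weights; a real parser would learn these.
E = rand_matrix(EMBED_DIM, len(VOCAB))          # projection layer
W_h = rand_matrix(HIDDEN_DIM, 2 * EMBED_DIM)    # hidden layer (two input words)
W_o = rand_matrix(len(ACTIONS), HIDDEN_DIM)     # output layer


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def action_probabilities(stack_word, buffer_word):
    # Projection: each sparse one-hot vector becomes a dense EMBED_DIM vector.
    projected = matvec(E, one_hot(stack_word)) + matvec(E, one_hot(buffer_word))
    hidden = [math.tanh(h) for h in matvec(W_h, projected)]   # non-linear activation
    return dict(zip(ACTIONS, softmax(matvec(W_o, hidden))))


probs = action_probabilities("기준금리를", "동결했다")
```

The projection step is where dimensionality is reduced: a one-hot vector as long as the vocabulary is mapped to `EMBED_DIM` numbers, and the output layer turns the hidden activations into one probability per parser state.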

The sentence interpretation unit 108 may generate meanings for the structures produced by syntax analysis and then assign them.

That is, the sentence interpretation unit 108 may produce an accurate structure by mapping an appropriate entity to each word output by syntax analysis, so that the meanings of the words combine with one another. In doing so, the sentence interpretation unit 108 may use a knowledge database (not shown) to select an appropriate entity for each word and then map the entity to the word.

In an embodiment of the present invention, the sentence interpretation unit 108 uses the distributional hypothesis and vector space models to analyze meaning in the syntax analysis result.

Here, the distributional hypothesis is the hypothesis that words appearing in similar contexts have similar meanings, and a vector space model represents each word or phrase in a document collection as a vector in a vector space, with a distance defined between the vectors.

In this way, the sentence interpretation unit 108 generates meanings on the basis of the distributional hypothesis, that is, that words appearing in similar contexts have similar meanings, and quantifies each meaning through the distances between words or phrases in the vector space model. It can thereby generate meanings for the parsing tree structures produced by syntax analysis and assign them.
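The combination of the distributional hypothesis and a vector space model can be shown with co-occurrence counts. In this sketch each word is represented by the counts of the words that appear near it, and closeness is measured with cosine similarity; the toy corpus, the window size, and the choice of cosine similarity as the measure are assumptions for illustration, since the source does not specify them.

```python
import math
from collections import Counter

# Hypothetical toy corpus, already split into words.
CORPUS = [
    "금리 인상 으로 주가 하락",
    "금리 인상 으로 증시 하락",
    "금리 인하 로 주가 상승",
    "금리 인하 로 증시 상승",
    "날씨 맑음",
]


def context_vectors(sentences, window=2):
    """Maps each word to a Counter of the words seen within `window` positions."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            context = tokens[lo:i] + tokens[i + 1:hi]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


vectors = context_vectors(CORPUS)
similar = cosine(vectors["주가"], vectors["증시"])    # words used in the same contexts
unrelated = cosine(vectors["주가"], vectors["날씨"])  # words with no shared context
```

Because the two market-related words occur in identical contexts in this corpus, their vectors point in the same direction, while the weather word shares no context with them and has zero similarity.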

In particular, the sentence interpretation unit 108 according to an embodiment of the present invention may interpret sentences using a finance-specialized sentence interpreter (not shown) based on deep learning such as a DMN (Dynamic Memory Network) or a Bayesian classifier, and on that basis generate a vector sequence that embeds the sentence, as shown in FIG. 2.

The financial information extraction unit 110 according to an embodiment of the present invention may calculate a context vector by applying an attention-based Transformer or an autoencoder, together with pretraining, to the embedded vector sequence taken as input.

Here, the context vector contains the information of the whole article and its sentences. It can be used in deep learning applications and may serve as the input vector for reinforcement learning that treats fluctuations in stock prices or financial indicators as the outcome.
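How an attention mechanism condenses a vector sequence into one context vector can be sketched as follows. This is a simplified illustration, not the Transformer of the embodiment: it uses a single scaled dot-product self-attention step with no learned query, key, or value projections, and it mean-pools the result. The pooling choice and the sample embeddings are assumptions.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def self_attention(sequence):
    """Scaled dot-product self-attention with queries = keys = values = inputs."""
    scale = math.sqrt(len(sequence[0]))
    attended = []
    for query in sequence:
        weights = softmax([dot(query, key) / scale for key in sequence])
        # Each output is a weighted average of every vector in the sequence.
        attended.append([
            sum(w * value[d] for w, value in zip(weights, sequence))
            for d in range(len(query))
        ])
    return attended


def context_vector(sequence):
    """Mean-pools the attended sequence into one fixed-size vector."""
    attended = self_attention(sequence)
    return [sum(column) / len(attended) for column in zip(*attended)]


# Hypothetical 3-dimensional embeddings for a three-word sentence.
embedded = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0.5, 0.5, 1.0]]
ctx = context_vector(embedded)
```

The resulting vector has the same dimensionality as one word embedding regardless of sentence length, which is what allows it to be passed to a downstream classifier.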

In an embodiment of the present invention, financial information may be managed as a six-way classification of fluctuations in stock prices and financial indices according to their magnitude, and the model applied in the embodiment may use this six-way-classified financial information together with the context vector input from the financial information extraction unit 110.

The financial prediction information generation unit 112 may generate financial prediction information through fine-tuning based on the six-way-classified financial information and the context vector obtained through pretraining. Specifically, the financial prediction information generation unit 112 may apply the mathematical model of Equation 1 below to the six-way-classified financial information and the context vector obtained through pretraining, and determine which of the six classes the context vector belongs to, thereby predicting financial information on the basis of natural language processing.

[Equation 1]

(The equation and its symbols are not recoverable from this text.)

The terms of Equation 1 are defined as follows: one term is the expected value of group i; one is the probability that the data (that is, the extracted context vector) belongs to group i; and one is the probability that the hypothesis that the predicted value belongs to group i is true.

That is, the first of these terms may be the expected value of a financial information group under the six-way classification.

Through Equation 1 described above, the financial prediction information generation unit 112 may determine which group of the reference data the context vector related to the financial information extracted by the financial information extraction unit 110 belongs to, and then predict financial information based on that determination.

In an embodiment of the present invention, the reference data for the six-way-classified financial information concerns the magnitude of fluctuations in stock prices and financial indices. For stock prices, examples include a rise, a fall, the rate of rise, and the rate of fall, but the reference data is not limited to these.

That is, on this basis, when the context vector corresponding to the financial information extracted by the financial information extraction unit 110 relates to stocks, the financial prediction information generation unit 112 may use Equation 1 to determine which part of the six-way-classified reference data for financial information prediction the context vector belongs to, and generate financial prediction information based on that determination.
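Deciding which of six classes a context vector belongs to can be sketched with a simple probabilistic classifier. This sketch does not reproduce Equation 1, whose form is not available in this text; it shows one common way to combine a per-class likelihood with a per-class prior and pick the most probable class. The class labels, centroids, priors, and the Gaussian likelihood are all hypothetical.

```python
import math

# Hypothetical labels for the six fluctuation classes; the source only says
# that movements are grouped into six classes by magnitude.
CLASSES = ["large fall", "moderate fall", "small fall",
           "small rise", "moderate rise", "large rise"]

# Hypothetical class centroids in a 2-dimensional context-vector space and
# hypothetical prior probabilities (e.g. historical class frequencies).
CENTROIDS = [[-3.0, 0.0], [-2.0, 0.0], [-1.0, 0.0],
             [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
PRIORS = [0.05, 0.15, 0.30, 0.30, 0.15, 0.05]


def likelihood(vector, centroid):
    """Unnormalised isotropic Gaussian likelihood of the vector under one class."""
    squared_distance = sum((x - c) ** 2 for x, c in zip(vector, centroid))
    return math.exp(-0.5 * squared_distance)


def classify(vector):
    """Returns (most probable class, posterior probability of every class)."""
    joint = [likelihood(vector, c) * p for c, p in zip(CENTROIDS, PRIORS)]
    total = sum(joint)
    posterior = [j / total for j in joint]
    best = max(range(len(CLASSES)), key=posterior.__getitem__)
    return CLASSES[best], dict(zip(CLASSES, posterior))


label, posterior = classify([2.4, 0.1])
```

In a trained system the likelihoods would come from the fine-tuned model rather than from fixed centroids; the final step of choosing the class with the highest probability would stay the same.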

The process by which the financial information prediction apparatus 100 according to the embodiment of the present invention described above predicts financial information will now be described with reference to FIG. 3.

FIG. 3 is a flowchart illustrating a financial information prediction process in the artificial intelligence system according to an embodiment of the present invention.

As shown in FIG. 3, the financial information prediction apparatus 100 first collects the data to be analyzed, namely finance-related meta information and finance-related text information, extracts financial information from it through natural language processing, and applies a modeling model to the extracted financial information to generate financial prediction information such as indices and interest rates.

To this end, the financial data collection unit 102 of the financial information prediction apparatus 100 according to an embodiment of the present invention links, through a communication network, with at least one of a plurality of Internet media, media providing finance-related services, and news web pages of media companies, and extracts finance-related meta information and finance-related text information to collect the data to be analyzed (S300).
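
By way of a non-limiting illustration, step S300 may be sketched as follows, using only the Python standard library. The class name `NewsPageParser`, the function `collect`, and the sample page are hypothetical and do not appear in the source; fetching the page over the communication network is omitted, and the sketch only separates meta information from text information in an already retrieved HTML page.

```python
from html.parser import HTMLParser


class NewsPageParser(HTMLParser):
    """Collects <meta> name/content pairs and visible text from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Meta information: keyed by "name" or, failing that, "property".
            key = attrs.get("name") or attrs.get("property")
            if key and "content" in attrs:
                self.meta[key] = attrs["content"]
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Text information: keep non-empty text outside scripts and styles.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def collect(html):
    """Return the meta information and text information found in one page."""
    parser = NewsPageParser()
    parser.feed(html)
    parser.close()
    return {"meta": parser.meta, "text": " ".join(parser.text_parts)}


# Hypothetical sample page, for illustration only.
SAMPLE = """<html><head><meta name="date" content="2021-01-12">
<title>Market news</title><script>var x = 1;</script></head>
<body><p>KOSPI rose 1.2%.</p></body></html>"""

result = collect(SAMPLE)
print(result["meta"])
print(result["text"])
```

In this sketch the meta information and the text information are kept in separate fields so that only the text is passed to the morpheme analysis of step S302.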

Next, the financial information prediction apparatus 100 uses the morpheme analysis unit 104 to analyze morphemes by dividing the text-related data of the data to be analyzed into lexical meaning and grammatical meaning (S302).
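
As a simplified illustration of dividing a word into a lexical part and a grammatical part, the following sketch strips a trailing Korean particle from a token. The particle list `PARTICLES` and the function `split_morphemes` are hypothetical and are not taken from the source. Suffix stripping of this kind is only a toy: it would wrongly split a bare noun such as "주가" into "주" and "가", which is why a practical morpheme analyzer relies on a lexicon and a trained model rather than on this rule.

```python
# Hypothetical, deliberately tiny particle list; a real analyzer uses a full lexicon.
PARTICLES = ("에서", "으로", "은", "는", "이", "가", "을", "를", "에", "로")


def split_morphemes(token):
    """Split a token into (lexical stem, grammatical particle or None)."""
    # Try longer particles first so that "에서" is preferred over a shorter match.
    for particle in sorted(PARTICLES, key=len, reverse=True):
        if token.endswith(particle) and len(token) > len(particle):
            return token[: -len(particle)], particle
    return token, None


for word in ("주가가", "시장에서", "상승"):
    print(word, split_morphemes(word))
```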

The financial information prediction apparatus 100 then uses the syntax analysis unit 106 to construct a parsing tree by parsing the data to be analyzed that has undergone morpheme analysis (S304). Here, the financial information prediction apparatus 100 may perform the syntax analysis using a neural dependency parser suited to the structure of the Korean language and a recursive algorithm.
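
The construction of a parsing tree in step S304 may be illustrated, under stated assumptions, by the sketch below. It assumes that a dependency parser has already assigned each token the index of its head (with -1 marking the root); the head indices here are supplied by hand, and the function `build_tree` is hypothetical. The neural parser itself is not shown.

```python
def build_tree(tokens, heads):
    """Build a nested dependency tree from per-token head indices.

    heads[i] is the index of the head of token i, or -1 for the root.
    """
    children = {i: [] for i in range(len(tokens))}
    root = None
    for i, head in enumerate(heads):
        if head == -1:
            root = i
        else:
            children[head].append(i)
    if root is None:
        raise ValueError("no root token (head index -1) was given")

    def subtree(i):
        # Recursive step: a node is its word plus the subtrees of its dependents.
        return {"word": tokens[i], "children": [subtree(c) for c in children[i]]}

    return subtree(root)


# Korean is head-final, so the predicate at the end of the sentence is the root.
tree = build_tree(["한국은행이", "금리를", "인상했다"], [2, 2, -1])
print(tree)
```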

Thereafter, the financial information prediction apparatus 100 uses the sentence interpretation unit 108 to generate a meaning for each word in the parsing tree and assign it to that word (S306). Here, the financial information prediction apparatus 100 may analyze meaning in the syntax analysis result using the distributional hypothesis and a vector space model, interpret the sentence based on the analyzed meaning, and then generate a vector sequence in which the sentence is embedded based on that interpretation.
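
The distributional hypothesis holds that words occurring in similar contexts tend to have similar meanings, and a vector space model represents each word as a vector so that this similarity can be measured. A minimal sketch follows, assuming a count-based co-occurrence representation and cosine similarity; the function names, the window size, and the three-sentence corpus are hypothetical, and the source does not state which embedding method is actually used.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=1):
    """Represent each word by counts of the words seen within the window."""
    vectors = defaultdict(Counter)
    for words in sentences:
        for i, word in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus.
corpus = [["rates", "rise"], ["yields", "rise"], ["stocks", "fall"]]
vectors = cooccurrence_vectors(corpus)

# "rates" and "yields" share the context "rise"; "stocks" shares none with "rates".
print(cosine(vectors["rates"], vectors["yields"]))
print(cosine(vectors["rates"], vectors["stocks"]))
```

A sentence may then be embedded as the sequence of the vectors of its words, which corresponds to the vector sequence passed on to step S308.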

The financial information prediction apparatus 100 then uses the financial information extraction unit 110 to extract financial information from the embedded vector sequence output by the sentence interpretation unit 108 (S308). That is, using the embedded vector sequence, the financial information prediction apparatus 100 may apply an attention-based Transformer and pretraining to calculate a context vector (weight vector) corresponding to the financial information.
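
The core operation of an attention-based Transformer is scaled dot-product attention, in which a context vector is obtained as a weighted sum of value vectors. The sketch below shows that operation for a single query in plain Python. It is an illustration only: the learned projection matrices, multiple heads, stacked layers, and pretraining of an actual Transformer are omitted, and the function names and input values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(query, keys, values):
    """Scaled dot-product attention for one query; returns (context, weights)."""
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the value vectors.
    context = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return context, weights


# Hypothetical embedded vector sequence of two positions.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0]]
context, weights = attention_context([1.0, 0.0], keys, values)
print(weights)
print(context)
```

The query here matches the first key more closely, so the first value vector receives the larger weight in the resulting context vector.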

Thereafter, the financial information prediction apparatus 100 uses the financial prediction information generation unit 112 to apply the context vector to a modeling model such as Equation 1 and generate financial prediction information (S310).
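
Equation 1 is not reproduced in this passage, so the following sketch does not implement it. Purely to illustrate the idea of deciding which of six reference groups a context vector belongs to, it substitutes a nearest-centroid rule using Euclidean distance. The six labels, the centroid values, and the function `classify` are all hypothetical and are not taken from the source.

```python
import math

# Hypothetical reference data: one centroid per group of a six-class classification.
REFERENCE = {
    "large_rise": [0.9, 0.1],
    "moderate_rise": [0.6, 0.1],
    "small_rise": [0.3, 0.1],
    "small_fall": [0.1, 0.3],
    "moderate_fall": [0.1, 0.6],
    "large_fall": [0.1, 0.9],
}


def classify(context, reference):
    """Return the label of the reference group whose centroid is nearest.

    Stand-in for Equation 1, which is not shown here.
    """
    return min(reference, key=lambda label: math.dist(context, reference[label]))


print(classify([0.55, 0.15], REFERENCE))
```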

Meanwhile, combinations of the blocks in the accompanying block diagram and the steps in the flowchart may be performed by computer program instructions. Because these computer program instructions may be loaded onto a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing equipment, the instructions executed by that processor create means for performing the functions described in each block of the block diagram.

These computer program instructions may also be stored in a computer-usable or computer-readable recording medium (or memory) that can direct a computer or other programmable data processing equipment to implement a function in a particular manner, so the instructions stored in that medium can produce an article of manufacture containing instruction means that perform the functions described in each block of the block diagram.

The computer program instructions may also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps is performed on that equipment to produce a computer-executed process; the instructions that run on the computer or other programmable data processing equipment can thereby provide steps for executing the functions described in each block of the block diagram.

In addition, each block may represent a module, segment, or portion of code that includes one or more executable instructions for executing the specified logical function(s). It should also be noted that, in some alternative embodiments, the functions mentioned in the blocks may occur out of order. For example, two blocks shown in succession may in fact be performed substantially simultaneously, or the blocks may sometimes be performed in reverse order, depending on the functions involved.

100: financial information prediction apparatus
102: financial data collection unit
104: morpheme analysis unit
106: syntax analysis unit
108: sentence interpretation unit
110: financial information extraction unit
112: financial prediction information generation unit

Claims

1. An artificial intelligence system for predicting financial information, comprising:
a financial data collection unit that collects data to be analyzed by linking, through a communication network, with at least one of a plurality of Internet media, media providing finance-related services, and news web pages of media companies, and extracting finance-related meta information and text information;
a morpheme analysis unit that analyzes morphemes by dividing the finance-related text information into lexical meaning and grammatical meaning;
a syntax analysis unit that constructs a parsing tree by parsing the data to be analyzed that has undergone the morpheme analysis;
a sentence interpretation unit that generates a meaning for each word in the parsing tree and assigns it;
a financial information extraction unit that extracts financial information based on the result of the sentence interpretation unit; and
a financial prediction information generation unit that generates financial prediction information by comparing the extracted financial information with reference data managed under a preset classification according to the fluctuation range of stock prices and financial indices.

2. The system of claim 1,
wherein the sentence interpretation unit
analyzes meaning in the syntax analysis result of the syntax analysis unit using the distributional hypothesis and a vector space model, interprets the sentence based on the analyzed meaning, and generates a vector sequence in which the sentence is embedded based on the interpretation of the sentence.

3. The system of claim 2,
wherein the financial information extraction unit
calculates a context vector corresponding to the financial information by applying an attention-based Transformer and pretraining to the embedded vector sequence.

4. A method of predicting financial information using an artificial intelligence system, the method comprising, in a financial information prediction apparatus:
collecting data to be analyzed by linking, through a communication network, with at least one of a plurality of Internet media, media providing finance-related services, and news web pages of media companies, and extracting finance-related meta information and text information;
analyzing morphemes by dividing the finance-related text information into lexical meaning and grammatical meaning;
constructing a parsing tree by parsing the data to be analyzed that has undergone the morpheme analysis;
generating a meaning for each word in the parsing tree and assigning it;
extracting financial information based on the result of the sentence interpretation unit; and
generating financial prediction information by comparing the extracted financial information with financial information managed under a preset classification according to the fluctuation range of stock prices and financial indices.

5. The method of claim 4,
wherein the assigning step
analyzes meaning in the parsing tree using the distributional hypothesis and a vector space model, interprets the sentence based on the result of that analysis, and generates a vector sequence in which the sentence is embedded based on the interpretation of the sentence.

6. The method of claim 5,
wherein the step of extracting the financial information
calculates a context vector corresponding to the financial information by applying an attention-based Transformer and pretraining to the embedded vector sequence, and
wherein the step of generating the financial prediction information
determines which category the context vector belongs to by applying the context vector to the following equation, and generates financial prediction information based on the result of the determination.