KR20230010956A

KR20230010956A - Method for determining investment indicator related with stock item and providing information of stock item based on artificial intelligence, and computing system performing the same

Info

Publication number: KR20230010956A
Application number: KR1020210091396A
Authority: KR
Inventors: 김정민; 김동진; 김준석; 송민정; 이현용; 김병훈; 이수진; 백성운
Original assignee: 주식회사 씽크풀
Priority date: 2021-07-13
Filing date: 2021-07-13
Publication date: 2023-01-20

Abstract

Disclosed are a method and system for determining a reliable investment indicator for specific financial products based on artificial intelligence and providing personalized investment information to each investor based on the reliable investment indicator. According to one aspect of the present invention, a stock item information provision method comprises: a step in which a computing system constructs an item-indicator relationship DB that stores a related investment impact indicator corresponding to each of a plurality of investment items; a step in which the computing system constructs a user-indicator relationship DB that stores an interested investment impact indicator corresponding to each of a plurality of users; a step in which the computing system observes whether the related investment impact indicator corresponding to each of the plurality of investment items is updated; a step in which, when a related investment indicator is observed to be updated, the computing system determines a target user corresponding to the related investment indicator with an update among the plurality of users based on the user-indicator relationship DB; and a step in which the computing system provides the target user with personalized information about investment items corresponding to the relevant investment indicator with the update. According to the present invention, meaningful and reliable investment indicators can be determined through a plurality of documents distributed on the network.

Description

Method for determining investment indicator related with stock item and providing information of stock item based on artificial intelligence, and computing system performing the same

The present invention relates to an artificial-intelligence-based method for determining investment indicators and providing stock item information, and to a computing system that performs the method. More specifically, it relates to a method and system that can determine a reliable investment indicator for a specific financial product based on artificial intelligence and, on that basis, provide personalized investment information to each investor.

Knowing the keywords associated with a specific financial investment product or financial investment item (for example, a specific stock) can be very useful, for instance for predicting the item's price movements by searching for information on those keywords. Accordingly, there have been attempts to define and use related keywords for each financial investment item; examples include Korean Patent Publication No. 10-2015-0083620 (Keyword-Linked Investment Information Provision System) and Korean Patent Publication No. 10-2017-0056040 (Method for Integrated Provision of Related Item Information).

However, conventional techniques either have people manually maintain the related keywords or related information for each financial investment item, or rely on measures such as exposure frequency on the web. They therefore depend on an individual's limited knowledge, or they select as related keywords common words that merely co-occur with the name of a financial product even when those words have little real relevance, so their accuracy is quite low. What is needed is a technical idea that, rather than relying on a particular person's knowledge and experience, can relatively accurately extract keywords that are meaningfully related to a specific financial investment item from the varied unstructured data circulating on the network.

There have also been attempts to predict stock prices from large amounts of unstructured data on networks such as social media. One such attempt is disclosed in Korean Patent Registration No. 10-1531970 (Stock price prediction method through analysis of social media data and stock-market-related web data, and stock price prediction system using the same). That approach judges, for each word or keyword, whether it has a positive or negative effect on the stock price, and then aggregates those judgments to decide whether the document as a whole has a positive or negative effect.

Because this approach makes the positive/negative judgment keyword by keyword, the error remains large in practice even when the keyword-level results are fed to a suitable algorithm (for example, a naive Bayes classifier) to classify the whole sentence or document as positive or negative. In other words, however well the positive/negative judgment is made for each keyword in a sentence or document, the judgment of whether the sentence or document itself has a positive or negative effect is bound to be error-prone.

In addition, the conventional approach performs positive/negative sentiment evaluation for each keyword. This can work well for predetermined keywords or keywords that are used frequently in a formal register (for example, terms used in regulatory disclosures or news), but accuracy drops considerably when opinions or analyses of a financial product are expressed in the colloquial language that ordinary users favor or with infrequently used keywords.

What is needed, therefore, is a technical idea for designing an artificial-intelligence-based deep learning model that judges, from unstructured data (that is, documents) corresponding to a specific financial product or to its related keywords, whether content can positively or negatively affect the value of the financial product at the level of sentences or documents rather than keywords, and for determining meaningful and reliable investment indicators through that model.

Moreover, the preferred investment indicators may differ from user to user. A technical idea is therefore needed that identifies each user's preferred investment indicators in advance and, when a change in such an indicator is detected, provides the related investment information accurately and quickly.

Korean Patent Publication No. 10-2015-0083620
Korean Patent Publication No. 10-2017-0056040
Korean Patent Registration No. 10-1531970

The technical problem to be solved by the present invention is to provide a technical idea for extracting, based on artificial intelligence, related keywords that are substantively meaningful for a specific financial investment item from a large amount of unstructured data, such as posts on social media and internet communities (for example, news, user-written posts, and comments; also referred to as 'unstructured documents'). A further object is to provide a technical idea for determining meaningful and reliable investment indicators from the many documents circulating on the network.

Another object is to provide a technical idea for promptly providing an investor who is interested in a particular investment indicator with information on the related financial investment items when that indicator changes.

According to one aspect of the present invention, there is provided a method of providing item information, comprising: constructing, by a computing system, an item-indicator relationship DB that stores a related investment impact indicator corresponding to each of a plurality of investment items; constructing, by the computing system, a user-indicator relationship DB that stores an investment impact indicator of interest corresponding to each of a plurality of users; observing, by the computing system, whether the related investment impact indicator corresponding to each of the plurality of investment items has been updated; when an updated related investment indicator is observed, determining, by the computing system and based on the user-indicator relationship DB, a target user among the plurality of users who corresponds to the updated related investment indicator; and providing, by the computing system, the target user with personalized information about the investment item corresponding to the updated related investment indicator.
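The flow of this aspect can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation disclosed in the specification: the two relationship DBs are modelled as in-memory dictionaries, and every name (`item_indicator_db`, `user_indicator_db`, `on_indicator_update`) and every sample value is hypothetical.

```python
# Hypothetical item-indicator relationship DB: investment item -> related indicators.
item_indicator_db = {
    "ItemA": {"oil_price", "exchange_rate"},
    "ItemB": {"interest_rate"},
    "ItemC": {"oil_price"},
}

# Hypothetical user-indicator relationship DB: user -> indicators of interest.
user_indicator_db = {
    "user1": {"oil_price"},
    "user2": {"interest_rate", "exchange_rate"},
    "user3": {"semiconductor_demand"},
}


def on_indicator_update(updated_indicator, item_db, user_db):
    """Map each target user to the items related to the updated indicator."""
    # Items whose related indicators include the one that was updated.
    items = sorted(i for i, inds in item_db.items() if updated_indicator in inds)
    # Target users: those whose indicators of interest include it.
    targets = sorted(u for u, inds in user_db.items() if updated_indicator in inds)
    return {user: items for user in targets}


notifications = on_indicator_update("oil_price", item_indicator_db, user_indicator_db)
print(notifications)
```

In a real system the returned mapping would drive the delivery of personalized item information to each target user's terminal; here it is simply printed.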

In one embodiment, constructing the user-indicator relationship DB may include, for each of the plurality of users: collecting the unstructured data consumed by the user; and determining the investment impact indicator of interest corresponding to the user based on the unstructured data consumed by the user.

In one embodiment, the method of providing item information further includes constructing, by the computing system, an item-keyword relationship DB that stores related keywords corresponding to each of the plurality of investment items, and constructing the user-indicator relationship DB may include, for each of the plurality of users: collecting the unstructured data consumed by the user; extracting the user's keywords of interest based on the unstructured data consumed by the user; extracting, from the item-keyword relationship DB, the investment items corresponding to the user's keywords of interest; and extracting, from the item-indicator relationship DB, the investment impact indicators corresponding to those investment items.
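The keyword-mediated construction of a user's indicators of interest might look like the following sketch. The keyword extraction here is a deliberately naive substring match standing in for whatever extraction the system actually uses; the DB contents and the function names `interest_keywords` and `interest_indicators` are assumptions made for illustration.

```python
# Hypothetical item-keyword relationship DB: investment item -> related keywords.
item_keyword_db = {
    "ItemA": {"battery", "electric vehicle"},
    "ItemB": {"semiconductor"},
}

# Hypothetical item-indicator relationship DB: investment item -> impact indicators.
item_indicator_db = {
    "ItemA": {"lithium_price"},
    "ItemB": {"memory_price", "exchange_rate"},
}


def interest_keywords(consumed_docs, vocabulary):
    """Naive stand-in for keyword extraction: known keywords found in the text."""
    text = " ".join(consumed_docs).lower()
    return {kw for kw in vocabulary if kw in text}


def interest_indicators(consumed_docs, keyword_db, indicator_db):
    """Documents consumed -> keywords of interest -> items -> impact indicators."""
    vocabulary = set().union(*keyword_db.values())
    keywords = interest_keywords(consumed_docs, vocabulary)
    # Items that share at least one related keyword with the user's interests.
    items = {item for item, kws in keyword_db.items() if kws & keywords}
    indicators = set()
    for item in items:
        indicators |= indicator_db.get(item, set())
    return indicators


docs = ["New battery plant announced", "Electric vehicle sales rise"]
print(interest_indicators(docs, item_keyword_db, item_indicator_db))
```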

In one embodiment, constructing the item-indicator relationship DB that includes the related investment impact indicator corresponding to each of the plurality of investment items includes performing, for each of the plurality of investment items, a method of determining the related investment impact indicator corresponding to the investment item. That method includes: collecting a plurality of unstructured documents corresponding to the investment item or to a related keyword of the investment item; and determining the investment impact indicator corresponding to the investment item based on the collected unstructured documents. Determining the investment impact indicator based on the collected unstructured documents includes determining the investment impact indicator based on the output of an impact judgment model, trained through a context-sensitive natural language processing model, for judgment target documents that are all or some of the collected unstructured documents. The impact judgment model may be a model trained to output a classification result that includes whether each individual document among the judgment target documents, or each sentence in the individual document, has a positive or negative effect on investment.

In one embodiment, the method of determining the related investment indicator corresponding to the investment item further includes filtering out, from the collected unstructured documents, those that meet a predetermined filtering condition, and the unstructured documents remaining after the filtering are specified as the judgment target documents.

In one embodiment, filtering out, from the unstructured documents collected by the system, those that meet the predetermined filtering condition may include: generating a document vector for each of the collected unstructured documents; clustering, based on the generated document vectors, unstructured documents whose similarity is at or above a certain level; and filtering so that some of the documents in each resulting cluster are excluded from the judgment target documents.

In one embodiment, clustering the unstructured documents whose similarity is at or above a certain level based on the generated document vectors may include clustering unstructured documents whose similarity is at or above the certain level and whose creation times fall within a predetermined time range.
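A minimal sketch of this duplicate filtering follows. It makes several assumptions that the specification does not fix: document vectors are plain bag-of-words counts, similarity is cosine similarity, clustering is a greedy single pass, and exactly one representative (the earliest document) is kept per cluster. The thresholds and the function names are hypothetical.

```python
import math
from collections import Counter


def doc_vector(text):
    """Bag-of-words document vector (an assumed, very simple representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def filter_near_duplicates(docs, sim_threshold=0.8, max_gap_seconds=3600):
    """docs is a list of (timestamp_seconds, text) pairs.

    A document joins an existing cluster when it is similar enough to the
    cluster's representative AND was created within the time window; only
    the representative of each cluster is kept as a judgment target.
    """
    kept = []  # (timestamp, text, vector) of each cluster representative
    for ts, text in sorted(docs):
        vec = doc_vector(text)
        duplicate = any(
            cosine(vec, rep_vec) >= sim_threshold and abs(ts - rep_ts) <= max_gap_seconds
            for rep_ts, _, rep_vec in kept
        )
        if not duplicate:
            kept.append((ts, text, vec))
    return [(ts, text) for ts, text, _ in kept]


docs = [
    (0, "Company A wins large battery supply contract"),
    (600, "Company A wins large battery supply contract today"),
    (700, "Interest rates unchanged this month"),
    (100000, "Company A wins large battery supply contract"),
]
targets = filter_near_duplicates(docs)
```

With these sample values the second document is dropped as a near duplicate of the first, while the fourth is kept because it falls outside the time window even though its text is identical.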

In one embodiment, collecting the plurality of unstructured documents corresponding to the investment item or to a related keyword of the investment item includes separately collecting first unstructured documents corresponding to the investment item and second unstructured documents corresponding to the related keyword, and determining the investment impact indicator corresponding to the investment item based on the collected unstructured documents may include determining it based on a first investment impact indicator extracted from the first unstructured documents and a second investment impact indicator extracted from the second unstructured documents.

In one embodiment, in determining the investment impact indicator based on the first investment impact indicator and the second investment impact indicator, the investment impact indicator may be determined such that the first and second investment impact indicators are given different weights.
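One simple way to give the two indicators different weights is a normalized weighted average, sketched below. The specification does not state the combining formula or the weight values, so the linear form, the default weights, and the function name `combine_indicators` are all assumptions.

```python
def combine_indicators(first, second, w_first=0.7, w_second=0.3):
    """Weighted average of the item-level and keyword-level indicator scores.

    first:  score derived from documents about the investment item itself
    second: score derived from documents about its related keywords
    The weights are hypothetical; they are normalized so they sum to 1.
    """
    total = w_first + w_second
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_first * first + w_second * second) / total


# Example: strongly positive item-level signal, mildly negative keyword-level signal.
score = combine_indicators(0.6, -0.2, w_first=0.75, w_second=0.25)
```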

In one embodiment, the impact judgment model outputs a classification result that includes whether each sentence in the individual document has a positive or negative effect on investment, and the classification result for the individual document is determined based on the sentence-level classification results.
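The step from sentence-level results to a document-level result could be sketched as a majority vote. The vote rule is an assumption, since the text only says the document result is based on the sentence results, and `toy_classifier` is a hypothetical rule-based stand-in for the trained, context-sensitive impact judgment model.

```python
from collections import Counter


def classify_document(sentences, sentence_classifier):
    """Aggregate per-sentence labels into one document label by majority vote."""
    counts = Counter(sentence_classifier(s) for s in sentences)
    positive, negative = counts["positive"], counts["negative"]
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"  # tie, or no sentence judged positive or negative


def toy_classifier(sentence):
    """Hypothetical stand-in for the trained sentence-level model."""
    lowered = sentence.lower()
    if "record profit" in lowered:
        return "positive"
    if "recall" in lowered:
        return "negative"
    return "neutral"


document = [
    "The company posted a record profit.",
    "Guidance was unchanged.",
    "Another record profit is expected next quarter.",
]
label = classify_document(document, toy_classifier)
```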

According to another aspect of the present invention, there is provided a computer program stored in a computer-readable recording medium for performing the method described above.

According to another aspect of the present invention, there is provided a computer-readable recording medium on which a computer program for performing the method described above is recorded.

According to another aspect of the present invention, there is provided a computing system comprising: a processor; and a memory storing a computer program executed by the processor, wherein the computer program, when executed by the processor, causes the computing system to perform a method comprising: constructing an item-indicator relationship DB that stores a related investment impact indicator corresponding to each of a plurality of investment items; constructing a user-indicator relationship DB that stores an investment impact indicator of interest corresponding to each of a plurality of users; observing whether the related investment impact indicator corresponding to each of the plurality of investment items has been updated; when an updated related investment indicator is observed, determining, based on the user-indicator relationship DB, a target user among the plurality of users who corresponds to the updated related investment indicator; and providing the target user with personalized information about the investment item corresponding to the updated related investment indicator.

According to an embodiment of the present invention, related keywords that are substantively meaningful for a specific financial investment item can be extracted, based on artificial intelligence, from a large amount of unstructured data such as posts on social media and internet communities.

In addition, rather than simply extracting related keywords from a particular person's fragmentary knowledge or from exposure frequency, keywords that are genuinely related to the financial investment item can be extracted from information that is current at the time of extraction, taking the meaning of that information into account.

Furthermore, by the nature of unstructured data such as social media, identical or nearly identical content is frequently uploaded to the network by different publishers (for example, the same or a very similar news article is uploaded by several outlets, or one person's content is repeatedly re-uploaded by many users). Accuracy can be improved by filtering out this redundancy to some extent before extracting the related keywords.

In addition, when a related keyword is expanded to other keywords related to it, not only financial investment items directly related to an issue but also those only indirectly related to it can be found.

In addition, a deep learning model can be designed that judges, at the sentence or document level, whether each of the many documents circulating on the network is positive or negative for a specific financial product, and meaningful and reliable related investment indicators can be determined through that model.

Based on these related investment indicators, personalized information on a specific investment indicator can also be provided. That is, when a change in a specific investment indicator is detected, information on the investment items closely related to it can be promptly provided to investors who are highly interested in that indicator.

A brief description of each drawing is provided so that the drawings cited in the detailed description of the present invention can be more fully understood.
FIG. 1 is a diagram illustrating schematic system configurations for performing an artificial-intelligence-based method of providing personalized item information according to the technical idea of the present invention.
FIG. 2 is a block diagram showing the schematic logical configuration of an item information providing system according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the schematic physical configuration of an item information providing system according to an embodiment of the present invention.
FIG. 4 is a block diagram illustrating the schematic configuration of an item DB construction module that performs a method of determining the related keywords and related investment impact indicators of a specific item according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating the concept of a natural language processing model according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating the concept of extracting related keywords through word vectors according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating the concept of performing filtering according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating changes in related keywords according to an embodiment of the present invention.
FIG. 9 is an exemplary flowchart of a process of constructing an item-keyword relationship DB according to an embodiment of the present invention.
FIG. 10 is a diagram illustrating a process of constructing an item-indicator relationship DB according to an embodiment of the present invention.
FIG. 11 is a diagram illustrating the concept of determining an investment impact indicator using unstructured documents according to an embodiment of the present invention.
FIG. 12 is a diagram illustrating the concept of evaluating the impact of a document through sentence-level sentiment evaluation according to an embodiment of the present invention.
FIG. 13 is a diagram illustrating the concept of adaptively performing impact evaluation by reflecting the order (degree) of a document according to an embodiment of the present invention.
FIG. 14 is a flowchart showing an example of an artificial-intelligence-based method of providing personalized item information according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the present invention to particular embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes that fall within its spirit and technical scope. In describing the present invention, detailed descriptions of related known technologies are omitted where they are judged likely to obscure the gist of the invention.

Terms such as "first" and "second" may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise.

In this specification, terms such as "include" or "have" are intended to indicate that a feature, number, step, operation, component, part, or combination thereof described in the specification is present, and should be understood not to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Also, in this specification, when one component 'transmits' data to another component, the component may transmit the data to the other component directly or through at least one further component. Conversely, when one component 'directly transmits' data to another component, the data is transmitted from the component to the other component without passing through any further component.

FIG. 1 is a diagram illustrating schematic system configurations for performing an artificial-intelligence-based method of providing personalized item information according to the technical idea of the present invention.

Referring to FIG. 1, the artificial-intelligence-based method of providing personalized item information according to an embodiment of the present invention may be performed by a financial investment item information providing system (10; hereinafter the 'item information providing system').

When various social issues arise, the item information providing system 10 may determine the financial investment items related to such an issue (hereinafter 'issue items') and provide them to the user terminal 30. A financial investment item may mean, for example, an individual item of a financial product such as a stock, a future, or an option.

In particular, the item information providing system 10 may analyze and extract the investment indicators in which the user of the user terminal 30 is ordinarily interested (also called 'investment factors', and also called 'investment impact indicators' because they affect the stock prices of related items) and store them in a database in advance. When an investment indicator of interest is updated, the system may provide information on the financial investment items related to that indicator, that is, information personalized to the user.

The information on a specific financial investment item provided to the user may include the item's name and theme, its price movement (for example, a time-series chart such as a candlestick chart), news related to the item, analyst reports related to the item, market commentary on the sector that includes the item, disclosures, various investment metrics related to the item (for example, market capitalization, PER, PBR, dividend yield, 52-week high, 52-week low, and moving averages), and information on investment strategies, investment logic, and portfolios that include the item. The information is not limited to these and may include various other information or content related to the financial investment item.

As described later, the item information providing system (10) may use unstructured data to identify an individual user's investment indicators of interest, the related keywords of a specific item, or the related investment indicators of a specific item, and may collect various unstructured data over a network. In this specification, unstructured data may mean the various content circulating on a network, and such data may be generated by media companies, financial institutions, securities companies, and ordinary users. Unstructured data may be treated as a kind of document: as long as it contains text, content qualifies as unstructured data whether it is long (for example, a post of several lines or several dozen lines or more) or short (for example, a one-line comment). Furthermore, 'unstructured' as defined in this specification means that the data collected by the item information providing system (10) is not restricted to a predetermined format; a person of ordinary skill in the art will readily understand that each producer may nevertheless generate its data according to its own fixed format and rules.

The item information providing system (10) may obtain such unstructured data from an information source system (20). The information source system (20) may be a system that generates unstructured data, or that stores and registers or publishes unstructured data generated by a producer (that is, a user, administrator, or poster of the information source system (20)). The information source system (20) may include, for example, social media service systems, online community service systems, media company systems, financial institution systems, securities company systems, and any other system that lets users post text-based posts or comments.

For convenience of explanation, FIG. 1 shows only one representative information source system (20), but in practice the item information providing system (10) may collect unstructured data from numerous information source systems on the network.

The item information providing system (10) may crawl unstructured data periodically or at an administrator's command. Depending on the implementation, it may instead collect unstructured data through a predetermined protocol (for example, an API) provided by the system from which the data is collected (that is, the information source system (20)). In one example, the item information providing system (10) may collect a large amount of unstructured data in bulk and then extract and use only the specific data it needs; alternatively, it may collect only the unstructured data that satisfies specific conditions (for example, a specific user, a specific item, or a specific period).

Meanwhile, the item information providing system (10) may be a computing system, that is, a data processing device with the computing capability to implement the technical idea of the present invention. It may include not only a server, which is generally a data processing device that clients can access over a network, but also a computing device such as a personal computer or a portable terminal.

The item information providing system (10) may be implemented as a single physical device; however, a person of ordinary skill in the art will readily understand that, where necessary, a plurality of physical devices may be organically combined to implement the item information providing system (10) according to the technical idea of the present invention.

The item information providing system (10) may also be implemented as a subsystem of a parent system (not shown). The parent system may be, for example, a securities company system, a system of a financial institution such as a bank, or a system that provides community services or investment information. The parent system may also be a server. Here, a server means a data processing device with the computing capability to implement the technical idea of the present invention; a person of ordinary skill in the art will readily understand that not only a data processing device that clients can access over a network, but any device capable of performing a specific service, such as a personal computer or a portable terminal, may be defined as a server.

The user terminal (30) may be a computing device equipped with a processing unit. For example, the user terminal (30) may be a mobile device such as a smartphone, a mobile phone, a PDA, or a tablet PC, but is not limited thereto and may also be a laptop, a desktop, or the like. The user terminal (30) may also include a microphone for receiving the user's voice, and a display device and a speaker for outputting the content provided by the item information providing system (10).

Meanwhile, the item information providing system (10) may be connected to the user terminal (30) through a wired or wireless network (for example, the Internet) and may transmit and receive the various information, data, and/or signals needed to implement the technical idea of the present invention.

FIG. 2 is a block diagram showing a schematic logical configuration of the item information providing system (10) according to an embodiment of the present invention, and FIG. 3 is a diagram illustrating a schematic physical configuration of the item information providing system (10) according to an embodiment of the present invention.

Referring to FIG. 2, the item information providing system (10) according to an embodiment of the present invention may include a data collection module (50), an item DB construction module (100), a user-indicator relationship DB construction module (200), an update determination module (300), an information providing module (400), and a DB (500). Depending on the embodiment, some of these components may not be essential to implementing the present invention, and the item information providing system (10) may of course include further components. For example, the item information providing system (10) may further include a communication module (not shown) for communicating with external devices, a control module (not shown) for controlling the components and resources of the item information providing system (10), and a storage module (not shown) for storing various data and information.

The item information providing system (10) may mean a logical configuration having the hardware resources and/or software needed to implement the technical idea of the present invention; it does not necessarily mean a single physical component or a single device. That is, the item information providing system (10) may mean a logical combination of the hardware and/or software provided to implement the technical idea of the present invention and, where necessary, may be implemented as a set of logical components that are installed on devices separated from one another and perform their respective functions. The item information providing system (10) may also mean a set of components implemented separately for each function or role needed to implement the technical idea of the present invention. For example, the data collection module (50), the item DB construction module (100), the user-indicator relationship DB construction module (200), the update determination module (300), the information providing module (400), and the DB (500) may be located on different physical devices or on the same physical device. Depending on the implementation, the software and/or hardware making up each of these modules may itself be distributed across different physical devices, with the components on those devices organically combined to implement the module.

In this specification, a module may mean a functional and structural combination of hardware for carrying out the technical idea of the present invention and software for driving that hardware. For example, a module may mean a logical unit of predetermined code and the hardware resources for executing that code; a person of ordinary skill in the art will readily understand that it does not necessarily mean physically connected code or a single type of hardware.

Meanwhile, the item information providing system (10) may physically have the configuration shown in FIG. 3. The item information providing system (10) may include a memory (120-1) that stores a program for implementing the technical idea of the present invention, and a processor (110-1) that executes the program stored in the memory (120-1).

A person of ordinary skill in the art will readily understand that, depending on how the item information providing system (10) is implemented, the processor (110-1) may go by various names, such as CPU, APU, or mobile processor. In addition, as described with reference to FIG. 2, the item information providing system (10) may be implemented by organically combining a plurality of physical devices; in that case, a person of ordinary skill in the art will readily understand that at least one processor (110-1) may be provided in each physical device to implement the item information providing system (10) of the present invention.

The memory (120-1) stores the program and may be implemented as any form of storage device that the processor can access in order to run the program. Depending on the hardware implementation, the memory (120-1) may be implemented as a plurality of storage devices rather than a single one. The memory (120-1) may include temporary storage as well as main memory. It may be implemented as volatile or non-volatile memory, and may be defined to include every form of information storage means in which the program can be stored and from which it can be run by the processor.

Depending on the embodiment, the item information providing system (10) may generate, according to the technical idea of the present invention, personalized information on items closely associated with an investment indicator of interest. It may be a system operated by an entity that wishes to use such information (for example, a securities company, a bank, or another service provider), may be implemented in various forms such as a web server or a computer, and may be defined to include any form of data processing device capable of performing the functions defined in this specification.

Depending on the embodiment, the item information providing system (10) may further include various peripheral devices (peripheral device 1 to peripheral device N; 130-1, 131-1). A person of ordinary skill in the art will readily understand that, for example, a keyboard, a monitor, a graphics card, or a communication device may be included in the item information providing system (10) as a peripheral device.

Hereinafter, a person of ordinary skill in the art will readily understand that when this specification states that a module performs a function, it means that the processor (110-1) performs that function by running the program stored in the memory (120-1).

Referring to FIG. 2, the DB (500) may store the various data or DB tables needed to implement the technical idea of the present invention. For example, in the course of performing the artificial-intelligence-based method of providing personalized item information according to the technical idea of the present invention, the item information providing system (10) may construct an item-keyword relationship DB, an item-indicator relationship DB, and a user-indicator relationship DB. The constructed item-keyword relationship DB, item-indicator relationship DB, and user-indicator relationship DB may be stored in the DB (500).

In this specification, a DB may be implemented as at least one table, and the term may also be used to include a separate DBMS (Database Management System) for searching, storing, and managing the information stored in the DB. A DB may be implemented in various ways, such as a linked list, a tree, or a relational DB, and the term may be used to include every data storage medium and data structure capable of storing the information to be stored in the DB.
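As one of the implementation options named above, the three relationship DBs could be held as relational tables. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the table names, column names, and sample rows are assumptions chosen for illustration and do not appear in the specification.

```python
import sqlite3

# In-memory database; a deployment would use a persistent DBMS instead.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_keyword   (item TEXT, keyword TEXT,   PRIMARY KEY (item, keyword));
CREATE TABLE item_indicator (item TEXT, indicator TEXT, PRIMARY KEY (item, indicator));
CREATE TABLE user_indicator (user_id TEXT, indicator TEXT, PRIMARY KEY (user_id, indicator));
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO item_keyword VALUES (?, ?)",
                 [("ITEM_1", "export"), ("ITEM_1", "battery")])
conn.executemany("INSERT INTO item_indicator VALUES (?, ?)",
                 [("ITEM_1", "oil_price"), ("ITEM_2", "usd_krw")])
conn.executemany("INSERT INTO user_indicator VALUES (?, ?)",
                 [("user_a", "oil_price"), ("user_b", "usd_krw"), ("user_c", "oil_price")])


def related_keywords(conn, item):
    """Look up the related keywords stored for one item."""
    rows = conn.execute(
        "SELECT keyword FROM item_keyword WHERE item = ? ORDER BY keyword", (item,)
    )
    return [row[0] for row in rows]


def users_and_items(conn, indicator):
    """Join the two indicator tables to pair interested users with related items."""
    return conn.execute(
        "SELECT u.user_id, i.item "
        "FROM user_indicator AS u JOIN item_indicator AS i ON u.indicator = i.indicator "
        "WHERE u.indicator = ? ORDER BY u.user_id, i.item",
        (indicator,),
    ).fetchall()
```

Keeping one row per (item, keyword) or (user, indicator) pair lets the lookup that follows an indicator update be expressed as a single join; other layouts would serve equally well.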

Meanwhile, the data collection module (50) may collect a plurality of unstructured data. To do so, the data collection module (50) may crawl the web or app information provided by the information source system (20), or may collect it through an API that the system provides. Such information source systems may be varied and may include SNS (Social Network Service) systems, media companies, exchanges, financial institutions, portals, online cafes, and blogs.

The data collection module (50) may collect a large amount of unstructured data in bulk and then extract and use only the specific data it needs, or it may collect only the unstructured data that satisfies specific conditions (for example, a specific user, a specific item, or a specific period).

Meanwhile, the item DB construction module (100) may construct an item-keyword relationship DB that stores the related keywords corresponding to each of a plurality of investment items. The item DB construction module (100) may also construct an item-indicator relationship DB that stores the related investment impact indicators corresponding to each of the plurality of investment items.

For each of a plurality of financial investment items, the item DB construction module (100) may extract the related keywords corresponding to that item by performing a related keyword determination method, and may construct the item-keyword relationship DB by storing or updating the related keywords of each financial investment item in that DB.

Likewise, for each of the plurality of financial investment items, the item DB construction module (100) may extract the related investment impact indicators corresponding to that item, and may construct the item-indicator relationship DB by storing or updating the related investment impact indicators of each financial investment item in that DB. To this end, the item DB construction module (100) may, according to the technical idea of the present invention, use unstructured data (documents) collected from a network such as social media to determine investment indicators capable of predicting changes in the value of a specific financial item. In particular, when the investment impact indicators of a specific financial item are determined using the related keywords of that item determined according to the technical idea of the present invention, the resulting indicators are more reliable and reflect social perceptions that change over time (for example, changes in the related keywords).

A financial investment item means an individual item of a financial product such as a stock, a futures contract, or an option; in this specification, 'financial item' and 'investment item' may be used with the same meaning as 'financial investment item'. The plurality of financial investment items may be defined in advance.

In theory, a related keyword corresponding to a specific financial item may mean a keyword that denotes an event, object, person, trend, or the like that affects changes in the value of that financial item (for example, stock A or bond B). In practice, it may mean a keyword that is extracted as being associated with the specific financial item on the basis of a large body of unstructured data (for example, user content on social media such as SNS, news, or content posted on online cafes or blogs) according to the technical idea of the present invention.

The item DB construction module (100) may collect unstructured data corresponding to the specific financial item over a network. For example, the item DB construction module (100) may collect the unstructured data corresponding to the specific financial item from the information source system (20).

Unstructured data corresponding to the specific financial item may mean the various content circulating on a network that contains the name of that financial item (for example, Samsung Electronics or Hynix), and such data may be generated by media companies, financial institutions, securities companies, and ordinary users. Unstructured data may be treated as a kind of document: as long as it contains text, content qualifies as unstructured data whether it is long (for example, a post of several lines or several dozen lines or more) or short (for example, a one-line comment).

The item DB construction module (100) may crawl unstructured data corresponding to a specific financial item periodically or at an administrator's command. Depending on the implementation, the unstructured data may instead be collected through a predetermined protocol (for example, an API) provided by the system from which the data is collected. In one example, a large amount of unstructured data is collected regardless of whether it corresponds to the specific financial item, and the item DB construction module (100) then extracts only the data corresponding to that item; in another example, only the unstructured data corresponding to the specific financial item is collected over the network.
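The first option above, collecting in bulk and then extracting the documents that correspond to one item, can be sketched as a simple name-mention filter. The document layout (a dictionary with `id` and `text` fields), the function name, and the case-insensitive substring match are assumptions made for illustration; the specification only states that the corresponding data contains the item's name.

```python
def documents_for_item(documents, item_names):
    """Keep the documents whose text mentions any of the given item names.

    `documents` is a list of dicts with a "text" field (hypothetical layout);
    `item_names` lists the names or aliases of one financial item.
    """
    names = [name.lower() for name in item_names]
    return [
        doc for doc in documents
        if any(name in doc["text"].lower() for name in names)
    ]


# Hypothetical bulk collection result.
bulk = [
    {"id": 1, "text": "Samsung Electronics posts record profit"},
    {"id": 2, "text": "Oil prices fall on weak demand"},
    {"id": 3, "text": "hynix and samsung electronics rally"},
]
```

With this sample, filtering on `["Samsung Electronics"]` keeps documents 1 and 3. A production system would likely need alias handling and tokenization suited to Korean text, which this sketch leaves out.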

The item DB construction module (100) may then determine the related keywords of the specific financial item based on the unstructured data collected in this way.

That is, according to the technical idea of the present invention, the item DB construction module (100) determines the related keywords of the specific financial item based on the opinions, reactions, and analyses written by numerous content producers and circulating on the network at the time the related keywords are determined. To this end, the item DB construction module (100) may restrict the unstructured data used to determine the related keywords of a specific financial item according to when that data was produced (for example, when it was uploaded to the network). For example, the item DB construction module (100) may determine the related keywords of a specific financial item based only on unstructured data produced within a recent predetermined period (for example, one month or three months).
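The restriction by production time described above amounts to a time-window filter. A minimal sketch follows, assuming each document carries a `created_at` timestamp (a hypothetical field name) and that the window is expressed in days.

```python
from datetime import datetime, timedelta


def recent_documents(documents, now, window_days=30):
    """Keep only the documents produced within the last `window_days` days."""
    cutoff = now - timedelta(days=window_days)
    return [doc for doc in documents if cutoff <= doc["created_at"] <= now]


# Hypothetical documents with their upload times.
docs = [
    {"id": 1, "created_at": datetime(2021, 7, 1)},
    {"id": 2, "created_at": datetime(2021, 5, 1)},
    {"id": 3, "created_at": datetime(2021, 4, 1)},
]
now = datetime(2021, 7, 13)
```

A 30-day window keeps only document 1, while a 90-day window also admits document 2; the window length is a tuning choice that the specification leaves open ("one month or three months" are given only as examples).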

By determining the related keywords periodically and monitoring how the related keywords of a specific financial item change, it is possible to detect, on the basis of the related keywords, a change in the business direction of the specific financial item or of the entity behind it (for example, a company), or the occurrence of a significant event.
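Monitoring how an item's related keywords change between two determination runs can be sketched as a set comparison. The function name and the returned structure are assumptions made for illustration; how a change is judged significant is not specified here and is left out.

```python
def keyword_changes(previous, current):
    """Compare two keyword lists and report what appeared and what disappeared."""
    prev, cur = set(previous), set(current)
    return {
        "added": sorted(cur - prev),    # keywords newly associated with the item
        "removed": sorted(prev - cur),  # keywords no longer associated with it
    }
```

For example, comparing `["battery", "display"]` from the previous run with `["battery", "robotics"]` from the current run reports `robotics` as added and `display` as removed, which might hint at a shift in the company's business focus.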

According to the technical idea of the present invention, the item DB construction module (100) determines the related keywords of a specific financial item by analyzing the unstructured data circulating on the network, and may use a deep-learning-based natural language processing model for that analysis.

In particular, to determine related keywords more accurately, the item DB construction module (100) may use a context-sensitive natural language processing model.

A context-sensitive natural language processing model is the opposite of a context-free natural language processing model: it may mean a model that defines or recognizes the meaning of a keyword or sentence differently depending on the context. A context-free natural language processing model, by contrast, may mean a model that defines or recognizes the meaning of a keyword or sentence from the keyword or sentence itself, regardless of context.

For such a context-sensitive natural language processing model, one of the most important performance criteria is how well the model, after learning from a large body of training data, identifies the keywords that are actually associated with a given keyword. (In natural language processing models a keyword is often expressed as a token, and a keyword may be a single token or a combination of tokens; for convenience, this specification refers to a token or a combination of tokens as a keyword.) Natural language processing models that apply the concept of attention were introduced for this purpose.

Attention may mean the relationship between a keyword and the other keywords that are associated with it and should therefore be consulted more closely (attended to) in order to define or recognize that keyword. For example, a second keyword with a high attention value with respect to a first keyword may mean that the second keyword was used in the training data in close association with the first keyword.

어텐션 메커니즘과 어텐션 함수 등의 어텐션의 개념에 대해서는 널리 공지되어 있으므로 상세한 설명은 생략하도록 한다.Since the concept of attention, such as the attention mechanism and the attention function, is widely known, a detailed description thereof will be omitted.

In a natural language processing model that vectorizes the keywords used in training data using this concept of attention, each keyword included in the training data can be vectorized in a manner that reflects its context.

Examples of such natural language processing models include ELMo (Embeddings from Language Models), ULMFiT (Universal Language Model Fine-tuning for Text Classification), and BERT (Bidirectional Encoder Representations from Transformers).

In any case, each of the above natural language processing models performs the task of vectorizing keywords (word embedding), and through this vectorization the meaning of a keyword can be expressed as a vector.

In particular, BERT (Bidirectional Encoder Representations from Transformers) has attracted attention as a model that can learn from a large amount of training data through unsupervised learning, without a labeling task, and can vectorize each keyword with high performance in a manner that differs according to its context.

Therefore, the natural language processing model according to an embodiment of the present invention may be a natural language processing model that learns training data through the BERT natural language processing model and vectorizes each keyword, but is not limited thereto.

In any case, according to the technical idea of the present invention, using the context-reflecting vectorization result of each keyword obtained through a context-sensitive natural language processing model can yield much higher performance than the conventional approach of determining related keywords simply through statistical frequency or the like (that is, it can exclude generic words that frequently appear together with a specific keyword but have little substantive relevance to it).

Meanwhile, according to the technical idea of the present invention, the item DB construction module 100 may determine an investment impact indicator of a specific financial item based on unstructured data on the network. In doing so, the related keywords of the specific financial item determined as described above may be used, but the invention is not limited thereto; the investment impact indicator according to the technical idea of the present invention may also be determined using related keywords of the specific financial item determined by a person such as an expert.

To this end, the item DB construction module 100 may use a deep learning-based natural language processing model. The natural language processing model for determining the investment impact indicator may be provided separately from the natural language processing model for determining the related keywords.

As described later, this natural language processing model may also be referred to as an influence judgment model. The natural language processing model may be a deep learning model trained to determine whether unstructured data, that is, the sentences included in the collected documents, have a positive or negative influence on the specific financial item, or whether the documents themselves have a positive or negative influence. Hereinafter, in this specification, the natural language processing model trained to determine related keywords is referred to as the first natural language processing model, and the natural language processing model for determining the investment impact indicator, that is, the influence judgment model, is referred to as the second natural language processing model.

The first natural language processing model and the second natural language processing model may each be provided separately. According to an embodiment, the second natural language processing model may be a model obtained by building the first natural language processing model and then fine-tuning it into a classification model that, when a specific sentence or document is input, classifies whether the sentence or document is positive, negative, or neutral.

A machine learning model that outputs a classification result including positive or negative when specific data is input is called a sentiment analysis model.

According to an embodiment, the second natural language processing model may be a model that classifies input data into three classes (positive, negative, or neutral); according to another embodiment, it may be a model trained to classify input data into more classes, such as strong positive, weak positive, strong negative, weak negative, and neutral.

The second natural language processing model may be built by pre-training a context-sensitive natural language processing model (for example, BERT) on a large corpus (or set of documents) and then fine-tuning the pre-trained model to perform the sentiment analysis described above. For the pre-training, as described above, the first natural language processing model may be built either from a publicly available natural language processing model such as BERT as it is, or by additionally training such a model on data related to financial products; the second natural language processing model may then be built by fine-tuning the first natural language processing model with a large amount of labeled training data. The labeled training data may be data in which a plurality of sentences are each labeled with the classification result (for example, positive, negative, or neutral) that the second natural language processing model should output. Of course, when documents themselves are used as training data, data in which the documents themselves are labeled may serve as the training data.

In any case, the second natural language processing model according to the technical idea of the present invention may output, for each sentence, a classification result including whether the sentence is positive or negative with respect to a specific financial item. This can yield judgment results with much higher accuracy and reliability than the conventional approach of performing positive or negative sentiment analysis keyword by keyword in a predetermined manner.

That is, in conventional keyword-level sentiment analysis, when positive and negative keywords coexist in a single sentence, the probability that the sentence or document is positive or negative can only be predicted through a probabilistic model (for example, a naive Bayes method), and in such cases the accuracy of the sentiment analysis is inevitably relatively low.

However, a deep learning model that performs sentiment analysis on the sentence or document as a unit, using a context-sensitive natural language processing model trained on a large amount of data, is trained so that a sentence or document whose meaning is similar to that of labeled training data produces the same output as the corresponding label, and in this case much more reliable sentiment analysis can be performed.

Hereinafter, this specification describes, by way of example, the case in which the second natural language processing model performs sentiment analysis on each sentence included in the collected unstructured documents and thereby performs sentiment analysis on the document. However, a person of ordinary skill in the art of the present invention will readily understand that, depending on the embodiment, the second natural language processing model may be trained and used to output a sentiment analysis result for the document itself.
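The step from sentence-level results to a document-level result can be illustrated with a minimal sketch. The specification does not state how the per-sentence results are combined, so the majority vote below is only an assumption made for illustration; the function name `document_sentiment` and the label strings are likewise hypothetical.

```python
from collections import Counter


def document_sentiment(sentence_labels):
    """Combine per-sentence labels into one document label.

    Assumption (not from the specification): a simple majority vote between
    "positive" and "negative"; ties and empty input fall back to "neutral".
    """
    counts = Counter(sentence_labels)
    positive, negative = counts["positive"], counts["negative"]
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"


# Labels as they might be output by a sentence-level sentiment classifier.
labels = ["positive", "neutral", "positive", "negative"]
doc_label = document_sentiment(labels)
```

Other aggregation rules, such as weighting sentences by classifier confidence, would fit the same interface.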

Meanwhile, the technical idea of the present invention retains the advantage that, by determining the related keywords and related investment impact indicators of a specific financial item from unstructured data on the network, the varied opinions of many content producers are reflected while temporal trends are also captured, and at the same time provides a technical idea that can solve the problems that may arise when unstructured data collected from the network is used.

A typical problem that may arise when using unstructured data collected from the network is redundancy of information. That is, identical or practically identical content is produced as separate unstructured data by multiple content producers. When what is in substance a single piece of content (a text document) is repeatedly uploaded by multiple producers, or uploaded with only slight modifications, that single piece of content may end up being learned repeatedly by the natural language processing model.

This may cause the natural language processing model to overfit to that content, that is, to learn a high degree of association between keywords that in reality have little or only weak association.

To address this, the item DB construction module 100 may provide a technical idea of resolving redundancy by filtering the unstructured data according to a predetermined criterion.

Consequently, according to the technical idea of the present invention, the related keywords of a specific financial item can be determined with high accuracy from a large amount of unstructured data on the network, that is, from a large amount of live information produced by many producers, through a context-sensitive natural language processing model capable of understanding meaning to some extent, while the problems that may arise when using a large amount of unstructured data on the network are also solved. In addition, a predetermined investment impact indicator capable of predicting the direction of the stock price of the specific financial item can be determined based on the live information produced by many producers.

The detailed configuration of the item DB construction module 100 for implementing this technical idea is described with reference to FIG. 4.

FIG. 4 is a block diagram illustrating a schematic configuration of the item DB construction module 100 that performs a method of determining related keywords and related investment impact indicators of a specific item according to an embodiment of the present invention.

Referring to FIG. 4, the item DB construction module 100 according to the technical idea of the present invention may include an extraction module 110 and a natural language processing model 120 (which includes the influence judgment model). According to an embodiment, the item DB construction module 100 may further include a filtering module 130 and a document vector generation module 140.

The data collection module 50 may collect a plurality of unstructured data corresponding to a specific financial item. The extraction module 110 may then extract related keywords corresponding to the specific financial item based on the collected unstructured data (for example, various document content such as financial news, disclosures, user-uploaded content, and analyst reports, or comments on such content), and may store the related keywords corresponding to the specific financial item in an item-keyword relationship DB.

To this end, the extraction module 110 may cause the natural language processing model 120 to learn training target data, which are all or part of the collected unstructured data. As described above, the natural language processing model 120 may include the first natural language processing model and the second natural language processing model. As described above, the second natural language processing model may be an influence judgment model or sentiment analysis model that outputs a classification result including whether the unstructured data, that is, the sentences included in the unstructured documents or the documents themselves, have a positive or a negative influence on the specific financial item.

As described above, the first natural language processing model included in the natural language processing model 120 may be a model capable of vectorizing each of the keywords included in the training target data at least in a context-sensitive manner.

For example, the first natural language processing model may be a model, such as BERT, that can learn a large corpus through unsupervised learning, and may be trained either on the training target data alone or by first learning a large body of other documents, that is, a corpus, and then additionally training on the training target data.

For example, FIG. 5 is a diagram illustrating the concept of a natural language processing model according to an embodiment of the present invention. As shown in FIG. 5, a predetermined first natural language processing model (for example, an NLP model such as BERT) may be provided.

The first natural language processing model performs learning on the training target data (for example, D1, D2, D3, D4, and so on).

When the training target data are sufficiently plentiful, the first natural language processing model may be trained on the training target data alone. Typically, however, the training target data alone are unlikely to be sufficient, so the first natural language processing model may be a model that has already been pre-trained on a large amount of other data (for example, wiki data in the case of BERT).

The model may then additionally learn the training target data to obtain a word vector for each of the keywords included in the training target data.

The training target data may be the data that remain after the redundancy problem has been resolved among the unstructured data corresponding to the specific financial item collected by the data collection module 50. The technical idea for resolving the redundancy problem is described later; when different unstructured data are judged to be redundant according to a predetermined criterion, the training target data may be specified by retaining only one of them (or, depending on the embodiment, a few).

In any case, the first natural language processing model may be a model capable of vectorizing each of the keywords included in the training target data in a manner that reflects context.

Then, the extraction module 110 may extract, from the vectors obtained through the trained natural language processing model 120, a first vector corresponding to the keyword of the specific financial item (that is, the name of the specific financial item) and at least one second vector that satisfies a predetermined criterion with respect to the first vector, and may extract the keyword corresponding to each extracted second vector as a related keyword.

A high-performing, that is, well-trained, natural language processing model can vectorize keywords such that keywords with the same or similar meanings are located close to one another in the vector space.

The vectorized keywords can then be mapped onto a vector space (for example, a 768-dimensional space in the case of BERT).

This concept is described with reference to FIG. 6.

FIG. 6 is a diagram illustrating the concept of extracting related keywords through word vectors according to an embodiment of the present invention.

Referring to FIG. 6, each of the keywords included in the training target data may be vectorized through the trained first natural language processing model, and the result may be as illustrated in FIG. 6.

In FIG. 6, the first vector 1 may be a vector representing the keyword (that is, the name) of the specific financial item.

The vectors (for example, 2-1 to 2-8) of the keywords that the first natural language processing model has learned to be closely related to that keyword are mapped to nearby positions in the vector space.

Here, the first natural language processing model performs word embedding (keyword vectorization) in consideration of context. Accordingly, words that have no substantial semantic relevance, such as words that often appear together with the keyword of the specific financial item but also appear together with other financial items (for example, keywords commonly used for stocks in general, or predicate forms such as '~이다' ('to be')), may be mapped relatively far from the first vector 1 corresponding to the specific financial item in the vector space.

Accordingly, the extraction module 110 may extract the related keywords of the specific financial item from the result of the first natural language processing model vectorizing each of the keywords included in the training target data.

For example, vectors satisfying a predetermined criterion with respect to the first vector 1 (for example, vectors 2-1 to 2-8), such as a cosine similarity with the first vector 1 of at least a certain value or a Euclidean distance from the first vector 1 of at most a certain value, may be extracted, and the keywords corresponding to the extracted vectors may be extracted as related keywords.

A person of ordinary skill in the art of the present invention will readily understand that cosine similarity and Euclidean distance are both metrics that can be used to measure the similarity of vectors in a vector space.

It goes without saying that the certain value to be used may be determined through experiment.
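The threshold-based extraction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the threshold of 0.8, and the toy three-dimensional vectors are all assumptions (real BERT-base word vectors would have 768 dimensions, and the threshold would be set by experiment).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_keywords(item_vector, keyword_vectors, min_similarity=0.8):
    """Return keywords whose vector meets the cosine threshold, most similar first."""
    scored = [(kw, cosine_similarity(item_vector, vec))
              for kw, vec in keyword_vectors.items()]
    hits = [(kw, score) for kw, score in scored if score >= min_similarity]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return [kw for kw, _ in hits]


# Hypothetical toy data: item_vec plays the role of the first vector 1.
item_vec = [1.0, 0.0, 0.0]
vectors = {
    "battery": [0.9, 0.1, 0.0],
    "electric vehicle": [0.8, 0.3, 0.1],
    "is": [0.0, 1.0, 0.0],  # generic word, mapped far from the item vector
}
result = related_keywords(item_vec, vectors, min_similarity=0.8)
```

To use Euclidean distance instead, the comparison would flip to "distance at most a certain value"; Python's `math.dist` computes that distance directly.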

The related keywords extracted in this way may be further narrowed by a service administrator or by additional processing, but in any case the related keywords extracted in this manner may be specified as the related keywords of the specific financial item or as candidates for them.

Meanwhile, rather than learning all of the unstructured data corresponding to a specific financial item as training target data, the item DB construction module 100 may perform predetermined filtering and specify only the unstructured data remaining after the filtering as training target data.

To this end, the filtering module 130 may filter out, from the unstructured data collected by the data collection module 50, the unstructured data that meet a predetermined filtering condition. The unstructured data remaining after the filtering may then be specified as the training target data.

According to one example, the filtering module 130 may filter out highly redundant unstructured data from the collected unstructured data, so that only data with low mutual redundancy are specified as training target data.

To this end, the filtering module 130 may cluster or group the unstructured data, that is, the documents, whose mutual similarity is at or above a certain level.

To this end, the item DB construction module 100 may generate and use a document vector representing each piece of unstructured data. A document vector is a vector that characterizes the corresponding document. Using the word vectors obtained through the natural language processing model 120, which performs word embedding well in consideration of context, a sentence vector characterizing a sentence that contains the keywords corresponding to those word vectors, or a document vector characterizing the document that contains those sentences, may of course be defined in various ways.

Of course, the word vectors generated by the natural language processing model 120 need not necessarily be used to generate document vectors; a person of ordinary skill in the art of the present invention will readily understand that various known methods of generating document vectors simply for judging similarity between documents (for example, TF-IDF or methods utilizing it) may be used.
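As one illustration of such a known method, TF-IDF document vectors can be computed with the standard library alone. This is a hedged sketch: the specification names TF-IDF but not a particular weighting, so the `log(N / df) + 1` form of the inverse document frequency, the function name `tfidf_vectors`, and the pre-tokenized toy documents are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized_docs):
    """Return (vocabulary, vectors), with one TF-IDF vector per document.

    Assumption: idf = log(N / df) + 1, one common variant; others exist.
    """
    vocab = sorted({token for doc in tokenized_docs for token in doc})
    n_docs = len(tokenized_docs)
    # Document frequency: number of documents containing each token.
    doc_freq = Counter(token for doc in tokenized_docs for token in set(doc))
    idf = {token: math.log(n_docs / doc_freq[token]) + 1.0 for token in vocab}
    vectors = []
    for doc in tokenized_docs:
        term_counts = Counter(doc)
        vectors.append([
            (term_counts[token] / len(doc)) * idf[token] if doc else 0.0
            for token in vocab
        ])
    return vocab, vectors


# Hypothetical pre-tokenized documents.
docs = [
    ["samsung", "earnings", "rise"],
    ["samsung", "earnings", "fall"],
]
vocab, vectors = tfidf_vectors(docs)
```

Tokens shared by every document ("samsung", "earnings") receive the lowest weight, so the vectors differ mainly in the terms that distinguish the documents.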

The generation of such document vectors may be performed by the document vector generation module 140.

According to one example, the document vector generation module 140 may generate a document vector using the word vector of each keyword included in the document, obtained through the natural language processing model 120. A sentence vector may be generated from the word vectors and then used to generate the document vector, or the document vector may be defined directly from the word vectors. To derive a sentence vector and/or a document vector, the word vectors of the keywords included in the sentence or document may be combined by a predetermined operation, and the specific operation may of course vary depending on the embodiment. In particular, for a model such as BERT that performs word embedding with deep bidirectional attention, the quality of the sentence vectors or document vectors generated from the word vectors can be higher.
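One simple way to combine word vectors into sentence and document vectors is averaging (mean pooling). The specification leaves the operation open, so averaging is an assumption chosen here for illustration, and the function names and two-dimensional toy vectors are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("at least one vector is required")
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def document_vector(sentences):
    """Build a document vector in two steps: words -> sentence -> document.

    `sentences` is a list of sentences, each a list of word vectors.
    Assumption: both steps use mean pooling; empty sentences are skipped.
    """
    sentence_vectors = [mean_vector(words) for words in sentences if words]
    return mean_vector(sentence_vectors)


# Toy 2-dimensional word vectors for a two-sentence document.
doc = [
    [[1.0, 0.0], [0.0, 1.0]],  # sentence 1: two word vectors
    [[1.0, 1.0]],              # sentence 2: one word vector
]
doc_vec = document_vector(doc)
```

Averaging all word vectors of the document in a single step would correspond to the other option mentioned above, defining the document vector directly from the word vectors.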

The filtering module 130 may then cluster the unstructured data whose similarity is at or above a certain level, based on the document vector of each piece of unstructured data generated by the document vector generation module 140. The cosine similarity or Euclidean distance described above may likewise be used as the similarity between document vectors, and the clustering threshold may be set in various ways depending on the embodiment.

The filtering module 130 may then perform filtering so that some of the data in each cluster of unstructured data are excluded from the training target data.
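The clustering and filtering steps can be sketched as below. The specification does not name a clustering algorithm, so the greedy single-pass scheme, the 0.95 threshold, and the rule of keeping the first member of each cluster are all assumptions for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_by_similarity(doc_vectors, min_similarity=0.95):
    """Greedy clustering: a document joins the first cluster whose
    representative (its first member) is similar enough, else starts a new one."""
    clusters = []  # each cluster is a list of document ids
    for doc_id, vec in doc_vectors.items():
        for cluster in clusters:
            representative = doc_vectors[cluster[0]]
            if cosine_similarity(vec, representative) >= min_similarity:
                cluster.append(doc_id)
                break
        else:
            clusters.append([doc_id])
    return clusters


def select_training_docs(clusters, keep_per_cluster=1):
    """Keep the first `keep_per_cluster` documents of each cluster; drop the rest."""
    return [doc_id for cluster in clusters for doc_id in cluster[:keep_per_cluster]]


# Hypothetical 2-dimensional document vectors.
docs = {
    "D1": [1.0, 0.0],
    "D2": [0.99, 0.05],  # near-duplicate of D1
    "D3": [0.0, 1.0],
    "D4": [0.02, 1.0],   # near-duplicate of D3
}
clusters = cluster_by_similarity(docs)
training_ids = select_training_docs(clusters)
```

Here D2 and D4 are filtered out as near-duplicates, leaving D1 and D3 as training target data.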

Meanwhile, according to the technical idea of the present invention, the clustering may additionally take into account the creation time (for example, the upload time) of each piece of unstructured data.

This is because, typically, once a piece of content is created, duplicate content based on it (for example, identical or similar content uploaded by different posters) is frequently created around the same time, and in such cases it may be desirable to filter out the duplicate content; whereas content with similar substance that is separated by more than a certain amount of time may be recognized and treated by many users as independent content in its own right.

Therefore, even for content, that is, unstructured data, with similar meaning, if the interval between creation times is large, it may be preferable to treat the later content as new content that can newly affect the market or users' perception. The technical idea of the present invention may therefore adopt this concept when filtering the unstructured data.

An example of this may be as shown in FIG. 7.

FIG. 7 is a diagram for explaining the concept of performing filtering according to an embodiment of the present invention.

Referring to FIG. 7, the document vector generation module 140 may generate document vectors (for example, D1 to D19) corresponding to each piece of unstructured data collected from the network.

The document vectors shown in FIG. 7 (for example, D1 to D19) may be document vectors that have been clustered into a single cluster according to the similarity between document vectors.

According to an embodiment, the document vectors classified into a single cluster (for example, D1 to D19) correspond to unstructured data with substantially similar content. Therefore, the training target data may be specified by keeping only one document, or, if necessary, a few documents with the lowest similarity, and filtering out the rest. This process may be performed for each cluster.

In addition, depending on the embodiment, the filtering module 130 may perform clustering based not only on document similarity but also on the creation time of each document. That is, clustering may be performed among documents that satisfy both conditions: the similarity is at or above a certain level, and the creation times fall within a certain interval. The result may be as shown in FIG. 7.

In other words, document vectors (for example, D1 to D19) that would be classified into a single cluster by document similarity alone may be divided into three sub-clusters (Dt1, Dt2, Dt3) by the additional criterion of document creation time.

Then, the filtering module 130 may perform filtering within each sub-cluster (Dt1, Dt2, Dt3) and specify the one or several remaining documents, that is, pieces of unstructured data, as the training target data.

Filtering after clustering that additionally takes document creation time into account is more effective than simply collecting unstructured data at finer intervals. This is because a sufficient amount of data must exist to affect the training result of the natural language processing model 120, so unstructured data is preferably collected in relatively long cycles (for example, several months). However, the unstructured data collected over such a relatively long time may include not only content that was redundantly generated from a single piece of content, but also independent content that is similar in meaning yet was created separately by a different creator. In the latter case, it may be desirable to include the content in the training target data as separate information even though its meaning is similar.

Therefore, unstructured data corresponding to a specific financial item is collected in relatively long cycles, and documents are then clustered based on both the similarity between the unstructured data (documents) and a relatively short creation-time range (for example, one or two days). This removes redundancy while still serving the purpose of the present invention, which is to generate related keywords from content that contains the individual views or analyses of many content creators.

Accordingly, the filtering module 130 may perform filtering that leaves only the documents corresponding to one or several document vectors in each of the sub-clusters (Dt1, Dt2, Dt3) as training target data.
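
The time-based split of one similarity cluster into sub-clusters could look roughly like the sketch below. It assumes that a new sub-cluster starts whenever the gap to the previously created document exceeds `max_gap`; the specification only requires that creation times fall within a certain range, so this rule, the one-day default, and the names `split_by_creation_time` and `pick_representatives` are illustrative assumptions.

```python
from datetime import datetime, timedelta


def split_by_creation_time(documents, max_gap=timedelta(days=1)):
    """Split (doc_id, created_at) pairs of one similarity cluster into
    sub-clusters; a gap larger than `max_gap` starts a new sub-cluster."""
    ordered = sorted(documents, key=lambda doc: doc[1])
    sub_clusters = []
    previous_time = None
    for doc_id, created_at in ordered:
        if previous_time is None or created_at - previous_time > max_gap:
            sub_clusters.append([])
        sub_clusters[-1].append(doc_id)
        previous_time = created_at
    return sub_clusters


def pick_representatives(sub_clusters, keep=1):
    """Keep the earliest `keep` documents of each sub-cluster."""
    return [doc_id for sub in sub_clusters for doc_id in sub[:keep]]


# Hypothetical documents from a single similarity cluster.
documents = [
    ("D1", datetime(2021, 1, 1, 9)),
    ("D2", datetime(2021, 1, 1, 15)),
    ("D3", datetime(2021, 2, 10, 8)),
    ("D4", datetime(2021, 2, 10, 20)),
    ("D5", datetime(2021, 4, 5, 12)),
]
sub_clusters = split_by_creation_time(documents)
training_targets = pick_representatives(sub_clusters)
```

With these inputs the five similar documents fall into three sub-clusters, and one document per sub-cluster is retained, so similar content published months apart is still treated as separate information.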

Meanwhile, the technical idea of the present invention determines the related keywords of a specific financial item from the views, analyses, news, and the like of many content creators on the network. A change or trend in these related keywords may therefore reflect a change in the specific financial item or the corresponding company, or a change in how many people perceive them.

Therefore, if a change in the related keywords of a specific financial item is recognized quickly, it may be possible to create an investment strategy that takes the change into account relatively early.

To this end, the item DB construction module 100 may determine related keywords at a predetermined period.

In each period, the related keywords may be determined from unstructured data generated within a predetermined time span, and the results may be stored and managed in a predetermined DB 500 so that changes in the related keywords of a specific financial item can be monitored.

An example of this will be described with reference to FIG. 8.

FIG. 8 is a diagram for explaining changes in related keywords according to an embodiment of the present invention.

Referring to FIG. 8, the extraction module 110 may determine the related keywords of specific financial items (for example, S1, S2, and so on) in the manner described above.

For example, the related keywords (for example, K1, K2, K3, K4) of the specific financial item (for example, S1) may be determined in a first period P1, and the related keywords determined again in a second period P2 may not differ from those determined in the first period P1. That is, the related keywords determined in the second period P2 may also be K1, K2, K3, and K4, the same as in the first period P1.

However, the related keywords determined in a third period P3 may be K1, K3, K4, and K5. That is, a new related keyword K5 may be determined as a related keyword of the specific financial item (for example, S1) in the third period. This may mean that an issue related to K5 has actually arisen for the specific financial item (for example, S1) or the corresponding company, or at least that users who perceive such an issue to have arisen have appeared.

In this case, the extraction module 110 may perform a predetermined alarm process. The alarm process may be a procedure that notifies an administrator of the entity operating the item DB construction module 100, or a predesignated notification recipient or machine, that a new related keyword has appeared. The recipient of the notification can then check it and quickly review or establish a new investment strategy.
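
A minimal sketch of how a newly appearing related keyword might be detected across periods is given below. The set-difference comparison and the names `detect_new_keywords` and `monitor_keyword_history` are assumptions for illustration; how the resulting alert is delivered to a recipient is left out.

```python
def detect_new_keywords(previous_keywords, current_keywords):
    """Return keywords present in the current period but not the previous one."""
    return sorted(set(current_keywords) - set(previous_keywords))


def monitor_keyword_history(history):
    """`history` is a chronological list of (period_label, keywords).
    Returns (period_label, new_keywords) for every period that introduced
    at least one keyword absent from the preceding period."""
    alerts = []
    for (_, previous), (label, current) in zip(history, history[1:]):
        new_keywords = detect_new_keywords(previous, current)
        if new_keywords:
            alerts.append((label, new_keywords))
    return alerts


# The P1 to P3 example for financial item S1.
history_s1 = [
    ("P1", ["K1", "K2", "K3", "K4"]),
    ("P2", ["K1", "K2", "K3", "K4"]),
    ("P3", ["K1", "K3", "K4", "K5"]),
]
alerts = monitor_keyword_history(history_s1)
print(alerts)  # [('P3', ['K5'])]
```

Each returned entry would be a candidate trigger for the alarm process; keywords that disappear (K2 in the third period) are ignored in this sketch.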

FIG. 9 is a flowchart illustrating, by way of example, a process in which the item DB construction module 100 constructs an item-keyword relationship DB according to an embodiment of the present invention.

As shown in FIG. 9, the item DB construction module 100 may perform steps S110 to S130 for each of a plurality of predefined financial investment items S (S100).

In step S110, the item DB construction module 100 may collect unstructured data corresponding to a specific financial item S.

Then, in step S120, the item DB construction module 100 may determine the related keywords of the specific financial item S based on the collected unstructured data. As described above, a context-sensitive natural language processing model may be used to determine the related keywords, and the training target data used to train the natural language processing model may be specified after a predetermined filtering process rather than consisting of all of the collected unstructured data.

In step S130, the item DB construction module 100 may associate the specific financial item S with its related keywords and store them in the DB 500, or may update the entry (for example, a record) in the DB that corresponds to the specific financial item S.

Meanwhile, the above process may be repeated at a predetermined period, that is, each time a predetermined amount of time has elapsed.
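
One pass of the S100 loop could be sketched as below. The in-memory dictionary standing in for the DB 500 and the injected callables `collect` and `determine_keywords` are hypothetical placeholders for the collection and keyword-determination steps, not components defined in the specification.

```python
def build_item_keyword_db(items, collect, determine_keywords, db=None):
    """Run steps S110 to S130 once for every financial item.

    `collect(item)` returns the unstructured documents for the item (S110),
    `determine_keywords(item, documents)` returns its related keywords (S120),
    and the result is stored or overwritten in `db` (S130)."""
    db = {} if db is None else db
    for item in items:                                   # S100
        documents = collect(item)                        # S110
        keywords = determine_keywords(item, documents)   # S120
        db[item] = list(keywords)                        # S130
    return db


# Hypothetical stand-ins for the collector and the keyword model.
fake_documents = {"S1": ["doc a", "doc b"], "S2": ["doc c"]}
db = build_item_keyword_db(
    ["S1", "S2"],
    collect=lambda item: fake_documents[item],
    determine_keywords=lambda item, docs: [f"{item}-kw{i}" for i in range(len(docs))],
)
```

Calling the function again with the same `db` in a later period overwrites the record for each processed item, which corresponds to the update case of step S130.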

The extraction module 110 may extract investment impact indicators corresponding to a specific financial item and may store the related investment impact indicators corresponding to the specific financial item in the item-indicator relationship DB. An example of this will be described with reference to FIG. 10.

FIG. 10 is a diagram for explaining a process of constructing an item-indicator relationship DB according to an embodiment of the present invention.

As shown in FIG. 10, steps S210 to S230 may be performed for each of a plurality of predefined financial investment items S (S200).

In step S210, as described above, the data collection module 50 may collect unstructured data corresponding to the financial item S. Depending on the embodiment, the data collection module 50 may also collect unstructured documents corresponding to the related keywords of the specific financial item. That is, the item DB construction module 100 may determine the investment impact indicator using only the unstructured documents corresponding to the financial item S (referred to as first unstructured documents), or may determine it further based on the unstructured documents corresponding to the related keywords of the financial item S (referred to as second unstructured documents). Depending on the embodiment, the investment impact indicator may be determined further based on the unstructured documents corresponding to the related keywords of those related keywords (referred to as third unstructured documents).

In this way, the set of documents associated with a specific financial item can be expanded by increasing the order of the related keywords: from documents that mention the specific financial item itself, to documents that mention its related keywords, and further to documents that mention the related keywords of those related keywords. The investment impact indicator can be determined from these documents as described below. Up to which order of related keywords the documents are included when determining the investment impact indicator may vary by embodiment. (For example, if an unstructured document corresponding to a related keyword of a specific financial item is defined as a second unstructured document or second-order unstructured document, an unstructured document corresponding to a related keyword of that related keyword can be defined as a third-order unstructured document.)

In addition, the same document may inevitably appear both among the first unstructured documents (first-order unstructured documents) and among the second-order unstructured documents of the next order. In such a case, the document may be treated as an unstructured document of the earlier (lower) order.
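
The rule that a document appearing in more than one order is treated as belonging to the lower order could be expressed as in the following sketch. The document identifiers and the name `assign_document_orders` are hypothetical.

```python
def assign_document_orders(documents_by_order):
    """Map each document id to the lowest order in which it was collected.

    `documents_by_order` maps an order (1, 2, 3, ...) to the ids of the
    documents collected for that order."""
    assigned = {}
    for order in sorted(documents_by_order):
        for doc_id in documents_by_order[order]:
            # setdefault keeps the first (lowest) order seen for a document.
            assigned.setdefault(doc_id, order)
    return assigned


# Hypothetical collection result with overlaps between adjacent orders.
collected = {1: ["a", "b"], 2: ["b", "c"], 3: ["c", "d"]}
orders = assign_document_orders(collected)
print(orders)  # {'a': 1, 'b': 1, 'c': 2, 'd': 3}
```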

In any case, once the data collection module 50 has collected unstructured documents up to the n-th order (n being a natural number of 2 or more), including the first-order unstructured documents, the extraction module 110 may determine the related investment impact indicator corresponding to the financial item S based on the collected unstructured documents (S220).

To determine the investment impact indicator corresponding to the financial item S from the collected unstructured documents, the extraction module 110 may use a second natural language processing model included in the natural language processing model 120.

The second natural language processing model may classify each of the evaluation target documents, which are all or some of the unstructured documents collected as described above, into one of a set of predetermined classification results that includes having a positive influence and having a negative influence on the specific financial item.

The second natural language processing model may perform the influence determination, that is, sentiment analysis, not only on a document as a whole but also on each sentence in the document. Depending on the embodiment, sentiment analysis of the document may be performed based on the per-sentence sentiment analysis results.

According to an example, the second natural language processing model is a context-sensitive natural language processing model trained to perform sentiment analysis, and sentiment analysis may be performed on all or some of the collected unstructured documents using the trained second natural language processing model.

The sentiment analysis result output by the second natural language processing model may simply be one of three classes (positive, negative, or neutral), or may be one of a finer set of classes (for example, strong positive, weak positive, strong negative, weak negative, and neutral). Of course, when the second natural language processing model is trained, the training data must be labeled with the classes to be output.

Meanwhile, if all of the collected unstructured documents are specified as the evaluation target documents for sentiment analysis, a problem may arise from the content redundancy of social media described above, in which content is copied and uploaded as-is or uploaded with only slight changes: what is effectively a single piece of content is reproduced as many documents and exerts excessive influence on the investment impact indicator.

To address this, the documents on which the second natural language processing model performs sentiment analysis may also be filtered by the filtering module 130, and the criteria may be the same as described above with reference to FIG. 7.

That is, the filtering module 130 may filter the evaluation target documents, on which the second natural language processing model is to perform sentiment analysis, in the same or a similar manner to the filtering performed for determining related keywords.

The filtering module 130 may filter out, from the collected unstructured documents, those that meet a predetermined filtering condition, and the unstructured documents that remain after filtering may be specified as the evaluation target documents.

For this filtering, as described above, clustering may be performed among unstructured documents whose mutual similarity is at or above a certain level, and filtering may be performed so that some of the documents in each resulting cluster are excluded from the evaluation target documents.

As also described above, the clustering may group unstructured documents that satisfy not only the similarity condition but also the condition that their creation times fall within a predetermined time range, and document vectors may be used to determine the similarity between unstructured documents.

The second natural language processing model may then perform sentiment analysis on the unstructured documents that remain after redundancy has been reduced to some extent in this way, that is, on the evaluation target documents.

The concept of determining the investment impact indicator using the results of sentiment analysis on the evaluation target documents is shown in FIG. 11. FIG. 11 is a diagram for explaining the concept of determining an investment impact indicator using unstructured documents according to an embodiment of the present invention.

Referring to FIG. 11, once the evaluation target documents are specified, the second natural language processing model may receive each of the evaluation target documents (for example, D1 to Dn) as input and output a sentiment analysis result (for example, fD1 to fDn) for each of them.

When the second natural language processing model is trained to output three classes (positive, negative, and neutral), each sentiment analysis result (for example, fD1 to fDn) may likewise be one of positive, negative, and neutral.

Then, positive, negative, and neutral may each be replaced with a predetermined numerical value (for example, 1, -1, and 0), and a weight for each document may be defined in a predetermined manner, so that the investment impact indicator as shown in FIG. 10 can be determined. Of course, the numerical values assigned to positive, negative, and neutral may vary by embodiment.

That is, the investment impact indicator (F) may be defined as the product of the sentiment analysis result of each document and the weight of that document.

The weight may be determined in various ways. According to one example, the weight may depend on the source of the document. For example, a weight may be set in advance for each document source according to the reliability of that source. The weight may be set high when the source has relatively high reliability, such as a media outlet, public institution, or financial institution, and relatively low when the source is an individual user. Even among individual users, per-user reliability may be determined in various ways (for example, from the number of pieces of content produced or the number of followers), and differential weights may be assigned accordingly.
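
Under the assumption that the per-document products are summed over all evaluation target documents, the indicator F could be computed as in the sketch below. The score mapping follows the example values 1, -1, and 0 given above, while the source categories, the weight values, the default weight, and the name `investment_impact_indicator` are hypothetical.

```python
# Numeric values for the three sentiment classes (example values from the text).
SENTIMENT_SCORES = {"positive": 1.0, "negative": -1.0, "neutral": 0.0}

# Hypothetical per-source weights reflecting assumed source reliability.
SOURCE_WEIGHTS = {
    "press": 1.0,
    "public_institution": 1.0,
    "financial_institution": 1.0,
    "individual": 0.5,
}


def investment_impact_indicator(documents, source_weights=SOURCE_WEIGHTS,
                                default_weight=0.5):
    """Sum weight * sentiment score over (sentiment_label, source) pairs.

    Summing the per-document products is an assumption of this sketch."""
    total = 0.0
    for label, source in documents:
        weight = source_weights.get(source, default_weight)
        total += weight * SENTIMENT_SCORES[label]
    return total


# Hypothetical sentiment results for four evaluation target documents.
documents = [
    ("positive", "press"),
    ("negative", "individual"),
    ("neutral", "press"),
    ("positive", "individual"),
]
indicator = investment_impact_indicator(documents)
print(indicator)  # 1.0
```

A finer class set (strong or weak positive and negative) would only require a larger score mapping; the aggregation itself would stay the same.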

Meanwhile, as described above, sentiment analysis of each document may be implemented by setting whole documents as the training data from the time the second natural language processing model is trained and labeling the many documents that make up the training data, so that when evaluation target documents are input, a sentiment analysis result is output for each document as a whole. However, unless the documents are relatively short, it is difficult to label a long document as a whole when the training data is created, and when positive and negative sentences coexist in a document, it may not be easy to judge the document as uniformly positive or negative.

Therefore, according to the technical idea of the present invention, per-sentence sentiment analysis results may be used to obtain the sentiment analysis result of a document. This concept will be described with reference to FIG. 12.

FIG. 12 is a diagram for explaining the concept of evaluating the influence of a document through sentence-level sentiment evaluation according to an embodiment of the present invention.

Referring to FIG. 12, the second natural language processing model may perform sentiment analysis, such as positive, negative, or neutral, on each of the sentences (for example, S1 to SM) included in each evaluation target document Di. Of course, for this purpose the second natural language processing model must be trained using labeled sentences as training data.

Then, the extraction module 110 may determine the sentiment analysis result fDi of the document Di based on the per-sentence sentiment analysis results (for example, fs1, fs2, ..., fsM).

The sentiment analysis result fDi of the document Di may likewise be determined by multiplying the per-sentence sentiment analysis results by their weights (for example, α1 to αm), and the weights may be determined adaptively through experiments.
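
A sketch of deriving a document score from sentence scores is shown below, assuming that the weighted sentence scores are summed and that uniform weights are used when none are supplied. Both choices, and the name `document_sentiment`, are assumptions; the specification states only that the weights may be determined through experiments.

```python
def document_sentiment(sentence_scores, weights=None):
    """Combine per-sentence scores (for example 1, -1, 0) into one document
    score as a weighted sum. Uniform weights are assumed when none are given."""
    if not sentence_scores:
        return 0.0
    if weights is None:
        weights = [1.0 / len(sentence_scores)] * len(sentence_scores)
    if len(weights) != len(sentence_scores):
        raise ValueError("one weight is required per sentence")
    return sum(w * s for w, s in zip(weights, sentence_scores))


# Hypothetical document with four sentences: positive, negative, neutral, positive.
scores = [1, -1, 0, 1]
weighted = document_sentiment(scores, weights=[0.4, 0.2, 0.1, 0.3])
balanced = document_sentiment([1, 1, -1, -1])
```

In this example the weighted score is approximately 0.5, and a document with equal numbers of positive and negative sentences under uniform weights scores 0.0. Mapping the score back to a class label would need thresholds that the text does not specify.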

Deriving the sentiment analysis result of a document from per-sentence sentiment analysis results in this way can yield more accurate sentiment analysis than training the model to perform sentiment analysis on the document as a whole.

Meanwhile, as described above, the range of unstructured documents collected by the data collection module 50 may be expanded beyond the documents corresponding to the specific financial item (first unstructured documents or first-order unstructured documents) to include documents corresponding to the related keywords of the specific financial item (second unstructured documents or second-order unstructured documents) and documents corresponding to the related keywords of those related keywords (third unstructured documents or third-order unstructured documents).

In this case, a partial investment impact indicator may be determined separately for each order of the evaluation target documents, and the overall investment impact indicator may be determined based on the partial investment impact indicators.

An example of this will be described with reference to FIG. 13.

FIG. 13 is a diagram for explaining the concept of adaptively performing influence evaluation by reflecting the order of documents according to an embodiment of the present invention.

Referring to FIG. 13, the documents that remain after filtering by the filtering module 130 may be D11 to D1N, D21 to D2M, and D31 to D3P.

Here, D11 to D1N may be unstructured documents corresponding to the specific financial item, that is, first-order unstructured documents; D21 to D2M may be second-order unstructured documents; and D31 to D3P may be third-order unstructured documents.

In this case, the extraction module 110 may calculate an investment impact indicator for each order of unstructured documents (F1, F2, F3, and so on) separately, and may determine the overall investment impact indicator (F) by applying differential weights (for example, a, b, c) to the per-order investment impact indicators.

Here, a lower order may be given a higher weight. This is because lower-order content mentions the specific financial item more directly, and the more direct the content, the greater its influence on the specific financial item may be.
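
Assuming the overall indicator is the weighted sum F = a·F1 + b·F2 + c·F3, the combination step could be sketched as follows. The check that weights do not increase with order is a stricter reading adopted for this sketch, since the text says only that lower orders may receive higher weights; the weight values and the name `overall_impact_indicator` are hypothetical.

```python
def overall_impact_indicator(partial_indicators, order_weights):
    """Combine per-order indicators F1, F2, ... into one value as a weighted sum.

    Both arguments map an order (1, 2, 3, ...) to a number. This sketch
    rejects weights that increase with the order (an assumed constraint)."""
    orders = sorted(partial_indicators)
    weights = [order_weights[order] for order in orders]
    if any(later > earlier for earlier, later in zip(weights, weights[1:])):
        raise ValueError("weights should not increase with document order")
    return sum(order_weights[order] * partial_indicators[order] for order in orders)


# Hypothetical partial indicators and weights a, b, c for orders 1 to 3.
partials = {1: 2.0, 2: -1.0, 3: 4.0}
weights = {1: 1.0, 2: 0.5, 3: 0.25}
overall = overall_impact_indicator(partials, weights)
print(overall)  # 2.5
```

With these values the strongly positive third-order documents contribute only 1.0, so indirectly related content moves the overall indicator less than first-order content would.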

As a result, the investment impact indicator of a specific financial item is determined from content produced by many producers on the network, and the range of documents that affect the indicator is expanded as the keywords are expanded. The degree to which documents affect the investment impact indicator is adjusted differentially according to the degree of expansion. Thus, the investment impact indicator may be determined so that documents that users would judge to be directly related to the specific financial item have a higher influence, and documents whose relation may be regarded as indirect or weak have a lower influence.

Referring again to FIG. 10, in step S230 the item DB construction module 100 may associate the specific financial item S with its corresponding related investment impact indicator and store them in the DB 500, or may update the entry (for example, a record) in the DB that corresponds to the specific financial item S.

Referring again to FIG. 2, the user-indicator relationship DB construction module 200 may construct a user-indicator relationship DB that stores the investment impact indicators of interest corresponding to each of a plurality of users.

For each of a plurality of users who have subscribed to a predetermined service provided through the item information providing system 10 or who are registered in advance in the item information providing system 10, the user-indicator relationship DB construction module 200 may extract the investment impact indicators in which the user is interested and store them in the user-keyword relationship DB.

In one embodiment, the investment impact indicators of interest that can correspond to each user may be a subset of the investment impact indicators in the item-indicator relationship DB. That is, the investment impact indicators present in the item-indicator relationship DB are the candidates for the investment impact indicators of interest corresponding to a specific user.

In one embodiment, the user-indicator relationship DB construction module 200 may determine the investment impact indicator of interest corresponding to each user through a question-and-answer procedure. For example, the user-indicator relationship DB construction module 200 may transmit a query about preferred investment impact indicators to the user terminal 30, and may determine that the investment impact indicator given in the response from the user terminal 30 is the investment impact indicator of interest corresponding to the user of the user terminal 30.

In another embodiment, the user-indicator relationship DB construction module 200 may determine, for each user, the investment impact indicator of interest of that user based on unstructured data that the user has consumed or generated.

To this end, the data collection module 50 may collect unstructured data corresponding to the specific user. The unstructured data corresponding to the specific user may be unstructured data that the specific user has produced (for example, written) or consumed (for example, viewed). For example, when the specific user comments on a specific post on an Internet bulletin board, both the post and the comment may be unstructured data corresponding to the specific user. Depending on the embodiment, in order to collect unstructured data corresponding to a specific user, the data collection module 50 may obtain in advance from the user the predetermined user authentication information (for example, ID/password information or OAuth information) that the specific user needs to log in to or authenticate with the information source system 200.

The user-indicator relationship DB construction module 200 may extract a keyword of interest corresponding to the specific user based on the collected unstructured data corresponding to the specific user.

In one embodiment, the user-indicator relationship DB construction module 200 may extract the keyword of interest corresponding to a specific user based on the frequency of each keyword appearing in the unstructured data corresponding to that user. For example, the user-indicator relationship DB construction module 200 may determine, as keywords of interest, the keywords that appear at least a certain number of times or the several keywords that appear most frequently. In another embodiment, the user-indicator relationship DB construction module 200 may determine a representative keyword or representative keywords of each item of unstructured data (document) and extract keywords of interest based on how often each keyword is selected as a representative keyword. TF-IDF, or a method based on it, is one example of a way to determine a representative keyword. The user-indicator relationship DB construction module 200 may restrict the unstructured data used to determine a specific user's keywords of interest based on the time at which the unstructured data was produced (for example, the time at which it was uploaded to the network). For example, the user-indicator relationship DB construction module 200 may determine a specific user's keywords of interest based only on unstructured data produced within a recent predetermined period (for example, one month or three months). By performing this determination periodically, the keywords of interest of the specific user can be updated periodically.
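
As a non-limiting illustration, the frequency-based extraction restricted to a recent period may be sketched as follows. The function name keywords_of_interest and the parameters window_days, min_count, and top_n are hypothetical; the sketch assumes that each document has already been tokenized into keywords, and it combines the minimum-count and top-N criteria, which the description presents as alternatives.

```python
from collections import Counter
from datetime import datetime, timedelta


def keywords_of_interest(documents, now, window_days=90, min_count=2, top_n=3):
    """Return a user's keywords of interest.

    documents: list of (produced_at, keywords) pairs, where produced_at is a
    datetime and keywords is a list of already-extracted keyword strings.
    """
    cutoff = now - timedelta(days=window_days)
    counts = Counter()
    for produced_at, keywords in documents:
        # Only data produced within the recent window is considered.
        if produced_at >= cutoff:
            counts.update(keywords)
    # Keep keywords that appear at least min_count times, most frequent first.
    frequent = [kw for kw, n in counts.most_common() if n >= min_count]
    return frequent[:top_n]


now = datetime(2021, 7, 13)
documents = [
    (datetime(2021, 7, 1), ["battery", "ev", "battery"]),
    (datetime(2021, 6, 15), ["ev", "semiconductor"]),
    # Too old: ignored even though "semiconductor" appears three times.
    (datetime(2020, 1, 1), ["semiconductor", "semiconductor", "semiconductor"]),
]
interest = keywords_of_interest(documents, now)
```

Re-running such a function on a schedule with an updated value of now would correspond to the periodic update of the keywords of interest.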

Once the keywords of interest of a specific user have been determined in this manner, the user-indicator relationship DB construction module 200 may extract the investment items corresponding to the user's keywords of interest from the item-keyword relationship DB, and may extract from the item-indicator relationship DB the investment impact indicators corresponding to those investment items. For example, when the keyword of interest of a specific user is determined to be keyword A, the user-indicator relationship DB construction module 200 may search the item-keyword relationship DB to extract the related items corresponding to keyword A. If the related items corresponding to keyword A are determined to be item 1 and item 2, the user-indicator relationship DB construction module 200 may search the item-indicator relationship DB to extract the investment impact indicator corresponding to item 1 and the investment impact indicator corresponding to item 2. The investment impact indicators of interest corresponding to the specific user may then be determined to be the investment impact indicator corresponding to item 1 and the investment impact indicator corresponding to item 2.
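
As a non-limiting illustration, the two-stage lookup described above (keyword of interest to related items, then related items to investment impact indicators) may be sketched with in-memory dictionaries standing in for the two DBs. The function name, the dictionary layout, and the sample indicator names are assumptions made for this example.

```python
def indicators_of_interest(interest_keywords, item_keyword_db, item_indicator_db):
    """Map a user's keywords of interest to related items and their indicators.

    item_keyword_db:   {item: set of related keywords}
    item_indicator_db: {item: list of related investment impact indicators}
    """
    wanted = set(interest_keywords)
    # Stage 1: items whose related keywords include any keyword of interest.
    items = [item for item, keywords in item_keyword_db.items() if keywords & wanted]
    # Stage 2: indicators of those items, de-duplicated in first-seen order.
    indicators = []
    for item in items:
        for indicator in item_indicator_db.get(item, []):
            if indicator not in indicators:
                indicators.append(indicator)
    return items, indicators


item_keyword_db = {
    "item 1": {"keyword A", "keyword B"},
    "item 2": {"keyword A"},
    "item 3": {"keyword C"},
}
item_indicator_db = {
    "item 1": ["indicator X"],
    "item 2": ["indicator Y", "indicator X"],
    "item 3": ["indicator Z"],
}
items, indicators = indicators_of_interest(
    ["keyword A"], item_keyword_db, item_indicator_db
)
```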

Meanwhile, the update determination module 300 may observe whether the related investment impact indicator corresponding to each of the plurality of investment items has been updated. A specific investment impact indicator being updated may mean that its value at the time of observation differs from its previous value. The update determination module 300 may receive data from a predetermined external system that provides data from which the value of the investment impact indicator of each financial investment item can be calculated, or may calculate the value of the investment impact indicator from data that it updates itself, and may compare the calculated value with the previous value.
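
As a non-limiting illustration, the comparison between the newly calculated value and the previous value may be sketched as follows. The function name find_updated_indicators, the (item, indicator) key layout, and the tolerance parameter are hypothetical; treating an indicator with no previous value as updated is likewise an assumption of this sketch.

```python
import math


def find_updated_indicators(previous, current, tolerance=0.0):
    """Return the keys whose current value differs from the previous value.

    previous, current: {(item, indicator_name): numeric value}
    tolerance: absolute difference below which values are treated as unchanged.
    """
    updated = []
    for key, new_value in current.items():
        old_value = previous.get(key)
        # A key that was never observed before is treated as updated (assumption).
        if old_value is None or not math.isclose(
            old_value, new_value, rel_tol=0.0, abs_tol=tolerance
        ):
            updated.append(key)
    return updated


previous = {("item 1", "sentiment"): 0.20, ("item 2", "sentiment"): -0.10}
current = {("item 1", "sentiment"): 0.35, ("item 2", "sentiment"): -0.10}
updated = find_updated_indicators(previous, current)
```

A nonzero tolerance can be used to ignore insignificant fluctuations, although the description itself only requires that the observed value differ from the previous one.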

In one embodiment, the update determination module 300 may periodically observe whether the related investment impact indicator corresponding to each of the plurality of investment items has been updated.

Alternatively, the update determination module 300 may specify an issue keyword based on unstructured data generated during a recent predetermined period and check whether the investment impact indicator corresponding to an investment item associated with the specified issue keyword has been updated. The issue keyword may be a keyword representing a social issue that has arisen at the time the issue financial item is determined. An issue item may mean a financial item that is likely to be affected, directly or indirectly, by a social issue when various social issues arise in areas such as politics, the economy, or industry. The issue keyword may be extracted automatically from those of the collected unstructured data that were collected within a predetermined time (for example, one day or one week) before the present. For example, the update determination module 300 may determine the issue keyword automatically based on various factors, such as the frequency of keywords in the unstructured data and the source of the unstructured data. Various prior art for determining such issue keywords (Korean Patent Application Nos. 10-2015-0012255, 10-2014-0081204, 10-2019-0146726, etc.) is widely known, so a description of the specific manner of determining issue keywords from unstructured data is omitted from this specification. According to another embodiment, the update determination module 300 may receive trending keywords from various service-side systems, such as a search service or an online product sales platform, and specify the received keywords as issue keywords. Depending on the embodiment, when an administrator or a user directly enters a keyword for an issue that he or she has identified or is interested in, the update determination module 300 may specify the entered keyword as an issue keyword.
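
The specification leaves the concrete method of determining issue keywords to the cited prior art. Purely as a non-limiting illustration of combining keyword frequency with the source of the data, the following sketch scores keywords from recently collected documents, counting each occurrence with a per-source weight. The function name issue_keywords, the document fields, and the source weights are all assumptions made for this example and are not taken from the cited applications.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def issue_keywords(documents, now, window=timedelta(days=1),
                   source_weights=None, top_n=2):
    """Rank keywords found in documents collected within `window` before `now`.

    documents: list of dicts with keys "collected_at", "source", "keywords".
    source_weights: {source name: weight}; unknown sources count with weight 1.0.
    """
    source_weights = source_weights or {}
    scores = defaultdict(float)
    for doc in documents:
        if now - window <= doc["collected_at"] <= now:
            weight = source_weights.get(doc["source"], 1.0)
            for keyword in doc["keywords"]:
                scores[keyword] += weight
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [keyword for keyword, _ in ranked[:top_n]]


now = datetime(2021, 7, 13, 12, 0)
documents = [
    {"collected_at": datetime(2021, 7, 13, 9), "source": "news",
     "keywords": ["chip shortage"]},
    {"collected_at": datetime(2021, 7, 13, 8), "source": "forum",
     "keywords": ["chip shortage", "vaccine"]},
    {"collected_at": datetime(2021, 7, 13, 10), "source": "forum",
     "keywords": ["vaccine"]},
    # Collected outside the one-day window, so it is ignored.
    {"collected_at": datetime(2021, 7, 10, 9), "source": "news",
     "keywords": ["election"]},
]
issues = issue_keywords(documents, now, source_weights={"news": 2.0})
```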

The update determination module 300 may determine an issue item, that is, a financial investment item corresponding to the issue keyword, based on the item-keyword relationship DB constructed in advance. In other words, the update determination module 300 may search the item-keyword relationship DB for entries corresponding to the issue keyword to determine the issue item corresponding to that keyword.

Meanwhile, the information providing module 400 may determine, based on the user-indicator relationship DB, a target user corresponding to the updated investment impact indicator from among the plurality of users, and may provide the target user with personalized information about the investment item corresponding to the updated investment impact indicator.

The information providing module 400 may search the user-indicator relationship DB for the user or users for whom the updated investment impact indicator is an investment impact indicator of interest, and may determine that the users found are the target users.
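
As a non-limiting illustration, the search for target users may be sketched as a reverse lookup over the user-indicator relationship DB. Building an indicator-to-users index in advance is a design choice assumed for this example so that each update can be resolved without scanning every user; the function names and sample data are hypothetical.

```python
from collections import defaultdict


def build_indicator_index(user_indicator_db):
    """Invert {user: set of indicators of interest} into {indicator: set of users}."""
    index = defaultdict(set)
    for user, indicators in user_indicator_db.items():
        for indicator in indicators:
            index[indicator].add(user)
    return index


def target_users(updated_indicator, index):
    """Users for whom the updated indicator is an indicator of interest."""
    return sorted(index.get(updated_indicator, set()))


user_indicator_db = {
    "user A": {"indicator X", "indicator Y"},
    "user B": {"indicator Y"},
    "user C": {"indicator Z"},
}
index = build_indicator_index(user_indicator_db)
targets = target_users("indicator Y", index)
```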

Thereafter, the information providing module 400 may provide the target user with information about the investment item corresponding to the updated investment impact indicator. The information about a specific investment item provided to the user may include the name of the item, its theme, the trend of its stock price, news related to the item, analyst reports related to the item, market conditions for the sector that includes the item, disclosure information, various investment indicators related to the item, and information about investment strategies, investment logic, and portfolios that include the item, and may also include various other information related to the investment item.

As described above, according to the technical idea of the present invention, when a specific investment impact indicator is updated, the financial items related to it can be determined quickly and accurately. In addition, information about the related financial investment items can be provided promptly to users who have been determined to be interested in that investment impact indicator.

FIG. 14 is a flowchart illustrating an example of a method of providing personalized item information based on artificial intelligence according to an embodiment of the present invention.

Before the process of FIG. 14 is performed, it is assumed that the following have been constructed in advance: an item-indicator relationship DB that stores the related investment impact indicators corresponding to each of a plurality of investment items, an item-keyword relationship DB that stores the related keywords corresponding to each of the plurality of investment items, and a user-indicator relationship DB that stores the investment impact indicators of interest corresponding to each of a plurality of users.

Referring to FIG. 14, the item information providing system 10 may observe whether the related investment impact indicator corresponding to each of the plurality of investment items has been updated (S300).

When a related investment indicator in which an update has occurred is observed, the item information providing system 10 may determine, based on the user-indicator relationship DB, a target user corresponding to that related investment indicator from among the plurality of users (S310).

In addition, the item information providing system 10 may determine, based on the item-indicator relationship DB, the investment item corresponding to the related investment indicator in which the update has occurred, and may provide the target user with personalized information about that investment item (S320).
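
As a non-limiting illustration, steps S300 to S320 may be sketched end to end as a single pass over in-memory stand-ins for the DBs. The function name run_cycle, the data layout, and the notification tuple are assumptions made for this example; actual delivery of the personalized information to a user terminal is outside the scope of the sketch.

```python
def run_cycle(previous_values, current_values, item_indicator_db, user_indicator_db):
    """One observation cycle: detect updates, find target users, pick items.

    previous_values, current_values: {indicator: value}
    item_indicator_db: {item: set of related indicators}
    user_indicator_db: {user: set of indicators of interest}
    Returns a list of (user, indicator, items) notifications.
    """
    notifications = []
    # S300: observe which indicators changed since the previous observation.
    updated = [ind for ind, value in current_values.items()
               if previous_values.get(ind) != value]
    for indicator in updated:
        # S310: target users are those interested in the updated indicator.
        users = [user for user, inds in user_indicator_db.items()
                 if indicator in inds]
        # S320: investment items related to the updated indicator.
        items = [item for item, inds in item_indicator_db.items()
                 if indicator in inds]
        for user in users:
            notifications.append((user, indicator, items))
    return notifications


previous_values = {"indicator X": 0.1, "indicator Y": 0.4}
current_values = {"indicator X": 0.1, "indicator Y": 0.7}
item_indicator_db = {
    "item 1": {"indicator X"},
    "item 2": {"indicator Y"},
    "item 3": {"indicator X", "indicator Y"},
}
user_indicator_db = {"user A": {"indicator Y"}, "user B": {"indicator X"}}
notifications = run_cycle(
    previous_values, current_values, item_indicator_db, user_indicator_db
)
```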

Meanwhile, depending on the implementation, the item information providing system 10 may include a processor and a memory that stores a program executed by the processor. The processor may include a single-core CPU or a multi-core CPU. The memory may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state memory devices. Access to the memory by the processor and other components may be controlled by a memory controller. When executed by the processor, the program may cause the item information providing system 10 according to the present embodiment to perform the method described above.

Meanwhile, the method of providing personalized item information based on artificial intelligence according to an embodiment of the present invention may be implemented in the form of computer-readable program instructions and stored in a computer-readable recording medium, and the control program and the target program according to an embodiment of the present invention may also be stored in a computer-readable recording medium. The computer-readable recording medium includes all types of recording devices in which data readable by a computer system is stored.

The program instructions recorded on the recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the software field.

Examples of the computer-readable recording medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. The computer-readable recording medium may also be distributed over computer systems connected through a network, so that computer-readable code is stored and executed in a distributed manner.

Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed, using an interpreter or the like, by a device that processes information electronically, for example, a computer.

The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.

The foregoing description of the present invention is given for illustration, and a person of ordinary skill in the art to which the present invention pertains will understand that it can readily be modified into other specific forms without changing the technical idea or essential features of the present invention. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form.

The scope of the present invention is defined by the claims that follow rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

Claims

constructing, by a computing system, an item-indicator relationship DB that stores related investment impact indicators corresponding to each of a plurality of investment items;
constructing, by the computing system, a user-indicator relationship DB that stores an investment impact indicator of interest corresponding to each of a plurality of users;
observing, by the computing system, whether the related investment impact indicator corresponding to each of the plurality of investment items is updated;
determining, by the computing system, when a related investment indicator in which an update has occurred is observed, a target user corresponding to that related investment indicator from among the plurality of users based on the user-indicator relationship DB; and
providing, by the computing system, to the target user, personalized information about an investment item corresponding to the related investment indicator in which the update has occurred.

The method of claim 1,
wherein constructing the user-indicator relationship DB comprises, for each of the plurality of users:
collecting unstructured data consumed by the user; and
determining an investment impact indicator of interest corresponding to the user based on the unstructured data consumed by the user.

The method of claim 1, further comprising:
constructing, by the computing system, an item-keyword relationship DB that stores a related keyword corresponding to each of the plurality of investment items,
wherein constructing the user-indicator relationship DB comprises, for each of the plurality of users:
collecting unstructured data consumed by the user;
extracting a keyword of interest of the user based on the unstructured data consumed by the user;
extracting an investment item corresponding to the keyword of interest of the user from the item-keyword relationship DB; and
extracting, from the item-indicator relationship DB, an investment impact indicator corresponding to the investment item that corresponds to the keyword of interest of the user.

The method of claim 1,
wherein constructing the item-indicator relationship DB that stores the related investment impact indicators corresponding to each of the plurality of investment items comprises
performing, for each of the plurality of investment items, a method of determining the related investment impact indicator corresponding to the investment item,
wherein the method of determining the related investment impact indicator corresponding to the investment item comprises:
collecting a plurality of unstructured documents corresponding to the investment item or to a keyword related to the investment item; and
determining the investment impact indicator corresponding to the investment item based on the collected unstructured documents,
wherein determining the investment impact indicator corresponding to the investment item based on the collected unstructured documents comprises
determining the investment impact indicator based on an output of an impact determination model, trained through a context-aware natural language processing model, for documents to be judged that are all or some of the collected unstructured documents, and
wherein the impact determination model is a model trained to output a classification result indicating whether each individual document included in the documents to be judged, or each sentence included in the individual document, has a positive or a negative impact on investment.

The method of claim 4, wherein the method of determining the related investment impact indicator corresponding to the investment item further comprises
filtering out, from the collected unstructured documents, the unstructured documents that meet a predetermined filtering condition,
wherein the unstructured documents remaining after the filtering are specified as the documents to be judged.

The method of claim 5, wherein filtering out, from the unstructured documents collected by the system, the unstructured documents that meet the predetermined filtering condition comprises:
generating a document vector for each of the collected unstructured documents;
clustering unstructured documents whose similarity is at or above a certain level based on the generated document vectors; and
performing filtering to exclude some of the unstructured documents in each cluster from the documents to be judged.

The method of claim 6, wherein clustering unstructured documents whose similarity is at or above a certain level based on the generated document vectors comprises
performing clustering among unstructured documents whose similarity is at or above the certain level and whose generation times fall within a predetermined time range.

The method of claim 4,
wherein collecting the plurality of unstructured documents corresponding to the investment item or to a keyword related to the investment item comprises
collecting first unstructured documents corresponding to the investment item and second unstructured documents corresponding to the related keyword, and
wherein determining the investment impact indicator corresponding to the investment item based on the collected unstructured documents comprises
determining the investment impact indicator based on a first investment impact indicator extracted from the first unstructured documents and a second investment impact indicator extracted from the second unstructured documents.

The method of claim 8,
wherein determining the investment impact indicator based on the first investment impact indicator extracted from the first unstructured documents and the second investment impact indicator extracted from the second unstructured documents comprises
determining the investment impact indicator such that different weights are applied to the first investment impact indicator and the second investment impact indicator.

The method of claim 4, wherein the impact determination model
outputs a classification result indicating whether each sentence included in the individual document has a positive or a negative impact on investment, and
wherein the classification result for the individual document is determined based on the classification results determined for the respective sentences.

A computer program stored in a computer readable recording medium to perform the method according to any one of claims 1 to 10.

A computer readable recording medium on which a computer program for performing the method according to any one of claims 1 to 10 is recorded.

A computing system comprising:
a processor; and
a memory storing a computer program executed by the processor,
wherein the computer program, when executed by the processor, causes the computing system to perform a method comprising:
constructing an item-indicator relationship DB that stores related investment impact indicators corresponding to each of a plurality of investment items;
constructing a user-indicator relationship DB that stores investment impact indicators of interest corresponding to each of a plurality of users;
observing whether the related investment impact indicator corresponding to each of the plurality of investment items is updated;
determining, when a related investment indicator in which an update has occurred is observed, a target user corresponding to that related investment indicator from among the plurality of users based on the user-indicator relationship DB; and
providing, to the target user, personalized information about an investment item corresponding to the related investment indicator in which the update has occurred.