KR20220093671A

KR20220093671A - An analysis apparatus for social network service based on artificial intelligence

Info

Publication number: KR20220093671A
Application number: KR1020200184640A
Authority: KR
Inventors: 정경태
Original assignee: 올유저닷넷(주)
Priority date: 2020-12-28
Filing date: 2020-12-28
Publication date: 2022-07-05
Also published as: KR20230096932A

Abstract

According to one embodiment of the present invention, an apparatus for analyzing a social network service based on artificial intelligence (AI) includes: a data collection module that collects data on a social network service platform; a data management module that outputs the data collected by the data collection module based on a preset search algorithm and has a function of generating labels for the collected data; a machine learning module that performs machine learning based on the labels of the data stored in the data management module and generates an algorithm that labels the data automatically; a user service module that receives a user's request and, based on that request, visualizes and outputs a result analyzed by at least one of the data management module and the machine learning module; and a core service module that controls at least one of the data collection module, the data management module, the machine learning module, and the user service module. Because each module is provided as a container that operates independently of its environment, the apparatus can be easily applied regardless of the user's environment.

Description

AI-based social network service analysis apparatus {AN ANALYSIS APPARATUS FOR SOCIAL NETWORK SERVICE BASED ON ARTIFICIAL INTELLIGENCE}

The present invention relates to a social network service analysis apparatus and, more specifically, to an artificial intelligence-based social network service analysis apparatus that can gauge how positively or negatively customers feel by analyzing posts on Internet news and social network services and the comments on those posts.

Social data is drawing attention as the marketing indicator that best captures consumers because, unlike processed data such as survey or interview material, it reflects consumer opinions and sentiment as they are.

Accordingly, sentiment analysis, which automatically analyzes consumers' opinions about a target in social data or the sentiment consumers feel toward that target, is also attracting attention as one of the core technologies of the big data environment.

Sentiment analysis, also called opinion mining, is a natural language processing technique that extracts people's subjective data from various content and analyzes it to derive their opinions and sentiments.

Sentiment analysis is broadly divided into document/sentence-level techniques and attribute-level techniques; of these, document-level sentiment analysis is performed mainly with machine learning models.

For example, document-level sentiment analysis infers the sentiment class (e.g., positive or negative) of a given document using a trained machine learning model.

Conventionally, data collection, data management, machine learning, and so on were all carried out in a single system in order to perform the sentiment analysis described above. As a result, functions could not be extended and the system was affected by the user's environment.

Meanwhile, the prior art document below discloses a system and method for analyzing housing market trends through sentiment dictionary construction. It builds a sentiment dictionary for each topic, calculates article polarity based on that dictionary, and builds a monthly sentiment index for each topic so that causality with variables representing macroscopic market trends can be accurately analyzed and predicted. It does not, however, disclose the technical gist of the present invention.

Korean Laid-Open Patent Publication No. 10-2020-0022144

An AI-based news and social network service analysis apparatus according to an embodiment of the present invention has the following objectives in order to solve the problems described above.

One objective is to provide an AI-based news and social network service analysis apparatus that can make maximum use of server resources for each function by configuring the module responsible for each function of the apparatus to be provided on its own server.

Another objective is to provide an AI-based news and social network service analysis apparatus that can be easily applied regardless of the user's environment by setting up the environment of each module to be independent of the others.

The objectives of the present invention are not limited to those mentioned above, and other objectives not mentioned will be clearly understood by those of ordinary skill in the art from the description below.

An AI-based social network service analysis apparatus according to an embodiment of the present invention includes: a data collection module that collects data on a social network service platform;

a data management module that outputs the data collected by the data collection module based on a preset search algorithm and has a function of generating labels for the collected data; a machine learning module that performs machine learning based on the labels of the data stored in the data management module and generates an algorithm that labels the data automatically; a user service module that receives a user's request and, based on that request, visualizes and outputs a result analyzed by at least one of the data management module and the machine learning module; and a core service module that controls at least one of the data collection module, the data management module, the machine learning module, and the user service module.

Preferably, the data collection module includes: a URL collection unit that collects the URLs of posts published on the social network service platform at preset intervals; and a crawling unit that crawls the posts based on the URLs received from the URL collection unit.

Preferably, the URL collection unit accesses the social network service platform at each preset interval, obtains the IDs of the posts from the latest post back to the previous cycle, generates the URLs of those posts from their IDs, and passes the URLs to the crawling unit; the crawling unit then accesses each URL generated by the URL collection unit and crawls the post at that URL.

Preferably, the crawling unit accesses the URL generated by the URL collection unit, crawls the post at that URL, and then performs preprocessing to generate preprocessed data, and the preprocessed data is stored in the second database of the core service module.

Preferably, based on a user request entered through the user service module, the data management module stores, in a first database within the data management module, the target data that matches the request from among the data stored in the second database of the core service module, together with information about the target data.

Preferably, the data management module displays the posts stored in the first database on a data management web page, and the data management web page displays at least one of the original text of the post, the contexts obtained by dividing the post into preset units, and the comments on the post.

Preferably, the data management web page is configured so that a user can label the main element (aspect), and the sentiment evaluation of that aspect, for at least one of the original text of the post, the contexts obtained by dividing the post into preset units, and the comments on the post; the labeling result is stored in at least one of the first database and the second database; and the preset unit is the unit that best expresses the context of the post.

Because each module of the AI-based social network service analysis apparatus according to an embodiment of the present invention is provided as a container that operates independently of its environment, the apparatus can be easily applied regardless of the user's environment, and the performance and functions of each container can be easily extended.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those of ordinary skill in the art from the description below.

FIG. 1 is a block diagram illustrating an AI-based social network service analysis apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the operation of the data collection module of the AI-based social network service analysis apparatus according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining the operation of the data management module of the AI-based social network service analysis apparatus according to an embodiment of the present invention.
FIGS. 4 and 5 are diagrams illustrating the data management web page of the AI-based social network service analysis apparatus according to an embodiment of the present invention.
FIG. 6 is a diagram for explaining the operation of another embodiment of the data management module of the AI-based social network service analysis apparatus according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating the process of storing labels written by a user in the data management module of the AI-based social network service analysis apparatus according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating one embodiment of a web screen of the machine learning module of the AI-based social network service analysis apparatus according to an embodiment of the present invention.
FIG. 9 is a diagram for explaining the training process in the machine learning module of the AI-based social network service analysis apparatus according to an embodiment of the present invention.
FIG. 10 is a diagram illustrating another embodiment of a web screen of the machine learning module of the AI-based social network service analysis apparatus according to an embodiment of the present invention.
FIG. 11 is a diagram for explaining the inference process in the machine learning module of the AI-based social network service analysis apparatus according to an embodiment of the present invention.
FIGS. 12 and 13 are diagrams illustrating examples of visualization by the user service module of the AI-based social network service analysis apparatus according to an embodiment of the present invention.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. Identical or similar components are given the same reference numerals regardless of the figure in which they appear, and redundant descriptions of them are omitted.

In describing the present invention, detailed descriptions of related known technology are omitted where they could obscure the gist of the invention. The accompanying drawings are intended only to make the idea of the invention easier to understand, and the idea of the invention should not be construed as limited by them.

Hereinafter, an AI-based news and social network service analysis apparatus according to an embodiment of the present invention is described with reference to FIGS. 1 to 8.

The AI-based news and social network service analysis apparatus according to an embodiment of the present invention is an analysis apparatus that can gauge how positively or negatively customers feel by analyzing posts on Internet news and social network services and the comments on those posts. As shown in FIG. 1, it includes a data collection module (100), a data management module (200), a machine learning module (300), a user service module (400), and a core service module (500).

Each component of the AI-based social network service analysis apparatus described above is provided with its own server, so that server resources can be used to the fullest for each function.

In addition, each component is built as a container-based independent environment. Because the components have no environmental dependencies on one another, the development environment is easy to optimize, the apparatus can be easily applied regardless of the user's environment, and the performance and functions of each container can be easily extended.

The data collection module (100) collects data on social network service platforms such as Naver articles, Naver Cafe, Twitter, Facebook, and Instagram, and includes two components per social network service platform: a URL collection unit (110) and a crawling unit (120).

The URL collection unit (110) collects the URLs of posts from a specific period at a preset interval, and the crawling unit (120) receives the URLs collected by the URL collection unit (110), performs the actual crawling, and stores the results in the DB.

Each time the URL collection unit (110) collects URLs, it sends a queue message containing the URL information to the crawling unit (120), and the crawling unit (120) crawls as soon as it receives the message. Messages that cannot be processed immediately because of the difference in operating time between the URL collection unit (110) and the crawling unit (120) accumulate at the crawling unit (120) and are processed sequentially as data is collected.
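The hand-off between the two units can be pictured as a producer/consumer pair. The following is a minimal sketch, assuming an in-process `queue.Queue` and a worker thread; the patent does not name the queue technology, and the function names and example URLs are hypothetical.

```python
import queue
import threading

def url_collector(out_q, urls):
    """Stand-in for the URL collection unit (110): enqueue each URL as it is found."""
    for url in urls:
        out_q.put(url)
    out_q.put(None)  # sentinel: no more URLs in this cycle

def crawler(in_q, results):
    """Stand-in for the crawling unit (120): drain the queue in arrival order."""
    while True:
        url = in_q.get()
        if url is None:
            break
        # A real crawler would fetch and parse the page here.
        results.append(f"crawled:{url}")

def run_cycle(urls):
    """Run one collection cycle and return what the crawler processed."""
    q = queue.Queue()  # unprocessed URLs wait here if the crawler is slower
    results = []
    worker = threading.Thread(target=crawler, args=(q, results))
    worker.start()
    url_collector(q, urls)
    worker.join()
    return results
```

Because the queue is first-in, first-out and there is a single consumer, URLs are processed in the order in which they were collected, which matches the sequential handling of the backlog described above.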

In this case, data is crawled periodically in the background without the user having to attend to it, and if the interval is made short, even real-time data collection can be achieved. Because posts are fetched in periodic batches without gaps rather than all at once, data is collected efficiently and is always kept up to date, which enables accurate and meaningful analysis.

The operation of the data collection module (100) described above is explained below in chronological order with reference to FIG. 2.

First, the URL collection unit (110) is run at the preset interval and accesses the corresponding social network platform (S110). The URL collection unit (110) then obtains from that platform the IDs of the posts from the latest post back to the previous cycle (S120).

Next, the URL collection unit (110) iterates over the obtained IDs in a loop to generate a URL for each one (S130) and passes the generated URLs to the crawling unit (120) (S140).
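Steps S120 and S130 can be sketched as follows. This is only an illustration under stated assumptions: the IDs are assumed to arrive newest first, and the URL template, the host name `sns.example.com`, and both function names are hypothetical, since the patent does not specify how a given platform maps IDs to URLs.

```python
def ids_since_last_cycle(latest_ids, last_seen_id):
    """Keep IDs from the newest post back to, but not including, the last post
    handled in the previous cycle (S120). Assumes newest-first ordering."""
    fresh = []
    for post_id in latest_ids:
        if post_id == last_seen_id:
            break
        fresh.append(post_id)
    return fresh

def build_post_urls(post_ids, url_template="https://sns.example.com/posts/{post_id}"):
    """Generate one URL per post ID by looping over the IDs (S130)."""
    return [url_template.format(post_id=post_id) for post_id in post_ids]
```

The resulting list is what would be handed to the crawling unit in S140.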

Next, the crawling unit (120) accesses the received URLs (S150), crawls the posts at those URLs, and performs preprocessing such as removing HTML tags (S160).
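The tag-removal part of the preprocessing in S160 might look like the sketch below. It uses the Python standard library's `html.parser` purely to stay self-contained; it is not the parser used by the described apparatus, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect visible text and ignore the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join the text chunks and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())

def strip_html(raw_html):
    """Return the plain text of a crawled page fragment with tags removed."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return parser.text()
```

A real pipeline would likely add further cleaning steps (for example, removing boilerplate navigation text), which the source does not detail.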

The crawling unit (120) then stores the preprocessed data in the second database (510) of the core service module (500) (S170), and finally the core service module (500) sends the crawling unit (120) a confirmation message indicating whether the data was stored successfully in the database (S180).

In addition, whereas conventional data crawling used an HTML-based parser to retrieve URL information, the data collection module (100) of the AI-based social network service analysis apparatus according to an embodiment of the present invention uses an lxml-based parser, which can be expected to improve collection speed compared with the conventional approach.

The data management module (200) is provided so that a user can efficiently access and manage the data collected by the data collection module (100). It outputs the data collected by the data collection module (100) based on a preset search algorithm and also has a function of generating labels for the collected data.

The data management module (200) is provided as a web application and has its own database, the first database (210); the values to be stored in the first database (210) are fetched by request. Categories and a search algorithm operate on the data stored in the first database (210) so that the user can narrow down the data to view.

That is, the basic information of the data can be categorized by source, date, keyword, company, and so on, and detailed information can be easily sorted and accessed through search.

The operation of the data management module (200) is described in chronological order with reference to FIG. 3. First, through the web page of the data management module (200), a request to create a category for the target to be analyzed is sent to the core service module (500) (S210).

At this point, the core service module (500) retrieves the data on the target, that is, the crawled and stored data, from its own second database (510) and paginates it. The URL information of the incoming request and the page information are stored together so that they can later be used in the web URL at which the data will be displayed.

The core service module (500) then transmits the total data count and the page information to the data management module (200) (S220).

Each time data transferred to the data management module (200) is stored in the first database (210) provided in the data management module (200), the amount of data loaded so far can be displayed on the web page, as shown in FIG. 4. Because the data is delivered in real time, the user can perform various tasks on the data management web page even before 100% of the data has been delivered.

In addition, at the same time as the core service module (500) transmits the total data count and page information to the data management module (200), the core service module (500) transmits the URL information and page information stored in the second database (510) to dm_generator (S230).

Next, dm_generator requests the actual paginated data from the core service module (500) (S240), and the core service module (500) transmits the requested data to dm_generator (S250). S240 and S250 are repeated until the specified number of pages has been filled.

Once dm_generator has obtained all the data for one page through the repeated execution of S240 and S250, it sends that data to dm_creator (S260).

Next, dm_creator passes the data obtained from dm_generator to the data management module (200), and the data is stored in the first database (210) provided in the data management module (200) (S270). The web page of the data management module (200) then presents the stored information at the URL page designated in S230.
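One way to read steps S240 to S270 is as a page-by-page transfer loop. The sketch below is an interpretation under stated assumptions, not the described implementation: `dm_generator` and `dm_creator` are modeled as plain Python functions, the core service is replaced by a `fetch_page` callable, and the first database is a plain dictionary keyed by page number.

```python
def paginate(records, page_size):
    """Core-service side: split the matching records into fixed-size pages."""
    return [records[i:i + page_size] for i in range(0, len(records), page_size)]

def dm_generator(fetch_page, page_count):
    """Request pages one at a time (S240/S250) and yield each complete page (S260)."""
    for page_no in range(page_count):
        yield page_no, fetch_page(page_no)

def dm_creator(pages, first_db):
    """Store each received page under its page number (S270)."""
    for page_no, rows in pages:
        first_db[page_no] = rows
    return first_db
```

Because the generator yields one page at a time, pages already stored can be shown to the user while later pages are still being fetched, which is consistent with the progressive loading described for the web page.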

In addition, because fetching all the data at once in the data management module (200) would make the list take a long time to appear on the web screen, efficiency can be improved by first displaying the posts for each page and loading the comments and context-unit text attached to a post only when the user clicks to edit that post's body.

Meanwhile, for data lookup and extraction operations other than data import, such as category creation and search, the data management module (200) may be configured to use a full-text search algorithm rather than a conventional DB search.

Full-text search stores words broken into tokens in an inverted index, so lookups are very fast and a considerable speed improvement can be achieved. Because nori_tokenizer is used to split the text on a morpheme basis when tokenizing, the search can also return more meaningful results than a conventional DB search.
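The idea of an inverted index can be illustrated with a small sketch. This is a toy model, not the search engine used by the apparatus: the tokenizer here is a lowercase whitespace split standing in for morpheme-based tokenization, and all function names and sample documents are hypothetical.

```python
from collections import defaultdict

def tokenize(text):
    """Placeholder tokenizer: lowercase whitespace split (not morpheme analysis)."""
    return text.lower().split()

def build_inverted_index(docs):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return the IDs of documents that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result
```

A lookup touches only the posting sets of the query tokens rather than scanning every stored document, which is where the speed advantage over a row-by-row substring search comes from.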

The operation of the data management module (200) including the full-text search described above is explained in chronological order with reference to FIG. 6. Steps S210 to S270 are the same as described above. After step S270, the data management module (200) passes the data that was obtained through step S260 and stored in the first database (210) to the full-text search engine (220) (S280).

The data management module (200) then sends dm_creator either a message indicating that the data was stored normally in the full-text search engine (220) or, if it was not, a message requesting that S270 be performed again (S290).

Meanwhile, the data management web page of the data management module (200) may be displayed as shown in FIG. 5. The data management web page may be configured so that a user can label the main element (aspect), and the sentiment evaluation of that aspect, for at least one of the original text of the post, the contexts obtained by dividing the post into preset units, and the comments on the post.

In particular, the user can edit data on the web page. Specifically, a detail page exists for each item of collected data (that is, for each post), and when the user edits a post there, the change is reflected and stored in the first database (210) in real time.

In addition, the user can create labels on the web page for machine learning on the data. The whole post can be cut into units suited to understanding the context, such as paragraphs or sentences, and stored, and labeling can be performed for each such unit.
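Dividing a post into context units could be done along the following lines. This is a rough sketch assuming blank lines separate paragraphs and that sentences end in `.`, `!`, or `?`; the function name and the splitting rules are hypothetical, and real Korean text would need a proper sentence splitter.

```python
import re

def split_into_contexts(post_text, unit="paragraph"):
    """Split a post into paragraph-level or sentence-level context units."""
    if unit == "paragraph":
        parts = re.split(r"\n\s*\n", post_text)
    elif unit == "sentence":
        parts = re.split(r"(?<=[.!?])\s+", post_text)
    else:
        raise ValueError(f"unknown unit: {unit}")
    # Drop empty fragments and surrounding whitespace.
    return [part.strip() for part in parts if part.strip()]
```

Each returned unit can then be labeled separately, in line with the per-unit labeling described above.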

When comments exist, they are often tied to the context of the body text, so it is preferable that the user be able to label a comment while viewing the body. It is also preferable that any additional information needed from the body be entered for each comment so that it can be reflected in the algorithm that will later process new data.

That is, the information labeled directly by the user is stored in the meta field of each item (body, context unit, or comment) stored in the first database (210) and is used later for training. Because there may be several labels depending on the task to be performed and their format may vary, it is preferable to maximize flexibility by storing them as the jsonb type so that any format can be accommodated. This labeling information may also be stored in the second database (510) of the core service module (500).
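A label payload for the meta field might be shaped as in the sketch below. The key names (`labels`, `status`, `aspect`, `sentiment`) and both function names are hypothetical, since the source specifies only that the labels are kept in a schema-free JSON form; the sketch serializes to a JSON string that a jsonb column could accept.

```python
import json

def make_label_meta(labels, status="needs review"):
    """Build the meta payload; `labels` is a list of {aspect, sentiment} dicts."""
    return {"labels": labels, "status": status}

def serialize_meta(meta):
    """Serialize for a JSON/jsonb column; sorted keys give stable output."""
    return json.dumps(meta, ensure_ascii=False, sort_keys=True)
```

Because the payload is an open-ended structure, a post with one aspect label and a post with several can share the same column without a schema change.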

Meanwhile, according to an embodiment of the present invention, the device may also be configured so that the user can add custom categories based on keywords.

That is, the user can create a category by specifying its name directly, and can shape the category as desired by adding keywords to it or deleting keywords from it.

Multiple keywords can be entered at once, separated by commas (,), and each keyword is stored as a badge. To delete keywords, the user clicks the keyword badges, and several can be deleted at once.
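The keyword handling described above can be sketched as follows. This is only an illustration under assumptions: the class name Category, its methods, and the choice to skip blanks and duplicates are hypothetical and are not stated in the specification.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Category:
    """A user-defined category holding keyword badges (hypothetical structure)."""
    name: str
    keywords: List[str] = field(default_factory=list)

    def add_keywords(self, raw: str) -> None:
        # Split on commas, trim whitespace, skip blanks and duplicates.
        for token in raw.split(","):
            keyword = token.strip()
            if keyword and keyword not in self.keywords:
                self.keywords.append(keyword)

    def remove_keywords(self, selected: Set[str]) -> None:
        # Delete every selected badge in one operation.
        self.keywords = [k for k in self.keywords if k not in selected]


category = Category("shipping")
category.add_keywords("delivery, courier,, delivery ,parcel")
added = list(category.keywords)
category.remove_keywords({"courier", "parcel"})
```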

In addition, results for the keywords are retrieved through a search algorithm based on morphological analysis. For more accurate search, the added keywords are preferably stored in a user dictionary so that tokenizing is performed accurately.

Furthermore, statuses are added in the labeling process. When a person writes a label, the item is assigned the 'review needed' status, and when an administrator later confirms the review, it becomes 'review complete'. Items for which no person has written a label, such as items whose label was inferred by the machine learning model or items with no label at all, are stored with the 'labeling needed' status. The items can then be gathered and viewed by category.
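The three statuses can be read as a simple function of whether a human label exists and whether it has been reviewed. The sketch below assumes that reading; the function name, its parameters, and the English status strings are hypothetical.

```python
from typing import Optional


def labeling_status(meta: Optional[dict], reviewed: bool) -> str:
    """Derive the labeling status of one item.

    meta     -- the human-written label, or None if no person has labeled it
    reviewed -- True once an administrator has confirmed the human label
    """
    if meta is None:
        # Covers unlabeled items and items that only have a model-inferred label.
        return "labeling needed"
    return "review complete" if reviewed else "review needed"
```

Note that a model-inferred label alone does not change the status, because only human-written labels count toward review.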

Also, for anything ambiguous in the labeling process, a memo can be written for each post, comment, or context. When the user presses Save, the memo is saved together with the label, a 'Memo' badge is attached to the post, and the memo is added to the bulletin board at the bottom of the page. Comments on the memo can be posted on that bulletin board, and the memo badge can be removed either by deleting the memo or by leaving the memo in place and pressing the Close Issue button.

The process of saving labels written by the user is described below in chronological order with reference to FIG. 7.

First, when the user writes and saves a label on the data management module 200, the added labeling information is updated in the first database 210 of the data management module 200 and, at the same time, the ID and status information are passed to the Full Text search engine 220 so that the labeling status is updated (S291).

Next, the data management module 200 passes the updated labeling information and the meta information of the labeled post (id, body, context, whether it is a comment, and so on) to the core service module 500 (S292), and the core service module 500 finds the post corresponding to the received meta information in the second database 510 and updates its labeling information (S293).

The core service module 500 then sends the data management module 200 a message indicating that the data transfer succeeded or, if it did not, requests retransmission of the data (S294).

The machine learning module 300 performs machine learning based on the labels of the data stored in the data management module 200 and generates an algorithm that labels the data automatically. That is, it trains on the data in the meta field stored in at least one of the first database 210 and the second database 510 and stores the inferred labels in the output_meta field.

The machine learning module 300 trains on all of the labels stored in the meta field together with all of the items (bodies, units, and comments). If the user wants a model specialized for a particular subset, the device may also be implemented so that the user specifies parameters in the user service module 400 (the target to be analyzed, the period, keywords, and so on) and training uses only the data that match them.
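Restricting training to data that match user parameters can be sketched as a filter over labeled items. The field names (target, date, text, meta), the function select_training_data, and the sample records are hypothetical. Keyword matching is simplified here to a substring test, whereas the specification describes a search based on morphological analysis.

```python
from datetime import date


def select_training_data(posts, target=None, start=None, end=None, keyword=None):
    """Keep only human-labeled items that match the optional filters."""
    selected = []
    for post in posts:
        if post.get("meta") is None:
            continue  # only items with a human-written label are used
        if target is not None and post["target"] != target:
            continue
        if start is not None and post["date"] < start:
            continue
        if end is not None and post["date"] > end:
            continue
        if keyword is not None and keyword not in post["text"]:
            continue  # substring match stands in for morphological search
        selected.append(post)
    return selected


posts = [
    {"text": "Acme support was slow", "target": "Acme", "date": date(2020, 11, 2),
     "meta": {"aspect": "service", "sentiment": "negative"}},
    {"text": "Acme prices dropped", "target": "Acme", "date": date(2020, 12, 1),
     "meta": None},
    {"text": "Globex support was great", "target": "Globex", "date": date(2020, 12, 5),
     "meta": {"aspect": "service", "sentiment": "positive"}},
]

subset = select_training_data(posts, target="Acme", start=date(2020, 11, 1))
```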

A model applicable to the machine learning module 300 is one based on the paper Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence. The paper uses three elements, target, aspect, and sentiment, to predict which sentiment (positive or negative) a target receives for which aspect. Borrowing this concept, the model can predict which sentiment (positive, neutral, or negative) a target (a company) receives for each aspect (six evaluation indicators) in a post.
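To illustrate the auxiliary-sentence idea, the sketch below turns one post into a sentence pair per aspect, using a question template in the style of the QA variant described in that paper. The function name build_pairs, the ordering of each pair, the company name, and the two aspect names are assumptions for illustration; this section does not enumerate the six evaluation indicators. The classifier that would assign a sentiment to each pair is not shown.

```python
from typing import List, Tuple


def build_pairs(text: str, target: str, aspects: List[str]) -> List[Tuple[str, str]]:
    """Pair the post text with one auxiliary question per aspect.

    Each (text, question) pair would be fed to a sentence-pair classifier
    that outputs the sentiment of `target` with respect to that aspect.
    """
    pairs = []
    for aspect in aspects:
        question = f"what do you think of the {aspect} of {target} ?"
        pairs.append((text, question))
    return pairs


# Hypothetical target and aspect names.
pairs = build_pairs("Acme raised its prices again.", "Acme", ["price", "service"])
```

Framing each (target, aspect) combination as its own sentence pair lets a single sentence-pair classifier cover every aspect without a separate output head per aspect.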

Because the model used in that paper was English-only, the KoElectra model, which performed best in in-house tests among the Korean models already publicly available, can be used. The kinds and criteria of aspects and sentiments may vary with the task, and even the model itself may change depending on the task the system has to solve.

Additionally, the technique described in the paper Longformer: The Long-Document Transformer was applied so that training can go beyond 512 tokens, the sequence limit of the BERT model used in the earlier paper, up to 4096 tokens. This makes training less constrained by the length of the text and allows it to include contextual information up to the document level.

At present there is no official Korean Longformer pretrained model. Through the applicant's research and experiments, several publicly available Korean pretrained models were converted for use with Longformer, and the best-performing of the converted models is used; the existing Longformer model was also modified to be somewhat more memory-efficient.

However, because Longformer falls short of the existing model in performance, a 2-way training scheme is used in which the existing model handles inputs of 512 tokens or fewer and the Longformer model handles longer inputs.
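The 2-way scheme amounts to routing each input by its token count. The sketch below assumes that an input of exactly 512 tokens goes to the existing model and that inputs beyond 4096 tokens must be truncated; the specification does not state either point. The names choose_model, "base", and "longformer" are hypothetical.

```python
from typing import Tuple

BASE_MAX_TOKENS = 512         # sequence limit of the existing model
LONGFORMER_MAX_TOKENS = 4096  # sequence limit reached with Longformer


def choose_model(token_count: int) -> Tuple[str, bool]:
    """Return (model key, needs_truncation) for an input of the given length."""
    if token_count <= BASE_MAX_TOKENS:
        return "base", False
    # Longer inputs go to the Longformer variant; anything beyond its limit
    # is assumed to require truncation before use.
    return "longformer", token_count > LONGFORMER_MAX_TOKENS
```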

In particular, a web interface is added to the machine learning module 300, and the training and prediction performed by the machine learning module 300 may be configured to be run from the web interface connected to it.

On this web interface, the user can train and predict with the default settings or, as shown in FIG. 8, can directly set the options provided for the desired task before training. The options may be extended at the user's request.

The user can also name each model, which makes it possible to manage models trained in different ways efficiently and, in a later prediction (inference) job, to select a model by the name the user gave it.

When training is performed in the machine learning module 300, the model name, accuracy, creation date, and so on are stored in the third database 310 provided in the machine learning module 300, and this information can be checked in the model list of the machine learning module's web interface.

The process by which training takes place in the machine learning module 300 is described below in chronological order with reference to FIG. 9.

First, when the user presses the start-training icon on the screen of FIG. 8 in the model training tab of the web interface connected to the machine learning module 300, the option information set by the user is passed to the internal training module (S311).

The machine learning module 300 then passes the received information to the Training Worker and, at the same time, stores the option information in the third database 310 connected to the machine learning module 300 (S312). This makes the information and status of the model being trained visible on the model list page.

Next, the Training Worker connects directly to the second database 510 of the core service module 500 and searches it using the option information received in step S312 (S313), the second database 510 passes the data found in step S313 (the text needed for training, the meta field holding human-written labels, and so on) to the Training Worker (S314), and the Training Worker passes the information received in step S314 to the machine learning module 300 (S315). These steps proceed in sequence.

The machine learning module 300 then trains on the information received in step S315 and saves the trained model (S316), and stores the path of the trained model in the third database 310 while updating the status from 'training' to 'training complete' (S317).

Meanwhile, a model trained by the machine learning module 300 predicts results for new data without a person having to label it. When new data arrives, labels for aspect and sentiment are predicted automatically for each pre-specified target in the data and may be stored in at least one of the second database 510 and the third database 310, preferably in a way that distinguishes them from labels entered directly by a person.

The prediction process of the machine learning module 300 is described below with reference to FIGS. 10 and 11.

First, when the user presses the start-prediction icon on the web interface of the machine learning module 300 shown in FIG. 10, the option information prepared in advance is passed to the machine learning module (S321). The machine learning module 300 then passes the information received in step S321 to the Inference Worker and the third database 310 (S322), at which point the status of the model in the third database 310 is updated to 'predicting'.

Next, the Inference Worker requests from the second database 510 the data corresponding to the information obtained in step S322 (S323), and the second database 510 passes the requested data to the Inference Worker (S324).

The Inference Worker then passes the received data to the machine learning module 300 (S325), and the machine learning module 300 performs inference with the trained model (S326).

The second database 510 is then updated so that the data for which inference is complete are reflected (S327). The result is stored in a field of the data named output_meta: the meta field holds the labels written by the user, and output_meta holds the labels predicted by the model.

Finally, when inference is complete for all data and all updates in the second database 510 are finished, the third database 310 updates the current status to 'inference complete' (S328).
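The sequence S322 to S328 can be sketched with in-memory stand-ins for the two databases. Everything here is an assumption for illustration: the second database is modeled as a list of dicts, the third database as a dict keyed by model name, and run_inference, stub_predict, and the status strings are hypothetical names. The actual workers and their messaging are not shown.

```python
def run_inference(model_name, matches, second_db, third_db, predict):
    """Run one inference job and return the number of items processed."""
    third_db[model_name] = {"status": "predicting"}         # S322
    rows = [row for row in second_db if matches(row)]       # S323, S324
    for row in rows:                                        # S325, S326
        # S327: the prediction goes to output_meta; meta is left untouched.
        row["output_meta"] = predict(row["text"])
    third_db[model_name]["status"] = "inference complete"   # S328
    return len(rows)


def stub_predict(text):
    # Placeholder for the trained model; always returns the same label.
    return {"aspect": "service", "sentiment": "neutral"}


second_db = [
    {"id": 1, "text": "Support was slow", "meta": {"aspect": "service", "sentiment": "negative"}},
    {"id": 2, "text": "New store opened", "meta": None},
]
third_db = {}
processed = run_inference("demo-model", lambda row: True, second_db, third_db, stub_predict)
```

Keeping human labels in meta and model labels in output_meta means an inference run never overwrites a label written by a person.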

Describing the above process in chronological order with reference to FIG. 6: first, new posts crawled by the data collection module 100, specifically by the crawling unit 120, are stored in the database of the core service module 500 (S310), and at the same time the data collection module 100 passes the crawled data to the model_inference worker (S320).

In addition, for higher accuracy in practice, a threshold may be set for cases in which the model makes no prediction or the prediction score does not exceed a certain value. A prediction that does not exceed the threshold is replaced by a label predicted with an algorithm tuned on keyword rules developed for the project rather than with deep learning, so that analysis in practice can rely on more accurate results.
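A minimal sketch of this fallback follows, assuming the model returns either None or a (label, score) pair and that the keyword rules are a simple mapping from keyword to label. The threshold value 0.7, the function name predict_with_fallback, and the rule format are hypothetical; the specification does not give them.

```python
def predict_with_fallback(text, model_predict, keyword_rules, threshold=0.7):
    """Return (label, source), preferring the model when its score is high enough."""
    result = model_predict(text)  # expected: (label, score) or None
    if result is not None:
        label, score = result
        if score > threshold:
            return label, "model"
    # The model gave no prediction or did not exceed the threshold:
    # fall back to the keyword rules.
    for keyword, rule_label in keyword_rules.items():
        if keyword in text:
            return rule_label, "rule"
    return None, "none"
```

Returning the source alongside the label lets downstream analysis tell model predictions from rule-based ones.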

The user service module 400 receives a user's request and, based on that request, visualizes and outputs the results analyzed by at least one of the data management module and the machine learning module.

That is, the user service module 400 analyzes the posts stored in the database and their associated information (source, date, keywords, target, aspect, and sentiment) and presents the results as charts.

The information shown may vary with the task. As shown in FIG. 12, a radar graph (a hexagonal numeric graph) may be provided of each company's score on each evaluation index, where the score reflects, for each post, how it was rated positive or negative on each aspect and also takes into account the positive or negative ratings of the comments on that post. As shown in FIG. 13, the module may also be configured to show how the evaluation indices change over time.

In addition, the user service module 400 may provide charts for text-based search and analysis, such as the number of mentions of a specific company per platform, a word cloud, and the number of mentions of a specific keyword and its growth rate, and is preferably configured to provide a trend map as well.

Each graph is preferably implemented so that other companies can be overlaid on it for intuitive comparison. Each graph may be implemented with code that produces interactive results, for example showing detailed information when the mouse hovers over the graph or adjusting the displayed period by scrolling.

The module may also be implemented so that a variety of charts and conditions can be set to support the user's analyses. For the data used in search, if the meta field exists, its data are used first; only when there is no meta field are the model's predicted values in output_meta used.
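The precedence rule between the two fields can be sketched in a few lines. The function name effective_label and the dict-based record are hypothetical; a missing field and a field set to None are treated alike here, which is an assumption.

```python
def effective_label(record: dict):
    """Return the human label if present, otherwise the model prediction."""
    meta = record.get("meta")
    if meta is not None:
        return meta
    return record.get("output_meta")
```

For example, a record holding both a human label and a model prediction yields the human label, so reviewed data always take precedence in the charts.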

Finally, the core service module 500 performs integrated management and control of at least one of the data collection module 100, the data management module 200, the machine learning module 300, and the user service module 400. It is connected to each of these modules and communicates with them in real time, and it passes each command received from a module to the appropriate module so that the worker of that module can operate.
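The command-routing role can be pictured as a small dispatcher that maps a module name to that module's worker. This is a sketch under assumptions only: the class CoreService, its methods, the command format, and the worker shown are hypothetical, and real inter-module communication (for example over a network) is not modeled.

```python
class CoreService:
    """Routes each incoming command to the worker of the addressed module."""

    def __init__(self):
        self._workers = {}

    def register(self, module: str, worker) -> None:
        # `worker` is any callable that accepts the command payload.
        self._workers[module] = worker

    def dispatch(self, command: dict):
        worker = self._workers.get(command["module"])
        if worker is None:
            raise KeyError(f"no worker registered for {command['module']!r}")
        return worker(command["payload"])


core = CoreService()
core.register("machine_learning", lambda payload: f"training {payload['model_name']}")
reply = core.dispatch({"module": "machine_learning", "payload": {"model_name": "demo"}})
```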

The core service module 500 is also provided with the second database 510, in which all of the data crawled by the data collection module 100 are stored, and delivers the necessary data to each service, so that it can manage the services in an integrated manner.

Taking the above into account, the AI-based social network service analysis device according to an embodiment of the present invention can be summarized as follows.

First, with respect to the data collection module 100, the generator worker for each social platform is run periodically at set intervals, and once a generator worker has run, the crawler worker is started automatically so that data collection proceeds.

With the data management module 200, requests are received through direct API communication, information on the data matching each request is returned, and the actual data are delivered through the dm_generator worker and the dm_creator worker. Internally, data are passed through the Crawler worker to the Inference worker so that processing continues automatically through model inference.

Because each of the systems above is provided on its own server, each function can make full use of its server's resources. Because each is provided as a container, an increasingly popular approach that runs independently of the environment with its settings configured in advance, the device can be applied easily regardless of the user's environment, and the performance and functions of each container can be extended easily.

The embodiments described in this specification and the accompanying drawings merely illustrate part of the technical idea included in the present invention. The embodiments disclosed herein are therefore intended to explain the technical idea of the present invention, not to limit it, and it is evident that the scope of the technical idea of the present invention is not limited by these embodiments. Modifications and specific embodiments that a person of ordinary skill in the art can readily infer within the scope of the technical idea included in the specification and drawings of the present invention should all be construed as falling within the scope of the present invention.

100: data collection module
200: data management module
300: machine learning module
400: user service module
500: core service module

Claims

1. An AI-based social network service analysis device comprising:
a data collection module for collecting data on a social network service platform;
a data management module that outputs the data collected by the data collection module based on a preset search algorithm and has a function of generating labels for the collected data;
a machine learning module that performs machine learning based on the labels of the data stored in the data management module and generates an algorithm that automatically labels the data;
a user service module that receives a user's request and, based on the user's request, visualizes and outputs a result analyzed by at least one of the data management module and the machine learning module; and
a core service module for controlling at least one of the data collection module, the data management module, the machine learning module, and the user service module.

2. The AI-based news and social network service analysis device according to claim 1, wherein the data collection module comprises:
a URL collecting unit that collects URLs of posts posted on the social network service platform at preset intervals; and
a crawling unit that crawls the posts based on the URLs received from the URL collecting unit.

3. The AI-based social network service analysis device according to claim 2, wherein
the URL collecting unit accesses the social network service platform at preset intervals, obtains the IDs of posts from the latest post on the social network service platform back to the previous cycle, generates the URLs of the posts based on the IDs of the posts, and delivers the URLs to the crawling unit, and
the crawling unit accesses the URL generated by the URL collecting unit and then crawls the post at the URL.

4. The AI-based social network service analysis device according to claim 3, wherein
the crawling unit accesses the URL generated by the URL collecting unit, crawls the post at the URL, and performs preprocessing to generate preprocessed data, and
the preprocessed data is stored in the second database of the core service module.

5. The AI-based social network service analysis device according to claim 3, wherein
the data management module, based on a user's request input from the user service module, stores in the first database of the data management module the target data matching the request among the data stored in the second database of the core service module, together with information on the target data.

6. The AI-based social network service analysis device according to claim 5, wherein
the data management module outputs the posts stored in the first database on a data management web page, and
at least one of the original text of a post, the contexts obtained by dividing the post into preset units, and the comments on the post is output on the data management web page.

7. The AI-based social network service analysis device according to claim 6, wherein
the data management web page is configured so that the user can label an aspect, and the sentiment evaluation for that aspect, for at least one of the original text of the post, the contexts obtained by dividing the post into preset units, and the comments on the post, and
the labeling result is stored in at least one of the first database and the second database.