KR20220068793A

KR20220068793A - Method for providing news analysis service using robotic process automation monitoring

Info

Publication number: KR20220068793A
Application number: KR1020200155964A
Authority: KR
Inventors: 정민아
Original assignee: 정민아
Priority date: 2020-11-19
Filing date: 2020-11-19
Publication date: 2022-05-26
Also published as: KR102413961B1

Abstract

Provided is a method for providing a news analysis service using robotic process automation (RPA) monitoring. The method comprises the steps of: performing news clipping on news data to correspond to a pre-input keyword file using RPA; performing natural language processing of the clipped news data and generating an issue keyword related to a main keyword of the news data using an association rule algorithm; collecting social media data and performing text mining using the main keyword and the issue keyword; and extracting a noun keyword from the text-mined social media data and visualizing the noun keyword as a keyword network.

Description

Method for providing news analysis service using RPA monitoring

The present invention relates to a method for providing a news analysis service using RPA monitoring. It provides a platform that collects and analyzes news, extracts main keywords and the issue keywords associated with them, and further analyzes the content and comments of related social media so that the results can be visualized as a keyword network.

Robotic process automation (RPA) is a technology that reproduces, exactly as performed, a series of repetitive and routine actions that a person carries out on a computer; it operates not as a physical robot but as programmed software. A business support system has to be designed and implemented for each functional item, such as the database, the user interface (screen design), and the data processing logic. RPA, by contrast, defines in detail which items a person recognizes on an already built system or website, how the mouse and keyboard are operated, and where data is entered, and then implements those actions as they are. Introducing RPA therefore requires no modification of or change to the existing system, and because the unit processes being developed are mutually independent, the development system and environment are simple and development is relatively easy. These characteristics set RPA apart as a distinct solution. Related solutions are being actively applied as it has become apparent that RPA can extend automation to areas previously excluded from business automation system development, such as acquiring and entering data for which a data interface is difficult to build, or preparing organization-specific and user-specific reports that are difficult to standardize.

Technologies have been developed for building a database using RPA or for automating collection and services using a bot. As related prior art, Korean Patent Application Publication No. 2020-0041563 (published April 22, 2020) and Korean Patent Application Publication No. 2019-0134874 (published December 5, 2019) disclose a configuration of a robotic process that performs an accident simulation of a plant according to the plant's process conditions and accident scenario information, generates and stores the accident simulation results, and automates and controls the accident simulation operation and the process of storing the results in a database unit, and a configuration that automatically collects data in real time using a bot, analyzes the data with an artificial intelligence algorithm to calculate an index, and generates real-time responsive news content from the index using robot journalism.

However, although configurations that generate news content with a robot exist, there is no platform that uses RPA to collect news content from at least one medium and that supports competitor and share of voice (SOV) analysis, keyword and related-word analysis, and furthermore sentiment analysis. Competitor analysis and market analysis still rely on people monitoring manually, one item at a time, and repetitive tasks such as monitoring and report writing persist. Moreover, even after the news published in the media has been collected and the issues identified, analyzing the related social media feeds or reactions imposes a further workload, so a technology that analyzes news-related comments using natural language processing (NLP) and association rule analysis is also required. Accordingly, there is a need for research and development of a method that automatically clips news using RPA and visualizes comments by extracting the associated issue keywords from the collected news.

An embodiment of the present invention can provide a method for providing a news analysis service using RPA monitoring that: automates a monitoring task that used to be simple and repetitive by automatically clipping news articles using robotic process automation (RPA); performs natural language processing on the news articles and then automatically generates issue keywords associated with a main keyword using an association rule algorithm; collects social media data using the combination of the main keyword and the issue keywords and then performs text mining; extracts noun keywords from comments, which are customer reactions, using term frequency-inverse document frequency (TF-IDF) and the association rule algorithm; and provides a visualized keyword network by assigning each extracted noun keyword to a node, placing nodes closer together the more strongly the nouns are associated, and enlarging each node the more often its noun keyword is mentioned. However, the technical problem to be solved by this embodiment is not limited to the one described above, and other technical problems may exist.

As a technical means for achieving the above technical problem, an embodiment of the present invention includes: clipping news data (news clipping) so as to correspond to a pre-input keyword file using robotic process automation (RPA); performing natural language processing on the clipped news data and generating, with an association rule algorithm, an issue keyword associated with a main keyword of the news data; collecting social media data using the main keyword and the issue keyword and performing text mining; and extracting noun keywords from the text-mined social media data and visualizing the noun keywords as a keyword network.

According to any one of the above means for solving the problem, the monitoring task that used to be simple and repetitive is automated by automatically clipping news articles using robotic process automation (RPA); issue keywords associated with a main keyword are generated automatically by an association rule algorithm after the news articles undergo natural language processing; social media data is collected using the combination of the main keyword and the issue keywords and then text-mined; comments, which are customer reactions, are extracted as noun keywords using term frequency-inverse document frequency (TF-IDF) and the association rule algorithm; and a visualized keyword network can be provided by assigning each extracted noun keyword to a node, placing nodes closer together the more strongly the nouns are associated, and enlarging each node the more often its noun keyword is mentioned.

FIG. 1 is a diagram illustrating a system for providing a news analysis service using RPA monitoring according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating the news analysis service providing server included in the system of FIG. 1.
FIG. 3 is a diagram illustrating an example in which the news analysis service using RPA monitoring according to an embodiment of the present invention is implemented.
FIG. 4 is a diagram showing program code for executing the news analysis service of FIG. 3.
FIG. 5 is a diagram showing a screen on which the news analysis service of FIG. 3 is provided.
FIG. 6 is an operation flowchart illustrating a method for providing a news analysis service using RPA monitoring according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the invention pertains can readily practice them. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described here. In the drawings, parts unrelated to the description are omitted for clarity, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only being "directly connected" but also being "electrically connected" with another element in between. In addition, when a part is said to "include" a component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them, and it should be understood that the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof is not excluded in advance.

Terms of degree used throughout the specification, such as "about" and "substantially," are used to mean at or near a stated value when manufacturing and material tolerances inherent in the stated meaning are presented, and are used to prevent an unscrupulous infringer from unfairly exploiting a disclosure in which exact or absolute figures are stated to aid understanding of the invention. As used throughout the specification, the terms "step of (doing)" and "step of" do not mean "step for."

In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware. A "unit" is not limited to software or hardware; it may be configured to reside in an addressable storage medium or configured to execute on one or more processors. Thus, as an example, a "unit" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The functions provided within components and "units" may be combined into a smaller number of components and "units" or further separated into additional components and "units." In addition, components and "units" may be implemented to execute on one or more CPUs in a device or a secure multimedia card.

In this specification, some of the operations or functions described as being performed by a terminal, apparatus, or device may instead be performed by a server connected to that terminal, apparatus, or device. Likewise, some of the operations or functions described as being performed by a server may be performed by a terminal, apparatus, or device connected to that server.

In this specification, some of the operations or functions described as mapping or matching with a terminal may be interpreted as mapping or matching the terminal's unique number or the individual's identification information, which serves as the terminal's identifying data.

Hereinafter, the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating a system for providing a news analysis service using RPA monitoring according to an embodiment of the present invention. Referring to FIG. 1, the system 1 for providing a news analysis service using RPA monitoring may include at least one user terminal 100, a news analysis service providing server 300, and at least one manager terminal 400. However, the system 1 of FIG. 1 is merely one embodiment of the present invention, and the invention is not to be construed as limited by FIG. 1.

The components of FIG. 1 are generally connected through a network 200. For example, as shown in FIG. 1, the at least one user terminal 100 may be connected to the news analysis service providing server 300 through the network 200. The news analysis service providing server 300 may be connected to the at least one user terminal 100 and the at least one manager terminal 400 through the network 200. The at least one manager terminal 400 may likewise be connected to the news analysis service providing server 300 through the network 200.

Here, the network refers to a connection structure that allows information to be exchanged between nodes such as a plurality of terminals and servers. Examples of such a network include a local area network (LAN), a wide area network (WAN), the Internet (WWW: World Wide Web), wired and wireless data communication networks, telephone networks, and wired and wireless television networks. Examples of wireless data communication networks include, but are not limited to, 3G, 4G, 5G, 3GPP (3rd Generation Partnership Project), 5GPP (5th Generation Partnership Project), LTE (Long Term Evolution), WIMAX (World Interoperability for Microwave Access), Wi-Fi, the Internet, LAN (Local Area Network), Wireless LAN (Wireless Local Area Network), WAN (Wide Area Network), PAN (Personal Area Network), RF (Radio Frequency), Bluetooth networks, NFC (Near-Field Communication) networks, satellite broadcast networks, analog broadcast networks, and DMB (Digital Multimedia Broadcasting) networks.

In the following, the term "at least one" is defined as covering both the singular and the plural, and it is evident that, even where the term "at least one" is absent, each component may exist in the singular or the plural and may be meant in the singular or the plural. Whether each component is provided in the singular or the plural may vary depending on the embodiment.

The at least one user terminal 100 may be a terminal of a client company that wishes to have news or social media about its own products or services analyzed and to receive an insight report, using a web page, app page, program, or application related to the news analysis service using RPA monitoring.

Here, the at least one user terminal 100 may be implemented as a computer capable of accessing a remote server or terminal through a network. The computer may include, for example, a navigation device, a notebook equipped with a web browser, a desktop, or a laptop. The at least one user terminal 100 may also be implemented as a terminal capable of accessing a remote server or terminal through a network. The at least one user terminal 100 may be, for example, a wireless communication device that guarantees portability and mobility, and may include any kind of handheld wireless communication device, such as a navigation device, a PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), or Wibro (Wireless Broadband Internet) terminal, a smartphone, a smart pad, or a tablet PC.

The news analysis service providing server 300 may be a server that provides a news analysis service web page, app page, program, or application using RPA monitoring. The news analysis service providing server 300 may be a server that receives an analysis request from the user terminal 100, receives a file containing central keywords from the manager terminal 400, and performs news clipping by collecting at least one news item from at least one medium on the basis of the central keywords. It may also be a server that extracts a main keyword from the collected news data and extracts issue keywords associated with the main keyword. Further, it may be a server that searches for at least one item of social media data containing the main keyword and the issue keywords, analyzes the content and comments in the social media data, extracts noun keywords, assigns a node to each noun keyword, and generates a keyword network in which a node is drawn larger the more often its noun keyword occurs and nodes are placed closer together the higher the association between their noun keywords. It may also be a server that generates an insight report from the collected and analyzed data in report format and schedules its delivery to the user terminal 100 using RPA.
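The node-size and distance rules for the keyword network can be illustrated with a minimal Python sketch. The function name `build_keyword_network`, the use of comment-level co-occurrence counts as the measure of association, and the choice of `1 / weight` as the layout distance are all assumptions made for illustration; the source does not specify how association or distance is computed.

```python
from collections import Counter
from itertools import combinations


def build_keyword_network(comments_nouns):
    """Build node and edge attributes from lists of noun keywords.

    comments_nouns: one list of noun keywords per comment (hypothetical input).
    Node size grows with the number of mentions; edge distance shrinks as
    two nouns co-occur in more comments (assumed association measure).
    """
    mentions = Counter()
    cooccurrence = Counter()
    for nouns in comments_nouns:
        mentions.update(nouns)
        # Count each unordered pair once per comment.
        for a, b in combinations(sorted(set(nouns)), 2):
            cooccurrence[(a, b)] += 1

    nodes = {noun: {"size": count} for noun, count in mentions.items()}
    edges = {
        pair: {"weight": weight, "distance": 1.0 / weight}
        for pair, weight in cooccurrence.items()
    }
    return nodes, edges


comments = [
    ["battery", "price"],
    ["battery", "price", "design"],
    ["battery"],
]
nodes, edges = build_keyword_network(comments)
```

The resulting `nodes` and `edges` dictionaries could be handed to any graph-drawing tool; the layout itself is outside the scope of this sketch.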

Here, the news analysis service providing server 300 may be implemented as a computer capable of accessing a remote server or terminal through a network. The computer may include, for example, a navigation device, a notebook equipped with a web browser, a desktop, or a laptop.

The at least one manager terminal 400 may be a terminal of a manager who uses a web page, app page, program, or application related to the news analysis service using RPA monitoring. When an analysis request from the user terminal 100 is received via the news analysis service providing server 300, the at least one manager terminal 400 may be a terminal that uploads a keyword file to the news analysis service providing server 300 so that the news to be analyzed can be collected and analyzed. The at least one manager terminal 400 may also be a terminal that uploads to the news analysis service providing server 300 the result of a final review of the analysis completed by the news analysis service providing server 300.

Here, the at least one manager terminal 400 may be implemented as a computer capable of accessing a remote server or terminal through a network. The computer may include, for example, a navigation device, a notebook equipped with a web browser, a desktop, or a laptop. The at least one manager terminal 400 may also be implemented as a terminal capable of accessing a remote server or terminal through a network. The at least one manager terminal 400 may be, for example, a wireless communication device that guarantees portability and mobility, and may include any kind of handheld wireless communication device, such as a navigation device, a PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), or Wibro (Wireless Broadband Internet) terminal, a smartphone, a smart pad, or a tablet PC.

FIG. 2 is a block diagram illustrating the news analysis service providing server included in the system of FIG. 1, FIG. 3 is a diagram illustrating an example in which the news analysis service using RPA monitoring according to an embodiment of the present invention is implemented, FIG. 4 is a diagram showing program code for executing the news analysis service of FIG. 3, and FIG. 5 is a diagram showing a screen on which the news analysis service of FIG. 3 is provided.

Referring to FIG. 2, the news analysis service providing server 300 may include a collection unit 310, a generation unit 320, an execution unit 330, a visualization unit 340, a reporting unit 350, and a sending unit 360.

When the news analysis service providing server 300 according to an embodiment of the present invention, or another server (not shown) operating in conjunction with it, transmits an application, program, app page, or web page of the news analysis service using RPA monitoring to the at least one user terminal 100 and the at least one manager terminal 400, those terminals may install or open the application, program, app page, or web page. The service program may also be run on the at least one user terminal 100 and the at least one manager terminal 400 by means of a script executed in a web browser. Here, a web browser is a program that makes the World Wide Web (WWW) service usable, that is, a program that receives and displays hypertext written in HTML (Hyper Text Mark-up Language); examples include Netscape, Explorer, and Chrome. An application means an application program on a terminal and includes, for example, an app executed on a mobile terminal (smartphone).

Referring to FIG. 2, the collection unit 310 may clip news data (news clipping) so as to correspond to a pre-input keyword file using robotic process automation (RPA). When the collection unit 310 does so, the pre-input keyword file may include a first Excel file containing main keyword data, and a second Excel file containing collection keyword data for gathering data on the company itself, its competitors, and each industry, exclusion keyword data listing keywords to be excluded from collection, and media data identifying the media from which news data is to be collected. The media data may be stored as hashtags.
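As a rough illustration of how such a keyword file could drive clipping, the following Python sketch models the contents of the two Excel files as an in-memory object and applies them as a filter. The class names `KeywordFile` and `Article`, the function `should_clip`, the substring-matching rule, and the sample media name are all hypothetical; the source does not describe the file layout or the matching logic, and reading the actual Excel files is omitted.

```python
from dataclasses import dataclass


@dataclass
class KeywordFile:
    """In-memory stand-in for the two Excel files (assumed layout)."""
    main_keywords: list      # first file: main keyword data
    collect_keywords: list   # second file: company, competitor, industry terms
    exclude_keywords: list   # second file: keywords to exclude from collection
    media_hashtags: list     # second file: media stored as hashtags


@dataclass
class Article:
    medium: str
    title: str
    body: str


def should_clip(article, keyword_file):
    """Return True if the article should be clipped (assumed rule)."""
    text = f"{article.title} {article.body}"
    media = {tag.lstrip("#") for tag in keyword_file.media_hashtags}
    if article.medium not in media:
        return False  # not one of the media to collect from
    if any(word in text for word in keyword_file.exclude_keywords):
        return False  # contains an exclusion keyword
    wanted = keyword_file.main_keywords + keyword_file.collect_keywords
    return any(word in text for word in wanted)


keyword_file = KeywordFile(
    main_keywords=["smartphone"],
    collect_keywords=["battery"],
    exclude_keywords=["advertorial"],
    media_hashtags=["#ExampleDaily"],
)
kept = should_clip(Article("ExampleDaily", "New smartphone", "Longer battery life"), keyword_file)
skipped = should_clip(Article("ExampleDaily", "Smart deals", "advertorial: smartphone sale"), keyword_file)
other_medium = should_clip(Article("OtherPaper", "New smartphone", "Review"), keyword_file)
```

Keeping the exclusion check ahead of the inclusion check means an article that matches both lists is dropped, which is one reasonable reading of "keywords to be excluded from collection."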

The generation unit 320 may perform natural language processing on the clipped news data and generate, with an association rule algorithm, an issue keyword associated with the main keyword of the news data. In doing so, the generation unit 320 may tokenize the clipped news data and the comment data contained in the clipped news data, and remove stop words using term frequency-inverse document frequency (TF-IDF).
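The tokenization and TF-IDF-based stop-word removal described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the regular-expression tokenizer stands in for a proper morphological analyzer (which Korean text would normally require), the function names are hypothetical, and treating tokens whose TF-IDF score is at or below a threshold as stop words is one possible interpretation that the source does not spell out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (simplified tokenizer)."""
    return re.findall(r"\w+", text.lower())


def remove_stop_words(documents, min_score=0.0):
    """Drop tokens whose TF-IDF score in their document is <= min_score."""
    docs_tokens = [tokenize(doc) for doc in documents]
    n_docs = len(docs_tokens)

    # Document frequency: number of documents containing each token.
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))

    cleaned = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        kept = []
        for token in tokens:
            tf = counts[token] / len(tokens)
            idf = math.log(n_docs / doc_freq[token])
            if tf * idf > min_score:
                kept.append(token)
        cleaned.append(kept)
    return cleaned


docs = [
    "the battery of the phone",
    "the camera of the tablet",
    "the price of the laptop",
]
cleaned = remove_stop_words(docs)
```

With this IDF definition, a token that appears in every document has an IDF of zero and is therefore removed regardless of how often it occurs.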

Here, TF-IDF is a weight used in information retrieval and text mining; given a document group consisting of several documents, it is a statistical value indicating how important a word is within a specific document. The term frequency (TF) indicates how often a specific word appears in a document, and the inverse document frequency (IDF) indicates how rare the word is across the document group, so that it is high for a word that appears in few other documents. The TF-IDF value is the product of the term frequency and the inverse document frequency.

TF can basically be computed from the total number of occurrences of a word in a document, but that value keeps growing as the number of words increases. A Boolean frequency may therefore be used, which sets TF to 0 or 1 depending only on whether the word appears. Alternatively, a logarithmically scaled frequency may be used, taking the logarithm of the TF value so that the effect of document size is damped while the actual frequency is still reflected. Finally, an augmented frequency may be used, which expresses the relative frequency of a word with respect to document length so that the maximum value does not exceed 1. As a result, TF-IDF is higher when a word occurs frequently within a specific document and when fewer documents in the whole group contain the word. This makes it possible to filter out words that appear frequently in all documents.
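The TF variants and the TF-IDF product described above can be sketched as follows. This is a minimal illustration, not the code of the embodiment: the function names, the use of log(1 + n) for the logarithmic scale, the 0.5 smoothing constant of the augmented frequency, and log(N / df) for IDF are assumptions chosen from commonly used definitions.

```python
import math
from collections import Counter


def tf_variants(tokens):
    """Return raw, Boolean, log-scaled, and augmented TF for each term of one document."""
    counts = Counter(tokens)
    max_count = max(counts.values())
    return {
        term: {
            "raw": n,                                # total occurrences in the document
            "boolean": 1,                            # 1 because the term is present (0 otherwise)
            "log": math.log(1 + n),                  # assumed log scaling: log(1 + n)
            "augmented": 0.5 + 0.5 * n / max_count,  # assumed smoothing constant 0.5; never above 1
        }
        for term, n in counts.items()
    }


def idf(term, documents):
    """Assumed IDF form: log(N / number of documents containing the term)."""
    containing = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / containing) if containing else 0.0


def tf_idf(term, tokens, documents):
    """Product of the raw term frequency and the inverse document frequency."""
    return Counter(tokens)[term] * idf(term, documents)


# Hypothetical tokenized documents used only for illustration.
docs = [["rpa", "news", "news"], ["news", "keyword"], ["news", "social"]]
```

With these sample documents, a term such as "news" that occurs in every document receives a TF-IDF of zero, which is the property used to filter out words that are frequent everywhere.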

In addition, an issue keyword is a word associated with the main keyword, that is, a related keyword. To obtain it, a method may be used that applies big data processing technology to efficiently extract, for a specific point in time or for the entire period, issue keywords and related data that are highly semantically related to the central keyword. Using Hadoop, one of the big data platforms, and HBase, a widely used open-source NoSQL store, a large amount of data on a specific central keyword is collected, and the collected content is then split into sentences for morphological analysis. After morphological analysis, the frequency of each extracted word is computed and stored by date, and, to collect data automatically, the words extracted as issue keyword candidates are fed back in as central keywords so that data is collected recursively. Finally, when a search is performed to obtain the desired information, issue keywords and related data with high semantic relevance for the specific point in time or for the entire period are provided. Issue keywords can be found in this way.

First, a data collection module collects Internet data for the central keyword, and morphological analysis is performed on the collected data. The frequencies of the morphologically analyzed data are computed by a MapReduce job, and the data storage module stores the extracted data according to the HBase schema defined for it. To collect a large amount of data automatically, words that have already been searched are excluded from the extracted words, and the remaining extracted words are fed back in as central keywords so that data is collected recursively. The data extraction module uses the filter function of HBase to obtain the issue keywords and related data for the central keyword that was finally searched. The data collection module gathers the Internet data found by searching for the central keyword; this collection is preparatory work for extracting issue keywords. Because content must be extracted from a large amount of Internet data, crawling is performed so that the body text is extracted, and for each collected item the date on which it was written and its Internet address may be collected together with the body text. Each item of collected content may then be split into sentences so that morphological analysis can be performed on it.

The morphological analysis module analyzes the Internet data content obtained by the data collection module in order to extract only common nouns and proper nouns. After the part of speech of each morpheme is determined, parts of speech that would be stop words, such as articles, prepositions, postpositions, and conjunctions, are removed, and the common nouns and proper nouns that form the pool of issue keyword candidates are extracted. The frequency extraction module computes the frequency of each word from the common nouns and proper nouns handed over by the morphological analysis module. A MapReduce job computes the frequencies of the words extracted when the central keyword was input, and the secondary sort provided by Hadoop may then be used to sort them in descending order of frequency. The data storage module stores the data extracted by each of the modules described above in HBase. When storing the data, the central keyword is stored as the row key, the column families are divided into three, and the data is stored in the qualifier appropriate to each column family. The first column family stores the words extracted through the central keyword, the second column family stores the address of the most recent Internet data in which the related word appeared, and the third column family stores the date of writing. The value may store the frequency of each extracted word.
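The frequency extraction step can be illustrated in plain Python. The sketch below only mimics the map and reduce phases in memory; it does not use Hadoop or HBase, and the function names `map_phase` and `reduce_phase` are hypothetical.

```python
from collections import defaultdict


def map_phase(nouns):
    """Emit a (word, 1) pair for every extracted noun, as a mapper would."""
    return [(noun, 1) for noun in nouns]


def reduce_phase(pairs):
    """Sum the counts per word and sort by descending frequency (ties broken alphabetically)."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, `reduce_phase(map_phase(["price", "battery", "price"]))` yields the pairs for "price" and "battery" in that order, which corresponds to the descending sort that the description attributes to the secondary sort.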

To collect data on issue keywords automatically, the words extracted for a central keyword may themselves be fed back in as central keywords so that data is collected recursively. During recursive collection, if the word entered as the central keyword is already stored in HBase as a row key, collection of Internet data for it is skipped, which prevents duplicate data. To avoid falling into an infinite loop during recursive collection, the number of search steps is set in advance. The data extraction module extracts issue keywords and related data through the filters that HBase provides for rows, qualifiers, and values.
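The recursive collection with duplicate skipping and a bounded number of search steps can be sketched as follows. This is an illustration under assumptions: a Python dictionary stands in for the HBase table keyed by row key, and `fetch_candidates` is a hypothetical callback that returns the candidate words extracted for a keyword.

```python
def collect_recursively(keyword, fetch_candidates, max_depth, store=None, depth=0):
    """Collect candidate words for a keyword, then recurse into each candidate.

    store plays the role of the table keyed by row key: a keyword that is
    already present is skipped, and max_depth bounds the number of search steps.
    """
    if store is None:
        store = {}
    if keyword in store or depth > max_depth:
        return store
    candidates = fetch_candidates(keyword)
    store[keyword] = candidates
    for word in candidates:
        collect_recursively(word, fetch_candidates, max_depth, store, depth + 1)
    return store
```

Passing the store through the recursion is what prevents a keyword from being collected twice, and the depth limit guarantees termination even when keywords refer to one another in a cycle.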

The word to be searched is first looked up with a RowFilter, and a ValueFilter is then used to retrieve the list of top words with high relevance; the relevance between the central keyword (main keyword) and an issue keyword is measured with a modified TF (Term Frequency). Specifically, an average is computed from the frequencies of the words extracted through the central keyword, and the relevance between the central keyword and each extracted word is then calculated. After the highly relevant top data is retrieved with the ValueFilter, a QualifierFilter may be used to find the corresponding data and provide it together with the issue keywords. Of course, the method of searching for issue keywords and building a database of them is not limited to the method described above.

The generating unit 320 may automatically generate issue keywords associated with the main keyword by applying the association rule algorithm to the news data and the comment data. The association rule algorithm may be the Apriori algorithm. This algorithm finds the rules that best explain the principal relationships observed among data variables, for purposes such as association rules, preferences, and information filtering, and it can be used to find important associations in data that is high-dimensional or has complex relationships.

In the Apriori algorithm, relationships are expressed in two forms: frequent item sets and association rules. A frequent item set is a collection of items that often occur together, and an association rule states that a relationship of a certain strength exists between items. Both forms can be used to determine whether the items of an item set in a given data set are related. The formulas for the support and confidence of an item are well known and are not described in detail here. Support is the proportion of the data set that contains the specific data. Confidence is defined for an association rule and, as a kind of clustering that groups highly associated data, makes it possible to find data that is likely to satisfy the conditions at the same time. Of course, the analysis can be performed by various methods other than the one described above.
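Support and confidence, which the description treats as well known, can be written out as a short sketch. The code below is not a full Apriori implementation with level-wise candidate generation; it only evaluates pairs of the form (main keyword, candidate word). The function names and the sample transactions are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) divided by support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base


def issue_keywords(main_keyword, transactions, min_support, min_confidence):
    """Words whose rule 'main_keyword -> word' passes both thresholds."""
    vocabulary = {w for t in transactions for w in t} - {main_keyword}
    result = []
    for word in sorted(vocabulary):
        s = support({main_keyword, word}, transactions)
        c = confidence({main_keyword}, {word}, transactions)
        if s >= min_support and c >= min_confidence:
            result.append(word)
    return result


# Each transaction is the set of tokens of one hypothetical article or comment.
transactions = [
    {"rpa", "automation", "news"},
    {"rpa", "automation"},
    {"rpa", "cost"},
    {"news", "cost"},
]
```

In this sample, "automation" appears together with "rpa" in half of the transactions and in two of the three transactions that contain "rpa", so it is the only word that passes a minimum support of 0.5 and a minimum confidence of 0.6.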

The performing unit 330 may collect social media data using the main keyword and the issue keywords and perform text mining. In doing so, it may search the social media data for data associated with the issue keywords, tokenize the content and comments contained in that data, remove stop words using TF-IDF (Term Frequency-Inverse Document Frequency), and apply the association rule algorithm to the content and comments.

Here, text mining is the process of processing natural language in unstructured data composed of characters or text, rather than numerical structured data, and extracting relationships among the data refined by morphological analysis. A topic modeling algorithm may be used as part of text mining. Topic modeling is a statistical method that infers topics by clustering related words within unstructured documents such as online comments or SNS messages; LSA (Latent Semantic Analysis), pLSA (probabilistic Latent Semantic Analysis), or LDA (Latent Dirichlet Allocation) may be used.

In an embodiment of the present invention, because the pLSA algorithm does not provide a probabilistic model at the document level, LDA, a probabilistic topic model based on the Dirichlet distribution, may be used. By analyzing the probability distribution of the co-occurrence of given words, it can predict which topics a document deals with. Gibbs sampling, as used in LDA topic analysis, may be employed: for each document, the analyst sets an arbitrary probability and a number of samples, and the set probability is then revised iteratively while the number of samples is increased or decreased.

A method of improving the accuracy of review analysis by selectively removing words on the basis of neutrality may additionally be used as preprocessing. First, sparse words may be removed (Delete Sparse terms). The purpose of this step is to remove words that were extracted because of typos or that appear very rarely. The threshold for sparse-word removal may be set so that words whose number of occurrences relative to the number of documents is at or below a preset percentage are removed. Second, words may be removed on the basis of neutrality (Delete Neutral terms).

The purpose of this second step is to remove words that appear frequently because of the attributes of each type of data rather than because they affect classification. Neutrality-based word removal first removes sparse words on the basis of sparsity and then removes words that belong to both sets on the basis of neutrality; the degree to which a word appears in both the set of useful reviews and the set of non-useful reviews is defined as its neutrality index. Of course, various preprocessing methods exist besides the one described above, and the invention is not limited to it.
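The two preprocessing steps can be sketched as follows. The description does not give a formula for the neutrality index, so the ratio used in `neutrality` (the smaller of the two document-frequency rates divided by the larger) is purely an assumption for illustration; the function names are hypothetical as well.

```python
from collections import Counter


def document_frequency(documents):
    """Number of documents in which each term appears at least once."""
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    return df


def remove_sparse_terms(documents, min_ratio):
    """Drop terms whose document-frequency ratio is at or below min_ratio."""
    df = document_frequency(documents)
    keep = {t for t, n in df.items() if n / len(documents) > min_ratio}
    return [[t for t in doc if t in keep] for doc in documents]


def neutrality(term, set_a, set_b):
    """Assumed neutrality index: close to 1 when the term is equally common in both sets."""
    rate_a = document_frequency(set_a)[term] / len(set_a)
    rate_b = document_frequency(set_b)[term] / len(set_b)
    high = max(rate_a, rate_b)
    return min(rate_a, rate_b) / high if high else 0.0
```

Under this assumed definition, a word with a neutrality close to 1 appears about equally often in both review sets and would be a candidate for removal, while a word that appears in only one set scores 0.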

The visualization unit 340 may extract noun keywords from the text-mined social media data and visualize them as a keyword network. It may extract the noun keywords by applying TF-IDF (Term Frequency-Inverse Document Frequency) and the association rule algorithm. For the visualization, it may assign a node to each noun keyword, place noun keywords that are highly related close to each other, and make a node larger the more often its noun keyword is mentioned.
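The data behind such a keyword network can be prepared as in the sketch below. It assumes that relatedness is measured by how many documents two nouns share and that the layout distance is the reciprocal of that count; both choices, like the function name, are illustrative assumptions, and the drawing itself is left to whatever plotting tool is used.

```python
from collections import Counter
from itertools import combinations


def build_keyword_network(documents):
    """Return (nodes, edges) for a keyword network.

    Node size grows with the number of mentions; edge distance shrinks as two
    nouns co-occur in more documents (assumed measure of relatedness).
    """
    mentions = Counter()
    cooccurrence = Counter()
    for nouns in documents:
        mentions.update(nouns)
        for a, b in combinations(sorted(set(nouns)), 2):
            cooccurrence[(a, b)] += 1
    nodes = {word: {"size": count} for word, count in mentions.items()}
    edges = {
        pair: {"weight": n, "distance": 1 / n}
        for pair, n in cooccurrence.items()
    }
    return nodes, edges
```

Sorting each pair before counting keeps (a, b) and (b, a) from being stored as separate edges.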

After the visualization unit 340 has extracted noun keywords from the text-mined social media data and visualized them as a keyword network, the reporting unit 350 may generate an insight report using the news data, the main keyword, the issue keywords, the social media data, and the noun keywords.

After the visualization unit 340 has extracted noun keywords from the text-mined social media data and visualized them as a keyword network, the sending unit 360 may use RPA to schedule delivery of the keyword network to previously stored email addresses.

Hereinafter, the operation of the news analysis service providing server configured as in FIG. 2 is described in detail using FIG. 3 as an example. The embodiment is, however, only one of the various embodiments of the present invention, and the invention is not limited to it.

The description follows FIG. 3, and because the program code corresponding to FIG. 3 must be described along with it, FIG. 4 is referred to as well. The analysis service according to an embodiment of the present invention can be divided broadly into two stages: news clipping, shown in FIG. 3A, and social media analysis, shown in FIG. 3B. The description below is organized around these two stages.

<News Clipping>

Referring to FIGS. 3A and 4A, when checking for new news articles, the number of days of articles to collect, counted back from today, is set first. The search list in 검색대상.xlxs, the Excel file of FIG. 4B, is then read, and the program checks whether a folder structure nested in the order year, month, day, keyword already exists; each folder is created if it is missing and skipped otherwise. Referring to FIG. 4C, the first Excel file supplies the names of the Excel files that hold the information for each central keyword, and the second Excel file is arranged so that the keywords for collecting data on the company itself, its competitors, and each industry, the exclusion keywords, and the media information are stored as hashtags. Referring to FIG. 4D, a list is created to hold the news data (news articles) to be collected, a for loop checks the news on each page, which holds 10 items, and a while loop turns the pages, repeating until no further page exists.
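The folder check and the page loop described above can be sketched as follows. The code of FIGS. 4A to 4D is not reproduced in the text, so everything here is an assumption for illustration: the function names, the zero-padded folder names, and `fetch_page`, a hypothetical callback that returns the articles of one result page and an empty list when the page does not exist.

```python
from datetime import date
from pathlib import Path


def keyword_folder(root, day, keyword):
    """Build the nested path root/year/month/day/keyword."""
    return Path(root) / f"{day.year:04d}" / f"{day.month:02d}" / f"{day.day:02d}" / keyword


def ensure_folder(folder):
    """Create the folder if it is missing; return False when it already existed (skip)."""
    if folder.exists():
        return False
    folder.mkdir(parents=True)
    return True


def collect_all_pages(fetch_page):
    """Walk result pages with a while loop and each page's items with a for loop."""
    articles = []
    page = 1
    while True:
        items = fetch_page(page)
        if not items:          # no further page exists
            break
        for item in items:     # a page is described as holding 10 items
            articles.append(item)
        page += 1
    return articles
```

Separating path construction from folder creation keeps the path logic free of side effects, so it can be checked without touching the file system.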

Referring to FIG. 4E, the date field of a collected article also contains the media name and other unnecessary data, so only the date is extracted with a regular expression. Referring to FIG. 4F, news from a few hours or a few days ago is not shown in YYYY.MM.DD format, so the date is calculated and converted to that format. Referring to FIG. 4G, the collected results are saved as Excel files, and each file is then loaded again and saved as a sheet within a single Excel file.
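The date cleanup can be illustrated as below. The exact strings shown on the search page are not given in the text, so the two patterns are assumptions: an absolute date written as YYYY.MM.DD somewhere in the field, and a relative expression such as "3시간 전" (3 hours ago) or "2일 전" (2 days ago). The minute unit and the function name `normalize_date` are additions made for the sketch.

```python
import re
from datetime import datetime, timedelta

# Assumed patterns for the raw date field.
ABSOLUTE = re.compile(r"(\d{4})\.(\d{1,2})\.(\d{1,2})")
RELATIVE = re.compile(r"(\d+)\s*(분|시간|일)\s*전")  # minutes / hours / days ago


def normalize_date(raw, now):
    """Return the date as YYYY.MM.DD, or None when no date can be recognized."""
    match = ABSOLUTE.search(raw)
    if match:
        year, month, day = map(int, match.groups())
        return f"{year:04d}.{month:02d}.{day:02d}"
    match = RELATIVE.search(raw)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        delta = {
            "분": timedelta(minutes=amount),
            "시간": timedelta(hours=amount),
            "일": timedelta(days=amount),
        }[unit]
        return (now - delta).strftime("%Y.%m.%d")
    return None
```

Passing `now` in as a parameter, rather than calling `datetime.now()` inside the function, makes the conversion of relative dates reproducible.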

<Social Media Analysis>

Referring to FIGS. 3B and 4H, the collected news articles are tokenized, stop words are removed, and issue keywords associated with the main keyword are generated automatically with the Apriori algorithm. As shown in FIG. 4I, the issue keywords produced by the Apriori algorithm are combined with the main keyword, and the cafe name, title, date, view count, post content, and comment information are collected from Naver Cafe. Naver shows at most 1,000 cafe posts in its search results, so, to work around this limit, the search is run one day at a time. When all searches for a day are finished, 1 is added in a function called time_manage so that the cafe posts of the previous day are collected automatically, and the for loop repeats until the number accumulated in time_manage matches the maximum desired number of days. Because the computer runs faster than the web page responds, the program may conclude that certain data does not exist and terminate; Selenium's EC functions are therefore used so that data collection starts only after all the data is actually present on the web page.
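The day-by-day search that works around the result limit can be sketched as follows. The code of FIG. 4I is not reproduced in the text, so this is only an illustration: the counter is named `time_manage` after the description, while `daily_windows`, `collect_by_day`, and the callback `search_one_day` are hypothetical, and the browser automation itself is left out.

```python
from datetime import date, timedelta


def daily_windows(end_day, max_days):
    """Yield end_day, then the day before, and so on, for max_days days."""
    time_manage = 0                      # number of days already searched
    while time_manage < max_days:
        yield end_day - timedelta(days=time_manage)
        time_manage += 1                 # move one day further back


def collect_by_day(search_one_day, end_day, max_days):
    """Run one search per day so that no single query exceeds the result limit."""
    posts = []
    for day in daily_windows(end_day, max_days):
        posts.extend(search_one_day(day))
    return posts
```

In a real crawler, `search_one_day` would be the place to wait until the page elements are present before reading them, which is the role the description assigns to Selenium's expected conditions.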

As shown in FIG. 4J, only the title, body, and comments of the collected data are loaded and text mining begins; as shown in FIG. 4K, the three classes are merged into a single column. Referring to FIG. 4L, the TF-IDF and Apriori algorithms used in the process described above are applied again to extract, as nouns, what customers on SNS think about the keyword and the issue. Referring to FIG. 4M, the extracted data is visualized: nodes that are close together are strongly related, and the larger a node, the more often its keyword was mentioned. The result is shown in FIG. 4N.

Referring to FIGS. 5A and 5B, RPA and text mining make it possible to automate tasks that people used to perform manually, such as monitoring, report writing, competitor analysis, and market analysis. The technologies used may be RPA, R, and Python.

Matters not described with respect to the method for providing a news analysis service using RPA monitoring of FIGS. 2 to 5 are the same as, or can be easily inferred from, what was described above with reference to FIG. 1, and their description is omitted.

FIG. 6 is a diagram illustrating a process in which data is transmitted and received among the components included in the system for providing a news analysis service using RPA monitoring of FIG. 1, according to an embodiment of the present invention. An example of this process is described below with reference to FIG. 6, but the present application is not to be construed as limited to this embodiment, and it is apparent to those skilled in the art that the process of transmitting and receiving data shown in FIG. 6 may be changed according to the various embodiments described above.

Referring to FIG. 6, the news analysis service providing server clips news data using RPA (Robotic Process Automation) so that it corresponds to a previously input keyword file (S6100).

The news analysis service providing server then performs natural language processing on the clipped news data and generates, using an association rule algorithm, issue keywords associated with the main keyword of the news data (S6200), and collects social media data using the main keyword and the issue keywords and performs text mining (S6300).

The news analysis service providing server also extracts noun keywords from the text-mined social media data and visualizes them as a keyword network (S6400).

The order of the steps described above (S6100 to S6400) is merely an example and is not limiting. That is, the order of the steps (S6100 to S6400) may be changed, and some of the steps may be executed simultaneously or omitted.

Matters not described with respect to the method for providing a news analysis service using RPA monitoring of FIG. 6 are the same as, or can be easily inferred from, what was described above with reference to FIGS. 1 to 5, and their description is omitted.

The method for providing a news analysis service using RPA monitoring according to the embodiment described with reference to FIG. 6 may also be implemented in the form of a recording medium containing computer-executable instructions, such as an application or program module executed by a computer. A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and nonvolatile media and removable and non-removable media. A computer-readable medium may also include any computer storage medium. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The method for providing a news analysis service using RPA monitoring according to an embodiment of the present invention described above may be executed by an application installed on a terminal by default (which may include a program included in a platform, operating system, or the like that is installed on the terminal by default), or by an application (that is, a program) that a user installs directly on a master terminal through an application providing server such as an application store server or a web server related to the application or the service. In this sense, the method may be implemented as an application (that is, a program) installed on a terminal by default or installed directly by a user, and may be recorded on a computer-readable recording medium of the terminal or the like.

The foregoing description of the present invention is given by way of example, and those of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing the technical idea or essential features of the invention. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in combined form.

The scope of the present invention is indicated by the claims that follow rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and from their equivalents should be construed as falling within the scope of the present invention.

Claims

A method for providing a news analysis service using RPA monitoring, executed in a news analysis service providing server, the method comprising:
clipping news data using RPA (Robotic Process Automation) so that the news data corresponds to a previously input keyword file;
performing natural language processing on the clipped news data and generating, using an association rule algorithm, an issue keyword associated with a main keyword of the news data;
collecting social media data using the main keyword and the issue keyword and performing text mining; and
extracting a noun keyword from the text-mined social media data and visualizing the noun keyword as a keyword network.

The method of claim 1,
wherein, in the step of clipping the news data to correspond to the previously input keyword file using the RPA,
the previously input keyword file includes: a first Excel file including main keyword data; and
a second Excel file including collection keyword data for data collection by company, competitor, and industry, negative keyword data specifying keywords to be excluded from collection, and media data for collecting the news data,
and
wherein the media data is stored as a hashtag.
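By way of illustration only, the clipping filter recited above might behave as in the following minimal sketch. It assumes the two Excel files have already been loaded into plain Python lists, and it assumes that "stored as a hashtag" means each media name carries a leading `#`. The `Article` type, the `clip_news` function, and all field names are hypothetical and do not appear in the source.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    media: str  # name of the outlet that published the article


def clip_news(articles, collect_keywords, negative_keywords, media_tags):
    """Keep articles from listed media that match a collection keyword
    and contain no negative keyword (hypothetical helper)."""
    # Assumption: media entries look like "#OutletName".
    allowed_media = {tag.lstrip("#") for tag in media_tags}
    clipped = []
    for article in articles:
        if article.media not in allowed_media:
            continue  # outlet is not in the media list
        if any(neg in article.title for neg in negative_keywords):
            continue  # excluded by a negative keyword
        if any(kw in article.title for kw in collect_keywords):
            clipped.append(article)
    return clipped
```

In this sketch a negative keyword takes priority over a collection keyword, which is one plausible reading of "a keyword to be excluded from collection".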

The method of claim 1,
wherein the step of performing natural language processing on the clipped news data and generating an issue keyword associated with the main keyword of the news data using the association rule algorithm comprises:
tokenizing the clipped news data and comment data included in the clipped news data, and removing stop words using TF-IDF (Term Frequency-Inverse Document Frequency); and
automatically generating an issue keyword associated with the main keyword by applying the association rule algorithm to the news data and the comment data.
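By way of illustration only, tokenization followed by TF-IDF-based stop word removal could be sketched as below. The sketch assumes a simple regular-expression tokenizer and treats any token whose TF-IDF weight does not exceed a threshold as a stop word; the function names and the threshold are hypothetical, and Korean text would in practice call for a morphological analyzer rather than this tokenizer.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (simplified tokenizer)."""
    return re.findall(r"\w+", text.lower())


def tfidf(docs):
    """Return one {token: tf-idf weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        weights.append(
            {t: (c / len(toks)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


def remove_stop_words(docs, threshold=0.0):
    """Drop tokens whose TF-IDF weight is at or below the threshold."""
    cleaned = []
    for text, w in zip(docs, tfidf(docs)):
        cleaned.append([t for t in tokenize(text) if w[t] > threshold])
    return cleaned
```

With the default threshold, a token that occurs in every document has an inverse document frequency of zero and is therefore removed.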

The method of claim 1,
wherein the step of extracting a noun keyword from the text-mined social media data and visualizing the noun keyword as a keyword network comprises:
extracting the noun keyword by applying Term Frequency-Inverse Document Frequency (TF-IDF) and the association rule algorithm.

The method of claim 1,
wherein the step of extracting a noun keyword from the text-mined social media data and visualizing the noun keyword as a keyword network comprises:
allocating a node to each noun keyword, placing nodes closer to each other as the relevance between the corresponding noun keywords increases, and making a node larger as the number of mentions of the corresponding noun keyword increases, thereby visualizing the keyword network.
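By way of illustration only, the node sizes and inter-node distances of such a keyword network could be derived as in the following sketch. It assumes that relevance between two noun keywords is measured by how often they co-occur in the same post, and that distance is the reciprocal of that count; both choices, along with the function name and `base_size`, are hypothetical. Rendering the layout is left to a graph drawing library.

```python
from collections import Counter
from itertools import combinations


def build_keyword_network(docs_keywords, base_size=10.0):
    """Return (nodes, edges) for a keyword network.

    nodes: {keyword: node size}, growing with the mention count.
    edges: {(kw_a, kw_b): target distance}, shrinking as co-occurrence grows.
    """
    # Count every mention of every keyword across all posts.
    mentions = Counter(k for kws in docs_keywords for k in kws)
    # Count the posts in which each pair of keywords appears together.
    cooccurrence = Counter()
    for kws in docs_keywords:
        for a, b in combinations(sorted(set(kws)), 2):
            cooccurrence[(a, b)] += 1
    nodes = {k: base_size * c for k, c in mentions.items()}
    edges = {pair: 1.0 / c for pair, c in cooccurrence.items()}
    return nodes, edges
```

The edge keys are alphabetically ordered pairs, so each pair of keywords appears only once.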

The method of claim 1,
wherein the association rule algorithm is an Apriori algorithm.
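By way of illustration only, the following sketch shows a level-wise Apriori search for frequent keyword sets, followed by selection of issue keywords from rules of the form {main keyword} -> {other keyword}. It assumes each article or comment has already been reduced to a list of tokens; the function names and the support and confidence thresholds are hypothetical and are not taken from the source.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while current:
        level = {c: s for c in current if (s := support(c)) >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def issue_keywords(transactions, main_keyword, min_support=0.5, min_confidence=0.6):
    """Return (keyword, confidence) pairs for rules {main_keyword} -> {keyword}."""
    freq = apriori(transactions, min_support)
    main = frozenset([main_keyword])
    if main not in freq:
        return []
    found = []
    for itemset, s in freq.items():
        if len(itemset) == 2 and main_keyword in itemset:
            confidence = s / freq[main]
            if confidence >= min_confidence:
                (other,) = itemset - main
                found.append((other, confidence))
    return sorted(found, key=lambda pair: (-pair[1], pair[0]))
```

The prune step relies on the Apriori property that every subset of a frequent itemset is itself frequent, which keeps the candidate set small at each level.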

The method of claim 1, further comprising,
after the step of extracting a noun keyword from the text-mined social media data and visualizing the noun keyword as a keyword network:
generating an insight report using the news data, the main keyword, the issue keyword, the social media data, and the noun keyword.

The method of claim 1,
wherein the step of collecting social media data using the main keyword and the issue keyword and performing text mining comprises:
searching the social media data for social media data associated with the issue keyword;
tokenizing content and comments included in the social media data associated with the issue keyword;
removing stop words using TF-IDF (Term Frequency-Inverse Document Frequency); and
applying the association rule algorithm to the content and the comments.

The method of claim 1, further comprising,
after the step of extracting a noun keyword from the text-mined social media data and visualizing the noun keyword as a keyword network:
sending the keyword network to a pre-stored e-mail address using the RPA.

A computer-readable recording medium having recorded thereon a program for executing the method of any one of claims 1 to 9.