KR20210033294A

KR20210033294A - Automatic manufacturing apparatus for reports, and control method thereof

Info

Publication number: KR20210033294A
Application number: KR1020190114855A
Authority: KR
Inventors: 유형선; 김지희; 정예림; 김은선
Original assignee: 한국과학기술정보연구원
Priority date: 2019-09-18
Filing date: 2019-09-18
Publication date: 2021-03-26
Also published as: KR102294555B1

Abstract

The present invention relates to a scheme for automatically generating an industry and market analysis report: it performs industry and market analysis on an analysis target, automatically extracts the key content from the analysis results, and writes the resulting implications as text sentences.

Description

Automatic report generation device and its operation method {AUTOMATIC MANUFACTURING APPARATUS FOR REPORTS, AND CONTROL METHOD THEREOF}

The present invention relates to a scheme for performing industry and market analysis on an analysis target, automatically extracting the key content from the analysis results, and automatically generating an industry and market analysis report in which the implications are written as text sentences.

Industry and market analysis is an essential step in judging the feasibility of a business. Every party engaged in technology commercialization, whether in industry, academia, research institutes, or government, needs to carry out industry and market analysis of some scale and form in order to make rational decisions throughout the technology commercialization process.

Methods of industry and market analysis can be broadly divided into qualitative methods, which rely on the qualitative judgment of experts or consumers, and quantitative methods, which are based on quantitative data. Quantitative analysis methodologies, which allow more objective analysis, and the analysis systems that support them have recently come into increasingly heavy use.

However, the quantitative analysis methodologies and analysis systems adopted in the existing technology have focused only on making the data collection, processing, and analysis steps easier.

That is, the existing technology merely helps industry and market analysts collect, process, and analyze data on an analysis target more easily, and it is further limited in that the analysis results are mostly summarized only in the form of tables or figures.

The present invention was created in view of the above circumstances. Its object is to perform industry and market analysis on an analysis target, automatically extract the key content from the analysis results, and automatically generate an industry and market analysis report in which the implications are written as text sentences.

An automatic report generation apparatus according to an embodiment of the present invention for achieving the above object comprises: a collection unit that collects original data related to an analysis target in order to generate a report on the analysis target; a classification unit that classifies the original data by the analysis categories designated as the format of the report and matches it to each analysis category as classification data; and an analysis unit that analyzes the classification data according to category characteristics predefined for each analysis category and generates analysis data for each analysis category.

Specifically, the automatic report generation apparatus may further include: an extraction unit that extracts key content from the analysis data of each analysis category; and a generation unit that converts the implications of the key content into text form according to predefined template-based text generation rules, and generates a report that presents the analysis data of each analysis category together with the implications converted into text form.
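The template-based text generation rule mentioned above can be pictured with a minimal sketch. The patent does not disclose the actual templates, so the template wording, the function name `render_implication`, and the sample figures below are all hypothetical placeholders rather than the disclosed implementation or real market data.

```python
# Hypothetical sentence templates keyed by the direction of change.
# The wording and placeholder names are illustrative assumptions.
TEMPLATES = {
    "increase": ("The {metric} of the {target} industry rose from {start:,} "
                 "in {start_year} to {end:,} in {end_year}, up {pct:.1f}%."),
    "decrease": ("The {metric} of the {target} industry fell from {start:,} "
                 "in {start_year} to {end:,} in {end_year}, down {pct:.1f}%."),
}


def render_implication(target, metric, start_year, start, end_year, end):
    """Turn one extracted change into a sentence using a fixed template."""
    if start <= 0:
        raise ValueError("start value must be positive to compute a rate")
    change_pct = (end - start) / start * 100
    key = "increase" if change_pct >= 0 else "decrease"
    return TEMPLATES[key].format(
        target=target, metric=metric,
        start_year=start_year, start=start,
        end_year=end_year, end=end,
        pct=abs(change_pct),
    )


# Placeholder values only; they are not taken from any real data source.
sentence = render_implication("elevator", "market size", 2015, 1000, 2018, 1250)
print(sentence)
```

In a fuller system one template set would presumably exist per analysis category, so that the same extracted change reads differently in a market size chapter than in a financial structure chapter.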

Specifically, based on a list of data repositories designated to match the category characteristics of each analysis category, the collection unit may collect the original data by changing or expanding the name entered as the analysis target into the name commonly used in each data repository on the list.

Specifically, when shared data exists, that is, classification data shared between two or more analysis categories, the analysis unit may designate for each item of shared data a master analysis category that holds ownership of it, so that refined data obtained by processing the shared data can be generated only in the master analysis category.

Specifically, ownership of the shared data for data processing may be assigned to any one of the two or more analysis categories based on at least one of the relevance between data and the data size.

Specifically, the shared data is processed into refined data during a data processing time designated between the two or more analysis categories, and the refined data may be updated for the analysis categories that do not hold ownership at the time the data processing time ends or at the time data processing is completed.
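The ownership scheme for shared data described above could be sketched as follows. The text only states that ownership depends on at least one of relevance and data size, so the concrete rule used here (highest relevance first, larger data size as tie-breaker), the `Candidate` fields, and the sample scores are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    category: str     # analysis category that uses the shared data
    relevance: float  # assumed relevance score between the data and the category
    data_size: int    # assumed amount of the shared data the category uses


def assign_master(candidates):
    """Pick the owning category: highest relevance, ties broken by data size."""
    if len(candidates) < 2:
        raise ValueError("shared data involves two or more analysis categories")
    return max(candidates, key=lambda c: (c.relevance, c.data_size)).category


def refine_and_publish(shared_rows, candidates, refine):
    """Refine once in the master category, then update the other categories."""
    master = assign_master(candidates)
    refined = refine(shared_rows)  # processing happens only for the master
    return {
        c.category: {"is_master": c.category == master, "data": list(refined)}
        for c in candidates
    }


# Hypothetical scores; the refine step here simply de-duplicates and sorts.
candidates = [
    Candidate("market structure", 0.8, 120),
    Candidate("competition status", 0.8, 300),
    Candidate("financial structure", 0.5, 900),
]
views = refine_and_publish([3, 1, 2, 3], candidates, lambda rows: sorted(set(rows)))
```

Processing the shared data once and copying the result to the non-owning categories avoids repeating the same refinement per category, which appears to be the point of designating a master analysis category.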

Specifically, according to content selection weights under the key content extraction rules of each analysis category, the extraction unit may extract as key content at least one of: content on changes in a numerical physical quantity identified from the analysis data; content comparing the numerical physical quantity between categories; and content on the specific term with the highest frequency of occurrence.
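One way to read the content selection weights is as per-category multipliers applied to candidate findings. The sketch below assumes each candidate carries a content type and a salience score; the weight values, the category names, and the scoring by simple multiplication are hypothetical and not specified in the patent.

```python
# Hypothetical content selection weights per analysis category.
CONTENT_WEIGHTS = {
    "market_size": {"change": 0.6, "comparison": 0.3, "top_term": 0.1},
    "environment": {"change": 0.2, "comparison": 0.2, "top_term": 0.6},
}


def select_key_content(category, candidates, top_n=1):
    """Rank candidate findings by weight * salience and keep the top ones."""
    weights = CONTENT_WEIGHTS[category]
    ranked = sorted(
        candidates,
        key=lambda c: weights.get(c["type"], 0.0) * c["score"],
        reverse=True,
    )
    return ranked[:top_n]


# Candidate findings with assumed salience scores in [0, 1].
findings = [
    {"type": "change", "score": 0.5, "text": "value changed over the period"},
    {"type": "top_term", "score": 0.9, "text": "most frequent term in articles"},
    {"type": "comparison", "score": 0.7, "text": "value differs between categories"},
]
market_pick = select_key_content("market_size", findings)[0]["type"]
environment_pick = select_key_content("environment", findings)[0]["type"]
```

With these assumed weights the same findings yield a change-oriented highlight for a market size chapter and a term-oriented highlight for an environment chapter.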

Specifically, according to time selection weights under the key content extraction rules of each analysis category, the extraction unit may extract as key content at least one of: content on recent rather than past changes identified from the analysis data; content on future prospects; and content on the period before and after a specific event.

Specifically, when a hierarchical structure including an upper-level category that encompasses a plurality of same-level categories is identified from the analysis data, the extraction unit may, according to category selection weights under the key content extraction rules of each analysis category, extract as key content at least one of: analysis content on the upper level; content on a new upper-level category newly derived from a plurality of same-level categories; content on a lower-level category that differs from the statistical physical quantity of the upper-level category by a set value or more; content on the same-level category most relevant to the analysis target; content on the same-level category with the highest or lowest physical quantity; content on same-level categories whose physical quantity falls within a set rank or set ratio; and content on the same-level category with the largest change in physical quantity.
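Several of the hierarchy-based criteria above are simple comparisons and can be sketched directly. The function below covers four of them (deviation from the upper-level value by a set value or more, highest and lowest value, values within a set rank, and largest change); the function name, the input layout, and the sub-category names and numbers are hypothetical.

```python
def hierarchy_highlights(upper_value, lower, threshold, top_k=1):
    """Select notable same-level categories under one upper-level category.

    lower maps a category name to (previous_value, current_value).
    """
    if not lower:
        raise ValueError("at least one lower-level category is required")
    current = {name: cur for name, (_, cur) in lower.items()}
    ranked = sorted(current, key=current.get, reverse=True)
    return {
        # lower-level categories that differ from the upper level by >= threshold
        "outliers": sorted(
            n for n, v in current.items() if abs(v - upper_value) >= threshold
        ),
        "highest": ranked[0],
        "lowest": ranked[-1],
        "top_k": ranked[:top_k],  # categories within the set rank
        "largest_change": max(lower, key=lambda n: abs(lower[n][1] - lower[n][0])),
    }


# Placeholder growth rates (%) for made-up sub-categories of one industry.
result = hierarchy_highlights(
    upper_value=5.0,
    lower={"passenger": (4.0, 5.5), "freight": (3.0, 2.0), "escalator": (6.0, 12.0)},
    threshold=3.0,
    top_k=2,
)
```

Which of these selections actually reaches the report would then depend on the category selection weights of the analysis category in question.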

An operation method of an automatic report generation apparatus according to an embodiment of the present invention for achieving the above object comprises: a collection step of collecting original data related to an analysis target in order to generate a report on the analysis target; a classification step of classifying the original data by the analysis categories designated as the format of the report and matching it to each analysis category as classification data; and an analysis step of analyzing the classification data according to category characteristics predefined for each analysis category and generating analysis data for each analysis category.

Specifically, the method may further include: an extraction step of extracting key content from the analysis data of each analysis category; and a generation step of converting the implications of the key content into text form according to predefined template-based text generation rules, and generating a report that presents the analysis data of each analysis category together with the implications converted into text form.

Specifically, in the collection step, based on a list of data repositories designated to match the category characteristics of each analysis category, the original data may be collected by changing or expanding the name entered as the analysis target into the name commonly used in each data repository on the list.

Specifically, in the analysis step, when shared data exists, that is, classification data shared between two or more analysis categories, a master analysis category that holds ownership may be designated for each item of shared data, so that refined data obtained by processing the shared data can be generated only in the master analysis category.

Specifically, in the extraction step, according to content selection weights under the key content extraction rules of each analysis category, at least one of the following may be extracted as key content: content on changes in a numerical physical quantity identified from the analysis data; content comparing the numerical physical quantity between categories; and content on the specific term with the highest frequency of occurrence.

Specifically, in the extraction step, according to time selection weights under the key content extraction rules of each analysis category, at least one of the following may be extracted as key content: content on recent rather than past changes identified from the analysis data; content on future prospects; and content on the period before and after a specific event.

Specifically, in the extraction step, when a hierarchical structure including an upper-level category that encompasses a plurality of same-level categories is identified from the analysis data, at least one of the following may be extracted as key content according to category selection weights under the key content extraction rules of each analysis category: analysis content on the upper level; content on a new upper-level category newly derived from a plurality of same-level categories; content on a lower-level category that differs from the statistical physical quantity of the upper-level category by a set value or more; content on the same-level category most relevant to the analysis target; content on the same-level category with the highest or lowest physical quantity; content on same-level categories whose physical quantity falls within a set rank or set ratio; and content on the same-level category with the largest change in physical quantity.

Accordingly, the automatic report generation apparatus and its operation method of the present invention allow the entire process of industry and market analysis to be performed systematically and efficiently through an automated process. This enables high-speed, large-volume analysis in a short time even for many industry and item fields, regardless of the number of analysis targets, and thereby greatly reduces the time and cost required for industry and market analysis.

In addition, going beyond the limitation of existing quantitative systems, which provided results only in the form of tables and figures, the invention provides the key features and analytical insights that can be derived from the analysis results through explanatory sentences. This improves information users' understanding and provides more valuable implications without depending on each user's level of interpretation.

FIG. 1 is a schematic configuration diagram for explaining an automatic report generation environment according to an embodiment of the present invention.
FIG. 2 is a configuration diagram for explaining the configuration of an automatic report generation apparatus according to an embodiment of the present invention.
FIGS. 3 to 16 are exemplary views for explaining the forms derived as analysis results according to an embodiment of the present invention.
FIG. 17 is an exemplary view for explaining report formation according to an embodiment of the present invention.
FIG. 18 is a flowchart for explaining an operation method of an automatic report generation apparatus according to an embodiment of the present invention.

Hereinafter, an embodiment of the present invention will be described with reference to the accompanying drawings.

FIG. 1 shows an automatic report generation environment according to an embodiment of the present invention.

As shown in FIG. 1, the automatic report generation environment according to an embodiment of the present invention may include an automatic report generation apparatus 100 that automatically generates a report based on industry and market analysis results.

The automatic report generation apparatus 100 refers to an apparatus for performing industry and market analysis and automatically generating a report corresponding to the analysis results. It may be implemented in the form of a server accessible through a wired or wireless communication network, or in the form of a program within a computer system (e.g., a computer or a mobile phone).

When the automatic report generation apparatus 100 is implemented in the form of a server, it may be implemented as, for example, a web server, a database server, or a proxy server. One or more of a network load balancing mechanism and various software that allows the service device to operate on the Internet or another network may be installed, through which it may also be implemented as a computerized system. The network may be an HTTP network, a private line, an intranet, or any other network. Furthermore, the connections between the components of the system according to an embodiment of the present invention may be made over a secure network so that the data is not attacked by hackers or other third parties.

Meanwhile, methods of industry and market analysis can generally be divided into qualitative methods, which rely on the qualitative judgment of experts or consumers, and quantitative methods, which are based on quantitative data. Quantitative analysis methodologies, which allow more objective analysis, and the analysis systems that support them have recently come into increasingly heavy use.

However, because such quantitative analysis methodologies and analysis systems focus only on making the data collection, processing, and analysis steps easier, they merely help industry and market analysts collect, process, and analyze data on an analysis target more easily. They are further limited in that the analysis results are mostly summarized only in the form of tables or figures.

Therefore, if a scheme is proposed that can automatically perform not only the collection, processing, and analysis of industry and market data but also its interpretation and output, it is expected to be a very effective means of supporting industry and market analysis, not only for the analysts who produce the information but also for the consumers who use the analyzed information.

In addition, if key content can be extracted automatically and implications derived from it, high-level use of information will be possible regardless of differences in analysts' individual analytical skills or information users' interpretive abilities. Furthermore, if an industry and market analysis report can be generated automatically by interpreting the meaning of the key content and writing the implications as text sentences, the time and cost involved are expected to be greatly reduced, particularly when reports are produced by repeatedly performing industry and market analysis on many analysis targets.

Accordingly, an embodiment of the present invention proposes a new scheme for performing industry and market analysis on an analysis target, automatically extracting the key content from the analysis results, and automatically generating an industry and market analysis report in which the implications are written as text sentences. The configuration of the automatic report generation apparatus 100 for realizing this is described in more detail below.

FIG. 2 shows the configuration of the automatic report generation apparatus 100 according to an embodiment of the present invention.

As shown in FIG. 2, the automatic report generation apparatus 100 according to an embodiment of the present invention may have a configuration including a collection unit 10 that collects the data to be analyzed, a classification unit 20 that classifies the collected data, and an analysis unit 30 that analyzes the classified data.

In addition to the above components, the automatic report generation apparatus 100 according to an embodiment of the present invention may further include an extraction unit 40 that extracts key content and a generation unit 50 that generates a report.

All or at least part of the configuration of the automatic report generation apparatus 100, including the collection unit 10, classification unit 20, analysis unit 30, extraction unit 40, and generation unit 50, may be implemented as a hardware module or a software module, or as a combination of a hardware module and a software module.

Here, a software module may be understood as, for example, instructions executed by a processor that controls operations within the automatic report generation apparatus 100, and these instructions may be loaded in a memory within the automatic report generation apparatus 100.

Meanwhile, in addition to the above components, the automatic report generation apparatus 100 according to an embodiment of the present invention may further include a communication unit 60, an RF module responsible for the actual communication function with remote devices through a wired or wireless communication network.

Here, the communication unit 60 includes, for example and without limitation, an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a codec (CODEC) chipset, and a memory, and may include any known circuit that performs this function.

As described above, the automatic report generation apparatus 100 according to an embodiment of the present invention can perform industry and market analysis on an analysis target, automatically extract the key content from the analysis results, and automatically generate an industry and market analysis report in which the implications are written as text sentences. Each component within the automatic report generation apparatus 100 for realizing this is described in more detail below.

The collection unit 10 performs the function of collecting the data to be analyzed.

More specifically, the collection unit 10 collects data related to the analysis target (hereinafter, original data) in order to generate a report on the analysis target.

Here, the analysis target is information on an industry or an item and may be entered by the user together with information on the report format (composition). The report format may be designated, as specified by the user, to include two or more analysis categories.

Accordingly, when an analysis target is entered, the collection unit 10 checks, for each analysis category designated as the report format, the list of data repositories designated to match that category's characteristics, and may collect the original data by changing or expanding the name entered as the analysis target into the name commonly used in each data repository on the list (e.g., a classification code).

That is, when collecting original data on the analysis target, the collection unit 10 collects it from the designated data repositories matched to each analysis category designated as the report format. This restricts the collection of additional data other than the original data suited to the analysis categories, and so minimizes the time required to collect and analyze the original data.

For example, 'elevator' (승강기) may be entered as the analysis target, and the analysis categories may be designated as the report format in the following order: Chapter 1 Definition and Overview, Chapter 2 Environmental Analysis, Chapter 3 Market Structure Analysis, Chapter 4 Competition Status Analysis, Chapter 5 Market Size Estimation and Forecast, Chapter 6 Financial Structure Analysis, and Chapter 7 Implication Indicator Analysis.

In this case, the analysis category for Chapter 1 Definition and Overview may be matched with the data repositories of the Korea Institute of Science and Technology Information's KMAPS, Statistics Korea, and news articles, and the analysis category for Chapter 2 Environmental Analysis may be matched with the data repositories of KMAPS, news articles, Internet encyclopedias, and government ministries and public institutions.

In addition, the analysis category for Chapter 3 Market Structure Analysis may be matched with the data repositories of the Korea Institute of Science and Technology Information's KMAPS and the Bank of Korea, and the analysis category for Chapter 4 Competition Status Analysis may be matched with the data repositories of KMAPS, the Financial Supervisory Service, and private credit rating agencies.

Furthermore, the analysis category for Chapter 5 Market Size Estimation and Forecast may be matched with the data repositories of the Korea Institute of Science and Technology Information's KMAPS, Statistics Korea, the Korea Customs Service, and the statistical offices of foreign countries including the United States; the analysis category for Chapter 6 Financial Structure Analysis with those of private credit rating agencies; and the analysis category for Chapter 7 Implication Indicator Analysis with that of KMAPS.
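The chapter-to-repository matching in this example amounts to a lookup table. A minimal sketch follows, with the repository names taken from the example above; the dictionary layout, the chapter keys, and the helper `repositories_for` are hypothetical and only illustrate how collection might be restricted to the repositories a given report format needs.

```python
# Repositories matched to each analysis category, as listed in the example.
# The keys and the data structure itself are illustrative assumptions.
REPOSITORIES_BY_CATEGORY = {
    "1 Definition and Overview": ["KMAPS", "Statistics Korea", "news articles"],
    "2 Environmental Analysis": [
        "KMAPS", "news articles", "Internet encyclopedias",
        "government ministries and public institutions",
    ],
    "3 Market Structure Analysis": ["KMAPS", "Bank of Korea"],
    "4 Competition Status Analysis": [
        "KMAPS", "Financial Supervisory Service", "private credit rating agencies",
    ],
    "5 Market Size Estimation and Forecast": [
        "KMAPS", "Statistics Korea", "Korea Customs Service",
        "foreign national statistical offices",
    ],
    "6 Financial Structure Analysis": ["private credit rating agencies"],
    "7 Implication Indicator Analysis": ["KMAPS"],
}


def repositories_for(report_format):
    """Map each needed repository to the chapters that will use its data."""
    needed = {}
    for category in report_format:
        for repo in REPOSITORIES_BY_CATEGORY[category]:
            needed.setdefault(repo, []).append(category)
    return needed


plan = repositories_for(
    ["3 Market Structure Analysis", "6 Financial Structure Analysis"]
)
```

A report format that omits a chapter never touches that chapter's repositories, which matches the stated aim of limiting collection to data suited to the designated analysis categories.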

Meanwhile, when 'elevator' is entered as the analysis target, the following classification codes corresponding to that target may be selected: Korean Standard Industrial Classification C29162 (elevator manufacturing); MTI classification 745100 (elevators) and 745200 (escalators); Korean goods and services classification 54691 (lift and elevator installation services) and 87158 (elevator and escalator maintenance and repair services); and North American Industry Classification System (NAICS) 333921 (Elevator and Moving Stairway Manufacturing).

Based on this, structured data such as market size data, sales data by company, and inter-industry transaction data, classified and organized in advance for the Korean Standard Industrial Classification 'elevator manufacturing', can be selected and downloaded from the web page of KMAPS (KISTI Market Analysis and Prediction System), the industry and market analysis system of the Korea Institute of Science and Technology Information. From the web page of Statistics Korea, unstructured descriptive data on the industry, such as its definition, specific scope, and main products, can be web-crawled, and pre-classified structured data on shipment value, number of companies, number of employees, and so on for the industry and its lower-level items can be selected and downloaded.

In addition, structured data on key financial items and input-output analysis, classified in advance for the industry, may be selected and downloaded from the Bank of Korea web page, and structured and unstructured data on the profiles and finances of companies in the industry, likewise classified in advance, may be selected and downloaded from the Financial Supervisory Service web page.

Likewise, structured data on the profiles and finances of companies in the industry, classified in advance, may be selected and downloaded from the web pages of private credit rating agencies. Structured data on shipment value, number of establishments, number of employees, imports and exports, and the like, classified and organized in advance under North American Industry Classification System code 333921, may be selected and downloaded from the web page of the U.S. statistical office, and import and export data classified and organized in advance under MTI codes 745100 to 745200 may be selected and downloaded from the Korea International Trade Association web page.

Furthermore, major news articles and unstructured data on definitions and characteristics may be web-crawled from news portals and internet encyclopedia web pages by searching with terms related to the analysis target 'elevator', such as '승강기' (lift), '엘리베이터' (elevator), and '에스컬레이터' (escalator). Unstructured data retrieved with the same search terms from sources of industry and market analysis reports published by government ministries and public institutions, such as the SME Technology Roadmap report of the Ministry of SMEs and Startups, may be downloaded or web-crawled.

The classification unit 20 matches classification data to each analysis category.

More specifically, once the collection of source data for each analysis category specified by the report format is complete, the classification unit 20 classifies the source data and matches it to each analysis category as that category's classification data.

As in the preceding example, suppose that 'elevator' is entered as the analysis target and that the report format specifies the analysis categories in the order Chapter 1 (Definition and Overview), Chapter 2 (Environmental Analysis), Chapter 3 (Market Structure Analysis), Chapter 4 (Competition Status Analysis), Chapter 5 (Market Size Estimation and Forecast), Chapter 6 (Financial Structure Analysis), and Chapter 7 (Implication Indicator Analysis). In that case, the industry description data from Statistics Korea and news article data may be matched to the analysis category for Chapter 1, and news article data, internet encyclopedia data, and industry and market analysis reports published by government ministries and public institutions may be classified and matched to the analysis category for Chapter 2.

In addition, the inter-industry transaction data of KISTI's KMAPS and the input-output analysis data of the Bank of Korea may be matched to the classification category for Chapter 3 (Market Structure Analysis), and the sales data by company of KISTI's KMAPS, the corporate financial data of the Financial Supervisory Service, and the corporate financial data of private credit rating agencies may be matched to the classification category for Chapter 4 (Competition Status Analysis).

Furthermore, the market size data of KISTI's KMAPS, the shipment data of Statistics Korea, the import and export data of the Korea Customs Service, and the shipment and import and export data of the U.S. statistical office may be matched to the classification category for Chapter 5 (Market Size Estimation and Forecast); the corporate financial data of private credit rating agencies may be matched to the classification category for Chapter 6 (Financial Structure Analysis); and the KMAPS Index data of KISTI's KMAPS may be matched to the classification category for Chapter 7 (Implication Indicator Analysis).
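
The chapter-to-source matching described above can be pictured as a simple lookup. The following is a minimal Python sketch, not the patented implementation: the names `CHAPTER_SOURCES` and `classify`, the source labels, and the record format with a `source` field are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical labels for the sources that the text matches to each chapter.
CHAPTER_SOURCES = {
    1: ["Statistics Korea industry description", "news articles"],
    2: ["news articles", "internet encyclopedia", "government/public reports"],
    3: ["KMAPS inter-industry transactions", "Bank of Korea input-output analysis"],
    4: ["KMAPS sales by company", "FSS corporate financials",
        "credit agency corporate financials"],
    5: ["KMAPS market size", "Statistics Korea shipments",
        "Korea Customs Service trade", "U.S. shipments and trade"],
    6: ["credit agency corporate financials"],
    7: ["KMAPS Index"],
}


def classify(records):
    """Group collected records under every chapter whose source list names them."""
    by_chapter = defaultdict(list)
    for record in records:
        for chapter, sources in CHAPTER_SOURCES.items():
            if record["source"] in sources:
                by_chapter[chapter].append(record)
    return dict(by_chapter)


# Hypothetical collected records; the payloads are placeholders.
records = [
    {"source": "news articles", "payload": "article text"},
    {"source": "KMAPS Index", "payload": "index values"},
]
matched = classify(records)
```

A record from a source listed under several chapters, such as news articles, is matched to each of them, which is how data shared between categories can arise.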

The analysis unit 30 generates analysis data for each analysis category.

More specifically, once the data has been classified for each analysis category specified by the report format, the analysis unit 30 analyzes the classification data according to the characteristics of each category and generates analysis data for each analysis category.

In generating the analysis data for each analysis category, the analysis unit 30 may perform a data processing procedure that refines the data into refined data, a form suited to data analysis in that category.

For example, numerical data on market size or key financial items may be unified to a common unit such as KRW 100 million (억 원). For missing values, continuous data may be imputed by mean imputation, regression imputation, maximum-likelihood imputation, or multiple imputation, and categorical data may be imputed by mode imputation or by supervised or semi-supervised learning. For outliers, cases whose values fall outside a predetermined range, cases whose values fall within a specified proportion of all cases, or cases identified as outliers by statistical analysis may be detected and handled as exceptions.
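
The refinement steps above can be sketched with the Python standard library. This is an illustrative sketch only: the function names and sample values are hypothetical, and it covers just the simplest options named in the text (unit conversion, mean and mode imputation, range-based outlier detection).

```python
from statistics import mean, mode


def to_eok_won(value_million_won):
    """Convert a value in KRW million to KRW 100 million (억 원)."""
    return value_million_won / 100


def impute_mean(values):
    """Replace None in a continuous series with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_mode(values):
    """Replace None in a categorical series with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    fill = mode(observed)
    return [fill if v is None else v for v in values]


def out_of_range(values, low, high):
    """Return the indices of cases whose value lies outside [low, high]."""
    return [i for i, v in enumerate(values) if not (low <= v <= high)]


# Hypothetical sample data.
sales = impute_mean([10.0, None, 14.0])
sizes = impute_mode(["small", None, "small", "large"])
flags = out_of_range([1.0, 2.0, 250.0], low=0.0, high=100.0)
```

Regression, maximum-likelihood, and multiple imputation, as well as model-based outlier detection, would need a statistics library and are left out of this sketch.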

In addition, across analysis categories, heterogeneous data such as the market size data of KISTI's KMAPS, the shipment data of Statistics Korea, the import and export data of the Korea Customs Service, and the shipment and import and export data of the U.S. statistical office may be matched, linked, and combined by calculation according to predetermined rules for Chapter 5 (Market Size and Forecast).

In this regard, [Table 1] shows shipment data classified and organized under the Korean Standard Industrial Classification 'elevator manufacturing', as collected from the Statistics Korea web page, and [Table 2] shows import and export data classified and organized under MTI codes 745100 to 745200, as collected from the Korea International Trade Association web page.

[Table 1] Shipment value for Korean Standard Industrial Classification 'elevator manufacturing' (unit: KRW million)

| Year | Shipment value |
|------|----------------|
| 2007 | 2,795,480 |
| 2008 | 2,790,190 |
| 2009 | 2,682,510 |
| 2010 | 2,889,870 |
| 2011 | 3,214,950 |
| 2012 | 3,334,220 |
| 2013 | 2,722,460 |
| 2014 | 3,300,340 |
| 2015 | 3,789,260 |
| 2016 | 4,238,910 |
| 2017 | 4,828,500 |

[Table 2] Export and import value for MTI codes 745100 to 745200 (unit: USD thousand)

| Year | Exports 745100 | Exports 745200 | Imports 745100 | Imports 745200 |
|------|----------------|----------------|----------------|----------------|
| 2007 | 191,111 | 5,382 | 32,349 | 39,249 |
| 2008 | 235,005 | 6,915 | 11,669 | 51,386 |
| 2009 | 239,516 | 824 | 13,553 | 42,427 |
| 2010 | 70,669 | 2,842 | 14,241 | 46,432 |
| 2011 | 74,944 | 284 | 12,548 | 54,989 |
| 2012 | 69,323 | 2,857 | 5,045 | 38,750 |
| 2013 | 72,115 | 699 | 10,615 | 41,861 |
| 2014 | 54,290 | 0 | 12,521 | 49,692 |
| 2015 | 86,514 | 0 | 6,003 | 75,753 |
| 2016 | 56,408 | 36 | 6,753 | 61,350 |
| 2017 | 47,819 | 90 | 1,656 | 53,667 |
| 2018 | 52,824 | 0 | 1,224 | 51,617 |

[Table 3] Shipment value by detailed item related to Korean Standard Industrial Classification 'elevator manufacturing' (unit: KRW million)

| Year | Elevators | Escalators | Lift equipment | Parking machines | Elevator parts |
|------|-----------|------------|----------------|------------------|----------------|
| 2007 | 12,624 | 1,117 | 1,346 | 873 | 3,382 |
| 2008 | 14,089 | 236 | 1,525 | 686 | 3,766 |
| 2009 | 12,129 | 2,055 | 1,227 | 1,205 | 3,464 |
| 2010 | 12,277 | 1,651 | 703 | 1,946 | 4,276 |
| 2011 | 13,208 | 1,818 | 1,045 | 1,245 | 4,220 |
| 2012 | 8,791 | 1,951 | 659 | 7,096 | 5,279 |
| 2013 | 11,525 | 1,224 | 510 | 1,401 | 3,969 |
| 2014 | 14,660 | 902 | 425 | 1,150 | 4,975 |
| 2015 | 20,818 | - | 969 | 1,771 | 5,272 |
| 2016 | 21,363 | 822 | 1,211 | 2,307 | 6,073 |
| 2017 | 24,333 | - | 1,488 | 2,522 | 7,658 |

As shown in [Table 1], shipment data for the Korean Standard Industrial Classification 'elevator manufacturing', organized in units of KRW 1 million, is selected and collected from Statistics Korea, and, as shown in [Table 3], shipment data for the detailed items related to 'elevator manufacturing' is likewise selected and collected from Statistics Korea. As shown in [Table 2], export and import values for MTI codes 745100 to 745200, organized in units of USD 1,000, are selected and collected from the Korea International Trade Association. The missing shipment values for the escalator item in 2015 and 2017 in [Table 3] are replaced with KRW 900 million and KRW 800 million, respectively, by moving-average imputation, and outlier detection modeling confirms that no outliers are present. The heterogeneous shipment and trade data are then linked, annual average KRW/USD exchange rate data is selected and collected from the Bank of Korea, and the monetary units of the shipment and trade values are unified to KRW 100 million. Finally, subtracting the export value from the shipment value and adding the import value yields, as newly processed data, the domestic market size of the analysis target 'elevator', as shown in [Table 4].

[Table 4] Domestic market size for 'elevator' (unit: KRW 100 million)

| Year | Domestic market size |
|------|----------------------|
| 2007 | 26,794 |
| 2008 | 25,930 |
| 2009 | 24,472 |
| 2010 | 28,750 |
| 2011 | 32,064 |
| 2012 | 33,022 |
| 2013 | 27,002 |
| 2014 | 33,087 |
| 2015 | 37,839 |
| 2016 | 42,524 |
| 2017 | 48,369 |
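
The calculation that produces the domestic market size (shipments minus exports plus imports, after unit conversion) can be checked for a single year. The sketch below uses the 2017 figures from Tables 1 and 2. The exchange rate is an assumption: the text says only that annual average KRW/USD rates are taken from the Bank of Korea, so the value of 1,130 KRW per USD and the function name are illustrative.

```python
def domestic_market_eok_won(shipments_million_won, exports_thousand_usd,
                            imports_thousand_usd, won_per_usd):
    """Domestic market size = shipments - exports + imports, in KRW 100 million."""
    shipments = shipments_million_won / 100        # KRW million -> KRW 100 million
    to_eok = won_per_usd * 1_000 / 100_000_000     # USD thousand -> KRW 100 million
    exports = exports_thousand_usd * to_eok
    imports = imports_thousand_usd * to_eok
    return shipments - exports + imports


# Assumed annual average exchange rate (KRW per USD); not stated in the source.
ASSUMED_RATE = 1_130.0

# 2017 figures: Table 1 shipments, Table 2 trade values (MTI 745100 + 745200).
size_2017 = domestic_market_eok_won(
    shipments_million_won=4_828_500,
    exports_thousand_usd=47_819 + 90,
    imports_thousand_usd=1_656 + 53_667,
    won_per_usd=ASSUMED_RATE,
)
```

Under this assumed rate, `size_2017` is about 48,368.8, which rounds to the 48,369 reported for 2017 in Table 4.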

In generating the analysis data for each analysis category, the analysis unit 30 may also perform natural language processing on unstructured data.

For example, for the unstructured data on companies' main product names in the company data collected for the competition status analysis, natural language processing such as morphological analysis, term extraction, term frequency analysis, term-document matrix generation, term co-occurrence frequency analysis, inter-term association analysis, inter-document similarity analysis, and topic modeling may be performed.
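
Three of the steps listed above (term frequency, term-document matrix, and term co-occurrence) can be sketched with the standard library. This is a simplified illustration: the whitespace `tokenize` function is a stand-in for real morphological analysis, and the product-name strings are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Stand-in for morphological analysis: lowercase whitespace split."""
    return text.lower().split()


def term_frequency(docs):
    """Count how often each term occurs across all documents."""
    return Counter(term for doc in docs for term in tokenize(doc))


def term_document_matrix(docs):
    """Return the sorted vocabulary and one row of term counts per document."""
    vocab = sorted({t for doc in docs for t in tokenize(doc)})
    rows = [[tokenize(doc).count(t) for t in vocab] for doc in docs]
    return vocab, rows


def cooccurrence(docs):
    """Count the documents in which each pair of distinct terms appears together."""
    pairs = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(tokenize(doc))), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical main-product-name strings.
docs = ["passenger elevator", "freight elevator", "escalator handrail"]
tf = term_frequency(docs)
vocab, tdm = term_document_matrix(docs)
co = cooccurrence(docs)
```

The term frequencies in `tf` are the kind of input a word cloud is drawn from, and the rows of `tdm` can feed document similarity or topic modeling.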

In addition, in generating the analysis data for each analysis category, the analysis unit 30 may present the analysis results as tables or figures according to the characteristics of each analysis category.

For example, for the analysis category for Chapter 5 (Market Size and Forecast), the future market size may be predicted from the domestic market size, itself estimated from the shipment value and the export and import values of the analysis target, using time-series methods such as the trend method, the averaging method, the smoothing method, and the autoregressive integrated moving average (ARIMA), as well as methods based on diffusion models such as the Bass, Gompertz, and logistic models. The result predicted by ARIMA, the best-fitting of these methods, may be presented as a table and a figure, as shown in FIG. 3 and FIG. 4. Analysis results that can additionally be derived from the shipment and export and import values, such as import dependence, export share, and the trade specialization index, may also be presented as tables and figures.
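
As a rough illustration of the forecasting step, the sketch below applies the simplest of the listed methods, a least-squares linear trend, to the domestic market size series in Table 4. It is not the ARIMA model that the text reports as the best fit, which would require a statistics library. The three trade indicators use commonly used definitions that the source does not spell out, so the formulas and all function names here are assumptions.

```python
def fit_linear_trend(series):
    """Ordinary least squares fit of y = a + b*t for t = 0..n-1."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def trend_forecast(series, horizon):
    """Extrapolate the fitted trend line for the next `horizon` periods."""
    intercept, slope = fit_linear_trend(series)
    n = len(series)
    return [intercept + slope * (n + h) for h in range(horizon)]


def trade_indicators(shipments, exports, imports):
    """Assumed definitions; all inputs must share one monetary unit."""
    domestic = shipments - exports + imports
    return {
        "import_dependence": imports / domestic,
        "export_share": exports / shipments,
        "trade_specialization_index": (exports - imports) / (exports + imports),
    }


# Domestic market size from Table 4 (KRW 100 million), 2007-2017.
market = [26_794, 25_930, 24_472, 28_750, 32_064, 33_022,
          27_002, 33_087, 37_839, 42_524, 48_369]
forecast_2018_2021 = trend_forecast(market, horizon=4)
```

In a fuller version, each candidate method would be fitted to the same series and the one with the lowest error would be selected, which is how the text arrives at ARIMA.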

For the analysis category for Chapter 3 (Market Structure Analysis), an upstream and downstream industry structure analysis may be performed on the refined data for the Korean Standard Industrial Classification 'elevator manufacturing', covering the upstream industries that supply parts to the industry and the downstream industries that consume the elevators it produces, and the result may be presented as a figure, as shown in FIG. 5.

For the analysis category for Chapter 4 (Competition Status Analysis), the sales-based market shares of the companies participating in the Korean Standard Industrial Classification 'elevator manufacturing' may be estimated from the refined and processed data and presented as a figure, as shown in FIG. 6. Market concentration may be analyzed using indicators such as the HHI (Hirschman-Herfindahl Index) and CR3 (three-firm concentration ratio), newly computed by the processing unit from the distribution of market shares by company, and presented as a figure, as shown in FIG. 7. Market share by company size, grouped into large, middle-market, medium-sized, and small enterprises, which are categories one level above the individual company, may be analyzed and presented as a figure, as shown in FIG. 8. The key financial items of major companies over the most recent three years may be compared and presented as a table, as shown in FIG. 9. Data obtained by natural language processing of the unstructured data on the main product names of companies in the industry may be analyzed as a word cloud and presented as a figure, as shown in FIG. 10, and the distribution of the number of employees and years in business of companies in the industry may be analyzed and presented as a figure, as shown in FIG. 11.
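
The two concentration indicators named above have standard definitions: the HHI is the sum of squared market shares, and CR3 is the combined share of the three largest firms. A minimal sketch, with hypothetical company names and sales figures:

```python
def market_shares(sales_by_company):
    """Sales-based market share of each company, in percent."""
    total = sum(sales_by_company.values())
    return {name: 100 * sales / total for name, sales in sales_by_company.items()}


def hhi(shares_percent):
    """Sum of squared shares; with shares in percent the maximum is 10,000."""
    return sum(s ** 2 for s in shares_percent)


def concentration_ratio(shares_percent, k=3):
    """Combined share of the k largest companies (CR3 when k=3)."""
    return sum(sorted(shares_percent, reverse=True)[:k])


# Hypothetical sales by company.
sales = {"A": 50, "B": 30, "C": 15, "D": 5}
shares = market_shares(sales)
hhi_value = hhi(shares.values())
cr3_value = concentration_ratio(shares.values())
```

For these sample figures the shares are 50, 30, 15, and 5 percent, giving an HHI of 3,650 and a CR3 of 95 percent.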

For the analysis category for Chapter 6 (Financial Structure Analysis), the average key financial ratios of all companies participating in the Korean Standard Industrial Classification 'elevator manufacturing', of companies in the top 25% by operating profit, of small and medium-sized enterprises, and of companies founded within the past five years may be analyzed from the refined data and presented as a table, as shown in FIG. 12, and the key financial ratios may be analyzed as time series and presented as a figure, as shown in FIG. 13.

Furthermore, for the analysis category for Chapter 7 (Implication Indicator Analysis), the analyses of Chapters 2 to 6 may be synthesized. The state of new entry, reflecting the share of sales and the share of the number of companies accounted for by companies founded within the past three years, may be analyzed and presented as a figure, as shown in FIG. 14. Growth opportunity, reflecting market concentration, market growth rate, and the participation share of small and medium-sized enterprises, may be analyzed and presented as a figure, as shown in FIG. 15. Profit potential, reflecting the estimated market size five years ahead and the operating margin on sales, may be analyzed and presented as a figure, as shown in FIG. 16.

Meanwhile, an embodiment of the present invention takes into account that, among the analysis data classified by analysis category, there may be shared data, that is, analysis data shared between analysis categories because of similarities in the characteristics of those categories.

Accordingly, when shared data, that is, classification data shared between two or more analysis categories, exists, the analysis unit 30 designates for each item of shared data a master analysis category that holds ownership of it, so that refined data is generated from the shared data only in the designated master analysis category.

That is, in an embodiment of the present invention, ownership, meaning the authority to refine the data, is granted to only one of the analysis categories that share the data. Compared with refining the shared data separately in every analysis category, this significantly reduces the resources required for data refinement.

Here, ownership of the shared data for data processing may be assigned to one of the two or more analysis categories based on, for example, at least one of the relatedness between data and the data size. After ownership is assigned, the shared data may be processed into refined data during a data processing time designated among the two or more analysis categories, and when the data processing time ends or the data processing is completed, the data may be automatically updated for the counterpart analysis categories that do not hold ownership.

That is, in an embodiment of the present invention, when data refinement is completed in the analysis category that holds ownership of the shared data, the refined data is shared with the other analysis categories that shared the data before refinement, so that those categories can also derive analysis results from the refined data.
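
The ownership scheme can be sketched as follows: each shared data item is refined once, by its master analysis category, and the result is then made visible to every category that shares it. This is an illustrative sketch under stated assumptions: the function names, the relatedness scores, and the use of relatedness alone (rather than data size) to pick the master are all hypothetical.

```python
def assign_master(shared_items, relatedness):
    """For each shared data item, pick the sharing category with the highest relatedness."""
    return {
        item: max(categories, key=lambda c: relatedness[(item, c)])
        for item, categories in shared_items.items()
    }


def refine_and_share(shared_items, masters, raw, refine):
    """Refine each shared item once under its master, then publish it to all sharers."""
    views = {}      # category -> {item: refined data}
    work_log = []   # (owner category, item) for every refinement actually performed
    for item, categories in shared_items.items():
        owner = masters[item]
        refined = refine(raw[item])
        work_log.append((owner, item))
        for category in categories:
            views.setdefault(category, {})[item] = refined
    return views, work_log


# Corporate financial data is matched to both Chapter 4 and Chapter 6 in the text.
shared = {"corporate_financials": ["chapter4", "chapter6"]}
relatedness = {("corporate_financials", "chapter4"): 0.6,   # hypothetical scores
               ("corporate_financials", "chapter6"): 0.9}
raw = {"corporate_financials": [1200, None, 900]}           # hypothetical values

masters = assign_master(shared, relatedness)
views, work_log = refine_and_share(
    shared, masters, raw,
    refine=lambda rows: [r for r in rows if r is not None],
)
```

The work log holds a single entry even though two categories use the data, which is the resource saving the text describes.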

The extraction unit 40 extracts core content.

More specifically, once the analysis data, which is the result of analyzing the classification data for each analysis category, has been generated, the extraction unit 40 extracts core content from the analysis data of each analysis category.

When a hierarchical structure is identified in the analysis data, in which a higher-level category encompasses multiple categories of the same level, the extraction unit 40 may extract as core content, according to the category selection weights in the core content extraction rule of each analysis category, at least one of the following: the analysis of the higher level; a new higher-level category newly derived from multiple categories of the same level; a lower-level category that differs from the statistical physical quantity of the higher-level category by at least a set value; the same-level category most closely related to the analysis target; the same-level category with the highest or lowest physical quantity; the same-level categories whose physical quantity falls within a set rank or set proportion; and the same-level category with the largest change in physical quantity.
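
Several of the selection rules above reduce to simple comparisons across same-level categories. The sketch below applies them to four of the detailed items in Table 3, using their 2007 and 2017 shipment values (escalators are left out because the 2017 value is missing). The function name, the dictionary keys, and the threshold of 10,000 are hypothetical, and the mean of the same-level values is used as a stand-in for the higher-level statistic.

```python
def extract_core(categories, top_k=1, gap_threshold=None):
    """categories: {name: [first_value, ..., latest_value]} for same-level categories."""
    latest = {name: series[-1] for name, series in categories.items()}
    change = {name: series[-1] - series[0] for name, series in categories.items()}
    ranked = sorted(latest, key=latest.get, reverse=True)
    core = {
        "highest": ranked[0],
        "lowest": ranked[-1],
        "top_k": ranked[:top_k],
        "largest_change": max(change, key=lambda n: abs(change[n])),
    }
    if gap_threshold is not None:
        # Stand-in for the higher-level statistic: mean of the latest values.
        parent_mean = sum(latest.values()) / len(latest)
        core["far_from_parent"] = [
            n for n in ranked if abs(latest[n] - parent_mean) >= gap_threshold
        ]
    return core


# 2007 and 2017 shipment values from Table 3 (KRW million).
items = {
    "elevators": [12_624, 24_333],
    "lift equipment": [1_346, 1_488],
    "parking machines": [873, 2_522],
    "elevator parts": [3_382, 7_658],
}
core = extract_core(items, top_k=2, gap_threshold=10_000)
```

With these inputs, elevators are selected as the highest-valued item, the item with the largest change, and the only item far from the group mean, while lift equipment is the lowest.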

In other words, in terms of what content is selected, core content may include physical quantities expressed numerically, such as market size, market concentration, sales, and asset size, together with their changes; comparisons of those quantities and changes between particular categories; and content related to the most frequently occurring terms. In terms of timing, core content may include recent changes rather than past ones, the future outlook, points in time at which a physical quantity changed sharply, and the periods before and after a particular event. In terms of category and level, for an analysis result that contains both a higher level and the lower levels belonging to it, core content may include the following: the higher-level analysis, which can describe the overall trend; where no higher-level category encompassing multiple same-level categories has been defined in advance, a higher-level category newly constructed through unsupervised learning models such as K-means clustering, two-step clustering, or Kohonen clustering; lower-level categories that differ to a statistically significant degree from the statistics of the higher-level category's physical quantity; the same-level category most relevant to the analysis target; same-level categories with the highest or lowest physical quantity, or with a physical quantity within a given rank or proportion; and the same-level category with the sharpest change.

For reference, in terms of the extraction algorithm, core content extraction rules may be generated using statistical techniques such as analysis of variance, correlation analysis, regression analysis, and discriminant analysis, or by applying supervised, semi-supervised, or unsupervised machine learning algorithms.

For example, in an embodiment of the present invention, numerical data on the recent domestic market size, growth rate, and future forecast for the Korean Standard Industrial Classification 'elevator manufacturing' may be extracted as core content. Content on 'elevator' (엘리베이터), the most frequently occurring term among the various detailed items related to the industry, together with a comparison against 'escalator', a category of the same level, may also be extracted as core content.

In addition, from the financial information of the participating companies that rank in the top 10 by total assets, content on the change in total assets over the most recent three years may be extracted as core content, and from the time-series data on the average total asset turnover of the participating companies, the 2014 data, which shows the sharpest increase, may be selected as core content.

In addition, content on the change in the industry's average sales growth rate around the 2009 global financial crisis may be extracted as core content, and among the content on 'elevator manufacturing' and its lower-level detailed products, the shipment data for 'elevator manufacturing', which represents the overall trend, may be selected as core content.

In addition, when the time-series patterns of the current ratio across all manufacturing industries are clustered by K-means clustering, the characteristics of the C-type pattern to which the industry belongs may be selected as core content, as may the financial analysis of Company C, whose operating margin was found to be lower, to a statistically significant degree, than that of companies of the same size in the same industry.

In addition, the analysis of the bargaining power of 'apartment construction', the demand industry with the largest transaction volume, and of 'primary metal product manufacturing', the parts-supplying industry with the largest transaction volume, may be selected as core content, as may the sales growth rate, which has recently shown the largest change among the industry's average financial ratio indicators.

The generation unit 50 automatically generates the report.

More specifically, once the core content has been extracted for each analysis category, the generation unit 50 converts the implications of the core content into text according to a predefined template-based text generation rule, and generates a report that presents the analysis data for each analysis category together with the implications converted into text.

For example, in one embodiment of the present invention, for the part of Chapter 5 (Market Size and Outlook) that forecasts the market size and growth rate over the next five years and has been extracted as core content, a predefined template-based sentence generation rule is used, such as: "The domestic market size of [industry name (industry code)] is [domestic market size in the base year] as of [base year], and is forecast to [(select) range A: increase significantly, range B: increase, range C: increase slightly, range D: remain flat, range E: decrease slightly, range F: decrease, range G: decrease significantly] at an average annual growth rate of [expected average annual growth rate over the next four years], reaching [domestic market size four years later] in [year four years after the base year]." With this rule, the sentence "The domestic market size of elevator manufacturing (C29162) is KRW 4,836.9 billion as of 2017, and is forecast to increase at an average annual growth rate of 4.2%, reaching KRW 5,699.9 billion in 2021." can be generated automatically.
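The market-size rule above can be sketched as a fill-in template. This is an illustration under stated assumptions: the source names ranges A to G but does not give their boundaries, so the thresholds in `TREND_BANDS` are hypothetical, and the average annual growth rate is assumed here to be a compound rate.

```python
def compound_annual_growth_rate(start: float, end: float, years: int) -> float:
    """Growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


# Hypothetical lower bounds for ranges A to F; anything lower is range G.
TREND_BANDS = [
    (0.10, "increase significantly"),   # range A
    (0.03, "increase"),                 # range B
    (0.01, "increase slightly"),        # range C
    (-0.01, "remain flat"),             # range D
    (-0.03, "decrease slightly"),       # range E
    (-0.10, "decrease"),                # range F
]


def trend_phrase(rate: float) -> str:
    """Map a growth rate to the wording of its range."""
    for lower_bound, phrase in TREND_BANDS:
        if rate >= lower_bound:
            return phrase
    return "decrease significantly"     # range G


def market_size_sentence(industry: str, code: str, base_year: int,
                         base_size: float, target_size: float,
                         years: int = 4) -> str:
    """Fill the market-size template; sizes are in billions of KRW."""
    rate = compound_annual_growth_rate(base_size, target_size, years)
    return (
        f"The domestic market size of {industry} ({code}) is "
        f"KRW {base_size:,.1f} billion as of {base_year}, and is forecast to "
        f"{trend_phrase(rate)} at an average annual growth rate of {rate:.1%}, "
        f"reaching KRW {target_size:,.1f} billion in {base_year + years}."
    )


sentence = market_size_sentence("elevator manufacturing", "C29162",
                                2017, 4836.9, 5699.9)
print(sentence)
```

With the figures from the example in the text, the computed rate rounds to 4.2% and falls in the hypothetical "increase" band, which reproduces the sentence quoted above.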

Similarly, for the recent trend in the average financial ratios, extracted as core content in Chapter 6 (Financial Structure Analysis), a predefined template-based sentence generation rule is used, such as: "Looking at the time-series data of the main financial ratios of [industry name (industry code)] over the past ten years, [(select) A: the sales growth rate, B: the operating margin, C: the current ratio, D: the total asset turnover] has recently been [(select) range A: increasing rapidly, range B: increasing, range C: increasing slightly, range D: holding at {average ratio over the last three years}, range E: decreasing slightly, range F: decreasing, range G: decreasing rapidly]." With this rule, the sentence "Looking at the time-series data of the main financial ratios of elevator manufacturing (C29162) over the past ten years, the sales growth rate has recently been decreasing rapidly, the operating margin has been increasing rapidly, the current ratio has been increasing rapidly, and the total asset turnover has been increasing slightly." can be generated automatically.
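The financial-ratio rule differs from the market-size rule in that one clause is generated per indicator and the clauses are joined into a single sentence. A rough sketch follows; the band thresholds, the three-year window, and the sample values are hypothetical, since the source does not define how each range is delimited.

```python
from typing import Dict, List

# Hypothetical lower bounds on the average yearly change (percentage points).
BANDS = [
    (2.0, "increasing rapidly"),
    (0.5, "increasing"),
    (0.05, "increasing slightly"),
    (-0.05, "holding near its recent average"),
    (-0.5, "decreasing slightly"),
    (-2.0, "decreasing"),
]


def describe_trend(values: List[float], window: int = 3) -> str:
    """Describe the average yearly change over the last `window` observations.

    Assumes at least two observations are available.
    """
    recent = values[-window:]
    yearly_change = (recent[-1] - recent[0]) / (len(recent) - 1)
    for lower_bound, phrase in BANDS:
        if yearly_change >= lower_bound:
            return phrase
    return "decreasing rapidly"


def ratio_trend_sentence(industry: str, code: str,
                         ratios: Dict[str, List[float]]) -> str:
    """Build one sentence with a clause for each financial ratio."""
    clauses = [
        f"{name} has recently been {describe_trend(values)}"
        for name, values in ratios.items()
    ]
    return (
        f"Looking at the time-series data of the main financial ratios of "
        f"{industry} ({code}) over the past ten years, " + "; ".join(clauses) + "."
    )


# Hypothetical ratio histories, most recent value last.
sample = {
    "the sales growth rate": [5.0, 3.0, 0.0],
    "the operating margin": [4.0, 4.1, 4.2],
}
text = ratio_trend_sentence("elevator manufacturing", "C29162", sample)
print(text)
```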

In addition, the template-based sentence generation rule can be supplemented by applying natural language processing to the sentences found in industry and market analysis reports and similar documents, training a deep learning model at the word or syllable level, and then appending, one after another, the word or syllable most likely to follow a given word or syllable.
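The idea of appending the most likely next word can be illustrated with a much simpler stand-in for the deep learning model: a bigram frequency table. The sketch below is not a neural model and is not taken from the source; it only shows the greedy "most probable continuation" step, using a hypothetical three-sentence corpus.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def train_bigrams(sentences: List[str]) -> Dict[str, Counter]:
    """Count, for every word, which words follow it in the corpus."""
    follows: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            follows[current][following] += 1
    return follows


def continue_text(follows: Dict[str, Counter], start: str,
                  max_words: int = 5) -> str:
    """Greedily append the most frequent next word until none is known."""
    words = [start]
    while len(words) < max_words and follows.get(words[-1]):
        most_likely = follows[words[-1]].most_common(1)[0][0]
        words.append(most_likely)
    return " ".join(words)


# Hypothetical corpus standing in for sentences from existing reports.
corpus = [
    "the market is expected to grow",
    "the market is expected to shrink",
    "the market is expected to grow",
]
model = train_bigrams(corpus)
print(continue_text(model, "market"))
```

A trained sequence model would replace the frequency table with learned probabilities, but the generation loop would keep the same shape.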

Meanwhile, in one embodiment of the present invention, the report can be generated by combining the analyzed tables or figures with the text sentences and determining the position of each. For example, the analysis data resulting from each analysis category and the explanatory sentences obtained by converting each category's core content into text can be arranged in a predetermined order to generate a report in the form shown in FIG. 17.

As described above, the automatic report generation device 100 according to an embodiment of the present invention collects, classifies, refines, and processes quantitative data on the analysis target, automatically performs industry and market analysis, automatically extracts the core content from the analysis results, interprets its meaning, and automatically generates an industry and market analysis report in which the implications are written as text sentences. It can therefore serve as a highly effective means of supporting industry and market analysis, both for the analysts who produce the information and for the consumers who use it. In addition, because the core content is extracted and the implications are derived automatically, high-quality use of the information is possible regardless of differences in analysts' individual analytical skills or in information users' interpretive ability. Furthermore, because the report is generated automatically, the time and cost of preparing reports can be greatly reduced, particularly when industry and market analysis is performed repeatedly for many analysis targets.

Hereinafter, the operating method of the automatic report generation device 100 according to an embodiment of the present invention is described with reference to FIG. 18.

First, the collection unit 10 collects original data related to the analysis target in order to generate a report on the analysis target (S10).

Next, once the original data has been collected for each analysis category specified by the report format, the classification unit 20 classifies the original data and matches it, as classification data, to each analysis category (S20).

Subsequently, once the data has been classified for each analysis category specified by the report format, the analysis unit 30 analyzes the classification data according to the characteristics of each category and generates analysis data for each analysis category (S30).

Further, once the analysis data, which is the result of analyzing the classification data for each analysis category, has been generated, the extraction unit 40 extracts the core content from the analysis data of each analysis category (S40).

Thereafter, once the core content has been extracted for each analysis category, the generation unit 50 converts the implications of the core content into text according to the predefined template-based text generation rule, and generates a report that presents the analysis data for each analysis category together with the implications converted into text (S50-S60).
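Steps S10 to S60 form a linear pipeline, which can be sketched end to end as below. Everything in this block is illustrative: the record layout, the in-memory data standing in for external data sources, and the rule "the most recent record is the core content" are assumptions chosen to keep the example small, not details from the source.

```python
from typing import Dict, List

Record = Dict[str, object]


def collect(target: str) -> List[Record]:
    """S10: collect original data (hypothetical in-memory records)."""
    return [
        {"category": "market_size", "year": 2016, "value": 100.0},
        {"category": "market_size", "year": 2017, "value": 110.0},
        {"category": "financials", "year": 2017, "value": 4.2},
    ]


def classify(records: List[Record]) -> Dict[str, List[Record]]:
    """S20: match each record to its analysis category."""
    grouped: Dict[str, List[Record]] = {}
    for record in records:
        grouped.setdefault(str(record["category"]), []).append(record)
    return grouped


def analyze(grouped: Dict[str, List[Record]]) -> Dict[str, List[Record]]:
    """S30: produce analysis data per category (here: a sorted time series)."""
    return {
        category: sorted(rows, key=lambda row: row["year"])
        for category, rows in grouped.items()
    }


def extract(analysis: Dict[str, List[Record]]) -> Dict[str, Record]:
    """S40: pick core content (assumption: the most recent record)."""
    return {category: rows[-1] for category, rows in analysis.items()}


def generate(target: str, analysis: Dict[str, List[Record]],
             core: Dict[str, Record]) -> str:
    """S50-S60: turn core content into text and place it beside the data."""
    lines = [f"Report: {target}"]
    for category, rows in analysis.items():
        data = ", ".join(f"{row['year']}={row['value']}" for row in rows)
        lines.append(f"[{category}] data: {data}")
        lines.append(
            f"[{category}] In {core[category]['year']}, "
            f"the value was {core[category]['value']}."
        )
    return "\n".join(lines)


def build_report(target: str) -> str:
    analysis = analyze(classify(collect(target)))
    return generate(target, analysis, extract(analysis))


report = build_report("elevator manufacturing")
print(report)
```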

As described above, the operating method of the automatic report generation device 100 according to an embodiment of the present invention collects, classifies, refines, and processes quantitative data on the analysis target, automatically performs industry and market analysis, automatically extracts the core content from the analysis results, interprets its meaning, and automatically generates an industry and market analysis report in which the implications are written as text sentences. It can therefore serve as a highly effective means of supporting industry and market analysis, both for the analysts who produce the information and for the consumers who use it. In addition, because the core content is extracted and the implications are derived automatically, high-quality use of the information is possible regardless of differences in analysts' individual analytical skills or in information users' interpretive ability. Furthermore, because the report is generated automatically, the time and cost of preparing reports can be greatly reduced, particularly when industry and market analysis is performed repeatedly for many analysis targets.

Meanwhile, the steps of the method or algorithm described in connection with the embodiments presented here may be implemented directly in hardware, or implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.

The present invention has been described in detail above with reference to preferred embodiments, but it is not limited to those embodiments. The technical idea of the present invention extends to the range within which anyone with ordinary skill in the art to which the invention belongs can make various modifications or changes without departing from the gist of the invention as claimed in the following claims.

According to the automatic report generation device and its operating method of the present invention, industry and market analysis is performed on an analysis target, the core content is automatically extracted from the analysis results, and an industry and market analysis report stating the implications in text sentences can be generated automatically. Because this goes beyond the limits of existing technology, the invention not only makes use of the related technology but also offers sufficient prospects for marketing or selling the devices to which it is applied, and it can clearly be implemented in practice. It therefore has industrial applicability.

100: automatic report generation device
10: collection unit 20: classification unit
30: analysis unit 40: extraction unit
50: generation unit

Claims

An automatic report generation device comprising:
a collection unit that collects original data related to an analysis target in order to generate a report on the analysis target;
a classification unit that classifies the original data by the analysis categories specified by the report format and matches it, as classification data, to each analysis category; and
an analysis unit that analyzes the classification data according to category characteristics predefined for each analysis category and generates analysis data for each analysis category.

The device of claim 1,
wherein the automatic report generation device further comprises:
an extraction unit that extracts core content from the analysis data for each analysis category; and
a generation unit that converts the implications of the core content into text according to a predefined template-based text generation rule, and generates a report presenting the analysis data for each analysis category together with the implications converted into text.

The device of claim 1,
wherein the collection unit
collects the original data by changing or expanding the name entered for the analysis target into the name commonly used in each data repository on a list of data repositories designated to match the category characteristics of each analysis category.

The device of claim 1,
wherein the analysis unit,
when shared data exists, being classification data shared between two or more analysis categories, designates for each item of shared data a master analysis category having ownership of it, so that refined data produced by processing the shared data is generated only for the master analysis category.

The device of claim 4,
wherein, for the shared data,
ownership for data processing is allocated to one of the two or more analysis categories on the basis of at least one of the degree of relationship between the data and the data size.

The device of claim 4,
wherein the shared data
is processed into refined data during a data processing time designated among the two or more analysis categories, and
the refined data
is updated to the analysis categories that do not have ownership when the data processing time ends or the data processing is completed.

The device of claim 2,
wherein the extraction unit
extracts as the core content, according to a content selection weight under the core content extraction rule for each analysis category, at least one of: content on a change in a numerical physical quantity identified from the analysis data; content on a numerical physical quantity compared between categories; and content related to the specific term with the highest frequency of occurrence.

The device of claim 2,
wherein the extraction unit
extracts as the core content, according to a time selection weight under the core content extraction rule for each analysis category, at least one of: content on recent changes rather than past ones identified from the analysis data; content on the future outlook; and content on the period before and after a specific event.

The device of claim 2,
wherein the extraction unit,
when a hierarchical structure in which an upper-level category encompasses a plurality of same-level categories is identified from the analysis data, extracts as the core content, according to a category selection weight under the core content extraction rule for each analysis category, at least one of: the analysis content for the upper-level category; content on a new upper-level category newly derived from the plurality of same-level categories; content on a lower-level category whose statistical physical quantity differs from that of the upper-level category by at least a set value; content on the same-level category most relevant to the analysis target; content on the same-level category with the highest or lowest physical quantity; content on the same-level categories whose physical quantity falls within a set rank or ratio; and content on the same-level category with the largest change in physical quantity.

A method of operating an automatic report generation device, the method comprising:
a collection step of collecting original data related to an analysis target in order to generate a report on the analysis target;
a classification step of classifying the original data by the analysis categories specified by the report format and matching it, as classification data, to each analysis category; and
an analysis step of analyzing the classification data according to category characteristics predefined for each analysis category and generating analysis data for each analysis category.

The method of claim 10,
wherein the method further comprises:
an extraction step of extracting core content from the analysis data for each analysis category; and
a generation step of converting the implications of the core content into text according to a predefined template-based text generation rule, and generating a report presenting the analysis data for each analysis category together with the implications converted into text.

The method of claim 10,
wherein the collection step
collects the original data by changing or expanding the name entered for the analysis target into the name commonly used in each data repository on a list of data repositories designated to match the category characteristics of each analysis category.

The method of claim 10,
wherein the analysis step,
when shared data exists, being classification data shared between two or more analysis categories, designates for each item of shared data a master analysis category having ownership of it, so that refined data produced by processing the shared data is generated only for the master analysis category.

The method of claim 13,
wherein, for the shared data,
ownership for data processing is allocated to one of the two or more analysis categories on the basis of at least one of the degree of relationship between the data and the data size.

The method of claim 13,
wherein the shared data
is processed into refined data during a data processing time designated among the two or more analysis categories, and
the refined data
is updated to the analysis categories that do not have ownership when the data processing time ends or the data processing is completed.

The method of claim 11,
wherein the extraction step
extracts as the core content, according to a content selection weight under the core content extraction rule for each analysis category, at least one of: content on a change in a numerical physical quantity identified from the analysis data; content on a numerical physical quantity compared between categories; and content related to the specific term with the highest frequency of occurrence.

The method of claim 11,
wherein the extraction step
extracts as the core content, according to a time selection weight under the core content extraction rule for each analysis category, at least one of: content on recent changes rather than past ones identified from the analysis data; content on the future outlook; and content on the period before and after a specific event.

The method of claim 11,
wherein the extraction step,
when a hierarchical structure in which an upper-level category encompasses a plurality of same-level categories is identified from the analysis data, extracts as the core content, according to a category selection weight under the core content extraction rule for each analysis category, at least one of: the analysis content for the upper-level category; content on a new upper-level category newly derived from the plurality of same-level categories; content on a lower-level category whose statistical physical quantity differs from that of the upper-level category by at least a set value; content on the same-level category most relevant to the analysis target; content on the same-level category with the highest or lowest physical quantity; content on the same-level categories whose physical quantity falls within a set rank or ratio; and content on the same-level category with the largest change in physical quantity.
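
The weighted selection recited in the extraction claims (content, time, and category selection weights) could be sketched roughly as follows. The claims do not say how the weights are set or combined, so the multiplicative score, the weight values, and the candidate descriptions here are purely hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One piece of analysis content that might become core content."""
    description: str
    content_weight: float    # e.g. higher for changes in a quantity
    time_weight: float       # e.g. higher for recent or forecast periods
    category_weight: float   # e.g. higher for the upper-level category

    def score(self) -> float:
        # Assumption: the three weights are combined by multiplication.
        return self.content_weight * self.time_weight * self.category_weight


def select_core_content(candidates: List[Candidate], top_k: int = 1) -> List[str]:
    """Return the descriptions of the top_k highest-scoring candidates."""
    ranked = sorted(candidates, key=lambda c: c.score(), reverse=True)
    return [c.description for c in ranked[:top_k]]


# Hypothetical candidates and weights, for illustration only.
candidates = [
    Candidate("market size level in 2009", 0.5, 0.2, 1.0),
    Candidate("forecast market growth, 2017 to 2021", 0.9, 1.0, 1.0),
    Candidate("shipment value of a sub-level product", 0.9, 0.8, 0.4),
]
print(select_core_content(candidates))
```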