KR102230102B1

KR102230102B1 - Method and apparatus for supporting text mining

Info

Publication number: KR102230102B1
Application number: KR1020180013614A
Authority: KR
Inventors: 고우헤이 니시카와
Original assignee: 가부시키가이샤 스크린 홀딩스
Priority date: 2017-03-15
Filing date: 2018-02-02
Publication date: 2021-03-18
Also published as: TWI692696B; TW201835790A; CN108628928A; JP6829117B2; CN108628928B; KR20180105566A; JP2018152023A

Abstract

When a scatter plot representing the result of correspondence analysis is displayed, a support screen is displayed that includes the scatter plot and a hint indicating how to read it. When a scatter plot of words and variables is displayed, the screen designated by the user is displayed from among: a basic screen containing no hint; a first support screen whose hint explains how to judge words near the origin; a second support screen whose hint explains how to judge the relevance of the words that characterize a variable; a third support screen whose hint explains how to judge the similarity between words; and a fourth support screen whose hint explains how to judge the similarity between variables. This makes it possible to efficiently derive knowledge from a graph representing the result of correspondence analysis.

Description

Text mining support method and apparatus {METHOD AND APPARATUS FOR SUPPORTING TEXT MINING}

The present invention relates to data mining technology, and more particularly to a text mining support method and apparatus that support the execution of text mining.

In recent years, attention has focused on data mining technology, which applies data analysis techniques such as statistics and pattern recognition to large volumes of data in order to derive knowledge (for example, rules that appear in the data). Data mining that targets text data is called text mining. The following considers the case where correspondence analysis, one such data analysis technique, is applied to text data.

In correspondence analysis, the items of a cross-tabulation table are rearranged so that the correlation between the column-heading items and the row-heading items is maximized. The result of correspondence analysis is generally expressed as a scatter plot (a two-dimensional graph). For example, performing correspondence analysis on the cross-tabulation table shown in FIG. 2 yields the scatter plot shown in FIG. 3.

In relation to the present invention, Japanese Laid-Open Patent Publication No. 2005-44087 describes a text mining system that presents to the user the order of analysis to follow when using a plurality of analysis tools. With the system described in that document, even a user with little knowledge or experience of text mining can perform analysis using the analysis tools in a suitable order.

In correspondence analysis, examining the obtained scatter plot and deriving knowledge from it is more important than obtaining the scatter plot itself. However, a user with little knowledge or experience of text mining does not know how to read a scatter plot and therefore does not know what to do first when looking at one. Such a user cannot efficiently derive knowledge from the scatter plot.

The system described in Patent Document 1 presents the order of analysis to the user, but it does not support the process of deriving knowledge from the analysis result. Consequently, the above problem cannot be solved even by using the system described in Patent Document 1.

Accordingly, an object of the present invention is to provide a text mining support method and apparatus for efficiently carrying out the process of deriving knowledge from a graph representing the result of correspondence analysis.

To achieve the above object, the present invention has the following features.

A first aspect of the present invention is a text mining support method for displaying an analysis result obtained by correspondence analysis, comprising:

a step of inputting the analysis result;

a step of inputting an instruction from a user;

a step of generating screen data of a screen including a graph representing the analysis result; and

a step of displaying the screen based on the screen data,

wherein the step of generating the screen data generates, in accordance with the instruction, screen data of a support screen including the graph and a hint indicating how to read the graph.

In a second aspect of the present invention, based on the first aspect,

the step of generating the screen data generates screen data of a screen selected, in accordance with the instruction, from among a plurality of support screens and a basic screen that includes the graph but does not include the hint.

In a third aspect of the present invention, based on the second aspect,

in the step of inputting the analysis result, a result associating first items with second items is input as the analysis result, the result including a first component and a second component of each first item and a first component and a second component of each second item, and

the step of generating the screen data creates, as the graph, a scatter plot in which the first items and the second items are plotted in a plane whose horizontal axis is the first component and whose vertical axis is the second component.

In a fourth aspect of the present invention, based on the third aspect,

the plurality of support screens include a first support screen that includes, as the hint, a statement that first items near the origin of the scatter plot have no distinctive feature.

In a fifth aspect of the present invention, based on the fourth aspect,

the scatter plot included in the first support screen shows a range near the origin.

In a sixth aspect of the present invention, based on the third aspect,

the plurality of support screens include a second support screen that includes, as the hint, a statement that a first item lying in the direction from the origin toward a second item characterizes that second item.

In a seventh aspect of the present invention, based on the sixth aspect,

the scatter plot included in the second support screen shows a range in the direction from the origin toward a selected second item.

In an eighth aspect of the present invention, based on the third aspect,

the plurality of support screens include a third support screen that includes, as the hint, a statement that first items close to each other in the scatter plot are highly similar.

In a ninth aspect of the present invention, based on the eighth aspect,

the scatter plot included in the third support screen shows a range around a selected first item.

In a tenth aspect of the present invention, based on the third aspect,

the plurality of support screens include a fourth support screen that includes, as the hint, a statement that second items close to each other in the scatter plot are highly similar.

In an eleventh aspect of the present invention, based on the tenth aspect,

the scatter plot included in the fourth support screen shows a marker indicating the second item closest to a selected second item.

In a twelfth aspect of the present invention, based on the third aspect,

in the step of inputting the analysis result, the result of performing correspondence analysis on a cross-tabulation table is input as the analysis result, the table having words as the first items, parts of a text as the second items, and the frequency of occurrence of each word in each part of the text as the table data.

A thirteenth aspect of the present invention is a text mining support apparatus for displaying an analysis result obtained by correspondence analysis, comprising:

an analysis result input unit for inputting the analysis result;

an instruction input unit for inputting an instruction from a user;

a screen generation unit that generates screen data of a screen including a graph representing the analysis result; and

an analysis result display unit that displays the screen based on the screen data,

wherein the screen generation unit generates, in accordance with the instruction, screen data of a support screen including the graph and a hint indicating how to read the graph.

In a fourteenth aspect of the present invention, based on the thirteenth aspect,

the screen generation unit generates screen data of a screen selected, in accordance with the instruction, from among a plurality of support screens and a basic screen that includes the graph but does not include the hint.

In a fifteenth aspect of the present invention, based on the fourteenth aspect,

a result associating first items with second items is input to the analysis result input unit as the analysis result, the result including a first component and a second component of each first item and a first component and a second component of each second item, and

the screen generation unit creates, as the graph, a scatter plot in which the first items and the second items are plotted in a plane whose horizontal axis is the first component and whose vertical axis is the second component.

In a sixteenth aspect of the present invention, based on the fifteenth aspect,

the result of performing correspondence analysis on a cross-tabulation table is input to the analysis result input unit as the analysis result, the table having words as the first items, parts of a text as the second items, and the frequency of occurrence of each word in each part of the text as the table data.

According to the first or thirteenth aspect, by using a support screen that includes a graph representing the result of correspondence analysis and a hint indicating how to read the graph, the user can efficiently derive knowledge from the graph.

According to the second or fourteenth aspect, selectively displaying a support screen that includes a hint and a basic screen that does not makes it possible to display a screen suited to the user's level. In addition, selectively displaying a plurality of support screens makes it possible to present the user with several ways of reading the graph.

According to the third or fifteenth aspect, the user can efficiently derive knowledge from a scatter plot representing the result of correspondence analysis on the first items and the second items.

According to the fourth aspect, the user can efficiently derive knowledge from the graph representing the result of correspondence analysis by using the knowledge that first items near the origin of the scatter plot have no distinctive feature.

According to the fifth aspect, the user can easily identify the first items that have no distinctive feature by looking at the range shown.

According to the sixth aspect, the user can efficiently derive knowledge from the graph representing the result of correspondence analysis by using the knowledge that a first item lying in the direction from the origin toward a second item characterizes that second item.

According to the seventh aspect, the user can easily identify the first items that characterize the selected second item by looking at the range shown.

According to the eighth aspect, the user can efficiently derive knowledge from the graph representing the result of correspondence analysis by using the knowledge that first items close to each other in the scatter plot are highly similar.

According to the ninth aspect, the user can easily identify the first items that are highly similar to the selected first item by looking at the range shown.

According to the tenth aspect, the user can efficiently derive knowledge from the graph representing the result of correspondence analysis by using the knowledge that second items close to each other in the scatter plot are highly similar.

According to the eleventh aspect, the user can easily identify the second item most similar to the selected second item by looking at the marker shown.

According to the twelfth or sixteenth aspect, the user can efficiently derive knowledge from a scatter plot representing the result of correspondence analysis on words and parts of a text.

These and other objects, features, aspects, and effects of the present invention will become more apparent from the following detailed description with reference to the accompanying drawings.

FIG. 1 is a block diagram showing the configuration of a text mining support apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing a cross-tabulation table that is the target of correspondence analysis.
FIG. 3 is a diagram showing a scatter plot created by the text mining support apparatus shown in FIG. 1.
FIG. 4 is a block diagram showing the configuration of a computer that functions as the text mining support apparatus shown in FIG. 1.
FIG. 5 is a flowchart showing the operation of the text mining support apparatus shown in FIG. 1.
FIG. 6 is a diagram showing the basic screen of the text mining support apparatus shown in FIG. 1.
FIG. 7 is a diagram showing the first support screen of the text mining support apparatus shown in FIG. 1.
FIG. 8 is a diagram showing the second support screen of the text mining support apparatus shown in FIG. 1.
FIG. 9 is a diagram showing the third support screen of the text mining support apparatus shown in FIG. 1.
FIG. 10 is a diagram showing the fourth support screen of the text mining support apparatus shown in FIG. 1.

A text mining support method, a text mining support apparatus, and a text mining support program according to an embodiment of the present invention are described below with reference to the drawings. The text mining support method according to this embodiment is typically executed using a computer. The text mining support apparatus according to this embodiment is typically configured using a computer. The text mining support program according to this embodiment is a program for carrying out the text mining support method using a computer. A computer that executes the text mining support program functions as the text mining support apparatus.

FIG. 1 is a block diagram showing the configuration of a text mining support apparatus according to an embodiment of the present invention. The text mining support apparatus 10 shown in FIG. 1 includes an analysis result input unit 11, an instruction input unit 12, a screen generation unit 13, and an analysis result display unit 14. The result of performing correspondence analysis on text data is input to the text mining support apparatus 10, which displays on a screen a scatter plot representing the input analysis result.

In FIG. 1, a text analysis device 5 is provided upstream of the text mining support apparatus 10. Text data 1 is input to the text analysis device 5. In the following description, the text data 1 is assumed to be data of a text having a plurality of parts (hereinafter called "chapters"). In the context of correspondence analysis, a "chapter" is also called a "variable". The text analysis device 5 extracts the words contained in the text data 1 and creates a cross-tabulation table in which the words are the row-heading items, the chapters are the column-heading items, and the frequency of occurrence of each word in each chapter is the table data. The text analysis device 5 performs correspondence analysis on the created cross-tabulation table and outputs an analysis result 2. Correspondence analysis yields two or more components that represent the characteristics of the data being processed. The analysis result 2 includes at least the first and second components of each word, the first and second components of each variable, the contribution rate of the first component, and the contribution rate of the second component.
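The text does not say how the text analysis device 5 computes the correspondence analysis. As a rough illustration only, the sketch below follows one standard formulation: it forms the standardized residuals of the cross-tabulation table and extracts the leading axes of their cross-product matrix. The function name `correspondence_analysis`, its parameters, the use of power iteration in place of a library SVD, and the reading of "contribution rate" as each axis's share of the total inertia are all assumptions made for this example. It also assumes every row and column of the table has a nonzero total.

```python
import math

def correspondence_analysis(table, n_components=2, iterations=500):
    """Sketch: return (row_coords, col_coords, contribution_rates) for a count table."""
    n_rows, n_cols = len(table), len(table[0])
    total = float(sum(sum(row) for row in table))
    # Row masses r and column masses c (marginal proportions).
    r = [sum(row) / total for row in table]
    c = [sum(table[i][j] for i in range(n_rows)) / total for j in range(n_cols)]
    # Standardized residuals S_ij = (p_ij - r_i c_j) / sqrt(r_i c_j).
    S = [[(table[i][j] / total - r[i] * c[j]) / math.sqrt(r[i] * c[j])
          for j in range(n_cols)] for i in range(n_rows)]
    # A = S^T S; its eigenvalues are the inertias of the principal axes.
    A = [[sum(S[i][a] * S[i][b] for i in range(n_rows)) for b in range(n_cols)]
         for a in range(n_cols)]
    total_inertia = sum(A[j][j] for j in range(n_cols))

    row_coords = [[0.0] * n_components for _ in range(n_rows)]
    col_coords = [[0.0] * n_components for _ in range(n_cols)]
    rates = []
    for k in range(n_components):
        # Power iteration for the dominant eigenvector of the (deflated) A.
        v = [1.0 + j for j in range(n_cols)]
        has_inertia = True
        for _ in range(iterations):
            w = [sum(A[a][b] * v[b] for b in range(n_cols)) for a in range(n_cols)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to explain on this axis
                has_inertia = False
                break
            v = [x / norm for x in w]
        if not has_inertia:
            rates.append(0.0)  # coordinates on this axis stay at zero
            continue
        Av = [sum(A[a][b] * v[b] for b in range(n_cols)) for a in range(n_cols)]
        eigenvalue = sum(v[a] * Av[a] for a in range(n_cols))
        rates.append(eigenvalue / total_inertia)
        # Principal coordinates of the rows (words) and columns (variables).
        for i in range(n_rows):
            row_coords[i][k] = sum(S[i][j] * v[j] for j in range(n_cols)) / math.sqrt(r[i])
        for j in range(n_cols):
            col_coords[j][k] = v[j] * math.sqrt(eigenvalue) / math.sqrt(c[j])
        # Deflate so that the next pass finds the next axis.
        for a in range(n_cols):
            for b in range(n_cols):
                A[a][b] -= eigenvalue * v[a] * v[b]
    return row_coords, col_coords, rates
```

For a toy table such as `[[10, 0], [0, 10], [5, 5]]`, the third row is split evenly between the two columns and lands at the origin, while the first two rows fall on opposite sides of it, each on the same side as the column it is concentrated in.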

FIG. 2 is a diagram showing a cross-tabulation table that is the target of correspondence analysis. The cross-tabulation table shown in FIG. 2 is created by inputting the text of the novel 「인간 실격」 ("No Longer Human") to the text analysis device 5 as the text data 1. This Japanese novel has five chapters, "Preface", "First Notebook", "Second Notebook", "Third Notebook", and "Afterword", and contains words such as "self", "human", "flounder", and "mood". The cross-tabulation table shown in FIG. 2 has words such as "self", "human", "flounder", and "mood" as row-heading items and the five variables (chapters) "Preface", "First Notebook", "Second Notebook", "Third Notebook", and "Afterword" as column-heading items. The word "human" appears 38 times in "First Notebook". Accordingly, in the cross-tabulation table shown in FIG. 2, the cell whose row-heading item is "human" and whose column-heading item is "First Notebook" (the hatched cell) contains 38. So that correspondence analysis can be performed well, the cross-tabulation table shown in FIG. 2 contains only words whose frequency of occurrence is at or above a predetermined value.
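The construction of such a word-by-chapter table can be sketched as follows. This is a minimal illustration, not the text analysis device's actual procedure: the function name `build_cross_table`, the `min_count` parameter, and the choice to apply the frequency threshold to each word's total count over all chapters are assumptions, and word extraction (tokenization) is assumed to have been done already.

```python
from collections import Counter

def build_cross_table(chapters, min_count=2):
    """chapters: dict mapping chapter name -> list of already-extracted words.

    Returns (words, chapter_names, table), where table[i][j] is the number of
    times words[i] occurs in chapter_names[j].
    """
    chapter_names = list(chapters)
    per_chapter = {name: Counter(tokens) for name, tokens in chapters.items()}
    # Total frequency of each word over the whole text.
    totals = Counter()
    for counts in per_chapter.values():
        totals.update(counts)
    # Keep only words at or above the (assumed) frequency threshold.
    words = sorted(w for w, n in totals.items() if n >= min_count)
    table = [[per_chapter[name][w] for name in chapter_names] for w in words]
    return words, chapter_names, table
```

With `chapters = {"Preface": ["human", "self"], "First Notebook": ["human", "human", "mood"]}` and the default threshold, only "human" survives, and its row is `[1, 2]`.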

FIG. 3 is a diagram showing a scatter plot created by the text mining support apparatus 10. As described above, the analysis result 2 input to the text mining support apparatus 10 includes at least the first and second components of each word, the first and second components of each variable, the contribution rate of the first component, and the contribution rate of the second component. The screen generation unit 13 creates a scatter plot by plotting the words and the variables in a plane whose horizontal axis is the first component and whose vertical axis is the second component. For example, the scatter plot shown in FIG. 3 is created from the analysis result 2 for the cross-tabulation table shown in FIG. 2. The analysis result display unit 14 displays a screen including the created scatter plot.

In FIG. 3, a filled black circle marks the position of each word and a white square marks the position of each variable; words are set in upright type and variables in italics. FIG. 3 also shows the contribution rates of the first and second components. In general, the contribution rate of the first component is larger than that of the second component. Taking this into account, the distance d between two points P(p₁, p₂) and Q(q₁, q₂) in the scatter plot is defined by the following equation (1), using the contribution rate k₁ of the first component and the contribution rate k₂ of the second component.

d = √[{k₁(p₁ − q₁)}² + {k₂(p₂ − q₂)}²] … (1)

In the following description, "distance" means the distance within the scatter plot as defined by equation (1). Because k₁ is generally larger than k₂, a circle drawn in the scatter plot (the set of points at a constant distance d from a center) appears as an ellipse that is shorter in the first-component direction than in the second-component direction.
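Equation (1) translates directly into a small function. The sketch below is only an illustration; the function name `weighted_distance` and the tuple representation of points are assumptions.

```python
import math

def weighted_distance(p, q, k1, k2):
    """Distance of equation (1): each axis difference is scaled by the
    contribution rate of its component before the Euclidean norm is taken."""
    return math.sqrt((k1 * (p[0] - q[0])) ** 2 + (k2 * (p[1] - q[1])) ** 2)
```

With k1 = 0.6 and k2 = 0.3, a unit step along the first component counts as a distance of 0.6 while a unit step along the second counts as only 0.3, which is why a constant-distance "circle" is narrower along the first component.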

FIG. 4 is a block diagram showing the configuration of a computer that functions as the text mining support apparatus 10. The computer 20 shown in FIG. 4 includes a CPU 21, a main memory 22, a storage unit 23, an input unit 24, a display unit 25, a communication unit 26, and a recording medium reading unit 27. The main memory 22 is, for example, DRAM. The storage unit 23 is, for example, a hard disk or a solid state drive. The input unit 24 includes, for example, a keyboard 28 and a mouse 29. The display unit 25 is, for example, a liquid crystal display. The communication unit 26 is an interface circuit for wired or wireless communication. The recording medium reading unit 27 is an interface circuit for a recording medium 30 that stores programs and the like. The recording medium 30 is, for example, a non-transitory recording medium such as a CD-ROM or DVD-ROM.

When the computer 20 executes a text mining support program 31, the storage unit 23 stores the text mining support program 31 and the analysis result 2. The text mining support program 31 and the analysis result 2 may, for example, have been received from a server or another computer via the communication unit 26, or read from the recording medium 30 via the recording medium reading unit 27.

When the text mining support program 31 is executed, the program and the analysis result 2 are copied to the main memory 22. Using the main memory 22 as working memory, the CPU 21 executes the text mining support program 31 stored in the main memory 22 and thereby processes the analysis result 2 stored there. The computer 20 then functions as the text mining support apparatus 10. The configuration of the computer 20 described above is only an example; the text mining support apparatus 10 can be configured using any computer.

텍스트 마이닝에 관한 지식이나 경험을 갖는 이용자는, 대응 분석의 결과를 나타내는 산포도에 대해, 이하와 같은 지식을 갖는다. 지식이나 경험을 갖는 이용자는, 이 지식들을 이용하여 산포도로부터 지견을 유도할 수 있다.A user who has knowledge or experience related to text mining has the following knowledge about the scatter plot indicating the result of the correspondence analysis. Users with knowledge or experience can use this knowledge to derive knowledge from the scatter plot.

제 1 지식 「원점 부근의 단어는, 현저한 특징을 가지지 않는다.」The first knowledge "words near the origin do not have remarkable features."

Second knowledge: "A word lying in the direction away from the origin toward a variable is highly related to that variable and characterizes it."

Third knowledge: "Words that are close to each other are highly similar."

Fourth knowledge: "Variables that are close to each other are highly similar."

A user with little knowledge or experience of text mining, on the other hand, lacks this knowledge and therefore cannot efficiently derive findings from the scatter plot. To solve this problem, the text mining support apparatus 10 not only displays a screen containing the scatter plot as a basic screen but also, in response to an instruction from the user, displays as a support screen a screen containing the scatter plot together with a hint indicating how to read it.

The operation of each unit of the text mining support apparatus 10 is described with reference to FIG. 1. The analysis result 2 output from an external device (for example, the text analysis device 5) is input to the analysis result input unit 11. An instruction from the user is input to the instruction input unit 12. The screen generation unit 13 creates a scatter plot representing the analysis result 2 and generates screen data for a screen containing the scatter plot. In accordance with the user's instruction input through the instruction input unit 12, the screen generation unit 13 selectively generates either screen data for a support screen, which contains the scatter plot and a hint, or screen data for the basic screen, which contains the scatter plot but no hint. The analysis result display unit 14 displays a screen based on the screen data generated by the screen generation unit 13. In the following, the text mining support apparatus 10 is assumed to display four kinds of support screens, referred to as the first to fourth support screens.

FIG. 5 is a flowchart showing the operation of the text mining support apparatus 10. First, the CPU 21 transfers the analysis result 2 output from the text analysis device 5 to the main memory 22, whereby the analysis result 2 is input to the text mining support apparatus 10 (step S101). Next, the CPU 21 creates a scatter plot based on the analysis result 2 (step S102). The scatter plot is created by plotting the words and variables in a plane whose horizontal axis is the first component and whose vertical axis is the second component. Next, the CPU 21 generates screen data for the basic screen containing the scatter plot created in step S102 (step S103). Next, based on the screen data generated in step S103, the CPU 21 causes the display unit 25 to display the basic screen (step S104).
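Steps S101 to S103 can be pictured with a minimal Python sketch. It assumes the analysis result arrives as two mappings from a label to its (first component, second component) pair; the names `Point`, `build_scatter`, and `basic_screen_data`, and the dictionary layout of the screen data, are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    label: str
    kind: str   # "word" or "variable"
    x: float    # first component (horizontal axis)
    y: float    # second component (vertical axis)


def build_scatter(words, variables):
    """Step S102 (sketch): plot words and variables in the component plane."""
    points = [Point(label, "word", c1, c2) for label, (c1, c2) in words.items()]
    points += [Point(label, "variable", c1, c2) for label, (c1, c2) in variables.items()]
    return points


def basic_screen_data(points):
    """Step S103 (sketch): the basic screen carries the scatter plot and no hint."""
    return {"screen": "basic", "scatter": points, "hint": None}
```

A rendering layer (step S104) would then draw the returned points; how the display unit does so is outside this sketch.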

FIG. 6 shows the basic screen. The basic screen 100 shown in FIG. 6 includes a screen selection window 101 and a scatter plot window 102. The scatter plot window 102 shows the scatter plot of FIG. 3. The screen selection window 101 has six radio buttons 103, referred to below as the first to sixth radio buttons. The first to sixth radio buttons correspond, respectively, to the basic screen, the first to fourth support screens, and termination. While the basic screen 100 is displayed, the user operates the keyboard 28 or the mouse 29 to press one of the first to sixth radio buttons, thereby inputting an instruction.

The CPU 21 receives the user's instruction input through the screen selection window 101 (step S105). Next, the CPU 21 proceeds to one of the following steps according to the instruction (step S106). If the instruction is "basic screen" (the first radio button was pressed), the CPU 21 proceeds to step S107 and generates screen data for the basic screen in the same way as in step S103. If the instruction is "first support screen" (the second radio button was pressed), the CPU 21 proceeds to step S108 and generates screen data for the first support screen. If the instruction is "second support screen" (the third radio button was pressed), the CPU 21 proceeds to step S109 and generates screen data for the second support screen. If the instruction is "third support screen" (the fourth radio button was pressed), the CPU 21 proceeds to step S110 and generates screen data for the third support screen. If the instruction is "fourth support screen" (the fifth radio button was pressed), the CPU 21 proceeds to step S111 and generates screen data for the fourth support screen. If the instruction is "end" (the sixth radio button was pressed), the CPU 21 ends the processing.

After executing any one of steps S107 to S111, the CPU 21 proceeds to step S112, where it causes the display unit 25 to display a screen based on the screen data generated in that step. The CPU 21 then returns to step S105. In this way, the text mining support apparatus 10 displays, in accordance with the user's instruction, a screen selected from the basic screen and the first to fourth support screens.
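The loop through steps S105 to S112 can be sketched as a small dispatch table. This is only an illustration: the table `DISPATCH`, the function `run_screen_loop`, and the idea of feeding button presses as a list are assumptions made here, and the actual generation of screen data is reduced to recording the name of the screen.

```python
# Radio button number -> (flowchart step that generates the screen data, screen name).
DISPATCH = {
    1: ("S107", "basic screen"),
    2: ("S108", "first support screen"),
    3: ("S109", "second support screen"),
    4: ("S110", "third support screen"),
    5: ("S111", "fourth support screen"),
    6: (None, "end"),
}


def run_screen_loop(button_presses):
    """Return the sequence of screens displayed for a sequence of button presses."""
    shown = ["basic screen"]              # steps S103 and S104: initial basic screen
    for button in button_presses:         # step S105: receive the instruction
        step, screen = DISPATCH[button]   # step S106: branch on the instruction
        if step is None:                  # "end": stop processing
            break
        shown.append(screen)              # steps S107 to S111, then display in S112
    return shown
```

For example, pressing the second, fourth, first, and sixth buttons in turn would show the basic screen, the first support screen, the third support screen, and the basic screen again before the processing ends.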

The components of the computer 20 shown in FIG. 4 and the steps shown in FIG. 5 correspond to the components of the text mining support apparatus 10 shown in FIG. 1 as follows. The CPU 21 executing step S101 functions as the analysis result input unit 11. The input unit 24 and the CPU 21 executing step S105 function as the instruction input unit 12. The CPU 21 executing steps S102 to S103 and S106 to S111 functions as the screen generation unit 13. The display unit 25 and the CPU 21 executing steps S104 and S112 function as the analysis result display unit 14.

FIG. 7 shows the first support screen. The first support screen 110 shown in FIG. 7 includes the screen selection window 101, a scatter plot window 112, a word list window 113, and a hint window 114. The first support screen 110 relates to the first knowledge, "Words near the origin have no notable characteristics." By viewing the first support screen 110, the user can apply the first knowledge and efficiently derive findings from the scatter plot.

Before the first support screen 110 is displayed, the user operates the keyboard 28 or the mouse 29 to specify the range regarded as being near the origin. An initial value for this range may be determined in advance. The scatter plot window 112 shows the scatter plot of FIG. 3. On the scatter plot in the scatter plot window 112, a circle 115 (which appears as an ellipse) is drawn to indicate the vicinity of the origin. The circle 115 is preferably drawn in a color different from that of the scatter plot (for example, red). In this way, the scatter plot in the first support screen 110 shows the range near the origin by means of the circle 115. The user can therefore look at the indicated range and easily identify the words that have no notable characteristics.

The word list window 113 shows a word list in which the words near the origin (the words inside the circle 115) and their distances from the origin are listed in ascending order of distance. The upward-pointing triangle in the word list window 113 indicates that the list is sorted from nearest to farthest. The hint window 114 shows the first knowledge under the heading "Point of analysis". The hint window 114 is placed at a position overlapping the scatter plot window 112.
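The word list described above can be sketched as follows. The sketch assumes that distance means Euclidean distance in the plane of the first and second components and that a word on the circle counts as inside it; the specification states neither point explicitly, and the function name `words_near_origin` is hypothetical.

```python
import math


def words_near_origin(words, radius):
    """List (word, distance) pairs inside the circle, nearest first."""
    rows = [(label, math.hypot(x, y)) for label, (x, y) in words.items()]
    inside = [row for row in rows if row[1] <= radius]
    return sorted(inside, key=lambda row: row[1])
```

Sorting on the distance alone keeps the example short; a real list might break ties by label so that the order is stable across runs.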

The size of the circle 115 may be determined by any method. For example, it may be determined by the user specifying the number of words to be contained in the circle 115 (for example, 10 words). Alternatively, it may be determined by the user specifying the proportion of words to be contained in the circle 115 (for example, 10% of all words). Alternatively, it may be determined by the user specifying the distance from the origin with the mouse 29 within the first support screen 110.
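The first two ways of sizing the circle can be sketched by converting a word count or a proportion into a radius. The sketch assumes Euclidean distance from the origin and rounds the proportion down to a whole number of words; both choices, and the function names, are assumptions rather than details given in the specification.

```python
import math


def radius_for_count(words, count):
    """Smallest radius whose circle contains the `count` words nearest the origin."""
    distances = sorted(math.hypot(x, y) for x, y in words.values())
    if count <= 0 or not distances:
        return 0.0
    return distances[min(count, len(distances)) - 1]


def radius_for_ratio(words, ratio):
    """Radius for a proportion of all words (rounded down to a whole word count)."""
    return radius_for_count(words, math.floor(len(words) * ratio))
```

If several words lie at exactly the same distance as the last word counted, the resulting circle contains more words than requested; how such ties should be handled is left open here.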

In the first support screen 110 shown in FIG. 7, the words near the origin (the words inside the circle 115) are displayed in the same manner as the other words. Alternatively, the first support screen may display the words near the origin in a manner different from the other words (for example, in a lighter color), or may omit them. In the second to fourth support screens as well, words displayed in a different manner on the first support screen may be displayed in that different manner, and words omitted from the first support screen may likewise be omitted.

FIG. 8 shows the second support screen. The second support screen 120 shown in FIG. 8 includes the screen selection window 101, a scatter plot window 122, a word list window 123, and a hint window 124. The second support screen 120 relates to the second knowledge, "A word lying in the direction away from the origin toward a variable is highly related to that variable and characterizes it." By viewing the second support screen 120, the user can apply the second knowledge and efficiently derive findings from the scatter plot.

Before the second support screen 120 is displayed, the user operates the keyboard 28 or the mouse 29 to select one variable (chapter). The case where the variable "Preface" is selected is described here. The scatter plot window 122 shows the scatter plot of FIG. 3. On the scatter plot in the scatter plot window 122 are drawn an arrow 125 that starts at the origin and passes through the selected variable, and two rays 126 and 127 that start at the origin and each form a predetermined angle (for example, 10°) with the arrow 125. The region between the rays 126 and 127 contains the words lying in the direction away from the origin toward the selected variable. In this way, the scatter plot in the second support screen 120 shows the range of directions leading away from the origin toward the selected variable by means of the rays 126 and 127. The user can therefore look at the indicated range and easily identify the words that characterize the selected variable.

The word list window 123 shows a word list in which the words lying in the direction away from the origin toward the selected variable (the words in the region between the rays 126 and 127) and their distances from the origin are listed in descending order of distance. The downward-pointing triangle in the word list window 123 indicates that the list is sorted from farthest to nearest. In connection with the second knowledge, the word list window 123 also states, "The farther a word is from the origin, the more strongly related it can be judged to be." The hint window 124 shows the second knowledge under the heading "Point of analysis". The hint window 124 is placed at a position overlapping the scatter plot window 122.

The angle between the arrow 125 and each of the rays 126 and 127 can be determined by any method, provided that the arrow 125 and the rays 126 and 127 lie in the same quadrant. If drawing a ray at the given angle from the arrow 125 would place the ray in a quadrant different from that of the arrow 125, the ray is drawn on the first or second component axis instead. The arrow 125 is preferably drawn in a color different from that of the scatter plot (for example, red). The rays 126 and 127 are preferably drawn in a color different from those of the scatter plot and the arrow 125 (for example, blue).
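One way to read the sector rule and the word list of the second support screen is sketched below. The arrow direction is taken as the polar angle of the selected variable, each ray is clamped to the axes bounding the arrow's quadrant, and words inside the clamped sector are listed farthest first. The function names are hypothetical, and the treatment of a variable lying exactly on an axis (assigned here to the quadrant counterclockwise of that axis) is an assumption the specification does not settle.

```python
import math


def sector_bounds(variable_xy, half_angle_deg):
    """Return (arrow angle, low ray angle, high ray angle) in degrees."""
    vx, vy = variable_xy
    theta = math.degrees(math.atan2(vy, vx))        # direction of the arrow
    quadrant_low = math.floor(theta / 90.0) * 90    # axis bounding the quadrant
    quadrant_high = quadrant_low + 90
    low = max(theta - half_angle_deg, quadrant_low)     # clamp ray onto the axis
    high = min(theta + half_angle_deg, quadrant_high)
    return theta, low, high


def words_toward_variable(words, variable_xy, half_angle_deg=10.0):
    """List (word, distance from origin) inside the sector, farthest first."""
    theta, low, high = sector_bounds(variable_xy, half_angle_deg)
    rows = []
    for label, (x, y) in words.items():
        dist = math.hypot(x, y)
        if dist == 0.0:
            continue  # a word at the origin has no direction
        angle = math.degrees(math.atan2(y, x))
        diff = (angle - theta + 180.0) % 360.0 - 180.0  # signed offset from arrow
        if low - theta <= diff <= high - theta:
            rows.append((label, dist))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

With a variable at (10, 1) and a 10° half angle, the lower ray would cross into the fourth quadrant, so the sketch places it on the first component axis, and a word slightly below that axis is excluded.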

FIG. 9 shows the third support screen. The third support screen 130 shown in FIG. 9 includes the screen selection window 101, a scatter plot window 132, a word list window 133, and a hint window 134. The third support screen 130 relates to the third knowledge, "Words that are close to each other are highly similar." By viewing the third support screen 130, the user can apply the third knowledge and efficiently derive findings from the scatter plot.

Before the third support screen 130 is displayed, the user operates the keyboard 28 or the mouse 29 to select one word and to specify the range regarded as being near the selected word. The case where the word "eye" is selected is described here. The scatter plot window 132 shows the scatter plot of FIG. 3. On the scatter plot in the scatter plot window 132, a circle 135 (which appears as an ellipse) is drawn to indicate the vicinity of the selected word. The circle 135 is preferably drawn in a color different from that of the scatter plot (for example, red). In this way, the scatter plot in the third support screen 130 shows the range near the selected word by means of the circle 135. The user can therefore look at the indicated range and easily identify the words that are highly similar to the selected word.

The word list window 133 shows a word list in which the words near the selected word (the words inside the circle 135) and their distances from the selected word are listed in ascending order of distance. As the third knowledge, the word list window 133 also states, "The closer a word is to the selected word, the more similar it can be judged to be." In this example, the word closest to the selected word "eye" is "face". The word most similar to the selected word "eye" is therefore "face". The hint window 134 states this under the heading "Point of analysis". The hint window 134 is placed at a position overlapping the scatter plot window 132.

Like the size of the circle 115 in the first support screen 110, the size of the circle 135 may be determined by any method. For example, the user determines the size of the circle 135 by specifying the number of words to be contained in the circle 135, the proportion of words to be contained in the circle 135, or the distance from the selected word.

FIG. 10 shows the fourth support screen. The fourth support screen 140 shown in FIG. 10 includes the screen selection window 101, a scatter plot window 142, a variable list window 143, and a hint window 144. The fourth support screen 140 relates to the fourth knowledge, "Variables that are close to each other are highly similar." By viewing the fourth support screen 140, the user can apply the fourth knowledge and efficiently derive findings from the scatter plot.

Before the fourth support screen 140 is displayed, the user operates the keyboard 28 or the mouse 29 to select one variable. The case where the variable "Preface" is selected is described here. The scatter plot window 142 shows the scatter plot of FIG. 3. On the scatter plot in the scatter plot window 142, an arrow 145 is drawn that starts at the selected variable and ends at the variable closest to it. The arrow 145 is preferably drawn in a color different from that of the scatter plot (for example, red). In this way, the scatter plot in the fourth support screen 140 shows the arrow 145 pointing to the variable closest to the selected variable. The user can therefore look at the arrow 145 and easily identify the variable most similar to the selected variable.

The variable list window 143 shows a variable list in which the variables relatively close to the selected variable and their distances from it are listed in ascending order of distance. As the fourth knowledge, the variable list window 143 also states, "The closer a variable is to the selected variable, the more similar it can be judged to be." In this example, the variable closest to the selected variable "Preface" is "Afterword". The variable most similar to the selected variable "Preface" is therefore "Afterword". The hint window 144 states this under the heading "Point of analysis". The hint window 144 is placed at a position overlapping the scatter plot window 142.
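The variable list and the arrow 145 of the fourth support screen can be sketched as a nearest-neighbor ranking. The sketch assumes Euclidean distance between variables in the component plane and lists every other variable, whereas the specification lists only those "relatively close" without saying where the cut-off lies. The function names and the sample coordinates in the usage note are hypothetical.

```python
import math


def nearest_variables(variables, selected):
    """List (variable, distance from the selected variable), nearest first."""
    sx, sy = variables[selected]
    rows = [
        (name, math.hypot(x - sx, y - sy))
        for name, (x, y) in variables.items()
        if name != selected
    ]
    return sorted(rows, key=lambda row: row[1])


def arrow_to_nearest(variables, selected):
    """Start and end coordinates of the arrow, or None if no other variable exists."""
    ranked = nearest_variables(variables, selected)
    if not ranked:
        return None
    return variables[selected], variables[ranked[0][0]]
```

With made-up coordinates such as "Preface" at (0, 0), "Afterword" at (3, 4), and "Chapter 1" at (10, 0), the arrow would run from "Preface" to "Afterword".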

The text mining support apparatus 10 may also display support screens other than those described above. A support screen may contain any content as long as it contains the scatter plot and a hint indicating how to read it. The hint may state explicitly how to read the scatter plot or may merely suggest it. The hint may appear in any part of the support screen: in a window that overlaps the scatter plot window, in a window that does not overlap the scatter plot window, or in a message box at a fixed position.

As described above, the text mining support method according to the present embodiment includes a step of inputting the analysis result 2, a step of inputting an instruction from the user, a step of generating screen data for a screen containing a graph (scatter plot) representing the analysis result 2, and a step of displaying a screen based on the screen data. In accordance with the instruction, the step of generating screen data generates screen data for a support screen containing the graph and a hint indicating how to read the graph. Using a support screen that contains both a graph showing the result of a correspondence analysis and a hint indicating how to read it, the user can therefore efficiently derive findings from the graph.

The step of generating screen data generates screen data for a screen selected, in accordance with the instruction, from among a plurality of support screens (the first to fourth support screens 110, 120, 130, and 140) and the basic screen 100, which contains the graph but no hint. By selectively displaying a support screen with a hint or the basic screen without one, a screen suited to the user's level can be displayed. Moreover, by selectively displaying a plurality of support screens, the user can be shown several different ways of reading the graph.

In the step of inputting the analysis result, the analysis result 2 that is input is a result associating first items (words) with second items (variables), and contains the first and second components of each first item and the first and second components of each second item. As the graph, the step of generating screen data creates a scatter plot in which the first items and the second items are plotted in a plane whose horizontal axis is the first component and whose vertical axis is the second component. The user can therefore efficiently derive findings from a scatter plot showing the result of a correspondence analysis of the first and second items.

The plurality of support screens include the first support screen 110, whose hint states that first items near the origin of the scatter plot have no notable characteristics; the second support screen 120, whose hint states that a first item lying in the direction away from the origin toward a second item characterizes that second item; the third support screen 130, whose hint states that first items close to each other in the scatter plot are highly similar; and the fourth support screen 140, whose hint states that second items close to each other in the scatter plot are highly similar. Using the hint contained in each support screen, the user can therefore efficiently derive findings from the graph showing the result of the correspondence analysis.

The scatter plot in the first support screen 110 shows the range near the origin by means of the circle 115. The scatter plot in the second support screen 120 shows the range of directions leading away from the origin toward the selected second item by means of the rays 126 and 127. The scatter plot in the third support screen 130 shows the range near the selected first item by means of the circle 135. The scatter plot in the fourth support screen 140 shows a mark (the arrow 145) indicating the second item closest to the selected second item. By looking at the range or mark shown on each support screen, the user can therefore easily identify the first items that have no notable characteristics, the first items that characterize the selected second item, the first items highly similar to the selected first item, and the second item highly similar to the selected second item.

In the step of inputting the analysis result, the analysis result that is input is the result of correspondence analysis performed on a cross-tabulation table in which words are the first items, parts of a sentence are the second items, and the frequency of occurrence of each word in each part of the sentence is the table data. The user can therefore efficiently derive knowledge from a scatter plot showing the result of the correspondence analysis of words and parts of a sentence.
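
For illustration, the cross-tabulation table and the two components per item could be obtained as in the Python sketch below. This passage treats the analysis result as an input and does not say how it is computed, so the singular value decomposition used here is the standard textbook formulation of correspondence analysis, not a method taken from the specification. The function names and the sample tokens are hypothetical, and the code assumes that no row or column of the table sums to zero.

```python
from collections import Counter

import numpy as np


def cross_table(parts, vocabulary):
    """Rows are words (first items), columns are parts (second items)."""
    counts = [Counter(tokens) for tokens in parts.values()]
    return np.array([[c[w] for c in counts] for w in vocabulary], dtype=float)


def correspondence_analysis(table):
    """Return the first two principal coordinates of rows and columns."""
    P = table / table.sum()          # correspondence matrix
    r = P.sum(axis=1)                # row masses
    c = P.sum(axis=0)                # column masses
    expected = np.outer(r, c)        # independence model
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * sv) / np.sqrt(r)[:, None]
    cols = (Vt.T * sv) / np.sqrt(c)[:, None]
    return rows[:, :2], cols[:, :2]


# Made-up tokens for three parts of a text.
parts = {
    "intro": ["price", "price", "price", "design", "screen"],
    "body": ["design", "design", "battery", "screen", "price"],
    "summary": ["battery", "battery", "battery", "screen", "design"],
}
vocabulary = ["price", "design", "battery", "screen"]

table = cross_table(parts, vocabulary)
word_xy, part_xy = correspondence_analysis(table)
```

In this sample, the word "screen" occurs equally often in every part, so its row profile matches the average profile and it is plotted at the origin, which agrees with the hint that items near the origin have no marked characteristics.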

The text mining support device 10 and the text mining support program 31 according to the present embodiment have the same features and provide the same effects as the text mining support method according to the present embodiment.

In the above description, the text mining support device 10 displays a scatter plot that represents the result of the correspondence analysis in two dimensions. The present invention is not limited to this and can also be applied to a text mining support method and device that display a graph representing the result of the correspondence analysis in multiple dimensions (for example, a three-dimensional graph). Furthermore, in the same way as the text mining support method and device that display a scatter plot representing the result of correspondence analysis of a cross-tabulation table for text data, a data mining support method and device can be configured that display a graph (such as a scatter plot or a three-dimensional graph) representing the result of correspondence analysis of a cross-tabulation table for arbitrary data other than text data.

According to the text mining support method and device of the present invention, a support screen is displayed that includes a graph representing the result of the correspondence analysis and a hint indicating how to read the graph, so that the user can efficiently derive knowledge from that graph.

Although the present invention has been described in detail above, the above description is illustrative in all respects and not restrictive. It is understood that numerous other changes and modifications can be devised without departing from the scope of the present invention.

2: Analysis result
10: Text mining support device
11: Analysis result input unit
12: Instruction input unit
13: Screen generation unit
14: Analysis result display unit
31: Text mining support program
100: Basic screen
101: Screen selection window
102, 112, 122, 132, 142: Scatter plot window
110, 120, 130, 140: Support screen
113, 123, 133: Word list window
143: Variable list window
114, 124, 134, 144: Hint window

Claims

(Deleted)

A text mining support method, executed by a computer, for displaying an analysis result of correspondence analysis, the method comprising:
a step of inputting the analysis result;
a step of inputting an instruction from a user;
a step of generating screen data of a screen including a graph representing the analysis result; and
a step of displaying a screen based on the screen data,
wherein the step of generating the screen data generates, according to the instruction, screen data of a support screen including the graph and a hint indicating how to read the graph,
the step of generating the screen data generates screen data of a screen selected according to the instruction from among a plurality of support screens and a basic screen that includes the graph and does not include the hint,
in the step of inputting the analysis result, a result of correspondence analysis of a first item and a second item is input as the analysis result, the result including a first component and a second component of the first item and a first component and a second component of the second item, and
the step of generating the screen data creates, as the graph, a scatter plot in which the first item and the second item are plotted on a plane having the first component as the horizontal axis and the second component as the vertical axis.

The method of claim 3,
wherein the plurality of support screens includes a first support screen including, as the hint, a statement that the first item near the origin in the scatter plot has no marked characteristics.

The method of claim 4,
wherein a range near the origin is shown in the scatter plot included in the first support screen.

The method of claim 3,
wherein the plurality of support screens includes a second support screen including, as the hint, a statement that the first item lying in the direction away from the origin toward the second item in the scatter plot characterizes that second item.

The method of claim 6,
wherein a range of directions away from the origin toward the selected second item is shown in the scatter plot included in the second support screen.

The method of claim 3,
wherein the plurality of support screens includes a third support screen including, as the hint, a statement that first items close to one another in the scatter plot are highly similar.

The method of claim 8,
wherein a range near the selected first item is shown in the scatter plot included in the third support screen.

The method of claim 3,
wherein the plurality of support screens includes a fourth support screen including, as the hint, a statement that second items close to one another in the scatter plot are highly similar.

The method of claim 10,
wherein a mark indicating the second item closest to the selected second item is shown in the scatter plot included in the fourth support screen.

The method of claim 3,
wherein, in the step of inputting the analysis result, the result of correspondence analysis performed on a cross-tabulation table in which words are the first item, parts of a sentence are the second item, and the frequency of occurrence of each word in each part of the sentence is the table data is input as the analysis result.

(Deleted)

A text mining support device that displays an analysis result of correspondence analysis, the device comprising:
an analysis result input unit that inputs the analysis result;
an instruction input unit that inputs an instruction from a user;
a screen generation unit that generates screen data of a screen including a graph representing the analysis result; and
an analysis result display unit that displays a screen based on the screen data,
wherein the screen generation unit generates, according to the instruction, screen data of a support screen including the graph and a hint indicating how to read the graph,
the screen generation unit generates screen data of a screen selected according to the instruction from among a plurality of support screens and a basic screen that includes the graph and does not include the hint,
the analysis result input unit receives, as the analysis result, a result of correspondence analysis of a first item and a second item, the result including a first component and a second component of the first item and a first component and a second component of the second item, and
the screen generation unit creates, as the graph, a scatter plot in which the first item and the second item are plotted on a plane having the first component as the horizontal axis and the second component as the vertical axis.

The device of claim 15,
wherein the analysis result input unit receives, as the analysis result, the result of correspondence analysis performed on a cross-tabulation table in which words are the first item, parts of a sentence are the second item, and the frequency of occurrence of each word in each part of the sentence is the table data.

A text mining support method, executed by a computer, for displaying an analysis result of correspondence analysis, the method comprising:
a step of inputting the analysis result;
a step of inputting an instruction from a user;
a step of generating screen data of a screen including a graph representing the analysis result; and
a step of displaying a screen based on the screen data,
wherein the step of generating the screen data generates, according to the instruction, screen data of a support screen including the graph and a hint indicating how to read the graph, and
the screen data of the support screen includes, as the hint, a description of knowledge about how to read the graph.

A text mining support device that displays an analysis result of correspondence analysis, the device comprising:
an analysis result input unit that inputs the analysis result;
an instruction input unit that inputs an instruction from a user;
a screen generation unit that generates screen data of a screen including a graph representing the analysis result; and
an analysis result display unit that displays a screen based on the screen data,
wherein the screen generation unit generates, according to the instruction, screen data of a support screen including the graph and a hint indicating how to read the graph, and
the screen data of the support screen includes, as the hint, a description of knowledge about how to read the graph.