KR102175658B1

KR102175658B1 - Text mining method, text mining program and text mining apparatus

Info

Publication number: KR102175658B1
Application number: KR1020190023397A
Authority: KR
Inventors: 징롱 저우
Original assignee: 가부시키가이샤 스크린 홀딩스
Priority date: 2018-03-20
Filing date: 2019-02-27
Publication date: 2020-11-06
Also published as: KR20190110435A; TWI736860B; JP2019164592A; CN110309260A; TW201941083A; CN110309260B; JP7078429B2

Abstract

The text mining method includes a step of extracting words from text data consisting of dated sentences, a step of performing hierarchical cluster analysis on the extracted words for each analysis period, and a step of displaying a screen including the results of the hierarchical cluster analysis. When an instruction designating a target word is input in a first screen including the analysis results, a second screen is displayed that shows the change over time of the cluster containing the target word by presenting, along a time axis, cluster names based on the words included in that cluster. This allows the change over time in the hierarchical cluster analysis results to be recognized easily.

Description

Text mining method, text mining program, and text mining device {TEXT MINING METHOD, TEXT MINING PROGRAM AND TEXT MINING APPARATUS}

The present invention relates to text mining, and more particularly to a text mining method, a text mining program, and a text mining apparatus that display a screen including the results of hierarchical cluster analysis.

In recent years, text mining, which analyzes free-form text data and derives useful information from the analysis results, has attracted attention. In text mining, information is obtained, for example, by extracting words from the text data to be analyzed and analyzing the frequency and tendency of their occurrence.

The following considers a text mining apparatus that performs hierarchical cluster analysis on words extracted from text data and displays a screen including the results. In hierarchical cluster analysis, clusters containing highly similar words are created hierarchically on the basis of the similarity between words. The results of hierarchical cluster analysis are generally presented to the analyst as a tree diagram (dendrogram) such as the one shown in FIG. 10. From these results, the analyst can grasp an outline of the text data.

Japanese Laid-Open Patent Publication No. 2018-18118 describes a text mining apparatus that displays the results of hierarchical cluster analysis in the manner shown in FIG. 11. Given a number of clusters m and a maximum number n of data items displayed per cluster, the apparatus described in that document obtains m clusters from the results of the hierarchical cluster analysis, displays the m clusters on the screen as cloud-shaped figures, and displays n or fewer words inside each cluster.

Some text data, such as maintenance work records and call center telephone response records, consists of dated sentences and accumulates over a long period. When hierarchical cluster analysis is performed on such text data, the text data is divided, for example, by month, and hierarchical cluster analysis is performed on the text data of each month. The results of the hierarchical cluster analysis are thereby obtained month by month.

In this case, the analyst selects a word to focus on in the text data (hereinafter referred to as a target word) and wants to know, for example, which cluster contains the target word in each month, when the cluster containing the target word changes, and how the frequency of occurrence of the target word changes over time. With a conventional text mining apparatus, however, the user cannot easily recognize the change over time in the hierarchical cluster analysis results.

An object of the present invention is therefore to provide a text mining method, a text mining program, and a text mining apparatus that allow a user to easily recognize the change over time in hierarchical cluster analysis results.

A first aspect of the present invention is a text mining method for displaying a screen including an analysis result of text data, the method comprising:

a step of extracting words from text data consisting of dated sentences;

a step of performing hierarchical cluster analysis on the words for each analysis period; and

a step of displaying a screen including a result of the hierarchical cluster analysis,

wherein, when an instruction designating a target word is input in a first screen including the result, the step of displaying the screen displays a second screen showing the change over time of the cluster containing the target word.

In a second aspect of the present invention, in the first aspect of the present invention,

the second screen shows, along a time axis, a cluster name based on the words included in the cluster.

In a third aspect of the present invention, in the second aspect of the present invention,

the cluster name is a concatenation of up to a predetermined number of the words included in the cluster, taken in descending order of frequency of occurrence.

In a fourth aspect of the present invention, in the second aspect of the present invention,

the second screen further includes, at a position corresponding to the time at which the cluster name changes, a mark whose appearance depends on the degree of change of the cluster name.

In a fifth aspect of the present invention, in the fourth aspect of the present invention,

the mark is an arrow whose color depends on the degree of change of the cluster name.

In a sixth aspect of the present invention, in the second aspect of the present invention,

among the words constituting the cluster name, a word that has changed from the preceding cluster name is highlighted in the second screen.

In a seventh aspect of the present invention, in the second aspect of the present invention,

the second screen further includes a graph showing, along the time axis, the change over time in the frequency of occurrence of the target word.

In an eighth aspect of the present invention, in the seventh aspect of the present invention,

the second screen further includes a boundary line at a position corresponding to the time at which the cluster name changes, and the background of the graph has a different appearance for each section delimited by the boundary lines.

In a ninth aspect of the present invention, in the second aspect of the present invention,

when the cluster name changes greatly on many occasions, the step of displaying the screen displays a screen including a warning message.

A tenth aspect of the present invention is a text mining program for displaying a screen including an analysis result of text data,

the program causing a computer to execute, by a CPU using a memory, a step of displaying a screen including a result of the hierarchical cluster analysis,

In an eleventh aspect of the present invention, in the tenth aspect of the present invention,

In a twelfth aspect of the present invention, in the eleventh aspect of the present invention,

In a thirteenth aspect of the present invention, in the eleventh aspect of the present invention,

In a fourteenth aspect of the present invention, in the thirteenth aspect of the present invention,

In a fifteenth aspect of the present invention, in the eleventh aspect of the present invention,

In a sixteenth aspect of the present invention, in the eleventh aspect of the present invention,

In a seventeenth aspect of the present invention, in the sixteenth aspect of the present invention,

In an eighteenth aspect of the present invention, in the eleventh aspect of the present invention,

A nineteenth aspect of the present invention is a text mining apparatus that displays a screen including an analysis result of text data, the apparatus comprising:

a word extraction unit that extracts words from text data consisting of dated sentences;

a clustering processing unit that performs hierarchical cluster analysis on the words for each analysis period; and

a screen display unit that displays a screen including a result of the hierarchical cluster analysis,

wherein, when an instruction designating a target word is input in a first screen including the result, the screen display unit displays a second screen showing the change over time of the cluster containing the target word.

In a twentieth aspect of the present invention, in the nineteenth aspect of the present invention,

According to the first, tenth, or nineteenth aspect, when an instruction designating a target word is input in the first screen including the results of the hierarchical cluster analysis, a second screen showing the change over time of the cluster containing the target word is displayed, so that the user can easily recognize the change over time in the hierarchical cluster analysis results.

According to the second, eleventh, or twentieth aspect, cluster names based on the words included in the cluster containing the target word are shown along a time axis, so that the user can easily recognize the change over time of the cluster containing the target word.

According to the third or twelfth aspect, cluster names formed by concatenating the words that occur most frequently in the cluster containing the target word are shown along the time axis, so that the user can easily recognize the change over time of the cluster containing the target word.
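The naming rule of the third aspect can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name `cluster_name`, the separator, the alphabetical tie-break, and the sample words and counts are all assumptions made for the example.

```python
from collections import Counter


def cluster_name(cluster_words, frequency, max_words=3, sep=" / "):
    """Join up to max_words of the cluster's words, most frequent first.

    Ties are broken alphabetically (an assumption; the text does not say).
    """
    ranked = sorted(cluster_words, key=lambda w: (-frequency.get(w, 0), w))
    return sep.join(ranked[:max_words])


# Hypothetical word counts for one analysis period.
freq = Counter({"disassembly": 12, "pump": 9, "leak": 9, "valve": 4})
name = cluster_name({"pump", "valve", "disassembly", "leak"}, freq)
print(name)
```

Because the name depends only on the most frequent words, it changes from one period to the next exactly when the top of the frequency ranking inside the cluster changes.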

According to the fourth, fifth, thirteenth, or fourteenth aspect, a second screen is displayed that includes a mark whose appearance depends on the degree of change of the name of the cluster containing the target word (an arrow whose color depends on the degree of change), so that the user can easily recognize the degree of change of the cluster containing the target word.
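One way to turn a degree of change into an arrow color is sketched below. The text does not define how the degree is measured or which colors are used, so both are assumptions here: the degree is taken as the number of words in the new name that were absent from the previous name, and the three colors are placeholders.

```python
def change_degree(prev_words, next_words):
    """Number of words in the new cluster name that the previous name lacked."""
    return len(set(next_words) - set(prev_words))


def arrow_color(degree, name_length):
    """Map a degree of change to a hypothetical arrow color (None means no mark)."""
    if degree == 0:
        return None
    if 3 * degree <= name_length:      # at most one third of the name changed
        return "green"
    if 3 * degree <= 2 * name_length:  # at most two thirds changed
        return "orange"
    return "red"                       # most or all of the name changed


prev = ["disassembly", "leak", "pump"]
curr = ["disassembly", "leak", "motor"]
print(arrow_color(change_degree(prev, curr), len(curr)))
```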

According to the sixth or fifteenth aspect, the words that have changed among those constituting the name of the cluster containing the target word are highlighted, so that the user can easily recognize how the frequently occurring words in the cluster containing the target word have changed.
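Deciding which words of a cluster name to highlight amounts to comparing it with the preceding name. The sketch below assumes that a word counts as "changed" when it did not appear in the preceding name; the function name and sample words are hypothetical.

```python
def mark_changed_words(prev_words, curr_words):
    """Pair each word of the current name with True if it is new relative to the previous name."""
    previous = set(prev_words)
    return [(word, word not in previous) for word in curr_words]


marked = mark_changed_words(
    ["disassembly", "leak", "pump"],
    ["disassembly", "motor", "leak"],
)
print(marked)
```

A display layer could then render the words flagged `True` in bold or in a different color.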

According to the seventh or sixteenth aspect, a screen is displayed that includes, in addition to the change over time of the cluster containing the target word, a graph showing the change over time in the frequency of occurrence of the target word, so that the user can easily recognize the change over time in the hierarchical cluster analysis results.

According to the eighth or seventeenth aspect, a boundary line is displayed at a position corresponding to the time at which the name of the cluster containing the target word changes, and the appearance of the graph background is changed at each boundary line, so that the user can easily recognize when the cluster containing the target word changes.
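The boundary lines and alternating backgrounds can be derived from the sequence of cluster names, one per analysis period. The following is a sketch under assumptions: periods are identified by list index, a boundary is placed wherever two consecutive names differ, and the two background styles are placeholders.

```python
def change_boundaries(names):
    """Indices i at which the cluster name differs from the name of period i - 1."""
    return [i for i in range(1, len(names)) if names[i] != names[i - 1]]


def background_bands(names, styles=("white", "lightgray")):
    """Split the time axis into (start, end, style) bands, switching style at each boundary."""
    if not names:
        return []
    bands = []
    start = 0
    for k, end in enumerate(change_boundaries(names) + [len(names)]):
        bands.append((start, end, styles[k % len(styles)]))
        start = end
    return bands


names = ["A", "A", "B", "B", "B", "C"]
print(change_boundaries(names))
print(background_bands(names))
```

Each band is a half-open index range, so adjacent bands share the boundary index without overlapping.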

According to the ninth or eighteenth aspect, a screen including a warning message is displayed when the name of the cluster containing the target word changes greatly on many occasions, so that the user can recognize that the hierarchical cluster analysis is not working well.
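The text does not state what counts as a "great" change or how many such changes trigger the warning. The sketch below therefore uses two hypothetical thresholds: a change is large when its degree is at least `large`, and the warning is raised when large changes make up more than `max_ratio` of all transitions.

```python
def should_warn(degrees, large=2, max_ratio=0.5):
    """Return True when large name changes dominate the period-to-period transitions."""
    if not degrees:
        return False
    large_count = sum(1 for degree in degrees if degree >= large)
    return large_count / len(degrees) > max_ratio


# Degrees of change between consecutive analysis periods (sample values).
print(should_warn([3, 3, 0, 3]))
print(should_warn([0, 1, 0, 3]))
```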

FIG. 1 is a block diagram showing the configuration of a text mining apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of a computer that operates as the text mining apparatus shown in FIG. 1.
FIG. 3 is a flowchart showing the operation of the text mining apparatus shown in FIG. 1.
FIG. 4 is a diagram showing an example of a window, displayed by the text mining apparatus shown in FIG. 1, that presents the results of hierarchical cluster analysis.
FIG. 5 is a diagram showing an operation for designating a target word in the window shown in FIG. 4.
FIG. 6 is a diagram showing an example of a window, displayed by the text mining apparatus shown in FIG. 1, that presents the change over time in the analysis results.
FIG. 7 is a diagram showing an example of a display screen of the text mining apparatus shown in FIG. 1.
FIG. 8A is a diagram showing an example of the change over time in hierarchical cluster analysis results.
FIG. 8B is a continuation of FIG. 8A.
FIG. 8C is a continuation of FIG. 8B.
FIG. 8D is a continuation of FIG. 8C.
FIG. 9 is a diagram showing a window displayed by the text mining apparatus shown in FIG. 1.
FIG. 10 is a diagram showing an example of a tree diagram.
FIG. 11 is a diagram showing how hierarchical cluster analysis results are displayed in a conventional text mining apparatus.

A text mining method, a text mining program, and a text mining apparatus according to an embodiment of the present invention are described below with reference to the drawings. The text mining method according to this embodiment is typically executed using a computer. The text mining program according to this embodiment is a program for carrying out the text mining method using a computer. The text mining apparatus according to this embodiment is typically configured using a computer. A computer that executes the text mining program functions as the text mining apparatus.

FIG. 1 is a block diagram showing the configuration of the text mining apparatus according to the embodiment of the present invention. The text mining apparatus 10 shown in FIG. 1 includes an instruction input unit 11, a text data storage unit 12, a word extraction unit 13, a clustering processing unit 14, an analysis result storage unit 15, and a screen display unit 16. The text mining apparatus 10 performs hierarchical cluster analysis on the text data stored in the text data storage unit 12 and displays a screen including the analysis results.

The operation of the text mining apparatus 10 is outlined as follows. Instructions from the user (the analyst of the text data) are input to the instruction input unit 11. The text data storage unit 12 stores one or more sets of free-form text data. The word extraction unit 13 extracts words from the text data by performing morphological analysis on the text data stored in the text data storage unit 12. The clustering processing unit 14 performs hierarchical cluster analysis on the words extracted by the word extraction unit 13. The analysis result storage unit 15 stores the analysis results produced by the clustering processing unit 14. The screen display unit 16 displays screen data based on the analysis results stored in the analysis result storage unit 15.

The text data storage unit 12 stores text data that consists of dated sentences and has accumulated over a long period (for example, several years). Using the instruction input unit 11, the user inputs, for example, an instruction designating the text data to be analyzed, the analysis period, and the analysis interval, and an instruction designating a target word. In accordance with the user's instructions, the word extraction unit 13, the clustering processing unit 14, and the screen display unit 16 operate to display a screen including the results of hierarchical cluster analysis of the text data. The screen display unit 16 also displays, in accordance with the user's instructions, a screen including the change over time in the hierarchical cluster analysis results.

FIG. 2 is a block diagram showing the configuration of a computer that functions as the text mining apparatus 10. The computer 20 shown in FIG. 2 includes a CPU 21, a main memory 22, a storage unit 23, an input unit 24, a display unit 25, a communication unit 26, and a recording medium reading unit 27. The main memory 22 is, for example, a DRAM. The storage unit 23 is, for example, a hard disk or a solid state drive. The input unit 24 includes, for example, a keyboard 28 and a mouse 29. The display unit 25 is, for example, a liquid crystal display. The communication unit 26 is an interface circuit for wired or wireless communication. The recording medium reading unit 27 is an interface circuit for a recording medium 30 that stores programs and the like. The recording medium 30 is, for example, a non-transitory recording medium such as a CD-ROM, a DVD-ROM, or a USB memory.

When the computer 20 executes the text mining program 31, the storage unit 23 stores the text mining program 31 and the text data 32. The text mining program 31 and the text data 32 may, for example, have been received from a server or another computer via the communication unit 26, or read from the recording medium 30 via the recording medium reading unit 27.

When the text mining program 31 is executed, the text mining program 31 and the text data 32 are copied to the main memory 22. Using the main memory 22 as working memory, the CPU 21 executes the text mining program 31 stored in the main memory 22 and thereby performs, among other things, a process of extracting words from the text data 32, a process of performing hierarchical cluster analysis on the extracted words, and a process of displaying a screen including the analysis results. The computer 20 then functions as the text mining apparatus 10. The configuration of the computer 20 described above is merely an example, and the text mining apparatus 10 can be configured using any computer.

FIG. 3 is a flowchart showing the operation of the text mining apparatus 10. Before the operation shown in FIG. 3 is performed, the text data storage unit 12 already stores one or more sets of free-form, cumulatively accumulated text data. The text data consists of sentences that each have a date (for example, a work date or a reception date), and the text data is divided into a plurality of parts by date. The text mining apparatus 10 processes the text data that the user designates from among the text data stored in the text data storage unit 12.
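Dividing dated sentences into parts by date can be illustrated with a simple grouping. The sketch assumes each record is a `(date, sentence)` pair and that the parts are calendar months; the record layout, the function name, and the sample sentences are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date


def group_by_month(dated_sentences):
    """Group (date, sentence) records into lists keyed by (year, month)."""
    groups = defaultdict(list)
    for day, sentence in dated_sentences:
        groups[(day.year, day.month)].append(sentence)
    return dict(groups)


records = [
    (date(2015, 1, 5), "Pump disassembled and seal replaced."),
    (date(2015, 1, 20), "Leak found at valve."),
    (date(2015, 2, 3), "Motor replaced."),
]
print(group_by_month(records))
```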

In FIG. 3, the instruction input unit 11 first receives from the user an instruction designating the text data to be analyzed, the analysis period, and the analysis interval (step S101). The user enters this information, using the input unit 24, into a dialog box (not shown) displayed on the screen. The received instruction is output to each unit of the text mining apparatus 10.

Next, the word extraction unit 13 reads the designated text data from the text data storage unit 12 (step S102). The word extraction unit 13 then extracts words from the text data read in step S102 by performing morphological analysis on it (step S103). At this point, the word extraction unit 13 extracts from the text data only the words needed for the subsequent analysis.

Next, the clustering processing unit 14 performs hierarchical cluster analysis on the words extracted in step S103 (step S104). The clustering processing unit 14 then obtains the frequency of occurrence of the words extracted in step S103 (step S105). Next, the analysis result storage unit 15 stores the results of the hierarchical cluster analysis obtained in step S104 and the word frequencies obtained in step S105 (step S106).

The clustering processing unit 14 receives from the instruction input unit 11 the analysis period and the analysis interval designated by the user. The analysis period is the portion of the text data to be analyzed over which hierarchical cluster analysis is actually performed. The analysis period is divided into a plurality of periods in units of the analysis interval. For example, when the analysis period runs from June 1, 2005 to May 31, 2015 and the analysis interval is one month, the 11-year analysis period is divided into 132 periods.
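Dividing an analysis period by a one-month analysis interval can be sketched as below. The function name and the half-open `(first_day, first_day_of_next_month)` representation are assumptions, and the sketch assumes the analysis period starts on the first day of a month.

```python
from datetime import date


def split_into_months(start, end):
    """Return half-open (first_day, next_first_day) ranges for every month from start to end."""
    periods = []
    year, month = start.year, start.month
    while date(year, month, 1) <= end:
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        periods.append((date(year, month, 1), date(next_year, next_month, 1)))
        year, month = next_year, next_month
    return periods


periods = split_into_months(date(2014, 11, 1), date(2015, 2, 28))
print(len(periods))
```

The length of the returned list corresponds to the number of periods after division, and a sentence belongs to a period when its date `d` satisfies `first_day <= d < next_first_day`.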

Let p be the number of periods after division. In step S104, the clustering processing unit 14 performs hierarchical cluster analysis for each of the p periods. More specifically, for each of the p periods, the clustering processing unit 14 performs hierarchical cluster analysis on the words extracted in step S103, using those sentences of the text data read in step S102 whose dates fall within the period. The clustering processing unit 14 obtains the similarity between two words on the basis of, for example, the distance between the two words in the text data 32 (how far apart the two words appear). On the basis of the similarities obtained, the clustering processing unit 14 performs hierarchical cluster analysis using a predetermined method (for example, the shortest distance method, the longest distance method, the group average method, the centroid method, or Ward's method).
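The per-period analysis can be illustrated with a small, dependency-free sketch of the shortest distance method (single linkage). Several details are assumptions made for the example and are not taken from the text: each sentence is already a list of tokens, the distance between two words is the smallest positional gap observed in any sentence, words that never share a sentence are infinitely far apart, and merging stops once `m` clusters remain instead of building the full tree.

```python
from itertools import combinations


def word_distances(sentences):
    """Smallest positional gap seen between each pair of distinct words in any sentence."""
    dist = {}
    for tokens in sentences:
        for (i, a), (j, b) in combinations(enumerate(tokens), 2):
            if a != b:
                key = frozenset((a, b))
                dist[key] = min(dist.get(key, float("inf")), j - i)
    return dist


def single_linkage(words, dist, m):
    """Repeatedly merge the two closest clusters until only m clusters remain."""
    clusters = [{w} for w in sorted(set(words))]

    def gap(c1, c2):
        # Shortest distance method: distance between the closest pair of members.
        return min(dist.get(frozenset((a, b)), float("inf")) for a in c1 for b in c2)

    while len(clusters) > m:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: gap(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Tokenized sentences of one analysis period (sample data).
sentences = [["pump", "leak", "repair"], ["valve", "stuck"], ["pump", "leak"]]
words = [w for tokens in sentences for w in tokens]
print(single_linkage(words, word_distances(sentences), m=2))
```

Running the same two functions once per period, on only the sentences dated within that period, yields one clustering per period.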

In step S105, the clustering processing unit 14 obtains the frequency of occurrence of the words for each of the p periods. Step S104 thus yields p hierarchical cluster analysis results, and step S105 yields p frequencies of occurrence for each word. In step S106, the analysis result storage unit 15 stores the hierarchical cluster analysis result and the word frequencies for each of the p periods.
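Counting word frequencies per period is straightforward with `collections.Counter`. The sketch assumes the extracted words of each period are available as a list keyed by a `(year, month)` tuple; the helper names and sample data are hypothetical.

```python
from collections import Counter


def frequencies_by_period(tokens_by_period):
    """Build one Counter of word frequencies per analysis period."""
    return {period: Counter(tokens) for period, tokens in tokens_by_period.items()}


def frequency_series(word, freq_by_period):
    """Frequency of one word in every period, in chronological order (0 when absent)."""
    return [(period, freq_by_period[period][word]) for period in sorted(freq_by_period)]


tokens_by_period = {
    (2015, 1): ["pump", "leak", "pump"],
    (2015, 2): ["leak"],
}
freqs = frequencies_by_period(tokens_by_period)
print(frequency_series("pump", freqs))
```

A series of this shape is what a graph of the target word's frequency over time would plot.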

Next, the screen display unit 16 displays a screen including the results of the hierarchical cluster analysis stored in the analysis result storage unit 15 (step S107). FIG. 4 is a diagram showing an example of the window displayed in step S107. The window 41 shown in FIG. 4 contains the results of the hierarchical cluster analysis. Once a number of clusters is set for the results of the hierarchical cluster analysis, the words included in each cluster are determined. When displaying a screen including the results of the hierarchical cluster analysis, the text mining apparatus 10 displays a plurality of clusters in the manner shown in FIG. 4 instead of a tree diagram.

The text mining apparatus 10 has, as operating parameters, the number of clusters and the maximum number of data items displayed per cluster. The former is denoted m and the latter n below. Both are set to predetermined initial values in the initial state, and the user may set them to arbitrary values using the instruction input unit 11. In the text mining apparatus 10, the words extracted in step S103 are classified into m clusters, each of which contains one or more words. The window 41 displays the m clusters as cloud-shaped figures, with the words included in each cluster displayed inside it. The number of words displayed inside each cluster is limited to n or fewer. For example, when n = 5 and a cluster contains 10 words, five words are displayed inside that cluster on the screen.
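The limit of n displayed words per cluster can be sketched as follows. The text does not say which words are kept when a cluster has more than n, so the choice of the most frequent words (with alphabetical tie-breaking) is an assumption, as are the function name and sample data.

```python
def visible_words(clusters, frequency, n):
    """For each cluster, keep at most n words, most frequent first."""
    shown = []
    for words in clusters:
        ranked = sorted(words, key=lambda w: (-frequency.get(w, 0), w))
        shown.append(ranked[:n])
    return shown


clusters = [{"pump", "leak", "repair"}, {"valve"}]
frequency = {"pump": 5, "leak": 7, "repair": 1, "valve": 2}
print(visible_words(clusters, frequency, n=2))
```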

Next, the instruction input unit 11 receives an instruction from the user (step S111). The text mining apparatus 10 then determines whether the instruction received in step S111 is an instruction designating a target word (step S112). If so, control of the text mining apparatus 10 proceeds to step S121; if not, control proceeds to step S113.

In the latter case, the instruction received in step S111 is, for example, an instruction to move a window, to hide a window, or to close a window. The screen display unit 16 displays the updated screen in accordance with the instruction received in step S111 (step S113). Control of the text mining apparatus 10 then returns to step S111.

When step S111 is executed, a screen including the result of the hierarchical cluster analysis is being displayed. In the following, it is assumed that a screen including the window 41 shown in FIG. 4 is displayed when step S111 is executed. Clicking the button of the mouse 29 while the mouse cursor 43 is over an element in the display screen is referred to as "clicking the element", the cluster containing the target word is referred to as the "target-word cluster", and the name given to the target-word cluster is referred to as the "target-word cluster name".

FIG. 5 shows the operation for designating a target word. In the window 41, the user clicks the word to be designated as the target word (here, "disassembly") (first click). A context menu 42 then appears in the display screen. From the context menu 42, the user clicks the item "To change of analysis result over time" (second click). By this operation, the word clicked first is designated as the target word.

If the determination in step S112 is YES, the screen display unit 16 reads out the result of the hierarchical cluster analysis and the frequency of appearance of the target word from the analysis result storage unit 15 (step S121). Next, based on the data read out, the screen display unit 16 displays a screen showing the change over time of the hierarchical cluster analysis result (step S122).

FIG. 6 shows the window, displayed in step S122, that presents the change over time of the analysis result. The window 51 shown in FIG. 6 is displayed when "disassembly" has been designated as the target word in step S111. The window 51 is displayed superimposed on the window 41 shown in FIG. 4, for example as shown in FIG. 7.

The window 51 includes a line graph 52 that shows, along a time axis extending in the horizontal direction, the change over time in the frequency of appearance of the target word. As the frequency of appearance of the target word, for example, the ratio of the number of occurrences of the target word to the total number of occurrences of all words included in the target-word cluster is used. In response to an instruction from the user, the frequency of appearance may be switched to the number of occurrences of the target word.
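The ratio plotted by the line graph can be written out as a short Python sketch. The function name `target_word_ratio`, the month keys, and all counts below are hypothetical; only the formula (occurrences of the target word divided by the total occurrences of all words in its cluster) comes from the description.

```python
def target_word_ratio(target, cluster_counts):
    """Share of the cluster's total occurrences contributed by the target word."""
    total = sum(cluster_counts.values())
    if total == 0:
        return 0.0  # assumption: an empty cluster is plotted as 0
    return cluster_counts.get(target, 0) / total


# Hypothetical per-period occurrence counts of the target-word cluster.
monthly = {
    "2018-01": {"drive": 6, "disassembly": 4},
    "2018-02": {"drive": 5, "belt": 3, "rotation": 1, "disassembly": 1},
}
series = {
    month: target_word_ratio("disassembly", counts)
    for month, counts in monthly.items()
}
print(series)  # {'2018-01': 0.4, '2018-02': 0.1}
```

Switching the graph to raw occurrence counts, as the description allows, would simply plot `cluster_counts.get(target, 0)` instead of the ratio.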

The composition of a cluster obtained by hierarchical cluster analysis (the elements included in the cluster) changes over time. To show how a cluster changes over time, each cluster is automatically given a name. When a cluster contains only one word, that word is used as the cluster name as it is. When a cluster contains two words, the cluster name is the two words joined in descending order of frequency of appearance. When a cluster contains three or more words, the cluster name is the three most frequent words in the cluster, joined in descending order of frequency of appearance. Cluster names made up of the same set of words are treated as the same cluster name even if the order of the words differs.
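The naming rule above can be sketched in Python as follows. This is an illustration under stated assumptions: the function names `cluster_name` and `same_name`, the "/" separator, the alphabetical tie-breaking, and all counts are hypothetical and are not taken from the description.

```python
def cluster_name(word_counts, max_words=3, sep="/"):
    """Join up to max_words most frequent words in descending frequency.

    Assumption: ties are broken alphabetically; the description does not
    say how equal frequencies are ordered.
    """
    ranked = sorted(word_counts, key=lambda w: (-word_counts[w], w))
    return sep.join(ranked[:max_words])


def same_name(name_a, name_b, sep="/"):
    """Names built from the same set of words count as the same name."""
    return set(name_a.split(sep)) == set(name_b.split(sep))


# Hypothetical occurrence counts.
print(cluster_name({"valve": 4}))                    # valve
print(cluster_name({"disassembly": 4, "drive": 7}))  # drive/disassembly
print(cluster_name({"belt": 9, "rotation": 7, "check": 5,
                    "motor": 3, "tightness": 1}))    # belt/rotation/check
print(same_name("drive/disassembly", "disassembly/drive"))  # True
```

Comparing names as sets keeps a mere reordering of the top words from being reported as a change of cluster name.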

FIGS. 8A to 8D show an example of how the result of hierarchical cluster analysis changes over time. FIGS. 8A to 8D show the results of hierarchical cluster analysis for different months. In FIGS. 8A to 8D, each cloud-shaped figure represents a cluster, and each underlined character string represents a cluster name. The size of each circle indicates the frequency of appearance of the word written inside it.

In the analysis result shown in FIG. 8A, the words extracted from the text data are classified into a cluster containing "drive" and "disassembly", a cluster containing "exhaust", "pressure", "flow", and "valve", and a cluster containing "belt", "rotation", "check", "motor", and "tightness". These three clusters are named "drive/disassembly", "exhaust/pressure/flow", and "belt/rotation/check", respectively. In the analysis results shown in FIGS. 8B to 8D, the three clusters are named in the same manner.

When "disassembly" is designated as the target word, the target-word cluster name is "drive/disassembly" in the analysis result shown in FIG. 8A, "drive/belt/rotation" in the analysis result shown in FIG. 8B, "exhaust/pressure/flow" in the analysis result shown in FIG. 8C, and "exhaust/pressure/disassembly" in the analysis result shown in FIG. 8D. The target-word cluster name thus changes over time.
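Looking up the target-word cluster name for one analysis period can be sketched as below. The cluster memberships for FIG. 8A follow the description; every occurrence count, the contents of the second period, and the function name `target_cluster_name` are hypothetical and chosen only so that the resulting names match those quoted above.

```python
def target_cluster_name(target, clusters, max_words=3, sep="/"):
    """Name of the cluster that contains the target word, or None.

    Each cluster is a dict mapping word -> occurrence count. Ties are
    broken alphabetically (an assumption).
    """
    for counts in clusters:
        if target in counts:
            ranked = sorted(counts, key=lambda w: (-counts[w], w))
            return sep.join(ranked[:max_words])
    return None


# FIG. 8A memberships from the description; counts are made up.
fig_8a = [
    {"drive": 7, "disassembly": 4},
    {"exhaust": 9, "pressure": 8, "flow": 6, "valve": 2},
    {"belt": 9, "rotation": 7, "check": 5, "motor": 3, "tightness": 1},
]
# Hypothetical later period in which "disassembly" has moved to another cluster.
later = [
    {"drive": 8, "belt": 6, "rotation": 5, "disassembly": 2},
    {"exhaust": 9, "pressure": 8, "flow": 6},
]
print(target_cluster_name("disassembly", fig_8a))  # drive/disassembly
print(target_cluster_name("disassembly", later))   # drive/belt/rotation
```

Note that the target word itself need not appear in the name: in the second period it belongs to the cluster but is not among its three most frequent words.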

In addition to the line graph 52, the window 51 shown in FIG. 6 includes target-word cluster names 53, boundary lines 54, and arrows 55. The target-word cluster names 53 are displayed above the line graph 52 along the time axis extending in the horizontal direction. Each boundary line 54 is displayed within the line graph 52 at a position corresponding to a time at which the target-word cluster name 53 changes. A target-word cluster name 53 is displayed for each period delimited by the boundary lines 54. The background of the line graph 52 has a different appearance (for example, a different color or a different pattern) on each side of a boundary line 54. Among the words making up a target-word cluster name 53, any word that has changed from the preceding cluster name (a word not included in the old target-word cluster name but included in the new one) is highlighted. In the window 51, such words are displayed in a Gothic typeface and in italics.
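The placement of boundary lines and the choice of highlighted words can be sketched as follows. This is an illustration only: the function name `name_segments`, the dictionary layout, and the period labels are hypothetical, and names are compared as word sets, following the rule that word order does not distinguish cluster names.

```python
def name_segments(names_by_period, sep="/"):
    """Group consecutive periods that share a cluster name (as a word set).

    Each segment records its first and last period, its words, and the
    words to highlight (present in this name, absent from the previous one).
    """
    segments = []
    for period, name in names_by_period:
        words = name.split(sep)
        if segments and set(segments[-1]["words"]) == set(words):
            segments[-1]["end"] = period  # same name: extend the segment
            continue
        if segments:
            previous = set(segments[-1]["words"])
            highlight = [w for w in words if w not in previous]
        else:
            highlight = []  # nothing precedes the first name
        segments.append(
            {"start": period, "end": period, "words": words, "highlight": highlight}
        )
    return segments


# Hypothetical monthly history of the target-word cluster name.
history = [
    ("2018-01", "drive/disassembly"),
    ("2018-02", "disassembly/drive"),
    ("2018-03", "drive/belt/rotation"),
    ("2018-04", "exhaust/pressure/flow"),
]
segments = name_segments(history)
boundaries = [s["start"] for s in segments[1:]]
print(boundaries)                # ['2018-03', '2018-04']
print(segments[1]["highlight"])  # ['belt', 'rotation']
```

A boundary line would be drawn at the start of every segment except the first, and each segment could be given its own background style.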

Each arrow 55 is displayed above a boundary line 54, at a position corresponding to a time at which the target-word cluster name 53 changes. The arrow 55 is displayed in a manner that depends on the degree of change of the target-word cluster name 53. When all of the words making up the target-word cluster name 53 change, a red arrow 55r is displayed. When two of the words change, a blue arrow 55b is displayed. When one of the words changes, a black arrow 55n is displayed. The display manner of the arrow 55 may be chosen freely as long as it differs according to the degree of change of the target-word cluster name 53. For example, the display size of the arrow 55 may differ according to the degree of change.

In the example shown in FIG. 6, the target-word cluster name 53 changes over time in the order "drive/disassembly", "drive/belt/rotation", "exhaust/pressure/flow", and "exhaust/pressure/disassembly". In the first change, two of the words making up the target-word cluster name 53 change, so a blue arrow 55b is displayed above the first boundary line 54. In the second change, all of the words change, so a red arrow 55r is displayed above the second boundary line 54. In the third change, one word changes, so a black arrow 55n is displayed above the third boundary line 54.
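The arrow selection can be checked against the FIG. 6 example with a small Python sketch. The function name `arrow_kind` and the string labels are hypothetical. A "changed" word is taken to be a word of the new name that is absent from the old name, as in the highlighting rule, and the "all words changed" test is applied first; both are readings of the description rather than stated details.

```python
def arrow_kind(old_name, new_name, sep="/"):
    """Classify a change of cluster name as 'red', 'blue', or 'black'.

    Returns None when no new word appears, a case the description
    does not cover (assumption).
    """
    old = set(old_name.split(sep))
    new = new_name.split(sep)
    changed = [w for w in new if w not in old]
    if not changed:
        return None
    if len(changed) == len(new):
        return "red"    # every word of the new name is new
    if len(changed) == 2:
        return "blue"
    if len(changed) == 1:
        return "black"
    return None  # only reachable for names longer than three words


names = [
    "drive/disassembly",
    "drive/belt/rotation",
    "exhaust/pressure/flow",
    "exhaust/pressure/disassembly",
]
arrows = [arrow_kind(a, b) for a, b in zip(names, names[1:])]
print(arrows)  # ['blue', 'red', 'black']
```

The result reproduces the blue, red, and black arrows described for the three boundary lines.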

Next, the screen display unit 16 counts the arrows 55 included in the screen displayed in step S122, by type (step S123). Then, based on the number of arrows 55 of each type, the screen display unit 16 determines whether the change in the target-word cluster name 53 is large (step S124). For example, the screen display unit 16 may determine YES when the number of red arrows 55r exceeds 30% of the total number of arrows 55, or may determine YES when the sum of the number of red arrows 55r and the number of blue arrows 55b exceeds 60% of the total number of arrows 55. If the determination is YES, control of the text mining apparatus 10 proceeds to step S125; if NO, control proceeds to step S111.
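The decision of step S124 can be sketched as below. The 30% and 60% thresholds are the examples given in the description. Combining the two criteria with a logical OR is an assumption, since the description presents them as alternative possibilities, and the function name `change_is_large` is hypothetical.

```python
from collections import Counter


def change_is_large(arrows, red_share=0.30, red_blue_share=0.60):
    """Return True when the cluster-name changes are judged to be large.

    arrows: list of labels such as 'red', 'blue', 'black'.
    Assumption: either criterion alone is enough to return True.
    """
    total = len(arrows)
    if total == 0:
        return False  # no name change at all, so nothing to warn about
    kinds = Counter(arrows)
    red = kinds["red"]
    blue = kinds["blue"]
    return red / total > red_share or (red + blue) / total > red_blue_share


print(change_is_large(["blue", "red", "black"]))       # True
print(change_is_large(["black"] * 4 + ["red"]))        # False
```

In the first call one of three arrows is red (about 33%, above the 30% threshold), so a warning would be shown. In the second call red arrows account for 20% and red plus blue for 20%, so neither threshold is exceeded.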

In the former case, the screen display unit 16 displays a screen including a warning message (step S125). FIG. 9 shows the window displayed in step S125. The window 61 shown in FIG. 9 contains a warning message stating that, because the composition of the target-word cluster often changes greatly, readjusting the settings of the hierarchical cluster analysis (for example, the number of clusters or the number of target words) is recommended. Control of the text mining apparatus 10 then proceeds to step S111.

As described above, the text mining method according to the present embodiment includes a step of extracting words from text data consisting of sentences having dates (steps S102 and S103), a step of performing hierarchical cluster analysis on the extracted words for each analysis period (step S104), and a step of displaying a screen including the result of the hierarchical cluster analysis (steps S107, S113, and S121 to S125). When an instruction designating a target word is input in a first screen including the analysis result (the screen including the window 41) (FIG. 5), the step of displaying the screen (step S122) displays a second screen (the screen including the window 51) showing the change over time of the cluster containing the target word. Because the second screen is displayed in response to that instruction, the user can easily recognize how the result of the hierarchical cluster analysis changes over time.

The second screen also shows, along the time axis, a cluster name (the target-word cluster name 53) based on the words included in the cluster containing the target word. This cluster name is formed by joining up to a predetermined number (three or fewer) of the words included in that cluster in descending order of frequency of appearance. The user can therefore easily recognize how the cluster containing the target word changes over time.

The second screen also includes, at a position corresponding to the time at which the name of the cluster containing the target word changes, a mark whose appearance depends on the degree of change of the cluster name. This mark may be an arrow 55 whose color depends on the degree of change of the cluster name. By displaying a second screen that includes such a mark (the arrow 55), the user can easily recognize the degree of change in the name of the cluster containing the target word. In addition, among the words making up the cluster name, words that have changed from the preceding cluster name ("belt", "rotation", and so on in FIG. 6) are highlighted in the second screen. The user can therefore easily recognize how the frequently appearing words in the cluster containing the target word have changed.

The second screen also includes a graph (the line graph 52) showing, along the time axis, the change over time in the frequency of appearance of the target word. By displaying a screen that includes this graph in addition to the change over time of the cluster containing the target word, the user can easily recognize how the result of the hierarchical cluster analysis changes over time. The second screen further includes a boundary line 54 at a position corresponding to the time at which the name of the cluster containing the target word changes, and the background of the graph has a different appearance on each side of a boundary line. The user can therefore easily recognize when the cluster containing the target word changes. Furthermore, when the name of the cluster containing the target word often changes greatly, the step of displaying the screen displays a screen including a warning message (the screen including the window 61). The user can therefore recognize that the hierarchical cluster analysis is not working well.

The text mining apparatus 10 and the text mining program 31 according to the present embodiment have the same features as the text mining method described above and provide the same effects. With the text mining method, the text mining apparatus 10, and the text mining program 31 according to the present embodiment, the user can easily recognize how the result of the hierarchical cluster analysis changes over time.

Although the present invention has been described in detail above, the above description is illustrative in all respects and is not restrictive. It is understood that numerous other changes and modifications can be devised without departing from the scope of the present invention.

10: text mining apparatus
11: instruction input unit
12: text data storage unit
13: word extraction unit
14: clustering processing unit
15: analysis result storage unit
16: screen display unit
20: computer
21: CPU
22: main memory
29: mouse
30: recording medium
31: text mining program
32: text data
41, 51, 61: window
42: context menu
43: mouse cursor
52: line graph
53: target-word cluster name
54: boundary line
55: arrow

Claims

delete

A text mining method for displaying a screen including an analysis result of text data,
A step of extracting a word from text data consisting of a statement having a date,
A step of performing a hierarchical cluster analysis on the word for each analysis period,
Comprising a step of displaying a screen including a result of the hierarchical cluster analysis,
When an instruction for designating a target word is input in a first screen including the result, the step of displaying the screen displays a second screen indicating a change over time of the cluster including the target word,
The second screen displays a cluster name based on words included in the cluster along a time axis,
The cluster name is characterized in that the words included in the cluster are concatenated by a predetermined number or less in the order of high frequency of appearance.

A text mining method for displaying a screen including an analysis result of text data,
A step of extracting a word from text data consisting of a statement having a date,
A step of performing a hierarchical cluster analysis on the word for each analysis period,
Comprising a step of displaying a screen including a result of the hierarchical cluster analysis,
When an instruction for designating a target word is input in a first screen including the result, the step of displaying the screen displays a second screen indicating a change over time of the cluster including the target word,
The second screen displays a cluster name based on words included in the cluster along a time axis,
The second screen further includes, at a position corresponding to a time when the cluster name changes, a mark having an appearance according to the degree of change of the cluster name.

The method of claim 4,
The mark is an arrow having a color according to the degree of change of the cluster name.

A text mining method for displaying a screen including an analysis result of text data,
A step of extracting a word from text data consisting of a statement having a date,
A step of performing a hierarchical cluster analysis on the word for each analysis period,
Comprising a step of displaying a screen including a result of the hierarchical cluster analysis,
When an instruction for designating a target word is input in a first screen including the result, the step of displaying the screen displays a second screen indicating a change over time of the cluster including the target word,
The second screen displays a cluster name based on words included in the cluster along a time axis,
A text mining method, characterized in that, among words constituting the cluster name, a word changed from the previous cluster name is highlighted in the second screen.

delete

A text mining method for displaying a screen including an analysis result of text data,
A step of extracting a word from text data consisting of a statement having a date,
A step of performing a hierarchical cluster analysis on the word for each analysis period,
Comprising a step of displaying a screen including a result of the hierarchical cluster analysis,
When an instruction for designating a target word is input in a first screen including the result, the step of displaying the screen displays a second screen indicating a change over time of the cluster including the target word,
The second screen displays a cluster name based on words included in the cluster along a time axis,
The second screen further includes a graph showing a change over time in the frequency of appearance of the target word along the time axis,
The second screen further includes a boundary line at a position corresponding to the time when the cluster name changes, and the background of the graph has a different aspect for each boundary line.

delete

A text mining program stored in a recording medium for displaying a screen including an analysis result of text data,
A step of extracting a word from text data consisting of a statement having a date,
A step of performing a hierarchical cluster analysis on the word for each analysis period,
A step of displaying a screen including a result of the hierarchical cluster analysis, the program causing a CPU of a computer to execute the above steps using a memory,
When an instruction for designating a target word is input in a first screen including the result, the step of displaying the screen displays a second screen indicating a change over time of the cluster including the target word,
The second screen displays a cluster name based on words included in the cluster along a time axis,
The cluster name is a text mining program stored in a recording medium, characterized in that the words included in the cluster are concatenated by a predetermined number or less in order of high frequency of appearance.

A text mining program stored in a recording medium for displaying a screen including an analysis result of text data,
A step of extracting a word from text data consisting of a statement having a date,
A step of performing a hierarchical cluster analysis on the word for each analysis period,
A step of displaying a screen including a result of the hierarchical cluster analysis, the program causing a CPU of a computer to execute the above steps using a memory,
When an instruction for designating a target word is input in a first screen including the result, the step of displaying the screen displays a second screen indicating a change over time of the cluster including the target word,
The second screen displays a cluster name based on words included in the cluster along a time axis,
The second screen further includes, at a position corresponding to a time when the cluster name changes, a mark having an appearance according to the degree of change of the cluster name.

The text mining program of claim 13,
Wherein the mark is an arrow having a color according to a degree of change of the cluster name.

A text mining program stored in a recording medium for displaying a screen including an analysis result of text data,
A step of extracting a word from text data consisting of a statement having a date,
A step of performing a hierarchical cluster analysis on the word for each analysis period,
A step of displaying a screen including a result of the hierarchical cluster analysis, the program causing a CPU of a computer to execute the above steps using a memory,
When an instruction for designating a target word is input in a first screen including the result, the step of displaying the screen displays a second screen indicating a change over time of the cluster including the target word,
The second screen displays a cluster name based on words included in the cluster along a time axis,
A text mining program stored in a recording medium, characterized in that, among words constituting the cluster name, a word that has changed from a previous cluster name is highlighted in the second screen.

delete

A text mining program stored in a recording medium for displaying a screen including an analysis result of text data,
A step of extracting a word from text data consisting of a statement having a date,
A step of performing a hierarchical cluster analysis on the word for each analysis period,
A step of displaying a screen including a result of the hierarchical cluster analysis, the program causing a CPU of a computer to execute the above steps using a memory,
When an instruction for designating a target word is input in a first screen including the result, the step of displaying the screen displays a second screen indicating a change over time of the cluster including the target word,
The second screen displays a cluster name based on words included in the cluster along a time axis,
The second screen further includes a graph showing a change over time in the frequency of appearance of the target word along the time axis,
The second screen further includes a boundary line at a position corresponding to a time when the cluster name changes, and the background of the graph has a different aspect for each boundary line.

delete

A text mining device that displays a screen including an analysis result of text data,
A word extraction unit for extracting a word from text data consisting of a statement having a date,
A clustering processing unit that performs hierarchical cluster analysis on the word by analysis period,
A screen display unit for displaying a screen including a result of the hierarchical cluster analysis,
When an instruction for designating a target word is input in the first screen including the result, the screen display unit displays a second screen indicating changes over time of the cluster including the target word,
The second screen displays a cluster name based on words included in the cluster along a time axis,
Wherein the second screen further includes a mark having an aspect according to a degree of change of the cluster name at a position corresponding to a time when the cluster name changes.

The text mining device of claim 19,
Wherein the mark is an arrow having a color according to a degree of change of the cluster name.