JP2018152023A

JP2018152023A - Text mining support method and device

Info

Publication number: JP2018152023A
Application number: JP2017049728A
Authority: JP
Inventors: 康平西川; Kohei Nishikawa
Original assignee: Screen Holdings Co Ltd
Current assignee: Screen Holdings Co Ltd
Priority date: 2017-03-15
Filing date: 2017-03-15
Publication date: 2018-09-27
Anticipated expiration: 2037-03-15
Also published as: KR102230102B1; TWI692696B; JP6829117B2; CN108628928B; CN108628928A; TW201835790A; KR20180105566A

Abstract

Problem: To make the process of deriving insights from a graph showing the result of correspondence analysis more efficient. Solution: When a scatter diagram showing the result of correspondence analysis is displayed, a support screen containing the scatter diagram and a hint on how to read it is shown. When a scatter diagram of words and variables is displayed, the screen designated by the user is shown, chosen from: a basic screen containing no hint; a first support screen whose hint explains how to judge words near the origin; a second support screen whose hint explains how to judge the relevance of words that characterize a variable; a third support screen whose hint explains how to judge similarity between words; and a fourth support screen whose hint explains how to judge similarity between variables. [Selected drawing] FIG. 7

Description

The present invention relates to data mining techniques, and more particularly to a text mining support method and apparatus that support the execution of text mining.

In recent years, data mining techniques, which apply data analysis methods such as statistics and pattern recognition to large volumes of data in order to derive insights (such as rules that appear in the data), have attracted attention. Data mining applied to text data is called text mining. The following considers the case where correspondence analysis, one kind of data analysis technique, is performed on text data.

In correspondence analysis, the items of a cross tabulation table are rearranged so that the correlation between the column items and the row items is maximized. The result of correspondence analysis is generally presented as a scatter diagram (a two-dimensional graph). For example, performing correspondence analysis on the cross tabulation table shown in FIG. 2 yields the scatter diagram shown in FIG. 3.

In relation to the present invention, Patent Document 1 describes a text mining system that presents the user with an analysis procedure for using a plurality of analysis tools. With that system, even a user with little knowledge of or experience in text mining can carry out an analysis by using the analysis tools in a suitable order.

Patent Document 1: JP-A-2005-44087

In correspondence analysis, examining the resulting scatter diagram and drawing insights from it matters more than producing the scatter diagram itself. However, users with little knowledge of or experience in text mining do not know how to read a scatter diagram, so when they look at one they do not know where to start. Such users therefore cannot efficiently derive insights from the scatter diagram.

The system described in Patent Document 1 presents an analysis procedure to the user, but it does not support the process of deriving insights from the analysis result. The above problem therefore cannot be solved even with that system.

An object of the present invention is therefore to provide a text mining support method and apparatus that make the process of deriving insights from a graph showing the result of correspondence analysis more efficient.

A first aspect of the present invention is a text mining support method for displaying an analysis result obtained by correspondence analysis, the method comprising:
a step of inputting the analysis result;
a step of inputting an instruction from a user;
a step of generating screen data of a screen including a graph showing the analysis result; and
a step of displaying a screen based on the screen data,
wherein the step of generating the screen data generates, in response to the instruction, screen data of a support screen including the graph and a hint indicating how to read the graph.

According to a second aspect of the present invention, in the first aspect,
the step of generating the screen data generates screen data of a screen selected in response to the instruction from among a plurality of support screens and a basic screen that includes the graph but not the hint.

According to a third aspect of the present invention, in the second aspect,
in the step of inputting the analysis result, a result of associating first items with second items is input as the analysis result, the result including a first component and a second component of each first item and a first component and a second component of each second item; and
the step of generating the screen data creates, as the graph, a scatter diagram in which the first items and the second items are plotted in a plane whose horizontal axis is the first component and whose vertical axis is the second component.

According to a fourth aspect of the present invention, in the third aspect,
the plurality of support screens include a first support screen including, as the hint, the statement that first items near the origin of the scatter diagram have no distinctive features.

According to a fifth aspect of the present invention, in the fourth aspect,
the scatter diagram included in the first support screen shows a range near the origin.

According to a sixth aspect of the present invention, in the third aspect,
the plurality of support screens include a second support screen including, as the hint, the statement that a first item located in the direction leading away from the origin toward a second item in the scatter diagram characterizes that second item.

According to a seventh aspect of the present invention, in the sixth aspect,
the scatter diagram included in the second support screen shows a range extending in the direction leading away from the origin toward a selected second item.

According to an eighth aspect of the present invention, in the third aspect,
the plurality of support screens include a third support screen including, as the hint, the statement that first items close to one another in the scatter diagram have high similarity.

According to a ninth aspect of the present invention, in the eighth aspect,
the scatter diagram included in the third support screen shows a range around a selected first item.

According to a tenth aspect of the present invention, in the third aspect,
the plurality of support screens include a fourth support screen including, as the hint, the statement that second items close to one another in the scatter diagram have high similarity.

According to an eleventh aspect of the present invention, in the tenth aspect,
the scatter diagram included in the fourth support screen shows a mark indicating the second item closest to a selected second item.

According to a twelfth aspect of the present invention, in the third aspect,
in the step of inputting the analysis result, the result of performing correspondence analysis on a cross tabulation table is input as the analysis result, the table having words as the first items, parts of a text as the second items, and the frequency of occurrence of each word in each part of the text as its cell data.

A thirteenth aspect of the present invention is a text mining support apparatus for displaying an analysis result obtained by correspondence analysis, the apparatus comprising:
an analysis result input unit for inputting the analysis result;
an instruction input unit for inputting an instruction from a user;
a screen generation unit that generates screen data of a screen including a graph showing the analysis result; and
an analysis result display unit that displays a screen based on the screen data,
wherein the screen generation unit generates, in response to the instruction, screen data of a support screen including the graph and a hint indicating how to read the graph.

According to a fourteenth aspect of the present invention, in the thirteenth aspect,
the screen generation unit generates screen data of a screen selected in response to the instruction from among a plurality of support screens and a basic screen that includes the graph but not the hint.

According to a fifteenth aspect of the present invention, in the fourteenth aspect,
a result of associating first items with second items is input to the analysis result input unit as the analysis result, the result including a first component and a second component of each first item and a first component and a second component of each second item; and
the screen generation unit creates, as the graph, a scatter diagram in which the first items and the second items are plotted in a plane whose horizontal axis is the first component and whose vertical axis is the second component.

According to a sixteenth aspect of the present invention, in the fifteenth aspect,
the result of performing correspondence analysis on a cross tabulation table is input to the analysis result input unit as the analysis result, the table having words as the first items, parts of a text as the second items, and the frequency of occurrence of each word in each part of the text as its cell data.

According to the first or thirteenth aspect, the user can efficiently derive insights from the graph showing the result of correspondence analysis by using the support screen, which contains both the graph and a hint on how to read it.

According to the second or fourteenth aspect, selectively displaying either a support screen with a hint or a basic screen without one makes it possible to display a screen suited to the user's level. In addition, selectively displaying one of several support screens makes it possible to present the user with several ways of reading the graph.

According to the third or fifteenth aspect, the user can efficiently derive insights from the scatter diagram showing the result of correspondence analysis between the first items and the second items.

According to the fourth aspect, the user can efficiently derive insights from the graph showing the result of correspondence analysis by using the knowledge that first items near the origin of the scatter diagram have no distinctive features.

According to the fifth aspect, the user can easily identify the first items that have no distinctive features by looking at the range shown.

According to the sixth aspect, the user can efficiently derive insights from the graph showing the result of correspondence analysis by using the knowledge that a first item located in the direction leading away from the origin toward a second item in the scatter diagram characterizes that second item.

According to the seventh aspect, the user can easily identify the first items that characterize the selected second item by looking at the range shown.

According to the eighth aspect, the user can efficiently derive insights from the graph showing the result of correspondence analysis by using the knowledge that first items close to one another in the scatter diagram have high similarity.

According to the ninth aspect, the user can easily identify the first items that are highly similar to the selected first item by looking at the range shown.

According to the tenth aspect, the user can efficiently derive insights from the graph showing the result of correspondence analysis by using the knowledge that second items close to one another in the scatter diagram have high similarity.

According to the eleventh aspect, the user can easily identify the second item most similar to the selected second item by looking at the mark shown.

According to the twelfth or sixteenth aspect, the user can efficiently derive insights from the scatter diagram showing the result of correspondence analysis between words and parts of a text.

FIG. 1 is a block diagram showing the configuration of a text mining support apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing a cross tabulation table subjected to correspondence analysis.
FIG. 3 is a diagram showing a scatter diagram created by the text mining support apparatus shown in FIG. 1.
FIG. 4 is a block diagram showing the configuration of a computer that functions as the text mining support apparatus shown in FIG. 1.
FIG. 5 is a flowchart showing the operation of the text mining support apparatus shown in FIG. 1.
FIG. 6 is a diagram showing the basic screen of the text mining support apparatus shown in FIG. 1.
FIG. 7 is a diagram showing the first support screen of the text mining support apparatus shown in FIG. 1.
FIG. 8 is a diagram showing the second support screen of the text mining support apparatus shown in FIG. 1.
FIG. 9 is a diagram showing the third support screen of the text mining support apparatus shown in FIG. 1.
FIG. 10 is a diagram showing the fourth support screen of the text mining support apparatus shown in FIG. 1.

Hereinafter, a text mining support method, a text mining support apparatus, and a text mining support program according to an embodiment of the present invention are described with reference to the drawings. The text mining support method according to this embodiment is typically executed on a computer, and the text mining support apparatus is typically built from a computer. The text mining support program is a program for carrying out the text mining support method on a computer; a computer executing the program functions as the text mining support apparatus.

FIG. 1 is a block diagram showing the configuration of a text mining support apparatus according to an embodiment of the present invention. The text mining support apparatus 10 shown in FIG. 1 includes an analysis result input unit 11, an instruction input unit 12, a screen generation unit 13, and an analysis result display unit 14. The text mining support apparatus 10 receives the result of correspondence analysis performed on text data and displays on screen a scatter diagram showing that analysis result.

In FIG. 1, a text analysis device 5 is provided upstream of the text mining support apparatus 10. Text data 1 is input to the text analysis device 5. In the following description, the text data 1 is assumed to be document data having a plurality of parts (hereinafter, "chapters"). In the context of correspondence analysis, a chapter is also called a "variable". The text analysis device 5 extracts the words contained in the text data 1 and creates a cross tabulation table with the words as row items, the chapters as column items, and the frequency of occurrence of each word in each chapter as cell data. The text analysis device 5 then performs correspondence analysis on the cross tabulation table and outputs an analysis result 2. Correspondence analysis yields two or more components that represent the features of the data being processed. The analysis result 2 contains at least the first and second components of each word, the first and second components of each variable, the contribution rate of the first component, and the contribution rate of the second component.
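The description does not specify how the text analysis device 5 carries out the correspondence analysis. The following Python sketch shows one standard way to obtain the quantities listed above; it is an illustration, not the device's actual method. It assumes that the "components" are principal coordinates of the rows (words) and columns (variables) obtained from the singular value decomposition of the standardized residual matrix $S$, and that a component's contribution rate is its share of the total inertia. To stay within the standard library, the squared singular values are taken from a Jacobi eigen-decomposition of $S^\top S$. The function names are hypothetical, and the sign of each axis is arbitrary, so a plot may appear mirrored.

```python
import math


def jacobi_eigen(a, tol=1e-20, max_sweeps=50):
    """Eigenvalues/eigenvectors of a small symmetric matrix (cyclic Jacobi method).
    Returns (values, vectors) sorted by descending value; vectors[k] is the k-th eigenvector."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    return [a[i][i] for i in order], [[v[k][i] for k in range(n)] for i in order]


def correspondence_analysis(table, dims=2):
    """table[i][j]: frequency of word i in chapter (variable) j.
    Returns (word_coords, variable_coords, contribution_rates) for the first `dims`
    components, using principal coordinates for both words and variables."""
    n = float(sum(map(sum, table)))
    rows, cols = len(table), len(table[0])
    p = [[x / n for x in row] for row in table]
    r = [sum(row) for row in p]                                   # row masses
    c = [sum(p[i][j] for i in range(rows)) for j in range(cols)]  # column masses
    # Standardized residuals S = D_r^(-1/2) (P - r c^T) D_c^(-1/2)
    s = [[(p[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j]) for j in range(cols)]
         for i in range(rows)]
    # Eigenvalues of S^T S are the principal inertias (squared singular values of S)
    sts = [[sum(s[i][a] * s[i][b] for i in range(rows)) for b in range(cols)]
           for a in range(cols)]
    lam, vec = jacobi_eigen(sts)
    total = sum(lam)  # total inertia (= chi-square statistic / n)
    rates = [lam[k] / total for k in range(dims)]
    # Variable coordinates G = D_c^(-1/2) V Sigma; word coordinates F = D_r^(-1/2) S V
    var = [[vec[k][j] * math.sqrt(lam[k]) / math.sqrt(c[j]) for k in range(dims)]
           for j in range(cols)]
    word = [[sum(s[i][j] * vec[k][j] for j in range(cols)) / math.sqrt(r[i])
             for k in range(dims)] for i in range(rows)]
    return word, var, rates
```

Given a word-by-chapter frequency table, the returned word and variable coordinates can be plotted together in one plane, as in FIG. 3, and the two rates are the contribution rates of the first and second components.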

FIG. 2 is a diagram showing a cross tabulation table subjected to correspondence analysis. The cross tabulation table in FIG. 2 is created by inputting the text of the novel "No Longer Human" (人間失格) to the text analysis device 5 as text data 1. The novel has five chapters, "Foreword", "First Notebook", "Second Notebook", "Third Notebook", and "Afterword", and contains words such as "自分" (self), "人間" (human), "ヒラメ" (Flounder), and "気持" (feeling). The cross tabulation table in FIG. 2 has words such as these as row items and the five variables (chapters) "Foreword", "First Notebook", "Second Notebook", "Third Notebook", and "Afterword" as column items. The word "人間" (human) appears 38 times in the "First Notebook"; accordingly, the cell of the table whose row item is "人間" and whose column item is "First Notebook" (hatched) contains 38. So that the correspondence analysis works well, the table in FIG. 2 contains only words whose frequency of occurrence is at or above a predetermined value.
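A cross tabulation table of this kind, including the frequency threshold just mentioned, can be sketched as follows. The sketch assumes that each chapter has already been split into words (for Japanese text this requires a morphological analyzer) and that the threshold applies to a word's total count over all chapters; the description states only that words below a predetermined frequency are excluded. `build_cross_table` and `min_count` are hypothetical names.

```python
from collections import Counter


def build_cross_table(chapters, min_count):
    """chapters: dict mapping chapter name -> list of already-extracted words.
    Returns (words, chapter_names, table) where table[i][j] is the number of
    occurrences of words[i] in chapter_names[j]. Only words whose total count
    over all chapters is at least min_count are kept (assumed threshold rule)."""
    names = list(chapters)
    counts = {name: Counter(tokens) for name, tokens in chapters.items()}
    totals = Counter()
    for per_chapter in counts.values():
        totals.update(per_chapter)
    # Rows: words that pass the frequency threshold, in a stable (sorted) order
    words = sorted(w for w, total in totals.items() if total >= min_count)
    # Columns: chapters (variables); cells: occurrence counts
    table = [[counts[name][w] for name in names] for w in words]
    return words, names, table
```

Each row of `table` is one word and each column one chapter, which matches the orientation of FIG. 2.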

FIG. 3 is a diagram showing a scatter diagram created by the text mining support apparatus 10. As described above, the analysis result 2 input to the text mining support apparatus 10 contains at least the first and second components of each word, the first and second components of each variable, the contribution rate of the first component, and the contribution rate of the second component. The screen generation unit 13 creates the scatter diagram by plotting the words and variables in a plane whose horizontal axis is the first component and whose vertical axis is the second component. For example, the scatter diagram in FIG. 3 is created from the analysis result 2 for the cross tabulation table in FIG. 2. The analysis result display unit 14 displays a screen containing the created scatter diagram.

In FIG. 3, each word is plotted as a filled circle and labeled in regular type, and each variable is plotted as an open square and labeled in italics. FIG. 3 also shows the contribution rates of the first and second components. The contribution rate of the first component is generally larger than that of the second. Taking this into account, the distance $d$ between two points $P(p_1, p_2)$ and $Q(q_1, q_2)$ in the scatter diagram is defined by the following equation (1), using the contribution rate $k_1$ of the first component and the contribution rate $k_2$ of the second component:
$$d = \sqrt{\{k_1 (p_1 - q_1)\}^2 + \{k_2 (p_2 - q_2)\}^2} \tag{1}$$
In the following description, "distance" means the distance in the scatter diagram defined by equation (1). Because $k_1 > k_2$ in general, a circle drawn in the scatter diagram (the set of points at a fixed distance $d$ from a center) appears as an ellipse that is shorter in the first-component direction than in the second-component direction; its half-widths are $d/k_1$ and $d/k_2$, respectively.
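Equation (1) amounts to scaling each axis by its contribution rate before taking the ordinary Euclidean distance. The short Python sketch below (hypothetical function names) computes it and also shows why a circle appears as an ellipse: the points at distance $d$ from a center lie $d/k_1$ away along the first-component axis but $d/k_2$ away along the second.

```python
import math


def scatter_distance(p, q, k1, k2):
    """Distance of equation (1): scale each axis difference by the contribution
    rate of that component, then take the Euclidean norm."""
    return math.hypot(k1 * (p[0] - q[0]), k2 * (p[1] - q[1]))


def circle_half_widths(d, k1, k2):
    """Half-widths, in plot coordinates, of the set of points at distance d
    from a center: d/k1 along the first-component axis, d/k2 along the second."""
    return d / k1, d / k2
```

For example, with made-up rates $k_1 = 0.4$ and $k_2 = 0.2$, a "circle" of radius 0.1 extends 0.25 along the horizontal axis and 0.5 along the vertical axis.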

FIG. 4 is a block diagram showing the configuration of a computer that functions as the text mining support apparatus 10. The computer 20 shown in FIG. 4 includes a CPU 21, a main memory 22, a storage unit 23, an input unit 24, a display unit 25, a communication unit 26, and a recording medium reading unit 27. The main memory 22 is, for example, DRAM. The storage unit 23 is, for example, a hard disk or a solid-state drive. The input unit 24 includes, for example, a keyboard 28 and a mouse 29. The display unit 25 is, for example, a liquid crystal display. The communication unit 26 is an interface circuit for wired or wireless communication. The recording medium reading unit 27 is an interface circuit for a recording medium 30 that stores programs and the like. The recording medium 30 is, for example, a non-transitory recording medium such as a CD-ROM or DVD-ROM.

When the computer 20 executes a text mining support program 31, the storage unit 23 stores the text mining support program 31 and the analysis result 2. These may, for example, be received from a server or another computer via the communication unit 26, or read from the recording medium 30 via the recording medium reading unit 27.

When the text mining support program 31 is executed, the program and the analysis result 2 are copied to the main memory 22. The CPU 21 executes the text mining support program 31 stored in the main memory 22, using the main memory 22 as working memory, and thereby processes the analysis result 2 stored there. The computer 20 then functions as the text mining support apparatus 10. The configuration of the computer 20 described above is merely an example; the text mining support apparatus 10 can be built with any computer.

A user with knowledge of and experience in text mining knows the following about scatter diagrams showing the result of correspondence analysis, and can use this knowledge to derive insights from a scatter diagram:
First knowledge: "Words near the origin have no distinctive features."
Second knowledge: "A word located in the direction leading away from the origin toward a variable is strongly associated with that variable and characterizes it."
Third knowledge: "Words that are close to each other are highly similar."
Fourth knowledge: "Variables that are close to each other are highly similar."
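The four pieces of knowledge translate naturally into geometric queries on the scatter diagram, which is roughly what the ranges and marks on the first to fourth support screens visualize. The Python sketch below is hypothetical: the description does not define the exact shape or size of these ranges, so `radius`, the cone half-angle `max_angle_deg`, and `min_dist` are illustrative parameters. All distances follow equation (1), and doing the angle test in the same contribution-rate-scaled coordinates is an assumption.

```python
import math


def _scaled(p, k1, k2):
    # Map a plot point into coordinates where equation (1) is plain Euclidean distance.
    return (k1 * p[0], k2 * p[1])


def dist(p, q, k1, k2):
    a, b = _scaled(p, k1, k2), _scaled(q, k1, k2)
    return math.hypot(a[0] - b[0], a[1] - b[1])


def near_origin(words, k1, k2, radius):
    """First knowledge: words within `radius` of the origin (no distinctive features)."""
    return [w for w, p in words.items() if dist(p, (0.0, 0.0), k1, k2) <= radius]


def toward_variable(words, var_pos, k1, k2, max_angle_deg=20.0, min_dist=0.0):
    """Second knowledge: words inside a cone that starts at the origin and opens
    toward the variable; words within min_dist of the origin are skipped."""
    v = _scaled(var_pos, k1, k2)
    v_norm = math.hypot(*v)
    hits = []
    for w, p in words.items():
        s = _scaled(p, k1, k2)
        s_norm = math.hypot(*s)
        if s_norm <= min_dist or v_norm == 0.0:
            continue
        cos_angle = (s[0] * v[0] + s[1] * v[1]) / (s_norm * v_norm)
        if cos_angle >= math.cos(math.radians(max_angle_deg)):
            hits.append(w)
    return hits


def near_item(items, selected, k1, k2, radius):
    """Third knowledge: other items within `radius` of the selected item."""
    center = items[selected]
    return [w for w, p in items.items()
            if w != selected and dist(p, center, k1, k2) <= radius]


def nearest_item(items, selected, k1, k2):
    """Fourth knowledge: the single item closest to the selected one
    (what the mark on the fourth support screen points at)."""
    center = items[selected]
    others = [w for w in items if w != selected]
    return min(others, key=lambda w: dist(items[w], center, k1, k2))
```

`words` and `items` are assumed to be dicts mapping a label to its (first component, second component) position; the same `near_item` and `nearest_item` helpers work for words or for variables.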

A user with little knowledge of or experience in text mining, on the other hand, lacks this knowledge and therefore cannot efficiently derive insights from the scatter diagram. To solve this problem, the text mining support apparatus 10 not only displays a screen containing the scatter diagram as a basic screen but also, in response to an instruction from the user, displays as a support screen a screen containing the scatter diagram together with a hint on how to read it.

The operation of each unit of the text mining support apparatus 10 is described with reference to FIG. 1. The analysis result input unit 11 receives the analysis result 2 output from an external device (for example, the text analysis device 5). The instruction input unit 12 receives instructions from the user. The screen generation unit 13 creates a scatter diagram showing the analysis result 2 and generates screen data of a screen containing it. In response to the user's instruction received through the instruction input unit 12, the screen generation unit 13 selectively creates either screen data of a support screen, which contains the scatter diagram and a hint, or screen data of the basic screen, which contains the scatter diagram but no hint. The analysis result display unit 14 displays a screen based on the screen data generated by the screen generation unit 13. In the following, the text mining support apparatus 10 is assumed to display four types of support screen, referred to as the first to fourth support screens.

FIG. 5 is a flowchart showing the operation of the text mining support apparatus 10. First, the CPU 21 transfers the analysis result 2 output from the text analysis device 5 to the main memory 22, whereby the analysis result 2 is input to the text mining support apparatus 10 (step S101). Next, the CPU 21 creates a scatter diagram from the analysis result 2 by plotting the words and variables in a plane with the first component on the horizontal axis and the second component on the vertical axis (step S102). The CPU 21 then creates screen data of a basic screen containing the scatter diagram created in step S102 (step S103), and causes the display unit 25 to display the basic screen based on the screen data created in step S103 (step S104).

FIG. 6 shows the basic screen. The basic screen 100 shown in FIG. 6 includes a screen selection window 101 and a scatter diagram window 102. The scatter diagram window 102 displays the scatter diagram shown in FIG. 3. The screen selection window 101 has six radio buttons 103, hereinafter referred to as the first to sixth radio buttons, which correspond to the basic screen, the first to fourth support screens, and "End", respectively. While the basic screen 100 is displayed, the user operates the keyboard 28 or the mouse 29 to select one of the first to sixth radio buttons, thereby entering an instruction.

The CPU 21 receives the user's instruction entered through the screen selection window 101 (step S105) and then proceeds to one of the following steps according to that instruction (step S106). If the instruction is "basic screen" (the first radio button is selected), the CPU 21 proceeds to step S107 and generates screen data for the basic screen in the same way as in step S103. If the instruction is "first support screen" (the second radio button), the CPU 21 proceeds to step S108 and generates screen data for the first support screen. If the instruction is "second support screen" (the third radio button), the CPU 21 proceeds to step S109 and generates screen data for the second support screen. If the instruction is "third support screen" (the fourth radio button), the CPU 21 proceeds to step S110 and generates screen data for the third support screen. If the instruction is "fourth support screen" (the fifth radio button), the CPU 21 proceeds to step S111 and generates screen data for the fourth support screen. If the instruction is "End" (the sixth radio button), the CPU 21 ends the processing.

After executing any one of steps S107 to S111, the CPU 21 proceeds to step S112 and causes the display unit 25 to display a screen based on the screen data created in that step (step S112). The CPU 21 then returns to step S105. In this way, the text mining support device 10 displays the screen that the user selects from among the basic screen and the first to fourth support screens.
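
The branching in steps S103 to S112 amounts to a simple dispatch loop. The following Python sketch is illustrative only and assumes steps S101 and S102 have already run: the `Choice` enum, the `generators` mapping, and the `display` callback are hypothetical stand-ins for the radio buttons 103, the screen generation unit 13, and the analysis result display unit 14, respectively.

```python
from enum import Enum
from typing import Callable, Dict, Iterable


class Choice(Enum):
    """Hypothetical names for the first to sixth radio buttons 103."""
    BASIC = 1
    SUPPORT_1 = 2
    SUPPORT_2 = 3
    SUPPORT_3 = 4
    SUPPORT_4 = 5
    END = 6


def run_screen_loop(choices: Iterable[Choice],
                    generators: Dict[Choice, Callable[[], str]],
                    display: Callable[[str], None]) -> None:
    """Sketch of steps S103 to S112 of FIG. 5."""
    display(generators[Choice.BASIC]())      # S103-S104: basic screen is shown first
    for choice in choices:                   # S105: next user instruction
        if choice is Choice.END:             # S106: sixth radio button ends processing
            return
        screen_data = generators[choice]()   # S107-S111: build the selected screen
        display(screen_data)                 # S112: show it, then return to S105
```

Keeping the generators in a mapping means an additional support screen (as allowed later in the description) only needs a new enum member and a new entry, not a new branch.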

The components of the computer 20 shown in FIG. 4 and the steps shown in FIG. 5 correspond to the components of the text mining support device 10 shown in FIG. 1 as follows. The CPU 21 executing step S101 functions as the analysis result input unit 11. The input unit 24, together with the CPU 21 executing step S105, functions as the instruction input unit 12. The CPU 21 executing steps S102 to S103 and S106 to S111 functions as the screen generation unit 13. The display unit 25, together with the CPU 21 executing steps S104 and S112, functions as the analysis result display unit 14.

FIG. 7 shows the first support screen. The first support screen 110 shown in FIG. 7 includes the screen selection window 101, a scatter diagram window 112, a word list window 113, and a hint window 114. The first support screen 110 concerns the first piece of knowledge: "Words near the origin have no prominent features." By viewing the first support screen 110, the user can efficiently apply this knowledge to derive findings from the scatter diagram.

Before the first support screen 110 is displayed, the user operates the keyboard 28 or the mouse 29 to specify the range regarded as near the origin; an initial value for this range may be set in advance. The scatter diagram window 112 displays the scatter diagram shown in FIG. 3, on which a circle 115 (which appears as an ellipse) marks the vicinity of the origin. The circle 115 is preferably drawn in a color different from the scatter diagram (for example, red). Because the range near the origin is indicated by the circle 115 in the scatter diagram of the first support screen 110, the user can easily identify the words that have no prominent features by looking at that range.

The word list window 113 shows a word list in which the words near the origin (the words inside the circle 115) are listed together with their distances from the origin, in ascending order of distance. The upward-pointing triangle in the word list window 113 indicates this ascending order. The hint window 114 presents the first piece of knowledge under the title "Analysis points" and is placed so that it overlaps the scatter diagram window 112.

The size of the circle 115 can be determined by any method. For example, the user may set it by specifying the number of words the circle 115 should contain (for example, 10), by specifying the proportion of words it should contain (for example, 10% of all words), or by specifying the distance from the origin with the mouse 29 on the first support screen 110.
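
The word list of window 113 and the three ways of sizing the circle 115 can be sketched as follows. This is a minimal illustration, assuming the word coordinates are available as a dictionary mapping each word to its (first component, second component) pair; the function name `words_near_origin` and its parameters are hypothetical, and rounding the ratio to the nearest whole word is an assumption.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def words_near_origin(coords: Dict[str, Point],
                      count: Optional[int] = None,
                      ratio: Optional[float] = None,
                      radius: Optional[float] = None) -> List[Tuple[str, float]]:
    """Return (word, distance from origin) pairs inside circle 115, nearest first.

    Exactly one sizing rule is expected: a word count, a ratio of all words
    (0 to 1), or a radius in scatter-diagram units.
    """
    if sum(arg is not None for arg in (count, ratio, radius)) != 1:
        raise ValueError("give exactly one of count, ratio, radius")
    # Ascending distance, matching the upward triangle in window 113.
    ranked = sorted(((w, math.hypot(x, y)) for w, (x, y) in coords.items()),
                    key=lambda item: item[1])
    if radius is not None:
        return [(w, d) for w, d in ranked if d <= radius]
    if ratio is not None:
        count = round(len(ranked) * ratio)  # e.g. 10% of all words, rounded
    return ranked[:count]
```

Whichever rule the user picks, the circle's drawn radius can then be taken as the distance of the last word in the returned list.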

On the first support screen 110 shown in FIG. 7, the words near the origin (the words inside the circle 115) are displayed in the same manner as the other words. Alternatively, the first support screen may display the words near the origin differently from the other words (for example, in a lighter color), or may omit them entirely. Likewise, the second to fourth support screens may display differently the words that the first support screen displayed differently, and may omit the words that the first support screen omitted.

FIG. 8 shows the second support screen. The second support screen 120 shown in FIG. 8 includes the screen selection window 101, a scatter diagram window 122, a word list window 123, and a hint window 124. The second support screen 120 concerns the second piece of knowledge: "Words lying in the direction from the origin toward a variable, away from the origin, are highly associated with that variable and characterize it." By viewing the second support screen 120, the user can efficiently apply this knowledge to derive findings from the scatter diagram.

Before the second support screen 120 is displayed, the user operates the keyboard 28 or the mouse 29 to select one variable (chapter). The case where the variable "Foreword" is selected is described here. The scatter diagram window 122 displays the scatter diagram shown in FIG. 3, on which are drawn an arrow 125 that starts at the origin and passes through the selected variable, and two half-lines 126 and 127 that start at the origin and each form a predetermined angle (for example, 10°) with the arrow 125. The words lying in the direction from the origin toward the selected variable are those in the region between the half-lines 126 and 127. Because the scatter diagram of the second support screen 120 indicates this region with the half-lines 126 and 127, the user can easily identify the words that characterize the selected variable by looking at it.

The word list window 123 shows a word list in which the words lying in the direction from the origin toward the selected variable (the words in the region between the half-lines 126 and 127) are listed together with their distances from the origin, in descending order of distance. The downward-pointing triangle in the word list window 123 indicates this descending order. In connection with the second piece of knowledge, the word list window 123 also states: "The farther a word is from the origin, the higher its degree of association can be judged to be." The hint window 124 presents the second piece of knowledge under the title "Analysis points" and is placed so that it overlaps the scatter diagram window 122.

The angle between the arrow 125 and each of the half-lines 126 and 127 may be determined by any method, provided that the arrow 125 and the half-lines 126 and 127 lie in the same quadrant. If drawing a half-line at the given angle from the arrow 125 would place it in a different quadrant from the arrow 125, that half-line is instead drawn along the first or second component axis. The arrow 125 is preferably drawn in a color different from the scatter diagram (for example, red), and the half-lines 126 and 127 in a color different from both the scatter diagram and the arrow 125 (for example, blue).
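
The region between the half-lines 126 and 127, including the rule that a half-line is pulled onto an axis rather than leaving the arrow's quadrant, can be sketched as below. This is an assumption-laden illustration, not the device's implementation: coordinates are assumed to be dictionaries of (first component, second component) pairs, `half_angle_deg` is a hypothetical parameter for the predetermined angle on each side of the arrow 125, and a variable lying exactly on an axis is not treated specially.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def words_characterizing(variable: Point, word_coords: Dict[str, Point],
                         half_angle_deg: float = 10.0) -> List[Tuple[str, float]]:
    """Words between half-lines 126 and 127, farthest from the origin first."""
    two_pi = 2 * math.pi
    quarter = math.pi / 2
    theta = math.atan2(variable[1], variable[0]) % two_pi  # direction of arrow 125
    q_lo = math.floor(theta / quarter) * quarter           # quadrant holding arrow 125
    q_hi = q_lo + quarter
    delta = math.radians(half_angle_deg)
    # Clamp each half-line onto the component axis if it would leave the quadrant.
    lo, hi = max(theta - delta, q_lo), min(theta + delta, q_hi)
    hits = []
    for word, (x, y) in word_coords.items():
        r = math.hypot(x, y)
        if r == 0:
            continue  # a word exactly at the origin has no direction
        phi = math.atan2(y, x) % two_pi
        if lo <= phi <= hi:
            hits.append((word, r))
    # Descending distance, matching the downward triangle in window 123.
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Working in polar angle normalized to [0, 2π) keeps the quadrant test to a simple floor division; the clamp is what implements the same-quadrant constraint described above.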

FIG. 9 shows the third support screen. The third support screen 130 shown in FIG. 9 includes the screen selection window 101, a scatter diagram window 132, a word list window 133, and a hint window 134. The third support screen 130 concerns the third piece of knowledge: "Words that are close to each other have high similarity." By viewing the third support screen 130, the user can efficiently apply this knowledge to derive findings from the scatter diagram.

Before the third support screen 130 is displayed, the user operates the keyboard 28 or the mouse 29 to select one word and to specify the range regarded as near the selected word. The case where the word "eye" is selected is described here. The scatter diagram window 132 displays the scatter diagram shown in FIG. 3, on which a circle 135 (which appears as an ellipse) marks the vicinity of the selected word. The circle 135 is preferably drawn in a color different from the scatter diagram (for example, red). Because the range near the selected word is indicated by the circle 135 in the scatter diagram of the third support screen 130, the user can easily identify the words highly similar to the selected word by looking at that range.

The word list window 133 shows a word list in which the words near the selected word (the words inside the circle 135) are listed together with their distances from the selected word, in ascending order of distance. As the third piece of knowledge, the word list window 133 states: "The closer a word is, the higher its similarity can be judged to be." In this example, the word closest to the selected word "eye" is "face"; therefore, the word most similar to "eye" is "face". The hint window 134 states this under the title "Analysis points" and is placed so that it overlaps the scatter diagram window 132.

As with the circle 115 on the first support screen 110, the size of the circle 135 can be determined by any method. For example, the user may set it by specifying the number of words the circle 135 should contain, the proportion of words it should contain, or the distance from the selected word.
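
The word list of window 133 is the same idea as the origin-centred list, but measured from the selected word. A minimal sketch, assuming word coordinates in a dictionary; the name `similar_words` is hypothetical, only the count and distance sizing rules are shown, and the toy coordinates in the check are invented for illustration rather than read from FIG. 3.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def similar_words(selected: str, coords: Dict[str, Point],
                  count: Optional[int] = None,
                  radius: Optional[float] = None) -> List[Tuple[str, float]]:
    """Words inside circle 135 around `selected`, with distances, nearest first.

    If both `radius` and `count` are given, the radius filter is applied first
    and the count then caps the list.
    """
    cx, cy = coords[selected]
    ranked = sorted(((w, math.hypot(x - cx, y - cy))
                     for w, (x, y) in coords.items() if w != selected),
                    key=lambda item: item[1])
    if radius is not None:
        ranked = [(w, d) for w, d in ranked if d <= radius]
    if count is not None:
        ranked = ranked[:count]
    return ranked
```

For example, with hypothetical coordinates `{"eye": (0.5, 0.5), "face": (0.6, 0.5), "hand": (0.5, 0.9), "sea": (-1.0, -1.0)}`, `similar_words("eye", coords, count=1)` puts "face" first, mirroring the example in the text.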

FIG. 10 shows the fourth support screen. The fourth support screen 140 shown in FIG. 10 includes the screen selection window 101, a scatter diagram window 142, a variable list window 143, and a hint window 144. The fourth support screen 140 concerns the fourth piece of knowledge: "Variables that are close to each other have high similarity." By viewing the fourth support screen 140, the user can efficiently apply this knowledge to derive findings from the scatter diagram.

Before the fourth support screen 140 is displayed, the user operates the keyboard 28 or the mouse 29 to select one variable. The case where the variable "Foreword" is selected is described here. The scatter diagram window 142 displays the scatter diagram shown in FIG. 3, on which an arrow 145 is drawn from the selected variable to the variable closest to it. The arrow 145 is preferably drawn in a color different from the scatter diagram (for example, red). Because the scatter diagram of the fourth support screen 140 shows the arrow 145 pointing to the variable closest to the selected variable, the user can easily identify the variable most similar to the selected variable by looking at the arrow.

The variable list window 143 shows a variable list in which variables relatively close to the selected variable are listed together with their distances, in ascending order of distance. As the fourth piece of knowledge, the variable list window 143 states: "The closer a variable is, the higher its similarity can be judged to be." In this example, the variable closest to the selected variable "Foreword" is "Afterword"; therefore, the variable most similar to "Foreword" is "Afterword". The hint window 144 states this under the title "Analysis points" and is placed so that it overlaps the scatter diagram window 142.
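
Both the variable list of window 143 and the endpoints of the arrow 145 follow from one distance ranking. The sketch below assumes variable coordinates in a dictionary; `nearest_variables` and its `k` parameter (how many "relatively close" variables to list) are hypothetical, and the chapter coordinates used in the check are made up for illustration.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def nearest_variables(selected: str, var_coords: Dict[str, Point],
                      k: int = 3) -> Tuple[List[Tuple[str, float]], Tuple[str, str]]:
    """Return the k variables closest to `selected` (nearest first) and arrow 145.

    The arrow is returned as a (start, end) pair of variable names.
    """
    cx, cy = var_coords[selected]
    ranked = sorted(((v, math.hypot(x - cx, y - cy))
                     for v, (x, y) in var_coords.items() if v != selected),
                    key=lambda item: item[1])
    if not ranked:
        raise ValueError("need at least one other variable")
    arrow = (selected, ranked[0][0])  # from the selected variable to its nearest neighbour
    return ranked[:k], arrow
```

Returning the arrow alongside the list keeps the drawing and the variable list consistent, since both come from the same ranking.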

The text mining support device 10 may also display support screens other than those described above. A support screen may have any content as long as it includes a scatter diagram and a hint indicating how to read it. The hint may state explicitly how to read the scatter diagram or merely suggest it, and may appear in any part of the support screen: in a window overlapping the scatter diagram window, in a window that does not overlap it, or in a message box at a fixed position.

As described above, the text mining support method according to the present embodiment comprises a step of inputting the analysis result 2, a step of inputting an instruction from the user, a step of generating screen data for a screen containing a graph (a scatter diagram) showing the analysis result 2, and a step of displaying the screen based on the screen data. In accordance with the instruction, the step of generating screen data generates screen data for a support screen containing the graph and a hint indicating how to read it. Using such a support screen, the user can efficiently derive findings from the graph showing the result of correspondence analysis.

The step of generating screen data generates screen data for the screen selected by the instruction from among a plurality of support screens (the first to fourth support screens 110, 120, 130, and 140) and the basic screen 100, which contains the graph but no hint. Selectively displaying support screens with hints and a basic screen without hints makes it possible to show a screen suited to the user's skill level, and selectively displaying multiple support screens presents the user with several ways of reading the graph.

In the step of inputting the analysis result, the input analysis result 2 associates first items (words) with second items (variables) and includes the first and second components of each first item and of each second item. The step of generating screen data creates, as the graph, a scatter diagram in which the first and second items are plotted in a plane whose horizontal axis is the first component and whose vertical axis is the second component. The user can therefore efficiently derive findings from a scatter diagram showing the result of correspondence analysis of the first and second items.

The plurality of support screens include: the first support screen 110, whose hint is that first items near the origin of the scatter diagram have no prominent features; the second support screen 120, whose hint is that first items lying in the direction from the origin toward a second item characterize that second item; the third support screen 130, whose hint is that first items close to each other in the scatter diagram have high similarity; and the fourth support screen 140, whose hint is that second items close to each other in the scatter diagram have high similarity. Using the hint on each support screen, the user can efficiently derive findings from the graph showing the result of correspondence analysis.

In the scatter diagram of the first support screen 110, the range near the origin is indicated with the circle 115. In the scatter diagram of the second support screen 120, the range in the direction from the origin toward the selected second item is indicated with the half-lines 126 and 127. In the scatter diagram of the third support screen 130, the range near the selected first item is indicated with the circle 135. In the scatter diagram of the fourth support screen 140, a mark (the arrow 145) indicates the second item closest to the selected second item. By looking at the range or mark shown on each support screen, the user can easily identify the first items with no prominent features, the first items characterizing the selected second item, the first items highly similar to the selected first item, and the second item highly similar to the selected second item.

In the step of inputting the analysis result, the input analysis result is the result of correspondence analysis performed on a cross-tabulation table whose first items are words, whose second items are parts of a text, and whose cells hold the frequency of occurrence of each word in each part of the text. The user can therefore efficiently derive findings from a scatter diagram showing the result of correspondence analysis of words and text parts.
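
The correspondence analysis itself is performed outside the device (by the text analysis device 5), so the description does not say how the first and second components are computed. For readers who want to see where such components can come from, the following is a minimal sketch of standard correspondence analysis (singular value decomposition of the standardized residuals) applied to a words-by-parts frequency table, using NumPy. It is not presented as the method of the text analysis device 5; the function name and the choice of principal coordinates for both words and parts (a symmetric map) are assumptions, and every row and column of the table is assumed to have a nonzero total.

```python
import numpy as np


def correspondence_analysis(table: np.ndarray, n_components: int = 2):
    """Standard correspondence analysis of a (words x parts) frequency table.

    Returns principal coordinates for the rows (words) and columns (parts),
    plus the singular values, truncated to `n_components` dimensions.
    """
    P = table / table.sum()                  # correspondence matrix
    r = P.sum(axis=1)                        # row masses (words)
    c = P.sum(axis=0)                        # column masses (text parts)
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U / np.sqrt(r)[:, None]) * sigma     # F = D_r^(-1/2) U Sigma
    cols = (Vt.T / np.sqrt(c)[:, None]) * sigma  # G = D_c^(-1/2) V Sigma
    k = n_components
    return rows[:, :k], cols[:, :k], sigma[:k]
```

Column 0 of each returned coordinate array would serve as the first component (horizontal axis) and column 1 as the second component (vertical axis) of the scatter diagram. The signs of the singular vectors are arbitrary, so a plot may be mirrored relative to another tool's output without any change in meaning.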

The text mining support device 10 and the text mining support program 31 according to the present embodiment have the same features as the text mining support method according to the present embodiment and provide the same effects.

In the above description, the text mining support device 10 displays a scatter diagram showing the result of correspondence analysis in two dimensions. The present invention is not limited to this and can also be applied to text mining support methods and devices that display a graph showing the result of correspondence analysis in more dimensions (for example, a three-dimensional graph). Furthermore, in the same way as the text mining support method and device that display a scatter diagram of the result of correspondence analysis on a cross-tabulation table of text data, a data mining support method and device can be configured that display a graph (such as a scatter diagram or a three-dimensional graph) of the result of correspondence analysis on a cross-tabulation table of any data other than text data.

According to the text mining support method and device of the present invention, displaying a support screen containing a graph of the result of correspondence analysis and a hint on how to read that graph enables the user to efficiently derive findings from it.

2 … Analysis result
10 … Text mining support device
11 … Analysis result input unit
12 … Instruction input unit
13 … Screen generation unit
14 … Analysis result display unit
31 … Text mining support program
100 … Basic screen
101 … Screen selection window
102, 112, 122, 132, 142 … Scatter diagram window
110, 120, 130, 140 … Support screen
113, 123, 133 … Word list window
143 … Variable list window
114, 124, 134, 144 … Hint window

Claims

A text mining support method for displaying an analysis result obtained by correspondence analysis, comprising:
inputting the analysis result;
inputting an instruction from a user;
generating screen data for a screen including a graph showing the analysis result; and
displaying a screen based on the screen data,
wherein the step of generating the screen data generates, in accordance with the instruction, screen data for a support screen including the graph and a hint indicating how to read the graph.

The text mining support method according to claim 1, wherein the step of generating the screen data generates screen data for a screen selected in accordance with the instruction from among a plurality of support screens and a basic screen that includes the graph but not the hint.

The text mining support method according to claim 2, wherein the step of inputting the analysis result inputs, as the analysis result, a result of associating a first item with a second item, the result including a first component and a second component of the first item and a first component and a second component of the second item; and
the step of generating the screen data creates, as the graph, a scatter diagram in which the first item and the second item are plotted in a plane having the first component as the horizontal axis and the second component as the vertical axis.

The text mining support method according to claim 3, wherein the plurality of support screens include a first support screen including, as the hint, that the first item near the origin of the scatter diagram has no prominent feature.

The text mining support method according to claim 4, wherein a range near the origin is shown in the scatter diagram included in the first support screen.

The text mining support method according to claim 3, wherein the plurality of support screens include a second support screen including, as the hint, that the first item lying in a direction away from the origin toward the second item in the scatter diagram characterizes the second item.

The text mining support method according to claim 6, wherein the scatter diagram included in the second support screen indicates a range in the direction away from the origin toward a selected second item.

The text mining support method according to claim 3, wherein the plurality of support screens include a third support screen including, as the hint, that first items close to each other in the scatter diagram have high similarity.

The text mining support method according to claim 8, wherein the scatter diagram included in the third support screen indicates a range near a selected first item.

The text mining support method according to claim 3, wherein the plurality of support screens include a fourth support screen including, as the hint, that second items close to each other in the scatter diagram have high similarity.

The text mining support method according to claim 10, wherein the scatter diagram included in the fourth support screen includes a mark indicating the second item closest to a selected second item.

The text mining support method according to claim 3, wherein, in the step of inputting the analysis result, a result of correspondence analysis performed on a cross-tabulation table is input as the analysis result. In that table, a word is the first item, a part of the text is the second item, and the frequency with which each word appears in each part of the text is the in-table data.
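The claims do not state how the correspondence analysis itself is computed. One common formulation applies a singular value decomposition to the standardized residuals of the cross-tabulation table. The NumPy sketch below follows that formulation and returns principal coordinates for both the words (rows) and the text parts (columns). Using principal coordinates on both axes is only one of several possible map scalings, and it is an assumption here. Every row and column of the table must have a nonzero total.

```python
import numpy as np


def correspondence_analysis(table, n_components=2):
    """Correspondence analysis of a word-by-text-part frequency table.

    Returns (row_coords, col_coords, principal_inertias). Column 0 of each
    coordinate array is the first component and column 1 is the second.
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                 # correspondence matrix
    r = P.sum(axis=1)               # row masses (words)
    c = P.sum(axis=0)               # column masses (text parts)
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    k = n_components
    # Principal coordinates: D_r^{-1/2} U Sigma and D_c^{-1/2} V Sigma.
    rows = (U[:, :k] * sigma[:k]) / np.sqrt(r)[:, None]
    cols = (Vt.T[:, :k] * sigma[:k]) / np.sqrt(c)[:, None]
    return rows, cols, sigma[:k] ** 2


# Hypothetical counts: 3 words (rows) x 3 text parts (columns).
table = [[10, 2, 1], [3, 8, 2], [1, 2, 9]]
word_xy, part_xy, inertia = correspondence_analysis(table)
```

In the scatter diagram of claim 3, `word_xy[:, 0]` and `part_xy[:, 0]` would give the horizontal positions and `word_xy[:, 1]` and `part_xy[:, 1]` the vertical positions.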

A text mining support device that displays an analysis result of correspondence analysis, comprising:
an analysis result input unit that inputs the analysis result;
an instruction input unit that inputs an instruction from a user;
a screen generation unit that generates screen data of a screen including a graph indicating the analysis result; and
an analysis result display unit that displays a screen based on the screen data,
wherein the screen generation unit generates, in accordance with the instruction, screen data of a support screen including the graph and a hint indicating how to read the graph.

The text mining support device according to claim 13, wherein the screen generation unit generates screen data of a screen selected in accordance with the instruction from among a plurality of support screens and a basic screen that includes the graph but does not include the hint.
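The selection in claim 14 can be pictured as a dispatch from the user's instruction to one of five screens: the basic screen, which has no hint, and the four support screens that the abstract describes. The minimal Python sketch below is illustrative only. The instruction strings, the enum, the `Screen` record, and the hint wording are all hypothetical, and the hint texts paraphrase the claims.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ScreenKind(Enum):
    # Hypothetical instruction strings sent by the instruction input unit.
    BASIC = "basic"
    ORIGIN = "support1"
    CHARACTERIZING = "support2"
    WORD_SIMILARITY = "support3"
    VARIABLE_SIMILARITY = "support4"


# Hint for each support screen; the basic screen has no entry, so no hint.
HINTS = {
    ScreenKind.ORIGIN: "Words near the origin have no distinctive feature.",
    ScreenKind.CHARACTERIZING: ("Words lying away from the origin in the direction "
                                "of a variable characterize that variable."),
    ScreenKind.WORD_SIMILARITY: "Words plotted close to each other are highly similar.",
    ScreenKind.VARIABLE_SIMILARITY: "Variables plotted close to each other are highly similar.",
}


@dataclass
class Screen:
    kind: ScreenKind
    graph: object          # scatter diagram data in whatever form the display unit expects
    hint: Optional[str]    # None for the basic screen


def generate_screen(graph, instruction: str) -> Screen:
    """Build screen data for the screen selected by `instruction`.
    Raises ValueError for an unknown instruction string."""
    kind = ScreenKind(instruction)
    return Screen(kind=kind, graph=graph, hint=HINTS.get(kind))
```

Because every screen carries the same graph, switching between the basic screen and a support screen changes only the hint and any overlay drawn on the scatter diagram.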

The text mining support device according to claim 14, wherein the analysis result input unit inputs, as the analysis result, a result of associating the first item with the second item, the result including a first component and a second component of the first item and a first component and a second component of the second item; and
the screen generation unit creates, as the graph, a scatter diagram in which the first item and the second item are plotted in a plane having the first component as the horizontal axis and the second component as the vertical axis.

The text mining support device according to claim 15, wherein the analysis result input unit inputs, as the analysis result, a result of correspondence analysis performed on a cross-tabulation table. In that table, a word is the first item, a part of the text is the second item, and the frequency with which each word appears in each part of the text is the in-table data.