JP2019164593A - Text mining method, text mining program, and text mining device

Info

Publication number: JP2019164593A
Application number: JP2018052074A
Authority: JP
Inventors: 未希柿ノ木; Miki Kakinoki
Original assignee: Screen Holdings Co Ltd
Current assignee: Screen Holdings Co Ltd
Priority date: 2018-03-20
Filing date: 2018-03-20
Publication date: 2019-09-26
Anticipated expiration: 2038-03-20
Also published as: CN110309290A; TWI703457B; JP6987003B2; KR102162779B1; TW201945958A; CN110309290B; KR20190110428A

Abstract

PROBLEM TO BE SOLVED: To display, by a simple operation, a screen including the co-occurrence network obtained when an attention word is designated. SOLUTION: A text mining method includes a step of extracting words from text data, a step of generating a co-occurrence matrix for the extracted words, a step of generating a co-occurrence network based on the generated co-occurrence matrix, and a step of displaying a screen including the generated co-occurrence network. When an instruction designating an attention word is input in a first screen including a first co-occurrence network based on the whole of the designated text data, the method extracts words from limited text data consisting of the part of the designated text data that includes the attention word, generates a second co-occurrence matrix for the extracted words using the limited text data, generates a second co-occurrence network based on the second co-occurrence matrix, and displays a second screen including the second co-occurrence network. SELECTED DRAWING: Figure 3

Description

The present invention relates to text mining, and more particularly to a text mining method, a text mining program, and a text mining device that display a screen including a co-occurrence network of words.

In recent years, text mining, which analyzes free-form text data and derives useful information from the analysis results, has attracted attention. In text mining, information is obtained by, for example, extracting words from the text data to be analyzed and analyzing how frequently and in what patterns the words appear.

When analyzing free-form text data, the analyst needs to grasp the overall picture of the text data in the initial stage rather than subjectively selecting targets. For this reason, the analyst may use a co-occurrence network of the words included in the text data.

FIG. 19 is a diagram showing an example of a co-occurrence network. A co-occurrence network is obtained by extracting, from text data, pairs of words that often appear in the same sentence and expressing the result as an undirected graph. When a word Wa and a word Wb often appear in the same sentence in the text data to be analyzed, the co-occurrence network includes a node corresponding to the word Wa, a node corresponding to the word Wb, and an edge connecting the two. The co-occurrence network shown in FIG. 19 includes a node corresponding to “staff”, a node corresponding to “response”, and an edge connecting the two. The co-occurrence network shown in FIG. 19 therefore shows that “staff” and “response” often appear in the same sentence in the text data to be analyzed.
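The definition above can be illustrated with a minimal Python sketch. It assumes that sentences are already split into words and that "often" means a pair shares a sentence at least `min_count` times; the function name, the threshold, and the sample sentences are illustrative assumptions, not taken from the publication.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(sentences, min_count=2):
    """Return undirected edges: word pairs sharing a sentence at least min_count times.

    `min_count` is an assumed stand-in for "often"; the source gives no threshold here.
    """
    pair_counts = Counter()
    for words in sentences:
        # Count each unordered pair once per sentence; sorting makes (a, b) canonical.
        for a, b in combinations(sorted(set(words)), 2):
            pair_counts[(a, b)] += 1
    return {pair for pair, n in pair_counts.items() if n >= min_count}

# Hypothetical, pre-tokenized sentences.
sentences = [
    ["staff", "response", "good"],
    ["staff", "response", "slow"],
    ["room", "clean"],
]
edges = cooccurrence_edges(sentences)
print(edges)
```

Each returned pair corresponds to one edge, and the words appearing in any pair are the nodes of the graph.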

In general, a co-occurrence network is generated based on the whole of the designated text data. Hereinafter, such a co-occurrence network is referred to as an “overall co-occurrence network”. The analyst selects, from the overall co-occurrence network, a plurality of words that deserve attention (hereinafter referred to as attention words) according to the analyst's own hypothesis and the purpose of the analysis, and performs the subsequent analysis with the attention words in mind.

When selecting an attention word, the analyst considers how the attention word is used in the sentences that include it, in order to judge whether the selected attention word suits the purpose of the analysis. For this reason, the analyst may use a co-occurrence network based on text data consisting of those sentences of the designated text data that include the attention word (hereinafter referred to as limited text data). Note that a “sentence including the attention word” here may mean not only a single sentence including the attention word but also a plurality of sentences (a set of sentences) divided into block units, such as a paragraph containing a sentence that includes the attention word. Hereinafter, a co-occurrence network based on limited text data is referred to as a “limited co-occurrence network”. By using the limited co-occurrence network, the analyst can grasp the content of the limited text data. The analyst refers repeatedly to the overall co-occurrence network and the limited co-occurrence networks until all attention words have been selected.

In the following, consider a text mining device that generates a co-occurrence network of the words included in text data and displays a screen including the generated co-occurrence network. Patent Document 1 describes a document database display device that generates an overall co-occurrence network for each of a plurality of documents and displays a screen including the generated overall co-occurrence networks. This display device searches the plurality of overall co-occurrence networks for a word input by the user and highlights the found word on the screen.

Patent Document 1: Japanese Unexamined Patent Application Publication No. H8-314980 (JP-A-8-314980)

A conventional text mining device generates a co-occurrence network based on the whole of the designated text data. A conventional text mining device can therefore easily display a screen including the overall co-occurrence network.

On the other hand, displaying a screen including a limited co-occurrence network with a conventional text mining device requires cumbersome operations from the analyst. Specifically, each time the analyst selects one attention word from the overall co-occurrence network, the analyst must generate limited text data from the designated text data and supply the generated limited text data to the text mining device. In addition, the analyst refers to both the overall co-occurrence network and the limited co-occurrence networks when selecting attention words, so both the image data of the overall co-occurrence network and the image data of the limited co-occurrence networks must be saved. When many co-occurrence networks are generated, however, saving and managing the image data becomes difficult.

Accordingly, an object of the present invention is to provide a text mining method, a text mining program, and a text mining device that can display, by a simple operation, a screen including the co-occurrence network obtained when an attention word is designated.

A first aspect of the present invention is a text mining method for displaying a screen including an analysis result of text data, the method comprising:
a step of extracting words from text data;
a step of generating a co-occurrence matrix for the words;
a step of generating a co-occurrence network based on the co-occurrence matrix; and
a step of displaying a screen including the co-occurrence network,
wherein, when an instruction designating an attention word is input in a first screen including a first co-occurrence network based on the whole of designated text data, the step of extracting words extracts the words from limited text data consisting of the part of the designated text data that includes the attention word, the step of generating the co-occurrence matrix generates a second co-occurrence matrix for the words using the limited text data, the step of generating the co-occurrence network generates a second co-occurrence network based on the second co-occurrence matrix, and the step of displaying the screen displays a second screen including the second co-occurrence network.

According to a second aspect of the present invention, in the first aspect, an instruction designating the words corresponding to the selected nodes as the attention words is input by selecting one or more nodes included in the first co-occurrence network in the first screen and then selecting analysis start.

According to a third aspect of the present invention, in the first aspect, an instruction designating the word corresponding to a node as the attention word is input by selecting one node included in the first co-occurrence network in succession in the first screen.

According to a fourth aspect of the present invention, in the first aspect, an instruction designating the words corresponding to the two nodes connected to an edge as the attention words is input by selecting one edge included in the first co-occurrence network in succession in the first screen.

According to a fifth aspect of the present invention, in the first aspect, an instruction designating the words corresponding to the plurality of nodes connected to the selected edges as the attention words is input by selecting one or more edges included in the first co-occurrence network in the first screen and then selecting analysis start.
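The edge-based designation in the fourth and fifth aspects can be sketched in Python as follows. The representation of an edge as a pair of words and the function name are assumptions made for illustration; the publication does not specify a data structure.

```python
def attention_words_from_edges(selected_edges):
    """Collect the words at both ends of each selected edge, without duplicates.

    Each edge is assumed to be a (word, word) pair; first-seen order is kept.
    """
    words = []
    for a, b in selected_edges:
        for word in (a, b):
            if word not in words:
                words.append(word)
    return words

# Hypothetical selection: two edges that share the node "response".
selected = [("staff", "response"), ("response", "polite")]
attention_words = attention_words_from_edges(selected)
print(attention_words)
```

Selecting a single edge yields exactly two attention words, which matches the fourth aspect; selecting several edges yields every node they touch, as in the fifth aspect.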

According to a sixth aspect of the present invention, in the first aspect, when a merge instruction is input in a second screen including a plurality of second co-occurrence networks, the step of displaying the screen displays the plurality of second co-occurrence networks in a tab format.

According to a seventh aspect of the present invention, in the sixth aspect, the merge instruction is input by grabbing one second co-occurrence network in the second screen and releasing it inside another second co-occurrence network.
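One way to model the merge instruction is sketched below in Python. The `NetworkWindow` class, the index-based `merge_windows` function, and the choice to append the dragged tabs after the target's tabs are all assumptions for illustration; the publication only states that the merged networks are displayed in a tab format.

```python
from dataclasses import dataclass, field

@dataclass
class NetworkWindow:
    """A window holding one or more limited co-occurrence networks as tabs.

    Each tab is represented here only by a label (hypothetical simplification).
    """
    tabs: list = field(default_factory=list)

def merge_windows(windows, dragged, target):
    """Move every tab of windows[dragged] into windows[target]; drop the dragged window."""
    if dragged == target:
        return windows  # dropping a window onto itself changes nothing
    windows[target].tabs.extend(windows[dragged].tabs)
    return [w for i, w in enumerate(windows) if i != dragged]

windows = [NetworkWindow(["staff"]), NetworkWindow(["room"]), NetworkWindow(["price"])]
# Grab the "staff" window and release it inside the "room" window.
windows = merge_windows(windows, dragged=0, target=1)
print(windows)
```

After the merge, two windows remain, and the first one carries both networks as tabs, so the screen shows the same information in less space.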

According to an eighth aspect of the present invention, in the first aspect, the limited text data consists of those sentences of the designated text data that include the attention word.

According to a ninth aspect of the present invention, in the eighth aspect, the limited text data obtained when a plurality of attention words are designated consists of those sentences of the designated text data that include all of the plurality of attention words.

According to a tenth aspect of the present invention, in the eighth aspect, the limited text data obtained when a plurality of attention words are designated consists of those sentences of the designated text data that include any of the plurality of attention words.
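The difference between the ninth aspect (all attention words) and the tenth aspect (any attention word) can be shown with a short Python sketch. It assumes pre-tokenized sentences; the function name `limited_text` and the `mode` parameter are hypothetical.

```python
def limited_text(sentences, attention_words, mode="and"):
    """Keep the sentences that contain all ("and") or any ("or") of the attention words.

    Sentences are assumed to be lists of already-extracted words.
    """
    if mode not in ("and", "or"):
        raise ValueError("mode must be 'and' or 'or'")
    combine = all if mode == "and" else any
    return [s for s in sentences if combine(w in s for w in attention_words)]

# Hypothetical, pre-tokenized sentences.
sentences = [
    ["staff", "response", "good"],
    ["staff", "slow"],
    ["response", "late"],
    ["room", "clean"],
]
both = limited_text(sentences, ["staff", "response"], mode="and")
either = limited_text(sentences, ["staff", "response"], mode="or")
print(len(both), len(either))
```

With a single attention word the two modes coincide, which corresponds to the eighth aspect.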

According to an eleventh aspect of the present invention, in the first aspect, the step of generating the co-occurrence matrix generates a co-occurrence matrix whose elements are Jaccard coefficients.
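For reference, the Jaccard coefficient of two sets is the size of their intersection divided by the size of their union. The Python sketch below applies it to the sets of sentences in which each word occurs, which is an assumption about how the coefficient is applied to words; the function name and the dictionary-based matrix are illustrative only.

```python
def jaccard_matrix(sentences):
    """Return (vocab, matrix), where matrix[(a, b)] is the Jaccard coefficient of a and b.

    The coefficient is |S_a & S_b| / |S_a | S_b|, with S_w the set of indices of the
    sentences containing w (sentence-level counting is an assumption of this sketch).
    """
    vocab = sorted({w for s in sentences for w in s})
    occurs = {w: {i for i, s in enumerate(sentences) if w in s} for w in vocab}
    matrix = {}
    for a in vocab:
        for b in vocab:
            # The union is never empty: every vocabulary word occurs at least once.
            matrix[(a, b)] = len(occurs[a] & occurs[b]) / len(occurs[a] | occurs[b])
    return vocab, matrix

# Hypothetical, pre-tokenized sentences.
sentences = [["staff", "response"], ["staff", "slow"], ["response"]]
vocab, matrix = jaccard_matrix(sentences)
print(matrix[("staff", "response")])
```

The matrix is symmetric with ones on the diagonal, so a network generator only needs the entries above the diagonal.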

A twelfth aspect of the present invention is a text mining program for displaying a screen including an analysis result of text data, the program causing a computer to execute, by a CPU using a memory:
a step of extracting words from text data;
a step of generating a co-occurrence matrix for the words;
a step of generating a co-occurrence network based on the co-occurrence matrix; and
a step of displaying a screen including the co-occurrence network,
wherein, when an instruction designating an attention word is input in a first screen including a first co-occurrence network based on the whole of designated text data, the step of extracting words extracts the words from limited text data consisting of the part of the designated text data that includes the attention word, the step of generating the co-occurrence matrix generates a second co-occurrence matrix for the words using the limited text data, the step of generating the co-occurrence network generates a second co-occurrence network based on the second co-occurrence matrix, and the step of displaying the screen displays a second screen including the second co-occurrence network.

According to a thirteenth aspect of the present invention, in the twelfth aspect, an instruction designating the words corresponding to the selected nodes as the attention words is input by selecting one or more nodes included in the first co-occurrence network in the first screen and then selecting analysis start.

According to a fourteenth aspect of the present invention, in the twelfth aspect, an instruction designating the word corresponding to a node as the attention word is input by selecting one node included in the first co-occurrence network in succession in the first screen.

According to a fifteenth aspect of the present invention, in the twelfth aspect, an instruction designating the words corresponding to the two nodes connected to an edge as the attention words is input by selecting one edge included in the first co-occurrence network in succession in the first screen.

According to a sixteenth aspect of the present invention, in the twelfth aspect, an instruction designating the words corresponding to the plurality of nodes connected to the selected edges as the attention words is input by selecting one or more edges included in the first co-occurrence network in the first screen and then selecting analysis start.

According to a seventeenth aspect of the present invention, in the twelfth aspect, when a merge instruction is input in a second screen including a plurality of second co-occurrence networks, the step of displaying the screen displays the plurality of second co-occurrence networks in a tab format.

According to an eighteenth aspect of the present invention, in the seventeenth aspect, the merge instruction is input by grabbing one second co-occurrence network in the second screen and releasing it inside another second co-occurrence network.

A nineteenth aspect of the present invention is a text mining device that displays a screen including an analysis result of text data, the device comprising:
a word extraction unit that extracts words from text data;
a co-occurrence matrix generation unit that generates a co-occurrence matrix for the words;
a co-occurrence network generation unit that generates a co-occurrence network based on the co-occurrence matrix; and
a screen display unit that displays a screen including the co-occurrence network,
wherein, when an instruction designating an attention word is input in a first screen including a first co-occurrence network based on the whole of designated text data, the word extraction unit extracts the words from limited text data consisting of the part of the designated text data that includes the attention word, the co-occurrence matrix generation unit generates a second co-occurrence matrix for the words using the limited text data, the co-occurrence network generation unit generates a second co-occurrence network based on the second co-occurrence matrix, and the screen display unit displays a second screen including the second co-occurrence network.

According to a twentieth aspect of the present invention, in the nineteenth aspect, when a merge instruction is input in a second screen including a plurality of second co-occurrence networks, the screen display unit displays the plurality of second co-occurrence networks in a tab format.

According to the first, twelfth, or nineteenth aspect, when an instruction designating an attention word is input in the first screen including the first co-occurrence network based on the whole of the designated text data, a second screen is displayed that includes the second co-occurrence network based on the part of the designated text data that includes the attention word. A screen including the co-occurrence network obtained when an attention word is designated can therefore be displayed by a simple operation.

According to the second or thirteenth aspect, by selecting one or more nodes and analysis start in the first screen, an instruction designating one or more attention words can be input by a simple operation, and a screen including the co-occurrence network for the designated attention word or words can be displayed.

According to the third or fourteenth aspect, by selecting one node in succession in the first screen, an instruction designating one attention word can be input by a simple operation, and a screen including the co-occurrence network for that attention word can be displayed.

According to the fourth or fifteenth aspect, by selecting one edge in succession in the first screen, an instruction designating two attention words can be input by a simple operation, and a screen including the co-occurrence network for those two attention words can be displayed.

According to the fifth or sixteenth aspect, by selecting one or more edges and analysis start in the first screen, an instruction designating a plurality of attention words can be input by a simple operation, and a screen including the co-occurrence network for those attention words can be displayed.

According to the sixth, seventeenth, or twentieth aspect, displaying the plurality of second co-occurrence networks in a tab format when a merge instruction is input allows the plurality of second co-occurrence networks to be displayed compactly.

According to the seventh or eighteenth aspect, by grabbing and releasing a second co-occurrence network in the second screen, the merge instruction can be input by a simple operation, and the plurality of second co-occurrence networks can be displayed compactly.

According to the eighth aspect, when an instruction designating an attention word is input, the designated text data is divided sentence by sentence to obtain the limited text data, and a screen including the second co-occurrence network based on the obtained limited text data can be displayed.

According to the ninth or tenth aspect, a screen can be displayed that includes the second co-occurrence network obtained when AND processing or OR processing is applied to the plurality of attention words.

According to the eleventh aspect, generating a co-occurrence matrix whose elements are Jaccard coefficients allows the co-occurrence of the words included in the text data to be analyzed suitably.

FIG. 1 is a block diagram showing the configuration of a text mining device according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of a computer that functions as the text mining device shown in FIG. 1.
FIG. 3 is a flowchart showing the operation of the text mining device shown in FIG. 1.
FIG. 4 is a diagram showing an example of a co-occurrence matrix generated by the text mining device shown in FIG. 1.
FIG. 5 is a diagram showing an example of a window including the overall co-occurrence network displayed by the text mining device shown in FIG. 1.
FIG. 6 is a diagram showing a first operation for designating an attention word in the window shown in FIG. 5.
FIG. 7 is a diagram showing a second operation for designating an attention word in the window shown in FIG. 5.
FIG. 8 is a diagram showing a third operation for designating an attention word in the window shown in FIG. 5.
FIG. 9 is a diagram showing a fourth operation for designating an attention word in the window shown in FIG. 5.
FIG. 10 is a diagram showing a fifth operation for designating an attention word in the window shown in FIG. 5.
FIG. 11 is a diagram showing a sixth operation for designating an attention word in the window shown in FIG. 5.
FIG. 12 is a diagram showing an example of a window including a limited co-occurrence network displayed by the text mining device shown in FIG. 1.
FIG. 13 is a diagram showing an example of a window including a limited co-occurrence network displayed by the text mining device shown in FIG. 1.
FIG. 14 is a diagram showing an example of a display screen of the text mining device shown in FIG. 1.
FIG. 15 is a diagram showing an example of a display screen of the text mining device shown in FIG. 1.
FIG. 16 is a diagram showing an example of a display screen of the text mining device shown in FIG. 1.
FIG. 17 is a diagram showing an operation for merging windows in the text mining device shown in FIG. 1.
FIG. 18 is a diagram showing the display screen after the operation shown in FIG. 17 has been performed.
FIG. 19 is a diagram showing an example of a co-occurrence network.

Hereinafter, a text mining method, a text mining program, and a text mining device according to an embodiment of the present invention will be described with reference to the drawings. The text mining method according to the present embodiment is typically executed using a computer. The text mining program according to the present embodiment is a program for carrying out the text mining method using a computer. The text mining device according to the present embodiment is typically configured using a computer. A computer that executes the text mining program functions as the text mining device.

FIG. 1 is a block diagram showing the configuration of a text mining device according to an embodiment of the present invention. The text mining device 10 shown in FIG. 1 includes an instruction input unit 11, a text data storage unit 12, a word extraction unit 13, a co-occurrence matrix generation unit 14, a co-occurrence network generation unit 15, and a screen display unit 16. Based on the text data stored in the text data storage unit 12, the text mining device 10 generates a co-occurrence network as the analysis result of the text data and displays a screen including the generated co-occurrence network.

The operation of the text mining device 10 is outlined as follows. Instructions from the user (the analyst of the text data) are input to the instruction input unit 11. The text data storage unit 12 stores one or more pieces of free-form text data. The word extraction unit 13 reads the designated text data from the text data storage unit 12 and extracts words from it by performing morphological analysis on the read text data. The co-occurrence matrix generation unit 14 generates a co-occurrence matrix for the words extracted by the word extraction unit 13. The co-occurrence network generation unit 15 generates a co-occurrence network based on the co-occurrence matrix generated by the co-occurrence matrix generation unit 14. The screen display unit 16 displays a screen including the co-occurrence network generated by the co-occurrence network generation unit 15.

Using the instruction input unit 11, the user inputs instructions such as an instruction designating the text data to be analyzed and an instruction designating an attention word. The word extraction unit 13, the co-occurrence network generation unit 15, and the screen display unit 16 operate in accordance with the user's instructions to display a screen including a co-occurrence network. When an instruction designating text data is input, an overall co-occurrence network based on the whole of the designated text data is generated, and a screen including the overall co-occurrence network is displayed. When an instruction designating an attention word is input in the screen including the overall co-occurrence network, a limited co-occurrence network based on those sentences of the designated text data that include the attention word is generated, and a screen including the limited co-occurrence network is displayed.
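The two-stage flow described above (overall network first, limited network once an attention word is designated) can be sketched in Python as follows. Several simplifications are assumptions of this sketch and not part of the publication: whitespace tokenization stands in for morphological analysis, the matrix uses sentence-level Jaccard coefficients, an edge is drawn when the coefficient reaches a hypothetical `threshold`, and the result is returned as a set of edges rather than displayed.

```python
from itertools import combinations

def extract_words(sentences):
    """Stand-in for morphological analysis: lowercase whitespace tokens per sentence."""
    return [set(s.lower().split()) for s in sentences]

def cooccurrence_matrix(word_sets):
    """Map each unordered word pair to its sentence-level Jaccard coefficient."""
    vocab = sorted(set().union(*word_sets))
    occurs = {w: {i for i, ws in enumerate(word_sets) if w in ws} for w in vocab}
    return {
        (a, b): len(occurs[a] & occurs[b]) / len(occurs[a] | occurs[b])
        for a, b in combinations(vocab, 2)
    }

def cooccurrence_network(matrix, threshold=0.5):
    """Keep the pairs whose coefficient reaches the (assumed) threshold as edges."""
    return {pair for pair, score in matrix.items() if score >= threshold}

def analyze(sentences, attention_word=None):
    """Overall network when attention_word is None, otherwise the limited network."""
    if attention_word is not None:
        # Limited text data: only the sentences that contain the attention word.
        sentences = [s for s in sentences if attention_word in s.lower().split()]
    return cooccurrence_network(cooccurrence_matrix(extract_words(sentences)))

# Hypothetical text data, one sentence per string.
text = ["staff response good", "staff response slow", "room clean", "room small"]
overall = analyze(text)
limited = analyze(text, attention_word="staff")
print(sorted(overall))
print(sorted(limited))
```

The same three functions serve both stages; only the input sentences change, which is why designating an attention word requires no separate preparation of limited text data by the user.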

FIG. 2 is a block diagram showing the configuration of a computer that functions as the text mining device 10. The computer 20 shown in FIG. 2 includes a CPU 21, a main memory 22, a storage unit 23, an input unit 24, a display unit 25, a communication unit 26, and a recording medium reading unit 27. The main memory 22 is, for example, a DRAM. The storage unit 23 is, for example, a hard disk or a solid-state drive. The input unit 24 includes, for example, a keyboard 28 and a mouse 29. The display unit 25 is, for example, a liquid crystal display. The communication unit 26 is an interface circuit for wired or wireless communication. The recording medium reading unit 27 is an interface circuit for a recording medium 30 that stores programs and the like. The recording medium 30 is a non-transitory recording medium such as a CD-ROM, a DVD-ROM, or a USB memory.

コンピュータ２０がテキストマイニングプログラム３１を実行する場合、記憶部２３は、テキストマイニングプログラム３１とテキストデータ３２を記憶する。テキストマイニングプログラム３１とテキストデータ３２は、例えば、サーバや他のコンピュータから通信部２６を用いて受信したものでもよく、記録媒体３０から記録媒体読み取り部２７を用いて読み出したものでもよい。 When the computer 20 executes the text mining program 31, the storage unit 23 stores the text mining program 31 and text data 32. For example, the text mining program 31 and the text data 32 may be received from a server or another computer using the communication unit 26, or may be read from the recording medium 30 using the recording medium reading unit 27.

When the text mining program 31 is executed, the text mining program 31 and the text data 32 are copied to the main memory 22. Using the main memory 22 as working memory, the CPU 21 executes the text mining program 31 stored in the main memory 22 and thereby performs processing such as extracting words from the text data 32, generating a co-occurrence matrix for the extracted words, generating a co-occurrence network based on the generated co-occurrence matrix, and displaying a screen including the generated co-occurrence network. At this time, the computer 20 functions as the text mining device 10. The configuration of the computer 20 described above is merely an example, and the text mining device 10 can be configured using any computer.

図３は、テキストマイニング装置１０の動作を示すフローチャートである。図３に示す動作を行う前に、テキストデータ記憶部１２は自由記述された１以上のテキストデータを記憶している。各テキストデータは、複数の文を含んでいる。テキストマイニング装置１０は、テキストデータ記憶部１２に記憶されたテキストデータのうちで利用者が指定したテキストデータに対して処理を行う。 FIG. 3 is a flowchart showing the operation of the text mining apparatus 10. Prior to performing the operation shown in FIG. 3, the text data storage unit 12 stores one or more freely described text data. Each text data includes a plurality of sentences. The text mining device 10 performs processing on the text data specified by the user among the text data stored in the text data storage unit 12.

In FIG. 3, the instruction input unit 11 first receives an instruction specifying text data from the user (step S101). At this time, in addition to the instruction specifying text data, the instruction input unit 11 may receive an instruction setting a reference value for the co-occurrence matrix (described in detail later), an instruction switching between AND processing and OR processing (described in detail later), an instruction setting details of how the co-occurrence network is displayed, and the like. The received instructions are output to the respective units of the text mining device 10.

次に、単語抽出部１３は、テキストデータ記憶部１２から指定されたテキストデータを読み出す（ステップＳ１０２）。次に、単語抽出部１３は、ステップＳ１０２で読み出したテキストデータに対して形態素解析を行うことにより、読み出したテキストデータから単語を抽出する（ステップＳ１０３）。このとき、単語抽出部１３は、読み出したテキストデータから、後の分析で必要となる単語だけを抽出する。次に、共起行列生成部１４は、ステップＳ１０３で抽出された単語について、ステップＳ１０２で読み出されたテキストデータを用いて共起行列を生成する（ステップＳ１０４）。 Next, the word extraction unit 13 reads the designated text data from the text data storage unit 12 (step S102). Next, the word extraction unit 13 extracts words from the read text data by performing morphological analysis on the text data read in step S102 (step S103). At this time, the word extraction unit 13 extracts only words necessary for later analysis from the read text data. Next, the co-occurrence matrix generation unit 14 generates a co-occurrence matrix for the words extracted in step S103, using the text data read in step S102 (step S104).

FIG. 4 is a diagram illustrating an example of the co-occurrence matrix generated by the co-occurrence matrix generation unit 14. Each element of the co-occurrence matrix is a Jaccard coefficient obtained for a pair of words. For the text data to be analyzed, let A be the set of sentences that include the word Wa, and let B be the set of sentences that include the word Wb. The Jaccard coefficient K(Wa, Wb) for the word pair (Wa, Wb) is given by the following equation (1).
K(Wa, Wb) = |A ∩ B| / |A ∪ B| … (1)
In equation (1), the symbol ∩ denotes set intersection, the symbol ∪ denotes set union, and |S| denotes the number of elements in the set S.
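Equation (1) can be illustrated with a minimal Python sketch. It assumes that each sentence has already been reduced to a set of extracted words; the function name `jaccard` and the sample sentences are hypothetical and do not come from the embodiment.

```python
def jaccard(sentences, wa, wb):
    """Jaccard coefficient K(Wa, Wb) = |A ∩ B| / |A ∪ B| over sentences.

    `sentences` is a list of word sets, one set per sentence (an assumption
    of this sketch; the embodiment obtains words by morphological analysis).
    """
    a = {i for i, words in enumerate(sentences) if wa in words}  # set A
    b = {i for i, words in enumerate(sentences) if wb in words}  # set B
    union = a | b
    if not union:
        # Neither word occurs anywhere; returning 0.0 is a choice of this sketch.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical sample data: four sentences as word sets.
sample = [
    {"open-air bath", "stairs", "view"},
    {"open-air bath", "view"},
    {"price", "view"},
    {"stairs", "price"},
]
k = jaccard(sample, "open-air bath", "stairs")  # A = {0, 1}, B = {0, 3}
```

In this sample, only sentence 0 contains both words while sentences 0, 1, and 3 contain at least one of them, so the coefficient is 1/3.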

In step S104, the co-occurrence matrix generation unit 14 obtains the Jaccard coefficient for every pair of words extracted from the whole of the text data read in step S102, and generates a co-occurrence matrix whose elements are the obtained Jaccard coefficients. The rows and columns of the co-occurrence matrix correspond to the distinct words extracted from the whole of the text data read in step S102. When n distinct words are extracted from the whole of the read text data, the co-occurrence matrix generated in step S104 is an n-by-n symmetric matrix whose diagonal elements are all 1.
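The construction of the matrix in step S104 might be sketched as follows. This is an illustration under the assumption that sentences are given as word sets; `cooccurrence_matrix` is a hypothetical name, and a plain list of lists stands in for whatever matrix representation an actual implementation uses.

```python
def cooccurrence_matrix(sentences):
    """Return (vocab, matrix), where matrix[i][j] is K(vocab[i], vocab[j])."""
    vocab = sorted(set().union(*sentences))  # the n distinct extracted words
    # For each word, the indices of the sentences that contain it.
    containing = {
        w: {i for i, s in enumerate(sentences) if w in s} for w in vocab
    }
    n = len(vocab)
    # Start from all ones so that the diagonal is 1, as stated in the text.
    matrix = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a, b = containing[vocab[i]], containing[vocab[j]]
            k = len(a & b) / len(a | b)  # the union is never empty here
            matrix[i][j] = matrix[j][i] = k  # fill both halves symmetrically
    return vocab, matrix
```

Only the upper triangle is computed and then mirrored, which relies on K(Wa, Wb) = K(Wb, Wa).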

Note that the co-occurrence matrix generation unit 14 may divide the text data into units other than sentences when obtaining the Jaccard coefficient. For example, the co-occurrence matrix generation unit 14 may obtain the Jaccard coefficient according to equation (1) with A being the set of paragraphs that include the word Wa and B being the set of paragraphs that include the word Wb. When the sentences included in the text data have dates, the co-occurrence matrix generation unit 14 may divide the text data into a plurality of parts each consisting of sentences having the same date, and obtain the Jaccard coefficient according to equation (1) with A being the set of parts that include the word Wa and B being the set of parts that include the word Wb. The co-occurrence matrix generation unit 14 may also generate a co-occurrence matrix whose elements are other values indicating the co-occurrence of words (for example, the Simpson coefficient or the cosine distance).
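The two variations mentioned here, a different unit of division and a different co-occurrence measure, can be sketched together. The grouping by date and the Simpson coefficient below follow their usual definitions (the Simpson coefficient being |A ∩ B| / min(|A|, |B|)); the function names and the date format are assumptions of this sketch.

```python
from collections import defaultdict


def group_by_date(dated_sentences):
    """Merge the word sets of sentences sharing a date into one unit per date."""
    units = defaultdict(set)
    for date, words in dated_sentences:
        units[date] |= set(words)
    return list(units.values())  # one word set per date, in first-seen order


def simpson(units, wa, wb):
    """Simpson (overlap) coefficient: |A ∩ B| / min(|A|, |B|)."""
    a = {i for i, u in enumerate(units) if wa in u}
    b = {i for i, u in enumerate(units) if wb in u}
    smaller = min(len(a), len(b))
    # Returning 0.0 when either word is absent is a choice of this sketch.
    return len(a & b) / smaller if smaller else 0.0
```

Because the measure only sees a list of word sets, the same function works whether the units are sentences, paragraphs, or per-date parts.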

次に、共起ネットワーク生成部１５は、ステップＳ１０４で生成された共起行列に基づき、全体共起ネットワークを生成する（ステップＳ１０５）。次に、画面表示部１６は、ステップＳ１０５で生成された全体共起ネットワークを含む画面を表示する（ステップＳ１０６）。図５は、ステップＳ１０６で表示される、全体共起ネットワークを含むウインドウの例を示す図である。図５に示すウインドウ４１は、全体共起ネットワーク５１と分析ボタン６１を含んでいる。分析ボタン６１は、分析開始を指示するために設けられる。 Next, the co-occurrence network generation unit 15 generates an entire co-occurrence network based on the co-occurrence matrix generated in step S104 (step S105). Next, the screen display unit 16 displays a screen including the entire co-occurrence network generated in step S105 (step S106). FIG. 5 is a diagram showing an example of a window including the entire co-occurrence network displayed in step S106. A window 41 shown in FIG. 5 includes an entire co-occurrence network 51 and an analysis button 61. The analysis button 61 is provided for instructing the start of analysis.

The co-occurrence network generation unit 15 holds a reference value for the co-occurrence matrix (hereinafter, V). The reference value V may be a predetermined value or a value set by the user using the instruction input unit 11. In the co-occurrence matrix generated in step S104, when the maximum of the Jaccard coefficients K(Wa, *) in the row corresponding to the word Wa is equal to or greater than the reference value V, the co-occurrence network generation unit 15 includes the node corresponding to the word Wa (the node labeled with the word Wa) in the entire co-occurrence network. Likewise, when the Jaccard coefficient K(Wa, Wb) for the word pair (Wa, Wb) in the co-occurrence matrix generated in step S104 is equal to or greater than the reference value V, the co-occurrence network generation unit 15 includes in the entire co-occurrence network an edge connecting the node corresponding to the word Wa and the node corresponding to the word Wb.
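The thresholding by the reference value V might look like the following sketch. It assumes that the row maximum is taken over the off-diagonal elements only, since the diagonal is always 1 and would otherwise admit every word whenever V is at most 1; under that assumption, a word becomes a node exactly when it is an endpoint of at least one edge. The name `build_network` is hypothetical.

```python
def build_network(vocab, matrix, v):
    """Return (nodes, edges) of the co-occurrence network for threshold v.

    Assumption of this sketch: the diagonal (always 1) is ignored when
    testing a row's maximum against v.
    """
    nodes, edges = set(), []
    n = len(vocab)
    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] >= v:
                # Edge between the two words, keeping the coefficient as weight.
                edges.append((vocab[i], vocab[j], matrix[i][j]))
                # Both endpoints have an off-diagonal value >= v in their rows.
                nodes.update((vocab[i], vocab[j]))
    return nodes, edges
```

Keeping the coefficient on each edge lets a display layer vary edge thickness or color by its value.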

In the entire co-occurrence network 51 shown in FIG. 5, nodes corresponding to words with a high appearance frequency are displayed larger. When a screen including a co-occurrence network is displayed, the edge connecting the node corresponding to the word Wa and the node corresponding to the word Wb may be drawn thicker when the Jaccard coefficient K(Wa, Wb) is large. The color of an edge may also be changed according to the Jaccard coefficient, or both the thickness and the color of the edge may be changed. A co-occurrence network is divided into a plurality of parts, each consisting of nodes that are reachable from one another via edges. When a screen including a co-occurrence network is displayed, the nodes included in each part may be displayed in a color assigned to that part. Note that the positions of the nodes and edges in a co-occurrence network carry no meaning.
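The division into mutually reachable parts corresponds to finding the connected components of the network. A minimal breadth-first sketch follows, assuming edges are given as word pairs; the function names and the use of integer color indices are hypothetical.

```python
from collections import defaultdict, deque


def components(nodes, edges):
    """Split the network into parts whose nodes are reachable via edges."""
    adjacent = defaultdict(set)
    for a, b in edges:
        adjacent[a].add(b)
        adjacent[b].add(a)
    seen, parts = set(), []
    for start in sorted(nodes):  # sorted only to make the order deterministic
        if start in seen:
            continue
        part, queue = set(), deque([start])
        seen.add(start)
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            part.add(node)
            for nxt in adjacent[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        parts.append(part)
    return parts


def assign_colors(parts):
    """Map each node to the index of its part, usable as a color index."""
    return {node: index for index, part in enumerate(parts) for node in part}
```

A display layer could then map each index to an actual color from a palette.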

次に、指示入力部１１は、利用者から注目語を指定する指示を受け取る（ステップＳ１１１）。ステップＳ１１１を実行するときには、全体共起ネットワークを含む画面が表示されている。利用者は、マウス２９を操作して、全体共起ネットワークの要素を選択することにより、注目語を指定する指示を入力する。なお、利用者は、指示を入力するときに、マウス２９に代えてキーボード２８を用いてもよく、表示画面に直接触れるなどの操作を行ってもよい。以下、ステップＳ１１１を実行するときに、図５に示すウインドウ４１を含む画面が表示されているとする。 Next, the instruction input unit 11 receives an instruction for designating the attention word from the user (step S111). When executing step S111, a screen including the entire co-occurrence network is displayed. The user operates the mouse 29 to select an element of the entire co-occurrence network, thereby inputting an instruction for designating the attention word. Note that the user may use the keyboard 28 instead of the mouse 29 when inputting an instruction, or may perform an operation such as directly touching the display screen. Hereinafter, it is assumed that a screen including the window 41 shown in FIG. 5 is displayed when step S111 is executed.

FIGS. 6 to 11 are diagrams showing first to sixth operations, respectively, for specifying an attention word in the window 41. In FIGS. 6 to 11, the balloons indicate the steps of each operation, and the white arrows indicate the movement of the mouse cursor 62. The balloons and arrows are not displayed on the actual screen. Hereinafter, clicking (double-clicking) the button of the mouse 29 while the mouse cursor 62 is over an element in the display screen is referred to as "clicking (double-clicking) the element".

As shown in FIG. 6, the user first clicks, in the window 41, the node corresponding to the word to be specified as the attention word (here, "open-air bath") (first click), and then clicks the analysis button 61 (second click). By this operation, the word corresponding to the node clicked first is specified as the attention word. In this way, an instruction specifying one attention word is input by selecting one node included in the entire co-occurrence network within the screen including the entire co-occurrence network and then selecting the start of analysis.

図７に示すように、利用者は、ウインドウ４１内で注目語として指定する単語（ここでは「露天風呂」）に対応するノードをダブルクリックする。この操作により、ダブルクリックされたノードに対応する単語が注目語として指定される。このように全体共起ネットワークを含む画面内で全体共起ネットワークに含まれる１個のノードを続けて選択することにより、１個の注目語を指定する指示が入力される。 As shown in FIG. 7, the user double-clicks a node corresponding to a word (here, “open-air bath”) designated as the attention word in the window 41. By this operation, the word corresponding to the double-clicked node is designated as the attention word. In this way, by successively selecting one node included in the entire co-occurrence network on the screen including the entire co-occurrence network, an instruction for designating one attention word is input.

As shown in FIG. 8, the user first clicks, in the window 41, the node corresponding to a word to be specified as an attention word (here, "open-air bath") (first click), then clicks the node corresponding to another word to be specified as an attention word (here, "price") (second click), and finally clicks the analysis button 61 (last click). By this operation, the two words corresponding to the nodes clicked first and second are specified as attention words. The user may also click p nodes (p is an integer of 3 or more) in the window 41 in sequence and finally click the analysis button 61. By this operation, the p words corresponding to the p nodes are specified as attention words. In this way, an instruction specifying a plurality of attention words is input by selecting a plurality of nodes included in the entire co-occurrence network within the screen including the entire co-occurrence network and then selecting the start of analysis.

As shown in FIG. 9, the user double-clicks, in the window 41, the edge connecting the two nodes corresponding to the two words to be specified as attention words (here, "open-air bath" and "stairs"). As a result, the two words corresponding to the two nodes connected by the double-clicked edge are specified as attention words. In this way, an instruction specifying two attention words is input by selecting one edge included in the entire co-occurrence network twice in succession within the screen including the entire co-occurrence network.

図１０に示すように、利用者は、ウインドウ４１内でまず注目語として指定する２個の単語（ここでは「露天風呂」と「階段」）に対応する２個のノードを接続するエッジをクリックし（１回目のクリック）、次に分析ボタン６１をクリックする（２回目のクリック）。これにより、１回目にクリックされたエッジに接続された２個のノードに対応する２個の単語が注目語として指定される。このように全体共起ネットワークを含む画面内で全体共起ネットワークに含まれる１個のエッジを選択し、分析開始を選択することにより、２個の注目語を指定する指示が入力される。 As shown in FIG. 10, the user first clicks an edge connecting two nodes corresponding to two words (here, “open-air bath” and “stairs”) designated as attention words in the window 41. (First click), and then click the analysis button 61 (second click). As a result, two words corresponding to the two nodes connected to the edge clicked for the first time are designated as attention words. In this way, by selecting one edge included in the global co-occurrence network in the screen including the global co-occurrence network and selecting the start of analysis, an instruction for designating two attention words is input.

As shown in FIG. 11, the user first clicks, in the window 41, the edge connecting the two nodes corresponding to two words to be specified as attention words (here, "open-air bath" and "stairs") (first click), then clicks the edge connecting the two nodes corresponding to another two words to be specified as attention words (here, "price" and "think") (second click), and finally clicks the analysis button 61 (last click). By this operation, the four words corresponding to the four nodes connected by the two edges clicked first and second are specified as attention words. The user may also click q edges (q is an integer of 3 or more) in the window 41 in sequence and finally click the analysis button 61. By this operation, the 2q words corresponding to the 2q nodes connected by the q edges are specified as attention words. In this way, an instruction specifying a plurality of attention words is input by selecting a plurality of edges included in the entire co-occurrence network within the screen including the entire co-occurrence network and then selecting the start of analysis.
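Turning a set of selected edges into attention words can be sketched as below. The text counts 2q words for q edges, which holds when no two selected edges share a node; how shared nodes are handled is not stated, so removing duplicates here is an assumption of this sketch, as is the function name.

```python
def attention_words_from_edges(selected_edges):
    """Collect the words at both ends of each selected edge, in click order.

    Assumption of this sketch: a word reached through several selected
    edges is listed only once.
    """
    words = []
    for word_a, word_b in selected_edges:
        for word in (word_a, word_b):
            if word not in words:
                words.append(word)
    return words


# The two edges clicked in FIG. 11 yield four attention words.
selected = [("open-air bath", "stairs"), ("price", "think")]
words = attention_words_from_edges(selected)
```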

In step S111, in addition to the instruction specifying an attention word, the instruction input unit 11 may receive an instruction setting the reference value of the co-occurrence matrix, an instruction switching between AND processing and OR processing, an instruction setting details of how the co-occurrence network is displayed, and the like. The received instructions are output to the respective units of the text mining device 10.

Next, the word extraction unit 13 extracts, from the text data read in step S102, the sentences that include the attention word specified in step S111, thereby obtaining limited text data consisting of the sentences that include the attention word (step S112).

The word extraction unit 13 holds a flag indicating whether AND processing or OR processing is to be performed when a plurality of attention words are specified. The value of the flag may be a predetermined value or a value set by the user using the instruction input unit 11. When the flag indicates AND processing, the word extraction unit 13 obtains the limited text data by extracting, from the read text data, the sentences that include all of the specified attention words. When the flag indicates OR processing, the word extraction unit 13 obtains the limited text data by extracting, from the read text data, the sentences that include at least one of the specified attention words.
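The AND and OR variants of step S112 can be sketched as a filter over sentences. The sketch assumes each sentence is available as a set of words so that containment is a set test; the embodiment does not say how containment is checked before the morphological analysis of step S113, and the names `limit_sentences` and `mode` are hypothetical.

```python
def limit_sentences(sentences, attention_words, mode="AND"):
    """Keep the sentences that form the limited text data.

    mode="AND": the sentence must contain every attention word.
    mode="OR":  the sentence must contain at least one attention word.
    """
    targets = set(attention_words)
    if mode == "AND":
        return [s for s in sentences if targets <= s]  # subset test
    if mode == "OR":
        return [s for s in sentences if targets & s]   # non-empty overlap
    raise ValueError(f"unknown mode: {mode!r}")
```

With a single attention word the two modes give the same result, which is consistent with the flag mattering only when a plurality of attention words are specified.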

次に、単語抽出部１３は、ステップＳ１１２で求めた限定テキストデータに対して形態素解析を行うことにより、限定テキストデータから単語を抽出する（ステップＳ１１３）。次に、共起行列生成部１４は、ステップＳ１１３で抽出された単語について、ステップＳ１１２で求められた限定テキストデータを用いて共起行列を生成する（ステップＳ１１４）。次に、共起ネットワーク生成部１５は、ステップＳ１１４で生成された共起行列に基づき、限定共起ネットワークを生成する（ステップＳ１１５）。なお、ステップＳ１０３〜Ｓ１０５とステップＳ１１３〜Ｓ１１５の間では、処理対象は異なるが、処理内容は同じである。 Next, the word extraction unit 13 extracts words from the limited text data by performing morphological analysis on the limited text data obtained in step S112 (step S113). Next, the co-occurrence matrix generation unit 14 generates a co-occurrence matrix for the words extracted in step S113, using the limited text data obtained in step S112 (step S114). Next, the co-occurrence network generation unit 15 generates a limited co-occurrence network based on the co-occurrence matrix generated in step S114 (step S115). Note that, although the processing target is different between steps S103 to S105 and steps S113 to S115, the processing content is the same.

一般に、ステップＳ１１２で求められた限定テキストデータから抽出される単語の種類は、ステップＳ１０２で読み出されたテキストデータから抽出される単語の種類よりも少ない。ステップＳ１１４で生成された共起行列は、ステップＳ１０４で生成された共起行列とは異なる。ステップＳ１１５で生成された限定共起ネットワークは、ステップＳ１０５で生成された全体共起ネットワークとは異なる。 In general, the types of words extracted from the limited text data obtained in step S112 are fewer than the types of words extracted from the text data read in step S102. The co-occurrence matrix generated in step S114 is different from the co-occurrence matrix generated in step S104. The limited co-occurrence network generated in step S115 is different from the overall co-occurrence network generated in step S105.

Next, the screen display unit 16 displays a screen including the limited co-occurrence network generated in step S115 (step S116). FIGS. 12 and 13 are diagrams showing examples of windows including a limited co-occurrence network displayed in step S116. The window 42 shown in FIG. 12 includes the limited co-occurrence network 52 obtained when one attention word (here, "open-air bath") is specified. The window 43 shown in FIG. 13 includes the limited co-occurrence network 53 obtained when two attention words (here, "open-air bath" and "bathhouse") are specified.

図１４および図１５は、テキストマイニング装置１０の表示画面の例を示す図である。画面表示部１６は、全体共起ネットワークを含むウインドウと限定共起ネットワークを含むウインドウとを重ねずに並べて表示してもよく、両者を重ねて表示してもよい。図１４に示す画面７１では、全体共起ネットワーク５１を含むウインドウ４１と限定共起ネットワーク５２を含むウインドウ４２とは、重ねずに並べて表示されている。利用者は、画面７１において、全体共起ネットワーク５１と限定共起ネットワーク５２を同時に見ることができる。図１５に示す画面７２では、限定共起ネットワーク５２を含むウインドウ４２は、全体共起ネットワーク５１を含むウインドウ４１に重ねて表示されている。利用者は、画面７２において、全体共起ネットワーク５１と限定共起ネットワーク５２を切り替えて見ることができる。 14 and 15 are diagrams illustrating examples of display screens of the text mining device 10. The screen display unit 16 may display the window including the entire co-occurrence network and the window including the limited co-occurrence network side by side without overlapping, or may display both of them in an overlapping manner. In the screen 71 shown in FIG. 14, the window 41 including the entire co-occurrence network 51 and the window 42 including the limited co-occurrence network 52 are displayed side by side without being overlapped. The user can view the entire co-occurrence network 51 and the limited co-occurrence network 52 simultaneously on the screen 71. In the screen 72 shown in FIG. 15, the window 42 including the limited co-occurrence network 52 is displayed so as to overlap the window 41 including the entire co-occurrence network 51. The user can switch the entire co-occurrence network 51 and the limited co-occurrence network 52 and view them on the screen 72.

次に、指示入力部１１は、利用者から指示を受け取る（ステップＳ１２１）。次に、テキストマイニング装置１０は、ステップＳ１２１で受け取った指示が注目語を指定する指示か否かを判断する（ステップＳ１２２）。ステップＳ１２２でＹｅｓの場合、テキストマイニング装置１０の制御はステップＳ１１２へ進む。この場合、ステップＳ１２１で指定された注目語についてステップＳ１１２〜Ｓ１１６が実行され、ステップＳ１２１で指定された注目語を含む文からなる限定テキストデータに基づく限定共起ネットワークを含む画面が表示される。 Next, the instruction input unit 11 receives an instruction from the user (step S121). Next, the text mining apparatus 10 determines whether or not the instruction received in step S121 is an instruction for designating the attention word (step S122). If Yes in step S122, the control of the text mining device 10 proceeds to step S112. In this case, steps S112 to S116 are executed for the attention word specified in step S121, and a screen including a limited co-occurrence network based on limited text data including sentences including the attention word specified in step S121 is displayed.

図１６は、テキストマイニング装置１０の表示画面の例を示す図である。図１６に示す画面７３では、全体共起ネットワーク５１を含むウインドウ４１と限定共起ネットワーク５２を含むウインドウ４２とに重ねて、注目語として「浴場」を指定したときの限定共起ネットワーク５４を含むウインドウ４４が表示されている。画面７３は、ステップＳ１１１で「露天風呂」を注目語として指定し、ステップＳ１２１で「浴場」を注目語として指定したときに表示される。利用者は、画面７３において、全体共起ネットワーク５１と限定共起ネットワーク５２、５４を切り替えて見ることができる。 FIG. 16 is a diagram illustrating an example of a display screen of the text mining device 10. The screen 73 shown in FIG. 16 includes the limited co-occurrence network 54 when “bathhouse” is designated as the attention word, overlapping the window 41 including the entire co-occurrence network 51 and the window 42 including the limited co-occurrence network 52. A window 44 is displayed. The screen 73 is displayed when “open-air bath” is designated as an attention word in step S111 and “bathhouse” is designated as an attention word in step S121. On the screen 73, the user can switch between the overall co-occurrence network 51 and the limited co-occurrence networks 52 and 54.

ステップＳ１２２でＮｏの場合、テキストマイニング装置１０の制御はステップＳ１２３へ進む。この場合、ステップＳ１２１で受け取った指示は、例えば、ウインドウを移動させる指示、ウインドウを非表示にする指示、ウインドウを閉じる指示、ウインドウを併合する指示などである。利用者は、全体共起ネットワークと限定共起ネットワークを含む画面が表示されているときに指示入力部１１を操作することにより、これらの指示を入力する。画面表示部１６は、ステップＳ１２１で受け取った指示に従い、更新後の画面を表示する（ステップＳ１２３）。その後、テキストマイニング装置１０の制御は、ステップＳ１２１へ進む。 In the case of No in step S122, the control of the text mining device 10 proceeds to step S123. In this case, the instruction received in step S121 is, for example, an instruction to move the window, an instruction to hide the window, an instruction to close the window, or an instruction to merge the windows. The user inputs these instructions by operating the instruction input unit 11 when a screen including the entire co-occurrence network and the limited co-occurrence network is displayed. The screen display unit 16 displays the updated screen according to the instruction received in step S121 (step S123). Thereafter, the control of the text mining device 10 proceeds to step S121.

FIG. 17 is a diagram illustrating an operation of merging windows. The screen 74 shown in FIG. 17 displays the window 42, which includes the limited co-occurrence network 52 obtained when "open-air bath" is specified as the attention word, and the window 44, which includes the limited co-occurrence network 54 obtained when "bathhouse" is specified as the attention word. On the screen 74, the user can view the two limited co-occurrence networks 52 and 54 at the same time.

The hatched arrow shown in FIG. 17 indicates that the mouse cursor 62 has moved while the button of the mouse 29 was held down. This arrow is not displayed on the actual screen. The user performs an operation of grabbing the limited co-occurrence network 52 within the screen 74 and releasing it inside the limited co-occurrence network 54 (a drop operation). More specifically, the user presses the button of the mouse 29 while the mouse cursor 62 is in the window 42, moves the mouse cursor 62 into the window 44 while holding the button down, and releases the button while the mouse cursor 62 is in the window 44. By this operation, an instruction to merge the windows is input.

FIG. 18 is a diagram showing the display screen after the operation shown in FIG. 17 is performed. The screen 75 shown in FIG. 18 displays a window 45 that shows a plurality of limited co-occurrence networks in a tabbed format. In FIG. 18, the tab 64 labeled "open-air bath" is selected, and the window 45 shows the limited co-occurrence network 52 obtained when "open-air bath" is specified as the attention word. When the tab 63 labeled "bathhouse" is selected, the window 45 shows the limited co-occurrence network 54 shown in FIG. 17.

利用者がウインドウ４５内の閉じるボタン（×印）をクリックしたときに、ウインドウ４５は閉じる。利用者がタブ６３内の閉じるボタンをクリックしたときには、タブ６３は表示されなくなる。利用者がタブ６４内の閉じるボタンをクリックしたときには、タブ６４は表示されなくなり、ウインドウ４５には限定共起ネットワーク５４が表示される。 When the user clicks the close button (x mark) in the window 45, the window 45 is closed. When the user clicks the close button in the tab 63, the tab 63 is not displayed. When the user clicks the close button in the tab 64, the tab 64 is not displayed and the limited co-occurrence network 54 is displayed in the window 45.

As described above, the text mining method according to the present embodiment includes a step of extracting words from text data (steps S102, S103, S112, and S113), a step of generating a co-occurrence matrix for the extracted words (steps S104 and S114), a step of generating a co-occurrence network based on the generated co-occurrence matrix (steps S105 and S115), and a step of displaying a screen including the co-occurrence network (steps S106 and S116). When an instruction specifying an attention word is input within a first screen (the screen including the window 41) that includes a first co-occurrence network (the entire co-occurrence network 51) based on the whole of the specified text data, the step of extracting words (steps S112 and S113) extracts words from limited text data consisting of the parts of the specified text data that include the attention word (the sentences including the attention word), the step of generating a co-occurrence matrix (step S114) generates a second co-occurrence matrix for the extracted words using the limited text data, the step of generating a co-occurrence network (step S115) generates a second co-occurrence network (the limited co-occurrence networks 52 to 54) based on the second co-occurrence matrix, and the step of displaying a screen (step S116) displays a second screen (a screen including the windows 42 to 45) that includes the second co-occurrence network. Thus, in the text mining method according to the present embodiment, when an instruction specifying an attention word is input within the first screen including the first co-occurrence network based on the whole of the specified text data, a second screen is displayed that includes the second co-occurrence network based on the parts of the specified text data that include the attention word. Therefore, a screen including the co-occurrence network for a specified attention word can be displayed with a simple operation.

Further, an instruction specifying the words corresponding to nodes as attention words is input by selecting one or more nodes included in the first co-occurrence network within the first screen and then selecting the start of analysis (FIGS. 6 and 8). By selecting one or more nodes and the start of analysis within the first screen in this way, an instruction specifying one or more attention words can be input with a simple operation, and a screen including the co-occurrence network for the one or more specified attention words can be displayed. In addition, an instruction specifying the word corresponding to a node as the attention word is input by selecting one node included in the first co-occurrence network twice in succession within the first screen (FIG. 7). By selecting one node twice in succession within the first screen in this way, an instruction specifying one attention word can be input with a simple operation, and a screen including the co-occurrence network for the one specified attention word can be displayed.

Further, an instruction designating the words corresponding to the two nodes connected to an edge as attention words is input by selecting one edge included in the first co-occurrence network in the first screen (FIG. 9). By successively selecting one edge in the first screen in this way, an instruction designating two attention words can be input with a simple operation, and a screen including the co-occurrence network for the two designated attention words can be displayed. Likewise, an instruction designating the words corresponding to the plurality of nodes connected to the edges as attention words is input by selecting one or more edges included in the first co-occurrence network in the first screen and then selecting analysis start (FIGS. 10 and 11). By selecting one or more edges and analysis start in the first screen in this way, an instruction designating a plurality of attention words can be input with a simple operation, and a screen including the co-occurrence network for the plurality of designated attention words can be displayed.
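The way a selection of nodes and edges resolves to a set of attention words can be illustrated with a minimal Python sketch. The `Edge` class, the `attention_words` function, and the sample words are hypothetical names introduced only for illustration; the embodiment does not prescribe any particular data structure, and the ordering and de-duplication shown here are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Edge:
    """Hypothetical edge of a co-occurrence network, joining two word nodes."""
    source: str
    target: str


def attention_words(selected_nodes: Iterable[str],
                    selected_edges: Iterable[Edge]) -> List[str]:
    """Resolve a selection into attention words.

    Each selected node contributes its own word; each selected edge
    contributes the words of the two nodes it connects. Duplicates are
    dropped while keeping first-seen order (an assumption of this sketch).
    """
    candidates = list(selected_nodes)
    for edge in selected_edges:
        candidates.extend((edge.source, edge.target))

    words: List[str] = []
    for word in candidates:
        if word not in words:
            words.append(word)
    return words


# Selecting a single edge designates two attention words.
one_edge = attention_words([], [Edge("price", "quality")])

# Selecting two edges that share a node designates three attention words.
two_edges = attention_words([], [Edge("price", "quality"),
                                 Edge("quality", "service")])
```

In this sketch a node shared by several selected edges is counted once, so the number of attention words can be smaller than twice the number of selected edges.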

Further, when a merge instruction is input in a second screen (screen 74) including a plurality of second co-occurrence networks (limited co-occurrence networks 52 and 54) (FIG. 17), the step of displaying the screen displays the plurality of second co-occurrence networks in a tab format (FIG. 18). This allows the plurality of second co-occurrence networks to be displayed compactly. The merge instruction is input by grabbing one second co-occurrence network (limited co-occurrence network 52) in the second screen and releasing it within another second co-occurrence network (limited co-occurrence network 54). Therefore, the merge instruction can be input with a simple operation, and the plurality of second co-occurrence networks can be displayed compactly.
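The merge operation can be pictured as follows. This is only a sketch under stated assumptions: the `Window` class and `merge_windows` function are hypothetical, windows are identified by list index, and appending the dragged network's tab after the target's existing tabs is an arbitrary choice that the embodiment does not specify.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Window:
    """Hypothetical window holding one tab per co-occurrence network."""
    tabs: List[str] = field(default_factory=list)


def merge_windows(windows: List[Window], dragged: int, target: int) -> List[Window]:
    """Return a new window list in which the dragged window's tabs have been
    moved into the target window and the dragged window has been removed.

    The input list and its windows are left unmodified.
    """
    if dragged == target:
        return list(windows)

    merged = Window(tabs=windows[target].tabs + windows[dragged].tabs)
    result: List[Window] = []
    for index, window in enumerate(windows):
        if index == dragged:
            continue  # the dragged window disappears after the merge
        result.append(merged if index == target else window)
    return result


# Two windows, each showing one limited co-occurrence network.
before = [Window(["limited network 52"]), Window(["limited network 54"])]

# Grab the first window and release it inside the second one.
after = merge_windows(before, dragged=0, target=1)
```

After the call, `after` contains a single window with two tabs, which corresponds to the compact tab-format display described above.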

The limited text data may be composed of the sentences of the designated text data that include the attention word. In this case, when an instruction designating the attention word is input, the designated text data is divided into sentences to obtain the limited text data, and a screen including the second co-occurrence network based on the obtained limited text data can be displayed. When a plurality of attention words are designated, the limited text data may be composed of the sentences of the designated text data that include all of the plurality of attention words. In this case, a screen including the second co-occurrence network obtained by performing AND processing on the plurality of attention words can be displayed. Alternatively, when a plurality of attention words are designated, the limited text data may be composed of the sentences of the designated text data that include any of the plurality of attention words. In this case, a screen including the second co-occurrence network obtained by performing OR processing on the plurality of attention words can be displayed. Further, the step of generating the co-occurrence matrix generates a co-occurrence matrix whose elements are Jaccard coefficients. Therefore, the co-occurrence of the words included in the text data can be suitably analyzed.
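The sentence-level AND/OR limitation and the Jaccard-coefficient matrix described above can be sketched in Python as follows. The function names `limit_sentences` and `jaccard_matrix`, the sample sentences, and the choice of the sentence as the unit within which two words are counted as co-occurring are assumptions made for illustration; the sentences are assumed to be already split into words. The Jaccard coefficient of two words is computed here in its standard form, as the number of sentences containing both words divided by the number of sentences containing at least one of them.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple


def limit_sentences(sentences: Sequence[Sequence[str]],
                    attention: Sequence[str],
                    mode: str = "and") -> List[Sequence[str]]:
    """Keep the sentences that contain all (AND) or any (OR) attention words."""
    targets = set(attention)
    if mode == "and":
        return [s for s in sentences if targets <= set(s)]
    if mode == "or":
        return [s for s in sentences if targets & set(s)]
    raise ValueError("mode must be 'and' or 'or'")


def jaccard_matrix(sentences: Sequence[Sequence[str]]
                   ) -> Tuple[List[str], List[List[float]]]:
    """Build a symmetric co-occurrence matrix of Jaccard coefficients.

    Returns the sorted word list and a matrix indexed in the same order.
    """
    # Map each word to the set of sentence indices in which it appears.
    occurrences: Dict[str, Set[int]] = {}
    for index, sentence in enumerate(sentences):
        for word in set(sentence):
            occurrences.setdefault(word, set()).add(index)

    words = sorted(occurrences)
    size = len(words)
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        matrix[i][i] = 1.0  # a word always co-occurs with itself
    for i, j in combinations(range(size), 2):
        a, b = occurrences[words[i]], occurrences[words[j]]
        matrix[i][j] = matrix[j][i] = len(a & b) / len(a | b)
    return words, matrix


# Hypothetical designated text data, already divided into sentences and words.
sentences = [
    ["price", "high", "quality"],
    ["price", "low"],
    ["quality", "good"],
]

and_limited = limit_sentences(sentences, ["price", "quality"], mode="and")
or_limited = limit_sentences(sentences, ["price", "quality"], mode="or")
words, matrix = jaccard_matrix(or_limited)
```

With these sample sentences, the AND limitation keeps only the first sentence, whereas the OR limitation keeps all three, so the second co-occurrence matrix differs depending on which processing is chosen.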

The text mining apparatus 10 and the text mining program 31 according to the present embodiment have the same features as the text mining method described above and provide the same effects. According to the text mining method, the text mining apparatus 10, and the text mining program 31 of the present embodiment, a screen including the co-occurrence network for a designated attention word can be displayed with a simple operation.

DESCRIPTION OF SYMBOLS
10 ... Text mining apparatus
11 ... Instruction input unit
12 ... Text data storage unit
13 ... Word extraction unit
14 ... Co-occurrence matrix generation unit
15 ... Co-occurrence network generation unit
16 ... Screen display unit
20 ... Computer
21 ... CPU
22 ... Main memory
29 ... Mouse
30 ... Recording medium
31 ... Text mining program
32 ... Text data
41-45 ... Windows
51 ... Whole co-occurrence network
52-54 ... Limited co-occurrence networks
61 ... Analysis button
62 ... Mouse cursor
63-64 ... Tabs
71-75 ... Screens

Claims

A text mining method for displaying a screen including an analysis result of text data, the method comprising:
a step of extracting words from text data;
a step of generating a co-occurrence matrix for the words;
a step of generating a co-occurrence network based on the co-occurrence matrix; and
a step of displaying a screen including the co-occurrence network,
wherein, when an instruction designating an attention word is input in a first screen including a first co-occurrence network based on the whole of designated text data, the step of extracting the words extracts the words from limited text data composed of a part of the designated text data that includes the attention word, the step of generating the co-occurrence matrix generates a second co-occurrence matrix for the words using the limited text data, the step of generating the co-occurrence network generates a second co-occurrence network based on the second co-occurrence matrix, and the step of displaying the screen displays a second screen including the second co-occurrence network.

The text mining method according to claim 1, wherein an instruction designating the word corresponding to a node as the attention word is input by selecting one or more nodes included in the first co-occurrence network in the first screen and selecting analysis start.

The text mining method according to claim 1, wherein an instruction designating the word corresponding to a node as the attention word is input by successively selecting one node included in the first co-occurrence network in the first screen.

The text mining method according to claim 1, wherein an instruction designating the words corresponding to the two nodes connected to an edge as the attention words is input by successively selecting one edge included in the first co-occurrence network in the first screen.

The text mining method according to claim 1, wherein an instruction designating the words corresponding to a plurality of nodes connected to edges as the attention words is input by selecting one or more edges included in the first co-occurrence network in the first screen and selecting analysis start.

The text mining method according to claim 1, wherein, when a merge instruction is input in a second screen including a plurality of second co-occurrence networks, the step of displaying the screen displays the plurality of second co-occurrence networks in a tab format.

The text mining method according to claim 6, wherein the merge instruction is input by grabbing one second co-occurrence network in the second screen and releasing it within another second co-occurrence network.

The text mining method according to claim 1, wherein the limited text data is composed of sentences of the designated text data that include the attention word.

The text mining method according to claim 8, wherein the limited text data when a plurality of attention words are designated is composed of sentences of the designated text data that include all of the plurality of attention words.

The text mining method according to claim 8, wherein the limited text data when a plurality of attention words are designated is composed of sentences of the designated text data that include any of the plurality of attention words.

The text mining method according to claim 1, wherein the step of generating the co-occurrence matrix generates a co-occurrence matrix having Jaccard coefficients as elements.

A text mining program for displaying a screen including an analysis result of text data, the program causing a computer to execute, using a memory:
a step of extracting words from text data;
a step of generating a co-occurrence matrix for the words;
a step of generating a co-occurrence network based on the co-occurrence matrix; and
a step of displaying a screen including the co-occurrence network,
wherein, when an instruction designating an attention word is input in a first screen including a first co-occurrence network based on the whole of designated text data, the step of extracting the words extracts the words from limited text data composed of a part of the designated text data that includes the attention word, the step of generating the co-occurrence matrix generates a second co-occurrence matrix for the words using the limited text data, the step of generating the co-occurrence network generates a second co-occurrence network based on the second co-occurrence matrix, and the step of displaying the screen displays a second screen including the second co-occurrence network.

The text mining program according to claim 12, wherein an instruction designating the word corresponding to a node as the attention word is input by selecting one or more nodes included in the first co-occurrence network in the first screen and selecting analysis start.

The text mining program according to claim 12, wherein an instruction designating the word corresponding to a node as the attention word is input by successively selecting one node included in the first co-occurrence network in the first screen.

The text mining program according to claim 12, wherein an instruction designating the words corresponding to the two nodes connected to an edge as the attention words is input by successively selecting one edge included in the first co-occurrence network in the first screen.

The text mining program according to claim 12, wherein an instruction designating the words corresponding to a plurality of nodes connected to edges as the attention words is input by selecting one or more edges included in the first co-occurrence network in the first screen and selecting analysis start.

The text mining program according to claim 12, wherein, when a merge instruction is input in a second screen including a plurality of second co-occurrence networks, the step of displaying the screen displays the plurality of second co-occurrence networks in a tab format.

The text mining program according to claim 17, wherein the merge instruction is input by grabbing one second co-occurrence network in the second screen and releasing it within another second co-occurrence network.

A text mining device that displays a screen including an analysis result of text data, the device comprising:
a word extraction unit that extracts words from text data;
a co-occurrence matrix generation unit that generates a co-occurrence matrix for the words;
a co-occurrence network generation unit that generates a co-occurrence network based on the co-occurrence matrix; and
a screen display unit that displays a screen including the co-occurrence network,
wherein, when an instruction designating an attention word is input in a first screen including a first co-occurrence network based on the whole of designated text data, the word extraction unit extracts the words from limited text data composed of a part of the designated text data that includes the attention word, the co-occurrence matrix generation unit generates a second co-occurrence matrix for the words using the limited text data, the co-occurrence network generation unit generates a second co-occurrence network based on the second co-occurrence matrix, and the screen display unit displays a second screen including the second co-occurrence network.

The text mining device according to claim 19, wherein the screen display unit displays a plurality of second co-occurrence networks in a tab format when a merge instruction is input in a second screen including the plurality of second co-occurrence networks.