JP2010061176A - Text mining device, text mining method, and text mining program

Info

Publication number: JP2010061176A
Application number: JP2006315862A
Authority: JP
Inventors: Takahiro Ikeda; 崇博池田; Satoshi Nakazawa; 聡中澤; Yosuke Sakao; 要祐坂尾; Kenji Sato; 研治佐藤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2006-11-22
Filing date: 2006-11-22
Publication date: 2010-03-18
Also published as: WO2008062822A1

Abstract

PROBLEM TO BE SOLVED: To generate an attribute value, or a combination of attribute values, that is effective for a new text classification which is based on a feature (of the text) selected by a user and which the user does not explicitly designate.

SOLUTION: In a text mining device, an attribute condition designating means 301 reads attribute value conditions, i.e., the conditions for first-class positive examples or first-class negative examples designated by the user via an input device 10. A text mining means 302 performs text mining on the texts in a text storage part 202, extracts as features the portions effective for distinguishing the first-class positive examples from the first-class negative examples, stores the features in a mining result holding part 203, and displays them via an output device 40. An analysis object feature designating means 303 receives the feature selected by the user. A positive/negative example text extracting means 304 divides the texts corresponding to the first-class positive examples and the first-class negative examples into second-class positive-example texts and second-class negative-example texts, based on the selected feature. An attribute feature extracting means 305 outputs, as features, the attribute values effective for distinguishing the second-class positive examples from the second-class negative examples.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a text mining device, a text mining method, and a text mining program that extract words as features of texts. In particular, it relates to a text mining device, a text mining method, and a text mining program that can extract, from a word obtained as a mining result, the attributes characteristic of the texts containing that word.

Text mining is a process applied to a set of texts to which attribute values have been assigned for several attributes: when a user designates the texts having a specific attribute value as positive examples, the process extracts and outputs the features that appear disproportionately in the positive-example texts.

For example, the response records of a contact center that accepts product inquiries usually contain more than the text describing each inquiry. They often also record, as a set, the inquiry type ("question", "request", "repair request", and so on), the name of the model the inquiry concerned, the date the inquiry was received, and the name of the person who handled it. Text mining regards such a text as a text to which values of an "inquiry type" attribute, a "model name" attribute, a "reception date" attribute, and a "person in charge" attribute have been assigned. Taking, for example, the texts whose "inquiry type" attribute value is "repair request" as positive examples, it can extract the features that appear disproportionately in those positive-example texts.
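The attribute-tagged records described above can be pictured with a small data structure. The following is a minimal Python sketch, not the implementation described in this publication; the `Record` class, the `matches` and `split_by_condition` helpers, the attribute keys, and the sample records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One response record: free text plus its attribute values (hypothetical layout)."""
    text: str
    attributes: dict


def matches(record, condition):
    """True if the record has every attribute value named in the condition."""
    return all(record.attributes.get(name) == value for name, value in condition.items())


def split_by_condition(records, condition):
    """Split records into positive examples (matching) and the remaining records."""
    positives = [r for r in records if matches(r, condition)]
    others = [r for r in records if not matches(r, condition)]
    return positives, others


# Illustrative data only; none of these records come from the publication.
records = [
    Record("hard disk makes a clicking noise", {"inquiry_type": "repair request", "model": "PC-100"}),
    Record("how do I add memory", {"inquiry_type": "question", "model": "PC-100"}),
    Record("screen is cracked", {"inquiry_type": "repair request", "model": "PC-200"}),
]
positives, others = split_by_condition(records, {"inquiry_type": "repair request"})
```

Here the condition `{"inquiry_type": "repair request"}` plays the role of the attribute value designated by the user, and `positives` holds the texts that would be mined for characteristic features.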

A conventional text mining device is configured to extract words from each text and then to extract, as features of the positive examples, the words or word combinations that are strongly associated with the texts having the specific attribute value designated as positive examples.

An example of this type of text mining device is described in Patent Document 1. The device described there has a feature word extraction processing unit that extracts characteristic words and phrases appearing in the texts to be mined, an analysis axis setting processing unit that sets the classification axis (corresponding to an attribute) to be analyzed, and a related phrase acquisition processing unit that extracts words and phrases strongly associated with each category (corresponding to an attribute value) of the classification axis. It extracts the words and phrases characteristic of each category of the classification axis that the user has set as the analysis target.

Another example of this type of text mining method is described in Non-Patent Document 1. Given positive-example texts (the target group) and negative-example texts (the comparison group), this method finds patterns in the texts whose frequency of occurrence is high in the positive-example texts and as low as possible in the negative-example texts, that is, patterns whose frequency of occurrence is effective for separating the positive examples from the negative examples. It extracts those patterns as features of the positive examples.
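The idea of a pattern that is frequent in positive examples and rare in negative examples can be illustrated with single words. The sketch below is an assumption-laden simplification, not the algorithm of Non-Patent Document 1: it treats whitespace-separated words as the patterns, and the `doc_fraction` and `biased_words` helpers, the `min_gap` threshold, and the sample texts are all hypothetical.

```python
def doc_fraction(texts, word):
    """Fraction of texts that contain the word at least once."""
    if not texts:
        return 0.0
    return sum(1 for t in texts if word in t.split()) / len(texts)


def biased_words(positive_texts, negative_texts, min_gap=0.5):
    """Words whose positive-side document fraction exceeds the negative-side one by min_gap."""
    vocabulary = {w for t in positive_texts for w in t.split()}
    scored = []
    for word in vocabulary:
        gap = doc_fraction(positive_texts, word) - doc_fraction(negative_texts, word)
        if gap >= min_gap:
            scored.append((word, gap))
    # Largest gap first; ties broken alphabetically so the order is deterministic.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Illustrative texts only.
positive_texts = ["hard disk makes noise", "hard disk failed", "screen cracked"]
negative_texts = ["how to install printer", "screen brightness question", "battery question"]
features = biased_words(positive_texts, negative_texts)
```

With these sample texts only "disk" and "hard" clear the threshold, since "screen" occurs equally often on both sides. Japanese text would need a morphological analyzer rather than `str.split`.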

Meanwhile, techniques for learning some pattern or rule from data sets other than text are called data mining, and a variety of data mining methods are widely known.

As examples of data mining methods, a divide-and-conquer algorithm and a covering algorithm are described in Non-Patent Document 2. Given a data set with attributes that has been divided into positive examples and negative examples in advance, this approach obtains a decision tree that discriminates the positive examples.

Another example of a data mining method is described in Non-Patent Document 3. Given a set of transactions, each of which is a combination of items, this method obtains association rules between sets of items.

Patent Document 1: JP 2003-141134 A
Non-Patent Document 1: Junichiro Abe and four others, "High-speed data mining from text data: exploratory document browsing and application to web data", Journal of the Japanese Society for Artificial Intelligence, Vol. 15, No. 4, July 2000, pp. 618-628
Non-Patent Document 2: Hiroshi Motoda and two others, "Machine learning and data mining", Journal of the Japanese Society for Artificial Intelligence, Vol. 12, No. 4, July 1997, pp. 505-512
Non-Patent Document 3: Masaru Kitsuregawa, "Association rule extraction techniques in data mining", Journal of the Japanese Society for Artificial Intelligence, Vol. 12, No. 4, July 1997, pp. 513-520

If the texts that a user designates as positive examples share a conspicuous feature, text mining can extract it. When some feature is extracted, the user therefore learns that the texts designated as positive examples have a feature in common. However, the extracted feature does not necessarily appear uniformly across all of the texts designated as positive examples, nor does it necessarily appear only in those texts.

For example, suppose that text mining is performed on the response records of a contact center that accepts product inquiries, taking the texts whose reception month is "October 2005" and whose inquiry type is "repair request" as positive examples, and that the word "hard disk" is extracted as a feature of the positive examples. This means that the word "hard disk" appears disproportionately in the texts whose reception month is "October 2005" and whose inquiry type is "repair request", compared with the other texts. From this result the user can learn that the repair requests of October 2005 concerned hard disks more often than other inquiries did.

In this case, the word "hard disk" may in fact be concentrated, among the texts whose reception month is "October 2005" and whose inquiry type is "repair request", in the texts whose model name is "PC-100". The word may also appear more often than elsewhere in the texts whose reception month is "November 2005" and whose inquiry type is "repair request". Conventionally, however, the user had no way of knowing this.

As described above, the problem with the conventional text mining devices is that, when a feature is extracted from texts, they cannot show the user the range of texts in which that feature appears. That is, with a conventional text mining device the user cannot learn the attribute values (or combinations of attribute values) that are effective for a new text classification which is based on a feature (of the text) selected by the user and which the user does not explicitly designate. The reason is that the conventional devices do not tell the user what the texts containing an extracted feature have in common, other than the fact that the feature appears in them.

An object of the present invention is to provide a text mining device, a text mining method, and a text mining program that solve the problems described above.

The first text mining device of the present invention has a data processing device that performs text mining based on attribute value conditions, which are the conditions for first-class positive examples and first-class negative examples designated by a user; extracts, as features, the portions effective for distinguishing the first-class positive examples from the first-class negative examples; has the user select a feature of interest from among the features; divides the texts corresponding to the first-class positive examples and the first-class negative examples into second-class positive-example texts, in which the selected feature appears, and second-class negative-example texts, in which the selected feature does not appear; and generates attribute value conditions that serve as new features effective for distinguishing the second-class positive examples from the second-class negative examples.
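The second half of this flow, regrouping the texts by the selected feature and then looking for attribute values that separate the two groups, can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the record layout, the `split_by_feature` and `attribute_value_gaps` helpers, the use of whitespace tokens, and the frequency-gap score are all hypothetical choices.

```python
from collections import Counter


def split_by_feature(records, feature):
    """Second-class split: records whose text contains the feature vs. those that do not."""
    positives = [r for r in records if feature in r["text"].split()]
    negatives = [r for r in records if feature not in r["text"].split()]
    return positives, negatives


def attribute_value_gaps(positives, negatives):
    """Score each (attribute, value) pair by how much more often it occurs among positives."""
    def fractions(group):
        counts = Counter(pair for r in group for pair in r["attributes"].items())
        return {pair: n / len(group) for pair, n in counts.items()} if group else {}

    pos, neg = fractions(positives), fractions(negatives)
    gaps = {pair: share - neg.get(pair, 0.0) for pair, share in pos.items()}
    # Largest gap first; ties broken by the (attribute, value) pair itself.
    return sorted(gaps.items(), key=lambda item: (-item[1], item[0]))


# Illustrative union of first-class positive and negative texts.
records = [
    {"text": "hard disk clicking", "attributes": {"month": "2005-10", "model": "PC-100"}},
    {"text": "hard disk failed", "attributes": {"month": "2005-11", "model": "PC-100"}},
    {"text": "screen cracked", "attributes": {"month": "2005-10", "model": "PC-200"}},
    {"text": "battery swollen", "attributes": {"month": "2005-11", "model": "PC-200"}},
]
second_pos, second_neg = split_by_feature(records, "disk")
ranking = attribute_value_gaps(second_pos, second_neg)
```

In this toy data the model "PC-100" ranks first: it is an attribute value the user never specified, yet it characterizes the texts in which the selected feature appears.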

The second text mining device of the present invention has a storage device that stores a plurality of texts and the attribute values of each text, and a data processing device. The data processing device reads the texts and the attribute values of each text from the storage device; performs text mining by applying attribute value conditions, which are the conditions for first-class positive examples and first-class negative examples designated by a user, to the texts and their attribute values; extracts, as features, the portions effective for distinguishing the first-class positive examples from the first-class negative examples and stores them in the storage device as a mining result; has the user select a feature of interest from among the features; divides the texts corresponding to the first-class positive examples and the first-class negative examples into second-class positive-example texts, in which the selected feature appears, and second-class negative-example texts, in which the selected feature does not appear; and generates attribute value conditions that serve as new features effective for distinguishing the second-class positive examples from the second-class negative examples and outputs them to an output device.

The third text mining device of the present invention is the first or second text mining device, in which the portions effective for distinguishing the first-class positive examples from the first-class negative examples are "words and phrases whose frequency of occurrence is high in the first-class positive-example texts and low in the first-class negative-example texts", as judged by a preset first criterion.

The fourth text mining device of the present invention is the first, second, or third text mining device, in which the attribute value conditions that serve as new features effective for distinguishing the second-class positive examples from the second-class negative examples are combinations of "attribute values whose frequency of occurrence is high as attribute values of the second-class positive examples and low as attribute values of the second-class negative examples", as judged by a preset second criterion.
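One way to picture such a second criterion is a pair of thresholds applied to combinations of attribute values. The sketch below is only an assumed reading: the publication does not fix the criterion here, so the `combination_conditions` function, its `min_pos` and `max_neg` thresholds, the fixed combination size, and the sample data are all hypothetical.

```python
from itertools import combinations


def combination_conditions(positives, negatives, size=2, min_pos=0.5, max_neg=0.2):
    """Attribute-value combinations frequent among positives and rare among negatives."""
    def share(group, condition):
        if not group:
            return 0.0
        hits = sum(1 for attrs in group if all(attrs.get(k) == v for k, v in condition))
        return hits / len(group)

    # Candidate combinations are drawn from the positive examples only.
    candidates = set()
    for attrs in positives:
        for combo in combinations(sorted(attrs.items()), size):
            candidates.add(combo)

    selected = []
    for condition in sorted(candidates):
        p, n = share(positives, condition), share(negatives, condition)
        if p >= min_pos and n <= max_neg:
            selected.append((dict(condition), p, n))
    return selected


# Attribute values of second-class positive and negative examples (illustrative only).
positives = [
    {"month": "2005-10", "type": "repair request", "model": "PC-100"},
    {"month": "2005-11", "type": "repair request", "model": "PC-100"},
    {"month": "2005-10", "type": "repair request", "model": "PC-100"},
]
negatives = [
    {"month": "2005-10", "type": "repair request", "model": "PC-200"},
    {"month": "2005-10", "type": "question", "model": "PC-100"},
    {"month": "2005-11", "type": "repair request", "model": "PC-200"},
    {"month": "2005-11", "type": "question", "model": "PC-100"},
]
selected = combination_conditions(positives, negatives)
```

Enumerating every combination grows quickly with the number of attributes, so a practical system would likely prune candidates, for instance with a decision-tree or association-rule method of the kind the background section mentions.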

The fifth text mining device of the present invention has a data processing device that receives attribute value conditions, which are the conditions for first-class positive examples and first-class negative examples designated by a user; performs text mining based on the attribute value condition for the first-class positive examples; extracts, as features, the portions effective for classifying the first-class positive examples; has the user select a feature of interest from among the features; divides the texts corresponding to the first-class positive examples and the first-class negative examples into second-class positive-example texts, in which the selected feature appears, and second-class negative-example texts, in which the selected feature does not appear; and generates attribute value conditions that serve as new features effective for distinguishing the second-class positive examples from the second-class negative examples.

The sixth text mining device of the present invention has a data processing device that performs text mining based on an attribute value condition, which is the condition for first-class positive examples designated by a user; treats the texts that satisfy the attribute value condition as first-class positive examples and the remaining texts as first-class negative examples, and extracts, as features, the portions effective for distinguishing them; has the user select a feature of interest from among the features; divides the texts corresponding to the first-class positive examples and the first-class negative examples into second-class positive-example texts, in which the selected feature appears, and second-class negative-example texts, in which the selected feature does not appear; and generates attribute value conditions that serve as new features effective for distinguishing the second-class positive examples from the second-class negative examples.

The seventh text mining device of the present invention has a data processing device that extracts, as features, the elements that occur frequently across all of the stored texts; has the user select a feature of interest from among the features; divides the texts into positive-example texts, in which the selected feature appears, and negative-example texts, in which the selected feature does not appear; and generates attribute value conditions that serve as new features effective for distinguishing the positive examples from the negative examples.
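The first step of this variant needs no positive or negative examples at all, only a count over the whole collection. A minimal sketch, assuming that an "element" is a whitespace-separated word and that frequency means the number of texts containing it (the `frequent_words` helper, `top_n`, and the sample texts are hypothetical):

```python
from collections import Counter


def frequent_words(texts, top_n=3):
    """Words ranked by the number of texts they appear in (ties broken alphabetically)."""
    counts = Counter(word for text in texts for word in set(text.split()))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Illustrative texts only.
texts = ["hard disk clicking", "hard disk failed", "screen cracked", "disk full"]
top = frequent_words(texts, top_n=2)
```

The ranked words would then be shown to the user, and the feature the user picks drives the positive and negative split described above.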

The first text mining method of the present invention includes the following steps performed by a text mining device: a step of performing text mining based on attribute value conditions, which are the conditions for first-class positive examples and first-class negative examples designated by a user, and extracting, as features, the portions effective for distinguishing the first-class positive examples from the first-class negative examples; a step of having the user select a feature of interest from among the extracted features; a step of dividing the texts corresponding to the first-class positive examples and the first-class negative examples into second-class positive-example texts, in which the selected feature appears, and second-class negative-example texts, in which the selected feature does not appear; and a step of generating attribute value conditions that serve as new features effective for distinguishing the second-class positive examples from the second-class negative examples.

The second text mining method of the present invention is a text mining method for a text mining device that includes a data processing device and a storage device storing a plurality of texts and the attribute values of each text. In this method the data processing device performs: a step of reading the texts and the attribute values of each text from the storage device; a step of performing text mining by applying attribute value conditions, which are the conditions for first-class positive examples and first-class negative examples designated by a user, to the texts and their attribute values, extracting, as features, the portions effective for distinguishing the first-class positive examples from the first-class negative examples, and storing them in the storage device as a mining result; a step of having the user select a feature of interest from among the extracted features; and a step of dividing the texts corresponding to the first-class positive examples and the first-class negative examples into second-class positive-example texts, in which the selected feature appears, and second-class negative-example texts, in which the selected feature does not appear, generating attribute value conditions that serve as new features effective for distinguishing the second-class positive examples from the second-class negative examples, and outputting them to an output device.

The third text mining method of the present invention is the first or second text mining method, in which the portions effective for distinguishing the first-class positive examples from the first-class negative examples are "words and phrases whose frequency of occurrence is high in the first-class positive-example texts and low in the first-class negative-example texts", as judged by a preset first criterion.

The fourth text mining method of the present invention is the first, second, or third text mining method, in which the attribute value conditions that serve as new features effective for distinguishing the second-class positive examples from the second-class negative examples are combinations of "attribute values whose frequency of occurrence is high as attribute values of the second-class positive examples and low as attribute values of the second-class negative examples", as judged by a preset second criterion.

The fifth text mining method of the present invention includes the following steps performed by a text mining device: a step of receiving attribute value conditions, which are the conditions for first-class positive examples and first-class negative examples designated by a user; a step of performing text mining based on the attribute value condition for the first-class positive examples and extracting, as features, the portions effective for classifying the first-class positive examples; a step of having the user select a feature of interest from among the features; a step of dividing the texts corresponding to the first-class positive examples and the first-class negative examples into second-class positive-example texts, in which the selected feature appears, and second-class negative-example texts, in which the selected feature does not appear; and a step of generating attribute value conditions that serve as new features effective for distinguishing the second-class positive examples from the second-class negative examples.

The sixth text mining method of the present invention includes the following steps performed by a text mining device: a step of performing text mining based on an attribute value condition, which is the condition for first-class positive examples designated by a user, treating the texts that satisfy the attribute value condition as first-class positive examples and the remaining texts as first-class negative examples, and extracting, as features, the portions effective for distinguishing them; a step of having the user select a feature of interest from among the features; a step of dividing the texts corresponding to the first-class positive examples and the first-class negative examples into second-class positive-example texts, in which the selected feature appears, and second-class negative-example texts, in which the selected feature does not appear; and a step of generating attribute value conditions that serve as new features effective for distinguishing the second-class positive examples from the second-class negative examples.

The seventh text mining method of the present invention includes the following steps performed by a text mining device: a step of extracting, as features, the elements that occur frequently across all of the stored texts; a step of having the user select a feature of interest from among the features; and a step of dividing the texts into positive-example texts, in which the selected feature appears, and negative-example texts, in which the selected feature does not appear, and generating attribute value conditions that serve as new features effective for distinguishing the positive examples from the negative examples.

The first text mining program of the present invention causes a text mining device to execute: a step of performing text mining based on attribute value conditions, which are the conditions for first-class positive examples and first-class negative examples designated by a user, and extracting, as features, the portions effective for distinguishing the first-class positive examples from the first-class negative examples; a step of having the user select a feature of interest from among the extracted features; a step of dividing the texts corresponding to the first-class positive examples and the first-class negative examples into second-class positive-example texts, in which the selected feature appears, and second-class negative-example texts, in which the selected feature does not appear; and a step of generating attribute value conditions that serve as new features effective for distinguishing the second-class positive examples from the second-class negative examples.

The second text mining program of the present invention is a text mining program for a text mining device that includes a data processing device and a storage device storing a plurality of texts and the attribute values of each text. The program causes the data processing device to execute: a step of reading the texts and the attribute values of each text from the storage device; a step of performing text mining by applying attribute value conditions, which are the conditions for first-class positive examples and first-class negative examples designated by a user, to the texts and their attribute values, extracting, as features, the portions effective for distinguishing the first-class positive examples from the first-class negative examples, and storing them in the storage device as a mining result; a step of having the user select a feature of interest from among the extracted features; and a step of dividing the texts corresponding to the first-class positive examples and the first-class negative examples into second-class positive-example texts, in which the selected feature appears, and second-class negative-example texts, in which the selected feature does not appear, generating attribute value conditions that serve as new features effective for distinguishing the second-class positive examples from the second-class negative examples, and outputting them to an output device.

The third text mining program of the present invention is the first or second text mining program, in which the portions effective for distinguishing the first-class positive examples from the first-class negative examples are "words and phrases whose frequency of occurrence is high in the first-class positive-example texts and low in the first-class negative-example texts", as judged by a preset first criterion.

The fourth text mining program of the present invention is the first, second, or third text mining program, in which the attribute value conditions that serve as new features effective for distinguishing the second-class positive examples from the second-class negative examples are combinations of "attribute values whose frequency of occurrence is high as attribute values of the second-class positive examples and low as attribute values of the second-class negative examples", as judged by a preset second criterion.

The fifth text mining program of the present invention causes a text mining device to execute: a step of receiving attribute value conditions, which are the conditions for first-class positive examples and first-class negative examples designated by a user; a step of performing text mining based on the attribute value condition for the first-class positive examples and extracting, as features, the portions effective for classifying the first-class positive examples; a step of having the user select a feature of interest from among the features; a step of dividing the texts corresponding to the first-class positive examples and the first-class negative examples into second-class positive-example texts, in which the selected feature appears, and second-class negative-example texts, in which the selected feature does not appear; and a step of generating attribute value conditions that serve as new features effective for distinguishing the second-class positive examples from the second-class negative examples.

A sixth text mining program of the present invention causes a text mining device to execute: a procedure of performing text mining based on an attribute value condition specified by a user as the condition for first-type positive examples, treating texts that match the attribute value condition as first-type positive examples and the remaining texts as first-type negative examples, and extracting, as features, portions effective for classifying them; a procedure of having the user select a feature of interest from among the features; a procedure of dividing the texts corresponding to the first-type positive examples and the first-type negative examples into second-type positive example texts, in which the selected feature appears, and second-type negative example texts, in which the selected feature does not appear; and a procedure of generating an attribute value condition serving as a new feature effective for classifying the second-type positive examples and the second-type negative examples.

A seventh text mining program of the present invention causes a text mining device to execute: a procedure of extracting, as features, elements that appear frequently across all stored texts; a procedure of having the user select a feature of interest from among the features; and a procedure of dividing the texts into positive example texts, in which the selected feature appears, and negative example texts, in which the selected feature does not appear, and generating an attribute value condition serving as a new feature effective for classifying the positive examples and the negative examples.

本発明の第８のテキストマイニング装置は、属性付きのテキストの集合から特徴を抽出して出力するテキストマイニング装置であって、前記特徴の中から着目すべき特徴を入力する分析対象特徴指定手段と、前記入力された特徴がテキスト中に出現するかどうかによって、テキスト中から正例テキストと負例テキストとを抽出する正例負例テキスト抽出手段と、該正例テキストと該負例テキストとを分類するのに有効な属性的な特徴を抽出する属性特徴抽出手段とを有する。 An eighth text mining device of the present invention is a text mining device that extracts and outputs features from a set of texts with attributes, and includes analysis target feature designating means for inputting features to be noted from the features. A positive example negative example text extracting means for extracting a positive example text and a negative example text from the text depending on whether or not the inputted feature appears in the text, and the positive example text and the negative example text, Attribute feature extraction means for extracting attribute features effective for classification.

本発明の第９のテキストマイニング装置は、テキストの集合を保持するテキスト記憶手段と、前記テキストに対する属性値を保持する属性記憶手段と、テキストマイニングの条件を入力する条件指定手段と、前記条件に従ってテキストの特徴を抽出するテキストマイニング手段と、前記特徴の中から着目すべき特徴を入力する分析対象特徴指定手段と、前記入力された特徴がテキスト中に出現するかどうかによって、テキスト中から正例テキストと負例テキストとを抽出する正例負例テキスト抽出手段と、該正例テキストと該負例テキストとを分類するのに有効な属性的な特徴を抽出する属性特徴抽出手段と、を有する。 According to a ninth text mining device of the present invention, a text storage means for holding a set of text, an attribute storage means for holding an attribute value for the text, a condition designating means for inputting a text mining condition, and according to the condition Text mining means for extracting text features, analysis target feature designating means for inputting features to be noticed from among the features, and whether or not the inputted features appear in the text, positive examples from the text Positive example negative example text extracting means for extracting text and negative example text, and attribute feature extracting means for extracting attribute features effective for classifying the positive example text and the negative example text .

An eighth text mining method of the present invention is a text mining method in which a computer extracts and outputs features from a set of texts with attributes, the method including: a step in which the computer inputs a feature of interest from among the features; a step in which the computer extracts positive example texts and negative example texts from the texts according to whether the input feature appears in each text; and a step in which the computer extracts attribute features effective for classifying the positive example texts and the negative example texts.

本発明の第９のテキストマイニング方法は、コンピュータにテキストの集合と、前記テキストに対する属性値とを記憶させ、前記コンピュータに、テキストマイニングの条件を入力するステップと、前記条件に従ってテキストの特徴を抽出するステップと、前記特徴の中から着目すべき特徴を入力するステップと、前記入力された特徴がテキスト中に出現するかどうかによって、テキスト中から正例テキストと負例テキストとを抽出するステップと、該正例テキストと該負例テキストとを分類するのに有効な属性的な特徴を抽出するステップとを含む。 According to a ninth text mining method of the present invention, a computer stores a set of texts and attribute values for the text, the text mining conditions are input to the computer, and text features are extracted according to the conditions. A step of inputting a feature to be noticed from among the features, and a step of extracting positive example text and negative example text from the text depending on whether or not the inputted feature appears in the text; And extracting an attribute characteristic effective for classifying the positive example text and the negative example text.

An eighth text mining program of the present invention is a text mining program that causes a computer to execute processing for extracting and outputting features from a set of texts with attributes, the program causing the computer to execute: analysis target feature specifying processing for inputting a feature of interest from among the features; positive/negative example text extraction processing for extracting positive example texts and negative example texts from the texts according to whether the input feature appears in each text; and attribute feature extraction processing for extracting attribute features effective for classifying the positive example texts and the negative example texts.

A ninth text mining program of the present invention causes a computer to execute: processing for storing a set of texts and attribute values for the texts in a storage device; condition specifying processing for inputting text mining conditions; text mining processing for extracting text features according to the conditions; analysis target feature specifying processing for inputting a feature of interest from among the features; positive/negative example text extraction processing for extracting positive example texts and negative example texts from the texts according to whether the input feature appears in each text; and attribute feature extraction processing for extracting attribute features effective for classifying the positive example texts and the negative example texts.

The effect of the present invention is that the user can learn attribute values (or combinations of attribute values) that are effective for a new text classification which is based on a text feature selected by the user and which the user has not explicitly specified.

The reason is as follows. Among the text features extracted by text mining based on the text attribute values specified by the user, the feature selected by the user defines the positive examples (texts in which it appears) and the negative examples (texts in which it does not appear). Data mining is then performed on these examples, and the attribute values or combinations of attribute values effective for classifying the positive and negative examples are extracted and output.

First, an outline of the present invention will be described. The text mining device of the present invention performs text mining based on the attribute value conditions specified by the user as the conditions for first-type positive examples and first-type negative examples, extracts as features the portions effective for classifying the first-type positive examples and the first-type negative examples, and has the user select a feature of interest from among the features.

Next, the text mining device divides the texts corresponding to the first-type positive examples and the first-type negative examples into second-type positive example texts, in which the selected feature appears, and second-type negative example texts, in which the selected feature does not appear, and generates an attribute value condition serving as a new feature effective for classifying the second-type positive examples and the second-type negative examples.

ここで、「正例と負例とを分類するのに有効な部分」とは、たとえば、「正例のテキストでの出現頻度が高く、負例のテキストでの出現頻度が低い語句」である。すなわち、「正例のテキストには出現し、負例のテキストには、出現しない語句」に限定されるものではない。また、たとえば、出現頻度が高い、出現頻度が低いは、事前に設定されたそれぞれの「閾値」等との比較により決定することが可能である。また、たとえば、正例のテキストに出現する頻度と、負例のテキストに出現する頻度との比から決定することも可能である。このように、出現頻度の高低は、ある事前に設定された基準に基づいて決定されればよい。また、分類は、出現頻度以外の種々の尺度に基づくことが可能である。以降、「分類」を以上のような意味で使用する。 Here, the “effective portion for classifying positive examples and negative examples” is, for example, “a phrase having a high appearance frequency in the positive example text and a low appearance frequency in the negative example text”. . That is, the phrase is not limited to “a phrase that appears in positive text and does not appear in negative text”. Further, for example, whether the appearance frequency is high or the appearance frequency is low can be determined by comparison with respective “threshold values” set in advance. Further, for example, it is possible to determine from the ratio of the frequency of appearing in the positive example text to the frequency of appearing in the negative example text. As described above, the appearance frequency may be determined based on a predetermined criterion. The classification can be based on various scales other than the appearance frequency. Hereinafter, “classification” is used in the above meaning.
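The frequency-based criterion described above can be sketched as follows. This is only an illustration under assumptions that are not in the source: texts are tokenized on whitespace (Japanese text would need morphological analysis), frequency is measured as the share of texts containing a word, and the function names and the thresholds `min_pos_rate` and `max_neg_rate` are hypothetical.

```python
from collections import Counter

def document_frequency(texts):
    """Count, for each word, the number of texts that contain it."""
    counts = Counter()
    for text in texts:
        counts.update(set(text.split()))
    return counts

def effective_features(positive, negative, min_pos_rate=0.5, max_neg_rate=0.3):
    """Return words that are frequent in positive texts and infrequent in negative texts.

    The two thresholds are illustrative stand-ins for the preset criterion.
    """
    pos_df = document_frequency(positive)
    neg_df = document_frequency(negative)
    selected = []
    for word, count in pos_df.items():
        pos_rate = count / len(positive)
        neg_rate = neg_df[word] / len(negative) if negative else 0.0
        if pos_rate >= min_pos_rate and neg_rate <= max_neg_rate:
            selected.append(word)
    return sorted(selected)

# Invented sample texts, for illustration only.
positive = ["screen is cracked", "screen went dark", "battery drains fast"]
negative = ["how to set alarm", "battery life question",
            "screen saver setup", "ringtone volume question"]
features = effective_features(positive, negative)
print(features)
```

In this sample the word "screen" is selected even though it occurs in one negative text, which matches the remark that the features are not limited to words absent from the negative examples.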

次に、本発明の第１の実施の形態について図面を参照して詳細に説明する。図１は、本発明の第１の実施の形態の構成を示すブロック図である。図１を参照すると、本発明の第１実施の形態のテキストマイニング装置は、キーボード、マウス等の入力装置１０と、情報を記憶するハードディスク等の記憶装置２１と、プログラム制御により動作するデータ処理装置３１と、ディスプレイ装置等の出力装置４０とから構成される。 Next, a first embodiment of the present invention will be described in detail with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the first exemplary embodiment of the present invention. Referring to FIG. 1, a text mining device according to a first embodiment of the present invention includes an input device 10 such as a keyboard and a mouse, a storage device 21 such as a hard disk for storing information, and a data processing device that operates under program control. 31 and an output device 40 such as a display device.

記憶装置２１は、属性記憶部２０１と、テキスト記憶部２０２と、マイニング結果保持部２０３とを含む。属性記憶部２０１は、テキスト記憶部２０２に記憶される各テキストに対応付けて、そのテキストに付与された属性値の情報を記憶する。テキスト記憶部２０２は、テキストマイニングの対象となるテキストを記憶する。 The storage device 21 includes an attribute storage unit 201, a text storage unit 202, and a mining result holding unit 203. The attribute storage unit 201 stores information on attribute values assigned to the text in association with each text stored in the text storage unit 202. The text storage unit 202 stores text that is a target of text mining.

FIG. 2 shows an example of the text storage unit 202, and FIG. 3 shows an example of the attribute storage unit 201. In this example, each text is assigned a unique text number and stored in the text storage unit 202, and the attribute storage unit 201 stores, for each text number, the values of four attributes: "inquiry type", "model name", "reception year and month", and "person in charge".
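One possible in-memory layout for the two storage units is sketched below. The contents are invented for illustration, since the rows of FIG. 2 and FIG. 3 are not reproduced in this text; the dictionary keys, the helper names, and all values are assumptions.

```python
# Text storage: text number -> text (hypothetical contents).
texts = {
    1: "The screen went dark and I would like it repaired.",
    2: "How do I change the ringtone?",
    3: "The battery drains quickly, please repair it.",
}

# Attribute storage: text number -> values of the four attributes (hypothetical).
attributes = {
    1: {"inquiry_type": "repair request", "model_name": "A100",
        "reception_month": "2005-10", "person_in_charge": "staff_a"},
    2: {"inquiry_type": "usage question", "model_name": "A100",
        "reception_month": "2005-10", "person_in_charge": "staff_b"},
    3: {"inquiry_type": "repair request", "model_name": "B200",
        "reception_month": "2005-09", "person_in_charge": "staff_a"},
}

def matches(attrs, condition):
    """True if every attribute named in the condition has the required value."""
    return all(attrs.get(name) == value for name, value in condition.items())

def select_text_numbers(condition):
    """Return the numbers of the texts whose attributes satisfy the condition."""
    return sorted(n for n, attrs in attributes.items() if matches(attrs, condition))

repair_in_october = select_text_numbers(
    {"inquiry_type": "repair request", "reception_month": "2005-10"})
print(repair_in_october)
```

Keying both dictionaries by the text number keeps the text and its attributes linked while still allowing them to be stored separately.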

なお、属性記憶部２０１とテキスト記憶部２０２とは、完全に分離する必要はなく、テキストとそのテキストに対する属性とを同時に記憶するように構成してもよい。マイニング結果保持部２０３は、テキスト記憶部２０２に記憶されているテキストに対して、テキストマイニングを行った結果得られる特徴を記憶する。 Note that the attribute storage unit 201 and the text storage unit 202 need not be completely separated, and may be configured to store the text and the attribute for the text at the same time. The mining result holding unit 203 stores features obtained as a result of text mining performed on the text stored in the text storage unit 202.

The data processing device 31 includes attribute value condition specifying means 301, text mining means 302, analysis target feature specifying means 303, positive/negative example text extracting means 304, and attribute feature extracting means 305. The attribute value condition specifying means 301 reads, through the input device 10, the attribute value condition for positive examples and the attribute value condition for negative examples (both of the first type described above) specified by the user.

The text mining means 302 applies text mining to the texts stored in the text storage unit 202, treating texts that match the positive example attribute value condition read by the attribute value condition specifying means 301 as positive example texts and texts that match the negative example attribute value condition as negative example texts. The text mining means 302 thereby extracts, as features of the positive example texts, features effective for distinguishing the positive examples from the negative examples, and outputs them to the user through the output device 40. It also stores the extracted features in the mining result holding unit 203.

テキストマイニングでは、一般に、単語、複数の単語からなる集合、フレーズ、文等、テキストの一部を構成する要素を特徴として抽出する。すなわち、テキストマイニングでは、これらの要素のうち、たとえば、負例のテキストにはあまり出現せず、正例のテキストに偏って出現するものを、正例のテキストの特徴として抽出する。このテキストマイニングには、非特許文献１記載の技術が部分的に適用可能である。 In text mining, in general, elements constituting a part of text such as a word, a set of a plurality of words, a phrase, a sentence, and the like are extracted as features. That is, in the text mining, for example, those elements that do not appear so much in the negative example text and appear biased in the positive example text are extracted as features of the positive example text. The technology described in Non-Patent Document 1 can be partially applied to this text mining.

There are also text mining techniques that analyze the structure of a text, convert the text into structured data representing the analysis result, and then extract partial structures of the structured data as features. Examples include a technique that analyzes the dependency relations between words in advance and extracts pairs of words in a dependency relation as features, and a technique that converts a text into a dependency tree by dependency structure analysis and extracts its subtrees as features. When such a technique is used, a partial structure is regarded as appearing in a text if it is contained in the structured data obtained from that text.

The text mining means 302 outputs the features obtained by text mining to the user through the output device 40 and stores them in the mining result holding unit 203. In addition to the extracted features, the information output to the user through the output device 40 may include supplementary information such as the number of texts in which each feature appears and how strongly each feature is skewed toward the positive example texts.

分析対象特徴指定手段３０３は、テキストマイニング手段３０２によって出力された特徴のうち、着目すべき特徴を利用者に指定させ、その指定内容を入力装置１０を通して読み取る。 The analysis target feature designating unit 303 causes the user to designate a feature to be noted among the features output by the text mining unit 302 and reads the designated content through the input device 10.

The positive/negative example text extracting means 304 processes those texts stored in the text storage unit 202 that were subject to processing by the text mining means 302, that is, the texts that match either the positive example or the negative example attribute value condition read by the attribute value condition specifying means 301. For each such text, it determines whether the feature read by the analysis target feature specifying means 303 appears, and extracts the texts in which the feature appears as positive examples and the texts in which it does not appear as negative examples (both of the second type described above).

なお、正例負例テキスト抽出手段３０４による正例と負例の判別を高速化するために、テキストマイニング手段３０２が、各特徴がどのテキストに出現するかを示すインデックスを作成して記録しておき、正例負例テキスト抽出手段３０４が、そのインデックスを参照して正例と負例との判別を行うようにしてもよい。 In order to speed up the discrimination between the positive example and the negative example by the positive example / negative example text extraction unit 304, the text mining unit 302 creates and records an index indicating in which text each feature appears. Alternatively, the positive example negative example text extraction means 304 may determine the positive example and the negative example with reference to the index.
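The index mentioned above can be sketched as a mapping from each feature to the set of text numbers in which it appears. This is a minimal illustration, assuming single-word features and whitespace tokenization; the function names are hypothetical.

```python
from collections import defaultdict

def build_feature_index(texts, features):
    """Map each feature to the set of text numbers in which it appears."""
    index = defaultdict(set)
    for number, text in texts.items():
        words = set(text.split())
        for feature in features:
            if feature in words:
                index[feature].add(number)
    return index

def split_by_feature(index, candidates, feature):
    """Split candidate text numbers into positive and negative sets using the index."""
    positive = candidates & index.get(feature, set())
    negative = candidates - positive
    return positive, negative

# Invented sample texts, for illustration only.
texts = {1: "screen is cracked", 2: "battery drains fast", 3: "screen went dark"}
index = build_feature_index(texts, ["screen", "battery"])
positive, negative = split_by_feature(index, {1, 2, 3}, "screen")
print(positive, negative)
```

Because the index is built once during text mining, the later positive/negative split reduces to set operations and does not rescan the texts.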

When the analysis target feature specifying means 303 has the user specify a feature, it may allow only one feature to be specified, or it may allow several. When several features are specified, the positive/negative example text extracting means 304 may treat as positive examples the texts in which any one of the features appears, or only the texts in which all of the features appear.

Further, when the positive/negative example text extracting means 304 distinguishes positive examples from negative examples, it may judge as positive examples only the texts in which the feature read by the analysis target feature specifying means 303 appears at least a threshold number of times.
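The variations described above (any feature versus all features, and a minimum number of occurrences) can be combined in one predicate, sketched below. The function name and the parameters `mode` and `min_count` are hypothetical, and whitespace tokenization is assumed.

```python
def is_positive(text, features, mode="any", min_count=1):
    """Decide whether a text is a positive example for the selected features.

    mode="any": at least one feature occurs min_count times or more.
    mode="all": every feature occurs min_count times or more.
    """
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    words = text.split()
    hits = [words.count(feature) >= min_count for feature in features]
    return all(hits) if mode == "all" else any(hits)

# Invented sample text, for illustration only.
text = "screen flickers and screen goes dark"
any_match = is_positive(text, ["screen", "battery"], mode="any")
all_match = is_positive(text, ["screen", "battery"], mode="all")
twice = is_positive(text, ["screen"], min_count=2)
print(any_match, all_match, twice)
```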

属性特徴抽出手段３０５は、正例負例テキスト抽出手段３０４によって抽出された正例および負例のテキストを対象として、データマイニングを適用し、正例のテキストと負例のテキストとを分類するのに有効な特徴的な属性値または属性値の組み合わせを抽出して、出力装置４０を通して利用者に出力する。 The attribute feature extraction unit 305 applies data mining to the positive example text and negative example text extracted by the positive example negative example text extraction unit 304 and classifies the positive example text and the negative example text. A characteristic attribute value or a combination of attribute values that is effective for the above is extracted and output to the user through the output device 40.

本発明の第１の実施の形態において、属性特徴抽出手段３０５が適用するデータマイニング手法は、特定の方法に限定されない。 In the first embodiment of the present invention, the data mining technique applied by the attribute feature extraction unit 305 is not limited to a specific method.

For example, decision tree analysis can be used as the data mining technique for extracting attribute values or combinations of attribute values characteristic of the positive example texts. That is, a decision tree is obtained whose branch conditions are combinations of attribute values for classifying positive example texts and negative example texts, and the combination of attribute values along each path that leads to a positive example in the tree is extracted as a combination of attribute values specific to the positive example texts. The decision tree can be obtained, for example, by the technique described in Non-Patent Document 2.
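The path extraction described above can be sketched with a small ID3-style tree over categorical attributes. This is not the technique of Non-Patent Document 2, whose details are not given here; the entropy-based split, the function names, and the sample rows are assumptions. Only paths ending in leaves that contain positive examples exclusively are reported.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of boolean labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def best_attribute(rows, labels, names):
    """Pick the attribute whose split leaves the lowest weighted entropy."""
    def remainder(name):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[name], []).append(label)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return min(names, key=remainder)

def positive_paths(rows, labels, names, path=()):
    """Grow the tree recursively and return attribute-value paths to pure positive leaves."""
    if all(labels):
        return [dict(path)]
    if not any(labels) or not names:
        return []  # pure negative leaf, or mixed leaf with no attribute left
    name = best_attribute(rows, labels, names)
    rest = [n for n in names if n != name]
    paths = []
    for value in sorted({row[name] for row in rows}):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[name] == value]
        sub_rows = [r for r, _ in pairs]
        sub_labels = [l for _, l in pairs]
        paths += positive_paths(sub_rows, sub_labels, rest, path + ((name, value),))
    return paths

# Invented attribute rows; True marks a positive example text.
rows = [
    {"model": "A", "month": "2005-10"},
    {"model": "A", "month": "2005-10"},
    {"model": "A", "month": "2005-09"},
    {"model": "B", "month": "2005-10"},
    {"model": "B", "month": "2005-09"},
]
labels = [True, True, False, False, False]
paths = positive_paths(rows, labels, ["model", "month"])
print(paths)
```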

Similarly, a correlation (association rule) analysis technique can be used as the data mining technique for extracting attribute values or combinations of attribute values characteristic of the positive example texts. Let Tp denote the set of positive example texts, T(V) the set of texts satisfying a condition V given by a combination of attribute values, and N(X) the number of texts in a text set X. When the confidence C(V) = N(Tp ∩ T(V)) / N(T(V)) is higher than a predetermined threshold Cth and the support S(V) = N(Tp ∩ T(V)) is higher than a predetermined threshold Sth, the combination of attribute values represented by V is extracted as a combination of attribute values specific to the positive example texts. Because this corresponds to extracting association rules that satisfy a minimum support and a minimum confidence, it can be realized, for example, by the technique described in Non-Patent Document 3.
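The confidence and support test can be sketched by enumerating candidate conditions V directly. This brute-force enumeration is an assumption made for brevity and is not the technique of Non-Patent Document 3; the function name, the `max_size` limit, and the sample data are hypothetical.

```python
from itertools import combinations

def characteristic_conditions(attributes, positive_ids, c_th, s_th, max_size=2):
    """Return (V, C(V), S(V)) for each condition V with C(V) > c_th and S(V) > s_th.

    attributes: text number -> {attribute name: value}
    positive_ids: set of text numbers forming Tp
    """
    # Candidate conditions: combinations of attribute-value pairs seen in positive texts.
    candidates = set()
    for number in positive_ids:
        items = sorted(attributes[number].items())
        for size in range(1, max_size + 1):
            candidates.update(combinations(items, size))
    results = []
    for condition in sorted(candidates):
        # T(V): texts whose attributes satisfy every pair in the condition.
        t_v = {n for n, attrs in attributes.items()
               if all(attrs.get(name) == value for name, value in condition)}
        support = len(positive_ids & t_v)       # S(V) = N(Tp ∩ T(V))
        confidence = support / len(t_v)         # C(V) = N(Tp ∩ T(V)) / N(T(V))
        if confidence > c_th and support > s_th:
            results.append((dict(condition), confidence, support))
    return results

# Invented attribute data, for illustration only.
attributes = {
    1: {"model": "A", "month": "2005-10"},
    2: {"model": "A", "month": "2005-10"},
    3: {"model": "A", "month": "2005-09"},
    4: {"model": "B", "month": "2005-10"},
}
rules = characteristic_conditions(attributes, {1, 2}, c_th=0.8, s_th=1)
print(rules)
```

In this sample neither attribute value alone reaches the confidence threshold (each has C(V) = 2/3), but their combination does, which illustrates why combinations of attribute values are considered.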

このほか、正例のテキストに特徴的な属性値または属性値の組み合わせを抽出することができる手法であれば、任意のデータマイニング手法を用いることができる。 In addition, any data mining technique can be used as long as it is a technique that can extract attribute values or combinations of attribute values that are characteristic of positive text.

次に、図１および図４を参照して本発明の実施の第１の形態の動作について詳細に説明する。図４は、本発明の実施の第１の形態の動作を示すフローチャートである。 Next, the operation of the first embodiment of the present invention will be described in detail with reference to FIG. 1 and FIG. FIG. 4 is a flowchart showing the operation of the first exemplary embodiment of the present invention.

まず、属性値条件指定手段３０１が、利用者が正例および負例の条件として指定する属性値条件を、入力装置１０を介して読み取る（図４ステップＡ１）。 First, the attribute value condition designating unit 301 reads the attribute value condition designated by the user as the positive and negative example conditions via the input device 10 (step A1 in FIG. 4).

Next, the text mining means 302 performs text mining on the texts stored in the text storage unit 202, treating texts that match the positive example attribute value condition read by the attribute value condition specifying means 301 as positive example texts and texts that match the negative example attribute value condition as negative example texts, and extracts features effective for classifying the positive example texts and the negative example texts (step A2).

テキストマイニング手段３０２は、抽出された特徴をマイニング結果保持部２０３に格納し、抽出された特徴をマイニング結果保持部２０３から読み出して出力装置４０を通して利用者に出力する（ステップＡ３）。次に、分析対象特徴指定手段３０３が、入力装置１０を介して利用者による特徴の選択を読み取る（ステップＡ４）。 The text mining means 302 stores the extracted features in the mining result holding unit 203, reads the extracted features from the mining result holding unit 203, and outputs them to the user through the output device 40 (step A3). Next, the analysis target feature designating unit 303 reads the feature selection by the user via the input device 10 (step A4).

正例負例テキスト抽出手段３０４は、テキスト記憶部２０２に記憶されているテキストを１つずつ読み出し（ステップＡ５）、そのテキストが、属性値条件指定手段３０１が読み取った正例または負例の属性値条件のいずれかに適合するかどうかを判定する（ステップＡ６）。適合する場合には（ステップＡ６／Ｙｅｓ）、正例負例テキスト抽出手段３０４は、そのテキストにステップＡ４で利用者により選択された特徴が出現するかどうかを判定する（ステップＡ７）。読み出したテキストに特徴が出現する場合には（ステップＡ７／Ｙｅｓ）、正例負例テキスト抽出手段３０４は、そのテキストを正例とし（ステップＡ８）、特徴が出現しない場合には（ステップＡ７／Ｎｏ）、そのテキストを負例とする（ステップＡ９）。正例負例テキスト抽出手段３０４は、すべてのテキストを処理し終えるまで、ステップＡ５−Ａ９の処理をくり返す（ステップＡ１０）。 The positive example negative example text extraction unit 304 reads the text stored in the text storage unit 202 one by one (step A5), and the text is a positive example or negative example attribute read by the attribute value condition specifying unit 301. It is determined whether or not any of the value conditions is met (step A6). If they match (step A6 / Yes), the positive / negative example text extraction unit 304 determines whether or not the feature selected by the user in step A4 appears in the text (step A7). When a feature appears in the read text (step A7 / Yes), the positive example negative example text extraction unit 304 sets the text as a positive example (step A8), and when no feature appears (step A7 / No), the text is a negative example (step A9). The positive example negative example text extraction unit 304 repeats the processing of Steps A5-A9 until all the texts have been processed (Step A10).
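Steps A5 to A10 can be sketched as a single loop over the stored texts. The function name, the dictionary-based storage, and the whitespace tokenization are assumptions made for illustration; the step labels in the comments refer to FIG. 4.

```python
def extract_examples(texts, attributes, pos_condition, neg_condition, feature):
    """Split the texts covered by the first-type conditions into second-type examples."""
    def matches(attrs, condition):
        return all(attrs.get(name) == value for name, value in condition.items())

    positive, negative = [], []
    for number, text in texts.items():                        # step A5
        attrs = attributes[number]
        if not (matches(attrs, pos_condition)
                or matches(attrs, neg_condition)):            # step A6 / No
            continue
        if feature in text.split():                           # step A7
            positive.append(number)                           # step A8
        else:
            negative.append(number)                           # step A9
    return positive, negative                                 # loop ends at step A10

# Invented sample data, for illustration only.
texts = {1: "screen went dark", 2: "battery drains fast",
         3: "screen is cracked", 4: "screen flickers"}
attributes = {1: {"month": "2005-10"}, 2: {"month": "2005-10"},
              3: {"month": "2005-09"}, 4: {"month": "2005-08"}}
positive, negative = extract_examples(
    texts, attributes, {"month": "2005-10"}, {"month": "2005-09"}, "screen")
print(positive, negative)
```

Text 4 contains the feature but matches neither first-type condition, so it is excluded from both result lists.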

次に、属性特徴抽出手段３０５が、データマイニングにより、ステップＡ５−Ａ１０の処理によって抽出された正例のテキストと負例のテキストとを分類するのに有効な属性値または属性値の組み合わせを抽出する（ステップＡ１１）。次に、属性特徴抽出手段３０５は、抽出結果（属性値または属性値の組み合わせ）を出力装置４０を介して利用者に出力する（ステップＡ１２）。 Next, the attribute feature extraction unit 305 extracts, by data mining, an attribute value or a combination of attribute values effective for classifying the positive example text and the negative example text extracted by the processing in steps A5-A10. (Step A11). Next, the attribute feature extraction means 305 outputs the extraction result (attribute value or combination of attribute values) to the user via the output device 40 (step A12).

なお、本発明の第１の実施の形態では、属性値条件指定手段３０１が、利用者が指定する正例の属性値条件と負例の属性値条件とを読み取り、テキストマイニング手段３０２が、正例の属性値条件に適合するテキストを正例、負例の属性値条件に適合するテキストを負例としてテキストマイニングを行う。これとは異なり、属性値条件指定手段３０１が利用者から正例の属性値条件のみを受け取り、テキストマイニング手段３０２が、正例の属性値条件にあてはまらないテキストすべてを負例のテキストとして扱う構成も可能である。この場合、正例負例テキスト抽出手段３０４は、テキスト記憶部２０２に記憶されている全テキストを対象に正例のテキストと負例のテキストを抽出する。 In the first embodiment of the present invention, the attribute value condition specifying unit 301 reads the positive example attribute value condition and the negative example attribute value condition specified by the user, and the text mining unit 302 sets the positive value attribute value condition. Text mining is performed with text that meets the attribute value condition of the example as a positive example and text that meets the attribute value condition of a negative example as a negative example. Unlike this, the attribute value condition designating unit 301 receives only positive example attribute value conditions from the user, and the text mining unit 302 treats all texts that do not meet the positive example attribute value conditions as negative example texts. Is also possible. In this case, the positive example negative example text extraction unit 304 extracts the positive example text and the negative example text for all the texts stored in the text storage unit 202.

A configuration is also possible in which the attribute value condition specifying means 301 is omitted and the text mining means 302 extracts elements (words, sets of words, phrases, sentences, and the like) that appear frequently across all the texts stored in the text storage unit 202. In this case as well, the positive/negative example text extracting means 304 extracts the positive example texts and the negative example texts from all the texts stored in the text storage unit 202.

次に、本発明の第１の実施の形態の効果について説明する。 Next, effects of the first exemplary embodiment of the present invention will be described.

In the first embodiment of the present invention, among the text features extracted by text mining based on the text attribute values that are the conditions for the positive examples (first type) and the negative examples (first type), the feature selected by the user defines the second-type positive examples (texts in which it appears) and the second-type negative examples (texts in which it does not appear). Data mining is performed on these examples, and the attribute values or combinations of attribute values effective for classifying the second-type positive examples and the second-type negative examples are extracted and output.

That is, the first embodiment of the present invention presents to the user the attribute features specific to the texts in which the text feature selected by the user appears (not necessarily limited to texts in which all of the selected features appear).

Therefore, with the first embodiment of the present invention, the user can learn attribute values (or combinations of attribute values) that are effective for a new text classification (the classification into second-type positive examples and second-type negative examples) which is based on the text feature selected by the user and which the user has not explicitly specified.

Next, a second embodiment of the present invention will be described in detail. The configuration of the second embodiment, shown in FIG. 1, is the same as that of the first embodiment. In the second embodiment, of the attribute value conditions specified by the user as the conditions for first-type positive examples and first-type negative examples, text mining is performed based on the attribute value condition for the first-type positive examples; portions effective for distinguishing the first-type positive examples from the texts as a whole are extracted as features; and the user is asked to select a feature of interest from among those features.

本発明の第２の実施の形態は、テキストマイニング手段３０２が、第１種の正例にだけ基づくマイニングを行えばよいので、本発明の第１の実施の形態に比べて構成が簡単になるという効果を持つ。 The second embodiment of the present invention has a simpler configuration than the first embodiment of the present invention because the text mining means 302 only needs to perform mining based on the first positive example. Has the effect.

Next, a third embodiment of the present invention will be described in detail with reference to the drawings. FIG. 5 is a block diagram showing the configuration of the third embodiment. Referring to FIG. 5, the third embodiment includes an input device 10, a storage device 22, a data processing device 32 (for example, a computer), an output device 40, and a text mining program 50.

The text mining program 50 implements the functions of the attribute value condition specifying means 301, the text mining means 302, the analysis target feature specifying means 303, the positive/negative example text extracting means 304, and the attribute feature extracting means 305 of the first and second embodiments. The text mining program 50 is stored in the storage device 22 or in another storage means (not shown).

The text mining program 50 is read into the data processing device 32, executed there, and controls the operation of the data processing device 32. Under the control of the text mining program 50, the data processing device 32 executes the same processing as the data processing device 31 in the first and second embodiments.

Thus, the third embodiment of the present invention executes the processing of FIG. 4 through the cooperation of hardware and software, and therefore has the effect of being easy to implement.

Next, an example of the first embodiment of the present invention will be described in detail with reference to the drawings. The operation of the example is described for the case of extracting the characteristics of the inquiries received in October 2005, taking as the target the inquiries about repair requests among the response records of a contact center that accepts product inquiries.

As shown in FIG. 3, the attribute storage unit 201 stores, for each text, the attribute values of four attributes: "inquiry type", "model name", "reception date", and "person in charge". As shown in FIG. 2, the text storage unit 202 stores in advance the texts to be mined (the contents of the response records).

First, the attribute value condition specifying means 301 reads, through the input device 10, the attribute value conditions that the user specifies for the positive examples and the negative examples of text mining.

Here, the user specifies the condition ("inquiry type" = "repair request") AND ("reception date" = "October 2005") as the attribute value condition for the positive examples (first type), and the condition ("inquiry type" = "repair request") AND ("reception date" ≠ "October 2005") as the attribute value condition for the negative examples (first type).

Next, the text mining means 302 executes text mining on the texts stored in the text storage unit 202. Texts whose inquiry type is "repair request" and whose reception date is "October 2005" are treated as positive examples, and texts whose inquiry type is "repair request" and whose reception date is not "October 2005" are treated as negative examples. The text mining means 302 extracts the features that are effective for distinguishing the positive example texts from the negative example texts.

Among the texts T1 to T7 recorded in the text storage unit 202 of FIG. 2, T1, T5, and T7 are positive examples (first type) and T6 is a negative example (first type). Texts T2 to T4 satisfy neither the positive example nor the negative example attribute value condition and are therefore not used for text mining. The text mining means 302 outputs the extracted features to the user via the output device 40 and stores them in the mining result holding unit 203.
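The assignment of texts T1 to T7 to first-type positive and negative examples can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation of the attribute value condition specifying means 301. The attribute table is a hypothetical stand-in for FIG. 3, whose actual values are not reproduced in this text, and the names ATTRIBUTES, is_positive, is_negative, and split_first_type are introduced only for this example.

```python
# Hypothetical stand-in for the attribute table of FIG. 3.
# Only the two attributes used by the conditions are shown, and the values
# are invented so that the split matches the one described in the text.
ATTRIBUTES = {
    "T1": {"inquiry type": "repair request", "reception date": "October 2005"},
    "T2": {"inquiry type": "other", "reception date": "October 2005"},
    "T3": {"inquiry type": "other", "reception date": "September 2005"},
    "T4": {"inquiry type": "other", "reception date": "October 2005"},
    "T5": {"inquiry type": "repair request", "reception date": "October 2005"},
    "T6": {"inquiry type": "repair request", "reception date": "November 2005"},
    "T7": {"inquiry type": "repair request", "reception date": "October 2005"},
}


def is_positive(attrs):
    """First-type positive condition: repair request AND received in October 2005."""
    return (attrs["inquiry type"] == "repair request"
            and attrs["reception date"] == "October 2005")


def is_negative(attrs):
    """First-type negative condition: repair request AND not received in October 2005."""
    return (attrs["inquiry type"] == "repair request"
            and attrs["reception date"] != "October 2005")


def split_first_type(attributes):
    """Return (positive ids, negative ids, ids used for neither)."""
    positives = [tid for tid, a in attributes.items() if is_positive(a)]
    negatives = [tid for tid, a in attributes.items() if is_negative(a)]
    unused = [tid for tid, a in attributes.items()
              if not is_positive(a) and not is_negative(a)]
    return positives, negatives, unused


print(split_first_type(ATTRIBUTES))
```

With the hypothetical table above, T1, T5, and T7 fall into the positive group, T6 into the negative group, and T2 to T4 into neither, which mirrors the split described for FIG. 2.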

FIG. 6 is an explanatory diagram showing an example of the text mining result. Here, the text mining means 302 extracts words appearing in the texts as features and stores features such as those shown in FIG. 6 in the mining result holding unit 203. Next, the analysis target feature specifying means 303 has the user select the features of interest and reads the selection via the input device 10.
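One way to picture the word-level mining is the following Python sketch. The description does not fix a scoring formula, so the score used here (the share of positive texts containing a word minus the share of negative texts containing it) is an assumption chosen to reflect the idea of words that are frequent in the positive examples and rare in the negative examples. The function name extract_features, the parameter top_n, and the tokenized sample texts are all hypothetical; word segmentation is assumed to have been done beforehand.

```python
from collections import Counter


def extract_features(positive_texts, negative_texts, top_n=3):
    """Rank words that are frequent in positive texts and rare in negative texts.

    Each text is a list of already segmented words. The score is an assumed
    criterion: document-frequency ratio in positives minus that in negatives.
    """
    def doc_freq(texts):
        counts = Counter()
        for words in texts:
            counts.update(set(words))  # count each word once per text
        return counts

    pos_df = doc_freq(positive_texts)
    neg_df = doc_freq(negative_texts)
    n_pos = max(len(positive_texts), 1)
    n_neg = max(len(negative_texts), 1)

    scores = {w: pos_df[w] / n_pos - neg_df[w] / n_neg for w in pos_df}
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]


# Hypothetical tokenized texts, not the contents of FIG. 2.
POSITIVE_TEXTS = [["hard disk", "error"], ["hard disk", "OS"], ["HDD", "error"]]
NEGATIVE_TEXTS = [["screen", "error"]]

print(extract_features(POSITIVE_TEXTS, NEGATIVE_TEXTS))
# → ['hard disk', 'HDD', 'OS']
```

In this sample, "error" ranks last because it also appears in the negative text, which is the behaviour the criterion is meant to capture.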

For example, the analysis target feature specifying means 303 can have the user select features by providing, for each feature output by the text mining means 302, an input that indicates whether that feature is selected.

FIG. 7 is an explanatory diagram showing an example of the contents displayed on the output device 40. Referring to FIG. 7, the analysis target feature specifying means 303 displays, for each feature extracted by the text mining means 302, a check box with which the user indicates that the feature is selected, and reads the features whose check boxes the user has checked. In FIG. 7, the user has selected the words "hard disk" and "HDD".

For each text stored in the text storage unit 202 that satisfies either the positive example (first type) or the negative example (first type) attribute value condition read by the attribute value condition specifying means 301, the positive/negative example text extracting means 304 determines whether the features specified by the user appear. It extracts the text as a positive example (second type) if the features appear and as a negative example (second type) if they do not.

Here, a text is extracted as a positive example if any one of the features specified by the user appears in it.

According to FIG. 2, text T1 satisfies the positive example (first type) attribute value condition read by the attribute value condition specifying means 301 and contains the word "hard disk", so it is extracted as a positive example (second type). Texts T2 to T4, by contrast, satisfy neither the positive example (first type) nor the negative example (first type) attribute value condition, so they are extracted neither as positive examples (second type) nor as negative examples (second type).

Text T5 satisfies the positive example (first type) attribute value condition read by the attribute value condition specifying means 301 but contains neither the word "hard disk" nor the word "HDD", so it is extracted as a negative example (second type).

Text T6 satisfies the negative example (first type) attribute value condition read by the attribute value condition specifying means 301 and contains the word "HDD", so it is extracted as a positive example (second type). Text T7 satisfies the positive example (first type) attribute value condition read by the attribute value condition specifying means 301 and contains the word "HDD", so it is also extracted as a positive example (second type). The other texts are processed in the same way.
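The second-type split described for T1 and T5 to T7 can be sketched in Python as below. This is a minimal illustration that assumes a feature "appears" when it occurs as a substring of the text. The sample sentences are hypothetical placeholders for the response records of FIG. 2, and the names TEXTS and split_second_type are introduced only for this example.

```python
# Hypothetical text contents; only the presence or absence of the selected
# words follows the description (T1: "hard disk", T5: neither, T6 and T7: "HDD").
TEXTS = {
    "T1": "The hard disk makes a strange noise.",
    "T5": "The power does not turn on.",
    "T6": "The HDD is not recognized.",
    "T7": "An error occurs when accessing the HDD.",
}


def split_second_type(texts, first_type_ids, selected_features):
    """Split first-type texts by whether any selected feature appears in them.

    Only texts that matched a first-type condition (positive or negative)
    are considered. A text becomes a second-type positive example if at
    least one selected feature appears, and a second-type negative otherwise.
    """
    positives, negatives = [], []
    for tid in first_type_ids:
        if any(feature in texts[tid] for feature in selected_features):
            positives.append(tid)
        else:
            negatives.append(tid)
    return positives, negatives


print(split_second_type(TEXTS, ["T1", "T5", "T6", "T7"], ["hard disk", "HDD"]))
```

Note that T6, a first-type negative example, becomes a second-type positive example: the second split depends only on the selected words, not on the attribute conditions that the user originally specified.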

Next, the attribute feature extracting means 305 applies data mining to the positive example (second type) and negative example (second type) texts extracted by the positive/negative example text extracting means 304, extracts the attribute values or combinations of attribute values that are effective for distinguishing the positive example (second type) texts from the negative example (second type) texts, and outputs the result to the user via the output device 40.

In this example, data mining is used to obtain a decision tree that classifies the positive example (second type) texts and the negative example (second type) texts using combinations of attribute values as branching conditions. The combination of attribute values corresponding to a path that leads to the positive examples (second type) in the decision tree is extracted as a combination of attribute values characteristic of the positive example (second type) texts. FIG. 8 is an explanatory diagram showing an example of a decision tree.

If data mining yields a decision tree such as that of FIG. 8, the combination ("reception date" = "October 2005" OR "November 2005") AND ("model name" = "PC-100") is obtained as the combination of attribute values characteristic of the positive example (second type) texts.
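Reading attribute value combinations off a decision tree can be sketched in Python as follows. The tree below is a hypothetical shape that yields the combination stated above; FIG. 8 itself is not reproduced in this text, and learning the tree from the second-type examples is omitted. The names TREE, positive_paths, and format_path are assumptions introduced for the illustration.

```python
# Hypothetical tree: an internal node tests whether an attribute takes one of
# the listed values; a leaf is the string "positive" or "negative".
TREE = {
    "attribute": "reception date",
    "values": ("October 2005", "November 2005"),
    "yes": {
        "attribute": "model name",
        "values": ("PC-100",),
        "yes": "positive",
        "no": "negative",
    },
    "no": "negative",
}


def positive_paths(node, path=()):
    """Collect every root-to-leaf path that ends in a positive leaf."""
    if isinstance(node, str):
        return [path] if node == "positive" else []
    test = (node["attribute"], node["values"])
    return (positive_paths(node["yes"], path + ((*test, True),))
            + positive_paths(node["no"], path + ((*test, False),)))


def format_path(path):
    """Render a path as an AND of the branch conditions taken along it."""
    parts = []
    for attribute, values, matched in path:
        alternatives = " OR ".join(f'"{v}"' for v in values)
        condition = f'("{attribute}" = {alternatives})'
        parts.append(condition if matched else f"NOT {condition}")
    return " AND ".join(parts)


for p in positive_paths(TREE):
    print(format_path(p))
# → ("reception date" = "October 2005" OR "November 2005") AND ("model name" = "PC-100")
```

A tree with several positive leaves would yield one combination per leaf, each of which could be presented to the user as a separate attribute feature.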

FIG. 9 is an explanatory diagram showing an example of the output of the attribute feature extracting means 305 in this case. Referring to FIG. 9, the output combination of attribute values is ("reception date" = "October 2005" OR "November 2005") AND ("model name" = "PC-100").

FIG. 10 is an explanatory diagram showing the logic of this example. Referring to FIG. 10, in this example the user takes, among the texts whose inquiry type is a repair request, those received in October 2005 as positive examples (first type) (R11 in FIG. 10(a)) and those received in any other month as negative examples (first type) (R10 in FIG. 10(a)). Text mining then yields words such as "hard disk", "OS", "HDD", and "error" as characteristics of the repair requests of October 2005.

Next, data mining is performed with attention to "hard disk" and "HDD", the features the user selected from among these. Among the texts whose inquiry type is a repair request, which were the target of text mining, the positive example (second type) texts in which "hard disk" or "HDD" appears (R21 in FIG. 10(b)) are found to have the attribute value combination ("reception date" = "October 2005" OR "November 2005") AND ("model name" = "PC-100") as their attribute feature.

The condition that the user initially specified for the positive examples was that the inquiry type is a repair request and the reception date is October 2005. From the result, however, the user can learn that the words "hard disk" and "HDD" appear characteristically not only in October 2005 but also when the repair request texts of November 2005 are included, and that among the repair request texts they appear especially prominently for the model PC-100.

The present invention is applicable to uses such as a mining system that extracts useful knowledge, such as defect information and problems, from inquiry data recorded at a call center or from paper document data such as reports, and a program for realizing such a mining system. It is also applicable to uses such as a system that accumulates the contents of inquiries as text, extracts frequently occurring inquiries from them, and builds a Q&A collection.

FIG. 1 is a block diagram showing the configuration of the first embodiment of the present invention.
FIG. 2 is an explanatory diagram showing an example of the contents of the text storage unit.
FIG. 3 is an explanatory diagram showing an example of the contents of the attribute storage unit.
FIG. 4 is a flowchart showing the operation of the first embodiment of the present invention.
FIG. 5 is a block diagram showing the configuration of the third embodiment of the present invention.
FIG. 6 is an explanatory diagram showing an example of the result of text mining.
FIG. 7 is an explanatory diagram showing an example of the contents displayed on the output device.
FIG. 8 is an explanatory diagram showing an example of a decision tree.
FIG. 9 is an explanatory diagram showing an example of the output of the attribute feature extracting means.
FIG. 10 is an explanatory diagram showing the logic of the example of the first embodiment of the present invention.

Explanation of symbols

10 Input device
40 Output device
50 Text mining program
21 Storage device
22 Storage device
31 Data processing device
32 Data processing device
201 Attribute storage unit
202 Text storage unit
203 Mining result holding unit
301 Attribute value condition specifying means
302 Text mining means
303 Analysis target feature specifying means
304 Positive/negative example text extracting means
305 Attribute feature extracting means

Claims

A text mining device comprising a data processing device that: performs text mining based on attribute value conditions that are the conditions of first-type positive examples and first-type negative examples specified by a user; extracts, as features, portions effective for classifying the first-type positive examples and the first-type negative examples; has the user select a feature of interest from among the features; divides the texts corresponding to the first-type positive examples and the first-type negative examples into second-type positive example texts, in which the selected feature appears, and second-type negative example texts, in which the selected feature does not appear; and generates an attribute value condition that is a new feature effective for classifying the second-type positive examples and the second-type negative examples.

A text mining device comprising: a storage device that stores a plurality of texts and attribute values for each of the texts; and a data processing device that reads the texts and the attribute values for each of the texts from the storage device, performs text mining by applying attribute value conditions, which are the conditions of first-type positive examples and first-type negative examples specified by a user, to the texts and the attribute values for each of the texts, extracts, as features, portions effective for classifying the first-type positive examples and the first-type negative examples, stores the features in the storage device as a mining result, has the user select a feature of interest from among the features, divides the texts corresponding to the first-type positive examples and the first-type negative examples into second-type positive example texts, in which the selected feature appears, and second-type negative example texts, in which the selected feature does not appear, generates an attribute value condition that is a new feature effective for classifying the second-type positive examples and the second-type negative examples, and outputs the generated attribute value condition to an output device.

The text mining device according to claim 1, wherein the portion effective for classifying the first-type positive examples and the first-type negative examples is, based on a first criterion set in advance, a phrase that has a high appearance frequency in the first-type positive example texts and a low appearance frequency in the first-type negative example texts.

The text mining device according to claim 1, 2, or 3, wherein the attribute value condition that is a new feature effective for classifying the second-type positive examples and the second-type negative examples is, based on a second criterion set in advance, a combination of attribute values that have a high appearance frequency as attribute values of the second-type positive examples and a low appearance frequency as attribute values of the second-type negative examples.

A text mining device comprising a data processing device that: receives attribute value conditions that are the conditions of first-type positive examples and first-type negative examples specified by a user; performs text mining based on the attribute value condition that is the condition of the first-type positive examples; extracts, as features, portions effective for classifying the first-type positive examples; has the user select a feature of interest from among the features; divides the texts corresponding to the first-type positive examples and the first-type negative examples into second-type positive example texts, in which the selected feature appears, and second-type negative example texts, in which the selected feature does not appear; and generates an attribute value condition that is a new feature effective for classifying the second-type positive examples and the second-type negative examples.

A text mining device comprising a data processing device that: performs text mining based on an attribute value condition that is the condition of first-type positive examples specified by a user; extracts, as features, portions effective for classifying the texts that match the attribute value condition as first-type positive examples and the remaining texts as first-type negative examples; has the user select a feature of interest from among the features; divides the texts corresponding to the first-type positive examples and the first-type negative examples into second-type positive example texts, in which the selected feature appears, and second-type negative example texts, in which the selected feature does not appear; and generates an attribute value condition that is a new feature effective for classifying the second-type positive examples and the second-type negative examples.

A text mining device comprising a data processing device that: extracts, as features, elements that appear frequently across all stored texts; has a user select a feature of interest from among the features; divides the texts into positive example texts, in which the selected feature appears, and negative example texts, in which the selected feature does not appear; and generates an attribute value condition that is a new feature effective for classifying the positive examples and the negative examples.

A text mining method comprising: a procedure in which a text mining device performs text mining based on attribute value conditions that are the conditions of first-type positive examples and first-type negative examples specified by a user, and extracts, as features, portions effective for classifying the first-type positive examples and the first-type negative examples; a procedure of having the user select a feature of interest from among the extracted features; a procedure of dividing the texts corresponding to the first-type positive examples and the first-type negative examples into second-type positive example texts, in which the selected feature appears, and second-type negative example texts, in which the selected feature does not appear; and a procedure of generating an attribute value condition that is a new feature effective for classifying the second-type positive examples and the second-type negative examples.

A text mining method in a text mining device comprising a storage device that stores a plurality of texts and attribute values for each of the texts, and a data processing device,
wherein the data processing device executes: a procedure of reading the texts and the attribute values for each of the texts from the storage device; a procedure of performing text mining by applying attribute value conditions, which are the conditions of first-type positive examples and first-type negative examples specified by a user, to the texts and the attribute values for each of the texts, extracting, as features, portions effective for classifying the first-type positive examples and the first-type negative examples, and storing the features in the storage device as a mining result; a procedure of having the user select a feature of interest from among the extracted features; a procedure of dividing the texts corresponding to the first-type positive examples and the first-type negative examples into second-type positive example texts, in which the selected feature appears, and second-type negative example texts, in which the selected feature does not appear; and a procedure of generating an attribute value condition that is a new feature effective for classifying the second-type positive examples and the second-type negative examples, and outputting it to an output device.

The text mining method according to claim 8, wherein the portion effective for classifying the first-type positive examples and the first-type negative examples is, based on a first criterion set in advance, a phrase that has a high appearance frequency in the first-type positive example texts and a low appearance frequency in the first-type negative example texts.

The text mining method according to claim 8, wherein the attribute value condition that is a new feature effective for classifying the second-type positive examples and the second-type negative examples is, based on a second criterion set in advance, a combination of attribute values that have a high appearance frequency as attribute values of the second-type positive examples and a low appearance frequency as attribute values of the second-type negative examples.

A text mining method comprising: a procedure in which a text mining device receives attribute value conditions that are the conditions of first-type positive examples and first-type negative examples specified by a user; a procedure of performing text mining based on the attribute value condition that is the condition of the first-type positive examples, and extracting, as features, portions effective for classifying the first-type positive examples; a procedure of having the user select a feature of interest from among the features; a procedure of dividing the texts corresponding to the first-type positive examples and the first-type negative examples into second-type positive example texts, in which the selected feature appears, and second-type negative example texts, in which the selected feature does not appear; and a procedure of generating an attribute value condition that is a new feature effective for classifying the second-type positive examples and the second-type negative examples.

A text mining method comprising: a procedure in which a text mining device performs text mining based on an attribute value condition that is the condition of first-type positive examples specified by a user, and extracts, as features, portions effective for classifying the texts that match the attribute value condition as first-type positive examples and the remaining texts as first-type negative examples; a procedure of having the user select a feature of interest from among the features; a procedure of dividing the texts corresponding to the first-type positive examples and the first-type negative examples into second-type positive example texts, in which the selected feature appears, and second-type negative example texts, in which the selected feature does not appear; and a procedure of generating an attribute value condition that is a new feature effective for classifying the second-type positive examples and the second-type negative examples.

A text mining method comprising: a procedure in which a text mining device extracts, as features, elements that appear frequently across all stored texts; a procedure of having a user select a feature of interest from among the features; a procedure of dividing the texts into positive example texts, in which the selected feature appears, and negative example texts, in which the selected feature does not appear; and a procedure of generating an attribute value condition that is a new feature effective for classifying the positive examples and the negative examples.

A text mining program that causes a text mining device to execute: a procedure of performing text mining based on attribute value conditions that are the conditions of first-type positive examples and first-type negative examples specified by a user, and extracting, as features, portions effective for classifying the first-type positive examples and the first-type negative examples; a procedure of having the user select a feature of interest from among the extracted features; a procedure of dividing the texts corresponding to the first-type positive examples and the first-type negative examples into second-type positive example texts, in which the selected feature appears, and second-type negative example texts, in which the selected feature does not appear; and a procedure of generating an attribute value condition that is a new feature effective for classifying the second-type positive examples and the second-type negative examples.

A text mining program in a text mining device comprising a storage device that stores a plurality of texts and attribute values for each of the texts, and a data processing device,
the program causing the data processing device to execute: a procedure of reading the texts and the attribute values for each of the texts from the storage device; a procedure of performing text mining by applying attribute value conditions, which are the conditions of first-type positive examples and first-type negative examples specified by a user, to the texts and the attribute values for each of the texts, extracting, as features, portions effective for classifying the first-type positive examples and the first-type negative examples, and storing the features in the storage device as a mining result; a procedure of having the user select a feature of interest from among the extracted features; a procedure of dividing the texts corresponding to the first-type positive examples and the first-type negative examples into second-type positive example texts, in which the selected feature appears, and second-type negative example texts, in which the selected feature does not appear; and a procedure of generating an attribute value condition that is a new feature effective for classifying the second-type positive examples and the second-type negative examples, and outputting it to an output device.

The text mining program according to claim 15, wherein the portion effective for classifying the first-type positive examples and the first-type negative examples is, based on a first criterion set in advance, a phrase that has a high appearance frequency in the first-type positive example texts and a low appearance frequency in the first-type negative example texts.

The text mining program according to claim 15, 16, or 17, wherein the attribute value condition that is a new feature effective for classifying the second-type positive examples and the second-type negative examples is, based on a second criterion set in advance, a combination of attribute values that have a high appearance frequency as attribute values of the second-type positive examples and a low appearance frequency as attribute values of the second-type negative examples.

A text mining program that causes a text mining device to execute: a procedure of receiving attribute value conditions that are the conditions of first-type positive examples and first-type negative examples specified by a user; a procedure of performing text mining based on the attribute value condition that is the condition of the first-type positive examples, and extracting, as features, portions effective for classifying the first-type positive examples; a procedure of having the user select a feature of interest from among the features; a procedure of dividing the texts corresponding to the first-type positive examples and the first-type negative examples into second-type positive example texts, in which the selected feature appears, and second-type negative example texts, in which the selected feature does not appear; and a procedure of generating an attribute value condition that is a new feature effective for classifying the second-type positive examples and the second-type negative examples.

A text mining program that causes a text mining device to execute: a procedure of performing text mining based on an attribute value condition that is the condition of first-type positive examples specified by a user, and extracting, as features, portions effective for classifying the texts that match the attribute value condition as first-type positive examples and the remaining texts as first-type negative examples; a procedure of having the user select a feature of interest from among the features; a procedure of dividing the texts corresponding to the first-type positive examples and the first-type negative examples into second-type positive example texts, in which the selected feature appears, and second-type negative example texts, in which the selected feature does not appear; and a procedure of generating an attribute value condition that is a new feature effective for classifying the second-type positive examples and the second-type negative examples.

A text mining program that causes a text mining device to execute: a procedure of extracting, as features, elements that appear frequently across all stored texts; a procedure of having a user select a feature of interest from among the features; a procedure of dividing the texts into positive example texts, in which the selected feature appears, and negative example texts, in which the selected feature does not appear; and a procedure of generating an attribute value condition that is a new feature effective for classifying the positive examples and the negative examples.

A text mining device for extracting and outputting features from a set of texts with attributes, comprising: analysis target feature specifying means for inputting a feature of interest from among the features; positive/negative example text extracting means for extracting positive example text and negative example text from the texts according to whether or not the input feature appears in each text; and attribute feature extracting means for extracting attribute features effective for classifying the positive example text and the negative example text.

A text mining device comprising: text storage means for holding a set of texts; attribute storage means for holding attribute values for the texts; condition specifying means for inputting text mining conditions; text mining means for extracting text features according to the conditions; analysis target feature designation means for inputting a feature of interest from among the features; positive/negative example text extracting means for extracting positive example text and negative example text from the texts according to whether or not the input feature appears in each text; and attribute feature extracting means for extracting attribute features effective for classifying the positive example text and the negative example text.
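As an illustration of what the attribute feature extracting means might compute, the sketch below ranks individual attribute values by how much more often they occur among the positive example texts than among the negative example texts. The function name, the data layout, and the scoring rule are assumptions; the claims do not specify a measure, and combinations of attribute values are not handled here.

```python
from collections import Counter

def attribute_features(pos_attrs, neg_attrs):
    """Rank (attribute, value) pairs by positive share minus negative share."""
    def value_counts(attr_dicts):
        counts = Counter()
        for attrs in attr_dicts:
            counts.update(attrs.items())  # each item is an (attribute, value) tuple
        return counts

    pos_c, neg_c = value_counts(pos_attrs), value_counts(neg_attrs)
    scores = {}
    for pair in set(pos_c) | set(neg_c):
        p = pos_c[pair] / len(pos_attrs) if pos_attrs else 0.0
        n = neg_c[pair] / len(neg_attrs) if neg_attrs else 0.0
        scores[pair] = p - n
    # Highest score first; ties broken by (attribute, value) for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical attribute values of texts that contain the selected feature...
pos_attrs = [{"region": "east", "age": "20s"}, {"region": "east", "age": "30s"}]
# ...and of texts that do not contain it.
neg_attrs = [{"region": "west", "age": "20s"}, {"region": "west", "age": "30s"}]

ranked_attrs = attribute_features(pos_attrs, neg_attrs)
```

In this sample, `("region", "east")` ranks first: it holds for every positive example text and for no negative example text, whereas the age values are evenly split and score zero.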

A text mining method in which a computer extracts and outputs features from a set of texts with attributes, the method comprising: a step in which a feature of interest from among the features is input to the computer; a step in which the computer extracts positive example text and negative example text from the texts according to whether or not the input feature appears in each text; and a step in which the computer extracts attribute features effective for classifying the positive example text and the negative example text.

A text mining method comprising: a step in which a computer stores a set of texts and attribute values for the texts; a step of inputting text mining conditions to the computer; a step of extracting text features according to the conditions; a step of inputting a feature of interest from among the features; a step of extracting positive example text and negative example text from the texts according to whether or not the input feature appears in each text; and a step of extracting attribute features effective for classifying the positive example text and the negative example text.

A text mining program for causing a computer to execute a process of extracting and outputting features from a set of texts with attributes, the program causing the computer to execute: an analysis target feature specifying process for inputting a feature of interest from among the features; a positive/negative example text extracting process for extracting positive example text and negative example text from the texts according to whether or not the input feature appears in each text; and an attribute feature extraction process for extracting attribute features effective for classifying the positive example text and the negative example text.

A text mining program that causes a computer to execute: a process for storing a set of texts and attribute values for the texts in a storage device; a condition designating process for inputting text mining conditions; a text mining process for extracting text features according to the conditions; an analysis target feature designating process for inputting a feature of interest from among the features; a positive/negative example text extracting process for extracting positive example text and negative example text from the texts according to whether or not the input feature appears in each text; and an attribute feature extraction process for extracting attribute features effective for classifying the positive example text and the negative example text.