JP4181330B2

JP4181330B2 - Summarization creating program and system, and computer summarizing method

Info

Publication number: JP4181330B2
Application number: JP2002072885A
Authority: JP
Inventors: 佳代子磯尾; 恭子牧野; 誠司岩田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2002-03-15
Filing date: 2002-03-15
Publication date: 2008-11-12
Anticipated expiration: 2022-03-15
Also published as: JP2003271624A

Description

【０００１】
【発明の属する技術分野】
本発明は、テキストマイニング技術を利用した要約作成プログラム及びシステム並びにコンピュータによる要約作成方法に関する。
【０００２】
【従来の技術】
テキストマイニング技術とは、文書データに対するデータマイニング技術である。
【０００３】
テキストマイニング技術の具体例には、文書データに基づいて文脈の理解及び情報の要約・分類・検索を行う技術、文書データから知識情報を抽出する技術、文書で記述されている文書データ（定性情報）を数量（定量情報）化する技術などがある。
【０００４】
また、広義には、テキストマイニング結果に対して分析を行う技術もテキストマイニング技術に含まれる。
【０００５】
テキストマイニングを行う技術が、特願平１１−３３２１１４号に開示されている。特願平１１−３３２１１４号の発明では、概念ＩＤなどの情報が登録されている概念定義辞書を用いてテキストマイニングが行われる。
【０００６】
また、特願平１１−３３２１１４号の発明では、文書より抽出された複数の概念ＩＤの組み合わせについて、この複数の概念ＩＤの示す概念を接続する言葉である概念ラベルを定義する因果関係定義辞書が利用される。
【０００７】
【発明が解決しようとする課題】
上記特願平１１−３３２１１４号の発明では、因果関係定義辞書に新規登録する度に、概念ＩＤの組み合わせと概念ラベルとを指定し登録を命ずる作業を行う必要がある。
【０００８】
この作業は、登録する内容が多くなるにしたがって労力が大きくなり、時間がかかるという問題がある。
【０００９】
また、概念定義辞書への登録数が増えるごとに、因果関係定義辞書の登録数も増える。
【００１０】
本発明は、以上のような実情に鑑みてなされたもので、因果関係定義辞書への登録作業の効率化を図る要約作成プログラム及びシステム並びにコンピュータによる要約作成方法を提供することを目的とする。
【００１１】
【課題を解決するための手段】
本発明を実現するにあたって講じた具体的手段について以下に説明する。
【００１２】
本発明の要約作成プログラムは、コンピュータに、当該コンピュータが要約作成対象の文書データを入力するための入力機能と、データベースに記憶されており文書データを構成する文書要素とその文書要素の属するグループを識別可能なグループ識別情報を一部分に含む要素ＩＤとを関連付けた辞書情報を参照し、コンピュータに入力された要約作成対象の文書データに含まれておりかつ辞書情報に含まれている複数の文書要素とその要素ＩＤとを抽出する抽出機能と、データベースに記憶されており要素ＩＤの一部分であるグループ識別情報の組み合わせに対して文書要素を接続に利用される接続文書要素を関連付けた因果関係情報を参照し、抽出機能によって抽出された複数の要素ＩＤに含まれているグループ識別情報の組み合わせが、因果関係情報に登録されているか判断し、登録されていれば、抽出機能によって抽出された複数の要素ＩＤに含まれているグループ識別情報の組み合わせに対して関連付けされている接続文書要素を抽出する因果関係抽出機能と、複数の文書要素を並べるための結合関係を定めた結合関係情報に基づいて、抽出機能によって抽出された複数の文書要素を並べ、さらに、因果関係抽出機能によって接続文書要素が抽出されている場合には、並べられた複数の文書要素に対して抽出された接続文書要素を補い、要約データを作成する作成機能とを実現させる。
【００１３】
本発明では、辞書情報に登録された文書要素がグループ分けされている。また、因果関係情報がグループ単位で定義されている。新規の文書要素を辞書情報に登録する場合、この新規の文書要素の要素ＩＤに基づいてグループが識別可能であれば因果関係情報を更新する必要はない。
【００１４】
したがって、因果関係情報の登録作業を効率化させることができる。
【００１５】
なお、本発明において、要素ＩＤの一部分が所定のグループを表すグループ識別情報であるとしてもよい。
【００１６】
また、本発明において、コンピュータに、当該コンピュータが辞書情報に登録する文書要素とその文書要素の属するグループを識別可能なグループ識別情報を一部に含む要素ＩＤとを入力し、入力された文書要素とその要素ＩＤとを関連付けてデータベースの辞書情報に登録するための機能をさらに実現させるとしてもよい。これにより、辞書情報の登録作業を支援できる。
【００１７】
また、本発明において、コンピュータに、当該コンピュータが因果関係情報に登録するグループ識別情報の組み合わせと接続文書要素とを入力し、入力されたグループ識別情報の組み合わせと接続文書要素とを関連付けてデータベースの因果関係情報に登録するための機能をさらに実現させるとしてもよい。これにより、因果関係情報の登録作業を支援できる。
【００１８】
上記のような要約作成プログラム、及びこのプログラムを記録した記録媒体を用いることによって、上述した機能を有していないコンピュータ、コンピュータシステム、サーバやクライアント等に対して、簡単に上述した機能を付加することができる。
【００１９】
本発明の要約作成プログラムで実施される要約作成方法を発明の対象としてもよい。
【００２０】
また、本発明の要約作成プログラムで実現される機能と同様に動作する手段を具備した要約作成システムを発明の対象としてもよい。
【００２１】
【発明の実施の形態】
以下、図面を参照しながら本発明の実施の形態について説明する。なお、以下に示す各図において、同一の部分については同一の符号を付してその説明を省略する。
【００２２】
（第１の実施の形態）
本実施の形態においては、文書を構成する要素である文書要素（概念）をグループ分けして概念定義辞書に登録し、グループの組み合わせに対して文書要素を接続するための文書要素である接続文書要素を関連付けて因果関係定義辞書に登録する。
【００２３】
図１は、本実施の形態に係る要約作成プログラム及びテキストマイニングシステム（要約作成システム）の構成の一例を示すブロック図である。
【００２４】
テキストマイニングシステム１は、記録媒体２に記録されている要約作成プログラム３を読み込み、実行する。
【００２５】
要約作成プログラム３は、テキストマイニングシステム１上で起動されると、入力機能４、概念抽出機能５、因果関係抽出機能６、作成機能７、出力機能８、概念登録機能９、因果関係登録機能１０を実現する。
【００２６】
要約作成プログラム３の動作にしたがって、テキストマイニングシステム１は、データベース１１をアクセスする。
【００２７】
データベース１１には、概念定義辞書１２と因果関係定義辞書１３とが管理されている。
【００２８】
概念定義辞書１２は、主にアクション辞書１２ａとリザルト辞書１２ｂを含む。概念定義辞書１２には、例えば、要約に含めるために抽出する要素として定義された文書要素が登録される。
【００２９】
表１に、アクション辞書１２ａの一例を示す。
【００３０】
【表１】

【００３１】
アクション辞書１２ａには、「父の日」「母の日」などの暦における特別の日の名称が文書要素として登録されている。この文書要素「父の日」「母の日」には、それぞれ要素ＩＤ（概念ＩＤ）「A001」「A002」が関係付けされている。
【００３２】
ここで、要素ＩＤの先頭が「A」の文書要素は、暦を示すグループに属するとする。
【００３３】
同様に、アクション辞書１２ａには、「特売」「試飲会」などの販売に関するイベントの種別が文書要素として登録されている。この文書要素「特売」「試飲会」には、それぞれ要素ＩＤ「B001」「B002」が関係付けされている。
【００３４】
ここで、要素ＩＤの先頭が「B」の文書要素は、イベントを示すグループに属するとする。
【００３５】
表２に、リザルト辞書１２ｂの一例を示す。
【００３６】
【表２】

【００３７】
リザルト辞書１２ｂには、「売れている」「人気がある」などの高評価を意味する文書要素が登録されている。この文書要素「売れている」「人気がある」には、それぞれ要素ＩＤ「1001」「1002」が関係付けされている。
【００３８】
ここで、要素ＩＤの先頭が「1」の文書要素は、高評価グループに属するとする。
【００３９】
同様に、リザルト辞書１２ｂには、「売れていない」「人気がない」などの低評価を意味する文書要素が登録されている。この文書要素「売れていない」「人気がない」には、それぞれ要素ＩＤ「2001」「2002」が関係付けされている。
【００４０】
ここで、要素ＩＤの先頭が「2」の文書要素は、低評価グループに属するとする。
【００４１】
因果関係定義情報１３は、グループを識別可能なグループ識別情報の組み合わせに対して接続文書要素を関連付けて登録している。接続文書要素とは、文書要素を接続するための文書要素である。
【００４２】
表３に、因果関係定義辞書１３の一例を示す。
【００４３】
【表３】

【００４４】
例えば、グループ識別情報「A***」とグループ識別情報「1***」の組み合わせに対して接続文書要素「なので」が関連付けされている。グループ識別情報「A***」の示すグループには、先頭が「A」の要素ＩＤを持つ文書要素「父の日」「母の日」が属する。他のグループ識別情報の組み合わせについても同様である。
【００４５】
入力機能４は、要約作成対象の日報データを入力する。
【００４６】
表４に、日報データの一例を示す。
【００４７】
【表４】

【００４８】
日報データは、文書データに日報番号が付されたデータである。ここでは文書データ「父の日でポテトチップスが売れている」に日報番号「N001」が付された日報データと、文書データ「ビスケット特売、人気がない」に日報番号「N002」が付された日報データとが入力されたとする。
【００４９】
概念抽出機能５は、概念定義辞書１２をアクセスし、入力された日報データから概念定義辞書１２に登録されている文書要素とその要素ＩＤを抽出する。
【００５０】
この結果、図２に示すように、日報番号「N001」の日報データに対して、文書要素「父の日」とその要素ＩＤ「A001」及び文書要素「売れている」とその要素ＩＤ「1001」が抽出される。
【００５１】
一方、日報番号「N002」の日報データに対して、文書要素「特売」とその要素ＩＤ「B001」及び文書要素「人気がない」とその要素ＩＤ「2002」が抽出される。
【００５２】
因果関係抽出機能６は、因果関係定義辞書１３をアクセスし、概念抽出機能５によって抽出された要素ＩＤの組み合わせに対応するグループ識別情報の組み合わせが因果関係定義辞書１３に登録されていれば、このグループ識別情報の組み合わせに関連付けされている接続文書要素を抽出する。
【００５３】
例えば、要素ＩＤ「A001」と「1001」の組み合わせに対応するグループ識別情報「A***」と「1***」の組み合わせは、因果関係定義辞書１３に登録されているため、このグループ識別情報「A***」と「1***」の組み合わせに関連付けされている接続文書要素「なので」が抽出される。
【００５４】
作成機能７は、複数の文書要素の結合関係を定めた結合規則情報である分類軸「リザルト＆アクション」にしたがって、概念抽出機能５によって抽出された文書要素をアンド検索により並べ、要約データを作成する。
【００５５】
例えば、日報番号「N001」の日報データに対しては、文書要素「父の日」と「売れている」とが分類軸にしたがって並べられ、「父の日＆売れている」という要約データが作成される。同様に、日報番号「N002」の日報データに対しては、文書要素「特売」と「人気がない」とが分類軸にしたがって並べられ、「特売＆人気がない」という要約データが作成される。
【００５６】
さらに、作成機能７は、因果関係定義辞書１３から接続文書要素が抽出されている場合には、分類軸の「＆」部分を接続文書要素によって補い、要約データを作成する。
【００５７】
例えば、図３に示すように、日報番号「N001」の日報データからは要素ＩＤ「A001」と要素ＩＤ「1001」が抽出されているため、因果関係定義辞書１３においてグループ識別情報「A***」と「1***」に関連付けされている「なので」が補われ、「父の日なので売れている」という要約データが作成される。
【００５８】
同様に、日報番号「N002」の日報データからは要素ＩＤ「B001」と要素ＩＤ「2002」が抽出されているため、因果関係定義辞書１３においてグループ識別情報「B***」と「2***」に関連付けされている「したのに」が補われ、「特売したのに人気がない」という要約データが作成される。
【００５９】
概念登録機能９は、新規に登録する文書要素、その要素ＩＤ、アクション辞書１２ａに登録するかリザルト辞書１２ｂに登録するかの指定の入力を促すための画面を表示する。そして、概念登録機能９は、入力された新規の文書要素とその要素ＩＤを、アクション辞書１２ａとリザルト辞書１２ｂとのうち指定された側に登録する。
【００６０】
因果関係登録機能１０は、新規に登録するグループ識別情報の組み合わせと接続文書要素との入力を促すための画面を表示する。そして、因果関係登録機能１０は、入力された新規のグループ識別情報の組み合わせと接続文書要素を因果関係定義辞書１３に登録する。
【００６１】
図４は、日報データが入力されてから要約データが出力されるまでの処理の流れの一例を示すフローチャートである。
【００６２】
ステップＳ１では、日報データの入力が行われる。
【００６３】
ステップＳ２では、概念定義辞書１２に登録されており日報データに含まれている文書要素とその要素ＩＤが抽出される。
【００６４】
ステップＳ３では、抽出された要素ＩＤの組み合わせに対応するグループ識別情報の組み合わせがあるか否かが判定される。対応する組み合わせがある場合には、ステップＳ４が実行される。
【００６５】
ステップＳ４では、抽出された要素ＩＤの組み合わせに対応するグループ識別情報の組み合わせに関連付けされている接続文書要素が因果関係定義辞書１３から抽出される。
【００６６】
ステップＳ５では、抽出された文書要素を分類軸にしたがって組み合わせて要約データが作成される。
【００６７】
ステップＳ６では、作成された要約データが出力される。
【００６８】
以上説明した本実施の形態においては、文書要素を分類するためのグループを識別可能な要素ＩＤが文書要素に付され、概念定義辞書１２に登録される。
【００６９】
そして、因果関係定義辞書１３には、文書要素のグループ単位で接続文書要素が定義される。
【００７０】
これにより、文書要素の組み合わせ毎に接続文書要素を定義して因果関係定義辞書１３に登録する必要がなく、グループ単位で接続文書要素を定義すればよい。
【００７１】
したがって、因果関係定義辞書１３への登録作業を効率化できる。
【００７２】
なお、上記要素ＩＤは、先頭部分により文書要素の属するグループを識別可能としているが、先頭部分ではない他の部分によりグループを識別可能としてもよい。また、文書要素は複数のグループに属するとしてもよい。
【００７３】
また、上記作成機能７は、図５に示すように、抽出された文書要素を一般的な文書要素に変換した後に、要約データを作成するとしてもよい。例えば、文書要素「売れた」「動いた」「売れています」は、抽出されると要約作成用の文書要素「売れている」に変換されるとする。また、文書要素「セールス」「安売り」は、抽出されると「特売」に変換される。
【００７４】
また、この要約作成プログラム３は通信媒体により伝送してコンピュータに適用可能である。要約作成プログラム３を読み込んだコンピュータは、要約作成プログラム３によって動作が制御され、上述した機能を実現する。
【００７５】
また、要約作成プログラム３による機能及びデータベース１１の登録内容は、自由に組み合わせてもよく、また複数の要素に分割してもよい。
【００７６】
また、要約作成プログラム３は、複数のコンピュータ上に分散され、連携しつつ動作してもよい。
【００７７】
（第２の実施の形態）
本実施の形態においては、上記第１の実施の形態に係る要約作成プログラム３及びテキストマイニングシステム１の具体的な利用態様について説明する。
【００７８】
図６は、要約作成プログラム３及びテキストマイニングシステム１の利用態様の第一例を示すブロック図である。
【００７９】
ユーザ１４の管理するクライアント１５とサービス提供者１６の管理するテキストマイニングシステム１とは、例えばインターネットなどのネットワーク１７を介して接続されている。
【００８０】
テキストマイニングシステム１は、要約作成プログラム３の動作にしたがってデータベース１１をアクセスする。
【００８１】
ユーザ１４は、クライアント１５から企業の日報データ、月報データ、営業報告データ等の文書データをネットワーク１７経由でテキストマイニングシステム１に送信する。すると、テキストマイニングシステム１は、要約データを作成し、作成した要約データをネットワーク１７経由でクライアント１５に送信する。
【００８２】
また、ユーザ１４は、クライアント１５から辞書の更新命令をネットワーク１７経由でテキストマイニングシステム１に送信する。すると、テキストマイニングシステム１は、更新命令にしたがってデータベース１１の内容を更新する。
【００８３】
企業の日報データ、月報データ、営業報告データ等の文書データの数は膨大になる。ユーザ１４は、このような膨大な数の文書データから要約データを作成し、データ管理の効率化を図ることができる。また、ユーザ１４は、サービス提供者１６の要約作成サービスの提供を受けることで、自己で要約作成プログラム３を保守、運用することなく、効率的に要約データを作成できる。
【００８４】
一方、サービス提供者１６は、ユーザ１４からサービス料を得ることができる。
【００８５】
図７は、要約作成プログラム３及びテキストマイニングシステム１の利用態様の第二例を示すブロック図である。
【００８６】
サービス提供者１６は、ユーザ１４に対し、要約作成プログラム３又はテキストマイニングシステム１及びデータベース１１を提供又はレンタルする。
【００８７】
また、サービス提供者１６は、提供又はレンタルした要約作成プログラム３又はテキストマイニングシステム１及びデータベース１１のメンテナンスを行う。
【００８８】
これにより、サービス提供者１６は、要約作成プログラム３等の提供料又はレンタル料、メンテナンス料を得ることができる。
【００８９】
【発明の効果】
以上詳記したように本発明においては、辞書情報に登録される文書要素がグループ分けされ、因果関係情報ではこのグループの組み合わせに対して接続文書要素が登録される。
【００９０】
したがって、グループを指定して新規の文書要素を辞書情報に登録することにより、既に登録済みの因果関係情報を利用して要約データを作成することができ、因果関係定義辞書への登録作業を効率化することができる。
【図面の簡単な説明】
【図１】本発明の第１の実施の形態に係る要約作成プログラム及びテキストマイニングシステムの構成の一例を示すブロック図。
【図２】抽出された文書要素とその要素ＩＤの一例を示す図。
【図３】抽出された文書要素と接続文書要素に基づいて作成された要約データの一例を示す図。
【図４】日報データが入力されてから要約データが出力されるまでの処理の流れの一例を示すフローチャート。
【図５】抽出された文書要素と要約作成に利用される文書要素との関係の一例を示す図。
【図６】本発明の第２の実施の形態に係る要約作成プログラム及びテキストマイニングシステムの利用態様の第一例を示すブロック図。
【図７】同実施の形態に係る要約作成プログラム及びテキストマイニングシステムの利用態様の第二例を示すブロック図。
【符号の説明】
１…テキストマイニングシステム
２…記録媒体
３…要約作成プログラム
４…入力機能
５…概念抽出機能
６…因果関係抽出機能
７…作成機能
８…出力機能
９…概念登録機能
１０…因果関係登録機能
１１…データベース
１２…概念定義辞書
１２ａ…アクション辞書
１２ｂ…リザルト辞書
１３…因果関係定義辞書

[0001]
[Technical Field of the Invention]
The present invention relates to a summary creation program and system using a text mining technique, and a summary creation method using a computer.
[0002]
[Prior art]
The text mining technique is a data mining technique for document data.
[0003]
Specific examples of text mining techniques include techniques for understanding context and for summarizing, classifying, and retrieving information based on document data; techniques for extracting knowledge information from document data; and techniques for converting document data written as text (qualitative information) into numerical quantities (quantitative information).
[0004]
In a broad sense, a technique for analyzing text mining results is also included in the text mining technique.
[0005]
A technique for performing text mining is disclosed in Japanese Patent Application No. 11-332114. In the invention of Japanese Patent Application No. 11-332114, text mining is performed using a concept definition dictionary in which information such as concept IDs are registered.
[0006]
Further, the invention of Japanese Patent Application No. 11-332114 uses a causal relationship definition dictionary that defines, for a combination of a plurality of concept IDs extracted from a document, a concept label, which is a word connecting the concepts indicated by those concept IDs.
[0007]
[Problems to be solved by the invention]
In the invention of the above Japanese Patent Application No. 11-332114, each time a new entry is added to the causal relationship definition dictionary, it is necessary to specify a combination of concept IDs and a concept label and to issue a registration command.
[0008]
The problem with this work is that, as the amount of content to be registered grows, it requires more labor and takes more time.
[0009]
Further, as the number of registrations in the concept definition dictionary increases, the number of registrations in the causal relationship definition dictionary also increases.
[0010]
The present invention has been made in view of the above circumstances, and an object of the present invention is to provide a summary creation program and system for improving the efficiency of registration work in a causal relationship definition dictionary, and a summary creation method by a computer.
[0011]
[Means for Solving the Problems]
Specific means taken for realizing the present invention will be described below.
[0012]
The summary creation program of the present invention causes a computer to realize: an input function for inputting, to the computer, document data to be summarized; an extraction function that refers to dictionary information, which is stored in a database and associates each document element constituting document data with an element ID that includes, as a part thereof, group identification information capable of identifying the group to which the document element belongs, and that extracts a plurality of document elements contained both in the input document data to be summarized and in the dictionary information, together with their element IDs; a causal relationship extraction function that refers to causal relationship information, which is stored in the database and associates a combination of group identification information, each being a part of an element ID, with a connected document element used for connecting document elements, determines whether the combination of group identification information contained in the plurality of element IDs extracted by the extraction function is registered in the causal relationship information, and, if it is registered, extracts the connected document element associated with that combination; and a creation function that arranges the plurality of document elements extracted by the extraction function based on connection relation information defining a connection relation for arranging a plurality of document elements and, when a connected document element has been extracted by the causal relationship extraction function, supplements the arranged document elements with the extracted connected document element to create summary data.
[0013]
In the present invention, document elements registered in the dictionary information are grouped. In addition, causal relationship information is defined in units of groups. When a new document element is registered in the dictionary information, it is not necessary to update the causal relationship information if the group can be identified based on the element ID of the new document element.
[0014]
Therefore, the causal relationship information registration work can be made efficient.
[0015]
In the present invention, a part of the element ID may be group identification information representing a predetermined group.
[0016]
In the present invention, the computer may further be caused to realize a function for inputting a document element to be registered in the dictionary information and an element ID that partially includes group identification information capable of identifying the group to which the document element belongs, and for registering the input document element and its element ID in association with each other in the dictionary information of the database. This supports the work of registering dictionary information.
[0017]
In the present invention, the computer may further be caused to realize a function for inputting a combination of group identification information to be registered in the causal relationship information and a connected document element, and for registering the input combination of group identification information and the connected document element in association with each other in the causal relationship information of the database. This supports the work of registering causal relationship information.
[0018]
By using the summary creation program described above and a recording medium on which the program is recorded, the above-described functions can easily be added to computers, computer systems, servers, clients, and the like that do not have them.
[0019]
The summary creation method implemented by the summary creation program of the present invention may be the subject of the invention.
[0020]
Further, a summary creation system including means that operates in the same manner as the function realized by the summary creation program of the present invention may be the subject of the invention.
[0021]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the drawings shown below, the same portions are denoted by the same reference numerals and description thereof is omitted.
[0022]
(First embodiment)
In this embodiment, document elements (concepts), which are the elements constituting a document, are grouped and registered in a concept definition dictionary, and connected document elements, which are document elements for connecting document elements, are registered in a causal relationship definition dictionary in association with combinations of groups.
[0023]
FIG. 1 is a block diagram showing an example of the configuration of a summary creation program and a text mining system (summary creation system) according to the present embodiment.
[0024]
The text mining system 1 reads and executes the summary creation program 3 recorded on the recording medium 2.
[0025]
When started on the text mining system 1, the summary creation program 3 realizes the input function 4, the concept extraction function 5, the causal relationship extraction function 6, the creation function 7, the output function 8, the concept registration function 9, and the causal relationship registration function 10.
[0026]
The text mining system 1 accesses the database 11 according to the operation of the summary creation program 3.
[0027]
The database 11 manages a concept definition dictionary 12 and a causal relationship definition dictionary 13.
[0028]
The concept definition dictionary 12 mainly includes an action dictionary 12a and a result dictionary 12b. For example, document elements defined as elements to be extracted for inclusion in the summary are registered in the concept definition dictionary 12.
[0029]
Table 1 shows an example of the action dictionary 12a.
[0030]
[Table 1]

[0031]
In the action dictionary 12a, special day names in the calendar such as “Father's Day” and “Mother's Day” are registered as document elements. The document elements “Father's Day” and “Mother's Day” are associated with element IDs (concept IDs) “A001” and “A002”, respectively.
[0032]
Here, it is assumed that a document element whose element ID starts with “A” belongs to a group indicating a calendar.
[0033]
Similarly, in the action dictionary 12a, types of sales-related events such as “special sale” and “tasting party” are registered as document elements. Element IDs “B001” and “B002” are associated with the document elements “special sale” and “tasting party”, respectively.
[0034]
Here, it is assumed that a document element whose element ID starts with “B” belongs to a group indicating an event.
[0035]
Table 2 shows an example of the result dictionary 12b.
[0036]
[Table 2]

[0037]
In the result dictionary 12b, document elements indicating a high evaluation, such as “selling well” and “popular”, are registered. The element IDs “1001” and “1002” are associated with the document elements “selling well” and “popular”, respectively.
[0038]
Here, it is assumed that the document element whose element ID starts with “1” belongs to the high evaluation group.
[0039]
Similarly, in the result dictionary 12b, document elements indicating a low evaluation, such as “not selling” and “not popular”, are registered. The element IDs “2001” and “2002” are associated with the document elements “not selling” and “not popular”, respectively.
[0040]
Here, it is assumed that the document element whose element ID starts with “2” belongs to the low evaluation group.
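The dictionary layout described in paragraphs [0031] to [0040] can be pictured with a minimal Python sketch. The variable names, the use of plain in-memory dicts, and the `group_of` helper are assumptions made for illustration; the patent only states that the dictionaries are held in the database 11 and that the leading character of the element ID identifies the group. The original Japanese document elements are used as keys, with English glosses in comments.

```python
# Hypothetical in-memory stand-ins for the action dictionary 12a and the
# result dictionary 12b (document element -> element ID).
ACTION_DICTIONARY = {
    "父の日": "A001",  # Father's Day
    "母の日": "A002",  # Mother's Day
    "特売": "B001",    # special sale
    "試飲会": "B002",  # tasting party
}
RESULT_DICTIONARY = {
    "売れている": "1001",    # selling well
    "人気がある": "1002",    # popular
    "売れていない": "2001",  # not selling
    "人気がない": "2002",    # not popular
}

# Group meanings as given in the description.
GROUP_NAMES = {
    "A": "calendar",
    "B": "event",
    "1": "high evaluation",
    "2": "low evaluation",
}


def group_of(element_id: str) -> str:
    """Return the group identification information of an element ID.

    In this embodiment the leading character identifies the group.
    """
    return element_id[0]
```

For example, `group_of(RESULT_DICTIONARY["人気がない"])` yields `"2"`, which `GROUP_NAMES` maps to the low evaluation group.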
[0041]
The causal relationship definition dictionary 13 registers connected document elements in association with combinations of group identification information capable of identifying groups. A connected document element is a document element for connecting document elements.
[0042]
Table 3 shows an example of the causal relationship definition dictionary 13.
[0043]
[Table 3]

[0044]
For example, the connected document element “so” (「なので」 in the Japanese original) is associated with the combination of the group identification information “A***” and the group identification information “1***”. The document elements “Father's Day” and “Mother's Day”, whose element IDs begin with “A”, belong to the group indicated by the group identification information “A***”. The same applies to the other combinations of group identification information.
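As a rough illustration of a group-keyed causal relationship definition dictionary, the sketch below stores only the two entries that the description itself mentions (“A***” with “1***”, and “B***” with “2***”); the full contents of Table 3 are not reproduced in this text. The names `CAUSAL_DICTIONARY` and `group_pattern` are hypothetical.

```python
# Hypothetical stand-in for the causal relationship definition dictionary 13:
# (action group pattern, result group pattern) -> connected document element.
CAUSAL_DICTIONARY = {
    ("A***", "1***"): "なので",    # roughly "so" / "because it is"
    ("B***", "2***"): "したのに",  # roughly "even though ... was held"
}


def group_pattern(element_id: str) -> str:
    """Turn an element ID such as 'A002' into its group pattern 'A***'."""
    return element_id[0] + "*" * (len(element_id) - 1)


# One group-level entry serves every element of the group:
# Mother's Day (A002) combined with "popular" (1002) resolves to the same
# connector as Father's Day (A001) combined with "selling well" (1001).
connector = CAUSAL_DICTIONARY[(group_pattern("A002"), group_pattern("1002"))]
```

Keying on the group pattern rather than on the full element ID is what lets a single entry cover all current and future members of a group.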
[0045]
The input function 4 inputs daily report data to be summarized.
[0046]
Table 4 shows an example of daily report data.
[0047]
[Table 4]

[0048]
Daily report data is document data to which a daily report number has been assigned. Here, assume that two items of daily report data are input: one in which the daily report number “N001” is assigned to the document data “Potato chips are selling well for Father's Day”, and one in which the daily report number “N002” is assigned to the document data “Biscuit special sale, not popular”.
[0049]
The concept extraction function 5 accesses the concept definition dictionary 12 and extracts document elements registered in the concept definition dictionary 12 and their element IDs from the input daily report data.
[0050]
As a result, as shown in FIG. 2, the document element “Father's Day” and its element ID “A001”, and the document element “selling well” and its element ID “1001”, are extracted from the daily report data with the daily report number “N001”.
[0051]
Meanwhile, the document element “special sale” and its element ID “B001”, and the document element “not popular” and its element ID “2002”, are extracted from the daily report data with the daily report number “N002”.
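The extraction step of paragraphs [0049] to [0051] can be sketched as a dictionary scan. Plain substring matching is an assumption made here for brevity; the description does not say how document elements are located in the text, and a practical system might rely on morphological analysis instead. The function and variable names are hypothetical.

```python
# Hypothetical combined view of the concept definition dictionary 12.
CONCEPT_DICTIONARY = {
    "父の日": "A001", "母の日": "A002", "特売": "B001", "試飲会": "B002",
    "売れている": "1001", "人気がある": "1002",
    "売れていない": "2001", "人気がない": "2002",
}


def extract_concepts(text: str, dictionary: dict) -> list:
    """Return (document element, element ID) pairs for every registered
    document element that occurs in the text (simple substring test)."""
    return [(element, element_id)
            for element, element_id in dictionary.items()
            if element in text]


# Daily report data from Table 4 as quoted in the description.
daily_reports = {
    "N001": "父の日でポテトチップスが売れている",
    "N002": "ビスケット特売、人気がない",
}
extracted = {number: extract_concepts(text, CONCEPT_DICTIONARY)
             for number, text in daily_reports.items()}
```

With these inputs, `extracted["N001"]` holds the pairs for “父の日” and “売れている”, and `extracted["N002"]` holds the pairs for “特売” and “人気がない”, matching FIG. 2 as described.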
[0052]
The causal relationship extraction function 6 accesses the causal relationship definition dictionary 13 and, if the combination of group identification information corresponding to the combination of element IDs extracted by the concept extraction function 5 is registered in the causal relationship definition dictionary 13, extracts the connected document element associated with that combination of group identification information.
[0053]
For example, since the combination of group identification information “A***” and “1***” corresponding to the combination of element IDs “A001” and “1001” is registered in the causal relationship definition dictionary 13, the connected document element “so” associated with the combination of the group identification information “A***” and “1***” is extracted.
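The lookup performed by the causal relationship extraction function 6 might look like the following sketch. It assumes a dictionary keyed by the leading characters of the two element IDs and returns `None` when no entry is registered; both choices, and the name `find_connector`, are illustrative rather than taken from the patent. Only the two combinations named in the description are included.

```python
from typing import Optional

# Hypothetical causal relationship definition dictionary keyed by
# (action group, result group); only entries mentioned in the text.
CAUSAL_DICTIONARY = {
    ("A", "1"): "なので",
    ("B", "2"): "したのに",
}


def find_connector(action_id: str, result_id: str) -> Optional[str]:
    """Map two element IDs to their groups and return the registered
    connected document element, or None if the combination is absent."""
    key = (action_id[0], result_id[0])
    return CAUSAL_DICTIONARY.get(key)
```

Calling `find_connector("A001", "1001")` returns “なので”. In this reduced sketch `find_connector("A001", "2001")` returns `None`, simply because that combination was not included here.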
[0054]
The creation function 7 arranges the document elements extracted by the concept extraction function 5 by means of an AND search in accordance with the classification axis “Result & Action”, which is connection rule information defining the connection relation among a plurality of document elements, and creates summary data.
[0055]
For example, for the daily report data with the daily report number “N001”, the document elements “Father's Day” and “selling well” are arranged according to the classification axis, and the summary data “Father's Day & selling well” is created. Similarly, for the daily report data with the daily report number “N002”, the document elements “special sale” and “not popular” are arranged according to the classification axis, and the summary data “special sale & not popular” is created.
[0056]
Further, when the connected document element is extracted from the causal relationship definition dictionary 13, the creation function 7 supplements the “&” portion of the classification axis with the connected document element, and creates summary data.
[0057]
For example, as shown in FIG. 3, since the element ID “A001” and the element ID “1001” are extracted from the daily report data with the daily report number “N001”, the connected document element “so”, which is associated with the group identification information “A***” and “1***” in the causal relationship definition dictionary 13, is supplied, and the summary data “Father's Day, so selling well” is created.
[0058]
Similarly, since the element ID “B001” and the element ID “2002” are extracted from the daily report data with the daily report number “N002”, the connected document element “yet” (「したのに」 in the Japanese original), which is associated with the group identification information “B***” and “2***” in the causal relationship definition dictionary 13, is supplied, and the summary data “special sale, yet not popular” is created.
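The behaviour of the creation function 7 in paragraphs [0054] to [0058] amounts to joining the two document elements with “＆” and replacing that separator with the connected document element when one was found. A minimal sketch follows; the function name and signature are assumptions, and the original Japanese strings are used because simple concatenation reproduces the summaries quoted in the source.

```python
from typing import Optional


def create_summary(action: str, result: str,
                   connector: Optional[str] = None) -> str:
    """Arrange the action and result elements; if a connected document
    element is available, use it in place of the '＆' separator."""
    if connector is None:
        return f"{action}＆{result}"
    return f"{action}{connector}{result}"


# Without a connector the classification-axis form is produced.
plain = create_summary("特売", "人気がない")
# With connectors, the summaries described for FIG. 3 are produced.
n001 = create_summary("父の日", "売れている", "なので")
n002 = create_summary("特売", "人気がない", "したのに")
```

Keeping the connector optional means reports whose group combination is not registered still receive a usable “＆” summary.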
[0059]
The concept registration function 9 displays a screen for prompting input of a document element to be newly registered, its element ID, and whether to register in the action dictionary 12a or the result dictionary 12b. The concept registration function 9 registers the input new document element and its element ID on the designated side of the action dictionary 12a and the result dictionary 12b.
[0060]
The causal relationship registration function 10 displays a screen for prompting input of a combination of group identification information to be newly registered and a connected document element. Then, the causal relationship registration function 10 registers the input combination of the new group identification information and the connected document element in the causal relationship definition dictionary 13.
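The two registration functions can be sketched as simple update operations on the dictionaries. The screens that prompt for input are omitted, and all names below, including the example document element “敬老の日” (Respect for the Aged Day) and its ID “A003”, are hypothetical additions for illustration. The point of the sketch is that registering a new document element under an existing group requires no change to the causal relationship definition dictionary.

```python
# Hypothetical in-memory dictionaries with a few initial entries.
concept_dictionary = {
    "action": {"父の日": "A001"},
    "result": {"売れている": "1001"},
}
causal_dictionary = {("A", "1"): "なので"}


def register_concept(side: str, element: str, element_id: str) -> None:
    """Register a document element in the action or result dictionary."""
    if side not in concept_dictionary:
        raise ValueError("side must be 'action' or 'result'")
    concept_dictionary[side][element] = element_id


def register_causal(action_group: str, result_group: str,
                    connector: str) -> None:
    """Register a connected document element for a group combination."""
    causal_dictionary[(action_group, result_group)] = connector


def connector_for(action: str, result: str):
    """Look up the connector via the groups of the two elements."""
    action_id = concept_dictionary["action"][action]
    result_id = concept_dictionary["result"][result]
    return causal_dictionary.get((action_id[0], result_id[0]))


# A new calendar element (illustrative) reuses the existing A/1 entry.
register_concept("action", "敬老の日", "A003")
```

After this call, `connector_for("敬老の日", "売れている")` returns “なので” even though `causal_dictionary` still holds a single entry.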
[0061]
FIG. 4 is a flowchart showing an example of the flow of processing from when daily report data is input to when summary data is output.
[0062]
In step S1, daily report data is input.
[0063]
In step S2, the document element registered in the concept definition dictionary 12 and included in the daily report data and its element ID are extracted.
[0064]
In step S3, it is determined whether there is a combination of group identification information corresponding to the extracted combination of element IDs. If there is a corresponding combination, step S4 is executed.
[0065]
In step S4, the connected document element associated with the combination of group identification information corresponding to the extracted combination of element IDs is extracted from the causal relationship definition dictionary 13.
[0066]
In step S5, summary data is created by combining the extracted document elements according to the classification axis.
[0067]
In step S6, the created summary data is output.
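Steps S1 to S6 can be strung together in one short sketch. It assumes substring matching for extraction, takes the first matching action and result element of each report, and returns `None` when either kind is missing; none of these details is specified by the flowchart, and all names are hypothetical.

```python
from typing import Optional

ACTIONS = {"父の日": "A001", "母の日": "A002", "特売": "B001", "試飲会": "B002"}
RESULTS = {"売れている": "1001", "人気がある": "1002",
           "売れていない": "2001", "人気がない": "2002"}
# Only the group combinations named in the description are included.
CAUSAL = {("A", "1"): "なので", ("B", "2"): "したのに"}


def summarize(report_text: str) -> Optional[str]:
    # S2: extract registered document elements and their element IDs.
    actions = [(e, i) for e, i in ACTIONS.items() if e in report_text]
    results = [(e, i) for e, i in RESULTS.items() if e in report_text]
    if not actions or not results:
        return None  # assumption: both kinds of element are required
    (action, action_id), (result, result_id) = actions[0], results[0]
    # S3/S4: check the group combination and fetch the connector, if any.
    connector = CAUSAL.get((action_id[0], result_id[0]))
    # S5: combine the elements, using the connector in place of '＆'.
    if connector is None:
        return f"{action}＆{result}"
    return f"{action}{connector}{result}"


def run(daily_reports: dict) -> dict:
    # S1: take daily report data as input; S6: return the summary data.
    return {number: summarize(text) for number, text in daily_reports.items()}


summaries = run({
    "N001": "父の日でポテトチップスが売れている",
    "N002": "ビスケット特売、人気がない",
})
```

For the two sample reports this yields “父の日なので売れている” for N001 and “特売したのに人気がない” for N002, the summaries described for FIG. 3.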
[0068]
In the present embodiment described above, an element ID that can identify a group for classifying document elements is attached to the document element and registered in the concept definition dictionary 12.
[0069]
In the causal relationship definition dictionary 13, connected document elements are defined in units of groups of document elements.
[0070]
Thus, it is not necessary to define a connected document element for each combination of document elements and register it in the causal relationship definition dictionary 13; it suffices to define connected document elements in units of groups.
[0071]
Therefore, the registration work in the causal relationship definition dictionary 13 can be made efficient.
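The saving can be made concrete with a small count. The sketch below compares the worst case in which every action and result pairing needs its own entry against one entry per group pairing; treating every pairing as requiring an entry is an assumption made only to illustrate the scaling, and the function names are hypothetical.

```python
def entries_per_element_pair(n_actions: int, n_results: int) -> int:
    """Worst-case entries when each element combination is registered."""
    return n_actions * n_results


def entries_per_group_pair(n_action_groups: int, n_result_groups: int) -> int:
    """Worst-case entries when each group combination is registered."""
    return n_action_groups * n_result_groups


# The example dictionaries hold 4 action and 4 result elements,
# organised into 2 action groups (A, B) and 2 result groups (1, 2).
by_element = entries_per_element_pair(4, 4)
by_group = entries_per_group_pair(2, 2)

# Adding a fifth action element to an existing group:
extra_by_element = entries_per_element_pair(5, 4) - by_element
extra_by_group = entries_per_group_pair(2, 2) - by_group
```

Under these assumptions the element-level scheme grows with every new document element, while the group-level scheme grows only when a new group is introduced.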
[0072]
In the element ID, the group to which the document element belongs can be identified by the head part, but the group may be identifiable by another part that is not the head part. Further, the document element may belong to a plurality of groups.
[0073]
Further, as shown in FIG. 5, the creation function 7 may create summary data after converting the extracted document elements into general document elements. For example, assume that the document elements “sold” (売れた), “moved” (動いた), and “is selling” (売れています), when extracted, are converted into the document element “selling well” (売れている) used for creating a summary. Likewise, the document elements “sales” (セールス) and “bargain sale” (安売り), when extracted, are converted into “special sale” (特売).
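The conversion to general document elements can be sketched as a lookup table applied before the summary is assembled. The table contents come from the example in this paragraph; the names `NORMALIZATION` and `normalize`, and the choice to pass unknown elements through unchanged, are assumptions.

```python
# Variant document element -> general document element used in summaries.
NORMALIZATION = {
    "売れた": "売れている",      # sold -> selling well
    "動いた": "売れている",      # moved -> selling well
    "売れています": "売れている",  # is selling -> selling well
    "セールス": "特売",          # sales -> special sale
    "安売り": "特売",            # bargain sale -> special sale
}


def normalize(element: str) -> str:
    """Return the general form of a document element, or the element
    itself when no conversion is registered."""
    return NORMALIZATION.get(element, element)


normalized = [normalize(e) for e in ["安売り", "売れた", "父の日"]]
```

Here `normalized` becomes `["特売", "売れている", "父の日"]`, so differently worded reports collapse to the same summary vocabulary.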
[0074]
The summary creation program 3 can also be transmitted over a communication medium and applied to a computer. The operation of the computer that has read the summary creation program 3 is controlled by the summary creation program 3 to realize the above-described functions.
[0075]
Moreover, the function by the summary creation program 3 and the registered contents of the database 11 may be freely combined, or may be divided into a plurality of elements.
[0076]
The summary creation program 3 may be distributed over a plurality of computers and operate in cooperation.
[0077]
(Second Embodiment)
In the present embodiment, specific usage modes of the summary creation program 3 and the text mining system 1 according to the first embodiment will be described.
[0078]
FIG. 6 is a block diagram illustrating a first example of a usage mode of the summary creation program 3 and the text mining system 1.
[0079]
The client 15 managed by the user 14 and the text mining system 1 managed by the service provider 16 are connected via a network 17 such as the Internet.
[0080]
The text mining system 1 accesses the database 11 according to the operation of the summary creation program 3.
[0081]
The user 14 transmits document data such as company daily report data, monthly report data, and business report data from the client 15 to the text mining system 1 via the network 17. Then, the text mining system 1 creates summary data and transmits the created summary data to the client 15 via the network 17.
[0082]
Further, the user 14 transmits a dictionary update command from the client 15 to the text mining system 1 via the network 17. Then, the text mining system 1 updates the contents of the database 11 according to the update command.
[0083]
The number of document data such as company daily report data, monthly report data, and business report data becomes enormous. The user 14 can create summary data from such a large number of document data, and can improve the efficiency of data management. In addition, the user 14 can create summary data efficiently without having to maintain and operate the summary creation program 3 by receiving the summary creation service of the service provider 16.
[0084]
On the other hand, the service provider 16 can obtain a service fee from the user 14.
[0085]
FIG. 7 is a block diagram illustrating a second example of a usage mode of the summary creation program 3 and the text mining system 1.
[0086]
The service provider 16 provides or rents the summary creation program 3 or the text mining system 1 and the database 11 to the user 14.
[0087]
Further, the service provider 16 performs maintenance of the provided or rented summary creation program 3 or the text mining system 1 and the database 11.
[0088]
As a result, the service provider 16 can obtain a provision fee or rental fee, as well as a maintenance fee, for the summary creation program 3 and the like.
[0089]
[Effect of the Invention]
As described above in detail, in the present invention, the document elements registered in the dictionary information are grouped, and in the causal relationship information, the connected document elements are registered for this group combination.
[0090]
Therefore, by designating a group when registering a new document element in the dictionary information, summary data can be created using causal relationship information that has already been registered, and the registration work for the causal relationship definition dictionary can be made more efficient.
[Brief description of the drawings]
FIG. 1 is a block diagram showing an example of the configuration of a summary creation program and a text mining system according to a first embodiment of the present invention.
FIG. 2 is a diagram showing an example of an extracted document element and its element ID.
FIG. 3 is a diagram showing an example of summary data created based on an extracted document element and a connected document element.
FIG. 4 is a flowchart showing an example of the flow of processing from when daily report data is input to when summary data is output.
FIG. 5 is a diagram illustrating an example of a relationship between an extracted document element and a document element used for creating a summary.
FIG. 6 is a block diagram showing a first example of a usage mode of a summary creation program and a text mining system according to a second embodiment of the present invention.
FIG. 7 is a block diagram showing a second example of a usage mode of the summary creation program and the text mining system according to the same embodiment.
[Explanation of symbols]
1 ... Text mining system
2 ... Recording medium
3 ... Summary creation program
4 ... Input function
5 ... Concept extraction function
6 ... Causal relationship extraction function
7 ... Creation function
8 ... Output function
9 ... Concept registration function
10 ... Causal relationship registration function
11 ... Database
12 ... Concept definition dictionary
12a ... Action dictionary
12b ... Result dictionary
13 ... Causal relationship definition dictionary

Claims

A summary creation program for causing a computer to realize:
an input function by which the computer inputs document data to be summarized;
an extraction function that refers to dictionary information, which is stored in a database and associates each document element constituting document data with an element ID that includes, as a part thereof, group identification information capable of identifying the group to which that document element belongs, and that extracts a plurality of document elements, together with their element IDs, that are included both in the document data to be summarized input to the computer and in the dictionary information;
a causal relationship extraction function that refers to causal relationship information, which is stored in the database and associates a combination of group identification information, each being a part of an element ID, with a connected document element used for connecting document elements, determines whether the combination of group identification information included in the plurality of element IDs extracted by the extraction function is registered in the causal relationship information, and, if it is registered, extracts the connected document element associated with that combination; and
a creation function that arranges the plurality of document elements extracted by the extraction function based on combination rule information defining a connection relationship for arranging a plurality of document elements, and, when a connected document element has been extracted by the causal relationship extraction function, supplements the arranged document elements with the extracted connected document element to create summary data.

The summary creation program according to claim 1,
further causing the computer to realize
a function by which the computer inputs a document element to be registered in the dictionary information and an element ID that includes, as a part thereof, group identification information capable of identifying the group to which the document element belongs, associates the input document element with the element ID, and registers them in the dictionary information in the database.

The summary creation program according to claim 1 or 2,
further causing the computer to realize
a function by which the computer inputs a combination of group identification information to be registered in the causal relationship information and a connected document element, associates the input combination of group identification information with the connected document element, and registers them in the causal relationship information in the database.

A summary creation system that creates summary data for document data to be summarized, comprising:
input means by which the system inputs the document data to be summarized;
extraction means that refers to dictionary information, which is stored in a database and associates each document element constituting document data with an element ID that includes, as a part thereof, group identification information capable of identifying the group to which that document element belongs, and that extracts a plurality of document elements, together with their element IDs, that are included both in the document data to be summarized input to the system and in the dictionary information;
causal relationship extraction means that refers to causal relationship information, which is stored in the database and associates a combination of group identification information, each being a part of an element ID, with a connected document element used for connecting document elements, determines whether the combination of group identification information included in the plurality of element IDs extracted by the extraction means is registered in the causal relationship information, and, if it is registered, extracts the connected document element associated with that combination; and
creation means that arranges the plurality of document elements extracted by the extraction means based on combination rule information defining a connection relationship for arranging a plurality of document elements, and, when a connected document element has been extracted by the causal relationship extraction means, supplements the arranged document elements with the extracted connected document element to create summary data.

A method by which a computer creates summary data for document data to be summarized, wherein:
the computer performs a process of inputting the document data to be summarized;
the computer performs a first process of referring to dictionary information, which is stored in a database and associates each document element constituting document data with an element ID that includes, as a part thereof, group identification information capable of identifying the group to which that document element belongs, and extracting a plurality of document elements, together with their element IDs, that are included both in the input document data to be summarized and in the dictionary information;
the computer performs a second process of referring to causal relationship information, which is stored in the database and associates a combination of group identification information, each being a part of an element ID, with a connected document element for connecting the document elements belonging to the respective groups of that combination, determining whether the combination of group identification information included in the plurality of element IDs extracted by the first process is registered in the causal relationship information, and, if it is registered, extracting the connected document element associated with that combination; and
the computer performs a third process of arranging the plurality of document elements extracted by the first process based on combination rule information defining a connection relationship for arranging a plurality of document elements, and, when a connected document element has been extracted by the second process, supplementing the arranged document elements with the extracted connected document element to create summary data.
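
For readers who prefer code, the first, second, and third processes recited in the method can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the element-ID format (`<group>-<serial>`), the modelling of combination rule information as a simple ordered list of groups, the use of substring matching for extraction, and every function and variable name are hypothetical.

```python
# Assumption: the group identification information is the part of the
# element ID before the first "-" (e.g. "ACT-001" -> "ACT").
def group_of(element_id):
    return element_id.split("-", 1)[0]

def first_process(text, dictionary_info):
    """Extract (document element, element ID) pairs that occur in the text."""
    return [(elem, eid) for elem, eid in dictionary_info.items() if elem in text]

def second_process(extracted, causal_info):
    """Return the connected document element registered for the group
    combination of the extracted IDs, or None if it is not registered."""
    groups = frozenset(group_of(eid) for _, eid in extracted)
    return causal_info.get(groups)

def third_process(extracted, connector, group_order):
    """Arrange elements by the (assumed) group order and, if a connector
    was extracted, insert it between the first two elements.
    Every extracted group is assumed to appear in group_order."""
    ordered = sorted(extracted, key=lambda pair: group_order.index(group_of(pair[1])))
    words = [elem for elem, _ in ordered]
    if connector is not None and len(words) >= 2:
        words.insert(1, connector)
    return " ".join(words)

def summarize(text, dictionary_info, causal_info, group_order):
    extracted = first_process(text, dictionary_info)
    connector = second_process(extracted, causal_info)
    return third_process(extracted, connector, group_order)

# Invented sample data for illustration only.
dictionary_info = {
    "visited the customer": "ACT-001",
    "received an order": "RES-001",
}
causal_info = {frozenset({"ACT", "RES"}): "and as a result"}
group_order = ["ACT", "RES"]

report = "Today I visited the customer. After the demo we received an order for two units."
print(summarize(report, dictionary_info, causal_info, group_order))
```

When only one registered element appears in the input, no group combination matches and the sketch returns that element alone, mirroring the condition that the connected document element is supplemented only when one has been extracted.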