JP4281899B2

JP4281899B2 - Question document summarizing device, question answering search device, question document summarizing program

Info

Publication number: JP4281899B2
Application number: JP2003069087A
Authority: JP
Inventors: 功難波
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2003-03-14
Filing date: 2003-03-14
Publication date: 2009-06-17
Anticipated expiration: 2023-03-14
Also published as: JP2004280323A

Description

【０００１】
【発明の属する技術分野】
本発明は、サポートセンターへの質問メール等のように、１文書が数文と極めて短く、接続詞なども用いられないような、文体が整っていない質問文書から、質問内容を表し、質問内容の把握に不可欠な一部の文（以下、重要な文ともいう）を抽出する質問文書要約装置、質問文書要約プログラムに関するものであり、また、質問文書に対する応答を検索する質問応答検索装置に関するものである。
【０００２】
【従来の技術】
現在、サポートセンター等で用いられる、質問メールに対する質問応答の検索方式には概ね３通りが存在する。
第１の検索方式は、過去の事例を質問応答の事例データベースに登録するのと同時に、事例をマッチングするための規則を登録し、その規則を用いて検索を行う方式である。規則は、基本的にはＡＮＤ，ＯＲを用いた論理式で記述されており、その条件に合致したものが検索結果として得られるというものである。第２の検索方式は、質問メールに対してそのまま全文検索を適用する方式である。第３の検索方式は、質問メールの表現に基づき質問応答検索にとって重要な文だけを抽出し、重要な文だけを用いて検索を行う方式であり、第２の検索方式の改良と言えるものである。第３の検索方式は、下記特許文献に示されるように、文末パターンを用いて文を抽出する手法が中心的である。
【０００３】
ここで、文末パターンを用いて文を抽出する手法について説明する。図１５は、日本語の質問文書の一例を示す図である。まず、質問メールとして受信された質問文書は、「。」や「？」で区切ることにより各文に分割される。次に、図１６に示す役割付与規則に従って、各文に対して役割を付与する。役割付与規則とは、文末パターンに応じて文の役割を付与する規則である。ここで説明のため、分割された各文に第１文から第８文までの文番号を振る。例えば、第１文の「現在ＰＣ−ＸＸＸを使っています。」は、文末パターンに「います」という表現が用いられていることから、第１文の役割として「行為」が付与される。図１５に示す質問文書を分割し、図１６に示した役割付与規則に従って役割を付与した結果を図１７に示す。以上の手順に従って、文末パターンを用いた文の抽出が行われる。
【０００４】
【特許文献１】
特開２００２−２７８９７７号公報
【０００５】
【発明が解決しようとする課題】
しかしながら、上述した第１の検索方式は、規則作成のコストが高いという問題点がある。また、上述した第２の検索方式は、規則作成などのコストがかからないが、質問メール中に含まれる質問とは関係のない表現、例えば挨拶、質問者の示唆、質問を書いた人の名前、住所、引用といった質問の趣旨と関係のない文に結果が左右されるという問題点がある。また、上述した第３の検索方式は、役割がついた文を重要な文として抽出することから、文末パターンの照合により質問文書の全ての文に役割が付与された場合には、検索に必要な文だけを抽出することができない。そのため、文書によっては全文検索と全く効果が変わらず、期待された検索精度が得られないという問題点がある。
【０００６】
本発明は上述した課題に鑑みてなされたものであり、質問文書から重要な文だけを抽出する質問文書要約装置、質問文書要約プログラム、また、質問文書に対する応答を検索する質問応答検索装置を提供することを目的とする。
【０００７】
【課題を解決するための手段】
上述した課題を解決するために、本発明は、質問文書から質問内容を表す文を抽出する質問文書要約装置であって、前記質問文書を受け付ける文書入力部と、前記質問文書から文を検出する文検出部と、文の情報量である文情報量を算出する文情報量算出部と、文毎の前記文情報量の変化量に基づいて前記質問文書から文の集まりであるブロックを検出するブロック検出部と、前記ブロックの性質に関する所定の規則に従って前記ブロック内の文にブロック得点を付与するブロック得点付与部と、前記文情報量と前記ブロック得点に基づいて文の重要度を表す文得点を算出する文得点算出部と、前記文得点に基づいて前記質問文書から一部の文を出力する文出力部とを備えてなるものである。
【０００８】
このような構成によれば、質問文書から重要度の高い文だけを抽出することができる。
【０００９】
また、本発明に係る質問文書要約装置において、過去の質問文書から単語を抽出して登録単語とし、前記過去の質問文書における前記登録単語の情報量を単語情報量として算出し、前記登録単語と前記単語情報量の組を格納する単語情報量辞書をさらに備え、前記文情報量算出部は、前記単語情報量辞書の登録単語を文から検索し、検索した単語情報量の総和を検索した単語の数で割ることにより、文情報量を算出することを特徴とするものである。
【００１０】
このような構成によれば、文の重要度の元となる文情報量を算出することができる。
【００１１】
また、本発明に係る質問文書要約装置において、前記ブロック検出部は、２つの文の間で前記変化量の符号が−から＋へ変化する場合に前記２つの文の間でブロックを区切ることを特徴とするものである。
【００１２】
このような構成によれば、質問文書からパラグラフに相当する文の塊であるブロックを抽出することができる。
【００１３】
また、本発明に係る質問文書要約装置において、ブロック得点付与部は、前記ブロックのうち先頭のブロックに所定の得点を前記ブロック得点として付与することを特徴とするものである。
【００１４】
このような構成によれば、ヒューリスティックな規則に応じて重要な文を抽出することができる。
【００１５】
また、本発明に係る質問文書要約装置において、ブロック得点付与部は、前記ブロックに含まれる文の文情報量の平均値を算出し、該平均値に基づいて全ての前記ブロックに順位を付け、前記ブロック毎に順位に応じた得点を前記ブロック得点として付与することを特徴とするものである。
【００１６】
また、本発明に係る質問文書要約装置において、所定の規則に従って文に役割を付与する文役割付与部をさらに備え、文得点算出部は、前記文役割付与部により付与された役割に応じて文得点を加算することを特徴とするものである。
【００１７】
このような構成によれば、「挨拶」等の役割を持つ、質問とは関係のない文を除外することができる。
【００１８】
また、本発明は、質問文書に対する応答の検索を行う質問応答検索装置であって、過去の質問文書である検索対象文書から質問内容を表す文を検索対象文として抽出するとともに、入力された質問文書から質問内容を表す文を質問文として抽出する上述した質問文書要約装置と、前記質問文書要約装置により抽出された前記検索対象文と該検索対象文に対応する応答の組を登録する検索インデックスと、前記質問文書要約装置により抽出された質問文を用いて前記検索インデックスから前記質問文と同一若しくは類似した検索対象文を検索し、対応する応答を出力する応答検索部とを備えてなるものである。
【００１９】
なお、この場合、検索対象文の抽出を第１の質問文書要約装置を用いて構成される検索対象文抽出部により行い、質問文の抽出を第２の質問文書要約装置を用いて構成される質問文抽出部により行うように構成するようにしても良い。
【００２０】
このような構成によれば、質問文書から検索に必要な質問文だけを抽出して検索を行うことにより、安定した質問応答検索が可能となる。
【００２１】
また、本発明は、質問文書から質問を表す文を抽出して質問文書を要約する処理をコンピュータに実行させるために、コンピュータにより読取可能な媒体に記憶された質問文書要約プログラムであって、前記質問文書を受け付けるステップと、前記質問文書から文を検出するステップと、文の情報量である文情報量を算出するステップと、文毎の前記文情報量の変化量に基づいて前記質問文書から文の集まりであるブロックを検出するステップと、前記ブロックの性質に関する所定の規則に従って前記ブロック内の文にブロック得点を付与するステップと、前記文情報量と前記ブロック得点に基づいて文の重要度を表す文得点を算出するステップと、前記文得点に基づいて前記質問文書から一部の文を出力するステップとをコンピュータに実行させることを特徴とするものである。
【００２２】
また、実施の形態においては、質問文書から質問を表す文を抽出する質問文書要約方法であって、前記質問文書を受け付けるステップと、前記質問文書から文を検出するステップと、文の情報量である文情報量を算出するステップと、文毎の前記文情報量の変化量に基づいて前記質問文書から文の集まりであるブロックを検出するステップと、前記ブロックの性質に関する所定の規則に従って前記ブロック内の文にブロック得点を付与するステップと、前記文情報量と前記ブロック得点に基づいて文の重要度を表す文得点を算出するステップと、前記文得点に基づいて前記質問文書から一部の文を出力するステップとを備えてなる質問文書要約方法が開示されている。
【００２３】
【発明の実施の形態】
以下、本発明の実施の形態について図面を参照して詳細に説明する。
実施の形態１．
本実施の形態においては、日本語の質問文書から質問の内容を表す重要な文を抽出する質問文書要約装置について説明する。まず、本発明の質問文書要約装置の機能の構成について説明する。図１は、本実施の形態に係る質問文書要約装置の機能の構成の一例を示すブロック図である。図１に示すように、質問文書要約装置１の機能は、文書入力部１１と、文検出部１２と、文役割付与部１３と、文情報量計算部１４と、ブロック検出部１５と、ブロック得点付与部１６と、文得点計算部１７と、文出力部１８と、役割付与規則ＤＢ（データベース）１９と、単語情報量辞書２０と、検索対象文書ＤＢ２１から構成される。
【００２４】
検索対象文書ＤＢ２１は予め、過去の事例の日本語の質問文書を蓄積している。また、単語情報量辞書２０は予め、検索対象文書ＤＢ２１中の文書を文に分割し、さらに文を単語に分割し、得られる単語を登録単語として格納している。また、単語情報量辞書２０は、検索対象文書ＤＢ２１中の全ての文の数である検索対象文数と、ある登録単語を含む文の数である単語出現文数とを数え、以下に示す式（１）を用いて登録単語の情報量である単語情報量を算出し、登録単語と単語情報量の組で格納している。
【００２５】
単語情報量＝−ｌｏｇ₂（単語出現文数／検索対象文数）・・・（１）
【００２６】
本実施の形態において単語情報量は、簡単化した式である式（１）を用いたが他の式を用いて算出しても良い。図２は、日本語に関する単語情報量辞書２０の一例を示す表である。
【００２７】
次に、質問文書要約装置１の動作について具体例を用いて説明する。ここでは、図１５に示した日本語の質問文書が質問文書要約装置１に入力された場合を例にとって説明する。図３は、本実施の形態に係る質問文書要約装置の処理の一例を示すフローチャートである。また、図４は、質問文書の各文について算出される値の一例を示す表である。図４に示すように、質問文書要約装置１の処理により、質問文書の各文に対して、文番号、役割、文情報量、変化量、ブロック番号、ブロック得点、文得点が算出される。
【００２８】
まず、文書入力部１１は質問文書の入力を受け付ける（Ｓ１）。次に、文検出部１２は入力された質問文書において、「。」や「？」等の区切りを検出することにより文を検出する（Ｓ２）。ここで文検出部１２は、文の最後は終助詞で終わる等、統計的に得られた品詞の並び方に従って分割してもよい。
【００２９】
次に、文役割付与部１３は、役割付与規則ＤＢ１９に格納されている日本語における役割付与規則に従って、各文に対して役割を付与する（Ｓ３）。ここで文役割付与部１３は、上述した特開２００１−８３５１８号公報のように、文末パターンに応じて役割を付与する。ここでは文役割付与部１３が、図１５に示す質問文書から得られる各文に対して図１６に示した役割付与規則に従って役割を付与する例について説明する。図４の表における役割の列に示すように、文役割付与部１３は文末パターンによる役割付与規則により、各文に役割を付与する。ここで役割は「挨拶」等、検索の役に立たない役割を付与してもよい。
【００３０】
次に、文情報量計算部１４は、単語情報量辞書２０における日本語の単語情報量に従って、文の情報量である文情報量を算出し、ブロック検出部１５へ出力する（Ｓ４）。文情報量の算出には、単語情報量辞書２０に予め格納されている登録単語と単語情報量を用いる。文情報量計算部１４は単語情報量辞書２０を用いて、質問文書の各文から登録単語を検索し、文毎に登録単語の単語情報量の平均値を算出し、その平均値を文情報量とする。文情報量は、以下に示す式（２）で算出される。
【００３１】
文情報量＝Σ（文中の登録単語の単語情報量）／文中の登録単語の数・・・（２）
【００３２】
例えば、第１文の「現在ＰＣ−ＸＸＸを使っています。」の登録単語は、図２の単語情報量辞書によると「ＰＣ−ＸＸＸ」と「使う」の２つであり、単語情報量はそれぞれ３と１である。従って（２）式より、第１文の文情報量＝（３＋１）／２＝２となる。このように第１文から第８文までの各文について文情報量を算出した結果を、図４の表における文情報量の列に示す。
【００３３】
次に、ブロック検出部１５は、各文の文情報量を用いて変化量を算出する（Ｓ５）。図５は、文情報量の変化量を検出する処理の一例を示すフローチャートである。まず第１文の変化量を０とすることにより変化量の初期化を行い（Ｓ１１）、対象文を第２文とする（Ｓ１２）。
【００３４】
対象文が最後の文ではない場合（Ｓ１３，Ｎｏ）、対象文の文情報量と対象文の１つ前の文の文情報量との比較を行う（Ｓ１４）。比較の結果、文情報量の差が規定の範囲内である場合（Ｓ１５，Ｙｅｓ）、対象文の変化量を０とし（Ｓ２０）、対象文を次の文とし（Ｓ１８）、処理Ｓ１３へ戻る。ここで規定の範囲とは、ヒューリスティックに例えば０．１とされる。
【００３５】
一方、比較の結果、文情報量の差が規定の範囲外であり（Ｓ１５，Ｎｏ）、対象文の文情報量の方が大きい場合（Ｓ１６，Ｙｅｓ）、対象文の変化量を＋とし（Ｓ１７）、処理Ｓ１８へ移行する。一方、対象文の文情報量の方が小さい場合（Ｓ１６，Ｎｏ）、対象文の変化量を−とし（Ｓ１９）、処理Ｓ１８へ移行する。対象文が最後の文である場合（Ｓ１３，Ｙｅｓ）、このフローを終了する。以上の変化量の検出処理を、第１文から第８文について行った結果を、図４の表における変化量の列に示す。
【００３６】
次に、ブロック検出部１５は、各文の変化量を用いて質問文書からブロックを検出する（Ｓ６）。図６は、質問文書からブロックを検出する処理の一例を示すフローチャートである。ブロック検出部１５は、まず対象文を先頭文とする（Ｓ３１）。
【００３７】
対象文が最後の文ではない場合（Ｓ３２，Ｎｏ）、対象文の変化量と対象文の次の文の変化量との比較を行う（Ｓ３３）。比較の結果、変化量が同じまたは対象文の変化量が０である場合（Ｓ３４，Ｙｅｓ）、対象文を次の文とし（Ｓ３７）、処理Ｓ３２へ戻る。一方、比較の結果、変化量が同じまたは対象文の変化量が０ではなく（Ｓ３４，Ｎｏ）、変化量が−から＋に変化している場合（Ｓ３５，Ｙｅｓ）、対象文でブロックを区切り（Ｓ３６）、処理Ｓ３７へ移行する。一方、変化量が＋から−に変化している場合（Ｓ３５，Ｎｏ）、処理Ｓ３７へ移行する。
【００３８】
対象文が最後の文である場合（Ｓ３２，Ｙｅｓ）、このフローを終了する。以上の処理により、質問文書からブロックが検出される。以上のブロックの検出を、第１文から第８文について行った結果を、図４の表におけるブロック番号の列に示す。ここでは、分割した各ブロックに１から３の番号を振り、１の番号が振られた箇所を第１ブロックとし、２の番号が振られた箇所を第２ブロックとし、３の番号が振られた箇所を第３ブロックとする。
【００３９】
次に、ブロック得点付与部１６は、ブロックの性質に応じて文にブロック得点を付与する（Ｓ７）。図７は、ブロック得点付与規則の一例を示す図である。図７に示すブロック得点付与規則は、第１ブロックに含まれる文に全文の平均文情報量の１／５を付与する、という第１ブロック優先規則である。ここで、第１ブロック優先規則は、第１ブロックは重要であるというヒューリスティックな規則を反映している。ブロック得点付与規則に従って各文に付与する得点であるブロック得点を、図４の表におけるブロック得点の列に示す。ここでは、第１パラグラフ優先規則に従って各文にブロック得点を付与したが、１ブロックあたりの文情報量の平均値を算出し、平均値に基づいて全てのブロックに順位を付け、順位に応じた得点をブロック得点として付与するようにしても良い。
【００４０】
次に、文得点計算部１７は、文毎の文得点を算出し、文出力部１８へ出力する（Ｓ８）。ここで文得点は、文の重要度を表す値であり、文毎に文情報量とブロック得点と役割に与えられた得点の総和とする。各文の文得点を算出した結果を、図４の表における文得点の列に示す。本実施の形態では、文得点を与える対象の役割を、「行為」、「障害」、「質問」としている。ここではこれらの役割以外の役割を付与されないが、例えば「挨拶」等の重要でない役割を持つ文には文得点を与えない、という規則を用いても良い。また、役割に応じて異なる得点を与えるようにしても良い。
【００４１】
最後に、文出力部１８は、文得点の高い順に文をソートして、予め指定された抽出文数だけの文を上位から抽出文として出力する（Ｓ９）。ここでは抽出文数を３とする。図８は、質問文書から得られる抽出文を示す図である。図８に示すように、抽出文は、文得点が高い順に第３文、第２文、第４文となり、質問文書の中心となる部分が抽出される。
【００４２】
以上、Ｓ１からＳ９の処理を行うことにより、本実施の形態に係る質問文書要約装置は、質問文書から重要な文だけを抽出することができる。なお、本実施の形態において、検索対象文書ＤＢ２１には予め過去の事例の質問文書を格納するとしたが、文書入力部１１へ入力された質問文書を新たに検索対象文書ＤＢ２１へ登録し、過去の事例の質問文書として蓄積するようにしても良い。
【００４３】
実施の形態２．
本実施の形態においては、英語の質問文書から重要な文を抽出する質問文書要約装置について説明する。本実施の形態に係る質問文書要約装置の機能の構成は、図１と同様である。まず、検索対象文書ＤＢ２１には予め、過去の事例の英語の質問文書が蓄積されており、単語情報量辞書２０は、実施の形態１と同様に、検索対象文書ＤＢ２１を用いて英語の単語情報量を算出し、登録単語と単語情報量の組で格納している。図９は、英語に関する単語情報量辞書の一例を示す表である。
【００４４】
次に、質問文書要約装置１の動作について具体例を用いて説明する。ここでは、図１０に示した英語の質問文書が、質問文書要約装置１に入力された場合に例をとって説明する。文書入力部１１とブロック検出部１５とブロック得点付与部１６と文得点計算部１７と文出力部１８は、実施の形態１と同様の処理を行うため、以下、文検出部１２と文役割付与部１３と文情報量算出部１４の処理、及び役割付与規則ＤＢ１９に登録される英語の役割付与規則について説明する。
【００４５】
文検出部１２は、入力された質問文書において、「．」や「？」等の区切りを検出することにより文を検出する（Ｓ２）。文役割付与部１３は、役割付与規則ＤＢ１９に格納されている英語の役割付与規則に従って、各文に対して役割を付与する（Ｓ３）。図１１は、英語における役割付与規則の一例を示す表である。日本語の文の場合は文末パターンの照合により役割を付与するとしたが、英語の文の場合は文に含まれる表現により役割を付与する。例えば、第１文の「I use PC-XXX now.」は、文中に「use」という表現が用いられていることから、第１文の役割として「action」が付与される。図１０に示す質問文書を分割し、図１１に示した役割付与規則に従って役割を付与した結果を図１２に示す。
【００４６】
文情報量算出部１４は、単語情報量辞書２０における英語の単語情報量に従って、実施の形態１と同様の手順で文情報量を算出し、ブロック検出部１５へ出力する（Ｓ４）。
【００４７】
以上、Ｓ１とＳ５からＳ９で行われる他の機能ブロックの処理に関しては、上述した実施の形態１と同様の処理が施されることにより、英語の質問文書からも重要な文だけを抽出することができる。
【００４８】
実施の形態３．
本実施の形態では、本発明の質問文書要約装置を２つ備えて構成される質問応答検索装置の構成について説明する。図１３は、本実施の形態に係る質問応答検索装置の構成の一例を示すブロック図である。図１３に示すように、質問応答検索装置３は、質問文書要約装置１Ａと、質問文書要約装置１Ｂと、検索インデックス３１と、応答検索部３２から構成される。ここで、質問文書要約装置１Ａと質問文書要約装置１Ｂは、図１の質問文書要約装置１と同じ機能の構成を持ち、同様の処理を行う。検索インデックス３１は、過去の質問文書に対する応答（回答）をその質問文書についての要約された文（検索対象文）との組合わせにおいて登録したものである。
【００４９】
次に、質問応答検索装置３の動作について説明する。まず、予め過去の事例の質問文書である検索対象文書は、質問文書要約装置１Ａへ入力される。質問文書要約装置１Ａは、検索対象文書から重要な文を抽出し、その結果を検索対象文として検索インデックス３１へ出力し、検索対象文を登録する。この検索対象文の検索インデックス３１への登録に際しては、質問文書要約装置１Ａにより抽出された検索対象文に係る質問文書に対応する応答が検索対象文と組合わせて登録される。
【００５０】
一方、利用者より受信される実際の質問文書は、質問文書要約装置１Ｂへ入力される。質問文書要約装置１Ｂは、質問文書から重要な文を抽出し、その結果を質問文として応答検索部３２へ出力する。応答検索部３２は、検索インデックス３１から質問文に類似した検索対象文を検索し、対応する応答を外部へ出力する。こうして、質問文書から検索に必要な質問文だけを抽出して検索を行うことにより、安定した質問応答検索が可能となる。
【００５１】
実施の形態４．
実施の形態３では、過去の質問文書から検索対象文を抽出する場合と、利用者から得られる質問文書から質問文を抽出する場合とで、それぞれ別個の質問文書要約装置１Ａ，１Ｂを用いるようにしたが、これらは一つの文書要約装置を兼用して行うようにしても良い。即ち、図１４に示すように、一つの文書要約装置１Ｃを用いて過去の質問文書から検索対象文を抽出して検索インデックス３１に登録した後（Ｓ１０１）、その文書要約装置１Ｃを用いて利用者から得られる質問文書から質問文を抽出し、応答検索部３２に与えるようにしても良い（Ｓ１０２）。
【００５２】
（付記１）質問文書から質問内容を表す文を抽出する質問文書要約装置であって、
前記質問文書を受け付ける文書入力部と、
前記質問文書から文を検出する文検出部と、
文の情報量である文情報量を算出する文情報量算出部と、
文毎の前記文情報量の変化量に基づいて前記質問文書から文の集まりであるブロックを検出するブロック検出部と、
前記ブロックの性質に関する所定の規則に従って前記ブロック内の文にブロック得点を付与するブロック得点付与部と、
前記文情報量と前記ブロック得点に基づいて文の重要度を表す文得点を算出する文得点算出部と、
前記文得点に基づいて前記質問文書から一部の文を出力する文出力部と
を備えてなる質問文書要約装置。
（付記２）付記１に記載の質問文書要約装置において、
過去の質問文書から単語を抽出して登録単語とし、前記過去の質問文書における前記登録単語の情報量を単語情報量として算出し、前記登録単語と前記単語情報量の組を格納する単語情報量辞書をさらに備え、
前記文情報量算出部は、前記単語情報量辞書の登録単語を文から検索し、検索した単語情報量の総和を検索した単語の数で割ることにより、文情報量を算出することを特徴とする質問文書要約装置。
（付記３）付記１または付記２に記載の質問文書要約装置において、
前記ブロック検出部は、２つの文の間で前記変化量の符号が−から＋へ変化する場合に前記２つの文の間でブロックを区切ることを特徴とする質問文書要約装置。
（付記４）付記１乃至付記３のいずれかに記載の質問文書要約装置において、
ブロック得点付与部は、前記ブロックのうち先頭のブロックに所定の得点を前記ブロック得点として付与することを特徴とする質問文書要約装置。
（付記５）付記１乃至付記３のいずれかに記載の質問文書要約装置において、
ブロック得点付与部は、前記ブロックに含まれる文の文情報量の平均値を算出し、該平均値に基づいて全ての前記ブロックに順位を付け、前記ブロック毎に順位に応じた得点を前記ブロック得点として付与することを特徴とする質問文書要約装置。
（付記６）付記１乃至付記５のいずれかに記載の質問文書要約装置において、
所定の規則に従って文に役割を付与する文役割付与部をさらに備え、
文得点算出部は、前記文役割付与部により付与された役割に応じて文得点を加算することを特徴とする質問文書要約装置。
（付記７）質問文書に対する応答の検索を行う質問応答検索装置であって、
過去の質問文書である検索対象文書から質問内容を表す文を検索対象文として抽出するとともに、入力された質問文書から質問内容を表す文を質問文として抽出する請求項１乃至請求項６のいずれかに記載の質問文書要約装置と、
前記質問文書要約装置により抽出された前記検索対象文と該検索対象文に対応する応答の組を登録する検索インデックスと、
前記質問文書要約装置により抽出された質問文を用いて前記検索インデックスから前記質問文と同一若しくは類似した検索対象文を検索し、対応する応答を出力する応答検索部と
を備えてなる質問応答検索装置。
（付記８）質問文書に対する応答の検索を行う質問応答検索装置であって、
過去の質問文書である検索対象文書から質問内容を表す文を検索対象文として抽出する付記１乃至付記６のいずれかに記載の質問文書要約装置により構成される検索対象文抽出部と、
前記検索対象文と該検索対象文に対する応答の組を登録する検索インデックスと、
入力された質問文書から質問内容を表す文を質問文として抽出する付記１乃至付記６のいずれかに記載の質問文書要約装置により構成される質問文抽出部と、前記質問文抽出部により抽出された質問文を用いて、前記検索インデックスから質問文と同一若しくは類似した検索対象文を検索し、対応する応答を出力する応答検索部と
を備えてなる質問応答検索装置。
（付記９）質問文書から質問を表す文を抽出して質問文書を要約する処理をコンピュータに実行させるために、コンピュータにより読取可能な媒体に記憶された質問文書要約プログラムであって、
前記質問文書を受け付けるステップと、
前記質問文書から文を検出するステップと、
文の情報量である文情報量を算出するステップと、
文毎の前記文情報量の変化量に基づいて前記質問文書から文の集まりであるブロックを検出するステップと、
前記ブロックの性質に関する所定の規則に従って前記ブロック内の文にブロック得点を付与するステップと、
前記文情報量と前記ブロック得点に基づいて文の重要度を表す文得点を算出するステップと、
前記文得点に基づいて前記質問文書から一部の文を出力するステップと
をコンピュータに実行させることを特徴とする質問文書要約プログラム。
（付記１０）質問文書から質問を表す文を抽出する質問文書要約方法であって、
前記質問文書を受け付けるステップと、
前記質問文書から文を検出するステップと、
文の情報量である文情報量を算出するステップと、
文毎の前記文情報量の変化量に基づいて前記質問文書から文の集まりであるブロックを検出するステップと、
前記ブロックの性質に関する所定の規則に従って前記ブロック内の文にブロック得点を付与するステップと、
前記文情報量と前記ブロック得点に基づいて文の重要度を表す文得点を算出するステップと、
前記文得点に基づいて前記質問文書から一部の文を出力するステップと
を備えてなる質問文書要約方法。
【００５３】
【発明の効果】
以上に詳述したように本発明によれば、質問メール等の質問文書から重要な文だけを抽出することが可能となる。これにより、安定した質問応答検索が可能となる。
【図面の簡単な説明】
【図１】本発明の実施の形態１に係る質問文書要約装置の機能の構成の一例を示すブロック図である。
【図２】日本語に関する単語情報量辞書の一例を示す表である。
【図３】実施の形態１に係る質問文書要約装置の処理の一例を示すフローチャートである。
【図４】質問文書の各文について算出される値の一例を示す表である。
【図５】文情報量の変化量を検出する処理の一例を示すフローチャートである。
【図６】質問文書からブロックを検出する処理の一例を示すフローチャートである。
【図７】ブロック得点付与規則の一例を示す図である。
【図８】質問文書から得られる抽出文を示す図である。
【図９】本発明の実施の形態２に係る英語に関する単語情報量辞書の一例を示す表である。
【図１０】英語の質問文書の一例を示す図である。
【図１１】英語における役割付与規則の一例を示す表である。
【図１２】英語における役割付与結果の一例を示す図である。
【図１３】本発明の実施の形態３に係る質問応答検索装置の構成の一例を示すブロック図である。
【図１４】本発明の実施の形態４に係る質問応答検索装置の構成の一例を示すブロック図である。
【図１５】日本語の質問文書の一例を示す図である。
【図１６】日本語における役割付与規則の一例を示す表である。
【図１７】日本語における役割付与結果の一例を示す図である。
【符号の説明】
１，１Ａ，１Ｂ，１Ｃ質問文書要約装置、１１文書入力部、１２文検出部、１３文役割付与部、１４文情報量計算部、１５ブロック検出部、１６ブロック得点付与部、１７文得点計算部、１８文出力部、１９役割付与規則ＤＢ、２０単語情報量辞書、２１検索対象文書ＤＢ、３質問応答検索装置、３１検索インデックス、３２応答検索部。
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a question document summarization apparatus and a question document summarization program that extract, from a question document whose style is not well-formed, such as a question email sent to a support center, in which one document is extremely short (a few sentences) and conjunctions and the like are not used, the subset of sentences that express the question content and are essential for grasping it (hereinafter also referred to as important sentences). The invention also relates to a question answering search apparatus that retrieves responses to question documents.
[0002]
[Prior art]
Currently, there are roughly three types of search methods for question responses to question emails used in support centers and the like.
The first search method registers past cases in a question-response case database and, at the same time, registers rules for matching cases, and performs the search using those rules. A rule is basically written as a logical expression using AND and OR, and the cases that satisfy its conditions are returned as the search result. The second search method applies full-text search to the question email as it is. The third search method extracts only the sentences that are important for question-response search, based on the expressions in the question email, and searches using only those important sentences; it can be regarded as an improvement on the second search method. As shown in the patent document cited below, the third search method centers on a technique that extracts sentences using sentence-end patterns.
[0003]
Here, the technique of extracting sentences using sentence-end patterns is described. FIG. 15 is a diagram showing an example of a Japanese question document. First, a question document received as a question email is divided into sentences at “。” and “？”. Next, a role is assigned to each sentence according to the role assignment rules shown in FIG. 16. A role assignment rule assigns a role to a sentence according to its sentence-end pattern. For explanation, the divided sentences are numbered from the first sentence to the eighth sentence. For example, the first sentence, 「現在ＰＣ−ＸＸＸを使っています。」 (“I am currently using PC-XXX.”), ends with the expression 「います」, so the role “action” is assigned to it. FIG. 17 shows the result of dividing the question document shown in FIG. 15 and assigning roles according to the role assignment rules shown in FIG. 16. Sentence extraction using sentence-end patterns is performed by the above procedure.
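The sentence-end matching described above can be sketched as follows. This is a minimal illustration only: the rules of FIG. 16 are not reproduced in this text, so apart from the 「います」 to “action” pair, the patterns in the hypothetical `RULES` table and the function name `assign_role` are assumptions made for the example.

```python
# Hypothetical rule table: (sentence-end pattern, role).
RULES = [
    ("います", "action"),    # taken from the example in the text
    ("ません", "trouble"),   # assumed pattern, for illustration only
    ("か", "question"),      # assumed pattern, for illustration only
]

def assign_role(sentence):
    """Return the role of the first rule whose pattern ends the sentence, else None."""
    body = sentence.rstrip("。？?")  # drop the trailing delimiter before matching
    for pattern, role in RULES:
        if body.endswith(pattern):
            return role
    return None

first_role = assign_role("現在PC-XXXを使っています。")
```

A sentence that matches no pattern receives no role here; how the prior-art method treats such sentences beyond not extracting them is not specified in the text.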
[0004]
[Patent Document 1]
JP 2002-278977 A
[0005]
[Problems to be solved by the invention]
However, the first search method described above has the problem that creating the rules is costly. The second search method incurs no cost for rule creation or the like, but its results are swayed by sentences unrelated to the point of the question, for example greetings, the questioner's suggestions, the name and address of the person who wrote the question, and quotations. The third search method extracts sentences to which a role has been assigned as important sentences. Consequently, when matching against sentence-end patterns assigns a role to every sentence in a question document, it cannot extract only the sentences needed for the search. For some documents the effect is therefore no different from full-text search, and the expected search accuracy is not obtained.
[0006]
The present invention has been made in view of the above problems, and its object is to provide a question document summarization apparatus and a question document summarization program that extract only the important sentences from a question document, and a question answering search apparatus that retrieves a response to a question document.
[0007]
[Means for Solving the Problems]
In order to solve the above problems, the present invention is a question document summarizing apparatus that extracts sentences representing the question content from a question document, comprising: a document input unit that receives the question document; a sentence detection unit that detects sentences in the question document; a sentence information amount calculation unit that calculates a sentence information amount, which is the information amount of a sentence; a block detection unit that detects blocks, each of which is a group of sentences, in the question document based on the change in the sentence information amount from sentence to sentence; a block score assignment unit that assigns a block score to the sentences in a block according to a predetermined rule concerning the properties of the block; a sentence score calculation unit that calculates a sentence score representing the importance of a sentence based on the sentence information amount and the block score; and a sentence output unit that outputs some of the sentences of the question document based on the sentence score.
[0008]
According to such a configuration, it is possible to extract only sentences with high importance from the question document.
[0009]
Further, the question document summarizing apparatus according to the present invention further comprises a word information amount dictionary that extracts words from past question documents as registered words, calculates the information amount of each registered word in the past question documents as a word information amount, and stores pairs of a registered word and its word information amount. The sentence information amount calculation unit searches a sentence for the registered words of the word information amount dictionary and calculates the sentence information amount by dividing the sum of the word information amounts of the words found by the number of words found.
[0010]
According to such a configuration, it is possible to calculate a sentence information amount that is a source of sentence importance.
[0011]
Further, in the question document summarizing apparatus according to the present invention, the block detection unit delimits a block between two sentences when the sign of the change amount changes from − to + between those two sentences.
[0012]
According to such a configuration, blocks, which are clusters of sentences corresponding to paragraphs, can be extracted from a question document.
[0013]
In the question document summarizing apparatus according to the present invention, the block score assigning unit assigns a predetermined score as the block score to the first block among the blocks.
[0014]
According to such a configuration, an important sentence can be extracted according to a heuristic rule.
[0015]
Further, in the question document summarizing apparatus according to the present invention, the block score giving unit calculates an average value of sentence information amounts of sentences included in the block, and ranks all the blocks based on the average value, A score corresponding to the rank is given to each block as the block score.
[0016]
The question document summarizing apparatus according to the present invention further comprises a sentence role assignment unit that assigns a role to each sentence according to a predetermined rule, and the sentence score calculation unit adds to the sentence score according to the role assigned by the sentence role assignment unit.
[0017]
According to such a configuration, it is possible to exclude a sentence having a role of “greeting” or the like and not related to a question.
[0018]
The present invention is also a question answering search apparatus that searches for a response to a question document, comprising: the question document summarizing apparatus described above, which extracts sentences representing the question content from search target documents, which are past question documents, as search target sentences, and extracts sentences representing the question content from an input question document as question sentences; a search index in which pairs of a search target sentence extracted by the question document summarizing apparatus and the response corresponding to that search target sentence are registered; and a response search unit that uses the question sentences extracted by the question document summarizing apparatus to search the search index for search target sentences identical or similar to the question sentences and outputs the corresponding response.
[0019]
In this case, the apparatus may be configured so that the search target sentences are extracted by a search target sentence extraction unit built using a first question document summarizing apparatus, and the question sentences are extracted by a question sentence extraction unit built using a second question document summarizing apparatus.
[0020]
According to such a configuration, it is possible to perform a stable question answer search by extracting only the question sentence necessary for the search from the question document and performing the search.
[0021]
The present invention also provides a question document summarization program stored in a computer-readable medium for causing a computer to execute a process of extracting sentences representing a question from a question document and summarizing the question document. The program causes the computer to execute the steps of: receiving the question document; detecting sentences in the question document; calculating a sentence information amount, which is the information amount of a sentence; detecting blocks, each of which is a group of sentences, in the question document based on the change in the sentence information amount from sentence to sentence; assigning a block score to the sentences in a block according to a predetermined rule concerning the properties of the block; calculating a sentence score representing the importance of a sentence based on the sentence information amount and the block score; and outputting some of the sentences of the question document based on the sentence score.
[0022]
Further, the embodiments disclose a question document summarizing method for extracting sentences representing a question from a question document, comprising the steps of: receiving the question document; detecting sentences in the question document; calculating a sentence information amount, which is the information amount of a sentence; detecting blocks, each of which is a group of sentences, in the question document based on the change in the sentence information amount from sentence to sentence; assigning a block score to the sentences in a block according to a predetermined rule concerning the properties of the block; calculating a sentence score representing the importance of a sentence based on the sentence information amount and the block score; and outputting some of the sentences of the question document based on the sentence score.
[0023]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
Embodiment 1.
In this embodiment, a question document summarizing apparatus that extracts important sentences representing the content of a question from a Japanese question document will be described. First, the functional configuration of the question document summarizing apparatus of the present invention will be described. FIG. 1 is a block diagram showing an example of the functional configuration of the question document summarizing apparatus according to the present embodiment. As shown in FIG. 1, the question document summarizing apparatus 1 consists of a document input unit 11, a sentence detection unit 12, a sentence role assignment unit 13, a sentence information amount calculation unit 14, a block detection unit 15, a block score assignment unit 16, a sentence score calculation unit 17, a sentence output unit 18, a role assignment rule DB (database) 19, a word information amount dictionary 20, and a search target document DB 21.
[0024]
The search target document DB 21 stores Japanese question documents of past cases in advance. For the word information amount dictionary 20, the documents in the search target document DB 21 are divided into sentences in advance, the sentences are further divided into words, and the resulting words are stored as registered words. In addition, the number of search target sentences, which is the number of all sentences in the search target document DB 21, and the number of word appearance sentences, which is the number of sentences containing a given registered word, are counted; the word information amount, which is the information amount of the registered word, is calculated using Formula (1) shown below; and each registered word is stored together with its word information amount.
[0025]
Word information amount = −log₂ (number of word appearance sentences / number of search target sentences) ... (1)
[0026]
In the present embodiment, the amount of word information is calculated using Formula (1), which is a simplified formula, but may be calculated using another formula. FIG. 2 is a table showing an example of the word information dictionary 20 related to Japanese.
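As a rough illustration of Formula (1), the following sketch builds a word information amount dictionary from sentences that are assumed to be already split into words. The function name `build_word_information` and the sample corpus are hypothetical; word segmentation of Japanese text (for example by a morphological analyzer) is outside the scope of the sketch.

```python
import math

def build_word_information(sentences_as_words):
    """Map each registered word to -log2(sentences containing it / all sentences)."""
    total = len(sentences_as_words)  # number of search target sentences
    counts = {}
    for words in sentences_as_words:
        for word in set(words):  # a sentence counts once per word, however often it repeats
            counts[word] = counts.get(word, 0) + 1
    return {word: -math.log2(n / total) for word, n in counts.items()}

# Hypothetical corpus of 8 sentences, already divided into words.
corpus = [
    ["PC-XXX", "use"],
    ["use", "printer"],
    ["use", "error"],
    ["use", "error"],
    ["error", "screen"],
    ["error"],
    ["mail"],
    ["mail", "send"],
]
word_info = build_word_information(corpus)
```

With this corpus, "PC-XXX" occurs in 1 of 8 sentences and "use" in 4 of 8, which gives the values 3 and 1 used in the worked example of this embodiment. Rare words receive large information amounts and common words small ones.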
[0027]
Next, the operation of the question document summarizing apparatus 1 will be described using a specific example. Here, the case where the Japanese question document shown in FIG. 15 is input to the question document summarizing apparatus 1 is taken as an example. FIG. 3 is a flowchart showing an example of the processing of the question document summarizing apparatus according to the present embodiment. FIG. 4 is a table showing an example of the values calculated for each sentence of the question document. As shown in FIG. 4, the processing of the question document summarizing apparatus 1 yields, for each sentence of the question document, a sentence number, a role, a sentence information amount, a change amount, a block number, a block score, and a sentence score.
[0028]
First, the document input unit 11 receives an input of a question document (S1). Next, the sentence detection unit 12 detects sentences by detecting delimiters such as “。” or “？” in the input question document (S2). The sentence detection unit 12 may instead divide the document according to statistically obtained part-of-speech sequences, for example the tendency of a sentence to end with a sentence-final particle.
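Delimiter-based sentence detection (S2) might look like the sketch below. It is only one possible reading: the function name `detect_sentences`, the exact delimiter set, and the sample text (other than its first sentence) are assumptions, and the statistical part-of-speech alternative mentioned above is not covered.

```python
import re

# A sentence is a run of non-delimiter characters plus an optional closing delimiter.
SENTENCE_PATTERN = re.compile(r"[^。？?]+[。？?]?")

def detect_sentences(document):
    """Split a document at 「。」 and 「？」 (or ASCII '?'), keeping the delimiter."""
    parts = SENTENCE_PATTERN.findall(document)
    return [part.strip() for part in parts if part.strip()]

sentences = detect_sentences("現在PC-XXXを使っています。印刷ができません。どうすればよいでしょうか？")
```

Trailing text without a closing delimiter is kept as its own sentence in this sketch.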
[0029]
Next, the sentence role assignment unit 13 assigns a role to each sentence according to the Japanese role assignment rules stored in the role assignment rule DB 19 (S3). Here, the sentence role assignment unit 13 assigns a role according to the sentence-end pattern, as in Japanese Patent Laid-Open No. 2001-83518 described above. An example will be described in which the sentence role assignment unit 13 assigns a role to each sentence obtained from the question document shown in FIG. 15 according to the role assignment rules shown in FIG. 16. As shown in the role column of the table in FIG. 4, the sentence role assignment unit 13 assigns a role to each sentence using the role assignment rules based on sentence-end patterns. Roles that are not useful for the search, such as “greeting”, may also be assigned.
[0030]
Next, the sentence information amount calculation unit 14 calculates the sentence information amount, which is the information amount of a sentence, from the Japanese word information amounts in the word information amount dictionary 20, and outputs it to the block detection unit 15 (S4). The calculation uses the registered words and word information amounts stored in advance in the word information amount dictionary 20. The sentence information amount calculation unit 14 searches each sentence of the question document for registered words, calculates for each sentence the average of the word information amounts of the registered words found, and takes that average as the sentence information amount. The sentence information amount is calculated by Formula (2) shown below.
[0031]
Sentence information amount = Σ (word information amounts of the registered words in the sentence) / (number of registered words in the sentence) ... (2)
[0032]
For example, according to the word information amount dictionary in FIG. 2, the first sentence, 「現在ＰＣ−ＸＸＸを使っています。」 (“I am currently using PC-XXX.”), contains two registered words, “PC-XXX” and “use” (使う), whose word information amounts are 3 and 1, respectively. Therefore, from Formula (2), the sentence information amount of the first sentence is (3 + 1) / 2 = 2. The sentence information amounts calculated in this way for the first through eighth sentences are shown in the sentence information amount column of the table in FIG. 4.
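The calculation for the first sentence can be reproduced with a short sketch of Formula (2). The function name `sentence_information` is hypothetical, the sentence is assumed to be already divided into words, and returning 0.0 for a sentence with no registered words is an assumption, since the text does not say how that case is handled.

```python
def sentence_information(words, word_info):
    """Average the word information amounts of the registered words found in a sentence."""
    found = [word_info[word] for word in words if word in word_info]
    if not found:
        return 0.0  # assumed fallback; not specified in the text
    return sum(found) / len(found)

# Word information amounts from the example: "PC-XXX" is 3 and "use" is 1.
word_info = {"PC-XXX": 3.0, "use": 1.0}
first = sentence_information(["currently", "PC-XXX", "use"], word_info)  # (3 + 1) / 2
```

Words that are not in the dictionary, such as "currently" here, are simply ignored, so they affect neither the sum nor the divisor.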
[0033]
Next, the block detection unit 15 calculates a change amount using the sentence information amount of each sentence (S5). FIG. 5 is a flowchart illustrating an example of a process for detecting a change amount of the sentence information amount. First, the change amount is initialized by setting the change amount of the first sentence to 0 (S11), and the target sentence is set as the second sentence (S12).
[0034]
When the target sentence is not the last sentence (S13, No), the sentence information amount of the target sentence is compared with that of the immediately preceding sentence (S14). If the difference in sentence information amount is within the prescribed range (S15, Yes), the change amount of the target sentence is set to 0 (S20), the next sentence becomes the target sentence (S18), and the process returns to S13. The prescribed range is set heuristically, for example to 0.1.
[0035]
On the other hand, if the difference in sentence information amount is outside the prescribed range (S15, No) and the sentence information amount of the target sentence is the larger (S16, Yes), the change amount of the target sentence is set to + (S17), and the process proceeds to S18. If the sentence information amount of the target sentence is the smaller (S16, No), the change amount of the target sentence is set to − (S19), and the process proceeds to S18. When the target sentence is the last sentence (S13, Yes), this flow ends. The result of performing this change amount detection for the first through eighth sentences is shown in the change amount column of the table in FIG. 4.
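The change amount detection of FIG. 5 can be sketched as below. This is an interpretation rather than the patented flow itself: the function name `detect_changes` and the sample values are hypothetical, the sketch assigns a change amount to every sentence including the last one, and it treats a difference exactly equal to the tolerance as "within the prescribed range".

```python
def detect_changes(info, tolerance=0.1):
    """Label each sentence '0', '+' or '-' by comparing it with the previous sentence."""
    if not info:
        return []
    changes = ["0"]  # S11: the first sentence has change amount 0
    for previous, current in zip(info, info[1:]):
        difference = current - previous
        if abs(difference) <= tolerance:
            changes.append("0")   # S20: within the prescribed range
        elif difference > 0:
            changes.append("+")   # S17: information amount increased
        else:
            changes.append("-")   # S19: information amount decreased
    return changes

# Hypothetical sentence information amounts for six sentences.
changes = detect_changes([2.0, 2.05, 3.0, 1.5, 1.5, 2.5])
```

The tolerance of 0.1 is the heuristic example value given in the text; other values would change how readily small fluctuations are ignored.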
[0036]
Next, the block detection unit 15 detects a block from the question document using the change amount of each sentence (S6). FIG. 6 is a flowchart illustrating an example of processing for detecting a block from a question document. The block detection unit 15 first sets the target sentence as the head sentence (S31).
[0037]
When the target sentence is not the last sentence (S32, No), the change amount of the target sentence is compared with the change amount of the sentence that follows it (S33). If the two change amounts are the same, or the change amount of the target sentence is 0 (S34, Yes), the target sentence is set to the next sentence (S37), and the process returns to S32. On the other hand, if the two change amounts differ and the change amount of the target sentence is not 0 (S34, No), and the change amount changes from − to + (S35, Yes), the block is divided at the target sentence (S36), and the process proceeds to S37. If instead the change amount changes from + to − (S35, No), the process proceeds to S37 without dividing the block.
[0038]
If the target sentence is the last sentence (S32, Yes), this flow ends. Through the above processing, blocks are detected from the question document. The result of performing the above block detection for the first to eighth sentences is shown in the block number column of the table in FIG. 4. Here, the numbers 1 to 3 are assigned to the divided blocks: the portion assigned the number 1 is the first block, the portion assigned the number 2 is the second block, and the portion assigned the number 3 is the third block.
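A compact sketch of the block detection in S31 to S37 follows. It assumes change amounts encoded as +1, −1 and 0 and uses hypothetical input values; the function name is also an assumption. Because S34 skips equal change amounts and a target change amount of 0, the only case that divides a block is a target sentence with − followed by a sentence with +.

```python
def detect_blocks(changes):
    """Assign a block number to each sentence; a new block starts after a -/+ boundary."""
    block_numbers = []
    current = 1
    for i, change in enumerate(changes):
        block_numbers.append(current)
        is_last = i == len(changes) - 1
        # S35/S36: divide at the target sentence when the sign goes from - to +.
        if not is_last and change == -1 and changes[i + 1] == +1:
            current += 1
    return block_numbers


# Hypothetical change amounts for eight sentences (not the values of FIG. 4).
blocks = detect_blocks([0, +1, -1, +1, -1, -1, +1, 0])
print(blocks)  # [1, 1, 1, 2, 2, 2, 3, 3]
```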
[0039]
Next, the block score giving unit 16 gives a block score to each sentence according to the property of its block (S7). FIG. 7 is a diagram illustrating an example of a block score assignment rule. The block score assignment rule shown in FIG. 7 is a first block priority rule, under which 1/5 of the average sentence information amount of all sentences is given to the sentences included in the first block. The first block priority rule reflects the heuristic that the first block is important. The block score given to each sentence according to the block score assignment rule is shown in the block score column of the table in FIG. 4. Here, the block score is assigned according to the first block priority rule; alternatively, the average sentence information amount may be calculated for each block, all blocks ranked by that average, and a score corresponding to the rank given as the block score.
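The first block priority rule can be illustrated with the short sketch below. The function name and sample values are assumptions, as is the choice to give 0 to sentences outside the first block, which the text does not state explicitly.

```python
def block_scores(info, block_numbers):
    """Give 1/5 of the mean sentence information amount to first-block sentences."""
    if not info:
        return []
    bonus = (sum(info) / len(info)) / 5
    # Assumption: sentences outside the first block receive a block score of 0.
    return [bonus if number == 1 else 0.0 for number in block_numbers]


# Hypothetical values: the mean information amount is 2.5, so the bonus is 0.5.
scores = block_scores([2.0, 4.0, 3.0, 1.0], [1, 1, 2, 2])
print(scores)  # [0.5, 0.5, 0.0, 0.0]
```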
[0040]
Next, the sentence score calculation unit 17 calculates a sentence score for each sentence and outputs it to the sentence output unit 18 (S8). Here, the sentence score is a value representing the importance of the sentence, and is the sum, for each sentence, of the sentence information amount, the block score, and the score given for the role. The result of calculating the sentence score of each sentence is shown in the sentence score column of the table in FIG. 4. In the present embodiment, the roles for which a score is given are “action”, “failure”, and “question”. No score is given here for roles other than these; alternatively, a rule may be used under which no sentence score is given to a sentence having an insignificant role such as “greeting”. A different score may also be given depending on the role.
[0041]
Finally, the sentence output unit 18 sorts the sentences in descending order of sentence score and outputs, as extracted sentences, as many sentences as the number of extracted sentences specified in advance (S9). Here, the number of extracted sentences is three. FIG. 8 is a diagram showing the extracted sentences obtained from the question document. As shown in FIG. 8, the extracted sentences are the third, second, and fourth sentences in descending order of sentence score, so the central part of the question document is extracted.
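Steps S8 and S9 might be sketched as follows. The role score of 1.0, the function names, and all sample values are assumptions for illustration; the text only states that the sentence score is the sum of the three components and that the top sentences are output in descending order.

```python
# Roles that receive a score in the embodiment.
SCORED_ROLES = {"action", "failure", "question"}


def sentence_scores(info, block_score, roles, role_score=1.0):
    """Sum sentence information amount, block score and role score per sentence."""
    return [
        i + b + (role_score if r in SCORED_ROLES else 0.0)
        for i, b, r in zip(info, block_score, roles)
    ]


def extract(sentences, scores, count=3):
    """Return the `count` highest-scoring sentences in descending score order."""
    ranked = sorted(range(len(sentences)), key=lambda k: scores[k], reverse=True)
    return [sentences[k] for k in ranked[:count]]


# Hypothetical five-sentence document (not the values of FIG. 4).
sentences = ["s1", "s2", "s3", "s4", "s5"]
scores = sentence_scores(
    [2.0, 3.0, 4.0, 2.0, 1.0],
    [0.5, 0.5, 0.5, 0.0, 0.0],
    ["action", "failure", "question", "action", "greeting"],
)
top = extract(sentences, scores)
print(top)  # ['s3', 's2', 's1']
```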
[0042]
As described above, by performing the processing from S1 to S9, the question document summarizing apparatus according to the present embodiment can extract only the important sentences from the question document. In the present embodiment, question documents of past cases are stored in advance in the search target document DB 21; however, a question document input to the document input unit 11 may also be newly registered in the search target document DB 21 and accumulated as a question document of a past case.
[0043]
Embodiment 2.
In this embodiment, a question document summarizing apparatus that extracts important sentences from an English question document will be described. The functional configuration of the question document summarizing apparatus according to the present embodiment is the same as that shown in FIG. 1. First, English question documents of past cases are stored in advance in the search target document DB 21. As in the first embodiment, the word information amount dictionary 20 uses the search target document DB 21 to calculate the word information amounts of English words and stores them as sets of a registered word and its word information amount. FIG. 9 is a table showing an example of an English word information amount dictionary.
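How such a dictionary could be built from past question documents is sketched below. The exact formula is the one defined earlier in the description and is not reproduced here; this sketch substitutes a standard inverse-document-frequency form, log(N / df), purely as an assumption consistent with a value based on the number of past question documents and the number of those documents that contain the word. The function name and sample documents are hypothetical.

```python
import math


def build_word_info(documents):
    """Map each word to log(N / df), an assumed stand-in for the word information amount."""
    n_docs = len(documents)
    doc_freq = {}
    for words in documents:
        for word in set(words):  # count each word once per document
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


# Hypothetical tokenized past question documents.
past_documents = [
    ["pc-xxx", "use"],
    ["use", "printer"],
    ["use", "error"],
    ["use", "pc-xxx"],
]
word_info = build_word_info(past_documents)
# A word found in every document carries no information under this form.
print(word_info["use"])  # 0.0
```

Under this assumed form, rarer words such as "printer" receive larger values than common ones such as "use".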
[0044]
Next, the operation of the question document summarizing apparatus 1 will be described using a specific example. Here, an example in which the English question document shown in FIG. 10 is input to the question document summarizing apparatus 1 will be described. Since the document input unit 11, the block detection unit 15, the block score assignment unit 16, the sentence score calculation unit 17, and the sentence output unit 18 perform the same processing as in the first embodiment, the following describes the processing of the sentence detection unit 12, the sentence role assignment unit 13, and the sentence information amount calculation unit 14, together with the English role assignment rules registered in the role assignment rule DB 19.
[0045]
The sentence detection unit 12 detects sentences by detecting a delimiter such as “.” or “?” in the input question document (S2). The sentence role assignment unit 13 assigns a role to each sentence in accordance with the English role assignment rules stored in the role assignment rule DB 19 (S3). FIG. 11 is a table showing an example of role assignment rules in English. For a Japanese sentence, the role is assigned by matching the sentence end pattern; for an English sentence, the role is assigned according to an expression included in the sentence. For example, the first sentence, “I use PC-XXX now.”, is given the role “action” because the expression “use” appears in the sentence. FIG. 12 shows the result of dividing the question document shown in FIG. 10 and assigning roles in accordance with the role assignment rules shown in FIG. 11.
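The English sentence detection and role assignment might look like the sketch below. Only the rule mapping “use” to “action” and the first sample sentence come from the text; the other rules, the other sentences, and the function names are hypothetical, and the actual rules are those of FIG. 11.

```python
import re

# Only ("use", "action") is stated in the text; the other rules are hypothetical.
ROLE_RULES = [("use", "action"), ("cannot", "failure"), ("how", "question")]


def detect_sentences(document):
    """Split on "." or "?"; trailing text without a delimiter is ignored."""
    return [part.strip() for part in re.findall(r"[^.?]+[.?]", document)]


def assign_role(sentence):
    """Return the role of the first rule whose expression occurs as a whole word."""
    lowered = sentence.lower()
    for expression, role in ROLE_RULES:
        if re.search(r"\b" + re.escape(expression) + r"\b", lowered):
            return role
    return None  # no role assigned


document = "I use PC-XXX now. The screen cannot be displayed. How can I fix it?"
sentences = detect_sentences(document)
roles = [assign_role(s) for s in sentences]
print(roles)  # ['action', 'failure', 'question']
```

Matching on whole words avoids assigning “action” to a sentence that merely contains “because”, a pitfall of plain substring matching.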
[0046]
The sentence information amount calculation unit 14 calculates the sentence information amount according to the same procedure as in the first embodiment in accordance with the English word information amount in the word information amount dictionary 20, and outputs the sentence information amount to the block detection unit 15 (S4).
[0047]
As for the processing of the other functional blocks performed in S1 and S5 to S9, performing the same processing as in the first embodiment described above makes it possible to extract only the important sentences from the English question document.
[0048]
Embodiment 3.
In the present embodiment, the configuration of a question answering search apparatus that includes two question document summarizing apparatuses according to the present invention will be described. FIG. 13 is a block diagram showing an example of the configuration of the question answering search apparatus according to the present embodiment. As shown in FIG. 13, the question answering search apparatus 3 includes a question document summarizing apparatus 1A, a question document summarizing apparatus 1B, a search index 31, and a response search unit 32. Here, the question document summarizing apparatus 1A and the question document summarizing apparatus 1B have the same functional configuration as the question document summarizing apparatus 1 in FIG. 1 and perform the same processing. The search index 31 registers a response (answer) to a past question document in combination with the summary sentences (search target sentences) of that question document.
[0049]
Next, the operation of the question answering search device 3 will be described. First, a search target document that is a question document of a past case is input in advance to the question document summarizing apparatus 1A. The question document summarizing apparatus 1A extracts an important sentence from the search target document, outputs the result as a search target sentence to the search index 31, and registers the search target sentence. When the search target sentence is registered in the search index 31, a response corresponding to the question document related to the search target sentence extracted by the question document summarizing apparatus 1A is registered in combination with the search target sentence.
[0050]
On the other hand, the actual question document received from the user is input to the question document summarizing apparatus 1B. The question document summarizing apparatus 1B extracts an important sentence from the question document and outputs the result to the response search unit 32 as a question sentence. The response search unit 32 searches the search index 31 for a search target sentence similar to the question sentence, and outputs a corresponding response to the outside. Thus, by extracting only the question text necessary for the search from the question document and performing the search, a stable question answer search can be performed.
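The interplay of the search index 31 and the response search unit 32 can be sketched as below. The text does not specify how similarity is measured, so the Jaccard word overlap used here is an assumption, as are the class and function names and the sample data. The summarizing apparatuses 1A and 1B are represented simply by already extracted sentences.

```python
import re


def tokenize(text):
    """Lower-case word set; a stand-in for real word extraction."""
    return set(re.findall(r"[a-z0-9-]+", text.lower()))


def similarity(a, b):
    """Jaccard overlap of two word sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class SearchIndex:
    """Pairs of search target sentences and the corresponding response."""

    def __init__(self):
        self.entries = []  # list of (word set, response)

    def register(self, search_target_sentences, response):
        self.entries.append((tokenize(" ".join(search_target_sentences)), response))

    def search(self, question_sentences):
        """Return the response whose search target sentences are most similar."""
        if not self.entries:
            return None
        query = tokenize(" ".join(question_sentences))
        best = max(self.entries, key=lambda entry: similarity(query, entry[0]))
        return best[1]


index = SearchIndex()
# Hypothetical output of summarizing apparatus 1A for two past cases.
index.register(["The screen cannot be displayed."], "Check the display cable.")
index.register(["How do I install the printer driver?"], "Download the driver.")
# Hypothetical output of summarizing apparatus 1B for a new question.
answer = index.search(["My screen cannot be displayed since yesterday."])
print(answer)  # Check the display cable.
```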
[0051]
Embodiment 4.
In the third embodiment, separate question document summarizing apparatuses 1A and 1B are used for extracting search target sentences from past question documents and for extracting question sentences from a question document obtained from a user. However, both may be performed by a single document summarizing apparatus. That is, as shown in FIG. 14, after search target sentences are extracted from past question documents using one document summarizing apparatus 1C and registered in the search index 31 (S101), the same document summarizing apparatus 1C may be used to extract question sentences from the question document obtained from the user and give them to the response search unit 32 (S102).
[0052]
(Supplementary note 1) A question document summarizing apparatus that extracts a sentence representing a question content from a question document,
A document input unit for receiving the question document;
A sentence detection unit for detecting a sentence from the question document;
A sentence information amount calculation unit that calculates a sentence information amount, which is the information amount of a sentence;
A block detection unit for detecting a block that is a collection of sentences from the question document based on a change amount of the sentence information amount for each sentence;
A block score giving unit for giving a block score to a sentence in the block according to a predetermined rule regarding the property of the block;
A sentence score calculating unit that calculates a sentence score representing importance of the sentence based on the sentence information amount and the block score;
A sentence output unit for outputting a part of the sentence from the question document based on the sentence score;
A question document summarizing apparatus comprising the above units.
(Supplementary note 2) In the question document summarizing apparatus described in supplementary note 1,
The apparatus further comprises a word information amount dictionary that extracts words from past question documents as registered words, calculates the information amount of each registered word in the past question documents as a word information amount, and stores sets of the registered word and the word information amount,
The sentence information amount calculation unit calculates the sentence information amount by searching a sentence for registered words in the word information amount dictionary and dividing the sum of the word information amounts of the words found by the number of words found.
(Supplementary note 3) In the question document summarizing apparatus described in supplementary note 1 or supplementary note 2,
The block detection unit divides a block between two sentences when the sign of the change amount changes from − to + between the two sentences.
(Supplementary note 4) In the question document summarizing device according to any one of supplementary notes 1 to 3,
The block score assigning unit assigns a predetermined score to the first block of the blocks as the block score.
(Supplementary note 5) In the question document summarizing apparatus according to any one of supplementary notes 1 to 3,
The block score assigning unit calculates the average value of the sentence information amounts of the sentences included in each block, ranks all the blocks based on the average value, and assigns, as the block score, a score corresponding to the rank of each block.
(Supplementary note 6) In the question document summarizing apparatus according to any one of supplementary notes 1 to 5,
A sentence role granting part for granting a role to the sentence according to a predetermined rule;
The sentence score calculating unit adds a sentence score according to the role given by the sentence role granting unit.
(Supplementary note 7) A question answering search device for searching for a response to a question document,
The question document summarizing apparatus described in the supplementary notes above, which extracts a sentence representing question content as a search target sentence from a search target document that is a past question document, and extracts a sentence representing question content as a question sentence from an input question document;
A search index for registering a set of the search target sentence extracted by the question document summarizing apparatus and a response corresponding to the search target sentence;
A response search unit that searches the search index for a search target sentence that is the same as or similar to the question sentence using the question sentence extracted by the question document summarization apparatus, and outputs a corresponding response;
A question answering search apparatus comprising the above.
(Supplementary note 8) A question answering search device for searching for a response to a question document,
A search target sentence extraction unit configured by the question document summarization device according to any one of supplementary notes 1 to 6 that extracts a sentence representing a question content as a search target sentence from a search target document that is a past question document;
A search index for registering the search target sentence and a set of responses to the search target sentence;
A question sentence extraction unit configured by the question document summarization device according to any one of supplementary notes 1 to 6 that extracts a sentence representing the content of a question from the input question document as a question sentence, and is extracted by the question sentence extraction unit A response search unit that searches the search index for a search target sentence that is the same as or similar to the question sentence, and outputs a corresponding response,
A question answering search apparatus comprising the above.
(Supplementary note 9) A question document summarization program stored in a computer-readable medium for causing a computer to execute a process of extracting a sentence representing a question from a question document and summarizing the question document,
Receiving the question document;
Detecting a sentence from the question document;
Calculating a sentence information amount, which is the information amount of a sentence;
Detecting a block that is a collection of sentences from the question document based on a change amount of the sentence information amount for each sentence;
Assigning a block score to a sentence in the block according to a predetermined rule relating to the nature of the block;
Calculating a sentence score representing importance of the sentence based on the sentence information amount and the block score;
Outputting a part of the sentence from the question document based on the sentence score;
A question document summarization program causing a computer to execute the above steps.
(Supplementary note 10) A question document summarizing method for extracting a sentence representing a question from a question document,
Receiving the question document;
Detecting a sentence from the question document;
Calculating a sentence information amount, which is the information amount of a sentence;
Detecting a block that is a collection of sentences from the question document based on a change amount of the sentence information amount for each sentence;
Assigning a block score to a sentence in the block according to a predetermined rule relating to the nature of the block;
Calculating a sentence score representing importance of the sentence based on the sentence information amount and the block score;
Outputting a part of the sentence from the question document based on the sentence score;
A question document summarizing method comprising the above steps.
[0053]
[Effect of the invention]
As described in detail above, according to the present invention, it is possible to extract only important sentences from a question document such as a question mail. Thereby, a stable question answer search is possible.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating an example of a functional configuration of a question document summarizing apparatus according to a first embodiment of the present invention.
FIG. 2 is a table showing an example of a word information amount dictionary related to Japanese.
FIG. 3 is a flowchart showing an example of processing of the question document summarizing apparatus according to the first embodiment.
FIG. 4 is a table showing an example of values calculated for each sentence of a question document.
FIG. 5 is a flowchart illustrating an example of processing for detecting a change amount of sentence information.
FIG. 6 is a flowchart illustrating an example of processing for detecting a block from a question document.
FIG. 7 is a diagram illustrating an example of a block score provision rule.
FIG. 8 is a diagram showing an extracted sentence obtained from a question document.
FIG. 9 is a table showing an example of an English word information dictionary according to Embodiment 2 of the present invention.
FIG. 10 is a diagram illustrating an example of an English question document.
FIG. 11 is a table showing an example of role assignment rules in English.
FIG. 12 is a diagram showing an example of a role assignment result in English.
FIG. 13 is a block diagram showing an example of a configuration of a question response search apparatus according to Embodiment 3 of the present invention.
FIG. 14 is a block diagram showing an example of a configuration of a question answering search apparatus according to Embodiment 4 of the present invention.
FIG. 15 is a diagram illustrating an example of a Japanese question document.
FIG. 16 is a table showing an example of role assignment rules in Japanese.
FIG. 17 is a diagram illustrating an example of a role assignment result in Japanese.
[Explanation of symbols]
1, 1A, 1B, 1C question document summarizing apparatus, 11 document input unit, 12 sentence detection unit, 13 sentence role assignment unit, 14 sentence information amount calculation unit, 15 block detection unit, 16 block score assignment unit, 17 sentence score calculation unit, 18 sentence output unit, 19 role assignment rule DB, 20 word information amount dictionary, 21 search target document DB, 3 question answering search apparatus, 31 search index, 32 response search unit.

Claims

A question document summarization device that extracts a sentence representing a question content from a question document,
A document input unit for receiving the question document;
A sentence detection unit for detecting a sentence from the question document;
A word information amount dictionary that extracts words from past question documents as registered words, calculates, based on the number of past question documents and the number of past question documents including each registered word, a value indicating the importance of the registered word in the past question documents as a word information amount, and stores sets of the registered word and the word information amount;
A sentence information amount calculating unit that searches a sentence for registered words in the word information amount dictionary, and calculates, as a sentence information amount, the average value of the word information amounts paired with the registered words found in the word information amount dictionary;
A block detection unit for detecting a block that is a collection of sentences from the question document based on a change amount of the sentence information amount for each sentence;
A block score giving unit for giving a block score to a sentence in the block according to a predetermined rule regarding the property of the block;
A sentence score calculating unit that calculates a sentence score representing importance of the sentence based on the sentence information amount and the block score;
A sentence output unit that outputs a part of the sentence from the question document based on the sentence score;
A question document summarizing device comprising the above units.

The question document summarization apparatus according to claim 1,
The block score assigning unit assigns a predetermined score to the first block of the blocks as the block score.

A question answering search device for searching for a response to a question document,
The question document summarization device according to claim 1 or claim 2, which extracts a sentence representing question content as a search target sentence from a search target document that is a past question document, and extracts a sentence representing question content as a question sentence from an input question document;
A search index for registering a set of the search target sentence extracted by the question document summarizing apparatus and a response corresponding to the search target sentence;
A response search unit that searches a search target sentence that is the same as or similar to the question sentence from the search index using the question sentence extracted by the question document summarizing apparatus, and outputs a corresponding response;
A question answering search apparatus comprising the above.

A question document summarization program stored in a computer-readable medium for causing a computer to execute a process of extracting a sentence representing a question from a question document and summarizing the question document,
Receiving the question document;
Detecting a sentence from the question document;
Extracting words from past question documents as registered words, calculating, based on the number of past question documents and the number of past question documents including each registered word, a value indicating the importance of the registered word in the past question documents as a word information amount, and storing sets of the registered word and the word information amount as a word information amount dictionary;
Searching a sentence for registered words in the word information amount dictionary, and calculating, as a sentence information amount, the average value of the word information amounts paired with the registered words found in the word information amount dictionary;
Detecting a block that is a collection of sentences from the question document based on a change amount of the sentence information amount for each sentence;
Assigning a block score to a sentence in the block according to a predetermined rule relating to the nature of the block;
Calculating a sentence score representing importance of the sentence based on the sentence information amount and the block score;
Outputting a part of the sentences from the question document based on the sentence score;
A question document summarization program causing a computer to execute the above steps.