JP3363501B2 - Text search device

Info

Publication number: JP3363501B2
Application number: JP00620993A
Authority: JP
Inventors: 佳代中村
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1993-01-18
Filing date: 1993-01-18
Publication date: 2003-01-08
Anticipated expiration: 2018-01-08
Also published as: JPH06215035A

Description

[Detailed Description of the Invention]

[0001] [Industrial Field of Application] The present invention relates to a text search device that, when searching texts (here, a "text" means a document consisting of one sentence or of a plurality of related sentences), extracts keywords from an input question sentence, searches for texts containing those keywords, and outputs them.

[0002] [Prior Art] In recent years, advances in computer technology and increases in the capacity of document storage devices have made databases holding large amounts of data and text widespread, and there are many occasions to use such databases.

[0003] In these databases, a large amount of data is stored in advance in a storage device such as a hard disk, and the data is generally searched on the basis of keywords.

[0004] As a keyword-based search in such a text search device, a method is used, as disclosed for example in Japanese Unexamined Patent Publication No. H2-2458, in which morphological analysis is performed on a given text to extract keywords, the texts stored in a storage device are searched on the basis of the extracted keywords, and the texts whose keywords match are output.

[0005] An example of a search method using a conventional text search device is described with reference to FIGS. 5 and 6.

[0006] FIG. 5 is a schematic configuration diagram of a conventional text search device.

[0007] In the figure, 1 is a question input unit into which a question sentence containing keywords is input; 2 is a keyword extraction unit that performs morphological analysis on the question sentence input to the question input unit 1 and extracts keywords; 3 is a keyword memory that stores the keywords extracted by the keyword extraction unit 2; 4 is a document memory in which the texts to be searched are already stored; 5 is a text extraction unit that extracts relevant texts from the document memory 4 on the basis of the keywords stored in the keyword memory 3; 6 is an output unit that outputs the texts extracted by the text extraction unit 5; and 7 is a control unit that controls the entire text search device.

[0008] FIG. 6 shows the search result output from the output unit 6 of the conventional text search device shown in FIG. 5.

[0009] The operation of the text search device shown in FIG. 5 is described below, taking as an example the case where the question sentence "土星のリングについて知りたい。" ("I want to know about Saturn's rings.") is input.

[0010] First, when the question sentence "土星のリングについて知りたい。" is input to the question input unit 1, the keyword extraction unit 2 performs morphological analysis in order to extract keywords. The question sentence input to the question input unit 1 is decomposed into morphemes as "土星／の／リング／について／知り／たい／。" (where "／" marks a boundary found by morphological analysis). Here nouns are used as keywords, so "土星" (Saturn) and "リング" (ring) are extracted as nouns. Accordingly, the keyword extraction unit 2 extracts "Saturn" and "ring" as keywords and stores them in the keyword memory 3. The search request in this question sentence concerns "Saturn's rings", and the texts most wanted are those in which the keywords "Saturn" and "ring" stand in a dependency relation.

[0011] Once the keywords are stored in the keyword memory 3, the text extraction unit 5 successively extracts the texts containing the keywords from the texts stored in the document memory 4.

[0012] Table 1 shows some of the various texts stored in the document memory 4.

[0013] [Table 1]

[0014] The text extraction unit 5 searches the texts in the document memory 4 one after another. The procedure is as follows.

[0015] The text extraction unit 5 determines, for every text in the document memory 4, whether the keyword "Saturn" is contained. First, the text "土星に関して言えば、その中のリングは土星の象徴と言える。" ("As for Saturn, the rings within it can be called the symbol of Saturn."; hereinafter, text 1) is extracted. As the search continues, the text "土星のリングは、太陽系の中で最も美しいものの一つだ。" ("Saturn's rings are one of the most beautiful things in the solar system."; hereinafter, text 3) is found. As the search continues further, the text "土星は、衛星に取り巻かれている。そしてリングは、衛星からエネルギーを奪われている。" ("Saturn is surrounded by satellites. And the rings are being deprived of energy by the satellites."; hereinafter, text 4) is extracted. Next, the text "土星においてリングはどのような働きをしているのだろうか？" ("What role do the rings play on Saturn?"; hereinafter, text 6) is extracted. Finally, the text "土星は太陽系の惑星の一つである。" ("Saturn is one of the planets of the solar system."; hereinafter, text 7) is extracted.

[0016] In this way, text 1, text 3, text 4, text 6, and text 7 are extracted. Text 2 and text 5 are not extracted because they do not contain the keyword "Saturn".

[0017] Next, the text extraction unit 5 determines whether the extracted text 1, text 3, text 4, text 6, and text 7 contain the next keyword, "ring". Text 1, text 3, text 4, and text 6 contain the keyword "ring", but text 7 does not. The text extraction unit 5 therefore extracts text 1, text 3, text 4, and text 6 and passes them to the output unit 6, which outputs them in the order text 1, text 3, text 4, text 6, as shown in FIG. 6.

[0018] Thus, conventionally, the texts found by a keyword search to contain all kinds of keywords were output from the output unit 6 in the order in which they are stored in the document memory 4.

[0019] [Problem to Be Solved by the Invention] With the above configuration, texts containing the keywords can be extracted, but because they are output in the order in which they are stored in the document memory 4, the output did not necessarily start with the text that accurately expresses the search request intended by the question sentence.

[0020] The present invention was made in view of the above problem, and provides a text search device that assigns priorities to the output order of the texts on the basis of the syntax analysis results of the extracted texts, and outputs the texts in order starting from the one that accurately expresses the content of the search request.

[0021] [Means for Solving the Problem] To solve the above problem, the text search device of the present invention comprises: a document memory in which a plurality of texts are stored; a question input unit for inputting a question sentence needed to extract texts stored in the document memory; a keyword extraction unit that extracts a plurality of kinds of keywords from the question sentence input to the question input unit; a text extraction unit that extracts texts containing the keywords from the document memory on the basis of the keywords extracted by the keyword extraction unit; an output unit that outputs the texts extracted by the text extraction unit; and a syntax analysis determination unit that performs syntax analysis on the texts extracted by the text extraction unit, wherein the syntax analysis determination unit makes a determination based on the distance between the keywords extracted by the keyword extraction unit, and determines the priority with which the texts are output from the output unit in ascending order of that distance.

[0022] [Operation] With the above configuration, the present invention uses the keywords of the question sentence to make a determination based on the distance between keywords for each extracted text, and gives priorities to the extracted texts in ascending order of that distance. This makes it possible to provide a text search device that automatically gives priority to, and outputs, the texts that accurately express the search request.

[0023] [Embodiment] A text search device according to one embodiment of the present invention is described below with reference to FIGS. 1 to 4. Components identical to those of the conventional device are given the same numbers, and their description is omitted.

[0024] FIG. 1 is a schematic configuration diagram of the text search device of the present invention; FIGS. 2 and 3 are flowcharts showing the processing flow of the syntax analysis determination unit 8, which is the characteristic feature of the present invention; and FIG. 4 shows the output result obtained when the present invention is implemented.

[0025] The present invention differs from the conventional example in that a syntax analysis determination unit 8 is provided between the text extraction unit 5 and the output unit 6. The syntax analysis determination unit 8 performs syntax analysis on the question sentence input to the question input unit 1 and on the texts extracted by the text extraction unit 5, rearranges the extracted texts on the basis of these syntax analysis results, and determines the priority of the texts output from the output unit 6. An outline of the processing in the syntax analysis determination unit 8 is described with reference to the flowcharts of FIGS. 2 and 3.

[0026] In S1, syntax analysis is performed on the question sentence input to the question input unit 1, and the syntactic information of the keywords extracted by the keyword extraction unit 2 is examined, namely which sentence and which phrase (bunsetsu) each keyword belongs to, the dependency relation between the keywords, and so on. In S2, one of the texts extracted by the text extraction unit 5 is read into the syntax analysis determination unit 8. In S3, morphological analysis is performed on the text read in S2 to extract keywords, and the syntactic information of the keywords in that text (which sentence and which phrase each keyword belongs to, the dependency relation between the keywords, and so on) is examined.

[0027] S4, S6, and S8 determine the priority at the output unit 6 on the basis of the syntactic information of the keywords extracted in S3. First, S4 determines whether all kinds of keywords are present within one sentence of the text extracted by the text extraction unit 5. If they are, processing proceeds to S6; if they are not, processing proceeds to S5. S5 determines the priority for the case where all kinds of keywords are not present within one sentence. In that case the keywords are judged to have almost no relation to each other, and the priority (hereinafter, ID) is set to fourth (ID = 4). (Hereinafter, ID = n means that the priority is nth.)

[0028] Next, S6 determines whether all kinds of keywords are present within one phrase of the text extracted by the text extraction unit 5. If they are, processing proceeds to S8; if they are not, processing proceeds to S7. S7 determines the priority for the case where all kinds of keywords are not present within one phrase. In that case the keywords are judged to have little relation to each other, and ID = 3 is assigned.

[0029] S8 determines whether the relation between the keywords in the text extracted by the text extraction unit 5 is the same as the relation between the keywords in the question sentence input to the question input unit 1. If the relations are the same, processing proceeds to S10; if they differ, processing proceeds to S9.

[0030] S9 determines the priority for the case where the relation between the keywords in the extracted text differs from the relation between the keywords in the question sentence. In this case the text is judged to be close to the content of the search request but not to match it, and ID = 2 is assigned.

[0031] S10 determines the priority for the case where the relation between the keywords in the extracted text is the same as the relation between the keywords in the question sentence. In this case the text is judged to match the content of the search request, and ID = 1 is assigned.

[0032] S11 determines whether the processing of S2 to S10 has been completed for all the texts extracted by the text extraction unit 5. If it has been completed for all the extracted texts, processing proceeds to S12. S12 sorts the texts in descending order of priority (ascending ID) on the basis of the IDs determined in S2 to S10.

[0033] S13 determines whether any of the sorted texts share the same priority. If some do, processing proceeds to S14; if none do, processing proceeds to S17.

[0034] S14 to S16 determine the priority by a further method when some texts share the same priority. S14 determines whether, among the keywords extracted in S3, there is a keyword that was not used in the priority determination of S4 to S10. If such a keyword exists, S4 to S10 are performed again with this keyword on the text read in S3, and the priority is determined.

[0035] In S15 and S16, if the priorities are still the same after the processing of S2 to S14, the priority is determined by the distance between keywords (for example, the number of characters between one keyword and another).

[0036] In S17, the texts are transmitted to the output unit 6 in order, starting from the text with the highest priority, on the basis of the priorities determined in S2 to S16.

[0037] As an example, the search for the question sentence "土星のリングについて知りたい。" is described below with reference to Table 1, FIG. 2, and FIG. 3.

[0038] The flow from the input of the question sentence to the question input unit 1 up to the text extraction by the text extraction unit 5 is the same as in the conventional example, so its description is omitted here. The following describes, with reference to FIGS. 2 and 3, how the syntax analysis determination unit 8 assigns priorities to the texts extracted by the text extraction unit 5.

[0039] First, in S1 of FIG. 2, morphological analysis is performed on the question sentence "土星のリングについて知りたい。" input from the question input unit 1, "Saturn" and "ring" are extracted as keywords, and syntax analysis is then performed. It is determined that the keywords "Saturn" and "ring" are in the same sentence and the same phrase, and that they are in a dependency relation. This result is stored in the syntax analysis determination unit 8 as the syntax analysis result of the input question sentence.

[0040] Next, in S2, the texts extracted by the text extraction unit 5 are read into the syntax analysis determination unit 8 in order, and in S3 morphological analysis is applied to them.

[0041] In S3, morphological analysis is first performed on text 1, which is analyzed as "土星／に／関して／言え／ば／、／その／中の／リング／は／土星／の／象徴／と／言える／。". The ID is then determined by syntax analysis on the basis of this morphological analysis result. For text 1, it is determined whether the keywords "Saturn" and "ring" are in the same sentence. Because the keywords are in the same sentence in text 1, processing proceeds to S6. S6 determines whether the keywords are in the same phrase. Because the keywords "Saturn" and "ring" in text 1 are not in the same phrase, processing proceeds to S7, ID = 3 is assigned, and processing then proceeds to S11. S11 checks whether the ID has been determined by syntax analysis for all the texts extracted by the text extraction unit 5. If it has, processing proceeds to S12; if it has not, processing returns to S2 and is repeated.

[0042] Examples of the syntax analysis of text 3, text 4, and text 6 are described below.

[0043] Next, in S3, morphological analysis is performed on text 3, which is analyzed as "土星／の／リング／は／、／太陽系／の中で／最も／美しい／もの／の／一つ／だ／。". Like text 1, the analyzed text 3 is processed, in this case through S4, S6, S8, and S10. Because the syntax analysis result of text 3 is the same as that of the question sentence, ID = 1 is assigned.

[0044] The same processing is then repeated for the third text extracted by the text extraction unit 5. Text 4 is morphologically analyzed as "土星／は／、／衛星／に／取り巻か／れ／ている／。／そして／リング／は／、／衛星／から／エネルギー／を／奪わ／れ／ている／。", and its ID is determined through the processing of S4 and S5, giving ID = 4. Text 6 is likewise morphologically analyzed as "土星／において／リング／は／どのような／働き／を／し／ている／の／だろうか／？", and ID = 2 is assigned through the processing of S4, S6, S8, and S9. In this way the ID is determined by syntax analysis for all the texts extracted by the text extraction unit 5.

[0045] Once the IDs have finally been determined by syntax analysis, the output order is determined in S12 of FIG. 3, starting from the text with the highest priority according to the syntax analysis. In S12, if texts with the same priority exist, processing proceeds to S13; if none exist, S17 specifies the output order to the output unit 6. In this embodiment, text 1 has ID = 3, text 3 has ID = 1, text 4 has ID = 4, and text 6 has ID = 2, so in S17 the texts are output from the output unit 6 in the order text 3, text 6, text 1, text 4. The result is shown in FIG. 4.

[0046] This embodiment has described the case where no texts share the same priority. When several texts do share the same priority, the determination can also be made, as shown in S14 of FIG. 3, using keywords other than those that were the subject of the priority determination. If the texts are still tied after S14, the priority can be determined, as shown in S15 and S16, from the distance between keywords in the extracted text (for example, the number of characters between the keywords).

[0047] This embodiment has described the case where the keyword extraction unit 2 extracts two keywords, but the case of three or more keywords is basically the same. For example, with three keywords, the syntax analysis result for the three keywords may be used, or the syntax analysis result for the keywords that appear most frequently in the question sentence may be used.

[0048] [Effect of the Invention] As is clear from the above description, the present invention can determine the priority of the order in which the texts extracted by the keywords are output from the output unit, so the texts can be output sequentially starting from the one that accurately expresses the search request.
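The priority decision of steps S4 to S10 and the sort of S12 can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the `ParsedText` class, the function names, the grouping of morphemes into phrases, and the relation labels (`"modifier"`, `"locative"`, `"none"`) are all assumptions made for the example, standing in for the output of a real morphological analyzer and parser. The tie-breaking steps S13 to S16 are left out.

```python
from dataclasses import dataclass


@dataclass
class ParsedText:
    name: str
    # sentence -> phrase -> morphemes; the phrase grouping is a hand-made assumption
    sentences: list
    # Hypothetical label for how the keywords relate, as a parser might report it
    relation: str


def has_all(morphemes, keywords):
    """True if every keyword appears among the given morphemes."""
    return all(k in morphemes for k in keywords)


def assign_id(text, keywords, query_relation):
    """Return the priority ID (1 = best, 4 = worst) following S4 to S10."""
    # S4: are all keywords inside a single sentence?
    hits = [s for s in text.sentences
            if has_all([m for phrase in s for m in phrase], keywords)]
    if not hits:
        return 4  # S5
    # S6: are all keywords inside a single phrase?
    if not any(has_all(phrase, keywords) for s in hits for phrase in s):
        return 3  # S7
    # S8: same keyword relation as in the question sentence?
    return 1 if text.relation == query_relation else 2  # S10 / S9


def rank(texts, keywords, query_relation):
    """S12: order texts by ascending ID; ties keep storage order (stable sort)."""
    return sorted(texts, key=lambda t: assign_id(t, keywords, query_relation))


KEYWORDS = ["土星", "リング"]
QUERY_RELATION = "modifier"  # assumed label for the relation in "土星のリング"

TEXTS = [
    ParsedText("text 1", [[["土星", "に", "関して", "言え", "ば"],
                           ["その", "中の", "リング", "は"],
                           ["土星", "の", "象徴", "と", "言える"]]], "none"),
    ParsedText("text 3", [[["土星", "の", "リング", "は"],
                           ["太陽系", "の中で"],
                           ["最も", "美しい", "もの", "の", "一つ", "だ"]]], "modifier"),
    ParsedText("text 4", [[["土星", "は"], ["衛星", "に"], ["取り巻か", "れ", "ている"]],
                          [["そして"], ["リング", "は"], ["衛星", "から"],
                           ["エネルギー", "を"], ["奪わ", "れ", "ている"]]], "none"),
    ParsedText("text 6", [[["土星", "において", "リング", "は"],
                           ["どのような", "働き", "を"],
                           ["し", "ている", "の", "だろうか"]]], "locative"),
]

ranked = rank(TEXTS, KEYWORDS, QUERY_RELATION)
print([t.name for t in ranked])  # text 3, text 6, text 1, text 4
```

With these hand-annotated inputs the sketch yields IDs 3, 1, 4, and 2 for texts 1, 3, 4, and 6, which matches the output order text 3, text 6, text 1, text 4 given for the embodiment.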

[Brief Description of the Drawings]
[FIG. 1] A schematic configuration diagram of a text search device provided with the syntax analysis determination unit of the present invention.
[FIG. 2] A flowchart showing the processing flow of the syntax analysis determination unit according to the present invention.
[FIG. 3] A flowchart showing the processing flow of the syntax analysis determination unit according to the present invention.
[FIG. 4] A diagram showing the output result obtained when the present invention is implemented.
[FIG. 5] A schematic configuration diagram of a conventional text search device.
[FIG. 6] A diagram showing the output result of the conventional example.
[Explanation of Reference Numerals]
1 Question input unit
2 Keyword extraction unit
3 Keyword memory
4 Document memory
5 Text extraction unit
6 Output unit
7 Control unit
8 Syntax analysis determination unit

Continuation of the front page
(58) Fields searched (Int. Cl.7, DB name): G06F 17/30 350; G06F 17/30 170; G06F 17/30 330; G06F 17/27; JICST file (JOIS)

Claims

(57) [Claims]
[Claim 1] A text search device comprising: a document memory in which a plurality of texts are stored; a question input unit for inputting a question sentence needed to extract texts stored in the document memory; a keyword extraction unit that extracts a plurality of kinds of keywords from the question sentence input to the question input unit; a text extraction unit that extracts texts containing the keywords from the document memory on the basis of the keywords extracted by the keyword extraction unit; an output unit that outputs the texts extracted by the text extraction unit; and a syntax analysis determination unit that performs syntax analysis on the texts extracted by the text extraction unit, characterized in that the syntax analysis determination unit makes a determination based on the distance between the keywords extracted by the keyword extraction unit, and determines the priority with which the texts are output from the output unit in ascending order of the distance between the keywords.
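As a rough illustration of the claimed ordering by keyword distance, the sketch below keeps the documents that contain every keyword and sorts them by the number of characters lying between the two keywords, which is the measure the description mentions as an example. The function names and the choice of the smallest gap over all occurrences are assumptions of this sketch, and the documents are a subset of the example texts from the description.

```python
def occurrences(text, keyword):
    """Return (start, end) spans of every occurrence of keyword in text."""
    spans, i = [], text.find(keyword)
    while i != -1:
        spans.append((i, i + len(keyword)))
        i = text.find(keyword, i + 1)
    return spans


def keyword_distance(text, kw_a, kw_b):
    """Fewest characters between an occurrence of kw_a and one of kw_b.

    Assumes both keywords occur in text (search() below guarantees this).
    """
    gaps = []
    for a0, a1 in occurrences(text, kw_a):
        for b0, b1 in occurrences(text, kw_b):
            if a1 <= b0:
                gaps.append(b0 - a1)   # kw_a comes first
            elif b1 <= a0:
                gaps.append(a0 - b1)   # kw_b comes first
            else:
                gaps.append(0)         # overlapping occurrences
    return min(gaps)


def search(documents, keywords):
    """Keep documents containing every keyword, then sort by keyword distance."""
    hits = [d for d in documents if all(k in d for k in keywords)]
    return sorted(hits, key=lambda d: keyword_distance(d, keywords[0], keywords[1]))


TEXT3 = "土星のリングは、太陽系の中で最も美しいものの一つだ。"
TEXT4 = "土星は、衛星に取り巻かれている。そしてリングは、衛星からエネルギーを奪われている。"
TEXT6 = "土星においてリングはどのような働きをしているのだろうか？"
TEXT7 = "土星は太陽系の惑星の一つである。"

DOCS = [TEXT3, TEXT4, TEXT6, TEXT7]   # storage order
result = search(DOCS, ["土星", "リング"])
for doc in result:
    print(keyword_distance(doc, "土星", "リング"), doc)
```

Text 7 is dropped because it lacks the keyword "リング", and the remaining texts come out in the order text 3, text 6, text 4. A plain character count is a crude measure, so a practical system would likely combine it with the syntactic checks described in the embodiment.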