JPH06215035A - Text retrieving device

Info

Publication number: JPH06215035A
Application number: JP5006209A
Authority: JP
Inventors: Yoshiyo Nakamura; 佳代中村
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1993-01-18
Filing date: 1993-01-18
Publication date: 1994-08-05
Anticipated expiration: 2018-01-08
Also published as: JP3363501B2

Abstract

PURPOSE: To provide a text retrieving device that, when retrieving texts by keyword, performs syntax analysis of the extracted texts, assigns priorities to them, and outputs them in order starting from the text that most exactly expresses the retrieval request. CONSTITUTION: The device is provided with a keyword memory 3 for storing the inputted keywords, a document memory 4 for storing the texts to be retrieved, a text extraction part 5 for retrieving texts based on the keywords, an output part 6 for outputting the retrieved result, and a syntax analysis decision part 8. The syntax analysis decision part 8 performs syntax analysis on the texts retrieved by the text extraction part and assigns priority to them by deciding whether all kinds of keywords exist in a single sentence, whether they exist in a single clause, and whether the keywords have the same modification relation as in the question sentence.

Description

Detailed Description of the Invention

【０００１】[0001]

[Industrial Field of Application] The present invention relates to a text search device that, when searching texts (here, a "text" means a single sentence or a document composed of a plurality of related sentences), extracts keywords from an input question sentence, searches for texts containing those keywords, and outputs them.

【０００２】[0002]

[Prior Art] In recent years, advances in computer technology and increases in the capacity of document storage devices have made databases holding large amounts of data and text widespread, and opportunities to work with such databases have become frequent.

【０００３】In these databases, a large amount of data is stored in advance in a storage device such as a hard disk, and the data is generally searched by keyword.

【０００４】As a keyword-based search in such a text search device, a method is used in which, as disclosed for example in Japanese Patent Laid-Open No. 2-2458, morphological analysis is performed on a given text to extract keywords, the texts stored in a storage device are searched based on the extracted keywords, and the texts that match the keywords are output.

【０００５】An example of a search method using a conventional text search device will be described with reference to FIGS. 5 and 6.

【０００６】FIG. 5 is a schematic block diagram of a conventional text search device.

【０００７】In the figure, 1 is a question input unit to which a question sentence containing keywords is input; 2 is a keyword extraction unit that performs morphological analysis on the question sentence input to the question input unit 1 and extracts keywords; 3 is a keyword memory that stores the keywords extracted by the keyword extraction unit 2; 4 is a document memory in which the texts to be searched are already stored; 5 is a text extraction unit that extracts relevant texts from the document memory 4 based on the keywords stored in the keyword memory 3; 6 is an output unit that outputs the texts extracted by the text extraction unit 5; and 7 is a control unit that controls the entire text search device.

【０００８】FIG. 6 shows the content of the search result output from the output unit 6 of the conventional text search device shown in FIG. 5.

【０００９】The operation of the text search device shown in FIG. 5 is described below, taking as an example the case where the question sentence "土星のリングについて知りたい。" ("I want to know about Saturn's ring.") is input.

【００１０】First, when the question sentence "土星のリングについて知りたい。" is input to the question input unit 1, the keyword extraction unit 2 performs morphological analysis in order to extract keywords. The question sentence is decomposed into morphemes as "土星／の／リング／について／知り／たい／。" (Saturn / 's / ring / about / know / want to / .), where "／" marks a boundary found by morphological analysis. Since nouns are used as keywords here, "土星" (Saturn) and "リング" (ring) are extracted as nouns. The keyword extraction unit 2 therefore extracts "土星" and "リング" as keywords and stores them in the keyword memory 3. The search request expressed by this question sentence concerns "土星のリング" (Saturn's ring), so the most desired texts are those in which the keywords "土星" and "リング" stand in a dependency relation.
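The keyword extraction step described above can be illustrated with a minimal sketch. It assumes the morphological analysis has already produced (surface form, part of speech) pairs; the part-of-speech tags and the names `QUESTION_MORPHEMES` and `extract_keywords` are hypothetical and are not taken from the patent.

```python
# Morphemes of the question sentence, segmented as in the description.
# The part-of-speech tags are assumptions added for illustration only.
QUESTION_MORPHEMES = [
    ("土星", "noun"),
    ("の", "particle"),
    ("リング", "noun"),
    ("について", "particle"),
    ("知り", "verb"),
    ("たい", "auxiliary"),
    ("。", "symbol"),
]


def extract_keywords(morphemes):
    """Return the nouns in order of first appearance, without duplicates."""
    keywords = []
    for surface, pos in morphemes:
        if pos == "noun" and surface not in keywords:
            keywords.append(surface)
    return keywords


# Plays the role of keyword memory 3 in this sketch.
keyword_memory = extract_keywords(QUESTION_MORPHEMES)
print(keyword_memory)
```

In a real system the tagged morphemes would come from a morphological analyzer; the sketch only shows the noun-selection rule stated in the text.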

【００１１】Once the keywords have been stored in the keyword memory 3, the text extraction unit 5 successively extracts the texts containing the keywords from the texts stored in the document memory 4.

【００１２】Table 1 shows some of the various texts stored in the document memory 4.

【００１３】[0013]

【表１】 [Table 1]

【００１４】The text extraction unit 5 searches the contents of the document memory 4 for texts one after another. The procedure is as follows.

【００１５】The text extraction unit 5 determines, for every text in the document memory 4, whether it contains the keyword "土星". First, the text "土星に関して言えば、その中のリングは土星の象徴と言える。" ("As for Saturn, the ring within it can be called the symbol of Saturn."; hereinafter, Text 1) is extracted. As the search continues, the text "土星のリングは、太陽系の中で最も美しいものの一つだ。" ("Saturn's ring is one of the most beautiful things in the solar system."; hereinafter, Text 3) is found. Continuing further, the text "土星は、衛星に取り巻かれている。そしてリングは、衛星からエネルギーを奪われている。" ("Saturn is surrounded by satellites. And the ring is deprived of energy by the satellites."; hereinafter, Text 4) is extracted. Next, the text "土星においてリングはどのような働きをしているのだろうか？" ("What role does the ring play on Saturn?"; hereinafter, Text 6) is extracted. Finally, the text "土星は太陽系の惑星の一つである。" ("Saturn is one of the planets of the solar system."; hereinafter, Text 7) is extracted.

【００１６】In this way, Text 1, Text 3, Text 4, Text 6, and Text 7 are extracted. Text 2 and Text 5 are not extracted because they do not contain the keyword "土星".

【００１７】Next, the text extraction unit 5 determines whether the extracted Text 1, Text 3, Text 4, Text 6, and Text 7 contain the next keyword, "リング". Text 1, Text 3, Text 4, and Text 6 contain the keyword "リング", but Text 7 does not. The text extraction unit 5 therefore extracts Text 1, Text 3, Text 4, and Text 6 and passes them to the output unit 6, which outputs them in the order Text 1, Text 3, Text 4, Text 6, as shown in FIG. 6.

【００１８】Thus, conventionally, the texts found by the keyword search to contain all kinds of keywords were output from the output unit 6 in the order in which they are stored in the document memory 4.

【００１９】[0019]

[Problems to Be Solved by the Invention] With the above configuration, texts containing the keywords can be extracted, but because they are output in the order in which they are stored in the document memory 4, output does not necessarily begin with the text that most accurately expresses the search request intended by the question sentence.

【００２０】The present invention has been made in view of the above problem. It provides a text search device that assigns priorities to the output order of the texts based on the result of syntactic analysis of the extracted texts, and outputs the texts in order starting from the one that most accurately expresses the content of the search request.

【００２１】[0021]

[Means for Solving the Problems] To solve the above problem, the text search device of the present invention comprises: a question input unit to which a question sentence containing keywords is input; a keyword extraction unit that extracts keywords by morphological analysis from the question sentence input to the question input unit; a keyword memory that stores the keywords extracted by the keyword extraction unit; a document memory in which a large amount of data, such as the texts to be searched, is stored; a text extraction unit that extracts the matching texts from the document memory based on the keywords stored in the keyword memory; an output unit that outputs the texts extracted by the text extraction unit; and a control unit that controls the whole device. It further comprises a syntax analysis determination unit that performs syntactic analysis of the extracted texts and, based on the syntactic information of the keywords extracted by the keyword extraction unit, determines the priority order in which the texts are output from the output unit.

【００２２】[0022]

[Operation] With the configuration described above, the present invention applies syntactic analysis to the extracted texts using the keywords of the question sentence and assigns priorities to the extracted texts. This makes it possible to provide a text search device that automatically gives precedence to, and outputs first, the texts that accurately express the search request.

【００２３】[0023]

[Embodiment] A text search device according to an embodiment of the present invention is described below with reference to FIGS. 1 to 4. Components identical to those of the conventional device are given the same reference numerals, and their description is omitted.

【００２４】FIG. 1 is a schematic configuration diagram of the text search device of the present invention. FIGS. 2 and 3 are flowcharts showing the flow of processing in the syntax analysis determination unit 8, which is the characteristic feature of the present invention. FIG. 4 shows the output result obtained when the present invention is implemented.

【００２５】The present invention differs from the conventional example in that a syntax analysis determination unit 8 is provided between the text extraction unit 5 and the output unit 6. The syntax analysis determination unit 8 performs syntactic analysis on the question sentence input to the question input unit 1 and on the texts extracted by the text extraction unit 5, reorders the extracted texts based on the results of that analysis, and determines the priority order of the texts output from the output unit 6. An outline of the processing in the syntax analysis determination unit 8 is described below with reference to the flowcharts shown in FIGS. 2 and 3.

【００２６】In S1, the question sentence input to the question input unit 1 is syntactically analyzed to obtain the syntactic information of the keywords extracted by the keyword extraction unit 2, that is, which sentence and which clause (文節) each keyword belongs to, the dependency relations between the keywords, and so on. In S2, one of the texts extracted by the text extraction unit 5 is read into the syntax analysis determination unit 8. In S3, morphological analysis is performed on the text read in S2 to extract keywords, and the syntactic information of the keywords in that text (which sentence and which clause each keyword belongs to, the dependency relations between the keywords, and so on) is obtained.

【００２７】S4, S6, and S8 determine the priority order at the output unit 6 based on the syntactic information of the keywords extracted in S3. First, S4 determines whether all kinds of keywords are present in a single sentence of the text extracted by the text extraction unit 5. If they are, processing proceeds to S6; if not, it proceeds to S5. S5 determines the priority for the case where all kinds of keywords are not present in a single sentence: the keywords are judged to have almost no relation to each other, and the priority (hereinafter, ID) is set to fourth (ID = 4). Hereinafter, ID = n denotes the n-th priority.
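The single-sentence test of S4 can be approximated without a parser. The sketch below is an assumption-laden illustration: it splits a text at the sentence-final marks 。, ？ and ！ and uses substring matching for keywords. The helper names `split_sentences` and `all_keywords_in_one_sentence` are hypothetical; the clause and dependency checks of the later steps would need an actual syntactic analyzer and are not shown.

```python
import re

TEXT_1 = "土星に関して言えば、その中のリングは土星の象徴と言える。"
TEXT_4 = "土星は、衛星に取り巻かれている。そしてリングは、衛星からエネルギーを奪われている。"


def split_sentences(text):
    """Split at sentence-final punctuation, keeping the punctuation mark."""
    return re.findall(r"[^。？！]+[。？！]?", text)


def all_keywords_in_one_sentence(text, keywords):
    """S4 (sketch): does some single sentence contain every keyword?"""
    return any(
        all(keyword in sentence for keyword in keywords)
        for sentence in split_sentences(text)
    )


keywords = ["土星", "リング"]
print(all_keywords_in_one_sentence(TEXT_1, keywords))  # Text 1: proceeds to S6
print(all_keywords_in_one_sentence(TEXT_4, keywords))  # Text 4: proceeds to S5
```

Text 4 fails the test because "土星" and "リング" fall in different sentences, which matches its ID = 4 in the worked example of the description.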

【００２８】Next, S6 determines whether all kinds of keywords are present in a single clause of the text extracted by the text extraction unit 5. If they are, processing proceeds to S8; if not, it proceeds to S7. S7 determines the priority for the case where all kinds of keywords are not present in a single clause: the keywords are judged to have little relation to each other, and the priority is set to ID = 3.

【００２９】S8 determines whether the relation between the keywords in the text extracted by the text extraction unit 5 is the same as the relation between the keywords in the question sentence input to the question input unit 1. If the relations are the same, processing proceeds to S10; if they differ, it proceeds to S9.

【００３０】S9 determines the priority for the case where the relation between the keywords in the extracted text differs from the relation between the keywords in the question sentence. In this case the text is judged to be close to the content of the search request but not to match it, and the priority is set to ID = 2.

【００３１】S10 determines the priority for the case where the relation between the keywords in the extracted text is the same as the relation between the keywords in the question sentence. In this case the text is judged to match the content of the search request, and the priority is set to ID = 1.
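The decision steps S4 to S10 form a simple cascade, which can be sketched as a function of three yes/no results. The function name `decide_priority` and its boolean parameters are hypothetical; how each flag is computed (sentence splitting, clause segmentation, dependency comparison) is left to the syntactic analysis and is not shown here.

```python
def decide_priority(in_one_sentence, in_one_clause, same_relation):
    """Return the priority ID (1 = highest) following steps S4 to S10."""
    if not in_one_sentence:   # S4 fails -> S5
        return 4
    if not in_one_clause:     # S6 fails -> S7
        return 3
    if not same_relation:     # S8 fails -> S9
        return 2
    return 1                  # S10


# Flags as implied by the worked example in the description.
print(decide_priority(True, False, False))   # Text 1: S4, S6, S7
print(decide_priority(True, True, True))     # Text 3: S4, S6, S8, S10
print(decide_priority(False, False, False))  # Text 4: S4, S5
print(decide_priority(True, True, False))    # Text 6: S4, S6, S8, S9
```

Each later test is only reached when the earlier ones succeed, so the flags after the first failing test do not affect the result.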

【００３２】S11 determines whether the processing of S2 to S10 has been completed for all of the texts extracted by the text extraction unit 5. When it has been completed for all extracted texts, processing proceeds to S12. In S12, the texts are sorted in descending order of priority (that is, smallest ID first) based on the IDs determined in S2 to S10.

【００３３】S13 determines whether any of the sorted texts share the same priority. If some do, processing proceeds to S14; if none do, it proceeds to S17.

【００３４】S14 to S16 determine the priority order by further methods when texts with the same priority exist. S14 determines whether, among the keywords extracted in S3, there is a keyword that was not used in determining the priority in S4 to S10. If such a keyword exists, S4 to S10 are performed again on the text read in S3 using that keyword, and the priority is determined.

【００３５】In S15 and S16, if the priorities are still the same after the processing of S2 to S14, the priority is determined by the distance between keywords (for example, the number of characters between one keyword and another).
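One possible reading of the keyword distance used in S15 and S16 is the number of characters lying between the first occurrences of two keywords. The sketch below implements that reading only; the description does not fix how occurrences are chosen or whether a smaller distance ranks higher, so both the function `keyword_distance` and the first-occurrence rule are assumptions.

```python
def keyword_distance(text, first, second):
    """Characters between the first occurrences of two keywords.

    Returns None if either keyword is absent. Using first occurrences
    is an assumption of this sketch, not something the source states.
    """
    i = text.find(first)
    j = text.find(second)
    if i < 0 or j < 0:
        return None
    if i <= j:
        return max(0, j - (i + len(first)))
    return max(0, i - (j + len(second)))


TEXT_3 = "土星のリングは、太陽系の中で最も美しいものの一つだ。"
TEXT_6 = "土星においてリングはどのような働きをしているのだろうか？"

print(keyword_distance(TEXT_3, "土星", "リング"))  # only "の" lies between
print(keyword_distance(TEXT_6, "土星", "リング"))  # "において" lies between
```

If a smaller distance is taken to mean a closer relation, texts tied on ID could be ordered by sorting on the pair (ID, distance); that ordering rule is likewise an assumption.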

【００３６】In S17, the texts are transmitted to the output unit 6 in order, starting from the text with the highest priority, based on the priority order determined in the processing of S2 to S16.

【００３７】As an example, the search for the question sentence "土星のリングについて知りたい。" ("I want to know about Saturn's ring.") is described below with reference to Table 1, FIG. 2, and FIG. 3.

【００３８】The flow from the question sentence input to the question input unit 1 up to the text extraction by the text extraction unit 5 is the same as in the conventional example, so its description is omitted here. The following describes, with reference to FIGS. 2 and 3, how the syntax analysis determination unit 8 assigns priorities to the texts extracted by the text extraction unit 5.

【００３９】First, in S1 of FIG. 2, morphological analysis is performed on the question sentence "土星のリングについて知りたい。" input from the question input unit 1, "土星" and "リング" are extracted as keywords, and syntactic analysis is then performed. It is determined that the keywords "土星" and "リング" are in the same sentence and the same clause, and that they stand in a dependency relation. This result is stored in the syntax analysis determination unit 8 as the syntactic analysis result of the input question sentence.

【００４０】Next, in S2, the texts extracted by the text extraction unit 5 are read into the syntax analysis determination unit 8 one at a time, and in S3 morphological analysis is applied to them.

【００４１】In S3, morphological analysis is first performed on Text 1, which is analyzed as "土星／に／関して／言え／ば／、／その／中の／リング／は／土星／の／象徴／と／言える／。". Next, based on the result of this morphological analysis, the ID is determined by syntactic analysis. For Text 1, it is determined whether the keywords "土星" and "リング" are in the same sentence. Since the keywords are in the same sentence in Text 1, processing proceeds to S6. In S6, it is determined whether the keywords are in the same clause. Since the keywords "土星" and "リング" in Text 1 are not in the same clause, processing proceeds to S7, where ID = 3 is assigned, and then to S11. In S11, it is checked whether an ID has been determined by syntactic analysis for all of the texts extracted by the text extraction unit 5. If all extracted texts have been processed, processing proceeds to S12; if not, it returns to S2 and repeats.

【００４２】Examples of the syntactic analysis of Text 3, Text 4, and Text 6 are described below.

【００４３】Next, morphological analysis of Text 3 is performed in S3. Text 3 is analyzed as "土星／の／リング／は／、／太陽系／の中で／最も／美しい／もの／の／一つ／だ／。". Like Text 1, the analyzed Text 3 is processed through S4, S6, S8, and S10. Since the syntactic analysis result of Text 3 is the same as that of the question sentence, ID = 1 is assigned.

【００４４】The same processing is then repeated for the third text extracted by the text extraction unit 5. Text 4 is morphologically analyzed as "土星／は／、／衛星／に／取り巻か／れ／ている／。／そして／リング／は／、／衛星／から／エネルギー／を／奪わ／れ／ている／。", its ID is determined through the processing of S4 and S5, and Text 4 receives ID = 4. Text 6 is likewise morphologically analyzed as "土星／において／リング／は／どのような／働き／を／し／ている／の／だろうか／？" and, through the processing of S4, S6, S8, and S9, receives ID = 2. In this way, an ID is determined by syntactic analysis for every text extracted by the text extraction unit 5.

【００４５】Once the IDs have finally been determined by syntactic analysis, the output order is determined in S12 of FIG. 3, starting from the text with the highest priority according to the syntactic analysis. If texts with the same priority exist in S12, processing proceeds to S13; if not, the output order is specified to the output unit 6 in S17. In this embodiment, Text 1 has ID = 3, Text 3 has ID = 1, Text 4 has ID = 4, and Text 6 has ID = 2, so in S17 the texts are output from the output unit 6 in the order Text 3, Text 6, Text 1, Text 4. The result is shown in FIG. 4.
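The reordering of the worked example can be reproduced with an ordinary sort on the ID values. This is only an illustrative sketch; the function name `order_by_priority` and the dictionary representation are assumptions, and the IDs are the ones stated in the embodiment.

```python
def order_by_priority(priority_ids):
    """Return text numbers sorted by ascending ID (ID 1 is output first).

    Python's sort is stable, so texts with equal IDs keep storage order.
    """
    return sorted(priority_ids, key=priority_ids.get)


# IDs from the embodiment: text number -> priority ID.
priority_ids = {1: 3, 3: 1, 4: 4, 6: 2}
print(order_by_priority(priority_ids))  # [3, 6, 1, 4]
```

Compared with the conventional output order 1, 3, 4, 6, the text whose keywords match the question's dependency relation (Text 3) now comes first.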

【００４６】This embodiment has described the case where no texts share the same priority. When several texts do share the same priority, the determination can also be made, as shown in S14 of FIG. 3, using keywords other than those that were used to determine the priority. If the texts are still tied after S14, the priority can also be determined, as shown in S15 and S16, from the distance between keywords in the extracted text (for example, the number of characters between keywords).

【００４７】This embodiment has been described for the case where the keyword extraction unit 2 extracts two keywords, but the processing is basically the same when there are three or more. For example, with three keywords, it is possible to use the syntactic analysis result for all three keywords, or to use the syntactic analysis result for the keywords that appear most frequently in the question sentence.

【００４８】[0048]

[Effects of the Invention] As is clear from the above description, the present invention applies syntactic analysis to the texts extracted by keyword and thereby determines the priority order in which the extracted texts are output from the output unit, so that texts can be output in order starting from the one that most accurately expresses the search request. Furthermore, by determining whether all kinds of keywords are present in a single sentence of the extracted text, whether they are present in a single clause, and whether the keywords stand in the same dependency relation as in the question sentence, texts can be output in order starting from the one that expresses the content of the search request most concisely.

[Brief description of drawings]

FIG. 1 is a schematic configuration diagram of a text search device including the syntax analysis determination unit of the present invention.

FIG. 2 is a flowchart showing the flow of processing in the syntax analysis determination unit according to the present invention.

FIG. 3 is a flowchart showing the flow of processing in the syntax analysis determination unit according to the present invention.

FIG. 4 is a diagram showing the output result when the present invention is implemented.

FIG. 5 is a schematic configuration diagram of a conventional text search device.

FIG. 6 is a diagram showing the output result of the conventional example.

[Explanation of symbols]

1: question input unit; 2: keyword extraction unit; 3: keyword memory; 4: document memory; 5: text extraction unit; 6: output unit; 7: control unit; 8: syntax analysis determination unit

Claims

[Claims]

1. A text search device comprising: a document memory in which a plurality of texts are stored; a question input unit to which a question sentence requesting extraction of the texts stored in the document memory is input; a keyword extraction unit that extracts a plurality of kinds of keywords from the question sentence input to the question input unit; a text extraction unit that extracts texts containing the keywords from the document memory based on the keywords extracted by the keyword extraction unit; an output unit that outputs the texts extracted by the text extraction unit; and a syntax analysis determination unit that performs syntactic analysis on the texts extracted by the text extraction unit, wherein the syntax analysis determination unit determines, based on syntactic information of the keywords extracted by the keyword extraction unit, the priority order in which the texts are output from the output unit.

2. The text search device according to claim 1, wherein the syntactic information is information as to whether all kinds of keywords extracted by the keyword extraction unit are contained in a single sentence, and the syntax analysis determination unit sets the priority of a text in which all kinds of the keywords are contained in a single sentence higher than the priority of the other texts.

3. The text search device according to claim 1, wherein the syntactic information is information as to whether all kinds of keywords extracted by the keyword extraction unit are contained in a single clause, and the syntax analysis determination unit sets the priority of a text in which all kinds of the keywords are contained in a single clause higher than the priority of the other texts.

4. The text search device according to claim 1, wherein the syntax analysis determination unit compares the relation between the keywords in the question sentence input to the question input unit with the relation between the keywords in the text extracted by the text extraction unit, and sets the priority of the text highest when both have the same dependency relation.