JPH0320866A

JPH0320866A - Text base retrieval system

Info

Publication number: JPH0320866A
Application number: JP2035832A
Authority: JP
Inventors: Tamaki Saito; 斎藤　珠喜; Hironobu Fukunaga; 福永　博信
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1989-03-07
Filing date: 1990-02-16
Publication date: 1991-01-29

Abstract

PURPOSE: To perform retrieval with high accuracy by extracting, as text matching the content of a question sentence given as a retrieval request, text that contains the words of the question sentence, or one of their synonyms, in the same coupling relation as in the question sentence. CONSTITUTION: In synonym expansion step 5, the retrieval structure (the reference structure used for retrieval, generated in structure generation step 4) is reinforced by consulting a synonym dictionary 6 for each word in the structure and selecting words with a similar meaning. A text retrieval part 7 searches a text base 8 using the retrieval structure produced up to synonym expansion step 5 as a sample, and outputs the text that matches this sample as the retrieval result. Morphological analysis and syntactic analysis are performed at this time using a word dictionary 3. In this way, high-accuracy retrieval from a natural-language question sentence becomes possible for the text base 8, in which natural-language text data is stored as character code strings.

Description

[Detailed description of the invention]

[Industrial Field of Application]

The present invention relates to a text base search method that enables high-precision search, from a query sentence written in natural language, of a database in which natural-language text data is stored as character code strings (hereinafter referred to as a "text base").

[Prior Art]

As a conventional technique of this kind, a system is known, as described for example in Sugiyama et al., "Information Retrieval System IRIS Based on Natural Language Understanding" (Information Processing Society of Japan, Natural Language Processing Study Group Materials NL-58-8, 1986), in which the content, that is, the characteristics, of each text stored as data is expressed by assigning to it keywords (field names or words) suited to its content. At search time, the user specifies keywords (field names, words, etc.) related to the content of the desired text together with their logical connection (AND, OR, etc.), and the texts satisfying that search condition are extracted.

Even when an interface that accepts question sentences in natural language is provided, as explained in the above reference, the question sentence is analyzed in order to expand the user's search request into corresponding keywords, the logical connections among those keywords are determined, and the search is then carried out. In other words, whether or not a natural-language interface is provided, search of the text base has been keyword search. As an approach aimed at improving search accuracy, a method of assigning a semantic role (agent, object, etc. within the text) to each keyword has also been proposed, as described in Kinukawa et al., "Automatic Indexing Method Using Japanese Sentence Structure Analysis" (Transactions of the Information Processing Society of Japan, Vol. 21, No. 3, 1980), but keywords are still used as the clues at search time.

[Problems to Be Solved by the Invention]

All of the above conventional techniques search using keywords contained in the text as clues, and therefore have the problem that search accuracy, that is, how correctly the texts the user wants can be retrieved, does not become high. The measures generally used for search accuracy are recall (the proportion of the texts relevant to the user's search request that were actually retrieved) and precision (the proportion of all retrieved texts that are relevant).

Assigning keywords suited to the content of a text means assigning, as words representative of that text, words that express its subject, gist, and so on, or words that express its main relevant parts. Assigning as keywords every phrasing a user might think of when making a search request, however, causes many unwanted texts to be output at search time, so it is difficult to accommodate varied expressions while maintaining high search accuracy. Moreover, when a user wishes to search for information that appears only in a supplementary description, keywords are generally not assigned to supplementary parts, so such information cannot be retrieved by keyword search.

The present invention has been made in view of the above circumstances. Its object is to eliminate the above problems of the prior art and to provide, in place of keyword search, a text base search method that has high search accuracy and can also retrieve matters that are described only supplementarily.

[Means for Solving the Problems]

The above object of the present invention is achieved by a text base search method in a text base search system having: a word dictionary that stores headwords together with their part-of-speech information, grammatical information, etc.; a text base that stores documents written in natural language; an input unit for inputting a sentence in natural language; a sentence analysis unit that divides the input sentence into words (morphological analysis) and analyzes the grammatical structure of the input sentence from the part-of-speech information and grammatical information of the divided words (syntactic analysis); and means for searching the text base based on the analysis result of the sentence analysis unit. The system is provided with a synonym dictionary that stores words having the same or a similar meaning as the headwords, a text base analysis unit that performs morphological analysis and syntactic analysis of the sentences of the text base, and a collation unit that collates the sentence analysis result of the text base analysis unit with the input-sentence analysis result of the sentence analysis unit. The method is characterized by comprising: a structure generation step of selecting, from the input sentence, one or more words to be targeted at search time and generating, based on the case relations between those words, a structure to serve as the sample for the search (search structure); and a text search step of searching the text base by having the collation unit collate, using the search structure created in the structure generation step as the sample, the sentence analysis result of the text base analysis unit with the input-sentence analysis result of the sentence analysis unit.

[Operation]

In the text base search method according to the present invention, a search request for text base search, for example a question sentence in Japanese, is analyzed, and the sentences that match the content of the search request are extracted from among all the sentences in the text base. The characteristic point is that, rather than performing keyword search, every sentence in the text base is checked for whether it matches the search request. Conventional text base search methods have had to rely on keyword search because analyzing the semantic content of text at search time is very difficult in itself, and because doing so within a practical response time is impossible. In contrast, the text base search method according to the present invention does not extract meaning from the text. Instead, it extracts, as text matching the content of the question sentence given as the search request, text that contains the words of the question sentence, or one of their synonyms, in the same connection relation (case relation) as in the question sentence. This speeds up processing and achieves a practical response speed.

[Embodiment]

An embodiment of the present invention is described in detail below with reference to the drawings.

FIG. 1 is a schematic flow of a text base search method showing one embodiment of the present invention. In the figure, 1 denotes an input unit, 10 an analysis processing unit, 3 a word dictionary, 6 a synonym dictionary, 7 a text search unit, and 8 a text base. The analysis processing unit 10 has the function of executing the processing steps described later: sentence analysis step 2, structure generation step 4, and synonym expansion step 5.

The word dictionary 3 stores the information used for morphological analysis and syntactic analysis in sentence analysis step 2. An example of the word dictionary 3 is shown in FIG. 2. Its contents consist of a headword, the part of speech of that word, and the grammatical information needed for syntactic analysis. In the example of FIG. 2, the grammatical information indicates, for a function word, the kinds of words to which that function word can attach ("nominal" etc. in the case of a case particle), and, to the right of the ":", the case information of the word to which the function word attaches. Here the case information is given as surface case.

The synonym dictionary 6 stores words expressing similar meanings and is referred to in synonym expansion step 5, described later. As the example in FIG. 3 shows, its contents consist of a headword and a set of words having meanings similar to that word. The text base 8 is the collection of sentences to be searched, held by some means in a state that a computer can handle directly, for example on a magnetic disk or magnetic tape. The input unit 1 is where a search request (question) for searching text is entered as a natural-language sentence. The sentence is taken into the apparatus through a character-encoding process such as key input, voice input, or character pattern recognition.

Sentence analysis step 2 analyzes the sentence entered at the input unit 1 and determines its grammatical structure. This comprises morphological analysis, which identifies and separates the words making up the sentence, and syntactic analysis, which determines the structure of the sentence from how those words are connected. The syntactic analysis performed in step 2 extracts the case structure corresponding to each predicate in the sentence.

For this syntactic analysis, a method that prepares case structures (case frames) corresponding to a case grammar and extracts the content of the sentence, such as that of Fillmore and others, can be used. For an overview of this processing, see for example Nagao, "Language Engineering" (Shokodo, 1983).

Structure generation step 4 takes the result of sentence analysis step 2, extracts the words to be used in the search, and from the relations among those words generates the structure that serves as the reference for the search (hereinafter the "search structure"). When several natural-language expressions could convey the same content, they are converted to a single representative expression, as described later.

Synonym expansion step 5 reinforces the search structure: for each word in the search structure generated in structure generation step 4, it consults the synonym dictionary 6 and selects words expressing similar meanings, as described later.

The text search unit 7 searches the text base 8 using the search structure produced up to synonym expansion step 5 as the sample, and outputs whatever matches that search structure as the search result. In doing so, it performs morphological analysis and syntactic analysis using the word dictionary 3, in the same way as sentence analysis step 2.

The operation of the text base search method of this embodiment, configured as above, is described below for the case where the input unit 1 passes the input sentence "テキストを検索する" ("search text") to the subsequent processing. In sentence analysis step 2, morphological analysis and syntactic analysis are applied to the input sentence, decomposing it into "テキスト / noun", "を / case particle", and "検索する / verb", and further determining that the action of the input sentence is "検索" (search) and that the object of "検索" is "テキスト" (text).

For multiple expressions conveying substantially the same content, such as active and passive expressions, the syntactic analysis results are output for each input sentence, as shown for example in FIG. 4. In structure generation step 4, a "search structure" indicating one or more words to be searched for and the relations among them is generated from the output of sentence analysis step 2. Expressed in LISP-style notation, this is (検索 (対象 テキスト)), that is, (search (object text)).

As noted above, when there are multiple expressions conveying substantially the same content, they are converted to a representative structure. That is, as shown in FIG. 4, from each of the sentences "テキストを検索する" ("search text"), "テキストが検索される" ("text is searched"), and "テキストの検索" ("search of text"), the processing shown in FIG. 5 generates the same structure, (検索 (対象 テキスト)).

In synonym expansion step 5, the words contained in the search structure are expanded into their synonyms by consulting the synonym dictionary 6. For example, if the synonym dictionary 6 lists "探す" (look for) as a synonym of "検索", and "文書" (document) and "文章" (writing) as synonyms of "テキスト", the search structure is reinforced to ((検索 探す) (対象 (テキスト 文書 文章))).

Next, the text search unit 7 analyzes each sentence in the text base 8 in text base analysis step 71, and in collation step 72 outputs as matching documents those in which the words appear in the same relation as in the search structure handed over from synonym expansion step 5. In the above example, therefore, "文書を探す" ("look for a document") and "文書が検索される" ("a document is searched") are judged to match, whereas "テキストで検索する" ("search by text") is judged not to match.

According to the above embodiment, with a text base consisting of natural-language documents as the search target, the text to be retrieved is specified in natural language, and the search uses the relations among the words of the input sentence, with the words used in the input sentence expanded to their synonyms. The following effects are obtained:

(1) No advance processing of the text base is needed, so the loss of information that such processing causes is avoided.
(2) The system can be used without special expertise.
(3) Semantically close items can be retrieved.
(4) A wide variety of input sentences can be handled.

The above embodiment is given as one example, and the present invention is of course not limited to it. For example, text base analysis step 71 and the sentence analysis step realize the same function and can share a single block (module). Furthermore, a structure generation step similar to structure generation step 4 of the analysis processing unit 10 may be provided between text base analysis step 71 and step 72.
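The collation described in the embodiment can be illustrated with a minimal Python sketch. This is not the patented implementation: morphological and syntactic analysis are not performed here, so each sentence is paired with a hand-written case relation standing in for the output of steps 2 and 71. The names `CaseRelation`, `expand`, `build_search_structure`, `matches`, and `search` are hypothetical, and the case label "手段" (instrument) for "テキストで検索する" is an assumption; the source only states that this sentence does not match.

```python
from dataclasses import dataclass

# Hypothetical synonym dictionary (headword -> similar words), after FIG. 3.
SYNONYMS = {"検索": {"探す"}, "テキスト": {"文書", "文章"}}


@dataclass(frozen=True)
class CaseRelation:
    """One predicate/case/argument triple, as a parser is assumed to emit it."""
    predicate: str  # e.g. "検索" (search)
    case: str       # e.g. "対象" (object); the label set is an assumption
    argument: str   # e.g. "テキスト" (text)


def expand(word):
    """Synonym expansion (step 5): the word plus its listed synonyms."""
    return {word} | SYNONYMS.get(word, set())


def build_search_structure(query):
    """Steps 4 and 5 for one relation: ((検索 探す) (対象 (テキスト 文書 文章)))."""
    return expand(query.predicate), query.case, expand(query.argument)


def matches(structure, relation):
    """Collation (step 72): same case relation, words from the expanded sets."""
    predicates, case, arguments = structure
    return (relation.predicate in predicates
            and relation.case == case
            and relation.argument in arguments)


def search(structure, analysed_text_base):
    """Return every sentence with at least one relation matching the structure."""
    return [sentence for sentence, relations in analysed_text_base
            if any(matches(structure, r) for r in relations)]


# Hand-written stand-in for the output of text base analysis step 71.
TEXT_BASE = [
    ("文書を探す", [CaseRelation("探す", "対象", "文書")]),
    ("文書が検索される", [CaseRelation("検索", "対象", "文書")]),  # passive, normalised
    ("テキストで検索する", [CaseRelation("検索", "手段", "テキスト")]),
]

structure = build_search_structure(CaseRelation("検索", "対象", "テキスト"))
hits = search(structure, TEXT_BASE)
print(hits)
```

Keeping the case label in the structure is what separates this from keyword search: "テキストで検索する" contains both query words but is rejected because "テキスト" does not stand in the object relation to "検索".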

【Effect of the invention】

As described in detail above, according to the present invention, meaning is not extracted from the text. Instead, text that contains the words of the question sentence, or one of their synonyms, in the same connection relation (case relation) as in the question sentence is extracted as text matching the content of the question sentence given as the search request. This has the notable effect of realizing a text base search method that replaces keyword search, has high search accuracy, and can also retrieve matters that are described only supplementarily.

[Brief explanation of drawings]

FIG. 1 is a flowchart of a text base search method showing one embodiment of the present invention, FIG. 2 is a diagram showing an example of the contents of the word dictionary, FIG. 3 is a diagram showing an example of the contents of the synonym dictionary, FIG. 4 is a diagram showing an example of syntactic analysis results, and FIG. 5 is a flowchart showing details of the processing of the structure generation step.

1: input unit, 10: analysis processing unit, 3: word dictionary, 6: synonym dictionary, 7: text search unit, 8: text base, 2: sentence analysis step, 4: structure generation step, 5: synonym expansion step, 71: text base analysis step, 72: collation step.

FIG. 2 (column headings): [Headword] [Part of speech] [Grammatical information]
FIG. 3 (column headings): [Headword] [Synonyms]; surviving entries: 文書, 文章; 探す
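The two dictionaries of FIG. 2 and FIG. 3 could be represented roughly as follows. This is a sketch under assumptions: the figures themselves are not reproduced in this text, so the entries, the "体言:ヲ格" field value, and the names `WordEntry`, `parse_grammar_info`, `WORD_DICTIONARY`, and `SYNONYM_DICTIONARY` are all illustrative. Only the column layout (headword, part of speech, grammatical information, with case information to the right of ":") and the synonym pairs come from the description.

```python
from typing import NamedTuple, Optional, Tuple


def parse_grammar_info(field: str) -> Tuple[str, Optional[str]]:
    """Split 'attachable word kind:case' at the ':' used in FIG. 2."""
    attaches_to, sep, case = field.partition(":")
    return attaches_to, (case if sep else None)


class WordEntry(NamedTuple):
    headword: str
    part_of_speech: str
    grammar_info: str = ""  # only function words carry this field


# Illustrative entries; the real contents of FIG. 2 are not available here.
WORD_DICTIONARY = {
    entry.headword: entry
    for entry in [
        WordEntry("テキスト", "名詞"),
        WordEntry("を", "格助詞", "体言:ヲ格"),  # attaches to nominals, surface case
        WordEntry("検索する", "動詞"),
    ]
}

# Headword -> set of similar words, using the pairs named in the description.
SYNONYM_DICTIONARY = {
    "テキスト": {"文書", "文章"},
    "検索": {"探す"},
}

particle = WORD_DICTIONARY["を"]
print(particle.part_of_speech, parse_grammar_info(particle.grammar_info))
```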

Claims

[Claims]

(1) A text base search method in a text base search system having: a word dictionary that stores headwords together with their part-of-speech information, grammatical information, etc.; a text base that stores documents written in natural language; an input unit for inputting a sentence in natural language; a sentence analysis unit that divides the input sentence into words (morphological analysis) and analyzes the grammatical structure of the input sentence from the part-of-speech information and grammatical information of the divided words (syntactic analysis); and means for searching the text base based on the analysis result of the sentence analysis unit, wherein the system is provided with a synonym dictionary that stores words having the same or a similar meaning as the headwords, a text base analysis unit that performs morphological analysis and syntactic analysis of the sentences of the text base, and a collation unit that collates the sentence analysis result of the text base analysis unit with the input-sentence analysis result of the sentence analysis unit, the method comprising: a structure generation step of selecting, from the input sentence, one or more words to be targeted at search time and generating, based on the case relations between those words, a structure to serve as the sample for the search (search structure); and a text search step of searching the text base by having the collation unit collate, using the search structure created in the structure generation step as the sample, the sentence analysis result of the text base analysis unit with the analysis result of the input sentence by the sentence analysis unit.