JP3442422B2 - Synonym information creation apparatus and method

Info

Publication number: JP3442422B2
Application number: JP05439993A
Authority: JP
Inventors: 一男住田; 誠司三池
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1993-03-15
Filing date: 1993-03-15
Publication date: 2003-09-02
Anticipated expiration: 2018-09-02
Also published as: JPH06266769A

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an apparatus that extracts synonym information, such as synonymy relations and broader-narrower relations between words or compound words, by analyzing natural language sentences in documents registered in a document database, and to a document retrieval apparatus that retrieves documents with high accuracy using the synonym information created by the synonym information creation apparatus.

[0002]

[Prior Art] Attempts have long been made to improve the accuracy of document retrieval by using synonyms, broader terms, and narrower terms. In conventional document retrieval apparatuses that use synonym information, however, all of the synonym information must be prepared and entered by hand. Because synonym information depends strongly on the words used in the documents stored in the document database, general-purpose information cannot be prepared in advance. Synonym information therefore has to be prepared for each document database handled, and creating the synonym information needed for a document retrieval apparatus that retrieves accurately has required a great deal of cost and labor.

[0003] As one approach to creating such synonym information, attempts have been made to build a thesaurus from a Japanese language dictionary (Information Processing Society of Japan, Special Interest Group on Natural Language Processing, technical report 83-16). That report describes a method that extracts broader-narrower relations between words from the definition sentences of a Japanese dictionary and uses them to construct a thesaurus automatically. As already noted, however, the words needed as synonym information depend on the document database to be searched. Even if a general-purpose thesaurus is built, its value in a retrieval apparatus is therefore bound to be low.

[0004] An apparatus that automatically extracts synonym information from sentences, using a method similar to the above method of building a thesaurus from dictionary definition sentences, has also been disclosed (Japanese Unexamined Patent Publication No. H4-123264). In that apparatus, when an input sentence matches a pre-registered syntactic pattern expressing a synonymy relation, a synonymy relation is automatically assigned between the corresponding words in the matched sentence. Documents are then retrieved using a synonym dictionary that stores this synonymy information.

[0005] However, natural language analysis techniques cannot extract correct synonym information fully automatically. When synonym information is extracted by automatic processing, words in incorrect synonymy or broader-narrower relations may be extracted. A search that uses a synonym dictionary storing such information will retrieve wrong documents. The effort of weeding out those wrong documents ends up increasing the user's workload, which is a problem in practical use.

[0006]

[Problem to Be Solved by the Invention] Entering all synonym information by hand requires enormous labor costs and is not realistic. With the method of creating synonym information from dictionary definition sentences, it is inherently difficult to create synonym information that is both general-purpose and useful to users, so an apparatus that makes the creation of synonym information for each user more efficient has been desired. In addition, synonym information automatically extracted from natural language sentences contains errors and cannot be used automatically as synonym information.

[0007] The present invention provides a synonym information creation apparatus that extracts candidates for synonym information from documents written in natural language, allows corrections and additions to be made on the basis of those candidates, and thereby enables a synonym dictionary for document retrieval to be created smoothly.

[0008]

[Means for Solving the Problem] To achieve the above object, the synonym information creation apparatus of the present invention comprises: means for extracting, from natural language sentences, synonym information candidates such as words and compound words in a synonymy relation or a broader-narrower relation; means for displaying the extracted synonym information candidates; means for correcting and adding to the displayed synonym information candidates; means for registering the corrected and supplemented synonym information candidates in a synonym dictionary; and means that, on input of a representative example of a natural language sentence expressing a synonymy relation, covers matching against expressions similar to that sentence.

[0009] In addition, the synonym information creation apparatus of the present invention has means for detecting unregistered words and prompts the user to enter synonym information for an unregistered word even when no synonym information candidate has been extracted for it.

[0010] Furthermore, to enable synonym information candidates to be extracted with high accuracy, the apparatus has means for registering sentence patterns that are exceptions to an extraction rule. When a sentence matches such an exceptional sentence pattern, no synonym information is extracted from that sentence even if it also matches the corresponding extraction rule.

[0011]

[Operation] Candidates for synonym information are thus extracted from documents, after which the user can check them and make corrections and additions. This makes it easy to create a synonym dictionary that enables highly accurate document retrieval.

[0012]

[Embodiments] The first embodiment is described first. Embodiments of the present invention are described with reference to the drawings. FIG. 1 shows the equipment configuration used to implement an embodiment of the present invention.

[0013] The equipment consists of input means 101, such as a keyboard, including a pointing device such as a mouse; output means 103, such as a CRT or bitmap display; storage means 104, such as semiconductor memory, a magnetic disk, or an optical disk; and processing means 102, which performs sentence analysis and synonym information extraction.

[0014] FIG. 2 shows the functional configuration. The apparatus consists of: a document data storage unit 205, which stores document data; a sentence analysis unit 201, which performs morphological analysis and syntactic analysis sentence by sentence; an unnecessary word dictionary 206, which registers words that need not be extracted as the original word or a synonym in synonym information; an extraction rule storage unit 208, which stores the syntactic patterns used to extract synonym information; a synonym candidate extraction unit 202, which analyzes the syntactic structure received from the sentence analysis unit 201 according to the information in the extraction rule storage unit 208 and the unnecessary word dictionary 206, extracts synonym candidates, and stores them in the synonym candidate storage unit 209; a display/correction unit 203, which displays the candidates stored in the synonym candidate storage unit 209, accepts additions and corrections to the synonym information, and stores the result in the synonym dictionary 207; an extraction rule input unit 204, through which extraction rules to be stored in the extraction rule storage unit 208 are entered; the synonym candidate storage unit 209, which stores synonym candidates; the synonym dictionary 207, which stores confirmed synonyms; and a control unit 210, which controls the whole apparatus.

[0015] In FIG. 2, the document data storage unit 205, the unnecessary word dictionary 206, the extraction rule storage unit 208, the synonym dictionary 207, and the synonym candidate storage unit 209 correspond to the storage means 104 in FIG. 1. The sentence analysis unit 201, the synonym candidate extraction unit 202, the display/correction unit 203, the extraction rule input unit 204, and the control unit 210 are executed by the processing means 102 in FIG. 1.

[0016] The sentence analysis unit 201 can use the morphological analysis and syntactic analysis techniques employed in commercially available machine translation systems and the like. FIG. 3 shows the processing flow of the control unit 210, the component module that controls the apparatus as a whole. The control unit 210 waits for input from the keyboard or mouse and, when input arrives, performs processing according to the input code. When an "extract" command is entered, it starts the extraction process of FIG. 4(a). When a "rule input" command is entered, it starts the rule input process of FIG. 4(b). When a "display/correct" command is entered, it starts the display/correction process of FIG. 4(c). When an "end" command is entered, all processing ends.
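
The dispatch behaviour of the control unit 210 can be sketched roughly as follows. This is only an illustration under stated assumptions: commands are taken to arrive as plain strings, unknown input codes are assumed to be ignored, and the names `run_control_loop` and `handlers` are hypothetical rather than taken from the patent.

```python
from typing import Callable, Dict, Iterable, List


def run_control_loop(commands: Iterable[str],
                     handlers: Dict[str, Callable[[], None]]) -> List[str]:
    """Dispatch each command to its handler until "end" is received.

    Returns the commands that were actually dispatched.
    """
    dispatched: List[str] = []
    for command in commands:
        if command == "end":        # "end" terminates all processing
            break
        handler = handlers.get(command)
        if handler is None:         # assumption: unknown input codes are ignored
            continue
        handler()
        dispatched.append(command)
    return dispatched


log: List[str] = []
handlers = {
    "extract": lambda: log.append("extraction process"),
    "rule input": lambda: log.append("rule input process"),
    "display/correct": lambda: log.append("display/correction process"),
}
done = run_control_loop(
    ["extract", "bogus", "display/correct", "end", "extract"], handlers)
```

Commands that follow "end" are never dispatched, which mirrors the statement that the "end" command finishes the whole process.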

[0017] When the extraction process is started, the sentence analysis unit 201 is invoked on the sentences stored in the document data storage unit 205 to perform syntactic analysis. The synonym candidate extraction unit 202 is then invoked with the resulting parse as input to extract synonym candidates.

[0018] FIG. 5 shows the processing flow of the synonym candidate extraction unit 202. The unit checks whether the input syntactic pattern matches an extraction rule stored in the extraction rule storage unit 208 and, if it does, stores the triple (original word, synonym candidate, relation) in the synonym candidate storage unit 209. The triple is not stored, however, if the original word or the synonym candidate is registered in the unnecessary word dictionary 206, or if the extracted pair of original word and synonym is already stored in the synonym candidate storage unit 209.
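
The two skip conditions described above can be sketched as a small filter. This is a minimal sketch, assuming the unnecessary word dictionary is reduced to a set of headwords (the part-of-speech column is left out) and the candidate store is a plain list; `store_candidate` is a hypothetical name.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (original word, synonym candidate, relation)


def store_candidate(triple: Triple,
                    unnecessary_words: Set[str],
                    candidates: List[Triple]) -> bool:
    """Append the triple to the candidate store unless it must be skipped."""
    original, synonym, _relation = triple
    # Skip when either word is registered in the unnecessary word dictionary.
    if original in unnecessary_words or synonym in unnecessary_words:
        return False
    # Skip when the (original, synonym) pair is already stored.
    if any(o == original and s == synonym for o, s, _ in candidates):
        return False
    candidates.append(triple)
    return True


store: List[Triple] = []
stop = {"thing"}
r1 = store_candidate(("CD", "negotiable deposit", "synonym"), stop, store)
r2 = store_candidate(("CD", "negotiable deposit", "synonym"), stop, store)
r3 = store_candidate(("CD", "thing", "broader"), stop, store)
```

Only the first call stores anything; the second is rejected as a duplicate pair and the third because "thing" is an unnecessary word.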

[0019] FIG. 6 shows the format of the unnecessary word dictionary 206 and an example of its contents. The unnecessary word dictionary 206 registers words that cannot serve as keywords for retrieval. As illustrated, each entry consists of a headword and a part of speech. In the illustrated example, the stored contents indicate that nouns such as 「こと」 (koto) and 「もの」 (mono), both roughly meaning "thing", are unnecessary as keywords.

[0020] FIG. 7 shows the format of the extraction rule storage unit 208 and an example of its contents. The extraction rule storage unit 208 stores syntactic patterns for extracting synonym information, together with the relation between the words extracted by each pattern. Rule 1 in the illustrated example holds the syntactic pattern for the sentence "A1 is a kind of A2." (「Ａ１はＡ２の一種である。」) and indicates that the relation this pattern expresses is a broader-term relation. Here "A1" and "A2" have special meanings: they denote, respectively, the original word and a synonym of that original word (including broader and narrower terms).

[0021] The symbols "ga", "dk", and "no" represent cases in the syntactic structure. FIG. 8 shows an example of the extraction process applied to a sentence. The input sentence is first analyzed in step 302 of FIG. 4(a). The synonym candidate extraction unit 202 is then invoked in step 303.

[0022] Following the processing flow of FIG. 5, the synonym candidate extraction unit 202 matches the sentence against the extraction rules stored in the extraction rule storage unit 208. In the example of FIG. 8, rule 2 matches. The words matching "A1" and "A2" of the extraction rule are "CD" and "negotiable deposit" (譲渡性預金), respectively. The triple stored in the synonym candidate storage unit is therefore (CD, negotiable deposit, synonym).
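
The way "A1" and "A2" bind to words in a matched sentence can be illustrated with a much simplified sketch. The apparatus described here matches on parsed syntactic structure with case markers; the code below instead matches English surface strings with regular expressions, purely as a stand-in. The English templates, the relation labels, and the names `compile_template` and `extract_triple` are assumptions made for illustration.

```python
import re
from typing import List, Optional, Tuple

Rule = Tuple[str, str]          # (template containing A1 and A2, relation)
Triple = Tuple[str, str, str]   # (original word, synonym candidate, relation)


def compile_template(template: str) -> "re.Pattern[str]":
    """Turn a template such as 'A1 is a kind of A2.' into a regex."""
    pattern = re.escape(template)
    # A1 and A2 are alphanumeric, so re.escape leaves them untouched.
    pattern = pattern.replace("A1", r"(?P<a1>.+?)").replace("A2", r"(?P<a2>.+?)")
    return re.compile(pattern)


def extract_triple(sentence: str, rules: List[Rule]) -> Optional[Triple]:
    """Return the triple from the first rule whose template matches the sentence."""
    for template, relation in rules:
        match = compile_template(template).fullmatch(sentence)
        if match:
            return (match.group("a1"), match.group("a2"), relation)
    return None


RULES: List[Rule] = [
    ("A1 is a kind of A2.", "broader"),   # English stand-in for rule 1
    ("A1 means A2.", "synonym"),          # English stand-in for rule 2
]
result = extract_triple("CD means negotiable deposit.", RULES)
```

With these stand-in rules, the sentence yields the same triple as the example of FIG. 8: ("CD", "negotiable deposit", "synonym").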

[0023] FIG. 9 shows the format of the synonym candidate storage unit and an example of its contents. The synonym candidate storage unit stores, at a minimum, the triple (original word, synonym, relation). In the illustrated example, the identifier of the rule used in the match is also stored as information about the match.

[0024] FIGS. 10 to 13 show the processing flow of the display/correction unit 203, and FIG. 13 shows examples of display and input. In FIG. 13(b), area 1001 displays the original word, area 1002 displays broader terms, area 1003 displays synonyms, and area 1004 displays narrower terms. Area 1005 holds buttons for entering commands to the display/correction unit. FIG. 13 does not show labels indicating the type (broader term, narrower term, synonym, and so on), but such labels may be displayed.

[0025] Once started, the display/correction unit 203 displays a list of the original words stored in the synonym candidate storage unit 209 (FIG. 13(a)). When an original word is selected from the list, the synonym information for that word is displayed (FIG. 13(b)).

[0026] With the synonym information displayed, the unit waits for one of the commands in the menu shown in FIG. 13(b) to be entered: "correct", "next", "store", "related words", or "end".

[0027] For a "correct" command, the correction process is executed in step 906 of FIG. 10. The correction process is the subroutine starting at step 915 in FIGS. 11 and 12. For a "next" command, synonym information is displayed for an original word that has not yet been checked and corrected.

[0028] For a "store" command, the synonym information currently displayed is stored in the synonym dictionary and deleted from the synonym candidate storage unit 209. For a "related words" command, all words related to the words in the synonym information currently being checked and corrected (that is, words appearing in other synonym information) are retrieved from the synonym candidate storage unit 209 and the synonym dictionary 207, and a list of them is displayed (FIG. 13(d)). In FIG. 13(d), "→" denotes a relation from a narrower term to a broader term, and "=" denotes a synonymy relation.
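
One way to read the "related words" lookup is as a scan of both stores for entries that share a word with the entry being edited. The sketch below assumes both stores are flattened into (original word, related word, relation) triples; the function name `related_entries` and the sample entries other than "CD" are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (original word, related word, relation)


def related_entries(current_original: str,
                    current_words: Set[str],
                    candidate_store: Iterable[Triple],
                    synonym_dictionary: Iterable[Triple]) -> List[Triple]:
    """Collect triples from both stores that share a word with the current entry."""
    hits: List[Triple] = []
    for triple in list(candidate_store) + list(synonym_dictionary):
        original, word, _relation = triple
        if original == current_original:
            continue  # skip the entry being edited itself
        shares_word = original in current_words or word in current_words
        if shares_word and triple not in hits:
            hits.append(triple)
    return hits


candidates = [("CD", "deposit", "broader"), ("time deposit", "deposit", "broader")]
dictionary = [("bond", "security", "broader")]
found = related_entries(
    "CD", {"CD", "deposit", "negotiable deposit"}, candidates, dictionary)
```

Here only the "time deposit" entry is returned, because it shares the word "deposit" with the entry for "CD".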

[0029] For an "end" command, the synonym information is cleared from the display and the display/correction unit finishes processing. For a "correct" command, step 906 causes the processing from step 915 onward to be performed. This processing operates on the individual words displayed as synonym information. The word to be corrected is therefore tracked by a variable called the "correction pointer", and the word the pointer indicates is highlighted.

[0030] To move the correction target, the user enters an input code. If the code corresponds to one of the predefined "next word", "previous word", "next relation", or "previous relation" commands, the value of the correction pointer is changed accordingly and the highlight is then moved in step 916.

[0031] Here, "next relation" is the command that moves the correction pointer to the synonym field if it points to a word in the broader-term field, to the narrower-term field if it points to a word in the synonym field, and to the broader-term field if it points to the narrower-term field.

[0032] Movement is cyclic: for example, if the correction pointer is on the last narrower term and a "next word" command is entered, the pointer moves to the first broader term.
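
The cyclic movement of the correction pointer can be sketched as follows. This is a minimal sketch, assuming the pointer is a (field, index) pair, that "next relation" lands on the first position of the next field, and that "next word" is only called while the pointer is on an existing word; the two narrower terms in the sample entry are placeholders, not data from the patent.

```python
from typing import Dict, List, Tuple

FIELDS = ["broader", "synonym", "narrower"]   # cyclic field order
Pointer = Tuple[str, int]                     # (field, index of the word in that field)
Entry = Dict[str, List[str]]


def next_relation(pointer: Pointer) -> Pointer:
    """Move to the first position of the next field, wrapping after 'narrower'."""
    field, _ = pointer
    following = FIELDS[(FIELDS.index(field) + 1) % len(FIELDS)]
    return (following, 0)


def next_word(pointer: Pointer, entry: Entry) -> Pointer:
    """Move to the next displayed word, wrapping from the last word to the first."""
    positions = [(f, i) for f in FIELDS for i in range(len(entry[f]))]
    current = positions.index(pointer)  # assumes the pointer is on an existing word
    return positions[(current + 1) % len(positions)]


entry: Entry = {
    "broader": ["deposit"],
    "synonym": ["negotiable deposit"],
    "narrower": ["narrower term 1", "narrower term 2"],  # placeholders
}
wrapped = next_word(("narrower", 1), entry)
```

From the last narrower term, `next_word` wraps to the first broader term, matching the cyclic behaviour described above.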

[0033] When a "correct word" command is entered, input for correcting the word is accepted in step 927 of FIG. 12. That is, a cursor is displayed as in FIG. 13(c) and the correction is accepted.

[0034] When an "add word" command is entered, the processing for adding a word is performed in step 929. For example, if the correction pointer in FIG. 13 was on "deposit" immediately beforehand, an input area is reserved and a cursor is displayed as in FIG. 14(f); if it was on "negotiable deposit", this is done as in FIG. 14(g).

[0035] When a "delete" command is entered, the word indicated by the correction pointer is deleted. FIG. 13(e) shows the case in which no synonym candidate exists for the relation in question (narrower terms, in the figure). If an "add word" command is entered in this case, a cursor is displayed at that position and input of a word is accepted.

[0036] In the display/correction unit 203 of this embodiment, the list of original words is displayed first and the user selects the original word to correct from that list. The unit can also be modified to display the synonym information for the first original word stored in the synonym candidate storage unit 209 instead.

[0037] FIG. 15 shows the format of the synonym dictionary 207 and an example of its contents. The synonym dictionary consists of an original-word headword, a relation, and synonyms. In the example of FIG. 15, for the original word "CD", the word "deposit" is a broader term, the word "negotiable deposit" is a synonym, and there is no narrower term.
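
A possible in-memory layout for such a dictionary is sketched below. The nesting (headword, then relation, then list of words) and the relation labels are assumptions chosen for illustration; the text only states that each entry has an original-word headword, a relation, and synonyms.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List

RELATIONS = ("broader", "synonym", "narrower")


def new_entry() -> Dict[str, List[str]]:
    """Create an empty entry with one word list per relation."""
    return {relation: [] for relation in RELATIONS}


synonym_dictionary: DefaultDict[str, Dict[str, List[str]]] = defaultdict(new_entry)


def register(original: str, relation: str, word: str) -> None:
    """Register a confirmed word under the given headword and relation."""
    if relation not in RELATIONS:
        raise ValueError("unknown relation: " + relation)
    words = synonym_dictionary[original][relation]
    if word not in words:       # keep each word only once per relation
        words.append(word)


register("CD", "broader", "deposit")
register("CD", "synonym", "negotiable deposit")
```

After these two calls the entry for "CD" has one broader term, one synonym, and an empty list of narrower terms, as in the example of FIG. 15.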

[0038] FIG. 16 shows the processing flow of the extraction rule input unit 204, and FIG. 18 shows examples of its display and input. The extraction rule input unit 204 manages the syntactic pattern and the relation being entered by holding them in a syntactic pattern temporary storage and a relation temporary storage, respectively.

[0039] The extraction rule input unit 204 performs the processing that corresponds to the command entered by the user, which is one of "input", "relation input", "store", or "end".

[0040] For an "input" command, the processing from step 1204 of FIG. 16 onward is performed (FIG. 18(a)). That is, the user's input is accepted (FIG. 18(b)), the input sentence is analyzed, and the syntactic pattern resulting from the analysis is stored in the syntactic pattern temporary storage. If, however, the input sentence does not contain the character string "A1", which denotes the original word, or the character string "A2", which denotes the synonym, a warning to that effect is displayed (FIG. 18(d)).
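
The placeholder check on an example sentence might look like the sketch below. It covers only the check itself, not the sentence analysis that follows; the function names and the wording of the warning are hypothetical.

```python
from typing import List

PLACEHOLDERS = ("A1", "A2")  # A1: original word, A2: synonym


def missing_placeholders(sentence: str) -> List[str]:
    """Return the placeholders that do not occur in the example sentence."""
    return [p for p in PLACEHOLDERS if p not in sentence]


def check_rule_sentence(sentence: str) -> str:
    """Return a warning message, or an empty string if the sentence is usable."""
    missing = missing_placeholders(sentence)
    if missing:
        return "warning: missing " + " and ".join(missing)
    return ""


ok = check_rule_sentence("A1 is a kind of A2.")
bad = check_rule_sentence("CD means deposit.")
```

A sentence that names both placeholders passes silently, while one that names neither produces a warning listing both.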

[0041] When "broader", "synonym", or "narrower" is entered as the "relation input", it is stored in the relation temporary storage and displayed as in FIG. 18(c). When a "store" command is entered, the data are stored in the extraction rule storage unit 208, provided that the respective data are held in the syntactic pattern temporary storage or the relation temporary storage. If no data have been set in the temporary storage, a warning is displayed as in FIG. 18(e).

[0042] FIG. 17 shows an example of a sentence entered through the extraction rule input unit 204 being converted into a rule. In the input sentence, the character strings "A1" and "A2" denote the original word and the synonym, respectively.

[0043] As described above, this embodiment makes it possible to provide an apparatus that extracts synonym candidates from the sentences of the documents stored in the document data storage unit 205 and allows that information to be checked, corrected, and supplemented easily.

[0044] The second embodiment is described next. In the second embodiment, processing of unregistered words is added to the synonym candidate extraction unit 202. In document retrieval generally, the words that serve as important keywords are often not words available in general-purpose resources. For example, in searching newspaper articles, proper nouns are important keywords, and if the search is restricted to articles on new product launches, product names become important keywords. It is inherently impossible to prepare such words in advance. In morphological analysis, syntactic analysis, and similar processing, these words are therefore handled as unregistered words.

[0045] It is important to enter synonym information for these unregistered words. FIG. 19 shows the processing flow of the synonym candidate extraction unit 202 in the second embodiment. First, step 1501 determines whether the input sentence contains an unregistered word. If it does not, the processing shown in FIG. 5 is performed in step 1506.

[0046] Next, the unit determines whether the sentence matches an extraction rule when the unregistered word is taken as the original word (step 1502). Unless the pair has already been extracted as a synonym candidate, or the original word or the synonym candidate is present in the unnecessary word dictionary, the triple (original word, synonym candidate, relation) is stored in the synonym candidate storage unit 209 (steps 1503, 1504, and 1505).

[0047] If the sentence does not match any extraction rule, the triple (original word, <null>, <null>) is stored in the synonym candidate storage unit in step 1507. The other processing units are exactly the same as in the first embodiment, so a detailed description is omitted.
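
The handling of an unregistered word can be sketched as below. This is an illustration under assumptions: the result of rule matching is passed in as `match` (either a (synonym candidate, relation) pair or `None`), `None` stands in for `<null>`, and the duplicate check is also applied to the null triple, which the text does not state. The product name "WidgetPro" is made up.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, Optional[str], Optional[str]]  # (original, candidate, relation)


def record_unregistered(word: str,
                        match: Optional[Tuple[str, str]],
                        unnecessary_words: Set[str],
                        store: List[Triple]) -> None:
    """Store a triple for an unregistered word, with or without a rule match."""
    if match is None:
        # No extraction rule matched: keep the word so the user is prompted later.
        triple: Triple = (word, None, None)
    else:
        synonym, relation = match
        if word in unnecessary_words or synonym in unnecessary_words:
            return
        triple = (word, synonym, relation)
    # Assumption: the same (original, candidate) pair is stored only once.
    if any(t[0] == triple[0] and t[1] == triple[1] for t in store):
        return
    store.append(triple)


store: List[Triple] = []
stop = {"thing"}
record_unregistered("WidgetPro", None, stop, store)
record_unregistered("WidgetPro", None, stop, store)
record_unregistered("CD", ("negotiable deposit", "synonym"), stop, store)
record_unregistered("CD", ("thing", "broader"), stop, store)
```

The word without a rule match is kept with empty slots, so that it still shows up for the user to fill in during display and correction.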

[0048] As described above, in the second embodiment unregistered words are also stored in the synonym candidate storage unit 209, so the display/correction unit 203 can operate with exactly the same processing as in the first embodiment.

[0049] Unregistered word extraction can also be replaced with important word extraction. Important words can be extracted with existing, previously disclosed processing (Transactions of the Institute of Electronics, Information and Communication Engineers, D-I, Vol. J74-D-I, No. 8). That is, the sentences stored in the document data storage unit 205 are analyzed in advance to determine the important words, and an important word table is created.

[0050] The synonym candidate extraction unit 202 can then be modified so that, when the input sentence contains an important word included in the important word table, it performs the same processing as in FIG. 19.

[0051] The third embodiment is described next. In the first embodiment, synonym information is extracted on the basis of positive extraction rules, but it is difficult to get by with positively expressed syntactic patterns alone. In the third embodiment, the apparatus is therefore modified so that two types of negative rule can be written. One type expresses an exception to a syntactic pattern; the other means that words matching its syntactic pattern are not to be extracted as being in a synonymy relation.

[0052] FIG. 20 shows an example of how exception rules are stored. Rule 15 is an example of an exception rule. The relation "null-synonym" means that when a sentence matches this syntactic pattern, the sentence also matches the corresponding rule (rule 2), but that match is invalidated. In other words, the sentence "A digitized dictionary does not mean an electronic dictionary." (「電子化辞書とは電子的辞書のことでない。」) matches the syntactic pattern of the sentence "A1 means A2." (「Ａ１とはＡ２のことである」), but the rule means that "digitized dictionary" (電子化辞書) and "electronic dictionary" (電子的辞書) are not to be extracted as synonyms.

[0053] A rule of this type is effective only with respect to its corresponding rule. Rule 16 is an example of an invalidation rule. The relation "not-synonym" means that when a sentence matches the syntactic pattern, the combination of words concerned is not in a synonymy relation. This example means that no synonymy relation whatsoever is to be set between words expressed in the form "A1 is different from A2." (「Ａ１はＡ２と違う。」).

【００５４】When a sentence matches this rule, the data are stored in an invalid word relation table, whose entries are triples of (original word, synonym, relation). Thereafter, words matching such a triple are not extracted as synonym candidates.
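As an illustration only, the interplay of a positive extraction rule, an exception rule ("null-synonym") and an invalidation rule ("not-synonym") can be sketched in Python as follows. The regular expressions are hypothetical English stand-ins for the syntactic patterns, which the apparatus obtains from syntactic analysis. The names `RULES`, `invalid_table`, `candidates` and `process`, the example sentences, and the choice to treat an invalidated pair as unordered are assumptions that do not come from the specification.

```python
import re

# Toy stand-ins for syntactic patterns (assumption: a real system would
# match parser output, not raw strings). Negative rules are tried first.
RULES = [
    ("not-synonym",  re.compile(r"^(?P<a>.+?) is different from (?P<b>.+?)\.$")),
    ("null-synonym", re.compile(r"^(?P<a>.+?) is not (?P<b>.+?)\.$")),
    ("synonym",      re.compile(r"^(?P<a>.+?) is (?P<b>.+?)\.$")),
]

invalid_table = []  # (original word, synonym, relation) triples
candidates = []     # extracted synonym candidates


def is_invalidated(a, b):
    """True if the pair was recorded by an invalidation rule (order ignored)."""
    return any({a, b} == {o, s} for o, s, _ in invalid_table)


def process(sentence):
    for relation, pattern in RULES:
        m = pattern.match(sentence)
        if m is None:
            continue
        a, b = m.group("a"), m.group("b")
        if relation == "not-synonym":
            # Invalidation rule: remember the pair so it is never extracted later.
            invalid_table.append((a, b, relation))
        elif relation == "synonym" and not is_invalidated(a, b):
            candidates.append((a, b, relation))
        # "null-synonym" (exception rule): the positive match is cancelled,
        # so nothing is stored.
        return  # the first matching rule decides the outcome


process("A computerized dictionary is not an electronic dictionary.")  # exception
process("A notebook is different from a laptop.")                      # invalidation
process("A notebook is a laptop.")                                     # suppressed
process("A PC is a personal computer.")                                # extracted
print(candidates)
```

In this sketch the exception rule only cancels a single match, whereas the invalidation rule leaves an entry in the table that also blocks later sentences, which mirrors the difference between the two negative rule types described above.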

【００５５】The processing for exception rules is added to the synonym candidate extraction unit 202. The processing flow is shown in FIG. 21. In this embodiment, when an input sentence matches an exception rule or an invalidation rule, that information is used in the subsequent extraction processing. It is easy, however, to modify this so that, when an input sentence matches an exception rule or an invalidation rule, the synonym candidate storage unit 209 is scanned for the corresponding synonym information and that information is deleted from the synonym candidate storage unit 209. Furthermore, by newly adding to each (original word, synonym, relation) triple a field that stores a flag indicating that the triple should be deleted, it is also easy to modify the display/correction unit so that deletions can be confirmed interactively instead of being performed all at once.
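The flag-based variant mentioned above could look roughly like the following Python sketch. It is not taken from the specification: the `Candidate` class, the `delete_flag` field name, the `confirm` callback standing in for the interactive display/correction unit, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One (original word, synonym, relation) triple plus a deletion flag."""
    original: str
    synonym: str
    relation: str
    delete_flag: bool = False  # assumed name for the added field


def flag_invalidated(store, a, b):
    """Scan the candidate store and flag entries for the invalidated pair."""
    flagged = 0
    for c in store:
        if {c.original, c.synonym} == {a, b}:
            c.delete_flag = True
            flagged += 1
    return flagged


def purge(store, confirm):
    """Drop flagged entries only when the user-supplied callback confirms."""
    return [c for c in store if not (c.delete_flag and confirm(c))]


store = [
    Candidate("J3100", "personal computer", "superordinate"),
    Candidate("notebook", "laptop", "synonym"),
]
n_flagged = flag_invalidated(store, "laptop", "notebook")
remaining = purge(store, confirm=lambda c: True)  # stand-in for a user prompt
print(n_flagged, [c.original for c in remaining])
```

Keeping the flag separate from the deletion step is what allows the confirmation to be interactive: a callback that returns `False` leaves the flagged entry in place.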

【００５６】Next, a fourth embodiment will be described. In the fourth embodiment, processing is added that applies the synonym information extracted by the synonym candidate extraction unit to other original words as well. FIG. 22 shows the processing flow of the synonym candidate extraction unit 202 of this embodiment. It differs from FIG. 19, which shows the processing flow of the synonym candidate extraction unit of the second embodiment, in steps 1808 and 1809. The other steps (1801 to 1807) are exactly the same as steps 1501 to 1507 of FIG. 19.

【００５７】If the input sentence does not match any extraction rule, step 1808 calculates the degree of string match between the unregistered word being processed and every original word stored in the synonym candidate storage unit 209 or the synonym dictionary 207. If there is an original word whose degree of match exceeds a predetermined threshold, step 1809 reads the synonym information (superordinate terms, subordinate terms, synonyms) attached to that original word and stores it in the synonym candidate storage unit 209 as synonym information for the unregistered word being processed.

【００５８】The degree of match is calculated, for example, with the following formula.

S(A, B) = C(A, B) / Max(A, B) … (1)

In the above formula, A and B are the words whose degree of match is measured, Max(A, B) is the length of the longer of the two strings A and B, and C(A, B) is the length of the string on which A and B match. For example, with A = "J3100" and B = "J3100ZD", the calculation is as follows.

【００５９】S("J3100", "J3100ZD") = 5/7 ≈ 0.7 … (2)

If this value is higher than the predetermined threshold, step 1809 is performed. For example, suppose that the synonym candidate storage unit 209 holds the information ("J3100", "personal computer", superordinate) and that "J3100ZD" is newly input as an unregistered word. The synonym information ("personal computer", superordinate) of the original word with a high degree of string match ("J3100") stored in the synonym candidate storage unit 209 is read, and from it ("J3100ZD", "personal computer", superordinate) is newly stored in the synonym candidate storage unit 209.
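Steps 1808 and 1809 can be sketched in Python as below. This is a minimal illustration under stated assumptions: the specification does not say how the matching length C(A, B) is determined, so the sketch takes it to be the length of the common prefix; the threshold value 0.6 and the names `match_degree` and `propagate` are likewise hypothetical.

```python
def common_prefix_length(a: str, b: str) -> int:
    """C(A, B) under the assumption that matching means a shared prefix."""
    n = 0
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            break
        n += 1
    return n


def match_degree(a: str, b: str) -> float:
    """S(A, B) = C(A, B) / Max(A, B), with Max the longer string's length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return common_prefix_length(a, b) / longest


def propagate(unregistered, stored, threshold):
    """Copy synonym information from similar original words (cf. step 1809)."""
    new_triples = []
    for original, synonym, relation in stored:
        if original == unregistered:
            continue
        if match_degree(unregistered, original) > threshold:
            new_triples.append((unregistered, synonym, relation))
    return new_triples


stored = [("J3100", "personal computer", "superordinate")]
THRESHOLD = 0.6  # assumed value; the text only says "predetermined"

print(match_degree("J3100", "J3100ZD"))         # 5/7
print(propagate("J3100ZD", stored, THRESHOLD))
```

With these assumptions the example from the text is reproduced: "J3100" and "J3100ZD" share five characters out of seven, so the triple for "J3100" is copied to "J3100ZD". A different definition of C(A, B), such as the longest common substring, would fit formula (1) equally well.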

【００６０】Needless to say, formula (1) can be modified in various ways as a measure of string match. Embodiments 1 to 4 include means for inputting extraction rules. However, the apparatus can also be configured with the extraction rules stored in advance and the extraction rule input unit 204 omitted. That is, by preparing in advance those extraction rules that can be provided generically, the user's effort can be reduced.

【００６１】A simpler configuration is also possible by omitting the unnecessary word dictionary 206. Furthermore, although the synonym candidate storage unit 209 and the synonym dictionary 207 are configured separately in the embodiments described, they can be combined into one. In that case, it must be possible to attach, to each original word and its synonym information, information indicating whether confirmation/correction/addition has been completed.

【００６２】In the above embodiments there is a single synonym dictionary 207. However, in view of cases where a plurality of users would share a common dictionary, a plurality of synonym dictionaries 207 may be provided, one per user.

【００６３】[0063]

[Effects of the Invention] According to the present invention, an apparatus can be provided that extracts synonym candidate information from documents stored as document data and builds a synonym dictionary interactively through confirmation/correction/addition. With this apparatus, a highly accurate synonym dictionary that could not be obtained by automatic extraction alone can be created easily.

[Brief description of drawings]

FIG. 1 is a diagram showing the device configuration.

FIG. 2 is a diagram showing the functional configuration.

FIG. 3 is a flowchart showing the processing flow of the control unit.

FIG. 4 is a flowchart showing extraction processing, rule input processing, and display/correction processing.

FIG. 5 is a flowchart showing the processing flow of the synonym candidate extraction unit.

FIG. 6 is a diagram showing an example of the unnecessary word dictionary.

FIG. 7 is a diagram showing an example of the extraction rule storage unit.

FIG. 8 is a diagram showing an example of extraction processing.

FIG. 9 is a diagram showing an example of the synonym candidate storage unit.

FIG. 10 is a flowchart showing the processing flow of the display/correction unit.

FIG. 11 is a flowchart showing the flow of correction processing execution.

FIG. 12 is a flowchart showing the flow of correction processing execution.

FIG. 13 is a diagram showing a display example of the display/correction unit.

FIG. 14 is a diagram showing a display example of the display/correction unit.

FIG. 15 is a diagram showing an example of the synonym dictionary.

FIG. 16 is a diagram showing an example of the extraction rule input unit.

FIG. 17 is a diagram showing an analysis example of the extraction rule input unit.

FIG. 18 is a diagram showing a display example of the extraction rule input unit.

FIG. 19 is a flowchart showing the processing flow of the synonym candidate extraction unit of the second embodiment.

FIG. 20 is a flowchart showing a storage example of exception rules and invalidation rules.

FIG. 21 is a flowchart showing the processing flow of the synonym candidate extraction unit of the third embodiment.

FIG. 22 is a flowchart showing the processing flow of the synonym candidate extraction unit of the fourth embodiment.

Continuation of the front page
(58) Fields searched (Int.Cl.⁷, DB name): G06F 17/30 320; G06F 17/28; JICST file (JOIS)

Claims

(57) [Claims]

1. A synonym information creation apparatus comprising: document data storage means for storing document data; syntactic analysis means for parsing each sentence in a document in the document data storage means; extraction rule storage means for storing, as an extraction rule, a pair of a syntactic pattern and a synonym relation; synonym candidate extraction means for extracting synonym candidates by matching the syntactic pattern obtained by the syntactic analysis means against the extraction rule; unregistered word detection means for detecting an unregistered word in each sentence in the document in the document data storage means; synonym candidate storage means for storing the synonym candidates obtained by the synonym candidate extraction means and the unregistered word detected by the unregistered word detection means; and correction means for correcting, confirming, or adding to the synonym candidates stored in the synonym candidate storage means, wherein the synonym candidate storage means stores, of the unregistered words, those matched with the extraction rule as synonym candidates, and stores those not matched with the extraction rule as synonym candidates having no synonym.

2. A synonym information creation apparatus comprising: document data storage means for storing document data; syntactic analysis means for parsing each sentence in a document in the document data storage means; extraction rule storage means for storing, as an extraction rule, a pair of a syntactic pattern and a synonym relation; exception rule storage means for storing an exception to the extraction rule as an exception rule; invalidation rule storage means for storing, as an invalidation rule, a pattern of words that do not form a synonymous relation; synonym candidate extraction means for extracting synonym candidates by matching the syntactic pattern obtained by the syntactic analysis means against the extraction rule, the exception rule, and the invalidation rule; synonym candidate storage means for storing the synonym candidates obtained by the synonym candidate extraction means; and correction means for correcting, confirming, or adding to the synonym candidates stored in the synonym candidate storage means.

3. A synonym information creation method using a computer, comprising: performing syntactic analysis on each sentence in document data stored in advance to obtain a syntactic pattern; matching the syntactic pattern against an extraction rule stored in advance to extract synonym candidates from the syntactic pattern; detecting an unregistered word in each sentence of the document data; storing the synonym candidates and, of the unregistered words, those matched with the extraction rule as synonym candidates, and storing those unregistered words not matched with the extraction rule as synonym candidates having no synonym; and correcting, confirming, or adding to the synonym candidates.

4. A synonym information creation method using a computer, comprising: performing syntactic analysis on each sentence in document data stored in advance to obtain a syntactic pattern; matching the syntactic pattern against an extraction rule representing a correspondence between a syntactic pattern and a synonym relation, an exception rule representing an exception to the extraction rule, and an invalidation rule representing a pattern of words that do not form a synonymous relation, to extract and store synonym candidates; and correcting, confirming, or adding to the synonym candidates.