JPH02129756A

JPH02129756A - Word collating device

Info

Publication number: JPH02129756A
Application number: JP63284266A
Authority: JP
Inventors: Atsuo Kawai; 河合　敦夫
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1988-11-10
Filing date: 1988-11-10
Publication date: 1990-05-17

Abstract

PURPOSE: To keep the synonym dictionary from growing in storage capacity, and to perform word matching that takes word meaning into account. This is done by dividing a word to be matched into short-unit nouns, replacing part or all of the word with synonyms, and then matching the character strings. CONSTITUTION: A word A is divided into short-unit nouns by a word segmentation section 100. A synonym replacement section 200 looks up synonyms of the short-unit nouns in its internal synonym dictionary. Preferably, long-unit nouns are formed by combining several short-unit nouns, and synonyms of the long-unit nouns are also obtained from the synonym dictionary. The words generated by the synonym replacement section 200, which contain synonyms in part or in whole, are sent to a determination section 300 together with the original word A and compared with a word B. As a result, the synonym dictionary need not store every synonym of every word that may appear. Whether two words are synonyms can also be determined accurately, because the meanings of the individual nouns that make up each word are taken into account.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a technique for matching two words and recognizing whether the concepts they express are the same.

More specifically, the present invention relates to a device that determines the degree of semantic agreement between words expressed in character codes when a computer processes a document written in a natural language. The device can be incorporated into, for example, an automatic keyword generation device or a document retrieval device.

For example, in document retrieval, the creator of a document database expresses the subject of each document with words called keywords. A searcher expresses the concept of the information sought as keywords and searches with them. The search therefore succeeds when the keywords assigned by the database creator and the keywords used by the searcher match as character strings.

However, the words (keywords) assigned by the creator and by the searcher do not necessarily match as character strings, even when they express the same concept. Synonyms and variations in how words are written are the causes. In such cases, a word matching device is needed that recognizes when two words express the same concept.

[Conventional technology]

Conventional word matching devices use the following methods to determine whether words given in character codes are synonyms.

(1) A method of referring to synonyms listed in a dictionary.

In this method, all words in a synonym relationship are listed in a dictionary in advance. Whether two words are synonyms is then determined by consulting that dictionary.

(2) A method of determining whether words are synonyms from the degree to which their character strings match.

This method matches words according to how closely their character strings agree, without specifically considering the meanings of the individual nouns.

[Problems to Be Solved by the Invention]

However, the prior-art methods (1) and (2) above have the following problems.

As described above, method (1) stores all synonyms in a dictionary. This is effective when the word to be matched is a short-unit noun, that is, the smallest unit that is meaningful as a Japanese word. It is not necessarily effective, however, when the word is a noun compound formed by joining short-unit nouns. The reason is as follows. In science and technology in particular, compounds formed by joining short-unit nouns, often with prefixes and suffixes attached, are commonly used as technical terms.

Such compounds vary in three ways:

(a) each short unit may have synonyms and notational variants;
(b) when short-unit nouns are combined, symbols such as '/', prefixes, suffixes and the like may be inserted between the nouns to form derived compounds;
(c) an abbreviation may lack part of the character string of a short-unit noun, or an entire short-unit noun, compared with the original compound.

Because of causes (a), (b) and (c) and their combinations, a single concept can be expressed by a very large number of noun compounds (synonyms). Listing all of these synonyms in a dictionary takes a great deal of effort to build and maintain. It also makes the computer's storage requirement grow with the number of synonyms.
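The cost of full enumeration can be made concrete with a small count. The sketch below uses a simplification: it assumes that each short unit's variants combine freely with the others. It also ignores the inserted symbols of (b) and the abbreviations of (c), which would only add more spellings. The variant counts are hypothetical.

```python
from math import prod


def per_unit_entries(variants_per_unit):
    """Dictionary entries needed if synonyms are stored per short unit only."""
    return sum(variants_per_unit)


def whole_compound_entries(variants_per_unit):
    """Spellings to list if every full compound must be enumerated:
    each unit's variants combine independently, so the counts multiply."""
    return prod(variants_per_unit)


# Hypothetical compound of three short units with 3, 2 and 4 accepted variants.
variants = [3, 2, 4]
print(per_unit_entries(variants))        # 9
print(whole_compound_entries(variants))  # 24
```

Under this assumption, per-unit storage grows additively while whole-compound storage grows multiplicatively. That gap is what motivates storing synonyms per unit.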

Method (2), on the other hand, avoids both the growth in dictionary storage and the dictionary-maintenance problem. Because it matches only at the character-string level, however, it cannot recognize synonyms that differ as strings but share the same meaning. Examples are "Li-secondary cell", "Li secondary cell" and "lithium secondary battery".

Accordingly, an object of the present invention is to solve the above problems of the prior art. The invention aims to provide a word matching device that does not require a large dictionary storage capacity, whose dictionaries are easy to create and maintain, and that performs word matching with high accuracy.

[Means for Solving the Problems]

FIG. 1 is a block diagram illustrating the principle of the present invention.

A word segmentation section 100 divides one of the words to be matched (hereinafter, word A) into short-unit nouns. A short-unit noun is the smallest unit that is meaningful as a Japanese word.
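The patent does not say how the segmentation section finds the short units. A common dictionary-based technique is greedy longest match, and the sketch below assumes that technique. It uses a hypothetical toy lexicon (`LEXICON`) in place of the Japanese dictionary. Characters not found in the lexicon become single-character tokens.

```python
def segment(word, lexicon):
    """Greedy longest-match segmentation (an assumed technique, not the
    patent's confirmed method). Unknown characters become 1-char tokens."""
    max_len = max((len(w) for w in lexicon), default=1)
    tokens, i = [], 0
    while i < len(word):
        # Try the longest possible piece first, shrinking until a lexicon hit;
        # a single character is always accepted as a fallback.
        for length in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + length]
            if piece in lexicon or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens


# Hypothetical lexicon of short-unit nouns, affixes and symbols.
LEXICON = {"高速", "・", "省力", "型", "検索", "装置"}
print(segment("高速・省力型検索装置", LEXICON))
# ['高速', '・', '省力', '型', '検索', '装置']
```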

A synonym replacement section 200 uses a synonym dictionary to replace part or all of the word with synonyms.

A determination section 300 compares the word to be matched with the other word to be matched (hereinafter, word B). It also compares each word that contains synonyms in part or in whole with word B. From these comparisons it determines whether the words are synonyms.

[Operation]

The word segmentation section 100 divides word A into short-unit nouns. For example, if word A consists of the short-unit nouns a1, a2 and a3, it is divided into a1, a2 and a3. The synonym replacement section 200 then searches its internal synonym dictionary for synonyms of each short-unit noun. Suppose, for example, that a1' is a synonym of a1 and a3' is a synonym of a3. For word A, this yields the synonym candidates a1' a2 a3, a1 a2 a3' and a1' a2 a3'. Preferably, long-unit nouns are also formed by combining several short-unit nouns. Synonyms of these long-unit nouns, for example a synonym of the long-unit noun formed from a1 and a2, are then obtained from the synonym dictionary.
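The candidate set described above is the Cartesian product of each short unit with its synonyms. The minimal sketch below uses the symbolic a1 to a3 example from the text. It assumes a plain dict as the synonym dictionary, and the function name `synonym_candidates` is hypothetical.

```python
from itertools import product


def synonym_candidates(units, synonyms):
    """Every sequence in which each short unit is either kept or replaced by
    one of its dictionary synonyms. The first tuple is the original word."""
    choices = [[u] + synonyms.get(u, []) for u in units]
    return list(product(*choices))


cands = synonym_candidates(["a1", "a2", "a3"], {"a1": ["a1'"], "a3": ["a3'"]})
for c in cands:
    print(" ".join(c))
# a1 a2 a3
# a1 a2 a3'
# a1' a2 a3
# a1' a2 a3'
```

The first candidate is word A itself. This matches the text, which sends the original word on for comparison alongside the three synonym candidates.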

The synonym replacement section 200 creates words in this way that contain synonyms in part or in whole; in the above example, these are a1' a2 a3, a1 a2 a3' and a1' a2 a3'. These words are sent to the determination section 300 together with word A and compared with word B. If the degree of matching obtained from a comparison is at or above a predetermined threshold, the determination section 300 outputs a result indicating that words A and B are synonyms. For example, the comparison between word A itself (a1 a2 a3) and word B may not reach the threshold. If the word consisting of a1', a2 and a3' exceeds the threshold, however, words A and B are still determined to be synonyms.
It is determined that single HA and B are synonymous.

Thus, the synonym dictionary need not store every synonym of every word that may appear. Whether words are synonyms can also be determined accurately, because the method considers not only the character string of the whole word but also the meanings of the individual nouns that make it up.

[Example]

An embodiment of the present invention is described in detail below with reference to the drawings.

FIG. 2 is a block diagram showing the hardware configuration of an embodiment of the present invention. In the figure, an input device 1 reads words written as character strings. The input device 1 may be, for example, a keyboard. If the illustrated hardware is connected to a host computer or similar device that supplies the character strings, the input device 1 is instead an interface circuit.

An output device 2 outputs the matching result, that is, whether the two matched words are synonyms. It is, for example, a display or a printer. A central processing unit 3 (hereinafter, CPU 3) executes the word matching program.

This program, described in detail later, performs four processes: word segmentation, short-unit replacement, long-unit replacement and matching-degree calculation.

A word table 4 stores the words to be matched, supplied from the input device 1. A short-unit table 5 stores the segmentation result for one of the words to be matched, that is, that word divided into the smallest units that are meaningful as Japanese words. A candidate table 6 stores the words obtained by replacing part or all of the word with synonyms. A program memory 7, for example a ROM, stores the word matching program executed by the CPU 3.

A working memory 8 is used when the word matching program runs. The working memory 8, the word table 4, the short-unit table 5 and the candidate table 6 are allocated, for example, in RAM. A Japanese dictionary 9 consists of short-unit nouns, prefixes and suffixes, and symbols, and is stored, for example, on a hard disk.

A synonym dictionary 10 lists words that are synonyms of one another and is stored, for example, on a hard disk.

FIG. 3 is a functional block diagram of the embodiment shown in FIG. 2. The word table 4 comprises a word table A 11, which stores one of the words to be matched, and a word table B 12, which stores the other.

The CPU 3 provides the functions of four sections: a word segmentation section 16, a short-unit replacement section 17, a long-unit replacement section 18 and a matching-degree calculation section 19. The word segmentation section 16 refers to the Japanese dictionary 9 and divides the word stored in the word table A 11 into short-unit nouns. It stores the result in the short-unit table 5.

The short-unit replacement section 17 uses each short-unit noun in the short-unit table 5 as a key to search a short-unit synonym dictionary 20 in the synonym dictionary 10. When a short-unit noun has a synonym, the short-unit replacement section 17 generates a word in which that noun is replaced by the synonym. It stores this word, together with the original word from the word table A 11, in candidate table 1 (reference numeral 13) within the candidate table 6.

The long-unit replacement section 18 forms long-unit nouns by combining several short-unit nouns in each word of candidate table 1. It uses these as keys to search a long-unit synonym dictionary 21 in the synonym dictionary 10. When a long-unit noun has a synonym, the long-unit replacement section 18 generates a word in which that long-unit noun is replaced by the synonym. It then stores the words of candidate table 1, together with the words in which long-unit nouns have been replaced by synonyms, in candidate table 2 (reference numeral 14).
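The long-unit step can be sketched as a scan over every run of two or more adjacent short units. The sketch assumes that a run qualifies when its concatenation is a headword in the long-unit dictionary. The segmentation and the dictionary entry "シンクロトロン放射" → "SOR" are hypothetical; they were chosen to reproduce the FIG. 7 outcome described later.

```python
def long_unit_candidates(units, long_synonyms):
    """Keep the word as-is, and for every run of two or more adjacent short
    units whose concatenation is a long-unit headword, add a copy with that
    run replaced by each of its synonyms."""
    results = [list(units)]
    n = len(units)
    for start in range(n):
        for end in range(start + 2, n + 1):
            key = "".join(units[start:end])
            for syn in long_synonyms.get(key, []):
                results.append(list(units[:start]) + [syn] + list(units[end:]))
    return results


units = ["小型", "化", "シンクロトロン", "放射", "施設"]
long_dict = {"シンクロトロン放射": ["SOR"]}
for cand in long_unit_candidates(units, long_dict):
    print("".join(cand))
# 小型化シンクロトロン放射施設
# 小型化SOR施設
```

Candidate table 2 would then be the union of this function's output over every word in candidate table 1.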

The matching-degree calculation section 19 calculates the degree of matching between each word in candidate table 2 and the word in the word table B 12 of the word table 4. It stores the results in a word matching-degree table 15, located for example in the working memory 8 of FIG. 2. The degree of matching is calculated by (Equation 1).

If the maximum degree of matching is at or above a predetermined threshold, the words are determined to be synonyms.

(Equation 1: the degree of matching, computed from the number of characters the two words have in common and the number of characters in each word.)

Next, the operation of this embodiment is described with reference to FIG. 4, which shows the processing steps. In the illustrated example, the two words to be matched are "高速・省力型検索装置" (high-speed, labor-saving search device) and "高速省力検索システム" (high-speed labor-saving search system). They are stored in the word table A 11 and the word table B 12, respectively. The word segmentation section 16 divides word A into short-unit nouns, prefixes and suffixes, and symbols, and stores the result in the short-unit table 5.

Next, the short-unit replacement section 17 uses each short-unit noun in the short-unit table 5 as a key to search the short-unit synonym dictionary 20, which lists synonyms for each short-unit noun. As shown in FIG. 5, the short-unit synonym dictionary 20 stores headwords (short-unit noun keywords) in association with their synonyms. For word A, the dictionary specifies "システム" (system) as a synonym of the short-unit noun "装置" (device). The short-unit replacement section 17 therefore generates the word in which "装置" is replaced by "システム", namely "高速・省力型検索システム". It stores the original word and the replaced word in candidate table 1.

Next, the long-unit replacement section 18 forms long-unit nouns by combining several short-unit nouns. It uses these as keys to search the long-unit synonym dictionary 21.

As shown in FIG. 6, the long-unit synonym dictionary 21 stores headwords (long-unit noun keywords) in association with their synonyms. In the illustrated long-unit synonym dictionary 21, no long unit in word A has a synonym. As shown in FIG. 4, candidate table 1 and candidate table 2 therefore have the same contents. The matching-degree calculation section 19 then calculates the degree of matching between each word in candidate table 2 and word B in the word table B 12, according to Equation (1).

For example, consider matching "高速・省力型検索装置" in candidate table 2 against word B, "高速省力検索システム", in the word table B 12. Six characters match: 高, 速, 省, 力, 検 and 索. Both words consist of 10 characters, so the degree of matching is 0.36. Matching "高速・省力型検索システム" against word B in the same way gives a degree of matching of 0.83. If the threshold is set to 0.8, for example, the match for "高速・省力型検索システム" exceeds it at 0.83. Words A and B can therefore be determined to be synonyms.
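Equation 1 itself is not legible in the text, but the worked values constrain it. One formula that reproduces both values is degree = (m / n_A) × (m / n_B), where m is the number of shared characters and n_A, n_B are the lengths of the two words. This gives (6/10)(6/10) = 0.36 and (10/12)(10/10) ≈ 0.83. The sketch assumes that form. It also assumes that shared characters are counted as a multiset intersection. Both are illustrative assumptions, not the patent's confirmed definition.

```python
from collections import Counter


def matching_degree(a, b):
    """Assumed form of Equation 1: (m / len(a)) * (m / len(b)), where m is
    the size of the multiset intersection of the two words' characters."""
    if not a or not b:
        return 0.0
    m = sum((Counter(a) & Counter(b)).values())
    return (m / len(a)) * (m / len(b))


word_b = "高速省力検索システム"
print(round(matching_degree("高速・省力型検索装置", word_b), 2))      # 0.36
print(round(matching_degree("高速・省力型検索システム", word_b), 2))  # 0.83
```

Counting characters as a multiset ignores their order. An order-sensitive count such as longest common subsequence gives the same m for these two pairs, so the worked example does not distinguish between the two.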

FIG. 7 is a diagram showing another example of the processing steps.

The two words to be matched are "小型化シンクロトロン放射施設" (miniaturized synchrotron radiation facility) and "小型ＳＯＲ施設" (compact SOR facility). They are stored in the word table A 11 and the word table B 12, respectively. The word segmentation section 16 divides word A into short-unit nouns, prefixes and suffixes, and symbols, and stores the result in the short-unit table 5. Next, the short-unit replacement section 17 searches the short-unit synonym dictionary of FIG. 5, using the short-unit nouns in the short-unit table 5 as keys. Here there is no short-unit synonym to substitute, so the short-unit table 5 and candidate table 1 have the same contents. Next, the long-unit replacement section 18 forms long-unit nouns by combining several short-unit nouns. It uses these as keys to search the long-unit synonym dictionary of FIG. 6.

The long-unit replacement section 18 then generates words in which the long-unit noun is replaced by its synonym. It stores the original word and the replaced words in candidate table 2. The matching-degree calculation section 19 calculates the degree of matching between the words in candidate table 2 and the word in the word table B 12, and stores the results in the word matching-degree table 15. The table contains a pair whose degree of matching is at or above the threshold (e.g., 0.8): "小型化ＳＯＲ施設" and "小型ＳＯＲ施設". The two words are therefore determined to be synonyms.
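The following end-to-end sketch combines the steps above and runs the FIG. 7 example. Everything in it is an assumption made for illustration:

- the toy lexicon and dictionaries (`LEXICON`, `SHORT_SYNONYMS`, `LONG_SYNONYMS`) are hypothetical;
- segmentation is greedy longest match;
- Equation 1 is taken as (m / n_A) × (m / n_B) with multiset character counting;
- ASCII "SOR" stands in for the full-width ＳＯＲ.

```python
from collections import Counter
from itertools import product

# Hypothetical stand-ins for the Japanese dictionary 9 and the short-unit (20)
# and long-unit (21) synonym dictionaries.
LEXICON = {"小型", "化", "シンクロトロン", "放射", "施設"}
SHORT_SYNONYMS = {}  # no short-unit synonym applies in this example
LONG_SYNONYMS = {"シンクロトロン放射": ["SOR"]}


def segment(word, lexicon):
    """Greedy longest-match segmentation (assumed technique)."""
    max_len = max((len(w) for w in lexicon), default=1)
    tokens, i = [], 0
    while i < len(word):
        for length in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + length]
            if piece in lexicon or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens


def short_unit_candidates(units):
    """Candidate table 1: each short unit kept or swapped for a synonym."""
    choices = [[u] + SHORT_SYNONYMS.get(u, []) for u in units]
    return [list(c) for c in product(*choices)]


def long_unit_candidates(units):
    """Adds copies with runs of 2+ adjacent units replaced by long-unit synonyms."""
    out = [units]
    for s in range(len(units)):
        for e in range(s + 2, len(units) + 1):
            for syn in LONG_SYNONYMS.get("".join(units[s:e]), []):
                out.append(units[:s] + [syn] + units[e:])
    return out


def matching_degree(a, b):
    """Assumed Equation 1: (m / len(a)) * (m / len(b))."""
    if not a or not b:
        return 0.0
    m = sum((Counter(a) & Counter(b)).values())
    return (m / len(a)) * (m / len(b))


def is_synonym(word_a, word_b, threshold=0.8):
    """Returns (decision, best degree) over all words in candidate table 2."""
    table1 = short_unit_candidates(segment(word_a, LEXICON))
    table2 = [c for cand in table1 for c in long_unit_candidates(cand)]
    best = max(matching_degree("".join(c), word_b) for c in table2)
    return best >= threshold, best


print(is_synonym("小型化シンクロトロン放射施設", "小型SOR施設"))  # (True, 0.875)
```

The unreplaced word scores only (4/14)(4/7) ≈ 0.16 against word B. The long-unit replacement "小型化SOR施設" scores (7/8)(7/7) = 0.875, which clears the 0.8 threshold.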

[Effects of the Invention]

As described above, the present invention divides a word to be matched into short-unit nouns and replaces part or all of the word with synonyms using a synonym dictionary. It then performs character-string matching to determine whether the original words are synonyms. Because the dictionary need not list every synonym of every word that may appear, the invention avoids growth both in synonym-dictionary storage and in dictionary-maintenance effort. It also enables word matching that considers the meanings of words, not only their character strings.

[Brief Description of the Drawings]

FIG. 1 is a block diagram illustrating the principle of the present invention. FIG. 2 is a block diagram of the hardware configuration of an embodiment of the present invention. FIG. 3 is a functional block diagram of the embodiment shown in FIG. 2. FIG. 4 is a diagram showing the processing steps of the embodiment. FIG. 5 is a diagram showing an example of the short-unit synonym dictionary 20. FIG. 6 is a diagram showing an example of the long-unit synonym dictionary 21. FIG. 7 is a diagram showing another example of the processing steps of the embodiment.

1: Input device
2: Output device
3: CPU
4: Word table
5: Short-unit table
6: Candidate table
7: Program memory
8: Working memory
9: Japanese dictionary
10: Synonym dictionary
11: Word table A
12: Word table B
13: Candidate table 1
14: Candidate table 2
15: Word matching-degree table
16: Word segmentation section
17: Short-unit replacement section
18: Long-unit replacement section
19: Matching-degree calculation section
20: Short-unit synonym dictionary
21: Long-unit synonym dictionary
100: Word segmentation section
200: Synonym replacement section
300: Determination section

Claims

A word matching device comprising:

- a word segmentation section that divides one of the words to be matched into short-unit nouns, which are the smallest units that are meaningful as Japanese words;
- a synonym replacement section that replaces part or all of the word with synonyms using a synonym dictionary; and
- a determination section that compares the word to be matched, and the words containing synonyms in part or in whole, with the other word to be matched, to determine whether the words are synonyms.