JPS61134877A

JPS61134877A - Unregistered word attribute estimating device

Info

Publication number: JPS61134877A
Application number: JP59255803A
Authority: JP
Inventors: Takanori Yano; 隆則矢野
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1984-12-05
Filing date: 1984-12-05
Publication date: 1986-06-21

Abstract

PURPOSE: To speed up processing by first estimating the part of speech of an unregistered word within a limited range of parts of speech when the word is a foreign word and applying syntax analysis, and by applying attribute estimation over all attributes only when that syntax analysis fails. CONSTITUTION: A foreign word part-of-speech estimation section 4c first detects whether the ending of a foreign word is a Sa-hen (irregular Sa-line) conjugation ending. If it is, the word is estimated to be a Sa-hen noun and a syntax analyzer 5 applies syntax analysis. If it is not, the section detects whether the ending is an adjectival verb conjugation ending; if it is, the word is estimated to be an adjectival verb and syntax analysis is applied. If the word is not an adjectival verb either, it is estimated to be a noun and syntax analysis is applied. If the syntax analysis fails, the foreign word is transferred to a sequential attribute assignment section 4b and processed in the same way as unregistered words that are not foreign words.

Description

[Detailed Description of the Invention] Technical Field The present invention relates to a Japanese language analysis device for a natural language processing system such as a machine translation system or a word processor. In particular, it relates to an unregistered word attribute estimating device that automatically estimates the attributes of an unregistered word, so that syntax analysis and system processing can continue even when an unregistered word appears.

Background Art Natural language processing systems such as machine translation systems and word processors are attracting attention.

Such natural language processing systems use Japanese language analysis devices that analyze the syntax of Japanese sentences.

When the syntax of a given Japanese sentence is analyzed, the attributes of each word are generally obtained from a word dictionary in which words are registered together with their attributes. The analysis is then performed on the basis of grammar rules and the connection relations among these attributes. In general, however, not every word that may appear in a sentence is registered in the word dictionary, and unregistered words frequently occur in sentences. In that case the attributes of the word cannot be determined. Syntax analysis therefore becomes impossible, and processing of the sentence must be interrupted.

Several methods for automatically estimating the attributes of unregistered words have therefore been proposed:

(1) a method that performs attribute estimation sequentially by brute force, such as the "Syntactic analysis method" disclosed in Japanese Patent Application Laid-open No. 58-175074; and

(2) a method that infers attributes from grammar rules by means of a syntactic analysis method, such as the "Syntactic analysis method" disclosed in Japanese Patent Application Laid-open No. 58-175075.

However, because their estimation procedures are complicated, these methods take a long time to estimate attributes.

Purpose An object of the present invention is to solve the above problems of the prior art by providing an unregistered word attribute estimating device that can quickly estimate the attributes of unregistered words.

Configuration To achieve the above object, the present invention provides an unregistered word attribute estimating device for a natural language processing system. The system includes first means capable of estimating the attributes of an unregistered word over all attributes when a word not registered in the word dictionary appears. The device further includes second means for identifying whether the unregistered word is a foreign word, and third means for estimating the part of speech of a foreign word within a limited range of parts of speech. For an unregistered word that is a foreign word, attributes are estimated by the first means only when syntax analysis based on the estimation result of the third means fails.
The configuration of the present invention will be explained in detail below with reference to the drawings.

FIG. 1 is a block diagram showing part of a natural language processing system to which an unregistered word attribute estimating device according to an embodiment of the present invention is applied.

In FIG. 1, the numbered elements are as follows:

- 1 is a dictionary search device.
- 2 is a word dictionary.
- 3 is an unregistered word detection device.
- 4 is an unregistered word attribute estimating device, which is the feature of the present invention.
- 5 is a syntax analysis device.
- 6 is a grammar dictionary.

The unregistered word attribute estimating device 4 consists of a foreign word identification unit 4a, a sequential attribute assignment unit 4b, a foreign word part-of-speech estimation unit 4c, and a conjugation ending table 4d.

The operation of the natural language processing system of FIG. 1 is outlined below.

The dictionary search device 1 performs morphological analysis on a Japanese sentence entered from an input device (not shown). It does so by looking up the attributes (part of speech, etc.) of each word in the word dictionary 2.

However, not every word of an input Japanese sentence is necessarily registered in the word dictionary 2, and the appearance of unregistered words is unavoidable. When an unregistered word appears, its attributes are unknown, so subsequent processing cannot be executed.

In this embodiment, therefore, the unregistered word detection device 3 checks whether attributes have been assigned to all the words in the morphologically analyzed Japanese sentence, that is, whether morphological analysis has been completed. If it has been completed and no unregistered word exists, the sentence is transferred as is to the syntax analysis device 5. When an unregistered word is detected, the unregistered word attribute estimating device 4 estimates its attributes and assigns the estimated attributes to the unregistered word. It then sends the result to the syntax analysis device 5 (this is described in detail later).

The syntax analysis device 5 performs syntax analysis on the morphologically analyzed Japanese sentence using the grammar dictionary 6. The parsed sentence is then processed according to the purpose of the natural language processing system. In a Japanese-English machine translation system, for example, further processing produces an English translation of the input Japanese sentence. This processing includes semantic analysis, Japanese-English transfer (translation), English syntax generation and English morpheme generation. In a word processor, the word dictionary 2 supplies multiple candidates for kana-kanji conversion. An appropriate candidate is selected from them using a connection check table or the like, and it becomes the conversion result.
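The dictionary lookup of device 1 and the completeness check of device 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. The sentence is assumed to be already segmented into words; real Japanese morphological analysis must also find the word boundaries. The toy dictionary, the attribute labels and the function names `look_up` and `find_unregistered` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy stand-in for word dictionary 2: surface form -> attribute.
WORD_DICTIONARY: Dict[str, str] = {
    "私": "noun",
    "は": "particle",
    "本": "noun",
    "を": "particle",
    "使う": "verb",
}


def look_up(words: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Device 1 (sketch): attach each word's registered attribute, or None."""
    return [(w, WORD_DICTIONARY.get(w)) for w in words]


def find_unregistered(analyzed: List[Tuple[str, Optional[str]]]) -> List[int]:
    """Device 3 (sketch): positions of words that received no attribute."""
    return [i for i, (_, attr) in enumerate(analyzed) if attr is None]


analyzed = look_up(["私", "は", "コンピュータ", "を", "使う"])
print(find_unregistered(analyzed))  # [2]: コンピュータ is not in the dictionary
```

An empty list corresponds to the case where the sentence goes straight to syntax analysis. Otherwise each listed position would be handed to device 4.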

The operation of the unregistered word attribute estimating device 4 is explained below with reference to FIGS. 1, 2, and 3.

FIG. 2 is a flowchart showing the operation of the unregistered word attribute estimating device 4, and FIG. 3 shows the contents of the conjugation ending table 4d.

An unregistered word detected by the unregistered word detection device 3 is sent to the foreign word identification unit 4a in the unregistered word attribute estimating device 4. Unit 4a identifies whether the unregistered word is a foreign word (201). Whether a word is a foreign word can easily be identified from the character code system of the word.

If the word is not a foreign word, it is sent to the sequential attribute assignment unit 4b, which assigns it an attribute, for example noun (202, 203). The sequential attribute assignment unit 4b is a mechanism capable of assigning every attribute. Next, the syntax analysis device 5 performs syntax analysis under the assigned attribute (204). If syntax analysis succeeds, the result is stacked (205, 206). After stacking, or if syntax analysis fails, the unit checks whether any attributes that have not yet been assigned remain. If so, it assigns one of the remaining attributes, and syntax analysis is performed under that attribute (207, 203). When no unassigned attributes remain, the stacked analysis results are output and the syntax analysis process ends (208).
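The brute-force loop of steps 202 to 208 can be sketched in Python as follows. The sketch assumes the parser is a callable that returns a parse result, or `None` on failure. The attribute inventory, the toy parser `toy_parse` and the function name `estimate_all_attributes` are hypothetical, because the patent neither lists its ten-odd attributes nor specifies its grammar.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Word = Tuple[str, Optional[str]]  # (surface form, attribute or None)
Parser = Callable[[Sequence[Word]], Optional[str]]

# Hypothetical attribute inventory; the patent only says there are ten-odd.
ALL_ATTRIBUTES: Tuple[str, ...] = (
    "noun", "sahen_noun", "adjectival_verb", "verb", "adjective",
    "adverb", "conjunction", "interjection", "particle", "auxiliary_verb",
)


def estimate_all_attributes(
    sentence: Sequence[Word], index: int, parse: Parser
) -> List[Tuple[str, str]]:
    """Sketch of unit 4b: try every attribute and stack each successful parse."""
    stacked: List[Tuple[str, str]] = []
    for attr in ALL_ATTRIBUTES:             # 203 / 207: next untried attribute
        trial = list(sentence)              # copy so the input is not mutated
        trial[index] = (trial[index][0], attr)
        result = parse(trial)               # 204: syntax analysis
        if result is not None:              # 205: did it succeed?
            stacked.append((attr, result))  # 206: stack the result
    return stacked                          # 208: output the stacked results


def toy_parse(words: Sequence[Word]) -> Optional[str]:
    """Hypothetical grammar: a word directly before を must be a (sa-hen) noun."""
    for (_, attr), (nxt, _) in zip(words, words[1:]):
        if nxt == "を" and attr not in ("noun", "sahen_noun"):
            return None
    return "parse-tree"


sentence = [("私", "noun"), ("は", "particle"), ("コンピュータ", None),
            ("を", "particle"), ("使う", "verb")]
print(estimate_all_attributes(sentence, 2, toy_parse))
# [('noun', 'parse-tree'), ('sahen_noun', 'parse-tree')]
```

The parser is called once for every attribute in the inventory, whether or not an earlier attribute succeeded. This is the cost the invention tries to avoid for foreign words.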

If the unregistered word is a foreign word, the foreign word part-of-speech estimation unit 4c estimates its attribute (part of speech) using the conjugation ending table 4d shown in FIG. 3. Almost all foreign words fall into one of three parts of speech:

- nouns;
- sa-hen nouns (verbal nouns that conjugate with suru);
- adjectival verbs.

The conjugation ending table 4d therefore lists only the conjugation endings of these three parts of speech. Estimating a foreign word's attributes with this table requires examining only three attributes. The sequential attribute assignment unit 4b, in contrast, must examine ten-odd attributes, so the table shortens the time required for estimation.

The foreign word part-of-speech estimation unit 4c first tests whether the ending of the foreign word is a sa-hen (irregular s-line) conjugation ending. If it is, the word is estimated to be a sa-hen noun, and the syntax analysis device 5 performs syntax analysis under the sa-hen noun attribute (209, 210, 214).

If the ending is not a sa-hen conjugation ending, the unit further tests whether it is an adjectival verb conjugation ending. If it is, the word is estimated to be an adjectival verb, and the syntax analysis device 5 performs syntax analysis under the adjectival verb attribute (211, 212, 214). If the word is not an adjectival verb, it is estimated to be a noun, and the syntax analysis device 5 performs syntax analysis under the noun attribute (211, 213).
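The three-way test of steps 209 to 213 might look roughly like the sketch below. The contents of FIG. 3 are not reproduced here, so the endings in `CONJUGATION_ENDINGS` are illustrative guesses. The sketch also assumes that the "ending" is the text immediately following the katakana stem, for example する in コピーする. The function name `estimate_foreign_pos` is hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for conjugation ending table 4d (FIG. 3 is not
# reproduced); these endings are illustrative guesses only.
CONJUGATION_ENDINGS: Dict[str, Tuple[str, ...]] = {
    "sahen_noun": ("する", "すれ", "しろ", "せよ", "さ", "し", "せ"),
    "adjectival_verb": ("だろ", "だっ", "なら", "だ", "で", "に", "な"),
}


def estimate_foreign_pos(following_text: str) -> str:
    """Sketch of unit 4c: pick one of the three loanword parts of speech."""
    # 209 -> 210: a sa-hen conjugation ending follows the stem
    if following_text.startswith(CONJUGATION_ENDINGS["sahen_noun"]):
        return "sahen_noun"
    # 211 -> 212: an adjectival verb conjugation ending follows the stem
    if following_text.startswith(CONJUGATION_ENDINGS["adjectival_verb"]):
        return "adjectival_verb"
    # 213: otherwise assume a plain noun
    return "noun"


print(estimate_foreign_pos("する"))    # sahen_noun (as in コピーする)
print(estimate_foreign_pos("な"))      # adjectival_verb (as in シンプルな)
print(estimate_foreign_pos("を使う"))  # noun (as in ファイルを使う)
```

The order of the tests matters. The sa-hen test runs first, as in the flowchart, so a stem followed by し is guessed to be a sa-hen noun even when し begins an unrelated particle such as しか. In the device, such a wrong guess can be caught when the following syntax analysis fails.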

Next, the success or failure of syntax analysis is checked. If it failed, the foreign word is transferred to the sequential attribute assignment unit 4b and processed in the same way as unregistered words that are not foreign words (215, 203).

If syntax analysis succeeds, the syntax analysis process ends at that point.
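The overall control flow of device 4 can be sketched in Python as follows. For a foreign word, one cheap guess is tried first. The exhaustive assignment runs only if that parse fails (215, 203) or if the word is not a foreign word.

Several assumptions are made here. Foreign words are identified by checking that every character lies in the Unicode Katakana block (U+30A0 to U+30FF); this stands in for the unspecified "character code system" of the original. The attribute list, the ending tuples, the toy parser and all function names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Word = Tuple[str, Optional[str]]
Parser = Callable[[Sequence[Word]], Optional[str]]

# Hypothetical inventory and ending tables (illustrative only).
ALL_ATTRIBUTES = ("noun", "sahen_noun", "adjectival_verb", "verb",
                  "adjective", "adverb", "particle", "auxiliary_verb")
SAHEN_ENDINGS = ("する", "すれ", "さ", "し", "せ")
ADJ_VERB_ENDINGS = ("なら", "だ", "で", "に", "な")


def is_foreign_word(surface: str) -> bool:
    """Unit 4a (assumption): an all-katakana string is treated as a loanword."""
    return bool(surface) and all("\u30a0" <= ch <= "\u30ff" for ch in surface)


def with_attr(sentence: Sequence[Word], i: int, attr: str) -> List[Word]:
    """Copy the sentence, giving word i the trial attribute."""
    trial = list(sentence)
    trial[i] = (trial[i][0], attr)
    return trial


def estimate_foreign_pos(following: str) -> str:
    """Unit 4c: sa-hen noun, adjectival verb, or noun (209-213)."""
    if following.startswith(SAHEN_ENDINGS):
        return "sahen_noun"
    if following.startswith(ADJ_VERB_ENDINGS):
        return "adjectival_verb"
    return "noun"


def assign_sequentially(sentence: Sequence[Word], i: int,
                        parse: Parser) -> List[Tuple[str, str]]:
    """Unit 4b: try every attribute and keep every successful parse."""
    results = []
    for attr in ALL_ATTRIBUTES:
        tree = parse(with_attr(sentence, i, attr))
        if tree is not None:
            results.append((attr, tree))
    return results


def estimate_unregistered(sentence: Sequence[Word], i: int,
                          parse: Parser) -> List[Tuple[str, str]]:
    """Device 4: cheap guess for loanwords, exhaustive search otherwise."""
    if is_foreign_word(sentence[i][0]):                               # 201
        following = sentence[i + 1][0] if i + 1 < len(sentence) else ""
        guess = estimate_foreign_pos(following)
        tree = parse(with_attr(sentence, i, guess))                   # 214
        if tree is not None:                                          # 215
            return [(guess, tree)]
    return assign_sequentially(sentence, i, parse)                   # 203-208


def toy_parse(words: Sequence[Word]) -> Optional[str]:
    """Hypothetical grammar: before を a noun, before する a sa-hen noun."""
    for (_, attr), (nxt, _) in zip(words, words[1:]):
        if nxt == "を" and attr not in ("noun", "sahen_noun"):
            return None
        if nxt == "する" and attr != "sahen_noun":
            return None
    return "parse-tree"


calls: List[int] = []


def counting_parse(words: Sequence[Word]) -> Optional[str]:
    """Wrap toy_parse and record how many times syntax analysis runs."""
    calls.append(1)
    return toy_parse(words)


sentence = [("ファイル", "noun"), ("を", "particle"), ("コピー", None), ("する", "verb")]
print(estimate_unregistered(sentence, 2, counting_parse), len(calls))
# [('sahen_noun', 'parse-tree')] 1
```

The call counter shows the intended saving: here the foreign word is resolved with a single parse, whereas the exhaustive path calls the parser once per attribute. If the guessed attribute does not parse, the function falls through to `assign_sequentially`, which mirrors the transfer to unit 4b.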

In this way, the invention exploits the fact that the part of speech of a foreign word is in most cases a noun, a sa-hen noun or an adjectival verb. When an unregistered word is a foreign word, it is first estimated to be one of these three parts of speech (attributes), and syntax analysis is performed under that estimate. Only when the analysis fails does the sequential attribute assignment unit 4b perform unregistered word attribute estimation. This makes it possible to speed up unregistered word attribute estimation.

In this embodiment, the sequential attribute assignment unit 4b is used as the mechanism capable of estimating the attributes of unregistered words over all attributes. Another attribute estimation mechanism may be used in its place, however, such as one that assigns the attributes expected by the grammar during syntax analysis.

Effects As explained above, the unregistered word attribute estimating device of the present invention makes it possible to speed up unregistered word attribute estimation.

[Brief Description of the Drawings]

FIG. 1 is a diagram showing an unregistered word attribute estimating device according to an embodiment of the present invention. FIG. 2 is a flowchart showing the operation of the device of FIG. 1. FIG. 3 shows the contents of the conjugation ending table in FIG. 1.

The reference numerals denote the following:

- 4: unregistered word attribute estimating device
- 4a: foreign word identification unit
- 4b: sequential attribute assignment unit
- 4c: foreign word part-of-speech estimation unit
- 4d: conjugation ending table
- 5: syntax analysis device

Claims


(1) An unregistered word attribute estimating device for a natural language processing system, the system including first means capable of estimating the attributes of an unregistered word over all attributes when an unregistered word not registered in a word dictionary appears, the device comprising:

second means for identifying whether the unregistered word is a foreign word; and

third means for estimating the part of speech of the foreign word within a limited range of parts of speech;

wherein, for an unregistered word that is a foreign word, attributes are estimated by the first means only when syntax analysis based on the estimation result of the third means fails.