JPH11167575A

JPH11167575A - Language analysis system and method

Info

Publication number: JPH11167575A
Application number: JP7279465A
Authority: JP
Inventors: Svetlana Shevenko
Original assignee: Individual
Current assignee: Individual
Priority date: 1995-10-26
Filing date: 1995-10-26
Publication date: 1999-06-22

Abstract

PROBLEM TO BE SOLVED: To provide a language analysis system that can accurately analyze the parts of speech of the tokens making up a sentence. SOLUTION: Dividing means 2 divides input sentence data into tokens by referring to dictionary means 8, which stores many tokens in advance. Part-of-speech acquisition means 4 determines the part of speech of each divided token by referring to the dictionary means 8. If a token has more than one part of speech, part-of-speech selection means 6 selects a single part of speech for it based on the parts of speech of one or more tokens preceding it, following it, or both. In addition, the required capacity of the dictionary means 8 is greatly reduced, and processing is made faster, by using tables in which the suffixes of verbs and predicate adjectives are defined in common.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a language analysis apparatus and method that can be used for machine translation and the like. More particularly, it relates to a technique for accurately determining the part of speech of each token of a language.

[0002]

2. Description of the Related Art

Machine translation, in which a computer translates natural language, has long been studied and has been partly put into practical use. In machine translation, the source text is first analyzed and then translated into the target language. Linguistic analysis is therefore an important step in machine translation, and its accuracy is a major factor in the accuracy of the translation. Linguistic analysis can also be used for language processing in general, not only for machine translation.

[0003] In conventional linguistic analysis, when a single word (token) has two or more parts of speech, the correct one cannot be identified accurately.

[0004] For example, Japanese Patent Application Laid-Open No. H4-305769 discloses a method in which a person resolves this problem by selecting the part of speech. This method, however, requires a human judgment step, so the part of speech cannot be identified automatically.

[0005] Japanese Patent Application Laid-Open No. H5-290081 discloses a method that stores how frequently each part of speech is used in each subject field of a document, in order to improve the accuracy of part-of-speech identification. This method can improve accuracy to some extent, but the dictionary becomes difficult to manage, and the achievable improvement in accuracy is limited.

[0006] SUMMARY OF THE INVENTION

An object of the present invention is to solve the conventional problems described above and to provide a language analysis system and method capable of accurately identifying parts of speech.

[0007]

MEANS FOR SOLVING THE PROBLEMS

According to claim 1, a language analysis system comprises: dividing means for dividing a given language into tokens; dictionary means storing parts of speech for tokens; part-of-speech acquisition means for acquiring, by referring to the dictionary means, the part of speech of each token produced by the dividing means; and part-of-speech selection means for selecting one part of speech when the part-of-speech acquisition means obtains two or more for one token. This selection is based on the parts of speech of one or more tokens located before the token, after it, or both.

[0008] According to claim 2, a language analysis system comprises: dictionary means storing parts of speech for tokens; dividing/part-of-speech acquisition means for dividing a given language into tokens and acquiring the part of speech of each token by referring to the dictionary means; and part-of-speech selection means for selecting one part of speech when the dividing/part-of-speech acquisition means obtains two or more for one token. This selection is based on the parts of speech of one or more tokens located before the token, after it, or both.

[0009] The language analysis system according to claim 3 is characterized in that the dictionary means has a table for selecting a token's part of speech when two or more parts of speech exist for it. The selection is based on the parts of speech of one or more tokens located before the token, after it, or both.

[0010] The language analysis system according to claim 4 is characterized in that the dictionary means contains two kinds of resources:
- a part-of-speech dictionary that associates tokens with parts of speech, for tokens whose part of speech is neither a verb suffix nor a predicate-adjective suffix;
- tables of the suffixes that attach to the roots of individual verbs or predicate adjectives, for verb suffixes and predicate-adjective suffixes.

[0011] The language analysis system according to claim 5 is characterized by having a table for idioms, either separate from the suffix table or integrated with it. This table treats as suffixes those sequences that contain tokens whose part of speech is not originally a verb suffix.

[0012] The language analysis system according to claim 6 is characterized by having a table for compound verbs, either separate from the suffix table or integrated with it. This table treats as suffixes those sequences that contain verb tokens that are not originally verb suffixes.

[0013] The language analysis method according to claim 7 assigns a part of speech to each token of a given language using dictionary means held in a storage device. It is characterized by the following:
- parts of speech for various tokens are stored in the storage device as the dictionary means;
- the part of speech corresponding to each token of the given language is obtained from the dictionary means;
- when one token has a plurality of parts of speech, its part of speech is narrowed down based on the parts of speech of one or more tokens located before it, after it, or both.

[0014] The language analysis method according to claim 8 is characterized in that, when two or more parts of speech exist for a token, its part of speech is selected using a table. That table selects the part of speech based on the parts of speech of one or more tokens located before the token, after it, or both.

[0015] The language analysis method according to claim 9 is characterized in that, at least for verb suffixes and predicate-adjective suffixes, division into tokens is performed using tables. These are tables of the suffixes that attach to the roots of individual verbs or predicate adjectives.

[0016] The language analysis method according to claim 10 is characterized by having a table for idioms, either separate from the suffix table or integrated with it. This table treats as suffixes those sequences that contain tokens whose part of speech is not originally a verb suffix.

[0017] The language analysis method according to claim 11 is characterized by having a table for compound verbs, either separate from the suffix table or integrated with it. This table treats as suffixes those sequences that contain verb tokens that are not originally verb suffixes.

[0018] The storage device according to claim 12 is a computer-readable storage device. It tangibly embodies a computer-executable program for carrying out, on a computer, a method of assigning a part of speech to each token of a given language. The method is characterized by the following:
- parts of speech for various tokens are stored in a storage device as dictionary means;
- the part of speech corresponding to each token of the given language is obtained from the dictionary means;
- when one token has a plurality of parts of speech, its part of speech is narrowed down based on the parts of speech of one or more tokens located before it, after it, or both.

[0019] The language analysis method according to claim 13 is characterized by having dictionary means that associates entries with parts of speech separately for two groups. The first group contains at least verb suffixes and predicate-adjective suffixes; the second contains all other entries.

[0020] The language analysis method according to claim 14 is characterized by having dictionary means that stores the following as distinct parts of speech: a group containing at least verb roots and predicate-adjective roots, and the other groups.

[0021] The terms used to describe the invention are defined below.

[0022] "Language": natural language in general, covering both written language (documents, texts and the like) and spoken language. It may be given in any format, such as character codes, images or audio.

[0023] "Dividing means": means for dividing a language into tokens. In the embodiment it corresponds to steps S2, S3 and S4 of FIG. 4. In the embodiment of FIG. 4, tokens other than suffixes are divided by referring to the part-of-speech dictionary. Suffixes are divided by referring to suffix tables such as the one in FIG. 13. The term also covers means that divide suffixes by referring to the part-of-speech dictionary. It further covers means that divide without referring to the part-of-speech dictionary at all.

[0024] "Part-of-speech dictionary": means storing the relationship between tokens and parts of speech. Any storage format may be used, such as a list, a table or a tree structure. In the embodiment, it is the dictionary of FIG. 5. In the embodiment of FIG. 5, the part-of-speech dictionary covers tokens whose parts of speech are other than suffixes. It may, however, also be built to include suffixes.

[0025] "Dictionary means": a dictionary including at least the part-of-speech dictionary above. In the embodiment, the term covers three components of FIG. 1: the part-of-speech dictionary 8a, the table 8b for verb suffixes, and the table 8d for predicate-adjective suffixes. In this embodiment, the table 8b for verb suffixes includes the following tables:
- the table of suffixes that follow general verb roots in FIG. 13 (table D);
- the tables of FIGS. 14 and 15;
- the compound-verb table of FIG. 16;
- and so on.

The table 8d for predicate-adjective suffixes includes, in this embodiment, the table of FIG. 17 and so on.

[0026] "Part-of-speech acquisition means": means for obtaining the part of speech of a token by referring to the dictionary means. In the embodiment it corresponds to steps S2, S3 and S4 of FIG. 4.

[0027] "Part-of-speech selection means": means for determining the part of speech of a token that has two or more parts of speech. The determination is based on the parts of speech of one or more preceding and following tokens. In the embodiment it corresponds to step S7 of FIG. 4 (the whole of FIG. 7). In the embodiment of FIG. 7, the part of speech is determined by referring to rule tables such as the one in FIG. 8. The term, however, covers any means that determines a token's part of speech from the parts of speech of the preceding and following tokens, whether or not a rule table is used. It also covers means that combine those parts of speech with other factors when determining the part of speech.
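As an illustration of the selection step, the following Python sketch narrows a token's candidate parts of speech. It uses rules that look only at the neighbouring tokens' parts of speech. The rule format, the function names `select_pos` and `noun_then_wa`, and the part-of-speech labels are all hypothetical. The actual rule tables of FIG. 8 are not reproduced here. When no rule fires, every candidate is kept.

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence

# A rule looks at the parts of speech of the previous and next tokens
# (None at a sentence boundary) and returns a part of speech, or None if
# it does not apply. Hypothetical format, not the patent's FIG. 8 tables.
Rule = Callable[[Optional[str], Optional[str]], Optional[str]]


def select_pos(candidates: Sequence[str], prev_pos: Optional[str],
               next_pos: Optional[str], rules: Sequence[Rule]) -> list[str]:
    """Narrow an ambiguous token's candidates using its neighbours."""
    for rule in rules:
        choice = rule(prev_pos, next_pos)
        # Accept a rule's answer only if it is one of the token's candidates.
        if choice is not None and choice in candidates:
            return [choice]
    # No rule applied: keep every candidate rather than guess.
    return list(candidates)


# Hypothetical rule for "は": directly after a general noun, read it as
# the nominative particle rather than another reading.
def noun_then_wa(prev_pos: Optional[str], next_pos: Optional[str]) -> Optional[str]:
    return "nominative particle" if prev_pos == "general noun" else None


print(select_pos(["nominative particle", "base particle"],
                 "general noun", "general noun", [noun_then_wa]))
# ['nominative particle']
```

Keeping all candidates when no rule applies mirrors the option, noted for the embodiment, of narrowing the candidates without forcing a single choice.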

[0028]

EFFECTS OF THE INVENTION

The inventions of claims 1, 2, 3, 7, 8 and 12 handle the case where two or more parts of speech are obtained for one token. One of them is then selected based on the parts of speech of one or more tokens before the token, after it, or both. Because the token's part of speech is fixed by its relationship to the surrounding tokens, it is determined more accurately. This in turn makes it possible to capture the exact meaning and the correct dependency (modification) structure of the sentence.

[0029] In the inventions of claims 4 and 9, the dictionary means is provided with tables for at least verb suffixes and predicate-adjective suffixes. The part-of-speech dictionary therefore needs to store only verb roots and predicate-adjective roots, while the suffixes are held in shared tables. This greatly reduces the capacity required for the dictionary means and also speeds up processing.
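A back-of-the-envelope calculation shows why shared suffix tables shrink the dictionary. The figures used below are purely illustrative assumptions, not numbers from the patent: 5,000 verb roots and 40 suffixes, with every root combining with every suffix. Real conjugation classes restrict which suffixes each root can take. `entry_counts` is a hypothetical helper.

```python
from __future__ import annotations


def entry_counts(num_roots: int, num_suffixes: int) -> tuple[int, int]:
    """Compare dictionary sizes with and without a shared suffix table.

    full_forms: every inflected form stored as its own dictionary entry
                (assumes every root takes every suffix).
    shared:     each root stored once, plus one shared suffix table.
    """
    full_forms = num_roots * num_suffixes
    shared = num_roots + num_suffixes
    return full_forms, shared


print(entry_counts(5000, 40))  # (200000, 5040)
```

Under these assumptions the number of entries grows additively rather than multiplicatively. That is the source of the capacity reduction claimed above.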

[0030] The inventions of claims 5 and 10 have a table for idioms, either separate from the suffix table or integrated with it. This table treats as suffixes those sequences that contain tokens whose part of speech is not originally a verb suffix. This speeds up the processing of idiomatic expressions that frequently follow verb roots.

[0031] The inventions of claims 6 and 11 have a table for compound verbs, either separate from the suffix table or integrated with it. This table treats as suffixes those sequences that contain verb tokens that are not originally verb suffixes. This speeds up the processing of compound verbs.

[0032] The language analysis method according to claim 13 has dictionary means that associates entries with parts of speech separately for two groups. The first group contains at least verb suffixes and predicate-adjective suffixes; the second contains all other entries. Verb suffixes and predicate-adjective suffixes can therefore be processed separately from entries with other parts of speech, which makes the analysis more efficient.

[0033] The invention of claim 14 has dictionary means that stores the following as distinct parts of speech: a group containing at least verb roots and predicate-adjective roots, and the other groups. Recognizing a verb root or a predicate-adjective root can therefore trigger the processing of the verb suffix or predicate-adjective suffix that follows it.

[0034]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1 shows the overall configuration of one embodiment of the language analyzer according to the invention. Dividing means 2 divides input sentence data into tokens by referring to dictionary means 8, which stores many tokens in advance. Part-of-speech acquisition means 4 acquires the part of speech of each divided token by referring to dictionary means 8. In this embodiment, the dividing means 2 and the part-of-speech acquisition means 4 thus together form dividing/part-of-speech acquisition means 5.

Part-of-speech selection means 6 handles tokens for which more than one part of speech has been acquired. It refers to rule table 7 and determines a single part of speech for such a token, based on the parts of speech of the preceding token, the following token, or both. If a single part of speech cannot be determined, the candidates may merely be narrowed down.

[0035] In the embodiment of FIG. 1, the dividing means 2 performs the division by referring to the dictionary means 8. The tokens may, however, be divided without referring to the dictionary means 8.

[0036] FIG. 2 shows the hardware configuration when the language analyzer of FIG. 1 is implemented with a CPU. The following are connected to a bus line 10: a CPU 12, a hard disk 14, a CRT 16, a memory 18, a floppy disk drive (FDD) 20 and a keyboard 22.

The hard disk 14 stores:
- a part-of-speech dictionary 8a that associates tokens with parts of speech;
- a table 8b for verb suffixes, which incorporates a table 8c for compound verbs as well as the idiom table;
- a table 8d for predicate-adjective suffixes.

In this embodiment, the dictionary means thus consists of the part-of-speech dictionary 8a, the table 8b for verb suffixes and the table 8d for predicate-adjective suffixes. The hard disk 14 also stores the rule table 7 used to determine parts of speech, and a program for performing the language analysis.

This program was loaded from a floppy disk 24 via the FDD 20. It may instead be loaded from another storage medium such as a CD-ROM, or downloaded over a communication line.

[0037] The text to be analyzed is stored on a floppy disk 26 and read through the FDD 20. It may also be read from a medium such as a CD-ROM, received over a communication line, or entered from the keyboard 22.

[0038] The loaded text is analyzed according to the program stored on the hard disk 14. The analysis result is stored on the hard disk 14. As needed, it is also output to the CRT 16, a floppy disk or a printer (not shown), or transferred over a communication line.

[0039] The part-of-speech dictionary of this embodiment stores the part of speech of each token according to the classification shown in FIG. 3. The classes of FIG. 3 are further subdivided. For example, name group A is subdivided into general nouns, nouns that are not general nouns, list-A nouns, pronouns and so on. Table 1 shows these subclasses of the classification of FIG. 3.

[0040] The part-of-speech dictionary stores these subdivided parts of speech for each token. The hard disk 14 also stores the classification hierarchy of FIG. 3 and that of Table 1 described below. Once the subdivided part of speech is known, its higher-level class is therefore easy to obtain. For example, it is easy to find that "general noun" belongs to "name group A". The higher-level class may also be stored together with the subdivided part of speech.
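Because the dictionary stores only the finest class, the higher classes can be recovered by walking a parent map built from FIG. 3 and Table 1. The sketch below uses a small hypothetical fragment of that hierarchy. Its English labels are paraphrased from Table 1, and the names `PARENT`, `POS_DICT` and `ancestors` are illustrative.

```python
from __future__ import annotations

# Hypothetical fragment of the Table 1 hierarchy: child class -> parent class.
PARENT = {
    "general noun": "name group A",
    "pronoun": "name group A",
    "particle は (nominative)": "nominative particles",
    "nominative particles": "particles",
    "general verb root": "roots of general verbs",
}

# The part-of-speech dictionary keeps only the finest class for each token.
POS_DICT = {
    "僕": "general noun",
    "これ": "pronoun",
    "行": "general verb root",
}


def ancestors(pos: str) -> list[str]:
    """Return the chain of higher classes for a fine-grained part of speech."""
    chain = []
    while pos in PARENT:
        pos = PARENT[pos]
        chain.append(pos)
    return chain


print(ancestors(POS_DICT["僕"]))              # ['name group A']
print(ancestors("particle は (nominative)"))  # ['nominative particles', 'particles']
```

Storing the hierarchy once, rather than storing every ancestor on every dictionary entry, keeps the entries small. This corresponds to the option described above of not storing the higher-level class with each entry.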

[0041] In FIG. 3 (and Table 1), verbs are split into verb roots and verb suffixes, each treated as a separate part of speech. Likewise, predicate adjectives are split into predicate-adjective roots and predicate-adjective suffixes, each treated as a separate part of speech. Fixing the dictionary tokens in this way makes them well defined. That in turn makes translation into other languages simpler and clearer and yields more correct translations.

In addition, parts of speech are broadly assigned to one of two groups: verb roots and predicate-adjective roots, or everything else. As a result, entries that take the above suffixes (verb roots and predicate-adjective roots) can be processed separately from entries that do not.

[0042] In this embodiment, verb suffixes and predicate-adjective suffixes are not stored in the part-of-speech dictionary. As described later, determining these suffixes from tables is preferable both for dictionary capacity and for the analysis. Suffixes other than verb and predicate suffixes also exist (for example, numeral suffixes). Hereinafter, unless otherwise specified, "suffix" refers to verb suffixes and predicate suffixes.

[0043] For efficient retrieval, tokens are preferably stored in the part-of-speech dictionary in character-code order.
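Storing entries in character-code order makes binary search possible. The minimal Python sketch below assumes the dictionary fits in memory as two parallel sorted lists. Python compares strings by Unicode code point, which stands in here for the character-code order mentioned in the text; the original system's encoding is not specified. The class `SortedDictionary` and its `has_prefix` method are hypothetical. A segmenter could use `has_prefix` to decide whether extending a candidate string can still produce a dictionary hit.

```python
from __future__ import annotations

import bisect


class SortedDictionary:
    """Token -> part-of-speech lookup over keys kept in code-point order."""

    def __init__(self, entries: dict[str, str]):
        # sorted() on str compares by code point, i.e. character-code order.
        self._keys = sorted(entries)
        self._values = [entries[k] for k in self._keys]

    def lookup(self, token: str) -> str | None:
        """Binary-search for an exact match; O(log n) comparisons."""
        i = bisect.bisect_left(self._keys, token)
        if i < len(self._keys) and self._keys[i] == token:
            return self._values[i]
        return None

    def has_prefix(self, prefix: str) -> bool:
        """True if some stored token starts with prefix.

        Keys sharing a prefix are contiguous in sorted order, and the first
        key >= prefix is the smallest of them if any exist.
        """
        i = bisect.bisect_left(self._keys, prefix)
        return i < len(self._keys) and self._keys[i].startswith(prefix)


d = SortedDictionary({"僕": "general noun", "学校": "general noun", "へ": "particle"})
print(d.lookup("僕"), d.lookup("僕は"), d.has_prefix("学"))
```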

[0044] Table 1 (computer parts of speech)

1. Name group A
  1.1 General nouns: 木, 車 (tree, car) ...
  1.2 Nouns that are not general nouns: 十分, 必要 (enough, necessary) ...
  1.3 List A nouns: 現在, 今 (present, now) ...
  1.4 Pronouns: これ, どれ (this, which) ...
  1.5 Nouns expressing a quality: みんな, たくさん (everyone, many) ...
  1.6 Specific words: 程度, くらい (degree, about) ...
  1.7 "など" (etc.): など ...
  1.8 Basic numerals: 一, 二 (one, two) ...
  1.9 Non-text symbols: ３, H₂SO₄ ...
2. Name group B
  2.1 Nominalizing words: こと, もの ...
3. Name group C
  3.1 Counter words: 枚, 冊 (counters for flat objects and bound volumes) ...
4. Non-names
  4.1 Non-predicate adjectives: 大きな, あの (big, that) ...
5. Postpositions
  5.1 True postpositions: 内, 中 (inside, in) ...
  5.2 Verbal postpositions: による, における (by, in/at) ...
  5.3 Intermediate postpositions: によって (by) ...
6. Adverbs
  6.1 Adjectival adverbs: 比較的に (comparatively) ...
  6.2 Simplified adverbs: 比較的 (comparatively) ...
  6.3 Adverbs of degree: 少し, ほとんど (a little, almost) ...
  6.4 Adverbs from list 1: さらに, むしろ (further, rather) ...
  6.5 Adverbs from list C: 約, ほぼ (about, nearly) ...
  6.6 Adverbs of manner: 下手に, 上手に (poorly, well) ...
7. Conjunctions
  7.1 Coordinating conjunctions: と, かつ (and, and also) ...
  7.2 Subordinating conjunctions: とき (when), には ...
  7.3 Conjunction "から" (because): から
  7.4 Conjunction "からではなく" (not because): からではなく
  7.5 Conjunctions "もし" (if) and "たとえ" (even if): もし, たとえ
  7.6 Conjunctions introducing apposition: 例えば, すなわち (for example, that is) ...
8. Particles
  8.1 Nominative particles
    8.1.1 Particle "は": は
    8.1.2 Particle "が": が ...
  8.2 Particles used for the indirect object
    8.2.1 Particle "に": に
    8.2.2 Particle "へ": へ ...
  8.3 Particle used for the direct object: を
  8.4 Emphasizing, restricting or interrogative particles
    8.4.1 Base particles: は ...
  8.5 Emphasizing particles: なら, ならば ...
9. Determiners: のような, のごとき (like, such as) ...
10. Affixes
  10.1 Numeral affixes: 第, 目 (ordinal prefix, ordinal suffix) ...
  10.2 Suffixes of verbs and predicate adjectives
  10.3 Suffixes that are not predicate adjectives: 的, 的な (-ic, -ical) ...
  10.4 Noun quasi-affixes: 不, 可 (un-, -able) ...
11. Delimiters
  11.1 Full stop: 。
  11.2 Comma: 、
  11.3 Colon: ：
  11.4 Special delimiters
12. Roots of general verbs
  12.1 Roots of general verbs: 読, 書 (read, write) ...
  12.2 Roots of verbs that are not general verbs
13. Roots of copulas: であ ...
14. Roots of auxiliary verbs: にな ...
15. Roots of predicate adjectives: おもしろ, よ (interesting, good) ...

FIG. 4 is a flowchart of the language analysis program stored on the hard disk 14. First, the text stored on the floppy disk 26 is read (step S1). Next, the CPU 12 takes one sentence of the text, divides it into tokens and acquires their parts of speech (step S2). To do so, the CPU 12 decomposes the sentence into tokens and obtains their parts of speech from the part-of-speech dictionary stored on the hard disk 14.
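The control flow of steps S2 to S7 can be sketched as a loop over one sentence. In this Python sketch, the callbacks stand in for the following parts of the flowchart:
- `next_token`: the dictionary lookup;
- `is_inflecting_root`: the root test of step S3;
- `match_suffix`: the suffix tables of step S4;
- `disambiguate`: the rule-table selection of step S7.

The names and signatures of these callbacks are hypothetical. The sketch also assumes that a verb or predicate-adjective root is followed by at most one suffix token.

```python
from __future__ import annotations

from typing import Callable, List, Optional, Tuple

Token = Tuple[str, str]  # (surface string, part of speech or rule-table id)


def analyse_sentence(
    sentence: str,
    next_token: Callable[[str, int], Token],              # S2: lookup
    is_inflecting_root: Callable[[str], bool],            # S3: root test
    match_suffix: Callable[[str, int], Optional[Token]],  # S4: suffix table
    disambiguate: Callable[[List[Token]], List[Token]],   # S7: rule tables
) -> List[Token]:
    tokens: List[Token] = []
    pos = 0
    while pos < len(sentence):  # S5/S6: repeat until every token is tagged
        surface, tag = next_token(sentence, pos)
        if not surface:
            raise ValueError(f"no token found at position {pos}")
        tokens.append((surface, tag))
        pos += len(surface)
        # S3 -> S4: a verb or predicate-adjective root triggers suffix lookup.
        if is_inflecting_root(tag):
            suffix = match_suffix(sentence, pos)
            if suffix is not None:
                tokens.append(suffix)
                pos += len(suffix[0])
    # S7: resolve tokens that still carry more than one part of speech.
    return disambiguate(tokens)


# Toy usage: one-character lookup, one hard-coded suffix, no disambiguation.
toy = {"僕": "general noun", "は": "rule B(45)", "行": "general verb root"}
result = analyse_sentence(
    "僕は行きます",
    next_token=lambda s, i: (s[i], toy.get(s[i], "unknown")),
    is_inflecting_root=lambda tag: tag == "general verb root",
    match_suffix=lambda s, i: ("きます", "verb suffix") if s.startswith("きます", i) else None,
    disambiguate=lambda toks: toks,
)
print(result)
```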

[0045] As an example, consider dividing the sentence "僕は学校へ行きます。" ("I go to school.") into tokens and acquiring their parts of speech. The CPU 12 first searches the part-of-speech dictionary for the first character, "僕" ("I"). FIG. 5 shows part of the part-of-speech dictionary. As the figure shows, the part of speech of "僕" is "general noun".

[0046] The CPU 12 then searches the part-of-speech dictionary in the same way for the string "僕は". The string "僕は" is not stored in the dictionary, and "は" is not a kanji, so the CPU 12 determines that "僕" is one token. It records the part of speech of the token "僕" in the analysis file as a general noun of name group A (see FIG. 6).
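The dictionary lookup in this example can be approximated by greedy longest-match segmentation, sketched below in Python. This standard technique is substituted for the exact procedure in the text. It tries the longest dictionary-length substring first instead of extending one character at a time, and it omits the kanji test mentioned above. Unknown characters become one-character tokens tagged "unknown". The function name `segment` and the tag strings are hypothetical.

```python
from __future__ import annotations


def segment(sentence: str, pos_dict: dict[str, str]) -> list[tuple[str, str]]:
    """Greedy longest-match segmentation against a part-of-speech dictionary."""
    max_len = max(map(len, pos_dict), default=1)
    tokens = []
    i = 0
    while i < len(sentence):
        match = None
        # Try the longest candidate first, shrinking until a dictionary hit.
        for j in range(min(len(sentence), i + max_len), i, -1):
            if sentence[i:j] in pos_dict:
                match = sentence[i:j]
                break
        if match is None:
            # Unknown character: emit it alone so the scan always advances.
            match = sentence[i]
            tokens.append((match, "unknown"))
        else:
            tokens.append((match, pos_dict[match]))
        i += len(match)
    return tokens


POS = {"僕": "general noun", "は": "rule B(45)", "学校": "general noun",
       "へ": "particle", "行": "general verb root"}
print(segment("僕は学校へ行", POS))
```

With this toy dictionary, "僕は" is not an entry, so "僕" is emitted on its own, matching the outcome described in the text.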

[0047] Next, it is determined whether the acquired part of speech is that of a verb or a predicate adjective (step S3). Here it is a general noun, so the process proceeds to step S5, which determines whether parts of speech have been acquired for all tokens of the sentence. Tokens still remain, so the process proceeds to step S6 and handles the next token.

【００４８】[0048]

Next, the part-of-speech dictionary is searched for 「は」 (wa). The dictionary stores no part of speech for 「は」. Instead it stores a rule table number, which indicates that 「は」 has two or more parts of speech. The dictionary is then searched in the same way for the string 「は学」. Because 「は学」 is not stored, the CPU 12 determines that 「は」 is one token. Since the token 「は」 has two or more possible parts of speech, its part of speech is undetermined. The CPU 12 therefore records the rule table number B(45) stored in the dictionary (see FIG. 6). In this embodiment, when two or more parts of speech exist, only the rule table number is stored in the dictionary. The candidate parts of speech may, however, be stored along with it.

【００４９】[0049]

In the same manner, 「学校」 (gakkō, "school") and 「へ」 (e, "to") are recognized as tokens, and their parts of speech are stored as shown in FIG. 6. Next, 「行」 (the root of iku, "go") is recognized as a token, and "root of a general verb" is stored as its part of speech. Because the acquired part of speech is the root of a general verb, step S3 branches to step S4. In step S4, the suffix token is determined and its attributes are analyzed at the same time. As described below, tables make it possible to determine the token and analyze its attributes simultaneously. If this advantage is not needed, the suffixes can instead be stored in the part-of-speech dictionary and processed from there.
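The flow of FIG. 4 described so far (steps S2 to S6) can be sketched as a single loop. Whenever a token is a verb or predicate-adjective root, the loop hands the remaining text to the suffix analysis. This is only an illustration:

- `lookup`, `analyze_sentence` and the label strings are hypothetical.
- `analyze_suffix` is a caller-supplied stand-in for the table-driven step S4 described in the following paragraphs.

```python
from typing import Callable, Dict, List, Optional, Tuple

Token = Tuple[str, str]
# Stand-in for step S4: (text, position, root) -> (suffix string, attributes).
SuffixAnalyzer = Callable[[str, int, str], Tuple[str, List[str]]]

# Assumed labels: root part of speech -> part of speech given to its suffix.
SUFFIX_OF = {"general verb root": "general verb suffix",
             "predicate adjective root": "predicate adjective suffix"}


def lookup(text: str, pos: int, dictionary: Dict[str, str]) -> Optional[Token]:
    """Longest dictionary entry starting at `pos`, or None."""
    for end in range(len(text), pos, -1):
        if text[pos:end] in dictionary:
            return text[pos:end], dictionary[text[pos:end]]
    return None


def analyze_sentence(text: str, dictionary: Dict[str, str],
                     analyze_suffix: SuffixAnalyzer) -> List[Tuple[str, str, List[str]]]:
    result: List[Tuple[str, str, List[str]]] = []
    pos = 0
    while pos < len(text):                      # steps S5/S6: continue while text remains
        hit = lookup(text, pos, dictionary)     # step S2: token and part of speech
        surface, tag = hit if hit else (text[pos], "UNKNOWN")
        result.append((surface, tag, []))
        pos += len(surface)
        if tag in SUFFIX_OF:                    # step S3: verb or predicate-adjective root?
            suffix, attrs = analyze_suffix(text, pos, surface)  # step S4
            if suffix:
                result.append((suffix, SUFFIX_OF[tag], attrs))
                pos += len(suffix)
    return result


if __name__ == "__main__":
    dictionary = {"僕": "general noun", "は": "B(45)", "学校": "general noun",
                  "へ": "postposition", "行": "general verb root", "。": "punctuation"}

    def stub_suffix(text: str, pos: int, root: str) -> Tuple[str, List[str]]:
        # Hypothetical stub: recognizes only the suffix of the example sentence.
        if text.startswith("きます", pos):
            return "きます", ["process is clear", "not in the past", "honorific"]
        return "", []

    for token in analyze_sentence("僕は学校へ行きます。", dictionary, stub_suffix):
        print(token)
```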

【００５０】[0050]

The following describes, on the basis of this embodiment, how suffixes are analyzed with the table for general-verb suffixes. In this embodiment, the table for verb suffixes consists of two parts. The first is the "table for suffixes following the root of a general verb" shown in FIG. 13 (hereinafter, table D). The second is the set of tables that table D points to.

【００５１】[0051]

As described above, once the part of speech of 「行」 is found to be the root of a general verb, the CPU 12 obtains the column symbol "k" of table D from the 「行」 entry of the part-of-speech dictionary in FIG. 5.

【００５２】[0052]

Table D is shown in FIG. 13. It has the columns r, t, m, b, n, k, k', g, s and w. Since the designated column symbol is "k", column "k" is consulted. The CPU 12 reads the next character 「き」 (ki) and searches column "k" for it. 「き」 appears in the second row, so the analysis uses the information in that row. That row gives 「き」 the attribute "simultaneous", which is stored. The CPU 12 also obtains C(2), the number of the table to consult next.

【００５３】[0053]

Table C(2) is shown in FIG. 14. The CPU 12 obtains the next character 「ま」 (ma) and determines whether 「ま」 exists in this table. It then determines whether 「ます」 (masu), which includes the following character, exists in the table. In this way the CPU 12 searches table C(2) for the longest matching character string. Here entry No. 22, 「ます」, is selected.

As soon as a character following 「き」 is found, the previously stored attribute "simultaneous" is deleted. The reason is that the attributes listed in the suffix table of FIG. 13 ("simultaneous", "process is clear", "not in the past" and "imperative") apply only when no further suffix follows.

There is one exception. When an "imperative" entry continues into table C(1) or table C(3), the "imperative" attribute is kept, because what follows there is an idiom rather than a verb suffix. When the entry continues into table C(4), which holds verb suffixes, the "imperative" attribute is deleted as usual.

【００５４】[0054]

In any case, selecting 「ます」 yields the attributes "process is clear", "not in the past" and "honorific". It is also established that 「きます」 (kimasu) forms one token and is a suffix of a general verb.

【００５５】[0055]

If no character string matching table C(2) is found, the process returns to the table of FIG. 13 and searches the other table designated there as a next table (here, S(V)). Table C(2) may itself designate further tables. For example, the entry 「たくな」 (takuna) instructs that table C be consulted again ("C" is stored in its next-table field). In that case, the following characters are looked up in tables C(1) to C(n), where n is the number of tables bearing the symbol C.

【００５６】[0056]

Based on the above analysis, the CPU 12 stores 「きます」 as a suffix of a general verb, as shown in FIG. 6A. It also stores the attributes "process is clear", "not in the past" and "honorific". Attributes stored in this way can be used, for example, when translating into another language.

【００５７】[0057]

The above example covers the suffix of a general verb. The suffixes of connectives and of auxiliary verbs are likewise determined, and their attributes analyzed, using suffix tables. However, connective and auxiliary-verb suffixes have no counterpart to the table D used for general verbs, so table C is consulted directly.

【００５８】[0058]

Similarly, predicate-adjective suffixes are determined, and their attributes analyzed, using the suffix table 8d. FIG. 17 shows part of table 8d for predicate-adjective suffixes. Predicate-adjective suffixes have no counterpart to the table D used for general verbs. Instead they have table F, shown in FIG. 17.

The table of FIG. 17 (for predicate-adjective suffixes) may refer to table C (for verb suffixes) or to table D. Conversely, table C(2) of FIG. 14 (for verb suffixes) may refer to table D or to table F (for predicate-adjective suffixes). This is because some strings can serve either as a verb suffix or as a predicate-adjective suffix.

【００５９】[0059]

As described above, this embodiment classifies the suffixes and organizes them into tables. This greatly reduces the dictionary size compared with registering suffixes for each individual verb and predicate adjective in the part-of-speech dictionary. Suffixes can also be determined quickly, and their attributes analyzed at the same time. Finally, suffix sequences that cannot occur can be detected, which makes it possible to find errors in the text.

【００６０】[0060]

Next, the CPU 12 determines the part of speech of each token whose part of speech is unknown (step S7). FIG. 7 shows a detailed flowchart of this determination. First, in step S10, it is determined whether two or more parts of speech have been acquired for 「僕」. The part of speech of 「僕」 has already been determined to be general noun, so it is kept as a general noun.

【００６１】[0061]

Next, after steps S13 and S14, the same processing is performed for the next token 「は」. Two or more parts of speech have been acquired for 「は」 (that is, B(45) is stored), so the process proceeds to step S11. In step S11, the stored rule table B(45) is consulted.

【００６２】[0062]

Rule table B(45) is stored on the hard disk 14, and its details are shown in FIG. 8. This rule table holds several rules (No. 1 to No. 3). First, rule No. 1 is read. Rule No. 1 states that if the token on the left (the immediately preceding token) is an invariant (particle), a postposition or an adverb, then the token 「は」 is the base invariant. The CPU 12 therefore reads the part of speech of the left token 「僕」 stored in step S2 (see FIG. 6A). The part of speech of 「僕」 is general noun, which is none of invariant, postposition or adverb, so rule No. 1 does not hold.

【００６３】[0063]

Similarly, the CPU 12 examines rules No. 2 and No. 3 and finds the applicable rule. Here rule No. 3 holds, so the part of speech of 「は」 is determined to be the invariant "は".

【００６４】[0064]

The CPU 12 stores the part of speech determined in this way on the hard disk 14 (step S12). That is, as shown in FIG. 6B, the invariant "は" is stored. The CPU 12 also reads the value "1" from the weight field of the applied rule No. 3 and stores it as well (see FIG. 6B).

The weight expresses numerically how certain the part-of-speech determination is. In this embodiment, the values are defined as follows:

- "0" means insufficient information.
- "1" means accurate.
- "2" means fairly accurate.
- "3" means inaccurate.

Because each result carries such a weight, later processing (for example, translation into another language) can take the reliability of each result into account.
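Steps S10 to S14 can be illustrated with the following sketch of a rule table. It makes these assumptions:

- A rule is modeled as a predicate over the parts of speech of the left and right neighbors, plus a resulting part of speech and a weight.
- Rules are tried in order, and the first rule that holds wins.
- The text describes only rule No. 1 of B(45), so the condition of No. 3 is a hypothetical catch-all, its weight of 1 comes from [0064], and No. 2 is omitted.

The weight codes follow paragraph [0064]. All names are hypothetical.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple

# Weight codes from paragraph [0064].
WEIGHT_MEANING = {0: "insufficient information", 1: "accurate",
                  2: "fairly accurate", 3: "inaccurate"}


class Rule(NamedTuple):
    number: int
    # Predicate over (left part of speech, right part of speech); either may be None.
    applies: Callable[[Optional[str], Optional[str]], bool]
    result: str
    weight: Optional[int]  # None where the text does not state the weight


# Partial, hypothetical encoding of rule table B(45) (FIG. 8).
B45: List[Rule] = [
    Rule(1, lambda left, right: left in ("invariant", "postposition", "adverb"),
         "base invariant", None),
    # The condition of No. 3 is not given in the text; a catch-all is assumed here.
    Rule(3, lambda left, right: True, 'invariant "は"', 1),
]


def select_pos(tokens: List[Tuple[str, str]], i: int,
               rule_table: List[Rule]) -> Tuple[str, Optional[int]]:
    """Resolve the part of speech of tokens[i] from its neighbors (step S11)."""
    left = tokens[i - 1][1] if i > 0 else None
    right = tokens[i + 1][1] if i + 1 < len(tokens) else None
    for rule in rule_table:
        if rule.applies(left, right):
            return rule.result, rule.weight
    # Assumption: if no rule matches, keep the entry and flag weight 0.
    return tokens[i][1], 0


if __name__ == "__main__":
    print(select_pos([("僕", "general noun"), ("は", "B(45)")], 1, B45))
```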

【００６５】[0065]

All tokens are processed in the same way (steps S13 and S14). In this example only the token 「は」 has two or more parts of speech, so the final stored content is as shown in FIG. 6B.

【００６６】[0066]

In this way, the sentence 「僕は学校へ行きます。」 can be divided into tokens, and each token can be assigned a part of speech. Further analysis (sentence pattern analysis and so on) then proceeds on the basis of this tokenization and part-of-speech assignment (step S9).

【００６７】[0067]

The rule table illustrated above determines the part of speech by referring only to the token on the left. Other rule tables may also refer to the token on the right (the next token), or only to the token on the right. In either case, it is preferable to refer to as many preceding and following tokens as are needed to determine the token's part of speech, including tokens more than one position away.

【００６８】[0068]

Next, the above analysis is described using the sentence 「麓に近いこの村は景色が美しかった。」 (fumoto ni chikai kono mura wa keshiki ga utsukushikatta, "This village near the foot of the mountain had beautiful scenery.") as an example. First, steps S2, S5 and S6 of FIG. 4 are executed repeatedly (with S3 and S4 for verbs and predicate adjectives). This breaks the sentence into tokens and acquires their parts of speech, as shown in FIG. 9A (the figure omits the stored attributes of verbs and predicate adjectives). Next, in step S7, parts of speech are determined for the tokens whose part of speech is unknown.

【００６９】[0069]

First, for the token 「に」 (ni), rule table B(43) is consulted. Its details are shown in FIG. 10. Rule 1 is examined first. Rule 1 applies when the left token is "a verb not followed by a predicate".

Here, "verb" means a verb root plus a verb suffix. In other words, the rule uses the two tokens "verb root" + "verb suffix" as its criterion; such a cluster is called a quasi-word. As this case shows, the criterion may involve more than one token. Moreover, as rule 1 also shows, a rule may take into account an element other than the part of speech, here "not followed by a predicate".

【００７０】[0070]

Here the left token is a general noun, so rule 1 does not apply. Rule 2, which applies in every case not covered by rule 1, is therefore applied. The part of speech is determined to be the invariant "に", and its weight is stored as 2.

【００７１】[0071]

In the same way, 「は」 is determined to be the invariant "は" by reference to rule table B(45) of FIG. 8. Likewise, 「が」 (ga) is determined to be the invariant "が" by reference to rule table B(44) of FIG. 11.

【００７２】[0072]

In this way the sentence 「麓に近いこの村は景色が美しかった。」 is analyzed. As shown in FIG. 9B, its division into tokens and the part of speech of each token are stored on the hard disk 14 as an analysis file.

【００７３】[0073]

When tokens of the same part of speech occur in succession, it may be inappropriate to determine a token's part of speech from those of the immediately preceding and following tokens. Consider the sentence 「車は常に迅速、確実かつ安全に運転しよう。」 (kuruma wa tsune ni jinsoku, kakujitsu katsu anzen ni unten shiyō, "Always drive the car quickly, reliably and safely."). Acquiring its parts of speech from the dictionary gives the result shown in FIG. 12A. The part of speech of 「は」 is determined by reference to rule table B(45), as described above.

【００７４】[0074]

Next, rule table B(6) is read to determine the part of speech of 「迅速」 (jinsoku, "quick") (see FIG. 18). However, this rule table must not be applied using the part of speech of the right token 「確実」 (kakujitsu, "reliable"). (The 「、」 here merely separates the tokens and is ignored.) In the illustrated case the part of speech of 「確実」 is itself undetermined, so no determination would even be possible. Yet even if it were determined, basing the decision on the part of speech of 「確実」 would give an incorrect result.

【００７５】[0075]

This is explained with reference to FIG. 19. When tokens α and β of the same part of speech occur in succession, they act as a single unit related to a token γ outside the run α, β. That is, as the figure shows, each of the tokens α and β is related to the token γ. Therefore, when table B(6) of FIG. 18 is applied to the token α, the right token must be taken to be γ.

【００７６】[0076]

Thus the rule table is not applied using an adjacent token of the same part of speech. Tokens that refer to the same rule table are presumed to have the same part of speech. When such tokens occur in succession (tokens separated by commas or equivalent conjunctions also count as successive), the rule table is applied using the part of speech of the nearest token that does not refer to the same rule table, ignoring commas and equivalent conjunctions. In the above example, the part of speech of 「迅速」 is therefore determined not from that of 「確実」 or 「安全」 (anzen, "safe") but from that of 「に」.
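The neighbor-selection rule of paragraphs [0075] and [0076] can be sketched as a search for the "effective" right neighbor. The sketch makes these assumptions:

- Tokens are `(surface, entry)` pairs, where `entry` is either a rule-table reference such as "B(6)" or a part of speech.
- The labels "comma" and "coordinating conjunction" stand for the separators that the text says to ignore.
- The token list in the usage example is a hypothetical segmentation of the FIG. 12 sentence, with hypothetical labels.

```python
from typing import List, Optional, Tuple

# Hypothetical labels for the separators that paragraph [0076] says to ignore.
IGNORED = {"comma", "coordinating conjunction"}


def effective_right(tokens: List[Tuple[str, str]], i: int) -> Optional[int]:
    """Index of the nearest token right of tokens[i] that does not share its rule table.

    Tokens referring to the same rule table are presumed to have the same part
    of speech, so a run of them (possibly joined by commas or conjunctions)
    is skipped as a unit.
    """
    own = tokens[i][1]
    for j in range(i + 1, len(tokens)):
        entry = tokens[j][1]
        if entry in IGNORED or entry == own:
            continue
        return j
    return None


if __name__ == "__main__":
    sentence = [("車", "general noun"), ("は", "B(45)"), ("常に", "adverb"),
                ("迅速", "B(6)"), ("、", "comma"), ("確実", "B(6)"),
                ("かつ", "coordinating conjunction"), ("安全", "B(6)"),
                ("に", "B(43)"), ("運転", "general noun")]
    print(sentence[effective_right(sentence, 3)][0])
```

With this segmentation, both 「迅速」 and 「確実」 resolve to 「に」 as their right-hand context, which is consistent with paragraphs [0076] and [0078].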

【００７７】[0077]

The part of speech of 「に」 is not yet fixed either; table B(43) is to determine it. Even so, the part of speech can be determined from the combinations of parts of speech that tables B(6) and B(43) permit.

【００７８】[0078]

Similarly, the part of speech of 「確実」 is determined from that of 「に」. The parts of speech obtained in this way are shown in FIG. 12B. Once they have been determined, 「車は」 may be treated as a single token by applying a predetermined rule, for example a rule that combines a general noun and an immediately following invariant into one token. Likewise, 「運転」 (unten, "driving") and 「しよ」 (shiyo) may be combined into one token and treated as a verb. Whether to perform such processing can be chosen according to the analysis that follows.
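The optional merging step of paragraph [0078] can be sketched as a post-processing pass over the tagged tokens. The sketch implements only the example rule given in the text, which combines a general noun with an immediately following invariant, and it uses hypothetical labels. Other merge rules, such as joining 「運転」 and 「しよ」 into a verb, could be added in the same way.

```python
from typing import List, Tuple


def merge_noun_particle(tokens: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Combine a general noun and an immediately following invariant into one token."""
    merged: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        if (i + 1 < len(tokens) and tokens[i][1] == "general noun"
                and tokens[i + 1][1].startswith("invariant")):
            merged.append((tokens[i][0] + tokens[i + 1][0],
                           tokens[i][1] + " + " + tokens[i + 1][1]))
            i += 2  # both tokens consumed
        else:
            merged.append(tokens[i])
            i += 1
    return merged


if __name__ == "__main__":
    tagged = [("車", "general noun"), ("は", 'invariant "は"'), ("常に", "adverb")]
    print(merge_noun_particle(tagged))
```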

【００７９】[0079]

As described above, this embodiment divides a verb into a verb root and a verb suffix and treats each as a part of speech in its own right. Likewise, a predicate adjective is divided into a predicate-adjective root and a predicate-adjective suffix, each treated as a part of speech. As a result, suffixes need not be registered in the part-of-speech dictionary and can instead be identified through the tables described above.

【００８０】[0080]

An idiomatic expression such as 「行くかもしれない」 (iku kamoshirenai, "may go") may be divided into three parts: the verb root 「行」, the suffix 「く」 (ku) and the idiom 「かもしれない」 (kamoshirenai, "may"). The idiom is then treated like a suffix and included in the table for verb suffixes, because idioms have the same function as verb suffixes. FIG. 15 shows an example of a suffix table that includes idioms in this way. Such a table makes it easy to analyze idioms that follow a verb root and its suffix.

In this table, the symbol "," means OR and the symbol "*" means AND. The characters to the right of "*" may, however, be omitted. For example, the notation (A, B)*(C) denotes the four possible words AC, BC, A and B.
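The table notation of paragraph [0080] can be expanded mechanically. The sketch below treats "," (or the full-width "、") as separating alternatives and "*" (or "＊") as joining groups, as the text states. The text does not illustrate chains with more than one "*", so for those the sketch assumes that each right-hand group may be omitted independently. The name `expand_pattern` is hypothetical.

```python
from typing import List

# Map full-width symbols, as they may appear in the Japanese table, to ASCII.
_NORMALIZE = str.maketrans({"、": ",", "＊": "*", "（": "(", "）": ")"})


def expand_pattern(pattern: str) -> List[str]:
    """Expand table notation such as "(A, B)*(C)" into all the words it denotes.

    "," separates alternatives (OR), "*" joins groups (AND), and every group
    to the right of a "*" may also be left out.
    """
    groups = [[alt.strip() for alt in part.strip().strip("()").split(",")]
              for part in pattern.translate(_NORMALIZE).split("*")]
    words = list(groups[0])
    for group in groups[1:]:
        # Forms that include the next group come first, then the forms without it.
        words = [w + alt for w in words for alt in group] + words
    return words


if __name__ == "__main__":
    print(expand_pattern("(A, B)*(C)"))  # the example from paragraph [0080]
```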

【００８１】[0081]

A table that treats idioms as suffixes in this way may be integrated with the verb suffix table or provided separately from it.

【００８２】[0082]

For compound verbs, a compound verb table may also be provided. This table treats as a suffix a verb that follows the suffix after the root of a verb (a verb ending in the "い" (i) row). In this embodiment, the compound verb table is provided as table S(V), which the table of suffixes following the root of a general verb points to (see FIG. 16). That is, it is part of the verb suffix table. It may, however, be provided separately.

【００８３】[0083]

When a verb whose suffix after the root ends in "い" (i) is followed by another verb, the compound verb table S(V) of FIG. 16 is called. This speeds up processing.

【００８４】[0084]

Verbs such as 「する」 (suru, "do") and 「来る」 (kuru, "come") change their roots, so under the concept of the present invention they would have no fixed root. As an exception, each changing root of such a verb is registered in the part-of-speech dictionary, together with the number of the table that follows it (see 「来」, 「来る」, 「来い」 and 「来よ」 in FIG. 5).

【００８５】[0085]

In each of the above embodiments, sentence data is received and divided into tokens, and the parts of speech are then determined. Alternatively, a sentence that has already been divided into tokens may be received, and only the parts of speech determined.

【００８６】[0086]

In each of the above embodiments, the functions of FIG. 1 are implemented with a CPU. Some or all of them may instead be implemented in hardware logic.

[Brief description of the drawings]

FIG. 1 is a diagram showing the overall configuration of a language analysis system according to an embodiment of the present invention.

FIG. 2 is a diagram showing the hardware configuration when the language analysis system of FIG. 1 is implemented with a CPU.

FIG. 3 is a diagram showing the classification of parts of speech.

FIG. 4 is a flowchart of a program for dividing text into tokens and acquiring parts of speech.

FIG. 5 is a diagram showing an example of the part-of-speech dictionary.

FIG. 6 is a diagram showing the contents of an analysis file.

FIG. 7 is a flowchart of a program for the part-of-speech selection process.

FIG. 8 is a diagram showing rule table B(45).

FIG. 9 is a diagram showing the contents of an analysis file.

FIG. 10 is a diagram showing rule table B(43).

FIG. 11 is a diagram showing rule table B(44).

FIG. 12 is a diagram showing the contents of an analysis file.

FIG. 13 is a diagram showing the table for suffixes following a verb root (table D).

FIG. 14 is a diagram showing a table for verb suffixes.

FIG. 15 is a diagram showing a suffix table that includes a part treating idioms as verb suffixes.

FIG. 16 is a diagram showing a compound verb table.

FIG. 17 is a diagram showing a table for predicate-adjective suffixes.

FIG. 18 is a diagram showing rule table B(6).

FIG. 19 is a diagram for explaining the processing when tokens of the same part of speech occur in succession.

[Explanation of symbols]

2 ... division means; 4 ... part-of-speech acquisition means; 5 ... division / part-of-speech selection means; 6 ... part-of-speech selection means; 8 ... dictionary means

─────────────────────────────────────────────────────

[Written amendment]

[Submission date] November 9, 1995

[Amendment 1]

[Document to be amended] Specification

[Item to be amended] 0040

[Amendment method] Change

[Content of amendment]

【００４０】[0040]

The part-of-speech dictionary stores these subdivided parts of speech for each token. The hard disk 12 stores the classification hierarchy of FIG. 3 and the classification hierarchy of Table 1, described later. Therefore, once the subdivided part of speech is known, its higher-level classification is easily obtained. For example, it is easy to find that "general noun" belongs to "name group A". The higher-level classifications may also be stored together with the subdivided parts of speech.

[Amendment 2]

[Document to be amended] Drawings

[Item to be amended] FIG. 5

[Amendment method] Change

[Content of amendment]

FIG. 5

Claims


1. A language analysis system comprising: dividing means for dividing a given language into tokens; dictionary means for storing parts of speech for tokens; part-of-speech acquiring means for acquiring the part of speech of each token divided by the dividing means with reference to the dictionary means; and part-of-speech selecting means for, when the part-of-speech acquiring means obtains two or more parts of speech for one token, selecting one of the two or more parts of speech given to the token on the basis of the parts of speech of one or more tokens located before the token, after it, or both.

2. A language analysis system comprising: dictionary means for storing parts of speech for tokens; division / part-of-speech acquiring means for dividing a given language into tokens and acquiring the part of speech of each token with reference to the dictionary means; and part-of-speech selecting means for, when the division / part-of-speech acquiring means obtains two or more parts of speech for one token, selecting one of the two or more parts of speech given to the token on the basis of the parts of speech of one or more tokens located before the token, after it, or both.

3. The language analysis system according to claim 1, wherein the dictionary means has a table for selecting, when there are two or more parts of speech for a token, the part of speech of the token on the basis of the parts of speech of one or more tokens located before it, after it, or both.

4. The language analysis system according to claim 1 or 2, wherein the dictionary means has a dictionary that associates tokens with their parts of speech for tokens whose part of speech is other than a verb suffix or a predicate-adjective suffix, and, for verb suffixes and predicate-adjective suffixes, has tables for the suffixes associated with the root of each verb or predicate adjective.

5. The language analysis system according to claim 4, further comprising, separately from or integrated with the suffix table, a table for idioms that treats as suffixes tokens whose part of speech is not originally a verb suffix.

6. The language analysis system according to claim 4, further comprising, separately from or integrated with the suffix table, a table for compound verbs that treats as a suffix a verb token that is not originally a verb suffix.

7. A language analysis method for assigning a part of speech to each token of a given language using dictionary means stored in a storage device, wherein the parts of speech of various tokens are stored in the storage device as the dictionary means, the part of speech corresponding to each token of the given language is obtained from the dictionary means, and, when a plurality of parts of speech are obtained for one token, the part of speech of the token is narrowed down on the basis of the parts of speech of one or more tokens located before the token, after it, or both.

8. The language analysis method according to claim 7, wherein, when there are two or more parts of speech for the token, the part of speech is selected on the basis of a table for selecting the part of speech of the token from the parts of speech of one or more tokens located before it, after it, or both.

9. The language analysis method according to claim 8, wherein at least verb suffixes and predicate-adjective suffixes are divided into tokens by means of tables for the suffixes following the roots of individual verbs or predicate adjectives.

10. The language analysis method according to claim 9, wherein a table for idioms, provided separately from or integrated with the suffix table, treats as suffixes tokens whose part of speech is not originally a verb suffix.

11. The language analysis method according to claim 9, wherein a table for compound verbs, provided separately from or integrated with the suffix table, treats as suffixes verb tokens that are not originally verb suffixes.

12. A computer-readable storage device in which a computer-executable program for carrying out, on a computer, a method of assigning a part of speech to each token of a given language is substantially embodied, wherein the method comprises storing the parts of speech of various tokens in a storage device as dictionary means, obtaining from the dictionary means the part of speech corresponding to each token of the given language, and, when a plurality of parts of speech are obtained for one token, narrowing down the part of speech of the token on the basis of the parts of speech of one or more tokens located before the token, after it, or both.

13. A language analysis method for assigning a part of speech to each token of a given language using dictionary means stored in a storage device, characterized by having dictionary means that associates a group including at least verb suffixes and predicate-adjective suffixes with parts of speech separately from the other groups.

14. A language analysis method for assigning a part of speech to each token of a given language using dictionary means stored in a storage device, characterized by having dictionary means that stores the parts of speech of a group including at least verb roots and predicate-adjective roots as parts of speech distinct from those of the other groups.