JPH06289890A

JPH06289890A - Natural language processor

Info

Publication number: JPH06289890A
Application number: JP5097275A
Authority: JP
Inventors: Koji Inai; 幸治稲井
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1993-03-31
Filing date: 1993-03-31
Publication date: 1994-10-18

Abstract

PURPOSE: To read sentences containing idiomatic expressions correctly by using an idiomatic expression dictionary to disambiguate homographs (words with the same spelling but different readings) in a kanji-kana mixed sentence. CONSTITUTION: Constraints between independent words take the form "when a certain word appears in the sentence, this word is read this way". For words that cannot be narrowed down by such constraints alone but can be disambiguated by constraints that include attached words, those constraints are described in the idiomatic expression dictionary 4 and applied when the word dictionary 3 is searched. The case where other words are interposed is also taken into account, and the places where other words may be inserted are described. The idiomatic expression dictionary 4, which describes these constraints, constrains the search results of the word dictionary 3 so that only the optimum reading candidates from the word dictionary 3 are kept as the search result. Furthermore, describing the idiomatic expression dictionary 4 requires no special grammar, so the description is easy.

Description

Detailed Description of the Invention

【０００１】[0001]

[Field of Industrial Application] The present invention relates to natural language processing, such as a natural language analysis device used in text-to-speech synthesis or machine translation.

【０００２】[0002]

[Prior Art] When analyzing a kanji-kana mixed sentence, an appropriate reading must be selected for each kanji, because many kanji have more than one reading. In many cases the appropriate reading can be selected from part-of-speech and grammatical information, but some cases cannot be resolved with that information alone.

[0003] Conventionally, two approaches have been taken to this problem. The first registers adjacent words together in a dictionary as a compound word. The second registers, in a co-occurrence dictionary, how a word is read when several words co-occur in a sentence.

【０００４】[0004]

[Problems to be Solved by the Invention] Because the first approach treats a sequence of adjacent words as a single word, it cannot handle cases where another word is inserted between them. In addition, because it treats a sequence of different parts of speech as a single part of speech, problems can arise when processing such as dependency analysis is attempted.

[0005] The second approach describes only relationships between independent words. Consequently, when several independent words that co-occur with the independent word to be disambiguated appear in the same sentence, some selection criterion is needed, and determining the optimal criterion is difficult.

[0006] The present invention has been made in view of these circumstances, and its object is to provide a natural language processing device that can correctly disambiguate homographs with different readings.

【０００７】[0007]

[Means for Solving the Problems] The natural language processing device of the present invention disambiguates homographs with different readings in a kanji-kana mixed sentence. It is characterized by comprising an idiomatic expression dictionary (for example, the idiomatic expression dictionary 4 in FIG. 1) that describes not only constraints based on co-occurrence relations between independent words but also constraints that include attached words, and by using this idiomatic expression dictionary to perform the disambiguation.

[0008] In the description of the idiomatic expression dictionary, it is preferable that constraints can be placed on where other words may be inserted.

[0009] The idiomatic expression dictionary is preferably implemented in a form that refers to the contents of the word dictionary.

[0010] The idiomatic-expression constraints of the idiomatic expression dictionary are preferably applied immediately after the word dictionary is searched.

【００１１】[0011]

[Operation] In the natural language processing device of the present invention, homographs with different readings in a kanji-kana mixed sentence are disambiguated using an idiomatic expression dictionary that describes not only constraints based on co-occurrence relations between independent words but also constraints that include attached words. This makes it possible to disambiguate idiomatic sentences by rules of the form "when a certain word appears in the sentence, this word is read this way".

[0012] When the description of the idiomatic expression dictionary can constrain where other words may be inserted, correct disambiguation is possible even when a word is inserted into an idiomatic expression.

[0013] When the idiomatic expression dictionary is implemented in a form that refers to the contents of the word dictionary, it does not need to hold ancillary information such as word attributes, as the word dictionary does, so memory consumption can be reduced.

[0014] When the idiomatic-expression constraints of the idiomatic expression dictionary are applied immediately after the word dictionary is searched, the number of grammatical analysis passes can be reduced.

【００１５】[0015]

[Embodiment] In natural language processing that takes kanji-kana mixed sentences as input, kanji and words generally do not map one-to-one to readings and parts of speech, so processing is needed to select the appropriate reading and part of speech. In other words, disambiguating readings (correctly identifying words) is important in natural language processing.

[0016] In general, words that are difficult to disambiguate are those that grammatical constraints cannot narrow down, namely homographs with different readings whose reading changes under the influence of other words in the sentence (that is, other words impose constraints on them).

[0017] The present invention aims to disambiguate homographs that cannot be resolved by the grammatical constraints appearing in a sentence alone. It holds the constraints that arise among multiple words as an idiomatic expression dictionary and performs correct disambiguation by applying the constraints described in that dictionary.

[0018] FIG. 1 shows an embodiment of a natural language processing device according to the present invention, in which the invention is applied to a text-to-speech synthesizer. The embodiment handles constraints between independent words of the form "when a certain word appears in the sentence, this word is read this way". In addition, for words that cannot be narrowed down by constraints between independent words alone but can be disambiguated by constraints that include attached words, those constraints are described in the idiomatic expression dictionary 4 and applied when the word dictionary 3 is searched. The embodiment also takes into account the case, mentioned above, where another word is inserted in between, and describes the places where insertion of other words is permitted.

[0019] The idiomatic expression dictionary 4, which describes these constraints, constrains the search results of the word dictionary 3 so that only the optimum reading candidates from the word dictionary 3 are returned as the search result. Describing the idiomatic expression dictionary 4 requires no special grammar, so entries are easy to write.

[0020] A sentence is input from the input unit 1. The dictionary search unit 2 checks whether the sentence input from the input unit 1 contains an idiomatic expression registered in the idiomatic expression dictionary 4.

[0021] When an idiomatic expression is detected in the input sentence, the dictionary search unit 2 filters the search results of the word dictionary 3 according to the constraints described in the idiomatic expression dictionary 4. For the parts outside the idiomatic expression, it passes all word dictionary search results to the analysis processing unit 5 in the subsequent stage.

[0022] The word dictionary 3 holds words together with information such as their readings, attributes, and accents. The idiomatic expression dictionary 4 holds idiomatic expressions and their readings, the places where insertion of other words is permitted, and phrase (bunsetsu) boundary information. The analysis processing unit 5 takes the output of the dictionary search unit 2 as input, identifies words using grammatical and other constraints, and passes the result to the phonological symbol generation unit 6 in the subsequent stage.
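The two dictionaries described above could be modeled roughly as follows. This is only a sketch: the class names, field names, and encodings (`WordEntry`, `IdiomEntry`, `gap_after`, `boundary_after`, the integer accent) are assumptions made for illustration, and the sample entry is modeled on the 客を / 断った example from the description of FIG. 2 without reproducing the figure's actual layout.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class WordEntry:
    """Word dictionary record: a word with its reading, attribute, and accent."""
    surface: str
    reading: str
    attribute: str  # e.g. part of speech (encoding is an assumption)
    accent: int     # accent position (encoding is an assumption)


@dataclass(frozen=True)
class IdiomEntry:
    """Idiomatic expression record: holds no attribute or accent information."""
    parts: Tuple[str, ...]          # surface pieces of the expression
    readings: Tuple[str, ...]       # reading of each piece
    gap_after: FrozenSet[int]       # piece indices after which other words may be inserted
    boundary_after: FrozenSet[int]  # piece indices followed by a phrase boundary

    def surface(self) -> str:
        return "".join(self.parts)

    def reading(self) -> str:
        return "".join(self.readings)


# Illustrative entry only; values are not taken from FIG. 2.
kotowatta = IdiomEntry(
    parts=("客を", "断った"),
    readings=("きゃくを", "ことわった"),
    gap_after=frozenset({0}),
    boundary_after=frozenset({0}),
)
print(kotowatta.surface(), kotowatta.reading())
```

Keeping attributes and accents out of `IdiomEntry` mirrors the stated division of labor: the idiomatic expression dictionary records only readings, insertion points, and phrase boundaries.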

[0023] The phonological symbol generation unit 6 determines accents, pause durations, and the like, and generates a phonological symbol string. The speech synthesis unit 7 generates synthetic speech based on the phonological symbol string, and the output unit 8 outputs the synthesized sound.

[0024] FIG. 2 shows a concrete example of the idiomatic expression dictionary: idiomatic expressions and their readings, the places where insertion of other words is permitted, and phrase boundaries. The example in FIG. 2 concerns the geminate (sokuon) sound change of "断る" (kotowaru). When "断る" undergoes this change and becomes "断った", grammatical constraints cannot distinguish the reading "ことわった" from "たった" (the corresponding form of "断つ"). Therefore, as shown in FIG. 2, the word is registered together with its co-occurring words as an idiomatic expression (that is, a constraint is set) so that the reading can be disambiguated.

[0025] Because a word may be inserted between "客を" and "断った", the places where insertion of other words is permitted are stated explicitly. This makes disambiguation possible even for a sentence such as "客を全て断った" ("turned away all the customers"). Phrase boundaries are provided to improve the accuracy with which the search results of the word dictionary 3 are narrowed down.
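Matching an idiomatic expression that permits an inserted word can be sketched as below. The function name `match_with_gap`, its parameters, and the `max_gap` limit of four characters are assumptions for illustration; the patent states only that insertion points are marked in the dictionary, not how matching is implemented.

```python
from typing import Optional


def match_with_gap(text: str, head: str, tail: str,
                   gap_allowed: bool, max_gap: int = 4) -> Optional[str]:
    """Return the string inserted between head and tail, or None if there is no match.

    An empty string means head and tail are adjacent.
    """
    limit = max_gap if gap_allowed else 0
    start = text.find(head)
    while start != -1:
        after_head = start + len(head)
        # Try every permitted gap length, shortest first.
        for gap in range(limit + 1):
            if text.startswith(tail, after_head + gap):
                return text[after_head:after_head + gap]
        start = text.find(head, start + 1)
    return None


print(match_with_gap("彼は、来客を全て断った", "客を", "断った", gap_allowed=True))
```

With `gap_allowed=False` the same sentence does not match, which corresponds to the limitation of registering adjacent words as a single compound word.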

[0026] As shown in FIG. 3, describing in the idiomatic expression dictionary 4 not only constraints based on co-occurrence relations between independent words but also constraints that include attached words makes it possible to distinguish "約束が違う" (read ちがう) from "約束を違える" (read たがえる). Not every attached word is described as a constraint, however; only cases that arise exceptionally are described. In the example of FIG. 3, "約束を／違" is described as a constraint.
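A minimal sketch of an attached-word constraint follows, assuming a lookup keyed by noun, attached word (particle), and kanji. The table names and the fixed default reading are assumptions: in the described device the non-exceptional reading would come from the word dictionary and grammatical analysis, not from a hard-coded table.

```python
# Only the exceptional case is listed, keyed by (noun, attached word, kanji).
EXCEPTIONS = {("約束", "を", "違"): "たが"}

# Stand-in for the reading chosen when no exception applies (an assumption;
# the described device would rely on the word dictionary and grammar here).
DEFAULT_READING = {"違": "ちが"}


def reading_for(noun: str, particle: str, kanji: str) -> str:
    """Return the exceptional reading if one is registered, else the default."""
    return EXCEPTIONS.get((noun, particle, kanji), DEFAULT_READING[kanji])


print(reading_for("約束", "を", "違"), reading_for("約束", "が", "違"))
```

Because the key includes the particle, the two sentences are separated even though the independent words 約束 and 違 are the same in both.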

[0027] FIG. 4 shows an application of the idiomatic expression dictionary to a sentence containing the idiomatic expression shown in FIG. 2. In the example in FIG. 4, "彼は、来客を全て断った", the underlined part is where the constraint of the idiomatic expression dictionary applies. As FIG. 4 also shows, the word dictionary 3 search (independent words only) yields 3 candidates for 彼, 5 for 来, 1 for 来客, 3 for 客, and 3 for 断. If this result is passed directly to the analysis processing unit 5 and narrowed down by grammatical constraints, several candidates remain for the "断った" part. Therefore, the word dictionary search results are first narrowed down by the constraints in the idiomatic expression dictionary 4 and then passed to the analysis processing unit 5 to be narrowed down by grammatical constraints, which yields the correct reading "かれはらいきゃくをすべてことわった".
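The narrowing step can be sketched as a filter over candidate readings. The candidate lists below are illustrative and do not reproduce the candidate counts of FIG. 4; the function name `narrow` and the fallback behavior when the prescribed reading is absent are assumptions.

```python
from typing import Dict, List


def narrow(candidates: Dict[str, List[str]],
           idiom_readings: Dict[str, str]) -> Dict[str, List[str]]:
    """Keep only the idiom's reading for words the idiom covers; pass others through."""
    result = {}
    for word, readings in candidates.items():
        prescribed = idiom_readings.get(word)
        kept = [r for r in readings if r == prescribed]
        # Assumption: if the prescribed reading is not a candidate, keep all of them.
        result[word] = kept if kept else list(readings)
    return result


# Illustrative candidates (not the actual contents of the word dictionary 3).
candidates = {
    "彼": ["かれ", "かの"],
    "客": ["きゃく", "かく"],
    "断": ["ことわ", "た", "だん"],
}
narrowed = narrow(candidates, {"断": "ことわ"})
print(narrowed["断"], narrowed["彼"])
```

Only the word covered by the idiomatic expression loses candidates; the remaining ambiguity is left for the grammatical analysis in the subsequent stage.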

[0028] FIG. 5 shows an example of the processing of the dictionary search unit 2. The dictionary search unit 2 first searches the idiomatic expression dictionary 4 for the search target character string supplied from the input unit 1 (step S1) and determines whether an idiomatic expression is present (step S2). When an idiomatic expression is detected in the character string, the dictionary search unit 2 judges the validity of the idiomatic expression (step S3). For example, when the search target character string contains a kanji string that includes a kanji compound registered as an idiomatic expression, or when the idiomatic expression permits insertion of other words but the inserted character string is long, there is no guarantee that the match really is an idiomatic expression. The validity judgment prevents the erroneous analysis that would result from recognizing such a character string as an idiomatic expression.

[0029] When the dictionary search unit 2 judges in step S3 that the idiomatic expression is valid, it searches the word dictionary 3 for the search target character string to obtain word candidates (step S4). For the idiomatic expression part, it narrows down the candidates using the readings written in the idiomatic expression dictionary (step S5); for the parts outside the idiomatic expression, all candidates become the dictionary search result.

[0030] When no idiomatic expression is detected, or when the validity judgment finds the match invalid, the dictionary search unit 2 searches the word dictionary 3 for the search target character string and uses all obtained word candidates as the dictionary search result (step S6).
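Steps S1 to S6 can be sketched end to end as follows. Everything here is a simplified assumption: the idiom record layout (`head`, `tail`, `readings`), the substring-based word lookup, and a validity test that checks only the length of the inserted string (the kanji-compound containment check mentioned for step S3 is omitted).

```python
from typing import Dict, List, Optional, Tuple

WordDict = Dict[str, List[str]]  # surface form -> candidate readings


def find_idiom(text: str, idioms: List[dict]) -> Optional[Tuple[dict, int]]:
    """S1/S2: return (idiom, length of inserted string) for the first idiom found."""
    for idiom in idioms:
        head_pos = text.find(idiom["head"])
        if head_pos == -1:
            continue
        after_head = head_pos + len(idiom["head"])
        tail_pos = text.find(idiom["tail"], after_head)
        if tail_pos != -1:
            return idiom, tail_pos - after_head
    return None


def is_valid(gap_length: int, max_gap: int = 4) -> bool:
    """S3 (partial): reject matches whose inserted string is too long."""
    return gap_length <= max_gap


def lookup_words(text: str, word_dict: WordDict) -> WordDict:
    """S4/S6: collect every candidate reading for each dictionary word in the text."""
    return {w: list(rs) for w, rs in word_dict.items() if w in text}


def dictionary_search(text: str, word_dict: WordDict, idioms: List[dict]) -> WordDict:
    hit = find_idiom(text, idioms)                       # S1, S2
    if hit is not None and is_valid(hit[1]):             # S3
        idiom, _ = hit
        candidates = lookup_words(text, word_dict)       # S4
        for word, reading in idiom["readings"].items():  # S5
            if reading in candidates.get(word, []):
                candidates[word] = [reading]
        return candidates
    return lookup_words(text, word_dict)                 # S6


# Illustrative data only.
WORDS = {"彼": ["かれ", "かの"], "客": ["きゃく", "かく"], "断": ["ことわ", "た", "だん"]}
IDIOMS = [{"head": "客を", "tail": "断った", "readings": {"断": "ことわ"}}]
print(dictionary_search("彼は、来客を全て断った", WORDS, IDIOMS))
```

The two return paths correspond to the two branches of the flowchart: narrowed candidates when a valid idiomatic expression is found, and the full candidate set otherwise.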

[0031] As FIGS. 2 and 3 show, the constraints described in the idiomatic expression dictionary 4 use no particularly difficult notation, so editing is very simple. As noted above, the idiomatic expression dictionary 4 need only describe exceptional cases (those that would require exceptional analysis processing if the method of the present invention were not used). Because the exceptional cases have already been handled before the data reaches the analysis processing unit 5, the analysis processing unit 5 needs no special processing, and the amount of computation for analysis is also reduced.

[0032] As shown in FIG. 2, the idiomatic expression dictionary 4 does not contain information such as accents needed by the analysis processing unit 5 and the phonological symbol generation unit 6 in the subsequent stages; the information in the word dictionary 3 is used instead. This makes the idiomatic expression dictionary 4 easier to edit, and because the word dictionary 3 and the idiomatic expression dictionary 4 share their contents, inconsistencies between them do not arise. If this sharing were implemented by having a word in the idiomatic expression dictionary 4 refer to a specific address in the independent word dictionary, modifying either dictionary would require reorganizing both. The embodiment therefore adopts a sharing scheme in which the contents of the word dictionary 3 are referenced by the headword and reading given in the idiomatic expression dictionary 4.
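The difference between referencing by headword and reading and referencing by address can be illustrated with a small sketch. The data layout and the attribute and accent values are hypothetical; only the idea of using (headword, reading) as the lookup key comes from the text.

```python
# Word dictionary keyed by (headword, reading); attribute and accent values are illustrative.
word_dict = {
    ("断る", "ことわる"): {"attribute": "verb", "accent": 3},
    ("断つ", "たつ"): {"attribute": "verb", "accent": 1},
}

# The idiom entry stores only keys, never attributes, accents, or storage addresses.
idiom_refs = [("断る", "ことわる")]


def resolve(refs, dictionary):
    """Look up each (headword, reading) reference in the word dictionary."""
    return [dictionary[key] for key in refs]


before = resolve(idiom_refs, word_dict)

# Editing the word dictionary does not require touching the idiom entry.
word_dict[("客", "きゃく")] = {"attribute": "noun", "accent": 0}
after = resolve(idiom_refs, word_dict)
print(before == after)

# By contrast, a positional "address" goes stale as soon as entries move.
word_list = [("断る", "ことわる"), ("断つ", "たつ")]
address = 0                              # intended to point at 断る
word_list.insert(0, ("客", "きゃく"))     # a dictionary edit shifts every position
print(word_list[address])                # no longer the 断る entry
```

Key-based lookup costs a search per reference, but it keeps the two dictionaries independently editable, which is the trade-off the embodiment chooses.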

【００３３】[0033]

[Effects of the Invention] According to the natural language processing device of the present invention, homographs with different readings in a kanji-kana mixed sentence are disambiguated using an idiomatic expression dictionary that describes not only constraints based on co-occurrence relations between independent words but also constraints that include attached words. This makes it possible to disambiguate idiomatic sentences by rules of the form "when a certain word appears in the sentence, this word is read this way". In addition, because multiple words are not treated as a single word, no exceptional processing is needed during syntactic analysis. Because the idiomatic expression dictionary is not written in a special format, it is easy to write and edit, and entries can be written without knowledge of parts of speech and the like.

[0034] In addition, placing constraints, in the description of the idiomatic expression dictionary, on where other words may be inserted allows correct disambiguation even when a word is inserted into an idiomatic expression.

[0035] In addition, implementing the idiomatic expression dictionary in a form that refers to the contents of the word dictionary means that it does not need to hold ancillary information such as word attributes, as the word dictionary does, so memory consumption can be reduced.

[0036] In addition, applying the idiomatic-expression constraints of the idiomatic expression dictionary immediately after the word dictionary is searched reduces the number of grammatical analysis passes.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of a natural language processing device according to the present invention.

FIG. 2 is a diagram showing a specific example of the idiomatic expression dictionary 4 in FIG. 1.

FIG. 3 is a diagram showing a specific example of the idiomatic expression dictionary 4 in which not only constraints based on co-occurrence relations between independent words but also constraints that include attached words are described.

FIG. 4 is a diagram showing an application of the idiomatic expression dictionary to a sentence containing the idiomatic expression shown in FIG. 2.

FIG. 5 is a flowchart showing an example of the processing of the dictionary search unit 2.

[Explanation of symbols]

2: dictionary search unit; 3: word dictionary; 4: idiomatic expression dictionary

Claims

[Claims]

1. A natural language processing device for disambiguating homographs with different readings in a kanji-kana mixed sentence, comprising an idiomatic expression dictionary that describes not only constraints based on co-occurrence relations between independent words but also constraints that include attached words, wherein the disambiguation is performed using the idiomatic expression dictionary.

2. The natural language processing device according to claim 1, wherein, in the description of the idiomatic expression dictionary, constraints can be placed on where other words are inserted.

3. The natural language processing device as described, wherein the idiomatic expression dictionary is implemented in a form that refers to the contents of a word dictionary.

4. The natural language processing device according to claim 1, wherein the idiomatic-expression constraints of the idiomatic expression dictionary are applied immediately after the word dictionary is searched.