JPH06289890A - Natural language processor - Google Patents

Natural language processor

Info

Publication number
JPH06289890A
Authority
JP
Japan
Prior art keywords
dictionary
words
word
idiomatic expression
idiomatic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP5097275A
Other languages
Japanese (ja)
Inventor
Koji Inai
幸治 稲井
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Corp
Original Assignee
Sony Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1993-03-31
Filing date
1993-03-31
Publication date
1994-10-18
Application filed by Sony Corp
Priority to JP5097275A
Publication of JPH06289890A

Abstract

PURPOSE: To read sentences containing idiomatic expressions correctly by using an idiomatic expression dictionary to disambiguate words in a kanji-kana mixed sentence that share the same spelling but have different readings. CONSTITUTION: Constraints among independent words, of the form 'when a certain word appears in a sentence, this word is read this way', are described in the idiomatic expression dictionary 4. For words whose reading cannot be narrowed down by constraints among independent words alone but can be determined by constraints that include adjunct words, the constraints including the adjunct words are also described in the idiomatic expression dictionary 4. These constraints are applied when the word dictionary 3 is searched. The case where other words are interposed is also taken into account, and the places where other words may be inserted are described. The idiomatic expression dictionary 4 describing these constraints restricts the search result of the word dictionary 3, so that only the optimum reading candidates obtained from the word dictionary 3 are used as the search result. Further, the idiomatic expression dictionary 4 can be described without any special grammar, so its description is easy.

Description

Detailed Description of the Invention

[0001]

Field of Industrial Application: The present invention relates to natural language processing, such as a natural language analysis device used in text-to-speech synthesis or machine translation.

[0002]

Description of the Related Art: When a kanji-kana mixed sentence is analyzed, many kanji have more than one reading, so the appropriate reading must be selected from among them. In many cases the appropriate reading can be selected from part-of-speech and grammatical information, but there are cases where that information alone is not enough.

[0003] Conventionally, two methods have been used to solve this problem. The first method combines adjacent words and registers them in the dictionary as a compound word. The second method registers, in a co-occurrence dictionary, how a word is read when several words co-occur in a sentence.

[0004]

Problems to be Solved by the Invention: Because the first method treats a sequence of adjacent words as a single word, it cannot handle cases where another word is inserted in between. In addition, because it treats a sequence of different parts of speech as a single part of speech, problems may arise when processing such as dependency analysis is attempted.

[0005] Because the second method describes only relations between independent words, when several co-occurring independent words appear in a sentence for the independent word to be disambiguated, some selection criterion is required, and determining the optimal criterion is difficult.

[0006] The present invention has been made in view of these circumstances, and its object is to provide a natural language processing apparatus that can correctly distinguish the readings of homographs (words written identically but read differently).

[0007]

Means for Solving the Problems: The natural language processing apparatus of the present invention distinguishes the readings of homographs in a kanji-kana mixed sentence. It comprises an idiomatic expression dictionary (for example, the idiomatic expression dictionary 4 in FIG. 1) that describes not only constraints based on co-occurrence relations among independent words but also constraints that include adjunct words, and it performs the reading disambiguation using this idiomatic expression dictionary.

[0008] In the description of the idiomatic expression dictionary, it is preferable that constraints can be placed on the positions where other words may be inserted.

[0009] It is preferable that the idiomatic expression dictionary is implemented in a form that refers to the contents of the word dictionary.

[0010] It is preferable that the idiomatic-expression constraints of the idiomatic expression dictionary are applied immediately after the word dictionary search.

[0011]

Operation: In the natural language processing apparatus of the present invention, homographs in a kanji-kana mixed sentence are disambiguated using an idiomatic expression dictionary that describes not only constraints based on co-occurrence relations among independent words but also constraints that include adjunct words. Sentences can therefore be read correctly on an idiomatic-expression basis, in the manner of 'when a certain word appears in the sentence, this word is read this way'.

[0012] When the description of the idiomatic expression dictionary can place constraints on the positions where other words may be inserted, the correct reading can be selected even if a word is inserted inside an idiomatic expression.

[0013] When the idiomatic expression dictionary is implemented in a form that refers to the contents of the word dictionary, it does not need to hold ancillary information such as word attributes, as the word dictionary does, so memory consumption can be reduced.

[0014] When the idiomatic-expression constraints of the idiomatic expression dictionary are applied immediately after the word dictionary search, the number of grammatical analysis passes can be reduced.

[0015]

Embodiment: In natural language processing that takes kanji-kana mixed sentences as input, the readings and parts of speech of kanji and words are generally not one-to-one, so processing to select the appropriate reading and part of speech is required. In other words, reading disambiguation (correct identification of words) is important in natural language processing.

[0016] In general, words that are difficult to disambiguate are those that grammatical constraints cannot narrow down, that is, homographs whose reading changes under the influence of other words appearing in the sentence (the other words impose constraints on them).

[0017] The present invention aims to disambiguate homographs that cannot be distinguished by the grammatical constraints appearing in a sentence alone. It holds the constraints that arise between multiple words as an idiomatic expression dictionary and achieves correct disambiguation by applying the constraints described in that dictionary.

[0018] FIG. 1 shows an embodiment of the natural language processing apparatus according to the present invention, in which the invention is applied to a text-to-speech synthesizer. The characteristic feature of this embodiment is that the idiomatic expression dictionary 4 describes two kinds of constraints: constraints among independent words of the form 'when a certain word appears in the sentence, this word is read this way', and, for words that cannot be narrowed down by constraints among independent words alone but can be disambiguated by constraints that include adjunct words, the constraints including the adjunct words. These constraints are applied when the word dictionary 3 is searched. The embodiment also takes into account the case mentioned above in which another word is inserted in between, and describes the positions where insertion of other words is permitted.

[0019] The idiomatic expression dictionary 4 describing these constraints restricts the search results of the word dictionary 3, making it possible to return only the most suitable reading candidates from the word dictionary 3 as the search result. Moreover, the idiomatic expression dictionary 4 can be written without any special grammar, so it is easy to describe.

[0020] A sentence is input from the input unit 1. The dictionary search unit 2 checks whether the sentence input from the input unit 1 contains an idiomatic expression registered in the idiomatic expression dictionary 4.

[0021] When an idiomatic expression is detected in the input sentence, the dictionary search unit 2 filters the results of the word dictionary 3 search according to the constraints described in the idiomatic expression dictionary 4. For the parts other than the idiomatic expression, it passes all results of the word dictionary search to the subsequent analysis processing unit 5.

[0022] The word dictionary 3 holds words together with information such as their readings, attributes, and accents. The idiomatic expression dictionary 4 holds idiomatic expressions together with their readings, the positions where insertion of other words is permitted, and phrase (bunsetsu) boundary information. The analysis processing unit 5 takes the result of the dictionary search unit 2 as input, identifies the words using grammatical and other constraints, and passes the result to the subsequent phonological symbol generation unit 6.
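The contents of the two dictionaries described above could be modeled roughly as follows. This is only an illustrative sketch: the class names, field names, and the `"verb"` attribute value are assumptions made for this example, the accent field is left empty because the text gives no values, and the entries merely mirror the 『客を』/『断った』 example discussed in the description rather than the actual contents of the figures.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class WordEntry:
    """One word dictionary record (field names are hypothetical)."""
    surface: str                   # headword as written
    reading: str                   # kana reading
    attribute: str                 # attribute such as part of speech
    accent: Optional[int] = None   # accent information (no values given in the text)


@dataclass(frozen=True)
class IdiomEntry:
    """One idiomatic expression record (field names are hypothetical)."""
    segments: Tuple[str, ...]      # the expression, split at phrase boundaries
    readings: Tuple[str, ...]      # reading of each segment
    insert_after: FrozenSet[int]   # segment indexes after which other words may appear


WORD_DICT = (
    WordEntry("断", "ことわ", "verb"),
    WordEntry("断", "た", "verb"),
)

IDIOM_DICT = (
    # Other words may be inserted after segment 0, i.e. between 客を and 断った.
    IdiomEntry(("客を", "断った"), ("きゃくを", "ことわった"), frozenset({0})),
)
```

Note that the idiom entry carries no attribute or accent data of its own; it holds only the expression, its reading, the insertion positions, and the phrase boundaries.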

[0023] The phonological symbol generation unit 6 determines accents, pause durations, and so on, and generates a phonological symbol string. The speech synthesis unit 7 generates synthesized speech based on the phonological symbol string, and the output unit 8 outputs the synthesized sound.

[0024] FIG. 2 shows a specific example of the idiomatic expression dictionary, giving idiomatic expressions, their readings, the positions where insertion of other words is permitted, and phrase boundaries. The example in FIG. 2 concerns the word 『断る』 (kotowaru, 'to refuse') when it undergoes the geminate sound change (sokuonbin). When 『断る』 takes the form 『断った』, grammatical constraints cannot distinguish the reading 『ことわった』 (kotowatta) from 『たった』 (tatta, the sokuonbin form of 『断つ』, 'to cut off'). Therefore, as shown in FIG. 2, the word is registered together with its co-occurring words as an idiomatic expression (that is, a constraint is imposed), and the reading is determined from it.

[0025] Since a word may be inserted between 『客を』 and 『断った』, the positions where insertion of other words is permitted are specified explicitly. This makes disambiguation possible even for a sentence such as 『客を全て断った』 ('refused all the customers'). Phrase boundaries are provided to improve the accuracy of narrowing down the search results of the word dictionary 3.
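One way to picture matching with permitted insertion positions is the sketch below. It is not the matching procedure of the patent, which the text does not spell out; the function name `match_idiom`, the `gaps` parameter, and the `max_gap` limit on inserted characters are all assumptions introduced for illustration.

```python
from typing import List, Optional, Sequence, Set, Tuple


def match_idiom(text: str, segments: Sequence[str], gaps: Set[int],
                max_gap: int = 4) -> Optional[List[Tuple[int, int]]]:
    """Return the (start, end) span of each segment in text, or None.

    gaps holds the segment indexes after which other words may be inserted;
    max_gap (an assumed limit) bounds the length of the inserted text.
    """
    start = text.find(segments[0])
    while start != -1:
        spans = [(start, start + len(segments[0]))]
        pos = spans[0][1]
        matched = True
        for i, seg in enumerate(segments[1:]):
            if i in gaps:
                # Insertion permitted here: the next segment may start
                # anywhere from pos up to pos + max_gap.
                nxt = text.find(seg, pos, pos + max_gap + len(seg))
            else:
                # No insertion permitted: the next segment must follow directly.
                nxt = pos if text.startswith(seg, pos) else -1
            if nxt == -1:
                matched = False
                break
            spans.append((nxt, nxt + len(seg)))
            pos = nxt + len(seg)
        if matched:
            return spans
        start = text.find(segments[0], start + 1)
    return None
```

With `gaps={0}`, the call `match_idiom("彼は、来客を全て断った", ["客を", "断った"], {0})` finds both segments even though 全て sits between them; with an empty `gaps` set the same call returns `None`.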

[0026] Further, as shown in FIG. 3, by describing in the idiomatic expression dictionary 4 not only constraints based on co-occurrence relations among independent words but also constraints that include adjunct words, 『約束が違う』 (read ちがう, 'the promise is different') and 『約束を違える』 (read たがえる, 'to break a promise') can be distinguished. Not every adjunct word is described as a constraint, however; only exceptional cases are described. In the example of FIG. 3, 『約束を/違』 is described as a constraint.
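The effect of including the adjunct word in the constraint can be sketched as follows. The table layout and the function `read_kanji` are hypothetical, and the sketch assumes that ちが is the reading chosen when no idiomatic constraint applies, which the text implies but does not state outright.

```python
# Reading assumed to be chosen when no idiomatic constraint applies.
DEFAULT_READING = {"違": "ちが"}

# Exceptional cases only: (left context including the particle, target kanji, reading).
IDIOM_CONSTRAINTS = [
    ("約束を", "違", "たが"),
]


def read_kanji(sentence: str, target: str) -> str:
    """Pick the reading of target, preferring a matching idiomatic constraint."""
    for left, kanji, reading in IDIOM_CONSTRAINTS:
        if kanji == target and left + kanji in sentence:
            return reading
    return DEFAULT_READING[target]
```

A constraint keyed only on the independent words 約束 and 違 could not separate the two sentences, because both contain the same pair; the particle を is what selects たが.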

[0027] FIG. 4 shows an example of applying the idiomatic expression dictionary to a sentence containing the idiomatic expression of FIG. 2. In the example of FIG. 4, 『彼は、来客を全て断った』 ('He refused all the visitors'), the underlined part in the figure is where the constraints of the idiomatic expression dictionary apply. As shown in FIG. 4, the word dictionary 3 search (independent words only) yields 3 candidates for 彼, 5 for 来, 1 for 来客, 3 for 客, and 3 for 断. If this result were passed to the analysis processing unit 5 as is and narrowed down by grammatical constraints, several candidates would remain for 『断った』. Therefore, the word dictionary search result is first narrowed down by the constraints in the idiomatic expression dictionary 4 and then passed to the analysis processing unit 5, where it is narrowed down further by grammatical constraints. This yields the correct reading 『かれはらいきゃくをすべてことわった』 (kare wa raikyaku o subete kotowatta).

[0028] FIG. 5 shows an example of the processing performed by the dictionary search unit 2. The dictionary search unit 2 first searches the idiomatic expression dictionary 4 for the search target character string supplied from the input unit 1 (step S1) and determines whether an idiomatic expression is present (step S2). If an idiomatic expression is detected in the character string, the dictionary search unit 2 judges the validity of the idiomatic expression (step S3). For example, when the search target string contains a kanji string that includes a kanji compound registered as an idiomatic expression, or when the idiomatic expression permits insertion of other words but the inserted string is long, there is no guarantee that the match really is an idiomatic expression. The validity judgment is performed to prevent the misanalysis that would result from accepting such strings as idiomatic expressions.

[0029] If the dictionary search unit 2 determines in step S3 that the idiomatic expression is valid, it searches the word dictionary 3 for the search target character string and obtains the word candidates (step S4). For the idiomatic expression part, it narrows down the candidates using the readings written in the idiomatic expression dictionary (step S5); for the remaining parts, all candidates become the dictionary search result.

[0030] If no idiomatic expression is detected, or if the validity judgment finds the match invalid, the dictionary search unit 2 searches the word dictionary 3 for the search target character string and returns all obtained word candidates as the dictionary search result (step S6).
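Steps S1 to S6 can be sketched as a single function. Everything below is an assumption made for illustration: the toy dictionaries and their candidate readings, the helper names, and the validity rule, which here only checks the length of the inserted text against an arbitrary `MAX_INSERT` (the patent does not give a concrete threshold).

```python
from typing import Dict, List, Tuple

# Toy word dictionary: surface -> candidate readings (illustrative values only).
WORD_DICT: Dict[str, List[str]] = {
    "客": ["きゃく", "かく"],
    "断": ["ことわ", "た"],
}

# Toy idiom dictionary: (left segment, right segment,
#                        reading constraints, insertion permitted in between).
IDIOMS = [("客を", "断った", {"客": "きゃく", "断": "ことわ"}, True)]

MAX_INSERT = 4  # assumed limit on the inserted text for the S3 validity check


def find_idiom(text: str):
    """S1/S2: return the spans and constraints of the first idiom found, else None."""
    for left, right, readings, gap_ok in IDIOMS:
        i = text.find(left)
        if i == -1:
            continue
        j = text.find(right, i + len(left))
        if j != -1:
            return (i, i + len(left)), (j, j + len(right)), readings, gap_ok
    return None


def lookup_words(text: str) -> List[Tuple[int, str, List[str]]]:
    """S4/S6: every dictionary word found in text, with all of its readings."""
    return [(pos, surface, list(readings))
            for pos in range(len(text))
            for surface, readings in WORD_DICT.items()
            if text.startswith(surface, pos)]


def dictionary_search(text: str) -> List[Tuple[int, str, List[str]]]:
    hit = find_idiom(text)                            # S1
    words = lookup_words(text)                        # S4 or S6
    if hit is None:                                   # S2: no idiom present
        return words                                  # S6: all candidates
    (l0, l1), (r0, r1), readings, gap_ok = hit
    gap = r0 - l1
    if not (gap == 0 or (gap_ok and gap <= MAX_INSERT)):   # S3: validity
        return words                                  # S6: all candidates
    narrowed = []
    for pos, surface, cands in words:                 # S5: narrow idiom part only
        inside = l0 <= pos < l1 or r0 <= pos < r1
        if inside and surface in readings:
            cands = [r for r in cands if r == readings[surface]]
        narrowed.append((pos, surface, cands))
    return narrowed
```

For 『客を全て断った』 the sketch keeps only きゃく and ことわ, whereas a sentence without the idiom, or one with a long inserted string, keeps every candidate for the later grammatical analysis to resolve.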

[0031] As shown in FIGS. 2 and 3, the constraints described in the idiomatic expression dictionary 4 do not use any particularly difficult notation, so editing is very simple. Also, as described above, the idiomatic expression dictionary 4 only needs to describe exceptional cases (those that would require exceptional analysis processing if the method of the present invention were not used). Because the exceptional cases have already been handled before the data reaches the analysis processing unit 5, the analysis processing unit 5 does not need any special processing, and the amount of computation for analysis is reduced.

[0032] Further, as shown in FIG. 2, the idiomatic expression dictionary 4 does not contain information such as accents that the subsequent analysis processing unit 5 and phonological symbol generation unit 6 require; the information in the word dictionary 3 is used instead. This makes the idiomatic expression dictionary 4 easy to edit, and because the content is shared between the word dictionary 3 and the idiomatic expression dictionary 4, inconsistencies between them do not arise. If this sharing were implemented by having words in the idiomatic expression dictionary 4 refer to specific addresses in the independent word dictionary, modifying one dictionary would require reorganizing both. The embodiment therefore adopts a sharing scheme in which the contents of the word dictionary 3 are referenced by the headword and reading given in the idiomatic expression dictionary 4.
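The difference between referencing by address and referencing by headword and reading can be illustrated with a small sketch. The record layout and the function `resolve` are assumptions for this example; the attribute value is illustrative and the accent field is left empty because the text gives no values.

```python
from typing import Dict, List, Optional, Tuple

# Toy word dictionary records; attribute and accent values are placeholders.
WORD_DICT: List[Dict[str, Optional[str]]] = [
    {"surface": "断", "reading": "ことわ", "attribute": "verb", "accent": None},
    {"surface": "断", "reading": "た", "attribute": "verb", "accent": None},
]

# The idiom dictionary stores only (headword, reading) keys, not positions.
IDIOM_REFS: List[Tuple[str, str]] = [("断", "ことわ")]


def resolve(ref: Tuple[str, str], word_dict: List[Dict[str, Optional[str]]]):
    """Look up the word dictionary records that a (headword, reading) key refers to."""
    surface, reading = ref
    return [e for e in word_dict
            if e["surface"] == surface and e["reading"] == reading]
```

If the word dictionary is reordered, `resolve(("断", "ことわ"), ...)` still returns the same record, whereas a stored index such as `WORD_DICT[0]` would silently point at a different entry after the reorganization.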

[0033]

Effects of the Invention: According to the natural language processing apparatus of the present invention, homographs in a kanji-kana mixed sentence are disambiguated using an idiomatic expression dictionary that describes not only constraints based on co-occurrence relations among independent words but also constraints that include adjunct words. Sentences can therefore be read correctly on an idiomatic-expression basis, in the manner of 'when a certain word appears in the sentence, this word is read this way'. In addition, because multiple words are not treated as a single word, no exceptional processing is needed during syntactic analysis. Because the idiomatic expression dictionary is not written in a special format, it is easy to describe and edit, and it can be written even without knowledge of parts of speech and the like.

[0034] Further, by placing constraints in the description of the idiomatic expression dictionary on the positions where other words may be inserted, the correct reading can be selected even if a word is inserted inside an idiomatic expression.

[0035] Further, by implementing the idiomatic expression dictionary in a form that refers to the contents of the word dictionary, the idiomatic expression dictionary does not need to hold ancillary information such as word attributes, as the word dictionary does, so memory consumption can be reduced.

[0036] Further, by applying the idiomatic-expression constraints of the idiomatic expression dictionary immediately after the word dictionary search, the number of grammatical analysis passes can be reduced.

Brief Description of the Drawings

FIG. 1 is a block diagram showing an embodiment of the natural language processing apparatus according to the present invention.

FIG. 2 is a diagram showing a specific example of the idiomatic expression dictionary 4 in FIG. 1.

FIG. 3 is a diagram showing a specific example of the idiomatic expression dictionary 4 that describes not only constraints based on co-occurrence relations among independent words but also constraints that include adjunct words.

FIG. 4 is a diagram showing an example of applying the idiomatic expression dictionary to a sentence containing the idiomatic expression shown in FIG. 2.

FIG. 5 is a flowchart showing an example of the processing performed by the dictionary search unit 2.

Explanation of Reference Numerals

2 dictionary search unit; 3 word dictionary; 4 idiomatic expression dictionary

Claims (4)

1. A natural language processing apparatus that distinguishes the readings of homographs in a kanji-kana mixed sentence, comprising an idiomatic expression dictionary that describes not only constraints based on co-occurrence relations among independent words but also constraints that include adjunct words, wherein the reading disambiguation is performed using the idiomatic expression dictionary.
2. The natural language processing apparatus according to claim 1, wherein, in the description of the idiomatic expression dictionary, constraints can be placed on the positions where other words may be inserted.
3. The natural language processing apparatus according to claim 1, wherein the idiomatic expression dictionary is implemented in a form that refers to the contents of a word dictionary.
4. The natural language processing apparatus according to claim 1, wherein the idiomatic-expression constraints of the idiomatic expression dictionary are applied immediately after the word dictionary search.
JP5097275A 1993-03-31 1993-03-31 Natural language processor Pending JPH06289890A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP5097275A JPH06289890A (en) 1993-03-31 1993-03-31 Natural language processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP5097275A JPH06289890A (en) 1993-03-31 1993-03-31 Natural language processor

Publications (1)

Publication Number Publication Date
JPH06289890A true JPH06289890A (en) 1994-10-18

Family

ID=14187979

Family Applications (1)

Application Number Title Priority Date Filing Date
JP5097275A Pending JPH06289890A (en) 1993-03-31 1993-03-31 Natural language processor

Country Status (1)

Country Link
JP (1) JPH06289890A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6012028A (en) * 1997-03-10 2000-01-04 Ricoh Company, Ltd. Text to speech conversion system and method that distinguishes geographical names based upon the present position
JP2011191332A (en) * 2010-03-11 2011-09-29 Fujitsu Ltd Voice-synthesizing device, voice-synthesizing method, and voice-synthesizing program

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02201643A (en) * 1989-01-31 1990-08-09 Ricoh Co Ltd System for discriminatingly reading isomorphic word
JPH0363767A (en) * 1989-08-01 1991-03-19 Ricoh Co Ltd Text voice synthesizer
JPH0420998A (en) * 1990-05-16 1992-01-24 Ricoh Co Ltd Voice synthesizing device

Similar Documents

Publication Publication Date Title
EP0971294A2 (en) Method and apparatus for automated search and retrieval processing
JP2002215617A (en) Method for attaching part of speech tag
JPH07282063A (en) Machine translation device
WO1997004405A9 (en) Method and apparatus for automated search and retrieval processing
JPH08248971A (en) Text reading aloud and reading device
JP2002117027A (en) Feeling information extracting method and recording medium for feeling information extracting program
JPH06282290A (en) Natural language processing device and method thereof
JP2595934B2 (en) Kana-Kanji conversion processor
JPH06289890A (en) Natural language processor
JPS58123129A (en) Converting device of japanese syllabary to chinese character
KR20040018008A (en) Apparatus for tagging part of speech and method therefor
JP3873305B2 (en) Kana-kanji conversion device and kana-kanji conversion method
JP3698454B2 (en) Parallel phrase analysis device and learning data automatic creation device
JP2966473B2 (en) Document creation device
KR100333681B1 (en) Automatic translation apparatus and method using verb-based sentence frame
JP2608384B2 (en) Machine translation apparatus and method
JP2655711B2 (en) Homomorphic reading system
JP2801601B2 (en) Text-to-speech synthesizer
JPS62145463A (en) Kana/kanji (japanese syllabary/chinese character) conversion system
JPS63163956A (en) Document preparation and correction supporting device
JPH01114976A (en) Dictionary structure for document processor
JPS6395570A (en) Language analysis system
JP2006139463A (en) Morpheme analysis device, method, and program
JPH09281993A (en) Phonetic symbol forming device
JPS63136264A (en) Mechanical translating device

Legal Events

Date Code Title Description
A02 Decision of refusal

Free format text: JAPANESE INTERMEDIATE CODE: A02

Effective date: 20020628