JPH02201643A - System for discriminatingly reading isomorphic word - Google Patents

System for discriminatingly reading isomorphic word

Info

Publication number
JPH02201643A
Authority
JP
Japan
Prior art keywords
occurrence
isomorphic
word
words
dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP1022021A
Other languages
Japanese (ja)
Other versions
JP2655711B2 (en)
Inventor
Junko Komatsu
小松 順子
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ricoh Co Ltd filed Critical Ricoh Co Ltd
Priority to JP1022021A priority Critical patent/JP2655711B2/en
Publication of JPH02201643A publication Critical patent/JPH02201643A/en
Application granted granted Critical
Publication of JP2655711B2 publication Critical patent/JP2655711B2/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Abstract

PURPOSE: To allow the reading of a homograph (isomorphic word) to be determined without preparing case patterns for all predicates, by using a co-occurrence dictionary to obtain the strength of co-occurrence between the homograph and a clause in a dependency relation with it, and selecting the combination with the stronger co-occurrence. CONSTITUTION: The system has a co-occurrence dictionary that stores co-occurrence relations between clauses, which serve as clues when determining the reading of a homograph (a word written the same way but read differently). For example, in the sentence 'WATAKUSHI WA' (I) 'GAKKO E' (to school) 行った (read 'ITTA' (went) or 'OKONATTA' (did)), the word 行った is a homograph. When the co-occurrence dictionary is searched, no dependency relation in the example sentence matches an entry at the headword level. However, the meaning code of 'school' denotes a facility and matches that of 'MACHI' (town) one level up in the semantic classification hierarchy, so the strength of co-occurrence is set to 3 according to a prescribed rule, and 'ITTA' is selected as the reading of 行った. Thus homographs can be read correctly without preparing case patterns for all predicates.

Description

DETAILED DESCRIPTION OF THE INVENTION

Technical field

The present invention relates to a method for distinguishing the readings of homographs (isomorphic words) in a text-to-speech synthesis device.

Prior art

To determine the reading of a homograph, other words that stand in some relation to it must be used as clues. In general, the clue is a clause that is in a dependency relation with the clause containing the homograph. This approach is described in "An automatic reading method for homographs using hierarchical word attributes" (Transactions of the Institute of Electronics and Communication Engineers of Japan, 85/3, Vol. J68-D, No. 3, pp. 392-399), which uses the following methods:

(1) If the homograph is a predicate, find the clause that fills an obligatory case of that predicate, and select the predicate reading whose case pattern specifies a semantic attribute matching that of the noun in the clause.

(2) If the homograph is part of a parallel (coordinate) phrase, select the reading that is semantically closest to the other noun.

(3) If the homograph is part of an adnominal modification linked by the case particle の ("no"), assign in advance, to each word whose reading is to be distinguished, the semantic attributes of the words that can be connected before or after it via の, and select the reading whose combination matches.

(4) If the homograph is a case element of a predicate, select the reading that matches the semantic attribute specified by the case pattern of the predicate.

(5) If the homograph is in a noun clause-predicate relation (the "〜は〜だ" pattern, "X is Y"), select the combination in which the predicate noun and the noun in the noun clause have semantically close attributes.

(6) If the homograph is modified adnominally, find the predicate that modifies it, and select the noun reading that has the attribute specified by the case pattern of that predicate.

Of these conventional methods, (1), (4), and (6) require case patterns to be prepared for all predicates. When the dictionary has a large vocabulary this is very laborious, and preparing case patterns for every predicate in the dictionary just to distinguish the readings of the limited set of words that are homographs is not cost-effective.

Purpose

The present invention was made in view of the circumstances described above. Its object is to provide a homograph reading method in which a co-occurrence dictionary storing typical examples of co-occurrence relations that serve as clues is prepared, the dictionary is used to obtain the strength of co-occurrence between the homograph and the clauses in a dependency relation with it, and the combination with the stronger co-occurrence is selected. This makes it possible to distinguish the readings of homographs without preparing case patterns for all predicates. In addition, when the strength of co-occurrence is examined, a co-occurrence relation is judged to hold to some degree not only when the words match an entry of the co-occurrence dictionary at the headword level but also when a word belongs to a semantic class similar to that of a word registered in the dictionary, so that the co-occurrence relations of varied expressions can be inferred from a single typical example.

Constitution

To achieve the above object, the present invention provides a text-to-speech synthesis device comprising a morphological analysis unit that morphologically analyzes Japanese text written in mixed kanji and kana, a dependency analysis unit that examines the dependency relations between clauses, a prosodic symbol generation unit that generates a prosodic symbol string using the results of the morphological analysis and the dependency analysis, and a speech synthesis unit that converts the prosodic symbol string into synthesized speech. The device is characterized in that it has a co-occurrence dictionary storing the co-occurrence relations between clauses that serve as clues for distinguishing the readings of homographs (words written the same way but read differently); when a homograph is present, the strength of co-occurrence between the clause containing the homograph and the clauses in a dependency relation with it is looked up in the co-occurrence dictionary, and the word with the stronger co-occurrence relation is selected. It is further characterized in that, when the strength of co-occurrence is determined, a co-occurrence relation is judged to exist to some degree not only when the headword matches the co-occurrence dictionary but also when the word is semantically similar to a word registered in the dictionary. The invention is described below on the basis of an embodiment.

Figure 1 illustrates the homograph reading method according to the present invention and shows the structure of the co-occurrence dictionary. Figure 2 shows the semantic classification of each word, which is defined hierarchically.
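
As a rough illustration of the kind of structure Figure 1 describes, the sketch below stores co-occurrence entries as records that can be looked up by the homograph's written form. The class name `CooccurrenceEntry`, its field names, and the `lookup` helper are assumptions made for illustration; the figure is not reproduced in this text, so the actual layout of the dictionary may differ. The three entries are the ones quoted in the examples of this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CooccurrenceEntry:
    """One typical co-occurrence example (field names are hypothetical)."""
    homograph: str    # written form shared by the competing readings
    reading: str      # reading that this example supports
    partner: str      # headword of the clause that co-occurs with the homograph
    particles: tuple  # case particles linking the two clauses


# Entries quoted in the examples of this description.
COOCCURRENCE_DICTIONARY = [
    CooccurrenceEntry("行った", "いった", "町", ("へ", "に")),
    CooccurrenceEntry("行った", "おこなった", "実験", ("を",)),
    CooccurrenceEntry("八代", "やつしろ", "到着した", ("に",)),
]


def lookup(homograph):
    """Return every entry registered for the given written form."""
    return [e for e in COOCCURRENCE_DICTIONARY if e.homograph == homograph]


for entry in lookup("行った"):
    print(entry.reading, entry.partner)
```

A flat list is enough for a handful of entries; a larger dictionary would more plausibly be indexed by written form.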

In this embodiment, the homograph reading step is placed immediately after dependency analysis in the text-to-speech processing sequence. How the step is carried out is explained below with concrete examples. The rules that determine the strength of co-occurrence are defined in Table 1.

Example 1

彼は 今日 実験を 行った(いった/おこなった)。
(He / today / an experiment / 行った, read either "itta" (went) or "okonatta" (conducted).)

In this example, 行った is the homograph.

Three clauses depend on 行った: 彼は (he), 今日 (today), and 実験を (an experiment). Searching the co-occurrence dictionary for 行った yields the following two co-occurrence relations:

町へ(に) 行った(いった) (went to town)
実験を 行った(おこなった) (conducted an experiment)

The sentence in Example 1 contains the dependency relation 実験を 行った, which exactly matches a headword entry of the co-occurrence dictionary, so the strength of co-occurrence is 5, and おこなった (okonatta) is selected as the reading of 行った.
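
The headword-level match in Example 1 can be sketched as follows. The strength value 5 comes from the example itself; the function name, the tuple layout, and the use of 0 to mean "no headword-level match" are assumptions for illustration, since Table 1 is not reproduced in this text.

```python
# Dictionary entries for 行った quoted in Example 1: (partner headword, reading).
ENTRIES = [("町", "いった"), ("実験", "おこなった")]

HEADWORD_MATCH_STRENGTH = 5  # value stated in Example 1


def headword_strengths(dependent_headwords, entries):
    """Score each reading by exact headword match only.

    A reading gets strength 5 when some clause depending on the homograph has
    the same headword as the entry's partner; otherwise it gets 0, an assumed
    placeholder meaning "no headword-level match".
    """
    scores = {}
    for partner, reading in entries:
        matched = partner in dependent_headwords
        scores[reading] = HEADWORD_MATCH_STRENGTH if matched else 0
    return scores


# Clauses depending on 行った in Example 1: 彼は, 今日, 実験を (headwords only).
scores = headword_strengths(["彼", "今日", "実験"], ENTRIES)
best = max(scores, key=scores.get)
print(scores, best)
```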

Example 2

私は 学校へ 行った(いった/おこなった)。
(I / to school / 行った, read either "itta" or "okonatta".)

In this example too, 行った is the homograph, so searching the co-occurrence dictionary gives the same results as in Example 1. This time, no dependency relation in the example sentence matches at the headword level. However, the meaning code of 学校 (school) is 2.1.1 in Figure 2, which denotes a facility, and it matches that of 町 (town) one level up in the semantic classification hierarchy. The strength of co-occurrence is therefore 3, and いった (itta) is selected as the reading of 行った.
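
The "one level up" comparison in Example 2 can be pictured with dotted meaning codes. Only the code 2.1.1 for 学校 is given in the text; the code 2.1.2 for 町 below is a hypothetical value chosen so that the two words share a parent class, and the helper names are likewise assumptions.

```python
def parent(code):
    """Return the class one level up, e.g. '2.1.1' -> '2.1' ('' at the top level)."""
    return code.rpartition(".")[0]


def matches_one_level_up(code_a, code_b):
    """True when two different codes share the same immediate parent class."""
    return (
        code_a != code_b
        and parent(code_a) != ""
        and parent(code_a) == parent(code_b)
    )


MEANING_CODES = {
    "学校": "2.1.1",  # facility, as stated in Example 2
    "町": "2.1.2",    # hypothetical code, not given in the text
}

print(matches_one_level_up(MEANING_CODES["学校"], MEANING_CODES["町"]))
```

Encoding the hierarchy in the code string keeps the comparison to a prefix check, which is one plausible reason for a dotted scheme like the one Figure 2 appears to use.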

Example 3

八代(はちだい/やつしろ)を 出発した。
(Departed from 八代, read either "Hachidai" or "Yatsushiro".)

In this example, 八代 is the homograph. Searching the co-occurrence dictionary yields 八代(やつしろ)に 到着した (arrived at Yatsushiro). 出発した (departed) has the same semantic classification as 到着した (arrived), so the strength of co-occurrence is 2, and やつしろ (Yatsushiro) is selected as the reading of 八代.
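
The final step shared by all three examples is to take the reading with the greatest co-occurrence strength. In the sketch below, the strengths 5, 3, and 2 are the values stated in the examples; the 0 assigned to the competing readings and the tie handling (returning `None`) are assumptions, since the text does not say what happens when no reading wins.

```python
def select_reading(strengths):
    """Pick the reading with the highest strength; None on a tie or no match."""
    if not strengths:
        return None
    ranked = sorted(strengths.items(), key=lambda item: item[1], reverse=True)
    top_reading, top_score = ranked[0]
    if top_score == 0:
        return None  # no co-occurrence evidence for any reading (assumed behavior)
    if len(ranked) > 1 and ranked[1][1] == top_score:
        return None  # tie between readings (assumed behavior)
    return top_reading


example_1 = select_reading({"いった": 0, "おこなった": 5})
example_2 = select_reading({"いった": 3, "おこなった": 0})
example_3 = select_reading({"はちだい": 0, "やつしろ": 2})
print(example_1, example_2, example_3)
```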

Effect

As is clear from the above description, according to the present invention, using a co-occurrence dictionary makes it easy to distinguish the readings of homographs without preparing case patterns for all predicates. Furthermore, because matches at the semantic classification level are also considered when the strength of co-occurrence is determined, co-occurrence relations of a fairly wide variety of expressions can be handled even though only typical examples are registered in the co-occurrence dictionary.

[Brief description of the drawings]

Figure 1 illustrates the homograph reading method according to the present invention and shows the structure of the co-occurrence dictionary. Figure 2 shows the hierarchically defined semantic classification.

Patent applicant: Ricoh Co., Ltd.

Claims (1)

[Scope of Claims]

1. A homograph reading method for a text-to-speech synthesis device comprising a morphological analysis unit that morphologically analyzes Japanese text written in mixed kanji and kana, a dependency analysis unit that examines dependency relations between clauses, a prosodic symbol generation unit that generates a prosodic symbol string using the results of the morphological analysis and the dependency analysis, and a speech synthesis unit that converts the prosodic symbol string into synthesized speech, the method being characterized in that a co-occurrence dictionary is provided that stores co-occurrence relations between clauses serving as clues for distinguishing the readings of homographs, and, when a homograph is present, the strength of co-occurrence between the clause containing the homograph and a clause in a dependency relation with it is looked up using the co-occurrence dictionary, and the word with the stronger co-occurrence relation is selected.

2. The homograph reading method according to claim 1, characterized in that, when the strength of co-occurrence is determined, a co-occurrence relation is judged to exist to some degree not only when the headword matches the co-occurrence dictionary but also when the word is semantically similar to a word registered in the co-occurrence dictionary.
JP1022021A 1989-01-31 1989-01-31 Homomorphic reading system Expired - Fee Related JP2655711B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP1022021A JP2655711B2 (en) 1989-01-31 1989-01-31 Homomorphic reading system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP1022021A JP2655711B2 (en) 1989-01-31 1989-01-31 Homomorphic reading system

Publications (2)

Publication Number Publication Date
JPH02201643A true JPH02201643A (en) 1990-08-09
JP2655711B2 JP2655711B2 (en) 1997-09-24

Family

ID=12071333

Family Applications (1)

Application Number Title Priority Date Filing Date
JP1022021A Expired - Fee Related JP2655711B2 (en) 1989-01-31 1989-01-31 Homomorphic reading system

Country Status (1)

Country Link
JP (1) JP2655711B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06289890A (en) * 1993-03-31 1994-10-18 Sony Corp Natural language processor

Also Published As

Publication number Publication date
JP2655711B2 (en) 1997-09-24

Similar Documents

Publication Publication Date Title
US5475587A (en) Method and apparatus for efficient morphological text analysis using a high-level language for compact specification of inflectional paradigms
US5794177A (en) Method and apparatus for morphological analysis and generation of natural language text
Brill A corpus-based approach to language learning
JP4306894B2 (en) Natural language processing apparatus and method, and natural language recognition apparatus
Gaizauskas et al. University of Sheffield: Description of the LaSIE system as used for MUC-6
US6243670B1 (en) Method, apparatus, and computer readable medium for performing semantic analysis and generating a semantic structure having linked frames
Pradhan et al. Semantic role labeling using different syntactic views
US6101492A (en) Methods and apparatus for information indexing and retrieval as well as query expansion using morpho-syntactic analysis
US5559693A (en) Method and apparatus for efficient morphological text analysis using a high-level language for compact specification of inflectional paradigms
US6269189B1 (en) Finding selected character strings in text and providing information relating to the selected character strings
US7788084B2 (en) Labeling of work of art titles in text for natural language processing
EP0583083A2 (en) Finite-state transduction of related word forms for text indexing and retrieval
US20040054530A1 (en) Generating speech recognition grammars from a large corpus of data
WO1997004405A9 (en) Method and apparatus for automated search and retrieval processing
US20070011160A1 (en) Literacy automation software
EP1331574B1 (en) Named entity interface for multiple client application programs
McDonald An efficient chart-based algorithm for partial-parsing of unrestricted texts
EP1290574B1 (en) System and method for matching a textual input to a lexical knowledge base and for utilizing results of that match
Kamir et al. A comprehensive NLP system for modern standard Arabic and modern Hebrew
RU2004127924A (en) DATA TRANSFER METHOD AND DEVICE FOR IMPLEMENTING THIS METHOD
JPH02201643A (en) System for discriminatingly reading isomorphic word
Litkowski Question Answering Using XML-Tagged Documents.
JP3526063B2 (en) Voice recognition device
Batarfi et al. Building an Arabic semantic lexicon for Hajj
KR20050123007A (en) A system for generating technique for generating korean phonetic alphabet

Legal Events

Date Code Title Description
LAPS Cancellation because of no payment of annual fees