JPH01321557A - Text voice synthesizing device - Google Patents

Text voice synthesizing device

Info

Publication number
JPH01321557A
Authority
JP
Japan
Prior art keywords
kanji
reading
word
dictionary
unregistered
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP63155438A
Other languages
Japanese (ja)
Other versions
JP2801601B2 (en)
Inventor
Shiyouichi Sasabe
佐々部 昭一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ricoh Co Ltd
Priority to JP63155438A
Publication of JPH01321557A
Application granted
Publication of JP2801601B2
Anticipated expiration
Expired - Lifetime

Abstract

PURPOSE: To obtain the correct reading even when an identified unregistered word is a native Japanese word (wago), by providing a kanji dictionary in which information on single kanji is registered and selecting a reading from that dictionary when a detected unregistered word contains kanji. CONSTITUTION: A Japanese language processing part 1 subjects the input Japanese sentences to the necessary analyses (morphological, syntactic, semantic, and so on) and extracts the information required for prosody generation. When a word not registered in a dictionary 4 is identified, an appropriate reading is assigned to each kanji of the word. For this purpose, a kanji dictionary 5 containing readings is prepared, and one of the multiple readings is selected according to fixed rules. The selection is based on the notation of the detected unregistered word, the notation of the preceding and following words, morpheme information, the meaning and etymology of the kanji in the unregistered word, syntactic structure information, context information, and so on. Thus an appropriate reading can be assigned to unregistered kanji character strings.

Description

[Detailed Description of the Invention]

Technical Field
The present invention relates to a text-to-speech synthesis device and, more specifically, to processing that assigns an appropriate reading to the kanji character string of an unregistered word during unregistered-word processing in such a device.

Prior Art
When Japanese text is analyzed, the sentences are generally first divided into words by matching against words registered in a dictionary in advance. However, it is impossible to register every word that can appear in arbitrary text, so unregistered words always exist and must be identified.
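The dictionary matching described above can be illustrated with a minimal sketch. It assumes greedy longest-match segmentation, which the patent does not specify; the function name `segment`, the `max_len` parameter, and the toy dictionary are hypothetical.

```python
def segment(text, dictionary, max_len=8):
    """Greedy longest-match segmentation (an assumed strategy).

    Returns a list of (surface, is_registered) pairs. Characters that
    match no dictionary entry are grouped into unregistered strings.
    """
    result = []
    pending = ""  # characters seen so far that match no entry
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                match = candidate
                break
        if match is None:
            pending += text[i]
            i += 1
            continue
        if pending:
            result.append((pending, False))
            pending = ""
        result.append((match, True))
        i += len(match)
    if pending:
        result.append((pending, False))
    return result


if __name__ == "__main__":
    toy_dictionary = {"私", "は", "を", "読む"}  # illustrative entries only
    for surface, registered in segment("私は珈琲を読む", toy_dictionary):
        print(surface, "registered" if registered else "UNREGISTERED")
```

With this toy dictionary, the string 珈琲 matches no entry and is reported as a single unregistered string, which is the situation the reading assignment process is meant to handle.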

Conventionally, the reading was selected by simple judgments such as: use the on-yomi if the unregistered word string consists of at least two consecutive kanji, and use the kun-yomi if it is an isolated kanji followed by hiragana. With such a method, however, the correct reading could not be assigned when the identified unregistered word was a native Japanese word (wago).
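The conventional judgment can be sketched as follows to show where it fails. This is an illustration under assumptions: the function names are hypothetical, kanji are approximated by the CJK Unified Ideographs block (U+4E00 to U+9FFF), and cases the description does not cover return `None`.

```python
def is_kanji(ch):
    # Approximation: CJK Unified Ideographs block only.
    return "\u4e00" <= ch <= "\u9fff"


def is_hiragana(ch):
    return "\u3041" <= ch <= "\u3096"


def conventional_reading_type(word, next_char):
    """Conventional heuristic: kanji compounds get on-yomi, an isolated
    kanji followed by hiragana gets kun-yomi."""
    if len(word) >= 2 and all(is_kanji(c) for c in word):
        return "on"
    if len(word) == 1 and is_kanji(word) and next_char and is_hiragana(next_char):
        return "kun"
    return None  # case not covered by the heuristic as described


if __name__ == "__main__":
    # 手紙 (tegami) is a native Japanese word read with kun-yomi,
    # but the heuristic chooses on-yomi because it is a two-kanji string.
    print(conventional_reading_type("手紙", "を"))
```

The example prints the on-yomi choice for a word that is actually read with kun-yomi, which is the failure case the invention addresses.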

Purpose
The present invention was made in view of the above circumstances. In particular, in a Japanese text-to-speech synthesizer that converts arbitrary Japanese sentences into speech and reads them aloud, its purpose is to assign an appropriate reading to a word that is found to be unregistered in the dictionary during word segmentation for Japanese sentence analysis (morphological, syntactic, and semantic analysis).

Configuration
To achieve the above object, the present invention is a text-to-speech synthesizer that converts arbitrary Japanese sentences into speech. In the process of dividing an input Japanese sentence into words by matching against words in a dictionary created in advance, the device has an unregistered word detection unit, which detects unregistered word character strings that were not found in the dictionary, and a kanji dictionary in which information on single kanji is registered. When a detected unregistered word contains kanji, a kanji reading assignment process selectively assigns readings from the kanji dictionary. The invention is described below on the basis of an embodiment.

FIG. 1 is a block diagram for explaining one embodiment of the present invention. In the figure, 1 is a Japanese language processing unit, 2 is a prosody generation unit, 3 is a speech synthesis unit, 4 is a word dictionary, 5 is a kanji dictionary, 6 is a kanji reading assignment unit, 7 is a dictionary search unit, 8 is an unregistered word processing unit, 9 is a morphological analysis unit, 10 is a syntactic analysis unit, and 11 is a semantic analysis unit. In a text-to-speech synthesis device that converts arbitrary Japanese sentences into speech, when a word not registered in the dictionary is identified, the present invention assigns appropriate readings to the kanji in that word's character string. To do so, a kanji dictionary in which readings are registered is prepared, and one reading is selected from among the multiple readings according to fixed rules.

In FIG. 1, the input Japanese sentence undergoes the necessary analyses (morphological, syntactic, semantic, and so on) in the Japanese language processing unit 1, which extracts the information required for prosody generation. The readings, accents, morpheme information, syntactic information, semantic information, and so on are input to the prosody generation unit 2, which uses them to generate sentence-level prosodic information such as accent, stress, pause positions, pause lengths, and intonation, and outputs it to the speech synthesis unit 3. The speech synthesis unit 3 then generates the speech signal.

Kanji reading assignment is applied to the kanji of unregistered words that have been detected and identified. In the embodiment shown in FIG. 1 this is done each time an unregistered word is detected, but once detection is complete it may be done at any stage before the transition to prosody generation.
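The remark about timing (assigning readings either as each unregistered word is detected or at any later stage before prosody generation) can be sketched as a deferred pass over the segmented words. The `Word` class, `assign_deferred`, and the placeholder `assign_reading` callback are hypothetical names, not part of the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Word:
    surface: str
    registered: bool
    reading: Optional[str] = None  # None until a reading is known


def assign_deferred(words: List[Word],
                    assign_reading: Callable[[str], str]) -> List[Word]:
    """Fill in readings for all unregistered words in one pass.

    This corresponds to running reading assignment after detection has
    finished but before the words are handed to prosody generation.
    """
    for word in words:
        if not word.registered and word.reading is None:
            word.reading = assign_reading(word.surface)
    return words


if __name__ == "__main__":
    words = [
        Word("私", True, "わたし"),
        Word("珈琲", False),          # detected as unregistered
        Word("を", True, "を"),
    ]
    # Placeholder assigner; a real one would consult the kanji dictionary.
    assign_deferred(words, lambda surface: "?" * len(surface))
    print([w.reading for w in words])
```

Calling the same callback inside the detection loop instead would give the per-word timing used in the embodiment; the result handed to prosody generation is the same either way.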

FIG. 2 is a diagram for explaining an example of the kanji reading assignment process. In the figure, 21 is a character notation form extraction unit, 22 is a kanji dictionary, 23 is a reading search unit, 24 is a judgment rule unit, 25 is a reading selection judgment unit, 26 is a kun-yomi representative reading selection unit, 27 is a kun-yomi selection unit, 28 is an on-yomi selection unit, 29 is a representative reading selection unit, and 30 is a unit that assigns the extracted reading. The inputs to this process are the character string identified as an unregistered word, the notation of the preceding and following words, morpheme information, and syntactic and context information.

First, the notation form of the unregistered word string is examined to extract the number of characters, runs of consecutive kanji and their lengths, the presence and position of hiragana, and so on. Next, the kanji dictionary is searched for each kanji in the string to extract its readings, meaning, etymology, and so on. This information is passed to the reading selection judgment unit, which applies rules to decide which kind of reading to select; processing then proceeds to the corresponding selection unit, and the extracted reading is assigned.

As shown in FIG. 2, the readings registered for each kanji may be a representative reading (the most frequent of all readings of that single kanji), kun-yomi, and on-yomi. On-yomi may further be registered together with, or separately by, type, such as customary readings (kan'yō-on), Han readings (kan-on), Wu readings (go-on), Tang readings (tō-on), and Song readings (sō-on).

Rules used for the reading selection judgment may include, for example: instruct selection of the kun-yomi representative reading when the detected unregistered string is a single kanji and the next character is hiragana; instruct kun-yomi selection when the detected string is two kanji followed by hiragana; and instruct on-yomi selection otherwise. These judgments may also be made by rules based on other input information. For example, from morpheme information, kun-yomi selection is instructed when the word following the unregistered word is a native Japanese word (wago), and this takes priority over the selection instructions above. From syntactic and context information and from the meaning and etymology of the kanji, word co-occurrence based on dependency relations and meaning can also be used.
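The notation-form extraction and the example selection rules can be sketched together. This is a minimal illustration under assumptions: the function names and returned labels are hypothetical, kanji are approximated by the CJK Unified Ideographs block, and whether the following word is wago is passed in as a flag because the patent obtains it from morpheme information.

```python
def is_kanji(ch):
    # Approximation: CJK Unified Ideographs block only.
    return "\u4e00" <= ch <= "\u9fff"


def is_hiragana(ch):
    return "\u3041" <= ch <= "\u3096"


def notation_features(word, next_char):
    """Extract the notation form of an unregistered string."""
    return {
        "length": len(word),
        "kanji_count": sum(1 for c in word if is_kanji(c)),
        "all_kanji": len(word) > 0 and all(is_kanji(c) for c in word),
        "hiragana_positions": [i for i, c in enumerate(word) if is_hiragana(c)],
        "next_is_hiragana": bool(next_char) and is_hiragana(next_char),
    }


def select_reading_type(word, next_char, next_is_wago=False):
    """Return which selection unit to use: 'kun_representative',
    'kun', or 'on' (labels are hypothetical)."""
    # The morpheme-based rule takes priority over notation-based rules.
    if next_is_wago:
        return "kun"
    f = notation_features(word, next_char)
    if f["all_kanji"] and f["next_is_hiragana"]:
        if f["kanji_count"] == 1:
            return "kun_representative"
        if f["kanji_count"] == 2:
            return "kun"
    return "on"


if __name__ == "__main__":
    print(select_reading_type("鰆", "を"))      # single kanji + hiragana
    print(select_reading_type("手紙", "を"))    # two kanji + hiragana
    print(select_reading_type("形態素", "の"))  # three kanji
```

Rules based on syntax, context, meaning, or etymology are not modeled here, since the text only mentions them as possibilities.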

As is clear from the above description, the present invention is a text-to-speech synthesis device that converts arbitrary Japanese sentences into speech, in which, in the process of dividing an input Japanese sentence into words by matching against words in a dictionary created in advance, an unregistered word detection unit detects unregistered word character strings not found in the dictionary, a kanji dictionary registers information on single kanji, and, when a detected unregistered word contains kanji, a kanji reading assignment process selectively assigns readings from the kanji dictionary. More specifically:

The kanji reading is selected on the basis of at least one of: the notation of the detected unregistered word, the character notation of and morpheme information on the preceding and following words, the meaning and etymology of the kanji in the detected unregistered word, syntactic information, and context information. Alternatively:

Kun-yomi is selected when the detected unregistered string is a single kanji and the next character is hiragana, or when the detected string is two kanji followed by at least one hiragana; on-yomi is selected in all other cases. Alternatively:

When the word following the unregistered word is a native Japanese (wago) noun, kun-yomi is assigned to the detected unregistered string, with priority over the reading selection based on character notation.

In addition, the kanji dictionary registers the most frequent (representative) reading together with the on-yomi and kun-yomi. Only when the unregistered string is a single kanji and the next character is hiragana, the representative reading is selected if it is a kun-yomi, and a kun-yomi is selected otherwise.

Furthermore, when selecting a reading, the representative reading is selected if no on-yomi exists, and the on-yomi is selected if no kun-yomi exists.
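The per-kanji selection with its fallbacks can be sketched as below. Several points are assumptions rather than statements from the patent: the `KanjiEntry` layout, the `kind` labels, taking the first listed reading when several kun-yomi or on-yomi exist, and the sample entries (in particular, which reading is marked as most frequent is illustrative only).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KanjiEntry:
    representative: str                           # most frequent reading
    kun: List[str] = field(default_factory=list)  # kun-yomi, may be empty
    on: List[str] = field(default_factory=list)   # on-yomi, may be empty


def pick_reading(entry: KanjiEntry, kind: str) -> str:
    """Pick a reading for one kanji.

    kind is 'kun_representative' (single kanji followed by hiragana),
    'kun', or 'on'.
    """
    # Single kanji + hiragana: use the representative reading if it is a kun-yomi.
    if kind == "kun_representative" and entry.representative in entry.kun:
        return entry.representative
    if kind in ("kun", "kun_representative"):
        if entry.kun:
            return entry.kun[0]  # assumption: first listed kun-yomi
        kind = "on"              # no kun-yomi registered: fall back to on-yomi
    if entry.on:
        return entry.on[0]       # assumption: first listed on-yomi
    return entry.representative  # no on-yomi registered


if __name__ == "__main__":
    # Illustrative entries; frequency assignments are assumed.
    kanji_dictionary = {
        "山": KanjiEntry("やま", kun=["やま"], on=["サン"]),
        "本": KanjiEntry("ホン", kun=["もと"], on=["ホン"]),
        "峠": KanjiEntry("とうげ", kun=["とうげ"]),  # no on-yomi
        "菊": KanjiEntry("キク", on=["キク"]),        # no kun-yomi
    }
    print(pick_reading(kanji_dictionary["山"], "kun_representative"))
    print(pick_reading(kanji_dictionary["峠"], "on"))
    print(pick_reading(kanji_dictionary["菊"], "kun"))
```

The sketch returns one reading per kanji; how the readings of several kanji are joined into a word reading is not covered in the text and is left out here.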

Effects
As is clear from the above description, according to the present invention, an appropriate reading can be assigned to an unregistered kanji character string that has been extracted and identified by the unregistered-word processing of text-to-speech synthesis.

[Brief Description of the Drawings]

FIG. 1 is a block diagram for explaining an embodiment of a text-to-speech synthesis device according to the present invention, and FIG. 2 is a diagram for explaining an example of the kanji reading assignment process.

1... Japanese language processing unit, 2... prosody generation unit, 3... speech synthesis unit, 4... word dictionary, 5... kanji dictionary, 6... kanji reading assignment unit, 7... dictionary search unit, 8... unregistered word processing unit, 9... morphological analysis unit, 10... syntactic analysis unit, 11... semantic analysis unit, 21... character notation form extraction unit, 22... kanji dictionary, 23... reading search unit, 24... judgment rule unit, 25... reading selection judgment unit, 26... kun-yomi representative reading selection unit, 27... kun-yomi selection unit, 28... on-yomi selection unit, 29... representative reading selection unit, 30... extracted reading assignment unit.

Claims (1)

[Claims]
1. A text-to-speech synthesis device that converts arbitrary Japanese sentences into speech, characterized in that, in the process of dividing an input Japanese sentence into words by matching against words in a dictionary created in advance, the device has an unregistered word detection unit that detects unregistered word character strings not found in the dictionary and a kanji dictionary in which information on single kanji is registered, and, when a detected unregistered word contains kanji, selectively assigns readings from the kanji dictionary by a kanji reading assignment process.
JP63155438A 1988-06-23 1988-06-23 Text-to-speech synthesizer Expired - Lifetime JP2801601B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP63155438A JP2801601B2 (en) 1988-06-23 1988-06-23 Text-to-speech synthesizer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP63155438A JP2801601B2 (en) 1988-06-23 1988-06-23 Text-to-speech synthesizer

Publications (2)

Publication Number Publication Date
JPH01321557A (en) 1989-12-27
JP2801601B2 (en) 1998-09-21

Family

ID=15606034

Family Applications (1)

Application Number Title Priority Date Filing Date
JP63155438A Expired - Lifetime JP2801601B2 (en) 1988-06-23 1988-06-23 Text-to-speech synthesizer

Country Status (1)

Country Link
JP (1) JP2801601B2 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS62119591A (en) * 1985-11-20 1987-05-30 富士通株式会社 Sentence reciting apparatus
JPS63189933A (en) * 1987-02-02 1988-08-05 Fujitsu Ltd Device for reading sentence aloud

Also Published As

Publication number Publication date
JP2801601B2 (en) 1998-09-21

Similar Documents

Publication Publication Date Title
KR900009170B1 (en) Synthesis-by-rule type synthesis system
JPH06282290A (en) Natural language processing device and method thereof
JP3371761B2 (en) Name reading speech synthesizer
JP2801601B2 (en) Text-to-speech synthesizer
JPH07262191A (en) Word dividing method and voice synthesizer
JP2580568B2 (en) Pronunciation dictionary update device
JPH09244677A (en) Speech synthesis system
JP2001343987A (en) Method and device for voice synthesis
JP2996978B2 (en) Text-to-speech synthesizer
JPH0760378B2 (en) Text-to-speech device
JP3573889B2 (en) Audio output device
JPH08185197A (en) Japanese analyzing device and japanese text speech synthesizing device
JPH096378A (en) Text voice conversion device
KR0180650B1 (en) Sentence analysis method for korean language in voice synthesis device
KR0136423B1 (en) Phonetic change processing method by validity check of sound control symbol
Granström et al. A danish text-to-speech system using a text normalizer based on morph analysis.
JPH06176023A (en) Speech synthesis system
JPH06289890A (en) Natural language processor
JP2888847B2 (en) Text-to-speech apparatus and method, and language processing apparatus and method
JPH05298364A (en) Phonetic symbol forming system
JPH02234198A (en) Text voice synthesizing system
JPS62208125A (en) Sentence reading device
JPH10240726A (en) Natural language processor
JPH09281993A (en) Phonetic symbol forming device
JPH0375898B2 (en)

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20070710

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20080710

Year of fee payment: 10

EXPY Cancellation because of completion of term