JPH01321557A - Text voice synthesizing device - Google Patents
Text voice synthesizing device
Info
- Publication number
- JPH01321557A
- Authority
- JP
- Japan
- Prior art keywords
- kanji
- reading
- word
- dictionary
- unregistered
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
Description
[Detailed Description of the Invention]
Technical Field
The present invention relates to a text-to-speech synthesis device and, more specifically, to processing that assigns an appropriate reading to the kanji character string of an unregistered word during unregistered-word processing in such a device.
Prior Art
When Japanese text is analyzed, the text is generally first divided into words, which are looked up by matching against words registered in a dictionary in advance. However, it is impossible to register in a dictionary every word that can appear in arbitrary text, so unregistered words always exist and must be identified.
Conventionally, a reading was selected by judgments such as the following: if the unregistered-word string contains at least two consecutive kanji, use the on'yomi (Sino-Japanese reading); if it is an isolated kanji followed by a hiragana character, use the kun'yomi (native Japanese reading). With such a method, however, a correct reading could not be assigned when the identified unregistered word was a native Japanese word (wago).
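The conventional judgment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `conventional_reading_type`, the restriction to the basic CJK Unified Ideographs block, and the fallback to on'yomi when neither stated condition applies are not taken from the patent text.

```python
import re

# Assumption: "kanji" is approximated by the basic CJK Unified Ideographs block.
KANJI_RUN = re.compile(r"[\u4e00-\u9fff]+")
HIRAGANA = re.compile(r"[\u3041-\u3096]")


def conventional_reading_type(word: str, next_char: str = "") -> str:
    """Return "on" or "kun" from the surface form alone (prior-art style)."""
    run = KANJI_RUN.search(word)
    if run is None:
        raise ValueError("word contains no kanji")
    # Two or more consecutive kanji: treat as a Sino-Japanese compound.
    if len(run.group()) >= 2:
        return "on"
    # Isolated kanji followed by hiragana (okurigana or the next character).
    following = (word + next_char)[run.end():run.end() + 1]
    if HIRAGANA.fullmatch(following):
        return "kun"
    # Assumption: the text does not say what happens otherwise; default to on'yomi.
    return "on"


print(conventional_reading_type("歩", "く"))  # kun
print(conventional_reading_type("山桜"))      # on
```

The second call shows the weakness the text points out: 山桜 (yamazakura) is a native Japanese compound read with kun'yomi, yet the surface-form rule selects on'yomi.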
Purpose
The present invention was made in view of the circumstances described above. In particular, in a Japanese text-to-speech synthesis device that converts arbitrary Japanese text into speech and reads it aloud, its purpose is to assign an appropriate reading to a word that is detected as not registered in the dictionary during the word segmentation performed for Japanese text analysis (morphological, syntactic, and semantic analysis).
Configuration
To achieve the above purpose, the present invention is a text-to-speech synthesis device that converts arbitrary Japanese text into speech. It is characterized in that it has an unregistered-word detection unit, which detects unregistered-word character strings not present in a previously prepared dictionary during the process of dividing input Japanese text into words by matching against the words in that dictionary, and a kanji dictionary in which information on individual kanji is registered, and in that, when a detected unregistered word contains kanji, readings are selectively assigned from the kanji dictionary by a kanji reading assignment process. The invention is described below on the basis of an embodiment.
FIG. 1 is a block diagram for explaining one embodiment of the present invention. In the figure, 1 is a Japanese language processing unit, 2 is a prosody generation unit, 3 is a speech synthesis unit, 4 is a word dictionary, 5 is a kanji dictionary, 6 is a kanji reading assignment unit, 7 is a dictionary search unit, 8 is an unregistered-word processing unit, 9 is a morphological analysis unit, 10 is a syntactic analysis unit, and 11 is a semantic analysis unit. In a text-to-speech synthesis device that converts arbitrary Japanese text into speech, when a word not registered in the dictionary is identified, the present invention assigns appropriate readings to the kanji in that word's character string by providing a kanji dictionary in which readings are registered and selecting from among the multiple readings according to fixed rules.
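The dictionary-matching step that produces unregistered-word strings might look like the following minimal sketch. The patent only says that words are found by matching against a dictionary; the greedy longest-match strategy, the `segment` function, the `max_len` parameter, and the toy lexicon are all assumptions made for illustration.

```python
from typing import List, Set, Tuple


def segment(text: str, lexicon: Set[str], max_len: int = 8) -> List[Tuple[str, bool]]:
    """Split text into (string, is_registered) pairs.

    Assumption: greedy longest match against the lexicon. Characters that
    match no entry are accumulated into a single unregistered string.
    """
    tokens: List[Tuple[str, bool]] = []
    unknown = ""
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in lexicon:
                match = text[i:i + n]
                break
        if match is None:
            unknown += text[i]
            i += 1
            continue
        if unknown:
            tokens.append((unknown, False))  # flush the unregistered string
            unknown = ""
        tokens.append((match, True))
        i += len(match)
    if unknown:
        tokens.append((unknown, False))
    return tokens


lexicon = {"私", "は", "を", "見る"}
print(segment("私は山桜を見る", lexicon))
# [('私', True), ('は', True), ('山桜', False), ('を', True), ('見る', True)]
```

Here 山桜 is absent from the toy lexicon, so it surfaces as an unregistered string that would be handed to the kanji reading assignment process.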
In FIG. 1, the Japanese language processing unit 1 applies the necessary analyses (morphological, syntactic, semantic, and so on) to the input Japanese text and extracts the information needed for prosody generation. Readings, accents, morphological information, syntactic information, semantic information, and the like are input to the prosody generation unit 2, which uses this information to generate sentence-level prosodic information such as accent, stress, pause positions, pause lengths, and intonation, and outputs it to the speech synthesis unit 3. The speech synthesis unit 3 receives this information and generates the speech signal. Kanji reading assignment is applied to the kanji of unregistered words that have been detected and identified. In the embodiment shown in FIG. 1, this processing is performed each time an unregistered word is detected, but once detection has completed, it may be performed at any stage before prosody generation begins.
FIG. 2 is a diagram for explaining an example of the kanji reading assignment process. In the figure, 21 is a character notation form extraction unit, 22 is a kanji dictionary, 23 is a reading search unit, 24 is a judgment rule unit, 25 is a reading selection judgment unit, 26 is a representative kun'yomi selection unit, 27 is a kun'yomi selection unit, 28 is an on'yomi selection unit, 29 is a representative reading selection unit, and 30 is a unit that assigns the extracted reading. The inputs to this process include the character string identified as an unregistered word, the notation of the preceding and following words, morphological information, and syntactic and contextual information.
First, the notation form of the unregistered-word string is examined, and the number of characters, any run of consecutive kanji and its length, the presence of hiragana and its position, and so on are extracted. Next, the kanji dictionary is searched for each kanji contained in the string, and its readings, meaning, character origin, and so on are extracted. This information is passed to the reading selection judgment unit, which uses rules to issue a reading selection instruction; processing then proceeds to the corresponding selection process, and the extracted reading is assigned.
As for the readings, the most frequent of all the readings of a single kanji can be taken as the representative reading, and, as shown in FIG. 2, the representative reading, kun'yomi, and on'yomi can be registered. On'yomi may also be registered together with its type (customary reading, kan-on, go-on, tō-on, sō-on, and so on) or separately by type.
The rules used for the reading selection judgment could be, for example, the following: when the detected unregistered string is one kanji and the next character is hiragana, instruct selection of the representative kun'yomi; when the detected string is two kanji followed by hiragana, instruct kun'yomi selection; in all other cases, instruct on'yomi selection. These judgments may also be made from rules based on the other input information. For example, from morphological information, kun'yomi selection is instructed when the word following the unregistered word is a native Japanese word (wago), and this takes priority over the selection instructions above. From syntactic and contextual information and from the meaning and origin of the kanji, co-occurrence of words based on dependency relations or meaning can also be used.
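The notation-form extraction and the example selection rules for FIG. 2 can be sketched as follows. The names `SurfaceForm`, `extract_surface_form`, and `choose_instruction`, the instruction labels, and the Unicode ranges are hypothetical; the sketch also assumes the kanji run starts at the beginning of the unregistered string, which the text does not state.

```python
import re
from dataclasses import dataclass

# Assumption: kanji and hiragana are approximated by these Unicode ranges.
LEADING_KANJI = re.compile(r"[\u4e00-\u9fff]+")
HIRAGANA = re.compile(r"[\u3041-\u3096]")


@dataclass
class SurfaceForm:
    length: int             # number of characters in the unregistered string
    kanji_run: int          # length of the leading run of consecutive kanji
    hiragana_follows: bool  # whether hiragana directly follows that run


def extract_surface_form(word: str, next_char: str = "") -> SurfaceForm:
    """Extract the notation features named in the text from the surface string."""
    m = LEADING_KANJI.match(word)
    run = len(m.group()) if m else 0
    following = (word + next_char)[run:run + 1]
    return SurfaceForm(len(word), run, bool(HIRAGANA.fullmatch(following)))


def choose_instruction(form: SurfaceForm, next_word_is_wago: bool = False) -> str:
    """Return a selection instruction: "kun_representative", "kun", or "on"."""
    # Morphological rule: takes priority over the notation-based rules.
    if next_word_is_wago:
        return "kun"
    if form.kanji_run == 1 and form.hiragana_follows:
        return "kun_representative"
    if form.kanji_run == 2 and form.hiragana_follows:
        return "kun"
    return "on"


print(choose_instruction(extract_surface_form("歩", "く")))    # kun_representative
print(choose_instruction(extract_surface_form("山桜")))        # on
print(choose_instruction(extract_surface_form("山桜"), True))  # kun
```

Keeping the feature extraction separate from the rule function mirrors the split between the notation form extraction unit 21 and the reading selection judgment unit 25, and makes it easy to add further rules based on syntactic or contextual information.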
As is clear from the above description, the present invention is a text-to-speech synthesis device that converts arbitrary Japanese text into speech. It has an unregistered-word detection unit, which detects unregistered-word character strings not present in a previously prepared dictionary during the process of dividing input Japanese text into words by matching against the words in that dictionary, and a kanji dictionary in which information on individual kanji is registered. When a detected unregistered word contains kanji, readings are selectively assigned from the kanji dictionary by a kanji reading assignment process. More specifically:
The kanji reading is selected on the basis of at least one kind of information, such as the character notation or morphological information of the detected unregistered word and of the preceding and following words, the meaning or character origin of the kanji in the detected unregistered word, syntactic information, or contextual information.
Alternatively, when the detected unregistered string is one kanji and the next character is hiragana, or when the detected string is two kanji followed by at least one hiragana character, kun'yomi is selected; in all other cases on'yomi is selected.
Alternatively, when the word following the unregistered word is a native Japanese (wago) noun, kun'yomi is assigned to the detected unregistered string, and this takes priority over the selection based on character notation.
In addition, the kanji dictionary registers the most frequent reading as the representative reading, together with the on'yomi and kun'yomi. Only when the unregistered string is a single kanji and the next character is hiragana, the representative reading is selected if it is a kun'yomi; otherwise a kun'yomi is selected.
Furthermore, when a reading is selected, the representative reading is selected if no on'yomi exists, and an on'yomi is selected if no kun'yomi exists.
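The representative-reading and fallback provisions can be sketched as a single lookup function. The `KanjiEntry` layout, the `pick_reading` function, the instruction labels, and the choice of the first listed reading when several exist are assumptions; the sample entries are illustrative and not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KanjiEntry:
    representative: str                           # most frequent reading
    kun: List[str] = field(default_factory=list)  # kun'yomi readings
    on: List[str] = field(default_factory=list)   # on'yomi readings


def pick_reading(entry: KanjiEntry, instruction: str) -> str:
    """Apply a selection instruction ("kun_representative", "kun", or "on")."""
    if instruction == "kun_representative":
        # Use the representative reading only if it is itself a kun'yomi.
        if entry.representative in entry.kun:
            return entry.representative
        instruction = "kun"
    if instruction == "kun":
        if entry.kun:
            return entry.kun[0]  # assumption: take the first listed kun'yomi
        instruction = "on"       # no kun'yomi exists: fall back to on'yomi
    if entry.on:
        return entry.on[0]       # assumption: take the first listed on'yomi
    return entry.representative  # no on'yomi exists: use the representative


# Illustrative sample entries (hypothetical dictionary contents).
yama = KanjiEntry("やま", kun=["やま"], on=["サン"])
gaku = KanjiEntry("ガク", kun=["まな"], on=["ガク"])
eki = KanjiEntry("エキ", kun=[], on=["エキ"])
hatake = KanjiEntry("はたけ", kun=["はたけ", "はた"], on=[])

print(pick_reading(yama, "kun_representative"))  # やま
print(pick_reading(gaku, "kun_representative"))  # まな
print(pick_reading(eki, "kun"))                  # エキ
print(pick_reading(hatake, "on"))                # はたけ
```

Chaining the fallbacks in one function guarantees that some reading is returned for any entry, which matches the intent that every kanji in an unregistered word receives a reading.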
Effects
As is clear from the above description, according to the present invention, an appropriate reading can be assigned to an unregistered kanji character string that has been extracted and identified by the unregistered-word processing of text-to-speech synthesis.
FIG. 1 is a block diagram for explaining an embodiment of the text-to-speech synthesis device according to the present invention, and FIG. 2 is a diagram for explaining an example of the kanji reading assignment process.
1: Japanese language processing unit, 2: prosody generation unit, 3: speech synthesis unit, 4: word dictionary, 5: kanji dictionary, 6: kanji reading assignment unit, 7: dictionary search unit, 8: unregistered-word processing unit, 9: morphological analysis unit, 10: syntactic analysis unit, 11: semantic analysis unit, 21: character notation form extraction unit, 22: kanji dictionary, 23: reading search unit, 24: judgment rule unit, 25: reading selection judgment unit, 26: representative kun'yomi selection unit, 27: kun'yomi selection unit, 28: on'yomi selection unit, 29: representative reading selection unit, 30: extracted reading assignment unit.
Claims (1)
1. A text-to-speech synthesis device that converts arbitrary Japanese text into speech, characterized by having an unregistered-word detection unit that, in the process of dividing input Japanese text into words by matching against words in a previously prepared dictionary, detects unregistered-word character strings not present in the dictionary, and a kanji dictionary in which information on individual kanji is registered, wherein, when a detected unregistered word contains kanji, readings are selectively assigned from the kanji dictionary by a kanji reading assignment process.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP63155438A JP2801601B2 (en) | 1988-06-23 | 1988-06-23 | Text-to-speech synthesizer |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP63155438A JP2801601B2 (en) | 1988-06-23 | 1988-06-23 | Text-to-speech synthesizer |
Publications (2)
Publication Number | Publication Date |
---|---|
JPH01321557A true JPH01321557A (en) | 1989-12-27 |
JP2801601B2 JP2801601B2 (en) | 1998-09-21 |
Family
ID=15606034
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP63155438A Expired - Lifetime JP2801601B2 (en) | 1988-06-23 | 1988-06-23 | Text-to-speech synthesizer |
Country Status (1)
Country | Link |
---|---|
JP (1) | JP2801601B2 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS62119591A (en) * | 1985-11-20 | 1987-05-30 | Fujitsu Ltd | Sentence reciting apparatus |
JPS63189933A (en) * | 1987-02-02 | 1988-08-05 | Fujitsu Ltd | Device for reading sentence aloud |
Also Published As
Publication number | Publication date |
---|---|
JP2801601B2 (en) | 1998-09-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR900009170B1 (en) | Synthesis-by-rule type synthesis system | |
JPH06282290A (en) | Natural language processing device and method thereof | |
JP3371761B2 (en) | Name reading speech synthesizer | |
JP2801601B2 (en) | Text-to-speech synthesizer | |
JPH07262191A (en) | Word dividing method and voice synthesizer | |
JP2580568B2 (en) | Pronunciation dictionary update device | |
JPH09244677A (en) | Speech synthesis system | |
JP2001343987A (en) | Method and device for voice synthesis | |
JP2996978B2 (en) | Text-to-speech synthesizer | |
JPH0760378B2 (en) | Text-to-speech device | |
JP3573889B2 (en) | Audio output device | |
JPH08185197A (en) | Japanese analyzing device and japanese text speech synthesizing device | |
JPH096378A (en) | Text voice conversion device | |
KR0180650B1 (en) | Sentence analysis method for korean language in voice synthesis device | |
KR0136423B1 (en) | Phonetic change processing method by validity check of sound control symbol | |
Granström et al. | A danish text-to-speech system using a text normalizer based on morph analysis. | |
JPH06176023A (en) | Speech synthesis system | |
JPH06289890A (en) | Natural language processor | |
JP2888847B2 (en) | Text-to-speech apparatus and method, and language processing apparatus and method | |
JPH05298364A (en) | Phonetic symbol forming system | |
JPH02234198A (en) | Text voice synthesizing system | |
JPS62208125A (en) | Sentence reading device | |
JPH10240726A (en) | Natural language processor | |
JPH09281993A (en) | Phonetic symbol forming device | |
JPH0375898B2 (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20070710; Year of fee payment: 9 |
| FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20080710; Year of fee payment: 10 |
| EXPY | Cancellation because of completion of term | |