JPS6161400B2 - Voice synthesizer - Google Patents

Info

Publication number
JPS6161400B2
JPS6161400B2 JP54120374A JP12037479A
Authority
JP
Japan
Prior art keywords
information
monosyllables
output
vowels
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
JP54120374A
Other languages
Japanese (ja)
Other versions
JPS5643700A (en)
Inventor
Kyoshi Aizawa
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP12037479A priority Critical patent/JPS5643700A/en
Publication of JPS5643700A publication Critical patent/JPS5643700A/en
Publication of JPS6161400B2 publication Critical patent/JPS6161400B2/ja
Granted legal-status Critical Current

Landscapes

  • Document Processing Apparatus (AREA)

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a speech synthesis device that generates an arbitrary vocabulary from monosyllables.

Conventional word-registration and sentence-registration voice response devices store a number of predetermined sentence patterns, for example "We are closed today. Please call again tomorrow.", and output the pattern selected by a sentence-pattern designation as speech. Likewise, when numbers such as telephone numbers or monetary amounts are inserted into a sentence, the numeric information is registered in advance as words, which are then concatenated for voice output. However, personal names in a bank transfer-notification service, and personal names, place names, and product names in an order-entry service for trading companies, shops, and the like, are indispensable variable information for a voice output service, and preparing such information in advance in word units requires an enormous memory capacity.

As a way to output such variable information (personal names, place names, product names, and so on) as speech, there are arbitrary-vocabulary voice response devices based on monosyllables. Such a device takes the roughly 100 Japanese syllables as its basis, adds unit sounds such as geminate and long sounds where necessary, and stores about 100 to 500 unit sounds; on output, the units are read out one after another at a fixed period and each unit sound is output as it is. The output speech is therefore equivalent to the unit sounds being uttered one by one at equal intervals, and the breaks between unit sounds degrade its clarity and naturalness.

On the other hand, an arbitrary-vocabulary voice response device using VCV (vowel-consonant-vowel) phoneme chains requires 700 to 900 phoneme-chain entries, about twice as many units as the monosyllable method. Besides the monosyllable information expressed in kana characters, it needs accent-position information and intonation information, and a dedicated VCV synthesizer must be added to the word-registration or sentence-registration voice response device. The cost is therefore high, and creating variable information such as personal names and place names remains difficult.

To solve these drawbacks, this invention inserts, in the "transition" (watari) portion between unit sounds, the vowel-vowel or vowel-consonant phoneme-chain information corresponding to that pair of unit sounds, thereby improving the clarity and naturalness of the output speech economically and efficiently. The invention is described in detail below with reference to the drawings.
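The transition-insertion idea can be sketched in a few lines. The kana syllables, romanizations, and tiny lookup tables below are illustrative assumptions for this sketch, not data from the patent:

```python
# Instead of concatenating unit sounds (monosyllables) directly, insert a
# "transition" (watari) unit -- a vowel-vowel or vowel-consonant phoneme
# chain -- between each adjacent pair. Tables are illustrative only.

# Final vowel of each monosyllable (romanized kana, hypothetical entries).
FINAL_VOWEL = {"mi": "i", "na": "a", "to": "o"}
# Initial consonant (or vowel) of each monosyllable.
INITIAL_SOUND = {"mi": "m", "na": "n", "to": "t"}

def transition_unit(left, right):
    """Name the chain unit bridging two adjacent monosyllables,
    e.g. 'mi' + 'na' -> the vowel-consonant chain 'in'."""
    return FINAL_VOWEL[left] + INITIAL_SOUND[right]

def interleave(syllables):
    """Output order: syllable, transition, syllable, transition, ..."""
    out = [syllables[0]]
    for prev, cur in zip(syllables, syllables[1:]):
        out.append(transition_unit(prev, cur))
        out.append(cur)
    return out

print(interleave(["mi", "na", "to"]))  # ['mi', 'in', 'na', 'at', 'to']
```

Because each transition unit spans only the boundary between two unit sounds, the stored inventory grows far more slowly than a full VCV chain inventory would.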

Fig. 1 shows an embodiment of the speech synthesis device of this invention. Within the speech storage section 11, the fixed information 12 of the original sentence shown in Fig. 2A (numeric information may be included as variable information) is stored in the word storage section 13; the monosyllable information 14, indicated in kana characters, is stored in the monosyllable storage section 15; and the vowel-vowel or vowel-consonant phoneme-chain information 16 that arises as the "transition" between units of the monosyllable information 14 is stored in the phoneme-chain storage section 17. The output-designation information receiving section 20 receives from, for example, a computer center a voice output command carrying a sentence-pattern designation, the numeric variable information (month-day information 18 and amount information 19), and the monosyllable information 14 of place names and personal names expressed in kana characters.

In response, the output-designation information receiving section 20 issues a voice-output start command to the voice output control section 21. In parallel, the phoneme-chain information generating section 22 generates the phoneme-chain information 16 from the monosyllable information 14 held in the receiving section 20; for example, as shown in Fig. 2B, it generates the vowel-consonant information "in" that connects "mi" (ミ) and "na" (ナ). This generation can be realized, for example, by inputting the monosyllable information 14 as codes and reading a read-only memory addressed by each adjacent pair of codes to obtain the phoneme-chain information. On receiving the start command, the voice output control section 21 reads the word group 12 from the word storage section 13 in a predetermined order and supplies it to the voice output section 23 as output speech. When monosyllable information is to be inserted during the output of the sentence-pattern and numeric words, the voice output control section 21 reads, for each character of the monosyllable information 14 held in the receiving section 20, the speech information of the corresponding monosyllable stored in the monosyllable storage section 15 and supplies it to the voice output section 23 as output speech; it then receives from the phoneme-chain information generating section 22 the phoneme-chain information 16 that forms the "transition" to the next monosyllable, reads that information from the phoneme-chain storage section 17 in which it is stored, and supplies it to the voice output section 23 as well. By this procedure, monosyllable information and phoneme-chain information are combined so that place names, personal names, and the like can be edited arbitrarily and inserted into the word information, producing speech information in sentence units. In a sentence-registration voice response device, speech information is stored in the word storage section 13 in sentence-pattern units (and in word units for numeric information), and the word-registration reading method can clearly be applied to its readout. In short, fixed information and kana-character information are edited as before, but in this invention the kana-character information is edited as monosyllable information, and the chain information is interposed between the monosyllable units.
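A minimal sketch of the code-pair ROM lookup and the alternating readout just described; the syllable codes, table entries, and function names are invented for illustration:

```python
# Monosyllable information arrives as codes; each adjacent pair of codes
# addresses a read-only memory (modeled here as a dict) that yields the
# phoneme-chain information. All codes/entries are hypothetical.

SYLLABLE_CODE = {"mi": 0x31, "na": 0x32, "to": 0x33}  # kana -> input code

# "ROM" addressed by (left code, right code) -> chain-unit id.
CHAIN_ROM = {
    (0x31, 0x32): "in",   # mi|na -> vowel-consonant chain "in"
    (0x32, 0x33): "at",   # na|to -> vowel-consonant chain "at"
}

def generate_chain_info(codes):
    """Phoneme-chain generating step: one chain unit per adjacent pair."""
    return [CHAIN_ROM[pair] for pair in zip(codes, codes[1:])]

def output_sequence(syllables):
    """Output-control step: alternate monosyllable speech units with the
    chain units that bridge them."""
    codes = [SYLLABLE_CODE[s] for s in syllables]
    chains = generate_chain_info(codes)
    seq = []
    for syl, chain in zip(syllables, chains):
        seq += [("syllable", syl), ("chain", chain)]
    seq.append(("syllable", syllables[-1]))
    return seq
```

For example, `output_sequence(["mi", "na"])` yields the readout order syllable "mi", chain "in", syllable "na", matching the alternation the control section performs between stores 15 and 17.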

As explained above, according to this invention an arbitrary word is edited by inserting a "transition" between kana characters on the basis of the kana-character information, that is, by inserting vowel-vowel or vowel-consonant phoneme-chain information, so the monosyllables no longer sound disconnected and the clarity and naturalness of the word are easily improved. Moreover, the additional memory required for the phoneme-chain storage section 17 is comparatively small, and little extra information needs to be supplied to the receiving section 20.

Although the invention has been described above for the case of inserting kana characters into an original sentence containing variable information, it can also be applied, when the output speech is relatively short, to a sentence edited from monosyllables alone. The monosyllables edited may also include digits, not only kana characters, and clarity and naturalness are likewise obtained by inserting the phoneme-chain information between those adjacent monosyllables.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a block diagram showing an example of the speech synthesis device according to this invention, and Fig. 2 shows an example of a sentence for voice output. 11: speech storage section; 12: word group; 13: word storage section; 14: monosyllable information; 15: monosyllable storage section; 16: phoneme-chain information; 17: phoneme-chain storage section; 18: month-day information; 19: amount information; 20: output-designation information receiving section; 21: voice output control section; 22: phoneme-chain information generating section; 23: voice output section.

Claims (1)

What is claimed is:

1. A speech synthesis device comprising: storage means for storing monosyllables and the vowel-vowel and vowel-consonant phoneme chains occurring between adjacent monosyllables; receiving means for receiving monosyllable information to be output as speech; means for generating, from the received monosyllable information, the vowel-vowel or vowel-consonant phoneme-chain information between each pair of adjacent monosyllables; and voice output means for sequentially reading said monosyllables and phoneme-chain information from said storage means in accordance with the received monosyllable information, and outputting speech.
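The four claimed means can be pictured as one object wiring together two stores, a receiver, a chain generator, and a sequential readout. This is an illustrative sketch with invented names and dummy byte data, not the patent's implementation:

```python
# Claim structure as a class: storage means (two stores), receiving means,
# chain-generating means, and voice output means reading both stores in order.

class Synthesizer:
    def __init__(self, syllable_store, chain_store):
        self.syllable_store = syllable_store  # storage means: monosyllables
        self.chain_store = chain_store        # storage means: phoneme chains

    def receive(self, syllables):             # receiving means
        self.received = list(syllables)
        return self

    def chain_keys(self):                     # generating means: adjacent pairs
        return list(zip(self.received, self.received[1:]))

    def speak(self):                          # output means: sequential readout
        waveform = [self.syllable_store[self.received[0]]]
        for key, syl in zip(self.chain_keys(), self.received[1:]):
            waveform.append(self.chain_store[key])
            waveform.append(self.syllable_store[syl])
        return b"".join(waveform)

# Usage with dummy byte "speech" data:
syl = {"mi": b"MI", "na": b"NA"}
chn = {("mi", "na"): b"-in-"}
print(Synthesizer(syl, chn).receive(["mi", "na"]).speak())  # b'MI-in-NA'
```

The chain store is keyed by the adjacent syllable pair itself here; the description suggests an equivalent realization keyed by pairs of input codes through a read-only memory.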
JP12037479A 1979-09-19 1979-09-19 Voice synthesizer Granted JPS5643700A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP12037479A JPS5643700A (en) 1979-09-19 1979-09-19 Voice synthesizer


Publications (2)

Publication Number Publication Date
JPS5643700A JPS5643700A (en) 1981-04-22
JPS6161400B2 true JPS6161400B2 (en) 1986-12-25

Family

ID=14784617

Family Applications (1)

Application Number Title Priority Date Filing Date
JP12037479A Granted JPS5643700A (en) 1979-09-19 1979-09-19 Voice synthesizer

Country Status (1)

Country Link
JP (1) JPS5643700A (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS57212500A (en) * 1981-06-25 1982-12-27 Nippon Electric Co Waveform editting type voice synthesizer
JPS5945498A (en) * 1982-09-08 1984-03-14 株式会社東芝 Recording/editing type voice synthesizer
JPS59155899A (en) * 1983-02-25 1984-09-05 株式会社東芝 Voice synthesization system
JPS6098495A (en) * 1983-11-04 1985-06-01 セイコーインスツルメンツ株式会社 Voice synthesizer
JPS6163898A (en) * 1984-09-06 1986-04-02 松下電器産業株式会社 Musical sound synthesizer
JPS6177897A (en) * 1984-09-26 1986-04-21 日本電信電話株式会社 Sentence-voice converter
JPH0727384B2 (en) * 1985-10-01 1995-03-29 ヤマハ株式会社 Music signal generator
JPS6286394A (en) * 1985-10-11 1987-04-20 ヤマハ株式会社 Generation of musical sound signal
JPH0833744B2 (en) * 1986-01-09 1996-03-29 株式会社東芝 Speech synthesizer
JPH0833745B2 (en) * 1986-03-29 1996-03-29 株式会社東芝 Speech synthesizer
US5468683A (en) * 1992-09-25 1995-11-21 U.S. Philips Corporation Method of manufacturing an optoelectronic semiconductor device having a single wire between non-parallel surfaces

Also Published As

Publication number Publication date
JPS5643700A (en) 1981-04-22

Similar Documents

Publication Publication Date Title
Bulut et al. Expressive speech synthesis using a concatenative synthesizer.
WO2005093713A1 (en) Speech synthesis device
JPS6161400B2 (en)
Bettayeb et al. Speech synthesis system for the holy quran recitation.
Aida-Zade et al. The main principles of text-to-speech synthesis system
JPH10510065A (en) Method and device for generating and utilizing diphones for multilingual text-to-speech synthesis
Gakuru et al. Development of a Kiswahili text to speech system.
Ngugi et al. Swahili text-to-speech system
Henton Challenges and rewards in using parametric or concatenative speech synthesis
JPS6014360B2 (en) voice response device
Klabbers et al. A generic algorithm for generating spoken monologues
DasMandal et al. Bengali text to speech synthesis system a novel approach for crossing literacy barrier
Klabbers High-quality speech output generation through advanced phrase concatenation
JPH08248993A (en) Controlling method of phoneme time length
JPH04167749A (en) Audio response equipment
Maghbouleh A logistic regression model for detecting prominences
Hirschberg et al. Voice response systems: Technologies and applications
KR0175464B1 (en) How to make voice data for phone number guide
Šef et al. Speaker (GOVOREC): a complete Slovenian text-to speech system
Eady et al. Pitch assignment rules for speech synthesis by word concatenation
JPS61166600A (en) Voice snthesizer
Syed et al. Text-to-Speech Synthesis
El-Imam Speech synthesis by concatenating sub-syllabic sound units
Hlaing et al. Phoneme Concatenation method for Myanmar speech synthesis system
JPH054676B2 (en)