JPH01204100A - Text speech synthesis system - Google Patents

Text speech synthesis system

Info

Publication number
JPH01204100A
Authority
JP
Japan
Prior art keywords
emphasized
information
rhythm
speech
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP63029930A
Other languages
Japanese (ja)
Other versions
JP2740510B2 (en)
Inventor
Tetsuya Sakayori
哲也 酒寄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ricoh Co Ltd filed Critical Ricoh Co Ltd
Priority to JP63029930A priority Critical patent/JP2740510B2/en
Publication of JPH01204100A publication Critical patent/JPH01204100A/en
Application granted granted Critical
Publication of JP2740510B2 publication Critical patent/JP2740510B2/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Links

Abstract

PURPOSE: To make synthesized speech easier to listen to by varying the prosody of the synthesized speech when voicing a character string that carries emphasis information, whenever such emphasis information is included in the input text. CONSTITUTION: Morpheme information is extracted from the input text by a morphological analysis unit 1, and a phoneme/prosodic symbol generation unit 2 derives a phoneme symbol sequence and a prosodic symbol sequence from that information. When the input text contains emphasis information, an emphasis symbol generation unit 5 generates emphasis symbols from it. The emphasis prosody produced by an emphasis prosody generation unit 6 is then added by an adder 8 to the prosody produced by the prosody generation unit 3. Speech is synthesized with the prosody to which the emphasis prosody has been added, which reduces the monotony of the synthesized speech and presents important information to the listener in an easy-to-hear form.
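As a rough illustration of the adder 8 operation summarized above, the following minimal Python sketch (not part of the patent; the function name, data layout, and bias value are assumptions) adds a fixed emphasis bias to the base prosody of phonemes that belong to an emphasized character string.

```python
# Minimal sketch of the adder idea from the abstract: per-phoneme base
# prosody values are combined with a fixed emphasis bias wherever the
# phoneme comes from an emphasized character string. All names and the
# bias value are illustrative assumptions.

def add_emphasis_bias(base_prosody, emphasized_flags, bias=0.2):
    """Return prosody values with a fixed bias added on emphasized phonemes.

    base_prosody     -- list of per-phoneme prosody values (e.g. log F0)
    emphasized_flags -- list of booleans, True where the phoneme belongs
                        to a character string marked as emphasized
    bias             -- constant offset, playing the role of an entry in
                        an emphasis bias table
    """
    return [p + bias if emph else p
            for p, emph in zip(base_prosody, emphasized_flags)]

# Example: the third and fourth phonemes are emphasized.
print(add_emphasis_bias([1.0, 1.1, 1.0, 0.9, 1.0],
                        [False, False, True, True, False]))
```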

Description

[Detailed Description of the Invention]

Technical Field

The present invention relates to a text-to-speech synthesis system.

Prior Art

When a long text is synthesized by text-to-speech synthesis, the synthesized speech is very monotonous compared with human speech, and listening to it for a long time is painful.

Moreover, written text usually contains, in addition to the character information itself, various kinds of emphasis information such as underlines, emphasis dots, quotation brackets, bold type, enlarged characters, shading, and modified typefaces, which present important information to the reader in an easily understood form. Conventional text-to-speech synthesis systems, however, ignore this emphasis information and use only the character information as input. As a result, the output synthesized speech makes no distinction between emphasized and non-emphasized passages, and the synthesized speech is all the more monotonous.

Purpose

The present invention has been made in view of the circumstances described above. Its object is, particularly in text-to-speech synthesis, to make the synthesized speech, which tends to be monotonous, easier to listen to by uttering the important information contained in the text with emphasis.

Constitution

To achieve the above object, the present invention concerns a text-to-speech synthesizer that extracts morpheme information from an input text by morphological analysis, derives from that information a phoneme symbol string by phonological rules and a prosodic symbol string by prosodic-symbol derivation rules, reads out parameter sequences of previously prepared speech segments according to the phoneme symbol string, concatenates them according to concatenation rules, and adds prosody obtained from the prosodic symbol string by prosodic rules. The invention is characterized in that, when the input text contains any emphasis information, the prosody of the synthesized speech is varied when the character string emphasized by that information is uttered. Embodiments of the present invention are described below.
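The description does not specify how the emphasis information is encoded in the input text. The hypothetical sketch below assumes a simple tag-style markup and shows one way the character string emphasized by such information could be identified, which is the role played by the emphasis symbol generation unit 5 in FIG. 1; the markup format, tag names, and function name are assumptions.

```python
import re

# Hypothetical input encoding: emphasized spans are wrapped in <em>...</em>,
# e.g. "これが<em>霜降り</em>と呼ばれている肉です". The function returns the
# plain character string (for morphological analysis) together with a
# per-character flag marking emphasized characters.

def extract_emphasis(marked_text):
    plain_chars = []
    emphasized = []
    in_em = False
    for token in re.split(r'(</?em>)', marked_text):
        if token == '<em>':
            in_em = True
        elif token == '</em>':
            in_em = False
        else:
            for ch in token:
                plain_chars.append(ch)
                emphasized.append(in_em)
    return ''.join(plain_chars), emphasized

text, flags = extract_emphasis('これが<em>霜降り</em>と呼ばれている肉です')
print(text)   # plain character string passed on to morphological analysis
print(flags)  # True for the characters of 霜降り
```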

FIG. 1 is a block diagram for explaining one embodiment of the present invention. In the figure, 1 is a morphological analysis unit, 2 is a phoneme/prosodic symbol generation unit, 3 is a prosody generation unit, 4 is a prosody table, 5 is an emphasis symbol generation unit, 6 is an emphasis prosody generation unit, 7 is an emphasis bias table, 8 is an adder, 9 is a phoneme parameter generation unit, 10 is a speech segment file, and 11 is a speech synthesizer.

The present invention extracts morpheme information from the input text by morphological analysis, derives from that information a phoneme symbol string by phonological rules and a prosodic symbol string by prosodic-symbol derivation rules, reads out parameter sequences of previously prepared speech segments according to the phoneme symbol string, concatenates them according to concatenation rules, and adds prosody obtained from the prosodic symbol string by prosodic rules. In such a text-to-speech synthesizer, when the input text contains any emphasis information, the prosody of the synthesized speech is varied when the character string emphasized by that information is uttered. For example, by uttering a text containing various kinds of emphasis information such as underlines, emphasis dots, quotation brackets, bold type, enlarged characters, shading, and modified typefaces with the prosody of the emphasized passages changed, the monotony of the synthesized speech is reduced and, moreover, important information is presented to the listener in an easily understood form.
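The sketch below is a schematic rendering of the FIG. 1 data flow, with one stub function per numbered unit. The stub bodies are placeholders chosen only to make the example runnable; the actual phonological, prosodic, and concatenation rules are not given in this description, so none of the values here should be read as the patent's rules.

```python
# Schematic rendering of the FIG. 1 data flow. Each function stands in for
# one numbered unit; the bodies are trivial placeholders that only show how
# the units are connected.

def morphological_analysis(text):                 # unit 1
    return text.split()                           # "morpheme information"

def phoneme_and_prosodic_symbols(morphemes):      # unit 2
    phonemes = list(morphemes)                    # phoneme symbol string
    prosodic_symbols = ['accent'] * len(morphemes)
    return phonemes, prosodic_symbols

def generate_prosody(prosodic_symbols):           # unit 3 (+ prosody table 4)
    return [1.0] * len(prosodic_symbols)          # base prosody values

def generate_emphasis_symbols(emphasis_info, n):  # unit 5
    return [i in emphasis_info for i in range(n)]

def generate_emphasis_prosody(emph_symbols):      # unit 6 (+ bias table 7)
    return [0.3 if e else 0.0 for e in emph_symbols]

def add(base, emphasis):                          # adder 8
    return [b + e for b, e in zip(base, emphasis)]

def phoneme_parameters(phonemes):                 # unit 9 (+ segment file 10)
    return [f'params({p})' for p in phonemes]

def synthesize(parameters, prosody):              # unit 11
    return list(zip(parameters, prosody))         # would drive the synthesizer

# Wiring corresponding to FIG. 1, assuming (for illustration only) that the
# emphasis information arrives as a set of emphasized morpheme indices.
def text_to_speech(text, emphasis_info):
    morphemes = morphological_analysis(text)
    phonemes, prosodic_symbols = phoneme_and_prosodic_symbols(morphemes)
    base = generate_prosody(prosodic_symbols)
    emph = generate_emphasis_symbols(emphasis_info, len(morphemes))
    prosody = add(base, generate_emphasis_prosody(emph))
    return synthesize(phoneme_parameters(phonemes), prosody)

print(text_to_speech('this is marbled meat', {2}))
```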

FIGS. 2 to 4 are diagrams for explaining the present invention in detail. Each shows an example of uttering the sentence これが「霜降り」と呼ばれている肉です ("This is the meat called 'marbled'"), with 「霜降り」 ("marbled") emphasized. In the embodiment shown in FIG. 2, the fundamental frequency of the passage emphasized by the emphasis information is raised by a fixed bias relative to that of the non-emphasized passages. In the embodiment shown in FIG. 3, the power of the emphasized passage is increased by a fixed bias relative to that of the non-emphasized passages. In the embodiment shown in FIG. 4, the speaking rate of the emphasized passage is made slower than that of the non-emphasized passages; that is, the duration of each phoneme in the emphasized passage is lengthened compared with the non-emphasized passages.
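For illustration, the sketch below applies the three kinds of prosodic change described for FIGS. 2 to 4 (fundamental-frequency bias, power bias, and lengthened phoneme durations) to per-phoneme prosody records. The specific bias values are arbitrary assumptions, and the three adjustments are applied together here for brevity, whereas each figure illustrates one of them separately.

```python
# Illustrative application of the three embodiments to per-phoneme prosody
# records: FIG. 2 raises F0 by a fixed bias, FIG. 3 raises power by a fixed
# bias, FIG. 4 lengthens phoneme durations (slower speaking rate) on the
# emphasized phonemes. The bias values are arbitrary examples.

def apply_emphasis(phonemes, f0_bias_hz=30.0, power_bias_db=3.0,
                   duration_factor=1.3):
    out = []
    for p in phonemes:
        q = dict(p)
        if q['emphasized']:
            q['f0'] += f0_bias_hz             # FIG. 2: higher fundamental frequency
            q['power'] += power_bias_db       # FIG. 3: larger power
            q['duration'] *= duration_factor  # FIG. 4: longer phoneme duration
        out.append(q)
    return out

phonemes = [
    {'sym': 'ko',  'f0': 120.0, 'power': 60.0, 'duration': 0.08, 'emphasized': False},
    {'sym': 'shi', 'f0': 125.0, 'power': 61.0, 'duration': 0.09, 'emphasized': True},
    {'sym': 'mo',  'f0': 118.0, 'power': 60.0, 'duration': 0.08, 'emphasized': True},
]
for p in apply_emphasis(phonemes):
    print(p)
```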

Effects

As is clear from the above description, the present invention makes it possible to reduce the monotony of synthesized speech and to utter important information with emphasis. In particular, in text-to-speech synthesis, synthesized speech that tends to be monotonous can be made easier to listen to by uttering the important information it contains with emphasis.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an example of a text-to-speech synthesizer used to implement the present invention, and FIGS. 2 to 4 are diagrams for explaining the present invention in detail.

1: morphological analysis unit; 2: phoneme/prosodic symbol generation unit; 3: prosody generation unit; 4: prosody table; 5: emphasis symbol generation unit; 6: emphasis prosody generation unit; 7: emphasis bias table; 8: adder; 9: phoneme parameter generation unit; 10: speech segment file; 11: speech synthesizer.

Claims (1)

[Claims]

1. In a text-to-speech synthesizer that extracts morpheme information from an input text by morphological analysis, derives from that information a phoneme symbol string by phonological rules and a prosodic symbol string by prosodic-symbol derivation rules, reads out parameter sequences of previously prepared speech segments according to the phoneme symbol string, concatenates them according to concatenation rules, and adds prosody obtained from the prosodic symbol string by prosodic rules, a text-to-speech synthesis system characterized in that, when the input text contains any emphasis information, the prosody of the synthesized speech is varied when the character string emphasized by that emphasis information is uttered.
JP63029930A 1988-02-09 1988-02-09 Text-to-speech synthesis method Expired - Fee Related JP2740510B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP63029930A JP2740510B2 (en) 1988-02-09 1988-02-09 Text-to-speech synthesis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP63029930A JP2740510B2 (en) 1988-02-09 1988-02-09 Text-to-speech synthesis method

Publications (2)

Publication Number Publication Date
JPH01204100A true JPH01204100A (en) 1989-08-16
JP2740510B2 JP2740510B2 (en) 1998-04-15

Family

ID=12289706

Family Applications (1)

Application Number Title Priority Date Filing Date
JP63029930A Expired - Fee Related JP2740510B2 (en) 1988-02-09 1988-02-09 Text-to-speech synthesis method

Country Status (1)

Country Link
JP (1) JP2740510B2 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999060474A1 (en) * 1998-05-15 1999-11-25 Fujitsu Limited Document read-aloud device, read-aloud control method, and recording medium
JP2018036527A (en) * 2016-08-31 2018-03-08 株式会社東芝 Voice processor, voice processing method and program
JP2018159772A (en) * 2017-03-22 2018-10-11 株式会社東芝 Voice processor, voice processing method and program
JP2018159759A (en) * 2017-03-22 2018-10-11 株式会社東芝 Voice processor, voice processing method and program

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6073589A (en) * 1983-09-30 1985-04-25 株式会社日立製作所 Voice synthesization system
JPS6488599A (en) * 1987-09-30 1989-04-03 Matsushita Electric Ind Co Ltd Voice synthesizer

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6073589A (en) * 1983-09-30 1985-04-25 株式会社日立製作所 Voice synthesization system
JPS6488599A (en) * 1987-09-30 1989-04-03 Matsushita Electric Ind Co Ltd Voice synthesizer

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999060474A1 (en) * 1998-05-15 1999-11-25 Fujitsu Limited Document read-aloud device, read-aloud control method, and recording medium
US6397183B1 (en) 1998-05-15 2002-05-28 Fujitsu Limited Document reading system, read control method, and recording medium
JP2018036527A (en) * 2016-08-31 2018-03-08 株式会社東芝 Voice processor, voice processing method and program
JP2018159772A (en) * 2017-03-22 2018-10-11 株式会社東芝 Voice processor, voice processing method and program
JP2018159759A (en) * 2017-03-22 2018-10-11 株式会社東芝 Voice processor, voice processing method and program

Also Published As

Publication number Publication date
JP2740510B2 (en) 1998-04-15

Similar Documents

Publication Publication Date Title
JPH0833744B2 (en) Speech synthesizer
Levinson et al. Speech synthesis in telecommunications
AU769036B2 (en) Device and method for digital voice processing
JPH01204100A (en) Text speech synthesis system
JPH08335096A (en) Text voice synthesizer
JP3094622B2 (en) Text-to-speech synthesizer
JP3575919B2 (en) Text-to-speech converter
JP3113101B2 (en) Speech synthesizer
JPH05134691A (en) Method and apparatus for speech synthesis
JP4056647B2 (en) Waveform connection type speech synthesis apparatus and method
JP2746880B2 (en) Compound word division method
JP2995774B2 (en) Voice synthesis method
KR100269215B1 (en) Method for producing fundamental frequency contour of prosodic phrase for tts
JPH11296193A (en) Voice synthesizer
JP3292218B2 (en) Voice message composer
JPH0756589A (en) Voice synthesis method
KR19980065482A (en) Speech synthesis method to change the speaking style
JPH0323500A (en) Text voice synthesizing device
JP3088211B2 (en) Basic frequency pattern generator
JP2573587B2 (en) Pitch pattern generator
JP2001166787A (en) Voice synthesizer and natural language processing method
JPH0876782A (en) Voice synthesizing device
JPH08160990A (en) Speech synthesizing device
JPH0756599B2 (en) How to create audio files
JPH07129188A (en) Voice synthesizing device

Legal Events

Date Code Title Description
LAPS Cancellation because of no payment of annual fees