JP2740510B2 - Text-to-speech synthesis method

Info

Publication number: JP2740510B2
Application number: JP63029930A
Authority: JP
Inventors: 哲也酒寄
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1988-02-09
Filing date: 1988-02-09
Publication date: 1998-04-15
Anticipated expiration: 2013-04-15
Also published as: JPH01204100A

Description

Technical field: The present invention relates to a text-to-speech synthesis method.

Prior art: When a long passage is synthesized by text-to-speech synthesis, the synthesized speech is very monotonous compared with human speech, and listening to it for a long time is tiring. Written text, moreover, usually contains not only character information but also various kinds of emphasis information, such as sidelines, emphasis dots, corner brackets, bold type, enlarged characters, shading, and modified typefaces, which present important information in a form that is easy for the reader to grasp. Conventional text-to-speech synthesis methods, however, ignore this emphasis information and take only the character information as input. As a result, the output speech makes no distinction between emphasized and non-emphasized portions, which makes the synthesized speech even more monotonous.

Object: The present invention has been made in view of the circumstances described above. Its object is, in text-to-speech synthesis in particular, to make synthesized speech, which tends to be monotonous, easier to listen to by uttering the important information it contains with emphasis.

Constitution: To achieve the above object, the present invention is a text-to-speech synthesis apparatus comprising a morphological analysis unit, a phoneme/prosody symbol generation unit, a prosody table, a prosody generation unit, an emphasis bias table, an emphasis parameter generation unit, an adder, a speech unit file, a phoneme parameter generation unit, and a speech synthesizer. The morphological analysis unit outputs morphological information by performing morphological analysis on the input text. The phoneme/prosody symbol generation unit generates a prosody symbol string and a phoneme symbol string from the morphological information. The prosody table stores prosody parameters corresponding to prosody symbol strings, and the prosody generation unit refers to the prosody table to read out the prosody parameters for the prosody symbol string. The emphasis bias table stores emphasis prosody parameters corresponding to the emphasis information contained in the text, and the emphasis parameter generation unit refers to the emphasis bias table to generate emphasis prosody parameters for the portion that the emphasis information marks as emphasized. The adder adds the prosody parameters and the emphasis prosody parameters and outputs the result as added prosody parameters. The speech unit file stores speech unit parameters corresponding to phoneme symbol strings, and the phoneme parameter generation unit refers to the speech unit file to read out the speech unit parameters for the input phoneme symbol string. The speech synthesizer generates synthesized speech by concatenating the speech unit parameters according to concatenation rules while taking the added prosody parameters into account. The invention is described below on the basis of an embodiment.

FIG. 1 is a block diagram for explaining an embodiment of the present invention. In the figure, 1 is a morphological analysis unit, 2 is a phoneme/prosody symbol generation unit, 3 is a prosody generation unit, 4 is a prosody table, 5 is an emphasis symbol generation unit, and 6 is an emphasis prosody generation unit; the emphasis symbol generation unit 5 and the emphasis prosody generation unit 6 together generate the parameters to be emphasized. 7 is an emphasis bias table, 8 is an adder, 9 is a phoneme parameter generation unit, 10 is a speech unit file, and 11 is a speech synthesizer.

The morphological analysis unit 1 outputs morphological information by performing morphological analysis on the input text. The phoneme/prosody symbol generation unit 2 generates a prosody symbol string and a phoneme symbol string from the morphological information. The prosody table 4 stores prosody parameters corresponding to prosody symbol strings, and the prosody generation unit 3 refers to the prosody table 4 to read out the prosody parameters for the prosody symbol string. The emphasis bias table 7 stores emphasis prosody parameters corresponding to the emphasis information contained in the text, and the emphasis parameter generation units 5 and 6 refer to the emphasis bias table 7 to generate emphasis prosody parameters for the portion that the emphasis information marks as emphasized. The adder 8 adds the prosody parameters and the emphasis prosody parameters and outputs the result as added prosody parameters. The speech unit file 10 stores speech unit parameters corresponding to phoneme symbol strings, and the phoneme parameter generation unit 9 refers to the speech unit file 10 to read out the speech unit parameters for the input phoneme symbol string. The speech synthesizer 11 generates synthesized speech by concatenating the speech unit parameters according to concatenation rules while taking the added prosody parameters into account.

In this way, a sentence containing emphasis information such as sidelines, emphasis dots, corner brackets, bold type, enlarged characters, shading, or modified typefaces is uttered with the prosody of the emphasized portion changed. This reduces the monotony of the synthesized speech and conveys the important information to the listener in an easily understood form.
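The prosody path of FIG. 1 (prosody table 4, prosody generation unit 3, emphasis bias table 7, emphasis prosody generation unit 6, adder 8) can be sketched roughly as follows. This is only an illustration under stated assumptions: the patent does not specify the table contents, the symbol alphabets, or the parameter units, so the symbols `"L"`/`"H"`, the emphasis symbols `"none"`/`"bracket"`, the `Prosody` record, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prosody:
    """One prosody parameter set (fields and units are assumptions)."""
    f0_hz: float        # fundamental frequency
    power_db: float     # power
    duration_ms: float  # duration

    def __add__(self, other: "Prosody") -> "Prosody":
        return Prosody(
            self.f0_hz + other.f0_hz,
            self.power_db + other.power_db,
            self.duration_ms + other.duration_ms,
        )


# Prosody table (4): prosody symbol -> baseline parameters (made-up values).
PROSODY_TABLE = {
    "L": Prosody(110.0, 60.0, 120.0),
    "H": Prosody(140.0, 62.0, 120.0),
}

# Emphasis bias table (7): emphasis symbol -> bias to add (made-up values).
EMPHASIS_BIAS_TABLE = {
    "none": Prosody(0.0, 0.0, 0.0),
    "bracket": Prosody(20.0, 3.0, 30.0),
}


def generate_prosody(prosody_symbols):
    """Prosody generation unit (3): look each symbol up in the prosody table."""
    return [PROSODY_TABLE[s] for s in prosody_symbols]


def generate_emphasis_prosody(emphasis_symbols):
    """Emphasis prosody generation unit (6): look up the bias per position."""
    return [EMPHASIS_BIAS_TABLE[s] for s in emphasis_symbols]


def add_prosody(base, emphasis):
    """Adder (8): element-wise sum of baseline and emphasis parameters."""
    if len(base) != len(emphasis):
        raise ValueError("baseline and emphasis sequences must align")
    return [b + e for b, e in zip(base, emphasis)]


added = add_prosody(
    generate_prosody(["L", "H", "H"]),
    generate_emphasis_prosody(["none", "bracket", "none"]),
)
```

Because the bias for non-emphasized positions is zero, the adder leaves them untouched, and the synthesizer only ever sees one combined parameter stream.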

FIGS. 2 to 4 each illustrate an embodiment of the present invention. All of them show the utterance "これが「霜降り」と呼ばれている肉です" ("This is the meat called 'shimofuri' (marbled)"), with 「霜降り」 emphasized. In the embodiment of FIG. 2, the fundamental frequency of the portion emphasized by the emphasis information (each character of the character string) is raised by a fixed bias relative to that of the non-emphasized portions. In the embodiment of FIG. 3, the power of the character string (霜降り) emphasized by the emphasis information (「」) is increased by a fixed bias relative to that of the non-emphasized portions. In the embodiment of FIG. 4, the speaking rate of the portion (character string) emphasized by the emphasis information is made slower than that of the non-emphasized portions; that is, the duration of each phoneme in the emphasized portion is made longer than in the non-emphasized portions.
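The three embodiments can be illustrated on the example sentence with a small sketch. It rests on several assumptions that are not in the patent: emphasis is detected only from corner brackets 「」, the brackets themselves are dropped from the spoken text, parameters are held per character rather than per phoneme, the baseline is flat, and every numeric value is invented. The helper names `emphasis_mask` and `apply_bias` are hypothetical.

```python
OPEN, CLOSE = "「", "」"


def emphasis_mask(text):
    """Return (plain_text, mask); mask[i] is True if plain_text[i] was inside 「」."""
    chars, mask, inside = [], [], False
    for ch in text:
        if ch == OPEN:
            inside = True
        elif ch == CLOSE:
            inside = False
        else:
            chars.append(ch)
            mask.append(inside)
    return "".join(chars), mask


def apply_bias(baseline, mask, bias):
    """Add a fixed bias to the baseline wherever the mask marks emphasis."""
    return [baseline + bias if emphasized else baseline for emphasized in mask]


text = "これが「霜降り」と呼ばれている肉です"
plain, mask = emphasis_mask(text)

f0_hz = apply_bias(120.0, mask, 20.0)        # FIG. 2: higher fundamental frequency
power_db = apply_bias(60.0, mask, 3.0)       # FIG. 3: larger power
duration_ms = apply_bias(100.0, mask, 40.0)  # FIG. 4: longer duration (slower speech)

emphasized = "".join(ch for ch, m in zip(plain, mask) if m)
```

In each case only the three characters of 霜降り receive the bias, and the rest of the sentence keeps its baseline value. A real system would work from phoneme-level contours rather than a flat per-character baseline.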

Effects: As is clear from the above description, the present invention reduces the monotony of synthesized speech and makes it possible to utter important information with emphasis. In text-to-speech synthesis in particular, synthesized speech that tends to be monotonous can be made easier to listen to by emphasizing the important information it contains. Because the emphasis processing is performed in parallel with the prosody generation processing that precedes phoneme parameter generation, the synthesized speech comes closer to human speech and sounds more natural; and because the emphasis is applied by addition, the emphasis processing is kept simple.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an example of a text-to-speech synthesis apparatus used to carry out the present invention, and FIGS. 2 to 4 are diagrams each illustrating an embodiment of the present invention.

1: morphological analysis unit; 2: phoneme/prosody symbol generation unit; 3: prosody generation unit; 4: prosody table; 5: emphasis symbol generation unit; 6: emphasis prosody generation unit; 7: emphasis bias table; 8: adder; 9: phoneme parameter generation unit; 10: speech unit file; 11: speech synthesizer.

Claims

(57) [Claims]

1. A text-to-speech synthesis apparatus comprising a morphological analysis unit, a phoneme/prosody symbol generation unit, a prosody table, a prosody generation unit, an emphasis bias table, an emphasis parameter generation unit, an adder, a speech unit file, a phoneme parameter generation unit, and a speech synthesizer, wherein: the morphological analysis unit outputs morphological information by performing morphological analysis on input text; the phoneme/prosody symbol generation unit generates a prosody symbol string and a phoneme symbol string from the morphological information; the prosody table stores prosody parameters corresponding to prosody symbol strings; the prosody generation unit refers to the prosody table and reads out the prosody parameters for the prosody symbol string; the emphasis bias table stores emphasis prosody parameters corresponding to emphasis information contained in the text; the emphasis parameter generation unit refers to the emphasis bias table and generates emphasis prosody parameters for the portion emphasized by the emphasis information contained in the text; the adder adds the prosody parameters and the emphasis prosody parameters and outputs the result as added prosody parameters; the speech unit file stores speech unit parameters corresponding to phoneme symbol strings; the phoneme parameter generation unit refers to the speech unit file and reads out the speech unit parameters for the input phoneme symbol string; and the speech synthesizer generates synthesized speech by concatenating the speech unit parameters according to concatenation rules while taking the added prosody parameters into account.