JPH10260815A

JPH10260815A - Voice synthesizing method

Info

Publication number: JPH10260815A
Application number: JP9065144A
Authority: JP
Inventors: Katsumi Tsuchiya; 勝美土谷; Masami Akamine; 政巳赤嶺; Kimio Miseki; 公生三関; Masahiro Oshikiri; 正浩押切; Ko Amada; 皇天田; Takehiko Kagoshima; 岳彦籠嶋
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1997-03-18
Filing date: 1997-03-18
Publication date: 1998-09-29

Abstract

PROBLEM TO BE SOLVED: To provide a speech synthesis method that lets a listener grasp the attributes and display form of a text from differences in the speech style and voice quality of the synthesized speech. SOLUTION: A text analysis unit 10 analyzes a text 100, a synthesis parameter generation unit 11 generates synthesis parameters 102 to 105 from the analysis result 101, and a speech synthesis unit 12 generates synthesized speech 106 from those parameters using feature parameters 200. The speech style and voice quality of the synthesized speech 106 are controlled through a speech style conversion parameter generation unit 42 and a voice quality conversion parameter generation unit 43 according to text attribute analysis data 401, which is obtained by a text attribute analysis unit 40 and represents the type, content, and source of the text, and display form analysis data 402, which is obtained by a display form analysis unit 41 and represents the size, type, position, change, and so on of the window that displays the text.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

Field of the Invention

The present invention relates to a speech synthesis method for text-to-speech synthesis, which generates synthesized speech from text.

[0002]

Description of the Related Art

Text-to-speech synthesis is a technique for automatically generating synthesized speech from input text. As shown in FIG. 22, it consists of three elements: a text analysis unit 10 that analyzes a text 100, a synthesis parameter generation unit 11 that generates synthesis parameters, and a speech synthesis unit 12 that generates synthesized speech 106. Each element basically performs the processing described below.

[0003] The input text 100 first undergoes morphological analysis and syntactic analysis in the text analysis unit 10. Next, the synthesis parameter generation unit 11 uses the text analysis data 101 to generate synthesis parameters such as a phoneme symbol string 102, phoneme durations 103, a pitch pattern 104, and power 105. Then, in the speech synthesis unit 12, feature parameters 200 of small basic units (called speech synthesis units) such as syllables, phonemes, and one-pitch segments, which are stored in a synthesis unit set storage unit 20, are selected according to information such as the phoneme symbol string 102, the phoneme durations 103, and the pitch pattern 104. These are concatenated after their pitch and duration have been controlled, which produces the synthesized speech 106.
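The three-stage flow described in paragraph [0003] can be sketched as follows. This is only a minimal illustration under stated assumptions, not the apparatus of the patent: the class and function names, the one-character-per-phoneme simplification, and the toy unit set are all invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SynthesisParams:
    """Synthesis parameters 102 to 105 (field names are illustrative)."""
    phonemes: List[str]        # phoneme symbol string 102
    durations_ms: List[float]  # phoneme durations 103
    pitch_hz: List[float]      # pitch pattern 104, one value per phoneme
    power: List[float]         # power 105


def analyze_text(text: str) -> List[str]:
    """Stand-in for text analysis unit 10: whitespace tokenization only."""
    return text.split()


def generate_params(tokens: List[str]) -> SynthesisParams:
    """Stand-in for unit 11: treat every character as one 'phoneme'."""
    phonemes = [ch for tok in tokens for ch in tok]
    n = len(phonemes)
    return SynthesisParams(phonemes, [80.0] * n, [120.0] * n, [1.0] * n)


def synthesize(params: SynthesisParams, unit_set: Dict[str, List[float]]) -> List[float]:
    """Stand-in for unit 12: look up each unit's feature data and concatenate."""
    out: List[float] = []
    for ph, power in zip(params.phonemes, params.power):
        out.extend(power * x for x in unit_set.get(ph, [0.0]))
    return out


unit_set_20 = {"a": [0.1, 0.2], "b": [0.3]}  # toy synthesis unit set (assumption)
params = generate_params(analyze_text("ab a"))
speech_106 = synthesize(params, unit_set_20)
```

A real synthesizer would also apply the durations and pitch pattern when concatenating units; the sketch omits that step to keep the data flow visible.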

[0004] The voice quality (speaker individuality) of the synthesized speech 106 generated in this way is determined mainly by the feature parameters 200 of the speech synthesis units. Therefore, when speech of several different voice qualities must be synthesized, a plurality of synthesis unit set storage units 20 to 2n, one for each voice quality, are prepared as shown in FIG. 23. A voice quality conversion parameter from a voice quality conversion parameter input unit 31 selects which storage unit is used, and switching the feature parameters 200 in this way produces synthesized speech 106 of a different voice quality.

[0005] Also as shown in FIG. 23, synthesized speech 106 with a different speech style can be generated by controlling the synthesis parameter generation unit 11 with a speech style conversion parameter 300 from a speech style conversion parameter input unit 30. For example, using average pitch information, intonation magnitude information, and loudness information as speech style conversion parameters makes it possible to change the average value of the pitch pattern, the strength of the intonation, and the loudness of the synthesized speech.
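One way to picture how average pitch, intonation magnitude, and loudness information could act on the synthesis parameters is the sketch below. The parameter encoding (`StyleParams`) and the scaling rule are assumptions made for illustration; the patent does not specify how the parameters are applied.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StyleParams:
    """Hypothetical encoding of speech style conversion parameter 300."""
    mean_pitch_hz: float     # target average of the pitch pattern
    intonation_scale: float  # 1.0 keeps the original pitch excursions
    volume_scale: float      # 1.0 keeps the original power


def apply_style(pitch_hz: List[float], power: List[float],
                style: StyleParams) -> Tuple[List[float], List[float]]:
    """Return a restyled (pitch pattern, power) pair."""
    mean = sum(pitch_hz) / len(pitch_hz)
    # Scale excursions around the mean, then move the mean to the target.
    new_pitch = [style.mean_pitch_hz + style.intonation_scale * (p - mean)
                 for p in pitch_hz]
    new_power = [style.volume_scale * w for w in power]
    return new_pitch, new_power


pitch, power = apply_style(
    [100.0, 120.0, 140.0], [1.0, 1.0, 1.0],
    StyleParams(mean_pitch_hz=200.0, intonation_scale=0.5, volume_scale=2.0),
)
```

In this example the pitch pattern keeps its rising shape but is centred on 200 Hz with half the original excursion, and the power is doubled.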

[0006] On the other hand, one way to generate synthesized speech of various voice qualities without preparing a plurality of synthesis unit set storage units is to filter the synthesized speech. In this method, for example, a filter unit 13 is placed on the output side of the speech synthesis unit 12 as shown in FIG. 24, and the voice quality of the synthesized speech is changed by emphasizing its low or high frequency range with the filter unit 13.
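The kind of band emphasis described in paragraph [0006] can be illustrated with a first-order FIR filter. The patent does not say which filter the filter unit 13 uses, so the filter form and the coefficient below are assumptions.

```python
from typing import List


def emphasize(signal: List[float], coeff: float) -> List[float]:
    """First-order FIR filter y[n] = x[n] - coeff * x[n-1].

    coeff > 0 attenuates low frequencies relative to high ones (high emphasis);
    coeff < 0 does the opposite (low emphasis).
    """
    out: List[float] = []
    prev = 0.0
    for x in signal:
        out.append(x - coeff * prev)
        prev = x
    return out


dc = [1.0, 1.0, 1.0, 1.0]             # lowest-frequency content
alternating = [1.0, -1.0, 1.0, -1.0]  # highest-frequency content
high_emph_dc = emphasize(dc, 0.5)
high_emph_alt = emphasize(alternating, 0.5)
```

After the first sample, the constant signal settles at amplitude 0.5 while the alternating signal grows to 1.5, which is the high-frequency emphasis the paragraph refers to.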

[0007] In such conventional speech synthesis methods, determining the speech style and voice quality of the synthesized speech requires the user to supply the speech style conversion parameter and the voice quality conversion parameter through the speech style conversion parameter input unit 30 and the voice quality conversion parameter input unit 31.

[0008]

Problem to Be Solved by the Invention

As described above, in the conventional speech synthesis method the user determines the speech style and voice quality of the synthesized speech by supplying the speech style conversion parameter and the voice quality conversion parameter. Consequently, the speech style and voice quality merely reflect the user's own sensibilities and preferences, and a listener cannot infer from them such things as differences in the attributes of the text or in the form in which the text is displayed.

[0009] An object of the present invention is to provide a speech synthesis method that makes it possible to grasp differences in the attributes and display form of a text from differences in the speech style and voice quality of the synthesized speech.

[0010]

Means for Solving the Problem

To solve the above problem, the present invention provides a speech synthesis method for generating synthesized speech from text, characterized in that at least one of the speech style and the voice quality of the synthesized speech is controlled according to information on at least one of the attributes and the display form of the text.

[0011] Here, at least one of the font, type, content, importance, and source of the text is used as the attribute of the text, for example.

[0012] As the display form of the text, at least one of the type, size, position, and change of the window that displays the text is used, for example.

[0013] The speech style of the synthesized speech refers to, for example, its volume, speech rate, average pitch, intonation, and phoneme durations.

[0014] The present invention is further characterized in that the attributes of the text are obtained by analyzing the text.

[0015] Thus, the present invention generates synthesized speech whose speech style and voice quality are controlled according to the attributes and display form of the text. Differences in the attributes and display form of the text can therefore be grasped from the speech style and voice quality of the synthesized speech, which increases the added value of text-to-speech synthesis.

[0016] A method that attaches to the text either parameters representing the speech style and voice quality of the synthesized speech, or synthesis parameters varied according to the speech style and voice quality, has the problem that the additional information attached to the text grows large. If, instead, the attributes of the text are obtained by analyzing the text at synthesis time and the display form is obtained by analyzing how the text is displayed, no such additional information needs to be attached. Moreover, even when additional information is attached to the text, it need only consist of the text's attribute information and information indicating the display form, so its data volume is small compared with a method that attaches synthesis parameters and the like.

[0017]

Embodiments of the Invention

Embodiments of the present invention will be described below with reference to the drawings.

(First Embodiment) FIG. 1 shows the configuration of a speech synthesizer according to a first embodiment to which the speech synthesis method of the present invention is applied. The conventional speech synthesizer shown in FIG. 22 has a text analysis unit 10 that analyzes a text 100, a synthesis parameter generation unit 11 that generates synthesis parameters, and a speech synthesis unit 12 that synthesizes speech. To that configuration, this speech synthesizer adds a text attribute analysis unit 40 that analyzes the attributes of the text, a display form analysis unit 41 that analyzes the display form of the text, a speech style conversion parameter generation unit 42 that generates a speech style conversion parameter 300, and a voice quality conversion parameter generation unit 43 that generates a voice quality conversion parameter 301.

[0018] The basic operation of this speech synthesizer is the same as that of the conventional apparatus. That is, the input text 100 first undergoes morphological analysis and syntactic analysis in the text analysis unit 10. Next, the synthesis parameter generation unit 11 uses the text analysis data 101 to generate synthesis parameters such as the phoneme symbol string 102, the phoneme durations 103, the pitch pattern 104, and the power 105. Finally, in the speech synthesis unit 12, feature parameters 200 of small basic units (speech synthesis units) such as syllables, phonemes, and one-pitch segments stored in the synthesis unit set storage unit 20 are selected according to information such as the phoneme symbol string 102, the phoneme durations 103, and the pitch pattern 104. After their pitch and duration have been controlled, they are concatenated to generate the synthesized speech 106.

[0019] Next, the characteristic configuration of this embodiment will be described. First, the text attribute analysis unit 40 analyzes attributes of the text, such as its type, content, and source, from the format of the text 100 and the information in its header, and obtains information 401 on these attributes (hereinafter, text attribute analysis data). The importance of the text can also be judged from information such as its type, content, and source.

[0020] Meanwhile, the display form analysis unit 41 analyzes the display form of the text and obtains information 402 (hereinafter, display form analysis data) such as the size, type, position, and change of the window that displays the text.

[0021] Next, the speech style conversion parameter generation unit 42 and the voice quality conversion parameter generation unit 43 generate their respective conversion parameters by referring to a table prepared in advance (called the conversion parameter generation table) that relates the text attribute analysis data 401 and the display form analysis data 402 to the speech style conversion parameter 300 and the voice quality conversion parameter 301. The synthesis parameter generation unit 11 then generates synthesis parameters according to the speech style conversion parameter 300. Finally, the synthesis unit set storage units 20 to 2n, of which several kinds are prepared, are switched, and speech is synthesized according to the synthesis parameters, which yields speech that differs in speech style and voice quality.

[0022] In this way, this embodiment makes it possible to grasp differences in the attributes and display form of the text from differences in the speech style and voice quality of the synthesized speech.

[0023] Note that the text attribute analysis unit 40 can be omitted if the attributes of the text are analyzed in advance and the resulting text attribute analysis data 401 is attached to the text, so that it is available when the voice quality conversion parameter 301 is generated. Next, specific operation examples of the speech synthesizer according to this embodiment will be described.

[0024] [Operation Example 1] First, with reference to FIGS. 2 to 4, the operation of the text attribute analysis unit 40, the display form analysis unit 41, the speech style conversion parameter generation unit 42, and the voice quality conversion parameter generation unit 43 in FIG. 1 will be described. The example is the case where several windows are open on a computer and the messages from the computer displayed in those windows are synthesized into speech.

[0025] First, the text attribute analysis unit 40 refers to a message table prepared in advance and identifies the type and content of the message that has appeared in a window. Meanwhile, the display form analysis unit 41 analyzes the form of the window displaying the message and obtains information on the type and size of the window.

[0026] Next, the speech style conversion parameter generation unit 42 and the voice quality conversion parameter generation unit 43 refer to the conversion parameter generation table prepared in advance and generate the speech style conversion parameter 300 and the voice quality conversion parameter 301 according to the message content, the window type, and the window size. Synthesized speech is then generated according to the generated conversion parameters.

[0027] For example, consider the case where, with three windows open on the computer as shown in FIG. 2, the error message "Error Message 0" is displayed in the application "kterm" launched by the computer "hikari". In this case, the text attribute analysis unit 40 first refers to the message table and identifies the message displayed in the window as an error message. The display form analysis unit 41 identifies the window displaying the message as the window "kterm@hikari" generated by the application "kterm" of the computer "hikari", and identifies its size as, for example, "100".

[0028] Next, the speech style conversion parameter generation unit 42 and the voice quality conversion parameter generation unit 43 refer to the conversion parameter generation table shown in FIG. 4, which is prepared in advance, and generate the speech style conversion parameter 300 and the voice quality conversion parameter 301 according to the message type, the window type, and the window size.

[0029] In the case of FIG. 2, the message "Error Message 0" is displayed in the window "kterm@hikari" of size "100". According to the conversion parameter generation table shown in FIG. 4, the conversion parameters volume "10", speech rate "20", average pitch "30", and synthesis unit set storage unit number "0" are therefore generated, and synthesized speech is then generated according to these parameters.

[0030] Similarly, in the case of FIG. 3, no error message is displayed in the application "kterm", but the warning message "Warning Message 1" is displayed in the window "xterm@nozomi" of size "50". According to the conversion parameter generation table shown in FIG. 4, the conversion parameters volume "5", speech rate "10", average pitch "20", and synthesis unit set storage unit number "8" are therefore generated, and synthesized speech is then generated according to these parameters.
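The table lookup in paragraphs [0027] to [0030] can be sketched as a dictionary keyed by message type, window, and window size. The key layout and all names are assumptions, since FIG. 4 is not reproduced here; only the two rows of values quoted in paragraphs [0029] and [0030] come from the text.

```python
from typing import Dict, NamedTuple, Tuple


class Conversion(NamedTuple):
    volume: int       # part of speech style conversion parameter 300
    speech_rate: int  # part of speech style conversion parameter 300
    mean_pitch: int   # part of speech style conversion parameter 300
    unit_set: int     # voice quality conversion parameter 301 (storage unit number)


# Hypothetical message table used by text attribute analysis unit 40.
MESSAGE_TABLE: Dict[str, str] = {
    "Error Message 0": "error",
    "Warning Message 1": "warning",
}

# Hypothetical layout of the conversion parameter generation table (FIG. 4).
# The two rows carry the values quoted in paragraphs [0029] and [0030].
CONVERSION_TABLE: Dict[Tuple[str, str, int], Conversion] = {
    ("error", "kterm@hikari", 100): Conversion(10, 20, 30, 0),
    ("warning", "xterm@nozomi", 50): Conversion(5, 10, 20, 8),
}


def conversion_for(message: str, window: str, size: int) -> Conversion:
    """Identify the message type, then look up the conversion parameters."""
    msg_type = MESSAGE_TABLE[message]                  # unit 40
    return CONVERSION_TABLE[(msg_type, window, size)]  # units 42 and 43


fig2 = conversion_for("Error Message 0", "kterm@hikari", 100)
fig3 = conversion_for("Warning Message 1", "xterm@nozomi", 50)
```

An exact-match key is the simplest choice for a sketch. A practical table would more likely match window sizes by range, which the text neither confirms nor rules out.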

[0031] Thus the speech style and voice quality of the generated synthesized speech differ between FIG. 2 and FIG. 3, and these differences make it easy to grasp the differences in message type, window type, and window size.

[0032] [Operation Example 2] Next, with reference to FIGS. 5 to 8, the operation of the text attribute analysis unit 40, the speech style conversion parameter generation unit 42, and the voice quality conversion parameter generation unit 43 in FIG. 1 will be described. The example is the case where text sent from the transmitting side of a communication system is received on the receiving side and synthesized into speech there.

[0033] First, the text attribute analysis unit 40 identifies the type of the text from the format of the input text 100, and then refers to the header to identify information accompanying the text, such as the sender and the subject.

[0034] Meanwhile, the display form analysis unit 41 analyzes the type and size of the window that displays the text. Consider the case where a text such as that shown in FIG. 5 is received. FIG. 6 shows the display screen on the receiving side and how speech synthesis proceeds in this case.

[0035] The text attribute analysis unit 40 refers to a table, prepared in advance and shown in FIG. 7, that relates text formats to text types, and thereby identifies the type of the text in FIG. 5 as mail. By analyzing the header information, it further identifies the sender as A and the subject as S. Meanwhile, the display form analysis unit 41 analyzes the type and size of the window that displays the text and obtains the display form analysis data 402. Specific examples of format 1 and format 2 in the correspondence table of FIG. 7 are shown in FIGS. 8 and 9.
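The identification of text type, sender, and subject described in paragraph [0035] might look like the sketch below. The formats of FIGS. 7 to 9 are not available in this text, so the header syntax ("Name: value" lines ended by a blank line) and the rule that classifies a text as mail are assumptions chosen only to make the example concrete.

```python
from typing import Dict, Tuple


def parse_text(raw: str) -> Tuple[str, Dict[str, str], str]:
    """Return (text type, header fields, body) for a received text.

    Assumption: headers are 'Name: value' lines ended by a blank line, and a
    text with both From and Subject fields is classified as mail.
    """
    head, _, body = raw.partition("\n\n")
    fields: Dict[str, str] = {}
    for line in head.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # keep only lines that really contain a colon
            fields[name.strip().lower()] = value.strip()
    is_mail = {"from", "subject"}.issubset(fields)
    return ("mail" if is_mail else "unknown"), fields, body


text_type, fields, body = parse_text("From: A\nSubject: S\n\nHello.")
```

The returned fields correspond to the text attribute analysis data 401 and could be used as lookup keys in the conversion parameter generation table.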

[0036] Next, the speech style conversion parameter generation unit 42 and the voice quality conversion parameter generation unit 43 refer to the conversion parameter generation table prepared in advance and generate the speech style conversion parameter 300 and the voice quality conversion parameter 301 according to the message content, the window type, and the window size. Synthesized speech is then generated according to these conversion parameters. Differences in the text type, sender, subject, window type, and window size can therefore be grasped easily from differences in the speech style and voice quality of the synthesized speech.

[0037] The type of text need not be limited to mail; various types such as news, magazines, and books are conceivable. The accompanying information added to the header need not be limited to the sender and subject, and may be other information. Furthermore, the type of the text can itself be added to the header as accompanying information, so that the text type is identified from that information. The elements associated with the voice quality conversion parameter are also not limited to the text type, sender, subject, and window type; any element of the text attribute analysis data or the display form analysis data may be used.

[0038] Furthermore, the text analysis unit 10 in this embodiment extracts the region to be read aloud from the input text and then analyzes the text information in the extracted region.

[0039] [Operation Example 3] Next, with reference to FIGS. 10 and 11, the operation of the text attribute analysis unit 40, the speech style conversion parameter generation unit 42, and the voice quality conversion parameter generation unit 43 in FIG. 1 will be described, taking as an example the case where a text consisting of a plurality of segments is synthesized into speech.

[0040] First, the text attribute analysis unit 40 determines which segment the input text belongs to, and the display form analysis unit 41 analyzes the type and size of the window that displays the text. Consider the case where the text shown in FIG. 10 is synthesized into speech. FIG. 11 shows the display screen and how speech synthesis proceeds in this case. Here, the text attribute analysis unit 40 first identifies which segment the input text belongs to: name, home telephone number, or work telephone number. The display form analysis unit 41 analyzes the display form.

[0041] Next, the speech style conversion parameter generation unit 42 and the voice quality conversion parameter generation unit 43 refer to the conversion parameter generation table and generate the speech style conversion parameter 300 and the voice quality conversion parameter 301 according to the segment type, the window type, and the window size. Synthesized speech is then generated according to these conversion parameters.

[0042] Accordingly, the window and the segment being read aloud (in the above example, the name, the home telephone number, or the work telephone number) can be grasped easily from the speech style and voice quality of the synthesized speech.

[0043] (Second Embodiment) FIG. 12 shows the configuration of a speech synthesizer according to a second embodiment to which the speech synthesis method of the present invention is applied. In FIG. 12, elements given the same reference numerals as in FIG. 1 have the same functions as in FIG. 1, and their description is omitted.

[0044] This embodiment provides a synthesis unit conversion unit 32 that converts synthesis units according to the voice quality conversion parameter 301. The speech synthesis unit 12 generates the synthesized speech 106 using the converted synthesis units. The processing of the synthesis unit conversion unit 32 can be realized, for example, by mapping the frequency and amplitude of spectral parameters. In this case, the voice quality conversion parameter generation unit 43 outputs frequency and amplitude mapping functions as the voice quality conversion parameter 301. With f denoting frequency, the conversion from a power spectrum s(f) in the feature parameters 200 to a converted power spectrum s′(f) is expressed by the following equation.

[0045]

s′(f) = m(s(ω(f)))    (1)

where ω(f) is a function representing the frequency mapping and m() is a function representing the amplitude mapping.

[0046] For example, suppose that s(f) and s′(f) are as shown in FIG. 13 and FIG. 14, respectively. Here F₁, F₂, F₁′, F₂′ are formant frequencies and P₁, P₂, P₁′, P₂′ are the powers of the corresponding formants. In this case, the frequency mapping function ω(f) corresponding to the conversion parameter maps F₁ to F₁′ and F₂ to F₂′, and the amplitude mapping function m() maps P₁ to P₁′ and P₂ to P₂′. Examples of these functions ω(f) and m() are shown in FIG. 15 and FIG. 16, respectively.
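Equation (1) can be tried out with piecewise-linear mapping functions, as in the sketch below. The shapes of ω(f) and m() in FIGS. 15 and 16 are not available here, so piecewise-linear interpolation, the toy spectrum, and all numeric values are assumptions. Note one consequence of equation (1) as written: a peak of s at F₁ appears in s′ at the frequency f for which ω(f) = F₁, so the sketch defines ω through the points (F₁′, F₁) and (F₂′, F₂). Whether FIG. 15 plots ω in this direction or its inverse cannot be determined from the text.

```python
from bisect import bisect_right
from typing import Callable, Sequence, Tuple


def piecewise_linear(points: Sequence[Tuple[float, float]]) -> Callable[[float], float]:
    """Build a function that linearly interpolates through sorted (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]

    def fn(x: float) -> float:
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, x) - 1
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        return ys[i] + t * (ys[i + 1] - ys[i])

    return fn


# Toy source spectrum s(f): formants F1 = 500 Hz (P1 = 4) and F2 = 1500 Hz (P2 = 2).
s = piecewise_linear([(0.0, 0.0), (500.0, 4.0), (1000.0, 1.0), (1500.0, 2.0), (4000.0, 0.0)])

# Targets: F1' = 600 Hz, F2' = 1800 Hz, P1' = 8, P2' = 3 (all illustrative).
omega = piecewise_linear([(0.0, 0.0), (600.0, 500.0), (1800.0, 1500.0), (4000.0, 4000.0)])
m = piecewise_linear([(0.0, 0.0), (2.0, 3.0), (4.0, 8.0)])


def s_prime(f: float) -> float:
    """Equation (1): s'(f) = m(s(omega(f)))."""
    return m(s(omega(f)))
```

With these choices the converted spectrum has power 8 at 600 Hz and power 3 at 1800 Hz, so both formants have moved in frequency and changed in power.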

[0047] In this embodiment, the feature parameters 200 can be converted in the synthesis unit conversion unit 32 in more than one way. For example, a conversion parameter can be obtained for every synthesis unit so that a different conversion is applied to each feature parameter. Alternatively, the synthesis units can be grouped into several classes and a common conversion parameter obtained and applied for each class. For example, common mapping functions ω() and m() that convert a plurality of power spectra s_i(f) into the corresponding power spectra s_i′(f) (i = 1, 2, …, N) are obtained by using a known optimization method to minimize the conversion error function E(ω, m) expressed by the following equation.

[0048]

[Equation 1: the error function E(ω, m) of equation (2)]

[0049] Here, fs is the sampling frequency. Because voice quality information is contained mainly in vowels, the voice quality can also be converted by converting the feature parameters 200 only for synthesis units that contain a vowel and using the other synthesis units as they are, without conversion. This reduces the amount of processing in the synthesis unit conversion unit 32. The synthesis unit conversion unit 32 can also perform a conversion such as the following.

[0050] s′(f) = g(f) s(ω(f))   (3)

This is an example in which the amplitude conversion is realized as a product with the function g(f).
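A minimal sketch of equation (3) on a sampled power spectrum follows. It assumes the spectrum is sampled uniformly from 0 to fs/2 and that values between bins are read by linear interpolation; neither detail comes from the patent, and the function names are hypothetical.

```python
def interpolate(samples, position):
    """Linearly interpolate a list at a fractional index, clamping at both ends."""
    if position <= 0:
        return samples[0]
    if position >= len(samples) - 1:
        return samples[-1]
    k = int(position)
    t = position - k
    return (1 - t) * samples[k] + t * samples[k + 1]


def convert_spectrum(s, omega, g, fs):
    """Apply s'(f) = g(f) * s(omega(f)) to a spectrum sampled uniformly on [0, fs/2]."""
    n = len(s)
    step = (fs / 2) / (n - 1)          # frequency spacing between bins
    converted = []
    for k in range(n):
        f = k * step
        source_bin = omega(f) / step   # fractional bin at which to read the source
        converted.append(g(f) * interpolate(s, source_bin))
    return converted
```

For example, `convert_spectrum(s, lambda f: f, lambda f: 1.0, fs)` leaves the spectrum unchanged, while a non-trivial `omega` shifts spectral peaks and `g` rescales them per frequency.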

[0051] In this embodiment, the voice quality of the synthesized speech 106 is changed by processing a single synthesis unit set 20. Because there is no need to store a plurality of synthesis unit sets, synthesized speech 106 with a plurality of voice qualities can be obtained with a smaller memory capacity.

[0052] (Third Embodiment) FIG. 17 shows the configuration of a speech synthesis apparatus according to a third embodiment to which the speech synthesis method of the present invention is applied. In FIG. 17, elements denoted by the same reference numerals as in FIG. 1 have the same functions as in FIG. 1, and their description is omitted.

[0053] This embodiment differs from the first embodiment in that a filter unit 13 for filtering the synthesized speech 106 is provided and the filter unit 13 is controlled by the voice quality conversion parameter 301. By controlling the filter unit 13 to emphasize the low or high frequency band of the synthesized speech 106, the voice quality of the synthesized speech 106 can be changed. Moreover, because this embodiment changes the voice quality of the synthesized speech by means of a filter, there is no need to store a plurality of synthesis unit sets, so synthesized speech 106 with a plurality of voice qualities can be obtained with a smaller memory capacity.
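The patent does not specify the filter used in the filter unit 13. As a hedged illustration, the sketch below uses a first-order FIR filter, y[n] = x[n] − a·x[n−1], whose single coefficient stands in for the voice quality conversion parameter. The function name and the parameter `amount` are hypothetical.

```python
def emphasize(samples, amount):
    """First-order FIR filter y[n] = x[n] - amount * x[n-1].

    A positive `amount` emphasizes high frequencies and a negative `amount`
    emphasizes low frequencies. `amount` is a hypothetical stand-in for the
    voice quality conversion parameter and must lie in (-1, 1).
    """
    if not -1.0 < amount < 1.0:
        raise ValueError("amount must be in (-1, 1)")
    out = []
    prev_in = 0.0  # x[n-1], taken as zero before the first sample
    for x in samples:
        out.append(x - amount * prev_in)
        prev_in = x
    return out
```

With `amount = 0.5`, a constant input settles at half its level while an alternating input settles at 1.5 times its level, which is the high-band emphasis case. A negative `amount` reverses the effect.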

[0054] (Fourth Embodiment) FIG. 18 shows the configuration of a speech synthesis apparatus according to a fourth embodiment to which the speech synthesis method of the present invention is applied. In FIG. 18, elements denoted by the same reference numerals as in FIG. 1 have the same functions as in FIG. 1, and their description is omitted.

[0055] This embodiment is an example in which the speech style conversion parameter generation unit 42 in FIG. 1 of the first embodiment is omitted so that only voice quality conversion is performed. As in the first embodiment, voice quality conversion is performed using a voice quality conversion parameter that follows the text attribute and display form information.

[0056] (Fifth Embodiment) FIG. 19 shows the configuration of a speech synthesis apparatus according to a fifth embodiment to which the speech synthesis method of the present invention is applied. In FIG. 19, elements denoted by the same reference numerals as in FIG. 12 have the same functions as in FIG. 1, and their description is omitted.

[0057] This embodiment is an example in which the speech style conversion parameter generation unit 42 in FIG. 12 of the second embodiment is omitted so that only voice quality conversion is performed. As in the second embodiment, voice quality conversion is performed using a voice quality conversion parameter that follows the text attribute and display form information.

[0058] (Sixth Embodiment) FIG. 20 shows the configuration of a speech synthesis apparatus according to a sixth embodiment to which the speech synthesis method of the present invention is applied. In FIG. 20, elements denoted by the same reference numerals as in FIG. 17 have the same functions as in FIG. 1, and their description is omitted.

[0059] This embodiment is an example in which the speech style conversion parameter generation unit 42 in FIG. 17 of the third embodiment is omitted so that only voice quality conversion is performed. As in the third embodiment, voice quality conversion is performed using a voice quality conversion parameter that follows the text attribute and display form information.

[0060] (Seventh Embodiment) FIG. 21 shows the configuration of a speech synthesis apparatus according to a seventh embodiment to which the speech synthesis method of the present invention is applied. In FIG. 21, elements denoted by the same reference numerals as in FIG. 17 have the same functions as in FIG. 1, and their description is omitted.

[0061] This embodiment is an example in which the voice quality conversion parameter generation unit 43 in FIG. 17 of the third embodiment is omitted so that only speech style conversion is performed. As in the third embodiment, the speech style is converted using a speech style conversion parameter that follows the text attribute and display form information.

[0062] Several embodiments of the present invention have been described above, but the present invention is not limited to these embodiments and can be implemented with various modifications.

[0063] For example, in each of the above embodiments, the data that controls the speech style conversion parameter generation unit 42 and the voice quality conversion parameter generation unit 43 may be only one of the text attribute analysis data and the text display form analysis data.
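The following sketch illustrates parameter generation that accepts either kind of analysis data, or both. The dictionary keys, the rules, and the numeric values are hypothetical and are not taken from the patent's conversion parameter generation table.

```python
def generate_parameters(attribute_data=None, display_data=None):
    """Derive (speech_style, voice_quality) settings from whichever data is available."""
    if attribute_data is None and display_data is None:
        raise ValueError("need text attribute data, display form data, or both")

    # Neutral defaults, used when no rule below applies.
    speech_style = {"volume": 1.0, "rate": 1.0}
    voice_quality = {"high_emphasis": 0.0}

    if attribute_data is not None:
        # Hypothetical rule: important text is read louder and more slowly.
        if attribute_data.get("importance") == "high":
            speech_style["volume"] = 1.5
            speech_style["rate"] = 0.8

    if display_data is not None:
        # Hypothetical rule: text shown in a small window gets a brighter voice.
        if display_data.get("window_size") == "small":
            voice_quality["high_emphasis"] = 0.5

    return speech_style, voice_quality
```

Passing only `attribute_data` or only `display_data` corresponds to the variation described above, in which one kind of analysis data drives both parameter generation units.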

[0064] Furthermore, in each of the above embodiments, the text attribute analysis data and the text display form analysis data are not limited to the elements described in the embodiments; other elements may be used.

[0065]

[Effects of the Invention] As described in detail above, the present invention generates synthesized speech whose speech style and voice quality are controlled according to the attributes and display form of the text. From the speech style and voice quality of the synthesized speech, a listener can therefore grasp differences in the attributes of the text, such as its font, type, content, importance, and source, and in its display form, such as the type, size, position, and change of the window that displays the text.

[0066] Accordingly, in addition to the conventional function of simply reading text aloud, the present invention adds a new function whereby the attributes of the read text and its display form on the computer can be recognized from the synthesized speech. This has the advantage of increasing the added value of text-to-speech synthesis.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a speech synthesis apparatus according to a first embodiment of the present invention.

FIG. 2 is a diagram showing a mode in which a message generated in a window is synthesized into speech.

FIG. 3 is a diagram showing another mode in which a message generated in a window is synthesized into speech.

FIG. 4 is a diagram showing a conversion parameter generation table.

FIG. 5 is a diagram showing an example of input text.

FIG. 6 is a diagram showing a mode in which received text is synthesized into speech.

FIG. 7 is a diagram showing a correspondence table between formats and text types.

FIG. 8 is a diagram showing an example of format 1 in FIG. 7.

FIG. 9 is a diagram showing an example of format 2 in FIG. 7.

FIG. 10 is a diagram showing another example of input text.

FIG. 11 is a diagram showing a mode in which text composed of a plurality of segments is synthesized into speech.

FIG. 12 is a block diagram showing the configuration of a speech synthesis apparatus according to a second embodiment of the present invention.

FIG. 13 is a diagram showing an example of a power spectrum in the feature parameters in the second embodiment.

FIG. 14 is a diagram showing an example of a power spectrum obtained by converting the power spectrum in the feature parameters in the second embodiment.

FIG. 15 is a diagram showing an example of the frequency mapping function corresponding to the conversion parameter in the second embodiment.

FIG. 16 is a diagram showing an example of the amplitude mapping function corresponding to the conversion parameter in the second embodiment.

FIG. 17 is a block diagram showing the configuration of a speech synthesis apparatus according to a third embodiment of the present invention.

FIG. 18 is a block diagram showing the configuration of a speech synthesis apparatus according to a fourth embodiment of the present invention.

FIG. 19 is a block diagram showing the configuration of a speech synthesis apparatus according to a fifth embodiment of the present invention.

FIG. 20 is a block diagram showing the configuration of a speech synthesis apparatus according to a sixth embodiment of the present invention.

FIG. 21 is a block diagram showing the configuration of a speech synthesis apparatus according to a seventh embodiment of the present invention.

FIG. 22 is a block diagram showing the basic configuration of a speech synthesis apparatus for text-to-speech synthesis.

FIG. 23 is a block diagram showing the configuration of a conventional speech synthesis apparatus in which the voice quality and speech style of the synthesized speech are variable.

FIG. 24 is a block diagram showing the configuration of another conventional speech synthesis apparatus in which the voice quality of the synthesized speech is made variable by filtering the synthesized speech.

[Explanation of symbols]

10: text analysis unit
11: synthesis parameter generation unit
12: speech synthesis unit
13: filter unit
20 to 2n: synthesis unit set storage unit
30: speech style conversion parameter input unit
31: voice quality conversion parameter input unit
32: synthesis unit conversion unit
40: text attribute analysis unit
41: display form analysis unit
42: speech style conversion parameter generation unit
43: voice quality conversion parameter generation unit
100: text
101: text analysis data
102: phoneme symbol string
103: phoneme duration
104: pitch pattern
105: power
106: synthesized speech
200: feature parameter
300: speech style conversion parameter
301: voice quality conversion parameter
400: display form
401: text attribute analysis data
402: display form analysis data

Continuation of front page

(72) Inventor: Masahiro Oshikiri, Toshiba Corporation Research and Development Center, 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa

(72) Inventor: Ko Amada, Toshiba Corporation Research and Development Center, 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa

(72) Inventor: Takehiko Kagoshima, Toshiba Corporation Research and Development Center, 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa

Claims

[Claims]

1. A speech synthesis method for generating synthesized speech from text, wherein at least one of a speech style and a voice quality of the synthesized speech is controlled according to at least one of information on an attribute of the text and information on a display form of the text.

2. The speech synthesis method according to claim 1, wherein at least one of font, type, content, importance, and source of the text is used as the attribute of the text.

3. The speech synthesis method according to claim 1, wherein at least one of a type, a size, a position, and a change of a window for displaying the text is used as the display form of the text.

4. The speech synthesis method according to claim 1, wherein the speech style of the synthesized speech is at least one of a volume, a speech rate, an average pitch, an intonation, and a phoneme duration of the synthesized speech.

5. The speech synthesis method according to claim 1, wherein the attribute of the text is obtained by analyzing the text.