JP2001134283A

JP2001134283A - Device and method for synthesizing speech

Info

Publication number: JP2001134283A
Application number: JP31400799A
Authority: JP
Inventors: Toshiyasu Masakawa; 俊康政川; Yasushi Ishikawa; 泰石川; Mitsuru Ebihara; 充海老原
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1999-11-04
Filing date: 1999-11-04
Publication date: 2001-05-18

Abstract

PROBLEM TO BE SOLVED: To achieve high expressiveness as a substitute means of conversation. This has been difficult because the prosody and acoustic features of synthesized speech could not easily be changed during output. SOLUTION: A language processing unit 1 analyzes input text and generates linguistic information corresponding to the input text. A phoneme parameter generation unit 3 selects, on the basis of the linguistic information, speech synthesis units corresponding to the input text as phoneme parameters. A prosody parameter control unit 81 generates prosody parameters from the linguistic information according to prosody parameter generation rules, and changes the subsequent prosody parameters in response to operation input during speech output. A speech synthesis unit 7 synthesizes and outputs speech on the basis of the phoneme parameters and the prosody parameters.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speech synthesizer and a speech synthesis method for synthesizing speech based on input text, and more particularly to a speech synthesizer and a speech synthesis method in which the prosody and acoustic features can be changed in order to improve expressiveness when the device is used as a substitute means of conversation.

[0002]

2. Description of the Related Art

When a speech synthesizer is used as a substitute means of conversation, the synthesized speech is monotonous and therefore lacks expressiveness. To address this, methods have been developed that improve expressiveness by allowing the user to change prosody such as the pitch pattern and phoneme durations. For example, when the user wants to emphasize a certain part of a sentence, strengthening the accent of that part, or slowing the speech rate there so that it is easier to hear, conveys to the listener the user's intention to emphasize it.

[0003] One method by which the user can change the prosody of synthesized speech displays, as a graph, the prosody determined according to predetermined rules and lets the user change it through operations on a GUI (Graphical User Interface). Such a method is described, for example, in "Sessign98: a speech creation tool with improved speech expressiveness and production efficiency" (Mizuno et al., Proceedings of the 1998 Autumn Meeting of the Acoustical Society of Japan, 2-P-12, pp. 309-310, September 1998) (hereinafter referred to as the first document).

[0004] FIG. 25 is a block diagram showing the configuration of a first conventional speech synthesizer whose prosody can be changed, based on the description in the first document. In FIG. 25, reference numeral 101 denotes a language processing unit that analyzes input text to determine accent phrase boundary positions, readings, accent types, and the like, and generates linguistic information that also includes prosody control parameters such as accent strength; 102 denotes a speech synthesis unit storage unit that stores speech synthesis units; 103 denotes a phoneme parameter generation/holding unit that, based on the linguistic information, reads from the speech synthesis unit storage unit 102 the speech synthesis units matching the reading of the input text, generates phoneme parameters, and holds them until speech synthesis; 105 denotes a prosody parameter generation rule storage unit that stores prosody parameter generation rules for generating prosody parameters, such as the pitch pattern and phoneme durations, from the linguistic information; 106 denotes a prosody parameter generation unit that generates prosody parameters from the linguistic information according to the prosody parameter generation rules; 108 denotes a prosody parameter editor that displays the prosody parameters on a screen as a graph, holds the prosody parameters as edited by the user's GUI operations, and supplies them to the speech synthesis unit 107 at the time of speech synthesis; and 107 denotes a speech synthesis unit that generates and outputs synthesized speech based on the prosody parameters and the phoneme parameters.

[0005] Next, the operation will be described. When supplied with input text, the language processing unit 101 analyzes it to determine accent phrase boundary positions, readings, accent types, and the like, generates linguistic information that also includes prosody control parameters such as accent strength, and supplies the linguistic information to the phoneme parameter generation/holding unit 103 and the prosody parameter generation unit 106.

[0006] The prosody parameter generation unit 106 determines prosody parameters based on the prosody control parameters and readings contained in the supplied linguistic information and on the prosody parameter generation rules in the prosody parameter generation rule storage unit 105, and supplies them to the prosody parameter editor 108. The prosody parameter editor 108 displays the supplied prosody parameters on a screen as a graph, holds the prosody parameters as edited by the user's GUI operations, and supplies them to the speech synthesis unit 107 at the time of speech synthesis.

[0007] Meanwhile, based on the supplied linguistic information, the phoneme parameter generation/holding unit 103 reads from the speech synthesis unit storage unit 102 the speech synthesis units matching the reading of the input text, generates phoneme parameters, holds them until speech synthesis, and supplies them to the speech synthesis unit 107.

[0008] At the time of speech synthesis, the speech synthesis unit 107 generates and outputs synthesized speech based on the prosody parameters and phoneme parameters supplied to it.

[0009] As described above, the first conventional speech synthesizer generates synthesized speech only after the phoneme parameters and prosody parameters have been finalized through the user's operations.

[0010] Another method by which the user can change the prosody of synthesized speech obtains dialogue information for prosody control from commands embedded in the input text and synthesizes speech based on that dialogue information. Such a method is described, for example, in Japanese Patent Application Laid-Open No. Hei 9-252784 (hereinafter referred to as the second document).

[0011] FIG. 26 is a block diagram showing the configuration of a second conventional speech synthesizer based on the second document. In FIG. 26, reference numeral 111 denotes a command analysis unit that splits an input sentence, in which commands for controlling the prosody of the synthesized speech are embedded, into text and commands, and analyzes the commands to generate dialogue information; 112 denotes a phoneme parameter generation unit that, based on the linguistic information, reads from the speech synthesis unit storage unit 102 the speech synthesis units matching the reading of the text and generates phoneme parameters; and 113 denotes a command-controlled prosody parameter generation unit that generates prosody parameters such as the pitch pattern and phoneme durations based on the prosody control parameters and readings contained in the linguistic information, the dialogue information, and the prosody parameter generation rules. The other components in FIG. 26 are the same as those in FIG. 25, so their description is omitted.

[0012] Next, the operation will be described. When supplied with an input sentence in which commands for controlling the prosody of the synthesized speech are embedded, the command analysis unit 111 splits the input sentence into text and commands, analyzes the commands to generate dialogue information, supplies the dialogue information to the command-controlled prosody parameter generation unit 113, and supplies the text to the language processing unit 101 as input text. The language processing unit 101 analyzes the input text to generate linguistic information in the same manner as in the first conventional speech synthesizer, and supplies it to the phoneme parameter generation unit 112 and the command-controlled prosody parameter generation unit 113.
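The splitting performed by the command analysis unit 111 can be sketched as follows. The second document's actual command syntax is not given here, so the `{name}` syntax, the function `split_commands`, and its return shape are hypothetical. The sketch records, for each command, the offset in the plain text at which it takes effect.

```python
import re
from typing import List, Tuple

# Hypothetical command syntax: a command is a word in braces, e.g. "{emph}".
COMMAND = re.compile(r"\{(\w+)\}")


def split_commands(sentence: str) -> Tuple[str, List[Tuple[int, str]]]:
    """Split an input sentence into plain text and (text offset, command) pairs."""
    text_parts: List[str] = []
    commands: List[Tuple[int, str]] = []
    text_len = 0
    pos = 0
    for m in COMMAND.finditer(sentence):
        chunk = sentence[pos:m.start()]
        text_parts.append(chunk)
        text_len += len(chunk)
        # The command applies from this offset in the command-free text.
        commands.append((text_len, m.group(1)))
        pos = m.end()
    text_parts.append(sentence[pos:])
    return "".join(text_parts), commands
```

For example, `split_commands("明日{emph}家族みんなで遊園地に行くの。")` yields the command-free text together with `[(2, "emph")]`. Because every command is fixed before synthesis starts, nothing in this structure can react to the listener once output has begun.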

[0013] Based on the supplied linguistic information, the phoneme parameter generation unit 112 reads from the speech synthesis unit storage unit 102 the speech synthesis units matching the reading of the input text, and supplies them to the speech synthesis unit 107 as phoneme parameters.

[0014] Meanwhile, the command-controlled prosody parameter generation unit 113 generates prosody parameters such as the pitch pattern and phoneme durations based on the prosody control parameters and readings contained in the supplied linguistic information, the dialogue information, and the prosody parameter generation rules in the prosody parameter generation rule storage unit 105, and supplies them to the speech synthesis unit 107.

[0015] The speech synthesis unit 107 then generates and outputs synthesized speech based on the prosody parameters and the phoneme parameters.

[0016] As described above, the second conventional speech synthesizer generates phoneme parameters and prosody parameters according to commands embedded in the input sentence in advance, and then generates synthesized speech.

[0017] Besides the above, other conventional techniques that generate synthesized speech after the phoneme parameters and prosody parameters have been finalized through user operations include the one described in Japanese Patent Application Laid-Open No. Hei 10-11083.

[0018]

SUMMARY OF THE INVENTION

Problems to be Solved by the Invention

Because the conventional speech synthesizers are configured as described above, they have the following problems. The first conventional speech synthesizer outputs synthesized speech only after the user has finished editing the prosody parameters, so it is difficult for the user to change the prosody during speech output. As a substitute means of conversation, it is therefore difficult to achieve high expressiveness, such as emphasizing part of a sentence in response to the other party's reaction. The second conventional speech synthesizer requires commands to be embedded in the input sentence in advance in order to change the prosody, so it is likewise difficult for the user to change the prosody during speech output, and difficult to achieve high expressiveness as a substitute means of conversation.

[0019] The present invention has been made to solve the above problems. Its object is to obtain a speech synthesizer and a speech synthesis method that can be used as a highly expressive substitute means of conversation by changing, in response to user operations during speech output, the prosody, such as the pitch pattern and phoneme durations, as well as the voice quality and acoustic features, such as a shouting voice or a whispering voice.

[0020]

Means for Solving the Problems

A speech synthesizer according to the present invention comprises: a language processing unit that analyzes input text and generates linguistic information including at least the accent phrase boundary positions, accent types, readings, and prosody control parameters corresponding to the input text; a phoneme parameter generation unit that selects, based on the linguistic information, speech synthesis units corresponding to the input text and uses them as phoneme parameters; a prosody parameter control unit that generates prosody parameters from the linguistic information according to prosody parameter generation rules and changes subsequent prosody parameters in response to operation input during speech output; and a speech synthesis unit that synthesizes and outputs speech based on the phoneme parameters and the prosody parameters.

[0021] In a speech synthesizer according to the present invention, the prosody parameter control unit includes a prosody control parameter changing unit that changes the prosody control parameters in the linguistic information in response to operation input during speech output, and a prosody parameter generation unit that generates prosody parameters from the linguistic information according to the prosody parameter generation rules.

[0022] In a speech synthesizer according to the present invention, the prosody parameter control unit includes a prosody parameter generation rule selection unit that selects one of a plurality of prosody parameter generation rules in response to operation input during speech output, and a prosody parameter generation unit that generates prosody parameters from the linguistic information according to the prosody parameter generation rule selected by the prosody parameter generation rule selection unit.
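The rule selection of [0022] amounts to keeping several rule sets and letting an operation switch which one is used for prosody generated from then on. A minimal sketch follows; the rule names, the mapping from accent strength to a pitch peak in Hz, and all numbers are hypothetical stand-ins for real prosody parameter generation rules.

```python
from typing import Callable, Dict, List

# A "rule" here maps per-phrase accent strengths to pitch peak heights in Hz.
# Both rule sets and their coefficients are hypothetical.
Rule = Callable[[List[float]], List[float]]

RULES: Dict[str, Rule] = {
    "normal": lambda strengths: [120.0 + 40.0 * s for s in strengths],
    "excited": lambda strengths: [150.0 + 70.0 * s for s in strengths],
}


class RuleSelector:
    """Holds the currently selected rule; operations switch it during output."""

    def __init__(self, rules: Dict[str, Rule], default: str = "normal") -> None:
        self.rules = rules
        self.current = default

    def on_operation(self, name: str) -> None:
        # Unknown operations leave the current selection unchanged.
        if name in self.rules:
            self.current = name

    def generate(self, strengths: List[float]) -> List[float]:
        return self.rules[self.current](strengths)
```

Only prosody generated after `on_operation` is called uses the new rule, which corresponds to changing the subsequent prosody parameters.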

[0023] In a speech synthesizer according to the present invention, the prosody parameter control unit includes a prosody parameter generation unit that generates prosody parameters from the linguistic information according to the prosody parameter generation rules, and a prosody parameter changing unit that changes the prosody parameters generated by the prosody parameter generation unit in response to a first operation input during speech output.

[0024] In a speech synthesizer according to the present invention, the prosody parameter generation unit generates, based on the prosody parameters and the linguistic information, boundary information containing the timings of the accent phrase boundaries and syllable boundaries. The prosody parameter changing unit then selects, in response to a second operation input during speech output, the portion whose prosody parameters are to be changed from among the changeable portions determined from the boundary information, and changes the prosody parameters of that portion in response to the first operation input during speech output.
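The boundary information of [0024] can be sketched as cumulative timings. The sketch below assumes that syllable durations grouped by accent phrase are available, and that the "changeable portions" are the accent phrases that have not yet started at the current playback time; that policy, and the function names `boundary_info` and `changeable_phrases`, are assumptions, not taken from the source. A second operation input could then step through the returned candidates.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List, Tuple


def boundary_info(phrase_syllable_ms: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Return (syllable boundary times, accent phrase start times) in ms."""
    durations = [d for phrase in phrase_syllable_ms for d in phrase]
    syllable_bounds = [0.0] + list(accumulate(durations))
    phrase_starts: List[float] = []
    t = 0.0
    for phrase in phrase_syllable_ms:
        phrase_starts.append(t)
        t += sum(phrase)
    return syllable_bounds, phrase_starts


def changeable_phrases(phrase_starts: List[float], now_ms: float) -> List[int]:
    """Indices of accent phrases starting at or after now_ms (assumed changeable)."""
    i = bisect_left(phrase_starts, now_ms)
    return list(range(i, len(phrase_starts)))
```

With three phrases starting at 0, 200, and 350 ms, a playback position of 210 ms leaves only the third phrase (index 2) available for change.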

[0025] In a speech synthesizer according to the present invention, the language processing unit detects portions of the input text that are predetermined fixed messages, analyzes the portions other than the fixed messages, and generates linguistic information including at least the accent phrase boundary positions, accent types, readings, and prosody control parameters corresponding to the portions other than the fixed messages. A fixed message selection unit selects the speech waveform data of a detected fixed message in response to operation input; a fixed message prosody changing unit changes the speech waveform data in response to operation input during speech output, thereby changing the prosody of the fixed message; and a speech reproduction unit reproduces the fixed message as speech.
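Detecting the fixed-message portions of [0025] can be sketched as a longest-match scan over a list of known messages. The message list and the function name `segment` are hypothetical; the source does not specify how the fixed messages are registered or matched.

```python
import re
from typing import List, Tuple


def segment(text: str, fixed_messages: List[str]) -> List[Tuple[str, str]]:
    """Split text into ("fixed", message) and ("text", chunk) segments."""
    if not fixed_messages:
        return [("text", text)] if text else []
    # Try longer messages first so that a message containing another wins.
    ordered = sorted(fixed_messages, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(m) for m in ordered))
    out: List[Tuple[str, str]] = []
    pos = 0
    for m in pattern.finditer(text):
        if m.start() > pos:
            out.append(("text", text[pos:m.start()]))  # goes to language analysis
        out.append(("fixed", m.group()))  # played back from stored waveform data
        pos = m.end()
    if pos < len(text):
        out.append(("text", text[pos:]))
    return out
```

Only the `"text"` segments would pass through language processing and rule-based synthesis; the `"fixed"` segments would be reproduced from recorded waveforms whose prosody is then adjusted.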

[0026] A speech synthesis method according to the present invention comprises the steps of: analyzing input text and generating linguistic information including at least the accent phrase boundary positions, accent types, readings, and prosody control parameters corresponding to the input text; selecting, based on the linguistic information, speech synthesis units corresponding to the input text and using them as phoneme parameters; generating prosody parameters from the linguistic information according to prosody parameter generation rules, and changing subsequent prosody parameters in response to operation input during speech output; and synthesizing and outputting speech based on the phoneme parameters and the prosody parameters.

[0027] A speech synthesizer according to the present invention comprises: a language processing unit that analyzes input text and generates linguistic information including at least the accent phrase boundary positions, accent types, readings, and prosody control parameters corresponding to the input text; a phoneme parameter control unit that selects, based on the linguistic information, speech synthesis units corresponding to the input text and uses them as phoneme parameters, and changes subsequent phoneme parameters in response to operation input during speech output; a prosody parameter generation unit that generates prosody parameters from the linguistic information according to prosody parameter generation rules; and a speech synthesis unit that synthesizes and outputs speech based on the phoneme parameters and the prosody parameters.

[0028] In a speech synthesizer according to the present invention, the phoneme parameter control unit includes a speech synthesis unit set selection unit that, in response to operation input during speech output, selects one of a plurality of types of speech synthesis unit sets that differ in voice quality and acoustic features, and a phoneme parameter generation unit that, based on the linguistic information, selects speech synthesis units corresponding to the input text from the speech synthesis unit set selected by the speech synthesis unit set selection unit and uses them as phoneme parameters.
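The unit set selection of [0028] can be sketched as a lookup into one of several unit inventories, where an operation switches the active inventory for all following syllables. The set names ("normal", "shout", "whisper"), the unit IDs, and the rule that a switch persists until the next operation are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical unit sets: each maps a syllable reading to a stored unit ID.
UNIT_SETS: Dict[str, Dict[str, str]] = {
    "normal": {"ka": "n_ka", "zo": "n_zo", "ku": "n_ku"},
    "shout": {"ka": "s_ka", "zo": "s_zo", "ku": "s_ku"},
    "whisper": {"ka": "w_ka", "zo": "w_zo", "ku": "w_ku"},
}


def select_units(syllables: List[str], switches: Dict[int, str],
                 start_set: str = "normal") -> List[str]:
    """switches maps a syllable index to the set chosen by an operation input
    received just before that syllable; the choice persists afterwards."""
    current = start_set
    units: List[str] = []
    for i, syl in enumerate(syllables):
        current = switches.get(i, current)
        units.append(UNIT_SETS[current][syl])
    return units
```

For instance, switching to "whisper" before the second syllable of "ka zo ku" keeps the first unit from the normal set and takes the rest from the whisper set.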

[0029] In a speech synthesizer according to the present invention, the language processing unit detects portions of the input text that are predetermined fixed messages, analyzes the portions other than the fixed messages, and generates linguistic information including at least the accent phrase boundary positions, accent types, readings, and prosody control parameters corresponding to the portions other than the fixed messages. A fixed message selection unit selects the speech waveform data of a detected fixed message in response to operation input; a fixed message prosody changing unit changes the speech waveform data in response to operation input during speech output, thereby changing the prosody of the fixed message; and a speech reproduction unit reproduces the fixed message as speech.

[0030] A speech synthesis method according to the present invention comprises the steps of: analyzing input text and generating linguistic information including at least the accent phrase boundary positions, accent types, readings, and prosody control parameters corresponding to the input text; selecting, based on the linguistic information, speech synthesis units corresponding to the input text and using them as phoneme parameters, and changing subsequent phoneme parameters in response to operation input during speech output; generating prosody parameters from the linguistic information according to prosody parameter generation rules; and synthesizing and outputting speech based on the phoneme parameters and the prosody parameters.

[0031]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

An embodiment of the present invention is described below.

Embodiment 1

FIG. 1 is a block diagram showing the configuration of a speech synthesizer according to Embodiment 1 of the present invention, and FIG. 2 is a configuration diagram showing a configuration example of the language processing unit in FIG. 1. In FIG. 1, reference numeral 1 denotes a language processing unit that analyzes input text and generates linguistic information including at least the accent phrase boundary positions, accent types, readings, and prosody control parameters corresponding to the input text. In the language processing unit 1 shown in FIG. 2, reference numeral 11 denotes a dictionary storage unit that stores in advance a dictionary describing the notation, reading, part of speech, basic accent type, and so on of each morpheme, and 12 denotes a text analysis unit that consults the dictionary in the dictionary storage unit 11 as needed to determine the accent type and reading of each accent phrase. Reference numeral 13 denotes a prosody rule storage unit that stores prosody rules for generating prosody control parameters. Under these prosody rules, for example, the accent strength is calculated with a quantification type I model according to the part of speech of the content word in the accent phrase, the type of function word attached to it, and the like, and the average mora length is determined by looking up a table that stores in advance, for each mora count of an accent phrase, the average mora length of natural speech samples. Reference numeral 14 denotes a prosody control parameter generation unit that determines the prosody control parameters from the accent phrase sequence by referring to the prescribed prosody rules, and outputs them as linguistic information together with the accent phrase boundary positions, accent types, and readings.
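The two prosody rules named in [0031] can be sketched as follows. Quantification type I is a linear model over categorical predictors: the prediction is a constant plus one score for the category taken by each item. Every score and table value below is hypothetical (real values would be fitted to natural speech), as are the category labels and the choice to reuse the last table entry for longer phrases.

```python
from typing import Optional

# Hypothetical quantification type I scores: constant + one score per item category.
CONSTANT = 1.00
CONTENT_POS_SCORE = {"noun": 0.10, "verb": -0.05, "adverb": 0.20}
FUNCTION_WORD_SCORE = {None: 0.00, "case_particle": -0.10, "final_particle": 0.05}

# Hypothetical table: mean mora length (ms) of natural speech by mora count.
MEAN_MORA_MS = {1: 150.0, 2: 140.0, 3: 130.0, 4: 125.0, 5: 120.0}


def accent_strength(content_pos: str, function_word: Optional[str]) -> float:
    """Accent strength of one accent phrase as a sum of category scores."""
    return CONSTANT + CONTENT_POS_SCORE[content_pos] + FUNCTION_WORD_SCORE[function_word]


def mean_mora_length(mora_count: int) -> float:
    """Table lookup; phrases longer than the table reuse its last entry (assumption)."""
    return MEAN_MORA_MS[min(mora_count, max(MEAN_MORA_MS))]
```

Because the model is additive, each category score can be read directly as that category's contribution to accent strength.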

[0032] Returning to FIG. 1, reference numeral 2 denotes a speech synthesis unit storage unit that stores in advance the speech synthesis units used as phoneme parameters (for example, speech waveforms when the basic synthesis method, such as PSOLA, uses speech waveforms directly), and 3 denotes a phoneme parameter generation unit that selects, based on the linguistic information, speech synthesis units corresponding to the input text and uses them as phoneme parameters.

[0033] Reference numeral 4 denotes a prosody control parameter changing unit that changes the prosody control parameters in the linguistic information in response to operation input during speech output. Reference numeral 5 denotes a prosody parameter generation rule storage unit that stores prosody parameter generation rules for generating prosody parameters, such as the pitch pattern and phoneme durations, from the linguistic information. The prosody parameter generation rules include, for example, models for determining the pitch pattern and phoneme durations, and conversion rules for converting the prosody control parameters in the linguistic information into the parameters of those models.

[0034] For example, the prosody parameter generation rules include a point pitch model as the pitch pattern generation model and a rhythm-perception-point equal-interval placement model as the phoneme duration model. In the rhythm-perception-point equal-interval placement model, consonant lengths are fixed, and vowel lengths are determined by placing at equal time intervals the rhythm perception points, each of which is defined for its syllable type as a time offset from the consonant-vowel boundary. The length of the final vowel is the rhythm perception point interval multiplied by a constant (for example, 0.80). As the information for this model, a table of rhythm perception point positions and a table of durations for each consonant type are stored. FIG. 3 shows an example of the rhythm perception point table, and FIG. 4 shows an example of the consonant duration table. In FIG. 3, a positive value indicates that the rhythm perception point lies later in time than the consonant-vowel boundary. For a lone vowel and for the moraic nasal, the start of the phoneme is treated as the consonant-vowel boundary. The prosody parameter generation rules also include, for example, a conversion rule that uses the prosody control parameters in the linguistic information directly as the model parameters.
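The vowel lengths of the rhythm-perception-point model follow from a short derivation. Let syllable i have consonant length c_i, perception-point offset r_i from its consonant-vowel boundary, and vowel length v_i. Because syllable i+1 starts where the vowel of syllable i ends, consecutive perception points are spaced v_i + c_{i+1} + r_{i+1} - r_i apart. Setting this spacing equal to the interval T gives v_i = T - c_{i+1} - r_{i+1} + r_i, and the final vowel is 0.80 T. The sketch below implements this; the table values are hypothetical (FIG. 3 and FIG. 4 are not reproduced here), and taking T from the average mora length control parameter is an assumption. Very large offsets could make a vowel length negative; the sketch does not guard against that.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Syllable:
    consonant: str  # "" for a lone vowel or the moraic nasal
    vowel: str

    @property
    def name(self) -> str:
        return self.consonant + self.vowel


# Hypothetical tables in ms (the real ones are FIG. 4 and FIG. 3).
CONSONANT_MS = {"": 0.0, "k": 60.0, "z": 70.0}
PERCEPTION_OFFSET_MS = {"ka": 20.0, "zo": 15.0, "ku": 20.0, "a": 0.0}


def vowel_lengths(syls: List[Syllable], interval_ms: float,
                  final_ratio: float = 0.80) -> Tuple[List[float], List[float]]:
    """Return (consonant lengths, vowel lengths) placing perception points interval_ms apart."""
    if not syls:
        return [], []
    c = [CONSONANT_MS[s.consonant] for s in syls]
    r = [PERCEPTION_OFFSET_MS[s.name] for s in syls]
    v = [interval_ms - c[i + 1] - r[i + 1] + r[i] for i in range(len(syls) - 1)]
    v.append(final_ratio * interval_ms)  # final vowel: interval times a constant
    return c, v


def perception_points(syls: List[Syllable], c: List[float], v: List[float]) -> List[float]:
    """Absolute perception point times, starting the first consonant at 0 ms."""
    points, t = [], 0.0
    for s, ci, vi in zip(syls, c, v):
        cv_boundary = t + ci
        points.append(cv_boundary + PERCEPTION_OFFSET_MS[s.name])
        t = cv_boundary + vi  # end of this syllable
    return points
```

For "ka zo ku" with T = 150 ms, the first two vowels come out at 85 ms each and the last at 120 ms, and the perception points fall exactly 150 ms apart.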

[0035] Returning to FIG. 1, reference numeral 6 denotes a prosody parameter generation unit that generates prosody parameters from the linguistic information according to the prosody parameter generation rules. The prosody control parameter changing unit 4 and the prosody parameter generation unit 6 together constitute a prosody parameter control unit 81, which generates prosody parameters from the linguistic information according to the prosody parameter generation rules and changes subsequent prosody parameters in response to operation input during speech output.

[0036] Reference numeral 7 denotes a speech synthesis unit that synthesizes and outputs speech based on the phoneme parameters and the prosody parameters.

[0037] Next, the operation will be described. Input text, for example a mixed kanji-kana sentence created by the user through key input, kana-kanji conversion, or the like, is supplied to the language processing unit 1.

[0038] The language processing unit 1 analyzes the input text, determines the accent phrase boundary positions, and determines, according to predetermined rules, prosody control parameters such as the accent strength and average mora length in the point pitch model. It then supplies linguistic information including at least the accent phrase boundary positions, readings, accent types, and prosody control parameters to the phoneme parameter generation unit 3 and the prosody control parameter changing unit 4.

[0039] More specifically, in the language processing unit 1, the text analysis unit 12 refers as needed to the dictionary in the dictionary storage unit 11, which describes, for example, the notation, reading, part of speech, and basic accent type of each morpheme, and determines the accent type and reading of each accent phrase in the input text. The prosody control parameter generation unit 14 then determines the prosody control parameters from that accent phrase sequence by referring to the prosody rules in the prosody rule storage unit 13, and outputs them as linguistic information together with the accent phrase boundary positions, accent types, and readings.

[0040] The operation of the language processing unit 1 will now be described in detail using a specific example in which the input text is the sentence 「明日家族みんなで遊園地に行くの。」 ("Tomorrow the whole family is going to the amusement park."). FIG. 5 shows an example of the accent phrase sequence produced by the text analysis unit, and FIG. 6 shows an example of the linguistic information produced by the language processing unit.

[0041] First, the text analysis unit 12 refers to the dictionary in the dictionary storage unit 11 and determines the accent phrase boundary positions (the positions of 「／」 in the string) 「明日／家族／みんなで／遊園地に／行くの」 ("tomorrow / family / everyone / to the amusement park / going"), together with the accent type and reading of each accent phrase. It supplies the resulting accent phrase sequence, consisting of accent phrases 1 to 5 as shown in FIG. 5, to the prosody control parameter generation unit 14. Next, the prosody control parameter generation unit 14 refers to the prosody rules on the basis of the supplied accent phrase sequence, determines prosody control parameters such as the accent strength in the point pitch model and the average mora length, and outputs them as linguistic information together with the accent phrase boundary positions, accent types, readings, and so on, as shown in FIG. 6. Although the accent phrase boundary positions are not stated explicitly in FIG. 6, the accent type and other attributes are listed for each accent phrase, so the boundary positions are effectively given.
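The segmentation step in [0041] can be pictured with a short sketch. It assumes, purely for illustration, that the text analysis output is a string whose boundaries are marked by 「／」 as in the text; `split_accent_phrases` is a hypothetical helper, not part of the described apparatus. It numbers the accent phrases from 1 as in FIG. 5 and records where each phrase starts in the unsegmented sentence, which is one possible representation of an accent phrase boundary position.

```python
# Hypothetical helper: turn a "／"-delimited accent phrase string into an
# ordered accent phrase sequence with boundary positions.
def split_accent_phrases(segmented: str, sep: str = "／") -> list[tuple[int, str, int]]:
    """Return (order, surface, start_offset) triples.

    order:        1-based position of the accent phrase (as in FIG. 5)
    surface:      the accent phrase text
    start_offset: character index where the phrase begins in the
                  sentence with the separators removed
    """
    phrases = [p for p in segmented.split(sep) if p]
    result = []
    offset = 0
    for order, phrase in enumerate(phrases, start=1):
        result.append((order, phrase, offset))
        offset += len(phrase)
    return result


if __name__ == "__main__":
    for order, phrase, start in split_accent_phrases("明日／家族／みんなで／遊園地に／行くの"):
        print(order, phrase, start)
```

Keeping the start offset alongside each phrase means the boundary positions survive even when, as in FIG. 6, only per-phrase attributes are listed.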

[0042] In this way, the language processing unit 1 analyzes the input text and outputs linguistic information.

[0043] Next, based on the supplied linguistic information, the phoneme parameter generation unit 3 reads from the speech synthesis unit storage unit 2 the speech synthesis units that match the reading of each accent phrase in the input text, and supplies them to the speech synthesizer 7 as phoneme parameters.

[0044] When an operation input corresponding to the user's operation of an input means (not shown), such as a lever, is supplied during voice output (that is, while synthesis of the input text is only partly complete), the prosody control parameter changing unit 4 changes the values of the prosody control parameters of the subsequent accent phrase (the accent phrase following the one currently being output) in the supplied linguistic information, and supplies the resulting applied linguistic information, with its modified prosody control parameters, to the prosody parameter generation unit 6. When no operation input is supplied, the prosody control parameter changing unit 4 passes the supplied linguistic information to the prosody parameter generation unit 6 unchanged.

[0045] The operation of the prosody control parameter changing unit 4 will now be described in detail, taking the linguistic information of FIG. 6 as a specific example. It is assumed that the operation input is produced by the user operating an input means (not shown) that has two levers, one for changing the accent strength and one for changing the average mora length of the subsequent accent phrase.

[0046] Suppose, for example, that during voice output of the second accent phrase 「家族」 ("family") in the linguistic information of FIG. 6, the lever is operated and an operation input instructing an increase in accent strength of 0.2 octave (oct) is supplied. The accent strength of the subsequent accent phrase 「みんなで」 ("everyone") is then changed from 0.3 octave to 0.5 octave.

[0047] In this way, the prosody control parameter changing unit 4 in principle supplies the linguistic information from the language processing unit 1 to the prosody parameter generation unit 6 unchanged; when there is an operation input, it instead supplies applied linguistic information, that is, linguistic information whose prosody control parameters have been changed according to that operation input.
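The pass-through-or-modify behavior of the prosody control parameter changing unit 4 ([0044] to [0047]) can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout, the field names, and every value except the 0.3-octave accent strength of 「みんなで」 are hypothetical, and the operation input is reduced to an accent-strength increment plus a mora-length scale.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AccentPhrase:
    """One accent phrase of the linguistic information (hypothetical layout)."""
    reading: str
    accent_type: int
    accent_strength_oct: float  # accent strength in the point pitch model, octaves
    avg_mora_length_ms: float   # average mora length, milliseconds


def apply_operation_input(phrases, current_index, strength_delta_oct=0.0, mora_scale=1.0):
    """Return the applied linguistic information.

    Only the accent phrase that follows the one being output
    (phrases[current_index]) is modified. With no operation input, or when
    no subsequent phrase exists, the list is passed through unchanged.
    """
    nxt = current_index + 1
    if nxt >= len(phrases) or (strength_delta_oct == 0.0 and mora_scale == 1.0):
        return list(phrases)
    target = phrases[nxt]
    changed = replace(
        target,
        accent_strength_oct=target.accent_strength_oct + strength_delta_oct,
        avg_mora_length_ms=target.avg_mora_length_ms * mora_scale,
    )
    return list(phrases[:nxt]) + [changed] + list(phrases[nxt + 1:])


if __name__ == "__main__":
    # Hypothetical values, except 0.3 oct for "miNnade" taken from [0046].
    info = [
        AccentPhrase("ashita", 3, 0.4, 150.0),
        AccentPhrase("kazoku", 1, 0.4, 150.0),
        AccentPhrase("miNnade", 0, 0.3, 150.0),
    ]
    # Lever moved by +0.2 oct while phrase index 1 ("kazoku") is being output.
    applied = apply_operation_input(info, current_index=1, strength_delta_oct=0.2)
    print(applied[2].accent_strength_oct)
```

Because the records are frozen and a new list is returned, the original linguistic information is never mutated, which mirrors the text's distinction between linguistic information and applied linguistic information.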

[0048] Next, the prosody parameter generation unit 6 reads the prosody parameter generation rule from the prosody parameter generation rule storage unit 5 and generates prosody parameters based on the linguistic information or the applied linguistic information supplied by the prosody control parameter changing unit 4. That is, when applied linguistic information is supplied, the prosody parameters are generated from that applied linguistic information.

[0049] The prosody parameter generation unit 6 generates prosody parameters one accent phrase at a time. A point pitch model is used to generate the pitch pattern: point pitches are generated to match the number of morae, the accent type, and the accent strength contained in the linguistic information or the applied linguistic information. Phoneme durations are generated with an equal-interval rhythm perception point model, using the average mora length contained in the linguistic information or the applied linguistic information as the time interval between rhythm perception points.
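A simplified reading of the duration side of [0049] is sketched below. It assumes that each rhythm perception point sits at the onset of a mora's vowel (or of the moraic nasal /N/), that consecutive points are spaced by the average mora length, and that consonant durations are fixed, as [0055] states for the models of both embodiments. The 60 ms consonant length and the function name are hypothetical; the actual model and its constants are not given in this text.

```python
FIXED_CONSONANT_MS = 60.0  # hypothetical fixed consonant length


def phoneme_durations(morae, avg_mora_ms, consonant_ms=FIXED_CONSONANT_MS):
    """Assign durations so that rhythm perception points are equally spaced.

    morae: list of (consonant, vowel) pairs; consonant is "" for morae such
    as /N/ that have no onset consonant. Each rhythm perception point is
    taken to be the start of the vowel (or /N/), so the vowel of mora k plus
    the consonant of mora k+1 fill exactly one interval of avg_mora_ms.
    The final vowel is simply given one full interval.
    """
    if avg_mora_ms <= consonant_ms:
        raise ValueError("average mora length must exceed the consonant length")
    out = []
    for k, (cons, vowel) in enumerate(morae):
        if cons:
            out.append((cons, consonant_ms))
        next_has_cons = k + 1 < len(morae) and bool(morae[k + 1][0])
        out.append((vowel, avg_mora_ms - (consonant_ms if next_has_cons else 0.0)))
    return out


if __name__ == "__main__":
    minnade = [("m", "i"), ("", "N"), ("n", "a"), ("d", "e")]
    print(phoneme_durations(minnade, 150.0))
    # [('m', 60.0), ('i', 150.0), ('N', 90.0), ('n', 60.0), ('a', 90.0), ('d', 60.0), ('e', 150.0)]
```

With these numbers the vowel onsets fall at 60, 210, 360, and 510 ms, one average mora length apart, which is the equal-spacing property the model is named for.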

[0050] FIG. 7 compares pitch patterns with and without a change in the prosody control parameter (accent strength). If, for the linguistic information of FIG. 6, an operation input increasing the accent strength by 0.2 octave is made during voice output of the accent phrase 「家族」 and applied linguistic information is supplied, the accent strength in the pitch pattern of the accent phrase 「みんなで」 (reading: miNnade) is increased by 0.2 octave, as shown in FIG. 7. Without an operation input, the accent strength remains at 0.3 octave, based on the linguistic information of FIG. 6, as also shown in FIG. 7. The vertices of the point pitch pattern are placed at the vowel center points. When an operation input is made, the pitch rises as shown in the figure and the accent phrase is emphasized.
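To make the FIG. 7 comparison concrete, here is a minimal point-pitch sketch. It assumes one pitch point per mora placed at the vowel center, a standard Tokyo-dialect high/low pattern derived from the accent type, and a point pitch equal to a baseline plus the accent strength on high morae. The 120 Hz baseline and the accent type used for 「みんなで」 are hypothetical, and the point pitch model referred to in the text may include terms not modeled here.

```python
def high_low_pattern(n_morae: int, accent_type: int) -> list[bool]:
    """Tokyo-dialect style H/L pattern: True means a high mora.

    type 0 (flat): first mora low, the rest high
    type 1: first mora high, the rest low
    type n >= 2: first mora low, morae 2..n high, the rest low
    """
    if accent_type == 0:
        return [k > 0 for k in range(n_morae)]
    if accent_type == 1:
        return [k == 0 for k in range(n_morae)]
    return [0 < k < accent_type for k in range(n_morae)]


def point_pitches_hz(n_morae, accent_type, strength_oct, base_hz=120.0):
    """One point pitch per mora (at the vowel center), in Hz.

    High morae are raised by strength_oct octaves above base_hz.
    """
    return [base_hz * 2.0 ** (strength_oct if high else 0.0)
            for high in high_low_pattern(n_morae, accent_type)]


if __name__ == "__main__":
    # "miNnade" has 4 morae; accent type 0 is a hypothetical choice here.
    before = point_pitches_hz(4, 0, 0.3)  # no operation input
    after = point_pitches_hz(4, 0, 0.5)   # +0.2 oct from the lever
    for b, a in zip(before, after):
        print(f"{b:7.1f} Hz -> {a:7.1f} Hz")
```

Working in octaves makes the +0.2 oct change a constant frequency ratio of 2^0.2 (about 1.15) on every high mora, regardless of the baseline.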

[0051] FIG. 8 compares phoneme durations with and without a change in the prosody control parameter (average mora length). If, likewise, an operation input lengthening the average mora length by 25 percent is made during voice output of the accent phrase 「家族」 and applied linguistic information is supplied, the vowel durations in the accent phrase 「みんなで」 become 25 percent longer, as shown in FIG. 8(b), and that accent phrase is emphasized. Without an operation input, the phoneme durations are as shown in FIG. 8(a), based on the linguistic information of FIG. 6.

[0052] In this way, prosody parameters such as the pitch pattern and the phoneme durations are generated from the linguistic information or the applied linguistic information and supplied to the speech synthesizer 7.

[0053] The speech synthesizer 7 then generates and outputs synthesized speech by a predetermined speech synthesis method, based on the phoneme parameters from the phoneme parameter generation unit 3 and the prosody parameters from the prosody parameter generation unit 6. For example, when a method that uses speech waveforms directly, such as PSOLA, is employed, the speech waveforms input as phoneme parameters are modified according to the prosody parameters, for example by cutting out segments and overlap-adding them, and the modified waveforms are concatenated directly to produce the synthesized speech.
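The PSOLA step in [0053] can be illustrated with a deliberately simplified time-domain PSOLA (TD-PSOLA) sketch. It assumes a signal with a constant, known pitch period and equally spaced pitch marks; practical systems derive pitch marks from the waveform and also repeat or drop grains to follow the duration parameters, which this sketch does not attempt. The function names are hypothetical.

```python
import math


def hann(n):
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def td_psola(signal, period, new_period):
    """Change the pitch of a constant-period signal, keeping its length.

    Two-period, Hann-windowed grains are cut around analysis pitch marks
    (spaced `period` samples apart) and overlap-added around synthesis
    marks spaced `new_period` samples apart. A smaller new_period raises
    the pitch; each synthesis mark reuses the nearest analysis grain.
    """
    n = len(signal)
    marks = list(range(period, n - period, period))  # analysis pitch marks
    if not marks or new_period <= 0:
        raise ValueError("signal too short or invalid period")
    window = hann(2 * period + 1)
    out = [0.0] * n
    t = marks[0]
    while t <= marks[-1]:
        m = min(marks, key=lambda a: abs(a - t))  # nearest analysis mark
        for k in range(2 * period + 1):
            out[t - period + k] += signal[m - period + k] * window[k]
        t += new_period
    return out


if __name__ == "__main__":
    sr, period = 8000, 80  # 100 Hz tone sampled at 8 kHz
    x = [math.sin(2.0 * math.pi * i / period) for i in range(sr // 4)]
    y = td_psola(x, period, new_period=64)  # roughly 125 Hz
    print(len(y) == len(x))
```

With `new_period == period`, the half-overlapping Hann windows sum to one, so the interior of the signal is reconstructed exactly; that identity is a convenient sanity check before changing the pitch.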

[0054] As described above, according to the first embodiment, the prosody control parameters in the linguistic information are changed according to an operation input made during voice output, and prosody parameters are generated from the linguistic information according to the prosody parameter generation rule. The user can therefore change prosody such as the pitch pattern and phoneme durations by operating the device while speech is being output, for example to emphasize part of an utterance, and the speech synthesis apparatus can serve as a highly expressive substitute for conversation.

[0055] Embodiment 2. FIG. 9 is a block diagram showing the configuration of a speech synthesis apparatus according to Embodiment 2 of the present invention. In FIG. 9, reference numeral 21 denotes a multiple prosody parameter generation rule storage unit that stores a plurality of prosody parameter generation rules for generating prosody parameters from linguistic information. As in the first embodiment, each prosody parameter generation rule has a model and a conversion rule. In the second embodiment, the model is the same in all prosody parameter generation rules, so the rules differ only in their conversion rules. FIG. 10 shows an example of several kinds of conversion rules; a "−" in FIG. 10 means that the prosody control parameter contained in the linguistic information is used as-is as the model parameter. As in the first embodiment, the model included in each prosody parameter generation rule uses a point pitch model to generate the pitch pattern and an equal-interval rhythm perception point model with fixed consonant length to generate phoneme durations.

[0056] Returning to FIG. 9, reference numeral 22 denotes a prosody parameter generation rule selection unit that selects one of the plurality of prosody parameter generation rules according to a rule selection input (operation input) made during voice output. Reference numeral 6 denotes a prosody parameter generation unit that generates prosody parameters from the linguistic information according to the prosody parameter generation rule selected by the prosody parameter generation rule selection unit 22.

[0057] The prosody parameter generation unit 6 and the prosody parameter generation rule selection unit 22 together constitute a prosody parameter control unit 81A, which generates prosody parameters from the linguistic information according to a prosody parameter generation rule and changes subsequent prosody parameters according to an operation input made during voice output.

[0058] The other components in FIG. 9 are the same as those of the first embodiment (FIG. 1), and their description is omitted.

[0059] Next, the operation will be described. The operation of the language processing unit 1 and the phoneme parameter generation unit 3 is the same as in the first embodiment, and its description is omitted.

[0060] When the user's rule selection input is supplied during voice output, the prosody parameter generation rule selection unit 22 reads the corresponding prosody parameter generation rule from the multiple prosody parameter generation rule storage unit 21 and supplies it to the prosody parameter generation unit 6. When no rule selection input is supplied, a predetermined prosody parameter generation rule is supplied as the default.

[0061] For example, the correspondence between rule numbers and conversion rule contents shown in FIG. 10 may be displayed on a screen, and when the user selects a rule number, the selection is supplied to the prosody parameter generation rule selection unit 22 as the rule selection input.

[0062] The prosody parameter generation unit 6 then generates prosody parameters from the linguistic information supplied by the language processing unit 1, according to the prosody parameter generation rule supplied by the prosody parameter generation rule selection unit 22. For example, if the linguistic information of FIG. 6 is supplied to the prosody parameter generation unit 6 and a rule selection input for rule number 2 is supplied to the prosody parameter generation rule selection unit 22 during voice output of the accent phrase 「家族」, the accent strength of the subsequent accent phrase 「みんなで」 is increased by 0.2 octave, and the same pitch pattern and phoneme durations are generated as in the case with an operation input in FIG. 7.
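The rule-selection path of [0060] to [0062] can be sketched with a small lookup table. Everything in the table is hypothetical except that, as [0062] implies, rule number 2 raises the accent strength by 0.2 octave; `None` plays the role of the "−" entries in FIG. 10 (use the linguistic-information value as-is), and rule 1 is assumed to be the default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProsodyControl:
    accent_strength_oct: float
    avg_mora_length_ms: float


# Hypothetical conversion rules; None means "use the value as-is" (the "−" entries).
CONVERSION_RULES = {
    1: {"strength_delta_oct": None, "mora_scale": None},
    2: {"strength_delta_oct": 0.2, "mora_scale": None},
    3: {"strength_delta_oct": None, "mora_scale": 1.25},
}
DEFAULT_RULE = 1


def select_rule(rule_selection: Optional[int]) -> dict:
    """Return the selected conversion rule, or the default when none is chosen."""
    return CONVERSION_RULES[DEFAULT_RULE if rule_selection is None else rule_selection]


def convert(params: ProsodyControl, rule: dict) -> ProsodyControl:
    """Map prosody control parameters to model parameters under one rule."""
    strength = params.accent_strength_oct
    if rule["strength_delta_oct"] is not None:
        strength += rule["strength_delta_oct"]
    mora = params.avg_mora_length_ms
    if rule["mora_scale"] is not None:
        mora *= rule["mora_scale"]
    return ProsodyControl(strength, mora)


if __name__ == "__main__":
    minnade = ProsodyControl(accent_strength_oct=0.3, avg_mora_length_ms=150.0)
    print(convert(minnade, select_rule(None)))  # default rule: unchanged
    print(convert(minnade, select_rule(2)))     # accent strength raised by 0.2 oct
```

Because the model is shared by all rules in this embodiment, swapping rules only swaps this small conversion step; the pitch and duration generators downstream stay the same.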

[0063] The speech synthesizer 7 then synthesizes and outputs speech based on the phoneme parameters from the phoneme parameter generation unit 3 and the prosody parameters from the prosody parameter generation unit 6, in the same manner as in the first embodiment.

[0064] As described above, according to the second embodiment, one prosody parameter generation rule is selected from a plurality of rules according to an operation input made during voice output, and prosody parameters are generated from the linguistic information according to the selected rule. Prosody such as the pitch pattern and phoneme durations therefore changes in response to the user's operation while speech is being output, the user can, for example, emphasize part of the utterance, and the speech synthesis apparatus can serve as a highly expressive substitute for conversation.

[0065] Embodiment 3. FIG. 11 is a block diagram showing the configuration of a speech synthesis apparatus according to Embodiment 3 of the present invention. In FIG. 11, reference numeral 31 denotes a prosody parameter changing unit that changes prosody parameters according to a prosody change operation input (first operation input) made during voice output. The prosody parameter generation unit 6 and the prosody parameter changing unit 31 together constitute a prosody parameter control unit 81B, which generates prosody parameters from the linguistic information according to the prosody parameter generation rule and changes subsequent prosody parameters according to an operation input made during voice output.

[0066] The other components in FIG. 11 are the same as those of the first or second embodiment (FIGS. 1 and 9), and their description is omitted.

[0067] Next, the operation will be described. The operation of the language processing unit 1 and the phoneme parameter generation unit 3 is the same as in the first embodiment, and its description is omitted.

[0068] The prosody parameter generation unit 6 reads the prosody parameter generation rule from the prosody parameter generation rule storage unit 5, generates prosody parameters based on the linguistic information from the language processing unit 1, and supplies them to the prosody parameter changing unit 31. For example, when the linguistic information of FIG. 6 is supplied to the prosody parameter generation unit 6, the same pitch pattern and phoneme durations are generated for the subsequent accent phrase 「みんなで」 as in the case without an operation input in FIG. 7.

[0069] When the user's prosody change operation input is supplied during voice output, the prosody parameter changing unit 31 changes the prosody parameters from the prosody parameter generation unit 6 according to that input, only for as long as the input continues to be supplied, and supplies the changed prosody parameters to the speech synthesizer 7.

[0070] The operation of the prosody parameter changing unit 31 will now be described in detail using a specific example. In the third embodiment, the prosody change operation input is assumed to be supplied in response to the user's operation of an input means (not shown) having two levers, one for changing the pitch and one for changing the phoneme duration of the subsequent speech. FIG. 12 shows an example of a pitch pattern changed according to a prosody change operation input.

[0071] In this case, while the user operates the corresponding lever, the operation amount (for example, the lever angle) is supplied as the prosody change operation input, and the pitch or phoneme duration of the speech is changed by a ratio corresponding to that amount.
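[0069] and [0071] describe a change that lasts only while a lever is held and whose size follows the lever angle. A per-frame sketch for the pitch lever is shown below; the linear angle-to-octave mapping, the 30-degree travel, the 0.5-octave full-scale change, and the frame representation are all hypothetical.

```python
MAX_ANGLE_DEG = 30.0  # hypothetical full lever travel
MAX_PITCH_OCT = 0.5   # hypothetical pitch change at full travel, octaves


def pitch_ratio(angle_deg: float) -> float:
    """Map a lever angle to a frequency ratio (linear in octaves, clamped)."""
    a = max(-MAX_ANGLE_DEG, min(MAX_ANGLE_DEG, angle_deg))
    return 2.0 ** (MAX_PITCH_OCT * a / MAX_ANGLE_DEG)


def apply_pitch_lever(f0_hz, lever_angles):
    """Scale each F0 frame only while the lever is held.

    lever_angles holds one entry per frame: an angle in degrees while the
    lever is operated, or None when it is released (no change).
    """
    return [f if ang is None else f * pitch_ratio(ang)
            for f, ang in zip(f0_hz, lever_angles)]


if __name__ == "__main__":
    f0 = [120.0, 125.0, 130.0, 125.0]
    angles = [None, 15.0, 30.0, None]
    print(apply_pitch_lever(f0, angles))
```

Mapping the angle to octaves rather than to hertz keeps the perceived size of a given lever movement roughly the same for low and high voices.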

[0072] For example, assume that phoneme duration changes take effect only while a vowel is being output. Suppose the linguistic information of FIG. 6 has been supplied to the prosody parameter generation unit 6 and prosody parameters have been generated. If, to emphasize 「みんな」 in the accent phrase 「みんなで」, the user operates the pitch lever so that the pitch rises during voice output of the phoneme sequence "miNna", the pitch pattern is changed accordingly, for example as shown in FIG. 12. Likewise, if, to lengthen 「み」 and emphasize 「みんなで」, the user operates the duration lever during voice output of the syllable "mi" so that, for example, its duration doubles, the duration of the vowel /i/ is changed to 300 milliseconds accordingly.
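The duration half of the [0072] example, in which durations change only while a vowel is being output, can be sketched as a filter over phonemes. The 150 ms starting value for /i/ is inferred from the text (doubling it gives the stated 300 ms); the other durations and the data layout are hypothetical.

```python
VOWELS = {"a", "i", "u", "e", "o"}


def apply_duration_lever(phonemes, ratio):
    """Scale the durations of vowels output while the lever is held.

    phonemes: list of (symbol, duration_ms, lever_held) triples.
    Consonants, and vowels output with the lever released, are unchanged.
    """
    return [(p, ms * ratio if held and p in VOWELS else ms)
            for p, ms, held in phonemes]


if __name__ == "__main__":
    mi_n_na = [("m", 60.0, True), ("i", 150.0, True), ("N", 90.0, False),
               ("n", 60.0, False), ("a", 90.0, False)]
    print(apply_duration_lever(mi_n_na, 2.0))
    # [('m', 60.0), ('i', 300.0), ('N', 90.0), ('n', 60.0), ('a', 90.0)]
```

Note that /m/ is untouched even though the lever is already held during it, which is the "effective only at the time of outputting a vowel" condition from the text.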

[0073] The speech synthesizer 7 then synthesizes and outputs speech based on the phoneme parameters from the phoneme parameter generation unit 3 and the prosody parameters from the prosody parameter changing unit 31.

[0074] As described above, according to the third embodiment, the prosody parameters generated by the prosody parameter generation unit 6 are changed according to a prosody change operation input made during voice output. The user can therefore immediately emphasize part of the synthesized speech being output, and the speech synthesis apparatus can serve as a highly expressive substitute for conversation.

[0075] Embodiment 4. FIG. 13 is a block diagram showing the configuration of a speech synthesis apparatus according to Embodiment 4 of the present invention. In FIG. 13, 6A denotes a boundary information/prosody parameter generation unit (prosody parameter generation unit) that generates prosody parameters from the linguistic information according to the prosody parameter generation rule and, based on the generated prosody parameters and the linguistic information, generates boundary information containing the timings of accent phrase boundaries and syllable boundaries. 31A denotes a section-designated prosody parameter changing unit (prosody parameter changing unit) that, from among the changeable portions of the prosody parameters determined on the basis of the boundary information, selects the portion to be changed according to a prosody change section input (second operation input) made during voice output, and changes the prosody parameters of that portion according to a prosody change operation input (first operation input) made during voice output. The boundary information/prosody parameter generation unit 6A and the section-designated prosody parameter changing unit 31A together constitute a prosody parameter control unit 81C, which generates prosody parameters from the linguistic information according to the prosody parameter generation rule and changes subsequent prosody parameters according to an operation input made during voice output.

[0076] The other components in FIG. 13 are the same as those of the third embodiment (FIG. 11), and their description is omitted.

[0077] Next, the operation will be described. The operation of the language processing unit 1 and the phoneme parameter generation unit 3 is the same as in the first and third embodiments, and its description is omitted.

[0078] The boundary information/prosody parameter generation unit 6A reads the prosody parameter generation rule from the prosody parameter generation rule storage unit 5 and generates prosody parameters based on the linguistic information from the language processing unit 1. From that linguistic information and the phoneme durations among the generated prosody parameters, it generates boundary information containing the times of the accent phrase boundaries and syllable boundaries, and supplies the prosody parameters and the boundary information to the section-designated prosody parameter changing unit 31A.

[0079] For example, when the linguistic information of FIG. 6 is supplied to the boundary information/prosody parameter generation unit 6A, the same pitch pattern and phoneme durations are generated for the subsequent accent phrase 「みんなで」 as in the case without an operation input in FIG. 7. FIG. 14 shows an example of the boundary information generated by the boundary information/prosody parameter generation unit 6A from the linguistic information of FIG. 6: FIG. 14(a) shows an example of the accent phrase boundary times, and FIG. 14(b) shows an example of the syllable boundary times of the accent phrase 「みんなで」.
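The boundary information of [0078] and FIG. 14 amounts to running sums of the generated phoneme durations. The sketch below assumes durations are available per phoneme, grouped by syllable and by accent phrase; the nested-list layout and the sample numbers are hypothetical and are not the values of FIG. 14.

```python
def boundary_info(phrases):
    """Compute accent phrase and syllable boundary times (ms).

    phrases: list of accent phrases; each accent phrase is a list of
    syllables; each syllable is a list of phoneme durations in ms.
    Returns (phrase_bounds, syllable_bounds), where phrase_bounds[i] is the
    (start, end) time of accent phrase i and syllable_bounds[i] lists the
    syllable boundary times inside it, including its start and end.
    """
    t = 0.0
    phrase_bounds, syllable_bounds = [], []
    for syllables in phrases:
        start = t
        bounds = [t]
        for phoneme_ms in syllables:
            t += sum(phoneme_ms)
            bounds.append(t)
        phrase_bounds.append((start, t))
        syllable_bounds.append(bounds)
    return phrase_bounds, syllable_bounds


if __name__ == "__main__":
    # Hypothetical durations for two accent phrases, "ka zo ku" and "mi N na de".
    utterance = [
        [[60.0, 90.0], [60.0, 90.0], [60.0, 150.0]],
        [[60.0, 150.0], [90.0], [60.0, 90.0], [60.0, 150.0]],
    ]
    phrases, syllables = boundary_info(utterance)
    print(phrases)       # [(0.0, 510.0), (510.0, 1170.0)]
    print(syllables[1])  # [510.0, 720.0, 810.0, 960.0, 1170.0]
```

Because the times are derived from the already generated durations, they line up with what the speech synthesizer will actually play, which is what lets the display track the syllable being output.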

[0080] The section-designated prosody parameter changing unit 31A then presents to the user, based on the supplied boundary information, candidate sections whose prosody parameters can be changed. It determines the section to be changed from the user's prosody change section input made during voice output, changes the prosody parameters of that section according to the user's prosody change operation input made during voice output, and supplies them to the speech synthesizer 7. If either the prosody change operation input or the prosody change section input is absent, the section-designated prosody parameter changing unit 31A supplies the prosody parameters from the boundary information/prosody parameter generation unit 6A to the speech synthesizer 7 unchanged.

[0081] The operation of the section-designated prosody parameter changing unit 31A will now be described in detail using a specific example. In the fourth embodiment, the prosody change section input is made on a display/input device (not shown), such as a display with a touch panel laid over its screen: when the user touches an accent phrase or syllable displayed on the screen, that accent phrase or syllable is supplied as the prosody change section input. The prosody change operation input in the fourth embodiment is assumed to be supplied in response to the user's operation of an input means (not shown) having two levers, one for changing the pitch and one for changing the phoneme duration of the speech in the designated section.

[0082] FIG. 15 shows an example of the display screen of the display/input device used for section input. As shown in FIG. 15, the syllables of the text being output as speech are displayed accent phrase by accent phrase, and a cursor is displayed at the syllable currently being output. A prosody change section is designated, for example, by pressing one displayed syllable to designate a single syllable, by pressing the "*" at the beginning of a line to designate the entire accent phrase, or by pressing two displayed syllables to designate the syllable string from one to the other, and the designation is supplied as the prosody change section input.
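The three ways of designating a section in [0082] can be reduced to one function that turns the pressed items into an inclusive syllable range within an accent phrase. The representation of presses (syllable indices, or "*" for the leading asterisk) is a hypothetical choice for illustration.

```python
def resolve_section(n_syllables, pressed):
    """Return the designated section as an inclusive (start, end) index pair.

    pressed: ["*"] for the whole accent phrase, [i] for a single syllable,
    or [i, j] for the syllable string from i to j (order does not matter).
    """
    if pressed == ["*"]:
        return (0, n_syllables - 1)
    if len(pressed) in (1, 2) and all(
            isinstance(i, int) and 0 <= i < n_syllables for i in pressed):
        return (min(pressed), max(pressed))
    raise ValueError("unsupported designation")


if __name__ == "__main__":
    syllables = ["mi", "N", "na", "de"]
    start, end = resolve_section(len(syllables), [0, 2])  # press "mi" and "na"
    print(syllables[start:end + 1])  # ['mi', 'N', 'na']
```

Using `min` and `max` means the two syllables can be pressed in either order, and a single press naturally collapses to a one-syllable range.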

[0083] FIG. 16 shows an example of designating a prosody change section on the display of FIG. 15. When the two syllables 「み」 (mi) and 「な」 (na) are pressed as shown in FIG. 16, the syllable string 「みんな」 (miNna) becomes the target whose prosody is changed according to the prosody change operation input. In the fourth embodiment, the pitch or phoneme duration of the speech in the designated section is changed by a ratio corresponding to a fixed value specified by the prosody change operation input (for example, the maximum change in lever angle).

【００８４】例えば上記の場合において、音韻継続時間
長の変更を母音および撥音についてのみ有効なものとす
ると、音韻継続時間長の変更に対応するレバーが時間長
を５０パーセント長くするように操作されたときには、
区間「みんな」に含まれる／ｉ／、／Ｎ／、および／ａ
／の時間長が変更される。図１７は、時間長を５０パー
セント長くするように操作された場合と操作されない場
合の音節列「みんなで」の音韻継続時間長の一例を示す
図である。韻律変更区間入力がない場合には、図１７
（ａ）に示すように韻律変更操作入力は無効となり、境
界情報／韻律パラメータ生成部６Ａにより生成された韻
律パラメータがそのまま音声合成部７に供給され、韻律
変更区間入力がある場合には、韻律変更操作入力に対応
して図１７（ｂ）に示すように音韻継続時間長が変更さ
れ、変更後の韻律パラメータが音声合成部７に供給され
る。For example, in the above case, assume that the phoneme duration change applies only to vowels and the moraic nasal. When the phoneme duration lever is operated to lengthen durations by 50 percent, the durations of /i/, /N/, and /a/ in the section "minna" are changed. FIG. 17 shows an example of the phoneme durations of the syllable string "minna de" with and without an operation that lengthens durations by 50 percent. When there is no prosody change section input, the prosody change operation input is invalidated, as shown in FIG. 17(a). In that case the prosody parameters generated by the boundary information/prosody parameter generation unit 6A are supplied to the speech synthesis unit 7 unchanged. When there is a prosody change section input, the phoneme durations are changed according to the prosody change operation input, as shown in FIG. 17(b), and the changed prosody parameters are supplied to the speech synthesis unit 7.
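The duration change illustrated by FIG. 17 can be sketched as below. This is a minimal sketch under three assumptions: phonemes are given as (symbol, duration in ms) pairs, the section is a half-open phoneme index range, and only vowels and the moraic nasal /N/ are affected, as in the example above. The durations in the usage line are invented for illustration and are not taken from FIG. 17.

```python
from typing import List, Optional, Tuple

VOWELS = {"a", "i", "u", "e", "o"}
MORAIC_NASAL = "N"

def change_section_durations(
    phonemes: List[Tuple[str, float]],
    section: Optional[Tuple[int, int]],
    ratio: float,
) -> List[Tuple[str, float]]:
    """Scale the durations of vowels and /N/ inside the section by ratio.

    With no section input the operation input is ignored and the
    durations pass through unchanged (the FIG. 17(a) case).
    """
    if section is None:
        return list(phonemes)
    start, end = section
    out = []
    for k, (ph, dur) in enumerate(phonemes):
        if start <= k < end and (ph in VOWELS or ph == MORAIC_NASAL):
            dur *= ratio
        out.append((ph, dur))
    return out

# "minna de" = /m i N n a d e/; the durations in ms are hypothetical.
minnade = [("m", 50), ("i", 60), ("N", 80), ("n", 40), ("a", 80), ("d", 30), ("e", 90)]
print(change_section_durations(minnade, (0, 5), 1.5))
# [('m', 50), ('i', 90.0), ('N', 120.0), ('n', 40), ('a', 120.0), ('d', 30), ('e', 90)]
```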

【００８５】そして音声合成部７は音韻パラメータ生成
部３からの音韻パラメータと区間指定韻律パラメータ変
更部３１Ａからの韻律パラメータとに基づいて実施の形
態１と同様にして音声を合成し出力する。The speech synthesis unit 7 synthesizes and outputs speech based on the phoneme parameters from the phoneme parameter generation unit 3 and the prosody parameters from the section designation prosody parameter change unit 31A in the same manner as in the first embodiment.

【００８６】以上のように、この実施の形態４によれ
ば、生成した韻律パラメータと言語情報とに基づいてア
クセント句境界および音節境界のタイミングを有する境
界情報を生成し、その境界情報に基づいて決定される韻
律パラメータの変更可能な部分から音声出力中の韻律変
更区間入力に応じて韻律パラメータを変更する部分を選
択し、その部分の韻律パラメータを音声出力中の韻律変
更操作入力に応じて変更するようにしたので、音声の出
力中にユーザの操作に応じてピッチパタン、音韻継続時
間長などの韻律が変更され、音声出力中に音声の部分的
な強調などをユーザが行うことができ、高い表現性を有
する会話代替手段として本音声合成装置を使用すること
ができるという効果が得られる。As described above, the fourth embodiment generates boundary information from the generated prosody parameters and the linguistic information. This boundary information contains the timings of accent phrase boundaries and syllable boundaries, and it determines which portions of the prosody parameters can be changed. From those changeable portions, the portion to be changed is selected according to the prosody change section input during speech output. The prosody parameters of that portion are then changed according to the prosody change operation input during speech output. Prosody such as the pitch pattern and phoneme duration therefore follows the user's operation while speech is being output. The user can, for example, partially emphasize the speech during output, and the present speech synthesizer can be used as a highly expressive alternative means of conversation.

【００８７】実施の形態５．図１８はこの発明の実施の
形態５による音声合成装置の構成を示すブロック図であ
る。図１８において、４１は例えば普通の声、どなり
声、ささやき声など声質や音響的な特徴が異なる複数の
音声合成単位をそれぞれ音声合成単位セットとして記憶
する音声合成単位セット記憶部である。なおこの実施の
形態５における音声合成単位セットには、それぞれ番号
と音響的な特徴を表すコメントが付いているものとす
る。Embodiment 5. FIG. 18 is a block diagram showing the configuration of a speech synthesizer according to Embodiment 5 of the present invention. In FIG. 18, reference numeral 41 denotes a speech synthesis unit set storage unit. It stores several speech synthesis unit sets, each a collection of speech synthesis units with a particular voice quality and acoustic character, such as a normal voice, a shouting voice, or a whispering voice. Each speech synthesis unit set in the fifth embodiment is assumed to carry a number and a comment describing its acoustic features.

【００８８】４２は複数種類の音声合成単位セットから
音声出力中の音響特徴選択入力（操作入力）に応じて１
種類の音声合成単位セットを選択する音声合成単位セッ
ト選択部であり、３は音声合成単位セット選択部４２に
より選択された音声合成単位セットから言語情報に基づ
いて入力テキストに対応する音声合成単位を選択して音
韻パラメータとする音韻パラメータ生成部である。な
お、音韻パラメータ生成部３および音声合成単位セット
選択部４２は、言語情報に基づいて入力テキストに対応
する音声合成単位を選択して音韻パラメータとするとと
もに音声出力中の操作入力に応じて後続の音韻パラメー
タを変更する音韻パラメータ制御部８２を構成する。Reference numeral 42 denotes a speech synthesis unit set selection unit. It selects one of the multiple types of speech synthesis unit sets according to an acoustic feature selection input (operation input) made during speech output. Reference numeral 3 denotes a phoneme parameter generation unit. On the basis of the linguistic information, it selects from the set chosen by unit 42 the speech synthesis units corresponding to the input text and uses them as phoneme parameters. Together, the phoneme parameter generation unit 3 and the speech synthesis unit set selection unit 42 form a phoneme parameter control unit 82. Unit 82 selects the speech synthesis units corresponding to the input text as phoneme parameters on the basis of the linguistic information, and changes the subsequent phoneme parameters according to operation inputs during speech output.

【００８９】なお、図１８におけるその他の構成要素に
ついては実施の形態３によるもの（図１１）と同様であ
るのでその説明を省略する。The other components in FIG. 18 are the same as those according to the third embodiment (FIG. 11), and will not be described.

【００９０】次に動作について説明する。言語処理部１
および韻律パラメータ生成部６の動作については実施の
形態３と同様であり、韻律パラメータ生成部６により生
成された韻律パラメータが音声合成部７に供給される。Next, the operation will be described. The operations of the language processing unit 1 and the prosody parameter generation unit 6 are the same as in the third embodiment, and the prosody parameters generated by the prosody parameter generation unit 6 are supplied to the speech synthesis unit 7.

【００９１】音声合成単位セット選択部４２は、音声出
力中のユーザによる音響特徴選択入力を供給されると、
その音響特徴選択入力に応じて音声合成単位セット記憶
部４１から音声合成単位セットを読み出して保持する。
一方、ユーザによる音響特徴選択入力を供給されていな
い場合には、デフォルトの音声合成単位セットが音声合
成単位セット選択部４２により選択される。なお、後述
のように例えば画面において音声合成単位セットの番号
とコメントのリストをユーザに提示して、図示せぬ所定
の入力手段を操作してその番号を選択することにより、
音響特徴選択入力が音声合成単位セット選択部４２に供
給される。When the speech synthesis unit set selection unit 42 receives an acoustic feature selection input from the user during speech output, it reads the corresponding speech synthesis unit set from the speech synthesis unit set storage unit 41 and holds it. When no acoustic feature selection input is supplied, unit 42 selects the default speech synthesis unit set. As described later, the acoustic feature selection input can be produced, for example, by presenting a list of speech synthesis unit set numbers and comments on a screen. The user then selects a number by operating predetermined input means (not shown), and the resulting input is supplied to unit 42.

【００９２】ここで音声合成単位セット選択部４２の動
作について具体例を用いて詳しく説明する。図１９は音
声合成単位セット選択部４２による表示の一例を示す図
である。例えばディスプレイとその画面上に配置された
タッチパネル（上記入力手段）を表示入力装置として使
用し、図１９に示すように、画面下部に表示された番号
とコメントをユーザが押圧すると、その番号に対応する
音声合成単位セットの選択を指示する音響特徴選択入力
が音声合成単位セット選択部４２に供給される。また音
声合成単位セットのデフォルトは「通常の声」とされて
いる。Here, the operation of the speech synthesis unit set selection unit 42 will be described in detail using a specific example. FIG. 19 is a diagram showing an example of the display produced by the speech synthesis unit set selection unit 42. Suppose, for example, that a display with a touch panel (the above input means) arranged on its screen serves as the display input device. When the user presses a number and comment displayed at the bottom of the screen, as shown in FIG. 19, an acoustic feature selection input is supplied to unit 42. This input instructs unit 42 to select the speech synthesis unit set corresponding to that number. The default speech synthesis unit set is "normal voice".

【００９３】音声出力が行われている間、図１９に示す
ように出力中の音節にカーソルが順次移動する。例えば
音節「ゆ」にカーソルが移動する直前に「３：ささやき
声」の表示部分を押圧し、音節「に」にカーソルが移動
した時に離すことにより、「３：ささやき声」に対応す
る音声合成単位セットの選択を指示する音響特徴選択入
力が音声合成単位セット選択部４２に供給され、「ゆう
えんちに」という部分がささやき声で音声合成される。While speech is being output, the cursor moves in turn to each syllable being output, as shown in FIG. 19. Suppose the user presses the display portion "3: whispering voice" just before the cursor moves to the syllable "yu", and releases it when the cursor moves to the syllable "ni". An acoustic feature selection input selecting the speech synthesis unit set for "3: whispering voice" is then supplied to unit 42, and the portion "yuuenchi ni" ("to the amusement park") is synthesized in a whispering voice.

【００９４】このとき音声合成単位セット選択部４２
は、「３：ささやき声」の表示部分が押圧されている間
はささやき声の合成単位セットを選択し、それ以外の時
は通常の声の音声合成単位セットを選択し、保持する。At this time, the speech synthesis unit set selection unit 42 selects and holds the whispering-voice speech synthesis unit set while the display portion "3: whispering voice" is pressed. At all other times it selects and holds the normal-voice speech synthesis unit set.
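The press-and-hold behaviour of unit 42 can be sketched as follows. This sketch makes two hypothetical assumptions: each syllable's output onset time is known in milliseconds, and a syllable uses the held set when its onset falls within the press interval. The interval includes the release instant, so "ni" is still whispered as in the example above. The set names are placeholders for the FIG. 19 entries.

```python
from typing import List, Optional

def unit_set_per_syllable(
    onsets_ms: List[int],
    press_ms: Optional[int],
    release_ms: Optional[int],
    held_set: str,
    default_set: str = "normal voice",
) -> List[str]:
    """Choose a speech synthesis unit set for each syllable from a press/release interval."""
    chosen = []
    for t in onsets_ms:
        held = press_ms is not None and release_ms is not None and press_ms <= t <= release_ms
        chosen.append(held_set if held else default_set)
    return chosen

syllables = ["ゆ", "う", "え", "ん", "ち", "に", "い", "く", "の"]
onsets = [150 * k for k in range(len(syllables))]  # hypothetical timing
sets = unit_set_per_syllable(onsets, press_ms=-10, release_ms=750, held_set="whispering voice")
print([s for s, u in zip(syllables, sets) if u == "whispering voice"])
# ['ゆ', 'う', 'え', 'ん', 'ち', 'に']
```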

【００９５】そして音韻パラメータ生成部３は、言語処
理部１からの言語情報に基づいて、音声合成単位セット
選択部４２に保持されている音声合成単位セットから言
語情報に含まれる入力テキストの読みに適合する音声合
成単位を選択し、音韻パラメータとして音声合成部７に
供給する。Then, on the basis of the linguistic information from the language processing unit 1, the phoneme parameter generation unit 3 selects speech synthesis units from the set held in the speech synthesis unit set selection unit 42. It chooses the units that match the reading of the input text contained in the linguistic information, and supplies them to the speech synthesis unit 7 as phoneme parameters.

【００９６】そして音声合成部７は音韻パラメータ生成
部３からの音韻パラメータと韻律パラメータ生成部６か
らの韻律パラメータとに基づいて実施の形態１と同様に
して音声を合成し出力する。The speech synthesizer 7 synthesizes and outputs speech based on the phoneme parameters from the phoneme parameter generator 3 and the prosody parameters from the prosody parameter generator 6 in the same manner as in the first embodiment.

【００９７】以上のように、この実施の形態５によれ
ば、声質や音響的な特徴の異なる複数種類の音声合成単
位セットから音声出力中の音響特徴選択入力に応じて１
種類の音声合成単位セットを選択し、選択した音声合成
単位セットから言語情報に基づいて入力テキストに対応
する音声合成単位を選択して音韻パラメータとするよう
にしたので、どなり声やささやき声などの声質や音響的
な特徴を変更することにより出力中の合成音声の部分的
な強調などをユーザが即座に行うことができ、高い表現
性を有する会話代替手段として本音声合成装置を使用す
ることができるという効果が得られる。As described above, the fifth embodiment selects one of several types of speech synthesis unit sets that differ in voice quality and acoustic features, according to the acoustic feature selection input during speech output. The speech synthesis units corresponding to the input text are then selected from that set on the basis of the linguistic information and used as phoneme parameters. The voice quality and acoustic features can therefore be switched, for example to a shouting or whispering voice. The user can immediately apply partial emphasis and the like to the synthesized speech being output, and the present speech synthesizer can be used as a highly expressive alternative means of conversation.

【００９８】実施の形態６．図２０はこの発明の実施の
形態６による音声合成装置の構成を示すブロック図であ
り、図２１は図２０の固定メッセージ判定言語処理部の
構成例を示すブロック図である。図２０において、５１
は入力テキストのうちの所定の固定メッセージの部分を
検出し、固定メッセージ以外の部分を解析し、その固定
メッセージ以外の部分に対応するアクセント句境界位
置、アクセント型、読みおよび韻律制御パラメータを少
なくとも含む言語情報を生成する固定メッセージ判定言
語処理部（言語処理部）である。Embodiment 6. FIG. 20 is a block diagram showing the configuration of a speech synthesizer according to Embodiment 6 of the present invention, and FIG. 21 is a block diagram showing an example configuration of the fixed message determination language processing unit in FIG. 20. In FIG. 20, reference numeral 51 denotes a fixed message determination language processing unit (language processing unit). It detects the portions of the input text that are predetermined fixed messages and analyzes the remaining portions. For those remaining portions, it generates linguistic information including at least the accent phrase boundary positions, accent types, readings, and prosody control parameters.

【００９９】図２１に示す固定メッセージ判定言語処理
部５１において、６１は固定メッセージのテキストを番
号付きで記憶する固定メッセージリスト記憶部である。
この実施の形態６においては、例えば「こんにちは」、
「おはようございます」、「いくらですか」などの使用
頻度の高い定型文が固定メッセージとして記憶されてい
る。６２は入力テキストのうちの、固定メッセージリス
ト記憶部６１に記憶された固定メッセージのいずれかに
一致する部分を検出し、検出した固定メッセージの番号
を固定メッセージ選択部５３に供給し、その固定メッセ
ージ以外の部分をテキスト解析部１２に供給する固定メ
ッセージ判定部である。In the fixed message determination language processing unit 51 shown in FIG. 21, reference numeral 61 denotes a fixed message list storage unit that stores the texts of fixed messages with numbers. In the sixth embodiment, frequently used set phrases such as "Hello" (konnichiwa), "Good morning" (ohayou gozaimasu), and "How much is it?" (ikura desu ka) are stored as fixed messages. Reference numeral 62 denotes a fixed message determination unit. It detects the portion of the input text that matches one of the fixed messages stored in the fixed message list storage unit 61. It supplies the number of the detected fixed message to the fixed message selection unit 53, and passes the rest of the text to the text analysis unit 12.

【０１００】図２０に戻り、５２は固定メッセージ番号
とそれに対応する複数種類の音声波形データとの組であ
る固定メッセージデータを複数記憶する固定メッセージ
記憶部である。図２２は固定メッセージデータの一例を
示す図である。図２２に示すように、この実施の形態６
における音声波形データは、ユーザによるメッセージ選
択入力に応じて固定メッセージ選択部５３により１つの
音声波形データを選択するための音声波形データの番号
およびコメント、並びに音声ファイルを有する。なお、
図２２に示す固定メッセージデータでは音声ファイルは
ファイル名により指定されている。また、この実施の形
態６においては固定メッセージの韻律を変更する方法と
して後述のようにＰＳＯＬＡから類推される手法を用い
るため、音声ファイルは音声波形とピッチ同期点の情報
とを有する。Returning to FIG. 20, reference numeral 52 denotes a fixed message storage unit that stores multiple items of fixed message data. Each item pairs a fixed message number with the several types of speech waveform data that correspond to it. FIG. 22 is a diagram showing an example of fixed message data. As shown in FIG. 22, each type of speech waveform data in the sixth embodiment consists of a speech waveform data number, a comment, and an audio file. The number and comment are what the fixed message selection unit 53 uses to pick one item of speech waveform data according to the user's message selection input. In the fixed message data shown in FIG. 22, each audio file is specified by its file name. As described later, the sixth embodiment changes the prosody of a fixed message using a technique modeled on PSOLA (pitch-synchronous overlap-add). Each audio file therefore contains the speech waveform together with information on its pitch synchronization points.

【０１０１】５３は固定メッセージ番号およびメッセー
ジ選択入力（操作入力）に応じて固定メッセージの音声
波形データを選択する固定メッセージ選択部であり、５
４は音声出力中の固定メッセージ韻律変更操作入力（操
作入力）に応じて固定メッセージの音声波形データを変
更して固定メッセージの韻律を変更する固定メッセージ
韻律変更部であり、５５は固定メッセージを音声として
再生する音声再生部である。Reference numeral 53 denotes a fixed message selection unit that selects the speech waveform data of a fixed message according to the fixed message number and a message selection input (operation input). Reference numeral 54 denotes a fixed message prosody changing unit. It changes the prosody of the fixed message by modifying its speech waveform data according to a fixed message prosody change operation input (operation input) made during speech output. Reference numeral 55 denotes a speech reproduction unit that reproduces the fixed message as speech.

【０１０２】なお図２０および図２１におけるその他の
構成要素については実施の形態１によるもの（図１、図
２）と同様であるのでその説明を省略する。The other components in FIGS. 20 and 21 are the same as those according to the first embodiment (FIGS. 1 and 2), and a description thereof will be omitted.

【０１０３】次に動作について説明する。固定メッセー
ジ判定言語処理部５１は、入力テキストを供給される
と、固定メッセージ判定部６２によりその入力テキスト
を固定メッセージリスト記憶部６１に予め記憶されてい
る固定メッセージに一致する部分とそれ以外の部分とに
分け、一致した固定メッセージの固定メッセージ番号を
固定メッセージ選択部５３に供給し、テキスト解析部１
２により固定メッセージ以外のテキストを解析してアク
セント句境界位置を決定して韻律制御パラメータ生成部
１４によりアクセント強さや平均モーラ長などの韻律制
御パラメータを所定の規則に従って決定し、アクセント
句境界位置、読み、アクセント型および韻律制御パラメ
ータを少なくとも含む言語情報を出力する。Next, the operation will be described. When the fixed message determination language processing unit 51 receives the input text, its fixed message determination unit 62 divides the text into two kinds of portions: those matching fixed messages stored in advance in the fixed message list storage unit 61, and the rest. Unit 62 supplies the fixed message number of each matched fixed message to the fixed message selection unit 53. The text analysis unit 12 analyzes the text other than the fixed messages to determine the accent phrase boundary positions. The prosody control parameter generation unit 14 then determines prosody control parameters, such as accent strength and average mora length, according to predetermined rules. Finally, unit 51 outputs linguistic information including at least the accent phrase boundary positions, readings, accent types, and prosody control parameters.

【０１０４】固定メッセージ判定言語処理部５１の動作
について具体例を用いて詳しく説明する。例えばキー入
力やかな漢字変換などを使用してユーザにより作成され
た漢字かな混じり文である入力テキストが固定メッセー
ジ判定言語処理部５１に供給される。ここでは具体例と
して入力テキストが「おはようございます。明日家族み
んなで遊園地に行くの。」なる文である場合の動作につ
いて説明する。The operation of the fixed message determination language processing unit 51 will be described in detail using a specific example. Unit 51 is supplied with an input text, a sentence of mixed kanji and kana that the user has created with, for example, key input or kana-kanji conversion. As a specific example, consider the input text 「おはようございます。明日家族みんなで遊園地に行くの。」 ("Good morning. Tomorrow the whole family is going to the amusement park.").

【０１０５】この入力テキストを供給された固定メッセ
ージ判定部６２は、固定メッセージリスト記憶部６１の
固定メッセージリストを参照し、入力テキストを固定メ
ッセージである「おはようございます」の部分と固定メ
ッセージ以外のテキストである「明日家族みんなで遊園
地に行くの」の部分に分け、その固定メッセージ「おは
ようございます」の固定メッセージ番号を固定メッセー
ジ選択部５３に供給し、固定メッセージ以外のテキスト
をテキスト解析部１２に供給する。On receiving this input text, the fixed message determination unit 62 refers to the fixed message list in the fixed message list storage unit 61. It divides the input text into the fixed message portion "Good morning" and the remaining portion "Tomorrow the whole family is going to the amusement park". It supplies the fixed message number of "Good morning" to the fixed message selection unit 53, and supplies the remaining text to the text analysis unit 12.
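The splitting step performed by unit 62 can be sketched as a left-to-right scan that tries the stored fixed messages at each position. This sketch is illustrative only. Trying longer messages first and stripping 「。」 and 「、」 from the leftover text are assumptions. Message numbers 1 and 3 are hypothetical; only number 2 for 「おはようございます」 comes from the text.

```python
from typing import Dict, List, Tuple, Union

Segment = Tuple[str, Union[int, str]]

def split_fixed_messages(text: str, fixed: Dict[int, str]) -> List[Segment]:
    """Split text into ("fixed", number) and ("text", substring) segments, in order."""
    # Try longer messages first so a longer fixed phrase wins over a shorter one.
    candidates = sorted(fixed.items(), key=lambda kv: len(kv[1]), reverse=True)
    segments: List[Segment] = []
    pending: List[str] = []

    def flush() -> None:
        # Emit the accumulated non-fixed text, trimming sentence punctuation at its edges.
        s = "".join(pending).strip("。、 ")
        if s:
            segments.append(("text", s))
        pending.clear()

    i = 0
    while i < len(text):
        for number, message in candidates:
            if text.startswith(message, i):
                flush()
                segments.append(("fixed", number))
                i += len(message)
                break
        else:
            # No fixed message starts here: keep the character as ordinary text.
            pending.append(text[i])
            i += 1
    flush()
    return segments

fixed_list = {1: "こんにちは", 2: "おはようございます", 3: "いくらですか"}
print(split_fixed_messages("おはようございます。明日家族みんなで遊園地に行くの。", fixed_list))
# [('fixed', 2), ('text', '明日家族みんなで遊園地に行くの')]
```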

【０１０６】なおテキストを入力されたテキスト解析部
１２および韻律制御パラメータ生成部１４の動作につい
ては、実施の形態１における言語処理部１のテキスト解
析部１２および韻律制御パラメータ生成部１４の動作と
同様であるのでその説明を省略する。また音韻パラメー
タ生成部３、韻律制御パラメータ変更部４、韻律パラメ
ータ生成部６および音声合成部７の動作についても実施
の形態１と同様であるのでその説明を省略する。The operation of text analysis unit 12 and prosody control parameter generation unit 14 to which the text is input is the same as the operation of text analysis unit 12 and prosody control parameter generation unit 14 of language processing unit 1 in the first embodiment. Therefore, the description is omitted. Also, the operations of the phoneme parameter generation unit 3, the prosody control parameter change unit 4, the prosody parameter generation unit 6, and the speech synthesis unit 7 are the same as those in the first embodiment, and thus the description thereof is omitted.

【０１０７】一方、固定メッセージ選択部５３は、供給
された固定メッセージ番号に対応する音声波形データを
固定メッセージ記憶部５２から読み出し、例えば音声波
形データの番号とコメントを画面に表示し、それに対応
するユーザによるメッセージ選択入力に応じた種類の声
質の音声波形データを選択し、基準固定メッセージとし
て固定メッセージ韻律変更部５４に供給する。例えば図
２２に示す固定メッセージデータの場合、固定メッセー
ジ「おはようございます」が検出されたときには、固定
メッセージ番号として２が供給される。Meanwhile, the fixed message selection unit 53 reads the speech waveform data for the supplied fixed message number from the fixed message storage unit 52. It displays, for example, the speech waveform data numbers and comments on the screen. It then selects the speech waveform data whose voice quality matches the user's message selection input, and supplies it to the fixed message prosody changing unit 54 as the reference fixed message. For example, with the fixed message data shown in FIG. 22, detecting the fixed message "Good morning" causes 2 to be supplied as the fixed message number.

【０１０８】図２３は図２２の固定メッセージデータに
対応して画面に表示される音声波形データの番号とコメ
ントの一例を示す図である。例えばディスプレイとその
画面上に配置されたタッチパネルなどの表示入力装置に
おける表示部（ディスプレイ）にこれらが表示され、入
力部（タッチパネル）における例えば番号３の部分をユ
ーザが押圧した場合、メッセージ選択入力としてその番
号３が固定メッセージ選択部５３に供給され、固定メッ
セージ選択部５３は、固定メッセージデータからファイ
ル名「ｄ０２０３」の音声ファイルの音声波形データを
基準固定メッセージとして固定メッセージ韻律変更部５
４に供給する。FIG. 23 is a diagram showing an example of the speech waveform data numbers and comments displayed on the screen for the fixed message data of FIG. 22. These are shown, for example, on the display of a display input device consisting of a display and a touch panel arranged on its screen. Suppose the user presses, for example, the portion for number 3 on the touch panel. The number 3 is supplied to the fixed message selection unit 53 as the message selection input. Unit 53 then takes the speech waveform data of the audio file named "d0203" from the fixed message data and supplies it to the fixed message prosody changing unit 54 as the reference fixed message.

【０１０９】固定メッセージ韻律変更部５４は、音声出
力中のユーザによる固定メッセージ韻律変更操作入力に
基づいて基準固定メッセージの音声波形データを変更
し、変更後の音声波形データを音声再生部５５に供給す
る。The fixed message prosody changing unit 54 modifies the speech waveform data of the reference fixed message according to the fixed message prosody change operation input that the user makes during speech output. It supplies the modified speech waveform data to the speech reproduction unit 55.

【０１１０】ここで固定メッセージ韻律変更部の動作に
ついて具体例を用いて詳しく説明する。この実施の形態
６における固定メッセージ韻律変更操作入力は、例えば
後続の音声のピッチと音韻継続時間長をそれぞれ変更す
るための２本のレバーを備えた図示せぬ入力手段に対し
て行われるユーザの操作に対応して供給されるものとす
る。そしてその操作量（例えばレバーの角度）が固定メ
ッセージ韻律変更操作入力として供給され、音声のピッ
チおよび音韻継続時間長がその操作量に対応する割合だ
け変更される。また、この実施の形態６においては、音
韻継続時間長の変更は母音出力時にのみ有効になるもの
とする。Here, the operation of the fixed message prosody changing unit 54 will be described in detail using a specific example. The fixed message prosody change operation input in the sixth embodiment is assumed to come from user operation of input means (not shown) with two levers, one for the pitch and one for the phoneme duration of the subsequent speech. The operation amount (for example, the lever angle) is supplied as the fixed message prosody change operation input. The pitch and phoneme duration of the speech are then changed by a ratio corresponding to that amount. In the sixth embodiment, the phoneme duration change is assumed to take effect only while a vowel is being output.

【０１１１】なお、固定メッセージ韻律変更部５４で
は、ピッチが高くなるようにレバーが操作された場合、
すなわちピッチ周期を短くする場合には、ピッチ同期点
を中心に変更後のピッチ周期の２倍長のハニング窓で元
の音声波形を切り出し、変更後のピッチ周期の間隔で加
算重畳する。一方、ピッチが低くなるようにレバーが操
作された場合、すなわちピッチ周期を長くする場合に
は、元の音声波形ピッチ周期の２倍の窓長で切り出し、
同様にして加算重畳する。このようにして固定メッセー
ジのピッチが変更される。When the lever is operated to raise the pitch, that is, to shorten the pitch period, the fixed message prosody changing unit 54 cuts segments out of the original speech waveform. Each segment uses a Hanning window centered on a pitch synchronization point and twice as long as the changed pitch period. The segments are then overlap-added at intervals of the changed pitch period. When the lever is operated to lower the pitch, that is, to lengthen the pitch period, the window is twice as long as the original pitch period, and the segments are overlap-added in the same way. The pitch of the fixed message is changed in this manner.
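The pitch change described above follows the time-domain PSOLA idea. Below is a minimal pure-Python sketch, assuming a voiced segment with roughly constant pitch, pitch synchronization points at known sample indices, and a target period given in samples. One detail is an assumption of this sketch and is not stated in the text: each output grain is taken from the nearest original pitch mark, which keeps the overall duration unchanged.

```python
import math
from typing import List

def hann(n: int) -> List[float]:
    # Periodic Hann window of length n; copies shifted by n/2 sum to 1.
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]

def psola_pitch(x: List[float], marks: List[int], new_period: int) -> List[float]:
    """Change the pitch of x by overlap-adding windowed segments at new_period spacing."""
    old_period = round((marks[-1] - marks[0]) / (len(marks) - 1))
    # Window length as described: 2x the new period when raising the pitch,
    # 2x the original period when lowering it.
    half = new_period if new_period < old_period else old_period
    w = hann(2 * half)
    duration = marks[-1] - marks[0]
    out = [0.0] * (duration + 2 * half + 1)
    t = 0
    while t <= duration:
        # Grain source: the original pitch mark nearest to this output time.
        src = min(marks, key=lambda m: abs((m - marks[0]) - t))
        for k in range(2 * half):
            n = src - half + k
            if 0 <= n < len(x):
                out[t + k] += w[k] * x[n]
        t += new_period
    return out

# A synthetic "voiced" signal with a 100-sample period and pitch marks every 100 samples.
x = [math.sin(2 * math.pi * n / 100) for n in range(1000)]
marks = list(range(100, 1000, 100))
raised = psola_pitch(x, marks, 80)    # shorter period: higher pitch
lowered = psola_pitch(x, marks, 125)  # longer period: lower pitch
```

With `new_period` equal to the original period, the overlapping Hann windows sum to one, so the interior of the signal is reproduced unchanged. This makes a quick sanity check for the sketch.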

【０１１２】また、固定メッセージ韻律変更部５４で
は、音韻継続時間長を長くするようにレバーが操作され
た場合には、１周期分の波形を繰り返して、指定された
音韻継続時間長とする。一方、音韻継続時間長を短くす
るようにレバーが操作された場合には、ピッチ周期単位
で波形の間引きを行う。このようにして固定メッセージ
の音韻継続時間長が変更される。When the lever is operated to lengthen the phoneme duration, the fixed message prosody changing unit 54 repeats one-pitch-period waveforms until the designated phoneme duration is reached. When the lever is operated to shorten the phoneme duration, waveforms are thinned out in units of pitch periods. The phoneme duration of the fixed message is changed in this manner.
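The duration change can likewise be sketched by resampling a list of one-period waveforms. Spreading the repeated or dropped periods evenly across the segment is an assumption of this sketch. The text only states that one-period waveforms are repeated when lengthening and thinned out in pitch-period units when shortening.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def resample_periods(periods: Sequence[T], ratio: float) -> List[T]:
    """Return round(len(periods) * ratio) periods, repeating or dropping them evenly."""
    n = len(periods)
    if n == 0:
        return []
    target = max(1, round(n * ratio))
    # Output period j reuses source period floor(j * n / target):
    # for ratio > 1 some periods repeat, for ratio < 1 some are skipped.
    return [periods[j * n // target] for j in range(target)]

# Each item stands for the samples of one pitch period (labels used for readability).
periods = ["p0", "p1", "p2", "p3"]
print(resample_periods(periods, 1.5))  # ['p0', 'p0', 'p1', 'p2', 'p2', 'p3']
print(resample_periods(periods, 0.5))  # ['p0', 'p2']
```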

【０１１３】そして音声再生部５５は固定メッセージ韻
律変更部５４からの音声波形データに基づいて固定メッ
セージを再生音声として出力する。The voice reproducing unit 55 outputs the fixed message as the reproduced voice based on the voice waveform data from the fixed message prosody changing unit 54.

【０１１４】以上のように、この実施の形態６によれ
ば、入力テキストにおける固定メッセージ以外のテキス
トについては実施の形態１と同様にして音声合成し、入
力テキストにおける固定メッセージについては、音声出
力中の操作入力に応じた声質の固定メッセージの音声波
形データを選択し、音声出力中の操作入力に応じて音声
波形データを変更して固定メッセージの韻律を変更し、
固定メッセージを音声として再生するようにしたので、
固定メッセージかそれ以外のテキストかを問わずに、出
力中の合成音声の部分的な強調などをユーザが行うこと
ができ、高い表現性を有する会話代替手段として本音声
合成装置を使用することができるという効果が得られ
る。As described above, the sixth embodiment synthesizes the text other than the fixed messages in the input text in the same manner as the first embodiment. For each fixed message in the input text, it first selects the speech waveform data whose voice quality matches the operation input during speech output. It then modifies that waveform data according to the operation input during speech output, which changes the prosody of the fixed message. Finally, it reproduces the fixed message as speech. Whether a portion is a fixed message or other text, the user can therefore apply partial emphasis and the like to the synthesized speech being output. The present speech synthesizer can thus be used as a highly expressive alternative means of conversation.

【０１１５】なお、実施の形態２〜５による音声合成装
置に、実施の形態１における韻律制御パラメータ変更部
４をさらに設けるようにしてもよい。The speech synthesizing apparatuses according to the second to fifth embodiments may further include the prosody control parameter changing unit 4 according to the first embodiment.

【０１１６】また、実施の形態３〜５による音声合成装
置に、韻律パラメータ生成ルール記憶部５の代わりに実
施の形態２における複数韻律パラメータ生成ルール記憶
部２１を設け、実施の形態２における韻律パラメータ生
成ルール選択部２２をさらに設けるようにしてもよい。Further, in the speech synthesizing apparatus according to the third to fifth embodiments, a plurality of prosody parameter generation rule storage units 21 according to the second embodiment are provided instead of the prosody parameter generation rule storage unit 5, and the prosody parameter according to the second embodiment is provided. The generation rule selection unit 22 may be further provided.

【０１１７】さらに、実施の形態５による音声合成装置
に、実施の形態３における韻律パラメータ変更部３１を
さらに設けるようにしてもよい。Further, the voice synthesizing apparatus according to the fifth embodiment may further include the prosody parameter changing unit 31 according to the third embodiment.

【０１１８】さらに、実施の形態２〜５による音声合成
装置に、言語処理部１の代わりに実施の形態６における
固定メッセージ判定言語処理部５１を設け、実施の形態
６における固定メッセージ記憶部５２、固定メッセージ
選択部５３、固定メッセージ韻律変更部５４および音声
再生部５５をさらに設けるようにしてもよい。Further, in the speech synthesizers according to the second to fifth embodiments, the fixed message determination language processing unit 51 according to the sixth embodiment is provided in place of the language processing unit 1, and the fixed message storage unit 52 according to the sixth embodiment is used. A fixed message selecting unit 53, a fixed message prosody changing unit 54, and a voice reproducing unit 55 may be further provided.

【０１１９】さらに、実施の形態１〜６における韻律パ
ラメータ生成ルールに、ピッチパタン生成モデルとして
藤崎モデルなど、点ピッチモデル以外のモデルを含むよ
うにし、実施の形態１〜５における言語処理部１または
実施の形態６における固定メッセージ判定言語処理部５
１により生成される言語情報には、韻律制御パラメータ
として点ピッチモデルにおけるアクセント強さの代わり
に、使用する他のピッチパタン生成モデルのパラメータ
（例えば藤崎モデルの場合にはアクセント指令の大き
さ）を含むようにしてもよい。Further, the prosody parameter generation rules in the first to sixth embodiments may use, as the pitch pattern generation model, a model other than the point pitch model, such as the Fujisaki model. In that case, the linguistic information may carry a parameter of that other model as a prosody control parameter, in place of the point pitch model's accent strength. An example is the accent command amplitude in the case of the Fujisaki model. This applies to the linguistic information generated by the language processing unit 1 in the first to fifth embodiments or by the fixed message determination language processing unit 51 in the sixth embodiment.

【０１２０】さらに、実施の形態１〜６における韻律パ
ラメータ生成ルールに、音韻継続時間長生成モデルとし
て、母音−子音境界を平均モーラ長の間隔に配置するモ
デルや、母音と子音との継続時間長比率を保ちつつ母音
重心点（母音開始点からのパワーの積分値が母音全体の
パワーの積分値の１／２になる点）を平均モーラ長の間
隔に配置するモデルなどの、リズム知覚点等間隔配置モ
デル以外のモデルを含むようにしてもよい。Further, the prosody parameter generation rules in the first to sixth embodiments may use, as the phoneme duration generation model, a model other than the one that places rhythm perception points at equal intervals. One alternative places vowel-consonant boundaries at intervals of the average mora length. Another places vowel centroid points at intervals of the average mora length while preserving the vowel-to-consonant duration ratio. The vowel centroid point is the point at which the power integrated from the vowel onset reaches half of the power integrated over the whole vowel.
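The vowel centroid point defined above can be computed from a frame-wise power contour, as sketched below. Representing power as one value per analysis frame, and returning the first frame at which the cumulative power reaches half the total, are assumptions made for illustration.

```python
from typing import Sequence

def vowel_centroid_frame(power: Sequence[float]) -> int:
    """Index of the first frame where power integrated from the vowel onset
    reaches half of the vowel's total integrated power."""
    if not power:
        raise ValueError("empty power contour")
    half = sum(power) / 2
    acc = 0.0
    for i, p in enumerate(power):
        acc += p
        if acc >= half:
            return i
    return len(power) - 1  # reached only through floating-point rounding

print(vowel_centroid_frame([1.0, 1.0, 1.0, 1.0]))       # 1
print(vowel_centroid_frame([4.0, 1.0, 1.0, 1.0, 1.0]))  # 0
```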

【０１２１】さらに、実施の形態１〜４における言語処
理部１または実施の形態６における固定メッセージ判定
言語処理部５１により生成される言語情報には、韻律制
御パラメータとしてアクセント強さ、平均モーラ長に加
え、平均パワーを含むようにしてもよい。その場合、実
施の形態１、３、４では、韻律パラメータ生成ルール
に、その平均パワーに基づいてパワーの時間変化パタン
を生成するためのモデルを含むようにし、実施の形態
１、６における韻律制御パラメータ変更部４、実施の形
態３における韻律パラメータ変更部３１または実施の形
態４における区間指定韻律パラメータ変更部３１Ａは、
ユーザによる操作入力または韻律変更操作入力に応じて
韻律制御パラメータまたは韻律パラメータを変更して合
成音声のパワーを変更するようにする。また、その場
合、実施の形態２では、韻律パラメータ生成ルールにそ
の平均パワーの時間変化パタンを生成するためのモデル
を含むようにし、韻律パラメータ生成ルールに含まれる
変換ルールは平均パワーをモデルに適用する方法を含む
ようにする。なお、図２４は平均パワーをモデルに適用
する場合の変換ルールの一例を示す図である。Furthermore, the linguistic information may include average power as a prosody control parameter, in addition to accent strength and average mora length. This applies to the linguistic information generated by the language processing unit 1 in the first to fourth embodiments or by the fixed message determination language processing unit 51 in the sixth embodiment. In the first, third, and fourth embodiments, the prosody parameter generation rules then include a model for generating a time-varying power pattern from the average power. The power of the synthesized speech is changed according to the user's operation input or prosody change operation input, by modifying the prosody control parameters or prosody parameters. This is done by the prosody control parameter changing unit 4 in the first and sixth embodiments, the prosody parameter changing unit 31 in the third embodiment, or the section-designation prosody parameter changing unit 31A in the fourth embodiment. In the second embodiment, the prosody parameter generation rules then include a model for generating a time-varying pattern of the average power. The conversion rules within the prosody parameter generation rules also include a method of applying the average power to the model. FIG. 24 is a diagram showing an example of a conversion rule for applying the average power to the model.

【０１２２】さらに、実施の形態１、６における操作入
力、実施の形態３、４における韻律変更操作入力および
実施の形態６における固定メッセージ韻律変更操作入力
によるパラメータの変化量は、レバーの角度の代わり
に、入力手段としてのダイヤルの回転量、入力手段とし
てのボタンの押し時間、入力手段としてのジョイスティ
ックの角度、入力手段としての圧力センサに加えられる
圧力などの連続量に応じて決定するようにしてもよい
し、また、予め変化量の離散値を用意して画面に表示
し、カーソルキーやリターンキーなどのキー入力、マウ
ス操作、タッチパネルへの押圧などによって選択された
値に応じて決定するようにしてもよい。Further, the amount of parameter change may be determined from something other than the lever angle. This applies to the operation input in the first and sixth embodiments, the prosody change operation input in the third and fourth embodiments, and the fixed message prosody change operation input in the sixth embodiment. One option is a continuous quantity, such as the rotation of a dial, how long a button is held, the angle of a joystick, or the pressure on a pressure sensor serving as the input means. Another option is to prepare discrete change amounts in advance and display them on the screen. The change amount is then the value the user selects by key input such as cursor and return keys, by mouse operation, or by pressing a touch panel.

【０１２３】さらに、実施の形態２におけるルール選択
入力、実施の形態４における韻律変更区間入力および実
施の形態５における音韻特徴選択入力は、タッチパネル
などに表示された番号や音節の表示部分への押圧の代わ
りに、カーソルキーやリターンキーなどのキー入力、マ
ウス操作などに対応して供給するようにしてもよい。Further, three inputs may be supplied through key input such as cursor and return keys, or through mouse operation and the like, instead of by pressing numbers or syllables displayed on a touch panel or the like. These are the rule selection input in the second embodiment, the prosody change section input in the fourth embodiment, and the acoustic feature selection input in the fifth embodiment.

【０１２４】さらに、実施の形態４における区間指定韻
律パラメータ変更部３１Ａは、韻律変更区間入力がない
場合に韻律変更操作入力を無効とする代わりに、実施の
形態３における韻律パラメータ変更部３１と同様にして
後続の音声の韻律パラメータを変更するようにしてもよ
い。Furthermore, instead of invalidating the prosody changing operation input when there is no prosody changing section input, the section specifying prosody parameter changing section 31A in the fourth embodiment is similar to the prosody parameter changing section 31 in the third embodiment. Then, the prosody parameter of the subsequent voice may be changed.

【０１２５】さらに、実施の形態１〜６における入力テ
キストには、キー入力とかな漢字変換に基づいてユーザ
が作成した漢字かな混じり文の代わりに、ユーザあるい
は第三者が予め作成、登録した複数の文候補からユーザ
が選択した漢字かな混じり文を使用するようにしてもよ
い。Further, the input text in the first to sixth embodiments need not be a kanji-kana mixed sentence created by the user with key input and kana-kanji conversion. It may instead be a kanji-kana mixed sentence that the user selects from multiple sentence candidates created and registered in advance by the user or a third party.

【０１２６】さらに、実施の形態１〜６における入力テ
キストには、キー入力とかな漢字変換に基づいてユーザ
が作成した漢字かな混じり文の代わりに、ユーザあるい
は第三者が予め作成、登録した、アクセント句境界、ア
クセント型、品詞、モーラ数などの言語情報付きの複数
の文候補からユーザが選択した言語情報付き漢字かな混
じり文を使用するようにしてもよい。この場合、言語処
理部１および固定メッセージ判定言語処理部５１におけ
るテキスト解析処理は不要である。Further, the input text in the first to sixth embodiments may be a kanji-kana mixed sentence that carries linguistic information, instead of a kanji-kana mixed sentence composed by the user through key input and kana-kanji conversion. The user selects this sentence from a plurality of candidate sentences that the user or a third party created and registered in advance together with linguistic information such as accent phrase boundaries, accent types, parts of speech, and mora counts. In this case, the text analysis performed by the language processing unit 1 and by the fixed message determination language processing unit 51 is unnecessary.
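This variant can be sketched with a hypothetical `AccentPhrase` record. Each registered candidate already carries its accent phrase readings, accent types, parts of speech, and mora counts, so selecting a candidate yields linguistic information directly and no text analysis is invoked. The record layout and the two example entries are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class AccentPhrase:
    reading: str        # kana reading of the accent phrase
    accent_type: int    # mora position of the accent nucleus (0 = flat)
    pos: str            # part of speech of the head word
    morae: int          # number of morae

# Candidates registered in advance, keyed by display text (illustrative data).
CANDIDATES: Dict[str, List[AccentPhrase]] = {
    "ありがとう": [AccentPhrase("アリガトー", 2, "interjection", 5)],
    "こんにちは": [AccentPhrase("コンニチワ", 0, "interjection", 5)],
}

def linguistic_info_for(choice: str) -> List[AccentPhrase]:
    """Return the stored linguistic information for the selected sentence;
    no morphological or accent analysis is performed."""
    return CANDIDATES[choice]
```

The returned list can then feed phoneme and prosody parameter generation in place of the output of the language processing unit.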

【０１２７】さらに、実施の形態１〜６における入力テ
キストは、キー入力とかな漢字変換に基づいてユーザが
作成した漢字かな混じり文の代わりに、単語を埋め込む
可変部分と固定部分とで構成される文テンプレートに対
してユーザがキー入力や候補選択によって単語を入力し
た穴埋め文を使用するようにしてもよい。なお、実施の
形態６における固定メッセージには、入力テキストとし
てその穴埋め文を使用する場合には、使用頻度の高い文
そのものに加え、文テンプレートの固定部分を含むよう
にしてもよい。Furthermore, the input text in the first to sixth embodiments may be a fill-in-the-blank sentence instead of a kanji-kana mixed sentence composed by the user through key input and kana-kanji conversion. Such a sentence is produced by the user entering words, through key input or candidate selection, into a sentence template consisting of variable parts, into which words are inserted, and fixed parts. When such fill-in sentences are used as the input text, the fixed messages in the sixth embodiment may include the fixed parts of the sentence templates in addition to the frequently used sentences themselves.
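The template mechanism can be sketched by representing a template as a list of fixed and variable segments. Filling the variable slots produces the input sentence. The fixed segments are also returned so that they can be registered as fixed messages for the sixth embodiment. The segment representation and function name are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# A template is a list of segments: ("fixed", text) or ("slot", slot_name).
Template = List[Tuple[str, str]]

def fill_template(template: Template, words: Dict[str, str]) -> Tuple[str, List[str]]:
    """Build the fill-in-the-blank sentence and return it together with the
    template's fixed parts, which may be registered as fixed messages."""
    pieces, fixed_parts = [], []
    for kind, value in template:
        if kind == "fixed":
            pieces.append(value)
            fixed_parts.append(value)
        elif kind == "slot":
            pieces.append(words[value])
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    return "".join(pieces), fixed_parts

tmpl = [("slot", "place"), ("fixed", "に行きましょう")]
print(fill_template(tmpl, {"place": "駅"}))
# ('駅に行きましょう', ['に行きましょう'])
```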

【０１２８】[0128]

【発明の効果】以上のように、この発明によれば、入力
テキストを解析し、その入力テキストに対応するアクセ
ント句境界位置、アクセント型、読みおよび韻律制御パ
ラメータを少なくとも含む言語情報を生成し、その言語
情報に基づいて入力テキストに対応する音声合成単位を
選択して音韻パラメータとするとともに、韻律パラメー
タ生成ルールに従ってその言語情報から韻律パラメータ
を生成し、音声出力中の操作入力に応じて後続の韻律パ
ラメータを変更し、その音韻パラメータと韻律パラメー
タとに基づいて音声を合成し出力するように構成したの
で、音声出力中に音声の部分的な強調などをユーザが行
うことができ、高い表現性を有する会話代替手段として
本音声合成装置を使用することができるという効果があ
る。EFFECTS OF THE INVENTION As described above, according to the present invention, input text is analyzed to generate linguistic information that includes at least the accent phrase boundary positions, accent types, readings, and prosody control parameters corresponding to the input text. Speech synthesis units corresponding to the input text are selected based on the linguistic information and used as phoneme parameters. Prosody parameters are generated from the linguistic information according to a prosody parameter generation rule, and subsequent prosody parameters are changed according to operation inputs made during speech output. Speech is then synthesized and output based on the phoneme parameters and the prosody parameters. The user can therefore partially emphasize the speech while it is being output, so the speech synthesis device can serve as a highly expressive substitute for conversation.
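The overall flow can be sketched as a loop over accent phrases. Prosody parameters for each phrase are generated only when that phrase is about to be output, so an operation input received while an earlier phrase is playing affects only the phrases that follow. The generation rule shown (a base F0 raised by the current emphasis level), the event format, and all names are hypothetical. Phoneme parameter selection and waveform synthesis are reduced to a placeholder.

```python
from typing import Dict, List, Tuple

def synthesize_stream(phrases: List[str], events: Dict[int, float],
                      base_f0: float = 120.0) -> List[Tuple[str, float]]:
    """Return (phrase, peak F0 in Hz) for each accent phrase.

    events maps the index of the phrase during whose playback an operation
    input arrived to the new emphasis level (0.0 = neutral)."""
    emphasis = 0.0
    output = []
    for i, phrase in enumerate(phrases):
        # Rule-based prosody generation for the phrase about to be played.
        peak_f0 = base_f0 * (1.0 + emphasis)
        output.append((phrase, peak_f0))   # stands in for waveform synthesis
        # An input made while phrase i is playing changes only later phrases.
        if i in events:
            emphasis = events[i]
    return output
```

For example, an input of 0.25 during the first phrase leaves that phrase at 120 Hz and raises the following phrases to 150 Hz.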

【０１２９】この発明によれば、音声出力中の操作入力
に応じて言語情報のうちの韻律制御パラメータを変更
し、韻律パラメータ生成ルールに従ってその言語情報か
ら韻律パラメータを生成するように構成したので、音声
の出力中にユーザの操作に応じてピッチパタン、音韻継
続時間長などの韻律を変更して、音声出力中に音声の部
分的な強調などをユーザが行うことができるという効果
がある。According to the present invention, the prosody control parameters in the linguistic information are changed according to operation inputs during speech output, and the prosody parameters are generated from that linguistic information according to the prosody parameter generation rule. The user can therefore change prosody, such as the pitch pattern and phoneme durations, by operations made while the speech is being output, and can, for example, partially emphasize the speech.
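In the sketch below, the change is applied to the prosody control parameters, accent strength and average mora length (the parameters shown in FIGS. 7 and 8), before generation rather than to the generated prosody. The linear toy rule, default values, and field names are assumptions and are not the generation rule of the patent.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

@dataclass(frozen=True)
class ProsodyControl:
    accent_strength: float = 1.0   # multiplier on the accent-induced F0 rise
    mean_mora_ms: float = 120.0    # average mora length

def generate(ctrl: ProsodyControl, morae: int, base_f0: float = 110.0,
             accent_rise_hz: float = 40.0) -> Tuple[float, List[float]]:
    """Toy generation rule: return the peak F0 and per-mora durations."""
    peak = base_f0 + accent_rise_hz * ctrl.accent_strength
    durations = [ctrl.mean_mora_ms] * morae
    return peak, durations

ctrl = ProsodyControl()
# Operation input during output: stronger accent and slower speech.
changed = replace(ctrl, accent_strength=1.5, mean_mora_ms=150.0)
print(generate(ctrl, 3))      # (150.0, [120.0, 120.0, 120.0])
print(generate(changed, 3))   # (170.0, [150.0, 150.0, 150.0])
```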

【０１３０】この発明によれば、複数の韻律パラメータ
生成ルールから音声出力中の操作入力に応じて１つの韻
律パラメータ生成ルールを選択し、選択した韻律パラメ
ータ生成ルールに従って言語情報から韻律パラメータを
生成するように構成したので、音声の出力中にユーザの
操作に応じてピッチパタン、音韻継続時間長などの韻律
が変更され、音声出力中に音声の部分的な強調などをユ
ーザが行うことができるという効果がある。According to the present invention, one prosody parameter generation rule is selected from a plurality of prosody parameter generation rules according to an operation input during speech output, and prosody parameters are generated from the linguistic information according to the selected rule. Prosody such as the pitch pattern and phoneme durations is therefore changed in response to the user's operations during output, which allows the user to partially emphasize the speech while it is being output.
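Rule selection can be sketched as a table of generation rules indexed by the number the user picks during output. Each rule here only scales a nominal F0 contour and nominal mora durations. The rule contents and names are hypothetical and do not reproduce the conversion rules of FIG. 10.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class GenerationRule:
    f0_scale: float    # multiplier on the nominal F0 contour
    dur_scale: float   # multiplier on nominal mora durations

RULES: Dict[int, GenerationRule] = {   # keyed by the number the user selects
    1: GenerationRule(1.0, 1.0),       # neutral
    2: GenerationRule(1.25, 1.5),      # emphatic
    3: GenerationRule(1.0, 0.75),      # hurried
}

def generate(f0_contour: List[float], mora_ms: List[float],
             rule_no: int) -> Tuple[List[float], List[float]]:
    """Generate prosody parameters with the currently selected rule."""
    rule = RULES[rule_no]
    return ([f * rule.f0_scale for f in f0_contour],
            [d * rule.dur_scale for d in mora_ms])
```

For instance, `generate([100.0, 120.0], [100.0], 2)` raises the contour to 125 Hz and 150 Hz and lengthens the mora to 150 ms.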

【０１３１】この発明によれば、韻律パラメータ生成ル
ールに従って言語情報から韻律パラメータを生成し、生
成した韻律パラメータを音声出力中の第１の操作入力に
応じて変更するように構成したので、出力中の合成音声
の部分的な強調などをユーザが即座に行うことができる
という効果がある。According to the present invention, prosody parameters are generated from the linguistic information according to a prosody parameter generation rule, and the generated prosody parameters are changed according to a first operation input during speech output. The user can therefore immediately apply partial emphasis or similar changes to the synthesized speech being output.
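Changing already generated prosody parameters can be sketched as shifting a frame-based F0 contour upward from the current frame onward, in the spirit of the pitch pattern change in FIG. 12. The frame representation, the use of 0 Hz for unvoiced frames, and the semitone unit are assumptions.

```python
from typing import List

def raise_pitch_from(f0_frames: List[float], now_frame: int,
                     semitones: float) -> List[float]:
    """Shift the generated F0 contour up by `semitones` from the current
    frame onward; unvoiced frames (0 Hz) are left unchanged."""
    factor = 2.0 ** (semitones / 12.0)
    return [f if i < now_frame or f == 0.0 else f * factor
            for i, f in enumerate(f0_frames)]
```

Frames that have already been played (index below `now_frame`) are untouched, so the change takes effect immediately without altering past output.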

【０１３２】この発明によれば、韻律パラメータと言語
情報とに基づいてアクセント句境界および音節境界のタ
イミングを有する境界情報を生成し、その境界情報に基
づいて決定される韻律パラメータの変更可能な部分から
音声出力中の第２の操作入力に応じて韻律パラメータを
変更する部分を選択し、その部分の韻律パラメータを音
声出力中の第１の操作入力に応じて変更するように構成
したので、音声の出力中にユーザの操作に応じてピッチ
パタン、音韻継続時間長などの韻律が変更され、音声出
力中に音声の部分的な強調などをユーザが行うことがで
きるという効果がある。According to the present invention, boundary information containing the timing of accent phrase boundaries and syllable boundaries is generated based on the prosody parameters and the linguistic information. From the changeable portions of the prosody parameters determined from this boundary information, a portion whose prosody parameters are to be changed is selected according to a second operation input during speech output. The prosody parameters of that portion are then changed according to a first operation input during speech output. Prosody such as the pitch pattern and phoneme durations is therefore changed by the user's operations during output, which allows the user to partially emphasize the speech while it is being output.
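The boundary information can be sketched as cumulative sums of syllable durations grouped by accent phrase. The second operation input can then be resolved to the accent phrase that contains a given time. Resolving the input by time is an assumption, since the embodiment also allows a section to be designated on screen. The function names are hypothetical.

```python
from itertools import accumulate
from typing import List, Tuple

def boundary_info(phrases: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Given per-syllable durations (ms) grouped by accent phrase, return
    (accent phrase boundary times, syllable boundary times) in ms."""
    syl_durs = [d for phrase in phrases for d in phrase]
    syllable_bounds = [0.0] + list(accumulate(syl_durs))
    phrase_bounds = [0.0] + list(accumulate(sum(p) for p in phrases))
    return phrase_bounds, syllable_bounds

def phrase_at(phrase_bounds: List[float], t_ms: float) -> int:
    """Index of the accent phrase containing time t_ms, i.e. the changeable
    portion selected by the second operation input."""
    for i in range(len(phrase_bounds) - 1):
        if phrase_bounds[i] <= t_ms < phrase_bounds[i + 1]:
            return i
    raise ValueError("time outside the utterance")
```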

【０１３３】この発明によれば、入力テキストのうちの
所定の固定メッセージの部分を検出し、操作入力に応じ
た種類のその固定メッセージの音声波形データを選択
し、音声出力中の操作入力に応じてその音声波形データ
を変更して固定メッセージの韻律を変更し、固定メッセ
ージを音声として再生するように構成したので、固定メ
ッセージかそれ以外のテキストかを問わずに、出力中の
合成音声の部分的な強調などをユーザが行うことができ
るという効果がある。According to the present invention, a predetermined fixed message portion of the input text is detected, and speech waveform data of the type corresponding to an operation input is selected for that fixed message. The speech waveform data is changed according to an operation input during speech output in order to change the prosody of the fixed message, and the fixed message is reproduced as speech. The user can therefore partially emphasize the synthesized speech being output, regardless of whether the speech is a fixed message or other text.
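Fixed message handling can be sketched in two steps. First, the input text is split into fixed message segments and other text, matching the longest registered message first. Second, the recorded waveform variant chosen by the user is selected and modified. An amplitude gain stands in for the waveform-level prosody change here. The matching strategy, the sample lists, and the gain are all assumptions for illustration.

```python
from typing import Dict, List, Tuple

def split_fixed(text: str, fixed: List[str]) -> List[Tuple[str, bool]]:
    """Split text into (segment, is_fixed_message) pairs, longest match first."""
    ordered = sorted((m for m in fixed if m), key=len, reverse=True)
    segments, buf, i = [], "", 0
    while i < len(text):
        hit = next((m for m in ordered if text.startswith(m, i)), None)
        if hit:
            if buf:
                segments.append((buf, False))   # text for rule-based synthesis
                buf = ""
            segments.append((hit, True))        # text played from recordings
            i += len(hit)
        else:
            buf += text[i]
            i += 1
    if buf:
        segments.append((buf, False))
    return segments

def select_and_scale(variants: Dict[int, List[float]], choice: int,
                     gain: float) -> List[float]:
    """Pick the recorded variant chosen by the user and scale its amplitude,
    a stand-in for the waveform-level prosody change."""
    return [s * gain for s in variants[choice]]
```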

【０１３４】この発明によれば、入力テキストを解析
し、その入力テキストに対応するアクセント句境界位
置、アクセント型、読みおよび韻律制御パラメータを少
なくとも含む言語情報を生成し、その言語情報に基づい
て入力テキストに対応する音声合成単位を選択して音韻
パラメータとするとともに、音声出力中の操作入力に応
じて後続の音韻パラメータを変更し、韻律パラメータ生
成ルールに従って言語情報から韻律パラメータを生成
し、音韻パラメータと韻律パラメータとに基づいて音声
を合成し出力するように構成したので、出力中の合成音
声の部分的な強調などをユーザが行うことができ、高い
表現性を有する会話代替手段として本音声合成装置を使
用することができるという効果がある。According to the present invention, input text is analyzed to generate linguistic information that includes at least the accent phrase boundary positions, accent types, readings, and prosody control parameters corresponding to the input text. Speech synthesis units corresponding to the input text are selected based on the linguistic information and used as phoneme parameters, and subsequent phoneme parameters are changed according to operation inputs during speech output. Prosody parameters are generated from the linguistic information according to a prosody parameter generation rule, and speech is synthesized and output based on the phoneme parameters and the prosody parameters. The user can therefore partially emphasize the synthesized speech being output, so the speech synthesis device can serve as a highly expressive substitute for conversation.

【０１３５】この発明によれば、声質や音響的な特徴の
異なる複数種類の音声合成単位セットから音声出力中の
操作入力に応じて１種類の音声合成単位セットを選択
し、選択した音声合成単位セットから言語情報に基づい
て入力テキストに対応する音声合成単位を選択して音韻
パラメータとするように構成したので、どなり声やささ
やき声などの声質や音響的な特徴を変更して出力中の合
成音声の部分的な強調などをユーザが即座に行うことが
できるという効果がある。According to the present invention, one speech synthesis unit set is selected, according to an operation input during speech output, from a plurality of speech synthesis unit sets that differ in voice quality and acoustic features. Speech synthesis units corresponding to the input text are then selected from the chosen set based on the linguistic information and used as phoneme parameters. The user can therefore immediately change the voice quality and acoustic features, for example to a shouting or whispering voice, in order to partially emphasize the synthesized speech being output.
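Switching speech synthesis unit sets during output can be sketched with unit inventories keyed by voice type. A switch requested by an operation input applies from the indicated syllable onward, and earlier syllables keep their units. The inventories, unit IDs, and switch format are hypothetical.

```python
from typing import Dict, List

# Hypothetical unit inventories: syllable -> unit ID in each voice set.
UNIT_SETS: Dict[str, Dict[str, str]] = {
    "normal":  {"mi": "n_mi", "n": "n_N", "na": "n_na", "de": "n_de"},
    "shout":   {"mi": "s_mi", "n": "s_N", "na": "s_na", "de": "s_de"},
    "whisper": {"mi": "w_mi", "n": "w_N", "na": "w_na", "de": "w_de"},
}

def select_units(syllables: List[str], switches: Dict[int, str],
                 initial: str = "normal") -> List[str]:
    """Choose a unit per syllable; switches maps the syllable index at which
    an operation input takes effect to the newly selected unit set."""
    current = initial
    units = []
    for i, syl in enumerate(syllables):
        current = switches.get(i, current)
        units.append(UNIT_SETS[current][syl])
    return units
```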

[Brief description of the drawings]

【図１】この発明の実施の形態１による音声合成装置
の構成を示すブロック図である。FIG. 1 is a block diagram illustrating a configuration of a speech synthesis device according to a first embodiment of the present invention.

【図２】図１の言語処理部の構成例を示す構成図であ
る。FIG. 2 is a configuration diagram showing an example configuration of the language processing unit in FIG. 1.

【図３】リズム知覚点テーブルの一例を示す図であ
る。FIG. 3 is a diagram showing an example of a rhythm perception point table.

【図４】子音継続時間長テーブルの一例を示す図であ
る。FIG. 4 is a diagram showing an example of a consonant duration table.

【図５】テキスト解析部によるアクセント句列の一例
を示す図である。FIG. 5 is a diagram illustrating an example of an accent phrase string by a text analysis unit.

【図６】言語処理部による言語情報の一例を示す図で
ある。FIG. 6 is a diagram illustrating an example of language information by a language processing unit.

【図７】韻律制御パラメータ（アクセント強さ）の変
更があった場合とない場合とのピッチパタンの比較の一
例を示す図である。FIG. 7 is a diagram showing an example of a comparison of pitch patterns when a prosody control parameter (accent strength) is changed and when it is not changed.

【図８】韻律制御パラメータ（平均モーラ長）の変更
があった場合とない場合との音韻継続時間長の比較の一
例を示す図である。FIG. 8 is a diagram showing an example comparison of phoneme durations with and without a change in a prosody control parameter (average mora length).

【図９】この発明の実施の形態２による音声合成装置
の構成を示すブロック図である。FIG. 9 is a block diagram showing a configuration of a speech synthesis device according to a second embodiment of the present invention.

【図１０】複数種類の変換ルールの一例を示す図であ
る。FIG. 10 is a diagram showing an example of a plurality of types of conversion rules.

【図１１】この発明の実施の形態３による音声合成装
置の構成を示すブロック図である。FIG. 11 is a block diagram showing a configuration of a speech synthesizer according to Embodiment 3 of the present invention.

【図１２】韻律変更操作入力に応じて変更されるピッ
チパタンの一例を示す図である。FIG. 12 is a diagram illustrating an example of a pitch pattern changed according to a prosody changing operation input.

【図１３】この発明の実施の形態４による音声合成装
置の構成を示すブロック図である。FIG. 13 is a block diagram showing a configuration of a speech synthesis device according to a fourth embodiment of the present invention.

【図１４】図６に示す言語情報に基づいて境界情報／
韻律パラメータ生成部により生成される境界情報の一例
を示す図である。FIG. 14 is a diagram showing an example of the boundary information generated by the boundary information/prosody parameter generation unit based on the linguistic information shown in FIG. 6.

【図１５】区間入力を行う表示／入力装置の表示画面
の一例を示す図である。FIG. 15 is a diagram showing an example of a display screen of a display / input device for performing section input.

【図１６】図１５に示す表示における韻律変更区間の
指定の一例を示す図である。FIG. 16 is a diagram showing an example of designating a prosody change section on the display shown in FIG. 15.

【図１７】時間長を５０パーセント長くするように操
作された場合と操作されない場合の音節列「みんなで」
の音韻継続時間長の一例を示す図である。FIG. 17 is a diagram showing an example of the phoneme durations of the syllable string “みんなで” (“minna de”, “all together”) when the user operates to lengthen the durations by 50 percent and when no operation is made.

【図１８】この発明の実施の形態５による音声合成装
置の構成を示すブロック図である。FIG. 18 is a block diagram showing a configuration of a speech synthesis device according to a fifth embodiment of the present invention.

【図１９】音声合成単位セット選択部による表示の一
例を示す図である。FIG. 19 is a diagram illustrating an example of display by a speech synthesis unit set selection unit.

【図２０】この発明の実施の形態６による音声合成装
置の構成を示すブロック図である。FIG. 20 is a block diagram showing a configuration of a speech synthesizer according to Embodiment 6 of the present invention.

【図２１】図２０の固定メッセージ判定言語処理部の
構成例を示すブロック図である。FIG. 21 is a block diagram showing an example configuration of the fixed message determination language processing unit in FIG. 20.

【図２２】固定メッセージデータの一例を示す図であ
る。FIG. 22 is a diagram illustrating an example of fixed message data.

【図２３】図２２の固定メッセージデータに対応して
画面に表示される音声波形データの番号とコメントの一
例を示す図である。FIG. 23 is a diagram showing an example of voice waveform data numbers and comments displayed on the screen corresponding to the fixed message data of FIG. 22.

【図２４】平均パワーをモデルに適用する場合の変換
ルールの一例を示す図である。FIG. 24 is a diagram illustrating an example of a conversion rule when applying average power to a model.

【図２５】第１の従来の音声合成装置の構成を示すブ
ロック図である。FIG. 25 is a block diagram showing a configuration of a first conventional speech synthesizer.

【図２６】第２の従来の音声合成装置の構成を示すブ
ロック図である。FIG. 26 is a block diagram showing a configuration of a second conventional speech synthesizer.

[Explanation of symbols]

１ 言語処理部、３ 音韻パラメータ生成部、４ 韻律制御パラメータ変更部、６ 韻律パラメータ生成部、６Ａ 境界情報／韻律パラメータ生成部（韻律パラメータ生成部）、７ 音声合成部、２２ 韻律パラメータ生成ルール選択部、３１ 韻律パラメータ変更部、３１Ａ 区間指定韻律パラメータ変更部（韻律パラメータ変更部）、４２ 音声合成単位セット選択部、５１ 固定メッセージ判定言語処理部（言語処理部）、５３ 固定メッセージ選択部、５４ 固定メッセージ韻律変更部、５５ 音声再生部、８１，８１Ａ，８１Ｂ，８１Ｃ 韻律パラメータ制御部、８２ 音韻パラメータ制御部。1 language processing unit, 3 phoneme parameter generation unit, 4 prosody control parameter changing unit, 6 prosody parameter generation unit, 6A boundary information/prosody parameter generation unit (prosody parameter generation unit), 7 speech synthesis unit, 22 prosody parameter generation rule selection unit, 31 prosody parameter changing unit, 31A section-specifying prosody parameter changing unit (prosody parameter changing unit), 42 speech synthesis unit set selection unit, 51 fixed message determination language processing unit (language processing unit), 53 fixed message selection unit, 54 fixed message prosody changing unit, 55 speech reproduction unit, 81, 81A, 81B, 81C prosody parameter control unit, 82 phoneme parameter control unit.

フロントページの続き (72)発明者 海老原 充 東京都千代田区丸の内二丁目２番３号 三菱電機株式会社内 Ｆターム(参考) 5D045 AA09 AA20 Continuation of front page. (72) Inventor: Mitsuru Ebihara, c/o Mitsubishi Electric Corporation, 2-2-3 Marunouchi, Chiyoda-ku, Tokyo. F-terms (reference): 5D045 AA09 AA20

Claims

[Claims]

1. A speech synthesis device comprising: a language processing unit that analyzes input text and generates linguistic information including at least accent phrase boundary positions, accent types, readings, and prosody control parameters corresponding to the input text; a phoneme parameter generation unit that selects, based on the linguistic information, speech synthesis units corresponding to the input text and uses them as phoneme parameters; a prosody parameter control unit that generates prosody parameters from the linguistic information according to a prosody parameter generation rule and changes subsequent prosody parameters according to an operation input during speech output; and a speech synthesis unit that synthesizes and outputs speech based on the phoneme parameters and the prosody parameters.

2. The speech synthesis device according to claim 1, wherein the prosody parameter control unit comprises: a prosody control parameter changing unit that changes the prosody control parameters of the linguistic information according to an operation input during speech output; and a prosody parameter generation unit that generates the prosody parameters from the linguistic information according to the prosody parameter generation rule.

3. The speech synthesis device as described, wherein the prosody parameter control unit comprises: a prosody parameter generation rule selection unit that selects one prosody parameter generation rule from a plurality of prosody parameter generation rules according to an operation input during speech output; and a prosody parameter generation unit that generates prosody parameters from the linguistic information according to the prosody parameter generation rule selected by the prosody parameter generation rule selection unit.

4. The speech synthesis device according to claim 1, wherein the prosody parameter control unit comprises: a prosody parameter generation unit that generates prosody parameters from the linguistic information according to a prosody parameter generation rule; and a prosody parameter changing unit that changes the prosody parameters generated by the prosody parameter generation unit according to a first operation input during speech output.

5. The speech synthesis device according to claim 4, wherein the prosody parameter generation unit generates, based on the generated prosody parameters and the linguistic information, boundary information containing the timing of accent phrase boundaries and syllable boundaries, and the prosody parameter changing unit selects, from the changeable portions of the prosody parameters determined based on the boundary information, a portion whose prosody parameters are to be changed according to a second operation input during speech output, and changes the prosody parameters of that portion according to the first operation input during speech output.

6. The speech synthesis device according to any one of claims 1 to 4, wherein the language processing unit detects a predetermined fixed message portion of the input text, analyzes the portion other than the fixed message, and generates linguistic information including at least accent phrase boundary positions, accent types, readings, and prosody control parameters corresponding to that portion, the device further comprising: a fixed message selection unit that selects speech waveform data for the detected fixed message according to an operation input during speech output; a fixed message prosody changing unit that changes the prosody of the fixed message by changing the speech waveform data according to an operation input during speech output; and a speech reproduction unit that reproduces the fixed message as speech.

7. A speech synthesis method comprising: a step of analyzing input text and generating linguistic information including at least accent phrase boundary positions, accent types, readings, and prosody control parameters corresponding to the input text; a step of selecting, based on the linguistic information, speech synthesis units corresponding to the input text as phoneme parameters; a step of generating prosody parameters from the linguistic information according to a prosody parameter generation rule and changing subsequent prosody parameters according to an operation input during speech output; and a step of synthesizing and outputting speech based on the phoneme parameters and the prosody parameters.

8. A speech synthesis device comprising: a language processing unit that analyzes input text and generates linguistic information including at least accent phrase boundary positions, accent types, readings, and prosody control parameters corresponding to the input text; a phoneme parameter control unit that selects, based on the linguistic information, speech synthesis units corresponding to the input text as phoneme parameters and changes subsequent phoneme parameters according to an operation input during speech output; a prosody parameter generation unit that generates prosody parameters from the linguistic information according to a prosody parameter generation rule; and a speech synthesis unit that synthesizes and outputs speech based on the phoneme parameters and the prosody parameters.

9. The speech synthesis device according to claim 8, wherein the phoneme parameter control unit comprises: a speech synthesis unit set selection unit that selects one speech synthesis unit set, according to an operation input during speech output, from a plurality of speech synthesis unit sets differing in voice quality and acoustic features; and a phoneme parameter generation unit that selects, from the speech synthesis unit set selected by the speech synthesis unit set selection unit, speech synthesis units corresponding to the input text based on the linguistic information and uses them as phoneme parameters.

10. The speech synthesis device according to claim 8 or claim 9, wherein the language processing unit detects a predetermined fixed message portion of the input text, analyzes the portion other than the fixed message, and generates linguistic information including at least accent phrase boundary positions, accent types, readings, and prosody control parameters corresponding to that portion, the device further comprising: a fixed message selection unit that selects speech waveform data for the detected fixed message according to an operation input during speech output; a fixed message prosody changing unit that changes the prosody of the fixed message by changing the speech waveform data according to an operation input during speech output; and a speech reproduction unit that reproduces the fixed message as speech.

11. A speech synthesis method comprising: a step of analyzing input text and generating linguistic information including at least accent phrase boundary positions, accent types, readings, and prosody control parameters corresponding to the input text; a step of selecting, based on the linguistic information, speech synthesis units corresponding to the input text as phoneme parameters and changing subsequent phoneme parameters according to an operation input during speech output; a step of generating prosody parameters from the linguistic information according to a prosody parameter generation rule; and a step of synthesizing and outputting speech based on the phoneme parameters and the prosody parameters.