JP2000221989A

JP2000221989A - Sound synthesizing device, regular sound synthesizing method, and memory medium

Info

Publication number: JP2000221989A
Application number: JP11019376A
Authority: JP
Inventors: Yukio Tabei; 幸雄田部井
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1999-01-28
Filing date: 1999-01-28
Publication date: 2000-08-11
Anticipated expiration: 2019-01-28
Also published as: JP4232254B2

Abstract

PROBLEM TO BE SOLVED: To provide a speech synthesizing device, a rule-based speech synthesizing method, and a storage medium capable of appropriately controlling pause lengths by a simple technique, without using any of the dependency information obtained in text analysis, and capable of producing synthesized speech with a natural sense of tempo. SOLUTION: This speech synthesizing device determines the pause lengths for rule-based speech synthesis and is equipped with: a just-before-pause breath group length computing part 210 for counting the length of the breath group just before a breath group boundary that is not a non-pause (that is, a boundary at which a pause occurs); a just-after-pause breath group length computing part 211 for counting the length of the breath group just after such a boundary; an after-two-pause breath group length computing part 212 for counting the length of the second breath group after such a boundary; a categorizing means 203 for categorizing the lengths of the respective breath groups; a pause-length learning/predicting part 204 for predicting the pause duration from the categorized breath group lengths by using a pause duration weight table obtained by Quantification Theory Type I; a learning/prediction control part 207 for controlling the pause-length learning/predicting part 204; and a pause length modifying part 206 for modifying the predicted pause length according to the speech rate.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speech synthesis device that synthesizes arbitrary speech by rule, a rule-based speech synthesis method, and a storage medium. In particular, it relates to a speech synthesis device, a rule-based speech synthesis method, and a storage medium with an improved pause length determination method for rule-based speech synthesis, capable of outputting highly natural synthetic speech under the rules of Japanese.

[0002]

2. Description of the Related Art

Speech synthesis methods include those that record the speech waveform itself and combine the recordings to produce an output waveform, and those that analyze speech into parameters representing its characteristics, record the parameters, and use a synthesizer at output time. Furthermore, depending on how the control unit and the waveform generation unit are combined, there are various speech synthesis methods such as recorded-speech editing, speech-unit editing, parameter editing, and synthesis by rule.

[0003] Among these, synthesis by rule converts a sequence expressed in discrete symbols, such as characters or phonetic symbols, into continuous speech. During the conversion, universal properties of speech production and artificial characteristics are applied as synthesis rules.

[0004] Text-to-speech synthesis converts written text (text data) into speech, and the text usually does not correspond one-to-one with its phonetic transcription. Therefore, linguistic processing such as morphological analysis and syntactic analysis is required to convert the input text into a sequence of phonetic symbols and to generate prosodic features automatically.

[0005] Conventionally, a text-to-speech conversion device that outputs text as speech consists of a text analysis unit and a rule-based speech synthesis unit (a parameter generation unit and a speech synthesis unit).

[0006] The text analysis unit takes as input a sentence written in mixed kanji and kana, performs morphological analysis with reference to a word dictionary, determines the reading, accent, and intonation, and outputs phonetic symbols with prosodic symbols (an intermediate language).

[0007] The parameter generation unit sets the pitch frequency pattern, the phoneme durations, and so on, and the speech synthesis unit performs the speech synthesis processing.

[0008] The speech synthesis unit selects, from speech data stored in advance, the speech synthesis units for the target phoneme sequence (intermediate language), and concatenates and modifies them according to the parameters determined by the parameter generation unit to synthesize speech.

[0009] Speech synthesis units include phonemes, syllables (CV), VCV, and CVC (C: consonant, V: vowel), as well as units that extend phoneme chains.

[0010] As a speech synthesis method, one known approach attaches pitch marks (reference points) to the speech waveform in advance, cuts out segments centered on those positions, and at synthesis time overlaps and adds the segments while shifting the pitch mark positions by the synthesis pitch period.
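
The pitch-mark-based overlap-add scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method of any cited device: the function name `overlap_add`, the choice of a Hann window, and the parameters `half_width` and `synth_period` are all introduced here for the example.

```python
import math
from typing import List


def overlap_add(signal: List[float], pitch_marks: List[int],
                half_width: int, synth_period: int) -> List[float]:
    """Cut a windowed segment around each pitch mark and re-place the
    segments `synth_period` samples apart, summing where they overlap.

    `half_width` must be at least 1.
    """
    if not pitch_marks:
        return []
    size = 2 * half_width + 1
    # Hann window (an assumed choice), peaking at the pitch mark.
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (size - 1))
              for i in range(size)]
    out = [0.0] * (size + synth_period * (len(pitch_marks) - 1))
    for k, mark in enumerate(pitch_marks):
        centre = half_width + k * synth_period  # new position of this mark
        for j in range(-half_width, half_width + 1):
            src = mark + j
            if 0 <= src < len(signal):          # skip samples outside the input
                out[centre + j] += signal[src] * window[j + half_width]
    return out
```

Choosing a `synth_period` shorter than the original spacing of the pitch marks packs the segments closer together, which raises the perceived pitch; a longer period lowers it.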

[0011] To output more natural synthetic speech with text-to-speech conversion configured as above, it is extremely important, along with the choice of speech unit, the unit quality, and the synthesis method, to control the parameters in the parameter generation unit (pitch frequency pattern, phoneme duration, pause, and amplitude) appropriately so that they approach natural speech. A pause is a short silent interval before or after a phrase (bunsetsu).

[0012] One pause setting method is described in Reference 1: Japanese Unexamined Patent Publication No. 6-59695 (ATR Interpreting Telephony Research Laboratories).

[0013]

Problems to Be Solved by the Invention

However, such a conventional pause length control method has the following problems.

[0014] That is, because the conventional method controls pause length using local dependency relations, implementing it requires a text analysis unit, and the text analysis unit must perform syntactic analysis to obtain the dependency relations. As a result, the storage capacity of the word dictionary and the overall processing load of the text analysis unit become large. In addition, any error in syntactic analysis directly degrades the result.

[0015] Thus, with the above conventional speech synthesis method, it was difficult to determine pause lengths by a simple method.

[0016] An object of the present invention is to provide a speech synthesis device, a rule-based speech synthesis method, and a storage medium that can control pause lengths appropriately by a simple technique, without using any dependency information from text analysis, and can produce synthetic speech with a natural sense of tempo.

[0017]

Means for Solving the Problems

A speech synthesis device according to the present invention is a speech synthesis device that obtains arbitrary synthetic speech by rule, comprising: means for detecting a breath group boundary that is not a non-pause (that is, a boundary at which a pause is inserted); first counting means for counting the length of the breath group immediately before that boundary; second counting means for counting the length of the breath group immediately after that boundary; third counting means for counting the length of the second breath group after that boundary; means for categorizing the first, second, and third breath group lengths; and pause length determining means for predicting the pause duration from the categorized breath group lengths using a pause duration weight table obtained by multivariate analysis.
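
The three counting means can be pictured with a short sketch. It assumes, purely for illustration, that an utterance has already been split into breath groups whose lengths are known (for example in morae) and that a pause boundary lies between every pair of consecutive groups; the function name `boundary_features` and the use of 0 for a missing group are assumptions, not details taken from this document.

```python
from typing import List, Tuple


def boundary_features(group_lengths: List[int]) -> List[Tuple[int, int, int]]:
    """Return (before, after, two_after) for every pause boundary.

    `group_lengths[i]` is the length of the i-th breath group; a pause
    boundary is assumed to sit between each pair of consecutive groups.
    """
    features = []
    for i in range(len(group_lengths) - 1):
        before = group_lengths[i]          # first counting means
        after = group_lengths[i + 1]       # second counting means
        # Third counting means; 0 marks "no such group" (an assumption).
        two_after = group_lengths[i + 2] if i + 2 < len(group_lengths) else 0
        features.append((before, after, two_after))
    return features
```

For an utterance with three breath groups there are two boundaries, so the function returns two feature triples, one per pause to be predicted.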

[0018] A speech synthesis device according to the present invention is a speech synthesis device that obtains arbitrary synthetic speech by rule, comprising: means for detecting a breath group boundary that is not a non-pause; first counting means for counting the length of the breath group immediately before that boundary; second counting means for counting the length of the breath group immediately after that boundary; means for categorizing the first and second breath group lengths; and pause length determining means for predicting the pause duration from the categorized breath group lengths using a pause duration weight table obtained by multivariate analysis.

[0019] A speech synthesis device according to the present invention is a speech synthesis device that obtains arbitrary synthetic speech by rule, comprising: first counting means for counting the length of the breath group immediately before a pause symbol; second counting means for counting the length of the breath group immediately after the pause symbol; third counting means for counting the length of the second breath group after the pause symbol; means for categorizing the first, second, and third breath group lengths; and pause length determining means for predicting the pause duration from the categorized breath group lengths using a pause duration weight table obtained by multivariate analysis.

[0020] A speech synthesis device according to the present invention is a speech synthesis device that obtains arbitrary synthetic speech by rule, comprising: first counting means for counting the length of the breath group immediately before a pause symbol; second counting means for counting the length of the breath group immediately after the pause symbol; means for categorizing the first and second breath group lengths; and pause length determining means for predicting the pause duration from the categorized breath group lengths using a pause duration weight table obtained by multivariate analysis.

[0021] A speech synthesis device according to the present invention is a speech synthesis device that obtains arbitrary synthetic speech by rule, comprising: means for detecting a breath group boundary that is not a non-pause; first counting means for counting the length of the breath group immediately before that boundary; second counting means for counting the length of the breath group immediately after that boundary; third counting means for counting the length of the second breath group after that boundary; and pause length determining means for predicting the pause duration from the respective breath group lengths using a pause duration weight table obtained by multivariate analysis.

[0022] A speech synthesis device according to the present invention is a speech synthesis device that obtains arbitrary synthetic speech by rule, comprising: means for detecting a breath group boundary that is not a non-pause; first counting means for counting the length of the breath group immediately before that boundary; second counting means for counting the length of the breath group immediately after that boundary; and pause length determining means for predicting the pause duration from the respective breath group lengths using a pause duration weight table obtained by multivariate analysis.

[0023] A speech synthesis device according to the present invention is a speech synthesis device that obtains arbitrary synthetic speech by rule, comprising: first counting means for counting the length of the breath group immediately before a pause symbol; second counting means for counting the length of the breath group immediately after the pause symbol; third counting means for counting the length of the second breath group after the pause symbol; and pause length determining means for predicting the pause duration from the respective breath group lengths using a pause duration weight table obtained by multivariate analysis.

[0024] A speech synthesis device according to the present invention is a speech synthesis device that obtains arbitrary synthetic speech by rule, comprising: first counting means for counting the length of the breath group immediately before a pause symbol; second counting means for counting the length of the breath group immediately after the pause symbol; and pause length determining means for predicting the pause duration from the respective breath group lengths using a pause duration weight table obtained by multivariate analysis.

[0025] The pause length determining means may include pause length correcting means that corrects the predicted pause length according to the speech rate and sets the corrected value.

[0026] Further, a rule-based speech synthesis method according to the present invention is a method for obtaining arbitrary synthetic speech by rule, in which the following steps are executed in sequence to determine the pause length for rule-based speech synthesis: a step of detecting a breath group boundary that is not a non-pause; a step of counting the length of the breath group immediately before that boundary; a step of counting the length of the breath group immediately after that boundary; a step of counting the length of the second breath group after that boundary; a step of categorizing each breath group length; and a step of predicting the pause duration from the categorized breath group lengths using a pause duration weight table obtained by multivariate analysis.

[0027] A rule-based speech synthesis method according to the present invention is a method for obtaining arbitrary synthetic speech by rule, in which the following steps are executed in sequence to determine the pause length for rule-based speech synthesis: a step of detecting a breath group boundary that is not a non-pause; a step of counting the length of the breath group immediately before that boundary; a step of counting the length of the breath group immediately after that boundary; a step of categorizing each breath group length; and a step of predicting the pause duration from the categorized breath group lengths using a pause duration weight table obtained by multivariate analysis.

[0028] A rule-based speech synthesis method according to the present invention is a method for obtaining arbitrary synthetic speech by rule, in which the following steps are executed in sequence to determine the pause length for rule-based speech synthesis: a step of counting the length of the breath group immediately before a pause symbol; a step of counting the length of the breath group immediately after the pause symbol; a step of counting the length of the second breath group after the pause symbol; a step of categorizing each breath group length; and a step of predicting the pause duration from the categorized breath group lengths using a pause duration weight table obtained by multivariate analysis.

[0029] A rule-based speech synthesis method according to the present invention is a method for obtaining arbitrary synthetic speech by rule, in which the following steps are executed in sequence to determine the pause length for rule-based speech synthesis: a step of counting the length of the breath group immediately before a pause symbol; a step of counting the length of the breath group immediately after the pause symbol; a step of categorizing each breath group length; and a step of predicting the pause duration from the categorized breath group lengths using a pause duration weight table obtained by multivariate analysis.

[0030] A rule-based speech synthesis method according to the present invention is a method for obtaining arbitrary synthetic speech by rule, in which the following steps are executed in sequence to determine the pause length for rule-based speech synthesis: a step of detecting a breath group boundary that is not a non-pause; a step of counting the length of the breath group immediately before that boundary; a step of counting the length of the breath group immediately after that boundary; a step of counting the length of the second breath group after that boundary; and a step of predicting the pause duration from the respective breath group lengths using a pause duration weight table obtained by multivariate analysis.

[0031] A rule-based speech synthesis method according to the present invention is a method for obtaining arbitrary synthetic speech by rule, in which the following steps are executed in sequence to determine the pause length for rule-based speech synthesis: a step of detecting a breath group boundary that is not a non-pause; a step of counting the length of the breath group immediately before that boundary; a step of counting the length of the breath group immediately after that boundary; and a step of predicting the pause duration from the respective breath group lengths using a pause duration weight table obtained by multivariate analysis.

[0032] A rule-based speech synthesis method according to the present invention is a method for obtaining arbitrary synthetic speech by rule, in which the following steps are executed in sequence to determine the pause length for rule-based speech synthesis: a step of counting the length of the breath group immediately before a pause symbol; a step of counting the length of the breath group immediately after the pause symbol; a step of counting the length of the second breath group after the pause symbol; and a step of predicting the pause duration from the respective breath group lengths using a pause duration weight table obtained by multivariate analysis.

[0033] A rule-based speech synthesis method according to the present invention is a method for obtaining arbitrary synthetic speech by rule, in which the following steps are executed in sequence to determine the pause length for rule-based speech synthesis: a step of counting the length of the breath group immediately before a pause symbol; a step of counting the length of the breath group immediately after the pause symbol; and a step of predicting the pause duration from the respective breath group lengths using a pause duration weight table obtained by multivariate analysis.

[0034] The rule-based speech synthesis method according to the present invention may, after predicting the pause duration, execute a step of correcting the predicted pause length according to the speech rate and setting the corrected value.

[0035] The multivariate analysis may be Quantification Theory Type I, which calculates a target external criterion from qualitative factors, or multiple regression analysis, which calculates an objective variable from a plurality of explanatory variables.
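
As background, Quantification Theory Type I amounts to least-squares regression on category (dummy) variables, so a prediction is a constant plus one weight per item, selected by the category that item falls into. The sketch below shows only that lookup-and-sum step. Every number in it (the category edges, the mean, and the weights) is a made-up placeholder, and the names `categorize` and `predict_pause_ms` are hypothetical; none of them come from this document.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence

# All numbers below are made-up placeholders, not values from the patent.
CATEGORY_EDGES = [5, 9, 13]      # lengths 0-4, 5-8, 9-12, 13+ (assumed bins)
MEAN_PAUSE_MS = 300.0            # constant term, e.g. mean training pause
WEIGHTS: Dict[str, List[float]] = {   # one weight per item and category
    "before":    [-40.0, -10.0, 20.0, 50.0],
    "after":     [-30.0,   0.0, 15.0, 30.0],
    "two_after": [ -5.0,   0.0,  5.0, 10.0],
}
ITEMS = ("before", "after", "two_after")


def categorize(length: int) -> int:
    """Map a breath group length to the index of its category."""
    return bisect_right(CATEGORY_EDGES, length)


def predict_pause_ms(lengths: Sequence[int]) -> float:
    """Constant plus the weight of the category each length falls into."""
    total = MEAN_PAUSE_MS
    for item, length in zip(ITEMS, lengths):
        total += WEIGHTS[item][categorize(length)]
    return total
```

Passing only two lengths uses just the "before" and "after" items, which mirrors the variants that omit the third breath group. With multiple regression instead of Quantification Theory Type I, the table lookup would be replaced by coefficients multiplied by the raw lengths.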

[0036] A storage medium according to the present invention is a storage medium storing a program that is read out by a computer as needed and executed on the computer, the program sequentially executing the rule-based speech synthesis method according to any one of claims 12 to 19.

[0037]

Embodiments of the Invention

Embodiments of the present invention will be described below with reference to the drawings.

First Embodiment

FIG. 1 is a block diagram showing the configuration of a speech synthesis device and a rule-based speech synthesis method according to a first embodiment of the present invention. The speech synthesis method according to this embodiment is an example applied to a text-to-speech conversion device that takes text data as input.

[0038] In FIG. 1, 101 is a text analysis unit, 102 is a word dictionary, 103 is a parameter generation unit, 104 is a speech synthesis unit, 105 is a speech unit dictionary, 106 is a speech input unit, and 107 is a speech unit creation unit.

[0039] The text analysis unit 101 takes as input a sentence written in mixed kanji and kana, performs morphological analysis with reference to the word dictionary 102, determines the reading, accent, and intonation, and outputs phonetic symbols with prosodic symbols (an intermediate language). Accent and intonation are most closely related to the temporal pattern of the pitch frequency. The pitch frequency pattern not only gives speech a natural, easy-to-hear tone but also marks the grouping of words and phrases, making spoken sentences easier to understand.

[0040] The word dictionary 102 is implemented in, for example, ROM or RAM, and stores a word dictionary and a word search table that defines the types of following words that can be grammatically connected.

[0041] The parameter generation unit 103 takes as input the phonetic symbols with prosodic symbols (intermediate language), supplied by the text analysis unit 101 or given directly, selects from the intermediate language itself the addresses of the units to be used in the speech unit dictionary 105, and sets the pitch frequency pattern, phoneme durations, amplitude, pause lengths, and so on. The parameter generation unit 103 is the main part of this speech synthesis method and is described in detail later with reference to FIGS. 2 and 3.

[0042] The speech unit dictionary 105 is created in advance by the speech unit creation unit 107 from input natural speech signals.

[0043] The speech signal input unit 106 consists of a communication port such as RS-232C, a floppy disk drive (FDD), and an internal buffer for storing data. The natural speech data from which the units used for speech synthesis are derived is input through the communication port or the FDD.

[0044] The speech unit creation unit 107 labels the natural speech data in advance and creates the units from which synthetic speech is built.

[0045] The speech synthesis unit 104 selects units from the speech unit dictionary 105 and synthesizes speech. Various conventional methods can be applied; for example, the waveform superposition method can be used.

[0046] The pitch period determined by the parameter generation unit 103 is realized as the unit concatenation period, and the phoneme duration is realized by compressing or stretching the units in the speech unit dictionary 105.

[0047] The word dictionary 102, the speech synthesis unit 104, the speech unit dictionary 105, and the speech unit creation unit 107 may be the same as those of the prior art.

[0048] FIG. 2 is a block diagram for explaining the pause length determination method for rule-based speech synthesis performed by the parameter generation unit 103.

[0049] In FIG. 2, 201 is the intermediate language; 210 is a just-before-pause breath group length calculation unit (first counting means) that counts the length of the breath group immediately before a breath group boundary that is not a non-pause; 211 is a just-after-pause breath group length calculation unit (second counting means) that counts the length of the breath group immediately after such a boundary; 212 is an after-two-pause breath group length calculation unit (third counting means) that counts the length of the second breath group after such a boundary; 203 is categorizing means that categorizes the breath group lengths 213 from the calculation units 210 to 212; 204 is a pause length learning/prediction unit (pause length determining means) that predicts the pause duration from the categorized breath group lengths using a pause duration weight table obtained by Quantification Theory Type I; 205 is learning data; 207 is a learning/prediction control unit that controls the pause length learning/prediction unit 204; 208 is speech rate data; 206 is a pause length correction unit (pause length correcting means) that corrects the pause length based on the speech rate data 208; and 209 is target pause length data. Quantification Theory Type I is described later.

[0050] FIG. 3 is a block diagram showing the detailed configuration of the pause length learning/prediction unit 204 and the pause length correction unit 206.

[0051] In FIG. 3, the pause length learning/prediction unit 204 consists of a Quantification Type I prediction model 301, a category quantity memory 302, and a Quantification Type I learning model 303, and the pause length correction unit 206 consists of a speech rate ratio calculation unit 305 and a pause length stretching unit 307. 207 is the learning/prediction control unit that controls the pause length learning/prediction unit 204.

【００５２】The categorized breath group lengths for learning 205 are input to the Quantification Type I learning model 303, and the categorized breath group lengths for prediction 213 are input to the Quantification Type I prediction model 301. The Quantification Type I prediction model 301 outputs estimated value data 304 to the pause length expansion/contraction unit 307. Meanwhile, the utterance speed data 208 is input to the utterance speed ratio calculation unit 305, which outputs an utterance speed ratio 306 to the pause length expansion/contraction unit 307. The pause length expansion/contraction unit 307 outputs the target pause length data 209.

【００５３】The operation of the pause length determination method in the speech synthesis method configured as described above is explained below.

【００５４】First, text data, such as a text file received over a PC communication service or a text file on a floppy disk (FD), is input through a communication port such as RS232C or through an FDD and temporarily stored in an internal buffer. Once a certain amount has accumulated, the text is sent to the text analysis unit 101 one unit at a time (for example, one sentence at a time).

【００５５】The text analysis unit 101 collates the text data against the word dictionary 102, which is held in ROM or RAM, generates phonological and prosodic symbols that describe the reading, accent, intonation, pauses, and similar information as a character string, and sends them to the parameter generation unit 103.

【００５６】The parameter generation unit 103 takes as input the phonetic symbols with prosodic symbols (the intermediate language) supplied by the text analysis unit 101. From the intermediate language itself it selects the segment addresses in the segment dictionary 105 that are to be used, and it sets the pitch frequency pattern, phoneme durations, amplitude, pause lengths, and so on. It then generates synthesis parameters comprising this information and sends them to the synthesized speech unit 104.

【００５７】The synthesized speech unit 104 generates speech waveform data based on the generated synthesis parameters and sends it to a D/A converter (not shown).

【００５８】Next, the pause length determination method for rule-based speech synthesis performed by the parameter generation unit 103 shown in FIG. 2 is described.

【００５９】A pause may be indicated explicitly by a symbol such as a comma (,). Alternatively, pauses need not be indicated explicitly: because breath group boundaries are the places where breaths are taken, the breath group boundaries themselves may be treated as pauses. When breath group boundaries are treated as pauses, a non-pause symbol may be adopted to indicate that no pause is inserted at a particular breath group boundary. Hereinafter, "pause" refers either to a pause explicitly indicated by a pause symbol or, when pauses are not indicated explicitly, to a breath group boundary that is not a non-pause.

【００６０】As shown in FIG. 2, the configuration comprises the intermediate language 201, the pre-pause breath group length calculation unit 210, the post-pause breath group length calculation unit 211, the two-after-pause breath group length calculation unit 212, the categorizing means 203, the pause length learning/prediction unit 204, the learning data 205, the learning/prediction control unit 207, the utterance speed data 208, the pause length correction unit 206, and the target pause length data 209.

【００６１】According to the instruction of the learning/prediction control unit 207, the operation is divided into (1) model learning and (2) prediction.

(1) Model learning

When the learning/prediction control unit 207 instructs model learning, the pre-pause breath group length calculation unit 210, the post-pause breath group length calculation unit 211, and the two-after-pause breath group length calculation unit 212 count the lengths of the breath groups immediately before, immediately after, and two after each pause. Length is counted in morae; reflecting the characteristics of Japanese, the moraic nasal (N), the geminate consonant (small tsu, ッ), and long vowels are each also counted as one mora. These breath group lengths are then categorized by the categorizing means 203 and used as the factors. The learning data 205 is created from these factors together with the measured pause lengths, which serve as the external criterion.
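The counting and categorizing steps described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each breath group is available as a plain katakana reading, and the helper names (`count_morae`, `categorize`) and the category boundaries are hypothetical, since the source does not state them.

```python
# Small kana combine with the preceding kana (e.g. キョ), so they add no mora.
# Assumption: the breath group is given as a plain katakana reading.
SMALL_KANA = set("ャュョァィゥェォヮ")


def count_morae(kana: str) -> int:
    """Count morae in a katakana string.

    The moraic nasal (ン), the geminate consonant (ッ) and the long-vowel
    mark (ー) are ordinary characters here, so each counts as one mora.
    """
    return sum(1 for ch in kana if ch not in SMALL_KANA)


# Hypothetical category upper bounds in morae; the source gives no values.
CATEGORY_UPPER_BOUNDS = [4, 8, 12, 16]


def categorize(length: int) -> int:
    """Map a breath group length (in morae) to a category index."""
    for index, upper in enumerate(CATEGORY_UPPER_BOUNDS):
        if length <= upper:
            return index
    return len(CATEGORY_UPPER_BOUNDS)  # everything longer than the last bound
```

With these assumed boundaries, `count_morae("ガッコー")` counts ガ, ッ, コ, and ー separately, and the resulting length falls in the first category.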

【００６２】The pause length learning unit 204 receives the learning data 205 and, with the measured pause length as the external criterion, learns the Quantification Type I model. Learning this model amounts to determining the category quantities.

【００６３】Quantification Type I is a form of multivariate analysis that computes the target external criterion from qualitative factors. It is formulated by equations (1) to (3) below.

【００６４】Let j be a factor item of the i-th data item, k the category to which it belongs, and x(jk) the category quantity (the coefficient assigned to that category). The predicted value y(i) is then given by equations (1) and (2).

【００６５】

【数１】 (Equation 1)

【００６６】The category quantities x(jk) are obtained by the least squares method. That is, as shown in equation (3), they are chosen so that the squared error between the predicted value y(i) and the measured value Y(i) is minimized.

【００６７】

【数２】 (Equation 2)

【００６８】To solve this, partially differentiating equation (3) with respect to x(jk) and setting the result to zero yields the normal equations.
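The equation images are not available in this text. For orientation, the standard formulation of Quantification Type I that matches the surrounding definitions can be written as below. The indicator symbol \delta_i(jk), the error sum Q, the indices u and v, and the assignment of equation numbers (1) and (2) are assumptions for illustration, not taken from the source.

```latex
\begin{align}
y(i) &= \sum_{j} \sum_{k} x(jk)\,\delta_i(jk) \tag{1} \\
\delta_i(jk) &=
  \begin{cases}
    1 & \text{if the $i$-th data item falls in category $k$ of item $j$} \\
    0 & \text{otherwise}
  \end{cases} \tag{2} \\
Q &= \sum_{i} \bigl( Y(i) - y(i) \bigr)^{2} \;\to\; \min \tag{3}
\end{align}

% Setting the partial derivative with respect to each x(uv) to zero
% gives one normal equation per category (u, v):
\begin{equation*}
\frac{\partial Q}{\partial x(uv)} = 0
\;\Longrightarrow\;
\sum_{j} \sum_{k} \Bigl[ \sum_{i} \delta_i(uv)\,\delta_i(jk) \Bigr] x(jk)
  = \sum_{i} \delta_i(uv)\, Y(i)
\end{equation*}
```

Under this reading, the bracketed sum counts how often categories (u, v) and (j, k) occur together in the learning data, so the normal equations form a linear system in the unknown category quantities.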

【００６９】In an actual computer implementation, this reduces to the numerical problem of solving a system of simultaneous equations.

(2) Prediction

When the learning/prediction control unit 207 instructs prediction, the pre-pause breath group length calculation unit 210, the post-pause breath group length calculation unit 211, and the two-after-pause breath group length calculation unit 212 count the breath group lengths 213 from the input phoneme symbol string of the intermediate language 201. As with the learning data, length is counted in morae; reflecting the characteristics of Japanese, the moraic nasal (N), the geminate consonant (small tsu, ッ), and long vowels are each also counted as one mora.

【００７０】The categorizing means 203 then classifies the breath group lengths into the same categories used at learning time, producing factor data (the categorized breath group lengths) as qualitative data, which is input to the pause length prediction unit 204. The pause length prediction unit 204 uses the category quantities determined at learning time to predict the pause length, which is the external criterion.
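The prediction step just described amounts to a table lookup and a sum. The sketch below assumes the category quantities are stored as a nested list `weights[j][k]` holding x(jk); the function name and the numeric values are hypothetical and only illustrate the arithmetic.

```python
def predict_pause_ms(categories, weights):
    """Sum the category quantity x(jk) selected by each factor item j.

    categories: one category index k per factor item j, for example
                (before-pause, after-pause, two-after-pause).
    weights:    weights[j][k] = x(jk), as determined at learning time.
    """
    return sum(weights[j][k] for j, k in enumerate(categories))


# Hypothetical weight table: three factor items with two categories each.
example_weights = [
    [100.0, 120.0],  # breath group immediately before the pause
    [50.0, 80.0],    # breath group immediately after the pause
    [10.0, 30.0],    # breath group two after the pause
]
estimate = predict_pause_ms((1, 0, 1), example_weights)  # 120 + 50 + 30
```

Because each factor contributes exactly one table entry, prediction costs one lookup per factor plus the additions that combine them.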

【００７１】The pause length correction unit 206 then corrects the pause length based on the utterance speed data 208 to obtain the target pause length data 209.

【００７２】The first embodiment is characterized in that the factor data consists of the length (in morae) of the breath group immediately before the pause, the length (in morae) of the breath group immediately after the pause, and the length (in morae) of the breath group two after the pause.

【００７３】Next, the operation of the pause length learning/prediction unit 204 and the pause length correction unit 206 is described in detail with reference to FIG. 3.

【００７４】This operation is likewise divided, according to the instruction of the learning/prediction control unit 207, into (1) learning and (2) prediction.

(1) Learning

When the learning/prediction control unit 207 instructs learning, the Quantification Type I learning model 303 is started with the categorized breath group lengths for learning (the factors) and the measured pause value data 205. It solves equation (3) and stores the resulting category quantities in the category quantity memory 302.

(2) Prediction

When the learning/prediction control unit 207 instructs prediction, the categorized breath group length (factor) data for prediction 213 is first computed from the input phoneme symbol string, which is the intermediate language. The Quantification Type I prediction model 301 is then started on this data, using the category quantities in the category quantity memory 302 as weighting coefficients, and evaluates equation (1) to obtain the estimated value data 304 for the pause length.
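As a rough illustration of the learning step (solving the least squares problem and storing the category quantities), the following sketch builds the normal equations from categorized training samples and solves them by Gaussian elimination. It is an assumption-laden sketch, not the patent's procedure: the function names are hypothetical, and category 0 of every factor item except the first is pinned to zero, a common device to make the solution unique that the source does not mention.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: a category is unobserved or collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_category_weights(samples, pauses, n_categories):
    """Least squares fit of category quantities x(jk).

    samples:      list of tuples, one category index per factor item.
    pauses:       measured pause lengths Y(i), same order as samples.
    n_categories: number of categories for each factor item.
    Assumption: x(j0) = 0 for every item j >= 1 (reference categories).
    """
    columns = [(j, k) for j, n in enumerate(n_categories)
               for k in range(n) if j == 0 or k > 0]
    col_index = {jk: c for c, jk in enumerate(columns)}
    m = len(columns)
    A = [[0.0] * m for _ in range(m)]  # D^T D, D being the 0/1 design matrix
    b = [0.0] * m                      # D^T Y
    for cats, y in zip(samples, pauses):
        active = [col_index[(j, k)] for j, k in enumerate(cats)
                  if (j, k) in col_index]
        for r in active:
            b[r] += y
            for c in active:
                A[r][c] += 1.0
    weights = [[0.0] * n for n in n_categories]
    for (j, k), value in zip(columns, solve_linear(A, b)):
        weights[j][k] = value
    return weights
```

The returned nested list plays the role of the category quantity memory: at prediction time, one entry per factor item is looked up and the entries are summed.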

【００７５】The pause length correction unit 206 comprises the utterance speed ratio calculation unit 305 and the pause length expansion/contraction unit 307. In operation, the current utterance speed is input, and the utterance speed ratio calculation unit 305 computes its ratio 306 to the reference utterance speed used at learning time and outputs the ratio to the pause length expansion/contraction unit 307. The pause length expansion/contraction unit 307 multiplies the pause length estimated value data 304 by a constant determined from the utterance speed ratio 306 to obtain the target pause length data 209.
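A minimal sketch of this correction step follows. The source says only that the estimate is multiplied by a constant determined from the utterance speed ratio; taking that constant to be the reciprocal of the ratio (faster speech, shorter pauses) is an assumption made here for illustration, as are the function names and the units.

```python
def utterance_speed_ratio(current_rate: float, reference_rate: float) -> float:
    """Ratio of the current utterance speed to the reference speed used at learning."""
    if current_rate <= 0 or reference_rate <= 0:
        raise ValueError("utterance speeds must be positive")
    return current_rate / reference_rate


def scale_pause_ms(estimated_ms: float, ratio: float) -> float:
    """Scale the estimated pause length by a constant derived from the ratio.

    Assumption: the constant is 1 / ratio; the source does not specify the mapping.
    """
    return estimated_ms / ratio


# Hypothetical rates in morae per second: 10 now versus 8 at learning time.
ratio = utterance_speed_ratio(10.0, 8.0)
target_ms = scale_pause_ms(200.0, ratio)
```

Any other monotone mapping from the ratio to the scaling constant would fit the description equally well.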

【００７６】As described above, the speech synthesis device according to the first embodiment comprises: the pre-pause breath group length calculation unit 210, which counts the length of the breath group immediately before a breath group boundary that is not a non-pause; the post-pause breath group length calculation unit 211, which counts the length of the breath group immediately after such a boundary; the two-after-pause breath group length calculation unit 212, which counts the length of the breath group two after such a boundary; the categorizing means 203, which categorizes each breath group length; the pause length learning/prediction unit 204, which predicts the pause duration from the categorized breath group lengths using a pause duration weight table obtained by Quantification Type I; the learning/prediction control unit 207, which controls the pause length learning/prediction unit 204; and the pause length correction unit 206, which corrects the predicted pause length according to the utterance speed. Because the pause length for rule-based speech synthesis is determined in this way, an appropriate pause length can be obtained even at a more compact level of rule-based speech synthesis that does not necessarily require text analysis, the overall sense of rhythm is improved, and synthesized speech with a natural tempo can be obtained.

【００７７】The effects of this embodiment are described concretely below.

【００７８】Training was performed on one female speaker from the ATR 503-sentence database. Over 671 valid comma data items, a correlation was found between the pause length and the lengths of the breath groups immediately before, immediately after, and two after the pause. The estimation achieved a multiple correlation coefficient of 0.562 and a mean square error of 97.7 msec against the measured values.

【００７９】This makes it possible to control pause lengths appropriately by a simple method, without using any dependency information from text analysis, and to obtain synthesized speech with a natural tempo.

【００８０】Second Embodiment

FIG. 4 is a block diagram for explaining the pause length determination method for rule-based speech synthesis performed by the parameter generation unit of the speech synthesis device and rule-based speech synthesis method according to the second embodiment of the present invention. In describing the speech synthesis method of this embodiment, components identical to those in FIG. 2 are given the same reference numerals.

【００８１】This pause length determination method can be applied as the pause length determination method of the parameter generation unit 103 in FIG. 1.

【００８２】In FIG. 4, 201 is the intermediate language; 401 is a pre-pause breath group length calculation unit (first counting means) that counts the length of the breath group immediately before a breath group boundary that is not a non-pause; 402 is a post-pause breath group length calculation unit (second counting means) that counts the length of the breath group immediately after such a boundary; 203 is the categorizing means that categorizes the breath group lengths 213 output by the calculation units 401 and 402; 204 is the pause length learning/prediction unit that predicts the pause duration from the categorized breath group lengths using a pause duration weight table obtained by Quantification Type I; 205 is the learning data; 207 is the learning/prediction control unit; 208 is the utterance speed data; 206 is the pause length correction unit; and 209 is the target pause length data.

【００８３】The detailed configuration of the pause length learning/prediction unit 204 and the pause length correction unit 206 is the same as in the block diagram of FIG. 3.

【００８４】The operation of the pause length determination method in the speech synthesis method configured as described above is explained below.

【００８５】A pause may be indicated explicitly by a symbol such as a comma (,). Alternatively, pauses need not be indicated explicitly: because breath group boundaries are the places where breaths are taken, the breath group boundaries themselves may be treated as pauses. When breath group boundaries are treated as pauses, a non-pause symbol may be adopted to indicate that no pause is inserted at a particular breath group boundary. Hereinafter, "pause" refers either to a pause explicitly indicated by a pause symbol or, when pauses are not indicated explicitly, to a breath group boundary that is not a non-pause.

【００８６】As shown in FIG. 4, the configuration comprises the intermediate language 201, the pre-pause breath group length calculation unit 401, the post-pause breath group length calculation unit 402, the categorizing means 203, the pause length learning/prediction unit 204, the learning data 205, the learning/prediction control unit 207, the utterance speed data 208, the pause length correction unit 206, and the target pause length data 209.

【００８７】According to the instruction of the learning/prediction control unit 207, the operation is divided into (1) model learning and (2) prediction.

(1) Model learning

When the learning/prediction control unit 207 instructs model learning, the pre-pause breath group length calculation unit 401 and the post-pause breath group length calculation unit 402 count the lengths of the breath groups immediately before and immediately after each pause. Length is counted in morae; reflecting the characteristics of Japanese, the moraic nasal (N), the geminate consonant (small tsu, ッ), and long vowels are each also counted as one mora. These breath group lengths are then categorized by the categorizing means 203 and used as the factors. The learning data 205 is created from these factors together with the measured pause lengths, which serve as the external criterion.

【００８８】The pause length learning unit 204 receives the learning data 205 and, with the measured pause length as the external criterion, learns the Quantification Type I model (that is, determines the category quantities).

(2) Prediction

When the learning/prediction control unit 207 instructs prediction, the pre-pause breath group length calculation unit 401 and the post-pause breath group length calculation unit 402 count the breath group lengths 213 from the input phoneme symbol string of the intermediate language 201. As with the learning data, length is counted in morae; reflecting the characteristics of Japanese, the moraic nasal (N), the geminate consonant (small tsu, ッ), and long vowels are each also counted as one mora.

【００８９】The categorizing means 203 then classifies the breath group lengths into the same categories used at learning time, producing factor data (the categorized breath group lengths) as qualitative data, which is input to the pause length prediction unit 204. The pause length prediction unit 204 uses the category quantities determined at learning time to predict the pause length, which is the external criterion.

【００９０】As shown in FIG. 3, the pause length correction unit 206 comprises the utterance speed ratio calculation unit 305 and the pause length expansion/contraction unit 307. In operation, the current utterance speed is input, and the utterance speed ratio calculation unit 305 computes its ratio 306 to the reference utterance speed used at learning time and outputs the ratio to the pause length expansion/contraction unit 307. The pause length expansion/contraction unit 307 multiplies the pause length estimated value data 304 by a constant determined from the utterance speed ratio 306 to obtain the target pause length data 209.

【００９１】The second embodiment is characterized in that the factor data consists of the length (in morae) of the breath group immediately before the pause and the length (in morae) of the breath group immediately after the pause.

【００９２】As described above, the speech synthesis device and rule-based speech synthesis method according to the second embodiment comprise: the pre-pause breath group length calculation unit 401, which counts the length of the breath group immediately before a breath group boundary that is not a non-pause; the post-pause breath group length calculation unit 402, which counts the length of the breath group immediately after such a boundary; the categorizing means 203, which categorizes each breath group length; the pause length learning/prediction unit 204, which predicts the pause duration from the categorized breath group lengths using a pause duration weight table obtained by Quantification Type I; the learning/prediction control unit 207, which controls the pause length learning/prediction unit 204; and the pause length correction unit 206, which corrects the predicted pause length according to the utterance speed. Because the pause length for rule-based speech synthesis is determined in this way, the pause length can be controlled appropriately with a simple configuration, as in the first embodiment, and synthesized speech with a natural tempo can be obtained.

【００９３】The effects of this embodiment are described concretely below.

【００９４】Training was performed on one female speaker from the ATR 503-sentence database. Over 671 valid comma data items, the estimation achieved a multiple correlation coefficient of 0.44 and a mean square error of 106 msec against the measured values.

【００９５】Accordingly, as in the first embodiment, the pause length can be controlled appropriately with a simple configuration, and synthesized speech with a natural tempo can be obtained.

【００９６】In addition to these effects, the second embodiment in particular reduces the category quantity memory 302, which holds the weighting coefficients, to 2/3, and likewise reduces the amount of computation at prediction time, such as additions, to 2/3, without greatly lowering the pause length estimation accuracy compared with the first embodiment. This allows a lower-cost and more compact implementation.
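The 2/3 figure follows from dropping one of three factors. As a worked illustration, assume (the source does not say so explicitly) that every factor item uses the same number of categories C:

```latex
\begin{equation*}
\frac{\text{table entries with two factors}}{\text{table entries with three factors}}
  = \frac{2C}{3C} = \frac{2}{3}
\end{equation*}
```

Under the same assumption, each prediction combines two category quantities instead of three, which is consistent with the stated reduction in additions.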

【００９７】Third Embodiment

FIG. 5 is a block diagram for explaining the pause length determination method for rule-based speech synthesis performed by the parameter generation unit of the speech synthesis device and rule-based speech synthesis method according to the third embodiment of the present invention. In describing the speech synthesis method of this embodiment, components identical to those in FIG. 2 are given the same reference numerals.

【００９８】This pause length determination method can be applied as the pause length determination method of the parameter generation unit 103 in FIG. 1.

【００９９】In FIG. 5, 201 is the intermediate language; 210 is the pre-pause breath group length calculation unit (first counting means) that counts the length of the breath group immediately before a breath group boundary that is not a non-pause; 211 is the post-pause breath group length calculation unit (second counting means) that counts the length of the breath group immediately after such a boundary; 212 is the two-after-pause breath group length calculation unit (third counting means) that counts the length of the breath group two after such a boundary; 501 is a pause length learning/prediction unit (pause length determining means) that predicts the pause duration from the breath group lengths using a pause duration weight table obtained by multiple regression analysis; 505 is learning data; 207 is the learning/prediction control unit; 208 is the utterance speed data; 206 is the pause length correction unit; and 209 is the target pause length data.

【０１００】FIG. 6 is a block diagram showing the detailed configuration of the pause length learning/prediction unit 501 and the pause length correction unit 206. Components identical to those in FIG. 3 are given the same reference numerals.

【０１０１】In FIG. 6, the pause length learning/prediction unit 501 comprises a multiple regression prediction model 601, a regression coefficient memory 602, and a multiple regression learning model 603. The pause length correction unit 206 comprises the utterance speed ratio calculation unit 305 and the pause length expansion/contraction unit 307. Reference numeral 207 denotes the learning/prediction control unit that controls the pause length learning/prediction unit 204.

【０１０２】The breath group lengths for learning 505 are input to the multiple regression learning model 603, and the breath group lengths for prediction 600 are input to the multiple regression prediction model 601. The multiple regression prediction model 601 outputs the estimated value data 304 to the pause length expansion/contraction unit 307. Meanwhile, the utterance speed data 208 is input to the utterance speed ratio calculation unit 305, which outputs the utterance speed ratio 306 to the pause length expansion/contraction unit 307. The pause length expansion/contraction unit 307 outputs the target pause length data 209.

【０１０３】The operation of the pause length determination method in the speech synthesis method configured as described above is explained below.

【０１０４】A pause may be indicated explicitly by a symbol such as a comma (,). Alternatively, pauses need not be indicated explicitly: because breath group boundaries are the places where breaths are taken, the breath group boundaries themselves may be treated as pauses. When breath group boundaries are treated as pauses, a non-pause symbol may be adopted to indicate that no pause is inserted at a particular breath group boundary. Hereinafter, "pause" refers either to a pause explicitly indicated by a pause symbol or, when pauses are not indicated explicitly, to a breath group boundary that is not a non-pause.

【０１０５】As shown in FIG. 5, the configuration comprises the intermediate language 201, the pre-pause breath group length calculation unit 210, the post-pause breath group length calculation unit 211, the two-after-pause breath group length calculation unit 212, the pause length learning/prediction unit 501, the learning data 205, the learning/prediction control unit 207, the utterance speed data 208, the pause length correction unit 206, and the target pause length data 209.

[0106] The operation is divided, according to the instruction of the learning/prediction control unit 207, into (1) model learning and (2) prediction.

(1) Model learning
When the learning/prediction control unit 207 instructs model learning, the lengths of the breath groups immediately before, immediately after, and two after each pause are counted from the database. Length is counted in morae; in view of the characteristics of Japanese, the moraic nasal (N), the geminate consonant (ッ), and the long vowel are each counted as one beat. The learning data 505 is then created with these breath group lengths as the explanatory variables and the measured pause length as the objective variable.
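The mora-counting rule above can be sketched as follows. This is only an illustration: it assumes the input is a plain katakana string, whereas the actual phoneme symbols of the intermediate language are not specified here, and the function name `count_morae` is hypothetical.

```python
# Hypothetical sketch: count morae (beats) in a katakana string.
# Assumption: the input is plain katakana; the real intermediate-language
# phoneme symbols of the device are not given in the description.

# Small kana attach to the preceding kana and do not form a mora of their own.
SMALL_KANA = set("ャュョァィゥェォヮ")


def count_morae(kana: str) -> int:
    """Return the number of morae in a katakana string.

    The moraic nasal (ン), the geminate marker (ッ), and the long-vowel
    mark (ー) are ordinary characters here, so each counts as one beat.
    """
    return sum(1 for ch in kana if ch not in SMALL_KANA)


if __name__ == "__main__":
    print(count_morae("ガッコー"))  # beats: ガ / ッ / コ / ー
    print(count_morae("キョーワ"))  # beats: キョ / ー / ワ
```

Treating ッ, ン, and ー as ordinary characters is what makes each of them contribute one beat; only the small glide and vowel kana are skipped.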

[0107] The learning data 505 is input to the pause length learning unit 501, which learns a multiple regression model with the measured pause length as the objective variable and determines the regression coefficients.

[0108] Multiple regression analysis is a form of multivariate analysis that calculates an objective variable from a plurality of explanatory variables. It is formulated by the following equations (4) to (6).

[0109] When the objective variable y is modeled from n items of learning data using the explanatory variables x1, x2, ..., xp, the predicted value for the i-th data item is given by equation (4).

[0110]

[Equation 3]

[0111] Here, a0, a1, ..., ap are chosen so as to minimize the sum of squares of the prediction errors ei.

[0112] The solution reduces to a numerical problem: equation (5) is partially differentiated with respect to each of a0, a1, ..., ap, each derivative is set to zero, and the resulting simultaneous equations are solved.

[0113]

[Equation 4]

[0114] Letting α0, α1, ..., αp be the solution (the regression coefficients) for a0, a1, ..., ap that minimizes the sum of squares of the prediction errors ei, the prediction model is given by equation (6).

[0115]

[Equation 5]
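The equation images are not reproduced in this text. Assuming the description refers to ordinary least-squares multiple regression, equations (4) to (6) would take roughly the following form. The symbols x_{ij}, \hat{y}_i, and S do not appear in the text and are introduced only for illustration; the numbering follows the later passages, which solve equation (5) during learning and evaluate the linear sum of equation (6) during prediction.

```latex
\begin{align}
\hat{y}_i &= a_0 + a_1 x_{i1} + a_2 x_{i2} + \cdots + a_p x_{ip},
  \qquad e_i = y_i - \hat{y}_i \tag{4}\\
S &= \sum_{i=1}^{n} e_i^{2}
   = \sum_{i=1}^{n} \bigl( y_i - a_0 - a_1 x_{i1} - \cdots - a_p x_{ip} \bigr)^{2} \tag{5}\\
\hat{y} &= \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_p x_p \tag{6}
\end{align}
% Setting each partial derivative of (5) to zero gives p + 1 simultaneous equations:
\[
\frac{\partial S}{\partial a_j} = 0 \qquad (j = 0, 1, \ldots, p).
\]
```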

[0116] (2) Prediction
When the learning/prediction control unit 207 instructs prediction, the breath group lengths 600 are counted from the input phoneme symbol string of the intermediate language 201 by the just-before-pause breath group length calculation unit 210, the just-after-pause breath group length calculation unit 211, and the two-after-pause breath group length calculation unit 212. As in learning, length is counted in morae, and in view of the characteristics of Japanese, the moraic nasal (N), the geminate consonant (ッ), and the long vowel are each counted as one beat.
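The three counting units can be pictured with a small sketch. It assumes the breath group lengths (in morae) are already available as a list, that every boundary between adjacent breath groups is a pause (no non-pause symbols), and that a missing "two after" group is represented by 0. None of these choices comes from the source, and `pause_features` is a hypothetical name.

```python
from typing import List, Tuple


def pause_features(group_lengths: List[int]) -> List[Tuple[int, int, int]]:
    """Return one (before, after, two_after) triple per pause.

    group_lengths[i] is the mora count of the i-th breath group; the pause
    considered sits between group i and group i + 1.
    Assumption: a missing 'two after' group is encoded as 0.
    """
    features = []
    for i in range(len(group_lengths) - 1):
        before = group_lengths[i]
        after = group_lengths[i + 1]
        two_after = group_lengths[i + 2] if i + 2 < len(group_lengths) else 0
        features.append((before, after, two_after))
    return features


if __name__ == "__main__":
    # Four breath groups give three pauses.
    print(pause_features([5, 8, 3, 6]))
```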

[0117] The pause length learning/prediction unit 501 receives the breath group lengths 600, reads out the regression coefficients determined during learning, computes the linear sum of equation (6), and thereby predicts the pause length, which is the objective variable.

[0118] The pause length correction unit 206 receives the current utterance speed and corrects the pause length.

[0119] The third embodiment is characterized in that the factor data are the length of the breath group immediately before the pause (in morae), the length of the breath group immediately after the pause (in morae), and the length of the breath group two after the pause (in morae).

[0120] Next, the operations of the pause length learning/prediction unit 501 and the pause length correction unit 206 are described in detail with reference to FIG. 6.

[0121] This operation is likewise divided, according to the instruction of the learning/prediction control unit 207, into (1) learning and (2) prediction.

(1) Learning
When the learning/prediction control unit 207 instructs learning, the multiple regression learning model 603 is activated on the learning data 505, which consists of the breath group lengths for learning (explanatory variables) and the measured pause length data. Equation (5) is solved, and the resulting regression coefficients are stored in the regression coefficient memory 602.

(2) Prediction
When the learning/prediction control unit 207 instructs prediction, the breath group length (explanatory variable) data 600 for prediction is first calculated from the input phoneme symbol string, which is the intermediate language. The multiple regression prediction model 601 is then activated: it reads the regression coefficients from the regression coefficient memory 602, evaluates equation (6) on the breath group length data 600, and obtains the pause length estimate data 304.
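The learning and prediction steps can be sketched in plain Python. This is a minimal illustration, not the device's implementation. It assumes ordinary least squares solved through the normal equations; the function names and the toy data are hypothetical, and the returned coefficient list stands in for the regression coefficient memory 602.

```python
from typing import List, Sequence


def fit_regression(xs: Sequence[Sequence[float]], ys: Sequence[float]) -> List[float]:
    """Least-squares fit; returns [a0, a1, ..., ap] from the normal equations."""
    rows = [[1.0, *x] for x in xs]  # leading 1.0 carries the intercept a0
    m = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[j] * r[k] for r in rows) for k in range(m)]
        + [sum(r[j] * y for r, y in zip(rows, ys))]
        for j in range(m)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[j][m] / aug[j][j] for j in range(m)]


def predict_pause(coeffs: Sequence[float], x: Sequence[float]) -> float:
    """Linear sum a0 + a1*x1 + ... + ap*xp."""
    return coeffs[0] + sum(a * v for a, v in zip(coeffs[1:], x))


if __name__ == "__main__":
    # Made-up data (not from the patent): breath group lengths in morae
    # (before, after, two after) and pause lengths from an invented rule.
    toy_x = [(5, 8, 3), (8, 3, 6), (3, 6, 4), (7, 7, 2), (4, 9, 9)]
    toy_y = [100 + 10 * a + 5 * b - 2 * c for a, b, c in toy_x]
    memory = fit_regression(toy_x, toy_y)  # plays the role of memory 602
    print(memory)
    print(predict_pause(memory, (6, 5, 4)))
```

Because the toy targets are generated from an exact linear rule, the fit should recover that rule up to rounding. Real measured pause lengths would leave a residual error.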

[0122] The pause length correction unit 206 consists of the utterance speed ratio calculation unit 305 and the pause length expansion/contraction unit 307. It first receives the current utterance speed. The utterance speed ratio calculation unit 305 obtains the ratio 306 of this speed to the reference utterance speed used during learning, and the pause length expansion/contraction unit 307 multiplies the pause length estimate data 304 by a constant determined from the utterance speed ratio 306.
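A minimal sketch of this correction follows. The source says only that the multiplier is "determined from" the utterance speed ratio; the sketch assumes the simplest choice, namely that the multiplier equals the reference speed divided by the current speed, so that faster speech shortens pauses proportionally. The function name and parameters are hypothetical.

```python
def correct_pause(predicted: float, current_rate: float, reference_rate: float) -> float:
    """Scale a predicted pause length by the utterance speed ratio.

    Assumption: multiplier = reference_rate / current_rate. The source does
    not state how the constant is derived from the ratio.
    """
    if current_rate <= 0 or reference_rate <= 0:
        raise ValueError("utterance speeds must be positive")
    ratio = reference_rate / current_rate
    return predicted * ratio


if __name__ == "__main__":
    # Speaking twice as fast as the learning-time reference.
    print(correct_pause(300.0, current_rate=16.0, reference_rate=8.0))
```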

[0123] As described above, the speech synthesis device according to the third embodiment comprises: the just-before-pause breath group length calculation unit 210, which counts the length of the breath group immediately before a breath group boundary that is not a non-pause; the just-after-pause breath group length calculation unit 211, which counts the length of the breath group immediately after that boundary; the two-after-pause breath group length calculation unit 212, which counts the length of the breath group two after that boundary; the pause length learning/prediction unit 501, which predicts the pause time length from the breath group lengths using a pause time length weight table obtained by multiple regression analysis; the learning/prediction control unit 207, which controls the pause length learning/prediction unit 501; and the pause length correction unit 206, which corrects the predicted pause length according to the utterance speed. Because the pause length for rule-based speech synthesis is determined in this way, the pause length can be controlled appropriately by a simple method, as in the first embodiment, without using dependency information or the like from text analysis, and synthesized speech with a natural sense of tempo can be obtained.

[0124] In particular, because the third embodiment uses a multiple regression model, the breath group lengths are not categorized. Changes in pause length caused by changes in the explanatory variables can therefore be controlled more finely, enabling highly accurate pause length prediction.

[0125] Fourth Embodiment
FIG. 7 is a block diagram for explaining the method of determining the pause length for rule-based speech synthesis performed by the parameter generation unit of the speech synthesis device and rule-based speech synthesis method according to the fourth embodiment of the present invention. In the description of the speech synthesis method according to this embodiment, components identical to those in FIG. 4 are given the same reference numerals.

[0126] This pause length determination method can be applied to the pause length determination performed by the parameter generation unit 103 of FIG. 1.

[0127] In FIG. 7, 201 denotes the intermediate language; 401, the just-before-pause breath group length calculation unit (first counting means), which counts the length of the breath group immediately before a breath group boundary that is not a non-pause; 402, the just-after-pause breath group length calculation unit (second counting means), which counts the length of the breath group immediately after that boundary; 501, the pause length learning/prediction unit, which predicts the pause time length from the breath group lengths using a pause time length weight table obtained by multiple regression analysis; 505, the learning data; 207, the learning/prediction control unit; 208, the utterance speed data; 206, the pause length correction unit; and 209, the target pause length data.

[0128] The detailed configurations of the pause length learning/prediction unit 501 and the pause length correction unit 206 are the same as in the block diagram of FIG. 6.

[0129] The operation of the pause length determination method in the speech synthesis method configured as above is described below.

[0130] A pause may be indicated explicitly by a symbol such as a comma (,). Alternatively, because a breath group boundary is a point where the speaker takes a breath, pauses need not be marked explicitly and each breath group boundary may be treated as a pause. When breath group boundaries are treated as pauses, a non-pause symbol may be used to indicate that no pause is inserted at a particular breath group boundary. In the following, a pause means either a pause explicitly indicated by a pause symbol or, when pauses are not marked explicitly, a breath group boundary that is not a non-pause.

[0131] As shown in FIG. 6, the configuration comprises the intermediate language 201, the just-before-pause breath group length calculation unit 401, the just-after-pause breath group length calculation unit 402, the pause length learning/prediction unit 501, the learning data 505, the learning/prediction control unit 207, the utterance speed data 208, the pause length correction unit 206, and the target pause length data 209.

[0132] The operation is divided, according to the instruction of the learning/prediction control unit 207, into (1) model learning and (2) prediction.

(1) Model learning
When the learning/prediction control unit 207 instructs learning, the lengths of the breath groups immediately before and immediately after each pause are counted. Length is counted in morae, and in view of the characteristics of Japanese, the moraic nasal (N), the geminate consonant (ッ), and the long vowel are each counted as one beat. The breath group lengths immediately before and after the pause, together with the measured pause length, form the learning data, which is input to the pause length learning unit 501. With the measured pause length as the objective variable, equation (5) is solved and the multiple regression model is learned (the regression coefficients are determined).

(2) Prediction
When the learning/prediction control unit 207 instructs prediction, the breath group lengths are counted from the input phoneme symbol string of the intermediate language 201 by the just-before-pause breath group length calculation unit 401 and the just-after-pause breath group length calculation unit 402. As in learning, length is counted in morae, and the moraic nasal (N), the geminate consonant (ッ), and the long vowel are each counted as one beat. The pause length prediction unit 501 receives these breath group lengths, reads out the regression coefficients determined during learning from the regression coefficient memory 602, activates the multiple regression prediction model 601, evaluates equation (6), and predicts the pause length, which is the objective variable.

[0133] The pause length correction unit 206 receives the current utterance speed and corrects the pause length.

[0134] The fourth embodiment is characterized in that the factor data are the length of the breath group immediately before the pause (in morae) and the length of the breath group immediately after the pause (in morae).

[0135] As described above, the speech synthesis device according to the fourth embodiment comprises: the just-before-pause breath group length calculation unit 401, which counts the length of the breath group immediately before a breath group boundary that is not a non-pause; the just-after-pause breath group length calculation unit 402, which counts the length of the breath group immediately after that boundary; the pause length learning/prediction unit 501, which predicts the pause time length from the breath group lengths using a pause time length weight table obtained by multiple regression analysis; the learning/prediction control unit 207, which controls the pause length learning/prediction unit 501; and the pause length correction unit 206, which corrects the predicted pause length according to the utterance speed. Because the pause length for rule-based speech synthesis is determined in this way, the pause length can be controlled appropriately with a simple configuration, as in the preceding embodiments, and synthesized speech with a natural sense of tempo can be obtained. In addition, because multiple regression analysis is used, there is no quantization error from categorizing the factors, and estimation accuracy improves.

[0136] Furthermore, in addition to these effects, the fourth embodiment in particular reduces the regression coefficient memory 602, which serves as the weight memory, to 2/3, and also reduces the number of multiplications and additions at prediction time to 2/3, without greatly lowering the pause length estimation accuracy compared with the third embodiment. The device can therefore be made more compact at lower cost.
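The 2/3 figure can be checked with a short count. The count assumes, although the source does not state it, that the intercept is not counted among the stored weights and that each explanatory variable costs one multiplication and one addition in the linear sum.

```latex
\[
\underbrace{\alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3}_{\text{third embodiment: 3 weights, 3 multiplications, 3 additions}}
\quad\longrightarrow\quad
\underbrace{\alpha_0 + \alpha_1 x_1 + \alpha_2 x_2}_{\text{fourth embodiment: 2 weights, 2 multiplications, 2 additions}}
\]
\[
\frac{\text{fourth embodiment}}{\text{third embodiment}} = \frac{2}{3}
\quad \text{for weights, multiplications, and additions alike.}
\]
```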

[0137] In the speech synthesis method and device of each embodiment above, breath group boundaries that are not non-pauses are detected, and the breath group lengths around those boundaries are counted. Alternatively, a non-pause symbol indicating that no pause is inserted at a particular breath group boundary may be adopted, and the breath group lengths around each pause symbol may be counted; the same effect is obtained.

[0138] In the first and second embodiments, the length of each breath group is calculated and then categorized, but the calculation and the categorization may equally be performed at the same time.

[0139] The pause length determination method for rule-based speech synthesis in each of the above embodiments may be implemented in software on a general-purpose computer, or in a dedicated hardware device (for example, a text-to-speech LSI). Such software may also be stored on a recording medium such as a floppy disk or CD-ROM, read out as needed, and executed on a general-purpose computer.

[0140] The speech synthesis method and device of each embodiment above can be applied to any speech synthesis method that takes text data as input. More generally, they may be any speech synthesis method or device that obtains arbitrary synthesized speech by rule, and may form part of a circuit built into various terminals.

[0141] Furthermore, the number of dictionaries and circuit units constituting the speech synthesis method and device of each embodiment above, the form of the model, and so on are not limited to those of the embodiments described.

[0142]

[Effects of the Invention] The speech synthesis device and rule-based speech synthesis method according to the present invention comprise: means for detecting a breath group boundary that is not a non-pause; first counting means for counting the length of the breath group immediately before that boundary; second counting means for counting the length of the breath group immediately after that boundary; means for categorizing the first and second breath group lengths; and pause length determining means for predicting the pause time length from the categorized breath group lengths using a pause time length weight table obtained by quantification class 1. The pause length can therefore be controlled appropriately by a simple method without using any dependency information or the like from text analysis, and synthesized speech with a natural sense of tempo can be obtained.

[0143] The speech synthesis device and rule-based speech synthesis method according to the present invention also comprise: means for detecting a breath group boundary that is not a non-pause; first counting means for counting the length of the breath group immediately before that boundary; second counting means for counting the length of the breath group immediately after that boundary; and pause length determining means for predicting the pause time length from the breath group lengths using a pause time length weight table obtained by multiple regression analysis. Changes in pause length caused by changes in the explanatory variables can therefore be controlled more finely, highly accurate pause length prediction becomes possible, and synthesized speech with a more natural sense of tempo can be obtained.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a speech synthesis device and rule-based speech synthesis method according to a first embodiment to which the present invention is applied.

FIG. 2 is a block diagram for explaining the method of determining the pause length for rule-based speech synthesis performed by the parameter generation unit of the above speech synthesis device and rule-based speech synthesis method.

FIG. 3 is a block diagram showing the detailed configuration of the pause length learning/prediction unit and the pause length correction unit of the above speech synthesis device and rule-based speech synthesis method.

FIG. 4 is a block diagram for explaining the method of determining the pause length for rule-based speech synthesis performed by the parameter generation unit of a speech synthesis device and rule-based speech synthesis method according to a second embodiment to which the present invention is applied.

FIG. 5 is a block diagram for explaining the method of determining the pause length for rule-based speech synthesis performed by the parameter generation unit of a speech synthesis device and rule-based speech synthesis method according to a third embodiment to which the present invention is applied.

FIG. 6 is a block diagram showing the detailed configuration of the pause length learning/prediction unit and the pause length correction unit of the above speech synthesis device and rule-based speech synthesis method.

FIG. 7 is a block diagram for explaining the method of determining the pause length for rule-based speech synthesis performed by the parameter generation unit of a speech synthesis device and rule-based speech synthesis method according to a fourth embodiment to which the present invention is applied.

[Explanation of symbols]

101: text analysis unit
102: word dictionary
103: parameter generation unit
104: speech synthesis unit
105: speech unit dictionary
106: speech input unit
107: speech unit creation unit
201: intermediate language
203: categorizing means
204, 501: pause length learning/prediction unit (pause length determining means)
205, 505: learning data
206: pause length correction unit (pause length correcting means)
207: learning/prediction control unit
208: utterance speed data
209: target pause length data
210, 401: just-before-pause breath group length calculation unit (first counting means)
211, 402: just-after-pause breath group length calculation unit (second counting means)
212: two-after-pause breath group length calculation unit (third counting means)
301: quantification class 1 prediction model
302: category quantity memory
303: quantification class 1 learning model
305: utterance speed ratio calculation unit
307: pause length expansion/contraction unit
601: multiple regression prediction model
602: regression coefficient memory
603: multiple regression learning model

Claims

[Claims]

1. A speech synthesizer for obtaining arbitrary synthesized speech by rule, comprising: means for detecting a breath group boundary that is not a non-pause; first counting means for counting the length of the breath group immediately before the breath group boundary that is not a non-pause; second counting means for counting the length of the breath group immediately after that boundary; third counting means for counting the length of the breath group two after that boundary; means for categorizing the first, second, and third breath group lengths; and pause length determining means for predicting a pause time length from the categorized breath group lengths using a pause time length weight table obtained by multivariate analysis.

2. A speech synthesizer for obtaining arbitrary synthesized speech by rule, comprising: means for detecting a breath group boundary that is not a non-pause; first counting means for counting the length of the breath group immediately before the breath group boundary that is not a non-pause; second counting means for counting the length of the breath group immediately after that boundary; means for categorizing the first and second breath group lengths; and pause length determining means for predicting a pause time length from the categorized breath group lengths using a pause time length weight table obtained by multivariate analysis.

3. A speech synthesizer for obtaining arbitrary synthesized speech by rule, comprising: first counting means for counting the length of the breath group immediately before a pause symbol; second counting means for counting the length of the breath group immediately after the pause symbol; third counting means for counting the length of the breath group two after the pause symbol; means for categorizing the first, second, and third breath group lengths; and pause length determining means for predicting a pause time length from the categorized breath group lengths using a pause time length weight table obtained by multivariate analysis.

4. A speech synthesizer for obtaining arbitrary synthesized speech by rule, comprising: first counting means for counting the length of the breath group immediately before a pause symbol; second counting means for counting the length of the breath group immediately after the pause symbol; means for categorizing the first and second breath group lengths; and pause length determining means for predicting a pause time length from the categorized breath group lengths using a pause time length weight table obtained by multivariate analysis.

5. A speech synthesizer for obtaining arbitrary synthesized speech by rule, comprising: means for detecting a breath group boundary that is not a non-pause; first counting means for counting the length of the breath group immediately before the breath group boundary that is not a non-pause; second counting means for counting the length of the breath group immediately after that boundary; third counting means for counting the length of the breath group two after that boundary; and pause length determining means for predicting a pause time length from the breath group lengths using a pause time length weight table obtained by multivariate analysis.

6. A speech synthesizer for obtaining arbitrary synthesized speech by rule, comprising: means for detecting a breath group boundary that is not a non-pause; first counting means for counting the length of the breath group immediately before the breath group boundary that is not a non-pause; second counting means for counting the length of the breath group immediately after that boundary; and pause length determining means for predicting a pause time length from the breath group lengths using a pause time length weight table obtained by multivariate analysis.

7. A speech synthesizer for obtaining arbitrary synthesized speech by rule, comprising: first counting means for counting the length of the breath group immediately before a pause symbol; second counting means for counting the length of the breath group immediately after the pause symbol; third counting means for counting the length of the breath group two after the pause symbol; and pause length determining means for predicting a pause time length from the breath group lengths using a pause time length weight table obtained by multivariate analysis.

8. A speech synthesizer for obtaining arbitrary synthesized speech by rule, comprising: first counting means for counting the length of the breath group immediately before a pause symbol; second counting means for counting the length of the breath group immediately after the pause symbol; and pause length determining means for predicting a pause time length from the breath group lengths using a pause time length weight table obtained by multivariate analysis.

9. The speech synthesizer according to claim 1, wherein the multivariate analysis is quantification class 1, which calculates a target external criterion based on qualitative factors.

10. The speech synthesizer according to claim 5, wherein the multivariate analysis is a multiple regression analysis that calculates an objective variable based on a plurality of explanatory variables.

11. The speech synthesizer according to claim 1, wherein the pause length determining means includes pause length correcting means for correcting and setting the predicted pause length value in accordance with an utterance speed.

12. A rule speech synthesis method for obtaining arbitrary synthesized speech according to a rule, wherein the following steps are sequentially executed to determine a pause length for rule speech synthesis: a step of detecting a non-pause exhalation paragraph boundary; a step of counting the exhalation paragraph length immediately before the non-pause exhalation paragraph boundary; a step of counting the exhalation paragraph length immediately after the non-pause exhalation paragraph boundary; a step of counting the length of the second exhalation paragraph after the non-pause exhalation paragraph boundary; a step of categorizing each of the exhalation paragraph lengths; and a step of predicting a pause time length from the categorized exhalation paragraph lengths using a pause time length weight table obtained by multivariate analysis.

13. A rule speech synthesis method for obtaining arbitrary synthesized speech according to a rule, wherein the following steps are sequentially executed to determine a pause length for rule speech synthesis: a step of detecting a non-pause exhalation paragraph boundary; a step of counting the exhalation paragraph length immediately before the non-pause exhalation paragraph boundary; a step of counting the exhalation paragraph length immediately after the non-pause exhalation paragraph boundary; a step of categorizing each of the exhalation paragraph lengths; and a step of predicting a pause time length from the categorized exhalation paragraph lengths using a pause time length weight table obtained by multivariate analysis.

14. A rule speech synthesis method for obtaining arbitrary synthesized speech according to a rule, wherein the following steps are sequentially executed to determine a pause length for rule speech synthesis: a step of counting the exhalation paragraph length immediately before a pause symbol; a step of counting the exhalation paragraph length immediately after the pause symbol; a step of counting the length of the second exhalation paragraph after the pause symbol; a step of categorizing each of the exhalation paragraph lengths; and a step of predicting a pause time length from the categorized exhalation paragraph lengths using a pause time length weight table obtained by multivariate analysis.

15. A rule speech synthesis method for obtaining arbitrary synthesized speech according to a rule, wherein the following steps are sequentially executed to determine a pause length for rule speech synthesis: a step of counting the exhalation paragraph length immediately before a pause symbol; a step of counting the exhalation paragraph length immediately after the pause symbol; a step of categorizing each of the exhalation paragraph lengths; and a step of predicting a pause time length from the categorized exhalation paragraph lengths using a pause time length weight table obtained by multivariate analysis.

16. A rule speech synthesis method for obtaining arbitrary synthesized speech according to a rule, wherein the following steps are sequentially executed to determine a pause length for rule speech synthesis: a step of detecting a non-pause exhalation paragraph boundary; a step of counting the exhalation paragraph length immediately before the non-pause exhalation paragraph boundary; a step of counting the exhalation paragraph length immediately after the non-pause exhalation paragraph boundary; a step of counting the length of the second exhalation paragraph after the non-pause exhalation paragraph boundary; and a step of predicting a pause time length from each of the exhalation paragraph lengths using a pause time length weight table obtained by multivariate analysis.

17. A rule speech synthesis method for obtaining arbitrary synthesized speech according to a rule, wherein the following steps are sequentially executed to determine a pause length for rule speech synthesis: a step of detecting a non-pause exhalation paragraph boundary; a step of counting the exhalation paragraph length immediately before the non-pause exhalation paragraph boundary; a step of counting the exhalation paragraph length immediately after the non-pause exhalation paragraph boundary; and a step of predicting a pause time length from each of the exhalation paragraph lengths using a pause time length weight table obtained by multivariate analysis.

18. A rule speech synthesis method for obtaining arbitrary synthesized speech according to a rule, wherein the following steps are sequentially executed to determine a pause length for rule speech synthesis: a step of counting the exhalation paragraph length immediately before a pause symbol; a step of counting the exhalation paragraph length immediately after the pause symbol; a step of counting the length of the second exhalation paragraph after the pause symbol; and a step of predicting a pause time length from each of the exhalation paragraph lengths using a pause time length weight table obtained by multivariate analysis.

19. A rule speech synthesis method for obtaining arbitrary synthesized speech according to a rule, wherein the following steps are sequentially executed to determine a pause length for rule speech synthesis: a step of counting the exhalation paragraph length immediately before a pause symbol; a step of counting the exhalation paragraph length immediately after the pause symbol; and a step of predicting a pause time length from each of the exhalation paragraph lengths using a pause time length weight table obtained by multivariate analysis.

20. The rule speech synthesis method according to any one of the above, wherein the multivariate analysis is quantification class 1, which calculates a target external criterion based on qualitative factors.

21. The rule speech synthesis method according to claim 16, wherein the multivariate analysis is a multiple regression analysis for calculating an objective variable based on a plurality of explanatory variables.

22. The rule speech synthesis method according to claim 12, wherein, after the pause time length is predicted, a step of correcting and setting the predicted pause length value in accordance with an utterance speed is executed.

23. A storage medium storing a program that is read out by a computer as needed and executed on the computer, wherein the program causes the computer to sequentially execute the steps of the rule speech synthesis method according to claim 12.
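
For illustration only, the steps recited in the method claims (categorizing the exhalation paragraph lengths, predicting the pause time length from a weight table, and correcting the prediction for utterance speed) might be sketched in Python as follows. This is a minimal sketch and not the claimed implementation. The category boundaries, the weight values, the mean pause length, the unit of length (assumed here to be morae), and the division by a rate factor are all hypothetical choices made for the example; none of them is taken from the specification. The additive form (a constant plus one category weight per factor) follows the usual shape of a quantification class 1 predictor.

```python
from bisect import bisect_right

# Hypothetical category boundaries for an exhalation paragraph length
# (assumed to be counted in morae). Four categories: <5, 5-9, 10-14, >=15.
CATEGORY_EDGES = [5, 10, 15]

# Hypothetical pause time length weight table (milliseconds), one weight
# per category for each factor. In practice such weights would be learned
# from recorded speech by multivariate analysis; these are made-up values.
WEIGHTS = {
    "before": [-40.0, -10.0, 15.0, 35.0],
    "after": [-30.0, -5.0, 10.0, 25.0],
    "two_after": [-10.0, 0.0, 5.0, 5.0],
}

# Hypothetical constant term (mean pause length of the training data).
MEAN_PAUSE_MS = 300.0


def categorize(length):
    """Map an exhalation paragraph length to a category index."""
    return bisect_right(CATEGORY_EDGES, length)


def predict_pause_ms(before, after, two_after=None):
    """Predict a pause length as the constant plus one weight per factor.

    `two_after` is optional, mirroring the claims that use only the
    lengths immediately before and immediately after the pause.
    """
    total = MEAN_PAUSE_MS
    total += WEIGHTS["before"][categorize(before)]
    total += WEIGHTS["after"][categorize(after)]
    if two_after is not None:
        total += WEIGHTS["two_after"][categorize(two_after)]
    return total


def correct_for_rate(pause_ms, rate=1.0):
    """Scale the predicted pause by the utterance speed (assumed rule).

    `rate` > 1 means faster speech and therefore a shorter pause.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return pause_ms / rate


if __name__ == "__main__":
    predicted = predict_pause_ms(before=12, after=4, two_after=20)
    print(predicted, correct_for_rate(predicted, rate=2.0))
```

Passing only `before` and `after` corresponds to the two-factor variants of the claims, and replacing the table lookup with a linear function of the raw lengths would correspond to the multiple regression variant.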