JP3310226B2

JP3310226B2 - Voice synthesis method and apparatus

Info

Publication number: JP3310226B2
Application number: JP29503398A
Authority: JP
Inventors: 西村洋文 (Hirofumi Nishimura); 望月亮 (Ryo Mochizuki); 蓑輪利光 (Toshimitsu Minowa)
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1998-10-16
Filing date: 1998-10-16
Publication date: 2002-08-05
Anticipated expiration: 2018-10-16
Also published as: JP2000122683A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a speech synthesis method and apparatus, and more particularly to a speech synthesis method and apparatus for converting text into speech.

[0002]

[Prior art] Conventional speech synthesis methods that synthesize speech by concatenating speech segments, such as the one described in JP-A-10-171484, have built the speech segment database without regard to the pitch pattern that is the control target.

[0003]

[Problems to be solved by the invention] In the conventional speech synthesis method, however, there is no guarantee that the speech segment database contains a speech segment whose pitch pattern resembles the control-target pitch pattern. When the control-target pitch pattern differs greatly from the pitch pattern that a speech segment originally has, the degradation caused by pitch modification becomes conspicuous and lowers the quality of the synthesized speech.

[0004] An object of the present invention is to provide a speech synthesis method that prevents such sound quality degradation due to pitch modification, and to provide a speech synthesis apparatus in which the ambiguity of the speech segment database, viewed from the standpoint of sound quality degradation due to pitch modification, has been removed.

[0005]

[Means for solving the problems] To solve the above problems, in the present invention the control-target pitch pattern is determined by a template, the template is divided into intervals of the synthesis unit, and for each interval the pitch height and shape are determined in advance and classified into categories. The speech segment database is then configured so that, for every phoneme chain, at least one speech segment of each of these categories always exists. When a sentence is synthesized, the templates are modified so that the pitch becomes lower toward the end of the sentence, and the modified templates are combined and used as the template for the whole sentence. The levels by which a template is lowered are classified into several classes; for each level, the category to which the pitch height and shape of the speech segment belong is determined in advance, and speech segments are selected according to this category classification. As a result, it can be guaranteed that the speech segment database contains a speech segment whose pitch pattern resembles the control-target pitch pattern, so sound quality degradation due to pitch modification can be suppressed and the quality of the synthesized speech improved. In addition, a finite set of templates can handle any sentence, including long ones.

[0006]

[Embodiments of the invention] The invention according to claim 1 is a speech synthesis method of the waveform superposition type, in which speech segments of a synthesis unit such as CV, CV/VC, VCV, or CV/VCV are selected from a database, the pitch of the selected speech segments is modified according to a template that is a coarse pattern of the pitch, and the modified speech segments are concatenated to synthesize speech. The method is characterized in that the template is divided into intervals of the synthesis unit; each interval is classified into a category according to its pitch height and shape; the speech segment database is configured so that all of these categories are covered for each group of speech segments sharing the same phoneme chain; when a sentence is synthesized, the templates are modified so that the pitch becomes lower toward the end of the sentence, and the modified templates are combined and used as the template for the whole sentence; the levels by which a template is lowered are classified into several classes, and for each level the category to which the pitch height and shape of the speech segment belong is determined in advance; and speech segments are selected according to this category classification. Because the maximum amount of pitch modification can be limited, the worst-case sound quality degradation due to pitch modification can be bounded, and a finite set of templates can handle any sentence, including long ones.

[0007] The invention according to claim 2 is a speech synthesis apparatus comprising: input means for inputting the reading of the speech to be synthesized; phoneme chain determination means for determining, from the input reading, the phoneme chains of speech segments of a synthesis unit such as CV, CV/VC, VCV, or CV/VCV; a speech segment database in which the pitch height and shape of the speech segments are classified into categories for each phoneme chain; a pitch template database recording pitch templates, each of which is a coarse pattern of the pitch of a word or sentence; prosody determination means that, for pitch control, determines the pitch pattern according to a pitch template; a category selection table recording, for each interval obtained by dividing a pitch template into intervals of the synthesis unit, which category the pitch height and shape belong to; speech segment selection means for selecting speech segments according to the category selection table; speech segment modification means for modifying the pitch of the speech segments according to the pitch template; speech segment concatenation means for concatenating the modified speech segments; and synthesized speech output means for outputting the synthesized speech. The apparatus is characterized in that the pitch templates are set to the length of a word or compound word; sentence pitch template generation means is provided which, when a sentence is synthesized, modifies these pitch templates so that they become lower toward the end of the sentence and combines the modified pitch templates to generate a pitch template for the sentence; and a plurality of category selection tables are prepared, one for each class of the level by which the pitch is lowered in each pitch template. Because speech segments are selected after being narrowed down by pitch shape category to those that suffer little degradation from pitch modification, selection is easy and the processing load at synthesis time is reduced. Furthermore, because sentence-level pitch templates and category selection tables, which would be enormous in size, are no longer needed, the size of the pitch control data of the speech synthesis apparatus can be reduced.

[0008]

[0009]

[0010]

[0011]

[0012] Embodiments of the present invention are described below with reference to FIGS. 1 to 8.

(Embodiment 1) Embodiment 1 of the present invention is now described in detail. To synthesize speech, speech segments for the target phonemes are first collected, the prosody of these segments (pitch, rhythm, and so on) is modified, and the segments are then concatenated. If the pitch pattern that a speech segment originally has differs greatly from the control-target pitch pattern, pitch modification degrades the sound quality. As noted above, the conventional speech synthesis method offered no guarantee that the speech segment database contained a segment close to the desired pitch pattern, so such degradation could occur.

[0013] Embodiment 1 solves the above problem. FIGS. 1, 2, and 3 illustrate the concept for the case where the speech segment unit is VCV (vowel-consonant-vowel). FIG. 1 is an example of a pitch template used as the control target for the pitch pattern; 101 to 106 indicate the fundamental frequency at each mora position. To synthesize speech, the pitch of each VCV segment is modified so that the vowel portions at both ends match the pitch template, and the segments are then concatenated. To ensure that the speech segment database always contains segments resembling the control-target pitch pattern, the pitch shapes that occur in the pitch templates are first classified. FIG. 2 is an example of this classification. In FIG. 2, the horizontal axis is the fundamental frequency of the preceding vowel portion of a VCV speech segment and the vertical axis is the fundamental frequency of the following vowel portion. The pitch shape of speech segment interval 111 in FIG. 1 has a preceding vowel at 150 Hz and a following vowel at 300 Hz, so it corresponds to position 201 in FIG. 2. Likewise, speech segment intervals 112 to 115 in FIG. 1 correspond to positions 202 to 205 in FIG. 2. This is done for all pitch templates. The pitch shapes are then grouped into categories, as indicated by 211 to 214 in FIG. 2. Finally, as shown in FIG. 3, a table is created that records, for each pitch template, the correspondence between speech segment intervals and pitch shape categories.
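
The table-building step described above can be sketched as follows. This is only an illustration under stated assumptions: the patent does not give the boundaries of categories 211 to 214, so the single 200 Hz boundary, the category names, and every template value except the first interval (150 Hz to 300 Hz) are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical boundary (Hz) that splits each axis of the (V1, V2) plane in two.
# The real boundaries of categories 211-214 are not given in the text.
BOUNDARY_HZ = 200.0


def intervals(template: List[float]) -> List[Tuple[float, float]]:
    """Split a per-mora F0 template into VCV intervals (preceding F0, following F0)."""
    return list(zip(template[:-1], template[1:]))


def _level(f0: float) -> str:
    """Map an F0 value to one of two hypothetical bands."""
    return "high" if f0 >= BOUNDARY_HZ else "low"


def category(v1: float, v2: float) -> str:
    """Assign a pitch shape category from the F0 of the two vowel portions."""
    return f"{_level(v1)}-{_level(v2)}"


def build_table(templates: Dict[str, List[float]]) -> Dict[str, List[str]]:
    """For each template, record the category of every interval (cf. FIG. 3)."""
    return {
        name: [category(v1, v2) for v1, v2 in intervals(f0)]
        for name, f0 in templates.items()
    }


# Six mora positions (like 101-106) give five intervals (like 111-115).
templates = {"template_A": [150.0, 300.0, 280.0, 250.0, 180.0, 150.0]}
table = build_table(templates)
```

Each table entry lines up with one speech segment interval of the template, so a selection step only needs the template name and the interval index to find the category to look for.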

[0014] Next, the speech segment database is constructed so that, for every phoneme chain, there is always at least one speech segment located at the center of each category determined as in FIG. 2. At synthesis time, speech segments are selected according to the table of FIG. 3. Because a speech segment whose pitch shape is close to the control-target pitch pattern is then always present in the database, sound quality degradation due to pitch modification can be limited.
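
The coverage requirement on the database (at least one segment per category for every phoneme chain) lends itself to a simple check. The sketch below is an assumption-laden illustration: the function name, the category labels, and the example phoneme chains are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def missing_categories(
    segments: List[Tuple[str, str]],  # (phoneme chain, category) of each stored segment
    phoneme_chains: Set[str],
    categories: Set[str],
) -> Dict[str, Set[str]]:
    """Return, per phoneme chain, the categories that have no segment in the database."""
    covered: Dict[str, Set[str]] = {chain: set() for chain in phoneme_chains}
    for chain, cat in segments:
        if chain in covered:
            covered[chain].add(cat)
    return {
        chain: categories - cats
        for chain, cats in covered.items()
        if categories - cats
    }


# Hypothetical category labels and phoneme chains.
categories = {"low-high", "high-high", "high-low", "low-low"}
db = [("aka", c) for c in categories] + [("asa", "low-high")]
gaps = missing_categories(db, {"aka", "asa"}, categories)
```

An empty result would mean the database satisfies the condition; here the chain "asa" is reported because only one of its four categories is stored.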

[0015] Although Embodiment 1 uses pitch templates as the pitch control method, the same effect is obtained if the pitch pattern is generated by a model, the pitch shapes of the pitch patterns that the model generates are classified into categories, and the speech segment database is built so as to include speech segments of these categories.

[0016] (Embodiment 2) Next, Embodiment 2 of the present invention is described in detail. FIG. 4 is a conceptual diagram of the speech synthesis method of Embodiment 2. When the pitch of a speech segment is modified, a change of about 0.3 octave causes little audible degradation. Taking advantage of this, if the pitch shape categories are defined so that the range of pitch modification is 0.3 octave or less, speech can be synthesized in which the degradation due to pitch modification is not audible. In FIG. 4, V1 denotes the fundamental frequency of the preceding vowel portion and V2 that of the following vowel portion.

[0017] In FIG. 4, both the V1 and V2 axes are graduated every 0.3 octave, starting from 50 Hz. The bold outlines 401 to 413 indicate the pitch shape categories. For example, category 401 covers V1 from 173 Hz to 261 Hz and V2 from 261 Hz to 396 Hz. If a speech segment exists at the center of this category, with V1 at 212 Hz and V2 at 322 Hz, then any pitch modification within the range of the category stays within 0.3 octave, so the degradation due to pitch modification is not audible. Therefore, by ensuring that the speech segment database holds speech segments whose pitch shapes lie at the centers of these categories, a speech synthesis method can be realized in which sound quality degradation due to pitch modification is not audible.
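
The figures quoted for category 401 can be checked numerically. The short sketch below uses only the frequencies given in the text; the helper name `octaves` is introduced here for illustration. It measures how far, in octaves, the edges of the category lie from its center.

```python
import math


def octaves(f_from: float, f_to: float) -> float:
    """Signed pitch distance in octaves between two frequencies in Hz."""
    return math.log2(f_to / f_from)


# Category 401 as described in the text: (lower edge, center, upper edge) in Hz.
v1_low, v1_center, v1_high = 173.0, 212.0, 261.0
v2_low, v2_center, v2_high = 261.0, 322.0, 396.0

# Largest shift needed to move the center segment to any edge of the category.
worst_shift = max(
    abs(octaves(v1_center, v1_low)),
    abs(octaves(v1_center, v1_high)),
    abs(octaves(v2_center, v2_low)),
    abs(octaves(v2_center, v2_high)),
)
```

With the rounded frequencies from the text, the largest shift comes out at roughly 0.3 octave, which is consistent with the claim that a segment at the center of the category never needs a larger pitch change.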

[0018] (Embodiment 3) Next, Embodiment 3, which corresponds to claim 1 of the present invention, is described in detail. FIG. 5 is a conceptual diagram of the speech synthesis method of Embodiment 3. In FIG. 5, 511, 521, and 531 are pitch templates representing the pitch patterns of words or compound words. To synthesize a sentence, these templates are combined into a pitch template for the sentence. Because the pitch in a sentence tends to fall toward the end, the word or compound-word templates are lowered progressively toward the end of the sentence, for example by factors of 0.9 and 0.8, and are then combined to form the sentence pitch template.

[0019] In FIG. 5, pitch patterns 512 to 514 represent pitch template 511 scaled by factors from 0.9 down to 0.7. Likewise, pitch templates 522 to 524 and 532 to 534 represent pitch templates 521 and 531, respectively, scaled by factors from 0.9 down to 0.7. To synthesize a sentence, combining, for example, pitch templates 511, 522, and 533 yields a pitch pattern in which the pitch falls toward the end of the sentence.
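
The combination of 511, 522, and 533 amounts to scaling word-level templates by 1.0, 0.9, and 0.8 and concatenating them. A minimal sketch follows; the function names and the F0 values of the three templates are hypothetical, and only the scale factors come from the text.

```python
from typing import List


def lower(template: List[float], factor: float) -> List[float]:
    """Scale every F0 value of a word-level template by the given factor."""
    return [f0 * factor for f0 in template]


def sentence_template(words: List[List[float]], factors: List[float]) -> List[float]:
    """Concatenate word-level templates, each lowered by its own factor."""
    if len(words) != len(factors):
        raise ValueError("one factor per word template is required")
    sentence: List[float] = []
    for template, factor in zip(words, factors):
        sentence.extend(lower(template, factor))
    return sentence


# Hypothetical word-level templates standing in for 511, 521 and 531 (Hz per mora).
t511 = [150.0, 300.0, 280.0]
t521 = [160.0, 260.0]
t531 = [200.0, 180.0, 150.0]

# 511 unchanged, 521 scaled by 0.9 (i.e. 522), 531 scaled by 0.8 (i.e. 533).
f0 = sentence_template([t511, t521, t531], [1.0, 0.9, 0.8])
```

Only the unscaled templates need to be stored; the lowered variants are derived on demand, which is the property Embodiment 6 relies on to keep the template data small.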

[0020] The pitch shape categories, meanwhile, are defined by classifying the preceding vowel portion and the following vowel portion each as high (H), mid (N), or low (L), giving nine pitch shape categories in total. For example, in pitch template 511, interval 514 belongs to category N-H. The pitch templates 512 to 514, which are created by lowering pitch template 511, are classified into pitch shape categories in the same way. The speech segment database holds the central speech segment of each of these categories. In this way, the pitch pattern of any sentence can be generated from a finite set of pitch templates, and since speech segments resembling that pitch pattern are held in the speech segment database, speech with any pitch pattern, including that of sentences, can be synthesized while suppressing sound quality degradation due to pitch modification.
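
The nine-way H/N/L classification, and the way an interval's category can change once its template is lowered, can be sketched as below. The band edges (150 Hz and 250 Hz) and the example interval are assumptions; the patent names the three bands but does not give their limits.

```python
from itertools import product

# Hypothetical band edges in Hz; the text does not specify them.
LOW_MAX_HZ = 150.0
HIGH_MIN_HZ = 250.0


def band(f0: float) -> str:
    """Map an F0 value to 'L' (low), 'N' (mid) or 'H' (high)."""
    if f0 < LOW_MAX_HZ:
        return "L"
    if f0 >= HIGH_MIN_HZ:
        return "H"
    return "N"


def shape_category(v1: float, v2: float) -> str:
    """Pitch shape category from preceding (v1) and following (v2) vowel F0."""
    return f"{band(v1)}-{band(v2)}"


# Three bands for each of the two vowel portions give nine categories.
ALL_CATEGORIES = [f"{a}-{b}" for a, b in product("HNL", repeat=2)]

# One hypothetical interval, classified again for each lowering factor.
interval = (200.0, 300.0)
by_level = {
    factor: shape_category(interval[0] * factor, interval[1] * factor)
    for factor in (1.0, 0.9, 0.8, 0.7)
}
```

Under these assumed band edges the same interval moves from N-H to N-N and then to L-N as the template is lowered, which is why the category has to be looked up per lowering level rather than once per template.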

[0021] (Embodiment 4) Next, Embodiment 4 of the present invention is described in detail. FIG. 6 is a block diagram of the speech synthesis apparatus of Embodiment 4. 601 is reading input means, 602 is phoneme chain determination means, 603 is prosody determination means, 604 is speech segment selection means, 605 is speech segment modification means, 606 is speech segment concatenation means, 607 is synthesized speech output means, 608 is a pitch template DB (database), 609 is a category selection table, and 610 is a speech segment database.

[0022] First, an intermediate language, which is the textual information of the content to be synthesized, is input through reading input means 601. Phoneme chain determination means 602 analyzes the intermediate language and determines the phoneme chains needed as speech segments. Prosody determination means 603 extracts information such as accent and intonation from the intermediate language and selects the pitch template for the desired pitch pattern from pitch template DB 608. Category selection table 609 is organized for each pitch template as shown in FIGS. 1 and 3. Speech segment database 610 is built so that speech segments of all the categories shown in FIG. 2 exist for every phoneme chain. Speech segment selection means 604 can therefore always select a speech segment whose pitch shape resembles the desired pitch pattern. The selected speech segments are modified by speech segment modification means 605, concatenated by speech segment concatenation means 606, and output from synthesized speech output means 607, which yields a speech synthesis apparatus with suppressed sound quality degradation due to pitch modification.

[0023] (Embodiment 5) Next, Embodiment 5 of the present invention is described in detail. FIG. 7 is an example of the category selection table of Embodiment 5. In FIG. 7, the pitch shapes are classified into 13 categories in accordance with FIG. 4. The 13 pitch shape categories are provided for every phoneme chain, the central speech segment of each category is held in the speech segment database, and the ID of that speech segment is recorded in the table. By building the category selection table and the speech segment database in this way, a speech segment database that is necessary and sufficient from the standpoint of pitch patterns can be constructed, so the size of the speech segment database can be optimized.

[0024] (Embodiment 6) Next, Embodiment 6, which corresponds to claim 2 of the present invention, is described in detail. FIG. 8 is a block diagram of the speech synthesis apparatus of Embodiment 6. In FIG. 8, 801 is reading input means, 802 is phoneme chain determination means, 803 is prosody determination means, 804 is speech segment selection means, 805 is speech segment modification means, 806 is speech segment concatenation means, 807 is synthesized speech output means, 808 is a pitch template DB (database) for N-mora, type-M accents, 809 is a category selection table for each N-mora, type-M accent pitch template, 810 is a speech segment database, and 811 is sentence pitch template generation means.

[0025] Pitch template DB 808 records only data whose height has not been modified, such as templates 511, 521, and 531 shown in FIG. 5. Category selection table 809 also records the pitch shape categories for the cases where a pitch template is lowered by factors of 1.0, 0.9, 0.8, and so on, as shown in FIG. 5. To synthesize a sentence, sentence pitch template generation means 811 combines N-mora, type-M accent pitch templates whose pitch patterns have been lowered toward the end of the sentence, generating a pitch template for the sentence from the templates in pitch template DB 808. Speech segments are then selected from the part of category selection table 809 that corresponds to the amount by which each pitch template was lowered. With pitch template DB 808 and category selection table 809 organized in this way, sentences of any length can be synthesized even though the sizes of pitch template DB 808 and category selection table 809 are finite. The sizes of pitch template DB 808 and category selection table 809 can therefore be kept small.
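
The lookup described in this embodiment, where the lowering level of each word template decides which part of the category selection table is consulted, might look like the sketch below. All table contents, template IDs, phoneme chains, and segment IDs are hypothetical; the text specifies only that categories are recorded per template and per lowering level.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for table 809:
# (template id, lowering level) -> category of each interval.
# Level 0 means factor 1.0, level 1 means factor 0.9, and so on.
category_table: Dict[Tuple[str, int], List[str]] = {
    ("T511", 0): ["N-H", "H-N"],
    ("T511", 1): ["N-H", "N-N"],
    ("T521", 0): ["N-N"],
    ("T521", 1): ["N-N"],
}

# Hypothetical segment database: (phoneme chain, category) -> segment id.
segment_db: Dict[Tuple[str, str], int] = {
    ("aka", "N-H"): 1,
    ("kai", "H-N"): 2,
    ("kai", "N-N"): 3,
    ("ine", "N-N"): 4,
}


def select_segments(plan: List[Tuple[str, int, List[str]]]) -> List[int]:
    """plan: per word, (template id, lowering level, phoneme chains of its intervals)."""
    chosen: List[int] = []
    for template_id, level, chains in plan:
        cats = category_table[(template_id, level)]
        if len(cats) != len(chains):
            raise ValueError("template and phoneme-chain lengths differ")
        chosen.extend(segment_db[(chain, cat)] for chain, cat in zip(chains, cats))
    return chosen


# First word unlowered, second word lowered by one level.
ids = select_segments([("T511", 0, ["aka", "kai"]), ("T521", 1, ["ine"])])
```

Because the table is keyed by word-level template and lowering level, its size depends on the number of templates and levels rather than on the number of possible sentences.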

[0026]

[Effects of the invention] As described above, according to the present invention it can be guaranteed that the speech segment database contains speech segments resembling the control-target pitch pattern. Sound quality degradation due to pitch modification can therefore be suppressed, with the advantageous effect that the quality of the synthesized speech is improved.

[Brief description of the drawings]

FIG. 1 is a conceptual diagram of a pitch template according to an embodiment of the present invention.

FIG. 2 is a conceptual diagram of the category classification of pitch patterns according to Embodiment 1 of the present invention.

FIG. 3 is a conceptual diagram of the category selection table according to Embodiment 1 of the present invention.

FIG. 4 is a conceptual diagram of the category classification of pitch patterns according to Embodiment 2 of the present invention.

FIG. 5 is a conceptual diagram of the pitch templates and category selection table according to Embodiment 3 of the present invention.

FIG. 6 is a block diagram of the speech synthesis apparatus according to Embodiment 4 of the present invention.

FIG. 7 is a conceptual diagram of the category selection table according to Embodiment 5 of the present invention.

FIG. 8 is a block diagram of the speech synthesis apparatus according to Embodiment 6 of the present invention.

[Explanation of symbols]

101 to 106: Data for each mora position in the pitch template
201 to 205: Pitch shape classification
211 to 214: Category classification of pitch patterns
401 to 413: Category classification of pitch patterns
511 to 514: Pitch templates
521 to 524: Pitch templates
531 to 534: Pitch templates
601, 801: Reading input means
602, 802: Phoneme chain determination means
603, 803: Prosody determination means
604, 804: Speech segment selection means
605, 805: Speech segment modification means
606, 806: Speech segment concatenation means
607, 807: Synthesized speech output means
608, 808: Pitch template DB
609, 809: Category selection table
610, 810: Speech segment database
811: Sentence pitch template generation means

Continuation of the front page

(56) References cited:
JP-A-10-116089 (JP, A)
JP-A-8-123469 (JP, A)
JP-A-11-344997 (JP, A)
JP-A-9-198073 (JP, A)
JP-A-10-97291 (JP, A)
JP-A-2000-66695 (JP, A)
Ryo Mochizuki, Hirofumi Nishimura, Toshimitsu Minowa, Yasuhiko Arai, "A study of a VCV multi-segment database for waveform concatenation synthesis", Proceedings of the Acoustical Society of Japan, Japan, Acoustical Society of Japan, September 1998 (Autumn 1998), pp. 287-288

(58) Fields searched (Int. Cl.7, DB name): G10L 13/08

Claims

(57) [Claims]

1. A speech synthesis method of the waveform superposition type, in which speech segments of a synthesis unit such as CV, CV/VC, VCV, or CV/VCV are selected from a database, the pitch of the selected speech segments is modified according to a template that is a coarse pattern of the pitch, and the modified speech segments are concatenated to synthesize speech, characterized in that: the template, which is a coarse pattern of the pitch, is divided into intervals of the synthesis unit; each interval is classified into a category according to its pitch height and shape; a speech segment database is configured so that all of these categories are covered for each group of speech segments sharing the same phoneme chain; when a sentence is synthesized, the templates are modified so that the pitch becomes lower toward the end of the sentence, and the modified templates are combined and used as the template for the whole sentence; the levels by which a template is lowered are classified into several classes, and for each level the category to which the pitch height and shape of the speech segment belong is determined in advance; and speech segments are selected according to this category classification.

2. A speech synthesis apparatus comprising: input means for inputting the reading of the speech to be synthesized; phoneme chain determination means for determining, from the input reading, the phoneme chains of speech segments of a synthesis unit such as CV, CV/VC, VCV, or CV/VCV; a speech segment database in which the pitch height and shape of the speech segments are classified into categories for each phoneme chain; a pitch template database recording pitch templates, each of which is a coarse pattern of the pitch of a word or sentence; prosody determination means that, for pitch control, determines the pitch pattern according to a pitch template; a category selection table recording, for each interval obtained by dividing a pitch template into intervals of the synthesis unit, which category the pitch height and shape belong to; speech segment selection means for selecting speech segments according to the category selection table; speech segment modification means for modifying the pitch of the speech segments according to the pitch template; speech segment concatenation means for concatenating the modified speech segments; and synthesized speech output means for outputting the synthesized speech; characterized in that the pitch templates, each a coarse pattern of the pitch of a word or sentence, are set to the length of a word or compound word; sentence pitch template generation means is provided which, when a sentence is synthesized, modifies these pitch templates so that they become lower toward the end of the sentence and combines the modified pitch templates to generate a pitch template for the sentence; and a plurality of category selection tables are prepared, one for each class of the level by which the pitch is lowered in each pitch template.