JP2001350500A

JP2001350500A - Speech speed changer

Info

Publication number: JP2001350500A
Application number: JP2000171005A
Authority: JP
Inventors: Mitsuru Ebihara; 充海老原; Yasushi Ishikawa; 泰石川
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2000-06-07
Filing date: 2000-06-07
Publication date: 2001-12-21

Abstract

PROBLEM TO BE SOLVED: To obtain a speech speed changer that automatically performs high-quality speech speed changes by expanding and compressing the waveform of vowel segments in accordance with a speech speed change rate specified by a user. SOLUTION: Code vectors for learning voice data are recorded in a VQ codebook 6 together with the appearance probabilities of vowels and consonants for each code vector. A segment discriminating means 7 vector-quantizes input voice 1 while referring to the VQ codebook to select code vectors. A waveform expanding and compressing means 9 then expands or compresses the waveform of each segment discriminated as a vowel segment from its selected code vector, using a speech speed change rate 10 given to that means.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speech speed changing device that can freely change the speech speed of pre-recorded speech, or of speech input directly from a microphone or the like, according to a user's request.

[0002]

2. Description of the Related Art

In recent years, demand has grown for speech synthesis software (S/W) applicable to car navigation systems, PDAs (Personal Digital Assistants), electronic toys, and similar products. In such applications it is desirable that variety, such as changes in speech speed and pitch, be achieved with little computation and memory. Small-scale text-to-speech S/W is a candidate for these uses, but in actual applications text-to-speech synthesis is rarely used alone; for reasons of synthesized speech quality, recorded fixed messages, or fixed messages combined with rule-synthesized speech, are often used.

[0003] In such S/W, a fixed message is simply stored as recorded speech and played back. However, to concatenate speech recorded word by word or phrase by phrase, or to join it with rule-synthesized speech, and still obtain the continuity of a sentence, prosody such as speech speed must be changed. It is also useful for users to be able to change the speech speed to suit their preference or ease of listening, and demand is high in applications such as toys.

[0004] Methods of changing the speech speed of original speech while preserving its pitch and phonemic quality have already been studied. One such conventional speech speed changing device is the "speech rate conversion method for speech" disclosed in JP-A-1-93795. FIG. 13 is a block diagram showing a configuration example of a conventional speech speed changing device based on the technique of that publication. In the figure, 1 is input speech, 2 is an analysis unit, 3 is a control unit, 4 is a waveform connection unit, and 5 is synthesized speech.

[0005] Next, the operation will be described. When input speech 1 is input to the analysis unit 2, the analysis unit 2 discriminates sound from silence and voiced from unvoiced sound, and performs linear prediction analysis and pitch analysis on the voiced sound. From the linear prediction results it obtains the time variation of the resonance frequencies and bandwidths, uses these to separate voiced sound into vowels and consonants, and passes the result to the control unit 3. Based on the knowledge that consonants change less with speech speed than vowels or silence, the control unit 3 sets the expansion/contraction ratios of voiced and unvoiced consonants to values smaller than those of vowels and silence. The waveform connection unit 4 expands or contracts the utterance duration of each section according to the ratios determined by the control unit 3 and connects the sections. For voiced sections, the speech speed is changed by repeating or thinning out pitch waveforms based on the pitch period; for silent and unvoiced sections, it is changed by deleting or repeatedly inserting a waveform whose length corresponds to the expansion/contraction ratio. The result is output as synthesized speech 5.
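The repetition and thinning of pitch waveforms described for voiced sections can be illustrated with a minimal Python sketch. It assumes the voiced section has already been cut into whole pitch periods. The function name `rescale_pitch_periods` and the rule for choosing which period to copy are illustrative assumptions, not details taken from JP-A-1-93795, and no smoothing at the joins is attempted.

```python
def rescale_pitch_periods(periods, ratio):
    """Repeat or drop whole pitch periods so their count scales by `ratio`.

    periods: list of pitch-period waveforms (any objects, e.g. lists of samples).
    ratio:   > 1 lengthens the section (periods are repeated),
             < 1 shortens it (periods are thinned out).
    The selection rule (source index = floor(i / ratio)) is an assumption
    made for this sketch.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if not periods:
        return []
    n_out = max(1, round(len(periods) * ratio))
    out = []
    for i in range(n_out):
        # Map each output slot back to a source period, clamped to the last one.
        src = min(len(periods) - 1, int(i / ratio))
        out.append(periods[src])
    return out


if __name__ == "__main__":
    section = ["p0", "p1", "p2", "p3"]
    print(rescale_pitch_periods(section, 2.0))
    print(rescale_pitch_periods(section, 0.5))
```

Working on whole pitch periods keeps the pitch itself unchanged while the duration changes, which is the property the conventional method relies on.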

[0006] Other documents related to such conventional speech speed changing devices include the following: JP-A-6-266391, JP-A-6-274195, and JP-A-10-78795, which concern prosody generation rules for determining the vowel and consonant lengths of synthesized speech; JP-A-10-145897, which controls the waveform amplitude values of vowel and consonant sections; JP-A-6-222793, which concerns prosody generation rules that determine the vowel and consonant lengths of synthesized speech according to a separately specified utterance tempo; and JP-A-10-70790, which detects the length of the first vowel to determine the speech speed.

[0007]

PROBLEMS TO BE SOLVED BY THE INVENTION

Because the conventional speech speed changing device is configured as described above, it segments vowels and consonants on the basis of linear prediction analysis results when compressing or expanding each with its own waveform expansion/contraction rate. As a result, vowels and voiced consonants are easily confused and their boundaries tend to be unclear. Moreover, more natural waveform expansion and contraction requires taking the type of consonant into account, for which this framework is insufficient. Finally, the analysis in the analysis unit 2 requires a large amount of computation, which makes real-time operation difficult to achieve.

[0008] SUMMARY OF THE INVENTION The present invention has been made to solve the above problems. Its object is to provide a speech speed changing device that discriminates vowels from consonants using vector-quantized parameter information (Vector Quantization; hereinafter sometimes abbreviated as VQ), applies a waveform expansion/contraction rate specific to each vowel section and consonant section based on that result, and thereby achieves high-quality waveform expansion and contraction in real time.

[0009]

MEANS FOR SOLVING THE PROBLEMS

In a speech speed changing device according to the present invention, code vectors for learning speech data are recorded in a VQ codebook together with the appearance probabilities of vowels and consonants for each code vector. Section discrimination means vector-quantizes the input speech with reference to the VQ codebook to select a code vector, and determines from the vowel or consonant appearance probability of the selected code vector whether the section is a vowel section or a consonant section. When a speech speed change rate is given to the waveform expansion/contraction means, waveform expansion/contraction is performed, based on that rate, on the sections determined to be vowel sections by the section discrimination means.

[0010] In a speech speed changing device according to the present invention, the waveform expansion/contraction means defines a specific expansion/contraction ratio for each of the vowel and consonant sections, or for each of the vowel, consonant, and silent sections. When a speech speed change rate is given, it obtains a waveform expansion/contraction rate for each of these sections according to the defined ratios and performs waveform expansion/contraction based on those rates.

[0011] A speech speed changing device according to the present invention is provided with an expansion/contraction ratio table that stores a specific expansion/contraction ratio defined for each phoneme type or category. When a speech speed change rate is given, the waveform expansion/contraction means refers to the table, obtains a waveform expansion/contraction rate for each section according to the ratio for each phoneme or phoneme category of the input speech, and performs waveform expansion/contraction based on it.

[0012] In a speech speed changing device according to the present invention, VQ waveform expansion/contraction means refers to a VQ codebook in which code vectors of learning speech data are recorded together with an expansion/contraction ratio specific to each code vector, and vector-quantizes the input speech to select code vectors. When a speech speed change rate is given, it obtains a waveform expansion/contraction rate for each section according to the ratio of each selected code vector and performs waveform expansion/contraction based on it.

[0013] In a speech speed changing device according to the present invention, labeling means extracts phoneme labels from the input speech, and rhythm perception point extraction means extracts the rhythm perception points of the input speech from the phoneme labels by referring to a rhythm perception point table that stores a rhythm perception point specific to each phoneme sequence. When a speech speed change rate is given to the waveform expansion/contraction means, it obtains a waveform expansion/contraction rate for each section such that the expansion/contraction rate of the intervals between rhythm perception points is constant, and performs waveform expansion/contraction based on it.

[0014] In a speech speed changing device according to the present invention, the waveform expansion/contraction means defines a specific expansion/contraction ratio for each of the vowel and consonant sections, or for each of the vowel, consonant, and silent sections. When a speech speed change rate is given to the waveform expansion/contraction means, it obtains a waveform expansion/contraction rate for each section according to those ratios such that the expansion/contraction rate of the intervals between rhythm perception points is constant, and performs waveform expansion/contraction based on it.

[0015] A speech speed changing device according to the present invention is provided with an expansion/contraction ratio table that stores a specific expansion/contraction ratio defined for each phoneme type or phoneme category. When a speech speed change rate is given, the waveform expansion/contraction means refers to the table, obtains a waveform expansion/contraction rate for each section according to the ratio for each phoneme or phoneme category of the input speech such that the expansion/contraction rate of the intervals between rhythm perception points is constant, and performs waveform expansion/contraction based on it.

[0016] In a speech speed changing device according to the present invention, labeling means extracts phoneme labels from the input speech. Rhythm perception point extraction means refers to a combined rhythm perception point and expansion/contraction ratio table, which stores the rhythm perception point and the expansion/contraction ratio specific to each phoneme sequence, and extracts from the phoneme labels the rhythm perception points of the input speech and the expansion/contraction ratio of each phoneme section. When a speech speed change rate is given to the waveform expansion/contraction means, it obtains a waveform expansion/contraction rate for each phoneme section according to that section's ratio such that the expansion/contraction rate of the intervals between rhythm perception points is constant, and performs waveform expansion/contraction based on it.

[0017] In a speech speed changing device according to the present invention, section discrimination means extracts vowel section speech with reference to a VQ codebook, and vowel center extraction means determines the vowel center position in the extracted vowel section speech. The waveform expansion/contraction means defines a specific expansion/contraction ratio for each of the vowel and consonant sections, or for each of the vowel, consonant, and silent sections. When a speech speed change rate is given, it obtains a waveform expansion/contraction rate for each section according to those ratios such that the expansion/contraction rate of the intervals between vowel centers is constant, and performs waveform expansion/contraction based on it.

[0018] In a speech speed changing device according to the present invention, section discrimination means extracts vowel section speech with reference to a VQ codebook, and vowel center extraction means determines the vowel center position in the extracted vowel section speech. When a speech speed change rate is given, the waveform expansion/contraction means determines the waveform expansion/contraction rate within a vowel section so that it is at its maximum or minimum at the vowel center position and approaches the waveform expansion/contraction rate of the consonant section toward the boundary with that consonant section, and performs waveform expansion/contraction of each section based on it.

[0019] In a speech speed changing device according to the present invention, pitch contour reference position determination means obtains pitch contour reference positions from the local maxima and minima of the pitch contour obtained by pitch frequency analysis of the input speech in pitch analysis means. When a speech speed change rate is given, the waveform expansion/contraction means obtains a waveform expansion/contraction rate for each section, according to the expansion/contraction ratios of the vowel and consonant sections or of the vowel, consonant, and silent sections, such that the expansion/contraction rate of the intervals between pitch contour reference positions is constant, and performs waveform expansion/contraction based on it.

[0020] In a speech speed changing device according to the present invention, pitch contour dividing means uses a judgment on the pitch contour, based on the pitch frequency analysis results of pitch analysis means, to separate the provisional consonant section speech from the section discrimination means into a part to be included in the vowel section and a part that remains a consonant section, thereby discriminating vowel and consonant sections anew. When a speech speed change rate is given to the waveform expansion/contraction means, it determines a specific expansion/contraction ratio for each section newly discriminated as a vowel or consonant section, and performs waveform expansion/contraction based on the waveform expansion/contraction rate obtained for each section according to that ratio.

[0021]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

An embodiment of the present invention will be described below.

Embodiment 1.

FIG. 1 is a block diagram showing a configuration example of a speech speed changing device according to Embodiment 1 of the present invention. In the figure, 1 is input speech, given as speech data prepared in advance in the system and input to the speech speed changing device, and 5 is synthesized speech, whose speech speed has been changed by the device and which is output externally. 6 is a VQ codebook that records code vectors for learning speech data together with the appearance probabilities of phonemes such as vowels and consonants for each code vector. 7 is section discrimination means that refers to the VQ codebook 6, vector-quantizes the input speech 1 to select a code vector, and determines from the phoneme appearance probabilities of the selected code vector whether the section is a vowel section or a consonant section. 8 is vowel section speech output from the section discrimination means 7 based on that determination. 9 is waveform expansion/contraction means that compresses or expands the waveform of the vowel section speech 8 determined by the section discrimination means 7 at the desired speech speed change rate, and generates and outputs the synthesized speech 5. 10 is the speech speed change rate given to the waveform expansion/contraction means 9.

[0022] Next, the operation will be described. Here the input speech 1 is assumed to be speech data prepared in advance by the system that outputs the speech. The VQ codebook 6 is used to vector-quantize the speech data of the input speech 1. It consists of patterns (code vectors), obtained by partitioning the spectral data of the speech waveforms in the learning speech data with a general clustering method, paired with their codes. In Embodiment 1 the VQ codebook 6 also stores, with each code vector, the vowel appearance probability described later. On receiving the input speech 1, the section discrimination means 7 refers to the VQ codebook 6 to discriminate the vowel and consonant sections of the input speech 1, and sends the vowel section speech 8, that is, the vowel-section portion of the input speech 1, to the waveform expansion/contraction means 9. The method of section discrimination using the VQ codebook 6 is described later.

[0023] The waveform expansion/contraction means 9 receives the vowel section speech 8 and is also given the speech speed change rate 10. The speech speed change rate 10 is the value the user supplies to change the speech speed, and is input to the waveform expansion/contraction means 9 as an expansion/contraction ratio relative to the original speech. For example, to output speech β times slower than the original (β > 1), the speech speed change rate α is α = β; to output it β times faster, α = 1/β. The waveform expansion/contraction means 9 expands or contracts the waveform of the vowel section speech 8 according to the speech speed change rate 10 and leaves the other sections unchanged. The waveform expansion/contraction rate of the vowel sections in this case is determined from the vowel section length and the consonant section length as follows.

[0024] For example, let L be the total length of the speech section and Lv the vowel section length. The waveform expansion/contraction rate α′ of the vowel portion is chosen so that

α = {α′·Lv + 1·(L − Lv)} / L

Rearranging this expression gives the waveform expansion/contraction rate of the vowel portion:

α′ = 1 + (α − 1)·L / Lv

The vowel section length Lv is obtained at the time of section discrimination.
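As a hedged illustration of the relation α′ = 1 + (α − 1)·L / Lv, the following Python sketch computes the vowel-only rate from the overall speech speed change rate. The function and parameter names are hypothetical, and the final check is an observation that follows from the formula rather than something the specification states: when α is small enough that α′ would be zero or negative, the target duration cannot be reached by changing the vowel sections alone.

```python
def speech_speed_change_rate(beta, slower=True):
    """Return alpha for 'beta times slower' (alpha = beta)
    or 'beta times faster' (alpha = 1 / beta), with beta > 1."""
    if beta <= 1:
        raise ValueError("beta must be greater than 1")
    return beta if slower else 1.0 / beta


def vowel_stretch_rate(alpha, total_len, vowel_len):
    """Rate alpha' applied to vowel sections only, so that the whole
    utterance is scaled by alpha while other sections keep rate 1.

    Implements alpha' = 1 + (alpha - 1) * L / Lv.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if not 0 < vowel_len <= total_len:
        raise ValueError("need 0 < vowel_len <= total_len")
    rate = 1.0 + (alpha - 1.0) * total_len / vowel_len
    if rate <= 0:
        # Follows from the formula: vowels alone cannot absorb this change.
        raise ValueError("target cannot be reached by changing vowel sections alone")
    return rate


if __name__ == "__main__":
    # Units are arbitrary (e.g. seconds or samples) as long as they match.
    alpha = speech_speed_change_rate(1.2, slower=True)
    print(vowel_stretch_rate(alpha, total_len=10.0, vowel_len=4.0))
```

With α = 1.2, L = 10, and Lv = 4, the vowel rate is 1.5 up to floating-point rounding: the 4 units of vowel become 6, the other 6 units are untouched, and the total becomes 12 = 1.2 × 10.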

[0025] The vowel section speech 8 is typically expanded or contracted by the waveform expansion/contraction means 9 using a method based on the known PSOLA technique, but it is also effective to use the known Oscillator Model, a method that does not depend on pitch. The speech whose speed has been changed by the waveform expansion/contraction means 9 in this way is output as the synthesized speech 5.

[0026] Section discrimination using the VQ codebook 6 is described below. First, using a VQ codebook 6 that consists only of codes and their code vectors, learning speech data labeled as vowel or consonant is vector-quantized in advance. The quantization measure used is a spectral parameter. Next, for each code vector of the VQ codebook 6, the frequency with which it was selected for vowels and for consonants is counted, and the resulting vowel or consonant appearance probability is recorded in the VQ codebook 6 together with the code vector. The section discrimination means 7 refers to this VQ codebook 6, performs spectral analysis on the input speech 1, and vector-quantizes it. The vector quantization selects a code vector from the VQ codebook 6, and the appearance probability recorded there determines whether the section is a vowel section or a consonant section. For example, if the vowel appearance probability recorded for the code vector selected for a given speech section is 70%, the section is judged to be a vowel section; if it is 30%, it is judged to be a consonant section. In this example the threshold separating vowel and consonant sections is 50%.
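The two stages described above, counting how often each code vector is selected for vowel-labeled learning data and then thresholding that probability for input speech, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the squared Euclidean distance, the handling of code vectors never selected during learning, the treatment of an exact 50% tie, and all function names are illustrative choices. Frames stand in for whatever spectral parameter vectors the analysis produces.

```python
def nearest_code(codebook, frame):
    """Index of the code vector closest to `frame`.
    Squared Euclidean distance is an assumption of this sketch."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((c - x) ** 2 for c, x in zip(codebook[k], frame)),
    )


def learn_vowel_probabilities(codebook, labeled_frames):
    """Estimate, for each code vector, the probability that a frame
    quantized to it is a vowel.

    labeled_frames: iterable of (frame, is_vowel) pairs.
    Codes never selected get probability 0.0 (an assumption).
    """
    vowel_count = [0] * len(codebook)
    total_count = [0] * len(codebook)
    for frame, is_vowel in labeled_frames:
        k = nearest_code(codebook, frame)
        total_count[k] += 1
        if is_vowel:
            vowel_count[k] += 1
    return [v / t if t else 0.0 for v, t in zip(vowel_count, total_count)]


def discriminate_sections(codebook, vowel_prob, frames, threshold=0.5):
    """Label each input frame 'V' (vowel) or 'C' (consonant).
    A frame is a vowel when its code's vowel probability exceeds the
    threshold; exactly-equal cases fall to 'C' (an assumption)."""
    return [
        "V" if vowel_prob[nearest_code(codebook, f)] > threshold else "C"
        for f in frames
    ]


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0]]  # toy two-entry codebook
    learning = [
        ([0.1, 0.0], True), ([0.0, 0.2], True), ([0.1, 0.1], False),
        ([0.9, 1.0], False), ([1.0, 0.8], False), ([1.1, 1.0], True),
    ]
    probs = learn_vowel_probabilities(codebook, learning)
    print(probs)
    print(discriminate_sections(codebook, probs, [[0.05, 0.05], [0.95, 0.9]]))
```

Because discrimination at run time is only a nearest-neighbor lookup followed by a table read, it is far cheaper than the linear prediction analysis of the conventional device, which is consistent with the real-time aim stated earlier.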

[0027] As described above, according to Embodiment 1, waveform expansion/contraction of the vowel sections can be performed automatically. This avoids the loss of naturalness caused by changing all speech sections uniformly at the same rate, and enables a good change of speech speed.

[0028] Embodiment 2.

FIG. 2 is a block diagram showing a configuration example of a speech speed changing device according to Embodiment 2 of the present invention. In the figure, 1 is input speech, 5 is synthesized speech, 6 is a VQ codebook, 7 is section discrimination means, 8 is vowel section speech, and 10 is a speech speed change rate; these are equivalent to the components and data of Embodiment 1 shown with the same reference numerals in FIG. 1. 11 is consonant section speech output from the section discrimination means 7 based on its determination. 9 is waveform expansion/contraction means corresponding to the component with the same reference numeral in FIG. 1. It differs from that of Embodiment 1 in that it defines a specific expansion/contraction ratio for each of the vowel and consonant sections, or for each of the vowel, consonant, and silent sections, and, when the speech speed change rate 10 is given, obtains a waveform expansion/contraction rate for each section according to those ratios and performs waveform expansion/contraction based on it.

[0029] Next, the operation will be described. Here too, the input speech 1 is assumed to be speech data prepared in advance by the system that outputs the speech. The VQ codebook 6 is used to vector-quantize the speech data of the input speech 1, and its structure is the same as in Embodiment 1. The section discrimination means 7 finds the vowel and consonant sections of the input speech 1 and sends the vowel-section portion of the input speech 1 as vowel section speech 8, and the consonant-section portion as consonant section speech 11, to the waveform expansion/contraction means 9. The same method as in Embodiment 1 can be used for this section discrimination.

[0030] As in Embodiment 1, the speech speed change rate 10 is the value the user supplies to change the speech speed, and is input to the waveform expansion/contraction means 9 as an expansion/contraction ratio relative to the original speech. The waveform expansion/contraction means 9 holds predefined expansion/contraction ratios for the vowel and consonant sections. It is generally known that vowel sections expand and contract more with changes in speech speed than consonant sections do. The ratio values are obtained from the correlation between speech speed and the average durations of all vowels and of all consonants, measured on learning speech data spoken at different speeds.

【００３１】波形伸縮手段９にユーザより話速変更率１
０が与えられると、先に規定された母音区間と子音区間
の伸縮比より、母音区間と子音区間毎の波形伸縮率が決
定される。ここで、母音区間長をＬｖ、子音区間長をＬ
ｃとすると、母音区間長Ｌｖと子音区間長Ｌｃのトータ
ルの伸縮率は話速変更率α（α＞０）である。例えば母
音区間の伸縮比をβ、子音区間の伸縮比をγとすると、
母音区間と子音区間の伸縮比については、波形伸長の場
合には０＜γ＜１＜β、波形圧縮の場合には０＜β＜１
＜γであれば、母音区間の伸縮率がα’β、子音区間の
伸縮率がα’γとなるような係数α’を求めることによ
って決まる。When the user gives the speech speed change rate 10 to the waveform expansion/contraction means 9, the waveform expansion/contraction rate of each vowel section and each consonant section is determined from the vowel and consonant expansion/contraction ratios defined above. Let the vowel section length be Lv and the consonant section length be Lc; the overall expansion/contraction rate of Lv and Lc together is the speech speed change rate α (α > 0). Let β be the expansion/contraction ratio of the vowel section and γ that of the consonant section, with 0 < γ < 1 < β for waveform expansion and 0 < β < 1 < γ for waveform compression. The rates are then fixed by finding a coefficient α' such that the expansion/contraction rate of the vowel section is α'β and that of the consonant section is α'γ.

【００３２】すなわち、 α＝α’（β・Ｌｖ＋γ・Ｌｃ）／（Ｌｖ＋Ｌｃ） α’＝α（Ｌｖ＋Ｌｃ）／（β・Ｌｖ＋γ・Ｌｃ）より、母音区間の波形伸縮率は、 α’β＝α・β（Ｌｖ＋Ｌｃ）／（β・Ｌｖ＋γ・Ｌ
ｃ）子音区間の波形伸縮率は、 α’γ＝α・γ（Ｌｖ＋Ｌｃ）／（β・Ｌｖ＋γ・Ｌ
ｃ）となる。ちなみに、母音区間長Ｌｖと子音区間長Ｌｃは
区間判別の際に求められるものとする。That is, from
$$\alpha = \alpha' \, \frac{\beta L_v + \gamma L_c}{L_v + L_c}, \qquad \alpha' = \alpha \, \frac{L_v + L_c}{\beta L_v + \gamma L_c},$$
the waveform expansion/contraction rate of the vowel section is
$$\alpha'\beta = \frac{\alpha \beta \, (L_v + L_c)}{\beta L_v + \gamma L_c},$$
and the waveform expansion/contraction rate of the consonant section is
$$\alpha'\gamma = \frac{\alpha \gamma \, (L_v + L_c)}{\beta L_v + \gamma L_c}.$$
The vowel section length Lv and the consonant section length Lc are assumed to be obtained during section discrimination.
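
The relations in paragraph [0032] can be checked numerically. The following is a minimal sketch, not the patented implementation: the function name `section_rates` and the sample numbers are illustrative assumptions, and only the formula itself comes from the text.

```python
def section_rates(alpha, beta, gamma, lv, lc):
    """Return (vowel_rate, consonant_rate) = (alpha' * beta, alpha' * gamma).

    alpha : overall speech speed change rate (alpha > 0)
    beta  : expansion/contraction ratio of the vowel section
    gamma : expansion/contraction ratio of the consonant section
    lv, lc: vowel and consonant section lengths (same unit, e.g. frames)
    """
    if alpha <= 0 or beta <= 0 or gamma <= 0:
        raise ValueError("alpha, beta and gamma must be positive")
    if lv < 0 or lc < 0 or lv + lc == 0:
        raise ValueError("lengths must be non-negative and not both zero")
    # alpha' = alpha (Lv + Lc) / (beta Lv + gamma Lc)
    alpha_prime = alpha * (lv + lc) / (beta * lv + gamma * lc)
    return alpha_prime * beta, alpha_prime * gamma


# Hypothetical values: stretch the utterance to 1.5 times its duration.
lv, lc = 300.0, 200.0
vowel_rate, consonant_rate = section_rates(1.5, 1.2, 0.8, lv, lc)
overall = (vowel_rate * lv + consonant_rate * lc) / (lv + lc)
print(vowel_rate, consonant_rate, overall)
```

The vowel section is stretched more than the consonant section, while the duration-weighted overall rate stays equal to α.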

【００３３】このように、波形伸縮手段９は入力音声１
の母音区間音声８と子音区間音声１１のそれぞれについ
て、決まった波形伸縮率により伸縮を行う。その波形伸
縮の方法は実施の形態１の場合と同様である。こうして
波形伸縮手段９により話速変更された音声が合成音声５
として出力される。In this way, the waveform expansion/contraction means 9 expands or contracts the vowel section speech 8 and the consonant section speech 11 of the input speech 1, each at the waveform expansion/contraction rate determined for it. The waveform expansion/contraction method is the same as in the first embodiment. The speech whose speech rate has been changed by the waveform expansion/contraction means 9 is output as the synthesized speech 5.

【００３４】以上のように、この実施の形態２によれ
ば、母音区間と子音区間について、個別の伸縮率による
波形伸縮を自動的に実施することができ、全音声区間を
一様に同じ伸縮率で変更することによる自然性劣化の問
題を解消することが可能となって、良好な発話速度変更
が実現できるようになるという効果が得られる。As described above, according to the second embodiment, waveform expansion/contraction with separate rates for vowel sections and consonant sections can be performed automatically. This removes the loss of naturalness caused by changing all speech sections uniformly at the same rate, so that a good speech rate change can be achieved.

【００３５】実施の形態３．図３はこの発明の実施の形
態３による話速変更装置の一構成例を示すブロック図で
ある。なお、上記実施の形態１あるいは実施の形態２の
各構成要素およびデータに相当する部分については、図
１または図２と同一符号を付してその説明を省略する。
図において、１２は音素の種類や、音素カテゴリー毎に
規定された固有の伸縮比を記憶する伸縮比テーブルであ
る。なお、波形伸縮手段９は図１または図２に同一符号
を付して示した実施の形態１あるいは実施の形態２の構
成要素に相当する波形伸縮手段であるが、話速変更率１
０が与えられた場合に、伸縮比テーブル１２を参照し
て、入力音声１の音素毎または音素カテゴリー毎にその
伸縮比に応じた区間毎の波形伸縮率を求めて波形伸縮を
行う点でそれとは異なっている。Embodiment 3. FIG. 3 is a block diagram showing a configuration example of a speech speed changing device according to Embodiment 3 of the present invention. Parts corresponding to the components and data of the first or second embodiment are given the same reference numerals as in FIG. 1 or FIG. 2, and their description is omitted. In the figure, reference numeral 12 denotes an expansion/contraction ratio table that stores a specific expansion/contraction ratio defined for each phoneme type or phoneme category. The waveform expansion/contraction means 9 corresponds to the component with the same reference numeral in FIG. 1 or FIG. 2 of the first or second embodiment, but differs from it in that, when the speech speed change rate 10 is given, it refers to the expansion/contraction ratio table 12, obtains a waveform expansion/contraction rate for each section according to the ratio of each phoneme or phoneme category of the input speech 1, and performs waveform expansion/contraction accordingly.

【００３６】次に動作について説明する。この場合も、
入力音声１は音声を出力するシステムがあらかじめ用意
した音声データ１であるものとする。ＶＱコードブック
６は音声データのベクトル量子化を行うために利用する
ものであり、その構成は実施の形態１の場合と同様であ
る。区間判別手段７は入力音声１における母音と子音の
区間を求め、母音区間部分の入力音声１である母音区間
音声８と、子音区間部分の入力音声１である子音区間音
声１１を波形伸縮手段９に送る。ただし、子音区間音声
は後に説明する音素カテゴリー毎に分類されている。な
お、上記区間判別の例も実施の形態２の場合と同様であ
る。Next, the operation will be described. In this case as well, the input speech 1 is assumed to be speech data 1 prepared in advance by the system that outputs the speech. The VQ codebook 6 is used to vector-quantize the speech data, and its configuration is the same as in the first embodiment. The section discriminating means 7 finds the vowel and consonant sections in the input speech 1, and sends the vowel section speech 8 (the vowel-section portion of the input speech 1) and the consonant section speech 11 (the consonant-section portion) to the waveform expansion/contraction means 9. The consonant section speech, however, is classified by the phoneme categories described below. The sections are discriminated in the same way as in the second embodiment.

【００３７】話速変更率１０も実施の形態１の場合と同
様に、ユーザが音声の発話速度を変更する際に与える値
であり、元の音声に対する伸縮比率として波形伸縮手段
９に入力される。なお伸縮比テーブル１２には前述のよ
うに、音素カテゴリー毎の伸縮比が記述されている。こ
の音素カテゴリーの例としては、母音、摩擦性子音、破
裂性子音、鼻音、流音、撥音などが挙げられ、伸縮比に
ばらつきがある子音区間をこのように分類して、音素カ
テゴリー毎にその伸縮比を決定する。このカテゴリー毎
の伸縮比の値は、発話速度の異なる学習音声データから
得た、音素カテゴリー毎の平均時間長と発話速度との相
関から求められる。As in the first embodiment, the speech speed change rate 10 is the value the user supplies when changing the speech rate, and it is input to the waveform expansion/contraction means 9 as an expansion/contraction ratio relative to the original speech. As noted above, the expansion/contraction ratio table 12 lists an expansion/contraction ratio for each phoneme category. Examples of phoneme categories include vowels, fricative consonants, plosive consonants, nasals, liquids, and the moraic nasal (hatsuon). Consonant sections, whose expansion/contraction ratios vary widely, are classified in this way, and a ratio is determined for each phoneme category. The ratio for each category is obtained from the correlation between the speech rate and the average duration of that phoneme category, measured on learning speech data spoken at different speech rates.

【００３８】波形伸縮手段９にユーザより話速変更率１
０が与えられると伸縮比テーブル１２を参照し、当該伸
縮比テーブル１２に記憶されている音素の種類や音素カ
テゴリー毎に固有の伸縮比に基づいて、母音区間と子音
区間の波形伸縮率を決定する。なお、この母音区間と子
音区間に与えられた伸縮比からの波形伸縮率の決定法
は、実施の形態２の場合と同様である。このように、波
形伸縮手段９は母音区間と子音カテゴリー毎に分類され
た区間それぞれについて、入力音声１の伸縮を行う。こ
の波形伸縮の方法は実施の形態１の場合と同様である。
こうして波形伸縮手段９により話速変更された音声を合
成音声５として出力する。When the user gives the speech speed change rate 10 to the waveform expansion/contraction means 9, the means refers to the expansion/contraction ratio table 12 and determines the waveform expansion/contraction rates of the vowel sections and consonant sections from the ratios stored there for each phoneme type or phoneme category. The waveform expansion/contraction rates are derived from the ratios given to the vowel and consonant sections in the same way as in the second embodiment. The waveform expansion/contraction means 9 then expands or contracts the input speech 1 in each vowel section and in each section classified by consonant category. The waveform expansion/contraction method is the same as in the first embodiment. The speech whose speech rate has been changed by the waveform expansion/contraction means 9 is output as the synthesized speech 5.
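
A table-driven version of this step might look as follows. This is a sketch under stated assumptions: the ratio values in `RATIO_TABLE` are invented placeholders (the patent derives them from learning speech data), the category names are illustrative, and extending the two-class normalisation of the second embodiment to several categories by a duration-weighted sum is one plausible reading rather than something the text spells out.

```python
# Hypothetical per-category expansion/contraction ratios (placeholders only).
RATIO_TABLE = {
    "vowel": 1.3,
    "fricative": 0.9,
    "plosive": 0.6,
    "nasal": 1.0,
    "liquid": 0.9,
    "moraic_nasal": 1.1,
}


def category_rates(alpha, sections, table=RATIO_TABLE):
    """Return one waveform expansion/contraction rate per section.

    sections: list of (category, length) pairs in utterance order.
    The rates are scaled by a common coefficient alpha' so that the
    duration-weighted overall rate equals alpha (assumed generalisation).
    """
    total = sum(length for _, length in sections)
    weighted = sum(table[cat] * length for cat, length in sections)
    if alpha <= 0 or total <= 0 or weighted <= 0:
        raise ValueError("alpha and section lengths must be positive")
    alpha_prime = alpha * total / weighted
    return [alpha_prime * table[cat] for cat, _ in sections]


sections = [("plosive", 40), ("vowel", 120), ("nasal", 60), ("vowel", 100)]
for (cat, length), rate in zip(sections, category_rates(0.7, sections)):
    print(cat, length, round(rate, 3))
```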

【００３９】以上のように、この実施の形態３によれ
ば、母音区間と子音カテゴリー毎に分類された各区間に
ついて、個別の伸縮率による波形伸縮を自動的に実施す
ることが可能になって、全音声区間を一様に同じ伸縮率
で変更することによる自然性劣化の問題を解消すること
ができ、良好な発話速度変更の実現が可能になるという
効果が得られる。As described above, according to the third embodiment, waveform expansion/contraction with a separate rate for the vowel sections and for each section classified by consonant category can be performed automatically. This removes the loss of naturalness caused by changing all speech sections uniformly at the same rate, so that a good speech rate change can be achieved.

【００４０】実施の形態４．図４はこの発明の実施の形
態４による話速変更装置の一構成例を示すブロック図で
ある。図において、１は入力音声、５は合成音声、１０
は話速変更率であり、これらは図１から図３に同一符号
を付して示した、実施の形態１〜実施の形態３における
各構成要素およびデータと同等のものである。１３は学
習音声データについてのコードベクトルを、各コードベ
クトルの固有の伸縮比とともに記録する伸縮比付きＶＱ
コードブックである。１４は入力音声１について、この
伸縮比付きＶＱコードブック１３を参照してベクトル量
子化を行って、ベクトルコードを選択し、話速変更率１
０が与えられた場合に、選択したコードベクトル毎の伸
縮比に応じた区間毎の波形伸縮率を求め、それに基づい
て波形伸縮を行うＶＱ波形伸縮手段である。Embodiment 4. FIG. 4 is a block diagram showing a configuration example of a speech speed changing device according to Embodiment 4 of the present invention. In the figure, 1 is the input speech, 5 is the synthesized speech, and 10 is the speech speed change rate; these are equivalent to the components and data denoted by the same reference numerals in FIGS. 1 to 3 for the first to third embodiments. Reference numeral 13 denotes a VQ codebook with expansion/contraction ratios, which records the code vectors for the learning speech data together with an expansion/contraction ratio specific to each code vector. Reference numeral 14 denotes a VQ waveform expansion/contraction means that vector-quantizes the input speech 1 with reference to the VQ codebook 13 to select code vectors and, when the speech speed change rate 10 is given, obtains a waveform expansion/contraction rate for each section according to the ratio of each selected code vector and performs waveform expansion/contraction based on it.

【００４１】次に動作について説明する。この場合も、
入力音声１は音声を出力するシステムがあらかじめ用意
した音声データであるものとする。また、伸縮比付きＶ
Ｑコードブック１３はその音声データのベクトル量子化
を行うために利用するものであり、その構成は後に説明
する。話速変更率１０も実施の形態１の場合と同様であ
り、ユーザが音声の発話速度を変更する際に与える値
で、元の音声に対する伸縮比率としてＶＱ波形伸縮手段
１４に入力される。ＶＱ波形伸縮手段１４は伸縮比付き
ＶＱコードブック１３を参照して、そこに記憶されてい
る伸縮比と、ユーザから与えられる話速変更率１０とに
よって波形伸縮を行う。なお、このＶＱ波形伸縮手段１
４における波形伸縮法は後に説明する。このようして、
ＶＱ波形伸縮手段１４により話速変更された音声が合成
音声５として出力される。Next, the operation will be described. In this case as well, the input speech 1 is assumed to be speech data prepared in advance by the system that outputs the speech. The VQ codebook 13 with expansion/contraction ratios is used to vector-quantize that speech data; its configuration is described later. The speech speed change rate 10 is the same as in the first embodiment: it is the value the user supplies when changing the speech rate, and it is input to the VQ waveform expansion/contraction means 14 as an expansion/contraction ratio relative to the original speech. The VQ waveform expansion/contraction means 14 refers to the VQ codebook 13 and performs waveform expansion/contraction using the ratios stored there and the speech speed change rate 10 given by the user. The waveform expansion/contraction method used in the VQ waveform expansion/contraction means 14 is described later. The speech whose speech rate has been changed by the VQ waveform expansion/contraction means 14 is output as the synthesized speech 5.

【００４２】以下伸縮比付きＶＱコードブック１３の構
成を説明する。この音声データのベクトル量子化に利用
される伸縮比付きＶＱコードブック１３は、学習音声デ
ータにおける音声波形のスペクトルデータを、一般的な
クラスタリング手法によって分割したパターン（コード
ベクトル）と、そのコードの組からなっている。この伸
縮比付きＶＱコードブック１３については事前に、母音
または子音としてラベル付けされた学習音声データのベ
クトル量子化を行っておく。その時の量子化尺度はスペ
クトルパラメータである。The configuration of the VQ codebook 13 with expansion/contraction ratios is described below. The codebook, which is used for vector quantization of the speech data, consists of pairs of a pattern (code vector), obtained by partitioning the spectral data of the speech waveforms in the learning speech data with a general clustering method, and its code. For this VQ codebook 13, learning speech data labeled as vowel or consonant is vector-quantized in advance. The quantization measure used is a spectral parameter.

【００４３】次に、伸縮比付きＶＱコードブック１３の
各コードベクトルに対して、母音または子音が選択され
た出現頻度を求め、母音区間または子音区間の出現確率
をコードベクトルとともに伸縮比付きＶＱコードブック
１３に記録する。さらに、各コードベクトル毎に記録さ
れた母音区間および子音区間の出現確率から、各コード
ベクトルが母音に対応するものか子音に対応するものか
を判定しておき、それぞれのコードベクトル毎に母音区
間または子音区間に固有の伸縮比を与えて記憶する。例
えば、母音区間と子音区間とを判別する閾値を５０％と
して、コードベクトルの母音出現確率が７０％であれば
母音区間の波形伸縮率を、３０％であれば子音区間の波
形伸縮率をそれぞれ記録する。ここでの伸縮比は実施の
形態２のものと同様とする。Next, for each code vector of the VQ codebook 13 with expansion/contraction ratios, the frequency with which it was selected for vowels and for consonants is counted, and the resulting appearance probability of the vowel section or consonant section is recorded in the codebook together with the code vector. Then, from the vowel-section and consonant-section appearance probabilities recorded for each code vector, it is decided whether the code vector corresponds to a vowel or to a consonant, and the expansion/contraction ratio specific to the vowel section or to the consonant section is assigned to the code vector and stored. For example, with a threshold of 50% for discriminating between vowel and consonant sections, the waveform expansion/contraction rate of the vowel section is recorded for a code vector whose vowel appearance probability is 70%, and that of the consonant section for a code vector whose vowel appearance probability is 30%. The expansion/contraction ratios used here are the same as in the second embodiment.
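
The codebook annotation described in paragraph [0043] can be sketched as follows. The function name `attach_ratios`, the `"V"`/`"C"` label strings and the dictionary layout are illustrative assumptions. The text also does not say how a code vector sitting exactly on the threshold is treated; this sketch arbitrarily treats it as a consonant.

```python
from collections import Counter


def attach_ratios(assignments, beta, gamma, threshold=0.5):
    """Attach a vowel or consonant expansion/contraction ratio to each code.

    assignments: iterable of (code_index, label), one entry per labeled
                 learning frame, where label is "V" (vowel) or "C" (consonant).
    beta, gamma: vowel and consonant ratios, as in the second embodiment.
    Returns {code_index: {"p_vowel": float, "is_vowel": bool, "ratio": float}}.
    """
    vowel_counts = Counter()
    total_counts = Counter()
    for code, label in assignments:
        total_counts[code] += 1
        if label == "V":
            vowel_counts[code] += 1

    book = {}
    for code, n in total_counts.items():
        p_vowel = vowel_counts[code] / n
        is_vowel = p_vowel > threshold  # tie goes to consonant (assumption)
        book[code] = {
            "p_vowel": p_vowel,
            "is_vowel": is_vowel,
            "ratio": beta if is_vowel else gamma,
        }
    return book


# Hypothetical learning result: code 0 is mostly vowel, code 1 mostly consonant.
frames = [(0, "V")] * 7 + [(0, "C")] * 3 + [(1, "V")] * 3 + [(1, "C")] * 7
print(attach_ratios(frames, beta=1.2, gamma=0.8))
```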

【００４４】次に、ＶＱ波形伸縮手段１４における波形
伸縮について説明する。ＶＱ波形伸縮手段１４はまず、
伸縮比付きＶＱコードブック１３を参照し、入力音声１
についてスペクトル分析を行ってそれをベクトル量子化
する。このベクトル量子化に基づいて伸縮比付きＶＱコ
ードブック１３からコードベクトルを選択し、そこに記
憶される伸縮比を読みだす。ユーザより話速変更率１０
が与えられると、伸縮比付きＶＱコードブック１３から
読みだした伸縮比により、母音区間と子音区間毎の波形
伸縮率を決定する。なお、この波形伸縮率の決定方法は
実施の形態２の場合と同様である。これらの波形伸縮率
に応じて当該区間の入力音声１の波形伸縮を行い、それ
を合成音声５として出力する。なお、この波形伸縮の方
法も実施の形態１の場合と同様である。Next, waveform expansion/contraction in the VQ waveform expansion/contraction means 14 will be described. The VQ waveform expansion/contraction means 14 first performs spectral analysis on the input speech 1 and vector-quantizes the result with reference to the VQ codebook 13 with expansion/contraction ratios. Based on this vector quantization, it selects a code vector from the VQ codebook 13 and reads out the expansion/contraction ratio stored with it. When the user gives the speech speed change rate 10, the waveform expansion/contraction rate of each vowel section and consonant section is determined from the ratios read from the VQ codebook 13. The waveform expansion/contraction rates are determined in the same way as in the second embodiment. The waveform of the input speech 1 in each section is expanded or contracted at the corresponding rate, and the result is output as the synthesized speech 5. The waveform expansion/contraction method is also the same as in the first embodiment.
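
The lookup in paragraph [0044] can be illustrated with a small nearest-neighbour search. This is a sketch only: the squared Euclidean distance, the two-dimensional toy vectors, the equal-length frames and the per-frame normalisation in `frame_rates` are all assumptions made for illustration, since the patent only says that a spectral parameter is quantized and that the rates are determined as in the second embodiment.

```python
def nearest_code(frame, codebook):
    """Index of the code vector closest to `frame` (squared Euclidean)."""
    if not codebook:
        raise ValueError("codebook is empty")
    best_index, best_dist = 0, float("inf")
    for index, entry in enumerate(codebook):
        dist = sum((x - y) ** 2 for x, y in zip(frame, entry["vector"]))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def frame_ratios(frames, codebook):
    """Read the stored expansion/contraction ratio for every input frame."""
    return [codebook[nearest_code(f, codebook)]["ratio"] for f in frames]


def frame_rates(alpha, ratios):
    """Scale the ratios so that the mean rate over equal-length frames is alpha."""
    alpha_prime = alpha * len(ratios) / sum(ratios)
    return [alpha_prime * r for r in ratios]


# Toy codebook: code 0 behaves like a consonant, code 1 like a vowel.
codebook = [
    {"vector": [0.0, 0.0], "ratio": 0.8},
    {"vector": [1.0, 1.0], "ratio": 1.2},
]
frames = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.8]]
ratios = frame_ratios(frames, codebook)
print(ratios, frame_rates(2.0, ratios))
```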

【００４５】以上のように、この実施の形態４によれ
ば、コードベクトルに固有の伸縮比率を記録することに
よって、母音区間と子音区間についての個別の伸縮率で
の波形伸縮を簡便かつ自動的に実施することができ、全
音声区間を一様に同じ伸縮率で変更することによる自然
性劣化の問題を解消することが可能となって、良好な発
話速度変更を実現することができるという効果が得られ
る。As described above, according to the fourth embodiment, recording a specific expansion/contraction ratio with each code vector makes it possible to perform waveform expansion/contraction with separate rates for vowel sections and consonant sections simply and automatically. This removes the loss of naturalness caused by changing all speech sections uniformly at the same rate, so that a good speech rate change can be achieved.

【００４６】実施の形態５．図５はこの発明の実施の形
態５による話速変更装置の一構成例示すブロック図であ
る。なお、上記実施の形態１〜実施の形態４の各構成要
素およびデータに相当する部分については、図１〜図４
と同一符号を付してその説明を省略する。図において、
１５は入力音声１における音素境界と音素種類を判別
し、その識別結果より音素ラベルを抽出するラベリング
手段であり、１６はこのラベリング手段１５によって抽
出された上記音素ラベルである。１７は音素系列に固有
のリズム知覚点を記憶するリズム知覚点テーブルであ
り、１８はこのリズム知覚点テーブル１７を参照して、
ラベリング手段１５の抽出した音素ラベル１６から、入
力音声１のリズム知覚点を抽出するリズム知覚点抽出手
段である。なお、波形伸縮手段９は図１〜図３に同一符
号を付して示したも実施の形態１〜実施の形態３の構成
要素に相当するものであるが、話速変更率１０が与えら
れた場合に、リズム知覚点間隔の伸縮率が一定になるよ
うに区間毎の波形伸縮率を求め、それに基づいて波形伸
縮を行っている点で、上記各実施の形態における波形伸
縮手段９とは異なっている。Embodiment 5. FIG. 5 is a block diagram showing a configuration example of a speech speed changing device according to Embodiment 5 of the present invention. Parts corresponding to the components and data of the first to fourth embodiments are given the same reference numerals as in FIGS. 1 to 4, and their description is omitted. In the figure, reference numeral 15 denotes a labeling means that discriminates the phoneme boundaries and phoneme types in the input speech 1 and extracts phoneme labels from the result, and 16 denotes the phoneme labels extracted by the labeling means 15. Reference numeral 17 denotes a rhythm perception point table that stores the rhythm perception point specific to each phoneme sequence, and 18 denotes a rhythm perception point extracting means that refers to the rhythm perception point table 17 and extracts the rhythm perception points of the input speech 1 from the phoneme labels 16 extracted by the labeling means 15. The waveform expansion/contraction means 9 corresponds to the component with the same reference numeral in FIGS. 1 to 3 of the first to third embodiments, but differs from the waveform expansion/contraction means 9 of those embodiments in that, when the speech speed change rate 10 is given, it obtains the waveform expansion/contraction rate of each section so that the expansion/contraction rate of the rhythm perception point intervals is constant, and performs waveform expansion/contraction based on it.

【００４７】次に動作について説明する。この場合も、
入力音声１は音声を出力するシステムがあらかじめ用意
した音声データであるものとする。入力音声１が入力さ
れると、ラベリング手段１５はその入力音声１における
音素境界と音素種類の判別を行い、その判別結果に基づ
いて音素ラベル１６を抽出し、それをリズム知覚点抽出
手段１８へ送る。なお、この音素ラベル１６の決定方法
としては、例えば、入力音声１とその発話内容から、事
前にマニュアルで音素境界ならびに音素種類を与えてお
く方法がある。Next, the operation will be described. In this case as well, the input speech 1 is assumed to be speech data prepared in advance by the system that outputs the speech. When the input speech 1 is input, the labeling means 15 discriminates the phoneme boundaries and phoneme types in it, extracts the phoneme labels 16 based on the result, and sends them to the rhythm perception point extracting means 18. One way to determine the phoneme labels 16 is, for example, to assign the phoneme boundaries and phoneme types manually in advance from the input speech 1 and its utterance content.

【００４８】一方、リズム知覚点テーブル１７には子音
（Ｃ）や母音（Ｖ）の組合せに応じたリズム知覚点の情
報が記憶されている。リズム知覚点は人間が聴覚的にリ
ズムを知覚する音声波形上の位置であり、ＣＶ（子音−
母音）またはＶＣＶ（母音−子音−母音）などの音素系
列毎に固有の値を取る。このリズム知覚点の間隔が自然
音声のリズムと対応しており、発話速度が変わっても同
じ音素の組合せであれば、そのリズム知覚点間隔間の比
率は保存されている。Meanwhile, the rhythm perception point table 17 stores information on the rhythm perception points corresponding to combinations of consonants (C) and vowels (V). A rhythm perception point is the position on the speech waveform at which a listener perceives the rhythm, and it takes a value specific to each phoneme sequence such as CV (consonant-vowel) or VCV (vowel-consonant-vowel). The intervals between rhythm perception points correspond to the rhythm of natural speech, and for the same combination of phonemes the ratio between those intervals is preserved even when the speech rate changes.

【００４９】このリズム知覚点テーブル１７の内容の一
例を図６に示す。リズム知覚点テーブル１７には図６
（ａ）に示すように、音素系列とその音素系列に対応す
るリズム知覚点が記述されている。すなわち、図６
（ａ）の例では先行Ｖ、Ｃ、後続ＶによるＣＶもしくは
ＶＣＶの音素系列に対応するリズム知覚点が記述されて
いる。また、この図６（ａ）の例では、リズム知覚点は
ＣＶまたはＶＣＶの音声素片におけるフレーム位置を示
しており、音声素片のパワー概形との対応においては図
６（ｂ）に示す位置となる。リズム知覚点抽出手段１８
は音素ラベル１６の組からこのリズム知覚点テーブル１
７を参照して、入力音声１上のリズム知覚点位置を決定
する。FIG. 6 shows an example of the contents of the rhythm perception point table 17. As shown in FIG. 6(a), the table lists phoneme sequences and the rhythm perception point corresponding to each. In the example of FIG. 6(a), rhythm perception points are listed for CV or VCV phoneme sequences made up of a preceding V, a C, and a following V. Each rhythm perception point in this example is given as a frame position within the CV or VCV speech unit; relative to the power envelope of the speech unit, it lies at the position shown in FIG. 6(b). The rhythm perception point extracting means 18 refers to the rhythm perception point table 17 with the set of phoneme labels 16 and determines the rhythm perception point positions on the input speech 1.
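
The table lookup performed by the rhythm perception point extracting means 18 might be sketched as below. FIG. 6 is not reproduced here, so every table entry is a made-up placeholder. The key layout (preceding V or `None`, C, following V) and the convention that the stored frame position is an offset from the start of the speech unit are likewise assumptions made for illustration.

```python
# Hypothetical table: (preceding V or None, C, following V) -> frame offset
# of the rhythm perception point inside the CV / VCV speech unit.
RPP_TABLE = {
    (None, "k", "a"): 6,
    ("a", "s", "i"): 9,
    ("i", "t", "a"): 7,
}


def rhythm_points(units, table=RPP_TABLE):
    """Return absolute frame positions of the rhythm perception points.

    units: list of (key, start_frame) pairs, where key identifies the CV or
           VCV unit found from the phoneme labels and start_frame is the
           frame at which that unit begins in the input speech.
    """
    return [start + table[key] for key, start in units]


# Units as they might come out of labeling the utterance "kasita".
units = [((None, "k", "a"), 0), (("a", "s", "i"), 20), (("i", "t", "a"), 45)]
points = rhythm_points(units)
intervals = [b - a for a, b in zip(points, points[1:])]
print(points, intervals)
```

The differences between consecutive positions are the rhythm perception point intervals that the later steps keep in proportion.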

【００５０】話速変更率１０も実施の形態１の場合と同
様に、ユーザが音声の発話速度を変更する際に与える値
であり、元の音声に対する伸縮比率として波形伸縮手段
９に入力される。波形伸縮手段９はこの話速変更率１０
に応じて入力音声１を伸縮するが、その際、入力音声１
上のリズム知覚点間隔が同じ話速変更率１０によって伸
縮されるようにする。すなわち、まずラベリング手段１
５にて抽出された音素ラベル１６によって、リズム知覚
点間隔内の母音区間と子音区間の内訳を判定する。な
お、波形伸縮手段９で規定されている母音と子音の伸縮
比から、母音区間と子音区間の波形伸縮率を決定する方
法は実施の形態２の場合と同様である。波形伸縮手段９
はこれらの値に応じて各区間の入力音声１を圧縮または
伸長する。この波形伸縮の方法も実施の形態１の場合と
同様である。こうして波形伸縮手段９により話速変更さ
れた音声が合成音声５として出力される。As in the first embodiment, the speech speed change rate 10 is the value the user supplies when changing the speech rate, and it is input to the waveform expansion/contraction means 9 as an expansion/contraction ratio relative to the original speech. The waveform expansion/contraction means 9 expands or contracts the input speech 1 according to the speech speed change rate 10, and in doing so ensures that every rhythm perception point interval on the input speech 1 is expanded or contracted by that same speech speed change rate 10. Specifically, the breakdown of each rhythm perception point interval into vowel sections and consonant sections is first determined from the phoneme labels 16 extracted by the labeling means 15. The waveform expansion/contraction rates of the vowel and consonant sections are then determined from the vowel and consonant expansion/contraction ratios defined in the waveform expansion/contraction means 9, in the same way as in the second embodiment. The waveform expansion/contraction means 9 compresses or expands the input speech 1 in each section according to these values. The waveform expansion/contraction method is also the same as in the first embodiment. The speech whose speech rate has been changed by the waveform expansion/contraction means 9 is output as the synthesized speech 5.
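
One way to read this step is that the normalisation of the second embodiment is applied separately inside each rhythm perception point interval, so that every interval is scaled by exactly α. The sketch below takes that reading as an assumption; the function name `interval_rates` and the sample lengths are illustrative.

```python
def interval_rates(alpha, beta, gamma, intervals):
    """Per-interval (vowel_rate, consonant_rate) pairs.

    intervals: list of (lv, lc), the vowel and consonant lengths found
               between two consecutive rhythm perception points.
    Each interval gets its own coefficient alpha', so its total length is
    multiplied by alpha whatever its vowel/consonant mix.
    """
    rates = []
    for lv, lc in intervals:
        denom = beta * lv + gamma * lc
        if alpha <= 0 or denom <= 0:
            raise ValueError("alpha and interval lengths must be positive")
        alpha_prime = alpha * (lv + lc) / denom
        rates.append((alpha_prime * beta, alpha_prime * gamma))
    return rates


# Hypothetical intervals with different vowel/consonant proportions.
intervals = [(10, 5), (4, 12), (20, 0)]
for (lv, lc), (rv, rc) in zip(intervals, interval_rates(1.4, 1.2, 0.8, intervals)):
    new_length = rv * lv + rc * lc
    print(lv + lc, round(new_length, 3), round(new_length / (lv + lc), 3))
```

With a single global coefficient, intervals rich in vowels would stretch more than intervals rich in consonants; the per-interval coefficient keeps the ratio between intervals unchanged.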

【００５１】以上のように、この実施の形態５によれ
ば、波形伸縮を自動的に実施することが可能となり、自
然音声の時間構造の基本であるリズム知覚点を考慮した
波形伸縮を行うことによって、全音声区間を一様に変換
するよりも自然な発話速度変更を実現することができる
という効果が得られる。As described above, according to the fifth embodiment, waveform expansion/contraction can be performed automatically, and because it takes into account the rhythm perception points that underlie the temporal structure of natural speech, the speech rate can be changed more naturally than by converting all speech sections uniformly.

【００５２】実施の形態６．図７はこの発明の実施の形
態６による話速変更装置の一構成例を示すブロック図で
ある。なお、上記実施の形態１〜実施の形態５の各構成
要素およびデータに相当する部分については、図１〜図
５と同一符号を付してその説明を省略する。この実施の
形態６による話速変更装置は実施の形態５による話速変
更装置の波形伸縮手段９に伸縮比テーブル１２を接続し
たものであり、波形伸縮手段９に話速変更率１０が与え
られた場合に、伸縮比テーブル１２を参照して波形伸縮
を行う点で、実施の形態５とは異なっている。Embodiment 6. FIG. 7 is a block diagram showing a configuration example of a speech speed changing device according to Embodiment 6 of the present invention. Parts corresponding to the components and data of the first to fifth embodiments are given the same reference numerals as in FIGS. 1 to 5, and their description is omitted. The speech speed changing device of the sixth embodiment is that of the fifth embodiment with the expansion/contraction ratio table 12 connected to the waveform expansion/contraction means 9. It differs from the fifth embodiment in that, when the speech speed change rate 10 is given, the waveform expansion/contraction means 9 performs waveform expansion/contraction with reference to the expansion/contraction ratio table 12.

【００５３】次に動作について説明する。この場合も、
入力音声１は音声を出力するシステムがあらかじめ用意
した音声データであるものとする。ラベリング手段１５
は入力音声１における音素境界と音素種類を判別し、そ
の結果を音素ラベル１６としてリズム知覚点抽出手段１
８に送る。なお、音素ラベル１６の決定は実施の形態５
の場合と同様とする。リズム知覚点テーブル１７は音素
（ＣやＶ）の組合せに応じたリズム知覚点の情報が記憶
されているテーブルであり、その内容および構成は図６
に示した実施の形態５の場合と同様である。リズム知覚
点抽出手段１８はラベリング手段１５の抽出した音素ラ
ベル１６の組からリズム知覚点テーブル１７を参照し、
入力音声１上のリズム知覚点位置を決定する。Next, the operation will be described. In this case as well, the input speech 1 is assumed to be speech data prepared in advance by the system that outputs the speech. The labeling means 15 discriminates the phoneme boundaries and phoneme types in the input speech 1 and sends the result to the rhythm perception point extracting means 18 as the phoneme labels 16. The phoneme labels 16 are determined in the same way as in the fifth embodiment. The rhythm perception point table 17 stores information on the rhythm perception points corresponding to combinations of phonemes (C and V); its contents and configuration are the same as in the fifth embodiment, as shown in FIG. 6. The rhythm perception point extracting means 18 refers to the rhythm perception point table 17 with the set of phoneme labels 16 extracted by the labeling means 15 and determines the rhythm perception point positions on the input speech 1.

【００５４】話速変更率１０も実施の形態１の場合と同
様に、ユーザが音声の発話速度を変更する際に与える値
であり、元の音声に対する伸縮比率として波形伸縮手段
９に入力される。波形伸縮手段９はこの話速変更率１０
に応じて入力音声１を伸縮するが、その際、入力音声１
上のリズム知覚点間隔が同じ話速変更率１０によって伸
縮されるようにする。すなわち、まずラベリング手段１
５の抽出した音素ラベル１６により、リズム知覚点間隔
内の母音や破裂性子音などの音素カテゴリー毎の区間の
内訳が判定される。この音素カテゴリーの例は実施の形
態３において説明したものと同様である。そして伸縮比
テーブル１２を参照することにより、区間毎の音素カテ
ゴリーに対応する伸縮比を求める。この伸縮比テーブル
１２の構成と波形伸縮率の決定方法も、実施の形態３の
場合と同様である。波形伸縮手段９はこれらの値に基づ
いて、各区間の入力音声１を圧縮もしくは伸長する。な
お、この波形伸縮の方法も実施の形態１の場合と同様で
ある。こうして波形伸縮手段９により話速変更された音
声は合成音声５として出力される。As in the first embodiment, the speech speed change rate 10 is the value the user supplies when changing the speech rate, and it is input to the waveform expansion/contraction means 9 as an expansion/contraction ratio relative to the original speech. The waveform expansion/contraction means 9 expands or contracts the input speech 1 according to the speech speed change rate 10, and in doing so ensures that every rhythm perception point interval on the input speech 1 is expanded or contracted by that same speech speed change rate 10. Specifically, the breakdown of each rhythm perception point interval into sections by phoneme category, such as vowels and plosive consonants, is first determined from the phoneme labels 16 extracted by the labeling means 15. The phoneme categories are the same as those described for the third embodiment. The expansion/contraction ratio corresponding to the phoneme category of each section is then obtained by referring to the expansion/contraction ratio table 12. The configuration of the expansion/contraction ratio table 12 and the method of determining the waveform expansion/contraction rates are also the same as in the third embodiment. Based on these values, the waveform expansion/contraction means 9 compresses or expands the input speech 1 in each section. The waveform expansion/contraction method is also the same as in the first embodiment. The speech whose speech rate has been changed by the waveform expansion/contraction means 9 is output as the synthesized speech 5.

【００５５】以上のように、この実施の形態６によれ
ば、波形伸縮を自動的に実施することが可能となり、自
然音声の時間構造の基本であるリズム知覚点と音素種類
を考慮した波形伸縮を行うことによって、全音声区間を
一様に変換するよりも自然な発話速度変更を実現するこ
とができるという効果が得られる。As described above, according to the sixth embodiment, waveform expansion/contraction can be performed automatically, and because it takes into account both the rhythm perception points that underlie the temporal structure of natural speech and the phoneme types, the speech rate can be changed more naturally than by converting all speech sections uniformly.

【００５６】実施の形態７．図８はこの発明の実施の形
態７による話速変更装置の一構成例を示すブロック図で
あり、上記実施の形態１〜実施の形態６の各構成要素お
よびデータに相当する部分については、図１〜図５およ
び図７と同一符号を付してその説明を省略する。図にお
いて、１９は音素系列に固有のリズム知覚点と音素系列
に固有の伸縮比とを記憶し、リズム知覚点抽出手段１８
によって参照されるリズム知覚点兼伸縮比テーブルであ
る。上記各実施の形態において個別に用意されていた伸
縮比テーブル１２とリズム知覚点テーブル１７が、この
実施の形態７では１つのリズム知覚点兼伸縮比テーブル
１９にまとめられている点で、実施の形態６とは異なっ
ている。Embodiment 7. FIG. 8 is a block diagram showing a configuration example of a speech speed changing device according to Embodiment 7 of the present invention. Parts corresponding to the components and data of the first to sixth embodiments are given the same reference numerals as in FIGS. 1 to 5 and FIG. 7, and their description is omitted. In the figure, reference numeral 19 denotes a rhythm perception point and expansion/contraction ratio table, which stores the rhythm perception point specific to each phoneme sequence together with the expansion/contraction ratio specific to that phoneme sequence and is referred to by the rhythm perception point extracting means 18. The seventh embodiment differs from the sixth embodiment in that the expansion/contraction ratio table 12 and the rhythm perception point table 17, which were provided separately in the preceding embodiments, are combined into the single rhythm perception point and expansion/contraction ratio table 19.

【００５７】次に動作について説明する。この場合も、
入力音声１は音声を出力するシステムがあらかじめ用意
した音声データであるものとする。ラベリング手段１５
は入力音声１における音素境界と音素種類の判別を行
い、その判別結果を音素ラベル１６としてリズム知覚点
抽出手段１８へ送る。なお、この音素ラベル１６の決定
は実施の形態５の場合と同様とする。また、リズム知覚
点兼伸縮比テーブル１９には音素（ＣやＶ）の組合せに
応じたリズム知覚点の情報と、その組合せに対応する音
素波形の伸縮比が記憶されている。このリズム知覚点の
内容は、図６に示した実施の形態５の場合と同様であ
る。波形の伸縮比率はＣＶまたはＶＣＶの各音素区間に
対して伸縮比率を求め、それを記録したものである。Next, the operation will be described. In this case as well, the input speech 1 is assumed to be speech data prepared in advance by the system that outputs the speech. The labeling means 15 discriminates the phoneme boundaries and phoneme types in the input speech 1 and sends the result to the rhythm perception point extracting means 18 as the phoneme labels 16. The phoneme labels 16 are determined in the same way as in the fifth embodiment. The rhythm perception point and expansion/contraction ratio table 19 stores information on the rhythm perception points corresponding to combinations of phonemes (C and V), together with the expansion/contraction ratios of the phoneme waveforms for each combination. The rhythm perception point information is the same as in the fifth embodiment, as shown in FIG. 6. The waveform expansion/contraction ratios are obtained for each phoneme section of a CV or VCV sequence and recorded in the table.
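
A combined table of this kind could be represented as a single mapping from a phoneme sequence to both pieces of information. The sketch below is illustrative only: the class name `RhythmEntry`, the key layout and all numeric values are assumptions, not data from the patent.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Key: (preceding V or None for a plain CV unit, C, following V).
Key = Tuple[Optional[str], str, str]


@dataclass(frozen=True)
class RhythmEntry:
    perception_frame: int      # rhythm perception point inside the unit
    ratios: Tuple[float, ...]  # one ratio per phoneme section of the unit


# Hypothetical contents of the combined table 19.
TABLE: Dict[Key, RhythmEntry] = {
    (None, "k", "a"): RhythmEntry(6, (0.6, 1.3)),      # sections: C, V
    ("a", "s", "i"): RhythmEntry(9, (1.3, 0.9, 1.3)),  # sections: V, C, V
}


def lookup(key: Key, table: Dict[Key, RhythmEntry] = TABLE):
    """Return (perception_frame, ratios) for one phoneme sequence."""
    entry = table[key]
    return entry.perception_frame, entry.ratios


print(lookup(("a", "s", "i")))
```

A single lookup then yields both the rhythm perception point and the ratios, which is the practical difference from keeping tables 12 and 17 separate.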

【００５８】リズム知覚点抽出手段１８は、ラベリング
手段１５にて抽出された音素ラベル１６の組に基づいて
リズム知覚点兼伸縮比テーブル１９を参照し、入力音声
１上のリズム知覚点位置を決定して音素系列に対応した
伸縮比を得る。なお、話速変更率１０は実施の形態１の
場合と同様であり、ユーザが音声の発話速度を変更する
際に与える値で、元の音声に対する伸縮比率として波形
伸縮手段９に入力される。波形伸縮手段９はその話速変
更率１０に応じて入力音声１を伸縮するが、その際に、
入力音声１上のリズム知覚点間隔が同じ話速変更率１０
で伸縮されるようにする。The rhythm perception point extracting means 18 refers to the rhythm perception point and expansion/contraction ratio table 19 with the set of phoneme labels 16 extracted by the labeling means 15, determines the rhythm perception point positions on the input speech 1, and obtains the expansion/contraction ratios corresponding to the phoneme sequence. The speech speed change rate 10 is the same as in the first embodiment: it is the value the user supplies when changing the speech rate, and it is input to the waveform expansion/contraction means 9 as an expansion/contraction ratio relative to the original speech. The waveform expansion/contraction means 9 expands or contracts the input speech 1 according to the speech speed change rate 10, and in doing so ensures that every rhythm perception point interval on the input speech 1 is expanded or contracted by that same speech speed change rate 10.

[0059] That is, the breakdown of each rhythm perception point interval into sections by phoneme category, such as vowels and plosive consonants, is first determined from the phoneme labels 16. Examples of the phoneme categories are the same as in the third embodiment. The waveform expansion/contraction rate of each section is then obtained from the expansion/contraction ratios found in the rhythm perception point and expansion/contraction ratio table 19 and from the given speech speed change rate 10. The method of determining the waveform expansion/contraction rate is the same as in the third embodiment. The waveform expansion/contraction means 9 expands or contracts the input speech 1 in each section according to these values, using the same waveform expansion/contraction method as in the first embodiment. The speech whose speed has been changed in this way by the waveform expansion/contraction means 9 is output as the synthesized speech 5.

[0060] As described above, according to the seventh embodiment, waveform expansion/contraction can be performed automatically. Because it takes into account the phoneme types and the rhythm perception points, which are the basis of the temporal structure of natural speech, the speaking rate can be changed more naturally than by converting all speech sections uniformly.

[0061] Embodiment 8. FIG. 9 is a block diagram showing a configuration example of a speech speed changing device according to the eighth embodiment of the present invention. In the figure, 1 is the input speech, 5 is the synthesized speech, 6 is the VQ codebook, 7 is the section discriminating means, 8 is the vowel section speech, 9 is the waveform expansion/contraction means, and 10 is the speech speed change rate. These are equivalent to the components and data of the first to seventh embodiments shown with the same reference numerals in FIGS. 1 to 5, 7, and 8.

[0062] Reference numeral 20 denotes vowel center extracting means for determining the center position of each vowel section identified by the section discriminating means 7, and 21 denotes the vowel center position extracted by the vowel center extracting means 20. The waveform expansion/contraction means 9 defines a specific expansion/contraction ratio for vowel sections and consonant sections, or for vowel sections, consonant sections, and silent sections. When a speech speed change rate 10 is given, it obtains a waveform expansion/contraction rate for each section according to those ratios, such that the expansion/contraction rate of the intervals between the vowel center positions 21 extracted by the vowel center extracting means 20 is constant, and it expands or contracts the waveform based on those rates. In this respect it differs from the waveform expansion/contraction means of the preceding embodiments, which is shown with the same reference numeral in FIG. 1 and elsewhere.

[0063] Next, the operation will be described. Here too, the input speech 1 is assumed to be speech data prepared in advance by the system that outputs the speech. The VQ codebook 6 is used for vector quantization of the speech data, and its configuration is the same as in the first embodiment. The section discriminating means 7 refers to the VQ codebook 6 to find the vowel and consonant sections in the input speech 1 and sends the vowel section speech 8, which is the input speech 1 in the vowel sections, to the vowel center extracting means 20. One example of this section discrimination is the method used in the first embodiment. On receiving the vowel section speech 8, the vowel center extracting means 20 extracts the vowel center position 21, which is a reference point within the vowel section, and sends it to the waveform expansion/contraction means 9. One example of a method for extracting the vowel center position is to find the position within the vowel section where the power is at its maximum.
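The maximum-power example mentioned for the vowel center extracting means 20 could be sketched roughly as follows. This is only an illustration: the function name `vowel_center`, the frame-based power measure, and the default frame length of 80 samples are assumptions and do not come from the specification.

```python
def vowel_center(samples, start, end, frame_len=80):
    """Return the sample index taken as the vowel center of samples[start:end].

    Hypothetical sketch: the section is cut into frames, the mean power of
    each frame is computed, and the midpoint of the strongest frame is
    returned as the reference point.
    """
    best_pos, best_power = start, -1.0
    for s in range(start, end, frame_len):
        frame = samples[s:min(s + frame_len, end)]
        power = sum(x * x for x in frame) / len(frame)
        if power > best_power:
            best_power = power
            best_pos = s + len(frame) // 2
    return best_pos
```

A shorter frame gives a finer position estimate but makes the result more sensitive to individual pitch pulses.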

[0064] The speech speed change rate 10 is also the same as in the first embodiment: it is the value the user gives when changing the speaking rate, and it is input to the waveform expansion/contraction means 9 as the expansion/contraction ratio relative to the original speech. The waveform expansion/contraction means 9 expands or contracts the vowel section speech 8 and the consonant section speech 11 according to the speech speed change rate 10, and in doing so ensures that the intervals between the vowel center positions 21 on the vowel section speech 8 are expanded or contracted at that same speech speed change rate 10. The method of determining the waveform expansion/contraction rates of the vowel and consonant sections from the vowel and consonant expansion/contraction ratios defined in the waveform expansion/contraction means 9 is the same as in the second embodiment. The waveform expansion/contraction means 9 compresses or expands the input speech 1 in each section according to these values, again using the same waveform expansion/contraction method as in the first embodiment. The speech changed in this way by the waveform expansion/contraction means 9 is output as the synthesized speech 5.
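One way to picture how per-section rates can keep the vowel center interval scaled by exactly the speech speed change rate 10 is sketched below. The actual determination method is the one of the second embodiment, which is not reproduced in this passage, so everything here is an assumption: the function name `section_rates`, the interpretation of the expansion/contraction ratio as a pair of weights `w_vowel` and `w_cons`, and the rule that each rate equals 1 plus its weight times a common factor.

```python
def section_rates(alpha, dur_vowel, dur_cons, w_vowel, w_cons):
    """Return (rate_vowel, rate_cons) for one vowel-center interval.

    Assumption (not from the source): each rate is 1 + w * k, where k is
    chosen so that the whole interval is scaled by exactly alpha:
        rate_vowel * dur_vowel + rate_cons * dur_cons
            == alpha * (dur_vowel + dur_cons)
    """
    weighted = w_vowel * dur_vowel + w_cons * dur_cons
    if weighted <= 0:
        raise ValueError("weights and durations must give a positive sum")
    k = (alpha - 1.0) * (dur_vowel + dur_cons) / weighted
    return 1.0 + w_vowel * k, 1.0 + w_cons * k
```

For example, `section_rates(1.5, 100, 100, 3.0, 1.0)` gives a vowel rate of 1.75 and a consonant rate of 1.25, so the 200-sample interval becomes 300 samples. Under this model a consonant weight of 0 leaves consonants untouched and puts the whole change on the vowels.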

[0065] As described above, according to the eighth embodiment, waveform expansion/contraction can be performed automatically. Because it takes the vowel center positions into account so that the intervals between vowels in natural speech can be used, the speaking rate can be changed more naturally than by converting all speech sections uniformly.

[0066] Embodiment 9. FIG. 10 is a block diagram showing a configuration example of a speech speed changing device according to the ninth embodiment of the present invention. Parts corresponding to the components and data of the first to eighth embodiments are given the same reference numerals as in FIGS. 1 to 5 and 7 to 9, and their description is omitted. In the figure, 22 denotes vowel center waveform expansion/contraction means. When a speech speed change rate 10 is given, it determines the waveform expansion/contraction rate within a vowel section so that the rate is at its maximum or minimum at the vowel center position 21 extracted by the vowel center extracting means 20 and approaches the waveform expansion/contraction rate of the consonant section as the boundary with that consonant section is approached, and it expands or contracts the waveform of each section based on the rates so determined. The ninth embodiment differs from the eighth embodiment in that the waveform expansion/contraction means 9 is replaced by this vowel center waveform expansion/contraction means 22.

[0067] Next, the operation will be described. Here too, the input speech 1 is assumed to be speech data prepared in advance by the system that outputs the speech. The VQ codebook 6 is likewise used for vector quantization of the speech data, and its configuration is the same as in the first embodiment. The section discriminating means 7 refers to the VQ codebook 6 to find the vowel and consonant sections in the input speech 1 and sends the vowel section speech 8, which is the input speech 1 in the vowel sections, to the vowel center extracting means 20. One example of the section discrimination is the method used in the first embodiment. On receiving the vowel section speech 8, the vowel center extracting means 20 extracts the vowel center position 21, which is a reference point within the vowel section, and sends it to the vowel center waveform expansion/contraction means 22. The vowel center position 21 and its extraction method are the same as in the eighth embodiment.

[0068] The speech speed change rate 10 is also the same as in the first embodiment: it is the value the user gives when changing the speaking rate, and it is input to the vowel center waveform expansion/contraction means 22 as the expansion/contraction ratio relative to the original speech. The vowel center waveform expansion/contraction means 22 expands or contracts the vowel section speech 8 and the consonant section speech 11 according to the speech speed change rate 10, and in doing so ensures that the intervals between vowel centers on the input speech 1 are expanded or contracted at that same speech speed change rate 10. The waveform expansion/contraction rates of the vowel sections and the consonant sections can be determined in the same way as in the eighth embodiment.

[0069] Within a vowel section, expanding or contracting the transient parts, such as those near the boundary with a consonant section, at the same rate as the stationary parts, such as the vowel center position 21, may impair the naturalness of the speech. The waveform expansion/contraction rate in a vowel section is therefore determined so that it is at its maximum or minimum at the vowel center position 21 extracted by the vowel center extracting means 20 and approaches the waveform expansion/contraction rate of the consonant section as the boundary with that consonant section is approached. That is, when the expansion/contraction rate λ of the vowel section satisfies λ > 1, the waveform is expanded, so the waveform expansion/contraction rate is made largest at the vowel center position 21 and is reduced, at the boundary with the consonant section, to a value equal to the waveform expansion/contraction rate of that consonant section, which is smaller than λ. When 0 < λ < 1, the waveform is compressed, so the waveform expansion/contraction rate is made smallest at the vowel center position 21 and is increased, at the boundary with the consonant section, up to the waveform expansion/contraction rate of that consonant section, which is larger than λ. The vowel center waveform expansion/contraction means 22 expands or contracts the input speech 1 in each section according to these rates. The waveform expansion/contraction method is the same as in the first embodiment. The speech changed in this way by the vowel center waveform expansion/contraction means 22 is output as the synthesized speech 5.
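The specification does not state the shape of the rate curve inside a vowel section, only that it peaks (or dips) at the vowel center and meets the consonant rate at the boundaries. A minimal sketch under one possible assumption, a piecewise-linear (triangular) profile whose average over the vowel equals the vowel rate λ, might look like this. The function name `vowel_rate_profile`, the per-frame representation, and the triangular shape are all hypothetical.

```python
def vowel_rate_profile(n_frames, center, rate_vowel, rate_cons):
    """Per-frame stretch rates for a vowel section of n_frames frames.

    Assumed shape (not from the source): the rate equals rate_cons at both
    section boundaries and changes linearly to a peak at frame boundary
    `center`. The peak 2 * rate_vowel - rate_cons makes the mean rate over
    the section equal rate_vowel.
    """
    if not 0 < center < n_frames:
        raise ValueError("center must lie strictly inside the section")
    peak = 2.0 * rate_vowel - rate_cons
    if peak <= 0:
        raise ValueError("rates would require a non-positive peak rate")
    rates = []
    for i in range(n_frames):
        pos = i + 0.5  # evaluate the profile at the midpoint of each frame
        if pos <= center:
            frac = pos / center
        else:
            frac = (n_frames - pos) / (n_frames - center)
        rates.append(rate_cons + (peak - rate_cons) * frac)
    return rates
```

With a vowel rate of 1.6 and a consonant rate of 1.2 the profile rises from just above 1.2 at the edges toward 2.0 at the center. With 0.6 and 0.9 it dips toward 0.3 instead, which corresponds to the compression case.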

[0070] As described above, according to the ninth embodiment, waveform expansion/contraction can be performed automatically. Because it takes the vowel center positions into account so that the intervals between vowels in natural speech can be used, and is further weighted so that the region around the vowel center is expanded or contracted more strongly than the region near the boundary with a consonant, the speaking rate can be changed more naturally than by converting all speech sections uniformly.

[0071] Embodiment 10. FIG. 11 is a block diagram showing a configuration example of a speech speed changing device according to the tenth embodiment of the present invention. Parts corresponding to the components and data of the first to ninth embodiments are given the same reference numerals as in FIGS. 1 to 5 and 7 to 10, and their description is omitted. In the figure, 23 denotes pitch analysis means for analyzing the pitch frequency of the input speech 1 to obtain a pitch outline, and 24 denotes the pitch outline extracted by the pitch analysis means 23. Reference numeral 25 denotes pitch outline reference position determining means for finding the maximum and minimum values of the pitch outline 24 and determining pitch outline reference positions from them, and 26 denotes the pitch outline reference position determined by the pitch outline reference position determining means 25.

[0072] The waveform expansion/contraction means 9 defines a specific expansion/contraction ratio for vowel sections and consonant sections, or for vowel sections, consonant sections, and silent sections. When a speech speed change rate 10 is given, it obtains a waveform expansion/contraction rate for each section according to those ratios, such that the expansion/contraction rate of the intervals between pitch outline reference positions is constant, and based on those rates it expands or contracts the waveforms of the vowel section speech 8 and the consonant section speech 11 received from the section discriminating means 7. In this respect it differs from the waveform expansion/contraction means 9 of the preceding embodiments, which is shown with the same reference numeral in FIGS. 1 to 5 and 7 to 10.

[0073] Next, the operation will be described. Here too, the input speech 1 is assumed to be speech data prepared in advance by the system that outputs the speech. When the input speech 1 is input, the pitch analysis means 23 obtains its short-time pitch frequency, extracts the pitch outline 24, which is a record of that frequency along the time axis, and inputs it to the pitch outline reference position determining means 25. One method of extracting the pitch frequency is to compute the cepstrum of the speech waveform for each section of fixed length, find where it takes its maximum in the high-quefrency region, and obtain the pitch frequency as the reciprocal of that quefrency. The pitch outline reference position determining means 25 determines the pitch outline reference positions 26, which are reference positions on the input pitch outline 24, and outputs them to the waveform expansion/contraction means 9. The pitch outline reference positions 26 are, for example, the time positions at which the pitch outline 24 takes its global maximum and minimum values.
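The cepstrum-based pitch extraction and the choice of reference positions described above could be sketched as follows. This is a minimal, unoptimized illustration: the function names, the search band `f_min` to `f_max`, and the small constant `eps` are assumptions, and a naive DFT is used only to keep the example self-contained (a practical implementation would use an FFT and a window function).

```python
import cmath
import math

def cepstral_pitch(frame, fs, f_min=60.0, f_max=400.0, eps=1e-10):
    """Estimate the pitch frequency (Hz) of one frame from its real cepstrum."""
    n = len(frame)
    # Log-magnitude spectrum via a naive DFT.
    log_mag = []
    for k in range(n):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        log_mag.append(math.log(abs(acc) + eps))
    # Search only the high-quefrency band that corresponds to f_max..f_min.
    q_lo = max(1, int(fs / f_max))
    q_hi = min(int(fs / f_min), n // 2)
    best_q, best_val = q_lo, float("-inf")
    for q in range(q_lo, q_hi + 1):
        # Inverse DFT of a real, even sequence reduces to a cosine sum.
        c = sum(v * math.cos(2 * math.pi * k * q / n)
                for k, v in enumerate(log_mag)) / n
        if c > best_val:
            best_q, best_val = q, c
    # The pitch period is best_q samples, so the frequency is its reciprocal.
    return fs / best_q

def reference_positions(contour):
    """Frame indices where the pitch outline takes its global max and min."""
    i_max = max(range(len(contour)), key=contour.__getitem__)
    i_min = min(range(len(contour)), key=contour.__getitem__)
    return sorted({i_max, i_min})
```

Calling `cepstral_pitch` on successive fixed-length frames yields the pitch outline, and `reference_positions` then returns the frame indices that would serve as the pitch outline reference positions 26 in this sketch.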

[0074] The VQ codebook 6 is likewise used for vector quantization of the speech data, and its configuration is the same as in the first embodiment. The section discriminating means 7 finds the vowel and consonant sections in the input speech 1 and sends the vowel section speech 8, which is the input speech 1 in the vowel sections, and the consonant section speech 11, which is the input speech 1 in the consonant sections, to the waveform expansion/contraction means 9. One example of this section discrimination is the method used in the first embodiment.

[0075] The speech speed change rate 10 is the same as in the first embodiment: it is the value the user gives when changing the speaking rate, and it is input to the waveform expansion/contraction means 9 as the expansion/contraction ratio relative to the original speech. The waveform expansion/contraction means 9 expands or contracts the input speech 1 according to the speech speed change rate 10, and in doing so it performs linear expansion/contraction so that the intervals between the pitch outline reference positions 26 on the input speech 1, as determined by the pitch outline reference position determining means 25, are expanded or contracted at that same speech speed change rate 10. That is, when the user gives a speech speed change rate 10, the waveform expansion/contraction means 9 first determines the waveform expansion/contraction rate for the vowel sections and for the consonant sections. The method of determining these rates is the same as in the second embodiment. The waveform expansion/contraction means 9 then expands or contracts the input speech 1 in the vowel sections and the consonant sections at the rates so determined, again using the same waveform expansion/contraction method as in the first embodiment. The speech whose speed has been changed in this way by the waveform expansion/contraction means 9 is output as the synthesized speech 5.

[0076] As described above, according to the tenth embodiment, waveform expansion/contraction can be performed automatically. Because the waveform is expanded or contracted based on reference positions obtained from the pitch frequency of natural speech, this can be done without destroying the pitch outline, and the speaking rate can be changed more naturally than by converting all speech sections uniformly.

[0077] Embodiment 11. FIG. 12 is a block diagram showing a configuration example of a speech speed changing device according to the eleventh embodiment of the present invention. Parts corresponding to the components and data of the first to tenth embodiments are given the same reference numerals as in FIGS. 1 to 5 and 7 to 11, and their description is omitted. In the figure, 27 denotes the provisional vowel section speech in the vowel sections that the section discriminating means 7 found from the input speech 1, and 28 denotes the provisional consonant section speech in the consonant sections that the section discriminating means 7 found from the input speech 1. Reference numeral 29 denotes pitch outline dividing means. Based on the pitch outline 24 determined by the pitch analysis means 23, it separates the provisional consonant section speech 28 identified by the section discriminating means 7 into a part that belongs to a vowel section and a part that is a consonant section. It outputs the consonant part to the waveform expansion/contraction means 9 as the consonant section speech 11, and outputs the part belonging to a vowel section, together with the provisional vowel section speech 27, as the vowel section speech 8.

[0078] The waveform expansion/contraction means 9 determines a specific expansion/contraction ratio for each of the vowel section speech 8 and the consonant section speech 11 as newly determined by the pitch outline dividing means 29. When a speech speed change rate 10 is given, it obtains a waveform expansion/contraction rate for each section according to those ratios and expands or contracts the waveform based on those rates. In this respect it differs from the waveform expansion/contraction means 9 of the preceding embodiments, which is shown with the same reference numeral in FIGS. 1 to 5 and 7 to 11.

[0079] Next, the operation will be described. Here too, the input speech 1 is assumed to be speech data prepared in advance by the system that outputs the speech. The VQ codebook 6 is likewise used for vector quantization of the speech data, and its configuration is the same as in the first embodiment. The section discriminating means 7 finds the vowel and consonant sections in the input speech 1 and outputs the input speech 1 in the vowel sections as the provisional vowel section speech 27 and the input speech 1 in the consonant sections as the provisional consonant section speech 28, both to the pitch outline dividing means 29. One example of this section discrimination is the method used in the first embodiment.

[0080] The pitch analysis means 23 obtains the short-time pitch frequency of the input speech 1, extracts the record of that frequency along the time axis as the pitch outline 24, and sends it to the pitch outline dividing means 29. One example of this pitch analysis is the method used in the tenth embodiment.

[0081] The pitch outline dividing means 29, having received the provisional vowel section speech 27 and the provisional consonant section speech 28 from the section discriminating means 7 and the pitch outline 24 from the pitch analysis means 23, divides the pitch outline 24 within the provisional consonant section speech 28. That is, where the pitch outline 24 of the provisional consonant section speech 28 is random, or where its slope differs from that of the pitch outline 24 of the preceding and following provisional vowel section speech 27, that section is taken as the consonant section speech 11. The remaining sections of the provisional consonant section speech 28 are taken, together with the provisional vowel section speech 27, as the vowel section speech 8. The vowel section speech 8 and the consonant section speech 11 newly determined in this way by the pitch outline dividing means 29 are input to the waveform expansion/contraction means 9.
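The specification gives two criteria for keeping a provisional consonant section as a consonant (a random pitch outline, or a slope that differs from the neighboring vowels) but no numerical test. One hypothetical way to turn those criteria into code is sketched below. The line fit, the function names, and both thresholds (`max_rms` and `max_slope_diff`, in Hz and Hz per frame) are assumptions, and for simplicity the sketch judges a provisional consonant section as a whole rather than splitting it into parts.

```python
def fit_line(values):
    """Least-squares line through values; returns (slope, rms_residual)."""
    n = len(values)
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    if sxx:
        slope = sum((i - mean_x) * (v - mean_y)
                    for i, v in enumerate(values)) / sxx
    else:
        slope = 0.0  # a single point has no defined slope
    sq = sum((v - (mean_y + slope * (i - mean_x))) ** 2
             for i, v in enumerate(values))
    return slope, (sq / n) ** 0.5

def is_true_consonant(prev_vowel_f0, cons_f0, next_vowel_f0,
                      max_rms=10.0, max_slope_diff=2.0):
    """Decide whether a provisional consonant section stays a consonant.

    Assumed rule: the section is a consonant if its pitch outline scatters
    widely around a straight line ("random"), or if its slope differs from
    that of either neighboring provisional vowel section.
    """
    slope_c, rms_c = fit_line(cons_f0)
    if rms_c > max_rms:
        return True
    for neighbour in (prev_vowel_f0, next_vowel_f0):
        slope_v, _ = fit_line(neighbour)
        if abs(slope_c - slope_v) > max_slope_diff:
            return True
    return False
```

A section for which this returns `False` would be merged with the surrounding provisional vowel section speech 27 into the vowel section speech 8.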

[0082] When a speech speed change rate 10 is given, the waveform expansion/contraction means 9 determines, on that basis, the specific expansion/contraction ratios of the vowel section speech 8 and the consonant section speech 11 received from the pitch outline dividing means 29, obtains a waveform expansion/contraction rate for each section according to those ratios, and compresses or expands the waveform accordingly. The speech speed change rate 10 is the same as in the first embodiment: it is the value the user gives when changing the speaking rate, and it is input to the waveform expansion/contraction means 9 as the expansion/contraction ratio relative to the original speech. The method of determining the waveform expansion/contraction rate is the same as in the second embodiment, and the waveform expansion/contraction method is the same as in the first embodiment. The speech changed in this way by the waveform expansion/contraction means 9 is output as the synthesized speech 5.

[0083] As described above, according to the eleventh embodiment, waveform expansion/contraction can be performed automatically. Because the pitch outline is used as a criterion for discriminating vowel and consonant sections, the waveform can be expanded or contracted without destroying the pitch outline, and the speaking rate can be changed more naturally than by converting all speech sections uniformly.

[0084] The input speech 1 in the first to eleventh embodiments may be speech input directly by a narrator through a microphone instead of speech data prepared in the system in advance.

[0085] In the second to eleventh embodiments, the speech speed change rate 10 input by the user may be used as is for the waveform expansion/contraction rate of the vowel sections, and the waveform expansion/contraction rate of the consonant sections may be fixed at "1".

[0086] The section discriminating means 7 in the first to third embodiments and the eighth to eleventh embodiments may, in addition to discriminating vowel sections from consonant sections, be given a function that identifies as a silent section any section in which the absolute value of the amplitude of the waveform of the input speech 1 stays below an arbitrarily chosen value throughout. The waveform expansion/contraction means 9 of the first to third, eighth, tenth, and eleventh embodiments and the vowel center waveform expansion/contraction means 22 of the ninth embodiment may also be made able to expand or contract silent sections and, when doing so, to set the expansion/contraction ratio of the silent sections larger than those of the vowel sections and the consonant sections.
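The silent-section test described above, in which every sample in a section has an absolute amplitude below a chosen value, could be sketched as follows. The frame-wise evaluation, the function name `silent_frames`, and the parameter names are assumptions made for illustration.

```python
def silent_frames(samples, threshold, frame_len):
    """Return one flag per frame: True if the whole frame is silent.

    A frame counts as silent only if the absolute amplitude of every
    sample in it is below `threshold`.
    """
    flags = []
    for s in range(0, len(samples), frame_len):
        frame = samples[s:s + frame_len]
        flags.append(all(abs(x) < threshold for x in frame))
    return flags
```

Runs of consecutive `True` flags would then form the silent sections, which can be given their own expansion/contraction ratio.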

[0087] The section discriminating means 7 in the first to fourth embodiments and the eighth to eleventh embodiments may hold spectral parameters analyzed in advance as data instead of analyzing the spectral parameters each time.

[0088] The section discriminating means 7 in the first to third embodiments and the eighth to eleventh embodiments may, instead of uniquely deciding between vowel and consonant with a single threshold, perform section discrimination that takes continuity with the neighboring sections into account. For example, a section whose vowel appearance probability is around 50% may be judged a vowel if the sections before and after it are vowels.

[0089] The section discriminating means 7 in the first to third embodiments and the eighth to eleventh embodiments may, instead of uniquely deciding between vowel and consonant in each individual section, make the decision from the cumulative value of the probabilities over several preceding and following sections.
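A decision based on the accumulated probabilities of neighboring sections could be sketched as below. The window size, the 0.5 decision level, and the function name `smooth_vowel_decision` are assumptions, as is the use of a mean (the cumulative value divided by the number of sections in the window) so that sections at the edges are treated consistently.

```python
def smooth_vowel_decision(vowel_probs, half_window=1, level=0.5):
    """Label each section as vowel (True) or consonant (False).

    Instead of thresholding each vowel appearance probability on its own,
    the probabilities of the section and of up to `half_window` neighbors
    on each side are accumulated and their mean is compared with `level`.
    """
    labels = []
    n = len(vowel_probs)
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        window = vowel_probs[lo:hi]
        labels.append(sum(window) / len(window) >= level)
    return labels
```

In the probabilities `[0.9, 0.8, 0.45, 0.85, 0.9, 0.1, 0.2, 0.1]`, the third section falls just under 50% on its own but is labeled a vowel here because both of its neighbors are clearly vowels.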

【００９０】また、実施の形態１〜実施の形態４は、入
力音声１そのものの代りに、入力音声を線形予測分析し
て得た残差波形とスペクトルパラメータを入力音声１と
して入力するようにしてもよく、また、実施の形態１お
よび実施の形態３における区間判別手段７と、実施の形
態４におけるＶＱ波形伸縮手段１４は、ベクトル量子化
と区間判別をスペクトルパラメータについて行い、実施
の形態１〜実施の形態３における波形伸縮手段９と実施
の形態４におけるＶＱ波形伸縮手段１４は、残差波形の
伸縮を前記スペクトルパラメータに対応する区間につい
て行うようにしてもよい。Also, in the first to fourth embodiments, instead of the input speech 1 itself, a residual waveform and spectrum parameters obtained by linear prediction analysis of the input speech may be input as the input speech 1. In that case, the section discriminating means 7 in the first and third embodiments and the VQ waveform expanding/contracting means 14 in the fourth embodiment may perform vector quantization and section discrimination on the spectrum parameters, and the waveform expanding/contracting means 9 in the first to third embodiments and the VQ waveform expanding/contracting means 14 in the fourth embodiment may expand or contract the residual waveform in the sections corresponding to those spectrum parameters.

【００９１】また、実施の形態１１は、実施の形態１０
におけるピッチ概形基準位置決定手段２５を備えて、波
形伸縮手段９がそのピッチ概形基準位置決定手段で得た
ピッチ概形基準位置の間隔が同じ話速変更率で伸縮され
るように音声波形の伸縮率を設定するようにしてもよ
い。Also, the eleventh embodiment may include the pitch outline reference position determining means 25 of the tenth embodiment, and the waveform expanding/contracting means 9 may set the expansion/contraction rate of the speech waveform so that the intervals between the pitch outline reference positions obtained by that determining means are expanded or contracted at the same speech speed change rate.

【００９２】また、実施の形態２、実施の形態３、実施
の形態５〜実施の形態８、実施の形態１０、実施の形態
１１の波形伸縮手段９と、実施の形態４のＶＱ波形伸縮
手段１４と実施の形態９の母音中心部波形伸縮手段２２
は、母音および子音の伸縮比を規定する代りに、モーラ
速度に対応する各母音および子音の伸縮比を記録するテ
ーブルを参照するようにしてもよい。Further, instead of defining the expansion/contraction ratios of vowels and consonants directly, the waveform expansion/contraction means 9 of the second, third, fifth to eighth, tenth, and eleventh embodiments, the VQ waveform expansion/contraction means 14 of the fourth embodiment, and the vowel center waveform expanding/contracting means 22 of the ninth embodiment may refer to a table that records the expansion/contraction ratio of each vowel and consonant corresponding to the mora speed.

【００９３】また、実施の形態５と実施の形態６におけ
るリズム知覚点テーブル１７、および実施の形態７にお
けるリズム知覚点兼伸縮比テーブル１９は、音素の組合
わせのみならず、母音部のピッチの高さ、平均話速、文
頭／文中／文末のどれに当るかという情報に応じたリズ
ム知覚点の値を記述することもできるようにしてもよ
い。The rhythm perception point table 17 in the fifth and sixth embodiments and the rhythm perception point and expansion/contraction ratio table 19 in the seventh embodiment may describe rhythm perception point values that depend not only on the combination of phonemes but also on the pitch height of the vowel part, the average speech speed, and whether the position is at the beginning, middle, or end of a sentence.

【００９４】また、実施の形態５〜実施の形態７におけ
るラベリング手段１５は、マニュアルで音素ラベルを作
成する代りに、簡易な音声認識手法によってラベリング
を行うこようにしてもよい。The labeling means 15 in the fifth to seventh embodiments may perform labeling by a simple voice recognition technique instead of manually creating phoneme labels.

【００９５】[0095]

【発明の効果】以上のように、この発明によれば、学習
音声データについてのコードベクトルを、各コードベク
トルの母音および子音の出現確率とともに記録したＶＱ
コードブックを参照することにより、入力音声をベクト
ル量子化してコードベクトルの選択を行い、選択された
コードベクトルにおける母音または子音の出現確率によ
って判別された母音区間について、波形伸縮手段に与え
られた話速変更率にて波形伸縮を行うように構成したの
で、母音区間についての波形伸縮を自動的に実施するこ
とが可能となり、全音声区間を同じ伸縮率で一様に変更
することによる自然性劣化の問題を解消することがで
き、良好な発話速度変更を実現することができる話速変
更装置が得られるという効果がある。As described above, according to the present invention, the input speech is vector-quantized to select a code vector by referring to a VQ codebook in which code vectors for learning speech data are recorded together with the appearance probabilities of vowels and consonants for each code vector, and waveform expansion/contraction is performed, at the speech speed change rate given to the waveform expansion/contraction means, on the vowel sections discriminated by the appearance probability of a vowel or consonant in the selected code vector. Waveform expansion/contraction of vowel sections can therefore be performed automatically, the loss of naturalness caused by changing all speech sections uniformly at the same expansion/contraction rate is avoided, and a speech speed changing device capable of good speech speed changes is obtained.
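
The codebook lookup and vowel discrimination summarized above can be sketched as follows. The codebook layout (a list of dictionaries with `vector` and `vowel_prob` keys), the squared Euclidean distance, the 0.5 threshold, and the sample values are all hypothetical and chosen only to keep the example small.

```python
def nearest_code_vector(codebook, feature):
    """Vector quantization: index of the code vector closest to the feature."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)),
               key=lambda i: sq_dist(codebook[i]["vector"], feature))


def discriminate_sections(codebook, features, vowel_threshold=0.5):
    """Label each analysis section by the vowel probability of its code vector."""
    labels = []
    for feature in features:
        entry = codebook[nearest_code_vector(codebook, feature)]
        is_vowel = entry["vowel_prob"] >= vowel_threshold
        labels.append("vowel" if is_vowel else "consonant")
    return labels


def section_rates(labels, speech_rate):
    """Apply the speech speed change rate to vowel sections only."""
    return [speech_rate if label == "vowel" else 1.0 for label in labels]


# Hypothetical two-entry codebook with made-up probabilities.
CODEBOOK = [
    {"vector": (1.0, 0.0), "vowel_prob": 0.9},
    {"vector": (0.0, 1.0), "vowel_prob": 0.2},
]
```

For example, `discriminate_sections(CODEBOOK, [(0.9, 0.1), (0.2, 0.8)])` labels the first section a vowel and the second a consonant, and `section_rates` then leaves the consonant section at a rate of 1.0.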

【００９６】この発明によれば、母音区間と子音区間、
または母音区間と子音区間と無音区間毎に固有の伸縮比
を規定し、話速変更率が与えられると、規定された伸縮
比に応じて求めた母音区間、子音区間、および無声区間
の波形伸縮率に基づいて波形伸縮を行うように構成した
ので、母音区間と子音区間について個別の伸縮率で波形
伸縮を自動的に実施することが可能となり、全音声区間
を一様に同じ伸縮率で変更することによる自然性劣化の
問題を解消することができ、良好な発話速度変更を実現
することができるという効果がある。According to the present invention, a specific expansion/contraction ratio is defined for each of the vowel section and the consonant section, or for each of the vowel section, the consonant section, and the silent section, and, when a speech speed change rate is given, waveform expansion/contraction is performed based on the waveform expansion/contraction rates of the vowel, consonant, and silent sections obtained according to the defined expansion/contraction ratios. Waveform expansion/contraction can therefore be performed automatically at separate rates for vowel sections and consonant sections, the loss of naturalness caused by changing all speech sections uniformly at the same expansion/contraction rate is avoided, and a good speech speed change can be realized.
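
As a worked illustration of deriving per-section rates from defined ratios, the sketch below assumes that the waveform expansion/contraction rates are proportional to the defined ratios and are scaled so that the total duration changes by exactly the speech speed change rate. This normalization is an assumption; the function name and the dictionary layout are likewise hypothetical.

```python
def rates_from_ratios(sections, ratios, speech_rate):
    """Per-class waveform rates r_c = k * ratio_c, with k chosen so that
    sum(r_c * duration) equals speech_rate * sum(duration).

    sections: list of (kind, duration) pairs
    ratios:   dict mapping kind to its defined expansion/contraction ratio
    """
    total = sum(duration for _, duration in sections)
    weighted = sum(ratios[kind] * duration for kind, duration in sections)
    k = speech_rate * total / weighted
    return {kind: k * ratio for kind, ratio in ratios.items()}
```

With 0.6 s of vowel and 0.4 s of consonant, a vowel-to-consonant ratio of 2:1, and a speech speed change rate of 1.5, this gives a vowel rate of 1.875 and a consonant rate of 0.9375, and the stretched total is 1.5 s.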

【００９７】この発明によれば、話速変更率が与えられ
た場合に、音素の種類や音素カテゴリー毎に規定された
固有の伸縮比を記憶させた伸縮比テーブルを参照して求
めた、入力音声の音素毎または音素カテゴリー毎の伸縮
比に応じた区間毎の波形伸縮率に基づいて、波形伸縮を
行うように構成したので、母音区間と子音カテゴリー毎
に分類された区間について個別の伸縮率で波形伸縮を自
動的に実施することができ、全音声区間を一様に同じ伸
縮率で変更することによる自然性劣化の問題を解消する
ことが可能となって、良好な発話速度変更を実現するこ
とができるという効果がある。According to the present invention, when a speech speed change rate is given, waveform expansion/contraction is performed based on the waveform expansion/contraction rate for each section, which is obtained according to the expansion/contraction ratio for each phoneme or phoneme category of the input speech by referring to an expansion/contraction ratio table that stores the specific expansion/contraction ratio defined for each phoneme type or phoneme category. Waveform expansion/contraction can therefore be performed automatically at separate rates for vowel sections and for sections classified by consonant category, the loss of naturalness caused by changing all speech sections uniformly at the same expansion/contraction rate is avoided, and a good speech speed change can be realized.

【００９８】この発明によれば、話速変更率が与えられ
た場合に、コードベクトルを各コードベクトルに固有の
伸縮比率とともに記録しているＶＱコードブックの参照
により、入力音声をベクトル量子化してコードベクトル
の選択を行い、選択されたコードベクトル毎の伸縮比に
応じて区間毎に求めた波形伸縮率に基づいて、波形伸縮
を行うように構成したので、母音区間と子音区間につい
ての個別の伸縮率での波形伸縮を簡便かつ自動的に実施
することが可能となり、全音声区間を一様に同じ伸縮率
で変更することによる自然性劣化の問題を解消すること
ができて、良好な発話速度変更を実現することが可能に
なるという効果がある。According to the present invention, when a speech speed change rate is given, the input speech is vector-quantized to select a code vector by referring to a VQ codebook in which each code vector is recorded together with its own expansion/contraction ratio, and waveform expansion/contraction is performed based on the waveform expansion/contraction rate obtained for each section according to the expansion/contraction ratio of the selected code vector. Waveform expansion/contraction at separate rates for vowel sections and consonant sections can therefore be performed simply and automatically, the loss of naturalness caused by changing all speech sections uniformly at the same expansion/contraction rate is avoided, and a good speech speed change can be realized.

【００９９】この発明によれば、音素系列毎に固有のリ
ズム知覚点を記憶したリズム知覚点テーブルを参照し
て、入力音声よりラベリング手段が抽出した音素ラベル
から入力音声のリズム知覚点を抽出し、話速変更率が与
えられると、リズム知覚点間隔の伸縮率が一定になるよ
うに求めた区間毎の波形伸縮率に基づいて、波形伸縮を
行うように構成したので、波形伸縮を自動的に実施する
ことが可能となり、自然音声の時間構造の基本である、
リズム知覚点を考慮した波形伸縮を行うことによって、
同一伸縮率で一様に変換するよりも自然な発話速度変更
を実現することができるという効果がある。According to the present invention, the rhythm perception points of the input speech are extracted from the phoneme labels that the labeling means extracts from the input speech, by referring to a rhythm perception point table that stores the rhythm perception point specific to each phoneme sequence, and, when a speech speed change rate is given, waveform expansion/contraction is performed based on the waveform expansion/contraction rate for each section, obtained so that the expansion/contraction rate of the rhythm perception point intervals is constant. Waveform expansion/contraction can therefore be performed automatically, and because it takes into account the rhythm perception points, which are the basis of the temporal structure of natural speech, a more natural speech speed change is realized than with uniform conversion at a single expansion/contraction rate.

【０１００】この発明によれば、母音区間と子音区間、
または母音区間と子音区間と無音区間の各間毎に固有の
伸縮比を規定し、話速変更率が与えられると、リズム知
覚点間隔の伸縮率が一定になるように、規定された伸縮
比に応じて求めた母音区間と子音区間、および母音区間
と子音区間と無音区間の波形伸縮率に基づいて、波形伸
縮を行うように構成したので、母音区間と子音区間につ
いて波形伸縮を自動的に実施することが可能となり、自
然音声の時間構造の基本であるリズム知覚点を考慮した
波形伸縮を行うことによって、一様に変換するよりも自
然な発話速度変更を実現することができるという効果が
ある。According to the present invention, a specific expansion/contraction ratio is defined for each of the vowel section and the consonant section, or for each of the vowel section, the consonant section, and the silent section, and, when a speech speed change rate is given, waveform expansion/contraction is performed based on the waveform expansion/contraction rates of those sections, obtained according to the defined expansion/contraction ratios so that the expansion/contraction rate of the rhythm perception point intervals is constant. Waveform expansion/contraction of vowel sections and consonant sections can therefore be performed automatically, and because it takes into account the rhythm perception points, which are the basis of the temporal structure of natural speech, a more natural speech speed change is realized than with uniform conversion.
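
The combination of fixed ratios with constant stretching of the rhythm perception point intervals can be sketched as below. The sketch assumes that the sections have already been split at the rhythm perception points, so that each interval is a list of (kind, duration) pairs, and that the rates inside an interval are proportional to the defined ratios. Both are simplifying assumptions, and all names are hypothetical.

```python
def interval_rates(intervals, ratios, speech_rate):
    """For each interval between rhythm perception points, choose section rates
    proportional to the defined ratios so that the interval length is scaled
    by exactly speech_rate.

    intervals: list of intervals, each a list of (kind, duration) pairs
    returns:   same structure with (kind, duration, rate) triples
    """
    result = []
    for sections in intervals:
        total = sum(duration for _, duration in sections)
        weighted = sum(ratios[kind] * duration for kind, duration in sections)
        k = speech_rate * total / weighted
        result.append([(kind, duration, k * ratios[kind])
                       for kind, duration in sections])
    return result
```

Because the scale factor is recomputed per interval, every rhythm perception point interval is stretched by the same overall factor even when the vowel and consonant proportions differ from one interval to the next.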

【０１０１】この発明によれば、話速変更率が与えられ
た場合に、音素の種類や音素カテゴリー毎に規定された
固有の伸縮比を記憶している伸縮比テーブルを参照し
て、リズム知覚点間隔の伸縮率が一定になるように求め
た、入力音声の音素毎または音素カテゴリー毎の伸縮比
に応じた区間毎の波形伸縮率に基づいて、波形伸縮を行
うように構成したので、母音区間と子音カテゴリー毎に
分類された区間について、個別の伸縮率で波形伸縮を自
動的に実施することが可能となり、自然音声の時間構造
の基本であるリズム知覚点と音素種類を考慮した波形伸
縮を行うことによって、一様変換よりも自然な発話速度
変更を実現することができるという効果がある。According to the present invention, when a speech speed change rate is given, waveform expansion/contraction is performed based on the waveform expansion/contraction rate for each section, which is obtained according to the expansion/contraction ratio for each phoneme or phoneme category of the input speech, so that the expansion/contraction rate of the rhythm perception point intervals is constant, by referring to an expansion/contraction ratio table that stores the specific expansion/contraction ratio defined for each phoneme type or phoneme category. Waveform expansion/contraction can therefore be performed automatically at separate rates for vowel sections and for sections classified by consonant category, and because it takes into account both the rhythm perception points, which are the basis of the temporal structure of natural speech, and the phoneme type, a more natural speech speed change is realized than with uniform conversion.

【０１０２】この発明によれば、音素系列に固有のリズ
ム知覚点と音素系列に固有の伸縮比とを記憶するリズム
知覚点兼伸縮比テーブルを参照して、入力音声よりラベ
リング手段が抽出した音素ラベルから入力音声のリズム
知覚点と音素区間毎の伸縮比を抽出し、話速変更率が与
えられると、リズム知覚点間隔の伸縮率が一定になるよ
うに求めた音素区間毎の波形伸縮率に基づいて、波形伸
縮を行うように構成したので、母音区間と子音区間につ
いて、波形伸縮を自動的に実施することが可能となり、
自然音声の時間構造の基本であるリズム知覚点と音素種
類を考慮した波形伸縮を行うことによって、一様変換よ
りも自然な発話速度変更を実現することができるという
効果がある。According to the present invention, the rhythm perception points of the input speech and the expansion/contraction ratio for each phoneme section are extracted from the phoneme labels that the labeling means extracts from the input speech, by referring to a rhythm perception point and expansion/contraction ratio table that stores the rhythm perception point and the expansion/contraction ratio specific to each phoneme sequence, and, when a speech speed change rate is given, waveform expansion/contraction is performed based on the waveform expansion/contraction rate for each phoneme section, obtained so that the expansion/contraction rate of the rhythm perception point intervals is constant. Waveform expansion/contraction of vowel sections and consonant sections can therefore be performed automatically, and because it takes into account both the rhythm perception points, which are the basis of the temporal structure of natural speech, and the phoneme type, a more natural speech speed change is realized than with uniform conversion.

【０１０３】この発明によれば、ＶＱコードブックを参
照して抽出した母音区間音声の母音中心位置を決定し、
話速変更率が与えられた場合に、その母音中心位置の間
隔の伸縮率が一定になるように求めた、母音区間と子音
区間、あるいは母音区間と子音区間と無音区間の伸縮比
に応じた区間毎の波形伸縮率に基づいて波形伸縮を行う
ように構成したので、波形伸縮を自動的に実施すること
が可能となり、自然音声の母音間の間隔を利用できるよ
うに母音中心位置を考慮した波形伸縮を行うことによっ
て、一様変換よりも自然な発話速度変更を実現すること
ができるという効果がある。According to the present invention, the vowel center position of each vowel section voice extracted by referring to the VQ codebook is determined, and, when a speech speed change rate is given, waveform expansion/contraction is performed based on the waveform expansion/contraction rate for each section, which is obtained according to the expansion/contraction ratio of the vowel section and the consonant section, or of the vowel section, the consonant section, and the silent section, so that the expansion/contraction rate of the intervals between vowel center positions is constant. Waveform expansion/contraction can therefore be performed automatically, and because it takes the vowel center positions into account so that the intervals between vowels in natural speech can be used, a more natural speech speed change is realized than with uniform conversion.

【０１０４】この発明によれば、ＶＱコードブックを参
照して抽出した母音区間音声の母音中心位置を決定し、
話速変更率が与えられた場合に、母音区間内の母音中心
位置で最大または最小に、子音区間との境界に近付くと
ともに子音区間の波形伸縮率に近付くように決定した波
形伸縮率に基づいて、各区間の波形伸縮を行うように構
成したので、波形伸縮を自動的に実施することが可能と
なり、自然音声の母音間の間隔を利用できるように母音
中心位置を考慮し、さらに子音との境界付近より母音中
心位置周辺を強く伸縮するように重み付けした波形伸縮
を行うことによって、一様変換よりも自然な発話速度変
更を実現することができるという効果が得られる。According to the present invention, the vowel center position of each vowel section voice extracted by referring to the VQ codebook is determined, and, when a speech speed change rate is given, the waveform of each section is expanded or contracted based on a waveform expansion/contraction rate that is determined to be maximum or minimum at the vowel center position within the vowel section and to approach the waveform expansion/contraction rate of the consonant section as the boundary with the consonant section is approached. Waveform expansion/contraction can therefore be performed automatically, and because it takes the vowel center positions into account so that the intervals between vowels in natural speech can be used, and is further weighted so that the vicinity of the vowel center is expanded or contracted more strongly than the vicinity of the boundary with a consonant, a more natural speech speed change is realized than with uniform conversion.
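
A rate profile of the kind described above, extreme at the vowel center and approaching the consonant rate toward the boundaries, can be sketched as follows. The piecewise-linear shape, the frame-based indexing, and the parameter names are assumptions; the specification does not fix the shape of the weighting.

```python
def vowel_rate_profile(n_frames, center, consonant_rate, peak_rate):
    """Per-frame waveform rate inside one vowel section.

    The rate equals peak_rate at the center frame and moves linearly toward
    consonant_rate as either boundary of the vowel section is approached.
    """
    rates = []
    for i in range(n_frames):
        if i <= center:
            weight = (i + 1) / (center + 1)
        else:
            weight = (n_frames - i) / (n_frames - center)
        rates.append(consonant_rate + (peak_rate - consonant_rate) * weight)
    return rates
```

When `peak_rate` is larger than `consonant_rate` the center is stretched most; when it is smaller the center is compressed most. Rescaling the profile so that the whole utterance meets the requested speech speed change rate is left out of this sketch.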

【０１０５】この発明によれば、入力音声のピッチ周波
数分析で得られたピッチ概形の極大値と極小値よりピッ
チ概形基準位置間隔を決定し、話速変更率が与えられる
と、そのピッチ概形基準位置間隔の伸縮率が一定になる
ように求めた、母音区間と子音区間、あるいは母音区間
と子音区間と無音区間の伸縮比に応じた区間毎の波形伸
縮率に基づいて、波形伸縮を行うように構成したので、
波形伸縮を自動的に実施することが可能となり、自然音
声のピッチ周波数より求めた基準位置に基づいて波形伸
縮を行うことによって、ピッチ概形を破壊することなく
波形伸縮を行うことができ、一様変換よりも自然な発話
速度変更を実現することが可能になるという効果があ
る。According to the present invention, the pitch outline reference position intervals are determined from the maximum and minimum values of the pitch outline obtained by pitch frequency analysis of the input speech, and, when a speech speed change rate is given, waveform expansion/contraction is performed based on the waveform expansion/contraction rate for each section, which is obtained according to the expansion/contraction ratio of the vowel section and the consonant section, or of the vowel section, the consonant section, and the silent section, so that the expansion/contraction rate of the pitch outline reference position intervals is constant. Waveform expansion/contraction can therefore be performed automatically, and because it is based on reference positions obtained from the pitch frequency of natural speech, it can be performed without destroying the pitch outline, so that a more natural speech speed change is realized than with uniform conversion.
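
Finding the pitch outline reference positions from the maxima and minima of the pitch outline can be sketched as below. Treating the pitch outline as a list of per-frame pitch values and using strict local extrema are assumptions made for illustration, as are the function names.

```python
def pitch_reference_positions(pitch_outline):
    """Frame indices at which the pitch outline has a local maximum or minimum."""
    positions = []
    for i in range(1, len(pitch_outline) - 1):
        prev_v, cur_v, next_v = pitch_outline[i - 1], pitch_outline[i], pitch_outline[i + 1]
        is_max = cur_v > prev_v and cur_v > next_v
        is_min = cur_v < prev_v and cur_v < next_v
        if is_max or is_min:
            positions.append(i)
    return positions


def stretched_positions(positions, speech_rate):
    """Target positions when every reference interval is scaled by speech_rate."""
    return [p * speech_rate for p in positions]
```

For the outline `[100, 120, 140, 130, 110, 115, 125, 120]` the reference positions are frames 2, 4, and 6; with a speech speed change rate of 1.5 the targets become 3.0, 6.0, and 9.0, so each interval grows from 2 to 3 frames.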

【０１０６】この発明によれば、ピッチ周波数分析結果
によるピッチ概形の判定により、暫定子音区間音声を母
音区間に含む部分と子音区間の部分とに分離して、改め
て母音区間と子音区間との判別を行い、話速変更率が与
えられた場合に、その母音区間と子音区間について固有
に決定した伸縮比に応じて求めた、区間毎の波形伸縮率
に基づいて波形伸縮を行うように構成したので、波形伸
縮を自動的に実施することが可能となり、ピッチ概形を
母音子音区間判別の判定基準として利用することによっ
て、ピッチ概形を破壊せずに波形伸縮を行うことがで
き、一様変換よりも自然な発話速度変更を実現すること
が可能になるという効果がある。According to the present invention, the provisional consonant section voice is separated, by judging the pitch outline obtained from the pitch frequency analysis, into a part to be included in the vowel section and a consonant section part, so that the vowel sections and consonant sections are determined anew, and, when a speech speed change rate is given, waveform expansion/contraction is performed based on the waveform expansion/contraction rate for each section, obtained according to the expansion/contraction ratios specifically determined for those vowel and consonant sections. Waveform expansion/contraction can therefore be performed automatically, and because the pitch outline is used as a criterion for discriminating vowel and consonant sections, it can be performed without destroying the pitch outline, so that a more natural speech speed change is realized than with uniform conversion.

[Brief description of the drawings]

【図１】この発明の実施の形態１による話速変更装置
の構成を示すブロック図である。FIG. 1 is a block diagram showing a configuration of a speech speed changing device according to a first embodiment of the present invention.

【図２】この発明の実施の形態２による話速変更装置
の構成を示すブロック図である。FIG. 2 is a block diagram showing a configuration of a speech speed changing device according to a second embodiment of the present invention.

【図３】この発明の実施の形態３による話速変更装置
の構成を示すブロック図である。FIG. 3 is a block diagram showing a configuration of a speech speed changing device according to a third embodiment of the present invention.

【図４】この発明の実施の形態４による話速変更装置
の構成を示すブロック図である。FIG. 4 is a block diagram showing a configuration of a speech speed changing device according to a fourth embodiment of the present invention.

【図５】この発明の実施の形態５による話速変更装置
の構成を示すブロック図である。FIG. 5 is a block diagram showing a configuration of a speech speed changing device according to a fifth embodiment of the present invention.

【図６】この発明の実施の形態５におけるリズム知覚
テーブルの内容を示す説明図である。FIG. 6 is an explanatory diagram showing the contents of a rhythm perception table according to Embodiment 5 of the present invention.

【図７】この発明の実施の形態６による話速変更装置
の構成を示すブロック図である。FIG. 7 is a block diagram showing a configuration of a speech speed changing device according to a sixth embodiment of the present invention.

【図８】この発明の実施の形態７による話速変更装置
の構成を示すブロック図である。FIG. 8 is a block diagram showing a configuration of a speech speed changing device according to a seventh embodiment of the present invention.

【図９】この発明の実施の形態８による話速変更装置
の構成を示すブロック図である。FIG. 9 is a block diagram showing a configuration of a speech speed changing device according to an eighth embodiment of the present invention.

【図１０】この発明の実施の形態９による話速変更装
置の構成を示すブロック図である。FIG. 10 is a block diagram showing a configuration of a speech speed changing device according to a ninth embodiment of the present invention.

【図１１】この発明の実施の形態１０による話速変更
装置の構成を示すブロック図である。FIG. 11 is a block diagram showing a configuration of a speech speed changing device according to a tenth embodiment of the present invention.

【図１２】この発明の実施の形態１１による話速変更
装置の構成を示すブロック図である。FIG. 12 is a block diagram showing a configuration of a speech speed changing device according to an eleventh embodiment of the present invention.

【図１３】従来の話速変更装置の構成を示すブロック
図である。FIG. 13 is a block diagram showing a configuration of a conventional speech speed changing device.

[Explanation of symbols]

１入力音声、５合成音声、６ＶＱコードブック、
７区間判別手段、８母音区間音声、９波形伸縮手
段、１０話速変更率、１１子音区間音声、１２伸
縮比テーブル、１３伸縮比付きＶＱコードブック、１
４ＶＱ波形伸縮手段、１５ラベリング手段、１６
音素ラベル、１７リズム知覚点テーブル、１８リズ
ム知覚点抽出手段、１９リズム知覚点兼伸縮比テーブ
ル、２０母音中心抽出手段、２１母音中心位置、２２
母音中心部波形伸縮手段、２３ピッチ分析手段、２
４ピッチ概形、２５ピッチ概形基準位置決定手段、
２６ピッチ概形基準位置、２７暫定母音区間音声、
２８暫定子音区間音声、２９ピッチ概形分割手段。1 input voice, 5 synthesized voice, 6 VQ codebook, 7 section discriminating means, 8 vowel section voice, 9 waveform expansion/contraction means, 10 speech speed change rate, 11 consonant section voice, 12 expansion/contraction ratio table, 13 VQ codebook with expansion/contraction ratios, 14 VQ waveform expansion/contraction means, 15 labeling means, 16 phoneme label, 17 rhythm perception point table, 18 rhythm perception point extraction means, 19 rhythm perception point and expansion/contraction ratio table, 20 vowel center extraction means, 21 vowel center position, 22 vowel center waveform expansion/contraction means, 23 pitch analysis means, 24 pitch outline, 25 pitch outline reference position determining means, 26 pitch outline reference position, 27 provisional vowel section voice, 28 provisional consonant section voice, 29 pitch outline dividing means.

Claims

[Claims]

1. A speech speed changing device comprising: a vector quantization codebook that records code vectors for learning speech data together with the appearance probability of a vowel or a consonant for each code vector; section discriminating means for vector-quantizing input speech with reference to the vector quantization codebook to select a code vector, and for discriminating whether a section is a vowel section or a consonant section based on the appearance probability of a vowel or a consonant in the selected code vector; and waveform expansion/contraction means for performing waveform compression or expansion, on a section determined to be a vowel section by the section discriminating means, in accordance with a speech speed change rate given as an expansion/contraction ratio relative to the original voice in order to change the speech speed.

2. The speech speed changing device according to claim 1, wherein the waveform expansion/contraction means defines a specific expansion/contraction ratio for each of a vowel section and a consonant section, or for each of a vowel section, a consonant section, and a silent section, and, when a speech speed change rate given as an expansion/contraction ratio relative to the original voice is supplied in order to change the speech speed, obtains a waveform expansion/contraction rate for each section according to the expansion/contraction ratio of the vowel section and the consonant section, or of the vowel section, the consonant section, and the silent section, and performs waveform compression or expansion based on that waveform expansion/contraction rate.

3. The speech speed changing device according to claim 1, further comprising an expansion/contraction ratio table that stores a specific expansion/contraction ratio defined for each phoneme type or each phoneme category, wherein the waveform expansion/contraction means, when a speech speed change rate given as an expansion/contraction ratio relative to the original voice is supplied in order to change the speech speed, refers to the expansion/contraction ratio table to obtain a waveform expansion/contraction rate for each section according to the expansion/contraction ratio for each phoneme or each phoneme category of the input speech, and performs waveform compression or expansion based on that waveform expansion/contraction rate.

4. A speech speed changing device comprising: a vector quantization codebook with expansion/contraction ratios that records code vectors for learning speech data together with an expansion/contraction ratio specific to each code vector; and vector quantization waveform expansion/contraction means for vector-quantizing input speech with reference to the vector quantization codebook with expansion/contraction ratios to select a code vector and, when a speech speed change rate given as an expansion/contraction ratio relative to the original voice is supplied in order to change the speech speed, obtaining a waveform expansion/contraction rate for each section according to the expansion/contraction ratio of each selected code vector and performing waveform compression or expansion based on that waveform expansion/contraction rate.

5. A speech speed changing device comprising: labeling means for extracting phoneme labels of input speech; a rhythm perception point table that stores rhythm perception points specific to the phoneme sequences of the input speech; rhythm perception point extracting means for extracting the rhythm perception points of the input speech from the phoneme labels with reference to the rhythm perception point table; and waveform expansion/contraction means for, when a speech speed change rate given as an expansion/contraction ratio relative to the original voice is supplied in order to change the speech speed, obtaining a waveform expansion/contraction rate for each section so that the expansion/contraction rate of the intervals between the rhythm perception points extracted by the rhythm perception point extracting means is constant, and performing waveform compression or expansion based on that waveform expansion/contraction rate.

6. The speech speed changing device according to claim 5, wherein the waveform expansion/contraction means defines a specific expansion/contraction ratio for each of a vowel section and a consonant section, or for each of a vowel section, a consonant section, and a silent section, and, when a speech speed change rate given as an expansion/contraction ratio relative to the original voice is supplied in order to change the speech speed, obtains a waveform expansion/contraction rate for each section according to the expansion/contraction ratio of the vowel section and the consonant section, or of the vowel section, the consonant section, and the silent section, so that the expansion/contraction rate of the rhythm perception point intervals is constant, and performs waveform compression or expansion based on that waveform expansion/contraction rate.

7. The speech speed changing device according to claim 5, further comprising an expansion/contraction ratio table that stores a specific expansion/contraction ratio defined for each phoneme type or each phoneme category, wherein the waveform expansion/contraction means, when a speech speed change rate given as an expansion/contraction ratio relative to the original voice is supplied in order to change the speech speed, refers to the expansion/contraction ratio table to obtain a waveform expansion/contraction rate for each section according to the expansion/contraction ratio for each phoneme type or each phoneme category of the input speech, so that the expansion/contraction rate of the rhythm perception point intervals is constant, and performs waveform compression or expansion based on that waveform expansion/contraction rate.

8. A speech speed changing device comprising: labeling means for extracting phoneme labels of input speech; a rhythm perception point and expansion/contraction ratio table that stores rhythm perception points specific to the phoneme sequences of the input speech and expansion/contraction ratios specific to those phoneme sequences; rhythm perception point extracting means for extracting the rhythm perception points of the input speech and the expansion/contraction ratio of each phoneme sequence from the phoneme labels with reference to the rhythm perception point and expansion/contraction ratio table; and waveform expansion/contraction means for, when a speech speed change rate given as an expansion/contraction ratio relative to the original voice is supplied in order to change the speech speed, obtaining a waveform expansion/contraction rate for each phoneme section according to the expansion/contraction ratio of each phoneme sequence so that the expansion/contraction rate of the intervals between the rhythm perception points is constant, and performing waveform compression or expansion based on that waveform expansion/contraction rate.

9. A speech speed changing device comprising: a vector quantization codebook that records code vectors for learning speech data together with the appearance probability of a vowel or a consonant for each code vector; section discriminating means for vector-quantizing input speech with reference to the vector quantization codebook to select a code vector, and for discriminating whether a section is a vowel section or a consonant section based on the appearance probability of a vowel or a consonant in the selected code vector; vowel center extracting means for determining the vowel center position of the vowel section; and waveform expansion/contraction means for defining a specific expansion/contraction ratio for each of a vowel section and a consonant section, or for each of a vowel section, a consonant section, and a silent section, and, when a speech speed change rate given as an expansion/contraction ratio relative to the original voice is supplied in order to change the speech speed, obtaining a waveform expansion/contraction rate for each section according to the expansion/contraction ratio of the vowel section and the consonant section, or of the vowel section, the consonant section, and the silent section, so that the expansion/contraction rate of the intervals between the vowel center positions is constant, and performing waveform compression or expansion based on that waveform expansion/contraction rate.

10. A speech speed changing device comprising: a vector quantization codebook that records code vectors for learning speech data together with the appearance probability of a vowel or a consonant for each code vector; section discriminating means for vector-quantizing input speech with reference to the vector quantization codebook to select a code vector, and for discriminating whether a section is a vowel section or a consonant section based on the appearance probability of a vowel or a consonant in the selected code vector; vowel center extracting means for determining the vowel center position of the vowel section determined by the section discriminating means; and vowel center waveform expanding/contracting means for, when a speech speed change rate given as an expansion/contraction ratio relative to the original voice is supplied in order to change the speech speed, determining the waveform expansion/contraction rate within the vowel section so that it is maximum or minimum at the vowel center position and approaches the waveform expansion/contraction rate of the consonant section as the boundary with the consonant section is approached, and compressing or expanding the waveform of each section based on that waveform expansion/contraction rate.

11. The speech speed changing device according to claim 1, further comprising: pitch analysis means for analyzing the pitch frequency of the input speech to obtain a pitch outline; and pitch outline reference position determining means for obtaining the maximum and minimum values of the pitch outline obtained by the pitch analysis means to determine pitch outline reference positions, wherein the waveform expansion/contraction means defines a specific expansion/contraction ratio for each of a vowel section and a consonant section, or for each of a vowel section, a consonant section, and a silent section, and, when a speech speed change rate given as an expansion/contraction ratio relative to the original voice is supplied in order to change the speech speed, obtains a waveform expansion/contraction rate for each section according to the expansion/contraction ratio of the vowel section and the consonant section, or of the vowel section, the consonant section, and the silent section, so that the expansion/contraction rate of the intervals between the pitch outline reference positions is constant, and performs waveform compression or expansion based on that waveform expansion/contraction rate.

12. The speech speed changing device according to claim 1, further comprising: pitch analysis means for analyzing the pitch frequency of the input speech to obtain a pitch outline; and pitch outline dividing means for separating the provisional consonant section voice, determined to be a consonant section by the section discriminating means, into a part to be included in a vowel section and a consonant section part according to the result of judging the pitch outline obtained by the pitch analysis means, outputting only the separated consonant section part as consonant section voice, and outputting the part to be included in the vowel section, together with the provisional vowel section voice determined to be a vowel section by the section discriminating means, as vowel section voice, wherein the waveform expansion/contraction means determines a specific expansion/contraction ratio for the sections newly determined to be a vowel section and a consonant section by the pitch outline dividing means and, when a speech speed change rate given as an expansion/contraction ratio relative to the original voice is supplied in order to change the speech speed, obtains a waveform expansion/contraction rate for each section according to that expansion/contraction ratio and performs waveform compression or expansion based on that waveform expansion/contraction rate.