
JP3422716B2 - Speech rate conversion method and apparatus, and recording medium storing speech rate conversion program

Info

Publication number: JP3422716B2
Application number: JP06551299A
Authority: JP
Inventors: 紀子水澤; 正信東田; 博和鈴木
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1999-03-11
Filing date: 1999-03-11
Publication date: 2003-06-30
Anticipated expiration: 2019-03-11
Also published as: JP2000259200A

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a speech rate conversion method and apparatus that convert only the speech rate while preserving the quality of recorded speech and the speaker's voice quality, and to a recording medium storing a program for that purpose.

[0002]

[Prior Art] In a recorded speech database that stores word utterances for use in automatic voice response systems and the like, the speech rates of the individual recordings should be well matched. As the database grows, however, even a well-trained speaker finds it difficult to utter every item at the same rate, so a technique is needed that converts only the speech rate without degrading the sound quality of the recorded speech data.

[0003] Meanwhile, various speech rate conversion methods have been devised in response to demands such as listening to recorded speech (for example, television broadcasts) in less time, or playing it back slowly for elderly listeners. Among them, a relatively simple approach that causes little degradation or change in the quality of the original sound is to change the playback time by inserting or deleting sections of the speech waveform as appropriate. Such methods can be broadly classified as follows.

[0004] 1. Classification by whether silent, consonant, and vowel sections are distinguished

[0005] Human speech can be broadly divided into the three kinds of section above. In general, when people try to speak faster or slower, they adjust durations in the order of silent sections, vowel sections, and consonant sections. Based on this observation, some methods first divide the speech to be processed into these three kinds of section and adjust the duration in a way suited to each, while others apply the same duration adjustment to everything without distinguishing them.

[0006] 2. Classification by the length of the inserted or deleted waveform

[0007] The section to be inserted or deleted either always has a fixed length, or has the length of one fundamental-period waveform (hereinafter called a pitch section waveform) or an integer multiple of it.

[0008] 3. Classification by the method of insertion or deletion

[0009] When inserting or deleting, a waveform section is either inserted or deleted as is, or several excised waveforms are multiplied by a window function and added together with overlap to adjust the section length.

[0010]

[Problems to Be Solved by the Invention] The methods devised so far all aim first at achieving the desired speech rate conversion ratio; none was designed from the outset with sound quality as the top priority. In addition, the prior art described above has the following problems.

[0011] 1. Distinguishing silent, consonant, and vowel sections

[0012] Adjusting the duration according to the kind of section is close to how humans adjust their speech rate and yields natural-sounding output. In the prior art, however, the duration of each kind of section is adjusted by a completely different method. This requires two-stage processing: the speech waveform is first divided into sections, and each section is then processed according to its kind.

[0013] 2, 3. Length of the inserted or deleted waveform and the method of insertion or deletion

[0014] The fundamental period of the human voice varies widely: 2.5 to 7 msec for women and 5 to 20 msec for men. With this in mind, varying the length of the inserted or deleted waveform according to the length of the pitch section waveform should give smoother output speech.

[0015] However, it is very difficult to determine the fundamental period of a speech waveform accurately. With a simple period extraction method the joins are therefore often discontinuous, and to avoid this the section length is commonly adjusted by multiplying several excised waveforms by a window function and adding them together with overlap. Because this method applies multiplications and additions across the whole waveform, however, it may impair the voice quality of the original sound. On the other hand, if the fundamental period is extracted with high accuracy, for example by analyzing the speech waveform in the frequency domain with a Fourier transform or by applying a low-pass filter, waveforms can be inserted or deleted as is with little discontinuity at the joins; but the amount of computation grows, real-time operation is lost, and the apparatus becomes large.

[0016] The present invention was made in view of these problems and aims to perform speech rate conversion by a simple method while preserving the sound quality of the original. Furthermore, by changing a supplied parameter, the user can specify how much weight to give to sound quality, which enables flexible speech rate conversion suited to the application.

[0017]

[Means for Solving the Problems] To solve the above problems, the invention of claim 1 is a speech rate conversion method that converts the speech rate of recorded speech by lengthening or shortening only the stationary sections of a speech waveform through insertion or deletion, as appropriate, of a fundamental-period waveform (hereinafter called a pitch section waveform), characterized in that: a waveform similarity lower limit is set as a parameter; the input speech waveform is cut out sequentially from the beginning in short sections (each cut-out waveform is hereinafter called a frame section waveform); the waveform similarity of the frame section waveform is calculated; when the waveform similarity is greater than the waveform similarity lower limit given as the parameter, the frame section waveform is regarded as stationary and a pitch section waveform is inserted into or deleted from it; and, when determining the point at which the pitch section waveform is inserted or deleted, for each time point within the frame section, the mean square, the sum of squares, the mean absolute value, or the sum of absolute values of the difference between the two pitch section waveforms (two periods) adjacent to each other across that time point is calculated, and the point at which this value is minimum is taken as the point at which the pitch section waveform is inserted or deleted.

[0018] The invention of claim 2 is a speech rate conversion apparatus that converts the speech rate of recorded speech by lengthening or shortening only the stationary sections of a speech waveform through insertion or deletion, as appropriate, of a fundamental-period waveform (hereinafter called a pitch section waveform), comprising: means for setting a waveform similarity lower limit as a parameter; and means for cutting out the input speech waveform sequentially from the beginning in short sections (each cut-out waveform is hereinafter called a frame section waveform), calculating the waveform similarity of the frame section waveform, regarding the frame section waveform as stationary when the waveform similarity is greater than the waveform similarity lower limit given as the parameter, and inserting or deleting a pitch section waveform; wherein, when determining the point at which the pitch section waveform is inserted or deleted, for each time point within the frame section, the mean square, the sum of squares, the mean absolute value, or the sum of absolute values of the difference between the two pitch section waveforms (two periods) adjacent to each other across that time point is calculated, and the point at which this value is minimum is taken as the point at which the pitch section waveform is inserted or deleted.

[0019] The invention of claim 3 is a computer-readable recording medium storing a speech rate conversion program for executing the speech rate conversion method of claim 1 on a computer.

[0020] The speech rate conversion method of the present invention adjusts the overall length by lengthening or shortening only the vowel sections through insertion or deletion of pitch section waveforms.

[0021] Unlike the prior art, however, no initial step of extracting vowel sections is needed. Exploiting the fact that the waveform is stationary in vowel sections, the method calculates the autocorrelation coefficient of the frame section being processed to determine whether that frame section is a vowel section, and obtains the fundamental period at the same time.

[0022] Because only stationary parts of the waveform are processed, errors in fundamental period extraction are few and the joins are rarely discontinuous. Noise is therefore unlikely to be introduced even without processing such as applying a window function.

[0023]

[Embodiments of the Invention] FIG. 1 is a block diagram showing an example embodiment of the speech rate conversion apparatus according to the present invention. The speech rate conversion apparatus 1 comprises a speech waveform input unit 2, a threshold input unit 3, a speech rate conversion ratio input unit 4, a frame extraction unit 5, an autocorrelation coefficient calculation unit 6, a processing decision unit 7, a waveform insertion/deletion unit 8, and a speech waveform output unit 9.

[0024] The speech input unit 2 acquires the speech whose rate is to be converted. The threshold input unit 3 acquires the lower limit of the autocorrelation coefficient, which expresses waveform similarity, and sends it to the processing decision unit 7. The speech rate conversion ratio input unit 4 acquires the speech rate conversion ratio, which expresses how much faster or slower the speech is to be made, and sends it to the waveform insertion/deletion unit 8.

[0025] The frame extraction unit 5 cuts out part of the speech waveform acquired by the speech input unit 2 (hereinafter called a frame section waveform) according to the frame start point sent from the waveform insertion/deletion unit 8. If the cut-out frame section waveform is to be processed, it is sent to the autocorrelation coefficient calculation unit 6 and the waveform insertion/deletion unit 8; if not, it is sent to the speech waveform output unit 9.

[0026] The autocorrelation coefficient calculation unit 6 calculates the autocorrelation coefficient of the frame section waveform sent from the frame extraction unit 5, sends the local maximum of the autocorrelation coefficient to the processing decision unit 7, and sends the shift of the frame section waveform that gives this maximum to the waveform insertion/deletion unit 8 as the length of the fundamental period. The processing decision unit 7 compares the maximum of the autocorrelation coefficient sent from the autocorrelation coefficient calculation unit 6 with the lower limit sent from the threshold input unit 3, decides whether a waveform is to be inserted or deleted, and sends the decision to the waveform insertion/deletion unit 8.

[0027] The waveform insertion/deletion unit 8 receives from the processing decision unit 7 the decision on whether to insert or delete and, where required, inserts a fundamental-period waveform (hereinafter called a pitch section waveform) into, or deletes one from, the frame section waveform sent from the frame extraction unit 5, then sends the result to the speech waveform output unit 9. The waveform insertion/deletion unit 8 also determines the start of the next frame from the speech rate conversion ratio sent from the speech rate conversion ratio input unit 4 and the fundamental period length sent from the autocorrelation coefficient calculation unit 6, and sends it to the frame extraction unit 5.

[0028] The speech waveform output unit 9 outputs the frame section waveforms sent from the frame extraction unit 5 or the waveform insertion/deletion unit 8, either sequentially or after accumulating them until all processing of the input speech has finished.

[0029] FIG. 2 is a flowchart explaining the operation of the speech rate conversion apparatus 1 shown in FIG. 1. An example of the operation of this apparatus when the speech rate is multiplied by α is described below with reference to the block diagram of FIG. 1 and the flowchart of FIG. 2.

[0030] The speech input unit 2 acquires the speech whose rate is to be converted and, if necessary, converts it into a digital signal x(n), n = 0, 1, 2, ... (step S1). Here n is an index corresponding to time. The threshold input unit 3 acquires the lower limit β of the autocorrelation coefficient, which expresses waveform similarity, and sends it to the processing decision unit 7 (step S2).

[0031] To prevent degradation of sound quality such as the introduction of noise, waveforms should be inserted or deleted where the waveform is stationary; in general, when the waveform is stationary the autocorrelation coefficient is close to 1. The lower limit β acquired by the threshold input unit 3 specifies how stationary the waveform must be before a fundamental-period waveform (hereinafter called a pitch section waveform) is inserted or deleted. A smaller β causes insertion and deletion even where the waveform is not stationary, so noise is more likely to be introduced, but the output comes closer to the target speech rate. Conversely, a β close to 1 keeps the risk of noise low. Setting β according to the purpose of the conversion enables flexible processing such as "change the speech rate only as far as quality is not degraded". Here, for example, β is set to 0.7.

[0032] The speech rate conversion ratio input unit 4 acquires the speech rate conversion ratio α, which expresses how much faster or slower the speech is to be made, and sends it to the waveform insertion/deletion unit 8 (step S3). Here, for example, the speech is to be made 1.2 times faster, so α = 1.2.

[0033] The frame extraction unit 5 cuts out part of the speech waveform acquired by the speech input unit 2 (hereinafter called a frame section waveform) according to the frame start point n_bi sent from the waveform insertion/deletion unit 8. The apparatus processes the speech waveform frame section by frame section, in order from the beginning. If the frame start point n_bi is greater than the end point n_e(i-1) of the previously cut-out frame section waveform, the unit first cuts out

[0034]

[Equation 1]

[0035] and sends it directly to the speech waveform output unit 9 (step S6), and then cuts out the waveform of fixed length M samples beginning at n_bi,

[0036]

[Equation 2]

[0037] and sends it to the autocorrelation coefficient calculation unit 6 and the waveform insertion/deletion unit 8 (step S7). In the index n of the digital signal x(n), n_bi denotes the start point of the i-th frame, n_b(i-1) the start point of the (i-1)-th frame, n_ei the end point of the i-th frame, and n_e(i-1) the end point of the (i-1)-th frame.

[0038] If the frame start point n_bi is not greater than the end point n_e(i-1) of the previously cut-out frame section waveform, the M-sample waveform x(n_bi) to x(n_ei) beginning at n_bi is cut out immediately and sent to the autocorrelation coefficient calculation unit 6 and the waveform insertion/deletion unit 8 (step S7).

[0039] FIG. 3 shows an example of a frame section waveform M samples long. This example is cut from speech data sampled at 16 kHz, with M = 240, which corresponds to 15 msec.

[0040] The autocorrelation coefficient calculation unit 6 calculates the autocorrelation coefficient y(m) of the frame section waveform x(n_bi) to x(n_ei) sent from the frame extraction unit 5 (step S8). Here y(m) is calculated from the frame section waveform alone and is defined as

[0041]

[Equation 3]

[0042] FIG. 4(a) illustrates how the autocorrelation coefficient is calculated, and FIG. 4(b) shows the autocorrelation coefficient y(m) of the frame section waveform of FIG. 3. The autocorrelation coefficient indicates how similar a waveform is to the same waveform shifted by m. When the frame section waveform is periodic, the two waveforms resemble each other when the shift is an integer multiple of the fundamental period T, and y(m) takes a large value. Conversely, when the waveform is not very periodic, shifting it leaves no similar parts, so y(m) is small. As an example of a non-periodic waveform, FIGS. 5 and 6 show another frame section waveform and its autocorrelation coefficient.

[0043] After calculating y(m), the autocorrelation coefficient calculation unit 6 searches for the local maximum y(T) within the frame section, sends y(T) to the processing decision unit 7, and sends the shift of the frame section waveform that gives the maximum, that is, the fundamental period length T, to the waveform insertion/deletion unit 8 (step S9). In the example of FIG. 3, y(T) = 0.75 and T = 55; in the example of FIG. 5, y(T) = 0.56 and T = 53.
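The peak search of step S9 can be sketched as follows. This is a minimal illustration under stated assumptions, not the definition used in the patent: Equation 3 is not reproduced in this text, so the normalization below (the correlation of the two overlapping segments divided by the geometric mean of their energies) is assumed, and the function names and the lag range `min_lag` to `max_lag` are hypothetical.

```python
import math

def autocorr_coeff(frame, m):
    """Normalized correlation between the frame and itself shifted by m samples.

    Assumed form: only the overlapping parts of the frame are compared.
    """
    a = frame[:len(frame) - m]
    b = frame[m:]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0

def find_pitch(frame, min_lag, max_lag):
    """Return (T, y(T)): the lag in [min_lag, max_lag] with the largest coefficient."""
    best_lag, best_val = min_lag, -1.0
    for m in range(min_lag, max_lag + 1):
        y = autocorr_coeff(frame, m)
        if y > best_val:
            best_lag, best_val = m, y
    return best_lag, best_val

# Usage: a synthetic 240-sample frame with a period of 55 samples.
frame = [math.sin(2 * math.pi * n / 55) for n in range(240)]
T, y_T = find_pitch(frame, min_lag=20, max_lag=100)
print(T, round(y_T, 3))
```

The lag range is restricted here so that the second multiple of the period (110 samples) is not a candidate; how the original method chooses among multiples of T is not stated in the text.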

[0044] The processing decision unit 7 compares the maximum y(T) of the autocorrelation coefficient sent from the autocorrelation coefficient calculation unit 6 with the lower limit β sent from the threshold input unit 3 (step S10). If y(T) is large, the waveform is stationary, so noise is unlikely to be introduced even when the waveform is modified. In the example of FIG. 3, y(T) = 0.75 > 0.7 = β, so the unit decides to insert or delete a pitch section waveform. In the example of FIG. 5, y(T) = 0.56 < 0.7 = β, so the unit decides not to insert or delete. The decision is sent to the waveform insertion/deletion unit 8 as a signal, for example 1 when insertion or deletion is to be performed and 0 when it is not.
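The comparison of step S10 reduces to a single threshold test. The sketch below encodes it with the 1/0 signal mentioned in the text; the function name `decision_signal` is hypothetical, and the strict inequality follows the claim wording ("greater than the lower limit").

```python
def decision_signal(y_T, beta):
    """Return 1 if the frame is treated as stationary (y(T) > beta), else 0."""
    return 1 if y_T > beta else 0

BETA = 0.7  # lower limit used in the running example

print(decision_signal(0.75, BETA))  # frame of FIG. 3: splice
print(decision_signal(0.56, BETA))  # frame of FIG. 5: leave untouched
```

Raising `BETA` toward 1 makes fewer frames eligible, which trades conversion accuracy for sound quality as described for β above.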

[0045] The waveform insertion/deletion unit 8 first receives from the processing decision unit 7 the signal indicating whether to insert or delete, and operates differently depending on it. If no insertion or deletion is to be performed, it sets the next frame start point n_b(i+1) to the point following this frame, that is, n_ei + 1, sends it to the frame extraction unit 5 (step S11), and sends the frame section waveform unmodified to the speech waveform output unit 9 (step S23). If insertion or deletion is to be performed, it first uses the fundamental period length T sent from the autocorrelation coefficient calculation unit 6 to search the frame section for the point P at which the pitch section waveform is to be inserted or deleted (step S12). P is the point between the two most similar periods in the frame section. For example, over m = n_bi + T to n_ei - T,

[0046]

[Equation 4]

[0047] is calculated, and the m that minimizes z(m) is taken as P. Besides the definition above, that is, the sum of squares of the difference between the two pitch section waveforms (two periods) adjacent to each other across the point, z(m) may also be the mean of |x(n) - x(n + T)|, the mean square, the sum of absolute values, or the like. FIG. 7 shows the search for P in the frame section waveform of FIG. 3.
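The search for P in step S12 can be sketched as below. Equation 4 is not reproduced in this text, so the exact summation limits are an assumption: here z(m) compares the period ending at m, x(m-T+1) to x(m), with the period starting after m, x(m+1) to x(m+T), using the sum of squared differences named in the text. Indices are local to the frame (0 to M-1), and the function name is hypothetical.

```python
def find_splice_point(frame, T):
    """Return (P, z(P)): the boundary between the two most similar adjacent periods.

    m runs from T to len(frame) - 1 - T, the local equivalent of
    n_bi + T to n_ei - T, so both periods stay inside the frame.
    """
    best_m, best_z = None, float("inf")
    for m in range(T, len(frame) - T):
        z = sum((frame[m - T + k] - frame[m + k]) ** 2 for k in range(1, T + 1))
        if z < best_z:
            best_m, best_z = m, z
    return best_m, best_z

# Usage: a period-5 pattern with one disturbed sample near the start.
frame = [0, 3, 1, 4, 2] * 6
frame[3] += 10
print(find_splice_point(frame, T=5))
```

Boundaries whose two-period window contains the disturbed sample get a large z, so the search settles on the first boundary past it. Replacing the squared difference with an absolute difference gives the alternative measures listed in the text.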

[0048] Next, whether a pitch section waveform is to be inserted or deleted is decided according to whether the speech rate conversion ratio α sent from the speech rate conversion ratio input unit 4 is greater or less than 1 (step S13). If α > 1, the speech is to be made faster, so x(P + 1) to x(P + T) are deleted from the frame section waveform (step S14). If α < 1, the speech is to be made slower, so x(P + 1) to x(P + T) are first saved in a buffer and then inserted between x(P) and x(P + 1) of the frame section waveform (step S15).
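Steps S13 to S15 amount to a list splice. The following sketch assumes the frame is held as a Python list with frame-local indices; the function name `splice_pitch` is hypothetical.

```python
def splice_pitch(frame, P, T, alpha):
    """Delete (alpha > 1) or duplicate (alpha < 1) the period x(P+1)..x(P+T)."""
    period = frame[P + 1:P + T + 1]          # buffer holding x(P+1)..x(P+T)
    if alpha > 1:
        # Faster speech: drop one period after P.
        return frame[:P + 1] + frame[P + T + 1:]
    if alpha < 1:
        # Slower speech: insert the buffered period between x(P) and x(P+1).
        return frame[:P + 1] + period + frame[P + 1:]
    return list(frame)                        # alpha == 1: nothing to do

# Usage with a toy "waveform" so the effect is easy to see.
frame = list(range(10))
print(splice_pitch(frame, P=3, T=2, alpha=1.2))  # [0, 1, 2, 3, 6, 7, 8, 9]
print(splice_pitch(frame, P=3, T=2, alpha=0.8))  # [0, 1, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9]
```

No window function or cross-fade is applied, which matches the direct insertion and deletion described here; the samples on either side of the join are untouched.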

[0049] Next, the frame shift M' is calculated from α and T (step S16). From (M' ± T) / M' = 1 / α,

[0050]

[Equation 5]

[0051] is calculated. The next frame start point n_b(i+1) is basically n_bi + M' + 1 (step S18). However, if the next frame start point would fall before the point where the waveform was inserted or deleted, that is, if n_bi + M' < P + T, then P + T + 1 is taken as the next frame start point n_b(i+1) (steps S17, S19). The n_b(i+1) determined in this way is sent to the frame extraction unit 5.
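Steps S16 to S19 can be sketched as below. Equation 5 is not reproduced in this text; the closed form used here, M' = αT / |α - 1|, is obtained by solving the stated relation (M' ± T) / M' = 1 / α for M', and rounding to the nearest integer is an assumption. The function names are hypothetical.

```python
def frame_shift(alpha, T):
    """Frame shift M' such that splicing one period of T samples per M' gives rate alpha."""
    if alpha == 1.0:
        raise ValueError("alpha must differ from 1 for a splice to be needed")
    return round(alpha * T / abs(alpha - 1.0))

def next_frame_start(n_bi, P, T, alpha):
    """Next frame start n_b(i+1), never earlier than the sample after the splice."""
    shift = frame_shift(alpha, T)
    if n_bi + shift < P + T:      # nominal start would precede the splice (S17 -> S19)
        return P + T + 1
    return n_bi + shift + 1       # nominal case (S17 -> S18)

# Usage with the values of the running example (T = 55, frame starting at 0).
print(frame_shift(1.2, 55), next_frame_start(0, 100, 55, 1.2))
print(frame_shift(1.5, 55), next_frame_start(0, 150, 55, 1.5))
```

With P = 150 in the second call, the nominal start 0 + 165 lies before P + T = 205, so the start is pushed to 206; the P values are illustrative only.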

[0052] Next, of the processed frame section waveform, the part from the frame start point n_bi to n_b(i+1) - 1, one sample before the next frame start point, is sent to the speech waveform output unit 9 (steps S21, S22). However, if the next frame start point lies outside the frame section, that is, if n_b(i+1) - 1 > n_ei, the frame section x(n_bi) to x(n_ei) is output (steps S20, S23). In this case the remaining x(n_ei + 1) to x(n_b(i+1)) is output as is before the next frame section is cut out (S6).

[0053] FIGS. 8 to 10 show examples of how the frame advances and what range is output. For example, for a frame section with T = 55 and M = 240, when α = 1.2 (FIG. 8),

[0054]

[Equation 6]

[0055] and since P + T < n_ei < n_bi + M', n_bi + 331 is sent to the frame extraction unit 5 as the next frame start point n_b(i+1), and the waveform x(n_bi) to x(n_ei) with one period of 55 samples deleted from its middle is sent to the speech waveform output unit 9 (steps S17 → S18 → S20 → S23).

[0056] When α = 1.35 (FIG. 9),

[0057]

[Equation 7]

[0058] and since P + T < n_bi + M' < n_ei, n_bi + 213 is sent to the frame extraction unit 5 as the next frame start point n_b(i+1), and the waveform x(n_bi) to x(n_bi + M') with one period of 55 samples deleted from its middle is sent to the speech waveform output unit 9 (steps S17 → S18 → S20 → S21).

[0059] When α = 1.5 (FIG. 10),

[0060]

[Equation 8]

[0061] and since n_bi + M' < P + T < n_ei, P + 55 + 1 is sent to the frame extraction unit 5 as the next frame start point n_b(i+1), and the processed part of x(n_bi) to x(P + T), which in this case is x(n_bi) to x(P) because x(P + 1) to x(P + T) have been deleted, is sent to the speech waveform output unit 9 (steps S17 → S19 → S22).
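The frame shifts in these three cases follow from the relation (M' ± T) / M' = 1 / α given in [0049]. The worked values below are a reconstruction under that assumption, since Equations 5 to 8 are not reproduced in this text; rounding to an integer is also assumed, chosen because it agrees with the stated next frame start points n_bi + 331 and n_bi + 213.

```latex
\begin{align*}
\frac{M' - T}{M'} = \frac{1}{\alpha} \quad (\alpha > 1)
  &\;\Longrightarrow\; M' = \frac{\alpha T}{\alpha - 1}, \\
\alpha = 1.2,\ T = 55 &: \quad M' = \frac{1.2 \times 55}{0.2} = 330, \\
\alpha = 1.35,\ T = 55 &: \quad M' = \frac{1.35 \times 55}{0.35} \approx 212.1 \rightarrow 212, \\
\alpha = 1.5,\ T = 55 &: \quad M' = \frac{1.5 \times 55}{0.5} = 165.
\end{align*}
```

With M = 240 the frame ends at n_ei = n_bi + 239, so M' = 330 lies beyond the frame end (FIG. 8), M' = 212 lies inside the frame (FIG. 9), and M' = 165 can fall before P + T when P is late in the frame (FIG. 10), which matches the three orderings stated above.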

【００６２】In this example, the point P at which the waveform is inserted or deleted was searched for over the entire frame section. If the search range is instead limited to one period T, the next frame start point n_b(i+1) is less likely to fall before P + T, and the resulting conversion rate comes closer to the desired speech rate conversion rate. The speech waveform output unit 9 sequentially outputs the frame section waveforms sent from the frame cutout unit 5 or the waveform insertion/deletion unit 8. It is then determined whether all of the input speech has been output (step S24); if output is not yet complete, processing returns to step S4 and the next frame is processed.
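The search for the point P can be sketched as follows, using the criterion stated in claim 1: for each candidate time point, compare the two adjacent one-period waveforms on either side of it and pick the point where their mean squared difference is smallest. The indexing convention (left period ends at P, right period starts at P + 1), the function names, and the optional `start`/`stop` arguments for narrowing the search range are assumptions made for illustration.

```python
def splice_cost(x, p, period):
    """Mean squared difference between the one-period waveforms on either side of p.

    Indexing is an assumption: the left period is x[p-period+1 .. p] and the
    right period is x[p+1 .. p+period].
    """
    left = x[p - period + 1 : p + 1]
    right = x[p + 1 : p + period + 1]
    return sum((a - b) ** 2 for a, b in zip(left, right)) / period

def find_splice_point(x, period, start=None, stop=None):
    """Return the candidate p in [start, stop) with the smallest cost (first one on ties)."""
    lo = period - 1        # earliest p with a full period to its left
    hi = len(x) - period   # one past the latest p with a full period to its right
    start = lo if start is None else max(start, lo)
    stop = hi if stop is None else min(stop, hi)
    if start >= stop:
        raise ValueError("frame too short for the requested search range")
    return min(range(start, stop), key=lambda p: splice_cost(x, p, period))

def delete_period(x, p, period):
    """Remove x[p+1 .. p+period], keeping everything up to and including x[p]."""
    return x[: p + 1] + x[p + period + 1 :]

# A 5-sample pattern repeated six times, with one sample disturbed near the start.
frame = [0, 1, 2, 1, 0] * 6
frame[3] += 5
p = find_splice_point(frame, period=5)
shortened = delete_period(frame, p, period=5)
print(p, len(frame), len(shortened))  # 8 30 25
```

Passing `start` and `stop` that span a single period restricts the search in the way the paragraph above suggests; the default searches the whole frame.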

【００６３】When the speech waveform output unit 9 outputs the frame section waveforms in the order they arrive, the input speech is processed one frame section after another, so output can keep pace with the input in real time. This makes the method applicable, for example, to audio output when the playback speed of a VTR (video tape recorder) is changed. In applications that do not require real-time operation, such as converting the speech rate of each item of speech data in a speech database, the speech waveform output unit 9 may instead accumulate the frame section waveforms until all of the input speech has been processed and then output them, applying smoothing to the joins where necessary.

【００６４】As described above, the speech rate conversion device according to the present invention takes two parameters, the speech rate conversion rate α and the lower limit value β of the autocorrelation coefficient, and tries to bring the speech rate as close to α times as possible at the level of sound quality specified by β. Depending on how these are given, flexible processing suited to the application becomes possible. For example, if β is set to a value close to 1, such as 0.8, processing is carried out only to the extent that the speech rate can be converted while preserving sound quality, which suits applications that demand high sound quality, such as equalizing the speech rate across a recorded speech database.
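The effect of β can be illustrated with a small sketch. The patent's own definition of the autocorrelation coefficient is given with FIG. 4 and is not reproduced in this section, so the code below substitutes the standard normalised autocorrelation at a lag of one period as a stand-in. The function names and both test signals are assumptions made for illustration. A strict β of 0.8 accepts only a cleanly periodic frame, while a loose β such as 0.1 (the value the description mentions for cases where sound quality matters less) also accepts a frame that is only weakly periodic.

```python
import math

def autocorr_coeff(frame, lag):
    """Normalised autocorrelation of a frame at the given lag (0.0 if undefined)."""
    head = frame[: len(frame) - lag]
    tail = frame[lag:]
    num = sum(a * b for a, b in zip(head, tail))
    den = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
    return num / den if den > 0 else 0.0

def is_stationary(frame, lag, beta):
    """Treat the frame as stationary when the coefficient exceeds the lower limit beta."""
    return autocorr_coeff(frame, lag) > beta

T, M = 55, 240  # period and frame length used in the worked examples above

# Cleanly periodic frame: a sine with period T.
clean = [math.sin(2 * math.pi * n / T) for n in range(M)]

# Weakly periodic frame: the same sine plus a component with period 2T,
# which changes sign after a shift of T samples and so lowers the coefficient.
rough = [math.sin(2 * math.pi * n / T) + 0.7 * math.sin(math.pi * n / T)
         for n in range(M)]

for name, frame in (("clean", clean), ("rough", rough)):
    print(name,
          round(autocorr_coeff(frame, T), 2),
          is_stationary(frame, T, beta=0.8),
          is_stationary(frame, T, beta=0.1))
```

With these signals the clean frame passes both thresholds, while the rough frame passes only the looser one, so it would be left untouched under β = 0.8 and edited under β = 0.1.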

【００６５】Alternatively, α need not be given at all. If only the direction (faster or slower) is specified, and the next frame start point n_b(i+1) determined in steps S16, S17, S18, and S20 is always set to P + 1, the device makes the speech as fast or as slow as possible at the level of sound quality specified by β.

【００６６】In applications where sound quality matters little, for instance where it is enough to understand what was recorded, giving β an extremely small value such as 0.1 causes insertion/deletion to be performed on any frame section waveform that is even slightly periodic, so the output comes close to the given α.

【００６７】The embodiment of the present invention is not limited to the configuration shown in FIG. 1 and FIG. 2. For example, one or both of the speech rate conversion rate α and the lower limit value β of the autocorrelation coefficient may be set in advance, or may be chosen from a set of preset values. The speech rate conversion device of the present invention can be realized by hardware alone, using combinational logic circuits and the like, and can also be realized by a computer and software executed on it. This software can be distributed on a computer-readable recording medium or over a communication line.

【００６８】[0068]

[Effects of the Invention] As is apparent from the above description, the speech rate conversion method according to the present invention makes it possible to convert the speech rate while preserving the sound quality of the original speech. Although the method is simple, it introduces little noise. In addition, the way the lower limit value β of the autocorrelation coefficient is given specifies how much weight is placed on sound quality, so the speech rate conversion can be adapted flexibly to the application.

[Brief description of drawings]

FIG. 1 is a block diagram showing an example of an embodiment of the speech rate conversion device according to the present invention.

FIG. 2 is a flowchart illustrating the operation of the speech rate conversion device according to the present invention.

FIG. 3 is a diagram showing an example of a frame section waveform.

FIG. 4 is a diagram for explaining an example (a) of a method of calculating the autocorrelation coefficient (b) for the frame section waveform shown in FIG. 3.

FIG. 5 is a diagram showing another example of a frame section waveform.

FIG. 6 is a diagram showing the autocorrelation coefficient of the frame section waveform shown in FIG. 5.

FIG. 7 is a diagram illustrating a method of determining the point P at which a waveform is inserted or deleted in the frame section waveform shown in FIG. 3.

FIG. 8 is a diagram showing an example of the range of the waveform to be output and the next frame start point (when α = 1.2).

FIG. 9 is a diagram showing an example of the range of the waveform to be output and the next frame start point (when α = 1.35).

FIG. 10 is a diagram showing an example of the range of the waveform to be output and the next frame start point (when α = 1.5).

[Explanation of symbols]

1: speech rate conversion device
2: speech waveform input unit
3: threshold input unit
4: speech rate conversion rate input unit
5: frame cutout unit
6: autocorrelation coefficient calculation unit
7: processing judgment unit
8: waveform insertion/deletion unit
9: speech waveform output unit

(56) References: JP-A-8-292789 (JP, A)
(58) Fields searched (Int. Cl.⁷, DB name): G10L 21/04

Claims

(57) [Claims]

1. A speech rate conversion method that converts the speech rate of recorded speech by lengthening or shortening only the stationary sections of a speech waveform through appropriate insertion and deletion of a waveform of one fundamental period of the speech waveform (hereinafter referred to as a pitch section waveform), the method comprising: setting a waveform similarity lower limit value as a parameter; cutting the input speech waveform sequentially from its beginning into short sections (hereinafter, each cut-out waveform is referred to as a frame section waveform); calculating the waveform similarity of the frame section waveform; regarding the frame section waveform as stationary when the waveform similarity is greater than the waveform similarity lower limit value given as the parameter; and inserting or deleting a pitch section waveform in the frame section waveform according to whether the input speech rate conversion rate is smaller or larger than 1, wherein, in determining the point at which the pitch section waveform is inserted or deleted, for each time point in the frame section, the mean square, the sum of squares, the mean absolute value, or the sum of absolute values of the difference between the two adjacent one-period pitch section waveforms on either side of that time point is calculated, and the point at which this value is smallest is taken as the point at which the pitch section waveform is inserted or deleted.

2. A speech rate conversion device that converts the speech rate of recorded speech by lengthening or shortening only the stationary sections of a speech waveform through appropriate insertion and deletion of a waveform of one fundamental period of the speech waveform (hereinafter referred to as a pitch section waveform), the device comprising: means for setting a waveform similarity lower limit value as a parameter; and means for cutting the input speech waveform sequentially from its beginning into short sections (hereinafter, each cut-out waveform is referred to as a frame section waveform), calculating the waveform similarity of the frame section waveform, regarding the frame section waveform as stationary when the waveform similarity is greater than the waveform similarity lower limit value given as the parameter, and inserting or deleting a pitch section waveform according to whether the input speech rate conversion rate is smaller or larger than 1, wherein, in determining the point at which the pitch section waveform is inserted or deleted, for each time point in the frame section, the mean square, the sum of squares, the mean absolute value, or the sum of absolute values of the difference between the two adjacent one-period pitch section waveforms on either side of that time point is calculated, and the point at which this value is smallest is taken as the point at which the pitch section waveform is inserted or deleted.

3. A computer-readable recording medium storing a speech rate conversion program for executing the speech rate conversion method according to claim 1 using a computer.