JP3156020B2

JP3156020B2 - Audio speed conversion method

Info

Publication number: JP3156020B2
Application number: JP14922493A
Authority: JP
Inventors: 正之三崎; 良二鈴木
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1993-06-21
Filing date: 1993-06-21
Publication date: 2001-04-16
Anticipated expiration: 2016-04-16
Also published as: JPH0713596A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of industrial application] The present invention relates to an audio speed conversion method that changes only the duration of speech without changing its fundamental frequency.

[0002]

[Prior art] Audio speed converters have conventionally been used for fast or slow listening to audio signals recorded on a tape recorder or similar device.

[0003] A conventional audio speed converter of this kind is described below with reference to the drawings.

[0004] FIG. 8 shows the configuration of a conventional audio speed converter. In FIG. 8, reference numeral 81 denotes an AD converter, 82 a buffer, 83 a speed control circuit, 84 a data read circuit, 85 a muting circuit, and 86 a DA converter.

[0005] The operation of the audio speed converter configured as above is described below.

[0006] First, the input signal is converted into a digital signal by the AD converter 81 and written into the buffer 82. Next, the speed control circuit 83 controls the data read circuit 84 according to the companding ratio so that data is read from the buffer 82. The playback speed can be varied by changing how this data is read. To shorten the playback time, the data to be read is thinned out in block units. To lengthen the playback time, the data to be read is repeated in block units. The discontinuities between blocks are then muted by the muting circuit 85, and the result is converted into an analog signal by the DA converter 86 and output.
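As a rough illustration of the conventional block-based read-out described in [0006], the following Python sketch thins out or repeats fixed-size blocks. The function name `block_resample`, the block size, and the rule for choosing which blocks to keep are assumptions made for illustration; the muting circuit 85 is not modelled, so the block joins are left discontinuous.

```python
def block_resample(samples, block_size, alpha):
    """Naive block-based time scaling (illustrative only).

    alpha is output duration / input duration: alpha < 1 drops blocks,
    alpha > 1 repeats blocks. Nothing is done to smooth the block joins.
    """
    if block_size <= 0 or alpha <= 0:
        raise ValueError("block_size and alpha must be positive")
    blocks = [samples[i:i + block_size]
              for i in range(0, len(samples), block_size)]
    out = []
    n_out_blocks = int(round(len(blocks) * alpha))
    for j in range(n_out_blocks):
        # Map each output block back to the input block it is read from.
        src = min(int(j / alpha), len(blocks) - 1)
        out.extend(blocks[src])
    return out
```

With alpha = 0.5 every second block is skipped, and with alpha = 2.0 every block is read twice, which corresponds to cases (b) and (c) of FIG. 9.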

[0007] FIG. 9 schematically shows the cases where the companding ratio α (time-axis compression/expansion ratio = ratio of the time length of the output signal to that of the input signal) is 0.5 and 2.0. (a) shows the original sound, (b) the case of a time-axis conversion ratio of 0.5, and (c) the case of a time-axis conversion ratio of 2.0.

[0008]

[Problems to be solved by the invention] However, in the conventional configuration described above, when the time axis is compressed to increase the speed, data is thinned out, so consonants and the like are lost and intelligibility falls. In addition, the block junctions are discontinuous. Although the junctions are muted to reduce this, the amplitude and phase remain discontinuous, and only speech lacking in naturalness can be obtained.

[0009] Other conventional audio speed converters use the pitch period of the input signal, as in TDHS (Time Domain Harmonic Scaling). However, this approach cannot be applied when music or noise is superimposed on the input signal, because the pitch is then difficult to extract. Moreover, the window length used for weighted addition of the waveforms varies with the speed ratio and the pitch period, so when signals containing components lower than the detected pitch frequency are weighted and added, those low-frequency components tend to be joined discontinuously and smoothness is lost.

[0010] In view of the above problems, an object of the present invention is to output natural-sounding speech with little discontinuity in both the amplitude and the phase of the waveform and with little loss of data, and to smoothly reproduce signals containing low-frequency components, such as music signals.

[0011]

[Means for solving the problems] In the invention according to claim 1, a signal of a predetermined time length Ts in an audio signal is denoted A, and the signal of time length Ts that follows signal A is denoted B. For a signal A' of time length Ts having a time delay k (0 ≦ k) with respect to signal A, and a signal B' of time length Ts having a time delay -k (0 < k) with respect to signal B, the correlation function between signal A and signal B' and the correlation function between signal A' and signal B are calculated over a predetermined range of k, and the time delay rk at which the correlation function is maximized is obtained. Depending on the value of rk, the following is output: when rk = 0, signal A and signal B are weighted and added in a gradually-decreasing/gradually-increasing relationship over the width of time length Ts; when rk > 0, signal A is output for the time width rk, after which signal A' and signal B are weighted and added in the same relationship over the width of time length Ts; when rk < 0, signal A and signal B' are weighted and added in the same relationship over the width of time length Ts. Following this processing for the value of rk, the signal that follows the added signal is output until the time length given by the expression {α(Ts - rk) / (1 - α)} is reached, where α is the time-axis compression/expansion ratio (the ratio of the time length of the output signal to that of the input signal). This series of processes is repeated after resetting the head of the next signal A to the point delayed by the time length given by the expression {(Ts - rk) / (1 - α)}, whereby the playback time of the audio is changed to 1.0 times or less that of the original sound.

[0012] In the invention according to claim 2, a signal of a predetermined time length Ts in an audio signal is denoted A, and the signal of time length Ts that follows signal A is denoted B. For a signal A' of time length Ts having a time delay k (0 ≦ k) with respect to signal A, and a signal B' of time length Ts having a time delay -k (0 < k) with respect to signal B, the correlation function between signal A and signal B' and the correlation function between signal A' and signal B are calculated over a predetermined range of k, and the time delay rk at which the correlation function is maximized is obtained. Depending on the value of rk, the following is output: when rk = 0, signal B and signal A are weighted and added in a gradually-decreasing/gradually-increasing relationship over the width of time length Ts; when rk < 0, signal B is output for the time width (-rk), after which signal B' and signal A are weighted and added in the same relationship over the width of time length Ts; when rk > 0, signal B and signal A' are weighted and added in the same relationship over the width of time length Ts. Following this processing for the value of rk, the signal that follows the added signal is output until the time length given by the expression {α(Ts - rk) / (α - 1)} is reached, where α is the time-axis compression/expansion ratio (the ratio of the time length of the output signal to that of the input signal). This series of processes is repeated after resetting the head of the next signal A to the point delayed by the time length given by the expression {(Ts - rk) / (α - 1)}, whereby the playback time of the audio is changed to 1.0 times or more that of the original sound.

[0013]

[Operation] With this configuration, weighted addition of signal A' and signal B, or of signal A and signal B', over the width of time length Ts reduces signal loss and amplitude discontinuity in the added signal. Because the weighted addition is performed over a constant time length Ts, signals containing low-frequency components can also be joined smoothly. Furthermore, performing the addition at the position of the time delay rk at which the correlation function between signal A' and signal B, or between signal A and signal B', is maximized reduces phase mismatch before and after the section where the waveforms are joined.

[0014]

[Embodiments] A first embodiment of the present invention is described below with reference to the drawings.

[0015] The present invention relates to an audio speed conversion method that operates with the companding ratio α in the range {(Ts + kmax) / (2·Ts)} ≦ α ≦ 1.0.
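The lower bound on α in [0015] can be checked numerically. The sketch below reads the expression as (Ts + kmax) / (2·Ts), which is an interpretation of the grouping in the printed formula. The function names are hypothetical, and Ts and kmax are assumed to be sample counts with kmax smaller than Ts.

```python
def alpha_lower_bound(ts, kmax):
    """Smallest companding ratio, reading the bound as (Ts + kmax) / (2 * Ts)."""
    if ts <= 0 or not 0 <= kmax < ts:
        raise ValueError("require ts > 0 and 0 <= kmax < ts")
    return (ts + kmax) / (2 * ts)


def alpha_is_valid(alpha, ts, kmax):
    """True if alpha lies in the compression range stated for the first embodiment."""
    return alpha_lower_bound(ts, kmax) <= alpha <= 1.0
```

For example, with Ts = 100 and kmax = 20 the bound evaluates to 0.6, so under this reading α = 0.75 is accepted and α = 0.5 is rejected.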

[0016] FIG. 1 is a flowchart of the audio speed conversion method in the first embodiment of the present invention, and its operation is described below.

[0017] In this example, the audio signal is assumed to have been sampled into discrete-time data x(n). In the processing below, data is addressed using input data pointers P1 and P2 and an output data pointer P3. First, in step 101, the address ip1 pointed to by input pointer P1 is set to the head address of the audio data to be played back, and the address ip2 pointed to by P2 is set to point to the data Ts samples after P1. The address op pointed to by the output pointer is set to an initial value. In step 102, the companding ratio α is set. This companding ratio α must satisfy the range given by the expression above.

[0018] Next, to find the position of high correlation, signal A (Ts samples starting at pointer P1) and signal B (Ts samples starting at pointer P2) are shifted relative to each other, taking one as the reference and shifting the other in the direction of time delay. The correlation function is computed in step 103, and in step 104 the number of samples rk (the time delay) at which the correlation function is maximized is obtained. As shown in FIG. 2, the range of audio data used in computing the correlation function COR differs according to the sign of the time delay k. A maximum value kmax and a minimum value kmin are set in advance for the time delay k, so that the range searched for the correlation delay is limited. The time delay rk that maximizes the correlation function is thus obtained, and in step 105 the number of samples Tt for which the audio data is output unchanged is calculated as shown in FIG. 3. The formula for the number of samples Tt in this straight-out section also differs according to the sign of the time delay rk.
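The delay search of steps 103 and 104 might be sketched as follows. FIG. 2 is not reproduced in the text, so the data ranges used here are assumptions: for k ≧ 0 the sketch correlates A' (A delayed by k samples) with B, and for k < 0 it correlates A with B' (B delayed by |k| samples), following the statement in [0018] that one signal is shifted in the direction of time delay. The unnormalized sum of products and the function name `find_best_delay` are also assumptions.

```python
def find_best_delay(x, p1, ts, kmin, kmax):
    """Return the delay rk in [kmin, kmax] that maximises the correlation.

    x is a sequence of samples, p1 the index where signal A starts,
    ts the segment length. Returns None if no lag fits inside x.
    """
    p2 = p1 + ts  # start of signal B
    best_k, best_cor = None, None
    for k in range(kmin, kmax + 1):
        if k >= 0:
            a_start, b_start = p1 + k, p2   # A' (A delayed by k) against B
        else:
            a_start, b_start = p1, p2 - k   # A against B' (B delayed by |k|)
        if b_start + ts > len(x):
            continue  # not enough input left for this lag
        cor = sum(x[a_start + i] * x[b_start + i] for i in range(ts))
        if best_cor is None or cor > best_cor:
            best_k, best_cor = k, cor
    return best_k
```

On ties the smallest k wins in this sketch; the patent text does not say how ties are resolved.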

[0019] When the value of the time delay rk is positive, steps 107, 108 and 109 are performed to obtain the output waveform; otherwise, steps 110 and 111 are performed to obtain the output waveform. Here, Wdec(i) in steps 108 and 110 is a window function that has the value 1 when i is 0, decreases linearly and monotonically as i increases, and becomes 0 when i is (Ts - 1). Winc(i) in steps 108 and 110 is a window function that is 0 when i is 0, increases linearly and monotonically as i increases, and becomes 1 when i is (Ts - 1).
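The two window functions described in [0019] and the weighted addition they are used for can be written as a short Python sketch. The function names `wdec`, `winc` and `crossfade` are illustrative; the window shapes follow the description (linear, 1 to 0 and 0 to 1 over Ts samples).

```python
def wdec(i, ts):
    """Linearly decreasing window: 1 at i = 0, 0 at i = ts - 1."""
    return 1.0 - i / (ts - 1)


def winc(i, ts):
    """Linearly increasing window: 0 at i = 0, 1 at i = ts - 1."""
    return i / (ts - 1)


def crossfade(fade_out, fade_in):
    """Weighted addition of two equal-length segments of Ts samples."""
    ts = len(fade_out)
    if ts != len(fade_in) or ts < 2:
        raise ValueError("segments must have the same length of at least 2")
    return [wdec(i, ts) * fade_out[i] + winc(i, ts) * fade_in[i]
            for i in range(ts)]
```

Because the two weights sum to 1 at every index, two identical segments pass through the cross-fade essentially unchanged, which is why the amplitude stays continuous across the junction.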

[0020] FIG. 4 shows how the output waveform is obtained for the cases where the time delay rk is 0, positive, and negative. When the time delay rk is positive, the number of samples Tt is smaller than when rk is 0. Conversely, when rk is negative, Tt is larger. This is because the length Tt is adjusted according to the shift rk so that there is no deviation from the target companding ratio α. To continue processing, the addresses pointed to by the input data pointers and the output data pointer are updated as shown in step 113, and the processing from step 102 onward is repeated.
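The dependence of Tt on rk can be made concrete with a small calculation. FIG. 3 is not reproduced in the text, so the formula below is derived from claim 1 rather than copied from the figure. It assumes that the length α(Ts - rk) / (1 - α) counts all output of one cycle (signal A for rk samples when rk > 0, the cross-fade of Ts samples, and the straight-out section), and that the input pointer advances by (Ts - rk) / (1 - α) per cycle. The function name and the rounding to whole samples are assumptions.

```python
def compression_cycle_lengths(ts, rk, alpha):
    """Return (tt, advance) in samples for one compression cycle.

    tt      : length of the straight-out section after the cross-fade
    advance : distance from the current head of signal A to the next one
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    advance = (ts - rk) / (1.0 - alpha)   # input consumed per cycle
    cycle_out = alpha * advance           # output produced per cycle
    already_out = ts + max(rk, 0)         # cross-fade, plus signal A when rk > 0
    tt = cycle_out - already_out
    return int(round(tt)), int(round(advance))
```

With Ts = 100 and α = 0.75 this gives Tt = 200 for rk = 0, Tt = 120 for rk = 20 and Tt = 260 for rk = -20, matching the statement that Tt is shorter for positive rk and longer for negative rk.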

[0021] As described above, this embodiment provides a method of listening with compressed playback time (a method of increasing the speed without changing the pitch) that has the following features. A correlation function referenced to pointers P1 and P2 is computed, and weighted addition is performed at the position where the correlation is high. This prevents significant phase mismatch in the sections before and after the waveform junction. The signals from the two separated portions are added after one is multiplied by a monotonically decreasing window function and the other by a monotonically increasing window function, so amplitude continuity is well maintained in the section where the waveforms are joined.

[0022] As a result, a clear reproduced sound can be obtained that is smoother and more natural than before and has little information loss or echo. In addition, the number of samples in the straight-out section that follows the weighted addition is calculated after the time-delay sample count rk has been determined, so variation in rk does not cause the companding ratio α to deviate. Furthermore, the waveforms are cross-faded over a constant length Ts that is independent of the input signal and the time delay rk, so the cross-fade length is never shortened by the value of rk, and the low-frequency components contained in the joined signals are reproduced smoothly.
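Putting the steps of the first embodiment together, one possible end-to-end sketch of the compression loop is shown below. It is an illustration under stated assumptions, not the flowchart of FIG. 1: the correlation is an unnormalized sum of products, a negative k is taken to mean that B' starts |k| samples after B, the lower bound on α is read as (Ts + kmax) / (2·Ts), the pointer advance per cycle is rounded to whole samples, and any input left after the last possible splice is copied unchanged. The function name `compress` is hypothetical.

```python
def compress(x, ts, alpha, kmin, kmax):
    """Time-compress the sample list x; alpha = output length / input length."""
    if ts < 2 or not 0 <= kmax < ts:
        raise ValueError("require ts >= 2 and 0 <= kmax < ts")
    if not (ts + kmax) / (2 * ts) <= alpha < 1.0:
        raise ValueError("alpha outside the supported compression range")
    out, p1 = [], 0
    while True:
        p2 = p1 + ts
        # Steps 103-104: search the delay rk with maximum correlation.
        rk, best = None, None
        for k in range(kmin, kmax + 1):
            a0, b0 = (p1 + k, p2) if k >= 0 else (p1, p2 - k)
            if b0 + ts > len(x):
                continue
            cor = sum(x[a0 + i] * x[b0 + i] for i in range(ts))
            if best is None or cor > best:
                rk, best = k, cor
        if rk is None:
            break  # too little input left for another splice
        a0, b0 = (p1 + rk, p2) if rk >= 0 else (p1, p2 - rk)
        if rk > 0:
            out.extend(x[p1:p1 + rk])  # signal A output as is for rk samples
        for i in range(ts):            # cross-fade over a fixed length Ts
            w = i / (ts - 1)
            out.append((1.0 - w) * x[a0 + i] + w * x[b0 + i])
        next_p1 = p1 + int(round((ts - rk) / (1.0 - alpha)))
        out.extend(x[b0 + ts:next_p1])  # straight-out section (Tt samples)
        p1 = next_p1
    out.extend(x[p1:])  # copy whatever input remains
    return out
```

For a 1000-sample constant signal with Ts = 50, α = 0.75 and k searched over -10 to 10, this sketch produces 760 samples (four full cycles of 180 output samples plus a 40-sample remainder), and the cross-fades leave the constant level unchanged.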

[0023] A second embodiment of the present invention is described below with reference to the drawings. The present invention provides an audio speed conversion method that operates with the companding ratio α in the range 1.0 ≦ α ≦ (Ts / kmax).

[0024] FIG. 5 is a flowchart of the audio speed conversion method in the second embodiment of the present invention, and its operation is described below.

[0025] In this example, as in the first embodiment, the audio signal has been sampled into discrete-time data x(n), and data is addressed using input data pointers P1 and P2 and an output data pointer P3. First, in step 501, the address ip1 pointed to by input pointer P1 is set to the head address of the audio data to be played back, and the address ip2 pointed to by P2 is set to point to the data Ts samples after P1. The address op pointed to by the output pointer is set to an initial value. In step 502, the companding ratio α is set. This companding ratio α must satisfy the range given by the expression above for the second embodiment. Next, to find the position of high correlation, signal A (Ts samples starting at pointer P1) and signal B (Ts samples starting at pointer P2) are shifted relative to each other, taking one as the reference and shifting the other in the direction of time delay. The correlation function is computed in step 503, and in step 504 the number of samples rk corresponding to the time delay at which the correlation function is maximized is obtained. The correlation function COR is computed as shown in FIG. 2, in the same way as in the first embodiment.

[0026] A maximum value kmax and a minimum value kmin are set in advance for the time delay k, so that the range searched for the correlation delay is limited. The time delay rk that maximizes the correlation function is thus obtained, and in step 505 the number of samples Tt for which the audio data is output unchanged is calculated as shown in FIG. 6. The formula for the number of samples Tt in this straight-out section also differs according to the sign of the time delay rk. When the value of the time delay rk is negative, steps 507, 508 and 509 are performed to obtain the output waveform; otherwise, steps 510 and 511 are performed to obtain the output waveform. Here, as in the first embodiment, Wdec(i) in steps 508 and 510 is a window function that has the value 1 when i is 0, decreases linearly and monotonically as i increases, and becomes 0 when i is (Ts - 1). Likewise, Winc(i) in steps 508 and 510 is a window function that is 0 when i is 0, increases linearly and monotonically as i increases, and becomes 1 when i is (Ts - 1).

[0027] FIG. 7 shows how the output waveform is obtained for the cases where the time delay rk is 0, negative, and positive. When the time delay rk is positive, the number of samples Tt is smaller than when rk is 0. Conversely, when rk is negative, Tt is larger. This is because the length Tt is adjusted according to the shift rk so that there is no deviation from the target companding ratio α. To continue processing, the addresses pointed to by the input data pointers and the output data pointer are updated as shown in step 513, and the processing from step 502 onward is repeated.
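For the expansion case, the corresponding lengths can be sketched in the same way. FIG. 6 is not reproduced in the text, so the formula is derived from claim 2 under two assumptions: the length α(Ts - rk) / (α - 1) counts all output of one cycle (signal B for -rk samples when rk < 0, the cross-fade of Ts samples, and the straight-out section), and the input pointer advances by (Ts - rk) / (α - 1) per cycle. The function name and the rounding to whole samples are hypothetical.

```python
def expansion_cycle_lengths(ts, rk, alpha):
    """Return (tt, advance) in samples for one expansion cycle.

    tt      : length of the straight-out section after the cross-fade
    advance : distance from the current head of signal A to the next one
    """
    if alpha <= 1.0:
        raise ValueError("alpha must be greater than 1")
    advance = (ts - rk) / (alpha - 1.0)   # input consumed per cycle
    cycle_out = alpha * advance           # output produced per cycle
    already_out = ts + max(-rk, 0)        # cross-fade, plus signal B when rk < 0
    tt = cycle_out - already_out
    return int(round(tt)), int(round(advance))
```

With Ts = 100 and α = 2.0 this gives Tt = 100 for rk = 0, Tt = 60 for rk = 20 and Tt = 120 for rk = -20, consistent with the trend described in [0027]. Under the same assumptions, Tt stays non-negative at rk = kmax exactly when α ≦ Ts / kmax, which agrees with the range stated in [0023].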

[0028] As described above, this embodiment provides a method of listening with extended playback time (a method of reducing the speed without changing the pitch) that has the following features. A correlation function referenced to pointers P1 and P2 is computed, and weighted addition is performed at the position where the correlation is high. This prevents significant phase mismatch in the sections before and after the waveform junction. The signals from the two separated portions are added after one is multiplied by a monotonically decreasing window function and the other by a monotonically increasing window function, so amplitude continuity is well maintained in the section where the waveforms are joined. As a result, a clear reproduced sound can be obtained that is smoother and more natural than before and has little information loss or echo.

[0029] In addition, the number of samples in the straight-out section that follows the weighted addition is calculated after the time-delay sample count rk has been determined, so variation in rk does not cause the companding ratio α to deviate. Furthermore, the waveforms are cross-faded over a constant length Ts that is independent of the input signal and the time delay rk, so the cross-fade length is never shortened by the value of rk, and the low-frequency components contained in the joined signals are reproduced smoothly.

[0030]

[Effects of the invention] The present invention obtains the time delay rk that maximizes the correlation function computed with one of signal A and signal B as the reference, and changes the position at which the waveforms are weighted and added according to that time delay. This prevents significant phase mismatch before and after the section where the signals are joined. In addition, in the section where the waveforms are joined, the signals are multiplied by a window function that gradually decreases in time and a window function that gradually increases in time before being added, so the amplitude discontinuity in that section is eliminated. Furthermore, after the time delay rk has been determined, the signal following the weighted-added signal is output unchanged until the output time length reaches the value given by the expression {α(Ts - rk) / (1 - α)} or the expression {α(Ts - rk) / (α - 1)}, so variation in the time-delay sample count does not cause a deviation from the companding ratio α. Finally, because the weighted addition is performed over a constant time length Ts, the low-frequency components contained in the joined signals are reproduced smoothly.

[Brief description of the drawings]

FIG. 1 is a flowchart of the audio speed conversion method in the first embodiment of the present invention.

FIG. 2 is a flowchart of the correlation function computation in the audio speed conversion method of the first embodiment of the present invention.

FIG. 3 is a flowchart for calculating the length of the straight-out section in the first embodiment of the present invention.

FIG. 4 is a schematic diagram of the output signal obtained by weighted addition of the input signal according to the value of the time delay rk in the audio speed conversion method of the first embodiment of the present invention.

FIG. 5 is a flowchart of the audio speed conversion method in the second embodiment of the present invention.

FIG. 6 is a flowchart for calculating the length of the straight-out section in the second embodiment of the present invention.

FIG. 7 is a schematic diagram of the output signal obtained by weighted addition of the input signal according to the value of the time delay rk in the audio speed conversion method of the second embodiment of the present invention.

FIG. 8 is a configuration diagram of a conventional audio speed converter.

FIG. 9 is a schematic diagram of the input and output signals of a conventional audio speed converter.

[Explanation of symbols]

A, B: signals
Ts: predetermined time length
rk: time delay
α: companding ratio

Continuation of front page

(56) References: JP-A-3-219462 (JP, A); JP-A-4-104200 (JP, A); JP-A-4-188199 (JP, A); JP-A-6-222794 (JP, A)

(58) Fields searched (Int. Cl.7, DB name): G10L 11/00 - 11/06; G10L 21/00 - 21/04

Claims

(57) [Claims]

1. An audio speed conversion method in which, in an audio signal, a signal of a predetermined time length Ts is denoted A and a signal of time length Ts following the signal A is denoted B; for a signal A' of time length Ts having a time delay k (0 ≦ k) with respect to the signal A and a signal B' of time length Ts having a time delay -k (0 < k) with respect to the signal B, a correlation function between the signal A and the signal B' and a correlation function between the signal A' and the signal B are calculated over a predetermined range of k to obtain the time delay rk at which the correlation function is maximized; in accordance with the value of rk, when rk = 0, the signal A and the signal B are weighted and added in a gradually-decreasing/gradually-increasing relationship over the width of the time length Ts and output, when rk > 0, the signal A is output for a time width rk and then the signal A' and the signal B are weighted and added in a gradually-decreasing/gradually-increasing relationship over the width of the time length Ts and output, and when rk < 0, the signal A and the signal B' are weighted and added in a gradually-decreasing/gradually-increasing relationship over the width of the time length Ts and output; following the processing for the value of rk, a signal following the added signal is output until the time length given by the expression {α(Ts - rk) / (1 - α)}, which corresponds to the time-axis compression/expansion ratio α (the ratio of the time length of the output signal to that of the input signal) and the time delay rk, is reached; and this series of processes is repeated with the head of the next signal A reset to the point delayed by the time length given by the expression {(Ts - rk) / (1 - α)}, whereby the playback time of the audio is changed to 1.0 times or less that of the original sound.

2. An audio speed conversion method in which, in an audio signal, a signal of a predetermined time length Ts is denoted A and a signal of time length Ts following the signal A is denoted B; for a signal A' of time length Ts having a time delay k (0 ≦ k) with respect to the signal A and a signal B' of time length Ts having a time delay -k (0 < k) with respect to the signal B, a correlation function between the signal A and the signal B' and a correlation function between the signal A' and the signal B are calculated over a predetermined range of k to obtain the time delay rk at which the correlation function is maximized; in accordance with the value of rk, when rk = 0, the signal B and the signal A are weighted and added in a gradually-decreasing/gradually-increasing relationship over the width of the time length Ts and output, when rk < 0, the signal B is output for a time width (-rk) and then the signal B' and the signal A are weighted and added in a gradually-decreasing/gradually-increasing relationship over the width of the time length Ts and output, and when rk > 0, the signal B and the signal A' are weighted and added in a gradually-decreasing/gradually-increasing relationship over the width of the time length Ts and output; following the processing for the value of rk, a signal following the added signal is output until the time length given by the expression {α(Ts - rk) / (α - 1)}, which corresponds to the time-axis compression/expansion ratio α (the ratio of the time length of the output signal to that of the input signal) and the time delay rk, is reached; and this series of processes is repeated with the head of the next signal A reset to the point delayed by the time length given by the expression {(Ts - rk) / (α - 1)}, whereby the playback time of the audio is changed to 1.0 times or more that of the original sound.