JPS60140298A

JPS60140298A - Speech controller

Info

Publication number: JPS60140298A
Application number: JP58250781A
Authority: JP
Inventors: 俊郎寺内; 田村　震一
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1983-12-27
Filing date: 1983-12-27
Publication date: 1985-07-25

Abstract

(57) [Abstract] This publication contains application data filed before electronic filing was introduced, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of Industrial Application

The present invention relates to a speech control device that allows, for example, the pitch and speed of an audio signal to be changed arbitrarily.

Background Art and Its Problems

There are speech control devices that allow the speed and pitch of an audio signal to be changed arbitrarily. Conventionally, such devices change the speed and pitch of a signal by operating on the time axis, as follows. Given a signal such as that shown in FIG. 1A, its speed is increased by extracting segments of the signal intermittently, at a rate that depends on the desired speed-up, and splicing the extracted segments together for output as shown in FIG. 1B. The spliced signal can further be expanded along the time axis, as shown in FIG. 1C, to lower the pitch.

In such a device, however, discontinuities arise at the splice points. Countermeasures such as crossfading have been tried, but they cannot eliminate the discontinuities completely, and the result still sounds unnatural.

In addition, the information in the portions skipped by the intermittent extraction is lost, so the output signal is incomplete.
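The two drawbacks just described can be seen in a minimal sketch of the conventional splice method. The function name `splice_speedup` and the `keep`/`skip` parameters are illustrative assumptions, not terms from the patent, and a ramp is used as the test signal only because jumps in it are easy to read.

```python
def splice_speedup(signal, keep, skip):
    """Keep `keep` samples, drop the next `skip`, and butt the kept runs together."""
    kept, dropped = [], []
    period = keep + skip
    for start in range(0, len(signal), period):
        kept.extend(signal[start:start + keep])
        dropped.extend(signal[start + keep:start + period])
    return kept, dropped


ramp = list(range(12))  # a smooth test signal: 0, 1, 2, ...
kept, dropped = splice_speedup(ramp, keep=2, skip=1)
print(kept)     # [0, 1, 3, 4, 6, 7, 9, 10]
print(dropped)  # [2, 5, 8, 11]  (information that never reaches the output)

# Steps larger than 1 mark the splice discontinuities.
jumps = [b - a for a, b in zip(kept, kept[1:])]
print(jumps)    # [1, 2, 1, 2, 1, 2, 1]
```

The output is shorter (faster), but every splice shows a jump and a third of the samples is discarded, which is what the frequency-domain method described next is meant to avoid.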

To address this, the present inventors previously proposed a speech control device free of such signal discontinuities and dropouts.

In FIG. 2, an audio signal supplied to the input terminal (1) is fed to an AD conversion circuit (2), converted into a digital signal x(m), and stored in a buffer memory (3). This memory (3) has an overall length of, for example, L, and the input is stored while being shifted sequentially. Each time the contents of the memory (3) have been shifted by R (R < L), the contents of the memory (3) are transferred in parallel to a buffer memory (4).

As a result, the memory (4) delivers a segment of the signal of arbitrary time length L at every arbitrary time interval R.

Here R is made sufficiently small relative to L, so that successive segments overlap one another.
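The framing performed by the memories (3) and (4) can be sketched as follows. This is a minimal illustration, assuming the signal is a Python list and that incomplete trailing frames are simply dropped; the name `extract_frames` is hypothetical.

```python
def extract_frames(x, L, R):
    """Return overlapping frames of length L taken every R samples (R < L)."""
    frames = []
    start = 0
    while start + L <= len(x):
        frames.append(x[start:start + L])
        start += R
    return frames


x = list(range(10))
frames = extract_frames(x, L=4, R=2)
for frame in frames:
    print(frame)
# [0, 1, 2, 3]
# [2, 3, 4, 5]
# [4, 5, 6, 7]
# [6, 7, 8, 9]
```

With R = L/2 here, every sample (apart from the edges) appears in two frames, so no part of the input is skipped.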

The signal from the memory (4) is supplied to a multiplier (5), where it is multiplied by a predetermined window coefficient h(m). The windowed signal is supplied to a Fourier transform circuit (6).

This transforms the signal from the time axis to the frequency axis.

The transformed signal is supplied to a multiplier (7), which applies a phase adjustment corresponding to the shift amount R in the memory (3). The phase-adjusted signal is supplied to a processing circuit (8).

In the processing circuit (8), the signal transformed to the frequency axis by the Fourier transform is stored at a separate memory address for each predetermined frequency band. The stored signals are then read out sequentially.

The signal read out of the processing circuit (8) is supplied to a multiplier (9), which applies a phase adjustment corresponding to the output shift amount R' described below. The phase-adjusted signal is supplied to an inverse Fourier transform circuit (10).

This transforms the signal from the frequency axis back to the time axis.

The inverse-transformed signal is supplied to a multiplier (11), where it is multiplied by a window coefficient f(m) corresponding to the window coefficient h(m) above. The windowed signal is stored in a buffer memory (12). The contents of this memory (12) are supplied in parallel to a buffer memory (13).

The memory (13) has an overall length of, for example, L, and its contents are shifted sequentially and output. Zeros are stored in the positions vacated by the shift. Each time the contents of the memory (13) have been shifted by R', the contents of the memory (12) are supplied and added to what the memory (13) already holds.

The signal from the memory (13) is supplied to a DA conversion circuit (14), converted into an analog signal, and delivered to the output terminal (15).
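The accumulation performed by the memories (12) and (13) is what is now usually called overlap-add. The sketch below is a simplified illustration under stated assumptions: frames are plain lists of equal length, `hop` plays the role of the output shift R', and the spectral processing between analysis and synthesis is omitted. The name `overlap_add` is hypothetical.

```python
def overlap_add(frames, hop):
    """Add each frame into the output, advancing `hop` samples per frame."""
    length = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + length)
    for i, frame in enumerate(frames):
        for n, value in enumerate(frame):
            out[i * hop + n] += value
    return out


frames = [[1.0] * 4 for _ in range(3)]
print(overlap_add(frames, hop=2))  # [1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0]
print(overlap_add(frames, hop=1))  # [1.0, 2.0, 3.0, 3.0, 2.0, 1.0]
```

A smaller hop places the same frames closer together, so the output is shorter; this is the mechanism the text goes on to use for changing speed.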

In addition, the signal from the input terminal (1) is supplied to a high-pass filter (21) and a low-pass filter (22). Their outputs are supplied to a comparison circuit (23), which compares the signal energy in the two bands. The comparison output is supplied to a selection circuit (24) for the window coefficients h(m) and f(m), which selects the window coefficients appropriate to each case.

In this device, when a signal such as that shown in FIG. 3A is supplied to the input terminal (1), a segment of time length L is extracted from it at every time interval R. Each extracted segment is Fourier transformed, producing a spectrum in which the time axis has been transformed to the frequency axis, as shown in FIG. 3B.

This signal is stored at the memory addresses of the processing circuit (8) and phase adjusted. The phase-adjusted signal is inverse Fourier transformed, forming signals shifted successively by the time interval R as shown in FIGS. 4A and 5A. These signals are supplied in turn through the memory (12) to the memory (13), where they are added.

To increase the speed of the signal, for example, the shift amount R' used when the signals from the memory (12) are added in the memory (13) is set so that R' < R. When the addition is performed at every shift amount R', the signal becomes as shown in FIG. 4B: its frequency band is unchanged, but its duration is shortened by a factor of $R'/R$.

To decrease the speed, R' is set so that R' > R.

The summed signal then becomes, as shown in FIG. 5B, a signal whose frequency band is unchanged and whose duration is lengthened by a factor of $R'/R$.

If this signal is then read out with a clock whose frequency is $R'/R$ times the original, the speed of each signal returns to the original, and a signal whose frequency band has been lowered or raised by a factor of $R'/R$ is obtained, as shown in FIGS. 4C and 5C.

Although the waveforms in FIGS. 3 to 5 are drawn as analog signals, they are actually processed as digital values.

Furthermore, in the device described above, the window coefficients h(m) and f(m) are related as follows. Windowing the signal x(m) gives

$$x(m) \rightarrow h(SR - m)\,x(m), \quad \text{where } S \text{ is an arbitrary integer},$$

and Fourier transforming this gives

$$X_2(SR, \omega) = \sum_{m=-\infty}^{\infty} h(SR - m)\,x(m)\,e^{-j\omega m}.$$

The result of the subsequent inverse Fourier transform and summation over S (the expression itself is illegible in the source) need only equal x(m), so it suffices that

$$\sum_{S=-\infty}^{\infty} f(m - SR)\,h(SR - m) = 1.$$

When the spectral shape of the input signal is detected and the window coefficients h(m) and f(m) are selected as described above, the sound quality can be improved by choosing, for example:

when the low-frequency component is the smaller,

$$h(m) = 1, \qquad f(m) = 0.5 - 0.5\cos\!\left(\frac{2\pi n}{N-1}\right), \quad n = 0, \ldots, N-1;$$

when the low-frequency component is the larger,

$$h(m) = 0.54 - 0.46\cos\!\left(\frac{2\pi n}{N-1}\right), \quad n = 0, \ldots, N-1, \qquad f(m) = \frac{2\pi}{\sum h}.$$
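The window selection can be sketched as below. This is an illustration only: the function names are hypothetical, the two energies are assumed to have been measured already by the filters (21), (22) and comparison circuit (23), and the tie case (equal energies), which the text does not address, is arbitrarily sent to the second branch. The constant value of f in the second branch follows the formula as printed in the description.

```python
import math


def raised_cosine(N, a, b):
    """Return a - b*cos(2*pi*n/(N-1)) for n = 0..N-1."""
    return [a - b * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def select_windows(low_energy, high_energy, N):
    """Pick analysis window h and synthesis window f from the band energies."""
    if low_energy < high_energy:
        h = [1.0] * N                      # rectangular analysis window
        f = raised_cosine(N, 0.5, 0.5)     # 0.5 - 0.5 cos(...)
    else:                                  # equality handled here by assumption
        h = raised_cosine(N, 0.54, 0.46)   # 0.54 - 0.46 cos(...)
        f = [2 * math.pi / sum(h)] * N     # constant, as printed in the text
    return h, f


h_a, f_a = select_windows(low_energy=1.0, high_energy=2.0, N=5)
h_b, f_b = select_windows(low_energy=2.0, high_energy=1.0, N=5)
print([round(v, 3) for v in f_a])
print([round(v, 3) for v in h_b])
```

The 0.5/0.5 coefficients give what is commonly called a Hann window and the 0.54/0.46 coefficients a Hamming window; those names are standard usage rather than terms used at this point in the description.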

In the device described above, the phase adjustment in the multiplier (9) is carried out as follows.

First, let the spectrum after the Fourier transform at time SR be $X(SR, \omega_k)$, its real part $X_R(SR, \omega_k)$, its imaginary part $X_I(SR, \omega_k)$, the principal value of its phase $P(SR, \omega_k)$, where $-\pi \le P(SR, \omega_k) < \pi$, and the phase made continuous along the time index S $p(SR, \omega_k)$, where $-\infty < p(SR, \omega_k) < \infty$. Phase continuation and phase modification are then carried out as follows.

i) When S ≠ 0

(a) First, $X(SR, \omega_k)$ is obtained by the Fourier transform.

(b) Next, $P(SR, \omega_k)$ is obtained.

Here, depending on the signs of $X_R(SR, \omega_k)$ and $X_I(SR, \omega_k)$:

when the signs are (+, +) or (+, −),

$$P(SR, \omega_k) = \tan^{-1}\!\left(\frac{X_I(SR, \omega_k)}{X_R(SR, \omega_k)}\right);$$

when the signs are (−, +),

$$P(SR, \omega_k) = \tan^{-1}\!\left(\frac{X_I(SR, \omega_k)}{X_R(SR, \omega_k)}\right) + \pi;$$

when the signs are (−, −),

$$P(SR, \omega_k) = \tan^{-1}\!\left(\frac{X_I(SR, \omega_k)}{X_R(SR, \omega_k)}\right) - \pi.$$

(c) It is then determined whether

$$\left| P(SR, \omega_k) - P((S-1)R, \omega_k) \right| < \varepsilon$$

holds, where $\varepsilon$ is a constant.

(d) If it holds,

$$p(SR, \omega_k) = p((S-1)R, \omega_k) + P(SR, \omega_k) - P((S-1)R, \omega_k).$$

(d') If (c) does not hold, then when the sign of the quantity inside the absolute value is (−),

$$p(SR, \omega_k) = p((S-1)R, \omega_k) + P(SR, \omega_k) - P((S-1)R, \omega_k) + 2\pi,$$

and when the sign is (+),

$$p(SR, \omega_k) = p((S-1)R, \omega_k) + P(SR, \omega_k) - P((S-1)R, \omega_k) - 2\pi.$$
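Steps (a) to (d') can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the circuit of the patent: the threshold ε is taken to be π, the S = 0 frame is assumed to start from p = P, and the case of a zero real part (not covered by the three sign cases in the text) is filled in with the conventional values. All function names are hypothetical.

```python
import math

TWO_PI = 2 * math.pi


def principal_phase(xr, xi):
    """Quadrant-corrected arctangent following the three sign cases of step (b)."""
    if xr > 0:                    # signs (+, +) or (+, -)
        return math.atan(xi / xr)
    if xr < 0 and xi >= 0:        # signs (-, +)
        return math.atan(xi / xr) + math.pi
    if xr < 0:                    # signs (-, -)
        return math.atan(xi / xr) - math.pi
    # xr == 0 is not covered by the text; these values are an assumption.
    if xi > 0:
        return math.pi / 2
    if xi < 0:
        return -math.pi / 2
    return 0.0


def unwrap(principal, eps=math.pi):
    """Steps (c), (d), (d'): build the continuous phase p from principal values P."""
    p = [principal[0]]            # assumption: p = P for the first frame
    for prev, cur in zip(principal, principal[1:]):
        d = cur - prev
        if abs(d) < eps:          # (d) difference is small enough
            step = d
        elif d < 0:               # (d') negative jump: add 2*pi
            step = d + TWO_PI
        else:                     # (d') positive jump: subtract 2*pi
            step = d - TWO_PI
        p.append(p[-1] + step)
    return p


# A bin whose true phase advances by 2.0 rad per frame.
true_phase = [2.0 * s for s in range(5)]
wrapped = [principal_phase(math.cos(v), math.sin(v)) for v in true_phase]
recovered = unwrap(wrapped)
print([round(v, 4) for v in wrapped])
print([round(v, 4) for v in recovered])
```

The three sign cases together reproduce what `math.atan2(xi, xr)` returns away from the axes, and the unwrapped sequence recovers the steadily increasing phase that the wrapped values hide.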

The phase is thereby made continuous. Furthermore, when the shift amount at synthesis is changed as described above, the phase is scaled in accordance with the change in shift amount,

$$p(SR, \omega_k) \rightarrow \frac{R'}{R}\,p(SR, \omega_k),$$

in order to prevent inter-band interference in encoding and decoding. This prevents noise caused by phase discontinuities.
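A small numerical check shows why the phase has to be made continuous before it is scaled. The sketch assumes the scaling factor R'/R with R = 32 and R' = 31 (the values used later in the description); the name `scale_phase` is hypothetical.

```python
import math


def scale_phase(p, R, R_out):
    """Scale a continuous phase by the ratio of output to input shift."""
    return p * R_out / R


R, R_out = 32, 31
unwrapped = 4.0                    # continuous phase
wrapped = 4.0 - 2 * math.pi        # the same angle as a principal value

good = scale_phase(unwrapped, R, R_out)
bad = scale_phase(wrapped, R, R_out)

# If the two results were the same angle, this would be 0 (mod 2*pi).
mismatch = (good - bad) % (2 * math.pi)
print(good)       # 3.875
print(mismatch)   # about 6.087, i.e. 2*pi*31/32, not a multiple of 2*pi
```

Scaling a principal value directly leaves an error that depends on how many multiples of 2π were discarded, which is the phase discontinuity the text says must be avoided.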

Speech control is carried out in this way. Because this device synthesizes the output after phase-adjusting the signal that has been transformed to the frequency axis by the Fourier transform, it yields a signal of very high quality, and good speech control free of signal discontinuities and dropouts is achieved.

In this device, however, the extracted time length L and the shift amount R are subject to a constraint if the sound quality is not to change: for example, when a Hamming window is used as the extraction window, $R \le L/4$. For this reason, making the unit of change of the speech control finer causes L to become extremely large.

For example, suppose a signal sampled at 10 kHz is windowed with L = 128 points to form one frame, and frames are extracted with a shift of R = 32 points. If R' = 31 points is used for the phase adjustment and synthesis, a signal shortened in time to 31/32 is obtained. This gives a time change of 1/32.
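The numbers in this example can be checked with exact fractions. The helper names below are hypothetical; the values (10 kHz, L = 128, R = 32, R' = 31) are those given in the text.

```python
from fractions import Fraction


def duration_factor(R, R_out):
    """Output duration relative to input when frames hop R in and R_out out."""
    return Fraction(R_out, R)


def smallest_step(R):
    """Finest change in duration factor when R_out can move in whole samples."""
    return Fraction(1, R)


fs = 10_000        # sampling frequency in Hz
L, R, R_out = 128, 32, 31

frame_ms = 1000 * L / fs   # 12.8 ms per frame
hop_ms = 1000 * R / fs     # 3.2 ms between frames

print(duration_factor(R, R_out))  # 31/32
print(smallest_step(R))           # 1/32
print(frame_ms, hop_ms)           # 12.8 3.2
```

Since R' can only change in whole samples, the step size is fixed by R: a finer step needs a larger R, and with it a proportionally larger L.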

However, making this unit of change finer, for example four times finer, requires R to be four times larger, and the extraction time length must then also grow to 4L.

When L becomes large in this way, the amount of computation in the Fourier transform increases, leading to problems such as longer processing time and larger processing hardware.

Object of the Invention

In view of these points, the present invention makes it possible to refine the unit of change without increasing the amount of computation.

Summary of the Invention

The present invention is a speech control device comprising: means for extracting segments of arbitrary time length from an input audio signal at arbitrary time intervals; means for Fourier transforming each extracted frame to transform the time axis to the frequency axis; means for adjusting the phase of the transformed signal; means for inverse Fourier transforming the phase-adjusted signal to transform the frequency axis back to the time axis; means for interpolating the inverse-transformed signal by a predetermined factor; and means for successively combining the interpolated signals at desired time intervals and outputting the result with the time axis arbitrarily expanded or compressed. With this device, the unit of change can be made finer without increasing the amount of computation.

Embodiment

In FIG. 6, an interpolation circuit (31) is provided after the multiplier (11) that follows the inverse Fourier transform. The interpolation circuit (31) interpolates the signal by a factor of, for example, four.

That is, when the multiplier (11) delivers a signal at each sampling period as shown in FIG. 7A, three points are interpolated between each pair of adjacent samples, as shown in FIG. 7B. Double circles indicate the original samples.

As a result, in the synthesis performed by the memories (12) and (13), the shift can be made one point at a time out of 4R points, as shown in FIG. 7C. Whereas the conventional device could shift only to the positions of the original samples (double circles), interpolation allows shifts to positions at one quarter of that spacing. For example, with L = 128 and R = 32, the change can be made in units of 1/128.

Furthermore, when the signal is read out of the memory (13), either the clock frequency is made four times the conventional one and the DA conversion circuit (14) converts at the fourfold clock, or the signal from the memory (13) is decimated to 1/4 and the same DA conversion as before is performed.

Thus, for example, with R' = 125, the signals are combined in the memories (12) and (13) with a shift amount of R' = 125, and the result is read out with a correspondingly scaled clock so that its duration returns to that of the original signal; a signal with lowered pitch is thereby obtained. (Parts of this passage are illegible in the source.)
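To give a feel for what the finer grid buys, the pitch change per step can be expressed in cents (hundredths of a semitone). This is a rough sketch under an assumption taken from the earlier discussion of FIGS. 4C and 5C: after the duration is restored, the frequency band is scaled by the ratio of the output shift to the input shift. The function name and the use of cents are illustrative, not from the patent.

```python
import math


def pitch_shift_cents(grid, R_out):
    """Pitch change in cents for an output shift R_out on a hop grid of `grid` points."""
    return 1200 * math.log2(R_out / grid)


coarse_step = pitch_shift_cents(32, 31)     # one-sample step without interpolation
fine_step = pitch_shift_cents(128, 127)     # one-point step with 4x interpolation
example = pitch_shift_cents(128, 125)       # the R' = 125 case in the text

print(round(coarse_step, 1), round(fine_step, 1), round(example, 1))
```

Under these assumptions the smallest available pitch step shrinks from roughly 55 cents to roughly 14 cents, while the transform length stays the same.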

In this way the speed and pitch of an audio signal can be changed arbitrarily. The amount of computation for the Fourier transform and the inverse transform is the same as before; performing the interpolation after the inverse transform makes the unit of change of the shift amount finer.

Furthermore, by adjusting the phase in advance in accordance with the interpolated signal, good control without degradation of sound quality can be achieved.

Note that the synthesis memory must be larger to hold the interpolated samples, but the amount of this hardware is not significant compared with that of the arithmetic section.

Effects of the Invention

According to the present invention, good speech control with a fine unit of change can be achieved without increasing the amount of computation.

[Brief description of the drawings]

第１図は従来の装置の説明のための図、第２図〜第５図
は本願発明者が先に提案したスピーチコントロール装置
の説明のための図、第６図は本発明の一例の構成図、第
７図はその説明のための図である。（１）は入力端子、（２）はＡＤ変換回路、（３）、（
４）、（１２）、（１３）はバッツァメモリ、（５）、
曽、（９）、（１１）は乗算器、（６）はフーリエ変換
回路、（８）は処理回路、α０）は逆フーリエ変換回路
、（１４）はＤＡ変換回路、（１５）は出力端子、（３
１）は補間回路である。手Ｕさネ市正宿二昭和５９年　５月　１０日特許庁長官　若　杉　和　夫　殿　メジ１、事件の表示昭和５８年　特　許　願　第２５０７８１号２°Ｑ　Ｉ
ＪＪ　Ｏ）’ｒ５１１；　ユ、−ヶヨッ１、。−７５１
３、補正をする者事件との関係　特許出願人住　所　東京部品用区北品用６丁目７番３５号名称（２
’１Ｂ）ソニー株式会社代表取締役　大　賀　典　雄４、代理人住　所　東京都新宿区西新宿１丁目８番１号置　０３−
３４３−５８２１＆０（新宅ビル）６、袖正により増加
する発明の数（１）　特許請求の範囲を別紙の通り訂正する。（２）　明細書中、第３頁２行〜第４頁２行「第２図・
・・　供給される。」とあるを次の通り訂正する。１−第２図において、あらかじめ、マイクロホン等によ
り電気的信号に変換され、遮断周波数３．２ｋｌｌｚの
低域通過フィルタを通された音声信号が入力端子（１）
に供給される。この入力音声信号は、６’、４ｋｌｌｚ
（周期的１５８μｓ）の変換クロックにより駆動されて
いる１語１２ビツトのＡＤ変換器（２）により順次、こ
のクロンクパルスの割合で１語１２ピントのデジタルデ
ータに変換される。ＡＤ変換器（２）は、６．４ｋｌｌ
ｚのクロックで駆動されている１ｉ！１２ビツトより成
る２５６語のシフトレジスタ（３）に接続されており、
駆動クロックの１パルスがシフトレジスタ（３）に供給
されるごとに、シフトレジスタ（３）は、１　ｉｆ！、
第２図において右（以−ト、「左」、「右」という語を
、第２図において左、右という意味で用いることにする
）にシフトされ、ＡＤ変換器（２）の出力データが１語
、シフトレジスタ（３）の左より、シフトレジスタ（３
）に人いる。Ｊなわちシフトレジスタ（３）には、ＡＤ変換器（２）
によって生成された、一連の２５６語のデジタルデータ
がはいっており、ＡＤ変換器（２）が、デジタルデータ
を１語、生成するごとに、シフトレジスタ（３）は、１
語、右にシフトされ、その内容が更新されて行く。ここで、第２図における（４）以下の信号の具体的な流
れについて説明する前に、短時間フーリエ解析について
、一般的な事柄を述べておく。例えば、「あいうえお」という音声信号を考えてみると
、「あ」という音が発せられている時間と、１い」とい
う音が発せられている時間とでは、音声を発しているヒ
トの口や声道の形状がことなっている。すなわち「あい
うえお」という音声信号は、時間とともにその特性が変
化してゆく物理的実体から発せられた信号であり、定常
信号とは見做せない。このように、音声信号や音楽信号などは、それを発して
いる物理的実体の特性が、時間とともに変化しており、
一般に定常信号と見做ずことはできず、定常信号を対象
にしたフーリエスペクトラム解析を直接に適用すること
は不可能である。しかしながら、先はどの例の「あいう
えお」について百うと、「あ」、１−い」、「う」、１
゛え」。「お」の各々の音声を発している時間内では、ヒトの口
や声道の形状は、はぼ一定しており、その時間内に信号
を限定すれば、定常信号と見做せる。そこで、フーリエ変換する領域を、定常と見做せる時間
の区間に限定し、フーリエ変換をおこない、その区間を
次々に更新してゆき得られるフーリエスペクトラムを用
いれば、非定常ではあるが、短時間の区間については定
常であるような、音声信号や音楽信号に対してフーリエ
解析が可能になる。このようなフーリエ解析は、短時間フーリエ解析と呼ば
れζいる。数式を用いてさらに説明しよう。人力信号ｘ　（ｔｌを
、サンプリングに得られるデータ列を（Ｘ（→）（ｍ＝
０．１．２．・・・・）としたとき、上述した事柄は、
定常とみなせるデータの部分列（ｘ　（ｍ＋ＳＲ）　）
ｍ＝０．１．−・・；　Ｓ　＝０．１．・・＝　（Ｒ，
Ｍはある整定数）の変数ｍについて、有限の部分列（ｘ
（ｍ＋ＳＲ）　）　ｍ＝０＋Ｌ・・・・、　Ｍ−１の端
部がスペクトルに及ばず影響を減じる窓係数（ｈ（−ｍ
））　（ｍ＝０．１．・・・・、Ｍ−１）を乗じた後、
変数ｍについ°ζ離散的フーリエ変換をおこない、短時
間フーリエスペクトラムＸ　（ＳＲ，ｋ）　（Ｓ＝０．
１．・・・・＋Ｍ−１；に＝０．１，２．・・・・、Ｍ
−１）を得る、ということになる。２π 第８図より明らかなように、Ｒは分析すると区間の更Ｆ
ｉＷｋであり、以下のような制約がある。（Ａ）式より２π ｍ＋５Ｒ＝ｆとおくと２π −（Ｂ）窓係数（ｈ　（ｍ目（ｒｎ−０，１，２，”、　Ｍ−１
）の定義を、ｍについて一■〜十閃まで拡大し”乙とす
ると９π −（Ｃ）すなわち、Ｘ（ＳＲ，ｋ）は、第９図に示すように第１
番目の変数ＳＲについて、データ列口＋（ｍ））　とを
、たたみ込んだデータ列、Ｘ（Ｓ、ｋ）（Ｓ＝０．１．
２．旧・）をＲ−１データおきに再サンプリングしたも
のになワており、デジタル信号インパルスレスポンス（
ｈ　（ｍ）　）　ヲ有−Ｊ−る線形デジタルシステムに
入力した出方を、Ｒ−１データおきに再サンプリングし
たものと解釈できる。故に、分析する区間の更新１ｉＲＸｌは、サンプリング
定理が示すように、の第１番目の変数ｍについての帯域中〕でなければならない。（Ｘ　（ｍ、ｋ）　）　（ｍ＝０．１．２．・・”）の
帯域中は、（ｍ　＝　０１１１２．・・・・）に依存す
るわけであるが、その上限は、図に於ける、インパルス
レスポンス（ｈ（ｍ））を有する線形デジタルシステム
のローパス特性でおさえられるから、の第１番目の変数ｍについての帯域中〕 ≧２ｘ　（（ｈ（ｍ）　）　（ｍ”’Ｏ，Ｌｔ２．−”
）の帯域＋１Ｊ　）　−−（Ｄ　）すなわぢＲば、一（Ｅ）でなければならない。一例として、Ｍ＝２５６．　（ｈ　（ｍ）　）としてハ
ミング窓係数とすると窓係数ｂ　（ｒｎ）　＝　０．５
４−０．４６　ｃｏｓ（２πｍ／　２５５　）　（ｍ＝
０．１１・−、２５５）を用いるとすると、（ｈ　（ｍ
）　）　（ｍ＝０．１，２．・−、２５５）のローパス
部分の帯域中は、約４２ｄＢまで減衰するかって、Ｒは
、上式の関係から、Ｒ≦□−６４ ■ でなければならない。第２図において、（４１、（５１、（６１、（７１で上
述した、短時間フーリエ変換をおこなっている。Ｍ　＝
　２５６、分析窓係数として、ハミング窓係数ｈ（ｒｎ
）−０，５４−０，４６Ｘｃｏｓ　（２πｍ／２５５）
　（ｍ−０，１，２゜・・・・＋　２５５　）　、Ｒ＝
　６４としている。上述の例で明りかなように、Ｒ＝６
４は、（Ｅ）式を満たしている。以下、具体的に述べる。１語１２ビツト、　２５６語より成るシフトレジスタ（
３）の内容は、ＡＤ変換器（２）の駆動クロックを６４
分周したクロックの１パルス（すなわち、６４ｘ（ＡＤ
変換（２）の駆動クロック周期、約１５８μ５ｅｃ）（
秒））ごとに同じく、１語１２ビツト、　２５６語より
成るシフトレジスタ（４）にランチされる。ラッチされ
た２５６緒のデータは、シフトレジスタ（４）に供給さ
れる８Ｍ１ｌｚ（周期１２５　ｎ　５ｅｃ）のりＣ１７
りのタイミングで、１語右ヘシフトされ、１２ビツトよ
り成る２つの入力端子、および２３ビツトより成る１つ
の出力端子を有する乗算器（５）の一方の入力端子へお
くりこまれる。一方、この同じクロックのタイミングで
、乗算器（５）のもう一方の入力端子へ、あらかじめＲ
ＯＭに貯えである、ハミング窓係数ｈ　（ｍ）＝　０．
５４−０．４６ｃｏｓ　（２πｍ／　２５５　）　（ｒ
ｎ　＝　０．１，２゜・・・・、　２５５　）が、−語
ずつ、ｍ−（Ｌ１＋２＋・・・・の順に、おくりこまれ
、この２つの人力の積が、乗算器（５）の出力として、
人力データがセットされ”ζから１００　ｎ　ｓｅｃ後
に、乗算器（５）の出力端子にセットされる。この、乗算器（５）の２３ピツ］・より成る出力結果は
、乗算器（５）に人力データを送りこむタイミングクロ
ックのタイミングで（すなわぢ、１２５　ｎ　ｓｅｃご
とに）　Ｆ　Ｆ　Ｔ　（Ｆａｓｔ　Ｆｏｕｒｉｅｒ　Ｔ
ｒａｎｓｆｏｒｍ）変換器（６）へ送りこまれる。ＦＦ
Ｔ変換器（６）は、こうして送りごまれる１語２３ビツ
トのデータが２５６語になると、この１語２３ビツト、
　２５６語のデータに対しζ、ＦＦＴをおこない実部、
虚部ともに１６ビツトから成る、２５６語の複素データ
を生成する。さて、ＦＦＴ変換器（６）への、２５６語の入力データ
を（ｙ（ｍ））　（ｍ＝０．１．・・・・＋　２５５　
）出力データを（Ｙ　（ｋ）　（ｋ＝０．１．２．・・
・・、　２５５　）とすると、ＦＦＴの定義より、２π −（Ｆ）一方、この人力データ（ｙ　（Ｉｎ）　）　（ｒｎ＝ｏ
ｌｉｌ・・・・、２５５）の短時間フーリエスペクトラ
ムは（Ａ）式より、２π したが二で、（Ｙ　（１０）　（ｋ＝０．１．２．・・
・・、２５５）と（Ｘ　（６４Ｓ、ｋ　））　（ｋ＝０
．１．２．・・・・、　２５５　）とは、２π （ｋ＝０．１，２．・・・・、　２５５　）−（Ｈ）という関係がある。よって、ＦＦＴ変換器（６）の出２
π 人力データＸ（ｍ）の短時間フーリエスペクトラムが得
られることになる。これを、乗算器（７）でおこなう。すなわち、ＦＦＴ変換器（６）で生成された、実部、虚
部ともに１６ビツトから成る２５６語の複素データは、
周期１２５　ｎ　ｓｅｅのクロックのタイミングで、実
部、虚部ともに１６ビツトより成る２つの複素データ入
力端子、および実、虚部ともに１６ビツトより成る１つ
の出力端子を有する乗算器（７）の一方の入力端子へお
くりこまれる。一方、この同じクロックのタイミングで
、乗算器（７）のもう一方の入力端子へ、あらかじめ用
窓されている、上述の係数、２π ２、・・・・、　２５５　）が−語ずつおくりこまれた
、この２つの人力の積が、乗算器（７）の出力として、
入力データがセントされ°ζから１００　ｎ　ｓｅｃ後
に、乗算器（７）の出力端子にセットされる。この出力
結果は、乗算器（７）に入力データを送りこむクロック
のタイミングで１語ずつ、全部で２５６語がスペクトラ
ム変形回路（８）へ送りごまれる。」（３）　同、第４頁３行、第５頁２０行、第１７頁７〜
８行にそれぞれ「処理回路」とあるを「スペクＩ・ラム
変形回路」と訂正する。（４）　同、第４貝７行〜第５頁６行「この処理回路・
・・取り出される。」とあるを次の通り訂正する。１゛スペクトラム変形路（８）により変形された、１語
が実部、虚部ともに１６ビツトより成る２５６語の複素
データは、（９）、顛、（１１）　、（１２）　、（１
３）　。（１４）で時間領域の信号に変換される。（９）〜（１４）の流れを具体的に説明する前に、（９
）〜（１４）に関しての、一般的な関係について述べて
おく。先に述べたように、変形された短時間フーリエスペクト
ラムＸ　（Ｓｌｊ’　、　ｋ　）　（Ｓ＝０．１．２．
・・・・；ｋ　＝　０．１．２．・・・・、トｌ）は、
短時間フーリエスペクトラムＸ　（Ｓ、ｋ）　（Ｓ＝０
．１，２．・・・・；ｋ”’Ｏｔｌ＋２、・・・・、ト
ｌ）を、第１番目の変数Ｓについ”Ｃ１Ｒ′−１データ
おきに再サンプリングしたものである。そこで、変形さ
れた短時間フーリエスペクトラムＸ　（ＳＲ’　、　ｋ
）　（Ｓ＝０．１，２．・・・・１ｋ＝（Ｌｌ、２．・
・・・、トｌ）から、時間領域の信号を作成するには、
Ｘ　（ＳＲ’　、　ｋ）　（Ｓ＝０．１．２．・・・・
；に＝０．１，２．・・・・、トｌ）を補間して、Ｘ　
（Ｓ、ｋ）　（Ｓ＝０．１．２．・・・・；　ｋ　＝０
．１，２．・・・・、ト１）を作り、Ｘ　（Ｓ、ｋ）　
（Ｓ＝０．１．２．・・・・、　ｋ　＝　０．１．２．
・・・・。ト１）を逆離散的フーリエ変換すれば良い。すなわち、
Ｘ　（ＳＲ’　、　ｋ）の第１番目の変数に関し”ζ、
各々、隣りのデータの間に０をＲ′−１個つめた＾データＸ（Ｓ、ｋ）を１乍り、Ｍ（固のデータ（ｆ　（ｍ）　）　（ｍ＝０
．１．−・・、ト１）をインパルスレスポンスとしＣ持
つローパスフィルタに通して、Ｘ（Ｓ、ｋ）を−作る。式％式％ ′ゾ（β、ｋ）の定義よりｍ＝　−■ この後、Ｘ（Ｓ、ｋ）を第２番目の変数、ｋに関して、
逆離散的フーリエ変換して、出力信号（ｙ（Ｓ））（Ｓ
＝０．１．２．・・・・）を得る。これも式で書くと、
以下のようになる。２π −（Ｉ）Ｒ’　＝Ｒかつスペクトラムを操作しないときは、入力
信号がそのまま出力信号にならねばならない。そのためには、上式より、ｙ　（Ｓ）　＝　ｘ　（Ｓ）ところで、であるからＡ＝Ｓ−ｐＭ（ｐ　：変数）とおく＝（Ｊ）故に、（ｈ（ｍ））と（ｆ　（ｍ）　）とが、全てのＳ
について、ｍ＝−■ −（Ｋ）となることが必要である。さて、（Ｉ）式より、と書く
と、・・・・）　（ｆ　（ｍ）　）はｍ　＝　０．Ｌ２＋”
　”　ｔ　Ｍ−１でのみ０でないのでｆ　（Ｓ−ｍＲ’）　・ｘ　（ｍＲ’　、Ｓ）は、Ｓ＝
ｍＲ’　＋　ｍＲ’　＋　１＋　・・・・、　ｍＲ’　
＋Ｍ　−１（ｎ　＝Ｓ−ｍＲ’　＋　ｎ　＝Ｏｔ１．・
・”　＋　Ｍ−１）の部分だけが０でない。したがって
、Ｒ′として、ｒ−Ｒ′＝Ｍ　（ｒ　：正の整定数）と
、Ｍを割り切るように選ぶと、（ｍ−１）Ｒ’　＋ｙ１≦Ｓ≦ｍＲ’＋Ｍ−１（ｍ＝０
．ｌ、２．・・・・）と、有限回の加算で（−ｙ（Ｓ））　（Ｓ＝０．１．２
．・・・・）が逐次求まる。また、ｘ　（ｍＲ＋ｓ）をめる際にＦＦＴを使うには、
ＦＦＴ変換されたデータと短時間フーリエスペクトラム
データとの間に（Ｈ）式の関係がある２π ０．１１・・・・、　Ｍ−１ｉ　Ｓ−０，１，２，・・
・・；Ｒ′、整定数）を乗じたのちに、ＦＦＴを施せば
良い。第２図において具体的に述べる。なお以下の説明ではＲ
’＝６４とする。スペクトラム変形回路（８）により変形された、実部、
虚部ともに１６ビツトより成る２５６語の短時間フーリ
エスペクトラムＸ　（６４Ｓ、ｋ）　（ｋ＝０．１．２
゜・・・・、　２５５　）は、周期１２５　ｎ　ｓｅｃ
のクロックのタイミングで、ｋ＝ｏ、１．２．・・・・
の順に１語ずつ、実部、虚部ともに１６ビツトより成る
２つの複素データ入力端子、および、実部、虚部ともに
１６ビツトより成る１つの出力端子を有する乗ｕ　（９
）へおくりこまれる。一方、その同じクロックのタイミ
ングで、あらかじめ用意されている。上述の係数９π ・・＋　２５５　）かに−０，１，２，・・・・の順に
１語ずつ、乗算器（９）のもう一方の入力端子に送り出
され、この２つの入力の積が、乗算器（９）の出力とし
て、入力データが乗算器（９）にセントされてから、１
００　ｎ　ｓｅｃ後に、乗算器（９）の出力端子にセッ
トされる。この出力結果は、乗算器（９）に入力データ
を送りこむクロックのタイミングで１語ずつ、計２５６
語が、逆ＦＦＴ変換器（ｌｆｆｌへ送りこまれる。逆ＦＦＴ変換器叫は、こうして送りこまれる実部、虚部
ともに１６ビツトより成るデータが２５６語になると、
このデータに対し、逆ＦＦＴをおこない、１語１６ビツ
トから成る２５６語の時間領域のデータを生成する。こ
の１語１６ビツトから成る２５６語のデータは、周期１
２５　ｎ　ｓｅｃのクロックのタイミングで、１６ビツ
トより成る２つの入力端子、および１６ビツトより成る
１つの出力端子を有する乗算器（１１）の一方の入力端
子へおくりこまれる。一方、この同じクロックのタイミングで、乗算器（１１
）のもう一方の入力端子へ、あらかじめ、ＲＯＭに用意
されている、上述した関係式（Ｋ）がｍ＝ＬＬ２＋・・
・・の順に１語ずつ、おくりこまれ、この２つの入力の
積が、乗算器（１１）の出力として、入力データがセッ
トされてから１００　ｎ　ｓｅｃ後に、乗算器（１１）
の出力端子にセットされる。この出力結果は、乗算器（１１）に入力データを送りこ
むクロックのタイミングで１語ずつ、全部で２５６語、
シフトレジスタ（１２）へ送りこまれる。シフトレジスタ（１２）は、１語１６ビツト、２５６語
より成り、乗算器（１１）の乗算結果を送出する。周期１２５　ｎ　ｓｅｃの同じクロックで駆動されてお
り、乗算器（１１）から、乗算結果が１語おくりこまれ
るごとに、１語、右ヘシフトされる。こうして、シフト
レジスタ（１２）に、２５６語の、乗算器（１１）の乗
算結果がはいると、シフトレジスタ（１２）は、シフト
禁止の状態になりシフトレジスタ（１２）の２５６語が
、１語１６ビツト、　２５６語より成るシフトレジスタ
（１３）の各々、対応する語ごとに加算され、加算結果
が、シフトレジスタ（１３）の各々の対応する語へ入れ
られる。このシフトレジスタ（１３）には、ＡＤ変換器（２）を
駆動している６、４ｋＨｚのクロックが供給されており
、上述の加算が終了すると、この６．４ｋＨｚのクロッ
ク、１パルスごとにシフトレジスタ（１３）が、１語右
ヘシフトされ、１６ビツトＡＤ変換器（１４）に、ｌデ
ータ送出される。他方、このシフトにより、シフトレジ
スタ（１３）には、左より、０の値を有するデータが１
語入れられる。こうしてシフトレジスタ（１３）はシフ
トをＲ’＝６４回おこない、６４出力データをＤＡ変換
器におくりこむ。１６ビツトＤＡ変換器（１４）は、６．４ｋｌｌｚのク
ロックのタイミングでおくられてくるｌｉ！１６ビツト
のデータを逐次、アナログ電圧値に変換し、出力端子（
１５）に出力する。」（５）　同、第６頁１行「れて位相・・・信号が」とあ
るを１゛れる。この信号が」と訂正する。（６）　同、同頁１１行、１５行、末社、第７頁３行に
そする。（７）　同、第７頁ＩＯ行「に対して」の後に［窓係数
ｈ（ｍ）を掛けて、」を加入する。（８）　同、同頁１３行１°変換して、」の後に１−ス
ペクトラムＸ２　（ＳＲ，ω）は、」を加入する。（９）　同、同頁１６行とあるをと訂正する。（ｌＯ）同、第８頁７〜１４行１ｈ（ｍ）＝１ｚ・・で
きる。」とあるを次の通り訂正する。ｒｈ　（ｍ）　＝１ｆ　（ｍ）　＝　０．５−０．５ｃｏｓ　（２πｍ／　
（Ｎ　１）　＞ｍ＝０、・・・Ｎ−１低域成分の方が大きいときはｈ　（ｍ）　＝０．５４−０．４６　ｃｏｓ　（２ｙｃ
ｍ／　（Ｎ　−１）　）ｍ＝０、・・・Ｎ−１ｆ　（ｍ）＝Ｒ’　／Σｈ　１とすることにより音質を向上させることができる。なお、ｈ　（ｍ）＝Ｏということは、乗算器（５）につ
いては何も行わないことに相当する。」（１１）同、第
９頁６〜９行１位相の１値を、・・・位相を」とあるを
次の通り訂正する。［とする、このとき位相の１値をＰ　（ＳＲ，ωｋ）とすると、Ｐ　（ＳＲ，ωｋ）は −π≦Ｐ’　（ＳＲ，ωｋ）＜π の値をとり、位相の不連続となる部分が存在する。そこでこの不連続を取り除いた位相を」（１２）同、第
１０頁１０行「さらに」の後に］゛不連続な部分を判別
するために」を加入する。（１３）同、第１３頁１１行〜第１４頁１行１−人力さ
れた・・・スピーチコントロール装置」とあるを次の通
り訂正する。１−人力された音声信号に任意の時間間隔ごとにその時
間間隔により制限される時間以上の長さ及び係数を有す
る窓関数を掛けて抽出する手段と、この抽出された信号
ごとにフーリエ変換して時間軸を周波数軸に変換する手
段と、この変換された信号の位相を調整する手段と、こ
の位相調整された信号を逆フーリエ変換して周波数軸を
時間軸に逆変換する手段と、この逆変換された信号を所
定の倍率で補間する手段と、この補間された信号に上記
窓関数により規定される時間長及び係数を有する窓関数
を掛けて上記任意の時間間隔ごとに順次合成すると共に
任意に時間軸を伸縮して出力する手段とを有して成るス
ピーチコントロール装置」と訂正する。（１６）同、第１７頁１行「第５図」とあるを［第５図
、第８図、第９図」と訂正する。（１７）図面中、第８図、第９図を別紙の通り追加する
。以上特許請求の範囲入力された音声信号に任意の時間間隔ごとにその時間間
隔により制限される時間以上の長さ及び係数を有する窓
関数を掛けて抽出する手段と、この抽出された信号ごと
にフーリエ変換して時間軸を周波数軸に変換する手段と
、この変換された信号の位相を調整する手段と、この位
相調整された信号を逆フーリエ変換して周波数軸を時間
軸に逆変換する手段と、この逆変換された信号を所定の
倍率で補間する手段と、この補間された信号に」二記窓
関数により規定される時間長及び係数を有する窓関数を
掛け”Ｃ上記任意の時間間隔ごとに順次合成すると共に
任意に時間軸を伸縮して出力する手段とを有し°ζ成る
スピーチコントロール装置。FIG. 1 is a diagram for explaining a conventional device, FIGS. 2 to 5 are diagrams for explaining a speech control device previously proposed by the inventor of the present application, and FIG. 6 is a configuration of an example of the present invention. 7 are diagrams for explaining the same. (1) is an input terminal, (2) is an AD conversion circuit, (3), (
4), (12), (13) are Bazza memory, (5),
Zeng, (9) and (11) are multipliers, (6) is a Fourier transform circuit, (8) is a processing circuit, α0) is an inverse Fourier transform circuit, (14) is a DA conversion circuit, and (15) is an output terminal. ,(3
1) is an interpolation circuit. Teusane City Seishuku 2 May 10, 1980 Director-General of the Patent Office Kazuo Wakasugi Tono Meji 1, Indication of the Case 1982 Patent Application No. 250781 2°Q I
JJ O)'r511; -751
3. Relationship with the case of the person making the amendment Patent applicant address: 6-7-35, Kitashina-yo, Tokyo Parts-Yo-ku Name (2
'1B) Representative Director of Sony Corporation Norio Ohga 4, Agent Address: 1-8-1 Nishi-Shinjuku, Shinjuku-ku, Tokyo 03-
343-5821&0 (Shintaku Building) 6. Number of inventions increased due to sleeve correction (1) The scope of claims is amended as shown in the attached sheet. (2) In the specification, page 3, line 2 to page 4, line 2 “Figure 2.
... will be supplied. ” is corrected as follows. 1- In Figure 2, an audio signal that has been converted into an electrical signal using a microphone or the like and passed through a low-pass filter with a cutoff frequency of 3.2kllz is input to the input terminal (1).
supplied to This input audio signal is 6', 4kllz
An AD converter (2) of 12 bits per word driven by a conversion clock (periodically 158 μs) sequentially converts each word into 12 bits of digital data at the rate of this clock pulse. AD converter (2) is 6.4kll
1i driven by the clock of z! It is connected to a 256-word shift register (3) consisting of 12 bits,
Every time one pulse of the driving clock is supplied to the shift register (3), the shift register (3) outputs 1 if! ,
In FIG. 2, the output data of the AD converter (2) is shifted to the right (the words "left" and "right" will be used to mean left and right in FIG. 2). 1 word, from the left of shift register (3), shift register (3)
) There are people in In other words, the shift register (3) has an AD converter (2).
A series of 256 words of digital data generated by the AD converter (2) is input, and each time the AD converter (2) generates one word of digital data, the shift register (3)
The word is shifted to the right and its contents are updated. Here, before explaining the specific flow of signals below (4) in FIG. 2, general matters regarding short-time Fourier analysis will be described. For example, if we consider the audio signal "Aiueo", the time when the sound "A" is being made and the time when the sound "1i" is being made differ depending on the mouth of the person making the sound. The shape of the vocal tract is different. In other words, the audio signal "Aiueo" is a signal emitted from a physical entity whose characteristics change over time, and cannot be regarded as a stationary signal. In this way, the characteristics of the physical entity that emits audio signals, music signals, etc. change over time.
In general, such signals cannot be regarded as stationary, and Fourier spectrum analysis intended for stationary signals cannot be applied to them directly. Taking "a-i-u-e-o" as the example again, however, the shape of the speaker's mouth and vocal tract remains fairly constant while each of the sounds "a", "i", "u", "e" and "o" is being uttered, and the signal restricted to that time can be regarded as stationary. Therefore, if the region to be Fourier transformed is limited to a time interval that can be regarded as stationary, the Fourier transform is performed, and the interval is updated successively, the resulting Fourier spectra make Fourier analysis possible for speech and music signals that are stationary within each interval. This kind of Fourier analysis is called short-time Fourier analysis.

This is explained further using formulas. Let {x(m)} (m = 0, 1, 2, ...) be the data sequence obtained by sampling the input signal x(t). The partial data sequences that can be regarded as stationary are written {x(m + SR)} (m = 0, 1, ..., M-1; S = 0, 1, ...; R and M integer constants). The finite partial sequence {x(m + SR)} (m = 0, 1, ..., M-1) is multiplied by the window coefficients {h(-m)} (m = 0, 1, ..., M-1), a discrete Fourier transform is performed with respect to the variable m, and the short-time Fourier spectrum X(SR, k) (S = 0, 1, ...; k = 0, 1, 2, ..., M-1) is obtained:

$$X(SR, k) = \sum_{m=0}^{M-1} x(m + SR)\, h(-m)\, e^{-j\frac{2\pi}{M}k(m + SR)} \tag{A}$$

As is clear from FIG. 8, R is the update width of the interval to be analyzed, and it is subject to the following restriction. Setting m + SR = l in formula (A) gives

$$X(SR, k) = \sum_{l=SR}^{SR+M-1} x(l)\, h(SR - l)\, e^{-j\frac{2\pi}{M}kl} \tag{B}$$

If the window coefficients {h(m)} are extended over -∞ to ∞ with respect to m, taking the value 0 outside the M coefficients defined above, this becomes

$$X(SR, k) = \sum_{l=-\infty}^{\infty} x(l)\, e^{-j\frac{2\pi}{M}kl}\, h(SR - l) \tag{C}$$

In other words, with respect to its first variable SR, X(SR, k) is the sequence X(S, k) (S = 0, 1, 2, ...), obtained by convolving the data sequence {x(m) e^{-j(2π/M)km}} with {h(m)}, resampled every R-1 data. It can thus be interpreted as the output obtained when the digital signal {x(m) e^{-j(2π/M)km}} is input to a linear digital system with impulse response {h(m)}, resampled every R-1 data. By the sampling theorem, the update width R of the interval to be analyzed is therefore constrained by the band of {X(m, k)} (m = 0, 1, 2, ...) with respect to the first variable m. That band depends on {x(m)} (m = 0, 1, 2, ...), but its upper limit is held down by the low-pass characteristic of the linear digital system with impulse response {h(m)}, which gives relation (D) between the two bands (formula (D) is illegible in the source). R must accordingly satisfy formula (E) (formula (E) is illegible in the source).

As an example, if M = 256 and {h(m)} are the Hamming window coefficients h(m) = 0.54 - 0.46 cos(2πm/255) (m = 0, 1, ..., 255), the low-pass characteristic of {h(m)} (m = 0, 1, 2, ..., 255) is attenuated by about 42 dB beyond its band, and from the relation above R must satisfy R ≤ 64. In FIG. 2, the short-time Fourier transform described above is performed in (4), (5), (6) and (7), with M =
256, the Hamming window coefficients h(m) = 0.54 - 0.46 cos(2πm/255) (m = 0, 1, 2, ..., 255), and R = 64 being set. As the example above shows, R = 64 satisfies formula (E). The details are explained below. The shift register (12 bits per word, 256 words)
(3) has its contents latched, on each pulse of the clock obtained by dividing the drive clock of the AD converter (2) by 64 (that is, every 64 × the drive clock period of the AD converter (2), which is approximately 156 μsec), into the shift register (4), which likewise consists of 256 words of 12 bits each. The 256 latched words are shifted one word to the right at the timing of the 8 MHz (period 125 nsec) clock supplied to the shift register (4) and are fed to one input terminal of the multiplier (5), which has two 12-bit input terminals and one 23-bit output terminal. At the timing of this same clock, the Hamming window coefficients h(m) = 0.54 - 0.46 cos(2πm/255) (m = 0, 1, 2, ..., 255) stored in the ROM are fed, one word at a time in the order m = 0, 1, 2, ..., to the other input terminal of the multiplier (5). The product of these two inputs is set at the output terminal of the multiplier (5), as its output, 100 nsec after the input data are supplied, and it is sent to the FFT (Fast Fourier Transform) converter (6) at the timing of the clock that feeds the input data (every 125 nsec). When 256 words of the 23-bit data sent in this way have accumulated, the FFT converter (6) performs an FFT on these 256 words of 23-bit data and generates 256 words of complex data whose real and imaginary parts each consist of 16 bits.

Let the 256 words of input data to the FFT converter (6) be {y(m)} (m = 0, 1, ..., 255) and the output data be {Y(k)} (k = 0, 1, 2, ..., 255). From the definition of the FFT,

$$Y(k) = \sum_{m=0}^{255} y(m)\, e^{-j\frac{2\pi}{256}km} \tag{F}$$

On the other hand, from formula (A), the short-time Fourier spectrum of this input data {y(m)} (m = 0, 1, ..., 255) is

$$X(64S, k) = \sum_{m=0}^{255} y(m)\, e^{-j\frac{2\pi}{256}k(m + 64S)} \tag{G}$$

so {Y(k)} (k = 0, 1, 2, ..., 255) and {X(64S, k)} (k = 0, 1, 2, ..., 255) are related as follows:

$$X(64S, k) = e^{-j\frac{2\pi}{256}\cdot 64Sk}\, Y(k) \quad (k = 0, 1, 2, \ldots, 255) \tag{H}$$

Therefore, multiplying the output of the FFT converter (6) by the coefficients e^{-j(2π/256)·64Sk} yields the short-time Fourier spectrum of the input data x(m). This is done by the multiplier (7). That is, the 256 words of complex data generated by the FFT converter (6), with 16-bit real and imaginary parts, are fed at the timing of a clock with a period of 125 nsec to one input terminal of the multiplier (7), which has two complex data input terminals with 16-bit real and imaginary parts and one output terminal with 16-bit real and imaginary parts. At the timing of this same clock, the above coefficients (k = 0, 1, 2, ..., 255), prepared in advance, are fed one word at a time to the other input terminal of the multiplier (7). The product of these two inputs is set at the output terminal of the multiplier (7), as its output, 100 nsec after the input data are supplied. This output is sent to the spectrum modification circuit (8) one word at a time, 256 words in total, at the timing of the clock that feeds the input data to the multiplier (7).
(3) Same, page 4, line 3; page 5, line 20; and page 17, lines 7-
8: in each place, "processing circuit" is corrected to "spectrum modification circuit".
(4) Same, page 4, line 7 to page 5, line 6: "This processing circuit ... is taken out." is corrected as follows.
"The 256 words of complex data, each with 16-bit real and imaginary parts, modified by the spectrum modification circuit (8) are converted into a time-domain signal in (9) to (14). Before the flow through (9) to (14) is explained specifically, the general relations underlying (9) to (14) are described.

As stated earlier, the modified short-time Fourier spectrum X(SR', k) (S = 0, 1, 2, ...; k = 0, 1, 2, ..., M-1) is the short-time Fourier spectrum X(S, k) (S = 0, 1, 2, ...; k = 0, 1, 2, ..., M-1) resampled with respect to its first variable. Therefore, to create a time-domain signal from the modified short-time Fourier spectrum X(SR', k), it suffices to interpolate X(SR', k) to produce X(S, k) and to apply an inverse discrete Fourier transform to X(S, k). That is, with respect to the first variable of X(SR', k), R'-1 zeros are inserted between adjacent data for each k, and the result is passed through a low-pass filter having {f(m)} (m = 0, 1, ..., M-1) as its impulse response to produce X(S, k) (the defining formula is illegible in the source). An inverse discrete Fourier transform is then performed on X(S, k) with respect to its second variable k to obtain the output signal {y(S)} (S = 0, 1, 2, ...). This too is written as a formula, formula (I) (illegible in the source).

When R' = R and the spectrum is not manipulated, the input signal must appear unchanged as the output signal; that is, y(S) = x(S) must follow from the above formula. Using relation (J) (illegible in the source; it involves the quantity S - pM, p being a variable), it follows that {h(m)} and {f(m)} must satisfy relation (K) for all S (relation (K) is illegible in the source).

Now, in formula (I), {f(m)} is nonzero only for m = 0, 1, 2, ..., M-1, so f(S - mR')·x(mR', S) is nonzero only for S = mR', mR'+1, ..., mR'+M-1 (n = S - mR'; n = 0, 1, ..., M-1). Therefore, if R' is chosen to divide M, so that rR' = M (r a positive integer constant), then for (m-1)R' + M ≤ S ≤ mR' + M - 1 (m = 0, 1, 2, ...) the values {y(S)} (S = 0, 1, 2, ...) are obtained successively with a finite number of additions. To use the FFT in calculating x(mR', S), the relation of formula (H) between FFT data and short-time Fourier spectrum data is taken into account by first multiplying by the corresponding coefficients (k = 0, 1, ..., M-1; S = 0, 1, 2, ...; R' an integer constant) and then performing the FFT. This is explained in detail with reference to the drawing. In the following explanation, R
' = 64. The 256 words of short-time Fourier spectrum X(64S, k) (k = 0, 1, 2, ..., 255), with 16-bit real and imaginary parts, modified by the spectrum modification circuit (8) are fed, at the timing of a clock with a period of 125 nsec and in the order k = 0, 1, 2, ..., to one input terminal of the multiplier (9). At the timing of this same clock, the above coefficients (k = 0, 1, 2, ..., 255), prepared in advance, are fed one word at a time in the order k = 0, 1, 2, ... to the other input terminal of the multiplier (9). The product of these two inputs is set at the output terminal of the multiplier (9), as its output, 100 nsec after the input data are fed to the multiplier (9). This output is sent one word at a time, 256 words in total, to the inverse FFT converter (10) at the timing of the clock that feeds the input data to the multiplier (9). The inverse FFT converter (10) performs an inverse FFT on these 256 words of data with 16-bit real and imaginary parts and generates 256 words of time-domain data of 16 bits each.

These 256 words of 16-bit data are fed, at the timing of a clock with a period of 125 nsec, to one input terminal of the multiplier (11), which has two 16-bit input terminals and one 16-bit output terminal. At the timing of this same clock, the coefficients {f(m)} satisfying the above relation (K), prepared in advance in the ROM, are fed to the other input terminal of the multiplier (11) in the order m = 0, 1, 2, .... The product of these two inputs is set at the output terminal of the multiplier (11) 100 nsec after the input data are supplied. This output is sent one word at a time, 256 words in total, to the shift register (12) at the timing of the clock that feeds the input data to the multiplier (11).

The shift register (12) consists of 256 words of 16 bits each. It is driven by the same 125 nsec clock that sends out the multiplication results of the multiplier (11), and it shifts one word to the right each time one word of the multiplication result arrives from the multiplier (11). When all 256 words of the multiplication result of the multiplier (11) have entered the shift register (12), the shift register (12) enters a shift-inhibited state. The 256 words of the shift register (12) are then added, word by word, to the corresponding words of the shift register (13), which likewise consists of 256 words of 16 bits each, and each sum is stored in the corresponding word of the shift register (13).

The shift register (13) is supplied with the 6.4 kHz clock that drives the AD converter (2). Once the above addition is complete, the shift register (13) shifts one word to the right on each pulse of this 6.4 kHz clock and sends one data word to the 16-bit DA converter (14). With each shift, one word of data with the value 0 enters the shift register (13) from the left. In this way the shift register (13) shifts R' = 64 times and sends 64 output data to the DA converter (14). The 16-bit DA converter (14) sequentially converts the 16-bit data, sent at the timing of the 6.4 kHz clock, into analog voltage values and delivers them to the output terminal (
15)."
(5) Same, page 6, line 1: "The phase ... signal" is corrected to "This signal".
(6) Same, same page, lines 11 and 15, and page 7, line 3.
(7) Same, page 7, line 10: after "for", "multiply by window coefficient h(m)" is added.
(8) Same, same page, line 13: after "converting", "spectrum X2(SR, ω)" is added.
(9) Same, same page, line 16 is corrected.
(10) Same, page 8, lines 7-14: "h(m) = 1 ... can be done." is corrected as follows.
"h(m) = 1, f(m) = 0.5 - 0.5 cos(2πm/(N-1)), m = 0, ..., N-1. When the low-frequency component is large, the sound quality can be improved by setting h(m) = 0.54 - 0.46 cos(2πm/(N-1)), m = 0, ..., N-1, and f(m) = R'/Σh(m). Note that h(m) = 1 corresponds to doing nothing in the multiplier (5)."
(11) Same, page 9, lines 6 to 9: "the principal value of the phase ... phase" is corrected as follows.
"In this case, if the principal value of the phase is P'(SR, ωk), P'(SR, ωk) takes a value in the range -π ≤ P'(SR, ωk) < π, and portions exist where the phase is discontinuous. The phase obtained after removing these discontinuities is therefore used."
(12) Same, page 10, line 10: after "Further", "to determine the discontinuous portion" is added.
(13) Same, page 13, line 11 to page 14, line 1: "1. ... speech control device" is corrected as follows.
"1. A speech control device comprising: means for extracting an input audio signal at each arbitrary time interval by multiplying it by a window function having coefficients and a length longer than the time delimited by that time interval; means for Fourier transforming each extracted signal to convert the time axis into the frequency axis; means for adjusting the phase of the converted signal; means for inverse Fourier transforming the phase-adjusted signal to convert the frequency axis back into the time axis; means for interpolating the inversely transformed signal at a predetermined magnification; and means for multiplying the interpolated signal by a window function having a time length and coefficients defined by the first window function and sequentially synthesizing the signals at each arbitrary time interval, the time axis being arbitrarily expanded or contracted for output."
(16) Same, page 17, line 1: "FIG. 5" is corrected to "FIG. 5, FIG. 8, FIG. 9".
(17) FIGS. 8 and 9 are added to the drawings as shown in the attached sheet.
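Amendment (11) above notes that the principal value of the phase lies in the range -π ≤ P' < π and that the resulting discontinuities are removed. A minimal sketch of one common way to do this (adding multiples of 2π whenever neighbouring values jump by π or more) is given here. The function names and the test sequence are hypothetical, and the text does not confirm that the device uses exactly this rule.

```python
import math

def principal_value(phase):
    """Wrap a phase in radians into the range -pi <= P' < pi."""
    return (phase + math.pi) % (2 * math.pi) - math.pi

def remove_discontinuities(wrapped):
    """Unwrap a sequence of principal-value phases.

    Whenever two neighbouring values differ by pi or more, the running
    offset is changed by 2*pi so that the jump disappears.
    """
    unwrapped = [wrapped[0]]
    offset = 0.0
    for prev, cur in zip(wrapped, wrapped[1:]):
        step = cur - prev
        if step >= math.pi:
            offset -= 2 * math.pi
        elif step < -math.pi:
            offset += 2 * math.pi
        unwrapped.append(cur + offset)
    return unwrapped

# Hypothetical example: a phase that advances by 1 rad per analysis frame.
true_phase = [float(s) for s in range(12)]
wrapped = [principal_value(p) for p in true_phase]   # contains 2*pi jumps
recovered = remove_discontinuities(wrapped)          # jumps removed
```

This rule only recovers the original phase when the true change between neighbouring values is smaller than π, which is an assumption of the sketch rather than a statement from the text.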
What is claimed is: A speech control device comprising: means for extracting an input audio signal at each arbitrary time interval by multiplying it by a window function having coefficients and a length longer than the time delimited by that time interval; means for Fourier transforming each extracted signal to convert the time axis into the frequency axis; means for adjusting the phase of the converted signal; means for inverse Fourier transforming the phase-adjusted signal to convert the frequency axis back into the time axis; means for interpolating the inversely transformed signal at a predetermined magnification; and means for multiplying the interpolated signal by a window function having a time length and coefficients defined by the first window function and sequentially synthesizing the signals at each arbitrary time interval, the time axis being arbitrarily expanded or contracted for output.
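The two window functions named in this claim can be checked numerically. The sketch below takes the Hamming analysis window h(m) from the description and, as an assumed reading of the amended formula, a constant synthesis window f(m) equal to R' divided by the sum of the h(m) coefficients. It also assumes that the reconstruction condition is that the overlapped products h·f add up to 1 at every output sample; the exact form of that condition is not legible in the source, and the function name is hypothetical.

```python
import math

N = 256        # window length (M = 256 in the embodiment)
R_PRIME = 64   # synthesis update width R'

# Analysis window: Hamming coefficients as given in the description.
h = [0.54 - 0.46 * math.cos(2 * math.pi * m / (N - 1)) for m in range(N)]
# Synthesis window: constant R' / sum(h) (assumed reading of the amendment).
f = [R_PRIME / sum(h)] * N

def overlap_gain(n):
    """Sum of h*f over all frames covering sample n (frame j starts at j*R')."""
    total = 0.0
    j_last = n // R_PRIME
    for j in range(max(0, j_last - N // R_PRIME), j_last + 1):
        i = n - j * R_PRIME          # position of sample n inside frame j
        if 0 <= i < N:
            total += h[i] * f[i]
    return total

# Evaluate in the steady state, where N / R' = 4 frames overlap.
gains = [overlap_gain(n) for n in range(N, 2 * N)]
max_deviation = max(abs(g - 1.0) for g in gains)
print(max_deviation)
```

Under these assumptions the overlapped gain stays close to 1 but not exactly equal to it, because the Hamming window with the N-1 denominator does not overlap to a perfectly constant sum at this update width.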

Claims

[Claims]

A speech control device comprising: means for extracting an input audio signal for an arbitrary time length at each arbitrary time interval; means for Fourier transforming each extracted signal to convert the time axis into the frequency axis; means for adjusting the phase of the converted signal; means for inverse Fourier transforming the phase-adjusted signal to convert the frequency axis back into the time axis; means for interpolating the inversely transformed signal at a predetermined magnification; and means for sequentially synthesizing the interpolated signals at each desired time interval, the time axis being arbitrarily expanded or contracted for output.