JP3468337B2

JP3468337B2 - Interpolated tone synthesis method

Info

Publication number: JP3468337B2
Application number: JP03491497A
Authority: JP
Inventors: 直敏小坂
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1997-01-07
Filing date: 1997-02-19
Publication date: 2003-11-17
Anticipated expiration: 2017-02-19
Also published as: JPH10254500A

Description

Detailed Description of the Invention

[0001]

[Technical field of the invention] The present invention relates to an interpolated timbre (tone color) synthesis method that takes any two original sounds, chosen from all kinds of sound including speech and music, as the two end points and synthesizes a timbre interpolated between them at an arbitrary ratio.

[0002]

[Prior art] In conventional electronic musical instruments, users have created various timbres by methods such as (1) presets based on various schemes such as PCM and FM sound sources, (2) addition by the user newly recording sounds (sampling), (3) resynthesis after modifying and editing timbres expressed as the parameters of a model such as an FM sound source, and (4) processing such as various kinds of filtering and the addition of distortion.

[0003] Continuous or subtle timbre processing could be achieved to some extent by processing such as filtering and the addition of distortion. However, such processing only breaks down and deforms a single timbre; more advanced timbre control, in which a specific target timbre is given and the sound is interpolated toward it, was not possible.

[0004]

[Problem to be solved by the invention] An object of the present invention is to provide an interpolated timbre synthesis method that can synthesize a timbre at an arbitrary perceptual internal division point between two given timbres, including both end points, and that can realize timbre synthesis that changes continuously from one of the two given timbres to the other.

[0005]

[Means for solving the problem] According to the present invention, for two original sounds (having mutually different timbres), parameters that represent each original sound by a model are estimated at fixed time intervals; corresponding time points between the estimated parameters of the two original sounds are extracted; corresponding parameters among the estimated parameters of the two original sounds at the extracted corresponding time points are found; and the corresponding parameters thus found are interpolated according to the desired degree of closeness of the synthesized timbre to the two original sounds.

[0006]

[Embodiments of the invention] Embodiment 1. FIG. 1 shows the functional configuration of an interpolated timbre synthesis apparatus to which the method of the present invention is applied. For two original sound waveforms x and y, parameters used to represent each original sound by a signal model, for example partials, are estimated by a partial decomposition unit 11. In this example, the original sounds x and y are each waveforms divided into frames of, for example, about 5 msec, and these waveforms are expressed as follows.

[0007]

$$x = \{x_n \mid n = 0, 1, \ldots, N_x - 1\} \tag{1}$$
$$y = \{y_n \mid n = 0, 1, \ldots, N_y - 1\} \tag{2}$$

Here $N_x$ and $N_y$ are the total numbers of frames of the original sound waveforms x and y, respectively. In this example, the partial decomposition unit 11 performs on each original sound waveform the spectral analysis and organization needed for synthesis based on a sinusoidal superposition model, and decomposes the waveform into partials (sinusoids). The sinusoidal superposition model is expressed as follows.

[0008]

$$x(m) = \sum_{p=0}^{P-1} A_p(m) \cos(\theta_p(m)) \tag{3}$$

Here $x(m)$ is the signal of the m-th sample in the frame, $p$ is the partial number, $P$ is the number of partials, $\theta_p(m)$ is the instantaneous phase of partial $p$ at the m-th sample, and $A_p(m)$ is the instantaneous amplitude of partial $p$. The frame number $n$ is omitted in equation (3). Besides $A$ and $\theta$, $P$ also takes different values depending on the analysis frame.
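As an illustration of the sinusoidal superposition model of equation (3), the following minimal Python sketch sums a set of partials over one frame. The function name `synthesize_frame`, the tuple layout, and the sample rate are assumptions made for this example only; amplitude and frequency are held constant inside the frame, which is a simplification of equation (3), where both may vary from sample to sample.

```python
import math


def synthesize_frame(partials, num_samples, sample_rate):
    """Sum sinusoids: x(m) = sum_p A_p * cos(theta_p(m)).

    partials: list of (frequency_hz, amplitude, initial_phase) tuples
    (hypothetical layout). Frequency and amplitude are constant in the frame.
    """
    samples = []
    for m in range(num_samples):
        value = 0.0
        for freq, amp, phase0 in partials:
            # Instantaneous phase of a constant-frequency partial at sample m.
            theta = phase0 + 2.0 * math.pi * freq * m / sample_rate
            value += amp * math.cos(theta)
        samples.append(value)
    return samples


# Hypothetical frame: partials at 100 Hz and 200 Hz, 5 ms at 8 kHz (40 samples).
frame = synthesize_frame([(100.0, 1.0, 0.0), (200.0, 0.5, 0.0)], 40, 8000.0)
```

A practical synthesizer would interpolate amplitude and phase between frame boundaries instead of holding them constant.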

[0009] The information required for the representation of equation (3) is the spectral information of the partials at each frame boundary; the instantaneous (per-sample, index $m$) information can be calculated by interpolation or the like at synthesis time and is not needed here. Accordingly, the following is output as the spectral parameters of the original sound waveform x.

$$X = \{X_n \mid n = 0, 1, \ldots, N_x - 1\} \tag{4}$$
$$X_n = \{(f_{xp}^{n}, A_{xp}^{n}, \theta_{xp}^{n}) \mid p = 0, 1, \ldots, P_x(n) - 1\} \tag{5}$$

Here $f$ is the instantaneous frequency, which is the time derivative of $\theta$. In these equations $f$, $A$, and $\theta$ are boundary values, meaning $f_{xp}^{n} = f_{xp}^{n}(m = 0)$, $A_{xp}^{n} = A_{xp}^{n}(m = 0)$, and $\theta_{xp}^{n} = \theta_{xp}^{n}(m = 0)$. Similarly, $Y$ expressed below is calculated as the spectral parameters of the original sound waveform y.

[0010]

$$Y = \{Y_n \mid n = 0, 1, \ldots, N_y - 1\} \tag{6}$$
$$Y_n = \{(f_{yp}^{n}, A_{yp}^{n}, \theta_{yp}^{n}) \mid p = 0, 1, \ldots, P_y(n) - 1\} \tag{7}$$

Several methods for estimating the parameters of equations (4) to (7) have been proposed since Moorer's algorithm (1977). A representative one is the algorithm by McAulay and Quatieri (MQ algorithm, 1986), which obtains $X$ and $Y$ by detecting the frequency, amplitude, and phase of the local peaks (each peak other than noise components) obtained from the FFT.

[0011] Next, corresponding time points between the spectral parameters $X$ and $Y$ of the original sound waveforms obtained in this way are extracted by a time-point correspondence extraction unit 12. The original sound waveforms x and y considered here are, for example, utterances of the same phonemes (sentence), sounds of the same melody, sounds of the same rhythm, or musical tones played with the same playing technique, but with mutually different timbres. The correspondence between the time points at which perceptually the same event occurs in x and y is obtained.

[0012] FIG. 2 shows the result obtained after the time points of the parameters $X$ and $Y$ have been put into correspondence. The horizontal axis is the frame number and represents time. In this example $N_x = 9$ and $N_y = 11$; as the second ($k = 2$) correspondence datum, $X_2$ corresponds to $Y_1$, and as the third ($k = 3$) correspondence datum, $X_3$ corresponds to $Y_1$. The correspondence between the parameters $X$ and $Y$ is a matching by nonlinear expansion and contraction of time, and can be realized with the conventionally known DTW (Dynamic Time Warping) technique. As a result, information $C(k)$ representing the correspondence of time points is obtained.

[0013]

$$C(k) = (C_x(k), C_y(k)), \quad k = 0, 1, \ldots, K - 1 \tag{8}$$
$$C_x(k) \in \{i \mid i = 0, 1, \ldots, N_x - 1\} \tag{9}$$
$$C_y(k) \in \{j \mid j = 0, 1, \ldots, N_y - 1\} \tag{10}$$

In the example of FIG. 2, these are $(C_1(0), C_0(0))$, $(C_2(1), C_1(1))$, $(C_3(3), C_1(3))$, $(C_4(4), C_2(4))$, $(C_4(5), C_3(5))$, $(C_5(6), C_4(6))$, and so on.
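The time-point correspondence $C(k)$ described above can be sketched with a textbook dynamic time warping routine. The following Python example is only an illustration under stated assumptions: the function name `dtw_correspondence`, the unconstrained step pattern (horizontal, vertical, diagonal), and the scalar per-frame features are choices made for this sketch, not details given in the description.

```python
def dtw_correspondence(cost):
    """Return the warping path [(C_x(k), C_y(k)), ...] with minimum total cost.

    cost[i][j] is a frame-to-frame distance between X_i and Y_j.
    """
    n_x, n_y = len(cost), len(cost[0])
    inf = float("inf")
    acc = [[inf] * n_y for _ in range(n_x)]
    acc[0][0] = cost[0][0]
    for i in range(n_x):
        for j in range(n_y):
            if i == 0 and j == 0:
                continue
            best = min(
                acc[i - 1][j] if i > 0 else inf,
                acc[i][j - 1] if j > 0 else inf,
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            acc[i][j] = cost[i][j] + best
    # Backtrack from the last frame pair to the first.
    i, j = n_x - 1, n_y - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return path


# Hypothetical one-dimensional frame features for two sounds.
feat_x = [0, 1, 2, 3]
feat_y = [0, 1, 1, 2, 3]
cost = [[abs(a - b) for b in feat_y] for a in feat_x]
c = dtw_correspondence(cost)
```

In this toy example the second frame of `feat_x` is paired with two consecutive frames of `feat_y`, which is the kind of nonlinear time stretching the correspondence is meant to capture.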

[0014] Next, an interpolated timbre synthesis unit 13 synthesizes an interpolated timbre waveform $Z$ using the corresponding time points $C(k)$ extracted in this way, the estimated parameters $X$ and $Y$, and the degree of the synthesized timbre, that is, the ratio $\alpha$ of the original sound waveforms x and y in the synthesized sound. As shown in FIG. 3, the interpolated timbre synthesis unit 13 is broadly divided into an initial value calculation unit 15 and a single-frame interpolated timbre recurrence calculation unit 16. The initial value calculation unit 15 operates only when the count value of a counter storage unit 19, which counts the frame $n$, is $n = 0$. A spectrum correspondence search unit 17 searches for corresponding components between the parameters $X_0$ and $Y_0$ in the 0th frame $(C_x(0), C_y(0))$.

[0015] Next, a spectrum interpolation unit 18 interpolates the corresponding spectra $O_x$ and $O_y$ according to the interpolation information, that is, the synthesis ratio $\alpha$, to obtain a new spectrum (parameter) $Z_0$. The specific methods for searching for and interpolating corresponding spectra are described later. This new spectrum $Z_0$ is supplied as $Z_n$ to a single-frame interpolated timbre synthesis unit 21 in the single-frame interpolated timbre recurrence calculation unit 16. Meanwhile, a reference time point calculation unit 22 in the recurrence calculation unit 16 uses $C(k)$ of equation (8) obtained earlier and the interpolation information $\alpha_n$ to calculate the number $k_{mi}$ that associates $Z_{n+1}$ with the frame numbers of the spectra $X$ and $Y$ of the original sounds x and y used to calculate the interpolated spectrum $Z_{n+1}$ at $n + 1$, that is, initially at $n = 1$.

[0016] This is explained with reference to FIG. 4. For a general frame, the spectral parameters at time point $n$ are known from the analysis of the previous frame, and time point $n + 1$ is calculated. Let $n + 1 = 3$. When the interpolation information $\alpha_n$ is varied over time as shown by curve 23 in FIG. 4, in the example of FIG. 4 the spectrum $Z_3'$ of the synthesized timbre for $n = 3$ lies on line 24 connecting $C_x(4)$ and $C_y(4)$. Line 24 is the $k = 4$th value, and the spectra at the corresponding time points (frames) $C_x(4)$ and $C_y(4)$ are $X_4$ and $Y_2$, so $X_4$ and $Y_2$ are interpolated with $\alpha$ to obtain the new spectrum $Z_3'$.

[0017] On the other hand, when $\alpha$ is fixed, for example $\alpha = 0.3$, the synthesized timbre spectrum $Z_3$ for $n + 1 = 3$ does not lie on any line connecting the correspondence information $(C_x(k), C_y(k))$ of equation (8) obtained by the time-point correspondence extraction unit 12. Therefore, to obtain the original-sound spectra $X$ and $Y$ needed to calculate the interpolated (synthesized timbre) spectrum $Z_3$, the number $k = k_{mi}$ whose line connecting $C_x(k)$ and $C_y(k)$ is nearest is determined. In the example of FIG. 4, the line connecting $C_x(k)$ and $C_y(k)$ nearest to the interpolated spectrum $Z_3$ at $\alpha = 0.3$ and $n + 1 = 3$ is the straight line 24 for $k = 4$, so $Z_3$ is obtained using the spectra $X_4$ and $Y_2$ of $X$ and $Y$ in the frames $C_x(4)$ and $C_y(4)$ at the two ends of line 24. When there is no line connecting $C_x(k)$ and $C_y(k)$ that passes through the synthesized timbre $Z_{n+1}$, that is, no such $k$, the $k = k_{mi}$ nearest to $Z_{n+1}$ is obtained by the following equations.

[0018]

$$k_{mi} = \arg\min_k \left| (1 - \alpha) \cdot C_x(k) + \alpha \cdot C_y(k) - n \right| \tag{11}$$
$$k_{mi} \in \{k \mid k = 0, 1, \ldots, K - 1\} \tag{12}$$

Here $\arg\min |A(k)|$ means finding, among the given $k$, the $k$ that minimizes $|A(k)|$. In this example $C_x(k_{mi} = 3) = 4$ and $C_y(k_{mi} = 3) = 2$. Equations (11) and (12) can be said to mean the following: the corresponding frames $(C_x(k), C_y(k))$ of the two original sounds are extracted for each frame, and the extracted corresponding frame is chosen such that the time point obtained by interpolating on that corresponding frame (not limited to linear interpolation) according to the desired degree $\alpha$ of closeness of the synthesized timbre to the two original sounds is nearest to the time point represented by the frame of the synthesized timbre that is to be newly calculated.
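The selection rule of equations (11) and (12) amounts to a one-line arg-min over the correspondence list. The Python sketch below illustrates it; the function name `nearest_correspondence` and the sample correspondence list are hypothetical, and ties are broken in favor of the smallest $k$, which the description does not specify.

```python
def nearest_correspondence(c, alpha, n):
    """Return k minimizing |(1 - alpha) * C_x(k) + alpha * C_y(k) - n|.

    c is a list of (C_x(k), C_y(k)) pairs; n is the target output frame.
    """
    return min(
        range(len(c)),
        key=lambda k: abs((1.0 - alpha) * c[k][0] + alpha * c[k][1] - n),
    )


# Hypothetical correspondence list, not the one in FIG. 2.
pairs = [(0, 0), (1, 1), (1, 2), (2, 3), (3, 4)]
k_mi = nearest_correspondence(pairs, alpha=0.3, n=2)
```

With $\alpha = 0.3$ the interpolated time points of the five pairs are 0, 1, 1.3, 2.3, and 3.3, so the pair at index 3 is nearest to output frame 2.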

[0019] Once the spectra $X_i$ and $Y_j$ ($i = C_x(k)$, $j = C_y(k)$) at the corresponding time points (frames) $C_x(k)$, $C_y(k)$ for the interpolation information $\alpha_n$ have been obtained in this way, a new spectrum $Z_{n+1}$ is created by interpolating the two spectra $X_i$ and $Y_j$ according to $\alpha_n$. The interpolation is performed as follows. First, a spectrum correspondence search unit 26 finds the corresponding components of the spectra $X_i$ and $Y_j$. Here $X_i$ and $Y_j$ have different numbers of elements, and the corresponding elements within a frame differ over time. A distance is therefore defined between the elements of these two vectors of different dimension, and partners are determined so that the match is optimal as a whole. In this case, if the cost of not finding a partner is defined in the same dimension as the distance, the partners can be determined by DP (dynamic programming) on the principle of minimizing the total cost. A weighted Euclidean distance with instantaneous frequency and level as the vector elements is conceivable as this distance measure. Considering the case of equal pitch, a more natural correspondence is obtained by normalizing the instantaneous frequency by the fundamental frequency, giving a (real-valued) number corresponding to the harmonic number, rather than by using the instantaneous frequency itself. That is, in the latter case the same harmonics correspond to each other.

[0020] Here the spectrum $X_i$ of frame $i$ and the spectrum $Y_j$ of frame $j$ corresponding to $k_{mi}$ are given. The dimensions $P_x(n)$ and $P_y(n)$ of the spectra are of variable length and differ depending on the frame number $n$ and on the original sound waveforms x and y. The optimal partners for $X_i$ and $Y_j$ are found by the above algorithm. The correspondences found in this way are recorded in the following form; that is, the correspondences found are stored temporarily.

[0021]

$$O_x(p_x), \quad p_x = 0, 1, \ldots, P_x(i)$$
$$O_y(p_y), \quad p_y = 0, 1, \ldots, P_y(j)$$

When a correspondence exists, it can be expressed as follows.

$$O_x(p_x) \in \{p_y \mid p_y = 0, 1, \ldots, P_y(j) - 1\}$$
$$O_y(p_y) \in \{p_x \mid p_x = 0, 1, \ldots, P_x(i) - 1\}$$

Elements with no correspondence are distinguished by defining a value other than those used when a correspondence exists, for example $O_x(p_x) = -1$. When there is no partner, $O_y(p_y)$ need not be recorded, since it is redundant.
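One way to realize the partner search described above is a sequence-alignment style dynamic program in which skipping a partial costs a fixed amount expressed in the same units as the distance. The Python sketch below is an assumption-laden illustration, not the patented procedure itself: the function name `match_partials`, the monotonic (frequency-ordered) alignment, the weights, and the sample values are all hypothetical. It returns the maps $O_x$ and $O_y$ with $-1$ marking partials without a partner.

```python
import math


def match_partials(xs, ys, skip_cost, freq_weight=1.0, level_weight=0.0):
    """Align two frequency-ordered partial lists by dynamic programming.

    xs, ys: lists of (normalized_frequency, level) tuples (hypothetical layout).
    Returns (o_x, o_y); o_x[p] is the index of the partner in ys, or -1.
    """
    def dist(a, b):
        # Weighted Euclidean distance between two partials.
        return math.sqrt(freq_weight * (a[0] - b[0]) ** 2
                         + level_weight * (a[1] - b[1]) ** 2)

    nx, ny = len(xs), len(ys)
    # cost[i][j]: minimum total cost of aligning xs[:i] with ys[:j].
    cost = [[0.0] * (ny + 1) for _ in range(nx + 1)]
    for i in range(1, nx + 1):
        cost[i][0] = i * skip_cost
    for j in range(1, ny + 1):
        cost[0][j] = j * skip_cost
    for i in range(1, nx + 1):
        for j in range(1, ny + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + dist(xs[i - 1], ys[j - 1]),  # pair them
                cost[i - 1][j] + skip_cost,                       # skip an x partial
                cost[i][j - 1] + skip_cost,                       # skip a y partial
            )
    # Backtrack to recover which partials were paired.
    o_x, o_y = [-1] * nx, [-1] * ny
    i, j = nx, ny
    while i > 0 and j > 0:
        if cost[i][j] == cost[i - 1][j - 1] + dist(xs[i - 1], ys[j - 1]):
            o_x[i - 1], o_y[j - 1] = j - 1, i - 1
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + skip_cost:
            i -= 1
        else:
            j -= 1
    return o_x, o_y


# Hypothetical normalized frequencies; levels are ignored (level_weight = 0).
o_x, o_y = match_partials(
    [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)],
    [(1.0, 0.0), (3.0, 0.0)],
    skip_cost=0.4,
)
```

The skip cost acts as a threshold: two partials are paired only when their distance is smaller than the cost of leaving both unmatched.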

[0022] For example, suppose that the spectrum of $X_i$ has a fundamental frequency of 100 Hz, as shown in FIG. 5A, with elements $O_x(0)$, $O_x(1)$, $O_x(2)$, $O_x(3)$, $O_x(4)$ for harmonic numbers 0, 1, 2, 3, 4, while the spectrum of $Y_j$ has a fundamental frequency of 200 Hz, as shown in FIG. 5B, with elements $O_y(0)$, $O_y(1)$, $O_y(2)$ for harmonic numbers 0, 1, 2. Then $O_x(0)$ corresponds to $O_y(0)$, so $O_x(0) = 0$; $O_x(1)$ has no correspondence, so $O_x(1) = -1$; $O_x(2)$ corresponds to $O_y(1)$, so $O_x(2) = 1$; $O_x(3)$ corresponds to $O_y(2)$, so $O_x(3) = 2$; and $O_x(4)$ has no correspondence, so $O_x(4) = -1$. Similarly, $O_y(0)$ corresponds to $O_x(0)$, so $O_y(0) = 0$, and $O_y(1) = 2$, $O_y(2) = 3$, $O_y(3) = -1$.

[0023] Once the corresponding components of $X_i$ and $Y_j$ have been found and recorded as described above, the spectrum interpolation unit 27 uses the recorded $X_i$, $Y_j$ and $\alpha_n$ to perform physical interpolation by $\alpha_n$ between spectral components for which a correspondence was found, and interpolation toward the zero value for components with no correspondence, thereby creating the new synthesized timbre spectrum $Z_{n+1}$. That is, for example, when $O_x(\,)$ and $O_y(\,)$ correspond, $\alpha O_x(\,) + (1 - \alpha) O_y(\,) = O_z(\,)$ is computed, and when there is no counterpart, the result of the interpolation toward zero is taken as $O_z(\,)$.
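The interpolation step can be sketched as follows in Python. This is a minimal illustration under assumptions: the function name `interpolate_spectra` and the (frequency, amplitude) tuple layout are hypothetical, unmatched partials keep their own frequency while their amplitude fades toward zero, and the weighting convention follows equation (11), so that $\alpha = 0$ reproduces x and $\alpha = 1$ reproduces y.

```python
def interpolate_spectra(x, y, o_x, alpha):
    """Blend two partial lists given the correspondence map o_x.

    x, y: lists of (frequency, amplitude); o_x[p] indexes y, or is -1.
    Assumed convention: alpha = 0 gives x, alpha = 1 gives y.
    """
    z = []
    matched_y = set()
    for p, (fx, ax) in enumerate(x):
        q = o_x[p]
        if q >= 0:
            fy, ay = y[q]
            matched_y.add(q)
            z.append(((1 - alpha) * fx + alpha * fy,
                      (1 - alpha) * ax + alpha * ay))
        else:
            # No partner in y: interpolate the amplitude toward zero.
            z.append((fx, (1 - alpha) * ax))
    for q, (fy, ay) in enumerate(y):
        if q not in matched_y:
            # No partner in x: this partial fades in from zero.
            z.append((fy, alpha * ay))
    z.sort()
    return z


# Hypothetical spectra: one matched pair and one unmatched partial of x.
z_next = interpolate_spectra(
    [(100.0, 1.0), (200.0, 0.5)], [(110.0, 0.8)], [0, -1], alpha=0.5
)
```

At the end points the unmatched partials vanish or appear at full level, so the result coincides with one of the two input spectra.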

[0024] Managing the components by harmonic number as described above avoids producing abnormal sounds. When the original sounds contain unvoiced portions, (1) for an unvoiced portion paired with an unvoiced portion, an arbitrary harmonic bandwidth is set in common for the two original sounds for convenience, and (2) for an unvoiced portion paired with a voiced portion, the harmonic bands of the unvoiced portion are set equal to those of the voiced portion. From the new timbre spectrum $Z_{n+1}$ synthesized in this way and $Z_n$ calculated in the previous frame, the single-frame interpolated timbre synthesis unit 21 calculates the timbre waveform $Z_n$. For this waveform synthesis, various known methods such as the following can be used.

[0025]

1. A method in which sinusoidal synthesis is performed within each frame using the parameters obtained from the frame-by-frame analysis, and adjacent frames are then overlap-added.
2. The MQ algorithm mentioned above, in which the instantaneous phase function is expressed by a cubic polynomial.
3. As shown in FIG. 6, a method in which, within the MQ algorithm, the spectrum correspondence search unit 26 used in FIG. 3 is used again to connect the spectral local peaks of one frame to those of the next frame.

[0026] This last method makes the apparatus easy to configure because it reuses the same calculation unit. In FIG. 6, the spectrum correspondence search unit 26 finds the corresponding spectra $O_{z_n}$ and $O_{z_{n+1}}$ of $Z_n$ and $Z_{n+1}$. Since these corresponding spectra determine the boundary spectral information in equation (3), a single-frame sound synthesis unit 29 obtains the interpolated data for each sample point $m$ between the two boundaries, that is, the synthesized waveform expressed by equation (3).

[0027] In the description of FIG. 3, once the synthesized timbre spectrum $Z_{n+1}$ has been obtained by the spectrum interpolation unit 27, it is stored in a storage unit 28 as $Z_n$, and the frame count $n$ in the storage unit 19 is incremented by 1. The reference time point calculation unit 22 then again calculates the $k_{mi}$ corresponding to $\alpha_n$ at frame $n + 1$, the corresponding components of the spectra $X_i$ and $Y_j$ for that $k_{mi}$ are searched for, and spectral interpolation is performed to obtain $Z_{n+1}$.

Embodiment 2. When two original sounds more complex than simple stationary sounds are used as the two end sounds, for example the same sentence spoken with different voice qualities, or two original sounds differing in pitch or power, pitch and loudness are perceptually independent elements and must each be interpolated individually. The spectral envelope, which governs the phonemes, should also be physically interpolated in such a way that the phonemes are preserved while the timbre is interpolated. In Embodiment 2, therefore, as shown in FIG. 7, at the initial stage of analysis not only is partial decomposition performed, but a pitch extraction unit 31 extracts the pitches $P_x$ and $P_y$ from the original sound waveforms x and y, a power extraction unit 32 extracts the powers $a_x$ and $a_y$, and a spectral envelope extraction unit 33 obtains the spectral envelopes $E_x$ and $E_y$.

[0028] Using these parameters, the parameter interpolation, that is, the spectral interpolation, in the spectrum interpolation unit 27 of FIG. 3 is performed as shown in FIG. 8. The spectrum interpolation unit 27 receives the frame numbers $n_x$ ($= C_x(k_{mi})$) and $n_y$ ($= C_y(k_{mi})$) of the corresponding frames, the numbers $p_x$ and $p_y$ ($= O_x(p_x)$) of the mutually corresponding partials in the spectra of these frames, the physical interpolation constant $\alpha$ specified by the user, the pitches $P_x$ and $P_y$, the powers $a_x$ and $a_y$, and the spectral envelopes $E_x$ and $E_y$.

[0029] The spectral envelopes $E_x$ and $E_y$ are interpolated so that the target spectral envelope is given to the desired synthesized timbre. That is, the corresponding frame numbers $n_x$ and $n_y$, the spectral envelopes $E_x$ and $E_y$, and the interpolation information $\alpha_n$ are input to a peak correspondence search unit 35, which calculates the corresponding peaks by the same technique as the search method of the spectrum correspondence search unit 26, based on a distance whose vector elements are the peak frequencies and levels of $E_x$ and $E_y$. A spectral envelope interpolation unit 36 then linearly interpolates the corresponding peaks according to $\alpha_n$ to obtain the target spectral envelope $E_z$.

[0030] Meanwhile, the amplitude $A_{xp_x}^{n_x}$ of the original sound waveform x in frame $n_x$ and the amplitude $A_{yp_y}^{n_y}$ of the original sound waveform y in frame $n_y$ are linearly interpolated by an amplitude interpolation unit 37 using the interpolation information $\alpha_n$. The spectral envelope of the interpolated amplitudes is calculated by a spectral envelope calculation unit 38, and the deviation of this spectral envelope from the previously obtained target spectral envelope $E_z$ is calculated by a spectral envelope correction amount calculation unit 39. This deviation is added by an adder 41 to the interpolated instantaneous amplitudes obtained by the amplitude interpolation unit 37, so that the spectral envelope of the interpolated instantaneous amplitudes becomes the target one. In this way the target spectral envelope is given to the desired synthesized timbre.

[0031] The following processing may be added when giving the target spectral envelope to the desired synthesized timbre. That is, the powers $a_x$ (dB) and $a_y$ (dB) extracted in frames $n_x$ and $n_y$ are linearly interpolated by a power interpolation unit 42 using the interpolation information $\alpha_n$ to obtain the target power value $a_z$, and a level adjustment unit 43 adjusts the level of the interpolated amplitudes whose spectral envelope has been corrected by the adder 41 so that it becomes $a_z$, yielding the instantaneous amplitude $A_{zp}^{n+1}$ of the synthesized timbre.
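The power interpolation and level adjustment just described can be sketched as below. This Python example rests on assumptions that the description does not state: the frame power is taken as the summed mean power of the sinusoidal partials (each contributing $A^2/2$), the dB reference is unit power, and the convention $\alpha = 0$ for x and $\alpha = 1$ for y is used. The function name `adjust_level` is hypothetical.

```python
import math


def adjust_level(amplitudes, a_x_db, a_y_db, alpha):
    """Scale partial amplitudes so the frame power matches the target in dB.

    The target a_z is the linear interpolation of the two powers in dB.
    Power is assumed to be sum(A^2) / 2, relative to unit power.
    """
    target_db = (1 - alpha) * a_x_db + alpha * a_y_db
    power = sum(a * a for a in amplitudes) / 2.0
    if power == 0.0:
        return list(amplitudes), target_db  # silent frame: nothing to scale
    current_db = 10.0 * math.log10(power)
    gain = 10.0 ** ((target_db - current_db) / 20.0)
    return [a * gain for a in amplitudes], target_db


# Hypothetical frame: two unit-amplitude partials (0 dB), targets -6 and -12 dB.
scaled, a_z = adjust_level([1.0, 1.0], -6.0, -12.0, 0.5)
# Power of the scaled frame in dB, computed the same way for checking.
scaled_db = 10.0 * math.log10(sum(a * a for a in scaled) / 2.0)
```

Because a single gain is applied to all partials, the shape of the corrected spectral envelope is preserved while the overall level is moved to the target.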

[0032] When interpolating pitch, the logarithm of the pitch is taken so that the result corresponds well with perception, and the pitches $P_x$ and $P_y$ extracted in frames $n_x$ and $n_y$ are linearly interpolated by a pitch interpolation unit 45 using the interpolation information $\alpha_n$. Based on this interpolated pitch $P_z$, an interpolation coefficient recalculation unit 46 recalculates the interpolation constant to be used when interpolating the instantaneous frequencies, and supplies the result $\alpha_n'$ to a frequency interpolation unit 47. That is, if the instantaneous frequencies are interpolated with $\alpha_n$ itself, the pitch of the interpolated synthesized waveform deviates slightly, so it sounds better to correct the interpolation information $\alpha_n$ according to the pitch $P_z$. In this way the frequency interpolation unit 47 interpolates the instantaneous frequencies $f_{xp_x}^{n_x}$ and $f_{yp_y}^{n_y}$ with $\alpha_n'$ to obtain the interpolated instantaneous frequency $f_{zp}^{n+1}$.
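The description does not give the formula used to recalculate the interpolation constant, so the following Python sketch shows only one plausible reading, stated as an assumption: the corrected coefficient is chosen so that linear interpolation of the two fundamental frequencies lands exactly on the log-interpolated pitch $P_z$. The function names are hypothetical, and the convention $\alpha = 0$ for x and $\alpha = 1$ for y is assumed.

```python
import math


def interpolate_pitch(p_x, p_y, alpha):
    """Interpolate two pitches (Hz) linearly on a logarithmic scale."""
    log_pz = (1 - alpha) * math.log(p_x) + alpha * math.log(p_y)
    return math.exp(log_pz)


def recompute_alpha(p_x, p_y, p_z, alpha):
    """Assumed rule: pick alpha' so that (1 - alpha') p_x + alpha' p_y = p_z."""
    if p_x == p_y:
        return alpha  # identical pitches: no correction is needed
    return (p_z - p_x) / (p_y - p_x)


# Hypothetical pitches two octaves apart, interpolated halfway.
p_z = interpolate_pitch(100.0, 400.0, 0.5)
alpha_prime = recompute_alpha(100.0, 400.0, p_z, 0.5)
```

Here the halfway point on the log scale is 200 Hz rather than the 250 Hz that plain linear interpolation would give, so the corrected coefficient is about one third instead of one half.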

[0033] Furthermore, generating the synthesized timbre waveform $z_{n+1}$ requires the instantaneous phase $\theta_{zp}^{n+1}$ in addition to the instantaneous frequency $f_{zp}^{n+1}$ and the instantaneous amplitude $A_{zp}^{n+1}$. To obtain this instantaneous phase $\theta_{zp}^{n+1}$, a phase calculation unit 48 receives the instantaneous phases $\theta_{xp_x}^{n_x}$ and $\theta_{yp_y}^{n_y}$ of the original sound waveforms and, rather than interpolating them with $\alpha_n$, determines the phase in one of several ways: by regarding each partial as a chirp signal within the frame and determining the phase from the instantaneous frequency alone, by making it random, or by interpolating the phase differences between adjacent frames of each of the two original sounds.
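For the chirp option, the phase at the next frame boundary follows from integrating a frequency that changes linearly across the frame, which gives an advance of $2\pi$ times the mean of the two boundary frequencies times the frame duration. The Python sketch below illustrates this; the function name `advance_phase` and the numeric values are hypothetical.

```python
import math


def advance_phase(theta_n, f_n, f_next, frame_seconds):
    """Phase at the next boundary for a partial treated as a linear chirp.

    Integrating a linearly varying frequency over the frame yields
    2 * pi * mean(f_n, f_next) * frame_seconds of phase advance.
    The result is wrapped into [0, 2 * pi).
    """
    theta = theta_n + 2.0 * math.pi * 0.5 * (f_n + f_next) * frame_seconds
    return math.fmod(theta, 2.0 * math.pi)


# Hypothetical partial gliding from 100 Hz to 120 Hz over a 5 ms frame.
theta_next = advance_phase(0.0, 100.0, 120.0, 0.005)
expected = 2.0 * math.pi * 0.55  # 0.55 cycles elapsed in the frame
```

This keeps each partial phase-continuous from frame to frame without referring to the phases of the original sounds.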

[0034] The level adjustment in the level adjustment unit 43 may be omitted. Also, $\alpha_n$ may be used as the interpolation information in the frequency interpolation unit 47, and the pitch interpolation unit 45 and the interpolation coefficient recalculation unit 46 may be omitted. The spectrum interpolation unit 27 in Embodiment 1 (FIG. 3) performs spectral interpolation as described earlier, and this is carried out by the amplitude interpolation unit 37, the frequency interpolation unit 47, and the phase calculation unit 48.

[0035] In the above description, the time-point correspondence extraction unit 12 extracts corresponding frames as the corresponding time points of the original sound waveforms x and y, but more accurate timbre synthesis is expected if strictly corresponding time points are obtained. Accordingly, a corresponding time may be obtained for each sample time point of each original sound waveform. In that case, partial decomposition is performed over a certain interval at each sample time point. Also, the parameters used to represent the original sound by a model are not limited to partials; spectra, waveforms, and the like may be used.

[0036] In FIG. 1, the partial sound decomposition unit 11, the time point correspondence extraction unit 12, and the interpolated tone color synthesis unit 13 have basically been assumed to run as offline calculations rather than in real time. However, if the results of the partial sound decomposition unit 11 and the time point correspondence extraction unit 12 are computed in advance and a reasonably fast arithmetic unit is used, the user can specify α in real time and the interpolated tone color synthesis unit 13 can perform its calculation in real time; that is, the interpolated tone color waveform can be generated in real time. Depending on the purpose of use, the interpolated synthesized tone color parameter Z _{n + 1} may be obtained and the waveform synthesized elsewhere. In other words, the characteristic feature of the present invention lies in the steps up to obtaining the interpolated synthesized tone color parameter Z _{n + 1} .
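
Because the parameters can be handed off for waveform synthesis elsewhere, the receiving side needs only a sinusoidal synthesizer. The following is a minimal sketch for one frame, assuming each partial is given as (amplitude, frequency in Hz, initial phase in radians) and held constant within the frame. These simplifications and all names are assumptions, not the synthesis procedure of the specification.

```python
import math

def synthesize_frame(partials, sample_rate, num_samples):
    """Sum of stationary sinusoids for one frame.

    partials: list of (amp, freq_hz, phase_rad) tuples (assumed layout).
    Amplitude and frequency are held constant over the frame, which is a
    simplification made for illustration.
    """
    out = []
    for i in range(num_samples):
        t = i / sample_rate  # time of sample i in seconds
        out.append(sum(a * math.cos(2.0 * math.pi * f * t + ph)
                       for a, f, ph in partials))
    return out
```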

【００３７】[0037]

[Effects of the Invention] As described above, according to the present invention, any attribute common to the two original sounds is retained in the interpolated tone color; for example, the interpolated tone color of an adult male voice and an adult female voice is still a human voice, that is, speech. Unlike tone color control in a direction with no target, such as simply adding distortion or noise, the invention can create a synthesized tone color between one original sound and the other that is close to one of them to any desired degree. Either of the two original sounds can of course also be output as is, so the degree of freedom in tone color control increases. Furthermore, tone color morphing, in which the tone color changes continuously from one original sound to the other, can be performed.

[0038] In particular, according to the second embodiment, linguistic features common to the two original sounds, such as phonemes and prosody, are preserved in the synthesized interpolated tone color, and perceptually independent elements such as pitch and power are also interpolated when they differ between the two original sounds. This allows a musical instrument to offer functions that no electronic musical instrument has had before. In computer graphics, perceptually continuous deformation is called morphing and is already in practical and widespread use; sound morphing can be performed together with such computer graphics morphing.

[Brief description of drawings]

FIG. 1 is a block diagram showing the functional configuration of an interpolated tone color synthesis apparatus to which the method of the present invention is applied.

FIG. 2 is a diagram showing an example of the correspondence between the two original sounds in the time point correspondence extraction unit 12 in FIG. 1.

FIG. 3 is a block diagram showing the functional configuration of the interpolated tone color synthesis unit 13 in FIG. 1.

FIG. 4 is an explanatory diagram showing how the reference time point calculation unit 22 in FIG. 3 obtains, from the time point to be synthesized and the interpolation information α, the time points of the two original sounds used for interpolation.

FIG. 5 is a diagram showing example spectra of the two original sounds, for explaining the corresponding spectra obtained by the spectrum correspondence search unit 26 in FIG. 3.

FIG. 6 is a block diagram showing an example of the functional configuration of the single-frame interpolated tone color synthesis unit 21 in FIG. 3.

FIG. 7 is a block diagram showing the functional configuration of the initial process in the second embodiment.

FIG. 8 is a block diagram showing functional information of the spectrum interpolation unit in FIG. 3 in the second embodiment.

Claims

(57) [Claims]

1. An interpolated tone color synthesizing method for generating, from two original sounds, a synthesized tone color with a desired closeness to those original sounds, comprising: a step of estimating, at regular time intervals, parameters that model each of the two original sounds on the basis of a sine wave superposition model; a step of extracting corresponding time points between the estimated parameters of the two original sounds; a step of finding corresponding parameters among the estimated parameters by defining a distance between the elements of the vectors of different dimensions of the two original sounds at the extracted corresponding time points, defining the cost of failing to find a correspondence in the same dimension as the distance, and applying the idea of minimum total cost; and a step of creating new parameters by interpolating the found corresponding parameters according to the desired degree of closeness of the synthesized tone color to the two original sounds.

2. An interpolated tone color synthesizing method for generating, from two original sounds, a synthesized tone color with a desired closeness to those original sounds, comprising: a step of estimating, at regular time intervals, parameters that model each of the two original sounds; a step of extracting corresponding time points between the estimated parameters of the two original sounds; a step of finding corresponding parameters among the estimated parameters of the two original sounds at the extracted corresponding time points; and a step of creating new parameters by interpolating the found corresponding parameters according to the desired degree of closeness of the synthesized tone color to the two original sounds, wherein the step of creating the new parameters by interpolation according to the desired degree of closeness includes: a step of interpolating the extracted spectral envelopes of the two original sounds to obtain a target spectral envelope and applying the target spectral envelope to the desired synthesized tone color; a step of interpolating the instantaneous frequencies of the two original sounds; and a step of determining the instantaneous phase of the synthesized tone color.

3. The interpolated tone color synthesizing method according to claim 2, wherein the step of extracting the corresponding time points includes: a step of extracting corresponding frames at every fixed time (frame) and numbering them in time order; and a step of selecting the extracted corresponding frames such that the time obtained by interpolating them according to the desired closeness of the synthesized tone color to the two original sounds is closest to the time represented by the frame to be newly calculated for the synthesized tone color, and setting the selected frames as the extracted corresponding time points, and wherein the parameters in the two selected corresponding frames are used as the found corresponding parameters.

4. The interpolated tone color synthesizing method according to claim 2, wherein the step of applying the target spectral envelope to the desired synthesized tone color includes: a step of interpolating the amplitudes of the two original sounds; a step of obtaining the spectral envelope of the interpolated amplitudes; a step of extracting the difference between the spectral envelope of the interpolated amplitudes and the interpolated spectral envelope; and a step of adding the spectral envelope difference to the interpolated amplitudes.

5. The interpolated tone color synthesizing method according to any one of claims 2 to 4, wherein the step of applying the target spectral envelope to the desired synthesized tone color includes: a step of extracting the powers of the two original sounds; a step of interpolating the extracted powers to determine a target power; and a step of adjusting the power of the desired synthesized tone color so that it equals the target power.

6. The interpolated tone color synthesizing method according to any one of claims 2 to 5, wherein the step of interpolating the instantaneous frequencies includes: a step of extracting the pitches of the two original sounds; a step of interpolating the extracted pitches at corresponding time points to determine a target pitch; a step of recalculating, from the target pitch, the constant representing the desired closeness to the two original sounds; and a step of interpolating the instantaneous frequencies of the two original sounds using the recalculated constant representing the closeness.

7. The interpolated tone color synthesizing method according to claim 2, further comprising a step of synthesizing a waveform using the new parameter.
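
The minimum-total-cost correspondence recited in claim 1 can be sketched as a small dynamic program over the partial frequencies of the two sounds at one corresponding time point. This is an illustration under stated assumptions, not the claimed procedure itself: the absolute frequency difference as the distance, the restriction to order-preserving matches, and the names `match_partials` and `skip_cost` are all hypothetical.

```python
def match_partials(fx, fy, skip_cost):
    """Order-preserving matching of two ascending frequency lists.

    Pairing fx[i] with fy[j] costs |fx[i] - fy[j]|; leaving a partial
    unmatched costs skip_cost (same unit as the distance). Dynamic
    programming minimizes the total cost. The distance measure and the
    monotonic-order restriction are assumptions.
    Returns a list of (i, j) index pairs.
    """
    m, n = len(fx), len(fy)
    # cost[i][j]: minimum cost of aligning fx[:i] with fy[:j]
    cost = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        cost[i][0] = i * skip_cost
    for j in range(1, n + 1):
        cost[0][j] = j * skip_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + abs(fx[i - 1] - fy[j - 1]),  # pair them
                cost[i - 1][j] + skip_cost,   # fx[i-1] stays unmatched
                cost[i][j - 1] + skip_cost,   # fy[j-1] stays unmatched
            )
    # Trace back from the end to recover the chosen pairs.
    pairs, i, j = [], m, n
    while i > 0 and j > 0:
        if cost[i][j] == cost[i - 1][j - 1] + abs(fx[i - 1] - fy[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + skip_cost:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]
```

For partials at 100, 200, and 300 Hz in one sound and 105 and 310 Hz in the other, with a skip cost of 50, the 200 Hz partial is left unmatched because pairing it with either neighbor would cost more than skipping it.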