JP2932481B2

JP2932481B2 - Pitch detection method

Info

Publication number: JP2932481B2
Application number: JP63292936A
Authority: JP
Inventors: 雅一鈴置
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1988-11-19
Filing date: 1988-11-19
Publication date: 1999-08-09
Anticipated expiration: 2014-08-09
Also published as: JPH02138831A

Description

[Industrial field of application]

The present invention relates to a method for detecting the pitch of, for example, a musical tone, and in particular to a pitch detection method that uses an audio processing unit (APU) for digitally processing musical tones.

[Summary of the Invention]

The present invention provides a pitch detection method in which an input digital signal, obtained by digitizing an analog signal, is Fourier-transformed, the phases of the resulting frequency components are aligned, the result is Fourier-transformed again, and the period of the peaks in the output data is detected. This makes it possible to detect the pitch of the analog signal with high accuracy from a small number of samples.

[Conventional technology]

Sound sources used in electronic musical instruments, video game machines, and the like are broadly divided into analog sound sources, which consist of, for example, a VCO, VCA, and VCF, and digital sound sources such as the PSG (programmable sound generator) and the waveform-ROM readout type. As one kind of digital sound source, the sampler sound source has become widely known in recent years. It samples live instrument sounds and the like, processes them digitally, and stores the resulting sound source data in a memory or similar medium for use (see, for example, JP-A-62-264099 and JP-A-62-267798).

Because a sampler sound source generally requires a large memory capacity to store the sound source data, various memory-saving techniques have been proposed. Typical examples are looping, which exploits the periodicity of a musical tone waveform, and bit compression by nonlinear quantization or the like. Looping is also a technique for sustaining a sound longer than the original duration of the sampled tone. Consider a musical tone signal waveform: immediately after the onset of the sound there is generally a formant portion whose waveform contains non-pitch components, such as the key-strike noise of a piano or the breath noise of a wind instrument, and whose periodicity is unclear. After that, the same waveform repeats at the fundamental period corresponding to the pitch of the tone. By taking n periods (n an integer) of this repeating waveform as a looping section and playing it back repeatedly as needed, a long sustained sound can be obtained with a small memory capacity.

[Problems to be solved by the invention]

In the looping described above, a conventional way to find the pitch of a musical tone is to apply a low-pass filter (LPF) to the waveform of the tone data to remove high-frequency noise, count the zero-cross points of the filtered waveform to obtain the frequency of the tone data waveform, and thereby measure the pitch. However, this method cannot measure the pitch frequency unless a large number of zero-cross points are counted, so the tone must be sustained for a long time. It is therefore difficult to apply to sounds that die away quickly.
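The zero-cross counting approach can be sketched as follows. This is a minimal illustration, not the conventional circuit itself: the function name `zero_cross_pitch` is hypothetical, the low-pass filtering step is omitted, and the input is assumed to be a clean, already filtered tone.

```python
import math

def zero_cross_pitch(samples, fs):
    """Estimate frequency from the spacing of positive-going zero crossings."""
    crossings = [i for i in range(1, len(samples))
                 if samples[i - 1] < 0.0 <= samples[i]]
    if len(crossings) < 2:
        return None  # too few crossings: the tone is too short to measure
    # Average spacing, in samples, between the first and last crossing
    mean_period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return fs / mean_period

fs = 8000
tone = [math.sin(2 * math.pi * 200 * n / fs + 0.3) for n in range(400)]
estimate = zero_cross_pitch(tone, fs)        # ten crossings available
too_short = zero_cross_pitch(tone[:30], fs)  # less than one period: None
```

The second call illustrates the drawback named in the text: with only a fraction of a period available, no estimate can be formed at all.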

Another way to find the pitch is to apply a fast Fourier transform (FFT) to the tone data, detect the peak in the transformed data, and measure that peak. With this method, however, when the pitch frequency is low relative to the sampling frequency fs, the peak at the fundamental frequency cannot be extracted effectively and the accuracy is poor. In addition, in some musical tones the fundamental component is far smaller than the harmonic components, and in that case too it is difficult to extract the peak at the fundamental frequency effectively.

The present invention has been proposed in view of these circumstances. Its object is to provide a highly accurate pitch detection method that can detect the pitch of a sound source from sound source data with a small number of samples, and whose detection accuracy varies little with the frequency of the sound source data.

[Means for solving the problem]

To achieve the above object, the pitch detection method according to the present invention comprises: a step of Fourier-transforming an input digital signal obtained by digitizing an analog signal; a step of taking the absolute value of each resulting frequency component; a step of Fourier-transforming the absolute values of the frequency components again; and a step of detecting the pitch of the analog signal by detecting the period of the peaks in the resulting output data.

A specific example is described with reference to the flowchart of FIG. 1. In step S11, an input digital signal obtained by digitizing an analog signal is taken in. In step S12 it is Fourier-transformed. In step S13 the absolute value is taken, which aligns the phases of the resulting frequency components. In step S14 the result is Fourier-transformed again. In step S15 the period of the peaks in the output data is detected, and the pitch of the analog signal is thereby detected.

[Action]

According to the present invention, a fast Fourier transform (FFT) is applied to the input digital signal obtained by digitizing an analog signal, the phase-difference components of the transformed signal are forcibly set to zero, and FFT processing (an inverse FFT) is then applied again. This makes the waveform of the fundamental (the pitch) distinct, so that the peak corresponding to the fundamental frequency is easier to measure.
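The transform, absolute value, transform, peak-period sequence can be sketched in a few lines. This is a hedged illustration under stated assumptions, not the patented implementation: a naive O(n²) DFT stands in for the FFT, the function names are hypothetical, and the rule "first local maximum reaching 90% of the zero-lag value" is a simplification chosen here in place of the peak sorting described later in the text.

```python
import cmath
import math

def dft(values):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * m * k / n)
                for k, v in enumerate(values))
            for m in range(n)]

def zero_phase_waveform(samples):
    spectrum = dft(samples)                  # first transform
    magnitudes = [abs(c) for c in spectrum]  # absolute value: every phase becomes zero
    second = dft(magnitudes)                 # second transform of the real magnitudes
    return [c.real / len(samples) for c in second]

def detect_pitch(samples, fs, threshold=0.9):
    """Return fs / lag for the first strong peak after lag 0 (threshold is an assumption)."""
    y = zero_phase_waveform(samples)
    for lag in range(1, len(y) // 2):
        is_local_max = y[lag] > y[lag - 1] and y[lag] >= y[lag + 1]
        if is_local_max and y[lag] >= threshold * y[0]:
            return fs / lag
    return None

# Test tone: 250 Hz fundamental that is much weaker than its 2nd and 3rd harmonics
fs = 8000
tone = [0.2 * math.sin(2 * math.pi * 250 * n / fs + 0.4)
        + 1.0 * math.sin(2 * math.pi * 500 * n / fs + 1.3)
        + 0.7 * math.sin(2 * math.pi * 750 * n / fs + 2.1)
        for n in range(256)]
pitch = detect_pitch(tone, fs)
```

The test tone is chosen so that the strongest spectral line is the second harmonic. The peak spacing of the zero-phase waveform still corresponds to the 250 Hz fundamental, because all harmonics reach their maxima together once per fundamental period.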

[Example]

Before describing the embodiment, the looping mentioned above is briefly explained with reference to the musical tone signal waveform shown in FIG. 2. Immediately after the onset of a sound, non-pitch components such as the key-strike noise of a piano or the breath noise of a wind instrument are generally present, producing a formant portion FR in which the periodicity of the waveform is unclear. After that, the same waveform repeats at the fundamental period corresponding to the pitch of the tone. n periods (n an integer) of this repeating waveform are taken as the looping section LP, which lies between two looping points: the looping start point LP_S and the looping end point LP_E. The formant portion FR and the looping section LP are recorded on a storage medium, and on playback the formant portion FR is reproduced first, followed by repeated reproduction of the looping section LP, so that a musical tone can be generated for an arbitrarily long time.

An embodiment of the present invention is described below with reference to the drawings. Needless to say, the invention is not limited to this embodiment.

FIG. 3 is a functional block diagram showing a specific example of the functions, from sampling an input musical tone signal to recording it on a storage medium, when the sound source data compression encoding method of the embodiment is applied to a sound source data forming apparatus. The input musical tone signal supplied to the input terminal 10 may be, for example, a signal picked up directly by a microphone or a signal reproduced from a digital audio recording medium, in either analog or digital form.

First, the sampling function block 11 of FIG. 3 samples the input musical tone signal at a frequency of, for example, 38 kHz and outputs it as digital data of 16 bits per sample. When the input musical tone signal is analog, this sampling corresponds to A/D conversion. When it is digital, it corresponds to sampling-rate conversion and bit-depth conversion.

Next, the pitch detection function block 12 detects the pitch information of the digital musical tone signal obtained by the sampling, that is, the frequency f₀ of the fundamental (the fundamental frequency) that determines the pitch of the tone.

The detection principle of the pitch detection function block 12 is as follows. In a musical tone signal used as a sampled sound source, the fundamental frequency is often considerably lower than the sampling frequency fs, and simply detecting the peak of the tone on the frequency axis rarely identifies the pitch with high accuracy. Some means of exploiting the spectrum of the harmonic components of the tone is therefore needed.

Let f(t) be the waveform of the musical tone signal whose pitch is to be detected. Expressed in terms of the amplitude a(ω) and phase φ(ω) of each harmonic component, f(t) can be written as the Fourier expansion

$$f(t) = \sum_{\omega} a(\omega)\cos\bigl(\omega t + \phi(\omega)\bigr)$$

If the phase shifts φ(ω) of all the harmonics are set to zero, this becomes

$$\tilde{f}(t) = \sum_{\omega} a(\omega)\cos(\omega t)$$

The peaks of the phase-aligned waveform f̃(t) lie at t = 0 and at the points that are integer multiples of the periods of all the harmonics of f(t). The spacing of these peaks is nothing other than the period of the fundamental.
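A short worked step shows why the peak spacing equals the fundamental period. It assumes the idealized case of exactly harmonic components with non-negative amplitudes; the indexed symbols a_n, ω_0, M, and T_0 are introduced here only for illustration.

```latex
% Phase-aligned waveform with harmonics at integer multiples of \omega_0
\tilde{f}(t) = \sum_{n=1}^{M} a_n \cos(n \omega_0 t),
\qquad a_n \ge 0, \qquad T_0 = \frac{2\pi}{\omega_0}

% At every multiple of the fundamental period all cosines equal 1
\tilde{f}(k T_0) = \sum_{n=1}^{M} a_n \cos(2\pi n k) = \sum_{n=1}^{M} a_n
\qquad (k = 0, 1, 2, \dots)

% No other instant can exceed this value, since \cos(\cdot) \le 1
\tilde{f}(t) \le \sum_{n=1}^{M} a_n \quad \text{for all } t
```

Equality requires every cosine to equal 1 at the same instant. With a_1 > 0 this happens only at t = kT_0, so the largest peaks are spaced exactly one fundamental period apart.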

Based on this principle, the pitch detection procedure is described with reference to the functional block diagram of FIG. 4.

In FIG. 4, the tone data is supplied from the real-part data input terminal 31, and "0" from the imaginary-part data input terminal 32, to the fast Fourier transform (FFT) function block 33.

For the fast Fourier transform performed in the FFT function block 33, let x(t) be the musical tone signal whose pitch is to be estimated, and let the harmonic components contained in x(t) be

$$a_n \cos(2\pi f_n t + \theta_n)$$

Then x(t) is

$$x(t) = \sum_n a_n \cos(2\pi f_n t + \theta_n)$$

Rewriting this in complex notation, with ω_n = 2πf_n, gives

$$x(t) = \frac{1}{2}\sum_n a_n \left\{ e^{j(\omega_n t + \theta_n)} + e^{-j(\omega_n t + \theta_n)} \right\}$$

where the identity cos θ = (exp(jθ) + exp(−jθ))/2 has been used. Fourier-transforming this expression gives, up to a constant factor that depends on the transform convention,

$$X(\omega) = \frac{1}{2}\sum_n a_n \left\{ e^{j\theta_n}\,\delta(\omega - \omega_n) + e^{-j\theta_n}\,\delta(\omega + \omega_n) \right\}$$

where δ(ω − ω_n) is the delta function.

The next function block 34 calculates the norm of the transformed data (the absolute value, that is, the square root of the sum of the squares of the real and imaginary parts).

That is, taking the absolute value Y(ω) of X(ω) cancels the phase components:

$$Y(\omega) = |X(\omega)| = \frac{1}{2}\sum_n a_n \left\{ \delta(\omega - \omega_n) + \delta(\omega + \omega_n) \right\}$$

This is done in order to align the phases of all the high-frequency components of the tone data. With the imaginary part set to zero, the phase components are aligned.

Next, the calculated norm is supplied as real-part data to the fast Fourier transform function block 36 (which in this case acts as an inverse FFT), "0" is supplied to the imaginary-part data input terminal 35, and the inverse FFT is applied to restore the tone data. The inverse Fourier transform is

$$y(t) = \mathcal{F}^{-1}\bigl[Y(\omega)\bigr] = \sum_n a_n \cos(\omega_n t)$$

The tone data restored by this inverse transform is obtained as a waveform that can be expressed as a sum of cosine waves in which the phases of all the high-frequency components are aligned.

The peak detection function block 37 then detects the peaks of the restored sound source data. Each such peak is a point at which the extrema (peaks) of all the high-frequency components of the tone data coincide. The next function block 38 sorts the detected peak values in descending order. By measuring the period of the detected peaks, the pitch of the musical tone signal can be found.

FIG. 5 illustrates a configuration for detecting the local maxima (peaks) of the tone data in the peak detection function block 37 of FIG. 4.

The tone data here contains many peaks (extrema) of different values. The pitch of the tone can be found by finding the maxima of the tone data and detecting their period.

In FIG. 5, the tone data sequence after the inverse Fourier transform is supplied via the input terminal 41 to an (N+1)-stage shift register 42 and passes through its registers a_{-N/2}, …, a₀, …, a_{N/2} in turn to the output terminal 43. The (N+1)-stage shift register 42 acts on the tone data sequence as a window N+1 samples wide, and the N+1 samples in the window are sent to the maximum value detection circuit 44. That is, the tone data enters register a_{-N/2} first and is shifted stage by stage to register a_{N/2}, and the N+1 tone data samples held in registers a_{-N/2}, …, a₀, …, a_{N/2} are sent to the maximum value detection circuit 44.

When the value in, for example, the central register a₀ of the shift register 42 is the largest of the N+1 sample values, the maximum value detection circuit 44 detects the data in register a₀ as a peak value and outputs it from the output terminal 45. The window width N+1 can be set arbitrarily.
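The sliding-window peak detector and the subsequent sort can be sketched in software as follows. This is a minimal sketch, assuming an even N so that the window has a well-defined center; the function names `window_peaks` and `sort_peaks` are hypothetical, and a list slice stands in for the shift register.

```python
def window_peaks(data, n):
    """Return (index, value) pairs whose value is the maximum of the
    (n + 1)-sample window centered on that index (n assumed even)."""
    half = n // 2
    peaks = []
    for i in range(half, len(data) - half):
        window = data[i - half:i + half + 1]  # contents of the N+1 registers
        if data[i] == max(window):            # center register holds the maximum
            peaks.append((i, data[i]))
    return peaks

def sort_peaks(peaks):
    """Sort detected peaks in descending order of value."""
    return sorted(peaks, key=lambda p: p[1], reverse=True)

data = [0, 1, 3, 1, 0, 2, 5, 2, 0, 1, 2, 1, 0]
peaks = window_peaks(data, 4)   # [(2, 3), (6, 5), (10, 2)]
ranked = sort_peaks(peaks)      # [(6, 5), (2, 3), (10, 2)]
```

A wider window suppresses minor local maxima between the main peaks. Note that on a flat plateau this sketch reports every sample of the plateau as a peak, which a practical detector would need to handle.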

Returning to FIG. 3, the envelope detection function block 13 applies envelope detection, using the pitch information, to the sampled digital musical tone signal and obtains the so-called envelope waveform of the signal. This is the waveform shown in FIG. 6B, obtained by successively connecting the peak points of a musical tone signal waveform such as that in FIG. 6A, and it represents the change in level (or volume) over time from the onset of the sound. The envelope waveform is commonly expressed by parameters such as ADSR (attack time / decay time / sustain level / release time). Taking as an example a piano sound produced by striking a key: the attack time T_A is the time from when the key is pressed (key-on) until the volume gradually rises to the target level; the decay time T_D is the time from the level reached at the end of the attack time T_A until the next level (for example, the sustained level of the instrument) is reached; the sustain level L_S is the level of the sustained sound, held until the key is released (key-off); and the release time T_R is the time from key-off until the sound dies away. The times T_A, T_D, and T_R may instead indicate the slope or rate of the volume change. More envelope parameters than these four may also be used.

Along with the envelope waveform information expressed by parameters such as ADSR (attack time T_A / decay time T_D / sustain level L_S / release time T_R), the envelope detection function block 13 also obtains information indicating the overall decay rate of the signal waveform, so that the formant portion can be extracted with its attack waveform intact. As shown for example in FIG. 7, this decay rate information represents a waveform that holds the reference value "1" from the onset of the sound (key-on) through the attack time T_A and then decays monotonically.

A configuration example of the envelope detection function block 13 of FIG. 3 is described with reference to the functional block diagram of FIG. 8.

The principle of this envelope detection is the same as that of envelope detection of an AM (amplitude-modulated) signal: the envelope is detected by regarding the pitch of the musical tone signal as the carrier frequency of an AM signal. The envelope information is used when the tone is reproduced, the tone being formed from the envelope information and the pitch information.

The absolute value output function block 52 takes the absolute value of the wave-height data of the tone data supplied to the input terminal 51 of FIG. 8. This absolute-value data is sent to the function block 55, an FIR (finite impulse response) digital filter. The FIR filter function block 55 acts as a low-pass filter. Its cutoff characteristic is determined by filter coefficients that the function block 54 generates in advance from the pitch information supplied to the input terminal 53 and supplies to the FIR filter function block 55.

The filter characteristic is, for example, that shown in FIG. 9, with zeros at the frequency f₀ of the fundamental of the musical tone signal and at the frequencies of its harmonics. For example, from the musical tone signal shown in FIG. 6A, the FIR filter attenuates the fundamental and harmonic frequencies, and envelope information such as that shown in FIG. 6B is detected. The characteristic given by the filter coefficients is

$$H(f) = k \cdot \frac{\sin(\pi f / f_0)}{f}$$

where f₀ is the fundamental frequency (pitch) of the musical tone signal.
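One FIR filter with a response of this sin(πf/f₀)/f form is a moving average spanning exactly one pitch period, which has zeros at f₀ and its harmonics. The sketch below uses that filter as an assumption; the text does not state the actual coefficients. The function name `envelope` is hypothetical, and the pitch period is assumed to be a whole number of samples.

```python
import math

def envelope(samples, period):
    """Rectify, then average over one pitch period (boxcar FIR low-pass).

    Returns len(samples) - period + 1 envelope values.
    """
    rectified = [abs(s) for s in samples]  # absolute value of each sample
    out = []
    acc = 0.0
    for i, v in enumerate(rectified):
        acc += v
        if i >= period:
            acc -= rectified[i - period]   # drop the sample leaving the window
        if i >= period - 1:
            out.append(acc / period)
    return out

# Constant-amplitude tone, 40 samples per period: the envelope should be flat
tone = [0.8 * math.sin(2 * math.pi * n / 40 + 0.7) for n in range(400)]
env = envelope(tone, 40)
```

For a sine of amplitude A the rectified mean is 2A/π, so the output tracks a scaled copy of the amplitude. The scale factor can be absorbed into the constant k.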

Next, the process of generating, from the wave-height data (sampled data) of the sampled musical tone signal, the wave-height data of the formant portion FR shown in FIG. 2 and the wave-height data of the looping section LP (loop data) is described.

In the first function block 14 for generating the loop data, the wave-height data of the sampled musical tone signal is divided by (or multiplied by the reciprocal of) the data of the previously detected envelope waveform (FIG. 6B). This envelope correction yields wave-height data of a signal with constant amplitude, as shown in FIG. 10. The envelope-corrected signal is then filtered to obtain a signal in which everything other than the pitch components is attenuated or, equivalently, the pitch components are relatively emphasized. Here the pitch components are the frequency components at integer multiples of the fundamental frequency f₀. Specifically, the envelope-corrected signal is first passed through an HPF (high-pass filter) to remove low-frequency components such as vibrato. It is then passed through a comb filter with the frequency characteristic shown by the dash-dot line in FIG. 11, whose passbands lie at integer multiples of the fundamental frequency f₀, so that only the pitch components of the HPF output pass and the non-pitch and noise components are attenuated. Finally, if necessary, it is passed through an LPF (low-pass filter) to remove noise superimposed on the comb filter output.

Consider a musical tone signal, such as the sound of an instrument, as the input signal. Because such a signal normally has a definite pitch, its frequency spectrum, shown by the solid line in FIG. 11, has its energy concentrated near the fundamental frequency f₀ corresponding to the pitch of the tone and near the integer multiples of f₀. Ordinary noise components, by contrast, are known to have a uniform frequency distribution. Passing the input musical tone signal through a comb filter with the frequency characteristic shown by the dash-dot line in FIG. 11 therefore passes or emphasizes only the frequency components at integer multiples of the fundamental frequency f₀ (the pitch components) and attenuates the rest (the non-pitch components and part of the noise), which improves the S/N ratio. The frequency characteristic of the comb filter shown by the dash-dot line in FIG. 11 is

$$H(f) = \left[\frac{\cos(2\pi f / f_0) + 1}{2}\right]^{N}$$

where f₀ is the fundamental frequency of the input signal (the frequency of the fundamental corresponding to the pitch) and N is the number of stages of the comb filter.
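One way to realize this characteristic is a cascade of identical three-tap stages. A stage y[n] = x[n−P]/4 + x[n]/2 + x[n+P]/4, with P the pitch period in samples, has the response (cos(2πf/f₀) + 1)/2, and N stages in series raise it to the N-th power. The sketch below takes that structure as an assumption, since the text gives only the frequency characteristic. The function names are hypothetical, P is assumed to be a whole number of samples, and samples outside the buffer are treated as zero.

```python
import math

def comb_stage(x, period):
    """One zero-phase comb stage: y[n] = x[n-P]/4 + x[n]/2 + x[n+P]/4."""
    n = len(x)

    def at(i):
        return x[i] if 0 <= i < n else 0.0  # outside the buffer counts as zero

    return [0.25 * at(i - period) + 0.5 * x[i] + 0.25 * at(i + period)
            for i in range(n)]

def comb_filter(x, period, stages):
    """Cascade of `stages` identical comb stages."""
    y = list(x)
    for _ in range(stages):
        y = comb_stage(y, period)
    return y

period = 20
on_pitch = [math.sin(2 * math.pi * n / period) for n in range(200)]         # f = f0
off_pitch = [math.sin(2 * math.pi * 1.5 * n / period) for n in range(200)]  # f = 1.5 * f0
kept = comb_filter(on_pitch, period, 2)
removed = comb_filter(off_pitch, period, 2)
```

Away from the buffer edges, the component at f₀ passes unchanged while the component midway between two harmonics is cancelled. Increasing the number of stages narrows the passbands around each harmonic.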

The musical tone signal whose noise components have been reduced in this way is sent to the repeating-waveform extraction circuit, which extracts a suitable repeating-waveform section such as the looping section LP of FIG. 2. The result is then sent to a storage medium such as a semiconductor memory and recorded. Because the non-pitch components and part of the noise in the recorded tone data have been attenuated, the noise that arises when the repeating-waveform section is played back repeatedly, so-called looping noise, is reduced.

The frequency characteristics of the HPF, comb filter, and LPF are set on the basis of the fundamental frequency f₀, the pitch information previously detected by the pitch detection function block 12.

Next, the loop section detection function block 16 of FIG. 3 detects a suitable repeating-waveform section in the musical tone signal whose non-pitch components have been attenuated by the filtering, and thereby sets the looping points: the looping start point LP_S and the looping end point LP_E.

That is, the loop section detection function block 16 selects looping points, two points separated by the repetition period corresponding to the pitch of the musical tone signal (or an integer multiple of it). The selection principle is explained below.

When tone data is looped, the looping interval must be an integer multiple of the fundamental period of the musical tone signal (the reciprocal of the fundamental frequency). The interval can therefore be determined easily once the pitch of the tone has been identified accurately.

In other words, the looping interval is determined in advance, two points separated by that interval are taken, and the looping points are set by evaluating the correlation or similarity of the signal waveforms in the neighborhoods of the two points. As one example of an evaluation function, the use of a convolution (sum of products) over the samples of the signal waveform near each of the two points is described. The convolution is computed in turn for every pair of points to evaluate the correlation or similarity of the waveforms. For this evaluation, the tone data is, for example, fed sequentially into a shift register, the tone data held in each register is fed to a sum-of-products unit formed by, for example, a DSP (digital signal processor) described later, and the unit computes and outputs the convolution. The pair of points for which the convolution obtained in this way is largest is taken as the looping start point LP_S and the looping end point LP_E.

That is, in FIG. 12, let a₀ be a candidate for the looping start point LP_S and b₀ a candidate for the looping end point LP_E. Let the peak value data of a plurality of points, for example 2N+1 points, in the vicinity before and after the candidate a₀ be a_-N, ..., a_-2, a_-1, a₀, a₁, a₂, ..., a_N, and let the peak value data of the same number (2N+1) of points in the vicinity before and after the candidate b₀ be b_-N, ..., b_-2, b_-1, b₀, b₁, b₂, ..., b_N. The evaluation function E(a₀, b₀) can then be defined by the following equation:

$$E(a_0, b_0) = \sum_{k=-N}^{N} a_k \, b_k$$

This equation gives the convolution centered on the points a₀ and b₀. The pair of candidate points a₀, b₀ is changed in turn, the value of the evaluation function E is obtained for every point that is a looping point candidate, and the pair for which E is largest among all the values obtained is taken as the looping points.
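The search over candidate pairs can be sketched in Python. This is a minimal illustration rather than the implementation of the embodiment: it assumes that the evaluation function is the sum of products of corresponding samples around a₀ and b₀, and that b₀ is always a₀ plus a fixed looping interval. The function names `correlation_score` and `find_loop_points` are hypothetical.

```python
def correlation_score(x, a0, b0, n):
    """E(a0, b0): sum of products of the 2n+1 samples around a0 and around b0."""
    return sum(x[a0 + k] * x[b0 + k] for k in range(-n, n + 1))


def find_loop_points(x, interval, n):
    """Scan every start candidate a0 (with b0 = a0 + interval) and keep the
    first pair that gives the largest evaluation value."""
    best = None
    for a0 in range(n, len(x) - interval - n):
        b0 = a0 + interval
        e = correlation_score(x, a0, b0, n)
        if best is None or e > best[0]:
            best = (e, a0, b0)
    return best[1], best[2]


# A waveform with a period of 4 samples, searched with a 3-sample window (n = 1).
wave = [0, 1, 0, -1] * 3
print(find_loop_points(wave, interval=4, n=1))
```

Because the score is a plain sum of products, it grows with signal level as well as with similarity, which is one reason the error-based criterion described next can be a useful alternative.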

Besides the method based on convolution described above, the looping points can also be obtained by the method of least squares of the error. That is, the evaluation function for the looping point candidates a₀, b₀ by the least squares method can be expressed by the following equation:

$$\varepsilon(a_0, b_0) = \sum_{k=-N}^{N} (a_k - b_k)^2$$

In this case, it suffices to find the a₀, b₀ that minimize the value of the evaluation function ε.
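The least squares criterion can be sketched in the same way. The sketch assumes that ε is the sum of squared differences between corresponding samples around a₀ and b₀, and again fixes b₀ at a₀ plus the looping interval; `squared_error` and `find_loop_points_lsq` are hypothetical names.

```python
def squared_error(x, a0, b0, n):
    """epsilon(a0, b0): sum of squared differences over the 2n+1 samples
    around a0 and around b0."""
    return sum((x[a0 + k] - x[b0 + k]) ** 2 for k in range(-n, n + 1))


def find_loop_points_lsq(x, interval, n):
    """Return the first candidate pair that minimizes the squared error."""
    candidates = range(n, len(x) - interval - n)
    a0 = min(candidates, key=lambda a: squared_error(x, a, a + interval, n))
    return a0, a0 + interval


# An irregular onset followed by a steady waveform with a period of 4 samples.
wave = [9, 7, 0, 1, 0, -1, 0, 1, 0, -1, 0]
print(find_loop_points_lsq(wave, interval=4, n=1))
```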

The loop section detection function block 16 also calculates, as required, a pitch conversion ratio based on the looping start point LP_S and the looping end point LP_E. This pitch conversion ratio is used as time axis correction value data in the time axis correction processing of the next functional block 17. The time axis correction is performed so that the pitches of the various sound source data are made uniform when they are actually recorded in storage means such as a memory. The pitch information detected in the pitch detection function block 12 may be used instead of the pitch conversion ratio.

The pitch normalization operation in the time axis correction function block 17 is described with reference to FIG. 13.

FIG. 13A shows the musical tone signal waveform before the time axis correction processing (mainly time axis companding, that is, compression and expansion), and FIG. 13B shows the corrected waveform after the companding. The time axes of FIGS. 13A and 13B are graduated in units of the blocks used in the quasi-instantaneous bit compression encoding described later.

In the waveform A before time axis correction, the looping section LP normally bears no relation to the blocks. As shown in FIG. 13B, however, the waveform is companded along the time axis so that the looping section LP becomes an integer multiple (m times) of the block length (block period), and is further shifted along the time axis so that block boundaries coincide with the looping start point LP_S and the looping end point LP_E. In other words, by applying time axis correction (companding and shifting) so that the start point LP_S and the end point LP_E of the looping section LP fall on block boundaries, looping can be performed in units of an integer number (m) of blocks, and the pitch of the sound source data is normalized at the time of recording. The offset ΔT from a block boundary that the time shift produces at the head of the musical tone signal waveform may be filled with "0" as peak value data.
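The arithmetic behind this correction can be illustrated with a short Python sketch. It is not taken from the source: choosing m as the nearest integer to the original loop length divided by h is an assumption, as are the hypothetical function name `plan_time_axis_correction` and the convention that the returned padding is the number of "0" samples inserted at the head (the offset ΔT expressed in samples).

```python
def plan_time_axis_correction(lp_s, lp_e, h):
    """Return (m, ratio, pad) for a loop from sample lp_s to lp_e and block size h.

    m     : number of blocks the loop occupies after correction (assumed: nearest integer)
    ratio : time axis companding ratio applied to the whole waveform
    pad   : zero samples inserted at the head so the loop start lands on a block boundary
    """
    loop_len = lp_e - lp_s
    m = max(1, round(loop_len / h))
    ratio = m * h / loop_len
    new_start = round(lp_s * ratio)   # loop start position after companding
    pad = (-new_start) % h            # distance up to the next block boundary
    return m, ratio, pad


m, ratio, pad = plan_time_axis_correction(lp_s=100, lp_e=310, h=16)
print(m, round(ratio, 4), pad)
```

With these example values the 210-sample loop is compressed to 13 blocks of 16 samples, and 13 zero samples move the loop start from sample 99 to the block boundary at sample 112.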

FIG. 14 shows the block structure used when the peak value data of the time-axis-corrected waveform is divided into blocks for the bit compression encoding described later; the number of peak value data (number of samples, number of words) in one block is h. Here, pitch normalization generally means companding the time axis so that the number of words in n periods of the waveform of constant period Tw of the musical tone signal shown in FIG. 2, that is, the number of words in the looping section LP, becomes an integer multiple (m times) of the number of words h in a block. More preferably, it also means time axis processing (shifting) that brings the start point LP_S and the end point LP_E of the looping section LP into coincidence with block boundaries on the time axis. When the points LP_S and LP_E coincide with block boundaries in this way, errors caused by block switching during decoding in the bit compression encoding system can be reduced.

In FIG. 14A, the hatched words WLP_S and WLP_E within a block are the words representing the samples at the looping start point LP_S and the looping end point LP_E (more precisely, the point immediately before LP_E) of the corrected waveform. When the shift processing is not performed, the looping start point LP_S and end point LP_E do not necessarily coincide with block boundaries, so the words WLP_S and WLP_E may lie at arbitrary positions within a block, as shown in FIG. 14B. Even so, the number of words from the word WLP_S to the word WLP_E is an integer multiple (m times) of the number of words h in one block, and the pitch is normalized.

Various methods are conceivable for companding the musical tone signal waveform along the time axis so that the number of words in the looping section LP becomes an integer multiple of the number of words h in one block, as described above. It can be realized, for example, by interpolating the peak value data of the sampled waveform; as one specific example, a filter configuration for oversampling can be used.
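As a rough illustration of time axis companding by interpolation, the following Python sketch resamples a waveform by a given ratio. Linear interpolation is used here purely for brevity and is an assumption; the source suggests an oversampling filter configuration, which would give better fidelity. The name `resample_linear` is hypothetical.

```python
def resample_linear(x, ratio):
    """Stretch (ratio > 1) or compress (ratio < 1) the time axis of x by
    linear interpolation between neighbouring samples."""
    n_out = int(round((len(x) - 1) * ratio)) + 1
    out = []
    for i in range(n_out):
        pos = i / ratio               # position on the original time axis
        j = int(pos)
        frac = pos - j
        if j >= len(x) - 1:
            out.append(float(x[-1]))  # clamp at the last input sample
        else:
            out.append(x[j] * (1 - frac) + x[j + 1] * frac)
    return out


print(resample_linear([0, 2, 4], 2))
```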

The looping period of an actual musical tone waveform may have a fractional part with respect to the sampling period, so that the sampled peak value at the looping start point LP_S differs from that at the looping end point LP_E. In that case, interpolation using oversampling or the like can be used to find, at a position near the looping end point LP_E (at a distance shorter than the sampling period), a peak value that matches the sampled peak value at the looping start point LP_S, thereby realizing a looping period that includes interpolated samples and is a non-integer multiple of the sampling period (that is, has a fractional part). Such a looping period can also be made an integer multiple of the block period by the time axis correction processing. For example, when the time axis companding uses 256-times oversampling, the error in peak value between the looping start point LP_S and the end point LP_E can be reduced to 1/256, giving smoother looped playback.

The waveform for which the looping section LP has been determined and the time axis correction (companding) has been applied as described above is passed to the next functional block 21, where loop data is generated by joining looping sections LP end to end, as shown in FIG. 15. That is, FIG. 15 shows a loop data waveform obtained by cutting out only the looping section LP from the time-axis-corrected musical tone waveform (FIG. 13B) and arranging a plurality of copies of it. In this loop data waveform, the looping end point LP_E of each looping section LP is connected to the looping start point LP_S of the next. This loop data waveform is generated by the loop data generation function block 21.

Because the loop data is formed by joining the looping section LP many times, the data of the end block containing the word WLP_E corresponding to the looping end point LP_E (more precisely, the point immediately before LP_E) is placed, unchanged, immediately before the start block containing the word WLP_S corresponding to each looping start point LP_S of the loop data waveform. In principle, it suffices that at least the end block is present immediately before the start block of the looping section LP₀ to be stored when the bit compression encoding is performed. Stated more generally, at the time of block-by-block bit compression encoding, the parameters of the start block (the bit compression encoding information for each compressed block, for example the range information and filter selection information described later) need only be formed on the basis of the data of the start block and the end block. This technique is also applicable when the sound source consists only of a loop data musical tone signal without the formant portion described later.

In this way, at encoding time the same data is lined up over several samples before and after the looping start point LP_S and the end point LP_E. The bit compression encoding parameters for the blocks immediately preceding the points LP_S and LP_E are therefore the same, and errors (noise) during looped playback at decoding can be reduced. That is, the looped musical tone data is stable and free of junction noise. In this embodiment, about 500 samples of looping section LP data are placed immediately before the start block.

Next, in the data generation process for the signal of the formant portion FR, envelope correction is first applied in functional block 18, as in functional block 14 used for loop data generation. In this case, however, the envelope correction divides the sampled musical tone signal by the envelope waveform containing only the decay rate information described above (FIG. 7), yielding (the peak value data of) a signal with a waveform such as that shown in FIG. 16. In the output signal of FIG. 16, the envelope of the attack portion (during the time T_A) is retained, and the remainder has constant amplitude.
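The division by a decay-only envelope can be sketched as follows. The exponential shape of the envelope and the parameter `decay_per_sample` are assumptions made for illustration; this passage does not specify the shape of the envelope in FIG. 7, and `remove_decay` is a hypothetical name.

```python
import math


def remove_decay(x, decay_per_sample):
    """Divide each sample by an assumed exponential decay envelope
    exp(-decay_per_sample * n), leaving any attack shape untouched."""
    return [s / math.exp(-decay_per_sample * n) for n, s in enumerate(x)]


# A constant-level signal with exponential decay applied, then removed again.
decayed = [2.0 * math.exp(-0.1 * n) for n in range(5)]
print([round(v, 6) for v in remove_decay(decayed, 0.1)])
```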

This envelope-corrected signal is filtered in functional block 19 as required. The filtering in functional block 19 uses a comb filter similar to that of functional block 15, with a frequency characteristic such as that shown by the dash-dot line in FIG. 11. This comb filter has a frequency characteristic that emphasizes the frequency band components at integer multiples of the fundamental frequency f₀ corresponding to the pitch and relatively attenuates the non-pitch components; its frequency characteristic is likewise set on the basis of the pitch information (fundamental frequency f₀) detected by the pitch detection function block 12. The resulting signal is used to generate the data of the formant portion signal in the sound source data finally recorded on a storage medium such as a memory.
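The source does not give the structure of the filter behind FIG. 11. As one assumed example of a comb filter whose passbands sit at integer multiples of f₀, the sketch below averages each sample with the sample one pitch period earlier, where the period in samples is the sampling frequency divided by f₀ (assumed here to be an integer). The name `comb_filter` is hypothetical.

```python
def comb_filter(x, period):
    """Feedforward comb: y[n] = 0.5 * (x[n] + x[n - period]).
    Components repeating every `period` samples pass unchanged; components
    that invert after `period` samples (odd multiples of f0/2) cancel."""
    return [0.5 * (x[n] + (x[n - period] if n >= period else 0.0))
            for n in range(len(x))]


pitched = [1, 0, -1, 0] * 3                 # repeats every 4 samples
off_pitch = [1, 1, 1, 1, -1, -1, -1, -1] * 2  # inverts every 4 samples
print(comb_filter(pitched, 4)[4:])
print(comb_filter(off_pitch, 4)[4:])
```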

In the next functional block 20, time axis correction similar to that of functional block 17 is applied to the signal for formant portion generation. This compresses or expands the time axis on the basis of the pitch conversion ratio obtained in functional block 16 or the pitch information detected in functional block 12, so that the pitch of each sound source is made uniform (normalized).

Next, in functional block 22, the loop data and the formant portion generation data, both time-axis-corrected using the same pitch conversion ratio or pitch information, are mixed. For the mixing, a Hamming window is applied to the formant portion generation signal from functional block 20 to form a fade-out signal that decays with time over the portion to be mixed with the loop data. A similar Hamming window is applied to the loop data from functional block 21 to form a fade-in signal that grows with time over the portion to be mixed with the formant signal. These signals are mixed (cross-faded) to obtain the musical tone signal that finally becomes the sound source data. As the loop data recorded on a storage medium such as a memory, the data of one looping section located some distance from the cross-fade portion is taken, which reduces noise during looped playback (looping noise). In this way, peak value data is obtained for a sound source signal consisting of the formant portion FR, a waveform portion containing non-pitch components from the onset of the sound, and the looping section LP, a repetitive waveform portion containing only pitch components.
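The cross-fade can be sketched in Python as follows. The ramps are the rising and falling halves of a Hamming window; because those two halves sum to 1.08 rather than 1, the sketch divides by 1.08 so that the overall level is preserved. That normalization, and the names `hamming_fade_in` and `crossfade`, are assumptions of this sketch and not details given in the source.

```python
import math


def hamming_fade_in(length):
    """Rising half of a Hamming window, scaled so fade-in + fade-out = 1."""
    return [(0.54 - 0.46 * math.cos(math.pi * i / (length - 1))) / 1.08
            for i in range(length)]


def crossfade(formant_tail, loop_head):
    """Mix the end of the formant signal (fading out) with the start of the
    loop data (fading in). Both inputs must have the same length (>= 2)."""
    fade_in = hamming_fade_in(len(formant_tail))
    return [f * (1.0 - g) + l * g
            for f, l, g in zip(formant_tail, loop_head, fade_in)]


print([round(v, 3) for v in crossfade([1.0] * 5, [0.0] * 5)])
```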

Alternatively, the portions may be spliced so that the start point of the loop data signal is connected at the position of the looping start point in the formant portion generation signal.

In practice, when loop section detection, looping, and the mixing of the loop data with the formant portion are carried out, a rough mix is first made by hand through trial and error with repeated audition, and more precise processing is then performed on the basis of the loop point information (looping start point LP_S and looping end point LP_E) and other information obtained at that stage.

That is, prior to the high-precision loop section detection in functional block 16, loop section detection, mixing, and so on are carried out manually with repeated audition by the procedure shown in the flowchart of FIG. 17, after which the high-precision processing described above (step S26 onward) is performed.

In FIG. 17, in the first step S21 the loop points are detected with relatively coarse precision, for example by using zero-cross points of the signal waveform or by visually checking a display of the waveform. In step S22 looping is performed and the waveform between the loop points is played back repeatedly, and in step S23 a person auditions it and judges whether it is satisfactory. If not, the process returns to step S21 and the loop points are detected again. When a satisfactory audition result is obtained by repeating this, the process proceeds to step S24, where the signal is mixed with the formant portion signal by cross-fading or the like, and in step S25 a person auditions the result and judges whether the transition from the formant portion to the looping portion is satisfactory. If not, the process returns to step S24 and the mixing is redone.

The process then proceeds to step S26, where the high-precision loop section detection of the loop section detection function block 16 is performed. Specifically, loop section detection including the interpolated samples is performed; with 256-times oversampling, for example, loop sections are detected with a precision of 1/256 of the sampling period. In step S27 the pitch conversion ratio for pitch normalization is calculated. On the basis of this ratio, the time axis correction of functional blocks 17 and 20 is performed in step S28, the loop data generation of functional block 21 is performed in step S29, and the mixing of functional block 22 is performed in step S30. The processing from step S26 onward makes use of the loop point information and other results obtained in steps S21 to S25. Steps S21 to S25 may be omitted so that looping and the related processing are fully automated.

The peak value data of the signal consisting of the formant portion FR and the looping section LP obtained by this mixing is subjected to bit compression encoding in the next functional block 23.

Various bit compression encoding schemes are conceivable. Here it is assumed that the quasi-instantaneous companding type previously proposed by the present applicant in Japanese Patent Laid-Open Nos. 62-008629 and 62-003516 is used, that is, a high-efficiency encoding scheme in which the peak value data is divided into blocks of a fixed number of words (h samples) and bit compression is applied block by block. This high-efficiency bit compression encoding scheme is outlined with reference to FIG. 18.

In FIG. 18, the high-efficiency bit compression encoding system consists of an encoder 70 on the recording side and a decoder 90 on the reproducing side. The peak value data x(n) of the sound source signal is supplied to the input terminal 71 of the encoder 70.

This input signal (peak value data) x(n) is supplied to an FIR (finite impulse response) digital filter 74 made up of a predictor 72 and an adder 73, and the prediction signal (peak value data) x̃(n) from the predictor 72 is sent to the adder 73 as a subtraction signal. In the adder 73, the prediction signal x̃(n) is subtracted from the input signal x(n), producing a prediction error signal, or a difference output in the broad sense, d(n). The predictor 72 generally calculates the predicted value x̃(n) as a linear combination of the past p inputs x(n-p), x(n-p+1), ..., x(n-1). The FIR filter 74 is hereinafter referred to as the encoding filter.
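The encoding filter can be illustrated with a short Python sketch that forms the prediction as a linear combination of past inputs and subtracts it from the current input. The coefficient sets in the example (first-order and second-order difference) are illustrative choices, samples before the start of the data are taken as zero by assumption, and the name `prediction_residual` is hypothetical.

```python
def prediction_residual(x, coeffs):
    """d(n) = x(n) - sum_k coeffs[k] * x(n-1-k).
    Inputs before the first sample are treated as zero."""
    d = []
    for n in range(len(x)):
        pred = sum(c * x[n - 1 - k]
                   for k, c in enumerate(coeffs) if n - 1 - k >= 0)
        d.append(x[n] - pred)
    return d


ramp = [1, 2, 3, 4, 5]
print(prediction_residual(ramp, [1]))       # first-order difference
print(prediction_residual(ramp, [2, -1]))   # second-order difference
```

For a steadily rising input, the second-order predictor leaves a residual of zero after the first sample, which is why a well-matched filter lets the following quantizer use fewer bits.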

In this high-efficiency bit compression encoding system, the sound source data is divided into blocks of data within a fixed time, that is, of a fixed number of words h of input data, and the encoding filter 74 with the optimum characteristic is selected for each block. This can be realized by providing in advance a plurality of (for example, four) encoding filters with mutually different characteristics and selecting from among them the one with the optimum characteristic, that is, the one that gives the highest compression ratio. In typical digital filter implementations, however, a plurality of sets (for example, four sets) of coefficients for the predictor 72 of the single encoding filter 74 shown in FIG. 18 are stored in a coefficient memory or the like, and switching among these coefficient sets in time-division fashion achieves an operation substantially equivalent to selecting one of the plurality of encoding filters.
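Per-block selection among stored coefficient sets might look like the following Python sketch. The selection criterion used here (smallest peak absolute residual within the block) is an assumption standing in for "highest compression ratio", the three coefficient sets are illustrative, and `FILTERS`, `residual`, and `choose_filter` are hypothetical names.

```python
# Illustrative coefficient sets: straight PCM, first-order and second-order difference.
FILTERS = {0: [], 1: [1.0], 2: [2.0, -1.0]}


def residual(block, history, coeffs):
    """Prediction error for one block; `history` holds the samples that
    precede the block (at least as many as there are coefficients)."""
    ext = list(history) + list(block)
    off = len(history)
    return [ext[off + n] - sum(c * ext[off + n - 1 - k]
                               for k, c in enumerate(coeffs))
            for n in range(len(block))]


def choose_filter(block, history):
    """Pick the mode whose residual has the smallest peak magnitude."""
    def peak(mode):
        return max(abs(v) for v in residual(block, history, FILTERS[mode]))
    return min(FILTERS, key=peak)


print(choose_filter([10, 20, 30, 40], history=[-10, 0]))  # smooth ramp
print(choose_filter([5, -5, 5, -5], history=[0, 0]))      # rapid alternation
```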

Next, the difference output d(n), the prediction error, is sent through an adder 81 to a bit compressor consisting of a shifter 75 of gain G and a quantizer 76, where compression (ranging) is applied such that, in a floating-point representation for example, the exponent corresponds to the gain G and the mantissa to the output of the quantizer 76. That is, the shifter 75 shifts the input data by a number of bits corresponding to the gain G to switch the range, and the quantizer 76 requantizes by taking out a fixed number of bits of the shifted data. A noise shaping circuit (noise shaper) 77 obtains the difference between the output and the input of the quantizer 76, the so-called quantization error, in an adder 78, sends this quantization error through a shifter 79 of gain G^-1 to a predictor 80, and feeds the prediction signal of the quantization error back to the adder 81 as a subtraction signal; this is so-called error feedback. After requantization by the quantizer 76 and error feedback by the noise shaping circuit 77, the output d̂(n) is taken out from the output terminal 82.
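The ranging, requantization, and error feedback can be sketched in Python under several stated assumptions: the gain G is a power of two, 2^-shift; the predictor 80 is reduced to a single one-sample delay with coefficient 1; and the quantizer output is a signed value of `bits` bits. None of these specifics come from the source, and `encode_block` is a hypothetical name.

```python
def encode_block(d, shift, bits=4):
    """Quantize the difference signal d with gain G = 2**-shift and
    first-order error feedback (assumed form of the noise shaper)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    gain = 2.0 ** -shift
    out, e_prev = [], 0.0
    for dn in d:
        d1 = dn - e_prev / gain           # subtract the fed-back error (gain 1/G)
        d2 = gain * d1                    # range shift by gain G
        q = max(lo, min(hi, round(d2)))   # requantize to `bits` bits
        e_prev = q - d2                   # quantization error for the next sample
        out.append(q)
    return out


print(encode_block([16, 32, -48], shift=4))
print(encode_block([24, 24], shift=4))
```

In the second example the first sample is rounded up from 1.5 to 2 and the fed-back error lowers the next output to 1, so the two outputs together still represent the input total of 48. Python's `round` rounds halves to the nearest even integer, which affects such tie cases.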

The output d′(n) from the adder 81 is the difference output d(n) minus the quantization error prediction signal ẽ(n) from the noise shaper 77, and the output d″(n) from the shifter 75 of gain G is the output d′(n) from the adder 81 multiplied by the gain G. The output d̂(n) from the quantizer 76 is the output d″(n) from the shifter 75 plus the quantization error e(n) arising in quantization, and the quantization error e(n) is extracted in the adder 78 of the noise shaper 77. This quantization error e(n) passes through the shifter 79 of gain G^-1 and through the predictor 80, which takes a linear combination of its past r inputs, to become the quantization error prediction signal ẽ(n).

The sound source data is encoded as described above and taken out through the output terminal 82 as the output d̂(n) of the quantizer 76.

A prediction/range adaptation circuit 84 outputs mode selection information as the optimum filter selection information, which is sent to, for example, the predictor 72 of the encoding filter 74 and to an output terminal 87. It also outputs range information for determining the gains G and G^-1, or equivalently the bit shift amount, which is sent to the shifters 75 and 79 and to an output terminal 86.

The input terminal 91 of the decoder 90 on the reproducing side is supplied with the signal d̂′(n) obtained by transmitting, or recording and reproducing, the output d̂(n) from the output terminal 82 of the encoder 70. This input signal d̂′(n) is sent through a shifter 92 of gain G^-1 to an adder 93. The output x̂′(n) of the adder 93 is sent to a predictor 94 to become the prediction signal x̃′(n), which is returned to the adder 93 and added to the output d̂″(n) from the shifter 92. The sum is output from the output terminal 95 as the decoded output x̂′(n).

The range information and mode selection information output from the output terminals 86 and 87 of the encoder 70 and transmitted, or recorded and reproduced, are input to the input terminals 96 and 97 of the decoder 90, respectively. The range information from the input terminal 96 is sent to the shifter 92 to determine the gain G^-1, and the mode selection information from the input terminal 97 is sent to the predictor 94 to determine the prediction characteristic. The prediction characteristic selected for the predictor 94 is equal to that of the predictor 72 of the encoder 70.

In the decoder 90 configured in this way, the output d̂″(n) from the shifter 92 is the input signal d̂′(n) multiplied by the gain G^-1, and the output x̂′(n) of the adder 93 is the output d̂″(n) from the shifter 92 plus the prediction signal x̃′(n).
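The decoder relation (shift the received value back by gain G^-1, then add the prediction formed from previously decoded outputs) can be sketched as follows. The sketch assumes G^-1 is a left shift by `shift` bits and that the predictor uses the same illustrative coefficient sets as an encoder would; `decode_block` is a hypothetical name.

```python
def decode_block(q, shift, coeffs, history):
    """Rebuild samples from quantized values q.
    `history` holds previously decoded samples (at least len(coeffs) of them)."""
    out = list(history)
    for qn in q:
        pred = sum(c * out[-1 - k] for k, c in enumerate(coeffs))  # from past outputs
        out.append(qn * (1 << shift) + pred)                       # G^-1 * input + prediction
    return out[len(history):]


print(decode_block([1, 1, 1], shift=2, coeffs=[1.0], history=[0.0]))
print(decode_block([1, 1, 1], shift=2, coeffs=[], history=[]))
```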

FIG. 19 shows an example of one block of output data from the bit compression encoder 70. One block consists of one byte of header information (parameter information on the compression, or side information) RF and eight bytes of sample data D_A0 to D_B3. The header information RF consists of the 4-bit range information, the 2-bit mode selection information (filter selection information), and two 1-bit flags, for example information LI indicating the presence or absence of a loop and information EI indicating whether the block is the end block of the waveform. The peak value data of one sample is bit-compressed to 4 bits, and the data D_A0 to D_B3 contain 4-bit data D_A0H to D_B3L for 16 samples.
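The 9-byte block layout can be illustrated with a pack and unpack sketch in Python. The source states only the field widths; the bit positions inside the header byte (range in the top four bits, then mode, then LI, then EI), the high-nibble-first sample order, and the two's-complement interpretation of each 4-bit sample are assumptions of this sketch, and the function names are hypothetical.

```python
def pack_block(rng, mode, loop_flag, end_flag, samples):
    """Pack a header byte and 16 signed 4-bit samples into 9 bytes."""
    assert len(samples) == 16
    header = ((rng & 0xF) << 4) | ((mode & 0x3) << 2) | ((loop_flag & 1) << 1) | (end_flag & 1)
    body = bytes(((samples[i] & 0xF) << 4) | (samples[i + 1] & 0xF)
                 for i in range(0, 16, 2))
    return bytes([header]) + body


def unpack_block(block):
    """Inverse of pack_block: returns (range, mode, loop flag, end flag, samples)."""
    header = block[0]
    nibbles = []
    for byte in block[1:9]:
        nibbles += [byte >> 4, byte & 0xF]
    samples = [v - 16 if v >= 8 else v for v in nibbles]  # sign-extend 4-bit values
    return header >> 4, (header >> 2) & 3, (header >> 1) & 1, header & 1, samples


blk = pack_block(12, 2, 1, 0, list(range(-8, 8)))
print(len(blk), hex(blk[0]))
```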

Next, FIG. 20 shows the blocks of quasi-instantaneous (block-based) bit-compression-coded peak value data corresponding to the leading portion of a musical tone signal waveform such as that shown in FIG. 2. In FIG. 20 the header is omitted and only the peak value data is shown. For convenience of illustration one block is drawn as 8 samples, but the block size can of course be set arbitrarily, for example to 16 samples per block. The same applies to FIG. 14 described above.

Here, the quasi-instantaneous bit compression coding system selects, from among a mode in which the input musical tone signal is output directly (the straight PCM mode) and modes in which the musical tone signal is output through a filter (the first-order or second-order difference filter mode), the mode that yields the signal with the highest compression ratio, and transmits the resulting musical tone data as the output signal.
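The selection among the three modes might be sketched as below. The specification says only that the mode giving the highest compression ratio is chosen. This sketch assumes that this corresponds to the mode whose residual has the smallest peak magnitude within the block, and it assumes simple difference predictors: the previous sample for the first-order mode, and linear extrapolation from the two previous samples for the second-order mode. The function names, the predictor coefficients, and the tie-break are hypothetical, and quantization is ignored.

```python
def residuals(block, mode, history=(0, 0)):
    """Residual of one block under a given mode.

    mode 0: straight PCM (no prediction)
    mode 1: first-order difference (assumed predictor: previous sample)
    mode 2: second-order difference (assumed predictor: 2*x[n-1] - x[n-2])
    history holds the two samples preceding the block, newest first.
    """
    prev1, prev2 = history
    out = []
    for x in block:
        if mode == 0:
            predicted = 0
        elif mode == 1:
            predicted = prev1
        else:
            predicted = 2 * prev1 - prev2
        out.append(x - predicted)
        prev2, prev1 = prev1, x
    return out


def select_mode(block, history=(0, 0)):
    """Pick the mode with the smallest peak residual (assumed criterion).

    Ties are resolved in favour of the lowest mode number.
    """
    def peak(mode):
        return max(abs(r) for r in residuals(block, mode, history))

    return min(range(3), key=peak)
```

Under these assumptions an all-zero block gives a zero residual in every mode, so the tie-break returns mode 0, the straight PCM mode.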

When a musical tone is sampled and recorded on a storage medium such as a memory, capture of the musical tone signal waveform begins at the sound generation start point KS. If a filter mode that requires initial values, such as the first-order or second-order difference filter mode, is selected for the first block from this sound generation start point KS, those initial values must be prepared in advance, so a form that needs no such initial values is desirable. For this reason, a pseudo input signal that causes the straight PCM mode (the mode in which the input musical tone signal is output directly) to be selected is added in the period preceding the sound generation start point KS, and signal processing is then performed on the input signal including this pseudo input signal.

Specifically, in FIG. 20, a block whose data are all "0" is placed ahead of the sound generation start point KS as the pseudo input signal, and all the "0" data from the head of this block are bit-compressed and captured as sampled peak value data. Such a block can be obtained, for example, by creating a block whose data are all "0" in advance and storing it in a memory or the like for use, or by starting sampling, when the musical tone is sampled, from a portion of the input signal ahead of the sound generation start point KS where the data are all "0" (that is, the silent portion before sound generation starts). The pseudo input signal occupies at least one block.
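The first of the two options described above, preparing an all-zero block in advance and placing it ahead of the captured waveform, can be sketched as follows. The function name `prepend_pseudo_input` and its parameters are hypothetical, and the default of 16 samples per block is only one of the block sizes the text mentions.

```python
def prepend_pseudo_input(samples, block_size=16, n_blocks=1):
    """Place all-zero pseudo-input blocks ahead of the sound generation start.

    samples    -- peak value data captured from the start point KS onward
    block_size -- samples per block (16 is used as an example in the text)
    n_blocks   -- number of zero blocks to add; the text requires at least one
    """
    if n_blocks < 1:
        raise ValueError("at least one pseudo-input block is required")
    return [0] * (block_size * n_blocks) + list(samples)
```

The resulting sequence is then passed to the block encoder as a whole, so that the first block it sees contains only zeros.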

The musical tone data including the pseudo input signal formed as described above is compressed by a high-efficiency bit compression coding system such as that shown in FIG. 18 and recorded on a storage medium such as a memory, and the compressed signal is then reproduced.

Therefore, when musical tone data including the pseudo input signal is reproduced, the straight PCM mode is selected for the filter at the start of reproduction (the block of the pseudo input signal), so there is no need to set initial values for the first-order or second-order difference filter in advance.

Here, there may be concern that the pseudo input signal (which is silent, since its data are all "0") delays the start of sound generation at the beginning of reproduction. However, with a sampling frequency of 32 kHz and 16 samples per block, for example, the delay is about 0.5 msec, which is too short to be perceived by ear and therefore poses no problem.
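The figure quoted above follows directly from the block length and the sampling frequency. For a single pseudo-input block of 16 samples at 32 kHz:

```latex
t_{\text{delay}}
  = \frac{16\ \text{samples}}{32\,000\ \text{samples/s}}
  = 0.5 \times 10^{-3}\ \text{s}
  = 0.5\ \text{ms}
```

Each additional pseudo-input block of the same size would add a further 0.5 ms.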

The bit compression coding described above and the other digital signal processing used to generate sound source data are often implemented in software on a digital signal processor (DSP), and a software-based configuration using a DSP is also often adopted for reproducing the recorded sound source data. As one example, FIG. 21 shows the overall configuration of a system that includes an audio processing unit (APU) 107, serving as a sound source unit that handles sound source data, and its peripherals.

In FIG. 21, a host computer 104, provided in, for example, a general personal computer, a digital electronic musical instrument, or a video game machine, is connected to the APU 107 serving as the sound source unit, and sound source data and the like are loaded from the host computer 104 into the APU 107. The APU 107 comprises at least a CPU (central processing unit) 103 such as a microprocessor, a DSP (digital signal processor) 101, and a memory 102 in which the sound source data and the like described above are stored. That is, the memory 102 stores at least the sound source data, and the DSP 101 performs various kinds of processing on it, including read-out control of the sound source data, for example looping, bit expansion (decoding), pitch conversion, envelope addition, and echo (reverb) processing. The memory 102 is also used as a buffer memory for these kinds of processing. The CPU 103 controls the operation and content of these kinds of processing in the DSP 101.

Further, the digital sound source data finally obtained by having the DSP 101 apply the above processing to the sound source data from the memory 102 is converted into an analog signal by a digital/analog (D/A) converter 105 and supplied to a speaker 106.

The present invention is not limited to the embodiment described above. For example, although in the above embodiment the sound source data is formed by connecting a formant portion and a looping section, the invention can readily be applied to forming sound source data consisting only of a looping section. The decoder-side configuration and the external memory for sound source data may also be supplied as a ROM cartridge or an adapter. In addition, the invention is applicable not only to sound sources for musical tone signals but also to speech synthesis.

[Effect of the Invention]

According to the pitch detection method of the present invention, the input digital signal is Fourier-transformed, the absolute value of each resulting frequency component is taken, these absolute values are Fourier-transformed again (inverse Fourier transform), and the frequency of the peak in the data generated by this inverse Fourier transform is detected. As a result, the pitch can be detected from a small number of samples, and the accuracy varies little with the frequency of the samples.
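The sequence of steps summarized above (Fourier transform, absolute value, second transform, peak detection) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the implementation of the embodiment: it uses a direct O(N^2) discrete Fourier transform in place of a fast Fourier transform to stay dependency-free, it simply takes the largest value in the first half of the output as the peak, and the names `dft`, `detect_pitch`, and `min_lag` are hypothetical. The `min_lag` parameter is an assumed guard that skips the peak at lag zero.

```python
import cmath
import math


def dft(values, inverse=False):
    """Direct discrete Fourier transform (stand-in for an FFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def detect_pitch(samples, sample_rate, min_lag=2):
    """Estimate the pitch in Hz from the peak period of the re-transformed data."""
    spectrum = dft(samples)
    # Taking the absolute value discards the phase, so every frequency
    # component is aligned to zero phase.
    magnitudes = [abs(c) for c in spectrum]
    # Transforming again gives data whose peaks recur at the pitch period.
    aligned = [c.real for c in dft(magnitudes, inverse=True)]
    half = len(samples) // 2
    # Assumed peak picker: the largest value between min_lag and half.
    lag = max(range(min_lag, half), key=lambda i: aligned[i])
    return sample_rate / lag
```

For example, 64 samples of a tone with a 16-sample period and one harmonic, taken at an assumed sampling rate of 8 kHz, contain only four periods, yet the peak of the re-transformed data falls at a lag of 16 samples. A real peak detector would need a more careful choice of search range than `min_lag` provides.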

Accordingly, the pitch of a sound source can be detected from sound source data containing only a small number of samples, and a highly accurate pitch detection method is obtained in which the detection accuracy varies little with the frequency of the sound source data. Moreover, the present invention can be realized by combining a fast Fourier transform with a simple peak detector built from a shift register.

[Brief description of the drawings]

FIG. 1 is a flowchart showing the principle of the pitch detection method of the present invention.
FIG. 2 is a waveform diagram of a musical tone signal.
FIG. 3 is a functional block diagram for explaining a specific example of the signal recording method of the present invention.
FIG. 4 is a functional block diagram for explaining the pitch detection operation.
FIG. 5 is a block diagram for explaining the peak detection operation.
FIG. 6 is a waveform diagram of a musical tone signal and its envelope.
FIG. 7 is a waveform diagram of the decay rate information of a musical tone signal.
FIG. 8 is a functional block diagram for explaining the envelope detection operation.
FIG. 9 is a characteristic diagram of an FIR filter.
FIG. 10 is a waveform diagram showing the peak value data of a musical tone signal after envelope correction.
FIG. 11 is a characteristic diagram of a comb filter.
FIG. 12 is a waveform diagram for explaining the operation of setting the optimum looping points.
FIG. 13 is a waveform diagram showing the musical tone signal before and after time-axis correction.
FIG. 14 is a schematic diagram showing the structure of the blocks for quasi-instantaneous bit compression of the peak value data after time-axis correction.
FIG. 15 is a waveform diagram showing loop data obtained by repeatedly connecting the waveform of the looping section.
FIG. 16 is a waveform diagram showing formant-portion generation data after envelope correction based on the decay rate information.
FIG. 17 is a flowchart for explaining the operations before and after the actual looping processing.
FIG. 18 is a block circuit diagram showing the schematic configuration of the quasi-instantaneous bit compression coding system.
FIG. 19 is a schematic diagram showing a specific example of one block of data obtained by quasi-instantaneous bit compression coding.
FIG. 20 is a schematic diagram showing the contents of the blocks at the leading portion of a musical tone signal.
FIG. 21 is a block diagram showing a configuration example of a system including the audio processing unit (APU) and its peripherals.

Claims

(57) [Claims]

A pitch detection method comprising: a step of Fourier-transforming an input digital signal obtained by digitally converting an analog signal; a step of taking the absolute value of each of the resulting frequency components; and a step of Fourier-transforming the absolute values of the frequency components again, wherein the pitch of the analog signal is detected by detecting the period of the peak values of the resulting output data.