JPH09198095A

JPH09198095A - Pitch detecting device

Info

Publication number: JPH09198095A
Application number: JP8005253A
Authority: JP
Inventors: Takeshi Daishiyouji; 健大聖寺; Yasuo Wakamori; 康男若森; Toshihiko Suzuki; 俊彦鈴木; Yusuke Yamamoto; 裕介山本
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 1996-01-16
Filing date: 1996-01-16
Publication date: 1997-07-31
Anticipated expiration: 2016-01-16
Also published as: JP3996222B2

Abstract

PROBLEM TO BE SOLVED: To find a pitch period quickly and accurately even when a speech waveform is a complicated waveform containing overtone components or when fine noise is superposed on the speech waveform. SOLUTION: A binarization part 8, which has a masking zone near the zero level, converts a digital speech signal into a binary signal. A timer 9 then measures the inversion intervals of the binary signal to find successive zero-crossing intervals of the digital speech signal and stores them in a RAM 10. A pitch arithmetic part 11 assumes, for each of n = 1 to 4, that the pitch period is the sum of 2n pieces of zero-crossing interval data, calculates a reproduction rate as the degree to which each zero-crossing interval constituting one pitch period matches across pitch periods, and adopts the assumption that gives the highest reproduction rate to find the pitch period.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a pitch detecting device that detects the pitch period or pitch frequency of a speech waveform.

[0002]

[Prior Art] The pitch period (or pitch frequency) is one of the parameters that characterize a speech waveform, and techniques for detecting it are commonly used in speech analysis/synthesis systems, speech coding systems, and the like. Recently, some karaoke systems also detect the pitch period of the singer's voice and use it for purposes such as scoring the singing.

[0003] Conventionally, the following methods have been used to detect the pitch period of speech.

(1) Zero-cross method. If the speech waveform is assumed to be very close to a sine wave, it repeats a monotonous pattern: it crosses the zero-level line from negative to positive, then from positive to negative, and then from negative to positive again. The pitch period is therefore given by the time interval between crossings of the zero-level line in the same direction. Following this idea, the zero-cross method simply measures two consecutive zero-cross intervals and takes them as the pitch period. A similar approach measures the interval between the instants at which the instantaneous value of the speech waveform reaches a local maximum or minimum and takes that as the pitch period.
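The zero-cross method described above can be sketched in a few lines. This is an illustration only, not circuitry from the patent: the function name `zero_cross_period`, the 220 Hz test tone, and the choice to count whole samples between upward crossings are all assumptions.

```python
import math

def zero_cross_period(samples, fs):
    """Return the spacing in seconds between the first two negative-to-positive
    zero crossings, or None if fewer than two are found (hypothetical helper)."""
    upward = []
    for i in range(1, len(samples)):
        # An upward crossing lies between sample i-1 (negative) and sample i.
        if samples[i - 1] < 0 <= samples[i]:
            upward.append(i)
            if len(upward) == 2:
                return (upward[1] - upward[0]) / fs
    return None

fs = 44100  # sampling frequency in Hz
tone = [math.sin(2 * math.pi * 220 * n / fs + 1.0) for n in range(2000)]
period = zero_cross_period(tone, fs)  # close to 1/220 s
```

Because crossings are located only to the nearest sample, the result is quantized to one sampling period.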

[0004] (2) Autocorrelation method. In this method, the time-series samples x(1), x(2), ... obtained by sampling the speech waveform at a constant sampling period are used to compute the following autocorrelation function R(r), from which the pitch period is obtained.

$$R(r) = \frac{1}{N}\sum_{n=1}^{N-r} x(n)\,x(n+r)$$

That is, r is varied, R(r) is computed for each r, and the pitch period of the speech waveform is calculated from the value of r at which R(r) is largest (that is, at which the autocorrelation is largest).
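As a rough illustration of the autocorrelation method, the sketch below evaluates the standard autocorrelation, the sum of products x(n)·x(n+r) over the N − r overlapping samples divided by N, for each lag in a search range and returns the lag with the largest value. The function names, the lag range, and the 200 Hz test tone are assumptions made for the example; Python indexes from 0 where the text indexes from 1.

```python
import math

def autocorr(x, r):
    """R(r) = (1/N) * sum of x[n] * x[n + r] over the N - r overlapping samples."""
    n_total = len(x)
    return sum(x[n] * x[n + r] for n in range(n_total - r)) / n_total

def autocorr_pitch_lag(x, min_lag, max_lag):
    """Return the lag (in samples) in [min_lag, max_lag] that maximizes R(r)."""
    return max(range(min_lag, max_lag + 1), key=lambda r: autocorr(x, r))

fs = 8000  # assumed sampling frequency in Hz
x = [math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]
lag = autocorr_pitch_lag(x, 20, 60)  # one period of 200 Hz at 8 kHz is 40 samples
```

Every candidate lag costs on the order of N multiplications, which is the computational burden the text goes on to criticize.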

[0005]

[Problems to Be Solved by the Invention] The zero-cross method described above can detect the pitch period at relatively low cost and at high speed, but because the human voice contains many overtone components, it cannot detect the pitch period accurately. The autocorrelation method can detect the pitch period with reasonable accuracy, but it requires an enormous amount of computation, takes a long time to produce a result, and is also more costly.

[0006] The present invention overcomes these problems. Its object is to provide a pitch detecting device that basically computes the pitch period of a speech waveform by measuring zero-cross intervals, that includes means for preventing the drawbacks of adopting this method, and that can detect the pitch period accurately and at high speed with an inexpensive configuration.

[0007]

[Means for Solving the Problems] The gist of the present invention is a pitch detecting device comprising: binarizing means for binarizing a speech waveform with the zero level as a reference and outputting a binary signal; zero-cross interval measuring means for measuring successive zero-cross intervals t₁, t₂, ... of the speech waveform based on the binary signal; and pitch computing means that varies n (n is an integer of 1 or more), assumes for each n that the sum T = (t₁ + t₂ + ... + t₂ₙ) of 2n zero-cross intervals is the pitch period, calculates from the zero-cross intervals t₁, t₂, ... the degree to which the speech waveform matches between the pitch periods over m adjacent periods (m is an integer of 2 or more), and obtains the pitch period by selecting the n that gives the highest degree of matching; wherein the binarizing means treats a predetermined range around the zero level as a masking band, inverts the binary signal only when the speech waveform changes across the masking band, and, while the speech waveform is within the masking band, maintains the binary signal it had immediately before entering the masking band. The invention according to claim 2 is the pitch detecting device according to claim 1, characterized in that the width of the masking band is controlled according to the amplitude of the speech waveform.

[0008]

[Embodiments of the Invention] To make the present invention easier to understand, an embodiment is described below. The embodiment shows one aspect of the present invention, does not limit the invention, and can be modified as desired within the scope of the invention.

[0009] A. Configuration of the Embodiment

FIG. 1 is a block diagram showing the configuration of an embodiment in which the present invention is applied to a karaoke system. This embodiment concerns the part of the karaoke system that scores the singer's singing. In FIG. 1, reference numeral 1 denotes a CD (compact disc) on which a digital music signal is recorded. The digital music signal recorded on CD 1 is reproduced sequentially in synchronization with a clock at the sampling frequency fs = 44.1 kHz. Reference numeral 2 denotes a vocal extraction unit, which extracts a signal corresponding to the vocal sound (hereinafter, the digital model signal) from the digital music signal reproduced from CD 1. As one example, the digital model signal can be obtained by using a band-pass filter to extract, from the digital music signal reproduced from CD 1, the signal in a frequency band that includes the voice band. If a medium on which only the vocal sound is recorded is available, the digital music signal reproduced from that medium can be used as the digital model signal as is. Reference numeral 3 denotes a microphone, which picks up the voice of the singer singing along with the reproduction of CD 1 and outputs it as an analog voice signal. Reference numeral 4 denotes an A/D converter, which samples the analog voice signal from the microphone 3 in synchronization with a clock at the same sampling frequency fs = 44.1 kHz as used for the reproduction of CD 1, and converts it into a digital voice signal.

[0010] Reference numeral 5 denotes a DC removal unit, which applies DC removal to the sequentially supplied digital voice signal and digital model signal and outputs each with the components in a low frequency band that can be regarded as DC, for example 0 Hz to 50 Hz, removed. Reference numeral 6 denotes an LPF (low-pass filter), which removes components at frequencies of, for example, 500 Hz or higher from each of the digital voice signal and digital model signal output by the DC removal unit 5, and outputs the result. Through the DC removal unit 5 and the LPF 6, only the components within the 50 to 500 Hz band of each of the digital voice signal and the digital model signal are selected and output.
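The text does not say how the DC removal unit 5 is built. One common way to remove a DC offset in software is the one-pole DC blocker sketched below; it is offered only as an illustration of the DC removal step, not as the patent's circuit, and it does not cover the LPF 6. The function name `dc_block` and the coefficient `a` (a value close to 1 chosen for the example) are assumptions.

```python
def dc_block(samples, a=0.993):
    """One-pole DC blocker: y[n] = x[n] - x[n-1] + a * y[n-1].

    A constant input decays geometrically toward zero at the output.
    """
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = x - prev_x + a * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out

# A pure DC input of 1.0 is driven toward zero.
settled = dc_block([1.0] * 5000)
```

Moving `a` closer to 1 narrows the band of low frequencies that is removed, at the cost of a longer settling time.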

[0011] Reference numeral 7 denotes a 4x oversampling unit, which applies an interpolation operation to the digital voice signal and digital model signal that have passed through the LPF 6 (both at the sampling frequency fs = 44.1 kHz), converts each into a signal at four times the sampling frequency, and outputs it.

[0012] FIG. 2 illustrates the circuit configuration of the part of the 4x oversampling unit 7 needed to process one of the digital voice signal and the digital model signal (hereinafter, the input digital signal). In this figure, a latch 71 captures and holds the input digital signal when supplied with a clock corresponding to the sampling frequency fs. Delay elements 72, 72, ... are cascaded after the latch 71 as shown. Each delay element 72 is supplied with a clock at four times the sampling frequency fs; together they successively shift the input signal held in the latch 71, and each outputs a delayed signal obtained by delaying the input signal one clock period at a time. Reference numerals 73, 73, ... denote multipliers and 74, 74, ... denote adders; these carry out the interpolation operation, which convolves a predetermined sequence of interpolation coefficients with the output signals of the latch 71 and the delay elements 72, 72, .... With this configuration, the interpolation operation is executed in synchronization with the clock at four times the sampling frequency fs, and the interpolated digital signal is output sequentially from the final-stage adder 74.

[0013] The 4x oversampling unit 7 is provided to improve the accuracy with which the pitch period is obtained. In this embodiment, the pitch period of each of the digital voice signal and the digital model signal is obtained by measuring the time intervals between its zero-cross points. To improve the accuracy of the pitch period measurement, the positions of the zero-cross points on the time axis must therefore be detected more accurately. Inserting the 4x oversampling unit 7 quadruples the density in time of the samples of each signal and thereby improves the accuracy with which each zero-cross point is located. In this example, oversampling is performed by curve interpolation, but in view of cost, linear interpolation, which gives a reasonable degree of accuracy, can also be used.
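The linear-interpolation alternative mentioned above can be sketched as follows. This is a minimal illustration under the assumption that three equally spaced points are inserted on the straight line between each pair of original samples; the function name `oversample_4x_linear` is hypothetical, and the curve interpolation of the embodiment itself is not reproduced.

```python
def oversample_4x_linear(samples):
    """Insert three linearly interpolated points between adjacent samples."""
    if not samples:
        return []
    out = []
    for a, b in zip(samples, samples[1:]):
        for k in range(4):
            # k = 0 reproduces the original sample; k = 1..3 are interpolated.
            out.append(a + (b - a) * k / 4)
    out.append(samples[-1])
    return out

dense = oversample_4x_linear([0.0, 4.0, 0.0])
# dense == [0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
```

With samples four times as dense, a sign change can be located to within a quarter of the original sampling period.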

[0014] Reference numeral 8 denotes a binarization unit, which binarizes the levels of the digital voice signal and digital model signal output from the 4x oversampling unit 7. Basically, this binarization determines the sign of the input digital signal with the zero level as the reference, outputting "1" when the input digital signal is positive and "0" when it is negative. In other words, the binarization unit 8 outputs a binary signal whose value flips between "0" and "1" each time the input digital signal crosses the zero level. In this embodiment, however, a range of ±Δ centered on the zero level is treated as a masking band during binarization, so that even if the input digital signal contains minute oscillations within this ±Δ masking band, those oscillations do not invert the binary signal.

[0015] FIG. 3 illustrates the circuit configuration of the part of the binarization unit 8 needed to process one of the digital voice signal and the digital model signal (hereinafter, the input digital signal). In this figure, reference numeral 81 denotes an absolute value detection unit that detects the absolute value of the input digital signal. Reference numeral 82 denotes a comparison unit, which compares the absolute value detected by the absolute value detection unit 81 with a predetermined value Δ and outputs "1" if the absolute value exceeds Δ and "0" if it does not. Reference numeral 83 denotes a sample-and-hold unit. While the comparison unit 82 outputs "1", it passes the input digital signal through unchanged (sample state); while the comparison unit 82 outputs "0", it holds and outputs the input digital signal from immediately before the output of the comparison unit 82 changed from "1" to "0" (hold state). Reference numeral 84 denotes a comparison unit, which determines the sign of the output signal of the sample-and-hold unit 83 with the zero level as the reference and outputs a binary signal of "1" if it is positive and "0" if it is negative.

[0016] With this configuration, when the input digital signal is outside the ±Δ range, it is output unchanged through the sample-and-hold unit 83. When the input digital signal enters the masking band of zero level ±Δ, the value of the input digital signal immediately before entry is held by the sample-and-hold unit 83, and the binary signal output by the comparison unit 84 does not invert while this hold operation continues. Accordingly, when the input digital signal changes across the masking band of zero level ±Δ, the binary signal inverts at the moment the signal finishes crossing the masking band. On the other hand, when the input digital signal enters the masking band and moves up and down within it without crossing it, the output value of the sample-and-hold unit 83 never crosses the zero level even if the input digital signal does, so the binary signal does not invert.
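The behavior of the FIG. 3 circuit can be mimicked in software roughly as follows. This is a sketch rather than the hardware itself: the function name `binarize_with_mask` and the initial held value of 0.0 (which yields "0" until the first sample leaves the band) are assumptions.

```python
def binarize_with_mask(samples, delta):
    """Binarize with a +/-delta masking band around zero.

    Mirrors the described chain: absolute value -> compare with delta ->
    sample-and-hold -> sign comparison.
    """
    held = 0.0  # sample-and-hold output; the initial value is an assumption
    bits = []
    for x in samples:
        if abs(x) > delta:
            held = x  # sample state: the input passes through
        # otherwise hold state: keep the last value seen outside the band
        bits.append(1 if held > 0 else 0)
    return bits

signal = [1.0, 0.05, -0.05, 0.05, -1.0, -0.05, 0.05, 1.0]
bits = binarize_with_mask(signal, 0.1)
# bits == [1, 1, 1, 1, 0, 0, 0, 1]
```

The small swings around zero in the example never flip the output; only the samples that leave the band on the opposite side do.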

[0017] The circuit preceding the comparison unit 84 in FIG. 3 may be replaced with the one shown in FIG. 4. In FIG. 4, reference numerals 85 and 86 denote comparison units, each of which compares the input digital signal with a reference level and outputs "1" when the input digital signal is higher than the reference level and "0" when it is lower. The comparison unit 85 is given +Δ as its reference level, and the comparison unit 86 is given −Δ. Reference numeral 87 denotes a latch that holds the input digital signal, and 88 denotes a selector that selects and outputs either the input digital signal or the output signal of the latch 87. Reference numeral 89 denotes a control unit, which controls the latch 87 and the selector 88 based on the output signals of the comparison units 85 and 86, as follows.

[0018] a. When the output signals of the comparison units 85 and 86 are both "1" or both "0": the input digital signal is outside the masking band of zero level ±Δ. In this case, the control unit 89 puts the latch 87 in the sample state and causes the selector 88 to output the input digital signal.

b. When the output signal of the comparison unit 85 is "0" and the output signal of the comparison unit 86 is "1": the input digital signal is inside the masking band of zero level ±Δ. In this case, the control unit 89 puts the latch 87 in the hold state at the moment the input digital signal enters the masking band and causes the selector 88 to output the output signal of the latch 87.
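Cases a and b can be expressed directly as a comparison of the two comparator outputs. The sketch below is an illustration with assumed names (`binarize_two_comparators`, `c85`, `c86`). The text does not say what a comparator outputs when the input equals its reference level, so the strict comparisons used here are an assumption.

```python
def binarize_two_comparators(samples, delta):
    """Masking-band binarizer built from two comparators, a latch and a selector."""
    latch = 0.0  # latch contents; the initial value is an assumption
    bits = []
    for x in samples:
        c85 = 1 if x > delta else 0   # comparator with reference +delta
        c86 = 1 if x > -delta else 0  # comparator with reference -delta
        if c85 == c86:
            # Case a: outside the band. Latch samples, selector passes input.
            latch = x
            selected = x
        else:
            # Case b (c85 = 0, c86 = 1): inside the band. Selector passes latch.
            selected = latch
        bits.append(1 if selected > 0 else 0)
    return bits

signal = [1.0, 0.05, -0.05, 0.05, -1.0, -0.05, 0.05, 1.0]
bits = binarize_two_comparators(signal, 0.1)
# bits == [1, 1, 1, 1, 0, 0, 0, 1]
```

The combination c85 = 1, c86 = 0 cannot occur, because an input above +Δ is necessarily above −Δ.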

[0019] In FIG. 1, reference numeral 9 denotes a timer that measures the time intervals at which the output signals of the binarization unit 8 corresponding to the digital voice signal and the digital model signal invert, that is, the time intervals between the zero-cross points of each of these digital signals. Reference numeral 10 denotes a RAM that stores the measurement results of the timer 9.

[0020] FIG. 5 is a block diagram showing the timer 9 and the RAM 10 together with their control system. The figure shows only the part needed to process one of the digital voice signal and the digital model signal. In FIG. 5, reference numeral 91 denotes a delay element and 92 denotes an exclusive-OR circuit. Together they form a differentiating circuit 90 that differentiates the binary signal output by the binarization unit 8 and outputs a pulse each time the binary signal inverts. The timer 9 is reset each time it receives an output pulse from the differentiating circuit 90, and between one reset and the next it counts a clock of constant frequency 4fs.

[0021] The count value of the timer 9 is supplied to a latch 93 as input data. When it receives an output pulse from the differentiating circuit 90, the latch 93 captures and holds the count value of the timer 9 from immediately before the reset. The count value held in the latch 93 is the number of clock pulses of frequency 4fs output between the detection of the previous inversion of the binary signal and the detection of the current one, and so it represents the time interval between zero-cross points. The data held in the latch 93 is therefore called zero-cross interval data below.
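The differentiating circuit, timer, and latch together amount to counting clock ticks between inversions of the binary signal. A software sketch, assuming one list element per 4fs clock tick and using the hypothetical name `zero_cross_intervals`, might look like this. Discarding the partial count before the first inversion is an assumption, since that count does not span two zero-cross points.

```python
def zero_cross_intervals(bits):
    """Return the tick counts between successive inversions of a binary signal."""
    if not bits:
        return []
    intervals = []
    count = 0
    seen_first = False
    prev = bits[0]
    for b in bits[1:]:
        count += 1
        if b ^ prev:  # XOR with the one-tick-delayed signal: pulse on inversion
            if seen_first:
                intervals.append(count)  # latch the count just before reset
            seen_first = True
            count = 0  # timer reset
        prev = b
    return intervals

ticks = zero_cross_intervals([0, 0, 1, 1, 1, 0, 0, 1])
# ticks == [3, 2]
```

Dividing a count by 4fs converts it to seconds.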

[0022] Each time it receives an output pulse from the differentiating circuit 90, a write control unit 94 reads the zero-cross interval data in the latch 93. When the zero-cross interval data is equal to or greater than a predetermined value (the count value of the timer 9 is large), the write control unit 94 applies a limit and writes the data to the RAM 10; when the data is less than a predetermined value (the count value of the timer 9 is small), it likewise applies a limit, discarding the data without writing it to the RAM 10. Only zero-cross interval data within a certain range is written to the RAM 10 in this way so that interval data that is implausible as the spacing between zero-cross points of a voice signal is not used in the computation and does not lead to an incorrect pitch period.
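The write-control rule is described only briefly, so the sketch below represents one possible reading of it: counts below a lower threshold are dropped, and counts at or above an upper threshold are clamped to that threshold before being stored. The function name `filter_intervals`, both threshold values, and the clamping interpretation are assumptions.

```python
def filter_intervals(intervals, lower, upper):
    """Keep plausible zero-cross interval counts (one reading of the write rule)."""
    stored = []
    for t in intervals:
        if t < lower:
            continue  # too short to be a voice zero-cross interval: discard
        stored.append(min(t, upper))  # overly long counts are limited to upper
    return stored

kept = filter_intervals([5, 100, 400, 5000], lower=20, upper=1000)
# kept == [100, 400, 1000]
```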

[0023] The pitch computation unit 11 in FIG. 1 computes the pitch period of each of the digital voice signal and the digital model signal by referring to the zero-cross interval data accumulated in the RAM 10.

[0024] If the digital voice signal or the like is a sine wave, one period of the sine wave crosses the zero-level line at its start point and end point and exactly once more midway between these zero-cross points. The pitch period can therefore be obtained by adding two consecutive pieces of zero-cross interval data.
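For the sine-wave case, the arithmetic is simple enough to show directly. The sketch assumes the intervals are counts of a 4fs clock with fs = 44.1 kHz, as in the embodiment; the function name `pitch_from_two_intervals` and the example counts are illustrative.

```python
def pitch_from_two_intervals(t1, t2, clock_hz):
    """Pitch frequency in Hz when two consecutive intervals make one period."""
    period_counts = t1 + t2  # one full period, in clock ticks
    return clock_hz / period_counts

clock_hz = 4 * 44100  # 176400 ticks per second
freq = pitch_from_two_intervals(400, 400, clock_hz)
# freq == 220.5
```

Two half-periods of 400 ticks each give a period of 800 ticks, and 176400 / 800 = 220.5 Hz.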

[0025] However, a digital voice signal or the like representing a human speech waveform contains many overtone components, so the waveform for one pitch period may contain three or more zero-cross points between the start point and end point of that period. In such a case, adding two consecutive pieces of zero-cross interval data does not give the correct pitch period.

[0026] In this embodiment, therefore, for each of several values of the integer n, one pitch period is assumed to have a length equal to the sum of 2n pieces of zero-cross interval data. The pitch period is obtained under each assumption, and the degree to which the timing of each zero-cross point within one pitch period agrees from one pitch period to the next is determined. The details of how this degree of agreement is detected are described later. The pitch period with the highest degree of agreement is then selected as the true pitch period. This relies on the property of voice signals that the waveform does not change greatly within a short time.
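The selection procedure can be sketched as follows. The patent's own measure of agreement is described elsewhere in the document and is not reproduced here, so `match_score` is a stand-in invented for this illustration: it compares the 2n intervals of one assumed period with the 2n intervals of the next (m = 2 adjacent periods). Trying n = 1 to 4 follows the abstract; breaking ties in favor of the smallest n is a further assumption.

```python
def match_score(intervals, n):
    """Hypothetical agreement measure in [0, 1] between two adjacent periods."""
    first = intervals[:2 * n]
    second = intervals[2 * n:4 * n]
    diff = sum(abs(a - b) for a, b in zip(first, second))
    total = sum(first) + sum(second)
    return 1.0 - diff / total

def estimate_pitch_period(intervals, max_n=4):
    """Return the sum of 2n intervals for the best-matching n, or None."""
    candidates = [n for n in range(1, max_n + 1) if len(intervals) >= 4 * n]
    if not candidates:
        return None
    # max() returns the first maximum, so the smallest n wins a tie.
    best_n = max(candidates, key=lambda n: match_score(intervals, n))
    return sum(intervals[:2 * best_n])

# A waveform with four zero-cross intervals per period, repeated four times.
period = estimate_pitch_period([100, 50, 30, 220] * 4)
# period == 400
```

In this example the n = 1 and n = 3 hypotheses pair up mismatched intervals and score poorly, while n = 2 and n = 4 both match perfectly; the tie-break selects n = 2, one true period of 400 ticks.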

[0027] Next, in FIG. 1, reference numeral 12 denotes a level detection unit, which detects the level of each of the digital voice signal output by the A/D converter 4 and the digital model signal output by the vocal extraction unit 2, and outputs a signal representing each level.

[0028] Reference numeral 13 denotes a scoring unit, which scores the singer's singing by comprehensively evaluating the deviation between the pitch periods of the digital voice signal and the digital model signal obtained by the pitch computation unit 11 and the deviation between the two signal levels obtained by the level detection unit 12. The scoring result is displayed on a display unit 14.

[0029] B. Operation of the Embodiment

The operation of this embodiment is described below. When the singer selects a song, the digital music signal is reproduced sequentially from the CD 1 corresponding to that song. The vocal extraction unit 2 extracts the digital model signal from the digital music signal and outputs it to the DC removal unit 5 and the level detection unit 12. Meanwhile, as CD 1 is reproduced, the singer begins singing; the voice is picked up by the microphone 3 and output as an analog voice signal. This analog voice signal is converted into a digital voice signal by the A/D converter 4 and output to the DC removal unit 5 and the level detection unit 12.

[0030] The digital voice signal and the digital model signal pass in turn through the DC removal unit 5 and the LPF 6, which remove the signals in unneeded frequency bands, and each is output to the 4x oversampling unit 7 as a digital signal representing a waveform that consists only of components within the frequency band of the human voice.

[0031] The digital voice signal and the digital model signal are then each interpolated on the time axis by the 4x oversampling unit 7, converted into signals at four times the sampling frequency and output, and converted into binary signals by the binarization unit 8.

[0032] FIG. 6 illustrates the operation of the 4x oversampling unit 7. In FIG. 6(a), the horizontal straight line is the zero-level line. Circle (○) marks are plotted along a sinusoidal signal waveform: the circle marks represent the individual original samples that make up the digital voice signal (digital model signal), and the sinusoidal waveform represents the underlying signal from which these original samples were taken. Three cross (×) marks are inserted between each pair of adjacent circle marks; these represent the interpolated samples obtained by the 4x oversampling unit 7.

[0033] FIG. 6(b) shows the binary signal obtained when 4x oversampling is not performed and only the original samples (○) are supplied to the binarization unit 8, and FIG. 6(c) shows the binary signal obtained when 4x oversampling is performed and both the original samples (○) and the interpolated samples (×) are supplied to the binarization unit 8. For convenience of explanation, these figures show an example in which the digital voice signal (digital model signal) contains no oscillation at a level smaller than the masking band of the binarization unit 8.

【００３４】ここで、デジタル音声信号等は信号波形と
無関係に一定のサンプリング周期毎にサンプリングされ
たものである。従って、デジタル音声信号等が同一波形
を繰り返すものである場合に、図６（ａ）に示すよう
に、いずれのタイミングの瞬時値がサンプリングされる
かは各波形により区々になる。このため、サンプリング
周期が粗いと、図６（ｂ）に示すように、ピッチ周期が
切り換わると同一波形であるにも拘わらず異なった波形
の２値信号が得られてしまう場合がある。しかしなが
ら、本実施形態のようにデジタル音声信号等の４倍オー
バーサンプリングを行った後で２値化を行う場合には、
図６（ｃ）に示すように本来の零クロス点に近いタイミ
ングで反転する２値信号が得られ、図６（ｂ）に示した
ような不具合は防止される。Here, a digital audio signal or the like is sampled at a fixed sampling period regardless of the signal waveform. Consequently, when the digital audio signal repeats the same waveform, the instants at which values are sampled differ from one repetition of the waveform to the next, as shown in FIG. 6(a). If the sampling period is coarse, the binary signal obtained can therefore differ from one pitch period to the next even though the waveform is the same, as shown in FIG. 6(b). When binarization is performed after 4× oversampling of the digital audio signal, as in this embodiment, a binary signal that inverts at a timing close to the true zero-cross point is obtained, as shown in FIG. 6(c), and the problem shown in FIG. 6(b) is prevented.
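
The effect of oversampling before the sign test can be illustrated with a short sketch. It assumes plain linear interpolation (the evaluation section of this description mentions a linear interpolation circuit for the 4× oversampling unit 7); the function names and the sample values are hypothetical and chosen only for illustration.

```python
def oversample_linear(samples, factor=4):
    """Insert factor-1 linearly interpolated points between neighbouring samples.

    Assumption: plain linear interpolation stands in for the oversampling unit.
    """
    out = []
    for a, b in zip(samples, samples[1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    if samples:
        out.append(samples[-1])  # keep the final original sample
    return out


def first_positive_index(samples):
    """Index of the first sample above zero, i.e. where a plain sign test flips."""
    return next(i for i, x in enumerate(samples) if x > 0)


# Two original samples; the straight line between them crosses zero
# 0.375 sampling periods after the first sample.
original = [-3.0, 5.0]
dense = oversample_linear(original)

coarse_time = first_positive_index(original)   # in original sampling periods
fine_time = first_positive_index(dense) / 4    # same unit, after 4x oversampling
```

In this toy case the sign change is detected one full sampling period after the first sample without oversampling, and half a period after it with 4× oversampling, which is closer to the crossing of the interpolated line.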

【００３５】図７（ａ）〜（ｄ）は２値化部８の動作を
例示したものである。まず、図７（ａ）において正弦波
状の信号波形は４倍オーバーサンプリング部７から出力
されるデジタル音声信号（デジタルお手本信号）を表し
ており、水平線は零レベル線を表している。図７（ｂ）
は図３におけるサンプルホールド部８３の動作を示すも
のである。この図に示すように、サンプルホールド部８
３は、入力信号たるデジタル音声信号（デジタルお手本
信号）が零レベル±Δのマスキング帯の外側にある場合
にはサンプル状態とされ（同図において“Ｓ”と表
記）、零レベル±Δのマスキング帯の内側にある場合に
はホールド状態とされる（同図において“Ｈ”と表記）
される。このようなサンプルホールド部８３の制御が行
われる結果、比較部８４へ入力される信号波形は図７
（ｃ）に例示するものとなり、比較部８４から得られる
２値信号は図７（ｄ）に例示するものとなる。このよう
にデジタル音声信号（デジタルお手本信号）が零レベル
±Δのマスキング帯を横切って変化する場合はマスキン
グ帯を横切り終えた時点で２値信号が反転することとな
る。また、仮にデジタル音声信号（デジタルお手本信
号）に±Δ以下の振幅の微小な振動部分を含んでいたと
しても、デジタル音声信号（デジタルお手本信号）が零
レベル±Δのマスキング帯内にある場合にはサンプルホ
ールド部８３が前値保持動作を行うため、振動部分にお
いて２値信号が反転することはない。FIGS. 7(a) to 7(d) illustrate the operation of the binarization unit 8. In FIG. 7(a), the sinusoidal waveform represents the digital audio signal (digital model signal) output from the 4× oversampling unit 7, and the horizontal line represents the zero level. FIG. 7(b) shows the operation of the sample-and-hold unit 83 in FIG. 3. As shown there, the sample-and-hold unit 83 is placed in the sample state (denoted "S" in the figure) when the input digital audio signal (digital model signal) is outside the masking band of zero level ±Δ, and in the hold state (denoted "H" in the figure) when it is inside the masking band. As a result of this control of the sample-and-hold unit 83, the signal waveform input to the comparison unit 84 is as illustrated in FIG. 7(c), and the binary signal obtained from the comparison unit 84 is as illustrated in FIG. 7(d). Thus, when the digital audio signal (digital model signal) crosses the masking band of zero level ±Δ, the binary signal inverts at the moment the signal finishes crossing the band. Moreover, even if the digital audio signal (digital model signal) contains a minute oscillation of amplitude ±Δ or less, the sample-and-hold unit 83 holds the previous value while the signal is inside the masking band, so the binary signal does not invert in the oscillating portion.
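
The sample-and-hold behaviour described above can be sketched in a few lines. This is an illustrative model only: the function name `binarize_with_mask`, the treatment of samples lying exactly on the band edge, and the initial output value are assumptions not stated in the description.

```python
def binarize_with_mask(samples, delta, initial=0):
    """Binarize a signal while ignoring movement inside the band 0 +/- delta.

    Outside the band the input is sampled; inside it the previous value is held,
    so the comparator output cannot flip on small oscillations around zero.
    """
    held = None          # last value seen outside the masking band
    state = initial      # assumed output before any sample leaves the band
    bits = []
    for x in samples:
        if abs(x) > delta:       # sample state ("S")
            held = x
        # otherwise hold state ("H"): keep the previously held value
        if held is not None:
            state = 1 if held > 0 else 0   # comparator against the zero level
        bits.append(state)
    return bits


signal = [5, 0.5, -0.5, 0.3, -5, -0.2, 0.4, 5]
masked = binarize_with_mask(signal, delta=1)
unmasked = binarize_with_mask(signal, delta=0)
```

With `delta=1` the output inverts only twice, once for each genuine crossing of the band, whereas with `delta=0` the small oscillations near zero produce additional inversions.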

【００３６】本実施形態においては、零クロス間隔を使
用してピッチ周期を演算するため、１ピッチ周期相当の
入力デジタル信号波形についてあまりの多くの零クロス
間隔が検出されてしまうと、ピッチ周期の演算の負担が
大きくなってしまう。しかしながら、本実施形態におい
ては、上記のようにマスキング帯を有する２値化部８に
よって２値信号を生成しているので、入力デジタル信号
中、ピッチ周期の演算にとって重要でない零レベル近傍
の微動が無視され、“０”／“１”反転箇所を必要以上
に多く含まない２値信号が得られ、ピッチ周期の演算に
とって適度な数の零クロス間隔を検出することが可能と
なる。In this embodiment, the pitch period is calculated from zero-cross intervals, so if too many zero-cross intervals were detected within an input digital signal waveform corresponding to one pitch period, the computational load of the pitch period calculation would become heavy. However, because the binary signal is generated by the binarization unit 8 with its masking band as described above, small fluctuations near the zero level that are unimportant for the pitch period calculation are ignored. The resulting binary signal does not contain more "0"/"1" inversions than necessary, and a suitable number of zero-cross intervals for the pitch period calculation can be detected.

【００３７】以上のようにデジタル音声信号およびデジ
タルお手本信号の各々に基づいて２値信号が生成され
る。そして、各２値信号毎に、“１”／“０”反転が生
じる時間間隔がタイマ９によって順次計時され、その計
時結果たる零クロス間隔データが図５に示すラッチ９３
に順次保持される。このようにしてラッチ９３に順次保
持される零クロス間隔データが、書込制御部９４による
制御の下、ＲＡＭ１０に順次書込まれる。すなわち、書
込制御部９４は、２値信号の反転によって微分回路９０
からパルスが出力されるのに応答し、図８にフローを示
す書込制御ルーチンを実行する。まず、書込制御部９４
は、ラッチ９３から零クロス間隔データｔを取り込み
（ステップＳ１）、この零クロス間隔データｔが下限値
「８」以上か否かを判断する。この判断結果が「ＮＯ」
の場合は零クロス間隔データｔの書込みを行うことなく
ルーチンを終了する。ステップＳ２の判断結果が「ＹＥ
Ｓ」の場合はステップＳ３に進み、零クロス間隔データ
ｔが上限値「８１９２」より大きいか否かを判断する。
この判断結果が「ＮＯ」の場合は零クロス間隔データｔ
をＲＡＭ１０へ書込み（ステップＳ４）、ルーチンを終
了する。一方、ステップＳ３の判断結果が「ＹＥＳ」の
場合は、取り込んだ零クロス間隔データｔの代りに「８
１９２」をＲＡＭ１０に書込み（ステップＳ５）、ルー
チンを終了する。以上の制御により、「８」〜「８１９
２」の範囲内の零クロス間隔データのみがＲＡＭ１０へ
書込まれるため、音声信号の零クロス点の時間間隔とし
て妥当でない零クロス間隔データが演算に使用され、誤
ったピッチ周期が演算されてしまうのを防止することが
できる。As described above, a binary signal is generated from each of the digital audio signal and the digital model signal. For each binary signal, the timer 9 successively measures the time interval between "1"/"0" inversions, and the resulting zero-cross interval data is successively held in the latch 93 shown in FIG. 5. The zero-cross interval data held in the latch 93 is then written successively to the RAM 10 under the control of the write control unit 94. Specifically, each time the differentiating circuit 90 outputs a pulse in response to an inversion of the binary signal, the write control unit 94 executes the write control routine whose flow is shown in FIG. 8. First, the write control unit 94 fetches the zero-cross interval data t from the latch 93 (step S1) and determines whether t is equal to or greater than the lower limit "8" (step S2). If the result is "NO", the routine ends without writing t. If the result of step S2 is "YES", the routine proceeds to step S3 and determines whether t is greater than the upper limit "8192". If the result is "NO", t is written to the RAM 10 (step S4) and the routine ends. If the result of step S3 is "YES", "8192" is written to the RAM 10 in place of the fetched data t (step S5) and the routine ends. Under this control, only zero-cross interval data in the range "8" to "8192" is written to the RAM 10, which prevents an erroneous pitch period from being calculated from zero-cross interval data that is not plausible as the time interval between zero-cross points of an audio signal.
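
The write control routine of FIG. 8 amounts to a discard-or-clamp filter, which the following sketch models. The function name `write_control` and the use of a Python list in place of the RAM 10 are illustrative assumptions; the limits 8 and 8192 are the values given above.

```python
LOWER_LIMIT = 8      # intervals shorter than this are discarded
UPPER_LIMIT = 8192   # intervals longer than this are stored as 8192


def write_control(t, ram):
    """Model of steps S2 to S5: decide whether and how to store interval t."""
    if t < LOWER_LIMIT:            # step S2 is "NO": do not write
        return
    if t > UPPER_LIMIT:            # step S3 is "YES"
        ram.append(UPPER_LIMIT)    # step S5: store the upper limit instead
    else:
        ram.append(t)              # step S4: store the value as measured


ram = []
for t in [3, 8, 100, 8192, 9000]:
    write_control(t, ram)
```

Note the asymmetry: values below the lower limit are dropped entirely, while values above the upper limit are still recorded, clamped to 8192.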

【００３８】このようにしてＲＡＭ１０に蓄積される零
クロス間隔データがピッチ演算部１１によって参照さ
れ、デジタル音声信号およびデジタルお手本信号の各々
のピッチ周期が求められる。ここで、図９を参照し、デ
ジタル音声信号のピッチ周期の算出処理を例にその概要
を説明する。図９（ａ）に例示するようなデジタル音声
信号が２値化部８に与えられたとすると、現時点までに
発生された零クロス間隔データｔ₁，ｔ₂，…がＲＡＭ１
０内に蓄積されている。ピッチ演算部１１は、これらの
零クロス間隔データｔ₁，ｔ₂，…とデジタル音声信号の
ピッチ周期との間の関係について以下の４通りの仮定を
設け、各々の妥当性を検討するという手順に従ってピッ
チ周期を求める。The zero-cross interval data accumulated in the RAM 10 in this way is referenced by the pitch calculation unit 11 to obtain the pitch period of each of the digital audio signal and the digital model signal. The process is outlined below with reference to FIG. 9, taking the calculation of the pitch period of the digital audio signal as an example. Suppose that a digital audio signal such as that illustrated in FIG. 9(a) has been supplied to the binarization unit 8, so that the zero-cross interval data t₁, t₂, ... generated up to the present time is stored in the RAM 10. The pitch calculation unit 11 sets up the following four assumptions about the relationship between the zero-cross interval data t₁, t₂, ... and the pitch period of the digital audio signal, and obtains the pitch period by examining the validity of each.

【００３９】仮定１デジタル音声信号のピッチ周期は、２個の零クロス間隔
データｔ₁，ｔ₂の和に相当する長さＴ₁を有する。すな
わち、図９（ｂ１）に示す時間Ｔ₁₁，Ｔ₁₂，…がデジタ
ル音声信号のピッチ周期である。仮定２デジタル音声信号のピッチ周期は、４個の零クロス間隔
データｔ₁〜ｔ₄の和に相当する長さＴ₂を有する。すな
わち、図９（ｂ２）に示す時間Ｔ₂₁，Ｔ₂₂，…がデジタ
ル音声信号のピッチ周期である。仮定３デジタル音声信号のピッチ周期は、６個の零クロス間隔
データｔ₁〜ｔ₆の和に相当する長さＴ₃を有する。すな
わち、図９（ｂ３）に示す時間Ｔ₃₁，Ｔ₃₂，…がデジタ
ル音声信号のピッチ周期である。仮定４デジタル音声信号のピッチ周期は、８個の零クロス間隔
データｔ₁〜ｔ₈の和に相当する長さＴ₄を有する。すな
わち、図９（ｂ４）に示す時間Ｔ₄₁，Ｔ₄₂，…がデジタ
ル音声信号のピッチ周期である。Assumption 1: The pitch period of the digital audio signal has a length T₁ corresponding to the sum of two zero-cross interval data t₁ and t₂. That is, the times T₁₁, T₁₂, ... shown in FIG. 9(b1) are the pitch periods of the digital audio signal.
Assumption 2: The pitch period of the digital audio signal has a length T₂ corresponding to the sum of four zero-cross interval data t₁ to t₄. That is, the times T₂₁, T₂₂, ... shown in FIG. 9(b2) are the pitch periods of the digital audio signal.
Assumption 3: The pitch period of the digital audio signal has a length T₃ corresponding to the sum of six zero-cross interval data t₁ to t₆. That is, the times T₃₁, T₃₂, ... shown in FIG. 9(b3) are the pitch periods of the digital audio signal.
Assumption 4: The pitch period of the digital audio signal has a length T₄ corresponding to the sum of eight zero-cross interval data t₁ to t₈. That is, the times T₄₁, T₄₂, ... shown in FIG. 9(b4) are the pitch periods of the digital audio signal.
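
As a rough illustration of the four assumptions, the candidate pitch periods under assumption n are the sums of consecutive groups of 2n zero-cross intervals. The helper name `candidate_periods` and the example interval values below are hypothetical; list index 0 holds t₁.

```python
def candidate_periods(t, n):
    """Sums of consecutive groups of 2*n intervals (T_n1, T_n2, ...)."""
    width = 2 * n
    return [sum(t[k:k + width]) for k in range(0, len(t) - width + 1, width)]


# Sixteen intervals from an idealized waveform whose intervals alternate 3, 5.
t = [3, 5] * 8

periods_1 = candidate_periods(t, 1)   # assumption 1: groups of 2
periods_2 = candidate_periods(t, 2)   # assumption 2: groups of 4
periods_4 = candidate_periods(t, 4)   # assumption 4: groups of 8
```

For such a perfectly periodic input every assumption produces constant sums, which is why the sums alone cannot decide between the assumptions and a separate measure of agreement is needed.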

【００４０】上記各仮定の妥当性の検討およびこの検討
結果に基づくピッチ周期の算出は図１０に示すフローに
従って実行される。まず、ピッチ演算部１１は、上記仮
定１を前提とした場合のデジタル音声信号の波形の再現
率ＣＲ１を算出する（ステップＳ１０１）。この再現率
は、上記各仮定に従った場合に各ピッチ周期に対応した
各デジタル音声信号波形がどの程度一致しているかを表
す数値であり、本実施形態においては、零クロス間隔デ
ータｔ₁，ｔ₂，…に基づいて算出する。The validity of each of the above assumptions is examined, and the pitch period is calculated on the basis of the result, according to the flow shown in FIG. 10. First, the pitch calculation unit 11 calculates the reproduction rate CR1 of the waveform of the digital audio signal under Assumption 1 (step S101). The reproduction rate is a numerical value expressing how closely the digital audio signal waveforms in the successive pitch periods agree with one another when a given assumption is followed; in this embodiment it is calculated from the zero-cross interval data t₁, t₂, ....

【００４１】ここで、図１１のフローチャートを参照
し、ステップＳ１０１において行われる再現率ＣＲ１を
求める演算の手順について説明する。まず、ステップＳ
２０１に進み、カウンタＣＮＴおよび制御変数ｉに対
し、初期値として「０」および「１」を各々設定する。The procedure for calculating the reproduction rate CR1 in step S101 is described below with reference to the flowchart of FIG. 11. First, in step S201, the counter CNT and the control variable i are initialized to "0" and "1", respectively.

【００４２】次にステップＳ２０２に進み、制御変数ｉ
を「２」だけ増加させ、ｉ＝「３」とする。次にステッ
プＳ２０３に進み、０．９ｔ₁−ｔ_i＜０なる条件を満た
すか否か、すなわち、零クロス間隔データｔ₃が零クロ
ス間隔データｔ₁の９０％よりも大きいか否かを判断す
る。そして、この判断結果が「ＹＥＳ」の場合はカウン
タＣＮＴを「１」だけ増加させ（ステップＳ２０４）、
ステップＳ２０５へ進み、「ＮＯ」の場合はステップＳ
２０４を介すことなくステップＳ２０５に進む。次にス
テップＳ２０５に進むと、−１．１ｔ₁＋ｔ_i＜０なる条
件を満たすか否か、すなわち、零クロス間隔データｔ₃
が零クロス間隔データｔ₁の１１０％よりも小さいか否
かを判断する。そして、この判断結果が「ＹＥＳ」の場
合はカウンタＣＮＴを「１」だけ増加させ（ステップＳ
２０６）、ステップＳ２０７へ進み、「ＮＯ」の場合は
ステップＳ２０６を介すことなくステップＳ２０７に進
む。Next, in step S202, the control variable i is increased by "2", giving i = "3". In step S203, it is determined whether the condition 0.9t₁ − tᵢ < 0 is satisfied, that is, whether the zero-cross interval data t₃ is greater than 90% of the zero-cross interval data t₁. If the result is "YES", the counter CNT is incremented by "1" (step S204) and the process proceeds to step S205; if "NO", the process proceeds to step S205 without passing through step S204. In step S205, it is determined whether the condition −1.1t₁ + tᵢ < 0 is satisfied, that is, whether the zero-cross interval data t₃ is less than 110% of the zero-cross interval data t₁. If the result is "YES", the counter CNT is incremented by "1" (step S206) and the process proceeds to step S207; if "NO", the process proceeds to step S207 without passing through step S206.

【００４３】次にステップＳ２０７に進むと、制御変数
ｉが「７」となったか否かを判断し、この判断結果が
「ＮＯ」の場合はステップＳ２０２に戻る。以後、２回
に亙ってステップＳ２０２〜Ｓ２０７が実行され、零ク
ロス間隔データｔ₅およびｔ₇の各々について上記ステッ
プＳ２０３およびＳ２０５の判断が行われ、各零クロス
間隔データが零クロス間隔データｔ₁の９０％より大き
い場合または１１０％よりも小さい場合にカウンタＣＮ
Ｔのインクリメントが行われる（ステップＳ２０４，Ｓ
２０６）。Next, in step S207, it is determined whether the control variable i has reached "7"; if the result is "NO", the process returns to step S202. Steps S202 to S207 are then executed two more times, so that the determinations of steps S203 and S205 are made for each of the zero-cross interval data t₅ and t₇, and the counter CNT is incremented (steps S204, S206) once if the data is greater than 90% of the zero-cross interval data t₁ and once if it is less than 110% of t₁.

【００４４】そして、ｉ＝「７」となると、ステップＳ
２０７の判断結果が「ＹＥＳ」となってステップＳ２０
８へ進み、制御変数ｉに「２」を設定する。When i = "7", the result of step S207 becomes "YES", and the process proceeds to step S208, where the control variable i is set to "2".

【００４５】次いでステップＳ２０９に進み、制御変数
ｉを「２」だけ増加させ、ｉ＝「４」とする。次にステ
ップＳ２１０に進み、０．９ｔ₂−ｔ_i＜０なる条件を満
たすか否か、すなわち、零クロス間隔データｔ₄が零ク
ロス間隔データｔ₂の９０％よりも大きいか否かを判断
する。そして、この判断結果が「ＹＥＳ」の場合はカウ
ンタＣＮＴを「１」だけ増加させ（ステップＳ２１
１）、ステップＳ２１２へ進み、「ＮＯ」の場合はステ
ップＳ２１１を介すことなくステップＳ２１２に進む。
次にステップＳ２１２に進むと、−１．１ｔ₂＋ｔ_i＜０
なる条件を満たすか否か、すなわち、零クロス間隔デー
タｔ₄が零クロス間隔データｔ₂の１１０％よりも小さい
か否かを判断する。そして、この判断結果が「ＹＥＳ」
の場合はカウンタＣＮＴを「１」だけ増加させ（ステッ
プＳ２１３）、ステップＳ２１４へ進み、「ＮＯ」の場
合にはステップＳ２１３を介すことなくステップＳ２１
４に進む。Next, in step S209, the control variable i is increased by "2", giving i = "4". In step S210, it is determined whether the condition 0.9t₂ − tᵢ < 0 is satisfied, that is, whether the zero-cross interval data t₄ is greater than 90% of the zero-cross interval data t₂. If the result is "YES", the counter CNT is incremented by "1" (step S211) and the process proceeds to step S212; if "NO", the process proceeds to step S212 without passing through step S211. In step S212, it is determined whether the condition −1.1t₂ + tᵢ < 0 is satisfied, that is, whether the zero-cross interval data t₄ is less than 110% of the zero-cross interval data t₂. If the result is "YES", the counter CNT is incremented by "1" (step S213) and the process proceeds to step S214; if "NO", the process proceeds to step S214 without passing through step S213.

【００４６】次にステップＳ２１４に進むと、制御変数
ｉが「８」となったか否かを判断し、この判断結果が
「ＮＯ」の場合はステップＳ２０９に戻る。以後、２回
に亙ってステップＳ２０９〜Ｓ２１４が実行され、零ク
ロス間隔データｔ₆およびｔ₈の各々について上記ステッ
プＳ２１０およびＳ２１２の判断が行われ、各零クロス
間隔データが零クロス間隔データｔ₂の９０％より大き
い場合または１１０％よりも小さい場合にカウンタＣＮ
Ｔのインクリメントが行われる（ステップＳ２１１，Ｓ
２１３）。Next, in step S214, it is determined whether the control variable i has reached "8"; if the result is "NO", the process returns to step S209. Steps S209 to S214 are then executed two more times, so that the determinations of steps S210 and S212 are made for each of the zero-cross interval data t₆ and t₈, and the counter CNT is incremented (steps S211, S213) once if the data is greater than 90% of the zero-cross interval data t₂ and once if it is less than 110% of t₂.

【００４７】そして、ｉ＝「８」となると、ステップＳ
２１４の判断結果が「ＹＥＳ」となってステップＳ２１
５へ進み、カウンタＣＮＴの値を零クロス間隔データに
ついての判断の回数によって正規化し、その結果を再現
率ＣＲ１とする。このフローの場合、判断は１２回行わ
れるので、ＣＮＴ／１２が再現率ＣＲ１とされる。When i = "8", the result of step S214 becomes "YES", and the process proceeds to step S215, where the value of the counter CNT is normalized by the number of determinations made on the zero-cross interval data, and the result is taken as the reproduction rate CR1. In this flow twelve determinations are made, so CNT/12 is the reproduction rate CR1.
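
The flow of FIG. 11 as described in steps S201 to S215 can be transcribed roughly as follows. The function name `reproduction_rate_cr1` and the convention of leaving list index 0 unused (so that `t[1]` is t₁) are illustrative choices, not part of the description.

```python
def reproduction_rate_cr1(t):
    """CR1 under assumption 1; t[1]..t[8] hold t1..t8, t[0] is unused."""
    cnt = 0                                  # S201: CNT = 0
    for base in (1, 2):                      # S201: i = 1, then S208: i = 2
        i = base
        while True:
            i += 2                           # S202 / S209
            if 0.9 * t[base] - t[i] < 0:     # S203 / S210: above 90% of reference
                cnt += 1                     # S204 / S211
            if -1.1 * t[base] + t[i] < 0:    # S205 / S212: below 110% of reference
                cnt += 1                     # S206 / S213
            if i == base + 6:                # S207 (i == 7) / S214 (i == 8)
                break
    return cnt / 12                          # S215: twelve determinations in total


stable = [None, 10, 20, 10.5, 19, 9.5, 21, 10, 20]    # within +/-10% of t1, t2
swapped = [None, 10, 20, 20, 10, 10, 20, 20, 10]      # true period is 4 intervals

cr1_stable = reproduction_rate_cr1(stable)
cr1_swapped = reproduction_rate_cr1(swapped)
```

Because the two comparisons are counted separately, any positive interval passes at least one of them, so a value computed this way cannot fall below 0.5. This is a consequence of the flow as transcribed here.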

【００４８】ここで、ピッチ周期の長さを２個の零クロ
ス間隔データの和Ｔ₁とした仮定が正しく、かつ、ピッ
チ周期が４回切り換わってもデジタル音声信号の波形が
変化しない理想状態においては、ｔ₁＝ｔ₃＝ｔ₅＝ｔ₇か
つｔ₂＝ｔ₄＝ｔ₆＝ｔ₈となる。従って、この場合に上記
処理によって得られる再現率ＣＲ１は１００％となる。
また、各零クロス間隔データに多少の誤差があっても、
ｔ₃，ｔ₅およびｔ₇がｔ₁±１０％の範囲内に収ってお
り、かつ、ｔ₄，ｔ₆およびｔ₈がｔ₂±１０％の範囲内に
収っている場合には再現率ＣＲ１は１００％となる。一
方、上記仮定が誤りであるとすると、ピッチ周期が切り
換わることによって相互に対応する零クロス間隔データ
間に大きな差が生じることとなる。このため、上記ステ
ップＳ２０３等において否定的な判断がされ易くなり、
そのような否定的な判断のなされる回数の増加に応じて
再現率ＣＲ１が低下することとなる。Here, in the ideal state in which the assumption that the pitch period equals the sum T₁ of two zero-cross interval data is correct and the waveform of the digital audio signal does not change even as the pitch period repeats four times, t₁ = t₃ = t₅ = t₇ and t₂ = t₄ = t₆ = t₈. The reproduction rate CR1 obtained by the above processing is then 100%. Even if the zero-cross interval data contains some error, CR1 is still 100% provided that t₃, t₅ and t₇ lie within t₁ ± 10% and t₄, t₆ and t₈ lie within t₂ ± 10%. If the assumption is wrong, on the other hand, large differences arise between corresponding zero-cross interval data from one pitch period to the next. Negative determinations then become more likely in step S203 and the like, and CR1 falls as the number of negative determinations increases.

【００４９】このようにして再現率ＣＲ１の算出が終了
すると、図１０のフローに戻ってステップＳ１０２に進
み、上記仮定２を前提とした場合のデジタル音声信号の
波形の再現率ＣＲ２を算出する。すなわち、ピッチ周期
が４個の零クロス間隔データの和に相当する長さＴ₂を
有していると仮定する。そして、第１番目のピッチ周期
に対応した零クロス間隔データｔ₁〜ｔ₄を各々基準と
し、第２番目，第３番目および第４番目の各ピッチ周期
に対応した零クロス間隔データｔ₅〜ｔ₈，ｔ₉〜ｔ₁₂お
よびｔ₁₃〜ｔ₁₅の各々が基準と所定の誤差範囲内で一致
しているか否かを判断する。そして、肯定的な判断結果
の得られた回数をカウントし、全判断回数によって正規
化し、再現率ＣＲ２を求める。When the calculation of the reproduction rate CR1 is finished, the process returns to the flow of FIG. 10 and proceeds to step S102, where the reproduction rate CR2 of the waveform of the digital audio signal under Assumption 2 is calculated. That is, the pitch period is assumed to have the length T₂ corresponding to the sum of four zero-cross interval data. Taking the zero-cross interval data t₁ to t₄ of the first pitch period as references, it is determined whether each of the zero-cross interval data t₅ to t₈, t₉ to t₁₂ and t₁₃ to t₁₆ of the second, third and fourth pitch periods agrees with its reference within a predetermined error range. The number of positive determinations is counted and normalized by the total number of determinations to obtain the reproduction rate CR2.

【００５０】ピッチ周期の長さを４個の零クロス間隔デ
ータの和とした仮定が正しく、かつ、ピッチ周期が４回
切り換わってもデジタル音声信号の波形が変化しない理
想状態においては、ｔ₁＝ｔ₅＝ｔ₉＝ｔ₁₃ ｔ₂＝ｔ₆＝ｔ₁₀＝ｔ₁₄ ｔ₃＝ｔ₇＝ｔ₁₁＝ｔ₁₅ ｔ₄＝ｔ₈＝ｔ₁₂＝ｔ₁₆ なる条件を全て満たし、再現率ＣＲ２は１００％とな
る。また、各零クロス間隔データに多少の誤差があって
も、±１０％の範囲内に収っている場合には再現率ＣＲ
２は１００％となる。ピッチ周期が切り換わることによ
って基準（すなわち、第１番目のピッチ周期に対応した
零クロス間隔データ）から大きくずれた零クロス間隔デ
ータが生じる場合には、その個数に応じて再現率ＣＲ２
が低下することとなる。In the ideal state in which the assumption that the pitch period equals the sum of four zero-cross interval data is correct and the waveform of the digital audio signal does not change even as the pitch period repeats four times, the conditions
  t₁ = t₅ = t₉ = t₁₃
  t₂ = t₆ = t₁₀ = t₁₄
  t₃ = t₇ = t₁₁ = t₁₅
  t₄ = t₈ = t₁₂ = t₁₆
are all satisfied and the reproduction rate CR2 is 100%. Even if the zero-cross interval data contains some error, CR2 is still 100% provided that the error is within ±10%. If zero-cross interval data that deviates greatly from the reference (that is, from the zero-cross interval data of the first pitch period) arises from one pitch period to the next, CR2 falls according to the number of such data.

【００５１】次にステップＳ１０３に進み、上記仮定３
を前提とした場合のデジタル音声信号の波形の再現率Ｃ
Ｒ３を算出する。すなわち、ピッチ周期が６個の零クロ
ス間隔データの和に相当する長さＴ₃を有していると仮
定する。そして、第１番目のピッチ周期に対応した零ク
ロス間隔データｔ₁〜ｔ₆を各々基準とし、第２番目，第
３番目および第４番目の各ピッチ周期に対応した零クロ
ス間隔データｔ₇〜ｔ₁ ₂，ｔ₁₃〜ｔ₁₈およびｔ₁₉〜ｔ₂₄
の各々が基準と所定の誤差範囲内で一致しているか否か
を判断する。そして、肯定的な判断結果の得られた回数
をカウントし、全判断回数によって正規化し、再現率Ｃ
Ｒ３を求める。Next, in step S103, the reproduction rate CR3 of the waveform of the digital audio signal under Assumption 3 is calculated. That is, the pitch period is assumed to have the length T₃ corresponding to the sum of six zero-cross interval data. Taking the zero-cross interval data t₁ to t₆ of the first pitch period as references, it is determined whether each of the zero-cross interval data t₇ to t₁₂, t₁₃ to t₁₈ and t₁₉ to t₂₄ of the second, third and fourth pitch periods agrees with its reference within a predetermined error range. The number of positive determinations is counted and normalized by the total number of determinations to obtain the reproduction rate CR3.

【００５２】この再現率ＣＲ３は、ｔ₁＝ｔ₇＝ｔ₁₃＝ｔ₁₉ ｔ₂＝ｔ₈＝ｔ₁₄＝ｔ₂₀ ｔ₃＝ｔ₉＝ｔ₁₅＝ｔ₂₁ ｔ₄＝ｔ₁₀＝ｔ₁₆＝ｔ₂₂ ｔ₅＝ｔ₁₁＝ｔ₁₇＝ｔ₂₃ ｔ₆＝ｔ₁₂＝ｔ₁₈＝ｔ₂₄ なる条件を全て満たす場合あるいは各零クロス間隔デー
タに多少の誤差があっても±１０％の範囲内の誤差であ
る場合には再現率ＣＲ３は１００％となる。また、誤差
の大きな零クロス間隔データが生じる場合にはその個数
に応じて再現率ＣＲ３が低下する。The reproduction rate CR3 is 100% when the conditions
  t₁ = t₇ = t₁₃ = t₁₉
  t₂ = t₈ = t₁₄ = t₂₀
  t₃ = t₉ = t₁₅ = t₂₁
  t₄ = t₁₀ = t₁₆ = t₂₂
  t₅ = t₁₁ = t₁₇ = t₂₃
  t₆ = t₁₂ = t₁₈ = t₂₄
are all satisfied, or when the zero-cross interval data contains some error but the error is within ±10%. If zero-cross interval data with a large error arises, CR3 falls according to the number of such data.

【００５３】次にＳ１０４に進み、上記仮定４を前提と
した場合のデジタル音声信号の波形の再現率ＣＲ３を算
出する。すなわち、ピッチ周期が８個の零クロス間隔デ
ータの和に相当する長さＴ₄を有していると仮定する。
そして、第１番目のピッチ周期に対応した零クロス間隔
データｔ₁〜ｔ₈を各々基準とし、第２番目および第３番
目の各ピッチ周期に対応した零クロス間隔データｔ₉〜
ｔ₁₆およびｔ₁₇〜ｔ₂₄の各々が基準と所定の誤差範囲内
で一致しているか否かを判断する。そして、肯定的な判
断結果の得られた回数をカウントし、全判断回数によっ
て正規化し、再現率ＣＲ４を求める。Next, in step S104, the reproduction rate CR4 of the waveform of the digital audio signal under Assumption 4 is calculated. That is, the pitch period is assumed to have the length T₄ corresponding to the sum of eight zero-cross interval data. Taking the zero-cross interval data t₁ to t₈ of the first pitch period as references, it is determined whether each of the zero-cross interval data t₉ to t₁₆ and t₁₇ to t₂₄ of the second and third pitch periods agrees with its reference within a predetermined error range. The number of positive determinations is counted and normalized by the total number of determinations to obtain the reproduction rate CR4.

【００５４】上記ステップＳ１０１〜Ｓ１０３までの各
処理においては４個分のピッチ周期を処理対象とした
が、このステップＳ１０４においては３個分のピッチ周
期（図９（ｂ４）におけるＴ₄₁〜Ｔ₄₃）を処理対象とし
ている。これは次の理由によるものである。すなわち、
ステップＳ１０４においては、ピッチ周期として８個分
の零クロス間隔データに相当する長い時間を仮定してい
る。従って、仮にステップＳ１０４において４個分のピ
ッチ周期を処理対象とすると、たとえ仮定４が正しい場
合であっても、４個分のピッチ周期という極めて長時間
に亙ってデジタル音声信号波形が安定していないと再現
率ＣＲ４が低下することとなる。しかし、デジタル音声
信号の波形は、ある程度の短時間の間は同一波形を維持
し得るが、ある程度の時間が経つと波形に変化が生じる
ものである。このため、４個分のピッチ周期を処理対象
とした場合には、たとえ仮定４が正しかったとしても、
デジタル音声信号の波形の時間的変化の影響によって不
当に低い再現率ＣＲ４が演算されてしまう可能性が高
い。そこで、ステップＳ１０４においては、上述の通り
３個分のピッチ周期を処理対象としている。Whereas steps S101 to S103 each process four pitch periods, step S104 processes three pitch periods (T₄₁ to T₄₃ in FIG. 9(b4)). The reason is as follows. In step S104, a long time corresponding to eight zero-cross interval data is assumed as the pitch period. If four pitch periods were processed in step S104, the reproduction rate CR4 would fall, even when Assumption 4 is correct, unless the digital audio signal waveform remained stable over the very long time spanned by four such pitch periods. A digital audio signal can keep the same waveform for a short time, but its waveform changes after a certain time has passed. If four pitch periods were processed, therefore, there would be a high likelihood of an unduly low reproduction rate CR4 being calculated because of the change of the waveform over time, even when Assumption 4 is correct. For this reason, step S104 processes three pitch periods as stated above.

【００５５】ステップＳ１０４において、再現率ＣＲ４
は、ｔ₁＝ｔ₉＝ｔ₁₇ ｔ₂＝ｔ₁₀＝ｔ₁₈ ｔ₃＝ｔ₁₁＝ｔ₁₉ ｔ₄＝ｔ₁₂＝ｔ₂₀ ｔ₅＝ｔ₁₃＝ｔ₂₁ ｔ₆＝ｔ₁₄＝ｔ₂₂ ｔ₇＝ｔ₁₅＝ｔ₂₃ ｔ₈＝ｔ₁₆＝ｔ₂₄ なる条件を全て満たす場合あるいは各零クロス間隔デー
タに多少の誤差があっても±１０％の範囲内の誤差であ
る場合には再現率ＣＲ４は１００％となる。また、誤差
の大きな零クロス間隔データが生じる場合にはその個数
に応じて再現率ＣＲ４が低下する。In step S104, the reproduction rate CR4 is 100% when the conditions
  t₁ = t₉ = t₁₇
  t₂ = t₁₀ = t₁₈
  t₃ = t₁₁ = t₁₉
  t₄ = t₁₂ = t₂₀
  t₅ = t₁₃ = t₂₁
  t₆ = t₁₄ = t₂₂
  t₇ = t₁₅ = t₂₃
  t₈ = t₁₆ = t₂₄
are all satisfied, or when the zero-cross interval data contains some error but the error is within ±10%. If zero-cross interval data with a large error arises, CR4 falls according to the number of such data.
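
The four reproduction rates share one pattern, which can be sketched as a single function. The description only states that CR2 to CR4 test agreement "within a predetermined error range"; counting the 90% and 110% checks separately, as in the CR1 flow, is an assumption made here, and the name `reproduction_rate` and the example data are hypothetical. List index 0 holds t₁.

```python
def reproduction_rate(t, n, periods):
    """Reproduction rate under assumption n (pitch period = 2*n intervals).

    The first group of 2*n intervals is the reference; each later group is
    compared element by element against it.
    """
    width = 2 * n
    reference = t[:width]
    cnt = 0
    judgments = 0
    for p in range(1, periods):
        for k in range(width):
            x = t[p * width + k]
            cnt += 0.9 * reference[k] < x     # above 90% of the reference
            cnt += x < 1.1 * reference[k]     # below 110% of the reference
            judgments += 2
    return cnt / judgments


# Idealized data whose true pitch period spans four intervals (assumption 2).
t = [5, 7, 9, 11] * 6                          # 24 intervals, t1..t24

cr1 = reproduction_rate(t, 1, periods=4)
cr2 = reproduction_rate(t, 2, periods=4)
cr3 = reproduction_rate(t, 3, periods=4)
cr4 = reproduction_rate(t, 4, periods=3)       # step S104 uses three periods
```

For this input both CR2 and CR4 reach 1.0, since a waveform that repeats every four intervals also repeats every eight.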

【００５６】次にステップＳ１０５に進み、以上のよう
にして求めた再現率ＣＲ１〜ＣＲ４に基づき、仮定１〜
４のいずれが妥当であるか否かを判断する。この判断の
詳細なフローを図１２に示す。まず、ステップＳ３０１
に進み、再現率ＣＲ１〜ＣＲ４のうちどれが最大である
かを判断する。そして、再現率ＣＲ１が最大である場合
は、このＣＲ１が所定の基準値ｒｅｆよりも大きいか否
かを判断し（ステップＳ３０２）、この判断結果が「Ｙ
ＥＳ」の場合には仮定１に従うこと、すなわち、２個分
の零クロス間隔データの長さＴ₁によりピッチ周期を求
めることとする。他の再現率ＣＲ２〜ＣＲ４が最大であ
る場合も同様であり、ＣＲ２等が所定の基準値ｒｅｆよ
りも大きいか否かを判断し（ステップＳ３０３〜Ｓ３０
５）、この判断結果が「ＹＥＳ」の場合には、各再現率
の算出の前提となった仮定に従い、４個分の零クロス間
隔データの長さＴ₂、６個分の零クロス間隔データの長
さＴ₃あるいは８個分の零クロス間隔データの長さＴ₄に
よりピッチ周期を求めることとする。万一、再現率が同
じ場合には、その優先順位は、ＣＲ１＞ＣＲ２＞ＣＲ３
＞ＣＲ４（ＣＲ１が最優先）である。Next, in step S105, it is determined on the basis of the reproduction rates CR1 to CR4 obtained above which of Assumptions 1 to 4 is valid. The detailed flow of this determination is shown in FIG. 12. First, in step S301, it is determined which of the reproduction rates CR1 to CR4 is the largest. If CR1 is the largest, it is determined whether CR1 is greater than a predetermined reference value ref (step S302); if the result is "YES", Assumption 1 is adopted, that is, the pitch period is to be obtained from the length T₁ of two zero-cross interval data. The same applies when one of the other reproduction rates CR2 to CR4 is the largest: it is determined whether that rate is greater than the reference value ref (steps S303 to S305), and if the result is "YES", the pitch period is to be obtained, according to the assumption underlying that rate, from the length T₂ of four zero-cross interval data, the length T₃ of six, or the length T₄ of eight. Should reproduction rates be equal, the order of priority is CR1 > CR2 > CR3 > CR4 (CR1 has the highest priority).

【００５７】一方、再現率ＣＲ１〜ＣＲ４のうち最大の
ものが基準値ｒｅｆ以下である場合には、ステップＳ３
０２〜Ｓ３０５のいずれに進んだとしても判断結果が
「ＮＯ」となる。この場合、仮定１〜４のいずれが妥当
であるか結論を出すことができず、該当なしという判断
結果となる。If, on the other hand, the largest of the reproduction rates CR1 to CR4 is equal to or less than the reference value ref, the result is "NO" whichever of steps S302 to S305 is reached. In that case no conclusion can be drawn as to which of Assumptions 1 to 4 is valid, and the determination result is "not applicable".
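
The determination of FIG. 12 reduces to picking the largest rate, breaking ties in favour of the lower-numbered assumption, and rejecting the result if it does not exceed ref. The sketch below models this; the function name `select_assumption` and the numeric value used for ref are hypothetical, since the description does not give a value.

```python
def select_assumption(rates, ref):
    """Return 1..4 for the adopted assumption, or None for "not applicable".

    rates is [CR1, CR2, CR3, CR4].
    """
    best = max(rates)                  # step S301
    if best <= ref:                    # steps S302 to S305 all give "NO"
        return None
    # list.index returns the first maximum, so ties go to CR1 > CR2 > CR3 > CR4.
    return rates.index(best) + 1


REF = 0.9   # hypothetical threshold, chosen only for this example

choice_clear = select_assumption([0.67, 1.0, 0.67, 0.8], REF)
choice_tie = select_assumption([0.67, 1.0, 0.67, 1.0], REF)
choice_none = select_assumption([0.6, 0.7, 0.65, 0.6], REF)
```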

【００５８】以上の判断が終了すると、図１０に示すフ
ローに戻り、判断結果に対応したステップへ進む。すな
わち、２個分の零クロス間隔データの長さＴ₁によりピ
ッチ周期を求めることと判断した場合にはステップＳ１
０６に進み、各々２個分の零クロス間隔データからなる
ピッチ周期を４周期分求め（図９（ｂ１）のＴ₁₁〜Ｔ₁₄
に相当）、これらの平均値をデジタル音声信号のピッチ
周期とする。また、４個分の零クロス間隔データの長さ
Ｔ₂によりピッチ周期を求めることと判断した場合には
ステップＳ１０７に進み、この判断結果に従ってピッチ
周期を４周期分求め（図９（ｂ２）のＴ₂₁〜Ｔ₂₄に相
当）、これらの平均値をデジタル音声信号のピッチ周期
とする。また、６個分の零クロス間隔データの長さＴ₃
によりピッチ周期を求めることと判断した場合にはステ
ップＳ１０８に進み、この判断結果に従ってピッチ周期
を４周期分求め（図９（ｂ３）のＴ₃₁〜Ｔ₃₄に相当）、
これらの平均値をデジタル音声信号のピッチ周期とす
る。そして、８個分の零クロス間隔データの長さＴ₄に
よりピッチ周期を求めることと判断した場合にはステッ
プＳ１０９に進み、この判断結果に従ってピッチ周期を
３周期分求め（図９（ｂ４）のＴ₄₁〜Ｔ₄₃に相当）、こ
れらの平均値をデジタル音声信号のピッチ周期とする。When the above determination is finished, the process returns to the flow shown in FIG. 10 and proceeds to the step corresponding to the result. If it was determined that the pitch period is to be obtained from the length T₁ of two zero-cross interval data, the process proceeds to step S106, where four pitch periods each consisting of two zero-cross interval data are obtained (corresponding to T₁₁ to T₁₄ in FIG. 9(b1)) and their average is taken as the pitch period of the digital audio signal. If it was determined that the pitch period is to be obtained from the length T₂ of four zero-cross interval data, the process proceeds to step S107, where four pitch periods are obtained accordingly (corresponding to T₂₁ to T₂₄ in FIG. 9(b2)) and their average is taken as the pitch period of the digital audio signal. If it was determined that the pitch period is to be obtained from the length T₃ of six zero-cross interval data, the process proceeds to step S108, where four pitch periods are obtained accordingly (corresponding to T₃₁ to T₃₄ in FIG. 9(b3)) and their average is taken as the pitch period of the digital audio signal. If it was determined that the pitch period is to be obtained from the length T₄ of eight zero-cross interval data, the process proceeds to step S109, where three pitch periods are obtained accordingly (corresponding to T₄₁ to T₄₃ in FIG. 9(b4)) and their average is taken as the pitch period of the digital audio signal.
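
Steps S106 to S109 differ only in the group width and in the number of periods averaged, as the following sketch shows. The function name `pitch_period` and the example data are hypothetical; list index 0 holds t₁, and the result is in the same (unspecified) units as the interval data.

```python
def pitch_period(t, n):
    """Average pitch period under the adopted assumption n (1..4)."""
    width = 2 * n                        # intervals per pitch period
    periods = 3 if n == 4 else 4         # S109 averages three periods, S106-S108 four
    sums = [sum(t[p * width:(p + 1) * width]) for p in range(periods)]
    return sum(sums) / periods


t = [5, 7, 9, 11] * 6                    # 24 intervals; true period spans four of them

period_if_2 = pitch_period(t, 2)         # the correct assumption for this data
period_if_1 = pitch_period(t, 1)         # what a wrong choice would report
period_if_4 = pitch_period(t, 4)
```

Here the correct assumption yields 32, whereas adopting Assumption 1 or 4 would report half or double that value, which illustrates why the selection step matters.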

【００５９】以上の処理が終了すると、ステップＳ１０
１へ戻り、同様の処理を繰り返す。このようにして、デ
ジタル音声信号のピッチ周期が連続的に出力される訳で
ある。一方、図１２の判断において、「該当なし」との
結論が得られた場合にはピッチ周期の演算は行わず、ピ
ッチ周期の演算を行わなかった旨を示す信号を出力し、
ステップＳ１０１に戻る。なお、上記においては、デジ
タル音声信号の場合を例にピッチ周期の演算処理を説明
したが、デジタルお手本信号についても全く同様な処理
によりピッチ周期が演算される。When the above processing is finished, the process returns to step S101 and the same processing is repeated. In this way the pitch period of the digital audio signal is output continuously. If, on the other hand, the determination of FIG. 12 yields the conclusion "not applicable", the pitch period is not calculated; a signal indicating that no pitch period was calculated is output, and the process returns to step S101. Although the calculation has been described above for the digital audio signal, the pitch period of the digital model signal is calculated by exactly the same processing.

【００６０】以上のように、本実施形態は、仮定１〜４
のすべてについて再現率を求め、最も高い再現率の得ら
れた仮定を選択し、この選択した仮定に基づくピッチ演
算を当該再現率が許容範囲内である場合に限って実施
し、許容範囲外である場合は実施しないという慎重な手
順を踏むものである。このような慎重な手順を踏むこと
とした理由は次の通りである。As described above, this embodiment follows a cautious procedure: the reproduction rate is obtained for all of Assumptions 1 to 4, the assumption giving the highest reproduction rate is selected, and the pitch calculation based on the selected assumption is carried out only if that reproduction rate is within the allowable range and is not carried out otherwise. The reasons for adopting such a cautious procedure are as follows.

【００６１】ａ．上記手順以外のものとして、例えば仮
定１〜４に対応した各再現率を順次演算してゆき、許容
範囲内の再現率が得られた時点で演算を終了し、その再
現率の得られた仮定を選択してピッチ周期を求めるよう
な代替案が考えられる。しかしながら、音声波形によっ
ては、例えば仮定１および３に対応した再現率が許容範
囲内にあり、しかも仮定３に対応した再現率の方が仮定
１のものよりも高いという状況の生じることが有り得
る。かかる場合にこの代替案に従うとすると、仮定１を
選択し、誤ったピッチ周期を求めることとなる。仮定の
選択が正しくなされるように許容範囲を狭く設定するこ
とも考えられるが、その場合には「該当なし」と判断さ
れるケースが続出するおそれがある。a. One conceivable alternative to the above procedure is to calculate the reproduction rates for Assumptions 1 to 4 in turn, stop as soon as a reproduction rate within the allowable range is obtained, and obtain the pitch period using the assumption that gave that rate. Depending on the audio waveform, however, a situation can arise in which, for example, the reproduction rates for Assumptions 1 and 3 are both within the allowable range but the rate for Assumption 3 is higher than that for Assumption 1. Under this alternative, Assumption 1 would be selected and a wrong pitch period obtained. The allowable range could be narrowed so that the correct assumption is selected, but "not applicable" determinations might then occur one after another.

【００６２】ｂ．また、仮定１〜４に対応した各再現率
をすべて演算し、最大の再現率の得られた仮定を無条件
に採用し、ピッチ周期を求めるという代替案も考えられ
る。しかしながら、いずれの仮定に対応した再現率も一
様に低く、特定の仮定に対応した再現率が僅かに他より
勝っているようなケースが生じる場合が考えられ、この
ような場合に特定の仮定を採用して無理にピッチ周期を
求めたとしても果たして正確なピッチ周期が得られる
か、その保証はない。例えばピッチ周期をデジタル音声
信号の波形が急激に変化した場合等においては、上記仮
定のいずれにおいても再現率が低くなる可能性が高い。b. Another conceivable alternative is to calculate all the reproduction rates for Assumptions 1 to 4 and unconditionally adopt the assumption giving the largest rate to obtain the pitch period. However, cases can arise in which the reproduction rates for all assumptions are uniformly low and the rate for one assumption is only slightly higher than the others; if a pitch period is forced out of that assumption in such a case, there is no guarantee that it is accurate. For example, when the waveform of the digital audio signal changes abruptly, the reproduction rate is likely to be low under every assumption.

【００６３】ｃ．そこで、本実施形態においては、上述
の手順に従ってピッチ周期の演算をすることとし、不適
当なピッチ周期の出力を防止している。c. In this embodiment, therefore, the pitch period is calculated according to the procedure described above, which prevents inappropriate pitch periods from being output.
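
The difference between the adopted procedure and the two alternatives in items a and b can be made concrete with a small comparison. All function names, the rate values and the threshold are hypothetical and serve only to reproduce the situations described in those items.

```python
def first_acceptable(rates, ref):
    """Alternative a: stop at the first rate that exceeds ref."""
    for n, cr in enumerate(rates, start=1):
        if cr > ref:
            return n
    return None


def unconditional_max(rates):
    """Alternative b: always adopt the assumption with the largest rate."""
    return rates.index(max(rates)) + 1


def max_with_threshold(rates, ref):
    """Procedure of the embodiment: largest rate, accepted only above ref."""
    best = max(rates)
    return rates.index(best) + 1 if best > ref else None


REF = 0.9   # hypothetical allowable-range threshold

# Item a: assumptions 1 and 3 both pass, but assumption 3 fits better.
rates_a = [0.92, 0.5, 1.0, 0.6]
# Item b: every rate is low and one is only marginally ahead.
rates_b = [0.55, 0.5, 0.58, 0.5]

picks_a = (first_acceptable(rates_a, REF), max_with_threshold(rates_a, REF))
picks_b = (unconditional_max(rates_b), max_with_threshold(rates_b, REF))
```

In the first case the early-exit alternative settles on Assumption 1 while the adopted procedure selects Assumption 3; in the second the unconditional maximum still reports Assumption 3 while the adopted procedure reports no result.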

【００６４】The pitch periods of the digital voice signal and the digital model signal obtained as described above are reported sequentially to the scoring unit 13. The singer's performance is scored by a comprehensive evaluation of the deviation between the pitch periods of the two signals and the deviation between the levels of the two signals obtained by the level detection unit 12, and the scoring result is displayed on the display unit 14.
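The text does not state how the pitch deviation and the level deviation are combined. As a purely illustrative sketch, the following assumes a weighted penalty on the relative deviations; the function name, the weights, and the 0 to 100 scale are all hypothetical and not taken from the embodiment.

```python
def score_frame(pitch_sung: float, pitch_model: float,
                level_sung: float, level_model: float,
                pitch_weight: float = 0.7, level_weight: float = 0.3) -> float:
    """Return a 0-100 score from relative pitch-period and level deviations.

    The weights and the linear penalty are assumptions made for illustration.
    pitch_model and level_model must be non-zero.
    """
    pitch_err = abs(pitch_sung - pitch_model) / pitch_model
    level_err = abs(level_sung - level_model) / level_model
    # Clamp each relative error to 1.0 so the score never goes below zero.
    penalty = pitch_weight * min(pitch_err, 1.0) + level_weight * min(level_err, 1.0)
    return 100.0 * (1.0 - penalty)


# A singer whose pitch period is 10 % off the model and whose level matches it.
print(score_frame(110.0, 100.0, 1.0, 1.0))
```

Scores for successive pitch-period reports could then be averaged over the song, although the averaging scheme is likewise left open by the text.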

【００６５】C. Evaluation Results of the Device According to the Present Embodiment
The operating conditions of each part of the pitch period detection device described above were varied, and the pitch period detection time and detection error were evaluated. FIGS. 13 to 16 show the results. First, FIG. 13 shows the result of measuring the pitch period detection error in the practical range when a circuit that performs linear interpolation is used as the 4x oversampling unit 7 and the oversampling frequency of this circuit is varied. This result shows that interpolation of about 4x oversampling is enough to make the detection error in the practical range sufficiently small. Next, FIG. 14 shows the result of measuring, for each input frequency, the delay until the pitch period is detected, both when the pitch period is obtained from the correlation over three periods (m = 3) and when it is obtained from the correlation over four periods (m = 4). As this result shows, when m is about 3 or 4, the detection delay can be kept within an acceptable range. FIG. 15 shows the relationship between the number of averaging operations and the pitch period extraction error. FIG. 16 shows the result of an experiment on how many past periods (pitch periods) of the waveform must be compared in order to extract the pitch period accurately. This result indicates that comparing only about the past two periods produces many errors, whereas comparing input waveforms from five or more past periods uses waveforms that are too old and actually causes pitch period errors. Comparing the input waveform over the past three to four periods is therefore best for accurate pitch extraction.
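The linear-interpolation oversampling evaluated in FIG. 13 can be sketched as follows. This is a minimal software illustration, not the circuit of the oversampling unit 7; the function name and the list-based interface are assumptions. Because zero-cross intervals are measured in units of the sample period, inserting interpolated samples makes the time resolution of each measured interval finer by the oversampling factor.

```python
from typing import List, Sequence


def oversample_linear(samples: Sequence[float], factor: int = 4) -> List[float]:
    """Insert factor - 1 linearly interpolated points between neighboring samples."""
    if not samples:
        return []
    out: List[float] = []
    for a, b in zip(samples, samples[1:]):
        for k in range(factor):
            # k = 0 reproduces the original sample a; k > 0 are interpolated points.
            out.append(a + (b - a) * k / factor)
    out.append(samples[-1])  # keep the final original sample
    return out


# One original sample step that crosses zero is resolved into four finer steps.
print(oversample_linear([1.0, -1.0]))
```

In this example the sign change, which the original data can only place somewhere between two samples, is located to within a quarter of the original sample period.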

【００６６】D. Modifications
(1) In the above embodiment, whether the assumption that the pitch period is the sum of 2n zero-cross interval data is valid was judged by how well the individual zero-cross interval data making up one pitch period match between pitch periods. Instead of this method, for each n, a predetermined number of pitch periods may be obtained by summing 2n zero-cross interval data, and the n with the smallest variation among these pitch periods may be selected to determine the pitch period. That is, in FIGS. 9(b1) to 9(b4), if the variation among T₁₁ to T₁₄ is the smallest, the average of T₁₁ to T₁₄ is taken as the pitch period; if the variation among T₂₁ to T₂₄ is the smallest, the average of T₂₁ to T₂₄ is taken as the pitch period; and so on. Alternatively, the determination method based on zero-cross interval data disclosed in the above embodiment may be combined with this determination method based on pitch period variation, so that the pitch period is selected by comprehensively evaluating the variation between pitch periods of both the zero-cross interval data and the pitch period length.
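Modification (1) can be sketched as follows. The text does not define how the variation is measured, so this sketch assumes the population standard deviation divided by the mean; the function name, the use of four candidate periods per n, and the tie-break toward the smaller n are likewise assumptions.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence, Tuple


def pitch_by_least_variation(intervals: Sequence[float],
                             n_values: Sequence[int] = (1, 2, 3, 4),
                             periods: int = 4) -> Optional[Tuple[int, float]]:
    """Return (n, pitch period) for the n whose candidate periods vary least."""
    best = None
    for n in n_values:
        width = 2 * n                      # zero-cross intervals per assumed period
        needed = width * periods
        if len(intervals) < needed:
            continue                       # not enough data to test this n
        recent = intervals[-needed:]
        candidates = [sum(recent[i:i + width]) for i in range(0, needed, width)]
        spread = pstdev(candidates) / mean(candidates)
        # Strict comparison keeps the smaller n when two values of n tie.
        if best is None or spread < best[0]:
            best = (spread, n, mean(candidates))
    if best is None:
        return None
    return best[1], best[2]


# Hypothetical interval data whose true period spans four zero-cross intervals.
intervals = [2, 3, 4, 3] * 8
print(pitch_by_least_variation(intervals))
```

In this example n = 4 spans exactly two true periods and therefore also shows no variation, which is why some tie-breaking rule, here a preference for the smaller n, is needed when using variation alone.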

【００６７】(2) In the above embodiment, the width Δ of the masking band of the binarization unit 8 was fixed. However, because the amplitude of the small up-and-down fluctuations of the voice waveform near the zero level depends on the amplitude of the voice waveform as a whole, it can be difficult to determine an appropriate Δ. It is therefore preferable to control the width Δ of the masking band of the binarization unit 8, for example by detecting the amplitude of the digital voice signal or the digital model signal and multiplying this amplitude by a predetermined coefficient to obtain Δ.
(3) In the above embodiment, the pitch period was obtained by digital processing. However, the zero-cross intervals may instead be obtained by directly binarizing the analog voice waveform, and the pitch period obtained from the result.
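Modification (2) can be sketched as follows. The sketch assumes that the amplitude is taken as the peak absolute value of the buffer being processed and that the masking band of width Δ is centered on the zero level; the function name and the coefficient value are hypothetical.

```python
from typing import List, Sequence


def binarize_with_masking(samples: Sequence[float], coefficient: float = 0.1) -> List[int]:
    """Binarize with a masking band whose width follows the signal amplitude."""
    if not samples:
        return []
    delta = coefficient * max(abs(s) for s in samples)  # band width from amplitude
    half = delta / 2.0                                  # band assumed symmetric about zero
    state = 1 if samples[0] >= 0 else 0
    out: List[int] = []
    for s in samples:
        if s > half:
            state = 1          # waveform has left the band on the positive side
        elif s < -half:
            state = 0          # waveform has left the band on the negative side
        # Inside the band the previous state is held, so ripple is ignored.
        out.append(state)
    return out


# Small ripple around zero does not toggle the output; full swings do.
ripple = [1.0, 0.04, -0.03, 0.02, -1.0, -0.02, 0.03, 1.0]
print(binarize_with_masking(ripple))
```

Because Δ scales with the amplitude, the same waveform at a different loudness produces the same binary signal, which a fixed Δ would not guarantee.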

【００６８】[0068]

[Effects of the Invention] As described above, according to the present invention, the voice waveform is converted into a binary signal by binarizing means having a masking band near the zero level; successive zero-cross intervals of the voice waveform are obtained from this binary signal; for various values of n, the pitch period is assumed to be the sum of 2n zero-cross interval data; the degree of coincidence of the voice waveform between pitch periods is obtained; and the assumption with the highest degree of coincidence is adopted to obtain the pitch period. The pitch period can therefore be obtained quickly and accurately even when the voice waveform has minute oscillations near the zero level.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.

FIG. 2 is a block diagram showing a configuration example of the 4x oversampling unit in the embodiment.

FIG. 3 is a block diagram illustrating a configuration of the binarization unit in the embodiment.

FIG. 4 is a block diagram illustrating a configuration of the binarization unit in the embodiment.

FIG. 5 is a block diagram showing the timer, the RAM, and their control system in the embodiment.

FIG. 6 is a diagram showing the operation of the 4x oversampling unit in the embodiment.

FIG. 7 is a diagram showing the operation of the binarization unit in the embodiment.

FIG. 8 is a diagram showing the operation of the write control unit in the embodiment.

FIG. 9 is a diagram outlining the pitch period calculation process in the embodiment.

FIG. 10 is a flowchart showing the pitch period calculation process in the embodiment.

FIG. 11 is a flowchart showing the pitch period calculation process in the embodiment.

FIG. 12 is a flowchart showing the pitch period calculation process in the embodiment.

FIG. 13 is a diagram showing performance evaluation results of the embodiment.

FIG. 14 is a diagram showing performance evaluation results of the embodiment.

FIG. 15 is a diagram showing performance evaluation results of the embodiment.

FIG. 16 is a diagram showing performance evaluation results of the embodiment.

[Explanation of symbols]

1 ... CD, 2 ... vocal extraction unit, 3 ... microphone, 4 ... A/D converter, 5 ... DC removal unit, 6 ... LPF, 7 ... 4x oversampling unit, 8 ... binarization unit, 9 ... timer (zero-cross interval measuring means), 10 ... RAM (zero-cross interval measuring means), 11 ... pitch calculation unit (pitch calculating means), 12 ... level detection unit, 13 ... scoring unit, 14 ... display unit.

Continuation of front page: (72) Inventor: Yusuke Yamamoto, c/o Yamaha Corporation, 10-1 Nakazawa-cho, Hamamatsu-shi, Shizuoka

Claims

[Claims]

1. A pitch detecting device comprising: binarizing means for binarizing a voice waveform with the zero level as a reference and outputting a binary signal; zero-cross interval measuring means for measuring successive zero-cross intervals t₁, t₂, ... of the voice waveform based on the binary signal; and pitch calculating means for varying n (n being an integer of 1 or more), assuming for each n that the sum T = (t₁ + t₂ + ... + t₂ₙ) of 2n zero-cross intervals is the pitch period, obtaining, based on the zero-cross intervals t₁, t₂, ..., the degree of coincidence of the voice waveform between adjacent pitch periods over m periods (m being an integer of 2 or more), and obtaining the pitch period by selecting the n for which the degree of coincidence of the voice waveform is highest, wherein the binarizing means uses a masking band within a predetermined range from the zero level, inverts the binary signal only when the voice waveform changes across the masking band, and, while the voice waveform is within the masking band, maintains the binary signal as it was immediately before the voice waveform entered the masking band.

2. The pitch detecting device according to claim 1, wherein the width of the masking band is controlled according to the amplitude of the voice waveform.