JP3195700B2 - Voice analyzer

Info

Publication number: JP3195700B2
Application number: JP28784293A
Authority: JP
Inventors: 俊男萩原
Original assignee: 株式会社スペクトラ
Priority date: 1993-10-23
Filing date: 1993-10-23
Publication date: 2001-08-06
Anticipated expiration: 2016-08-06
Also published as: JPH07121196A

Abstract

PURPOSE: To prevent deterioration of sound quality by markedly reducing the error rate of pitch extraction from a speech signal and the error rate of voiced/unvoiced discrimination. CONSTITUTION: The local minima and the nearest local maxima of a normalized average amplitude difference function are detected, and the true minimum is obtained from each detected minimum by an interpolator 9. The difference between the true minimum and the nearest maximum is computed by an ALU 10 and input to a pitch detector 11. A guide pitch is computed by an ALU 12 from the pitch detected by the pitch detector 11 and is fed back to the pitch detector 11. In addition, the speech signal from an A/D converter 3, the parameters extracted by a spectral envelope parameter extractor 4, and the true minimum and nearest maximum of the normalized average amplitude difference function corresponding to the pitch detected by the pitch detector 11 are input to a voiced/unvoiced discriminator 13.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speech analyzer that automatically analyzes a speech signal and extracts the pitch and the voiced/unvoiced discrimination parameter, which are the basic parameters of sound source information.

[0002]

2. Description of the Related Art

Typical conventional speech analysis methods of this kind include (a) the autocorrelation method, (b) the modified correlation method, (c) the SIFT algorithm, and (d) the average amplitude difference function (AMDF: Average Magnitude Difference Function). With any of these methods, however, automatic pitch extraction tends to erroneously extract a pitch twice the true fundamental period (double pitch) or half of it (half pitch), or to misjudge voiced and unvoiced sounds. These errors have been a major cause of degraded sound quality in speech synthesis.

[0003]

SUMMARY OF THE INVENTION

In view of the above problems in speech analysis, the present invention provides a speech analyzer that prevents deterioration of sound quality as far as possible by markedly reducing the pitch extraction error rate and the voiced/unvoiced discrimination error rate.

[0004]

The present invention solves the above problems by providing the following.

(1) A speech analyzer comprising an input terminal for a speech signal, a first low-pass filter, an analog-to-digital converter, a spectral envelope parameter extractor, a second low-pass filter, an average amplitude difference function calculator that calculates a normalized average amplitude difference function of speech waveform data of a fixed time length (one frame) digitized at a predetermined sampling period through these components, and a pitch detector that detects the pitch of the fundamental frequency of the speech signal from the delay at which the normalized average amplitude difference function obtained by the calculator takes a local minimum. The speech analyzer further comprises an average amplitude difference function minimum/maximum detector that detects the local minima and the nearest local maxima of the normalized average amplitude difference function obtained by the calculator, and an interpolator that obtains a true minimum from an approximating function of second or higher order fitted to three or more detected values including the local minimum detected by the detector. The true minimum obtained by the interpolator is input to the pitch detector. The speech analyzer also comprises a guide pitch calculator that obtains a guide pitch (GP_k) approximating the pitch (IP_k) of the frame currently being analyzed from the pitch (IP_{k-1}) of the previous frame, which was detected by the pitch detector and output to a pitch output terminal, and from the previous guide pitch (GP_{k-1}). The guide pitch (GP_k) thus obtained is input to the pitch detector, and the pitch detector detects the pitch (IP_k) by applying different identification conditions to the pitch search range around the guide pitch (GP_k) and to the remaining search range.

(2) The speech analyzer according to (1) above, further comprising an average amplitude difference function minimum/maximum difference calculator that obtains the difference between the true minimum obtained by the interpolator and the nearest maximum, the difference (DELTRA_i) obtained by this calculator being input to the pitch detector, and a voiced/unvoiced discriminator that discriminates between voiced and unvoiced sounds under inductively derived discrimination conditions. The discrimination parameters are the speech signal converted by the analog-to-digital converter, the spectral envelope parameter (K_1) and the least square error (E_p) extracted by the spectral envelope parameter extractor, and the difference (DELTRA) between the true minimum and the nearest maximum of the normalized average amplitude difference function corresponding to the pitch detected by the pitch detector.

[0005]

According to the present invention, a novel function is found in the difference of the average amplitude difference function of the residual signal, and it is used together with a correction function having a coefficient corresponding to the pitch of the fundamental frequency and with a guide pitch that gives an expected value of the pitch to be extracted. As a result, the pitch extraction error rate and the voiced/unvoiced discrimination error rate are markedly reduced.

[0006]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

An embodiment of the present invention will be described with reference to the block diagram of the speech analyzer shown in FIG. 1. A speech signal of a fixed time length (one frame) applied to the speech signal input terminal 1 is passed through the first low-pass filter 2, which removes frequency components at or above at least one half of the sampling frequency (for example, components at or above 3.4 kHz for a sampling frequency of 8 kHz). The signal is then digitized at the sampling frequency by the analog-to-digital converter (A/D converter) 3.

[0007] Next, in the spectral envelope parameter extractor 4, the digitized signal is cut out by a data window, and its autocorrelation function is then obtained.
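As a rough illustration of this step, the following Python sketch applies a data window to one frame and computes its autocorrelation up to the analysis order. The Hamming window and the function name `windowed_autocorrelation` are assumptions made for the example; the text only says that a data window is used.

```python
import math


def windowed_autocorrelation(frame, order):
    """Window one frame and return autocorrelation values r[0..order].

    Assumption: a Hamming window stands in for the unspecified data window.
    """
    n = len(frame)
    if n < 2 or order >= n:
        raise ValueError("frame needs at least 2 samples and more samples than the order")
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]
    xw = [s * w for s, w in zip(frame, window)]
    # r[lag] = sum over i of xw[i] * xw[i + lag]
    return [sum(xw[i] * xw[i + lag] for i in range(n - lag)) for lag in range(order + 1)]
```

For a 20 ms frame at 8 kHz, `windowed_autocorrelation(frame, 8)` would return nine values, the first being the windowed frame energy.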

[0008] Further, if a partial correlation (PARCOR) analyzer is used to extract the spectral envelope parameters, the spectral envelope parameters K_1, K_2, ..., K_p and the least square error E_p are obtained by, for example, the following recurrence formulas.

[0009] K_n = W_{n-1} / U_{n-1},  n = 1, ..., p  (p: analysis order)

[0010]

(Equation 1)

(Equation 2)

[0011]

(Equation 3)

[0012] E_p = U_p / V_0

The spectral envelope parameters K_1, ..., K_p obtained in this way and the digitized speech signal sequence are applied to the analysis filter 5. Alternatively, depending on the characteristics of the sound source, the speech signal sequence is applied directly to the second low-pass filter 6, as indicated by the dash-dot arrow.
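Equations 1 to 3 are not reproduced in this text, so the exact recurrence cannot be shown here. As a hedged stand-in, the sketch below uses the standard Levinson-Durbin recursion, which also yields reflection (PARCOR) coefficients and a normalized residual energy from autocorrelation values. The function name, the sign convention (chosen so that the first coefficient equals r[1]/r[0]), and the normalization by r[0] are assumptions, not the W, U, V recurrence of the original.

```python
def levinson_durbin(r, order):
    """Return (reflection coefficients k[0..order-1], normalized error).

    r: autocorrelation values r[0..order].
    Assumption: standard Levinson-Durbin recursion, used here only as an
    illustration of how PARCOR-style coefficients can be derived.
    """
    a = [0.0] * (order + 1)  # predictor coefficients, a[0] unused
    err = r[0]               # prediction error energy
    ks = []
    for i in range(1, order + 1):
        if err <= 0.0:
            ks.append(0.0)   # degenerate (silent) frame
            continue
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
        ks.append(k)
    normalized_error = err / r[0] if r[0] > 0.0 else 0.0
    return ks, normalized_error
```

With autocorrelation values from a first-order process, only the first coefficient is non-zero, which is a quick sanity check on the recursion.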

[0013] When the partial correlation lattice filter expressed by the following equations is used as the analysis filter 5, the residual signal sequence ED_j is obtained as follows.

[0014]

EF_i = EF_{i-1} - K_i · EBB_{i-1},  i = 1, ..., p
EB_i = EBB_{i-1} - K_i · EF_{i-1},  i = 1, ..., p
EBB_i = EB_i
EF_0 = ND_j  (ND_j: speech signal sequence)
EBB_0 = ND_{j-1}
ED_j = EF_p,  j = 1, ..., SN  (SN: number of cut-out speech samples)

The residual signal sequence thus obtained, or the speech signal sequence, is passed through the second low-pass filter 6 to remove or attenuate components above approximately the maximum pitch frequency. This yields a signal sequence EA_j (j = 1, ..., SN) from which the high-frequency components have been filtered out.
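The lattice recursion above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `parcor_analysis_filter` is hypothetical, the filter state starts at zero, and EBB is read as the backward error kept from the previous sample.

```python
def parcor_analysis_filter(nd, k):
    """Run the lattice analysis filter over samples nd with coefficients k.

    Returns the residual sequence ED (one value per input sample).
    Assumption: all filter memory is zero before the first sample.
    """
    p = len(k)
    ebb = [0.0] * (p + 1)   # EBB_i: backward errors from the previous sample
    prev_sample = 0.0
    ed = []
    for sample in nd:
        ebb[0] = prev_sample            # EBB_0 = ND_{j-1}
        ef = sample                     # EF_0 = ND_j
        eb = [0.0] * (p + 1)
        for i in range(1, p + 1):
            ef_next = ef - k[i - 1] * ebb[i - 1]   # EF_i
            eb[i] = ebb[i - 1] - k[i - 1] * ef     # EB_i uses EF_{i-1}
            ef = ef_next
        for i in range(1, p + 1):
            ebb[i] = eb[i]              # EBB_i = EB_i, kept for the next sample
        prev_sample = sample
        ed.append(ef)                   # ED_j = EF_p
    return ed
```

With a single coefficient the filter reduces to ED_j = ND_j - K_1 · ND_{j-1}, which makes the behavior easy to check by hand.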

[0015] Further, EA_j is input to the average amplitude difference function (AMDF) calculator 7, and the normalized average amplitude difference function sequence RA_j is obtained by the following equation.

[0016]

(Equation 4)
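Equation 4 is not reproduced in this text, so the exact normalization is unknown. The Python sketch below shows one plausible normalized AMDF, purely as an assumption: the summed magnitude difference at each lag is divided by the summed magnitudes of the two overlapping segments, which keeps the result between 0 and 1. The function name `normalized_amdf` is hypothetical.

```python
def normalized_amdf(ea, max_lag):
    """Return a list ra where ra[lag] is a normalized AMDF value, lag = 0..max_lag.

    Assumption: normalization by the summed magnitudes of both segments;
    the original Equation 4 may use a different normalization.
    """
    if max_lag >= len(ea):
        raise ValueError("max_lag must be smaller than the number of samples")
    ra = []
    for lag in range(max_lag + 1):
        n = len(ea) - lag
        num = sum(abs(ea[i] - ea[i + lag]) for i in range(n))
        den = sum(abs(ea[i]) + abs(ea[i + lag]) for i in range(n))
        ra.append(num / den if den > 0.0 else 0.0)
    return ra
```

For a perfectly periodic input the value drops to zero at lags equal to multiples of the period, which is the property the pitch detector relies on.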

[0017] If the minimum RA_j were simply found and its j taken as the pitch, there would be a high probability of extraction errors in which the double pitch, triple pitch, or half pitch is obtained. Focusing on this point, the present invention further inputs the obtained normalized average amplitude difference function sequence RA_j to the detector 8, which detects the AMDF local minima and maxima. The detector obtains the local minima RAMN_i of RA_j, the values AP_i of j that give RAMN_i (the pitch candidates), and the local maximum RAMAX_i nearest to each RAMN_i.
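A minimal Python sketch of this detection step is given below. The function name and the reading of "nearest maximum" as the local maximum reached by walking from the minimum toward smaller lags are assumptions; the text does not say on which side the maximum is searched.

```python
def find_minima_with_maxima(ra):
    """Return a list of (ap, ramn, ramax) for each local minimum of ra.

    ap is the lag of the minimum, ramn its sampled value, and ramax the
    value of the preceding local maximum (an assumed interpretation of
    "nearest maximum").
    """
    found = []
    for j in range(1, len(ra) - 1):
        if ra[j] < ra[j - 1] and ra[j] <= ra[j + 1]:
            m = j
            # Climb toward smaller lags until the curve stops rising.
            while m > 0 and ra[m - 1] >= ra[m]:
                m -= 1
            found.append((j, ra[j], ra[m]))
    return found
```

Each returned tuple corresponds to one pitch candidate AP_i with its RAMN_i and RAMAX_i.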

[0018] The local maximum RAMAX_i lies on a relatively gentle curve, whereas the local minimum RAMN_i lies on a sharply pointed curve, so its exact value cannot be obtained at the resolution of the sampling period.

[0019] Therefore, the local minimum RAMN_i is passed through the AMDF minimum interpolator 9 to obtain the true minimum.

[0020] As the interpolation method, it is sufficient, for example, to approximate the neighborhood of the local minimum by a quadratic curve (y = ax^2 + bx + c) and take its vertex as the true minimum RAMIN_i.
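The quadratic interpolation can be sketched in Python as a three-point parabolic fit through the sampled minimum and its two neighbors. The function name and the fallback for a flat neighborhood are assumptions added for the example.

```python
def interpolate_minimum(y_prev, y_min, y_next):
    """Fit y = a*x^2 + b*x + c through x = -1, 0, 1 and return (offset, value).

    offset is the vertex position relative to the sampled minimum, in
    samples; value is the interpolated minimum (RAMIN in the text).
    """
    curvature = y_prev - 2.0 * y_min + y_next  # equals 2*a
    if curvature <= 0.0:
        # Flat or non-convex neighborhood: keep the sampled value (assumption).
        return 0.0, y_min
    offset = 0.5 * (y_prev - y_next) / curvature
    value = y_min - 0.25 * (y_prev - y_next) * offset
    return offset, value
```

The interpolated value is never larger than the sampled minimum, so a sharp dip that falls between two samples is recovered at its true depth.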

[0021] Next, the true minimum and the nearest maximum are input to the AMDF minimum/maximum difference calculator 10, and the difference DELTRA_i between the maximum and the minimum is obtained by the following equation.

[0022] DELTRA_i = RAMAX_i - RAMIN_i

Further, this difference DELTRA_i and the true minimum obtained by the interpolator 9 are input to the pitch detector 11, which detects the pitch IP by the procedure described later. The detected pitch IP is output to the pitch output terminal 14 and is also input to the guide pitch calculator 12.

[0023] Meanwhile, the speech signal converted by the A/D converter 3 and the parameters extracted by the spectral envelope parameter extractor 4 are input to the voiced/unvoiced discriminator 13, which discriminates between voiced and unvoiced sounds. The result is output to the voiced/unvoiced discrimination parameter output terminal 15 and is also input to the guide pitch calculator 12. This input to the guide pitch calculator 12 (dash-dot arrow) is preferable in practice, but a reasonable effect is obtained even without it.

[0024] The guide pitch calculator 12 performs the following calculation to obtain the guide pitch GP_k for the analysis of the k-th speech frame.

[0025] GP_k = (1 - GR) · GP_{k-1} + GR · IP_{k-1}

Here, GR is a suitable constant satisfying 0 < GR < 1, and IP_{k-1} is the pitch extracted for the (k-1)-th frame, obtained immediately before.
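The guide pitch update is a first-order recursive average and can be sketched in Python as follows. The function name and the default value of `gr` are assumptions; the text only requires 0 < GR < 1.

```python
def update_guide_pitch(gp_prev, ip_prev, gr=0.5):
    """Return GP_k from the previous guide pitch and the previous frame's pitch.

    gr=0.5 is a hypothetical default; the source gives no specific value.
    """
    if not 0.0 < gr < 1.0:
        raise ValueError("GR must satisfy 0 < GR < 1")
    return (1.0 - gr) * gp_prev + gr * ip_prev
```

A small GR makes the guide pitch follow the extracted pitch slowly, while a GR close to 1 makes it track the most recent frame almost directly.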

[0026] Here, let a voiced frame that follows a run of unvoiced frames (including silent frames) and then several consecutive voiced frames be called a fully voiced frame, and let the voiced frames preceding it be called transition frames to voiced sound. Preferably, at the point where a fully voiced frame is reached, the average pitch over the transition frames is set as the new guide pitch.

[0027] The guide pitch GP_k obtained for the k-th frame is input to the pitch detector 11, and the pitch IP_k is selected from among the candidates AP_i in the pitch detector 11 by the following procedure.

[0028] (1) For example, if the guide pitch GP_k < 32, the variable GPC is set to 1.375; otherwise, if GP_k < 64, GPC is set to 1.25; and if GP_k ≥ 64, GPC is set to 1.1875.

[0029] (2) For example, the minimum expected pitch is set to GPMIN = GP_k / GPC and the maximum expected pitch to GPMAX = GP_k · GPC (the pitch search region between them is the region where the pitch to be extracted is expected to lie).

[0030] (3) Set j = 1.

[0031] (4) Because the pitch varies little over a span of about one frame, weight variables RAI and RAJ are introduced. If the current analysis frame (the k-th) is a fully voiced frame, then RAI = RAC when GPMIN ≤ AP_i ≤ GPMAX, and RAJ = RAC when GPMIN ≤ AP_j ≤ GPMAX. RAC is, for example, 0.34375. Conversely, if the current analysis frame is an unvoiced frame or a transition frame to voiced sound, RAI = RAJ = 0.

[0032] (5) When the normalized average amplitude difference function has the same magnitude at two candidates, the smaller pitch is more likely to be the correct one. Weight variables WAI and WAJ are therefore introduced: WAI = AP_i · RAWGT and WAJ = AP_j · RAWGT. The constant RAWGT is, for example, 1/512.

[0033] (6) If RAMIN_i + WAI - RAI > RAMIN_j + WAJ - RAJ, go to step (8). Otherwise, that is, if RAMIN_i + WAI - RAI ≤ RAMIN_j + WAJ - RAJ, go to step (7).

[0034] (7) Set IP_k = AP_i, DELTRA = DELTRA_i, and j = i.

[0035] (8) If candidates AP_i remain, return to step (4) and repeat the processing.

[0036] (9) When the selection has been completed for all AP_i, the final IP_k is the pitch sought, and it is output to the pitch output terminal 14.
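Steps (1) to (9) amount to choosing the candidate with the smallest weighted score, RAMIN + AP · RAWGT - RA. The Python sketch below is one way to express this, using the example constants from the text. The function name, the tuple layout of `candidates`, and the choice of a zero bonus whenever the text does not set the weight to RAC are assumptions.

```python
def select_pitch(candidates, guide_pitch, fully_voiced,
                 rac=0.34375, rawgt=1.0 / 512.0):
    """Pick the pitch IP_k from candidates following steps (1) to (9).

    candidates: list of (ap, ramin, deltra) tuples, one per AMDF minimum.
    Returns (ip, deltra) of the selected candidate.
    """
    if not candidates:
        raise ValueError("at least one pitch candidate is required")

    # Step (1): wider relative search band for short guide pitches.
    if guide_pitch < 32:
        gpc = 1.375
    elif guide_pitch < 64:
        gpc = 1.25
    else:
        gpc = 1.1875

    # Step (2): expected pitch range around the guide pitch.
    gpmin = guide_pitch / gpc
    gpmax = guide_pitch * gpc

    def score(ap, ramin):
        # Step (4): bonus RAC inside the expected range, fully voiced frames only.
        # Assumption: the bonus is 0 in every other case.
        bonus = rac if fully_voiced and gpmin <= ap <= gpmax else 0.0
        # Step (5): penalty proportional to the lag, favoring shorter pitches.
        return ramin + ap * rawgt - bonus

    best = 0  # Step (3): index j of the current best candidate (0-based here).
    for i, (ap, ramin, _) in enumerate(candidates):
        ap_j, ramin_j, _ = candidates[best]
        # Steps (6) and (7): i replaces j unless its score is strictly larger.
        if score(ap, ramin) <= score(ap_j, ramin_j):
            best = i
    ap, _, deltra = candidates[best]
    return ap, deltra
```

With candidates at lags 40 and 80 and a guide pitch of 40, the bonus in a fully voiced frame selects lag 40 even when the minimum at lag 80 is deeper, whereas without the bonus the deeper minimum at lag 80 wins.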

[0037] In this case, as shown in FIG. 2 for example, the conventional AMDF method would extract AP_2 as the pitch because RAMN_2 is the smallest sampled value. In the present invention, interpolation yields the true minima RAMIN_1 and RAMIN_2, which shows that RAMIN_1 is the true minimum. AP_1 is therefore extracted as the pitch, and erroneous extraction of AP_2, which would give the double pitch, is prevented. Furthermore, in a fully voiced frame, as shown in FIG. 3 for example, the introduction of the guide pitch and the weight variables allows the correct pitch AP_i, rather than the double pitch AP_j, to be extracted as the pitch IP_k of the k-th frame even when RAMIN_i > RAMIN_j.

[0038] The spectral envelope parameter K_1 and the least square error E_p obtained as described above, together with the minimum/maximum difference DELTRA, are input to the voiced/unvoiced discriminator 13. For an analysis order of p = 8, the discrimination is made according to the following criteria. The numerical values in these criteria are examples.
(a) If K_1 ≥ 0.9 and DELTRA ≥ 0.15625, the frame is voiced.
(b) If K_1 ≥ 0.7, E_p ≤ 0.5, and DELTRA ≥ 0.19375, the frame is voiced.
(c) If K_1 ≥ 0.4, E_p ≤ 0.3, and DELTRA ≥ 0.23125, the frame is voiced.
(d) If K_1 ≥ 0, E_p ≤ 0.2, and DELTRA ≥ 0.26875, the frame is voiced.
(e) If E_p < 0.7, DELTRA ≥ 0.30625, and E_p < DELTRA + 0.15, the frame is voiced.
(f) If E_p ≥ 0.7, DELTRA ≥ 0.34375, and E_p < DELTRA + 0.3, the frame is voiced.
(g) If E_p is extremely small (for example, E_p < 0.001), the frame is unvoiced even if any of (a) to (f) is satisfied.
(h) If the speech signal level is extremely low, the frame is unvoiced even if any of (a) to (f) is satisfied.
(i) A frame that satisfies none of the above conditions is unvoiced.
The discrimination result is output from the voiced/unvoiced discrimination parameter output terminal 15.
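The criteria above can be written as a short Python predicate, shown here as a sketch using the example thresholds from the text. The function name, the `level` argument, and the `level_floor` threshold are assumptions; the text does not give a numerical value for an "extremely low" signal level.

```python
def is_voiced(k1, ep, deltra, level, level_floor=1e-3, ep_floor=0.001):
    """Return True if the frame is judged voiced under criteria (a) to (i).

    level and level_floor are hypothetical: they stand for a frame level
    measure and the threshold below which the frame is treated as silent.
    """
    # Criteria (g) and (h): vetoes that override (a) to (f).
    if ep < ep_floor or level < level_floor:
        return False
    return (
        (k1 >= 0.9 and deltra >= 0.15625)                          # (a)
        or (k1 >= 0.7 and ep <= 0.5 and deltra >= 0.19375)         # (b)
        or (k1 >= 0.4 and ep <= 0.3 and deltra >= 0.23125)         # (c)
        or (k1 >= 0.0 and ep <= 0.2 and deltra >= 0.26875)         # (d)
        or (ep < 0.7 and deltra >= 0.30625 and ep < deltra + 0.15) # (e)
        or (ep >= 0.7 and deltra >= 0.34375 and ep < deltra + 0.3) # (f)
    )  # (i): everything else is unvoiced
```

The thresholds form a staircase: the weaker the spectral evidence from K_1 and E_p, the larger the DELTRA required to call the frame voiced.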

[0039] Even in an unvoiced section, as shown in FIG. 4 for example, the magnitude of the normalized average amplitude difference function is often such that the conventional method would judge the frame to be voiced. The present invention introduces the difference DELTRA between the true minimum and the nearest maximum of the normalized average amplitude difference function. Because DELTRA is small in the example of FIG. 4, this frame can be judged to be unvoiced in combination with the other judgment conditions.

[0040] Also, as shown in FIG. 5, even when RAMIN is so large that the conventional judgment method would judge the frame to be unvoiced, the present invention can judge it to be voiced, in combination with the other conditions, if DELTRA is sufficiently large.

[0041]

As described above, in the speech analyzer according to claim 1 of the present invention, the local minima and the nearest local maxima of the normalized average amplitude difference function are detected by the detector, and the true minimum is obtained from each detected minimum by the interpolator and input to the pitch detector. In addition, a guide pitch giving the expected region of the pitch is obtained by the guide pitch calculator from the pitch detected by the pitch detector and is input to the pitch detector, and weight variables are introduced into the pitch detector. The pitch extraction error rate can therefore be markedly reduced, and deterioration of sound quality can be prevented as far as possible.

[0042] In the speech analyzer according to claim 2 of the present invention, the speech signal converted by the analog-to-digital converter, the parameters extracted by the spectral envelope parameter extractor, and the difference between the true minimum and the nearest maximum of the normalized average amplitude difference function corresponding to the pitch detected by the pitch detector are input to the voiced/unvoiced discriminator. The voiced/unvoiced discrimination error rate can therefore be markedly reduced, and deterioration of sound quality can be prevented as far as possible.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention.

FIG. 2 is a characteristic diagram of the normalized average amplitude difference function versus delay, showing an example of the effect of interpolating the local minimum.

FIG. 3 is a characteristic diagram of the normalized average amplitude difference function versus delay, showing an example of the effect of the weight variables in a fully voiced frame.

FIG. 4 is a characteristic diagram of the normalized average amplitude difference function versus delay, showing an example in which an unvoiced sound is easily misjudged as voiced.

FIG. 5 is a characteristic diagram of the normalized average amplitude difference function versus delay, showing an example in which a voiced sound is easily misjudged as unvoiced.

[Explanation of symbols]

1: speech signal input terminal
2: first low-pass filter
3: analog-to-digital converter
4: spectral envelope parameter extractor
5: analysis filter
6: second low-pass filter
7: AMDF calculator
8: AMDF minimum/maximum detector
9: AMDF minimum interpolator
10: AMDF minimum/maximum difference calculator
11: pitch detector
12: guide pitch calculator
13: voiced/unvoiced discriminator

Claims

(57) [Claims]

1. A speech analyzer comprising: an input terminal for a speech signal; a first low-pass filter; an analog-to-digital converter; a spectral envelope parameter extractor; a second low-pass filter; an average amplitude difference function calculator that calculates a normalized average amplitude difference function of speech waveform data of a fixed time length (one frame) digitized at a predetermined sampling period through these components; and a pitch detector that detects the pitch of the fundamental frequency of the speech signal from the delay at which the normalized average amplitude difference function obtained by the calculator takes a local minimum, the speech analyzer further comprising: an average amplitude difference function minimum/maximum detector that detects the local minima and the nearest local maxima of the normalized average amplitude difference function obtained by the average amplitude difference function calculator; an interpolator that obtains a true minimum from an approximating function of second or higher order fitted to three or more detected values including the local minimum detected by the detector; and a guide pitch calculator that obtains a guide pitch (GP_k) approximating the pitch (IP_k) of the frame currently being analyzed from the pitch (IP_{k-1}) of the previous frame, detected by the pitch detector and output to a pitch output terminal, and from the previous guide pitch (GP_{k-1}), wherein the true minimum obtained by the interpolator is input to the pitch detector, the obtained guide pitch (GP_k) is input to the pitch detector, and the pitch detector detects the pitch (IP_k) by applying different identification conditions to the pitch search range around the guide pitch (GP_k) and to the remaining search range.

2. The speech analyzer according to claim 1, further comprising: an average amplitude difference function minimum/maximum difference calculator that obtains the difference between the true minimum obtained by the interpolator and the nearest maximum, the difference (DELTRA_i) obtained by this calculator being input to the pitch detector; and a voiced/unvoiced discriminator that discriminates between voiced and unvoiced sounds under inductively derived discrimination conditions, using as discrimination parameters the speech signal converted by the analog-to-digital converter, the spectral envelope parameter (K_1) and the least square error (E_p) extracted by the spectral envelope parameter extractor, and the difference (DELTRA) between the true minimum and the nearest maximum of the normalized average amplitude difference function corresponding to the pitch detected by the pitch detector.