Voiced/unvoiced determination device

Info

Publication number
JPS5912185B2
Authority
JP
Japan
Prior art keywords
voiced
outputs
voice
unvoiced
waveform
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
JP128778A
Other languages
Japanese (ja)
Other versions
JPS5494212A (en)
Inventor
哲 田口
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
Nippon Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Electric Co Ltd
Priority to JP128778A
Publication of JPS5494212A
Publication of JPS5912185B2
Legal status: Expired

Abstract

PURPOSE: To discriminate between voiced and unvoiced speech by measuring the periodicity or temporal similarity of a correlation coefficient sequence from which the portion near zero delay, where the correlation of the ambient noise has a large effect, has been removed. CONSTITUTION: A speech waveform signal containing ambient noise is input to the A/D converter 402 via the waveform input terminal 401. The A/D converter 402 samples and quantizes the signal and outputs it to the temporary memory 403. The temporary memory 403 temporarily stores the continuously input sampled waveform signal and outputs it to the window processor 404. The window processor 404 applies a window operation to the sampled waveform signal, cuts out a segment, and outputs the segment to the correlator 405. The correlator 405 measures the correlation coefficients from a delay time at which the ambient noise has almost no effect up to an integer multiple of the maximum pitch period and outputs them to the voicing degree measuring device 406. The voicing degree measuring device 406 measures the degree of voicing from the correlation coefficients, decides between voiced and unvoiced, and outputs the decision via the output terminal 407 as the voiced/unvoiced determination signal.

Description

DETAILED DESCRIPTION OF THE INVENTION The present invention relates to a voiced/unvoiced determination device that decides whether speech is voiced or unvoiced, and in particular to a device that makes this decision reliably in a high ambient noise environment.

The voiced/unvoiced characteristic of a speech waveform is known to be an important parameter in speech analysis-synthesis, speech recognition, and similar applications.

For example, in a speech analysis-synthesis system, the voiced/unvoiced decision made in the analysis section has a large effect on the quality of the speech produced by the synthesis section. Conventionally known methods for deciding whether a speech waveform is voiced or unvoiced use the energy of the waveform, the zero-crossing rate, the autocorrelation coefficient at unit delay, the first-order linear prediction coefficient, the prediction residual power in linear prediction analysis, or a combination of these. Each has a weakness in a high ambient noise environment. With the energy method, the noise raises the energy of low-energy unvoiced sounds, so unvoiced sounds are often mistaken for voiced sounds. With the zero-crossing rate method, voiced sounds of relatively low energy, such as the so-called transitions between voiced sounds or the region near the end of a voiced word, are often mistaken for unvoiced sounds. The methods based on the autocorrelation coefficient at unit delay, the first-order linear prediction coefficient, and the prediction residual power are strongly affected because ambient noise is generally colored noise whose power spectrum is concentrated in the low-frequency range. The conventional voiced/unvoiced determination methods therefore share the drawback of being prone to decision errors in a high ambient noise environment. An object of the present invention is to provide a voiced/unvoiced determination device that reduces the influence of ambient noise and enables a more reliable voiced/unvoiced decision.

The present invention comprises means for measuring correlation coefficients and means for measuring the periodicity or temporal similarity of those correlation coefficients. The correlation coefficients may be autocorrelation coefficients, the cepstrum, polarity correlation coefficients, difference coefficients (for example, Hitoshi Fukuma and Kenichi Nakamura, "A method of speech pitch extraction by difference operation," 1977 (Showa 52) National Convention of the Information Division, Institute of Electronics and Communication Engineers), or the like. They are measured in one of three ways: directly from the speech waveform; directly from the speech waveform after it has passed through an inverse filter built from parameters that represent the speech spectrum (hereinafter the "residual waveform") (for example, Fumitada Itakura, "Speech feature extraction using statistical methods," 8th Symposium sponsored by the Research Institute of Electrical Communication, Tohoku University; Tsutomu Kobayashi and Kei Yamamoto, "On voiced/unvoiced determination using the polarity correlation method," Proceedings of the Acoustical Society of Japan Research Conference, May 1976); or by a technique equivalent to measuring them directly from the residual waveform (for example, the above "Speech feature extraction using statistical methods").

The invention exploits two properties. First, the correlation coefficients of a voiced sound are nearly periodic, and this near-period agrees well with the pitch period. Second, the correlation coefficients of unvoiced sounds, and in many cases those of ambient noise, converge as the delay increases. The voiced/unvoiced decision is therefore made by measuring the periodicity or temporal similarity of the correlation coefficient sequence after removing the portion near zero delay, where the correlation coefficients of the ambient noise have a large effect.

As a result, the defining feature of voiced sound, namely that its correlation coefficients are nearly periodic, can be extracted accurately in a high ambient noise environment, and a more accurate voiced/unvoiced decision can be made in that environment.

In general, the voiced portion of a speech waveform is nearly periodic, so the correlation waveform obtained directly from a voiced portion is also nearly periodic, as in the autocorrelation waveform shown in FIG. 1. Ambient noise, on the other hand, usually has weak periodicity. The correlation waveform obtained directly from an ambient noise waveform, such as the autocorrelation waveform shown in FIG. 2, changes greatly near zero delay and converges to a constant value as the delay increases. The convergence value is zero for autocorrelation coefficients, the cepstrum, polarity correlation coefficients, and the like, and is some positive value for difference coefficients and the like.

The cross-correlation between ambient noise and voiced sound is generally almost zero. If the variance of the cross-correlation caused by the analysis window is ignored, the correlation waveform of a voiced sound mixed with ambient noise can, taking the autocorrelation waveform as an example, be approximated as in FIG. 3 by the linear sum of the autocorrelation waveform of the voiced sound shown in FIG. 1 and the autocorrelation waveform of the ambient noise shown in FIG. 2. The same holds for other correlation coefficients. For the cepstrum, for example, the autocorrelation coefficients are the inverse Fourier transform of the power spectrum of the waveform, whereas the cepstrum is the inverse Fourier transform of the logarithm of the absolute value of the power spectrum; from this it follows that the cepstrum of a voiced sound mixed with ambient noise is approximated by the linear sum of the cepstrum of the voiced sound and the cepstrum of the ambient noise. The same applies to polarity correlation coefficients, difference coefficients, and the like.

Hence the correlation waveform, or the variation of the correlation waveform, at delay times of about the pitch period or longer essentially represents the correlation waveform of the voiced sound alone. The correlation waveform of an unvoiced sound, in contrast, generally has no periodicity. The voiced/unvoiced decision can therefore be made reliably either by measuring the similarity of the correlation waveform of the noise-contaminated speech at delays equal to the pitch period and its integer multiples, or by measuring the periodicity of the correlation waveform. Even when the correlation coefficients are obtained from the residual waveform, the periodicity of the voiced portions is preserved (see, for example, the above "On voiced/unvoiced determination using the polarity correlation method"), so they can be handled in the same way as correlation coefficients obtained directly from the speech. Next, the present invention will be described in detail with reference to the drawings.
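The additivity argument in the preceding paragraphs can be written out compactly. The symbols used here (s for the voiced speech, n for the ambient noise, x for their sum, r for a short-time autocorrelation, \tau_0 for the delay beyond which the noise term has settled, and c for its limiting value) are notation assumed for illustration; they do not appear in the original text.

```latex
\begin{align*}
x(t) &= s(t) + n(t) \\
r_x(\tau) &= r_s(\tau) + r_n(\tau) + r_{sn}(\tau) + r_{ns}(\tau) \\
          &\approx r_s(\tau) + r_n(\tau)
            && \text{(cross-correlation terms nearly zero)} \\
r_x(\tau) &\approx r_s(\tau) + c, \qquad \tau \ge \tau_0
            && \text{(noise term has converged to } c\text{)}
\end{align*}
```

Under these assumptions, c is zero for autocorrelation coefficients and positive for difference coefficients, so for delays of at least \tau_0 the periodic structure of r_x is that of r_s shifted by at most a constant.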

FIG. 4 is a block diagram showing a first embodiment of the present invention. First, a speech waveform signal containing ambient noise is input to the A/D converter 402 via the waveform input terminal 401.

The A/D converter 402 samples the waveform signal, quantizes it, and outputs it to the temporary memory 403. The temporary memory 403 temporarily stores the continuously input sampled waveform signal and outputs it to the window processor 404. The window processor 404 applies a window function, for example a rectangular window or a Hamming window, to the sampled waveform signal, cuts out a segment, and outputs the segment to the correlator 405. The correlator 405 measures the correlation coefficients from a delay time at which the ambient noise has almost no effect (for example, 2 ms) up to an integer multiple of the maximum pitch period (for example, 45 ms, which is three times 15 ms) and outputs them to the voicing degree measuring device 406.
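The window processor 404 and correlator 405 can be sketched in a few lines. This is a minimal illustration, not the patented circuit: it assumes a Hamming window, takes normalized autocorrelation as the correlation coefficient, and uses the example delay range of 2 ms to 45 ms from the text. The function names and the sampling rate parameter `fs` are hypothetical.

```python
import math

def hamming(n):
    """Hamming window of length n (assumed window choice)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def correlation_coefficients(frame, fs, min_lag_s=0.002, max_lag_s=0.045):
    """Normalized autocorrelation of one windowed frame over a restricted lag range.

    Lags below min_lag_s are skipped because that is where the ambient noise
    dominates. Returns (first_lag, rhos), where rhos[i] is the coefficient at
    a lag of first_lag + i samples.
    """
    window = hamming(len(frame))
    x = [s * w for s, w in zip(frame, window)]
    energy = sum(v * v for v in x)
    lo = int(round(min_lag_s * fs))
    hi = min(int(round(max_lag_s * fs)), len(x) - 1)
    if energy == 0.0:
        return lo, [0.0] * (hi - lo + 1)
    rhos = [
        sum(x[n] * x[n + p] for n in range(len(x) - p)) / energy
        for p in range(lo, hi + 1)
    ]
    return lo, rhos
```

For a 100 Hz tone sampled at 8 kHz, the returned sequence starts at a lag of 16 samples and has a strong positive value at a lag of 80 samples, the pitch period.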

The voicing degree measuring device 406 measures the degree of voicing from the correlation coefficients, decides between voiced and unvoiced, and outputs the decision as a voiced/unvoiced determination signal via the voiced/unvoiced determination signal output terminal 407. Next, a first configuration example of the voicing degree measuring device 406 will be described in detail with reference to the drawings.

FIG. 5 is a block diagram showing a first configuration example of the voicing degree measuring device 406. This configuration example is effective for correlation coefficients such as autocorrelation coefficients, the cepstrum, and polarity correlation coefficients. The correlation coefficients measured by the correlator 405 are input to the similarity measuring device 502 via the correlation coefficient input terminal 501.

The similarity measuring device 502 measures the similarity R(K), for example by the following expression (the expression itself is missing from this text).

Here ρP is the correlation coefficient at delay P. K is the temporal distance between delay times and ranges from the minimum of the pitch search range (for example, 2.5 ms) to the maximum of the pitch search range (for example, 15 ms). N is a delay time at which the ambient noise has almost no effect, and M is a delay time shorter than the integer multiple of the maximum pitch period (for example, 45 ms, which is three times 15 ms). The similarity measuring device 502 outputs the measured R(K) to the maximum value searcher 503. The maximum value searcher 503 searches for the maximum value RMAX of R(K) and outputs it to the presence/absence determiner 504. The presence/absence determiner 504 compares RMAX with a preset determination reference value, decides that the sound is voiced if RMAX is larger than the reference value and unvoiced if it is smaller, and outputs the decision as a voiced/unvoiced determination signal via the voiced/unvoiced determination signal output terminal 505. Next, a second configuration example of the voicing degree measuring device 406 will be described in detail with reference to the drawings.
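The first configuration example (FIG. 5) can be sketched as follows. The expression for R(K) is not available in this text, so the sketch assumes one plausible form, the mean of the lagged products ρP·ρ(P+K) for P from N to M; this is an assumption, not the patent's formula. The function names and the default threshold of 0.25 are likewise hypothetical. `rho` is taken to be a list indexed by delay in samples.

```python
def similarity(rho, k, n_start, m_end):
    """Assumed R(K): mean of rho[P] * rho[P + K] for P = n_start .. m_end."""
    count = m_end - n_start + 1
    return sum(rho[p] * rho[p + k] for p in range(n_start, m_end + 1)) / count

def is_voiced(rho, k_min, k_max, n_start, m_end, threshold=0.25):
    """Search RMAX over the pitch range and compare it with a preset threshold.

    Returns (decision, r_max); decision is True for voiced.
    Requires m_end + k_max < len(rho).
    """
    r_max = max(similarity(rho, k, n_start, m_end) for k in range(k_min, k_max + 1))
    return r_max > threshold, r_max
```

At an 8 kHz sampling rate, the example values in the text correspond to `n_start=16` (2 ms), `k_min=20` (2.5 ms), `k_max=120` (15 ms), and `m_end=240`, so that the largest index used, 360, stays within 45 ms. A nearly periodic `rho` gives an RMAX near 0.5 under this assumed form, while a sequence that decays toward zero gives an RMAX near zero.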

FIG. 6 is a block diagram showing a second configuration example of the voicing degree measuring device 406. This configuration example is effective for correlation coefficients such as autocorrelation coefficients, the cepstrum, and polarity correlation coefficients. The correlation coefficients measured by the correlator 405 are input to the similarity measuring device 602 and the maximum value searcher 605 via the correlation coefficient input terminal 601.

The similarity measuring device 602 measures the similarity S(K), for example by the following expression (the expression itself is missing from this text).

Here ρP is the correlation coefficient at delay P. K is the temporal distance between delay times and ranges from the minimum of the pitch search range (for example, 2.5 ms) to the maximum of the pitch search range (for example, 15 ms). N is a delay time at which the ambient noise has almost no effect, and M is a delay time shorter than the integer multiple of the maximum pitch period (for example, 45 ms). The similarity measuring device 602 outputs the measured S(K) to the minimum value searcher 603. The minimum value searcher 603 searches for the minimum value SMIN of S(K) and outputs it to the presence/absence determiner 604. The maximum value searcher 605 searches for the maximum value ρMAX of the correlation coefficients supplied via the correlation coefficient input terminal 601 and outputs it to the determination signal generator 606. The determination signal generator 606 generates a determination reference value signal that increases monotonically as ρMAX increases and outputs it to the presence/absence determiner 604. The presence/absence determiner 604 compares SMIN, supplied from the minimum value searcher 603, with the determination reference value signal supplied from the determination signal generator 606, decides that the sound is voiced if SMIN is smaller than the reference value and unvoiced if it is larger, and outputs the decision as a voiced/unvoiced determination signal via the voiced/unvoiced determination signal output terminal 607.

Next, a third configuration example of the voicing degree measuring device 406 will be described in detail with reference to the drawings. FIG. 7 is a block diagram showing the third configuration example of the voicing degree measuring device 406. This configuration example is effective for correlation coefficients such as autocorrelation coefficients, the cepstrum, and polarity correlation coefficients. The correlation coefficients measured by the correlator 405 are input to the pitch extractor 702 and the integer-multiple pitch searcher 703 via the correlation coefficient input terminal 701.

The pitch extractor 702 obtains the pitch period, and ρMAX, the correlation coefficient at the pitch period, by a well-known pitch extraction method based on correlation coefficients. It outputs the pitch period to the integer-multiple pitch searcher 703 and ρMAX to the presence/absence determiner 704. Using the pitch period information supplied from the pitch extractor 702, the integer-multiple pitch searcher 703 searches the correlation coefficients supplied via the correlation coefficient input terminal 701 for the maximum correlation value near a delay equal to an integer multiple of the pitch period, for example ρ2MAX near twice the pitch period, and outputs ρ2MAX and any similar values to the presence/absence determiner 704. The presence/absence determiner 704 measures the ratio of ρ2MAX (or the like) to ρMAX, decides that the sound is voiced if the ratio is larger than a preset determination reference value and unvoiced if it is smaller, and outputs the decision as a voiced/unvoiced determination signal via the voiced/unvoiced determination signal output terminal 705.

Next, a fourth configuration example of the voicing degree measuring device 406 will be described in detail with reference to the drawings. FIG. 8 is a block diagram showing the fourth configuration example of the voicing degree measuring device 406. This configuration example is effective for correlation coefficients such as difference coefficients as well as for autocorrelation coefficients, the cepstrum, polarity correlation coefficients, and the like. The correlation coefficients measured by the correlator 405 are input via the correlation coefficient input terminal 801 to the local minimum extractor 802, the pitch-delayed local minimum extractor 803, the local maximum extractor 804, the pitch-delayed local maximum extractor 805, and the pitch extractor 806.

The local minimum extractor 802 searches the correlation coefficient sequence, at delay times where the ambient noise has almost no effect, for the local minimum ρMIN and the delay time TMIN at which it occurs. It outputs ρMIN to the presence/absence determiner 807 and TMIN to the pitch-delayed local minimum extractor 803. The pitch extractor 806 obtains the pitch period Tp by a well-known pitch extraction method based on correlation coefficients and outputs it to the pitch-delayed local minimum extractor 803 and the pitch-delayed local maximum extractor 805. The pitch-delayed local minimum extractor 803 searches the correlation coefficient sequence near TMIN + Tp for the local minimum ρMIND and outputs it to the presence/absence determiner 807. The local maximum extractor 804 searches the correlation coefficient sequence, at delay times where the ambient noise has almost no effect, for the local maximum ρMAX and the delay time TMAX at which it occurs. It outputs ρMAX to the presence/absence determiner 807 and TMAX to the pitch-delayed local maximum extractor 805. The pitch-delayed local maximum extractor 805 computes TMAX + Tp from TMAX and the Tp supplied from the pitch extractor 806, searches the correlation coefficient sequence near the computed value for the local maximum ρMAXD, and outputs it to the presence/absence determiner 807. The presence/absence determiner 807 measures the presence/absence degree V, for example by the following expression (the expression itself is missing from this text).

The presence/absence determiner 807 then compares V with, for example, a preset determination reference value, decides that the sound is voiced if V is larger than the reference value and unvoiced if it is smaller, and outputs the decision as a voiced/unvoiced determination signal via the voiced/unvoiced determination signal output terminal 808. Next, a second embodiment of the present invention will be described in detail with reference to the drawings.
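The pitch-multiple checks of the third and fourth configuration examples (FIG. 7 and FIG. 8) can be sketched as follows. The ratio ρ2MAX/ρMAX of the third example is stated in the text. The expression for V in the fourth example is not available, so the sketch assumes V = (ρMAXD − ρMIND) / (ρMAX − ρMIN), the peak-to-trough swing one pitch period later relative to the original swing; that form, the search tolerance `tol`, and all function names are assumptions made for illustration. `rho` is a list indexed by delay in samples, and the pitch period is taken as given.

```python
def peak_near(rho, center, tol, pick=max):
    """Apply pick (max or min) to rho over lags center-tol .. center+tol, clamped."""
    last = len(rho) - 1
    lo = max(0, min(center - tol, last))
    hi = max(lo, min(center + tol, last))
    return pick(rho[lo:hi + 1])

def ratio_at_pitch_multiple(rho, pitch, tol=2, multiple=2):
    """Third configuration: rho2MAX / rhoMAX, with rhoMAX taken at the pitch lag."""
    rho_max = rho[pitch]
    if rho_max <= 0.0:
        return 0.0
    return peak_near(rho, multiple * pitch, tol) / rho_max

def voicing_degree(rho, lo, hi, pitch, tol=2):
    """Fourth configuration with an assumed V (see the lead-in above).

    lo..hi is the lag range searched for rhoMAX and rhoMIN.
    """
    t_max = max(range(lo, hi + 1), key=lambda p: rho[p])
    t_min = min(range(lo, hi + 1), key=lambda p: rho[p])
    swing = rho[t_max] - rho[t_min]
    if swing <= 0.0:
        return 0.0
    rho_maxd = peak_near(rho, t_max + pitch, tol, max)
    rho_mind = peak_near(rho, t_min + pitch, tol, min)
    return (rho_maxd - rho_mind) / swing
```

Because the assumed V uses only differences of correlation values, a constant offset in `rho` cancels out, which is consistent with the statement that the fourth configuration also suits difference coefficients, whose noise term converges to a positive value rather than zero.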

FIG. 9 is a block diagram showing a second embodiment of the present invention. First, a speech waveform signal containing ambient noise is input to the A/D converter 902 via the waveform input terminal 901.

The A/D converter 902 samples the waveform signal, quantizes it, and outputs it to the temporary memory 903. The temporary memory 903 temporarily stores the continuously input sampled waveform signal and outputs it to the window processor 904. The window processor 904 applies a window function, for example a rectangular window or a Hamming window, to the sampled waveform signal, cuts out a segment, and outputs the segment to the spectrum parameter extractor 905 and the inverse filter 906. The spectrum parameter extractor 905 extracts spectrum parameters by linear prediction or a similar method and outputs them to the inverse filter 906. The inverse filter 906 forms the inverse filter described by those spectrum parameters, filters the waveform supplied from the window processor 904, and outputs the result to the correlator 907. The correlator 907 measures the correlation coefficients from a delay time at which the ambient noise has almost no effect (for example, 2 ms) up to an integer multiple of the maximum pitch period (for example, 45 ms, which is three times 15 ms) and outputs them to the voicing degree measuring device 908. The voicing degree measuring device 908 measures the degree of voicing from the correlation coefficients, decides between voiced and unvoiced, and outputs the decision as a voiced/unvoiced determination signal via the voiced/unvoiced determination signal output terminal 909. The voicing degree measuring device 908 is equivalent to the voicing degree measuring device 406 of FIG. 4 described in the first embodiment.
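The spectrum parameter extractor 905 and inverse filter 906 can be illustrated with ordinary linear prediction. The sketch below assumes the autocorrelation method solved by the Levinson-Durbin recursion, which is one common choice; the text says only "linear prediction or a similar method", so this is not necessarily what the patent uses. The function names are hypothetical.

```python
def autocorr(x, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of a frame."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(max_lag + 1)]

def lpc(x, order):
    """Levinson-Durbin recursion.

    Returns a[0..order] with a[0] = 1, the coefficients of the inverse filter
    A(z) = sum of a[i] * z**-i.
    """
    r = autocorr(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate frame (e.g. all zeros); keep what we have
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a

def inverse_filter(x, a):
    """Residual e[n] = sum of a[i] * x[n - i], treating samples before the frame as zero."""
    return [
        sum(a[i] * x[n - i] for i in range(len(a)) if n - i >= 0)
        for n in range(len(x))
    ]
```

The residual returned by `inverse_filter` would then be passed to the correlator in place of the speech frame. Flattening the spectral envelope this way leaves the pitch periodicity in the residual, which is the property the second embodiment relies on.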

It is also clear that the present invention can be practiced with the spectrum parameter extractor 905 and the inverse filter 906 moved to a position between the A/D converter 902 and the temporary memory 903, or between the temporary memory 903 and the window processor 904.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a waveform diagram illustrating the characteristics of the autocorrelation waveform of a voiced portion. FIG. 2 is a waveform diagram illustrating the characteristics of the autocorrelation waveform of ambient noise. FIG. 3 is a waveform diagram illustrating the characteristics of the autocorrelation waveform of a voiced portion mixed with ambient noise.

FIG. 4 is a block diagram illustrating the first embodiment of the present invention. In FIG. 4, 401 is a waveform input terminal, 402 is an A/D converter, 403 is a temporary memory, 404 is a window processor, 405 is a correlator, 406 is a voicing degree measuring device, and 407 is a voiced/unvoiced determination signal output terminal.

FIG. 5 is a block diagram illustrating the first configuration example of the voicing degree measuring device 406. In FIG. 5, 501 is a correlation coefficient input terminal, 502 is a similarity measuring device, 503 is a maximum value searcher, 504 is a presence/absence determiner, and 505 is a voiced/unvoiced determination signal output terminal.

FIG. 6 is a block diagram illustrating the second configuration example of the voicing degree measuring device 406. In FIG. 6, 601 is a correlation coefficient input terminal, 602 is a similarity measuring device, 603 is a minimum value searcher, 604 is a presence/absence determiner, 605 is a maximum value searcher, 606 is a determination signal generator, and 607 is a voiced/unvoiced determination signal output terminal.

FIG. 7 is a block diagram illustrating the third configuration example of the voicing degree measuring device 406. In FIG. 7, 701 is a correlation coefficient input terminal, 702 is a pitch extractor, 703 is an integer-multiple pitch searcher, 704 is a presence/absence determiner, and 705 is a voiced/unvoiced determination signal output terminal.

FIG. 8 is a block diagram illustrating the fourth configuration example of the voicing degree measuring device 406. In FIG. 8, 801 is a correlation coefficient input terminal, 802 is a local minimum extractor, 803 is a pitch-delayed local minimum extractor, 804 is a local maximum extractor, 805 is a pitch-delayed local maximum extractor, 806 is a pitch extractor, 807 is a presence/absence determiner, and 808 is a voiced/unvoiced determination signal output terminal.

FIG. 9 is a block diagram illustrating the second embodiment of the present invention. In FIG. 9, 901 is a waveform input terminal, 902 is an A/D converter, 903 is a temporary memory, 904 is a window processor, 905 is a spectrum parameter extractor, 906 is an inverse filter, 907 is a correlator, 908 is a voicing degree measuring device, and 909 is a voiced/unvoiced determination signal output terminal.

Claims (1)

[Claims]
1. A voiced/unvoiced determination device for determining whether speech is voiced or unvoiced, characterized by comprising: means for measuring correlation coefficients from a speech waveform; and means for measuring the periodicity or temporal similarity of the correlation coefficients at and after a fixed delay time.
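
The two means of the claim can be illustrated with a minimal Python sketch. It reflects one possible reading only: the correlation coefficients below a minimum lag are discarded, and the remaining sequence is compared with shifted copies of itself, so that a high similarity at some shift indicates periodicity. The names `voicing_score` and `is_voiced`, the cosine-similarity measure, the choice of twice the maximum pitch period as the largest lag, and all numeric defaults are assumptions, not details confirmed by the patent. Windowing of the frame is omitted for brevity.

```python
import numpy as np

def correlation_coefficients(frame, top_lag):
    """Autocorrelation r[0..top_lag], normalized so that r[0] == 1."""
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()
    energy = float(np.dot(x, x))
    if energy == 0.0:
        return np.zeros(top_lag + 1)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) for k in range(top_lag + 1)]) / energy

def voicing_score(frame, min_lag, min_period, max_period):
    """Best similarity between the coefficient sequence (lags >= min_lag)
    and a copy of itself shifted by a candidate pitch period."""
    top_lag = 2 * max_period  # assumed: two times the longest pitch period
    if len(frame) <= top_lag:
        raise ValueError("frame must be longer than the largest lag examined")
    r = correlation_coefficients(frame, top_lag)
    best_score, best_shift = -1.0, None
    for shift in range(min_period, max_period + 1):
        a = r[min_lag: top_lag + 1 - shift]   # coefficients after the fixed delay
        b = r[min_lag + shift: top_lag + 1]   # same span, one candidate period later
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        score = float(np.dot(a, b) / denom) if denom > 0 else 0.0
        if score > best_score:
            best_score, best_shift = score, shift
    return best_score, best_shift

def is_voiced(frame, min_lag=16, min_period=20, max_period=160, threshold=0.5):
    """Voiced if the coefficient sequence repeats itself strongly enough."""
    score, _ = voicing_score(frame, min_lag, min_period, max_period)
    return score > threshold
```

Skipping the lags below `min_lag` mirrors the idea stated in the abstract that the region around zero delay is the part most affected by ambient noise. In practice the threshold would need tuning on real speech.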
JP128778A 1978-01-09 1978-01-09 Voiced/unvoiced determination device Expired JPS5912185B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP128778A JPS5912185B2 (en) 1978-01-09 1978-01-09 Voiced/unvoiced determination device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP128778A JPS5912185B2 (en) 1978-01-09 1978-01-09 Voiced/unvoiced determination device

Publications (2)

Publication Number Publication Date
JPS5494212A JPS5494212A (en) 1979-07-25
JPS5912185B2 JPS5912185B2 (en) 1984-03-21

Family

ID=11497228

Family Applications (1)

Application Number Title Priority Date Filing Date
JP128778A Expired JPS5912185B2 (en) 1978-01-09 1978-01-09 Voiced/unvoiced determination device

Country Status (1)

Country Link
JP (1) JPS5912185B2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6118989A (en) * 1984-07-05 1986-01-27 松下電器産業株式会社 Display
JPS6263989A (en) * 1985-09-17 1987-03-20 松下電器産業株式会社 Display unit

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS56104399A (en) * 1980-01-23 1981-08-20 Hitachi Ltd Voice interval detection system
JPS56111900A (en) * 1980-02-12 1981-09-03 Nippon Electric Co System discriminating existence of input voice
JPS5772199A (en) * 1980-10-23 1982-05-06 Tokyo Shibaura Electric Co Voice recognition device
US4731846A (en) * 1983-04-13 1988-03-15 Texas Instruments Incorporated Voice messaging system with pitch tracking based on adaptively filtered LPC residual signal
GB2139052A (en) * 1983-04-20 1984-10-31 Philips Electronic Associated Apparatus for distinguishing between speech and certain other signals
JP4490090B2 (en) * 2003-12-25 2010-06-23 株式会社エヌ・ティ・ティ・ドコモ Sound / silence determination device and sound / silence determination method

Also Published As

Publication number Publication date
JPS5494212A (en) 1979-07-25

Similar Documents

Publication Publication Date Title
KR100438826B1 (en) System for speech synthesis using a smoothing filter and method thereof
US4956865A (en) Speech recognition
Wong et al. Comparison of linear prediction cepstrum coefficients and mel-frequency cepstrum coefficients for language identification
US4074069A (en) Method and apparatus for judging voiced and unvoiced conditions of speech signal
JP3423906B2 (en) Voice operation characteristic detection device and detection method
JPS597120B2 (en) speech analysis device
JPH0990974A (en) Signal processor
JPS5912185B2 (en) Voiced/unvoiced determination device
KR940002437B1 (en) Speech recognition method and device
JP2016042152A (en) Voice recognition device and program
Zhao et al. A processing method for pitch smoothing based on autocorrelation and cepstral F0 detection approaches
JP2797861B2 (en) Voice detection method and voice detection device
JP2940835B2 (en) Pitch frequency difference feature extraction method
KR100526110B1 (en) Method and System for Pith Synchronous Feature Generation of Speaker Recognition System
CN106920558A (en) Keyword recognition method and device
RU2174714C2 (en) Method for separating the basic tone
JPH04100099A (en) Voice detector
Boll et al. Robust syntax free speech recognition
JP2001042889A (en) Device for normalizing interval of inputted voice for voice recognition
Shimodaira et al. Robust pitch detection by narrow band spectrum analysis
JP2001083978A (en) Speech recognition device
JPS6151320B2 (en)
RU2807170C2 (en) Dialog detector
KR100345402B1 (en) An apparatus and method for real - time speech detection using pitch information
Hu et al. Efficient estimation of perceptual features for speech recognition.