JPH0439680B2


Info

Publication number: JPH0439680B2
Application number: JP60119685A
Authority: JP
Priority date: 1985-06-04
Filing date: 1985-06-04
Publication date: 1992-06-30
Also published as: JPS61278000A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a voiced/unvoiced sound discrimination device for a speech analysis apparatus that analyzes speech by the cepstrum method.

[Conventional technology]

In general, the characteristics of speech are represented by its frequency spectrum, that is, by the distribution of the frequency components of the speech signal shown in FIG. 5A. Parameters that characterize speech are therefore physical quantities that express the spectrum in some form. The cepstrum is the set of parameters obtained by a cosine expansion of the logarithmic spectrum, and is generally expressed by equation (1).

$$S(k) = \sum_{m=0}^{M} C(m)\cos\left(\frac{2\pi}{N}km\right) \tag{1}$$

where $S(k)$ is the logarithmic spectrum, $k$ is the frequency, and $C(m)$ is the cepstrum.
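Equation (1) can be evaluated directly to rebuild a logarithmic spectrum from a set of cepstrum terms. The following is a minimal sketch, not the patent's implementation: the function name `log_spectrum_from_cepstrum`, the 16-point grid, and the cepstrum values are all hypothetical and chosen only for illustration.

```python
import math

def log_spectrum_from_cepstrum(cepstrum, n_points):
    """Evaluate S(k) = sum_{m=0}^{M} C(m) * cos(2*pi*k*m/N) for k = 0..N-1."""
    return [
        sum(c * math.cos(2.0 * math.pi * k * m / n_points)
            for m, c in enumerate(cepstrum))
        for k in range(n_points)
    ]

# Hypothetical cepstrum values C(0)..C(3), for illustration only.
example_cepstrum = [2.0, 1.0, 0.5, 0.25]
example_spectrum = log_spectrum_from_cepstrum(example_cepstrum, 16)
```

With these assumed values, the mean of `example_spectrum` over all 16 points comes out as C(0), because each cosine term of order 1 to 3 averages to zero over a full period.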

FIG. 5B shows the speech signal of FIG. 5A after Fourier analysis, expressed as a logarithmic spectrum. In equation (1), for example, the zero-order term C(0) of the cepstrum is the mean value of the logarithmic spectrum S(k), as shown in FIG. 6A, and C(1) is the first-order cosine component of S(k). That is, the logarithmic spectrum S(k) is expressed as the sum of the components of each order, as shown in FIG. 6B.

Accordingly, conventional voiced/unvoiced sound discrimination devices of this kind have decided between voiced and unvoiced sounds by applying a threshold to the zero-order term of the cepstrum, to the first-order term of the cepstrum, to the sum of squares of the cepstrum terms of each order (that is, the variance of the spectrum), or to the mean value of a certain band of the spectrum obtained by a product-sum calculation over the cepstrum orders, or by a combination of these.

As explained in Yasunaga Niimi, "Speech Recognition", pp. 70-72, the method using the zero-order term relies on the fact that the zero-order term of the cepstrum corresponds to the power of the speech, which is large for voiced sounds and small for unvoiced sounds. The method using the first-order term relies on the fact that the first-order term corresponds to the approximate slope of the spectrum (C(1) in FIG. 6B: in voiced sounds the power is concentrated in the low band, so this value becomes large). Either method can be realized as a very simple device.
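The simple conventional parameters described above can be sketched as follows. This is an illustrative reading under stated assumptions, not the cited textbook's procedure: the function names and cepstrum values are hypothetical, and excluding order 0 from the sum of squares is an assumption made so that the sum follows the variance rather than the mean.

```python
import math

def conventional_parameters(cepstrum):
    """Return (C(0), C(1), sum of squares of C(1)..C(M))."""
    c0 = cepstrum[0]  # mean of the log spectrum, used as a power-like measure
    c1 = cepstrum[1]  # first cosine component, tracks the rough spectral tilt
    # Assumption: order 0 is excluded so the sum follows the variance, not the mean.
    sum_sq = sum(c * c for c in cepstrum[1:])
    return c0, c1, sum_sq

def spectral_variance(cepstrum, n_points):
    """Variance over k of S(k) rebuilt by the cosine expansion, for cross-checking."""
    spectrum = [
        sum(c * math.cos(2.0 * math.pi * k * m / n_points)
            for m, c in enumerate(cepstrum))
        for k in range(n_points)
    ]
    mean = sum(spectrum) / n_points
    return sum((s - mean) ** 2 for s in spectrum) / n_points

# Hypothetical frame C(0)..C(3), for illustration only.
frame = [2.0, 1.0, 0.5, 0.25]
c0, c1, sum_sq = conventional_parameters(frame)
```

Under this cosine expansion, and as long as the highest order is below N/2, `spectral_variance(frame, 16)` equals half of `sum_sq`, which is why a threshold on the sum of squares acts as a threshold on the spectral variance.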

[Problem that the invention seeks to solve]

Because conventional voiced/unvoiced sound discrimination devices were configured as described above, they were simple but made many decision errors. A speech synthesizer using such a device therefore suffers degraded synthesized speech quality, and a speech recognizer using it in its preprocessing stage suffers more recognition errors. The method based on the spectral variance has the same problems. The method using the mean value of a certain band of the spectrum, on the other hand, obtains the average power of the desired band as the sum of products of the speech cepstrum and the cepstrum of a weighting function defined on the frequency axis. If the band is chosen at about 100 to 1000 Hz, where the power of voiced sounds is concentrated, discrimination errors become considerably fewer. Such a device, however, requires as many multiply-accumulate operations as there are cepstrum orders, and therefore a comparatively large amount of computation.
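The band-average method can be illustrated with a short sketch. It assumes the cosine-expansion form of the logarithmic spectrum and a rectangular weighting function; the names `weight_coefficients` and `band_average`, the 64-point grid, the choice of bins 1 to 8 as the "low band", and the cepstrum values are all hypothetical. The mapping from bins to the 100 to 1000 Hz range depends on a sampling rate that the source does not give.

```python
import math

def weight_coefficients(weights, num_orders):
    """Cosine coefficients D(m) of a frequency weighting W(k), normalised by sum(W)."""
    n_points = len(weights)
    total = sum(weights)
    return [
        sum(w * math.cos(2.0 * math.pi * k * m / n_points)
            for k, w in enumerate(weights)) / total
        for m in range(num_orders)
    ]

def band_average(cepstrum, coeffs):
    """Product-sum over cepstrum orders: one multiply-accumulate per order."""
    return sum(c * d for c, d in zip(cepstrum, coeffs))

# Hypothetical setup: 64 spectrum points, rectangular weighting on bins 1..8.
N_POINTS = 64
weights = [1.0 if 1 <= k <= 8 else 0.0 for k in range(N_POINTS)]
frame = [2.0, 1.0, 0.5, 0.25, 0.1]                 # hypothetical cepstrum C(0)..C(4)
coeffs = weight_coefficients(weights, len(frame))  # can be computed once, offline
low_band_mean = band_average(frame, coeffs)
```

The coefficients depend only on the weighting function, so they can be precomputed, but each frame still costs one multiplication per cepstrum order, which is the computational burden the text refers to.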

The present invention was made to solve the problems described above. Its object is to obtain a voiced/unvoiced sound discrimination device that makes few discrimination errors with a small amount of computation, by providing an addition circuit that adds the low-order terms of the cepstrum and a threshold comparison circuit that compares the resulting sum with a threshold.

[Means for solving problems]

The voiced/unvoiced sound discrimination device according to the present invention comprises an addition circuit that calculates the sum of the low-order terms of the cepstrum coefficients obtained from a cepstrum analysis device, and a threshold comparison circuit that compares the output of the addition circuit with a threshold. The sound is judged to be voiced if the sum is at or above the threshold and unvoiced if it is below the threshold, yielding the voiced/unvoiced discrimination result.
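The two circuits described here, an adder followed by a threshold comparator, can be sketched in software as below. This is only an illustration: the function names, the threshold of 1.0, and both example frames are hypothetical, and treating a sum exactly equal to the threshold as voiced is an assumption about the tie case.

```python
def discrimination_parameter(cepstrum, num_low_orders):
    """Adder: sum of the low-order cepstrum terms C(0)..C(num_low_orders)."""
    return sum(cepstrum[: num_low_orders + 1])

def is_voiced(cepstrum, threshold, num_low_orders=3):
    """Comparator: voiced when the sum reaches the fixed threshold."""
    # Assumption: a sum exactly equal to the threshold counts as voiced.
    return discrimination_parameter(cepstrum, num_low_orders) >= threshold

# Hypothetical frames and threshold, for illustration only.
THRESHOLD = 1.0
voiced_frame = [1.0, 0.8, 0.4, 0.2, 0.05]
unvoiced_frame = [0.5, -0.6, 0.1, -0.2, 0.3]

voiced_result = is_voiced(voiced_frame, THRESHOLD)
unvoiced_result = is_voiced(unvoiced_frame, THRESHOLD)
```

Only additions and one comparison are performed per frame; no multiplications are involved.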

[Operation]

In the present invention, voiced and unvoiced sounds are discriminated by comparing the discrimination parameter obtained by the addition circuit with a fixed threshold and judging the sound to be voiced or unvoiced according to which of the two is larger.

[Example]

An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram of the voiced/unvoiced sound discrimination device. In the figure, 1 is the cepstrum of the speech obtained by the analysis device, 2 is an addition circuit that adds the terms of the cepstrum, 3 is the discrimination parameter, 4 is a threshold comparison circuit that compares the discrimination parameter 3 obtained by the addition circuit 2 with a fixed threshold, and 5 is the voiced/unvoiced discrimination result.

FIG. 2 is an explanatory diagram showing an example of the relationship between the speech spectrum and the discrimination parameter in the voiced/unvoiced sound discrimination device of FIG. 1.

Next, the operation of the invention is described. First, the logarithmic spectrum $S(k)$ of the speech is expressed in terms of the cepstrum $C(m)$ ($m = 0, 1, \ldots, M$) by equation (2):

$$S(k) = \sum_{m=0}^{M} C(m)\cos\left(\frac{2\pi}{N}km\right) \tag{2}$$

where $k = 0, 1, \ldots, N-1$. This logarithmic spectrum $S(k)$ is shown as the speech spectrum 11 in FIG. 2. If, in contrast, only the very lowest orders $m$ of the cepstrum are retained, equation (3) results:

$$S_0(k) = \sum_{m=0}^{M_0} C(m)\cos\left(\frac{2\pi}{N}km\right) \tag{3}$$

which gives the spectrum 12, smoothed in the sense of a cosine series expansion. The value of this spectrum at frequency 0, that is, the discrimination parameter 13, is given by equation (4) and can be expressed as a sum of cepstrum terms:

$$P = \sum_{m=0}^{M_0} C(m) \tag{4}$$

If the upper limit of this sum is chosen to be about 3 to 4, the value P is nearly the same as the power of the discrimination low band 14 of the original speech spectrum 11, the band in which the power of voiced sounds is concentrated, which the conventional device obtained as a sum of products over all order terms of the cepstrum. Accordingly, the addition circuit 2 in FIG. 1 calculates, as shown in FIG. 3, the sum of the first few low-order terms of the cepstrum 1 obtained by a cepstrum analysis device (not shown), and thereby obtains the discrimination parameter 3 represented by P in equation (4). The discrimination parameter 13 obtained by the addition circuit 2 is represented as shown in FIG. 4, where X denotes unvoiced sounds and Y denotes voiced sounds. The threshold comparison circuit 4 then compares the parameter with the threshold $k_p$: the sound is judged to be voiced if the parameter is at or above $k_p$ and unvoiced if it is below $k_p$, and the voiced/unvoiced discrimination result 5 is obtained.
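The step from the smoothed spectrum of equation (3) to the plain sum of equation (4) is a single substitution of frequency 0. The short derivation below spells it out using the symbols of equations (3) and (4); it adds nothing beyond what those equations already imply.

```latex
S_0(0) = \sum_{m=0}^{M_0} C(m)\cos\left(\frac{2\pi}{N}\cdot 0\cdot m\right)
       = \sum_{m=0}^{M_0} C(m)\cdot 1
       = P
```

Because every cosine factor equals 1 at frequency 0, P needs no multiplications at all: it takes M_0 additions, whereas a product-sum over all orders 0 to M takes M + 1 multiplications and M additions per frame.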

[Effect of the invention]

As described above, according to the present invention, the first few low-order terms of the cepstrum obtained by the cepstrum analysis device are fed to the addition circuit, and voiced and unvoiced sounds are discriminated by the threshold comparison circuit. This yields a parameter with nearly the same performance as the conventional discrimination parameter that required a large amount of computation, and provides the excellent effect of a high discrimination rate that could not be obtained with conventional devices.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing a voiced/unvoiced sound discrimination device according to an embodiment of the present invention. FIG. 2 is an explanatory diagram of the speech spectrum and the parameter used for the decision in the voiced/unvoiced sound discrimination device of FIG. 1. FIG. 3 is an explanatory diagram of the addition circuit. FIG. 4 is an explanatory diagram of the threshold comparison circuit. FIG. 5 is an explanatory diagram of the conventional, general cepstrum. FIG. 6 is a diagram showing the relationship between the low-order cepstrum terms and the logarithmic spectrum.

In the figures, 1 is the cepstrum, 2 is the addition circuit, 3 is the discrimination parameter, 4 is the threshold comparison circuit, and 5 is the voiced/unvoiced discrimination result.

Claims

[Claims]

1. A voiced/unvoiced sound discrimination device comprising: an addition circuit that calculates the sum of low-order terms of the cepstrum coefficients obtained by a speech cepstrum analysis device; and a threshold comparison circuit that receives the discrimination parameter obtained as the addition result of the addition circuit, compares it with a predetermined fixed threshold, and determines a voiced sound when the discrimination parameter is larger than the threshold and an unvoiced sound when it is smaller.