JPH07135490A

JPH07135490A - Voice detector and vocoder having voice detector

Info

Publication number: JPH07135490A
Application number: JP5282595A
Authority: JP
Inventors: Osamu Watanabe; 治渡辺; Seiji Sasaki; 誠司佐々木
Original assignee: Kokusai Electric Corp
Current assignee: Kokusai Electric Corp
Priority date: 1993-11-11
Filing date: 1993-11-11
Publication date: 1995-05-23

Abstract

PURPOSE: To achieve accurate voice detection with a small circuit scale and little processing time by making efficient use of parameters calculated in a vocoder having a perceptual weighting filter adaptor and a backward gain adaptor. CONSTITUTION: A reflection coefficient (t) calculated by the perceptual weighting filter adaptor 106 of a vocoder 100 and an excitation gain (s) calculated by the backward gain adaptor 110 are input to a voice detector 200. The voice detector 200 uses mean value calculators 201 and 204 to calculate the mean of each input over a predetermined period, and a decision circuit 205 compares the results with preset thresholds to make the voiced/silent decision. Accurate decisions are thus made in ordinary usage environments because the presence of background noise is taken into account. The processing required for the decision is small, voice can be detected in units of 10 ms frames or shorter, and the circuit scale is small.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech encoder, and more particularly to a voice detector that makes a voiced/silent decision on speech and to a speech encoder having such a voice detector.

[0002]

[Prior Art] One application of a voice detector is the following. In portable radio equipment and the like, VOX (Voice Operate Switch Exchange) control, which transmits only while speech is present and suspends transmission while it is absent, is used to reduce power consumption. With VOX, the average power consumption during transmission can be reduced by about 65% in full-duplex communication. To implement such a VOX function, the transmitting side must detect the presence or absence of a speech signal, and a voice detector is provided for this purpose.

[0003] The following description assumes that the voice detector is applied to the VOX of a digital cordless telephone. For this telephone, the next-generation half-rate CODEC (speech coding scheme) is required to have a processing delay of 2 ms or less, and 16 kbps LD-CELP (low-delay code-excited linear prediction) is expected to be adopted. Given the data transmission format, a 10 ms frame is considered the appropriate frame length for the voiced/silent decision by the voice detector in this equipment (see the RCR STD-28 standard).

[0004] FIG. 6 shows a block diagram of a conventional voice detector. Voice detection here is performed on input speech a, sampled at 8 kHz and quantized to 8 bits, divided into 20 ms frames (160 samples). The DC component suppressor 11 uses a high-pass filter to output a signal from which the DC component of the speech input a has been removed.

[0005] The high-level power detector 12 divides the 20 ms speech interval into five subframes of 4 ms each (32 samples) and calculates the short-term power P_SK for each subframe using Equation 1.

[0006]

[Equation 1]

[0007] Furthermore, for each P_SK, the weighted power D_2K is detected by Equation 2 using the power threshold Th2 (-30 dBm0).

[0008]

[Equation 2]

[0009] Next, the weighted power sum D₂ is calculated by Equation 3 and output as the detection result c for one frame.

[0010]

[Equation 3]

[0011] In the low-level power detector 13, the weighted power D_1K is detected by Equation 4 from the short-term power P_SK of Equation 1, using the power threshold Th1 (-50 dBm0).

[0012]

[Equation 4]

[0013] Similarly, the weighted sum D₁ is calculated by Equation 5 and output as the detection result d for one frame.

[0014]

[Equation 5]

[0015] At this time, D₃ is also obtained by Equation 6 below.

[0016]

[Equation 6]

[0017] The zero-crossing detector 14 counts the zero-crossing count Z_SK of the filter output b (the number of times the sign bit inverts between two consecutive speech signal samples) by performing the operation of Equation 7 below for each subframe.

[0018]

[Equation 7]

[0019] Next, for each Z_SK, the zero-crossing detection value DZ_SK is obtained by Equation 8 using the zero-crossing threshold Th3 (24 crossings).

[0020]

[Equation 8]

[0021] Similarly, the weighted sum Dz is calculated by Equation 9 and output as the detection result e for one frame.

[0022]

[Equation 9]
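
The zero-crossing count described in [0017] can be illustrated with a minimal Python sketch. Because Equations 7 to 9 are not reproduced in this text, the sketch implements only the verbal definition (sign inversions between consecutive samples within a 32-sample subframe). The function names, the treatment of a zero sample as non-negative, and the choice to ignore crossings that span a subframe boundary are assumptions made for illustration.

```python
def zero_crossings(samples):
    """Count sign inversions between consecutive samples.

    Assumption: a sample equal to zero is treated as non-negative.
    """
    count = 0
    for prev, cur in zip(samples, samples[1:]):
        if (prev < 0) != (cur < 0):
            count += 1
    return count


def subframe_zero_crossings(frame, subframe_len=32):
    """Split one frame into subframes and count crossings in each.

    Assumption: crossings across a subframe boundary are not counted.
    """
    return [
        zero_crossings(frame[i:i + subframe_len])
        for i in range(0, len(frame), subframe_len)
    ]


# A 160-sample frame (20 ms at 8 kHz) yields five per-subframe counts,
# each of which would then be compared with the threshold Th3.
counts = subframe_zero_crossings([1, -1] * 80)
```

Each count plays the role of Z_SK; how the counts are weighted and summed into Dz depends on Equations 8 and 9 and is deliberately left out.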

[0023] The inter-frame power increment comparator 15 obtains the power P_TN for one frame by Equation 10 below.

[0024]

[Equation 10]

[0025] By comparison with the inter-frame power P_T(n-1) of the previous frame, power increment detection is performed by Equation 11 below, and the result is output as f.

[0026]

[Equation 11]

[0027] The decision unit 16 receives each of these signals, makes a decision according to the decision logic flow shown in FIG. 7, and outputs the voice detection result g. In FIG. 7, HOT is the hangover timer (a function that, to prevent word endings from being clipped, forces a voiced decision for several frames after the decision changes from voiced to silent), and the SP flag is the voiced/silent flag.

[0028] Step 21 evaluates the output c (D₂) of the high-level power detector 12, step 22 the output e (Dz) of the zero-crossing detector 14, step 23 the output d (D₁) of the low-level power detector 13, step 24 the output f (D₄) of the inter-frame power increment comparator 15, and step 25 the value (D₃) obtained by the low-level power detector 13. According to the decision logic flow, when the decision is satisfied, the voiced flag is output from SP flag transmission 31 via SP flag set 27 and HOT set 28; when it is not satisfied, the silent flag is output.

[0029]

[Problems to Be Solved by the Invention] However, because processing in the conventional voice detector is executed in 20 ms frames, it incurs a delay of at least 20 ms and cannot satisfy the 10 ms frame condition described above. In addition, because this conventional voice detector is configured independently of the speech encoder, either the overall processing load or the hardware scale becomes large.

[0030] An object of the present invention is to provide the speech encoder with a voice detector, that is, a means that efficiently uses parameters obtained in the course of processing by a speech encoder having a perceptual weighting filter adaptor and a backward gain adaptor, and thereby enables voice detection in units of 10 ms frames or shorter, with little processing time, in a speech encoder whose delay is 2 ms or less.

[0031]

[Means for Solving the Problems] The above object is achieved by providing, in a speech encoder having a perceptual weighting filter adaptor and a backward gain adaptor, a voice detector that receives the reflection coefficient calculated by the perceptual weighting filter adaptor and the excitation gain calculated by the backward gain adaptor, and makes a voiced/silent decision by comparing them with thresholds prepared in advance.

[0032]

[Operation] According to the present invention, the perceptual weighting filter adaptor of the encoder outputs the reflection coefficient t through its internal computation on the input signal vector. This is a vocal tract parameter representing the spectral envelope of the input signal, and it makes it possible to distinguish speech from background noise. The backward gain adaptor receives the gain-adjusted excitation signal and outputs the excitation gain s by backward adaptation. This gain can be regarded as the input signal power, so voice detection by a power decision on this parameter becomes possible. The voice detector receives the reflection coefficient t and the excitation gain s output from the encoder, compares these two parameters with thresholds prepared in advance, and outputs a voice detection signal, thereby making the voiced/silent decision.

[0033]

[Embodiment] The present invention is described below by way of an embodiment. The example applies the invention to 16 kbps LD-CELP, which is expected to be used as the next-generation half-rate speech encoder for digital cordless telephones.

[0034] FIG. 1 shows an LD-CELP speech encoder 100 having a voice detection function. In FIG. 1, 101 is a uniform PCM converter, 102 a vector buffer, 103 an excitation codebook, 104 a gain adjustment unit, 105 a synthesis filter, 106 a perceptual weighting filter adaptor, 107 a perceptual weighting filter, 108 a minimum mean-squared error calculator, 109 a backward synthesis filter adaptor, and 110 a backward gain adaptor. 200 is a voice detector that receives parameters output by the speech encoder 100 and outputs a decision result.

[0035] In this configuration, the input signal h is converted from μ-law PCM to uniform PCM (signal i) by the uniform PCM converter 101 and then divided by the vector buffer 102 into blocks j of five consecutive input signal samples. For each input block j, the encoder passes the 1024 codebook vector candidates k held in the excitation codebook 103 through the gain adjustment unit 104 and the synthesis filter 105. From the resulting 1024 quantized signal vector candidates m, the one that minimizes the error with respect to the input signal vector j is determined: the perceptual weighting filter 107 applies frequency weighting, and the weighted mean-squared error o is input to the minimum mean-squared error calculator 108. The 10-bit codebook index p corresponding to this optimum quantized signal vector is transmitted to the decoder.

[0036] Then, to update the filter memories in preparation for encoding the next signal vector, the optimum code vector q from the minimum mean-squared error calculator 108 is passed through the gain adjustment unit 104 and the synthesis filter 105. The coefficients r of the synthesis filter 105 and the gain s of the gain adjustment unit 104 are updated periodically by backward adaptation in the backward synthesis filter adaptor 109 and the backward gain adaptor 110, respectively, based on the past quantized signal m and the gain-adjusted excitation signal l. The adaptation period is one vector (0.625 ms) for the excitation gain s and four vectors (2.5 ms) for the synthesis filter coefficients r. The coefficients n of the perceptual weighting filter 107 are updated periodically by the perceptual weighting filter adaptor 106, also every four vectors (2.5 ms). Because the vector buffer 102 holds only one vector (five samples), however, a one-way delay of 2 ms or less can be achieved.
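
The adaptation periods in [0036] follow from simple arithmetic, sketched below in Python. The 8 kHz sampling rate is an assumption here: the text states it only for the conventional detector, although it is consistent with five samples lasting 0.625 ms. The constant names are illustrative.

```python
from fractions import Fraction

SAMPLE_RATE_HZ = 8000   # assumed; consistent with 5 samples = 0.625 ms
VECTOR_SAMPLES = 5      # size of one vector in the vector buffer
FRAME_MS = 10           # voice detection frame length

# Duration of one vector, kept exact with Fraction (5/8 ms = 0.625 ms).
vector_ms = Fraction(VECTOR_SAMPLES * 1000, SAMPLE_RATE_HZ)

# Filter coefficients are updated every four vectors (2.5 ms).
filter_update_ms = 4 * vector_ms

# Number of parameter updates that fall inside one 10 ms frame.
gain_updates_per_frame = FRAME_MS / vector_ms            # excitation gain s
filter_updates_per_frame = FRAME_MS / filter_update_ms   # reflection coefficient t
```

Under these assumptions a 10 ms frame spans 16 excitation gain updates and 4 filter updates, matching the adaptation-period counts given for the two mean value calculators in [0039] and [0040].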

[0037] The excitation gain s output from the backward gain adaptor 110 of this encoder can be regarded as the input signal power, so voice detection by a power decision on this parameter is possible. The reflection coefficient t computed inside the perceptual weighting filter adaptor 106 is a vocal tract parameter representing the spectral envelope of the input signal, and it can be used to distinguish speech from background noise.

[0038] Accordingly, the excitation gain s and the reflection coefficient t are input to the voice detector 200, which compares these two parameters with thresholds prepared in advance and outputs the voice detection signal u.

[0039] Next, FIG. 2 shows the internal configuration of the voice detector 200. The description assumes that the voice detection result is output in 10 ms frames. The reflection coefficient t calculated by the speech encoder is input to the mean value calculator 201, which calculates its mean w over every 10 ms (4 adaptation periods). The minimum error calculator 203 compares this mean w with every reflection coefficient vector v output from the background noise codebook 202 and outputs the minimum error Eminx. The background noise codebook 202 is created, for example, by the LBG algorithm (a known technique) with various kinds of background noise as the training sequence, and has a codebook size of, for example, 8.
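
The averaging and codebook comparison of [0039] can be sketched in Python as follows. The text does not state the error measure, so squared Euclidean distance is assumed; the function names and the example vectors are hypothetical, and a real background noise codebook would hold trained reflection coefficient vectors.

```python
def mean_vector(vectors):
    """Element-wise mean of equally sized vectors (e.g. 4 per 10 ms frame)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def min_codebook_error(w, codebook):
    """Smallest error between w and any codebook vector.

    Assumption: the error is the squared Euclidean distance.
    """
    return min(
        sum((a - b) ** 2 for a, b in zip(w, v))
        for v in codebook
    )


# Hypothetical two-element vectors, for illustration only.
w = mean_vector([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [0.5, 0.5]])
e_min = min_codebook_error(w, [[0.5, 0.5], [0.0, 0.0]])
```

A small value of the result means the frame resembles one of the stored background noise patterns, which is what the later threshold comparison relies on.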

[0040] Meanwhile, the excitation gain s calculated by the speech encoder is input to the mean value calculator 204, which calculates its mean y over every 10 ms (16 adaptation periods). The mean excitation gain y and the minimum error Eminx are input to the decision circuit 205, which makes the voiced/silent decision and outputs the decision result z. To prevent word endings from being clipped, the hangover processor 206 applies hangover processing of, for example, 100 ms to the decision result z and outputs the final voice detection result u.
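
The hangover processing of [0040] can be sketched as follows, assuming one decision per 10 ms frame so that 100 ms corresponds to 10 frames. The function name and the exact counting rule (the timer is reloaded on every voiced frame) are assumptions; the text specifies only that the voiced decision is held after a voiced-to-silent transition.

```python
def apply_hangover(decisions, hangover_frames=10):
    """Hold the voiced decision for a number of frames after speech ends.

    decisions: iterable of booleans, True = voiced, one per frame.
    hangover_frames: assumed 10 (100 ms at one decision per 10 ms frame).
    """
    result = []
    remaining = 0
    for voiced in decisions:
        if voiced:
            remaining = hangover_frames  # reload the timer on every voiced frame
            result.append(True)
        elif remaining > 0:
            remaining -= 1               # silent frame inside the hangover period
            result.append(True)
        else:
            result.append(False)
    return result


# With a 2-frame hangover, two silent frames after speech are still reported voiced.
smoothed = apply_hangover([True, False, False, False, False], hangover_frames=2)
```

Reloading the timer on each voiced frame means that short pauses inside an utterance never reach the silent state, which is the word-ending protection the paragraph describes.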

[0041] The decision logic flow of the decision circuit 205 is described with reference to FIG. 3. In step 301, the mean y of the excitation gain s calculated by the mean value calculator 204 (hereinafter called the mean excitation gain s) is compared with the preset excitation gain threshold Ths. If s < Ths holds, silence is decided in step 304; otherwise the flow proceeds to step 302. In step 302, the minimum error Eminx (written x below) calculated from the reflection coefficient t from the speech encoder is compared with the preset minimum error threshold Thx. If x < Thx holds, silence is decided in step 304; otherwise voiced is decided in step 303.
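
The two-step decision of FIG. 3 can be written as a short Python function. This is a sketch under the assumption that the inputs are the per-frame mean excitation gain and the minimum error Eminx; the function and parameter names are illustrative, and the default thresholds are the values Ths = 5 and Thx = 0.1 quoted for the examples of FIG. 4 and FIG. 5.

```python
def is_voiced(mean_gain, min_error, ths=5.0, thx=0.1):
    """Return True for voiced, False for silent.

    mean_gain: mean excitation gain over the frame (compared in step 301).
    min_error: minimum error against the background noise codebook (step 302).
    """
    if mean_gain < ths:
        return False  # step 304: signal power too low
    if min_error < thx:
        return False  # step 304: spectrum matches stored background noise
    return True       # step 303: voiced


# Low gain is silent; high gain that matches the noise codebook is also silent.
quiet = is_voiced(1.0, 0.5)
noisy_pause = is_voiced(20.0, 0.05)
speech = is_voiced(20.0, 0.5)
```

The gain test alone rejects quiet frames, while the codebook test rejects loud frames that look like known background noise, so both conditions must fail before a frame is declared voiced.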

[0042] Next, examples of applying the present invention to actual speech input are shown. FIG. 4 is an example in which the S/N of the input signal h is infinite. It shows the speech signal h together with the variation of the minimum error Eminx, the excitation gain s, and the final voice detection result u.

[0043] When the background noise in the input signal is at an imperceptible level, as here, voice detection using the excitation gain s alone is sufficient. The excitation gain s is around 1.0 during silence and about 5 or more during speech. With no background noise in the input signal, naturally no signal resembles those stored in the background noise codebook 202, and the minimum error Eminx is always greater than 0.1.

[0044] FIG. 5 shows the case of a speech signal in which background noise has been added to the input signal h so that the S/N is about 5 dB. When the background noise level in the input signal is relatively high, as here, voice detection using the excitation gain s alone is insufficient; indeed, the excitation gain s always takes a value of 20 or more. In this case, however, because the character of the background noise resembles that of the signals stored in advance in the background noise codebook 202, the minimum error Eminx takes a value of 0.1 or less in the portions without speech.

[0045] In both the FIG. 4 and FIG. 5 examples, the thresholds described for FIG. 3 are set to Ths = 5 and Thx = 0.1, and it can be seen that voice detection with these settings operates almost correctly.

[0046]

[Effects of the Invention] As described in detail above, according to the present invention, the reflection coefficient t and the excitation gain s calculated by the speech encoder are used, their mean values over a fixed interval are calculated, and these are compared with thresholds prepared in advance to make the voiced/silent decision. Because the presence or absence of background noise is thus taken into account, accurate decisions are possible even in ordinary usage environments. Furthermore, the decision interval for voice detection can be varied from 2.5 ms upward in 2.5 ms steps by changing the number of values averaged in the two mean value calculators for the reflection coefficient t and the excitation gain s, so the detector can adapt flexibly to system requirements and can detect voice in units of 10 ms frames or shorter. In addition, because parameters already calculated by the speech encoder are used, the circuit scale and the amount of computation required for voice detection are small, which also makes the detector well suited to LSI implementation, among other advantages.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of an embodiment of an LD-CELP speech encoder having the voice detector of the present invention.

FIG. 2 is a block diagram of an embodiment of the voice detector of the present invention.

FIG. 3 is the decision logic flow of the voice detector of the present invention.

FIG. 4 is an example of a voice detection result of the present invention.

FIG. 5 is an example of a voice detection result of the present invention.

FIG. 6 is a block diagram of a conventional voice detector.

FIG. 7 is the decision logic flow of a conventional voice detector.

[Explanation of Symbols]

100 ... LD-CELP speech encoder, 101 ... uniform PCM converter, 102 ... vector buffer, 103 ... excitation codebook, 104 ... gain adjustment unit, 105 ... synthesis filter, 106 ... perceptual weighting filter adaptor, 107 ... perceptual weighting filter, 108 ... minimum mean-squared error calculator, 109 ... backward synthesis filter adaptor, 110 ... backward gain adaptor, 200 ... voice detector, 201, 204 ... mean value calculators, 202 ... background noise codebook, 203 ... minimum error calculator, 205 ... decision circuit, 206 ... hangover processor.

Claims

1. A voice detector for a speech encoder having a perceptual weighting filter adaptor and a backward gain adaptor, characterized in that the voice detector receives the reflection coefficient calculated by the perceptual weighting filter adaptor and the excitation gain calculated by the backward gain adaptor, and makes a voiced/silent decision by comparing each with a threshold prepared in advance.

2. A speech encoder comprising the voice detector according to claim 1.