JP3081264B2

JP3081264B2 - Voice detector

Info

Publication number: JP3081264B2
Application number: JP03087381A
Authority: JP
Inventors: 誠司佐々木
Original assignee: Hitachi Kokusai Electric Inc
Current assignee: Hitachi Kokusai Electric Inc
Priority date: 1991-03-28
Filing date: 1991-03-28
Publication date: 2000-08-28
Anticipated expiration: 2015-08-28
Also published as: JPH04299400A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a voice detector for use in voice communication.

[0002]

[Prior Art] Portable radio equipment such as digital cordless telephones uses VOX (Voice Operate Switch Exchange) control, which transmits only while speech is present and suspends transmission while it is absent, to reduce power consumption during transmission. VOX control can cut the average power consumption during transmission by about 15%. To perform the VOX function, a voice detector that detects the presence or absence of a speech signal is required in the stage preceding the transmission output circuit. The following description assumes that such a voice detector is applied to VOX control of a digital cordless telephone. This digital cordless telephone uses 32 kb/s adaptive differential pulse code modulation (ADPCM: Adaptive Differential Pulse Code Modulation) as its speech coding scheme (CODEC). The processing delay in the device is required to be 7 msec or less.

FIG. 6 is a block diagram of a conventional voice detector. It takes an input speech signal a, sampled at 8 kHz and quantized with 2⁸ = 256 quantization levels, divides it into 20 msec frames (160 samples), determines whether speech is present, and outputs a speech/silence flag. The input speech signal a passes through the high-pass filter of the DC component suppressor 11, which removes its DC component, and the resulting signal b is supplied to each of the following circuits.

[0003] The high-level power detector 12 divides the 20 msec speech interval into five 4 msec subframes (32 samples each) and computes the short-term power P_sk of each subframe by equation (1), where x_i is the filter output and k is the subframe number. Each computed P_sk is compared with the power threshold Th2 (−30 dBm0) as follows:

D_2k = 1 when P_sk ≥ Th2 (2)
D_2k = 0 when P_sk < Th2 (3)

The weighted sum D₂ of equation (4) is then taken and output as signal c, the detection result for one frame.

[0004] The low-level power detector 13 compares the short-term power computed by equation (1) with the power threshold Th1 (−50 dBm0) as follows:

D_1k = 1 when P_sk ≥ Th1 (5)
D_1k = 0 when P_sk < Th1 (6)

In the same way, the weighted sum D₁ given by the following equation is taken and output as signal d, the detection result for one frame. The value of the expression below is also computed at the same time.

[0005] The zero-crossing detector 14 counts the zero crossings of signal b (the number of times the sign bits of two consecutive speech samples differ) by computing Z_sk for each subframe according to equation (9). Each computed Z_sk is compared with the zero-crossing threshold Th3 (24 crossings) as follows:

DZ_sk = 1 when Z_sk ≥ Th3 (10)
DZ_sk = 0 when Z_sk < Th3 (11)

In the same way, the weighted sum D_z given by the following equation is taken and output as signal e, the detection result for one frame.
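
The per-subframe zero-crossing test can be sketched as follows. This is a minimal illustration, not the patented circuit: equation (9) is not reproduced in this text, so the counting convention (sign changes between adjacent samples inside one 32-sample subframe, with 0 treated as non-negative) is an assumption, and the function names are hypothetical. The weighted sum D_z is omitted because its weights are not given here.

```python
from typing import List, Sequence

SUBFRAME_LEN = 32  # samples per 4 msec subframe at 8 kHz (from the text)
TH3 = 24           # zero-crossing threshold Th3 (from the text)


def zero_crossings(subframe: Sequence[int]) -> int:
    """Count adjacent sample pairs whose signs differ.

    Assumption: a sample equal to 0 counts as non-negative,
    mirroring a comparison of sign bits.
    """
    return sum(
        1
        for prev, cur in zip(subframe, subframe[1:])
        if (prev < 0) != (cur < 0)
    )


def zero_crossing_flags(frame: Sequence[int]) -> List[int]:
    """Return DZ_sk (1 or 0) for each 32-sample subframe of a frame."""
    flags = []
    for start in range(0, len(frame), SUBFRAME_LEN):
        z_sk = zero_crossings(frame[start:start + SUBFRAME_LEN])
        flags.append(1 if z_sk >= TH3 else 0)  # equations (10) and (11)
    return flags
```

A rapidly alternating subframe exceeds Th3 and is flagged 1, while a constant subframe has no crossings and is flagged 0.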

[0006] The inter-frame power increment comparator 15 computes the power P_Tn of one frame by equation (13):

P_Tn = Σ_{k=1}^{5} P_sk (13)

It then compares P_Tn with the power P_T(n−1) of the previous frame to perform the power increment detection D₄ below, and outputs the result as signal f:

D₄ = 1 when P_Tn ≥ 4 P_T(n−1) (14)
D₄ = 0 when P_Tn < 4 P_T(n−1) (15)
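
Equations (13) to (15) translate directly into a few lines of code. The sketch below takes the five subframe powers of each frame as given; the function names are hypothetical, and the handling of the first frame (no previous frame, so D₄ = 0) is an assumption not stated in the text.

```python
from typing import List, Optional, Sequence


def frame_power(subframe_powers: Sequence[float]) -> float:
    """Equation (13): P_Tn is the sum of the subframe powers P_sk."""
    return sum(subframe_powers)


def power_increment_flags(frames: Sequence[Sequence[float]]) -> List[int]:
    """Return D4 for each frame per equations (14) and (15).

    Assumption: the first frame has no predecessor, so its flag is 0.
    """
    flags: List[int] = []
    prev: Optional[float] = None
    for subframe_powers in frames:
        p_tn = frame_power(subframe_powers)
        # D4 = 1 only when the frame power is at least 4x the previous one.
        flags.append(1 if prev is not None and p_tn >= 4 * prev else 0)
        prev = p_tn
    return flags
```

The factor of 4 means the flag fires only on a frame-to-frame power rise of about 6 dB or more, which is characteristic of a speech onset.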

[0007] The decision unit 16 receives the signals c, d, e, and f and, following the decision logic flow of FIG. 7, outputs a speech/silence flag indicating the voice detection result. In FIG. 7, HOT denotes the hangover timer (a function that, to prevent word endings from being clipped, keeps the next several frames marked as speech after the decision changes from speech to silence), and the SP flag denotes the speech/silence flag.

[0008]

[Problems to Be Solved by the Invention] Because the conventional voice detector described above processes the signal in 20 msec frames, it incurs a delay of at least 20 msec and cannot meet the requirement of 7 msec or less stated above. In addition, because the conventional voice detector is built independently of the speech encoder, it has drawbacks such as a large processing load. An object of the present invention is to provide a voice detector that makes effective use of the prediction coefficients obtained in the course of processing by a speech encoder with an adaptive prediction function, and that can thereby detect the presence or absence of speech with a short processing time and a delay of 7 msec or less.

[0009]

[Means for Solving the Problems] The voice detector of the present invention comprises: average value calculating means that receives two prediction coefficients, for two adjacent sample values of an input speech signal, obtained from an adaptive predictor provided in a speech encoder that encodes and outputs the input speech signal, and that computes and outputs the average of each coefficient over each framed interval; and decision means that determines whether the interval is a speech interval or a silent interval according to whether the two averages fall within threshold ranges for the respective prediction coefficients determined in advance from the occurrence distributions of the two prediction coefficients, and that outputs a speech/silence flag indicating speech or silence.

[0010]

[Embodiment] As an embodiment, the following describes an example in which the present invention is applied to 32 kb/s (kilobits per second) ADPCM, the speech encoder for digital cordless telephones. FIG. 3 is a block diagram of an ADPCM speech encoder with a voice detection function to which the present invention is applied, and FIG. 1 is a block diagram showing an embodiment of the voice detector of the present invention.

First, the ADPCM encoder of FIG. 3 is described. Reference numeral 21 denotes a uniform PCM converter that converts the 64 kb/s μ-law PCM input signal into linear 13-bit PCM. Reference numeral 22 denotes a subtractor that subtracts the prediction signal j, the output of the adaptive predictor 23, from the output of the uniform PCM converter to obtain a difference signal g. The adaptive quantizer 24 quantizes this difference signal g, and the resulting 32 kb/s speech data is sent to the transmission line as the output of the ADPCM speech encoder. Meanwhile, the adaptive inverse quantizer 26 adaptively dequantizes the 32 kb/s speech data and outputs a quantized difference signal m. The adder 25 adds the quantized difference signal m and the prediction signal j and outputs a reconstructed signal n. The adaptive predictor 23 computes prediction coefficients a₁ and a₂ and uses them to generate the prediction signal j from the quantized difference signal m and the reconstructed signal n.

The prediction coefficients a₁ and a₂ that the adaptive predictor 23 computes to generate the prediction signal j are the coefficients for predicting the sample value at a given instant from the two adjacent past sample values. Their values follow different occurrence distributions for speech signals, whose autocorrelation is large, and for background noise, whose autocorrelation is small. These prediction coefficients a₁ and a₂ are input to the voice detector 27 of the present invention.

[0011] To demonstrate this, FIG. 4 (A), (B) and FIG. 5 (C), (D) show measured occurrence distributions of the prediction coefficients a₁ and a₂. FIG. 4 (A) is for a speech signal (male voice) and (B) for a speech signal (female voice); FIG. 5 (C) is for white noise and (D) for colored noise (−6 dB/oct). In these figures, each sample point (〇, ●, ◎) represents the range of a₁ and a₂ values greater than −0.05 and less than +0.05 measured from that sample point as origin. The sample point with the highest occurrence frequency is marked ◎, and sample points whose frequency, normalized by the maximum, is 0.1 or more are marked ●.

The results in FIGS. 4 and 5 show that, given a suitable threshold range for each of a₁ and a₂, speech intervals can be distinguished from background noise (silent) intervals. Based on the occurrence distributions in FIGS. 4 and 5, the voice detector 27 judges an interval to be a background noise interval (silent interval) when the coefficients fall within any of the ranges listed below, and a speech interval otherwise, and outputs a voice detection flag at L level or H level respectively.

(0.70 ≤ a₁ ≤ 1.00) and (−0.45 < a₂ ≤ −0.35)
(0.75 ≤ a₁ ≤ 1.10) and (−0.55 < a₂ ≤ −0.45)
(0.85 ≤ a₁ ≤ 1.20) and (−0.65 < a₂ ≤ −0.55)
(0.95 ≤ a₁ ≤ 1.20) and (−0.70 < a₂ ≤ −0.65)
(a₁ ≤ 0.75) and (a₂ ≤ 0)

[0012] FIG. 1 is a block diagram showing a configuration example of the voice detector of the present invention. The processing in each block of FIG. 1 is as follows. The prediction coefficients a₁ and a₂ are input to the framers 31 and 32 respectively, framed at 5 msec intervals, and passed to the average calculators 33 and 34. The average calculators 33 and 34 compute the average over one frame and input it to the speech/silence decision unit 35. The speech/silence decision unit 35 sets the voice detection flag u to silence (L) if the averages of a₁ and a₂ fall within the threshold ranges listed above, and to speech (H) otherwise. The hangover processor 36 then applies 100 msec of hangover processing to this result to obtain the final voice detection output v.

FIG. 2 is a time chart showing the results of verifying the voice detection operation by computer simulation. The input signal has colored noise (−6 dB/oct) superimposed on it. Part (A) of the figure shows the input signal, and (B) shows the speech/silence decision after hangover processing. These show that the method gives good results, with few false detections caused by ambient noise. Parts (C) and (D) show the time variation of the prediction coefficients a₁ and a₂ respectively, and confirm that the values of a₁ and a₂ differ between speech intervals and background noise intervals.

[0013]

[Effects of the Invention] As described in detail above, practicing the present invention reduces the processing time required for voice detection to about 5 msec. Moreover, because the coefficients obtained in the course of ADPCM processing are used efficiently, the detector can be realized with small-scale hardware (a processing load of 15% of that of ADPCM), which is of great practical benefit.

[Brief description of the drawings]

FIG. 1 is a block diagram of the voice detector of the present invention.

FIG. 2 is a time chart showing the operation of the present invention.

FIG. 3 is a block diagram of an ADPCM encoder to which the voice detector of the present invention is added.

FIG. 4 is an occurrence distribution diagram of the prediction coefficients a₁ and a₂.

FIG. 5 is an occurrence distribution diagram of the prediction coefficients a₁ and a₂.

FIG. 6 is a block diagram of a conventional voice detector.

FIG. 7 is a flowchart of the conventional decision logic.

[Explanation of symbols]

11 DC component suppressor
12 High-level power detector
13 Low-level power detector
14 Zero-crossing detector
15 Inter-frame power increment comparator
16 Decision unit
21 Uniform PCM converter
22 Subtractor
23 Adaptive predictor
24 Adaptive quantizer
25 Adder
26 Adaptive inverse quantizer
27 Voice detector
31, 32 Framers
33, 34 Average calculators
35 Speech/silence decision unit
36 Hangover processor

Continuation of the front page

(51) Int.Cl.⁷ identification code FI: H04B 14/06; G10L 9/14 301A

(58) Fields searched (Int.Cl.⁷, DB name): G10L 19/00 - 19/14; G10L 11/00 - 11/06; H04B 14/00 - 14/08; H03M 7/30 - 7/38

Claims

(57) [Claims]

1. A voice detector comprising: average value calculating means that receives two prediction coefficients, for two adjacent sample values of an input speech signal, obtained from an adaptive predictor provided in a speech encoder that encodes and outputs the input speech signal, and that computes and outputs the average of each coefficient over each framed interval; and decision means that determines whether the interval is a speech interval or a silent interval according to whether the two averages fall within threshold ranges for the respective prediction coefficients determined in advance from the occurrence distributions of the two prediction coefficients, and that outputs a speech/silence flag indicating speech or silence.