JPH0832526A

JPH0832526A - Voice detector

Info

Publication number: JPH0832526A
Application number: JP6186840A
Authority: JP
Inventors: Ichiro Matsumoto; 一郎松本; Seiji Sasaki; 誠司佐々木
Original assignee: Kokusai Electric Corp
Current assignee: Kokusai Electric Corp
Priority date: 1994-07-18
Filing date: 1994-07-18
Publication date: 1996-02-02

Abstract

PURPOSE: To prevent the loss of voice frames caused by noise/voice detection errors, so that voice detection does not make the reproduced voice unpleasant for the listener. CONSTITUTION: VAD threshold 1 (voice detection threshold 1) is equal to the prediction coefficient threshold that a noise region determiner uses to identify the noise region. VAD thresholds 2 and 3 extract regions with a higher frequency of occurrence than VAD threshold 1, so they are set to classify a narrower region as noise (silent section). A voice/silence determiner 35 sets a voice detection flag (u) to silence (L) when the mean values of prediction coefficients a1 and a2 fall within the specific range, among several predetermined ranges, set by whichever of VAD thresholds 1, 2 and 3 a threshold switch 41 has selected, and sets the flag to voice (H) otherwise. A hangover processing circuit 36 is applied to this result to obtain the final voice detection output (v).

Description

Detailed Description of the Invention

[0001] The present invention relates to a voice detector used in a voice coding system.

[0002]

[Prior Art] Portable radio equipment and the like use VOX (Voice Operate Switch Exchange) control, which transmits only when voice is present and suspends transmission when it is absent, in order to reduce power consumption during transmission. This lowers the average transmission power. To implement such a VOX function, a voice detector that detects the presence or absence of a voice signal is required in the stage preceding the transmission output circuit. One of the present inventors previously proposed such a voice detector (see Japanese Patent Application Laid-Open No. 4-301930).

[0003]

[Problem to Be Solved by the Invention] In the prior art, however, the prediction coefficient threshold used for the voice/silence decision is fixed. As background noise increases, the occurrence frequency distribution region of the prediction coefficients of voice approaches that of noise, so voice sections are erroneously judged to be noise sections and drop out of the transmission output. Such dropouts not only make the speaker uncomfortable but also reduce the intelligibility of the conversation. The prior art is therefore not yet adequate for voice detection in noisy environments.

[0004] An object of the present invention is to provide a voice detector that eliminates the loss of voice signal caused by VOX control in a noisy environment, which is the problem of the prior art, and thereby reduces the unpleasantness of the reproduced voice on the receiving side.

[0005]

[Means for Solving the Problem] To achieve this object, the voice detector according to the present invention comprises: a first framer that frames the prediction coefficients obtained from the adaptive predictor of a voice coding apparatus; an average value calculator that calculates the average of the prediction coefficients over each frame; a noise region determiner that compares this average with a prediction coefficient threshold obtained in advance from the occurrence distribution of the prediction coefficients, and determines the noise region by finding sections located in the high-occurrence part of the noise region occurrence frequency distribution; a second framer that frames the uniform PCM signal; a power calculator that calculates the average power of the framed uniform PCM signal only when the frame is determined to be in the noise region; a power determiner that, only when the frame is determined to be in the noise region, outputs a determination of which of a plurality of regions, set by several predetermined power thresholds, contains the average power of the uniform PCM signal; a threshold switch that switches the voice detection threshold to a value higher than the prediction coefficient threshold when that determination indicates that the average power is in a high region; a voice/silence determiner that, for the voice/silence decision on the uniform PCM signal, determines whether the average of the prediction coefficients lies within the specific range set by the voice detection threshold selected by the threshold switch, and outputs a voice/silence flag indicating whether the section is a voice section or a silent section; and a hangover processing device that applies appropriate hangover processing to the flag and outputs a voice detection output signal.

[0006]

[Embodiment] As an embodiment, an example in which the present invention is applied to 32 kb/s (kilobits per second) ADPCM, the standard coding scheme of PHS (Personal Handy-phone System), is described below. FIG. 2 is a block diagram of an ADPCM voice coding apparatus having a voice detection function to which the present invention is applied, and FIG. 1 is a block diagram showing an embodiment of the voice detector of the present invention.

[0007] First, the ADPCM coding apparatus of FIG. 2 is described. 21 is a uniform PCM converter that converts the 64 kb/s μ-law PCM input signal into a linear 13-bit uniform PCM signal o. 22 is a subtractor that subtracts the prediction signal j, the output of the adaptive predictor 23, from the output o of the uniform PCM converter 21 to obtain a difference signal k. 24 is an adaptive quantizer that quantizes the difference signal k. 26 is an adaptive inverse quantizer that inversely quantizes the 32 kb/s voice data and outputs a quantized difference signal m. 25 is an adder that adds the quantized difference signal m and the prediction signal j and outputs a reproduced signal n. 23 is an adaptive predictor that generates the prediction signal j from the reproduced signal n and the quantized difference signal m. 27 is the voice detector according to the present invention.

[0008] The 64 kb/s μ-law PCM input signal is converted by the uniform PCM converter 21 into a linear 13-bit uniform PCM signal o. The subtractor 22 subtracts the prediction signal j, the output of the adaptive predictor 23, from the uniform PCM signal o to obtain the difference signal k. The adaptive quantizer 24 quantizes this difference signal k, and the resulting 32 kb/s voice data is sent to the transmission line as the output of the ADPCM voice coding apparatus. Meanwhile, the adaptive inverse quantizer 26 inversely quantizes the 32 kb/s voice data and outputs the quantized difference signal m. The adder 25 adds the quantized difference signal m and the prediction signal j and outputs the reproduced signal n. The adaptive predictor 23 calculates prediction coefficients a1 and a2 and uses them to generate the prediction signal j from the quantized difference signal m and the reproduced signal n. The prediction coefficients a1 and a2 are the coefficients for predicting the sample value at a given time from the two immediately preceding sample values. Their values follow occurrence distributions that occupy different regions for voice signals, whose autocorrelation is large, and for background noise, whose autocorrelation is small. The prediction coefficients a1 and a2 and the uniform PCM signal o are input to the voice detector 27 of the present invention, which uses these signals to make the voice/silence decision and outputs a voice detection output signal.
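The claim that a1 and a2 occupy different regions for strongly and weakly autocorrelated signals can be illustrated with a small sketch. It does not reproduce the adaptive predictor 23: it swaps in a one-shot least-squares (order-2 Yule-Walker) fit, and the test signals, the 8 kHz rate and all function names are assumptions made for illustration. A 200 Hz tone serves as a crude stand-in for a strongly autocorrelated signal and white Gaussian noise for a weakly correlated one.

```python
import math
import random

def autocorr(x, lag):
    """Unnormalized autocorrelation of sequence x at the given lag."""
    return sum(x[i] * x[i - lag] for i in range(lag, len(x)))

def two_tap_coefficients(x):
    """Least-squares a1, a2 for x[n] ~ a1*x[n-1] + a2*x[n-2] (order-2 Yule-Walker)."""
    r0, r1, r2 = autocorr(x, 0), autocorr(x, 1), autocorr(x, 2)
    det = r0 * r0 - r1 * r1          # determinant of [[r0, r1], [r1, r0]]
    a1 = r1 * (r0 - r2) / det
    a2 = (r0 * r2 - r1 * r1) / det
    return a1, a2

FS = 8000                # Hz; assumed from 64 kb/s PCM with 8-bit samples
rng = random.Random(0)   # fixed seed so the run is repeatable

# Hypothetical test signals, not taken from the patent.
tone = [math.sin(2 * math.pi * 200 * n / FS) for n in range(FS)]
noise = [rng.gauss(0.0, 1.0) for _ in range(FS)]

tone_a1, tone_a2 = two_tap_coefficients(tone)
noise_a1, noise_a2 = two_tap_coefficients(noise)
print(f"tone : a1={tone_a1:.3f}, a2={tone_a2:.3f}")
print(f"noise: a1={noise_a1:.3f}, a2={noise_a2:.3f}")
```

For the tone the fit lands near a1 = 2cos(ω), a2 = -1, while for white noise both coefficients stay close to zero, so the two signal types fall in clearly separated parts of the (a1, a2) plane. Real speech and real background noise are less extreme than these two test cases.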

[0009] Next, the voice detector of the present invention is described. FIG. 1 is a block diagram showing a configuration example of the voice detector of the present invention. 31 and 32 are first framers that frame the prediction coefficients a1 and a2 obtained from the adaptive predictor 23 into 5 msec frames, respectively. 33 and 34 are average value calculators that calculate the average of the outputs of the first framers 31 and 32, respectively. 35 is a voice/silence determiner that compares the outputs of the average value calculators 33 and 34 with a given threshold to make the voice/silence decision. 36 is a hangover processing device that applies hangover processing to the result of the voice/silence determiner 35. 37 is a second framer that frames the uniform PCM signal o. 38 is a noise region determiner that checks whether the outputs of the average value calculators 33 and 34 fall within region A of the prediction coefficient threshold, predetermined as the noise region from the occurrence distribution of the prediction coefficients a1 and a2 as shown in FIG. 3, and outputs a noise region signal t when it determines that they do. 39 is a power calculator that calculates the average power of the output of the second framer 37. 40 is a power determiner that compares the output of the power calculator 39 with preset power thresholds (bgnth_1 and bgnth_2, used as the Back Ground Noise Threshold for judging background noise power) to judge the power level, so that the voice detection thresholds given to the voice/silence determiner 35, namely VAD (Voice Activity Detection) thresholds 1, 2 and 3, can be switched according to the background noise power. 41 is a threshold switch that switches the voice detection threshold (VAD threshold 1, 2 or 3) given to the voice/silence determiner 35.
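The framing and averaging performed by blocks 31 to 34 and 37 can be sketched as follows. This is a minimal illustration, assuming an 8 kHz sample rate (so a 5 msec frame holds 40 values), assuming one a1/a2 value is available per sample, and assuming that "power" means the mean square of the samples in a frame. None of these details, nor the function names, are stated in the description.

```python
FS = 8000                           # Hz; assumed from 64 kb/s PCM with 8-bit samples
FRAME_MS = 5                        # frame length given in the description
FRAME_LEN = FS * FRAME_MS // 1000   # 40 values per frame under the assumptions above

def frames(values, frame_len=FRAME_LEN):
    """Split a sequence into consecutive full frames; a trailing partial frame is dropped."""
    n_full = len(values) // frame_len
    return [values[i * frame_len:(i + 1) * frame_len] for i in range(n_full)]

def frame_means(coeffs, frame_len=FRAME_LEN):
    """Per-frame average of a prediction coefficient track (role of blocks 31-34)."""
    return [sum(f) / len(f) for f in frames(coeffs, frame_len)]

def frame_powers(samples, frame_len=FRAME_LEN):
    """Per-frame mean-square power of uniform PCM samples (assumed power definition)."""
    return [sum(s * s for s in f) / len(f) for f in frames(samples, frame_len)]

# Hypothetical inputs: two full frames plus a 10-value partial frame that is dropped.
a1_track = [1.0] * 40 + [0.5] * 40 + [0.2] * 10
pcm = [2] * 40 + [-3] * 40

a1_means = frame_means(a1_track)
pcm_powers = frame_powers(pcm)
print(a1_means, pcm_powers)
```

In the detector itself the power is only computed for frames that the noise region determiner 38 flags with signal t; that gating is left out here to keep the framing step isolated.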

[0010] First, the prediction coefficients a1 and a2 are framed by the first framers 31 and 32, and their averages are calculated by the average value calculators 33 and 34. Using the frame averages of a1 and a2, the noise region determiner 38 outputs to the power calculator 39 and the power determiner 40 its determination of whether the frame is in the noise region. The power calculator 39 calculates the average power (moving average) of equation (1) only when it receives the noise region signal t from the noise region determiner 38.

[Equation 1] bgnp = pbgnp × 0.95 + nbgnp × 0.05 … (1)
where
bgnp: noise power average
pbgnp: average power of the previous noise region frame
nbgnp: average power of the current noise region frame
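Equation (1) can be sketched as a small stateful update. The sketch assumes that pbgnp is the value of bgnp carried over from the previous noise region frame, which makes the update a recursive moving average; the starting value and the class name are likewise assumptions, since the description does not give them.

```python
class NoisePowerTracker:
    """Moving average of background noise power following equation (1)."""

    def __init__(self, initial_bgnp=0.0):
        # The starting value is an assumption; the description does not state one.
        self.bgnp = initial_bgnp

    def update(self, nbgnp, in_noise_region):
        """Update bgnp only for frames flagged as noise region (signal t)."""
        if in_noise_region:
            pbgnp = self.bgnp                 # assumed: previous running average
            self.bgnp = pbgnp * 0.95 + nbgnp * 0.05
        return self.bgnp

tracker = NoisePowerTracker(initial_bgnp=100.0)
after_noise_frame = tracker.update(200.0, in_noise_region=True)    # 100*0.95 + 200*0.05
after_voice_frame = tracker.update(5000.0, in_noise_region=False)  # ignored, no signal t
print(after_noise_frame, after_voice_frame)
```

With weights of 0.95 and 0.05, a single noise frame moves the estimate by only 5% of the difference, so the estimate follows slow changes in the background level rather than frame-to-frame fluctuations.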

[0011] Only when it receives the noise region signal t, the power determiner 40 compares the output (bgnp) of the power calculator 39 with the power thresholds for judging background noise power, makes a determination according to, for example, equation (2), and controls the threshold switch 41 so that VAD threshold 1, 2 or 3, whichever corresponds to the result, is selected as shown in FIG. 4 and given to the voice/silence determiner 35 as the VAD threshold.

[Equation 2] Here, bgnth_n (n = 1, 2) are the power values given to the power determiner 40 as thresholds for switching among VAD thresholds 1, 2 and 3, fixed so as to satisfy the relation of equation (3) below.

[Equation 3] bgnth_1 < bgnth_2 … (3)
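The selection performed by the power determiner 40 and the threshold switch 41 might look like the sketch below. The body of equation (2) is not reproduced in this text, so the mapping used here (lowest power band selects VAD threshold 1, middle band selects 2, highest band selects 3) and the handling of values exactly on a boundary are assumptions, chosen to agree with the statement that a higher noise power selects a higher voice detection threshold. The numeric thresholds in the example are hypothetical.

```python
def select_vad_threshold(bgnp, bgnth_1, bgnth_2):
    """Return 1, 2 or 3: the VAD threshold index chosen for noise power bgnp.

    Assumed mapping (equation (2) is not available in the text):
      bgnp <  bgnth_1            -> VAD threshold 1
      bgnth_1 <= bgnp < bgnth_2  -> VAD threshold 2
      bgnth_2 <= bgnp            -> VAD threshold 3
    """
    if not bgnth_1 < bgnth_2:
        raise ValueError("equation (3) requires bgnth_1 < bgnth_2")
    if bgnp < bgnth_1:
        return 1
    if bgnp < bgnth_2:
        return 2
    return 3

# Hypothetical power thresholds, for illustration only.
BGNTH_1, BGNTH_2 = 100.0, 1000.0
choices = [select_vad_threshold(p, BGNTH_1, BGNTH_2) for p in (10.0, 100.0, 5000.0)]
print(choices)
```

In the detector this selection is refreshed only on frames that carry the noise region signal t, so the chosen threshold holds its last value during voice sections.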

[0012] VAD threshold 1, that is, voice detection threshold 1, is equal to the prediction coefficient threshold that the noise region determiner 38 uses to determine the noise region. VAD thresholds 2 and 3 extract regions with a higher frequency of occurrence than VAD threshold 1 and are therefore set so that a narrower region is judged to be noise (silent section). The voice/silence determiner 35 sets the voice detection flag u to silence (L) when the averages of the prediction coefficients a1 and a2 fall within the specific range, among the plurality of predetermined ranges shown in FIG. 5, set by VAD threshold 1, 2 or 3 as selected by the threshold switch 41; otherwise it sets the flag to voice (H). The hangover processing device 36 then applies hangover processing, here 100 msec as an example, to this result to obtain the final voice detection output signal v.
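The decision and hangover steps can be sketched together. The actual ranges are defined by FIG. 5, which is not available here, so the nested rectangles below are invented placeholders that only preserve the stated property that thresholds 2 and 3 mark progressively narrower regions as silence. The hangover rule (hold H for 20 further frames, i.e. 100 msec of 5 msec frames, after the last voice frame) is likewise an assumed, common interpretation rather than the circuit in the patent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    """Axis-aligned range in the (a1, a2) plane; a hypothetical stand-in for FIG. 5."""
    a1_min: float
    a1_max: float
    a2_min: float
    a2_max: float

    def contains(self, a1, a2):
        return self.a1_min <= a1 <= self.a1_max and self.a2_min <= a2 <= self.a2_max

# Invented numbers: each region is nested inside the previous one.
VAD_REGIONS = {
    1: Region(-0.5, 1.0, -0.6, 0.4),
    2: Region(-0.2, 0.8, -0.4, 0.2),
    3: Region(0.0, 0.5, -0.2, 0.1),
}

def voice_flag(a1_mean, a2_mean, selected):
    """Flag u: 'L' (silence) inside the selected range, 'H' (voice) otherwise."""
    return "L" if VAD_REGIONS[selected].contains(a1_mean, a2_mean) else "H"

def hangover(flags, hang_frames=20):
    """Output v: keep 'H' for hang_frames frames after the last 'H' input frame."""
    out, remaining = [], 0
    for flag in flags:
        if flag == "H":
            remaining = hang_frames
            out.append("H")
        elif remaining > 0:
            remaining -= 1
            out.append("H")
        else:
            out.append("L")
    return out

u = ["H"] + ["L"] * 25          # one voice frame followed by 25 silent frames
v = hangover(u)
print("".join(v))
```

The same frame average, for example (0.9, 0.3), is flagged L under threshold 1 but H under threshold 3, which is how the narrower regions reduce the chance of dropping voice frames when the background noise is loud.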

[0013]

[Effect of the Invention] As described in detail above, the present invention prevents the loss of voice frames caused by noise/voice detection errors in a noisy environment and performs voice detection without causing discomfort to the listener; its effect is therefore extremely large.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the voice detector of the present invention.

FIG. 2 is a block diagram of an ADPCM voice coding apparatus having a voice detection function.

FIG. 3 is a characteristic diagram for explaining the operation of the present invention.

FIG. 4 is a characteristic diagram for explaining the operation of the present invention.

FIG. 5 is a characteristic diagram for explaining the operation of the present invention.

[Explanation of Codes]
31, 32 First framer (for prediction coefficients)
33, 34 Average value calculator (for prediction coefficients)
35 Voice/silence determiner
36 Hangover processing device
37 Second framer (for uniform PCM signal)
38 Noise region determiner
39 Power calculator
40 Power determiner
41 Threshold switch
t Noise region signal
u Voice detection flag
v Voice detection output signal
21 Uniform PCM converter
22 Subtractor
23 Adaptive predictor
24 Adaptive quantizer
25 Adder
26 Adaptive inverse quantizer
27 Voice detector
j Prediction signal
k Difference signal
m Quantized difference signal
n Reproduced signal
o Uniform PCM signal

Claims

[Claims]

1. A voice detector comprising: a first framer that frames the prediction coefficients obtained from the adaptive predictor of a voice coding apparatus; an average value calculator that calculates the average of the prediction coefficients over each frame; a noise region determiner that compares this average with a prediction coefficient threshold obtained in advance from the occurrence distribution of the prediction coefficients, and determines the noise region by finding sections located in the high-occurrence part of the noise region occurrence frequency distribution; a second framer that frames the uniform PCM signal; a power calculator that calculates the average power of the framed uniform PCM signal only when the frame is determined to be in the noise region; a power determiner that, only when the frame is determined to be in the noise region, outputs a determination of which of a plurality of regions, set by several predetermined power thresholds, contains the average power of the uniform PCM signal; a threshold switch that switches the voice detection threshold to a value higher than the prediction coefficient threshold when that determination indicates that the average power is in a high region; a voice/silence determiner that, for the voice/silence decision on the uniform PCM signal, determines whether the average of the prediction coefficients lies within the specific range set by the voice detection threshold selected by the threshold switch, and outputs a voice/silence flag indicating whether the section is a voice section or a silent section; and a hangover processing device that applies appropriate hangover processing to the flag and outputs a voice detection output signal.