JP2656069B2

JP2656069B2 - Voice detection device

Info

Publication number: JP2656069B2
Application number: JP63116264A
Authority: JP
Inventors: 吉弘富田; 衡平伊▲せ▼田; 重之海上
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1988-05-13
Filing date: 1988-05-13
Publication date: 1997-09-24
Anticipated expiration: 2012-09-24
Also published as: JPH01286643A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Outline]

The invention concerns a voice detection device that cuts an input signal into frames at fixed time intervals, calculates the input signal power in a signal power calculation unit, counts the number of polarity inversions of the input signal in a zero-crossing counting unit, and thereby detects the presence or absence of voice in a determination unit. Its object is to improve the accuracy of detecting non-stationary portions of the voice signal, such as the beginning and end of speech. The device comprises an adaptive prediction filter that obtains a prediction error signal of the input signal, an error signal power calculation unit that obtains the power of the prediction error signal, and a power comparison unit that obtains the power ratio between the input signal power and the prediction error signal power. When the input signal power is smaller than its threshold, the determination unit, in addition to the speech/silence decision based on the zero-crossing count, compares the absolute value of the inter-frame difference of the power ratio with its threshold and decides whether the current frame is speech or silence according to whether the previous frame was speech or silence.

[Industrial applications]

The present invention relates to a voice detection device, and in particular to one that cuts an input signal into frames at fixed time intervals, calculates the input signal power in a signal power calculation unit, counts the number of polarity inversions of the input signal in a zero-crossing counting unit, and thereby detects the presence or absence of voice in a determination unit.

In recent years, with various data communication services, in particular packet-switched networks and ATM, close to realization, systems that transmit voice and data efficiently have been in demand.

Such systems detect the presence or absence of a voice signal and improve transmission efficiency by transmitting data, or transmitting nothing, during periods without voice. Because the detection of the voice signal greatly affects system performance, an accurate voice detection device is required.

[Conventional technology]

Fig. 7 shows a conventional voice detection device, in which 1 is a signal power calculation unit that cuts the voice signal into frames and calculates the voice signal power, 2 is a zero-crossing counting unit that counts the number of polarity inversions of the voice signal, and 3 is a determination unit that decides between speech and silence from the outputs of the signal power calculation unit 1 and the zero-crossing counting unit 2.

Fig. 8 shows the algorithm of the determination unit 3 in such a voice detection device. The determination unit 3 compares the voice signal power SP calculated by the signal power calculation unit 1 with its threshold SP_th (step 100). If SP > SP_th, the frame is judged to be speech and the threshold is set to SP_th = SP_th2 (step 101). If SP > SP_th does not hold, the zero-crossing count ZC from the zero-crossing counting unit 2 is further compared with two thresholds, ZC_v and ZC_f (step 102).

Fig. 10 shows the relationship between these thresholds ZC_v and ZC_f, speech (voiced and unvoiced sounds), and noise (silence). It is known that the signal is silent only when ZC_v < ZC < ZC_f.

Accordingly, when the frame is judged to be speech, the threshold is set to SP_th = SP_th2 as above (step 101) and processing returns to the start; when the frame is judged to be silence, the threshold is set to SP_th = SP_th1 (step 103) and processing returns to the start.

Fig. 9 shows the relationship between the power thresholds SP_th1 and SP_th2 in this case. Hysteresis is provided between the threshold used after speech has been detected and the one used after silence has been detected: SP_th1 is the threshold for the silence-to-speech transition and SP_th2 is the threshold for the speech-to-silence transition, so that the detection result does not chatter.
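The conventional decision of Fig. 8 (steps 100 to 103) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, argument names, and the numeric values in the usage note are hypothetical and do not come from the patent.

```python
def conventional_decision(sp, zc, sp_th, sp_th1, sp_th2, zc_v, zc_f):
    """One frame of the conventional speech/silence decision.

    sp      frame signal power SP
    zc      frame zero-crossing count ZC
    sp_th   power threshold currently in effect
    Returns (is_speech, next_sp_th).
    """
    if sp > sp_th:
        # Step 100: power above the current threshold means speech.
        is_speech = True
    else:
        # Step 102: the frame is silent only when ZC_v < ZC < ZC_f.
        is_speech = not (zc_v < zc < zc_f)
    # Steps 101 / 103: hysteresis. After speech use SP_th2,
    # after silence use SP_th1.
    next_sp_th = sp_th2 if is_speech else sp_th1
    return is_speech, next_sp_th
```

With assumed values sp_th1 = 10 and sp_th2 = 5, a frame of power 7 keeps an ongoing speech segment alive (threshold 5) but does not start a new one from silence (threshold 10), which is the anti-chattering behavior described above.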

[Problems to be solved by the invention]

However, signal power and the zero-crossing count alone respond slowly, so the non-stationary portions of a voice signal, such as the beginning and end of speech, cannot be detected accurately.

For this reason, the voice signal has been stored for a certain period and slightly older data has been read out once speech is detected, so that the beginning of speech is not clipped. At the end of speech, the speech interval has been deliberately extended for a certain period after silence is detected, so that the voice is not cut off. As a result, the delay elements inserted to prevent clipping delay the voice detection operation, which is undesirable for the configuration of the encoder.

Accordingly, an object of the present invention is to improve the accuracy of detecting non-stationary portions of a voice signal, such as the beginning and end of speech, in a voice detection device that cuts an input signal into frames at fixed time intervals, calculates the input signal power in a signal power calculation unit, counts the number of polarity inversions of the input signal in a zero-crossing counting unit, and thereby detects the presence or absence of voice in a determination unit.

[Means for solving the problem]

To achieve the above object, the voice detection device of the present invention, as shown in principle in Fig. 1, comprises an adaptive prediction filter 4 that obtains a prediction error signal of the input signal, an error signal power calculation unit 5 that obtains the power of the prediction error signal, and a power comparison unit 6 that obtains the power ratio between the input signal power and the prediction error signal power. In addition to the speech/silence decision based on the input signal power and the zero-crossing count, the determination unit 3 compares the absolute value of the inter-frame difference of the power ratio with its threshold. When the absolute value is greater than the threshold, it makes for the current frame the decision opposite to the speech/silence decision of the immediately preceding frame; when the absolute value is equal to or less than the threshold, it maintains for the current frame the speech/silence decision of the immediately preceding frame.

The present invention may also use a linear prediction filter as the adaptive prediction filter 4 and provide a linear prediction analyzer 7 that obtains the prediction coefficients of the linear prediction filter in advance.

[Action]

In the voice detection device shown in Fig. 1, voice detection is performed, as in the conventional device, on the basis of the signal power and the zero-crossing count obtained by the signal power calculation unit 1 and the zero-crossing counting unit 2, respectively. In addition, the present invention performs voice detection as follows.

The prediction error signal is nearly constant while the input signal is stationary, but where the input signal changes, the characteristics of the prediction filter are no longer optimal for the input signal and the prediction error becomes large.

The present invention exploits this property. When the input signal power is smaller than its threshold, in parallel with the speech/silence decision based on the zero-crossing count, the power comparison unit 6 obtains the power ratio between the signal power and the error signal power, which the error signal power calculation unit 5 calculates from the prediction error signal generated by the adaptive prediction filter 4. The determination unit 3 then obtains the absolute value of the inter-frame difference of this power ratio and compares it with a threshold. Only when the absolute difference is greater than the threshold does it reverse the decision for the current frame, according to whether the immediately preceding frame was detected as speech or as silence.

Thus a sudden increase or decrease in the prediction error between frames is detected through the power ratio, and by taking into account the speech/silence decision of the immediately preceding frame, the speech/silence decision for the current frame can be made quickly and accurately.

Furthermore, if a linear prediction filter is used as the adaptive prediction filter 4 and a linear prediction analyzer 7 is provided to obtain the prediction coefficients of the linear prediction filter in advance, the prediction operation of the adaptive prediction filter 4 becomes more accurate, and the decision result becomes more accurate accordingly.

[Example]

Figs. 2 to 4 show an embodiment of the voice detection device of the present invention shown in principle in Fig. 1: Fig. 2 shows an embodiment of the signal power calculation unit 1, Fig. 3 an embodiment of the zero-crossing counting unit 2, and Fig. 4 an embodiment of the adaptive prediction filter 4.

In Fig. 2, the signal power SP is obtained from the input signal x_i by the following expression: [equation not reproduced in the source text]. Here n is the number of samples and N is the number of frames formed by dividing the signal at fixed time intervals.
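Because the expression for SP does not survive in this text, the following is only a sketch under the assumption that the frame power is the mean of the squared samples, a common definition. The function name and the normalization by frame length are assumptions, not the patent's stated formula.

```python
def frame_power(frame):
    """Mean-square power of one frame of samples (assumed definition)."""
    if not frame:
        raise ValueError("frame must contain at least one sample")
    return sum(x * x for x in frame) / len(frame)
```

The same function could serve both the signal power calculation unit 1 (applied to the input frame) and the error signal power calculation unit 5 (applied to the prediction error frame), since only the ratio of the two powers is used later.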

In Fig. 3, 21 is a high-pass filter (HPF), 22 is a polarity detection unit, 23 is a one-sample delay unit, 24 is a polarity inversion detection unit, and 25 is a counter. The input signal x_i is passed through the filter 21 to remove any DC offset, and the polarity detection unit 22 then detects the polarity of the signal. The polarity inversion detection unit 24 detects a polarity inversion from the polarities of the current sample and the previous sample, and the counter 25 counts on its output. Counting continues until the per-frame reset signal is applied.
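The structure of Fig. 3 can be sketched in software as follows. The patent does not specify the high-pass filter, so the first-order DC blocker and its coefficient `alpha` are assumptions chosen for illustration; treating a sample value of exactly zero as positive is likewise an assumption.

```python
def dc_block(samples, alpha=0.95):
    """First-order DC-blocking filter: y[n] = x[n] - x[n-1] + alpha * y[n-1].

    Stands in for the high-pass filter 21 (assumed form).
    """
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = x - prev_x + alpha * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


def count_zero_crossings(samples):
    """Count polarity inversions between consecutive samples in one frame."""
    count = 0
    prev_positive = None  # no previous sample at the start of the frame
    for x in samples:
        positive = x >= 0          # polarity detection (unit 22)
        if prev_positive is not None and positive != prev_positive:
            count += 1             # inversion detected (unit 24), counter 25
        prev_positive = positive   # one-sample delay (unit 23)
    return count
```

Calling `count_zero_crossings(dc_block(frame))` once per frame corresponds to resetting the counter at each frame boundary; carrying the filter state across frames is left out of this sketch.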

Fig. 4 shows an adaptive prediction filter of the kind often used in ADPCM encoders, whose coefficients are updated by the steepest descent method, with the quantizer and inverse quantizer removed. It comprises an all-zero filter 41 consisting of six sets of delay units (D) and taps b_1 to b_6, and an all-pole filter 42 consisting of two sets of delay units (D) and taps a_1 and a_2.

The operation of this embodiment is described below with reference to the flowchart of the determination unit 3 shown in Fig. 5.

First, for the steps in the flowchart of Fig. 5 that bear the same reference numerals as in the flowchart of Fig. 8, the determination unit 3 executes the same algorithm as in Fig. 8 (their description is omitted).

In this embodiment, the signal power SP calculated by the signal power calculation unit 1 is compared with the threshold SP_th. When SP ≤ SP_th, the zero-crossing count ZC is compared with the thresholds ZC_v and ZC_f (step 102) and, separately and in parallel, the following speech/silence decision is made on the basis of the power ratio G from the power comparison unit 6. When the comparison in step 102 yields a speech or silence decision, a speech flag or a silence flag is set accordingly (steps 104 and 105).

That is, the determination unit 3 obtains the prediction gain G, which corresponds to the power ratio between the signal power SP from the signal power calculation unit 1 and the prediction error signal power EP obtained by feeding the prediction error signal from the filter 4 to the error signal power calculation unit 5, as follows.

$G = 10 \log_{10}(SP / EP)$

The inter-frame variation of this prediction gain G is then calculated as

$GD = |G_t - G_{t-1}|$ (t is the frame index).

The absolute value is taken because the power can change in either direction, from large to small or from small to large.
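The two quantities above, the prediction gain in decibels and its absolute inter-frame difference, can be computed as in the following sketch. The function names and the small `floor` value that guards against division by zero and log of zero are assumptions added for robustness; the patent does not mention them.

```python
import math


def prediction_gain_db(sp, ep, floor=1e-12):
    """G = 10 * log10(SP / EP), with both powers floored (assumed guard)."""
    return 10.0 * math.log10(max(sp, floor) / max(ep, floor))


def gain_difference(g_t, g_prev):
    """GD = |G_t - G_(t-1)|, the absolute inter-frame change of G."""
    return abs(g_t - g_prev)
```

For example, a frame whose signal power is 100 times its prediction error power has a gain of 20 dB; if the previous frame had a gain of 5 dB, GD is 15 dB regardless of which of the two frames came first.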

The inter-frame difference GD of the prediction gain obtained in this way is compared with a preset threshold GD_th (step 200). GD > GD_th indicates that the difference between frames is large. In that case, the speech/silence decision stored for the previous frame is referred to (step 201): if the previous frame was speech, the current frame is taken to be silence and the silence flag is set; if the previous frame was silence, the current frame is taken to be speech and the speech flag is set (steps 202 and 203).

On the other hand, when GD > GD_th does not hold, the difference between frames is small. In that case, the speech/silence decision stored for the previous frame is referred to (step 204): if the previous frame was speech, the current frame is taken to be speech and the speech flag is set; if the previous frame was silence, the current frame is taken to be silence and the silence flag is set (steps 205 and 206).

The decision results are thus stored in several flags. If at least one speech flag is found to be set (step 207), speech is finally detected; if no speech flag is set, silence is detected. The threshold for the signal power SP is then updated in step 101 or step 103, respectively.
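Putting the steps of Fig. 5 together, one frame of the decision might look like the following sketch. It is an interpretation of the flowchart as described in the text, not the patent's implementation; the function name, the parameter dictionary keys, and the numeric values in the usage note are hypothetical.

```python
def decide_frame(sp, zc, g, prev_g, prev_speech, sp_th, params):
    """One frame of the speech/silence decision (steps 100 to 207).

    sp, zc       signal power and zero-crossing count of the current frame
    g, prev_g    prediction gain (dB) of the current and previous frame
    prev_speech  True if the previous frame was judged to be speech
    sp_th        power threshold currently in effect
    params       dict with keys sp_th1, sp_th2, zc_v, zc_f, gd_th (assumed)
    Returns (is_speech, next_sp_th).
    """
    if sp > sp_th:
        # Step 100: power alone is enough to declare speech.
        is_speech = True
    else:
        # Step 102 with flags 104/105: silent only if ZC_v < ZC < ZC_f.
        zc_flag = not (params["zc_v"] < zc < params["zc_f"])
        # Step 200: compare GD = |G_t - G_(t-1)| with GD_th.
        gd = abs(g - prev_g)
        if gd > params["gd_th"]:
            gd_flag = not prev_speech   # steps 201-203: reverse the decision
        else:
            gd_flag = prev_speech       # steps 204-206: keep the decision
        # Step 207: speech if at least one speech flag is set.
        is_speech = zc_flag or gd_flag
    # Steps 101 / 103: update the power threshold with hysteresis.
    next_sp_th = params["sp_th2"] if is_speech else params["sp_th1"]
    return is_speech, next_sp_th
```

With an assumed GD_th of 3 dB, a low-power frame that follows silence is flagged as speech as soon as the prediction gain jumps by more than 3 dB, which is how the onset of speech can be caught before the power threshold is crossed.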

When speech is detected, the determination unit 3 generates a speech detection signal, which is used as a signal for switching between voice transmission and data transmission.

Fig. 6 shows another embodiment of the present invention. In this embodiment, a linear prediction filter is used as the adaptive prediction filter 4, and the linear prediction coefficients obtained by the linear prediction analyzer 7 in the previous frame are used as its prediction coefficients.

Obtaining the prediction coefficients in advance in this way allows the prediction error to be calculated even faster.
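The patent does not say how the linear prediction analyzer 7 computes its coefficients. As one possible illustration, the sketch below uses the autocorrelation method with the Levinson-Durbin recursion, a standard technique that is assumed here rather than taken from the source; all function names are hypothetical.

```python
def autocorrelation(frame, order):
    """r[lag] = sum of frame[i] * frame[i - lag] for lag = 0..order."""
    return [
        sum(frame[i] * frame[i - lag] for i in range(lag, len(frame)))
        for lag in range(order + 1)
    ]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] such that
    x_hat[n] = sum(a[k] * x[n - k]). Returns (coeffs, error_power)."""
    a = [0.0] * (order + 1)   # a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break             # degenerate frame (for example all zeros)
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        k_m = acc / err       # reflection coefficient
        new_a = a[:]
        new_a[m] = k_m
        for k in range(1, m):
            new_a[k] = a[k] - k_m * a[m - k]
        a = new_a
        err *= 1.0 - k_m * k_m
    return a[1:], err


def prediction_residual(frame, coeffs):
    """Prediction error e[n] = x[n] - sum(a[k] * x[n - k]).
    Samples before the start of the frame are treated as zero."""
    out = []
    for n, x in enumerate(frame):
        pred = sum(
            a * frame[n - k]
            for k, a in enumerate(coeffs, start=1)
            if n - k >= 0
        )
        out.append(x - pred)
    return out
```

In the arrangement of Fig. 6, the coefficients returned for the previous frame would be passed to `prediction_residual` for the current frame, so no adaptation is needed while the current frame is being filtered.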

[Effect of the invention]

As described above, the voice detection device of the present invention detects the non-stationary beginning and end of speech in a voice signal on the basis of the ratio between the signal power and the prediction error power. This enables highly accurate voice detection, with the effect that delay elements such as hangover control become unnecessary or can be shortened.

[Brief description of the drawings]

Fig. 1 is a block diagram showing the principle of the voice detection device according to the present invention.
Fig. 2 shows an embodiment of the signal power calculation unit used in the voice detection devices of the present invention and of the conventional example.
Fig. 3 shows an embodiment of the zero-crossing counting unit used in the voice detection devices of the present invention and of the conventional example.
Fig. 4 shows an embodiment of the adaptive prediction filter used in the voice detection device according to the present invention.
Fig. 5 is a flowchart showing an embodiment of the operation of the determination unit used in the present invention.
Fig. 6 shows another embodiment of the present invention.
Figs. 7 to 10 are diagrams for explaining the conventional voice detection device.

In Fig. 1: 1, signal power calculation unit; 2, zero-crossing counting unit; 3, determination unit; 4, adaptive prediction filter; 5, error signal power calculation unit; 6, power comparison unit; 7, linear prediction analyzer. In the drawings, the same reference numerals denote the same or corresponding parts.

Continuation of the front page

(56) References
JP-A-60-200300 (JP, A)
JP-A-60-39700 (JP, A)
JP-A-58-143394 (JP, A)
IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 36, No. 3 (March 1988), pp. 411-412
Furui, "Digital Signal Processing", Tokai University Press (1985), p. 76

Claims

(57) [Claims]

1. A voice detection device in which an input signal is cut into frames at fixed time intervals, a signal power calculation unit (1) calculates the input signal power, a zero-crossing counting unit (2) counts the number of polarity inversions of the input signal, and a determination unit (3) thereby detects the presence or absence of voice, the device comprising: an adaptive prediction filter (4) that obtains a prediction error signal of the input signal; an error signal power calculation unit (5) that obtains the power of the prediction error signal; and a power comparison unit (6) that obtains the power ratio between the input signal power and the prediction error signal power, wherein the determination unit (3), in addition to the speech/silence decision based on the input signal power and the zero-crossing count, compares the absolute value of the inter-frame difference of the power ratio with a threshold, makes for the current frame the decision opposite to the speech/silence decision of the immediately preceding frame when the absolute value is greater than the threshold, and maintains for the current frame the speech/silence decision of the immediately preceding frame when the absolute value is equal to or less than the threshold.

2. The voice detection device according to claim 1, wherein the adaptive prediction filter is a linear prediction filter, and the prediction coefficients of the linear prediction filter are obtained in advance by a linear prediction analyzer.