JPS63281200A

JPS63281200A - Voice section detecting system

Info

Publication number: JPS63281200A
Application number: JP62115990A
Authority: JP
Inventors: 孝夫鈴木; 白木　裕一; 庄司　保夫
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1987-05-14
Filing date: 1987-05-14
Publication date: 1988-11-17

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a method for detecting voice sections in a digital speech interpolation system or a voice packet system in the field of digital communications.

(Prior Art) The voice detector shown in FIG. 2 is a known example of a detector used for voice section detection. In FIG. 2, of the input sample signal A applied to input terminal 1, a signal A1 of relatively large amplitude, such as a vowel, is fed to the amplitude detection unit 2. A signal A2 arising from a fricative consonant has its offset removed by the DC suppression circuit 4; a constant value a is then added, and the sign bit of the result is fed to the zero-crossing detection unit 3.

In the amplitude detection unit 2, the comparison circuit 2a compares the absolute value of signal A1 with a predetermined value θ and increments or decrements counter 2b according to the result. When the value of counter 2b exceeds the threshold THv, the threshold circuit 2c outputs αV at high level "1" to the OR circuit 5.

In the zero-crossing detection unit 3, the OR circuit 3a determines whether the incoming sign bit matches the sign bit of the previous sample, and counter 3b is incremented or decremented depending on whether they match. This is equivalent to counting the number of times the input crosses the level (-a). When the value of counter 3b exceeds the threshold THz, the threshold circuit 3c outputs α2 at high level "1" to the OR circuit 5.

The OR circuit 5 feeds the logical sum α of the outputs αV and α2 of threshold circuits 2c and 3c to the hangover control circuit 6. The hangover control circuit 6 adds a hangover time, during which it continues to output high level "1" for a fixed period even after the output α of the OR circuit 5 has fallen from high level "1" to low level "0", and the result is output from output terminal 7 as αout.

While αout is at high level "1" the signal is judged to be voiced, and while it is at low level "0" it is judged to be silent (Takashi Araseki and Kazuo Ochiai, "A speech detection method using zero-crossing frequency", 1976 National Convention of the Institute of Electronics and Communication Engineers of Japan, No. 1753).
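The two-branch detector described above can be sketched as follows. This is a minimal illustration rather than the circuit of FIG. 2: it assumes the samples are already free of DC offset, that both counters step by 1 and never go below 0, and that both branches see the same samples. The function and parameter names are hypothetical.

```python
def prior_art_detector(samples, theta, th_v, a, th_z):
    """Return the per-sample OR output (alpha) of the two detection branches.

    Assumptions (not stated in the source): counters step by 1, are floored
    at 0, and the input has no DC offset.
    """
    amp_count = 0     # plays the role of counter 2b
    zc_count = 0      # plays the role of counter 3b
    prev_sign = None  # sign bit of the previous sample
    alpha = []
    for x in samples:
        # Amplitude branch: count up while |x| exceeds theta, otherwise down.
        if abs(x) > theta:
            amp_count += 1
        else:
            amp_count = max(amp_count - 1, 0)
        # Zero-crossing branch: the sign bit of (x + a) changes exactly when
        # the input crosses the level -a.
        sign = (x + a) < 0
        if prev_sign is not None:
            if sign != prev_sign:
                zc_count += 1
            else:
                zc_count = max(zc_count - 1, 0)
        prev_sign = sign
        alpha_v = amp_count > th_v   # output of the amplitude branch
        alpha_z = zc_count > th_z    # output of the zero-crossing branch
        alpha.append(1 if (alpha_v or alpha_z) else 0)
    return alpha
```

A steady loud input trips only the amplitude branch, while a small signal that keeps crossing -a trips only the zero-crossing branch, which is why the two outputs are ORed.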

(Problem to be Solved by the Invention) However, in the above configuration the zero-crossing count is the number of times the input sample signal A crosses the constant level (-a), so it does not give an accurate estimate of the spectral distribution of the input sample sequence. That is, even when line noise and a voice signal have the same zero-crossing count, their spectral distributions may differ in a way that is perceptually significant, and the extraction of voice sections therefore becomes inaccurate. Furthermore, because zero-crossing information is used for voice section detection, the hangover time cannot be shortened substantially. As a result, the proportion of total call time classified as voiced becomes high, and in systems that use the transmission path efficiently by sending other voice or data during silent portions, such as digital speech interpolation systems and voice packet systems, no further gain in efficiency can be expected.

In view of the above problems, an object of the present invention is to provide a voice section detection method that shortens the hangover time and detects voice sections accurately.

(Means for Solving the Problems) To achieve the above object, the present invention provides a voice section detection method that distinguishes line noise from the voice signal in an input sample signal and detects the voice signal, characterized by comprising: analysis means that divides the input sample signal into frames of a predetermined length, performs LPC analysis on each frame, and calculates the average power and LPC coefficients of each frame; first determination means that compares the LPC distance between the LPC coefficients and a preset standard vector with a preset threshold and determines that the frame is voiced if the LPC distance is greater than the threshold; second determination means that compares the average power with a preset threshold and determines that the frame is voiced if the average power is greater than the threshold; third determination means that compares the average power with an adaptive power threshold that can be updated within a predetermined range and determines that the frame is voiced if the average power is greater than the adaptive power threshold; fourth determination means that determines whether the voice signal is voiced or silent on the basis of the results of the first, second, and third determination means and counts the numbers of voiced and silent frames; and updating means that calculates and updates the adaptive power threshold on instruction from the fourth determination means; wherein the updating means updates the adaptive power threshold either when the number of consecutive silent frames reaches a preset specified number, or when frames judged silent by the first and second determination means and voiced by the third determination means occur consecutively and their number reaches the specified number.

(Operation) According to the present invention, the analysis means divides the input sample signal into frames of a fixed length and performs LPC analysis on each frame to calculate the average power and the LPC coefficients. The first determination means compares the LPC distance between these LPC coefficients and a preset standard vector with a preset threshold, giving a voiced/silent decision based on the LPC distance. The second determination means compares the average power of the frame with a preset threshold, giving a voiced/silent decision based on the average frame power. The third determination means compares the average power of the frame with the adaptive power threshold supplied by the updating means, giving a voiced/silent decision based on the adaptive power threshold. The fourth determination means then combines the results of the first, second, and third determination means to decide whether the voice signal is voiced or silent, and the voice is thereby detected.

The fourth determination means also counts the voiced and silent frames. When the number of consecutive silent frames reaches a preset specified number, or when frames judged silent by the first and second determination means and voiced by the third determination means occur consecutively and their number reaches the specified number, it instructs the updating means to update the adaptive power threshold, and the updating means then calculates and updates the adaptive power threshold.

(Embodiment) FIG. 1 is a block diagram showing an embodiment of a voice detector to which the voice section detection method according to the present invention is applied. In the figure, 1 is an input terminal, 7 is an output terminal, 10 is an LPC analysis unit, 11 is an LPC distance calculation unit, 12 is a standard vector memory, 13, 14, and 16 are comparison units, 15 is an adaptive power threshold update unit, 17 is a voiced/silent determination unit, and 18 is a hangover time addition unit.

The LPC analysis unit 10 receives the input sample signal A, divides it into frames of a predetermined length, and performs LPC analysis to calculate the autocorrelation coefficients Ri and the LPC coefficients ki of each LPC analysis frame. It sends these to the LPC distance calculation unit 11 as the autocorrelation coefficient matrix R and the LPC coefficient vector k, and sends the zeroth-order autocorrelation coefficient R0, as the average power of the LPC analysis frame, to the comparison units 14 and 16 and to the adaptive power threshold update unit 15.
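The framing and autocorrelation step can be sketched as below. The source does not state the frame overlap, windowing, or the exact normalization, so this sketch assumes non-overlapping rectangular frames and a 1/N factor, which makes the zeroth-order coefficient equal to the mean squared sample value (the average power). The function names are hypothetical.

```python
def split_frames(samples, frame_len):
    """Split samples into consecutive non-overlapping frames.

    A trailing partial frame is dropped (an assumption of this sketch).
    """
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def autocorrelation(frame, order):
    """Return [R_0, ..., R_order] with R_i = (1/N) * sum_n x(n) * x(n + i).

    The sum runs over the n for which both samples lie inside the frame.
    """
    n = len(frame)
    return [
        sum(frame[j] * frame[j + i] for j in range(n - i)) / n
        for i in range(order + 1)
    ]


# Usage: R_0 of each frame serves as that frame's average power.
frames = split_frames([1, -1, 1, -1, 2, 2, 2, 2], frame_len=4)
powers = [autocorrelation(f, order=2)[0] for f in frames]
```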

The LPC distance calculation unit 11 calculates the LPC distance D according to equation (1) below from the autocorrelation coefficient matrix R and the LPC coefficient vector k supplied by the LPC analysis unit 10 and the standard LPC coefficient vector b stored in the standard vector memory 12, and sends the result to the comparison unit 13.

\[
D = \frac{(\mathbf{k} - \mathbf{b})\,\mathbf{R}\,(\mathbf{k} - \mathbf{b})^{t}}{\mathbf{k}\,\mathbf{R}\,\mathbf{k}^{t}} \tag{1}
\]

where

\[
\mathbf{k} = (1, k_1, k_2, \ldots, k_p), \qquad
\mathbf{b} = (1, b_1, b_2, \ldots, b_p), \qquad
\mathbf{R} = (R_{ij}), \quad R_{ij} = R_{|i-j|} \quad (i, j = 0, 1, \ldots, p)
\]

\[
R_i = \frac{1}{N} \sum_{n} x(n)\, x(n+i)
\]

Here N is the number of samples in an LPC analysis frame, x(n) is a sample value of the input signal, p is the order of the LPC analysis, Ri is the i-th order autocorrelation coefficient of the LPC analysis frame, ki is the i-th order LPC coefficient, bi is the i-th order standard LPC coefficient, n is the sample number within the LPC analysis frame, and t denotes vector transposition. (For equation (1), see equation 9-21 on pp. 269-271 of L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals (Vol. 2)", translated by Hisaki Suzuki, Corona Publishing, April 15, 1983.)
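Equation (1) can be evaluated directly from the autocorrelation coefficients and the two coefficient vectors. The following is a minimal sketch in plain Python; the helper names are hypothetical, and the vectors are assumed to include the leading 1 as in the definitions above.

```python
def toeplitz(r):
    """Build the (p+1) x (p+1) matrix with entries R_ij = R_|i-j|."""
    size = len(r)
    return [[r[abs(i - j)] for j in range(size)] for i in range(size)]


def quad_form(v, m, w):
    """Return v * M * w^t for row vectors v, w and a square matrix M."""
    return sum(
        v[i] * m[i][j] * w[j]
        for i in range(len(v))
        for j in range(len(w))
    )


def lpc_distance(k, b, r):
    """LPC distance D of equation (1).

    k, b: coefficient vectors (1, c_1, ..., c_p); r: [R_0, ..., R_p].
    """
    diff = [ki - bi for ki, bi in zip(k, b)]
    m = toeplitz(r)
    return quad_form(diff, m, diff) / quad_form(k, m, k)
```

For example, with k = (1, -0.5), b = (1, 0), and R = (1.0, 0.5), the numerator is 0.25 and the denominator is 0.75, so D is 1/3; when k equals b the distance is 0.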

The standard vector memory 12 stores the standard LPC coefficient vector b. This vector was determined from measured white-noise data, on the assumption that line noise is white noise.

The comparison unit 13 compares the LPC distance D from the LPC distance calculation unit 11 with the threshold Dth according to equation (2) below. If the LPC distance D is greater than the threshold Dth, the voiced/silent determination signal VD based on the LPC distance is set to high level "1", indicating voiced; if the LPC distance D is equal to or smaller than the threshold Dth, VD is set to low level "0", indicating silent. The signal is sent to the voiced/silent determination unit 17.

\[
V_D = \begin{cases} 1 & (D > D_{th}) \\ 0 & (D \le D_{th}) \end{cases} \tag{2}
\]

The LPC distance D tends to take a small value when the spectral envelopes represented by the LPC coefficient vector k and the standard LPC coefficient vector b are similar in shape, and a large value when they differ. A large LPC distance D therefore indicates that the input frame is not a white signal but a colored signal, that is, a voice signal. The threshold Dth is applied to the LPC distance D so that the voice signal can be detected from this difference in spectral distribution.
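A short worked note on why a threshold on D behaves as described: under the assumption that the autocorrelation matrix R is positive definite (which holds for a non-degenerate frame; the source does not state this), D in equation (1) is zero exactly when the two coefficient vectors coincide and positive otherwise.

```latex
% Sketch under the assumption that R is positive definite.
\begin{align*}
\mathbf{k} = \mathbf{b}
  &\;\Rightarrow\; (\mathbf{k}-\mathbf{b})\,\mathbf{R}\,(\mathbf{k}-\mathbf{b})^{t} = 0
  \;\Rightarrow\; D = 0 \\
\mathbf{k} \neq \mathbf{b}
  &\;\Rightarrow\; (\mathbf{k}-\mathbf{b})\,\mathbf{R}\,(\mathbf{k}-\mathbf{b})^{t} > 0,
  \quad \mathbf{k}\,\mathbf{R}\,\mathbf{k}^{t} > 0
  \;\Rightarrow\; D > 0
\end{align*}
```

The denominator is always positive because k has a leading element of 1 and so is never the zero vector.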

The comparison unit 14 compares the average power of the LPC analysis frame from the LPC analysis unit 10, that is, the zeroth-order autocorrelation coefficient R0, with a first threshold Pth1 and a second threshold Pth2 according to equation (3) below. If R0 is greater than Pth1, the voiced/silent determination signal VP based on the average power of the LPC analysis frame is set to high level "1", indicating voiced. If R0 is equal to or smaller than Pth1 and equal to or greater than Pth2, VP is set to low level "0", indicating silent. If R0 is smaller than Pth2, VP is set to the level "-1", which likewise indicates silent. The signal is sent to the voiced/silent determination unit 17.

\[
V_P = \begin{cases} 1 & (R_0 > P_{th1}) \\ 0 & (P_{th1} \ge R_0 \ge P_{th2}) \\ -1 & (P_{th2} > R_0) \end{cases} \tag{3}
\]

(where Pth1 > Pth2)

The adaptive power threshold update unit 15 receives the zeroth-order autocorrelation coefficient R0, which is the average power of the LPC analysis frame. On instruction from the control signal CAP sent by the voiced/silent determination unit 17, it calculates and updates the adaptive power threshold PAP according to equation (4) below, from the average level of NAP consecutive LPC analysis frames that satisfy the conditions described later, and sends PAP to the comparison unit 16. The adaptive power threshold PAP is confined to the range from Pth2 to Pth1 inclusive: if the calculated value exceeds Pth1 it is held at Pth1, and if it falls below Pth2 it is held at Pth2.

\[
P_{AP} = M_{AP} \times \frac{1}{N_{AP}} \sum_{s=1}^{N_{AP}} R_0(s) \tag{4}
\]

Here MAP is a constant, NAP is the preset specified number of consecutive frames counted by the counter described later, and R0(s) is the average power of the s-th of the NAP consecutive LPC analysis frames.
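The update of equation (4), together with the clamping to the range between Pth2 and Pth1, can be sketched as follows. The function and argument names are hypothetical, and the number of stored frame powers is taken to be NAP.

```python
def update_adaptive_threshold(recent_powers, m_ap, p_th1, p_th2):
    """Return the new adaptive power threshold P_AP.

    recent_powers: average powers R_0(s) of the N_AP consecutive frames.
    m_ap: the constant M_AP; p_th1 > p_th2 bound the result.
    """
    n_ap = len(recent_powers)
    p_ap = m_ap * sum(recent_powers) / n_ap   # equation (4)
    # Hold the threshold inside [p_th2, p_th1].
    return min(max(p_ap, p_th2), p_th1)
```

With frame powers 1, 2, 3 and MAP = 2 the raw value is 4; it is returned unchanged when the bounds are 0.5 and 10, and is held at 3 when Pth1 is 3.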

The comparison unit 16 receives the adaptive power threshold PAP and the zeroth-order autocorrelation coefficient R0, which is the average power of the LPC analysis frame, and compares them according to equation (5) below. If R0 is greater than PAP, the determination signal VAP based on the adaptive power threshold is set to high level "1", indicating voiced; if R0 is equal to or smaller than PAP, VAP is set to low level "0", indicating silent. The signal is sent to the voiced/silent determination unit 17.

\[
V_{AP} = \begin{cases} 1 & (R_0 > P_{AP}) \\ 0 & (R_0 \le P_{AP}) \end{cases} \tag{5}
\]

(where Pth1 ≥ PAP ≥ Pth2)

The voiced/silent determination unit 17 receives the determination signal VD based on the LPC distance D from the comparison unit 13, the determination signal VP based on the average power of the LPC analysis frame from the comparison unit 14, and the determination signal VAP based on the adaptive power threshold PAP from the comparison unit 16. According to equation (6) below, if the sum of VD, VP, and VAP is greater than 0, the frame is judged voiced and the voiced/silent determination signal V is set to high level "1"; if the sum is equal to or smaller than 0, the frame is judged silent and V is set to low level "0". The signal V is sent to the hangover time addition unit 18.
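The three per-frame decisions and their combination by the sum rule can be sketched as below. The function names are hypothetical; the thresholds are passed in as plain numbers.

```python
def decide_vd(d, d_th):
    """Equation (2): decision based on the LPC distance."""
    return 1 if d > d_th else 0


def decide_vp(r0, p_th1, p_th2):
    """Equation (3): three-level decision based on the average power."""
    if r0 > p_th1:
        return 1
    if r0 >= p_th2:
        return 0
    return -1


def decide_vap(r0, p_ap):
    """Equation (5): decision based on the adaptive power threshold."""
    return 1 if r0 > p_ap else 0


def decide_v(vd, vp, vap):
    """Voiced (1) if the sum of the three decisions is positive, else silent (0)."""
    return 1 if vd + vp + vap > 0 else 0
```

One consequence of the three-level VP: because PAP is never below Pth2, a frame whose power is below Pth2 has VAP = 0 and VP = -1, so a spectral decision VD = 1 on its own is cancelled and the frame is judged silent.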

\[
V = \begin{cases} 1 & (V_D + V_P + V_{AP} > 0) \\ 0 & (V_D + V_P + V_{AP} \le 0) \end{cases} \tag{6}
\]

The voiced/silent determination unit 17 also has a counter 17a, which sets the condition under which the adaptive power threshold update unit 15 calculates the adaptive power threshold PAP. As long as LPC analysis frames judged silent by the voiced/silent determination signal V continue to occur, counter 17a counts them in runs of the preset number NAP. If an LPC analysis frame occurs for which the sum of VD and VP is greater than "0", the counter value NC is reset to "0" at that point. If, on the other hand, a frame occurs for which the sum of VD and VP is equal to or smaller than "0" and VAP is at high level "1", the counter value NC is set to "1", and as long as this state continues counter 17a is advanced, again counting in runs of NAP.

When the counter value NC of counter 17a becomes equal to the preset value NAP in this way, that is, when NAP consecutive LPC analysis frames have occurred for which the sum of VD, VP, and VAP is equal to or smaller than "0", or NAP consecutive LPC analysis frames have occurred for which the sum of VD and VP is equal to or smaller than "0" and VAP is at high level "1", indicating voiced, the control signal CAP is sent to the adaptive power threshold update unit 15 and the counter value NC is reset to "0".

The hangover time addition unit 18 receives the voiced/silent determination signal V from the voiced/silent determination unit 17. If V is at high level "1", indicating voiced, it adds a hangover time, and it outputs the final voiced/silent determination signal Vout from output terminal 7.
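Hangover addition can be sketched as holding the output at "1" for a fixed number of frames after the last voiced frame. Measuring the hangover time in frames, and the name `hang_frames`, are assumptions of this sketch; the source does not give the length or the unit.

```python
def add_hangover(v_seq, hang_frames):
    """Return V_out: V extended by hang_frames frames after each voiced frame."""
    out = []
    remaining = 0  # hangover frames still to be output
    for v in v_seq:
        if v == 1:
            remaining = hang_frames  # restart the hangover on every voiced frame
            out.append(1)
        elif remaining > 0:
            remaining -= 1
            out.append(1)
        else:
            out.append(0)
    return out
```

For instance, `add_hangover([0, 1, 0, 0, 0, 1, 0], 2)` keeps the output high for two frames after each voiced frame.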

FIGS. 3(a), (b), and (c) illustrate the relationship among the average power of the LPC analysis frame, the adaptive power threshold PAP, the voiced/silent determination signal V, and the operation of counter 17a. FIG. 3(a) shows how the average power of the LPC analysis frames of the input sample signal A and the adaptive power threshold PAP change over time; section X is a voice signal section, section Y is a line noise section, and section Y' is a line noise section in which the average power temporarily rises sharply. FIG. 3(b) shows the output of the voiced/silent determination signal V corresponding to FIG. 3(a). FIG. 3(c) shows, again corresponding to FIG. 3(a), the determination signals VD, VP, and VAP and the counter value NC of counter 17a used in calculating the adaptive power threshold PAP. Here NAP is set to 6.

The operation of the above configuration will now be described with reference to the flowcharts of FIGS. 4(a) and (b).

First, the LPC analysis unit 10 divides the input sample signal A received at input terminal 1 into frames of a predetermined length (S1), performs LPC analysis to calculate the autocorrelation coefficient matrix R and the LPC coefficient vector k of the LPC analysis frame (S2), and sends them to the LPC distance calculation unit 11. It also sends the zeroth-order autocorrelation coefficient R0, which represents the average power of the LPC analysis frame, to the comparison units 14 and 16 and to the adaptive power threshold update unit 15.

The LPC distance calculation unit 11 calculates the LPC distance D according to equation (1) from the autocorrelation coefficient matrix R, the LPC coefficient vector k, and the standard LPC coefficient vector b stored in the standard vector memory 12, and sends the result to the comparison unit 13 (S3). The comparison unit 13 compares the LPC distance D with the threshold Dth according to equation (2) (S4). If D is greater than Dth, it sets the determination signal VD to high level "1" (S5); if D is equal to or smaller than Dth, it sets VD to low level "0" (S6). It sends VD to the voiced/silent determination unit 17.

The comparison unit 14 compares the zeroth-order autocorrelation coefficient R0 with the first threshold Pth1 (S7). If R0 is greater than Pth1, it sets the determination signal VP to high level "1" (S8). If R0 is not greater than Pth1, it compares R0 with the second threshold Pth2 (S9): if R0 is greater than or equal to Pth2, it sets VP to low level "0" (S10), and if R0 is smaller than Pth2, it sets VP to "-1" (S11). It sends VP to the voiced/silent determination unit 17.

The comparison unit 16 compares the adaptive power threshold PAP (whose initial value is Pth1) with R0 according to equation (5) (S12). If R0 is greater than PAP, it sets the determination signal VAP to high level "1" (S13); if R0 is equal to or smaller than PAP, it sets VAP to low level "0" (S14). It sends VAP to the voiced/silent determination unit 17.

The voiced/silent determination unit 17 receives the determination signals VD, VP, and VAP and, according to equation (6), compares their sum with 0 (S15). If the sum is equal to or smaller than "0", it sets the voiced/silent determination signal V to low level "0" (S16) and adds 1 to the counter value NC of counter 17a (S17). If the sum of VD, VP, and VAP is greater than "0", it sets V to high level "1" (S18) and compares the sum of VD and VP with "0" (S19). If this sum is greater than 0, the counter value NC of counter 17a is reset to "0" (S20). If the sum of VD and VP is equal to or smaller than "0", the unit checks whether the immediately preceding LPC analysis frame was voiced or silent (S21): if it was silent, NC is set to "1" (S22); if it was voiced, "1" is added to NC (S23). After these decisions, the level of V for the current LPC analysis frame is stored so that the next LPC analysis frame can refer to it in S21 (S24), the average power of each of the NAP consecutive LPC analysis frames is also stored (S25), and the counter value NC is checked against the specified value NAP (S26).
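The counter handling in steps S15 to S26 can be sketched as a small state update per frame. This is an illustration only: the initial values (NC = 0, previous frame silent) and the function names are assumptions, and the reset of NC after the threshold update follows the description of counter 17a given earlier.

```python
def step_counter(vd, vp, vap, prev_v, nc):
    """Return (v, nc) for one frame, following steps S15 to S23."""
    if vd + vp + vap <= 0:
        return 0, nc + 1   # S16, S17: silent frame, count it
    if vd + vp > 0:
        return 1, 0        # S18, S20: clearly voiced, reset the counter
    # Voiced only because VAP is 1 (S21).
    if prev_v == 0:
        return 1, 1        # S22: previous frame was silent, restart at 1
    return 1, nc + 1       # S23: previous frame was voiced, keep counting


def run_counter(decisions, n_ap):
    """Return the frame indices at which the control signal CAP would be raised."""
    nc, prev_v, updates = 0, 0, []
    for idx, (vd, vp, vap) in enumerate(decisions):
        v, nc = step_counter(vd, vp, vap, prev_v, nc)
        prev_v = v          # S24: remember V for the next frame
        if nc == n_ap:      # S26: counter reached the specified number
            updates.append(idx)
            nc = 0
    return updates
```

With NAP = 3, six consecutive silent frames raise CAP at the third and sixth frames, and a run of three frames that are voiced only through VAP also raises it once.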

If the counter value NC equals the specified value NAP, the control signal CAP is set to high level "1", and the adaptive power threshold update unit 15 calculates the adaptive power threshold PAP according to equation (4) (S27). If NC has not reached NAP, CAP is set to low level "0". If the voiced/silent determination signal V is at high level "1", indicating voiced, the hangover time addition unit 18 adds the hangover time (S28), and the final voiced/silent determination signal Vout is output from output terminal 7. Steps S1 to S28 are repeated in sequence.

This embodiment exploits the fact that the power of a voice signal varies over a wide range while line noise stays within a certain power range. The power axis is divided into a region where only the voice signal exists, a region where line noise and the voice signal coexist, and a region where only line noise exists. In the region where only the voice signal exists, voice is detected when the average power exceeds the predetermined threshold Pth1. In the region where line noise and the voice signal coexist, detection based on the difference in spectral distribution measured by the LPC distance D is combined with detection based on the adaptive power threshold PAP. This reduces the effect of false operation caused by line noise and allows voice sections to be detected properly.

(Effects of the Invention) As explained above, according to the present invention, line noise is distinguished from voice signals by comparing the LPC distance between the LPC coefficients and a standard vector with a preset threshold, so that voice/silence can be determined from the difference in spectral distribution. In addition, the adaptive power threshold is updated within a predetermined range either when the number of consecutive silent frames reaches a preset specified number, or when the number of consecutive frames for which the determinations based on the LPC distance and on the average power are silent while the determination based on the adaptive power threshold is voiced reaches the specified number. Voice/silence can therefore be determined by comparing the average power of a frame with an adaptive power threshold that correctly estimates the line noise level. Consequently, malfunctions caused by line noise can be minimized and voice sections can be detected accurately, which has the advantage that the hangover time can be greatly shortened.
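The two conditions that trigger an update of the adaptive power threshold can be sketched in Python as below. The sketch makes several assumptions that the text does not confirm: it keeps two separate run counters, it resets both after a trigger, and it leaves the update formula itself (equation (4)) out entirely. The class name `UpdateTrigger` and the parameter `n_ap` (standing in for the specified number NAP) are hypothetical.

```python
class UpdateTrigger:
    """Tracks the two frame runs that call for an adaptive-threshold update."""

    def __init__(self, n_ap):
        self.n_ap = n_ap        # specified number of consecutive frames
        self.silent_run = 0     # consecutive frames judged silent overall
        self.mismatch_run = 0   # consecutive frames: 1st and 2nd silent, 3rd voiced

    def step(self, first, second, third, v):
        """Feed one frame; return True when the threshold should be updated.

        first, second, third -- voiced (True) / silent (False) results of the
                                LPC-distance, fixed-power and adaptive-power tests
        v                    -- overall voice/silence determination for the frame
        """
        self.silent_run = self.silent_run + 1 if not v else 0
        mismatch = (not first) and (not second) and third
        self.mismatch_run = self.mismatch_run + 1 if mismatch else 0

        if self.silent_run >= self.n_ap or self.mismatch_run >= self.n_ap:
            # Assumed behavior: start counting afresh after each update.
            self.silent_run = 0
            self.mismatch_run = 0
            return True
        return False


trigger = UpdateTrigger(n_ap=3)
print([trigger.step(False, False, False, False) for _ in range(4)])
# [False, False, True, False]
```

The second run covers frames whose power exceeds the adaptive threshold although neither the spectrum nor the fixed power threshold indicates voice. A sustained run of such frames suggests that the threshold no longer matches the line noise level.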

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an embodiment of a voice detector to which the voice section detection method according to the present invention is applied. FIG. 2 is a block diagram showing a voice detector to which a conventional voice section detection method is applied. FIGS. 3(a), (b) and (c) are explanatory diagrams illustrating the relationship between the LPC frame average power, the adaptive power threshold, the voice/silence determination signal and the operation of the counter. FIGS. 4(a) and (b) are flowcharts explaining the operation according to the present invention.

In the figures: 1 ... input terminal, 7 ... output terminal, 10 ... LPC analysis unit, 11 ... LPC distance calculation unit, 12 ... standard vector memory, 13, 14, 16 ... comparison units, 15 ... adaptive power threshold updating unit, 17 ... voice/silence determination unit, 18 ... hangover time addition unit.

Claims

[Claims] A voice section detection method that distinguishes between line noise and a voice signal in an input sample signal and detects the voice signal, comprising: means for dividing the input sample signal into frames of a predetermined length, performing linear prediction (hereinafter referred to as LPC) analysis on each frame, and calculating the average power and the LPC coefficients of each frame; first determination means that compares the LPC distance between the LPC coefficients and a preset standard vector with a preset threshold and determines voice if the LPC distance is greater than the threshold; second determination means that compares the average power with a preset threshold and determines voice if the average power is greater than the threshold; third determination means that compares the average power with an adaptive power threshold that can be updated within a predetermined range and determines voice if the average power is greater than the adaptive power threshold; fourth determination means that counts the number of frames; and updating means that calculates and updates the adaptive power threshold on instruction from the fourth determination means; the method being characterized in that the adaptive power threshold is updated when the number of consecutive silent frames reaches a preset specified number, or when the number of consecutive frames for which the first and second determination means determine silence and the third determination means determines voice reaches the specified number.