JPS6332600A

JPS6332600A - Voice detector

Info

Publication number: JPS6332600A
Application number: JP61175418A
Authority: JP
Inventors: 庄司　保夫; 孝夫鈴木; 白木　裕一
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1986-07-25
Filing date: 1986-07-25
Publication date: 1988-02-12

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a voice detection device suitable for use in digital communications, voice storage systems, and other voice signal processing.

(Prior Art) In an ordinary communication system, two people converse over a two-way telephone circuit, and while one party speaks the other listens. As a result, a voice signal usually flows in only one direction while the circuit in the opposite direction is idle. Moreover, even while a party is speaking, the voice signal is momentarily interrupted at syllable boundaries and elsewhere.

Thus, when one circuit is assigned to each talker, the probability that a voice signal is present on any given circuit is usually 1/2 or less. Consequently, if a circuit is assigned to a talker only while speech is present, more calls can be handled than there are circuits.

To carry several calls per circuit in this way, the intervals in which a voice signal is flowing on the circuit must be detected and calls assigned accordingly. Two methods of detecting the voice signal are known: the TASI (Time Assignment Speech Interpolation) method, which handles analog voice signals, and the DSI (Digital Speech Interpolation) method, which handles digital voice signals (see, for example, Transactions of the Institute of Electronics and Communication Engineers, 80/7, J63-A (7), pp. 413-420).

The TASI method is used mainly in long-distance submarine coaxial cable systems. The DSI method, by contrast, is used mainly on digital satellite communication links, because voice detection and channel assignment control are comparatively easy to implement.

As noted above, a DSI system is broadly divided into a voice detection section and a channel assignment control section. The voice detection section in particular is required to have a short detection time, to malfunction rarely on noise, and to detect speech correctly.

Conventionally, the voice detection device used in the DSI method decides whether speech is present by detecting the voice waveform, that is, the voice section, from the amount of power alone.

For this reason, a long hangover time of, for example, 150 to 250 ms must be added in order to avoid or remove clipping at the start of words, in mid-speech, and at the end of words. However, the same long hangover time is then also added after false triggers caused by noise, and as a result the gain of the voice detection device is reduced.

It is known that thermal noise and similar non-speech signals are normally uniformly distributed in frequency and Gaussian in power, whereas speech has a Laplacian power distribution and a random distribution in frequency. With this in mind, detection performance could be improved by exploiting not only the amount of power but also the qualitative differences between noise and speech (energy distribution, frequency distribution, or any other suitable quantity).

One conventionally known way of distinguishing speech from noise by such a qualitative difference is to count how often the waveform crosses zero within a given time (the zero-crossing count method). However, this method only identifies whether frequencies in a certain range are present in the waveform; it cannot tell how much power the waveform carries in that range.
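The limitation described above can be illustrated with a minimal sketch of zero-crossing counting. The function name `zero_crossings` and the treatment of exact zeros (they are skipped) are assumptions made for this illustration and are not taken from the patent.

```python
def zero_crossings(samples):
    """Count sign changes in a sequence of samples (exact zeros are skipped)."""
    count = 0
    prev_sign = 0  # 0 means no non-zero sample has been seen yet
    for x in samples:
        if x == 0:
            continue
        sign = 1 if x > 0 else -1
        if prev_sign != 0 and sign != prev_sign:
            count += 1
        prev_sign = sign
    return count


quiet = [1, -1, 1, -1]
loud = [1000, -1000, 1000, -1000]
# Both waveforms give the same count although their power differs greatly.
print(zero_crossings(quiet), zero_crossings(loud))
```

Because the count depends only on sign changes, scaling the waveform leaves it unchanged, which is why the method says nothing about how much power lies in a given frequency range.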

The voice detectors used in speech recognition equipment decide whether speech is present by digital signal processing over a longer stretch of waveform than the detectors used in DSI. They are therefore more accurate, but their detection time is too long for them to be applied to DSI.

Because there has thus been no method of identifying a frequency distribution that also takes the amount of power into account, conventional voice detection devices have ended up relying on detection by power alone.

FIG. 5 is a block diagram illustrating a configuration example of a conventional voice detection device that is used in the DSI method and detects speech from the amount of power alone.

In FIG. 5, D1 is the input PCM signal, normally sampled at 8 kHz. An absolute value calculation circuit 50 removes the sign of the input PCM signal D1 and passes an output D2 representing its absolute value to a first comparator 52 in the next stage. The first comparator 52 compares the output D2 with a threshold T1 whose magnitude has been set empirically in advance. If the absolute value D2 of the input signal D1 is greater than twice the threshold T1, the output signal D3 is set to +2; if it is greater than T1 but no more than twice T1, D3 is set to +1; and if it is equal to or less than T1, D3 is set to -1. The output D3 carrying the assigned value is supplied to an accumulator 54, which sums the values of D3 received over a fixed period, for example 4 ms. The sum D4 from the accumulator 54 is sent to a second comparator 56 and compared there with a threshold T2, likewise set empirically in advance. If the sum D4 exceeds the threshold T2, the output D5 of the second comparator 56 is set to binary "1" and speech is judged to be present. If the sum D4 is equal to or less than the threshold T2, the output D5 of the second comparator 56 is set to "0" and speech is judged to have ceased.
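The conventional scheme of FIG. 5 can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `conventional_detector`, the non-overlapping 32-sample frames (4 ms at 8 kHz), and the reading that speech is declared when D4 is strictly greater than T2 are all choices made here, not details confirmed by the source.

```python
def conventional_detector(samples, t1, t2, frame_len=32):
    """Return one binary decision (D5) per frame, using magnitude scoring only.

    frame_len=32 corresponds to 4 ms at 8 kHz (assumed framing).
    """
    decisions = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        acc = 0  # accumulator 54, holds the sum D4
        for x in samples[start:start + frame_len]:
            d2 = abs(x)            # absolute value calculation circuit 50
            if d2 > 2 * t1:        # first comparator 52 assigns +2, +1 or -1
                acc += 2
            elif d2 > t1:
                acc += 1
            else:
                acc -= 1
        decisions.append(1 if acc > t2 else 0)  # second comparator 56
    return decisions


# One quiet frame followed by one loud frame (made-up values).
signal = [0] * 32 + [1000] * 32
print(conventional_detector(signal, t1=100, t2=0))
```

The thresholds `t1` and `t2` stand in for the empirically set T1 and T2; the values used in the call are arbitrary.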

The output D5 is supplied to a hangover time adding circuit 58 in the next stage. To prevent clipping of word endings and mid-speech dropouts, this circuit adds a hangover time when the output D5 changes from "1" to "0": at the end of the "1" state it holds the "1" state for a fixed further time, and so delivers an output D6 of extended duration.
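The hangover behaviour described above can be sketched as follows, assuming the decisions arrive as one binary value per frame. The function name `add_hangover` and the idea of expressing the hangover time as a whole number of frames are assumptions for illustration; with 4 ms frames, a hangover of 150 to 250 ms would be roughly 38 to 62 frames.

```python
def add_hangover(decisions, hangover_frames):
    """Hold the output at 1 for `hangover_frames` frames after each 1 -> 0 change."""
    out = []
    remaining = 0  # frames of hangover still to be applied
    for d in decisions:
        if d == 1:
            remaining = hangover_frames  # re-arm the hangover on every speech frame
            out.append(1)
        elif remaining > 0:
            remaining -= 1
            out.append(1)                # extended "1" state
        else:
            out.append(0)
    return out


print(add_hangover([0, 1, 1, 0, 0, 0, 0, 1, 0], hangover_frames=2))
```

The sketch also shows the drawback discussed in the text: a single false "1" caused by noise is stretched by the full hangover time.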

(Problems to Be Solved by the Invention) However, because this conventional voice detection device detects the voice waveform from the amount of power, it still has the problem already described: a long hangover time is also added after false triggers caused by noise, so the detected time becomes longer than the actual voice section, and the gain of the voice detection device is reduced accordingly.

In view of these problems of the conventional voice detection device, the object of the present invention is to provide a voice detection device with excellent detection capability that, by detecting voice sections with higher accuracy, prevents false triggers due to noise, eliminates clipping at the start of speech, in mid-speech, and at word endings, and does so without reducing the gain.

(Means for Solving the Problems) To achieve this object, the voice detection device of the present invention adopts the following means.

First, the device comprises a first voice detection section that outputs a first voice detection signal judging the voice waveform, that is, the voice section, on the basis of the amount of power of the digital input signal over a certain period.

It further comprises a second voice detection section that outputs a second voice detection signal judging the voice waveform, that is, the voice section, on the basis of the power ratio between components of the digital input signal in mutually different frequency bands over the same period.

It further comprises a determination section that determines the voice section, according to the presence or absence of speech in the digital input signal, by a logical operation on the first and second voice detection signals.

In this case, the first voice detection section preferably includes a power calculation section for calculating the power of the digital input signal over a certain period, and a comparator that compares the power obtained with a predetermined threshold and outputs the result as, for example, a binary first voice detection signal.

The second voice detection section preferably includes a plurality of filters, each able to output the components of the digital input signal in a different frequency band; a plurality of power calculation sections, each calculating the amount of power of one filter output over the same period as above; and a comparator that calculates the ratio between appropriate pairs of the power values obtained from these power calculation sections, compares the ratio with a threshold, and outputs the result as, for example, a binary second voice detection signal.

The filters may be any suitable combination of filters with different passbands, for example a low-pass filter and a high-pass filter, or band-pass or band-elimination filters with suitable cutoff frequencies.

(Operation) Thus the voice detection device of the present invention determines the presence or absence of speech in the input signal from two signals: a voice section detection signal obtained from the amount of power of the input signal alone, and a voice section detection signal obtained, after passing the input signal through two or more digital filters, from the power ratio between the components in the different frequency bands.

As already explained, speech and noise normally differ in quality, and the power ratio between different frequency components of the input signal is a measure that reflects this difference. Combining voice section detection based on this ratio with detection based on the amount of power alone therefore eliminates false detection of voice sections caused by noise and allows voice sections to be detected with high sensitivity.

(Embodiments) An embodiment of the voice detection device of the present invention is described below with reference to the drawings.

The embodiment described below is merely a preferred example. The invention is therefore not limited to the numerical values, processing means, processing procedures, and other details illustrated in it, and it is evident that various changes and modifications can be made within the scope of the invention.

FIG. 1 is a block diagram showing the configuration of one embodiment of the voice detection device of the present invention. In the figure, 10 is a first voice detection section, 12 is a second voice detection section, 14 is a determination section, and 16 is a hangover time adding circuit, which need not necessarily be included in the device.

Each of the detection sections 10 and 12 is supplied with a digital input signal D10, for example a 14-bit signal obtained by linear conversion of a PCM signal sampled at 8 kHz.

Description of the first voice detection section

As already explained, the first voice detection section 10 is configured to output a first voice detection signal D12 that judges the voice waveform, that is, the presence or absence of a voice section, on the basis of the amount of power of the input signal D10 over a certain period.

To this end, in this embodiment the first voice detection section 10 preferably includes a power calculation section 20 for calculating the power of the digital input signal D10 over a certain period, and a comparator 21 that compares the power obtained with a predetermined threshold and outputs the result as, for example, a binary first voice detection signal.

An example of the operation of the first voice detection section 10 is described next with reference to the flowchart of FIG. 3.

When the input signal D10 enters the power calculation section 20 (step 1; steps are denoted S below), the power of each sample is calculated (S2). The power values are then summed over a fixed number of samples (S3), normally about 32 to 80 samples, which corresponds to about 4 ms to 10 ms, and the total power D16 is output to the comparator 21 in the next stage. If the summation over the fixed number of samples is not yet complete, steps S1 to S3 are repeated; once it is complete, processing moves on (S4).

The comparator 21 then compares the total power D16 with a threshold T10 for voice section determination. If the total power D16 is equal to or greater than the threshold T10, decision information of binary "1", indicating speech present, is output as the first voice detection signal; if it is smaller than the threshold T10, decision information of binary "0", indicating no speech, is output (S5). It is then determined whether the input signal D10 has ended, and if not, steps S1 to S5 are repeated until it ends (S6).
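Steps S1 to S6 can be sketched roughly as below. The function name `first_detector`, the non-overlapping framing, and the use of the squared sample value as the per-sample power are assumptions for illustration; the threshold argument stands in for the empirically set T10.

```python
def first_detector(samples, t10, frame_len=32):
    """Return one binary decision (D12) per frame from the summed power.

    frame_len may be about 32 to 80 samples (about 4 ms to 10 ms at 8 kHz).
    """
    decisions = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        total_power = sum(x * x for x in frame)           # S2, S3: total power D16
        decisions.append(1 if total_power >= t10 else 0)  # S5: comparator 21
    return decisions


# One low-level frame followed by one high-level frame (made-up values).
signal = [1] * 32 + [10] * 32
print(first_detector(signal, t10=1000))
```

Frames with total power equal to or greater than the threshold give "1", and all others give "0", matching the comparison described in the text.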

The threshold T10 can be set empirically in advance. The power calculation section 20 and the comparator 21 each have means for performing the corresponding processing described above. In particular, the power calculation section 20 can preferably be implemented as a signal processor that runs under micro-software (a microprogram).

Description of the second voice detection section

The second voice detection section 12 is configured to output a second voice detection signal D14 that judges the voice waveform, that is, the voice section, on the basis of the power ratio between components of the digital input signal D10 in mutually different frequency bands over the same period as above.

In this embodiment the second voice detection section 12 preferably includes a plurality of filters, each able to output the components of the digital input signal in a different frequency band, for example a low-pass filter as a first filter 22 and a high-pass filter as a second filter 23. The filters 22 and 23 are second-order digital filters: a low-pass filter and a high-pass filter, each with its cutoff frequency at the center of the frequency range used by the PCM signal. First and second power calculation sections 24 and 25 are provided to calculate and output the amounts of power D22 and D24 over a certain period for the outputs D18 and D20, which the filters 22 and 23 deliver for their respective frequency ranges. A comparator 26 is also provided; it calculates the ratio between the power values D22 and D24, compares the ratio with a threshold, and outputs the result as, for example, a binary second voice detection signal D14 that judges the voice waveform, that is, the voice section.

The first and second filters 22 and 23 used in this embodiment can be, for example, digital filters configured as shown in FIG. 2. In FIG. 2, 30 and 31 are the input and output terminals, 33 to 36 are adders, 37 to 41 are multipliers, and 42 and 43 are delay elements. The input terminal 30 is connected through the adder 33, the multiplier 39, and the adder 35 to the output terminal 31. The point between the adder 33 and the multiplier 39 is connected through the delay element 42, the multiplier 37, and the adder 34 to the other input terminal of the adder 33. The point between the delay element 42 and the multiplier 37 is connected, on one side, through the delay element 43 and the multiplier 38 to the other input terminal of the adder 34 and, on the other side, through the multiplier 40 and the adder 36 to the other input terminal of the adder 35. Finally, the point between the delay element 43 and the multiplier 38 is connected through the multiplier 41 to the other input terminal of the adder 36.

In this embodiment the digital filter has a second-order transfer function. Let the input be V_in and the output V_out, let the multiplication coefficients of the multipliers 37 to 41 be b1, b2, a0, a1, and a2 respectively, set appropriately by conventionally known methods, and let the delay of one delay element be written z^-1. The filter can then be configured so that relation (1) holds between the input V_in and the output V_out.
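Relation (1) itself is not reproduced in this text. As a hedged illustration only: a structure with two delay elements 42 and 43, feedback coefficients b1 and b2, and feedforward coefficients a0, a1, and a2 matches the usual direct-form II second-order section. Assuming the feedback terms are added, as the adders in FIG. 2 suggest, this gives H(z) = (a0 + a1 z^-1 + a2 z^-2) / (1 - b1 z^-1 - b2 z^-2). The sign convention and the function name `biquad` are assumptions and may differ from the patent's own relation (1).

```python
def biquad(samples, a0, a1, a2, b1, b2):
    """Direct-form II second-order filter (assumed form, see the lead-in).

    w[n] = x[n] + b1*w[n-1] + b2*w[n-2]
    y[n] = a0*w[n] + a1*w[n-1] + a2*w[n-2]
    """
    w1 = 0.0  # output of the first delay element (42)
    w2 = 0.0  # output of the second delay element (43)
    out = []
    for x in samples:
        w0 = x + b1 * w1 + b2 * w2                # feedback side
        out.append(a0 * w0 + a1 * w1 + a2 * w2)   # feedforward side
        w2, w1 = w1, w0                           # shift the delay line
    return out


# Impulse response with feedforward taps only, then with one feedback tap.
print(biquad([1, 0, 0, 0], a0=1, a1=2, a2=3, b1=0, b2=0))
print(biquad([1, 0, 0], a0=1, a1=0, a2=0, b1=0.5, b2=0))
```

Choosing the coefficients for a low-pass or high-pass response is left to conventional filter design methods, as the text states.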

The operation of the second voice detection section is described next with reference to the flowchart of FIG. 4.

When the input signal D10 enters the first and second filters 22 and 23 (S10), the components in the low and high frequency ranges are output and supplied to the power calculation sections 24 and 25 in the next stage, respectively (S11). The power calculation sections 24 and 25 calculate the power of each sample (S12) and sum the power values over the same fixed number of samples as in the first voice detection section 10 (normally about 32 to 80 samples, corresponding to about 4 ms to 10 ms), and the total powers D22 and D24 are each output to the comparator 26 in the next stage (S13).

If the summation over the fixed number of samples is not yet complete, steps S10 to S13 are repeated; once it is complete, processing moves on (S14).

The comparator 26 first calculates the ratio of the total powers D22 and D24 obtained through the two filters 22 and 23 and their respective power calculation sections 24 and 25 (S15). If the input signal D10 is noise, its spectrum is uniformly distributed, so the power ratio is approximately 1. If the input signal D10 is speech, the frequency distribution differs with the type of speech sound, so the power ratio takes a value other than 1.

Next, the power ratio is compared with a threshold T12 set empirically in advance for voice section determination (S16). For example, if the threshold T12 is taken as the pair of values 1.1 and 0.9, then when the power ratio is 1.1 or more, or 0.9 or less, decision information of binary "1", indicating speech present, is output as the second voice detection signal D14; when the ratio lies between 0.9 and 1.1, the input is judged to be noise and decision information of binary "0", indicating no speech, is output. It is then determined whether the input signal D10 has ended, and if not, steps S10 to S16 are repeated until it ends (S17).
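The ratio test of steps S12 to S16 can be sketched as follows, taking the two band signals as already filtered. Several points are assumptions for illustration: the function name `second_detector`, the choice of low-band power over high-band power as the ratio (the text does not say which band is the numerator), the strict treatment of the 0.9 to 1.1 interior as noise, and the handling of a zero denominator.

```python
def second_detector(low_band, high_band, lower=0.9, upper=1.1, frame_len=32):
    """Return one binary decision (D14) per frame from the band power ratio."""
    decisions = []
    n = min(len(low_band), len(high_band))
    for start in range(0, n - frame_len + 1, frame_len):
        p_low = sum(x * x for x in low_band[start:start + frame_len])    # D22
        p_high = sum(x * x for x in high_band[start:start + frame_len])  # D24
        if p_high == 0:
            # Assumed handling: all-silent frame -> no speech; otherwise the
            # ratio is unbounded and is treated as speech.
            decisions.append(1 if p_low > 0 else 0)
            continue
        ratio = p_low / p_high                                           # S15
        decisions.append(1 if ratio >= upper or ratio <= lower else 0)   # S16
    return decisions


# Frame 1: equal band powers (noise-like). Frame 2: low band dominates.
low = [1.0] * 32 + [3.0] * 32
high = [1.0] * 32 + [1.0] * 32
print(second_detector(low, high))
```

A frame whose two bands carry nearly equal power is classed as noise regardless of its level, which is what lets this detector respond to speech close to the noise level.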

The first and second filters 22 and 23, the power calculation sections 24 and 25, and the comparator 26 each have means for performing the corresponding processing described above.

The use of such a low-pass filter and high-pass filter is particularly well suited to detecting consonants, which have large energy in the frequency range above 3 kHz, and vowels, which have large energy in the frequency range below 1 kHz.

Description of the determination section and the hangover time adding circuit

Because the first and second voice detection signals D12 and D14 are binary signals, the voice section of the digital input signal is determined from them by a logical operation. In this embodiment the determination section 14 is an ordinary logical OR circuit. Accordingly, when either or both of the signals D12 and D14 from the first and second voice detection sections 10 and 12 are "1", indicating speech detected, the determination section 14 outputs a voice detection signal D26 of "1", and the period over which this "1" persists is the voice section. When the speech level is high, it is mainly the first voice detection section 10 that takes effect; when the speech level is close to the noise level, it is mainly the second voice detection section 12 that takes effect.
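The OR decision of the determination section 14 amounts to a per-frame logical OR of the two binary signals. A minimal sketch, with the function name `determine` chosen for illustration and the two inputs assumed to be frame-aligned lists of 0 and 1:

```python
def determine(d12, d14):
    """Combine the two detector outputs frame by frame with a logical OR (D26)."""
    return [a | b for a, b in zip(d12, d14)]


# Power detector fires on frames 1 and 3, ratio detector on frames 2 and 3.
print(determine([0, 1, 0, 1], [0, 0, 1, 1]))
```

A frame is marked as speech if either detector fires, so a low-level speech frame missed by the power detector can still be caught by the ratio detector.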

When the voice detection signal D26 changes from "1" to "0", the hangover time adding circuit 16 detects the change, as in the conventional device, and holds the "1" state for a suitable time to prevent clipping of word endings and mid-speech dropouts.

The voice detection device of the invention described above can readily be implemented in software or in a combination of software and hardware.

Furthermore, because the power calculation sections described above can be implemented with a communications signal processor that runs under micro-software (a microprogram), the invention is well suited to voice detection devices in communication equipment such as DSI and in voice storage systems.

(Effects of the Invention) As is clear from the above description, the voice detection device of the present invention detects the presence or absence of speech from the qualitative difference between speech and noise, in combination with the amount of power of the input signal. Because the detection method takes both frequency and power into account, false voice detection due to noise is eliminated and speech can be detected with high sensitivity.

[Brief explanation of the drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the voice detection device of the present invention. FIG. 2 is a block diagram showing a configuration example of a digital filter used in the voice detection device of FIG. 1. FIG. 3 is a flowchart of the operation of the first voice detection section. FIG. 4 is a flowchart of the operation of the second voice detection section. FIG. 5 is a block diagram showing a configuration example of a conventional voice detection device.

10: first voice detection section
12: second voice detection section
14: determination section
16: hangover time adding circuit
20, 24, 25: power calculation sections
21, 26: comparators
22: first filter
23: second filter
30: input terminal
31: output terminal
33 to 36: adders
37 to 41: multipliers
42, 43: delay elements

Patent applicant: Oki Electric Industry Co., Ltd.

Claims

[Claims]

(1) A voice detection device for detecting a voice section from a digital input signal, comprising: a first voice detection section that outputs a first voice detection signal judging a voice section on the basis of the amount of power of the digital input signal over a certain period; a second voice detection section that outputs a second voice detection signal judging a voice section on the basis of the power ratio between components of the digital input signal in different frequency bands over the same period; and a determination section that determines the voice section of the digital input signal by a logical operation on the first and second voice detection signals.

(2) The voice detection device according to claim 1, wherein the first voice detection section comprises a power calculation section for calculating the power of the digital input signal over the certain period, and a comparator that compares the power obtained with a predetermined threshold and outputs the result as the first voice detection signal.

(3) The voice detection device according to claim 1, wherein the second voice detection section comprises a plurality of filters, each able to output the components of the digital input signal in a different frequency band; a plurality of power calculation sections, each calculating the amount of power of one filter output over the certain period; and a comparator that calculates the power ratio between the components in the different frequency bands, compares the ratio with a threshold, and outputs the result as the second voice detection signal.