JP2708913B2

JP2708913B2 - Sound detection output device

Info

Publication number: JP2708913B2
Application number: JP1273882A
Authority: JP
Inventors: 義注太田; 哲夫古谷
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1989-10-23
Filing date: 1989-10-23
Publication date: 1998-02-04
Anticipated expiration: 2013-02-04
Also published as: JPH03136099A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Industrial Application Field]

The present invention relates to a voice detection output device. The device detects whether an input signal contains a valid signal component such as voice, and passes that component to a voice recognition device, a voice storage device, or the like.

[Conventional technology]

Conventionally, voice detection devices capture input speech for voice recognition, voice storage, and similar purposes. One example is described in Japanese Patent Application Laid-Open No. 63-291096. When speech is input to a voice recognition device through a microphone or the like, that device detects the input speech section by comparing the power calculated from the microphone output signal against a threshold. It then passes the detected speech to the voice recognition device.

This method is briefly described below with reference to FIG. 7.

FIG. 7 illustrates the principle of detecting a speech section from an input signal. It plots the output signal of the microphone, with time on the horizontal axis and power on the vertical axis.

In FIG. 7, speech is judged to be detected when the power of the microphone output signal exceeds a threshold TS. Detecting speech with power and a threshold is very simple, but it has many drawbacks. Consider a word that begins with a low-power consonant. If TS is set high, the initial consonant is dropped from the detected speech section. Lowering TS keeps the initial consonant, but a low threshold risks classifying ambient noise, and the breath sounds before and after an utterance, as speech as well.
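The trade-off can be made concrete with a minimal Python sketch of the fixed-threshold detector of FIG. 7. It assumes frame-based processing with the mean-square power of each frame. The frame powers and TS values are made up for illustration.

```python
def frame_powers(samples, frame_len):
    """Mean-square power of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def detect_frames(powers, ts):
    """True for every frame whose power exceeds the fixed threshold TS."""
    return [p > ts for p in powers]

# Hypothetical frame powers: silence, silence, weak initial consonant,
# loud vowel, loud vowel, silence.
powers = [0.01, 0.01, 0.2, 1.0, 1.0, 0.01]
print(detect_frames(powers, ts=0.5))    # → [False, False, False, True, True, False]
print(detect_frames(powers, ts=0.005))  # → [True, True, True, True, True, True]
```

With the high threshold the consonant frame (0.2) is lost. With the low threshold the consonant survives, but every noise frame is also marked as speech.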

To prevent this, the invention of JP-A-63-291096 adds auxiliary information to the power-and-threshold decision: the change in the inter-vector distance between feature parameters of the input signal. Noise and speech have very different feature parameters. When speech appears in the noise, the distance between successive feature vectors of the input signal therefore changes.

However, extracting feature parameters requires large-scale hardware.

Other improvements on power-and-threshold detection also exist:

- JP-A-64-80999 splits the input signal into frequency bands and compares the power in each band against a threshold.
- Other methods make the threshold variable. A separate microphone, dedicated to collecting ambient noise, measures the noise level, and the threshold is set from that measurement.

[Problems to be solved by the invention]

The prior art above detects speech from the power of the input signal and a threshold, but it does not say how that threshold should be set. Some methods measure only the noise level with a separate dedicated microphone and set the threshold from it. That microphone cannot be acoustically shielded from the operator's voice, so speech also reaches it and can cause malfunctions. The prior art also does not consider when the noise level should be measured.

With voice input, the surrounding environment, that is, the ambient noise, changes from moment to moment. Speech sections therefore cannot be detected accurately unless the threshold level is continually updated.

In short, the prior art does not address how this threshold should be set, that is, to what value and at what time. As a result, the input to the speech recognition device is inaccurate, which causes problems such as a reduced recognition rate.

An object of the present invention is to provide a voice detection output device that sets the threshold to an optimum value at an optimum time. The device thereby detects speech sections accurately and passes the speech to a voice recognition device or voice storage device. This prevents a drop in recognition rate and malfunctions, and reduces the file cost of storage.

[Means for solving the problem]

This object is achieved by a voice detection output device connected to a voice recognition or storage device. The device comprises:

- prompt sound output means that emits into a space, as sound waves, a sound prompting the operator to input speech to the voice recognition or storage device;
- a microphone placed in that space to collect sound;
- prompt sound cancelling means that removes the prompt sound signal, output by the prompt sound output means, from the signal output by the microphone;
- power detection means that detects the power of the microphone output signal;
- threshold generating means that generates and holds a threshold according to the output of the power detection means during the prompt sound signal period;
- comparing means that compares the outputs of the power detection means and the threshold generating means to detect the speech section following the prompt sound signal; and
- switching means that gates the microphone output signal according to the output of the comparing means.

[Action]

The prompt sound output means plays an electronic tone or synthesized speech through a loudspeaker to prompt the operator for voice input. This prompt sound enters the microphone and becomes part of the microphone output signal. Ambient noise enters the microphone at the same time.

The prompt sound cancelling means removes the prompt sound component from the microphone output signal. The power detection means detects the power of the microphone output signal while the prompt sound is playing. Because the prompt sound has been cancelled, this power is the power of the ambient noise. The threshold generating means receives this output of the power detection means, that is, the ambient noise power. It generates a threshold proportional to that power and holds it.

When the prompt sound ends, the operator speaks to input speech to the voice recognition or storage device. The microphone output signal is now a mixture of the operator's voice and the ambient noise. The comparing means compares the current output of the power detection means with the threshold that was set and held during the prompt sound. When the power exceeds the threshold, the comparing means closes the switching means, and the microphone output signal is passed to the voice recognition or voice storage device.
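The sequence above can be sketched in Python. This is an illustration under stated assumptions, not the patent's circuit:

- Powers are per-frame values already produced by the power detection means.
- The prompt component is assumed to have been cancelled.
- `K` is a hypothetical value, since the text only says the threshold is proportional to the noise power.

```python
# Hypothetical proportionality constant between noise power and threshold.
K = 4.0

def threshold_from_prompt(noise_power, k=K):
    """Threshold generating means: proportional to the ambient-noise power
    measured while the prompt plays (prompt component already removed)."""
    return k * noise_power

def gate(post_prompt_powers, threshold):
    """Comparing means plus switching means: True where the microphone
    signal is passed on to the recognizer."""
    return [p > threshold for p in post_prompt_powers]

# The same utterance level (1.5) spoken in a quiet room and in a noisy room.
quiet_room = [0.01, 0.01, 1.5, 1.5, 0.01]
noisy_room = [0.2, 0.2, 1.5, 1.5, 0.2]
print(gate(quiet_room, threshold_from_prompt(0.01)))  # → [False, False, True, True, False]
print(gate(noisy_room, threshold_from_prompt(0.2)))   # → [False, False, True, True, False]
# A threshold tuned for the quiet room fires on the noisy room's noise:
print(gate(noisy_room, threshold_from_prompt(0.01)))  # → [True, True, True, True, True]
```

Because the threshold follows the noise measured during the prompt, the same speech is isolated in both rooms.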

The threshold is thus proportional to the ambient noise power immediately before the operator speaks. People naturally speak louder than the surrounding noise. The comparing means can therefore reliably distinguish speech from ambient noise, detect the speech, and pass the operator's voice signal collected by the microphone to the voice recognition or voice storage device.

In other words, even in an environment that changes from moment to moment, the speech detection level is set to the optimum value at the optimum time, namely immediately before the speech signal.

Through this operation, the voice recognition or storage device receives the operator's speech accurately regardless of the ambient noise level. This prevents a drop in recognition rate and malfunctions, and reduces the file cost of voice storage.

[Embodiments]

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a block diagram of a first embodiment of the voice detection output device according to the present invention. In FIG. 1, a denotes the voice detection output device and b denotes a voice recognition device.

In FIG. 1, the reference numerals denote the following:

- 1: a prompt sound output circuit that outputs a sound prompting the operator to input speech to the device.
- 2: a microphone that collects the operator's voice.
- 3: a band elimination filter (hereinafter BEF) that removes a specific frequency component.
- 4: a selection switch that selects either the output signal of the microphone 2 or that signal after it has passed through the BEF 3.
- 5: a power detection circuit that detects signal power.
- 6: a threshold generating circuit that generates and holds, as a threshold, a value proportional to the output of the power detection circuit.
- 7: a comparison circuit that compares the outputs of the power detection circuit 5 and the threshold generating circuit 6.
- 8: a switch that gates the microphone output signal.
- 9: a control circuit that controls the entire device.
- 10: an interface circuit that interfaces with external devices.

The voice detection output device a consists of circuits 1 to 10.

The prompt sound output circuit 1 consists of a sine wave generating circuit 101, an amplifier circuit 102, and a speaker 103. The sine wave generating circuit 101 generates a sine wave at an audible frequency f₁. The amplifier circuit 102 amplifies it, and the speaker 103 outputs it.

The BEF 3 removes only the signal component at frequency f₁ and passes all other frequency components unchanged. The power detection circuit 5 consists of a rectifier circuit, a smoothing circuit, and the like, and outputs the average power of the signal. A more complex circuit that outputs the effective (RMS) power of the signal may of course be used instead.
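Below is a discrete-time sketch of the BEF 3 and the power detection circuit 5. It rests on these assumptions:

- Processing is digital, at a hypothetical 8 kHz sample rate.
- A standard RBJ-cookbook biquad notch stands in for the unspecified band elimination filter.
- Squaring models the rectifier, and a one-pole low-pass models the smoothing circuit.
- The Q and smoothing values are illustrative choices.

```python
import math

def notch_coeffs(f0, fs, q):
    """RBJ-cookbook biquad notch centred on f0 (a stand-in for BEF 3)."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (1 / a0, -2 * math.cos(w0) / a0, 1 / a0)
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)
    return b, a

def biquad(x, b, a):
    """Direct-form-I biquad: y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def smoothed_power(x, smoothing=0.01):
    """Power detection circuit 5: square (rectify), then one-pole low-pass (smooth).
    Returns the smoothed power at the last sample."""
    p = 0.0
    for xn in x:
        p += smoothing * (xn * xn - p)
    return p

fs, f1 = 8000, 1000.0  # hypothetical sample rate and prompt frequency f1
mic = [math.sin(2 * math.pi * f1 * n / fs)            # prompt tone picked up by microphone 2
       + 0.1 * math.sin(2 * math.pi * 300 * n / fs)   # stand-in for ambient noise
       for n in range(4000)]
b, a = notch_coeffs(f1, fs, q=30)
print(smoothed_power(mic))                # roughly 0.5, dominated by the prompt tone
print(smoothed_power(biquad(mic, b, a)))  # roughly 0.005, only the "noise" remains
```

The notched path corresponds to switch 4 selecting the BEF 3 during the prompt. Outside the prompt, the microphone samples would go to `smoothed_power` directly.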

FIG. 2 is a timing chart of the operation of the embodiment shown in FIG. 1. The operation of this embodiment is described below with reference to it.

Current speech recognition technology is still limited, and most practical voice recognition devices recognize words spoken by a specific speaker. Few of them accept speech at any time. Most prompt the operator to speak with some cue, such as an electronic tone or synthesized voice guidance. They then capture and recognize speech uttered within a certain time afterward. Even so, in practice, speech capture fails and the recognition rate drops when the surrounding environment changes, especially the ambient noise level. The present invention is intended to prevent such malfunctions.

First, the voice recognition device b sends a command to output the prompt sound. The command goes through the interface circuit 10 to the control circuit 9 of the voice detection output device a. The control circuit 9 then drives the prompt sound output circuit 1. The sine wave generating circuit 101 generates a sine wave of audible frequency f₁ for a fixed time, the amplifier circuit 102 amplifies it, and the speaker 103 emits it.

During this same fixed time, the control circuit 9 sets the selection switch 4 so that the signal collected by the microphone 2 passes through the BEF 3 before reaching the power detection circuit 5. At all other times, the signal collected by the microphone 2 is fed directly to the power detection circuit 5.

At this time, both the prompt sound at audible frequency f₁ and the ambient noise enter the microphone 2. The BEF 3 removes the f₁ component, so of the sounds collected by the microphone 2, only the ambient noise signal reaches the power detection circuit 5.

At a time point c within the prompt period, the control circuit 9 instructs the threshold generating circuit 6 to reset the threshold. The new threshold is proportional to the power output by the power detection circuit 5 at that moment, and circuit 6 holds it from then on. Until time c, circuit 6 holds the threshold that was set the same way during the previous prompt sound.

In the above description, the control circuit 9 sets the duration of the prompt sound output, but this is not limiting. The voice recognition device b may instead specify the duration via the interface circuit 10, and the control circuit 9 then sets the prompt sound duration accordingly.

Time point c may be any moment while the prompt sound is playing. It is preferably as close as possible to the end of the prompt sound.

The operator speaks after hearing the prompt sound. Ambient noise continues to enter the microphone 2 during this time as well.

The comparison circuit 7 compares the output of the power detection circuit 5 with the newly set threshold held in the threshold generating circuit 6. It detects the interval in which the output of circuit 5 exceeds this threshold as the operator's speech section, and closes the switch 8 during that interval. The operator's speech is thereby output to the voice recognition device b.

In summary, this embodiment measures the ambient noise level while the prompt sound is playing, that is, just before the operator speaks. That measurement sets the threshold on one input of the comparison circuit 7, and the operator's speech is detected against it. Speech sections can therefore be detected reliably even when the ambient noise level fluctuates. This prevents a reduced recognition rate and malfunctions in the voice recognition device.

The prompt sound in this embodiment is a single sine wave, but it may instead be a mixture of sine waves at several frequencies. A DTMF-like tone, for example, does not sound unpleasant to the operator. In that case, the BEF 3 must of course remove all of these frequency components.
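For a multi-frequency prompt, one way to realize the BEF 3 is a cascade of notch filters, one per tone. The sketch below makes these assumptions:

- The tones are the DTMF pair for the key "1" (697 Hz and 1209 Hz).
- The sample rate is a hypothetical 8 kHz.
- A single 300 Hz sinusoid stands in for ambient noise.
- The notch is the same RBJ-cookbook biquad design as before, chosen for illustration.

```python
import math

def notch(x, f0, fs, q=30.0):
    """One RBJ-cookbook biquad notch at f0, direct form I."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    c = math.cos(w0)
    a0 = 1 + alpha
    b0, b1, b2 = 1 / a0, -2 * c / a0, 1 / a0
    a1, a2 = -2 * c / a0, (1 - alpha) / a0
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y.append(yn)
    return y

def mean_power(x):
    return sum(v * v for v in x) / len(x)

fs = 8000
low, high = 697.0, 1209.0  # DTMF pair for the key "1"
mic = [0.5 * math.sin(2 * math.pi * low * n / fs)
       + 0.5 * math.sin(2 * math.pi * high * n / fs)
       + 0.1 * math.sin(2 * math.pi * 300 * n / fs)  # stand-in for ambient noise
       for n in range(4000)]
bef = notch(notch(mic, low, fs), high, fs)  # BEF 3 as a cascade of two notches
tail = slice(2000, None)                    # skip the filters' start-up transient
print(mean_power(mic[tail]), mean_power(bef[tail]))  # roughly 0.25 vs 0.005
```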

In the above description, the comparison circuit 7 operates continuously. It is preferable, however, for the control circuit 9 to enable the comparison only for a fixed time after the prompt sound ends. This fixed time is preferably the same as the speech input acceptance time of the voice recognition device b.
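The control sequence of the first embodiment can be modelled per frame. This covers the threshold held from the previous prompt, the latch at time point c, and the comparator enable window. The function and its parameters (`run_prompt_cycle`, `c`, `k`, `accept_frames`) are hypothetical names for this sketch. Frame powers are assumed to come from the power detection circuit 5.

```python
def run_prompt_cycle(held_threshold, prompt_powers, c, post_powers, k, accept_frames):
    """One prompt/response cycle of the first embodiment, frame by frame.

    held_threshold: value left in threshold generating circuit 6 by the previous prompt
    prompt_powers:  circuit 5 output while the prompt plays (switch 4 selects BEF 3)
    c:              frame index of time point c, ideally near the end of the prompt
    post_powers:    circuit 5 output after the prompt ends (BEF 3 bypassed)
    k:              hypothetical proportionality constant
    accept_frames:  comparator enable window, ideally the recognizer's acceptance time
    """
    threshold_trace = []
    for i, p in enumerate(prompt_powers):
        if i == c:
            held_threshold = k * p  # re-set at time c, then held
        threshold_trace.append(held_threshold)
    # Comparison circuit 7 drives switch 8 only inside the acceptance window.
    switch8 = [i < accept_frames and p > held_threshold
               for i, p in enumerate(post_powers)]
    return held_threshold, threshold_trace, switch8

held, trace, switch8 = run_prompt_cycle(
    held_threshold=1.0, prompt_powers=[0.1, 0.1, 0.2, 0.2], c=3,
    post_powers=[0.2, 2.0, 2.0, 0.2, 2.0], k=3.0, accept_frames=4)
print(trace)    # the old threshold 1.0 is held until frame c
print(switch8)  # → [False, True, True, False, False]
```

The last frame (2.0) is ignored because it falls outside the acceptance window.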

FIG. 3 is a block diagram of a second embodiment of the present invention. Reference numerals shared with FIG. 1 denote the same components.

In FIG. 3, reference numeral 11 denotes an averaging circuit. It computes the time average of the power values output by the power detection circuit 5 over the fixed prompt sound output period.

The description of the embodiment of FIG. 1 assumed stationary ambient noise, with almost no change in power during the prompt sound output, as the timing chart of FIG. 2 shows. In reality, the ambient noise level fluctuates somewhat. The threshold generating circuit 6 may therefore set a wrong threshold if it is controlled only at time c, using the power output of the power detection circuit 5 at that single instant.

The averaging circuit 11 operates under control of the control circuit 9. It receives the power values output by the power detection circuit 5 during the prompt sound output period and outputs their time average. The threshold generating circuit 6 then generates and holds a new threshold proportional to this time average.
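A short sketch shows the difference between latching a single sample at time c and using the averaging circuit 11. The power values and the factor `k` are hypothetical. A momentary dip in noise is placed at the latch instant.

```python
from statistics import fmean

def threshold_at_c(prompt_powers, k):
    """First embodiment: use only the power sampled at time c (here, the last frame)."""
    return k * prompt_powers[-1]

def threshold_averaged(prompt_powers, k):
    """Second embodiment: averaging circuit 11 feeds the mean power to circuit 6."""
    return k * fmean(prompt_powers)

prompt_powers = [0.10, 0.12, 0.09, 0.11, 0.02]  # noise dips just as c arrives
k = 3.0                                          # hypothetical constant
noise_only_frame = 0.12                          # ordinary noise after the prompt
print(noise_only_frame > threshold_at_c(prompt_powers, k))      # → True (false trigger)
print(noise_only_frame > threshold_averaged(prompt_powers, k))  # → False
```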

As a result, this embodiment detects speech sections accurately even when the ambient noise level fluctuates somewhat. The other operations are the same as in FIG. 1, so their description is omitted.

FIG. 4 is a block diagram of a third embodiment of the present invention. Reference numerals shared with FIG. 1 denote the same parts.

In FIG. 4, reference numeral 12 denotes an echo cancelling circuit, made up of the following parts:

- 121: AD (analog-to-digital) converter
- 122: DA (digital-to-analog) converter
- 123: digital filter
- 124: adder/subtractor
- 125: tap coefficient setting circuit

Reference numeral 104 denotes a speech synthesis circuit.

The embodiments shown in FIGS. 1 and 3 output a sine wave as the prompt sound. This embodiment instead outputs synthesized speech produced by the speech synthesis circuit 104. A human voice makes a more natural and pleasant impression on the operator than a sine wave.

The speech synthesis circuit 104 synthesizes speech electronically from previously prepared feature parameter data of a human voice. The synthesized speech is an instruction phrased the way one would ask a person for something, such as "Please ...". This lets the operator input speech to the voice recognition device conversationally.

As is well known, speech consists of many frequency components. When speech is used as the prompt sound, the prompt entering the microphone 2 therefore cannot be removed by a band elimination filter such as the BEF 3, as was done in the embodiments of FIGS. 1 and 3.

Speech emitted from the speaker 103 reaches the microphone 2 after reflecting off the walls of the room and other surfaces. The speaker output and the microphone input therefore differ in phase at each frequency. A simple phase shifter and adder/subtractor cannot cancel the prompt speech entering the microphone 2. In other words, the ambient noise level at the microphone 2 cannot be measured while the prompt speech is playing.

The echo cancelling circuit 12 estimates the transfer function of the echo path, here the path from the speaker 103 to the microphone 2. It then removes from the microphone 2 signal only the component that was emitted by the speaker 103. The details of echo cancellation are described in, for example, "Digital Signal Processing", edited by the Institute of Electronics and Communication Engineers of Japan, so only a brief explanation is given here.

The echo cancelling circuit 12 converts analog signals to digital signals and performs all internal signal processing digitally. It works as follows:

- Reference signal: the prompt speech output from the speaker 103 passes through the AD converter 121. It is fed, as the reference signal, to the digital filter 123 and the tap coefficient setting circuit 125.
- Replica: the digital filter 123 is, for example, an FIR filter consisting of a delay line with many tap outputs. It produces a replica of the prompt speech from the reference signal and outputs it to the adder/subtractor 124.
- Subtraction: the microphone 2 signal passes through the AD converter 121 to the adder/subtractor 124. There, the replica is subtracted from the prompt speech echo picked up by the microphone 2.
- Output: the prompt speech component is cancelled and absent from the output of the adder/subtractor 124. The remaining components (ambient noise) are output via the DA converter 122.

The tap coefficient setting circuit 125 sets the weighting coefficients of the tap outputs of the digital filter 123. It derives them from the correlation between the reference signal and the residual signal left after echo cancellation (the output of the adder/subtractor 124). Well-known algorithms such as the LMS method or the learning identification method are used for this. Briefly, the tap weights are updated iteratively, a little at a time, so as to minimize the residual signal. When the residual reaches its minimum, the weights set by circuit 125 form the best estimate of the impulse response. This impulse response is the time-domain representation of the transfer function of the echo path from the speaker 103 to the microphone 2.
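As a rough illustration of the echo canceller, the sketch below implements normalized LMS (NLMS), the algorithm commonly equated with the learning identification method. The echo path `h`, the tap count, the step size `mu`, and the white-noise stand-in for prompt speech are all assumptions. Real speech is coloured, which slows LMS-type adaptation. That fits the roughly 100 ms settling time the text gives for the echo cancelling circuit 12.

```python
import random

def nlms_echo_canceller(reference, mic, taps=16, mu=0.5, eps=1e-6):
    """Sketch of echo cancelling circuit 12 using NLMS.

    reference: prompt signal fed to speaker 103 (after A/D)
    mic:       microphone 2 signal (after A/D)
    Returns the residual (output of adder/subtractor 124) and the final tap weights.
    """
    w = [0.0] * taps    # tap weights set by circuit 125
    buf = [0.0] * taps  # delay line of digital filter 123; buf[k] holds x[n-k]
    residual = []
    for x, d in zip(reference, mic):
        buf = [x] + buf[:-1]
        replica = sum(wk * bk for wk, bk in zip(w, buf))  # echo replica
        e = d - replica                                    # subtract the replica
        step = mu * e / (eps + sum(bk * bk for bk in buf)) # normalized step size
        w = [wk + step * bk for wk, bk in zip(w, buf)]     # tap update
        residual.append(e)
    return residual, w

rng = random.Random(0)
h = [0.0, 0.5, 0.0, -0.3, 0.1]  # hypothetical speaker-to-microphone echo path
prompt = [rng.gauss(0.0, 1.0) for _ in range(4000)]  # white-noise stand-in for prompt speech
echo = [sum(hk * prompt[n - k] for k, hk in enumerate(h) if n >= k)
        for n in range(len(prompt))]
noise = [0.01 * rng.gauss(0.0, 1.0) for _ in prompt]  # weak ambient noise
mic = [ec + v for ec, v in zip(echo, noise)]
residual, w = nlms_echo_canceller(prompt, mic)
print([round(wk, 2) for wk in w[:5]])  # should approach h once adaptation has converged
```

After convergence the residual contains essentially only the ambient noise, which is what the power detection circuit 5 should measure.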

While the prompt speech is playing, the control circuit 9 has the echo cancelling circuit 12 cancel the prompt speech echo picked up by the microphone 2.

Because of this cancellation, the power detection circuit 5 accurately outputs the power of the ambient noise at the microphone 2 even while the prompt speech is playing. The prompt speech component does not disturb it. The other operations are the same as in the embodiment described with reference to FIG. 1, so their description is omitted.

When the echo cancelling circuit 12 is not cancelling, the digital filter 123 produces no output, because no prompt sound is playing and the filter has no input. The microphone 2 signal is then simply AD-converted, DA-converted, and output to the power detection circuit 5.

The echo cancelling circuit 12 needs some time (around 100 ms) to estimate the echo path and cancel the echo. The threshold generating circuit 6 should therefore generate and hold the new threshold (time point c in FIG. 3) only after the cancellation has converged, for example 100 ms or more after the prompt sound output begins.

As in the embodiment of FIG. 3, the averaging circuit 11 may also be inserted before the threshold generating circuit 6. In that case, the averaging period preferably runs from the completion of cancellation to the end of the prompt speech output.

Although the prompt sound in this embodiment is synthesized speech, the echo cancelling circuit 12 clearly also performs cancellation when the prompt sound is a sine wave, as in the embodiments of FIGS. 1 and 3.

Thus, according to this embodiment, the ambient noise level can be measured while the prompt sound is playing, that is, just before the operator speaks, even when the prompt sound is speech. This measurement sets the threshold on one input of the comparison circuit 7, and the operator's speech is detected against that threshold. Speech sections can therefore be detected reliably even when the ambient noise level fluctuates. This prevents a reduced recognition rate and malfunctions in the voice recognition device.

FIG. 5 is a block diagram of a fourth embodiment of the present invention. Reference numerals shared with FIG. 4 denote the same components.

In FIG. 5, reference numeral 13 denotes a noise microphone. It is placed away from the microphone 2 so as to collect, as far as possible, only ambient noise. Reference numeral 14 denotes a changeover switch for the reference signal input of the echo cancelling circuit 12. The switch supplies the echo cancelling circuit 12 with either the prompt sound fed to the speaker 103 or the output of the noise microphone 13.

While the spoken notification is being output, the control circuit 9 sets the changeover switch 14 so that the spoken notification is fed to the echo canceling circuit 12 as the reference signal, as in the embodiment of FIG. 4. The spoken notification picked up by the microphone 2 is thereby canceled, so the power detection circuit 5 can detect the ambient noise power even during the notification. At all other times, the control circuit 9 sets the changeover switch 14 so that the ambient noise signal collected by the noise microphone 13 is fed to the echo canceling circuit 12 as the reference signal.
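A sample-by-sample model makes the control rule of the changeover switch 14 explicit. Here `tone_active` is a hypothetical per-sample flag that stands in for the control signal from the control circuit 9. The function routes either the notification sound signal or the noise-microphone signal to the canceller's reference input. The patent does not say whether the filter coefficients are kept or reset when the reference is switched, and this sketch does not model that.

```python
def select_reference(tone_active, tone_signal, noise_mic_signal):
    """Model changeover switch 14: pick the canceller reference per sample.

    While the notification sound plays, the reference is the notification
    signal (echo cancelling); otherwise it is the noise microphone output
    (noise cancelling). tone_active is a hypothetical control flag.
    """
    if not len(tone_active) == len(tone_signal) == len(noise_mic_signal):
        raise ValueError("all inputs must have the same length")
    return [
        t if active else n
        for active, t, n in zip(tone_active, tone_signal, noise_mic_signal)
    ]


print(select_reference([True, True, False], [0.5, -0.5, 0.0], [0.1, 0.2, 0.3]))
# [0.5, -0.5, 0.3]
```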

The following describes the operation when the output of the noise microphone 13 is selected as the reference signal input of the echo canceling circuit 12. In this state, the echo canceling circuit operates as a noise canceling circuit.

As described earlier, the digital filter 123 generates a replica from the ambient noise picked up by the noise microphone 13. It then subtracts the replica from the ambient noise picked up by the microphone 2, canceling it. Suppose the noise source is far enough from both the noise microphone 13 and the microphone 2 that both microphones receive the same ambient noise. Then none of the ambient noise picked up by the microphone 2 appears in the signal output from the DA converter 122. In practice, however, the two noise signals differ somewhat, so part of the ambient noise at the microphone 2 remains in the output.

Furthermore, sound waves cannot be spatially blocked, so it is impossible to feed only ambient noise to the noise microphone 13. However far apart the two microphones are placed, some of the operator's speech inevitably leaks into the noise microphone 13. This leakage disturbs the ambient-noise reference signal and reduces the canceling performance.
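The replica-and-subtract operation of the digital filter 123 is a classic adaptive noise canceller. The patent describes a tapped delay line whose weights are set by a tap coefficient setting circuit, but this section does not state the update rule. The sketch below therefore assumes the normalized LMS (NLMS) algorithm, a common choice, with hypothetical parameters `num_taps` and `mu`. It models the idealized case from the text, where both microphones receive the same noise, so the residual falls toward zero. Leakage of the operator's speech into the reference, as described above, is not modeled.

```python
import random


def nlms_cancel(reference, primary, num_taps=8, mu=0.5, eps=1e-8):
    """Adaptive FIR canceller: returns primary minus an NLMS replica.

    reference: signal fed to the tapped delay line (noise microphone 13).
    primary:   signal to be cleaned (microphone 2).
    The NLMS update rule is an assumption, not taken from the patent.
    """
    weights = [0.0] * num_taps
    delay_line = [0.0] * num_taps
    output = []
    for x, d in zip(reference, primary):
        delay_line = [x] + delay_line[:-1]       # shift in newest sample
        replica = sum(w * u for w, u in zip(weights, delay_line))
        e = d - replica                          # adder/subtractor output
        norm = eps + sum(u * u for u in delay_line)
        weights = [w + mu * e * u / norm for w, u in zip(weights, delay_line)]
        output.append(e)
    return output


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]
# Hypothetical acoustic path to microphone 2: attenuation plus 2-sample delay.
mic2 = [0.0, 0.0] + [0.6 * v for v in noise[:-2]]
residual = nlms_cancel(noise, mic2)
tail = residual[-500:]
print(sum(e * e for e in tail) / len(tail))  # close to zero once adapted
```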

Despite these imperfections, the ambient noise mixed into the operator's speech uttered after the spoken notification can still be suppressed.

As described above, this embodiment detects the operator's speech reliably, as the embodiment of FIG. 4 does. Moreover, the detected speech contains less ambient noise, which prevents degradation of the recognition rate and malfunction of the speech recognition device.

FIG. 6 is a block diagram explaining a fifth embodiment of the present invention. The same reference numerals as in FIG. 4 denote the same parts, 15 is an adjustment circuit, and 16 is a differential amplifier circuit.

The embodiments described so far cancel the notification sound picked up by the microphone 2 while the notification sound is being output. This allows the ambient noise level to be detected, and the detected value sets the threshold for voice section detection. When the notification sound is synthesized speech, however, this approach requires the complex echo canceling circuit 12, as in the embodiment of FIG. 4.

This embodiment achieves the same effect as the embodiment of FIG. 4 by a simpler method.

In FIG. 6, the spoken notification output from the speaker 103 and the version picked up by the microphone 2 differ greatly in phase and frequency characteristics, because of the reverberation characteristics of the room. Their waveforms therefore look very different. In terms of power, however, the two are similar, because energy is conserved. The power time series of the notification sound emitted from the speaker 103 and that of the notification sound picked up by the microphone 2 differ in absolute value but are similar in shape.

Based on this principle, this embodiment detects the ambient noise power picked up by the microphone 2 while the spoken notification is being output, and sets the threshold from it.

The power detection circuits 5 connected to the speaker 103 and to the microphone 2 each detect the respective power. The adjustment circuit 15 corrects (amplifies or attenuates) the power value it receives. The positional relationship between the speaker 103 and the microphone 2 is assumed to be fixed; preferably, both are mounted in the same housing beforehand.

For calibration, the notification sound is output with no ambient noise present. The adjustment circuit 15 is then set so that two outputs are equal:

* the output of the power detection circuit 5 connected to the microphone 2, and
* the output of the power detection circuit 5 connected to the speaker 103, after correction by the adjustment circuit 15.
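The calibration step amounts to choosing a gain that makes the scaled speaker-side power match the microphone-side power when there is no ambient noise. The sketch picks that gain by a least-squares fit over calibration frames. This is an assumed method: the patent describes adjusting the adjustment circuit 15 but gives no formula. The sample data are hypothetical.

```python
def calibrate_gain(speaker_power, mic_power):
    """Least-squares gain k minimizing sum((mic - k * speaker)**2).

    Both inputs are per-frame powers recorded while only the notification
    sound is playing (no ambient noise), as in the calibration described.
    The least-squares choice of k is an assumption.
    """
    num = sum(s * m for s, m in zip(speaker_power, mic_power))
    den = sum(s * s for s in speaker_power)
    if den == 0:
        raise ValueError("speaker power is all zero; cannot calibrate")
    return num / den


# Hypothetical data: the room attenuates notification power by a factor of 0.25.
spk = [1.0, 4.0, 2.0]
mic = [0.25, 1.0, 0.5]
print(calibrate_gain(spk, mic))  # 0.25
```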

The outputs of the adjustment circuit 15 and of the power detection circuit 5 connected to the microphone 2 are both supplied to the differential amplifier circuit 16. The differential amplifier circuit 16 supplies the difference between them to the threshold generating circuit 6. Because of the calibration above, the output of the differential amplifier circuit 16 is zero when there is no ambient noise and only the notification sound is output and picked up by the microphone 2.

Suppose ambient noise reaches the microphone 2 after this calibration. In the output of the differential amplifier circuit 16, the notification sound power cancels out even while the notification sound is being output, leaving only the ambient noise power. Based on this ambient noise power, the control circuit 9 controls the threshold generating circuit 6 to generate and hold a new threshold proportional to that power. The remaining operation is the same as in the embodiment of FIG. 1 and is not described again.
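Why the difference isolates the noise: if the notification sound and the ambient noise are uncorrelated, their powers add on average. The microphone power is then roughly k·P_speaker + P_noise, so subtracting the calibrated speaker power leaves approximately P_noise.

The sketch models this as frame powers. The threshold is a hypothetical multiple `ratio` of the mean difference over the notification frames, and it is held for the following utterance. Frame-based digital processing is itself an assumption, since the patent describes circuits.

```python
def differential_power(speaker_power, mic_power, gain):
    """Model of differential amplifier 16: mic power minus calibrated speaker power."""
    return [m - gain * s for s, m in zip(speaker_power, mic_power)]


def hold_threshold(diff_power, tone_frames, ratio=3.0):
    """Threshold proportional to the noise power estimated during the tone.

    tone_frames: indices of frames in which the notification sound played.
    ratio:       hypothetical proportionality constant.
    """
    noise = sum(diff_power[i] for i in tone_frames) / len(tone_frames)
    return ratio * max(noise, 0.0)  # clamp: the estimate can dip below zero


spk = [1.0, 4.0, 2.0, 0.0]   # notification sound power at the speaker
mic = [0.35, 1.1, 0.6, 0.1]  # 0.25 x notification power + noise power 0.1
diff = differential_power(spk, mic, gain=0.25)
print(hold_threshold(diff, tone_frames=[0, 1, 2]))  # about 0.3
```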

The adjustment circuit 15 may instead be placed between the power detection circuit 5 connected to the microphone 2 and the differential amplifier circuit 16.

In some cases it is preferable to give the adjustment circuit a time-delay function in addition to the power adjustment function. The delay accounts for the propagation delay of sound from the speaker to the microphone.
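One way to realize the suggested time-delay function is to estimate the speaker-to-microphone lag during calibration, then delay the speaker-side power sequence by that lag. The sketch estimates the lag by cross-correlating the two power sequences. Cross-correlation is an assumed technique, and `max_lag` is a hypothetical search range; the patent only says a delay function may be preferable. For brevity the correlation is unnormalized, which slightly favors small lags.

```python
def estimate_lag(speaker_power, mic_power, max_lag):
    """Lag (in frames) that maximizes sum(speaker[n] * mic[n + lag])."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(s * m for s, m in zip(speaker_power, mic_power[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay(sequence, lag):
    """Delay a sequence by `lag` frames, padding the start with zeros."""
    return [0.0] * lag + list(sequence[:len(sequence) - lag])


spk = [0.0, 5.0, 1.0, 3.0, 0.0, 0.0]
mic = [0.0, 0.0, 0.0, 5.0, 1.0, 3.0]      # same envelope, 2 frames later
print(estimate_lag(spk, mic, max_lag=3))  # 2
```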

Further, as in the embodiment of FIG. 3, the output of the differential amplifier circuit 16 may be supplied to an averaging circuit.

As described above, this embodiment provides an inexpensive and reliable voice detection output device with a simpler circuit configuration than the embodiment of FIG. 4.

[Effects of the Invention]

As explained above, the present invention measures the ambient noise power accurately while the notification sound is being output, that is, immediately before the operator speaks. This measurement sets the threshold for voice section detection. Voice sections can therefore be detected reliably even when the ambient noise level fluctuates, which prevents degradation of the recognition rate and malfunction of the speech recognition device. In a voice storage device, file cost can also be reduced.

[Brief description of the drawings]

FIG. 1 is a block diagram explaining a first embodiment of the voice detection output device according to the present invention. FIG. 2 is a timing chart explaining the operation of the configuration of FIG. 1. FIG. 3 is a block diagram explaining a second embodiment of the present invention. FIG. 4 is a block diagram explaining a third embodiment of the present invention. FIG. 5 is a block diagram explaining a fourth embodiment of the present invention. FIG. 6 is a block diagram explaining a fifth embodiment of the present invention. FIG. 7 is an explanatory diagram of the principle of detecting a voice section from an input signal.

1: notification sound output circuit; 2: microphone; 3: BEF (band elimination filter); 4: switch; 5: power detection circuit; 6: threshold generating circuit; 7: comparison circuit; 8: on/off switch; 11: averaging circuit; 12: echo canceling circuit; 13: noise microphone; 14: changeover switch; 15: adjustment circuit; 16: differential amplifier circuit.

Claims

(57) [Claims]

1. A voice detection output device for detecting and outputting a voice signal from an input signal from a microphone, comprising: notification sound output means for outputting a notification sound into the space in which the microphone is installed, in order to prompt voice input; notification sound canceling means for removing the notification sound component from the output signal of the microphone; power detecting means for detecting the power of the output signal of the notification sound canceling means while the notification sound component is being output from the microphone, and for detecting the power of the output signal of the microphone during other periods; threshold generating means for generating and holding a threshold corresponding to the power of the output signal of the notification sound canceling means detected by the power detecting means; and comparison means for comparing the threshold held by the threshold generating means with the power detected by the power detecting means; wherein the threshold generating means generates and holds the threshold within the period of the notification sound component in the output signal of the microphone, and the comparison means detects the voice section following the notification sound component of the input signal from the microphone, thereby detecting and outputting a voice signal.

2. The voice detection output device according to claim 1, wherein the notification sound is a sine wave, and the notification sound canceling means is a band elimination filter that removes the sine wave component.

3. The voice detection output device according to claim 1, wherein the notification sound canceling means comprises: a digital filter connected to the notification sound output means and comprising a delay line having tap outputs; an adder/subtractor connected to the microphone; and a tap coefficient setting circuit for setting the weighting coefficients of the delay line taps.

4. The voice detection output device according to claim 3, further comprising a second microphone for collecting ambient noise, wherein the second microphone is connected to the digital filter during periods other than the period in which the notification sound is output from the notification sound output means.

5. The voice detection output device according to claim 1, further comprising averaging means for time-averaging the output of the power detecting means.

6. A voice detection output device for detecting and outputting a voice signal from an input signal from a microphone, comprising: notification sound output means for outputting a notification sound into the space in which the microphone is installed, in order to prompt voice input; first power detecting means for detecting the power of the output signal of the notification sound output means; second power detecting means for detecting the power of the output signal of the microphone; differential amplifying means for outputting the difference between the outputs of the first and second power detecting means; threshold generating means for generating and holding a threshold corresponding to the output of the differential amplifying means during the period in which the notification sound component is output from the microphone; and comparison means for comparing the output of the differential amplifying means with the threshold; wherein the threshold generating means generates and holds the threshold within the period of the notification sound component in the output signal of the microphone, and the comparison means detects the voice section following the notification sound component of the input signal from the microphone, thereby detecting and outputting a voice signal.

7. The voice detection output device according to claim 6, further comprising averaging means for time-averaging the output of the differential amplifying means.

8. A speech recognition device comprising, as voice input means, the voice detection output device according to claim 1.

9. A voice storage device comprising the voice detection output device according to claim 1 as voice input means.