JP2016178584A

JP2016178584A - Audio processing device, intercom device and intercom system

Info

Publication number: JP2016178584A
Application number: JP2015059077A
Authority: JP
Inventors: 哲平鷲; Teppei Washi; 真行前嶋; Masayuki Maejima; 秀範瀧; Hidenori Taki; 実福島; Minoru Fukushima; 克彦木村; Katsuhiko Kimura; 光治池田; Mitsuharu Ikeda
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2015-03-23
Filing date: 2015-03-23
Publication date: 2016-10-06
Anticipated expiration: 2035-03-23
Also published as: JP6485739B2

Abstract

PROBLEM TO BE SOLVED: To realize a comfortable call regardless of the environment at the installation position. SOLUTION: A voice switch unit 142 can select a transmission mode that gives priority to a signal input from a microphone 11a over a signal output to a speaker 12a, and a voice signal detection unit 141 detects a voice section containing a voice signal from the signal input from the microphone 11a. The voice switch unit 142 selects the transmission mode while the voice signal detection unit 141 is detecting the voice section. The voice signal detection unit 141 adjusts the sensitivity with which it determines the voice section according to the reverberation time. SELECTED DRAWING: Figure 2

Description

The present invention relates to an audio processing device, an intercom device, and an intercom system that process a signal input from a microphone.

In apartment buildings and other multi-unit housing, an intercom is often installed in the lobby. A visitor announces their arrival to the resident of a dwelling unit from the intercom installed in the lobby, and can enter once the resident unlocks the door. Because a lobby is a large space, the effect of reverberation is large.

An intercom sometimes uses a voice switch function that compares the transmitted and received voice and gives priority to the louder one (see, for example, Patent Document 1).

Patent Document 1: JP 2012-323685 A

In a place where reverberation has a large effect, such as a lobby, it becomes difficult to distinguish voice from noise. As a result, a visitor's voice may be misjudged as noise while the visitor is still speaking. When that happens, the device switches from the mode that prioritizes transmission to the intermediate mode, which can hinder a comfortable call.

The present invention has been made in view of these circumstances, and its object is to provide an audio processing device, an intercom device, and an intercom system that realize a comfortable call regardless of the environment at the installation position.

To solve the above problem, an audio processing device according to one aspect of the present invention includes a voice switch unit capable of selecting a transmission mode in which a signal input from a microphone is given priority over a signal output to a speaker, and a voice signal detection unit that detects, from the signal input from the microphone, a voice section containing a voice signal. The voice switch unit selects the transmission mode while the voice signal detection unit is detecting the voice section, and the voice signal detection unit adjusts the sensitivity with which it determines the voice section according to the reverberation time.

Another aspect of the present invention is an intercom device. This device includes the audio processing device described above, a microphone that converts collected sound into an electric signal and inputs it to the audio processing device, and a speaker that converts an electric signal output from the audio processing device into sound.

Yet another aspect of the present invention is an intercom system. This intercom system includes an intercom indoor device and an intercom outdoor device that communicates with the intercom indoor device, the intercom device described above being used as the intercom outdoor device.

Any combination of the above constituent elements, and any conversion of the expression of the present invention between a method, a device, a system, and the like, are also valid as aspects of the present invention.

According to the present invention, a comfortable call can be realized regardless of the environment at the installation position.

FIG. 1 is a diagram showing the configuration of an intercom system according to an embodiment of the present invention.

FIG. 2 is a diagram showing the configuration of the lobby intercom device.

FIG. 3 is a diagram showing an example of a signal waveform collected by the microphone.

FIG. 4 is a diagram showing an example of moving average waveforms of a signal collected by the microphone.

FIG. 5 is a diagram showing another example of moving average waveforms of a signal collected by the microphone.

FIGS. 6(a)-(c) schematically show the level transition of an uttered voice signal, the level transition of the microphone input signal when the effect of reverberation is small, and the level transition of the microphone input signal when the effect of reverberation is large.

FIGS. 7(a)-(c) schematically show the level transition of the speaker output signal, the level transition of the microphone input signal when the effect of reverberation is small, and the level transition of the microphone input signal when the effect of reverberation is large.

FIG. 1 is a diagram showing the configuration of an intercom system 1 according to an embodiment of the present invention. This embodiment assumes that the intercom system 1 is installed in an apartment building. The intercom system 1 includes a lobby intercom device 10, a control device 20, an intercom indoor device 30, and a door intercom device 40.

The lobby intercom device 10 is an intercom outdoor device installed in the lobby. A visitor enters a room number to call the intercom indoor device 30 installed in the dwelling unit being visited. The control device 20 is installed in a common space such as a machine room or a building manager's room, and relays signals between the lobby intercom device 10 and the intercom indoor device 30. The door intercom device 40 is installed at the entrance of the dwelling unit. The intercom indoor device 30 and the door intercom device 40 are related as master unit and slave unit.

Although FIG. 1 shows only one dwelling unit for simplicity, there are in practice many dwelling units, and the control device 20 relays between a plurality of intercom indoor devices 30 and the lobby intercom device 10. The control device 20 is also connected to a distribution board and various sensors, and transmits an alert signal to the intercom indoor device 30 of each dwelling unit when an abnormality such as a fire occurs.

FIG. 2 is a diagram showing the configuration of the lobby intercom device 10. The lobby intercom device 10 includes a microphone 11a, a microphone amplifier 11b, an A/D converter 11c, a speaker 12a, a speaker amplifier 12b, a D/A converter 12c, a processing unit 13, a communication unit 17, and an operation unit 18. The processing unit 13 includes an audio processing unit 14, a reverberation time setting unit 15, and a reverberation time estimation unit 16. The audio processing unit 14 includes a voice signal detection unit 141 and a voice switch unit 142.

The functional blocks of the processing unit 13 in FIG. 2 show only those related to the processing of interest in this embodiment. The functions of the processing unit 13 can be realized by cooperation of hardware resources and software resources, or by hardware resources alone. Processors, ROM, RAM, and other LSIs can be used as hardware resources. Programs such as firmware and applications can be used as software resources.

The microphone 11a collects the visitor's voice, converts it into an electric signal, and outputs it to the microphone amplifier 11b. The microphone amplifier 11b amplifies the signal input from the microphone 11a and outputs it to the A/D converter 11c. The A/D converter 11c converts the analog signal input from the microphone amplifier 11b into a digital signal and outputs it to the audio processing unit 14.

The communication unit 17 is a communication interface for exchanging, in both directions with the intercom indoor device 30 via the control device 20, the electric signals (hereinafter, sound signals) generated by converting sound collected by the microphone 11a of the lobby intercom device 10 or by the intercom indoor device 30. The audio processing unit 14 processes the sound signal received from the intercom indoor device 30 via the communication unit 17 and outputs it to the D/A converter 12c. If the sound signal is compression-encoded, it is decompressed and decoded. The decompression and decoding may instead be performed by a decoding unit (not shown) in the stage preceding the audio processing unit 14.

The D/A converter 12c converts the digital signal input from the audio processing unit 14 into an analog signal and outputs it to the speaker amplifier 12b. The speaker amplifier 12b amplifies the signal input from the D/A converter 12c and outputs it to the speaker 12a. The speaker 12a converts the electric signal input from the speaker amplifier 12b into physical vibration and outputs it.

The operation unit 18 is a user interface for accepting user operations. The operation unit 18 includes at least numeric keypad buttons for entering a room number and a call button.

The voice switch unit 142 can select a transmission mode, in which the signal input from the microphone 11a is given priority over the signal output to the speaker 12a, and a reception mode, in which the signal output to the speaker 12a is given priority over the signal input from the microphone 11a. In the transmission mode, the sound signal input via the communication unit 17 is attenuated. The design may attenuate it by 100% so that no sound is output from the speaker 12a, or attenuate it by less than 100% so that the sound level output from the speaker 12a is lowered. Raising the gain of the microphone amplifier 11b may be used in combination.

In the reception mode, on the other hand, the signal input from the microphone 11a is attenuated. The design may attenuate it by 100% so that no significant signal is sent to the intercom indoor device 30, or lower the signal level before sending it to the intercom indoor device 30. Raising the gain of the speaker amplifier 12b may be used in combination. The voice switch unit 142 can also select an intermediate mode in which the signal input from the microphone 11a and the signal output to the speaker 12a are attenuated by the same amount.
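The three modes can be pictured as a pair of gains applied to the microphone path and the speaker path. The following is a minimal sketch under stated assumptions: the names `Mode` and `apply_voice_switch` and the default gain values are hypothetical and do not come from the publication, and a gain of 0.0 corresponds to the 100% attenuation case.

```python
from enum import Enum


class Mode(Enum):
    TRANSMIT = "transmit"          # microphone input has priority
    RECEIVE = "receive"            # speaker output has priority
    INTERMEDIATE = "intermediate"  # both paths attenuated equally


def apply_voice_switch(mode, mic_sample, spk_sample,
                       attenuation=0.1, mid_attenuation=0.5):
    """Return (mic_out, spk_out) after mode-dependent attenuation.

    `attenuation` is the gain applied to the non-priority path and
    `mid_attenuation` the gain applied to both paths in the intermediate
    mode. Both values are illustrative assumptions.
    """
    if mode is Mode.TRANSMIT:
        return mic_sample, spk_sample * attenuation
    if mode is Mode.RECEIVE:
        return mic_sample * attenuation, spk_sample
    return mic_sample * mid_attenuation, spk_sample * mid_attenuation
```

Passing `attenuation=0.0` reproduces the design in which the non-priority path is muted entirely.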

FIG. 2 assumes an example in which the signal attenuation is realized by numerical computation in digital signal processing, but an attenuator built from analog elements may be used instead. In that case, the A/D converter 11c and the D/A converter 12c are relocated to between the voice switch unit 142 and the communication unit 17.

The voice signal detection unit 141 detects voice sections in the signal input from the microphone 11a. In the following description, "voice" refers to sound uttered by a person. The voice signal detection unit 141 determines a section that contains sound uttered by a person to be a voice section, and a section that contains only environmental sound, with no sound uttered by a person, to be a non-voice section. The voice switch unit 142 selects the transmission mode while the voice signal detection unit 141 is detecting a voice section, and selects the reception mode or the intermediate mode while a non-voice section is detected.

Specifically, the voice signal detection unit 141 calculates a long-term moving average N and a short-term moving average S of the absolute value of the signal input from the microphone 11a. It then calculates the ratio S/N of the short-term moving average S to the long-term moving average N, and determines a voice section if the ratio S/N is greater than or equal to a set ratio α and a non-voice section if it is less than α. In other words, when the short-term moving average S exceeds the long-term moving average N by the set ratio α or more, the unit determines that a voice section has begun. Conversely, when the short-term moving average S falls below α times the long-term moving average N, the unit determines that the signal has returned to a non-voice section.
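The ratio test described above can be sketched in a few lines. This is an illustrative sketch only: the class and function names, the window lengths (in samples), and the value of `alpha` are assumptions, not values taken from the publication.

```python
from collections import deque


class MovingAverage:
    """Running mean over the most recent `window` values."""

    def __init__(self, window):
        self.window = window
        self.buf = deque(maxlen=window)
        self.total = 0.0

    def update(self, x):
        # When the buffer is full, the oldest value is about to be dropped.
        if len(self.buf) == self.window:
            self.total -= self.buf[0]
        self.buf.append(x)
        self.total += x
        return self.total / len(self.buf)


def detect_voice(samples, short_window=4, long_window=32, alpha=2.0):
    """Return one flag per sample: True where S/N >= alpha (voice section)."""
    short_avg = MovingAverage(short_window)
    long_avg = MovingAverage(long_window)
    flags = []
    for x in samples:
        s = short_avg.update(abs(x))  # short-term moving average S
        n = long_avg.update(abs(x))   # long-term moving average N
        flags.append(n > 0 and s / n >= alpha)
    return flags
```

For example, `detect_voice([0.1] * 32 + [1.0] * 4)` stays False over the quiet samples and turns True as soon as the louder samples arrive, because S reacts to the change faster than N.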

Normally the amplitude of a voice signal is larger than that of a noise signal, so a section containing a voice signal has a larger amplitude. Because the amplitude of both voice and noise fluctuates rapidly, a method that simply declares a voice section whenever the signal level is high would switch frequently between voice and non-voice determinations. Average values over a fixed period are therefore used to give the signal data hysteresis. When the state changes from no voice to voice, the short-term moving average S rises before the long-term moving average N. This is because the long-term moving average N is averaged over more past data and therefore changes more slowly than the short-term moving average S.

The voice signal detection unit 141 adjusts the sensitivity with which it determines a voice section according to the reverberation time. Specifically, the longer the reverberation time, the higher the sensitivity. Reverberation is the phenomenon in which sound continues to be heard after the sound source has stopped vibrating. The reverberation time is generally defined as the time from when the sound source stops vibrating until the sound has decayed by 60 decibels. The reverberation time tends to be longer the larger the space in which the sound is produced, so it is long in a large lobby. As the reverberation time becomes longer, the effect of the reverberation that is collected along with the voice becomes larger.

FIG. 3 is a diagram showing an example of a signal waveform collected by the microphone 11a. FIG. 3 schematically shows the signal waveform obtained when the same utterance ("hello") is collected in an environment where the effect of reverberation is small and the waveform obtained in an environment where the effect of reverberation is large. The waveform collected under strong reverberation shows smaller amplitude fluctuation. Smaller amplitude fluctuation tends to reduce the gap between the short-term moving average S and the long-term moving average N, so voice is more easily misdetected as noise.

When the reverberation time is long, the voice signal detection unit 141 therefore raises the sensitivity so that a voice section is determined more readily than usual. Three methods of raising the detection sensitivity are described below. In the first method, the voice signal detection unit 141 lengthens the target period over which the long-term moving average N is calculated as the reverberation time becomes longer.

FIG. 4 is a diagram showing an example of moving average waveforms of a signal collected by the microphone 11a. In the section where the voice ("hello") is present, the short-term moving average waveform is larger than the long-term moving average waveform. Because the long-term moving average includes much data from the period before the voice appeared, it is relatively smaller than the short-term moving average. Comparing a long-term moving average waveform with a relatively long target period against one with a relatively short target period, the former includes more data from the period before the voice appeared, so its level is lower than the latter's. Using the long-term moving average with the relatively long target period therefore gives a larger difference from the short-term moving average, and a voice section is determined more readily. A longer reverberation time acts to reduce the gap between the long-term and short-term moving averages, but extending the target period of the long-term moving average can offset or mitigate that effect.
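A small numeric illustration of the first method follows. The signal values and window lengths are made up for the example and are not from the publication; the point is only that a longer target period for N lowers N at the onset of speech and so raises the ratio S/N.

```python
def trailing_mean(values, window):
    """Mean of the last `window` values (fewer if there is less history)."""
    tail = values[-window:]
    return sum(tail) / len(tail)


# Hypothetical rectified signal: 40 samples of background noise, then 4 of speech.
signal = [0.1] * 40 + [0.4] * 4

s = trailing_mean(signal, 4)          # short-term moving average S
n_short = trailing_mean(signal, 16)   # long-term average N, shorter target period
n_long = trailing_mean(signal, 32)    # long-term average N, longer target period

ratio_short = s / n_short  # about 2.29
ratio_long = s / n_long    # about 2.91
```

With a set ratio of, say, 2.5 (an assumed value), only the detector using the longer target period would report a voice section at this instant.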

In the second method of raising the detection sensitivity, the voice signal detection unit 141 shortens the target period over which the short-term moving average S is calculated as the reverberation time becomes longer.

FIG. 5 is a diagram showing another example of moving average waveforms of a signal collected by the microphone 11a. Comparing, in FIG. 5, a short-term moving average waveform with a relatively short target period against one with a relatively long target period, the former shows larger amplitude fluctuation. Using the short-term moving average with the relatively short target period therefore tends to give a larger difference from the long-term moving average, and a voice section is determined more readily. A longer reverberation time acts to reduce the gap between the long-term and short-term moving averages, but shortening the target period of the short-term moving average can offset or mitigate that effect.

In the third method of raising the detection sensitivity, the voice signal detection unit 141 makes the set ratio α smaller as the reverberation time becomes longer. A smaller set ratio α makes it easier for the ratio S/N of the short-term moving average S to the long-term moving average N to reach α, so a voice section is determined more readily.

In the above description, the target period (that is, the number of data samples) of the short-term moving average S, the target period of the long-term moving average N, and the set ratio α can each be set to values derived from experiments or simulations. The relationship between a change in the reverberation time and the corresponding change in the target period of the long-term moving average N can likewise be derived from experiments or simulations. The relationship may be defined by a function, or a table describing it may be used. The same applies when the target period of the long-term moving average N or the set ratio α is changed.
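As one way to picture a relationship "defined by a function", the sketch below maps a reverberation time to detector parameters by clamped linear interpolation. Everything numeric here is an assumption for illustration: the publication says only that such values would be derived from experiments or simulations. The sketch also adjusts all three parameters at once for compactness, whereas the text presents them as three separate methods.

```python
def detector_params(rt60_s, rt60_min=0.3, rt60_max=2.0):
    """Map a reverberation time in seconds to (short_window, long_window, alpha).

    Window lengths are in samples. All constants are hypothetical.
    """
    # Clamp, then normalise to 0.0 (little reverberation) .. 1.0 (strong).
    clamped = min(max(rt60_s, rt60_min), rt60_max)
    t = (clamped - rt60_min) / (rt60_max - rt60_min)

    short_window = round(64 - 32 * t)     # method 2: shorter period for S
    long_window = round(1024 + 1024 * t)  # method 1: longer period for N
    alpha = 2.0 - 0.5 * t                 # method 3: smaller set ratio
    return short_window, long_window, alpha
```

A lookup table keyed by reverberation-time range would serve the same purpose and may be easier to tune from measured data.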

The relationship between the reverberation time and the target period of the long-term moving average N may also be defined by modes. For example, three modes, "short", "normal", and "long", are provided for the reverberation time, and three modes, "short", "normal", and "long", are likewise provided for the target period of the long-term moving average N. When the reverberation time mode is set to "long", the voice signal detection unit 141 calculates the long-term moving average N using the number of samples that corresponds to the "long" target period.

Returning to FIG. 2, the reverberation time setting unit 15 sets the reverberation time in the voice signal detection unit 141. The reverberation time setting unit 15 sets the reverberation time value entered by the user through the operation unit 18 in the voice signal detection unit 141. For example, when the lobby intercom device 10 is installed, the installer may measure the reverberation time of the lobby with a sound level meter and enter the measured value through the operation unit 18.

A value entered by the user can thus be used as the reverberation time, but an estimated value can also be used. Using an estimate also allows changes in environmental conditions, such as a change of wall material, temperature, or humidity, to be reflected in the reverberation time promptly. The reverberation time estimation unit 16 analyzes the signal input from the microphone 11a and estimates the reverberation time. For example, the reverberation time estimation unit 16 can estimate the reverberation time by detecting the slope of the signal input from the microphone 11a after the end of a voice section.
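The publication does not spell out how the slope is turned into a reverberation time. One common approach, shown here as an assumption rather than as the method of the publication, is to fit a straight line to the decaying level in decibels and extrapolate the time needed for a 60 dB drop. The function name and its inputs are hypothetical.

```python
import math


def rt60_from_decay(levels, sample_rate_hz):
    """Estimate RT60 in seconds from a decaying level envelope.

    `levels` must hold at least two strictly positive linear amplitudes
    taken after the sound has stopped. A least-squares line is fitted to
    the levels in dB and extrapolated to a 60 dB decay.
    """
    db = [20.0 * math.log10(v) for v in levels]
    n = len(db)
    times = [i / sample_rate_hz for i in range(n)]
    mean_t = sum(times) / n
    mean_db = sum(db) / n
    num = sum((t - mean_t) * (d - mean_db) for t, d in zip(times, db))
    den = sum((t - mean_t) ** 2 for t in times)
    slope = num / den  # dB per second; negative for a decaying signal
    if slope >= 0:
        return None  # no decay observed, so no estimate
    return -60.0 / slope
```

In practice the envelope would first be smoothed, and the fit restricted to the part of the decay that is above the background noise.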

FIGS. 6(a)-(c) schematically show the level transition of an uttered voice signal, the level transition of the microphone input signal when the effect of reverberation is small, and the level transition of the microphone input signal when the effect of reverberation is large. Comparing FIG. 6(b) with FIG. 6(c), the microphone input signal under strong reverberation shown in FIG. 6(c) has a gentler slope after the end of the utterance. The reverberation time estimation unit 16 estimates the reverberation time by, for example, measuring the time from the switch from a voice section to a non-voice section until the level of the microphone input signal reaches the average level of the non-voice section. This method, which uses the visitor's own speech, does not require sound to be output from the speaker 12a as described later, and does not require test sound source data to be prepared.
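The time measurement described above can be sketched as follows. The function name, the level envelope representation, and the sample rate parameter are assumptions made for illustration. Note that the resulting value is the time to reach the non-voice average level, which is what the text describes, and is not necessarily a 60 dB decay time.

```python
def estimate_decay_time(levels, end_index, noise_level, sample_rate_hz):
    """Seconds from `end_index` until the level first falls to `noise_level`.

    `levels` is a sequence of signal levels (for example a short-term
    moving average), `end_index` is where the voice section ended, and
    `noise_level` is the average level of the non-voice section.
    Returns None if the level never reaches `noise_level` in the buffer.
    """
    for i in range(end_index, len(levels)):
        if levels[i] <= noise_level:
            return (i - end_index) / sample_rate_hz
    return None
```

For instance, with levels `[1.0, 1.0, 0.8, 0.6, 0.4, 0.2, 0.1, 0.1]`, an end index of 1, a noise level of 0.1, and 10 levels per second, the function returns 0.5.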

The microphone input signal used to estimate the reverberation time is not limited to a signal containing speech. For example, the reverberation time estimation unit 16 may output sound from the speaker 12a and collect that sound with the microphone 11a to estimate the reverberation time. In this case, the end timing of the output sound from the speaker 12a can be identified exactly, so the reverberation time can be estimated more accurately.

FIGS. 7(a)-(c) schematically show the level transition of the speaker output signal, the level transition of the microphone input signal when the effect of reverberation is small, and the level transition of the microphone input signal when the effect of reverberation is large. Comparing FIG. 7(b) with FIG. 7(c), the microphone input signal under strong reverberation shown in FIG. 7(c) has a gentler slope after the end of the speaker output. When the speaker output signal is used, the reverberation time estimation unit 16 estimates the reverberation time by measuring the time from the end of the speaker output until the level of the microphone input signal reaches the average level of the non-voice section.

The reverberation time estimation unit 16 may include a test sound source data holding unit 16a that holds the sound source data of a reverberation estimation test signal to be output from the speaker 12a. In the startup sequence of the audio processing unit 14, the reverberation time estimation unit 16 reads the sound source data from the test sound source data holding unit 16a and outputs a test sound from the speaker 12a. The reverberation time estimation unit 16 collects the test sound with the microphone 11a and estimates the reverberation time. Because the reverberation time is also affected by temperature and humidity, estimating it each time the audio processing unit 14 starts up is desirable from the viewpoint of using a reverberation time that matches the environment at that moment.

The sound source data of the ringback tone that sounds when a visitor calls a resident may be reused as the sound source data of the reverberation estimation test signal. In this case, there is no need to prepare separate sound source data for the reverberation estimation test signal. The reverberation time estimation unit 16 may collect the ringing tone with the microphone 11a and estimate the reverberation time each time the call button is pressed.

The reverberation time estimation unit 16 may also estimate the reverberation time by picking up, with the microphone 11a, the received voice that is output from the speaker 12a during a call. This estimation is preferably performed while the voice switch unit 142 has the reception mode selected. In this case too, there is no need to prepare separate sound source data for the test signal.
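
The condition that estimation from received voice should run in the reception mode can be sketched as a simple gate. The names `Mode`, `select_mode`, and `may_estimate_from_received_voice` are hypothetical, and falling back to the reception mode whenever no voice section is detected is an assumption of this sketch rather than something stated here.

```python
from enum import Enum, auto


class Mode(Enum):
    TRANSMIT = auto()  # microphone input takes priority
    RECEIVE = auto()   # speaker output takes priority


def select_mode(voice_section_detected: bool) -> Mode:
    """Select the transmission mode while a voice section is detected.

    Returning RECEIVE otherwise is an assumption of this sketch.
    """
    return Mode.TRANSMIT if voice_section_detected else Mode.RECEIVE


def may_estimate_from_received_voice(mode: Mode, speaker_active: bool) -> bool:
    """Allow estimation only in reception mode while received voice is playing."""
    return mode is Mode.RECEIVE and speaker_active
```

Gating on the reception mode means the microphone signal used for estimation is dominated by the speaker output and its reverberation rather than by a talker at the device.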

As described above, according to the present embodiment, adjusting the sensitivity with which the voice signal detection unit 141 determines a voice section according to the reverberation time improves the accuracy of switching between the transmission mode and the reception mode, so that a comfortable call can be realized. This is particularly effective for intercom devices installed in places where the influence of reverberation is large, such as lobbies. In addition, providing the reverberation time estimation unit 16 allows changes in the environment of the installation location to be reflected in the voice switch function in real time.

The present invention has been described above based on an embodiment. The embodiment is an example, and those skilled in the art will understand that various modifications are possible in the combinations of its constituent elements and processing steps, and that such modifications are also within the scope of the present invention.

The above embodiment described a method of determining a voice section when the ratio S/N of the short-term moving average value S to the long-term moving average value N is equal to or greater than the set ratio α. Alternatively, a voice section may be determined when the difference (S − N) between the short-term moving average value S and the long-term moving average value N is equal to or greater than a set value β.
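
The two criteria can be compared side by side in a short sketch. The symbols S, N, α and β follow the text; the function names, the use of per-frame level samples, the window lengths, and the sample data are assumptions made for illustration only.

```python
from typing import Sequence


def moving_average(levels: Sequence[float], window: int) -> float:
    """Average of the most recent `window` level samples."""
    recent = levels[-window:]
    return sum(recent) / len(recent)


def is_voice_by_ratio(
    levels: Sequence[float], short_window: int, long_window: int, alpha: float
) -> bool:
    """Voice section when S / N is equal to or greater than alpha."""
    s = moving_average(levels, short_window)
    n = moving_average(levels, long_window)
    return n > 0 and s / n >= alpha


def is_voice_by_difference(
    levels: Sequence[float], short_window: int, long_window: int, beta: float
) -> bool:
    """Voice section when S - N is equal to or greater than beta."""
    s = moving_average(levels, short_window)
    n = moving_average(levels, long_window)
    return s - n >= beta


# Assumed data: eight quiet frames followed by two loud frames.
onset = [0.1] * 8 + [0.9, 0.9]
quiet = [0.1] * 10

onset_ratio = is_voice_by_ratio(onset, 2, 10, alpha=2.0)
onset_diff = is_voice_by_difference(onset, 2, 10, beta=0.5)
quiet_ratio = is_voice_by_ratio(quiet, 2, 10, alpha=2.0)
quiet_diff = is_voice_by_difference(quiet, 2, 10, beta=0.5)
```

In this example both criteria flag the onset and reject the steady quiet input. The ratio form is independent of the overall signal scale, whereas the difference form depends on it, so β would need to be chosen with the input level in mind.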

The above embodiment described an example in which the audio processing unit 14 including the voice switch unit 142 is mounted on the lobby intercom device 10. The audio processing unit 14 may instead be mounted on the door intercom device 40 or the intercom indoor device 30. An audio processing unit 14 for reducing the influence of reverberation picked up by the microphone of the door intercom device 40 may also be mounted on the intercom indoor device 30, which serves as the master unit. The corridor where the door intercom device 40 is installed is also a place with relatively strong reverberation, so providing the audio processing unit 14 there is highly worthwhile.

When the space inside the dwelling unit is large, for example when the ceiling is high, an audio processing unit 14 for reducing the influence of reverberation picked up by the microphone of the intercom indoor device 30 may be mounted on the intercom indoor device 30.

The audio processing unit 14 may also be mounted on the control device 20. The voice switch may be applied when the control device 20 relays sound signals between the lobby intercom device 10 and the intercom indoor device 30. In that case, the reverberation time estimation unit 16 may be mounted on either the control device 20 or the lobby intercom device 10. In the latter case, the reverberation time estimated by the reverberation time estimation unit 16 is transmitted to the control device 20 via the trunk line.

The above embodiment described the intercom system 1 installed in an apartment building, but the invention applies equally to an intercom system 1 installed in a detached house. The intercom system 1 installed in a detached house has no lobby intercom device 10 or control device 20. The voice switch function described above is applied to the sound signals between the intercom indoor device 30 and the door intercom device 40.

The embodiment may be specified by the following items.

[Item 1]
An audio processing device (audio processing unit 14) comprising:
a voice switch unit 142 capable of selecting a transmission mode in which a signal input from the microphone 11a is prioritized over a signal output to the speaker 12a; and
a voice signal detection unit 141 that detects, from the signal input from the microphone 11a, a voice section containing a voice signal, wherein
the voice switch unit 142 selects the transmission mode when the voice section is detected by the voice signal detection unit 141, and
the voice signal detection unit 141 adjusts the sensitivity with which it determines the voice section according to the reverberation time.
This suppresses the loss of accuracy in determining voice sections that reverberation would otherwise cause.
[Item 2]
The audio processing device (audio processing unit 14) according to item 1, wherein the voice signal detection unit 141 raises the sensitivity with which it determines a voice section as the set reverberation time becomes longer.
This reduces the probability that a voice section is erroneously determined to be a non-voice section because of reverberation.
[Item 3]
The audio processing device (audio processing unit 14) according to item 1 or 2, wherein the voice signal detection unit 141 determines a voice section when the short-term moving average value of the signal input from the microphone 11a exceeds the long-term moving average value by a threshold or more, and lengthens the period over which the long-term moving average value is calculated as the set reverberation time becomes longer.
This shifts the long-term moving average value downward and raises the sensitivity with which a voice section is determined.
[Item 4]
The audio processing device (audio processing unit 14) according to item 1 or 2, wherein the voice signal detection unit 141 determines a voice section when the short-term moving average value of the signal input from the microphone 11a exceeds the long-term moving average value by a threshold or more, and shortens the period over which the short-term moving average value is calculated as the set reverberation time becomes longer.
This makes the short-term moving average value change more sharply and raises the sensitivity with which a voice section is determined.
[Item 5]
The audio processing device (audio processing unit 14) according to item 1 or 2, wherein the voice signal detection unit 141 determines a voice section when the short-term moving average value of the signal input from the microphone 11a exceeds the long-term moving average value by a threshold or more, and makes the threshold smaller as the set reverberation time becomes longer.
This raises the probability that the ratio or difference between the short-term and long-term moving average values reaches the set ratio or set value, and so raises the sensitivity with which a voice section is determined.
[Item 6]
The audio processing device (audio processing unit 14) according to any one of items 1 to 5, wherein the reverberation time is set by user input.
This allows the reverberation time to be set to a value suited to the installation environment.
[Item 7]
The audio processing device (audio processing unit 14) according to any one of items 1 to 5, wherein the reverberation time is set by a reverberation time estimation unit that analyzes the signal input from the microphone 11a and estimates the reverberation time.
This allows the reverberation time to be set to a value that reflects the installation environment in real time.
[Item 8]
An intercom device (lobby intercom device 10) comprising:
the audio processing device (audio processing unit 14) according to any one of items 1 to 7;
a microphone 11a that converts picked-up sound into an electrical signal and inputs it to the audio processing device (audio processing unit 14); and
a speaker 12a that converts an electrical signal output from the audio processing device (audio processing unit 14) into sound.
This suppresses the loss of accuracy in determining voice sections caused by reverberation and realizes an intercom device that allows comfortable calls.
[Item 9]
An intercom system 1 comprising:
an intercom indoor device (intercom indoor device 30); and
an intercom outdoor device (lobby intercom device 10) for communicating with the intercom indoor device (intercom indoor device 30), the intercom outdoor device being the intercom device according to item 8.
This suppresses the loss of accuracy in determining voice sections caused by reverberation and realizes an intercom system that allows comfortable calls.
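
The three adjustments in items 3 to 5 can be combined in one parameter mapping, sketched below. Only the direction of each change comes from the items; the class name `DetectorParams`, the base values, the 1-second clamp, and the linear scaling are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectorParams:
    short_window: int  # frames averaged for the short-term value S
    long_window: int   # frames averaged for the long-term value N
    threshold: float   # set ratio compared against S / N


def params_for_reverberation(reverb_seconds: float) -> DetectorParams:
    """Map a set reverberation time to detector parameters.

    All constants are illustrative assumptions; the items only state
    in which direction each parameter moves.
    """
    # Clamp to [0, 1] so extreme inputs cannot produce degenerate windows.
    r = min(max(reverb_seconds, 0.0), 1.0)
    short_window = max(2, round(10 - 6 * r))  # item 4: shorter when longer reverb
    long_window = round(100 + 100 * r)        # item 3: longer when longer reverb
    threshold = 2.0 - 0.5 * r                 # item 5: smaller when longer reverb
    return DetectorParams(short_window, long_window, threshold)


dry = params_for_reverberation(0.2)  # e.g. a small room
wet = params_for_reverberation(0.8)  # e.g. a large lobby
```

Each of the three changes raises the sensitivity on its own, so an actual design might apply only one of them, as the separate items suggest.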

DESCRIPTION OF SYMBOLS: 1 intercom system, 10 lobby intercom device, 11a microphone, 11b microphone amplifier, 11c A/D converter, 12a speaker, 12b speaker amplifier, 12c D/A converter, 13 processing unit, 14 audio processing unit, 141 voice signal detection unit, 142 voice switch unit, 15 reverberation time setting unit, 16 reverberation time estimation unit, 16a test sound source data holding unit, 17 communication unit, 18 operation unit, 20 control device, 30 intercom indoor device, 40 door intercom device.

Claims

An audio processing device comprising:
a voice switch unit capable of selecting a transmission mode in which a signal input from a microphone is given priority over a signal output to a speaker; and
a voice signal detection unit that detects, from the signal input from the microphone, a voice section containing a voice signal, wherein
the voice switch unit selects the transmission mode when the voice section is detected by the voice signal detection unit, and
the voice signal detection unit adjusts the sensitivity with which it determines the voice section according to a reverberation time.

The audio processing device according to claim 1, wherein the voice signal detection unit raises the sensitivity with which it determines a voice section as the set reverberation time becomes longer.

The audio processing device according to claim 1, wherein the voice signal detection unit determines a voice section when a short-term moving average value of the signal input from the microphone exceeds a long-term moving average value by a threshold or more, and lengthens the period over which the long-term moving average value is calculated as the set reverberation time becomes longer.

The audio processing device according to claim 1, wherein the voice signal detection unit determines a voice section when a short-term moving average value of the signal input from the microphone exceeds a long-term moving average value by a threshold or more, and shortens the period over which the short-term moving average value is calculated as the set reverberation time becomes longer.

The audio processing device according to claim 1 or 2, wherein the voice signal detection unit determines a voice section when a short-term moving average value of the signal input from the microphone exceeds a long-term moving average value by a threshold or more, and makes the threshold smaller as the set reverberation time becomes longer.

The audio processing device according to claim 1, wherein the reverberation time is set by user input.

The audio processing device according to claim 1, wherein the reverberation time is set by a reverberation time estimation unit that analyzes the signal input from the microphone and estimates the reverberation time.

An intercom device comprising:
an audio processing device;
a microphone that converts picked-up sound into an electrical signal and inputs it to the audio processing device; and
a speaker that converts an electrical signal output from the audio processing device into sound.

An intercom system comprising:
an intercom indoor device; and
an intercom outdoor device for communicating with the intercom indoor device, the intercom outdoor device being the intercom device according to claim 8.