JP3580175B2 - Voice detector

Info

Publication number: JP3580175B2
Application number: JP11571399A
Authority: JP
Inventors: 実福島; 博昭竹山; 章寺澤
Original assignee: Matsushita Electric Works Ltd
Current assignee: Panasonic Electric Works Co Ltd
Priority date: 1999-04-23
Filing date: 1999-04-23
Publication date: 2004-10-20
Anticipated expiration: 2019-04-23
Also published as: JP2000305579A

Description

【０００１】
【発明の属する技術分野】
本発明は、住宅、事務所、工場等で用いられる拡声通話装置（インターホン、電話機、ＰＨＳ等）における通話回路に雑音除去機能や音声切替機能等を搭載するための音声検出器に関するものである。
【０００２】
【従来の技術】
一般に音声検出器は、マイクロホンにより集音された音響信号が音声信号を含んでいるか否かを検出するために用いられる。このような音声検出器の典型的な構成例を図５に示す。この音声検出器ＶＤ′は、瞬時パワー推定部２０、背景雑音パワー推定部２１並びに比較判定部２２を備える。瞬時パワー推定部２０は、立ち上がりが急峻であり且つ立ち下がりが緩やかな特性をもつ積分回路又はデジタルフィルタ等により実現され、参照信号（マイクロホンにより集音される音響信号）の短時間平均パワーを推定するものである。また背景雑音パワー推定部２１は、立ち上がりが緩やかであり且つ立ち下がりが急峻な特性をもつ積分回路又はデジタルフィルタ等により実現され、参照信号中に定常的に存在する暗騒音（背景雑音）レベルを推定するものである。さらに比較判定部２２は、瞬時パワー推定部２０により求められる瞬時パワー推定値と、背景雑音パワー推定部２１により求められる背景雑音パワー推定値の比を所定のしきい値と比較することにより、参照信号が音声信号を含んでいるか否かを判定（検出）してＨ又はＬの２値信号（検出信号）を出力する。
【０００３】
【発明が解決しようとする課題】
ところが、スピーカ及びマイクロホンを有する拡声通話機（図示せず）の内部回路に上述のような音声検出器ＶＤ′を設ける場合、マイクロホンにより集音される音響信号にはスピーカからの回り込み成分（音響結合成分）が含まれる。この音響信号に含まれる回り込み成分の割合が大きい場合には、本来の目的であるマイクロホンの近傍に居る話者が音声を発したか否かを検出することが困難となる。例えば、遠端側の通話端末付近の背景雑音レベルが大きく、音響結合を経てマイクロホンが遠端側の背景雑音を集音する場合には、上記従来例における背景雑音パワー推定部２１により求められる背景雑音パワー推定値が大きくなる。その結果、マイクロホンの近傍に居る話者が音声を発した状態においても、瞬時パワー推定値と背景雑音パワー推定値との比が小さく、所定のしきい値を越えることができずに比較判定部２２においては音声信号でない（非音声）として誤検出されてしまう虞がある。
【０００４】
本発明は上記問題に鑑みて為されたものであり、その目的とするところは、背景雑音レベルが大きい状況下においても参照信号に音声信号が含まれているか否かを精度良く検出することができる音声検出器を提供することにある。
【０００５】
【課題を解決するための手段】
請求項１の発明は、上記目的を達成するために、マイクロホン及びスピーカを有する拡声通話端末が他の通話端末又は拡声通話端末に接続されて半二重通話を行う拡声通話系の上記拡声通話端末に用いられ、通話路に伝送される信号が音声信号であるか非音声信号であるかを検出する音声検出器であって、上記通話路から取り出した参照信号の瞬時パワーを推定する瞬時パワー推定部と、上記参照信号に含まれる背景雑音成分のパワーを推定する背景雑音パワー推定部と、上記瞬時パワー推定部で推定される瞬時パワー推定値並びに上記背景雑音パワー推定部で推定される背景雑音パワー推定値に基づいて参照信号が音声信号であるか非音声信号であるかを判定するとともに判定結果が更新されるまで前回の判定結果を保持する第１の音声／非音声判定部と、参照信号に含まれる音響結合成分の割合が大きいときに上記背景雑音パワー推定部の推定処理を停止するとともに上記割合が大きくないときには背景雑音パワー推定部の推定処理を停止しないように切り替える背景雑音パワー推定切替部とを備え、背景雑音パワー推定部は、背景雑音パワー推定切替部によって推定処理を停止する停止モードに切り替えられた場合にそれ以前に求めて保持していた推定値を背景雑音パワー推定値とし、第１の音声／非音声判定部が背景雑音パワー推定部が保持していた背景雑音パワー推定値に基づいて判定することを特徴とし、参照信号に含まれる音響結合成分の割合が大きい場合には背景雑音パワー推定切替部によって背景雑音パワー推定部の処理を停止するとともに、参照信号に含まれる音響結合成分の割合が小さい状況で求められて背景雑音パワー推定部が保持していた背景雑音パワー推定値に基づいて第１の音声／非音声判定部による判定が行われるため、遠端側の通話端末における背景雑音がスピーカから送出されてマイクロホンに回り込むことにより、マイクロホンの集音する音響信号における近端側話者の発する音声成分とそれ以外の背景雑音成分との比が小さくなることに起因して参照信号中に音声信号が含まれていても音声信号が検出できなくなるような状況を低減することができ、背景雑音レベルが大きい状況下においても参照信号が音声信号であるか否かを精度良く検出することができる。
【０００６】
請求項２の発明は、請求項１の発明において、上記第１の音声／非音声判定部による判定結果に応じて参照信号が非音声信号であると検出された音声非検出継続時間を求める音声非検出区間計時部と、該音声非検出区間計時部により求められた音声非検出継続時間から上記参照信号が音声信号と非音声信号の何れであるかを判定する第２の音声／非音声判定部とを備え、該第２の音声／非音声判定部は、上記音声非検出区間計時部により求められる音声非検出継続時間が人間の音声の音韻継続時間程度の時間にわたって略一定であり、且つ上記音声非検出継続時間が人間の音声のピッチ周期程度である場合には該音声非検出継続時間の参照信号を全て音声信号と判定して成ることを特徴とし、背景雑音レベルが非常に大きい状況下で第１の音声／非音声判定部においては音声信号が検出されないような場合においても、音声非検出区間計時部において測定される音声非検出継続時間が音声の音韻継続時間程度の間ほぼ一定であり、且つ人間の音声のピッチ間隔とほぼ等しい場合には、第２の音声／非音声判定部において改めて参照信号を音声信号として判定するから、背景雑音レベルの大きい拡声通話系において音声信号をさらに精度良く検出することができる。
【０００７】
【発明の実施の形態】
（実施形態１）
図１は、本発明の実施形態１における音声検出器ＶＤ_１を有する拡声通話機Ｍを示すブロック図である。この拡声通話機Ｍは、マイクロホン１０、スピーカ１１、マイクロホンアンプ１５、スピーカアンプ１６、音声検出器ＶＤ_１並びに音声スイッチＶＳを備え、回線を通じて他の拡声通話機等と接続される。ここで音声スイッチＶＳは、スピーカ１１からマイクロホン１０への音響結合、及び回線側での回り込みにより形成される閉ループの利得を低減させることによりハウリングを抑圧するものであり、マイクロホン１０で集音する音声信号（送話信号）を回線へ伝送するための送話信号線上に挿入される送話側減衰器１２と、回線から受信した音声信号（受話信号）をスピーカ１１へ伝送するための受話信号線上に挿入される受話側減衰器１３と、通話状態に応じて送話側減衰器１２並びに受話側減衰器１３の利得を制御する挿入損失量制御部１４とを備える。而して、挿入損失量制御部１４においては、送受話信号を観測して通話状態を判定し、通話状態に応じて送話側減衰器１２の利得及び受話側減衰器１３の利得を適切に設定する。
【０００８】
本発明に係る音声検出器ＶＤ_１は、通話路（送話信号線）から取り出した参照信号（送話信号）Ｖｘの瞬時パワーを推定する瞬時パワー推定部１と、参照信号Ｖｘに含まれる背景雑音成分のパワーを推定する背景雑音パワー推定部２と、瞬時パワー推定部１で推定される瞬時パワー推定値Ｐｓ並びに背景雑音パワー推定部２で推定される背景雑音パワー推定値Ｐｎに基づいて参照信号Ｖｘが音声信号であるか非音声信号であるかを判定するとともに判定結果が更新されるまで前回の判定結果を保持する第１の音声／非音声判定部３と、背景雑音パワー推定部２における背景雑音パワー推定値Ｐｎの更新／停止を切り替える背景雑音パワー推定切替部４とを備える。
【０００９】
瞬時パワー推定部１は、立ち上がりが急峻であり、且つ立ち下がりが緩やかな特性をもつ積分回路又はデジタルフィルタ等によって構成される。また、背景雑音パワー推定部２は、立ち上がりが緩やかであり、且つ立ち下がりが急峻な特性をもつ積分回路又はデジタルフィルタによって構成される。
【００１０】
一方、第１の音声／非音声判定部３は、図２に示すように瞬時パワー推定部１から出力される瞬時パワー推定値Ｐｓを所定のしきい値Ｐｓ０と比較してＨ又はＬの２値信号Ｄ１を出力するコンパレータＣＰ１と、瞬時パワー推定値Ｐｓと背景雑音パワー推定部２から出力される背景雑音パワー推定値Ｐｎとの比Ｐｓ／Ｐｎを求める除算器３ａと、除算器３ａの出力値Ｐｓ／Ｐｎを所定のしきい値δと比較してＨ又はＬの２値信号Ｄ２を出力するコンパレータＣＰ２と、２つの２値信号Ｄ１，Ｄ２の論理積を求める論理積演算部３ｂとにより構成される。而して、本実施形態においては、瞬時パワー推定値Ｐｓがしきい値Ｐｓ０よりも大きく（Ｐｓ＞Ｐｓ０）、且つ除算器３ａの出力Ｐｓ／Ｐｓ０がしきい値δよりも大きい（Ｐｓ／Ｐｓ０＞δ）場合に音声信号と判定し、その他の場合に非音声信号と判定する。ここで、しきい値Ｐｓ０は音声信号の最小レベルを規定するしきい値であり、しきい値δは音声信号レベルと背景騒音レベルとの最小比を規定するしきい値である。
【００１１】
背景雑音パワー推定切替部４は、音声スイッチＶＳの挿入損失量制御部１４から出力される制御信号Ｖｓによりオンオフされて背景雑音パワー推定部２に対する参照信号Ｖｘの入力を入／切するスイッチ等で構成される。そして、背景雑音パワー推定部２は、背景雑音パワー推定切替部４がオンして参照信号Ｖｘが入力されている場合に更新モードとなり、背景雑音パワー推定切替部４がオフして参照信号Ｖｘが入力されない場合に停止モードとなる。ここで更新モードにおいては、背景雑音パワー推定部２が参照信号Ｖｘを参照して逐次背景雑音パワー推定値Ｐｎを更新する。また、停止モードにおいては、背景雑音パワー推定部２が上記演算処理を停止し、背景雑音パワー推定値Ｐｎとしてそれ以前に求められた値を保持する。
【００１２】
ここで、例えば音声スイッチＶＳの挿入損失量制御部１４は、通話状態を受話状態と判定したときに制御信号Ｖｓにより背景雑音パワー推定切替部４をオフするとともに、送話状態と判定したときに制御信号Ｖｓにより背景雑音パワー推定切替部４をオンする。而して、背景雑音パワー推定部２では受話状態の時に停止モードとなり、送話状態の時に更新モードとなるから、音声検出器ＶＤ_１においては、参照信号Ｖｘに含まれる音響結合成分の割合が大きいときに背景雑音パワー推定部２の推定処理を停止することによって、背景雑音パワー推定値Ｐｎを、通話状態によらずにマイクロホン１０周辺の背景雑音パワーを近似した値とすることができる。その結果、マイクロホン１０とスピーカ１１との間の距離が短く、音響結合利得が大きい系においても遠端側の背景雑音がマイクロホン１０に回り込むことによる音声検出性能の劣化を低減することができる。なお、音声検出器ＶＤ_１の検出信号（検出フラグ）ＳＤ１は、例えば音声スイッチＶＳに与えられて種々の制御に利用される。
【００１３】
上述のように本発明に係る音声検出器ＶＤ_１によれば、参照信号Ｖｘに含まれる音響結合成分の割合が大きい場合には背景雑音パワー推定切替部４によって背景雑音パワー推定部３の処理を停止するとともに、参照信号Ｖｘに含まれる音響結合成分の割合が小さい状況で求められて背景雑音パワー推定部２が保持していた背景雑音パワー推定値Ｐｎに基づいて第１の音声／非音声判定部４による判定が行われるため、遠端側の通話端末における背景雑音がスピーカ１１から送出されてマイクロホン１０に回り込むことにより、マイクロホン１０の集音する音響信号における近端側話者の発する音声成分とそれ以外の背景雑音成分との比が小さくなることに起因して参照信号Ｖｘ中に音声信号が含まれていても音声信号が検出できなくなるような状況を低減することができ、スピーカ１１からマイクロホン１０までの間の距離が短くて音響結合利得が大きい系においても参照信号Ｖｘが音声信号であるか否かを精度良く検出することができる。
【００１４】
（実施形態２）
図３は、本発明の実施形態２における音声検出器ＶＤ_２のブロック図を示している。但し、本実施形態の基本的な構成は実施形態１と共通するので、共通する構成には同一の符号を付して説明を省略する。
【００１５】
本実施形態は、第１の音声／非音声判定部３の検出信号ＳＤ１に基づいて、図４に示すように参照信号Ｖｘが非音声信号であると判定された時間、すなわち検出信号ＳＤ１がＬレベルである時間（以下、「音声非検出継続時間」という）τ_１，τ_２，…を求める音声非検出区間計時部５と、音声非検出区間計時部５により求められた音声非検出継続時間τ_１，τ_２，…に基づいて参照信号Ｖｘが音声信号と非音声信号の何れであるかを判定する第２の音声／非音声判定部６とを備えた点に特徴がある。
【００１６】
ここで、音声非検出区間計時部５では、検出信号ＳＤ１がＬからＨに変化する毎に計時処理をリセットするが、それ以前の計時結果（音声非検出継続時間τ_１…）を少なくとも音声信号における音韻継続時間程度の間だけＲＡＭ等の記憶手段に保持している。
【００１７】
また第２の音声／非音声区間判定部６では、音声非検出区間計時部５の記憶手段に記憶された音声非検出継続時間τ_１，τ_２，…，τ_Ｎを参照し、これらの値が人間の音声の音韻継続時間程度の時間にわたって略一定であり、且つ人間の音声のピッチ間隔程度である場合には、これらの区間τ_１〜τ_Ｎを改めて音声区間として検出し、Ｈレベルの検出信号ＳＤ２を出力する（図４参照）。
【００１８】
而して本実施形態に撚れば、図４に示すようにマイクロホン１０付近における周囲騒音レベルＶ_Ｎが高く、瞬時パワー推定値ＰＳと背景雑音パワー推定値Ｐｎとの比Ｐｓ／Ｐｎが小さいために、参照信号Ｖｘに音声が含まれているにも拘わらず、第１の音声／非音声判定部３においては非音声信号として判定されてしまうような場合において、第２の音声／非音声判定部６において改めて音声信号として検出することが可能となる。その結果、実施形態１に対して背景雑音レベルの大きい拡声通話系においても音声信号をさらに精度良く検出することができるという利点がある。
【００１９】
【発明の効果】
請求項１の発明は、マイクロホン及びスピーカを有する拡声通話端末が他の通話端末又は拡声通話端末に接続されて半二重通話を行う拡声通話系の上記拡声通話端末に用いられ、通話路に伝送される信号が音声信号であるか非音声信号であるかを検出する音声検出器であって、上記通話路から取り出した参照信号の瞬時パワーを推定する瞬時パワー推定部と、上記参照信号に含まれる背景雑音成分のパワーを推定する背景雑音パワー推定部と、上記瞬時パワー推定部で推定される瞬時パワー推定値並びに上記背景雑音パワー推定部で推定される背景雑音パワー推定値に基づいて参照信号が音声信号であるか非音声信号であるかを判定するとともに判定結果が更新されるまで前回の判定結果を保持する第１の音声／非音声判定部と、参照信号に含まれる音響結合成分の割合が大きいときに上記背景雑音パワー推定部の推定処理を停止するとともに上記割合が大きくないときには背景雑音パワー推定部の推定処理を停止しないように切り替える背景雑音パワー推定切替部とを備え、背景雑音パワー推定部は、背景雑音パワー推定切替部によって推定処理を停止する停止モードに切り替えられた場合にそれ以前に求めて保持していた推定値を背景雑音パワー推定値とし、第１の音声／非音声判定部が背景雑音パワー推定部が保持していた背景雑音パワー推定値に基づいて判定するので、参照信号に含まれる音響結合成分の割合が大きい場合には背景雑音パワー推定切替部によって背景雑音パワー推定部の処理を停止するとともに、参照信号に含まれる音響結合成分の割合が小さい状況で求められて背景雑音パワー推定部が保持していた背景雑音パワー推定値に基づいて第１の音声／非音声判定部による判定が行われるため、遠端側の通話端末における背景雑音がスピーカから送出されてマイクロホンに回り込むことにより、マイクロホンの集音する音響信号における近端側話者の発する音声成分とそれ以外の背景雑音成分との比が小さくなることに起因して参照信号中に音声信号が含まれていても音声信号が検出できなくなるような状況を低減することができ、背景雑音レベルが大きい状況下においても参照信号が音声信号であるか否かを精度良く検出することができるという効果がある。
【００２０】
請求項２の発明は、上記第１の音声／非音声判定部による判定結果に応じて参照信号が非音声信号であると検出された音声非検出継続時間を求める音声非検出区間計時部と、該音声非検出区間計時部により求められた音声非検出継続時間から上記参照信号が音声信号と非音声信号の何れであるかを判定する第２の音声／非音声判定部とを備え、該第２の音声／非音声判定部が、上記音声非検出区間計時部により求められる音声非検出継続時間が人間の音声の音韻継続時間程度の時間にわたって略一定であり、且つ上記音声非検出継続時間が人間の音声のピッチ周期程度である場合には該音声非検出継続時間の参照信号を全て音声信号と判定して成るので、背景雑音レベルが非常に大きい状況下で第１の音声／非音声判定部においては音声信号が検出されないような場合においても、音声非検出区間計時部において測定される音声非検出継続時間が音声の音韻継続時間程度の間ほぼ一定であり、且つ人間の音声のピッチ間隔とほぼ等しい場合には、第２の音声／非音声判定部において改めて参照信号を音声信号として判定するから、背景雑音レベルの大きい拡声通話系において音声信号をさらに精度良く検出することができるという効果がある。
【図面の簡単な説明】
【図１】実施形態１を含む拡声通話機のブロック図である。
【図２】同上における第１の音声／非音声判定部のブロック図である。
【図３】実施形態２を示すブロック図である。
【図４】同上の動作説明用の信号波形図である。
【図５】従来例を示すブロック図である。
【符号の説明】
１瞬時パワー推定部
２背景雑音パワー推定部
３第１の音声／非音声判定部
４背景雑音パワー推定切替部
ＶＤ音声検出器
ＶＳ音声スイッチ
Ｍ拡声通話機
[0001]
[Technical field of the invention]
The present invention relates to a voice detector used to add functions such as noise elimination and voice switching to the speech circuit of a loudspeaker communication device (intercom, telephone, PHS, etc.) used in houses, offices, factories, and the like.
[0002]
[Prior art]
In general, a voice detector is used to detect whether or not an acoustic signal picked up by a microphone contains a voice signal. FIG. 5 shows a typical configuration of such a voice detector. This voice detector VD′ includes an instantaneous power estimation unit 20, a background noise power estimation unit 21, and a comparison and determination unit 22. The instantaneous power estimation unit 20 is realized by an integrating circuit, a digital filter, or the like whose response rises steeply and falls gradually, and estimates the short-time average power of a reference signal (the acoustic signal picked up by the microphone). The background noise power estimation unit 21 is realized by an integrating circuit, a digital filter, or the like whose response rises gradually and falls steeply, and estimates the level of the background noise that is steadily present in the reference signal. The comparison and determination unit 22 compares the ratio of the instantaneous power estimate obtained by the instantaneous power estimation unit 20 to the background noise power estimate obtained by the background noise power estimation unit 21 with a predetermined threshold, thereby determines (detects) whether or not the reference signal contains a voice signal, and outputs a binary signal (detection signal) of H or L.
[0003]
[Problems to be solved by the invention]
However, when the voice detector VD′ described above is provided in the internal circuit of a loudspeaker communication device (not shown) having a speaker and a microphone, the acoustic signal picked up by the microphone contains a wraparound component from the speaker (acoustic coupling component). When the proportion of this wraparound component in the acoustic signal is large, it becomes difficult to achieve the original purpose of detecting whether or not a talker near the microphone has spoken. For example, when the background noise level near the far-end terminal is high and the microphone picks up the far-end background noise through the acoustic coupling, the background noise power estimate obtained by the background noise power estimation unit 21 of the conventional example becomes large. As a result, even while the talker near the microphone is speaking, the ratio of the instantaneous power estimate to the background noise power estimate is small and cannot exceed the predetermined threshold, so the comparison and determination unit 22 may erroneously judge the signal to be non-voice.
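The failure described above can be illustrated with a simplified power budget. The symbols S (near-end speech power), N (near-end background noise power), and E (power of the far-end background noise that reaches the microphone through the acoustic coupling) are introduced here only for illustration and do not appear in the patent. The sketch also assumes that the three components are uncorrelated, so that their powers add, and that both estimators have settled.

```latex
\frac{P_s}{P_n} \approx \frac{S + N + E}{N + E}
\qquad\text{versus}\qquad
\left.\frac{P_s}{P_n}\right|_{E = 0} \approx \frac{S + N}{N}
```

With the hypothetical values S = 10, N = 1, and E = 9, the ratio with coupling is 20/10 = 2, whereas the ratio without coupling is 11/1 = 11. A threshold chosen between these two values, for example 4, would be exceeded only in the second case, which is the missed detection that the paragraph describes.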
[0004]
The present invention has been made in view of the above problem, and its object is to provide a voice detector that can accurately detect whether or not a reference signal contains a voice signal even when the background noise level is high.
[0005]
[Means for solving the problems]
To achieve the above object, the invention of claim 1 is a voice detector that is used in a loudspeaker communication terminal of a loudspeaker communication system, in which the loudspeaker communication terminal having a microphone and a speaker is connected to another communication terminal or loudspeaker communication terminal to carry out half-duplex communication, and that detects whether a signal transmitted on a speech path is a voice signal or a non-voice signal. The voice detector comprises: an instantaneous power estimation unit that estimates the instantaneous power of a reference signal taken from the speech path; a background noise power estimation unit that estimates the power of the background noise component contained in the reference signal; a first voice/non-voice determination unit that determines, based on the instantaneous power estimate obtained by the instantaneous power estimation unit and the background noise power estimate obtained by the background noise power estimation unit, whether the reference signal is a voice signal or a non-voice signal, and that holds the previous determination result until the result is updated; and a background noise power estimation switching unit that stops the estimation processing of the background noise power estimation unit when the proportion of the acoustic coupling component contained in the reference signal is large and does not stop it when that proportion is not large. When switched by the background noise power estimation switching unit to a stop mode in which the estimation processing is stopped, the background noise power estimation unit uses the estimate that it obtained and held beforehand as the background noise power estimate, and the first voice/non-voice determination unit makes its determination based on that held estimate. Because the processing of the background noise power estimation unit is stopped when the proportion of the acoustic coupling component in the reference signal is large, and the determination is made from the background noise power estimate that was obtained while that proportion was small, it is possible to reduce situations in which a voice signal in the reference signal goes undetected because background noise at the far-end terminal is emitted from the speaker and wraps around into the microphone, lowering the ratio of the near-end talker's voice component to the other background noise components in the acoustic signal picked up by the microphone. Whether or not the reference signal is a voice signal can thus be detected accurately even when the background noise level is high.
[0006]
The invention of claim 2 is the invention of claim 1 further comprising: a voice non-detection interval timing unit that obtains, according to the determination result of the first voice/non-voice determination unit, a voice non-detection duration during which the reference signal is detected as a non-voice signal; and a second voice/non-voice determination unit that determines, from the voice non-detection duration obtained by the voice non-detection interval timing unit, whether the reference signal is a voice signal or a non-voice signal. When the voice non-detection duration obtained by the voice non-detection interval timing unit remains substantially constant over a time on the order of the phoneme duration of human speech, and the voice non-detection duration is on the order of the pitch period of human speech, the second voice/non-voice determination unit determines that the reference signal throughout those voice non-detection durations is a voice signal. Consequently, even when the background noise level is so high that the first voice/non-voice determination unit does not detect a voice signal, the second voice/non-voice determination unit determines the reference signal anew to be a voice signal if the measured voice non-detection duration is nearly constant for about the phoneme duration of speech and nearly equal to the pitch interval of human speech. A voice signal can therefore be detected even more accurately in a loudspeaker communication system with a high background noise level.
[0007]
[Embodiments of the invention]
(Embodiment 1)
FIG. 1 is a block diagram showing a loudspeaker communication device M having a voice detector VD_1 according to Embodiment 1 of the present invention. The loudspeaker communication device M includes a microphone 10, a speaker 11, a microphone amplifier 15, a speaker amplifier 16, the voice detector VD_1, and a voice switch VS, and is connected to another loudspeaker communication device or the like through a line. The voice switch VS suppresses howling by reducing the gain of the closed loop formed by the acoustic coupling from the speaker 11 to the microphone 10 and by the wraparound on the line side. It includes a transmission-side attenuator 12 inserted in the transmission signal line that carries the voice signal picked up by the microphone 10 (transmission signal) to the line, a reception-side attenuator 13 inserted in the reception signal line that carries the voice signal received from the line (reception signal) to the speaker 11, and an insertion loss control unit 14 that controls the gains of the transmission-side attenuator 12 and the reception-side attenuator 13 according to the communication state. The insertion loss control unit 14 observes the transmission and reception signals to determine the communication state, and sets the gain of the transmission-side attenuator 12 and the gain of the reception-side attenuator 13 appropriately for that state.
[0008]
The voice detector VD_1 according to the present invention includes: an instantaneous power estimation unit 1 that estimates the instantaneous power of a reference signal (transmission signal) Vx taken from the speech path (transmission signal line); a background noise power estimation unit 2 that estimates the power of the background noise component contained in the reference signal Vx; a first voice/non-voice determination unit 3 that determines, based on the instantaneous power estimate Ps obtained by the instantaneous power estimation unit 1 and the background noise power estimate Pn obtained by the background noise power estimation unit 2, whether the reference signal Vx is a voice signal or a non-voice signal, and that holds the previous determination result until the result is updated; and a background noise power estimation switching unit 4 that switches between updating and stopping the background noise power estimate Pn in the background noise power estimation unit 2.
[0009]
The instantaneous power estimation unit 1 is constituted by an integrating circuit, a digital filter, or the like whose response rises steeply and falls gradually. The background noise power estimation unit 2 is constituted by an integrating circuit or a digital filter whose response rises gradually and falls steeply.
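One common way to obtain such rise and fall characteristics in a digital filter is a one-pole smoother with separate coefficients for rising and falling input. The following is a minimal sketch under that assumption; the patent does not specify the filter, and the function names and all coefficient values are hypothetical.

```python
def asymmetric_smooth(power, previous, rise, fall):
    """One-pole smoother that uses `rise` when the input exceeds the
    current estimate and `fall` otherwise (both in the range 0..1)."""
    coeff = rise if power > previous else fall
    return previous + coeff * (power - previous)


def track_powers(samples, ps=0.0, pn=0.0):
    """Return a list of (Ps, Pn) pairs, one per input sample.

    Ps follows the signal quickly upward and slowly downward;
    Pn follows it slowly upward and quickly downward.
    The coefficient values are illustrative assumptions only.
    """
    track = []
    for x in samples:
        power = x * x  # squared sample as the raw power measure
        ps = asymmetric_smooth(power, ps, rise=0.5, fall=0.01)
        pn = asymmetric_smooth(power, pn, rise=0.001, fall=0.5)
        track.append((ps, pn))
    return track
```

Fed a steady low-level signal with a short loud burst, Ps in this sketch jumps to the burst level almost immediately while Pn barely moves, so the ratio Ps/Pn becomes large during the burst and shrinks again afterwards.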
[0010]
As shown in FIG. 2, the first voice/non-voice determination unit 3 is made up of a comparator CP1 that compares the instantaneous power estimate Ps output from the instantaneous power estimation unit 1 with a predetermined threshold Ps0 and outputs a binary signal D1 of H or L; a divider 3a that obtains the ratio Ps/Pn of the instantaneous power estimate Ps to the background noise power estimate Pn output from the background noise power estimation unit 2; a comparator CP2 that compares the output Ps/Pn of the divider 3a with a predetermined threshold δ and outputs a binary signal D2 of H or L; and an AND operation unit 3b that obtains the logical product of the two binary signals D1 and D2. In the present embodiment, the reference signal is determined to be a voice signal when the instantaneous power estimate Ps is larger than the threshold Ps0 (Ps > Ps0) and the output Ps/Pn of the divider 3a is larger than the threshold δ (Ps/Pn > δ), and is determined to be a non-voice signal otherwise. Here, the threshold Ps0 defines the minimum level of a voice signal, and the threshold δ defines the minimum ratio of the voice signal level to the background noise level.
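The two-comparator decision can be sketched in a few lines. The function name and the threshold values used in the example calls are hypothetical. The sketch also tests Ps > δ·Pn instead of dividing, which gives the same result for Pn > 0 and avoids a division by zero; that substitution is an assumption, not something the patent states.

```python
def first_voice_decision(ps, pn, ps0, delta):
    """Return True (H) for voice and False (L) for non-voice.

    d1 plays the role of comparator CP1 (absolute level test) and
    d2 the role of divider 3a plus comparator CP2 (ratio test).
    """
    d1 = ps > ps0          # Ps > Ps0
    d2 = ps > delta * pn   # equivalent to Ps / Pn > delta when Pn > 0
    return d1 and d2       # logical product, as in AND unit 3b


# Illustrative calls with made-up values:
loud_and_clear = first_voice_decision(ps=0.5, pn=0.01, ps0=0.05, delta=4.0)
too_quiet = first_voice_decision(ps=0.02, pn=0.001, ps0=0.05, delta=4.0)
buried_in_noise = first_voice_decision(ps=0.5, pn=0.3, ps0=0.05, delta=4.0)
```

The third call shows the case that motivates the invention: the level test passes, but an inflated Pn keeps the ratio below δ, so the result is non-voice.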
[0011]
The background noise power estimation switching unit 4 is constituted by a switch or the like that is turned on and off by a control signal Vs output from the insertion loss control unit 14 of the voice switch VS, and thereby connects or disconnects the input of the reference signal Vx to the background noise power estimation unit 2. The background noise power estimation unit 2 is in an update mode when the background noise power estimation switching unit 4 is on and the reference signal Vx is input, and in a stop mode when the background noise power estimation switching unit 4 is off and the reference signal Vx is not input. In the update mode, the background noise power estimation unit 2 refers to the reference signal Vx and successively updates the background noise power estimate Pn. In the stop mode, the background noise power estimation unit 2 stops this computation and holds the previously obtained value as the background noise power estimate Pn.
[0012]
Here, for example, the insertion loss control unit 14 of the voice switch VS turns off the background noise power estimation switching unit 4 by means of the control signal Vs when it determines that the communication state is the receiving state, and turns it on by means of the control signal Vs when it determines that the communication state is the transmitting state. The background noise power estimation unit 2 is thus in the stop mode in the receiving state and in the update mode in the transmitting state. Because the voice detector VD_1 stops the estimation processing of the background noise power estimation unit 2 when the proportion of the acoustic coupling component contained in the reference signal Vx is large, the background noise power estimate Pn approximates the background noise power around the microphone 10 regardless of the communication state. As a result, even in a system in which the microphone 10 and the speaker 11 are close together and the acoustic coupling gain is large, the degradation of voice detection performance caused by far-end background noise wrapping around into the microphone 10 can be reduced. The detection signal (detection flag) SD1 of the voice detector VD_1 is supplied, for example, to the voice switch VS and used for various kinds of control.
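The update and stop modes can be sketched as a noise estimator whose update is gated by the communication state. The class name, the state strings, and the smoothing coefficients below are hypothetical; only the gating behaviour (hold Pn while receiving, update it while transmitting) comes from the description.

```python
class GatedNoiseEstimator:
    """Background noise power estimate Pn that is frozen in stop mode."""

    def __init__(self, initial=0.0, rise=0.001, fall=0.5):
        self.pn = initial
        self.rise = rise   # slow upward tracking (assumed value)
        self.fall = fall   # fast downward tracking (assumed value)

    def step(self, power, state):
        """Process one power sample; `state` is 'transmit' or 'receive'."""
        if state == "transmit":
            # Update mode: the switch is on, so Pn follows the reference signal.
            coeff = self.rise if power > self.pn else self.fall
            self.pn += coeff * (power - self.pn)
        # Stop mode ('receive'): the switch is off and Pn keeps its old value.
        return self.pn


gated = GatedNoiseEstimator(initial=0.01)
ungated = GatedNoiseEstimator(initial=0.01)
for _ in range(1000):
    gated.step(0.5, "receive")      # loud far-end noise arrives while receiving
    ungated.step(0.5, "transmit")   # same input without the freeze
```

In this example the gated estimator still reports 0.01 after the far-end noise, while the estimator that kept updating has drifted far upward, which is the inflation of Pn that the switching unit is meant to prevent.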
[0013]
As described above, in the voice detector VD_1 according to the present invention, when the proportion of the acoustic coupling component contained in the reference signal Vx is large, the processing of the background noise power estimation unit 2 is stopped by the background noise power estimation switching unit 4, and the first voice/non-voice determination unit 3 makes its determination based on the background noise power estimate Pn that was obtained while that proportion was small and has been held by the background noise power estimation unit 2. This reduces situations in which a voice signal in the reference signal Vx goes undetected because background noise at the far-end terminal is emitted from the speaker 11 and wraps around into the microphone 10, lowering the ratio of the near-end talker's voice component to the other background noise components in the acoustic signal picked up by the microphone 10. Whether or not the reference signal Vx is a voice signal can therefore be detected accurately even in a system in which the speaker 11 and the microphone 10 are close together and the acoustic coupling gain is large.
[0014]
(Embodiment 2)
FIG. 3 is a block diagram of a voice detector VD_2 according to Embodiment 2 of the present invention. Because the basic configuration of this embodiment is the same as that of Embodiment 1, common components are given the same reference numerals and their description is omitted.
[0015]
The present embodiment is characterized by including a voice non-detection interval timing unit 5 and a second voice/non-voice determination unit 6. Based on the detection signal SD1 of the first voice/non-voice determination unit 3, the voice non-detection interval timing unit 5 obtains the times τ_1, τ_2, ... during which the reference signal Vx is determined to be a non-voice signal as shown in FIG. 4, that is, the times during which the detection signal SD1 is at the L level (hereinafter referred to as "voice non-detection durations"). The second voice/non-voice determination unit 6 determines, based on the voice non-detection durations τ_1, τ_2, ... obtained by the voice non-detection interval timing unit 5, whether the reference signal Vx is a voice signal or a non-voice signal.
[0016]
Here, the voice non-detection interval timing unit 5 resets its timing each time the detection signal SD1 changes from L to H, but keeps the preceding timing results (voice non-detection durations τ_1, ...) in storage means such as a RAM for at least about the phoneme duration of a voice signal.
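The timing behaviour can be sketched as a run-length counter over a sampled version of SD1. The function name and the idea of a fixed frame period are assumptions for illustration. For simplicity the sketch keeps every measured duration, whereas the description only requires them to be retained for about one phoneme duration.

```python
def non_detection_durations(sd1_frames, frame_period):
    """Measure the lengths of the L (False) runs in a sampled SD1 signal.

    A duration is recorded, and the counter reset, each time SD1 goes
    from L to H. A trailing L run that has not yet ended is not recorded.
    """
    durations = []
    run = 0
    for is_voice in sd1_frames:
        if is_voice:
            if run > 0:
                durations.append(run * frame_period)  # L -> H transition
            run = 0                                   # reset the timer
        else:
            run += 1                                  # still in an L run
    return durations


# Example: H, two L frames, H, three L frames, H, one unfinished L frame.
example = non_detection_durations(
    [True, False, False, True, False, False, False, True, False],
    frame_period=1,
)
```

With a frame period of 1 the example yields the two completed gaps, of 2 and 3 frames; the final single L frame is ignored because SD1 has not yet returned to H.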
[0017]
The second voice/non-voice determination unit 6 refers to the voice non-detection durations τ_1, τ_2, ..., τ_N stored in the storage means of the voice non-detection interval timing unit 5. When these values remain substantially constant over a time on the order of the phoneme duration of human speech and are on the order of the pitch interval of human speech, it newly detects the intervals τ_1 to τ_N as voice intervals and outputs an H-level detection signal SD2 (see FIG. 4).
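The second decision can be sketched as three checks on the stored durations. The patent gives no numerical values, so every default below (pitch period range, tolerance for "substantially constant", and minimum observation span) is a hypothetical choice, as is the function name. The sum of the gaps is used as a lower bound on the time the pattern has lasted, which is also an assumption.

```python
def second_voice_decision(durations, pitch_min=0.0025, pitch_max=0.0125,
                          tolerance=0.2, min_span=0.05):
    """Return True when the stored non-detection durations (in seconds)
    look like voiced speech: each is about one pitch period long, they
    are nearly equal, and together they span about a phoneme duration."""
    if not durations:
        return False
    mean = sum(durations) / len(durations)
    in_pitch_range = all(pitch_min <= t <= pitch_max for t in durations)
    nearly_constant = all(abs(t - mean) <= tolerance * mean for t in durations)
    long_enough = sum(durations) >= min_span
    return in_pitch_range and nearly_constant and long_enough


regular_gaps = second_voice_decision([0.008] * 8)
irregular_gaps = second_voice_decision([0.008, 0.004, 0.012, 0.003] * 3)
long_pauses = second_voice_decision([0.2] * 5)
```

Regular gaps of 8 ms, repeated long enough, are reclassified as voice, while irregular gaps or ordinary pauses between words are left as non-voice.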
[0018]
Thus, according to the present embodiment, when the ambient noise level V_N near the microphone 10 is high as shown in FIG. 4 and the ratio Ps/Pn of the instantaneous power estimate Ps to the background noise power estimate Pn is therefore small, so that the first voice/non-voice determination unit 3 determines the reference signal Vx to be a non-voice signal even though it contains voice, the second voice/non-voice determination unit 6 can detect it anew as a voice signal. As a result, compared with Embodiment 1, this embodiment has the advantage that a voice signal can be detected even more accurately in a loudspeaker communication system with a high background noise level.
[0019]
[Effects of the invention]
The invention of claim 1 is a voice detector that is used in a loudspeaker communication terminal of a loudspeaker communication system, in which the loudspeaker communication terminal having a microphone and a speaker is connected to another communication terminal or loudspeaker communication terminal to carry out half-duplex communication, and that detects whether a signal transmitted on a speech path is a voice signal or a non-voice signal. It comprises: an instantaneous power estimation unit that estimates the instantaneous power of a reference signal taken from the speech path; a background noise power estimation unit that estimates the power of the background noise component contained in the reference signal; a first voice/non-voice determination unit that determines, based on the instantaneous power estimate and the background noise power estimate, whether the reference signal is a voice signal or a non-voice signal, and that holds the previous determination result until the result is updated; and a background noise power estimation switching unit that stops the estimation processing of the background noise power estimation unit when the proportion of the acoustic coupling component contained in the reference signal is large and does not stop it when that proportion is not large. When switched to the stop mode, the background noise power estimation unit uses the estimate that it obtained and held beforehand as the background noise power estimate, and the first voice/non-voice determination unit makes its determination based on that held estimate. The determination is therefore based on a background noise power estimate obtained while the proportion of the acoustic coupling component was small. This reduces situations in which a voice signal in the reference signal goes undetected because background noise at the far-end terminal is emitted from the speaker and wraps around into the microphone, lowering the ratio of the near-end talker's voice component to the other background noise components in the acoustic signal picked up by the microphone. The invention thus has the effect that whether or not the reference signal is a voice signal can be detected accurately even when the background noise level is high.
[0020]
The invention of claim 2 comprises: a voice non-detection interval timing unit that obtains, according to the determination result of the first voice/non-voice determination unit, a voice non-detection duration during which the reference signal is detected as a non-voice signal; and a second voice/non-voice determination unit that determines, from the voice non-detection duration obtained by the voice non-detection interval timing unit, whether the reference signal is a voice signal or a non-voice signal. When the voice non-detection duration remains substantially constant over a time on the order of the phoneme duration of human speech and is on the order of the pitch period of human speech, the second voice/non-voice determination unit determines that the reference signal throughout those voice non-detection durations is a voice signal. Consequently, even when the background noise level is so high that the first voice/non-voice determination unit does not detect a voice signal, the second voice/non-voice determination unit determines the reference signal anew to be a voice signal if the measured voice non-detection duration is nearly constant for about the phoneme duration of speech and nearly equal to the pitch interval of human speech. The invention thus has the effect that a voice signal can be detected even more accurately in a loudspeaker communication system with a high background noise level.
[Brief description of the drawings]
FIG. 1 is a block diagram of a loudspeaker communication device including Embodiment 1.
FIG. 2 is a block diagram of the first voice/non-voice determination unit of Embodiment 1.
FIG. 3 is a block diagram showing Embodiment 2.
FIG. 4 is a signal waveform diagram for explaining the operation of Embodiment 2.
FIG. 5 is a block diagram showing a conventional example.
[Explanation of symbols]
1 instantaneous power estimation unit
2 background noise power estimation unit
3 first voice/non-voice determination unit
4 background noise power estimation switching unit
VD voice detector
VS voice switch
M loudspeaker communication device

Claims

1. A voice detector that is used in a loudspeaker communication terminal of a loudspeaker communication system, in which the loudspeaker communication terminal having a microphone and a speaker is connected to another communication terminal or loudspeaker communication terminal to carry out half-duplex communication, and that detects whether a signal transmitted on a speech path is a voice signal or a non-voice signal, the voice detector comprising: an instantaneous power estimation unit that estimates the instantaneous power of a reference signal taken from the speech path; a background noise power estimation unit that estimates the power of a background noise component contained in the reference signal; a first voice/non-voice determination unit that determines, based on the instantaneous power estimate obtained by the instantaneous power estimation unit and the background noise power estimate obtained by the background noise power estimation unit, whether the reference signal is a voice signal or a non-voice signal, and that holds the previous determination result until the determination result is updated; and a background noise power estimation switching unit that stops the estimation processing of the background noise power estimation unit when the proportion of an acoustic coupling component contained in the reference signal is large and does not stop the estimation processing of the background noise power estimation unit when the proportion is not large, wherein, when switched by the background noise power estimation switching unit to a stop mode in which the estimation processing is stopped, the background noise power estimation unit uses the estimate that it obtained and held beforehand as the background noise power estimate, and the first voice/non-voice determination unit makes its determination based on the background noise power estimate held by the background noise power estimation unit.

2. The voice detector according to claim 1, further comprising: a voice non-detection interval timing unit that obtains, according to the determination result of the first voice/non-voice determination unit, a voice non-detection duration during which the reference signal is detected as a non-voice signal; and a second voice/non-voice determination unit that determines, from the voice non-detection duration obtained by the voice non-detection interval timing unit, whether the reference signal is a voice signal or a non-voice signal, wherein, when the voice non-detection duration obtained by the voice non-detection interval timing unit is substantially constant over a time on the order of the phoneme duration of human speech and the voice non-detection duration is on the order of the pitch period of human speech, the second voice/non-voice determination unit determines that the reference signal throughout the voice non-detection durations is a voice signal.