JP3688934B2

JP3688934B2 - Microphone system

Info

Publication number: JP3688934B2
Application number: JP10963399A
Authority: JP
Inventors: 真吾木内
Original assignee: Alpine Electronics Inc
Current assignee: Alpine Electronics Inc
Priority date: 1999-04-16
Filing date: 1999-04-16
Publication date: 2005-08-31
Anticipated expiration: 2019-04-16
Also published as: JP2000305594A

Description

[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a microphone system. In particular, it relates to a microphone system with first and second microphones. The system performs adaptive signal processing to determine the coefficients of an adaptive filter, using the signal output from one microphone as a target signal and the signal output from the other microphone as a reference signal. It then uses the signal output from the adaptive filter to improve the SN ratio of the speaker's voice signal.
[0002]
[Prior art]
Current speech recognition systems have reached a technical level at which a recognition rate of about 95% can be achieved when a signal-to-noise ratio (S: speech / N: noise) of 15 dB or more is secured. However, when the SN ratio falls because of ambient noise, the recognition rate drops sharply. FIG. 8 evaluates the relationship between SN ratio and recognition performance for several types of microphones (omnidirectional, unidirectional, narrow-directivity, and AMNOR (Adaptive Microphone-array for Noise Reduction)). The SN-ratio/recognition-rate points generally fall within a band showing an S-shaped characteristic 100. As FIG. 8 makes clear, the recognition rate drops sharply as the SN ratio falls, down to about 50% in an environment where the SN ratio is 0 dB.
[0003]
For this reason, the above deterioration in recognition performance is unavoidable inside a vehicle cabin, where noise generated by the vehicle (engine noise, road noise, pattern noise, wind noise, etc.) is present. This is one of the major problems in mounting a speech recognition system in a vehicle.
In view of these circumstances, various methods have been proposed for receiving speech at a high SN ratio while reducing the influence of ambient noise. A high-SN-ratio sound receiving system using a plurality of microphones and digital signal processing is one example. The simplest configuration of such a system uses two microphones, as shown in FIG. 9. More advanced systems, such as the Griffiths-Jim array and AMNOR, have also been proposed.
[0004]
In FIG. 9, 1 and 2 are the first and second microphones, and 3 is an adaptive signal processing unit. The error signal e is input to unit 3, and the output signal x₂ of microphone 2 is input as a reference signal. Unit 3 performs adaptive signal processing based on the LMS (Least Mean Square) algorithm so that the power of the error signal e is minimized. In the adaptive signal processing unit 3, 3a is an LMS calculation unit and 3b is an adaptive filter with, for example, an FIR digital filter configuration. Through adaptive signal processing, the LMS calculation unit 3a determines the coefficients of the adaptive filter 3b so that the power of the error signal e is minimized.
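The LMS loop just described can be sketched minimally in Python with NumPy; the patent names no implementation, so the language is an assumption. `lms_filter`, the step size `mu` and the tap count are hypothetical choices. The adaptive filter output y₂ is the inner product of the weights with the most recent reference samples, and e is the target minus y₂. The weights then take a stochastic-gradient step that reduces e².

```python
import numpy as np


def lms_filter(target, reference, taps, mu):
    """LMS adaptive FIR (hypothetical helper): adapt w so that w * reference tracks target.

    target    -- target response y1 (already delayed by unit 4)
    reference -- reference signal x2
    Returns the final weights w and the error signal e = y1 - y2.
    """
    w = np.zeros(taps)
    buf = np.zeros(taps)               # newest reference sample at index 0
    e = np.zeros(len(target))
    for n in range(len(target)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        y2 = w @ buf                   # adaptive filter 3b output
        e[n] = target[n] - y2          # subtraction unit 5
        w += mu * e[n] * buf           # LMS update: stochastic-gradient step reducing e^2
    return w, e


# Usage: identify a hypothetical 3-tap path driven by white noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
h = np.array([0.5, -0.3, 0.2])
w, e = lms_filter(np.convolve(x, h)[: len(x)], x, taps=8, mu=0.01)
```

With noise-free data and a step size well inside the stability range, the weights settle on the unknown path and the error power falls toward zero.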
[0005]
Reference numeral 4 denotes a target response setting unit. It receives the signal output from microphone 1 as the target signal and is used to approximate the inverse characteristic of the acoustic system accurately. Let d be the signal delay time (modeling delay) equal to half the tap length of the adaptive filter 3b. The target response setting unit 4 then has a delay characteristic of time d and a flat (gain 1) characteristic over the audio frequency band. That is, it has a flat frequency characteristic with a gain of 1, as shown in FIG. 10A, and an impulse response with a delay time d, as shown in FIG. 10B. The unit can be realized as an FIR digital filter whose coefficient corresponding to delay time d is set to 1 and whose other coefficients are set to 0.
5 is a subtraction unit. It subtracts the output signal y₂ of the adaptive filter 3b from the target response y₁ output from the target response setting unit 4 and outputs the error signal e.
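The target response setting unit 4 amounts to a single-tap delay line. The hypothetical helper below, `target_response_filter`, builds that FIR: coefficient 1 at index d = taps // 2 and zeros elsewhere. It then checks that the filter delays a toy signal by d samples with unit gain at every frequency. Python with NumPy and the 8-tap size are illustrative assumptions.

```python
import numpy as np


def target_response_filter(adaptive_taps):
    """FIR of unit 4: coefficient 1 at the modeling delay d, 0 elsewhere."""
    d = adaptive_taps // 2             # half the tap length of adaptive filter 3b
    h = np.zeros(d + 1)
    h[d] = 1.0
    return h, d


h, d = target_response_filter(8)       # hypothetical 8-tap adaptive filter -> d = 4
x1 = np.arange(10, dtype=float)        # toy target signal
y1 = np.convolve(x1, h)[: len(x1)]     # target response y1 = x1 delayed by d samples
gain = np.abs(np.fft.rfft(h, 256))     # magnitude response: flat, gain 1
```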
[0006]
When speech recognition is not being performed, only noise enters microphones 1 and 2. The adaptive signal processing unit 3 then determines the filter coefficients W by adaptive signal processing so that the power of the error signal e, that is, the noise output, is minimized. During speech recognition, on the other hand, the adaptive signal processing unit 3 does not update the filter coefficients. Instead, it sets the filter coefficients W determined during non-recognition periods in the adaptive filter 3b and outputs the speech signal.
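The switch between adapting during non-recognition periods and freezing W during recognition can be sketched as below. The function name `run_with_freeze` and the per-sample `speech_active` flag are hypothetical. The target path applies the modeling delay d = taps // 2 of unit 4, and the subtractor output e is returned in both modes.

```python
import numpy as np


def run_with_freeze(x1, x2, speech_active, taps=16, mu=0.01):
    """Adapt W only while speech_active[n] is False; hold W fixed otherwise."""
    d = taps // 2
    y1 = np.concatenate([np.zeros(d), x1])[: len(x1)]   # unit 4: delay by d
    w = np.zeros(taps)
    buf = np.zeros(taps)
    e = np.zeros(len(x1))
    for n in range(len(x1)):
        buf = np.roll(buf, 1)
        buf[0] = x2[n]
        e[n] = y1[n] - w @ buf                          # subtraction unit 5
        if not speech_active[n]:                        # non-recognition: update W
            w += mu * e[n] * buf
    return e, w


# Usage: identical noise at both mics; a tone standing in for speech
# reaches mic 1 only, from sample 4000 onward (hypothetical scenario).
rng = np.random.default_rng(0)
N, switch, taps = 6000, 4000, 16
noise = rng.standard_normal(N)
speech = np.zeros(N)
speech[switch:] = 0.5 * np.sin(0.05 * np.arange(N - switch))
active = np.arange(N) >= switch
e, w = run_with_freeze(noise + speech, noise, active, taps)
```

Once speech is flagged, the weights stop moving. The speech component at microphone 1 is therefore not learned as noise, and it appears at the output delayed by d.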
[0007]
The ideal performance originally required of the system shown in FIG. 9 is to minimize the noise output during speech recognition. That is, with the noise output $E_n(z)$ given by

$$E_n(z) = X_{n1}(z)\,z^{-d} - X_{n2}(z)\,W(z) \qquad (1)$$

the adjustable parameter W (the coefficients of the adaptive filter 3b) should be determined so that $\{E_n(z)\}^2$ is minimized.
[0008]
Here, $X_{n1}(z)$ and $X_{n2}(z)$ are the noise components contained in the output signals of microphones 1 and 2. Take the case of a single noise source (noise $x_n$) as an example, and let CN1 and CN2 be the propagation characteristics from the noise source to the first and second microphones 1 and 2. Then

$$X_{n1}(z) = \mathrm{CN1}\cdot x_n, \qquad X_{n2}(z) = \mathrm{CN2}\cdot x_n$$

and equation (1) becomes

$$E_n(z) = \left(\mathrm{CN1}\cdot z^{-d} - \mathrm{CN2}\cdot W(z)\right)x_n \qquad (2)$$
[0009]
From the above, when there is one noise source, the filter coefficient W(z) is ideally

$$W(z) = \frac{\mathrm{CN1}\cdot z^{-d}}{\mathrm{CN2}} \qquad (3)$$
During speech recognition, as noted above, the adaptive signal processing unit 3 does not update the filter coefficients. Instead, it sets the filter coefficients W(z) determined during non-recognition periods in the adaptive filter 3b and outputs the speech signal.
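Equation (3) can be checked numerically. The sketch below computes a truncated impulse response of W(z) = CN1·z^-d/CN2 by polynomial long division, for short, hypothetical FIR noise paths CN1 and CN2. It then confirms that CN2·W reproduces CN1·z^-d up to the truncation length. The helper `ideal_w` assumes that cn2[0] is nonzero and that 1/CN2 is stable (CN2 minimum phase); otherwise the recursion grows without bound.

```python
import numpy as np


def ideal_w(cn1, cn2, d, taps):
    """First `taps` samples of W(z) = CN1(z) z^-d / CN2(z), by long division.

    Assumes cn2[0] != 0 and a stable inverse 1/CN2(z) (CN2 minimum phase).
    """
    delayed = np.concatenate([np.zeros(d), cn1])        # CN1(z) z^-d
    w = np.zeros(taps)
    for n in range(taps):
        # CN2 * W = CN1 z^-d  =>  solve for w[n] given w[0..n-1]
        acc = delayed[n] if n < len(delayed) else 0.0
        for k in range(1, min(len(cn2), n + 1)):
            acc -= cn2[k] * w[n - k]
        w[n] = acc / cn2[0]
    return w


cn1 = np.array([1.0, 0.3])     # hypothetical noise path to mic 1
cn2 = np.array([1.0, 0.5])     # hypothetical noise path to mic 2
w = ideal_w(cn1, cn2, d=4, taps=32)
residual = np.convolve(cn2, w)[:32] - np.concatenate([np.zeros(4), cn1, np.zeros(26)])
```

Even for these two-tap paths, the ideal W has an infinitely long, decaying tail, which a finite-tap adaptive filter 3b can only approximate. How well it does so depends on CN1 and CN2, which is one way to see the dependence on noise conditions discussed in the following section.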
[0010]
[Problems to be solved by the invention]
Let CS1 and CS2 be the propagation characteristics from the mouth of the speaker (the driver) to microphones 1 and 2. CS1 and CS2 are almost constant, but the propagation characteristics CN1 and CN2 from the noise source to microphones 1 and 2 are not. The noise generated by a vehicle (engine noise, road noise, pattern noise, wind noise, etc.) is highly varied, and the noise sound field changes greatly with driving conditions, the driving environment and so on. Moreover, which of the first and second microphone outputs serves as the target signal, and which as the reference signal, is fixed. Consequently, depending on the noise conditions, the adaptive filter W cannot properly simulate $(\mathrm{CN1}/\mathrm{CN2})\cdot z^{-d}$, and the improvement in SN ratio becomes small.
Accordingly, an object of the present invention is to provide a microphone system that achieves a large improvement in SN ratio regardless of the noise-source environment.
[0011]
[Means for Solving the Problems]
According to the first aspect of the present invention, the above problem is solved as follows.

(1) When speech recognition is not being performed:
① adaptive signal processing is performed with the output of the first microphone as the target signal and the output of the second microphone as the reference signal, and the noise reduction amount is obtained;
② adaptive signal processing is then performed with the output of the second microphone as the target signal and the output of the first microphone as the reference signal, and the noise reduction amount is obtained;
③ the microphone output selection state giving the larger noise reduction amount, and the filter coefficients at that time, are saved;
④ thereafter, this saving process based on comparing the noise reduction amounts is repeated.

(2) For speech recognition, each microphone output is assigned as the target signal or the reference signal according to the saved selection state, and the saved filter coefficients are set in the adaptive filter.

With this configuration, the propagation characteristics from the noise source to microphones 1 and 2 may change with the noise conditions. Even so, the microphone outputs can be assigned as target and reference signals so that the noise reduction amount becomes large, and the SN ratio can therefore be improved effectively.
[0012]
According to the second aspect of the present invention, the above problem is solved as follows.

(1) When speech recognition is not being performed:
① adaptive signal processing is performed with the output of the first microphone as the target signal and the output of the second microphone as the reference signal, and the resulting output signal is taken as the noise signal N1;
② adaptive signal processing is then performed with the outputs of the first and second propagation characteristic setting means, in place of the first and second microphone outputs, as the target and reference signals, and the resulting output signal is taken as the speech signal S1;
③ the SN ratio is calculated from this noise signal and speech signal;
④ adaptive signal processing is then performed with the output of the second microphone as the target signal and the output of the first microphone as the reference signal, and the resulting output signal is taken as the noise signal N2;
⑤ adaptive signal processing is then performed with the outputs of the second and first propagation characteristic setting means, in place of the second and first microphone outputs, as the target and reference signals, and the resulting output signal is taken as the speech signal S2;
⑥ the SN ratio is calculated from this noise signal and speech signal;
⑦ the microphone output selection state giving the larger SN ratio, and the filter coefficients at that time, are saved;
⑧ thereafter, this saving process based on comparing the SN ratios is repeated.

(2) For speech recognition, each microphone output is assigned as the target signal or the reference signal according to the saved selection state, and the saved filter coefficients are set in the adaptive filter.

With this configuration, the propagation characteristics from the noise source to microphones 1 and 2 may change with the noise conditions. Even so, the microphone outputs can be assigned as target and reference signals so that the SN ratio becomes large, and the improvement in SN ratio is therefore large.
[0013]
DETAILED DESCRIPTION OF THE INVENTION
(A) First Embodiment
FIG. 1 is a block diagram of a microphone system (noise reduction system) according to a first embodiment of the present invention. Components identical to those of the conventional example of FIG. 9 are given the same reference numerals.
In the figure, 1 and 2 are the first and second microphones, and 3 is an adaptive signal processing unit. The error signal e is input to unit 3, and the output signal of microphone 1 or microphone 2, as appropriate, is input as the reference signal x₂. Unit 3 performs adaptive signal processing based on the LMS (Least Mean Square) algorithm so that the power of the error signal e is minimized. In the adaptive signal processing unit 3, 3a is an LMS calculation unit and 3b is an adaptive filter with, for example, an FIR digital filter configuration. Through adaptive signal processing, the LMS calculation unit 3a determines the coefficients of the adaptive filter 3b so that the power of the error signal e is minimized.
[0014]
Reference numeral 4 denotes a target response setting unit. It receives the signal output from microphone 1 or microphone 2 as the target signal x₁ and is used to approximate the inverse characteristic of the acoustic system accurately. Let d be the signal delay time (modeling delay) equal to half the tap length of the adaptive filter 3b. The target response setting unit 4 then has a delay characteristic of time d and a flat (gain 1) characteristic over the audio frequency band. 5 is a subtraction unit. It subtracts the output signal y₂ of the adaptive filter 3b from the target response y₁ output from the target response setting unit 4 and outputs the error signal e. During speech recognition, this error signal e serves as the speech signal and is input to a speech recognition processing unit (not shown).
[0015]
11 is a switch unit with two switches, 11a and 11b. It selectively routes the outputs of the first and second microphones 1 and 2 as the target signal x₁ and the reference signal x₂. Reference numeral 21 denotes a memory that stores:
① the microphone output selection states, together with the corresponding noise reduction amounts NR1, NR2 and filter coefficients W1, W2;
② the selection state with the larger noise reduction amount, and the filter coefficients W at that time.
Reference numeral 31 denotes a processing unit. When speech recognition is not being performed, it determines the microphone output selection state that gives the larger noise reduction amount and the filter coefficients W at that time. For speech recognition, it uses each microphone output as the target or reference signal according to the selection state determined beforehand, and sets the filter coefficients W determined beforehand in the adaptive filter 3b.
[0016]
FIG. 2 is a flowchart of the target/reference signal determination process and the real-time filter coefficient update process of the first embodiment.
When speech recognition is not being performed, the processing unit 31 controls the switch unit 11 to select the output of microphone 1 as the target signal x₁ and the output of microphone 2 as the reference signal x₂ (step 101). The adaptive signal processing unit 3 performs adaptive signal processing so that the power of the error signal e is minimized (step 102). When the error signal e has converged (step 103), the processing unit 31 calculates the noise reduction amount NR1, the difference between the power of the target response y₁ output from the target response setting unit 4 and the power of the error signal e. It stores NR1 and the adaptive filter coefficients W1 at that time in the memory 21 (step 104).
Next, the processing unit 31 controls the switch unit 11 to select the output of microphone 2 as the target signal x₁ and the output of microphone 1 as the reference signal x₂ (step 105). The adaptive signal processing unit 3 performs adaptive signal processing so that the power of the error signal e is minimized (step 106). When the error signal e has converged (step 107), the processing unit 31 calculates the noise reduction amount NR2, the difference between the power of the target response y₁ and the power of the error signal e. It stores NR2 and the adaptive filter coefficients W2 at that time in the memory 21 (step 108).
[0017]
The processing unit 31 then compares the noise reduction amounts NR1 and NR2 (step 109). If NR1 > NR2, it stores in the memory 21 the selection state in which the output of microphone 1 is the target signal x₁ and the output of microphone 2 is the reference signal x₂. It also stores the filter coefficients W1 as W (W = W1) in the memory 21 (step 110).
If NR1 ≤ NR2, it stores in the memory 21 the selection state in which the output of microphone 2 is the target signal x₁ and the output of microphone 1 is the reference signal x₂. It also stores the filter coefficients W2 as W (W = W2) in the memory 21 (step 111).
The processing then returns to the start and repeats. The memory 21 therefore always holds the latest microphone selection state with the larger noise reduction amount, together with the filter coefficients at that time.
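Steps 101 to 111 can be sketched as a single selection pass. Everything here is an illustrative assumption:
- Python with NumPy is used as the language;
- NR is taken as the dB ratio of the power of y₁ to the power of e;
- "convergence" is approximated by measuring over the second half of the block;
- the names `lms_run`, `noise_reduction_db` and `select_by_noise_reduction` are hypothetical.

Ties go to microphone 2, matching NR1 ≤ NR2 in step 111.

```python
import numpy as np


def lms_run(target, reference, taps, mu):
    """Units 3, 4 and 5 for one target/reference assignment (LMS, delay d = taps // 2)."""
    d = taps // 2
    y1 = np.concatenate([np.zeros(d), target])[: len(target)]   # target response y1
    w, buf, e = np.zeros(taps), np.zeros(taps), np.zeros(len(target))
    for n in range(len(target)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        e[n] = y1[n] - w @ buf
        w += mu * e[n] * buf
    return y1, e, w


def noise_reduction_db(y1, e, eps=1e-12):
    """NR = power(y1) - power(e) in dB, over the second half (assumed converged)."""
    h = len(e) // 2
    return 10 * np.log10((np.mean(y1[h:] ** 2) + eps) / (np.mean(e[h:] ** 2) + eps))


def select_by_noise_reduction(x1, x2, taps=16, mu=0.01):
    """One pass of steps 101-111; returns the state to save in memory 21."""
    y1, e, w1 = lms_run(x1, x2, taps, mu)    # steps 101-104: mic 1 target, mic 2 reference
    nr1 = noise_reduction_db(y1, e)
    y1, e, w2 = lms_run(x2, x1, taps, mu)    # steps 105-108: mic 2 target, mic 1 reference
    nr2 = noise_reduction_db(y1, e)
    if nr1 > nr2:                             # steps 109-110
        return {"target_mic": 1, "W": w1, "NR1": nr1, "NR2": nr2}
    return {"target_mic": 2, "W": w2, "NR1": nr1, "NR2": nr2}   # step 111


# Usage with hypothetical noise paths CN1 = 1 and CN2 = 1 - 0.95 z^-1.
rng = np.random.default_rng(1)
noise = rng.standard_normal(8000)
x1 = noise
x2 = np.convolve(noise, [1.0, -0.95])[:8000]
state = select_by_noise_reduction(x1, x2)
```

In this hypothetical setup, making microphone 2 the target means the ideal W = CN2·z^-d/CN1 is a two-tap FIR that the filter can match exactly. The opposite assignment needs the long decaying tail of 1/CN2. The pass is therefore expected to choose microphone 2 as the target.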
[0018]
FIG. 3 is a flowchart of a microphone output selection and filter coefficient setting process during speech recognition in the first embodiment.
In in-vehicle navigation and similar systems, a user who gives instructions by voice first operates a talk switch or the like and then speaks. The processing unit 31 therefore monitors, for example, whether the talk switch has been turned on and the system has entered the speech recognition state (step 201). When the speech recognition state is entered, the processing unit 31 stops the target/reference signal determination process and the adaptive filter coefficient update process of FIG. 2 (step 202).
[0019]
Next, based on the microphone selection state stored in the memory 21, the processing unit 31 sets the switch unit 11 so that each microphone output is used as the target signal x₁ or the reference signal x₂. It also sets the filter coefficients W determined during non-recognition periods in the adaptive filter 3b (step 203).
In this state, when speech is input, a speech signal with attenuated noise is output from the subtraction unit 5 and input to the speech recognition processing unit.
Thereafter, the processing unit 31 monitors whether the speech recognition processing has ended (step 204). When it has, the processing unit 31 resumes the target/reference signal determination process and the filter coefficient update process of FIG. 2 (step 205).
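The FIG. 3 flow (steps 201 to 205) wraps the stored state in a simple mode switch. The class below is a hypothetical sketch in Python with NumPy. `RecognitionFrontEnd`, its method names, and raising an error when processing is requested outside recognition mode are illustrative choices, not taken from the patent. In recognition mode, the class routes the microphones according to the stored selection, applies the frozen W as a fixed FIR, and returns e for the recognizer.

```python
import numpy as np


class RecognitionFrontEnd:
    """Hypothetical sketch of FIG. 3 around the state saved in memory 21."""

    def __init__(self, target_mic, w):
        self.target_mic = target_mic            # saved selection: 1 or 2
        self.w = np.asarray(w, dtype=float)     # saved filter coefficients W
        self.updating = True                    # FIG. 2 loop is running

    def talk_switch_on(self):
        """Steps 201-202: recognition starts; stop selection and coefficient updates."""
        self.updating = False

    def recognition_done(self):
        """Steps 204-205: recognition has ended; resume the FIG. 2 loop."""
        self.updating = True

    def process(self, mic1, mic2):
        """Step 203: route mics per saved state, filter with frozen W, return e."""
        if self.updating:
            raise RuntimeError("not in speech recognition state")
        x1, x2 = (mic1, mic2) if self.target_mic == 1 else (mic2, mic1)
        d = len(self.w) // 2                    # modeling delay of unit 4
        y1 = np.concatenate([np.zeros(d), x1])[: len(x1)]
        y2 = np.convolve(x2, self.w)[: len(x2)] # adaptive filter 3b, coefficients fixed
        return y1 - y2                          # subtraction unit 5 -> recognizer


w_stored = np.zeros(16)
w_stored[8], w_stored[9] = 1.0, -0.95          # hypothetical W saved with mic 2 as target
fe = RecognitionFrontEnd(target_mic=2, w=w_stored)
```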
[0020]
(B) Second Embodiment
FIG. 4 is a block diagram of a microphone system (noise reduction system) according to a second embodiment of the present invention. Components identical to those of the first embodiment of FIG. 1 are given the same reference numerals. The first embodiment selects the microphone outputs and sets the filter coefficients according to which noise reduction amount is larger. The second embodiment does so according to which SN ratio is larger.
[0021]
The microphone system of FIG. 4 differs from that of the first embodiment of FIG. 1 in the following respects:
(1) a pseudo-speech output unit 41 that generates pseudo speech (for example, white noise) is provided;
(2) propagation characteristic setting units 51 and 52 are provided, which simulate the propagation characteristics CS1 and CS2 (see FIG. 5) from the speaker's mouth to microphones 1 and 2;
(3) a switch unit 61 is provided, which selectively switches between the outputs of microphones 1 and 2 and the outputs of the first and second propagation characteristic setting units 51 and 52;
(4) the processing unit 31 calculates two SN ratios and stores in the memory 21 the microphone selection state and filter coefficients W with the larger one:
① the SN ratio (= S1/N1) when the output of microphone 1 is the target signal x₁ and the output of microphone 2 is the reference signal x₂;
② the SN ratio (= S2/N2) when the output of microphone 2 is the target signal x₁ and the output of microphone 1 is the reference signal x₂.
[0022]
FIGS. 6 and 7 are flowcharts of the target/reference signal determination process and the real-time filter coefficient update process of the second embodiment.
When speech recognition is not being performed, the processing unit 31 sets the switch units 11 and 61 (solid-line state in the figure) to select the output of microphone 1 as the target signal x₁ and the output of microphone 2 as the reference signal x₂ (step 301). The adaptive signal processing unit 3 performs adaptive signal processing so that the power of the error signal e is minimized (step 302). When the error signal e has converged (step 303), the processing unit 31 stores the power of the error signal e (= e²) as the noise output N1 (step 304).
Next, updating of the filter coefficients W1 is stopped, and the switch unit 61 is controlled as follows. The simulated speech signal output from the first propagation characteristic setting unit 51 is input to the target response setting unit 4, and the simulated speech signal output from the propagation characteristic setting unit 52 is input to the adaptive filter 3b (step 305). In this state, the power of the error signal e (= e²) is stored as the speech signal output S1 (step 306). The SN ratio (= S1/N1) and the adaptive filter coefficients W1 at that time are stored in the memory 21 (step 307).
[0023]
The switch units 11 and 61 are then set to select the output of microphone 2 as the target signal x₁ and the output of microphone 1 as the reference signal x₂ (step 308). The adaptive signal processing unit 3 performs adaptive signal processing so that the power of the error signal e is minimized (step 309). When the error signal e has converged (step 310), the processing unit 31 stores the power of the error signal e (= e²) as the noise output N2 (step 311).
Next, updating of the filter coefficients W2 is stopped, and the switch unit 61 is controlled as follows. The simulated speech signal output from the second propagation characteristic setting unit 52 is input to the target response setting unit 4, and the simulated speech signal output from the propagation characteristic setting unit 51 is input to the adaptive filter 3b (step 312). In this state, the power of the error signal e (= e²) is stored as the speech signal output S2 (step 313). The SN ratio (= S2/N2) and the adaptive filter coefficients W2 at that time are stored in the memory 21 (step 314).
[0024]
Once the SN ratios S1/N1 and S2/N2 have been obtained in this way, the processing unit 31 compares them (step 315). If S1/N1 > S2/N2, it stores in the memory 21 the selection state in which the output of microphone 1 is the target signal and the output of microphone 2 is the reference signal. It also stores the filter coefficients W1 as W (W = W1) in the memory 21 (step 316).
If S1/N1 ≤ S2/N2, it stores in the memory 21 the selection state in which the output of microphone 2 is the target signal and the output of microphone 1 is the reference signal. It also stores the filter coefficients W2 as W (W = W2) in the memory 21 (step 317).
The processing then returns to the start and repeats, saving the latest microphone selection state with the larger SN ratio and the filter coefficients at that time.
When the speech recognition state is entered, the microphone output selection process and the filter coefficient setting process are executed according to the same processing flow as in the first embodiment (FIG. 3).
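The SN-ratio-based pass of steps 301 to 317 can be sketched in the same style. Assumptions:
- Python with NumPy is used as the language;
- pseudo speech is white noise, as the description suggests;
- `cs1` and `cs2` are short, hypothetical FIR models for units 51 and 52;
- N is measured over the second half of the noise block, and S is measured with W frozen;
- a small `eps` avoids division by zero.

All function names are hypothetical.

```python
import numpy as np


def adapt(target, reference, taps, mu):
    """LMS adaptation with the unit-4 delay d = taps // 2; returns W and e."""
    d = taps // 2
    y1 = np.concatenate([np.zeros(d), target])[: len(target)]
    w, buf, e = np.zeros(taps), np.zeros(taps), np.zeros(len(target))
    for n in range(len(target)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        e[n] = y1[n] - w @ buf
        w += mu * e[n] * buf
    return w, e


def frozen_error(target, reference, w):
    """Error signal e with W held fixed (no coefficient update)."""
    d = len(w) // 2
    y1 = np.concatenate([np.zeros(d), target])[: len(target)]
    return y1 - np.convolve(reference, w)[: len(reference)]


def select_by_snr(x1, x2, cs1, cs2, pseudo, taps=16, mu=0.01, eps=1e-12):
    """One pass of steps 301-317; returns the state to save in memory 21."""
    s1 = np.convolve(pseudo, cs1)[: len(pseudo)]    # unit 51: models CS1
    s2 = np.convolve(pseudo, cs2)[: len(pseudo)]    # unit 52: models CS2
    h = len(x1) // 2
    snrs, weights = [], []
    for tgt, ref, s_tgt, s_ref in ((x1, x2, s1, s2), (x2, x1, s2, s1)):
        w, e = adapt(tgt, ref, taps, mu)                      # steps 301-303 / 308-310
        n_pow = np.mean(e[h:] ** 2)                           # N1 / N2 (steps 304 / 311)
        s_pow = np.mean(frozen_error(s_tgt, s_ref, w) ** 2)   # S1 / S2 with W frozen
        snrs.append((s_pow + eps) / (n_pow + eps))
        weights.append(w)
    if snrs[0] > snrs[1]:                                     # steps 315-316
        return {"target_mic": 1, "W": weights[0], "SNR1": snrs[0], "SNR2": snrs[1]}
    return {"target_mic": 2, "W": weights[1], "SNR1": snrs[0], "SNR2": snrs[1]}  # step 317


rng = np.random.default_rng(3)
noise = rng.standard_normal(8000)
x1 = noise                                           # hypothetical CN1 = 1
x2 = np.convolve(noise, [1.0, -0.95])[:8000]         # hypothetical CN2 = 1 - 0.95 z^-1
pseudo = rng.standard_normal(4000)                   # white-noise pseudo speech (unit 41)
state = select_by_snr(x1, x2, cs1=[1.0], cs2=[0.8], pseudo=pseudo)
```

This toy example deliberately lets the microphone-2-as-target assignment cancel the noise almost perfectly, so its SN ratio dominates. With realistic paths, both ratios would be finite and the comparison less lopsided.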
Although the present invention has been described with reference to embodiments, various modifications are possible in accordance with the gist of the invention as set forth in the claims, and the present invention does not exclude them.
[0025]
[Effects of the Invention]
As described above, according to the present invention, the microphone outputs are assigned as target and reference signals so that the noise reduction amount becomes large. This holds even if the propagation characteristics from the noise source to each microphone change with the noise conditions, and the SN ratio can therefore be improved effectively.
Also according to the present invention, even if the propagation characteristics from the noise source to each microphone change with the noise conditions, the SN ratio is calculated and the microphone outputs are assigned as target and reference signals so that the SN ratio becomes large. The SN ratio can therefore be improved reliably, and the improvement is large.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of a microphone system (noise reduction system) according to a first embodiment of the present invention.
FIG. 2 is a processing flow for determining the target signal, the reference signal, and the filter coefficient in real time according to the first embodiment.
FIG. 3 is a processing flow for selecting a microphone output and setting a filter coefficient during speech recognition according to the first embodiment.
FIG. 4 is a configuration diagram of a microphone system according to a second embodiment of the present invention.
FIG. 5 is an explanatory diagram of propagation characteristics from a speaker's mouth to each microphone.
FIG. 6 is a flowchart (part 1) of the target signal/reference signal determination process and the filter coefficient real-time update process according to the second embodiment.
FIG. 7 is a flowchart (part 2) of the target signal/reference signal determination process and the filter coefficient real-time update process according to the second embodiment.
FIG. 8 is a diagram showing the relationship between the SN ratio and the recognition rate.
FIG. 9 shows a conventional high-SN-ratio sound receiving system using two microphones.
FIG. 10 is a characteristic diagram of a target response setting unit.
[Explanation of symbols]
1, 2: First and second microphones
3: Adaptive signal processing unit
3a: LMS operation unit
3b: Adaptive filter
4: Target response setting unit
5: Subtraction unit
11: Switch unit
21: Memory
31: Processing unit

Claims

In a microphone system that comprises first and second microphones and an adaptive signal processing unit which, during non-speech recognition, performs adaptive signal processing using the signal output from one microphone as a target signal and the signal output from the other microphone as a reference signal to determine the coefficients of an adaptive filter, and which sets the determined filter coefficients in the adaptive filter during speech recognition to improve the SN ratio of a speaker's voice signal,
Switching means for selectively assigning each of the first and second microphone outputs as the target signal or the reference signal;
Storage means for storing the microphone output selection state giving the larger amount of noise reduction, together with the filter coefficient at that time;
A processing unit that, during non-speech recognition, performs adaptive signal processing with the output of the first microphone as the target signal and the output of the second microphone as the reference signal to obtain a noise reduction amount, then performs adaptive signal processing with the output of the second microphone as the target signal and the output of the first microphone as the reference signal to obtain a noise reduction amount, stores the microphone output selection state giving the larger noise reduction amount together with the filter coefficient at that time, and thereafter repeats this storage process based on the magnitude of the noise reduction amount; and that, during speech recognition, assigns the output of each microphone as the target signal or the reference signal according to the stored microphone output selection state and sets the stored filter coefficient in the adaptive filter,
the microphone system being characterized by comprising the above.

In a microphone system that comprises first and second microphones and an adaptive signal processing unit which, during non-speech recognition, performs adaptive signal processing using the signal output from one microphone as a target signal and the signal output from the other microphone as a reference signal to determine the coefficients of an adaptive filter, and which sets the determined filter coefficients in the adaptive filter during speech recognition to improve the SN ratio of a speaker's voice signal,
First and second propagation characteristic setting means for simulating the propagation characteristics from the speaker's mouth to the respective microphones;
Simulated voice generating means for inputting a simulated voice to each propagation characteristic setting means;
First switching means for selectively assigning each of the first and second microphone outputs as the target signal or the reference signal;
Second switching means for selectively switching between the outputs of the first and second microphones and the outputs of the first and second propagation characteristic setting means;
Storage means for storing the microphone output selection state giving the larger SN ratio, together with the filter coefficient at that time;
A processing unit that, during non-speech recognition, takes as a noise signal N1 the output signal obtained when adaptive signal processing is performed with the output of the first microphone as the target signal and the output of the second microphone as the reference signal; takes as a voice signal S1 the output signal obtained when adaptive signal processing is performed with the outputs of the first and second propagation characteristic setting means, instead of the first and second microphone outputs, as the target signal and the reference signal; calculates the SN ratio from the noise signal N1 and the voice signal S1; then takes as a noise signal N2 the output signal obtained when adaptive signal processing is performed with the output of the second microphone as the target signal and the output of the first microphone as the reference signal; takes as a voice signal S2 the output signal obtained when adaptive signal processing is performed with the outputs of the second and first propagation characteristic setting means, instead of the second and first microphone outputs, as the target signal and the reference signal; calculates the SN ratio from the noise signal N2 and the voice signal S2; stores the microphone output selection state giving the larger SN ratio together with the filter coefficient at that time; and thereafter repeats this storage process based on the SN ratio; and that, during speech recognition, assigns the output of each microphone as the target signal or the reference signal according to the stored microphone output selection state and sets the stored filter coefficient in the adaptive filter,
the microphone system being characterized by comprising the above.