JP2008070878A

JP2008070878A - Voice signal pre-processing device, voice signal processing device, voice signal pre-processing method and program for voice signal pre-processing

Info

Publication number: JP2008070878A
Application number: JP2007236466A
Authority: JP
Inventors: Abderrahman Essebbar; Tristan Poinsard; Michel Gaeta
Original assignee: Aisin Seiki Co Ltd
Current assignee: Aisin Corp
Priority date: 2006-09-15
Filing date: 2007-09-12
Publication date: 2008-03-27
Also published as: FR2906070B1; FR2906070A1

Abstract

PROBLEM TO BE SOLVED: To effectively reduce external noise from various vibration sources.
SOLUTION: A voice signal pre-processing device 10 processes an electrical signal containing speech to reduce its noise components. It includes a first filter 20 and a second filter 22 for noise reduction. One of the first filter 20 and the second filter 22 is a coherent filter that reduces noise by coherent filtering, and the other is a non-coherent filter that reduces noise by non-coherent filtering. The second filter 22 filters the output of the first filter 20.
COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to the following:
- an audio signal preprocessing device that preprocesses a voice-containing signal to reduce the external noise components it contains;
- an audio signal processing device that includes this preprocessing device;
- an audio signal preprocessing method;
- a program for audio signal preprocessing.

The invention is particularly suited to processing signals input to a speech recognition system or a telephone (for example, a hands-free telephone). It is further suited to vehicle environments, where a relatively large amount of external noise interferes with speech recognition and degrades its accuracy and reliability.

There is broad interest in using voice to operate in-vehicle electrical equipment hands-free, and in using hands-free telephones in the passenger compartment. Electronic speech recognition techniques are used to recognize voice commands and other spoken information picked up by a microphone. However, these techniques are suited to relatively “clean” voice signals that contain no external noise.

In a vehicle, many external noise sources severely degrade the quality of the captured audio signal, making speech recognition difficult or less reliable. These sources include:
- road noise, engine noise and tire noise;
- wind and rain;
- sound from the radio or music player;
- rattles and vibration inside the vehicle;
- windshield wiper noise;
- transient noise from outside the vehicle.

It is difficult to filter out these external noises, which affect the recognition result, without destroying much of the desired speech signal. Furthermore, to be accepted by vehicle manufacturers and parts suppliers, such techniques must deliver substantial performance at reasonable cost, without adding much new equipment or processing hardware.

The prior art offers various techniques for reducing such noise and improving the quality of the speech component of a voice-containing signal.

One approach uses a microphone array and beamforming to control the directivity of sound pickup. For example, beamforming can steer the pickup pattern toward the driver's direction and position. However, noise in a vehicle compartment rarely comes mainly from one particular direction. This approach alone therefore reduces external noise only slightly, and given the cost of adding multiple microphones it is not very effective.

Another approach uses two microphones:
- a first microphone, oriented to pick up both speech and external noise;
- a second microphone, oriented to pick up mainly external noise.

The second microphone provides a noise reference signal to a noise-cancelling filter, which reduces noise in the voice-containing signal picked up by the first microphone. However, this technique can be unsuitable depending on microphone placement. To limit the amount of speech picked up by the second microphone, it must be placed away from the first microphone. Yet the farther apart the two microphones are, the worse the second microphone performs as a reference for the local (external) noise around the first microphone.

Patent Document 1 discloses an audio signal processing device that receives the electrical drive signal used directly to drive the speakers of an in-vehicle audio system. In Patent Document 1, this drive signal is an accurate proxy for the audio-system sound to be removed from the signal picked up by the microphone (hereinafter, the microphone signal). It is supplied to a noise-cancelling filter as a noise reference signal, so no additional sensor is needed to obtain the reference.

Patent Document 2 discloses a continuous noise cancellation system that divides the microphone signal into a plurality of frequency bands. It operates as follows:
- It determines whether the dominant external noise component is essentially coherent or non-coherent.
- According to that determination, it selectively applies either a coherent noise-cancelling filter or an adaptive non-coherent noise-cancelling filter.
- After each band has been filtered by the most suitable method, it recombines the bands to reconstruct the signal.
- Before the main filtering starts, it detects transient noise that would destabilize the filtering.

The adaptive non-coherent noise-cancelling filter receives a signal from an external non-acoustic sensor, such as a vibration sensor. It performs spectral power filtering using an estimate of the transfer function between the microphone and the non-acoustic sensor.

Patent Document 1: JP-A-2-244099
Patent Document 2: JP 2006-276856 A

The audio signal processing device of Patent Document 1 cannot reduce external noise from the many other vibration sources, such as the engine, road, tires, wind, rain and vibration of the vehicle. The noise cancellation system of Patent Document 2 is also considered insufficient for reducing external noise from such varied vibration sources.

Accordingly, the present invention provides an audio signal preprocessing device, an audio signal processing device, an audio signal preprocessing method, and an audio signal preprocessing program. Each of these can effectively reduce external noise from various vibration sources.

The above problem is solved by the following.

(1) An audio signal preprocessing device having a first filter for noise reduction and a second filter for noise reduction, which processes an electrical audio signal containing speech to reduce noise components, wherein:
- one of the first filter and the second filter is a coherent filter that reduces noise components by coherent filtering;
- the other is a non-coherent filter that reduces noise components by non-coherent filtering;
- the second filter filters the output of the first filter.

(2) The audio signal preprocessing device according to (1), wherein the first filter is the coherent filter and the second filter is the non-coherent filter.

(3) The audio signal preprocessing device according to (1) or (2), wherein the coherent filter is configured to receive a first noise reference signal from a non-acoustic coherent noise sensor, the first noise reference signal being coherent with a noise component contained in the electrical audio signal.

(4) The audio signal preprocessing device according to any one of (1) to (3), wherein the non-coherent filter is configured to receive a second noise reference signal from a non-acoustic non-coherent noise sensor, the second noise reference signal being non-coherent with a noise component contained in the electrical audio signal but related to it in spectral power.

(5) The audio signal preprocessing device according to (3) or (4), wherein the non-acoustic coherent noise sensor is connected non-acoustically to an audio system of a vehicle.

(6) The audio signal preprocessing device according to (4) or (5), wherein the non-acoustic non-coherent noise sensor is configured to sense vibration of a vehicle.

(7) The audio signal preprocessing device according to any one of (1) to (6), wherein the coherent filter includes a linear filter.

(8) The audio signal preprocessing device according to any one of (1) to (7), wherein the coherent filter includes a Wiener filter.

(9) The audio signal preprocessing device according to any one of (3) to (8), wherein the coherent filter includes a filter coefficient calculation unit that repeatedly calculates filter coefficients based on the electrical audio signal and the first noise reference signal.

(10) The audio signal preprocessing device according to any one of (1) to (9), wherein the non-coherent filter includes a nonlinear filter.

(11) The audio signal preprocessing device according to any one of (1) to (10), wherein the non-coherent filter includes a spectral-gain filter.

(12) The audio signal preprocessing device according to any one of (4) to (11), wherein the non-coherent filter further includes a filter coefficient calculation unit that calculates filter coefficients based on the second noise reference signal and on the electrical audio signal input to the non-coherent filter.

(13) The audio signal preprocessing device according to any one of (1) to (12), wherein each of the first filter and the second filter includes a plurality of sub-filters that filter the electrical audio signal in a plurality of corresponding frequency subbands.

(14) The audio signal preprocessing device according to (13), further comprising:
- an input unit that divides the electrical audio signal into the frequency subbands;
- an output unit that recombines the electrical audio signal from the frequency subbands after filtering.

(15) The audio signal preprocessing device according to any one of (1) to (14), wherein the first filter and the second filter process the electrical audio signal as a series of consecutive frames, each frame representing the signal within a window of predetermined duration.

(16) The audio signal preprocessing device according to (15), further comprising:
- an input unit that divides the electrical audio signal into the series of frames;
- an output unit that reassembles the electrical audio signal from the series of frames.

(17) An audio signal processing device comprising:
the audio signal preprocessing device according to any one of (1) to (16);
a microphone that picks up an acoustic signal and supplies the electrical audio signal as input to the audio signal preprocessing device;
the non-acoustic coherent noise sensor, which generates the first noise reference signal, coherent with a first noise source audible to the microphone, and supplies it to the coherent filter; and
the non-acoustic non-coherent noise sensor, which generates the second noise reference signal and supplies it to the non-coherent filter, the second noise reference signal being non-coherent with a second noise source audible to the microphone but related to that source in spectral power.

(18) The audio signal processing device according to (17), wherein the non-acoustic non-coherent noise sensor is a vibration sensor or an accelerometer.

(19) An audio signal preprocessing method for filtering an electrical audio signal containing speech to reduce noise components, comprising:
a first filtering step of applying, with a first filter, either coherent filtering or non-coherent filtering to the electrical audio signal; and
a second filtering step of applying, with a second filter, the other type of filtering to the output of the first filter.

(20) The audio signal preprocessing method according to (19), wherein the first filter is the coherent filter and the second filter is the non-coherent filter.

(21) A program for audio signal preprocessing that causes a computer to execute the method according to (19) or (20).
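Claims (15) and (16) describe two steps:
- splitting the signal into consecutive frames, each covering a window of fixed duration;
- reassembling the signal after filtering.

The code below is a minimal sketch of that input/output pair. It assumes 50%-overlapping frames with a periodic Hann window and overlap-add reassembly. The patent specifies neither the window shape nor the overlap, and `split_frames` and `overlap_add` are hypothetical names.

```python
import math
from typing import List, Sequence


def hann_periodic(n: int) -> List[float]:
    # Periodic Hann window: two copies shifted by n/2 sum to exactly 1.
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def split_frames(x: Sequence[float], frame_len: int) -> List[List[float]]:
    """Input unit: cut x into 50%-overlapping, Hann-windowed frames."""
    if frame_len < 2 or frame_len % 2:
        raise ValueError("frame_len must be an even number >= 2")
    hop = frame_len // 2
    window = hann_periodic(frame_len)
    # Zero-pad both ends so every original sample lies in exactly two frames.
    padded = [0.0] * hop + list(x) + [0.0] * frame_len
    frames = []
    for start in range(0, len(x) + hop, hop):
        segment = padded[start:start + frame_len]
        frames.append([s * w for s, w in zip(segment, window)])
    return frames


def overlap_add(frames: Sequence[Sequence[float]], frame_len: int, n_out: int) -> List[float]:
    """Output unit: overlap-add the (possibly filtered) frames, then drop the padding."""
    hop = frame_len // 2
    out = [0.0] * (len(frames) * hop + frame_len)
    for k, frame in enumerate(frames):
        for j, v in enumerate(frame):
            out[k * hop + j] += v
    return out[hop:hop + n_out]


x = [math.sin(0.3 * i) for i in range(50)]
frames = split_frames(x, 16)  # filters 20 and 22 would process each frame here
y = overlap_add(frames, 16, len(x))
print(max(abs(a - b) for a, b in zip(x, y)) < 1e-12)  # True
```

Because the shifted windows sum to one, unmodified frames reassemble to the original signal. Any per-frame filtering therefore shows up directly in the output.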

The audio signal preprocessing device of the present invention is preferably implemented in an integrated circuit. It is preferably applied to a vehicle to reduce external vehicle noise. It preferably further includes a voice recognition unit that receives the output of the audio signal preprocessing device.

According to the audio signal preprocessing device of claim 1, the signal is filtered in sequence by both the coherent filter and the non-coherent filter. A wide variety of external noise components can therefore be filtered effectively.

According to the audio signal preprocessing device of claim 2, coherent filtering is performed before non-coherent filtering. This allows the coherent components to be reduced optimally. It also prevents the non-coherent filtering from being biased by components that the coherent filtering can remove or at least reduce.

According to the audio signal preprocessing devices of claims 3 to 6, the noise reference signals are acquired non-acoustically, so no voice activity detector is needed.

According to the audio signal processing device of claim 17, the signal is filtered in sequence by both the coherent filter and the non-coherent filter. A wide variety of external noise components can therefore be filtered effectively.

According to the audio signal preprocessing method of claim 19, the signal is filtered in sequence by both the coherent filter and the non-coherent filter. A wide variety of external noise components can therefore be filtered effectively.

According to the audio signal preprocessing method of claim 20, coherent filtering is performed before non-coherent filtering. This allows the coherent components to be reduced optimally. It also prevents the non-coherent filtering from being biased by components that the coherent filtering can remove or at least reduce.

According to the audio signal preprocessing program of claim 21, the signal is filtered in sequence by both the coherent filter and the non-coherent filter. A wide variety of external noise components can therefore be filtered effectively.

First, an outline of the present invention will be described.

Embodiment 1 of the present invention relates to the effective filtering of multiple types of noise.

The audio signal preprocessing device (audio signal processing device) of the present invention provides filtering with a first filter 20 and a second filter 22 connected in series. One of them is a coherent filter that reduces coherent external noise components. The other is a non-coherent filter that reduces non-coherent external noise components. With this serial combination, the audio signal is filtered in turn by both filters, so a wide variety of external noise components can be filtered effectively. The arrangement is particularly suitable for use in vehicles.

In one form, the first filter 20 is the coherent filter and the second filter 22 is the non-coherent filter. Performing coherent filtering first allows the coherent components to be reduced optimally. It also prevents the non-coherent filtering from being biased by components that the coherent filtering can remove or at least reduce.
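The serial arrangement can be sketched as plain function composition. This is a hypothetical skeleton:
- `preprocess`, `one_tap_wiener` and `passthrough` are illustrative names;
- the coherent stage is reduced to a single-coefficient least-squares (one-tap Wiener) canceller;
- the non-coherent stage is only a pass-through placeholder standing in for the spectral-gain filter.

```python
from typing import Callable, List, Sequence

Signal = List[float]
Stage = Callable[[Sequence[float], Sequence[float]], Signal]


def one_tap_wiener(x: Sequence[float], ref: Sequence[float]) -> Signal:
    """Toy coherent stage: subtract the least-squares-scaled reference."""
    energy = sum(r * r for r in ref)
    g = sum(a * r for a, r in zip(x, ref)) / energy if energy else 0.0
    return [a - g * r for a, r in zip(x, ref)]


def passthrough(x: Sequence[float], ref: Sequence[float]) -> Signal:
    """Placeholder for the non-coherent (spectral-gain) stage."""
    return list(x)


def preprocess(x: Sequence[float], ref_coherent: Sequence[float],
               ref_noncoherent: Sequence[float],
               coherent: Stage = one_tap_wiener,
               noncoherent: Stage = passthrough) -> Signal:
    # First filter 20: remove components coherent with the audio-system reference.
    y = coherent(x, ref_coherent)
    # Second filter 22: non-coherent filtering of the first filter's output,
    # which no longer contains the coherent component that could bias it.
    return noncoherent(y, ref_noncoherent)


speech = [1.0, 1.0, 1.0, 1.0]
audio_ref = [1.0, -1.0, 1.0, -1.0]
mic = [s + 0.5 * r for s, r in zip(speech, audio_ref)]
print(preprocess(mic, audio_ref, [0.0] * 4))  # [1.0, 1.0, 1.0, 1.0]
```

Passing the stages as parameters keeps the ordering explicit. Either stage can be replaced by a real filter without touching the cascade itself.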

The coherent filter and the non-coherent filter each receive a noise reference signal from their own noise reference source. Each noise reference source is non-acoustic, meaning that it does not directly detect vibrations in the air. Instead, it detects vibrations in the audible frequency range at some location in the vehicle and generates a signal representing those audible-range components.

Embodiment 2 of the present invention relates to spectral power (spectral-gain) filtering. The filter used here is iterative. It generates a calibration gain related to an estimate of the magnitude of the transfer function between the noise reference source and the microphone. The calibration gain may be a single value/signal or a spectral (frequency-dependent) value/signal.
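The code below is a minimal per-band sketch of spectral-gain filtering with a calibration gain. It assumes band powers are already available, for example from the subband split of claims (13) and (14). The patent gives neither the update rule nor the gain rule, so both are assumptions here:
- The calibration gain is a recursively smoothed ratio of microphone band power to reference band power, which estimates the squared transfer-function magnitude.
- The applied gain is a power-subtraction rule with a floor.

The class name and the `alpha` and `floor` values are hypothetical.

```python
from typing import List, Sequence


class SpectralGainFilter:
    """Per-band spectral-gain filter driven by a non-acoustic noise reference.

    Hypothetical sketch: `alpha` (smoothing) and `floor` (minimum gain) are
    illustrative; the patent does not specify the update or gain rules.
    """

    def __init__(self, n_bands: int, alpha: float = 0.1, floor: float = 0.05,
                 eps: float = 1e-12) -> None:
        self.calib = [1.0] * n_bands  # calibration gain ~ |H_k|^2, sensor -> microphone
        self.alpha = alpha
        self.floor = floor
        self.eps = eps

    def update_calibration(self, mic_power: Sequence[float],
                           ref_power: Sequence[float]) -> None:
        # Iterative estimate: exponentially smoothed ratio of microphone band
        # power to reference band power.
        for k, (px, pr) in enumerate(zip(mic_power, ref_power)):
            ratio = px / max(pr, self.eps)
            self.calib[k] = (1.0 - self.alpha) * self.calib[k] + self.alpha * ratio

    def gains(self, mic_power: Sequence[float],
              ref_power: Sequence[float]) -> List[float]:
        # Power-subtraction gain per band: 1 - (estimated noise power / mic power),
        # clamped from below. Multiply band power by it, or band magnitude by its sqrt.
        out = []
        for k, (px, pr) in enumerate(zip(mic_power, ref_power)):
            noise = self.calib[k] * pr
            out.append(max(1.0 - noise / max(px, self.eps), self.floor))
        return out
```

With `n_bands=1` this reduces to the single-value calibration gain mentioned above. Speech in the microphone band power inflates the calibration ratio, which is the kind of disturbance that the limiting unit of Embodiment 2 is meant to reject.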

The audio signal preprocessing device (audio signal processing device) of the present invention includes a limiting unit that monitors and limits the maximum allowable rate of change of the calibration gain. Embodiment 2 is based on the observation that the magnitude of the transfer function does change, but only relatively slowly; that is, its rate of change stays below some limit. If the generated calibration gain starts to fluctuate quickly, the microphone signal is being disturbed by components unrelated to the noise reference source, such as speech or transient noise.

Because the limiting unit automatically rejects disturbances caused by transient noise or speech, no dedicated transient noise detector or dedicated voice activity detector is needed.

Preferably, the upper limit of the change threshold is set large enough, within the expected range of change rates, for the calibration gain to follow changes in the transfer-function magnitude. Likewise, the lower limit of the change threshold is preferably set small enough that rapid fluctuations of components unrelated to the transfer function do not disturb the calibration gain.

The calibration gain is generated periodically, and each newly generated calibration gain is compared with a comparison (reference) value. The comparison value is derived from one or more previous values of the calibration gain. If the new calibration gain differs from the comparison value by more than a threshold, it is replaced by the comparison value. The threshold is a predetermined proportion of the comparison value, for example about 20%.
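The limiting rule above can be sketched as follows, under these assumptions:
- The comparison value is the mean of the last few output gains; the text only says it is derived from one or more previous values.
- The default threshold is the 20% example.

`CalibrationGainLimiter`, its `history` parameter and the default of 3 are hypothetical.

```python
from collections import deque
from typing import Deque


class CalibrationGainLimiter:
    """Rejects calibration-gain updates that change faster than allowed.

    Hypothetical sketch: the comparison value is the mean of the last
    `history` output gains; the threshold is `rel_threshold` times that value.
    """

    def __init__(self, rel_threshold: float = 0.2, history: int = 3) -> None:
        self.rel_threshold = rel_threshold
        self.past: Deque[float] = deque(maxlen=history)

    def step(self, new_gain: float) -> float:
        if self.past:
            reference = sum(self.past) / len(self.past)
            if abs(new_gain - reference) > self.rel_threshold * reference:
                # Too fast to be a genuine transfer-function change (likely
                # speech or a transient), so keep the comparison value instead.
                new_gain = reference
        self.past.append(new_gain)
        return new_gain


limiter = CalibrationGainLimiter()
print([round(limiter.step(g), 3) for g in [1.0, 1.05, 1.1, 3.0, 1.12]])
# [1.0, 1.05, 1.1, 1.05, 1.12]
```

The jump to 3.0 is rejected, while the slow drift is passed through. For a spectral calibration gain, one limiter per band would apply the same rule.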

The calibration gain includes a variable scaling factor. To account for speed-related noise, this scaling factor may depend on the vehicle speed.

The embodiments of the present invention relate to techniques for reducing vehicle noise in a microphone signal. They yield a cleaner voice signal, so speech can be recognized more effectively by an in-vehicle speech recognition system or for a hands-free phone.

In Embodiment 1, coherent filtering and non-coherent filtering are performed in sequence, each based on its own non-acoustic noise reference source. The coherent filtering may be performed before the non-coherent filtering.

In Embodiment 2, spectral-gain filtering is performed while limiting the maximum allowable rate of change of a calibration gain. This gain is related to an estimate of the magnitude of the transfer function between the noise reference source and the microphone. Limiting its rate of change automatically rejects disturbances caused by transient noise or speech. This removes the need for the voice activity detector required in conventional approaches.

Although the above aspects of the present invention can be implemented separately, they may also be combined.

Next, the present invention will be described in detail.

(Embodiment 1)
FIG. 1 shows the operating principle of an audio signal processing device 1 (audio signal preprocessing device 10) according to an embodiment of the present invention.

As shown in FIG. 1, the audio signal processing device 1 includes:
- the audio signal preprocessing device 10;
- a microphone 12;
- a voice recognition unit 16;
- a first noise reference sensor 28;
- a second noise reference sensor 30.

The audio signal preprocessing device 10 is configured to reduce external noise in the audio signal received by the microphone 12. For cost reasons, in Embodiment 1 it is applied to a single microphone 12. If required, it may also be applied to a more expensive array of multiple microphones. The output 14 of the audio signal preprocessing device 10 is input to the voice recognition unit 16.

The audio signal preprocessing device 10 is specifically configured to reduce several kinds of external noise that occur in a vehicle. The output of the voice recognition unit 16 is used, for example, to generate input signals for in-vehicle electrical equipment. In addition or alternatively, the audio signal preprocessing device 10 outputs filtered voice signals, with or without speech recognition. These signals are used in a vehicle communication system such as a mobile phone, including a hands-free phone.

The audio signal preprocessing device 10 can be implemented as dedicated hardware circuitry, configurable hardware, a filtering algorithm executed by a processor, or any combination of these. It may also be implemented in an integrated circuit such as an application-specific integrated circuit (ASIC). There it may operate together with the voice recognition unit 16 on the same chip.

In general, the acoustic signal x(n) received by the microphone 12 contains at least one of the following components.

Speech signal component s(n): the component present while the speaker is talking. It is the desired, noise-free signal that should be passed to the voice recognition unit 16.
Coherent noise component c(n): a component coherent with at least one first noise reference signal, for example the output of the vehicle audio system (e.g., radio, audio or video player).

Non-coherent noise component nc(n): a component that is non-coherent with the noise reference signals in the vehicle. It contains at least one of the following components.

Non-coherent vehicle component ncv(n): a component that is non-coherent with the second vehicle noise reference signal but correlated with it in power spectrum, e.g., engine noise and tire noise.

Relatively stationary external component (external stationary component) d(n): a component that, strictly speaking, is unrelated to the second vehicle noise reference signal but changes relatively slowly over time, e.g., road noise, rain and wind noise.

Transient component t(n): e.g., the vehicle horn, noise from other vehicles, and other transient noise from outside the vehicle. Hence, nc(n) = ncv(n) + d(n) + t(n).

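Assuming the components add at the microphone, as the decomposition of nc(n) implies, the received signal can be summarized as below. Any term may be zero, since x(n) need only contain at least one of the components.

```latex
x(n) = s(n) + c(n) + nc(n), \qquad nc(n) = ncv(n) + d(n) + t(n)
```
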
In general, the audio signal preprocessing device 10 includes a first filter 20 and a second filter 22, and the first and second filters 20 and 22 are related to each other or executed in order. One of the first and second filters 20 and 22 is a coherent filter 24 that reduces noise according to a coherent noise reduction algorithm CNRA (for example, a linear filter). The coherent filter 24 reduces the coherent noise component c (n). The other of the first and second filters 20 and 22 is a non-coherent filter 26 that reduces noise according to a non-coherent algorithm (eg, non-linear filter, non-linear noise reduction algorithm NLNR). The non-coherent filter 26 reduces the non-coherent noise component nc (n) (that is, the non-coherent vehicle component ncv (n) or, optionally, the external stationary component d (n) may be added).

In Embodiment 1 of the present invention, the non-coherent filter 26 is executed after the coherent filter 24, which improves filtering performance. It also prevents the non-coherent filter 26 from being detuned by noise components that the coherent filter 24 removes more efficiently.

By executing the coherent filter and the non-coherent filter in sequence, a wider range of external noise types can be reduced than with conventional approaches.

The coherent filter 24 and the non-coherent filter 26 receive, respectively, the first noise reference signal from the first noise reference sensor 28 and the second noise reference signal from the second noise reference sensor 30. These reference signals are used to reduce the coherent and non-coherent noise components. Because the first and second noise reference sensors 28 and 30 are non-acoustic, they do not interfere with the audio signal received through the microphone 12. The first noise reference sensor 28 for the coherent filter 24 is, for example, coupled directly and non-acoustically (e.g., by an electrical connection) to the loudspeaker drive signal of the vehicle's radio and/or music player. The drive signal may be mono, stereo or multichannel (for example, surround sound). For stereo or multichannel signals, the separate channels are used as individual noise reference signals, so a plurality of noise reference signals are input to the coherent filter 24.

With several noise reference signals, the filtering performance of the coherent filter 24 increases, but the filtering may become more complex. Conversely, downmixing some or all of the received signals to reduce the number of noise reference signals reduces that complexity. For example, when a stereo signal is downmixed to mono, a single noise reference signal derived from the two original stereo components is input to the filters 24 and 26. Although downmixing lowers the ultimate performance of the coherent filter 24, in practice it has proven to give highly effective filtering, and it optimizes the cost-to-performance ratio within the desired performance range. The signals may be downmixed with equal weights or with individual weights, as appropriate.
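The weighted downmix described above can be sketched in Python as follows. The function name, the equal-weight default and the array layout (one row per channel) are illustrative assumptions, not details taken from this document.

```python
import numpy as np

def downmix(channels, weights=None):
    """Combine several speaker-drive channels into one noise reference a(n).

    channels: array-like of shape (n_channels, n_samples).
    weights:  optional per-channel weights; equal weights are used if omitted
              (hypothetical default chosen for illustration).
    """
    channels = np.asarray(channels, dtype=float)
    n_ch = channels.shape[0]
    if weights is None:
        weights = np.full(n_ch, 1.0 / n_ch)  # equal-weight downmix
    weights = np.asarray(weights, dtype=float)
    if weights.shape != (n_ch,):
        raise ValueError("need exactly one weight per channel")
    # Weighted sum across channels gives a single reference signal
    return weights @ channels
```

Passing `[left, right]` with no weights yields the mono average of a stereo pair; passing weights such as `[0.7, 0.3]` emphasizes one channel.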

The second noise reference sensor 30 for the non-coherent filter 26 is a non-acoustic sensor, for example an accelerometer or vibration sensor mounted on the vehicle body or floor. FIG. 2 schematically illustrates the arrangement of the first noise reference sensor 28, the second noise reference sensor 30 and the microphone 12. The microphone 12 is placed near the driver in the passenger compartment and picks up the speech of the talker (here, the driver). The first noise reference sensor 28 is connected directly to the vehicle audio system. The second noise reference sensor 30 is generally attached to the vehicle body or floor and picks up mechanical signals.

FIG. 3 illustrates the detailed structure of the audio signal preprocessing device 10. The device 10 includes an input unit 32 that receives the signal from the microphone 12 and the signals supplied by the first noise reference sensor 28 and the second noise reference sensor 30. The input unit 32 includes a digitizing unit that converts a received signal to digital form if it is not already digital, and a framing unit that divides the resulting digital signal into overlapping frames, each about 10 ms long, for example. The input unit 32 further includes a band separation unit that splits the received signal into N (a predetermined number of) frequency bands, implemented for example with a fast Fourier transform (FFT). The bands are spaced on a logarithmic frequency scale, which keeps signal quality uniform over several octaves while limiting processing complexity. The signals from the microphone 12, the first noise reference sensor 28 and the second noise reference sensor 30 are thus output from the input unit 32 as digital signals 12a, 28a and 30a, each divided into frames in the time domain and into N frequency bands in the frequency domain. For the range 250 Hz to 6 kHz, N is set to about 11.
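A minimal sketch of the logarithmic band layout, assuming geometric spacing of 11 bands between 250 Hz and 6 kHz and a one-sided FFT spectrum of an even-length frame. The function names and the grouping of FFT bins by summed power are illustrative assumptions; the document does not specify how bins are assigned to bands.

```python
import numpy as np

def log_band_edges(f_lo=250.0, f_hi=6000.0, n_bands=11):
    # n_bands + 1 edges, equally spaced on a logarithmic frequency axis
    return np.geomspace(f_lo, f_hi, n_bands + 1)

def band_powers(spectrum, fs, edges):
    """Sum |X(k)|^2 of a one-sided spectrum (np.fft.rfft output) per band.

    Assumes the spectrum came from an even-length frame.
    """
    n_fft = 2 * (len(spectrum) - 1)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    power = np.abs(spectrum) ** 2
    out = np.empty(len(edges) - 1)
    for b in range(len(edges) - 1):
        mask = (freqs >= edges[b]) & (freqs < edges[b + 1])
        out[b] = power[mask].sum()
    return out
```

With these edges each band spans the same frequency ratio (about 1.33), so low bands are narrow and high bands wide, which is what keeps resolution uniform per octave.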

The audio signal preprocessing device 10 further includes an output unit 36 that generates the output signal by synthesizing the N filtered subband signals. The output unit 36 includes an inverse FFT, which converts the N subband signals from the frequency domain back to the time domain, and an overlap-add unit that reconstructs the output signal.
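The framing, FFT, inverse FFT and overlap-add chain of the input unit 32 and output unit 36 can be sketched as below. The 16 kHz rate, the 160-sample (10 ms) frame, the periodic Hann analysis window and the 50% overlap are assumptions chosen so that the overlapping windows sum to one and an unfiltered signal is reconstructed exactly.

```python
import numpy as np

def stft(x, frame_len=160, hop=80):
    # Periodic Hann window: copies shifted by frame_len/2 sum to exactly 1
    n = np.arange(frame_len)
    win = 0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # one spectrum X_i(k) per frame

def istft(spec, frame_len=160, hop=80):
    frames = np.fft.irfft(spec, n=frame_len, axis=1)
    n_frames = frames.shape[0]
    out = np.zeros(hop * (n_frames - 1) + frame_len)
    for i in range(n_frames):
        out[i * hop:i * hop + frame_len] += frames[i]  # overlap-add
    return out
```

Any per-bin filtering (coherent or non-coherent) would be applied to the output of `stft` before `istft`; only the first and last half-frames are not fully reconstructed.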

Any suitable filters may be used as the coherent filter 24 and the non-coherent filter 26. Here, the coherent filter 24 is a linear filter that reduces noise coherent with the first noise reference signal, and the non-coherent filter 26 is a nonlinear filter that reduces noise which is non-coherent with the second noise reference signal but correlated with it in power spectrum. FIGS. 4 and 5 illustrate an example of the coherent filter 24. Referring to FIG. 4, and considering only the coherent external noise supplied from the vehicle loudspeakers, the signal of the microphone 12 (hereinafter, the microphone signal) x(n) is given by:

$$x(n) = s(n) + g_c\big(a(n)\big)$$

Here, s(n) is the sampled speech component, i.e. the desired signal free of external noise; a(n) is the signal sent to the loudspeakers, which serves as the first noise reference signal from the first noise reference sensor 28; and g_c is the transfer function from the loudspeakers to the microphone 12, which is linear. The time series are divided into overlapping frames and each frame is Fourier transformed (FFT), giving for each frame i:

$$X_i(k) = \mathrm{FFT}(x_i),\quad S_i(k) = \mathrm{FFT}(s_i),\quad A_i(k) = \mathrm{FFT}(a_i),\quad G_c^{i}(k) = \mathrm{FFT}(g_c^{i})$$

With this notation the coherent filter 24 is shown in FIG. 5, where k is the index of the frequency bin and the frame index i is omitted for brevity:

X(k): the microphone signal in the current frame;
S(k): the noise-free speech signal in the current frame;
G_c(k): the loudspeaker-to-microphone transfer function in the current frame;
A(k): the audio reference signal in the current frame;
H(k): the estimate of G_c(k) in the current frame, obtained with the improved Wiener algorithm;
Equation 1: the output of the coherent filter 24 in the current frame, i.e. the speech-enhanced signal.

The function H(k) is based on the Wiener filter, a noise reduction method applied here in the frequency domain. Acoustic transmission from the loudspeakers to the microphone 12 is assumed to be linear. H(k) is estimated by an iterative (recursive) procedure rather than an adaptive one: the filter update depends only on the inputs A(k) and X(k) and not on the output (Equation 1), which improves the stability of the update.

For example, the filter coefficients H are estimated repeatedly by the filter coefficient calculation unit 45 using the following algorithm:

$$H(k) = \gamma_{XA}(k)\,\gamma_{AA}^{-1}(k)$$

where γ_XA(k) is the estimated cross-spectrum between the microphone signal and the audio reference signal, and γ_AA(k) is the estimated spectrum of the audio reference signal. Both are updated recursively in every frame from their values in the previous frame and the instantaneous spectrum and cross-spectrum of the current frame.
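A minimal sketch of this recursive estimate, assuming first-order exponential smoothing with a forgetting factor `alpha` (the document says only that each frame's value depends on the previous value and the current instantaneous spectra). The output rule `X - H * A`, i.e. subtracting the predicted loudspeaker contribution, is also an assumption, since Equation 1 is not reproduced here; `alpha`, `eps` and the class name are hypothetical.

```python
import numpy as np

class RecursiveWienerEstimator:
    """Per-bin estimate H(k) = gamma_XA(k) / gamma_AA(k) (sketch).

    alpha (forgetting factor) and eps (division guard) are illustrative
    assumptions, not values from the document.
    """
    def __init__(self, n_bins, alpha=0.9, eps=1e-12):
        self.alpha = alpha
        self.eps = eps
        self.g_xa = np.zeros(n_bins, dtype=complex)  # cross-spectrum X * conj(A)
        self.g_aa = np.zeros(n_bins)                 # auto-spectrum |A|^2

    def update(self, X, A):
        a = self.alpha
        # Update uses only the inputs X(k) and A(k), never the filter output
        self.g_xa = a * self.g_xa + (1 - a) * X * np.conj(A)
        self.g_aa = a * self.g_aa + (1 - a) * np.abs(A) ** 2
        return self.g_xa / (self.g_aa + self.eps)

def coherent_output(X, A, H):
    # Assumed output rule: remove the estimated loudspeaker contribution
    return X - H * A
```

Because the update never feeds back the output, a burst of speech in X only perturbs the smoothed cross-spectrum rather than driving an adaptive loop, which is the stability argument made above.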

The coherent filter 24 further includes, or is associated with, a first switching unit 34 that determines whether the vehicle music player/radio is currently on. The switching unit 34 enables the coherent filter 24 only while the music player/radio is on. For example, the switching unit 34 may receive an "ON/OFF" status signal directly from the music player/radio, or it may include a threshold switch that determines whether the value of the signal 28a from the first noise reference sensor 28 exceeds a certain threshold. The switching unit 34 may apply a single on/off control to all bands of the coherent filter 24, or control each band individually.
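The threshold-switch variant of the switching unit 34 can be sketched as follows, assuming the decision is made on the per-band power of the reference signal 28a. The threshold value, the use of band power and the pass-through behavior when the filter is off are hypothetical choices for illustration.

```python
import numpy as np

def coherent_enable_mask(ref_band_power, threshold, per_band=True):
    """Decide where the coherent filter is active (sketch).

    ref_band_power: power of the first noise reference (signal 28a) per band.
    threshold: hypothetical activation level; the document gives no value.
    """
    ref_band_power = np.asarray(ref_band_power, dtype=float)
    if per_band:
        return ref_band_power > threshold  # individual on/off per band
    # single global decision applied to all bands
    return np.full(ref_band_power.shape, ref_band_power.sum() > threshold)

def apply_switch(X, S_hat, mask):
    # Where the filter is off, pass the microphone spectrum through unchanged
    return np.where(mask, S_hat, X)
```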

(Embodiment 2)
In Embodiment 2, an example of the non-coherent filter 26 is described with reference to FIGS. 6 and 7. This example is particularly suitable for use in the audio signal preprocessing device 10 of Embodiment 1, but Embodiment 2 is not limited to that use: the non-coherent filter 26 can be applied to any device (particularly an in-vehicle device) that reduces noise using a noise reference signal related, in spectral power, to the noise to be reduced.

A feature of Embodiment 2 is that filtering is performed without a voice activity detector (VAD) module. A VAD is generally required by conventional spectral subtraction filters, but it lacks robustness.

Before describing Embodiment 2 in detail, the principle of single-channel spectral subtraction with a VAD module is explained for reference. It works as follows.

As shown in FIG. 10, x(n) denotes the sampled microphone signal, which contains sampled speech s(n) and noise b(n).

The output (Equation 2) is the enhanced speech signal.

Using the same per-frame, frequency-domain notation as above, the output (Equation 2) is obtained as Equation 3.

Here, G(k) (Equation 5) is a gain function, and Equation 4 is the noise estimate obtained during non-speech periods.

h(.) is a function determined by the spectral subtraction rule used. Such functions are described in S. V. Vaseghi, Advanced Digital Signal Processing and Noise Reduction, John Wiley & Sons, 2000.
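For contrast with Embodiment 2, here is a sketch of the conventional VAD-driven single-channel scheme. The VAD decision is passed in as a boolean, and the gain is standard power spectral subtraction, one common choice of h(.); the smoothing factor `beta`, the spectral floor and the class name are assumptions, and this is not the filter claimed in this document.

```python
import numpy as np

class SingleChannelSubtractor:
    """Classic VAD-driven power spectral subtraction (illustrative sketch).

    The noise estimate is updated only on frames the VAD marks as non-speech.
    beta and floor are assumed parameters.
    """
    def __init__(self, n_bins, beta=0.95, floor=0.05):
        self.noise_psd = np.zeros(n_bins)
        self.beta = beta    # smoothing of the noise estimate
        self.floor = floor  # minimum gain, limits "musical noise"

    def process(self, X, is_speech):
        px = np.abs(X) ** 2
        if not is_speech:
            self.noise_psd = self.beta * self.noise_psd + (1 - self.beta) * px
        ratio = self.noise_psd / np.maximum(px, 1e-12)
        # |S_hat|^2 = |X|^2 - noise, expressed as an amplitude gain on X
        gain = np.sqrt(np.maximum(1.0 - ratio, 0.0))
        return np.maximum(gain, self.floor) * X
```

The weak point discussed next is visible in the `if not is_speech` branch: a missed or false VAD decision directly corrupts `noise_psd`.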

Single-channel spectral subtraction is effective, but it depends heavily on the VAD's ability to distinguish noisy speech frames from noise-only frames.

In Embodiment 2, shown in FIG. 6, x(n) denotes the sampled signal from the microphone 12; it no longer contains the coherent component, which has already been reduced by the coherent filter 24 of Embodiment 1. FIG. 6 neither shows nor describes the coherent filter 24, but, as in Embodiment 1, the two filters 24 and 26 may be executed in sequence. x(n) contains the sampled speech component s(n) and the non-coherent noise component nc(n), which consists of the non-coherent vehicle component ncv(n), the external stationary component d(n) and the transient component t(n). The non-coherent vehicle component ncv(n) received by the microphone 12 is related to the signal r(n) received by the second noise reference sensor 30 through a nonlinear function f_NC:

$$ncv(n) = f_{NC}\big(r(n)\big)$$

After the time series are divided into slightly overlapping frames and each is Fourier transformed (FFT), the following are obtained for each frame i and frequency bin k:

$$X_i(k) = \mathrm{FFT}(x_i),\quad S_i(k) = \mathrm{FFT}(s_i),\quad NCV_i(k) = \mathrm{FFT}(ncv_i),\quad D_i(k) = \mathrm{FFT}(d_i),\quad T_i(k) = \mathrm{FFT}(t_i)$$

The nonlinear filter 26 is shown in FIG. 7 using this notation, where k is the index of the frequency bin and the frame index i is omitted for brevity:

$$X(k) = S(k) + NCV(k) + D(k) + T(k)$$

Assuming that much of the noise picked up by the microphone 12 is also picked up by the vibration sensor, the vibration sensor is used as a noise reference sensor, and a technique based on spectral subtraction reduces the noise contained in the output of the linear filter. The key point is that the noise component in the linear filter's output and the vibration reference signal are non-coherent, yet related to each other in power spectrum.

The non-coherent filter 26 includes a gain unit 40 that performs spectral subtraction by applying a spectral subtraction function G_NC to the microphone signal X. G_NC is a nonlinear function:

$$G_{NC}(k) = G_{NC}\big[R(k),\, X(k),\, \mathrm{ref\_calib}\big]$$

Being a nonlinear filter, the non-coherent filter 26 cancels noise by spectral subtraction, and G_NC resembles the gain used in single-channel spectral subtraction. The estimated speech signal (Equation 6) is calculated as shown in Equation 7.

The gain unit 40 receives the microphone signal X(k) and a new noise reference signal 42, ref_calib · R(k), which is the noise reference signal R(k) scaled by a calibration gain ref_calib. The calibration gain ref_calib is calculated, for example, by an estimation unit 44, and is an estimate of the squared magnitude of the transfer function between the reference sensor and the microphone, or a value related to it. ref_calib may be a single value, or it may have spectral components. The non-coherent filter 26 includes a multiplication unit 46 that multiplies the signal R(k) from the second noise reference source by ref_calib. ref_calib is estimated continuously by an update algorithm based on the following principles.
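Since the exact form of G_NC (Equations 6 and 7) is not reproduced here, the following sketch assumes it follows the same power-subtraction rule as the single-channel case, with the noise power predicted as ref_calib · |R(k)|² instead of a VAD-gated estimate. That form, the gain floor and the function names are assumptions; the point illustrated is that no speech/non-speech decision is needed.

```python
import numpy as np

def noncoherent_gain(X, R, ref_calib, floor=0.1):
    """Sketch of a gain G_NC[R(k), X(k), ref_calib] (functional form assumed).

    Noise power in the microphone is predicted as ref_calib * |R(k)|^2, with
    ref_calib approximating |sensor-to-microphone transfer function|^2.
    ref_calib may be a scalar or a per-bin array.
    """
    px = np.abs(X) ** 2
    noise_est = ref_calib * np.abs(R) ** 2  # calibrated reference (signal 42)
    ratio = noise_est / np.maximum(px, 1e-12)
    return np.maximum(np.sqrt(np.maximum(1.0 - ratio, 0.0)), floor)

def noncoherent_filter(X, R, ref_calib, floor=0.1):
    # Estimated speech: S_hat(k) = G_NC(k) * X(k); no VAD decision involved
    return noncoherent_gain(X, R, ref_calib, floor) * X
```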

(a) The transfer function between the noise reference sensor and the microphone changes relatively slowly (for example, on the order of seconds). During normal vehicle operation, the spectral amplitude variation of the microphone signal is approximately proportional to that of the signal from the second noise reference sensor 30, and this proportionality still holds when both signals change rapidly. This is because the non-coherent vehicle component NCV(k) and the noise reference signal R(k) are related to each other in power spectrum.

(b) The external relatively stable component D(k) is assumed to vary relatively slowly and with vehicle speed. It is accounted for through the factor λ in the calibration gain ref_calib.

(c) ref_calib can be estimated in several ways; one option is:

$$\mathrm{ref\_calib} = \lambda\,\frac{E_x}{E_r}$$

where E_x is an estimate of the instantaneous power of the microphone signal, obtained for example as shown in Equation 8.

Here, the averaging frame L usually lasts on the order of a second, for example 0.5 s. E_r is an estimate of the instantaneous power of the second noise reference signal, obtained for example as shown in Equation 9 below.

Here too, the frame L usually lasts on the order of a second, for example 0.5 s. The factor λ (≤ 1) keeps the contribution of the non-coherent vehicle component NCV(k) from being overestimated and accounts for the external relatively stable component D(k). λ generally ranges from about 0.7 to about 1, varying with vehicle speed.
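A sketch of ref_calib = λ·E_x/E_r, assuming E_x and E_r are the mean per-bin powers over a sliding window of L frames (Equations 8 and 9 are not reproduced here, so this averaging form is an assumption). A window of 50 frames corresponds to 0.5 s only if the frame hop is 10 ms, and the fixed λ = 0.85 is a placeholder inside the stated 0.7 to 1 range; in the document λ depends on vehicle speed.

```python
from collections import deque
import numpy as np

class CalibrationEstimator:
    """ref_calib = lam * E_x / E_r over a sliding window (sketch).

    window_frames, lam and eps are illustrative assumptions.
    """
    def __init__(self, window_frames=50, lam=0.85, eps=1e-12):
        self.px = deque(maxlen=window_frames)  # recent |X(k)|^2 frames
        self.pr = deque(maxlen=window_frames)  # recent |R(k)|^2 frames
        self.lam = lam
        self.eps = eps

    def push(self, X, R):
        self.px.append(np.abs(X) ** 2)
        self.pr.append(np.abs(R) ** 2)

    def ref_calib(self):
        e_x = np.mean(np.array(list(self.px)), axis=0)  # E_x per bin
        e_r = np.mean(np.array(list(self.pr)), axis=0)  # E_r per bin
        return self.lam * e_x / (e_r + self.eps)
```

Pushing band powers instead of FFT bins gives a per-band ref_calib; pushing scalar frame powers gives a single value, matching the two options mentioned above.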

The power spectrum magnitudes are generally estimated about every 0.5 to 1 s, and ref_calib about every 1 to 3 s.

(d) Because the transfer function (between the noise reference sensor and the microphone) and |D(k)|² change relatively slowly, a disproportionate difference between the squared magnitudes of X(k) and R(k) indicates a disturbance from the speech signal S(k) or the transient noise T(k). Applying a change threshold (or change-rate threshold) to the ref_calib value produced by the estimation unit 44 keeps such disturbances from distorting ref_calib. For example, if the new ref_calib value differs from the previous one by more than about 20%, the change is rejected and the previous value is used instead.

With this technique, Embodiment 2 needs neither a VAD nor a transient-noise detector. Using the change threshold in their place gives global control over ref_calib and avoids disturbance by the speech signal S(k) or the transient noise T(k). Because VADs are difficult to implement effectively in conventional processing circuits, removing the need for one is of considerable technical value.

(e) The change threshold may be selected as follows.

(A) Large enough that ref_calib can follow normal changes in the transfer function between the vibration sensor and the microphone.

(B) Small enough that ref_calib is not disturbed by signal components that change faster than the transfer function magnitude can. As noted above, a change threshold of about 20% is effective.

In this way, the limiting unit 44 restricts the calibration gain so that its rate of change does not exceed a predetermined threshold.
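The reject-and-hold behavior of the limiting unit 44 can be sketched as below, using the roughly 20% relative-change threshold from the text. Accepting the very first estimate unconditionally, applying the rule element-wise for per-band gains, and the class name are assumptions for illustration.

```python
import numpy as np

class CalibrationLimiter:
    """Reject ref_calib updates that change too fast (sketch).

    If a new estimate differs from the last accepted value by more than
    max_rel_change (about 20 % per the text), the old value is kept.
    """
    def __init__(self, max_rel_change=0.2):
        self.max_rel_change = max_rel_change
        self.value = None

    def update(self, candidate):
        candidate = np.asarray(candidate, dtype=float)
        if self.value is None:
            self.value = candidate  # assumption: first estimate accepted as-is
            return self.value
        rel = np.abs(candidate - self.value) / np.maximum(np.abs(self.value), 1e-12)
        # element-wise: accept small changes, hold the previous value otherwise
        self.value = np.where(rel <= self.max_rel_change, candidate, self.value)
        return self.value
```

A speech burst that inflates E_x for one update produces a large jump and is simply ignored, while slow drift of the transfer function passes through in steps under the threshold.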

As FIG. 8 shows, the non-coherent filter 26, like the filter of Embodiment 1, is divided into multiple frequency bands. It has an input unit 50 that divides the input into time frames and, using an FFT unit, splits the microphone input signal x(n) and the input signal r(n) from the non-coherent noise reference sensor into frequency bands. It also has an output unit 52, including an inverse FFT unit, that combines the signals from the sub-filter units 26'. When the non-coherent filter 26 of Embodiment 2 is used within Embodiment 1, the input unit 50 and output unit 52 may be replaced by the input unit 32 and output unit 36 of Embodiment 1 and need not be duplicated within the non-coherent filter 26.

FIG. 9 shows an example arrangement, in the vehicle, of the microphone 12 and the non-acoustic non-coherent sensor 30 when Embodiment 2 is implemented on its own. The microphone 12 is installed near the driver and picks up the driver's voice acoustically. The non-acoustic non-coherent noise reference sensor 30 is typically mounted on the vehicle body or floor and picks up mechanical vibration. As in Embodiment 1, the output signal of the non-coherent filter 26 is sent to a speech recognition circuit (not shown).

The present invention may also be embodied as a speech preprocessing method that filters an electrical speech signal containing speech to reduce a noise component, comprising: a first filtering step in which a first filter applies either coherent filtering or non-coherent filtering to the electrical speech signal; and a second filtering step in which a second filter applies the other type of filtering to the output of the first filter.

The present invention may also be embodied as an audio signal preprocessing program that causes a computer to execute the above method.

FIG. 1 is a schematic view showing the principle of the audio signal preprocessing device according to Embodiment 1.
FIG. 2 is a schematic view showing the arrangement of the microphone and the non-acoustic noise reference sensors in the vehicle.
FIG. 3 is a block diagram showing details of the audio signal preprocessing device.
FIG. 4 is a schematic view showing the principle of coherent filtering.
FIG. 5 is a schematic view showing details of the structure of the coherent filter.
FIG. 6 is a schematic view showing the principle of non-coherent filtering according to Embodiment 2.
FIG. 7 is a block diagram showing the non-coherent filter.
FIG. 8 is a block diagram showing details of the structure of the non-coherent filter.
FIG. 9 is a schematic view showing the arrangement of the microphone and the non-acoustic noise reference sensor when Embodiment 2 is used independently of Embodiment 1.
FIG. 10 is a schematic view showing the principle of single-channel spectral subtraction.

Explanation of symbols

1 Audio signal processing device
10 Audio signal preprocessing device
12 Microphone
16 Speech recognition unit
20 First filter
22 Second filter
28 Non-acoustic coherent noise sensor
30 Non-acoustic non-coherent noise sensor

Claims

An audio signal pre-processing device that includes a noise reducing first filter and a noise reducing second filter and that processes an electric audio signal including audio to reduce a noise component, wherein the first filter or the second filter Either one is a coherent filter that reduces the noise component by coherent filtering, and the other is a non-coherent filter that reduces the noise component by non-coherent filtering, and the second filter filters the output of the first filter. An audio signal preprocessing device.

The audio signal preprocessing device according to claim 1, wherein the first filter is the coherent filter, and the second filter is the non-coherent filter.

3. The audio signal preprocessing device, wherein the coherent filter is configured to receive, from a non-acoustic coherent noise sensor, a first noise reference signal that is coherent with a noise component included in the electrical audio signal.

4. The audio signal preprocessing device according to any one of claims 1 to 3, wherein the non-coherent filter is configured to receive, from a non-acoustic non-coherent noise sensor, a second noise reference signal that is non-coherent with a noise component included in the electrical audio signal but related to it in spectral power.

5. The audio signal preprocessing device according to claim 3 or 4, wherein the non-acoustic coherent noise sensor is connected non-acoustically to an audio system of a vehicle.

6. The audio signal preprocessing device according to claim 4, wherein the non-acoustic non-coherent noise sensor is configured to sense vehicle vibration.

7. The audio signal preprocessing device according to any one of claims 1 to 6, wherein the coherent filter includes a linear filter.

8. The audio signal preprocessing device according to claim 1, wherein the coherent filter includes a Wiener filter.

9. The audio signal preprocessing device according to any one of claims 3 to 8, wherein the coherent filter includes a filter coefficient calculation unit that repeatedly calculates filter coefficients based on the electrical audio signal and the first noise reference signal.

10. The audio signal preprocessing device according to any one of claims 1 to 9, wherein the non-coherent filter includes a non-linear filter.

11. The audio signal preprocessing device according to any one of claims 1 to 10, wherein the non-coherent filter includes a spectrum-gain filter.

12. The audio signal preprocessing device according to any one of the preceding claims, wherein the non-coherent filter further includes a filter coefficient calculation unit that calculates filter coefficients based on the electrical audio signal input to the non-coherent filter and the second noise reference signal.

13. The audio signal preprocessing device, wherein each of the first filter and the second filter has a plurality of sub-filters that filter the electrical audio signal in a corresponding plurality of frequency sub-bands.

14. The audio signal preprocessing device according to claim 13, further comprising: an input unit that divides the electrical audio signal into the frequency sub-bands; and an output unit that synthesizes the electrical audio signal from the frequency sub-bands after filtering.

15. The audio signal preprocessing device according to any one of claims 1 to 14, wherein the first filter and the second filter process the electrical audio signal as a series of successive frames, each frame representing the signal within a window of predetermined duration.

16. The audio signal preprocessing device according to claim 15, further comprising: an input unit that divides the electrical audio signal into the series of frames; and an output unit that synthesizes the electrical audio signal from the series of frames.

17. An audio signal processing apparatus comprising:
the audio signal preprocessing device according to any one of claims 1 to 16;
a microphone that collects an acoustic signal and supplies the electrical audio signal as an input to the audio signal preprocessing device;
the non-acoustic coherent noise sensor, which generates the first noise reference signal coherent with a first noise source that can be picked up by the microphone and supplies the first noise reference signal to the coherent filter; and
the non-acoustic non-coherent noise sensor, which generates a second noise reference signal that is non-coherent with a second noise source that can be picked up by the microphone but related to it in spectral power, and supplies the second noise reference signal to the non-coherent filter.

18. The audio signal processing apparatus according to claim 17, wherein the non-acoustic non-coherent noise sensor is a vibration sensor or an accelerometer.

19. An audio signal preprocessing method for filtering an electrical audio signal containing speech to reduce a noise component, comprising:
a first filtering step of filtering the electrical audio signal with a first filter that performs one of coherent filtering and non-coherent filtering; and
a second filtering step of filtering the output of the first filter with a second filter that performs the other of the two types of filtering.

20. The audio signal preprocessing method according to claim 19, wherein the first filter is the coherent filter and the second filter is the non-coherent filter.

21. An audio signal preprocessing program for causing a computer to execute the method according to claim 19 or 20.
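The claims characterize the coherent filter only as linear, possibly a Wiener filter, with coefficients calculated repeatedly from the audio signal and the first noise reference (claims 7 to 9). As a hedged illustration, the sketch below uses a normalized LMS (NLMS) adaptive FIR filter, a standard iterative approximation of the Wiener solution; NLMS, the function name `nlms_cancel` and the parameters `num_taps`, `mu` and `eps` are assumptions, not details taken from the patent. It removes the part of the microphone signal that is linearly predictable from a phase-coherent reference, such as a feed from the vehicle audio system (claim 5).

```python
import numpy as np


def nlms_cancel(mic, ref, num_taps=32, mu=0.5, eps=1e-8):
    """Hypothetical coherent stage: NLMS adaptive noise canceller (assumed technique).

    mic: microphone signal (speech + noise), 1-D array
    ref: coherent noise reference of the same length
    Returns e[n] = mic[n] - w^T x[n], the estimate of the speech.
    """
    mic = np.asarray(mic, dtype=float)
    ref = np.asarray(ref, dtype=float)
    w = np.zeros(num_taps)   # FIR coefficients, re-estimated at every sample
    x = np.zeros(num_taps)   # most recent reference samples, newest first
    out = np.empty_like(mic)
    for n in range(len(mic)):
        x = np.roll(x, 1)
        x[0] = ref[n]
        y = w @ x                             # predicted noise at the microphone
        e = mic[n] - y                        # residual: speech + unmodelled noise
        w += (mu / (eps + x @ x)) * e * x     # normalized LMS coefficient update
        out[n] = e
    return out
```

Because the update is driven by the residual, speech that is uncorrelated with the reference is left in the output while the reference-correlated noise is cancelled; a larger `mu` adapts faster at the cost of more coefficient jitter.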
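Claims 10 to 12 and 15 describe the non-coherent filter as a non-linear spectrum-gain filter that works frame by frame, with coefficients computed from its input and the second noise reference. A minimal sketch follows, assuming a spectral-subtraction gain rule (the principle shown in FIG. 10) and a per-bin power ratio `alpha` calibrated beforehand; `spectral_gain_filter`, `alpha`, `frame_len` and `floor` are hypothetical names, and the Hann-window, 50%-overlap framing is an assumed design choice. Each FFT bin stands in for a frequency sub-band (claims 13 and 14). Because only powers are compared, the reference need not be phase-coherent with the noise, which matches a vibration sensor or accelerometer reference (claim 18).

```python
import numpy as np


def spectral_gain_filter(x, ref, alpha, frame_len=256, floor=0.05):
    """Hypothetical non-coherent stage: per-bin spectral-subtraction gain.

    x:     input signal, 1-D array
    ref:   non-coherent noise reference of the same length
    alpha: assumed ratio mapping |REF|^2 to the noise power expected in x;
           scalar or array of length frame_len // 2 + 1
    """
    x = np.asarray(x, dtype=float)
    ref = np.asarray(ref, dtype=float)
    hop = frame_len // 2
    n = np.arange(frame_len)
    win = 0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len)  # periodic Hann: sums to 1 at 50% overlap
    out = np.zeros(len(x))
    for start in range(0, len(x) - frame_len + 1, hop):
        X = np.fft.rfft(win * x[start:start + frame_len])    # frame -> frequency bins
        R = np.fft.rfft(win * ref[start:start + frame_len])
        noise_pow = alpha * np.abs(R) ** 2                   # noise power predicted per bin
        sig_pow = np.abs(X) ** 2 + 1e-12                     # avoid division by zero
        gain = np.maximum(1.0 - noise_pow / sig_pow, floor)  # spectral-subtraction gain
        out[start:start + frame_len] += np.fft.irfft(gain * X, n=frame_len)  # overlap-add
    return out
```

In the cascade of claims 2 and 20, `x` would be the output of the coherent stage rather than the raw microphone signal. The gain `floor` limits how far any bin is attenuated, which trades residual noise against the "musical noise" typical of spectral subtraction.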