JP5289128B2 - Signal processing method, apparatus and program

Info

Publication number: JP5289128B2
Application number: JP2009073901A
Authority: JP
Inventors: 皇天田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2009-03-25
Filing date: 2009-03-25
Publication date: 2013-09-11
Anticipated expiration: 2029-03-25
Also published as: JP2010221945A

Description

The present invention relates to a signal processing method, apparatus, and program using microphone array technology.

Techniques that use a plurality of arranged microphones, a so-called microphone array, to emphasize sound arriving from a specific direction while suppressing other sounds, and to detect the direction of a sound source, are being actively researched. For example, when speech recognition is performed in an automobile to determine what the driver wants, a microphone array is used to suppress running noise and car audio reproduced sound and to extract only the driver's speech.

There are various types of microphone array; a representative one is the delay-and-sum array disclosed in Non-Patent Document 1. A delay-and-sum array applies a delay to the signal from each microphone and then adds the signals. It is based on the principle that only acoustic signals arriving from a preset specific direction (for example, the direction of the driver in an automobile) are added in phase and thereby emphasized, whereas acoustic signals arriving from other directions are out of phase and weaken one another. By this principle, an acoustic signal from a specific direction can be emphasized; that is, directivity can be formed in a specific direction.

Another example of a microphone array is the Griffiths-Jim array disclosed in Non-Patent Document 2. The Griffiths-Jim array is characterized in that it uses an adaptive filter to form a directivity having low sensitivity (hereinafter also referred to as a null) in the direction of an interfering sound, and thereby selectively removes the interfering sound. The number of interfering-sound directions that can be removed is generally N-1, where N is the number of microphones.

Unlike the running noise of an automobile, the reproduced sound of car audio, which interferes with speech recognition in the automobile, has a known source signal and a known source position. The reproduced sound can therefore be suppressed by applying so-called array processing to the signals from the microphone array. For example, the reproduced sound can be suppressed by using array processing to point the low-sensitivity direction of the microphone array toward the car audio speaker. Because the speaker position is known, the reproduced sound can be suppressed more effectively than ambient interfering sounds whose source positions are unknown.

Non-Patent Document 1: J. L. Flanagan, J. D. Johnston, R. Zahn and G. W. Elko, "Computer-steered microphone arrays for sound transduction in large rooms," J. Acoust. Soc. Am., vol. 78, no. 5, pp. 1508-1518, 1985.
Non-Patent Document 2: L. J. Griffiths and C. W. Jim, "An Alternative Approach to Linearly Constrained Adaptive Beamforming," IEEE Trans. Antennas & Propagation, vol. AP-30, no. 1, Jan. 1982.

When the car audio reproduces sound in stereo, the position at which each sound source is localized differs. For example, vocals are often localized at the center, while the accompaniment is spread to the left and right instrument by instrument. In this case it is desirable to form blind spots of the microphone array in the directions of all the sound sources, that is, to have low sensitivity in all of those directions, but this is difficult when the number of microphones is small. As a result, the suppression of the interfering sound deteriorates. Furthermore, when part of the reproduced sound is localized in the same direction as the target sound, removing the reproduced sound becomes very difficult.

An object of the present invention is to provide a signal processing method, apparatus, and program for effectively suppressing the reproduced sound from a speaker, which becomes an interfering sound during speech recognition or a hands-free call.

According to one aspect of the present invention, there is provided a signal processing apparatus comprising: an array processing unit that performs array processing on received sound signals, obtained by receiving reproduced sounds of a plurality of reproduction channels with a plurality of microphones, so as to have a sensitivity that differs depending on the direction from which the reproduced sound is received; a sound source signal generation unit that generates a sound source signal of at least one channel from audio signals of a plurality of channels; and a filtering unit that, for each of the plurality of reproduction channels, filters the sound source signal so that the sound source is localized in a direction in which the sensitivity is relatively low, thereby generating a plurality of filtered signals to be supplied to the plurality of reproduction channels.

According to the present invention, even when a microphone array composed of a small number of microphones is used, the reproduced sound of a multi-channel audio signal, which becomes an interfering sound during speech recognition or a hands-free call, can be effectively suppressed.

Block diagram showing a signal processing apparatus according to a first embodiment of the present invention.
Diagram showing the relationship between sound source positions and microphones, for explaining the operation of the first embodiment.
Block diagram showing a signal processing apparatus according to a second embodiment of the present invention.
Block diagram showing a signal processing apparatus according to a third embodiment of the present invention.
Block diagram showing a signal processing apparatus according to a fourth embodiment of the present invention.
Block diagram showing a signal processing apparatus according to a fifth embodiment of the present invention.
Block diagram showing a signal processing apparatus according to a sixth embodiment of the present invention.
Block diagram showing an electronic device according to a seventh embodiment of the present invention.

Embodiments of the present invention will be described below.
(First embodiment)
As shown in FIG. 1, the signal processing apparatus according to the first embodiment includes an adder 103, filtering units 104-1 and 104-2, selectors 105-1 and 105-2, an array processing unit 108, and a phase control unit 109.

The adder 103, the filtering units 104-1 and 104-2, and the selectors 105-1 and 105-2 are placed between the audio input terminals 101-1 and 101-2, which each receive one channel of a multi-channel audio signal, and the multi-channel speakers 106-1 and 106-2. The array processing unit 108 forms a predetermined directivity by performing signal processing on the received sound signals from the plurality of microphones 107-1 to 107-N that form the microphone array, and outputs a processed sound signal.

Next, the operation of the signal processing apparatus according to the present embodiment will be described.
Audio signals of a plurality of channels, in this example two stereo channels, are input to the audio input terminals 101-1 and 101-2. The selectors 105-1 and 105-2 are, for example, changeover switches operated by a control signal from the control input terminal 102. When they are switched to the upper side, the audio signals from the audio input terminals 101-1 and 101-2 are selected and radiated as sound from the speakers 106-1 and 106-2, which serve as the plurality of reproduction channels. Hereinafter, the sound radiated from the speakers 106-1 and 106-2 is referred to as reproduced sound. The audio signals supplied to the speakers 106-1 and 106-2 are in practice amplified by an audio amplifier, which is omitted from the figure.

The audio signals from the audio input terminals 101-1 and 101-2 are also input to the adder 103, where the two signals are added to form a one-channel signal, that is, a monaural signal. Hereinafter, the one-channel signal output from the adder 103 is referred to as the sound source signal. The sound source signal output from the adder 103 is filtered by the filtering units 104-1 and 104-2, whose filter characteristics are described later. When the selectors 105-1 and 105-2 are switched to the lower side, the filtered signals from the filtering units 104-1 and 104-2 are selected and radiated as reproduced sound from the speakers 106-1 and 106-2, respectively.

The reproduced sound from the speakers 106-1 and 106-2 is received by the plurality of microphones 107-1 to 107-N, which output electric signals called received sound signals. The array processing unit 108 applies signal processing called array processing to the received sound signals from the microphones 107-1 to 107-N and outputs a processed sound signal from which interfering sounds have been removed.

In the present embodiment, the array processing unit 108 performs array processing such that the sensitivity differs depending on the direction from which the microphones 107-1 to 107-N receive the reproduced sound from the speakers 106-1 and 106-2. According to the present embodiment, the processed sound signal from the array processing unit 108 is a sound signal consisting mainly of the driver's speech, with interfering sounds removed. The interfering sounds are, for example, speech from the passenger seat and the reproduced sound from the speakers 106-1 and 106-2, as described later. The processed sound signal from the array processing unit 108 is input to, for example, a speech recognition device (not shown).

FIG. 2 shows a specific usage example of the signal processing apparatus according to the present embodiment, in which the apparatus is installed in an automobile. This example has two microphones and two speakers. The microphones 107-1 and 107-2 are installed, for example, on the rear-view mirror or at the center of the instrument panel. The speakers 106-1 and 106-2 are often placed, for example, in the front doors. In many cases further speakers are placed at the rear in addition to those in the front doors.

When the driver operates a car navigation system or the like by voice using speech recognition, only the driver's speech can be captured accurately by controlling the characteristics of the microphone array so as to form a directivity with high sensitivity in the direction of the driver 210 and low sensitivity (a null) in the direction of the passenger 211. Specifically, when a delay-and-sum array such as that described in Non-Patent Document 1 is used, the array processing unit 108 delays the received sound signals from the microphones 107-1 to 107-N and then adds them.

Here, by having the phase control unit 109 appropriately control the delay times in the array processing unit 108, the array processing unit 108 can perform array processing in which, for example, only acoustic signals arriving from the direction of the driver are added in phase and emphasized, while acoustic signals from other directions are cancelled because their phases are not aligned. That is, this array processing forms directivity in the direction of the driver.
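The delay-then-add processing described above can be sketched in a few lines. This is a minimal illustration only: the function name `delay_and_sum`, the integer sample delays, the 1 kHz test tone, and the assumed geometry (the target reaching each successive microphone three samples earlier) are all hypothetical and are not taken from the patent.

```python
import numpy as np

def delay_and_sum(x, tau_samples):
    """Delay channel n (0-based) by n * tau_samples samples, then average.

    x has shape (N, T): one row per microphone, in array order.
    The division by N is a normalization chosen for this sketch.
    """
    n_mics, n_samples = x.shape
    out = np.zeros(n_samples)
    for n in range(n_mics):
        # np.roll wraps around, which is harmless for the periodic toy signal below
        out += np.roll(x[n], n * tau_samples)
    return out / n_mics

fs = 8000
t = np.arange(800) / fs                      # a whole number of 1 kHz periods
target = np.sin(2 * np.pi * 1000 * t)
# Assumed geometry: the target reaches microphone n three samples earlier per step
mics = np.stack([np.roll(target, -3 * n) for n in range(4)])

steered = delay_and_sum(mics, 3)     # delays match the arrival direction
unsteered = delay_and_sum(mics, 0)   # delays do not match
```

With matching delays the channels line up and `steered` reproduces the tone, whereas with mismatched delays the channels partly cancel and `unsteered` is attenuated.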

When the received sound signals from the N microphones 107-1 to 107-N are denoted Xn(t) (n = 1, ..., N), the processed sound signal Y(t) from the array processing unit 108 is expressed by the following equation.
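The equation itself is not reproduced in this text. A standard delay-and-sum form that fits the definitions given here, offered as an assumption rather than as the exact expression of the patent, is:

```latex
Y(t) = \sum_{n=1}^{N} X_n\bigl(t - (n-1)\,\tau\bigr)
```

In this form the signal of the n-th microphone is delayed by (n-1)τ before summation. The sign convention and the absence of a 1/N normalization are likewise assumptions.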

Here, the microphones 107-1 to 107-N are assumed to be equally spaced and arranged in the order of the subscript n, and τ is the delay time for bringing the received sound signals Xn(t) from the microphones 107-1 to 107-N into phase for the arrival direction of the target sound.

As another example, as in the Griffiths-Jim array described in Non-Patent Document 2, an adaptive filter may be used to form a null in the direction of the interfering sound and thereby remove the interfering sound selectively. In that case, the number of interfering-sound directions that can be removed is generally N-1, where N is the number of microphones. Various other methods have been proposed, but the present embodiment does not depend on the microphone array method itself, and any method can be used.

As a design policy for such a microphone array, it is natural to design it to suppress the speech of a passenger sitting in the passenger seat so that only the driver's speech is captured accurately. If the only interfering sound is speech from the passenger seat, the interfering sound can then be suppressed effectively.

However, when reproduced sound, which is also an interfering sound, is being output from the speakers at the same time, it is difficult to suppress the reproduced sound as well, because speech from the passenger seat and reproduced sound from the speakers generally arrive from different directions. In particular, when the number of microphones is small, the number of directions in which nulls can be formed is also small. For example, with two microphones, once a null is formed in the direction of the passenger seat, no null can be formed in any other direction, and the reproduced sound from the speakers cannot be suppressed.
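The two-microphone limitation can be illustrated with a fixed delay-and-subtract null. This is a sketch under stated assumptions, not the adaptive Griffiths-Jim processing cited above: the function `delay_and_subtract`, the white-noise sources, and the inter-microphone delays of 2 and -1 samples are all hypothetical.

```python
import numpy as np

def delay_and_subtract(x1, x2, d):
    """Cancel a source that reaches microphone 2 d samples after microphone 1."""
    # np.roll wraps around, which is acceptable for this toy check
    return x2 - np.roll(x1, d)

rng = np.random.default_rng(0)
passenger = rng.standard_normal(1024)
speaker = rng.standard_normal(1024)

# Assumed inter-microphone delays: 2 samples for the passenger, -1 for the speaker.
# The single null is steered to the passenger (d = 2) in both calls.
out_passenger = delay_and_subtract(passenger, np.roll(passenger, 2), 2)
out_speaker = delay_and_subtract(speaker, np.roll(speaker, -1), 2)
```

The source whose delay matches the null is cancelled, while the source arriving with a different delay passes through with substantial power, which is the situation the paragraph above describes for the speaker sound.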

Furthermore, when the speakers output reproduced sound in stereo, various sound source directions may exist at the same time, and array processing with a small number of microphones has its limits. Even with a monaural signal, nulls must be formed in two directions, including that of the passenger seat, which is difficult with two microphones.

In the present embodiment, the sound source signal from the adder 103 is filtered in advance by the filtering units 104-1 and 104-2 so that, when the reproduced sound from the speakers 106-1 and 106-2 is received by the microphones 107-1 to 107-N, the sound source is localized in a direction in which the sensitivity is relatively low. Specifically, the filtering units 104-1 and 104-2 perform filtering so that, for example, the reproduced sound from the speakers 106-1 and 106-2 is observed as if it had arrived from the direction of the passenger seat.

In this way, the speech from the passenger seat and the reproduced sound from the speakers 106-1 and 106-2 can be cancelled by a single null in the same direction. That is, even when many nulls cannot be formed because the number of microphones 107-1 to 107-N is small, the reproduced sound from the speakers 106-1 and 106-2 can be suppressed together with the speech from the passenger seat. Because not only the speech from the passenger seat but also the reproduced sound from the speakers 106-1 and 106-2 is suppressed in the processed sound signal, speech recognition can be performed accurately.

The above description covers the case in which the array processing unit 108 performs array processing so that the sensitivity in a predetermined direction (for example, the direction of the passenger seat) is at a local minimum, and the filtering units 104-1 and 104-2 perform filtering so that the sound source of the reproduced sound from the speakers 106-1 and 106-2 is localized in that predetermined direction. Depending on the microphone array method, however, it is not always necessary to filter so that the reproduced sound from the speakers 106-1 and 106-2 appears to come from the same direction as the speech from the passenger seat.

For example, Japanese Patent Laid-Open No. 2007-10897 discloses a method of designing the directivity of an array so as to generate a desired response (target signal) for each of a set of training sound source positions. With this method, the array can be designed to emphasize signals emitted from the range of directions in which the driver may be present and to suppress all other signals. With such an array, the sound receiving direction of the microphone array for the reproduced sound need only lie outside the range of directions in which the driver may be present, and does not have to coincide with the direction of the speech from the passenger seat.

Thus, the array processing unit 108 may perform array processing so that the sensitivity within a predetermined range of directions is higher than the sensitivity in directions outside that range, and the filtering units 104-1 and 104-2 may perform filtering so that the sound source is localized in a direction outside that range.

When the filtered signals from the filtering units 104-1 and 104-2 are supplied to the speakers 106-1 and 106-2, the reproduced sound from the speakers 106-1 and 106-2 differs from the sound obtained when the original audio signals are reproduced, and the stereo impression is also lost. That is, the reproduced sound corresponding to the filtered signals sounds unnatural to the listener.

In the present embodiment, therefore, the selectors 105-1 and 105-2 select the filtered signals from the filtering units 104-1 and 104-2 and supply them to the speakers 106-1 and 106-2 only while the driver is speaking. At all other times they select the audio signals from the audio input terminals 101-1 and 101-2 and supply them to the speakers 106-1 and 106-2 for normal reproduction. This keeps the unnatural impression of the reproduced sound to a minimum.
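The signal path formed by the adder, the filtering units, and the selectors can be sketched as below. This is an illustration under assumptions: the function `route_to_speakers`, the use of FIR coefficient arrays `g1` and `g2`, and the toy values are hypothetical and do not come from the patent.

```python
import numpy as np

def route_to_speakers(left, right, g1, g2, driver_speaking):
    """Return the two speaker feeds for one block of stereo input."""
    if not driver_speaking:
        # Selectors on the "upper" side: pass the original stereo signal through
        return left, right
    # Adder: mix the two channels down to a single sound source signal
    source = left + right
    n = len(source)
    # Filtering units: one FIR filter per reproduction channel, truncated to n samples
    return np.convolve(source, g1)[:n], np.convolve(source, g2)[:n]

left = np.array([1.0, 2.0, 3.0])
right = np.array([0.5, 0.5, 0.5])
g1 = np.array([1.0])          # hypothetical filter: identity
g2 = np.array([0.0, 1.0])     # hypothetical filter: one-sample delay

normal = route_to_speakers(left, right, g1, g2, driver_speaking=False)
filtered = route_to_speakers(left, right, g1, g2, driver_speaking=True)
```

In a real system the switch would be driven by the control signal at terminal 102, and the filters would be the designed transfer functions rather than these placeholders.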

Next, a method of designing the filtering units 104-1 and 104-2 will be described.
The sound source signal supplied toward the speakers 106-1 and 106-2 is a monaural signal obtained by adding the stereo signals in the adder 103. When this sound source signal is filtered by the filtering units 104-1 and 104-2, reproduced by the speakers 106-1 and 106-2, and received by the two microphones 107-1 and 107-2, the transfer functions (y1, y2) from the sound source (the output of the adder 103) to the microphones 107-1 and 107-2 are expressed by the following equation (2).

\[
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
=
\begin{pmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{pmatrix}
\begin{pmatrix} g_1 \\ g_2 \end{pmatrix}
\tag{2}
\]

Here, hxy is the transfer function from speaker y to microphone x, and g1 and g2 are the transfer functions of the filtering units 104-1 and 104-2.

For the filtering units 104-1 and 104-2 to filter so that the reproduced sound from the speakers 106-1 and 106-2 appears to arrive from the direction of the passenger seat, the transfer functions (g1, g2) of the filtering units 104-1 and 104-2 should be designed so that (y1, y2) equals the transfer functions (a1, a2) from the passenger seat to the microphones 107-1 and 107-2. To do so, (a1, a2) is substituted for (y1, y2) in equation (2), which is then solved for (g1, g2).

With two microphones and two speakers, the matrix of transfer functions between the speakers and the microphones is square, so its inverse can usually be computed. When the number of microphones differs from the number of speakers, the inverse matrix is not defined, and a generalized inverse matrix is commonly used instead.
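Solving equation (2) for the filter transfer functions can be sketched per frequency bin as follows. This is a minimal illustration: the function `design_filters`, the per-bin array layout, and the random test values are assumptions. `numpy.linalg.pinv` is used because it coincides with the ordinary inverse for an invertible square matrix and otherwise gives the Moore-Penrose generalized inverse.

```python
import numpy as np

def design_filters(H, a):
    """Solve H[f] @ g[f] = a[f] for every frequency bin f.

    H: (F, n_mics, n_speakers) speaker-to-microphone transfer functions h_xy.
    a: (F, n_mics) target transfer functions, e.g. passenger seat to microphones.
    Returns g with shape (F, n_speakers).
    """
    # pinv operates on the last two axes, so all bins are handled at once
    return (np.linalg.pinv(H) @ a[..., None])[..., 0]

rng = np.random.default_rng(0)
F = 4
a = rng.standard_normal((F, 2)) + 1j * rng.standard_normal((F, 2))

# Two microphones, two speakers: square case
H_square = rng.standard_normal((F, 2, 2)) + 1j * rng.standard_normal((F, 2, 2))
g_square = design_filters(H_square, a)

# Two microphones, four speakers: non-square case, minimum-norm solution
H_wide = rng.standard_normal((F, 2, 4)) + 1j * rng.standard_normal((F, 2, 4))
g_wide = design_filters(H_wide, a)
```

When there are more microphones than speakers, an exact solution generally does not exist and the generalized inverse yields a least-squares fit instead.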

The transfer functions (g1, g2) of the filtering units 104-1 and 104-2 may vary greatly from frequency to frequency, so that the spectrum of the reproduced sound output from the speakers 106-1 and 106-2 for the filtered signals differs greatly from the spectrum of the reproduced sound for the audio signals before filtering. In such a case, the magnitudes of the transfer functions (g1, g2) can be adjusted so that the spectrum of the reproduced sound for the filtered signals matches that for the audio signals before filtering. This is possible because loudness is irrelevant to sound source localization: even if the transfer functions of the filtering units 104-1 and 104-2 are set to (A×g1, A×g2) using an arbitrary constant A, the localization of the sound source is not affected.
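The scaling argument can be written out in the notation of equation (2), with H denoting the matrix of the hxy and g the vector (g1, g2). Treating the ratio of the two microphone transfer functions as the quantity that carries the direction is an assumption made here for illustration, not a statement from the text.

```latex
H\,(A\,g) = A\,(H\,g) = A \begin{pmatrix} y_1 \\ y_2 \end{pmatrix},
\qquad
\frac{A\,y_1}{A\,y_2} = \frac{y_1}{y_2}
\quad (A \neq 0,\ y_2 \neq 0)
```

Under that assumption, multiplying both filters by the same constant changes the level at the microphones but leaves their relative response, and hence the apparent direction, unchanged.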

The control input terminal 102 in FIG. 1 receives a control signal that switches between reproducing the audio signals from the audio input terminals 101-1 and 101-2 directly through the speakers 106-1 and 106-2 and reproducing them after filtering by the filtering units 104-1 and 104-2. The selectors 105-1 and 105-2 are controlled by this control signal.

One way of providing the control signal is to detect the driver's speech sections (utterance sections) and control the selectors 105-1 and 105-2 so that the filtered signals from the filtering units 104-1 and 104-2 are supplied to the speakers 106-1 and 106-2 only during those sections. A speech section detector, for example, is used to detect the speech sections. Proposed detector methods include determining speech sections from signal power information, from a signal-to-noise ratio based on estimated noise, or from spectral information. A discrimination method based on a statistical approach (see Sohn J., Kim N.S., and Sung W., "A statistical model-based voice activity detection", IEEE Signal Process. Lett., pp.1-3, 1999, 16, (1)) is also widely used. Another conceivable way of providing the control signal is for the driver to indicate the speech section explicitly.
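Of the detector families mentioned above, the power-based one is the simplest to sketch. The following is an illustration only and is not the statistical model-based method cited; the function `detect_speech_frames`, the frame length, and the -30 dB threshold are hypothetical choices.

```python
import numpy as np

def detect_speech_frames(signal, frame_len=256, threshold_db=-30.0):
    """Flag frames whose mean power exceeds a fixed threshold (dB re. amplitude 1.0)."""
    n_frames = len(signal) // frame_len
    frames = np.reshape(signal[:n_frames * frame_len], (n_frames, frame_len))
    power = np.mean(frames ** 2, axis=1)
    # Small offset avoids log10(0) on silent frames
    power_db = 10.0 * np.log10(power + 1e-12)
    return power_db > threshold_db

fs = 8000
tone = 0.5 * np.sin(2 * np.pi * 200 * np.arange(512) / fs)   # stand-in for speech
signal = np.concatenate([np.zeros(512), tone, np.zeros(512)])
flags = detect_speech_frames(signal)
```

The resulting per-frame flags could serve as the control signal for the selectors; a practical detector would also need noise tracking and hangover logic, which are omitted here.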

When speech recognition is performed, the speaker often presses a button to indicate the start of an utterance. Using this utterance-start information as speech section information allows the speech section to be identified reliably, and it can be used as the control signal. Alternatively, the button may be pressed only once at the start of the utterance, with the end of the utterance determined by the program. In that case, the start of the utterance is given by the button and the end by the information from the speech section detector.

(Second embodiment)
FIG. 3 shows a signal processing apparatus according to a second embodiment, which is a modification of the first embodiment. The control input terminal 102 and the selectors 105-1 and 105-2 of FIG. 1 are removed, and the filtered signals from the filtering units 104-1 to 104-M are always supplied to the speakers 106-1 to 106-M. When it is acceptable to reproduce the filtered signals from the speakers 106-1 to 106-M at all times, the configuration of this embodiment is desirable, also from the viewpoint of simplifying the apparatus. In FIG. 3 there are M audio input terminals, M filtering units, and M speakers; as is clear from the description so far, M may be any number equal to or greater than two.

(Third embodiment)
FIG. 4 shows a signal processing apparatus according to a third embodiment, which is a modification of the first embodiment. The filtered signals from the filtering units 104-1 and 104-2 are first stored in storage units 110-1 and 110-2, and the filtered signals read out from the storage units 110-1 and 110-2 are supplied to the speakers 106-1 and 106-2. That is, in the present embodiment filtering is not performed online. Instead, the waveforms of the filtered signals obtained by filtering the sound source signal in advance with the filtering units 104-1 and 104-2 are stored in the storage units 110-1 and 110-2, and are read out as needed during actual use and reproduced by the speakers 106-1 and 106-2.

For example, among the signals to be reproduced by the speakers 106-1 and 106-2, signals that are fixed in advance, such as the fixed messages of a car navigation system, are better filtered beforehand and stored in the storage units 110-1 and 110-2 than filtered every time they are reproduced. At reproduction time, the already filtered waveforms are simply read from the storage units 110-1 and 110-2 and supplied to the speakers 106-1 and 106-2.
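The precompute-then-play idea described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the patent's implementation: the class `PrefilteredStore`, the helper `fir_filter`, the message identifier, and the two-tap filters are all hypothetical, and the filtering units are assumed to be simple FIR filters.

```python
def fir_filter(signal, taps):
    """Direct-form FIR convolution, truncated to the input length."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * signal[n - k]
        out.append(acc)
    return out


class PrefilteredStore:
    """Hypothetical stand-in for the storage units 110-1 and 110-2."""

    def __init__(self, taps_per_speaker):
        # One (assumed) FIR filter per reproduction channel.
        self.taps_per_speaker = taps_per_speaker
        self.waveforms = {}

    def register(self, message_id, source_signal):
        # Offline step: filter once per speaker and keep the results.
        self.waveforms[message_id] = [
            fir_filter(source_signal, taps) for taps in self.taps_per_speaker
        ]

    def play(self, message_id):
        # Run-time step: a lookup only, no filtering.
        return self.waveforms[message_id]


# Illustrative values only.
store = PrefilteredStore(taps_per_speaker=[[1.0, 0.0], [0.0, -1.0]])
store.register("turn_left", [1.0, 2.0, 3.0])
feed_1, feed_2 = store.play("turn_left")
```

The design trade-off is the usual one: memory for the stored waveforms is exchanged for zero filtering cost at playback, which only pays off for signals known in advance.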

Fixed messages are often monaural. In that case the same data can simply be regarded as being input to the storage units 101-1 and 101-2, and the same configuration as for stereo data works without any problem.

(Fourth Embodiment)
FIG. 5 shows a signal processing apparatus according to a fourth embodiment. In this embodiment, a correlation reduction unit 303 is provided in place of the adder 103 of FIG. 1, and the audio signals from the audio input terminals 101-1 and 101-2 are input to the correlation reduction unit 303. The correlation reduction unit 303 generates sound source signals of a plurality of channels (two in this example) whose mutual correlation is lower than the correlation between the audio signals of the plurality of channels (two in this example). Of the two-channel sound source signals from the correlation reduction unit 303, the first-channel sound source signal is input to filtering units 304-1 and 304-3, and the second-channel sound source signal is input to filtering units 304-2 and 304-4.

The filtering signals from the filtering units 304-1 and 304-2 are added by an adder 311-1 and then supplied to the speaker 106-1 via the selector 105-1. Similarly, the filtering signals from the filtering units 304-3 and 304-4 are added by an adder 311-2 and then supplied to the speaker 106-2 via the selector 105-2.

In this way, for each of the speakers 106-1 and 106-2, which are the reproduction channels, the two-channel sound source signals from the correlation reduction unit 303 are filtered, and the sum of the two filtering signals for each speaker is supplied to that speaker.
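The filter-and-add structure of FIG. 5 can be sketched as follows. This is an illustration only: it assumes FIR filters, and `fir_filter`, `render_speaker_feeds`, and all tap values are hypothetical placeholders rather than values from the patent. The routing (304-1 and 304-2 feeding speaker 106-1, 304-3 and 304-4 feeding speaker 106-2) follows the description above.

```python
def fir_filter(signal, taps):
    """Direct-form FIR convolution, truncated to the input length."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * signal[n - k]
        out.append(acc)
    return out


def render_speaker_feeds(sources, filters):
    """sources: [channel_1, channel_2] sound source signals.
    filters[k][j]: taps applied to source j on its way to speaker k,
    i.e. [[g1, g2], [g3, g4]] in the notation of this embodiment."""
    feeds = []
    for speaker_filters in filters:
        branches = [
            fir_filter(src, taps) for src, taps in zip(sources, speaker_filters)
        ]
        # Adder 311-k: sample-wise sum of the two filtered branches.
        feeds.append([sum(samples) for samples in zip(*branches)])
    return feeds


# Illustrative values only.
s1 = [1.0, 0.0, 0.0]
s2 = [0.0, 1.0, 0.0]
g1, g2 = [1.0], [0.5]          # toward speaker 106-1
g3, g4 = [0.0, 1.0], [2.0]     # toward speaker 106-2
feed_1, feed_2 = render_speaker_feeds([s1, s2], [[g1, g2], [g3, g4]])
```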

When the reproduced sounds from the speakers 106-1 and 106-2 are received by the two microphones 107-1 and 107-2, the transfer functions (y1, y3) and (y2, y4) from the sound sources (the two outputs of the correlation reduction unit 303) to the microphones 107-1 and 107-2 are expressed by the following equations (3) and (4).

Here, hxy is the transfer function from speaker y to microphone x, g1 and g3 are the transfer functions of the filtering units 304-1 and 304-3, and g2 and g4 are the transfer functions of the filtering units 304-2 and 304-4. Just as the transfer functions (g1, g2) of the filtering units 104-1 and 104-2 were obtained from equation (2) in the first embodiment, the transfer functions g1, g3 and g2, g4 can be obtained from equations (3) and (4) by assigning different directions to (y1, y3) and (y2, y4).
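Equations (3) and (4) themselves are not reproduced in this text. One form consistent with the definitions above is sketched below. It rests on assumptions that should be checked against the original figures: y1 and y2 are taken to be the responses at microphone 107-1, y3 and y4 those at microphone 107-2, and the routing is as described for FIG. 5 (g1 and g2 feed speaker 106-1, g3 and g4 feed speaker 106-2).

```latex
% Assumed form of equation (3): first-channel sound source to the two microphones
\begin{pmatrix} y_1 \\ y_3 \end{pmatrix}
=
\begin{pmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{pmatrix}
\begin{pmatrix} g_1 \\ g_3 \end{pmatrix}

% Assumed form of equation (4): second-channel sound source to the two microphones
\begin{pmatrix} y_2 \\ y_4 \end{pmatrix}
=
\begin{pmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{pmatrix}
\begin{pmatrix} g_2 \\ g_4 \end{pmatrix}
```

Under this assumed form, if the matrix of h terms is invertible at a given frequency, each filter pair follows by multiplying the desired (y1, y3) or (y2, y4) by the inverse of that matrix.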

Next, the operating principle of this embodiment is described.
In the first embodiment, the input audio signals of the plurality of channels are added by the adder 103 into a one-channel, that is, monaural, sound source signal, whereas in this embodiment sound source signals of a plurality of channels (a first channel and a second channel) are output. As described above, these sound source signals are filtered for each of the speakers 106-1 and 106-2, and the sum of the two filtering signals for each speaker is supplied to the speakers 106-1 and 106-2. The point to note here is that the reproduced sounds from the speakers 106-1 and 106-2 can interfere with each other, so that when they are received by the microphones 107-1 to 107-N the sound is localized in a direction different from the designed one.

To avoid this problem, in this embodiment the correlation reduction unit 303 reduces the correlation between the two-channel sound source signals. The audio signals from the audio input terminals 101-1 and 101-2 form a stereo signal and are therefore highly correlated, whereas the two-channel sound source signals output from the correlation reduction unit 303 have low correlation. Here the number of input audio channels and the number of correlation-reduced sound source channels are both two, but the two numbers need not be the same.

Specifically, the correlation reduction unit 303 may, for example, add the two input audio channels into a monaural signal, split that signal by frequency band, and output the low-band signal as the first-channel sound source signal and the high-band signal as the second-channel sound source signal. Alternatively, the monaural signal obtained by adding the two audio channels can be divided into many frequency components (subbands), and the components assigned to the two sound source channels alternately or at random. Further, when the individual signals (parts) making up the signal are known, as with an audio signal generated from MIDI (Musical Instrument Digital Interface) data, the two-channel sound source signals can be generated by separating the signal part by part.
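The alternating assignment of frequency components can be sketched as follows. This is a minimal illustration under stated assumptions: individual DFT bins stand in for the "frequency components", a naive O(n^2) DFT is used to keep the example dependency-free, and the function names are hypothetical. Because the two outputs occupy disjoint frequency bins, their inner product is zero up to rounding, while their sum still equals the monaural signal.

```python
import cmath


def dft(x):
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft_real(spec):
    n = len(spec)
    return [
        (sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def split_alternating_bins(left, right):
    """Sum two audio channels to mono, then deal the frequency bins
    alternately to two output sound source channels."""
    mono = [l + r for l, r in zip(left, right)]
    spec = dft(mono)
    n = len(spec)
    spec_a = [0j] * n
    spec_b = [0j] * n
    for k in range(n // 2 + 1):
        target = spec_a if k % 2 == 0 else spec_b
        mirror = (n - k) % n
        # Keep each bin together with its conjugate partner so that
        # both outputs remain real-valued.
        target[k] = spec[k]
        target[mirror] = spec[mirror]
    return idft_real(spec_a), idft_real(spec_b)


# Illustrative input only.
left = [float((i * 7) % 5) for i in range(16)]
right = [float((i * 3) % 4) for i in range(16)]
channel_a, channel_b = split_alternating_bins(left, right)
```

A random assignment, which the text also mentions, would only change how `target` is chosen for each bin pair.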

As described earlier, when the filtering signals are supplied to the speakers 106-1 and 106-2, the stereo impression of the sound reproduced from the speakers is lost. This embodiment can compensate for that loss. Specifically, when the array processing unit 108 gives the microphone array sensitivity minima in a plurality of directions, the generated multi-channel sound source signals can be assigned to the respective minima. When the microphone array has high sensitivity within a specific direction range, the generated multi-channel sound source signals can instead be localized in directions outside that range, which likewise compensates for the loss of stereo impression.

(Fifth Embodiment)
FIG. 6 shows a signal processing apparatus according to a fifth embodiment. It differs from the signal processing apparatus of the first embodiment in three respects: signal separation units 410-1 and 410-2 are added immediately after the audio input terminals 101-1 and 101-2, the selectors 105-1 and 105-2 of FIG. 1 are replaced by adders 312-1 and 312-2, and the control signal from the control input terminal 102 is routed to the signal separation units 410-1 and 410-2 instead of the selectors 105-1 and 105-2.

The two-channel audio signals from the audio input terminals 101-1 and 101-2 are separated by the signal separation units 410-1 and 410-2 into a component that is added by an adder 403 to form a monaural sound source signal (the monaural component) and a component that is output directly as two-channel stereo (the stereo component).

The former, the monaural component, is added by the adder 403 as in the first embodiment to form a monaural sound source signal. This sound source signal is filtered by the filtering units 304-1 and 304-2, and the resulting filtering signals are input to the adders 312-1 and 312-2, respectively. The latter, the stereo component, is input to the adders 312-1 and 312-2 as is. The adders 312-1 and 312-2 add the filtering signals and the stereo component signals, and the sums are output from the speakers 106-1 and 106-2 as reproduced sound.

According to this embodiment, the signal filtered by the filtering units 304-1 and 304-2 is only a part of the input audio signal, namely the component separated by the signal separation units 410-1 and 410-2. For example, when the source direction of each component of the audio signal is known in advance, or the components have been separated using a sound source separation technique or the like, the filtering units 304-1 and 304-2 need to filter only those components that lie within a direction range from which interfering speech may arrive, for example the direction of the passenger seat. Components outside this direction range can be suppressed by the array processing unit 108 even if they are output from the speakers 106-1 and 106-2 unmodified.

Thus, according to this embodiment, the filtering units 304-1 and 304-2 need to filter only those components of the input audio signal that lie within the direction range from which interfering speech may arrive. The loss of stereo impression caused by filtering in the sound reproduced from the speakers 106-1 and 106-2 can therefore be kept to a minimum.

In FIG. 6 the signal separation units 410-1 and 410-2 are controlled by the control signal from the control input terminal 102, but the control signal input can be omitted when no control is needed.

As another specific example, the signal separation units 410-1 and 410-2 can select the components in a specific frequency band. For example, when the input audio signal has a bandwidth of 20 kHz and the array processing unit 108 performs array processing over a band of 8 kHz, the components of the audio signal at 8 kHz and above are irrelevant to the array processing. The signal separation units 410-1 and 410-2 therefore separate only the components below 8 kHz, which are subject to array processing, as the monaural component that is added by the adder 403 to form the monaural sound source signal. The filtering units 304-1 and 304-2 filter the sound source signal output from the adder 403, which consists only of components below 8 kHz, and the filtering signals are input to the adders 312-1 and 312-2.

The components of the input audio signal at 8 kHz and above, on the other hand, are separated by the signal separation units 410-1 and 410-2 as the stereo component that is output directly as two-channel stereo, and are input to the adders 312-1 and 312-2, respectively. The adders 312-1 and 312-2 add the filtering signals and the stereo component signals, and the sums are output from the speakers 106-1 and 106-2 as reproduced sound. In this way, the loss of stereo impression caused by filtering in the sound reproduced from the speakers 106-1 and 106-2 can be kept to a minimum.
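The band-split variant can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: a brick-wall DFT split stands in for the signal separation units, scalar gains stand in for the filtering units, the 40 kHz sample rate is assumed (chosen so that the signal bandwidth is 20 kHz), and all function names are hypothetical.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft_real(spec):
    n = len(spec)
    return [
        (sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def split_bands(x, sample_rate, cutoff_hz):
    """Return (low, high): content below cutoff_hz and the remainder."""
    n = len(x)
    spec = dft(x)
    low_spec = [
        c if min(k, n - k) * sample_rate / n < cutoff_hz else 0j
        for k, c in enumerate(spec)
    ]
    low = idft_real(low_spec)
    high = [xi - li for xi, li in zip(x, low)]
    return low, high


def build_speaker_feeds(left, right, gains, sample_rate=40000, cutoff_hz=8000):
    low_l, high_l = split_bands(left, sample_rate, cutoff_hz)
    low_r, high_r = split_bands(right, sample_rate, cutoff_hz)
    # Adder 403: the low bands of both channels become one mono source.
    mono_low = [a + b for a, b in zip(low_l, low_r)]
    # Scalar gains are placeholders for the filtering units; the high
    # band bypasses them and is added back per channel (adders 312-1, 312-2).
    feed_1 = [gains[0] * m + h for m, h in zip(mono_low, high_l)]
    feed_2 = [gains[1] * m + h for m, h in zip(mono_low, high_r)]
    return feed_1, feed_2


# Illustrative input: a 10 kHz tone on the left channel only.
n = 40
tone_10k = [math.cos(2 * math.pi * 10 * t / n) for t in range(n)]
feed_1, feed_2 = build_speaker_feeds(tone_10k, [0.0] * n, gains=(0.5, -0.5))
```

With this input the tone lies above the cutoff, so it passes to the left feed unchanged and never reaches the right feed, which is the stereo-preserving behavior the text describes.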

(Sixth Embodiment)
FIG. 7 shows a signal processing apparatus according to a sixth embodiment. It differs from the fourth embodiment shown in FIG. 5 in three respects: signal separation units 410-1 and 410-2 are added immediately after the audio input terminals 101-1 and 101-2 of FIG. 5, the selectors 105-1 and 105-2 are replaced by adders 312-1 and 312-2, and the control signal from the control input terminal 102 is routed to the signal separation units 410-1 and 410-2 instead of the selectors 105-1 and 105-2.

In this embodiment, as in the fifth embodiment, filtering is applied to only a part of the components of the input signal. Whereas the fifth embodiment filters in the same way as the first embodiment, this embodiment filters the two-channel sound source signals output from the correlation reduction unit 303, as in the fourth embodiment.

Accordingly, as in the fifth embodiment, this embodiment can keep the loss of stereo impression caused by filtering in the sound reproduced from the speakers 106-1 and 106-2 to a minimum. In FIG. 7 the signal separation units 410-1 and 410-2 are controlled by the control signal from the control input terminal 102, but the control signal input can be omitted when no control is needed.

(Seventh Embodiment)
FIG. 8 shows an electronic apparatus 501 according to a seventh embodiment of the present invention, which includes a signal processing apparatus as described in the preceding embodiments. The electronic apparatus 501 is, for example, a personal computer or a portable communication terminal, and has a display unit 502. The speakers 106-1 and 106-2 and the microphones 107-1 and 107-2 are installed on the electronic apparatus 501, for example around the display unit 502, and a talker 503 operating the electronic apparatus 501 can speak into the microphones 107-1 and 107-2.

Next, the operating principle of this embodiment is described.
The sound reproduced from the speakers 106-1 and 106-2 on the basis of the output signal generated by the electronic apparatus 501 is radiated toward the talker 503. This reproduced sound is, for example, the voice of the other party in a call, or an audio signal such as music. The talker 503 speaks toward the microphones 107-1 to 107-N provided on the electronic apparatus 501, for example to address the other party in a call or to give instructions to the terminal.

If the source direction observed when the microphones 107-1 and 107-2 receive the reproduced sound from the speakers 106-1 and 106-2 overlaps the source direction observed when they receive the speech of the talker 503, the talker's voice and the reproduced sound become mixed in the received signals output from the microphones 107-1 and 107-2. In a call this produces an echo at the far end, and in speech recognition it causes recognition errors.

This problem can be avoided by using the signal processing apparatus described in the first to sixth embodiments to filter the sound reproduced from the speakers 106-1 and 106-2 in advance so that, when it is received by the microphones 107-1 and 107-2, its source direction lies outside the direction range in which the talker may be present.

The signal processing according to the embodiments of the present invention described above can be implemented in hardware, but it can also be executed in software on a computer such as a personal computer. The present invention can therefore provide a program for causing a computer to function as the signal processing apparatus described above, or a computer-readable storage medium storing that program.

The present invention is not limited to the embodiments exactly as described above; at the implementation stage the constituent elements can be modified without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the embodiments. For example, some constituent elements may be deleted from the full set shown in an embodiment, and constituent elements from different embodiments may be combined as appropriate.

108 ... Array processing unit
109, 109-1, 109-2 ... Phase control unit
110-1, 110-2 ... Storage unit
303 ... Correlation reduction unit
304-1 to 304-4 ... Filtering unit
312-1, 312-2, 403 ... Adder
410-1, 410-2 ... Signal separation unit
501 ... Electronic apparatus
502 ... Display unit

Claims

A signal processing apparatus comprising:
an array processing unit that performs array processing on received signals, obtained by receiving reproduced sounds of a plurality of reproduction channels with a plurality of microphones, so as to have a sensitivity that differs depending on the direction from which the reproduced sound is received;
a sound source signal generation unit that generates a sound source signal of at least one channel from audio signals of a plurality of channels; and
a filtering unit that filters the sound source signal for each of the plurality of reproduction channels so that the sound source is localized in a direction in which the sensitivity is relatively low, thereby generating a plurality of filtering signals to be supplied to the plurality of reproduction channels.

The signal processing apparatus according to claim 1, wherein the sound source signal generation unit generates a one-channel sound source signal by adding the plurality of channels of audio signals.

The signal processing apparatus according to claim 1, wherein the array processing unit performs the array processing so that the sensitivity in a predetermined direction takes a minimum value, and
the filtering unit performs the filtering so that the sound source is localized in the predetermined direction.

The signal processing apparatus according to claim 1, wherein the array processing unit performs the array processing so that the sensitivity in directions within a predetermined direction range is higher than the sensitivity in directions outside the direction range, and
the filtering unit performs the filtering so that the sound source is localized in a direction outside the direction range.

The signal processing apparatus according to claim 1, wherein the sound source signal generation unit generates, from the audio signals of the plurality of channels, sound source signals of a plurality of channels whose mutual correlation is lower than the correlation between the audio signals of the plurality of channels, and
the filtering unit performs the filtering on each of the sound source signals of the plurality of channels for each of the plurality of reproduction channels.

The signal processing apparatus according to claim 1, further comprising a selector that selects either the audio signals of the plurality of channels or the plurality of filtering signals and supplies the selected signals to the plurality of reproduction channels.

The signal processing apparatus according to claim 6, wherein the selector selects the filtering signals while a talker is speaking and selects the audio signals while the talker is not speaking.

The signal processing apparatus according to claim 1, further comprising:
a signal separation unit that separates each of the audio signals of the plurality of channels into a first signal component used to generate the sound source signal and a second signal component other than the first signal component; and
an adder that adds the second signal component to the plurality of filtering signals supplied to the plurality of reproduction channels.

An electronic apparatus comprising:
a display unit;
a plurality of microphones provided on the display unit;
a plurality of speakers provided on the display unit; and
the signal processing apparatus according to claim 1, wherein the filtering unit performs the filtering so that the sound source is localized in a direction outside a direction range in which a user facing the display unit is present.

A signal processing program for causing a computer to function as:
an array processing unit that performs array processing on received signals, obtained by receiving reproduced sounds of a plurality of reproduction channels with a plurality of microphones, so as to have a sensitivity that differs depending on the direction from which the reproduced sound is received;
a sound source signal generation unit that generates a sound source signal of at least one channel from audio signals of a plurality of channels; and
a filtering unit that filters the sound source signal for each of the plurality of reproduction channels so that the sound source is localized in a direction in which the sensitivity is relatively low, thereby generating a plurality of filtering signals to be supplied to the plurality of reproduction channels.

A signal processing method comprising:
performing array processing on received signals, obtained by receiving reproduced sounds of a plurality of reproduction channels with a plurality of microphones, so as to have a sensitivity that differs depending on the direction from which the reproduced sound is received;
generating a sound source signal of at least one channel from audio signals of a plurality of channels; and
filtering the sound source signal for each of the plurality of reproduction channels so that the sound source is localized in a direction in which the sensitivity is relatively low, thereby generating a plurality of filtering signals to be supplied to the plurality of reproduction channels.