JPWO2008143142A1

JPWO2008143142A1 - Sound source localization apparatus, sound source localization method, and program

Info

Publication number: JPWO2008143142A1
Application number: JP2009515195A
Authority: JP
Inventors: 信孝中村
Original assignee: Tokyo Electron Device Ltd
Current assignee: Tokyo Electron Device Ltd
Priority date: 2007-05-18
Filing date: 2008-05-15
Publication date: 2010-08-05
Also published as: WO2008143142A1

Abstract

The single channel direction detection unit (13) switches channels in turn, creates direction data for each channel from the pair of sound source data (E12) corresponding to that channel, and outputs it to the FFT unit (14) as channel-specific direction data (E13). The FFT unit (14) concatenates the channel-specific direction data (E13), regards the result as time-series direction data, and Fourier transforms it to obtain direction data frequency components (E14). The filter unit (15) suppresses the even-order harmonic components among the direction data frequency components (E14) and outputs the result as filtered direction data frequency components (E15). The IFFT unit (16) applies an inverse Fourier transform to the filtered direction data frequency components (E15) to obtain filtered direction data (E16). The sound source localization unit (17) obtains the direction of the sound source from the filtered direction data (E16).

Description

The present invention relates to a sound source localization apparatus, a sound source localization method, and a program.

Beamforming is known as a method for realizing highly accurate sound source localization. Sound source localization means using a plurality of microphones and identifying the direction of a sound source from the phase differences and intensity differences of the sound source data input to each microphone. Beamforming is a technique that integrates multi-channel sound source data to realize highly accurate sound source localization. Adaptive beamforming and delay-and-sum beamforming are the known types of beamforming.

Adaptive beamforming measures sound source data with a plurality of microphones and estimates the direction of the sound source by attenuating sound source data from directions other than the target direction. Adaptive beamforming is described, for example, in Non-Patent Document 1.

Delay-and-sum beamforming measures sound source data with a plurality of microphones and estimates the direction of the sound source by emphasizing sound source data from the target direction. Delay-and-sum beamforming is described, for example, in Patent Document 1.

Non-Patent Document 1: The Institute of Electronics, Information and Communication Engineers (ed.), "Acoustic Systems and Digital Processing"
Patent Document 1: JP 2001-309483 A

With the adaptive beamforming described above, sound source data from directions other than the target direction can be attenuated only for as many sources as (number of microphones − 1). Consequently, when reflections and noise produce many sources of sound data outside the target direction, their influence cannot be suppressed.

Even with delay-and-sum beamforming, the influence of sound wave reflection and noise cannot be suppressed completely.

An object of the present invention is to provide a sound source localization apparatus, a sound source localization method, and a program that suppress the influence of reflection and noise and perform highly accurate sound source localization.

To achieve the above object, a sound source localization apparatus according to a first aspect of the present invention comprises:
a microphone array composed of at least three microphone elements;
direction detection means for receiving audio data for each microphone element from the microphone array and, for each channel corresponding to a combination of two microphones in a predetermined positional relationship, obtaining direction data indicating the direction of the sound source relative to the positions of the two microphones corresponding to that channel;
Fourier transform means for receiving the direction data for each channel from the direction detection means, concatenating the received direction data, regarding the result as time-series data, and Fourier transforming it to obtain the frequency components of the waveform represented by the concatenated direction data;
filter means for receiving the frequency components obtained by the Fourier transform means and outputting, as filtered frequency components, the received frequency components with the even-order harmonic components suppressed;
inverse Fourier transform means for receiving the filtered frequency components from the filter means and applying an inverse Fourier transform to them to obtain filtered data; and
sound source localization means for obtaining the direction of the sound source from the filtered data obtained by the inverse Fourier transform means.

Each channel may correspond to a combination of two adjacent microphones.

The filter means may receive the frequency components obtained by the Fourier transform means, suppress the even-order harmonic components among them, further attenuate the odd-order harmonic components, and output the result as the filtered frequency components.

The direction detection means may further comprise:
single channel direction detection means for outputting a channel selection signal for selecting a channel, receiving the audio data of the channel corresponding to the channel selection signal, and obtaining direction data for each channel; and
channel switching means for receiving audio data for each microphone element from the microphone array and outputting the audio data corresponding to the channel indicated by the selection signal received from the single channel direction detection means.

The microphone array may be composed of omnidirectional microphone elements.

To achieve the above object, a sound source localization method according to a second aspect of the present invention comprises:
a direction detection step of receiving audio data for each microphone element from a microphone array composed of at least three microphone elements and, for each channel corresponding to a combination of two microphones in a predetermined positional relationship, obtaining direction data indicating the direction of the sound source relative to the positions of the two microphones corresponding to that channel;
a Fourier transform step of concatenating the direction data for each channel obtained in the direction detection step, regarding the result as time-series direction data, and Fourier transforming it to obtain the frequency components of the waveform represented by the concatenated direction data;
a filter step of outputting, as filtered frequency components, the frequency components obtained in the Fourier transform step with the even-order harmonic components suppressed;
an inverse Fourier transform step of applying an inverse Fourier transform to the filtered frequency components output in the filter step to obtain filtered data; and
a sound source localization step of obtaining the direction of the sound source from the filtered data obtained in the inverse Fourier transform step.

To achieve the above object, a program according to a third aspect of the present invention causes a computer to function as:
direction detection means for obtaining, based on audio data for each microphone element input from a microphone array composed of at least three microphone elements, and for each channel corresponding to a combination of two microphones in a predetermined positional relationship, direction data indicating the direction of the sound source relative to the positions of the two microphones corresponding to that channel;
Fourier transform means for concatenating the direction data for each channel obtained by the direction detection means, regarding the result as time-series data, and Fourier transforming it to obtain the frequency components of the waveform represented by the concatenated direction data;
filter means for obtaining, as filtered frequency components, the frequency components obtained by the Fourier transform means with the even-order harmonic components suppressed;
inverse Fourier transform means for applying an inverse Fourier transform to the filtered frequency components obtained by the filter means to obtain filtered data; and
sound source localization means for obtaining the direction of the sound source from the filtered data obtained by the inverse Fourier transform means.

According to the sound source localization apparatus, the sound source localization method, and the program of the present invention, applying a Fourier transform to multi-channel sound source data suppresses the influence of sound wave reflection and noise and enables highly accurate sound source localization.

FIG. 1 is a functional block diagram showing the overall configuration of a sound source localization apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the physical configuration of the main part of the sound source localization apparatus shown in FIG. 1.
FIG. 3 is a diagram schematically showing the configuration of the microphone array.
FIG. 4 is a diagram for explaining the angle reference used when expressing single channel direction data.
FIG. 5 is a flowchart showing an example of the sound source localization process of the sound source localization apparatus.
FIG. 6 is a flowchart showing the channel-specific direction detection process in FIG. 5.
FIG. 7 is a flowchart showing the localization process in FIG. 5.
FIG. 8 is a diagram defining the confidence of the IID (Interaural Intensity Difference).
FIG. 9 is a diagram showing the measured angle and the ideal angle of the direction data in each channel.
FIG. 10 is a diagram showing the waveform formed by the time-series direction data.
FIG. 11A is a diagram showing an example of the frequency components of the direction data before filtering.
FIG. 11B is a diagram showing an example of the frequency components of the direction data after filtering.
FIG. 11C is a diagram showing an example of the frequency components of the direction data after filtering that attenuates the intensity of the odd-order harmonics.
FIG. 12 is a diagram showing the waveform formed by the filtered direction data.
FIG. 13 is a diagram showing a modification of the microphone arrangement.
FIG. 14 is a diagram showing the relationship between an arbitrary reference direction and the channels in the modification shown in FIG. 13.
FIG. 15 is a diagram showing the waveform of the data subjected to FFT (Fast Fourier Transform) processing in the modification shown in FIG. 13.
FIG. 16 is a diagram showing the relationship between an arbitrary reference direction and the channels in another modification of the microphone arrangement.

Explanation of symbols

10 Sound source localization apparatus
11 Microphone array
12 Channel switching unit
13 Single channel direction detection unit
14 FFT unit
15 Filter unit
16 IFFT (Inverse Fast Fourier Transform) unit
17 Sound source localization unit
18 Control unit
19 Temperature measuring unit
20 Sound source
50 Direction detection unit

A sound source localization apparatus according to an embodiment of the present invention is described below with reference to the drawings.

As shown in FIG. 1, the sound source localization apparatus 10 of this embodiment includes a microphone array 11, a direction detection unit 50, an FFT (Fast Fourier Transform) unit 14, a filter unit 15, an IFFT (Inverse Fast Fourier Transform) unit 16, a sound source localization unit 17, a control unit 18, and a temperature measuring unit 19.

The sound source localization apparatus 10 detects the sound wave SW output by the sound source 20 with the plurality of microphones that make up the microphone array 11, and obtains the accurate direction of the sound source from the sound source data output by those microphones.

The microphone array 11 is a device that receives the sound wave SW output by the sound source 20. The microphone array 11 is composed of a plurality of microphones.

As shown in FIG. 3, the microphone array 11 consists of 16 microphones (microphones 11A to 11P) arranged at equal intervals on a circle. Each microphone is preferably omnidirectional. The sound wave SW output from the sound source 20 is input to each of the microphones 11A to 11P. Each of the microphones 11A to 11P converts the input sound wave SW into analog audio data (analog sound source data E11) and outputs it to the channel switching unit 12. In other words, the microphone array 11 outputs all of the analog sound source data E11 from the 16 microphones 11A to 11P to the direction detection unit 50.

This specification uses the concept of a channel to identify a pair of adjacent microphones. With 16 microphones, there are 16 channels.

The direction detection unit 50 obtains, for each channel corresponding to a combination of two microphones in a predetermined positional relationship, direction data indicating the direction of the sound source from the input analog sound source data E11. The direction detection unit 50 includes the channel switching unit 12 and the single channel direction detection unit 13.

The channel switching unit 12 is composed of, for example, a multiplexer. When there are 16 channels, the channel selection signal Ech is, for example, a 4-bit digital signal (2^4 = 16). From the analog sound source data E11, the channel switching unit 12 selects the sound source data E12 of the pair of adjacent microphones corresponding to the channel indicated by the channel selection signal Ech received from the single channel direction detection unit 13. The channel switching unit 12 then converts the selected sound source data E12 into digital signals with its built-in A/D (Analog/Digital) converters 12a and 12b and outputs them to the single channel direction detection unit 13.
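
As a rough illustration of the sizing above, the following Python sketch computes how many bits a channel selection signal needs and encodes a channel number as a bit pattern. The function names and the zero-based encoding (channel 1 maps to 0000) are assumptions made for this example; the source only states that 4 bits suffice for 16 channels.

```python
def select_bits(num_channels: int) -> int:
    """Smallest number of bits that can address num_channels channels."""
    if num_channels < 1:
        raise ValueError("num_channels must be positive")
    return max(1, (num_channels - 1).bit_length())


def encode_channel(ch: int, num_channels: int = 16) -> str:
    """Encode a 1-based channel number as a fixed-width bit string.

    Assumption: channel 1 maps to the all-zero code.
    """
    if not 1 <= ch <= num_channels:
        raise ValueError("channel out of range")
    return format(ch - 1, f"0{select_bits(num_channels)}b")
```

With 16 channels, `select_bits(16)` gives 4, matching the 4-bit signal described in the text.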

The single channel direction detection unit 13 obtains, for each channel, direction data indicating the direction of the sound source from the signals of the input pair of sound source data E12. The single channel direction detection unit 13 then outputs the obtained direction data to the FFT unit 14 as channel-specific direction data E13.

The single channel direction detection unit 13 switches the pair of sound source data E12 that it receives by outputting the channel selection signal Ech to the channel switching unit 12. It obtains direction data for each channel while switching channels in sequence. Channels and channel switching are described in detail below.

When the single channel direction detection unit 13 selects channel 1, it outputs to the channel switching unit 12 a channel selection signal Ech that causes the channel switching unit 12 to output the sound source data E12 of the microphones 11A and 11B. The single channel direction detection unit 13 then obtains the direction data of channel 1 from the input sound source data E12.

When direction detection for one channel is complete, the single channel direction detection unit 13 switches to the next channel. Specifically, it switches channels in the order channel 1 → channel 2 → ... → channel 16. For example, channel 2 corresponds to the pair of microphones 11B and 11C. The same applies from channel 3 onward, and channel 16 corresponds to the pair of microphones 11P and 11A. Each time it switches channels, the single channel direction detection unit 13 obtains the direction data for that channel.
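
The channel numbering described above can be sketched in Python as follows. The function name `channel_pair` is hypothetical, and the sketch assumes the microphones are labelled with consecutive letters starting at A, as in the 16-microphone example.

```python
import string


def channel_pair(ch: int, num_mics: int = 16) -> tuple[str, str]:
    """Return the labels of the adjacent microphone pair for a 1-based channel."""
    if not 1 <= num_mics <= 26:
        raise ValueError("this sketch labels at most 26 microphones")
    if not 1 <= ch <= num_mics:
        raise ValueError("channel out of range")
    labels = string.ascii_uppercase[:num_mics]  # 'A'..'P' for 16 microphones
    first = labels[ch - 1]
    second = labels[ch % num_mics]  # wraps around so the last channel pairs P with A
    return f"11{first}", f"11{second}"
```

The modulo in the second index is what closes the ring: the last channel pairs the last microphone with microphone 11A.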

The angle reference used when expressing direction data is now described with reference to FIG. 4.

FIG. 4 illustrates the case in which the direction of the sound source 20 is detected using channel 1, that is, the sound source data E12 output by the microphones 11A and 11B. As shown in FIG. 4, the midpoint of the line segment L1 connecting the microphone 11A and the microphone 11B is taken as the center point O. The sound source direction θ is the angle between the half line L2, which extends from the center point O toward the sound source 20 side perpendicular to the line segment L1, and the line segment L3 connecting the center point O and the sound source 20. Seen from the center point O, the direction of the microphone 11A is −90°, the direction of the microphone 11B is 90°, and the direction in which the half line L2 extends is 0°. For the other channels, the angle reference is obtained by substituting the microphone whose letter comes earlier in the alphabet for the microphone 11A and the other microphone for the microphone 11B (for the pair P and A, P is treated as the earlier one).
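
The angle convention above can be illustrated with a small geometric sketch in Python. It assumes a two-dimensional layout with known coordinates for both microphones and the source, which is only useful for checking the convention; the function name `source_angle_deg` and the coordinate representation are assumptions made for this example, not part of the source.

```python
import math


def source_angle_deg(mic_a, mic_b, source):
    """Angle of the source seen from the midpoint O of the pair, in degrees.

    0 deg is the perpendicular to segment A-B on the source side,
    -90 deg points at microphone A and +90 deg at microphone B.
    """
    ox = (mic_a[0] + mic_b[0]) / 2.0
    oy = (mic_a[1] + mic_b[1]) / 2.0
    # Unit vector from A towards B (the +90 deg direction).
    ux = mic_b[0] - mic_a[0]
    uy = mic_b[1] - mic_a[1]
    length = math.hypot(ux, uy)
    if length == 0.0:
        raise ValueError("microphones must be at distinct positions")
    ux, uy = ux / length, uy / length
    sx, sy = source[0] - ox, source[1] - oy
    along = sx * ux + sy * uy           # component towards microphone B
    across = abs(-sx * uy + sy * ux)    # component along L2, taken on the source side
    return math.degrees(math.atan2(along, across))
```

For microphones at (-1, 0) and (1, 0), a source at (5, 5) lies at 45° and a source at (-5, 5) at −45°.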

The FFT unit 14 regards the channel-specific direction data E13 received from the single channel direction detection unit 13 as time-series data and applies a fast Fourier transform to the waveform that the data represents. The FFT unit 14 then outputs the transformed values to the filter unit 15 as direction data frequency components E14.

The filter unit 15 filters the direction data frequency components E14 received from the FFT unit 14. The filter unit 15 then outputs the filtered values to the IFFT unit 16 as filtered direction data frequency components E15.

The IFFT unit 16 applies an inverse Fourier transform to the filtered direction data frequency components E15 received from the filter unit 15. The IFFT unit 16 then outputs the transformed values to the sound source localization unit 17 as filtered direction data E16.

The sound source localization unit 17 estimates the direction of the sound source 20 from the filtered direction data E16 received from the IFFT unit 16 and outputs it as localization data E17.
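
The FFT, filter, and IFFT stages described above can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes that the fundamental corresponds to one cycle over the full sweep of channels (so bin k is the k-th harmonic), that "suppress" means setting the bin to zero, and that the DC bin is left untouched. A plain DFT is used instead of an FFT library so the sketch stays self-contained; all function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform (adequate for a handful of channels)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse of dft()."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def suppress_even_harmonics(direction_data):
    """Zero the even-order harmonic bins of the concatenated direction data.

    Assumptions: bin k is the k-th harmonic, and bin 0 (DC) is kept.
    """
    n = len(direction_data)
    spec = dft(direction_data)
    for k in range(1, n):
        order = min(k, n - k)  # bin k and its mirror n-k carry the same harmonic
        if order % 2 == 0:
            spec[k] = 0j
    return [value.real for value in idft(spec)]
```

Feeding in a fundamental plus a second harmonic over 16 samples returns the fundamental alone, which is the behaviour the filter stage is described as providing for even-order components.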

The control unit 18 outputs, among other things, a control signal E18 for controlling the timing of the channel switching unit 12, the single channel direction detection unit 13, the FFT unit 14, the filter unit 15, the IFFT unit 16, or the sound source localization unit 17.

The temperature measuring unit 19 is composed of, for example, a temperature sensor that measures the air temperature. The temperature measuring unit 19 converts the measured air temperature into an air temperature signal E19, which is an electrical signal, and outputs it to the single channel direction detection unit 13. The temperature measuring unit 19 may be omitted when an approximate value is used on the assumption that the temperature is room temperature.

The sound source 20 is the source of the sound wave SW and is the target that this sound source localization apparatus localizes. The sound source 20 need not conform to any particular standard, such as a MIDI (Musical Instrument Digital Interface) sound source; it may be a person, an animal, or anything else that generates a sound wave SW.

Physically, the sound source localization apparatus 10 shown in FIG. 1 includes a computer 110, as shown in FIG. 2. The computer 110 consists of a multiplexer 111, an A/D converter 112, a ROM (Read Only Memory) 113, a RAM (Random Access Memory) 114, a CPU (Central Processing Unit) 115, and an input/output unit 116. In terms of the functional configuration of the sound source localization apparatus 10 shown in FIG. 1, the computer 110 comprises the direction detection unit 50, the FFT unit 14, the filter unit 15, the IFFT unit 16, the sound source localization unit 17, and the control unit 18.

The multiplexer 111 corresponds to the channel switching unit 12. When the analog sound source data E11 from the microphone array 11 is input, the multiplexer 111 switches the analog sound source data E11 in sequence and outputs it under the control of the CPU 115.

The A/D converter 112 converts the analog sound source data E11 output by the multiplexer 111 into digital data and outputs it.

The CPU 115 executes operations such as computation and control in accordance with the operation program stored in the ROM 113, using the RAM 114 as main memory and work area. In this way, the CPU 115 implements the single channel direction detection unit 13, the FFT unit 14, the filter unit 15, the IFFT unit 16, the sound source localization unit 17, the control unit 18, and so on.

The input/output unit 116 provides data indicating the determined direction of the sound source to other devices. The input/output unit 116 also outputs the channel selection signal Ech to the multiplexer 111 under the control of the CPU 115.

Next, the operation of the sound source localization apparatus configured as described above is explained with reference to the flowchart of FIG. 5.

When the sound source localization process starts, the single channel direction detection unit 13 (CPU 115) first initializes the channel number ch to "1" (step S10).

Next, the single channel direction detection unit 13 switches channels (step S20). In this step, the single channel direction detection unit 13 outputs to the channel switching unit 12 a channel selection signal Ech that causes the pair of sound source data E12 corresponding to the channel number ch to be input from the channel switching unit 12. Immediately after the sound source localization process starts, the channel number ch is "1", so the single channel direction detection unit 13 outputs a channel selection signal Ech that causes the pair of sound source data E12 of the microphones 11A and 11B to be input.

In accordance with the channel selection signal Ech from the single channel direction detection unit 13, the channel switching unit 12 outputs the digital sound source data E12 of the microphones 11A and 11B to the single channel direction detection unit 13.

The single channel direction detection unit 13 performs the channel-specific direction detection process on the input digital sound source data E12 of the microphones 11A and 11B (step S30). In this step, the single channel direction detection unit 13 detects the direction for one channel using a conventional method.
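
Steps S10 to S30 above, repeated over the channel order given earlier (channel 1 through channel 16), can be sketched as a simple loop. The callbacks `read_pair` and `detect_direction` are hypothetical stand-ins for the channel switching unit and the per-channel detection; the loop termination is an assumption based on the stated channel order, since the remainder of the flowchart is not covered here.

```python
def detect_all_channels(read_pair, detect_direction, num_channels=16):
    """Sweep channels 1..num_channels and collect one direction value per channel.

    read_pair(ch) -> (signal_a, signal_b)     hypothetical channel switch + A/D read
    detect_direction(signal_a, signal_b)      hypothetical single-channel detector
    """
    directions = []
    ch = 1  # step S10: initialise the channel number
    while ch <= num_channels:
        signal_a, signal_b = read_pair(ch)  # step S20: switch channel and read the pair
        directions.append(detect_direction(signal_a, signal_b))  # step S30
        ch += 1
    return directions
```

The returned list, one value per channel in channel order, is the concatenated sequence that the later stages treat as time-series data.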

FIG. 6 is a flowchart for explaining the channel-specific direction detection process in the flowchart shown in FIG. 5. First, the single channel direction detection unit 13 performs frequency analysis on each of the two input sets of sound source data E12, for example by FFT processing. In this way, the single channel direction detection unit 13 obtains the spectrum information of each of the two sets of sound source data E12 (step S31). After the frequency analysis, the single channel direction detection unit 13 may remove noise components by filtering or similar processing.

Next, the single channel direction detection unit 13 extracts the harmonic components from the spectrum information of each of the two sets of sound source data E12 (step S32).
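
The source does not say how the harmonic components are located in the spectrum. As one possible reading, the sketch below assumes the fundamental's bin index is already known and simply picks the spectral values at integer multiples of it. The function name `extract_harmonics` and its parameters are hypothetical.

```python
def extract_harmonics(spectrum, fundamental_bin, max_bin=None):
    """Pick spectrum values at integer multiples of the fundamental bin.

    Returns (bin_index, value) pairs, fundamental first.
    Assumption: the fundamental bin has been estimated beforehand.
    """
    if fundamental_bin < 1:
        raise ValueError("fundamental_bin must be at least 1")
    last = len(spectrum) - 1
    limit = last if max_bin is None else min(max_bin, last)
    return [(k, spectrum[k]) for k in range(fundamental_bin, limit + 1, fundamental_bin)]
```

Applying the same bin indices to both microphones' spectra yields matching pairs of harmonic values for the per-harmonic comparisons in the following steps.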

Next, the single channel direction detection unit 13 calculates the actual data IPD (Interaural Phase Difference) from the phase difference for each overtone, including the fundamental (step S33). Specifically, the actual data IPD for each overtone (defined as Δφ_R) can be obtained from Equation (1).
(Equation 1)
Δφ_R = arctan(I[Sb] / R[Sb]) − arctan(I[Sa] / R[Sa])   (1)
where
Sa: spectral value of the overtone for microphone 11A
Sb: spectral value of the overtone for microphone 11B
R[Sa], R[Sb]: real part of the overtone spectrum
I[Sa], I[Sb]: imaginary part of the overtone spectrum
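As a rough illustration of Equation (1), the following Python sketch computes the phase difference of one overtone from two complex spectral values. The function names are hypothetical. Two choices here are assumptions that the text does not state: `cmath.phase` (that is, atan2) is used in place of arctan(I/R) so that all four quadrants are handled, and the result is wrapped into (−π, π].

```python
import cmath
import math

def wrap_phase(phi):
    """Wrap an angle in radians into the interval (-pi, pi]."""
    wrapped = (phi + math.pi) % (2.0 * math.pi) - math.pi
    return math.pi if wrapped == -math.pi else wrapped

def actual_ipd(sa, sb):
    """Phase difference between the overtone spectra of microphones 11B and 11A.

    sa, sb: complex spectral values of the same overtone (Sa and Sb).
    cmath.phase(z) is atan2(Im z, Re z); it stands in for arctan(I/R) of
    Equation (1) so that all four quadrants are handled (an assumption).
    """
    return wrap_phase(cmath.phase(sb) - cmath.phase(sa))
```

For spectral values whose phases are 0.2 rad and 0.7 rad, `actual_ipd` returns approximately 0.5 rad.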

Next, the single channel direction detection unit 13 calculates the actual data IID (Interaural Intensity Difference) from the intensity difference for each overtone (step S34). Specifically, let Da and Db be the dB (decibel) values of the spectrum of each overtone in the spectrum information of the sound source data of the microphones 11A and 11B, respectively. The actual data IID for each overtone (defined as ΔIs) can then be obtained from Equation (2).
(Equation 2)
ΔIs = Da − Db   (2)
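Equation (2) can be sketched in Python as follows. The function names are hypothetical, and the dB value is assumed to be 20·log10 of the spectral magnitude; the text does not say how Da and Db are derived from the spectrum.

```python
import math

def level_db(s):
    """Level of a complex spectral value in dB (assumed: 20*log10 of magnitude).

    Raises ValueError for a zero magnitude, since log10(0) is undefined.
    """
    return 20.0 * math.log10(abs(s))

def actual_iid(sa, sb):
    """Equation (2): delta_Is = Da - Db, with Da and Db in dB."""
    return level_db(sa) - level_db(sb)
```

With this definition, an overtone ten times stronger in amplitude at microphone 11A than at microphone 11B gives an IID of about +20 dB.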

Next, the single channel direction detection unit 13 obtains the direction information S_I using the actual data IID for each overtone obtained from Equation (2). Direction is determined from the actual data IPD when the overtone frequency is below the threshold fth, and from the actual data IID when the overtone frequency is equal to or above the threshold fth. Accordingly, the direction information S_I is obtained from the actual data IID of each overtone between the threshold fth and fmax, the maximum frequency to be considered. That is, if H(i) is defined as the function that lists the overtone frequencies in ascending order, the direction information S_I can be obtained from Equation (3). For example, when the frequency of the fundamental is f_0, the function H(i) can be written as H(0) = f_0, H(1) = 2f_0, H(2) = 3f_0, and so on.

Here, nth is the integer that satisfies H(nth) ≥ fth and for which H(nth) is closest to fth. For example, when fth = 1500 Hz, H(0) = 600 Hz, H(1) = 1200 Hz, H(2) = 1800 Hz, and H(3) = 2400 Hz, the value of nth is 2. In addition, n−1 is the integer for which H(n−1) equals fmax.
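The selection of nth can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch assumes that H(i) = (i + 1)·f_0 as in the example above. Because H(i) is ascending, the first index with H(i) ≥ fth is also the one closest to fth.

```python
def overtone_frequencies(f0, fmax):
    """H(i) = (i + 1) * f0 for all overtones up to fmax, in ascending order."""
    count = int(fmax // f0)
    return [(i + 1) * f0 for i in range(count)]

def threshold_index(h, fth):
    """Smallest index nth with H(nth) >= fth, or None if no overtone qualifies."""
    for i, freq in enumerate(h):
        if freq >= fth:
            return i
    return None
```

With f_0 = 600 Hz, fmax = 2400 Hz, and fth = 1500 Hz, `threshold_index` returns 2, which matches the example in the text.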

The threshold fth can be obtained from Equation (4), where ν is the speed of sound (m/s) and λ is the distance (m) between the microphone 11A and the microphone 11B.
(Equation 4)
fth = ν / λ   (4)

For example, when ν is 340 m/s and λ is 20 cm, the threshold fth is 1700 Hz. In this case, direction is determined using the IPD for overtone frequencies below 1700 Hz, and the direction information S_I is calculated using the IID for overtone frequencies of 1700 Hz or above. The speed of sound ν can be obtained from the air temperature signal E19 (air temperature) detected by the temperature measuring unit 19.
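The threshold calculation of Equation (4) is simple enough to check numerically. In the Python sketch below the function names are hypothetical. The text does not say how the speed of sound is derived from the air temperature; the linear approximation 331.5 + 0.6·T (T in degrees Celsius) is a common rule of thumb and is used here purely as an assumption.

```python
def speed_of_sound(temp_c):
    """Approximate speed of sound in air (m/s) at temp_c degrees Celsius.

    Linear approximation 331.5 + 0.6 * T; an assumption, not from the source.
    """
    return 331.5 + 0.6 * temp_c

def ipd_iid_threshold(v, mic_distance_m):
    """Equation (4): fth = v / lambda, with lambda the microphone spacing in m."""
    return v / mic_distance_m
```

Calling `ipd_iid_threshold(340.0, 0.20)` reproduces the 1700 Hz threshold of the example.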

If S_I obtained from Equation (3) is close to 0, the sound source can be considered to be in the front direction (the direction in which the half line L2 in FIG. 4 extends). If S_I is negative, the sound source can be considered to be to the right (the direction of the microphone 11B in FIG. 4), and if S_I is positive, to the left (the direction of the microphone 11A in FIG. 4).

Next, the single channel direction detection unit 13 obtains a model IPD (step S35). Specifically, if θ' denotes an angle taken at 5° intervals and f denotes the frequency of the overtone, the model IPD (defined as Δφ_M) can be obtained from Equation (5).
(Equation 5)
Δφ_M = (2πf / ν) × λ × sin θ'   (5)
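A minimal Python sketch of Equation (5) follows. The names are hypothetical, the default values of ν and λ are taken from the 340 m/s and 20 cm example, and the candidate range of −90° to +90° in 5° steps is an assumption (the text states only the 5° interval).

```python
import math

def model_ipd(f, theta_deg, v=340.0, mic_distance_m=0.20):
    """Equation (5): (2*pi*f / v) * lambda * sin(theta')."""
    return (2.0 * math.pi * f / v) * mic_distance_m * math.sin(math.radians(theta_deg))

# Candidate directions theta' every 5 degrees (range -90..+90 is an assumption).
CANDIDATE_ANGLES = list(range(-90, 91, 5))

def model_ipd_table(f, v=340.0, mic_distance_m=0.20):
    """Model IPD for every candidate direction at overtone frequency f."""
    return {theta: model_ipd(f, theta, v, mic_distance_m) for theta in CANDIDATE_ANGLES}
```

At f = 850 Hz and θ' = 90°, the model IPD under these defaults is π rad, which is consistent with the threshold of 1700 Hz being the frequency at which the phase difference can reach a full cycle.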

Next, the single channel direction detection unit 13 obtains a certainty factor for the IPD and a certainty factor for the IID (step S36). Specifically, the actual data IPD obtained from Equation (1) is compared with the model IPD, and the certainty factor of the IPD for each direction at 5° intervals is obtained from a Gaussian probability distribution. The certainty factor of the actual data IID is defined from the direction information S_I obtained from Equation (3), as shown in the table of FIG. 8.

FIG. 8 is a table that defines the certainty factor of the IID. For example, when θ' is between 90° and 35°, the certainty factor of the IID is 0.35 when the direction information S_I obtained from Equation (3) is "+", and 0.65 when it is "−".

Next, the single channel direction detection unit 13 extracts the direction information and outputs it to the FFT unit 14 as the channel-specific direction data E13 (step S37). Specifically, the certainty factor of the IPD and the certainty factor of the IID are combined using Dempster-Shafer theory, which combines basic probabilities inferred from independent evidence, and the direction with the highest combined certainty factor is taken as the true direction. When the direction information has been extracted, the single channel direction detection unit 13 completes the channel-specific direction detection processing (step S30).
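The combination step can be sketched as follows. The text does not give the combination formula, so this Python sketch makes an explicit assumption: each certainty factor is treated as a simple support function that assigns its mass to "the source is at θ'" and the remainder to the whole frame (ignorance). Under that assumption Dempster's rule has no conflicting mass and reduces to the expression below. The function names and the numbers in the usage note are hypothetical.

```python
def combine_beliefs(bf_ipd, bf_iid):
    """Dempster's rule for two simple support functions on the same hypothesis.

    Each source puts mass bf on "the source is at theta'" and 1 - bf on the
    whole frame (ignorance), so there is no conflicting mass to normalise.
    """
    return bf_ipd * bf_iid + (1.0 - bf_ipd) * bf_iid + bf_ipd * (1.0 - bf_iid)

def most_certain_direction(ipd_beliefs, iid_beliefs):
    """Return the candidate angle whose combined certainty factor is highest.

    Both arguments map candidate angle (degrees) to a value in [0, 1].
    """
    combined = {theta: combine_beliefs(ipd_beliefs[theta], iid_beliefs[theta])
                for theta in ipd_beliefs}
    return max(combined, key=combined.get)
```

For instance, combining an IPD certainty factor of 0.5 with an IID certainty factor of 0.65 gives 0.825, which is higher than either input because both pieces of evidence support the same direction.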

When the channel-specific direction detection processing is completed, the single channel direction detection unit 13 determines whether or not the channel number ch is 16 (step S40). If the channel number ch is not 16, that is, if direction detection for all 16 channels is not yet complete, the single channel direction detection unit 13 increments the channel number ch (step S50) and returns to channel switching (step S20).

If, on the other hand, the channel number ch is 16, that is, if direction detection for all 16 channels is complete, the single channel direction detection unit 13 proceeds to the localization processing (step S60).
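The channel loop of steps S20 to S50 can be sketched in Python as below. Several details are assumptions: the microphones are labelled 11A to 11P, channel ch pairs each microphone with its neighbour (channel 1 being 11A and 11B, as stated above), and channel 16 wraps around to the first microphone. `detect_direction` is a hypothetical callback standing in for the channel-specific direction detection processing of step S30.

```python
import string

NUM_CHANNELS = 16
# Assumed labels 11A .. 11P for the 16 microphones.
MIC_LABELS = ["11" + letter for letter in string.ascii_uppercase[:NUM_CHANNELS]]

def microphone_pair(ch):
    """Microphone pair for channel ch (1-based): channel 1 -> (11A, 11B)."""
    first = (ch - 1) % NUM_CHANNELS
    second = ch % NUM_CHANNELS
    return MIC_LABELS[first], MIC_LABELS[second]

def scan_channels(detect_direction):
    """Run the per-channel detection for ch = 1 .. 16 and collect the results."""
    directions = []
    ch = 1
    while True:
        # Steps S20 and S30: select the pair, then detect the direction.
        directions.append(detect_direction(microphone_pair(ch)))
        # Step S40: stop once all 16 channels have been processed.
        if ch == NUM_CHANNELS:
            return directions
        # Step S50: move on to the next channel.
        ch += 1
```

The returned list holds one direction value per channel in selection order, which is the form the localization processing expects.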

FIG. 7 is a flowchart for explaining the localization processing in the flowchart shown in FIG. 5. First, the FFT unit 14 creates time-series direction data from the channel-specific direction data E13 (step S61). Specifically, the FFT unit 14 treats the direction data for the 16 channels acquired in the channel-specific direction detection processing (step S30) as time-series data and concatenates it to create the time-series direction data. The direction data of each channel is described below with reference to FIG. 9.

FIG. 9 shows an example of the measured angle and the ideal angle of the direction data for each channel. The measured angle is the direction data actually acquired in the channel-specific direction detection processing (step S30) when the microphone array 11 and the sound source 20 are in the positional relationship shown in FIG. 3. The ideal angle is the direction data that should be acquired in the channel-specific direction detection processing (step S30) for the same positional relationship.

As illustrated in FIG. 9, the measured angle actually acquired in the channel-specific direction detection processing (step S30) differs from the ideal angle. This is because sound reflections, noise, and the like are present during actual direction detection. Plotting the data of FIG. 9 with the channel on the horizontal axis and the angle on the vertical axis gives a triangular wave like that shown in FIG. 10.

In FIG. 10, the vertical axis represents the angle (°) and the horizontal axis represents the channel (ch). Because the channels are selected at a constant period, the channel (ch) can also be regarded as time (s). In other words, the graph of FIG. 10 plots the channel-specific direction data E13 concatenated in the order of detection (selection) timing, and forms a waveform with a constant period. FIG. 10 shows time-series direction data for three periods (16 × 3 channels).
In FIG. 10, the waveform formed by the measured angles is shown by a solid line, and the waveform formed by the ideal angles is shown by a broken line.

As described above, the time-series direction data is simply the direction data acquired in the channel-specific direction detection processing (step S30) arranged in time order. Therefore, as long as the sound source 20 does not move, the time-series direction data takes the same or nearly the same values each time it is acquired, and its waveform is periodic.

The FFT unit 14 performs FFT processing on the created time-series direction data (step S62). That is, the FFT unit 14 regards the horizontal axis of the waveform shown in FIG. 10 as time and converts the time-series direction data into the direction data frequency components E14 by well-known FFT processing. The FFT unit 14 then outputs the direction data frequency components E14 to the filter unit 15.

The filter unit 15 filters the direction data frequency components E14 input from the FFT unit 14 (step S63). The time-series direction data formed from the ideal angles has a shape close to a triangular wave on the time axis, as shown by the broken line in FIG. 10. This is because the microphones are arranged at regular intervals: when the sound source 20 is sufficiently far away relative to the size of the microphone array 11, the angle of the sound source 20 as seen from the microphone pair of the selected channel can be assumed to change by a fixed amount (360°/16 = 22.5°) each time the channel is switched. In contrast, the time-series direction data formed from the measured angles does not have a shape close to a triangular wave on the time axis, as shown by the solid line in FIG. 10, because of reflections and noise. The filter unit 15 filters the direction data frequency components E14 so as to remove the components caused by reflections and noise.
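The ideal triangular sequence can be generated with a few lines of Python. This is a sketch under stated assumptions: a microphone pair is assumed to report angles only in the range −90° to +90° (so angles beyond that range fold back), and `offset_deg` is a hypothetical parameter for the angle seen by the first channel. The function names are not from the text.

```python
def fold_to_pair_angle(angle_deg):
    """Fold any angle into [-90, 90], the range assumed for a two-microphone pair."""
    a = (angle_deg + 180.0) % 360.0 - 180.0   # now in [-180, 180)
    if a > 90.0:
        return 180.0 - a
    if a < -90.0:
        return -180.0 - a
    return a

def ideal_direction_data(offset_deg=0.0, num_channels=16):
    """Ideal per-channel angles: the source angle advances 360/num_channels per channel."""
    step = 360.0 / num_channels
    return [fold_to_pair_angle(offset_deg + ch * step) for ch in range(num_channels)]
```

With the defaults, the sequence rises from 0° to 90° in 22.5° steps, falls to −90°, and rises back toward 0°, which is one period of a triangular wave.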

Specifically, the filter unit 15 extracts only the frequency components that make up a triangular wave and removes the other frequency components. Because a triangular wave consists only of the fundamental and the odd-order harmonics, the filter unit 15 removes the even-order harmonics. An example of the filter processing is described with reference to FIGS. 11(a) and 11(b), in which f denotes the frequency of the fundamental.

FIG. 11(a) shows an example of the frequency components of the direction data before the filter processing. As shown in FIG. 11(a), before the filter processing the data contains the second to sixth harmonics in addition to the fundamental.

Filter processing is then performed to cut the frequency components of the even-order harmonics (everything other than the fundamental and the odd-order harmonics). FIG. 11(b) shows an example of the frequency components of the direction data after the filter processing. As shown in FIG. 11(b), only the frequency components of the fundamental and the odd-order harmonics remain after the filter processing. In other words, the even-order harmonic components, which are considered to be caused by noise and reflections, have been cut. Neither the phase (phase spectrum) nor the intensity (intensity spectrum) of the remaining frequency components has changed. Thus, by cutting only the even-order harmonics of the direction data frequency components E14, the filter processing removes the influence of noise and reflections and brings the time-series direction data after IFFT processing closer to a triangular wave.
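The transform, filter, and inverse transform chain can be sketched in Python as follows. This is an illustration under assumptions rather than the device's implementation: a plain O(N²) discrete Fourier transform stands in for the FFT, the input is assumed to be exactly one period with an even number of samples (so bin k is the k-th harmonic and the mirrored bin N − k has the same parity), and bin 0 (the mean) is left untouched. The function names are hypothetical.

```python
import cmath
import math

def dft(x):
    """Discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def suppress_even_harmonics(direction_data):
    """Transform, zero the even-order harmonic bins, and transform back."""
    spectrum = dft(direction_data)
    for k in range(2, len(spectrum), 2):   # bins 2, 4, ...; bin 0 (mean) is kept
        spectrum[k] = 0.0
    return idft(spectrum)
```

Applied to a 16-sample triangular wave with a second-harmonic disturbance added, the function returns the undisturbed triangular wave to within rounding error, because a triangular wave that is antisymmetric over half a period has no energy in the even bins.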

The filter unit 15 outputs the filtered data to the IFFT unit 16 as the filtered direction data frequency components E15.

The IFFT unit 16 performs IFFT processing, that is, the inverse of the FFT processing executed by the FFT unit 14, on the filtered direction data frequency components E15 input from the filter unit 15 (step S64). Through the IFFT processing, the IFFT unit 16 converts the filtered direction data frequency components E15 into the filtered direction data E16, which is time-series direction data. The filtered direction data E16 is described with reference to FIG. 12.

FIG. 12 shows the waveform formed by the filtered direction data E16. In FIG. 12, the vertical axis represents the angle (°) and the horizontal axis represents time (s). Like the direction data before the filter processing, the filtered direction data E16 consists of direction data for 16 channels. Therefore, although the horizontal axis in FIG. 12 is time (s) in order to show the waveform as time-series direction data, the horizontal axis may equally be regarded as the channel (ch). FIG. 12 shows time-series direction data for three periods (16 × 3 channels), with each period separated by dotted lines.

As shown in FIG. 12, the waveform formed by the filtered direction data E16 is closer to a triangular wave than the waveform formed by the direction data before the filter processing (the waveform shown by the solid line in FIG. 10). This is because the filter processing cut the even-order harmonic components from the direction data frequency components E14 and thereby removed the frequency components caused by noise and reflections. When the IFFT processing is complete, the IFFT unit 16 outputs the filtered direction data E16 to the sound source localization unit 17.

The sound source localization unit 17 obtains the direction of the sound source 20 from the filtered direction data E16 input from the IFFT unit 16 and outputs it as the localization data E17 (step S65). As described above, the frequency components caused by noise and reflections have been removed from the filtered direction data E16. The sound source localization unit 17 obtains the localization data E17 on the basis of this filtered direction data E16. An example of a specific method for obtaining the localization data follows.

As described above, the microphones are arranged at equal intervals on a circle, and the single channel direction detection unit 13 detects the direction while shifting from one microphone pair to the next. The horizontal axis in FIG. 12 can therefore be regarded as indicating not only time but also the channel, or the position on the circle. Consequently, the direction of the sound source 20 can be specified, for example, from the channel or the position on the circle that corresponds to a point at which the detected angle is 0° (the points shown as black dots in FIG. 12).
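One way to locate the 0° points is linear interpolation between neighbouring channels, sketched below in Python. This is an assumption about the method; the text states only that the 0° points are used. The function names are hypothetical, and the sketch does not decide which of the two crossings in a period faces the sound source, since the text does not specify that convention.

```python
def zero_crossings(filtered):
    """Fractional channel positions where the filtered data crosses 0 degrees.

    The data is treated as one period of a cyclic waveform, so the last sample
    is followed by the first. Crossings are located by linear interpolation.
    """
    n = len(filtered)
    points = []
    for i in range(n):
        a, b = filtered[i], filtered[(i + 1) % n]
        if a == 0.0:
            points.append(float(i))          # sample lies exactly on 0 degrees
        elif a * b < 0.0:
            points.append(i + a / (a - b))   # sign change between i and i + 1
    return points

def position_to_azimuth(position, num_channels=16):
    """Map a (fractional) channel position to an angle around the array."""
    return (position * 360.0 / num_channels) % 360.0
```

For a 16-channel triangular wave that starts at 0°, the crossings fall at positions 0 and 8, which are half a turn apart on the circle.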

When the sound source localization unit 17 has output the obtained localization data E17 to the outside, the localization processing (step S60) is complete. This completes the sound source localization process.

The present invention is not limited to the above embodiment, and various modifications and applications are possible.

In the above embodiment, the number of microphones is 16, but the number of microphones is arbitrary. To increase accuracy, the number of microphones may be set to, for example, 24 or 32. To increase speed and save power, the number of microphones may be reduced to, for example, 8.

In the above embodiment, the microphones are arranged at equal intervals, but the arrangement of the microphones is arbitrary. For example, as shown in FIG. 13, five microphones may be arranged at the positions 11A, 11C, 11D, 11I, and 11L.

In this case, an arbitrary reference direction Vr is defined, for example as shown in FIG. 14. The direction of each channel relative to the reference direction Vr is expressed by an intersection angle φ. In FIG. 14, the channel directions are indicated by straight arrows. The reference direction Vr can be, for example, the same as the direction of the channel formed by the pair of microphones 11A and 11C. The directions of the channels formed by the pair of microphones 11C and 11D, the pair of microphones 11D and 11I, the pair of microphones 11I and 11L, and the pair of microphones 11C and 11D are at the intersection angles φ1, φ2, φ3, and φ4, respectively, relative to the reference direction Vr.

For the configuration of FIG. 14, the data is plotted as shown in FIG. 15, with the angle (the direction of the sound source 20 relative to the reference direction of each channel) on the vertical axis and the intersection angle φ on the horizontal axis. FFT processing is then performed on the waveform formed by the plotted data, with the horizontal axis (intersection angle φ) regarded as time.

The microphones also need not be arranged on a circle. FIG. 16 shows an example in which four microphones 11Q, 11R, 11S, and 11T are arranged to form a parallelogram. In the example of FIG. 16, the reference direction Vr is defined, for example, as the direction from the microphone 11T toward the microphone 11Q. In this case as well, the direction of each channel relative to the reference direction Vr is obtained as the intersection angles φ1, φ2, φ3, and φ4, as in the example of FIG. 14. The data is then plotted and FFT processing is performed in the same way as in the example of FIG. 15.

In the above embodiment, the sound source localization unit 17 specifies the direction of the sound source 20 from the points at which the detected angle is 0°. However, the direction of the sound source 20 may instead be specified from the points at which the waveform of the filtered direction data E16 shown in FIG. 12 peaks. Alternatively, the value at each point of the filtered direction data E16 may be treated as the direction data of the corresponding channel, and the direction of the sound source 20 may be specified by evaluating the direction data of all channels together.

In the above embodiment, the filter unit 15 performs filter processing that only cuts the even-order harmonic components from the direction data frequency components E14. However, to bring the waveform formed by the filtered direction data E16 still closer to a triangular wave, the filter processing may attenuate the intensity of the odd-order harmonics in addition to cutting the even-order harmonic components.

Specifically, after the even-order harmonic components are cut, the intensity of each odd-order harmonic is, for example, replaced by its intensity divided by the order of the harmonic. That is, only the amplitude is changed, so that the phase does not change. In other words, each odd-order harmonic is attenuated while the ratio between its real part and its imaginary part is maintained.
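The attenuation variant can be sketched in Python as an operation on a complex spectrum. The function name is hypothetical. The sketch assumes the spectrum is the discrete Fourier transform of exactly one period with an even number of samples, so that bins k and N − k both belong to harmonic order min(k, N − k), and it leaves bin 0 (the mean) untouched.

```python
def attenuate_odd_harmonics(spectrum):
    """Zero even-order bins and divide each odd-order bin by its harmonic order.

    spectrum: complex DFT of one period (even length N). Bins k and N - k both
    belong to harmonic order min(k, N - k). Dividing a complex value by a
    positive real number scales its real and imaginary parts equally, so the
    phase is preserved.
    """
    n = len(spectrum)
    out = list(spectrum)
    for k in range(1, n):
        order = min(k, n - k)
        out[k] = out[k] / order if order % 2 == 1 else 0.0
    return out
```

The fundamental (order 1) passes through unchanged, the third harmonic is reduced to one third of its amplitude, and so on, while every retained bin keeps its original phase.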

FIG. 11(c) shows an example of the frequency components of the direction data after filter processing that attenuates the intensity of the odd-order harmonics. As shown in FIG. 11(c), the frequency components of the fundamental and the odd-order harmonics remain after the filter processing, but the intensity of each odd-order harmonic is attenuated according to its order. Even in this case, the phase of the remaining frequency components is unchanged. Applying such filter processing to the direction data frequency components E14 removes the influence of noise and reflections and brings the filtered direction data E16 closer to a triangular wave.

In the embodiment described above, the sound source localization unit 17 obtains only the direction of the sound source from the direction data for the 16 channels, but the distance to the sound source may also be obtained by a conventional method. For example, the distance to the sound source can be obtained by superimposing the direction data for the 16 channels.

The hardware configuration and flowcharts described above are examples and can be changed and modified as desired. In the above embodiment, the sound source localization device 10 is described as being built from discrete components, but most of the processing executed by the sound source localization device 10 may instead be executed by a processor circuit such as a CPU or a DSP (Digital Signal Processor). Such a configuration makes it possible to simplify the circuit configuration.

The above embodiment shows an example in which the present invention is applied to a sound source localization device and a sound source localization method. However, the present invention is widely applicable to removing noise from any data sequence that forms a constant periodic function when plotted against some parameter. For example, suppose that plotting a data group against a certain parameter yields a waveform that is periodic in that parameter. In this case, FFT is performed with the parameter regarded as time, and harmonic components of the waveform are suppressed to remove the noise. IFFT is then performed, and the resulting noise-reduced waveform can be used for subsequent processing.

This application is based on Japanese Patent Application No. 2007-133329, filed on May 18, 2007. The specification, claims, and drawings of that application are incorporated herein by reference in their entirety.

The present invention can be suitably used in any technical field that uses sound source localization, for example robots and monitoring systems.

Claims

A microphone array composed of at least three or more microphone elements;
Voice data for each microphone element is input from the microphone array, and for each channel corresponding to a combination of two microphones having a predetermined positional relationship, the direction of the sound source is shown with reference to the positions of the two microphones corresponding to the channel. Direction detection means for obtaining direction data;
The direction data for each channel is input from the direction detection means, the input direction data for each channel is concatenated, and the frequency component of the waveform represented by the concatenated direction data is obtained by Fourier transforming the data regarded as time series data. Fourier transform means;
Filter means for inputting the frequency component obtained by the Fourier transform means, and outputting as a filtered frequency component a frequency component of the even-order harmonics suppressed among the inputted frequency components;
An inverse Fourier transform unit that inputs a filtered frequency component from the filter unit, and obtains filtered data by performing an inverse Fourier transform on the filtered frequency component;
Sound source localization means for obtaining the direction of the sound source from the filtered data obtained by the inverse Fourier transform means;
Comprising
A sound source localization device characterized by that.

The channel corresponds to a combination of two adjacent microphones;
The sound source localization apparatus according to claim 1.

The filter means includes
The frequency component obtained by the Fourier transform means is input, the frequency component of the even-order harmonic is suppressed among the input frequency components, and the frequency component of the odd-order harmonic is further attenuated and output as the filtered frequency component. ,
The sound source localization apparatus according to claim 1.

The direction detecting means includes
A single channel direction detection means for outputting a channel selection signal for selecting a channel, inputting voice data of a channel corresponding to the channel selection signal, and obtaining direction data for each channel;
Channel switching means for inputting voice data for each microphone element from the microphone array and outputting voice data corresponding to a channel indicated by the selection signal input from the single channel direction detecting means;
Further comprising
The sound source localization apparatus according to claim 1.

The microphone array is
Consists of omnidirectional microphone elements
The sound source localization apparatus according to claim 1.

A direction detecting step of receiving audio data for each microphone element from a microphone array composed of three or more microphone elements and, for each channel corresponding to a combination of two microphones in a predetermined positional relationship, obtaining direction data indicating the direction of the sound source relative to the position of the two microphones corresponding to that channel;
A Fourier transform step of concatenating the direction data for each channel obtained by the direction detecting step, regarding the result as time-series direction data, and Fourier transforming it to obtain the frequency components of the waveform represented by the direction data;
A filter step of outputting, as a filtered frequency component, the result of suppressing the even-order harmonic frequency components among the frequency components obtained by the Fourier transform step;
An inverse Fourier transform step of obtaining filtered data by performing an inverse Fourier transform on the filtered frequency component output by the filter step; and
A sound source localization step of obtaining the direction of the sound source from the filtered data obtained by the inverse Fourier transform step.
A sound source localization method characterized by comprising the above steps.
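The method steps can be sketched end to end in plain Python. This is a sketch under stated assumptions, not the claimed implementation: the per-channel direction data is taken as given, the harmonic order is assumed to equal the DFT bin index of the concatenated series, suppression is modelled as zeroing, the DC bin is kept, and the final direction is assumed to be the index of the peak of the filtered data. All function names are hypothetical.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(values)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        total = sum(values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t in range(n))
        result.append(total / n if inverse else total)
    return result


def suppress_even_harmonics(spectrum):
    """Zero every bin whose harmonic order is even and at least 2."""
    n = len(spectrum)
    return [0j if min(k, n - k) >= 2 and min(k, n - k) % 2 == 0 else value
            for k, value in enumerate(spectrum)]


def localize(direction_data_per_channel):
    """Concatenate, transform, filter, invert, and pick the peak index."""
    series = [v for channel in direction_data_per_channel for v in channel]
    filtered_spectrum = suppress_even_harmonics(dft(series))
    filtered = [v.real for v in dft(filtered_spectrum, inverse=True)]
    peak = max(range(len(filtered)), key=lambda i: filtered[i])
    return peak, filtered


# Synthetic input: a fundamental plus an unwanted second harmonic,
# split into 4 channels of 4 samples each.
n = 16
series = [math.cos(2 * math.pi * t / n)
          + 0.8 * math.cos(2 * math.pi * 2 * t / n + 1.0) for t in range(n)]
channels = [series[i:i + 4] for i in range(0, n, 4)]
peak, filtered = localize(channels)
print(peak)
```

In this synthetic case the second harmonic is removed and the filtered data reduces to the fundamental, whose peak lies at index 0. A production version would normally use an FFT routine instead of the naive transform shown here.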

A program that causes a computer to function as:
Direction detection means for obtaining, based on audio data for each microphone element input from a microphone array composed of three or more microphone elements and for each channel corresponding to a combination of two microphones in a predetermined positional relationship, direction data indicating the direction of the sound source relative to the position of the two microphones corresponding to that channel;
Fourier transform means for concatenating the direction data for each channel obtained by the direction detection means, regarding the result as time-series data, and Fourier transforming it to obtain the frequency components of the waveform represented by the direction data;
Filter means for obtaining a filtered frequency component by suppressing the even-order harmonic frequency components among the frequency components obtained by the Fourier transform means;
Inverse Fourier transform means for obtaining filtered data by performing an inverse Fourier transform on the filtered frequency component obtained by the filter means; and
Sound source localization means for obtaining the direction of the sound source from the filtered data obtained by the inverse Fourier transform means.