JP2011139409A

JP2011139409A - Audio signal processor, audio signal processing method, and computer program

Info

Publication number: JP2011139409A
Application number: JP2010000238A
Authority: JP
Inventors: Mitsunori Mizumachi; 光徳水町
Original assignee: Individual
Current assignee: Individual
Priority date: 2010-01-04
Filing date: 2010-01-04
Publication date: 2011-07-14

Abstract

PROBLEM TO BE SOLVED: To provide an audio signal processor, an audio signal processing method, and a computer program capable of obtaining the reliability of an estimated sound source direction.

SOLUTION: The audio signal processor 1 includes two microphones 2, 2; amplifiers 3, 3 connected separately to the microphones 2, 2; A/D converters 4, 4 connected to the respective amplifiers 3, 3; a CPU 5 connected to the A/D converters 4, 4; and a ROM 51 and a RAM 52 connected to the CPU 5. The CPU 5 divides the A/D-converted audio signal into frames, acquires a sound space feature from the framed audio signal, estimates the sound source direction of the target sound based on the sound space feature, and estimates the reliability of that direction by acquiring third-order or higher statistics of the sound space feature.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to an acoustic signal processing device and an acoustic signal processing method for estimating the reliability of a sound source direction estimated from an observed acoustic signal, and to a computer program for causing a computer to estimate the reliability of a sound source direction.

The sound source direction is important information in multichannel acoustic signal processing. Sound source directions have conventionally been estimated by various methods and used in acoustic processing techniques such as separation of multiple sound sources, noise removal, dereverberation, and speech section detection.

Real environments contain a wide variety of noise sources and reverberation, which change from moment to moment. These disturbances introduce unwanted distortion into the observed signal and distort the sound space feature used for sound source direction estimation, which lowers the estimation accuracy. For this reason, it is difficult to estimate the sound source direction accurately. Methods capable of estimating the sound source direction with high accuracy in real environments have therefore been developed. One method estimates the sound source direction after removing the noise component from the observed signal (see Non-Patent Document 1). Other methods use characteristics of the target signal (the acoustic signal whose source direction is to be estimated) or of the noise to improve the noise robustness of the sound space feature, which is information describing the spatial characteristics of the sound, and then estimate the sound source direction (see Non-Patent Documents 2 and 3).

Non-Patent Document 1: S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech, and Signal Process., Vol. 27, No. 2, pp. 113-120, 1979.

Non-Patent Document 2: M. Brandstein, "On the use of explicit speech modeling in microphone array applications," Proc. Intl. Conf. on Acoust., Speech, and Signal Process. (ICASSP'98), pp. 613-616, 1998.

Non-Patent Document 3: M. Mizumachi and K. Niyada, "DOA Estimation Based on Cross-Correlation with Frequency Selectivity," RISP Journal of Signal Process., Vol. 11, No. 1, pp. 43-50, 2007.

However, these conventional methods are constrained in that they require some prior knowledge about the target signal or the noise. For example, the methods disclosed in Non-Patent Documents 1 and 3 require the power spectrum of the noise to be known in advance or to be estimable. The method disclosed in Non-Patent Document 2 requires the target signal to be speech, and further requires the fundamental frequency of the speech (the physical quantity corresponding to voice pitch) to be known or estimable. The sound source direction therefore cannot be estimated with high accuracy unless such prior knowledge is available. Because the estimated direction may thus be correct in some cases and incorrect in others, acoustic processing that relies on it, such as separation of multiple sound sources, noise removal, dereverberation, or speech section detection, may adopt a wrong direction as the sound source direction and fail to produce an appropriate result.

The present invention has been made in view of these circumstances, and its main object is to provide an acoustic signal processing device, an acoustic signal processing method, and a computer program capable of obtaining the reliability of an estimated sound source direction.

To solve the above problem, an acoustic signal processing device according to one aspect of the present invention comprises: a plurality of microphones that capture sound including a target sound emitted from a sound source and output acoustic signals representing that sound; sound space feature acquisition means for acquiring, based on the acoustic signals output from the plurality of microphones, a sound space feature relating to the spatial characteristics of the sound; sound source direction estimation means for estimating the sound source direction of the target sound based on the sound space feature acquired by the sound space feature acquisition means; higher-order statistic acquisition means for acquiring a third-order or higher statistic of the sound space feature acquired by the sound space feature acquisition means; and reliability estimation means for estimating, based on the higher-order statistic acquired by the higher-order statistic acquisition means, the reliability of the sound source direction estimated by the sound source direction estimation means.

In this aspect, the higher-order statistic acquisition means is preferably configured to acquire a higher-order statistic indicating the kurtosis of a graph that shows the distribution of the sound space feature over the space.

In the above aspect, the sound space feature acquisition means is preferably configured to acquire a frequency at which the influence of noise on the sound is estimated to be small, and to extract the sound space feature from an acoustic signal band-limited by a bandpass filter centered on the acquired frequency.

In the above aspect, the sound space feature acquisition means is preferably configured to divide the acoustic signals output over time from the microphones into frames of a predetermined duration, to use the sound space feature extracted from the band-limited acoustic signal as a likelihood, and to estimate the sound space feature of a target frame from the state of the sound space feature in the immediately preceding frame, using a particle filter based on a dynamic model of the sound space feature that describes the change in sound source direction between adjacent frames.

In the above aspect, the sound space feature acquisition means preferably comprises: initial particle distribution setting means for placing a plurality of particles of equal weight uniformly in the space; prior distribution acquisition means for acquiring the prior distribution of the particles at time k by generating particles {Θ_k^(l)}_{l=1}^M having the weights {w_k^(l)}_{l=1}^M given by Equation (2), in accordance with the dynamic model given by Equation (1); and sound space feature estimation means for estimating the sound space feature at time k by splitting each particle of the prior distribution whose weight is at or above a predetermined value into a number of particles corresponding to its weight, and reducing to zero each particle whose weight is below the predetermined value.

Here, x_k is the acoustic signal at time k, Θ_k is the true sound source direction at time k, ν_k is noise at time k following a Gaussian distribution with mean 0 and variance σ^2, N denotes the Gaussian distribution, l is the particle index, and M is the number of particles.
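Equations (1) and (2) are not reproduced in this text. Based on the symbol definitions above, and on the random walk model and the likelihood-based weight update described in Embodiment 1, they plausibly take a form like the following. The exact form, and in particular whether the weights are normalized, is an assumption.

```latex
\begin{align}
\Theta_k &= \Theta_{k-1} + \nu_k, \qquad \nu_k \sim N(0, \sigma^2) \tag{1} \\
w_k^{(l)} &\propto p\left(x_k \mid \Theta_k^{(l)}\right), \qquad l = 1, \dots, M \tag{2}
\end{align}
```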

In the above aspect, the higher-order statistic acquisition means is preferably configured to acquire a higher-order statistic of the weights w^(l), which serve as the sound space feature.

In the above aspect, the higher-order statistic acquisition means is preferably configured to acquire, for the sound space feature w^(l), the higher-order statistic Skewness given by Equation (3).

In the above aspect, the higher-order statistic acquisition means is preferably configured to acquire, for the sound space feature w^(l), the higher-order statistic Kurtosis given by Equation (4).
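Equations (3) and (4) are not reproduced in this text. As a minimal sketch, the following computes skewness and kurtosis of a set of particle weights using the standard population definitions (third and fourth standardized moments). The choice of these definitions, the function names, and the zero returned for a flat distribution are all assumptions, not the patent's stated formulas.

```python
import math


def central_moment(values, order):
    """Return the order-th central moment of a sequence of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(weights):
    """Third standardized moment (population form; an assumed convention)."""
    sigma = math.sqrt(central_moment(weights, 2))
    if sigma == 0.0:
        return 0.0  # assumed fallback: a perfectly flat set has no asymmetry
    return central_moment(weights, 3) / sigma ** 3


def kurtosis(weights):
    """Fourth standardized moment (not excess kurtosis; an assumed convention)."""
    variance = central_moment(weights, 2)
    if variance == 0.0:
        return 0.0  # assumed fallback for a perfectly flat set
    return central_moment(weights, 4) / variance ** 2
```

With these definitions, a weight set dominated by a single sharp peak yields larger skewness and kurtosis than a broad, symmetric one, which is the property a reliability measure based on these statistics would depend on.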

An acoustic signal processing method according to one aspect of the present invention comprises the steps of: capturing, with a plurality of microphones, sound including a target sound emitted from a sound source and converting it into acoustic signals representing that sound; acquiring, based on the converted acoustic signals, a sound space feature relating to the spatial characteristics of the sound; estimating the sound source direction of the target sound based on the acquired sound space feature; acquiring a third-order or higher statistic of the sound space feature; and estimating the reliability of the estimated sound source direction based on the acquired higher-order statistic.

A computer program according to one aspect of the present invention causes a CPU, connected to a plurality of microphones that capture sound including a target sound emitted from a sound source and convert it into acoustic signals representing that sound, to process the acoustic signals output from the plurality of microphones. The computer program causes the CPU to execute the steps of: acquiring, based on the acoustic signals output from the plurality of microphones, a sound space feature relating to the spatial characteristics of the sound; estimating the sound source direction of the target sound based on the acquired sound space feature; acquiring a third-order or higher statistic of the sound space feature; and estimating the reliability of the estimated sound source direction based on the acquired higher-order statistic.

According to the acoustic signal processing device, the acoustic signal processing method, and the computer program of the present invention, the reliability of an estimated sound source direction can be obtained.

FIG. 1 is a block diagram showing the configuration of the acoustic signal processing device according to Embodiment 1.
FIG. 2 is a flowchart showing the flow of acoustic signal processing performed by the acoustic signal processing device according to Embodiment 1.
FIG. 3 is a flowchart showing the procedure of the sound source direction estimation process according to Embodiment 1.
FIG. 4 is a graph showing the relationship between the sound space feature and the estimated sound source direction.
FIG. 5 is a flowchart showing the flow of acoustic signal processing performed by the acoustic signal processing device according to Embodiment 3.
FIG. 6 is a graph showing sound source direction estimation results and related information in an environment with diffuse noise.
FIG. 7 is a graph showing sound source direction estimation results and related information in an environment with directional noise.
FIG. 8 is a graph showing the ESS, crest factor, skewness, and kurtosis computed for each frame in an environment with diffuse noise.
FIG. 9 is a graph showing the ESS, crest factor, skewness, and kurtosis computed for each frame in an environment with directional noise.

Hereinafter, preferred embodiments of the present invention will be described with reference to the drawings.

(Embodiment 1)
FIG. 1 is a block diagram showing the configuration of the acoustic signal processing device according to the present embodiment. As shown in FIG. 1, the acoustic signal processing device 1 includes two microphones 2, 2; amplifiers 3, 3 connected separately to the microphones 2, 2; A/D converters 4, 4 connected to the respective amplifiers 3, 3; a CPU 5 connected to the A/D converters 4, 4; and a ROM 51 and a RAM 52 connected to the CPU 5.

The two microphones 2, 2 are placed 10 cm apart. They capture the surrounding sound and output acoustic signals, which are electrical signals corresponding to that sound. The sound around the microphones 2, 2 includes the sound emitted by a sound source 6 such as a talker or a loudspeaker (the target sound) as well as noise and reverberation, and the microphones 2, 2 capture sound containing all of these.

The acoustic signals output from the microphones 2, 2 are supplied separately to the amplifiers 3, 3. Each amplifier 3 amplifies the supplied acoustic signal by a predetermined gain and outputs the amplified signal.

The amplified acoustic signals output from the amplifiers 3, 3 are supplied separately to the A/D converters 4, 4. Each A/D converter 4 converts the analog acoustic signal into a digital signal and stores the converted acoustic data in its built-in register.

The CPU 5 can execute computer programs stored in the ROM 51. When the CPU 5 executes the computer program 51a for acoustic signal processing, it reads the acoustic data stored in the registers of the A/D converters 4, 4 and performs the data processing described later.

The ROM 51 is a mask ROM, PROM, EPROM, EEPROM, or the like, and stores the computer programs executed by the CPU 5 and the data they use. Specifically, the ROM 51 stores the computer program 51a, which causes the CPU 5 to execute the acoustic signal processing described later, and the data 51b used during its execution. The data 51b includes the noise model described later.

The RAM 52 is an SRAM, DRAM, or the like, and serves as the work area of the CPU 5 while the CPU 5 executes a computer program.

Next, the operation of the acoustic signal processing device 1 according to the present embodiment will be described. When the acoustic signal processing device 1 is started, the CPU 5 executes the computer program 51a stored in the ROM 51, and the noise model is loaded from the ROM 51 into the RAM 52. In this state, the acoustic signal processing device 1 operates as follows.

FIG. 2 is a flowchart showing the flow of acoustic signal processing performed by the acoustic signal processing device 1 according to the present embodiment. The sound captured by the microphones 2, 2 is converted into acoustic signals and output from the microphones 2, 2. The acoustic signals, which are analog, are amplified by the amplifiers 3, 3 and converted into digital signals by the A/D converters 4, 4, and the converted acoustic data is stored in the registers built into the A/D converters 4, 4. This operation is repeated at a predetermined sampling frequency.

The CPU 5 reads the acoustic data from the registers of the A/D converters 4, 4 and divides the sampled acoustic signal into frames (step S1). The CPU 5 then applies a Fourier transform to the acoustic signal (step S2) and uses the transformed data to determine whether the acoustic signal contains the target signal, that is, a signal representing the target sound emitted by the sound source 6 (step S3). Because the energy density of the target signal is considered to be higher than that of the noise component, this determination is made by checking whether the data contains a frequency whose energy density is markedly high.
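Steps S1 to S3 can be sketched roughly as follows. This is only an illustration under stated assumptions: the frames are non-overlapping, the Fourier transform is a naive DFT written with the standard library, and the detection rule (strongest bin compared against the median bin power through a hypothetical `peak_ratio` threshold) is one possible reading of "markedly high energy density", not the rule stated in the source.

```python
import cmath
import math


def split_into_frames(samples, frame_len):
    """Cut a sample sequence into consecutive non-overlapping frames (step S1)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def power_spectrum(frame):
    """Power |F(x)|^2 for bins 0..N/2, using a naive O(N^2) DFT (step S2)."""
    n = len(frame)
    spectrum = []
    for b in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * b * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def has_target_signal(power, peak_ratio=10.0):
    """Flag a frame whose strongest bin stands far above the median bin (step S3).

    peak_ratio is a hypothetical tuning parameter, not a value from the source.
    """
    ordered = sorted(power)
    median = ordered[len(ordered) // 2]
    return max(power) > peak_ratio * max(median, 1e-12)
```

A frame holding a pure tone is flagged as containing a target signal, while a frame with a flat spectrum is not.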

If the target signal is absent in step S3 (NO in step S3), the acoustic signal of that frame is considered to contain only noise. For the stationary components of noise in a real environment, the frequency characteristics can be obtained as the long-term average power spectrum observed over sections in which the target signal is absent. When the acoustic signal contains no target signal, the CPU 5 therefore updates the noise model in the RAM 52 with the acoustic signal of that frame (step S4) and returns the process to step S1.

The noise model is given by Equation (5).

Here, the operator F(·) denotes the Fourier transform and the operator |·|^2 denotes the power spectrum computation; the noise model is given as the average of the power spectra from time k_1 to time k_2.
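Equation (5) is not reproduced in this text. From the description above (an average of the power spectra of the frames from time k_1 to time k_2), it plausibly has the following form. The symbol for the noise model and the exact normalization are assumptions.

```latex
\hat{N}(\omega) = \frac{1}{k_2 - k_1 + 1} \sum_{k = k_1}^{k_2} \left| F(x_k) \right|^2 \tag{5}
```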

If the target signal is present in step S3 (YES in step S3), the CPU 5 estimates the frequency at which the target signal is dominant by taking the difference, in the frequency domain, between the acoustic signal (a mixture of the target signal and noise) and the noise model in the RAM 52 (step S5). This yields the frequency considered least affected by noise, that is, a frequency with a high signal-to-noise ratio.
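A minimal sketch of step S5, assuming the frame and the noise model are both available as per-bin power spectra of equal length. The function names and the use of a plain subtraction followed by an argmax are illustrative assumptions.

```python
def dominant_bin(frame_power, noise_power):
    """Index of the bin where the frame power most exceeds the noise model."""
    excess = [p - n for p, n in zip(frame_power, noise_power)]
    return max(range(len(excess)), key=lambda i: excess[i])


def bin_to_hz(bin_index, frame_len, sample_rate):
    """Convert a DFT bin index to a frequency in hertz."""
    return bin_index * sample_rate / frame_len
```

Note that the strongest bin of the frame is not necessarily chosen: a bin that is loud mainly because the noise is loud there has a small excess and is passed over.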

Next, the CPU 5 extracts a narrowband signal from the acoustic signal by passing it through a bandpass filter of predetermined bandwidth centered on the frequency estimated in step S5 (step S6). Extracting from the acoustic signal the band that is less affected by noise makes it possible to improve the noise robustness of the sound space feature. Because the noise captured by the microphones 2, 2 changes from moment to moment, its power spectrum is considered difficult to estimate exactly. The bandpass filter is therefore applied to the acoustic signal read from the A/D converters 4, 4 rather than to the difference between the acoustic signal and the noise model.
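The band limitation of step S6 could be illustrated as below. The source does not say how the bandpass filter is realized; this sketch swaps in a simple frequency-domain mask on a single frame (zeroing DFT bins outside the band), which is an assumption made for brevity. `center_bin` and `half_width` are hypothetical parameters expressed in DFT bins.

```python
import cmath
import math


def bandpass_frame(frame, center_bin, half_width):
    """Keep only DFT bins within half_width of center_bin, plus mirror bins."""
    n = len(frame)
    spectrum = [sum(frame[t] * cmath.exp(-2j * math.pi * b * t / n)
                    for t in range(n))
                for b in range(n)]
    for b in range(n):
        folded = min(b, n - b)  # negative-frequency bins mirror positive ones
        if abs(folded - center_bin) > half_width:
            spectrum[b] = 0.0
    # Inverse DFT; the result is real because the mask is symmetric.
    return [sum(spectrum[b] * cmath.exp(2j * math.pi * b * t / n)
                for b in range(n)).real / n
            for t in range(n)]
```

Applied to a frame containing two tones, the mask keeps the tone near `center_bin` and removes the other. A practical implementation would more likely use a time-domain filter so that frame boundaries do not introduce artifacts.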

Next, the CPU 5 executes the sound source direction estimation process (step S7). The noise model is effective in stationary noise environments, but even at a frequency where the target signal is dominant, the sound space feature is distorted by noise to a greater or lesser degree. In the present embodiment, a sound source model is therefore introduced to further improve the noise robustness of the sound space feature. If the target signal is a speech signal, its frequency characteristics vary with time, and the statistical properties of those characteristics are hard to determine uniquely because they depend on the individual speaker. Attention is therefore paid to the temporal movement of the sound source: the acoustic signal is cut into short frames, and the movement of the sound source between frames is modeled. Here, the random walk model (Equation (6)), the most general-purpose model for describing the motion of an object, is adopted.

Here, Θ_k denotes the true sound source direction at time k, and ν_k is noise following a Gaussian distribution with mean 0 and variance σ^2. Equation (6) expresses that the object moves smoothly over time; the smaller the variance σ^2, the smoother the trajectory. When the target object is, for example, an automobile or a rocket, it is desirable to describe the sound source model as uniform motion or uniformly accelerated motion. When the target sound source is a person, setting the frame length as short as a few tens of milliseconds makes it reasonable to describe the temporal movement of the sound source with the random walk model.
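The effect of the variance on the trajectory can be seen in a small simulation. This is a sketch assuming the random walk takes the usual form, in which each direction equals the previous direction plus zero-mean Gaussian noise; the function names and the seeded generator are illustrative choices.

```python
import random


def random_walk_step(theta_prev, sigma, rng):
    """One random-walk step: previous direction plus zero-mean Gaussian noise."""
    return theta_prev + rng.gauss(0.0, sigma)


def simulate_track(theta0, sigma, steps, seed=0):
    """Generate a direction trajectory of steps + 1 points starting at theta0."""
    rng = random.Random(seed)
    track = [theta0]
    for _ in range(steps):
        track.append(random_walk_step(track[-1], sigma, rng))
    return track
```

With a small standard deviation the frame-to-frame changes stay small and the track is smooth; with a large one the track jumps more between frames.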

A method of estimating the sound source direction by combining the dynamic model of the sound source with the frequency-characteristic model of the noise is now described. To set up time-series filtering, the time series of the sound source direction Θ and of the observed signal (acoustic signal) x up to time k are denoted Θ_{1:k} and x_{1:k}, respectively.

Commonly used sound space features include the cross-correlation between two observed signals (C. H. Knapp and G. C. Carter, "The generalized correlation method for estimation of time delay," IEEE Trans. Acoust., Speech, Signal Process., Vol. 24, pp. 320-327, 1976) and, when the number of sound sources is known, the MUSIC method (R. O. Schmidt, "Multiple emitter location and signal parameter estimation," IEEE Trans. Antennas Propagation, Vol. 34, No. 3, pp. 276-280, 1986). Here, the sound space feature is computed by the cross-correlation method. However, in order to use the sound space feature p(Θ|x) as the likelihood p(x|Θ), whose range must be [0, 1], the half-wave rectified cross-correlation value is adopted, since the raw cross-correlation ranges over [-1, 1].
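The half-wave rectified cross-correlation can be sketched as follows for two microphone frames. Several details are assumptions rather than statements from the source: the correlation is evaluated only at integer sample lags, it is normalized by the frame energies so that it lies in [-1, 1], and the lag is mapped to an angle with a far-field model using the 10 cm microphone spacing and a sound speed of 343 m/s.

```python
import math


def rectified_cross_correlation(left, right, max_lag):
    """Normalized cross-correlation per integer lag, half-wave rectified to [0, 1].

    A positive lag means the right channel is delayed relative to the left.
    """
    norm = math.sqrt(sum(a * a for a in left) * sum(b * b for b in right))
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for t in range(len(left)):
            u = t + lag
            if 0 <= u < len(right):
                acc += left[t] * right[u]
        corr = acc / norm if norm > 0.0 else 0.0
        scores[lag] = max(corr, 0.0)  # half-wave rectification
    return scores


def lag_to_angle_deg(lag, sample_rate, mic_distance=0.10, sound_speed=343.0):
    """Far-field mapping from a delay in samples to a direction in degrees."""
    s = lag * sound_speed / (sample_rate * mic_distance)
    return math.degrees(math.asin(max(-1.0, min(1.0, s))))
```

Lags with negative correlation are mapped to zero, so every value can be used directly as a likelihood when weighting particles.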

Then, using the posterior probability p(Θ_{1:k-1}|x_{1:k-1}) of the sound space feature at the previous time step, the likelihood p(x_k|Θ_k) at time k, and the system model p(Θ_k|Θ_{k-1}) that describes the movement of the sound source as given by Equation (6), the state estimation of Equation (8) yields the posterior probability p(Θ_{1:k}|x_{1:k}) of the sound space feature that takes times 1 through k into account.
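Equation (8) is not reproduced in this text. Given the three ingredients named above (previous posterior, likelihood, and system model), it is plausibly the standard recursive Bayesian update shown below. Writing it as a proportionality, with the normalizing constant omitted, is an assumption.

```latex
p(\Theta_{1:k} \mid x_{1:k}) \propto
  p(x_k \mid \Theta_k)\,
  p(\Theta_k \mid \Theta_{k-1})\,
  p(\Theta_{1:k-1} \mid x_{1:k-1}) \tag{8}
```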

The state estimation of Equation (8) is performed by a bootstrap filter, which uses the system model as the proposal distribution (A. Doucet, J. F. G. de Freitas, and N. J. Gordon, Sequential Monte Carlo Methods in Practice, Springer-Verlag, New York, 2001). In practical problems, the state estimation uses a nonlinear, non-Gaussian likelihood, so Equation (8) cannot be solved analytically. In the present embodiment, state estimation is therefore performed with a particle filter, which represents an arbitrary probability distribution as a set of weighted particles. At each time step, the particle filter performs one-step-ahead prediction, weight update, and particle redistribution (resampling). The specific procedure of the sound source direction estimation algorithm, which uses state estimation by the particle filter and the sound space feature estimated as the resulting posterior distribution, is as follows.

FIG. 3 is a flowchart showing the procedure of the sound source direction estimation process according to the present embodiment. First, the CPU 5 predicts the particle distribution and updates the weights (step S71). In this process, when the frame to be processed is the first frame, the sound source direction is unknown, so the particles $\{\Theta_0^{(l)}\}_{l=1}^{M}$ are arranged uniformly over the one-dimensional space [-90 deg., 90 deg.]. Here, $l$ is the particle number and $M$ is the number of particles. In the initial frame, all particles have the equal weight $\{w_0^{(l)}\}_{l=1}^{M} = 1/M$. On the other hand, when the frame to be processed is the second or a subsequent frame, the CPU 5 estimates the prior distribution of the particles at time $k$, as shown in equation (9), from the particles $\{\Theta_k^{(l)}\}_{l=1}^{M}$ generated according to the system model shown in equation (6).

Further, as shown in equation (10), the weights $\{w_k^{(l)}\}_{l=1}^{M}$ of the particles are updated according to the likelihood $p(x_k \mid \Theta_k)$.

Here, the likelihood is calculated as the half-wave rectified value of the cross-correlation function band-limited to the dominant frequency estimated using the noise model.
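Step S71 can be sketched as follows. This is a minimal illustration, not the patented implementation: the random-walk prediction stands in for the system model of equation (6), the evenly spaced initial grid is one way to read "arranged uniformly", and `toy_likelihood` is a hypothetical placeholder for the half-wave rectified, band-limited cross-correlation. All function and parameter names are assumptions.

```python
import math
import random


def init_particles(m):
    """Place M particles evenly on [-90, 90] deg, each with weight 1/M."""
    particles = [-90.0 + 180.0 * (l + 0.5) / m for l in range(m)]
    return particles, [1.0 / m] * m


def toy_likelihood(theta, peak=30.0, width=10.0):
    """Hypothetical likelihood: a bump around an assumed source direction."""
    return math.exp(-0.5 * ((theta - peak) / width) ** 2)


def predict_and_reweight(particles, weights, likelihood, sigma, rng):
    """One prediction step followed by one weight update."""
    # Prediction: assumed random-walk model, clipped to the valid angle range.
    predicted = [min(90.0, max(-90.0, th + rng.gauss(0.0, sigma)))
                 for th in particles]
    # Weight update: scale each weight by the likelihood of its particle.
    raw = [w * likelihood(th) for w, th in zip(weights, predicted)]
    total = sum(raw)
    if total <= 0.0:
        # Degenerate case: fall back to equal weights.
        return predicted, [1.0 / len(predicted)] * len(predicted)
    return predicted, [r / total for r in raw]
```

The weights are renormalized after the update so that they can be read as a discrete probability distribution over directions.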

Next, the CPU 5 redistributes (resamples) the particles so that every particle has an equal weight (step S72). In this process, a particle whose weight is greater than or equal to a predetermined value is split into a number of particles proportional to its weight, and a particle whose weight is less than the predetermined value is deleted. In other words, through redistribution, particles with large weights are split into many particles and particles with small weights disappear. The resampled set of weighted particles is used as the proposal distribution at the next time step. The CPU 5 also reconstructs (estimates) the sound space feature quantity from the set of weighted particles (step S73). Because this sound space feature quantity is estimated taking both the noise model and the sound source model into account, distortion due to noise can be expected to be greatly reduced.
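The redistribution of step S72 can be illustrated with systematic resampling, a standard particle-filter scheme that copies each particle roughly in proportion to its weight and drops particles with negligible weight. It is swapped in here as an assumption; the text above describes a threshold-based split and delete rule and does not name a specific resampling scheme. The function name is hypothetical.

```python
import random


def systematic_resample(particles, weights, rng):
    """Resample so that every returned particle has weight 1/M.

    Assumes the input weights are non-negative and sum to 1.
    """
    m = len(particles)
    # Running sum of the weights (discrete cumulative distribution).
    cumulative = []
    acc = 0.0
    for w in weights:
        acc += w
        cumulative.append(acc)
    cumulative[-1] = 1.0  # guard against floating-point round-off

    u0 = rng.random() / m  # single random offset in [0, 1/M)
    resampled = []
    i = 0
    for j in range(m):
        u = u0 + j / m
        # Advance to the first particle whose cumulative weight exceeds u.
        while i < m - 1 and cumulative[i] <= u:
            i += 1
        resampled.append(particles[i])
    return resampled, [1.0 / m] * m
```

A particle holding most of the weight is returned many times, while a particle with zero weight is never selected.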

FIG. 4 is a graph showing the relationship between the sound space feature quantity and the estimated sound source direction. In FIG. 4, the vertical axis represents the magnitude of the sound space feature quantity and the horizontal axis represents the angle. As shown in FIG. 4, the sound space feature quantity can be regarded as proportional to the probability distribution of the sound source direction obtained from the observed signal. The CPU 5 therefore estimates the sound source direction $\Theta_k$ at time $k$ as the $\Theta$ that maximizes the sound space feature quantity $p(\Theta_k \mid x_k)$, according to equation (11) (step S74). Thereafter, the CPU 5 returns the processing to the address in the main routine from which the sound source direction estimation process was called.
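The maximum search of step S74 can be sketched as below. The weighted histogram used to rebuild the sound space feature quantity from the particles, the 5-degree bin width, and the function name are all assumptions made for illustration; only the final step, taking the direction at which the feature is largest, comes from the description.

```python
def estimate_direction(particles, weights, bin_width=5.0):
    """Return (estimated direction in deg, histogram of summed weights)."""
    n_bins = int(round(180.0 / bin_width))
    hist = [0.0] * n_bins
    for th, w in zip(particles, weights):
        # Map an angle in [-90, 90] deg to a bin index, clipping at the edges.
        idx = min(n_bins - 1, max(0, int((th + 90.0) // bin_width)))
        hist[idx] += w
    # The estimate is the centre of the bin with the largest summed weight.
    best = max(range(n_bins), key=hist.__getitem__)
    return -90.0 + (best + 0.5) * bin_width, hist
```

A finer bin width gives better angular resolution but needs more particles per bin for a stable peak.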

Next, the CPU 5 estimates the reliability of the sound source direction estimated in step S7 (step S8). This process is described in detail below. First, the relationship between a second-order statistic based on the effective sample size and the reliability of the estimated sound source direction is described. The effective sample size (ESS) referred to here is a measure proposed for judging the need for resampling in a particle filter (J. S. Liu and R. Chen, "Blind deconvolution via sequential imputations," J. Amer. Stat. Assoc., vol. 90, pp. 567-576, 1995).

The effective sample size ESS is defined by equation (12) using the weights $\{w^{(l)}\}_{l=1}^{M}$ of the $M$ particles.

Equation (12) represents the degree of concentration of the particles in the one-dimensional space. That is, if the particles are concentrated in a certain direction, the ESS takes a large value, and if the particles are dispersed, the ESS takes a small value. In the sound source direction estimation problem, it is desirable for the sound space feature quantity to be unimodal, with as sharp a main lobe as possible. Therefore, the larger the ESS, the higher the reliability of the estimated sound source direction is considered to be.

In fact, the present inventor has confirmed that, for diffuse noise (where the direction of the noise source is not well defined), the reliability of the estimated sound source direction can be estimated using the ESS (see M. Mizumachi and K. Niyada, "Robust direction-of-arrival estimation by particle filtering with confidence measure based on effective sample size under noisy environments," Proc. Joint 4th Intl. Conf. on Soft Computing and Intelligent Systems and 9th Intl. Sympo. on advanced Intelligent Systems (SCIS&ISIS 2008), CD-ROM, 2008). However, it was also found that, for directional noise (where the noise source has sharp directivity), the ESS cannot estimate the reliability of the estimated sound source direction. In general, when directional noise is present, the sound space feature quantity has two local maxima, one in the target sound source direction and one in the noise source direction. When the noise-robust sound space feature quantity estimated by the method of the present embodiment is used, the peak in the noise source direction should be relatively small, but some particles may still be distributed in the noise source direction. Because the ESS defined by equation (12) evaluates the weights of all particles, it is affected not only by the weights of the particles near the target sound source direction, which are the ones that should be evaluated, but also by the weights of particles distributed in the noise source direction, which are unrelated to the sound source direction estimate. Therefore, in a directional noise environment, estimating the reliability of the estimated sound source direction by the ESS is not desirable.

Therefore, in the present embodiment, the reliability of the estimated sound source direction is estimated using a higher-order statistic of the third order or higher. As the third-order statistic, the skewness, which is the third-order moment (the expected value of the third power) given by equation (13), is used. That is, the CPU 5 calculates the skewness as the reliability of the estimated sound source direction.

Here, $w_{\mathrm{rms}}$ is the root-mean-square (effective) value of the weights of all particles, and $w_{\mathrm{mean}}$ is the mean of the weights of all particles.

The skewness is a measure of the asymmetry of a distribution and takes a value closer to 0 the more symmetric the distribution is. In the sound source direction estimation problem, noise arrives from all directions in a diffuse noise environment, so particles that should be concentrated in the target sound source direction may instead be dispersed symmetrically around it. Accordingly, when the skewness is low, it is suspected that the sound space feature quantity has been distorted by the presence of diffuse noise. The skewness can also be regarded as an index of the sharpness (peakedness) of the sound space feature quantity. Such an index shows how strongly the particles are concentrated near the true target sound source direction, and for this reason the reliability of the estimated sound source direction is considered to be estimable with high accuracy.
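The skewness-based reliability can be sketched as follows. The exact form of equation (13) is not reproduced in this text, so the normalization below (third central moment of the weights divided by the cube of their RMS value) is an assumption chosen only because it uses the two quantities named above, the RMS and the mean of the weights. The function name is hypothetical.

```python
import math


def weight_skewness(weights):
    """Third central moment of the particle weights, scaled by w_rms**3.

    Assumed form; the normalization in equation (13) may differ.
    """
    m = len(weights)
    w_mean = sum(weights) / m
    w_rms = math.sqrt(sum(w * w for w in weights) / m)
    if w_rms == 0.0:
        return 0.0
    third_moment = sum((w - w_mean) ** 3 for w in weights) / m
    return third_moment / w_rms ** 3
```

Under this assumed form, equal weights give a value of 0, and the value grows as more of the total weight is carried by a few particles.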

When the CPU 5 finishes the processing of step S8 described above, it returns the processing to step S1.

The CPU 5 can display the estimated sound source direction obtained in this way, together with its reliability, on a display unit (not shown). In addition to or instead of this, the estimated sound source direction and its reliability can each be output as data to the outside of the acoustic signal processing apparatus 1.

The CPU 5 can also use the estimated sound source direction in other applications, for example in acoustic processing techniques such as separation of multiple sound sources, noise removal, dereverberation, and speech section detection. In doing so, the estimated reliability may be compared with a predetermined reference value, so that the estimated sound source direction is used by the application when the reliability is greater than or equal to the reference value and is not used when the reliability is less than the reference value. Alternatively, when the reliability is greater than or equal to the reference value, an acoustic signal in a narrow range centered on the estimated sound source direction may be extracted and used by the application, and when the reliability is less than the reference value, either an acoustic signal in a wide range centered on the estimated sound source direction may be extracted, or the acoustic signal may be used by the application without restricting its observation direction. This suppresses the adoption of a wrong direction as the sound source direction, and a more appropriate processing result than before can be expected.
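The reliability-dependent choice between a narrow and a wide extraction range can be sketched as below. The function name, the 10-degree and 60-degree widths, and the clipping to [-90, 90] degrees are hypothetical values chosen for illustration; the description only specifies the comparison with a reference value.

```python
def select_extraction_range(direction_deg, reliability, reference,
                            narrow_deg=10.0, wide_deg=60.0):
    """Return the (low, high) angular range to extract, in degrees."""
    # Reliable estimate: narrow range; unreliable estimate: wide range.
    width = narrow_deg if reliability >= reference else wide_deg
    half = width / 2.0
    low = max(-90.0, direction_deg - half)
    high = min(90.0, direction_deg + half)
    return low, high
```

Setting `wide_deg` to 180 corresponds to the case in which the observation direction is not restricted at all.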

(Embodiment 2)
In the present embodiment, in the process of estimating the reliability of the sound source direction, the kurtosis, which is the fourth-order moment (the expected value of the fourth power) given by equation (14), is used as the higher-order statistic of the third order or higher. That is, the CPU 5 calculates the kurtosis as the reliability of the estimated sound source direction.

Because the kurtosis is a statistic that represents the degree of concentration of a distribution, it can be expected to be a suitable measure for evaluating the unimodality of the sound space feature quantity. The ESS defined by equation (12) is also an index proposed for assessing how the particles are distributed, but because the kurtosis is a fourth-order moment, that is, of higher order than the ESS, it can emphasize the degree of concentration of the distribution more strongly.
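A kurtosis-based reliability can be sketched in the same spirit. Equation (14) is not reproduced in this text, so the form below (fourth central moment of the weights divided by the fourth power of their RMS value) is an assumption, and the function name is hypothetical.

```python
import math


def weight_kurtosis(weights):
    """Fourth central moment of the particle weights, scaled by w_rms**4.

    Assumed form; the normalization in equation (14) may differ.
    """
    m = len(weights)
    w_mean = sum(weights) / m
    w_rms = math.sqrt(sum(w * w for w in weights) / m)
    if w_rms == 0.0:
        return 0.0
    fourth_moment = sum((w - w_mean) ** 4 for w in weights) / m
    return fourth_moment / w_rms ** 4
```

Because deviations from the mean are raised to the fourth power, a single dominant weight moves this value more than it moves a second-order measure.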

The other configurations and operations of the acoustic signal processing apparatus according to the present embodiment are the same as those of the acoustic signal processing apparatus 1 according to Embodiment 1, and their description is therefore omitted.

(Embodiment 3)
The configuration of the acoustic signal processing apparatus according to the present embodiment is the same as that of the acoustic signal processing apparatus 1 according to Embodiment 1; the same components are therefore denoted by the same reference numerals and their description is omitted.

The operation of the acoustic signal processing apparatus according to the present embodiment is described next. FIG. 5 is a flowchart showing the flow of acoustic signal processing in the acoustic signal processing apparatus according to the present embodiment. First, the sound captured by the microphones 2, 2 is converted into acoustic signals, which are output from the microphones 2, 2. The acoustic signals, which are analog signals, are amplified by the respective amplifiers 3, 3, the amplified acoustic signals are converted into digital signals by the A/D converters 4, 4, and the converted acoustic data are stored in registers built into the A/D converters 4, 4. This operation is repeated at a predetermined sampling frequency.

The CPU 5 reads the acoustic data (acoustic signals) from the registers of the A/D converters 4, 4 (step S301). When the target sound source $s(t)$ lies in the direction $\Theta$, the acoustic signal $x(t) \equiv (x_1(t), x_2(t))$ observed at two different positions can be expressed by equation (15).

Here, $h_1(t)$ and $h_2(t)$ are the impulse responses from the target sound source to the respective observation points (microphones 2, 2), and $n_1(t)$ and $n_2(t)$ are the noise at the respective observation points. The difference $\tau = \tau_1 - \tau_2$ between the times at which the target signal $s(t)$ arrives at the respective observation points varies with the sound source direction $\Theta$. If the sound source direction is restricted to one dimension, $\tau$ corresponds one-to-one with the sound source direction $\Theta$. In other words, the sound source direction estimation problem can be regarded as the problem of estimating the arrival time difference $\tau$ inherent in the acoustic signal $x(t)$.
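The one-to-one correspondence between $\tau$ and $\Theta$ can be made concrete under a far-field (plane-wave) assumption, in which $\tau = d \sin\Theta / c$ for microphone spacing $d$ and speed of sound $c$. This model, the value of 343 m/s, and the function names are assumptions added for illustration; the 0.10 m default is the microphone spacing reported for the evaluation experiment.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def delay_from_direction(theta_deg, spacing_m=0.10, c=SPEED_OF_SOUND):
    """Arrival time difference in seconds for a far-field source."""
    return spacing_m * math.sin(math.radians(theta_deg)) / c


def direction_from_delay(tau_s, spacing_m=0.10, c=SPEED_OF_SOUND):
    """Source direction in degrees for a given arrival time difference."""
    # Clip to [-1, 1] so that measurement noise cannot break asin().
    s = max(-1.0, min(1.0, tau_s * c / spacing_m))
    return math.degrees(math.asin(s))
```

With these assumed values the delay spans only about ±0.29 ms over the full range of directions, which is why the sampling frequency limits the achievable angular resolution.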

Next, the CPU 5 calculates the sound space feature quantity $p(\Theta \mid x)$ from the observed signal $x(t)$ (step S302). The sound space feature quantity can be regarded as proportional to the probability distribution of the sound source direction $\Theta$ obtained from the observed signal $x(t)$ (see FIG. 4). In the present embodiment, the cross-correlation between the two observed signals is adopted as the sound space feature quantity. Alternatively, when the number of sound sources is known, the sound space feature quantity may be obtained using the MUSIC method.

Next, the CPU 5 estimates the sound source direction $\Theta$ as the $\Theta$ that maximizes the sound space feature quantity $p(\Theta \mid x)$, according to equation (16) (step S303).

Next, the CPU 5 estimates the reliability of the sound source direction estimated in step S303 (step S304). In this process, the skewness of the sound space feature quantity $p(\Theta \mid x)$ is calculated as the reliability of the sound source direction. Alternatively, the kurtosis of the sound space feature quantity $p(\Theta \mid x)$ may be calculated as the reliability of the sound source direction.

With this configuration, the reliability of the estimated sound source direction can be obtained easily.
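Steps S302 and S303 of this embodiment can be sketched with a direct time-domain cross-correlation over candidate lags. The function names and the sign convention (a positive lag means the second channel is delayed relative to the first) are assumptions, and the conversion from lag to angle is left out because it depends on the array geometry.

```python
def cross_correlation_feature(x1, x2, max_lag):
    """Cross-correlation of two equal-length frames for lags -max_lag..max_lag."""
    n = len(x1)
    feature = {}
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for t in range(n):
            u = t + lag
            if 0 <= u < n:  # ignore samples that fall outside the frame
                acc += x1[t] * x2[u]
        feature[lag] = acc
    return feature


def estimate_lag(feature):
    """Lag (in samples) at which the cross-correlation is largest."""
    return max(feature, key=feature.get)
```

The values in `feature`, taken over all lags, play the role of the sound space feature quantity whose skewness or kurtosis is then evaluated in step S304.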

(Evaluation experiment)
To verify the validity of the reliability measures for the estimated sound source direction in the acoustic signal processing methods according to Embodiments 1 and 2, the present inventor conducted an experiment investigating the relationship between the sound source direction estimation results and the behavior of each reliability measure under diffuse noise and directional noise environments. The results of this experiment are described below.

The target signal was speech of a female speaker reading digits, excerpted from the TI-digit speech database, radiated from a loudspeaker in a soundproof room and re-recorded with two microphones placed 10 cm apart. The target sound source was assumed to move continuously and smoothly. The noise was white noise. For the diffuse noise, white noise uncorrelated between the observed signals of the two microphones was added to each target signal; for the directional noise, white noise radiated from a loudspeaker placed in the -15° direction was observed with the two microphones.

As the parameters required for sound source direction estimation, the number of particles was fixed at 500, and the system noise variance $\sigma^2$ in equation (6) was set to the optimal value for each noise environment.

FIG. 6 is a graph showing the sound source direction estimation results and related information in an environment where diffuse noise is present, and FIG. 7 is a graph showing the same in an environment where directional noise is present. The upper panel of each figure shows the sound source direction estimation results. In these upper panels, for each frame, the true sound source direction is marked with ○, the sound source direction estimated by the method of Embodiment 3 is marked with +, and the sound source direction estimated by the method of Embodiment 1 is drawn as a solid line. The middle panel of each of FIGS. 6 and 7 shows, as the sound source direction estimation error of the method of Embodiment 1, the difference between the true sound source direction (○) and the estimated sound source direction (solid line). The lower panel of each of FIGS. 6 and 7 shows the signal-to-noise ratio in each frame. A smaller signal-to-noise ratio means that the noise energy is relatively larger.
In this experiment, the sound source direction was estimated both in the environment where diffuse noise is present (hereinafter, the "diffuse noise environment") and in the environment where directional noise is present (hereinafter, the "directional noise environment"), and the ESS, crest factor, skewness, and kurtosis in each frame were calculated as the reliability of the estimated sound source direction. FIG. 8 corresponds to FIG. 6 and shows the ESS, crest factor, skewness, and kurtosis calculated for each frame in the diffuse noise environment. FIG. 9 corresponds to FIG. 7 and shows the same quantities calculated for each frame in the directional noise environment.

The crest factor is explained here. The crest factor (CF), given by equation (17), is a second-order statistic that focuses on the particles near the largest peak of the sound space feature quantity, and is the ratio of the maximum value to the root-mean-square (effective) value.

Here, $w^{(\max)}$ is the weight of the particle with the largest weight among all particles.
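Read literally, the definition above gives the crest factor as $w^{(\max)} / w_{\mathrm{rms}}$. The sketch below computes that ratio for a list of particle weights; the function name and the handling of the all-zero case are assumptions.

```python
import math


def crest_factor(weights):
    """Ratio of the largest weight to the RMS value of all weights."""
    w_rms = math.sqrt(sum(w * w for w in weights) / len(weights))
    if w_rms == 0.0:
        return 0.0  # all weights are zero; the ratio is undefined
    return max(weights) / w_rms
```

For M equal weights the ratio is 1, and for a single non-zero weight it reaches its maximum of the square root of M.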

As shown in FIGS. 6 and 7, in both the diffuse noise environment and the directional noise environment, the error of the estimated sound source direction from the true sound source direction is large between frame numbers 10 and 20, that is, in the region where the angle of the sound source direction is large.

As shown in FIG. 8, in the diffuse noise environment, the four reliability measures (reliability estimates) for the sound source direction estimation result behave similarly: all of them drop in the range of frame numbers 10 to 15. Because this range lies within the region of large error noted above, FIG. 8 shows that every measure indicates low reliability where the error is large, so reliability estimation works correctly with any of the measures. Comparing the measures in more detail, the skewness and kurtosis, which are third-order and fourth-order statistics, show a larger dynamic range of reliability (a deeper drop). In other words, all four measures can be used as reliability measures for the sound space feature quantity, but it is desirable to use a higher-order statistic of the third order or higher.

In the directional noise environment, on the other hand, FIG. 9 shows that the skewness and kurtosis, which are statistics of the third order or higher, behave completely differently from the ESS and crest factor, which are second-order statistics. Specifically, the ESS and crest factor peak in the range of frame numbers 10 to 13, whereas the skewness and kurtosis drop in frame numbers 11 to 15. Because both ranges lie within the region of large error noted above, FIG. 9 shows that the ESS and crest factor wrongly indicate high reliability where the error is large, while the skewness and kurtosis correctly indicate low reliability there. In other words, the second-order statistics (ESS and crest factor) fail to estimate the reliability. For noise arriving from a specific direction, the ESS, which expresses the degree of dispersion of all the particles approximating the sound space feature quantity, and the crest factor, which focuses on the largest peak, are presumably so strongly affected by the noise that they cannot express the degree of concentration of the particles near the target sound source direction, which is what should be evaluated.
In contrast, the skewness and kurtosis, which are statistics of the third order or higher, correctly estimate the reliability of the sound source direction estimation result in the directional noise environment, much as in the diffuse noise environment. From the above, it can be seen that a measure based on higher-order statistics of the third order or higher is desirable as the reliability measure for the sound source direction estimation result.

(Other embodiments)
In Embodiments 1 to 3 described above, configurations were described in which a third-order or fourth-order statistic of the sound space feature quantity is used as the reliability of the sound source direction, but the invention is not limited to these. Any statistic of the third order or higher, such as the fifth or sixth order, may be used regardless of its order.

Also, in Embodiments 1 to 3 described above, configurations were described in which a statistic of the third order or higher of the sound space feature quantity is calculated as the reliability of the sound source direction, but the invention is not limited to this. Instead of the statistic itself, a value obtained by suitably processing the statistic, for example by normalizing it, may be used as the reliability of the sound source direction. However, this reliability must be a value that increases and decreases in step with the statistic.

Also, in Embodiments 1 and 2 described above, a configuration was described in which the sound space feature quantity is calculated after a bandpass filter limits the acoustic signal to a band little affected by noise, but the invention is not limited to this. The sound space feature quantity may be derived from an acoustic signal that has not been band-limited by a bandpass filter.

Also, in Embodiments 1 and 2 described above, a configuration was described in which a bandpass filter is applied to the acoustic signals read from the A/D converters 4, 4 to limit their band, but the invention is not limited to this. When the noise power spectrum is known or can be estimated, it is preferable to apply the bandpass filter to the difference between the acoustic signal and the noise model, band-limiting that difference, because this removes more of the influence of the noise.

Also, in Embodiments 1 to 3 described above, a configuration was described in which the CPU 5 performs the acoustic signal processing by executing the computer program 51a, but the invention is not limited to this. Provided that equivalent processing can be performed, the acoustic signal processing may be executed by hardware itself, such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array), without executing a computer program. Alternatively, a computer program for acoustic signal processing may be installed on the hard disk of a general-purpose personal computer, and the CPU of that personal computer may execute the computer program to perform equivalent acoustic signal processing.

The acoustic signal processing apparatus, acoustic signal processing method, and computer program of the present invention are useful as an acoustic signal processing apparatus and an acoustic signal processing method for estimating the reliability of a sound source direction estimated from an observed acoustic signal, and as a computer program for causing a computer to estimate the reliability of a sound source direction.

DESCRIPTION OF SYMBOLS
1 Acoustic signal processing apparatus
2 Microphone
3 Amplifier
4 A/D converter
5 CPU
6 Sound source
51 ROM
51a Computer program
51b Data
52 RAM

Claims

An acoustic signal processing apparatus comprising:
a plurality of microphones that capture sound including a target sound emitted from a sound source and output acoustic signals representing the sound;
sound space feature quantity acquisition means for acquiring a sound space feature quantity relating to features of the acoustic space, based on the acoustic signals output from the plurality of microphones;
sound source direction estimation means for estimating the sound source direction of the target sound based on the sound space feature quantity acquired by the sound space feature quantity acquisition means;
higher-order statistic acquisition means for acquiring a third-order or higher-order statistic of the sound space feature quantity acquired by the sound space feature quantity acquisition means; and
reliability estimation means for estimating the reliability of the sound source direction estimated by the sound source direction estimation means, based on the higher-order statistic acquired by the higher-order statistic acquisition means.

The acoustic signal processing apparatus according to claim 1, wherein the higher-order statistic acquisition means is configured to acquire, as the higher-order statistic, a statistic indicating the kurtosis of a graph showing the distribution of the sound space feature quantity in the space.

The acoustic signal processing apparatus according to claim 1 or 2, wherein the sound space feature quantity acquisition means is configured to acquire a frequency estimated to be little affected by noise in the sound, and to extract the sound space feature quantity from the acoustic signal after band limitation by a bandpass filter centered on the acquired frequency.
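
As a rough illustration of the band limitation recited above, the following Python sketch applies a second-order (biquad) bandpass filter centered on a given frequency. The filter design, the quality factor `q`, and the function names `bandpass_coefficients`, `band_limit`, and `rms` are assumptions made for this example; the claim does not specify the filter type, and how the low-noise frequency is acquired is outside the sketch, so the center frequency is simply passed in.

```python
import math

def bandpass_coefficients(center_hz, sample_rate_hz, q=5.0):
    """Return normalised biquad band-pass coefficients (b, a) with unity peak gain.

    Assumption: a standard second-order design; the claim does not name one.
    """
    w0 = 2.0 * math.pi * center_hz / sample_rate_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a

def band_limit(signal, center_hz, sample_rate_hz, q=5.0):
    """Filter `signal` (a list of floats) with a band-pass centred on center_hz."""
    b, a = bandpass_coefficients(center_hz, sample_rate_hz, q)
    x1 = x2 = y1 = y2 = 0.0  # previous inputs and outputs
    out = []
    for x0 in signal:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out

def rms(values):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))
```

In this sketch a tone at the center frequency passes with roughly unchanged level, while tones well away from it are attenuated; the sound space feature quantity would then be extracted from the output of `band_limit`.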

The acoustic signal processing apparatus according to claim 3, wherein the sound space feature quantity acquisition means is configured to divide the acoustic signal output over time from the microphones into frames of a predetermined duration and, using the sound space feature quantity extracted from the band-limited acoustic signal as a likelihood, to estimate the sound space feature quantity of a target frame from the state of the sound space feature quantity in the frame one time step before the target frame, by means of a particle filter based on a dynamic model of the sound space feature quantity that represents the change in sound source direction between adjacent frames.

The acoustic signal processing apparatus according to claim 4, wherein the sound space feature quantity acquisition means comprises:
initial particle distribution setting means for uniformly arranging a plurality of particles having the same weight in the space;
prior distribution acquisition means for acquiring a prior distribution of particles at time $k$ by generating particles $\{\Theta_k^{(l)}\}_{l=1}^{M}$ with weights $\{w_k^{(l)}\}_{l=1}^{M}$ given by equation (2), in accordance with the dynamic model given by equation (1); and
sound space feature quantity estimating means for estimating the sound space feature quantity at time $k$ by splitting each particle in the prior distribution acquired by the prior distribution acquisition means into a number of particles corresponding to its weight when the weight is greater than or equal to a predetermined value, and into zero particles when the weight is less than the predetermined value.
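
The particle handling recited in this claim can be sketched in Python as follows. This is only an illustration under stated assumptions: equations (1) and (2) are not reproduced in this section, so the sketch substitutes a random-walk dynamic model and likelihood-proportional weights, and it treats each particle as a single direction angle in degrees. The function names, the angle range, and the rule `round(w * M)` for the number of copies are all hypothetical.

```python
import random

def init_particles(m, lo=-90.0, hi=90.0):
    """Place m (>= 2) particles uniformly over [lo, hi] with equal weights."""
    step = (hi - lo) / (m - 1)
    return [lo + i * step for i in range(m)], [1.0 / m] * m

def predict(particles, sigma, rng):
    """Assumed stand-in for equation (1): random walk with Gaussian noise."""
    return [theta + rng.gauss(0.0, sigma) for theta in particles]

def update_weights(particles, likelihood):
    """Assumed stand-in for equation (2): weights proportional to likelihood."""
    raw = [likelihood(theta) for theta in particles]
    total = sum(raw)
    if total <= 0.0:
        # Degenerate case: fall back to equal weights.
        return [1.0 / len(particles)] * len(particles)
    return [r / total for r in raw]

def resample(particles, weights, threshold):
    """Split particles with weight >= threshold into copies; drop the rest.

    Assumption: the number of copies is round(w * M), at least one.
    The number of survivors may therefore differ from M.
    """
    m = len(particles)
    survivors = []
    for theta, w in zip(particles, weights):
        if w >= threshold:
            survivors.extend([theta] * max(1, round(w * m)))
    return survivors
```

A single time step would call `predict`, then `update_weights` with the likelihood obtained from the band-limited signal, then `resample`; a practical implementation would typically also restore the particle count to M after resampling.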

The acoustic signal processing apparatus according to claim 5, wherein the higher-order statistic acquisition means is configured to acquire a higher-order statistic of the weights $w^{(l)}$, which serve as the sound space feature quantity.

The acoustic signal processing apparatus according to claim 6, wherein the higher-order statistic acquisition means is configured to acquire the higher-order statistic Skewness given by equation (3) for the sound space feature quantity $w^{(l)}$.

The acoustic signal processing apparatus according to claim 6, wherein the higher-order statistic acquisition means is configured to acquire the higher-order statistic Kurtosis given by equation (4) for the sound space feature quantity $w^{(l)}$.
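
For orientation, the two higher-order statistics named in the preceding claims can be computed over a set of particle weights as in the Python sketch below. Equations (3) and (4) are not reproduced in this section, so the sketch assumes the standard moment-based definitions (third and fourth central moments normalised by powers of the variance); the exact normalisation used in the specification may differ. The function names are hypothetical, and returning 0.0 for a zero-variance input is a choice made only for this example.

```python
def central_moment(values, order):
    """Central moment of the given order (population form, divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** order for v in values) / n

def skewness(values):
    """Third standardised moment; 0.0 if all values are identical (assumed)."""
    m2 = central_moment(values, 2)
    if m2 == 0.0:
        return 0.0
    return central_moment(values, 3) / m2 ** 1.5

def kurtosis(values):
    """Fourth standardised moment; 0.0 if all values are identical (assumed)."""
    m2 = central_moment(values, 2)
    if m2 == 0.0:
        return 0.0
    return central_moment(values, 4) / m2 ** 2

# Example weights: one dominant particle versus a gradual spread.
peaked = [0.97, 0.01, 0.01, 0.01]
spread = [0.4, 0.3, 0.2, 0.1]
```

Under these assumed definitions, the `peaked` weights give larger skewness and kurtosis than the `spread` weights. How such a value is mapped to a reliability score for the estimated sound source direction is not fixed by this sketch.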

An acoustic signal processing method comprising:
capturing, with a plurality of microphones, sound including a target sound emitted from a sound source and converting the sound into acoustic signals representing the sound;
acquiring a sound space feature quantity relating to features of the acoustic space, based on the converted acoustic signals;
estimating the sound source direction of the target sound based on the acquired sound space feature quantity;
acquiring a third-order or higher-order statistic of the sound space feature quantity; and
estimating the reliability of the estimated sound source direction based on the acquired higher-order statistic.

A computer program for causing a CPU, connected to a plurality of microphones that capture sound including a target sound emitted from a sound source and convert the sound into acoustic signals representing the sound, to process the acoustic signals output from the plurality of microphones, the computer program causing the CPU to execute:
acquiring a sound space feature quantity relating to features of the acoustic space, based on the acoustic signals output from the plurality of microphones;
estimating the sound source direction of the target sound based on the acquired sound space feature quantity;
acquiring a third-order or higher-order statistic of the sound space feature quantity; and
estimating the reliability of the estimated sound source direction based on the acquired higher-order statistic.