JP6343771B2

JP6343771B2 - Head related transfer function modeling apparatus, method and program thereof

Info

Publication number: JP6343771B2
Application number: JP2014161681A
Authority: JP
Inventors: 健太郎松井; 修一足立; 真帆菅谷
Original assignee: Keio University; Japan Broadcasting Corp
Current assignee: Keio University; Japan Broadcasting Corp
Priority date: 2014-08-07
Filing date: 2014-08-07
Publication date: 2018-06-20
Anticipated expiration: 2034-08-07
Also published as: JP2016039493A

Description

The present invention relates to the modeling of head-related transfer functions, and more particularly to a head-related transfer function modeling apparatus, method, and program used in sound image localization technology and the like.

A head-related transfer function (HRTF) is an acoustic transfer function. Specifically, it is the acoustic transfer function from the position corresponding to the center of the head, taken with the head absent, by way of a sound source position outside the head, to the eardrum positions of both ears or to the entrances of the ear canals.
Modeling a head-related transfer function means expressing it as a rational polynomial or as a ratio of such polynomials. Head-related transfer functions and their models have a wide range of applications, such as sound image localization techniques that virtually place the sound image of an acoustic signal in space.

In general, the transfer function G(q) of a system is defined as the z-transform of the system's impulse response g(k) and can be expressed as in equation (1) below, where q is the shift operator.

Equation (1) represents an infinite impulse response (IIR) model. A head-related transfer function, however, is generally modeled as a finite impulse response (FIR) model: the impulse response is truncated at an order M at which it has sufficiently converged, giving the polynomial form of equation (2) below.
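The equation images for (1) and (2) are not reproduced in this text. From the definitions above they presumably take the standard forms sketched here; the lower summation limit k = 0 is an assumption rather than something the text states.

```latex
\begin{align}
G(q) &= \sum_{k=0}^{\infty} g(k)\, q^{-k} \tag{1} \\
G(q) &= \sum_{k=0}^{M} g(k)\, q^{-k} \tag{2}
\end{align}
```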

The impulse response of a head-related transfer function is measured by applying an input signal to a speaker and picking up the sound with, for example, microphones built into both ears of a dummy head. When measured at a sampling frequency of 48 kHz, for example, the length needed for sufficient convergence is approximately 512 samples. The measured signal is therefore cut out with a rectangular window of the same or a slightly greater length, and the head-related transfer function becomes a high-order finite impulse response model, for example of order 512 or 1048.

A head-related transfer function contains many features relevant to sound image localization perception, such as the interaural time difference, the interaural level difference, and spectral cues in the frequency response. Sound image localization experiments have confirmed that these features are adequately preserved when the head-related transfer function is modeled with a finite impulse response model of sufficiently high order. In identifying a head-related transfer function, spectral cues and similar features serve as cues for sound image localization perception.

Methods for modeling head-related transfer functions are known in the prior art (see, for example, Non-Patent Documents 1 and 2 and Patent Documents 1 and 2).
Non-Patent Document 1 discloses a method that models a head-related transfer function as a ratio of rational polynomials using a pole/zero model. Non-Patent Document 1 states that the pole-zero model is defined by equation (3) below.

In equation (3), C is a constant, p_k are the poles, and z_k are the zeros. The poles correspond to resonances, and the zeros correspond to time delays and anti-resonances.

The finite impulse response model of equation (2) represents the impulse response using zeros only. By contrast, modeling with poles as well as zeros, as in the pole-zero model of equation (3), allows the head-related transfer function to be expressed with fewer parameters. In the technique of Non-Patent Document 1, the parameters of the pole-zero model are derived by first obtaining a finite impulse response model as a primary model and then minimizing the error between the two impulse responses, that is, the sum of squared output errors.
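The parameter saving from using poles can be illustrated with a toy case. The sketch below is not taken from the patent: it assumes a single real pole (the value 0.95 and the truncation threshold are arbitrary) and counts how many FIR taps are needed before the impulse response of that one-parameter recursive filter decays below the threshold.

```python
def impulse_response_iir(pole, length):
    """Impulse response of y(k) = u(k) + pole * y(k-1), computed recursively."""
    y_prev = 0.0
    out = []
    for k in range(length):
        u = 1.0 if k == 0 else 0.0   # unit impulse input
        y = u + pole * y_prev
        out.append(y)
        y_prev = y
    return out


def taps_needed(pole, tol=1e-3):
    """Number of leading taps g(k) = pole**k whose magnitude is still >= tol."""
    taps = 0
    g = 1.0
    while abs(g) >= tol:
        taps += 1
        g *= pole
    return taps


pole = 0.95                      # illustrative value only
n_taps = taps_needed(pole)       # FIR coefficients needed for the same response
fir = impulse_response_iir(pole, n_taps)
print(n_taps, "FIR taps approximate what 1 pole parameter describes")
```

The closer the pole is to the unit circle (a sharper resonance), the more FIR taps are needed, which is the motivation the text gives for pole-zero models.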

The technique of Non-Patent Document 2 goes further. Treating the resonances as independent of direction, it models the head-related transfer functions with a common-acoustical-pole and zero (CAPZ) model, in which the poles are shared among the head-related transfer functions of all directions. Non-Patent Document 2 states that, with r_j denoting the position vector of the source and receiver positions, the common pole-zero model is defined by equation (4) below.

In equation (4), the poles p_k do not depend on direction and only the zeros z_k change with direction, so this model reduces the number of parameters that must be changed when the source or receiver position changes. In the technique of Non-Patent Document 2, the common pole-zero model is derived by first obtaining finite impulse response models of the head-related transfer functions for multiple directions as primary models and then deriving the common pole-zero model as an approximation of them. Non-Patent Document 2 states that the evaluation function is then expressed as in equation (5) below, using the impulse response error between the finite impulse response model of each direction and the common pole-zero model, that is, the output error ε(r_j, k).

In equation (5), R is the number of directions and M is the order of the finite impulse response model. The parameters of the common pole-zero model are derived by minimizing J in equation (5).
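Equation (5) itself is not reproduced in this text. Assuming it is the sum of the squared output errors ε(r_j, k) over the R directions and the samples of the FIR model (an assumption based on the surrounding description), the evaluation function could be computed as in the following sketch. The function name and the toy data are hypothetical.

```python
def capz_cost(reference, model):
    """Sum of squared impulse-response errors over all directions and samples.

    reference: list of R impulse responses from the per-direction FIR models
    model:     list of R impulse responses from the common pole-zero model
    """
    total = 0.0
    for g_ref, g_mod in zip(reference, model):      # loop over directions j
        for a, b in zip(g_ref, g_mod):              # loop over samples k
            total += (a - b) ** 2                   # epsilon(r_j, k) squared
    return total


# Toy data: R = 2 directions, 2 samples each (illustrative numbers only)
reference = [[1.0, 0.5], [0.0, 1.0]]
model = [[1.0, 0.25], [0.5, 1.0]]
J = capz_cost(reference, model)
print(J)
```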

The technique described in Patent Document 1 concerns a method in which basis vectors, extracted by principal component analysis from finite impulse response models of multiple head-related transfer functions, are approximated as pole-zero models by a balanced model approximation technique. The basis vectors consist of one non-directional mean basis vector and multiple directional basis vectors. The non-directional mean basis vector represents those features of the modeled head-related transfer functions of all directions that are determined independently of the sound source direction. The directional basis vectors, by contrast, represent features determined by the sound source direction. By combining principal component analysis with balanced model approximation, the technique of Patent Document 1 makes it possible to represent a head-related transfer function as a pole-zero model with few parameters.

Patent Document 2 discloses a method of generating parameters representing a head-related transfer function in which a head-related impulse response signal is divided into subbands in the frequency domain and parameters are obtained for each subband. In this method, fast Fourier transform bins are grouped to divide the frequency domain into at least two subbands, and the parameters are determined from the root mean square of the signal level in each subband. By dividing the signal into subbands, the technique of Patent Document 2 reduces the amount of computation required when using head-related transfer functions.

Patent Document 1: Japanese Patent No. 4681464
Patent Document 2: Japanese Patent No. 4921470

Non-Patent Document 1: F. Asano, Y. Suzuki, and T. Sone, "Role of spectral cues in median plane localization," J. Acoust. Soc. Am., Vol. 88, pp. 159-168 (1989)
Non-Patent Document 2: Y. Haneda, S. Makino, and Y. Kaneda, "Common-Acoustical-Pole and Zero Modeling of Head-Related Transfer Functions," IEEE Trans. Vol. 7, No. 2 (1999)

When the head-related transfer function is modeled with the finite impulse response model of equation (2), the high order, for example 512 or 1048, means a large number of parameters and consequently a large amount of computation in sound image localization schemes that use the head-related transfer function. In addition, because this finite impulse response representation does not take noise into account, preprocessing such as synchronous addition is required to reduce the influence of noise. Even then, the noise is assumed to be white or to have zero mean, so the reduction achieved is limited.
Likewise, none of the methods described in Non-Patent Documents 1 and 2 and Patent Documents 1 and 2 takes the influence of noise into account when representing the head-related transfer function.

Moreover, among the conventional techniques, modeling with a finite impulse response model of sufficiently high order adequately preserves the features of the head-related transfer function that serve as cues for sound image localization perception, but the techniques of Non-Patent Documents 1 and 2 and Patent Documents 1 and 2 do not necessarily preserve at least the spectral cues. Here, spectral cues are specific peaks and notches in the frequency response of the head-related transfer function, such as the peak P1 and the notches N1 and N2 shown in FIG. 10. Modeling them accurately is considered important in sound image localization techniques that virtually place a sound image in space.

The present invention has been made in view of the above problems. Its object is to provide a head-related transfer function modeling apparatus that obtains a model of the head-related transfer function which preserves the features relevant to sound image localization perception, takes the influence of noise into account, and reduces the amount of computation when used in sound image localization techniques and the like.

To solve the above problems, the head-related transfer function modeling apparatus according to the present invention models a head-related transfer function by the asymptotic estimation method, using as input/output data an input signal applied to a speaker and an output signal obtained by measuring, with a microphone, the sound emitted from the speaker. The apparatus comprises high-order model estimation means and order reduction means, and the order reduction means comprises frequency transfer function calculation means, low-order model estimation means, and low-order model search means.

With this configuration, the high-order model estimation means of the modeling apparatus uses the input/output data to estimate, by the prediction error method, the parameters of a high-order model of the head-related transfer function and the noise model having a predetermined high model order.
The high-order model estimation means makes it possible to obtain an accurate model that takes into account the noise present in the head-related transfer function measurement.
The order reduction means of the modeling apparatus then reduces the order of the high-order model by deriving a maximum likelihood estimate from the estimated high-order model and a log-likelihood function, which is an evaluation function in the frequency domain.
Within the order reduction means, the frequency transfer function calculation means obtains the frequency transfer function of the high-order model, and the low-order model estimation means obtains an estimate of a low-order model, whose order is lower than the high model order, by minimizing the log-likelihood function.
The low-order model search means then repeatedly updates the low order and minimizes the log-likelihood function. For each low-order model of the head-related transfer function estimated in this way, it obtains the error, relative to a high-order reference model, in the features relevant to sound image localization perception, and it searches for the low-order model of lowest order whose feature error satisfies a predetermined tolerance condition.
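The search performed by the low-order model search means can be sketched as a simple loop. This is a minimal illustration under stated assumptions: the two callables are hypothetical placeholders for the log-likelihood minimization and the feature-error computation, and trying the orders in ascending order is one possible search strategy that the text does not prescribe.

```python
def search_lowest_order(estimate_low_order, feature_error, n_high, tolerance):
    """Return (m, model) for the lowest order m whose feature error is within tolerance.

    estimate_low_order(m) -> model : hypothetical stand-in for minimizing the
                                     log-likelihood function at order m
    feature_error(model) -> float  : hypothetical stand-in for the error against
                                     the high-order reference model
    Returns None if no order below n_high satisfies the tolerance.
    """
    for m in range(1, n_high):              # candidate orders 0 < m < n
        model = estimate_low_order(m)
        if feature_error(model) <= tolerance:
            return m, model
    return None


# Toy stand-ins: the "model" is just its order, and the error shrinks as 1/m
result = search_lowest_order(
    estimate_low_order=lambda m: m,
    feature_error=lambda model: 1.0 / model,
    n_high=100,
    tolerance=0.03,
)
print(result)
```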

The present invention can also be realized as a head-related transfer function modeling method in which the processing of each means of the modeling apparatus is executed.
The present invention can also be realized as a head-related transfer function modeling program that causes a computer to operate as each means of the modeling apparatus.

According to the present invention, modeling the head-related transfer function by the asymptotic estimation method yields a low-order head-related transfer function model that takes noise into account, has fewer parameters than conventional models, and preserves the features relevant to sound image localization perception to within an allowable error relative to a conventional high-order reference model, for example of order 512 or 1024. Consequently, the amount of computation can be reduced when the low-order model estimated as the head-related transfer function is used as the controlled object in sound image localization techniques and the like.

FIG. 1 is a block diagram schematically showing the configuration of a head-related transfer function modeling apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram schematically showing how the input/output data of FIG. 1 are measured.
FIG. 3 shows an example of the signal waveforms of the input/output data of FIG. 1: (a) is an excerpt of the input signal to the speaker, and (b) is an excerpt of the output signal measured by the microphone.
FIG. 4 is a block diagram showing the structure of head-related transfer function models: (a) shows the structure of the FIR model, and (b) shows the structure of the ARX model.
FIG. 5 is a flowchart showing the flow of sound image localization control including the head-related transfer function modeling method according to the embodiment of the present invention.
FIG. 6 is a flowchart showing an example of the order reduction processing of the ARX model in FIG. 5.
FIG. 7 is a graph showing an example of the spectral distortion between the estimated low-order model of the head-related transfer function and the high-order reference model.
FIG. 8 is a graph showing an example of the frequency response of the head-related transfer function: (b) shows the example, and (a) shows the example and a comparative example overlaid.
FIG. 9 is a block diagram schematically showing an example of sound image localization control to which the head-related transfer function estimated in this embodiment is applied.
FIG. 10 is a diagram showing an example of spectral cues.

An embodiment for carrying out the head-related transfer function modeling apparatus of the present invention (hereinafter, the "embodiment") is described in detail below with reference to the drawings.

The head-related transfer function modeling apparatus 1 shown in FIG. 1 models a head-related transfer function by the asymptotic estimation method, using as input/output data an input signal applied to a speaker and an output signal obtained by measuring, with a microphone, the sound emitted from the speaker.

<Input/output data>
When the input/output data shown in FIG. 1 are measured in advance, an input signal u(k) is applied to, for example, a single speaker SP placed in a predetermined direction in an anechoic chamber, as illustrated in FIG. 2. Here, k is a variable that identifies a sample and corresponds to the time interval (sampling period) at which the continuous-time audio signal is sampled. k takes discrete values, and the number of values is the number of data points. The sound emitted from the speaker SP is measured with a microphone placed at the ear position of the dummy head D, and the signal measured in this way is the output signal y(k).

In FIG. 2, the speaker SP is placed 90° to the right of the front of the dummy head D (that is, 270° to the left) and faces the right ear of the dummy head D, but this is only an example. By rotating the dummy head D relative to the speaker SP by a given angle in the horizontal plane, sound arriving from various directions can be measured with the microphones placed in the left and right ears. The relative distance and height between the speaker SP and the dummy head D can also be varied.

An M-sequence signal, which is a pseudo-white signal, can be used as the input signal u(k).
As an example, FIG. 3(a) shows the waveform of an M-sequence signal with a shift register length of 15 (number of samples = 2¹⁵−1). FIG. 3(a) shows only a portion (100 samples) from the middle of the M-sequence signal. FIG. 3(b) shows part of the waveform of the output signal y(k) measured when this M-sequence was applied to a speaker SP placed 30° to the left of the front of the dummy head D. The waveforms in FIGS. 3(a) and 3(b) were obtained at a sampling frequency of 48 kHz.
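An M-sequence with a shift register length of 15 can be generated with a linear feedback shift register. The sketch below is illustrative only: the feedback taps (15, 14), the all-ones seed, and the mapping of bits to ±1 amplitudes are assumptions, since the text does not say which feedback polynomial or signal levels were used.

```python
def m_sequence(n=15, taps=(15, 14)):
    """One period (2**n - 1 samples) of a +/-1 M-sequence from a Fibonacci LFSR.

    taps are 1-based register stages XORed to form the feedback bit; (15, 14)
    is a commonly used maximal-length choice for n = 15 (assumed here).
    """
    state = [1] * n                              # any nonzero seed works
    out = []
    for _ in range(2 ** n - 1):
        out.append(1.0 - 2.0 * state[-1])        # map bit 0 -> +1.0, bit 1 -> -1.0
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]          # shift the register by one stage
    return out


u = m_sequence()
N = len(u)
# Circular autocorrelation at lag 1; for a maximal-length sequence it is -1,
# which is what makes the signal "pseudo-white".
r1 = sum(u[k] * u[(k + 1) % N] for k in range(N))
print(N, sum(u), r1)
```

One period contains one more 1-bit than 0-bits, so the ±1 signal sums to −1 and its off-peak circular autocorrelation is flat, approximating white noise.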

<Asymptotic estimation method>
The asymptotic estimation method (asymptotic method) is a known modeling technique used, for example, in plant control. The method first estimates, from input/output data obtained in a system identification experiment, the parameters of a high-order (say, n-th order) model that includes a noise model. It then performs order reduction that takes the frequency response into account on the basis of asymptotic theory.

[Configuration of the head-related transfer function modeling apparatus]
Like a general-purpose computer, the head-related transfer function modeling apparatus 1 (hereinafter simply the modeling apparatus 1) includes, for example, a CPU (Central Processing Unit), ROM (Read Only Memory), RAM (Random Access Memory), an HDD (Hard Disk Drive), and input/output interfaces. As shown in FIG. 1, the modeling apparatus 1 comprises high-order model estimation means 10, high-order setting means 11, and order reduction means 20.

The high-order model estimation means 10 uses the input/output data to estimate, by the prediction error method, a high-order model of the head-related transfer function and the noise model having a predetermined high (n-th) model order.

As a model that includes a noise model, a known model such as the ARX (AutoRegressive eXogenous) model shown in FIG. 4(b) can be used. The ARX model includes a noise model (1/A(q) in the figure), and in this embodiment the output signal measured by the microphone is treated as containing the influence of noise.

Conventional head-related transfer function modeling has generally used the FIR model shown in FIG. 4(a), and FIR filters have been implemented for sound image localization. The FIR model does not include the noise model (1/A(q)). Instead, preprocessing such as synchronous addition is applied, and the resulting signal is treated as free of the influence of noise.

The modeling apparatus 1 estimates the parameters of a high-order model with the high-order model estimation means 10 on the premise that the order will be reduced in later processing. A high-order model is estimated because, in general, the higher the model order, the better the accuracy. In the prior art, which does not consider a noise model, the head-related transfer function was a high-order finite impulse response model, for example of order 512 or 1048. In this embodiment, the high order is empirically assumed to be a three-digit number.

The prediction error method assumed in this embodiment is a general term for parameter estimation methods that compute the estimate minimizing an evaluation criterion built from prediction errors, which are in turn based on predicted values. Any of the general techniques used in system identification can serve as the prediction error method. When a quadratic function is used as the measure of the size of the prediction error, the prediction error method reduces to the least squares method.
Here, the high-order model estimation means 10 estimates the parameters by the least squares method, based on system identification theory, using a high-order ARX model.

The high-order model estimation means 10 estimates the parameters of the ARX model defined by equation (6) below from the previously measured input/output data {u(k), y(k); k = 1, 2, ..., N}. Here, u(k) is the input signal, y(k) is the output signal, and N is the number of data points.

In equation (6), A^h(q) and B^h(q) are the elements that make up the system (hereinafter simply called models) and are expressed by equations (7) and (8) below, respectively, using the parameters {a_i} and {b_i} (i = 1 to n).

In equation (6), the disturbance w(k) is assumed to be Gaussian white noise with mean 0 and variance σ²_w.
In equation (8), L denotes the dead time.
In each of the above equations, the superscript h is an abbreviation of "high" and indicates that the order n of the models A^h(q) and B^h(q) is sufficiently high, that is, that they are high-order models; h is not a variable.
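Equations (6) to (8) are not reproduced in this text. Given the description of the ARX model, the parameters {a_i} and {b_i}, the disturbance w(k), and the dead time L, they presumably have the standard ARX form below. The exact way L enters equation (8) is an assumption.

```latex
\begin{align}
A^{h}(q)\, y(k) &= B^{h}(q)\, u(k) + w(k) \tag{6} \\
A^{h}(q) &= 1 + a_1 q^{-1} + \cdots + a_n q^{-n} \tag{7} \\
B^{h}(q) &= q^{-L} \left( b_1 q^{-1} + \cdots + b_n q^{-n} \right) \tag{8}
\end{align}
```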

In this embodiment, the high-order setting means 11 sets a model order supplied from outside the modeling apparatus 1 in the high-order model estimation means 10. This model order (the high order n) is stored inside the modeling apparatus 1 and read out by the high-order model estimation means 10 at processing time. Here, the high order n is taken to be, for example, 100. Alternatively, the high order n may be supplied from outside the modeling apparatus 1 each time processing is performed.

As the parameters of the ARX model, the high-order model estimation means 10 estimates, by the least squares method, the parameters {a_i} (i = 1 to 100) of equation (7) and the parameters {b_i} (i = 1 to 100) of equation (8).
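Least-squares estimation of ARX parameters can be sketched in plain Python. The example below is an illustration under stated assumptions, not the patent's implementation: it uses order n = 2 instead of 100 to stay small, takes the dead time L as 0, assumes the standard ARX form A(q)y(k) = B(q)u(k) + w(k), and simulates data from hypothetical coefficients `a_true` and `b_true`.

```python
import random


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]     # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                         # back substitution
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_arx(u, y, n):
    """Least-squares ARX fit; returns ([a_1..a_n], [b_1..b_n])."""
    dim = 2 * n
    gram = [[0.0] * dim for _ in range(dim)]
    rhs = [0.0] * dim
    for k in range(n, len(y)):
        # Regressor phi(k) = [-y(k-1), ..., -y(k-n), u(k-1), ..., u(k-n)]
        phi = [-y[k - i] for i in range(1, n + 1)] + [u[k - i] for i in range(1, n + 1)]
        for r in range(dim):
            rhs[r] += phi[r] * y[k]
            for c in range(dim):
                gram[r][c] += phi[r] * phi[c]
    theta = solve_linear(gram, rhs)                      # normal equations
    return theta[:n], theta[n:]


# Simulate a known, stable second-order ARX system (hypothetical coefficients)
random.seed(0)
a_true, b_true = [-1.2, 0.5], [1.0, 0.4]
N = 2000
u = [random.choice((-1.0, 1.0)) for _ in range(N)]       # pseudo-white binary input
y = [0.0] * N
for k in range(2, N):
    w = random.gauss(0.0, 0.01)                          # white disturbance w(k)
    y[k] = (-a_true[0] * y[k - 1] - a_true[1] * y[k - 2]
            + b_true[0] * u[k - 1] + b_true[1] * u[k - 2] + w)

a_hat, b_hat = fit_arx(u, y, n=2)
print(a_hat, b_hat)
```

For an order such as n = 100, a numerical library with a dedicated least-squares solver would be preferable to forming and solving the normal equations by hand.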

The polynomial estimated from equation (7) by the high-order model estimation means 10 is written A^^h(q), and the polynomial estimated from equation (8) is written B^^h(q). In this specification, the symbol "^" placed to the right of a character denotes a hat placed over the immediately preceding character. The hat indicates an estimate.

With this notation, the high-order model of the transfer function G(q) to be controlled is written G^^h(q) and is given by equation (9a) below. The high-order noise model for the noise model H(q) is written H^^h(q) and is given by equation (9b) below.

In the case of the FIR model, Gaussian white noise or zero-mean noise is assumed, so the relation corresponding to equation (9a) is written G(q) = B(q) and the relation corresponding to equation (9b) is written H(q) = 1. As described later, the relatively high-order FIR model is used as the reference for the low-order ARX model (hereinafter, the high-order reference model).

The order reduction means 20 reduces the order of the high-order model by deriving a maximum likelihood estimate from the high-order model {G^^h(q), H^^h(q)} estimated by the high-order model estimation means 10 and a log-likelihood function, which is an evaluation function in the frequency domain.
As shown in FIG. 1, the order reduction means 20 comprises frequency transfer function calculation means 21, low-order model estimation means 22, and low-order model search means 23.

The frequency transfer function calculation means 21 obtains the frequency transfer functions G^^h(e^jω) and H^^h(e^jω) of the known high-order model {G^^h(q), H^^h(q)} estimated by the high-order model estimation means 10. For example, from the transfer function G^^h(q) of the high-order model, it calculates the corresponding frequency response, that is, the high-order frequency transfer function G^^h(e^jω).
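As a minimal sketch of what such a calculation could look like, the following Python function evaluates an ARX model on the unit circle. The function name, the choice of 512 frequency points, and the coefficient layout (A(q) = 1 + a_1 q^-1 + ... and B(q) = b_1 q^-1 + ...) are assumptions made for illustration, since equations (7) and (8) are not reproduced here.

```python
import numpy as np

def arx_frequency_response(a, b, n_freq=512):
    """Evaluate G = B/A and H = 1/A at n_freq points in [0, pi).

    a: coefficients a_1..a_n of A(q) = 1 + a_1 q^-1 + ... + a_n q^-n
    b: coefficients b_1..b_n of B(q) = b_1 q^-1 + ... + b_n q^-n
    (this coefficient layout is an assumption of the sketch)
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    omega = np.pi * np.arange(n_freq) / n_freq
    # Delay indices 1..n for each polynomial
    ka = np.arange(1, len(a) + 1)
    kb = np.arange(1, len(b) + 1)
    # q^-k evaluated at q = e^{j omega}
    A = 1.0 + np.exp(-1j * np.outer(omega, ka)) @ a
    B = np.exp(-1j * np.outer(omega, kb)) @ b
    return omega, B / A, 1.0 / A
```

The returned arrays can be passed directly to magnitude-based comparisons such as a spectral distortion measure.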

The low-order model estimation means 22 minimizes the log-likelihood function, which is the evaluation function in the frequency domain, to obtain an estimate of a low-order model whose order is lower than the model order of the high-order model estimated by the high-order model estimation means 10.

Here, the model order set in the high-order model estimation means 10 is n (= 100), and the model order of the unknown low-order model to be estimated is m (0 < m < n). This m-th order low-order model is written {G^^l(q), H^^l(q)}. The superscript l abbreviates "low" and indicates that the order m of G^^l(q) and H^^l(q) is lower than n, that is, that the model is a low-order model; l is not a variable.
G^^l(q) is the low-order model estimate of the transfer function G(q) to be controlled, and H^^l(q) is the low-order noise model estimate of the noise model H(q).

Further, the frequency transfer function of the unknown low-order model G^^l(q) to be estimated is written G^^l(e^jω), and likewise the frequency transfer function of the unknown low-order noise model H^^l(q) is written H^^l(e^jω).

The log-likelihood function is represented by V in the following equation (10). According to asymptotic theory, the high-order model G^^h(e^jω) approximately follows a normal distribution in the frequency domain, so the low-order model G^^l(e^jω) is asymptotically a maximum likelihood estimate.

In equation (10), Φ_u(ω) is the power spectral density function of the input signal u(k), and Φ_v(ω) is the power spectral density function of the noise v(k). The noise v(k) is given by the following equation (11).

Here, the noise v(k) is the noise obtained after the Gaussian white noise assumed as the disturbance w(k) has passed through the noise-shaping filter, namely the known high-order noise model H^^h(q) estimated by the high-order model estimation means 10.
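Equations (10) and (11) are not reproduced in this text. The sketch below is a hedged reconstruction, not the patent's own formulas. The expression for v(k) follows directly from the description of the noise-shaping filter. The expression for V is the weighted frequency-domain criterion commonly used in asymptotic identification methods and is an assumption here; the actual equation (10) may differ in normalization or detail. The symbol \sigma_w^2 (variance of w(k)) is introduced only for this illustration.

```latex
\begin{align}
V &= \frac{1}{2\pi}\int_{-\pi}^{\pi}
     \left| \hat{G}^{h}(e^{j\omega}) - \hat{G}^{l}(e^{j\omega}) \right|^{2}
     \frac{\Phi_u(\omega)}{\Phi_v(\omega)} \, d\omega
     && \text{(assumed form of the criterion)} \\
v(k) &= \hat{H}^{h}(q)\, w(k)
     && \text{(white noise shaped by the high-order noise model)} \\
\Phi_v(\omega) &= \left| \hat{H}^{h}(e^{j\omega}) \right|^{2} \sigma_w^{2}
     && \text{(consequence of the line above)}
\end{align}
```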

When the low-order model search means 23 sets the model order m (0 < m < n) of the unknown low-order model to a given value, the low-order model estimation means 22 performs the computation that minimizes V in equation (10) and obtains the low-order model estimate G^^l(q) that minimizes V for that value of m. This estimate G^^l(q) is a candidate output of the modeling apparatus 1. Minimizing the log-likelihood function V requires solving a nonlinear optimization problem. The solution method is not limited, but one example is described later.

<Low-order model search means 23>
The low-order model search means 23 repeatedly updates the model order m of the unknown low-order model and has the log-likelihood function V minimized for each value. For each low-order model G^^l(q) of the head-related transfer function estimated in this way, it obtains the error, relative to the high-order reference model, in a feature quantity related to sound image localization perception, and it searches for the low-order model of the lowest order whose feature quantity error satisfies a predetermined tolerance condition. The lowest-order model G^^l(q) found by this search is the output of the modeling apparatus 1.

In this embodiment, an FIR model, which is commonly used in acoustics, is used as an example of the high-order reference model. The order of this FIR model was set to 512 to ensure sufficient convergence.

In the following, the feature quantity related to sound image localization perception is, as an example, the positions of the spectral cues, which are the peaks and notches in the frequency characteristic of the head-related transfer function. This is because accurate modeling of spectral cues is considered important in sound image localization techniques that place a sound image virtually in space.

This embodiment is configured to preserve the spectral cues when a low-order ARX model is estimated by reducing the order of the high-order ARX model. Preserving the spectral cues means that the spectral cues of the low-order ARX model to be estimated satisfy a predetermined tolerance condition, so that they reproduce the spectral cues of the high-order reference model (for example, a high-order FIR model) with the desired accuracy. The required reproduction accuracy is set as appropriate according to the reduction in computation expected when the model is applied to sound image localization or the like. By keeping the shift between the center frequencies of the two sets of spectral cues within the desired tolerance, the low-order model estimated as the head-related transfer function retains the desired accuracy, while the amount of computation is reduced when that model is used as the controlled object in sound image localization techniques and the like.

In this embodiment, the low-order model search means 23 obtains the spectral distortion (SD) between each low-order model G^^l(q) of the head-related transfer function estimated by the low-order model estimation means 22 and the high-order reference model. It then searches for the low-order model of the lowest order that satisfies both of the following conditions: the spectral distortion is at most a predetermined first threshold, and the error in spectral cue position relative to the high-order reference model is at most a predetermined second threshold. The spectral distortion SD is a physical index that evaluates the difference in amplitude characteristics over all frequency components in order to judge how well two transfer functions agree. This configuration reduces the amount of computation while guaranteeing the accuracy of the low-order model to be estimated.

Accordingly, in this embodiment, the low-order model search means 23 includes, as shown in FIG. 1, a spectral distortion calculation means 24, a spectral distortion determination means 25, an acoustic feature quantity calculation means 26, and an acoustic feature quantity determination means 27.

Given two head-related transfer functions H_X and H_Y, the spectral distortion calculation means 24 calculates their spectral distortion SD by the following equation (12).

In equation (12), N is the number of data points in the head-related transfer function. H_X is, for example, the frequency transfer function of the estimated low-order model G^^l(q) of the head-related transfer function, and H_Y is, for example, the frequency transfer function of the high-order reference model.
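Equation (12) itself is not reproduced in this text. A minimal sketch follows, assuming the widely used definition of spectral distortion as the root-mean-square difference of the log-magnitude responses in dB over N frequency points; the function name is hypothetical, and the patent's exact formula may differ.

```python
import numpy as np

def spectral_distortion(h_x, h_y):
    """RMS log-magnitude difference in dB between two frequency responses.

    h_x, h_y: complex (or real) frequency responses sampled at the same
    N frequency points. Assumes no sample has zero magnitude.
    """
    h_x = np.asarray(h_x)
    h_y = np.asarray(h_y)
    # Per-bin magnitude ratio expressed in dB
    diff_db = 20.0 * np.log10(np.abs(h_x) / np.abs(h_y))
    return float(np.sqrt(np.mean(diff_db ** 2)))
```

Under this definition, two identical responses give SD = 0 dB, and a uniform factor-of-two gain error gives about 6.02 dB, which would fail a 3 dB threshold.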

The spectral distortion determination means 25 determines whether the spectral distortion SD calculated by the spectral distortion calculation means 24 is at most the first threshold. The first threshold is not particularly limited. However, two head-related transfer functions are generally regarded as agreeing well if the spectral distortion SD is below roughly 3 dB, so in this embodiment the first threshold is set to 3 dB as a typical example. Accordingly, when SD exceeds 3 dB, the low-order model G^^l(q) estimated at the currently set model order m is rejected and does not become the final output of the modeling apparatus 1. The computations of equations (10) and (12) can therefore be terminated as soon as SD exceeds 3 dB while the model order m is being updated, without trying every possible value of m.

The acoustic feature quantity calculation means 26 calculates, as the error of the acoustic feature quantity, the difference between the center frequency of a spectral cue in the frequency transfer function G^^l(e^jω) of the low-order model G^^l(q) to be estimated and the center frequency of the spectral cue in the frequency transfer function of the high-order reference model.

The acoustic feature quantity determination means 27 determines whether the error of the acoustic feature quantity calculated by the acoustic feature quantity calculation means 26 (the shift in the center frequency of a spectral cue) is at most the second threshold. The second threshold is not particularly limited. In this embodiment, as a typical example, the second threshold is set to one sample of the discrete frequency bins; that is, an error of up to one sample in the center frequency of a spectral cue is tolerated. Specifically, for example, when the sampling frequency is 48 kHz and the spectrum is obtained by a discrete Fourier transform with N = 512 data points, one sample corresponds to an interval of 93.75 Hz. Accordingly, when the shift in the center frequency of a spectral cue exceeds one sample, the low-order model G^^l(q) estimated at the currently set model order m is rejected and does not become the final output of the modeling apparatus 1. The processing that calculates the shift in spectral cue center frequency can therefore be terminated as soon as the shift exceeds one sample while the model order m is being updated, without trying every possible value of m.
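The bin width quoted above and a possible cue-shift check can be sketched as follows. The helper names are hypothetical, and two steps are assumptions of this sketch rather than the patent's method: spectral cues are taken as strict local extrema of the magnitude in dB, and each reference cue is matched to its nearest model cue.

```python
import numpy as np

def bin_width_hz(fs_hz, n_dft):
    """Frequency spacing of one DFT bin."""
    return fs_hz / n_dft

def extrema_bins(mag_db):
    """Bin indices of strict local maxima (peaks) and minima (notches)."""
    m = np.asarray(mag_db, dtype=float)
    mid, left, right = m[1:-1], m[:-2], m[2:]
    peaks = np.where((mid > left) & (mid > right))[0] + 1
    notches = np.where((mid < left) & (mid < right))[0] + 1
    return peaks, notches

def max_cue_shift_bins(ref_bins, model_bins):
    """Largest distance, in bins, from a reference cue to the nearest model cue."""
    ref_bins = np.asarray(ref_bins)
    model_bins = np.asarray(model_bins)
    if ref_bins.size == 0:
        return 0.0
    if model_bins.size == 0:
        return float("inf")
    dist = np.abs(ref_bins[:, None] - model_bins[None, :])
    return float(dist.min(axis=1).max())
```

With a 48 kHz sampling frequency and a 512-point transform, `bin_width_hz(48000, 512)` gives 93.75, so a tolerance of one bin corresponds to a shift of at most 93.75 Hz.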

As described above, the low-order model search means 23 of FIG. 1 repeatedly updates the model order m of the unknown low-order model and has the log-likelihood function V of equation (10) minimized, while performing threshold determinations on the spectral distortion SD and on the shift in the center frequencies of the spectral cues. The way in which the model order m is updated is not particularly limited.

For example, one of the two threshold determinations (on the spectral distortion SD or on the shift in spectral cue center frequency) may be carried out while updating the model order m, and then the other may be carried out while updating m. Alternatively, both threshold determinations may be performed for a given value of m before m is updated.

The order m may be decreased monotonically, increased monotonically, or both increased and decreased.
The order m may be changed in steps of 1 or in steps of a predetermined value of 2 or more. The step need not be the same every time; for example, m may first be changed in steps of 10 and the intervals in between then searched in steps of 1.
Furthermore, as noted above, values of m for which the model clearly cannot become the final output of the modeling apparatus 1 need not be selected, so not every possible m (0 < m < n) has to be tried when the model order is updated.

In this embodiment, as an example, the low-order model search means 23 updates the low-order model order m in descending order starting from the high-order model order n, and stops minimizing the log-likelihood function V when the spectral distortion SD exceeds the first threshold (hereinafter, the iteration stop condition). Then, starting from the order at which SD was still at most the first threshold (3 dB), it updates m in ascending order and takes as the lowest order the order at which the error in spectral cue position relative to the high-order reference model becomes at most the second threshold. In this way, the lowest model order m can be found efficiently.

(Solution method for minimizing the log-likelihood function V of equation (10))
The solution method described here consists of the following three procedures A-1, A-2, and A-3.

A-1. Input filtering
Using the known high-order noise model H^^h(q) estimated by the high-order model estimation means 10, the input signal u(k) is filtered as in the following equation (13). The signal u_f(k) that has passed through this filter is used as the input signal to the high-order model G^^h(q) of the head-related transfer function.

A-2. Output calculation
Taking the filtered signal u_f(k) of equation (13) obtained in procedure A-1 as a new input signal to the high-order model G^^h(q) of the head-related transfer function, the output signal y^_f(k) of G^^h(q) is calculated by the following equation (14).

A-3. Parameter estimation of the low-order model
Once procedures A-1 and A-2 have yielded the new input/output signals {u_f(k), y^_f(k); k = 1, 2, ..., N} of the high-order model G^^h(q) of the head-related transfer function, these signals are used to estimate, by the output error method, the parameters of the low-order model obtained by reducing the order of G^^h(q). The loss function V^OE of the output error method is given by the following equation (15).

Here, G^^l(q) in equation (15) is the low-order model of the head-related transfer function to be found and is given by the following equation (16). The low-order noise model H^^l(q) is given by the following equation (17).

In equation (16), A^l(q) and B^l(q) are given by equations (18) and (19), respectively, and in equation (17), C^l(q) and D^l(q) are given by equations (20) and (21), respectively.
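The equation images for procedures A-2 and A-3 are not reproduced in this text. The following is a hedged reconstruction based only on the verbal description: the output is the high-order model applied to the filtered input, the output error loss compares that output with the low-order model's response to the same input, and the low-order model is a ratio of the polynomials B^l(q) and A^l(q). The 1/N normalization and the numerator/denominator assignment are assumptions. Equations (13) and (17) are deliberately omitted because the text does not determine their form.

```latex
\begin{align}
\hat{y}_f(k) &= \hat{G}^{h}(q)\, u_f(k) \tag{14} \\
V^{OE} &= \frac{1}{N}\sum_{k=1}^{N}
          \left[ \hat{y}_f(k) - \hat{G}^{l}(q)\, u_f(k) \right]^{2} \tag{15} \\
\hat{G}^{l}(q) &= \frac{B^{l}(q)}{A^{l}(q)} \tag{16}
\end{align}
```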

[Flow of sound image localization control]
Here, a sound image localization technique that places the sound image of an acoustic signal virtually in space is described as one application of the head-related transfer function and its model. Specifically, the overall flow of sound image localization control, including the method by which the modeling apparatus 1 models the head-related transfer function, is described with reference to FIG. 5 (and FIGS. 1 to 3 as appropriate).
First, before the modeling apparatus 1 performs its processing, the input/output data required for system identification are measured (step S100). The method described with reference to FIG. 2 can be used for the measurement. Specific examples of the input/output data are shown in FIGS. 3(a) and 3(b).

Next, as the processing of the modeling apparatus 1, the head-related transfer function is modeled from the input/output data by the asymptotic estimation method (step S200).
In outline, the asymptotic estimation method (step S200) first estimates the parameters of a high-order ARX model (step S210: high-order model estimation step) and then reduces the order of the ARX model on the basis of asymptotic theory (step S220: order reduction step).

More specifically, in step S210, the high-order model estimation means 10 of the modeling apparatus 1 uses the input/output data to estimate, by the prediction error method, the parameters {a_i}, {b_i} of the high-order model (G^^h(q), H^^h(q)) of the head-related transfer function and the noise model, which has a predetermined high (n-th) model order.
In step S220, the order reduction means 20 of the modeling apparatus 1 reduces the order of the high-order model by deriving a maximum likelihood estimate from the estimated high-order model (G^^h(q), H^^h(q)) and the log-likelihood function V of equation (10). The modeling apparatus 1 then outputs the low-order model of the head-related transfer function. The detailed flow of the ARX model order reduction is described later.
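For an ARX structure, the one-step prediction is linear in the parameters, so the prediction error method reduces to ordinary linear least squares. The sketch below illustrates step S210 under that standard result. The function name and the coefficient layout (A(q) = 1 + a_1 q^-1 + ..., B(q) = b_1 q^-1 + ...) are assumptions for illustration, not the patent's implementation.

```python
import numpy as np

def estimate_arx(u, y, n):
    """Least-squares estimate of A(q) y(k) = B(q) u(k) + w(k).

    Assumed model: y(k) = -a_1 y(k-1) - ... - a_n y(k-n)
                          + b_1 u(k-1) + ... + b_n u(k-n) + w(k)
    Returns (a, b) with a = [a_1..a_n] and b = [b_1..b_n].
    """
    u = np.asarray(u, dtype=float)
    y = np.asarray(y, dtype=float)
    rows = []
    for k in range(n, len(y)):
        past_y = -y[k - n:k][::-1]  # -y(k-1), ..., -y(k-n)
        past_u = u[k - n:k][::-1]   #  u(k-1), ...,  u(k-n)
        rows.append(np.concatenate([past_y, past_u]))
    phi = np.array(rows)            # regressor matrix
    theta, *_ = np.linalg.lstsq(phi, y[n:], rcond=None)
    return theta[:n], theta[n:]
```

For a model order of n = 100, as in the embodiment, the regressor matrix has 200 columns, so the measured record should be considerably longer than 200 samples for the estimate to be well conditioned.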

Next, the low-order ARX model of the head-related transfer function estimated by the modeling apparatus 1 is applied to the controlled object of sound image localization (step S300). Because this low-order ARX model has fewer parameters than conventional models, the amount of computation when it is used as the controlled object in sound image localization techniques and the like is lower than before.

[Detailed flow of ARX model order reduction]
Next, the detailed flow of the ARX model order reduction performed by the order reduction means 20 of the modeling apparatus 1 is described with reference to FIG. 6 (and FIGS. 1 to 3 and 5 as appropriate).
In the ARX model order reduction (step S220), the frequency transfer function calculation means 21 of the modeling apparatus 1 first obtains the frequency transfer functions G^^h(e^jω) and H^^h(e^jω) of the high-order (n-th order) model {G^^h(q), H^^h(q)} obtained in step S210 of FIG. 5 (step S221).

Then, the low-order model search means 23 sets the model order m (0 < m < n) (step S222). Since n = 100 in this example, m = 99 is set. The low-order model estimation means 22 then minimizes the log-likelihood function V of equation (10) for, e.g., n = 100 and m = 99 and estimates the low-order model for the set value m = 99 (step S223). The low-order model search means 23 thereby obtains the low-order model estimate G^^l(q) for m = 99.

Then, the low-order model search means 23 determines whether the iteration stop condition for updating the model order m is satisfied (step S224). Specifically, the spectral distortion calculation means 24 calculates the spectral distortion SD between the 99th-order model (H_X) and the 100th-order model (H_Y) by equation (12), and the iteration stop condition is satisfied if the spectral distortion determination means 25 determines that SD is larger than 3 dB (the first threshold).

If the iteration stop condition is not satisfied (step S224: No), the low-order model order m is updated (step S225) and the process returns to step S223. Specifically, when the low-order model search means 23 decrements m by 1 to set m = 98, the low-order model estimation means 22 minimizes the log-likelihood function V of equation (10) for, e.g., n = 100 and m = 98 and estimates the low-order model for the set value m = 98 (step S223). Each time the result of step S224 is No, m is decremented in the same way.

As step S225 keeps lowering the model order m and reducing the order of the model, the low-order model search means 23 eventually determines that the iteration stop condition is satisfied (step S224: Yes). Specifically, the spectral distortion determination means 25 determines that SD is larger than 3 dB (the first threshold).

If the result of step S224 is Yes, the low-order model search means 23 adds 1 to the current value of the model order m in step S226. The resulting value of m is the lowest order at which the spectral distortion SD of the low-order ARX model was at most the first threshold (3 dB). The low-order model search means 23 then determines the order m of the low-order model as the lowest order at which the error in the feature quantity related to sound image localization perception, relative to the reference (the high-order reference model), satisfies the predetermined tolerance condition (step S226).
Specifically, the acoustic feature quantity calculation means 26 calculates, as the error of the acoustic feature quantity, the difference between the center frequency of a spectral cue in the frequency transfer function G^^l(e^jω) of the low-order model G^^l(q) of the currently set model order m and the center frequency of the spectral cue in the frequency transfer function of the high-order reference model (512th-order FIR model). The acoustic feature quantity determination means 27 then determines whether the shift in the center frequency of the spectral cue is at most one sample (the second threshold).

If the shift in the center frequency of the spectral cue is larger than one sample (the second threshold), the current value of the model order m is too low, so the low-order model search means 23 adds 1 to it. For the value of m obtained by this addition, the spectral distortion SD is naturally at most the first threshold (3 dB).

In the same way, the acoustic feature quantity calculation means 26 calculates the shift in the center frequency of the spectral cue, and the acoustic feature quantity determination means 27 determines whether that shift is at most one sample (the second threshold). As long as the shift remains too large, the model order m is incremented in the same way. Eventually, the acoustic feature quantity determination means 27 determines that the shift is at most one sample. The value of m at this point is the lowest order that the low-order model search means 23 was searching for. The modeling apparatus 1 then outputs the low-order ARX model of the head-related transfer function having this lowest model order.
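The two-phase search of steps S222 to S226 can be sketched as follows. This is an illustrative outline under assumptions, not the patent's implementation: `sd_of` and `cue_shift_of` are hypothetical callables that would each estimate the order-m model and return its spectral distortion in dB or its spectral cue shift in bins.

```python
def search_lowest_order(n, sd_of, cue_shift_of, sd_max=3.0, shift_max=1.0):
    """Return (m, n_sd_evals, n_cue_evals); m is None if no order qualifies.

    sd_of(m): hypothetical callable, SD in dB of the order-m model.
    cue_shift_of(m): hypothetical callable, cue shift in bins.
    """
    # Phase 1 (S222 to S225): descend from n - 1 until SD exceeds sd_max.
    m = n - 1
    n_sd = 0
    while m > 0:
        n_sd += 1
        if sd_of(m) > sd_max:   # iteration stop condition (S224: Yes)
            break
        m -= 1
    # S226: step back up to the lowest order whose SD was still acceptable.
    m += 1
    # Phase 2: ascend until the cue shift is within tolerance.
    n_cue = 0
    while m < n:
        n_cue += 1
        if cue_shift_of(m) <= shift_max:
            return m, n_sd, n_cue
        m += 1
    return None, n_sd, n_cue
```

As a usage illustration with synthetic stand-in functions (not measured data), taking n = 100, an SD that exceeds 3 dB only for m of 17 or less, and a cue shift that is within one bin only for m of 21 or more, the function returns order 21 after 83 SD evaluations and 4 cue evaluations.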

[Specific example of ARX model order reduction]
To confirm the effect of the present invention, an experiment was performed in which a measurement signal was emitted from a single loudspeaker SP placed 1.3 m from the head-center position of the dummy head D, in the direction 30° to the left of the front of the dummy head D. The sound was picked up by the microphone built into the left ear of the dummy head D, giving the output signal illustrated in FIG. 3(b). Using the input/output data obtained in this way, the head-related transfer function was modeled by the modeling method of this embodiment. A 512th-order FIR model was also obtained as the high-order reference model.

While steps S223 to S225 of FIG. 6 were repeated and the set value of the model order m was decreased monotonically, the spectral distortion SD relative to the high-order model with n = 100 was obtained at each order. FIG. 7 shows the change in SD as a function of the model order m. SD exceeded 3 dB when m was lowered to 17; that is, the orders at which SD was 3 dB or less were 18 and above. The minimization of the log-likelihood function V of equation (10) was performed 83 times as m was updated.

Next, a model was sought that captures the center frequencies of the spectral cues to within one sample of those of the high-order reference model (the 512th-order FIR model). FIG. 10 shows the frequency characteristic of the high-order reference model (the 512th-order FIR model).

Corresponding to step S226 of FIG. 6, the shift in the center frequency of the spectral cues was computed, taking the already obtained models of order 18 and above as search candidates. For orders 18, 19, and 20 the shift exceeded the allowable range, but for order 21 the spectral cues were captured to within one sample. That is, the spectral cue shift had to be computed only four times as the model order m was updated.
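The two-stage search described in this example (descend on SD, then ascend on the spectral cue shift) can be sketched as below. This is an illustration under assumptions, not the apparatus itself: the evaluators `sd_of` and `cue_shift_of` are hypothetical callables standing in for the model estimation and feature calculation, and starting the descent at order n - 1 is an assumption chosen because it reproduces the reported counts of 83 and 4 evaluations.

```python
def search_minimum_order(high_order, sd_of, cue_shift_of,
                         sd_max_db=3.0, shift_max=1):
    """Two-stage order search with injected (hypothetical) evaluators.

    sd_of(m): spectral distortion in dB of the order-m model.
    cue_shift_of(m): spectral cue center-frequency shift in samples.
    Returns (order, n_sd_evaluations, n_cue_evaluations).
    """
    # Stage 1: lower m until SD first exceeds the threshold.
    n_sd = 0
    lowest_ok = None
    for m in range(high_order - 1, 0, -1):
        n_sd += 1
        if sd_of(m) > sd_max_db:
            break
        lowest_ok = m
    if lowest_ok is None:
        raise ValueError("no order below high_order meets the SD threshold")
    # Stage 2: raise m from the lowest SD-acceptable order until the
    # spectral cue shift is within the allowed number of samples.
    n_cue = 0
    for m in range(lowest_ok, high_order):
        n_cue += 1
        if cue_shift_of(m) <= shift_max:
            return m, n_sd, n_cue
    raise ValueError("no order meets the spectral cue threshold")

# Toy evaluators mimicking the reported outcome: SD is acceptable down to
# order 18, and the cue shift is acceptable from order 21 upward.
result = search_minimum_order(
    100,
    sd_of=lambda m: 2.0 if m >= 18 else 4.0,
    cue_shift_of=lambda m: 0 if m >= 21 else 2,
)
```

The cheap SD test prunes most of the orders, so the more specific spectral cue comparison runs only on the few candidates near the SD boundary.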

The frequency characteristic of the 21st-order ARX model obtained in the above experiment is shown by the broken line in FIG. 8(b). FIG. 8(a) superimposes the frequency characteristic of the 21st-order ARX model of FIG. 8(b) (broken line) on that of the high-order reference model (the 512th-order FIR model) of FIG. 10 (solid line). Distortion is noticeable on the low-frequency side; likely causes are the difference between the model structures and the fact that SD becomes relatively large in bands where the gain is small.

As FIG. 8(a) shows, the spectral cues are preserved in the 21st-order ARX model. In addition, given that the conventionally used FIR model is of order 512, it was confirmed that the number of parameters can be greatly reduced compared with the high-order reference model.

Although a measurement in a single direction was described above, in practice the dummy head was rotated 5° clockwise after each recording and the sound was recorded again in the same way; repeating this procedure yielded head-related transfer functions for 72 directions at 5° intervals in the horizontal plane. When the head-related transfer functions from these 72 horizontal directions to the left ear were modeled, the average of the orders judged appropriate for each direction was 20.

[Specific example of sound image localization control using a low-order model of the head-related transfer function]
This section describes a specific example of sound image localization control that applies the low-order model of the head-related transfer function estimated in this embodiment.
A system known as a transaural system, which realizes three-dimensional sound, is one system that uses head-related transfer functions. FIG. 9 shows a block diagram of a transaural reproduction system having control points (identified below by i, where i = 1, ..., m) and secondary sound sources (identified below by j, where j = 1, ..., n). Note that m may be equal to n.

Here, the control points are, for example, the right ear position R and the left ear position L of the listener 90 shown in FIG. 9. As an example, the identifier i identifies the control points in the order of the right ear position and then the left ear position of the listener 90. In a simple case with a single listener 90, m = 2 suffices, but the description below is kept general.

The secondary sound sources are the speakers shown in FIG. 9. Here, as an example, the speakers SP_j are identified by the identifier j in order from the right ear position L side of the listener 90. The sound source SS outputs signals to the speakers SP_j (j = 1, ..., n). The number of sound sources SS is not particularly limited. Each speaker SP is, for example, a loudspeaker.

In the figure, the system element G_ij(q) represents the head-related transfer function to be controlled, written with the shift operator q; it is the acoustic transfer function from the j-th speaker (secondary sound source) to the i-th control point (ear position). One low-order ARX model estimated by the modeling apparatus 1 corresponds to one G_ij(q). The element X_i(q) represents the desired transfer function at each control point.

The element H_ji(q) acts as a controller for crosstalk cancellation. In general, when binaural signals are presented through speakers, the signal from a speaker propagates to the ipsilateral ear and also leaks to the contralateral ear (crosstalk). Compensation processing is therefore needed to suppress this crosstalk and deliver only the desired signal to each ear. This processing is called crosstalk cancellation.
The element X_i(q) described above represents the transfer function of the audio signal that is to be heard only at the right ear or the left ear of the listener 90 at the i-th control point, after the controller H_ji(q) has suppressed the crosstalk.

Accordingly, the input and output signals of the system are related as in the following equations (22) to (26).

Here, the input signal of the system is the scalar u(k), and the output signal y(k) of the system is, as shown in equation (23), a column vector with m elements, one per control point. In equation (23) the column vector y(k) is written as the transpose T of a row vector. Further, k = 1, 2, ..., N, where N is the number of data points.
Equation (24) is a column vector with m elements X_i(q), one per control point. This column vector is also written as the transpose T of a row vector.
Equation (25) is a matrix with m × n elements G_ij(q), corresponding to the number of control points and the number of secondary sound sources.
Equation (26) is a matrix with n × m elements H_ji(q), corresponding to the number of secondary sound sources and the number of control points.
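The shapes that this description implies for equations (23) to (26) can be written out as follows. This is a sketch built only from the stated dimensions (m control points, n secondary sound sources); the bold symbols and the exact layout are assumptions and may differ from the original typesetting.

```latex
\begin{align}
\mathbf{y}(k) &= \begin{bmatrix} y_1(k) & y_2(k) & \cdots & y_m(k) \end{bmatrix}^{T} \tag{23} \\
\mathbf{X}(q) &= \begin{bmatrix} X_1(q) & X_2(q) & \cdots & X_m(q) \end{bmatrix}^{T} \tag{24} \\
\mathbf{G}(q) &= \begin{bmatrix}
  G_{11}(q) & \cdots & G_{1n}(q) \\
  \vdots    & \ddots & \vdots    \\
  G_{m1}(q) & \cdots & G_{mn}(q)
\end{bmatrix} \tag{25} \\
\mathbf{H}(q) &= \begin{bmatrix}
  H_{11}(q) & \cdots & H_{1m}(q) \\
  \vdots    & \ddots & \vdots    \\
  H_{n1}(q) & \cdots & H_{nm}(q)
\end{bmatrix} \tag{26}
\end{align}
```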

Because the desired output signal on the left-hand side of equation (22) is, after crosstalk cancellation, the input signal u(k) acted on by the desired transfer function of equation (24), it is written as the following equation (27).

Using the shift operator q in this way allows time-domain convolution to be written as a matrix product. The controller of the system defined by equation (26) can therefore be obtained by an algebraic matrix inversion from equation (27) and equation (22). As a result, a system controller as described by the following equation (28) can be designed.
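The matrix inversion step can be illustrated numerically. The sketch below is not equation (28) itself: it assumes the square case m = n = 2, assumes that the controller is chosen so that G H equals the identity, and works with complex frequency responses at a single frequency instead of transfer functions in q. The values in `G` are hypothetical.

```python
def invert_2x2(g):
    """Inverse of a 2x2 complex matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = g
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("G is singular at this frequency")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul_2x2(p, r):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * r[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Hypothetical frequency responses G_ij at one frequency: rows are ears
# (control points), columns are speakers. Off-diagonal terms are crosstalk.
G = [[1.0 + 0.0j, 0.4 - 0.2j],
     [0.4 - 0.2j, 1.0 + 0.0j]]
H = invert_2x2(G)        # controller under the assumption G H = I
GH = matmul_2x2(G, H)    # close to the identity: the crosstalk terms vanish
```

With `GH` equal to the identity, each ear receives only the signal intended for it, which is the goal of crosstalk cancellation described above.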

However, the controller of equation (28) is unstable. One way to resolve this is to design this unstable controller first and then factor out, from each transfer function making up the controller, the part that has unstable poles. Approximating that part as a delayed inverse system makes it possible to realize a stable controller. In the present embodiment, the head-related transfer function is modeled with the ARX model shown in FIG. 4(b), so an IIR filter is implemented for sound image localization.
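The idea of a delayed inverse can be shown on the simplest possible case. The sketch below is a generic textbook construction and is not taken from this description: it assumes a first-order factor 1 - a q^-1 with |a| > 1, whose exact inverse has an unstable pole at a, and replaces that inverse with a truncated, delayed FIR filter. The names `delayed_inverse_fir` and `convolve` and the values of `a` and `delay` are hypothetical.

```python
def delayed_inverse_fir(a, delay):
    """FIR approximation of q^-delay / (1 - a q^-1) for |a| > 1.

    The causal inverse of 1 - a q^-1 has its pole at a, outside the unit
    circle. Its anti-causal expansion -sum_{k>=1} a^-k q^k decays, so it is
    truncated to `delay` terms and shifted by `delay` samples.
    """
    if abs(a) <= 1:
        raise ValueError("this sketch targets a zero outside the unit circle")
    return [-a ** (-(delay - n)) for n in range(delay)]

def convolve(x, h):
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out

a, delay = 2.0, 10
g = [1.0, -a]                      # impulse response of 1 - a q^-1
h = delayed_inverse_fir(a, delay)  # stable FIR stand-in for the inverse
cascade = convolve(g, h)           # approximately a unit impulse at `delay`
```

The cascade is a unit impulse delayed by `delay` samples plus a residual of magnitude |a|^-delay at sample 0, so a longer delay buys a more accurate inverse.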

As described above, the head-related transfer function modeling apparatus according to the present embodiment can model a head-related transfer function with a small number of parameters while preserving the spectral cues, which are feature quantities involved in the perception of sound image localization. Because noise in the measurement of the head-related transfer function is taken into account, a more accurate model can be expected. Furthermore, because the number of parameters is small, the amount of computation in sound image localization schemes and the like that use head-related transfer functions can be reduced.

The present invention has been described above on the basis of an embodiment, but it is not limited to that embodiment. For example, spectral cues were given as the feature quantity related to the perception of sound image localization that the low-order model search means 23 uses as an index when searching for a low-order model, but other feature quantities contained in the head-related transfer function, such as the interaural time difference or the interaural level difference, may be used instead.

The head-related transfer function modeling apparatus 1 may be a circuit built in hardware from various electronic components, semiconductor devices, and the like, or it may be realized through the cooperation of a head-related transfer function modeling program, in which the processing of each component of the apparatus 1 is written in a general-purpose or special-purpose computer language, and a CPU that executes that program.

A 512th-order FIR model was described as an example of the high-order reference model, but the high-order reference model is not limited to this; it may be a non-FIR model such as an ARX model, or a high-order model that takes a noise model into account.
Sound image localization was described as an example application of the head-related transfer function and its model, but the present invention is not limited to sound image localization and can be applied to technologies in general that use head-related transfer functions, such as loudness meters.

The head-related transfer function modeling apparatus according to the present invention can be used in sound reproduction technology using headphones and in loudspeaker-based sound reproduction technology in general.

Description of symbols
1 Head-related transfer function modeling apparatus
10 High-order model estimation means
11 High-order setting means
20 Dimension reduction means
21 Frequency transfer function calculation means
22 Low-order model estimation means
23 Low-order model search means
24 Spectral distortion calculation means
25 Spectral distortion determination means
26 Acoustic feature quantity calculation means
27 Acoustic feature quantity determination means

Claims

1. A head-related transfer function modeling apparatus that models a head-related transfer function by an asymptotic method, using as input/output data an input signal applied to a speaker and an output signal obtained by measuring, with a microphone, the sound emitted from the speaker, the apparatus comprising:
high-order model estimation means for estimating, from the input/output data by a prediction error method, a high-order model of the head-related transfer function and of a noise model, the high-order model having a predetermined high model order; and
dimension reduction means for reducing the order of the high-order model by deriving a maximum likelihood estimate using the estimated high-order model and a log likelihood function that is an evaluation function in the frequency domain,
wherein the dimension reduction means comprises:
frequency transfer function calculation means for obtaining a frequency transfer function of the high-order model;
low-order model estimation means for obtaining an estimate of a low-order model, whose order is lower than the order of the high-order model, by minimizing the log likelihood function; and
low-order model search means for searching for the low-order model of minimum order for which the error, in a feature quantity related to the perception of sound image localization, between a high-order reference model and each of the low-order models of the head-related transfer function, each estimated by updating the low order and minimizing the log likelihood function, satisfies a predetermined allowable condition.

2. The head-related transfer function modeling apparatus according to claim 1, wherein the feature quantity related to the perception of sound image localization is the position of spectral cues, which are peaks and notches in the frequency characteristic of the head-related transfer function.

3. The head-related transfer function modeling apparatus according to claim 2, wherein the low-order model search means obtains the spectral distortion between the high-order reference model and each low-order model of the head-related transfer function estimated by the low-order model estimation means, and searches for the low-order model of minimum order that satisfies the conditions that the spectral distortion is equal to or less than a predetermined first threshold and that the error in spectral cue position relative to the high-order reference model is equal to or less than a predetermined second threshold.

4. The head-related transfer function modeling apparatus according to claim 3, wherein the low-order model search means:
updates the low order in descending order from the high-order model order side and stops the minimization of the log likelihood function when the spectral distortion becomes greater than the first threshold; and
updates the low order in ascending order, starting from the order at which the spectral distortion was equal to or less than the first threshold, and determines as the minimum order the order at which the error in spectral cue position relative to the high-order reference model becomes equal to or less than the second threshold.

5. A head-related transfer function modeling method that models a head-related transfer function by an asymptotic method, using as input/output data an input signal applied to a speaker and an output signal obtained by measuring, with a microphone, the sound emitted from the speaker, the method comprising:
a high-order model estimation step of estimating, from the input/output data by a prediction error method, a high-order model of the head-related transfer function and of a noise model, the high-order model having a predetermined high model order; and
a dimension reduction step of reducing the order of the high-order model by deriving a maximum likelihood estimate using the estimated high-order model and a log likelihood function that is an evaluation function in the frequency domain,
wherein the dimension reduction step comprises:
a frequency transfer function calculation step of obtaining a frequency transfer function of the high-order model;
a low-order model estimation step of obtaining an estimate of a low-order model, whose order is lower than the order of the high-order model, by minimizing the log likelihood function; and
a low-order model search step of searching for the low-order model of minimum order for which the error, in a feature quantity related to the perception of sound image localization, between a high-order reference model and each of the low-order models of the head-related transfer function, each estimated by updating the low order and minimizing the log likelihood function, satisfies a predetermined allowable condition.

6. A head-related transfer function modeling program for causing a computer to function as the head-related transfer function modeling apparatus according to claim 1.