JP6592838B2 - Binaural signal generation apparatus, method, and program

Info

Publication number: JP6592838B2
Application number: JP2015168545A
Authority: JP
Inventors: 圭吾若山; 英明高田; 翔一小山; 洋猿渡
Original assignee: Nippon Telegraph and Telephone Corp; University of Tokyo NUC
Current assignee: Nippon Telegraph and Telephone Corp; University of Tokyo NUC
Priority date: 2015-08-28
Filing date: 2015-08-28
Publication date: 2019-10-23
Anticipated expiration: 2035-08-28
Also published as: JP2017046256A

Description

The present invention relates to a technique for picking up sound signals with microphones installed in a sound field and reproducing that sound field from the signals.

As a technique for generating a binaural signal that virtually reproduces a remote sound field using a plurality of microphones, the technique described in Non-Patent Document 1, for example, is known. In that technique, a binaural signal is generated by convolving, for each direction, the single corresponding head-related transfer function (HRTF).

Non-Patent Document 1: Tatsuya Hirahara, et al., "Problems related to measurement of head-related transfer functions and binaural reproduction," IEICE Fundamentals Review, vol. 2, no. 4, pp. 68-85, 2009.

The technique described in Non-Patent Document 1 uses only one head-related transfer function. For this reason, the accuracy of the generated binaural signal is not necessarily high.

An object of the present invention is to provide a binaural signal generation device, method, and program that generate a binaural signal with higher accuracy.

A binaural signal generation device according to one aspect of the present invention includes a filtering unit that generates a binaural signal by convolving, in the spatio-temporal direction, a filter with observation signals from a circular microphone array arranged in a circle, the filter being obtained by matching a desired sound field with a synthesized sound field in consideration of a plurality of head-related transfer functions based on sound reproduced from a circular speaker array arranged in a circle. Here, j is the imaginary unit, k is the wavenumber, J_m is the m-th order Bessel function, H_m⁽¹⁾′ is the derivative of the m-th order Hankel function of the first kind H_m⁽¹⁾, (R_L,φ_L,0) is the position of either ear of the listener in the cylindrical coordinate system, ~G^L is the head-related transfer function expressed in the cylindrical harmonic domain, ~G is the transfer function of each speaker expressed in the cylindrical harmonic domain, ~G is measured on a circle of radius R_ref, R_m is the radius of the circular microphone array, and R_s is the radius of the circular speaker array. The filter is obtained by the inverse Fourier transform of F^L defined by equation (A) or equation (B).

A binaural signal generation device according to one aspect of the present invention includes a filtering unit and a signal generation unit. The filtering unit obtains a signal by space-time convolution of a filter with observation signals from a spherical microphone array arranged on a sphere, the filter being obtained by matching a desired sound field with a synthesized sound field in consideration of a spherical speaker array arranged on a sphere. The signal generation unit generates a time-domain binaural signal by convolving, in the time direction, the time signal of the obtained signal corresponding to each speaker position of the spherical speaker array with the time signal, corresponding to that speaker position, of a plurality of head-related transfer functions based on sound reproduced from the spherical speaker array, and summing the results over the speakers constituting the spherical speaker array. Here, j is the imaginary unit, k is the wavenumber, n and m are the orders of the spherical harmonic function Y_n^m, the position r_s of the spherical speaker array in the spherical coordinate system is r_s=(R_s,θ_s,φ_s), h_n⁽¹⁾′ is the derivative of the n-th order spherical Hankel function of the first kind h_n⁽¹⁾, ~G is the transfer function, expressed in the cylindrical harmonic domain, of a speaker constituting the spherical speaker array, ~G is measured on a sphere of radius R_ref, R_m is the radius of the spherical microphone array, and R_s is the radius of the spherical speaker array. The filter is F^L defined by equation (E) or equation (F).

By using a plurality of head-related transfer functions, a highly accurate binaural signal can be generated.

FIG. 1 is a functional block diagram showing an example of the binaural signal generation device of the first embodiment.
FIG. 2 is a flowchart showing an example of the binaural signal generation method of the first embodiment.
FIG. 3 is a diagram for explaining the coordinate system.
FIG. 4 is a functional block diagram showing an example of the binaural signal generation device of the second embodiment.
FIG. 5 is a flowchart showing an example of the binaural signal generation method of the second embodiment.

Embodiments of the present invention are described below with reference to the drawings. In the following description, symbols such as "~", "^-", and "^" used in the text should properly be written directly above the immediately preceding character, but owing to the limitations of text notation they are written immediately after that character. In equations, these symbols are written in their proper positions. Processing performed on each element of a vector or matrix applies to all elements of that vector or matrix unless otherwise noted.

[First embodiment]
The first embodiment relates to a binaural signal generation device and method that generate a binaural signal from signals picked up by a circular microphone array arranged in two dimensions. First, the technical background of the first embodiment is described.

<Technical background>
Assume that a circular speaker array consisting of three or more speakers is placed on a circle of radius R_s and that the listener's head is at the center of the circle. The speaker position is written in the cylindrical coordinate system (see FIG. 3) as r_s=(R_s,φ_s,0). The positions of the left and right ears are r_L=(R_L,φ_L,0) and r_R=(R_R,φ_R,0), respectively, and the HRTFs in the temporal frequency domain are written G^L(r_L-r_s) and G^R(r_R-r_s). G^L(r_L-r_s) and G^R(r_R-r_s) are measured in advance and assumed to be known. The HRTFs need not be measured with a speaker array; they may be measured separately for each direction. Alternatively, the head shape may be modeled and the HRTFs obtained by simulation. Below, only the left-ear temporal-frequency-domain signal P^L(r_L) is discussed, but exactly the same argument holds for the right ear.

Consider synthesizing the left-ear signal P^L(r_L) by convolving a signal D(r_s) with the transfer function from each speaker position. Then P^L(r_L) can be written as follows.

Here, a variable marked with "~" denotes a representation in the cylindrical harmonic domain, and m is the order in that representation. Further, j is the imaginary unit, e is Napier's constant, and π is the ratio of a circle's circumference to its diameter.
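The following is a minimal sketch of why the cylindrical harmonic domain is convenient here. It assumes N equally spaced speakers, so that the sum over speaker positions of D(φ_s)·G^L(φ_L-φ_s) becomes a circular convolution over angle whose angular DFT is the product of the DFTs of the two factors. The uniform sampling, the helper names `dft` and `circular_convolve`, the test data, and the omission of constant factors are assumptions for illustration and are not taken from the patent.

```python
import cmath
import math

def dft(x):
    """Angular DFT: X[m] = sum_i x[i] * exp(-j*2*pi*m*i/N)."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * m * i / n) for i in range(n))
            for m in range(n)]

def circular_convolve(d, g):
    """Circular convolution over angle: y[l] = sum_s d[s] * g[(l - s) mod N]."""
    n = len(d)
    return [sum(d[s] * g[(l - s) % n] for s in range(n)) for l in range(n)]

N = 8
# Arbitrary illustrative data (hypothetical driving signal and transfer function
# sampled at N equally spaced angles, at a single temporal frequency).
drive = [complex(math.cos(0.7 * i), 0.3 * i) for i in range(N)]
transfer = [complex(1.0 / (1 + i), math.sin(1.3 * i)) for i in range(N)]

# Convolution over angle on the left, per-order product on the right.
lhs = dft(circular_convolve(drive, transfer))
rhs = [a * b for a, b in zip(dft(drive), dft(transfer))]
error = max(abs(x - y) for x, y in zip(lhs, rhs))
```

In this sketch each order m is handled by a single complex multiplication, which is the property the per-order expressions in this section rely on.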

Now, D(・) need only be a driving signal for synthesizing the desired sound field P^des(r).

Let r=(r,φ,0) be an arbitrary position in the interior of the circle of radius R_s, and let P^syn(r) be the sound field synthesized by the speakers on the circle in the absence of a head. Then

holds. Here, G(r-r_s) represents the transfer characteristic of each speaker and is obtained by measurement or by modeling the physical phenomenon (an example of modeling as a line source is described later). Conventionally, the signal conversion was formulated by assuming plane-wave propagation from the speaker direction at the time the HRTF is obtained; here, in contrast, the following formulation assumes spherical-wave propagation from a point source.

On the sound pickup side, the cylindrical harmonic spectrum ~P^des(r,m)=^-P^des(m)J_m(kr) of the desired sound field is assumed to have been estimated. It may also be synthesized by simulation. Here, k is the wavenumber and J_m is the m-th order Bessel function. ^-P^des(m) is the coefficient of the cylindrical harmonic expansion of the desired sound field. Since P^syn(r) and P^des(r) need only match,

holds. When ~G(r-R_s,m) is obtained by measurement on a circle of radius R_ref,

holds.
Therefore, the binaural signal is obtained by substituting equation (2) into equation (1).

There are several ways to estimate the cylindrical harmonic spectrum ~P^des(r,m)=^-P^des(m)J_m(kr) of the desired sound field; here, consider using a circular microphone array mounted on a rigid cylindrical baffle of radius R_m. The circular microphone array consists of three or more microphones. A baffle here means the body to which microphones or speakers are attached. The presence of the baffle itself affects the transfer function of sound during pickup or reproduction. Whether the baffle is rigid or sound-absorbing also affects the relation between the cylindrical harmonic spectrum of the desired sound field and that of the observed signals.

Let ~P^rcv(R_m,m) be the cylindrical harmonic spectrum of the signals observed by the microphone array. Then ~P^des(r,m) is given as follows.

Here, H_m⁽¹⁾′ is the derivative of the m-th order Hankel function of the first kind H_m⁽¹⁾. As another example, consider using a circular microphone array of radius R_m on a rigid spherical baffle of radius R_m. Let ~P^rcv(R_m,m) be the cylindrical harmonic spectrum of the signals observed by the microphone array. Then ~P^des(r,m) is given as follows.

Note that ~P^des(r,m) may be defined by expressions other than the above, depending on the arrangement of the microphone array and the shape and properties of the baffle (for example, whether it is rigid, sound-absorbing, or absent so that the array sits in free air).

Here, P_n^m is the associated Legendre function. Substituting equation (4) into equation (3),

is obtained. This equation shows that the binaural signal can be synthesized from the circular microphone array signals by space-time convolution with the temporal-frequency-domain filter defined below.

Strictly speaking, the convolution uses the space-time-domain filter obtained by the inverse Fourier transform of F^L. That is, the binaural signal can be generated by applying this space-time-domain filter to the observation signals of the circular microphone array through a space-time convolution.

Note that if the filter coefficient ~F^L(m) of order m in the spatio-temporal frequency domain is defined by

then the binaural signal can also be expressed with the filter coefficients ~F^L(m) of order m and the circular microphone array signals as follows.

Equation (5) assumed the use of ~G obtained by measurement; here, consider instead the case where each speaker is modeled as a line source.

holds, and it follows that

holds. However, because an actual speaker behaves more like a point source than a line source, the result must be multiplied by the correction term (2πj/k)^(1/2).
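As a small worked example, the correction term (2πj/k)^(1/2) can be evaluated numerically. The sketch below assumes the usual relation k = 2πf/c, a sound speed of 343 m/s, and the principal branch of the complex square root; none of these choices is stated in the patent, and the function name `line_to_point_correction` is hypothetical.

```python
import cmath
import math

def line_to_point_correction(frequency_hz, sound_speed=343.0):
    """Evaluate (2*pi*j/k)**0.5 with k = 2*pi*f/c (principal square root assumed)."""
    k = 2.0 * math.pi * frequency_hz / sound_speed  # wavenumber in rad/m
    return cmath.sqrt(2j * math.pi / k)

corr = line_to_point_correction(1000.0)
magnitude_squared = abs(corr) ** 2   # equals 2*pi/k = c/f
phase = cmath.phase(corr)            # pi/4 on the principal branch
```

Under these assumptions the term scales the magnitude by (c/f)^(1/2) and adds a constant phase of π/4, so it boosts low frequencies relative to high ones.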

Tracking of the listener's head movement can be realized by processing the cylindrical harmonic spectrum acquired by the circular microphone array on the recording side. For a simple rotation, a simple phase shift suffices. With rotation angle φ_rot, the phase-shifted cylindrical harmonic spectrum ~P^rot(R_m,m) is

as given by equation (C).
It is also possible to estimate the cylindrical harmonic spectrum about an origin different from the center of the circular microphone array. For example, the cylindrical harmonic spectrum ^-P^trans at position (R_t,φ_t)

is obtained as in equation (D). When cylindrical harmonic spectra at a plurality of positions can be acquired, estimation by the least-squares method becomes possible.
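The rotation case can be sketched numerically. The example below assumes M equally spaced microphones and a rotation angle that is a whole number of microphone spacings, and checks that multiplying the order-m coefficient by exp(j·m·φ_rot) gives the same spectrum as physically re-indexing the microphones. The sign of the exponent depends on the rotation convention and is an assumption here, as are the helper names and the test signal.

```python
import cmath
import math

def harmonic_spectrum(p):
    """Angular DFT of the microphone signals, normalized by the microphone count."""
    count = len(p)
    return [sum(p[i] * cmath.exp(-2j * math.pi * m * i / count) for i in range(count)) / count
            for m in range(count)]

def signed_order(index, count):
    """Map DFT bin index to a signed harmonic order in (-count/2, count/2]."""
    return index if index <= count // 2 else index - count

def rotate_spectrum(spectrum, phi_rot):
    """Apply a per-order phase shift exp(j*m*phi_rot) (sign convention assumed)."""
    count = len(spectrum)
    return [c * cmath.exp(1j * signed_order(m, count) * phi_rot)
            for m, c in enumerate(spectrum)]

M = 12
mics = [math.cos(2 * math.pi * i / M) + 0.5 * math.sin(3 * 2 * math.pi * i / M)
        for i in range(M)]
shift = 2                          # rotation by two microphone spacings
phi_rot = 2 * math.pi * shift / M

rotated_by_phase = rotate_spectrum(harmonic_spectrum(mics), phi_rot)
rotated_by_shift = harmonic_spectrum([mics[(i + shift) % M] for i in range(M)])
rotation_error = max(abs(a - b) for a, b in zip(rotated_by_phase, rotated_by_shift))
```

For rotation angles between microphone positions the phase-shift form still applies, whereas re-indexing does not, which is one reason to work in the harmonic domain.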

A binaural signal may also be generated by convolution in the spatial direction, as in the following expression, using the filter of equation (A) or equation (B). When the binaural signal is generated by spatial convolution, the number of speakers constituting the circular speaker array must equal the number of microphones constituting the circular microphone array; N_L is that common number. φ_i is the direction corresponding to φ_L.

<Binaural signal generation device and method>
As shown in FIG. 1, the binaural signal generation device of the first embodiment includes a filtering unit 1 and a filter generation unit 2. The binaural signal generation method is realized by the device executing the process of step S1 shown in FIG. 2.

The filtering unit 1 receives as input a filter obtained by matching a desired sound field with a synthesized sound field in consideration of a plurality of head-related transfer functions based on sound reproduced from a circular speaker array arranged in a circle.

This filter is obtained by the inverse Fourier transform of the filter defined by equation (A) or equation (B). It is generated in advance by the filter generation unit 2, prior to the processing of the filtering unit 1.

The filtering unit 1 also receives as input the observation signals from a circular microphone array arranged in a circle.

The filtering unit 1 generates a binaural signal by convolving the input filter with the input observation signals in the spatio-temporal direction (step S1).
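The convolution in step S1 can be sketched as a two-dimensional convolution that is circular over the microphone index and linear over time. This is only an illustration under assumptions: the data layout (`filt[i][tau]`, `obs[i][t]`), the function name, and the choice to return one output channel per spatial index (from which the channel matching the ear direction would be taken) are not specified in the patent.

```python
def space_time_convolve(filt, obs):
    """Convolve filt[i][tau] with obs[i][t]: circular over space, linear over time.

    Returns out[l][t] with len(out[l]) == n_time + n_taps - 1.
    """
    n_space = len(obs)
    n_taps = len(filt[0])
    n_time = len(obs[0])
    out = [[0.0] * (n_time + n_taps - 1) for _ in range(n_space)]
    for l in range(n_space):
        for i in range(n_space):
            taps = filt[(l - i) % n_space]      # spatial offset wraps around the circle
            for t in range(n_time):
                for tau in range(n_taps):
                    out[l][t + tau] += taps[tau] * obs[i][t]
    return out

# Three microphones, two time samples each (illustrative values).
observations = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# A filter that only delays by one sample leaves the spatial index unchanged.
delay_filter = [[0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
delayed = space_time_convolve(delay_filter, observations)

# A filter that only shifts by one spatial position rotates the channels.
shift_filter = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
shifted = space_time_convolve(shift_filter, observations)
```

A practical implementation would more likely multiply in the frequency domain, but the direct form makes the two convolution directions explicit.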

The processing of the filtering unit 1 described above is only an example. The filtering unit 1 may generate the binaural signal by any of the other filtering or convolution processes described in the <Technical background> section.

That is, for example, the filtering unit 1 may perform the filtering using, in place of the filter F^L, the filter obtained by the inverse Fourier transform of F^L.

Further, for example, with ~P^rcv(R_m,m) denoting the cylindrical harmonic spectrum of the observation signals, when the listener turns by φ_rot, the filtering unit 1 may convolve ~P^rot(R_m,m) defined by equation (C), in place of the observation signals, with the filter,

and when the listener moves to position (R_t,φ_t), it may convolve ^-P^trans(R_m,m) defined by equation (D), in place of the observation signals, with the filter.

In this way, taking a plurality of head-related transfer functions into account makes it possible to generate a highly accurate binaural signal.

[Second embodiment]
The second embodiment is a binaural signal generation device and method that generate a binaural signal from signals picked up by a spherical microphone array arranged in three dimensions. First, the technical background of the second embodiment is described.

<Technical background>
Assume that a spherical speaker array consisting of three or more speakers is placed on a sphere of radius R_s and that the listener's head is at its center. The speaker position is written in the spherical coordinate system as r_s=(R_s,θ_s,φ_s). The positions of the left and right ears are r_L=(R_L,θ_L,φ_L) and r_R=(R_R,θ_R,φ_R), respectively, and the HRTFs in the temporal frequency domain are written G^L(r_L-r_s) and G^R(r_R-r_s). G^L(r_L-r_s) and G^R(r_R-r_s) are measured in advance and assumed to be known. The HRTFs need not be measured with a speaker array; they may be measured separately for each direction. Alternatively, the head shape may be modeled and the HRTFs obtained by simulation. Below, only the left-ear temporal-frequency-domain signal P^L(r_L) is discussed, but exactly the same argument holds for the right ear.

Note that, unlike the first embodiment, which uses circular arrays, the second embodiment, which uses spherical arrays, cannot be described as a spatial convolution.

Now, D(・) need only be a driving signal for synthesizing the desired sound field. Let r=(r,θ,φ) be an arbitrary position in the interior region, and let P^syn(r) be the sound field synthesized by the speakers on the sphere in the absence of a head. Assuming that each speaker is axially symmetric, and taking the north-pole position η=(0,0,R_s),

holds. SO(3) is the rotation group in three dimensions (the special orthogonal group). Here, G(r-r_s) represents the transfer characteristic of each speaker and is obtained by measurement or modeling. Conventionally, the signal conversion was formulated by assuming a plane wave from the sound source at the time the HRTF is obtained; here, in contrast, the following formulation assumes a spherical wave from a point source. Y_n^m is the spherical harmonic function, and n and m are its orders. In this embodiment, a variable marked with "~" denotes a representation in the spherical harmonic spectrum domain.

On the sound pickup side, the spherical harmonic spectrum ~P^des(r,n,m)=^-P^des(n,m)j_n(kr) of the desired sound field is assumed to have been estimated. It may also be synthesized by simulation. Since P^syn(r) and P^des(r) need only match,

holds. j_n is the n-th order spherical Bessel function of the first kind. When ~G(r-R_s,n,0) is obtained by measurement on a sphere of radius R_ref,

holds.
Therefore, the binaural signal is obtained by substituting equation (7) into equation (6).

There are several ways to estimate the spherical harmonic spectrum ~P^des(r,n,m)=^-P^des(n,m)j_n(kr) of the desired sound field; here, consider using a spherical microphone array mounted on a spherical baffle of radius R_m. The spherical microphone array consists of three or more microphones.

A baffle here means the body to which microphones or speakers are attached. The presence of the baffle itself affects the transfer function of sound during pickup or reproduction. Whether the baffle is rigid or sound-absorbing also affects the relation between the spherical harmonic spectrum of the desired sound field and that of the observed signals.

Let ~P^rcv(R_m,n,m) be the spherical harmonic spectrum of the signals observed by the microphone array. Then ~P^des(r,n,m) is given as follows. h_n⁽¹⁾′ denotes the derivative of the n-th order spherical Hankel function of the first kind h_n⁽¹⁾.

Substituting this into equation (8),

is obtained. S is the set of speakers constituting the spherical speaker array used for the HRTF measurement. Unlike the circular-array case, the signal conversion cannot be done with a single filter convolution. Instead, after the filter of equation (E) below is applied to the microphone array signals, the result is convolved with each HRTF and summed over the speakers used for the HRTF measurement, and that sum is taken as the binaural signal.

That is, a signal is first obtained by space-time convolution of the filter of equation (E) with the observation signals from the spherical microphone array. Then, for each speaker position of the spherical speaker array, the time signal of the obtained signal corresponding to that position is convolved in the time direction with the time signal, corresponding to the same position, of the plurality of head-related transfer functions based on sound reproduced from the spherical speaker array. Summing these results over the speakers constituting the spherical speaker array yields the time-domain binaural signal.
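The second stage described above, a per-speaker convolution with the head-related impulse response followed by a sum over speakers, can be sketched as follows. The sketch assumes the filtered signal for each speaker position and the corresponding impulse responses are already available as plain lists; the function names and the sample values are hypothetical.

```python
def convolve(a, b):
    """Linear (time-direction) convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def binaural_from_speaker_signals(speaker_signals, hrirs):
    """Sum over speakers of (signal at speaker position) * (impulse response for that position)."""
    length = max(len(s) + len(h) - 1 for s, h in zip(speaker_signals, hrirs))
    total = [0.0] * length
    for s, h in zip(speaker_signals, hrirs):
        for n, value in enumerate(convolve(s, h)):
            total[n] += value
    return total

# Two hypothetical speaker positions with short illustrative signals.
signals = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
hrirs_left = [[0.5, 0.25], [1.0, -1.0]]
left_ear = binaural_from_speaker_signals(signals, hrirs_left)
```

The right-ear signal would be produced the same way with the right-ear impulse responses, consistent with the statement that the same argument holds for both ears.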

Note that if the filter coefficient ~F^L(r_s,n,m) of orders n, m is defined by

then the binaural signal is expressed with the filter coefficients of orders n, m and the spherical microphone array signals as follows.

Equation (F) assumed the use of ~G obtained by measurement; here, consider instead the case where each speaker is modeled as a point source.

holds, and it follows that

holds. The filter of equation (F) may be used in place of the filter of equation (E).

<Binaural signal generation device and method>
As shown in FIG. 4, the binaural signal generation device of the second embodiment includes a filtering unit 1 and a filter generation unit 2. The binaural signal generation method is realized by the device executing the processes of the steps shown in FIG. 5.

The filtering unit 1 receives as input a filter obtained by matching a desired sound field with a synthesized sound field in consideration of a spherical speaker array arranged on a sphere.

This filter is the filter defined by equation (E) or equation (F). It is generated in advance by the filter generation unit 2, prior to the processing of the filtering unit 1.

The filtering unit 1 also receives as input the observation signals from a spherical microphone array arranged on a sphere.

The filtering unit 1 obtains a signal by space-time convolution of the input filter with the input observation signals. The obtained signal is output to the signal generation unit 3.

The signal generation unit 3 convolves, in the time direction, the time signal of the signal obtained by the filtering unit 1 corresponding to each speaker position of the spherical speaker array used to obtain the HRTFs with the time signal, corresponding to the same speaker position, of the plurality of head-related transfer functions based on sound reproduced from the spherical speaker array. It then sums the results over the speakers constituting the spherical speaker array to generate the time-domain binaural signal.

That is, for example, with ~P^rcv(R_m,m) denoting the spherical harmonic spectrum of the observation signals, the filtering unit 1 may generate the binaural signal by performing the processing defined by equation (9).

[Functions]
The functions used in the above description are explained here.

With ・ denoting an arbitrary real number, the n-th order Hankel function of the first kind H_n⁽¹⁾(・) and the n-th order Bessel function J_n(・) are defined as follows. Γ(z) is the gamma function, and Y_n(z) is the Neumann function.

The n-th order spherical Hankel function of the first kind h_n⁽¹⁾(・) and the n-th order spherical Bessel function j_n(・) are defined as follows.

The associated Legendre function P_n^m(・) is defined as follows. P_n(・) denotes the Legendre polynomial.

The spherical harmonic function Y_n^m is defined by the following equation.
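As an illustration of the Bessel functions used in this section, the sketch below evaluates J_ν(x) from the standard power series involving the gamma function, and the spherical Bessel function through the standard relation j_n(x) = (π/(2x))^(1/2)·J_{n+1/2}(x). These are textbook definitions; whether they match the patent's own expressions term by term is an assumption, since those expressions are not reproduced in this text. The function names, the restriction to x > 0, and the fixed number of series terms are choices made for this example.

```python
import math

def bessel_j(order, x, terms=40):
    """Bessel function of the first kind via its power series (x > 0, order >= 0).

    J_order(x) = sum_k (-1)^k / (k! * Gamma(order + k + 1)) * (x/2)^(2k + order)
    """
    return sum((-1) ** k / (math.factorial(k) * math.gamma(order + k + 1))
               * (x / 2.0) ** (2 * k + order)
               for k in range(terms))

def spherical_bessel_j(n, x):
    """Spherical Bessel function j_n(x) = sqrt(pi / (2x)) * J_{n + 1/2}(x), x > 0."""
    return math.sqrt(math.pi / (2.0 * x)) * bessel_j(n + 0.5, x)

j0_at_1 = bessel_j(0, 1.0)            # J_0(1)
j1_at_1 = bessel_j(1, 1.0)            # J_1(1)
sph_j0_at_2 = spherical_bessel_j(0, 2.0)   # should equal sin(2)/2
sph_j1_at_2 = spherical_bessel_j(1, 2.0)   # should equal sin(2)/4 - cos(2)/2
```

The series converges quickly for moderate arguments; for large arguments a library routine would be a better choice than this direct sum.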

[Modifications]
The filtering in the temporal frequency domain or the spatio-temporal frequency domain described above may instead be performed in the space-time domain. That is, the filtering may be performed by convolution in the time direction and convolution in the spatial direction.

The binaural signal generation device can be realized by a computer. In that case, the processing of each unit of the device is described by a program, and executing the program on the computer realizes each unit of the device on the computer.

The program describing this processing can be recorded on a computer-readable recording medium. Although in this form the device is configured by executing a predetermined program on a computer, at least part of the processing may instead be realized in hardware.

The present invention is not limited to the embodiments described above and may be modified as appropriate without departing from the spirit of the invention.

1 Filtering unit
2 Filter generation unit
3 Signal generation unit

Claims

A binaural signal generation device comprising a filtering unit that generates a binaural signal by convolving, in the spatio-temporal direction, a filter with observation signals from a circular microphone array arranged in a circle, the filter being obtained by matching a desired sound field with a synthesized sound field in consideration of a plurality of head-related transfer functions based on sound reproduced from a circular speaker array arranged in a circle,
wherein j is the imaginary unit, k is the wavenumber, J_m is the m-th order Bessel function, H_m⁽¹⁾′ is the derivative of the m-th order Hankel function of the first kind H_m⁽¹⁾, (R_L,φ_L,0) is the position of either ear of the listener in the cylindrical coordinate system, ~G^L is the head-related transfer function expressed in the cylindrical harmonic domain, ~G is the transfer function of each speaker expressed in the cylindrical harmonic domain, ~G is measured on a circle of radius R_ref, R_m is the radius of the circular microphone array, and R_s is the radius of the circular speaker array, and
the filter is obtained by the inverse Fourier transform of F^L defined by formula (A) or formula (B).

The binaural signal generation device according to claim 1,
wherein, with $\tilde{P}^{\mathrm{rcv}}(R_m, m)$ denoting the cylindrical harmonic spectrum of the observation signals,
when the listener turns by $\phi_{\mathrm{rot}}$, the filtering unit convolves $\tilde{P}^{\mathrm{rot}}(R_m, m)$, defined by the following formula (C), with the filter in place of the observation signals,

and when the listener moves to the position $(R_t, \phi_t)$, the filtering unit convolves $\tilde{P}^{\mathrm{trans}}(R_m, m)$, defined by the following formula (D), with the filter in place of the observation signals.

A binaural signal generation device comprising:
a filtering unit that obtains a signal by convolving a filter with observation signals from a spherical microphone array arranged in a spherical shape, the filter being obtained by matching a desired sound field with a synthesized sound field that takes into account a spherical speaker array arranged in a spherical shape; and
a signal generation unit that generates a binaural signal in the time domain by convolving, in the time direction, the time signal of the obtained signal corresponding to each speaker position of the spherical speaker array with the time signal, corresponding to that speaker position, of a plurality of head-related transfer functions based on reproduced sound from the spherical speaker array, and summing the results over the speakers constituting the spherical speaker array,
wherein $j$ is the imaginary unit, $k$ is the wave number, $n$ and $m$ are the orders of the spherical harmonic function $Y_n^m$, the position $r_s$ of the spherical speaker array in the spherical coordinate system is $r_s = (R_s, \theta_s, \phi_s)$, $h_n^{(1)\prime}$ is the derivative of the spherical Hankel function of the first kind of order $n$, $h_n^{(1)}$, $\tilde{G}$ is the transfer function, expressed in the cylindrical harmonic domain, of the speakers constituting the spherical speaker array, $\tilde{G}$ being measured on the circumference of radius $R_{\mathrm{ref}}$, $R_m$ is the radius of the spherical microphone array, and $R_s$ is the radius of the spherical speaker array, and
the filter is $F^L$ defined by the following formula (E) or formula (F).

A binaural signal generation method comprising a filtering step of generating a binaural signal by convolving, in the spatio-temporal directions, a filter with observation signals from a circular microphone array arranged in a circle, the filter being obtained by matching a desired sound field with a synthesized sound field that takes into account a plurality of head-related transfer functions based on reproduced sound from a circular speaker array arranged in a circle,
wherein $j$ is the imaginary unit, $k$ is the wave number, $J_m$ is the Bessel function of order $m$, $H_m^{(1)\prime}$ is the derivative of the Hankel function of the first kind of order $m$, $H_m^{(1)}$, $(R_L, \phi_L, 0)$ is the position of one of the listener's ears in the cylindrical coordinate system, $\tilde{G}^L$ is the head-related transfer function expressed in the cylindrical harmonic domain, $\tilde{G}$ is the transfer function of each speaker expressed in the cylindrical harmonic domain, $\tilde{G}$ being measured on the circumference of radius $R_{\mathrm{ref}}$, $R_m$ is the radius of the circular microphone array, and $R_s$ is the radius of the circular speaker array, and
the filter is obtained by the inverse Fourier transform of $F^L$ defined by the following formula (A) or formula (B).

A binaural signal generation method comprising:
a filtering step in which a filtering unit obtains a signal by convolving, in the spatio-temporal directions, a filter with observation signals from a spherical microphone array arranged in a spherical shape, the filter being obtained by matching a desired sound field with a synthesized sound field that takes into account a spherical speaker array arranged in a spherical shape; and
a signal generation step in which a signal generation unit generates a binaural signal in the time domain by convolving, in the time direction, the time signal of the obtained signal corresponding to each speaker position of the spherical speaker array with the time signal, corresponding to that speaker position, of a plurality of head-related transfer functions based on reproduced sound from the spherical speaker array, and summing the results over the speakers constituting the spherical speaker array,
wherein $j$ is the imaginary unit, $k$ is the wave number, $n$ and $m$ are the orders of the spherical harmonic function $Y_n^m$, the position $r_s$ of the spherical speaker array in the spherical coordinate system is $r_s = (R_s, \theta_s, \phi_s)$, $h_n^{(1)\prime}$ is the derivative of the spherical Hankel function of the first kind of order $n$, $h_n^{(1)}$, $\tilde{G}$ is the transfer function, expressed in the cylindrical harmonic domain, of the speakers constituting the spherical speaker array, $\tilde{G}$ being measured on the circumference of radius $R_{\mathrm{ref}}$, $R_m$ is the radius of the spherical microphone array, and $R_s$ is the radius of the spherical speaker array, and
the filter is $F^L$ defined by the following formula (E) or formula (F).

A program for causing a computer to function as each unit of the binaural signal generation device according to any one of claims 1 to 3.
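
As an illustration of how a program might realize one of the units recited above, the following minimal sketch covers only the time-domain step of the signal generation unit: the signal for each speaker position is convolved in the time direction with the head-related impulse response for that position, and the results are summed over all speakers. The function name `generate_binaural`, the array layouts, and the use of separate left-ear and right-ear responses are assumptions made for this sketch and are not taken from the claims; the filters defined by formulas (E) and (F) are outside its scope.

```python
import numpy as np


def generate_binaural(driving, hrir_left, hrir_right):
    """Sum, over all speakers, the time-direction convolution of each
    speaker signal with the head-related impulse response for that speaker.

    driving    : array (n_speakers, n_samples), signal per speaker position
                 (assumed to be the output of the filtering unit)
    hrir_left  : array (n_speakers, n_taps), left-ear impulse responses
    hrir_right : array (n_speakers, n_taps), right-ear impulse responses
    returns    : array (2, n_samples + n_taps - 1), rows = left, right
    """
    n_speakers, n_samples = driving.shape
    n_out = n_samples + hrir_left.shape[1] - 1
    out = np.zeros((2, n_out))
    for s in range(n_speakers):
        out[0] += np.convolve(driving[s], hrir_left[s])    # left-ear contribution
        out[1] += np.convolve(driving[s], hrir_right[s])   # right-ear contribution
    return out


if __name__ == "__main__":
    # Two hypothetical speakers with toy impulse responses (pure delays).
    driving = np.array([[1.0, 2.0, 3.0],
                        [4.0, 5.0, 6.0]])
    hrir_left = np.array([[1.0, 0.0],
                          [0.0, 1.0]])
    hrir_right = np.array([[0.0, 1.0],
                           [1.0, 0.0]])
    print(generate_binaural(driving, hrir_left, hrir_right))
```

In practice the impulse responses would come from measured head-related transfer functions for each speaker position; the toy delays here only make the summation easy to follow.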