JP2017130899A

JP2017130899A - Sound field estimation device, method therefor and program

Info

Publication number: JP2017130899A
Application number: JP2016011016A
Authority: JP
Inventors: 暁江村; Akira Emura
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2016-01-22
Filing date: 2016-01-22
Publication date: 2017-07-27

Abstract

PROBLEM TO BE SOLVED: To provide a sound field estimation device, a method therefor, and a program in which the spatial region where sound field estimation is effective is larger than in the prior art.
SOLUTION: A sound field estimation device 200 includes: a plane wave decomposition unit 213 that estimates a vector of the strengths of the plane waves constituting a sound field, using the frequency-domain sound collection signals u(ω, m, j) of M spherical microphone arrays 1 and 2, each having microphones at predetermined positions on a sphere; and an interpolation estimation unit 216 that estimates the frequency-domain sound collection signal u^(ω, r) at the position r of a virtual microphone, using the estimate a(ω) of the plane-wave strength vector and the position r of the virtual microphone.
SELECTED DRAWING: Figure 2

Description

The present invention relates to a technique for estimating, from the sound collection signals of microphones placed at certain positions, the sound collection signal that would be obtained if a microphone were placed at another position.

In recent years, audio reproduction technology has expanded from 2-channel stereo to 5.1-channel reproduction, and research and development of 22.2-channel reproduction and wave field synthesis is under way. The aims are to greatly improve the sense of presence of the reproduction itself and to enlarge, as far as possible, the listening area in which a high sense of presence is obtained.

To evaluate and verify such multi-channel audio reproduction methods, it is important to measure the reproduced sound field. In wave field synthesis, for example, the actually recorded sound field must be compared with the reproduced sound field to understand how they differ. This is because many factors affect the reproduction accuracy of the sound field, such as the signal processing that converts the recorded sound field into reproduction signals, the encoding and decoding of the recorded signals, and the acoustic characteristics of the room in which the reproduction equipment is installed, and it is important to establish a method with high reproduction accuracy.

(Conventional method 1)
One way to measure a sound field is to concentrate microphones locally in part of the target measurement area and estimate the sound field of the surrounding area from the measurements. Spherical microphone arrays are being studied as one example. A spherical microphone array consists of several tens of microphone elements or more arranged on a sphere of radius r_a, where r_a ranges from a few centimeters to a little over ten centimeters.

FIG. 1 shows the signal flow of sound field estimation using a spherical microphone array 1 in the prior art. The time-domain signals y(t, r_a, Ω_j) picked up by the J microphones arranged on the sphere are converted by a short-time Fourier transform unit 111 into frequency-domain signals u(i, ω, r_a, Ω_j). Here t is time, i is the frame index, J is an integer of 2 or more, ω is the temporal frequency, and j = 1, 2, …, J. The following processing is performed frame by frame, but i is omitted to simplify the notation. Ω_j is the position of the j-th microphone element on the sphere and is specified by the pair of elevation angle θ_j and azimuth angle φ_j, that is, Ω_j = (θ_j, φ_j).

The spherical wave spectrum conversion unit 112 obtains the spherical wave spectrum u_{n,m}(ω, r_a) for each frequency ω by the following equation.

$$u_{n,m}(\omega, r_a) = \sum_{j=1}^{J} \alpha_j\, u(\omega, r_a, \Omega_j)\, Y_n^m(\theta_j, \phi_j)^{*} \tag{1}$$

Here α_j is a weight set appropriately so that the weighted sum in Equation (1) satisfies the orthogonality condition of the spherical harmonics, expressed by the following equation.

$$\sum_{j=1}^{J} \alpha_j\, Y_{n'}^{m'}(\theta_j, \phi_j)\, Y_n^m(\theta_j, \phi_j)^{*} = \delta_{nn'}\,\delta_{mm'} \tag{2}$$

Y_n^m(θ_j, φ_j) is the spherical harmonic of order n and degree m, and * denotes the complex conjugate, with n = 0, 1, …, N and m = -n, -n+1, …, n. δ_nn' is 1 when n = n' and 0 when n ≠ n'; δ_mm' is 1 when m = m' and 0 when m ≠ m'. Obtaining the spherical wave spectrum up to order N requires at least (N+1)² microphone elements.

In what follows, the sound field generated by sound sources located outside the measurement region is measured, that is, the interior problem is treated. In other words, the sound field generated by sound sources outside the sphere formed by the spherical microphone array is measured.

The sound field is considered in the polar coordinate system (r, Ω) = (r, θ, φ) whose origin is the center of the spherical microphone array.

At each frequency ω, the extrapolation estimation unit 116 extrapolates the sound field from radius r_a to radius r by the following equation and obtains the sound collection signal u(ω, r, Ω) in the polar coordinate system (r, Ω) = (r, θ, φ). In other words, the sound field outside the sphere formed by the spherical microphone array is estimated from the sound collection signals of the microphones arranged on the array.

$$u(\omega, r, \Omega) = \sum_{n=0}^{N} \sum_{m=-n}^{n} \frac{b_n(kr)}{b_n(kr_a)}\, u_{n,m}(\omega, r_a)\, Y_n^m(\theta, \phi) \tag{3}$$

Here k is the wave number k = ω/c (c is the speed of sound), and b_n( ) is the mode strength function.

Non-Patent Document 1 treats an open-sphere spherical microphone array, in which the microphone elements are arranged on a spherical surface with nothing inside it. In this case, the mode strength function is expressed by the following equation.

Here i is the imaginary unit and j_n( ) is the n-th order spherical Bessel function. When the spherical microphone array is built by placing microphone elements on the surface of a rigid sphere, the mode strength function is, based on Non-Patent Document 2, expressed by the following equation.

Here h_n( ) is the n-th order Hankel function of the first kind, and A' denotes the derivative of A.
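The two mode strength equations referred to above do not survive in this text. As a hedged reconstruction only, the forms commonly given for open and rigid spheres in the spherical-array literature are sketched below. The superscripts "open" and "rigid" are labels introduced here for illustration, and constant factors such as 4π are omitted on the assumption that only the ratio b_n(kr)/b_n(kr_a) matters for extrapolation; the normalization used in the patent may differ.

```latex
% Assumed standard forms, not copied from the patent.
% Open sphere: only the incident field is present.
b_n^{\mathrm{open}}(kr) = i^{n}\, j_n(kr)

% Rigid sphere of radius r_a: incident field plus the field scattered by the sphere.
b_n^{\mathrm{rigid}}(kr) = i^{n} \left( j_n(kr) - \frac{j_n'(kr_a)}{h_n'(kr_a)}\, h_n(kr) \right)
```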

The short-time inverse Fourier transform unit 118 converts the spatially extrapolated sound collection signal from the frequency-domain signal u(ω, r, Ω) back to the time-domain signal y(t, r, Ω) and outputs it.

Equation (3) applies b_n(kr)/b_n(kr_a) to the spherical wave spectrum and takes the weighted sum with Y_n^m(θ, φ). This sum over Y_n^m(θ, φ) corresponds to the inverse spherical wave spectrum transform, so the spatially extrapolated sound collection signal u(ω, r, Ω) is a frequency-domain signal.

Measurement with an open-sphere microphone array cannot avoid the effect of singular points: measurement becomes impossible at the values of k and r for which j_n(kr) = 0. Specifically, even when a sound field is present, the output is 0 whenever j_n(kr) = 0 holds. A rigid-sphere microphone array, by contrast, has no such singular points and never becomes unmeasurable. For this reason, rigid-sphere arrays are the mainstream choice for spherical microphone arrays.
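The singular-point problem can be illustrated for the simplest case, n = 0, where j_0(x) = sin(x)/x has its first zero at x = π. The sketch below uses only the Python standard library; the array radius of 5 cm and the speed of sound of 340 m/s are assumed values chosen for illustration, not figures from the patent.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, assumed value


def spherical_j0(x: float) -> float:
    """Zeroth-order spherical Bessel function j_0(x) = sin(x) / x."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def first_null_frequency(radius_m: float, c: float = SPEED_OF_SOUND) -> float:
    """Lowest frequency in Hz at which j_0(k r) = 0, i.e. k r = pi."""
    return c / (2.0 * radius_m)


if __name__ == "__main__":
    r_a = 0.05  # hypothetical array radius in meters
    f_null = first_null_frequency(r_a)
    k = 2.0 * math.pi * f_null / SPEED_OF_SOUND
    print(f"n = 0 null at {f_null:.0f} Hz, j_0(k r_a) = {spherical_j0(k * r_a):.1e}")
```

At that frequency the n = 0 component picked up by an open-sphere array of this radius vanishes regardless of the actual sound field, which is the failure mode described above.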

Non-Patent Document 1: T. Abhayapala and D. Ward, "Theory and design of high order sound field microphones using spherical microphone array", in Acoustics, Speech, and Signal Processing (ICASSP), IEEE International Conference on, 2002, pp. II-1949.
Non-Patent Document 2: J. Meyer and G. Elko, "A highly scalable spherical microphone array based on an orthonormal decomposition of the soundfield", in Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on, 2002, pp. II-1781 - II-1784.

In the prior art, extrapolating the sound field from radius r_a to radius r uses the Bessel function j_n(kr), or the Bessel function j_n(kr) together with the Hankel function h_n(kr). According to Reference 1, the global trend of both functions is to decay at a rate of 1/kr as kr increases.
(Reference 1) E. G. Williams, "Fourier Acoustics", Springer-Verlag, 2005, pp. 234-236.
For example, when r is 10 times r_a, the extrapolated estimate shrinks sharply to about 1/10. The spatial region in which extrapolation is effective is therefore limited to the vicinity of the spherical microphone array surface. For the same reason, k = ω/c grows as the frequency ω increases, and the extrapolated estimate again shrinks sharply. That is, raising the frequency rapidly narrows the spatial region in which extrapolation is effective.

An object of the present invention is to provide a sound field estimation device, a method therefor, and a program in which the spatial region where sound field estimation is effective is larger than in the prior art.

To solve the above problem, according to one aspect of the present invention, a sound field estimation device includes a plane wave decomposition unit and an interpolation estimation unit. Let m = 1, 2, …, M be the index of a spherical microphone array, j_m = 1, 2, …, J_m, r_m the radial coordinate in polar coordinates, θ_{j_m} and φ_{j_m} the angular coordinates in polar coordinates, and ω the index of temporal frequency. The plane wave decomposition unit estimates a vector of the strengths of the plane waves constituting the sound field, using the frequency-domain sound collection signals u(ω, m, j_m) of M spherical microphone arrays m, each of which has microphones at the J_m positions r^-(m, j_m) = d^-_m + [r_m sinθ_{j_m} cosφ_{j_m}, r_m sinθ_{j_m} sinφ_{j_m}, r_m cosθ_{j_m}]^T with angular coordinates θ_{j_m} and φ_{j_m} on a sphere of radius r_m centered at d^-_m. The interpolation estimation unit estimates the frequency-domain sound collection signal u^(ω, r^-_p) at the position r^-_p of a virtual microphone, using the estimate a(ω) of the plane-wave strength vector and the position r^-_p of the virtual microphone.

To solve the above problem, according to another aspect of the present invention, a sound field estimation method includes a plane wave decomposition step and an interpolation estimation step. Let m = 1, 2, …, M be the index of a spherical microphone array, j_m = 1, 2, …, J_m, r_m the radial coordinate in polar coordinates, θ_{j_m} and φ_{j_m} the angular coordinates in polar coordinates, and ω the index of temporal frequency. In the plane wave decomposition step, a plane wave decomposition unit estimates a vector of the strengths of the plane waves constituting the sound field, using the frequency-domain sound collection signals u(ω, m, j_m) of M spherical microphone arrays m, each of which has microphones at the J_m positions r^-(m, j_m) = d^-_m + [r_m sinθ_{j_m} cosφ_{j_m}, r_m sinθ_{j_m} sinφ_{j_m}, r_m cosθ_{j_m}]^T with angular coordinates θ_{j_m} and φ_{j_m} on a sphere of radius r_m centered at d^-_m. In the interpolation estimation step, an interpolation estimation unit estimates the frequency-domain sound collection signal u^(ω, r^-_p) at the position r^-_p of a virtual microphone, using the estimate a(ω) of the plane-wave strength vector and the position r^-_p of the virtual microphone.

According to the present invention, the spatial region in which sound field estimation is effective is larger than in the prior art.

FIG. 1: Functional block diagram of a sound field estimation device according to the prior art.
FIG. 2: Functional block diagram of the sound field estimation device according to the first embodiment.
FIG. 3: Example of the processing flow of the sound field estimation device according to the first embodiment.
FIG. 4: Outline of the virtual microphone positions in the first embodiment and its Modification 1.
FIG. 5: Functional block diagram of the sound field estimation device according to the second embodiment.
FIG. 6: Example of the processing flow of the sound field estimation device according to the second embodiment.
FIG. 7: Functional block diagram of the sound field estimation device according to the third embodiment.
FIG. 8: Example of the processing flow of the sound field estimation device according to the third embodiment.

Embodiments of the present invention are described below. In the drawings used in the following description, components having the same function and steps performing the same processing are given the same reference numerals, and duplicate description is omitted. In the following description, symbols such as "^" and "^-" used in the text should properly be written directly above the preceding character, but because of the limitations of text notation they are written immediately after that character. In the equations, these symbols are written in their proper positions. Processing performed on each element of a vector or matrix is applied to all elements of that vector or matrix unless otherwise noted.

<Points of the first embodiment>
In this embodiment, (1) multiple microphone arrays are used instead of a single spherical microphone array, and (2) instead of a spherical wave spectrum, the set of plane waves constituting the sound field is obtained directly from the frequency-domain sound collection signals, and the sound field is estimated from these plane waves. Together, (1) and (2) make it possible to greatly widen the spatial range in which sound field estimation is effective. The method is described below.

<Sound field estimation device 200 according to the first embodiment>
FIG. 2 is a functional block diagram of the sound field estimation device 200 according to the first embodiment, and FIG. 3 shows its processing flow.

The sound field estimation device 200 includes a short-time Fourier transform unit 211, a plane wave decomposition unit 213, an interpolation estimation unit 216, and a short-time inverse Fourier transform unit 218.

The sound field estimation device 200 receives the time-domain sound collection signals y(t, 1, j) (j = 1, 2, …, J) from spherical microphone array 1, the time-domain sound collection signals y(t, 2, j) (j = 1, 2, …, J) from spherical microphone array 2, and the position information r^- of a virtual microphone. It estimates and outputs the time-domain sound collection signal y(t, r^-) at the virtual microphone position r^-.

In this embodiment, spherical microphone arrays 1 and 2 are assumed to have the same number of microphones on the sphere, the same microphone arrangement, and the same radius. Spherical microphone arrays 1 and 2 are open-sphere arrays in which J microphones are arranged on a sphere of radius r_a, with the arrangement on the sphere specified by pairs of elevation and azimuth angles (θ_j, φ_j). Let the center of spherical microphone array 1 be d^-_1 = [x_1, y_1, z_1] and the center of spherical microphone array 2 be d^-_2 = [x_2, y_2, z_2]. The three-dimensional position of the j-th microphone on spherical microphone array m (m = 1, 2) is given by

$$\bar{r}(m, j) = \bar{d}_m + [\, r_a \sin\theta_j \cos\phi_j \;\;\; r_a \sin\theta_j \sin\phi_j \;\;\; r_a \cos\theta_j \,]^T \tag{6}$$

The signal picked up by this microphone is denoted in the time domain by y(t, m, j) (m = 1, 2; j = 1, 2, …, J).
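The microphone position formula can be sketched in a few lines of standard-library Python. The function names, the 4 cm radius, the 0.5 m spacing between array centers, and the six-microphone layout are all hypothetical values chosen for illustration; a real array would have far more elements.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def mic_position(center: Sequence[float], radius: float, theta: float, phi: float) -> Vec3:
    """Position d_m + r [sin(theta)cos(phi), sin(theta)sin(phi), cos(theta)]^T."""
    return (
        center[0] + radius * math.sin(theta) * math.cos(phi),
        center[1] + radius * math.sin(theta) * math.sin(phi),
        center[2] + radius * math.cos(theta),
    )


def array_positions(center: Sequence[float], radius: float,
                    angles: Sequence[Tuple[float, float]]) -> List[Vec3]:
    """Positions of all microphones of one spherical array."""
    return [mic_position(center, radius, th, ph) for th, ph in angles]


# Hypothetical layout: two arrays of radius 4 cm, 0.5 m apart, six microphones each.
angles = [(0.0, 0.0), (math.pi, 0.0)] + [(math.pi / 2, q * math.pi / 2) for q in range(4)]
array_1 = array_positions((0.0, 0.0, 0.0), 0.04, angles)
array_2 = array_positions((0.5, 0.0, 0.0), 0.04, angles)
```

Note that with z = r cos θ, the angle θ here is measured from the z axis, which is how the position formula in the text uses it.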

<Short-time Fourier transform unit 211>
The short-time Fourier transform unit 211 receives the time-domain sound collection signals y(t, m, j) (m = 1, 2; j = 1, 2, …, J), converts them by the short-time Fourier transform into frequency-domain sound collection signals u(i, ω, m, j) (where i is the frame number, ω = 1, 2, …, F, and j = 1, 2, …, J) (S211), and outputs them. The subsequent processing is performed for each frame i, but the frame number i is omitted to simplify the description. Any method other than the short-time Fourier transform may be used, as long as it converts a time-domain signal into a frequency-domain signal.

<Plane wave decomposition unit 213>
The plane wave decomposition unit 213 receives the frequency-domain sound collection signals u(ω, m, j) (ω = 1, 2, …, F; m = 1, 2; j = 1, 2, …, J), uses them to estimate the vector of the strengths of the plane waves constituting the sound field (S213), and outputs the estimate a^-(ω) (ω = 1, 2, …, F). For example, to obtain the set of plane waves constituting the sound field, the plane wave decomposition unit 213 finds the solution vector (estimate) a^-(ω) that minimizes the following cost function J.

Here λ is a regularization constant: the larger λ is, the more robust the estimation becomes to noise in u(ω, m, j). D(ω) in Equation (7) is called the dictionary matrix and is given by the following equation.

Its l-th column vector D(ω, l) is the vector of values observed at spherical microphone arrays 1 and 2 when a plane wave of amplitude 1, with phase 0 at the origin, arrives from the direction specified by the pair of elevation and azimuth angles (θ_l, φ_l). The L' incidence angles (θ_l, φ_l) are set, for example, so that the L' plane waves cover all directions evenly, such as by having the plane waves arrive from the directions of the vertices of a regular polyhedron.

The l-th column vector D(ω, l) of the dictionary matrix D(ω) is expressed by Equation (10), using the value of Equation (9) that is observed at the j-th microphone of spherical microphone array m when a plane wave with incidence angle (θ_l, φ_l) arrives.
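The dictionary construction for open-sphere arrays can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's exact formula: the observed value is taken to be exp(i k_l · r) for a unit plane wave with zero phase at the origin, the wave vector is taken to point along (θ_l, φ_l), and the function names, frequency, and microphone positions are hypothetical. The opposite sign convention would work equally well as long as it is used consistently.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def wave_vector(k: float, theta: float, phi: float) -> Tuple[float, float, float]:
    """Wave vector of magnitude k for the direction (theta, phi); sign convention assumed."""
    return (k * math.sin(theta) * math.cos(phi),
            k * math.sin(theta) * math.sin(phi),
            k * math.cos(theta))


def observed_value(kvec: Sequence[float], pos: Sequence[float]) -> complex:
    """Unit-amplitude plane wave with zero phase at the origin, sampled at pos."""
    return cmath.exp(1j * sum(kc * pc for kc, pc in zip(kvec, pos)))


def dictionary_matrix(k: float, directions: Sequence[Tuple[float, float]],
                      mic_positions: Sequence[Sequence[float]]) -> List[List[complex]]:
    """Rows: microphones of all arrays stacked; columns: candidate plane-wave directions."""
    kvecs = [wave_vector(k, th, ph) for th, ph in directions]
    return [[observed_value(kv, pos) for kv in kvecs] for pos in mic_positions]


# Hypothetical example: 1 kHz, c = 340 m/s, six directions (vertices of an octahedron).
k = 2.0 * math.pi * 1000.0 / 340.0
directions = [(0.0, 0.0), (math.pi, 0.0)] + [(math.pi / 2, q * math.pi / 2) for q in range(4)]
mics = [(0.0, 0.0, 0.0), (0.04, 0.0, 0.0), (0.5, 0.0, 0.04)]
D = dictionary_matrix(k, directions, mics)
```

Every entry has unit magnitude, and a microphone placed at the origin sees the value 1 for every direction, matching the zero-phase-at-origin condition stated in the text.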

The l-th element of the solution vector a^-(ω) corresponds to the amplitude of the l-th plane wave, where a(ω) = [a_1(ω), a_2(ω), …, a_l(ω), …, a_L'(ω)]^T.

The convex optimization with the L1-norm regularization term above yields a sparse solution vector a^-(ω) containing many zeros. Therefore, as shown in Reference 2, plane waves can be extracted successfully even in the redundant case where the number L' of plane waves assumed in advance greatly exceeds the number of microphones.
(Reference 2) A. Wabnitz, N. Epain, A. van Schaik, C. Jin, "Reconstruction of spatial sound field using compressed sensing", in Acoustics, Speech, and Signal Processing (ICASSP), IEEE International Conference on, 2011.
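One way to solve an L1-regularized least-squares problem of this kind is iterative soft-thresholding (ISTA). The sketch below assumes the cost function has the form J = ||u(ω) - D(ω)a(ω)||² + λ||a(ω)||_1, which is consistent with the description but is not confirmed by the text; the patent does not name a solver, so ISTA is a substitute chosen here for brevity. The function names and the toy data are hypothetical.

```python
from typing import List, Sequence


def soft_threshold(z: complex, tau: float) -> complex:
    """Complex soft-thresholding: shrink the magnitude by tau, keep the phase."""
    mag = abs(z)
    return 0j if mag <= tau else z * (1.0 - tau / mag)


def ista(D: Sequence[Sequence[complex]], u: Sequence[complex], lam: float,
         n_iter: int = 500) -> List[complex]:
    """Minimize ||u - D a||_2^2 + lam * ||a||_1 by iterative soft-thresholding."""
    n_rows, n_cols = len(D), len(D[0])
    # Step size 1/L, where L = 2 * ||D||_F^2 upper-bounds the gradient's Lipschitz constant.
    frob_sq = sum(abs(D[i][l]) ** 2 for i in range(n_rows) for l in range(n_cols))
    step = 1.0 / (2.0 * frob_sq)
    a = [0j] * n_cols
    for _ in range(n_iter):
        residual = [u[i] - sum(D[i][l] * a[l] for l in range(n_cols)) for i in range(n_rows)]
        # Gradient step on the data term: a + 2 * step * D^H (u - D a).
        moved = [a[l] + 2.0 * step * sum(complex(D[i][l]).conjugate() * residual[i]
                                         for i in range(n_rows))
                 for l in range(n_cols)]
        # Proximal step for the L1 term.
        a = [soft_threshold(z, step * lam) for z in moved]
    return a


# Toy check with an identity dictionary: the minimizer of |3 - a|^2 + 2|a| is a = 2.
a_hat = ista([[1, 0], [0, 1]], [3, 0], lam=2.0)
```

The Frobenius-norm step size is deliberately conservative, so convergence is slow for large dictionaries; an accelerated variant or a dedicated convex solver would be the practical choice.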

<Interpolation estimation unit 216>
The interpolation estimation unit 216 receives the estimate a(ω) and the position information r^- = (r_x, r_y, r_z) of the virtual microphone, estimates the frequency-domain sound collection signal u^(ω, r^-) (ω = 1, 2, …, F) at the virtual microphone position r^- by the following equation (S216), and outputs it. In other words, the output value u^(ω, r^-) of the virtual microphone is estimated from a plane wave model whose parameters are the solution vector a^-(ω).

Here ● denotes the inner product, and k^-_l is the wave vector corresponding to the incidence direction of the l-th plane wave, expressed by the following equation.

^T denotes transposition. The position information r^- of the virtual microphone is input, for example, by the user of the sound field estimation device 200.
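The interpolation step amounts to evaluating the plane-wave model at the virtual microphone position. The sketch below assumes the model u^(ω, r) = Σ_l a_l(ω) exp(i k_l ● r) with k_l = k [sinθ_l cosφ_l, sinθ_l sinφ_l, cosθ_l]^T; the sign of the exponent and the direction of k_l are assumptions and must match whatever convention was used to build the dictionary. The function names and numerical values are hypothetical.

```python
import cmath
import math
from typing import Sequence, Tuple


def wave_vector(k: float, theta: float, phi: float) -> Tuple[float, float, float]:
    """Wave vector of magnitude k for the direction (theta, phi); sign convention assumed."""
    return (k * math.sin(theta) * math.cos(phi),
            k * math.sin(theta) * math.sin(phi),
            k * math.cos(theta))


def interpolate(amplitudes: Sequence[complex], directions: Sequence[Tuple[float, float]],
                k: float, r: Sequence[float]) -> complex:
    """Sum of a_l * exp(i k_l . r) over all candidate plane waves."""
    total = 0j
    for a_l, (theta, phi) in zip(amplitudes, directions):
        kvec = wave_vector(k, theta, phi)
        total += a_l * cmath.exp(1j * sum(kc * rc for kc, rc in zip(kvec, r)))
    return total


# Hypothetical example: one unit plane wave along +x at 1 kHz (c = 340 m/s assumed).
k = 2.0 * math.pi * 1000.0 / 340.0
directions = [(math.pi / 2, 0.0), (math.pi / 2, math.pi / 2)]
amplitudes = [1.0 + 0j, 0j]
near = interpolate(amplitudes, directions, k, (0.04, 0.0, 0.0))
far = interpolate(amplitudes, directions, k, (0.40, 0.0, 0.0))
```

In this sketch the magnitude of the estimate is the same at 4 cm and at 40 cm from the origin; only the phase changes. A plane-wave model has no built-in 1/kr decay with distance, in contrast to the Bessel-function extrapolation of the prior art.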

<Short-time inverse Fourier transform unit 218>
The short-time inverse Fourier transform unit 218 receives the frequency-domain sound collection signal u^(ω, r^-) (ω = 1, 2, …, F), converts it by the inverse short-time Fourier transform into the time-domain sound collection signal y(t, r^-) (S218), and outputs it. The method used to convert the frequency-domain signal into a time-domain signal should be the inverse of the transform used in the short-time Fourier transform unit 211.

<Effect>
With the above configuration, a sound field estimation device can be realized in which the spatial region where sound field estimation is effective is larger than in the prior art.

<Modification 1>
The first embodiment assumed one virtual microphone and estimated the signal picked up at its position. Naturally, virtual microphones may instead be assumed at multiple positions. By placing the virtual microphones on the same spherical surface, an open-sphere virtual microphone array of radius r can be formed. For example, when a virtual microphone array with P virtual microphones is formed with its center at D = [d_x d_y d_z] as seen from the origin, the sound field estimation device 200 receives the position information r^-_p (p = 1, 2, …, P) of the P virtual microphones and outputs the sound collection signals y(t, r^-_p) (p = 1, 2, …, P).

When the position of the p-th microphone on the sphere of the virtual microphone array is r^-_p (p = 1, 2, …, P), the interpolation estimation unit 216 estimates the frequency-domain sound collection signal u^(ω, r^-_p) by the following equation.

FIG. 4 shows an outline of the virtual microphone positions in the first embodiment and its Modification 1. When the center [d_x d_y d_z] of the virtual microphone array is [r_x r_y r_z], the radius is r = 0, and the number P of virtual microphones in the array is 1, Modification 1 reduces to the first embodiment, so the first embodiment can be regarded as a special case of Modification 1. (In this modification the microphones are placed on a sphere of radius r, whereas in the first embodiment r = 0, that is, the microphone is placed at a point rather than on a sphere.)

<Modification 2>
The first embodiment described the case where two open-sphere microphone arrays are placed in the sound field. This modification describes the case where two rigid-sphere microphone arrays are placed in the sound field.

Each rigid-sphere microphone array has radius r_a and J microphones, and the microphone arrangement on the sphere is specified by pairs of elevation and azimuth angles (θ_j, φ_j). When a plane wave of amplitude 1 arrives from the direction specified by the pair of elevation and azimuth angles (θ_l, φ_l), the sound field consists of the incident wave and the scattered wave.

When the center of the rigid-sphere microphone array coincides with the origin of the coordinate system, the signal observed at the j-th microphone is given by Equation (15). When the center of the m-th (m = 1, 2) rigid-sphere microphone array is displaced from the origin by d_m, the signal observed at its j-th microphone is given, taking the phase difference into account, by Equation (16).

Accordingly, if Equation (16) is used instead of Equation (9) to generate the l-th column vector D(ω, l) of the dictionary matrix D(ω) in Equation (10), plane waves can be extracted from the output signals of the rigid-sphere microphone arrays by solving the same optimization problem as before.

Because Equation (15) contains infinitely many terms, in practice a finite n is used and v^rigid(ω, l, m, j) is obtained by numerical computation. For r_a = 4 cm, truncating at about n = 10 is sufficient.
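Equations (15) and (16) do not survive in this text. As a hedged reconstruction only, the textbook expression for the pressure on the surface of a rigid sphere illuminated by a unit plane wave, truncated at order N, is sketched below. The symbol Θ_{l,j} (the angle between the wave vector k^-_l and the direction of microphone j), the Legendre polynomial P_n, and the two-argument-shorter name v^rigid(ω, l, j) for the centered case are notations introduced here; the sign of the phase term depends on the plane-wave convention and may differ from the patent's.

```latex
% Assumed textbook form: unit plane wave, rigid sphere of radius r_a centered at the origin.
v^{\mathrm{rigid}}(\omega, l, j) \approx \sum_{n=0}^{N} (2n+1)\, i^{n}
  \left( j_n(kr_a) - \frac{j_n'(kr_a)}{h_n'(kr_a)}\, h_n(kr_a) \right) P_n(\cos\Theta_{l,j})

% Array m displaced by \bar{d}_m: apply the phase of the plane wave at the array center.
v^{\mathrm{rigid}}(\omega, l, m, j) \approx e^{\, i\, \bar{k}_l \cdot \bar{d}_m}\; v^{\mathrm{rigid}}(\omega, l, j)
```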

<Other variations>
In the present embodiment, the microphone arrangements of microphone arrays 1 and 2 and the radii of spherical microphone arrays 1 and 2 are assumed to be identical, but they may differ. The number of microphone arrays need not be two; any number of two or more may be used. For example, let M be any integer of 2 or more. The three-dimensional position of the j_m-th microphone on spherical microphone array m (m = 1, 2, ..., M), whose center is d^-_m and whose radius is r_m, is given by

$$r^{-}_{(m,j_m)} = d^{-}_m + [\, r_m \sin\theta_{j_m} \cos\phi_{j_m} \quad r_m \sin\theta_{j_m} \sin\phi_{j_m} \quad r_m \cos\theta_{j_m} \,]^{T}$$

Let the time-domain signal collected by this microphone be y(t, m, j_m) (j_m = 1, 2, ..., J_m, where J_m is the number of microphones in spherical microphone array m). Equations (7) and (8) are replaced with the following equations.
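Returning to the microphone positions, the rule "array center plus a radius-r_m offset in the direction (θ_{j_m}, φ_{j_m})" can be sketched in a few lines. The function names `mic_position` and `array_positions` are hypothetical, and θ is taken as the angle measured from the z axis, which is what the sin θ cos φ, sin θ sin φ, cos θ components imply.

```python
import math


def mic_position(center, radius, theta, phi):
    """Cartesian position of one microphone on a sphere.

    theta is measured from the z axis and phi in the x-y plane, matching
    the [sin(theta)cos(phi), sin(theta)sin(phi), cos(theta)] components.
    """
    dx, dy, dz = center
    return (
        dx + radius * math.sin(theta) * math.cos(phi),
        dy + radius * math.sin(theta) * math.sin(phi),
        dz + radius * math.cos(theta),
    )


def array_positions(center, radius, angles):
    """Positions of all microphones of one array; angles is [(theta, phi), ...]."""
    return [mic_position(center, radius, th, ph) for th, ph in angles]
```

Because each array carries its own center, radius, and angle list, arrays with different radii or different numbers of microphones can be handled by calling `array_positions` once per array.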

The l-th column vector D(ω, l) of the dictionary matrix D(ω) is expressed in terms of the values observed at the j_m-th microphone of spherical microphone array m when a plane wave with incidence angle (θ_l, φ_l) arrives.

In addition, equations (15) and (16) of Modification 2 of the first embodiment are replaced with the following equations.

<Second embodiment>
The description focuses on the differences from Modification 1 of the first embodiment.

In Modification 1 of the first embodiment, a virtual open-sphere microphone array was assumed and its collected signals were estimated. The second embodiment builds on the configuration of Modification 1 of the first embodiment but assumes a virtual rigid-sphere microphone array instead of the open-sphere microphone array and estimates its collected signals.

FIG. 5 is a functional block diagram of the sound field estimation device 300 according to the second embodiment, and FIG. 6 shows its processing flow.

The sound field estimation device 300 includes a short-time Fourier transform unit 211, a plane wave decomposition unit 213, an interpolation estimation unit 216, and a short-time inverse Fourier transform unit 218, and further includes an array type conversion unit 317.

First, as the virtual spherical microphone array, sound collection with the dual open-sphere microphone array of Reference 3 is assumed. In this microphone array, each microphone element is placed on either a sphere of radius r or a sphere of radius αr, and α = 1.2 is recommended.
(Reference 3) I. Balmages, B. Rafaely, "Open-Sphere Designs for Spherical Microphone Arrays", IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 2, pp. 727-732, 2007.
For example, let Q = P × 2, and let the positions of P of the Q virtual microphone elements be the same as in Modification 1. That is, the center of the virtual microphone array is at D = [d_x d_y d_z] as seen from the origin, and the position r^-_p of the p-th virtual microphone on the sphere of the virtual microphone array is

$$r^{-}_p = D + r\,[\, \sin\theta_p \cos\phi_p \quad \sin\theta_p \sin\phi_p \quad \cos\theta_p \,]$$

The remaining P of the Q virtual microphone elements are placed on a sphere of radius αr centered at [d_x d_y d_z], and the position of the q-th virtual microphone is

$$r^{-}_q = D + \alpha r\,[\, \sin\theta_q \cos\phi_q \quad \sin\theta_q \sin\phi_q \quad \cos\theta_q \,]$$

In addition, Ω_q = (θ_q, φ_q) = Ω_p = (θ_p, φ_p). That is, the q-th (q = P + p) microphone and the p-th microphone lie in the same direction as seen from the center of the virtual microphone array; the p-th microphone is on the sphere of radius r, and the q-th microphone is on the sphere of radius αr.

The interpolation estimation unit 216 receives the estimate A, the center D of the virtual microphone array, the P pieces of position information r^-_p (p = 1, 2, ..., P), and the P pieces of position information r^-_q (q = P+1, P+2, ..., Q) of the virtual microphones. It estimates the frequency-domain collected signals u^(ω, r^-_p) (p = 1, 2, ..., P) and u^(ω, r^-_q) (q = P+1, P+2, ..., Q) at the virtual microphone positions r^-_p and r^-_q (S216) and outputs them. Instead of receiving the P pieces of position information r^-_q (q = P+1, P+2, ..., Q), the unit may be configured to receive only α and to compute the P positions r^-_q from the P positions r^-_p and α.
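The interpolation step can be sketched as a sum of plane waves evaluated at each virtual microphone position. The sketch assumes that each plane wave contributes a_l(ω)·exp(i k n_l·r), where n_l is the unit vector for (θ_l, φ_l); the sign of the exponent depends on the convention used in the specification, and the names `unit_vector`, `outer_position`, and `interpolate` are hypothetical. The helper `outer_position` shows the option of deriving r^-_q from r^-_p and α alone.

```python
import cmath
import math


def unit_vector(theta, phi):
    """Unit vector for polar angle theta (from z) and azimuth phi."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def outer_position(center, inner_pos, alpha):
    """Scale the offset from the centre by alpha: r_q = D + alpha*(r_p - D)."""
    return tuple(c + alpha * (x - c) for c, x in zip(center, inner_pos))


def interpolate(a, directions, k, position):
    """Estimate the signal at a virtual microphone from plane-wave amplitudes.

    a          : plane-wave amplitudes a_l(omega) at one frequency
    directions : list of (theta_l, phi_l), one per plane wave
    k          : wave number (omega / c)
    position   : Cartesian position of the virtual microphone
    Assumed model: sum over l of a_l * exp(i * k * n_l . position).
    """
    total = 0j
    for a_l, (th, ph) in zip(a, directions):
        n = unit_vector(th, ph)
        phase = k * sum(n_i * r_i for n_i, r_i in zip(n, position))
        total += a_l * cmath.exp(1j * phase)
    return total
```

Since the positions enter only through the phase term, the same amplitudes a_l(ω) can be reused for any number of virtual microphones, including both spheres of the dual array.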

<Array type conversion unit 317>
The array type conversion unit 317 receives the frequency-domain collected signals u^(ω, r^-_p) (p = 1, 2, ..., P) and u^(ω, r^-_q) (q = P+1, P+2, ..., Q) and converts them into the spherical wave spectra u_{n,m}(ω, r) and u_{n,m}(ω, αr) by the following equation.
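In its usual textbook form, the spherical wave spectrum transform and its inverse have the shape below. The quadrature weights w_p, the truncation order N, and the normalization of the spherical harmonics Y_n^m are assumptions; the specification's own equations may differ in these details. Here û stands for the estimate written u^ in the text.

```latex
u_{n,m}(\omega, r) \approx \sum_{p=1}^{P} w_p\, \hat{u}(\omega, r, \theta_p, \phi_p)\, \left[ Y_n^{m}(\theta_p, \phi_p) \right]^{*},
\qquad
v(\omega, r, \theta_p, \phi_p) \approx \sum_{n=0}^{N} \sum_{m=-n}^{n} v_{n,m}(\omega, r)\, Y_n^{m}(\theta_p, \phi_p)
```

The same forward transform applies to the outer sphere with r replaced by αr and the index p replaced by q.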

In an open-sphere microphone array, measurement becomes impossible at values of k and r for which j_n(kr) = 0, because of the singularity. However, by selecting whichever of u_{n,m}(ω, r) and u_{n,m}(ω, αr) has the larger absolute value, the dual open-sphere microphone array can avoid the effect of the singularity.

Therefore, the array type conversion unit 317 obtains the spherical wave spectrum v_{n,m}(ω, r) from u_{n,m}(ω, r) when |u_{n,m}(ω, r)| > |u_{n,m}(ω, αr)|, and from u_{n,m}(ω, αr) when |u_{n,m}(ω, r)| ≤ |u_{n,m}(ω, αr)|.
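The selection rule can be sketched as follows. The comparison of absolute values follows the text; the rescaling ratios b_n(kr)/j_n(kr) and b_n(kr)/j_n(kαr) are an assumption about how an open-sphere coefficient would be mapped to a rigid-sphere coefficient and are not taken from the specification. The function name `to_rigid_spectrum` and its parameters are hypothetical, and the radial-function values are expected to be computed elsewhere.

```python
def to_rigid_spectrum(u_r, u_ar, jn_kr, jn_kar, bn_kr):
    """Pick the larger open-sphere coefficient and rescale it.

    u_r, u_ar : spectra u_{n,m}(omega, r) and u_{n,m}(omega, alpha*r)
    jn_kr     : j_n(k r), open-sphere radial term on the inner sphere
    jn_kar    : j_n(k alpha r), open-sphere radial term on the outer sphere
    bn_kr     : rigid-sphere mode strength b_n(k r)
    The rescaling by bn_kr / j_n is an illustrative assumption.
    """
    if abs(u_r) > abs(u_ar):
        # Inner sphere is better conditioned at this (n, k)
        return u_r * bn_kr / jn_kr
    # Otherwise use the outer sphere, which avoids dividing by j_n(k r) near its zero
    return u_ar * bn_kr / jn_kar
```

When j_n(kr) is close to zero, u_{n,m}(ω, r) is small as well, so the second branch is taken and the division by j_n(kr) never happens.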

Finally, the array type conversion unit 317 applies the inverse spherical wave spectrum transform. This yields, in the frequency domain, the signals that would be collected if a rigid-sphere microphone array of radius r were installed at the position of the dual open-sphere microphone array that was virtually installed first. The array type conversion unit 317 outputs the frequency-domain signals v(ω, r^-_p) (p = 1, 2, ..., P) to the short-time inverse Fourier transform unit 218.

<Effect>
With this configuration, the same effect as in Modification 1 of the first embodiment is obtained. In addition, the signals that would be collected if a rigid-sphere microphone array were installed can be obtained virtually. This embodiment may be combined with Modification 2 of the first embodiment.

<Third embodiment>
Reference 4 describes an application of a rigid-sphere microphone array to virtual reality.
(Reference 4) R. Duraiswami, D. N. Zotkin, Z. Li, E. Grassi, N. A. Gumerov, L. S. Davis, "High Order Spatial Audio Capture and Binaural Head-Tracked Playback over Headphones with HRTF Cues", Proceedings 119th convention of AES, 2005.
Reference 4 describes a method that takes as inputs the collected signals of a fixed rigid-sphere microphone array and the direction of a virtual head, and outputs the signals heard at the right and left ears when the head faces the specified direction (binaural signals). Because the spherical microphone array collects sound from all directions, binaural signals for any specified direction can be generated without moving the microphone elements or the microphone array. That is, when the listener's head rotation is measured in real time and supplied as input, binaural signals that follow the rotation can be generated and presented to the listener.

The second embodiment described a method of obtaining the collected signals of a virtually installed rigid-sphere microphone array. The present embodiment combines this binaural signal generation method with those collected signals, as shown in FIG. 7.

The description focuses on the differences from the second embodiment.

FIG. 7 is a functional block diagram of the sound field estimation device 400 according to the third embodiment, and FIG. 8 shows its processing flow.

The sound field estimation device 400 includes a short-time Fourier transform unit 211, a plane wave decomposition unit 213, an interpolation estimation unit 216, an array type conversion unit 317, and a short-time inverse Fourier transform unit 218, and further includes a binaural signal generation unit 419.

<Binaural signal generation unit 419>
The binaural signal generation unit 419 receives the direction (orientation) of the virtual head and the time-domain collected signals y(t, r^-_p) (p = 1, 2, ..., P; these correspond to the collected signals of a rigid-sphere microphone array). From these inputs it generates the binaural signals y(t, R) and y(t, L) for the position and direction of the virtual head, for example by the method described in Reference 4 (S419), and outputs them as the output of the sound field estimation device 400. The position of the virtual head corresponds to the center D = [d_x d_y d_z] of the virtual microphone array, and the time-domain collected signals y(t, r^-_p) correspond to the collected signals of a rigid-sphere microphone array at the position of the virtual head. The binaural signal generation unit 419 can therefore generate the binaural signals y(t, R) and y(t, L) for the position and direction of the virtual head from the direction (orientation) of the virtual head and the time-domain collected signals y(t, r^-_p).

The method of Reference 4 can follow only rotational movement of the head and cannot handle translational movement. In the configuration of the present embodiment, however, the rigid-sphere microphone array can be translated virtually. The present embodiment therefore makes it possible to generate binaural signals that follow both the rotational and the translational movement of the head.

<Other variations>
The present invention is not limited to the embodiments and modifications described above. For example, the various processes described above need not be executed in time series in the order described; they may be executed in parallel or individually, depending on the processing capability of the device that executes them or as needed. Other changes may be made as appropriate without departing from the spirit of the present invention.

<Program and recording medium>
The various processing functions of each device described in the above embodiments and modifications may be implemented by a computer. In that case, the processing content of the functions that each device should have is described by a program. Executing this program on a computer implements the various processing functions of each device on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to another computer over a network.

A computer that executes such a program first stores the program recorded on the portable recording medium, or the program transferred from the server computer, in its own storage unit. When executing a process, the computer reads the program stored in its own storage unit and executes the process according to the program it has read. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it. The computer may also execute processing according to the received program each time a program is transferred to it from the server computer. Alternatively, the processing may be executed by a so-called ASP (Application Service Provider) service, in which no program is transferred from the server computer to the computer and the processing functions are implemented only through execution instructions and retrieval of results. The program is taken to include information that is used for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).

Although each device is described as being configured by executing a predetermined program on a computer, at least part of the processing content may be implemented in hardware.

Claims

A sound field estimation device comprising:
a plane wave decomposition unit that, where m = 1, 2, ..., M is the index of a spherical microphone array, j_m = 1, 2, ..., J_m, r_m is the radial coordinate and θ_{j_m} and φ_{j_m} are the angular coordinates in polar coordinates, and ω is an index of time and frequency, estimates a vector consisting of the intensities of the plane waves constituting a sound field, using the frequency-domain collected signals u(ω, m, j_m) of M spherical microphone arrays m, each having J_m microphones at the positions r^-_(m,j_m) = d^-_m + [r_m sinθ_{j_m} cosφ_{j_m}  r_m sinθ_{j_m} sinφ_{j_m}  r_m cosθ_{j_m}]^T of angles θ_{j_m} and φ_{j_m} on a sphere of radius r_m centered at d^-_m; and
an interpolation estimation unit that estimates the frequency-domain collected signal u^(ω, r^-_p) at a virtual microphone position r^-_p, using the estimate a(ω) of the vector consisting of the intensities of the plane waves and the virtual microphone position r^-_p.

The sound field estimation device according to claim 1, further comprising
an array type conversion unit that, where α is a predetermined real number, p = 1, 2, ..., P, r is the radial coordinate and θ_p and φ_p are the angular coordinates in polar coordinates, estimates the spherical wave spectrum v_{n,m}(ω, r) that would be computed from the signals collected by a rigid-sphere microphone array, based on the magnitude relationship between the absolute value |u_{n,m}(ω, r)| of the spherical wave spectrum u_{n,m}(ω, r) of the collected signals u^(ω, r, θ_p, φ_p) and the absolute value |u_{n,m}(ω, αr)| of the spherical wave spectrum u_{n,m}(ω, αr) of the collected signals u^(ω, αr, θ_p, φ_p).

The sound field estimation device according to claim 2, further comprising
a binaural signal generation unit that, with the position of the rigid-sphere microphone array taken as the position of a virtual head, generates binaural signals for the position and direction of the virtual head, based on the P time-domain collected signals y(t, r^-_p) obtained from the spherical wave spectrum v_{n,m}(ω, r) and on the direction of the virtual head.

The sound field estimation device according to any one of claims 1 to 3, wherein
P is an integer of 1 or more, p = 1, 2, ..., P, a(ω) = [a_1(ω), a_2(ω), ..., a_l(ω), ..., a_{L'}(ω)]^T, i is the imaginary unit, k is the wave number, and the polar angles of the l-th plane wave constituting the sound field are θ_l and φ_l, and the interpolation estimation unit uses

to estimate the frequency-domain collected signal u^(ω, r^-_p) of the virtual microphone at angles θ_p and φ_p on a sphere of radius r centered at [d_x d_y d_z].

The sound field estimation device according to any one of claims 1 to 4, wherein
the M spherical microphone arrays m are open-sphere microphone arrays, and,
where ||a(ω)||_1 is the L1 norm of a(ω) and λ is a regularization constant, the plane wave decomposition unit uses

as a cost function J and obtains the estimate a^-(ω) that minimizes the cost function J.

The sound field estimation device according to any one of claims 1 to 4, wherein
the M spherical microphone arrays are rigid-sphere microphone arrays, and,
where ||a(ω)||_1 is the L1 norm of a(ω), λ is a regularization constant, Y_n^{m'} is the spherical harmonic function of order n and degree m', k is the wave number, and b_n(kr_m) is the mode strength function, the plane wave decomposition unit uses

A sound field estimation method comprising:
a plane wave decomposition step in which, where m = 1, 2, ..., M is the index of a spherical microphone array, j_m = 1, 2, ..., J_m, r_m is the radial coordinate and θ_{j_m} and φ_{j_m} are the angular coordinates in polar coordinates, and ω is an index of time and frequency, a plane wave decomposition unit estimates a vector consisting of the intensities of the plane waves constituting a sound field, using the frequency-domain collected signals u(ω, m, j_m) of M spherical microphone arrays m, each having J_m microphones at the positions r^-_(m,j_m) = d^-_m + [r_m sinθ_{j_m} cosφ_{j_m}  r_m sinθ_{j_m} sinφ_{j_m}  r_m cosθ_{j_m}]^T of angles θ_{j_m} and φ_{j_m} on a sphere of radius r_m centered at d^-_m; and
an interpolation estimation step in which an interpolation estimation unit estimates the frequency-domain collected signal u^(ω, r^-_p) at a virtual microphone position r^-_p, using the estimate a(ω) of the vector consisting of the intensities of the plane waves and the virtual microphone position r^-_p.

A program for causing a computer to function as the sound field estimation device according to any one of claims 1 to 6.