JP4691662B2

JP4691662B2 - Out-of-head sound localization device

Info

Publication number: JP4691662B2
Application number: JP2006031651A
Authority: JP
Inventors: 正治島田; 治英穂刈; 彰洋工藤; 星哉久保
Original assignee: Nagaoka University of Technology
Current assignee: Nagaoka University of Technology
Priority date: 2006-02-08
Filing date: 2006-02-08
Publication date: 2011-06-01
Anticipated expiration: 2026-02-08
Also published as: JP2007214815A

Description

The present invention relates to an out-of-head sound image localization technique that enables a listener to perceive the direction of a sound source outside the head when listening through stereo headphones.

Out-of-head sound image localization reproduces an open-sounding sound image outside the head by using a digital filter to correct the acoustic propagation characteristics of the ear canal when headphones are worn, and presenting the result to the listener.
As shown in FIG. 13, the spatial acoustic characteristics are imparted by complex multiplication of the sound source signal S(ω) by the out-of-head sound image localization transfer function SLTF (Sound Localization Transfer Function).
The subscripts L and R in the figure denote the left ear and the right ear, respectively.
The SLTF is obtained as follows. The spatial sound transfer function SSTF (Spatial Sound Transfer Function) from the speaker 3 in free space to the microphone 2 at the ear canal entrance is measured. It is divided by the transfer function LSTF (Loud Speaker Transfer Function) of the speaker 3 to compensate for the characteristics of the speaker 3, and the result is further divided by the ear canal transfer function ECTF (Ear Canal Transfer Function) from the headphone 4 to the microphone 2 of the listener 8. That is, SLTF = SSTF / (LSTF · ECTF).
Out-of-head sound image localization is achieved by complex-multiplying the sound source signal S(ω) by this SLTF and presenting the result to the listener 8 through the headphones 4.
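As an illustration only, the relation SLTF = SSTF / (LSTF · ECTF) and its application to S(ω) can be sketched bin by bin as follows. The function names, the `eps` guard, and the four-bin toy spectra are assumptions made for this example and do not appear in the specification.

```python
def sltf(sstf, lstf, ectf, eps=1e-12):
    """Per-bin SLTF = SSTF / (LSTF * ECTF) for equal-length complex spectra."""
    out = []
    for s, l, e in zip(sstf, lstf, ectf):
        denom = l * e
        if abs(denom) < eps:
            # A zero in the speaker or ear canal response cannot be inverted.
            raise ZeroDivisionError("LSTF * ECTF is (near) zero in some bin")
        out.append(s / denom)
    return out


def apply_filter(spectrum, transfer):
    """Complex multiplication S(w) * SLTF(w), bin by bin."""
    return [s * h for s, h in zip(spectrum, transfer)]


# Toy spectra with made-up values, one complex number per frequency bin.
sstf = [2 + 0j, 1 + 1j, 0.5j, 1 - 1j]
lstf = [1 + 0j, 1 + 0j, 2 + 0j, 1j]
ectf = [2 + 0j, 1j, 1 + 0j, 1 + 0j]

h = sltf(sstf, lstf, ectf)
filtered = apply_filter([2 + 0j] * 4, h)
```

The guard against a near-zero denominator is a design choice of this sketch; the specification does not say how bins with very small LSTF or ECTF values are handled.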

However, the binaural and monaural characteristics that a listener needs in order to perceive the position of a sound source, such as the interaural arrival time difference, the interaural level difference, and the frequency characteristics, are known to change with subtle differences in the head, torso, and pinna.
Because the human head, torso, and pinna differ from person to person, a general-purpose transfer function measured with a dummy head gives rise to so-called front-back misjudgment, in which a sound image that should be perceived in front is perceived behind. Sound image localization then becomes inaccurate or, in the worst case, the sound image is localized inside the head.

These are the problems to be solved. An object of the present invention is to provide an out-of-head sound image localization apparatus for stereo headphones that gives an unspecified number of listeners a good sense of localization while using a general-purpose transfer function.

To this end, the present invention sets moving sound source positions A and B, separated by an angle difference θ (the swing angle), to the left and right of each of the front two-channel stereo real sound source positions L and R, which have azimuth angles ±Φ, and obtains, for each moving sound source position, the transfer functions Ha(ω) and Hb(ω) of the paths to the listening points at both ears. For the sound source signal s(t) of each channel of the stereo headphones, the apparatus comprises: cutting-out means that multiplies s(t) by a waveform cut-out function w₁(t), under which adjacent frames overlap, to cut out frames successively and divide s(t) into a plurality of frame signals sn(t); convolution means that alternately convolves the transfer functions Ha(ω) and Hb(ω) with the frame signals sn(t) to generate frame signals sa(t) and sb(t) containing the position information of the moving sound source positions A and B; and adding means that multiplies the frame signals sa(t) and sb(t) by a waveform synthesis function w₂ to obtain frame signals sa′(t) and sb′(t) and alternately overlap-adds them, thereby smoothing waveform discontinuities and generating a synthesized signal s′(t) containing movement information in which the sound sources placed at the stereo real sound source positions L and R reciprocate between the moving sound source positions A and B with a constant period T (the switching time). The principal feature of the invention is that a swing sound image with sound image presentation angle φ, swing angle θ, and switching time T is thereby presented to both ears of the listener so that the sound image is localized outside the head.

The present invention exploits the fact that animals, including humans, are generally sensitive to moving sound sources. Because the sound image is moved by alternately convolving the transfer functions Ha(ω) and Hb(ω) of the moving sound source positions A and B with the sound source signal s(t) of each channel of the stereo headphones, highly accurate out-of-head sound image localization can be achieved with a general-purpose transfer function.
In addition, because the waveforms in the overlap interval are combined so as to smooth the waveform discontinuity at overlap-addition, the unnatural impression caused by amplitude fluctuation when the transfer function is switched is eliminated, and a sound image with a more natural signal waveform can be presented.

Embodiments of the present invention are described below.

FIG. 1 shows the configuration of an out-of-head sound image localization apparatus embodying the present invention.
To keep the figure simple, only one channel of the stereo headphone system is shown.
In the out-of-head sound image localization apparatus, the microphone 2 of the measurement system is connected to the input side of the personal computer 1, and the speaker 3 of the measurement system and the headphone 4 of the reproduction system are connected to the output side so that they can be selected by the switch 5.
The microphone 2 is connected to the personal computer 1 via the A/D converter 21, the low-pass filter 22 serving as an anti-aliasing filter, and the amplifier 23.
The speaker 3 is connected to the switch 5 via the selector 31 and the amplifier 32.
The headphone 4 is connected to the switch 5 via the amplifier 41.
The switch 5 is connected to the personal computer 1 via the D/A converter 51 and the low-pass filter 52 serving as a smoothing filter.

Measurement is carried out in the measurement room 6 with a dummy head 7 that imitates the shape of the human head, torso, and pinna. The microphones 2 are set at the ear canal entrances of both ears of the dummy head 7, and the speakers 3 are arranged at equal angular intervals on an arc in front of the dummy head 7 and centered on it.
The selector 31 is switched to change which speaker 3 outputs the measurement sound, the measurement sound is picked up by the microphone 2 inserted in the ear of the dummy head 7, and impulse responses are measured in sequence at a predetermined measurement angle interval.
When the measurement angle interval is to be narrower than the actual spacing of the speakers 3, for example 1 degree, the intermediate impulse responses are calculated by linear interpolation that takes the arrival time difference into account.
The speakers 3 may instead be mounted at equal angular intervals on a circular frame.
In that case, the frame can be rotated horizontally to make the measurement angle interval narrower than the actual spacing of the speakers 3, for example 1 degree.
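The specification does not detail the linear interpolation that takes the arrival time difference into account. One plausible reading, offered here only as a sketch, is to time-align two neighbouring impulse responses, interpolate their samples linearly, and then restore a linearly interpolated arrival time. Using the largest-magnitude sample as the arrival time, rounding the delay to whole samples, and all function names are assumptions of this example.

```python
def onset(h):
    """Index of the largest-magnitude sample, used here as the arrival time."""
    return max(range(len(h)), key=lambda i: abs(h[i]))


def shift(h, k):
    """Delay h by k samples (k may be negative), zero-filled, same length."""
    n = len(h)
    out = [0.0] * n
    for i, v in enumerate(h):
        j = i + k
        if 0 <= j < n:
            out[j] = v
    return out


def interpolate_ir(h0, h1, frac):
    """Impulse response between h0 (frac = 0) and h1 (frac = 1)."""
    d0, d1 = onset(h0), onset(h1)
    aligned1 = shift(h1, d0 - d1)  # move h1 so its arrival coincides with h0's
    mixed = [(1 - frac) * a + frac * b for a, b in zip(h0, aligned1)]
    delay = round((1 - frac) * d0 + frac * d1)  # interpolated arrival time
    return shift(mixed, delay - d0)


# Toy responses measured at two neighbouring speaker angles (made-up values).
h_near = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
h_far = [0.0, 0.0, 0.0, 0.0, 0.8, 0.4, 0.0, 0.0]
h_mid = interpolate_ir(h_near, h_far, 0.5)
```

Interpolating the raw samples without alignment would smear the two arrivals into a double peak, which is presumably why the arrival time difference is handled separately.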

Measurement is performed for both ears of the dummy head 7. As shown in FIG. 2, the following are each measured: the impulse response h1L(t) between the left speaker 3L and the left ear, the impulse response h2L(t) between the left speaker 3L and the right ear, the impulse response h1R(t) between the right speaker 3R and the right ear, and the impulse response h2R(t) between the right speaker 3R and the left ear.

FIG. 3 is a block diagram of the measurement system processed in the personal computer 1.
The measurement system consists of the signal generation unit 11, the impulse response calculation unit 12, and the memory storage unit 13. It measures the impulse response between the speaker 3 and the microphone 2 in the measurement room 6 and obtains the transfer function of the path from the sound source to the listening point.

The signal generation unit 11 generates an input signal x(t) for impulse response measurement, such as an M-sequence signal (Maximum Length Sequence) or a time-stretched pulse (Time Stretched Pulse), and outputs it to the speaker 3.
The input signal x(t) is emitted as sound by the speaker 3 and picked up by the microphone 2 inserted in the ear of the dummy head 7.
The sound picked up by the microphone 2 is converted into a digital signal and fed to the impulse response calculation unit 12, together with the input signal x(t), as the output signal y(t) obtained when x(t) is input to a linear system with impulse response h(t).
From the Fourier transform X(ω) of the input signal x(t) and the Fourier transform Y(ω) of the output signal y(t), the impulse response calculation unit 12 calculates the transfer function H(ω) = Y(ω) / X(ω), which is the Fourier transform of the impulse response h(t).
Measurement is repeated while the position of the speaker 3 is changed: impulse responses h(t) for different sound source positions are acquired in sequence at the predetermined measurement angle interval, and the transfer functions H(ω) for those positions are calculated from them in sequence.
The memory storage unit 13 stores in memory, in sequence, the transfer functions H(ω) for the different sound source positions calculated by the impulse response calculation unit 12.
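The calculation H(ω) = Y(ω) / X(ω) can be illustrated with a small, self-contained sketch. It assumes a periodic excitation, so that the measured output equals the circular convolution of x(t) with h(t), and it uses a naive discrete Fourier transform for brevity where a real system would use an FFT. The length-7 ±1 sequence, the toy impulse response, and all function names are assumptions of this example.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (O(n^2), adequate for a toy example)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def circular_convolve(x, h):
    """Output of a linear system driven by a periodic input (one period)."""
    n = len(x)
    return [sum(x[(t - m) % n] * h[m] for m in range(n)) for t in range(n)]


def transfer_function(x, y):
    """H(w) = Y(w) / X(w); X must have no zero bins."""
    return [yk / xk for xk, yk in zip(dft(x), dft(y))]


# Length-7 maximum length sequence mapped to +1 / -1: every DFT bin is non-zero.
x = [1, 1, 1, -1, 1, -1, -1]
h_true = [0.0, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0]  # made-up impulse response
y = circular_convolve(x, h_true)               # simulated microphone signal

H = transfer_function(x, y)
h_est = [v.real for v in idft(H)]              # recovers h_true up to rounding
```

An M-sequence is convenient for this division because its spectrum is flat apart from the DC bin, so no bin of X(ω) is close to zero.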

As shown in FIGS. 2 and 4, the sound source signal SL(t) is convolved with the impulse responses h1L(t) and h2L(t) of the transfer functions from the position of the real sound source 3L to both ears of the listener, and the sound source signal SR(t) is convolved with the impulse responses h1R(t) and h2R(t) of the transfer functions from the position of the real sound source 3R to both ears of the listener. Adding these signals for each ear yields the two-channel virtual sound sources SiL(t) and SiR(t). When these are presented through the headphones 4, the listener perceives the synthesized stereo sound image Si(t).
The presentation angle φ of the synthesized stereo sound image can be controlled by adding a level difference and a time difference to the sound source signals SL(t) and SR(t).
As shown in FIG. 4, the sound image swing method makes the virtual sound sources SiL(t) and SiR(t) reciprocate, with a constant switching time T, between positions A and B separated by the swing angle θ. The synthesized stereo sound image Si(t) is thereby displaced in the left-right direction and localized outside the head of the listener 8.
In FIG. 4, the central angle θ of the arc AB is the swing angle and is set in the range of 3 to 10 degrees.
The distance from the center of the head of the listener 8 to the stereo real sound source positions L and R is set to about 1.5 m, and the switching time T is set to 200 ms or more.
The sound image presentation angle φ is set by adding a time difference and a level difference to the sound source signal s(t) of each channel.
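How a given presentation angle φ maps to a particular level difference and time difference is not stated here. The sketch below therefore shows only the mechanics of applying a gain in decibels and a whole-sample delay to one channel. The function name, the values of -6 dB and 3 samples, and the fixed output length are assumptions of this example.

```python
def offset_channel(signal, gain_db, delay_samples, total_len):
    """Scale a channel by gain_db and delay it by delay_samples, padded to total_len."""
    gain = 10 ** (gain_db / 20)  # decibels to linear amplitude
    out = [0.0] * delay_samples + [gain * v for v in signal]
    return (out + [0.0] * total_len)[:total_len]


s = [1.0, 0.5, -0.5]                     # made-up source samples
left = offset_channel(s, 0.0, 0, 6)      # reference channel, unchanged
right = offset_channel(s, -6.0, 3, 6)    # quieter and later than the left channel
```

With these values the right channel is attenuated and delayed relative to the left, which shifts the perceived image toward the left.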

There are two ways of presenting a swing sound image. In the companding method, shown in FIG. 5, the virtual sound sources SiL(t) and SiR(t) set at the stereo real sound source positions L and R reciprocate between the moving sound source positions A and B in opposite directions. In the twist method, shown in FIG. 6, the virtual sound sources SiL(t) and SiR(t) reciprocate between the moving sound source positions A and B in the same direction.
The companding method expands and contracts, to the left and right, the position of the sound image presented by the virtual sound sources SiL(t) and SiR(t) set at the left and right stereo real sound source positions L and R, and thereby localizes the sound image outside the head in front of the listener 8.
The twist method swings, to the left and right, the position of the sound image presented by the virtual sound sources SiL(t) and SiR(t) set at the left and right stereo real sound source positions L and R, and thereby localizes the sound image outside the head in front of the listener 8.

FIG. 7 is a block diagram of the reproduction system processed in the personal computer 1.
The reproduction system consists of the first sound image generation unit 14, the second sound image generation unit 15, the third sound image generation unit 16, the fourth sound image generation unit 17, the first sound image synthesis unit 18, and the second sound image synthesis unit 19. It takes the left and right stereo signals as input, generates swing sound images for both ears, and outputs them to the left and right channels of the headphones 4.
A swing sound image is obtained, for each of the left and right channels, by extracting the transfer functions Ha(ω) and Hb(ω) of the moving sound source positions A and B from the transfer functions H(ω) for different sound source positions stored by the memory storage unit 13, and alternately complex-multiplying them with the sound source signal.

The first sound image generation unit 14 convolves the left stereo signal with the impulse response h1L(t) to generate the swing sound image s1L(t) for the left ear.
The second sound image generation unit 15 convolves the left stereo signal with the impulse response h2L(t) to generate the swing sound image s2L(t) for the right ear.
The third sound image generation unit 16 convolves the right stereo signal with the impulse response h2R(t) to generate the swing sound image s2R(t) for the left ear.
The fourth sound image generation unit 17 convolves the right stereo signal with the impulse response h1R(t) to generate the swing sound image s1R(t) for the right ear.
The first sound image synthesis unit 18 adds the swing sound images s1L(t) and s2R(t) to generate the left channel output signal of the headphones 4.
The second sound image synthesis unit 19 adds the swing sound images s1R(t) and s2L(t) to generate the right channel output signal of the headphones 4.
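The signal routing of units 14 to 19 can be sketched with plain time-domain convolution. For brevity this example uses fixed impulse responses and leaves out the alternation between positions A and B. The function names and the toy signals are assumptions of this example, and the two input signals and the four impulse responses are assumed to have matching lengths.

```python
def convolve(x, h):
    """Direct linear convolution of two sequences."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            out[i + j] += xv * hv
    return out


def binaural_mix(s_left, s_right, h1L, h2L, h1R, h2R):
    """Combine the four speaker-to-ear paths into left and right headphone signals."""
    s1L = convolve(s_left, h1L)    # unit 14: left signal  -> left ear
    s2L = convolve(s_left, h2L)    # unit 15: left signal  -> right ear
    s2R = convolve(s_right, h2R)   # unit 16: right signal -> left ear
    s1R = convolve(s_right, h1R)   # unit 17: right signal -> right ear
    out_left = [a + b for a, b in zip(s1L, s2R)]    # unit 18
    out_right = [a + b for a, b in zip(s1R, s2L)]   # unit 19
    return out_left, out_right


# Made-up two-sample signals and impulse responses.
out_left, out_right = binaural_mix(
    [1.0, 0.0], [0.0, 1.0],
    h1L=[1.0, 0.0], h2L=[0.0, 0.5],
    h1R=[0.8, 0.0], h2R=[0.0, 0.4],
)
```

Each ear receives one direct path and one crossed path, which is why the left output combines s1L(t) with s2R(t) rather than with s2L(t).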

FIG. 8 shows the processing flow of a sound image generation unit.
First, the sound source signal s(t) is multiplied by the waveform cut-out function w₁(t), under which adjacent frames overlap, to cut out frames successively, dividing s(t) into a plurality of frame signals sn(t) = s(t) · w₁(t) (step 101).
This divides the sound source signal into segments of about the same length as the impulse response and improves the efficiency of the convolution.
Next, the Fourier transform Sn(ω) = F{sn(t)} of the frame signal sn(t) is obtained by fast Fourier transform (FFT) (step 102).
Next, the frequency-domain frame signal Sn(ω) is complex-multiplied alternately by the transfer functions Ha(ω) and Hb(ω) of the moving sound source positions A and B to generate the A-position frame signal Sa(ω) = Sn(ω) · Ha(ω) and the B-position frame signal Sb(ω) = Sn(ω) · Hb(ω) (step 103).
The transfer functions measured and calculated at the different sound source positions A and B are thereby convolved with the frames, so that each frame signal becomes a sound source signal containing spatial position information.
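Steps 101 to 103 can be sketched as follows. Alternating the transfer function on every frame is the simplest reading of the text; how the frame length relates to the switching time T is not spelled out, so that mapping is left open. The naive DFT stands in for the FFT, and the function names, the rectangular w₁, and the constant toy transfer functions are assumptions of this example. Multiplying equal-length spectra corresponds to circular convolution, so a practical implementation would presumably zero-pad the frames, which the text does not discuss.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform, standing in for the FFT of step 102."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def cut_frames(s, w1, hop):
    """Step 101: overlapping frames sn(t) = s(t) * w1(t), advanced by hop samples."""
    size = len(w1)
    return [[s[start + i] * w1[i] for i in range(size)]
            for start in range(0, len(s) - size + 1, hop)]


def position_frames(frames, Ha, Hb):
    """Steps 102-103: transform each frame, then multiply by Ha and Hb alternately."""
    out = []
    for n, frame in enumerate(frames):
        Sn = dft(frame)
        H = Ha if n % 2 == 0 else Hb   # even frames: position A, odd frames: position B
        out.append([sk * hk for sk, hk in zip(Sn, H)])
    return out


s = [float(i) for i in range(8)]          # made-up source signal 0, 1, ..., 7
frames = cut_frames(s, [1.0, 1.0, 1.0, 1.0], hop=2)
Ha = [1.0] * 4                            # toy transfer function: unity gain
Hb = [2.0] * 4                            # toy transfer function: gain of two
spectra = position_frames(frames, Ha, Hb)
```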

In the companding method, the order in which the transfer functions Ha(ω) and Hb(ω) are complex-multiplied with the frame signal Sn(ω) of each stereo channel is reversed between the left and right channels.
In the twist method, the order in which the transfer functions Ha(ω) and Hb(ω) are complex-multiplied with the frame signal Sn(ω) of each stereo channel is the same for the left and right channels.
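The difference between the two methods comes down to which transfer function each channel uses on each frame. A minimal sketch follows; the labels, the function name, and the choice that the left channel starts with Ha are assumptions of this example.

```python
def transfer_sequence(num_frames, method, channel):
    """Return 'a' or 'b' for each frame of one channel ('L' or 'R')."""
    if method not in ("companding", "twist"):
        raise ValueError("method must be 'companding' or 'twist'")
    if channel not in ("L", "R"):
        raise ValueError("channel must be 'L' or 'R'")
    seq = ["a" if n % 2 == 0 else "b" for n in range(num_frames)]
    if method == "companding" and channel == "R":
        # Reversed order on the right channel: the two images move in opposite directions.
        seq = ["b" if label == "a" else "a" for label in seq]
    return seq


twist_left = transfer_sequence(4, "twist", "L")
twist_right = transfer_sequence(4, "twist", "R")
companding_left = transfer_sequence(4, "companding", "L")
companding_right = transfer_sequence(4, "companding", "R")
```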

Next, the inverse Fourier transforms sa(t) = F⁻¹{Sa(ω)} and sb(t) = F⁻¹{Sb(ω)} of the A-position frame signal Sa(ω) and the B-position frame signal Sb(ω) are obtained by inverse fast Fourier transform (IFFT), returning the sound source signal to the time domain (step 104).
Next, the A-position frame signal sa(t) and the B-position frame signal sb(t) are multiplied by the waveform synthesis function w₂(t) so that the waveforms of adjacent frames can be combined, generating the waveform-synthesized A-position frame signal sa′(t) = sa(t) · w₂(t) and the waveform-synthesized B-position frame signal sb′(t) = sb(t) · w₂(t). This adjusts the amplitude in the overlap interval, suppresses amplitude fluctuation at overlap-addition, and smooths the joins between frames. The signals sa′(t) and sb′(t) are then alternately overlap-added and joined to generate the synthesized signal s′(t) = sa′(t₁) + sb′(t₂) + ... (step 105).
Frame signals containing the position information of the different sound source positions A and B are thereby connected alternately, and the synthesized signal becomes a sound source signal containing spatial movement information.
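Steps 104 and 105 can be sketched as an inverse transform followed by windowed overlap-addition. The naive inverse DFT stands in for the IFFT, and the function names, the toy window, and the toy spectra are assumptions of this example.

```python
import cmath


def idft_real(spectrum):
    """Step 104: naive inverse DFT, keeping the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def synthesize(spectra, w2, hop):
    """Step 105: window each frame with w2 and overlap-add at intervals of hop samples."""
    size = len(w2)
    out = [0.0] * (hop * (len(spectra) - 1) + size)
    for n, spectrum in enumerate(spectra):
        frame = idft_real(spectrum)
        for i in range(size):
            out[n * hop + i] += frame[i] * w2[i]
    return out


# Spectra of a constant frame of ones (position A) and of twos (position B).
spectra = [[4, 0, 0, 0], [8, 0, 0, 0]]
w2 = [0.25, 0.75, 0.75, 0.25]   # made-up synthesis window
s_out = synthesize(spectra, w2, hop=2)
```

In the overlap region the output moves gradually from the level of the first frame to that of the second, which is the smoothing effect the text describes.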

Either a fade-in/fade-out function or a modified Hamming window can be used as the waveform cut-out function w₁(t) and the waveform synthesis function w₂(t).
If the cut-out length L is a multiple of 4 · M, where M is the frame shift, the waveform can be cut out and synthesized smoothly with a modified Hamming window. The modified Hamming window smooths the signal waveform by reducing both amplitudes simultaneously so that the power sum in the overlap interval is constant, so that the sound image moves smoothly.
The waveform can also be synthesized smoothly with a fade-in/fade-out function. As shown in FIG. 9, when the frame signal is switched from a to b, or from b to a, the overlap interval of signals a and b is treated as a crossfade region. Within it, the signal a being faded out is multiplied by a fade-out function wa(t) that falls linearly, and the signal b being faded in is multiplied by a fade-in function wb(t) that rises linearly.
One amplitude thus decreases monotonically and the other increases monotonically so that the power sum in the overlap interval is constant, which smooths the signal waveform and lets the sound image move smoothly.
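Both window types can be checked numerically. The specification does not give a formula for the modified Hamming window; the sketch below assumes the half-sample-offset form known from the short-time Fourier transform literature, normalized so that the squared windows of all overlapping frames sum to one when L is a multiple of 4 · M. That form, the coefficients 0.54 and -0.46, and the function names are assumptions of this example. For the linear crossfade, note that the two linear gains always sum to one in amplitude.

```python
import math


def linear_crossfade(n):
    """Fade-out wa and fade-in wb over an n-sample overlap (n >= 2)."""
    wa = [1 - i / (n - 1) for i in range(n)]
    wb = [1 - v for v in wa]
    return wa, wb


def modified_hamming(L, M, a=0.54, b=-0.46):
    """Hamming-type window whose squares, shifted by M, sum to one."""
    if L % (4 * M) != 0:
        raise ValueError("L must be a multiple of 4 * M")
    norm = math.sqrt((L / M) * (a * a + b * b / 2))
    return [(a + b * math.cos(2 * math.pi * (n + 0.5) / L)) / norm
            for n in range(L)]


def overlapped_square_sum(w, M):
    """Sum of w^2 over all frames covering each sample of one hop period."""
    L = len(w)
    return [sum(w[n + j * M] ** 2 for j in range(L // M)) for n in range(M)]


wa, wb = linear_crossfade(5)
w = modified_hamming(16, 4)
sums = overlapped_square_sum(w, 4)   # each entry is 1 up to rounding
```

Squares are summed because the same window is applied twice, once as w₁ when cutting out and once as w₂ when synthesizing, so the effective weight of each frame is the window squared.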

Examples of the present invention (evaluation results) are described below.
FIG. 10 shows the relationship between the sound image presentation angle φ and the front-back misjudgment rate.
FIG. 10 shows the evaluation results obtained when the out-of-head sound image localization apparatus of the present invention was applied to subjects whose localization accuracy deteriorates with a general-purpose transfer function. The horizontal axis is the sound image presentation angle φ, from 0 to 20 degrees to the left and right, and the vertical axis is the front-back misjudgment rate of perceived sound image localization.
When the sound image presentation angle is 0 degrees, the front-back misjudgment rate, which was 60% with the conventional technique, falls to 10 to 30% with the companding method and the twist method of the present invention.
When the sound image presentation angle is 10 degrees, the front-back misjudgment rate, which was 25% with the conventional technique, falls to 10 to 20% with the companding method and the twist method of the present invention.
These values are averages over all of the switching times T and swing angles θ described below and over all subjects who do not match the transfer function.
The front-back misjudgment rate is thus worst at a sound image presentation angle of 0 degrees and decreases as the presentation angle moves away from the front. This demonstrates, both theoretically and experimentally, that localization accuracy in the frontal direction is poor.
The following discussion is therefore restricted to frontal localization and determines the optimum swing angle θ and switching time T.

FIG. 11 shows the relationship between the swing angle θ and the front-back misjudgment rate.
FIG. 11 shows the evaluation results for the twist method when the out-of-head sound image localization apparatus of the present invention was applied to subjects who do not match the general-purpose transfer function (Group A) and to subjects who roughly match it (Group B). The sound image presentation angle φ is 0 degrees, the switching time T ranges from 200 ms to 1 s, the horizontal axis is the swing angle θ, and the vertical axis is the front-back misjudgment rate of perceived sound image localization.
The results show that the optimum range of the swing angle θ is 3 to 10 degrees.
The companding method is not described here, but it gave evaluation results similar to those of the twist method.

FIG. 12 shows the relationship between the switching time T and the front-back misjudgment rate.
FIG. 12 shows the evaluation results for the twist method when the out-of-head sound image localization apparatus of the present invention was applied to subjects who do not match the general-purpose transfer function (Group A) and to subjects who roughly match it (Group B). The sound image presentation angle φ is 0 degrees, the swing angle θ is 4 degrees and 8 degrees, the horizontal axis is the switching time T, and the vertical axis is the front-back misjudgment rate of perceived sound image localization.
The results show that the optimum range of the switching time T is 200 ms or more.
Likewise, the companding method is not described here, but it gave evaluation results similar to those of the twist method.

FIG. 1 is a configuration diagram of an out-of-head sound image localization apparatus embodying the present invention.
FIG. 2 is a conceptual diagram of the impulse response measurement method.
FIG. 3 is a block diagram of the measurement system processed in the personal computer 1.
FIG. 4 is a conceptual diagram of the sound image presentation method of the out-of-head sound image localization apparatus embodying the present invention.
FIG. 5 is a conceptual diagram of the sound image presentation method using the companding method.
FIG. 6 is a conceptual diagram of the sound image presentation method using the twist method.
FIG. 7 is a block diagram of the reproduction system processed in the personal computer 1.
FIG. 8 is the processing flow of a sound image generation unit.
FIG. 9 is a conceptual diagram of fade-in/fade-out processing.
FIG. 10 is a graph showing the relationship between the sound image presentation angle and the front-back misjudgment rate.
FIG. 11 is a graph showing the relationship between the swing angle θ and the front-back misjudgment rate.
FIG. 12 is a graph showing the relationship between the switching time T and the front-back misjudgment rate.
FIG. 13 is a conceptual diagram of the method of measuring the out-of-head sound image localization transfer function.

Explanation of symbols

1 Personal computer
11 Signal generation unit
12 Impulse response calculation unit
13 Memory storage unit
14 First sound image generation unit
15 Second sound image generation unit
16 Third sound image generation unit
17 Fourth sound image generation unit
18 First sound image synthesis unit
19 Second sound image synthesis unit
2 Microphone
21 A/D converter
22 Low-pass filter
23 Amplifier
3 Speaker
31 Selector
32 Amplifier
4 Headphone
41 Amplifier
5 Switch
51 D/A converter
52 Low-pass filter
6 Measurement room
7 Dummy head
8 Listener

Claims

An out-of-head sound image localization apparatus in which moving sound source positions A and B having an angle difference θ (swing angle) are set to the left and right of front two-channel stereo real sound source positions L and R having azimuth angles ±Φ, and transfer functions Ha(ω) and Hb(ω) of the paths to the listening points at both ears are obtained for each moving sound source position, the apparatus comprising, for the sound source signal s(t) of each channel of stereo headphones:
cutting-out means for multiplying the sound source signal s(t) by a waveform cut-out function w₁(t), under which adjacent frames overlap, to cut out frames successively and divide the sound source signal s(t) into a plurality of frame signals sn(t);
convolution means for alternately convolving the transfer functions Ha(ω) and Hb(ω) with the frame signals sn(t) to generate frame signals sa(t) and sb(t) containing position information of the moving sound source positions A and B; and
adding means for alternately overlap-adding frame signals sa′(t) and sb′(t), obtained by multiplying the frame signals sa(t) and sb(t) by a waveform synthesis function w₂, to smooth waveform discontinuities and to generate a synthesized signal s′(t) containing movement information in which the sound sources placed at the stereo real sound source positions L and R reciprocate between the moving sound source positions A and B with a constant period T (switching time),
whereby a swing sound image having a sound image presentation angle φ, a swing angle θ, and a switching time T is presented to both ears of the listener to localize the sound image outside the head.

The out-of-head sound image localization apparatus according to claim 1, wherein the swing sound image is expanded and contracted to the left and right by reversing, between the left and right channels, the order of the transfer functions Ha(ω) and Hb(ω) convolved with the frame signal sn(t).

The out-of-head sound image localization apparatus, wherein the swing sound image is swung to the left and right by making the order of the transfer functions Ha(ω) and Hb(ω) convolved with the frame signal sn(t) the same in the left and right channels.

The out-of-head sound image localization apparatus according to claim 1, wherein the waveform cut-out function w₁(t) and the waveform synthesis function w₂(t) are each either a fade-in/fade-out function or a modified Hamming window.

The out-of-head sound image localization apparatus according to claim 1, wherein the swing angle θ is 3 to 10 degrees.

The out-of-head sound image localization apparatus according to claim 1, wherein the switching time T is 200 milliseconds or more.