JP2003061200A

JP2003061200A - Sound processing apparatus and sound processing method, and control program

Info

Publication number: JP2003061200A
Application number: JP2001248209A
Authority: JP
Inventors: Kazufumi Yoshida; 和史吉田; Kohei Asada; 宏平浅田
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2001-08-17
Filing date: 2001-08-17
Publication date: 2003-02-28

Abstract

PROBLEM TO BE SOLVED: To use an impulse response measured in a real space so as to calculate a sound waveform in more detail with less computation than a conventional method. SOLUTION: The sound processing apparatus includes a space configuration data storage unit 12 that stores data on the configuration of a virtual space, and an impulse response fragment data storage unit 14 that stores fragment data of an impulse response of a sound wave actually measured in a real space equivalent to the virtual space. An amplitude/delay calculation unit 13 calculates the amplitude and delay of the direct response and of each reflection response on the basis of the space configuration data. An impulse response synthesis unit 15 generates a composite impulse response on the basis of the amplitude delay data produced by the amplitude/delay calculation unit 13, and a convolution unit 16 convolves the composite impulse response with the sound source audio data.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a sound processing apparatus, a sound processing method, and a control program. In particular, it relates to a sound processing apparatus and a sound processing method capable of expressing a realistic virtual space by using amplitude delay data obtained by simulation together with impulse response fragment data obtained by actually measuring a real space, and to a control program that executes sound processing based on the sound processing method.

[0002]

[Prior Art] For an acoustic space with a particular reverberation, realizing the transfer function of the impulse response between two points by actually modeling the space is a method commonly used, especially in the field of architectural acoustics.

[0003] The impulse response is the acoustic transfer characteristic from a sound source to a sound receiving point. If the acoustic space contains reflecting objects such as walls, it includes the transfer characteristic of the reflected sound in addition to the transfer response of the direct sound. If there is an obstacle in the propagation path, the sound wave is diffracted according to the size of the object, and its frequency response is also affected. In particular, if the sound receiving point is the position of the listener's ear, a head-related transfer function (HRTF) is applied.

[0004] For example, software is already known that, when a concert hall, studio, or the like is to be built, uses information such as the interior shape of the building and the wall materials to simulate, at the design stage, the acoustic effect of the completed building. Such software holds, for example, a database of impulse responses that were actually recorded in an anechoic chamber and prepared for various situations, and convolves the impulse response appropriate to the situation with the sound source audio data. In this way it can simulate how a sound emitted at a specific position in the designed space is heard at an arbitrary location.
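The convolution step described above can be sketched as follows. This is a minimal illustration, not the method of any particular simulation package: the function name `convolve` and the sample values are assumptions made for the example, and a direct-form sum is used for clarity rather than speed.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


# Hypothetical data: a short dry source signal and an impulse response
# made of a direct sound at sample 0 and one reflection at sample 3.
dry = [1.0, 0.5, 0.25]
impulse_response = [1.0, 0.0, 0.0, 0.6]

wet = convolve(dry, impulse_response)
print(wet)
```

The output contains the dry signal followed by a delayed copy scaled by 0.6, which is what a single reflection contributes at the listening position.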

[0005]

[Problems to Be Solved by the Invention] In general, however, such acoustic simulation software performs rigorous calculations using the "sound ray method", the "virtual image method", and the like, and it is therefore difficult for it to generate an impulse response in real time.

[0006] On the other hand, there are many "digital reverb" systems, that is, software that can simulate an acoustic space in real time by greatly simplifying the model. Some of them use information on wall materials and obstacle shapes to calculate the absorption and reflection of sound waves, but none of them is based on sampling real sound, so the result lacks realism, spatial depth, and presence.

[0007] The present invention has been proposed in view of these circumstances. Its object is to provide a sound processing apparatus and a sound processing method that, by making effective use of fragment data of impulse responses obtained by actually measuring a real space, can compute a more detailed model with less computation than before and can synthesize an impulse response that expresses a realistic virtual space, as well as a control program that causes the sound processing method to be executed.

[0008]

[Means for Solving the Problems] To achieve the above object, a sound processing apparatus according to the present invention generates sound according to the positional relationship, in a virtual space, between a sound source in the virtual space and a sound receiving point that receives sound from the virtual sound source. The apparatus comprises: space configuration data storage means that stores space configuration data relating to the elements constituting the virtual space; amplitude delay calculation means that calculates, on the basis of the space configuration data, amplitude delay data consisting of at least the amplitude and delay of the direct sound response and of the main reflected sound responses of the sound wave propagating from the virtual sound source to the sound receiving point; impulse response fragment data storage means that stores impulse response fragment data obtained by extracting predetermined sections from the impulse response of a sound wave propagating from the virtual sound source to the sound receiving point, measured in a real space corresponding to the virtual space; impulse response synthesis means that generates a composite impulse response on the basis of the amplitude delay data and the impulse response fragment data; and convolution means that convolves the composite impulse response synthesized by the impulse response synthesis means with the audio data of the sound source.

[0009] In such a sound processing apparatus, the amplitude delay calculation means calculates, on the basis of the space configuration data, the amplitude delay data consisting of at least the amplitude and delay of the direct sound response and of the main reflected sound responses of the sound wave propagating from the virtual sound source to the sound receiving point. The impulse response synthesis means generates a composite impulse response on the basis of the amplitude delay data and the impulse response fragment data, and the convolution means convolves the composite impulse response synthesized by the impulse response synthesis means with the audio data of the sound source.
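One way to picture the synthesis step is to place each measured fragment at its calculated delay and scale it by its calculated amplitude. The sketch below assumes exactly that; the function name `synthesize_impulse_response`, the `(kind, amplitude, delay)` tuple layout, and all sample values are hypothetical and are not taken from the patent text.

```python
def synthesize_impulse_response(paths, fragments, length):
    """Build a composite impulse response.

    paths:     list of (kind, amplitude, delay_in_samples) tuples
               (stand-in for the amplitude delay data).
    fragments: dict mapping kind -> list of measured fragment samples
               (stand-in for the impulse response fragment data).
    length:    number of samples in the composite response.
    """
    composite = [0.0] * length
    for kind, amplitude, delay in paths:
        for i, sample in enumerate(fragments[kind]):
            if delay + i < length:  # drop anything past the end
                composite[delay + i] += amplitude * sample
    return composite


# Hypothetical fragments: one for the direct sound, one for reflections.
fragments = {
    "direct": [1.0, 0.5],
    "reflection": [0.8, -0.25, 0.125],
}
# Hypothetical amplitude delay data: direct sound at sample 0,
# one reflection at sample 3 with half amplitude.
paths = [("direct", 1.0, 0), ("reflection", 0.5, 3)]

composite = synthesize_impulse_response(paths, fragments, 8)
print(composite)
```

The composite response would then be convolved with the sound source audio data. Only the short table of amplitudes and delays has to be recomputed when the geometry changes, while the fragments themselves stay fixed.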

[0010] In such a sound processing apparatus, the impulse response fragment data is obtained by extracting predetermined sections of the impulse response measured in the real space corresponding to the virtual space. The reflected sound responses include at least the response of a first-order reflection from, or diffraction around, a reflecting object or obstacle based on the spatial configuration of the virtual space.

[0011] A sound processing apparatus according to the present invention also generates sound according to the positional relationship, in a virtual space, between a sound source in the virtual space and a sound receiving point that receives sound from the virtual sound source, and comprises: audio data storage means that stores superimposed audio data in which impulse response fragment data, obtained by extracting predetermined sections from the impulse response of a sound wave propagating from the virtual sound source to the sound receiving point and measured in a real space corresponding to the virtual space, has been superimposed in advance on the audio data of the sound source; space configuration data storage means that stores space configuration data relating to the elements constituting the virtual space; amplitude delay calculation means that calculates, on the basis of the space configuration data, amplitude delay data consisting of at least the amplitude and delay of the direct sound response and of the main reflected sound responses of the sound wave propagating from the virtual sound source to the sound receiving point; and synthesis means that synthesizes the superimposed audio data on the basis of the amplitude delay data.

[0012] In such a sound processing apparatus, the amplitude delay calculation means calculates, on the basis of the space configuration data, the amplitude delay data consisting of at least the amplitude and delay of the direct sound response and of the main reflection responses of the sound wave propagating from the virtual sound source to the sound receiving point. The synthesis means then synthesizes, on the basis of the amplitude delay data, the superimposed audio data in which impulse response fragment data representing each of the direct sound response and the reflected sound responses, extracted from the impulse response of the sound wave propagating from the virtual sound source to the sound receiving point, has been superimposed in advance on the audio data of the sound source.
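The second configuration can be sketched as a delay-scale-and-sum over audio that already carries each fragment. The sketch assumes that "superimposed in advance" means convolved in advance, which is an interpretation rather than something the text states outright; `convolve`, `mix_preconvolved`, and all sample values are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def mix_preconvolved(paths, preconvolved, length):
    """Sum pre-convolved audio, each copy scaled and delayed per path."""
    out = [0.0] * length
    for kind, amplitude, delay in paths:
        for i, sample in enumerate(preconvolved[kind]):
            if delay + i < length:
                out[delay + i] += amplitude * sample
    return out


# Offline step (assumed): convolve the source with each fragment once.
source = [1.0, -1.0, 0.5]
fragments = {"direct": [1.0, 0.5], "reflection": [0.5, 0.25]}
preconvolved = {kind: convolve(source, frag) for kind, frag in fragments.items()}

# Run-time step: only scaling, delaying, and adding remain.
paths = [("direct", 1.0, 0), ("reflection", 0.5, 4)]
output = mix_preconvolved(paths, preconvolved, 10)
print(output)
```

Because convolution is linear, this gives the same samples as building a composite impulse response from the same paths and convolving it with the source, while moving the convolution cost out of the real-time path.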

[0013] A sound processing method according to the present invention generates sound according to the positional relationship, in a virtual space, between a sound source in the virtual space and a sound receiving point that receives sound from the virtual sound source, and comprises: an amplitude delay calculation step of calculating, on the basis of space configuration data relating to the elements constituting the virtual space, amplitude delay data consisting of at least the amplitude and delay of the direct sound response and of the main reflected sound responses of the sound wave propagating from the virtual sound source to the sound receiving point; an impulse response synthesis step of generating a composite impulse response on the basis of the amplitude delay data and impulse response fragment data obtained by extracting predetermined sections from the impulse response of a sound wave propagating from the virtual sound source to the sound receiving point, measured in a real space corresponding to the virtual space; and a convolution step of convolving the composite impulse response synthesized in the impulse response synthesis step with the audio data of the sound source.

[0014] According to such a sound processing method, in the amplitude delay calculation step, the amplitude delay data consisting of at least the amplitude and delay of the direct sound response and of the main reflected sound responses of the sound wave propagating from the virtual sound source to the sound receiving point is calculated on the basis of the space configuration data. In the impulse response synthesis step, the impulse response and the impulse response fragment data are combined to generate a composite impulse response, and in the convolution step, the composite impulse response synthesized in the impulse response synthesis step is convolved with the audio data of the sound source.

[0015] A sound processing method according to the present invention also generates sound according to the positional relationship, in a virtual space, between a sound source in the virtual space and a sound receiving point that receives sound from the virtual sound source, and comprises: an amplitude delay calculation step of calculating, on the basis of space configuration data relating to the elements constituting the virtual space, amplitude delay data consisting of at least the amplitude and delay of the direct sound response and of the main reflected sound responses of the sound wave propagating from the virtual sound source to the sound receiving point; and a synthesis step of synthesizing, on the basis of the amplitude delay data, superimposed audio data in which impulse response fragment data, obtained by extracting predetermined sections from the impulse response of a sound wave propagating from the virtual sound source to the sound receiving point and measured in a real space corresponding to the virtual space, has been superimposed in advance on the audio data of the sound source.

[0016] According to such a sound processing method, in the amplitude delay calculation step, the amplitude delay data consisting of the amplitude and delay of the direct sound response and of the main reflected sound responses of the sound wave propagating from the virtual sound source to the sound receiving point is calculated on the basis of the space configuration data. In the synthesis step, superimposed audio data, in which impulse response fragment data taken from the impulse response of a sound wave propagating from the virtual sound source to the sound receiving point and measured in a real space corresponding to the virtual space has been superimposed in advance on the audio data of the sound source, is synthesized on the basis of the amplitude delay data.

[0017] In this sound processing, the impulse response fragment data is preferably obtained by extracting predetermined sections of the impulse response measured in the real space corresponding to the virtual space. The reflected sound responses include at least the response of a first-order reflection from, or diffraction around, a reflecting object or obstacle based on the spatial configuration of the virtual space.

[0018] A control program according to the present invention is a control program for a computer-controllable sound processing apparatus that generates sound according to the positional relationship, in a virtual space, between a sound source in the virtual space and a sound receiving point that receives sound from the virtual sound source. The program causes the sound processing apparatus to execute: amplitude delay calculation processing that calculates, on the basis of space configuration data relating to the elements constituting the virtual space, amplitude delay data consisting of at least the amplitude and delay of the direct sound response and of the main reflected sound responses of the sound wave propagating from the virtual sound source to the sound receiving point; impulse fragment synthesis processing that generates a composite impulse response on the basis of the amplitude delay data and impulse response fragment data obtained by extracting predetermined sections from the impulse response of a sound wave propagating from the virtual sound source to the sound receiving point, measured in a real space corresponding to the virtual space; and convolution processing that convolves the composite impulse response synthesized in the impulse fragment synthesis processing with the audio data of the sound source.

[0019] A control program according to the present invention is also a control program for a computer-controllable sound processing apparatus that generates sound according to the positional relationship, in a virtual space, between a sound source in the virtual space and a sound receiving point that receives sound from the virtual sound source. The program causes the sound processing apparatus to execute: amplitude delay calculation processing that calculates, on the basis of space configuration data relating to the elements constituting the virtual space, amplitude delay data consisting of at least the amplitude and delay of the direct sound response and of the main reflected sound responses of the sound wave propagating from the virtual sound source to the sound receiving point; and synthesis processing that synthesizes, on the basis of the amplitude delay data, superimposed audio data in which impulse response fragment data, obtained by extracting predetermined sections from the impulse response of a sound wave propagating from the virtual sound source to the sound receiving point and measured in a real space corresponding to the virtual space, has been superimposed in advance on the audio data of the sound source.

[0020]

[Embodiments of the Invention] The sound processing apparatus shown as a specific example of the present invention uses a multi-channel audio reproduction technique, for example the one described in Japanese Patent Application No. 2000-354530, to generate sound according to the positional relationship, in a virtual space, between a sound source in the virtual space and a sound receiving point that receives sound from the virtual sound source.

[0021] That is, the transfer characteristic (impulse response) from the virtual sound source is obtained for each of a plurality of sound receiving points arranged so as to surround the listener. The same number of sound reproducing means (speakers) are placed at positions corresponding to these sound receiving points, and the source audio convolved with the respective impulse responses is reproduced from these speakers. The listener then hears the sound as if the sound image were localized at the position of the virtual sound source.

[0022] In particular, this sound processing apparatus includes impulse response fragment data storage means that stores impulse response fragment data obtained by extracting predetermined sections of impulse responses measured in advance in a real space corresponding to the virtual space. An amplitude/delay calculation unit calculates, on the basis of the space configuration data, amplitude delay data consisting of the amplitude and delay of the direct sound response and of the main reflected sound responses of the sound wave propagating from the virtual sound source to the sound receiving point. An impulse response synthesis unit generates a composite impulse response on the basis of the amplitude delay data and the impulse response fragment data, and a convolution unit convolves the composite impulse response with the sound source audio data. The apparatus thereby computes a more detailed model with less computation than before and realizes sound processing that can express a realistic virtual space.

[0023] This sound processing apparatus can simulate in real time how sound waves from a sound source located somewhere in a virtual space, in particular the direct sound from the source and the sound reflected from walls and floors, are heard elsewhere in the same space. It can therefore be expected to add further presence, particularly in 3D (three-dimensional) games, CG (computer graphics), and the like, when a sound source moves through the virtual space and the way it is heard changes with the change in position relative to the source.

[0024] The sound processing apparatus as a specific example of the present invention presupposes that, for example, a real space corresponding to the virtual space, and a real space prepared for each situation to be considered, are actually set up and that impulse responses are acquired in these real spaces in advance. Ideally, the impulse response at measurement time allows the direct sound and the reflected sound to be separated. The measured impulse responses are separated into direct sound and reflected sound as far as possible, and each fragment is stored in a database.

[0025] Meanwhile, at the space modeling stage, the amplitudes and delays of the direct sound and reflected sounds from the sound source to the sound receiving position (or microphone position) are calculated to create the amplitude delay data used in the simulation. At the positions indicated by the delays, impulse fragments from the database are combined, scaled to match the amplitudes. Including other information in the database (for example, how many times a sound was reflected before reaching the listening point) makes possible sound rendering with enhanced realism and presence.
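As a rough illustration of how an amplitude and a delay might be derived from geometry, the sketch below uses the path length of each propagation route. Every modeling choice here is an assumption made for the example and is not stated in the text: 1/r amplitude falloff, a speed of sound of 343 m/s, a 48 kHz sample rate, a single flat wall on the plane x = `wall_x`, a mirror-image construction for the first-order reflection, and a reflection gain of 0.7.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed
SAMPLE_RATE = 48000     # Hz, assumed


def amplitude_delay(path_length_m, reflection_gain=1.0):
    """Return (amplitude, delay_in_samples) for one propagation path."""
    delay = round(path_length_m / SPEED_OF_SOUND * SAMPLE_RATE)
    amplitude = reflection_gain / max(path_length_m, 1e-6)  # assumed 1/r law
    return amplitude, delay


def first_order_reflection_length(source, receiver, wall_x):
    """Path length via a wall on the plane x = wall_x (mirror-image source)."""
    image = (2.0 * wall_x - source[0], source[1], source[2])
    return math.dist(image, receiver)


# Hypothetical positions in metres.
source = (1.0, 0.0, 0.0)
receiver = (4.0, 4.0, 0.0)

direct = amplitude_delay(math.dist(source, receiver))
reflected = amplitude_delay(
    first_order_reflection_length(source, receiver, wall_x=0.0),
    reflection_gain=0.7,  # assumed wall reflectance
)
print("direct:", direct)
print("reflected:", reflected)
```

The reflected path is longer than the direct path, so it arrives later and weaker; those two numbers per path are what decide where a stored fragment is placed and how strongly it is scaled.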

[0026] Specific examples of the present invention are described below in detail with reference to the drawings. Here, the sound processing apparatus as a specific example of the present invention is considered as applied to, for example, a 3D game in which a sound source and a listener move within a space rendered in CG.

[0027] The sound processing apparatus 1 shown in FIG. 1 includes speakers 2a, 2b, 2c, 2d, 2e, 2f, 2g, and 2h so that sound can be reproduced in a multi-channel format, for example eight channels. The speakers are arranged so as to form a closed curved surface 3, indicated by the dotted line in FIG. 1, and output sound based on audio signals from an audio signal control generation unit 4, which is described in detail later. A listener 100 (hereinafter referred to as the listener, or as the player where appropriate) located inside the region enclosed by the closed curved surface 3 can thereby obtain a realistic acoustic effect corresponding to the positional relationship between the virtual sound source position and each speaker position.

[0028] The sound processing apparatus 1 includes a display 5 as a display unit for showing video, and the display 5 shows video synchronized with the sound. More specifically, the display 5 is, for example, a so-called head-mounted display (HMD), and shows video based on the video signal from a video control generation unit 6. While viewing the 3D space shown on the HMD, the listener 100 can hear sound that corresponds to the virtual space rendered as video.

[0029] The sound processing apparatus 1 further includes an operation unit 7 for changing the positional relationship between the virtual sound source in the virtual space shown on the display 5 and each speaker position, that is, the positional relationship between the listener 100 and the virtual sound source. The operation unit 7 may be controlled by a program, may be operable by a third party other than the listener 100, or may receive operation input through a controller 7a that the listener 100 can use.

[0030] The audio signal control generation unit 4, the video control generation unit 6, and the operation unit 7 are under the overall control of a CPU (not shown).

[0031] In an acoustic system that, like this sound processing apparatus 1, has the audio signal control generation unit 4 and multi-channel output, a sound field can be created in which the listener 100 located inside the closed curved surface 3 perceives a sound source as if it were at some point in the virtual space. Furthermore, by smoothly switching among the multiple impulse responses from the sound source to each speaker position, relative changes in the positional relationship between the sound source and the listener 100, such as when the sound source moves or when the listener 100 moves within the virtual space, can be reflected in the sound.
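The text does not say how impulse responses are switched smoothly. One common approach, shown here purely as an assumption, is to render the same audio block with both the old and the new impulse response and crossfade linearly between the two results. The names `convolve` and `crossfade` and the sample values are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def crossfade(old_block, new_block):
    """Fade linearly from old_block to new_block (needs at least 2 samples)."""
    n = len(old_block)
    return [
        ((n - 1 - i) * a + i * b) / (n - 1)
        for i, (a, b) in enumerate(zip(old_block, new_block))
    ]


# Hypothetical block of source audio and two one-tap impulse responses,
# standing in for "before the move" and "after the move".
block = [1.0] * 5
old_ir = [1.0]
new_ir = [0.5]

old_out = convolve(block, old_ir)
new_out = convolve(block, new_ir)
smoothed = crossfade(old_out, new_out)
print(smoothed)
```

The block starts at the old level and ends at the new one without a step, which avoids the click that an abrupt switch between impulse responses would tend to produce.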

[0032] FIG. 2 shows the sound processing apparatus 1 with the specific configurations of the audio signal control generation unit 4 and the video control generation unit 6 made explicit.

[0033] The audio signal control generation unit 4 includes: an audio data storage unit 11 that stores the audio data of the sound source (hereinafter referred to as sound source audio data), that is, unprocessed original audio data such as voices and sound effects for each scene and situation; a space configuration data storage unit 12 that stores data on the configuration of the virtual space; an amplitude/delay calculation unit 13 that calculates the amplitude attenuation and time delay of the direct sound and reflected sounds on the basis of the space configuration data; an impulse response fragment data storage unit 14 that stores the impulse response of a sound wave actually measured in a real space corresponding to the virtual space, fragmented into a direct sound portion and reflected sound portions; an impulse response synthesis unit 15 that synthesizes a composite impulse response on the basis of the amplitude delay data generated by the amplitude/delay calculation unit 13 and the impulse response fragment data; a convolution unit 16 that convolves the composite impulse response generated by the impulse response synthesis unit 15 with the sound source audio data; and a sound processing unit 20 that performs the audio processing for outputting, from each speaker, the audio data on which the composite impulse response has been superimposed. These are connected by an internal bus 19. The unit also includes a D/A (digital-to-analog) unit 17 and an amplifier 18 for outputting the audio signal from the sound processing unit 20.

【００３４】映像制御生成部６は、図２に示すように、
例えば、仮想空間中の仮想音源となるオブジェクトと仮
想空間内におけるリスナ１００との表示位置関係を制御
するためのオブジェクト表示制御部２１と、画像信号処
理部２２とを備えている。As shown in FIG. 2, the video control generator 6 includes, for example, an object display control unit 21 that controls the displayed positional relationship between an object serving as a virtual sound source in the virtual space and the listener 100 in the virtual space, and an image signal processing unit 22.

【００３５】本具体例では、図３に示すような仮想空間
を想定し、この仮想空間４０における仮想音源４１から
の音声を表現する場合を示す。仮想空間４０には、波線
で示す壁４２が想定されている。そのため、リスナ１０
０が仮想音源４１の位置に音像定位感を得るように各ス
ピーカから出力する音波を生成する際に、本具体例で
は、仮想音源４１からの直接波５１と、仮想音源４１の
壁４２に対する一次反射波５２ａ及び５２ｂと、仮想音
源４１の壁４２に対する二次反射波５３の影響を考慮
し、ここではまず、スピーカ２ｇから出力される音声波
形の生成について説明する。This specific example assumes a virtual space 40 as shown in FIG. 3 and describes how sound from a virtual sound source 41 in this virtual space is rendered. A wall 42, indicated by a wavy line, is assumed in the virtual space 40. Accordingly, when generating the sound waves output from each speaker so that the listener 100 perceives the sound image as localized at the position of the virtual sound source 41, this specific example takes into account the direct wave 51 from the virtual sound source 41, the primary reflected waves 52a and 52b off the wall 42, and the secondary reflected wave 53 off the wall 42. The generation of the voice waveform output from the speaker 2g is described first.

【００３６】具体的に音声信号制御生成部４における空
間構成用データ記憶部１２には、仮想空間４０の空間構
成を表現するデータが記憶されている。本具体例におい
ては、空間構成用データは、仮想空間４０において仮想
音源から受音点に至る音波が影響を受ける構成要素を、
位置、形状寸法等のデータ構造として表したものをい
う。例えば、音波を反射する壁面を数点の３次元座標値
（ポリゴンデータ）で表現するものとする。このデータ
構造には、反射率（吸収率）や透過率等のデータを含め
てもよい。本具体例では、仮想空間を単純化又は簡略化
した要素で構成し、音源からの直接音と主要な反射音の
みが得られるような構成要素だけを考慮している。Specifically, the spatial configuration data storage unit 12 in the voice signal control generator 4 stores data representing the spatial configuration of the virtual space 40. In this specific example, the spatial configuration data is a data structure describing, by position, shape, dimensions, and the like, the elements in the virtual space 40 that affect a sound wave traveling from the virtual sound source to the sound receiving point. For example, a wall surface that reflects sound waves is represented by the three-dimensional coordinates of several points (polygon data). This data structure may also include data such as reflectance (absorption rate) and transmittance. In this specific example, the virtual space is built from simplified elements, and only those elements needed to obtain the direct sound from the sound source and the main reflected sounds are considered.
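
The spatial configuration data described above could be laid out roughly as follows. This is only a sketch: the names `Wall` and `SpaceConfig`, the field names, and the coordinate and reflectance values are illustrative assumptions and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point3 = Tuple[float, float, float]


@dataclass
class Wall:
    """One reflecting surface, stored as a polygon of 3-D vertices."""
    name: str
    vertices: List[Point3]
    reflectance: float = 1.0    # fraction of amplitude kept on reflection (assumed meaning)
    transmittance: float = 0.0  # fraction of amplitude passing through (assumed meaning)


@dataclass
class SpaceConfig:
    """Simplified virtual space: only elements that shape the main sound paths."""
    walls: List[Wall] = field(default_factory=list)


# Hypothetical geometry for the two wall surfaces 42a and 42b.
space = SpaceConfig(walls=[
    Wall("42a", [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (4.0, 0.0, 3.0), (0.0, 0.0, 3.0)], reflectance=0.8),
    Wall("42b", [(4.0, 0.0, 0.0), (4.0, 5.0, 0.0), (4.0, 5.0, 3.0), (4.0, 0.0, 3.0)], reflectance=0.7),
])
```

Keeping reflectance and transmittance on each wall lets the amplitude calculation read them directly when a sound ray hits that wall.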

【００３７】振幅／遅延算出部１３は、空間構成用デー
タに基づいて、仮想音源４１からの伝達応答としてスピ
ーカ２ｇに伝播する音波の直接音応答と直接波に対する
反射波の時間振幅及び遅延の減衰度合いを振幅／遅延モ
デル（以下、振幅遅延データと記す。）として算出す
る。振幅／遅延算出部１３は、仮想音源４１、壁４２、
スピーカ２ｇ間の位置関係を示す空間構成用データに基
づいて、音波を直線的に見立てて、仮想音源４１からス
ピーカ２ｇへの経路、仮想音源４１からの音波が壁４２
によって反射された場合のスピーカ２ｇまでの経路及び
音波の振幅の減衰度合いを算出する。Based on the spatial configuration data, the amplitude/delay calculation unit 13 calculates, as an amplitude/delay model (hereinafter referred to as amplitude delay data), the direct sound response of the sound wave that propagates from the virtual sound source 41 to the speaker 2g, together with the amplitude attenuation and time delay of each reflected wave relative to the direct wave. Using the spatial configuration data, which gives the positional relationship among the virtual sound source 41, the wall 42, and the speaker 2g, it treats sound waves as straight rays and calculates the path from the virtual sound source 41 to the speaker 2g, the path to the speaker 2g when the sound wave is reflected by the wall 42, and the degree of attenuation of the sound wave amplitude.

【００３８】本発明においては、振幅遅延データには、
上述される直接音応答の振幅及び遅延、反射音応答の振
幅及び遅延の各データに加えて、それぞれの直接音、反
射音の経路を表す情報を併せ持つことが望ましい。この
情報には、例えば、壁４２ａに入射角６０°で入射した
ことや、図示しない障害物Ｘによって回折された等の情
報を含めることができ、後述されるインパルス応答断片
データ記憶部１４から適切なインパルス応答断片データ
を選択する際の検索キーとして使用することができる。In the present invention, it is desirable that the amplitude delay data hold, in addition to the amplitude and delay of the direct sound response and of the reflected sound responses described above, information describing the path of each direct and reflected sound. This information can include, for example, the fact that the sound struck the wall 42a at an incident angle of 60°, or that it was diffracted by an obstacle X (not shown), and it can be used as a search key when selecting the appropriate impulse response fragment data from the impulse response fragment data storage unit 14 described later.

【００３９】振幅／遅延算出部１３では、いわゆる「虚
像法」を用いて振幅／遅延を算出している。ここでは簡
単のため、音線が何回反射したかを意味する次数は、二
次程度の低い次数で打ち切るものとする。より高い次数
の計算を行う必要があるかどうかは、ＣＰＵ等のハード
ウェアの負荷や高次の反射波の大きさ等を考えて自由に
選択することができる。例えば、ユーザに提示する画像
処理の負荷が大きければ、より低次の次数で計算を打ち
切る処理も可能である。The amplitude / delay calculation unit 13 calculates the amplitude / delay by using the so-called "virtual image method". Here, for the sake of simplicity, it is assumed that the order, which means how many times the sound ray is reflected, is cut off at a low order such as the second order. Whether or not a higher order calculation needs to be performed can be freely selected in consideration of the load of hardware such as a CPU and the magnitude of a higher-order reflected wave. For example, if the load of image processing presented to the user is large, it is possible to terminate the calculation at a lower order.

【００４０】図４は、虚像法を用いた反射音の音線の一
般的な算出過程を示している。図４（ａ）は、一次反射
波の経路を求める場合であり、図４（ｂ）は、二次反射
の経路を求める場合である。虚像法では、仮想音源から
の音波からの音線が反射する壁Ｗ_１、Ｗ_２、Ｗ_３に対し
て、その壁の対称位置にある「虚像」を求めることによ
り、全ての反射音の経路Ｌを求める。FIG. 4 shows a general process of calculating the sound rays of reflected sound using the virtual image method. FIG. 4(a) shows the case of obtaining the path of a primary reflected wave, and FIG. 4(b) the case of obtaining the path of a secondary reflected wave. In the virtual image method, for the walls W₁, W₂, and W₃ that reflect sound rays from the virtual sound source, every reflected sound path L is obtained by finding the "virtual image" located symmetrically with respect to each wall.

【００４１】すなわち、一次反射波の場合、仮想空間に
おける壁Ｗ_１に対して仮想音源Ｐ_１の虚像音源Ｐ_１’を
仮定する。また、二次反射波の場合、仮想空間における
壁Ｗ _２及びＷ_３に対して仮想音源Ｐ_２の虚像音源
Ｐ_２’、Ｐ_２”を仮定する。経路を分れることができれ
ば、その経路長から距離を知ることができ、さらには遅
延時間を知ることができる。また、同時に、ある減衰率
を仮定することにより距離に応じた減衰量を計算するこ
とができる。さらに、それぞれの壁の反射率、吸音率等
を含めて減衰量をより精密にしてもよい。That is, for a primary reflected wave, a virtual image sound source P₁' of the virtual sound source P₁ is assumed with respect to the wall W₁ in the virtual space. For a secondary reflected wave, virtual image sound sources P₂' and P₂'' of the virtual sound source P₂ are assumed with respect to the walls W₂ and W₃ in the virtual space. Once a path is known, the distance follows from the path length, and the delay time follows from the distance. At the same time, assuming a certain attenuation rate makes it possible to calculate the attenuation according to distance. The attenuation may be refined further by including the reflectance, sound absorption coefficient, and so on of each wall.
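
The virtual image method described above can be sketched in two dimensions as follows. The helper names, the speed of sound of 343 m/s, the 1/r amplitude law, and all coordinates and reflectances are assumptions made for illustration; the patent only says that "a certain attenuation rate" is assumed. The sketch also omits the check that each reflection point actually lies on its wall segment.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air; an assumed constant


def mirror_point(point, wall_a, wall_b):
    """Mirror a 2-D point across the infinite line through wall_a and wall_b."""
    px, py = point
    ax, ay = wall_a
    dx, dy = wall_b[0] - ax, wall_b[1] - ay
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    foot_x, foot_y = ax + t * dx, ay + t * dy  # foot of the perpendicular
    return (2.0 * foot_x - px, 2.0 * foot_y - py)


def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def delay_and_amplitude(path_length, reflectances=()):
    """Delay from path length; amplitude from assumed 1/r spreading times wall reflectances."""
    amplitude = 1.0 / path_length
    for r in reflectances:
        amplitude *= r
    return path_length / SPEED_OF_SOUND, amplitude


source, receiver = (1.0, 2.0), (3.0, 1.0)
wall_1 = ((0.0, 0.0), (4.0, 0.0))  # along the x axis
wall_2 = ((4.0, 0.0), (4.0, 5.0))  # along x = 4

image_1 = mirror_point(source, *wall_1)    # first-order image
image_2 = mirror_point(image_1, *wall_2)   # second-order image (wall_1, then wall_2)

# The straight-line distance from an image to the receiver equals the reflected path length.
direct = delay_and_amplitude(distance(source, receiver))
first = delay_and_amplitude(distance(image_1, receiver), reflectances=(0.8,))
second = delay_and_amplitude(distance(image_2, receiver), reflectances=(0.8, 0.7))
```

Each result is a `(delay, amplitude)` pair, which is the kind of entry the amplitude delay data holds for one sound ray.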

【００４２】なお、反射波１つ１つの振幅の減衰量と時
間遅延の算出方法は、特に限定されない。上述した手法
のほかには、例えば壁に対する反射角度に応じた減衰度
合いを表すパラメータ等を用いることもできる。The method of calculating the amount of attenuation of each reflected wave and the time delay is not particularly limited. In addition to the method described above, for example, a parameter indicating the degree of attenuation according to the reflection angle with respect to the wall may be used.

【００４３】このようにして、振幅／遅延算出部１３
は、図５に示すような振幅遅延データを生成する。ここ
では、図３における直接波５１、壁面４２ａに対する一
次反射波５２ａ、壁面４２ａと壁面４２ｂに対する二次
反射波５３についてのみ考慮する。In this way, the amplitude/delay calculation unit 13 generates amplitude delay data as shown in FIG. 5. Here, only the direct wave 51, the primary reflected wave 52a off the wall surface 42a, and the secondary reflected wave 53 off the wall surfaces 42a and 42b in FIG. 3 are considered.

【００４４】図５において、０は、仮想音源４１からあ
る音声（インパルス応答測定用信号）が発生された時刻
を示す。時間軸Ｔに沿った時間経過とともに、直接波５
１、一次反射波５２ａ、二次反射波５２に対応するシグ
ナルが順次表示されている。ｔ_１は、仮想空間４０にお
けるスピーカ２ｇの位置で直接波５１が検出されるまで
の期間を示し、Ａ_１は、直接波５１の振幅を示す。同様
に、シグナルＳ_２は、スピーカ２ｇの位置で検出される
一次反射波５２ａの時間遅延ｔ_２と振幅Ａ_２を示し、シ
グナルＳ_３は、スピーカ２ｇの位置で検出される二次反
射波５３の時間遅延ｔ_３と振幅Ａ_３を示している。In FIG. 5, 0 indicates the time at which a sound (an impulse response measurement signal) is emitted from the virtual sound source 41. Along the time axis T, the signals corresponding to the direct wave 51, the primary reflected wave 52a, and the secondary reflected wave 53 appear in sequence. t₁ is the time until the direct wave 51 is detected at the position of the speaker 2g in the virtual space 40, and A₁ is the amplitude of the direct wave 51. Likewise, the signal S₂ shows the time delay t₂ and amplitude A₂ of the primary reflected wave 52a detected at the position of the speaker 2g, and the signal S₃ shows the time delay t₃ and amplitude A₃ of the secondary reflected wave 53 detected at the position of the speaker 2g.

【００４５】インパルス応答断片データ記憶部１４に
は、仮想音源から各スピーカ配置位置までのインパルス
応答のうち、直接音に対応する部分と反射音に対応する
部分とが、仮想空間に相当する実空間で測定されたイン
パルス応答の実測値の所定区間を抽出したインパルス応
答断片データとして記憶されている。本具体例では、上
述したように仮想空間に相当する実空間及び考慮される
状況毎に用意した実空間において、インパルス応答を予
め取得しておくことが前提となる。In the impulse response fragment data storage unit 14, in the impulse response from the virtual sound source to each speaker arrangement position, the portion corresponding to the direct sound and the portion corresponding to the reflected sound correspond to the real space corresponding to the virtual space. It is stored as impulse response fragment data obtained by extracting a predetermined section of the measured value of the impulse response measured in. In this specific example, it is premised that the impulse response is acquired in advance in the real space corresponding to the virtual space and the real space prepared for each situation to be considered as described above.

【００４６】例えば、ある仮想空間の壁の反射音のイン
パルス応答断片データは、図６に示すように、この仮想
空間に相当する実空間において実際のインパルス応答を
測定することによって得られる。図６に示す実空間６０
において、位置Ｐに存在する音源６１により発生される
音波をマイクロフォン６２によって取得する。For example, the impulse response fragment data for the sound reflected by a wall of a virtual space is obtained, as shown in FIG. 6, by measuring an actual impulse response in a real space corresponding to that virtual space. In the real space 60 shown in FIG. 6, the sound wave generated by the sound source 61 at the position P is picked up by the microphone 62.

【００４７】ここで取得されるインパルス応答を図７に
示す。図７に示すインパルス応答から、直接音に対応す
る部分をインパルス応答断片データＤ_１として、反射音
に対応する部分をインパルス応答断片データＤ_２として
抽出する。ここで、各インパルス応答断片データは、例
えば、抽出した区間の後半で音声波形の振幅が徐々に減
衰するような時間窓によって区切られている。実空間に
おけるインパルス応答に基づいて音声処理を行うので、
音声処理装置１は、より自然な音声波形を再現すること
ができる。FIG. 7 shows the impulse response acquired here. From the impulse response shown in FIG. 7, the portion corresponding to the direct sound is extracted as impulse response fragment data D₁, and the portion corresponding to the reflected sound as impulse response fragment data D₂. Each piece of impulse response fragment data is delimited by, for example, a time window in which the amplitude of the waveform gradually decays over the latter half of the extracted section. Because the voice processing is based on an impulse response measured in a real space, the voice processing device 1 can reproduce a more natural voice waveform.
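
One way the windowed extraction of the fragments D₁ and D₂ might look is sketched below. The function name `extract_fragment`, the half-cosine window shape, the `fade_fraction` parameter, and the sample values are all assumptions; the patent only requires that the amplitude decay gradually over the latter half of the section.

```python
import math


def extract_fragment(impulse_response, start, length, fade_fraction=0.5):
    """Cut `length` samples starting at `start` and fade out the tail.

    The last `fade_fraction` of the fragment is scaled by a half-cosine ramp.
    The window shape is an assumption made for this sketch.
    """
    segment = list(impulse_response[start:start + length])
    n = len(segment)
    fade_len = int(n * fade_fraction)
    fade_start = n - fade_len
    for i in range(fade_len):
        # For fade_len >= 2 the weight runs from 1.0 down to 0.0 at the last sample.
        w = 0.5 * (1.0 + math.cos(math.pi * i / max(fade_len - 1, 1)))
        segment[fade_start + i] *= w
    return segment


# Hypothetical measured impulse response: a direct peak followed by one reflection.
measured = [0.0, 1.0, 0.6, 0.3, 0.1, 0.0, 0.0, 0.5, 0.4, 0.2, 0.1, 0.05]
d1 = extract_fragment(measured, start=1, length=4)  # direct-sound portion
d2 = extract_fragment(measured, start=7, length=4)  # reflected-sound portion
```

Fading the tail to zero avoids an abrupt cut at the end of each fragment when fragments are later overlapped.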

【００４８】なお、音波のサンプル数は、残響特性を表
現できる最低の点数でよい。また、場合によっては、抽
出した区間の前半で音声波形の振幅が徐々に減衰するよ
うな時間窓としてもよい。The number of samples need only be the minimum that can express the reverberation characteristic. In some cases, the time window may also be one in which the amplitude of the waveform gradually decays over the first half of the extracted section.

【００４９】インパルス応答合成部１５は、振幅遅延デ
ータとインパルス応答断片データ記憶部１４に記憶され
たインパルス応答断片データとに基づいて合成インパル
ス応答を生成する。インパルス応答合成部１５において
合成される合成インパルス応答を図８に示す。インパル
ス応答合成部１５は、図５に示した振幅遅延データの振
幅、時間遅延の関係を保持したまま、直接音、一次反射
音及び二次反射音のインパルス応答断片データを合成
し、最終的な振幅遅延データ、すなわち合成インパルス
応答を生成する。図８の例では、二次反射に対応するイ
ンパルス応答断片データも、一次反射音のそれと同様に
インパルス断片データＤ_２を利用している。The impulse response synthesizing unit 15 generates a synthetic impulse response based on the amplitude delay data and the impulse response fragment data stored in the impulse response fragment data storage unit 14. FIG. 8 shows a combined impulse response combined by the impulse response combining unit 15. The impulse response synthesis unit 15 synthesizes the impulse response fragment data of the direct sound, the first-order reflected sound, and the second-order reflected sound while maintaining the relationship between the amplitude and the time delay of the amplitude delay data shown in FIG. Amplitude delay data, that is, a synthetic impulse response is generated. In the example of FIG. 8, the impulse response fragment data corresponding to the secondary reflection also uses the impulse fragment data D ₂ similarly to that of the primary reflected sound.
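
The synthesis step can be sketched as placing a scaled copy of the appropriate fragment at each delay given by the amplitude delay data and summing the overlaps. The function name, the tuple layout `(delay, amplitude, fragment key)`, the sample rate, and all numeric values are assumptions made for this sketch.

```python
def synthesize_impulse_response(entries, fragments, sample_rate):
    """Build a synthetic impulse response.

    entries:   list of (delay_seconds, amplitude, fragment_key) tuples
    fragments: dict mapping fragment_key to a list of samples
    """
    placed = []
    total = 0
    for delay, amplitude, key in entries:
        offset = int(round(delay * sample_rate))  # delay expressed in samples
        fragment = fragments[key]
        placed.append((offset, amplitude, fragment))
        total = max(total, offset + len(fragment))
    out = [0.0] * total
    for offset, amplitude, fragment in placed:
        for i, value in enumerate(fragment):
            out[offset + i] += amplitude * value  # overlapping fragments add up
    return out


fragments = {"direct": [1.0, 0.5], "reflected": [0.8, 0.4, 0.2]}
entries = [
    (0.002, 1.0, "direct"),      # (t1, A1): direct sound uses D1
    (0.005, 0.5, "reflected"),   # (t2, A2): primary reflection uses D2
    (0.006, 0.25, "reflected"),  # (t3, A3): secondary reflection reuses D2
]
h = synthesize_impulse_response(entries, fragments, sample_rate=1000)
```

As in the FIG. 8 example, the secondary reflection reuses the same fragment as the primary reflection, with only its amplitude and delay changed.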

【００５０】インパルス応答合成部１５において合成さ
れた合成インパルス応答は、畳込部１６において音声デ
ータ記憶部１１に記憶される音源音声データに畳み込ま
れる。The synthesized impulse response synthesized by the impulse response synthesis section 15 is convolved by the convolution section 16 with the sound source speech data stored in the speech data storage section 11.
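
The convolution itself can be illustrated with a direct-form sketch. A real device would more likely use a block-based or FFT-based method; the function name and the sample values here are assumptions.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample lists."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


dry = [1.0, 0.0, -1.0]   # hypothetical sound source voice data
h = [1.0, 0.5]           # hypothetical synthetic impulse response
wet = convolve(dry, h)
```

The output is longer than the input by `len(h) - 1` samples, which is the tail contributed by the impulse response.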

【００５１】音声処理部２０は、操作部７（コントロー
ラ７ａ）から入力される仮想空間における仮想音源と各
スピーカとの位置関係の変化に応じて、上述した合成イ
ンパルス応答に基づいて、リスナ１００に聞こえる音声
を滑らかに変化させるためのクロスフェイド処理を行っ
ている。クロスフェイド処理に関する詳細は、後述す
る。音声処理装置１では、この音声処理部２０において
クロスフェイド処理を行うことができるため、仮想空間
４０における代表点をいくつか選んで、その位置に関す
る空間構成用データ及びインパルス応答断片データを用
意すればよく、全ての位置について各データを用意する
必要はない。The voice processing unit 20 performs crossfade processing that, based on the synthetic impulse response described above, smoothly changes the sound heard by the listener 100 as the positional relationship between the virtual sound source and each speaker in the virtual space changes in response to input from the operation unit 7 (controller 7a). The crossfade processing is described in detail later. Because the voice processing unit 20 can perform this crossfade processing, the voice processing device 1 only needs spatial configuration data and impulse response fragment data for a few representative points selected in the virtual space 40; the data need not be prepared for every position.

【００５２】なお、上述の例では、スピーカ２ｇから出
力される音声に関して説明したが、音声処理装置１は、
８チャンネルのマルチチャンネル方式を採用しているた
め、各スピーカに対して上述と同様の方法で合成インパ
ルス応答を合成することによって、仮想音源４１から各
スピーカ２ａ乃至２ｈに対しての伝達特性を形成してい
る。再生時には、形成された各合成インパルス応答を音
源音声データに畳み込んで各スピーカ２ａ乃至２ｈより
放音することにより、リスナ１００は、あたかも仮想音
源４が配置される位置にあるような定位感を得ることが
できる。そのため、振幅／遅延算出部１３では、全ての
チャンネルに関して個々の応答部分の振幅及び時間遅延
が求められる。The example above described the sound output from the speaker 2g. Because the voice processing device 1 uses an 8-channel multichannel system, a synthetic impulse response is synthesized for each speaker in the same way, forming the transfer characteristics from the virtual sound source 41 to each of the speakers 2a to 2h. At playback, each synthetic impulse response is convolved with the sound source voice data and the result is emitted from the speakers 2a to 2h, so that the listener 100 perceives the sound as localized at the position where the virtual sound source 41 is placed. The amplitude/delay calculation unit 13 therefore obtains the amplitude and time delay of each response portion for every channel.

【００５３】このように、音声処理装置１は、仮想空間
４０おけるリスナ１００と仮想音源４１との位置関係、
或いは仮想音源４１と各スピーカとの位置関係がコント
ローラ７ａ等の操作によって変更される場合、例えば仮
想音源が移動している場合であっても、時々刻々と変化
する仮想音源の位置からの音波を滑らかに切り換えて仮
想音源４１が移動しているように表現できる。仮想空間
４０でリスナ１００が移動した際、或いは仮想音源４１
が移動した際に、音声処理装置１が音場を生成する処理
を図９を用いて具体的に説明する。In this way, when the positional relationship between the listener 100 and the virtual sound source 41 in the virtual space 40, or between the virtual sound source 41 and each speaker, is changed by operating the controller 7a or the like (for example, when the virtual sound source is moving), the voice processing device 1 can smoothly switch between the sound waves from the constantly changing position of the virtual sound source and render the virtual sound source 41 as if it were moving. The process by which the voice processing device 1 generates the sound field when the listener 100 or the virtual sound source 41 moves in the virtual space 40 is described in detail with reference to FIG. 9.

【００５４】音声処理装置１は、ステップＳ１におい
て、操作部７（コントローラ７ａ）を介して、プレーヤ
又は仮想音源４１の位置移動が入力されたか否かの判別
を行う。位置移動が入力された場合、ステップＳ２に進
む。In step S1, the voice processing apparatus 1 determines whether or not the position movement of the player or the virtual sound source 41 has been input via the operation unit 7 (controller 7a). When the position movement is input, the process proceeds to step S2.

【００５５】ステップＳ２において、音声処理装置１
は、振幅／遅延算出部１３において、移動した仮想音源
４１の位置情報の空間構成用データを空間構成用データ
記憶部１２から読出し、仮想音源４１から各スピーカ配
置位置までの直接音及び反射音の音線を算出し、振幅遅
延データを生成する。音線とは、仮想音源４１から各ス
ピーカ配置位置までの直接音及び反射音の経路を直線で
表したものであり、反射音には、少なくとも一次反射音
が含まれる。In step S2, the amplitude/delay calculation unit 13 of the voice processing device 1 reads the spatial configuration data for the new position of the virtual sound source 41 from the spatial configuration data storage unit 12, calculates the sound rays of the direct sound and the reflected sounds from the virtual sound source 41 to each speaker position, and generates the amplitude delay data. A sound ray is a straight-line representation of the path of the direct sound or a reflected sound from the virtual sound source 41 to a speaker position; the reflected sounds include at least the primary reflected sound.

【００５６】次に、ステップＳ３において、音声処理装
置１は、インパルス応答合成部１５において、音線が仮
想空間４０における壁４２に反射した、又は反射しない
という情報をキーとしてインパルス応答断片データ記憶
部１４よりインパルス応答断片データを抽出する。Next, in step S3, the impulse response synthesizing unit 15 of the voice processing device 1 retrieves impulse response fragment data from the impulse response fragment data storage unit 14, using as the key whether or not the sound ray was reflected by the wall 42 in the virtual space 40.

【００５７】ステップＳ４において、音声処理装置１の
インパルス応答合成部１５は、ステップＳ２で音波の経
路から生成された直接波、一次反射音及び二次反射音の
振幅の減衰度合いと時間遅延とを示した振幅遅延データ
に、ステップＳ３で抽出されたインパルス応答断片デー
タを合成する。合成によって得られた合成インパルス応
答を、ステップＳ５において、最終的なインパルス応答
とする。In step S4, the impulse response synthesizing unit 15 of the voice processing device 1 combines the impulse response fragment data extracted in step S3 with the amplitude delay data, which gives the amplitude attenuation and time delay of the direct wave, the primary reflected sound, and the secondary reflected sound derived from the sound wave paths in step S2. In step S5, the synthetic impulse response obtained by this combination is taken as the final impulse response.

【００５８】音声処理装置１は、ステップＳ６におい
て、現在の合成インパルス応答を音源音声データに畳み
込む。In step S6, the voice processing device 1 convolves the current synthetic impulse response with the sound source voice data.

【００５９】ステップＳ１において、位置移動が入力さ
れない場合は、ステップＳ６に進み、その時点の合成イ
ンパルス応答を音源音声データに畳み込む。If the position movement is not input in step S1, the process proceeds to step S6, and the synthesized impulse response at that time point is convoluted with the sound source voice data.
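
The flow of steps S1 to S6 can be summarized in a short control sketch. The function `process_block`, the `steps` dictionary, and the trivial stand-in callables are hypothetical; they only mark where units 13, 15, and 16 would do their work.

```python
def process_block(block, state, moved, steps):
    """One pass through steps S1 to S6 for a block of source samples.

    `steps` bundles four hypothetical callables standing in for the
    amplitude/delay calculation, fragment selection, synthesis, and convolution.
    """
    if moved:                                                     # S1: position input?
        amp_delay = steps["amp_delay"](state["position"])         # S2
        fragments = steps["select"](amp_delay)                    # S3
        state["ir"] = steps["synthesize"](amp_delay, fragments)   # S4, S5
    return steps["convolve"](block, state["ir"])                  # S6


# Trivial stand-ins so the sketch runs; real units would do the actual work.
steps = {
    "amp_delay": lambda position: [(0, 0.5)],
    "select": lambda amp_delay: [[1.0]],
    "synthesize": lambda amp_delay, frags: [a * f[0] for (_, a), f in zip(amp_delay, frags)],
    "convolve": lambda block, ir: [x * ir[0] for x in block],
}
state = {"position": (1.0, 2.0), "ir": [1.0]}
unchanged = process_block([2.0, 4.0], state, moved=False, steps=steps)
updated = process_block([2.0, 4.0], state, moved=True, steps=steps)
```

When no movement is input, the stored impulse response is reused and only the convolution of step S6 runs.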

【００６０】以上、図９に示した一連の処理によって、
音声処理装置１は、時々刻々と変化する仮想音源４１か
らの音波を滑らかに切り換えて仮想音源４１が移動して
いるように表現できる。Through the series of processes shown in FIG. 9, the voice processing device 1 can smoothly switch between the sound waves from the constantly changing position of the virtual sound source 41 and thereby render the virtual sound source 41 as if it were moving.

【００６１】合成インパルス応答は、仮想音源４１の位
置や仮想空間４０の空間構成が等しければ、毎回同じも
のが合成される。そのため、合成インパルス応答の一時
記憶部（キャッシュ機構）を備えることによって、操作
部７若しくはコントローラ７ａから受け取った位置情報
から一時記憶部内の同じ条件の合成インパルス応答を検
索し、一時的に記憶されていた場合には、振幅／遅延算
出部１３における振幅遅延データの算出からインパルス
応答合成部１５におけるインパルス応答断片データの合
成までの一連の処理を省略することもできる。For the same position of the virtual sound source 41 and the same spatial configuration of the virtual space 40, the same synthetic impulse response is produced every time. A temporary storage unit (cache mechanism) for synthetic impulse responses can therefore be provided: the position information received from the operation unit 7 or the controller 7a is used to search the temporary storage unit for a synthetic impulse response generated under the same conditions, and if one is stored, the whole sequence from the calculation of the amplitude delay data in the amplitude/delay calculation unit 13 to the synthesis of the impulse response fragment data in the impulse response synthesizing unit 15 can be skipped.
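
A minimal sketch of such a temporary storage unit follows. The class name `ImpulseResponseCache`, the key layout, and the stand-in `fake_synthesize` function are assumptions made for illustration.

```python
class ImpulseResponseCache:
    """Temporary store keyed by the conditions that determine the response."""

    def __init__(self, synthesize):
        self._synthesize = synthesize  # stands in for the work of units 13 and 15
        self._store = {}
        self.misses = 0

    def get(self, source_position, space_id):
        key = (source_position, space_id)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._synthesize(source_position, space_id)
        return self._store[key]


def fake_synthesize(source_position, space_id):
    # Stand-in for the real amplitude/delay calculation and fragment synthesis.
    return [1.0, 0.5 * source_position[0]]


cache = ImpulseResponseCache(fake_synthesize)
first = cache.get((2.0, 1.0), "space40")
again = cache.get((2.0, 1.0), "space40")  # served from the cache
other = cache.get((3.0, 1.0), "space40")  # new position, recomputed
```

In practice positions would probably be quantized before being used as keys, so that nearly identical positions share one entry; that choice is outside what the patent specifies.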

【００６２】本具体例では、仮想音源４１からの音声波
形を各スピーカ位置において再生する場合としたが、リ
スナ１００の耳の位置で再現される音声波形として考え
てもよい。つまり、音源からリスナの両耳までの頭部伝
達関数に基づいて音像定位処理を行う場合でも、この頭
部伝達関数に対して、上述した合成インパルス応答の合
成方法が適用できる。In this specific example, the sound waveform from the virtual sound source 41 is reproduced at each speaker position, but it may be considered as a sound waveform reproduced at the ear position of the listener 100. That is, even when the sound image localization processing is performed based on the head related transfer function from the sound source to both ears of the listener, the above-described method of combining the combined impulse responses can be applied to this head related transfer function.

【００６３】また、上述した例において、壁４２に対す
る反射回数は、音声処理装置のＣＰＵ、メモリ容量等の
基本性能に応じて変更可能である。これに伴い、空間構
成用データ記憶部１２に予め用意する空間位置情報及び
インパルス応答断片データ記憶部１４に予め用意するイ
ンパルス応答断片データもまた、音声処理装置の性能に
合わせて自由に設定できる。In the above example, the number of reflections on the wall 42 can be changed according to the basic performance such as the CPU and memory capacity of the voice processing device. Along with this, the spatial position information prepared in advance in the spatial configuration data storage unit 12 and the impulse response fragment data prepared in advance in the impulse response fragment data storage unit 14 can also be freely set according to the performance of the voice processing device.

【００６４】上述した例では、振幅／遅延算出部１３
は、直接波及び反射波の振幅減衰と時間遅延とを算出す
るのみであって、インパルス応答断片データは、直接波
か一次反射波か二次反射波かという点しか考慮していな
いが、インパルス応答断片データ記憶部１４のデータベ
ース設計時に、さらに詳細な情報を検索キーによって入
力して使用できるようにすると、最終的な合成インパル
ス応答をより忠実に再現できる。In the example above, the amplitude/delay calculation unit 13 only calculates the amplitude attenuation and time delay of the direct wave and the reflected waves, and the impulse response fragment data is chosen only according to whether a wave is a direct wave, a primary reflected wave, or a secondary reflected wave. If the database of the impulse response fragment data storage unit 14 is designed so that more detailed information can be supplied as a search key, the final synthetic impulse response can be reproduced more faithfully.

【００６５】例えば、図１０及び図１１に示すように、
振幅遅延データを仮想音源４１から仮想空間を構成する
構成要素に対する音波の入射角、構成要素の材質等のよ
うな、仮想空間４０における音声の反射条件を示す付加
情報を含めたデータ構成とし、インパルス応答断片デー
タ記憶部１４を壁の種類や反射の順序によって断片デー
タを複数種類もつようなデータベースとすることもでき
る。この場合、振幅／遅延算出部１３は、図３に示され
た反射音の音線のうちの二次反射波５３に対しては、ま
ず、壁４２ａに入射し、次に壁４２ｂで反射して聴取位
置に到達することまで考慮して音波の振幅減衰と時間遅
延とを算出する。すなわち、壁による反射率や吸音率を
考慮して算出する。For example, as shown in FIGS. 10 and 11, the amplitude delay data can be given a data structure that includes additional information describing the reflection conditions of sound in the virtual space 40, such as the angle at which the sound wave from the virtual sound source 41 strikes an element of the virtual space and the material of that element, and the impulse response fragment data storage unit 14 can be a database holding several kinds of fragment data according to the type of wall and the order of reflection. In this case, for the secondary reflected wave 53 among the reflected sound rays shown in FIG. 3, the amplitude/delay calculation unit 13 calculates the amplitude attenuation and time delay taking into account that the wave first strikes the wall 42a, is then reflected by the wall 42b, and reaches the listening position. That is, the calculation takes the reflectance and sound absorption coefficient of each wall into account.

【００６６】振幅及び時間遅延は、この音線を求める際
に、例えば、空気中を進行する際の減衰率や音速等から
も計算できる。インパルス応答断片データベースは、直
接波のインパルス応答断片データと反射波のインパルス
応答断片データに対して、壁４２ａ又は壁４２ｂに２回
反射したデータ等も含んでいる。When the sound ray is obtained, the amplitude and the time delay can also be calculated from, for example, the attenuation rate and the speed of sound for propagation through air. In addition to the impulse response fragment data for the direct wave and for reflected waves, the impulse response fragment database also contains, for example, data for sound reflected twice, off the wall 42a or the wall 42b.

【００６７】したがって、図３に示す反射波５３の音線
に対しては、「壁４２ａに反射し、続いて壁４２ｂに反
射したデータ」を使用することになる。ここでは、入射
角を使用していないが、例えば「壁４２ａに○○°〜○
○°の入射角で反射したデータ」等のようにさらに細か
くデータベース化しておくこともできる。この場合、入
射角ｘが含まれるデータをデータベースから検索するこ
ととなる。このようにして、細かい条件に対してそれぞ
れ測定データを断片化して保持することにより、様々な
音色をもつ合成インパルス応答の形成が実現できる。Therefore, for the sound ray of the reflected wave 53 shown in FIG. 3, the "data reflected by the wall 42a and then by the wall 42b" is used. The incident angle is not used here, but the database can be made finer still, with entries such as "data reflected by the wall 42a at an incident angle of ○○° to ○○°". In that case, the database is searched for the entry whose range contains the incident angle x. By fragmenting and holding the measured data for each such detailed condition, synthetic impulse responses with a variety of timbres can be formed.
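
A lookup of this kind might be sketched as follows. The table layout, the angle ranges, and the fragment identifiers are hypothetical; the patent leaves the actual ranges unspecified.

```python
# Each row: (walls hit in order, min angle, max angle, fragment id); angles in degrees.
# All ranges and ids below are made-up placeholders.
FRAGMENT_DB = [
    ((), None, None, "direct"),
    (("42a",), 0.0, 45.0, "42a_low_angle"),
    (("42a",), 45.0, 90.0, "42a_high_angle"),
    (("42a", "42b"), None, None, "42a_then_42b"),
]


def find_fragment(walls, incident_angle=None):
    """Return the id of the first fragment matching the wall sequence and angle.

    Rows without an angle range match any angle. If no angle is supplied,
    the first row for the wall sequence is returned. Ranges are half-open.
    """
    for db_walls, low, high, fragment_id in FRAGMENT_DB:
        if db_walls != tuple(walls):
            continue
        if low is None or incident_angle is None:
            return fragment_id
        if low <= incident_angle < high:
            return fragment_id
    return None


direct_id = find_fragment([])
angled_id = find_fragment(["42a"], incident_angle=60.0)
double_id = find_fragment(["42a", "42b"])
```

The wall sequence and incident angle used as the key here correspond to the path information that the amplitude delay data is said to carry.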

【００６８】また、直接波に対応するインパルス応答断
片データに関しても「仮想音源と受音位置との間に障害
物があった場合のインパルス応答断片データ」等のよう
に、データベースを詳細に分類することができる。音線
上に障害物がある場合、この障害物の大きさや透過率等
により回折効果を表すデータを複数用意してもよい。た
だし、この場合は、振幅遅延データにおいても音線が
障害物を通って到達したかどうかを示す付加情報の項目
を記憶することが必要になる。The database can likewise be classified in detail for the impulse response fragment data corresponding to the direct wave, for example with "impulse response fragment data for the case where an obstacle lies between the virtual sound source and the sound receiving position". When an obstacle lies on the sound ray, several sets of data representing the diffraction effect may be prepared according to the size, transmittance, and so on of the obstacle. In that case, however, the amplitude delay data must also store an additional-information item indicating whether the sound ray arrived by way of an obstacle.

【００６９】このように、インパルス応答断片データ記
憶部１４は、振幅／遅延算出部１３で算出される振幅遅
延データと密接に連携するような設計とすることが好ま
しい。As described above, it is preferable that the impulse response fragment data storage unit 14 is designed to closely cooperate with the amplitude delay data calculated by the amplitude / delay calculation unit 13.

【００７０】上述したクロスフェイド処理を実行する音
声処理部としては、例えば、図１２に示す構成を有した
周知の音声処理部を用いることができる。音声処理部２
０は、複数の音声伝達特性を重畳させながら徐々に切り
換えることによって、複雑な音場の再生を可能としたも
のである。音声処理部２０は、図示しない音声信号入力
部から入力された音声信号Ｓ_ｉに対して、クロスフェイ
ド処理を行うものであり、音声信号入力部に対応する音
源としては、例えば、線形記録媒体、ディスク型記録媒
体等に記録された音声信号を再生する再生装置における
音声信号出力部等がこれに相当する。As the voice processing unit that performs the crossfade processing described above, a well-known voice processing unit having, for example, the configuration shown in FIG. 12 can be used. The voice processing unit 20 makes it possible to reproduce a complex sound field by gradually switching between a plurality of sound transfer characteristics while superimposing them. The voice processing unit 20 performs crossfade processing on a voice signal S_i input from a voice signal input unit (not shown). The sound source feeding the voice signal input unit is, for example, the voice signal output unit of a playback device that reproduces voice signals recorded on a linear recording medium, a disc-type recording medium, or the like.

【００７１】音声処理部は、音声信号入力部の後段に、
複数個の線形フィルタ３１_（１）、３１_（２）、・・
・、３１_（Ｎ）が並列に配置されており、さらに、これ
ら線形フィルタ各々の後段には、線形フィルタの個数に
対応する可変重み付け部３２_（ _１）、３２_（２）、・・
・、３２_（Ｎ）が並列に配置され、これら可変重み付け
部の後段には、１つの加算回路３３が接続されて構成さ
れている。図１２に示す音声処理部２０は、１チャンネ
ル分のみ示したが、本具体例では、８チャンネルのマル
チチャンネル方式であるため、この音声処理部２０が８
組用意されている。クロスフェイド処理後の信号Ｓ
_ｏは、加算回路３３の出力端子より取り出され、Ｄ／Ａ
部１７へと供給されている。In the voice processing unit, a plurality of linear filters 31_(1), 31_(2), ..., 31_(N) are arranged in parallel after the voice signal input unit. After the linear filters, variable weighting units 32_(1), 32_(2), ..., 32_(N), one per linear filter, are arranged in parallel, and a single adder circuit 33 is connected after the variable weighting units. FIG. 12 shows the voice processing unit 20 for one channel only; since this specific example uses an 8-channel multichannel system, eight such voice processing units 20 are provided. The crossfaded signal S_o is taken from the output terminal of the adder circuit 33 and supplied to the D/A unit 17.

【００７２】各線形フィルタは、前段の音声信号入力部
の端子と共通接点を介して接続され、音声信号入力部か
らの出力信号がそれぞれ並列に入力されるように配線接
続されている。各線形フィルタは、入力された音声信号
Ｓ_ｉに対して、後述するコントローラ７ａから転送され
たフィルタ係数Ｄ_ｆを重畳処理し、この重畳信号Ｓ_（
_１）、Ｓ_（２）、・・・、Ｓ_（Ｎ）を後段の可変重み付
け部３２に出力する。ここで、フィルタ係数Ｄ_ｆは、例
えば、入力信号Ｓ_ｉの信号レベル、或いは位相を変化さ
せるための係数であって音響伝達特性を表している。フ
ィルタ係数は、仮想音源（以下、必要に応じて音像と示
す。）を仮想的な空間の中で定位させたい任意の位置に
あるものとして表現するために、シミュレーションや実
測によって得られたインパルス応答波形を示している。Each linear filter is connected through a common contact to the terminal of the preceding voice signal input unit, wired so that the output signal of the voice signal input unit is fed to all filters in parallel. Each linear filter superimposes the filter coefficient D_f transferred from the controller 7a, described later, on the input voice signal S_i and outputs the resulting superimposed signal S_(1), S_(2), ..., S_(N) to the subsequent variable weighting unit 32. Here, the filter coefficient D_f is, for example, a coefficient for changing the signal level or phase of the input signal S_i and represents an acoustic transfer characteristic. The filter coefficients are an impulse response waveform, obtained by simulation or actual measurement, that renders the virtual sound source (hereinafter called the sound image where appropriate) as if it were at the arbitrary position in the virtual space where it is to be localized.

【００７３】各可変重み付け部は、前段のそれぞれに対
応する線形フィルタと接続され、対応する線形フィルタ
からの重畳信号Ｓに対して、コントローラ７ａからの指
令信号Ｓ_ｃに基づいて、フェイドイン処理又はフェイド
アウト処理を行っている。Each variable weighting unit is connected to its corresponding linear filter in the preceding stage and applies fade-in or fade-out processing to the superimposed signal S from that filter, according to a command signal S_c from the controller 7a.

【００７４】一方、操作部７から供給された音像（仮想
音源）の移動要求信号Ｓ_ｄに基づいて、前回フェイドイ
ン処理を行った可変重み付け部に対しては、フェイドア
ウト処理を行わせるための指令信号Ｓ_ｃ１を出力し、前
回フェイドアウト処理を行った可変重み付け部に対して
は、フェイドイン処理を行わせるための指令信号Ｓ_ｃ _２
を出力する。また、制御部３４は、指令信号Ｓ_ｃの出力
に対して音像位置位置データを含む指令コードＣ_ｓを出
力する。Meanwhile, based on the movement request signal S_d for the sound image (virtual sound source) supplied from the operation unit 7, the control unit 34 outputs a command signal S_c1 that instructs the variable weighting unit that last performed fade-in processing to perform fade-out processing, and a command signal S_c2 that instructs the variable weighting unit that last performed fade-out processing to perform fade-in processing. Along with the command signal S_c, the control unit 34 also outputs a command code C_s containing sound image position data.

【００７５】フィルタ係数切換部３５は、各線形フィル
タのうち前回フィルタ係数Ｄｆが転送された線形フィル
タ以外の線形フィルタ部に返送されたフィルタ係数Ｄ_ｆ
を転送する。これによって、信号Ｓ_ｉは、各線形フィル
タにおいて、フィルタ係数Ｄ _ｆが重畳処理された信号Ｓ
となり、これら線形フィルタ部からの重畳信号Ｓが後段
の可変重み付け部にてフェイドイン処理又はフェイドア
ウト処理されて出力される。The filter coefficient switching unit 35 transfers the returned filter coefficient D_f to a linear filter other than the one to which the filter coefficient D_f was transferred last time. As a result, in each linear filter the signal S_i becomes a signal S on which the filter coefficient D_f has been superimposed, and the superimposed signals S from these linear filters are fade-in or fade-out processed by the subsequent variable weighting units and output.

【００７６】 Specifically, in the linear filter that holds the previous filter coefficient D_f, the signal S_i is superimposed with that previous coefficient, and in the linear filter to which the filter coefficient switching unit 35 has newly transferred a filter coefficient D_f, the signal S_i is superimposed with the current coefficient.

【００７７】 As a result, the superimposed signal S from the linear filter holding the previous filter coefficient D_f undergoes fade-out processing in the corresponding variable weighting unit, and the superimposed signal S from the linear filter to which the current filter coefficient D_f was transferred undergoes fade-in processing in the corresponding variable weighting unit. The output signals of the variable weighting units are summed by the adder circuit 33 in the subsequent stage and supplied to the D/A unit 17 as the final output signal, that is, the crossfade-processed signal S_o.
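The crossfade between the previous and current filter coefficients can be illustrated with a minimal Python sketch. It is not the circuit of the embodiment: the function names, the linear fade ramp, and the `fade_len` parameter are assumptions made for illustration, since the text does not specify the shape or duration of the fade.

```python
def convolve(signal, coeffs):
    """Full-length FIR convolution of a signal with filter coefficients."""
    out = [0.0] * (len(signal) + len(coeffs) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(coeffs):
            out[n + k] += x * h
    return out


def crossfade_filters(signal, old_coeffs, new_coeffs, fade_len):
    """Filter the signal with both coefficient sets and crossfade old -> new.

    The linear ramp over `fade_len` samples is an assumed fade shape.
    """
    old_out = convolve(signal, old_coeffs)   # path that is faded out
    new_out = convolve(signal, new_coeffs)   # path that is faded in
    length = max(len(old_out), len(new_out))
    old_out += [0.0] * (length - len(old_out))
    new_out += [0.0] * (length - len(new_out))
    mixed = []
    for n in range(length):
        w_in = min(1.0, n / fade_len) if fade_len > 0 else 1.0
        w_out = 1.0 - w_in
        # The adder sums the two weighted paths into one output sample.
        mixed.append(w_out * old_out[n] + w_in * new_out[n])
    return mixed
```

In this sketch the two `convolve` calls play the role of the two linear filters, the weights `w_out` and `w_in` play the role of the variable weighting units, and the final sum corresponds to the adder circuit 33.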

【００７８】 Therefore, through the processing described above, the voice processing device 1 can realistically represent the positional relationship between the virtual sound source and the listener 100 in accordance with the situation shown on the display 5. In other words, to the listener 100 the sound source sounds as if it were actually at that position. The voice processing device 1 can respond to movement of the sound source relative to the listener 100 in the virtual space, movement of the listener 100 relative to the sound source, or changes in the surroundings, such as relative movement of walls, floors, and obstacles that reflect sound waves from the sound source, and can render the result as if the positional relationship between the sound source and the listener 100 and the transfer characteristics of the sound waves were actually changing.

【００７９】 The specific example described above ultimately takes the synthetic impulse response formed from the amplitude delay data and the impulse response fragment data and convolves it with the input voice signal to generate the voice waveform output from the speaker 2. However, when constraints such as device capability mean that the amplitude/delay calculation unit 13 can handle only a very small amount of reflected wave data, or when the amount of impulse response fragment data that can be stored in the impulse response fragment data storage unit 14 is limited for similar reasons, processing equivalent to that of the specific example above can also be realized by another specific example, described below.

【００８０】 Like the voice processing device 1, the voice processing device 70 shown in FIG. 13 generates sound according to the positional relationship in a virtual space between a sound source in the virtual space and a sound receiving point that receives sound from the virtual sound source. Arranging its components in the positional relationship shown in FIG. 1 yields a highly realistic acoustic effect. Components of the voice processing device 70 that have the same functions as those of the voice processing device 1 are given the same reference numerals as in FIG. 1, and their detailed description is omitted.

【００８１】 The basic configuration of the voice processing device 70 is the same as that of the voice processing device 1. Its distinguishing features are a voice data storage unit 71 that stores superimposed voice data, and a voice data synthesizing unit 72 that combines this superimposed voice data with the amplitude delay data. The superimposed voice data is produced in advance by superimposing impulse response fragment data on each piece of sound source voice data, where the impulse response fragment data is obtained by extracting predetermined sections, corresponding to the direct sound portion and the reflected sound portion, from the measured impulse responses from the sound source to each speaker position in a real space corresponding to the virtual space.
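The offline preparation of superimposed voice data can be sketched in Python as follows. This is an illustrative sketch only: the function names, the section boundaries, and the linear taper at the end of each fragment are assumptions. The taper is modeled loosely on the time window mentioned in claim 2, which attenuates the latter part of the extracted section; the actual window shape is not given in the text.

```python
def extract_fragment(impulse_response, start, length, fade_len):
    """Cut `length` samples from `start`, tapering the last `fade_len` samples.

    The linear taper is an assumed window shape.
    """
    fragment = list(impulse_response[start:start + length])
    fade_len = min(fade_len, len(fragment))
    for i in range(fade_len):
        # Weight falls linearly and reaches zero on the final sample.
        weight = (fade_len - 1 - i) / fade_len
        fragment[len(fragment) - fade_len + i] *= weight
    return fragment


def preconvolve(source, fragment):
    """Convolve the source sound data with one fragment ahead of time."""
    out = [0.0] * (len(source) + len(fragment) - 1)
    for n, x in enumerate(source):
        for k, h in enumerate(fragment):
            out[n + k] += x * h
    return out
```

A caller would run `extract_fragment` once for the direct sound section and once for each reflected sound section of a measured impulse response, then store the `preconvolve` results, so that no convolution is needed at playback time.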

【００８２】 In the voice data storage unit 71, the impulse response fragment data D_1 of the direct wave and the impulse response fragment data D_2 of the reflected wave, both extracted from the measured values shown in FIG. 14(b), are each superimposed in advance on the voice waveform of the sound source voice data shown in FIG. 14(a), and the results are stored as superimposed voice data as shown in FIGS. 14(c) and 14(d). Because the superimposed voice data is stored in a database in a form that minimizes the data relating to time delay, it can be made much shorter than the amplitude delay data synthesized by the method described so far. That is, the amount of amplitude delay data is small, and few resources are needed to store it.

【００８３】 The voice data synthesizing unit 72 synthesizes the voice data in accordance with the amplitude delay data from the amplitude/delay calculation unit 13. That is, the superimposed voice data shown in FIG. 15(b) is combined according to the amplitude delay data shown in FIG. 15(a) to obtain the voice waveform shown in FIG. 15(c).
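The synthesis step performed by the voice data synthesizing unit 72 can be sketched in Python as a scale, shift, and add operation. The data layout is an assumption made for illustration: each propagation path is a hypothetical tuple `(kind, amplitude, delay_samples)`, and `preconvolved` is a hypothetical dictionary mapping a path kind to its stored superimposed sound data.

```python
def synthesize(paths, preconvolved):
    """Mix pre-convolved clips according to per-path amplitude and delay.

    paths: iterable of (kind, amplitude, delay_samples) tuples (assumed layout).
    preconvolved: dict mapping kind -> list of samples (assumed layout).
    """
    paths = list(paths)
    length = max(delay + len(preconvolved[kind]) for kind, _, delay in paths)
    out = [0.0] * length
    for kind, amplitude, delay in paths:
        for n, sample in enumerate(preconvolved[kind]):
            # Scale by the path amplitude and place at the path delay.
            out[delay + n] += amplitude * sample
    return out
```

Because each clip already contains the effect of its impulse response fragment, the run-time cost in this sketch is one multiply-add per stored sample per path, with no convolution.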

【００８４】 According to the voice processing device 70, the length of the period up to the delay time of the latest-arriving reflected wave is shortened, which saves data capacity and improves the real-time performance of the voice processing.

【００８５】 As described above, the voice processing device 1 and the voice processing device 70 according to the present invention do not obtain everything by calculation when forming the synthetic impulse response or the synthesized voice data. Instead, they effectively separate the work into a part obtained by calculation and a part that uses impulse response fragment data obtained from actual measurement in a real space. This makes it possible to represent a more realistic virtual space acoustically with less calculation than when everything is obtained by calculation. Moreover, when creating the sound field for a virtual sound source moving through the virtual space, there is no need to measure the impulse response in the real space at every point through which the virtual sound source passes. Combining measurements at a few points with calculation allows a more realistic sound field to be represented more simply.

【００８６】 The present invention is not limited to the embodiments described above, and various modifications can of course be made without departing from the gist of the present invention.

【００８７】[0087]

[Effects of the Invention]

As described in detail above, when generating a voice waveform for reproducing a sound field, the voice processing device according to the present invention effectively separates the work into two parts: a part in which the amplitude delay calculation means calculates, based on the spatial configuration data, amplitude delay data consisting of at least the amplitude and delay of the direct sound response and the main reflected sound responses of the sound wave propagating from the virtual sound source to the sound receiving point, and a part that uses impulse response fragment data obtained by extracting a predetermined section of the impulse response measured in a real space corresponding to the virtual space. This makes it possible to reproduce a more realistic acoustic space with less calculation than when everything is obtained by calculation, while also improving the final sound quality.
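The calculated part, the amplitude delay data, can be illustrated with a small Python sketch of the virtual image (image source) method for one wall. Everything beyond the general idea is an assumption made for illustration: a 2D geometry, a single wall on the line x = `wall_x`, a speed of sound of 343 m/s, 1/distance amplitude attenuation, and a single `reflectance` factor per reflection. The document does not state these specifics.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def path_params(distance, sample_rate, gain=1.0):
    """Return (amplitude, delay_in_samples) for a path of the given length."""
    delay = round(distance / SPEED_OF_SOUND * sample_rate)
    amplitude = gain / max(distance, 1e-9)  # assumed 1/r attenuation
    return amplitude, delay


def direct_and_first_reflection(source, listener, wall_x, reflectance, sample_rate):
    """Amplitude/delay for the direct path and one primary reflection."""
    sx, sy = source
    lx, ly = listener
    direct = math.hypot(lx - sx, ly - sy)
    # Mirror the source across the wall x = wall_x to get the image source.
    image_x = 2.0 * wall_x - sx
    reflected = math.hypot(lx - image_x, ly - sy)
    return {
        "direct": path_params(direct, sample_rate),
        "reflected": path_params(reflected, sample_rate, gain=reflectance),
    }
```

The length of the reflected path equals the straight-line distance from the image source to the listener, which is why a single mirror operation is enough for a primary reflection in this sketch.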

【００８８】 Further, according to the voice processing device of the present invention, when simulating a sound source moving through the virtual space, it is sufficient to input a few characteristic synthetic impulse responses from the real space in order to realistically represent the sound source moving freely within the virtual space.

【００８９】 Furthermore, reducing the amount of calculation makes it possible to reproduce a realistic sound field even in situations where real-time voice processing is required, such as in games.

【００９０】 Further, the voice processing device according to the present invention calculates, in the amplitude delay calculation means and based on the spatial configuration data, amplitude delay data consisting of the amplitude and delay of the direct sound response and the main reflected sound responses of the sound wave propagating from the virtual sound source to the sound receiving point. It then synthesizes, based on the amplitude delay data, superimposed voice data in which impulse response fragment data representing each of the direct sound response and the reflected sound response, extracted from the impulse response of the sound wave propagating from the virtual sound source to the sound receiving point, has been superimposed in advance on the voice data of the sound source. This makes it possible to reproduce a more realistic acoustic space with less calculation than when everything is obtained by calculation, while also improving the final sound quality.

【００９１】 Furthermore, reducing the amount of calculation makes it possible to reproduce a realistic sound field even in situations where real-time voice processing is required, such as in games.

【００９２】 Further, according to the voice processing device of the present invention, when simulating a sound source moving through the virtual space, measuring the characteristic impulse response at only a few points in the real space is sufficient to realistically represent the sound source moving freely within the virtual space.

【００９３】 Further, when generating a voice waveform for reproducing a sound field, the voice processing method according to the present invention effectively separates the work into two parts: a part in which the amplitude delay calculation step calculates, based on the spatial configuration data, amplitude delay data consisting of at least the amplitude and delay of the direct sound response and the main reflected sound responses of the sound wave propagating from the virtual sound source to the sound receiving point, and a part that uses impulse response fragment data obtained by extracting a predetermined section of the impulse response measured in a real space corresponding to the virtual space. This makes it possible to reproduce a more realistic acoustic space with less calculation than when everything is obtained by calculation, while also improving the final sound quality.

【００９４】 Further, according to the voice processing method of the present invention, when simulating a sound source moving through the virtual space, measuring the characteristic impulse response at only a few points in the real space is sufficient to realistically represent the sound source moving freely within the virtual space.

【００９５】 Furthermore, reducing the amount of calculation makes it possible to reproduce a realistic sound field even in situations where real-time voice processing is required, such as in games.

【００９６】 Further, according to the voice processing method of the present invention, the amplitude delay calculation step calculates, based on the spatial configuration data, amplitude delay data consisting of at least the amplitude and delay of the direct sound response and the main reflected sound responses of the sound wave propagating from the virtual sound source to the sound receiving point. Superimposed voice data, in which impulse response fragment data representing each of the direct sound response and the reflected sound response, extracted from the impulse response of the sound wave propagating from the virtual sound source to the sound receiving point, has been superimposed in advance on the voice data of the sound source, is then synthesized based on the amplitude delay data. This makes it possible to reproduce a more realistic acoustic space with less calculation than when everything is obtained by calculation, while also improving the final sound quality.

【００９７】 Furthermore, reducing the amount of calculation makes it possible to reproduce a realistic sound field even in situations where real-time voice processing is required, such as in games.

【００９８】 Further, according to the voice processing method of the present invention, when simulating a sound source moving through the virtual space, measuring the characteristic impulse response at only a few points in the real space is sufficient to realistically represent the sound source moving freely within the virtual space.

【００９９】 Further, when generating a voice waveform for reproducing a sound field, the control program according to the present invention effectively separates the work into two parts: a part in which the amplitude delay calculation process calculates, based on the spatial configuration data, amplitude delay data consisting of at least the amplitude and delay of the direct sound response and the main reflected sound responses of the sound wave propagating from the virtual sound source to the sound receiving point, and a part that uses impulse response fragment data obtained by extracting a predetermined section of the impulse response measured in a real space corresponding to the virtual space. This makes it possible to reproduce a more realistic acoustic space with less calculation than when everything is obtained by calculation, while also improving the final sound quality.

【０１００】 Further, according to the control program of the present invention, when simulating a sound source moving through the virtual space, measuring the characteristic impulse response at only a few points in the real space is sufficient to realistically represent the sound source moving freely within the virtual space.

【０１０１】 Furthermore, reducing the amount of calculation makes it possible to reproduce a realistic sound field even in situations where real-time voice processing is required, such as in games.

【０１０２】 Further, according to the control program of the present invention, the amplitude delay calculation process calculates, based on the spatial configuration data, amplitude delay data consisting of at least the amplitude and delay of the direct sound response and the main reflected sound responses of the sound wave propagating from the virtual sound source to the sound receiving point. This data is then synthesized with superimposed voice data in which impulse response fragment data representing each of the direct sound response and the reflected sound response, extracted from the impulse response of the sound wave propagating from the virtual sound source to the sound receiving point, has been superimposed in advance on the voice data of the sound source. This makes it possible to reproduce a more realistic acoustic space with less calculation than when everything is obtained by calculation, while also improving the final sound quality.

【０１０３】 Furthermore, reducing the amount of calculation makes it possible to reproduce a realistic sound field even in situations where real-time voice processing is required, such as in games.

【０１０４】 Further, according to the control program of the present invention, when simulating a sound source moving through the virtual space, measuring the characteristic impulse response at only a few points in the real space is sufficient to realistically represent the sound source moving freely within the virtual space.

[Brief description of drawings]

FIG. 1 is a configuration diagram showing an outline of the configuration of a voice processing device shown as a specific example of the present invention.

FIG. 2 is a configuration diagram illustrating the configuration of the voice processing device shown as a specific example of the present invention.

FIG. 3 is a diagram illustrating the positional relationship among a virtual sound source, speakers, and a listener in the virtual space represented by the voice processing device shown as a specific example of the present invention.

FIG. 4(a) is a schematic diagram illustrating how the path of a primary reflected sound between the virtual sound source and the sound receiving point is calculated by the virtual image method, and FIG. 4(b) is a schematic diagram illustrating how the path of a secondary reflected sound between the virtual sound source and the sound receiving point is calculated by the virtual image method.

FIG. 5 is a diagram showing amplitude/delay data calculated based on spatial configuration data in the amplitude/delay calculation unit of the voice processing device shown as a specific example of the present invention.

FIG. 6 is a diagram illustrating the acquisition of impulse response fragment data to be stored in the impulse response fragment data storage unit of the voice processing device shown as a specific example of the present invention.

FIG. 7 is a diagram showing impulse response fragment data stored in the impulse response fragment data storage unit of the voice processing device shown as a specific example of the present invention.

FIG. 8 is a schematic diagram showing how impulse response fragment data is synthesized based on amplitude/delay data in the voice processing device shown as a specific example of the present invention.

FIG. 9 is a flowchart illustrating the process by which the voice processing device shown as a specific example of the present invention generates the voice waveform of a virtual sound source moving through the virtual space.

FIG. 10 is a schematic diagram showing an example of a table of amplitude delay data calculated by the amplitude delay calculation unit of the voice processing device shown as a specific example of the present invention.

FIG. 11 is a diagram showing another example of impulse response fragment data stored in the impulse response fragment data storage unit of the voice processing device shown as a specific example of the present invention.

FIG. 12 is a configuration diagram illustrating the configuration of the audio processing unit in the voice processing device shown as a specific example of the present invention.

FIG. 13 is a configuration diagram illustrating the configuration of a voice processing device shown as another specific example of the present invention.

FIG. 14(a) is a waveform diagram showing the voice waveform of sound source voice data, FIG. 14(b) is a waveform diagram showing the waveform of impulse response fragment data extracted from measured values, FIG. 14(c) is a waveform diagram showing the voice waveform obtained by combining impulse response fragment data representing the direct wave with the voice waveform of the sound source voice data, and FIG. 14(d) is a waveform diagram showing superimposed voice data obtained by combining impulse response fragment data representing a reflected wave with the voice waveform of the sound source voice data.

FIG. 15(a) is a diagram showing amplitude delay data calculated based on spatial configuration data, FIG. 15(b) is a waveform diagram showing superimposed voice data arranged according to the amplitude delay data, and FIG. 15(c) is a waveform diagram showing the voice waveform obtained by combining the pieces of voice data of FIG. 15(b).

[Explanation of symbols]

1, 70 voice processing device, 2 speaker, 3 closed curved surface, 4 audio signal control generation unit, 5 display, 6 video signal control generation unit, 7 operation unit, 7a controller, 11 voice data storage unit, 12 spatial configuration data storage unit, 13 amplitude/delay calculation unit, 14 impulse response fragment data storage unit, 15 impulse response synthesis unit, 16 convolution unit, 17 D/A unit, 18 amplifier, 19 internal bus, 20 audio processing unit, 21 object display control unit, 22 image signal processing unit, 31 linear filter, 32 variable weighting unit, 33 adder circuit, 34 control unit, 35 filter coefficient switching unit, 36 filter coefficient generating unit, 40 virtual space, 41 virtual sound source, 41 wall, 51 direct wave, 52a primary reflected wave, 52b primary reflected wave, 53 secondary reflected wave, 60 real space, 61 sound source, 62 microphone, 71 voice data storage unit, 72 voice data synthesizing unit, 100 listener

Claims

[Claims]

1. A voice processing device for generating sound according to the positional relationship in a virtual space between a sound source in the virtual space and a sound receiving point that receives sound from the virtual sound source, comprising: spatial configuration data storage means storing spatial configuration data relating to the elements constituting the virtual space; amplitude delay calculation means for calculating, based on the spatial configuration data, amplitude delay data consisting of at least the amplitude and delay of the direct sound response and the main reflected sound responses of the sound wave propagating from the virtual sound source to the sound receiving point; impulse response fragment data storage means storing impulse response fragment data obtained by extracting a predetermined section from the impulse response, measured in a real space corresponding to the virtual space, of the sound wave propagating from the virtual sound source to the sound receiving point; impulse response synthesizing means for generating a synthetic impulse response based on the amplitude delay data and the impulse response fragment data; and convolution means for convolving the synthetic impulse response synthesized by the impulse response synthesizing means with the voice data of the sound source.

2. The impulse response fragment data is extracted from the impulse response by a time window in which the amplitude of the impulse response fragment data is gradually attenuated in the latter half of the predetermined section. The voice processing device described.

3. The voice processing apparatus according to claim 1, wherein the space configuration data includes additional information indicating a reflection condition of voice in the virtual space.

4. The reflected sound response includes at least a first-order reflection or diffraction response to a reflector or an obstacle based on the spatial configuration of the virtual space in the virtual space. Voice processor.

5. The voice processing device according to claim 1, wherein the amplitude delay calculation means calculates the direct sound response and the main reflected sound responses of the sound wave propagating from the virtual sound source to the sound receiving point by using a virtual image method.

6. The voice processing apparatus according to claim 1, further comprising operation means for changing a positional relationship between the sound source and the sound receiving point in a virtual space.

7. A voice processing device for generating sound according to the positional relationship in a virtual space between a sound source in the virtual space and a sound receiving point that receives sound from the virtual sound source, comprising: voice data storage means storing superimposed voice data in which impulse response fragment data, obtained by extracting a predetermined section from the impulse response, measured in a real space corresponding to the virtual space, of the sound wave propagating from the virtual sound source to the sound receiving point, has been superimposed in advance on the voice data of the sound source; spatial configuration data storage means storing spatial configuration data relating to the elements constituting the virtual space; amplitude delay calculation means for calculating, based on the spatial configuration data, amplitude delay data consisting of at least the amplitude and delay of the direct sound response and the main reflected sound responses of the sound wave propagating from the virtual sound source to the sound receiving point; and synthesizing means for synthesizing the superimposed voice data on the basis of the amplitude delay data.

8. A voice processing method for generating sound according to the positional relationship in a virtual space between a sound source in the virtual space and a sound receiving point that receives sound from the virtual sound source, comprising: an amplitude delay calculation step of calculating, based on spatial configuration data relating to the elements constituting the virtual space, amplitude delay data consisting of at least the amplitude and delay of the direct sound response and the main reflected sound responses of the sound wave propagating from the virtual sound source to the sound receiving point; an impulse response synthesizing step of generating a synthetic impulse response based on the amplitude delay data and on impulse response fragment data obtained by extracting a predetermined section from the impulse response, measured in a real space corresponding to the virtual space, of the sound wave propagating from the virtual sound source to the sound receiving point; and a convolution step of convolving the synthetic impulse response synthesized in the impulse response synthesizing step with the voice data of the sound source.

9. The voice processing method according to claim 8, wherein the impulse response fragment data is extracted from the impulse response using a time window that gradually attenuates the amplitude of the impulse response fragment data in the latter half of the predetermined section.

10. The voice processing method according to claim 8, wherein the space configuration data includes additional information indicating a reflection condition of voice in the virtual space.

11. The voice processing method according to claim 8, wherein the reflected sound response includes at least a primary reflection response or a diffraction response from a reflector or an obstacle, based on the spatial configuration of the virtual space.

12. The voice processing method according to claim 8, wherein the amplitude delay calculation step calculates the direct sound response and the main reflected sound response of the sound wave propagating from the virtual sound source to the sound receiving point by using a virtual image method.

13. The voice processing method according to claim 8, further comprising an operation step of changing a positional relationship between the sound source and the sound receiving point in a virtual space.

14. A voice processing method for generating sound according to the positional relationship, in a virtual space, between a sound source in the virtual space and a sound receiving point that receives sound from the virtual sound source, the method comprising: an amplitude delay calculation step of calculating, based on space configuration data relating to the elements configuring the virtual space, amplitude delay data consisting of at least the amplitude and delay of the direct sound response and the main reflected sound response of the sound wave propagating from the virtual sound source to the sound receiving point; and a synthesizing step of synthesizing, based on the amplitude delay data, superimposed voice data in which impulse response fragment data, obtained by extracting a predetermined section from the impulse response of a sound wave propagating from the sound source to the sound receiving point as measured in a real space corresponding to the virtual space, has been superimposed in advance on the voice data of the sound source.

15. A control program for a computer-controllable voice processing device that generates sound according to the positional relationship, in a virtual space, between a sound source in the virtual space and a sound receiving point that receives sound from the virtual sound source, the control program causing the voice processing device to execute: amplitude delay calculation processing of calculating, based on space configuration data relating to the elements configuring the virtual space, amplitude delay data consisting of at least the amplitude and delay of the direct sound response and the main reflected sound response of the sound wave propagating from the virtual sound source to the sound receiving point; impulse response synthesizing processing of generating a synthetic impulse response based on the amplitude delay data and on impulse response fragment data obtained by extracting a predetermined section from the impulse response of a sound wave propagating from the sound source to the sound receiving point as measured in a real space corresponding to the virtual space; and convolution processing of convolving the synthesized impulse response with the voice data of the sound source.

16. A control program for a computer-controllable voice processing device that generates sound according to the positional relationship, in a virtual space, between a sound source in the virtual space and a sound receiving point that receives sound from the virtual sound source, the control program causing the voice processing device to execute: amplitude delay calculation processing of calculating, based on space configuration data relating to the elements configuring the virtual space, amplitude delay data consisting of at least the amplitude and delay of the direct sound response and the main reflected sound response of the sound wave propagating from the virtual sound source to the sound receiving point; and synthesis processing of synthesizing, based on the amplitude delay data, superimposed voice data in which impulse response fragment data, obtained by extracting a predetermined section from the impulse response of a sound wave propagating from the sound source to the sound receiving point as measured in a real space corresponding to the virtual space, has been superimposed in advance on the voice data of the sound source.
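
For illustration only, the processing recited in claims 8, 9, 12 and 14 might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation disclosed in the specification. The function names (`extract_fragment`, `amplitude_delay`, `place_copies`), the raised-cosine window, the 1/r amplitude law, the single reflecting wall, the speed of sound and the random stand-in for a measured impulse response are all hypothetical. Reading "superimposed in advance" as "convolved in advance" is likewise an assumption.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def extract_fragment(measured_ir, start, length):
    """Cut a section out of a measured impulse response and apply a time
    window that fades out over the latter half of the section (cf. claim 9)."""
    segment = np.array(measured_ir[start:start + length], dtype=float)
    n = len(segment)
    half = n // 2
    window = np.ones(n)
    # Falling half of a raised cosine over the latter half (assumed shape).
    window[half:] = 0.5 * (1.0 + np.cos(np.pi * np.arange(n - half) / (n - half)))
    return segment * window


def amplitude_delay(source, receiver, wall_x, reflectance, fs):
    """Amplitude and delay (in samples) of the direct sound and of one
    first-order reflection off the plane x = wall_x, using a mirrored
    (virtual image) source (cf. claim 12). Amplitude ~ 1/distance is assumed."""
    source = np.asarray(source, dtype=float)
    receiver = np.asarray(receiver, dtype=float)
    image = source.copy()
    image[0] = 2.0 * wall_x - source[0]  # mirror the source across the wall
    taps = []
    for position, gain in ((source, 1.0), (image, reflectance)):
        distance = np.linalg.norm(receiver - position)
        delay = int(round(distance / SPEED_OF_SOUND * fs))
        taps.append((gain / distance, delay))
    return taps


def place_copies(signal, taps):
    """Sum scaled, delayed copies of `signal`, one per (amplitude, delay) pair."""
    out = np.zeros(max(d for _, d in taps) + len(signal))
    for amplitude, delay in taps:
        out[delay:delay + len(signal)] += amplitude * signal
    return out


fs = 8000
rng = np.random.default_rng(0)
# Stand-in for a measured impulse response (decaying noise), not real data.
measured_ir = rng.standard_normal(256) * np.exp(-np.arange(256) / 40.0)
fragment = extract_fragment(measured_ir, 0, 64)
taps = amplitude_delay([1.0, 2.0, 1.5], [4.0, 2.0, 1.5],
                       wall_x=0.0, reflectance=0.7, fs=fs)
dry = rng.standard_normal(1000)  # stand-in for the voice data of the sound source

# Claim 8 route: synthesize the impulse response, then convolve with the source.
wet_claim8 = np.convolve(place_copies(fragment, taps), dry)
# Claim 14 route: fragment convolved with the source in advance, then synthesized.
wet_claim14 = place_copies(np.convolve(fragment, dry), taps)
```

Under these assumptions the two routes give the same output because convolution is linear and time-invariant. The route of claim 14 moves the convolution offline, leaving only scaled and delayed additions to be performed when the positional relationship changes.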