JP2016134768A

JP2016134768A - Audio signal processor

Info

Publication number: JP2016134768A
Application number: JP2015008306A
Authority: JP
Inventors: Yuta Yuyama (湯山 雄太); Ryotaro Aoki (青木 良太郎); Masaya Kano (加納 真弥)
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2015-01-20
Filing date: 2015-01-20
Publication date: 2016-07-25
Anticipated expiration: 2035-01-20
Also published as: JP6641693B2

Abstract

PROBLEM TO BE SOLVED: To provide an audio signal processor that estimates position information of an object included in content.

SOLUTION: A correlation calculation unit 912 calculates cross-correlations between channels in each of the divided bands. The calculated cross-correlations are input to an object information acquisition unit 172 of a CPU 17. The correlation calculation unit 912 also functions as a level detection unit that detects the level of the audio signal of each channel. The level information of the audio signal of each channel is also input to the object information acquisition unit 172. The object information acquisition unit 172 estimates the position of an object on the basis of the input correlation values and the level information of the audio signal of each channel. For example, when the correlation value between an L channel and an SL channel is high (exceeds a prescribed threshold) and the levels of the L channel and the SL channel are high, it is estimated that an object exists between a speaker 21L and a speaker 21SL.

SELECTED DRAWING: Figure 1

Description

The present invention relates to an audio signal processing device that performs various kinds of processing on audio signals.

Sound field support devices that form a desired sound field in a listening environment are conventionally known (see, for example, Patent Document 1). Such a device combines audio signals of a plurality of channels and convolves a predetermined parameter with the combined audio signal to generate pseudo reflected sound (sound field effect sound).

In recent years, meanwhile, sound image localization based on object information added to content has become widespread. The object information includes information indicating the position of each object (sound source).

Patent Document 1: JP 2001-186599 A

In sound image localization based on object information, a device may receive only the audio signals that have already been distributed to channels according to the listening environment (the speaker arrangement), and may therefore be unable to obtain the position information of the original objects.

An object of the present invention is therefore to provide an audio signal processing device that estimates position information of an object included in content.

The audio signal processing device of the present invention includes: audio signal input means for inputting audio signals of a plurality of channels; a correlation detection unit that detects a correlation component between the channels; and acquisition means for acquiring, based on the correlation component detected by the correlation detection unit, position information of an object included in content corresponding to the audio signals.

For example, when the correlation value between the L channel and the SL channel is high (exceeds a predetermined threshold) and the levels of the L channel and the SL channel are high (exceed a predetermined threshold), it can be estimated that an object exists between the L-channel speaker and the SL-channel speaker. The estimated position of the object can, for example, be displayed on a display unit, or a sound field effect suited to that position can be applied.

Preferably, the device further includes a band dividing unit that divides each of the audio signals of the plurality of channels into predetermined bands, and the correlation detection unit detects a correlation component for each band.

Preferably, the device further includes a level detection unit that detects the level of each divided band, and the acquisition means acquires type information of the object based on the level of each divided band.

According to the present invention, position information of an object included in content can be estimated.

A schematic diagram of a listening environment.
A block diagram of the audio signal processing device according to the first embodiment.
A block diagram showing the functional configuration of the DSP and the CPU.
A block diagram showing the functional configuration of the DSP according to a modification of the first embodiment.
A block diagram showing the functional configuration of the DSP according to a modification of the second embodiment.
A block diagram showing the functional configuration of the analysis unit.
A block diagram showing the functional configuration of the audio signal processing unit 14 according to Modification 1 of the first embodiment (or the second embodiment).
A schematic diagram of the listening environment according to the third embodiment.
A block diagram of the audio signal processing device in the third embodiment.

(First Embodiment)
FIG. 1 is a schematic diagram of a listening environment in the first embodiment, and FIG. 2 is a block diagram of an audio signal processing device 1 in the first embodiment. As an example, this embodiment shows a listening environment in a room that is square in plan view, with the listening position at the center. A plurality of speakers (in this example, five: speaker 21L, speaker 21R, speaker 21C, speaker 21SL, and speaker 21SR) are installed around the listening position. The speaker 21L is installed at the front left of the listening position, the speaker 21R at the front right, the speaker 21C at the front center, the speaker 21SL at the rear left, and the speaker 21SR at the rear right. The speakers 21L, 21R, 21C, 21SL, and 21SR are each connected to the audio signal processing device 1.

The audio signal processing device 1 includes an input unit 11, a decoder 12, a renderer 13, an audio signal processing unit 14, a D/A converter 15, an amplifier (AMP) 16, a CPU 17, a ROM 18, and a RAM 19.

The CPU 17 reads an operation program (firmware) stored in the ROM 18 into the RAM 19 and performs overall control of the audio signal processing device 1.

The input unit 11 has an interface such as HDMI (registered trademark). The input unit 11 receives content data from a player or the like and outputs it to the decoder 12.

The decoder 12 is, for example, a DSP; it decodes the content data and extracts audio signals. In this embodiment, all audio signals are digital audio signals unless otherwise stated.

When the input content data conforms to an object-based format, the decoder 12 extracts object information. In an object-based format, each object (sound source) included in the content is stored as an independent audio signal. Sound image localization is then performed per object by having the downstream renderer 13 distribute the audio signal of each object to the audio signals of the channels. The object information therefore includes the position information of each object, as well as information such as its level.

The renderer 13 is, for example, a DSP, and performs sound image localization processing based on the position information of each object included in the object information. That is, the renderer 13 distributes the audio signal of each object output from the decoder 12 to the audio signals of the channels with predetermined gains, so that the sound image is localized at the position corresponding to the position information of that object. Channel-based audio signals are generated in this way. The generated audio signal of each channel is output to the audio signal processing unit 14.

The audio signal processing unit 14 is, for example, a DSP, and applies a predetermined sound field effect to the input audio signal of each channel according to settings made by the CPU 17.

The sound field effect consists of, for example, pseudo reflected sound generated from the input audio signals. The generated pseudo reflected sound is added to the original audio signals and output.

FIG. 3 is a block diagram showing the functional configuration of the audio signal processing unit 14 and the CPU 17. Functionally, the audio signal processing unit 14 includes an addition processing unit 141, a sound field effect sound generation unit 142, and an addition processing unit 143.

The addition processing unit 141 combines the audio signals of the channels with predetermined gains and mixes them down to a monaural signal. The gain of each channel is set by the control unit 171 of the CPU 17. In general, it is preferable to suppress the sound field effect when the sound source is a voice such as dialogue. The gains of the front and surround channels, which tend to contain mostly music and similar components, are therefore set high, and the gain of the center channel, which tends to contain mostly dialogue and similar components, is set low.
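As a minimal sketch of the mixdown described for the addition processing unit 141, the Python below forms a gain-weighted sum of per-channel sample buffers. The function name `mix_down`, the channel keys, and the specific gain values are illustrative assumptions; the specification states only that front and surround gains are high and the center gain is low.

```python
def mix_down(channels, gains):
    """Gain-weighted sum of equal-length channel buffers into one mono buffer."""
    length = len(next(iter(channels.values())))
    mono = [0.0] * length
    for name, samples in channels.items():
        g = gains.get(name, 0.0)  # a channel without a gain does not contribute
        for i, s in enumerate(samples):
            mono[i] += g * s
    return mono


# Hypothetical gains: front and surround high, center low.
gains = {"L": 1.0, "R": 1.0, "SL": 1.0, "SR": 1.0, "C": 0.25}
channels = {name: [1.0, 0.0] for name in gains}
mono = mix_down(channels, gains)
```

Keeping the gains in a table separate from the summing loop mirrors the division of roles in the text, where the control unit 171 sets the gains and the addition processing unit 141 only applies them.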

The sound field effect sound generation unit 142 is, for example, an FIR filter, and generates pseudo reflected sound by convolving parameters (filter coefficients) representing a predetermined impulse response with the input audio signal. The sound field effect sound generation unit 142 also distributes the generated pseudo reflected sound to the channels. The filter coefficients and the distribution ratios are set by the control unit 171 of the CPU 17.
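The two steps attributed to the sound field effect sound generation unit 142, FIR convolution followed by distribution to channels, can be sketched as follows. The coefficient values, the distribution ratios, and the function names `fir_convolve` and `distribute` are assumptions made for illustration; a real impulse response would be far longer.

```python
def fir_convolve(signal, coeffs):
    """Direct-form FIR convolution; output length is len(signal) + len(coeffs) - 1."""
    out = [0.0] * (len(signal) + len(coeffs) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(coeffs):
            out[n + k] += x * h
    return out


def distribute(effect, ratios):
    """Scale one effect signal into per-channel signals by distribution ratio."""
    return {ch: [r * s for s in effect] for ch, r in ratios.items()}


# Hypothetical impulse response: two reflections, 2 and 4 samples after the direct sound.
coeffs = [0.0, 0.0, 0.5, 0.0, 0.25]
effect = fir_convolve([1.0, 0.0, 0.0], coeffs)
per_channel = distribute(effect, {"L": 0.5, "SL": 0.5})
```

Feeding a unit impulse through the filter returns the coefficients themselves (zero-padded), which is a quick way to check that the convolution is wired correctly.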

Functionally, the CPU 17 includes a control unit 171 and an object information acquisition unit 172. Based on sound field effect information stored in the ROM 18, the control unit 171 sets the filter coefficients, the distribution ratios for the channels, and other parameters in the sound field effect sound generation unit 142.

The sound field effect information includes the impulse response of a group of reflected sounds generated in a certain acoustic space, and information indicating the sound source positions of that group. For example, supplying an audio signal to the speaker 21L and the speaker 21SL with a predetermined delay and a predetermined gain ratio (for example, 1:1) generates pseudo reflected sound on the left side of the listening position. The sound field effect information includes, for example, settings for a presence sound field that renders the sound field at the upper front, and settings for a surround sound field that renders the sound field on the surround side. The sound field effect information to be used may be fixed to a single set in the audio signal processing device 1, or the device may accept the user's designation of a desired acoustic space, such as a movie theater or a concert hall, and select the sound field effect information corresponding to that space.

The sound field effect sound is generated as described above and added to each channel in the addition processing unit 143. The audio signal of each channel is then converted into an analog signal by the D/A converter 15, amplified by the amplifier 16, and output to the corresponding speaker. A sound field imitating a predetermined acoustic space, such as a concert hall, is thereby formed around the listening position.

In the audio signal processing device 1 of this embodiment, the object information acquisition unit 172 acquires the object information extracted by the decoder 12, and the device forms an optimal sound field for each object. The control unit 171 sets the gain of each channel in the addition processing unit 141 based on the position information included in the object information acquired by the object information acquisition unit 172. In this way, the control unit 171 controls the gain of each channel as it enters the sound field effect sound generation unit 142.

For example, assume that an object is in front of the listening position at time t = 1, moves to the vicinity of the listening position at time t = 2, and moves behind the listening position at time t = 3. At time t = 1, the control unit 171 sets the front channel gains of the addition processing unit 141 to the maximum and the surround channel gains to the minimum. At time t = 2, the control unit 171 sets the front channel gains and the surround channel gains of the addition processing unit 141 to about the same level. At time t = 3, the control unit 171 sets the surround channel gains of the addition processing unit 141 to the maximum and the front channel gains to the minimum.
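One way to realize the behavior in the t = 1, 2, 3 example is a crossfade driven by the object's front-to-rear coordinate. The sketch below assumes a normalized coordinate `y` (+1 front, 0 at the listening position, -1 rear), a linear crossfade, and a maximum and minimum gain of 1 and 0; none of these specifics are given in the specification.

```python
def front_surround_gains(y):
    """Return (front_gain, surround_gain) for a front-to-rear coordinate y.

    y = +1.0 is in front of the listening position, 0.0 is at it, -1.0 is behind it.
    """
    y = max(-1.0, min(1.0, y))  # clamp out-of-range positions
    front = (1.0 + y) / 2.0
    return front, 1.0 - front


# The three instants from the example: front, near the listener, rear.
for t, y in [(1, 1.0), (2, 0.0), (3, -1.0)]:
    print(t, front_surround_gains(y))
```

A linear law is the simplest choice; an equal-power law could be substituted without changing the interface.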

In this way, the audio signal processing device 1 can dynamically change the sound field it forms by dynamically changing the gain of each channel in the addition processing unit 141 in response to a moving object. The listener thus obtains a sound field effect with a stronger three-dimensional feel.

For ease of explanation, this embodiment shows an example in which five speakers (21L, 21R, 21C, 21SL, and 21SR) are installed and five-channel audio signals are processed, but the number of speakers and the number of channels are not limited to this example. In practice, it is preferable to install more speakers at different heights in order to realize three-dimensional sound image localization and sound field effects.

In the example above, the audio signals of the channels are combined with gains based on the acquired position information, and pseudo reflected sound is generated by convolving parameters (filter coefficients) representing a predetermined impulse response. Alternatively, the sound field effect may be applied by convolving individual filter coefficients with the audio signal of each channel. In that case, the ROM 18 stores a plurality of sets of filter coefficients corresponding to object positions, and the control unit 171 reads the filter coefficients corresponding to the acquired position information from the ROM 18 and sets them in the sound field effect sound generation unit 142. The control unit 171 may also do both: combine the audio signals of the channels with gains based on the acquired position information, and read the filter coefficients corresponding to the acquired position information from the ROM 18 and set them in the sound field effect sound generation unit 142.

(Second Embodiment)
Next, FIG. 4 is a block diagram showing the configuration of an audio signal processing device 1B according to the second embodiment. Components shared with the audio signal processing device 1 according to the first embodiment shown in FIG. 2 are given the same reference numerals, and their description is omitted. The listening environment of the second embodiment is the same as that of the first embodiment shown in FIG. 1.

The audio signal processing unit 14 in the audio signal processing device 1B has the function of an analysis unit 91 in addition to the functions shown in FIG. 3. In practice, the analysis unit 91 is implemented as separate hardware (a DSP), but for the purposes of explanation the second embodiment treats it as a function of the audio signal processing unit 14. The analysis unit 91 can also be implemented in software running on the CPU 17.

The analysis unit 91 extracts the object information included in the content by analyzing the audio signal of each channel. That is, when the CPU 17 does not (or cannot) acquire object information from the decoder 12, the audio signal processing device 1B of the second embodiment estimates the object information by analyzing the audio signal of each channel.

FIG. 5 is a block diagram showing the functional configuration of the analysis unit 91. The analysis unit 91 includes a band dividing unit 911 and a calculation unit 912. The band dividing unit 911 divides the audio signal of each channel into predetermined frequency bands. This example shows division into three bands: low (LPF), middle (BPF), and high (HPF). The number of bands is not limited to three, however. The band-divided audio signal of each channel is input to the calculation unit 912.
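The specification names the three filters (LPF, BPF, HPF) but does not give their design. As a sketch under that caveat, the following Python builds a three-way split from two one-pole low-pass filters; the filter type, the smoothing constants `alpha_low` and `alpha_high`, and the function names are all assumptions.

```python
def one_pole_lowpass(x, alpha):
    """First-order low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    y, state = [], 0.0
    for s in x:
        state += alpha * (s - state)
        y.append(state)
    return y


def split_three_bands(x, alpha_low=0.05, alpha_high=0.5):
    """Split x into (low, mid, high) bands that sum back to x."""
    lp_low = one_pole_lowpass(x, alpha_low)    # passes only the lowest content
    lp_high = one_pole_lowpass(x, alpha_high)  # passes low and mid content
    low = lp_low
    mid = [b - a for a, b in zip(lp_low, lp_high)]
    high = [s - b for s, b in zip(x, lp_high)]
    return low, mid, high
```

Deriving the middle and high bands by subtraction keeps the split complementary: adding the three bands reproduces the input, so no signal energy is lost or duplicated before the per-band analysis.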

The calculation unit 912 calculates the cross-correlation between channels in each of the divided bands. The calculated cross-correlations are input to the object information acquisition unit 172 of the CPU 17. The calculation unit 912 also functions as a level detection unit that detects the level of the audio signal of each channel. The level information of the audio signal of each channel is likewise input to the object information acquisition unit 172.
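The two quantities produced by the calculation unit 912 can be illustrated as below. The specification does not say how the cross-correlation or the level is computed, so this sketch assumes a zero-lag normalized correlation coefficient and an RMS level over one block of samples; the function names are hypothetical.

```python
import math


def rms(x):
    """Root-mean-square level of one block of samples."""
    return math.sqrt(sum(s * s for s in x) / len(x))


def normalized_correlation(a, b):
    """Zero-lag cross-correlation of a and b, normalized to the range [-1, 1]."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0.0 else 0.0  # silent blocks count as uncorrelated


a = [1.0, -1.0, 1.0, -1.0]
same_source = normalized_correlation(a, [0.5 * s for s in a])  # scaled copy of a
unrelated = normalized_correlation(a, [1.0, 1.0, -1.0, -1.0])  # orthogonal to a
```

Normalizing by the energies makes the correlation value independent of level, which is why the level is reported separately and both are needed for the position estimate.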

The object information acquisition unit 172 estimates the position of an object based on the input correlation values and the level information of the audio signal of each channel.

For example, suppose that the correlation value between the L channel and the SL channel in the low band (Low) is high (exceeds a predetermined threshold), as shown in FIG. 6(A), and that the levels of the L channel and the SL channel in the low band (Low) are high (exceed a predetermined threshold), as shown in FIG. 6(B). In that case, an object is assumed to exist between the speaker 21L and the speaker 21SL, as shown in FIG. 6(C).

In the high band (High), no channels are highly correlated, while in the middle band (Mid) a high-level audio signal is input on the C channel. An object is therefore assumed to exist near the speaker 21C as well, as shown in FIG. 6(C).
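The decision rule in the FIG. 6 example can be sketched per band as follows. The threshold values, the tuple format of the result, and the function name `estimate_objects` are assumptions; the specification says only that correlation and level are each compared with a predetermined threshold.

```python
def estimate_objects(corr, level, corr_th=0.7, level_th=0.5):
    """Estimate object positions in one band.

    corr:  {(ch_a, ch_b): correlation value} for channel pairs
    level: {ch: level} for every channel
    Returns ("between", ch_a, ch_b) or ("near", ch) tuples.
    """
    found, paired = [], set()
    for (a, b), c in corr.items():
        # Correlated pair with both channels loud: object between the two speakers.
        if c > corr_th and level[a] > level_th and level[b] > level_th:
            found.append(("between", a, b))
            paired.update((a, b))
    for ch, lv in level.items():
        # Loud channel not explained by any pair: object near that speaker.
        if lv > level_th and ch not in paired:
            found.append(("near", ch))
    return found


# Hypothetical low-band and mid-band measurements resembling FIG. 6.
low = estimate_objects({("L", "SL"): 0.9, ("L", "R"): 0.1},
                       {"L": 0.8, "SL": 0.8, "R": 0.1, "C": 0.1, "SR": 0.1})
mid = estimate_objects({("L", "SL"): 0.1},
                       {"L": 0.1, "SL": 0.1, "R": 0.1, "C": 0.9, "SR": 0.1})
```

With these inputs the low band yields an object between L and SL, and the middle band yields an object near C, matching the two objects drawn in FIG. 6(C).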

In this case, for the gains set in the addition processing unit 141 of FIG. 3, the control unit 171 sets the L channel gain and the SL channel gain to about the same level (0.5:0.5) and sets the C channel gain to the maximum (1). The gains of the other channels are set to the minimum. A sound field effect sound is thereby generated with the optimal contribution ratio for the position of each object.
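The gain assignment described above can be written as a small mapping from estimated objects to mixdown gains. The object tuple format, the function name `gains_from_objects`, and a minimum gain of 0.0 are assumptions for illustration; the values 0.5 and 1 are taken from the text.

```python
def gains_from_objects(objects, channels=("L", "R", "C", "SL", "SR"), minimum=0.0):
    """Map estimated objects to per-channel mixdown gains.

    objects: ("between", ch_a, ch_b) or ("near", ch) tuples (hypothetical format).
    """
    gains = {ch: minimum for ch in channels}
    for obj in objects:
        if obj[0] == "between":
            for ch in obj[1:]:
                gains[ch] = max(gains[ch], 0.5)  # split equally between the pair
        elif obj[0] == "near":
            gains[obj[1]] = 1.0  # maximum gain for the nearest channel
    return gains


gains = gains_from_objects([("between", "L", "SL"), ("near", "C")])
```

Using `max` when assigning the pair gain keeps a channel at full gain if it was already marked as the nearest channel of another object.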

However, a high-level signal on the C channel may be a voice such as dialogue, so the control unit 171 preferably also refers to information on the type of the object when setting the gains. Information on the type of the object is described later.

At this time, the control unit 171 preferably also reads sound field effect information defined for each band from the ROM 18 and sets individual parameters (filter coefficients) for each band in the sound field effect sound generation unit 142. For example, the reverberation time is set short for the low band and long for the high band.

Note that the more channels there are, the more accurately the position of an object can be estimated. This example shows all speakers placed at the same height and correlation values calculated for five-channel audio signals. In practice, however, more speakers are installed at different heights to realize three-dimensional sound image localization and sound field effects, and correlation values are calculated between more channels, so the position of a sound source can be determined almost uniquely.

This embodiment shows an example in which the audio signal of each channel is divided into bands and the position information of an object is acquired for each band, but acquiring the position information for each band is not essential to the present invention.

(Modification 1)
Next, FIG. 7 is a block diagram showing the functional configuration of the audio signal processing unit 14 according to Modification 1 of the first embodiment (or the second embodiment). The audio signal processing unit 14 according to Modification 1 includes an addition processing unit 141A, a first sound field effect sound generation unit 142A, an addition processing unit 141B, a second sound field effect sound generation unit 142B, and an addition processing unit 143. The addition processing unit 141B and the second sound field effect sound generation unit 142B are each implemented as separate hardware (DSPs) in practice, but for the purposes of explanation this example treats each as a function of the audio signal processing unit 14.

The addition processing unit 141A combines the audio signals of the channels with predetermined gains and mixes them down to a monaural signal. The gain of each channel is fixed. For example, as described above, the gains of the front and surround channels are set high and the gain of the center channel is set low.

The first sound field effect sound generation unit 142A generates pseudo reflected sound by convolving parameters (filter coefficients) representing a predetermined impulse response with the input audio signal. The first sound field effect sound generation unit 142A also distributes the generated pseudo reflected sound to the channels. The filter coefficients and the distribution ratios are set by the control unit 171. As in the example of FIG. 3, the device may accept the user's designation of a desired acoustic space, such as a movie theater or a concert hall, and select the sound field effect information corresponding to that space.

Meanwhile, the control unit 171 sets the gain of each channel in the addition processing unit 141B based on the position information included in the object information acquired by the object information acquisition unit 172. In this way, the control unit 171 controls the gain of each channel as it enters the second sound field effect sound generation unit 142B.

The sound field effect sound generated by the first sound field effect sound generation unit 142A and the sound field effect sound generated by the second sound field effect sound generation unit 142B are each added to the audio signal of each channel in the addition processing unit 143.

The audio signal processing unit 14 according to this modification thus generates a sound field effect sound with fixed channel contribution ratios, as in conventional devices, while also generating a sound field effect sound with the optimal contribution ratio for the position of each object.

(Modification 2)
Next, an audio signal processing device according to Modification 2 of the first embodiment (or the second embodiment) is described. The audio signal processing unit 14 and the CPU 17 according to Modification 2 have the same functional configuration as that shown in FIG. 3 (or that shown in FIG. 7). However, the object information acquisition unit 172 according to Modification 2 acquires, as object information, information indicating the type of the object in addition to the position information.

The information indicating the type of the object indicates the kind of sound source, such as dialogue, a musical instrument, or a sound effect. When this information is included in the content data, the decoder 12 extracts it; it can also be estimated by the calculation unit 912 of the analysis unit 91.

For example, the band dividing unit 911 of the analysis unit 91 extracts the bands of the first formant (200 Hz to 500 Hz) and the second formant (2 kHz to 3 kHz) from the input audio signal. If the input signal contains many dialogue components, or contains only dialogue components, the first-formant and second-formant bands contain more signal components than the other bands.

Therefore, the object information acquisition unit 172 determines that the type of the object is dialogue when the level of the first formant component or the second formant component is higher than the average level over the entire frequency band.
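The determination above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: it assumes the band dividing unit delivers one level per divided band as a (centre frequency, level) pair, and the function names `mean_level` and `is_dialogue` are hypothetical.

```python
# Formant bands quoted in the description (Hz).
FIRST_FORMANT_HZ = (200.0, 500.0)
SECOND_FORMANT_HZ = (2000.0, 3000.0)


def mean_level(band_levels, lo_hz, hi_hz):
    """Average level of the bands whose centre frequency lies in [lo_hz, hi_hz]."""
    selected = [level for freq, level in band_levels if lo_hz <= freq <= hi_hz]
    return sum(selected) / len(selected) if selected else 0.0


def is_dialogue(band_levels):
    """Return True when a formant band is louder than the full-band average.

    band_levels: list of (centre_frequency_hz, level) pairs, one per divided
    band. This input format is an assumption made for the sketch.
    """
    if not band_levels:
        return False
    overall = sum(level for _, level in band_levels) / len(band_levels)
    first = mean_level(band_levels, *FIRST_FORMANT_HZ)
    second = mean_level(band_levels, *SECOND_FORMANT_HZ)
    return first > overall or second > overall


speech_like = [(100, 0.1), (300, 0.9), (1000, 0.2), (2500, 0.8), (8000, 0.1)]
music_like = [(100, 0.9), (300, 0.2), (1000, 0.8), (2500, 0.1), (8000, 0.7)]
print(is_dialogue(speech_like), is_dialogue(music_like))
```

A practical detector would probably require the formant level to exceed the average by some margin, since broadband material can sit close to the average in every band; the description itself states only the plain comparison.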

The control unit 171 sets the gain of the addition processing unit 141 (or the addition processing unit 141B) based on the type of the object. For example, as shown in FIG. 6C, when an object exists on the left side of the listening position and the type of the object is dialogue, the gains of the L channel and the SL channel are set low. Likewise, as shown in FIG. 6C, when an object exists in front of the listening position and the type of the object is dialogue, the gain of the C channel is set low.
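The gain setting described above might look like the following sketch. The coarse position labels, the gain values 1.0 and 0.3, and the names `CHANNELS_NEAR` and `addition_gains` are assumptions for illustration; the description says only that the gains of the channels near a dialogue object are set low.

```python
CHANNELS = ("L", "R", "C", "SL", "SR")

# Channels lowered for each coarse object position named in the description.
CHANNELS_NEAR = {
    "left": ("L", "SL"),
    "front": ("C",),
}


def addition_gains(position, object_type, normal_gain=1.0, reduced_gain=0.3):
    """Return per-channel gains for the addition processing unit.

    normal_gain and reduced_gain are hypothetical values chosen for the sketch.
    """
    gains = {channel: normal_gain for channel in CHANNELS}
    if object_type == "dialogue":
        for channel in CHANNELS_NEAR.get(position, ()):
            gains[channel] = reduced_gain
    return gains


print(addition_gains("left", "dialogue"))
print(addition_gains("front", "dialogue"))
```

Lowering these gains reduces the contribution of the dialogue-carrying channels to the sound field effect sound, so less effect is applied to the dialogue itself.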

(Modification 3)
As Modification 3 of the second embodiment, the audio signal processing device 1B can display the position of an object on a display unit (not shown) using the estimated position information of the object. This allows the user to grasp the movement of a sound source visually. For content such as a movie, something corresponding to the sound source is often already shown on the display unit as video, but that video is a subjective field of view. The audio signal processing device 1B can therefore also display the position of the object as, for example, an overhead view centered on its own position.

(Third embodiment)
FIGS. 8A and 8B are schematic views of a listening environment according to the third embodiment, and FIG. 9 is a block diagram of an audio signal processing device 1C according to the third embodiment. The audio signal processing device 1C has the same hardware configuration as the audio signal processing device 1 shown in FIG. 2, and additionally includes a user interface (I/F) 81.

The user I/F 81 is an interface that accepts user operations and consists of, for example, a switch or touch panel provided on the housing of the audio signal processing device, or a remote controller. Through the user I/F 81, the user designates a desired acoustic space as a listening environment change instruction.

The control unit 171 of the CPU 17 receives the designation of the acoustic space and reads out, from the ROM 18, the sound field effect information corresponding to the designated acoustic space. The control unit 171 then sets filter coefficients, the distribution ratio to each channel, and other parameters based on that sound field effect information in the audio signal processing unit 14.

Further, the control unit 171 converts the position information of the object acquired by the object information acquisition unit 172 into a position corresponding to the read sound field effect information, and outputs the converted position information to the renderer 13, thereby relocating the object.

That is, when the control unit 171 receives designation of the acoustic space of a large concert hall, for example, it relocates each object to a position far from the listening position, so that each object is placed at a position corresponding to the scale of the large concert hall. The renderer 13 performs sound image localization processing based on the position information input from the control unit 171.

For example, suppose that, as shown in FIG. 8A, the object 51R is placed at the front right of the listening position and the object 51L is placed at the front left of the listening position. When the control unit 171 receives designation of the acoustic space of a large concert hall, it relocates the object 51R and the object 51L to positions farther from the listening position, as shown in FIG. 8B. As a result, not only the sound field environment of the selected acoustic space but also the positions of the sound sources corresponding to the direct sound can be brought close to those of the actual acoustic space.
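One simple way to realize such a relocation is to scale each object's offset from the listening position by a factor tied to the selected acoustic space. The sketch below assumes 2D coordinates and a per-space scale table; `SPACE_SCALE`, its values, and `relocate` are hypothetical, since the description does not specify the conversion.

```python
# Hypothetical scale factors per acoustic space (not taken from the description).
SPACE_SCALE = {
    "small room": 1.0,
    "large concert hall": 3.0,
}


def relocate(position, space, listener=(0.0, 0.0)):
    """Move an object along the line from the listener, scaled to the space.

    position and listener are (x, y) pairs in metres; the direction of the
    object as seen from the listener is preserved, only the distance changes.
    """
    scale = SPACE_SCALE[space]
    lx, ly = listener
    x, y = position
    return (lx + scale * (x - lx), ly + scale * (y - ly))


object_51r = (1.0, 2.0)   # front right of the listening position
object_51l = (-1.0, 2.0)  # front left of the listening position
print(relocate(object_51r, "large concert hall"))
print(relocate(object_51l, "large concert hall"))
```

Because only the distance is scaled, the left and right arrangement of objects 51L and 51R is kept while both move away from the listener.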

The control unit 171 also converts the movement of an object into a movement amount corresponding to the scale of the selected acoustic space. For example, in a play, a performer speaks lines while moving about. When the control unit 171 receives designation of the acoustic space of a large hall, for example, it increases the movement amount of the object extracted by the decoder 12 and relocates the object corresponding to the performer. This gives a sense of presence as if the performer were performing in that venue.

The user I/F 81 can also accept designation of a listening position as a listening environment change instruction. For example, after selecting the acoustic space of a large hall, the user further selects a listening position within the hall, such as a position immediately in front of the stage, a second-floor seat (a position looking down on the stage obliquely from above), or a position near the exit far from the stage.

The control unit 171 relocates each object according to the designated listening position. For example, when a position immediately in front of the stage is designated as the listening position, the object is relocated to a position close to the listening position, and when a position far from the stage is designated, the object is relocated to a position far from the listening position. Further, when a second-floor seat is designated as the listening position (a position looking down on the stage obliquely from above), the object is relocated to an oblique position as seen from the listener.

When designation of a listening position is accepted, it is preferable to measure the actual sound field (the arrival timing and direction of indirect sounds) at each position and store it in the ROM 18 as sound field effect information. The control unit 171 reads out the sound field effect information corresponding to the designated listening position from the ROM 18. This makes it possible to reproduce, for example, the sound field at a position immediately in front of the stage or the sound field at a position far from the stage.

Note that the sound field effect information need not be measured at every position in the actual acoustic space. For example, the direct sound is louder at a position immediately in front of the stage, and the indirect sound is louder at a position far from the stage. Therefore, when a listening position at the center of the hall is selected, for example, the sound field effect information corresponding to that position can be interpolated by averaging the sound field effect information measured immediately in front of the stage and the sound field effect information measured at a position far from the stage.
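The interpolation by averaging can be sketched as below. The description does not say how sound field effect information is represented, so this example assumes, purely for illustration, that each measurement is a list of numeric parameters of equal length (for example filter coefficients); `interpolate_effect` and the `weight` parameter are hypothetical. A weight of 0.5 gives the plain average mentioned in the text.

```python
def interpolate_effect(front, rear, weight=0.5):
    """Blend two sets of sound field effect parameters element by element.

    front:  parameters measured immediately in front of the stage
    rear:   parameters measured far from the stage
    weight: 0.0 returns front, 1.0 returns rear, 0.5 is the plain average
    """
    if len(front) != len(rear):
        raise ValueError("both parameter sets must have the same length")
    return [(1.0 - weight) * f + weight * r for f, r in zip(front, rear)]


front_of_stage = [1.0, 0.5, 0.0]  # illustrative values only
far_from_stage = [0.0, 0.5, 1.0]  # illustrative values only
print(interpolate_effect(front_of_stage, far_from_stage))
```

Exposing the weight allows listening positions other than the exact center to be approximated from the same two measurements, although the description itself mentions only averaging.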

(Application example)
The audio signal processing device 1B according to an application example acquires information on the direction in which the user is facing, using a gyro sensor or the like provided in a terminal worn by the user. The control unit 171 relocates each object according to the direction in which the user is facing.

For example, when the listener is facing right, the control unit 171 relocates the object to a position on the left side as seen from the listener.
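This relocation amounts to rotating each object position into the listener's frame of reference. The following sketch assumes 2D coordinates with x pointing to the listener's right and y pointing forward, and a yaw angle that is positive when the listener turns to the right; the function name `relocate_for_heading` and these conventions are assumptions, not part of the description.

```python
import math


def relocate_for_heading(position, yaw_degrees):
    """Return the object position as seen by a listener who has turned.

    position:    (x, y) with x to the right and y forward, before turning
    yaw_degrees: listener rotation, positive when turning to the right
    """
    theta = math.radians(yaw_degrees)
    x, y = position
    # Turning right by theta makes the scene appear rotated left by theta.
    new_x = x * math.cos(theta) - y * math.sin(theta)
    new_y = x * math.sin(theta) + y * math.cos(theta)
    return (new_x, new_y)


# An object straight ahead ends up on the listener's left after a 90 degree
# turn to the right (x becomes negative, y becomes approximately zero).
print(relocate_for_heading((0.0, 1.0), 90.0))
```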

Further, the ROM 18 of the audio signal processing device 1B according to the application example stores sound field effect information for each direction. The control unit 171 reads out the sound field effect information from the ROM 18 according to the direction in which the listener is facing and sets it in the audio signal processing unit 14. As a result, the user can obtain a sense of reality, as if actually present at that place.

DESCRIPTION OF SYMBOLS
1, 1B, 1C ... Audio signal processing device
11 ... Input unit
12 ... Decoder
13 ... Renderer
14 ... Audio signal processing unit
15 ... D/A converter
17 ... CPU
18 ... ROM
19 ... RAM
21C, 21L, 21R, 21SL, 21SR ... Speakers
51L, 51R ... Objects
91 ... Analysis unit
141, 141A, 141B ... Addition processing unit
142 ... Sound field effect sound generation unit
142A ... First sound field effect sound generation unit
142B ... Second sound field effect sound generation unit
143 ... Addition processing unit
171 ... Control unit
172 ... Object information acquisition unit
911 ... Band dividing unit
912 ... Correlation calculation unit

Claims

An audio signal processing apparatus comprising:
audio signal input means for inputting audio signals of a plurality of channels;
a correlation detection unit that detects a correlation component between channels; and
acquisition means for acquiring position information of an object included in content corresponding to the audio signals, based on the correlation component detected by the correlation detection unit.

The audio signal processing apparatus according to claim 1, further comprising a band dividing unit that divides the audio signals of the plurality of channels into predetermined bands,
wherein the correlation detection unit detects the correlation component for each band.

The audio signal processing apparatus according to claim 2, further comprising a level detection unit that detects the level of each divided band,
wherein the acquisition means acquires type information of the object based on the level of each divided band.