JP4951985B2

JP4951985B2 - Audio signal processing apparatus, audio signal processing system, program

Info

Publication number: JP4951985B2
Application number: JP2006020653A
Authority: JP
Inventors: 裕司山田; 越沖本
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2006-01-30
Filing date: 2006-01-30
Publication date: 2012-06-13
Anticipated expiration: 2026-01-30
Also published as: JP2007202021A

Description

The present invention relates to an audio signal processing apparatus that performs signal processing on audio signals. It also relates to a program executed by an information processing apparatus to provide the functions of such an audio signal processing apparatus.

Sound reproduction with so-called multi-channel configurations, such as 5.1ch surround and 7.1ch surround, is well known and has become widespread.
On the other hand, reproduction systems established before multi-channel systems, typified by two-channel stereo with Lch and Rch, are still widely used. It is therefore unavoidable that a multi-channel audio source must sometimes be played back on a system with fewer channels, such as two-channel stereo.

However, a multi-channel audio source consists of per-channel audio signals created so that the intended acoustic effect is obtained when all the channels are reproduced together. A 5.1ch surround source, for example, consists of six audio signals, one each for the L (left), C (center), R (right), LS (left surround), RS (right surround), and SW (subwoofer) channels. Consequently, if a multi-channel source is played back on L and R stereo channels by simply outputting its Lch and Rch audio, the sound elements that should be reproduced by the remaining Cch, LSch, and RSch are lost entirely, and some sounds cannot be heard at all.
An encoding technique is therefore known that appropriately distributes the audio signals of the channels forming the multi-channel configuration and converts them into an audio source with a two-channel stereo configuration of, for example, Lch and Rch. When a two-channel stereo source encoded in this way is reproduced, the sound field is that of ordinary two-channel stereo, in which sound images are localized only in the left-right direction; however, because the components of all the channels are included, no sound is lost.

The following encoding technique is known. In this description, the multi-channel configuration to be encoded is assumed to have four channels, Lch, Cch, Rch, and S (surround) ch, and encoding converts it into two-channel stereo of Lch and Rch.
Let the audio signals of the Lch, Cch, Rch, and Sch channels be Sl, Sc, Sr, and Ss, respectively, and let the Lch and Rch signals of the two channels after encoding be S1 and S2. The encoding process obtains S1 and S2 from Sl, Sc, Sr, and Ss by the operations of equations (1) and (2), in which L, C, R, and S stand for Sl, Sc, Sr, and Ss:
S1 = L + 0.7C + 0.7S ... (1)
S2 = R + 0.7C - 0.7S ... (2)
Thus S1 is obtained by adding the Cch and Sch signals, each multiplied by a predetermined coefficient (0.7), to the Lch signal. S2 is obtained by adding the Cch signal multiplied by 0.7 to the Rch signal and subtracting the Sch signal multiplied by 0.7. When an audio source made of the signals S1 and S2 is reproduced by a two-channel stereo playback system, the sound image localization is that of ordinary Lch/Rch two-channel stereo, but all the sound of the original audio source is heard, with nothing lost.
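Equations (1) and (2) amount to a fixed per-sample mix. A minimal sketch, assuming the four channel signals are sample-aligned lists of floats; the function name `matrix_encode` is illustrative and the default coefficient `k=0.7` is the value from the text.

```python
def matrix_encode(l, c, r, s, k=0.7):
    """Return (S1, S2) per equations (1) and (2), computed sample by sample."""
    s1 = [li + k * ci + k * si for li, ci, si in zip(l, c, s)]  # S1 = L + 0.7C + 0.7S
    s2 = [ri + k * ci - k * si for ri, ci, si in zip(r, c, s)]  # S2 = R + 0.7C - 0.7S
    return s1, s2
```

A center-only input appears in phase in both outputs, while a surround-only input appears with opposite polarity in S1 and S2; the decoding matrix relies on exactly this difference.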

As a counterpart to the encoding technique above, there is also a decoding technique that converts an encoded audio source, such as two-channel stereo, back into the original multi-channel audio source. This decoding technique is described with reference to FIG. 22.

In FIG. 22, the signals S1 and S2 obtained by the encoding process described above are input as the signals to be decoded.
The signals S1 and S2 are input directly to the directionality enhancement circuits 501 and 504, respectively. In addition, S1 and S2 are added by the adder 511, and the sum is input to the directionality enhancement circuit 502 as the signal S3. S2 is also subtracted from S1 by the adder 512, and the difference is input to the directionality enhancement circuit 503 as the signal S4. In other words, the part that receives S1 and S2 and generates S3 and S4 is configured as a matrix circuit.
From the operation of this matrix circuit, the signals S3 and S4 are expressed by equations (3) and (4):
S3 = 1.4C + L + R ... (3)
S4 = 1.4S + L - R ... (4)
The signals S1 and S2 in FIG. 22 are expressed by equations (1) and (2) above.
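The matrix part of FIG. 22 (adders 511 and 512, before any directionality enhancement) can be sketched as a sum and a difference. The function name `matrix_decode` is illustrative, and the inputs are assumed to be sample-aligned lists of floats.

```python
def matrix_decode(s1, s2):
    """Passive matrix of FIG. 22: returns (S1, S2, S3, S4) before directionality enhancement."""
    s3 = [a + b for a, b in zip(s1, s2)]  # adder 511: S3 = S1 + S2
    s4 = [a - b for a, b in zip(s1, s2)]  # adder 512 used as a subtractor: S4 = S1 - S2
    return list(s1), list(s2), s3, s4
```

Feeding an Lch-only signal (S1 = 1, S2 = 0) yields S3 = S4 = 1, so the L component leaks into the Cch and Sch outputs at full level, as equations (3) and (4) predict; this is why the enhancement circuits follow the matrix.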

The signals S1, S2, S3, and S4 obtained in this way have the following characteristics. In S1, the decoded Lch signal component is 3 dB higher than the signal components of the other channels. In S2, the decoded Rch signal component is 3 dB higher than those of the other channels. In S3, the decoded Cch signal component is 3 dB higher than those of the other channels, and in S4, the decoded Sch signal component is 3 dB higher than those of the other channels. That is, within each of S1, S2, S3, and S4, the component of one specific channel is higher than the components of the other channels, which makes these signals suitable as the Lch, Rch, Cch, and Sch signals, respectively.
However, the signals S1, S2, S3, and S4 as produced by the matrix circuit do not separate the sound images sufficiently. The directionality enhancement circuits 501, 502, 503, and 504 are therefore provided; S1, S3, S4, and S2 pass through them, respectively, to obtain the actual reproduction signals for the Lch, Cch, Sch, and Rch channels. Each directionality enhancement circuit changes its output level according to the level differences among S1, S2, S3, and S4. For example, when the level of the Lch signal S1 becomes higher than those of the other signals S2, S3, and S4, the level of S1 is dynamically boosted accordingly, so that the Lch sound stands out from the sound of the other channels. This operation improves the separation of sound images among the four channels.
The encoding and decoding techniques described above are used in, for example, Dolby Pro Logic.
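The "3 dB" figures can be checked from the coefficients in equations (1) to (4). A small sketch, assuming each channel is an independent source at equal level and ignoring polarity; the dictionary layout and the name `separation_db` are illustrative.

```python
import math

# Signed coefficients of each source channel in the matrix outputs, from equations (1) to (4).
MATRIX = {
    "S1": {"L": 1.0, "C": 0.7, "S": 0.7},
    "S2": {"R": 1.0, "C": 0.7, "S": -0.7},
    "S3": {"C": 1.4, "L": 1.0, "R": 1.0},
    "S4": {"S": 1.4, "L": 1.0, "R": -1.0},
}


def separation_db(coeffs, target):
    """Level of the target component over the strongest other component, in dB (polarity ignored)."""
    strongest_other = max(abs(g) for ch, g in coeffs.items() if ch != target)
    return 20.0 * math.log10(abs(coeffs[target]) / strongest_other)


for name, target in (("S1", "L"), ("S2", "R"), ("S3", "C"), ("S4", "S")):
    print(name, target, round(separation_db(MATRIX[name], target), 2))
# S1 L 3.1
# S2 R 3.1
# S3 C 2.92
# S4 S 2.92
```

Both ratios round to about 3 dB, which is the figure in the text and also the separation that remains when no level difference triggers the enhancement circuits.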

JP 2003-274493 A

However, the encoding and decoding techniques described above are not perfect in the following respects, and there is room for improvement.
In the decoding process, for example, as described with reference to FIG. 22, directionality enhancement is applied to the per-channel audio signals (S1, S2, S3, S4) restored by the matrix circuit. This processing boosts whichever channel sound is louder than the other channel sounds. Although this makes the separation of sound images between channels clearer, the output level of each channel fluctuates dynamically, so listeners tend to perceive unnatural changes in volume. Moreover, when the audio signals of all the channels are at roughly the same level, no level-difference enhancement takes place, and the volume separation between channels remains at only about 3 dB, so the sound images are poorly separated. Depending on the audio content, the localization can also shift unintentionally between adjacent speakers, with the sound output from one speaker appearing to be pulled toward the other. In short, with the technique of FIG. 22 there is room to improve the quality of the sound obtained when an encoded audio signal is decoded and reproduced on multiple channels.

In view of the above problems, the present invention configures an audio signal processing apparatus as follows.
The audio signal processing apparatus of the present invention receives the audio signals of encode channels. These are generated by giving each audio signal component corresponding to a decode channel of a predetermined channel configuration a transfer characteristic represented by a spatial transfer function determined from the position of the sound source of that decode channel, and then distributing these components according to the channel configuration of the encode channels. The apparatus includes, for each decode channel, audio signal generation means that generates the audio signal component corresponding to that one decode channel.
Each audio signal generation means comprises: correction means that corrects, in each input encode-channel audio signal, the transfer characteristic given to the audio signal component of the decode channel handled by that generation means; similarity detection means that detects a predetermined similarity between the signals corrected by the correction means; separation means that, based on the detection result of the similarity detection means, separates and outputs the mutually similar signal components from the per-encode-channel signals output by the correction means; and channel audio signal output means that adds the signal components separated by the separation means and outputs the sum as the audio signal of the corresponding decode channel.

The audio signal processing system of the present invention is configured as follows.
The system comprises an encoding device that converts a set of audio signals of original channels forming a predetermined channel configuration into a set of audio signals of encode channels forming a different predetermined channel configuration and outputs them, and a decoding device that receives a set of audio signals of encode channels forming a predetermined channel configuration and converts it into a set of audio signals of decode channels forming a predetermined channel configuration.
The encoding device comprises transfer characteristic applying means, provided for each original channel, one for each encode channel, that apply to the input audio signal a transfer characteristic represented by a spatial transfer function set based on the position of the sound source of the corresponding original channel; and adding means, provided for each encode channel, that receive and add the signals processed by the transfer characteristic applying means and output the sum as the audio signal of the corresponding encode channel.
The decoding device has, for each decode channel, audio signal separation means that separates the audio signal component corresponding to that one decode channel. Each of these means comprises: correction means that corrects, in each input encode-channel audio signal, the transfer characteristic given to the audio signal component of the corresponding decode channel; similarity detection means that detects a predetermined similarity among the per-encode-channel signals after correction; separation means that, based on the detection result of the similarity detection means, separates and outputs the mutually similar signal components from the per-encode-channel signals output by the correction means; and channel audio signal output means that adds the signal components separated by the separation means and outputs the sum as the audio signal of the corresponding decode channel.
Here, a channel configuration means the configuration determined by the number of audio channels forming one acoustic system, the positional relationship among the sound sources corresponding to those audio channels, and the like.

With each of the above configurations, the encoded audio source is obtained by converting a source with a predetermined channel configuration into another channel configuration. In the process, the audio signal of each encoded channel is given transfer characteristics corresponding to the spatial transfer functions appropriate to the sound source positions of the channels in the pre-encoding configuration. By reproducing an audio source encoded in this way on a reproduction system matching the encode channel configuration, it is possible to achieve sound image localization equivalent to that obtained by reproducing the audio source before encoding.
The audio signal processing apparatus (decoding device) of the present invention receives audio signals with the encode channel configuration described above and converts them into an audio source made of a group of audio signals with the same channel configuration as before encoding, or with another channel configuration. To do so, it separates the audio signal component of each decode channel from the input encoded audio signal of each channel and outputs it.
To separate the audio signal component corresponding to one decode channel, predetermined elements (for example, phase, level, and propagation time difference) of that decode channel's component, which were changed when the transfer characteristics for encoding were applied, are corrected in each encode-channel audio signal. Signal components whose elements are similar across the encode-channel audio signals are then separated, and the separated components are output as the audio signal of that decode channel. This separation is performed for each decode channel. The output for each decode channel can then be regarded as consisting only of the audio signal components that the channel should reproduce, without the components of other channels.
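The correct, compare, separate, and add steps can be illustrated with a toy frequency-domain sketch. This is not the patent's algorithm: this section names phase, level, and propagation time difference as the compared elements but does not fix a detection criterion. The sketch assumes a single FFT frame with circular convolution, exactly known encoder impulse responses whose spectra have no zeros, a hypothetical relative-difference threshold `tol` standing in for the "predetermined similarity", and a 0.5 factor on the final sum to keep unity gain. The name `separate_channel` is hypothetical.

```python
import numpy as np


def separate_channel(enc_l, enc_r, h_l, h_r, tol=0.1):
    """Toy per-bin separation of one decode channel from two encode channels.

    enc_l, enc_r: time-domain encode-channel signals of equal length n.
    h_l, h_r: impulse responses the encoder applied to this decode channel on its
        way to the L and R encode channels (assumed known, length <= n).
    tol: hypothetical threshold on the relative difference of corrected bins.
    """
    n = len(enc_l)
    # Correction: undo this channel's transfer characteristics (circular model).
    yl = np.fft.rfft(enc_l) / np.fft.rfft(h_l, n)
    yr = np.fft.rfft(enc_r) / np.fft.rfft(h_r, n)
    # Similarity detection: bins where both corrected spectra agree in level and phase.
    scale = np.maximum(np.abs(yl), np.abs(yr)) + 1e-12  # floor avoids division by zero
    similar = np.abs(yl - yr) / scale < tol
    # Separation: keep only the agreeing bins of each corrected signal.
    kept_l = np.where(similar, yl, 0.0)
    kept_r = np.where(similar, yr, 0.0)
    # Output: add the separated components (halved here, an assumption, for unity gain).
    return np.fft.irfft(0.5 * (kept_l + kept_r), n)
```

In the test case, two tones at different FFT bins reach the encode channels through different gains and delays; each call recovers its own tone because only that channel's bins agree after correction. Real program material overlaps in time and frequency, so a practical decoder would need framing and a softer decision rule, which this section does not specify.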

Therefore, when the decoded audio source is reproduced by a reproduction system matching the decode channel configuration, the audio signal processing apparatus of the present invention can reproduce proper sound image localization without processing such as directionality enhancement, which leads, for example, to improved quality of the reproduced sound.

The best mode for carrying out the present invention (hereinafter, the embodiment) is described below.
FIGS. 1 and 2 show the input and output channel configurations of the encoding apparatus and the decoding apparatus of the present embodiment, respectively.
As shown in FIG. 1, the encoding apparatus 1 of the present embodiment receives an audio source consisting of a set of five audio signals, for L (left) ch, C (center) ch, R (right) ch, LS (left surround) ch, and RS (right surround) ch, which is one of the so-called multi-channel configurations. It converts this source into an audio source consisting of a set of audio signals with the channel configuration of Lch/Rch two-channel stereo and outputs it. The Lch, Cch, Rch, LSch, and RSch configuration can be regarded as the 5.1ch surround configuration with the subwoofer channel omitted.
As shown in FIG. 2, the decoding apparatus 2 of the present embodiment receives a set of audio signals of an audio source with a channel configuration corresponding to two-channel stereo. These input audio signals belong to an audio source encoded by the encoding apparatus 1. By decoding them, the decoding apparatus 2 outputs a set of audio signals with the same five-channel configuration as before encoding by the encoding apparatus 1.

FIG. 3 shows a model of the channel configuration of the audio source to be encoded by the encoding apparatus 1 of FIG. 1.
In this model, speakers SP-L, SP-C, SP-R, SP-LS, and SP-RS serve as the sound sources for Lch, Cch, Rch, LSch, and RSch, respectively, and a listener M hears the sound that these speakers output and that reaches the left and right ears.
In this channel configuration, as illustrated, the speaker SP-L is normally placed at the front left of the listener M, SP-C at the front center, SP-R at the front right, SP-LS at the rear left, and SP-RS at the rear right. For the placement of such multi-channel speakers, ITU-R and others recommend ideal placement angles, heights, and so on.

The sound paths from each speaker to the left and right ears of the listener M in the channel configuration of FIG. 3 are represented by the following transfer functions (spatial transfer functions):
Hll: transfer function of the path from speaker SP-L to the left ear
Hlr: transfer function of the path from speaker SP-L to the right ear
Hcl: transfer function of the path from speaker SP-C to the left ear
Hcr: transfer function of the path from speaker SP-C to the right ear
Hrl: transfer function of the path from speaker SP-R to the left ear
Hrr: transfer function of the path from speaker SP-R to the right ear
Hlsl: transfer function of the path from speaker SP-LS to the left ear
Hlsr: transfer function of the path from speaker SP-LS to the right ear
Hrsl: transfer function of the path from speaker SP-RS to the left ear
Hrsr: transfer function of the path from speaker SP-RS to the right ear

When the target positions of the sound emitted from a speaker (sound source) are the listener's left and right ears, the spatial transfer functions of the paths from the sound source to those ears are, in particular, treated as head-related transfer functions.

FIG. 4 shows an example of the internal configuration of the encoding apparatus 1 shown in FIG. 1.
As in FIG. 1, the encoding apparatus 1 receives an audio signal for each of the channels (original channels) forming the Lch, Cch, Rch, LSch, and RSch channel configuration.
Consider first the input audio signal for Lch. The input audio signal of the original channel Lch (original channel (L)) is split and input to the filters 11a and 11b. The filter 11a processes the input audio signal of the original channel (L) so as to give it the transfer characteristic represented by the transfer function Hll. This can be done, for example, by converting Hll into an impulse response on the time axis and performing filtering that convolves this impulse response with the input audio signal of the original channel (L). Likewise, the filter 11b uses the same kind of filtering to give the input audio signal of the original channel (L) the transfer characteristic represented by the transfer function Hlr.
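The step "convert Hll into an impulse response and convolve" can be sketched with NumPy. This assumes the transfer function is available as complex samples on the rfft bins of an `n_fft`-point FFT (real HRTF data sets may come in other forms); the name `filter_with_transfer_function` is illustrative.

```python
import numpy as np


def filter_with_transfer_function(x, h_bins, n_fft):
    """Give signal x the transfer characteristic H sampled on rfft bins.

    h_bins: complex samples of H at the n_fft // 2 + 1 rfft frequencies (assumed format).
    Returns the first len(x) samples of x convolved with the impulse response.
    """
    h = np.fft.irfft(h_bins, n_fft)  # transfer function -> impulse response on the time axis
    return np.convolve(x, h)[: len(x)]  # linear convolution, as filter 11a performs
```

`n_fft` should be at least as long as the true impulse response; otherwise the inverse FFT wraps its tail around (time aliasing).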

The input audio signals of the remaining original channels (C), (R), (LS), and (RS) are likewise processed to give them the transfer characteristics of the corresponding transfer functions.
For the input audio signal of the original channel (C), the filter 12a gives the transfer characteristic represented by the transfer function Hcl, and the filter 12b gives the transfer characteristic represented by the transfer function Hcr.
For the input audio signal of the original channel (R), the filter 13a gives the transfer characteristic represented by Hrl, and the filter 13b gives the transfer characteristic represented by Hrr.
For the input audio signal of the original channel (LS), the filter 14a gives the transfer characteristic represented by Hlsl, and the filter 14b gives the transfer characteristic represented by Hlsr.
For the input audio signal of the original channel (RS), the filter 15a gives the transfer characteristic represented by Hrsl, and the filter 15b gives the transfer characteristic represented by Hrsr.

Each of the filters 11a, 11b to 15a, 15b can be implemented as an FIR (Finite Impulse Response) digital filter of a predetermined order, as shown in FIG. 5. The FIR filter is formed, for example, by connecting delay units 21(1) to 21(N), multipliers 22(1) to 22(n), and adders 23(1) to 23(M), in numbers corresponding to the order (N) to be implemented, as shown in the figure. Each of the delay units 21(1) to 21(N) delays the signal by one sample, and the multipliers 22(1) to 22(n) are set with coefficients corresponding to the impulse response to be convolved. With this configuration, the digital audio signal input from the input terminal 20 is convolved with the impulse response and output from the output terminal 24; that is, it is converted into an audio signal with the transfer characteristic corresponding to the impulse response.
The impulse responses convolved by the filters 11a, 11b to 15a, 15b, or the transfer functions on which they are based, may be obtained by actual measurement in a prepared environment, or by calculation assuming a given environment. The positions of the original-channel sound sources (speakers), whether set physically or virtually, may follow the ITU-R recommendation mentioned above, or positions other than those recommended by ITU-R may be used.
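The direct-form structure of FIG. 5 (a chain of one-sample delays, one multiplier per tap, and adders summing the products) can be sketched in plain Python. In this sketch a filter with K coefficients uses K - 1 one-sample delays; the exact element counts in FIG. 5 are not reproduced, and the name `fir_filter` is illustrative.

```python
def fir_filter(x, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * x[n - k]."""
    # delay_line[k] holds x[n - k]; index 0 is the current, undelayed input sample.
    delay_line = [0.0] * len(coeffs)
    out = []
    for sample in x:
        delay_line = [sample] + delay_line[:-1]  # every delay stage shifts by one sample
        # Multipliers weight each tap by its impulse-response coefficient; adders sum them.
        out.append(sum(c * d for c, d in zip(coeffs, delay_line)))
    return out
```

Feeding a unit impulse returns the coefficients themselves, which is the sense in which the multipliers "hold" the impulse response to be convolved.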

Returning to FIG. 4.
The filters 11a, 11b to 15a, 15b output signals to which the appropriate transfer characteristics have been applied. The signals from the filters 11a, 12a, 13a, 14a, and 15a are added by the adder 16a and output as the L-channel signal of the encoded stereo (two-channel) signal.
Likewise, the signals from the filters 11b, 12b, 13b, 14b, and 15b are added by the adder 16b and output as the R-channel signal of the encoded stereo signal.
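The encoder of FIG. 4 can be sketched as follows. Each original channel is convolved with a left-ear response (an "a" filter) and a right-ear response (a "b" filter). The "a" outputs are then summed as by adder 16a, and the "b" outputs as by adder 16b. The function names and the toy impulse responses are hypothetical. Real responses would be measured or calculated as described above.

```python
def convolve(x, h):
    """Full linear convolution of two sample lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


def encode_to_stereo(channels, responses):
    """Sum the ear-filtered original channels into two encoded channels.

    channels: dict mapping a channel name ("L", "C", "R", "LS", "RS") to
    its samples. responses: dict mapping the same names to a pair
    (h_left_ear, h_right_ear), standing in for filters 11a/11b ... 15a/15b.
    Returns (enc_l, enc_r), i.e. the outputs of adders 16a and 16b.
    """
    outs_l = [convolve(x, responses[name][0]) for name, x in channels.items()]
    outs_r = [convolve(x, responses[name][1]) for name, x in channels.items()]
    length = max(len(s) for s in outs_l + outs_r)

    def mix(outs):
        # Adder: sum all filter outputs sample by sample (zero past their end).
        return [sum((s[n] for s in outs if n < len(s)), 0.0) for n in range(length)]

    return mix(outs_l), mix(outs_r)


# Toy example with two original channels and hypothetical responses.
enc_l, enc_r = encode_to_stereo(
    {"L": [1.0], "C": [1.0]},
    {"L": ([1.0], [0.0, 0.5]), "C": ([0.5], [0.5])},
)
```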

Here, the encoded L-channel signal is the sum (combination) of the audio signals of the original channels (L), (C), (R), (LS), and (RS). Each of these is given the transfer characteristic of its path to the left ear of the listener M in FIG. 3. Similarly, the encoded R-channel signal is the sum of the same original-channel audio signals, each given the transfer characteristic of its path to the right ear of the listener M.
Suppose, for example, that a two-channel audio source encoded in this way is played back by an ordinary two-channel stereo playback apparatus and the reproduced sound is heard through headphones.
The sounds then heard by the left and right ears of the listener carry the transfer characteristics of the paths from the speakers SP-L, SP-C, SP-R, SP-LS, and SP-RS in FIG. 3 to the left and right ears of the listener M, respectively. The sound actually perceived by the listener is therefore not localized inside the head, as it is with ordinary two-channel stereo. Instead, as in FIG. 3, the listener perceives each original channel's sound as coming from the position where the corresponding speaker SP-L, SP-C, SP-R, SP-LS, or SP-RS is virtually located relative to the listener M.
To make the correspondence with FIG. 3 easy to follow, the description above covers the case where the audio source encoded by the encoding apparatus 1 of the present embodiment is reproduced through headphones. A virtual sound source localization similar to that of FIG. 3 can also be achieved when the sound is reproduced from two loudspeakers corresponding to the L and R channels of a two-channel stereo system. In this case, the impulse responses (transfer functions) to be convolved in the filters 11a, 11b to 15a, 15b of FIG. 4 should account for two sets of paths: the paths from each original-channel speaker shown in FIG. 3, and the paths from the two L and R loudspeakers to both ears of the listener.
For example, consider an audio source encoded into a two-channel stereo configuration by the encoding technique described as a conventional example. When it is played back by an ordinary two-channel stereo playback apparatus, the result is ordinary two-channel stereo sound image localization. In contrast, the encoding apparatus 1 of the present embodiment provides, as described above, virtual sound image localization based on the original channels before encoding. This increases the added value of, for example, content that includes the encoded audio source.

Next, an example of the internal configuration of the decoding device 2 of the present embodiment will be described with reference to FIG. 6.
As shown in this figure, the decoding device 2 receives, for example, the Lch and Rch audio signals of the two-channel stereo signal encoded by the encoding apparatus 1. Here, the L and R channels of the encoded channel configuration input to the decoding device 2 are also referred to as the encode channel (L) and the encode channel (R).

The decoding device 2 shown in this figure comprises fast Fourier transform units (FFT units) 31a and 31b, channel signal separation blocks 32-L, 32-C, 32-R, 32-LS, and 32-RS, and inverse fast Fourier transform units (IFFT units) 33-L, 33-C, 33-R, 33-LS, and 33-RS.
Of the input audio signals of the encode channels (L) and (R), the input audio signal of the encode channel (L) is input to the fast Fourier transform unit 31a. The fast Fourier transform unit 31a performs fast Fourier transform processing to convert this signal into a frequency-domain signal Sgl. The signal Sgl is branched and input to the correction processing unit 41a provided in each of the channel signal separation blocks 32-L, 32-C, 32-R, 32-LS, and 32-RS.
The input audio signal of the other encode channel (R) is input to the fast Fourier transform unit 31b. The fast Fourier transform unit 31b likewise performs fast Fourier transform processing to convert this signal into a frequency-domain signal Sgr. The signal Sgr is input to the correction processing unit 41b provided in each of the channel signal separation blocks 32-L, 32-C, 32-R, 32-LS, and 32-RS.
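The FFT units 31a and 31b map each frame of an encoded channel to complex frequency bins (Sgl, Sgr), and the IFFT units 33-L to 33-RS map the bins back to time samples. The sketch below uses a naive O(N²) DFT as a stand-in for the FFT. The frame length, windowing, and overlap are not specified in the source and are omitted here.

```python
import cmath


def dft(x):
    """Naive DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n_len = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_len)
                for n in range(n_len))
            for k in range(n_len)]


def idft(spec):
    """Inverse DFT; returns the real part, assuming a real time signal."""
    n_len = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * cmath.pi * k * n / n_len)
                 for k in range(n_len)) / n_len).real
            for n in range(n_len)]


# Round trip of one frame: time -> bins (like Sgl) -> time.
frame = [1.0, 2.0, 0.5, -1.0]
back = idft(dft(frame))
```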

As will be understood from the following description, the five channel signal separation blocks 32-L, 32-C, 32-R, 32-LS, and 32-RS correspond to the five channels of the decoded channel configuration (the decode channels Lch, Cch, Rch, LSch, and RSch). As illustrated, each block includes correction processing units 41a and 41b and a separation processing unit 42.

FIGS. 7 and 8 show configuration examples of the channel signal separation block 32 that show the separation processing unit 42 in more detail. In FIGS. 7 and 8, the channel signal separation block 32-L is taken as an example of the five channel signal separation blocks.
FIG. 7 shows the internal configuration of the channel signal separation block 32-L based on the concept of the signal processing it performs.
The fast Fourier transform unit 31a converts the input audio signal of the encode channel (L) into the frequency-domain signal Sgl, which is input to the correction processing unit 41a of the channel signal separation block 32-L. The correction processing unit 41a filters this signal with a characteristic that is the inverse of the filter characteristic of the impulse response convolution corresponding to the transfer function Hll.
This inverse characteristic is denoted here by the reciprocal of the transfer function, [1/Hll].
For example, suppose the frequency response of an audio signal component having the transfer characteristic Hll is as shown in FIG. 9(a). The frequency response of the inverse characteristic [1/Hll] is then the inversion of that characteristic, as shown in FIG. 9(b). On a decibel scale, the gain at each frequency simply changes sign.
The transfer function Hll is that of the path from the speaker SP-L of the original channel (L) shown in FIG. 3 to the left ear of the listener M. It corresponds to the filter characteristic set in the filter 11a of the encoding apparatus 1 shown in FIG. 4. The correction processing unit 41a of FIG. 7 applies the inverse filter [1/Hll]. The signal Sgl contains the output components of the filters 11a to 15a, and this inverse filter cancels the transfer characteristic that Hll gave to the output component of the filter 11a. As a result, the filter 11a component in Sgl is corrected until it is as close as possible to, and can be regarded as equivalent to, the original-channel (L) audio signal before it entered the filter 11a, that is, the audio source signal before encoding. Note that the only component the correction processing unit 41a restores to its original-channel form is the one corresponding to the original channel (L). The components corresponding to the other original channels remain uncorrected.

Similarly, the fast Fourier transform unit 31b converts the input audio signal of the other encode channel (R) into the frequency-domain signal Sgr. The correction processing unit 41b of the channel signal separation block 32-L filters Sgr with the inverse characteristic [1/Hlr] of the filter characteristic of the impulse response convolution corresponding to the transfer function Hlr (the filter characteristic of the filter 11b in FIG. 4). As a result, of the components contained in Sgr, only the original-channel (L) audio component is restored in the output of the correction processing unit 41b to the characteristic it had before entering the filter 11b.
The correction processing units 41a and 41b can also be configured, for example, as FIR filters like the one shown in FIG. 5, with coefficients corresponding to the inverse filter characteristic set in the multipliers.
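The effect of the two correction units can be written per frequency bin, under two assumptions. First, as in FIG. 4, encoding is exactly the sum of each original channel X_k weighted by its ear-path transfer functions H_kl and H_kr (notation following Hll, Hlr, Hcl, and so on above). Second, H_ll and H_lr are nonzero at the bin in question:

$$
\begin{aligned}
S_{gl} &= \sum_{k} H_{k\mathrm{l}} X_k, \qquad S_{gr} = \sum_{k} H_{k\mathrm{r}} X_k, \qquad k \in \{\mathrm{l}, \mathrm{c}, \mathrm{r}, \mathrm{ls}, \mathrm{rs}\},\\
S_{gla} &= \frac{S_{gl}}{H_{\mathrm{ll}}} = X_{\mathrm{l}} + \sum_{k \neq \mathrm{l}} \frac{H_{k\mathrm{l}}}{H_{\mathrm{ll}}} X_k,\\
S_{gra} &= \frac{S_{gr}}{H_{\mathrm{lr}}} = X_{\mathrm{l}} + \sum_{k \neq \mathrm{l}} \frac{H_{k\mathrm{r}}}{H_{\mathrm{lr}}} X_k.
\end{aligned}
$$

The original-channel (L) term X_l appears identically in both corrected signals. Every other channel k carries the weight H_kl/H_ll in one and H_kr/H_lr in the other, so its components agree only where those ratios happen to coincide.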

As described above, in the channel signal separation block 32-L, the correction processing unit 41a restores to its pre-encoding characteristic only the original-channel (L) component of the signal Sgl, which corresponds to the encode channel (L). The correction processing unit 41b does the same for the original-channel (L) component of the signal Sgr, which corresponds to the encode channel (R). The results are signals Sgla and Sgra, in each of which only the original-channel (L) component has been restored to its pre-encoding characteristic. This commonly corrected original-channel (L) component therefore matches in both phase and level between Sgla and Sgra. The correction by the correction processing units 41a and 41b can thus also be viewed as compensating for the phase and level differences of the original-channel (L) component between Sgl and Sgr. These differences arose when the transfer characteristics Hll and Hlr were applied to that component during encoding.
In Sgla and Sgra, the components other than the original channel (L) still carry the transfer characteristics applied at encoding by the filters 12a, 12b to 15a, 15b, and therefore do not match each other.
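This correction can be sketched per bin, assuming the spectra and the transfer functions Hll and Hlr are available as lists of complex values at the same frequency bins. The name `correct_bins` and its `eps` guard are hypothetical. The source realizes the inverse characteristic as an FIR filter rather than by division.

```python
def correct_bins(spec, h, eps=1e-12):
    """Apply the inverse characteristic [1/H] bin by bin.

    spec: complex bins of an encoded channel (e.g. Sgl or Sgr).
    h: complex response of the target path at the same bins (e.g. Hll or
    Hlr). Bins where |H| <= eps are zeroed; this guard against division
    by zero is an assumption, not part of the source.
    """
    return [s / hk if abs(hk) > eps else 0j for s, hk in zip(spec, h)]


# Toy example with two original channels, L and C, over two bins
# (all values hypothetical).
x_l = [1 + 1j, 0.5 + 0j]                     # original channel (L)
x_c = [0.3j, 0.2 + 0j]                       # original channel (C)
h_ll, h_lr = [0.8 + 0j, 0.5j], [0.4 + 0j, 0.25 + 0j]
h_cl, h_cr = [0.6 + 0j, 0.3 + 0j], [0.6j, 0.9 + 0j]

sgl = [a * x + b * y for a, x, b, y in zip(h_ll, x_l, h_cl, x_c)]
sgr = [a * x + b * y for a, x, b, y in zip(h_lr, x_l, h_cr, x_c)]
sgla = correct_bins(sgl, h_ll)   # L term restored to x_l, C term weighted h_cl/h_ll
sgra = correct_bins(sgr, h_lr)   # L term restored to x_l, C term weighted h_cr/h_lr
```

The L component of `sgla` and `sgra` is the same `x_l`. The C component differs between the two, which is the mismatch the separation stage relies on.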

The signals Sgla and Sgra, which have these properties, are input to the level/phase comparison processing block 51 of the separation processing unit 42. They are also input to the multipliers 53 and 54 described later.
The level/phase comparison processing block 51 compares the levels and the phases of Sgla and Sgra. As the comparison result, it outputs to the sound source separation function calculation block 52 a signal indicating the degree of similarity in level and phase between Sgla and Sgra in the frequency domain.

The sound source separation function calculation block 52 evaluates a predetermined sound source separation function, based on the similarity values input from the level/phase comparison processing block 51, to obtain the coefficients of the multipliers 53 and 54. It then sets the obtained coefficients in those multipliers. The multipliers 53 and 54 multiply Sgla and Sgra, respectively, by the set coefficients and output the results. A more specific example of how the coefficients are obtained is described later. With the coefficients set in this way, the multiplier 53 outputs the components of Sgla whose level and phase are similar, to at least a certain degree, to those of the other signal Sgra. Likewise, the multiplier 54 outputs the components of Sgra whose level and phase are similar, to at least a certain degree, to those of Sgla. As a result, the outputs of the multipliers 53 and 54 are the components of Sgla and Sgra that have almost the same level and phase and can be regarded as identical.
As described above, the components that coincide in level and phase between Sgla and Sgra are the original-channel (L) components corrected by the correction processing units 41a and 41b. The outputs of the multipliers 53 and 54 are therefore this corrected original-channel (L) component, separated and extracted from Sgla and Sgra respectively. These outputs are added by the adder 55. The output of the adder 55 is the output of the channel signal separation block 32-L, and it is a signal component equivalent to the original-channel (L) audio signal before encoding. In short, the channel signal separation block 32-L takes the frequency-domain audio signals of the encode channels (L) and (R) as input. It then separates, extracts, and outputs the component equivalent to the original-channel (L) audio signal before encoding.

FIG. 8 shows a channel signal separation block 32-L as actually configured on the basis of the processing concept described with reference to FIG. 7. In FIG. 8, parts that are the same as in FIG. 7 are given the same reference numerals, and their description is omitted.
FIG. 8 shows a more practical example of the internal configuration of the separation processing unit 42, which is described below. The separation processing unit 42 shown in FIG. 8 comprises a level comparison unit 61, a coefficient generation unit 62, a phase comparison unit 63, a coefficient generation unit 64, multipliers 65, 66, 67, and 68, and the adder 55.
The signals Sgla and Sgra input to the separation processing unit 42 go first to the level comparison unit 61. The level comparison unit 61 obtains the levels of Sgla and Sgra, for example for each frequency sample. From these two levels it calculates, for example, the level ratio m of Sgra to Sgla (or of Sgla to Sgra) and outputs it to the coefficient generation unit 62. The level ratio m lies in the range 0 ≤ m ≤ 1, where m = 1 means the two levels are exactly equal. The smaller m is, the larger the level difference and the lower the similarity.
Based on the level ratio m, the coefficient generation unit 62 obtains the coefficient r to be set in the multipliers 65 and 66, where 0 ≤ r ≤ 1. To determine r, it evaluates a predetermined sound source separation function under which r approaches 1 as m approaches 1. Examples of sound source separation functions used by the coefficient generation unit 62 are shown in FIGS. 10(a), 10(b), and 10(c).

FIGS. 10(a), 10(b), and 10(c) show sound source separation functions as the relationship between the level ratio m (horizontal axis) and the coefficient r (vertical axis). These functions all set, for example, r = 1 when m = 1, but they differ in how r is set when m is less than 1. Functions other than those shown in FIGS. 10(a) to 10(c) are also conceivable, including ones that set r below 1 even when m = 1.
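The level-based stage (level comparison unit 61, coefficient generation unit 62, multipliers 65 and 66) can be sketched per frequency bin. Several choices here are assumptions. The level ratio is taken as the smaller magnitude over the larger, so it stays in [0, 1]. The separation function r(m) = m ** sharpness is a hypothetical curve that meets the stated property r(1) = 1, since the actual curves of FIG. 10 are not given numerically.

```python
def level_ratio(a, b):
    """Level ratio m in [0, 1] of two complex bins (m = 1: equal levels).

    Dividing the smaller magnitude by the larger is an assumption that
    keeps m within [0, 1] whichever signal is louder.
    """
    la, lb = abs(a), abs(b)
    if max(la, lb) == 0.0:
        return 1.0  # both bins silent: treated as equal (assumption)
    return min(la, lb) / max(la, lb)


def level_gain(m, sharpness=4.0):
    """Hypothetical sound source separation function r(m), with r(1) = 1."""
    return m ** sharpness


def level_stage(sgla, sgra, sharpness=4.0):
    """Scale both corrected spectra by r per bin (multipliers 65 and 66)."""
    out_a, out_b = [], []
    for a, b in zip(sgla, sgra):
        r = level_gain(level_ratio(a, b), sharpness)
        out_a.append(r * a)
        out_b.append(r * b)
    return out_a, out_b
```

With the assumed sharpness of 4, a bin whose levels agree passes unchanged. A bin where one side is four times louder (m = 0.25) is scaled by r ≈ 0.004.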

Returning to FIG. 8.
The coefficient r obtained by the coefficient generation unit 62 as described above is set in each of the multipliers 65 and 66. The multipliers 65 and 66 multiply Sgla and Sgra, respectively, by r and output the results. This follows from the description of the level/phase comparison processing block 51 and the sound source separation function calculation block 52 of the separation processing unit 42 in FIG. 7. The output of the multiplier 65 consists of the spectral components of Sgla whose level is similar, to at least a certain degree, to that of Sgra. The output of the multiplier 66 consists of the spectral components of Sgra whose level is similar, to at least a certain degree, to that of Sgla. In terms of level, then, the outputs of the multipliers 65 and 66 match the original-channel (L) audio signal as corrected by the correction processing units 41a and 41b.

However, the outputs of the multipliers 65 and 66 are extracted from Sgla and Sgra on the basis of level comparison alone. They may therefore still contain appreciable components from original channels other than (L) that, at some point in time, happen to have the same level as the original-channel (L) audio signal.
For this reason, the outputs of the multipliers 65 and 66 are also input to the phase comparison unit 63, which compares their phases. As the comparison result, it obtains the phase difference p of the output of the multiplier 66 relative to that of the multiplier 65 (or of the multiplier 65 relative to the multiplier 66) and outputs it to the coefficient generation unit 64. The phase difference p lies, for example, in the range 0 ≤ p ≤ π, where p = 0 means the two are exactly in phase. The larger p becomes, the lower the phase similarity of the two signals.
Based on the phase difference p, the coefficient generation unit 64 obtains the coefficient rp to be set in the multipliers 67 and 68, where 0 ≤ rp ≤ 1. To determine rp, it evaluates a predetermined sound source separation function under which rp approaches 1 as p approaches 0. Examples of sound source separation functions used by the coefficient generation unit 64 are shown in FIGS. 11(a), 11(b), and 11(c).

FIGS. 11(a), 11(b), and 11(c) show sound source separation functions based on the phase difference, as the relationship between the phase difference p (horizontal axis) and the coefficient rp (vertical axis). These functions all set, for example, rp = 1 when p = 0, but they differ in how rp is set when p is greater than 0. Here too, sound source separation functions other than those shown in FIGS. 11(a) to 11(c) are conceivable, including ones that set rp below 1 even when p = 0.
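The phase-based stage (phase comparison unit 63, coefficient generation unit 64, multipliers 67 and 68) can be sketched the same way. The phase difference is the angle of a·conj(b), folded into [0, π]. The curve rp(p) = ((1 + cos p) / 2) ** sharpness is a hypothetical stand-in for FIG. 11 that satisfies rp(0) = 1.

```python
import cmath
import math


def phase_difference(a, b):
    """Phase difference p in [0, pi] between two complex bins (0 = in phase)."""
    if a == 0 or b == 0:
        return math.pi  # phase undefined: treat as dissimilar (assumption)
    # The angle of a * conj(b) is angle(a) - angle(b), wrapped to [-pi, pi].
    return abs(cmath.phase(a * b.conjugate()))


def phase_gain(p, sharpness=2.0):
    """Hypothetical separation function rp(p) with rp(0) = 1 and rp(pi) = 0."""
    return ((1.0 + math.cos(p)) / 2.0) ** sharpness


def phase_stage(in_a, in_b, sharpness=2.0):
    """Scale both level-stage outputs by rp per bin (multipliers 67 and 68)."""
    out_a, out_b = [], []
    for a, b in zip(in_a, in_b):
        rp = phase_gain(phase_difference(a, b), sharpness)
        out_a.append(rp * a)
        out_b.append(rp * b)
    return out_a, out_b
```

A bin whose two sides are in phase passes unchanged, while a bin in antiphase (p = π) is removed.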

As shown in FIG. 8, the coefficient rp obtained by the coefficient generation unit 64 is set in each of the multipliers 67 and 68. The multipliers 67 and 68 receive the outputs of the multipliers 65 and 66, respectively, multiply them by rp, and output the results.
The outputs of the multipliers 67 and 68 are thus the spectral components of the multiplier 65 and 66 outputs whose phase difference is within a certain range, that is, whose phases are similar to at least a certain degree. In other words, the multipliers 67 and 68 take the components that already match the corrected original-channel (L) audio signal in level and further select those that also match it in phase. Their outputs correspond to the outputs of the multipliers 53 and 54 in FIG. 7. They match the original-channel (L) audio signal corrected by the correction processing units 41a and 41b in both level and phase, and are therefore equivalent to the original-channel (L) audio signal before encoding.
The outputs of the multipliers 67 and 68 are added by the adder 55, as in FIG. 7, and the sum is the output of the channel signal separation block 32-L.
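The whole separation processing unit 42 of FIG. 8 can be sketched for one channel block, with the level stage, then the phase stage, then the adder 55, applied per bin. The gain curves are the same hypothetical stand-ins for FIGS. 10 and 11 used above, and the function name is illustrative.

```python
import cmath
import math


def separate_channel(sgla, sgra, level_sharpness=4.0, phase_sharpness=2.0):
    """Sketch of the separation processing unit 42 (FIG. 8) for one block.

    Stage 1 (units 61 and 62, multipliers 65 and 66) scales both bins by r(m).
    Stage 2 (units 63 and 64, multipliers 67 and 68) scales the results by rp(p).
    The adder 55 sums the two branches. Gain curves are hypothetical.
    """
    out = []
    for a, b in zip(sgla, sgra):
        # Level stage.
        la, lb = abs(a), abs(b)
        m = 1.0 if max(la, lb) == 0.0 else min(la, lb) / max(la, lb)
        r = m ** level_sharpness
        a1, b1 = r * a, r * b
        # Phase stage, applied to the level-stage outputs.
        if a1 == 0 or b1 == 0:
            p = math.pi
        else:
            p = abs(cmath.phase(a1 * b1.conjugate()))
        rp = ((1.0 + math.cos(p)) / 2.0) ** phase_sharpness
        # Adder 55: sum of the two branches.
        out.append(rp * a1 + rp * b1)
    return out
```

The adder 55 simply sums the two branches, so in this sketch a fully matching bin comes out at twice the original-channel level. The text does not say whether any level normalization follows.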

Comparing FIG. 7 with FIG. 8, the configuration of FIG. 8 splits the functions of the level/phase comparison processing block 51 and the sound source separation function calculation block 52 of FIG. 7 into two cascaded stages. The former stage performs only level comparison and separates the components of equal level (the level-based separation system: level comparison unit 61, coefficient generation unit 62, and multipliers 65 and 66). The latter stage performs only phase comparison and separates the components of equal phase (the phase-based separation system: phase comparison unit 63, coefficient generation unit 64, and multipliers 67 and 68).
In another configuration of the separation processing unit 42 in FIG. 8, the order is reversed. The phase-based separation system (phase comparison unit 63, coefficient generation unit 64, multipliers 67 and 68) is placed in the former stage, and the level-based separation system (level comparison unit 61, coefficient generation unit 62, multipliers 65 and 66) in the latter stage.
Furthermore, when the decoding device does not need particularly high reproduced sound quality, for example, the separation processing unit 42 may include only one of the level-based and phase-based separation systems. Even with only one of them, the components regarded as identical to the original channel (L) before encoding can still be extracted on the basis of either level or phase. The decoded output sound therefore retains reasonably good quality compared with, for example, the output of a conventional matrix circuit with a directionality enhancement circuit.

Returning to FIG. 6.
With the configuration shown in FIG. 7 or FIG. 8, for example, the channel signal separation block 32-L separates and outputs a signal made of the frequency components that are identical to those of the original channel (L) before encoding.
The remaining four channel signal separation blocks 32-C, 32-R, 32-LS, and 32-RS also use the block configuration shown in FIG. 7 or FIG. 8. In the channel signal separation block 32-C, the correction processing units 41a and 41b apply inverse filters with the inverse characteristics [1/Hcl] and [1/Hcr] of the transfer functions Hcl and Hcr, respectively. The channel signal separation block 32-C therefore separates and outputs a signal made of the frequency components identical to those of the original channel (C) before encoding.
Likewise, the correction processing units 41a and 41b of the channel signal separation block 32-R are set to the inverse characteristics [1/Hrl] and [1/Hrr] of the transfer functions Hrl and Hrr, respectively, so that the output of block 32-R is a signal made of the frequency components identical to those of the original channel (R) before encoding.
The correction processing units 41a and 41b of the channel signal separation block 32-LS are set to the inverse characteristics [1/Hlsl] and [1/Hlsr] of the transfer functions Hlsl and Hlsr, respectively, so that the output of block 32-LS is a signal made of the frequency components identical to those of the original channel (LS) before encoding.
The correction processing units 41a and 41b of the channel signal separation block 32-RS are set to the inverse characteristics [1/Hrsl] and [1/Hrsr] of the transfer functions Hrsl and Hrsr, respectively, so that the output of block 32-RS is a signal made of the frequency components identical to those of the original channel (RS) before encoding.
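The following is a minimal frequency-domain sketch of how each block's correction processing units could apply its pair of inverse characteristics. Assumptions: the transfer functions are given as per-bin complex responses; the names `INVERSE_PAIRS`, `inverse_response` and `correct_block` are hypothetical; and the small `eps` term is a standard regularization added here (the text does not specify it), because a bare 1/H is undefined wherever H is zero. The check looks only at the original channel (L) component. In a real encoded signal, Sgl and Sgr also carry the other channels, and the separation processing unit 42 must then suppress them.

```python
# Transfer functions whose inverses are set in the correction processing
# units 41a (Sgl side) and 41b (Sgr side) of each separation block.
INVERSE_PAIRS = {
    "32-L": ("Hll", "Hlr"),
    "32-C": ("Hcl", "Hcr"),
    "32-R": ("Hrl", "Hrr"),
    "32-LS": ("Hlsl", "Hlsr"),
    "32-RS": ("Hrsl", "Hrsr"),
}


def inverse_response(h, eps=1e-9):
    """Regularized inverse conj(H) / (|H|^2 + eps), bin by bin.
    eps is an assumption that keeps near-zero bins from exploding."""
    return [x.conjugate() / (abs(x) ** 2 + eps) for x in h]


def correct_block(block, sgl, sgr, responses):
    """Apply the inverse of the first transfer function of the pair to Sgl and the
    inverse of the second to Sgr. `responses` maps transfer-function names to
    frequency responses given as lists of complex numbers."""
    name_l, name_r = INVERSE_PAIRS[block]
    inv_l = inverse_response(responses[name_l])
    inv_r = inverse_response(responses[name_r])
    return ([s * g for s, g in zip(sgl, inv_l)],
            [s * g for s, g in zip(sgr, inv_r)])
```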

The signals output from the channel signal separation blocks 32-L, 32-C, 32-R, 32-LS, and 32-RS are converted from frequency-domain signals into time-domain audio signals by the IFFT units 33-L, 33-C, 33-R, 33-LS, and 33-RS, respectively, and are then output. These audio signals can be regarded as identical to the original channels (L), (C), (R), (LS), and (RS) before encoding. In other words, they are the decoded output of the decoding device 2.
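As a sketch of the IFFT step, the naive inverse DFT below converts one frame of each separated spectrum back into time-domain samples. The O(n²) transforms and the `to_time_domain` helper are illustrative assumptions. A practical decoder would use an FFT library and frame-wise processing with overlap, details this section does not specify.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT, used here only to build test spectra."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive O(n^2) inverse DFT: the operation an IFFT unit computes quickly."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def to_time_domain(separated):
    """Map each output channel's spectrum (one frame) back to real samples,
    as the IFFT units 33-L to 33-RS do. Taking .real assumes the spectra are
    conjugate-symmetric, i.e. they came from real-valued signals."""
    return {ch: [v.real for v in idft(spec)] for ch, spec in separated.items()}
```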

The decoding device 2 of this embodiment, configured as above, obtains its audio signals by separating and extracting each original channel's audio signal components from the encoded audio source (the audio signals of encoding channels (L) and (R)), based on the results of a similarity detection that uses the phase and level of the signals. Unlike the decoded output of the conventional encoding/decoding technique described earlier, the decoded audio signal of each channel does not contain the audio signals of other channels at a fixed ratio. Each decoded channel can instead be regarded as substantially the same as the corresponding original channel before encoding.
Consequently, when the output audio signals of the decoding device 2 are reproduced through speakers placed appropriately for each channel, the acoustic effect is of nearly the same quality as reproducing the original channels. In other words, the listener hears reproduced sound with good channel separation, free of the volume and localization changes that occur with the conventional technique.

As described above, the correction processing units 41a and 41b provided in the channel signal separation blocks 32-L, 32-C, 32-R, 32-LS, and 32-RS of the decoding device 2 have characteristics inverse to the transfer functions of the impulse responses given to the filters 11a, 11b to 15a, 15b of the encoding device 1 shown in FIG. 4. An impulse response that realizes such an inverse characteristic tends to converge poorly when the impulse response of the transfer function used for encoding is long and complex.
For example, FIG. 12(a) shows an impulse response waveform assumed to have been measured in a reverberant environment. As is well known, an impulse response consists, in time order, of a direct sound part, which responds to the direct sound, followed by an indirect sound part, which responds to the reflected (indirect) sound arriving after the direct sound. In FIG. 12(a), the response over the time width indicated as section A is the direct sound part, and the response that follows it is the reflected sound part.
In general, the reflected sound part lasts considerably longer than the direct sound part, and its duration is also the part that varies most with the measurement environment and conditions. When the reflected sound part is long, a filter with the inverse characteristic becomes difficult to converge.

This embodiment therefore sets the inverse characteristics as follows. It relies on two observations: the difficulty in converging the inverse filter stems mainly from the length of the impulse response of the transfer function from which the inverse characteristic is derived, and that length depends mainly on the length of the reflected sound part.
That is, as shown in FIG. 12(b), the embodiment uses an impulse response that keeps only the response corresponding to the direct sound part (section A) of the whole impulse response waveform of FIG. 12(a). For example, if the waveform of FIG. 12(a) corresponds to the transfer function Hll, the inverse filter characteristic [1/Hll] for the correction processing unit 41a of the channel signal separation block 32-L is computed from the impulse response of FIG. 12(b), that is, the original transfer function Hll with its reflected sound part omitted, and is then set in the correction processing unit 41a. The remaining correction processing units 41a and 41b are likewise set to inverse filter characteristics computed from the impulse responses of the corresponding encoding transfer functions with their reflected sound parts omitted.
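The truncation approach of FIG. 12(b) could be sketched as follows: keep only the first `n_direct` samples (section A) of a measured impulse response, then compute regularized inverse FIR taps via the DFT. All names, the `n_fft` length and the regularization `eps` are assumptions, and the circular (DFT-based) inverse is only one simple design method. The `keep_reflections` flag reflects the option of including the reflected sound part for selected channels, for example front channels.

```python
import cmath


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def design_inverse_fir(ir, n_direct, n_fft, keep_reflections=False, eps=1e-9):
    """Return n_fft real taps approximating the inverse of an impulse response.

    ir: measured impulse response (e.g. for Hll). n_direct: length of the
    direct-sound part (section A), assumed known. With keep_reflections=False
    only ir[:n_direct] is inverted, as in FIG. 12(b); with True the whole
    response, reflections included, is inverted."""
    h = list(ir) if keep_reflections else list(ir[:n_direct])
    if len(h) > n_fft:
        raise ValueError("n_fft must be at least the length of the response used")
    H = dft(h + [0.0] * (n_fft - len(h)))
    inv = [z.conjugate() / (abs(z) ** 2 + eps) for z in H]  # regularized 1/H
    return [v.real for v in idft(inv)]


def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[k] * b[(t - k) % n] for k in range(n)) for t in range(n)]
```

In the check, circularly convolving the truncated response with its taps gives very nearly a unit impulse. A real design would choose `n_fft` generously and might add a modeling delay to limit wrap-around effects.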

With the inverse filter characteristics set this way, the inverse filtering during decoding does not correct the reflected sound components, so the signal components that correspond to the reflected sound part cannot be separated properly. As is well known, however, the direct sound dominates the impulse response, so the resulting loss in quality of the decoded audio is not a significant problem.

Alternatively, the two kinds of inverse filter characteristic may be assigned per channel. Decode output channels whose sound reproducibility matters more than that of the others, for example channels in front of the listener, are given an inverse filter characteristic that includes the reflected sound part. The remaining decode channels are given an inverse filter characteristic with the reflected sound part removed.

Instead of using the direct sound part of an impulse response, the inverse characteristics may also be set from a transfer characteristic measured in an environment with no reverberation at all, such as an anechoic room, or from a transfer characteristic calculated on the assumption of such an environment. Because a reverberation-free transfer characteristic has no reverberant response, it directly yields an impulse response equivalent to one with the reflected sound part omitted, without extracting the direct sound part as described with reference to FIG. 12. In a reverberant environment, however, the direct sound part of the impulse response also contains reverberation components. Using the direct sound part of an impulse response measured in a reverberant environment, as in the former approach, therefore reproduces a richer sound field.

FIG. 13 shows another example of the decoding device 2 of this embodiment. Parts identical to those in FIG. 6 are given the same reference numerals, and their description is omitted.
In this figure, the channel signal separation blocks 32-L, 32-C, 32-R, 32-LS, and 32-RS are configured differently from FIG. 6. In the channel signal separation blocks 32-L, 32-R, 32-LS, and 32-RS, the correction processing units 41a and 41b are omitted and a single correction processing unit 41A is provided instead. This correction processing unit 41A sits only on the signal Sgl side, and the signal Sgr is input directly to the separation processing unit 42.
The channel signal separation block 32-C has no correction processing unit 41A; the signals Sgl and Sgr are input directly to its separation processing unit 42. As the following explanation will make clear, block 32-C alone can omit the correction processing unit 41A because the speaker SP-C, the sound source of the corresponding original channel, is placed on the listener's median plane, as also shown in FIG. 5. The sound traveling from SP-C to the listener's left and right ears can then be treated as having no propagation time difference and no level difference.

When the listener M hears a sound from a single source and perceives where it is located, one of the most important cues is the time difference (propagation time difference) between the sound's arrival at the left ear and at the right ear. This propagation time difference appears as a difference in the rise times of the impulse responses, as shown in FIGS. 14(a) and 14(b). These figures show, as an example, the impulse response of the path by which the sound of the speaker SP-L reaches the listener's left ear (transfer function Hll) and that of the path to the right ear (transfer function Hlr). The rise of the Hlr impulse response in FIG. 14(b) lags the rise of the Hll impulse response in FIG. 14(a) by the time Td. Td arises because the speaker SP-L, regarded as a point sound source, is located toward the front left of the listener M: its distances to the left and right ears differ, and the propagation times differ accordingly.
During encoding, the filters 11a and 11b convolve the signal with the impulse responses for the transfer functions Hll and Hlr, respectively. As a result, the original-channel (L) components contained in the audio signals of encoding channels (L) and (R) are offset from each other by the impulse-response rise time difference Td shown in FIGS. 14(a) and 14(b).
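A rough sketch of how Td might be read off two impulse responses: find the first sample where each response reaches a fixed fraction of its peak, then subtract. The 10 % threshold and the function names are assumptions; the text states only that Td appears as a difference in rise times.

```python
def onset_index(ir, threshold=0.1):
    """Index of the first sample whose magnitude reaches threshold * peak.
    The 10 % default threshold is an assumption, not taken from the source."""
    peak = max(abs(v) for v in ir)
    if peak == 0.0:
        raise ValueError("impulse response is all zeros")
    return next(i for i, v in enumerate(ir) if abs(v) >= threshold * peak)


def rise_time_difference(h_near, h_far, fs):
    """Td in samples and in seconds: how much later the far-ear response
    (e.g. Hlr) rises than the near-ear response (e.g. Hll)."""
    td = onset_index(h_far) - onset_index(h_near)
    return td, td / fs
```

For example, at an assumed 48 kHz sample rate a 3-sample difference corresponds to Td = 62.5 µs.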

Therefore, delaying the audio signal Sgl itself by the time difference Td cancels the offset Td between the original-channel (L) component contained in Sgl and the same original-channel (L) component contained in Sgr, so that, viewed as impulse responses, their rise times coincide.
The correction processing unit 41A is provided to perform this filtering, which delays the audio signal Sgl by the time difference Td.
With this delay applied by the correction processing unit 41A, the rise times of the original-channel (L) components contained in the audio signals Sgl and Sgr are brought into agreement. In other words, the time offset between the components of one specific original channel contained in Sgl and Sgr is corrected.
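The delay performed by the correction processing unit 41A can be sketched as a whole-sample delay line. The helper names are hypothetical, the encoder filters are modeled as bare delays with a gain, and a fractional Td would need an interpolating (fractional-delay) filter, which is not shown. After the delay, the two L components line up in time; they still differ by a constant gain, which is a separate correction.

```python
def delay(signal, d):
    """Delay a block of samples by d whole samples, zero-filling the start.
    A whole-sample delay is a simplification: Td need not be an integer
    number of samples in practice."""
    if d < 0:
        raise ValueError("delay must be non-negative")
    n = len(signal)
    return [0.0] * min(d, n) + list(signal[: max(n - d, 0)])


def convolve(x, h):
    """Linear convolution truncated to len(x); stands in for the encoder filters."""
    return [sum(h[k] * x[t - k] for k in range(len(h)) if 0 <= t - k < len(x))
            for t in range(len(x))]
```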

In this embodiment, the correction processing unit 41A also corrects the level difference between the components of one specific original channel common to the audio signals Sgl and Sgr.
For example, in the relationship between the speaker SP-L and the listener M in FIG. 3, the speaker SP-L is located toward the front left of the listener M. The sounds from SP-L reaching the left and right ears therefore differ not only in propagation time but also in level, owing to the different travel distances to each ear and the different directions from which the sound arrives.
FIGS. 14(c) and 14(d) show the frequency characteristics of the impulse responses for the transfer functions Hll and Hlr, respectively. Comparing FIGS. 14(c) and 14(d), the two have similar basic frequency distributions, but there is a relatively pronounced level difference between them, indicated as Lv. This level difference likewise exists between the original-channel (L) components contained in the signals Sgl and Sgr, to which the characteristics of Hll and Hlr have been applied. Together with the delay (propagation time difference), it determines the perceived localization of the sound source.
The correction processing unit 41A of the channel signal separation block 32-L therefore delays the signal Sgl by the delay time Td as described above and also reduces its level by the level difference Lv.
With this level adjustment applied by the correction processing unit 41A, the levels of the original-channel (L) components contained in the audio signals Sgl and Sgr are brought into agreement. In other words, the level difference between the components of one specific original channel contained in Sgl and Sgr is corrected.
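Combining the delay with the level reduction gives a minimal sketch of the correction processing unit 41A. Assumptions: Lv is estimated as a single broadband energy ratio in decibels rather than read from the frequency characteristics of FIGS. 14(c) and 14(d), Td is a whole number of samples, and `level_difference_db` and `correct_41a` are hypothetical names.

```python
import math


def energy(ir):
    """Sum of squared samples."""
    return sum(v * v for v in ir)


def level_difference_db(h_near, h_far):
    """Broadband level difference Lv in dB between two impulse responses.
    A single energy ratio is a simplifying assumption; the text describes Lv
    in terms of the responses' frequency characteristics."""
    return 10.0 * math.log10(energy(h_near) / energy(h_far))


def correct_41a(sgl, td_samples, lv_db):
    """Sketch of correction unit 41A: delay Sgl by Td whole samples, then
    lower its level by Lv dB. Sgr passes through untouched."""
    n = len(sgl)
    delayed = [0.0] * min(td_samples, n) + list(sgl[: max(n - td_samples, 0)])
    gain = 10.0 ** (-lv_db / 20.0)
    return [gain * v for v in delayed]
```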

The separation processing unit 42 has the same configuration and performs the same processing as shown in FIG. 8, ultimately separating and outputting the signal of the original channel (L). In this case, however, the phase comparison unit 63 detects the time difference between the signals, and the coefficient generation unit 64 accordingly performs a sound source separation function calculation that derives the coefficient rp from the detected time difference.

The correction processing unit 41A in the example of FIG. 13 only needs to delay the signal and change its level, so it can be built more simply than the correction processing units 41a and 41b of FIGS. 7 and 8. The configurations of FIGS. 7 and 8 give correspondingly better separation of the decoded output audio signals. Even so, the channel signal separation block 32 in the example of FIG. 13 still removes the signal components of the other channels as far as possible and extracts only those of the required channel, and therefore keeps sufficiently good reproducibility compared with, for example, the conventional combination of a matrix circuit and a directionality enhancement circuit.

FIG. 15 shows a configuration example of a recording system to which the encoding device of this embodiment is applied.
The recording system shown in this figure consists of an encoding unit 100 and a media recording unit 101.
The encoding unit 100 is the part of the recording system that has the same configuration as the encoding device 1 of this embodiment. It receives, for example, multichannel audio signals (Lch, Cch, Rch, LSch, and RSch) produced as audio-source content, converts them into L/R two-channel stereo audio signals with, for example, the signal processing configuration shown in FIG. 4, and outputs them.

The Lch and Rch audio signals obtained by this encoding are input to the media recording unit 101, which records them on a predetermined storage medium (media) 102. The encoded audio signals are thus stored on the medium 102, for example as content information.
A content producer, for example, may use such a recording system to provide the medium 102 storing the audio information as packaged media. The two-channel (Lch, Rch) stereo content obtained by the encoding unit 100 may also be distributed over a network.

FIG. 16 shows a configuration example of a playback system to which the decoding device 2 of this embodiment is applied.
The playback system shown in this figure includes a media playback unit 201 and a decoding unit 200. The media playback unit 201 accepts the medium 102 and performs playback processing suited to its format, thereby outputting the encoded audio source as Lch and Rch audio signals.

The Lch and Rch audio signals reproduced by the media playback unit 201 can, first of all, be played back through headphones. As described above, a listener wearing the headphones 6 then perceives a sound field in which the sound seems to come from five speakers SP-L, SP-C, SP-R, SP-LS, and SP-RS placed around the listener, as shown in FIG. 3.

The Lch and Rch audio signals reproduced by the media playback unit 201 are also input to the decoding unit 200. The decoding unit 200 has the same configuration as the decoding device 2 of this embodiment shown in FIGS. 6 to 8 or FIG. 13, and performs the decoding process described above to convert them back into the audio signals of the original channels before encoding. The resulting original-channel (L), (C), (R), (LS), and (RS) audio signals are, for example, amplified and used to drive actually installed speakers SP-L, SP-C, SP-R, SP-LS, and SP-RS. When the sound from these speakers is heard at an appropriate listening position, the ideal sound image localization of the original-channel audio source is reproduced. As noted above, the reproduction is also of higher quality than when an audio signal decoded by the conventional encoding/decoding technique is output through speakers.

When such a recording system and playback system are considered together, the best decoding result on the playback side requires that the inverse filtering by the correction processing units 41a and 41b of the channel signal separation block 32, or the delay and level correction by the correction processing unit 41A, be based on the same transfer functions (transfer characteristics) that the recording side used for the impulse response convolution during encoding.
One way to achieve this is to fix a single transfer characteristic group for encoding on the recording side, and to build into the playback system a correction processing unit 41 that incorporates the inverse filter characteristics, or the delay time and level correction amount, matching that predetermined group.

In that case, however, the acoustic environment, such as the positions of the speakers assumed for the original channels and the surroundings, is fixed to a single setting regardless of the content of the audio source. This causes problems such as a loss of freedom in creating the audio-source content to be encoded.
Therefore, when content is created, the acoustic environment can be designed freely or selected from a plurality of predefined acoustic environments, so that variations of the acoustic environment are available. When the recording system records the encoded audio source on the medium 102, it also records, in accordance with a predetermined format or the like, an identification signal. This signal indicates either the acoustic environment set for the original sound sources before encoding, or the transfer function group used for encoding as determined by that acoustic environment setting. When the playback system plays the medium 102, it also reads this identification signal and outputs it, for example, to the decoding unit 200. Based on the input identification signal, the decoding unit 200 changes the parameter settings of the required signal processing units, such as the correction processing units 41 in the channel signal separation blocks 32. FIG. 17 shows a configuration example for this purpose.

FIG. 17 shows a channel signal separation block 32-L with the same configuration as in FIG. 8, together with a parameter setting unit 400. The parameter setting unit 400 sets parameters not only for the channel signal separation block 32-L but also for the remaining channel signal separation blocks. For simplicity of illustration and description, only the relationship between block 32-L and the parameter setting unit 400 is shown here.
The parameter setting unit 400 reads the identification signal input to the decoding unit 200 and, based on it, determines parameters such as the inverse filter characteristics to be set in the correction processing units 41a and 41b.
In this case, the parameter setting unit 400 also determines the sound source separation functions of the coefficient generation units 62 and 64. Depending on differences in the acoustic environment set at encoding time, the sound source separation functions used by the coefficient generation units 62 and 64 may need to change, or changing them may yield a better decoding result.
If the channel signal separation block 32 has the configuration shown in FIG. 13, the parameter setting unit 400 instead determines the delay time and the level correction amount of the correction processing unit 41A as parameters, in place of the inverse filter characteristics of the correction processing units 41a and 41b.

The parameter setting unit 400 can determine (acquire) each of these parameters in the following ways.
First, if the parameters to be set are stored within the structure of the identification signal (identification information), the parameter information can simply be extracted from the identification signal that was read.
Alternatively, if the identification signal specifies an encoding type corresponding to, for example, the acoustic environment at the time of encoding, the parameter setting unit 400 can hold table information describing the parameters for each encoding type and retrieve from that table the parameters associated with the encoding type identified from the content of the identification signal. As another alternative, the parameter setting unit 400 can perform a calculation based on a predetermined arithmetic expression or function according to the identified encoding type and output the result as the parameters.
In practice, the parameter setting unit 400 may be implemented by a computer with a CPU or similar processor executing a program for parameter setting.
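The first two acquisition strategies above (parameters embedded in the identification signal, or a table keyed by encoding type) can be sketched as follows. This is a minimal illustration under stated assumptions: the names `DecodeParams`, `IdentificationSignal`, `PARAM_TABLE`, and `determine_params`, the encoding-type labels, and the coefficient values are all hypothetical placeholders, not values from this document.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class DecodeParams:
    """Hypothetical parameter set for one channel signal separation block."""
    inverse_filter_taps: Tuple[float, ...]  # multiplier coefficients for units 41a/41b
    separation_function: str                # label of the sound source separation function


@dataclass(frozen=True)
class IdentificationSignal:
    """Hypothetical decoded form of the identification signal."""
    encoding_type: str
    embedded_params: Optional[DecodeParams] = None


# Hypothetical table: encoding type -> parameters (placeholder values only).
PARAM_TABLE: Dict[str, DecodeParams] = {
    "anechoic": DecodeParams((1.0, -0.5), "narrow"),
    "reverberant": DecodeParams((0.8, -0.3, 0.1), "wide"),
}


def determine_params(ident: IdentificationSignal) -> DecodeParams:
    """Return the parameters for the given identification signal."""
    # Case 1: the parameters are carried inside the identification signal itself.
    if ident.embedded_params is not None:
        return ident.embedded_params
    # Case 2: the signal only names an encoding type, so look it up in the table.
    if ident.encoding_type not in PARAM_TABLE:
        raise ValueError(f"unknown encoding type: {ident.encoding_type!r}")
    return PARAM_TABLE[ident.encoding_type]
```

The third option in the text, computing the parameters from a predetermined expression, would replace the table lookup with a function evaluated for the same encoding type.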

The inverse filter characteristics and sound source separation functions determined as parameters by the parameter setting unit 400 in this way are set in the correction processing units 41a and 41b and in the coefficient generation units 62 and 64, respectively. The inverse filter characteristics, for example, can be set by changing the multiplier coefficients of the digital filters that form the correction processing units 41a and 41b.
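Setting an inverse filter characteristic by rewriting multiplier coefficients can be illustrated with a direct-form FIR filter. The sketch below is hypothetical: `FirFilter` and `set_coefficients` are illustrative names, and clearing the delay line on each coefficient change is one possible design choice, not something the text specifies.

```python
class FirFilter:
    """Direct-form FIR filter: y[n] = sum_k h[k] * x[n - k]."""

    def __init__(self, coeffs):
        self.set_coefficients(coeffs)

    def set_coefficients(self, coeffs):
        # Replacing the multiplier coefficients changes the filter characteristic.
        # The delay line is resized and cleared so stale samples do not leak through.
        self.coeffs = list(coeffs)
        self.delay = [0.0] * len(self.coeffs)

    def process(self, samples):
        out = []
        for x in samples:
            # Shift the new sample into the delay line (most recent sample first).
            self.delay = [x] + self.delay[:-1]
            out.append(sum(h * d for h, d in zip(self.coeffs, self.delay)))
        return out


if __name__ == "__main__":
    f = FirFilter([1.0, 0.5])
    print(f.process([1.0, 0.0, 0.0]))  # [1.0, 0.5, 0.0]
    f.set_coefficients([2.0])           # new characteristic, e.g. from unit 400
    print(f.process([1.0, 1.0]))        # [2.0, 2.0]
```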

The parameter setting unit 400 sets parameters for the remaining channel signal separation blocks 32-C, 32-R, 32-LS, and 32-RS in the same way as described above for the channel signal separation block 32-L.
When the correction processing units 41a and 41b and the coefficient generation units 62 and 64 in the channel signal separation blocks 32-L, 32-C, 32-R, 32-LS, and 32-RS, whose parameters have been set according to the identification signal, execute their processing, signal separation is performed with parameters optimized for, for example, the conditions at the time of encoding. As a result, the decoded output signals are the best obtainable, very close to the audio signals of the original channels before encoding.

As a supplement, FIG. 18 shows another example of a playback system that reproduces and outputs an audio source encoded by the encoding device 1 of the present embodiment.
The playback system shown in this figure comprises a media playback unit 201 and a speaker drive unit 202. As in FIG. 16, the media playback unit 201 plays back the encoded audio source, namely the Lch and Rch audio signals, from the medium 102 and outputs them.
In this case as well, the Lch and Rch audio signals played back by the media playback unit 201 can be reproduced as sound through headphones.
The Lch and Rch audio signals played back by the media playback unit 201 are also input to the speaker drive unit 202.

The speaker drive unit 202 applies the required signal processing to the input Lch and Rch audio signals, amplifies them, and drives the two speakers SP-L and SP-R corresponding to the L and R channels. In other words, in this playback system the encoded Lch and Rch audio signals are not decoded by the decoding device 2 of the present embodiment and played through a five-channel speaker system; instead, they are reproduced through a speaker system with the same two-channel configuration.

As described above, the L and R channel audio signals that form the audio source encoded by the encoding device 1 of the present embodiment give sound image localization equivalent to listening through a speaker system matching the channel configuration before encoding, even when reproduced by an ordinary L/R stereo playback system. However, headphone playback is well suited to hearing sound image localization that is faithful to the acoustic environment assumed at encoding time, because the sound from each headphone driver reaches the listener's ear directly, so there is almost no crosstalk between the left and right channels. When the signals are reproduced through speakers, by contrast, the sound from the left-channel speaker reaches not only the listener's left ear but also the right ear, and likewise the sound from the right-channel speaker reaches not only the right ear but also the left ear. That is, crosstalk occurs between the left and right speakers and the listener's left and right ears. This is the main factor preventing playback with correct sound image localization.

The speaker drive unit 202 of the playback system shown in FIG. 18 therefore has a signal processing function for canceling this crosstalk, as described below.
First, FIG. 19 shows a model in which two speakers SP-L and SP-R, one for each of the L (left) and R (right) channels, are arranged, and a listener M positioned on the median plane between the speakers SP-L and SP-R listens to the sound arriving from them.
In this model, Hsll denotes the transfer function of the path from the speaker SP-L to the left ear, Hslr the path from the speaker SP-L to the right ear, Hsrl the path from the speaker SP-R to the left ear, and Hsrr the path from the speaker SP-R to the right ear.
Of these paths, the ones corresponding to crosstalk are the path from the speaker SP-L to the right ear and the path from the speaker SP-R to the left ear. If these two paths were removed from the model of FIG. 19, the listener M would receive only the sound traveling from the speaker SP-L to the left ear and from the speaker SP-R to the right ear. In other words, the listener would hear the sound without crosstalk, just as when listening through headphones.
It follows that the speaker drive unit 202 in FIG. 18 should perform signal processing that removes, from the input L and R channel audio signals, the transfer characteristics corresponding to the transfer functions Hslr and Hsrl of the crosstalk paths. This eliminates the crosstalk between the actual speakers SP-L and SP-R and the listener's left and right ears, so the listener perceives sound image localization that is very faithful to the acoustic environment assumed at encoding time, equivalent to, for example, listening to the playback through headphones.
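The speaker-to-ear model of FIG. 19 can be written in the time domain: each ear signal is the sum of both speaker signals convolved with the impulse responses of the corresponding paths. The sketch below uses short, made-up impulse responses (`H_DIRECT`, `H_CROSS`, both hypothetical) purely to show that driving only SP-L still produces a signal at the right ear, which is the crosstalk the speaker drive unit 202 must remove.

```python
def convolve(x, h):
    """Full linear convolution of two finite sample sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def add(a, b):
    """Sample-wise sum of two sequences, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    return [p + q for p, q in zip(a, b)]


def ear_signals(spk_l, spk_r, h_sll, h_slr, h_srl, h_srr):
    """Ear signals for the FIG. 19 model, given speaker signals and path impulse responses."""
    left_ear = add(convolve(spk_l, h_sll), convolve(spk_r, h_srl))
    right_ear = add(convolve(spk_l, h_slr), convolve(spk_r, h_srr))
    return left_ear, right_ear


# Made-up impulse responses: direct paths arrive at once, crosstalk arrives
# two samples later and attenuated.
H_DIRECT = [1.0, 0.0, 0.0]
H_CROSS = [0.0, 0.0, 0.6]

if __name__ == "__main__":
    left, right = ear_signals([1.0], [0.0], H_DIRECT, H_CROSS, H_CROSS, H_DIRECT)
    print(left, right)  # [1.0, 0.0, 0.0] [0.0, 0.0, 0.6]
```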

Next, the configuration for crosstalk cancellation in the speaker drive unit 202 is described.
Assume that the speakers SP-L and SP-R in FIG. 19 are placed symmetrically with respect to the median plane of the listener M. For the transfer functions Hsll and Hsrr of the non-crosstalk paths, from the speaker SP-L to the left ear of the listener M and from the speaker SP-R to the right ear of the listener M, let
Hsll = Hsrr = S
For the transfer functions Hslr and Hsrl of the crosstalk paths, from the speaker SP-L to the right ear of the listener M and from the speaker SP-R to the left ear of the listener M, let
Hslr = Hsrl = A
Then define the transfer function C by
C = -A / S
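Under the symmetric assumption above, a short derivation shows why C = -A/S removes crosstalk when each channel is added to the other channel filtered by C, as in the FIG. 20 structure. Consider an input on Lch only (R = 0), so that before the filters 215 and 216 the speaker feeds are x_L = L and x_R = C·L, and E_L, E_R are the resulting ear signals. The value of F on the last line is one possible flattening characteristic derived here, not a value stated in the text, and it assumes S² ≠ A² at every frequency.

```latex
\begin{aligned}
x_L &= L, \qquad x_R = C\,L = -\frac{A}{S}\,L \\
E_R &= A\,x_L + S\,x_R = A\,L - A\,L = 0 \\
E_L &= S\,x_L + A\,x_R = \Bigl(S - \frac{A^2}{S}\Bigr)L = \frac{S^2 - A^2}{S}\,L \\
F &= \frac{S}{S^2 - A^2} \quad\Longrightarrow\quad F\,E_L = L
\end{aligned}
```

By symmetry, the same holds for an input on Rch only.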

Using the transfer function C obtained above, the signal processing system for crosstalk cancellation in the speaker drive unit 202 can be configured, for example, as shown in FIG. 20.
As illustrated, the crosstalk cancellation signal processing system of FIG. 20 comprises adders 211 and 213 and filters 212, 214, 215, and 216.
Of the input Lch and Rch audio signals, the Lch audio signal is input to the adder 211 and is also branched to the filter 212. The filter 212 applies the transfer characteristic of the transfer function C to the Lch audio signal and outputs the result to the adder 213.
Likewise, the Rch audio signal is input to the adder 213 and is also branched to the filter 214. The filter 214 applies the transfer characteristic of the transfer function C to the Rch audio signal and outputs the result to the adder 211.

The adder 211 adds the Lch audio signal and the Rch audio signal to which the transfer characteristic of the transfer function C has been applied, and outputs the sum. The signal output from the adder 211 is the original Lch audio signal with the component corresponding to the transfer characteristic of the crosstalk path from the speaker SP-L in FIG. 19 to the right ear of the listener M removed in advance.
Similarly, the adder 213 adds the Rch audio signal and the Lch audio signal to which the transfer characteristic of the transfer function C has been applied, and outputs the sum. The signal output from the adder 213 is the original Rch audio signal with the component corresponding to the transfer characteristic of the crosstalk path from the speaker SP-R to the left ear of the listener M removed in advance.

The output of the adder 211 passes through the filter 215 and is output as the Lch playback audio signal, and the output of the adder 213 passes through the filter 216 and is output as the Rch playback audio signal. The filters 215 and 216 are provided to correct the frequency characteristics so that they become flat, using, for example, a filter characteristic F.
When the speakers SP-L and SP-R are driven by the Lch and Rch playback audio signals output in this way, the listener M who actually hears the sound emitted from the speakers SP-L and SP-R is in a state equivalent to hearing only the sound along the path from the speaker SP-L to the left ear of the listener M and the path from the speaker SP-R to the right ear of the listener in FIG. 19. In other words, the crosstalk is canceled, and the listener can perceive sound image localization corresponding to the acoustic environment assumed at encoding time, just as when listening through headphones.
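The FIG. 20 structure can be checked numerically at a single frequency bin, treating S, A, and the signals as complex values. This is a sketch under assumptions: the function names are hypothetical, the values of S and A in the usage are arbitrary, and the flattening characteristic F = S/(S² − A²) is a choice derived from the symmetric model rather than a value given in the text.

```python
def fig20_canceller(l, r, s, a):
    """One-bin sketch of the FIG. 20 crosstalk canceller (symmetric layout).

    l, r: complex Lch/Rch input values at this frequency bin
    s:    direct-path transfer value S (Hsll = Hsrr)
    a:    crosstalk-path transfer value A (Hslr = Hsrl)
    """
    c = -a / s                 # filters 212 and 214: C = -A/S
    sum_l = l + c * r          # adder 211: Lch plus C-filtered Rch
    sum_r = r + c * l          # adder 213: Rch plus C-filtered Lch
    f = s / (s * s - a * a)    # filters 215/216: assumed flattening characteristic F
    return f * sum_l, f * sum_r


def ear_signals(spk_l, spk_r, s, a):
    """Symmetric FIG. 19 model: each ear hears S from its own speaker, A from the other."""
    left_ear = s * spk_l + a * spk_r
    right_ear = a * spk_l + s * spk_r
    return left_ear, right_ear


if __name__ == "__main__":
    s, a = 0.9 + 0.2j, 0.4 - 0.3j   # arbitrary example values for one bin
    l, r = 1.0 + 0.0j, 0.5j
    spk_l, spk_r = fig20_canceller(l, r, s, a)
    # The ears receive l and r, up to floating-point rounding.
    print(ear_signals(spk_l, spk_r, s, a))
```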

FIG. 21 shows another configuration example of the crosstalk cancellation signal processing system in the speaker drive unit 202.
In the configuration shown in this figure, the Lch signal is input to a filter 221 and a filter 222. The filter 221 performs filtering with a filter characteristic F1, and the filter 222 performs filtering with a filter characteristic F2.
The Rch signal is likewise filtered by a filter 223 with a filter characteristic F3 and by a filter 224 with a filter characteristic F4.
The adder 211 sums the outputs of the filters 221 and 223 to produce the Lch playback audio signal, and the adder 213 sums the outputs of the filters 222 and 224 to produce the Rch playback audio signal.
In terms of the transfer functions of FIG. 19, the filter characteristics F1, F2, F3, and F4 of the filters 221, 222, 223, and 224 are expressed as follows.
F1 = Hsrr / (Hsll × Hsrr − Hslr × Hsrl)
F2 = −Hslr / (Hsll × Hsrr − Hslr × Hsrl)
F3 = −Hsrl / (Hsll × Hsrr − Hslr × Hsrl)
F4 = Hsll / (Hsll × Hsrr − Hslr × Hsrl)
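Written as matrices, F1 to F4 are the entries of the inverse of the 2×2 matrix of speaker-to-ear transfer functions, with Hsll × Hsrr − Hslr × Hsrl as its determinant; this is why the FIG. 21 form does not depend on a symmetric speaker placement. The following one-bin sketch checks that property numerically. The function names and the example transfer values are hypothetical, and the sketch assumes the determinant is nonzero at the bin in question.

```python
def fig21_canceller(l, r, h_sll, h_slr, h_srl, h_srr):
    """One-bin sketch of the FIG. 21 crosstalk canceller (general layout)."""
    det = h_sll * h_srr - h_slr * h_srl  # common denominator of F1..F4
    if det == 0:
        raise ZeroDivisionError("speaker-to-ear transfer matrix is singular at this bin")
    f1 = h_srr / det     # filter 221: Lch -> Lch output
    f2 = -h_slr / det    # filter 222: Lch -> Rch output
    f3 = -h_srl / det    # filter 223: Rch -> Lch output
    f4 = h_sll / det     # filter 224: Rch -> Rch output
    out_l = f1 * l + f3 * r  # adder 211
    out_r = f2 * l + f4 * r  # adder 213
    return out_l, out_r


def ear_signals(spk_l, spk_r, h_sll, h_slr, h_srl, h_srr):
    """FIG. 19 model: each ear sums both speaker signals weighted by their path transfer values."""
    left_ear = h_sll * spk_l + h_srl * spk_r
    right_ear = h_slr * spk_l + h_srr * spk_r
    return left_ear, right_ear


if __name__ == "__main__":
    # Arbitrary, deliberately asymmetric transfer values for one bin.
    h = (1.0 + 0.1j, 0.3 - 0.2j, 0.5 + 0.1j, 0.8 - 0.05j)
    l, r = 0.7 - 0.2j, -0.4 + 0.9j
    # The ears receive l and r, up to floating-point rounding.
    print(ear_signals(*fig21_canceller(l, r, *h), *h))
```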

In the configuration of FIG. 21 as well, the signals output from the adders 211 and 213 have the same composition as the signals output from the adders 211 and 213 of FIG. 20. Therefore, when the speakers SP-L and SP-R are driven by the Lch and Rch playback audio signals produced by the configuration of FIG. 21, the listener can likewise perceive sound image localization equivalent to listening through headphones.
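The equivalence can be seen by substituting the symmetric model Hsll = Hsrr = S and Hslr = Hsrl = A into F1 to F4. The result coincides with the FIG. 20 output after the filters 215 and 216 when the flattening characteristic is taken to be F = S/(S² − A²); that particular F is an assumption made for this comparison, not a value stated in the text.

```latex
\begin{aligned}
F_1 = F_4 &= \frac{S}{S^2 - A^2} = F, \\
F_2 = F_3 &= \frac{-A}{S^2 - A^2} = F \cdot \Bigl(-\frac{A}{S}\Bigr) = F\,C, \\
\text{Lch output} &= F_1 L + F_3 R = F\,(L + C\,R), \\
\text{Rch output} &= F_2 L + F_4 R = F\,(R + C\,L).
\end{aligned}
```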

In the description so far, the encoding device 1 handles the channel configuration Lch, Cch, Rch, LSch, RSch as the original channels and the two-channel configuration Lch, Rch as the encode channels. This channel configuration is merely an example, however, and both the original-channel side and the encode-channel side may be changed. It is also sufficient for the channel configuration of the same content to differ before and after encoding, so the number of channels may even be the same before and after encoding: even with the same number of channels, the channel configurations differ if, for example, the sound source positions of the channels differ.
Furthermore, the decoding device 2 of the embodiment receives an audio source encoded by the encoding device 1 and decodes it into decode channels with the same channel configuration as the original channels. However, the channel configuration of the decode channels need not match that of the original channels handled by the encoding device 1 and may be a different configuration. Such a decoding device can be realized by setting the characteristics given to the correction processing unit 41 in consideration of transfer functions based on a model of the post-decoding channel configuration.

Furthermore, although the encoding device 1 and the decoding device 2 of the present embodiment have been described as being provided separately in a recording system and a playback system, respectively, it is also possible to build a recording/playback apparatus or recording/playback system that includes the configurations of both the encoding device 1 and the decoding device 2 of the present embodiment.

The configurations of the encoding device 1 and the decoding device 2 of the present embodiment described so far can be physically implemented as, for example, audio equipment with sound recording and playback functions. The signal processing system can also be implemented as a program. When the functions of the encoding device and decoding device of the present embodiment are implemented by a program, the signal processing for encoding and decoding is carried out by a CPU or similar processor executing the program. Such a program can be written to and stored in, for example, a ROM of a device that provides the audio playback function, at the time of manufacture. The program can also be stored on a removable storage medium (magnetic disk, optical disc, semiconductor memory, and so on), from which various devices, including personal computers, read and execute it. Alternatively, the program stored on the storage medium can be installed on a device, which then executes the installed program. The program may also be stored in a storage device of a server on a network, so that various devices temporarily acquire it over the network and execute it, or install it over the network and then execute the installed program.
The present invention is not limited to the embodiments described so far and can be modified as appropriate. For example, in the present embodiment the spatial transfer function of sound describes the path from a sound source to a listener's ear, so it can be regarded as synonymous with the head-related transfer function. However, the target position reached by the sound from a source may correspond to something other than a listener's ear. In that case, a spatial transfer function in its original sense is used as the transfer characteristic representing the path from the sound source to the target position.

FIG. 1 is a diagram showing an example of the input/output channel configuration handled by the encoding device according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the input/output channel configuration handled by the decoding device according to an embodiment of the present invention.
FIG. 3 is a diagram showing a model in which the channel configuration of the audio source encoded by the encoding device of the present embodiment is used as the sound sources.
FIG. 4 is a diagram showing a configuration example of the encoding device of the embodiment.
FIG. 5 is a diagram showing a configuration example of a filter in the encoding device of the embodiment.
FIG. 6 is a diagram showing a configuration example of the decoding device of the embodiment.
FIG. 7 is a diagram conceptually showing a configuration example of a channel signal separation block in the decoding device of the embodiment.
FIG. 8 is a diagram showing a configuration example of a channel signal separation block in the decoding device of the embodiment.
FIG. 9 is a diagram comparing the frequency characteristics of the transfer function Hll and its inverse characteristic.
FIG. 10 is a diagram showing an example of a function used by a coefficient generation unit to set multiplier coefficients according to the level ratio.
FIG. 11 is a diagram showing an example of a function used by a coefficient generation unit to set multiplier coefficients according to the phase difference.
FIG. 12 is a diagram showing an impulse response waveform in a reverberant environment and a response waveform consisting only of the direct sound portion extracted from it.
FIG. 13 is a diagram showing another configuration example of the decoding device of the embodiment.
FIG. 14 is a diagram showing examples of the propagation time difference and level difference produced by the transfer functions given to the audio of the same sound source.
FIG. 15 is a diagram showing a configuration example of a recording system including the encoding device of the embodiment.
FIG. 16 is a diagram showing a configuration example of a playback system including the decoding device of the embodiment.
FIG. 17 is a diagram showing a configuration example for changing the parameters of a channel signal separation block according to an identification signal.
FIG. 18 is a diagram showing a configuration example of a playback system that reproduces an audio source encoded by the encoding device of the embodiment.
FIG. 19 is a diagram showing an acoustic model for a two-channel sound source.
FIG. 20 is a diagram showing a configuration example for crosstalk cancellation provided in the speaker drive unit of FIG. 18.
FIG. 21 is a diagram showing another configuration example for crosstalk cancellation provided in the speaker drive unit of FIG. 18.
FIG. 22 is a diagram showing a configuration example of a conventional encoding technique.

Explanation of symbols

1 encoding device, 2 decoding device, 6 headphones, 11a-15a, 11b-15b filters, 16a, 16b, 31a, 31b fast Fourier transform units, 32-L, 32-C, 32-R, 32-LS, 32-LR channel signal separation blocks, 33-L, 33-C, 33-R, 33-LS, 33-LR inverse Fourier transform units, 41a, 41b, 41A correction processing units, 42 separation processing unit, 51 level/phase comparison processing block, 52 sound source separation function calculation block, 53, 54, 65, 66, 67, 68 coefficient units, 55 adder, 61 level comparison unit, 62, 64 coefficient generation units, 63 phase comparison unit, 100 encode unit, 101 media recording unit, 102 medium, 200 decode unit, 201 media playback unit, 400 parameter setting unit

Claims

Audio signal generation means, provided for each decode channel, that receives audio signals of encode channels and generates an audio signal component corresponding to one specific channel among the decode channels, the encode-channel audio signals having been generated by giving each audio signal component corresponding to a decode channel of a predetermined channel configuration a transfer characteristic represented by a spatial transfer function obtained based on the position of the sound source of that decode channel, and by distributing these audio signal components according to the channel configuration of the encode channels,
Each of the audio signal generation means includes:
Correction means for correcting, for each input encode-channel audio signal, the transfer characteristic given to the audio signal component of the decode channel corresponding to that audio signal generation means;
Proximity detection means for detecting a predetermined approximation between the signals corrected by the correction means;
Separation means for separating and outputting, based on the detection result of the proximity detection means, signal components that approximate each other from the per-encode-channel signals output from the correction means;
Channel audio signal output means for adding the signal components separated by the separation means and outputting the sum as the audio signal of the corresponding decode channel;
An audio signal processing apparatus.

The correction means is
Configured to execute, on each input encode-channel audio signal, filtering that gives an inverse characteristic based on the transfer characteristic given to the audio signal of the decode channel to which the audio signal generation means corresponds,
The audio signal processing apparatus according to claim 1.

The correction means is
Configured to execute filtering that gives the inverse characteristic of the direct sound portion of the impulse response of the transfer characteristic given to the audio signal of the decode channel to which the audio signal generation means corresponds,
The audio signal processing apparatus according to claim 2.

The correction means is
Configured to execute filtering that gives the inverse characteristic of the transfer characteristic in an anechoic environment given to the audio signal of the decode channel to which the audio signal generation means corresponds,
The audio signal processing apparatus according to claim 2.

The proximity detection means is
Configured to detect the approximation in phase of the per-encode-channel signals after correction by the correction means,
The audio signal processing apparatus according to claim 2.

The proximity detection means is
Configured to detect the approximation in level of the per-encode-channel signals after correction by the correction means,
The audio signal processing apparatus according to claim 2.

The correction means is
Configured to execute processing that corrects, between the input per-encode-channel audio signals, the propagation time difference of the audio signal component of the corresponding decode channel caused by the given transfer characteristic,
The proximity detection means is
Configured to detect, as the approximation, the propagation time difference of the per-encode-channel signals after correction by the correction means,
The audio signal processing apparatus according to claim 1.

The correction means is
Further configured to execute processing that corrects, between the input per-encode-channel audio signals, the level difference of the audio signal component of the corresponding decode channel caused by the given transfer characteristic,
The proximity detection means is
Further configured to execute processing that corrects the level difference of the per-encode-channel signals after correction by the correction means,
The audio signal processing apparatus according to claim 7.

An encoding apparatus that converts a set of audio signals of an original channel having a predetermined channel configuration into a set of audio signals of an encode channel having a predetermined channel configuration other than the original channel, and outputs the set;
A decoding device that inputs a set of audio signals of an encoding channel that constitutes a predetermined channel configuration and converts the set to an audio signal set of a decoding channel that constitutes a predetermined channel configuration;
The encoding device is
Transfer characteristic imparting means, provided for each combination of an original channel and an encode channel, for giving the input audio signal a transfer characteristic represented by a spatial transfer function set based on the position of the sound source of the corresponding original channel;
Adders, provided for each encode channel, that receive and add the signals processed by the respective transfer characteristic imparting means and output the sum as the audio signal of the corresponding encode channel;
The decoding device is
Audio signal separation means for separating an audio signal component corresponding to one specific channel in the decode channel is provided for each decode channel,
Each of the audio signal separation means is
Correction means for correcting the transfer characteristic given to the audio signal component of the corresponding decode channel for the input audio signal for each encode channel;
Proximity detection means for detecting a predetermined approximation for a signal for each encoding channel after correction by the correction means;
Separation means for separating and outputting signal components that are approximated to each other from signals for each encoding channel output from the signal correction means based on the detection result of the proximity detection means;
Channel audio signal output means for adding the signal components separated by the separation means and outputting as a corresponding decode channel audio signal;
An audio signal processing system.
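The system claim can be pictured end to end with a toy model. The sketch below makes several assumptions. There are two encode channels (Lt, Rt) and three original/decode channels (C, LS, RS). Each channel pair has one complex response, applied identically to every frequency bin, and no two sources overlap in any bin. The response values, the tolerances and the 0.5 scaling of the summed output are hypothetical choices, not values from the patent. `encode` plays the role of the transfer characteristic imparting means and adders. `decode_channel` performs correction, approximation detection, separation and addition for one decode channel.

```python
import cmath
import math

# Hypothetical per-bin responses (into Lt, into Rt) for each channel.
RESPONSES = {
    "C": (0.7 + 0.0j, 0.7 + 0.0j),
    "LS": (cmath.rect(0.9, -0.3), cmath.rect(0.3, -1.1)),
    "RS": (cmath.rect(0.3, -1.1), cmath.rect(0.9, -0.3)),
}

def encode(sources):
    """Impart each source's response and sum into Lt/Rt, bin by bin."""
    n = len(next(iter(sources.values())))
    lt, rt = [0j] * n, [0j] * n
    for ch, bins in sources.items():
        hl, hr = RESPONSES[ch]
        for k, s in enumerate(bins):
            lt[k] += s * hl
            rt[k] += s * hr
    return lt, rt

def decode_channel(lt, rt, ch, tol_db=1.0, tol_rad=0.2):
    """Recover the bins of decode channel `ch` from Lt/Rt."""
    hl, hr = RESPONSES[ch]
    out = []
    for a, b in zip(lt, rt):
        ca, cb = a / hl, b / hr                      # correction
        similar = False
        if ca != 0 and cb != 0:                      # approximation detection
            r = ca / cb
            similar = (abs(20.0 * math.log10(abs(r))) <= tol_db
                       and abs(cmath.phase(r)) <= tol_rad)
        # Separation keeps similar bins; output adds the two corrected parts
        # (0.5 scaling is an assumed normalisation).
        out.append(0.5 * (ca + cb) if similar else 0j)
    return out

lt, rt = encode({"C": [1, 0, 0], "LS": [0, 2, 0], "RS": [0, 0, 3]})
for ch in RESPONSES:
    print(ch, [round(abs(v), 6) for v in decode_channel(lt, rt, ch)])
```

Run as is, the sketch should print `C [1.0, 0.0, 0.0]`, `LS [0.0, 2.0, 0.0]` and `RS [0.0, 0.0, 3.0]`: each decode channel recovers only its own source. Bins holding a mixture of sources would generally fail the similarity test. This is why the toy assumes non-overlapping sources.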

An audio signal generation procedure is executed for each decode channel. It receives the audio signals of encode channels, which are generated by imparting to each audio signal component corresponding to a decode channel of a predetermined channel configuration a transfer characteristic represented by a spatial transfer function obtained on the basis of the position of the sound source serving as that decode channel, and by distributing these audio signal components according to the channel configuration of the encode channels. From these it generates the audio signal component corresponding to one specific decode channel;
the audio signal generation procedure for each decode channel comprises:
a correction procedure for correcting, for the input audio signal of each encode channel, the transfer characteristic imparted to the audio signal component of the decode channel handled by that audio signal generation procedure;
a proximity detection procedure for detecting a predetermined approximation between the signals corrected by the correction procedure;
a separation procedure for separating and outputting, on the basis of the detection result of the proximity detection procedure, the signal components that approximate one another from the signals for the respective encode channels obtained by the correction procedure; and
a channel audio signal output procedure for adding the signal components separated by the separation procedure and outputting the sum as the audio signal of the corresponding decode channel.
A program for causing an information processing apparatus to execute the above procedures.

An encoding process for converting a set of audio signals of original channels forming a predetermined channel configuration into a set of audio signals of encode channels forming a different predetermined channel configuration, and outputting the converted set; and
a decoding process for receiving a set of audio signals of encode channels forming a predetermined channel configuration and converting it into a set of audio signals of decode channels forming a predetermined channel configuration, are executed by an information processing apparatus;
the encoding process includes:
a transfer characteristic imparting procedure, executed for each original channel in correspondence with each encode channel, for imparting to the input audio signal a transfer characteristic represented by a spatial transfer function set on the basis of the position of the sound source serving as the corresponding original channel; and
an addition procedure, provided for each encode channel, for receiving and adding the signals processed by the respective transfer characteristic imparting procedures and outputting the sum as the audio signal of the corresponding encode channel, these procedures being executed by the information processing apparatus;
the decoding process includes
an audio signal separation procedure, executed for each decode channel, for separating the audio signal component corresponding to one specific decode channel;
each audio signal separation procedure includes:
a correction procedure for correcting, for the input audio signal of each encode channel, the transfer characteristic imparted to the audio signal component of the corresponding decode channel;
a proximity detection procedure for detecting a predetermined approximation between the signals for the respective encode channels after correction by the correction procedure;
a separation procedure for separating and outputting, on the basis of the detection result of the proximity detection procedure, the signal components that approximate one another from the signals for the respective encode channels obtained by the correction procedure; and
a channel audio signal output procedure for adding the signal components separated by the separation procedure and outputting the sum as the audio signal of the corresponding decode channel, this procedure being executed by the information processing apparatus.
A program characterized as described above.
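In software, the transfer characteristic imparting procedure and the addition procedure of the encoding process could be realised as FIR filtering followed by summation. This sketch assumes that each spatial transfer function is available as a short impulse response per encode channel. The dictionary layout, the function names and the coefficient values are illustrative assumptions.

```python
def convolve(x, h):
    """Direct-form linear convolution of two sample lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def encode_time_domain(sources, hrirs):
    """Mix original channels into encode channels.

    sources: {original channel: samples}
    hrirs:   {original channel: (FIR for encode ch 0, FIR for encode ch 1, ...)}
    """
    n_enc = len(next(iter(hrirs.values())))
    length = max(len(x) + len(h) - 1
                 for name, x in sources.items() for h in hrirs[name])
    outs = [[0.0] * length for _ in range(n_enc)]
    for name, x in sources.items():
        for k, h in enumerate(hrirs[name]):          # imparting procedure
            for i, v in enumerate(convolve(x, h)):   # addition procedure
                outs[k][i] += v
    return outs

sources = {"C": [1.0, 0.0], "LS": [0.0, 1.0]}
hrirs = {"C": ([0.5], [0.5]), "LS": ([1.0, 0.0], [0.0, 0.25])}
lt, rt = encode_time_domain(sources, hrirs)
print(lt, rt)  # [0.5, 1.0, 0.0] [0.5, 0.0, 0.25]
```

Direct convolution keeps the sketch dependency-free. For realistic impulse-response lengths, an FFT-based convolution would normally be preferred.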