JP2004226991A

JP2004226991A - Speech signal transmitting method and speech signal decoding device

Info

Publication number: JP2004226991A
Application number: JP2004053488A
Authority: JP
Inventors: Yoshiaki Tanaka; 美昭田中; Shoji Ueno; 昭治植野; Norihiko Fuchigami; 徳彦渕上
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2004-02-27
Filing date: 2004-02-27
Publication date: 2004-08-12
Anticipated expiration: 2018-11-16
Also published as: JP4151018B2

Abstract

PROBLEM TO BE SOLVED: To enable the reproduction side to reproduce normally even when multi-channel audio is selectively transmitted in compressed or uncompressed form, or when downmixing on the reception side is selectively permitted or prohibited. SOLUTION: The ATSI includes a first identifier indicating whether the multi-channel data in an audio packet is compressed and a second identifier indicating whether downmixing of the multi-channel data into two stereo channels is permitted or prohibited. Thus, when data is received in which both of these identifiers have been packetized and transmitted, the data can be decoded and reproduced normally. COPYRIGHT: (C)2004,JPO&NCIPI

Description

The present invention relates to a method for transmitting a multi-channel audio signal and to a decoding device therefor.

As a method of variable-length compression of audio signals, the present inventors proposed, in an earlier application (Japanese Patent Application No. 9-289159), a predictive coding method for a one-channel original digital audio signal. In that method, a plurality of predictors with different characteristics each calculate a linear prediction value of the current signal from past signals in the time domain, a prediction residual is calculated for each predictor from the original digital audio signal and these linear prediction values, and the minimum prediction residual is selected.

With the above method, a certain degree of compression can be obtained when the original digital audio signal has a sampling frequency of 96 kHz and about 20 quantization bits. Recent DVD audio discs, however, tend to use twice that sampling frequency (192 kHz) and 24 quantization bits. In addition, the sampling frequency and the number of quantization bits of a multi-channel signal may differ from channel to channel.

When a multi-channel audio signal is transmitted, the copyright holder may or may not want compression, depending on the audio source, and may or may not want the user to downmix the multi-channel signal to two stereo channels for playback. Transmission can therefore take four forms in total: the two options of transmitting compressed or uncompressed, combined with the two options of permitting or prohibiting downmixing on the reproduction side. The reproduction side must identify which form was used and reproduce accordingly.

Accordingly, an object of the present invention is to provide an audio receiving method and an audio receiving apparatus with which the reproduction side can reproduce normally even when downmixing on the reproduction side is selectively permitted or prohibited.

To achieve the above object, the present invention comprises the following means 1) to 3).
1) An audio signal transmission method characterized by transmitting, in a communication format using predetermined packets, data having a data structure comprising:
an audio data area (audio packet) in which compressed or uncompressed data of a multi-channel audio signal is selectively arranged; and
a management information area in which are arranged a first identifier indicating whether the multi-channel data in the audio data area (audio packet) is compressed, a second identifier indicating whether downmixing the multi-channel data in the audio data area (audio packet) into two stereo channels is permitted or prohibited, and a third identifier indicating the structure of the multi-channel.

2) An audio signal recording method characterized by encoding and recording data in a data structure comprising:
an audio data area (audio packet) in which compressed or uncompressed data of a multi-channel audio signal is selectively arranged; and
a management information area in which are arranged a first identifier indicating whether the multi-channel data in the audio data area (audio packet) is compressed, a second identifier indicating whether downmixing the multi-channel data in the audio data area (audio packet) into two stereo channels is permitted or prohibited, and a third identifier indicating the structure of the multi-channel.

3) An audio signal decoding device for decoding data recorded by the recording method according to claim 2, comprising:
means for separating the data into audio packets and management information; and
means for, when the second identifier in the management information permits downmixing, selectively decompressing or not decompressing the multi-channel data in the audio packet based on the first identifier and reproducing it as both multi-channel and two-channel stereo, and, when the downmix identifier prohibits downmixing, selectively decompressing or not decompressing the multi-channel data in the audio packet based on the first identifier, identifying the channels by the third identifier, and reproducing it as multi-channel only.

As described above, according to the present invention, when receiving data in which, for example, an identifier indicating whether the multi-channel data is compressed and an identifier indicating whether downmixing the multi-channel data to two stereo channels is permitted or prohibited have both been packetized and transmitted, the data can be decoded and reproduced normally.

Embodiments of the present invention will now be described with reference to the drawings. FIGS. 1 to 4 are explanatory diagrams showing the processing of an audio encoding apparatus that realizes the multi-channel transmission forms to which the present invention is applied.

The following four systems, for example, are known as multi-channel systems.
(1) 4-channel system: a total of four channels, three front channels (L, C, R) plus one rear channel (S), as in the Dolby Surround system
(2) 5-channel system: a total of five channels, three front channels (L, C, R) plus two rear channels (SL, SR), as in the Dolby AC-3 system without the SW channel
(3) 6-channel system: six channels (L, C, R, SW (Lfe), SL, SR), as in the DTS (Digital Theater System) system or the Dolby AC-3 system
(4) 8-channel system: a total of eight channels, six front channels (L, LC, C, RC, R, SW) plus two rear channels (SL, SR), as in the SDDS (Sony Dynamic Digital Sound) system
FIG. 1 shows, as a first example of the transmission form, the case where the multi-channel signal is compressed and downmixing on the reproduction side is prohibited. The 6-channel (ch) mix & matrix circuit 1' on the encoding side takes, as an example of a multi-channel signal, six channels of PCM data, namely front left (Lf), center (C), front right (Rf), surround left (Ls), surround right (Rs) and Lfe (Low Frequency Effect), converts them into six channels of correlation signals "1" to "6" according to the following equation (1-1), and outputs them to the encoding unit 2'.

"1" = Lf + Rf − C
"2" = Lf − Rf − C
"3" = C − (Ls + Rs)/2
"4" = Ls + Rs
"5" = Ls − Rs
"6" = Lfe − a × C
where 0 ≤ a ≤ 1 ... (1-1)
The correlation formula used by the 6-channel (ch) mix & matrix circuit 1' and the coding scheme of the encoding unit 2' are selected by the selection means 7'. The same applies to FIGS. 2, 3, 4, 5 and 6 described below, so the selection means 7' is omitted from those figures.

The encoding unit 2', which comprises first and second encoding units 2'-1 and 2'-2, predictively encodes the PCM data of the six channels "1" to "6" as shown in detail in FIG. 7, and transmits the predictively encoded data to the decoding side via the recording medium 5 or the communication medium 6 as a bit stream such as that shown in FIG. 8. On the decoding side, the decoding unit 3', which comprises first and second decoding units 3'-1 and 3'-2, decodes the predictively encoded data of the six channels "1" to "6" into PCM data as shown in detail in FIG. 14, and the mix & matrix circuit 4' then restores only the original six channels (Lf, C, Rf, Ls, Rs, Lfe) based on equation (1-1).
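The matrixing of equation (1-1) and its inversion on the decoding side can be sketched as follows. This is only an illustrative sketch: the function names, the use of exact fractions, and the default value of the coefficient a are assumptions, and the text does not say how the circuits handle rounding or word length.

```python
from fractions import Fraction

# Hypothetical default for the coefficient a (the text only requires 0 <= a <= 1).
DEFAULT_A = Fraction(1, 2)

def matrix_encode(lf, c, rf, ls, rs, lfe, a=DEFAULT_A):
    """Equation (1-1): six PCM samples -> correlation signals "1".."6"."""
    s1 = lf + rf - c
    s2 = lf - rf - c
    s3 = c - Fraction(ls + rs, 2)
    s4 = ls + rs
    s5 = ls - rs
    s6 = lfe - a * c
    return s1, s2, s3, s4, s5, s6

def matrix_decode(s1, s2, s3, s4, s5, s6, a=DEFAULT_A):
    """Algebraic inverse of equation (1-1): restore (Lf, C, Rf, Ls, Rs, Lfe)."""
    ls = Fraction(s4 + s5, 2)
    rs = Fraction(s4 - s5, 2)
    c = s3 + Fraction(s4, 2)
    rf = Fraction(s1 - s2, 2)
    lf = Fraction(s1 + s2, 2) + c
    lfe = s6 + a * c
    return lf, c, rf, ls, rs, lfe

samples = (100, -20, 35, 7, -9, 3)      # Lf, C, Rf, Ls, Rs, Lfe
restored = matrix_decode(*matrix_encode(*samples))
```

Because every step is a sum, difference or halving, the six original channels are recovered exactly when exact arithmetic is used.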

FIG. 2 shows, as a second example of the transmission form, the case where the multi-channel signal is compressed and downmixing on the reproduction side is permitted. The 6ch mix & matrix circuit 1' on the encoding side generates (downmixes) stereo 2ch data (L, R) from the original six channels (Lf, C, Rf, Ls, Rs, Lfe) and the coefficients mij (i = 1, 2; j = 1, 2, ..., 6) according to the following equation (2).

L = m11·Lf + m12·Rf + m13·C + m14·Ls + m15·Rs + m16·Lfe
R = m21·Lf + m22·Rf + m23·C + m24·Ls + m25·Rs + m26·Lfe ... (2)
Then, using equation (2) and the following equation (1-2), the signals are converted into correlation signals "1" and "2" for the two channels of a first group and correlation signals "3" to "6" for the four channels of a second group, which are output to the first encoding unit 2'-1 and the second encoding unit 2'-2, respectively.

"1" = L + R
"2" = L − R
"3" to "6" are the same as in equation (1-1) ... (1-2)
The first and second encoding units 2'-1 and 2'-2 predictively encode the PCM data of the first group channels "1" and "2" and the second group channels "3" to "6", respectively, and transmit the predictively encoded data of each channel to the decoding side via the recording medium 5 or the communication medium 6. On the decoding side, the first and second decoding units 3'-1 and 3'-2 decode the predictively encoded data of the first group channels "1" and "2" and the second group channels "3" to "6", respectively, into PCM data. The mix & matrix circuit 4' then restores the original six channels (Lf, C, Rf, Ls, Rs, Lfe) based on equations (1-2) and (2), and generates the stereo 2ch data (L, R) by adding and subtracting the first group channels "1" and "2".
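As a rough illustration of equations (2) and (1-2), the sketch below downmixes six channels to stereo, forms the first-group signals "1" and "2", and recovers L and R by addition and subtraction. The coefficient values and function names are hypothetical, and the halving on the decoding side is an assumption that follows from the algebra; the text itself only states that the channels are added and subtracted.

```python
from fractions import Fraction

def downmix(channels, m):
    """Equation (2). channels = (Lf, Rf, C, Ls, Rs, Lfe); m = two rows of six coefficients."""
    left = sum(coef * x for coef, x in zip(m[0], channels))
    right = sum(coef * x for coef, x in zip(m[1], channels))
    return left, right

def to_first_group(left, right):
    """Equation (1-2): correlation signals "1" = L + R and "2" = L - R."""
    return left + right, left - right

def from_first_group(s1, s2):
    """Recover (L, R) from "1" and "2" by adding and subtracting (then halving)."""
    return (s1 + s2) / 2, (s1 - s2) / 2

half, quarter = Fraction(1, 2), Fraction(1, 4)
# Hypothetical downmix coefficient table m_ij (not taken from the source).
M = [
    [1, 0, half, half, 0, quarter],   # m11..m16
    [0, 1, half, 0, half, quarter],   # m21..m26
]

L, R = downmix((100, 35, -20, 7, -9, 4), M)
s1, s2 = to_first_group(L, R)
```

A player that is only interested in stereo output needs just the first-group substream for this step.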

FIG. 3 shows, as a third example of the transmission form, the case where the multi-channel signal is transmitted without compression and downmixing on the reproduction side is prohibited. Because the data is uncompressed, the encoding side generates no correlation signals and transmits the PCM data of the original six channels (Lf, C, Rf, Ls, Rs, Lfe) as it is (after formatting). The decoding side deformats the data and then restores only the original six channels (Lf, C, Rf, Ls, Rs, Lfe).

FIG. 4 shows, as a fourth example of the transmission form, the case where the multi-channel signal is transmitted without compression and downmixing on the reproduction side is permitted. Here too the data is uncompressed, so the encoding side does not generate the correlation signals used to raise the compression ratio and transmits the PCM data of the original six channels (Lf, C, Rf, Ls, Rs, Lfe) as it is (after formatting). The decoding side deformats the data, restores the original six channels (Lf, C, Rf, Ls, Rs, Lfe), and generates (downmixes) the stereo 2ch data (L, R) according to equation (2).

FIG. 5 shows a modification of FIG. 1, in which the multi-channel signal is compressed and downmixing on the reproduction side is prohibited. In this case, the encoding side converts the signals into six channels of correlation signals "1" to "6" according to the following equation (1-3), and the encoding unit 2' predictively encodes them. The decoding side then restores only the original six channels (Lf, C, Rf, Ls, Rs, Lfe) according to equation (1-2).

"1" = Lf − C
"2" = Rf − C
"3" to "6" are the same as in equation (1-1) ... (1-3)
When downmixing on the reproduction side is prohibited in this way, the downmix coefficients of equation (2) are correspondingly not included in the encoding, and generating (downmixing) the stereo 2ch data (L, R) by equation (2) on the encoding side is prohibited.

FIG. 6 shows a modification of FIG. 2, in which the multi-channel signal is compressed and downmixing on the reproduction side is permitted. In this case, the encoding side generates (downmixes) the stereo 2ch data (L, R) according to equation (2) and then, according to the following equation (1-4), converts the signals into the two channels "1" and "2" of the first group and the correlation signals "3" to "6" for the four channels of the second group. The first and second encoding units 2'-1 and 2'-2 predictively encode these group channels. The decoding side restores the original six channels (Lf, C, Rf, Ls, Rs, Lfe) according to equations (1-4) and (2) and outputs the stereo 2ch data (L, R) as it is.

"1" = L
"2" = R
"3" to "6" are the same as in equation (1-1) ... (1-4)
The encoding units 2'-1 and 2'-2 will now be described in detail with reference to FIG. 7. The PCM data of each of the channels "1" to "6" is stored frame by frame in the one-frame buffer 10. The sample data of one frame of each channel "1" to "6" is applied to the prediction circuits 13D1, 13D2 and 15D1 to 15D4, respectively, and the first sample data of each frame of each channel is applied to the formatting circuit 19. For the PCM data of its channel, each of the prediction circuits 13D1, 13D2 and 15D1 to 15D4 uses a plurality of predictors (not shown) with different characteristics to calculate a plurality of linear prediction values of the current signal from past signals in the time domain, and then calculates a prediction residual for each predictor from the original PCM data and these linear prediction values. The buffer/selectors 14D1, 14D2 and 16D1 to 16D4 that follow temporarily store the prediction residuals calculated by the prediction circuits 13D1, 13D2 and 15D1 to 15D4, respectively, and select the minimum prediction residual for each subframe specified by the selection signal/DTS (decoding time stamp) generator 17.

The selection signal/DTS generator 17 applies the bit number flag of the prediction residual to the packing circuit 18 and the formatting circuit 19. It also applies to the formatting circuit 19 the predictor selection flag indicating the predictor with the minimum prediction residual, the correlation coefficient a, and the DTS indicating the time at which the decoding side takes stream data out of the input buffer 22a (FIG. 14). The packing circuit 18 packs the six channels of prediction residuals selected by the buffer/selectors 14D1, 14D2 and 16D1 to 16D4 using the number of bits specified by the bit number flag from the selection signal/DTS generator 17. The PTS generator 17c generates a PTS (presentation time stamp) indicating the time at which the decoding side takes PCM data out of the output buffer 110 (FIG. 14) and outputs it to the formatting circuit 19. The coding mode indicating compressed/uncompressed and the identifier indicating downmix permitted/prohibited are also applied to the formatting circuit 19.
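The per-subframe predictor selection and bit-count decision described above can be sketched as follows. The three fixed predictors are hypothetical stand-ins, since the source does not give the predictor characteristics, and "minimum prediction residual" is interpreted here as the smallest peak magnitude within the subframe. For simplicity the sketch treats samples before the frame start as zero instead of carrying the frame's first sample separately.

```python
# Hypothetical predictors: each maps the two previous samples to a prediction.
PREDICTORS = [
    lambda p1, p2: 0,            # no prediction
    lambda p1, p2: p1,           # previous sample
    lambda p1, p2: 2 * p1 - p2,  # linear extrapolation
]

def residuals(samples, start, end, predictor):
    """Prediction residuals of samples[start:end] for one predictor."""
    out = []
    for n in range(start, end):
        p1 = samples[n - 1] if n >= 1 else 0
        p2 = samples[n - 2] if n >= 2 else 0
        out.append(samples[n] - predictor(p1, p2))
    return out

def bits_needed(res):
    """Bits sufficient to hold every residual as a signed value (bit number flag)."""
    peak = max(abs(r) for r in res)
    return peak.bit_length() + 1

def encode_frame(samples, subframe_len):
    """Return one (predictor flag, bit count, residuals) tuple per subframe."""
    encoded = []
    for start in range(0, len(samples), subframe_len):
        end = min(start + subframe_len, len(samples))
        candidates = [residuals(samples, start, end, p) for p in PREDICTORS]
        flag = min(range(len(PREDICTORS)),
                   key=lambda i: max(abs(r) for r in candidates[i]))
        encoded.append((flag, bits_needed(candidates[flag]), candidates[flag]))
    return encoded

def decode_frame(encoded):
    """Rebuild samples by running the flagged predictor and adding the residual."""
    samples = []
    for flag, _bits, res in encoded:
        for r in res:
            n = len(samples)
            p1 = samples[n - 1] if n >= 1 else 0
            p2 = samples[n - 2] if n >= 2 else 0
            samples.append(r + PREDICTORS[flag](p1, p2))
    return samples

frame = [10, 12, 14, 16, 18, 20, 22, 24]
enc = encode_frame(frame, subframe_len=4)
```

The decoder needs only the flag, the bit count and the residuals, which is why the same predictor set must exist on both sides.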

The formatting circuit 19 that follows formats the data into user data as shown in FIGS. 8 to 13. The user data (subpacket) shown in FIG. 8 consists of a variable-rate bit stream (substream) BS0 containing the predictively encoded data of the two channels "1" and "2" of the front group, a variable-rate bit stream (substream) BS1 containing the predictively encoded data of the four channels "3" to "6" of the other group, and a bit stream header (restart header) placed before the substreams BS0 and BS1.

In one frame of the substreams BS0 and BS1, the following are multiplexed:
- a frame header;
- the first sample data of one frame of each channel "1" to "6";
- a predictor selection flag for each subframe of each channel "1" to "6";
- a bit number flag for each subframe of each channel "1" to "6";
- a prediction residual data string (variable number of bits) for each channel "1" to "6"; and
- the coefficient a of channel "6".
With this predictive coding, a compression ratio of 71% can be achieved when the original signal has, for example, a sampling frequency of 96 kHz, 24 quantization bits and six channels.

When the variable-rate bit stream data predictively encoded by the encoding units 2'-1 and 2'-2 shown in FIG. 7 is recorded on a DVD audio disc as an example of a recording medium, it is packed into the audio (A) pack shown in FIG. 9. This pack consists of 2034 bytes of user data (A packet, V packet) to which a 14-byte pack header is added, made up of 4 bytes of pack start information, 6 bytes of SCR (System Clock Reference) information, 3 bytes of Mux rate information and 1 byte of stuffing (1 pack = 2048 bytes in total). By setting the SCR information, which is a time stamp, to "1" in the first pack and making it continuous within the same title, the time of the A packs within the same title can be managed.
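The byte budget of the pack can be checked with a few lines. The sketch below splits a 2048-byte pack into the four header fields and the user data using only the sizes given above; the field names and the function are illustrative assumptions, and the field contents are not interpreted.

```python
PACK_SIZE = 2048

# (field name, size in bytes), in the order given in the text; names are illustrative.
PACK_HEADER_LAYOUT = [
    ("pack_start", 4),
    ("scr", 6),
    ("mux_rate", 3),
    ("stuffing", 1),
]
PACK_HEADER_SIZE = sum(size for _, size in PACK_HEADER_LAYOUT)   # 14 bytes
USER_DATA_SIZE = PACK_SIZE - PACK_HEADER_SIZE                    # 2034 bytes

def split_pack(pack: bytes):
    """Split one pack into raw header fields and the user data (packet) bytes."""
    if len(pack) != PACK_SIZE:
        raise ValueError(f"expected {PACK_SIZE} bytes, got {len(pack)}")
    fields = {}
    offset = 0
    for name, size in PACK_HEADER_LAYOUT:
        fields[name] = pack[offset:offset + size]
        offset += size
    return fields, pack[offset:]

fields, user_data = split_pack(bytes(range(256)) * 8)
```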

As shown in detail in FIG. 10, the A packet of compressed PCM consists of a 19- or 14-byte packet header, a compressed PCM private header, and 1 to 2011 bytes of audio data (compressed PCM) in the format shown in FIG. 11. The DTS and PTS are set in the packet header of FIG. 5 (specifically, the PTS in bytes 10 to 14 and the DTS in bytes 15 to 19 of the packet header). The compressed PCM private header consists of:
- a 1-byte substream ID;
- a 2-byte UPC/EAN-ISRC (Universal Product Code/European Article Number-International Standard Recording Code) number and UPC/EAN-ISRC data;
- a 1-byte private header length;
- a 2-byte first access unit pointer;
- 8 bytes of audio data information (ADI); and
- 0 to 7 stuffing bytes.

In addition, a forward access unit search pointer for searching for the access unit one second ahead and a backward access unit search pointer for searching for the access unit one second back are each set in the ADI as 1 byte. Specifically, the forward access unit search pointer is set in byte 7 of the ADI and the backward access unit search pointer in byte 8.

The audio data area in the compressed PCM (also called PPCM) audio packet shown in FIG. 10 consists of a subpacket and a plurality of PPCM access units, as shown in FIG. 11, and each PPCM access unit consists of PPCM sync information and a subpacket. The subpacket in the first PPCM access unit consists of a directory, substream "0", a CRC, substream "1", a CRC and extra information, and substreams "0" and "1" consist of PPCM blocks only. The subpackets in the second and subsequent PPCM access units have no directory and consist of substream "0", a CRC, substream "1", a CRC and extra information, and substreams "0" and "1" consist of a restart header and PPCM blocks.

The PPCM sync information (hereinafter also called synchronization information) includes the following.
- Number of samples per packet: 40, 80 or 160 is selected according to the sampling frequency fs.
- Data rate: "0" in the case of VBR (an identifier indicating that the data in the subpacket is compressed data)
- Sampling frequency fs and quantization bit number Qb
- Channel assignment information
To manage the audio packs shown in FIGS. 8 to 11, the formatting circuit 19 also formats the ATSI (Audio Title Set Information), which contains management information as shown in FIGS. 12 and 13. FIG. 12 shows the AOTT-AOB-ATR (audio only title audio object set attribute). The AOTT-AOB-ATR (b127 to b0) consists of the following, in order from the MSB side:
- 8 bits (b127 to b120): audio coding mode;
- 8 bits (b119 to b112): reserved area;
- 4 bits (b111 to b108): quantization bit number Q1 of channel group "1";
- 4 bits (b107 to b104): quantization bit number Q2 of channel group "2";
- 4 bits (b103 to b100): sampling frequency fs1 of channel group "1";
- 4 bits (b99 to b96): sampling frequency fs2 of channel group "2";
- 3 bits (b95 to b93): type of multi-channel structure;
- 5 bits (b92 to b88): channel assignment; and
- 8 bits × 11 (b87 to b0): reserved area.
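The bit layout listed above can be unpacked with simple shifts and masks. The following is a sketch only: it assumes the 128 bits arrive as 16 bytes with b127 as the most significant bit of the first byte, and the field names and function name are illustrative, not taken from any specification.

```python
# (field name, highest bit, lowest bit) for AOTT-AOB-ATR, b127 = MSB.
ATR_FIELDS = [
    ("audio_coding_mode", 127, 120),
    ("q1", 111, 108),
    ("q2", 107, 104),
    ("fs1", 103, 100),
    ("fs2", 99, 96),
    ("multichannel_type", 95, 93),
    ("channel_assignment", 92, 88),
]

def parse_aott_aob_atr(raw: bytes) -> dict:
    """Extract the raw (undecoded) field codes from a 16-byte attribute."""
    if len(raw) != 16:
        raise ValueError("AOTT-AOB-ATR must be 16 bytes (128 bits)")
    value = int.from_bytes(raw, "big")   # assumed byte order
    fields = {}
    for name, hi, lo in ATR_FIELDS:
        width = hi - lo + 1
        fields[name] = (value >> lo) & ((1 << width) - 1)
    return fields

# Example: compressed PCM, Q1 = 0010b, Q2 = 0001b, fs1 = 0001b, fs2 = 0000b,
# multi-channel type 000b, channel assignment 00101b, remaining bytes reserved.
example = bytes([0x01, 0x00, 0x21, 0x10, 0x05]) + bytes(11)
atr = parse_aott_aob_atr(example)
```

Here `audio_coding_mode` plays the role of the first identifier (compressed or not) and `multichannel_type` that of the third identifier (multi-channel structure).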

The above data is described in detail below.
(1) Audio coding mode (b127 to b120)
00000000b: linear PCM mode
00000001b: compressed PCM mode
Others: reserved for other coding modes
(2) Quantization bit number Q1 of channel group 1 (b111 to b108)
0000b: 16 bits
0001b: 20 bits
0010b: 24 bits
Others: reserved
(3) Quantization bit number Q2 of channel group 2 (b107 to b104)
- "0000b" when the quantization bit number Q1 of channel group 1 is "0000b"
- "0000b" or "0001b" when the quantization bit number Q1 of channel group 1 is "0001b"
- "0000b", "0001b" or "0010b" when the quantization bit number Q1 of channel group 1 is "0010b"
where 0000b: 16 bits
0001b: 20 bits
0010b: 24 bits
Others: reserved
(4) Sampling frequency fs1 of channel group 1 (b103 to b100)
0000b: 48 kHz
0001b: 96 kHz
0010b: 192 kHz
1000b: 44.1 kHz
1001b: 88.2 kHz
1010b: 176.4 kHz
Others: reserved
(5) Sampling frequency fs2 of channel group 2 (b99 to b96)
- "0000b" when the sampling frequency fs1 of channel group 1 is "0000b"
- "0000b" or "0001b" when the sampling frequency fs1 of channel group 1 is "0001b"
- "0000b", "0001b" or "0010b" when the sampling frequency fs1 of channel group 1 is "0010b"
- "1000b" when the sampling frequency fs1 of channel group 1 is "1000b"
- "1000b" or "1001b" when the sampling frequency fs1 of channel group 1 is "1001b"
- "1000b", "1001b" or "1010b" when the sampling frequency fs1 of channel group 1 is "1010b"
(6) Type of multi-channel structure (b95 to b93)
000b: type 1
Others: reserved
(7) Channel assignment (b92 to b88)
Channel assignment information of groups "1" and "2", from 1 channel (monaural) to 6 channels
FIG. 13 shows the ATS-PG-CNT (audio title set program content), which consists of the following, in order from the top:
- 1 bit (b31): relationship between the previous and current PG (R/A);
- 1 bit (b30): STC discontinuity flag (STC-F);
- 3 bits (b29 to b27): number of attributes (ATRN);
- 3 bits (b26 to b24): bit shift data of channel group (ChGr) "2";
- 2 bits (b23, b22): reserved area;
- 1 bit (b21): downmix mode (D-M);
- 1 bit (b20): validity of the downmix coefficients (marked * in the figure);
- 4 bits (b19 to b16): downmix coefficient table number (DM-COEFTN); and
- 16 bits in total (b15 to b0): RTI flags F15 to F0, 1 bit each.

そして、ビット（ｂ２１）のダウンミクスモード（Ｄ−Ｍ）が「１」の場合に「ダウンミクス禁止」、「０」の場合に「ダウンミクス許可」を表す。 When the downmix mode (DM) of the bit (b21) is “1”, “downmix prohibition” is indicated, and when “0”, “downmix permission” is indicated.
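
As an illustration of the ATS-PG-CNT layout described above, the following Python sketch unpacks a 32-bit word into its fields. The class name `AtsPgCnt`, the function `parse_ats_pg_cnt`, and the field names are hypothetical and chosen only for this example; only the bit positions and the meaning of the downmix mode bit come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtsPgCnt:
    """Fields of the 32-bit ATS-PG-CNT word (names are illustrative)."""
    ra: int                  # b31: relationship between previous and current PG
    stc_f: int               # b30: STC discontinuity flag
    atrn: int                # b29-b27: number of attributes
    chgr2_bit_shift: int     # b26-b24: bit shift data of channel group 2
    downmix_mode: int        # b21: 1 = downmix prohibited, 0 = permitted
    downmix_coef_valid: int  # b20: validity of the downmix coefficient
    dm_coeftn: int           # b19-b16: downmix coefficient table number
    rti_flags: int           # b15-b0: RTI flags F15..F0

    @property
    def downmix_permitted(self) -> bool:
        return self.downmix_mode == 0


def parse_ats_pg_cnt(word: int) -> AtsPgCnt:
    """Split a 32-bit word into fields; b23-b22 are reserved and ignored."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("ATS-PG-CNT must fit in 32 bits")

    def bits(hi: int, lo: int) -> int:
        # Extract the inclusive bit range hi..lo.
        return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

    return AtsPgCnt(
        ra=bits(31, 31),
        stc_f=bits(30, 30),
        atrn=bits(29, 27),
        chgr2_bit_shift=bits(26, 24),
        downmix_mode=bits(21, 21),
        downmix_coef_valid=bits(20, 20),
        dm_coeftn=bits(19, 16),
        rti_flags=bits(15, 0),
    )
```

A player could consult `downmix_permitted` before enabling a stereo downmix path; how the control unit actually reads the field is not specified here.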

次に図１４を参照して復号化部３’（３’−１、３’−２）について説明する。なお、この復号化部３’（３’−１、３’−２）とミクス＆マトリクス回路４’は、ハードウエアの他にコンピュータプログラムよっても実現することができる。上記フォーマットの可変レートビットストリームデータＢＳ０、ＢＳ１は、デフォーマット化回路２１により分離される。そして、各ｃｈ「１」〜「６」の１フレームの先頭サンプルデータと予測器選択フラグはそれぞれ予測回路２４Ｄ１、２４Ｄ２、２３Ｄ１〜２３Ｄ４に印加され、各ｃｈ「１」〜「６」のビット数フラグはアンパッキング回路２２に印加される。また、ＳＣＲと、ＤＴＳと予測残差データ列は入力バッファ２２ａに印加され、ＰＴＳは出力バッファ１１０に印加される。また、圧縮／非圧縮などを示す符号化モードと、ダウンミクス許可／禁止を示す識別子は制御部１００に印加され、サンプリング周波数ｆｓ及び量子化ビット数ＱｂはＤ／Ａ変換器１０２に印加される。ここで、予測回路２４Ｄ１、２４Ｄ２、２３Ｄ１〜２３Ｄ４内の複数の予測器（不図示）はそれぞれ、符号化側の予測回路１３Ｄ１、１３Ｄ２、１５Ｄ１〜１５Ｄ４内の複数の予測器と同一の特性であり、予測器選択フラグにより同一特性のものが選択される。 Next, the decoding unit 3' (3'-1, 3'-2) will be described with reference to FIG. 14. The decoding unit 3' (3'-1, 3'-2) and the mix & matrix circuit 4' can be realized by a computer program as well as by hardware. The variable-rate bit stream data BS0 and BS1 in the above format are separated by the deformatting circuit 21. The head sample data of one frame and the predictor selection flag of each of channels "1" to "6" are applied to the prediction circuits 24D1, 24D2, and 23D1 to 23D4, respectively, and the bit number flag of each of channels "1" to "6" is applied to the unpacking circuit 22. The SCR, the DTS, and the prediction residual data string are applied to the input buffer 22a, and the PTS is applied to the output buffer 110. The encoding mode, which indicates compression, non-compression, and the like, and the identifier indicating downmix permission or prohibition are applied to the control unit 100, and the sampling frequency fs and the number of quantization bits Qb are applied to the D/A converter 102. Here, the plurality of predictors (not shown) in the prediction circuits 24D1, 24D2, and 23D1 to 23D4 have the same characteristics as the plurality of predictors in the encoding-side prediction circuits 13D1, 13D2, and 15D1 to 15D4, respectively, and the predictor with the same characteristics is selected by the predictor selection flag.

デフォーマット化回路２１により分離されたストリームデータ（予測残差データ列）は、図１５に示すようにＳＣＲによりアクセスユニット毎に入力バッファ２２ａに取り込まれて蓄積される。ここで、１つのアクセスユニットのデータ量は、例えばｆｓ＝９６ｋＨｚの場合には（１／９６ｋＨｚ）秒分であるが、図１６、図１７（ａ）に詳しく示すように可変長である。そして、入力バッファ２２ａに蓄積されたストリームデータはＤＴＳに基づいてＦＩＦＯで読み出されてアンパッキング回路２２に印加される。 The stream data (predicted residual data string) separated by the deformatting circuit 21 is fetched and accumulated in the input buffer 22a for each access unit by the SCR as shown in FIG. Here, the data amount of one access unit is (1/96 kHz) seconds when fs = 96 kHz, for example, but has a variable length as shown in detail in FIGS. 16 and 17A. Then, the stream data stored in the input buffer 22a is read out by the FIFO based on the DTS and applied to the unpacking circuit 22.
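
The timestamp-driven readout of the input buffer can be sketched as follows. This is a minimal model under stated assumptions: the class `InputBuffer` and its methods are hypothetical, timestamps are plain integers, and the caller is assumed to invoke `write` at the time given by the SCR. Only the FIFO order and the DTS-based readout come from the text.

```python
from collections import deque
from typing import Deque, List, Tuple


class InputBuffer:
    """FIFO of variable-length access units, released when the clock reaches their DTS."""

    def __init__(self) -> None:
        self._fifo: Deque[Tuple[int, bytes]] = deque()

    def write(self, dts: int, access_unit: bytes) -> None:
        # Called once per access unit, at the time indicated by its SCR.
        self._fifo.append((dts, access_unit))

    def read_due(self, clock: int) -> List[bytes]:
        # Pop, in FIFO order, every access unit whose DTS has been reached.
        due = []
        while self._fifo and self._fifo[0][0] <= clock:
            due.append(self._fifo.popleft()[1])
        return due

    def occupancy(self) -> int:
        # Bytes currently held; varies because access units are variable-length.
        return sum(len(unit) for _, unit in self._fifo)
```

The units returned by `read_due` would then be handed to the unpacking stage.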

アンパッキング回路２２は各ｃｈ「１」〜「６」の予測残差データ列をビット数フラグ毎に基づいて分離してそれぞれ予測回路２４Ｄ１、２４Ｄ２、２３Ｄ１〜２３Ｄ４に出力する。予測回路２４Ｄ１、２４Ｄ２、２３Ｄ１〜２３Ｄ４ではそれぞれ、アンパッキング回路２２からの各ｃｈ「１」〜「６」の今回の予測残差データと、内部の複数の予測器の内、予測器選択フラグにより選択された各１つにより予測された前回の予測値が加算されて今回の予測値が算出され、次いで１フレームの先頭サンプルデータを基準として各サンプルのＰＣＭデータが算出されて出力バッファ１１０に蓄積される。出力バッファ１１０に蓄積されたＰＣＭデータはＰＴＳに基づいて読み出されて出力され、したがって、図１７（ａ）に示す可変長のアクセスユニットが伸長されて、図１７（ｂ）に示す一定長のプレゼンテーションユニットが出力される。 The unpacking circuit 22 separates the prediction residual data string of each of channels "1" to "6" based on the bit number flags and outputs the separated data to the prediction circuits 24D1, 24D2, and 23D1 to 23D4, respectively. In each of the prediction circuits 24D1, 24D2, and 23D1 to 23D4, the current prediction residual data of the corresponding channel "1" to "6" from the unpacking circuit 22 is added to the previous predicted value produced by the one predictor selected by the predictor selection flag from among the plurality of internal predictors, giving the current predicted value. The PCM data of each sample is then calculated with the head sample data of one frame as the reference and stored in the output buffer 110. The PCM data stored in the output buffer 110 is read out and output based on the PTS, so that the variable-length access units shown in FIG. 17(a) are decompressed and the fixed-length presentation units shown in FIG. 17(b) are output.
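
The reconstruction performed in each prediction circuit can be sketched as below. The two predictors in `PREDICTORS` are assumptions made for illustration (the actual predictor characteristics are not given in this passage), as are the function names. What the sketch takes from the text is the structure: a frame carries a head sample, a predictor selection flag, and residuals, and each sample is the selected predictor's output plus the residual.

```python
from typing import List, Sequence, Tuple


def predict_previous(history: Sequence[int]) -> int:
    # Hypothetical predictor 0: repeat the previous sample.
    return history[-1]


def predict_linear(history: Sequence[int]) -> int:
    # Hypothetical predictor 1: linear extrapolation from the last two samples.
    if len(history) < 2:
        return history[-1]
    return 2 * history[-1] - history[-2]


PREDICTORS = (predict_previous, predict_linear)


def decode_frame(head_sample: int, residuals: Sequence[int], selector: int) -> List[int]:
    """Rebuild PCM samples from the head sample and the residual sequence."""
    predict = PREDICTORS[selector]
    samples = [head_sample]
    for residual in residuals:
        samples.append(predict(samples) + residual)
    return samples


def encode_frame(samples: Sequence[int], selector: int) -> Tuple[int, List[int]]:
    """Encoder-side counterpart, included only to show the round trip."""
    predict = PREDICTORS[selector]
    residuals = [samples[i] - predict(samples[:i]) for i in range(1, len(samples))]
    return samples[0], residuals
```

Because encoder and decoder use predictors with identical characteristics, the round trip is exact for integer PCM samples.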

また、ＰＰＣＭシンク情報内のサンプリング周波数ｆｓ及び量子化ビット数Ｑｂに基づいて、ＰＣＭデータがＤ／Ａ変換器１０２によりアナログ信号に変換される。ここで、操作部１０１を介してサーチ再生が指示された場合には、制御部１００により図５に示す前方アクセスユニット・サーチポインタ（１秒先）と後方アクセスユニット・サーチポインタ（１秒前）に基づいてアクセスユニットを再生する。このサーチポインタとしては、１秒先、１秒前の代わりに２秒先、２秒前のものでよい。 The PCM data is converted into an analog signal by the D/A converter 102 based on the sampling frequency fs and the number of quantization bits Qb in the PPCM sync information. When search playback is instructed via the operation unit 101, the control unit 100 plays back access units based on the forward access unit search pointer (one second ahead) and the backward access unit search pointer (one second behind) shown in FIG. 5. Instead of one second ahead and one second behind, the search pointers may point two seconds ahead and two seconds behind.

符号化部２’（２’−１、２’−２）により予測符号化された可変レートビットストリームデータをネットワークを介して伝送する場合には、符号化側では図１８に示すように伝送用にパケット化し（ステップＳ４１）、次いでパケットヘッダを付与し（ステップＳ４２）、次いでこのパケットをネットワーク上に送り出す（ステップＳ４３）。 When the variable-rate bit stream data predictively encoded by the encoding unit 2' (2'-1, 2'-2) is transmitted via a network, the encoding side packetizes the data for transmission as shown in FIG. 18 (step S41), then adds a packet header (step S42), and then sends the packet out onto the network (step S43).

復号側では図１９（Ａ）に示すようにヘッダを除去し（ステップＳ５１）、次いでデータを復元し（ステップＳ５２）、次いでこのデータをメモリに格納して復号を待つ（ステップＳ５３）。そして、復号を行う場合には図１９（Ｂ）に示すように、デフォーマット化を行い（ステップＳ６１）、次いで入力バッファ２２ａの入出力制御を行い（ステップＳ６２）、次いでアンパッキングを行う（ステップＳ６３）。なお、このとき、サーチ再生指示がある場合にはサーチポインタをデコードする。次いで予測器をフラグに基づいて選択してデコードを行い（ステップＳ６４）、次いで出力バッファ１１０の入出力制御を行い（ステップＳ６５）、次いで元のマルチチャネルを復元し（ステップＳ６６）、次いでこれを出力し（ステップＳ６７）、以下、これを繰り返す。 On the decoding side, as shown in FIG. 19(A), the header is removed (step S51), the data is restored (step S52), and the data is stored in memory to await decoding (step S53). When decoding is performed, as shown in FIG. 19(B), deformatting is performed (step S61), input/output control of the input buffer 22a is performed (step S62), and unpacking is performed (step S63). At this time, if there is a search playback instruction, the search pointer is decoded. Next, a predictor is selected based on the flag and decoding is performed (step S64), input/output control of the output buffer 110 is performed (step S65), the original multi-channel signal is restored (step S66), and the result is output (step S67). These steps are then repeated.
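
Steps S41 to S43 and S51 to S53 can be sketched as a packetize and depacketize round trip. The header layout used here (a 16-bit sequence number followed by a 16-bit payload length, big-endian) is purely an assumption for illustration; the text does not define the packet header, and the function names are hypothetical.

```python
import struct
from typing import List

# Hypothetical 4-byte header: sequence number and payload length, big-endian.
HEADER = struct.Struct(">HH")


def packetize(stream: bytes, payload_size: int) -> List[bytes]:
    """Split the bit stream into payloads (S41) and prepend a header to each (S42)."""
    packets = []
    for seq, offset in enumerate(range(0, len(stream), payload_size)):
        payload = stream[offset:offset + payload_size]
        packets.append(HEADER.pack(seq & 0xFFFF, len(payload)) + payload)
    return packets


def depacketize(packets: List[bytes]) -> bytes:
    """Strip each header (S51) and rejoin the payloads in arrival order (S52)."""
    parts = []
    for packet in packets:
        _seq, length = HEADER.unpack_from(packet)
        parts.append(packet[HEADER.size:HEADER.size + length])
    return b"".join(parts)
```

The restored byte string corresponds to the data that is stored in memory to await decoding in step S53. Loss and reordering handling are omitted from this sketch.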

次に図２０、図２１を参照して第２の実施形態について説明する。上記の実施形態では、１グループの相関性の信号「１」〜「６」を予測符号化するように構成されているが、この第２の実施形態では複数グループの相関性のある信号を生成して予測符号化し、圧縮率が最も高いグループの予測符号化データを選択するように構成されている。このため図２０に示す符号化部では、第１〜第ｎの相関回路１−１〜１−ｎが設けられ、このｎ個の相関回路１−１〜１−ｎは例えば６ch（Ｌｆ、Ｃ、Ｒｆ、Ｌｓ、Ｒｓ、Ｌｆｅ）のＰＣＭデータを、相関性が異なるｎ種類の６ch信号「１」〜「６」に変換する。 Next, a second embodiment will be described with reference to FIGS. 20 and 21. In the embodiment described above, one group of correlated signals "1" to "6" is predictively encoded. In this second embodiment, a plurality of groups of correlated signals are generated and predictively encoded, and the predictively encoded data of the group with the highest compression ratio is selected. For this purpose, the encoding unit shown in FIG. 20 is provided with first to n-th correlation circuits 1-1 to 1-n, and these n correlation circuits 1-1 to 1-n convert, for example, 6-channel (Lf, C, Rf, Ls, Rs, Lfe) PCM data into n types of 6-channel signals "1" to "6" with different correlations.

例えば第１の相関回路１−１は以下のように変換し、 For example, the first correlation circuit 1-1 converts as follows:
(1) = Lf
(2) = C - (Ls + Rs) / 2
(3) = Rf - Lf
(4) = Ls - a × Lfe
(5) = Rs - b × Rf
(6) = Lfe
また、第ｎの相関回路１−ｎは以下のように変換する。 The n-th correlation circuit 1-n converts as follows:
(1) = Lf + Rf
(2) = C - Lf
(3) = Rf - Lf
(4) = Ls - Lf
(5) = Rs - Lf
(6) = Lfe - C

また、相関回路１−１〜１−ｎ毎に予測回路１５とバッファ・選択器１６が設けられ、グループ毎の予測残差の最小値のデータ量に基づいて圧縮率が最も高いグループが相関選択信号生成器１７ｂにより選択される。このとき、フォーマット化回路１９はその選択フラグ（相関回路選択フラグ、その相関回路の相関係数ａ、ｂ）を追加して多重化する。 A prediction circuit 15 and a buffer/selector 16 are provided for each of the correlation circuits 1-1 to 1-n, and the correlation selection signal generator 17b selects the group with the highest compression ratio based on the data amount of the minimum prediction residuals of each group. At this time, the formatting circuit 19 adds the selection flag (the correlation circuit selection flag and the correlation coefficients a and b of that correlation circuit) and multiplexes it.

また、図２１に示す復号化側では、符号化側の相関回路１−１〜１−ｎに対してｎ個の相関回路４−１〜４−ｎ（又は係数ａ、ｂが変更可能な１つの相関回路４）が設けられる。なお、図２０に示すｎグループの予測回路が同一の構成である場合、復号装置では図２１に示すようにｎグループ分の予測回路を設ける必要はなく、１つのグループ分の予測回路でよい。そして、符号化装置から伝送された選択フラグに基づいて相関回路４−１〜４−ｎの１つを選択、又は係数ａ、ｂを設定して元の６ch（Ｌｆ、Ｃ、Ｒｆ、Ｌｓ、Ｒｓ、Ｌｆｅ）を復元し、また、式（２）によりマルチチャネルをダウンミクスしてステレオ２chデータ（Ｌ、Ｒ）を生成する。 On the decoding side shown in FIG. 21, n correlation circuits 4-1 to 4-n (or a single correlation circuit 4 whose coefficients a and b can be changed) are provided, corresponding to the encoding-side correlation circuits 1-1 to 1-n. If the n groups of prediction circuits shown in FIG. 20 have the same configuration, the decoding device need not be provided with n groups of prediction circuits, as shown in FIG. 21; prediction circuits for one group suffice. Then, based on the selection flag transmitted from the encoding device, one of the correlation circuits 4-1 to 4-n is selected, or the coefficients a and b are set, to restore the original 6 channels (Lf, C, Rf, Ls, Rs, Lfe), and the multi-channel signal is downmixed according to equation (2) to generate stereo 2-channel data (L, R).
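
The two correlation transforms of this embodiment and their decoder-side inverses can be sketched as follows. The function names and the flag values (0 for the first circuit, 1 for the n-th) are hypothetical, and exact `Fraction` arithmetic is used only so that the round trip is exact; how rounding is handled for integer PCM data is not stated in this passage.

```python
from fractions import Fraction
from typing import Sequence, Tuple

Six = Tuple[Fraction, ...]  # always six values in this sketch


def correlate_first(ch: Sequence, a: Fraction, b: Fraction) -> Six:
    """Forward transform of the first correlation circuit 1-1."""
    Lf, C, Rf, Ls, Rs, Lfe = map(Fraction, ch)
    return (Lf, C - (Ls + Rs) / 2, Rf - Lf, Ls - a * Lfe, Rs - b * Rf, Lfe)


def restore_first(sig: Sequence, a: Fraction, b: Fraction) -> Six:
    """Inverse transform: recover (Lf, C, Rf, Ls, Rs, Lfe)."""
    s1, s2, s3, s4, s5, s6 = map(Fraction, sig)
    Lf, Lfe = s1, s6
    Rf = s3 + Lf
    Ls = s4 + a * Lfe
    Rs = s5 + b * Rf
    C = s2 + (Ls + Rs) / 2
    return (Lf, C, Rf, Ls, Rs, Lfe)


def correlate_nth(ch: Sequence) -> Six:
    """Forward transform of the n-th correlation circuit 1-n."""
    Lf, C, Rf, Ls, Rs, Lfe = map(Fraction, ch)
    return (Lf + Rf, C - Lf, Rf - Lf, Ls - Lf, Rs - Lf, Lfe - C)


def restore_nth(sig: Sequence) -> Six:
    """Inverse transform of the n-th correlation circuit."""
    s1, s2, s3, s4, s5, s6 = map(Fraction, sig)
    Lf = (s1 - s3) / 2
    Rf = (s1 + s3) / 2
    C = s2 + Lf
    return (Lf, C, Rf, s4 + Lf, s5 + Lf, s6 + C)


def restore(flag: int, sig: Sequence, a: Fraction = Fraction(0), b: Fraction = Fraction(0)) -> Six:
    """Decoder-side dispatch on the transmitted selection flag (flag values assumed)."""
    return restore_first(sig, a, b) if flag == 0 else restore_nth(sig)
```

For example, with `a = Fraction(1, 2)` and `b = Fraction(1, 4)`, `restore(0, correlate_first(pcm, a, b), a, b)` returns the original six channels, which is why the coefficients must be transmitted with the selection flag.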

また、上記の第１の実施形態では、１種類の相関性の信号「１」〜「６」を予測符号化するように構成されているが、この信号「１」〜「６」のグループと原信号（Ｌｆ、Ｃ、Ｒｆ、Ｌｓ、Ｒｓ、Ｌｆｅ）のグループを予測符号化し、圧縮率が高い方のグループを選択するようにしてもよい。 In the first embodiment described above, one type of correlated signals "1" to "6" is predictively encoded. Alternatively, both the group of signals "1" to "6" and the group of original signals (Lf, C, Rf, Ls, Rs, Lfe) may be predictively encoded, and the group with the higher compression ratio selected.

本発明によれば、特許請求の範囲に記載した発明の他に、次のような発明が提供される。 According to the present invention, the following inventions are provided in addition to the inventions described in the claims.

マルチチャネルの音声信号が圧縮されたデータ又は圧縮されないデータを選択的にオーディオパケットに配置するフォーマット化手段と、
前記オーディオパケット内のマルチチャネルデータが圧縮されているか否か、あるいは、前記オーディオパケット内のマルチチャネルデータをステレオ２チャネルにダウンミクスすることを許可するか又は禁止するかによってあらかじめダウンミクスして符号化するか否か、あるいはダウンミクス係数を符号化するか否かを選択する手段とを、
有する音声符号化装置。

An audio encoding device comprising:
formatting means for selectively arranging compressed data or uncompressed data of a multi-channel audio signal in an audio packet; and
means for selecting whether or not to downmix beforehand and then encode, or whether or not to encode downmix coefficients, depending on whether the multi-channel data in the audio packet is compressed, or on whether down-mixing of the multi-channel data in the audio packet into two stereo channels is permitted or prohibited.

本発明が適用されるマルチチャネルの伝送形態の第１の例を示す説明図である。 FIG. 1 is an explanatory diagram showing a first example of a multi-channel transmission mode to which the present invention is applied.
本発明が適用されるマルチチャネルの伝送形態の第２の例を示す説明図である。 FIG. 2 is an explanatory diagram showing a second example of a multi-channel transmission mode to which the present invention is applied.
本発明が適用されるマルチチャネルの伝送形態の第３の例を示す説明図である。 FIG. 3 is an explanatory diagram showing a third example of a multi-channel transmission mode to which the present invention is applied.
本発明が適用されるマルチチャネルの伝送形態の第４の例を示す説明図である。 FIG. 4 is an explanatory diagram showing a fourth example of a multi-channel transmission mode to which the present invention is applied.
図１の変形例を示す説明図である。 FIG. 5 is an explanatory diagram showing a modification of FIG. 1.
図２の変形例を示す説明図である。 FIG. 6 is an explanatory diagram showing a modification of FIG. 2.
図１の符号化部を詳しく示すブロック図である。 FIG. 7 is a block diagram showing the encoding unit of FIG. 1 in detail.
図１、図７の符号化部により符号化されたビットストリームを示す説明図である。 FIG. 8 is an explanatory diagram showing a bit stream encoded by the encoding unit of FIGS. 1 and 7.
ＤＶＤのパックのフォーマットを示す説明図である。 FIG. 9 is an explanatory diagram showing the format of a DVD pack.
ＤＶＤのオーディオパックのフォーマットを示す説明図である。 FIG. 10 is an explanatory diagram showing the format of a DVD audio pack.
図１０のオーディオデータエリアのフォーマットを詳しく示す説明図である。 FIG. 11 is an explanatory diagram showing the format of the audio data area of FIG. 10 in detail.
ＤＶＤオーディオのＡＯＴＴ−ＡＯＢ−ＡＴＲ（オーディオオンリタイトル・オーディオオブジェクトセット・アトリビュート）を示す説明図である。 FIG. 12 is an explanatory diagram showing the AOTT-AOB-ATR (audio only title, audio object set, attribute) of DVD audio.
ＤＶＤオーディオのＡＴＳ−ＰＧ−ＣＮＴ（オーディオタイトルセット・プログラム・コンテンツ）を示す説明図である。 FIG. 13 is an explanatory diagram showing the ATS-PG-CNT (audio title set program content) of DVD audio.
図１の復号化部を詳しく示すブロック図である。 FIG. 14 is a block diagram showing the decoding unit of FIG. 1 in detail.
図１４の入力バッファの書き込み／読み出しタイミングを示すタイミングチャートである。 FIG. 15 is a timing chart showing the write/read timing of the input buffer of FIG. 14.
アクセスユニット毎の圧縮データ量を示す説明図である。 FIG. 16 is an explanatory diagram showing the compressed data amount of each access unit.
アクセスユニットとプレゼンテーションユニットを示す説明図である。 FIG. 17 is an explanatory diagram showing access units and presentation units.
音声伝送方法を示すフローチャートである。 FIG. 18 is a flowchart showing the audio transmission method.
音声伝送方法を示すフローチャートである。 FIG. 19 is a flowchart showing the audio transmission method.
第２の実施形態の音声符号化装置を示すブロック図である。 FIG. 20 is a block diagram showing the audio encoding device of the second embodiment.
第２の実施形態の音声復号装置を示すブロック図である。 FIG. 21 is a block diagram showing the audio decoding device of the second embodiment.

Explanation of reference numerals

１’ ６chミクス＆マトリクス回路
１３Ｄ１，１３Ｄ２，１５Ｄ１〜１５Ｄ４予測回路（バッファ・選択器１４Ｄ１，１４Ｄ２，１６Ｄ１〜１６Ｄ４と共に圧縮手段を構成する。）
１４Ｄ１，１４Ｄ２，１６Ｄ１〜１６Ｄ４バッファ・選択器
１７選択信号／ＤＴＳ生成器
１７ｃＰＴＳ生成器
１９フォーマット化回路
２１デフォーマット化回路（分離手段）
２２アンパッキング回路
２２ａ入力バッファ
２４Ｄ１，２４Ｄ２，２３Ｄ１〜２３Ｄ４予測回路（伸長手段）
１００制御部（再生手段）
１０２Ｄ／Ａ変換器
１１０出力バッファ

1' 6ch mix & matrix circuit
13D1, 13D2, 15D1 to 15D4 Prediction circuits (constitute the compression means together with the buffer/selectors 14D1, 14D2, 16D1 to 16D4)
14D1, 14D2, 16D1 to 16D4 Buffer/selectors
17 Selection signal / DTS generator
17c PTS generator
19 Formatting circuit
21 Deformatting circuit (separation means)
22 Unpacking circuit
22a Input buffer
24D1, 24D2, 23D1 to 23D4 Prediction circuits (decompression means)
100 Control unit (reproduction means)
102 D/A converter
110 Output buffer

Claims

An audio signal transmission method characterized by transmitting, in a predetermined packet communication format, data having a data structure comprising:
an audio data area (audio packet) in which compressed data or uncompressed data of a multi-channel audio signal is selectively arranged; and
an area of management information in which are arranged a first identifier indicating whether or not the multi-channel data in the audio data area (audio packet) is compressed, a second identifier indicating whether down-mixing of the multi-channel data in the audio data area (audio packet) into two stereo channels is permitted or prohibited, and a third identifier indicating the structure of the multi-channel.

An audio signal recording method characterized by encoding an audio signal into, and recording it in, a data structure comprising:
an audio data area (audio packet) in which compressed data or uncompressed data of a multi-channel audio signal is selectively arranged; and
an area of management information in which are arranged a first identifier indicating whether or not the multi-channel data in the audio data area (audio packet) is compressed, a second identifier indicating whether down-mixing of the multi-channel data in the audio data area (audio packet) into two stereo channels is permitted or prohibited, and a third identifier indicating the structure of the multi-channel.

An audio signal decoding device for decoding data recorded by the recording method according to claim 2, comprising:
means for separating the data into audio packets and management information; and
means for, when the second identifier in the management information permits downmixing, selectively decompressing or not decompressing the multi-channel data in the audio packets based on the first identifier and reproducing it in multi-channel and in two stereo channels, and, when the second identifier prohibits downmixing, selectively decompressing or not decompressing the multi-channel data in the audio packets based on the first identifier, identifying the channels by the third identifier, and reproducing it in multi-channel only.