JP2004139101A

JP2004139101A - Optical recording medium and voice decoding device

Info

Publication number: JP2004139101A
Application number: JP2003372807A
Authority: JP
Inventors: Yoshiaki Tanaka; 田中　美昭; Shoji Ueno; 植野　昭治
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2003-10-31
Filing date: 2003-10-31
Publication date: 2004-05-13
Anticipated expiration: 2018-11-16
Also published as: JP3821386B2

Abstract

PROBLEM TO BE SOLVED: To manage the reproduction-side processing time when a multichannel audio signal is encoded at a variable compression ratio.

SOLUTION: Prediction circuits 13D1, 13D2, and 15D1 to 15D4 and buffer/selectors 14D1, 14D2, and 16D1 to 16D4 predictively encode a 6-channel audio signal. A DTS generator 17 generates decoding time stamp information indicating the timing at which compressed data is read from a decoding-side input buffer 22a, according to the amount of predictively encoded data in each channel, and a formatting circuit 19 formats the data into a packet that has a packet header including the decoding time stamp information and user data including the compressed data.

COPYRIGHT: (C)2004, JPO

Description

The present invention relates to an optical recording medium and an audio decoding device for multi-channel audio signals that are compressed with a variable length.

As a method of compressing an audio signal with a variable length, the present inventor proposed a predictive coding method in a prior application (Japanese Patent Application No. H9-289159). For a one-channel original digital audio signal, a plurality of predictors having different characteristics calculate a plurality of linear prediction values of the current signal from past signals in the time domain; a prediction residual is calculated for each predictor from the original digital audio signal and these linear prediction values; and the minimum prediction residual is selected.

The above method achieves a certain degree of compression when the original digital audio signal has a sampling frequency of 96 kHz and about 20 quantization bits. Recent DVD audio discs, however, tend to use twice that sampling frequency (192 kHz) and 24 quantization bits, so the compression ratio needs to be improved. In addition, the sampling frequency and the number of quantization bits of a multi-channel signal may differ from channel to channel.

A compression method such as predictive coding has a variable compression ratio (VBR: variable bit rate), so when a multi-channel audio signal is predictively coded, the amount of data in each channel varies greatly over time. Moreover, such data is transmitted as a single data stream rather than in parallel for each channel.

Therefore, to let the reproduction side (decoding side) reproduce (present) such a variable-length data stream with the channels synchronized, two times must be managed: a decoding time, which indicates when the data stream stored in the input buffer is read out and output to the decoder, and a reproduction time, which indicates when the decoded data stored in the output buffer is read out and output (presented) to a speaker or the like. The reproduction side must also manage the time used for search reproduction of such a variable-length data stream.

Accordingly, an object of the present invention is to provide an optical recording medium and an audio decoding device that make it possible to manage the reproduction-side processing time when a multi-channel audio signal is encoded at a variable compression ratio.

To achieve the above object, the present invention comprises the means described in 1) and 2) below.

That is:
1) An optical recording medium on which formatted packets are recorded, the packets being produced by:
a step of compressing a multi-channel audio signal, for each channel as it is or for each channel obtained by correlating channels with one another, by obtaining a leading sample value in response to the input audio signal, predicting linear prediction values of the current signal from past signals in the time domain with a plurality of linear prediction methods having different characteristics, and selecting the linear prediction method that minimizes the prediction residual obtained from the predicted linear prediction value and the audio signal;
a step of generating an access unit search pointer for search reproduction of the access unit one to two seconds before or one to two seconds after the compressed data; and
a step of formatting a packet having user data that includes a private header containing the access unit search pointer and the compressed data,
wherein the access unit search pointer in the private header is recorded as information for search reproduction of an access unit of the compressed data temporarily stored on the decoding side.
2) An audio decoding device that decodes the original audio signal from data encoded by an audio encoding device, the audio encoding device having:
compression means for compressing a multi-channel audio signal, for each channel as it is or for each channel obtained by correlating channels with one another, by obtaining a leading sample value in response to the input audio signal, predicting linear prediction values of the current signal from past signals in the time domain with a plurality of linear prediction methods having different characteristics, and selecting the linear prediction method that minimizes the prediction residual obtained from the predicted linear prediction value and the audio signal;
timing generation means for generating an access unit search pointer for search reproduction of the access unit one to two seconds before or one to two seconds after the compressed data; and
means for formatting a packet having user data that includes a private header containing the access unit search pointer and the compressed data containing the access unit,
the audio decoding device comprising:
means for separating the private header and the compressed data from the user data in the packet;
an input buffer that stores the separated compressed data;
means for searching for an access unit of the compressed data stored in the input buffer based on the access unit search pointer in the private header; and
decoding means for decoding the access unit of the searched compressed data.

As described above, according to the present invention, the access unit search pointer is set in the packet header, so the reproduction side can perform search reproduction when a multi-channel audio signal is encoded at a variable compression ratio.

Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing a first embodiment of an audio encoding device to which the present invention is applied and the corresponding audio decoding device. FIG. 2 is a block diagram showing the encoding unit of FIG. 1 in detail. FIG. 3 is an explanatory diagram showing the bit stream encoded by the encoding unit of FIGS. 1 and 2. FIG. 4 is an explanatory diagram showing the format of a DVD pack. FIG. 5 is an explanatory diagram showing the format of a DVD audio pack. FIG. 6 is a block diagram showing the decoding unit of FIG. 1 in detail. FIG. 7 is a timing chart showing the write/read timing of the input buffer of FIG. 6. FIG. 8 is an explanatory diagram showing the amount of compressed data for each access unit. FIG. 9 is an explanatory diagram showing access units and presentation units.

The following four systems, for example, are known as multi-channel systems.
(1) 4-channel system: a total of 4 channels, 3 front channels L, C, R plus 1 rear channel S, as in the Dolby Surround system
(2) 5-channel system: a total of 5 channels, 3 front channels L, C, R plus 2 rear channels SL, SR, as in the Dolby AC-3 system without the SW channel
(3) 6-channel system: 6 channels (L, C, R, SW (Lfe), SL, SR), as in the DTS (Digital Theater System) system and the Dolby AC-3 system
(4) 8-channel system: a total of 8 channels, 6 front channels L, LC, C, RC, R, SW plus 2 rear channels SL, SR, as in the SDDS (Sony Dynamic Digital Sound) system

The 6-channel (ch) mix & matrix circuit 1' on the encoding side shown in FIG. 1 takes 6-ch PCM data, namely front left (Lf), center (C), front right (Rf), surround left (Ls), surround right (Rs), and Lfe (Low Frequency Effect), as an example of a multi-channel signal. Using the following equation (1), it converts the data into 2 ch "1" and "2" for the front group and 4 ch "3" to "6" for the other group, outputs the 2 ch "1" and "2" to the first encoding unit 2'-1, and outputs the 4 ch "3" to "6" to the second encoding unit 2'-2.

  "1" = Lf + Rf
  "2" = Lf - Rf
  "3" = C - (Ls + Rs) / 2
  "4" = Ls + Rs
  "5" = Ls - Rs
  "6" = Lfe - a × C
  where 0 ≤ a ≤ 1          ... (1)
As shown in detail in FIG. 2, the first and second encoding units 2'-1 and 2'-2, which constitute the encoding unit 2', predictively encode the PCM data of 2 ch "1" and "2" and of 4 ch "3" to "6", respectively, and transmit the predictively encoded data to the decoding side via the recording medium 5 or the communication medium 6 as a bit stream such as that shown in FIG. 3. On the decoding side, the first and second decoding units 3'-1 and 3'-2, which constitute the decoding unit 3', decode the predictively encoded data of 2 ch "1" and "2" for the front group and of 4 ch "3" to "6" for the other group, respectively, into PCM data, as shown in detail in FIG. 6.
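The conversion of equation (1) and the algebraic inverse that a decoder would need can be sketched as follows. This is only an illustration: the function names are hypothetical, and floating-point division is used for brevity, whereas a lossless coder would need an integer-exact formulation that the text does not describe.

```python
def correlate(lf, c, rf, ls, rs, lfe, a=0.5):
    """Forward transform of equation (1): six PCM samples -> channels "1".."6"."""
    if not 0 <= a <= 1:
        raise ValueError("a must satisfy 0 <= a <= 1")
    return (
        lf + rf,            # "1"
        lf - rf,            # "2"
        c - (ls + rs) / 2,  # "3"
        ls + rs,            # "4"
        ls - rs,            # "5"
        lfe - a * c,        # "6"
    )


def decorrelate(c1, c2, c3, c4, c5, c6, a=0.5):
    """Inverse of equation (1), solved by hand; returns (lf, c, rf, ls, rs, lfe)."""
    lf = (c1 + c2) / 2
    rf = (c1 - c2) / 2
    ls = (c4 + c5) / 2
    rs = (c4 - c5) / 2
    c = c3 + c4 / 2        # because "4" = Ls + Rs
    lfe = c6 + a * c
    return lf, c, rf, ls, rs, lfe


# Arbitrary sample values, chosen only for illustration.
sample = (1000, -200, 900, 50, -30, 12)
restored = decorrelate(*correlate(*sample, a=0.5), a=0.5)
```

The coefficient a has to be the same on both sides, which is consistent with the text multiplexing it into the stream.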

Next, the mix & matrix circuit 4' restores the original 6 ch (Lf, C, Rf, Ls, Rs, Lfe) based on equation (1) and generates stereo 2-ch data (L, R) from the original 6 ch and coefficients mij (i = 1, 2; j = 1 to 6) according to the following equation (2).

  L = m11·Lf + m12·Rf + m13·C + m14·Ls + m15·Rs + m16·Lfe
  R = m21·Lf + m22·Rf + m23·C + m24·Ls + m25·Rs + m26·Lfe          ... (2)
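Equation (2) is a 2×6 matrix applied to the six restored channels. A minimal sketch follows; the coefficient values are hypothetical, since the text does not give any values for mij.

```python
def downmix(channels, m):
    """Apply equation (2).

    channels: (Lf, Rf, C, Ls, Rs, Lfe), in the order the terms appear in equation (2).
    m: two rows of six coefficients, m[i][j] standing for m(i+1)(j+1).
    """
    if len(channels) != 6 or len(m) != 2 or any(len(row) != 6 for row in m):
        raise ValueError("expected 6 channels and a 2x6 coefficient matrix")
    left, right = (sum(mij * x for mij, x in zip(row, channels)) for row in m)
    return left, right


# Hypothetical coefficients, chosen only so the arithmetic is easy to follow.
M = [
    [1.0, 0.0, 0.5, 0.5, 0.0, 0.25],
    [0.0, 1.0, 0.5, 0.0, 0.5, 0.25],
]
stereo = downmix((1000, 900, -200, 50, -30, 12), M)
```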
The encoding units 2'-1 and 2'-2 are described in detail with reference to FIG. 2. The PCM data of each ch "1" to "6" is stored in the one-frame buffer 10 frame by frame. The sample data of each ch "1" to "6" for one frame is applied to the prediction circuits 13D1, 13D2, and 15D1 to 15D4, respectively, and the leading sample data of each frame of each ch "1" to "6" is applied to the formatting circuit 19. For the PCM data of its channel, each of the prediction circuits 13D1, 13D2, and 15D1 to 15D4 uses a plurality of predictors (not shown) having different characteristics to calculate a plurality of linear prediction values of the current signal from past signals in the time domain, and then calculates a prediction residual for each predictor from the original PCM data and these linear prediction values. The subsequent buffer/selectors 14D1, 14D2, and 16D1 to 16D4 each temporarily store the prediction residuals calculated by the prediction circuits 13D1, 13D2, and 15D1 to 15D4 and select the minimum prediction residual for each subframe specified by the selection signal/DTS (decoding time stamp) generator 17.
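The per-subframe selection among several predictors can be sketched as follows. The predictors themselves are not shown in the source, so the three fixed polynomial predictors below are stand-ins, and measuring a residual by the sum of absolute values is likewise an assumption made for illustration.

```python
# Hypothetical predictors: each maps the list of past samples to a prediction.
PREDICTORS = [
    lambda h: 0,                  # order 0: predict zero
    lambda h: h[-1],              # order 1: repeat the previous sample
    lambda h: 2 * h[-1] - h[-2],  # order 2: linear extrapolation
]


def encode_subframe(samples, history):
    """Return (predictor selection flag, residuals) with the smallest total |residual|.

    history must hold at least two samples that precede this subframe.
    """
    best = None
    for flag, predict in enumerate(PREDICTORS):
        past = list(history)
        residuals = []
        for x in samples:
            residuals.append(x - predict(past))  # residual = original - prediction
            past.append(x)
        cost = sum(abs(r) for r in residuals)
        if best is None or cost < best[0]:
            best = (cost, flag, residuals)
    return best[1], best[2]


ramp_flag, ramp_residuals = encode_subframe([30, 40, 50, 60], history=[10, 20])
flat_flag, flat_residuals = encode_subframe([7, 7, 7, 7], history=[7, 7])
```

A steadily rising subframe is captured best by the extrapolating predictor, while a constant one is already captured by the simpler predictor; ties go to the lower index here.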

The selection signal/DTS generator 17 applies a bit-number flag for the prediction residual to the packing circuit 18 and the formatting circuit 19. It also applies to the formatting circuit 19 a predictor selection flag indicating the predictor with the minimum prediction residual, the correlation coefficient a of equation (1), and a DTS indicating the time at which the decoding side takes stream data out of the input buffer 22a (FIG. 6). The packing circuit 18 packs the prediction residuals for 6 ch selected by the buffer/selectors 14D1, 14D2, and 16D1 to 16D4 with the number of bits specified by the bit-number flag from the selection signal/DTS generator 17. The PTS generator 17c generates a PTS (presentation time stamp) indicating the time at which the decoding side takes PCM data out of the output buffer 110 (FIG. 6) and outputs it to the formatting circuit 19.

The subsequent formatting circuit 19 formats the data into user data such as that shown in FIGS. 3 to 5. The user data (sub-packet) shown in FIG. 3 consists of a variable-rate bit stream (substream) BS0 containing the predictively encoded data of 2 ch "1" and "2" for the front group, a variable-rate bit stream (substream) BS1 containing the predictively encoded data of 4 ch "3" to "6" for the other group, and a bit stream header (restart header) placed before the substreams BS0 and BS1. One frame of the substreams BS0 and BS1 multiplexes:
・a frame header,
・the leading sample data of one frame of each ch "1" to "6",
・a predictor selection flag for each subframe of each ch "1" to "6",
・a bit-number flag for each subframe of each ch "1" to "6",
・the prediction residual data string (variable number of bits) of each ch "1" to "6", and
・the coefficient a of ch "6".
With such predictive coding, a compression ratio of 71% can be achieved when the original signal has, for example, a sampling frequency of 96 kHz, 24 quantization bits, and 6 channels.
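To put the 71% figure in context, the raw bit rate of the example signal can be worked out directly. The text does not say which convention "compression ratio" follows; the sketch below assumes it means compressed size divided by original size.

```python
SAMPLING_HZ = 96_000
BITS_PER_SAMPLE = 24
CHANNELS = 6

# Uncompressed linear PCM bit rate in bits per second.
raw_bps = SAMPLING_HZ * BITS_PER_SAMPLE * CHANNELS

# Assumption: "71%" is read as compressed size / original size.
compressed_bps = raw_bps * 71 // 100
```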

When the variable-rate bit stream data predictively encoded by the encoding units 2'-1 and 2'-2 shown in FIG. 2 is recorded on a DVD audio disc, as an example of a recording medium, it is packed into the audio (A) packs shown in FIG. 4. Each pack consists of 2034 bytes of user data (A packet or V packet) preceded by a 14-byte pack header made up of 4 bytes of pack start information, 6 bytes of SCR (System Clock Reference) information, 3 bytes of mux rate information, and 1 byte of stuffing (1 pack = 2048 bytes in total). The SCR information, which is a time stamp, is set to "1" in the first pack and made continuous within the same title, so that the time of the A packs within the same title can be managed.
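The byte budget of a pack can be made concrete with a small splitter. This sketch only slices a 2048-byte pack into the fields named above; it does not interpret the contents of any field, and the `Pack` type and `split_pack` function are hypothetical names.

```python
from typing import NamedTuple

PACK_SIZE = 2048
FIELD_LENGTHS = (4, 6, 3, 1)                 # pack start, SCR, mux rate, stuffing
PACK_HEADER_LEN = sum(FIELD_LENGTHS)         # 14 bytes
USER_DATA_LEN = PACK_SIZE - PACK_HEADER_LEN  # 2034 bytes


class Pack(NamedTuple):
    pack_start: bytes
    scr: bytes
    mux_rate: bytes
    stuffing: bytes
    user_data: bytes


def split_pack(raw: bytes) -> Pack:
    """Slice one pack into its header fields and user data, without decoding them."""
    if len(raw) != PACK_SIZE:
        raise ValueError(f"a pack must be {PACK_SIZE} bytes, got {len(raw)}")
    fields = []
    offset = 0
    for length in FIELD_LENGTHS + (USER_DATA_LEN,):
        fields.append(raw[offset:offset + length])
        offset += length
    return Pack(*fields)


# Dummy 2048-byte pack whose byte values simply count upward.
pack = split_pack(bytes(range(256)) * 8)
```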

As shown in detail in FIG. 5, a compressed-PCM A packet consists of a packet header of 19 or 14 bytes, a compressed-PCM private header, and 1 to 2011 bytes of audio data (compressed PCM) in the format shown in FIG. 3. The DTS and PTS are set in the packet header of FIG. 5 (specifically, the PTS in bytes 10 to 14 of the packet header and the DTS in bytes 15 to 19). The compressed-PCM private header consists of:
・a 1-byte substream ID,
・a 2-byte UPC/EAN-ISRC (Universal Product Code/European Article Number-International Standard Recording Code) number and UPC/EAN-ISRC data,
・a 1-byte private header length,
・a 2-byte first access unit pointer,
・8 bytes of audio data information (ADI), and
・0 to 7 stuffing bytes.
Within the ADI, a forward access unit search pointer for searching for the access unit one second later and a backward access unit search pointer for searching for the access unit one second earlier are each set in 1 byte (specifically, the forward access unit search pointer in byte 7 of the ADI and the backward access unit search pointer in byte 8).
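The fixed part of the private header listed above can be read with a single `struct` format. The sketch assumes big-endian byte order for the 2-byte first access unit pointer and treats the UPC/EAN-ISRC field as opaque bytes; neither detail is stated in the text, and the stuffing bytes are not parsed. The example byte values are placeholders.

```python
import struct
from typing import NamedTuple

# substream ID (1), UPC/EAN-ISRC (2), header length (1), first AU pointer (2), ADI (8)
FIXED_FORMAT = ">B2sBH8s"
FIXED_LEN = struct.calcsize(FIXED_FORMAT)  # 14 bytes


class PrivateHeader(NamedTuple):
    substream_id: int
    upc_ean_isrc: bytes
    private_header_length: int
    first_access_unit_pointer: int
    adi: bytes
    forward_search_pointer: int   # ADI byte 7 (1-based)
    backward_search_pointer: int  # ADI byte 8 (1-based)


def parse_private_header(data: bytes) -> PrivateHeader:
    """Parse the fixed 14 bytes; any stuffing that follows is ignored."""
    if len(data) < FIXED_LEN:
        raise ValueError("private header is shorter than its fixed part")
    substream_id, upc, length, first_au, adi = struct.unpack_from(FIXED_FORMAT, data)
    return PrivateHeader(substream_id, upc, length, first_au, adi, adi[6], adi[7])


# Placeholder bytes for illustration only, not values taken from any specification.
example = bytes([0xA1, 0x00, 0x00, 0x0E, 0x00, 0x10]) + bytes([0, 0, 0, 0, 0, 0, 5, 3])
header = parse_private_header(example)
```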

Next, the decoding units 3'-1 and 3'-2 are described with reference to FIG. 6. The variable-rate bit stream data BS0 and BS1 in the above format is separated by the deformatting circuit 21. The leading sample data of one frame and the predictor selection flag of each ch "1" to "6" are applied to the prediction circuits 24D1, 24D2, and 23D1 to 23D4, respectively, and the bit-number flag of each ch "1" to "6" is applied to the unpacking circuit 22. The SCR, the DTS, and the prediction residual data string are applied to the input buffer 22a, and the PTS is applied to the output buffer 110. The plurality of predictors (not shown) in the prediction circuits 24D1, 24D2, and 23D1 to 23D4 have the same characteristics as the plurality of predictors in the encoding-side prediction circuits 13D1, 13D2, and 15D1 to 15D4, and the predictor with the same characteristic is selected by the predictor selection flag.

As shown in FIG. 7, the stream data (prediction residual data string) separated by the deformatting circuit 21 is taken into the input buffer 22a and stored there access unit by access unit according to the SCR. The data amount of one access unit corresponds, for example, to (1/96 kHz) seconds when fs = 96 kHz, but it is of variable length, as shown in detail in FIG. 8 and FIG. 9(a). The stream data stored in the input buffer 22a is read out first-in, first-out (FIFO) based on the DTS and applied to the unpacking circuit 22.
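The DTS-driven FIFO read-out can be sketched as a queue of variable-length access units that are released once a clock reaches their DTS. The class name, the integer clock, and the payloads are hypothetical; the actual time base and buffer sizing are not covered here.

```python
from collections import deque


class InputBuffer:
    """FIFO of (dts, access_unit) pairs; a unit is released once the clock reaches its DTS."""

    def __init__(self):
        self._queue = deque()

    def write(self, dts: int, access_unit: bytes) -> None:
        # Access units arrive in stream order and may differ in length.
        self._queue.append((dts, access_unit))

    def read_due(self, clock: int) -> list:
        """Pop, in arrival order, every access unit whose DTS is not later than clock."""
        due = []
        while self._queue and self._queue[0][0] <= clock:
            due.append(self._queue.popleft()[1])
        return due


buffer = InputBuffer()
buffer.write(100, b"AU0-long-payload")
buffer.write(200, b"AU1")
buffer.write(300, b"AU2-mid")
first = buffer.read_due(150)
second = buffer.read_due(300)
```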

The unpacking circuit 22 separates the prediction residual data string of each ch "1" to "6" according to the bit-number flags and outputs the results to the prediction circuits 24D1, 24D2, and 23D1 to 23D4, respectively. In each of these prediction circuits, the current prediction residual data of the channel from the unpacking circuit 22 is added to the previous prediction value predicted by the one predictor that the predictor selection flag selects from among the internal predictors, giving the current prediction value; the PCM data of each sample is then calculated with the leading sample data of the frame as the reference and stored in the output buffer 110. The PCM data stored in the output buffer 110 is read out and output based on the PTS. As a result, the variable-length access units shown in FIG. 9(a) are decompressed and the fixed-length presentation units shown in FIG. 9(b) are output.
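Reconstruction on the decoding side mirrors the encoder: the predictor named by the selection flag predicts from already decoded samples, and the residual is added back. The predictors below are hypothetical stand-ins that must match whatever the encoder used, and seeding with two known samples is an assumption made so the second-order predictor has enough history.

```python
# Hypothetical predictors; the decoder needs the same set as the encoder.
PREDICTORS = [
    lambda h: 0,                  # order 0: predict zero
    lambda h: h[-1],              # order 1: repeat the previous sample
    lambda h: 2 * h[-1] - h[-2],  # order 2: linear extrapolation
]


def decode_subframe(residuals, flag, history):
    """Rebuild PCM samples: sample = prediction from decoded past + residual."""
    predict = PREDICTORS[flag]
    past = list(history)
    samples = []
    for r in residuals:
        x = predict(past) + r
        samples.append(x)
        past.append(x)  # each decoded sample feeds the next prediction
    return samples


pcm = decode_subframe([0, 0, 1, -2], flag=2, history=[10, 20])
```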

When search reproduction is instructed via the operation unit 101, the control unit 100 reproduces the access unit indicated by the forward access unit search pointer, which points one second ahead, or by the backward access unit search pointer, which points one second back; both pointers are placed in the ADI shown in FIG. 5. The search pointers may point two seconds ahead and two seconds back instead of one second ahead and one second back.

When the variable-rate bit stream data predictively encoded by the encoding units 2'-1 and 2'-2 shown in FIG. 2 is transmitted over a network, the encoding side packetizes it for transmission as shown in FIG. 10 (step S41), adds a packet header (step S42), and sends the packet out onto the network (step S43).

On the decoding side, as shown in FIG. 11(A), the header is removed (step S51), the data is restored (step S52), and the data is stored in memory to await decoding (step S53). When decoding is performed, as shown in FIG. 11(B), deformatting is performed (step S61), input/output control of the input buffer 22a is performed (step S62), and unpacking is performed (step S63). If search reproduction has been instructed, the search pointer is decoded at this point. Next, a predictor is selected based on the flag and decoding is performed (step S64), input/output control of the output buffer 110 is performed (step S65), the original multi-channel signal is restored (step S66), and it is output (step S67). These steps are then repeated.
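Steps S41 to S43 and S51 to S53 amount to a packetize and depacketize round trip. The sketch below assumes a hypothetical 6-byte header holding a sequence number and a payload length; the source does not describe the header contents or the transport.

```python
import struct

# Hypothetical packet header: 4-byte sequence number and 2-byte payload length.
HEADER = struct.Struct(">IH")


def packetize(stream: bytes, payload_size: int) -> list:
    """S41-S42: cut the stream into payloads and prepend a header to each."""
    packets = []
    for seq, start in enumerate(range(0, len(stream), payload_size)):
        payload = stream[start:start + payload_size]
        packets.append(HEADER.pack(seq, len(payload)) + payload)
    return packets


def depacketize(packets) -> bytes:
    """S51-S52: strip each header and restore the stream in sequence order."""
    parts = []
    for packet in packets:
        seq, length = HEADER.unpack_from(packet)
        parts.append((seq, packet[HEADER.size:HEADER.size + length]))
    return b"".join(payload for _, payload in sorted(parts))


stream = bytes(range(10))
packets = packetize(stream, payload_size=4)
restored = depacketize(reversed(packets))  # arrival order need not match send order
```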

In the above embodiment, the 2 ch "1" and "2" for the front group are converted by
  "1" = Lf + Rf
  "2" = Lf - Rf
and predictively coded. Instead, the multi-channel signal may be downmixed by equation (2) to generate stereo 2-ch data (L, R), which is then converted by the following equation (1)'
  "1" = L + R
  "2" = L - R
  "3" to "5": unchanged
  "6" = Lfe - C          ... (1)'
and predictively coded (second embodiment). In this case, the mix & matrix circuit 4' on the decoding side can generate channel L by adding channels "1" and "2" and channel R by subtracting them.

As a third embodiment, as shown in FIG. 12, stereo 2-ch data (L, R) may be generated by downmixing the multi-channel signal with equation (2) and used in place of the 2 ch "1" and "2", so that the stereo 2 ch (L, R) and the 4 ch "3" to "6" are predictively coded. In the second and third embodiments, front left (Lf) and front right (Rf) are not transmitted to the decoding side, so the decoding side generates them from equations (1) and (2).

Next, a fourth embodiment is described with reference to FIGS. 13 and 14. The above embodiments predictively code one group of correlated signals "1" to "6". The fourth embodiment instead generates a plurality of groups of correlated signals, predictively codes them, and selects the predictively coded data of the group with the highest compression ratio. For this purpose, the encoding unit shown in FIG. 13 is provided with first to n-th correlation circuits 1-1 to 1-n, which convert, for example, 6-ch (Lf, C, Rf, Ls, Rs, Lfe) PCM data into n kinds of 6-ch signals "1" to "6" with different correlations.

For example, the first correlation circuit 1-1 performs the following conversion:
  "1" = Lf
  "2" = C - (Ls + Rs) / 2
  "3" = Rf - Lf
  "4" = Ls - a × Lfe
  "5" = Rs - b × Rf
  "6" = Lfe
and the n-th correlation circuit 1-n performs the following conversion:

  "1" = Lf + Rf
  "2" = C - Lf
  "3" = Rf - Lf
  "4" = Ls - Lf
  "5" = Rs - Lf
  "6" = Lfe - C
A prediction circuit 15 and a buffer/selector 16 are provided for each of the correlation circuits 1-1 to 1-n, and the correlation selection signal generator 17b selects the group with the highest compression ratio based on the data amount of the minimum prediction residuals of each group. The formatting circuit 19 then additionally multiplexes the corresponding selection flags (the correlation circuit selection flag and the correlation coefficients a and b of that correlation circuit).
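The selection among correlation groups can be sketched on a reduced, two-channel example. Both the candidate transforms and the cost measure (total absolute first difference as a proxy for coded size) are assumptions made for illustration; they are not the circuits 1-1 to 1-n themselves.

```python
def residual_cost(channel):
    """Proxy for coded size: total |first difference| of one channel."""
    return sum(abs(b - a) for a, b in zip(channel, channel[1:]))


def identity(lf, rf):
    """Candidate 0: code the channels as they are."""
    return [lf, rf]


def left_and_difference(lf, rf):
    """Candidate 1: keep Lf and code Rf - Lf, in the style of "3" = Rf - Lf."""
    return [lf, [r - l for l, r in zip(lf, rf)]]


# Hypothetical stand-ins for the correlation circuits.
CORRELATORS = [identity, left_and_difference]


def select_group(lf, rf):
    """Return (correlation circuit selection flag, cost of every candidate group)."""
    costs = [
        sum(residual_cost(channel) for channel in correlate(lf, rf))
        for correlate in CORRELATORS
    ]
    flag = min(range(len(costs)), key=costs.__getitem__)
    return flag, costs


# Two channels that move together, so their difference is nearly constant.
flag, costs = select_group([0, 10, 20, 30], [1, 11, 21, 31])
```

Only the winning flag (and any coefficients its transform uses) would need to be multiplexed, which matches the role of the selection flag in the text.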

On the decoding side shown in FIG. 14, n correlation circuits 4-1 to 4-n (or a single correlation circuit, not shown, whose coefficients a and b can be changed) are provided to correspond to the encoding-side correlation circuits 1-1 to 1-n. When the n groups of prediction circuits shown in FIG. 13 have the same configuration, the decoding device does not need n groups of prediction circuits as drawn in FIG. 14; prediction circuits for one group suffice. Based on the selection flag transmitted from the encoding device, one of the correlation circuits 4-1 to 4-n is selected, or the coefficients a and b are set, to restore the original 6 ch (Lf, C, Rf, Ls, Rs, Lfe); the multi-channel signal is also downmixed by equation (2) to generate stereo 2-ch data (L, R).

The first embodiment above predictively codes one kind of correlated signals "1" to "6". Alternatively, both the group of signals "1" to "6" and the group of original signals (Lf, C, Rf, Ls, Rs, Lfe) may be predictively coded, and the group with the higher compression ratio selected.

FIG. 1 is a block diagram showing a first embodiment of an audio encoding device to which the present invention is applied and the corresponding audio decoding device.
FIG. 2 is a block diagram showing the encoding unit of FIG. 1 in detail.
FIG. 3 is an explanatory diagram showing the bit stream encoded by the encoding unit of FIGS. 1 and 2.
FIG. 4 is an explanatory diagram showing the format of a DVD pack.
FIG. 5 is an explanatory diagram showing the format of a DVD audio pack.
FIG. 6 is a block diagram showing the decoding unit of FIG. 1 in detail.
FIG. 7 is a timing chart showing the write/read timing of the input buffer of FIG. 6.
FIG. 8 is an explanatory diagram showing the amount of compressed data for each access unit.
FIG. 9 is an explanatory diagram showing access units and presentation units.
FIG. 10 is a flowchart showing the audio transmission method.
FIG. 11 is a flowchart showing the audio transmission method.
FIG. 12 is a block diagram showing a third embodiment of an audio encoding device to which the present invention is applied and the corresponding audio decoding device.
FIG. 13 is a block diagram showing the audio encoding device of the fourth embodiment.
FIG. 14 is a block diagram showing the audio decoding device of the fourth embodiment.

Explanation of reference numerals

1'  6ch mix & matrix circuit
13D1, 13D2, 15D1 to 15D4  Prediction circuits (together with buffer/selectors 14D1, 14D2, 16D1 to 16D4, these constitute the compression means)
14D1, 14D2, 16D1 to 16D4  Buffer/selectors
17  Selection signal / DTS generator (timing generation means)
17c  PTS generator (timing generation means)
19  Formatting circuit (formatting means)
21  Deformatting circuit (separation means)
22  Unpacking circuit
22a  Input buffer
24D1, 24D2, 23D1 to 23D4  Prediction circuits (expansion means)
100  Control unit (reading means)
110  Output buffer

Claims

An optical recording medium on which packets are recorded, the packets being formatted by:
a step of compressing a multi-channel audio signal, either channel by channel as it is or for each of channels obtained by correlating the channels with one another, by obtaining a first sample value in response to the input audio signal, predicting linear prediction values of the current signal from past signals in the time domain using a plurality of linear prediction methods having different characteristics, and selecting the linear prediction method that minimizes the prediction residual obtained from the predicted linear prediction value and the audio signal;
a step of generating an access unit search pointer for searching for and reproducing an access unit one second to two seconds before, or one second to two seconds after, the compressed data; and
a step of formatting into a packet having user data that includes a private header containing the access unit search pointer and the compressed data,
wherein the access unit search pointer in the private header is recorded as information for searching for and reproducing an access unit of the compressed data temporarily stored on the decoding side.

An audio decoding device for decoding an original audio signal from data encoded by an audio encoding device, the audio encoding device having:
compression means for compressing a multi-channel audio signal, either channel by channel as it is or for each of channels obtained by correlating the channels with one another, by obtaining a first sample value in response to the input audio signal, predicting linear prediction values of the current signal from past signals in the time domain using a plurality of linear prediction methods having different characteristics, and selecting the linear prediction method that minimizes the prediction residual obtained from the predicted linear prediction value and the audio signal;
timing generating means for generating an access unit search pointer for searching for and reproducing an access unit one second to two seconds before, or one second to two seconds after, the compressed data; and
means for formatting into a packet having user data that includes a private header containing the access unit search pointer and the compressed data containing the access unit,
the audio decoding device comprising:
means for separating the private header and the compressed data from the user data in the packet;
an input buffer for storing the separated compressed data;
means for searching for an access unit of the compressed data stored in the input buffer on the basis of the access unit search pointer in the private header; and
decoding means for decoding the access unit of the compressed data found by the search.
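
The compression and expansion recited above (several linear predictors with different characteristics, selection of the one giving the smallest prediction residual, and transmission of the first sample value) can be illustrated by the following sketch. It is not the claimed circuit: the two predictors, the sum-of-absolute-residuals cost, and the names `PREDICTORS`, `encode_frame` and `decode_frame` are assumptions chosen only for illustration.

```python
def predict_order1(prev1, prev2):
    """Assumed predictor: the next sample equals the previous sample."""
    return prev1


def predict_order2(prev1, prev2):
    """Assumed predictor: linear extrapolation from the last two samples."""
    return 2 * prev1 - prev2


# Hypothetical predictor bank; an index into this list plays the role of
# the selection signal sent to the decoder.
PREDICTORS = [predict_order1, predict_order2]


def encode_frame(samples):
    """Return (first sample, selected predictor index, residuals)."""
    first = samples[0]
    best = None
    for index, predictor in enumerate(PREDICTORS):
        prev2, prev1 = first, first  # history seeded with the first sample
        residuals = []
        for x in samples[1:]:
            residuals.append(x - predictor(prev1, prev2))
            prev2, prev1 = prev1, x
        cost = sum(abs(r) for r in residuals)  # assumed residual measure
        if best is None or cost < best[0]:
            best = (cost, index, residuals)
    return first, best[1], best[2]


def decode_frame(first, index, residuals):
    """Rebuild the samples using the same predictor and history rule."""
    predictor = PREDICTORS[index]
    prev2, prev1 = first, first
    out = [first]
    for r in residuals:
        x = predictor(prev1, prev2) + r
        out.append(x)
        prev2, prev1 = prev1, x
    return out


frame = [10, 12, 14, 16, 18]
first, selected, residuals = encode_frame(frame)
restored = decode_frame(first, selected, residuals)
```

For a steadily rising frame such as the one above, the second-order predictor leaves the smaller residual and is selected, and decoding with the same predictor index restores the frame exactly.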