JP2001195096A - Voice coder

Info

Publication number: JP2001195096A
Application number: JP2000325670A
Authority: JP
Inventors: Yoshiaki Tanaka; 美昭田中; Shoji Ueno; 昭治植野; Norihiko Fuchigami; 徳彦渕上
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 1998-11-16
Filing date: 2000-10-25
Publication date: 2001-07-19
Anticipated expiration: 2019-11-16
Also published as: JP3387096B2

Abstract

PROBLEM TO BE SOLVED: To improve the compression ratio of multi-channel audio signals. SOLUTION: A mix and matrix circuit 1' adds, subtracts, and mixes the individual six-channel PCM data, and each of correlation circuits 1-1 to 1-n computes the correlation of each channel by a prescribed equation. On the basis of these correlation data, each prediction circuit 15 computes the prediction residual of every channel. A predictor selection signal generator 17a selects the minimum of these prediction residuals. The selected data are supplied through a packing circuit 18 to a formatting circuit 19 and formatted into a prescribed bit stream.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a speech coding method for predictively coding multi-channel audio signals.

[0002]

[Description of the Related Art] As a method of predictively coding an audio signal, the present inventors proposed the following method in an earlier application (Japanese Patent Application No. 9-289159). For a one-channel original digital audio signal, a plurality of predictors having different characteristics each compute a linear prediction value of the current signal from past signals in the time domain. A prediction residual is then computed for each predictor from the original digital audio signal and these linear prediction values, and the minimum prediction residual is selected.

[0003]

[Problems to Be Solved by the Invention] The above method achieves a certain degree of compression when the original digital audio signal has a sampling frequency of 96 kHz and about 20 quantization bits. Recent DVD audio discs, however, use twice that sampling frequency (192 kHz) and tend to use 24 quantization bits, so the compression ratio needs to be improved.

[0004] Accordingly, an object of the present invention is to provide a speech coding method that can improve the compression ratio when multi-channel audio signals are predictively coded.

[0005]

[Means for Solving the Problems] To achieve the above object, the present invention comprises the following means. Namely:

[0006] A speech coding method comprising: a step of downmixing an original multi-channel audio signal into a stereo two-channel audio signal; a step of converting the audio signals of a plurality of the original channels, which are not downmixed, into correlated audio signals by a predetermined matrix operation; a step of obtaining a leading sample value in response to the audio signal input for each channel of the stereo two channels and the correlated audio signals, and of selecting the linear prediction method whose prediction residual is the minimum among a plurality of predicted values of the current signal predicted from past signals in the time domain; and a step of formatting, into a predetermined bit stream, predictive coded data including the linear prediction method selected in the preceding step, the prediction residual, and a predetermined leading sample value.

[0007]

[Embodiments of the Invention] The present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing a first embodiment of a speech coding apparatus to which the present invention is applied and the corresponding speech decoding apparatus. FIG. 2 is a block diagram showing the encoding unit of FIG. 1 in detail. FIG. 3 is an explanatory diagram showing the bit stream encoded by the encoding units of FIGS. 1 and 2. FIG. 4 is a block diagram showing the decoding unit of FIG. 1 in detail. FIG. 5 is an explanatory diagram showing the format of a DVD pack. FIG. 6 is an explanatory diagram showing the format of a DVD audio pack. FIGS. 7 and 8 are flowcharts showing an audio transmission method.

[0008] The following four systems, for example, are known as multi-channel systems.
(1) Four-channel system: a total of four channels, three front channels (L, C, R) plus one rear channel (S), as in the Dolby Surround system.
(2) Five-channel system: a total of five channels, three front channels (L, C, R) plus two rear channels (SL, SR), as in the Dolby AC-3 system without the SW channel.
(3) Six-channel system: six channels (L, C, R, SW (Lfe), SL, SR), as in the DTS (Digital Theater System) system and the Dolby AC-3 system.
(4) Eight-channel system: a total of eight channels, six front channels (L, LC, C, RC, R, SW) plus two rear channels (SL, SR), as in the SDDS (Sony Dynamic Digital Sound) system.

[0009] The six-channel (ch) mix & matrix circuit 1' on the encoding side shown in FIG. 1 takes, as an example of a multi-channel signal, six channels of PCM data: front left (Lf), center (C), front right (Rf), surround left (Ls), surround right (Rs), and Lfe (Low Frequency Effect). It downmixes them into two stereo channels (L, R) according to the following equation (1), using coefficients mij (i = 1, 2; j = 1, 2, ..., 6).

L = m11·Lf + m12·Rf + m13·C + m14·Ls + m15·Rs + m16·Lfe
R = m21·Lf + m22·Rf + m23·C + m24·Ls + m25·Rs + m26·Lfe   ... (1)
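Equation (1) can be illustrated with a minimal Python sketch. The function name `downmix`, the dictionary layout of a frame, and the coefficient values in `example_m` are assumptions made for illustration only; the source gives no values for the coefficients mij.

```python
# Hypothetical sketch of the downmix in equation (1); names and layout are assumptions.
CHANNEL_ORDER = ("Lf", "Rf", "C", "Ls", "Rs", "Lfe")  # column order j = 1..6 in equation (1)


def downmix(frame, m):
    """Return the (L, R) sample lists for one frame.

    frame: dict mapping each name in CHANNEL_ORDER to a list of samples.
    m: two rows of six coefficients; m[i][j] stands in for m(i+1)(j+1).
    """
    n = len(frame["Lf"])
    rows = []
    for coeffs in m:
        rows.append([
            sum(c * frame[ch][t] for c, ch in zip(coeffs, CHANNEL_ORDER))
            for t in range(n)
        ])
    return rows[0], rows[1]


# Illustrative coefficients only (not taken from the source).
example_m = [
    [1.0, 0.0, 0.5, 0.5, 0.0, 0.5],
    [0.0, 1.0, 0.5, 0.0, 0.5, 0.5],
]
```

Each output sample is a weighted sum of the six input samples at the same time index, so the downmix needs no state between samples.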

[0010] The mix & matrix circuit 1' also divides the original six channels (Lf, C, Rf, Ls, Rs, Lfe) into two channels for the front group and four channels for the other group. It converts the four channels into correlated signals "3" to "6" according to the following equation (2), and outputs the two channels (L, R) to the first encoding unit 2'-1 and the four channels "3" to "6" to the second encoding unit 2'-2.

"1" = L
"2" = R
"3" = C - (Ls + Rs)/2
"4" = Ls + Rs
"5" = Ls - Rs
"6" = Lfe - C   ... (2)
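The conversion of the four non-front channels in equation (2), and its inverse on the decoding side, might look like the sketch below. It works on integer PCM samples and takes the division by 2 as a floor (right shift), which is an assumption: the source does not state how the halving is rounded. The function names are hypothetical.

```python
# Hypothetical sketch of signals "3" to "6" in equation (2) for integer PCM samples.
# Assumption: (Ls + Rs)/2 is taken as a floor division so the inverse is exact.

def matrix_forward(C, Ls, Rs, Lfe):
    s4 = Ls + Rs
    s5 = Ls - Rs
    s3 = C - (s4 >> 1)      # "3" = C - (Ls + Rs)/2
    s6 = Lfe - C            # "6" = Lfe - C
    return s3, s4, s5, s6


def matrix_inverse(s3, s4, s5, s6):
    C = s3 + (s4 >> 1)      # undo "3" with the same floor as the encoder
    Ls = (s4 + s5) >> 1     # s4 + s5 = 2*Ls, always even
    Rs = (s4 - s5) >> 1     # s4 - s5 = 2*Rs, always even
    Lfe = s6 + C
    return C, Ls, Rs, Lfe
```

Because the sum and difference of two integers always have the same parity, the halving in `matrix_inverse` loses nothing, and the round trip is lossless under the stated floor assumption.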

[0011] As shown in detail in FIG. 2, the first and second encoding units 2'-1 and 2'-2, which constitute the encoding unit 2', predictively code the PCM data of the two channels "1" and "2" and of the four channels "3" to "6", respectively, channel by channel. They transmit the predictive coded data to the decoding side as a bit stream such as that shown in FIG. 3, via a recording medium 5 or a communication medium 6 such as a satellite link or telephone line. On the decoding side, as shown in detail in FIG. 4, the first and second decoding units 3'-1 and 3'-2, which constitute the decoding unit 3', decode the predictive coded data of the two channels "1" and "2" of the front group and of the four channels "3" to "6" of the other group, respectively, into PCM data channel by channel. The mix & matrix circuit 4' then restores the original six channels (Lf, C, Rf, Ls, Rs, Lfe) on the basis of equations (1) and (2) and outputs the stereo two-channel data (L, R) as is.

[0012] The encoding units 2'-1 and 2'-2 are described in detail with reference to FIG. 2. The PCM data of channels "1" to "6" is stored frame by frame in a one-frame buffer 10. The sample data of one frame of channels "1" to "6" is applied to prediction circuits 13D1, 13D2, and 15D1 to 15D4, respectively, and the leading sample data of each frame of channels "1" to "6" (stored in the restart header described later) is applied to the unpacking circuit 8 and the formatting circuit 19. The sampling frequency (fs) and the number of quantization bits (Qb) used when the PCM data was A/D converted are applied to the packing circuit 18 and the formatting circuit 19. For the PCM data of channels "1" to "6", each of the prediction circuits 13D1, 13D2, and 15D1 to 15D4 uses a plurality of predictors (not shown) having different characteristics to compute a plurality of linear prediction values of the current signal from past signals in the time domain, and then computes a prediction residual for each predictor from the original PCM data and these linear prediction values. The subsequent buffer/selectors 14D1, 14D2, and 16D1 to 16D4 temporarily store the prediction residuals computed by the prediction circuits 13D1, 13D2, and 15D1 to 15D4, respectively, and select the minimum prediction residual for each subframe designated by the selection signal/DTS (decoding time stamp) generator 17.
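The per-subframe selection among several predictors can be sketched as follows. The source does not specify the predictors or the exact minimization criterion, so this sketch assumes four fixed polynomial predictors (orders 0 to 3) and picks the one whose largest absolute residual in the subframe is smallest. All names are hypothetical.

```python
# Hypothetical predictor bank; the source only says the predictors differ in characteristics.
PREDICTORS = [
    lambda h: 0,                                # order 0: predict zero
    lambda h: h[-1],                            # order 1: repeat the last sample
    lambda h: 2 * h[-1] - h[-2],                # order 2: linear extrapolation
    lambda h: 3 * h[-1] - 3 * h[-2] + h[-3],    # order 3: quadratic extrapolation
]


def residuals(subframe, predict, history):
    """Prediction residuals of one subframe; history holds at least 3 past samples."""
    h = list(history)
    out = []
    for x in subframe:
        out.append(x - predict(h))   # residual = original sample - predicted value
        h.append(x)
    return out


def select_predictor(subframe, history):
    """Return (selection flag, residuals) for the predictor with the smallest peak residual."""
    def peak(k):
        return max(abs(r) for r in residuals(subframe, PREDICTORS[k], history))
    best = min(range(len(PREDICTORS)), key=peak)
    return best, residuals(subframe, PREDICTORS[best], history)
```

For a linear ramp such as 10, 12, 14, 16 preceded by 4, 6, 8, the order-2 predictor leaves all-zero residuals, so its index would be transmitted as the predictor selection flag for that subframe.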

[0013] The selection signal generator 17 applies a bit count flag for the prediction residuals to the packing circuit 18 and the formatting circuit 19. It also applies to the formatting circuit 19 a predictor selection flag indicating the predictor with the minimum prediction residual, together with the correlation coefficients described later. The packing circuit 18 packs the six channels of prediction residuals selected by the buffer/selectors 14D1, 14D2, and 16D1 to 16D4, using the number of bits designated by the bit count flag from the selection signal generator 17.
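Packing residuals with a per-subframe bit count can be sketched as below. The source does not define the bit layout, so this sketch assumes two's-complement fields of equal width, written most significant bit first and zero-padded to a whole byte. The function names are hypothetical.

```python
# Hypothetical sketch of bit-count-based packing; the field layout is an assumption.

def bits_needed(residuals):
    """Smallest two's-complement width (the "bit count flag") that holds every residual."""
    width = 1
    while True:
        low, high = -(1 << (width - 1)), (1 << (width - 1)) - 1
        if all(low <= r <= high for r in residuals):
            return width
        width += 1


def pack(residuals, width):
    """Concatenate the residuals as width-bit fields, MSB first, padded to whole bytes."""
    acc = 0
    for r in residuals:
        acc = (acc << width) | (r & ((1 << width) - 1))
    total = width * len(residuals)
    pad = (-total) % 8
    return (acc << pad).to_bytes((total + pad) // 8, "big")


def unpack(data, width, count):
    """Inverse of pack: recover count signed residuals of the given width."""
    acc = int.from_bytes(data, "big") >> ((-width * count) % 8)
    out = []
    for i in reversed(range(count)):
        v = (acc >> (i * width)) & ((1 << width) - 1)
        out.append(v - (1 << width) if v >> (width - 1) else v)
    return out
```

With residuals 3, -2, 0, 1 the width is 3 bits, so four samples occupy 12 bits (2 bytes after padding) instead of four full-width PCM words.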

[0014] The subsequent formatting circuit 19 formats the data into user data such as that shown in FIG. 3. This user data consists of a variable-rate bit stream BS0 containing the predictive coded data of the two channels (1) and (2) of the front group, a variable-rate bit stream BS1 containing the predictive coded data of the four channels (3) to (6) of the other group, and a bit stream header placed before the streams BS0 and BS1. One frame of the streams BS0 and BS1 multiplexes the following:
・ a frame header;
・ the leading sample data of one frame of each of channels (1) to (6);
・ a predictor selection flag for each subframe of each of channels (1) to (6);
・ a bit count flag for each subframe of each of channels (1) to (6);
・ the prediction residual data sequence (variable number of bits) of each of channels (1) to (6); and
・ the correlation coefficients described later.
With this predictive coding, a compression ratio of 71% can be achieved when the original signal has, for example, a sampling frequency of 96 kHz, 24 quantization bits, and six channels.

[0015] The decoding units 3'-1 and 3'-2 are described next with reference to FIG. 4. The deformatting circuit 21 separates the variable-rate bit stream data BS0 and BS1 in the above format on the basis of the stream data and the frame header. The leading sample data of one frame and the predictor selection flag of each of channels "1" to "6" are applied to prediction circuits 24D1, 24D2, and 23D1 to 23D4, respectively, and the bit count flag and prediction residual data sequence of each of channels "1" to "6" are applied to the unpacking circuit 22. The plurality of predictors (not shown) in each of the prediction circuits 24D1, 24D2, and 23D1 to 23D4 have the same characteristics as the plurality of predictors in the encoding-side prediction circuits 13D1, 13D2, and 15D1 to 15D4, respectively, and the predictor with the same characteristics is selected by the predictor selection flag.

[0016] The unpacking circuit 22 separates the prediction residual data sequences of channels "1" to "6" on the basis of the respective bit count flags and outputs them to the prediction circuits 24D1, 24D2, and 23D1 to 23D4. In each of these prediction circuits, the current prediction residual data of the channel from the unpacking circuit 22 is added to the previous predicted value produced by the one internal predictor selected by the predictor selection flag, yielding the current predicted value. The PCM data of each sample is then computed with the leading sample data of the frame as the reference.
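The decoder-side reconstruction can be sketched as the mirror image of the encoder: each sample is the selected predictor's output plus the transmitted residual. The predictor bank below is an assumption (fixed polynomial predictors of orders 0 to 3); the source only requires that the decoder's predictors match the encoder's. Here `history` stands in for the known starting samples, such as the leading sample data of the frame.

```python
# Hypothetical predictor bank, assumed identical on the encoding and decoding sides.
PREDICTORS = [
    lambda h: 0,
    lambda h: h[-1],
    lambda h: 2 * h[-1] - h[-2],
    lambda h: 3 * h[-1] - 3 * h[-2] + h[-3],
]


def reconstruct(residuals, selector, history):
    """Rebuild PCM samples from residuals using the predictor named by the selection flag.

    history must hold at least 3 already-decoded samples.
    """
    h = list(history)
    predict = PREDICTORS[selector]
    for r in residuals:
        h.append(predict(h) + r)   # sample = predicted value + residual
    return h[len(history):]
```

Because the decoder runs the same predictor over the same already-decoded samples, the predicted values agree with the encoder's and the reconstruction is exact.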

[0017] When the variable-rate bit stream data predictively coded by the encoding units 2'-1 and 2'-2 shown in FIG. 2 is recorded on a DVD audio disc, as an example of a recording medium, it is packed into the audio (A) pack shown in FIG. 5. This pack consists of 2034 bytes of user data (A packet, V packet) preceded by a 14-byte pack header made up of 4 bytes of pack start information, 6 bytes of SCR (System Clock Reference) information, 3 bytes of Mux rate information, and 1 byte of stuffing (one pack = 2048 bytes in total). The SCR information, which is a time stamp, is set to "1" in the leading pack of an ACB unit and is made continuous within the same title, so that the timing of the A packs within the same title can be managed.
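The byte accounting of the pack can be checked with a few lines of Python. The field names in the dictionary are hypothetical labels; the sizes are the ones stated in the text.

```python
# Byte sizes as stated in the text; the dictionary keys are hypothetical labels.
PACK_HEADER_BYTES = {
    "pack_start": 4,   # pack start information
    "scr": 6,          # System Clock Reference
    "mux_rate": 3,     # Mux rate information
    "stuffing": 1,     # stuffing
}
USER_DATA_BYTES = 2034


def pack_header_size():
    return sum(PACK_HEADER_BYTES.values())


def pack_size():
    return pack_header_size() + USER_DATA_BYTES
```

The header fields add up to 14 bytes, and header plus user data gives the stated 2048-byte pack.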

[0018] As shown in detail in FIG. 6, the A packet of compressed PCM consists of a packet header of 9 to 22 bytes, a compressed PCM private header, and 1 to 2015 bytes of audio data (compressed PCM) in the format shown in FIG. 3. The compressed PCM private header consists of:
・ a 1-byte substream ID;
・ 2 bytes holding the UPC/EAN-ISRC (Universal Product Code/European Article Number-International Standard Recording Code) number and the UPC/EAN-ISRC data;
・ a 1-byte private header length;
・ a 2-byte first access unit pointer;
・ 4 bytes of audio data information (ADI); and
・ 0 to 7 stuffing bytes.

[0019] In the ADI, a forward access unit search pointer for searching for the access unit one second ahead and a backward access unit search pointer for searching for the access unit one second back are each set as 1 byte. Specifically, the forward access unit search pointer is set in the first byte of the ADI and the backward access unit search pointer in the eighth byte. Because the ADI is thus reduced to 4 bytes for compressed PCM, up to 2015 bytes of audio data can be stored.
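The 2015-byte maximum follows from the stated sizes if the A packet is assumed to fill the 2034-byte user data area of a pack exactly (that assumption is mine; the sizes are from the text). A short check, with hypothetical names:

```python
USER_DATA_BYTES = 2034  # user data area of one pack, as stated for the DVD pack format

# Fixed part of the compressed PCM private header:
# substream ID (1) + UPC/EAN-ISRC (2) + header length (1) + first access unit pointer (2) + ADI (4)
PRIVATE_HEADER_FIXED_BYTES = 1 + 2 + 1 + 2 + 4


def audio_payload_bytes(packet_header_bytes, stuffing_bytes):
    """Audio data bytes left in one packet, assuming the packet fills the user data area."""
    if not 9 <= packet_header_bytes <= 22:
        raise ValueError("packet header must be 9 to 22 bytes")
    if not 0 <= stuffing_bytes <= 7:
        raise ValueError("stuffing must be 0 to 7 bytes")
    return (USER_DATA_BYTES - packet_header_bytes
            - PRIVATE_HEADER_FIXED_BYTES - stuffing_bytes)
```

With the shortest packet header (9 bytes) and no stuffing, 2034 - 9 - 10 = 2015 bytes remain, which matches the stated upper limit.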

[0020] As shown in FIG. 7, the audio data area in the compressed PCM (PPCM) audio packet shown in FIG. 6 consists of a plurality of PPCM access units, and each PPCM access unit consists of PPCM sync information and a subpacket. The subpacket in the first PPCM access unit consists of a directory, substream "BS0", a CRC (1 or 2 bytes), substream "BS1", a CRC, and extra information, and the substreams "BS0" and "BS1" consist of PPCM blocks only. The subpackets in the second and subsequent PPCM access units likewise consist of a directory, substream "BS0", a CRC, substream "BS1", a CRC, and extra information, and the substreams "BS0" and "BS1" consist of a restart header and PPCM blocks.

[0021] When the variable-rate bit stream data predictively coded by the encoding units 2'-1 and 2'-2 shown in FIG. 2 is transmitted over a network, the encoding side packetizes it for transmission as shown in FIG. 8 (step S41), attaches a packet header (step S42), and sends the packet out onto the network (step S43). The decoding side, as shown in FIG. 9, removes the header (step S51), restores the data (step S52), and stores the data in memory to await decoding (step S53).
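Steps S41 to S43 and S51 to S53 can be sketched as follows. The source does not describe the packet header, so the 4-byte header used here (a 16-bit sequence number and a 16-bit payload length) is purely a hypothetical stand-in, as are the function names.

```python
import struct

# Hypothetical header: big-endian sequence number and payload length, 2 bytes each.
HEADER = struct.Struct(">HH")


def packetize(stream, payload_size):
    """S41/S42: split the bit stream into chunks and prepend a header to each."""
    packets = []
    for seq, start in enumerate(range(0, len(stream), payload_size)):
        chunk = stream[start:start + payload_size]
        packets.append(HEADER.pack(seq, len(chunk)) + chunk)
    return packets   # S43 would hand these to the network


def depacketize(packets):
    """S51/S52: strip the headers and restore the stream in sequence order."""
    parts = {}
    for p in packets:
        seq, length = HEADER.unpack_from(p)
        parts[seq] = p[HEADER.size:HEADER.size + length]
    return b"".join(parts[s] for s in sorted(parts))   # S53: buffer awaiting decoding
```

The sequence number lets the receiver restore the stream even if packets arrive out of order, which is one reason a transport header is attached in the first place.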

[0022] In the above embodiment, the stereo two-channel data (L, R) is transmitted as is. It may instead be converted into correlated signals, along with the rest of the six channels "1" to "6", according to the following equation (2)' and then predictively coded (second embodiment).

"1" = L + R
"2" = L - R
"3" to "5": same as in equation (2)
"6" = Lfe - a × C, where 0 ≦ a ≦ 1   ... (2)'

In this case, the mix & matrix circuit 4' on the decoding side can generate channel L by adding channels "1" and "2" and channel R by subtracting them. Although the above embodiment restores both the multi-channel (6ch) and stereo (2ch) signals, it goes without saying that only one of them may be restored.
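The sum/difference conversion of the stereo pair can be sketched as below. Note that adding "1" and "2" gives 2L and subtracting gives 2R, so this sketch halves the results; the source does not mention the factor of two, and treating it this way is an assumption. The function names are hypothetical.

```python
# Hypothetical sketch of signals "1" and "2" in the second embodiment, integer PCM samples.

def encode_stereo(L, R):
    return L + R, L - R          # "1" = L + R, "2" = L - R


def decode_stereo(s1, s2):
    # s1 + s2 = 2*L and s1 - s2 = 2*R; both are always even, so the shift is exact.
    return (s1 + s2) >> 1, (s1 - s2) >> 1
```

When L and R are similar, as is common for stereo material, the difference signal "2" stays small and needs fewer bits after prediction.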

[0023] FIG. 10 shows a third embodiment. In this case, no downmixing is performed, and the two channels "1" and "2" of the front group are transmitted as

"1" = Lf + Rf
"2" = Lf - Rf

On the playback side, either the non-downmixed stereo two-channel signals Lf and Rf output from the downstream mix & matrix circuit 4', or the stereo two-channel signals L and R obtained by downmixing within the circuit 4', can be used as desired.

[0024] A fourth embodiment is described next with reference to FIGS. 11, 12, and 13. The above embodiments predictively code one group of correlated signals "1" to "6". The fourth embodiment instead generates and predictively codes a plurality of groups of correlated signals and selects the predictive coded data of the group with the highest compression ratio. In this embodiment, moreover, the coding within each group does not divide the channels into two channels for the front group and four channels for the other group, as in the preceding embodiments, but processes them together in a single coding operation; FIG. 11 corresponds to FIG. 1 described above. The encoding unit shown in FIG. 12 is therefore provided with first to n-th correlation circuits 1-1 to 1-n, which convert, for example, six channels (Lf, C, Rf, Ls, Rs, Lfe) of PCM data into n kinds of six-channel signals "1" to "6" having different correlations.

[0025] For example, the first correlation circuit 1-1 performs the following conversion:

"1" = Lf
"2" = C - (Ls + Rs)/2
"3" = Rf - Lf
"4" = Ls - a × Lfe
"5" = Rs - b × Rf
"6" = Lfe

The n-th correlation circuit 1-n performs the following conversion:

"1" = Lf + Rf
"2" = C - Lf
"3" = Rf - Lf
"4" = Ls - Lf
"5" = Rs - Lf
"6" = Lfe - C

The other correlation circuits perform conversions as in the first embodiment.
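The conversion of the first correlation circuit 1-1 and a matching inverse can be sketched as follows for integer PCM samples. Two details are assumptions not stated in the source: the halving is a floor, and the products a × Lfe and b × Rf are rounded with Python's `round` identically on both sides. The function names are hypothetical.

```python
# Hypothetical sketch of correlation circuit 1-1 and its inverse (integer samples).

def correlate_1(Lf, C, Rf, Ls, Rs, Lfe, a, b):
    return (
        Lf,                        # "1" = Lf
        C - ((Ls + Rs) >> 1),      # "2" = C - (Ls + Rs)/2, floor assumed
        Rf - Lf,                   # "3" = Rf - Lf
        Ls - round(a * Lfe),       # "4" = Ls - a * Lfe, rounding assumed
        Rs - round(b * Rf),        # "5" = Rs - b * Rf, rounding assumed
        Lfe,                       # "6" = Lfe
    )


def decorrelate_1(s1, s2, s3, s4, s5, s6, a, b):
    Lf, Lfe = s1, s6               # transmitted unchanged
    Rf = s3 + Lf
    Ls = s4 + round(a * Lfe)       # same rounding as the encoder
    Rs = s5 + round(b * Rf)
    C = s2 + ((Ls + Rs) >> 1)      # needs Ls and Rs, so it is restored last
    return Lf, C, Rf, Ls, Rs, Lfe
```

The inverse has to follow a dependency order: the channels sent unchanged come first, then those that depend only on them, and C last. The coefficients a and b must reach the decoder, which is why they are multiplexed into the stream.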

[0026] A prediction circuit 15 and a buffer/selector 16 are provided for each of the correlation circuits 1-1 to 1-n, and the correlation selection signal generator 17b selects the group with the highest compression ratio on the basis of the data amount of the minimum prediction residuals of each group. The formatting circuit 19 then additionally multiplexes the corresponding selection flag (the correlation circuit selection flag and the correlation coefficients a and b of that correlation circuit).
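Selecting the group with the highest compression ratio amounts to comparing the data amount of each group's residuals. The sketch below measures that amount as the two's-complement width needed per channel times the number of samples; this cost model and the function names are assumptions, since the source does not define how the data amount is computed.

```python
# Hypothetical cost model: bits needed to store each channel's residuals at a fixed width.

def bits_needed(residuals):
    """Two's-complement width that holds every residual in the list."""
    magnitude = max(
        (r.bit_length() if r >= 0 else (~r).bit_length() for r in residuals),
        default=0,
    )
    return magnitude + 1   # one extra bit for the sign


def group_cost(group):
    """Total bits for one group, given one residual list per channel."""
    return sum(bits_needed(ch) * len(ch) for ch in group)


def select_group(groups):
    """Index of the cheapest group, i.e. the correlation circuit selection flag."""
    return min(range(len(groups)), key=lambda i: group_cost(groups[i]))
```

The returned index plays the role of the correlation circuit selection flag that the formatting circuit adds to the stream.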

[0027] FIG. 13 shows the data area corresponding to FIG. 6 described above. In this embodiment, substream "BS1" is not used, and the data area consists of substream "BS0" only.

[0028] On the decoding side shown in FIG. 14, n correlation circuits 4-1 to 4-n (or a single correlation circuit 4 whose coefficients a and b can be changed) are provided to match the encoding-side correlation circuits 1-1 to 1-n. When the n groups of prediction circuits shown in FIG. 12 have the same configuration, the decoding apparatus does not need prediction circuits for n groups; as shown in FIG. 14, prediction circuits for one group suffice. On the basis of the selection flag transmitted from the coding apparatus, one of the correlation circuits 4-1 to 4-n is selected, or the coefficients a and b are set, to restore the original six channels (Lf, C, Rf, Ls, Rs, Lfe), and the multi-channel signal is downmixed according to equation (1) to generate the stereo two-channel data (L, R). The six-channel system with channels "1" to "6" is only an example; other systems, such as a five-channel system, may also be used.

[0029] The first embodiment described above predictively codes one kind of correlated signals "1" to "6". Alternatively, both this group of signals "1" to "6" and the group of original signals (Lf, C, Rf, Ls, Rs, Lfe) may be predictively coded, and the group with the higher compression ratio selected.

[0030]

[Effects of the Invention] As described above, the inventions of the present application provide a speech coding method that can improve the compression ratio, particularly when multi-channel audio signals are predictively coded.

[Brief description of the drawings]

[FIG. 1] A block diagram showing a first embodiment of a speech coding apparatus to which the present invention is applied and the corresponding speech decoding apparatus.

[FIG. 2] A block diagram showing the encoding unit of FIG. 1 in detail.

[FIG. 3] An explanatory diagram showing the bit stream encoded by the encoding units of FIGS. 1 and 2.

[FIG. 4] A block diagram showing the decoding unit of FIG. 1 in detail.

[FIG. 5] An explanatory diagram showing the format of a DVD pack.

[FIG. 6] An explanatory diagram showing the format of a DVD audio pack.

[FIG. 7] An explanatory diagram showing the format of the audio data area of FIG. 6 in detail.

[FIG. 8] A flowchart showing an audio transmission method.

[FIG. 9] A flowchart showing an audio transmission method.

[FIG. 10] A block diagram showing a third embodiment of a speech coding apparatus to which the present invention is applied and the corresponding speech decoding apparatus.

[FIG. 11] A block diagram showing a fourth embodiment of a speech coding apparatus to which the present invention is applied and the corresponding speech decoding apparatus.

[FIG. 12] A block diagram showing a fourth embodiment of a speech coding apparatus to which the present invention is applied.

[FIG. 13] An explanatory diagram of another embodiment, corresponding to FIG. 7.

[FIG. 14] A block diagram showing a fourth embodiment of a speech decoding apparatus to which the present invention is applied.

[Explanation of symbols]

1': 6ch mix & matrix circuit (correlation means, downmix means)
13D1, 13D2, 15D1 to 15D4: prediction circuits (together with the buffer/selectors 14D1, 14D2, and 16D1 to 16D4, they constitute the predictive coding means)
14D1, 14D2, 16D1 to 16D4: buffer/selectors
19: formatting circuit (formatting means)

Continued from the front page: (51) Int. Cl.7, identification symbol, FI, theme code (reference): H04S 3/02; G10L 9/14 J

Claims

[Claims]

1. A speech coding method comprising: a step of downmixing an original multi-channel audio signal into a stereo two-channel audio signal; a step of converting the audio signals of a plurality of the original channels, which are not downmixed, into correlated audio signals by a predetermined matrix operation; a step of obtaining a leading sample value in response to the audio signal input for each channel of the stereo two channels and the correlated audio signals, and of selecting the linear prediction method whose prediction residual is the minimum among a plurality of predicted values of the current signal predicted from past signals in the time domain; and a step of formatting, into a predetermined bit stream, predictive coded data including the linear prediction method selected in the preceding step, the prediction residual, and a predetermined leading sample value.