JP2001127641A

JP2001127641A - Audio encoder, audio encoding method and audio encoding signal recording medium

Info

Publication number: JP2001127641A
Application number: JP30209499A
Authority: JP
Inventors: Kazumi Arakage; 和美荒蔭
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 1999-10-25
Filing date: 1999-10-25
Publication date: 2001-05-11
Anticipated expiration: 2019-10-25
Also published as: JP3518737B2

Abstract

PROBLEM TO BE SOLVED: To provide an audio encoding method that performs overlap encoding using window functions.

SOLUTION: The method performs overlap encoding using at least four kinds of window functions: a long window, a short window, a start window, and a stop window. The window functions are chosen with the splicing compatibility of the codes in mind. The window function at the start of encoding is either the long window or the start window. The window function at the end of encoding is either the long window or the stop window. The input is multiplied by the chosen window function, and the output is converted from the time domain to the frequency domain. The data are then encoded so that the virtual buffer value of the frame at the start of encoding and the virtual buffer value of the frame at the end of encoding equal the same prescribed value, which improves the splicing compatibility of the codes.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

1. Field of the Invention
The present invention relates to an audio (speech) encoding method and apparatus that perform overlap encoding using a plurality of window functions.

[0002]

2. Description of the Related Art
MPEG-1 Layer II is the audio coding scheme conventionally used for VCD (Video CD) and similar media. It encodes linear PCM in units (frames) of 1152 samples, and the amount of code generated for every 1152 samples is constant. Encoded streams obtained by encoding signals separately may therefore be connected, as shown in "Stream connection in MPEG1 Layer II" in FIG. 2. As long as the encoding rate is constant, the code amount generated per frame (N bits) stays fixed. As a result, the virtual buffer, which represents the buffer occupancy during decoding, never overflows or underflows.

[0003] The MPEG-2 AAC (Advanced Audio Coding) scheme was standardized in April 1997 and encodes in units (frames) of 1024 PCM samples. Even when the encoding rate is constant, the number of bits per frame is variable, as shown in "Stream connection in MPEG2-AAC" in FIG. 3. The code amount generated therefore differs from frame to frame.
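The difference between fixed-size and variable-size frames can be illustrated with a small bit-budget calculation. This is a simplified model and not the normative buffer definition of either standard. It assumes the channel delivers a constant mean budget of bitrate × samples_per_frame / sample_rate bits per frame. It tracks the virtual buffer level relative to its starting value. The function names and the example frame sizes are hypothetical.

```python
from fractions import Fraction

def mean_bits_per_frame(bitrate, sample_rate, samples_per_frame):
    """Average bits available per frame at a constant encoding rate (exact)."""
    return Fraction(bitrate * samples_per_frame, sample_rate)

def level_trace(frame_bits, mean_bits):
    """Virtual buffer level after each frame, relative to the starting level.

    Each frame adds the mean budget and removes the bits it actually used.
    """
    level = Fraction(0)
    trace = []
    for used in frame_bits:
        level += mean_bits - used
        trace.append(level)
    return trace

# Fixed-size frames (Layer II style): 192 kbit/s, 48 kHz, 1152 samples per frame.
layer2 = mean_bits_per_frame(192_000, 48_000, 1152)   # 4608 bits
print(level_trace([4608] * 4, layer2))                 # the level never moves

# Variable-size frames (AAC style): 128 kbit/s, 48 kHz, 1024 samples per frame.
aac = mean_bits_per_frame(128_000, 48_000, 1024)       # 8192/3 bits on average
print(level_trace([2500, 3000, 2600], aac))            # the level fluctuates
```

Exact fractions are used because the AAC budget at these settings is not a whole number of bits.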

[0004] An example of an MPEG-2 AAC audio encoding/decoding apparatus will be described with reference to the drawings. FIG. 14 is a block diagram of an example of such an apparatus. It comprises:
- a window function processor 11,
- a time-frequency converter 12,
- a quantizer 13,
- an auditory model 14,
- an inverse quantizer 15,
- a frequency-time converter 16, and
- a window function processor 17.

[0005] First, the window function processor 11 multiplies the input PCM signal by a window function. This step is needed so that the following time-frequency converter 12 can determine the frequency components accurately. As shown in FIG. 16, frames are formed with overlapping regions, and each frame is multiplied by the window function.

[0006] The time-frequency converter 12 converts the signal from the time domain to the frequency domain. It uses an FFT, the MDCT (Modified Discrete Cosine Transform), or a similar transform. The input PCM signal is also supplied to the auditory model 14, which calculates psychoacoustic masking levels. From these levels it derives the bit allocation for quantization and supplies it to the quantizer 13. The quantizer 13 quantizes according to this bit allocation, then assembles and outputs a bitstream.

[0007] On the decoding side, the inverse quantizer 15 parses the received bitstream and performs inverse quantization. Its output is supplied to the frequency-time converter 16, which converts it from the frequency domain back to the time domain.

[0008] The output of the frequency-time converter 16 is supplied to the window function processor 17. This processor multiplies each frame again by the window function used during encoding and adds the overlapping portions of adjacent frames. This removes the effect of the window function, and the PCM signal is decoded.
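The cancellation described above is known as time-domain aliasing cancellation, and it can be checked numerically. The following is a minimal pure-Python sketch, not the apparatus of FIG. 14. It makes these assumptions:
- a sine window with 50% overlap;
- silence before and after the signal;
- an unscaled forward MDCT paired with an inverse MDCT scaled by 2/N (other references distribute these constants differently).

The frame length is kept tiny so that the direct O(N²) transforms run quickly. All helper names are hypothetical.

```python
import math
import random

def sine_window(length):
    """Sine window; for length 2N it satisfies w[n]**2 + w[n+N]**2 == 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def _kernel(n, k, half):
    # MDCT basis cos[pi/N * (n + 1/2 + N/2) * (k + 1/2)] with N = half
    return math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))

def mdct(block):
    """2N windowed time samples -> N coefficients (unscaled)."""
    half = len(block) // 2
    return [sum(block[n] * _kernel(n, k, half) for n in range(2 * half))
            for k in range(half)]

def imdct(coeffs):
    """N coefficients -> 2N time-aliased samples, scaled by 2/N."""
    half = len(coeffs)
    return [2.0 / half * sum(coeffs[k] * _kernel(n, k, half) for k in range(half))
            for n in range(2 * half)]

def encode(signal, half):
    """50%-overlapped blocks (silence before and after), windowed, then MDCT."""
    assert len(signal) % half == 0, "sketch assumes whole frames"
    win = sine_window(2 * half)
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [mdct([padded[s + i] * win[i] for i in range(2 * half)])
            for s in range(0, len(padded) - half, half)]

def decode(frames, half, length):
    """IMDCT, multiply by the same window again, overlap-add adjacent frames."""
    win = sine_window(2 * half)
    out = [0.0] * (half * (len(frames) + 1))
    for f, coeffs in enumerate(frames):
        for i, y in enumerate(imdct(coeffs)):
            out[f * half + i] += y * win[i]
    return out[half:half + length]   # drop the silent padding

# Usage: a random 32-sample signal with 8-sample frames (16-sample blocks).
random.seed(1)
sig = [random.uniform(-1.0, 1.0) for _ in range(32)]
rec = decode(encode(sig, 8), 8, len(sig))
print(max(abs(a - b) for a, b in zip(sig, rec)))   # on the order of rounding error
```

Each output sample is the sum of two windowed inverse transforms. Their aliasing terms have opposite signs and cancel, leaving the original sample.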

[0009] In the MPEG-2 AAC scheme, encoding is performed with an overlap of 1024 samples, as shown in "MPEG2 AAC window function and overlap processing" in FIG. 4. Each 2048-sample block is multiplied by a window function and converted to the frequency domain by the MDCT (Modified Discrete Cosine Transform). The result is then encoded.
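For reference, one common textbook form of the windowed MDCT for a block of 2N samples is given below, with N = 1024 for the long window. The normative AAC definition uses an equivalent kernel with different scaling constants. The symbols x, w, and X are introduced here only for illustration.

```latex
X[k] = \sum_{n=0}^{2N-1} w[n]\,x[n]\,
       \cos\!\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right],
\qquad k = 0, 1, \dots, N-1,
\qquad w[n]^2 + w[n+N]^2 = 1 .
```

Each 2048-sample block therefore yields 1024 coefficients, matching the 1024-sample hop. The window condition on the right (the Princen-Bradley condition) is what lets overlap-add after the inverse transform cancel the time-domain aliasing.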

[0010] The block length over which the MDCT is computed can be switched adaptively. For this purpose, four types of window function exist, as shown in "Example of window function shapes" in FIG. 6: the long window, the short window, the start window, and the stop window.

[0011] FIG. 16 shows the sine window and the 50%-overlap frame structure commonly used when the time-frequency converter 12 of FIG. 14 applies the MDCT. The audio signal (PCM data) is divided into fixed-length segments. Each segment is multiplied by the sine window, and the MDCT is applied for signal processing (frame 1). The next frame (frame 2) consists of the latter half of frame 1 (the overlap region) followed by new audio samples. It is processed in the same way.
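The frame structure just described can be sketched as follows. Each block is the previous frame followed by the current frame, so consecutive blocks share half their samples. The frame before the first one is assumed to be silence, as for the encoding start frame of FIG. 5. The function names are hypothetical. Trailing samples that do not fill a whole frame are simply ignored in this sketch.

```python
import math

def sine_window(length):
    """Sine window w[n] = sin(pi * (n + 0.5) / length)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def overlapped_blocks(pcm, hop):
    """Yield blocks of 2*hop samples: previous frame followed by current frame."""
    prev = [0.0] * hop                       # silence before the first frame
    for start in range(0, len(pcm) - hop + 1, hop):
        cur = list(pcm[start:start + hop])
        yield prev + cur                     # 50% overlap with the previous block
        prev = cur

def windowed_blocks(pcm, hop):
    """Multiply every overlapped block by the sine window of length 2*hop."""
    win = sine_window(2 * hop)
    return [[s * w for s, w in zip(block, win)]
            for block in overlapped_blocks(pcm, hop)]

# Princen-Bradley check for 2N = 2048, the AAC long-window length.
win = sine_window(2048)
assert all(abs(win[n] ** 2 + win[n + 1024] ** 2 - 1.0) < 1e-12 for n in range(1024))
```

With hop = 1024 this gives the 2048-sample blocks of the MPEG-2 AAC long window.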

[0012] For audio signals containing attack sounds, such as castanets, quantization noise is easily noticed when frames are spaced at fixed intervals. Using frames shorter than the normal frame (short frames) makes the quantization noise less perceptible. In that case, processing uses two window functions: one for short frames (the short window function) and one for the transition into short frames (the start window function).

[0013] FIG. 6 shows the shapes of the four window types: long, short, start, and stop. The start and stop windows are used at the junctions between long and short windows and are asymmetric. FIG. 17 shows the state transition diagram of the window functions. Normally the long window is used repeatedly. When a transient signal such as an attack sound is detected, the next frame switches to the start window, and the frame after that switches to the short window.

[0014] As the signal returns to a steady state, the sequence moves from the short window to the stop window and then back to the long window. A transition from the start window directly to the stop window, without an intervening short window, is also possible. However, when encoded streams are edited, frames are not necessarily joined according to the state transitions of FIG. 17. For example, FIG. 17 requires a long window to be followed by either a long window or a start window, but an edited stream may not respect this.
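The transitions of FIG. 17 described above can be written as a small state machine. A helper then flags the first junction in a frame sequence, such as an edited stream, that does not follow them. This is an illustrative sketch with hypothetical names. The STOP to START transition is not stated in the text above. It is assumed here because a stop window ends in a long slope, just like a long window.

```python
from enum import Enum

class Win(Enum):
    LONG = "long"
    START = "start"
    SHORT = "short"
    STOP = "stop"

# Windows ending in a long slope (LONG, STOP) may be followed by LONG or START.
# Windows ending in a short slope (START, SHORT) may be followed by SHORT or STOP.
ALLOWED = {
    Win.LONG: {Win.LONG, Win.START},
    Win.START: {Win.SHORT, Win.STOP},
    Win.SHORT: {Win.SHORT, Win.STOP},
    Win.STOP: {Win.LONG, Win.START},   # STOP -> START is an assumption
}

def next_window(prev: Win, transient_ahead: bool) -> Win:
    """Pick the next window from the previous one and a transient-detector flag."""
    if prev in (Win.LONG, Win.STOP):
        return Win.START if transient_ahead else Win.LONG
    return Win.SHORT if transient_ahead else Win.STOP

def first_illegal_join(windows):
    """Index of the first window that cannot follow its predecessor, else None."""
    for i in range(1, len(windows)):
        if windows[i] not in ALLOWED[windows[i - 1]]:
            return i
    return None

# An edited stream: LONG, START from one stream, then LONG, LONG from another.
print(first_illegal_join([Win.LONG, Win.START, Win.LONG, Win.LONG]))   # 2
```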

[0015]

Problems to be Solved by the Invention
At the start of encoding, silence data is generally used for the overlap portion, as shown in "Encoding start frame" in FIG. 5. Suppose bitstreams encoded with silence in the overlap portion are simply connected during editing. The reproduced sound at the junction then becomes as shown in "Example of decoding result at the junction" in FIG. 15, and normal reproduction is not obtained.

The problem is especially severe when a short window or a start window is used at the end of encoding. As the addition-result rows of FIG. 15 show, the level of the reproduced sound drops sharply to the silence level within a short time. This happens both with the short window (FIG. 15(a)) and with the start window (FIG. 15(b)), and the output sound becomes unpleasant.

[0016] Consider material A and material B, each encoded with the MPEG-2 AAC scheme. Encoding material A generates A1, A2, A3, and A4 bits for its successive frames. Encoding material B generates B1, B2, B3, and B4 bits.

Each material is encoded with its own control of the usable code amount, so that its own virtual buffer remains stable. Now suppose the first half of material A is simply connected to the second half of material B, as shown in "Stream joining in MPEG2 AAC" in FIG. 3. The connected bitstream then carries A1, A2, B3, and B4 bits. As a result, the virtual buffer overflows or underflows.
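The failure described above can be reproduced with a toy virtual-buffer model. The sketch makes these assumptions:
- a fixed mean budget per frame;
- a buffer level that must stay between 0 and a capacity;
- made-up frame sizes, chosen so that each stream is valid on its own.

None of the numbers come from the patent or from the AAC specification, and the function name is hypothetical.

```python
def simulate(frame_bits, mean_bits, capacity, start_level):
    """Track the virtual buffer level frame by frame.

    Returns (levels, violation), where violation is None or a tuple
    ("overflow" | "underflow", frame_index).
    """
    level = start_level
    levels = []
    for k, used in enumerate(frame_bits):
        level += mean_bits - used          # constant-rate input minus bits consumed
        levels.append(level)
        if level < 0:
            return levels, ("underflow", k)
        if level > capacity:
            return levels, ("overflow", k)
    return levels, None

MEAN, CAPACITY, START = 100, 300, 150
A = [50, 50, 150, 150]     # A1..A4: valid on its own
B = [150, 150, 50, 50]     # B1..B4: valid on its own

print(simulate(A, MEAN, CAPACITY, START))              # no violation
print(simulate(B, MEAN, CAPACITY, START))              # no violation
print(simulate(A[:2] + B[2:], MEAN, CAPACITY, START))  # A1, A2, B3, B4: overflow at the fourth frame
```

B3 and B4 were sized for the buffer level that B1 and B2 left behind, not the level that A1 and A2 leave. Joining B1 and B2 to A3 and A4 fails the other way, with an underflow.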

[0017]

Means for Solving the Problems
To solve these problems, the invention of claim 1 provides an audio encoding method that performs overlap encoding using at least four types of window functions: a long window, a short window, a start window, and a stop window. The method is characterized as follows. The window functions are determined in consideration of the splicing compatibility of the codes. The window function at the start of encoding is either the long window or the start window. The window function at the end of encoding is either the long window or the stop window. After multiplication by the window function, the output is converted from the time domain to the frequency domain. Encoding is performed so that the virtual buffer value of the frame at the start of encoding and that of the frame at the end of encoding equal the same prescribed value. This improves the splicing compatibility of the codes.

The invention of claim 2 provides an audio encoding method that performs overlap encoding using the same four types of window functions. In consideration of the splicing compatibility of the codes, the method is characterized as follows.
- A first code string is obtained by choosing either the long window or the start window as the window function at the start of encoding. The input is multiplied by that window function, the output is converted from the time domain to the frequency domain, and encoding ensures that the virtual buffer value of the frame at the start of encoding equals a prescribed value.
- A second code string is obtained by choosing either the long window or the stop window as the window function at the end of encoding. The same multiplication and conversion are performed, and encoding ensures that the virtual buffer value of the frame at the end of encoding equals the same prescribed value.
- The start portion of the first code string and the end portion of the second code string are connected, which improves splicing compatibility.

The invention of claim 3 provides an audio encoding apparatus that performs overlap encoding using the same four types of window functions. The apparatus comprises:
- window function processing means 1, 8, which determine the window functions in consideration of the splicing compatibility of the codes. The window function at the start of encoding is either the long window or the start window. The window function at the end of encoding is either the long window or the stop window.
- time-frequency conversion means 2, which convert the output of the window function processing means from the time domain to the frequency domain.
- quantizing means 3, 4, 9, which encode the output of the time-frequency conversion means so that the virtual buffer value of the frame at the start of encoding and that of the frame at the end of encoding equal the same prescribed value.

The invention of claim 4 provides an audio encoded signal recording medium. Recorded on it is an audio signal overlap-encoded using the same four types of window functions. In consideration of the splicing compatibility of the codes, the recorded signal was encoded with these properties:
- the window function at the start of encoding is either the long window or the start window;
- the window function at the end of encoding is either the long window or the stop window;
- the virtual buffer value of the frame at the start of encoding and that of the frame at the end of encoding equal the same prescribed value.

[0018]

Embodiments of the Invention
An embodiment of the audio encoding/decoding apparatus of the present invention will be described below with reference to the drawings. FIG. 1 is a block diagram of an embodiment of the encoding/decoding apparatus, applied to the MPEG-2 AAC scheme.

[0019] As shown in the block diagram of FIG. 1, the audio encoding/decoding apparatus of the embodiment comprises:
- a window function processor 1,
- a time-frequency converter 2,
- a quantizer 3,
- an auditory model 4,
- an inverse quantizer 5,
- a frequency-time converter 6,
- a window function processor 7,
- a window function designator 8,
- a bit allocation designator 9, and
- an encoding start/end time detector 10.

[0020] First, an embodiment of the audio encoding apparatus of the present invention will be described. The input PCM data is supplied to the window function processor 1, where it is multiplied by a window function.

[0021] As shown in "Example of window function shapes" in FIG. 6, the MPEG-2 AAC scheme defines four window functions. Two of them are the short window and the long window. The other two, the start window and the stop window, are used to switch between the short and long windows. The window function processor 1 applies these windows with overlap. The start and stop windows handle the junctions between long and short windows and, as shown in FIG. 6, are asymmetric in shape.

[0022] Normally the long window is used repeatedly. When a transient signal such as an attack sound is detected, the next frame switches to the start window, and the frame after that switches to the short window. As the signal returns to a steady state, the sequence moves from the short window to the stop window and then back to the long window.

[0023] FIG. 4 illustrates the "window function and overlap processing" for the long and short windows in the MPEG-2 AAC scheme. The audio signal (PCM signal) is divided into frames of 1024 samples. The window function is applied to 2048 samples: the PCM signal of the frame being encoded together with that of the immediately preceding frame.

[0024] In ordinary encoding, the window shape for the current multiplication is determined from two things: the window shape actually used to encode the preceding frame, and the window shape indicated by the input PCM signal of the following frame.

[0025] At the start of encoding according to the present invention, the processing is as follows. The encoding start/end time detector 10 detects the encoding start time and supplies the detection signal to the window function designator 8. The designator determines the window function information and supplies it to the window function processor 1. The window function of the preceding frame is then taken to be a long window. Encoding uses the window function determined by the window shape indicated by the input data of the following frame. That is, if that data is suited to the long window, the long window is used. If it is suited to the short window, the start window is used.
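The start-of-encoding rule above amounts to running the ordinary decision of [0024] with the previous window fixed to LONG. The sketch below shows this. The names are hypothetical. The branch for a previous STOP window is an assumption, based on the fact that a stop window ends in a long slope.

```python
LONG, START, SHORT, STOP = "long", "start", "short", "stop"

def next_window(prev, next_needs_short):
    """Ordinary decision from the previous window and a look-ahead on the next frame."""
    if prev in (LONG, STOP):                      # previous window ends in a long slope
        return START if next_needs_short else LONG
    return SHORT if next_needs_short else STOP    # previous window ends in a short slope

def first_frame_window(next_needs_short):
    """Encoding start: the silent previous frame is treated as LONG,
    so the first frame can only use LONG or START."""
    return next_window(LONG, next_needs_short)

print(first_frame_window(False), first_frame_window(True))   # long start
```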

[0026] An example is shown in "Previous frame long window, next frame short window" in FIG. 8. When the preceding frame was encoded with the long window and the PCM signal of the following frame is suited to encoding with the short window, the start window is selected as the window function at the junction.

[0027] Likewise, as shown in "Previous frame long window, next frame long window" in FIG. 9, consider the case where the preceding frame was encoded with the long window and the following frame is suited to encoding with the long window. In that case, the long window is selected as the window function at the junction.

[0028] Silence data is used for the overlap portion, as shown in "Encoding start frame" in FIG. 5.

[0029] Normally the window function is determined from two things: the window function used to encode the preceding frame, and the window function indicated by the input data of the following frame. For the final frame at the end of encoding according to the present invention, however, the window function designator 8 uses only the window function of the preceding frame.

[0030] That is, in the present invention the final frame is handled as follows. When the preceding frame uses a long window, as shown in "Long window before the final frame" in FIG. 10, the window function designator 8 causes the final frame to be encoded with the long window. When the preceding frame uses a short window, as shown in "Short window before the final frame" in FIG. 11, the window function designator 8 causes the final frame to be encoded with the stop window.

[0031] Further, when the preceding frame uses a start window, as shown in "Start window before the final frame" in FIG. 12, the window function designator 8 causes the final frame to be encoded with the stop window.

[0032] Further, when the preceding frame uses a stop window, as shown in "Stop window before the final frame" in FIG. 13, the window function designator 8 causes the final frame to be encoded with the long window.
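The end-of-encoding rule in [0030] to [0032] is a fixed mapping from the preceding frame's window to the final frame's window, with no look-ahead. The sketch below encodes that mapping. The names are hypothetical. Every result ends in a long slope, which is the property the splicing relies on.

```python
LONG, START, SHORT, STOP = "long", "start", "short", "stop"

# Final frame: decided from the preceding frame's window alone.
FINAL_WINDOW = {
    LONG: LONG,    # FIG. 10
    SHORT: STOP,   # FIG. 11
    START: STOP,   # FIG. 12
    STOP: LONG,    # FIG. 13
}

def last_frame_window(prev):
    """Window for the final frame; never SHORT or START."""
    return FINAL_WINDOW[prev]

for prev in (LONG, SHORT, START, STOP):
    print(prev, "->", last_frame_window(prev))
```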

[0033] The present invention thus encodes the final frame with either the stop window or the long window. Consider simply connected bitstreams that are decoded and reproduced. With these windows, the level of the reproduced sound changes more gradually than when the short window or the start window is used for the final frame, so the degradation in sound quality is hard to perceive. This is shown in the addition-result rows of FIG. 7. The level changes gradually both with the long window (FIG. 7(a)) and with the stop window (FIG. 7(b)).

[0034] FIG. 7 also shows codes processed according to the present invention after editing. FIG. 7(a) shows a code string ending with a long window connected to a code string starting with a long window. FIG. 7(b) shows a code string ending with a stop window connected to a code string starting with a long window. These results show that separately encoded bitstreams can be joined at their start and end portions by simple connection (editing). Even then, the degradation of sound quality at the junction is kept to a minimum.

[0035] The same effect is obtained in two further cases, although they are not illustrated here: a code string ending with a long window connected to a code string starting with a start window, and a code string ending with a stop window connected to a code string starting with a start window.

[0036] As described above, the window function processor 1 multiplies each 2048-sample block by the window function. The time-frequency converter 2 then converts the result to the frequency domain by the MDCT (Modified Discrete Cosine Transform).

[0037] The quantizer 3 receives the signal that the time-frequency converter 2 has converted to the frequency domain. It performs quantization based on a bit allocation supplied through the bit allocation designator 9. That allocation comes from the auditory model 4, which itself receives the input PCM signal.

[0038] Normally the bit allocation designator 9 passes the bit allocation from the auditory model 4 to the quantizer 3 unchanged. The encoding start frame and the end frame are treated differently. When either is quantized, the encoding start/end time detector 10 supplies its detection signal to the bit allocation designator 9. The designator then assigns bits to the quantizer 3 so that the virtual buffer value becomes the same prescribed value. Encoding is performed, and the bitstream is output as the encoder output.
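The role of the bit allocation designator 9 at the start and end frames can be sketched as a budget planner. Every detail here is an illustrative assumption:
- the mean budget per frame is constant;
- the buffer level is bounded by 0 and a capacity;
- intermediate frames get what the auditory model asks for, within those bounds;
- the final frame gets exactly the bits that return the buffer to the prescribed value.

A real AAC encoder would also have to respect a minimum frame size and spend surplus bits as fill data. The function name is hypothetical.

```python
def plan_frame_bits(demand, mean_bits, capacity, target):
    """Per-frame bit budgets that start and end the virtual buffer at `target`.

    demand:    bits each frame would like (e.g. from the auditory model)
    mean_bits: bits added to the buffer per frame at the constant rate
    capacity:  maximum buffer level (the minimum is 0)
    target:    prescribed level before the first frame and after the last
    """
    if not demand:
        return []
    level = target                      # encoding starts at the prescribed value
    budgets = []
    for want in demand[:-1]:
        available = level + mean_bits   # most this frame can spend (no underflow)
        # Spend at least enough that the level stays within capacity (no overflow).
        used = min(max(want, available - capacity), available)
        budgets.append(used)
        level = available - used
    last = level + mean_bits - target   # brings the final level back to target
    if last < 0:
        raise ValueError("buffer level too low to reach the target in the final frame")
    budgets.append(last)
    return budgets

print(plan_frame_bits([80, 130, 120, 90], 100, 300, 150))   # [80, 130, 120, 70]
```

Because the stream starts and ends at the same level, its total bit count equals the mean budget multiplied by the number of frames.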

[0039] Next, an embodiment of the audio decoding apparatus of the present invention will be described. As shown in FIG. 1, the decoding side comprises:
- the inverse quantizer 5, which receives the bitstream;
- the frequency-time converter 6; and
- the window function processor 7, which outputs PCM.

[0040] The bitstream from the encoder is first supplied to the inverse quantizer 5, where inverse quantization is performed.

[0041] The output of the inverse quantizer 5 is supplied to the frequency-time converter 6, which converts it from the frequency domain to a time-domain signal. This output is then supplied to the window function processor 7. There it is multiplied by the window function, and PCM data is output as the decoder output.

[0042] The window function processor 7 receives the output of the frequency-time converter 6. It multiplies each frame again by the window function used during encoding and adds the overlapping portions of adjacent frames. This removes the effect of the window function, and the PCM signal is decoded.

[0043] In this way, the code amount is controlled so that the virtual buffer values at the start and at the end of encoding equal the same prescribed value. Separately encoded bitstreams can therefore be simply concatenated and still decoded normally, without overflow or underflow of the virtual buffer. The processing described above makes it possible to join separately encoded bitstreams while minimizing the degradation of sound quality at the junction.
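Using the same kind of toy virtual-buffer model, two whole streams that each start and end at the prescribed level can be concatenated in either order without a violation. The second stream finds the buffer exactly where its encoder assumed it would be. The frame sizes, limits, and function name are invented for illustration.

```python
def simulate(frame_bits, mean_bits, capacity, start_level):
    """Return (levels, violation); violation is None or (kind, frame_index)."""
    level = start_level
    levels = []
    for k, used in enumerate(frame_bits):
        level += mean_bits - used          # constant-rate input minus bits consumed
        levels.append(level)
        if level < 0:
            return levels, ("underflow", k)
        if level > capacity:
            return levels, ("overflow", k)
    return levels, None

PRESCRIBED, MEAN, CAPACITY = 150, 100, 300
A = [80, 130, 120, 70]     # starts and ends at the prescribed level
B = [150, 150, 50, 50]     # starts and ends at the prescribed level

for name, stream in (("A+B", A + B), ("B+A", B + A)):
    levels, violation = simulate(stream, MEAN, CAPACITY, PRESCRIBED)
    print(name, levels, violation)
```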

[0044]

Effects of the Invention
The present invention applies to an audio encoding method that performs overlap encoding using at least four types of window functions: a long window, a short window, a start window, and a stop window. The window functions are determined in consideration of the splicing compatibility of the codes. The window function at the start of encoding is either the long window or the start window. The window function at the end of encoding is either the long window or the stop window. The windowed output is converted from the time domain to the frequency domain. Encoding is performed so that the virtual buffer value of the frame at the start of encoding and that of the frame at the end of encoding equal the same prescribed value. This improves the splicing compatibility of the codes and guarantees consistency when separately encoded codes are combined.

[0045] In addition, according to the present invention, separately encoded bitstreams can be simply combined while keeping the degradation of sound quality at the junction to a minimum.

[0046] In addition, according to the present invention, the code amount is controlled so that the virtual buffer values at the start and at the end of encoding equal the same prescribed value. Separately encoded bitstreams can therefore be simply combined and still decoded normally, without overflow or underflow.

[Brief description of the drawings]

FIG. 1 is a block diagram of an embodiment of the audio encoding/decoding apparatus according to the present invention.

FIG. 2 is a diagram for explaining stream joining in MPEG-1.

FIG. 3 is a diagram for explaining stream joining in MPEG-2 AAC.

FIG. 4 is a diagram illustrating the window functions and overlap processing of the long window and the short window in MPEG-2 AAC.

FIG. 5 is a diagram for explaining the encoding start frame.

FIG. 6 is a diagram for explaining the four types of window function used in MPEG-2: the short window, the long window, the start window, and the stop window.

FIG. 7 is a diagram showing an example of the decoding result at the junction according to the present invention.

FIG. 8 is a diagram for explaining, according to the present invention, the case where the preceding frame uses a long window and the next frame uses a short window.

FIG. 9 is a diagram for explaining, according to the present invention, the case where the preceding frame uses a long window and the next frame also uses a long window.

FIG. 10 is a diagram for explaining, according to the present invention, the case where the frame before the last frame uses a long window.

FIG. 11 is a diagram for explaining, according to the present invention, the case where the frame before the last frame uses a short window.

FIG. 12 is a diagram for explaining, according to the present invention, the case where the frame before the last frame uses a start window.

FIG. 13 is a diagram for explaining, according to the present invention, the case where the frame before the last frame uses a stop window.

FIG. 14 is a block diagram of an example of a conventional audio encoding/decoding apparatus.

FIG. 15 is a diagram showing an example of the decoding result at the junction in the conventional case.

FIG. 16 is a diagram showing a frame structure using the MDCT.

FIG. 17 is a state transition diagram showing the normal state transitions.

[Explanation of symbols]

1 Window function processor (window function processing means)
2 Time-frequency converter (time-frequency conversion means)
3 Quantizer (quantization means)
4 Auditory model
5 Inverse quantizer
6 Frequency-time converter
7 Window function processor
8 Window function designator
9 Bit allocation designator
10 Encoding start/end time detector
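The reference numerals include a window function designator (8) and an encoding start/end time detector (10). The hypothetical sketch below shows one way such a designator might pick AAC-style window sequences from per-frame transient flags while honoring the start/end rule. It rests on several assumptions, none of which comes from the patent:

- adjacent half-windows must both be long slopes or both be short slopes;
- transients in the first and last frames are ignored;
- the AAC-like labels are illustrative only.

```python
from enum import Enum
from itertools import product


class Win(Enum):
    LONG = "ONLY_LONG"
    START = "LONG_START"
    SHORT = "EIGHT_SHORT"
    STOP = "LONG_STOP"


# Windows whose first half / second half is the long slope.
LONG_LEFT = {Win.LONG, Win.START}
LONG_RIGHT = {Win.LONG, Win.STOP}


def plan_windows(transient):
    """Pick one window per frame so the stream starts with LONG/START
    and ends with LONG/STOP (hypothetical designator policy)."""
    n = len(transient)
    # Assumption: a transient in the first or last frame is not coded as SHORT.
    short = [bool(t) and 0 < i < n - 1 for i, t in enumerate(transient)]
    # No long-type window has short slopes on both sides, so fill 1-frame gaps.
    for i in range(1, n - 1):
        if short[i - 1] and short[i + 1]:
            short[i] = True
    seq = []
    for i in range(n):
        if short[i]:
            seq.append(Win.SHORT)
            continue
        left = i > 0 and short[i - 1]
        right = i < n - 1 and short[i + 1]
        if right:
            seq.append(Win.START)  # long rise, then short fall into SHORT
        elif left:
            seq.append(Win.STOP)   # short rise out of SHORT, then long fall
        else:
            seq.append(Win.LONG)
    return seq


def is_valid(seq):
    """Start/end rule plus matching slopes between neighboring frames."""
    if not seq:
        return True
    if seq[0] not in LONG_LEFT or seq[-1] not in LONG_RIGHT:
        return False
    return all((a in LONG_RIGHT) == (b in LONG_LEFT) for a, b in zip(seq, seq[1:]))


print([w.name for w in plan_windows([False, True, True, False, False])])
# ['START', 'SHORT', 'SHORT', 'STOP', 'LONG']
assert all(is_valid(plan_windows(list(p))) for p in product([False, True], repeat=6))
```

Because the first frame never has a short left neighbor and the last frame never has a short right neighbor, every planned sequence automatically begins with LONG or START and ends with LONG or STOP.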

Claims


1. An audio encoding method for performing overlap encoding using at least four types of window function: a long window, a short window, a start window, and a stop window, wherein:

- the window function at the start of encoding is determined to be either the long window or the start window;
- the window function at the end of encoding is determined to be either the long window or the stop window;
- after multiplication by the determined window function, the output is converted from the time axis to the frequency axis;
- encoding is performed so that the virtual buffer value of the frame at the start of encoding and the virtual buffer value of the frame at the end of encoding are the same specified value;

thereby improving the joinability of the codes.

2. An audio encoding method for performing overlap encoding using at least four types of window function: a long window, a short window, a start window, and a stop window, wherein:

- a first code sequence is obtained by:
  - determining the window function at the start of encoding to be either the long window or the start window;
  - multiplying by the window function;
  - converting the output from the time axis to the frequency axis;
  - encoding so that the virtual buffer value of the frame at the start of encoding becomes a specified value;
- a second code sequence is obtained by:
  - determining the window function at the end of encoding to be either the long window or the stop window;
  - multiplying by the window function;
  - converting the output from the time axis to the frequency axis;
  - encoding so that the virtual buffer value of the frame at the end of encoding becomes the specified value;
- the start of the first code sequence is connected to the end of the second code sequence;

thereby improving joinability.

3. An audio encoding apparatus that performs overlap encoding using at least four types of window function: a long window, a short window, a start window, and a stop window, the apparatus comprising:

- window function processing means that, taking the joinability of codes into account, determines the window function at the start of encoding to be either the long window or the start window and the window function at the end of encoding to be either the long window or the stop window;
- time-frequency conversion means for converting the output of the window function processing means from the time axis to the frequency axis;
- quantization means for performing encoding so that the virtual buffer value of the frame at the start of encoding and the virtual buffer value of the frame at the end of encoding are the same specified value.

4. An audio encoded signal recording medium on which is recorded an audio signal overlap-encoded using at least four types of window function: a long window, a short window, a start window, and a stop window, wherein:

- taking the joinability of codes into account, the window function at the start of encoding is either the long window or the start window;
- the window function at the end of encoding is either the long window or the stop window;
- the encoded audio signal is recorded such that the virtual buffer value of the frame at the start of encoding and the virtual buffer value of the frame at the end of encoding are the same specified value.