JP2008026372A

JP2008026372A - Encoding rule conversion method and device for encoded data

Info

Publication number: JP2008026372A
Application number: JP2006195447A
Authority: JP
Inventors: Koichi Takagi; 幸一高木; Satoshi Miyaji; 悟史宮地; Yasuhiro Takishima; 康弘滝嶋
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2006-07-18
Filing date: 2006-07-18
Publication date: 2008-02-07
Anticipated expiration: 2026-07-18
Also published as: JP4721355B2

Abstract

PROBLEM TO BE SOLVED: To provide an encoding rule conversion method and device that convert data encoded under a first encoding rule into data encoded under a second encoding rule in a short time, without degrading quality.

SOLUTION: In this encoding rule conversion method for encoded data, a quantization scale calculating section 300 calculates an AAC (Advanced Audio Coding) quantization scale value Q' from an MP3 quantization scale value Q. It does so using a linear function that represents the correlation between MP3 and AAC quantization scale values. A quantization section 311 quantizes the MDCT (modified discrete cosine transform) coefficients using the AAC quantization scale value Q' calculated by section 300. As a result, the AAC encoding process no longer has to iterate to determine the quantization scale value.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a coding rule conversion method and device for encoded data. The method and device convert audio data encoded under a first coding rule into audio data encoded under a second coding rule that differs from the first.

Among the audio compression methods defined by international standards, the MP3 (MPEG (Moving Picture Experts Group)-1 Audio Layer 3) standard, standardized by ISO, is widely used. MP3 is one of the most popular audio compression formats, and many portable playback devices include an MP3 decoder.

AAC (Advanced Audio Coding), by contrast, is not compatible with MP3. It was specified as a coding method that achieves a higher compression ratio while preserving sound quality, and several online music distribution services and various broadcasting services have adopted it. AAC is said to deliver CD-quality audio at about 48 to 64 kbps per channel and to need about 30% fewer bits than MP3. AAC products and AAC-related services are therefore expected to become more common, and demand for converting MP3 files into AAC files is expected to grow.

The simplest way to convert MP3 to AAC is transcoding in the uncompressed domain. The MP3 data is fully decoded to PCM, and the PCM data is then re-encoded as AAC.

FIG. 5 is a block diagram showing the main components of a conventional MP3 decoder 10. The decoder consists mainly of the following units:
- a Huffman code decoding unit 101, which decodes MP3 Huffman codes into nonlinearly quantized data;
- a side information decoding unit 102, which extracts and decodes the side information in each frame;
- an inverse quantization unit 103, which dequantizes the data based on the side information;
- an alias reduction unit 104, which reduces the aliasing (folding distortion) introduced by the hybrid filter bank;
- an IMDCT (Inverse Modified Discrete Cosine Transform) unit 105, which performs the inverse modified discrete cosine transform;
- a Synthesis Subband Filter Bank unit 106, which combines the 32 subbands to reconstruct PCM data.

Such an MP3 decoder is disclosed in Patent Document 1.

FIG. 6 is a block diagram showing the main components of a conventional AAC encoder 20. The input audio signal is divided into blocks, called frames, each containing a predetermined number of samples. Each frame is then processed along two paths.

The psychoacoustic analysis unit 201 applies a fast Fourier transform (FFT) to each input frame to obtain its frequency spectrum and various parameters. The MDCT (modified DCT) unit 202 converts the input audio signal into a frequency spectrum, hereinafter also called MDCT coefficients, using the block length determined by the psychoacoustic analysis unit. The TNS (Temporal Noise Shaping) unit 203 makes the noise introduced by compression follow the loudness of the sound. It concentrates quantization noise where the signal level is high and keeps noise low where the sound is quiet, which improves perceived quality.

The backward prediction processing unit 204 applies prediction filtering to the MDCT coefficients. The nonlinear quantization unit 205 quantizes the MDCT coefficients. Its goal is to keep the quantization noise below the allowable quantization noise power that the psychoacoustic analysis unit obtained for each scale factor band. The Huffman coding unit 206 then Huffman-codes the quantized MDCT coefficients to reduce redundancy. Quantization and Huffman coding run in an iterative loop. The loop repeats until the number of bits actually produced falls below the number of bits allocated to the frame. Such an AAC encoder is disclosed in Patent Document 2.
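The iterative loop described above can be sketched as follows. This is a simplified illustration, not the encoder of Patent Document 2. The bit counter `estimate_bits` is a hypothetical stand-in for real Huffman code lengths, and a single global gain stands in for per-band scale factors. The quantizer follows the AAC convention that one gain unit changes the linear-domain step by 2^(1/4), with 100 as the neutral gain.

```python
import math


def estimate_bits(q):
    """Hypothetical bit-cost proxy: ceil(log2(|v| + 1)) bits plus one extra bit per value.
    A real encoder would count actual Huffman code lengths instead."""
    return sum(math.ceil(math.log2(abs(v) + 1)) + 1 for v in q)


def quantize(coeffs, global_gain):
    """AAC-style non-uniform quantizer: |x|^(3/4) scaled by 2^(-3/16 * (gain - 100))."""
    step = 2.0 ** (-3.0 / 16.0 * (global_gain - 100))
    return [int(math.copysign(math.floor(abs(x) ** 0.75 * step + 0.4054), x)) for x in coeffs]


def rate_loop(coeffs, bit_budget, global_gain=100, max_gain=255):
    """Coarsen the step (raise the gain) until the estimated bit count fits the budget."""
    iterations = 0
    while True:
        q = quantize(coeffs, global_gain)
        iterations += 1
        if estimate_bits(q) <= bit_budget or global_gain >= max_gain:
            return q, global_gain, iterations
        global_gain += 1
```

Each pass re-quantizes and re-counts the whole frame. This repetition is why the loop accounts for most of the encoding time.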

Patent Document 3 discloses a related technique. Audio data under a first coding rule (MP3), which is difficult to edit, is converted into audio data under a second coding rule (ATRAC), which is easy to edit. The editing is performed on the second-coding-rule data, and the result is converted back to the first coding rule afterwards.

Patent Document 1: JP 2003-99095 A
Patent Document 2: JP 2006-145782 A
Patent Document 3: JP 2001-184840 A

Consider coding rule conversion of audio data by transcoding, which fully decodes the data and then re-encodes it. This approach runs every step of both the decoding process and the encoding process. Running the full encoding process is especially slow, because the iterative quantization/Huffman coding loop takes a long time, so the overall conversion time becomes long. In addition, the same data goes through two different encoding processes, which degrades quality.

An object of the present invention is to solve the above problems of the prior art. Specifically, it is to provide a coding rule conversion method and device that can convert data encoded under a first coding rule into a second coding rule in a short time while minimizing quality degradation.

To achieve the above object, the present invention provides a coding rule conversion method for encoded data. The method converts first-coding-rule data, encoded under a first coding rule, into second-coding-rule data encoded under a second coding rule in which quantization during encoding is repeated in an iterative loop. The method is characterized by the following procedures.
(1) The method includes a decoding process that decodes the first-coding-rule data into PCM data, and an encoding process that encodes the PCM data into second-coding-rule data. It further includes:
- a procedure for dequantizing the quantized data in the decoding process of the first-coding-rule data;
- a procedure for obtaining a first inverse quantization scale value for each sample during that dequantization;
- a procedure for calculating a second quantization scale value by applying a predetermined function to each first quantization scale value;
- a procedure for quantizing data with each second quantization scale value in the encoding process of the second coding rule.
(2) The coding rule of the first-coding-rule data is converted in units of a fixed number of frames. That number corresponds to the least common multiple of the samples in one frame of the first-coding-rule data and the samples in one frame of the second-coding-rule data.
(3) The encoding process of the second-coding-rule data includes a DCT procedure that converts the data from the time domain to the frequency domain. The method further includes a procedure for storing the frame structure of each frame during the dequantization procedure of the first-coding-rule decoding process. It also includes a procedure for determining the window size from the stored frame structure in the DCT procedure of the second-coding-rule encoding process.

The present invention achieves the following effects.
(1) Conversion targets second-coding-rule data whose encoder repeats quantization in an iterative loop. The quantization-scale parameters obtained while decoding the first-coding-rule data can be inherited by that encoding process. This simplifies the iterative process, which is the most time-consuming part of second-coding-rule encoding.
(2) Conversion is performed in units of a fixed number of frames. That number corresponds to the least common multiple of the samples in one frame of the first-coding-rule data and in one frame of the second-coding-rule data. This resolves the difference between the two frame sizes.
(3) The frame-structure parameters obtained while decoding the first-coding-rule data can be inherited by the second-coding-rule encoding process. That process can therefore select the optimal frame structure.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The best mode for carrying out the present invention is described in detail below with reference to the drawings. Conversion from MP3 to AAC serves as the example. An outline of the invention is given first, followed by the details.

The present invention solves the above technical problems of the transcoding approach by simplifying the iterative quantization process, which is the most time-consuming part of AAC encoding. To do this, it passes the frame-structure and quantization-scale parameters obtained during MP3 decoding on to the AAC encoding process.

In MP3, the spectrum is split into 32 subband signals. Within each subband, a 6-point (short) or 18-point (long) MDCT (modified DCT) is applied, with successive blocks overlapping by half. The 6-point MDCT is used for portions of the signal that change abruptly over time, particularly where pre-echo would otherwise occur.

AAC, on the other hand, was developed to achieve higher-quality compression than MP3, and many of the techniques it uses had already been introduced in MP3. One reason AAC sounds better than MP3 is its MDCT filter bank. AAC switches adaptively between 128-point and 1024-point transforms, applying 256-point and 2048-point windows with 50% overlap.

Audio signals often require a filter bank with high frequency resolution. AAC can use a 1024-point filter bank, whereas MP3 is limited to at most 576 points. Transient signals, however, do not need high frequency resolution, so both formats can adaptively switch to a shorter window. In that case the frequency resolution is 192 points in MP3 and 128 points in AAC. One difference between MP3 and AAC is therefore how each transforms the signal to the frequency domain. The frame size used for the transform also differs between the two.

MP3 and AAC encoders generally allocate code bits using the same psychoacoustic model. Bit allocation is normally an iterative computation, so it accounts for most of the encoding time. Reducing the time spent on this step therefore greatly reduces the overall audio encoding time.

In the present invention, the AAC parameters that would otherwise be determined by iterative computation are instead calculated from the MP3 parameters. This shortens the time required for coding rule conversion.

The dequantized MP3 coefficients are expressed as follows, in terms of the sum of two quantities. The first is the global gain (the quantization step). The second is the scale factor: for each subband signal in a frame, the value of the sample with the largest absolute value, converted to a logarithm and quantized.

Here, xr[i] is the dequantized MDCT coefficient, and is[i] is one of the 576 values (quantized MDCT coefficients) obtained by decoding the Huffman code. Scalefactor is the scale factor applied to each scale factor band. Global_Gain, Subblock_Gain, and Scalefactor_Scale are values taken from the granule information. The indices i, gr, w, and cb denote the MDCT coefficient, the granule, the window, and the codebook, respectively. The AAC requantized coefficients are likewise calculated from the scale factor, as follows:

Here, g and sfb denote the window group and the scale factor band, respectively. In Eqs. (1) and (2), the quantization scale values are the exponents of 2, and their distribution is shown in FIG. 3. The MP3 quantization scale value Q and the AAC quantization scale value Q' differ in absolute value, but they are highly correlated.
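For reference, the standard requantization rules of MPEG-1 Layer III (ISO/IEC 11172-3) and AAC (ISO/IEC 13818-7) are written below with the variable names defined above. Eqs. (1) and (2) presumably take this form, but the patent's exact notation may differ. In this form the MP3 preflag/pretab term is omitted, and Subblock_Gain applies only to short blocks (it is zero for long blocks). Q and Q' are the exponents of 2 that the comparison refers to.

```latex
% MP3 (ISO/IEC 11172-3): granule gr, window w, band index cb; preflag/pretab omitted
xr[i] = \operatorname{sgn}(is[i])\,|is[i]|^{4/3}\cdot 2^{Q[i]},\qquad
Q[i] = \tfrac{1}{4}\bigl(\mathrm{Global\_Gain}[gr] - 210 - 8\,\mathrm{Subblock\_Gain}[gr][w]\bigr)
       - \tfrac{1 + \mathrm{Scalefactor\_Scale}[gr]}{2}\,\mathrm{Scalefactor}[gr][w][cb]

% AAC (ISO/IEC 13818-7): window group g, scale factor band sfb, scale factor sf
\hat{x}[i] = \operatorname{sgn}(x_q[i])\,|x_q[i]|^{4/3}\cdot 2^{Q'[i]},\qquad
Q'[i] = \tfrac{1}{4}\bigl(\mathrm{sf}[g][sfb] - 100\bigr)
```

Both rules apply the same |·|^(4/3) expansion and differ only in how the exponent is built. This shared structure is consistent with the correlation between Q and Q' shown in FIG. 3.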

The present invention exploits this correlation between the quantization scale values Q and Q'. A linear function representing the correlation yields the AAC quantization scale value Q' directly from the MP3 quantization scale value Q, with no iteration. This in particular shortens the AAC encoding process.
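A minimal sketch of this mapping follows. The patent does not state the coefficients of the linear function. `fit_linear` therefore estimates a slope and intercept by ordinary least squares from hypothetical paired (Q, Q') samples, such as those plotted in FIG. 3. The conversion to an integer AAC scale factor assumes the AAC relation Q' = (sf - 100)/4 and a 0 to 255 range. All function names are illustrative.

```python
def fit_linear(q_mp3, q_aac):
    """Ordinary least-squares fit of Q' = a * Q + b from paired training samples."""
    n = len(q_mp3)
    mean_x = sum(q_mp3) / n
    mean_y = sum(q_aac) / n
    sxx = sum((x - mean_x) ** 2 for x in q_mp3)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(q_mp3, q_aac))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def mp3_to_aac_scale(q_mp3, a, b):
    """Map each inherited MP3 scale value Q[i] to an AAC scale value Q'[i] in one pass."""
    return [a * q + b for q in q_mp3]


def to_aac_scalefactor(q_aac):
    """Invert Q' = (sf - 100) / 4 and clamp to the assumed integer range [0, 255]."""
    return [min(255, max(0, round(4 * q + 100))) for q in q_aac]
```

Once a and b are fixed offline, mapping each inherited Q[i] costs a single multiply-add. This replaces the per-frame search for the step size.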

FIG. 1 is a block diagram showing the main components of a coding rule conversion device 1 according to the present invention.

The Huffman code decoding unit 301 decodes MP3 Huffman codes into nonlinearly quantized data. The side information decoding unit 302 extracts and decodes the side information in each frame. The inverse quantization unit 303 dequantizes the data based on the side information. The alias reduction unit 304 reduces the aliasing (folding distortion) introduced by the hybrid filter bank.

The IMDCT unit 305 performs the inverse modified discrete cosine transform. The subband filter bank (FB) unit 306 combines the 32 subbands to reconstruct PCM data. The quantization scale calculation unit 300 calculates the AAC quantization scale value Q' from the MP3 quantization scale value Q. It uses the linear function, described with reference to FIG. 3, that represents the correlation between MP3 and AAC quantization scale values.

The psychoacoustic analysis unit 307 applies an FFT to each input frame to obtain its frequency spectrum. The MDCT unit 308 converts the input audio signal into a frequency spectrum (MDCT coefficients) using the block length determined by the psychoacoustic analysis unit 307. The TNS unit 309 makes the noise introduced by compression follow the loudness of the sound. It concentrates quantization noise where the signal level is high and keeps noise low where the sound is quiet, which improves perceived quality.

The backward prediction processing unit 310 treats the MDCT coefficients as if they were a time-domain signal, performs linear prediction, and applies prediction filtering to the MDCT coefficients. The quantization unit 311 quantizes the MDCT coefficients using the AAC quantization scale value Q' calculated by the quantization scale calculation unit 300. The Huffman coding unit 312 then Huffman-codes the quantized MDCT coefficients to reduce redundancy.

Next, the procedure for converting MP3-encoded data into AAC-encoded data is described in detail with reference to the flowchart in FIG. 2.

In step S1, the buffer accumulates enough MP3 frames to cover the least common multiple of the samples in one MP3 frame and in one AAC frame. One MP3 frame contains 1152 samples and one AAC frame contains 1024 samples. Their least common multiple, 9216 samples, is therefore used as the conversion unit, and the buffer stores 8 (= 9216 / 1152) frames' worth of samples. These 9216 samples correspond to exactly 9 (= 9216 / 1024) AAC frames, which absorbs the difference in frame size between MP3 and AAC.
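The buffer size in step S1 follows directly from the two frame lengths. The sketch below computes the conversion unit and confirms that 8 MP3 frames map onto exactly 9 AAC frames. It needs Python 3.9+ for `math.lcm`, and the function name is illustrative.

```python
import math

MP3_FRAME = 1152  # samples per MPEG-1 Layer III frame
AAC_FRAME = 1024  # samples per AAC frame


def conversion_unit(mp3_frame=MP3_FRAME, aac_frame=AAC_FRAME):
    """Return (samples per conversion unit, MP3 frames to buffer, AAC frames produced)."""
    unit = math.lcm(mp3_frame, aac_frame)
    return unit, unit // mp3_frame, unit // aac_frame


print(conversion_unit())  # (9216, 8, 9)
```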

In step S2, the Huffman code decoding unit 301 and the side information decoding unit 302 decode the MP3 data in the buffer. This yields the granule information, the scale factors, and 576 quantized samples. The granule information includes parameters such as the global gain, the subblock gain, and the scale factor scale.

In step S3, the inverse quantization unit 303 takes the granule information, the scale factors, and the 576 quantized samples as input. It performs dequantization according to Eq. (1), yielding 576 dequantized MDCT coefficients xr[i]. In step S4, the quantization scale value obtained while calculating each xr[i] is stored in the variable Q[i], where i is the sample number. The frame structure (long or short) of each sample is stored at the same time.
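Steps S3 and S4 can be sketched as a single pass that computes each dequantized value and keeps its exponent. The sketch assumes the standard ISO/IEC 11172-3 requantization rule without the preflag/pretab term. It also assumes that scale factors and subblock gains have already been expanded from bands and windows to individual samples. The function and argument names are hypothetical.

```python
import math


def dequantize_granule(is_vals, scalefac_per_sample, global_gain, scalefac_scale,
                       subblock_gain_per_sample=None, block_type="long"):
    """Dequantize one granule and keep the per-sample exponent Q[i] and block type.

    Standard MP3 rule (preflag/pretab omitted):
        xr = sgn(is) * |is|^(4/3) * 2^Q,
        Q  = (global_gain - 210 - 8 * subblock_gain) / 4 - multiplier * scalefac
    """
    if subblock_gain_per_sample is None:
        subblock_gain_per_sample = [0] * len(is_vals)  # long blocks: no subblock gain
    multiplier = 0.5 if scalefac_scale == 0 else 1.0
    xr, q_saved = [], []
    for v, sf, sbg in zip(is_vals, scalefac_per_sample, subblock_gain_per_sample):
        q = 0.25 * (global_gain - 210 - 8 * sbg) - multiplier * sf
        q_saved.append(q)  # step S4: kept so the AAC encoder can inherit it
        xr.append(math.copysign(abs(v) ** (4.0 / 3.0), v) * 2.0 ** q)
    return xr, q_saved, [block_type] * len(is_vals)  # frame structure saved per sample
```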

In step S5, alias reduction is applied to the dequantized MDCT coefficients to reduce aliasing distortion. In step S6, IMDCT synthesis converts the frequency-domain data into time-domain data. For long-block frames, the 576 values are rearranged into a 32 × 18 data structure. For short-block frames, each set of 192 values is rearranged into a 32 × 6 data structure. In step S7, the 32 subbands are combined to reconstruct PCM data.

In step S8, the psychoacoustic analysis unit 307 applies an FFT to the PCM data, a predetermined number of frames at a time, to obtain its frequency spectrum. Auditory masking is then calculated from the spectrum. This yields the allowable quantization noise power for each preset frequency band and the perceptual entropy (PE) of the frame.

In step S9, the MDCT unit 308 converts the input audio signal into a frequency spectrum (MDCT coefficients) using the block length determined by the psychoacoustic analysis unit 307. This step consults the frame structure stored in step S4. As in the example of FIG. 4, a short window (SW) is selected for frames that largely overlap frames that were short blocks (SB) in the MP3 data. A long window (LW) is selected for all other frames.
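The window decision in step S9 can be sketched by measuring how much of each AAC frame overlaps MP3 granules that were coded as short blocks. The text does not quantify "largely overlaps," so the 50% threshold here is an assumption. Treating each AAC frame as a plain 1024-sample interval aligned with the start of the buffer is a further simplification. The function name is illustrative.

```python
def select_aac_windows(mp3_block_types, n_aac_frames, granule=576, aac_frame=1024,
                       threshold=0.5):
    """Choose 'short' or 'long' for each AAC frame from the saved MP3 block types.

    mp3_block_types: 'long'/'short' for each 576-sample MP3 granule in the buffer.
    A frame gets a short window when more than `threshold` of its samples overlap
    short-block granules (the threshold is a hypothetical choice).
    """
    windows = []
    for k in range(n_aac_frames):
        start, end = k * aac_frame, (k + 1) * aac_frame
        short_overlap = 0
        for g, kind in enumerate(mp3_block_types):
            g_start, g_end = g * granule, (g + 1) * granule
            overlap = max(0, min(end, g_end) - max(start, g_start))
            if kind == "short":
                short_overlap += overlap
        windows.append("short" if short_overlap > threshold * aac_frame else "long")
    return windows
```

A complete AAC encoder also inserts transition windows (LONG_START_SEQUENCE and LONG_STOP_SEQUENCE) around each EIGHT_SHORT_SEQUENCE frame. This sketch makes only the short/long decision.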

In step S10, the TNS unit 309 treats the MDCT coefficients as if they were a time-domain signal, performs linear prediction, and applies prediction filtering to the MDCT coefficients. In step S11, the backward prediction processing unit 310 performs backward prediction. For each MDCT coefficient, it predicts the current value from the quantized MDCT coefficients of the previous two frames, then compares the result with the preceding data and takes the difference, which reduces the amount of data. In step S12, the quantization scale calculation unit 300 calculates the AAC quantization scale value Q'[i] from the quantization scale value Q[i] inherited from MP3.

In step S13, the code amount is determined from the quantization scale value Q'[i]. In step S14, the quantization unit 311 quantizes the prediction residual. This embodiment therefore performs no iterative processing to determine the quantization scale value.
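Step S14 then reduces to one quantization pass per coefficient. The sketch inverts the AAC dequantizer x = sgn(x_q) · |x_q|^(4/3) · 2^(Q') using the per-coefficient scale values Q'[i]. The 0.4054 rounding offset is the one commonly used in AAC encoders. The function name is illustrative.

```python
import math

MAGIC = 0.4054  # rounding offset commonly used in AAC encoders


def quantize_with_inherited_scale(coeffs, q_aac):
    """Quantize MDCT coefficients once, using per-coefficient AAC scale values Q'[i].

    Solves |x| = |x_q|^(4/3) * 2^(Q') for x_q, i.e. x_q = (|x| * 2^(-Q'))^(3/4),
    with no search over the step size.
    """
    out = []
    for x, q in zip(coeffs, q_aac):
        mag = (abs(x) * 2.0 ** (-q)) ** 0.75 + MAGIC
        out.append(int(math.copysign(math.floor(mag), x)))
    return out
```

This pass does no search over the step size. It costs one power and one floor per coefficient, regardless of the bit budget.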

In step S15, the global gain is finely adjusted according to the psychoacoustic model. In step S16, Huffman coding is applied to the quantized MDCT coefficients to reduce redundancy. In step S17, the device checks whether any unprocessed MP3 data remains. If so, the process returns to step S1 and repeats the above steps until all MP3 data has been converted.

The above embodiment uses conversion from MP3 to AAC as its example, but the present invention is not limited to that case. It applies equally to conversion between other coding rules, provided one condition holds. When data encoded under a first coding rule is converted into data encoded under a second coding rule, the parameters obtained by the first-coding-rule decoding process must be usable in the second-coding-rule encoding process.

FIG. 1 is a block diagram showing the main components of the coding rule conversion device according to the present invention.
FIG. 2 is a flowchart showing the procedure for converting MP3 data into AAC data.
FIG. 3 is a diagram showing the correlation between MP3 and AAC quantization scale values.
FIG. 4 is a diagram showing the relationship between the MP3 frame structure and the AAC frame structure.
FIG. 5 is a block diagram showing the configuration of a conventional MP3 decoder.
FIG. 6 is a block diagram showing the configuration of a conventional AAC encoder.

Explanation of symbols

1 ... coding rule conversion device, 10 ... MP3 decoder, 20 ... AAC encoder, 300 ... quantization scale calculation unit, 301 ... Huffman code decoding unit, 302 ... side information decoding unit, 303 ... inverse quantization unit, 304 ... alias reduction unit, 305 ... IMDCT unit, 306 ... Synthesis Subband Filter Bank unit, 307 ... psychoacoustic analysis unit, 308 ... MDCT unit, 309 ... TNS unit, 310 ... backward prediction processing unit, 311 ... quantization unit, 312 ... Huffman coding unit

Claims

1. A coding rule conversion method for encoded data that converts first-coding-rule data, encoded under a first coding rule, into second-coding-rule data encoded under a second coding rule in which quantization during encoding is repeated in an iterative loop, the method comprising:
a decoding process for decoding the first-coding-rule data into PCM data, and an encoding process for encoding the PCM data into second-coding-rule data;
a procedure for dequantizing quantized data in the decoding process of the first-coding-rule data;
a procedure for obtaining, in the dequantization procedure, a first inverse quantization scale value for each sample;
a procedure for calculating a second quantization scale value by performing a predetermined function calculation on each first quantization scale value; and
a procedure for quantizing data using the second quantization scale values in the encoding process of the second coding rule,
wherein the predetermined function converts the first quantization scale value into the second quantization scale value based on an additive relationship between the quantization scale value of the first coding rule and the quantization scale value of the second coding rule for the same sample.

2. The coding rule conversion method for encoded data according to claim 1, wherein the coding rule of the first-coding-rule data is converted in units of a number of frames corresponding to the least common multiple of the number of samples in one frame of the first-coding-rule data and the number of samples in one frame of the second-coding-rule data.

3. The coding rule conversion method for encoded data according to claim 1, wherein the first coding rule is MP3 and the second coding rule is AAC.

4. The coding rule conversion method for encoded data according to claim 3, wherein the encoding process of the second-coding-rule data includes a DCT procedure for converting data from the time domain to the frequency domain, the method further comprising:
a procedure for storing the frame structure of each frame in the dequantization procedure of the decoding process of the first-coding-rule data; and
a procedure for determining, in the DCT procedure of the encoding process of the second-coding-rule data, a window size based on the stored frame structure.

5. A coding rule conversion device for encoded data that converts first-coding-rule data, encoded under a first coding rule, into second-coding-rule data encoded under a second coding rule in which quantization during encoding is repeated in an iterative loop, the device comprising:
means for dequantizing quantized data of the first coding rule;
means for calculating a second quantization scale value by performing a predetermined function calculation on the first inverse quantization scale value obtained during the dequantization; and
means for quantizing data to be encoded under the second coding rule using each of the second quantization scale values,
wherein the predetermined function converts the first quantization scale value into the second quantization scale value based on an additive relationship between the quantization scale value of the first coding rule and the quantization scale value of the second coding rule for the same sample.

6. The coding rule conversion device for encoded data, comprising buffer means for storing the first-coding-rule data in units of a number of frames corresponding to the least common multiple of the number of samples in one frame of the first-coding-rule data and the number of samples in one frame of the second-coding-rule data,
wherein the coding rule conversion is executed in units of the frames stored in the buffer means.

7. The coding rule conversion device for encoded data according to claim 5, wherein the first coding rule is MP3 and the second coding rule is AAC.

8. The coding rule conversion device for encoded data according to claim 7, further comprising DCT means for converting data to be encoded under the second coding rule from the time domain to the frequency domain,
wherein the DCT means determines the window size of each frame based on the frame structure of each frame of the first-coding-rule data.