JP2008129250A

JP2008129250A - Window changing method for advanced audio coding and band determination method for m/s encoding

Info

Publication number: JP2008129250A
Application number: JP2006312942A
Authority: JP
Inventors: Keimin Ryu; 啓民劉; Wen-Chieh Lee; 文傑李; Yu-Ha Hsiao; 又華蕭; Kang-Yen Peng; 康硯彭
Original assignee: National Chiao Tung University NCTU
Current assignee: National Yang Ming Chiao Tung University NYCU
Priority date: 2006-11-20
Filing date: 2006-11-20
Publication date: 2008-06-05

Abstract

PROBLEM TO BE SOLVED: To provide an audio compression method that reduces quantization error, and a method for determining the band state of M/S coding for advanced audio coding (AAC).

SOLUTION: The invention provides a method that determines a global energy ratio of a first range of an audio signal and compares the global energy ratio with a first threshold value. The invention also provides a method for determining the band state of M/S coding for AAC, comprising the steps of: receiving at least one audio stream including a plurality of bands; calculating a first node and a second node for each band, each band including a left signal, a right signal, a middle signal and a side signal; calculating a minimum cost path value between adjacent bands; and determining the state of each band, which is either the L/R state or the M/S state, based on the minimum cost path value.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to audio signals, and more particularly to the reduction of compression error and to an improved method for determining the band state of per-band M/S coding in digital audio coding.

Many digital audio systems rely on signal compression to reduce audio file size. Such systems typically sample the raw audio signal using sample windows.
For example, a three-minute piece of music may be sampled using 1000 sample windows, each 0.18 seconds long. The bit resolution of a sample window, that is, the number of bits it holds for its fixed length, strongly affects the quality of the encoded audio signal. For example, if a 0.18-second sample window holds 128 bits, each bit corresponds to about 0.0014 seconds of music. These figures are illustrative and may not match real applications. Clearly, the more bits per window, the higher the quality of the stored music, but too many bits defeat the purpose of compression. A common digital audio system that uses compression and sample windows is MP3 (Moving Picture Experts Group Audio Layer-3).

The principle of window switching is to change the window size of the filter bank, the device that encodes the time-domain audio signal into frequency data, so as to achieve a suitable time-frequency resolution. In general, window switching involves choosing between two predetermined window sizes, large and small. Pre-echo, an artificial and unpleasant noise caused by compression, occurs when a transient signal (for example, a very short sound) is encoded. Because a transient signal needs high coding resolution to represent the signal change accurately in time, any shortage of bits allows the quantization error to spread over the entire window period.

To illustrate this problem, FIG. 1 shows an example in which a signal containing a transient sound is encoded.
In FIG. 1, the original signal 100 to be encoded has a very small amplitude range followed abruptly by a high amplitude range, which is in turn followed by a small amplitude range. This is a transient signal. Encoding the original signal 100 with the long window 120 yields the encoded signal 110. The spread of quantization error is visible in the encoded signal 110 in the range 130 that precedes the high-amplitude transient. Because the original signal 100 contains virtually no signal in this range, the quantization error is not masked by a more dominant signal. In general, quantization error appears and spreads when frequency-domain coding is used with a window that spans a region containing substantially different amplitudes. As a result of frequency-domain compression, the data within a window tend to share characteristics. Quantization error in the encoded audio is unpleasant to the listener.

One way to reduce quantization error is to use windows of different lengths. As shown in FIG. 1, the spread of quantization error is reduced in the range 150 of the quantized signal 140 when the long window 160 is used in combination with the short windows 170. Compared with the long-window encoded signal 110, the spread of quantization error is confined to the short window period of the short-window quantized signal.

The pre-echo phenomenon is explained as follows. Temporal masking comprises simultaneous masking, pre-masking and post-masking. The effect of each type of masking is shown in FIG. 2. The effective masker durations for pre-masking and post-masking are approximately 20 ms and 100 ms, respectively. When a transient signal, or audio attack, is encoded in the frequency domain, the quantization error spreads over the entire signal block in the time domain. Because the signal before the attack is relatively small, the attack contributes most of the energy of the signal block and therefore governs the generation of the masking threshold. The threshold is then too high for the quiet part of the block. A typical long window is 2048 samples, which is about 46 ms at a sample rate of 44.1 kHz, whereas pre-masking lasts less than 20 ms. When a long window is used to encode such a transient signal, the spread of quantization error is therefore easily heard by the listener. This is called the pre-echo phenomenon.
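The timing argument above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the patent: the function and constant names are illustrative, and the 256-sample short window is taken from the embodiment described later in this document.

```python
def window_duration_ms(window_samples: int, sample_rate_hz: float) -> float:
    """Duration of one transform window in milliseconds."""
    return 1000.0 * window_samples / sample_rate_hz


# Approximate pre-masking duration quoted in the text (illustrative constant name).
PRE_MASKING_MS = 20.0

long_ms = window_duration_ms(2048, 44_100)   # long window
short_ms = window_duration_ms(256, 44_100)   # short window

# A window that lasts longer than the pre-masking interval can leave
# quantization noise audible ahead of an attack (pre-echo).
long_exceeds_premasking = long_ms > PRE_MASKING_MS
short_exceeds_premasking = short_ms > PRE_MASKING_MS
```

At 44.1 kHz the long window lasts roughly 46 ms and the short window under 6 ms, so only the long window outlasts the 20 ms pre-masking interval.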

Furthermore, in current audio coding, M/S (middle signal/side signal) coding is a core technique for effectively reducing irrelevant and redundant information in stereo channels. For more than two channels, the method used in the current MPEG-2 AAC and MPEG-4 AAC standards is to divide the channels into pairs and then apply M/S coding to each pair. In AAC, M/S coding can be applied selectively to those spectral regions in which it yields a coding gain. In the MPEG-4 AAC coding standard, per-band M/S coding provides additional flexibility for reducing irrelevancy and redundancy between channels. That flexibility, however, increases the design size and complexity of the encoder.

M/S coding is an extension of perceptual audio coding that includes an M/S conversion model, which converts L/R (left/right) signals into M/S signals. FIG. 3 is a block diagram of perceptual coding with M/S conversion according to the prior art. The L/R audio signals are divided into overlapping blocks by the analysis filter bank 10 and transformed into the frequency domain. If the psychoacoustic model 20 calculates that there is a coding gain, the M/S conversion model 15 receives the frequency-domain L/R signals and converts them into M/S signals. The quantization/coding model 25 receives these signals and quantizes and codes them, together with several parameters determined by the bit allocation 30.

The psychoacoustic model 20 analyzes the content of the L/R signals and calculates the corresponding perceptual resolution of the human auditory system. Based on the perceptual resolution and the available bits, the bit allocation 30 determines a suitable quantization scheme that meets the bit rate. The packing model 35 packs all of the coded information in the format specified by the standard. There is existing literature on per-band M/S coding.

The first document concerns the psychoacoustic model 20 for M/S signals. The psychoacoustic model 20 simulates the human auditory system and attempts to provide the correct masking thresholds for quantization. The masking model of the psychoacoustic model 20 for the L and R channels is already established in the standard. However, applying the same procedure to the M and S channels is not reasonable. Moreover, the psychoacoustic model 20 accounts for 15% or more of the complexity of L/R coding, so the additional complexity it introduces increases the cost of M/S coding.

The second document concerns deciding, band by band, how the signal is coded. This decision involves measuring the coding gain between M/S coding and L/R coding. Band-state switching aims to find the maximum coding gain by means of the psychoacoustic model 20. The optimal decision is found by evaluating all possible cases: the reconstructed signal is calculated for each case, and the case with the minimum distortion is selected. Because an audio signal frame contains 49 bands, evaluating all possible cases has a computational complexity on the order of O(2^49).

M/S coding is freely used, and FAAC, the most representative AAC encoder, was refined on the basis of Johnston's research with careful parameter tuning. FIG. 4 is a flowchart showing the prior-art process for determining the band state of M/S coding in FAAC. The psychoacoustic model 20 receives the L/R signals and determines the state of each band for M/S coding through the following steps.

Steps 1 to 2: The left and right signals are converted by a fast Fourier transform (FFT) into a left FFT (L_FFT) signal and a right FFT (R_FFT) signal.

Step 3: The left and right FFT signals are converted into a middle FFT (M_FFT) signal and a side FFT (S_FFT) signal.

Steps 4 to 5: The masking model of the psychoacoustic model 20 calculates the masking thresholds (T_L, T_R) of the left and right signals, respectively.

Steps 6 to 8: The masking thresholds (T_M, T_S) of the middle and side signals are calculated. The M/S signals are fed into the same masking model that is used for L/R coding to obtain masking thresholds, and the final masking thresholds are then determined by exploiting the binaural MLD (masking level difference) effect.

Steps 9 to 14: A calculation and comparison are performed; if db < 0.25, Step 15 is executed, and otherwise Step 16 is executed.

Step 15: The state of the N-th band is determined to be the M/S state. The M/S conversion model 15 then receives the L/R signals of the N-th band and converts them into M/S signals, which are quantized and coded by the quantization/coding model 25.

Step 16: The state of the N-th band is determined to be the L/R state, and the quantization/coding model 25 receives the L/R signals of the N-th band and quantizes and codes them.
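The prior-art flow in Steps 3 and 9 to 16 can be illustrated with a short sketch. It rests on assumptions that the text does not state: the M/S conversion is taken to be the usual M = (L + R)/2, S = (L - R)/2, and the per-band quantity `db` is supplied by the caller because its computation in Steps 9 to 14 is not described here. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def lr_to_ms(left: Sequence[complex], right: Sequence[complex]) -> Tuple[List[complex], List[complex]]:
    """Assumed M/S conversion: M = (L + R) / 2, S = (L - R) / 2, per coefficient."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def ms_to_lr(mid: Sequence[complex], side: Sequence[complex]) -> Tuple[List[complex], List[complex]]:
    """Inverse of the assumed conversion: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def threshold_band_states(db_per_band: Sequence[float], threshold: float = 0.25) -> List[str]:
    """Independent per-band rule from Steps 9 to 16: M/S if db < threshold, else L/R.

    `db_per_band` is assumed to be precomputed by the caller.
    """
    return ["M/S" if db < threshold else "L/R" for db in db_per_band]
```

Because each band is decided on its own, nothing in this rule discourages the state from flipping back and forth between neighboring bands, which is the instability criticized in the following paragraph.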

There are problems with the way FAAC determines the band state. First, FAAC uses only the difference between masking thresholds to decide whether a band uses M/S, and the M/S signals are fed into the same masking model that is used for the L/R thresholds to obtain their masking thresholds. Treating the M/S signals in this way is not reasonable. Although setting a threshold and comparing against a criterion makes the band-state decision easy, information from neighboring bands is not used, and unstable state switching within a frame prevents bits from being allocated effectively to each band and increases the side information. Furthermore, the optimal band-state decision can be found by evaluating all possible cases, calculating the reconstructed signal, and finding the lowest distortion among the cases, but a computation on the order of O(2^49) is too expensive to adopt.

Accordingly, the present invention is directed to an audio compression method that reduces quantization error such as pre-echo, along with time complexity and other drawbacks, and to a method for determining the band state of M/S coding for AAC.
JP-A-8-167878

A first object of the present invention is to provide a method, and a related apparatus, for reducing quantization error.
A second object of the present invention is to provide a method for determining the band state of M/S coding for AAC that considers each PE (perceptual entropy), determines the state of a band while taking into account changes of coding state between adjacent bands, and reduces time complexity.
A third object of the present invention is to provide a method that finds the optimal band-state decision with simpler and cheaper computation than using any auxiliary function.
A fourth object of the present invention is to provide a method for modifying the M/S coding model of the psychoacoustic model so that M/S masking thresholds are obtained in a way that treats the M/S signals reasonably.
A fifth object of the present invention is to provide a method for determining the band state of M/S coding for AAC, comprising the steps of: receiving at least one audio stream including a plurality of bands; calculating, for each band, which includes a left signal, a right signal, a middle signal and a side signal, a first node that is the sum of the PE (perceptual entropy) values of the left and right signals and a second node that is the sum of the PE values of the middle and side signals; calculating a minimum cost path value between adjacent bands, that is, from the first node of the N-th band to the first or second node of the (N+1)-th band, or from the second node of the N-th band to the first or second node of the (N+1)-th band; and determining the state of each band, which is either the L/R state or the M/S state, based on the minimum cost path value. The method provides inexpensive computation and M/S masking thresholds, and reduces time complexity.
Other objects of the invention will become apparent upon reading the description of the best mode for carrying out the invention.

To solve the above problems, the present invention provides a method comprising the steps of: receiving a block of an audio signal; determining a global energy ratio of a first range of the audio signal and comparing the global energy ratio with a first threshold; determining a zero-cross ratio of a second range of the audio signal and comparing the zero-cross ratio with a second threshold; selecting a short coding window when the global energy ratio or the zero-cross ratio exceeds the first or second threshold, respectively, and no tone attack is detected in a third range of the audio signal; selecting a long coding window when neither the global energy ratio nor the zero-cross ratio exceeds the first or second threshold, or when a tone attack is detected in the third range of the audio signal; and encoding, with the selected coding window, a fourth range of the audio signal that is common to the first, second and third ranges.
The present invention further provides a method for determining the band state of M/S coding for AAC, comprising the steps of: receiving at least one audio stream including a plurality of bands; calculating a first node and a second node for each band, each band including a left signal, a right signal, a middle signal and a side signal; calculating a minimum cost path value between adjacent bands; and determining the state of each band, which is either the L/R state or the M/S state, based on the minimum cost path value.
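The minimum-cost-path idea can be sketched as a two-state dynamic program over the bands. This is an illustrative reading, not the patent's exact procedure: node costs are the per-band PE sums named in the text, while the cost attached to moving between nodes of adjacent bands is not specified in this section, so a hypothetical `switch_penalty` charged whenever the state changes is assumed. Function and parameter names are invented for the example.

```python
from typing import List, Sequence


def decide_band_states(lr_pe: Sequence[float],
                       ms_pe: Sequence[float],
                       switch_penalty: float = 0.0) -> List[str]:
    """Choose L/R or M/S per band by a minimum-cost path through a 2-node trellis.

    lr_pe[b]: first node of band b (PE of left + PE of right).
    ms_pe[b]: second node of band b (PE of middle + PE of side).
    switch_penalty: ASSUMED extra cost when adjacent bands use different states.
    """
    n = len(lr_pe)
    if len(ms_pe) != n:
        raise ValueError("lr_pe and ms_pe must have the same length")
    if n == 0:
        return []
    node = (lr_pe, ms_pe)           # node[state][band]; state 0 = L/R, 1 = M/S
    cost = [lr_pe[0], ms_pe[0]]     # best path cost ending in each state
    back = []                       # back[b - 1][s] = best predecessor of state s at band b
    for b in range(1, n):
        new_cost = [0.0, 0.0]
        choice = [0, 0]
        for s in (0, 1):
            candidates = [cost[p] + (switch_penalty if p != s else 0.0) for p in (0, 1)]
            best_prev = 0 if candidates[0] <= candidates[1] else 1
            choice[s] = best_prev
            new_cost[s] = candidates[best_prev] + node[s][b]
        cost = new_cost
        back.append(choice)
    # Trace the cheapest path backwards from the last band.
    state = 0 if cost[0] <= cost[1] else 1
    path = [state]
    for choice in reversed(back):
        state = choice[state]
        path.append(state)
    path.reverse()
    return ["L/R" if s == 0 else "M/S" for s in path]
```

Each band examines only four transitions, so the work grows linearly with the number of bands rather than as 2^49 for an exhaustive search over 49 bands. With a zero penalty the result reduces to picking the cheaper node in each band; a positive penalty keeps the state stable across neighboring bands.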

By taking the global energy, zero crossings and tone attacks of the audio signal into account, the present invention allows a choice between short and long windows, which can considerably reduce quantization error.

FIG. 5 is a block diagram of an AAC (advanced audio coding) encoder 300 according to an embodiment of the present invention.
The AAC encoder 300 comprises a gain control unit 310, an auditory model 320, a filter bank 330, a window determination module 340 and a bitstream multiplexer 350. The input signal enters the AAC encoder 300 at the gain control unit 310 and the auditory model 320. The auditory model 320 sends information relevant to the window determination method (described later) to the window determination module 340. The window determination module 340 selects a window size and passes the appropriate information to the filter bank 330, which uses the selected window size to encode the input signal and, in cooperation with the output of the gain control unit 310, to generate the encoded audio stream. The AAC encoder 300 further includes a window type switch 360 connected between the window determination module 340 and the filter bank 330, and a quantization module 370 connected between the filter bank 330 and the bitstream multiplexer 350.
The present invention is not limited to the specific embodiment described above; the AAC encoder 300 may also be designed in accordance with the ISO/IEC MPEG-2/4 standards.

The filter bank 330 performs a time-frequency transform on the input signal, switching between transforms with an input length of 2048 samples or 256 samples according to whether a long window or a short window is selected.

The two window sizes of 2048 and 256 samples are merely examples; more than two window sizes, or windows of other sizes, may be used. The 256-sample length is intended for coding transient signals and is a good compromise between frequency selectivity and pre-echo suppression.

As shown in FIG. 1, during transitions between long and short transforms, bridging start and stop transforms (that is, a start window and a stop window) are used to preserve the time-domain aliasing cancellation property of the MDCT (modified discrete cosine transform) and the IMDCT (inverse MDCT), and window alignment is maintained. In general, a 2048-sample long transform is called a long sequence, and the 256-sample short transforms, which occur in groups, are called a short sequence. A short sequence may comprise eight short-window transforms arranged to overlap one another by about 50%, with half of each boundary transform overlapping the start and stop windows.
As shown in FIG. 6, these overlapping sequences group the windows into start, stop, long and short sequences. The lower curve in FIG. 6 shows a start window followed by eight short windows followed by a stop window, and the upper curve shows long-window coding in the absence of a transient signal.

Because short windows offer high time resolution and long windows offer high frequency resolution, transient signals benefit from short windows, which control the pre-echo effect, while non-transient (that is, stationary) signals benefit from long windows, which resolve the spectral lines of the signal so that redundancy can be removed. If a non-transient signal is coded with short windows, the lower frequency resolution reduces the precision of the frequency-domain coded signal. In the first embodiment, the window determination module 340 of the AAC encoder 300 selects the next window size with reference to the global energy ratio, the zero-cross ratio and tone attack.

Global energy ratio: A transient signal usually occurs when the time-domain energy changes abruptly, so an energy ratio is used to detect transients. Conventional energy-ratio detection considers only the energy ratio between two sliding short windows, which is unsuitable for detecting a signal that increases gradually. In general, the pre-echo effect is produced by the portion of the signal with the highest energy.

FIG. 7 shows an example of a speech signal. The three traces in FIG. 7 are, from top to bottom, a gradually increasing transient signal, the conventional energy ratio, and the global energy ratio according to the present invention. The maximum of the conventional energy ratio is about 2.1, so if the transient detection threshold is set to 2.0, misjudgment occurs easily. The global energy ratio method solves this problem by providing a more readily detectable value of the energy ratio.

To determine the energy function E_n(i) of a 256-sample window W_i, the present invention uses the sum of squares of the input signal X_k, as shown in Equation 1.

(Equation 1)  $E_n(i) = \sum_{k \in W_i} X_k^2$

Then the highest energy Max_En and the lowest energy Min_En within the set of short-window energies E_n(i) are found. The global energy ratio is then defined as in Equation 2.

(Equation 2)

Thus, if the global energy ratio Global_En_Ratio is greater than a predetermined energy threshold, the signal is regarded as a transient signal. As the comparison of the lower two graphs in FIG. 7 shows, Equations 1 and 2 provide improved transient detection.
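A minimal sketch of this detector follows. The per-window energy is the sum of squares described for Equation 1. The remaining details are assumptions, since the equation images are not reproduced in this text: the ratio is taken to be Max_En divided by Min_En, the short windows are taken to be consecutive and non-overlapping, and a small `eps` guards against division by zero for silent windows. Function names are hypothetical.

```python
from typing import List, Sequence


def short_window_energies(x: Sequence[float], window: int = 256) -> List[float]:
    """Sum of squared samples in each consecutive, non-overlapping short window."""
    return [sum(v * v for v in x[i:i + window])
            for i in range(0, len(x) - window + 1, window)]


def global_energy_ratio(x: Sequence[float], window: int = 256, eps: float = 1e-12) -> float:
    """ASSUMED form of Equation 2: highest window energy over lowest window energy."""
    energies = short_window_energies(x, window)
    if not energies:
        raise ValueError("signal is shorter than one window")
    return max(energies) / max(min(energies), eps)


def is_transient_by_energy(x: Sequence[float], threshold: float, window: int = 256) -> bool:
    """The block is treated as transient when the ratio exceeds the energy threshold."""
    return global_energy_ratio(x, window) > threshold
```

Because the ratio compares the extremes over the whole block rather than two adjacent windows, a slowly rising signal still yields a large value even when no single step between neighboring windows is large.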

Zero-cross ratio: The global energy ratio alone cannot detect signals containing segments whose spectral content changes rapidly, so the zero-cross rate is used to represent the dominant frequency content of the signal.

As an example, FIG. 8 shows a transient signal whose global energy ratio is stable but whose spectral content changes abruptly. The zero-cross ratio can detect this kind of transient signal when the zero-cross rate Ze(i) of each 256-sample short window is defined as in Equation 3.

(Equation 3)

Then the highest zero-cross rate Max_Ze and the lowest zero-cross rate Min_Ze within the set of short-window zero-cross rates are found. The zero-cross ratio is then defined as in Equation 4.

(Equation 4)

When the zero-cross ratio Ze_Ratio is greater than the zero-cross threshold, the signal is regarded as a transient signal. This method is less complex than conventional methods and can accurately detect signal transients in, for example, violin and speech.
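The zero-cross detector can be sketched in the same style. The exact forms of Equations 3 and 4 are not reproduced in this text, so the sketch assumes that the rate is the number of sign changes between neighboring samples inside each non-overlapping 256-sample window, and that the ratio is Max_Ze divided by Min_Ze with the denominator floored at 1. Function names are hypothetical.

```python
from typing import List, Sequence


def zero_cross_rates(x: Sequence[float], window: int = 256) -> List[int]:
    """ASSUMED Equation 3: count of sign changes inside each short window."""
    rates = []
    for i in range(0, len(x) - window + 1, window):
        seg = x[i:i + window]
        # A crossing is counted when neighboring samples differ in sign
        # (zero is treated as non-negative).
        rates.append(sum(1 for a, b in zip(seg, seg[1:]) if (a >= 0) != (b >= 0)))
    return rates


def zero_cross_ratio(x: Sequence[float], window: int = 256) -> float:
    """ASSUMED Equation 4: highest zero-cross rate over lowest zero-cross rate."""
    rates = zero_cross_rates(x, window)
    if not rates:
        raise ValueError("signal is shorter than one window")
    return max(rates) / max(min(rates), 1)


# Constant-amplitude test signal whose frequency jumps halfway through.
low = [1.0 if (n // 32) % 2 == 0 else -1.0 for n in range(1024)]
high = [1.0 if (n // 4) % 2 == 0 else -1.0 for n in range(1024)]
signal = low + high
```

For `signal`, every window has the same energy, so an energy-based ratio stays at 1, while the zero-cross rate rises sharply in the second half. This mirrors the situation described for FIG. 8.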

Tone attack: In general, a short window has lower frequency resolution than a long window. FIG. 9 shows an example of a pure-tone signal that the global energy ratio of the present invention would likely regard as a transient signal.
FIG. 10 shows the spectra obtained with the 2048-sample transform (top) and the 256-sample transform (bottom). In FIG. 10 it can be seen that transforming a tonal signal with the shorter transform increases the sideband energy. A tone attack is defined as the case in which the signal has tonal bands as analyzed by the long-window psychoacoustic model (described later).

Window determination method: The window determination method takes the global energy ratio, the zero-cross ratio and tone attack described above into account. FIG. 11 is a flowchart in which the global energy ratio and the zero-cross ratio are used to detect transient signals and tone attack analysis is used to avoid false detections. In step 900, it is determined whether either the energy ratio or the zero-cross ratio exceeds its respective threshold. If either ratio exceeds its threshold, a tone attack is tested for in step 910. If neither ratio exceeds its threshold, or if a tone attack is detected, a long window is selected in step 920. If, however, either ratio exceeds its threshold and no tone attack is detected in step 910, a short window is selected in step 930. In the first embodiment, the procedure of the flowchart of FIG. 11 is executed by the window determination module 340 of the AAC encoder 300 shown in FIG. 5.
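The decision logic of steps 900 to 930 can be written as a small function. This is a sketch of the flow as described, with hypothetical names; the threshold values and the tone attack detector itself are left to the caller because this section does not specify them.

```python
def select_window(global_energy_ratio: float,
                  zero_cross_ratio: float,
                  tone_attack_detected: bool,
                  energy_threshold: float,
                  zero_cross_threshold: float) -> str:
    """Return "SHORT" or "LONG" following the flow described for FIG. 11."""
    # Step 900: does either ratio exceed its own threshold?
    transient_suspected = (global_energy_ratio > energy_threshold
                           or zero_cross_ratio > zero_cross_threshold)
    # Step 910: a detected tone attack overrides the transient indication.
    if transient_suspected and not tone_attack_detected:
        return "SHORT"   # step 930
    return "LONG"        # step 920
```

For example, `select_window(3.0, 1.0, False, 2.0, 4.0)` selects the short window, while the same ratios with `tone_attack_detected=True` fall back to the long window.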

The above procedure is repeated until the entire audio signal has been encoded.

FIG. 12 is a block diagram of an AAC encoder 1000 according to another embodiment of the present invention. Like the AAC encoder 300, the AAC encoder 1000 includes an auditory model 320, a filter bank 330, a window determination module 340, and a bitstream multiplexer 350. The AAC encoder 1000 further includes a window type switch 1010, a TNS (temporal noise shaping) unit 1020, a short window scale factor evaluation unit 1030, a grouping unit 1040, and an M/S encoding unit 1050, as well as an iterative loop 1060 that provides gain control.

FIG. 13 is a block diagram of an AAC encoder 1100 according to yet another embodiment of the present invention. Like the AAC encoder 300, the AAC encoder 1100 includes an auditory model 320, a filter bank 330, a window determination module 340, and a bitstream multiplexer 350.

Like the AAC encoder 1000, the AAC encoder 1100 further includes a window type switch 1010, a TNS (temporal noise shaping) unit 1020, a short window scale factor evaluation unit 1030, a grouping unit 1040, and an M/S encoding unit 1050. The AAC encoder 1100 additionally includes a window coupling unit 1105, a group coupling unit 1110, a short window scale factor re-evaluation unit 1120, and an iterative loop 1130 that provides gain control.

Some of the components that represent procedures may be merged in practice; they are described separately here for clarity. For example, the short window scale factor evaluation unit 1030 and the short window scale factor re-evaluation unit 1120 can be the same physical device.

Window type switch 360, 1010: After the window determination module 340 has determined the window type of the next frame, the window type switch 1010 switches the current window type by comparing the next window type with the previous window type.

A start-type window is used to bridge a long window and a short window. The window determination module 340 must therefore determine the window type of the next frame in advance; if the next frame differs from the previous frame, the current frame is switched to the start window type or the stop window type.

FIG. 14 analyzes all possible situations for the window type switch. The long, short, start, and stop windows are denoted L, S, L_S, and S_L, respectively. Ignoring several impossible situations yields the following simple switching algorithm.

if (Current == S) {
    /* A short window stays short. */
    if (Previous == S || Previous == L_S)
        Current = S;
} else {
    if (Previous == L || Previous == S_L) {
        if (Next == L)
            Current = L;
        else
            Current = L_S;    /* start window: bridge long to short */
    } else if (Previous == S) {
        if (Next == L)
            Current = S_L;    /* stop window: bridge short to long */
        else
            Current = S;
    }
}
Previous[] = Current[]; Current[] = Next[];

This algorithm is executed by the window type switch 360 and/or 1010; the current window is changed whenever the adjacent window types require it.
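As an illustration, the switching rule above can be written as a small function and run over a sequence of long/short decisions. This is a sketch under stated assumptions: the names `switch_window` and `schedule` are hypothetical, the stream is assumed to begin after a long window, and the frame after the last one is assumed to be long.

```python
L, S, L_S, S_L = "L", "S", "L_S", "S_L"

def switch_window(previous, current, nxt):
    """Apply the switching rule to one frame and return its final window type."""
    if current == S:
        return S                          # a short window stays short
    if previous in (L, S_L):
        return L if nxt == L else L_S     # start window before a short frame
    if previous == S:
        return S_L if nxt == L else S     # stop window after a short frame
    return current                        # combinations the rule ignores

def schedule(decisions):
    """Turn raw L/S decisions into a window sequence (hypothetical helper)."""
    out, previous = [], L                 # assumption: stream starts after a long window
    for n, current in enumerate(decisions):
        nxt = decisions[n + 1] if n + 1 < len(decisions) else L
        previous = switch_window(previous, current, nxt)
        out.append(previous)
    return out

sequence = schedule([L, L, S, S, L, L])
```

With the decisions long, long, short, short, long, long, the second frame becomes a start window and the fifth a stop window, so every long/short boundary is bridged.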

Psychoacoustic model: The psychoacoustic model determines which sound components are audible to humans and which are not, and thereby controls which components may be discarded. Different window sizes require different interpretations and normalizations of the psychoacoustic model. If the window sequence consists of eight short windows, the AAC encoders 300, 1000, and 1100 would need to run the short-window psychoacoustic model eight times.

The psychoacoustic model calculates the minimum masking threshold needed to determine the noticeable noise level for each band of the filter bank 330.

FIG. 15 shows an example of mapping the 49 bands of the long window onto the 14 bands of the short window at a sample rate of 44.1 kHz. If a frame uses short windows, the SMRs (signal-to-mask ratios) are obtained from the long window.

This refinement is performed by the auditory model 320 or the window determination module 340 of the AAC encoders 300, 1000, and 1100.

Grouping unit 1040 and scale factor evaluation units 1030/1120: If the window sequence consists of eight short windows, the set of 1024 coefficients is actually a matrix of 8 × 128 frequency coefficients that represents the time-frequency evolution of the signal over the duration of the eight short windows. Specifically, before interleaving, the set c of 1024 coefficients is indexed as follows.

c[g][w][b][k]

Here g is the group index, w is the index of the window within the group, b is the index of the scale factor band within the window, and k is the index of the coefficient within the scale factor band. The rightmost index varies most rapidly.

After interleaving, the coefficients are indexed as follows.

c[g][b][w][k]

FIG. 16 shows an example of short window grouping and interleaving. In FIG. 16, group 0 contains the short windows indexed 0, 1, and 2. After interleaving, the first bands of these three short windows form one larger scale factor band (sfb 0). The grouping method provides flexibility in the number of scale factor bands to suit different coding considerations.
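The reordering from c[g][w][b][k] to c[g][b][w][k] can be illustrated for a single group with nested lists. This is a minimal sketch: the helper names and the toy band sizes (two coefficients in band 0, one in band 1) are assumptions chosen only to make the ordering visible.

```python
def interleave_group(windows):
    """Reorder one group from [w][b][k] to [b][w][k]."""
    num_bands = len(windows[0])
    return [[window[b] for window in windows] for b in range(num_bands)]

def flatten(group):
    """Serialize a doubly nested group with the rightmost index varying fastest."""
    return [coef for outer in group for inner in outer for coef in inner]

# Toy group of three short windows, labeled so the order is easy to read.
band_sizes = [2, 1]
windows = [[[f"w{w}b{b}k{k}" for k in range(n)]
            for b, n in enumerate(band_sizes)]
           for w in range(3)]

before = flatten(windows)                   # window-major order
after = flatten(interleave_group(windows))  # band-major order
```

After interleaving, all coefficients of band 0 from windows 0, 1, and 2 are adjacent, which is what lets them form one larger scale factor band as in FIG. 16.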

Short windows handle transient signals well because they confine the spread of quantization noise to within a short window. However, when the AAC encoders 1000 and 1100 use short windows, the total number of scale factor bands is roughly twice that of a single long window (for example, 8 × 14 = 112 versus 49 at 44.1 kHz).

In the present invention, the grouping method performed by the grouping unit 1040 uses the estimated scale factors of the eight short windows determined by the scale factor evaluation unit 1030 or 1120. Because the scale factors are estimated by the short window scale factor evaluation unit 1030 relatively early in the AAC encoder 1000, the grouping result can be applied more flexibly in other codec modules (for example, the M/S encoding unit 1050).

To estimate the scale factor, the expected value of the quantization error e_i of the non-uniform quantizer is given by Equation 5.

(Equation 5)

Δ_q is the quantization step size, defined by Equation 6.

(Equation 6)

g is the global gain, which is independent of the scale factor band q, and c_q is the scale factor of each scale factor band.

Scale factor estimation for bit allocation is based on a bandwidth-proportional noise-shaping criterion: the noise level of a scale factor band is proportional to its effective bandwidth B(q).

(Equation 7)

σ²_N(q) and σ²_M(q) are the noise energy and the masking energy associated with scale factor band q.

To relate the scale factor to the noise power, Equation 5 and Equation 6 are combined. Let E[e_i²] = σ²_N(q) and define T²_q = σ²_M(q)·B(q). The expected quantization error for bit allocation is then given by Equation 8.

(Equation 8)

The squared quantization step size Δ²_q is given by Equation 9.

(Equation 9)

The difference between the global gain g and the scale factor is evaluated by Equation 10.

(Equation 10)

From Equation 10, the global gain g is evaluated as in Equation 11.

(Equation 11)

The scale factors for all subbands are then obtained.

Regarding the grouping method, the short windows of a group share one scale factor per scale factor band, so the difference between the shared scale factor (sharesfb_g,b) and the estimated scale factor (sf_b,w) of each short window in the group must be bounded. Besides the scale factor difference itself, the effect of this difference is proportional to the bandwidth (bandwidth_b). The scale factor error of group g is therefore estimated by Equation 12.

(Equation 12)

The grouping criterion is to minimize the number of groups while keeping the scale factor error E_g of every group below a threshold M. The algorithm shown in the flowchart of FIG. 17 follows from this criterion. Scale factor estimation is performed first, and grouping then starts at the first short window. Because the short windows of a group are consecutive, the algorithm tries to place each short window in the group of the preceding short window. If the scale factor error of the resulting group is below the threshold M, the short window joins that group; otherwise, a new group is created for it.
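The greedy pass of FIG. 17 can be sketched as below. Equation 12 is not reproduced in this text, so `group_error` is only an assumed stand-in: it takes the smallest estimated scale factor in the group as the shared value and weights each absolute difference by the band's width. The function names and the choice of the minimum are hypothetical; only the greedy structure (extend the previous group if the error stays below M, otherwise start a new group) comes from the description.

```python
def group_error(sf_windows, bandwidth):
    """Assumed stand-in for Equation 12: bandwidth-weighted scale factor mismatch."""
    error = 0.0
    for b, width in enumerate(bandwidth):
        shared = min(sf[b] for sf in sf_windows)  # assumption: share the smallest
        error += width * sum(abs(sf[b] - shared) for sf in sf_windows)
    return error

def group_short_windows(sf, bandwidth, threshold):
    """Greedy grouping: sf[w][b] is the estimated scale factor of window w, band b."""
    groups = [[0]]                                # start with the first short window
    for w in range(1, len(sf)):
        candidate = groups[-1] + [w]              # try the previous window's group
        if group_error([sf[i] for i in candidate], bandwidth) < threshold:
            groups[-1] = candidate
        else:
            groups.append([w])                    # otherwise open a new group
    return groups

example = group_short_windows(
    sf=[[10, 10], [10, 11], [20, 20], [20, 20]], bandwidth=[4, 4], threshold=8)
```

In the example, windows 0 and 1 differ by one step in one band (error 4) and are grouped, while window 2 would push the error far above the threshold and therefore starts a new group.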

TNS unit 1020: TNS (temporal noise shaping) is a technique for avoiding pre-echo, and it is applied in the TNS unit 1020 of the present invention. FIG. 18 shows the window type switching arrangement when TNS is applied in an attempt to mitigate aliasing. FIG. 19 shows the window type switch table modified for the window type switch 1010, which corresponds to the following algorithm.

if (Current == S) {
    /* A short window stays short. */
    if (Previous == S || Previous == L_S)
        Current = S;
} else {
    if (Previous == L || Previous == S_L) {
        if (Next == L)
            Current = L;
        else
            Current = L_S;    /* start window: bridge long to short */
    } else if (Previous == S || Previous == L_S) {
        /* Previous == L_S is the new case introduced when TNS is applied. */
        if (Next == L)
            Current = S_L;    /* stop window: bridge back to long */
        else
            Current = S;
    }
}
Previous[] = Current[]; Current[] = Next[];

As shown in FIG. 19, when the current window type is long and TNS is applied, the window is switched to the start window type. At the next time (n + 1), a new situation has to be considered: the previous window type is start, the current window type is long, and the next window type is also long.

M/S encoding unit 1050 and window coupling unit 1105: In stereo coding, the M/S mechanism is applicable when the window type and the grouping of the two stereo channels are the same.

As defined in the MPEG standard, perceptual entropy (PE) can help judge this similarity, as shown in Equation 13.

(Equation 13)

b is the index of the threshold calculation partition, E_b is the total energy in partition b, BW_b is the number of frequency lines in partition b, and Masking_b is the masking threshold of partition b.

To perform pre-echo control, the term Masking_b is modified as in Equation 14.

(Equation 14)

qthr_b is the threshold in quiet, nb_b and nb_l_b are the partition thresholds of the current and previous blocks, and repelev is a constant.

When the signal bursts to high energy, the threshold rises from nb_l_b to nb_b as a result of the increase in signal energy. Masking_b then stays small and the PE value becomes large. When the frame PE exceeds a predetermined threshold PE_SWITCH, the encoder changes the window type to short in order to increase the time resolution and reduce the pre-echo effect.
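The behavior described above can be checked numerically. Equations 13 and 14 are not reproduced in this text, so the sketch below assumes the form used by the ISO/IEC psychoacoustic model, which is consistent with the symbols defined here: Masking_b = max(qthr_b, min(nb_b, nb_l_b · repelev)) and PE = −Σ BW_b · log10(Masking_b / (E_b + 1)). The function names and the value repelev = 2 are assumptions.

```python
import math

def controlled_masking(nb, nb_prev, qthr, repelev):
    """Assumed form of Equation 14: limit how fast the threshold may rise."""
    return max(qthr, min(nb, nb_prev * repelev))

def perceptual_entropy(energy, masking, bandwidth):
    """Assumed form of Equation 13, summed over threshold calculation partitions."""
    return -sum(bw * math.log10(m / (e + 1.0))
                for e, m, bw in zip(energy, masking, bandwidth))

# One partition with 4 lines and energy 999; raw threshold nb = 100 in both cases.
steady = controlled_masking(nb=100.0, nb_prev=100.0, qthr=1.0, repelev=2.0)
attack = controlled_masking(nb=100.0, nb_prev=10.0, qthr=1.0, repelev=2.0)
pe_steady = perceptual_entropy([999.0], [steady], [4])
pe_attack = perceptual_entropy([999.0], [attack], [4])
```

With a quiet previous block (nb_l_b = 10) the controlled threshold is held at 20 instead of 100, so the PE of the attack frame is higher than that of the steady frame. A rise of this kind is what pushes the frame PE above PE_SWITCH.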

FIG. 20 is a flowchart of window coupling. The difference between the left-channel PE and the right-channel PE is compared with a threshold T1 to judge their similarity. A second PE threshold, T2, is used to decide the window type. This procedure is generally performed by the M/S encoding unit 1050 and the window coupling unit 1105.

Group coupling unit 1110: In the group coupling unit 1110, the total scale factor error of a group is calculated over both channels at once. In the left part of FIG. 21, the grouping method is applied to the two channels separately. The purpose of group coupling is to keep the same grouping configuration in both channels, as shown in the right part of FIG. 21.

The grouping of the present invention minimizes the number of groups while keeping the total scale factor error E_g of each group over both channels below a new threshold, 2M.

FIG. 22 is a flowchart of window coupling and group coupling that also shows their relationship to M/S coding. When M/S is on, the energies of the two channels are modified and the scale factor associated with each scale factor band is re-estimated. When M/S is not used, grouping is applied to the two stereo channels separately.

The features of the elements shown in the devices of the embodiments of FIGS. 5, 12, and 13 are presented only for clarity of description.

Furthermore, the present invention relies on the perceptual entropy (PE) calculated by the psychoacoustic model, which reflects the minimum number of bits required for transparent quality, evaluated for the left, right, middle, and side bands. The PE value is the simplest way to estimate the bits needed for the left, right, middle, and side signals of a band. The psychoacoustic model then compares the PE values of the L/R and M/S bands, calculates the minimum cost path value across adjacent bands, and sets each band to the L/R state or the M/S state.

PE is defined by Equation 15.

(Equation 15)

W_i, E_i, and T_i are the bandwidth, energy, and masking threshold of the i-th band.

To derive the masking thresholds of the M/S channels, consider the reconstructed left and right channels, as in Equations 16 and 17.

(Equation 16)

(Equation 17)

Equations 18 and 19 follow from Equations 16 and 17.

(Equation 18)

(Equation 19)

L'_i[k], R'_i[k], M'_i[k], and S'_i[k] are the requantized frequency lines from the decoder. Taking the quantization error into account, the reconstructed signals are rewritten as Equations 20 and 21.

(Equation 20)

(Equation 21)

N_Li[k], N_Ri[k], N_Mi[k], and N_Si[k] are the noise terms associated with each channel. For transparent audio coding, N_Li[k] and N_Ri[k] must stay below the masking thresholds of the L-band and R-band signals. Over a partition band, this requirement is expressed by Equations 22 and 23.

(Equation 22)

(Equation 23)

Equations 24, 25, and 26 are sufficient conditions for the inequalities in Equations 22 and 23.

(Equation 24)

(Equation 25)

(Equation 26)

Therefore, the threshold in Equation 27 is used in place of the threshold derived directly from the M/S signals.

(Equation 27)

For convenience, PE is often computed from the FFT results of the psychoacoustic model. However, the signal that is actually coded comes from the MDCT (modified discrete cosine transform) analysis filter bank. The masking thresholds and energies must therefore be rescaled from the FFT domain to the MDCT domain. The modified masking thresholds are given by Equations 28, 29, and 30.

(Equation 28)

(Equation 29)

(Equation 30)

Using Equation 15, the PE of each band in each state is obtained as in Equations 31, 32, 33, and 34.

(Equation 31)

(Equation 32)

(Equation 33)

(Equation 34)

Because the band PEs of L, R, M, and S are all available, the better alternative is selected by comparing them.

The psychoacoustic model uses a modified Viterbi algorithm to calculate the minimum cost path value across adjacent bands and to set each band to the L/R state or the M/S state. FIG. 23 is a block diagram of the modified Viterbi algorithm for minimizing the M/S coding cost. A trellis is constructed to minimize the cost S_k(i) of ending the k-th band in state i, where state 0 denotes L/R and state 1 denotes M/S. Each edge represents a transition cost factor for changing the coding state, and each node carries its band PE for comparison. The modified Viterbi algorithm searches for the minimum cost path from the first scale factor band to the last.

Let S_k(i) record the minimum accumulated cost of reaching state i from the first band through the k-th band, and let n_k(i) denote the node cost of the i-th state in the k-th band. The main Viterbi process is then carried out as in Equation 35.

(Equation 35)

Q denotes the set of all states, and α_i,j denotes the transition cost factor. The minimum cost path is found by tracing the recorded path backward. In other words, the modified Viterbi algorithm finds the optimal per-band mode assignment.
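One form of the recurrence that is consistent with the symbols defined above is the standard Viterbi update. It is written here as an assumption, not as the published Equation 35, and the index order of α (from state j in band k−1 to state i in band k) is likewise assumed:

```latex
S_1(i) = n_1(i), \qquad
S_k(i) = n_k(i) + \min_{j \in Q} \bigl[ S_{k-1}(j) + \alpha_{j,i} \bigr], \quad k \ge 2
```

The overall minimum cost is then the smallest S_K(i) over i ∈ Q at the last band K, and the band states are recovered by following the minimizing j backward from that state.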

To analyze the time complexity, observe that every node other than those of the first band performs only one comparison at each stage.

FIG. 24 is a block diagram of a worked example of the modified Viterbi algorithm of the present invention. It has a first band 40, a second band 45, and a third band 50, each with a first node and a second node. The node costs are set as follows: first node 401 of the first band 40, 10; second node 402 of the first band 40, 20; first node 451 of the second band 45, 30; second node 452 of the second band 45, 40; first node 501 of the third band 50, 50; second node 502 of the third band 50, 60.

The transition costs are set as follows: node 401 to node 451, 1; node 401 to node 452, 2; node 402 to node 451, 3; node 402 to node 452, 4; node 451 to node 501, 5; node 451 to node 502, 6. There are four cost path values between the first band 40 and the second band 45, and two cost path values between the second band 45 and the third band 50.

Each cost path value is the sum of the starting node cost, the transition cost, and the ending node cost. The first cost path value (node 401 to node 451) is 10 + 1 + 30 = 41. The second cost path value (node 401 to node 452) is 10 + 2 + 40 = 52. The third cost path value (node 402 to node 451) is 20 + 3 + 30 = 53. The fourth cost path value (node 402 to node 452) is 20 + 4 + 40 = 64.

The four cost path values are compared to obtain the minimum cost path. The minimum cost path value is 41, so the first node 451 of the second band 45 holds an accumulated value of 41. The cost path values to the nodes of the third band 50 are then calculated from the first node 451 of the second band 45 rather than from its second node 452.

Starting from the accumulated value 41 at node 451, the first cost path value (to the first node 501 of the third band 50) is 41 + 5 + 50 = 96, and the second cost path value (to the second node 502 of the third band 50) is 41 + 6 + 60 = 107. The two cost path values are compared to obtain the minimum cost path. The minimum cost path value is 96, and the first node 501 of the third band 50 holds this accumulated value. The minimum cost path from the first band 40 to the third band 50 has thus been found.
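The FIG. 24 numbers can be run through a conventional two-state Viterbi search as a cross-check. This is a sketch, not the patented procedure: a full trellis also keeps the best path into node 452 and needs the transition costs from node 452 to the third band, which the text does not give. The values 7 and 8 used for them below are placeholders chosen for illustration, and the function name is hypothetical.

```python
def min_cost_path(node_cost, trans_cost):
    """node_cost[k][i]: cost of state i in band k.
    trans_cost[k][j][i]: cost of moving from state j in band k to state i in band k+1."""
    num_states = len(node_cost[0])
    acc = list(node_cost[0])          # accumulated cost per state in the first band
    back = []                         # best predecessor per state, per later band
    for k in range(1, len(node_cost)):
        new_acc, choice = [], []
        for i in range(num_states):
            best_j = min(range(num_states),
                         key=lambda j: acc[j] + trans_cost[k - 1][j][i])
            new_acc.append(acc[best_j] + trans_cost[k - 1][best_j][i]
                           + node_cost[k][i])
            choice.append(best_j)
        acc = new_acc
        back.append(choice)
    state = min(range(num_states), key=lambda i: acc[i])
    total, path = acc[state], [state]
    for choice in reversed(back):     # trace the recorded path backward
        state = choice[state]
        path.append(state)
    path.reverse()
    return total, path

node_cost = [[10, 20], [30, 40], [50, 60]]   # nodes 401/402, 451/452, 501/502
trans_cost = [[[1, 2], [3, 4]],              # band 40 to band 45 (from the text)
              [[5, 6], [7, 8]]]              # 5, 6 from the text; 7, 8 are assumed
total, path = min_cost_path(node_cost, trans_cost)
```

Under these assumptions the search returns a total of 96 along the first node of every band, matching the walk-through. Each band needs one minimum per state, so the work grows linearly with the number of bands instead of doubling with each one.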

FIG. 25 is a flowchart of the method of determining the band state for M/S coding according to the present invention.

Step 21: The psychoacoustic model receives a plurality of bands, each containing a left signal, and converts the left signal into a left FFT signal (L_FFT) by fast Fourier transform (FFT).

Step 22: The psychoacoustic model receives a plurality of bands, each containing a right signal, and converts the right signal into a right FFT signal (R_FFT) by FFT.

Step 23: The MDCT (modified discrete cosine transform) of the analysis filter bank converts the left signal into a left MDCT signal (L_MDCT).

Step 24: The MDCT of the analysis filter bank converts the right signal into a right MDCT signal (R_MDCT).

Step 25: The middle signal and the side signal are calculated from the left signal and the right signal of the same band.

Step 26: L_FFT is received to calculate the masking threshold of the left FFT signal (T_LFFT).

Step 27: R_FFT is received to calculate the masking threshold of the right FFT signal (T_RFFT).

Step 28: T_LFFT, T_RFFT, L_FFT, R_FFT, L_MDCT, and R_MDCT are received to calculate the masking thresholds of the left signal and the right signal (T_L, T_R), respectively.

Step 29: T_L and T_R are received to calculate the masking thresholds of the middle signal and the side signal (T_M, T_S), respectively.

Step 30: T_LFFT and L_FFT are received to calculate the PE value of the left signal (PE_L).

Step 31: T_RFFT and R_FFT are received to calculate the PE value of the right signal (PE_R).

Step 32: The first node is calculated as the sum of PE_L and PE_R.

Step 33: T_M and the middle signal are received to calculate the PE value of the middle signal (PE_M).

Step 34: T_S and the side signal are received to calculate the PE value of the side signal (PE_S).

Step 35: The second node is calculated as the sum of PE_M and PE_S.

Step 36: The minimum cost path across adjacent bands is calculated by the modified Viterbi algorithm.

Step 37: The state of each band is determined from the minimum cost path value. The state is either the L/R state or the M/S state.
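Steps 25, 32, and 35 above can be sketched in a few lines. The sketch assumes the usual AAC convention M = (L + R) / 2 and S = (L − R) / 2, since Equations 16 to 19 are not reproduced in this text, and it takes the four PE values as already computed. The function names are hypothetical.

```python
def mid_side(left, right):
    """Step 25 (assumed convention): per-line middle and side signals of one band."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def band_node_costs(pe_l, pe_r, pe_m, pe_s):
    """Steps 32 and 35: node costs of the L/R state and the M/S state of one band."""
    first_node = pe_l + pe_r     # L/R state
    second_node = pe_m + pe_s    # M/S state
    return first_node, second_node

mid, side = mid_side([1.0, 0.5], [1.0, -0.5])
nodes = band_node_costs(pe_l=3.0, pe_r=4.0, pe_m=2.5, pe_s=1.0)
```

The two node costs of each band, together with the transition costs between neighboring bands, form the trellis that step 36 searches.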

心理音響モデルによって帯域状態がＭ／Ｓ状態に決定されたとき、Ｍ／Ｓ変換モデルはＮ^th帯域のＬ／Ｒ信号を受信し、Ｍ／Ｓ信号に変換し、量子化／符号化モデルによってＮ^th帯域のＭ／Ｓ信号の量子化および符号化を行い、そうでなければ量子化／符号化モデルが量子化および符号化を行うためにＮ^th帯域のＬ／Ｒ信号を受信する。 When the band state is determined to be the M / S state by the psychoacoustic model, the M / S conversion model receives the L / R signal of the N ^th band, converts it to the M / S signal, and uses the quantization / coding model The N ^th band M / S signal is quantized and encoded, otherwise the quantization / coding model receives the N ^th band L / R signal for quantization and encoding.

The present invention provides a computationally efficient method for determining the band states using the bands, the PE values, and the modified Viterbi algorithm. For AAC, the modified Viterbi algorithm reduces the complexity from the order of O(2^49) to the order of O(49 × 2). In addition, the M/S masking thresholds are modified so that they are derived from the L/R psychoacoustic model to obtain the M/S coding thresholds, which makes the placement of the M/S signals reasonable.
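The complexity figures can be related to a standard minimum cost path recurrence. The notation below is an assumption introduced for illustration and does not appear in the original text: B is the number of bands, n_b(s) is the node value of band b in state s (PE_L + PE_R for the L/R state, PE_M + PE_S for the M/S state), and t(s', s) is the cost of moving from state s' to state s between adjacent bands.

```latex
\begin{aligned}
C_1(s) &= n_1(s), \\
C_b(s) &= n_b(s) + \min_{s' \in \{\mathrm{LR},\,\mathrm{MS}\}}
          \bigl[\, C_{b-1}(s') + t(s', s) \,\bigr], \qquad b = 2, \dots, B, \\
C^{*}  &= \min_{s \in \{\mathrm{LR},\,\mathrm{MS}\}} C_B(s).
\end{aligned}
```

An exhaustive search would compare all 2^B state sequences, whereas the recurrence updates only two accumulated values per band, that is, 2B node updates. With B = 49 this is roughly 5.6 × 10^14 sequences against 98 node updates.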

It will be readily apparent that many modifications and variations of these devices and methods may be made in light of the description of the present invention. Accordingly, the above description should be construed as limited only by the following claims.

A diagram showing a signal containing an encoded transient sound.
A diagram showing the effects of different types of masking.
A block diagram showing perceptual coding with M/S conversion in the prior art.
A flowchart showing the band determination method for M/S encoding in FAAC in the prior art.
A block diagram showing an AAC encoder according to the present invention.
A diagram showing long window encoding and a start-short-stop window sequence.
A diagram showing a gradually increasing transient signal, the conventional energy ratio, and the global energy ratio according to the present invention.
A diagram showing a transient signal with a stable global energy ratio and an abrupt change in spectral content.
A diagram showing an example of a pure speech signal.
A diagram showing the frequencies obtained by a 2048-sample transform (top) and a 256-sample transform (bottom).
A flowchart showing the window determination method according to the present invention.
A block diagram showing a second AAC encoder of the present invention.
A block diagram showing a third AAC encoder of the present invention.
A diagram showing the window type switch table.
A diagram showing the long-short window psychoacoustic mapping result.
A diagram showing an example of short window grouping and interleaving.
A flowchart showing the short window grouping method of the present invention.
A diagram showing the window type switch configuration when TNS is applied.
A diagram showing the modified window type switch table when TNS is applied.
A flowchart showing the window coupling method.
A diagram showing an example of channel grouping.
A flowchart showing the window coupling and group coupling method.
A block diagram showing the modified Viterbi algorithm for minimizing the M/S encoding cost.
A block diagram showing an example of use of the modified Viterbi algorithm of the present invention.
A flowchart showing the method for determining the band state of M/S encoding of the present invention.

Claims

1. An audio signal encoding method comprising the steps of:
receiving a block of an audio signal;
determining a global energy ratio of a first range of the audio signal and comparing the global energy ratio with a first threshold;
determining a zero-cross ratio of a second range of the audio signal and comparing the zero-cross ratio with a second threshold;
selecting a short encoding window when either the global energy ratio or the zero-cross ratio exceeds the first or second threshold, respectively, and no tone attack is detected in a third range of the audio signal;
selecting a long encoding window when neither the global energy ratio nor the zero-cross ratio exceeds the first and second thresholds, or when a tone attack is detected in the third range of the audio signal; and
encoding, with the selected encoding window, a fourth range of the audio signal that is substantially common to the first, second, and third ranges.
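The window selection rule recited above can be sketched in Python as follows. The sketch is illustrative only: the helper `max_min_ratio`, the function `select_window`, the division of each range into sub-blocks, the guard value `eps`, and the threshold values used in any call are assumptions and are not taken from the claims.

```python
from typing import Sequence


def max_min_ratio(values: Sequence[float], eps: float = 1e-12) -> float:
    """Ratio of the largest to the smallest value; eps (an assumption) avoids division by zero."""
    return max(values) / max(min(values), eps)


def select_window(
    sub_energies: Sequence[float],          # energies of sub-blocks of the first range
    sub_zero_cross_rates: Sequence[float],  # zero-cross rates of sub-ranges of the second range
    tone_attack: bool,                      # True if a tone attack was detected in the third range
    energy_threshold: float,
    zero_cross_threshold: float,
) -> str:
    """Return "short" or "long" according to the selection rule."""
    global_energy_ratio = max_min_ratio(sub_energies)
    zero_cross_ratio = max_min_ratio(sub_zero_cross_rates)
    exceeded = (
        global_energy_ratio > energy_threshold
        or zero_cross_ratio > zero_cross_threshold
    )
    # A detected tone attack forces the long window even when a threshold is exceeded.
    return "short" if exceeded and not tone_attack else "long"
```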

2. The audio signal encoding method according to claim 1, wherein the global energy ratio is the ratio of the maximum energy in the first range to the minimum energy in the first range.

3. The audio signal encoding method according to claim 1, wherein the zero-cross ratio is the ratio of the zero-cross rate of a first sub-range of the second range to the zero-cross rate of a second sub-range of the second range, the zero-cross rate of the first sub-range being the maximum value in the second range and the zero-cross rate of the second sub-range being the minimum value in the second range.

4. The audio signal encoding method according to claim 1, wherein the tone attack has a tonality higher than a tone threshold.

5. The audio signal encoding method according to claim 1, wherein the global energy ratio is the ratio of the maximum energy of the first range to the minimum energy of the first range; the zero-cross ratio is the ratio of the zero-cross rate of a first sub-range of the second range to the zero-cross rate of a second sub-range of the second range, the zero-cross rate of the first sub-range being the maximum value in the second range and the zero-cross rate of the second sub-range being the minimum value in the second range; and the tone attack has a tonality higher than a tone threshold.

6. The audio signal encoding method according to claim 1, wherein the selected window is the next window and the two previously selected windows are the current window and the previous window, the method further comprising the steps of:
changing the current window to a long-to-short transition window when the previous window is a long window, the current window is a long window, and the next window is a short window;
changing the current window to a short-to-long transition window when the previous window is a short window, the current window is a long window, and the next window is a long window;
changing the current window to a short window when the previous window is a short window, the current window is a long window, and the next window is a short window; and
changing the current window to a long-to-short transition window when the previous window is a short-to-long transition window, the current window is a long window, and the next window is a short window.
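The four window adjustment rules recited above can be sketched as a lookup table. The string labels, the function name `adjust_current_window`, and the choice to leave the current window unchanged for combinations that are not recited are assumptions made for illustration.

```python
LONG = "long"
SHORT = "short"
LONG_TO_SHORT = "long_to_short"  # long-to-short transition window
SHORT_TO_LONG = "short_to_long"  # short-to-long transition window

# (previous window, next window) -> replacement for a current long window
_RULES = {
    (LONG, SHORT): LONG_TO_SHORT,
    (SHORT, LONG): SHORT_TO_LONG,
    (SHORT, SHORT): SHORT,
    (SHORT_TO_LONG, SHORT): LONG_TO_SHORT,
}


def adjust_current_window(previous: str, current: str, following: str) -> str:
    """Return the window type to use for the current block."""
    if current != LONG:
        # The recited rules only modify a current long window.
        return current
    # Assumption: combinations not recited leave the current window unchanged.
    return _RULES.get((previous, following), current)
```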

7. The audio signal encoding method according to claim 1, further comprising the step of defining the psychoacoustic model of the selected short window as the psychoacoustic model of the corresponding range of a virtual long window.

8. The audio signal encoding method according to claim 1, further comprising the steps of:
estimating a scale factor for each short window; and
grouping short windows whose scale factors are similar to within a predetermined error.

9. The audio signal encoding method according to claim 8, further comprising the steps of:
performing M/S encoding on the audio signal; and
re-evaluating the scale factors for the short windows.

10. The audio signal encoding method according to claim 1, wherein the selected window is the next window and the two previously selected windows are the current window and the previous window, the method further comprising the steps of:
applying TNS to the fourth range of the audio signal;
changing the current window to a long-to-short transition window when the previous window is a long window, the current window is a long window, and the next window is a short window;
changing the current window to a short-to-long transition window when the previous window is a short window, the current window is a long window, and the next window is a long window;
changing the current window to a short window when the previous window is a short window, the current window is a long window, and the next window is a short window;
changing the current window to a short-to-long transition window when the previous window is a long-to-short transition window, the current window is a long window, and the next window is a long window;
changing the current window to a short window when the previous window is a long-to-short transition window, the current window is a long window, and the next window is a short window; and
changing the current window to a long-to-short transition window when the previous window is a short-to-long transition window, the current window is a long window, and the next window is a short window.

11. The audio signal encoding method according to claim 1, wherein the audio signal is a two-channel stereo signal, the method further comprising the steps of:
selecting a long or short encoding window for each channel;
detecting a difference between the PE values of the two channels when the encoding window sizes of the channels of the audio signal do not match; and
when a difference in PE is detected, using a short encoding window for both channels if the PE values of both channels are above a hearing threshold, and using a long encoding window for both channels if both PE values are below the hearing threshold.

12. An AAC encoder comprising a gain control unit, an auditory model, a filter bank, a bitstream multiplexer, and a window determination module programmed to perform the method of claim 1.

13. An audio signal encoding method comprising the steps of:
receiving a block of an audio signal;
determining a global energy ratio of a first range of the audio signal and comparing the global energy ratio with a first threshold, the global energy ratio being the ratio of the maximum energy of the first range to the minimum energy of the first range;
determining a zero-cross ratio of a second range of the audio signal and comparing the zero-cross ratio with a second threshold, the zero-cross ratio being the ratio of the zero-cross rate of a first sub-range of the second range to the zero-cross rate of a second sub-range of the second range, the zero-cross rate of the first sub-range being the maximum value in the second range and the zero-cross rate of the second sub-range being the minimum value in the second range;
selecting a short encoding window when either the global energy ratio or the zero-cross ratio exceeds the first or second threshold, respectively, and no tone attack is detected in a third range of the audio signal, the tone attack having a tonality higher than a tone threshold;
selecting a long encoding window when neither the global energy ratio nor the zero-cross ratio exceeds the first and second thresholds, or when a tone attack is detected in the third range of the audio signal; and
encoding, with the selected encoding window, a fourth range of the audio signal that is substantially common to the first, second, and third ranges.

14. The audio signal encoding method according to claim 13, wherein the selected window is the next window and the two previously selected windows are the current window and the previous window, the method further comprising the steps of:
changing the current window to a long-to-short transition window when the previous window is a long window, the current window is a long window, and the next window is a short window;
changing the current window to a short-to-long transition window when the previous window is a short window, the current window is a long window, and the next window is a long window;
changing the current window to a short window when the previous window is a short window, the current window is a long window, and the next window is a short window; and
changing the current window to a long-to-short transition window when the previous window is a short-to-long transition window, the current window is a long window, and the next window is a short window.

15. The audio signal encoding method according to claim 13, further comprising the step of defining the psychoacoustic model of the selected short window as the psychoacoustic model of the corresponding range of a virtual long window.

16. The audio signal encoding method according to claim 13, further comprising the steps of:
estimating a scale factor for each short window; and
grouping short windows whose scale factors are similar to within a predetermined error.

17. The audio signal encoding method according to claim 16, further comprising the steps of:
performing M/S encoding on the audio signal; and
re-evaluating the scale factors for the short windows.

18. The audio signal encoding method according to claim 13, wherein the selected window is the next window and the two previously selected windows are the current window and the previous window, the method further comprising the steps of:
applying TNS to the fourth range of the audio signal;
changing the current window to a long-to-short transition window when the previous window is a long window, the current window is a long window, and the next window is a short window;
changing the current window to a short-to-long transition window when the previous window is a short window, the current window is a long window, and the next window is a long window;
changing the current window to a short window when the previous window is a short window, the current window is a long window, and the next window is a short window;
changing the current window to a short-to-long transition window when the previous window is a long-to-short transition window, the current window is a long window, and the next window is a long window;
changing the current window to a short window when the previous window is a long-to-short transition window, the current window is a long window, and the next window is a short window; and
changing the current window to a long-to-short transition window when the previous window is a short-to-long transition window, the current window is a long window, and the next window is a short window.

19. The audio signal encoding method according to claim 13, wherein the audio signal is a two-channel stereo signal, the method further comprising the steps of:
selecting a long or short encoding window for each channel;
detecting a difference between the PE values of the two channels when the encoding window sizes of the channels of the audio signal do not match; and
when a difference in PE is detected, using a short encoding window for both channels if the PE values of both channels are above a hearing threshold, and using a long encoding window for both channels if both PE values are below the hearing threshold.

20. An AAC encoder comprising a gain control unit, an auditory model, a filter bank, a bitstream multiplexer, and a window determination module programmed to perform the method of claim 13.

21. A method for determining a band state of M/S encoding for AAC, comprising the steps of:
receiving at least one audio stream having a plurality of bands, each band having a left signal and a right signal;
calculating a middle signal and a side signal from the left signal and the right signal of the same band;
calculating, for each band, a first node that is the sum of the PE values of the left signal and the right signal and a second node that is the sum of the PE values of the middle signal and the side signal;
calculating a minimum cost path value for each pair of adjacent bands, each path running from the first node of the Nth band to the first or second node of the (N+1)th band, or from the second node of the Nth band to the first or second node of the (N+1)th band; and
determining a state of each band based on the minimum cost path value, the state being an L/R state or an M/S state.

22. The method for determining a band state of M/S encoding for AAC according to claim 21, wherein the step of calculating the minimum cost path value comprises:
calculating a plurality of cost path values, each cost path value being from a node of a first band to a node of a second band; and
obtaining the minimum cost path value by comparing the cost path values.

23. The method for determining a band state of M/S encoding for AAC, wherein the audio stream includes four cost path values between a first band and a second band and two cost path values between the remaining adjacent bands of the audio stream.

24. The method for determining a band state of M/S encoding for AAC according to claim 23, wherein the step of calculating the minimum cost path value between the first band and the second band comprises:
calculating each cost path value as the sum of a node of the first band, a transient cost, and a node of the second band; and
obtaining the minimum cost path value by comparing the cost path values.

25. The method for determining a band state of M/S encoding for AAC according to claim 23, wherein the step of calculating the minimum cost path value between the Nth band and the (N+1)th band of the remaining adjacent bands comprises:
calculating each cost path value as the sum of an accumulated value, a transient cost, and a node of the (N+1)th band; and
obtaining the minimum cost path value by comparing the cost path values.

26. The method for determining a band state of M/S encoding for AAC according to claim 25, wherein the accumulated value belongs to the node of the Nth band that has the minimum cost path between the (N-1)th band and the Nth band.

27. The method for determining a band state of M/S encoding for AAC according to claim 21, wherein the step of calculating the minimum cost path value comprises calculating the minimum cost path value of each pair of adjacent bands of the audio stream by a modified Viterbi algorithm.

28. The method for determining a band state of M/S encoding for AAC according to claim 27, wherein the step of calculating the minimum cost path value comprises:
calculating a plurality of cost path values, each cost path value being from a node of a first band to a node of a second band; and
obtaining the minimum cost path value by comparing the cost path values.

29. The method for determining a band state of M/S encoding for AAC, wherein the audio stream includes four cost path values between a first band and a second band and two cost path values between the remaining adjacent bands of the audio stream.

30. The method for determining a band state of M/S encoding for AAC according to claim 29, wherein the step of calculating the minimum cost path value between the first band and the second band comprises:
calculating each cost path value as the sum of a node of the first band, a transient cost, and a node of the second band; and
obtaining the minimum cost path value by comparing the cost path values.

31. The method for determining a band state of M/S encoding for AAC according to claim 29, wherein the step of calculating the minimum cost path value between the Nth band and the (N+1)th band of the remaining adjacent bands comprises:
calculating each cost path value as the sum of an accumulated value, a transient cost, and a node of the (N+1)th band; and
obtaining the minimum cost path value by comparing the cost path values.

32. The method for determining a band state of M/S encoding for AAC according to claim 31, wherein the accumulated value belongs to the node of the Nth band that has the minimum cost path between the (N-1)th band and the Nth band.

33. The method for determining a band state of M/S encoding for AAC according to claim 21, further comprising a step of calculating the PE values of the left signal and the right signal, the step comprising:
converting the left and right signals into a left FFT signal and a right FFT signal by FFT;
receiving the left FFT signal and the right FFT signal to calculate masking thresholds of the left FFT signal and the right FFT signal; and
receiving the masking thresholds, the left FFT signal, and the right FFT signal to calculate the PE values of the left signal and the right signal, respectively.

34. The method for determining a band state of M/S encoding for AAC according to claim 21, further comprising, before calculating the middle and side signals, a step of converting the left and right signals into left and right MDCT signals by MDCT and then calculating the middle and side signals.

35. The method for determining a band state of M/S encoding for AAC according to claim 34, further comprising a step of calculating the PE values of the middle signal and the side signal, the step comprising:
calculating masking thresholds of the middle signal and the side signal; and
receiving the masking thresholds, the middle signal, and the side signal to calculate the PE values of the middle signal and the side signal, respectively.

36. The method for determining a band state of M/S encoding for AAC according to claim 35, wherein the step of calculating the masking thresholds of the middle signal and the side signal comprises:
converting the left and right signals into left and right MDCT signals by MDCT;
converting the left and right signals into left and right FFT signals by FFT;
receiving the left and right FFT signals to calculate masking thresholds of the left and right FFT signals;
receiving the masking thresholds of the left and right FFT signals, the left FFT signal, the right FFT signal, the left MDCT signal, and the right MDCT signal to calculate masking thresholds of the left signal and the right signal; and
receiving the masking thresholds of the left signal and the right signal to calculate the masking thresholds of the middle signal and the side signal, respectively.

37. The method for determining a band state of M/S encoding for AAC according to claim 36, wherein the masking thresholds of the middle signal and the side signal are each set to half of the minimum of the masking thresholds of the left signal and the right signal.
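The threshold rule of the final claim, T_M = T_S = min(T_L, T_R) / 2, can be sketched in Python as follows. The function name `ms_masking_thresholds` and the element-by-element application of the rule to lists of thresholds are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def ms_masking_thresholds(
    t_left: Sequence[float],
    t_right: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Return (T_M, T_S), each half of the smaller of the L and R thresholds."""
    t = [min(l, r) / 2 for l, r in zip(t_left, t_right)]
    # Middle and side receive the same threshold values.
    return t, list(t)
```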