TW584835B - Method and architecture of digital coding for transmitting and packing audio signals - Google Patents

Method and architecture of digital coding for transmitting and packing audio signals

Info

Publication number
TW584835B
Authority
TW
Taiwan
Prior art keywords
audio
patent application
item
scope
transmitting
Prior art date
Application number
TW91136087A
Other languages
Chinese (zh)
Other versions
TW200410201A (en)
Inventor
Chi-Min Liu
Wen-Chieh Lee
Original Assignee
Univ Nat Chiao Tung
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Univ Nat Chiao Tung
Priority to TW91136087A
Application granted
Publication of TW584835B
Publication of TW200410201A

Abstract

A digital coding method converts an input audio signal into a sequence of frequency samples representing the spectral composition of the signal, and then quantizes the sequence into quantized values according to a bit allocation procedure. The bit allocation procedure uses a parameter predictor to predict the quantization parameters from masking thresholds. The quantized values are coded into coded data comprising multiple bits. If the number of bits in the coded data exceeds a preset number of available bits, a recursive rate control loop adjusts the quantization parameters and the quantization step size. Before the sequence of frequency samples is quantized, the method also cuts off the high-frequency portion of the input audio signal at a cutoff frequency determined by the recursive rate control loop.

Description

Technical field

The present invention relates to a digital coding method and architecture for transmitting and packing signals, and in particular to bit allocation in audio signal coding.

Prior art

Perceptual audio coding, such as MPEG Layers 1 to 3, Advanced Audio Coding (AAC), and time/frequency (T/F) coding, is widely used in consumer electronics, telecommunications, and broadcasting. In these perceptual audio coders, bit allocation is one of the main tasks of the key modules: it drives much of the complexity and determines the coding quality.

Figure 1 is a block diagram of the encoding process in a perceptual audio coder. A time/frequency mapper (T/F mapper) 101 converts the audio signal S(n), window by window, from the time domain into frequency-domain segments S(m,f). Several coders 103 are also used in the encoding process to achieve high compression ratios. Their output X(m,f) is a frequency-domain sequence indexed by the window segment index m and the frequency index f. A quantizer 105 quantizes X(m,f) into a finite number of levels, denoted X'(m,f), with the aim of minimizing the impairments caused by quantization noise. The quantization levels are controlled by quantization parameters.

For compression, the frequency lines are grouped into sets called quantization bands. The number of frequency lines in a quantization band is determined by the critical bands and by the bits available for transmitting the quantization parameters. A variable length coder (VLC) 107 represents the quantized sequence X'(m,f) with variable-length codes that take into account the statistical probability of occurrence of the transformed signal. A packing unit 109 assembles the final coded results into a sequence defined by a specified audio protocol. A psychoacoustic model 111 analyzes the signal and, from the analysis, provides the signal-to-masking ratio (SMR) for each quantization band. A bit allocator 113 determines the quantization parameters by referring to the masking thresholds provided by the psychoacoustic model 111 and to the available bit budget 115.

A non-uniform quantizer quantizes the spectral lines under the control of the bit allocator, which weighs the final audio quality against the required bits to decide how to quantize.

Control of audio quality and of the number of bits is therefore the basic requirement of a bit allocator.

U.S. Patent No. 5,579,430 discloses a digital coding process based on optimum coding in the frequency domain (OCF). It provides music coding of a quality comparable to compact disc (CD) at a data rate of about 2 bits/ATW, and good FM radio broadcast quality at a data rate of 1.5 bits/ATW. U.S. Patent No. 5,924,060 discloses a digital coding process for transmitting and/or storing acoustical signals that reduces the data rate by a factor of between 4 and 6 without substantially degrading the quality of the music signal.

In MPEG Layers 1 and 2, a uniform quantizer is used to control quality and bit requirements. The bit allocator therefore simply distributes the total available bits among the sub-band signals for quantization, so as to minimize the audibility of the quantization noise. In coders such as MPEG Layer 3, MPEG-2 AAC, and MPEG-4 T/F coding, controlling quality and bit rate is difficult. The reason is that they all use non-uniform quantizers, whose noise varies with the input, together with variable-length coding, which assigns different bit lengths to different values. The number of bits consumed must therefore be obtained from the quantization results rather than from the quantization parameters.

These drawbacks make the quantization parameters hard to estimate. A two-nested-loop iterative method, known as OCF, was proposed to solve this problem. As shown in Figure 2, it estimates the quantization parameters with two iterative loops: a rate-control loop and a quality-control loop. The rate-control loop iteratively adjusts the parameter values to fit the limited bits available for Huffman coding the quantized spectral lines. The quality-control loop iteratively adjusts the parameter values to satisfy a perceptual criterion on the quantization noise, which must be evaluated through inverse quantization.

For a frame with F spectral lines, the complexity of the method can be described as O(F·R·η + F·Q·γ), where Q and R are the numbers of iterations of the quality-control loop and the rate-control loop, and η and γ are the computational complexities of processing a spectral line in the rate-control loop and the quality-control loop, respectively. The rate-control cost η comes from quantization and VLC coding of the spectral lines, while the quality-control cost γ comes from dequantization and noise measurement. Both η and γ are high. Moreover, the iteration counts Q and R depend on the initial values of the quantization parameters and on the adjustment method. The resulting complexity can exceed the combined complexity of the hybrid transform and the psychoacoustic model in Figure 1.
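The two nested loops can be sketched roughly as follows. This is a toy illustration, not the OCF or ISO implementation: the bit counter `count_bits`, the step-size rule in `band_step`, the iteration caps, and all function names are assumptions made for the example. The returned counters correspond to the iteration counts R and Q in the complexity expression.

```python
def quantize(x, step):
    """Power-law quantizer in the style of MPEG Layer 3."""
    return int((abs(x) / step) ** 0.75)

def dequantize(q, step):
    """Inverse of quantize(), ignoring the truncation error."""
    return (q ** (4.0 / 3.0)) * step

def count_bits(symbols):
    """Hypothetical stand-in for Huffman coding: one bit plus the magnitude length."""
    return sum(1 + q.bit_length() for q in symbols)

def band_step(gain, scale):
    """Assumed step-size rule: a larger gain coarsens, a larger scale refines."""
    return 2.0 ** ((gain - 2 * scale) / 4.0)

def two_loop_allocation(bands, masks, budget, max_quality=16, max_rate=64):
    scale = [0] * len(bands)
    rate_iters = quality_iters = 0
    gain = 0
    for _ in range(max_quality):                  # quality-control loop
        quality_iters += 1
        for gain in range(max_rate):              # rate-control loop
            rate_iters += 1
            steps = [band_step(gain, s) for s in scale]
            coded = [[quantize(x, st) for x in band]
                     for band, st in zip(bands, steps)]
            if sum(count_bits(c) for c in coded) <= budget:
                break
        # Measuring the noise requires dequantizing every spectral line.
        noisy = []
        for b, (band, st) in enumerate(zip(bands, steps)):
            noise = sum((abs(x) - dequantize(q, st)) ** 2
                        for x, q in zip(band, coded[b])) / len(band)
            if noise > masks[b]:
                noisy.append(b)
        if not noisy:
            break
        for b in noisy:                           # refine the bands with audible noise
            scale[b] += 1
    return scale, gain, rate_iters, quality_iters
```

Every pass of the outer loop restarts the inner loop, so the quantization and bit-counting work is repeated many times per frame, which is the cost the text attributes to this approach.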

In the quality-control loop, the way bits are assigned to the quantization bands determines the quality of the audio coding. There are two ways to assign them. The first assigns bits, in each iteration, only to the band with the worst noise-to-masking ratio (NMR). This method requires a large number of iterations and is therefore highly complex. The second assigns bits to all noisy bands in each iteration until all available bits are used up. Its complexity is much lower than that of the first method, but it is doubtful whether its quality is satisfactory.

The first method can shape the noise so that the noise level matches the masking threshold, a criterion that is widely accepted. The second method has been used in the ISO sample encoder and usually makes the quality substantially better. The problem with the two nested loops is that they may fail to converge. Because quality and bit consumption are controlled by two different rules in the two loops, the iteration may never terminate; this is known as the deadlock problem. The usual remedy is to cap the number of iterations and to use heuristic parameter adjustments that balance quality against the number of loops. Even so, these methods cannot guarantee quality.

Summary of the invention

The present invention overcomes the above drawbacks of conventional digital coding processes. Its main object is to provide a digital coding method for transmitting and packing audio signals with high quality and lower computational complexity.

According to the invention, the input audio signal is first converted into a sequence of frequency samples representing the spectral composition of the signal. The sequence is quantized according to a bit allocation procedure that uses a parameter predictor, which predicts the quantization parameters directly from the masking thresholds. The quantized values are coded with a variable-length code, or packed directly into a specified audio format. If the total length of the coded data exceeds the number of available bits, the parameters are adjusted and the quantization step size is increased. This is repeated until the number of available bits exceeds the number of bits the coding requires. Finally, the coded sequence is packed into a sequence defined by the specified audio format.
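The single rate-control loop described above can be sketched as follows. This is a minimal sketch under stated assumptions: `count_bits` is a hypothetical stand-in for the variable-length coder, the starting step size is taken as already predicted from the masking thresholds, and the growth factor and iteration cap are illustrative choices rather than values from the patent.

```python
def quantize(lines, step):
    """Power-law quantizer in the style of MPEG Layer 3: int((|x| / step) ** 0.75)."""
    return [int((abs(x) / step) ** 0.75) for x in lines]

def count_bits(symbols):
    """Hypothetical stand-in for a variable-length coder."""
    return sum(1 + q.bit_length() for q in symbols)

def encode_frame(lines, predicted_step, available_bits,
                 growth=2 ** 0.25, max_iter=64):
    """Start from the predicted step size and coarsen it until the frame fits."""
    step = predicted_step
    for iterations in range(max_iter):
        symbols = quantize(lines, step)
        if count_bits(symbols) <= available_bits:
            return symbols, step, iterations
        step *= growth                      # increase the quantization step size
    raise RuntimeError("bit budget not met within max_iter iterations")
```

When the prediction is good, the loop exits on its first pass and no quality-control loop is needed at all.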

The derivation is described in detail using the non-uniform quantizer of MPEG Layer 3, and the complexity and audio quality of this perceptual coding method are examined. The derivation is based on the segmental noise-to-masking ratio and yields a closed-form equation relating the bits (quantization step size) to the quantization noise. The method is not limited to MPEG Layer 3; it also applies to most perceptual coders, such as MPEG AAC (Advanced Audio Coding). The new bit allocation criterion also applies to coders with uniform quantizers, such as MPEG Layers 1 and 2.

Another object of the invention is to provide an architecture that implements this digital coding procedure. The architecture comprises a converter, a quantizer, a variable-length coder, a parameter predictor, a packing unit, an adjuster, and a comparator, and can be implemented on a signal processor.

According to the invention, for low-bit-rate coding the quantization parameters are estimated directly from the quality condition. A rate-control loop then takes into account, and can adjust, the quantization bandwidth and the number of bits required across the non-uniform frequency lines. For variable-bit-rate coding, the rate-control loop can be removed entirely.

The above and other objects and advantages of the invention are described in detail below with reference to the drawings, the detailed description of the embodiments, and the claims.

Embodiments

Figure 3a illustrates the procedure of the audio coding method of the invention. Referring to Figure 3a, the input audio signal is first converted into a sequence of frequency samples representing its spectral composition.

Next, the sequence of frequency samples is quantized according to a bit allocation procedure to obtain symbols of lower precision. A parameter predictor predicts the quantization parameters directly from the masking thresholds, which are the limits of the noise audible to the human auditory system. A quantization parameter is a parameter of the compression system that determines the resolution of the signal levels.

The quantized symbols are coded with a variable-length coder. The next step checks whether a predetermined number of available bits suffices for the coded data. If the total length of the coded data exceeds the number of available bits, the parameters are adjusted and the quantization step size is increased. This is repeated until the number of available bits exceeds the number of bits the coding requires. Finally, the coded sequence is packed into a sequence defined by a specified audio format.

For low-bit-rate audio coding, the high frequencies can be cut off before the parameter predictor estimates the quantization parameters. Figure 3b illustrates the steps of this low-bit-rate procedure. As shown in Figure 3b, when the number of bits required exceeds the number available, the cutoff frequency is adjusted and passed on, so that the high-frequency components are removed before the quantization parameters are predicted. The quantization step size is also adjusted if necessary. For variable-bit-rate audio coding, the available bits can be adjusted according to the required quality. In this case the iterative rate-control loop can be removed entirely. Figure 3c illustrates the steps of the variable-bit-rate procedure, in which the iterative rate-control loop of Figure 3a is removed.
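For the low-bit-rate case, the cutoff adjustment might look like the sketch below. It is only an illustration: the cutoff is expressed as a hypothetical index into the frequency samples, the bit counter is a toy stand-in for the variable-length coder, and the step size is held fixed although the text allows it to be adjusted as well.

```python
def cut_high_frequencies(lines, cutoff):
    """Zero every frequency sample at or above the cutoff index."""
    return [x if k < cutoff else 0.0 for k, x in enumerate(lines)]

def count_bits(symbols):
    """Hypothetical stand-in for a variable-length coder."""
    return sum(1 + q.bit_length() for q in symbols)

def encode_low_bitrate(lines, step, available_bits,
                       cutoff_decrement=1, min_cutoff=1):
    """Lower the cutoff until the coded frame fits the available bits."""
    cutoff = len(lines)
    while True:
        kept = cut_high_frequencies(lines, cutoff)
        symbols = [int((abs(x) / step) ** 0.75) for x in kept]
        used = count_bits(symbols)
        if used <= available_bits or cutoff <= min_cutoff:
            return symbols, cutoff, used
        # Remove more of the high band and try again.
        cutoff = max(min_cutoff, cutoff - cutoff_decrement)
```

In the procedure of Figure 3b the quantization parameters would be predicted again after each cut; the sketch leaves that step out to stay short.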

The coding procedures of Figures 3a to 3c can be implemented with signal processors. The detailed architectures are as follows. Corresponding to Figure 3a, the architecture of Figure 4a comprises a converter 401, which receives the input audio signal and converts it into a sequence of frequency samples representing the spectral composition of the signal. A quantizer 402 quantizes the sequence into a finite number of levels according to a bit allocation procedure. A parameter predictor 405 estimates the quantization parameters directly from the masking thresholds, and a coder 403 codes the quantized levels. When the predetermined number of available bits is insufficient for the coded data, an adjuster 407 adjusts the quantization parameters. A comparator 408 compares the predetermined number of available bits with the length of the coded data to check whether the available bits suffice. A packing unit 409 packs the final coded sequence into a sequence defined by a specified audio format.

Figures 4b and 4c show the architectures that implement Figures 3b and 3c, respectively. Referring to Figure 4b, for low-bit-rate audio coding an adjuster 413 adjusts the cutoff frequency and passes it to a high-frequency cut-off unit 411. The adjuster 413 can also adjust the quantization step size in the quantizer 402. The high-frequency cut-off unit 411 is placed between the converter 401 and the quantizer 402; it receives the adjusted cutoff frequency and passes it to the parameter predictor 405. For variable-bit-rate coding, the components associated with the iterative rate-control loop are all removed, as shown in Figure 4c.

In the invention, a deterministic formula based on a constant masking-to-noise ratio p is derived to compute the quantization parameters in the parameter predictor of the bit allocation procedure. The formula gives a closed-form noise predictor for non-uniform quantizers. The derivation is described in detail, by way of example, for MPEG Layer 3; a similar procedure can be used for an MPEG AAC quantizer.

The bit allocation of the invention meets the bit-rate and noise-shaping requirements of every sub-band with a single prediction step. The best global factor and the scaling factor of each sub-band are predicted directly from the masking thresholds. The global factor controls the total number of bits consumed, while the scaling factor controls the quantization noise of its band relative to the other bands. The following sections first state the conditions for the bit allocation, then derive the noise predictor, and finally give the upper and lower bounds of the scaling factor under the zero-band and negative noise-to-masking ratio (NMR) conditions.

Bit allocation conditions

First, consider minimizing the segmental NMR, expressed as

$$\{R(i)\} = \arg\min_{R} \sum_i \frac{\sigma_N^2(i)}{\sigma_M^2(i)} \tag{1}$$

where $\sigma_N^2(i)$ and $\sigma_M^2(i)$ are the noise energy and the masking energy of critical band $i$, and $R(i)$ is the bit rate that minimizes the NMR. In a PCM coder with $R(i)$ bits per sample, the quantization error variance is

$$\sigma_N^2(i) = \rho\, 2^{-2R(i)}\, \sigma_x^2(i) \tag{2}$$

The minimization therefore becomes

$$\arg\min_{R} \sum_i \frac{\rho\, 2^{-2R(i)}\, \sigma_x^2(i)}{\sigma_M^2(i)} \tag{3}$$

subject to the total bit rate, that is,

$$\sum_i B(i)\, R(i) = R_{\text{total}} \tag{4}$$

where $B(i)$ is the bandwidth of band $i$. By the method of Lagrange multipliers, the solution with multiplier $\lambda$ must satisfy

$$\lambda = (2\log 2)\, \frac{\rho\, 2^{-2R(j)}\, \sigma_x^2(j)}{\sigma_M^2(j)\, B(j)} = (2\log 2)\, \frac{\sigma_N^2(j)}{\sigma_M^2(j)\, B(j)} \quad \text{for all } j \tag{5}$$

So $R(j)$ must be chosen such that the noise is proportional to the masking energy times the bandwidth, that is,

$$\sigma_N^2(j) \propto \sigma_M^2(j)\, B(j) \quad \text{for all } j \tag{6}$$

The noise level must be proportional to the masking threshold times a bandwidth in order to obtain the best segmental NMR.

Second, the noise level of each quantization band is chosen by considering the masking thresholds and the critical bandwidths within that band. In other words, $\sigma_N^2(q)$ is sought to replace $\sigma_N^2(i)$ so as to minimize the segmental NMR:

[Equation (7) is missing in the source.]

where $q$ is the index of the quantization band. The problem is to find the $\sigma_N^2(q)$ that best approximates the energy minimizing the segmental NMR, that is,

[Equation (8) is missing in the source.]

If the masking energy of the critical bands within the quantization band is uniform, the resulting choice is

[Equation (9) is missing in the source.]

Third, to avoid assigning bits to bands whose masking level is already above the noise level, the criterion for minimizing the segmental NMR must be modified so that bands with a negative NMR are rounded to 1. That is, the quantization noise must have a lower bound. On the other hand, noise that is too high causes the band to be quantized to zero; such a band is called a zero band. Zero bands are clearly noticeable, so the quantization levels must also be limited so that the noise does not exceed the signal energy.

In summary, subject to the zero-band and negative-NMR conditions, the bit allocation must make the noise equal to the masking threshold times a bandwidth value.

The derivation of the noise predictor is detailed below for the quantizer of MPEG Layer 3. From the MPEG Layer 3 standard, the simplified formula of the Layer 3 non-uniform quantizer is

$$is_i = \operatorname{int}\!\left( \left( \frac{|xr_i|}{\Delta_q} \right)^{3/4} \right) \tag{10}$$

where the quantization step size is

$$\Delta_q = 2^{\,gain_{gr} - scale_q} \tag{11}$$

From the MPEG standard, the formula of the non-uniform quantizer can also be written as

$$is_i = \operatorname{int}\!\left( \left( |xr_i|\, 2^{\,scale_q - gain_{gr}} \right)^{3/4} - 0.0946 \right) \tag{12}$$

where, for each quantization band $q$, the scaling factor is $scale_q = \tfrac{1}{2}(1 + scalefac\_scale)(scalefac_q + preflag \cdot pretab_q)$; $scalefac\_scale$ is 0 or 1, $scalefac_q$ lies between 0 and 15, and $preflag$ and $pretab_q$ are the pre-amplification flag and its table entry. For each granule of the MPEG Layer 3 structure, the global gain is $gain_{gr} = \tfrac{1}{4}(global\_gain_{gr} - 210)$. Neglecting 0.0946, (12) becomes

$$is_i = \operatorname{int}\!\left( \left( |xr_i|\, 2^{\,scale_q - gain_{gr}} \right)^{3/4} \right) = \operatorname{int}\!\left( \left( \frac{|xr_i|}{\Delta_q} \right)^{3/4} \right) \tag{13}$$

where the step size is $\Delta_q = 2^{\,gain_{gr} - scale_q}$.

Next, the input signal $xr_i$ and the reconstructed signal $\hat{xr}_i$ are expressed as

$$xr_i = (is_i + \varepsilon_i)^{4/3}\, \Delta_q \quad \text{and} \quad \hat{xr}_i = (is_i)^{4/3}\, \Delta_q$$

The quantization error $e_i$ of the non-uniform quantizer equals the difference between the input signal $xr_i$ and the reconstructed signal $\hat{xr}_i$:

$$e_i = (is_i + \varepsilon_i)^{4/3} \Delta_q - (is_i)^{4/3} \Delta_q = \left(1 + is_i^{-1} \varepsilon_i\right)^{4/3} is_i^{4/3} \Delta_q - is_i^{4/3} \Delta_q \tag{14}$$

Let $f(\varepsilon_i) = (1 + is_i^{-1}\varepsilon_i)^{4/3}$. Approximating it by the first-order Taylor expansion $f(0) + f'(0)\,\varepsilon_i$ gives

$$e_i \approx \tfrac{4}{3}\, is_i^{1/3}\, \Delta_q\, \varepsilon_i$$

Assuming that the quantized signal $is_i$ and the rounding error $\varepsilon_i$ of the non-uniform quantizer are independent, the expected quantization error energy is

$$E[e_i^2] \approx \tfrac{16}{9}\, \Delta_q^2\, E\!\left[is_i^{2/3}\, \varepsilon_i^2\right] \approx \tfrac{16}{9}\, \Delta_q^2\, E\!\left[is_i^{2/3}\right] E\!\left[\varepsilon_i^2\right] \tag{15}$$

If the spectrum within the quantization band is uniform, the noise of each line can be taken as the average noise energy of the band, that is,

$$E(e_i^2) = E(e_q^2) \tag{16}$$

Because $E[\varepsilon^2] = \tfrac{1}{12}$ and $is_i^{2/3} \approx (|xr_i| / \Delta_q)^{1/2}$, equation (15) becomes

$$E[e_q^2] \approx \tfrac{4}{27}\, \Delta_q^{3/2}\, E\!\left[|XR_q|^{1/2}\right] \tag{17}$$

Substituting (7) into (16) gives

$$E[e_q^2] = K\, \sigma_M^2(q)\, B(q) \tag{18}$$

Finally, defining $T_q = \sigma_M^2(q)\, B(q)$, the difference between the global gain and the scaling factor is approximately

$$gain_{gr} - scale_q \approx \tfrac{2}{3} \log_2\!\left( \tfrac{27}{4}\, K\, T_q \,/\, E\!\left[|XR_q|^{0.5}\right] \right)$$

or

$$gain_{gr} - scale_q \approx \tfrac{2}{3} \left( \log_2 \tfrac{27}{4} + \log_2 K + \log_2 T_q - \log_2 E\!\left[|XR_q|^{0.5}\right] \right) \tag{19}$$

Because the value of the scaling factor lies between 0 and 16, and the minimum scaling factor among the quantization bands must be zero, the global gain is

$$gain = \max_q \left\{ gain_{gr} - scale_q \right\}$$

and the scaling factors of all sub-bands follow. The global gain thus varies with the bit-rate-dependent constant $K$, while the scaling factor of each sub-band depends on the masking threshold and the input signal.
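Equation (19) and the rule for the global gain can be tried out with a short sketch. It rests on assumptions that are not in the source: the expectation E[|XR_q|^0.5] is estimated by a sample mean over the lines of each band, the bandwidth B(q) is taken as the number of lines in the band, every band is assumed to contain at least one non-zero line, and the function names are hypothetical.

```python
import math

def gain_minus_scale(mask_energy, bandwidth, mean_sqrt_abs_xr, k=1.0):
    """Equation (19): (2/3) * log2((27/4) * K * T_q / E[|XR_q|**0.5])."""
    t_q = mask_energy * bandwidth          # T_q = sigma_M^2(q) * B(q)
    return (2.0 / 3.0) * math.log2(6.75 * k * t_q / mean_sqrt_abs_xr)

def predict_parameters(bands, mask_energies, k=1.0):
    """Predict the global gain and per-band scaling factors in one pass."""
    diffs = []
    for lines, mask in zip(bands, mask_energies):
        # Sample-mean estimate of E[|XR_q|**0.5] (an assumption of this sketch).
        mean_sqrt = sum(math.sqrt(abs(x)) for x in lines) / len(lines)
        diffs.append(gain_minus_scale(mask, len(lines), mean_sqrt, k))
    gain = max(diffs)                      # forces the smallest scaling factor to zero
    scales = [gain - d for d in diffs]
    return gain, scales
```

With the predicted values, the step size 2^(gain - scale_q) of each band reproduces the target noise K·T_q through equation (17), which is what lets the method skip the quality-control loop. Rounding the results to the integer fields of a real bitstream is left out here.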

As stated earlier, bit allocation should be constrained by the non-negative NMR condition and the zero-band condition. For the non-negative NMR condition, the noise level is set to the masking threshold; that is, $T_q = \sigma_M^2(q)$ and $\kappa = 1$. This yields an upper bound on the scale factor relative to the global gain:

$$gain_{gr} - Uscale_q = \tfrac{8}{3}\left(\log_2 \tfrac{27}{4} + \log_2 \sigma_M^2(q) - \log_2 E\!\left[|XR_q|^{0.5}\right]\right) \tag{20}$$

That is,

$$scale_q < Uscale_q = gain_{gr} - \tfrac{8}{3}\left(\log_2 \tfrac{27}{4} + \log_2 \sigma_M^2(q) - \log_2 E\!\left[|XR_q|^{0.5}\right]\right) \tag{21}$$
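The relationship between the global gain, the per-band scale factors, and the upper bound can be sketched as follows. This is an illustration under stated assumptions rather than the patented procedure: the step size is assumed to be `2 ** ((gain - scale) / 4)`, the per-band difference is assumed to be `(8/3) * log2(27 * kappa * T / (4 * E[|XR| ** 0.5]))`, the same threshold values are used for the allocation and for the bound, and integer rounding of scale factors is omitted. All names and the sample band values are hypothetical.

```python
import math


def gain_minus_scale(threshold, mean_root_magnitude, kappa=1.0):
    """Assumed closed form for gain - scale of one band.

    threshold            -- noise target T_q of the band
    mean_root_magnitude  -- E[|XR_q| ** 0.5] of the band
    kappa                -- bit-rate-dependent constant (1 means noise = mask)
    """
    ratio = 27.0 * kappa * threshold / (4.0 * mean_root_magnitude)
    return (8.0 / 3.0) * math.log2(ratio)


def allocate(thresholds, mean_root_magnitudes, kappa):
    """Return (global gain, scale factors, upper bounds) for all bands."""
    bands = list(zip(thresholds, mean_root_magnitudes))
    diffs = [gain_minus_scale(t, m, kappa) for t, m in bands]
    # The smallest scale factor must be zero, so the gain is the largest difference.
    gain = max(diffs)
    scales = [gain - d for d in diffs]
    # Upper bound: noise placed exactly at the masking threshold (kappa = 1).
    upper = [gain - gain_minus_scale(t, m, 1.0) for t, m in bands]
    return gain, scales, upper


# Hypothetical three-band example with kappa = 2 (noise allowed above the mask).
gain, scales, upper = allocate([4.0, 1.0, 0.25], [10.0, 5.0, 2.0], kappa=2.0)
print(gain, scales, upper)
```

In this sketch, raising `kappa` raises the global gain by the same amount for every band and leaves the scale factors unchanged, while each scale factor stays below its upper bound as long as `kappa` is at least 1.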

This bound is adjusted according to the available bits.

The lower bound can be derived from the zero-band condition. A zero band occurs when the noise is greater than the signal energy; that is,

$$\tfrac{4}{27}\,\Delta_q^{1.5}\,E\!\left[|XR_q|^{0.5}\right] > E\!\left[|XR_q|^{2}\right] \tag{22}$$

Therefore, the lower bound of the scale factor is

$$scale_q > Dscale_q = gain_{gr} - \tfrac{8}{3}\,\log_2 \frac{27\,E\!\left[|XR_q|^{2}\right]}{4\,E\!\left[|XR_q|^{0.5}\right]} \tag{23}$$

Figure 5 shows the average numbers of iterations for the bit allocation procedures of the present invention and of MPEG, where R is the average number of rate-control iterations. As Figure 5 shows, the allocation method of the present invention removes the iterations required by quality control and reduces the number of rate-control iterations by a factor greater than 3.

Figure 6 shows the objective scores of the present invention compared with the ISO bit allocation method. Here the present invention uses the perceptual evaluation of audio quality (PEAQ) system recommended by ITU-R Task Group 10/4. ISO denotes the original source code, and ISO1 denotes that code improved by adopting LAME's termination conditions. The experiments are based on stereo mode and psychoacoustic model 2. Two mechanisms, including the bit reservoir, are unrelated to the bit allocation method, so both were turned off in the experiments. The objective difference grade (ODG) is the output variable of the objective measurement method. ODG values should ideally lie between 0 and -4, where 0 corresponds to an imperceptible impairment and -4 corresponds to an impairment judged very annoying. As Figure 6 shows, the method of the present invention achieves better quality than the other methods presented in the figure.

The PEAQ configuration used in the present invention is the basic version, which uses an FFT-based ear model. It uses the following model output variables: BandwidthRefB, BandwidthTestB, Total NMRB, WinModDiff1B, ADBB, EHSB, AvgModDiff1B, AvgModDiff2B, RmsNoiseLoudB, MFPDB, and RelDistFramesB. An artificial neural network with three nodes in its hidden layer converts these 11 model output variables into a single quality index.

Figure 7 provides a list of the test signals used in the objective and subjective tests. By setting the same iteration termination conditions, such as the number of iterations, the scale factor bands with non-increasing noise, and conformance to the scale factor table [http://www.mp3dev.org/mp3], the ISO algorithm can be improved with the methods described by LAME (that is, the MP3 encoder with the best quality). The two nested loops used for comparison are based on the iterative algorithm used by LAME.

The above describes only preferred embodiments of the present invention and shall not limit the scope of its implementation. All equivalent changes and modifications made in accordance with the claims of the present invention shall remain within the scope covered by this patent.

Brief description of the drawings

Figure 1 is a block diagram of an encoding procedure used in current audio coding.
Figure 2 shows a bit allocation procedure in an OCF procedure.
Figure 3a illustrates the flow steps of the audio encoding procedure of the present invention.
Figure 3b illustrates the flow steps of the [...] bit-rate audio encoding procedure of the present invention.
Figure 3c illustrates the flow steps of the variable-bit-rate audio encoding procedure of the present invention.
Figure 4a illustrates an implementation architecture of Figure 3a of the present invention.
Figures 4b and 4c illustrate the implementation architectures of Figures 3b and 3c, respectively.
Figure 5 shows the average numbers of quality-control and bit-rate-control iterations of the present invention and of the MPEG bit allocation procedure, respectively, for different test material.
Figure 6 shows the objective scores of the present invention compared with the ISO bit allocation method.
Figure 7 provides a list of the test signals used in the objective and subjective tests.


Description of reference numerals

101  time/frequency converter
103  other encoders
105  quantizer
107  variable-length encoder
109  packing unit
111  auditory analysis model
113  bit allocator
115  available bits
401  converter
402  quantizer
403  variable-length encoder
405  parameter estimator
407  adjuster
408  comparator
409  packing unit
411  high-frequency cutoff
413  adjuster


Claims (1)

1. A digital coding method for transmitting and packing audio signals, comprising the following steps:
(a) converting an input audio signal into a sequence of frequency samples representing a spectral composition of the audio signal;
(b) quantizing the frequency sample sequence into quantized values according to a bit allocation procedure, wherein the bit allocation procedure uses a parameter estimator to estimate quantization parameters by means of a masking threshold;
(c) encoding the quantized values with a symbol encoder to form coded data, the coded data comprising a plurality of bits; and
(d) packing the coded data into a data sequence according to a specific audio specification.

2. The digital coding method for transmitting and packing audio signals as claimed in claim 1, wherein step (b) is performed by a uniform quantizer or a non-uniform quantizer.

3. The digital coding method for transmitting and packing audio signals as claimed in claim 1, wherein the symbol encoder comprises a variable-length encoder.

4. The digital coding method for transmitting and packing audio signals as claimed in claim 1, wherein, for a quantization band, the parameter estimator in the bit allocation procedure uses a formula to compute and adjust at least one corresponding global factor and/or a band scale factor, the formula being based on a constant mask-to-noise ratio.

5. The digital coding method for transmitting and packing audio signals as claimed in claim 4, wherein the bit allocation procedure of step (b) further comprises the following steps:
adjusting the global factor according to a predetermined number of available bits for the coded data; and
generating an upper bound and a lower bound of the band scale factor corresponding to the global factor of the quantization band.

6. The digital coding method for transmitting and packing audio signals as claimed in claim 5, wherein the upper bound is constrained by a non-negative noise-to-mask ratio.

7. The digital coding method for transmitting and packing audio signals as claimed in claim 5, wherein the lower bound is constrained by the zero band.

8. The digital coding method for transmitting and packing audio signals as claimed in claim 4, wherein the band scale factor differs for each subband according to the masking threshold and the input audio signal.

9. The digital coding method for transmitting and packing audio signals as claimed in claim 4, wherein the global factor varies with a bit-rate-dependent constant.

10. The digital coding method for transmitting and packing audio signals as claimed in claim 1, further comprising, before step (d), an iterative rate control loop comprising the following steps:
(c1) if the number of bits contained in the coded data does not exceed a predetermined number of available bits for the coded data, proceeding to step (d), and otherwise proceeding to step (c2); and
(c2) adjusting the quantization parameters and a quantization step size to be used in step (b), and returning to step (b).

11. The digital coding method for transmitting and packing audio signals as claimed in claim 10, wherein step (b) is performed by a uniform quantizer or a non-uniform quantizer.

12. The digital coding method for transmitting and packing audio signals as claimed in claim 10, wherein, if the number of bits contained in the coded data exceeds the predetermined number of available bits for the coded data, at least one corresponding global factor and a band scale factor are adjusted, and the quantization step size in step (c2) is increased.

13. The digital coding method for transmitting and packing audio signals as claimed in claim 1, wherein the symbol encoder comprises a variable-length encoder.

14. The digital coding method for transmitting and packing audio signals as claimed in claim 10, wherein step (b) further comprises a step of cutting off high frequencies, for low-bit-rate audio coding, before the frequency sample sequence is quantized.

15. The digital coding method for transmitting and packing audio signals as claimed in claim 14, wherein step (c2) of the iterative rate control loop further comprises a step of adjusting a cutoff frequency for the step of cutting off high frequencies.

16. The digital coding method for transmitting and packing audio signals as claimed in claim 10, wherein, for a quantization band, the parameter estimator in the bit allocation procedure uses a formula to compute and adjust at least one corresponding global factor and/or a band scale factor, the formula being based on a constant mask-to-noise ratio.

17. The digital coding method for transmitting and packing audio signals as claimed in claim 16, wherein the bit allocation procedure of step (b) further comprises the following steps:
adjusting the global factor according to a predetermined number of available bits for the coded data; and
generating an upper bound and a lower bound of the band scale factor corresponding to the global factor of the quantization band.

18. The digital coding method for transmitting and packing audio signals as claimed in claim 17, wherein the upper bound is constrained by a non-negative noise-to-mask ratio.

19. The digital coding method for transmitting and packing audio signals as claimed in claim 17, wherein the lower bound is constrained by the zero band.

20. The digital coding method for transmitting and packing audio signals as claimed in claim 16, wherein the band scale factor differs for each subband according to the masking threshold and the input audio signal.

21. The digital coding method for transmitting and packing audio signals as claimed in claim 16, wherein the global factor varies with a bit-rate-dependent constant.

22. A digital coding architecture for transmitting and packing audio signals, comprising:
a converter that converts an input audio signal into a sequence of frequency samples representing a spectral composition of the audio signal;
a parameter estimator that estimates quantization parameters by means of a masking threshold;
a quantizer that quantizes the frequency sample sequence into quantized values according to the quantization parameters;
a variable-length encoder that encodes the quantized values into coded data, the coded data comprising a plurality of bits; and
a packing unit that packs the coded data into a data sequence according to a specific audio specification.

23. The digital coding architecture for transmitting and packing audio signals as claimed in claim 22, further comprising:
a comparator that compares the number of bits contained in the coded data with a predetermined number of available bits for the coded data; and
an adjuster that adjusts the quantization parameters when the number of bits contained in the coded data exceeds the predetermined number of available bits for the coded data.

24. The digital coding architecture for transmitting and packing audio signals as claimed in claim 23, further comprising a high-frequency cutoff unit connected between the converter and the quantizer and having an input port for receiving a cutoff frequency from the adjuster.
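The iterative rate control loop of claims 10 and 12 (quantize, count bits, and coarsen the quantization when the budget is exceeded) can be sketched as follows. This is a minimal illustration, not the claimed implementation: the power-law quantizer, the step size `2 ** (gain / 4)`, and especially `toy_bit_count`, a hypothetical stand-in for a real variable-length coder, are all assumptions, and the cutoff-frequency adjustment of claims 14 and 15 is left out.

```python
def quantize_band(samples, step):
    """Assumed non-uniform quantizer: index = round((|x| / step) ** 0.75)."""
    return [round((abs(x) / step) ** 0.75) for x in samples]


def toy_bit_count(indices):
    """Hypothetical cost model standing in for the variable-length coder:
    the magnitude bits of each index plus one extra bit."""
    return sum(i.bit_length() + 1 for i in indices)


def rate_control(samples, gain, available_bits, max_iterations=64):
    """Raise the gain until the coded size fits the bit budget.

    Returns (gain, indices, bits, iterations used).
    """
    for iteration in range(max_iterations):
        step = 2.0 ** (gain / 4.0)            # assumed gain-to-step mapping
        indices = quantize_band(samples, step)  # step (b)
        bits = toy_bit_count(indices)           # step (c), toy cost
        if bits <= available_bits:              # step (c1): fits, go on to (d)
            return gain, indices, bits, iteration
        gain += 1                               # step (c2): larger step size
    raise RuntimeError("bit budget not reached within max_iterations")


# Hypothetical frame of eight spectral samples and a 24-bit budget.
frame = [1000.0, -500.0, 250.0, 125.0, 60.0, 30.0, 15.0, 7.0]
print(rate_control(frame, gain=0, available_bits=24))
```

A better initial `gain`, such as one estimated from the masking threshold, lets the loop exit after fewer passes, which is the effect the description reports for the rate-control iteration count.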
TW91136087A 2002-12-13 2002-12-13 Method and architecture of digital coding for transmitting and packing audio signals TW584835B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW91136087A TW584835B (en) 2002-12-13 2002-12-13 Method and architecture of digital coding for transmitting and packing audio signals

Publications (2)

Publication Number Publication Date
TW584835B true TW584835B (en) 2004-04-21
TW200410201A TW200410201A (en) 2004-06-16

Family

ID=34058054

Family Applications (1)

Application Number Title Priority Date Filing Date
TW91136087A TW584835B (en) 2002-12-13 2002-12-13 Method and architecture of digital coding for transmitting and packing audio signals

Country Status (1)

Country Link
TW (1) TW584835B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI405187B (en) * 2007-11-04 2013-08-11 Qualcomm Inc Scalable speech and audio encoder device, processor including the same, and method and machine-readable medium therefor
US8515767B2 (en) 2007-11-04 2013-08-20 Qualcomm Incorporated Technique for encoding/decoding of codebook indices for quantized MDCT spectrum in scalable speech and audio codecs
US8547255B2 (en) 2008-07-11 2013-10-01 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Method for encoding a symbol, method for decoding a symbol, method for transmitting a symbol from a transmitter to a receiver, encoder, decoder and system for transmitting a symbol from a transmitter to a receiver
TWI453734B (en) * 2008-07-11 2014-09-21 Fraunhofer Ges Forschung Method for encoding a symbol, method for decoding a symbol, method for transmitting a symbol from a transmitter to a receiver, encoder, decoder and system for transmitting a symbol from a transmitter to a receiver
US8959017B2 (en) 2008-07-17 2015-02-17 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoding/decoding scheme having a switchable bypass
CN111310908A (en) * 2018-12-12 2020-06-19 财团法人工业技术研究院 Deep neural network hardware accelerator and operation method thereof
CN111310908B (en) * 2018-12-12 2023-03-24 财团法人工业技术研究院 Deep neural network hardware accelerator and operation method thereof

Also Published As

Publication number Publication date
TW200410201A (en) 2004-06-16

Similar Documents

Publication Publication Date Title
JP5608660B2 (en) Energy-conserving multi-channel audio coding
RU2763374C2 (en) Method and system using the difference of long-term correlations between the left and right channels for downmixing in the time domain of a stereophonic audio signal into a primary channel and a secondary channel
KR101340233B1 (en) Stereo encoding device, stereo decoding device, and stereo encoding method
JP5539203B2 (en) Improved transform coding of speech and audio signals
JP4859670B2 (en) Speech coding apparatus and speech coding method
JP4212591B2 (en) Audio encoding device
JP5363488B2 (en) Multi-channel audio joint reinforcement
CN103069484B (en) Time/frequency two dimension post-processing
KR100986924B1 (en) Information Signal Encoding
EP2898506B1 (en) Layered approach to spatial audio coding
RU2434324C1 (en) Scalable decoding device and scalable coding device
US8032371B2 (en) Determining scale factor values in encoding audio data with AAC
US20230410822A1 (en) Filling of Non-Coded Sub-Vectors in Transform Coded Audio Signals
JP5036317B2 (en) Scalable encoding apparatus, scalable decoding apparatus, and methods thereof
US20160027447A1 (en) Spatial comfort noise
WO2006049204A1 (en) Encoder, decoder, encoding method, and decoding method
WO2012081166A1 (en) Coding device, decoding device, and methods thereof
JPWO2009057327A1 (en) Encoding device and decoding device
KR20070070174A (en) Scalable encoder, scalable decoder, and scalable encoding method
US20040002859A1 (en) Method and architecture of digital conding for transmitting and packing audio signals
JP2019512739A (en) Coder for processing input signal and decoder for processing coded signal
JPWO2010016270A1 (en) Quantization apparatus, encoding apparatus, quantization method, and encoding method
TW584835B (en) Method and architecture of digital coding for transmitting and packing audio signals
GB2454208A (en) Compression using a perceptual model and a signal-to-mask ratio (SMR) parameter tuned based on target bitrate and previously encoded data
TW202215417A (en) Multi-channel signal generator, audio encoder and related methods relying on a mixing noise signal