TWI337812B

TWI337812B - An audio decoder and method thereof

Info

Publication number: TWI337812B
Application number: TW96115163A
Authority: TW
Inventors: Jenya Chou; Ryan Liu
Original assignee: Magima Digital Information Co Ltd; Magima Technology Co Ltd
Priority date: 2007-04-27
Filing date: 2007-04-27
Publication date: 2011-02-21
Also published as: TW200843364A

Description

IX. Description of the Invention

[Technical Field]

The present invention relates to an audio decoder and a method thereof.

[Prior Art]

Audio and video transmission technology is widely used in information technology fields such as video conferencing, digital television, and Internet telephony. Because audio and video data are large in volume, the industry generally relies on codec technology to transmit them: the data are encoded at the transmitting end according to a given encoding rule and decoded at the receiving end according to the corresponding decoding rule. The transmitting end normally encodes with a particular encoding clock, and when recovering the audio and video data the receiving end must set a decoding clock consistent with that encoding clock, so that the data play back continuously and in order.

For example, the widely adopted Moving Picture Experts Group (MPEG) family of audio and video coding standards exploits spatial and temporal redundancy to compress video effectively, while its audio compression relies mainly on the subjective noise perception characteristics of the human ear. The human ear can effectively distinguish frequencies from 20 Hz to 20 kHz, so audio compression can emphasize signals within that range and ignore signals outside it; the non-flatness of the sound spectrum can also be exploited to aid compression.

However, the sampling frequency formats at the transmitting end and the receiving end of audio decoding equipment may differ. For example, if the sampling frequency used for encoding at the transmitting end does not match the sample output rate at the receiving end, the sampling frequency may need to be converted. In addition, channel congestion and similar causes may produce a mismatch between the local clock at the receiving end and the encoding clock at the transmitting end. For example, in MPEG-2, the program stream [...]

According to one embodiment of the present invention, an audio decoder decodes at least one audio data stream output by an audio encoder. The audio decoder includes a parsing unit, a decoding unit, a resampling unit, and a control unit. The parsing unit receives the external audio data stream and depacketizes it to obtain audio data. The decoding unit decodes the audio data and applies an inverse discrete cosine transform (IDCT) and windowing to the decoded audio data to obtain a plurality of pulse code modulation (PCM) samples. The resampling unit resamples the PCM samples according to a sampling frequency ratio. The control unit controls the operation of the audio decoder.

Another aspect of the present invention therefore provides an audio decoding method that gives the output audio signal higher frequency control accuracy and reduces the cost of the audio decoder.

According to another embodiment of the present invention, an audio decoding method includes: receiving an external audio data stream and depacketizing it; decoding the depacketized audio data stream and applying an inverse discrete cosine transform (IDCT) and windowing to obtain a plurality of PCM samples; and resampling the PCM samples according to a predetermined sampling frequency ratio before outputting them.

According to yet another embodiment of the present invention, an audio decoding method includes: parsing an audio packet to obtain audio data; performing a decoding procedure on the audio data to obtain at least one PCM sample; resampling the PCM sample; and filtering the adjusted PCM sample.

[...] sampling algorithm filters the sample values to reconstruct the output waveform of the samples and thereby adjust the sample output rate. In FIG. 2, the ordered arrows indicate the sampling instants; for clarity, only a few sample points are shown. Waveform A exists at the original sampling frequency. When the actual audio sample output rate is greater than the playback rate of the audio decoder, waveform reconstruction yields waveform B, which has the same shape as waveform A but a higher sampling frequency. When the actual audio sample output rate is less than the playback rate of the audio decoder, waveform reconstruction yields waveform C, which has the same shape as waveform A but a lower sampling frequency. The reconstruction algorithm used here may be, for example, an interpolation algorithm, a short-time Fourier transform algorithm, or a frequency-domain prediction algorithm, or any reasonable combination of these, such as time-domain interpolation combined with a Fourier transform algorithm. In one embodiment of the present invention, for example, a time-domain interpolation algorithm converts the sampling frequency by successive approximation through interpolation, thereby adjusting the sample output rate.

Please refer to FIG. 3, which is a block diagram of the resampling unit according to an embodiment of the present invention. The resampling unit 105 includes a detecting device 131, a frequency ratio control device 132, and a frequency adjusting device 133. The detecting device 131 can detect a conversion of the sampling frequency and/or an error of the sampling frequency. It determines from information in the received data stream whether the sampling frequency needs to be adjusted; for example, it can make that determination from the header information of the elementary packets of the audio data stream. The frequency ratio control device 132 either calculates a frequency adjustment reference value from the output of the detecting device 131 or records that output directly as the frequency adjustment reference value. When it is detected that the sampling frequency must be converted (a format conversion) and/or that the sampling frequency has an error [...]

[...] is implemented in cascaded form. Formally, one resampling application can first be filtered to complete its frequency change, for example converting the encoder's 48.005 kHz to 48 kHz to correct the error, and then the other resampling application can be filtered to complete its frequency change, for example converting 48 kHz to 32 kHz. The two resampling operations can be linked by a convolution relationship.

FIG. 4 is a flowchart of an audio decoding method according to an embodiment of the present invention. The audio decoding method includes:

Step 401: receive an external audio data stream and depacketize it.

Step 403: decode the depacketized audio data stream and apply an inverse discrete cosine transform (IDCT) and windowing to obtain pulse code modulation (PCM) samples.

Step 405: resample the PCM samples according to a predetermined sampling frequency ratio and output them.

FIG. 5 is a detailed flowchart of step 405 of the audio decoding method. Step 405 includes:

Step 501: detect a conversion of the sampling frequency and/or an error of the sampling frequency, and generate a frequency adjustment ratio reference value.

Step 503: output the ratio of the new sampling frequency to the original sampling frequency according to the frequency adjustment ratio reference value.

Step 505: according to the ratio of the new sampling frequency to the original sampling frequency, reconstruct the waveform by filtering, convert and/or adjust the sampling frequency, and then output the samples.

The resampling step uses a filtering method, under a given algorithm, to reconstruct the output waveform of the samples and thereby adjust the sample output rate. Such algorithms include interpolation algorithms, short-time Fourier transform algorithms, and frequency-domain prediction [...]
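Steps 501 to 505 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes plain linear interpolation as the time-domain interpolation filter, whereas the description names interpolation only as one candidate algorithm and does not specify a kernel. The function names `frequency_ratio` and `resample_linear` are hypothetical.

```python
from fractions import Fraction


def frequency_ratio(original_hz, new_hz):
    """Steps 501/503 (sketch): ratio of the new sampling frequency to the original."""
    return Fraction(new_hz) / Fraction(original_hz)


def resample_linear(samples, ratio):
    """Step 505 (sketch): rebuild the waveform at the new rate by linear interpolation."""
    if not samples:
        return []
    ratio = Fraction(ratio)
    # Keep every output instant inside the span of the input samples.
    n_out = int((len(samples) - 1) * ratio) + 1
    out = []
    for k in range(n_out):
        pos = k / ratio          # output instant k, measured in input-sample units
        i = int(pos)             # input sample at or just before that instant
        frac = pos - i           # fractional distance toward the next input sample
        if i + 1 < len(samples):
            value = samples[i] + (samples[i + 1] - samples[i]) * frac
        else:
            value = samples[i]
        out.append(float(value))
    return out


ratio = frequency_ratio(48000, 32000)    # 2/3
ramp = [0, 3, 6, 9, 12, 15, 18]          # illustrative PCM samples at 48 kHz
print(resample_linear(ramp, ratio))      # [0.0, 4.5, 9.0, 13.5, 18.0]
```

Linear interpolation is used only to keep the sketch short. When lowering the sampling frequency, a practical resampler would also low-pass filter the signal to limit aliasing.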

[...] one or more algorithms perform the filtering to reconstruct the waveform. When the PCM samples are resampled, the resampling ratio of the sampling frequency is variable. In one embodiment of the present invention, the accuracy range of the sampling frequency can be set through software programming as needed. The resampling ratio of the sampling frequency can be set according to information provided by the audio data stream. In some embodiments conforming to the MPEG standard, when an error of the sampling frequency is adjusted, the resampling ratio is determined by the value of the PTS field contained in the header information of the elementary packets of the audio data stream. When converting between different sampling frequencies, the resampling ratio is determined by the value of the sampling_frequency field contained in the header information of the elementary stream of the audio data stream, optionally combined with the value of the ID field contained in that header information.

Although the present invention has been disclosed above by way of an embodiment, that embodiment is not intended to limit the invention. Anyone with ordinary knowledge in the technical field of the invention may make various changes and refinements without departing from its spirit and scope, so the scope of protection is defined by the appended claims.

[Brief Description of the Drawings]

To make the above and other objects, features, advantages, and embodiments of the present invention easier to understand, the accompanying drawings are described as follows:

FIG. 1 is a schematic diagram of the structure of an audio decoder according to an embodiment of the present invention.
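The cascaded case in the description (correcting 48.005 kHz to 48 kHz, then converting 48 kHz to 32 kHz) can be illustrated with exact rational arithmetic. The sketch below is an illustration under stated assumptions rather than the patented implementation: it treats each stage as a pure rate ratio, so two cascaded stages are equivalent to a single stage whose ratio is the product of the two. The names `stage_ratio` and `cascade` are hypothetical.

```python
from fractions import Fraction


def stage_ratio(input_hz, output_hz):
    """Resampling ratio of one stage: output rate divided by input rate."""
    return Fraction(output_hz) / Fraction(input_hz)


def cascade(*ratios):
    """Overall ratio of several resampling stages applied one after another."""
    total = Fraction(1)
    for r in ratios:
        total *= r
    return total


error_fix = stage_ratio(48005, 48000)    # 48.005 kHz -> 48 kHz (error correction)
convert = stage_ratio(48000, 32000)      # 48 kHz -> 32 kHz (format conversion)
overall = cascade(error_fix, convert)

print(overall)                                 # 6400/9601
print(overall == stage_ratio(48005, 32000))    # True
```

Keeping the ratios as fractions avoids rounding the small error-correction factor (9600/9601) before it is combined with the format-conversion factor (2/3).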

Claims

X. Scope of Patent Application (amended replacement page, September 17, ROC year 99):

1. An audio decoder for decoding at least one audio data stream output by an audio encoder, comprising:
a parsing unit that receives the external audio data stream and depacketizes it to obtain audio data;
a decoding unit that decodes the audio data and applies an inverse discrete cosine transform (IDCT) and windowing to the decoded audio data to obtain a plurality of pulse code modulation (PCM) samples;
a resampling unit that resamples the PCM samples according to a sampling frequency ratio, wherein the resampling unit comprises:
  a detecting device that detects a conversion of the sampling frequency and/or an error of the sampling frequency to generate a frequency adjustment ratio reference value;
  a frequency ratio control device that outputs a sampling frequency ratio according to the frequency adjustment ratio reference value; and
  a frequency adjusting device that reconstructs a waveform by filtering according to the sampling frequency ratio, thereby converting and/or adjusting the sampling frequency; and
a control unit that controls the operation of the audio decoder.

2. The audio decoder of claim 1, wherein the frequency ratio control device comprises an X value register and a Y value register for storing [...]

3. [...] calculated from a presentation time stamp (PTS) and a real time clock (RTC) of the audio decoder, and the calculation result is stored in the X ratio register and the Y ratio register.

4. The audio decoder of claim 2, wherein the X value register and the Y value register store a constant and/or a variable.

5. The audio decoder of claim 2, wherein the X ratio register and the Y ratio register store an original sampling frequency value of the audio encoder and a playback sampling frequency value of the decoding unit.

6. The audio decoder of claim [...], wherein the X ratio register and the Y ratio register store a ratio of an original sampling frequency value of the audio encoder to a playback sampling frequency value of the decoding unit, and a settable constant.

7. An audio decoding method in which an audio decoder decodes at least one audio data stream output by an audio encoder, the method comprising:
(a) receiving the external audio data stream and depacketizing it;
(b) decoding the depacketized audio data stream and applying an inverse discrete cosine transform (IDCT) and windowing to obtain a plurality of pulse code modulation (PCM) samples; and
(c) resampling the PCM samples according to a predetermined sampling frequency ratio, wherein step (c) comprises:
  (c1) detecting a conversion of the sampling frequency and/or an error of the sampling frequency to generate a frequency adjustment ratio reference value;
  (c2) outputting a sampling frequency ratio according to the frequency adjustment ratio reference value; and
  (c3) reconstructing an audio signal waveform by filtering according to the sampling frequency ratio, thereby converting and/or adjusting the sampling frequency.

8. The audio decoding method of claim 7, wherein [...] a code conversion is performed, and when [...] detects that the sampling frequency has an error, the sampling frequency error is corrected [...]

9. [...] audio decoding method, wherein when [...], the code conversion and the error correction are combined and filtered in a single pass.

10. The audio decoding method of claim 8, wherein when step (c1) detects that the sampling frequency is converted and that the sampling frequency has an error, the code conversion and the error correction are filtered separately and then logically combined in convolution form.

11. The audio decoding method of any one of claims 7 to 10, wherein when step (c1) detects that the sampling frequency has an error, the error ΔF of the sampling frequency is:
ΔF = (RTC − PTS) / C_F
where C_F is a step length coefficient, RTC (Real Time Clock) denotes a local clock of the audio decoder, and PTS denotes a recorded presentation time stamp.

12. The audio decoding method of claim 11, wherein step (c1) obtains the frequency adjustment ratio reference value by the following formula:
Fs = F[...]*Df
where ΔF is the error of the sampling frequency, [...] is an original sampling frequency value according to the local clock, Fs is the detected output sampling frequency value, and Df is the adjustment accuracy.

13. The audio decoding method of any one of claims 7 to 10, wherein when step (c1) detects that the sampling frequency is converted, the ratio of the original sampling frequency of the audio encoder recorded in the audio data stream to the audio sampling frequency of the audio decoder is taken as the frequency adjustment ratio reference value.

14. An audio decoding method for decoding at least one audio packet output by an audio encoder, comprising:
(a) parsing the audio packet to obtain audio data;
(b) performing a decoding procedure on the audio data to obtain at least one pulse code modulation (PCM) sample;
(c) resampling the PCM sample, wherein step (c) comprises:
  detecting a local clock (real time clock) of the audio decoder and a presentation time stamp recorded in the audio packet, thereby obtaining an error frequency;
  performing a mathematical operation on the error frequency to obtain a playback sampling frequency of the audio decoder; and
  adjusting the number of PCM samples according to the playback sampling frequency; and
(d) filtering the resampled PCM sample.

15. The audio decoding method of claim 14, wherein the decoding procedure comprises applying an inverse discrete cosine transform (IDCT) and windowing to the audio data.

16. The audio decoding method of claim 14, wherein the frequency error ΔF is: [...], where C_F is a step length coefficient, and RTC and PTS are the local clock and the presentation time stamp, respectively.

17. The audio decoding method of claim 16, wherein the playback sampling frequency is: [...], [...] being the frequency of the local clock, [...]
