TW201227351A - Recursive modified discrete cosine transform and inverse discrete cosine transform system with a computing kernel of RDFT - Google Patents


Publication number
TW201227351A
Authority
TW
Taiwan
Prior art keywords
signal
temporary
multiplexer
generate
delay
Prior art date
Application number
TW99146938A
Other languages
Chinese (zh)
Other versions
TWI423046B (en
Inventor
Sheau-Fang Lei
Shin-Chi Lai
Chen-Chieh Lin
Wen-Ho Juang
Original Assignee
Univ Nat Cheng Kung
Application filed by Univ Nat Cheng Kung filed Critical Univ Nat Cheng Kung
Priority to TW99146938A priority Critical patent/TWI423046B/en
Publication of TW201227351A publication Critical patent/TW201227351A/en
Application granted granted Critical
Publication of TWI423046B publication Critical patent/TWI423046B/en


Landscapes

  • Complex Calculations (AREA)

Abstract

The invention provides a recursive modified discrete cosine transform (MDCT) and inverse modified discrete cosine transform (IMDCT) system whose computing kernel is a recursive discrete Fourier transform (RDFT). The RDFT kernel is implemented with a small core area, low computational complexity, and high performance. It is reusable and reconfigurable, can be applied to a discrete Fourier transform of any length, and serves as the shared kernel of the MDCT and IMDCT, which increases its utilization.

Description

VI. Description of the Invention

[Technical Field]

The present invention relates to digital signal processing, and more particularly to a modified discrete cosine transform (MDCT) system that uses a discrete Fourier transform as its core.

[Prior Art]

In recent years, as environmental awareness has grown, industries worldwide have promoted energy saving and carbon reduction, and green design is the expected direction for the 3C industry. Mobile multimedia devices are no longer single-function products: besides supporting several highly compressed audio formats (MP3, AC-3, AAC, and so on), they offer real-time broadcast reception, recording, and other functions. Bringing the main ideas of green design (low cost, high performance, configurability, and reusability) into such multi-function products remains a challenge. It is also difficult to integrate different systems or codecs on one playback platform while effectively reducing the redundancy among them.

With the development of technology and continuing innovation in 3C (Computer, Communication, Consumer Electronics) products, the fast Fourier transform (FFT) has been widely applied, especially in communications. Transmission commonly uses orthogonal frequency-division multiplexing (OFDM) for modulation and demodulation, and OFDM relies on the FFT internally.

The FFT has drawn attention since J. W. Cooley and J. W. Tukey proposed it in 1965. Early FFT research concentrated on analysing the complexity of the method and the number of operations it requires, and then on proposing more efficient ways of computing it. Many recent studies still seek the lower bound of FFT complexity.

Traditionally, the FFT has mostly been computed in software. The transform needs a large number of multiplications and additions, which burdens the processor. On mobile multimedia devices, limited processor capability affects both the conversion speed and the results. The most common conventional approach is therefore to implement this computation in hardware: doing so relieves the processor and, because the hardware runs independently, raises the conversion speed. Hardware architectures fall broadly into recursive and parallel types.

Common parallel implementations include the memory-based FFT, the multi-path delay commutator (MDC) FFT, and the single-path delay feedback (SDF) FFT. Their advantage is conversion speed. Their disadvantages are: (1) the transform length is hard to change, since they are generally used for power-of-two lengths and are difficult to apply elsewhere once built in hardware; and (2) they need many memory elements, which increases chip area and power consumption.

A newer digital broadcasting technology, Digital Radio Mondiale (DRM), uses transform lengths that differ from the traditional powers of two: N = 288, 256, 176, and 112. This is a new challenge for parallel architectures, because supporting such lengths requires extra hardware combined with the existing architecture. This is the approach taken by K. Dong-Sun et al. in "Design of a mixed prime factor FFT for portable digital radio mondiale receiver," IEEE Transactions on Consumer Electronics, vol. 54, pp. 1590-1594, 2008. A recursive architecture, by contrast, needs no redesign for either power-of-two or non-power-of-two lengths and so directly supports the reuse called for by green design; its only concern is computation speed. Designing an efficient recursive circuit is therefore a challenge.

In the audio formats MP3 (MPEG-1 Audio Layer 3), AC-3 (Dolby AC-3), and AAC (Advanced Audio Coding), the time/frequency analysis at the encoder is performed by the modified discrete cosine transform (MDCT), and the decoder uses the inverse modified discrete cosine transform (IMDCT). MDCT/IMDCT based on subband analysis/synthesis is therefore widely used in audio codec standards.

However, the computational complexity of the MDCT/IMDCT is comparable to that of the FFT: both need many multiplications and additions, and this computation accounts for a significant share of the whole encoding and decoding process. This led to the idea of implementing the MDCT/IMDCT in dedicated hardware to relieve the processor. One conventional technique uses a parallel architecture with an FFT core, but it is inflexible: it is usually restricted to power-of-two lengths and needs many memory elements.

To ease the restriction on length, another conventional technique uses parallel and recursive architectures with a discrete cosine transform (DCT) core, which can be applied to non-power-of-two lengths. The parallel form needs complex control and a very large amount of hardware, which is unfavourable for a multi-format, multi-length hardware implementation.

For the recursive form, C. Hwang-Cheng and L. Jie-Cherng, in "Regressive implementations for the forward and inverse MDCT in MPEG audio coding," IEEE Signal Processing Letters, vol. 3, pp. 116-118, 1996, used sinusoidal/cosinusoidal recursions to propose recursive MDCT/IMDCT (RMDCT/RIMDCT) architectures. C. Che-Hong et al., in "Recursive architectures for realizing modified discrete cosine transform and its inverse," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 50, pp. 38-45, 2003, and S. Lai et al., in "Common architecture design of novel recursive MDCT and IMDCT algorithms for application to AAC, AAC in DRM, and MP3 codecs," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 56, pp. 793-797, 2009, used Chebyshev polynomials to propose efficient, higher-throughput RMDCT/RIMDCT architectures and implementations.
Compared with a parallel architecture, a recursive architecture has a simple structure and simple control, and it can switch transform lengths dynamically without changing the hardware. It is a good choice when hardware resources are limited; its drawback is that it needs more computation cycles.

Although discrete cosine transform and inverse transform systems have been developed over many years, the systems described above still need improvement in order to further reduce computational complexity, lower hardware cost, and raise computation performance.

[Summary of the Invention]

The main object of the present invention is to provide modified discrete cosine transform and inverse transform systems that use a discrete Fourier transform as their core. The system realizes a recursive discrete Fourier transform (RDFT) with small area, low complexity, and high performance, and it embodies the green design ideas of energy saving, reusability, and configurability. The RDFT can readily be used for a DFT of any length and also serves as the MDCT/IMDCT core, so that it is reused effectively and its utilization increases.

According to one feature of the invention, a modified discrete cosine transform (forward) system with a discrete Fourier transform core is provided. It comprises a data sequential shifting unit, a data reordering unit, a first rotation operation unit, an N/4-point discrete Fourier transform unit, a second rotation operation unit, and a de-interleave operation unit. The data sequential shifting unit receives N input digital signals and performs a sequential shift arrangement on them to generate N first temporary signals, where N is a positive integer multiple of 4.
The data reordering unit is connected to the data sequential shifting unit and performs a data reordering operation on the first temporary signals to generate N/4 second temporary signals. The first rotation operation unit is connected to the data reordering unit and performs a first rotation operation on the N/4 second temporary signals to generate N/4 third temporary signals. The N/4-point discrete Fourier transform unit is connected to the first rotation operation unit and performs a discrete Fourier transform on the N/4 third temporary signals to generate N/4 fourth temporary signals.

The N/4-point discrete Fourier transform unit comprises a first multiplexer, a first adder, a first multiplier, a first shift register, a second multiplexer, a first delay element, a second multiplier, a third multiplexer, a third multiplier, a fourth multiplexer, a second adder, and a second delay element. The first multiplexer receives the N/4 third temporary signals and a second multiplication signal and generates a first multiplexer signal. The first adder is connected to the first multiplexer and adds the first multiplexer signal and a second delay signal to generate the fourth temporary signal. The first multiplier is connected to the first adder and multiplies the fourth temporary signal by a cosine function signal to generate a first multiplication signal. The first shift register is connected to the first multiplier and shifts the first multiplication signal to generate a first shift signal. The second multiplexer is connected to the first multiplier, receives the first multiplication signal and the first shift signal, and outputs a second multiplexer signal. The first delay element is connected to the first adder and delays the fourth temporary signal to generate a first delay signal. The second multiplier is connected to the first delay element and multiplies the first delay signal by a sine function signal to generate the second multiplication signal. The third multiplexer is connected to the first delay element and the second multiplier, receives the first delay signal and the second multiplication signal, and outputs a third multiplexer signal. The third multiplier is connected to the third multiplexer and multiplies the third multiplexer signal by -1 to generate a third multiplication signal. The fourth multiplexer is connected to the second multiplexer, receives the second multiplexer signal and the second delay signal, and outputs a fourth multiplexer signal. The second adder is connected to the third multiplier and the fourth multiplexer and adds the third multiplication signal and the fourth multiplexer signal to generate a second addition signal. The second delay element is connected to the second adder and delays the second addition signal to generate the second delay signal.

The second rotation operation unit is connected to the N/4-point discrete Fourier transform unit and performs a second rotation operation on the N/4 values formed from the fourth temporary signals and the second addition signals to generate N/4 fifth temporary signals. The de-interleave operation unit is connected to the second rotation operation unit and performs a de-interleave operation on the N/4 fifth temporary signals to generate N output signals.

According to another feature of the invention, an inverse modified discrete cosine transform system with a discrete Fourier transform core is provided. It comprises a data reordering unit, a first rotation operation unit, an N/4-point discrete Fourier transform unit, a second rotation operation unit, and a de-interleave operation unit. The data reordering unit receives N/2 input digital signals and performs a data reordering operation on them to generate N/4 sixth temporary signals, where N is a positive integer multiple of 4. The first rotation operation unit is connected to the data reordering unit and performs a first rotation operation on the N/4 sixth temporary signals to generate N/4 seventh temporary signals. The N/4-point discrete Fourier transform unit is connected to the first rotation operation unit and performs a discrete Fourier transform on the N/4 seventh temporary signals to generate N/4 eighth temporary signals.

The N/4-point discrete Fourier transform unit comprises a first multiplexer, a first adder, a first multiplier, a first shift register, a second multiplexer, a first delay element, a second multiplier, a third multiplexer, a third multiplier, a fourth multiplexer, a second adder, and a second delay element. The first multiplexer receives the N/4 seventh temporary signals and a second multiplication signal and generates a first multiplexer signal. The first adder is connected to the first multiplexer and adds the first multiplexer signal and a second delay signal to generate the eighth temporary signal. The first multiplier is connected to the first adder and multiplies the eighth temporary signal by a cosine function signal to generate a first multiplication signal. The first shift register is connected to the first multiplier and shifts the first multiplication signal to generate a first shift signal. The second multiplexer is connected to the first multiplier, receives the first multiplication signal and the first shift signal, and outputs a second multiplexer signal. The first delay element is connected to the first adder and delays the eighth temporary signal to generate a first delay signal. The second multiplier is connected to the first delay element and multiplies the first delay signal by a sine function signal to generate the second multiplication signal. The third multiplexer is connected to the first delay element and the second multiplier, receives the first delay signal and the second multiplication signal, and outputs a third multiplexer signal. The third multiplier is connected to the third multiplexer and multiplies the third multiplexer signal by -1 to generate a third multiplication signal. The fourth multiplexer is connected to the second multiplexer, receives the second multiplexer signal and the second delay signal, and outputs a fourth multiplexer signal. The second adder is connected to the third multiplier and the fourth multiplexer and adds the third multiplication signal and the fourth multiplexer signal to generate a second addition signal. The second delay element is connected to the second adder and delays the second addition signal to generate the second delay signal.

The second rotation operation unit is connected to the N/4-point discrete Fourier transform unit and performs a second rotation operation on the N/4 values formed from the eighth temporary signals and the second addition signals to generate N/4 ninth temporary signals. The de-interleave operation unit is connected to the second rotation operation unit and performs a de-interleave operation on the N/4 ninth temporary signals to generate N output signals.

[Embodiment]

With DFT assistance, the modified discrete cosine transform (MDCT) needs only an N/4-point DFT plus pre-processing and post-processing, which considerably reduces the overall amount of computation.
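
The N/4-point DFT at the centre of this scheme is computed recursively rather than with a butterfly network. As a rough software illustration of what a recursive DFT looks like, the sketch below uses the textbook Goertzel recursion, which needs one cosine coefficient inside the loop, two delayed states, and a single correction at the end for each output bin. Using Goertzel here is an assumption made for illustration and is not claimed to be the patented datapath; the function name `goertzel_dft_bin` is hypothetical.

```python
import cmath
import math


def goertzel_dft_bin(x, k):
    """Return the k-th DFT coefficient of sequence x using a Goertzel-style recursion.

    Illustrative model only: it is not the hardware described in the text.
    """
    n = len(x)
    w = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(w)   # the only multiplier used inside the loop
    s1 = 0j                     # state delayed by one sample
    s2 = 0j                     # state delayed by two samples
    for sample in x:
        # Second-order recursion: s[m] = x[m] + 2cos(w) s[m-1] - s[m-2]
        s0 = sample + coeff * s1 - s2
        s2, s1 = s1, s0
    # Final correction after the last sample: X(k) = e^{jw} s[N-1] - s[N-2]
    return cmath.exp(1j * w) * s1 - s2
```

Because the transform length enters only through the coefficient and the loop count, the same loop serves any length, including non-power-of-two lengths such as 288 or 176.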
Implementing the MDCT around a DFT core increases the usability of the N/4-point discrete Fourier transform unit 160, which has a recursive DFT architecture.

First, the MDCT can be expressed as:

$$X(k) = \sum_{n=0}^{N-1} x(n)\cos\left(\frac{\pi}{2N}\left(2n+1+\frac{N}{2}\right)(2k+1)\right), \quad k = 0, 1, \dots, N-1.$$

Replacing $n$ by $n - N/4$ in the MDCT formula gives:

$$\begin{aligned}
X(k) &= \sum_{n=N/4}^{5N/4-1} x\left(n-\frac{N}{4}\right)\cos\left(\frac{\pi}{2N}(2n+1)(2k+1)\right) \\
&= \sum_{n=N/4}^{N-1} x\left(n-\frac{N}{4}\right)\cos\left(\frac{\pi}{2N}(2n+1)(2k+1)\right) + \sum_{n=N}^{5N/4-1} x\left(n-\frac{N}{4}\right)\cos\left(\frac{\pi}{2N}(2n+1)(2k+1)\right) \\
&= \sum_{n=N/4}^{N-1} x\left(n-\frac{N}{4}\right)\cos\left(\frac{\pi}{2N}(2n+1)(2k+1)\right) - \sum_{n=0}^{N/4-1} x\left(n+\frac{3N}{4}\right)\cos\left(\frac{\pi}{2N}(2n+1)(2k+1)\right) \\
&= \sum_{n=0}^{N-1} \tilde{x}(n)\cos\left(\frac{\pi}{2N}(2n+1)(2k+1)\right)
\end{aligned} \tag{1}$$

where

$$\tilde{x}(n) = \begin{cases} -x\left(n+\dfrac{3N}{4}\right), & 0 \le n \le \dfrac{N}{4}-1 \\[2mm] x\left(n-\dfrac{N}{4}\right), & \dfrac{N}{4} \le n \le N-1 \end{cases}$$

Let $k' = N-k-1$. The cosine function in equation (1) can then be written as:

$$\begin{aligned}
\cos\left(\frac{\pi}{2N}(2n+1)(2k+1)\right) &= \cos\left(\frac{\pi}{2N}(2n+1)(2N-2k'-1)\right) \\
&= \cos\left(\pi(2n+1) - \frac{\pi}{2N}(2n+1)(2k'+1)\right) \\
&= -\cos\left(\frac{\pi}{2N}(2n+1)(2k'+1)\right)
\end{aligned} \tag{2}$$

From equation (2):

$$X(2k+1) = -X(N-2k-2) \tag{3}$$

Therefore only $X(2k)$ needs to be considered. Using the symmetry of the cosine about $\theta = \pi$, the expression becomes:

$$\begin{aligned}
X(2k) &= \sum_{n=0}^{N-1} \tilde{x}(n)\cos\left(\frac{\pi}{2N}(2n+1)(4k+1)\right) \\
&= \sum_{n=0}^{N/2-1} \left\{ \tilde{x}(n)\cos\left(\frac{\pi}{2N}(2n+1)(4k+1)\right) + \tilde{x}(N-n-1)\cos\left(\frac{\pi}{2N}\bigl(2(N-n-1)+1\bigr)(4k+1)\right) \right\} \\
&= \sum_{n=0}^{N/2-1} \left\{ \tilde{x}(n)\cos\left(\frac{\pi}{2N}(2n+1)(4k+1)\right) + \tilde{x}(N-n-1)\cos\left(\pi(4k+1) - \frac{\pi}{2N}(2n+1)(4k+1)\right) \right\} \\
&= \sum_{n=0}^{N/2-1} \bigl\{ \tilde{x}(n) - \tilde{x}(N-n-1) \bigr\}\cos\left(\frac{\pi}{2N}(2n+1)(4k+1)\right)
\end{aligned} \tag{4}$$

Using the symmetry of the cosine again, this time about $\theta = \pi/2$, equation (4) can be rewritten as:

$$\begin{aligned}
X(2k) &= \sum_{n=0}^{N/4-1} \bigl\{ \tilde{x}(2n) - \tilde{x}(N-2n-1) \bigr\}\cos\left(\frac{\pi}{2N}(4n+1)(4k+1)\right) \\
&\quad + \sum_{n=0}^{N/4-1} \left\{ \tilde{x}\left(\frac{N}{2}-2n-1\right) - \tilde{x}\left(\frac{N}{2}+2n\right) \right\}\cos\left(\frac{\pi}{2}(4k+1) - \frac{\pi}{2N}(4n+1)(4k+1)\right) \\
&= \sum_{n=0}^{N/4-1} \bigl\{ \tilde{x}(2n) - \tilde{x}(N-2n-1) \bigr\}\cos\left(\frac{\pi}{2N}(4n+1)(4k+1)\right) \\
&\quad + \sum_{n=0}^{N/4-1} \left\{ \tilde{x}\left(\frac{N}{2}-2n-1\right) - \tilde{x}\left(\frac{N}{2}+2n\right) \right\}\sin\left(\frac{\pi}{2N}(4n+1)(4k+1)\right) \\
&= \operatorname{Re}\left[ \sum_{n=0}^{N/4-1} \left\{ \bigl(\tilde{x}(2n) - \tilde{x}(N-2n-1)\bigr) + i\left(\tilde{x}\left(\frac{N}{2}-2n-1\right) - \tilde{x}\left(\frac{N}{2}+2n\right)\right) \right\} \exp\left(-i\frac{\pi}{2N}(4n+1)(4k+1)\right) \right]
\end{aligned} \tag{5}$$

By the properties of the exponential function:

$$\exp\left(-i\frac{\pi}{2N}(4n+1)\left(4\left(k+\frac{N}{4}\right)+1\right)\right) = \exp\left(-i\frac{\pi}{2}(4n+1)\right)\exp\left(-i\frac{\pi}{2N}(4n+1)(4k+1)\right) = -i\,\exp\left(-i\frac{\pi}{2N}(4n+1)(4k+1)\right)$$

so that

$$\begin{aligned}
X\left(2k+\frac{N}{2}\right) &= \operatorname{Re}\left[ \sum_{n=0}^{N/4-1} \left\{ -i\bigl(\tilde{x}(2n) - \tilde{x}(N-2n-1)\bigr) + \left(\tilde{x}\left(\frac{N}{2}-2n-1\right) - \tilde{x}\left(\frac{N}{2}+2n\right)\right) \right\} \exp\left(-i\frac{\pi}{2N}(4n+1)(4k+1)\right) \right] \\
&= \operatorname{Im}\left[ \sum_{n=0}^{N/4-1} \left\{ \bigl(\tilde{x}(2n) - \tilde{x}(N-2n-1)\bigr) + i\left(\tilde{x}\left(\frac{N}{2}-2n-1\right) - \tilde{x}\left(\frac{N}{2}+2n\right)\right) \right\} \exp\left(-i\frac{\pi}{2N}(4n+1)(4k+1)\right) \right]
\end{aligned} \tag{6}$$

Define the new symbol $\hat{X}(k)$ as:

$$\hat{X}(k) = \sum_{n=0}^{N/4-1} x'_n \exp\left(-i\frac{\pi}{2N}(4n+1)(4k+1)\right) \tag{7}$$

where

$$x'_n = \bigl(\tilde{x}(2n) - \tilde{x}(N-2n-1)\bigr) + i\left(\tilde{x}\left(\frac{N}{2}-2n-1\right) - \tilde{x}\left(\frac{N}{2}+2n\right)\right)$$

Equation (7) can be rewritten as:

$$\hat{X}(k) = \exp\left(-i\frac{2\pi}{N}\left(k+\frac{1}{8}\right)\right) \sum_{n=0}^{N/4-1} x''_n \exp\left(-i\frac{2\pi}{N/4}nk\right), \quad k = 0, 1, \dots, \frac{N}{4}-1 \tag{8}$$

where

$$x''_n = x'_n \exp\left(-i\frac{2\pi}{N}\left(n+\frac{1}{8}\right)\right)$$

From equations (5) and (6):

$$\operatorname{Re}\bigl(\hat{X}(k)\bigr) = X(2k), \quad k = 0, 1, \dots, \frac{N}{4}-1 \tag{9}$$

$$\operatorname{Im}\bigl(\hat{X}(k)\bigr) = X\left(2k+\frac{N}{2}\right), \quad k = 0, 1, \dots, \frac{N}{4}-1 \tag{10}$$

From equations (3) and (10):

$$\operatorname{Im}\bigl(\hat{X}(k)\bigr) = X\left(2k+\frac{N}{2}\right) = -X\left(\frac{N}{2}-2k-1\right), \quad k = 0, 1, \dots, \frac{N}{4}-1 \tag{11}$$

Equation (11) therefore gives:

$$X(2k+1) = -\operatorname{Im}\left[\hat{X}\left(\frac{N}{4}-k-1\right)\right], \quad k = 0, 1, \dots, \frac{N}{4}-1 \tag{12}$$

The derivation above reduces the MDCT to the following steps:

1. Apply a simple shift to the input data and arrange the result into complex form.
2. Pre-process the complex data by multiplying it by the coefficient exp(-i(2π/N)(n + 1/8)).
3. Apply an N/4-point DFT to the pre-processed data.
4. Post-process the transformed data by multiplying it by the coefficient exp(-i(2π/N)(k + 1/8)).
5. Finally, rearrange the data systematically to obtain the transform output.

The IMDCT method with a DFT core follows. It does not realize the IMDCT by converting the DFT into an IDFT, because that approach cannot be merged effectively with the preceding result and so would not give a shared core architecture.

First, the IMDCT can be defined as:

$$x(n) = \sum_{k=0}^{N/2-1} X(k)\cos\left(\frac{\pi}{2N}\left(2n+1+\frac{N}{2}\right)(2k+1)\right), \quad n = 0, 1, \dots, N-1.$$
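
The next step of the derivation relies on the fact that the IMDCT output is redundant. Assuming the unscaled definition x(n) = Σ X(k) cos((π/2N)(2n + 1 + N/2)(2k + 1)) with k running from 0 to N/2 - 1 (the scale factor is an assumption, since conventions differ), the outputs satisfy x(N/2 - 1 - n) = -x(n) and x(3N/2 - 1 - n) = x(n). When N is a multiple of 4, the odd-indexed samples therefore follow from the even-indexed ones. The sketch below checks this numerically; `imdct_direct` and `rebuild_from_even` are hypothetical helper names.

```python
import math


def imdct_direct(X):
    """Unscaled IMDCT by direct summation: N/2 coefficients in, N samples out."""
    half = len(X)          # half = N/2
    N = 2 * half
    return [
        sum(X[k] * math.cos(math.pi / (2 * N) * (2 * n + 1 + half) * (2 * k + 1))
            for k in range(half))
        for n in range(N)
    ]


def rebuild_from_even(even_samples):
    """Recover all N samples from the even-indexed ones, even_samples[m] = x(2m).

    Uses x(N/2 - 1 - n) = -x(n) in the first half of the block and
    x(3N/2 - 1 - n) = x(n) in the second half. Requires N to be a multiple of 4.
    """
    N = 2 * len(even_samples)
    if N % 4 != 0:
        raise ValueError("N must be a multiple of 4")
    x = [0.0] * N
    for m, value in enumerate(even_samples):
        n = 2 * m
        x[n] = value
        if n < N // 2:
            x[N // 2 - 1 - n] = -value      # odd index in the first half
        else:
            x[3 * N // 2 - 1 - n] = value   # odd index in the second half
    return x
```

Computing only the even-indexed outputs halves the work of the inverse transform; the remaining samples cost sign changes and index arithmetic only.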

其次,利用輸出結果之對稱性’可只要考慮偶數部分 故IMDCT轉換可以用下列公式表示: 刀Secondly, the symmetry of the output result can be considered as long as the even part is considered, so the IMDCT conversion can be expressed by the following formula:

(4n + 1 + j) (2k + l)^ 可以改寫為公式 將IMDCT轉換公式中的餘弦函式展開 (13):(4n + 1 + j) (2k + l)^ can be rewritten as a formula to expand the cosine function in the IMDCT conversion formula (13):

考慮&的對稱性, 考慮A的對稱性,可以導出公式(M)及公式(15): (2n)= ^ (-DftYW2fc)(C(n.fc)-5(n.fc)) fc=〇 ’ (14) (15)201227351 Λ;/4-1Considering the symmetry of & considering the symmetry of A, we can derive the formula (M) and the formula (15): (2n) = ^ (-DftYW2fc)(C(n.fc)-5(n.fc)) fc =〇' (14) (15)201227351 Λ;/4-1

X (¥+2π)= ^ 〇k){-C(:nt k) - 5(η, fe)^ 一 *=〇 +X (备- - 1) (C(rU) - S(n, fc))j 其中 C(nf λ·) = cos ’(4n + l)(4fc + 1)tt、X (¥+2π)= ^ 〇k){-C(:nt k) - 5(η, fe)^ A*=〇+X (Prepared - - 1) (C(rU) - S(n, fc ))j where C(nf λ·) = cos '(4n + l)(4fc + 1)tt,

2N2N

S(nt k) = 5in r(4n + l)(4fc + 1S(nt k) = 5in r(4n + l)(4fc + 1

2/V -) 將公式(14)及公式(15)式合併,可得結果為公式(16) x(2n) + ix + 2n j Λν/4-l ^ (-Ι)%βχρ(-ί λ=0 2πNji •nk >exp\2/V -) Combine the formula (14) and the formula (15), and the result is the formula (16) x(2n) + ix + 2n j Λν/4-l ^ (-Ι)%βχρ(-ί λ=0 2πNji •nk >exp\

2/Γ / *一 s N (n ^ i)) (16) 其中 = f^(2Ar) -r iX — 2k — exp(—i~^k + —j2/Γ / *一 s N (n ^ i)) (16) where = f^(2Ar) -r iX — 2k — exp(—i~^k + —j

Finally, following the paper by G. Chih-Da Chien and J. Guo published in 2007, "A Memory-Based Hardware Accelerator for Real-Time MPEG-4 Audio Coding and Reverberation," rearranging the output of equation (16) removes the multiplication by the coefficient $W_8^{-1}$. The rearrangement rules are:

[The eight rearrangement equations are not legible in the source. They assign $\pm\mathrm{Re}$ and $\pm\mathrm{Im}$ of $\hat{x}(\cdot)$ to the even- and odd-indexed outputs $x(2n)$ and $x(2n+1)$ at successive block offsets.]

where

$$\hat{x}(n) = \sum_{k=0}^{N/4-1} (-1)^k\, \hat{X}_k \exp\left(-i\frac{2\pi}{N/4}nk\right)\exp\left(-i\frac{2\pi}{N}\left(n+\tfrac{1}{8}\right)\right)$$

Comparing the MDCT and IMDCT derivations above, the two transforms differ only in how the input and output data are rearranged. Everything else is identical, including the pre- and post-processing coefficients and the use of an N/4-point DFT as the kernel. The complete MDCT/IMDCT system can therefore be arranged as shown in Fig. 1.

Fig. 1 is a schematic diagram of the forward modified discrete cosine transform system 110 and the inverse modified discrete cosine transform system 120 of the present invention, both built around a discrete Fourier transform kernel. The forward system 110 includes a data sequence shift-arrangement unit 130, a data reordering unit 140, a first rotation operation unit 150, an N/4-point discrete Fourier transform unit 160, a second rotation operation unit 170, and a de-interleave operation unit 180.

The data sequence shift-arrangement unit 130 receives N input digital signals x(n) and performs a sequential shift arrangement on them to generate N first temporary signals, where N is a positive integer that is a multiple of 4.

The data reordering unit 140 is connected to the data sequence shift-arrangement unit 130 and performs a data reordering operation on the first temporary signals to generate N/4 second temporary signals.

The first rotation operation unit 150 is connected to the data reordering unit 140 and performs a first rotation operation on the N/4 second temporary signals to generate N/4 third temporary signals.

The N/4-point discrete Fourier transform unit 160 is connected to the first rotation operation unit 150 and performs a discrete Fourier transform on the N/4 third temporary signals to generate N/4 fourth temporary signals. The N/4-point discrete Fourier transform unit 160 has a recursive DFT architecture.

The second rotation operation unit 170 is connected to the N/4-point discrete Fourier transform unit 160 and performs a second rotation operation on the N/4 fourth temporary signals and the second addition signal to generate N/4 fifth temporary signals $\hat{X}(k)$.

The de-interleave operation unit 180 is connected to the second rotation operation unit 170 and performs a de-interleave operation on the N/4 fifth temporary signals $\hat{X}(k)$ to generate N output signals.

From the derivation above, the data sequence shift-arrangement unit 130 is expressed by the following formula:

[formula not legible in the source]

The data reordering unit 140 is expressed by the following formula:

[formula not legible in the source; it maps the first temporary signals to the N/4 second temporary signals]

The first rotation operation that the first rotation operation unit 150 performs on the N/4 second temporary signals is expressed by the following formula:

$$\exp\left(-i\frac{2\pi}{N}\left(t+\tfrac{1}{8}\right)\right)$$

where t is an index from 0 to N/4-1.

The second rotation operation that the second rotation operation unit 170 performs on the N/4 fourth temporary signals is expressed by the following formula:

$$\exp\left(-i\frac{2\pi}{N}\left(t'+\tfrac{1}{8}\right)\right)$$

where t' is an index from 0 to N/4-1.

The de-interleave operation that the de-interleave operation unit 180 performs on the N/4 fifth temporary signals is expressed by the following formulas:

$$X(2k) = \mathrm{Re}\bigl(\hat{X}(k)\bigr)$$

$$X(2k+1) = -\mathrm{Im}\bigl(\hat{X}(\tfrac{N}{4}-1-k)\bigr)$$

where $\hat{X}(k)$ are the N/4 fifth temporary signals and X are the N output signals.
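The five MDCT steps and the chain of units 130 to 180 can be sketched in software as follows. This is a minimal sketch under stated assumptions, not the patented circuit: it assumes the textbook MDCT definition X(k) = Σ x(n) cos((2π/N)(n + 1/2 + N/4)(k + 1/2)), uses a naive DFT in place of the recursive kernel, and uses a shift, fold and pack rule derived from that definition, because the formulas for units 130 and 140 are not reproduced in this text. All function and variable names are illustrative.

```python
import cmath
import math


def mdct_direct(x):
    """Reference MDCT computed straight from the assumed textbook definition."""
    N = len(x)
    return [
        sum(x[n] * math.cos(2 * math.pi / N * (n + 0.5 + N / 4) * (k + 0.5))
            for n in range(N))
        for k in range(N // 2)
    ]


def dft(z):
    """Naive DFT; stands in for the N/4-point kernel (unit 160)."""
    M = len(z)
    return [
        sum(z[n] * cmath.exp(-2j * math.pi * n * k / M) for n in range(M))
        for k in range(M)
    ]


def mdct_via_dft(x):
    """MDCT of N samples (N a multiple of 4) using one N/4-point complex DFT."""
    N = len(x)
    assert N % 4 == 0
    q = N // 4
    # Step 1a (assumed form of unit 130): shift by N/4, negating the wrapped part.
    y = [-x[n + 3 * q] for n in range(q)] + [x[n - q] for n in range(q, N)]
    # Step 1b (assumed form of unit 140): fold to N/2 values, pack into N/4 complex values.
    v = [y[n] - y[N - 1 - n] for n in range(N // 2)]
    z = [complex(v[2 * m], v[N // 2 - 1 - 2 * m]) for m in range(q)]
    # Step 2 (unit 150): pre-rotation by exp(-i(2*pi/N)(n + 1/8)).
    z = [z[m] * cmath.exp(-2j * math.pi / N * (m + 0.125)) for m in range(q)]
    # Step 3 (unit 160): N/4-point DFT.
    Z = dft(z)
    # Step 4 (unit 170): post-rotation by exp(-i(2*pi/N)(k + 1/8)).
    Z = [Z[k] * cmath.exp(-2j * math.pi / N * (k + 0.125)) for k in range(q)]
    # Step 5 (unit 180): de-interleave real and imaginary parts into the output.
    X = [0.0] * (N // 2)
    for k in range(q):
        X[2 * k] = Z[k].real
        X[N // 2 - 1 - 2 * k] = -Z[k].imag
    return X
```

Writing the odd-indexed outputs as X[N/2 - 1 - 2k] = -Im(Z[k]) is the same rule as X(2k+1) = -Im of the entry at index N/4 - 1 - k, only indexed from the other end.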

Fig. 2 is a schematic diagram of the N/4-point discrete Fourier transform unit 160 of the present invention. The unit includes a first multiplexer 205, a first adder 210, a first multiplier 215, a first shift register 220, a second multiplexer 225, a first delay 230, a second multiplier 235, a third multiplexer 240, a third multiplier 245, a fourth multiplexer 250, a second adder 255, and a second delay 260.

The first multiplexer 205 receives [...] and a second multiplication signal, and generates a first multiplexed signal.

The first adder 210 is connected to the first multiplexer 205 and adds the first multiplexed signal and a second delay signal to generate the fourth temporary signal.

The first multiplier 215 is connected to the first adder 210 and multiplies the fourth temporary signal by a cosine function signal to generate a first multiplication signal.

The first shift register 220 is connected to the first multiplier 215 and shifts the first multiplication signal to generate a first shift signal.

The second multiplexer 225 is connected to the first multiplier 215 and the first shift register 220, receives the first multiplication signal and the first shift signal, and outputs a second multiplexer signal.

The first delay 230 is connected to the first adder 210 and delays the fourth temporary signal to generate a first delay signal.

The second multiplier 235 is connected to the first delay 230 and multiplies the first delay signal by a sine function signal to generate the second multiplication signal.

The third multiplexer 240 is connected to the first delay 230 and the second multiplier 235, receives the first delay signal and the second multiplication signal, and outputs a third multiplexer signal.

The third multiplier 245 is connected to the third multiplexer 240 and multiplies the third multiplexer signal by -1 to generate a third multiplication signal.

The fourth multiplexer 250 is connected to the second multiplexer 225, receives the second multiplexer signal and the second delay signal, and outputs a fourth multiplexer signal.

The second adder 255 is connected to the third multiplier 245 and the fourth multiplexer 250 and adds the third multiplication signal and the fourth multiplexer signal to generate a second addition signal.

The second delay 260 is connected to the second adder 255 and delays the second addition signal to generate the second delay signal.

Refer again to Fig. 1, the schematic diagram of the forward modified discrete cosine transform system 110 and the inverse modified discrete cosine transform system 120 of the present invention. The inverse system 120 includes a data reordering unit 190, a first rotation operation unit 150, an N/4-point discrete Fourier transform unit 160, a second rotation operation unit 170, and a de-interleave operation unit 180.

The data reordering unit 190 receives N/2 input digital signals X(k) and performs a data reordering operation on them to generate N/4 sixth temporary signals $X_k$, where N is a positive integer that is a multiple of 4.

The first rotation operation unit 150 is connected to the data reordering unit 190 and performs a first rotation operation on the N/4 sixth temporary signals $X_k$ to generate N/4 seventh temporary signals.

The N/4-point discrete Fourier transform unit 160 is connected to the first rotation operation unit 150 and performs a discrete Fourier transform on the N/4 seventh temporary signals to generate N/4 eighth temporary signals.

The second rotation operation unit 170 is connected to the N/4-point discrete Fourier transform unit 160 and performs a second rotation operation on the N/4 eighth temporary signals to generate N/4 ninth temporary signals $\hat{x}(n)$.

The de-interleave operation unit 180 is connected to the second rotation operation unit 170 and performs a de-interleave operation on the N/4 ninth temporary signals $\hat{x}(n)$ to generate N output signals x(n).

The data reordering unit 190 is expressed by the following formula:

$$X_k = X(2k) + i\,X\!\left(\tfrac{N}{2}-2k-1\right)$$

where $X_k$ are the N/4 sixth temporary signals and X are the N/2 input digital signals.

The first rotation operation that the first rotation operation unit 150 performs on the N/4 sixth temporary signals $X_k$ is expressed by the following formula:

$$\exp\left(-i\frac{2\pi}{N}\left(t+\tfrac{1}{8}\right)\right)$$

where t is an index from 0 to N/4-1.

The second rotation operation that the second rotation operation unit 170 performs on the N/4 eighth temporary signals is expressed by the following formula:

$$\exp\left(-i\frac{2\pi}{N}\left(t'+\tfrac{1}{8}\right)\right)$$

where t' is an index from 0 to N/4-1.

The de-interleave operation that the de-interleave operation unit 180 performs on the N/4 ninth temporary signals is expressed by the following formulas:

[The eight de-interleave equations, identical to the rearrangement rules following equation (16), are not legible in the source.]

where $\hat{x}(n)$ are the N/4 ninth temporary signals and x(n) are the N output signals.

Fig. 3 is a schematic diagram of a conventional improved RDFT architecture. As Fig. 3 shows, this architecture requires 4 real multipliers and 5 complex adders. On closer inspection, however, the result of the multiplication by the $j\sin(\theta_k)$ coefficient is valid only in the last cycle. Dedicating two multipliers to this operation in hardware gives very poor multiplier utilization and wastes both chip area and power.
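The observation that the sine multiplication matters only in the last cycle can be illustrated with a Goertzel-style second-order recursion. This is a hedged sketch: the Goertzel form is assumed here as a software stand-in for the recursive DFT kernel, and the function name and state variables are illustrative rather than taken from the figures.

```python
import cmath
import math


def recursive_dft_bin(x, k):
    """Compute one DFT bin X(k) of a complex sequence x by recursion."""
    N = len(x)
    theta = 2 * math.pi * k / N
    # In hardware this would be a cosine multiply followed by a 1-bit left shift.
    two_cos = 2 * math.cos(theta)
    s1 = 0j  # state delayed by one cycle
    s2 = 0j  # state delayed by two cycles
    for sample in x:
        # Every cycle uses only the cosine coefficient.
        s0 = sample + two_cos * s1 - s2
        s2, s1 = s1, s0
    # Only after the last input sample is the sine coefficient needed:
    # X(k) = exp(i*theta) * s[N-1] - s[N-2].
    return (math.cos(theta) * s1 - s2) + 1j * math.sin(theta) * s1
```

Because the sine term appears once per bin, a single multiplier can serve both coefficients at the price of one extra cycle, which is the trade the text goes on to describe.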

As Table 1 shows, the transistor counts of a register and a multiplexer are negligible compared with that of a multiplier. Registers and multiplexers are therefore added so that the $j\sin(\theta_k)$ and $\cos(\theta_k)$ coefficient multiplications share one multiplier, at the cost of one additional cycle to perform the $j\sin(\theta_k)$ multiplication. Fig. 4 is a block diagram of the RDFT architecture with a shared multiplier, improved from Fig. 3, and Fig. 5 shows the design of the shared multiplier of Fig. 4. With this change, multiplier utilization reaches 100%, and both area and power improve considerably. The code for coeff_sel in Fig. 5 is:

    `define cos_value 1'b0
    `define sin_value 1'b1
    // Select the sine coefficient only in the extra cycle after the N+1 loop count;
    // every other cycle uses the cosine coefficient.
    if (loop_cycle == N+1)
        coeff_sel = `sin_value;
    else
        coeff_sel = `cos_value;

Table 1: Transistor counts of 24-bit components

    Component          Latch   Adder   Multiplier   Multiplexer
    Transistor count   240     672     18624        192

Looking further at Fig. 4, the three complex adders enclosed by the dashed line have the same utilization problem as the multipliers discussed above: their results are used only in the last cycle. The improved RDFT architecture is therefore modified once more, and this modification needs only four multiplexers. Fig. 2 shows the result.

A hardware comparison of Fig. 2 with Fig. 3 shows that the requirement falls from 5 complex adders and 4 real multipliers to 2 complex adders and 2 real multipliers, an improvement of approximately 47.2%.

The advantage of the Common Factor method is that N can be decomposed into arbitrary factors, and the closer the two factors are to each other, the higher the pipelining efficiency. Its drawback is the twiddle factor, which increases the number of multiplications and reduces accuracy. The advantage of the Prime Factor method is that it has no twiddle-factor problem. Its drawback is that the factors of N must be coprime; such a decomposition may lower the pipelining efficiency, and it does not apply to lengths that are powers of a single number.

In the paper by S.-C. Lai et al., "Low-Computation Cycle, Power-Efficient and Reconfigurable Design of Recursive DFT for Portable Digital Radio Mondiale Receiver," IEEE Transactions on Circuits and Systems II: Express Briefs, pp. 647-651, 2010, the length N = 256 is computed directly in one-dimensional form, and its computation cycles, complexity and SNR are all unsatisfactory. The hardware is therefore planned as a hybrid to raise overall efficiency, and the twiddle-factor problem is solved without adding hardware. Table 2 lists the decomposition into c and m used by the hybrid method for the lengths required by DRM.

Finally, based on the result of Fig. 2 and the pipelining concept, a hardware architecture with a two-stage pipeline can be planned, in which the first stage is responsible for the c-point DFT and the second stage is responsible for the m-point DFT.
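The split of an N = c × m point DFT into a c-point stage and an m-point stage can be sketched as below. This is an illustrative sketch that assumes the Common Factor (Cooley-Tukey) index mapping n = m·n1 + n2 and k = k1 + c·k2, with twiddle factors applied between the stages. The mapping, the function names and the use of a naive DFT for both stages are assumptions made for illustration, not the circuit described here.

```python
import cmath
import math


def dft(z):
    """Naive DFT used for both stages in this sketch."""
    M = len(z)
    return [sum(z[n] * cmath.exp(-2j * math.pi * n * k / M) for n in range(M))
            for k in range(M)]


def two_stage_dft(x, c, m):
    """N-point DFT (N = c*m) as c-point DFTs, twiddles, then m-point DFTs."""
    N = c * m
    assert len(x) == N
    # Stage 1: for each n2, a c-point DFT over the samples x[m*n1 + n2].
    stage1 = [dft([x[m * n1 + n2] for n1 in range(c)]) for n2 in range(m)]
    X = [0j] * N
    for k1 in range(c):
        # Twiddle factors W_N^(n2*k1) applied to the data passed between stages.
        col = [stage1[n2][k1] * cmath.exp(-2j * math.pi * n2 * k1 / N)
               for n2 in range(m)]
        # Stage 2: an m-point DFT over n2 for this k1.
        out = dft(col)
        for k2 in range(m):
            X[k1 + c * k2] = out[k2]
    return X
```

With coprime c and m, a Prime Factor index mapping would remove the twiddle multiplication between the stages; that variant is not shown.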

Because the RDFT architecture has a double throughput, two sets of DFT hardware are arranged in the second stage to process the results of the preceding stage. Table 2 shows that every c is even, and when k = 0 or k = c/2 the RDFT architecture produces only a single output. For the second stage this would leave one set of hardware idle and waste resources. To address this, the simple accumulator circuit shown in Fig. 6 is added to the first stage so that k = 0 and k = c/2 can be computed simultaneously, producing two results for the next stage. Finally, to reduce the number of chip I/O pins, multiplexers are used to output the RDFT hardware results sequentially. The hardware architecture is shown in Fig. 7.

To supply coefficients to the computation circuit efficiently, the common practice is to use external input. However, the higher the required coefficient precision, the more I/O pins are needed to widen the coefficient input, which enlarges the pads and hence the overall area. A second approach is an on-chip read-only memory (ROM), with the coefficients obtained by a look-up table (LUT). As Table 3 shows, the area grows with the number of points. The advantage of a recursive architecture is its small area, and using a LUT would inevitably increase area and power, contradicting that property. A third approach is self-generation by the circuit: a simple circuit is given initial values and computes the other required coefficients from them. This approach adapts more flexibly when transform lengths are added later and needs less hardware on chip. From the standpoint of low area, energy saving and multi-standard support, the present invention adopts circuit self-generation.

Table 3: Comparison of single-port and dual-port 24-bit memories

    Memory size   Area (mm²)                 Delay (US)
                  Dual-port   Single-port    Dual-port   Single-port
    256           0.186       0.082          1.33        1.24
    512           0.294       0.129          1.42        1.25
    1K            0.467       0.223          1.46        1.26
    2K            0.816       0.354          1.49        1.28
    8K            2.718       1.180          1.89        1.65

The coefficient self-generation method is developed from the trigonometric angle-sum identities:

$$\cos(\alpha+\beta) = \cos(\alpha)\cos(\beta) - \sin(\alpha)\sin(\beta) \qquad (17)$$

$$\sin(\alpha+\beta) = \sin(\alpha)\cos(\beta) + \cos(\alpha)\sin(\beta) \qquad (18)$$

The coefficient variation in the RDFT circuit is determined by the variable k. Let $\theta = 2\pi/N$; then $\cos(k\theta)$ and $\sin(k\theta)$ can be rewritten as:

$$\cos(k\theta) = \cos((k-1)\theta + \theta) \qquad (19)$$

$$\sin(k\theta) = \sin((k-1)\theta + \theta) \qquad (20)$$

Expanding equations (19) and (20) by equations (17) and (18) gives the recurrence relations:

$$\cos(k\theta) = \cos((k-1)\theta)\cos(\theta) - \sin((k-1)\theta)\sin(\theta) \qquad (21)$$

$$\sin(k\theta) = \sin((k-1)\theta)\cos(\theta) + \cos((k-1)\theta)\sin(\theta) \qquad (22)$$

If the initial values $\cos(\theta)$ and $\sin(\theta)$ are known, the coefficient values for k = 1, 2, ..., N-1 can all be generated by equations (21) and (22).

Because the proposed two-dimensional RDFT architecture is a hybrid of the Common Factor method and the Prime Factor method, twiddle-factor coefficients must also be generated.
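The recurrences of equations (21) and (22) can be sketched in a few lines. This is a minimal floating-point illustration: the function name and the returned table are illustrative assumptions, and the fixed-point rounding behaviour of a real circuit is not modelled.

```python
import math


def generate_coefficients(N):
    """Return [(cos(k*theta), sin(k*theta)) for k = 0..N-1], theta = 2*pi/N.

    Only cos(theta) and sin(theta) are supplied as initial values; every other
    coefficient follows from the angle-sum recurrences (21) and (22).
    """
    theta = 2 * math.pi / N
    c1, s1 = math.cos(theta), math.sin(theta)  # the stored initial values
    cos_k, sin_k = 1.0, 0.0                    # k = 0
    table = [(cos_k, sin_k)]
    for _ in range(1, N):
        # The right-hand side is evaluated with the previous (k-1) values.
        cos_k, sin_k = (cos_k * c1 - sin_k * s1,   # equation (21)
                        sin_k * c1 + cos_k * s1)   # equation (22)
        table.append((cos_k, sin_k))
    return table
```

Each step costs four multiplications, two per coefficient, which is the multiplier load the text addresses next.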

The twiddle factors can likewise be generated with equations (17) and (18), as follows.

Because the twiddle-factor coefficients are controlled by the variables $k_2$ and $n_1$, two different coefficient values are needed in the same time period. The order in which they are required is shown in Fig. 8.

As Fig. 8 shows, for the current value of $k_2$ the coefficients are generated by incrementing $n_1$ in turn until the factors for the last $n_1$ have been produced. Then $k_2$ is incremented or decremented by 1, $n_1$ is reset to 0, and the process repeats until the end. These actions require three sets of initial values: two sets are used to compute the factors determined by $n_1$ for the corresponding $k_2'$ and $k_2''$, and one set is used to compute the factors determined by $k_2$. The factors are generated as follows. Let $\theta = 2\pi/N$, $\theta' = k_2'\theta$ and $\theta'' = k_2''\theta$; then the twiddle factors can be expressed as:

$$W_N^{n_1 k_2'} = \cos(n_1 k_2'\theta) - j\sin(n_1 k_2'\theta) = \cos(n_1\theta') - j\sin(n_1\theta') \qquad (23)$$

$$W_N^{n_1 k_2''} = \cos(n_1 k_2''\theta) - j\sin(n_1 k_2''\theta) = \cos(n_1\theta'') - j\sin(n_1\theta'') \qquad (24)$$

Expanding equations (23) and (24) by equations (17) and (18) gives:

$$\cos(n_1\theta') = \cos((n_1-1)\theta')\cos(\theta') - \sin((n_1-1)\theta')\sin(\theta') \qquad (25)$$

$$\sin(n_1\theta') = \sin((n_1-1)\theta')\cos(\theta') + \cos((n_1-1)\theta')\sin(\theta') \qquad (26)$$

$$\cos(n_1\theta'') = \cos((n_1-1)\theta'')\cos(\theta'') - \sin((n_1-1)\theta'')\sin(\theta'') \qquad (27)$$

$$\sin(n_1\theta'') = \sin((n_1-1)\theta'')\cos(\theta'') + \cos((n_1-1)\theta'')\sin(\theta'') \qquad (28)$$

The values $\cos(\theta')$, $\sin(\theta')$, $\cos(\theta'')$ and $\sin(\theta'')$ can likewise be generated by expanding equations (17) and (18):

$$\cos(k_2'\theta) = \cos((k_2'-1)\theta)\cos(\theta) - \sin((k_2'-1)\theta)\sin(\theta) \qquad (29)$$

$$\sin(k_2'\theta) = \sin((k_2'-1)\theta)\cos(\theta) + \cos((k_2'-1)\theta)\sin(\theta) \qquad (30)$$

$$\cos(k_2''\theta) = \cos((k_2''+1)\theta)\cos(-\theta) - \sin((k_2''+1)\theta)\sin(-\theta) = \cos((k_2''+1)\theta)\cos(\theta) + \sin((k_2''+1)\theta)\sin(\theta) \qquad (31)$$

$$\sin(k_2''\theta) = \sin((k_2''+1)\theta)\cos(-\theta) + \cos((k_2''+1)\theta)\sin(-\theta) = \sin((k_2''+1)\theta)\cos(\theta) - \cos((k_2''+1)\theta)\sin(\theta) \qquad (32)$$

If $\cos(\theta)$, $\sin(\theta)$, $\cos(\theta')$, $\sin(\theta')$, $\cos(\theta'')$ and $\sin(\theta'')$ are given initial values, all twiddle factors can be derived from equations (25) to (32).

The above explains how all coefficient values, including the twiddle factors required by the Common Factor method, can be computed from given initial values. It does not yet explain how to compute them with the existing hardware architecture.

Equations (21) and (22) show that each coefficient requires two multiplications. Supporting this computation directly with dedicated multipliers would take as many as four of them, a heavy burden that works against energy-saving, low-area design and the concept of green design.

To improve multiplier efficiency, the multiplier-sharing method is used. A single RDFT circuit has two real multipliers, so only two additional cycles are needed. On this basis Fig. 5 is modified as shown in Fig. 9. Fig. 9 is a schematic diagram of the shared multiplier for the cos coefficient of the present invention. Fig. 9 shows only the design for the cos coefficient; for the sin coefficient it suffices to swap the I0 and I1 inputs of the 4-to-1 multiplexer. The code for coeff_sel and coeff_sel2 in Fig. 9 is, respectively:

    `define cos_value 1'b0
    `define sin_value 1'b1
    if (loop_cycle == N+1)
        coeff_sel = `sin_value;
    else
        coeff_sel = `cos_value;

and

    `define nodeA_value 2'b00
    `define nodeB_value 2'b01
    `define cos_value   2'b10
    `define sin_value   2'b11
    // The last three loop counts step through nodeB, sin and cos;
    // all earlier cycles select nodeA.
    if (loop_cycle == N+3)
        coeff_sel2 = `cos_value;
    else if (loop_cycle == N+2)
        coeff_sel2 = `sin_value;
    else if (loop_cycle == N+1)
        coeff_sel2 = `nodeB_value;
    else
        coeff_sel2 = `nodeA_value;

The twiddle factors can be computed with the same shared-multiplier design, but two different sets of coefficient values are needed in the same period, and equations (29) to (32) require eight multiplications. In the two-dimensional architecture, handling them in the first stage alone would cost four cycles. This would widen the cycle gap between the first and second stages and sharply reduce pipeline efficiency. Because the second stage of this architecture contains two RDFT circuits and therefore four real multipliers, the twiddle-factor processing can be reduced to one cycle, so handling the twiddle factors in the second stage is the best choice. The shared-multiplier design can follow Fig. 9.

The coefficient-generation problem is thus fully solved by the proposed scheme. The operation of the hardware implementation of the proposed architecture is described next on this basis. The hardware is implemented subject to the following conditions:

1. N = c × m, with c ≥ m.
2. c is always even.
3. When c and m are coprime, the Prime Factor method is used; otherwise the Common Factor algorithm is used.
4. When the Common Factor algorithm is used, N ≥ 121 must be satisfied. How this condition is obtained is explained later.
5. In the pipeline, the first stage performs the c-point DFT and the second stage performs the m-point transform.
6. The twiddle-factor problem of the Common Factor method is always handled in the second pipeline stage.

The hardware operates as follows.

After reset, the first-stage hardware begins by computing the DFTs for $k_1 = 0$ and $k_1 = c/2$ simultaneously, handled by the RDFT circuit and the accumulator circuit of Fig. 6 respectively. When this is done the accumulator circuit is disabled, leaving only the RDFT circuit active. It continues with the transforms for $k_1 = 1, 2, \ldots, c/2-1$ and at the same time produces the DFT coefficients for $k_2 = c-1, c-2, \ldots, c/2+1$.

fc2 = C'1,C'2…C/2 + 1之DFT係數,對於每次h值的遞增, 電路會閒置二個週期來產生下一筆係數值’所以對^一 級而言單一 Mi轉換需((C+1)Xm) + 2個週期,因有μ次轉 換故總需〇VXc + A/ + 2c)/2個週期。 第二級部分包含了兩組RDF丁電路,一組負責前—級 h = CU…c/2一1轉換結果, fc2 = C/2 c-l,c- 2…c/2 + l轉換結果 所以此級硬體動作將會有不同動作 另一組負責 ’因採取混合型架構, 方式: 1·採取Prime Factor方法 此方法的硬體動作方式與第—級rdf丁電路動作方 式大致相同,因此可依據前一級週期評估方式得知單 4值轉換需(W + 1+2)個週期,完成所有Mi轉換總需求 週期為(州+2)_/2卜2,因最後—筆值轉換後不需再算 下一筆係數,所以扣除2個週期。 ^ 2·採取 Common Factor方法 因級與級之間資料轉移需乘上旋轉因子,由於第— 級完成單-卜值轉換需((e+1)x,扣個週期,相較於第二 35 201227351 級完成運算所需週期(Μι + 1+2)χ(ηι/2)_2約彡兩倍週期,而 因子的運算需加個週期,其中2個週期是被使用處理因子 的f生’另2個週期則是被使用處理轉移的資料乘因子的 運异’其可藉由這段多餘的週期來處理旋轉因子,此時 «值符合公式(33)之關係式: ((C+l)Xn〇 + 2>(ni + 1 + 2)x(m/2K2 + 4m。 ⑴) 假設管線處在於最佳效率下(此條件下對於不等式 為最差情況),即c=m。 將公式(33)式整理可得c.m值為: (c,所)=(8,8) OT· (1,1) 0 在此條件下硬體動作被規劃為先花4m個週期處理 旋轉因子問題’再接續運算„,點贿轉換,其中rdft硬 體動作大致也相似於第一級,因沒有下一級的考量,所 以沒有使用簡易累加電路來負責卜m/2之轉換故總需 求週期為4,一…2)X((…)/2)_2, 二級所需週期 增加了 〇n + 3)個週期數,故公式(33)中不等式需修改為: ((c+ 1) xm) + 2 > (m + 1 + 2) X (m/2)+ i + 5n> 。 (34) 同樣地’將公式(34)整理可得cw值為: (c,m) ^ (11,11) or (0,0) 由〇_和的結果,可知採Comm〇n Fact〇r方法時,規格 點數需符合N 2 U1之條件。 將上述之硬體動作說明’可整理出分別採取卜化6 Factor方法及Common Factor方法完成轉換所需的計算週 36 201227351 期 為 因管線化的關係部分時間將會重疊,故其所得式子 (WXc + W+2c)/2+(m+3)X 卜 1/21-2。 (35) (/V X c + A/ + 2c)/2 + (τη + 3) X (τη/2) + 5m + 1。 ( 3 ό )Fc2 = C'1, C'2...C/2 + 1 DFT coefficient, for each increment of h value, the circuit will idle two cycles to generate the next coefficient value' so a single Mi conversion for ^ level ((C+1)Xm) + 2 cycles, since there are μ conversions, 总VXc + A/ + 2c)/2 cycles are always required. The second stage contains two sets of RDF Ding circuits, one set is responsible for the front-level h = CU...c/2-1 conversion result, fc2 = C/2 cl, c- 2...c/2 + l conversion result, so this The level hardware action will have different actions and the other group will be responsible for 'by adopting the hybrid architecture. The way: 1. Take the Prime Factor method. 
The hardware action mode of this method is roughly the same as that of the first-level rdf circuit, so it can be based on The previous stage evaluation method knows that the single 4-value conversion requires (W + 1+2) cycles, and the total demand cycle for completing all Mi conversions is (state + 2) _/2 b 2, because the last-pen value conversion is not required. Then calculate the next coefficient, so deduct 2 cycles. ^ 2· Take the Common Factor method because the data transfer between the level and the level needs to be multiplied by the rotation factor, because the first-level completion of the single-bu value conversion needs ((e+1)x, deduct a cycle, compared to the second 35 The 201227351 level is required to complete the operation (Μι + 1+2) χ(ηι/2)_2 about twice the period, and the factor operation needs to add a period, of which 2 periods are the f-process used by the processing factor. The two cycles are the difference of the data multiplication factor used to process the transfer. It can handle the twiddle factor by this extra cycle. At this time, the value corresponds to the relation of formula (33): ((C+l) Xn〇+ 2>(ni + 1 + 2)x(m/2K2 + 4m. (1)) Assume that the pipeline is at the optimum efficiency (the worst case for inequality under this condition), ie c=m. 33) The formula can be obtained as: (c, s) = (8, 8) OT · (1,1) 0 Under this condition, the hardware action is planned to take 4m cycles to deal with the twiddle factor problem. 
Continued operation „, the bribe conversion, where the rdft hardware action is roughly similar to the first level, because there is no next level of consideration, so no simple accumulator circuit is used to be responsible for the m/2 The total demand cycle is 4, one...2)X((...)/2)_2, and the required period of the second level is increased by 〇n + 3) cycles, so the inequality in equation (33) needs to be modified to: (c+ 1) xm) + 2 > (m + 1 + 2) X (m/2) + i + 5n> (34) Similarly, by formulating (34), the cw value is: (c, m) ^ (11,11) or (0,0) From the results of 〇_ and , it can be seen that when the Comm〇n Fact〇r method is adopted, the specification points must meet the conditions of N 2 U1. 'It is possible to sort out the calculation weeks required to complete the conversion by using the Di 6 Factor method and the Common Factor method respectively. The 201227351 period is due to the pipelined relationship, and the time will overlap, so the resulting formula (WXc + W+2c)/ 2+(m+3)X 卜1-21-2. (35) (/VX c + A/ + 2c)/2 + (τη + 3) X (τη/2) + 5m + 1. ( 3 ό )
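The N ≥ 121 threshold can be checked by brute force. The sketch below assumes the balanced case c = m used in the text and searches for the smallest m > 1 that satisfies inequality (34); the function names are illustrative and do not come from the patent.

```python
def first_stage_cycles(c: int, m: int) -> int:
    """Cycles the first stage needs for one k2 conversion: (c + 1) * m + 2."""
    return (c + 1) * m + 2


def second_stage_cycles_common_factor(m: int) -> float:
    """Right-hand side of inequality (34): (m + 3) * (m / 2) + 1 + 5m."""
    return (m + 3) * (m / 2) + 1 + 5 * m


def smallest_balanced_size(limit: int = 64) -> int:
    """Smallest m > 1 (with c = m, an assumption) for which (34) holds."""
    for m in range(2, limit):
        if first_stage_cycles(m, m) > second_stage_cycles_common_factor(m):
            return m
    raise ValueError("no solution below limit")


m_min = smallest_balanced_size()
print(m_min, m_min * m_min)  # smallest c = m and the matching N = c * m
```

With c = m the inequality reduces to m² − 11m + 2 > 0, so the search is only a numeric confirmation of the algebra.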

For a DFT with a frame size of N points, the method of G. Goertzel, "An algorithm for the evaluation of finite trigonometric series," American Mathematical Monthly, pp. 34-35, 1958, requires N × (N + 1) computation cycles. The method of Van et al., "VLSI Architecture for the Low-Computation Cycle and Power-Efficient Recursive DFT/IDFT Design," IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Series A, vol. 90, p. 1644, 2007, requires N²/2 cycles once the input data has been pre-processed. For the architecture proposed in the present invention, the cycle count is given by formulas (37) and (38), where N = c × m. Formula (37) applies when c and m are coprime; otherwise formula (38) applies.

$$(N \times c + N + 2c)/2 + (m+3) \times \lceil m/2 \rceil - 2 \qquad (37)$$

$$(N \times c + N + 2c)/2 + (m+3) \times (m/2) + 5m + 1 \qquad (38)$$

In addition to the two papers above, the comparison includes the paper Van et al. published in 2004 and the related papers recently published by Lei et al. The comparison uses the frame sizes required by DRM (Digital Radio Mondiale) applications, because they include both power-of-two and non-power-of-two sizes. The results are shown in Table 4, where:

[1] G. Goertzel, "An algorithm for the evaluation of finite trigonometric series," American Mathematical Monthly, pp. 34-35, 1958.
[2] L. Van and C. Yang, "High-speed area-efficient recursive DFT/IDFT architectures," 2004, pp. 357-360.
[3] Van et al., "VLSI Architecture for the Low-Computation Cycle and Power-Efficient Recursive DFT/IDFT Design," IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Series A, vol. 90, p. 1644, 2007.
[4] L. Shin-Chi et al., "Low Computational Complexity, Low Power, and Low Area Design for the Implementation of Recursive DFT and IDFT Algorithms," Circuits and Systems II: Express Briefs, IEEE Transactions on, vol. 56, pp. 921-925, 2009.
[5] S.-C. Lai et al., "Low-Computation Cycle, Power-Efficient, and Reconfigurable Design of Recursive DFT for Portable Digital Radio Mondiale Receiver," Circuits and Systems II: Express Briefs, IEEE Transactions on, pp. 1-5, 2010.

Table 4: Cycle comparison for the DRM frame sizes

| Method | Cycle formula | N = 288 | N = 256 | N = 176 | N = 112 |
|---|---|---|---|---|---|
| [1] | N(N+1) | 83,232 | 65,729 | 31,152 | 12,656 |
| [2] | N² | 82,944 | 65,536 | 30,976 | 12,544 |
| [3] | N²/2 | 41,472 | 32,768 | 15,488 | 6,272 |
| [4] | (N/2−1)(N+1) | 41,327 | 32,639 | 15,399 | 6,215 |
| [5] | (A), (B) | 9,594 | 32,896 | 3,124 | 1,960 |
| Present invention | (C), (D) | 4,842 | 2,425 | 1,594 | 1,006 |

Table 5: Cycle improvement ratio relative to the other works

| Method | N = 288 | N = 256 | N = 176 | N = 112 |
|---|---|---|---|---|
| [1] | 17.19 | 27.10 | 19.54 | 12.58 |
| [2] | 17.13 | 27.03 | 19.43 | 12.47 |
| [3] | 8.57 | 13.51 | 9.72 | 6.23 |
| [4] | 8.54 | 13.46 | 9.66 | 6.18 |
| [5] | 1.98 | 13.57 | 1.96 | 1.95 |
| Present invention | 1 | 1 | 1 | 1 |
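The cycle counts listed for the present invention in Table 4 can be reproduced from formulas (37) and (38). The following is a minimal sketch, not the patent's own implementation: the (c, m) factor pairs are assumptions chosen so that N = c × m (the text does not list them), and the function name is hypothetical.

```python
from math import ceil, gcd

# Assumed (c, m) factor pairs with N = c * m; not stated in the patent text.
FACTORS = {288: (32, 9), 256: (16, 16), 176: (16, 11), 112: (16, 7)}

# Cycle counts of reference [5] as listed in Table 4.
REF5_CYCLES = {288: 9594, 256: 32896, 176: 3124, 112: 1960}


def rdft_cycles(c: int, m: int) -> int:
    """Cycle count of the two-stage pipeline for an N = c * m point DFT."""
    n = c * m
    first_stage = (n * c + n + 2 * c) // 2
    if gcd(c, m) == 1:
        # c and m coprime: Prime Factor case, formula (37)
        second_stage = (m + 3) * ceil(m / 2) - 2
    else:
        # otherwise: Common Factor case, formula (38)
        second_stage = (m + 3) * m // 2 + 5 * m + 1
    return first_stage + second_stage


for n, (c, m) in FACTORS.items():
    cycles = rdft_cycles(c, m)
    # frame size, cycles of the proposed design, improvement ratio over [5]
    print(n, cycles, round(REF5_CYCLES[n] / cycles, 2))
```

Dividing any other row of Table 4 by these counts gives the corresponding row of Table 5 in the same way.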

The cycle improvement ratios in Table 5 are computed from the results in Table 4. Table 5 shows that the architecture proposed in the present invention improves on every other work by a factor of at least 1.95.

Based on the description above, assume that the input frame length N is a multiple of 8. The method of C. Hwang-Cheng and L. Jie-Cherng, "Regressive implementations for the forward and inverse MDCT in MPEG audio coding," Signal Processing Letters, IEEE, vol. 3, pp. 116-118, 1996, requires N²/2 computation cycles. The method of C. Che-Hong et al., "Recursive architectures for realizing modified discrete cosine transform and its inverse," Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on, vol. 50, pp. 38-45, 2003, requires N²/16 cycles, not counting pre- and post-processing. The method of S. F. Lei et al., "Low Complexity and Fast Computation for Recursive MDCT and IMDCT Algorithms," Circuits and Systems II: Express Briefs, IEEE Transactions on, vol. PP, pp. 1-5, 2010, requires N²/32 cycles; however, to reduce hardware cost it also shares multipliers, which adds N/4 cycles, so the total becomes (N/8 + 1)(N/4) cycles.

In the architecture of the present invention, pre- and post-processing turn the N-point IMDCT into a transform whose core is an N/4-point DFT. The cycle count of this core transform follows from formula (37):

$$(N' \times c' + N' + 2c')/2 + (m'+3) \times \lceil m'/2 \rceil - 2 \qquad (39)$$

where N = 4N' and N' = c' × m'.
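To make the comparison concrete, the sketch below evaluates formula (39) next to the cycle formulas quoted for the earlier recursive methods. The coprime factorizations N/4 = c' × m' used here are assumptions for illustration (the patent refers to a table that is not reproduced in this text); with these choices the kernel counts come out as 8,094 and 424 cycles, the values listed for the present invention in Table 6.

```python
from math import ceil

# Assumed coprime factorizations N/4 = c' * m' (hypothetical, for illustration).
KERNEL_FACTORS = {1920: (32, 15), 240: (12, 5)}


def proposed_kernel_cycles(n: int) -> int:
    """Formula (39): cycles of the N/4-point RDFT kernel of an N-point IMDCT."""
    c, m = KERNEL_FACTORS[n]
    n4 = n // 4
    if n4 != c * m:
        raise ValueError("N/4 must equal c' * m'")
    return (n4 * c + n4 + 2 * c) // 2 + (m + 3) * ceil(m / 2) - 2


def reference_cycles(n: int) -> dict:
    """Cycle formulas quoted in the text for the earlier recursive methods."""
    return {
        "N^2/2": n * n // 2,
        "N^2/16": n * n // 16,
        "(N/8+1)(N/4)": (n // 8 + 1) * (n // 4),
    }


for n in KERNEL_FACTORS:
    ours = proposed_kernel_cycles(n)
    for name, cycles in reference_cycles(n).items():
        # frame size, method formula, its cycles, ratio to the proposed kernel
        print(n, name, cycles, round(cycles / ours, 2))
```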

Likewise, the references compared here are:

[6] C. Hwang-Cheng and L. Jie-Cherng, "Regressive implementations for the forward and inverse MDCT in MPEG audio coding," Signal Processing Letters, IEEE, vol. 3, pp. 116-118, 1996.
[7] C. Che-Hong et al., "Recursive architectures for realizing modified discrete cosine transform and its inverse," Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on, vol. 50, pp. 38-45, 2003.
[8] S. Lai et al., "Common architecture design of novel recursive MDCT and IMDCT algorithms for application to AAC, AAC in DRM, and MP3 codecs," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 56, pp. 793-797, 2009.
[9] S. F. Lei et al., "Low Complexity and Fast Computation for Recursive MDCT and IMDCT Algorithms," Circuits and Systems II: Express Briefs, IEEE Transactions on, vol. PP, pp. 571-575, 2010.

The comparison again targets DRM applications, which include AAC compression with 1920 points and with 240 points. The values of c' and m' in formula (39) are given in Table 5.2.1, and the comparison results are shown in Table 6.

Table 6: Recursive-kernel cycle comparison and improvement ratio

| Method | Cycle formula | N = 1920 | Ratio | N = 240 | Ratio |
|---|---|---|---|---|---|
| [6] | N²/2 | 1,843,200 | 227.72 | 28,800 | 67.92 |
| [7] | N²/16 | 230,400 | 28.47 | 3,600 | 8.49 |
| [8] | (N/2+1)(N/4) | 461,280 | 56.99 | 7,260 | 17.12 |
| [9] | (N/8+1)(N/4) | 115,680 | 14.29 | 1,860 | 4.39 |
| Present invention | (A) | 8,094 | 1 | 424 | 1 |

The comparison in Table 6 covers the kernel only. The pre- and post-processing parts are left out so that exact cycle counts can be obtained. The architecture proposed in the present invention needs fewer computation cycles, with an improvement ratio of at least 4.39.

Coefficients are computed in advance and stored in memory elements for use during the computation. Memory elements, however, affect the chip area, so the number of coefficients required can be regarded as a hardware performance indicator. The coefficient requirement can be evaluated from the transfer function of each method; depending on the transfer function, both cosine and sine coefficients may be needed, or only one kind. The results are shown in Table 7.

Table 7: Coefficient requirements for the DRM frame sizes

| Method | Formula | N = 288 | N = 256 | N = 176 | N = 112 | Total words |
|---|---|---|---|---|---|---|
| [1] | 2N | 576 | 512 | 352 | 224 | 1,664 |
| [2] | 2N | 576 | 512 | 352 | 224 | 1,664 |
| [3] | 2N | 576 | 512 | 352 | 224 | 1,664 |
| [4] | N−2 | 286 | 254 | 174 | 110 | 824 |
| [5] | 2c+2m | 82 | 512 | 54 | 46 | 694 |
| Present invention | no memory required | | | | | 0 |

For the IMDCT, the coefficients can be evaluated in the same way as for the DFT, from the transfer functions, or directly from the architecture block diagrams. The latter is more intuitive and makes it easy to see which coefficients can be shared, which reduces evaluation errors. The results are shown in Table 8.

Table 8: Coefficient requirements of the various IMDCT methods

| Method | Pre-/post-processing formula | N = 1920 | N = 240 | Recursive kernel formula | N = 1920 | N = 240 | Total words | Ratio |
|---|---|---|---|---|---|---|---|---|
| [6] | 0 | 0 | 0 | 2N | 3,840 | 480 | 4,080 | 3.78 |
| [7] | N | 1,920 | 240 | 3N/4 | 1,440 | 180 | 3,780 | 3.5 |
| [8] | 0 | 0 | 0 | 3N/4 | 1,440 | 180 | 1,620 | 1.5 |
| [9] | 3N/4 | 1,440 | 180 | N/2 | 960 | 120 | 2,700 | 2.5 |
| Present invention | N/2 | 960 | 120 | 0 | 0 | 0 | 1,080 | 1 |

Tables 7 and 8 show that the proposed architecture has the smallest coefficient requirement, both for the DFT and when applied to the IMDCT. This indirectly indicates that the architecture needs less chip area, which lowers cost.

Besides memory elements, the hardware required by the architecture also affects chip area. The computational complexity of each method is derived from the hardware evaluation, so the hardware is evaluated first. The results are shown in Table 9, where [10] is K. Dong-Sun et al., "Design of a mixed prime factor FFT for portable digital radio mondiale receiver," Consumer Electronics, IEEE Transactions on, vol. 54, pp. 1590-1594, 2008.

Table 9: Hardware resources of the various RDFT methods

| Method | Adders | Multipliers | Coeff.-ROM | DTPT |
|---|---|---|---|---|
| [10] | 30 | 30 | not listed | not listed |
| [1] | 8 | 6 | 1,664 | 1 |
| [2] | 8 | 4 | 1,664 | 1 |
| [3] | 12 | 6 | 1,664 | 1 |
| [4] | 13 | 2 | 824 | 2 |
| [5] | 8 | 4 | 694 | 1 |
| Present invention | 14 | 6 | 0 | 4 |

With the results of Table 9, the computational complexity can be evaluated further. The results are shown in Tables 10 and 11.

Table 10: Computational complexity of the proposed RDFT architecture

| Operation | Formula | N = 288 | N = 256 | N = 176 | N = 112 |
|---|---|---|---|---|---|
| Real additions | (A), (B) | 26,270 | 19,198 | 11,118 | 6,190 |
| Real multiplications | (C), (D) | 13,276 | 11,260 | 5,644 | 3,148 |

For the 256-point computation: (A) $2N(c + m + 5.5) - 2$; (C) $N(c + m + 12) - 4$.
For the other sizes: (B) $2N(c + 2.5) + 4(N + c)\lceil m/2 \rceil - 2$; (D) $N(c + 3) + 2(N + c)\lceil m/2 \rceil - 4$.

Table 11: Computational complexity of the various RDFT methods (frame size N = 288)

| Method | Real additions (formula) | Real additions | Real multiplications (formula) | Real multiplications |
|---|---|---|---|---|
| [10] | not listed | 5,928 | not listed | 1,000 |
| [1] | 4N(N+2) | 334,080 | 2N(N+3) | 167,616 |
| [2] | 4N(N+1) | 332,928 | 2N(N+1) | 166,464 |
| [3] | 2N(2N+3) | 333,504 | 2N(N+3) | 167,616 |
| [4] | N(2N+7)−2 | 167,902 | (N+1)(N−2) | 82,654 |
| [5] | 4N(m+c+2) | 49,536 | 2N(m+c+2) | 24,768 |
| Present invention | (A) | 26,270 | (B) | 13,276 |

(A) $2N(c + 2.5) + 4(N + c)\lceil m/2 \rceil - 2$; (B) $N(c + 3) + 2(N + c)\lceil m/2 \rceil - 4$.

Table 11 shows that, among the recursive architectures, the proposed design improves the number of additions by a factor of at least 1.89 and at most 12.72, and the number of multiplications by a factor of at least 1.87 and at most 12.63.

For the IMDCT, the architecture of Chiang and Liu in [6] contains 3 real adders and 2 real multipliers. The architecture of Chen et al. in [7] contains 7 real adders and 4 real multipliers, not counting pre- and post-processing. The architecture of Lei et al. in [9] contains 6 real adders and 2 real multipliers, not counting pre- and post-processing. The present invention implements the IMDCT around a DFT core; not counting pre- and post-processing, it requires 14 real adders and 6 real multipliers. The complete hardware comparison is shown in Table 12.

Table 12: Hardware resources of the various IMDCT methods

| Algorithm | Adders | Multipliers | Coeff.-ROM | DTPT |
|---|---|---|---|---|
| [6] | 3 | 2 | 4,080 | 1 |
| [7] | 7 | 4 | 3,780 | 4 |
| [8] | 11 | 2 | 1,620 | 4 |
| [9] | 6 | 2 | 2,700 | 4 |
| Present invention | 14 | 6 | 1,080 | 4 |
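The operation-count formulas given with Tables 10 and 11 for the proposed RDFT architecture can be checked numerically. The sketch below is an illustration under stated assumptions: the (c, m) pairs are assumed factorizations with N = c × m, the `common_factor` flag is a hypothetical parameter that selects the formulas the text gives for the 256-point case, and the function names are not from the patent.

```python
from math import ceil


def real_additions(c: int, m: int, common_factor: bool = False) -> int:
    """Real additions of the proposed RDFT for N = c * m points."""
    n = c * m
    if common_factor:
        # Formula (A) for the 256-point case: 2N(c + m + 5.5) - 2
        return n * (2 * c + 2 * m + 11) - 2
    # Formula (B): 2N(c + 2.5) + 4(N + c) * ceil(m / 2) - 2
    return n * (2 * c + 5) + 4 * (n + c) * ceil(m / 2) - 2


def real_multiplications(c: int, m: int, common_factor: bool = False) -> int:
    """Real multiplications of the proposed RDFT for N = c * m points."""
    n = c * m
    if common_factor:
        # Formula (C) for the 256-point case: N(c + m + 12) - 4
        return n * (c + m + 12) - 4
    # Formula (D): N(c + 3) + 2(N + c) * ceil(m / 2) - 4
    return n * (c + 3) + 2 * (n + c) * ceil(m / 2) - 4


# Assumed factor pairs (c, m, uses the 256-point formulas).
CASES = {288: (32, 9, False), 256: (16, 16, True),
         176: (16, 11, False), 112: (16, 7, False)}

for n, (c, m, cf) in CASES.items():
    print(n, real_additions(c, m, cf), real_multiplications(c, m, cf))
```

The factors of 2.5 and 5.5 are folded into integer arithmetic (for example, 2N(c + 2.5) = N(2c + 5)) so that the results stay exact integers.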

Next, the computational complexity of the various IMDCT methods is evaluated from the results in Table 12, in the same way as before. The results for all algorithms are shown in Tables 13 and 14.

Table 13: Computational complexity of the IMDCT architecture with an RDFT core

| Frame size N | Pre-/post-processing: real additions | Pre-/post-processing: real multiplications | Recursive kernel: real additions | Recursive kernel: real multiplications |
|---|---|---|---|---|
| Formula | N | 2N | (A) | (B) |
| 1920 | 1,920 | 3,840 | 49,502 | 24,988 |
| 240 | 240 | 480 | 2,602 | 1,328 |

(A) $2N'(c' + 2.5) + 4(N' + c')\lceil m'/2 \rceil - 2$; (B) $N'(c' + 3) + 2(N' + c')\lceil m'/2 \rceil - 4$, with $N' = N/4 = c' \times m'$.

Table 14: Computational complexity of the various IMDCT methods (frame size N = 1920)

| Method | Pre-/post-processing: real additions | Pre-/post-processing: real multiplications | Recursive kernel: real additions | Recursive kernel: real multiplications |
|---|---|---|---|---|
| [6] | 0 | 0 | 5,529,600 | 1,845,120 |
| [7] | 960 | 1,920 | 923,040 | 461,760 |
| [8] | 0 | 0 | 4,608,000 | 922,560 |
| [9] | 3,358 | 1,440 | 691,200 | 231,360 |
| Present invention | 1,920 | 3,840 | 49,502 | 24,988 |

Table 14 shows that the number of additions improves by a factor of at least 13.51 and at most 107.53, and the number of multiplications by a factor of at least 8.08 and at most 64.

Following the introduction and derivation of the methods in the preceding sections, and the planning and refinement of the hardware, the RDFT circuit was developed. It was synthesized with the Synopsys Design Compiler tool and then placed and routed (Auto Placement and Route, APR) with the Cadence SoC Encounter tool to realize the RDFT architecture as a chip. The chip data are listed in Table 15.

Table 15: Chip data

| Item | Value |
|---|---|
| Supported frame sizes N | DFT/IDFT: 288/256/176/112; MDCT/IMDCT: 1920/240 |
| Internal / coefficient word length | 24 bit / 24 bit |
| Process technology | TSMC 0.18 µm 1P6M CMOS |
| Package | CQFP128 |
| Supply voltage | 1.98 V |
| Power consumption | 14.6 mW @ 25 MHz |
| Core size | 0.84 × 0.84 mm² |

The chip power consumption is a simulation result obtained with Prime Power, with the RDFT frame size set to N = 288 and an operating frequency of 25 MHz. These results are further compared with the results of other papers. Before the comparison, the area is normalized with formula (40) to remove the effect of the process technology, and formula (41) yields an objective performance index. The comparison results are shown in Table 16.

$$\text{Normalized Area} = \frac{\text{Area}}{(\text{Technology}/0.18\,\mu\text{m})^{2}} \qquad (40)$$

$$\frac{\text{DFTs}}{\text{Energy}} = \frac{\text{Technology}/0.18\,\mu\text{m}}{\text{Power} \times \text{Execution Time} \times 10^{3}} \qquad (41)$$
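As a rough check of the two normalization formulas, the sketch below applies them to figures listed in the chip comparison of Table 16. Formulas (40) and (41) are garbled in the extracted text, so two points here are assumptions inferred from the tabulated values: the process ratio is squared in (40), and in (41) power is taken in watts and execution time in seconds. The function names are hypothetical.

```python
REFERENCE_NODE_UM = 0.18  # process node used as the normalization reference


def normalized_area(area_mm2: float, technology_um: float) -> float:
    """Formula (40), assuming the process ratio is squared."""
    return area_mm2 / (technology_um / REFERENCE_NODE_UM) ** 2


def normalized_dfts_per_energy(power_w: float, exec_time_s: float,
                               technology_um: float) -> float:
    """Formula (41), assuming power in watts and execution time in seconds."""
    return (technology_um / REFERENCE_NODE_UM) / (power_w * exec_time_s * 1e3)


# Proposed design: 14.6 mW, 193.68 us for 288 points, 0.18 um process.
print(round(normalized_dfts_per_energy(14.6e-3, 193.68e-6, 0.18), 2))

# Van et al.: 1.25 mW, 2.07 ms, 0.182 mm^2 core area, 0.13 um process.
print(round(normalized_dfts_per_energy(1.25e-3, 2.07e-3, 0.13), 2))
print(round(normalized_area(0.182, 0.13), 3))
```

For a 0.18 µm design both scale factors equal 1, so the normalized area equals the core area, which matches the three 0.18 µm columns of Table 16.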

Table 16: Chip comparison

| Design | Van et al. [3] | Previous-I [4] | Previous-II [5] | Present invention |
|---|---|---|---|---|
| Process technology | 0.13 µm | 0.18 µm | 0.18 µm | 0.18 µm |
| Internal / coefficient word length | 12 bit / - | 24 bit / 24 bit | 21 bit / 16 bit | 24 bit / 24 bit |
| Supply voltage | 1.2 V | 1.98 V | 1.98 V | 1.98 V |
| Clock rate | 20 MHz | 25 MHz | 25 MHz | 25 MHz |
| Supported frame sizes N | 212, 106 | 288, 256, 176, 112, 212, 165, 106 | 288, 256, 176, 112 | 480, 288, 256, 176, 112, 60 |
| Execution time for 288 points | 2.07 ms* (estimated) | 1.65 ms (estimated) | 384 µs (estimated) | 193.68 µs (estimated) |
| Power consumption | 1.25 mW for DTMF | 5.98 mW | 8.44 mW | 14.6 mW |
| Core area | 0.182 mm² | 0.154 mm² | 0.265 mm² | 0.705 mm² |
| Normalized area | 0.348 mm² | 0.154 mm² | 0.265 mm² | 0.705 mm² |
| Normalized DFTs/Energy | 279.12 | 101.45 | 308.55 | 353.64 |

The comparisons above show that the forward and inverse modified discrete cosine transform system of the present invention, built around a discrete Fourier transform core, realizes an RDFT with low area, low complexity and high performance. For 288 points, compared with the most recent RDFT architecture of Lai et al. [5], the present technique reduces the number of computation cycles by 49.5%, the number of additions by 47.5% and the number of multiplications by 48.7%. In addition, it follows an energy-saving, reusable and reconfigurable green design concept: it can readily be used for a DFT of any frame size, and it also serves as the MDCT/IMDCT kernel, so that the RDFT is reused effectively and utilization increases.

As the above shows, the present invention differs markedly from the prior art in its purpose, means and effect, and is of great practical value. It should be noted that the embodiments described above are examples given only for ease of explanation. The scope of the rights claimed by the present invention is defined by the claims and is not limited to the embodiments above.

[Brief description of the drawings]

Fig. 1 is a schematic diagram of the forward and inverse modified discrete cosine transform system of the present invention with a discrete Fourier transform core.
Fig. 2 is a diagram of the N/4-point discrete Fourier transform unit of the present invention.
Fig. 3 is a schematic diagram of a conventional improved RDFT architecture.
Fig. 4 is a block diagram of the RDFT architecture with a shared multiplier, improved from Fig. 3.
Fig. 5 is a schematic diagram of the shared multiplier of Fig. 4.
Fig. 6 is a schematic diagram of the accumulation circuit of the present invention.
Fig. 7 is a schematic diagram of the RDFT hardware computation results obtained with multiplexers in the present invention.
Fig. 8 is a schematic diagram of the order in which the twiddle factors of the present invention require two different coefficients at the same time.
Fig. 9 is a schematic diagram of the coefficient-sharing multiplier of the present invention.

[Description of main reference numerals]

Forward modified discrete cosine transform system 110
Inverse modified discrete cosine transform system 120
Data sequential shift arrangement unit 130
Data reordering unit 140
First rotation operation unit 150
N/4-point discrete Fourier transform unit 160
Second rotation operation unit 170
De-interleave operation unit 180
First multiplexer 205
First adder 210
First multiplier 215
First shift register 220
Second multiplexer 225
First delay unit 230
Second multiplier 235
Third multiplexer 240
Third multiplier 245
Fourth multiplexer 250
Second adder 255
Second delay unit 260

Claims (1)

VII. Claims:

1. A forward modified discrete cosine transform system with a discrete Fourier transform core, comprising:
a data sequential shift arrangement unit, which receives N input digital signals and performs a sequential shift arrangement on the N digital signals to generate N first temporary signals, where N is a positive integer that is a multiple of 4;
a data reordering unit, connected to the data sequential shift arrangement unit, which performs a data reordering operation on the first temporary signals to generate N/4 second temporary signals;
a first rotation operation unit, connected to the data reordering unit, which performs a first rotation operation on the N/4 second temporary signals to generate N/4 third temporary signals;
an N/4-point discrete Fourier transform unit, connected to the first rotation operation unit, which performs a discrete Fourier transform on the N/4 third temporary signals to generate N/4 fourth temporary signals, the N/4-point discrete Fourier transform unit comprising:
  a first multiplexer, which receives the N/4 third temporary signals and a second multiplication signal and generates a first multiplexed signal;
  a first adder, connected to the first multiplexer, which adds the first multiplexed signal and a second delay signal to generate the fourth temporary signal;
  a first multiplier, connected to the first adder, which multiplies the fourth temporary signal by a cosine function signal to generate a first multiplication signal;
  a first shift register, connected to the first multiplier, which shifts the first multiplication signal to generate a first shift signal;
  a second multiplexer, connected to the first multiplier and the first shift register, which receives the first multiplication signal and the first shift signal and outputs a second multiplexer signal;
  a first delay unit, connected to the first adder, which delays the fourth temporary signal to generate a first delay signal;
  a second multiplier, connected to the first delay unit, which multiplies the first delay signal by a sine function signal to generate the second multiplication signal;
  a third multiplexer, connected to the first delay unit and the second multiplier, which receives the first delay signal and the second multiplication signal and outputs a third multiplexer signal;
  a third multiplier, connected to the third multiplexer, which multiplies the third multiplexer signal by −1 to generate a third multiplication signal;
  a fourth multiplexer, connected to the second multiplexer, which receives the second multiplexer signal and the second delay signal and outputs a fourth multiplexer signal;
  a second adder, connected to the third multiplier and the fourth multiplexer, which adds the third multiplication signal and the fourth multiplexer signal to generate a second addition signal; and
  a second delay unit, connected to the second adder, which delays the second addition signal to generate the second delay signal;
a second rotation operation unit, connected to the N/4-point discrete Fourier transform unit, which performs a second rotation operation on the N/4 values formed from the fourth temporary signal and the second addition signal to generate N/4 fifth temporary signals; and
a de-interleave operation unit, connected to the second rotation operation unit, which performs a de-interleave operation on the N/4 fifth temporary signals to generate N output signals.

2. The forward modified discrete cosine transform system with a discrete Fourier transform core according to claim 1, wherein the data sequential shift arrangement unit is expressed by the following formula:

[formula illegible in the source text]

where x(n) denotes the N input digital signals and the left-hand side denotes the N first temporary signals.

3. The forward modified discrete cosine transform system with a discrete Fourier transform core according to claim 2, wherein the data reordering unit is expressed by the following formula:

[formula illegible in the source text]

where the left-hand side denotes the N/4 second temporary signals and the right-hand side is formed from the N first temporary signals.

4. The forward modified discrete cosine transform system with a discrete Fourier transform core according to claim 3, wherein the first rotation operation performed by the first rotation operation unit on the N/4 second temporary signals is expressed by a complex exponential of the form

$$\exp\!\left(-i \cdot \frac{2\pi}{N} \cdot (\ldots)\right)$$

[argument illegible in the source text], where the index runs from 0 to N/4 − 1.

5. The forward modified discrete cosine transform system with a discrete Fourier transform core according to claim 4, wherein the second rotation operation performed by the second rotation operation unit on the N/4 fourth temporary signals is expressed by the following formula:

[formula illegible in the source text]

where the index runs from 0 to N/4 − 1.

6. The forward modified discrete cosine transform system with a discrete Fourier transform core according to claim 5, wherein the de-interleave operation performed by the de-interleave operation unit on the N/4 fifth temporary signals is expressed by the following formulas:

$$X(2k) = \mathrm{Re}\big(\hat{X}(k)\big), \qquad X(2k+1) = -\mathrm{Im}\big(\hat{X}(\ldots)\big)$$

[argument of the second term and range of k illegible in the source text], where $\hat{X}(k)$ denotes the N/4 fifth temporary signals and $X(k)$ denotes the N output signals.

7.
-種以離散傅立葉轉換為核心之修正型離散餘 弦反轉換之系統,其包含: ' 53 201227351 對,其接㈣2個輸人數位訊號, 對该N/2個輸入數位訊號執行資料重新排序運算,以產生 刪固第六暫時訊號’當中,叫的倍數之正整數; 第紅轉運异單元,連接至該資料重新排序單 凡 _個第六暫時訊號執行-第-旋轉運算,以產 生N/4個第七暫時訊號; ㈣舁以產 轉二4個點之離散傅立葉轉換單元,連接至該第-旋 =運4 1對該N/4個“暫時訊號執行離散傅立葉轉 換’以產生N/4個第八軔h± % 口占 暫時讥唬,該Ν/4個點之離散傅立 莱轉換單元包含: 第一多工器,用以接收該ΝΜ個第七暫時訊 號與:第二乘法訊號,並產生一第一多工訊號; ^ 一第—加法器,連接至該第一多工器,以對 ί第一多工訊號與一第二遲延訊號進行加法運 算,以產生該第八暫時訊號; ^ 一第一乘法器,連接至該第一加法器,以對 〇玄第八暫時訊號與一餘弦函數訊號進行乘法運 算,以產生一第一乘法訊號; 一第一移位暫存器,連接至該第一乘法器, 以對該第一乘法訊號進行移位運算,以產生一第 一移位訊號; 一第二多工器,連接至該第一乘法器及該第 —移位暫存器,接收該第一乘法訊號及該第一移 位訊號’以輸出一第二多工器訊號; 54 201227351 一第一遲延器,連接至該第一加法器,以對 該第八暫時訊號進行遲延運算, 沒'土—弟—遲 延訊號; 一第二乘法器,連接至該第一遲延裝置以 對該第一遲延訊號與一正弦函數訊號進行乘法運 算’以產生該第二乘法訊號;5. The system of modified discrete cosine forward conversion with discrete Fourier transform as a core according to claim 4, wherein the second rotation operation unit performs a second rotation on the fourth/four fourth temporary signals. The operation is expressed by the following formula: exp —1ύ0Ί is an indicator from ο to ν/4-1. 6. The system of modified discrete cosine forward conversion with discrete Fourier transform as a core according to claim 5, wherein the de-mter 丨 eave unit performs the fifth temporary signal The deinterleaving performed is expressed by the following formula: X( 2k) = Re( X(k)) X(2k + l) = -lm (again (A - h", k = 〇 4 4 , plus) The NM fifth temporary signal, the state is the N rounds of signals. 7. 
A modified discrete cosine inverse transform system with discrete Fourier transform as the core, which includes: ' 53 201227351 Yes, its (4) 2 Transmitting a digit signal, performing a data reordering operation on the N/2 input digit signals to generate a positive integer of a multiple of the sixth temporary signal 'deleted'; a red transport different unit, connected to the data to reorder Single _ a sixth temporary signal to perform a --rotation operation to generate N / 4 seventh temporary signals; (d) 离散 to transfer two or four points of discrete Fourier transform unit, connected to the first-rotation = transport 4 1 Perform discrete Fourier transform on the N/4 "temporary signals" N/4 eighth 轫h±% ports are temporarily occupied, and the Ν/4 points discrete Fourier transform unit includes: a first multiplexer for receiving the seventh temporary signal and: a second multiplication signal and generating a first multiplex signal; ^ a first-adder coupled to the first multiplexer to add a first multiplex signal and a second delay signal to generate the An eighth temporary signal; a first multiplier coupled to the first adder for multiplying the first temporary signal and the cosine function signal to generate a first multiplication signal; a first shift a register connected to the first multiplier to perform a shift operation on the first multiplying signal to generate a first shift signal; a second multiplexer connected to the first multiplier and the first - shifting the register, receiving the first multiplying signal and the first shifting signal ' to output a second multiplexer signal; 54 201227351 a first delay, connected to the first adder to The eighth temporary signal is delayed, no 'earth-di a delay signal; a second multiplier coupled to the first delay device to multiply the first delay signal and a sinusoidal function signal to generate the second multiplication signal; 一第二多工器,連接至該第一遲延裝置及該 第二乘法器,接收該第一遲延訊號及該該第二乘 法汛號,以輸出一第三多工器訊號; 一第三乘法器,連接至該第三多工器,以對 
該第三多工器訊號與_丨進行乘法運算,以產生— 第三乘法訊號; 一第四多工器,連接至該第二多工器,接收 該第二多工器訊號及該第二遲延訊號,以輸出一 第四多工器訊號; 一第二加法器,連接至該第三乘法器及該第 四多工器,以對該第三乘法訊號與該第四多工器 訊號進行加法運算,以產生一第二加法訊號;以 及 一第二遲延器,連接至該第二加法器,以對 該第二加法訊號進行遲延運算以產生該第二遲 延訊號; 一第二旋轉運算單元,連接至該N/4個點之離散傅立 葉轉換單7L,對該N/4個由第八暫時訊號及第二加法訊號 55 201227351 組成執行-第二旋轉運算,以產生_個第九暫時訊號; 以及 μ :解交錯(deinterleave)運算單元,連接至該第二旋轉 運算單疋,對該N/4個第九暫時訊號執行一解交錯運算, 以產生N個輸出訊號。 8.如申凊專利範圍第7項所述之以離散傅立葉轉換 為核心之修正型離散餘弦反轉換之系統,其中該資料重 新排序單元以下列公式表示:a second multiplexer connected to the first delay device and the second multiplier, receiving the first delay signal and the second multiplication signal to output a third multiplexer signal; a third multiplication And connecting to the third multiplexer to multiply the third multiplexer signal and _丨 to generate a third multiplication signal; a fourth multiplexer connected to the second multiplexer Receiving the second multiplexer signal and the second delay signal to output a fourth multiplexer signal; a second adder connected to the third multiplier and the fourth multiplexer to The third multiplication signal is added to the fourth multiplexer signal to generate a second addition signal; and a second delay is connected to the second adder to delay the second addition signal to Generating the second delay signal; a second rotation operation unit connected to the N/4 points of the discrete Fourier transform unit 7L, and the N/4 is composed of the eighth temporary signal and the second addition signal 55 201227351 - Second rotation operation to generate _ ninth The temporary signal; and the μ deinterleave operation unit are connected to the second rotation operation unit, and perform a deinterleaving operation on the N/4 ninth temporary signals to generate N output signals. 8. 
A system for modifying a discrete cosine inverse transform having a discrete Fourier transform as a core, as recited in claim 7, wherein the data reordering unit is represented by the following formula: Xw=X{2k) + iX{N/2-2k-l), 备中,X免為該N/4個第六暫時訊號,幻為該N/2個輸入 數位訊號。 9.如申請專利範圍第8項所述之以離散傅立葉轉換 為核心之修正型離散餘弦反轉換之系統,其中,該第一 旋轉運算單元對該N/4個第六暫時訊號所執行第一旋 轉運算以下列公式表示:Xw=X{2k) + iX{N/2-2k-l), in the standby, X is the N/4 sixth temporary signal, and the N/2 input digital signals are illusory. 9. The system of modified discrete cosine inverse conversion with discrete Fourier transform as a core according to claim 8, wherein the first rotating operation unit performs the first for the N/4 sixth temporary signals. The rotation operation is represented by the following formula: ..2π . 1 .. exp(-ι——(t + —)) ' y N 8 當中,/係一個由0至N/4-1的指標。 1 0.如申請專利範圍第9項所述之以離散傅立葉轉 換為核心之修正型離散餘弦反轉換之系統,其中,該第 —旋轉運算單元對該該N/4個第八暫時訊號所執行第二 碇轉運算以下列公式表示: 哪(-与(1,+ ^)) ’ 當中,〆係一個由0至N/4-1的指標。 56 201227351 11.如申請專利範圍第10項所述之以離散傅立葉轉 換為核心之修正型離散餘弦反轉換之系統,其中,該解 交錯(deinterleave)運算單元對該NM個第九暫時訊號对心 所執行解交錯運算以下列公式表示·· x(2n) = (i (g + «)) ;c(2n + 1) = —·ίηι (ί (^ _ 1 — η)) X..2π . 1 .. exp(-ι——(t + —)) ' y N 8 Where / is an indicator from 0 to N/4-1. 1 . The system of modified discrete cosine inverse conversion with discrete Fourier transform as a core according to claim 9 , wherein the first rotating operation unit performs the N/4 eighth temporary signals The second rotation operation is expressed by the following formula: Where (- and (1, + ^)) ', among them, an indicator from 0 to N/4-1. 56 201227351 11. 
The system of modified discrete cosine inverse conversion with discrete Fourier transform as core according to claim 10, wherein the deinterleave operation unit is opposite to the NM ninth temporary signals The deinterleaving performed is expressed by the following formula: · x(2n) = (i (g + «)) ; c(2n + 1) = —·ίηι (ί (^ _ 1 — η)) X λ· (7 + 2η + 1) = (《-1 - η))X(h2n)^Im(i{^ + n))伶+2^1) = -/?+(备—1-„)) 2η^ = —J?e(r(n)} η = 〇* Λ1 * —— 8 ·γ (τ+2η +1)= ΐΊη (τ _ 1 ~η)) 當中’玎為該ΝΜ個第九暫時訊號,w Α ~马該Ν個輸出 訊號。 八、圖式(請見下頁): 57λ· (7 + 2η + 1) = ("-1 - η))X(h2n)^Im(i{^ + n))伶+2^1) = -/?+(备—1-„) ) 2η^ = —J?e(r(n)} η = 〇* Λ1 * —— 8 ·γ (τ+2η +1)= ΐΊη (τ _ 1 ~η)) where '玎 is the first Nine temporary signals, w Α ~ Ma should have an output signal. Eight, schema (see next page): 57

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW99146938A TWI423046B (en) 2010-12-30 2010-12-30 Recursive modified discrete cosine transform and inverse discrete cosine transform system with a computing kernel of rdft


Publications (2)

Publication Number Publication Date
TW201227351A true TW201227351A (en) 2012-07-01
TWI423046B TWI423046B (en) 2014-01-11

Family

ID=46933203


Country Status (1)

Country Link
TW (1) TWI423046B (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI718625B (en) * 2019-08-16 2021-02-11 瑞昱半導體股份有限公司 Computation circuit used in dct, dst, idct and idst
TWI799302B (en) * 2022-06-24 2023-04-11 瑞昱半導體股份有限公司 Computation circuit used in dct, dst, idct and idst

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6496795B1 (en) * 1999-05-05 2002-12-17 Microsoft Corporation Modulated complex lapped transform for integrated signal enhancement and coding
TWI276975B (en) * 2004-12-01 2007-03-21 Ind Tech Res Inst Fast fourier transform processor

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI558172B (en) * 2014-12-11 2016-11-11 上海兆芯集成電路有限公司 Advanced video coding and decoding chip and advanced video coding and decoding method
US9686553B2 (en) 2014-12-11 2017-06-20 Via Alliance Semiconductor Co., Ltd. Advanced video coding and decoding chip and advanced video coding and decoding method

Also Published As

Publication number Publication date
TWI423046B (en) 2014-01-11


Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees