KR100892292B1

KR100892292B1 - Parallel and Pipelined Radix-2⁴ FFT Processor

Info

Publication number: KR100892292B1
Application number: KR1020060109020A
Authority: KR
Inventors: 이한호
Original assignee: 인하대학교 산학협력단
Priority date: 2006-11-06
Filing date: 2006-11-06
Publication date: 2009-04-08
Also published as: KR20080040978A

Abstract

The present invention relates to a Radix-2⁴ fast Fourier transform (FFT) processor that uses a parallel structure and pipelining. By combining two parallel data paths with a single-path delay feedback (SDF) structure, it increases operating speed and data throughput while reducing hardware complexity so that the design is easier to construct. To this end, the Radix-2⁴ algorithm is applied to the FFT, and the multiplication workload corresponding to half of the programmable complex multipliers is handled by special complex constant multipliers designed with the canonic signed digit (CSD) method, which reduces hardware cost and power consumption. The processor comprises a plurality of first Radix-2⁴ butterfly operation units that perform butterfly operations when an input signal is applied, and a plurality of second Radix-2⁴ butterfly operation units that are connected to the stage following the first Radix-2⁴ butterfly operation units and perform butterfly operations on the data output from them, so that the Radix-2⁴ FFT algorithm is applied.

FFT, IFFT, Radix-2⁴, SDF, MB-OFDM, UWB

Description

Parallel and Pipelined Radix-2⁴ FFT Processor

FIG. 1 is a schematic signal flow graph of the 128-point Radix-2⁴ SDF FFT algorithm according to the present invention.

FIG. 2 is a block diagram schematically showing the Radix-2⁴ fast Fourier transform structure according to the present invention.

FIG. 3 is a schematic configuration diagram of the Radix-2⁴ butterfly operation unit according to the present invention.

FIG. 4A is a block diagram of the CSD complex constant multiplier for sin(π/8) according to the present invention.

FIG. 4B is a block diagram of the CSD complex constant multiplier for cos(π/8) according to the present invention.

FIG. 4C is a block diagram of the CSD complex constant multiplier for sin(π/4) = cos(π/4) according to the present invention.

FIG. 5 is a block diagram schematically showing the CSD complex constant multiplier according to the present invention.

FIG. 6 is a graph of the signal-to-noise ratio (SNR) versus the internal word length of the FFT processor according to the present invention.

The present invention relates to a Radix-2⁴ fast Fourier transform processor that uses a parallel structure and pipelining. More specifically, by using two parallel data paths and a single-path delay feedback structure, it increases operating speed and data throughput while reducing hardware complexity so that the design is easier to construct. To this end, the Radix-2⁴ algorithm is applied to the fast Fourier transform, and the multiplication workload corresponding to half of the programmable complex multipliers is handled by special complex constant multipliers designed with the CSD method, which reduces hardware cost and power consumption.

In general, orthogonal frequency division multiplexing (OFDM) and discrete multi-tone (DMT) modulation are well suited to high-speed data transmission over multipath channels.

These schemes convert the modulated data from serial to parallel, one stream per subcarrier, and modulate each stream onto its corresponding subcarrier. This is implemented with the discrete Fourier transform (DFT); in hardware designs, a fast Fourier transform (FFT) algorithm is used to reduce the amount of computation.

The processor that executes the FFT algorithm is the most complex block in an OFDM system and must operate at high speed. FFT designs for modem applications that demand high performance therefore mainly use pipelined processors. A pipelined architecture places one operation unit in each stage, which not only lowers hardware cost but also achieves high-speed operation even at a low operating frequency, and its relatively simple structure makes it easy to design.

However, hardware cost grows logarithmically as the required number of FFT points increases. In particular, the complex multipliers consume 50% to 80% of the total power, increase hardware complexity, and lower data throughput.

The present invention was devised to solve the above problems. By using two parallel data paths, a single-path delay feedback structure, and pipelining, it increases operating speed and data throughput while reducing hardware complexity so that the design is easier to construct. An object of the invention is to provide a Radix-2⁴ fast Fourier transform processor using a parallel structure and pipelining in which the Radix-2⁴ algorithm is applied to the fast Fourier transform and the multiplication workload corresponding to half of the programmable complex multipliers is handled by special complex constant multipliers designed with the CSD method, thereby reducing hardware cost and power consumption.

To achieve this object, the present invention provides a fast Fourier transform processor comprising: a plurality of first Radix-2⁴ butterfly operation units that perform butterfly operations when an input signal is applied; and a plurality of second Radix-2⁴ butterfly operation units that are connected to the stage following the first Radix-2⁴ butterfly operation units and perform butterfly operations on the data output from the first Radix-2⁴ butterfly operation units, so that the Radix-2⁴ FFT algorithm is applied.

Here, half of all the multipliers are designed as special complex constant multipliers.

The special complex constant multipliers are CSD complex constant multipliers.

The fast Fourier transform processor is a pipelined FFT processor with two parallel data paths to which the algorithm is applied.

In addition, the fast Fourier transform processor is an SDF FFT processor to which the algorithm is applied.

The algorithm is derived by index decomposition.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a schematic signal flow graph of the 128-point Radix-2⁴ SDF FFT algorithm according to the present invention. With reference to the figure, the derivation of the Radix-2⁴ algorithm is outlined briefly below.

A discrete Fourier transform (DFT) of length N is defined as in Equation 1.
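The body of Equation 1 is not reproduced in this record. For reference, the standard definition of the length-N DFT, written with the symbols named in the next paragraph (twiddle factor W, frequency index k, time index n), is given below; the exact typesetting of the patent's own equation is an assumption.

```latex
X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk},
\qquad k = 0, 1, \dots, N-1,
\qquad W_N = e^{-j 2\pi / N}
```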

Here, W_N^{nk} is the twiddle factor, k is the frequency index, and n is the time index. To derive the Radix-2⁴ algorithm according to the present invention, a five-dimensional linear index map is applied, as in Equation 2.
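Equation 2 is also not reproduced here. The sketch below assumes the five-dimensional index map commonly used in radix-2⁴ derivations, n = (N/2)n1 + (N/4)n2 + (N/8)n3 + (N/16)n4 + n5 and k = k1 + 2k2 + 4k3 + 8k4 + 16k5, and only checks that for N = 128 each map visits every index exactly once. The function names are illustrative, and the mapping may differ in detail from the patent's equation.

```python
from itertools import product

N = 128  # FFT length used in the embodiment


def time_index(n1, n2, n3, n4, n5):
    """Assumed 5-D map for the time index n (n1..n4 are bits, n5 in 0..N/16-1)."""
    return (N // 2) * n1 + (N // 4) * n2 + (N // 8) * n3 + (N // 16) * n4 + n5


def freq_index(k1, k2, k3, k4, k5):
    """Assumed 5-D map for the frequency index k (k1..k4 are bits, k5 in 0..N/16-1)."""
    return k1 + 2 * k2 + 4 * k3 + 8 * k4 + 16 * k5


def all_indices(index_map):
    """Enumerate the map over its whole domain and return the sorted results."""
    bits = (0, 1)
    return sorted(
        index_map(a, b, c, d, e)
        for a, b, c, d, e in product(bits, bits, bits, bits, range(N // 16))
    )


# Both maps should be one-to-one onto 0..N-1.
time_ok = all_indices(time_index) == list(range(N))
freq_ok = all_indices(freq_index) == list(range(N))
```

A one-to-one map is what allows the single sum over n to be rewritten as five nested sums without dropping or repeating any term.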

Applying the common factor algorithm (CFA) then gives Equation 3.
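As a rough illustration of what a common factor decomposition does, the sketch below splits a 128-point DFT into 16-point and 8-point DFTs joined by twiddle factors and compares the result with the direct definition. It is a plain two-factor Cooley-Tukey split, not the patent's five-dimensional Radix-2⁴ derivation, and all names and the test signal are illustrative.

```python
import cmath


def twiddle(N, e):
    """W_N^e = exp(-j*2*pi*e/N)."""
    return cmath.exp(-2j * cmath.pi * e / N)


def dft(x):
    """Direct O(N^2) DFT, used as the reference."""
    N = len(x)
    return [sum(x[n] * twiddle(N, n * k) for n in range(N)) for k in range(N)]


def dft_common_factor(x, N1, N2):
    """DFT of length N1*N2 using n = N2*n1 + n2 and k = k1 + N1*k2."""
    N = N1 * N2
    assert len(x) == N
    X = [0j] * N
    for k1 in range(N1):
        # Inner N1-point DFTs over n1, then the inter-stage twiddle W_N^(n2*k1).
        inner = [
            twiddle(N, n2 * k1)
            * sum(x[N2 * n1 + n2] * twiddle(N1, n1 * k1) for n1 in range(N1))
            for n2 in range(N2)
        ]
        # Outer N2-point DFTs over n2.
        for k2 in range(N2):
            X[k1 + N1 * k2] = sum(inner[n2] * twiddle(N2, n2 * k2) for n2 in range(N2))
    return X


# Deterministic complex test signal of length 128.
x = [complex((3 * n) % 7 - 3, (5 * n) % 11 - 5) for n in range(128)]
max_error = max(abs(a - b) for a, b in zip(dft(x), dft_common_factor(x, 16, 8)))
```

The two results should agree to within floating-point rounding, which shows that the split only regroups the terms of the original sum.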

Evaluating the twiddle factors of Equation 3 gives Equation 4.

The structure of the fourth butterfly operation unit is given by Equation 5, which involves a twiddle factor taking four complex values.

The structure of the second butterfly operation unit is denoted H(n) and is given by Equation 6.

Here, H(n) = H(n, k1, k2). As Equation 5 shows, the fourth butterfly operation unit takes the form of second butterfly operation units H(n) connected to one another through the twiddle factor W_16^n.

The first butterfly operation unit is given by Equation 7.

Finally, the algorithm according to the present invention uses complex constant multipliers in place of programmable complex multipliers. The CSD constant multiplier simplifies the computation by minimizing the number of nonzero digits ('1's), which reduces hardware area and power consumption.

FIG. 2 is a block diagram schematically showing the Radix-2⁴ fast Fourier transform structure according to the present invention. As shown in the figure, the 128-point Radix-2⁴ SDF FFT processor with two parallel data paths consists of butterfly operation units, complex Booth multipliers, CSD complex constant multipliers, registers, and memory blocks.

Here, using CSD complex constant multipliers reduces the number of complex Booth multipliers, and thereby the hardware complexity.

Because this structure has two parallel data paths, two complex values are input at a time, unlike the ordinary data flow. The two complex values are therefore stored sequentially. Once the data up to the 64th sample, half of the 128 points, has been stored and the 65th sample arrives, the previously stored first (odd-numbered) value is operated on with the new input value, and the even-numbered value that arrived together with it is stored in the memory location that held the odd-numbered value. After all the odd-numbered values have been processed in this way, only the even-numbered pairs are processed, and the two incoming complex values are stored sequentially in the freed memory.

Because the first butterfly operation produces the results for the odd-numbered inputs first, the N/2-word RAM area that an N-point butterfly operation requires is reduced in every stage except the first and the last. The butterfly operation of the last stage, however, must hold all the values produced between the end of the odd-numbered operations and the start of the even-numbered operations, so its RAM area is larger than in the conventional structure. Even so, the two parallel data paths do not require any additional RAM.

FIG. 3 is a schematic configuration diagram of the Radix-2⁴ butterfly operation unit according to the present invention. As shown in the figure, the unit performs two operations.

In the first operation, the incoming real and imaginary parts are stored in RAM, and the values previously stored in RAM are passed unchanged to the output.

In the second operation, in the butterfly operation unit (BF1), the real and imaginary parts stored in RAM and the newly input real and imaginary parts are fed to a complex subtractor and a complex adder. The output of the complex subtractor is written back to RAM, and the output of the complex adder is passed on for the next operation. These two operations are repeated in sequence.
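The two alternating operations can be modelled in a few lines. The sketch below is a behavioural model of one single-path delay feedback butterfly stage, with a FIFO standing in for the RAM. It handles one sample per cycle and omits the twiddle multiplication and the two-path input ordering of the actual processor. The function name `sdf_stage`, the operand order of the subtraction, and the zero-initialised memory are assumptions.

```python
from collections import deque


def sdf_stage(samples, delay):
    """Behavioural model of one SDF butterfly stage with a `delay`-deep feedback memory.

    For each block of 2*delay inputs:
      * first `delay` cycles: store the input, emit what the memory held before;
      * next `delay` cycles: emit (stored + input), write (stored - input) back.
    """
    memory = deque([0] * delay)  # assumed to start cleared
    out = []
    for cycle, value in enumerate(samples):
        stored = memory.popleft()
        if (cycle // delay) % 2 == 0:
            # Operation 1: fill the memory, pass the old contents through.
            memory.append(value)
            out.append(stored)
        else:
            # Operation 2: butterfly. The sum goes on, the difference is fed back.
            memory.append(stored - value)
            out.append(stored + value)
    return out


# One 8-sample block followed by 4 flush cycles so the differences emerge.
stream = sdf_stage(list(range(8)) + [0] * 4, delay=4)
```

In this model the sums leave the stage while the second half of a block arrives, and the differences leave during the first half of the next block, so a single memory of depth `delay` serves both roles.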

The butterfly operation unit (BF2) additionally includes a small amount of combinational logic to perform the multiplication by -j.
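Multiplying by -j needs no multiplier: (a + jb)(-j) = b - ja, so the real and imaginary parts are swapped and one of them is negated. A minimal sketch (the function name is illustrative):

```python
def mul_neg_j(re, im):
    """Return (re + j*im) * (-j) as a (real, imag) pair using only a swap and a negation."""
    return im, -re


# Cross-check against Python's complex arithmetic.
product = complex(3, 4) * (-1j)
swapped = mul_neg_j(3, 4)
```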

Here, two complex Booth multipliers are provided to implement the algorithm that drives the Radix-2⁴ FFT with two parallel data paths according to the present invention.

The result data generated by the butterfly operation unit (BF1) of the sixth stage are multiplied by the twiddle factors -j, W(16), and W(48), and a ROM stores 32 bytes of this twiddle factor data.

In addition, half of the complex multipliers are designed as CSD complex constant multipliers.

Table 1 lists the twiddle factors in decimal, 8-bit binary, and CSD form.

Here, the complex multiplication coefficients corresponding to the twiddle factors are the trigonometric values sin(π/4) = cos(π/4), cos(π/8), and sin(π/8), which can be expressed through the twiddle factors W(8), W(16), W(24), and W(48).
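This correspondence can be checked numerically. The sketch below assumes that W(n) denotes the 128-point twiddle factor W_128^n = exp(-j2πn/128), which the text does not state explicitly, and confirms that the real and imaginary parts of W(8), W(16), W(24), and W(48) all have magnitude sin(π/8), cos(π/8), or cos(π/4).

```python
import cmath
import math


def W(n, N=128):
    """Assumed meaning of W(n): the N-point twiddle factor exp(-j*2*pi*n/N)."""
    return cmath.exp(-2j * cmath.pi * n / N)


CONSTANTS = {
    "sin(pi/8)": math.sin(math.pi / 8),
    "cos(pi/8)": math.cos(math.pi / 8),
    "cos(pi/4)": math.cos(math.pi / 4),
}


def name_of(value):
    """Return the name of the constant whose magnitude matches `value`, or None."""
    for name, const in CONSTANTS.items():
        if math.isclose(abs(value), const, abs_tol=1e-12):
            return name
    return None


# Map each twiddle factor to the constants needed for its real and imaginary parts.
parts = {n: (name_of(W(n).real), name_of(W(n).imag)) for n in (8, 16, 24, 48)}
```

Under this assumption, three real constants are enough to build all four complex coefficients, which is consistent with the three multiplier types described for FIGS. 4A to 4C.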

A constant multiplier generates partial products by shifting the input according to the positions of the nonzero bits of the multiplication coefficient, and then adds them to obtain the result. For an N-bit coefficient the multiplier generates at most N partial products, and because most of the area is taken up by the additions of the partial products, the number of partial products must be reduced in order to reduce the area.
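A minimal sketch of such a shift-and-add constant multiplier, for a non-negative integer coefficient. The function name and the example coefficient 91 (approximately cos(π/4) scaled by 2⁷) are illustrative assumptions, not values taken from the patent's Table 1.

```python
def shift_add_multiply(x, coeff):
    """Multiply x by a non-negative integer coefficient using shifts and adds.

    Returns the product and the number of partial products generated,
    one per nonzero bit of the coefficient.
    """
    product = 0
    partial_products = 0
    position = 0
    while coeff >> position:
        if (coeff >> position) & 1:
            product += x << position  # shifted copy of the input
            partial_products += 1
        position += 1
    return product, partial_products


# 91 = 0b1011011 has five nonzero bits, so five partial products are summed.
result, count = shift_add_multiply(13, 91)
```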

One way to reduce area when designing a multiplier is therefore to express the multiplication coefficient, given in two's complement, in CSD form. CSD uses the digit set {-1, 0, 1} and cannot contain two or more consecutive nonzero digits. The resulting CSD representation has fewer nonzero digits than the two's complement representation.

Here, 1bar (a 1 with an overbar) denotes -1.
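The conversion to CSD can be sketched as follows. The routine is the standard non-adjacent-form recoding, shown here for a non-negative integer coefficient; the example value 119 is arbitrary and not taken from the patent. Digits are listed least significant first, with -1 playing the role of 1bar.

```python
def to_csd(value):
    """Recode a non-negative integer into CSD digits (-1, 0, 1), least significant first."""
    digits = []
    while value != 0:
        if value & 1:
            # Choose +1 or -1 so that the remaining value becomes divisible by 4,
            # which forces the next digit to be 0.
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def from_digits(digits):
    """Evaluate a digit list back to an integer."""
    return sum(d << i for i, d in enumerate(digits))


coeff = 119                       # 0b1110111: six nonzero bits in binary
csd = to_csd(coeff)               # 128 - 8 - 1: three nonzero digits
nonzero_binary = bin(coeff).count("1")
nonzero_csd = sum(1 for d in csd if d != 0)
```

Each nonzero digit costs one adder or subtractor in a shift-and-add multiplier, so going from six nonzero digits to three roughly halves the additions for this coefficient.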

FIG. 4A is a block diagram of the CSD complex constant multiplier for sin(π/8) according to the present invention, FIG. 4B is a block diagram of the CSD complex constant multiplier for cos(π/8), and FIG. 4C is a block diagram of the CSD complex constant multiplier for sin(π/4) = cos(π/4). As shown in the figures, multipliers are provided for sin(π/4) = cos(π/4), cos(π/8), and sin(π/8). The sin(π/8) multiplier comprises an error compensation bias (C), a half adder, and a vector merging adder; the cos(π/8) multiplier comprises error compensation biases (C1, C2), a full adder, a half adder, and a vector merging adder; and the sin(π/4) = cos(π/4) multiplier comprises an error compensation bias (C), a full adder, and a vector merging adder.

Here, to compensate efficiently for the quantization error, the truncated bits are divided into a major group and a minor group according to their influence on the quantization error. The error compensation bias is expressed by the truncated bits of the major group, and the influence of the remaining truncated bits is reflected probabilistically. The total error compensation bias is defined as C, or as C1 and C2.

The output of a general N × N-bit two's complement multiplier has (2N - 1) bits. If only an N-bit output, the same width as the input, is wanted, the usual approach is to discard the (N - 1) LSBs of the (2N - 1)-bit result and quantize to N bits. Because an FFT processor repeats digital operations many times, this approach has the advantage that an existing multiplier can be used as is, but it increases the quantization error and wastes hardware. The present design therefore fixes the output at N bits from the outset and implements an error compensation bias so that the CSD multiplier yields the smallest error.
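The effect of a compensation bias on truncation error can be illustrated with a much simpler scheme than the one described above. The sketch below multiplies every 8-bit input by an assumed coefficient (91, roughly cos(π/4) scaled by 2⁷), drops the 7 fractional LSBs, and compares plain truncation with truncation after adding a fixed half-LSB bias. The fixed bias is a stand-in for illustration only; it is not the probabilistic bias C, C1, C2 of the patent, and the word lengths are assumptions.

```python
FRAC_BITS = 7   # assumed number of fractional bits in the coefficient
COEFF = 91      # assumed integer coefficient, about cos(pi/4) * 2**7


def mean_error(bias):
    """Mean of (quantized product - exact product) over all 8-bit signed inputs."""
    total = 0.0
    inputs = range(-128, 128)
    for x in inputs:
        full = x * COEFF                         # full-precision product
        quantized = (full + bias) >> FRAC_BITS   # drop the LSBs (arithmetic shift)
        total += quantized - full / 2**FRAC_BITS
    return total / len(inputs)


truncation_error = mean_error(bias=0)                       # about -0.496 LSB
compensated_error = mean_error(bias=1 << (FRAC_BITS - 1))   # about +0.004 LSB
```

Plain truncation always rounds toward minus infinity, so its error has a systematic offset of nearly half an LSB; a bias added before truncation removes most of that offset, which matters when the operation is repeated across many FFT stages.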

FIG. 5 is a block diagram schematically showing the CSD complex constant multiplier according to the present invention. As shown in the figure, the CSD complex constant multiplier block comprises six CSD complex constant multipliers and two comparators.

Real data and imaginary data are input, and the real and imaginary results of the six CSD complex constant multipliers are output.

Table 2 shows the twiddle factor operation schedule for each data path. As shown in FIG. 5, the six CSD complex constant multipliers, of three types, are shared to perform the twiddle factor operations according to the present invention sequentially.

In this embodiment, a 128-point Radix-2⁴ fast Fourier transform processor is implemented in hardware using the parallel structure and pipelining according to the present invention.

FIG. 6 is a graph of the signal-to-noise ratio (SNR) versus the internal word length of the FFT processor according to the present invention. As shown in the figure, a fixed-point simulation is performed before hardware implementation to determine an appropriate word length.

Here, the input SNR (IN SNR) is the ratio of the signal input to the FFT to the additive white Gaussian noise (AWGN) introduced by the channel, and the output SNR (OUT SNR) is the ratio of the signal to the AWGN plus the quantization noise caused by fixed-point arithmetic.

As the figure shows, with the input SNR held fixed, increasing the word length reduces the quantization noise and raises the output SNR. The output SNR converges as it approaches the input SNR, at which point the quantization noise is negligible.
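The trend can be reproduced with a toy fixed-point experiment. The sketch below quantizes a unit-amplitude sine wave to several word lengths and reports the resulting signal-to-quantization-noise ratio. It does not model the FFT data path, the AWGN channel, or the patent's simulation setup, and the test signal and rounding scheme are assumptions.

```python
import math


def quantization_snr_db(word_length, samples=1024):
    """SNR in dB after rounding a sine wave to `word_length`-bit fixed point."""
    scale = 2 ** (word_length - 1) - 1  # largest positive code for a signed word
    signal_power = 0.0
    noise_power = 0.0
    for n in range(samples):
        x = math.sin(2 * math.pi * 37 * n / samples + 0.1)
        xq = round(x * scale) / scale   # quantize and convert back
        signal_power += x * x
        noise_power += (x - xq) ** 2
    return 10 * math.log10(signal_power / noise_power)


# Longer words leave less quantization noise, so the SNR rises with word length.
snr_by_word_length = {w: quantization_snr_db(w) for w in (8, 10, 12, 14)}
```

In a full simulation the channel noise would cap this curve: once the quantization noise falls well below the AWGN, further word-length increases no longer improve the output SNR.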

A preferred embodiment of the present invention has been described above by way of example, but the scope of the invention is not limited to this specific embodiment, and a person of ordinary skill in the art may make appropriate modifications within the scope set out in the claims.

As described above, the present invention uses two parallel data paths and a single-path delay feedback structure to increase operating speed and data throughput while reducing hardware complexity so that the design is easier to construct. By applying the Radix-2⁴ algorithm to the fast Fourier transform and handling the multiplication workload corresponding to half of the programmable complex multipliers with special complex constant multipliers designed with the CSD method, it also reduces hardware cost and power consumption.

Claims

A 128-point Radix-2⁴ fast Fourier transform processor using a parallel structure and pipelining, being a pipelined fast Fourier transform processor with two parallel data paths to which the Radix-2⁴ algorithm is applied, comprising:

a plurality of first Radix-2⁴ butterfly operation units that, when the input signals of the two parallel data paths are applied, perform butterfly operations allowing efficient data processing and memory operation and then output two parallel outputs;

a plurality of second Radix-2⁴ butterfly operation units that are connected to the stage following the first Radix-2⁴ butterfly operation units and perform butterfly operations on the output data from the first Radix-2⁴ butterfly operation units; and

six CSD complex constant multipliers of three types,

wherein the six CSD complex constant multipliers are shared sequentially to perform the twiddle factor operations in the two parallel paths.

delete

The 128-point Radix-2⁴ fast Fourier transform processor using a parallel structure and pipelining of claim 1, wherein an SDF algorithm is applied to the two parallel structures of the fast Fourier transform processor.

The 128-point Radix-2⁴ fast Fourier transform processor using a parallel structure and pipelining of claim 1, wherein the algorithm is derived by index decomposition.