RU2730174C1

RU2730174C1 - Reconfigurable fast Fourier transform computer of super-long transform length

Info

Publication number: RU2730174C1
Application number: RU2020101956A
Authority: RU
Inventors: Павел Сергеевич Поперечный; Ирина Юрьевна Поперечная; Ярослав Ярославович Петричкович; Татьяна Владимировна Солохина
Priority date: 2020-01-17
Filing date: 2020-01-17
Publication date: 2020-08-19

Abstract

FIELD: digital signal processing. SUBSTANCE: a reconfigurable fast Fourier transform computer of super-long transform length for input samples comprises two butterfly computing units, four FIFO buffers and a direct memory access (DMA) controller, which is connected to the system bus and to the control bus (Control Bus). The data output Rdata0 of DMA channel zero is connected to the coefficient data input of the coefficient buffer RfifoCoef; the upper part Coef1 of the output bus of that buffer is connected to the multiplier input of the first butterfly unit BFly1, and the lower part Coef0 of the output bus is connected to the multiplier input of the zeroth butterfly unit BFly0. EFFECT: a reconfigurable fast Fourier transform (FFT) computer of super-long transform length with conflict-free linear memory access and increased speed, obtained by optimized use of hardware resources, including memory, through a unified (single) scheme for switching values from memory to the basic butterfly computing units at all pipeline stages. 1 cl, 8 dwg

Description

The invention relates to digital signal processing (DSP), namely to reconfigurable fast Fourier transform (FFT) computers of super-long transform length, and can be used for digital signal processing in all areas of modern technology.

The fast Fourier transform (FFT) is an algorithm for fast computation of the discrete Fourier transform (DFT). It is used in both software and hardware implementations because it needs far fewer multipliers and adders than direct DFT computation. The Fourier transform, one of the principal transforms in DSP, is used in almost all areas of modern technology. Many digital standards for communications and television, measuring equipment and so on rely on the FFT.

Two FFT computation schemes are well known: decimation in frequency and decimation in time. Both require the same number of mathematical operations (the same number of hardware multipliers and adders in a hardware implementation). They differ in the ordering of either the input (time) samples or the output (frequency) samples: there is direct (natural) order and order with inverted addresses (bit-reversed order). The FFT is computed in pipeline fashion, stage by stage. The basic computational unit of an FFT scheme is the butterfly operation, which consists of complex multiplication and addition. An FFT device also includes memory blocks and a switching scheme between the memory cells of different stages. Many switching schemes exist, optimized for memory size, hardware cost or speed. The weak point of a switching scheme is memory access, because the butterfly operation reads values from different memory addresses and, once the result is computed, writes it to different addresses. The addresses depend on the chosen switching scheme and on the stage of the FFT computation. In the classical switching scheme, values are read and results are written differently from stage to stage, which makes address computation expensive in hardware. Moreover, a single-port memory generally cannot be read at two addresses in the same clock cycle, so a single memory block cannot serve one butterfly operation.

The closest analogue of the claimed invention is the unified reconfigurable fast Fourier transform switching scheme described in patent RU2700194, which contains a unified scheme for switching the butterfly nodes at different pipeline stages. This scheme is taken as the prototype of the claimed invention.

The drawback of the prototype is its high cost and low speed, which result from the lack of conflict-free memory access for sequential FFT computation that would optimize the use of hardware resources, including memory.

The technical result of the invention is a reconfigurable fast Fourier transform (FFT) computer of super-long transform length with conflict-free linear memory access, lower manufacturing cost and increased speed. This is achieved by optimizing the use of hardware resources, including memory, through a unified (single) scheme for switching values from memory to the basic butterfly computing nodes at all pipeline stages.

The stated technical result is achieved by a reconfigurable fast Fourier transform computer of super-long transform length for $N$ input samples, comprising two butterfly computing nodes (700, 701), four FIFO buffers (702, 705, 709, 710) and a direct memory access (DMA) controller (703), which is connected to the system bus (704) and to the control bus (Control Bus), wherein the data output Rdata0 of channel zero of the DMA controller (703) is connected to the coefficient data input of the coefficient buffer RfifoCoef (702), the upper part Coef1 of whose output bus is connected to the multiplier input of the first butterfly node BFly1 (701), and the lower part Coef0 of whose output bus is connected to the multiplier input of the zeroth butterfly node BFly0 (700); the write-enable input WE of the coefficient buffer RfifoCoef (702) is connected to the output WE0 of channel zero of the DMA controller (703), whose input Full0 is connected to the fullness output FullCoef of the coefficient buffer RfifoCoef (702); the read-enable input RE of the coefficient buffer RfifoCoef (702) is connected to, and configured to receive the enable signal En from, the inverting output of the NOR gate (711), whose zeroth input is connected to the empty output EmptyCoef of the coefficient buffer RfifoCoef (702), whose third input is connected to the fullness output FullW of the data write buffer Wfifo (705), and whose first and second inputs are connected to the empty outputs EmptyDown and EmptyUp of the buffer RfifoDown (709) for read data of the first DMA channel and of the buffer RfifoUp (710) for read data of the second DMA channel; the inverting output of the NOR gate (711) is connected to an input of the AND gate (706) and to the write-enable input WE of the data write buffer Wfifo (705), as well as to the read-enable inputs RE of the buffers RfifoDown (709) and RfifoUp (710), whose write-enable inputs WE are connected to the write-enable outputs WE1 and WE2 of the read channels of the DMA controller (703); the fullness inputs Full1 and Full2 of the DMA controller (703) are connected to the fullness outputs FullDown and FullUp of the buffers RfifoDown (709) and RfifoUp (710), whose read-data input buses Rdata_Down and Rdata_Up are connected to the read-data outputs Rdata1 and Rdata2 of the read channels of the DMA controller (703); the upper halves RdataH of the bits of the output data buses Rdata_Up and Rdata_Down of the buffers RfifoDown (709) and RfifoUp (710) are connected to inputs A and B of the zeroth butterfly node BFly0 (700), and the lower halves RdataL of the bits of those output buses are connected to inputs A and B of the first butterfly node BFly1 (701); the outputs Y and Z of the zeroth butterfly node BFly0 (700) are combined into a common bus and connected to the zeroth input of the multiplexer (708), whose first input is connected to the combined bus of the outputs Y and Z of the first butterfly node BFly1 (701); the output data bus of the multiplexer (708) is connected to the data input of the data write buffer Wfifo (705), whose output data bus is connected to the Wdata input of the write channel of the DMA controller (703); the Empty input of the DMA controller (703) is connected to the EmptyW output of the write buffer Wfifo (705), and its RE output is connected to the RE input of the write buffer Wfifo (705); and the output of the NAND gate (706) is connected to the input of the delay element D (707), whose output is connected to the inverting input of the NAND gate (706) and to the selector input of the multiplexer (708).

In the preferred embodiment of the computer, the butterfly computing node is of the standard type and consists of two adders and a complex multiplier. The first input A of the butterfly node is connected to the first inputs of both adders, and the output of the first adder is the first output Y of the butterfly node. The second input B of the butterfly node is connected to the second input of the first adder and to the input of a multiplier by -1, whose output is connected to the second input of the second adder. The output of the second adder is connected to the input of the complex multiplier, whose output is the second output Z of the butterfly node.

For a better understanding of the claimed invention, a detailed description with the corresponding drawings follows.

FIG. 1. FFT computation scheme with decimation in frequency, known from the prior art.

FIG. 2. Basic butterfly operation, known from the prior art: A, structural diagram; B, functional diagram.

FIG. 3. Unified FFT switching scheme with decimation in frequency (N = 8), according to the invention.

FIG. 4. FFT computation scheme with decimation in frequency (N = 16), known from the prior art.

FIG. 5. Unified FFT switching scheme with decimation in frequency (N = 16), according to the invention.

FIG. 6. Memory organization for conflict-free access when computing the FFT with decimation in frequency (N = 16), according to the invention.

FIG. 7. Diagram of the reconfigurable FFT computer of super-long transform length, according to the invention.

FIG. 8. Diagram of a system using external DDR memory, according to the invention.

Let us consider in more detail the operation of the claimed reconfigurable fast Fourier transform (FFT) computer of super-long transform length (FIGS. 1-8).

The FFT is based on the discrete Fourier transform, according to which:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad (1)$$

where $x(n)$ is the $n$-th sample of the input sequence, $n = 0, 1, \dots, N-1$; $X(k)$ is the $k$-th sample of the output spectrum, $k = 0, 1, \dots, N-1$; $N$ is the number of samples; and $W_N^{nk} = e^{-j 2\pi nk / N}$ are the DFT coefficients.
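As a point of reference for the faster schemes discussed next, expression (1) can be evaluated directly. The following is only a minimal sketch: the function name `dft` and the sign convention of the exponent (the usual forward transform) are assumptions, not taken from the patent.

```python
import cmath


def dft(x):
    """Direct O(N^2) evaluation of expression (1).

    Assumes the usual forward-transform coefficients
    W_N^(n*k) = exp(-2j*pi*n*k/N).
    """
    n_samples = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_samples)
            for n in range(n_samples))
        for k in range(n_samples)
    ]


# A unit impulse has a flat spectrum; a constant signal has only a DC term.
impulse_spectrum = dft([1, 0, 0, 0])
constant_spectrum = dft([1, 1, 1, 1])
```

Every output sample needs N complex multiplications here, which is the cost the FFT reduces.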

A conventional prior-art FFT computation scheme with decimation in frequency is shown in FIG. 1. The input samples $x(n)$ are written in order into an array of memory elements (101); the computation is then carried out along the pipeline by the basic computing element (102) that performs the butterfly operation. The number of pipeline stages (Stage0, Stage1, Stage2) is determined by the value $\log_2 N$. The number of samples $N$ is chosen to be a power of two. The switching pattern differs at each stage, and some vertices contain a multiplier (103) by the twiddle factor $W$.

The basic butterfly operation performed by element (102) is shown in FIG. 2-A. The operation of the butterfly element (102) is shown in more detail in the functional diagram (FIG. 2-B). The butterfly element (102) contains two adders (201), and a multiplier (103) by the twiddle factor is located in the lower branch of the butterfly. The butterfly operation is performed according to the following expression:

$$Y = A + B, \qquad Z = (A - B)\,W, \qquad (2)$$

where $A$ and $B$ are the pair of input samples; $Y$ and $Z$ are the pair of output complex samples; and $W$ is the complex twiddle factor.
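The butterfly operation can be sketched in a few lines, following the structure given earlier for the butterfly node (two adders, a multiplier by -1 and a complex multiplier in the lower branch). The function name `butterfly` and the sample twiddle value are illustrative assumptions.

```python
import cmath


def butterfly(a, b, w):
    """Decimation-in-frequency butterfly: Y = A + B, Z = (A - B) * W."""
    y = a + b            # first adder
    z = (a - b) * w      # second adder on A and -B, then complex multiplier
    return y, z


# Hypothetical twiddle factor W_8^1 = exp(-2j*pi/8), chosen only as an example.
w = cmath.exp(-2j * cmath.pi / 8)
y, z = butterfly(1 + 2j, 3 - 1j, w)
```

Only the lower output passes through the multiplier, which is why the node needs a single complex multiplier.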

The switching scheme shown in FIG. 1 differs at each stage, so each stage needs its own non-unified address decoder. For clarity, the black circles are labeled with numbers that show how each original sample $x(n)$ contributes to the subsequent stages and takes part in the butterfly operations. It can be seen that the contributions of the samples $x(n)$ to the last stage, that is, to the output samples $X(k)$, are numbered in reversed order when read from top to bottom.

The unified FFT switching scheme used in the claimed device is shown in FIG. 3. The butterfly node (301) is drawn asymmetrically, but its operation is still equivalent to the diagram in FIG. 2-B and to expression (2). The switching pattern is the same at every stage (Stage0, Stage1, Stage2). The contribution (the number above each black circle) of an original sample $x(n)$ to the intermediate stages differs from the conventional scheme in FIG. 1, but at the final stage the contribution to the output samples $X(k)$ is the same as in FIG. 1. The schemes in FIG. 1 and FIG. 3 are algorithmically equivalent: all computations at each stage coincide, and only the addresses used to write to and read from the memory cells (101) differ.
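The difference in addressing between the two schemes can be illustrated by listing which pairs of memory cells feed each butterfly at each stage. This is a sketch under assumptions: the conventional scheme is modeled as the textbook in-place radix-2 decimation-in-frequency flow graph, and the unified scheme is assumed to pair cell i with cell i + N/2 at every stage. The function names are hypothetical.

```python
def conventional_pairs(n, stage):
    """Input address pairs of an in-place radix-2 DIF stage (FIG. 1 style)."""
    half = n >> (stage + 1)              # distance between the two inputs
    return [(base + j, base + j + half)
            for base in range(0, n, 2 * half)
            for j in range(half)]


def unified_pairs(n, stage):
    """Assumed input address pairs of the unified scheme (FIG. 3 style).

    The stage argument is deliberately unused: the pattern never changes.
    """
    return [(i, i + n // 2) for i in range(n // 2)]


for stage in range(3):                   # N = 8 has log2(8) = 3 stages
    print(stage, conventional_pairs(8, stage), unified_pairs(8, stage))
```

In the conventional case the pair distance halves from stage to stage, so the address logic depends on the stage number; in the unified case one fixed pattern serves all stages.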

A scheme can be built in the same way for any number of samples N. FIG. 4 shows the conventional FFT computation scheme with decimation in frequency (N = 16), and FIG. 5 shows its counterpart, the unified FFT switching scheme with decimation in frequency (N = 16). From the claimed unified switching scheme (N = 8, 16) and expression (2), the following iterative expression holds for the general case (any N):

$$x_{s+1}(2i) = x_s(i) + x_s\!\left(i + \tfrac{N}{2}\right), \qquad x_{s+1}(2i+1) = \left[x_s(i) - x_s\!\left(i + \tfrac{N}{2}\right)\right] W, \qquad i = 0, 1, \dots, \tfrac{N}{2} - 1, \qquad (3)$$

where $x_s(i)$ is the value (an input sample or an intermediate value computed by the butterfly node) read from the $i$-th memory cell of the $s$-th pipeline stage; $x_{s+1}(i)$ is the value (computed by the butterfly node) written to the $i$-th memory cell of the $(s+1)$-th pipeline stage; and $W$ is the complex twiddle factor according to expression (2).
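One plausible reading of the iterative expression (3) is the constant-geometry decimation-in-frequency recursion, in which every stage reads cells i and i + N/2 and writes cells 2i and 2i + 1. The sketch below follows that reading. The twiddle exponent schedule `(i >> s) << s` is an assumption that the text does not spell out, and the names `unified_fft` and `bit_reverse` are hypothetical.

```python
import cmath


def unified_fft(x):
    """Constant-geometry radix-2 DIF FFT; output is in bit-reversed order."""
    n = len(x)
    stages = n.bit_length() - 1                 # log2(N), N a power of two
    cur = list(x)
    for s in range(stages):
        nxt = [0j] * n
        for i in range(n // 2):
            a, b = cur[i], cur[i + n // 2]      # linear read, step 1
            # Assumed twiddle schedule: keep the upper bits of i and
            # clear its s lowest bits to obtain the exponent of W_N.
            w = cmath.exp(-2j * cmath.pi * ((i >> s) << s) / n)
            nxt[2 * i] = a + b                  # linear write, step 2
            nxt[2 * i + 1] = (a - b) * w
        cur = nxt
    return cur


def bit_reverse(k, bits):
    """Reverse the lowest `bits` bits of k (bits >= 1)."""
    return int(format(k, "0{}b".format(bits))[::-1], 2)


spectrum = unified_fft([1, 0, 0, 0, 0, 0, 0, 0])   # unit impulse, N = 8
```

Under these assumptions the result at cell `bit_reverse(k, log2 N)` equals the k-th DFT sample, which matches the statement that the output ordering is the same as in the conventional decimation-in-frequency scheme.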

Often fewer samples are required for the FFT, namely […]. In that case, with the conventional prior-art FFT switching scheme with decimation in frequency, the first […] memory elements must be used for the samples and zeros must be written to the remaining memory elements. It is easy to see that the twiddle factors remain the same, since […] for […]. Thus the unified scheme (FIG. 3) likewise needs no change of twiddle factors to be reconfigured for a different number of samples. All that has to be done is to zero all unused samples $x(n)$ in the input array of memory elements (101).

To reduce hardware cost, an embodiment of the claimed invention with sequential FFT computation is used, which requires one butterfly node and two memory arrays with a capacity of $N$ samples. The best option here is a scheme with conflict-free memory access and linear addressing for both writing and reading. According to expression (3), read access to memory is linear, that is, the address is incremented by one. Write access is also linear, with the address incremented by two; however, if one memory cell stores two samples, the write address increment also becomes one.

FIG. 6 shows the memory organization for conflict-free access with linear addressing. The two memory arrays are each split in half, so that in one clock cycle two values are read from two different memories for one butterfly operation, and the result is written to the third (or fourth) memory at a single address, into the upper part of the word (rectangle with the left part shaded) and the lower part of the word (rectangle with the right part shaded). With a single butterfly node, this memory organization allows memory to be read and written in the same clock cycle without conflict, while addressing remains linear, that is, the address is incremented by one. Linear addressing greatly simplifies the address generation unit, which in turn increases the speed of the device in a hardware implementation.
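The access pattern described for FIG. 6 can be sketched as a list of per-cycle memory operations for one stage with a single butterfly node. The bank numbering (0 and 1 for the two halves of the source array, 2 and 3 for the two halves of the destination array) and the function name `stage_accesses` are assumptions made for illustration; the sketch also treats reads at sample granularity and writes at word granularity, with one word holding the two butterfly outputs.

```python
def stage_accesses(n):
    """Yield (reads, write) for every clock cycle of one FFT stage.

    reads: two (bank, address) tuples, one per source bank.
    write: one (bank, word_address) tuple; the word packs both outputs.
    Assumes n is a power of two and n >= 4.
    """
    quarter = n // 4                      # words per destination bank
    for i in range(n // 2):
        reads = ((0, i), (1, i))          # samples i and i + N/2
        write = (2 + i // quarter, i % quarter)
        yield reads, write


for reads, write in stage_accesses(16):
    print(reads, write)
```

Each cycle touches three different banks, so no bank sees two accesses at once, and every address sequence advances by one. For the next stage the roles of the source and destination arrays would be swapped.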

FIG. 7 shows the diagram of the claimed reconfigurable FFT computer of super-long transform length. The reconfigurable fast Fourier transform computer of super-long transform length for $N$ input samples comprises two butterfly computing nodes (700, 701), four FIFO buffers (702, 705, 709, 710) and a direct memory access (DMA) controller (703), which is connected to the system bus (704) and to the control bus (Control Bus).

The data output Rdata0 of channel zero of the DMA controller (703) is connected to the coefficient data input of the coefficient buffer RfifoCoef (702). The upper part Coef1 of the output bus of this buffer is connected to the multiplier input of the first butterfly node BFly1 (701), and the lower part Coef0 of the output bus is connected to the multiplier input of the zeroth butterfly node BFly0 (700). The write-enable input WE of the coefficient buffer RfifoCoef (702) is connected to the output WE0 of channel zero of the DMA controller (703). The input Full0 of channel zero of the DMA controller (703) is connected to the fullness output FullCoef of the coefficient buffer RfifoCoef (702).

The read-enable input RE of the coefficient buffer RfifoCoef (702) is connected to the inverting output of the NOR gate (711) and is configured to receive the enable signal En from it. The zeroth input of the NOR gate (711) is connected to the empty output EmptyCoef of the coefficient buffer RfifoCoef (702). The third input of the NOR gate (711) is connected to the fullness output FullW of the data write buffer Wfifo (705). The first and second inputs of the NOR gate (711) are connected to the empty outputs EmptyDown and EmptyUp of the buffer RfifoDown (709), which holds read data of the first DMA channel, and of the buffer RfifoUp (710), which holds read data of the second DMA channel. The inverting output of the NOR gate (711) is connected to an input of the AND gate (706), to the write-enable input WE of the data write buffer Wfifo (705), and to the read-enable inputs RE of the buffers RfifoDown (709) and RfifoUp (710).

The write-enable inputs WE of the buffers RfifoDown (709) and RfifoUp (710) are connected to the write-enable outputs WE1 and WE2 of the read channels of the DMA controller (703). The fullness inputs Full1 and Full2 of the DMA controller (703) are connected to the fullness outputs FullDown and FullUp of the buffers RfifoDown (709) and RfifoUp (710). The read-data input buses Rdata_Down and Rdata_Up of the buffers RfifoDown (709) and RfifoUp (710) are connected to the read-data outputs Rdata1 and Rdata2 of the read channels of the DMA controller (703).

The upper halves RdataH of the bits of the output data buses Rdata_Up and Rdata_Down of the buffers RfifoDown (709) and RfifoUp (710) are connected to inputs A and B of the zeroth butterfly node BFly0 (700). The lower halves RdataL of the bits of the same output buses are connected to inputs A and B of the first butterfly node BFly1 (701). The outputs Y and Z of the zeroth butterfly node BFly0 (700) are combined into a common bus and connected to the zeroth input of the multiplexer (708), whose first input is connected to the combined bus of the outputs Y and Z of the first butterfly node BFly1 (701). The output data bus of the multiplexer (708) is connected to the data input of the data write buffer Wfifo (705). The output data bus of the data write buffer Wfifo (705) is connected to the Wdata input of the write channel of the DMA controller (703). The Empty input of the DMA controller (703) is connected to the EmptyW output of the write buffer Wfifo (705), and its RE output is connected to the RE input of the write buffer Wfifo (705).

The output of the NAND gate (706) is connected to the input of the delay element D (707), whose output is connected to the inverting input of the NAND gate (706) and to the selector input of the multiplexer (708).
Outputs Y and Z of the zero node "butterfly" BFly0 (700) are combined into a common bus and connected to the zero input of the multiplexer (708), the first input of which is connected to the combined bus of outputs Y and Z of the first node "butterfly" BFly1 (701). The output data bus of the multiplexer (708) is connected to the data input of the data write buffer Wfifo (705). The data output line of the Wfifo write buffer (705) is connected to the Wdata input of the write channel of the DMA controller (703). The Empty input of the DMA controller (703) is connected to the EmptyW output of the Wfifo write enable buffer (705), and the RE write enable output to the RE input of the Wfifo write enable buffer (705). The output of the NAND element (706) is connected to the input of the delay element D (707), the output of which is connected to the inverse input of the NAND element (706) and to the input of the multiplexer selector (708).
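
The flow-control condition formed by the NOR gate (711) can be illustrated in software. The fragment below is a minimal sketch, assuming that all four status flags are active-high booleans; the function name `enable` and its parameter names are illustrative and do not come from the patent.

```python
def enable(empty_coef: bool, empty_down: bool, empty_up: bool, full_w: bool) -> bool:
    """Model of the NOR gate (711): En is high only when no input flag is set.

    Assumption: every flag is active-high, so True means
    "this FIFO is empty" or "this FIFO is full".
    """
    return not (empty_coef or empty_down or empty_up or full_w)


# En is asserted only when all three read FIFOs hold data and Wfifo has room.
all_ready = enable(False, False, False, False)   # True
down_empty = enable(False, True, False, False)   # False: RfifoDown is empty
write_full = enable(False, False, False, True)   # False: Wfifo is full
```

Under this reading, a single signal gates both the reads from RfifoCoef, RfifoDown and RfifoUp and the write into Wfifo, so the data path advances only when every buffer can take part in the transfer.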

The butterfly computing nodes (700, 701) are of a standard design, each consisting of two adders and a complex multiplier. The first input A of the butterfly node is connected to the first inputs of both adders, and the output of the first adder is the first output Y of the butterfly node. The second input B of the butterfly node is connected to the second input of the first adder and to the input of a multiplier by -1, whose output is connected to the second input of the second adder. The output of the second adder is connected to the input of the complex multiplier, whose output is the second output Z of the butterfly node.
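
The arithmetic performed by such a node can be written out directly. The following is a minimal sketch, assuming complex-valued samples and a twiddle factor `w` supplied on the multiplier input; the function name `butterfly` is illustrative.

```python
from typing import Tuple


def butterfly(a: complex, b: complex, w: complex) -> Tuple[complex, complex]:
    """Decimation-in-frequency butterfly: Y = A + B, Z = (A - B) * W."""
    y = a + b            # first adder
    z = (a + (-b)) * w   # multiplier by -1, second adder, complex multiplier
    return y, z


# A 2-point transform of the samples 1 and 2 uses the trivial factor W = 1.
y, z = butterfly(1 + 0j, 2 + 0j, 1 + 0j)
```

Here `y` is the sum of the two inputs and `z` is their difference, which is the 2-point DFT of the pair.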

Since the samples for one butterfly operation are read from two memory cells that together store four values, a second butterfly operation can be performed on the same pair of cells, so that no cell has to be read twice. The results of the butterfly operations are written sequentially, first across the whole of one memory range and then across the whole of the other. This memory organization makes it possible to perform FFTs of arbitrarily large transform length, provided that external memory, for example DDR, is used and accessed through the DMA controller (703).
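
One way to picture this organization is a single pass over word-packed memory. The sketch below rests on several assumptions that the text does not state explicitly: each memory word holds two samples as a `(high, low)` pair, the two read channels stream two word sequences in step, the first sequence feeds input A and the second feeds input B, and the `(Y, Z)` words of the two butterflies are written out alternately. The function name `stage_pass` and the `Word` alias are hypothetical.

```python
from typing import List, Tuple

Word = Tuple[complex, complex]  # (high half, low half): two samples per memory cell


def stage_pass(up: List[Word], down: List[Word], coef: List[Word]) -> List[Word]:
    """Process one stage so that every (up, down) word pair is read exactly once."""
    out: List[Word] = []
    for (a0, a1), (b0, b1), (w0, w1) in zip(up, down, coef):
        # BFly0 works on the high halves, BFly1 on the low halves.
        word0 = (a0 + b0, (a0 - b0) * w0)  # combined {Y, Z} bus of BFly0
        word1 = (a1 + b1, (a1 - b1) * w1)  # combined {Y, Z} bus of BFly1
        # The multiplexer alternates between the two nodes, so the results
        # land in consecutive words of the destination range.
        out.append(word0)
        out.append(word1)
    return out


# Four samples 1, 2, 3, 4 packed two per word; unit twiddle factors for simplicity.
up = [(1 + 0j, 2 + 0j)]
down = [(3 + 0j, 4 + 0j)]
coef = [(1 + 0j, 1 + 0j)]
result = stage_pass(up, down, coef)
```

Each pair of input words yields two output words, so the write stream is twice as long as either read stream and can be stored at consecutive addresses.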

The twiddle factors are either computed during the transform or stored precomputed in external memory. The transform starts once all DMA channels have been configured and the input samples have been written to the required range of external memory. The DMA controller (703) reads values from external memory over the system bus through the two data read channels and, in parallel, reads the required twiddle factor into the coefficient buffer RfifoCoef (702). When data is available in the coefficient buffer RfifoCoef (702), two butterfly operations are performed in parallel. Their results are written to the data write buffer Wfifo (705), which holds the data to be written to memory, and the DMA controller (703) then writes this data over the system bus through the write channel to the next memory range. This iteration is repeated for all samples and all stages of the transform. Once all jobs are complete, the DMA controller (703) raises an interrupt signalling the end of the transform, and the next transform is set up. The FIFO buffers (702, 705, 709, 710) are needed to match data rates and make optimal use of the system bus bandwidth, which supports burst access to data. The computer is reconfigured for a different transform length by changing the DMA configuration registers: N, the transform length (the size of the memory ranges); BASE_DATA, the starting base address of the data; BASE_W, the base address of the precomputed twiddle factors; and Stages, the number of transform stages.
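
The iteration over stages and memory ranges can be modelled in a few lines. The sketch below is an illustration only. It uses a constant-geometry decimation-in-frequency formulation, which reads two halves of the source range linearly and writes the results linearly; whether this matches the switching scheme of FIG. 5 index for index is an assumption. The `FftConfig` class, the flat `mem` list, the placement of the second range at `BASE_DATA + N` and the per-stage twiddle tables at `BASE_W` are likewise hypothetical. Only the register names N, BASE_DATA, BASE_W and Stages come from the text.

```python
import cmath
from dataclasses import dataclass
from typing import List


@dataclass
class FftConfig:
    N: int          # transform length (size of each memory range)
    BASE_DATA: int  # base address of the first data range
    BASE_W: int     # base address of the precomputed twiddle factors
    Stages: int     # number of transform stages, log2(N)


def write_twiddles(mem: List[complex], cfg: FftConfig) -> None:
    """Store one linearly readable table of N/2 factors per stage (assumed layout)."""
    half = cfg.N // 2
    for s in range(cfg.Stages):
        for j in range(half):
            k = (j >> s) << s  # twiddle exponent used by pair j at stage s
            mem[cfg.BASE_W + s * half + j] = cmath.exp(-2j * cmath.pi * k / cfg.N)


def run_fft(mem: List[complex], cfg: FftConfig) -> int:
    """Run all stages, alternating between two ranges; return the result address."""
    half = cfg.N // 2
    src, dst = cfg.BASE_DATA, cfg.BASE_DATA + cfg.N
    for s in range(cfg.Stages):
        w_base = cfg.BASE_W + s * half
        for j in range(half):
            a = mem[src + j]          # first read stream
            b = mem[src + half + j]   # second read stream
            w = mem[w_base + j]       # coefficient stream
            mem[dst + 2 * j] = a + b
            mem[dst + 2 * j + 1] = (a - b) * w
        src, dst = dst, src           # results become the next stage's input
    return src


def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of i."""
    return int(format(i, "0{}b".format(bits))[::-1], 2)


# Usage: N = 16, two data ranges in [0, 32), twiddle tables from address 32.
cfg = FftConfig(N=16, BASE_DATA=0, BASE_W=32, Stages=4)
mem = [0j] * (2 * cfg.N + cfg.Stages * cfg.N // 2)
samples = [complex(n % 5, -n) for n in range(cfg.N)]
mem[0:cfg.N] = samples
write_twiddles(mem, cfg)
out = run_fft(mem, cfg)
# In this formulation the spectrum comes out in bit-reversed order.
spectrum = [mem[out + bit_reverse(k, cfg.Stages)] for k in range(cfg.N)]
```

Storing a separate table per stage costs Stages × N/2 coefficients but keeps the coefficient stream strictly linear in this model; a single table of N/2 factors with computed indices would be the more compact alternative.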

An example of a system built around external DDR memory and the claimed reconfigurable fast Fourier transform (FFT) computer of extra-long transform length is shown in FIG. 8.

The embodiment of the claimed unified decimation-in-frequency FFT switching scheme (for N = 16) shown in FIG. 5 can be used for various purposes:

• to reduce hardware cost: a sequential, iterative scheme that requires one butterfly node and two memory arrays with a capacity of […] samples, with conflict-free memory access;

• to maximize performance: a fully parallel, pipelined scheme that requires […] butterfly nodes and memory elements (one element stores one sample);

• for application-specific tasks: a serial-parallel, iterative scheme that requires several butterfly nodes, no more than […], operating in parallel, and two memory arrays with a capacity of […] samples.

The claimed invention is a decimation-in-frequency FFT computer in which the hardware cost of the switching circuit is optimized. The computer calculates the FFT sequentially, with conflict-free, linearly addressed memory access.

The claimed FFT computer is built on a unified (single) scheme for switching values from memory to the basic butterfly computation nodes, shared by all stages of the pipeline. Because the switching scheme is the same for every stage, the computer can be optimized for the resources it uses, including memory, as well as for speed and other criteria. For example, under strict hardware-cost constraints, the claimed computer can use a direct memory access (DMA) controller.

Moreover, thanks to the single switching scheme and conflict-free linear memory access, the claimed computer can use a general-purpose DMA controller, namely one with read and write channels on the system bus. Because the DMA controller has access to the system bus, an external memory controller can be used, for example for DDR memory of whatever size is required to store the input, output and intermediate values and the twiddle factors.

An advantage of the claimed invention is the ability to perform transforms of practically unlimited length at the hardware cost of only two butterfly nodes and the matching FIFO buffers. With the FFT computation scheme used in the claimed invention, memory access is linear, which is essential for achieving maximum DDR memory performance. Given burst-mode data exchange on the system bus and linear memory access, the throughput of the claimed computer is commensurate with the arrival rate of raw data, which is essential for stream-processing tasks.

Although the embodiment described above has been set forth to illustrate the claimed invention, it will be clear to those skilled in the art that various modifications, additions and substitutions are possible without departing from the scope and spirit of the claimed invention as disclosed in the appended claims.

Claims

1. A reconfigurable fast Fourier transform computer of extra-long transform length for input samples, comprising two "butterfly" computing nodes (700, 701), four FIFO buffers (702, 705, 709, 710) and a DMA controller (703), which is connected to the system bus (704) and to the control bus Control Bus, wherein the data output Rdata0 of channel zero of the DMA controller (703) is connected to the coefficient data input of the coefficient buffer RfifoCoef (702), the upper part Coef1 of the output bus of which is connected to the multiplier input of the first butterfly node BFly1 (701), and the lower part Coef0 of the output bus of which is connected to the multiplier input of the zeroth butterfly node BFly0 (700); the write-enable input WE of the coefficient buffer RfifoCoef (702) is connected to the output WE0 of channel zero of the DMA controller (703), the input Full0 of which is connected to the full-flag output FullCoef of the coefficient buffer RfifoCoef (702), the read-enable input RE of which is connected to, and configured to receive the enable signal En from, the inverting output of the NOR gate (711), input zero of which is connected to the empty-flag output EmptyCoef of the coefficient buffer RfifoCoef (702), the third input of which is connected to the full-flag output FullW of the data write buffer Wfifo (705), and the first and second inputs of which are connected to the empty-flag outputs EmptyDown and EmptyUp of the buffer RfifoDown (709) for read data of the first DMA channel and of the buffer RfifoUp (710) for read data of the second DMA channel; the inverting output of the NOR gate (711) is connected to an input of the AND gate (706), to the write-enable input WE of the data write buffer Wfifo (705), and to the read-enable inputs RE of the buffer RfifoDown (709) for read data of the first DMA channel and of the buffer RfifoUp (710) for read data of the second DMA channel, the write-enable inputs WE of which are connected to the write-enable outputs WE1 and WE2 of the read channels of the DMA controller (703), the full-flag inputs Full1 and Full2 of which are connected to the full-flag outputs FullDown and FullUp of the buffer RfifoDown (709) and of the buffer RfifoUp (710), the read-data input buses Rdata_Down and Rdata_Up of which are connected to the read-data outputs Rdata1 and Rdata2 of the read channels of the DMA controller (703); the upper halves RdataH of the output data buses Rdata_Up and Rdata_Down of the buffer RfifoDown (709) and of the buffer RfifoUp (710) are connected to inputs A and B of the zeroth butterfly node BFly0 (700), and the lower halves RdataL of the output buses Rdata_Up and Rdata_Down of the buffer RfifoDown (709) and of the buffer RfifoUp (710) are connected to inputs A and B of the first butterfly node BFly1 (701); outputs Y and Z of the zeroth butterfly node BFly0 (700) are combined into a common bus and connected to input zero of the multiplexer (708), the first input of which is connected to the combined bus of outputs Y and Z of the first butterfly node BFly1 (701); the output data bus of the multiplexer (708) is connected to the data input of the data write buffer Wfifo (705), the output bus of which is connected to the input Wdata of the write channel of the DMA controller (703), the input Empty of which is connected to the empty-flag output EmptyW of the data write buffer Wfifo (705), and the read-enable output RE of which is connected to the input RE of the data write buffer Wfifo (705); and the output of the NAND gate (706) is connected to the input of the delay element D (707), the output of which is connected to the inverting input of the NAND gate (706) and to the select input of the multiplexer (708).

2. The computer according to claim 1, characterized in that each butterfly computing node is of a standard design and consists of two adders and a complex multiplier, wherein the first input A of the butterfly node is connected to the first inputs of both adders, the output of the first adder is the first output Y of the butterfly node, and the second input B of the butterfly node is connected to the second input of the first adder and to the input of a multiplier by -1, the output of which is connected to the second input of the second adder, the output of which is connected to the input of the complex multiplier, the output of which is the second output Z of the butterfly node.