WO2018012828A1 - Multifunction computing device and fast Fourier transform computing device - Google Patents

Multifunction computing device and fast Fourier transform computing device

Info

Publication number
WO2018012828A1
Authority
WO
WIPO (PCT)
Prior art keywords
butterfly
output
mac
multiplier
input
Application number
PCT/KR2017/007358
Other languages
English (en)
Korean (ko)
Inventor
김태형
Original Assignee
김태형
Priority claimed from KR1020160088659A (KR101842937B1)
Priority claimed from KR1020160156445A (KR101859294B1)
Application filed by 김태형
Priority to US16/317,886 (US10949493B2)
Priority to CN201780043429.0A (CN109496306B)
Publication of WO2018012828A1


Classifications

    • G06F7/5443: Sum of products
    • G06F12/02: Addressing or allocation; Relocation
    • G06F12/0215: Addressing or allocation; Relocation with look ahead addressing means
    • G06F17/14: Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/142: Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/3001: Arithmetic instructions
    • G06F9/30043: LOAD or STORE instructions; Clear instruction
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The technology described below relates to a multifunction computing device and a fast Fourier transform (FFT) computing device.
  • FFT: fast Fourier transform
  • A MAC (multiply-accumulate) circuit has a multiplier and an accumulator connected to the output of the multiplier.
  • MAC circuits are used in a variety of applications, including finite impulse response (FIR) filters, infinite impulse response (IIR) filters, fast Fourier transforms (FFTs), and inverse fast Fourier transforms (IFFTs).
  • FIR filter: finite impulse response filter
  • IIR filter: infinite impulse response filter
  • FFTs: fast Fourier transforms
  • IFFTs: inverse fast Fourier transforms
  • MAC circuits were initially used in digital signal processors (DSPs), but are now also commonly used in general-purpose processors (GPPs).
  • DSP: digital signal processor
  • GPP: general-purpose processor
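As a software illustration only (not the patented hardware), the sketch below shows why FIR filtering is a typical MAC application: each output sample y[n] = sum over k of h[k]·x[n-k] is a chain of multiply-accumulate steps into one accumulator. The function names `mac_fir` and `fir_filter` are hypothetical, and treating samples before the start of the signal as zero is an assumption of the sketch.

```python
def mac_fir(coeffs, window):
    """One FIR output sample y = sum(h[k] * x[n-k]) built from MAC steps.

    `window` holds the most recent inputs, newest first: window[k] == x[n-k].
    """
    if len(coeffs) != len(window):
        raise ValueError("coeffs and window must have the same length")
    acc = 0  # accumulator, cleared before each output sample
    for h, x in zip(coeffs, window):
        acc += h * x  # one multiply-accumulate step
    return acc


def fir_filter(coeffs, signal):
    """Filter a whole signal, treating samples before the start as zero."""
    taps = len(coeffs)
    padded = [0] * (taps - 1) + list(signal)
    # padded[n:n + taps] is x[n-taps+1] .. x[n]; reverse it so the newest comes first.
    return [mac_fir(coeffs, padded[n:n + taps][::-1]) for n in range(len(signal))]


print(fir_filter([1, 2], [1, 2, 3]))  # y[n] = x[n] + 2*x[n-1] -> [1, 4, 7]
```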
  • Prior art: Korean Patent No. 10-0835173, "Digital Signal Processing Apparatus and Method for Multiply Accumulation Operations."
  • In the prior art, the control unit must fetch an instruction from program memory every clock cycle and deliver it to the data address generator. This places a heavy load on the control unit and lowers the efficiency of the entire system.
  • The present disclosure aims to solve these problems of the prior art by providing a multifunction computing device and an FFT computing device in which the control unit does not need to fetch an instruction from program memory every clock cycle.
  • The present disclosure also provides a multifunction computing device and an FFT computing device that can use a plurality of MAC circuits simultaneously while minimizing the required memory capacity.
  • In one aspect, a multifunction computing device includes: a MAC unit including a plurality of multiply-accumulate circuits; an address generator that generates a read address group, which includes a plurality of read addresses, and transfers it to a memory; and the memory, which includes a plurality of banks storing a plurality of read data groups and transfers to the MAC unit the read data group, among the plurality of read data groups, that corresponds to the read address group, the read data group including a plurality of read data.
  • In another aspect, an FFT computing device includes: a multiplier having a plurality of multiplication circuits; an address generator that generates a read address group, which includes a plurality of read addresses, and transfers it to a memory; and the memory, which includes a plurality of banks storing a plurality of read data groups and transfers to the multiplier the read data group, among the plurality of read data groups, that corresponds to the read address group, the read data group including a plurality of read data.
  • In the multifunction computing device and the FFT computing device according to the present disclosure, the address generator includes a lookup table or a state machine and therefore generates addresses without intervention of the controller, which reduces the controller's load.
  • The multifunction computing device and the FFT computing device also reduce the amount of memory required by storing data in a predetermined order so that no collisions occur among the plurality of banks.
  • FIG. 1 is a diagram illustrating a multifunction computing device according to an embodiment.
  • FIGS. 2 to 10 are diagrams for describing the operation when the multifunction computing device illustrated in FIG. 1 includes eight MAC circuits and performs a 16-point fast Fourier transform (FFT) operation.
  • FIGS. 11 to 14 are diagrams for describing the operation when the multifunction computing device includes eight MAC circuits and performs an FIR operation.
  • FIG. 15 is a diagram illustrating an FFT computing device according to an embodiment.
  • FIGS. 16 to 24 are diagrams for describing the operation when the FFT computing device illustrated in FIG. 15 includes eight multiplication circuits and performs a 16-point FFT operation.
  • Terms such as first, second, A, and B may be used to describe various components, but the components are not limited by these terms; the terms are used only to distinguish one component from another. For example, a first component may be referred to as a second component, and similarly a second component may be referred to as a first component, without departing from the scope of the technology described below.
  • The processes constituting a method may occur in an order different from the stated order unless the context clearly indicates a specific order. That is, the processes may occur in the stated order, substantially simultaneously, or in the reverse order.
  • The multifunction computing device includes a MAC unit 110, an address generator 120, a memory 130, a read mapper 140, a write mapper 150, and a controller 160.
  • The MAC unit 110 includes a plurality of MAC circuits 111 and an arithmetic unit 116.
  • Each of the plurality of MAC circuits 111 includes a multiplier 112 and an accumulator 115.
  • The accumulator 115 accumulates the output of the multiplier 112.
  • The accumulator 115 includes a summer 113 and a register 114.
  • Depending on the embodiment, the accumulator 115 or the register 114 may be omitted.
  • A MAC circuit 111 without the register 114 is, strictly speaking, a multiply-add circuit, but in the present invention the multiply-add circuit is considered to fall within the scope of the MAC circuit 111.
  • A MAC circuit 111 without the accumulator 115 is, strictly speaking, a multiplication circuit, but in the present invention the multiplication circuit is also considered to fall within the scope of the MAC circuit 111. That is, the MAC circuit 111 of the present invention is a MAC circuit in the broad sense, covering not only multiply-accumulate circuits but also multiply-add circuits and multiplication circuits.
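A minimal behavioural sketch of the broad-sense MAC circuit 111, assuming a simple mode switch: "mac" keeps the register 114, "madd" omits it so the summer 113 adds the product to an external operand, and "mul" omits the accumulator 115 entirely. The class name, the mode strings, and the `addend` operand are hypothetical; the text does not specify where the multiply-add variant takes its second summand from.

```python
class MacCircuit:
    """Behavioural sketch of one broad-sense MAC circuit 111.

    mode "mac":  multiplier 112 + summer 113 + register 114 (multiply-accumulate)
    mode "madd": register omitted; the summer adds the product to an external addend
    mode "mul":  accumulator omitted; the output is the bare product
    """

    MODES = ("mac", "madd", "mul")

    def __init__(self, mode="mac"):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode
        self.register = 0  # register 114, used only in "mac" mode

    def reset(self):
        """Clear the accumulator register (e.g. between output samples)."""
        self.register = 0

    def step(self, a, b, addend=0):
        """Perform one clock's worth of work on operands a and b."""
        product = a * b  # multiplier 112
        if self.mode == "mul":
            return product
        if self.mode == "madd":
            return product + addend  # summer 113 with no feedback register
        self.register += product  # summer 113 feeding back through register 114
        return self.register
```

For example, calling `step(2, 3)` and then `step(4, 5)` on a "mac" instance returns 6 and then 26, while a "mul" instance returns the bare products 6 and 20.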
  • The arithmetic unit 116 performs at least one of sum, difference, accumulation, and shift operations on the outputs of the plurality of MAC circuits 111.
  • The arithmetic unit 116 outputs one or more MAC outputs Mout1 to MoutC (C an integer) obtained as a result of the at least one operation.
  • The arithmetic unit 116 may additionally output a flag FL indicating a final operation result.
  • The operation performed by the arithmetic unit 116 depends on the computation (e.g., FFT or FIR) that the multifunction computing device is to perform. The arithmetic unit 116 therefore changes its operation according to the arithmetic control signal ACS transmitted from the controller 160.
  • The address generator 120 generates a read address group RAG and transfers it to the memory 130.
  • The read address group RAG includes a plurality of read addresses.
  • The address generator 120 generates a write address group WAG and transfers it to the memory 130.
  • The write address group WAG includes a plurality of write addresses.
  • The address generator 120 generates a read mapping value RMV and a write mapping value WMV and transfers them to the read mapper 140 and the write mapper 150, respectively.
  • The address generator 120 includes, for example, a counter 122 and a lookup table 124.
  • The counter 122 outputs a value that changes with the clock signal CK. In one example, the counter 122 outputs an integer that increments with the clock signal CK.
  • The lookup table 124 stores a plurality of read address groups and outputs the read address group RAG selected by the value of the counter 122. Likewise, it stores a plurality of write address groups and outputs the write address group WAG selected by the value of the counter 122.
  • Alternatively, the write address group WAG may be obtained by delaying the read address group RAG.
  • The lookup table 124 also stores a plurality of read mapping values and outputs the read mapping value RMV selected by the value of the counter 122.
  • The lookup table 124 also stores a plurality of write mapping values and outputs the write mapping value WMV selected by the value of the counter 122.
  • Alternatively, the address generator 120 may include a state machine (not shown).
  • The state machine generates the read address group RAG, the write address group WAG, the read mapping value RMV, and the write mapping value WMV according to the clock signal.
  • RAG: read address group
  • WAG: write address group
  • RMV: read mapping value
  • WMV: write mapping value
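The counter-plus-lookup-table scheme can be sketched as follows, to show how addresses keep flowing every clock once the table is loaded, with no controller involvement. This is an illustrative model under assumptions: the table layout (one entry of RAG, RMV, WMV per counter value), the class name `LutAddressGenerator`, and the fixed `latency` are hypothetical. The write address group is produced by delaying the read address group, which is one of the options the text allows, and the write mapping value is delayed with it so that both refer to the same group. The example read addresses 1 to 4 and 7 to 10 come from the FFT example in this document; the mapping values are made up.

```python
from collections import deque


class LutAddressGenerator:
    """Behavioural sketch of a counter-driven lookup-table address generator.

    Each table entry is (read_address_group, read_mapping_value, write_mapping_value).
    The write address group is the read address group delayed by `latency` clocks,
    and the write mapping value is delayed with it so both describe the same group.
    """

    def __init__(self, table, latency=1):
        self.table = table
        self.counter = 0  # models counter 122: advances once per clock CK
        # Delay line for (write addresses, write mapping value); None = no write yet.
        self.delay = deque([(None, None)] * latency)

    def clock(self):
        """Advance one clock; return (RAG, WAG, RMV, WMV) for this cycle."""
        if self.counter < len(self.table):
            rag, rmv, wmv = self.table[self.counter]
        else:
            rag, rmv, wmv = None, None, None  # table exhausted: no new reads
        self.counter += 1
        self.delay.append((rag, wmv))
        wag, delayed_wmv = self.delay.popleft()
        return rag, wag, rmv, delayed_wmv


gen = LutAddressGenerator([((1, 2, 3, 4), 0, 0), ((7, 8, 9, 10), 1, 1)], latency=1)
for _ in range(3):
    print(gen.clock())
```

With `latency=1`, the first clock issues read addresses (1, 2, 3, 4) with no write yet, the second issues (7, 8, 9, 10) while writing back to (1, 2, 3, 4), and the third only writes to (7, 8, 9, 10).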
  • The memory 130 includes a plurality of banks 132, so it can read or write several data items at the same time. For example, when the memory 130 includes four banks 132, it can read or write four data items at the same time, provided the four data items are located in different banks 132.
  • The memory 130 may be, for example, a dual-port memory, in which case it can perform a write operation and a read operation simultaneously. For example, when the memory 130 includes four banks 132, it can read four data items and write four data items at the same time, provided the four data items being read are located in different banks 132 and the four data items being written are located in different banks 132.
  • The plurality of banks 132 store a plurality of read data groups.
  • The memory 130 transfers to the MAC unit 110 the read data group RDG, among the plurality of read data groups, that corresponds to the read address group RAG.
  • The read data group RDG includes a plurality of read data, each of which may be a complex number, a real number, or an integer.
  • The plurality of read data are output from different banks 132.
  • For example, the first to fourth read data may be output from the first to fourth banks, respectively.
  • The plurality of read data may be located in the same row.
  • For example, the first to fourth read data may be the first data of the first to fourth banks, respectively.
  • The plurality of read data may also be located in different rows.
  • For example, the first and third read data may be the fifth data of the first and third banks, respectively, and the second and fourth read data may be the sixth data of the second and fourth banks, respectively.
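The rule that parallel reads must come from different banks, whatever their rows, reduces to a one-line check. In this sketch each read is described by a hypothetical (bank, row) pair and the function name is also hypothetical; both examples from the text (same row, and mixed rows 5 and 6) pass, while two reads from one bank collide.

```python
def has_bank_conflict(locations):
    """Return True if two reads in one group target the same bank.

    `locations` is a list of (bank, row) pairs, one per read in the group.
    Reads in different banks can proceed in the same clock even when their
    rows differ; two reads in the same bank cannot.
    """
    banks = [bank for bank, _row in locations]
    return len(banks) != len(set(banks))


print(has_bank_conflict([(1, 1), (2, 1), (3, 1), (4, 1)]))  # same row: no conflict
print(has_bank_conflict([(1, 5), (2, 6), (3, 5), (4, 6)]))  # mixed rows: no conflict
print(has_bank_conflict([(1, 5), (1, 6)]))                  # same bank: conflict
```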
  • The plurality of banks 132 store one or more write data groups.
  • The memory 130 stores the write data group WDG at the locations corresponding to the write address group WAG.
  • The write data group WDG includes one or more write data.
  • Each of the write data may be a complex number, a real number, or an integer.
  • The write data are stored in different banks 132.
  • For example, the first to fourth write data may be stored in the first to fourth banks, respectively.
  • The write data may be stored in the same row.
  • For example, the first to fourth write data may be stored at the first positions of the first to fourth banks, respectively.
  • The write data may also be stored in different rows.
  • For example, the first and third write data may be stored at the fifth positions of the first and third banks, respectively, and the second and fourth write data may be stored at the sixth positions of the second and fourth banks, respectively.
  • The read mapper 140 maps the plurality of read data to a plurality of MAC inputs Min1 to MinB (B an integer) according to the read mapping value RMV.
  • The write mapper 150 maps the one or more MAC outputs Mout1 to MoutC to the one or more write data according to the write mapping value WMV.
  • The controller 160 stores the initial read data groups in the memory 130, stores the plurality of read address groups in the lookup table 124, and then starts the address generator 120.
  • The controller 160 places the initial read data groups in the memory so that, even though the plurality of MAC circuits 111 operate simultaneously, the read data of each group are output from different banks among the plurality of banks 132.
  • The controller 160 transmits to the arithmetic unit 116 the arithmetic control signal ACS corresponding to the operation to be performed by the multifunction computing device.
  • The controller 160 is involved mainly in the initial setup of the multifunction computing device and is involved rarely or not at all while the device performs an operation (e.g., FFT or FIR). During the operation, the work is driven mainly by the address generator 120, which reduces the burden on the controller 160.
  • The controller 160 may be, for example, a CPU.
  • The multifunction computing device shown in FIG. 1 may perform an FFT operation.
  • FIGS. 2 to 10 are diagrams for describing the operation when the multifunction computing device illustrated in FIG. 1 includes eight MAC circuits and performs a 16-point fast Fourier transform (FFT) operation.
  • FIG. 2 illustrates a radix-2 decimation-in-time (DIT) operation as an example of a 16-point FFT.
  • The 16-point FFT has four stages, each of which performs eight butterfly operations.
  • The 16-point FFT has 16 inputs X(1) to X(16) and 16 outputs Y(1) to Y(16).
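For reference, a textbook iterative radix-2 decimation-in-time FFT is sketched below. It does not reproduce the data flow of the patent's figures (it uses bit-reversed input ordering, an assumption of this sketch, and the function names are hypothetical), but it confirms the operation count stated above: a 16-point transform runs log2(16) = 4 stages of 16/2 = 8 butterflies each.

```python
import cmath


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index` (e.g. 0b0001 -> 0b1000 for bits=4)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def fft_radix2_dit(x):
    """Iterative radix-2 decimation-in-time FFT.

    Returns (spectrum, butterflies_per_stage).
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    data = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    butterflies_per_stage = []
    size = 2
    while size <= n:  # one pass of this loop is one FFT stage
        half = size // 2
        step = cmath.exp(-2j * cmath.pi / size)  # principal twiddle for this stage
        count = 0
        for start in range(0, n, size):
            for k in range(half):
                w = step ** k  # twiddle factor for this butterfly
                x1, x2 = data[start + k], data[start + k + half]
                data[start + k] = x1 + w * x2         # first butterfly output y1
                data[start + k + half] = x1 - w * x2  # second butterfly output y2
                count += 1
        butterflies_per_stage.append(count)
        size *= 2
    return data, butterflies_per_stage


spectrum, counts = fft_radix2_dit([complex(i, -i) for i in range(16)])
print(counts)
```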
  • FIG. 3 briefly illustrates a butterfly operation. The butterfly receives first and second butterfly inputs x1 and x2 and a twiddle factor w (also called a rotation factor below), and outputs first and second butterfly outputs y1 and y2.
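The butterfly arithmetic is not written out at this point. For a radix-2 DIT butterfly the standard relations are as follows; the four real products in the complex multiplication w·x2 are what the four MAC circuits of one butterfly circuit compute, and the MAC-output definitions given in the text are consistent with these formulas.

```latex
\begin{aligned}
y_1 &= x_1 + w\,x_2, & y_2 &= x_1 - w\,x_2,\\
\operatorname{Re}(w\,x_2) &= x_2[R]\,w[R] - x_2[I]\,w[I], &
\operatorname{Im}(w\,x_2) &= x_2[R]\,w[I] + x_2[I]\,w[R],\\
y_1[R] &= x_1[R] + \operatorname{Re}(w\,x_2), & y_2[R] &= x_1[R] - \operatorname{Re}(w\,x_2),\\
y_1[I] &= x_1[I] + \operatorname{Im}(w\,x_2), & y_2[I] &= x_1[I] - \operatorname{Im}(w\,x_2).
\end{aligned}
```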
  • The MAC unit 110 includes a first butterfly circuit 410 and a second butterfly circuit 470.
  • The first butterfly circuit 410 includes first to fourth MAC circuits 420, 430, 440, and 450 and a first arithmetic unit 460.
  • Each of the first to fourth MAC circuits 420, 430, 440, and 450 includes a multiplier 112 and an accumulator 115.
  • Here the accumulator 115 is not used: the register 114 in the accumulator 115 is held in the reset state and outputs zero.
  • The first MAC circuit 420 outputs the product of the first and second MAC inputs Min1 and Min2.
  • The second MAC circuit 430 outputs the product of the third and fourth MAC inputs Min3 and Min4.
  • The third MAC circuit 440 outputs the product of the fifth and sixth MAC inputs Min5 and Min6.
  • The fourth MAC circuit 450 outputs the product of the seventh and eighth MAC inputs Min7 and Min8.
  • The first arithmetic unit 460 outputs the first to fourth MAC outputs Mout1 to Mout4.
  • The first output Mout1 is the ninth MAC input Min9 plus the output of the first MAC circuit 420 minus the output of the second MAC circuit 430.
  • The second output Mout2 is Min9 minus the output of the first MAC circuit 420 plus the output of the second MAC circuit 430.
  • The third output Mout3 is the tenth MAC input Min10 plus the outputs of the third and fourth MAC circuits 440 and 450.
  • The fourth output Mout4 is Min10 minus the outputs of the third and fourth MAC circuits 440 and 450.
  • The first arithmetic unit 460 includes first to sixth adders 461 to 466.
  • The first adder 461 subtracts the output of the second MAC circuit 430 from the output of the first MAC circuit 420.
  • The second adder 462 adds the output of the fourth MAC circuit 450 to the output of the third MAC circuit 440.
  • The third adder 463 adds the output of the first adder 461 to the ninth MAC input Min9.
  • The fourth adder 464 subtracts the output of the first adder 461 from the ninth MAC input Min9.
  • The fifth adder 465 adds the output of the second adder 462 to the tenth MAC input Min10.
  • The sixth adder 466 subtracts the output of the second adder 462 from the tenth MAC input Min10.
  • The second butterfly circuit 470 receives the eleventh to twentieth MAC inputs Min11 to Min20 and outputs the fifth to eighth MAC outputs Mout5 to Mout8. Since its configuration is the same as that of the first butterfly circuit 410, a detailed description is omitted.
  • The real part x1[R] and imaginary part x1[I] of the first butterfly input x1 are input to the ninth and tenth MAC inputs Min9 and Min10, respectively.
  • The real part x2[R] of the second butterfly input x2 is input to the first and fifth MAC inputs Min1 and Min5.
  • The imaginary part x2[I] of the second butterfly input x2 is input to the third and seventh MAC inputs Min3 and Min7.
  • The real part w[R] of the rotation factor w is input to the second and eighth MAC inputs Min2 and Min8.
  • The imaginary part w[I] of the rotation factor w is input to the fourth and sixth MAC inputs Min4 and Min6.
  • The first MAC output Mout1 corresponds to the real part of the first butterfly output y1.
  • The second MAC output Mout2 corresponds to the real part of the second butterfly output y2.
  • The third MAC output Mout3 corresponds to the imaginary part of the first butterfly output y1.
  • The fourth MAC output Mout4 corresponds to the imaginary part of the second butterfly output y2.
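Putting the input assignment and the adder network together, the first butterfly circuit 410 can be modelled in a few lines. The sketch assumes floating-point complex values and the function name `butterfly_circuit` is hypothetical; it lets you check the wiring against the direct complex formulas y1 = x1 + w·x2 and y2 = x1 - w·x2.

```python
def butterfly_circuit(x1, x2, w):
    """Sketch of butterfly circuit 410: four real multiplies, six adds."""
    # Operand placement on MAC inputs Min1..Min10, as described in the text.
    min1, min2 = x2.real, w.real  # first MAC circuit 420
    min3, min4 = x2.imag, w.imag  # second MAC circuit 430
    min5, min6 = x2.real, w.imag  # third MAC circuit 440
    min7, min8 = x2.imag, w.real  # fourth MAC circuit 450
    min9, min10 = x1.real, x1.imag
    # Accumulators held in reset, so each MAC circuit outputs a plain product.
    p1, p2, p3, p4 = min1 * min2, min3 * min4, min5 * min6, min7 * min8
    a1 = p1 - p2        # first adder 461: Re(w * x2)
    a2 = p3 + p4        # second adder 462: Im(w * x2)
    mout1 = min9 + a1   # third adder 463  -> Re(y1)
    mout2 = min9 - a1   # fourth adder 464 -> Re(y2)
    mout3 = min10 + a2  # fifth adder 465  -> Im(y1)
    mout4 = min10 - a2  # sixth adder 466  -> Im(y2)
    return complex(mout1, mout3), complex(mout2, mout4)


x1, x2, w = 1 + 2j, 3 - 1j, 0.5 + 0.25j
y1, y2 = butterfly_circuit(x1, x2, w)
print(y1, y2, y1 == x1 + w * x2, y2 == x1 - w * x2)
```

For these inputs w·x2 = 1.75+0.25j, so the circuit returns y1 = 2.75+2.25j and y2 = -0.75+1.75j, matching the direct complex computation.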
  • The memory 130 includes first to sixth banks 510 to 560.
  • The first to fourth banks 510 to 540 are, for example, dual-port memories and can together perform four reads and four writes simultaneously.
  • The fifth and sixth banks 550 and 560 are, for example, single-port memories and can together perform two reads simultaneously.
  • The first to fourth banks 510 to 540 output the first to fourth butterfly inputs X1 to X4 corresponding to the first to fourth butterfly input addresses XA1 to XA4, respectively.
  • The first to fourth banks 510 to 540 receive the first to fourth butterfly outputs Y1 to Y4 corresponding to the first to fourth butterfly output addresses YA1 to YA4, respectively.
  • The fifth and sixth banks 550 and 560 output the first and second rotation factors W1 and W2 corresponding to the first and second rotation factor addresses WA1 and WA2, respectively.
  • The first to fourth butterfly input addresses XA1 to XA4 and the first and second rotation factor addresses WA1 and WA2 correspond to the read address group RAG of FIG. 1. That is, the read address group RAG includes XA1 to XA4 and WA1 and WA2 as its plurality of read addresses.
  • The first to fourth butterfly output addresses YA1 to YA4 correspond to the write address group WAG of FIG. 1. That is, the write address group WAG includes YA1 to YA4 as its plurality of write addresses.
  • The first to fourth butterfly inputs X1 to X4 and the first and second rotation factors W1 and W2 correspond to the read data group RDG of FIG. 1.
  • That is, the read data group RDG includes X1 to X4 and W1 and W2 as its plurality of read data.
  • The first to fourth butterfly outputs Y1 to Y4 correspond to the write data group WDG of FIG. 1. That is, the write data group WDG includes Y1 to Y4 as its plurality of write data.
  • The memory 130 stores the initial read data groups X(1) to X(16) and W(1) to W(8) in a predetermined order so that no collisions occur among the plurality of banks 510 to 560 during the FFT operation.
  • The initial read data groups X(1) to X(16) and W(1) to W(8) are values stored in the memory 130 before the FFT operation, for example by the controller 160. In the figure, "1 / X(1)" means that X(1) is stored at address 1, and "5 / W(1)" means that W(1) is stored at address 5.
  • The 16-point FFT inputs X(1) to X(16) would normally be stored sequentially, from X(1) to X(16), but in the present embodiment they are stored in the predetermined order X(1), X(2), X(3), X(4), X(7), X(8), X(5), X(6), X(11), X(12), X(9), X(10), X(13), X(14), X(15), X(16).
  • The predetermined order is not fully sequential but is sequential at the row level: X(1) to X(4) are located in the first row, X(5) to X(8) in the second row, X(9) to X(12) in the third row, and X(13) to X(16) in the fourth row.
  • The predetermined order is obtained in advance through simulation so that no collisions occur among the banks 510 to 540 during the FFT operation.
  • A collision among the banks 510 to 540 means that two or more of the first to fourth butterfly inputs X1 to X4 must be read from the same bank at the same time.
  • The predetermined order may be determined by repeatedly changing the positions of some of the initial FFT inputs X(1) to X(16).
  • The eight rotation factors W(1) to W(8) would normally be stored sequentially, from W(1) to W(8), but in the present embodiment they are stored in the predetermined order W(1), W(2), W(4), W(3), W(6), W(5), W(7), W(8).
  • This order is likewise sequential at the row level: W(1) and W(2) are located in the first row, W(3) and W(4) in the second row, W(5) and W(6) in the third row, and W(7) and W(8) in the fourth row.
  • The predetermined order is obtained in advance through simulation so that no collisions occur between the banks 550 and 560 during the FFT operation.
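The placement above can be checked mechanically. This sketch relies on an address-to-bank rule that is inferred from the example addresses in this document, not stated in it: each row spans six consecutive addresses, four for the X data in banks 510 to 540 followed by two for the rotation factors in banks 550 and 560, so that X(1) to X(4) occupy addresses 1 to 4, W(1) address 5, and W(5) address 18. The names `locate`, `build_memory`, and `conflict_free` are hypothetical. The address groups tested are the ones the text gives for the butterfly and rotation factor lookup tables.

```python
# Assumed layout (inferred from the example addresses, not stated in the text):
# each row spans six consecutive 1-based addresses -> four X banks, two W banks.
BANKS_PER_ROW = 6

# Predetermined storage orders from the text (indices of X(.) and W(.)).
X_ORDER = [1, 2, 3, 4, 7, 8, 5, 6, 11, 12, 9, 10, 13, 14, 15, 16]
W_ORDER = [1, 2, 4, 3, 6, 5, 7, 8]


def locate(address):
    """Return (bank index 0-5, row index) for a 1-based address."""
    return (address - 1) % BANKS_PER_ROW, (address - 1) // BANKS_PER_ROW


def build_memory():
    """Place X(1)..X(16) and W(1)..W(8) row by row in the predetermined order."""
    memory = {}
    for row in range(4):
        base = row * BANKS_PER_ROW
        for col in range(4):  # banks 510-540
            memory[base + col + 1] = f"X({X_ORDER[4 * row + col]})"
        for col in range(2):  # banks 550-560
            memory[base + 4 + col + 1] = f"W({W_ORDER[2 * row + col]})"
    return memory


def conflict_free(addresses):
    """True if no two addresses in one read group fall in the same bank."""
    banks = [locate(a)[0] for a in addresses]
    return len(banks) == len(set(banks))


memory = build_memory()
# Butterfly input address groups XA1-XA4 for the first four cycles, from the text.
for group in [(1, 2, 3, 4), (7, 8, 9, 10), (13, 14, 15, 16), (19, 20, 21, 22)]:
    print([memory[a] for a in group], conflict_free(group))
# Rotation factor addresses WA1, WA2 = 5, 18, from the text.
print([memory[a] for a in (5, 18)], conflict_free((5, 18)))
```

Under this assumed layout the printed groups reproduce the text's examples (for instance X(7), X(8), X(5), X(6) at addresses 7 to 10), and every group reads from distinct banks.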
  • The lookup table 124 of the address generator 120 includes a butterfly lookup table 610, a rotation factor lookup table 620, a read mapping value lookup table 630, and a write mapping value lookup table 640. The address generator 120 further includes a register 650.
  • For example, after the controller 160 loads the required values into the butterfly lookup table 610, the rotation factor lookup table 620, the read mapping value lookup table 630, and the write mapping value lookup table 640, it starts the counter 122.
  • The butterfly lookup table 610 outputs the butterfly input addresses XA1 to XA4 corresponding to the output value of the counter 122.
  • The register 650 outputs the butterfly output addresses YA1 to YA4, which are the butterfly input addresses XA1 to XA4 delayed by one or more clock cycles.
  • This delay corresponds to the time required from when the butterfly inputs X1 to X4 are output from the memory 130 until the corresponding butterfly outputs Y1 to Y4 are input back to the memory 130.
  • That round trip, from reading the butterfly inputs X1 to X4 out of the memory 130 to writing the butterfly outputs Y1 to Y4 back into it, may take one or more clock cycles.
  • The butterfly outputs Y1 to Y4 are thus stored in the memory 130 at the locations previously occupied by the butterfly inputs X1 to X4.
  • The rotation factor lookup table 620 outputs one or more rotation factor addresses WA1 and WA2 corresponding to the output value of the counter 122.
  • The read mapping value lookup table 630 outputs the read mapping value RMV corresponding to the output value of the counter 122.
  • The write mapping value lookup table 640 outputs the write mapping value WMV corresponding to the output value of the counter 122.
  • In the first cycle, the butterfly lookup table 610 outputs 1, 2, 3, and 4 as the butterfly input addresses XA1 to XA4. The memory 130 therefore outputs X(1), X(2), X(3), and X(4), located at addresses 1, 2, 3, and 4, as the butterfly inputs X1 to X4. Because the butterfly input addresses XA1 to XA4 are also used as the butterfly output addresses YA1 to YA4, the butterfly outputs Y1 to Y4 are stored at the same locations in memory, that is, at addresses 1, 2, 3, and 4.
  • In the second cycle, the butterfly lookup table 610 outputs 7, 8, 9, and 10 as the butterfly input addresses XA1 to XA4. The memory 130 therefore outputs X(7), X(8), X(5), and X(6), located at addresses 7, 8, 9, and 10, as the butterfly inputs X1 to X4. Because the input addresses are also used as the output addresses, the butterfly outputs Y1 to Y4 are stored at the same locations, that is, at addresses 7, 8, 9, and 10.
  • In the third cycle, the butterfly lookup table 610 outputs 13, 14, 15, and 16, and the memory 130 outputs X(11), X(12), X(9), and X(10). In the fourth cycle, the butterfly lookup table 610 outputs 19, 20, 21, and 22, and the memory 130 outputs X(13), X(14), X(15), and X(16). The subsequent operation follows the same pattern and is omitted for brevity.
  • FIG. 8 is a diagram illustrating values stored in the rotation factor lookup table 620.
  • For example, the rotation factor lookup table 620 outputs 5 and NA as the rotation factor addresses WA1 and WA2, where NA means that no value is output.
  • The memory 130 then outputs W(1), located at address 5, as the rotation factor W1.
  • In another case, the rotation factor lookup table 620 outputs 5 and 18 as the rotation factor addresses WA1 and WA2.
  • The memory 130 then outputs W(1) and W(5), located at addresses 5 and 18, as the rotation factors W1 and W2. The subsequent operation follows the same pattern and is omitted for brevity.
  • In the first cycle, the read mapper 140 maps the real part X2[R] of the second butterfly input X2 to the first MAC input Min1, the real part W1[R] of the first rotation factor W1 to the second MAC input Min2, the imaginary part X2[I] of the second butterfly input X2 to the third MAC input Min3, and the imaginary part W1[I] of the first rotation factor W1 to the fourth MAC input Min4.
  • Likewise, the read mapper 140 maps X2[R], W1[I], X2[I], W1[R], X1[R], and X1[I] to the fifth to tenth MAC inputs Min5 to Min10, and X4[R], W1[R], X4[I], W1[I], X4[R], W1[I], X4[I], W1[R], X3[R], and X3[I] to the eleventh to twentieth MAC inputs Min11 to Min20, respectively.
  • In the second cycle, the read mapper 140 maps X4[R], W1[R], X4[I], W1[I], X4[R], W1[I], X4[I], W1[R], X3[R], X3[I], X2[R], W1[R], X2[I], W1[I], X2[R], W1[I], X2[I], W1[R], X1[R], and X1[I] to the first to twentieth MAC inputs Min1 to Min20, respectively. Subsequent operation follows the same pattern and is omitted for brevity.
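The two read mappings listed above differ only in which half of the MAC unit receives which butterfly. The sketch below models that behavior in Python. The numeric encoding of the read mapping value (0 for the first-cycle pattern, 1 for the swapped second-cycle pattern) is a hypothetical choice for illustration, and, as in both listed cycles, both butterflies use the single rotation factor W1.

```python
def butterfly_slots(x1, x2, w):
    """Return the ten MAC inputs (in Min order) for one butterfly circuit.

    x1 and x2 are the butterfly operands and w is the rotation factor, all complex.
    """
    return [
        x2.real, w.real, x2.imag, w.imag,  # slots 1-4: terms of Re(w * x2)
        x2.real, w.imag, x2.imag, w.real,  # slots 5-8: terms of Im(w * x2)
        x1.real, x1.imag,                  # slots 9-10: x1 passed through
    ]


def read_map(rmv, butterfly_inputs, w1):
    """Return Min1..Min20 as a list (index 0 is Min1).

    rmv is a hypothetical encoding of the read mapping value RMV:
    0 sends X1/X2 to the first circuit and X3/X4 to the second,
    1 swaps the two halves, as in the second cycle.
    """
    x1, x2, x3, x4 = butterfly_inputs
    first = butterfly_slots(x1, x2, w1)
    second = butterfly_slots(x3, x4, w1)
    if rmv == 0:
        return first + second
    if rmv == 1:
        return second + first
    raise ValueError(f"unsupported read mapping value: {rmv}")


# Second cycle: the memory delivers X(7), X(8), X(5), X(6) as X1..X4, so the
# swapped mapping routes the X(5)/X(6) butterfly to Min1..Min10.
cycle2 = read_map(1, [7 + 7j, 8 + 8j, 5 + 5j, 6 + 6j], 1 + 0j)
```

Under this model, the swapped row order in memory is undone by the mapper, so the first butterfly circuit still sees the X(5)/X(6) pair even though those values sit in the last two banks.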
  • The complexity of the mappers can be reduced by swapping the storage locations of data within the same row and applying the same swap rule in every row that is swapped.
  • The address changes caused by swapping data within a row can be folded into the read and write address memories (the butterfly lookup table 610 and the rotation factor lookup table 620) instead of being carried in the mapping information. Whether this is done, and how, is determined in advance by simulation.
  • In the first cycle, the write mapper 150 maps the first MAC output Mout1 to the real part Y1[R] of the first butterfly output Y1, the second MAC output Mout2 to the real part Y2[R] of the second butterfly output Y2, the third MAC output Mout3 to the imaginary part Y1[I] of the first butterfly output Y1, and the fourth MAC output Mout4 to the imaginary part Y2[I] of the second butterfly output Y2.
  • The fifth to eighth MAC outputs Mout5 to Mout8 are mapped to Y3[R], Y4[R], Y3[I], and Y4[I], respectively.
  • In the second cycle, the first to eighth MAC outputs Mout1 to Mout8 are mapped to Y3[R], Y4[R], Y3[I], Y4[I], Y1[R], Y2[R], Y1[I], and Y2[I], respectively. Subsequent operation follows the same pattern and is omitted for brevity.
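The write mapping is the inverse step: each butterfly circuit emits its results in the order (Re y1, Re y2, Im y1, Im y2), and the mapper reassembles complex butterfly outputs. A minimal Python sketch, again assuming a hypothetical encoding of the write mapping value (0 for the first-cycle pattern, 1 for the swapped pattern):

```python
def write_map(wmv, mout):
    """Map MAC outputs Mout1..Mout8 (index 0 is Mout1) to butterfly outputs Y1..Y4.

    wmv is a hypothetical encoding of the write mapping value WMV:
    0 sends the first circuit's results to Y1/Y2, 1 sends them to Y3/Y4.
    """
    if len(mout) != 8:
        raise ValueError("expected eight MAC outputs")
    first, second = mout[:4], mout[4:]
    if wmv == 0:
        low, high = first, second
    elif wmv == 1:
        low, high = second, first
    else:
        raise ValueError(f"unsupported write mapping value: {wmv}")

    def pair(block):
        # Block layout per circuit: Re(ya), Re(yb), Im(ya), Im(yb)
        re_a, re_b, im_a, im_b = block
        return complex(re_a, im_a), complex(re_b, im_b)

    y1, y2 = pair(low)
    y3, y4 = pair(high)
    return [y1, y2, y3, y4]
```

With `wmv=1`, Mout1 through Mout4 land in Y3[R], Y4[R], Y3[I], and Y4[I], which matches the second-cycle mapping described above.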
  • the multifunction computing device shown in FIG. 1 may perform an FIR operation.
  • FIGS. 11 to 13 illustrate the operation of the multifunction computing device when it includes eight MAC circuits and performs an FIR operation.
  • the MAC unit 110 includes eight MAC circuits 111 and an arithmetic unit 116.
  • Each of the eight MAC circuits 111 includes a multiplier 112 and an accumulator 115 to multiply two MAC inputs and accumulate the multiplied values.
  • The arithmetic unit 116 includes a plurality of adders that sum all the values output from the eight MAC circuits 111. If the odd-numbered MAC inputs Min1, Min3, ..., Min15 carry the inputs of the FIR filter and the even-numbered MAC inputs Min2, Min4, ..., Min16 carry its coefficients, the MAC unit 110 can process eight inputs at the same time.
  • With the 32 inputs used in this example, the MAC unit 110 operates for four cycles to obtain one result.
  • The arithmetic unit 116 changes the operation it performs according to the arithmetic control signal ACS received from the controller 160, as shown in the figure.
  • Reconfiguring the arithmetic unit 116 for either the FFT operation or the FIR operation according to the arithmetic control signal ACS can be implemented simply with a combination of adders and switches; a detailed description is omitted for brevity.
  • The memory 130 includes first to sixteenth banks.
  • The first to eighth banks store the FIR inputs In(1) to In(32), and the ninth to sixteenth banks store the FIR coefficients C(1) to C(8).
  • In the first cycle, the memory 130 outputs In(1) to In(8) and C(1) to C(8).
  • The read mapper 140 maps In(1) to In(8) to Min1, Min3, ..., Min15, and C(1) to C(8) to Min2, Min4, ..., Min16.
  • In the second cycle, the memory 130 outputs In(9) to In(16) and C(1) to C(8).
  • The read mapper 140 maps In(9) to In(16) to Min1, Min3, ..., Min15, and C(1) to C(8) to Min2, Min4, ..., Min16. Subsequent operation follows the same pattern and is omitted for brevity.
  • Because the coefficients C(1) to C(8) are output unchanged in every cycle, they can be held in registers instead of banks, which reduces the number of banks used in the memory 130.
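The FIR dataflow described above (eight MAC lanes, four cycles, a final adder tree) can be modeled behaviorally in a few lines of Python. This is a sketch of the described schedule only, assuming integer data and ignoring fixed-point width and overflow; the function name is illustrative.

```python
def fir_mac_unit(inputs, coeffs, lanes=8):
    """Behavioral model of eight MAC circuits 111 feeding the adder tree of unit 116.

    Each cycle, lane k multiplies the next input (odd Min) by coeffs[k] (even Min)
    and adds the product to its own accumulator 115. After the last cycle the
    arithmetic unit sums the lane accumulators.
    """
    if len(coeffs) != lanes or len(inputs) % lanes:
        raise ValueError("need one coefficient per lane and whole cycles of inputs")
    accumulators = [0] * lanes                         # one accumulator per MAC circuit
    for cycle_start in range(0, len(inputs), lanes):   # one clock cycle per block
        block = inputs[cycle_start:cycle_start + lanes]  # In(...) on Min1, Min3, ...
        for k in range(lanes):
            accumulators[k] += block[k] * coeffs[k]    # C(1)..C(8) on Min2, Min4, ...
    return sum(accumulators)                           # adder tree in unit 116


result = fir_mac_unit(list(range(1, 33)), list(range(1, 9)))  # In(1..32), C(1..8)
```

With In(n) = n and C(k) = k the four cycles accumulate to 2544, the same value as summing In(8t + k) * C(k) directly.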
  • the address generator 120 includes a counter 1301, a multiplier 1302, and first to eighth adders 1311, 1312, ... 1318.
  • The counter 1301 outputs an integer that starts at 0 and increases by 1.
  • The multiplier 1302 multiplies the output of the counter 1301 by 16.
  • The adders 1311 to 1318 add 0 to 7, respectively, to the output of the multiplier 1302, and their outputs are delivered to the first to eighth banks.
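The address generator above is a counter, a multiply-by-16, and eight constant adders. A minimal Python sketch of the addresses it produces per cycle, assuming the counter starts at 0 and the addresses are zero-based (the function and parameter names are hypothetical):

```python
def fir_read_addresses(num_cycles, row_stride=16, num_banks=8):
    """Yield the eight bank addresses issued in each cycle.

    Mirrors counter 1301 (0, 1, 2, ...), multiplier 1302 (count * 16) and
    adders 1311-1318 (+0 .. +7), one adder output per bank.
    """
    for count in range(num_cycles):                           # counter 1301
        base = count * row_stride                             # multiplier 1302
        yield [base + offset for offset in range(num_banks)]  # adders 1311-1318


addresses = list(fir_read_addresses(4))  # four cycles, as in the FIR example
```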
  • FIG. 14 is a diagram illustrating a modification of the MAC unit 110 illustrated in FIG. 11.
  • the MAC unit 110 includes eight MAC circuits 111 and an arithmetic unit 116. Unlike FIG. 11, each of the eight MAC circuits 111 includes only a multiplier 112.
  • The arithmetic unit 116 includes an accumulator 115 in addition to a plurality of adders 117.
  • The accumulator 115 includes an adder 113 and a register 114. Moving the accumulator 115 out of the MAC circuits 111 and into the arithmetic unit 116 in this way reduces the number of accumulators 115 required.
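Moving the accumulator behind the adder tree does not change the result for exact arithmetic, because addition is associative and commutative. The Python sketch below compares the FIG. 11 arrangement (one accumulator per lane) with the FIG. 14 arrangement (one shared accumulator) on integer data; fixed-point hardware could still differ in intermediate bit widths, which this model does not capture.

```python
def per_lane_accumulators(inputs, coeffs, lanes=8):
    """FIG. 11 style: every MAC circuit keeps its own accumulator."""
    acc = [0] * lanes
    for start in range(0, len(inputs), lanes):
        for k, x in enumerate(inputs[start:start + lanes]):
            acc[k] += x * coeffs[k]
    return sum(acc)  # adder tree after the last cycle


def shared_accumulator(inputs, coeffs, lanes=8):
    """FIG. 14 style: multipliers only, then the adder tree, then one accumulator."""
    acc = 0  # plays the role of register 114
    for start in range(0, len(inputs), lanes):
        products = [x * c for x, c in zip(inputs[start:start + lanes], coeffs)]  # multipliers 112
        acc += sum(products)  # adders 117, then adder 113
    return acc
```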
  • The FFT computing apparatus includes a multiplication unit 110, an address generator 120, a memory 130, a read mapper 140, a write mapper 150, and a controller 160.
  • the multiplication unit 110 includes a plurality of multiplication circuits 111 and an arithmetic unit 116.
  • the arithmetic unit 116 performs at least one of a sum and a difference on the plurality of outputs output from the plurality of multiplication circuits 111.
  • The arithmetic unit 116 outputs, as the result of the at least one operation, a plurality of multiplier outputs Mout1 to MoutC, where C is an integer.
  • the address generator 120 generates a read address group RAG and transfers the read address group to the memory 130.
  • the read address group RAG has a plurality of read addresses.
  • the address generator 120 generates a write address group WAG and transfers it to the memory 130.
  • the write address group WAG includes a plurality of write addresses.
  • The address generator 120 generates a read mapping value RMV and a write mapping value WMV and transfers them to the read mapper 140 and the write mapper 150, respectively.
  • the address generator 120 includes, for example, a counter 122 and a lookup table 124.
  • the counter 122 outputs a value changed according to the clock signal CK. In one example, the counter 122 outputs an integer value that increases with the clock signal CK.
  • the lookup table 124 outputs a read address group RAG selected according to a value output from the counter 122 among the plurality of read address groups. For this purpose, the lookup table 124 stores a plurality of read address groups. Also, the lookup table 124 outputs a write address group WAG selected according to a value output from the counter 122 among the plurality of write address groups. To this end, the lookup table 124 stores a plurality of write address groups.
  • the write address group WAG may be obtained by delaying the read address group RAG.
  • the lookup table 124 outputs a read mapping value RMV selected according to a value output from the counter 122 among the plurality of read mapping values. For this purpose, the lookup table 124 stores a plurality of read mapping values.
  • the lookup table 124 outputs a write mapping value WMV selected according to a value output from the counter 122 among the plurality of write mapping values. For this purpose, the lookup table 124 stores a plurality of write mapping values.
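The counter-plus-lookup-table address generator can be sketched as a small Python class. The table contents are illustrative: the read address groups reuse the butterfly addresses given in the text, while the mapping values use a hypothetical 0/1 encoding, and the write delay of one cycle is an assumption (the text only says the write address group may be a delayed read address group).

```python
from dataclasses import dataclass


@dataclass
class LutAddressGenerator:
    """Counter 122 plus lookup table 124, modeled as lists indexed by the count."""
    read_groups: list     # stored read address groups (one RAG per count)
    read_mapping: list    # stored read mapping values (one RMV per count)
    write_mapping: list   # stored write mapping values (one WMV per count)
    write_delay: int = 1  # assumed read-to-write latency, in clock cycles
    count: int = 0        # current output of counter 122

    def tick(self):
        """Advance one clock and return (RAG, WAG, RMV, WMV) for the current count."""
        i = self.count
        rag = self.read_groups[i]
        # WAG is the RAG delayed by write_delay; None while the pipeline fills.
        wag = self.read_groups[i - self.write_delay] if i >= self.write_delay else None
        self.count += 1
        return rag, wag, self.read_mapping[i], self.write_mapping[i]


gen = LutAddressGenerator(
    read_groups=[(1, 2, 3, 4), (7, 8, 9, 10), (13, 14, 15, 16), (19, 20, 21, 22)],
    read_mapping=[0, 1, 1, 0],      # hypothetical: rows 2 and 3 are stored swapped
    write_mapping=[None, 0, 1, 1],  # hypothetical: aligned with the delayed writes
)
outputs = [gen.tick() for _ in range(4)]
```

Here the write mapping table is filled so that each entry matches the group being written in that cycle; a state machine, as mentioned next, could generate the same sequence without stored tables.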
  • the address generator 120 may include a state machine (not shown).
  • the state machine generates a read address group (RAG), a write address group (WAG), a read mapping value (RMV), and a write mapping value (WMV) according to the clock signal.
  • The memory 130 includes a plurality of banks 132, so it can read or write several data items at the same time. For example, when the memory 130 includes four banks 132, it can read or write four data items at the same time. In this case, the four data items must be located in different banks 132.
  • The memory 130 may be, for example, a dual-port memory, in which case it can perform a write operation and a read operation simultaneously. For example, when the memory 130 includes four banks 132, it can read four data items and write four data items at the same time. In this case, the four data items to be read must be located in different banks 132, and the four data items to be written must also be located in different banks 132.
  • the plurality of banks 132 stores a plurality of read data groups.
  • The memory 130 transmits the read data group RDG corresponding to the read address group RAG, among the plurality of read data groups, to the multiplication unit 110.
  • the read data group RDG includes a plurality of read data. Each of the plurality of read data may be complex, real or integer.
  • the plurality of read data is output from different banks 132.
  • For example, the first to fourth of the plurality of read data may be output from the first to fourth banks, respectively.
  • The plurality of read data may be located in the same row.
  • For example, the first to fourth read data may be the first entries of the first to fourth banks, respectively.
  • The plurality of read data may also be located in different rows.
  • For example, the first and third read data may be the fifth entries of the first and third banks, respectively, and the second and fourth read data may be the sixth entries of the second and fourth banks, respectively.
  • the plurality of banks 132 stores a plurality of write data groups.
  • the memory 130 stores the write data group WDG at a location corresponding to the write address group WAG.
  • the write data group WDG includes a plurality of write data.
  • Each of the plurality of write data may be complex, real or integer.
  • the plurality of write data are stored in different banks 132.
  • For example, the first to fourth of the plurality of write data may be stored in the first to fourth banks, respectively.
  • the plurality of write data may be stored in the same row.
  • the first to fourth write data may be stored at first positions of the first to fourth banks, respectively.
  • the plurality of write data may be stored in different rows.
  • the first and third write data may be stored in a fifth position of the first and third banks, respectively, and the second and fourth write data may be stored in a sixth position of the second and fourth banks, respectively.
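The bank rule above (every item of a read or write group must come from a different bank, whether or not the items share a row) can be captured by a small memory model. In this Python sketch, addresses are (bank, row) pairs, which is a modeling assumption rather than the patent's address format, and a collision raises an error instead of stalling.

```python
class BankedMemory:
    """Memory 130 modeled as independent banks with one access per bank per group."""

    def __init__(self, num_banks, depth):
        self.banks = [[None] * depth for _ in range(num_banks)]

    @staticmethod
    def _check(group):
        # Two addresses in one group may not share a bank (a bank collision).
        banks = [bank for bank, _ in group]
        if len(set(banks)) != len(banks):
            raise RuntimeError(f"bank collision in {group}")

    def read_group(self, rag):
        """Return the read data group for a read address group of (bank, row) pairs."""
        self._check(rag)
        return [self.banks[bank][row] for bank, row in rag]

    def write_group(self, wag, wdg):
        """Store a write data group at a write address group of (bank, row) pairs."""
        self._check(wag)
        if len(wag) != len(wdg):
            raise ValueError("address and data groups differ in length")
        for (bank, row), value in zip(wag, wdg):
            self.banks[bank][row] = value


mem = BankedMemory(num_banks=4, depth=8)
# Same-row group: first entries of banks 1-4 (zero-based banks 0-3, row 0).
mem.write_group([(0, 0), (1, 0), (2, 0), (3, 0)], ["a", "b", "c", "d"])
# Different-row group: fifth entries of banks 1 and 3, sixth entries of banks 2 and 4.
mem.write_group([(0, 4), (2, 4), (1, 5), (3, 5)], [1, 2, 3, 4])
```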
  • The read mapper 140 maps the plurality of read data to a plurality of multiplier inputs Min1 to MinB, where B is an integer, according to the read mapping value RMV.
  • the write mapper 150 maps the plurality of multiplier outputs Mout1 to MoutC into a plurality of write data according to the write mapping value WMV.
  • the controller 160 stores the initial read data groups in the memory 130, stores the plurality of read address groups in the lookup table 124, and then drives the address generator 120.
  • The controller 160 stores the initial read data groups in the memory 130 in such a way that, even when the multiplication circuits 111 operate simultaneously, each of the plurality of read data is output from a different one of the banks 132.
  • The controller 160 is involved mainly in the initial setup of the FFT computing apparatus and rarely, if ever, during its operation, which is driven mainly by the address generator 120. This reduces the load on the controller 160.
  • the controller 160 may be, for example, a CPU.
  • FIGS. 16 to 24 illustrate a case in which the FFT computing apparatus shown in FIG. 15 includes eight multiplication circuits and performs a 16-point fast Fourier transform (FFT), for example a radix-2 decimation-in-time FFT.
  • The 16-point FFT has four stages, and eight butterfly operations are performed in each stage.
  • The 16-point FFT has 16 inputs X(1) to X(16) and 16 outputs Y(1) to Y(16).
  • FIG. 17 briefly illustrates a butterfly operation. In FIG. 17, the butterfly receives the first and second butterfly inputs x1 and x2 and the rotation factor w, and outputs the first and second butterfly outputs y1 and y2.
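For reference, a standard iterative radix-2 decimation-in-time FFT in Python shows the structure the hardware schedules: for 16 points there are log2(16) = 4 stages of 8 butterflies, each computing y1 = x1 + w·x2 and y2 = x1 − w·x2. This is a textbook software formulation (bit-reversed input order, forward transform with w = exp(−2πik/N)), not the patent's memory layout or cycle schedule, and it is checked against a direct DFT.

```python
import cmath


def butterfly(x1, x2, w):
    """Radix-2 butterfly: y1 = x1 + w*x2, y2 = x1 - w*x2."""
    t = w * x2
    return x1 + t, x1 - t


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_radix2_dit(x):
    """Iterative radix-2 DIT FFT; returns (spectrum, butterflies executed per stage)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    a = [x[bit_reverse(i, bits)] for i in range(n)]  # DIT consumes bit-reversed input
    per_stage = []
    span = 1
    while span < n:  # log2(n) stages
        step = 2 * span
        count = 0
        for start in range(0, n, step):
            for k in range(span):
                w = cmath.exp(-2j * cmath.pi * k / step)  # rotation (twiddle) factor
                lo, hi = start + k, start + k + span
                a[lo], a[hi] = butterfly(a[lo], a[hi], w)
                count += 1
        per_stage.append(count)
        span = step
    return a, per_stage


def dft(x):
    """Direct O(n^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]
```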
  • The multiplication unit 110 includes a first butterfly circuit 410 and a second butterfly circuit 470.
  • the first butterfly circuit 410 includes first to fourth multiplication circuits 420, 430, 440, and 450 and a first arithmetic unit 460.
  • the first multiplication circuit 420 outputs a value obtained by multiplying the first and second multiplier inputs Min1 and Min2.
  • the second multiplication circuit 430 outputs a value obtained by multiplying the third and fourth multiplier inputs Min3 and Min4.
  • the third multiplication circuit 440 outputs a value obtained by multiplying the fifth and sixth multiplier inputs Min5 and Min6.
  • the fourth multiplier circuit 450 outputs a value obtained by multiplying the seventh and eighth multiplier inputs Min7 and Min8.
  • the first arithmetic unit 460 outputs the first to fourth multiplier outputs Mout1 to Mout4.
  • the first multiplier output Mout1 corresponds to a value obtained by adding the output of the first multiplier circuit 420 to the ninth multiplier input Min9 and subtracting the output of the second multiplier circuit 430.
  • the second multiplier output Mout2 corresponds to a value obtained by subtracting the output of the first multiplier circuit 420 from the ninth multiplier input Min9 and adding the output of the second multiplier circuit 430.
  • the third multiplier output Mout3 corresponds to a value obtained by adding the output of the third multiplier circuit 440 to the tenth multiplier input Min10 and adding the output of the fourth multiplier circuit 450.
  • the fourth multiplier output Mout4 corresponds to a value obtained by subtracting the output of the third multiplier circuit 440 from the tenth multiplier input Min10 and subtracting the output of the fourth multiplier circuit 450.
  • the first arithmetic unit 460 includes first to sixth adders 461 to 466.
  • the first adder 461 subtracts the output of the second multiplier 430 from the output of the first multiplier 420.
  • the second adder 462 adds the output of the fourth multiplier circuit 450 to the output of the third multiplier circuit 440.
  • the third adder 463 adds the output of the first adder 461 to the ninth multiplier input Min9.
  • the fourth adder 464 subtracts the output of the first adder 461 from the ninth multiplier input Min9.
  • the fifth adder 465 adds the output of the second adder 462 to the tenth multiplier input Min10.
  • the sixth adder 466 subtracts the output of the second adder 462 from the tenth multiplier input Min10.
  • The second butterfly circuit 470 receives the eleventh to twentieth multiplier inputs Min11 to Min20 and outputs the fifth to eighth multiplier outputs Mout5 to Mout8. Since the second butterfly circuit 470 has the same configuration as the first butterfly circuit 410, a detailed description is omitted.
  • The real part x1[R] and the imaginary part x1[I] of the first butterfly input x1 are input to the ninth and tenth multiplier inputs Min9 and Min10, respectively.
  • The real part x2[R] of the second butterfly input x2 is input to the first and fifth multiplier inputs Min1 and Min5.
  • The imaginary part x2[I] of the second butterfly input x2 is input to the third and seventh multiplier inputs Min3 and Min7.
  • The real part w[R] of the rotation factor w is input to the second and eighth multiplier inputs Min2 and Min8.
  • The imaginary part w[I] of the rotation factor w is input to the fourth and sixth multiplier inputs Min4 and Min6.
  • The first multiplier output Mout1 corresponds to the real part of the first butterfly output y1.
  • The second multiplier output Mout2 corresponds to the real part of the second butterfly output y2.
  • The third multiplier output Mout3 corresponds to the imaginary part of the first butterfly output y1.
  • The fourth multiplier output Mout4 corresponds to the imaginary part of the second butterfly output y2.
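The adder network of the first butterfly circuit 410 and the input assignment above can be checked numerically: four real multiplications form w·x2, and the six adders produce the real and imaginary parts of y1 = x1 + w·x2 and y2 = x1 − w·x2. A minimal Python model, with reference numerals in comments and function names chosen for illustration:

```python
def butterfly_circuit(m):
    """Model of butterfly circuit 410; m maps 1..10 to Min1..Min10.

    Returns (Mout1, Mout2, Mout3, Mout4).
    """
    p1 = m[1] * m[2]  # first multiplication circuit 420
    p2 = m[3] * m[4]  # second multiplication circuit 430
    p3 = m[5] * m[6]  # third multiplication circuit 440
    p4 = m[7] * m[8]  # fourth multiplication circuit 450
    s1 = p1 - p2      # first adder 461
    s2 = p3 + p4      # second adder 462
    return (m[9] + s1,   # third adder 463 -> Mout1
            m[9] - s1,   # fourth adder 464 -> Mout2
            m[10] + s2,  # fifth adder 465 -> Mout3
            m[10] - s2)  # sixth adder 466 -> Mout4


def map_inputs(x1, x2, w):
    """Assign the butterfly operands to Min1..Min10 as described in the text."""
    return {1: x2.real, 2: w.real, 3: x2.imag, 4: w.imag,
            5: x2.real, 6: w.imag, 7: x2.imag, 8: w.real,
            9: x1.real, 10: x1.imag}


def hardware_butterfly(x1, x2, w):
    mout1, mout2, mout3, mout4 = butterfly_circuit(map_inputs(x1, x2, w))
    y1 = complex(mout1, mout3)  # Mout1/Mout3: real and imaginary parts of y1
    y2 = complex(mout2, mout4)  # Mout2/Mout4: real and imaginary parts of y2
    return y1, y2
```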
  • the memory 130 includes first to sixth banks 510 to 560.
  • The first to fourth banks 510 to 540 are, for example, dual-port memories and can perform four reads and four writes simultaneously.
  • The fifth and sixth banks 550 and 560 are, for example, single-port memories and together can provide two outputs simultaneously.
  • the first to fourth banks 510 to 540 output the first to fourth butterfly inputs X1 to X4 corresponding to the first to fourth butterfly input addresses XA1 to XA4, respectively.
  • the first to fourth banks 510 to 540 receive first to fourth butterfly outputs Y1 to Y4 respectively corresponding to the first to fourth butterfly output addresses YA1 to YA4.
  • the fifth and sixth banks 550 and 560 output first and second rotation factors W1 and W2 corresponding to the first and second rotation factor addresses WA1 and WA2, respectively.
  • the first to fourth butterfly input addresses XA1 to XA4 and the first and second rotation factor addresses WA1 and WA2 correspond to the read address group RAG of FIG. 15. That is, the read address group RAG includes the first to fourth butterfly input addresses XA1 to XA4 and the first and second rotation factor addresses WA1 and WA2 as a plurality of read addresses.
  • the first to fourth butterfly output addresses YA1 to YA4 correspond to the write address group WAG of FIG. 15. That is, the write address group WAG includes first to fourth butterfly output addresses YA1 to YA4 as a plurality of write addresses.
  • the first to fourth butterfly inputs X1 to X4 and the first and second rotation factors W1 and W2 correspond to the read data group RDG of FIG. 15.
  • the read data group RDG includes first to fourth butterfly inputs X1 to X4 and first and second rotation factors W1 and W2 as a plurality of read data.
  • the first to fourth butterfly outputs Y1 to Y4 correspond to the write data group WDG of FIG. 15. That is, the write data group WDG includes first to fourth butterfly outputs Y1 to Y4 as a plurality of write data.
  • The memory 130 stores the initial read data groups X(1) to X(16) and W(1) to W(8) in a predetermined order so that no collisions occur between the banks 510 to 560 during the FFT operation.
  • The initial read data groups X(1) to X(16) and W(1) to W(8) are the values stored in the memory 130 before the FFT operation; for example, they are stored by the controller 160. In the figure, 1/X(1) means that X(1) is stored at address 1, and 5/W(1) means that W(1) is stored at address 5.
  • The 16 FFT inputs X(1) to X(16) are generally stored sequentially (X(1), X(2), ..., X(16)), but in the present embodiment they are stored in the predetermined order X(1), X(2), X(3), X(4), X(7), X(8), X(5), X(6), X(11), X(12), X(9), X(10), X(13), X(14), X(15), X(16).
  • This predetermined order is not fully sequential, but it is sequential at the row level: X(1) to X(4) are located in the first row, X(5) to X(8) in the second row, X(9) to X(12) in the third row, and X(13) to X(16) in the fourth row.
  • the predetermined order is obtained through simulation in advance so that collisions between the banks 510 to 540 do not occur during the FFT operation.
  • A collision between the banks 510 to 540 means that two or more of the first to fourth butterfly inputs X1 to X4 are read from the same bank at the same time.
  • The predetermined order may be found by repeatedly swapping the positions of some of the initial FFT inputs X(1) to X(16).
  • The eight rotation factors W(1) to W(8) are generally stored sequentially (W(1), W(2), ..., W(8)), but in the present embodiment they are stored in the predetermined order W(1), W(2), W(4), W(3), W(6), W(5), W(7), W(8).
  • This order is likewise sequential at the row level: W(1) and W(2) are located in the first row, W(3) and W(4) in the second row, W(5) and W(6) in the third row, and W(7) and W(8) in the fourth row.
  • The predetermined order is obtained in advance through simulation so that no collisions occur between the banks 550 and 560 during the FFT operation.
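The stored orders above can be checked against the addresses quoted in the text with a short Python model. It assumes linear addresses run across the six banks first and then down the rows (four X banks followed by two W banks), which is consistent with W(1) at address 5, X(7), X(8), X(5), X(6) at addresses 7 to 10, and W(5) at address 18. Only the few accesses listed in the text are checked, since the full lookup-table schedule is not given.

```python
X_ORDER = [1, 2, 3, 4, 7, 8, 5, 6, 11, 12, 9, 10, 13, 14, 15, 16]  # stored order of X(n)
W_ORDER = [1, 2, 4, 3, 6, 5, 7, 8]                                 # stored order of W(n)
NUM_BANKS = 6  # assumed: four X banks (510-540), then two W banks (550, 560)


def build_image():
    """Return {address: label} under the assumed row-major, 1-based addressing."""
    image = {}
    for row in range(4):
        base = row * NUM_BANKS
        for col, n in enumerate(X_ORDER[4 * row:4 * row + 4]):
            image[base + col + 1] = f"X({n})"
        for col, n in enumerate(W_ORDER[2 * row:2 * row + 2]):
            image[base + 5 + col] = f"W({n})"
    return image


def bank_of(address):
    """Bank index 0..5 for a 1-based address under the same assumption."""
    return (address - 1) % NUM_BANKS


def collides(addresses):
    """True if two of the addresses (None means no access) fall in the same bank."""
    banks = [bank_of(a) for a in addresses if a is not None]
    return len(banks) != len(set(banks))


image = build_image()
```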
  • The lookup table 124 of the address generator 120 may include a butterfly lookup table 610, a rotation factor lookup table 620, a read mapping value lookup table 630, and a write mapping value lookup table 640, and the address generator 120 may further include a register 650. For example, after the controller 160 loads the values required by the butterfly lookup table 610, the rotation factor lookup table 620, the read mapping value lookup table 630, and the write mapping value lookup table 640, it drives the counter 122.
  • the butterfly lookup table 610 outputs a plurality of butterfly input addresses XA1 to XA4 corresponding to the output value of the counter 122.
  • the register 650 outputs the plurality of butterfly output addresses YA1 to YA4 that delay the plurality of butterfly input addresses XA1 to XA4 by one or more clock cycles.
  • The delay introduced by the register 650 corresponds to the time needed for the butterfly inputs X1 to X4, once output from the memory 130, to return to the memory 130 as the butterfly outputs Y1 to Y4; this may take one or more clock cycles.
  • As a result, the butterfly outputs Y1 to Y4 are stored in the memory 130 at the locations where the butterfly inputs X1 to X4 were located.
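Because the write addresses are simply the read addresses delayed by the pipeline latency, results overwrite their own inputs and no separate output buffer is needed. The Python sketch below models this in-place behavior; the delay value and the `compute` callback are assumptions standing in for the datapath, and writes are performed before reads within a cycle as a modeling choice.

```python
from collections import deque


def simulate_in_place(memory, read_groups, compute, delay=1):
    """Read one address group per cycle and write its results back `delay` cycles later.

    The write addresses YA are the read addresses XA delayed by `delay` cycles
    (the role given to register 650), so results overwrite their own inputs.
    Returns a log of (cycle, addresses written).
    """
    if delay < 1:
        raise ValueError("delay must be at least one clock cycle")
    in_flight = deque()  # (issue_cycle, addresses, results), oldest first
    log = []
    for cycle in range(len(read_groups) + delay):  # extra cycles drain the pipeline
        if in_flight and in_flight[0][0] == cycle - delay:
            _, ya, results = in_flight.popleft()
            for address, value in zip(ya, results):
                memory[address] = value
            log.append((cycle, ya))
        if cycle < len(read_groups):
            xa = read_groups[cycle]
            in_flight.append((cycle, xa, compute([memory[a] for a in xa])))
    return log
```

For example, with `delay=2` the group read in cycle 0 is written back in cycle 2 at exactly the same addresses.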
  • the rotation factor lookup table 620 outputs one or more rotation factor addresses WA1 and WA2 corresponding to the output value of the counter 122.
  • the read mapping value lookup table 630 outputs a read mapping value RMV corresponding to the output value of the counter.
  • the write mapping value lookup table 640 outputs a write mapping value WMV corresponding to the output value of the counter.
  • FIG. 21 is a diagram illustrating values stored in the butterfly lookup table 610.
  • In the first cycle, the butterfly lookup table 610 outputs 1, 2, 3, and 4 as the plurality of butterfly input addresses XA1 to XA4. The memory 130 therefore outputs X(1), X(2), X(3), and X(4), located at addresses 1, 2, 3, and 4, as the plurality of butterfly inputs X1 to X4. Because the butterfly input addresses XA1 to XA4 are also used as the butterfly output addresses YA1 to YA4, the butterfly outputs Y1 to Y4 are stored at the same locations in the memory, that is, at addresses 1, 2, 3, and 4.
  • In the second cycle, the butterfly lookup table 610 outputs 7, 8, 9, and 10 as the butterfly input addresses XA1 to XA4. The memory 130 therefore outputs X(7), X(8), X(5), and X(6), located at addresses 7, 8, 9, and 10, as the butterfly inputs X1 to X4. Because the same addresses also serve as the butterfly output addresses YA1 to YA4, the butterfly outputs Y1 to Y4 are stored back at addresses 7, 8, 9, and 10.
  • In the third cycle, the butterfly lookup table 610 outputs 13, 14, 15, and 16, and the memory 130 outputs X(11), X(12), X(9), and X(10). In the fourth cycle, the butterfly lookup table 610 outputs 19, 20, 21, and 22, and the memory 130 outputs X(13), X(14), X(15), and X(16). Subsequent operation follows the same pattern and is omitted for brevity.
  • FIG. 22 is a diagram illustrating values stored in the rotation factor lookup table 620.
  • For example, the rotation factor lookup table 620 outputs 5 and NA as the rotation factor addresses WA1 and WA2, where NA means that no value is output.
  • In that case, the memory 130 outputs W(1), located at address 5, as the rotation factor W1.
  • In another cycle, the rotation factor lookup table 620 outputs 5 and 18 as the rotation factor addresses WA1 and WA2.
  • The memory 130 then outputs W(1) and W(5), located at addresses 5 and 18, as the rotation factors W1 and W2. Subsequent operation follows the same pattern and is omitted for brevity.
  • In the first cycle, the read mapper 140 maps the real part X2[R] of the second butterfly input X2 to the first multiplier input Min1, the real part W1[R] of the first rotation factor W1 to the second multiplier input Min2, the imaginary part X2[I] of the second butterfly input X2 to the third multiplier input Min3, and the imaginary part W1[I] of the first rotation factor W1 to the fourth multiplier input Min4.
  • In the second cycle, the read mapper 140 maps X4[R], W1[R], X4[I], W1[I], X4[R], W1[I], X4[I], W1[R], X3[R], X3[I], X2[R], W1[R], X2[I], W1[I], X2[R], W1[I], X2[I], W1[R], X1[R], and X1[I] to the first to twentieth multiplier inputs Min1 to Min20, respectively. Subsequent operation follows the same pattern and is omitted for brevity.
  • The complexity of the mappers can be reduced by swapping the storage locations of data within the same row and applying the same swap rule in every row that is swapped.
  • The address changes caused by swapping data within a row can be folded into the read and write address memories (the butterfly lookup table 610 and the rotation factor lookup table 620) instead of being carried in the mapping information. Whether this is done, and how, is determined in advance by simulation.
  • In the first cycle, the write mapper 150 maps the first multiplier output Mout1 to the real part Y1[R] of the first butterfly output Y1, the second multiplier output Mout2 to the real part Y2[R] of the second butterfly output Y2, the third multiplier output Mout3 to the imaginary part Y1[I] of the first butterfly output Y1, and the fourth multiplier output Mout4 to the imaginary part Y2[I] of the second butterfly output Y2.
  • The fifth to eighth multiplier outputs Mout5 to Mout8 are mapped to Y3[R], Y4[R], Y3[I], and Y4[I], respectively.
  • In the second cycle, the first to eighth multiplier outputs Mout1 to Mout8 are mapped to Y3[R], Y4[R], Y3[I], Y4[I], Y1[R], Y2[R], Y1[I], and Y2[I], respectively. Subsequent operation follows the same pattern and is omitted for brevity.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Discrete Mathematics (AREA)
  • Computing Systems (AREA)
  • Complex Calculations (AREA)

Abstract

According to one embodiment, the invention relates to a multifunction computing device comprising: a multiplier-accumulator (MAC) unit including a plurality of MAC circuits; an address generation unit that generates a read address group, which includes a plurality of read addresses, and transfers the generated read address group to a memory; and the memory, which includes a plurality of banks for storing a plurality of read data groups and transfers, to the MAC unit, the read data group corresponding to the read address group among the plurality of read data groups, the read data group including a plurality of read data items.
PCT/KR2017/007358 2016-07-13 2017-07-10 Dispositif de calcul multifonction et dispositif de calcul de transformée de fourier rapide WO2018012828A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US16/317,886 US10949493B2 (en) 2016-07-13 2017-07-10 Multi-functional computing apparatus and fast fourier transform computing apparatus
CN201780043429.0A CN109496306B (zh) 2016-07-13 2017-07-10 多功能运算装置及快速傅里叶变换运算装置

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR1020160088659A KR101842937B1 (ko) 2016-07-13 2016-07-13 다기능 연산 장치
KR10-2016-0088659 2016-07-13
KR10-2016-0156445 2016-11-23
KR1020160156445A KR101859294B1 (ko) 2016-11-23 2016-11-23 고속 푸리에 변환 연산 장치

Publications (1)

Publication Number Publication Date
WO2018012828A1 true WO2018012828A1 (fr) 2018-01-18

Family

ID=60952604

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2017/007358 WO2018012828A1 (fr) 2016-07-13 2017-07-10 Dispositif de calcul multifonction et dispositif de calcul de transformée de fourier rapide

Country Status (3)

Country Link
US (1) US10949493B2 (fr)
CN (1) CN109496306B (fr)
WO (1) WO2018012828A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11455169B2 (en) * 2019-05-27 2022-09-27 Texas Instruments Incorporated Look-up table read
CN113094639B (zh) * 2021-03-15 2022-12-30 Oppo广东移动通信有限公司 一种dft并行处理方法、装置、设备及存储介质

Citations (5)

Publication number Priority date Publication date Assignee Title
JP2001022577A (ja) * 1999-07-13 2001-01-26 Matsushita Electric Ind Co Ltd 情報処理装置
KR100835173B1 (ko) * 2006-09-20 2008-06-05 한국전자통신연구원 곱셈 누적 연산을 위한 디지털 신호처리 장치 및 방법
KR100940465B1 (ko) * 1998-03-18 2010-02-04 퀄컴 인코포레이티드 디지털 신호처리기
KR20110090915A (ko) * 2008-10-08 2011-08-10 에이알엠 리미티드 Simd 곱셈-누적 연산 실행장치 및 방법
KR20160072620A (ko) * 2014-12-15 2016-06-23 삼성전자주식회사 메모리 접근 방법 및 장치

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
JPH0228187B2 (ja) 1983-05-04 1990-06-21 Victor Company Of Japan Kosokufuuriehenkannoenzansochi
US6839728B2 (en) * 1998-10-09 2005-01-04 Pts Corporation Efficient complex multiplication and fast fourier transform (FFT) implementation on the manarray architecture
US20040181503A1 (en) * 2003-03-13 2004-09-16 Motorola, Inc. Information storage and retrieval method and apparatus
CN102081592B (zh) * 2009-11-27 2012-07-04 重庆重邮信科通信技术有限公司 一种混合基dft和idft快速实现方法及装置
KR20120100197A (ko) * 2011-03-03 2012-09-12 삼성전기주식회사 리포지션된 순서로 데이터를 출력하는 고속 푸리에 변환 장치
US8812819B1 (en) * 2011-08-18 2014-08-19 Altera Corporation Methods and apparatus for reordering data signals in fast fourier transform systems
US20150331634A1 (en) * 2013-01-09 2015-11-19 Sergei I. SALISHCHEV Continuous-flow conflict-free mixed-radix fast fourier transform in multi-bank memory

Cited By (4)

Publication number Priority date Publication date Assignee Title
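CN109344964A (zh) * 2018-08-08 2019-02-15 Southeast University Multiply-add calculation method and calculation circuit suitable for neural networks
CN109344964B (zh) * 2018-08-08 2020-12-29 Southeast University Multiply-add calculation method and calculation circuit suitable for neural networks
CN114185514A (zh) * 2021-12-13 2022-03-15 Hefei University of Technology Polynomial multiplier based on a Fermat modulus
CN114185514B (zh) * 2021-12-13 2024-03-08 Hefei University of Technology Polynomial multiplier based on a Fermat modulus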

Also Published As

Publication number Publication date
CN109496306A (zh) 2019-03-19
US10949493B2 (en) 2021-03-16
CN109496306B (zh) 2023-08-29
US20190294650A1 (en) 2019-09-26

Similar Documents

Publication Publication Date Title
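WO2018012828A1 (fr) Multifunction calculation device and fast Fourier transform calculation device
US11836497B2 (en) Operation module and method thereof
US20200117976A1 (en) Processing apparatus and processing method
CA1297195C (fr) Digital signal processor
WO2013039335A1 (fr) Processing device and rearrangement pattern generator
CN111651203B (zh) Apparatus and method for performing the four basic arithmetic operations on vectors
EP0305639A2 (fr) Vector computer
CN1004773B (zh) Information processing apparatus capable of rapidly processing different groups of instructions
KR101859294B1 (ko) Fast Fourier transform calculation device
CN111260043B (zh) Data selector, data processing method, chip, and electronic device
KR101842937B1 (ko) Multifunction calculation device
JPH04260957A (ja) Computer system
KR20180058166A (ko) Fast Fourier transform calculation device
CN111260042B (zh) Data selector, data processing method, chip, and electronic device
KR20180007652A (ko) Multifunction calculation device
JP4108371B2 (ja) Multiprocessor system
JP2668156B2 (ja) Execution control method for a data-driven information processing apparatus
JP2547219B2 (ja) Vector data access control apparatus and method
JP2001167084A (ja) Vector arithmetic processing apparatus and vector data transfer method
CN111340229B (zh) Data selector, data processing method, chip, and electronic device
WO2019221569A1 (fr) Parallel processing device
JP3180447B2 (ja) Digital signal processing apparatus
JPH077388B2 (ja) Vector arithmetic processing apparatus
JP2022162245A (ja) Butterfly operation circuit
JP3693681B2 (ja) Arithmetic unit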

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 17827899
Country of ref document: EP
Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 17827899
Country of ref document: EP
Kind code of ref document: A1