CN106201999B - Mixed-radix DFT/IDFT parallel reading and computation method and device - Google Patents

Publication number
CN106201999B
Authority
CN
China
Prior art keywords
parallel
read
data
series
product
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610596528.9A
Other languages
Chinese (zh)
Other versions
CN106201999A (en
Inventor
李桓
王晓琴
郭晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN201610596528.9A
Publication of CN106201999A
Application granted
Publication of CN106201999B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/141 Discrete Fourier transforms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Discrete Mathematics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a mixed-radix DFT/IDFT parallel data reading method, a mixed-radix DFT/IDFT parallel computation method, a mixed-radix DFT/IDFT parallel data reading device, and a mixed-radix DFT/IDFT parallel computation device. The reading method includes: configuring two-level loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages; comparing the maximum number of data items that can be read in parallel with the product of the numbers of points of the completed stages; and calculating the corresponding two-level loop parameters according to the comparison result and reading data in parallel based on the calculated parameters. The embodiments of the invention thereby increase processing parallelism and reduce dependencies between data, so that idle cycles in the overall computation are reduced and pipeline utilization is improved, which can effectively increase the speed of mixed-radix DFT/IDFT computation.

Description

Mixed-radix DFT/IDFT parallel reading and computation method and device
Technical field
The embodiments of the present invention relate to the field of mobile communication technology, and in particular, but not exclusively, to a mixed-radix DFT/IDFT parallel data reading method, a mixed-radix DFT/IDFT parallel computation method, a mixed-radix DFT/IDFT parallel data reading device, and a mixed-radix DFT/IDFT parallel computation device.
Background
In digital signal processing systems, especially for finite-length sequences, the DFT (discrete Fourier transform) is an extremely important mathematical transform. In essence it is a finite set of discrete samples of the Fourier transform of a finite-length sequence. It allows digital signal processing to be carried out in the frequency domain by numerical methods, which increases the flexibility of digital signal processing, and it is widely used in digital communication, image processing, power spectrum estimation, and other fields. A DFT whose number of points is a power of 2 can be computed with a radix-2 FFT algorithm. A DFT with any other number of points, which cannot be completed with such an FFT algorithm, is called a general-length DFT.
At present, a general-length DFT usually uses a mixed-radix algorithm whose theoretical basis is the Cooley-Tukey algorithm; radix-2 FFT algorithms are also derived from it. The basic idea is to convert a DFT with a large number of points into multiple DFTs with small numbers of points. Each such conversion is called a stage, and the whole DFT is completed by executing the stages one after another. The small number of points is usually chosen to be a prime, such as 3, 5, and so on, and the computation then proceeds by repeatedly nesting radix-3, radix-5, ... operations. The radix-N operation of each stage is executed several times, each time on different data.
Formula (1) is the expression of the radix-3 algorithm, in which the input twiddle factors depend on k and the output twiddle factors are independent of k.
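For reference, a radix-3 butterfly in the standard decimation-in-time form can be written as below. This is a sketch under assumptions and not necessarily the patent's Formula (1): the symbols $A_k$, $B_k$, $C_k$ (the $k$-th outputs of three sub-DFTs of length $N_1$), the length $N_2 = 3N_1$, and the convention $W_n^m = e^{-2\pi i m/n}$ are chosen here for illustration.

```latex
\begin{aligned}
X_k        &= A_k + W_{N_2}^{k} B_k + W_{N_2}^{2k} C_k \\
X_{k+N_1}  &= A_k + W_3^{1} W_{N_2}^{k} B_k + W_3^{2} W_{N_2}^{2k} C_k \\
X_{k+2N_1} &= A_k + W_3^{2} W_{N_2}^{k} B_k + W_3^{4} W_{N_2}^{2k} C_k
\end{aligned}
\qquad k = 0, 1, \ldots, N_1 - 1
```

In this form the factors $W_{N_2}^{k}$ and $W_{N_2}^{2k}$ depend on $k$ and play the role of input twiddle factors, while the powers of $W_3$ do not depend on $k$ and play the role of output twiddle factors.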
Because a general-length DFT is not organized in integer multiples of 2, a general-purpose processor cannot read in or write out a whole number of data groups at a time, which lowers the degree of parallelism. In addition, a general DFT first performs the multiply and add operations between the data and the input twiddle factors and then performs the multiply and add operations with the output twiddle factors, so the dependency between data is strong. Furthermore, a general DFT interleaves multiplications and additions, which introduces further computational dependencies. As a result, the arithmetic units wait longer because of data dependencies and pipeline utilization drops, which lowers the processing speed of the whole DFT.
In view of this, the present invention is proposed.
Summary of the invention
The main purpose of the embodiments of the present invention is to provide a mixed-radix DFT/IDFT parallel data reading method that at least partially solves the technical problem of how to improve computational efficiency. A mixed-radix DFT/IDFT parallel computation method, a mixed-radix DFT/IDFT parallel data reading device, and a mixed-radix DFT/IDFT parallel computation device are also provided.
To achieve the above purpose, according to one aspect of the present invention, the following technical solution is provided:
A mixed-radix DFT/IDFT parallel data reading method. The method may include:
configuring two-level loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages;
comparing the maximum number of data items read in parallel with the product of the numbers of points of the completed stages;
calculating the corresponding two-level loop parameters according to the comparison result, and reading data in parallel based on the calculated two-level loop parameters.
Further, configuring the two-level loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages may specifically include:
according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages, configuring the following two-level loop parameters: the first-level loop step is N1, the first-level loop count is N0, the second-level loop step is N2, and the second-level loop count is N/N2, where N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, N2 is the product of N1 and N0, and N is the number of DFT points.
Further, calculating the corresponding two-level loop parameters according to the comparison result and reading data in parallel based on the calculated two-level loop parameters may specifically include:
when M is less than or equal to N1, leaving the twiddle factors that have been read unprocessed and calculating the following two-level loop parameters:
the first-level loop step is M, the first-level loop count is [expression missing], the second-level loop step is N2, and the second-level loop count is N/N2, where M denotes the maximum number of data items the processor can read in parallel, N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, and N2 is the product of N1 and N0;
reading the data in parallel according to the above two-level loop parameters, M data items at a time, until all N1 data items have been read.
Further, calculating the corresponding two-level loop parameters according to the comparison result and reading data in parallel based on the calculated two-level loop parameters may alternatively include:
when M is greater than N1, calculating the value of [expression missing];
replicating the twiddle factors that have been read into [expression missing] copies;
reading the first [expression missing] groups of data in parallel with a step of N2 according to the following two-level loop parameters: the first-level loop step is [expression missing], the first-level loop count is N0, the second-level loop step is [expression missing], and the second-level loop count is [expression missing], where M denotes the maximum number of data items the processor can read in parallel, N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, and N2 is the product of N1 and N0.
To achieve the above purpose, according to another aspect of the present invention, a mixed-radix DFT/IDFT parallel computation method based on the above method is also provided. The parallel computation method may include:
Step 1: reading the input twiddle factors and the output twiddle factors in parallel, multiplying their corresponding terms, and taking the products together with the input twiddle factors as the equivalent twiddle factors;
Step 2: multiplying the equivalent twiddle factors by the input data and caching the products;
Step 3: in the second-level loop, while the multiplication of Step 2 is being executed, reading out the results cached in Step 2 and performing the corresponding addition or subtraction.
Further, multiplying the equivalent twiddle factors by the input data and caching the products may specifically include:
when the processor has no complex arithmetic unit, caching the results of cross-multiplying the real and imaginary parts of the equivalent twiddle factors and of the input data.
Further, Step 3 may specifically include:
when the processor has a complex arithmetic unit, while the multiplication of Step 2 is being executed, reading out the results cached in Step 2 and performing the corresponding addition.
Further, Step 3 may alternatively include:
when the processor has no complex arithmetic unit, while the multiplication of Step 2 is being executed, reading out the results cached in Step 2 and performing the following subtraction:
taking the difference between the product of the equivalent twiddle factor and the real part of the input data and the product of the equivalent twiddle factor and the imaginary part of the input data.
To achieve the above purpose, according to a further aspect of the present invention, a mixed-radix DFT/IDFT parallel data reading device is also provided. The parallel reading device may include:
a point-count calculation unit, configured to configure two-level loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages;
a group-count judging unit, configured to compare the maximum number of data items read in parallel with the product of the numbers of points of the completed stages;
a reading unit, configured to calculate the corresponding two-level loop parameters according to the comparison result obtained by the group-count judging unit, and to read data in parallel based on the calculated two-level loop parameters.
Further, the point-count calculation unit may specifically include:
a configuration module, configured to configure, according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages, the following two-level loop parameters: the first-level loop step is N1, the first-level loop count is N0, the second-level loop step is N2, and the second-level loop count is N/N2, where N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, N2 is the product of N1 and N0, and N is the number of DFT points.
Further, the reading unit may specifically include:
a first calculation module, configured to, when M is less than or equal to N1, leave the twiddle factors that have been read unprocessed and calculate the following two-level loop parameters:
the first-level loop step is M and its loop count is [expression missing]; the second-level loop step is N2 and its loop count is N/N2, where M denotes the maximum number of data items the processor can read in parallel, N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, and N2 is the product of N1 and N0;
a first reading module, configured to read the data in parallel according to the above two-level loop parameters, M data items at a time, until all N1 data items have been read.
Further, the reading unit may also include:
a second calculation module, configured to calculate the value of [expression missing] when M is greater than N1;
a replication module, configured to replicate the twiddle factors that have been read into [expression missing] copies;
a second reading module, configured to read the first [expression missing] groups of data in parallel with a step of N2 according to the following two-level loop parameters: the first-level loop step is [expression missing], the first-level loop count is N0, the second-level loop step is [expression missing], and the second-level loop count is [expression missing], where M denotes the maximum number of data items the processor can read in parallel, N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, and N2 is the product of N1 and N0.
To achieve the above purpose, according to a further aspect of the present invention, a mixed-radix DFT/IDFT parallel computation device based on the above parallel reading device is also provided. The parallel computation device may include:
an equivalent twiddle factor calculation unit, configured to read the input twiddle factors and the output twiddle factors in parallel, multiply their corresponding terms, and take the products together with the input twiddle factors as the equivalent twiddle factors;
a cache unit, configured to multiply the equivalent twiddle factors obtained by the equivalent twiddle factor calculation unit by the input data and to cache the products;
a data processing unit, configured to, in the second-level loop, while the cache unit is executing multiplications, read out the results cached in the cache unit and perform the corresponding addition or subtraction.
Further, the equivalent twiddle factor calculation unit may specifically include:
a parallel read-in module, configured to read in the input twiddle factors and the output twiddle factors in parallel;
a cache module, configured to multiply the corresponding terms of the input twiddle factors and the output twiddle factors to obtain the first and second groups of equivalent twiddle factors, and to store the first and second groups of equivalent twiddle factors in the cache together with the input twiddle factors, which serve as the third group of equivalent twiddle factors.
Further, the data processing unit may also include:
a complex arithmetic unit, configured to read out the results cached in the cache unit and perform the corresponding addition.
Compared with the prior art, the above technical solution has at least the following beneficial effects:
The embodiments of the present invention configure two-level loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages; then compare the maximum number of data items read in parallel with the product of the numbers of points of the completed stages; and finally calculate the corresponding two-level loop parameters according to the comparison result and read data in parallel based on the calculated parameters. By computing the point-count information of the stage to be computed and configuring the two-level loop parameters accordingly, data are read with maximum parallelism for a given processor bit width, and the data read are mutually independent. No dedicated reordering of the data is needed during computation and no horizontal (cross-lane) operations are required, which increases processing parallelism and reduces the number of computation cycles.
Of course, a product implementing the present invention does not necessarily need to achieve all of the above advantages at the same time.
Other features and advantages of the present invention will be set forth in the following description and will in part become apparent from the description or be understood by practicing the invention. The objectives and other advantages of the present invention can be realized and obtained by the methods particularly pointed out in the written description, the claims, and the drawings.
Brief description of the drawings
The drawings, as a part of the present invention, are provided for further understanding of the invention. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not unduly limit it. Obviously, the drawings described below are only some embodiments, and those of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:
Fig. 1 is a schematic flowchart of a mixed-radix DFT/IDFT parallel data reading method according to an exemplary embodiment;
Fig. 2 is a schematic flowchart of a mixed-radix DFT/IDFT parallel computation method according to another exemplary embodiment;
Fig. 3 is a schematic flowchart, according to an exemplary embodiment, of reading the input twiddle factors and the output twiddle factors in parallel, multiplying their corresponding terms, and taking the products together with the input twiddle factors as the equivalent twiddle factors;
Fig. 4 is a schematic structural diagram of a mixed-radix DFT/IDFT parallel data reading device according to an exemplary embodiment;
Fig. 5 is a schematic structural diagram of a mixed-radix DFT/IDFT parallel computation device according to an exemplary embodiment.
These drawings and the written description are not intended to limit the scope of protection of the invention in any way, but to explain the concept of the invention to those skilled in the art by reference to specific embodiments.
Detailed description
The technical problems solved, the technical solutions adopted, and the technical effects achieved by the embodiments of the present invention are described clearly and completely below with reference to the drawings and specific embodiments. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other equivalent or obviously modified embodiments obtained by those of ordinary skill in the art from the embodiments in the present application without creative effort fall within the scope of protection of the present invention. The embodiments of the present invention can be implemented in the many different ways defined and covered by the claims.
It should be noted that many specific details are given in the following description for ease of understanding. It is obvious, however, that the present invention can be implemented without these specific details.
It should also be noted that, unless explicitly limited or in conflict, the embodiments of the present invention and the technical features in them can be combined with one another to form technical solutions.
The embodiments of the present invention are applied in the LTE system in the field of mobile communication, in which the transmission precoding module at the uplink transmitter is a DFT process and the corresponding receiver performs an IDFT (inverse discrete Fourier transform) process.
Depending on the number of allocated resources, the number of points N of the DFT/IDFT satisfies the following relationship:
$N = 2^{\alpha} \times 3^{\beta} \times 5^{\gamma}$, with $12 \le N \le 1536$, $\alpha \ge 2$, $\beta \ge 1$, $\gamma \ge 0$
In a specific implementation, the $2^{\alpha}$-point DFT can be completed with an FFT, while the remaining radix-3 and radix-5 DFT processes need to be completed with a mixed-radix DFT. The mixed-radix DFT requires β radix-3 operations and γ radix-5 operations, carried out with radix 3 first and radix 5 afterwards.
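The decomposition of N into powers of 2, 3, and 5 can be sketched as follows. This is a minimal illustration; the function name `factor_235` is hypothetical and does not come from the patent.

```python
def factor_235(n: int) -> tuple[int, int, int]:
    """Return (alpha, beta, gamma) such that n == 2**alpha * 3**beta * 5**gamma."""
    exponents = []
    for p in (2, 3, 5):
        e = 0
        while n % p == 0:      # strip every factor p from n
            n //= p
            e += 1
        exponents.append(e)
    if n != 1:
        raise ValueError("length has a prime factor other than 2, 3 and 5")
    return tuple(exponents)


# 1200 points: one 16-point FFT part, one radix-3 stage, two radix-5 stages.
alpha, beta, gamma = factor_235(1200)
```

For N = 1200 this yields α = 4, β = 1, γ = 2, which matches the constraints α ≥ 2 and β ≥ 1 stated above.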
Fig. 1 schematically illustrates a mixed-radix DFT/IDFT parallel data reading method. As shown in Fig. 1, the method may include:
S100: Configure two-level loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages.
S110: Compare the maximum number of data items read in parallel with the product of the numbers of points of the completed stages.
S120: Calculate the corresponding two-level loop parameters according to the comparison result, and read data in parallel based on the calculated two-level loop parameters.
By computing the point-count information, the embodiments of the present invention configure the two-level loop parameters so that, for a given processor bit width, data are read with maximum parallelism, which increases processing parallelism.
As an optional implementation of this embodiment, configuring the two-level loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages may specifically include: configuring the following two-level loop parameters: the first-level loop step is N1, the first-level loop count is N0, the second-level loop step is N2, and the second-level loop count is N/N2, where N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, N2 is the product of N1 and N0, and N is the number of DFT points.
As an optional implementation of this embodiment, calculating the corresponding two-level loop parameters according to the comparison result and reading data in parallel based on the calculated two-level loop parameters may specifically include:
when M is less than or equal to N1, leaving the twiddle factors that have been read unprocessed and calculating the following loop parameters:
the first-level loop step is M, the first-level loop count is [expression missing], the second-level loop step is N2, and the second-level loop count is N/N2, where M denotes the maximum number of data items the processor can read in parallel, N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, and N2 is the product of N1 and N0;
reading data in parallel according to the above loop parameters, M data items at a time, until all N1 data items have been read.
By computing the point-count information, the embodiments of the present invention configure the two-level loop parameters so that, for a given processor bit width, data are read with maximum parallelism and the data read are mutually independent. No horizontal (cross-lane) operations are required, which increases processing parallelism.
As an optional implementation of this embodiment, calculating the corresponding two-level loop parameters according to the comparison result and reading data in parallel based on the calculated two-level loop parameters may alternatively include:
when M is greater than N1, calculating the value of [expression missing], replicating the twiddle factors that have been read into [expression missing] copies, and reading the first [expression missing] groups of data in parallel with a step of N2 according to the following loop parameters: the first-level loop step is [expression missing], the first-level loop count is N0, the second-level loop step is [expression missing], and the second-level loop count is [expression missing], where M denotes the maximum number of data items the processor can read in parallel, N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, and N2 is the product of N1 and N0.
By computing the point-count information, the embodiments of the present invention configure the two-level loop parameters so that, for a given processor bit width, data are read with maximum parallelism and the data read are mutually independent. No horizontal (cross-lane) operations are required, which increases processing parallelism.
The embodiments of the present invention can be based on any mixed-radix process. Since mixed-radix theory allows arbitrary radices and they cannot be enumerated exhaustively, the invention is described in detail below, in a preferred manner, using radix 3 as an example.
Assume that N0 denotes the number of points of the stage to be computed; N1 denotes the product of the numbers of points of the completed stages; M denotes the maximum number of data items the processor can read in parallel (for example 16); and N denotes the number of DFT points (for example 1200).
S200: Calculate N0 and N1, here N0 = 3 and N1 = 16, and determine the loop parameters from N0 and N1, where the loop parameters include the step and count of the first-level loop and the step and count of the second-level loop.
In this step, N2 is the product of N1 and N0; the first-level loop step is N1 and the first-level loop count is N0; the second-level loop step is N2 and the second-level loop count is N/N2. The calculation gives N2 = 48; a first-level loop step of 16 and count of 3; and a second-level loop step of 48 and count of 25.
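The two-level loop of the worked example can be sketched as follows. This is an illustration under assumptions: the data are taken to be stored contiguously so that each parallel read fetches N1 consecutive items starting at the generated address, the second-level loop count is taken as N divided by N2 (which reproduces the value 25 of the example), and the function name `two_level_read_starts` is hypothetical.

```python
def two_level_read_starts(n: int, n0: int, n1: int) -> list[int]:
    """Start address of every parallel read for one stage.

    n  : total number of DFT points (N)
    n0 : number of points of the stage to be computed (N0)
    n1 : product of the points of the completed stages (N1)
    """
    n2 = n0 * n1                  # second-level loop step (N2)
    if n % n2 != 0:
        raise ValueError("N must be a multiple of N0 * N1")
    outer_count = n // n2         # second-level loop count (assumed N / N2)
    starts = []
    for j in range(outer_count):  # second-level loop, step N2
        for i in range(n0):       # first-level loop, step N1
            starts.append(j * n2 + i * n1)
    return starts


# Worked example from the text: N = 1200, N0 = 3, N1 = 16.
starts = two_level_read_starts(n=1200, n0=3, n1=16)
# Every address is touched exactly once when each read covers N1 = 16 items.
covered = sorted(s + t for s in starts for t in range(16))
```

With these parameters the inner loop yields the three operand vectors of one radix-3 group (offsets 0, 16, and 32 within a block of 48), and the outer loop advances through the 25 blocks.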
S210: Compare M with N1. If M is less than or equal to N1, execute step S211; otherwise, execute step S212.
S211: Leave the twiddle factors that have been read unprocessed and read data in parallel according to the following loop parameters, M data items at a time, until all N1 data items have been read:
the first-level loop step is M and its loop count is [expression missing]; the second-level loop step is N2 and its loop count is N/N2.
The degree of parallelism is then 16 and the bandwidth utilization is 1. In this step the second-level loop parameters are unchanged. In practice, the first-level and second-level loop parameters can be adjusted according to the bit width of the processor.
S212: Calculate the value of [expression missing], replicate the twiddle factors that have been read into [expression missing] copies, and read the first [expression missing] groups of data in parallel with a step of N2 according to the following loop parameters: the first-level loop step is [expression missing], the first-level loop count is N0, the second-level loop step is [expression missing], and the second-level loop count is [expression missing].
The degree of parallelism in this case is [expression missing].
Based on the above embodiment, the embodiments of the present invention also propose a mixed-radix DFT/IDFT parallel computation method. As shown in Fig. 2, the method can be implemented by steps S300 to S320.
S300: Read the input twiddle factors and the output twiddle factors in parallel, multiply their corresponding terms, and take the products together with the input twiddle factors as the equivalent twiddle factors.
Specifically, as shown in Fig. 3, this step may include steps S301 and S302.
S301: Read in the input twiddle factors and the output twiddle factors in parallel.
S302: Multiply the corresponding terms of the input twiddle factors and the output twiddle factors to obtain the first and second groups of equivalent twiddle factors, and store them in the cache together with the input twiddle factors, which serve as the third group of equivalent twiddle factors.
The process of obtaining the equivalent twiddle factors is described in detail below, in a preferred manner, using radix 3 as an example.
S401: Read in the input twiddle factors [expression missing] and the output twiddle factors [expression missing] and [expression missing] in parallel, where W denotes a twiddle factor and k indexes the data on which the radix-N operation is performed, taking the values 0, 1, ..., N-1.
S402: Multiply the corresponding terms of the input twiddle factors and the output twiddle factors to obtain the first and second groups of equivalent twiddle factors:
[expression missing]. The first and second groups of equivalent twiddle factors are stored in the cache together with the input twiddle factors, which serve as the third group of equivalent twiddle factors.
The factors can be cached as follows: only the input and output twiddle factors that are not identically 1 are stored separately. Depending on the data, the input twiddle factors require (N0-1)×N1 distinct values to be stored, the output twiddle factors have only (N0-1)×(N0-1) distinct values, and the corresponding products comprise (N0-1)×(N0-1)×N1 distinct values.
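The construction of the equivalent twiddle factors and the storage counts above can be sketched as follows. This is an illustration under assumptions: the twiddle factors follow the standard forward-DFT convention W_n^k = exp(-2πik/n) (an IDFT would use the conjugates), the names `twiddle` and `equivalent_twiddles` are hypothetical, and the indexing by output q and input m is chosen here for clarity rather than taken from the patent.

```python
import cmath


def twiddle(n: int, k: int) -> complex:
    """W_n^k = exp(-2*pi*i*k/n), the forward-DFT twiddle factor (assumed convention)."""
    return cmath.exp(-2j * cmath.pi * k / n)


def equivalent_twiddles(n0: int, n1: int):
    """Build input, output and equivalent twiddle factors for one radix-n0 stage."""
    n2 = n0 * n1
    # Input twiddle factors depend on k: one row per non-first input m = 1..n0-1.
    w_in = [[twiddle(n2, m * k) for k in range(n1)] for m in range(1, n0)]
    # Output twiddle factors are independent of k: one row per output q = 1..n0-1.
    w_out = [[twiddle(n0, q * m) for m in range(1, n0)] for q in range(1, n0)]
    # Groups 1..n0-1 hold the products; the last group is the input factors themselves.
    groups = [[[w_out[q][m] * w_in[m][k] for k in range(n1)]
               for m in range(n0 - 1)]
              for q in range(n0 - 1)]
    groups.append(w_in)
    return w_in, w_out, groups


# Radix-3 example with N1 = 16: three groups of equivalent twiddle factors.
w_in, w_out, groups = equivalent_twiddles(3, 16)
```

For N0 = 3 and N1 = 16 this gives 32 input factors, 4 output factors, and 64 products, in line with the counts (N0-1)×N1, (N0-1)×(N0-1), and (N0-1)×(N0-1)×N1.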
S310: Multiply the equivalent twiddle factors by the input data, and cache the products.
Specifically, taking radix 3 as an example, this step multiplies the input data by three groups of equivalent twiddle factors: the two groups obtained in step S302 and the input twiddle factors.
In the resulting products, B and C denote input data.
In an optional embodiment, if the processor has no complex arithmetic unit, this step caches the results of cross-multiplying the equivalent twiddle factors with the real and imaginary parts of the input data.
During this step, the twiddle factors used in each group of operations are the equivalent twiddle factors held in the cache, so each group of operations consists only of multiply and add operations between input data and twiddle factors. There is no data dependency between successive groups of operations, and the 25 operation processes in the second loop need to be executed only once.
S320: In the second loop, while step S310 performs its multiplications, read out the results previously cached by step S310 and perform the corresponding addition or subtraction operations on them.
In a preferred embodiment, taking radix 3 as an example and a processor equipped with a complex arithmetic unit, A, B and C in the addition operations denote input data.
In this embodiment of the invention, the input and output twiddle factors are multiplied together in advance and the multiplication results are cached during computation, so that the multiplications and additions are completely separated. This reduces data dependencies throughout the computation, improves pipeline utilization, and thereby increases the computation speed.
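The separation of multiplications and additions can be sketched for one radix-3 stage as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it uses the textbook decimation-in-time butterfly Y_j = A + B·W^k·W_3^j + C·W^{2k}·W_3^{2j} with the forward-DFT sign convention, and the names `naive_dft` and `radix3_stage` are hypothetical.

```python
import cmath

def naive_dft(x):
    """Direct O(n^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * f / n) for t in range(n))
            for f in range(n)]

def radix3_stage(f0, f1, f2):
    """Combine the n1-point DFTs of x[0::3], x[1::3], x[2::3] into a 3*n1-point DFT."""
    n1 = len(f0)
    n2 = 3 * n1

    def w(n, e):
        return cmath.exp(-2j * cmath.pi * e / n)

    # Equivalent twiddle factors, precomputed once:
    # eq[j][k] = (factor applied to B, factor applied to C) for output j.
    eq = [[(w(n2, k) * w(3, j), w(n2, 2 * k) * w(3, 2 * j)) for k in range(n1)]
          for j in range(3)]

    # Multiply pass: only products are formed, and all of them are cached.
    cache = [[(f1[k] * eq[j][k][0], f2[k] * eq[j][k][1]) for k in range(n1)]
             for j in range(3)]

    # Add pass: reads cached products only, so it does not wait on a multiplier.
    out = [0j] * n2
    for j in range(3):
        for k in range(n1):
            prod_b, prod_c = cache[j][k]
            out[k + j * n1] = f0[k] + prod_b + prod_c
    return out
```

In software the two passes simply run one after the other; on a pipelined processor the point of the split is that the add pass for one group can overlap the multiply pass of the next.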
In an optional embodiment, if the processor has no complex arithmetic unit, this step includes a subtraction between the product of the equivalent twiddle factor and the real part of the input data and the product of the equivalent twiddle factor and the imaginary part of the input data.
By completely separating the multiplications from the additions and subtractions, this embodiment improves the utilization of each pipeline and thereby increases the computation speed.
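For a processor without a complex arithmetic unit, the same split can be sketched with real arithmetic only. The sketch assumes the usual expansion (wr + i·wi)(xr + i·xi) = (wr·xr − wi·xi) + i·(wr·xi + wi·xr); the function names are hypothetical. The text above mentions the subtraction that yields the real part; the addition that yields the imaginary part is included here for completeness.

```python
def multiply_pass(w, x):
    """Cross-multiply real and imaginary parts; no additions happen here.

    w and x are (real, imag) tuples: an equivalent twiddle factor and an input value.
    The four real products are what would be written to the cache.
    """
    wr, wi = w
    xr, xi = x
    return (wr * xr, wi * xi, wr * xi, wi * xr)

def add_pass(products):
    """Combine cached products into the (real, imag) parts of w * x."""
    rr, ii, ri, ir = products
    return (rr - ii, ri + ir)
```

A call such as `add_pass(multiply_pass((0.6, -0.8), (2.0, 3.0)))` yields the real and imaginary parts of (0.6 − 0.8i)(2 + 3i).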
In summary, during computation, each group of operations first multiplies the equivalent twiddle factors by the input data and stores all the products in the cache. While the next group of operations performs its multiplications, the product data in the cache are read out and added or subtracted, which avoids the idle cycles of the arithmetic unit that data dependencies would otherwise cause.
Although the steps in the above embodiment are described in the order given, those skilled in the art will appreciate that, to achieve the effect of this embodiment, the steps need not be executed in that order; they may be executed simultaneously (in parallel) or in reverse order, and such simple variations all fall within the protection scope of the present invention.
Based on the same technical concept as the above parallel reading method embodiment, an embodiment of the present invention further provides a mixed-radix DFT/IDFT data parallel reading device. As shown in Fig. 4, the device 40 may include a point-count calculation unit 42, a group-number judging unit 44, and a reading unit 46. The point-count calculation unit 42 configures the double-loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages. The group-number judging unit 44 compares the maximum number of data items that can be read in parallel with the product of the numbers of points of the completed stages. The reading unit 46 calculates the corresponding double-loop parameters according to the comparison result obtained by the group-number judging unit 44, and reads data in parallel based on the calculated double-loop parameters.
By computing point-count information to configure the double-loop parameters, this mixed-radix DFT/IDFT data parallel reading device embodiment reads data with maximum parallelism, according to the number of points and the stage to be computed, when the bit width of the processor is fixed. The data read are mutually independent, which increases processing parallelism and reduces the number of computation cycles.
On the basis of the above embodiment, the point-count calculation unit 42 may further include a configuration module. The configuration module configures the following double-loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages: the first-loop step is N1, the first-loop count is N0, the second-loop step is N2, and the second-loop count is [formula not reproduced], where N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, and N2 is the product of N1 and N0.
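The double-loop addressing can be sketched as below. The second-loop count does not survive in this text, so the sketch assumes it equals N/N2, where N is the total number of DFT/IDFT points; that value, the extra offset loop over k, and the function name `butterfly_addresses` are assumptions made for illustration only.

```python
def butterfly_addresses(n, n0, n1):
    """List the input addresses of every radix-n0 butterfly in one stage.

    First loop: step n1, count n0 (the n0 inputs of one butterfly).
    Second loop: step n2 = n0 * n1, count n // n2 (ASSUMED; not stated in the text).
    The k loop enumerates the n1 butterflies inside one block; these are the
    reads that are independent of each other and could be issued in parallel.
    """
    n2 = n0 * n1
    if n % n2 != 0:
        raise ValueError("n must be a multiple of n0 * n1")
    groups = []
    for outer in range(n // n2):          # second loop, step n2
        base = outer * n2
        for k in range(n1):               # offset within the block
            groups.append([base + k + inner * n1 for inner in range(n0)])  # first loop, step n1
    return groups
```

For n = 12, n0 = 3 and n1 = 2, the first butterfly reads addresses 0, 2 and 4, and every address from 0 to 11 is visited exactly once across all butterflies.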
On the basis of the embodiment shown in Fig. 4, the reading unit 46 may further include a first calculation module and a first reading module. The first calculation module is configured, when M is less than or equal to N1, to leave the read twiddle factors unprocessed and to calculate the following double-loop parameters:
The first-loop step is M and its repetition count is [formula not reproduced]; the second-loop step is N2 and its repetition count is [formula not reproduced], where M denotes the maximum number of data items the processor can read in parallel, N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, and N2 is the product of N1 and N0. The first reading module reads data in parallel according to these double-loop parameters, M data items at a time, until all N1 data items have been read.
On the basis of the embodiment shown in Fig. 4, the reading unit 46 may further include a second calculation module, a replication module, and a second reading module. The second calculation module is configured to calculate the value of [formula not reproduced] when M is greater than N1. The replication module replicates [formula not reproduced] copies of the read twiddle factors. The second reading module reads the first [formula not reproduced] groups of data in parallel with step N2 according to the following double-loop parameters: the first-loop step is [formula not reproduced], the first-loop count is N0, the second-loop step is [formula not reproduced], and the second-loop count is [formula not reproduced], where M denotes the maximum number of data items the processor can read in parallel, N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, and N2 is the product of N1 and N0.
For a description of this parallel reading device embodiment, refer to the description of the corresponding parallel reading method embodiment; details are not repeated here.
It should be noted that the mixed-radix DFT/IDFT data parallel reading device provided by the above embodiment is illustrated, for data reading, only by the division of functional modules described above. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above.
In addition, an embodiment of the present invention further proposes a mixed-radix DFT/IDFT parallel computing device based on the above parallel reading device embodiment. The parallel computing device can execute the above parallel computing method embodiment. As shown in Fig. 5, the device 50 may include an equivalent twiddle factor calculation unit 52, a cache unit 54, and a data processing unit 56. The equivalent twiddle factor calculation unit 52 reads the input twiddle factors and the output twiddle factors in parallel, multiplies their corresponding terms, and takes the products together with the input twiddle factors as the equivalent twiddle factors. The cache unit 54 multiplies the equivalent twiddle factors obtained by the equivalent twiddle factor calculation unit 52 by the input data and caches the products. The data processing unit 56, in the second loop, reads out the results cached in the cache unit 54 while the cache unit 54 performs multiplications, and performs the corresponding addition or subtraction operations.
When performing computation, this mixed-radix DFT/IDFT parallel computing device embodiment processes the twiddle factors first and separates the multiplications from the additions and subtractions. This reduces data dependencies and idle cycles across the whole computation, improves pipeline utilization, and can therefore effectively increase the speed of mixed-radix DFT and IDFT operations.
On the basis of the above embodiment, the equivalent twiddle factor calculation unit 52 may further include a parallel read-in module and a cache module. The parallel read-in module reads in the input twiddle factors and the output twiddle factors in parallel. The cache module multiplies the corresponding terms of the input twiddle factors and the output twiddle factors to obtain the first and second groups of equivalent twiddle factors, and stores these two groups in the cache together with the input twiddle factors, which serve as the third group of equivalent twiddle factors.
On the basis of the embodiment shown in Fig. 5, the data processing unit may further include a complex arithmetic unit. The complex arithmetic unit reads out the results cached in the cache unit and performs the corresponding addition operations.
For a description of this parallel computing device embodiment, refer to the description of the corresponding parallel computing method embodiment; details are not repeated here.
It should be noted that the mixed-radix DFT/IDFT parallel computing device provided by the above embodiment is illustrated, for parallel computation, only by the division of functional modules described above. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above.
Those skilled in the art will understand that the above mixed-radix DFT/IDFT data parallel reading device and mixed-radix DFT/IDFT parallel computing device also include other well-known structures, such as processors, controllers, and memories. The memories include, but are not limited to, random access memory, flash memory, read-only memory, programmable read-only memory, volatile memory, non-volatile memory, serial memory, parallel memory, or registers. The processors include, but are not limited to, CPLD/FPGA, DSP, ARM processors, and MIPS processors. To avoid unnecessarily obscuring the embodiments of the present disclosure, these well-known structures are not shown in Figs. 4-5.
It should be understood that the number of modules shown in Figs. 4-5 is merely illustrative. Any number of each module may be provided according to actual needs.
The above device embodiments can be used to execute the corresponding method embodiments; their technical principles, the technical problems they solve, and the technical effects they produce are similar. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the devices described above and the related explanations may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
It should be pointed out that the device embodiments and method embodiments of the present invention are described separately above, but details described for one embodiment may also apply to another. The names of the modules and steps involved in the embodiments of the present invention are used only to distinguish the modules or steps and are not to be construed as improper limitations on the invention. Those skilled in the art should understand that the modules or steps in the embodiments of the present invention may be further decomposed or combined. For example, the modules of the above embodiments may be merged into one module or further split into multiple sub-modules.
The technical solutions provided by the embodiments of the present invention have been described in detail above. Although specific examples are used herein to explain the principles and implementations of the invention, the description of the above embodiments is intended only to help in understanding the principles of the embodiments. Meanwhile, those skilled in the art may, in accordance with the embodiments of the present invention, make changes to the specific implementations and the scope of application.
It should be noted that the flowcharts and block diagrams referred to herein are not limited to the forms shown herein; they may also be divided and/or combined in other ways.
It should be noted that the reference signs and text in the drawings are intended only to illustrate the present invention more clearly and are not to be regarded as improper limitations on the scope of the invention.
It should also be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects, not to describe or indicate a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present invention described herein can be implemented in orders other than those illustrated or described herein.
The terms "include", "comprise", or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or device/apparatus that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device/apparatus.
As used herein, the terms "module" and "unit" may refer to software objects or routines executed on a computing system. The different modules described herein may be implemented as objects or processes executed on the computing system (for example, as separate threads). While the systems and methods described herein are preferably implemented in software, implementations in hardware or in a combination of software and hardware are also possible and contemplated.
The steps of the present invention may be implemented with general-purpose computing devices. For example, they may be concentrated on a single computing device, such as a personal computer, server computer, handheld or portable device, laptop device, or multiprocessor device, or they may be distributed over a network of multiple computing devices. The steps shown or described may be executed in an order different from that given here, or they may be fabricated as individual integrated circuit modules, or multiple modules or steps among them may be fabricated as a single integrated circuit module. The present invention is therefore not limited to any specific hardware, software, or combination thereof.
The method provided by the present invention may be implemented using a programmable logic device, or as computer program software or program modules (including routines, programs, objects, components, or data structures that perform particular tasks or implement particular abstract data types). For example, an embodiment of the present invention may be a computer program product that, when run, causes a computer to execute the described method. The computer program product includes a computer-readable storage medium containing computer program logic or code portions for implementing the method. The computer-readable storage medium may be a built-in medium installed in a computer or a removable medium that can be detached from the computer body (for example, a storage device using hot-plug technology). Built-in media include, but are not limited to, rewritable non-volatile memory such as RAM, ROM, flash memory, and hard disks. Removable media include, but are not limited to, optical storage media (such as CD-ROM and DVD), magneto-optical storage media (such as MO), magnetic storage media (such as magnetic tape or removable hard disks), media with built-in rewritable non-volatile memory (such as memory cards), and media with built-in ROM (such as ROM cartridges).
The present invention is not limited to the above embodiments. Without departing from the substance of the invention, any variation, improvement, or substitution conceivable to a person of ordinary skill in the art falls within the protection scope of the invention.

Claims (15)

1. A mixed-radix DFT/IDFT data parallel reading method, characterized in that the method comprises at least:
configuring double-loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages;
comparing the maximum number of data items that can be read in parallel with the product of the numbers of points of the completed stages;
calculating the corresponding double-loop parameters according to the comparison result, and reading data in parallel based on the calculated double-loop parameters.
2. The method according to claim 1, characterized in that configuring the double-loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages specifically comprises:
configuring the following double-loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages: the first-loop step is N1, the first-loop count is N0, the second-loop step is N2, and the second-loop count is [formula not reproduced], where N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, N2 is the product of N1 and N0, and N denotes the number of points of the DFT/IDFT.
3. The method according to claim 1, characterized in that calculating the corresponding double-loop parameters according to the comparison result and reading data in parallel based on the calculated double-loop parameters specifically comprises:
when M is less than or equal to N1, leaving the read twiddle factors unprocessed and calculating the following double-loop parameters:
the first-loop step is M and the first-loop count is [formula not reproduced]; the second-loop step is N2 and the second-loop count is [formula not reproduced], where M denotes the maximum number of data items the processor can read in parallel, N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, N2 is the product of N1 and N0, and N denotes the number of points of the DFT/IDFT;
reading the data in parallel according to the above double-loop parameters, M data items at a time, until all N1 data items have been read.
4. The method according to claim 1, characterized in that calculating the corresponding double-loop parameters according to the comparison result and reading data in parallel based on the calculated double-loop parameters further specifically comprises:
when M is greater than N1, calculating the value of [formula not reproduced];
replicating [formula not reproduced] copies of the read twiddle factors;
reading the first [formula not reproduced] groups of data in parallel with step N2 according to the following double-loop parameters: the first-loop step is [formula not reproduced], the first-loop count is N0, the second-loop step is [formula not reproduced], and the second-loop count is [formula not reproduced], where M denotes the maximum number of data items the processor can read in parallel, N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, N2 is the product of N1 and N0, and N denotes the number of points of the DFT/IDFT.
5. A mixed-radix DFT/IDFT parallel computing method based on the method of any one of claims 1-4, characterized in that the parallel computing method comprises at least:
Step 1: reading the input twiddle factors and the output twiddle factors in parallel, multiplying their corresponding terms, and taking the products together with the input twiddle factors as the equivalent twiddle factors;
Step 2: multiplying the equivalent twiddle factors by the input data, and caching the products;
Step 3: in the second loop, while the multiplications of step 2 are being executed, reading out the results cached by step 2 and performing the corresponding addition or subtraction operations.
6. The parallel computing method according to claim 5, characterized in that multiplying the equivalent twiddle factors by the input data and caching the products specifically comprises:
when the processor is not provided with a complex arithmetic unit, caching the results of cross-multiplying the equivalent twiddle factors with the real and imaginary parts of the input data.
7. The parallel computing method according to claim 5, characterized in that step 3 specifically comprises:
when the processor is provided with a complex arithmetic unit, while the multiplications of step 2 are being executed, reading out the results cached by step 2 and performing the corresponding addition operations.
8. The parallel computing method according to claim 5, characterized in that step 3 further specifically comprises:
when the processor is not provided with a complex arithmetic unit, while the multiplications of step 2 are being executed, reading out the results cached by step 2 and performing the following subtraction operation:
subtracting, one from the other, the product of the equivalent twiddle factor and the real part of the input data and the product of the equivalent twiddle factor and the imaginary part of the input data.
9. A mixed-radix DFT/IDFT data parallel reading device, characterized in that the parallel reading device comprises at least:
a point-count calculation unit, configured to configure double-loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages;
a group-number judging unit, configured to compare the maximum number of data items that can be read in parallel with the product of the numbers of points of the completed stages;
a reading unit, configured to calculate the corresponding double-loop parameters according to the comparison result obtained by the group-number judging unit, and to read data in parallel based on the calculated double-loop parameters.
10. The parallel reading device according to claim 9, characterized in that the point-count calculation unit specifically comprises:
a configuration module, configured to configure the following double-loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages: the first-loop step is N1, the first-loop count is N0, the second-loop step is N2, and the second-loop count is [formula not reproduced], where N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, N2 is the product of N1 and N0, and N denotes the number of points of the DFT/IDFT.
11. The parallel reading device according to claim 9, characterized in that the reading unit specifically comprises:
a first calculation module, configured, when M is less than or equal to N1, to leave the read twiddle factors unprocessed and to calculate the following double-loop parameters:
the first-loop step is M and its repetition count is [formula not reproduced]; the second-loop step is N2 and its repetition count is [formula not reproduced], where M denotes the maximum number of data items the processor can read in parallel, N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, N2 is the product of N1 and N0, and N denotes the number of points of the DFT/IDFT;
a first reading module, configured to read the data in parallel according to the above double-loop parameters, M data items at a time, until all N1 data items have been read.
12. The parallel reading device according to claim 9, characterized in that the reading unit further specifically comprises:
a second calculation module, configured to calculate the value of [formula not reproduced] when M is greater than N1;
a replication module, configured to replicate [formula not reproduced] copies of the read twiddle factors;
a second reading module, configured to read the first [formula not reproduced] groups of data in parallel with step N2 according to the following double-loop parameters: the first-loop step is [formula not reproduced], the first-loop count is N0, the second-loop step is [formula not reproduced], and the second-loop count is [formula not reproduced], where M denotes the maximum number of data items the processor can read in parallel, N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, N2 is the product of N1 and N0, and N denotes the number of points of the DFT/IDFT.
13. A mixed-radix DFT/IDFT parallel computing device based on the parallel reading device of any one of claims 9-12, characterized in that the parallel computing device comprises at least:
an equivalent twiddle factor calculation unit, configured to read the input twiddle factors and the output twiddle factors in parallel, multiply their corresponding terms, and take the products together with the input twiddle factors as the equivalent twiddle factors;
a cache unit, configured to multiply the equivalent twiddle factors obtained by the equivalent twiddle factor calculation unit by the input data, and to cache the products;
a data processing unit, configured, in the second loop, to read out the results cached in the cache unit while the cache unit performs multiplications, and to perform the corresponding addition or subtraction operations.
14. The parallel computing device according to claim 13, characterized in that the equivalent twiddle factor calculation unit specifically comprises:
a parallel read-in module, configured to read in the input twiddle factors and the output twiddle factors in parallel;
a cache module, configured to multiply the corresponding terms of the input twiddle factors and the output twiddle factors to obtain the first and second groups of equivalent twiddle factors, and to store the first and second groups of equivalent twiddle factors in the cache together with the input twiddle factors, which serve as the third group of equivalent twiddle factors.
15. The parallel computing device according to claim 13, characterized in that the data processing unit further comprises:
a complex arithmetic unit, configured to read out the results cached in the cache unit and to perform the corresponding addition operations.
CN201610596528.9A 2016-07-26 2016-07-26 Mixed base DFT/IDFT is read parallel and calculation method and device Active CN106201999B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610596528.9A CN106201999B (en) 2016-07-26 2016-07-26 Mixed base DFT/IDFT is read parallel and calculation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610596528.9A CN106201999B (en) 2016-07-26 2016-07-26 Mixed base DFT/IDFT is read parallel and calculation method and device

Publications (2)

Publication Number Publication Date
CN106201999A CN106201999A (en) 2016-12-07
CN106201999B true CN106201999B (en) 2018-11-27

Family

ID=57495233

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610596528.9A Active CN106201999B (en) 2016-07-26 2016-07-26 Mixed base DFT/IDFT is read parallel and calculation method and device

Country Status (1)

Country Link
CN (1) CN106201999B (en)

Also Published As

Publication number Publication date
CN106201999A (en) 2016-12-07

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant