CN106201999A - Mixed-radix DFT/IDFT parallel data reading and computing methods and devices - Google Patents
- Publication number
- CN106201999A (application CN201610596528.9A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/141—Discrete Fourier transforms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3867—Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
Abstract
The invention discloses a mixed-radix DFT/IDFT parallel data reading method, a mixed-radix DFT/IDFT parallel computing method, a mixed-radix DFT/IDFT parallel data reading device, and a mixed-radix DFT/IDFT parallel computing device. The parallel reading method includes: configuring double-loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages; then comparing the maximum number of data items that can be read in parallel with that product; and finally computing the corresponding double-loop parameters from the comparison result and reading the data in parallel with the computed parameters. The embodiments thereby increase processing parallelism and reduce dependences between data, so that the overall computation has fewer idle cycles and the pipeline is better utilized, which can effectively speed up mixed-radix DFT/IDFT computation.
Description
Technical field
Embodiments of the present invention relate to the field of mobile communication technology, and in particular to a mixed-radix DFT/IDFT parallel data reading method, a mixed-radix DFT/IDFT parallel computing method, a mixed-radix DFT/IDFT parallel data reading device, and a mixed-radix DFT/IDFT parallel computing device, but are not limited thereto.
Background art
In digital signal processing systems, especially for finite-length sequences, the DFT (discrete Fourier transform) is one of the most important mathematical transforms. In essence it is a finite-point discrete sampling of the Fourier transform of a finite-length sequence. It allows digital signal processing to be carried out in the frequency domain by numerical methods, which increases the flexibility of digital signal processing, and it is widely used in digital communication, image processing, power spectrum estimation, and similar fields. A DFT whose number of points is a power of 2 can be computed with a radix-2 FFT algorithm. For other sizes, that is, for a DFT of general size, the radix-2 FFT algorithm cannot be used.
At present, a DFT of general size is usually computed with a mixed-radix algorithm whose theoretical basis is the Cooley-Tukey algorithm. The radix-2 FFT algorithms are themselves derived from it. The basic idea is to convert a DFT with many points into repeated DFTs with few points. Each such pass is called a stage, and the whole DFT is completed by executing the stages one after another. The small sizes are usually primes, i.e. 3, 5, ..., and the computation then proceeds by nesting radix-3, radix-5, ... passes in turn. Within each stage the radix-N operation is executed several times, each time on different data.
Formula (1) is the radix-3 expression [formula not preserved in this text]. In it, the input twiddle factor depends on k, whereas the output twiddle factor is independent of k.
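The decomposition described in this background can be sketched as a short recursive routine. This is a minimal illustration of the textbook decimation-in-time Cooley-Tukey scheme, not the patent's formula (1); the function names `naive_dft`, `smallest_factor`, and `mixed_radix_dft` are hypothetical and chosen only for this sketch.

```python
import cmath


def naive_dft(x):
    """Direct O(n^2) DFT, used for prime sizes and as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def smallest_factor(n):
    """Smallest prime factor of n (n itself when n is prime or 1)."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p
        p += 1
    return n


def mixed_radix_dft(x):
    """Recursive decimation-in-time Cooley-Tukey DFT for any length >= 1."""
    n = len(x)
    r = smallest_factor(n)
    if r == n:
        # Prime length (or length 1): fall back to the direct DFT.
        return naive_dft(x)
    m = n // r
    # One stage: r interleaved sub-sequences, each handled by a length-m DFT.
    subs = [mixed_radix_dft(x[i::r]) for i in range(r)]
    out = [0j] * n
    for k in range(m):
        for q in range(r):
            acc = 0j
            for i in range(r):
                # Combined twiddle factor W_n^(i * (k + q*m)).
                acc += subs[i][k] * cmath.exp(-2j * cmath.pi * i * (k + q * m) / n)
            out[k + q * m] = acc
    return out
```

Each recursion level corresponds to one stage in the sense used above: a radix-r pass that combines r smaller transforms.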
Because the size of a general DFT is not a power of 2, a general-purpose processor cannot read in or write out a whole vector of data at once, which reduces parallelism. In addition, the usual DFT procedure first multiplies and accumulates the data with the input twiddle factors and then multiplies and accumulates with the output twiddle factors, so the dependence between data is strong. Furthermore, the usual DFT procedure interleaves multiplications and additions, which introduces further computational dependence. The waiting time that data dependence imposes on the arithmetic units therefore grows and pipeline utilization falls, which lowers the processing speed of the whole DFT.
In view of this, the present invention is proposed.
Summary of the invention
The main purpose of the embodiments of the present invention is to provide a mixed-radix DFT/IDFT parallel data reading method that at least partially solves the technical problem of how to improve computational efficiency. A mixed-radix DFT/IDFT parallel computing method, a mixed-radix DFT/IDFT parallel data reading device, and a mixed-radix DFT/IDFT parallel computing device are also provided.
To achieve the above object, according to one aspect of the invention, the following technical solution is provided:
A mixed-radix DFT/IDFT parallel data reading method. The method may include:
configuring double-loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages;
comparing the maximum number of data items that can be read in parallel with the product of the numbers of points of the completed stages;
computing the corresponding double-loop parameters according to the comparison result, and reading data in parallel based on the computed double-loop parameters.
Further, configuring the double-loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages may specifically include:
configuring the following double-loop parameters: the first-loop step is N1, the first-loop count is N0, the second-loop step is N2, and the second-loop count is N/N2, where N0 is the number of points of the stage to be computed, N1 is the product of the numbers of points of the completed stages, N2 is the product of N1 and N0, and N is the total number of DFT points.
Further, computing the corresponding double-loop parameters according to the comparison result and reading data in parallel based on them may specifically include:
when M is less than or equal to N1, leaving the read twiddle factors unprocessed and computing the following double-loop parameters:
the first-loop step is M, the first-loop count is [expression not preserved], the second-loop step is N2, and the second-loop count is [expression not preserved], where M is the maximum number of data items the processor can read in parallel, N0 is the number of points of the stage to be computed, N1 is the product of the numbers of points of the completed stages, and N2 is the product of N1 and N0;
reading the data in parallel according to these double-loop parameters, M data items at a time, until all N1 data items have been read.
Further, computing the corresponding double-loop parameters according to the comparison result and reading data in parallel based on them may also specifically include:
when M is greater than N1, computing the value of [expression not preserved];
replicating the read twiddle factors into [expression not preserved] copies;
reading the first [expression not preserved] groups of data in parallel with step N2 according to the following double-loop parameters: the first-loop step is [expression not preserved], the first-loop count is N0, the second-loop step is [expression not preserved], and the second-loop count is [expression not preserved], where M is the maximum number of data items the processor can read in parallel, N0 is the number of points of the stage to be computed, N1 is the product of the numbers of points of the completed stages, and N2 is the product of N1 and N0.
To achieve the above object, according to another aspect of the invention, a mixed-radix DFT/IDFT parallel computing method based on the above method is also provided. The parallel computing method may include:
Step 1: reading the input twiddle factors and the output twiddle factors in parallel, multiplying their corresponding terms, and taking the products together with the input twiddle factors as the equivalent twiddle factors;
Step 2: multiplying the equivalent twiddle factors by the input data and caching the products;
Step 3: in the second loop, while the multiplications of step 2 are being performed, reading the results cached in step 2 and performing the corresponding additions or subtractions.
Further, multiplying the equivalent twiddle factors by the input data and caching the products may specifically include:
when the processor has no complex arithmetic unit, caching the results of cross-multiplying the real and imaginary parts of the equivalent twiddle factors and of the input data.
Further, step 3 may specifically include:
when the processor has a complex arithmetic unit, reading the results cached in step 2 while the multiplications of step 2 are being performed, and performing the corresponding additions.
Further, step 3 may also specifically include:
when the processor has no complex arithmetic unit, reading the results cached in step 2 while the multiplications of step 2 are being performed, and performing the following subtraction:
subtracting the product of the imaginary parts of the equivalent twiddle factor and the input data from the product of their real parts.
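The split between step 2 (cross-multiplication and caching) and step 3 (combination) for a processor without a complex arithmetic unit can be sketched as follows. The function names `cross_products` and `combine` are hypothetical. The text above only spells out the subtraction that yields the real part; the addition that yields the imaginary part is the standard rule for complex multiplication and is an assumption here.

```python
def cross_products(w, x):
    """Step 2 sketch: the four real products of an equivalent twiddle factor
    w = (w_re, w_im) and an input sample x = (x_re, x_im), ready to be cached."""
    w_re, w_im = w
    x_re, x_im = x
    return (w_re * x_re, w_im * x_im, w_re * x_im, w_im * x_re)


def combine(cached):
    """Step 3 sketch: turn the cached products into the complex product w * x.

    The real part is the subtraction named in the text; the imaginary part
    (an addition) is assumed from ordinary complex arithmetic.
    """
    re_re, im_im, re_im, im_re = cached
    return (re_re - im_im, re_im + im_re)
```

For example, `combine(cross_products((1.0, 2.0), (3.0, -4.0)))` gives the real and imaginary parts of (1 + 2i)(3 - 4i). Because the two functions are independent, the multiplications for the next data items can be issued while the cached products of the previous ones are being combined.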
To achieve the above object, according to a further aspect of the invention, a mixed-radix DFT/IDFT parallel data reading device is also provided. The parallel reading device may include:
a point-count computing unit, configured to configure double-loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages;
a group-number judging unit, configured to compare the maximum number of data items that can be read in parallel with the product of the numbers of points of the completed stages;
a reading unit, configured to compute the corresponding double-loop parameters according to the comparison result obtained by the group-number judging unit, and to read data in parallel based on the computed double-loop parameters.
Further, the point-count computing unit may specifically include:
a configuration module, configured to configure the following double-loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages: the first-loop step is N1, the first-loop count is N0, the second-loop step is N2, and the second-loop count is N/N2, where N0 is the number of points of the stage to be computed, N1 is the product of the numbers of points of the completed stages, N2 is the product of N1 and N0, and N is the total number of DFT points.
Further, the reading unit may specifically include:
a first computing module, configured to leave the read twiddle factors unprocessed when M is less than or equal to N1 and to compute the following double-loop parameters:
the first-loop step is M with count [expression not preserved], and the second-loop step is N2 with count [expression not preserved], where M is the maximum number of data items the processor can read in parallel, N0 is the number of points of the stage to be computed, N1 is the product of the numbers of points of the completed stages, and N2 is the product of N1 and N0;
a first reading module, configured to read the data in parallel according to these double-loop parameters, M data items at a time, until all N1 data items have been read.
Further, the reading unit may also specifically include:
a second computing module, configured to compute the value of [expression not preserved] when M is greater than N1;
a replication module, configured to replicate the read twiddle factors into [expression not preserved] copies;
a second reading module, configured to read the first [expression not preserved] groups of data in parallel with step N2 according to the following double-loop parameters: the first-loop step is [expression not preserved], the first-loop count is N0, the second-loop step is [expression not preserved], and the second-loop count is [expression not preserved], where M is the maximum number of data items the processor can read in parallel, N0 is the number of points of the stage to be computed, N1 is the product of the numbers of points of the completed stages, and N2 is the product of N1 and N0.
To achieve the above object, according to a further aspect of the invention, a mixed-radix DFT/IDFT parallel computing device based on the above parallel reading device is also provided. The parallel computing device may include:
an equivalent twiddle factor computing unit, configured to read the input twiddle factors and the output twiddle factors in parallel, multiply their corresponding terms, and take the products together with the input twiddle factors as the equivalent twiddle factors;
a caching unit, configured to multiply the equivalent twiddle factors obtained by the equivalent twiddle factor computing unit by the input data and to cache the products;
a data processing unit, configured, in the second loop, while the caching unit is performing multiplications, to read the results cached in the caching unit and perform the corresponding additions or subtractions.
Further, the equivalent twiddle factor computing unit may specifically include:
a parallel read-in module, configured to read in the input twiddle factors and the output twiddle factors in parallel;
a caching module, configured to multiply the corresponding terms of the input twiddle factors and the output twiddle factors to obtain the first and second groups of equivalent twiddle factors, and to store the first and second groups in the cache together with the input twiddle factors, which serve as the third group of equivalent twiddle factors.
Further, the data processing unit may also include:
a complex arithmetic unit, configured to read the results cached in the caching unit and perform the corresponding additions.
Compared with the prior art, the above technical solution has at least the following beneficial effects:
The embodiments of the present invention configure double-loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages; then compare the maximum number of data items that can be read in parallel with that product; and finally compute the corresponding double-loop parameters from the comparison result and read data in parallel with them. By computing from the point-count information and configuring the double-loop parameters in this way, for a fixed processor bit width the data are read with maximum parallelism according to the point counts and the stage being computed, and the data read together are independent of one another. No dedicated data reordering and no cross-lane operations are needed during computation, which increases processing parallelism and reduces the number of computation cycles.
Of course, a product implementing the present invention does not necessarily need to achieve all of the above advantages at the same time.
Other features and advantages of the invention are set out in the description that follows, and in part become apparent from the description or are understood by practicing the invention. The objects and other advantages of the invention can be realized and obtained by the methods particularly pointed out in the written description, the claims, and the accompanying drawings.
Brief description of the drawings
The accompanying drawings form part of the invention and provide a further understanding of it. The illustrative embodiments and their descriptions explain the invention but do not unduly limit it. The drawings described below are only some embodiments; a person skilled in the art can obtain other drawings from them without creative effort. In the drawings:
Fig. 1 is a flowchart of a mixed-radix DFT and IDFT parallel data reading method according to an exemplary embodiment;
Fig. 2 is a flowchart of a mixed-radix DFT and IDFT parallel computing method according to another exemplary embodiment;
Fig. 3 is a flowchart, according to an exemplary embodiment, of reading the input twiddle factors and output twiddle factors in parallel, multiplying their corresponding terms, and taking the products together with the input twiddle factors as the equivalent twiddle factors;
Fig. 4 is a structural diagram of a mixed-radix DFT and IDFT parallel data reading device according to an exemplary embodiment;
Fig. 5 is a structural diagram of a mixed-radix DFT and IDFT parallel computing device according to an exemplary embodiment.
These drawings and the written description are not intended to limit the scope of the invention in any way, but to explain the concept of the invention to those skilled in the art by reference to specific embodiments.
Detailed description of the invention
The technical problems solved by the embodiments of the present invention, the technical solutions adopted, and the technical effects achieved are described clearly and completely below with reference to the drawings and specific embodiments. The described embodiments are only some of the embodiments of the application, not all of them. All equivalent or obviously modified embodiments that a person of ordinary skill in the art obtains from the embodiments of the application without creative effort fall within the protection scope of the invention. The embodiments of the invention can be realized in the many different ways defined and covered by the claims.
It should be noted that many specific details are given in the following description to aid understanding. Clearly, however, the invention can be implemented without these details.
It should also be noted that, unless expressly limited or in conflict, the embodiments of the invention and the technical features in them can be combined with one another to form technical solutions.
The embodiments of the present invention are applied in an LTE system in the field of mobile communications, in which the transform precoding module of the uplink transmitter performs a DFT and the corresponding receiver performs an IDFT (inverse discrete Fourier transform).
Depending on the number of allocated resources, the number of points N of the DFT/IDFT satisfies the following relation:
N = 2^α × 3^β × 5^γ, 12 ≤ N ≤ 1536, α ≥ 2, β ≥ 1, γ ≥ 0
In a specific implementation, the 2^α-point DFT can be completed with an FFT, while the remaining radix-3 and radix-5 DFT processing has to be completed with a mixed-radix DFT. The mixed-radix DFT carries out β radix-3 operations and γ radix-5 operations, in the order radix 3 first and radix 5 afterwards.
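The size constraint above can be checked mechanically. The following sketch factors a candidate size into the exponents α, β, and γ and rejects sizes outside the stated relation; the function name `factor_dft_size` is hypothetical.

```python
def factor_dft_size(n):
    """Return (alpha, beta, gamma) with n == 2**alpha * 3**beta * 5**gamma.

    Raises ValueError when n violates the relation stated in the text:
    12 <= n <= 1536, alpha >= 2, beta >= 1, gamma >= 0, no other prime factors.
    """
    if not 12 <= n <= 1536:
        raise ValueError(f"{n} is outside the range 12..1536")
    exponents = []
    rest = n
    for p in (2, 3, 5):
        e = 0
        while rest % p == 0:
            rest //= p
            e += 1
        exponents.append(e)
    alpha, beta, gamma = exponents
    if rest != 1 or alpha < 2 or beta < 1:
        raise ValueError(f"{n} is not of the form 2^a * 3^b * 5^c with a >= 2, b >= 1")
    return alpha, beta, gamma
```

For the 1200-point example used later in the description, the exponents say that a 16-point FFT is followed by one radix-3 stage and two radix-5 stages.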
Fig. 1 schematically illustrates a mixed-radix DFT/IDFT parallel data reading method. As shown in Fig. 1, the method may include:
S100: Configure double-loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages.
S110: Compare the maximum number of data items that can be read in parallel with the product of the numbers of points of the completed stages.
S120: Compute the corresponding double-loop parameters according to the comparison result, and read data in parallel based on the computed double-loop parameters.
By computing from the point-count information and configuring the double-loop parameters, the embodiment reads data with maximum parallelism for a fixed processor bit width, which increases processing parallelism.
As an optional implementation of this embodiment, configuring the double-loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages may specifically include configuring the following double-loop parameters: the first-loop step is N1, the first-loop count is N0, the second-loop step is N2, and the second-loop count is N/N2, where N0 is the number of points of the stage to be computed, N1 is the product of the numbers of points of the completed stages, N2 is the product of N1 and N0, and N is the total number of DFT points.
As an optional implementation of this embodiment, computing the corresponding double-loop parameters according to the comparison result and reading data in parallel based on them may specifically include:
when M is less than or equal to N1, leaving the read twiddle factors unprocessed and computing the following loop parameters:
the first-loop step is M, the first-loop count is [expression not preserved], the second-loop step is N2, and the second-loop count is [expression not preserved], where M is the maximum number of data items the processor can read in parallel, N0 is the number of points of the stage to be computed, N1 is the product of the numbers of points of the completed stages, and N2 is the product of N1 and N0;
reading data in parallel according to these loop parameters, M data items at a time, until all N1 data items have been read.
By computing from the point-count information and configuring the double-loop parameters, the embodiment reads data with maximum parallelism for a fixed processor bit width; the data read together are independent of one another and no cross-lane operations are needed, which increases processing parallelism.
As an optional implementation of this embodiment, computing the corresponding double-loop parameters according to the comparison result and reading data in parallel based on them may also specifically include:
when M is greater than N1, computing the value of [expression not preserved], replicating the read twiddle factors into [expression not preserved] copies, and reading the first [expression not preserved] groups of data in parallel with step N2 according to the following loop parameters: the first-loop step is [expression not preserved], the first-loop count is N0, the second-loop step is [expression not preserved], and the second-loop count is [expression not preserved], where M is the maximum number of data items the processor can read in parallel, N0 is the number of points of the stage to be computed, N1 is the product of the numbers of points of the completed stages, and N2 is the product of N1 and N0.
As in the previous case, the data are read with maximum parallelism for a fixed processor bit width, the data read together are independent of one another, and no cross-lane operations are needed, which increases processing parallelism.
The embodiments of the present invention can be based on any mixed-radix process. Since the radix can in theory be any number and the cases cannot be enumerated exhaustively, the invention is described in detail below in a preferred manner with radix 3 as the example.
Assume: N0 is the number of points of the stage to be computed; N1 is the product of the numbers of points of the completed stages; M is the maximum number of data items the processor can read in parallel (for example 16); N is the number of DFT points (for example 1200).
S200: Compute N0 and N1 (here N0 = 3 and N1 = 16) and determine the loop parameters from them. The loop parameters comprise the step and count of the first loop and the step and count of the second loop.
In this step, N2 is the product of N1 and N0, the first-loop step is N1, the first-loop count is N0, the second-loop step is N2, and the second-loop count is N/N2. It follows that N2 = 48, the first-loop step is 16 and its count is 3, and the second-loop step is 48 and its count is 25.
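The numbers in this worked example can be reproduced with a few lines of code. The sketch below computes the four loop parameters and enumerates the base addresses that the two nested loops visit. The names `configure_loops` and `base_addresses` are hypothetical, and the reading that each base address starts a contiguous run of N1 data items is an assumption made for illustration.

```python
def configure_loops(n, n0, n1):
    """Double-loop parameters for one stage: n is the total DFT size, n0 the
    size of the stage to be computed, n1 the product of the completed stage sizes."""
    n2 = n0 * n1
    if n % n2 != 0:
        raise ValueError("n must be a multiple of n0 * n1")
    return {
        "first_step": n1,         # distance between the n0 inputs of one butterfly
        "first_count": n0,
        "second_step": n2,        # distance between consecutive butterfly groups
        "second_count": n // n2,
    }


def base_addresses(params):
    """Base address of every run visited by the nested loops (assumed layout)."""
    for outer in range(params["second_count"]):
        for inner in range(params["first_count"]):
            yield outer * params["second_step"] + inner * params["first_step"]


params = configure_loops(1200, 3, 16)
bases = list(base_addresses(params))
```

With N = 1200, N0 = 3, and N1 = 16 this gives steps of 16 and 48 and counts of 3 and 25, matching the values stated above, and the 75 runs of 16 items together cover every one of the 1200 addresses exactly once.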
S210: Compare M with N1. If M is less than or equal to N1, perform step S211; otherwise, perform step S212.
S211: Leave the read twiddle factors unprocessed and read data in parallel according to the following loop parameters, M data items at a time, until all N1 data items have been read:
the first-loop step is M with count [expression not preserved], and the second-loop step is N2 with count [expression not preserved].
In this case the parallelism is 16 and the bandwidth utilization is 1. The second-loop parameters are unchanged in this step. In practical applications, the first-loop and second-loop parameters can be adjusted according to the bit width of the processor.
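The instruction to read M data items at a time until all N1 items have been read can be sketched as a simple chunking computation. This is an illustration under the assumption that the N1 items are contiguous and that the last read may be only partly used when M does not divide N1; the names `chunk_offsets` and `bandwidth_utilization` are hypothetical, and the sketch does not reproduce the loop-count expressions that are missing from the text.

```python
def chunk_offsets(n1, m):
    """Start offsets of successive m-wide reads covering n1 contiguous items."""
    if m > n1:
        raise ValueError("this sketch covers only the case m <= n1")
    return list(range(0, n1, m))


def bandwidth_utilization(n1, m):
    """Fraction of the lanes fetched by those reads that carry useful data."""
    reads = len(chunk_offsets(n1, m))
    return n1 / (reads * m)
```

With N1 = 16 and M = 16 a single read is enough and every lane is used, which agrees with the parallelism of 16 and the bandwidth utilization of 1 stated above.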
S212: Compute the value of [expression not preserved], replicate the read twiddle factors into [expression not preserved] copies, and read the first [expression not preserved] groups of data in parallel with step N2 according to the following loop parameters: the first-loop step is [expression not preserved], the first-loop count is N0, the second-loop step is [expression not preserved], and the second-loop count is [expression not preserved].
The parallelism in this case is [expression not preserved].
Based on the above embodiment, the embodiments of the present invention also propose a mixed-radix DFT/IDFT parallel computing method. As shown in Fig. 2, the method can be realized by steps S300 to S320.
S300: Read the input twiddle factors and the output twiddle factors in parallel, multiply their corresponding terms, and take the products together with the input twiddle factors as the equivalent twiddle factors.
Specifically, as shown in Fig. 3, this step may include steps S301 and S302.
S301: Read the input twiddle factors and the output twiddle factors in parallel.
S302: Multiply the corresponding terms of the input twiddle factors and the output twiddle factors to obtain the first and second groups of equivalent twiddle factors, and store the first and second groups in the cache together with the input twiddle factors, which serve as the third group of equivalent twiddle factors.
Taking radix 3 as an example, the process of obtaining the equivalent twiddle factors is described in detail below in a preferred manner.
S401: read in parallel the input twiddle factors [formula omitted] and the output twiddle factors [formula omitted], where W denotes a twiddle factor and k indexes the data of the radix-N operation, taking the values 0, 1, ..., N-1.
S402: multiply the corresponding terms of the input twiddle factors and the output twiddle factors to obtain the first and second groups of equivalent twiddle factors:
[formula omitted]
and store the first and second groups of equivalent twiddle factors in the cache together with the input twiddle factors, which serve as the third group of equivalent twiddle factors.
The caching can be done as follows: the factors of the input twiddle factors and output twiddle factors that are identically 1 are not stored separately. Depending on the data, the input twiddle factors require (N0-1)×N1 distinct values to be stored, the output twiddle factors comprise only (N0-1)×(N0-1) distinct values, and the corresponding products comprise (N0-1)×(N0-1)×N1 distinct values.
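The storage counts above can be illustrated with a short Python sketch. It rests on assumptions that the text does not confirm: the input twiddle factors are taken to be W_{N2}^{k·j} and the output twiddle factors W_{N0}^{k·m}, with the forward-DFT sign convention, and the names `twiddle` and `equivalent_twiddles` are hypothetical.

```python
import cmath


def twiddle(n, exponent):
    """W_n^exponent = exp(-2*pi*i*exponent/n); forward-DFT sign is an assumption."""
    return cmath.exp(-2j * cmath.pi * exponent / n)


def equivalent_twiddles(n0, n1):
    """Build the cached twiddle tables for one radix-N0 stage.

    Rows and columns that are identically 1 (k = 0 or m = 0) are not stored.
    """
    n2 = n0 * n1
    # Input twiddles: w_in[k-1][j] = W_{N2}^(k*j), k = 1..N0-1, j = 0..N1-1.
    w_in = [[twiddle(n2, k * j) for j in range(n1)] for k in range(1, n0)]
    # Output twiddles: w_out[m-1][k-1] = W_{N0}^(k*m), m, k = 1..N0-1.
    w_out = [[twiddle(n0, k * m) for k in range(1, n0)] for m in range(1, n0)]
    # Product groups: one group per output index m = 1..N0-1.
    groups = [
        [[w_out[m][k] * w_in[k][j] for j in range(n1)] for k in range(n0 - 1)]
        for m in range(n0 - 1)
    ]
    # The input twiddles themselves serve as the last group (output index 0).
    groups.append(w_in)
    return w_in, w_out, groups


w_in, w_out, groups = equivalent_twiddles(3, 16)
n_input = sum(len(row) for row in w_in)
n_output = sum(len(row) for row in w_out)
n_product = sum(len(row) for group in groups[:-1] for row in group)
```

For radix 3 with N1 = 16 this stores 2 × 16 input twiddle entries, 2 × 2 output twiddle entries, and 2 × 2 × 16 products, matching the formulas in the paragraph above.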
S310: multiply the equivalent twiddle factors by the input data and cache the products.
Specifically, taking radix 3 as an example, this step takes the two groups of equivalent twiddle factors obtained in step S302, [formula omitted] and [formula omitted], together with the input twiddle factors [formula omitted], as three groups of equivalent twiddle factors and multiplies them by the input data.
The multiplication results are [formula omitted], where B and C denote input data.
In an optional embodiment, if the processor has no complex arithmetic unit, this step caches the results of cross-multiplying the equivalent twiddle factors with the real and imaginary parts of the input data.
In this step, because the twiddle factors used in the computation of every group are the equivalent twiddle factors held in the cache, the computation of each group comprises only multiply and add operations between input data and twiddle factors, there is no data dependence between the computations of successive groups, and over the 25 iterations of the second-level loop this process needs to be performed only once.
S320: in the second-level loop, while the multiplications of step S310 are being performed, read out the results cached by step S310 and perform the corresponding addition or subtraction.
In one preferred embodiment, taking radix 3 as an example and assuming the processor has a complex arithmetic unit, the addition can be [formula omitted], where A, B, and C denote input data.
In the embodiment of the present invention, the input twiddle factors are multiplied by the output twiddle factors and the multiplication results are cached during computation, so that the multiply and add operations are completely separated. This reduces the dependences in the whole computation, improves pipeline utilization, and thus increases the computation speed.
In an optional embodiment, if the processor has no complex arithmetic unit, this step comprises subtracting the product of the imaginary parts of the equivalent twiddle factor and the input data from the product of their real parts.
By completely separating the multiply and subtract operations, the embodiment of the present invention improves the utilization of each pipeline stage and thus increases the computation speed.
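For a processor without a complex arithmetic unit, the cross multiplication of S310 and the subtraction described above can be sketched with real arithmetic only. The text mentions only the subtraction that forms the real part; the addition that forms the imaginary part is filled in here from ordinary complex multiplication, and the function names are hypothetical.

```python
def cross_products(w_re, w_im, x_re, x_im):
    """Multiply phase: the four real cross products that would be cached."""
    return (w_re * x_re, w_im * x_im, w_re * x_im, w_im * x_re)


def combine(products):
    """Add/subtract phase: rebuild the complex product from cached values."""
    re_re, im_im, re_im, im_re = products
    real = re_re - im_im  # the subtraction described in the text
    imag = re_im + im_re  # assumption: standard complex multiplication
    return real, imag


# Equivalent twiddle factor 1.5 - 2j times input data 0.25 + 3j.
result = combine(cross_products(1.5, -2.0, 0.25, 3.0))
```

Here `result` equals the real and imaginary parts of `(1.5 - 2j) * (0.25 + 3j)`, obtained without any complex-typed operation.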
In summary, during computation the embodiment of the present invention first multiplies the equivalent twiddle factors by the input data in the computation of each group and stores all the products in the cache. While the next group performs its multiplications, the products in the cache are read out for addition and subtraction, which avoids the idle cycles of the arithmetic units that data dependences would otherwise cause.
Although the steps in the above embodiments are described in the order given, those skilled in the art will understand that, to achieve the effect of the present embodiment, the steps need not be performed in that order; they may be performed simultaneously (in parallel) or in reverse order, and these simple variations all fall within the protection scope of the present invention.
Based on the same technical concept as the above parallel reading method embodiment, an embodiment of the present invention also provides a mixed-radix DFT/IDFT data parallel reading device. As shown in Fig. 4, the device 40 may comprise a point-count calculation unit 42, a group-number judging unit 44, and a reading unit 46. The point-count calculation unit 42 configures the two-level loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages. The group-number judging unit 44 compares the maximum number of data items that can be read in parallel with the product of the numbers of points of the completed stages. The reading unit 46 calculates the corresponding two-level loop parameters according to the result obtained by the group-number judging unit 44 and reads data in parallel based on the calculated two-level loop parameters.
This embodiment of the mixed-radix DFT/IDFT data parallel reading device configures the two-level loop parameters by calculating the relevant point-count information. For a given processor bit width, it reads data with the maximum degree of parallelism according to the number of points and the computation stage, and the data read are mutually independent, which increases processing parallelism and reduces the number of execution cycles.
On the basis of the above embodiment, the point-count calculation unit 42 may further comprise a configuration module. The configuration module configures the following two-level loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages: the first-level loop has step size N1 and iteration count N0; the second-level loop has step size N2 and iteration count N/N2, where N is the number of DFT points, N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, and N2 is the product of N1 and N0.
On the basis of the embodiment shown in Fig. 4, the reading unit 46 may further comprise a first calculation module and a first reading module. When M is less than or equal to N1, the first calculation module leaves the read twiddle factors unprocessed and calculates the following two-level loop parameters:
The first-level loop has step size M and a repetition count of [formula omitted]; the second-level loop has step size N2 and a repetition count of N/N2, where N is the number of DFT points, M denotes the maximum number of data items that the processor can read in parallel, N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, and N2 is the product of N1 and N0. The first reading module reads data in parallel according to these two-level loop parameters, reading M data items each time until all N1 data items have been read.
On the basis of the embodiment shown in Fig. 4, the reading unit 46 may further comprise a second calculation module, a replication module, and a second reading module. When M is greater than N1, the second calculation module calculates the value of [formula omitted]. The replication module replicates the read twiddle factors into [formula omitted] copies. The second reading module reads the first [formula omitted] groups of data in parallel with step size N2 according to the following two-level loop parameters: the first-level loop has step size [formula omitted] and iteration count N0; the second-level loop has step size [formula omitted] and iteration count [formula omitted], where M denotes the maximum number of data items that the processor can read in parallel, N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, and N2 is the product of N1 and N0.
For further details of this parallel reading device embodiment, refer to the description of the corresponding parallel reading method embodiment; they are not repeated here.
It should be noted that the data reading performed by the mixed-radix DFT/IDFT data parallel reading device of the above embodiment is illustrated only with the above division into functional modules. In practical applications, the functions may be assigned to different functional modules as required; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above.
In addition, an embodiment of the present invention also provides a mixed-radix DFT/IDFT parallel computation device based on the above parallel reading device embodiment. This parallel computation device can carry out the above parallel computation method embodiment. As shown in Fig. 5, the device 50 may comprise an equivalent twiddle factor calculation unit 52, a cache unit 54, and a data processing unit 56. The equivalent twiddle factor calculation unit 52 reads the input twiddle factors and the output twiddle factors in parallel, multiplies their corresponding terms, and takes the products together with the input twiddle factors as the equivalent twiddle factors. The cache unit 54 multiplies the equivalent twiddle factors obtained by the equivalent twiddle factor calculation unit 52 by the input data and caches the products. In the second-level loop, while the cache unit 54 is performing multiplications, the data processing unit 56 reads out the results cached in the cache unit 54 and performs the corresponding addition or subtraction.
This embodiment of the mixed-radix DFT/IDFT parallel computation device processes the twiddle factors first and separates the multiplications from the additions and subtractions. This reduces the dependences between data, so that the computation as a whole has fewer idle cycles, pipeline utilization improves, and the speed of mixed-radix DFT and IDFT computation is effectively increased.
On the basis of the above embodiment, the equivalent twiddle factor calculation unit 52 may further comprise a parallel read-in module and a cache module. The parallel read-in module reads in the input twiddle factors and the output twiddle factors in parallel. The cache module multiplies the corresponding terms of the input twiddle factors and the output twiddle factors to obtain the first and second groups of equivalent twiddle factors, and stores the first and second groups of equivalent twiddle factors in the cache together with the input twiddle factors, which serve as the third group of equivalent twiddle factors.
On the basis of the embodiment shown in Fig. 5, the data processing unit may also comprise a complex arithmetic unit, which reads out the results cached in the cache unit and performs the corresponding addition.
For further details of this parallel computation device embodiment, refer to the description of the corresponding parallel computation method embodiment; they are not repeated here.
It should be noted that the parallel computation performed by the mixed-radix DFT/IDFT parallel computation device of the above embodiment is illustrated only with the above division into functional modules. In practical applications, the functions may be assigned to different functional modules as required; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above.
Those skilled in the art will understand that the above mixed-radix DFT/IDFT data parallel reading device and mixed-radix DFT/IDFT parallel computation device also include other well-known structures, such as processors, controllers, and memories. The memories include, but are not limited to, random access memory, flash memory, read-only memory, programmable read-only memory, volatile memory, non-volatile memory, serial memory, parallel memory, and registers. The processors include, but are not limited to, CPLD/FPGA, DSP, ARM processors, and MIPS processors. To avoid unnecessarily obscuring the embodiments of the present disclosure, these well-known structures are not shown in Figs. 4-5.
It should be understood that the number of each module in Figs. 4-5 is only illustrative. Any number of each module may be provided according to actual needs.
The above device embodiments can be used to carry out the corresponding method embodiments; their technical principles, the technical problems they solve, and the technical effects they produce are similar. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the devices described above and the related explanations can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
It should be pointed out that, although the device embodiments and the method embodiments of the present invention are described separately above, details described for one embodiment also apply to another. The names of the modules and steps in the embodiments of the present invention serve only to distinguish the modules or steps and are not to be regarded as an improper limitation of the present invention. Those skilled in the art will understand that the modules or steps in the embodiments of the present invention may also be split or combined. For example, the modules of the above embodiments may be merged into one module or further split into multiple submodules.
The technical solutions provided by the embodiments of the present invention have been described in detail above. Although specific examples are used herein to set out the principles and implementations of the present invention, the description of the above embodiments is intended only to help in understanding the principles of the embodiments. Those skilled in the art may, based on the embodiments of the present invention, make changes to the specific implementations and the scope of application.
It should be noted that the flowcharts and block diagrams referred to herein are not limited to the forms shown; they may also be divided and/or combined in other ways.
It should also be noted that the reference signs and text in the accompanying drawings serve only to illustrate the present invention more clearly and are not to be regarded as an improper limitation of its scope.
It should further be noted that the terms "first", "second", and the like in the description, the claims, and the accompanying drawings of the present invention are used to distinguish similar objects, not to describe or indicate a specific order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein.
The terms "include", "comprise", and any similar terms are intended to cover a non-exclusive inclusion, so that a process, method, article, or device/apparatus that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device/apparatus.
As used herein, the terms "module" and "unit" may refer to software objects or routines executed on a computing system. The different modules described herein may be implemented as objects or processes executing on the computing system (for example, as separate threads). Although the systems and methods described herein are preferably implemented in software, implementations in hardware or in a combination of software and hardware are also possible and contemplated.
Each step of the present invention can be implemented with general-purpose computing devices. For example, the steps may be concentrated on a single computing device, such as a personal computer, server computer, handheld or portable device, tablet device, or multiprocessor device, or distributed over a network formed by multiple computing devices. The steps shown or described may be performed in an order different from that given herein, or they may each be fabricated as a separate integrated circuit module, or several of the modules or steps may be fabricated as a single integrated circuit module. The present invention is therefore not limited to any specific hardware, software, or combination thereof.
The method provided by the present invention can be implemented with a programmable logic device, or as computer program software or program modules (including routines, programs, objects, components, or data structures that perform particular tasks or implement particular abstract data types). For example, an embodiment of the present invention may be a computer program product which, when run, causes a computer to perform the method described. The computer program product comprises a computer-readable storage medium containing computer program logic or code portions for implementing the method. The computer-readable storage medium may be a built-in medium installed in the computer or a removable medium that can be detached from the computer body (for example, a storage device using hot-plug technology). The built-in medium includes, but is not limited to, rewritable non-volatile memory, for example RAM, ROM, flash memory, and hard disks. The removable medium includes, but is not limited to, optical storage media (for example CD-ROM and DVD), magneto-optical storage media (for example MO), magnetic storage media (for example tape or portable hard drives), media with built-in rewritable non-volatile memory (for example memory cards), and media with built-in ROM (for example ROM cartridges).
The present invention is not limited to the above embodiments. Any variation, improvement, or substitution that a person of ordinary skill in the art can conceive of without departing from the substance of the present invention falls within the protection scope of the present invention.
Claims (15)
1. A mixed-radix DFT/IDFT data parallel reading method, characterized in that the method at least comprises:
configuring two-level loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages;
comparing the maximum number of data items that can be read in parallel with the product of the numbers of points of the completed stages;
calculating the corresponding two-level loop parameters according to the comparison result, and reading data in parallel based on the calculated two-level loop parameters.
2. The method according to claim 1, characterized in that configuring the two-level loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages specifically comprises:
configuring the following two-level loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages: the first-level loop has step size N1 and iteration count N0; the second-level loop has step size N2 and iteration count N/N2, where N is the number of DFT points, N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, and N2 is the product of N1 and N0.
3. The method according to claim 1, characterized in that calculating the corresponding two-level loop parameters according to the comparison result and reading data in parallel based on the calculated two-level loop parameters specifically comprises:
when M is less than or equal to N1, leaving the read twiddle factors unprocessed and calculating the following two-level loop parameters:
the first-level loop has step size M and iteration count [formula omitted]; the second-level loop has step size N2 and iteration count N/N2, where N is the number of DFT points, M denotes the maximum number of data items that the processor can read in parallel, N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, and N2 is the product of N1 and N0;
reading the data in parallel according to the above two-level loop parameters, reading M data items each time until all N1 data items have been read.
4. The method according to claim 1, characterized in that calculating the corresponding two-level loop parameters according to the comparison result and reading data in parallel based on the calculated two-level loop parameters further specifically comprises:
when M is greater than N1, calculating the value of [formula omitted];
replicating the read twiddle factors into [formula omitted] copies;
reading the first [formula omitted] groups of data in parallel with step size N2 according to the following two-level loop parameters: the first-level loop has step size [formula omitted] and iteration count N0; the second-level loop has step size [formula omitted] and iteration count [formula omitted], where M denotes the maximum number of data items that the processor can read in parallel, N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, and N2 is the product of N1 and N0.
5. A mixed-radix DFT/IDFT parallel computation method based on the method of any one of claims 1-4, characterized in that the parallel computation method at least comprises:
Step 1: reading the input twiddle factors and the output twiddle factors in parallel, multiplying their corresponding terms, and taking the products together with the input twiddle factors as the equivalent twiddle factors;
Step 2: multiplying the equivalent twiddle factors by the input data and caching the products;
Step 3: in the second-level loop, while the multiplications of Step 2 are being performed, reading out the results cached by Step 2 and performing the corresponding addition or subtraction.
6. The parallel computation method according to claim 5, characterized in that multiplying the equivalent twiddle factors by the input data and caching the products specifically comprises:
when the processor has no complex arithmetic unit, caching the results of cross-multiplying the equivalent twiddle factors with the real and imaginary parts of the input data.
7. The parallel computation method according to claim 5, characterized in that Step 3 specifically comprises:
when the processor has a complex arithmetic unit, while the multiplications of Step 2 are being performed, reading out the results cached by Step 2 and performing the corresponding addition.
8. The parallel computation method according to claim 5, characterized in that Step 3 further specifically comprises:
when the processor has no complex arithmetic unit, while the multiplications of Step 2 are being performed, reading out the results cached by Step 2 and performing the following subtraction:
subtracting the product of the imaginary parts of the equivalent twiddle factor and the input data from the product of the real parts of the equivalent twiddle factor and the input data.
9. A mixed-radix DFT/IDFT data parallel reading device, characterized in that the parallel reading device at least comprises:
a point-count calculation unit, configured to configure two-level loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages;
a group-number judging unit, configured to compare the maximum number of data items that can be read in parallel with the product of the numbers of points of the completed stages;
a reading unit, configured to calculate the corresponding two-level loop parameters according to the result obtained by the group-number judging unit and to read data in parallel based on the calculated two-level loop parameters.
10. The parallel reading device according to claim 9, characterized in that the point-count calculation unit specifically comprises:
a configuration module, configured to configure the following two-level loop parameters according to the number of points of the stage to be computed and the product of the numbers of points of the completed stages: the first-level loop has step size N1 and iteration count N0; the second-level loop has step size N2 and iteration count N/N2, where N is the number of DFT points, N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, and N2 is the product of N1 and N0.
11. The parallel reading device according to claim 9, characterized in that the reading unit specifically comprises:
a first calculation module, configured to, when M is less than or equal to N1, leave the read twiddle factors unprocessed and calculate the following two-level loop parameters:
the first-level loop has step size M and repetition count [formula omitted]; the second-level loop has step size N2 and repetition count N/N2, where N is the number of DFT points, M denotes the maximum number of data items that the processor can read in parallel, N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, and N2 is the product of N1 and N0;
a first reading module, configured to read the data in parallel according to the above two-level loop parameters, reading M data items each time until all N1 data items have been read.
12. The parallel reading device according to claim 9, characterized in that the reading unit further specifically comprises:
a second calculation module, configured to calculate the value of [formula omitted] when M is greater than N1;
a replication module, configured to replicate the read twiddle factors into [formula omitted] copies;
a second reading module, configured to read the first [formula omitted] groups of data in parallel with step size N2 according to the following two-level loop parameters:
the first-level loop has step size [formula omitted] and iteration count N0; the second-level loop has step size [formula omitted] and iteration count [formula omitted], where M denotes the maximum number of data items that the processor can read in parallel, N0 denotes the number of points of the stage to be computed, N1 denotes the product of the numbers of points of the completed stages, and N2 is the product of N1 and N0.
13. A mixed-radix DFT/IDFT parallel computation device based on the parallel reading device of any one of claims 9-12, characterized in that the parallel computation device at least comprises:
an equivalent twiddle factor calculation unit, configured to read the input twiddle factors and the output twiddle factors in parallel, multiply their corresponding terms, and take the products together with the input twiddle factors as the equivalent twiddle factors;
a cache unit, configured to multiply the equivalent twiddle factors obtained by the equivalent twiddle factor calculation unit by the input data and to cache the products;
a data processing unit, configured to, in the second-level loop, while the cache unit is performing multiplications, read out the results cached in the cache unit and perform the corresponding addition or subtraction.
14. The parallel computation device according to claim 13, characterized in that the equivalent twiddle factor calculation unit specifically comprises:
a parallel read-in module, configured to read in the input twiddle factors and the output twiddle factors in parallel;
a cache module, configured to multiply the corresponding terms of the input twiddle factors and the output twiddle factors to obtain the first and second groups of equivalent twiddle factors, and to store the first and second groups of equivalent twiddle factors in the cache together with the input twiddle factors, which serve as the third group of equivalent twiddle factors.
15. The parallel computation device according to claim 13, characterized in that the data processing unit further comprises:
a complex arithmetic unit, configured to read out the results cached in the cache unit and perform the corresponding addition.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610596528.9A CN106201999B (en) | 2016-07-26 | 2016-07-26 | Mixed-radix DFT/IDFT parallel reading and computing methods and devices |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106201999A (en) | 2016-12-07 |
CN106201999B (en) | 2018-11-27 |
Family
ID=57495233
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610596528.9A Active CN106201999B (en) | 2016-07-26 | 2016-07-26 | Mixed-radix DFT/IDFT parallel reading and computing methods and devices |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106201999B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103544111A (en) * | 2013-10-08 | 2014-01-29 | 北京理工大学 | Mixed base FFT method based on real-time processing |
WO2014108718A1 (en) * | 2013-01-09 | 2014-07-17 | Intel Corporation | Continuous-flow conflict-free mixed-radix fast fourier transform in multi-bank memory |
Non-Patent Citations (2)
Title |
---|
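Jienan Chen et al.: "Hardware Efficient Mixed Radix-25/16/9 FFT for LTE Systems", IEEE Transactions on Very Large Scale Integration (VLSI) Systems * |
Zhang Dongdong: "FPGA implementation of a mixed-radix DFT algorithm for LTE", China Excellent Master's Theses Full-text Database, Information Science and Technology Series * |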
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018018412A1 (en) * | 2016-07-26 | 2018-02-01 | Institute of Automation, Chinese Academy of Sciences | Mixed-radix DFT/IDFT parallel reading and computing methods and devices |
US10698973B2 (en) | 2016-07-26 | 2020-06-30 | Institute Of Automation, Chinese Academy Of Sciences | Method and apparatus for concurrent reading and calculation of mixed radix DFT/IDFT |
Also Published As
Publication number | Publication date |
---|---|
CN106201999B (en) | 2018-11-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Qiu et al. | Data transfer minimization for financial derivative pricing using Monte Carlo simulation with GPU in 5G | |
Ma et al. | Performance modeling for CNN inference accelerators on FPGA | |
Sklyarov et al. | High-performance implementation of regular and easily scalable sorting networks on an FPGA | |
Kułaga et al. | FPGA implementation of decision trees and tree ensembles for character recognition in Vivado HLS | |
WO2018027706A1 (en) | FFT processor and algorithm | |
Mu et al. | Scalable and conflict-free NTT hardware accelerator design: Methodology, proof, and implementation | |
Thong et al. | FPGA acceleration of enhanced Boolean constraint propagation for SAT solvers | |
Aminian et al. | FPGA-based circuit model emulation of quantum algorithms | |
CN106933777B | High-performance implementation method of a radix-2 one-dimensional FFT based on the domestic Sunway 26010 processor | |
Wu et al. | High-performance architecture for the conjugate gradient solver on FPGAs | |
CN106201999A (en) | Mixed-radix DFT/IDFT parallel reading and computing method and device | |
Kang et al. | FlexKA: A Flexible Karatsuba Multiplier Hardware Architecture for Variable-Sized Large Integers | |
Angizi et al. | Processing-in-memory acceleration of MAC-based applications using residue number system: A comparative study | |
CN105893326B | Device and method for implementing a 65536-point FFT based on FPGA | |
Su et al. | Parallel direct simulation Monte Carlo computation using CUDA on GPUs | |
Valencia et al. | Compact and high‐throughput parameterisable architectures for memory‐based FFT algorithms | |
Sklyarov et al. | FPGA-based accelerators for parallel data sort | |
Gunasekaran et al. | FPGA Based Implementation of Brent Kung Parallel Prefix Adder | |
More et al. | FPGA implementation of FFT processor using vedic algorithm | |
Princy et al. | Performance analysis of FFT algorithm | |
Kannan et al. | FPGA implementation of FFT architecture using modified Radix-4 algorithm | |
Chen | Optimization Implementation of SM3 Algorithm Based on 64 Rounds Grading Calculation | |
Tripathy et al. | A reconfigurable computing architecture for semantic information filtering | |
Koca | An FPGA based approach for Černý conjecture falsification | |
Zhou et al. | Efficient implementation of FDFM approach for euclidean algorithms on the FPGA |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||