CN109101347A - Pulse compression processing method for an FPGA heterogeneous computing platform based on OpenCL - Google Patents
Pulse compression processing method for an FPGA heterogeneous computing platform based on OpenCL
- Publication number
- CN109101347A (application number CN201810778029.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- buf
- kernel
- ifft
- local
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/544—Buffers; Shared memory; Pipes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Ultrasonic Diagnosis Equipment (AREA)
Abstract
The invention discloses a pulse compression processing method for an FPGA heterogeneous computing platform based on OpenCL. A first array local_buf_1 and a second array local_buf_2, each of length N, are defined in the inverse Fourier transform (IFFT) kernel. Specifically, M groups of echo data (PRT) pass in sequence through the FFT kernel and the conjugate multiplication kernel to produce conjugate multiplication result data; each PRT group contains N sampling points. When m is odd, the conjugate multiplication result data of the m-th PRT group is written sequentially into local_buf_1; at the same time, the data in local_buf_2 is divided evenly into eight segments, data is fetched from each segment in bit-reversed incrementing order, the IFFT is computed, and the IFFT result data is output. Once the conjugate multiplication result data of the m-th PRT group has been written completely into local_buf_1, the conjugate multiplication result data of the (m+1)-th PRT group is written sequentially into local_buf_2; at the same time, the data in local_buf_1 is divided evenly into eight segments, data is fetched from each segment in bit-reversed incrementing order, the IFFT is computed, and the IFFT result data is output. The pulse compression results of all M PRT groups are finally obtained.
Description
Technical field
The present invention relates to the fields of signal processing and parallel computing, and in particular to a pulse compression processing method for an FPGA heterogeneous computing platform based on OpenCL.
Background art
The development of modern radar signal processing places ever more stringent demands on processor performance. However, as Moore's law runs into a bottleneck, the computing capability of general-purpose processors is increasingly unable to meet practical application needs. Heterogeneous computing platforms can fully exploit the strengths of different processor types to accelerate tasks, and they show advantages that conventional architectures lack in computing performance, energy efficiency, and real-time behavior. The unique internal structure of an FPGA gives it strong parallel computing capability and low power consumption, so combining an FPGA with a CPU into a heterogeneous processing platform can effectively raise system computing performance. OpenCL is a cross-platform parallel programming model based on C/C++ designed specifically for heterogeneous computing platforms, and it is the industry's first such industrial standard. As a cross-platform development language, OpenCL offers a new way to develop for FPGAs: the development cycle is short, the abstraction level is high, and portability is strong, which makes up for the shortcomings of traditional development approaches. OpenCL-based FPGA heterogeneous computing platforms have become a research focus in both academia and industry.
Pulse compression is widely used in radar signal processing. For a radar system, pulse width is directly proportional to detection range and inversely proportional to range resolution. Pulse compression makes it possible to achieve a long detection range together with high range resolution. It applies matched filtering to the echo of the transmitted pulse, compressing the echo into a narrow pulse and thereby improving the signal-to-noise ratio and range resolution of the received signal. As shown in Fig. 1, the existing pulse compression algorithm consists of three processing steps: (1) FFT, (2) conjugate multiplication, and (3) IFFT. The three steps have a clear producer-consumer relationship: the output of each step is the input of the next.
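The three-step flow can be illustrated with a small software model. The following is only a minimal sketch, not the FPGA implementation: a naive DFT stands in for the FFT and IFFT engines, and the chirp waveform, the window length of 64, and the delay of 20 samples are hypothetical test values chosen for illustration.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; stands in for the FFT/IFFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def pulse_compress(echo, reference):
    """FFT -> conjugate multiplication -> IFFT, following the three steps."""
    echo_f = dft(echo)                       # step 1: FFT of the echo
    ref_f = dft(reference)                   # FFT of the reference signal
    prod = [e * r.conjugate() for e, r in zip(echo_f, ref_f)]  # step 2
    return dft(prod, inverse=True)           # step 3: IFFT


# Hypothetical test signal: a 16-sample linear chirp in a 64-sample window.
N, PULSE_LEN, DELAY = 64, 16, 20
chirp = [cmath.exp(1j * math.pi * t * t / PULSE_LEN) for t in range(PULSE_LEN)]
reference = chirp + [0j] * (N - PULSE_LEN)
echo = [0j] * DELAY + chirp + [0j] * (N - PULSE_LEN - DELAY)

compressed = pulse_compress(echo, reference)
peak_index = max(range(N), key=lambda n: abs(compressed[n]))
print(peak_index)  # index of the compressed narrow pulse
```

In this model the 16-sample echo collapses to a single peak located at the echo delay, which is the narrow pulse that the matched filtering is meant to produce.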
When pulse compression is implemented on an FPGA with OpenCL, three kernels must be mapped onto the FPGA, one for each of the three processing steps of the algorithm. As shown in Fig. 2, Global Memory is the DDR chip outside the FPGA, and a kernel can exchange data with this external DDR chip. In the typical OpenCL model, multiple kernels exchange data through global memory, and the host schedules the data. The three kernels of the pulse compression algorithm therefore run entirely in series, and data scheduling introduces considerable processing latency, so this operating mode cannot fully exploit the parallel computing capability of the FPGA to reach the best processing performance. As shown in Fig. 3, Intel FPGA extends the typical OpenCL model with kernel channels (kernel pipes), an inter-kernel communication mechanism that lets different kernels exchange data directly through a kernel channel, without going through global memory and without host involvement in data scheduling. For processing steps that have a producer-consumer relationship with one another, kernel channels can therefore be used to optimize the kernels, enabling pipelined parallel processing and improving performance.
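The producer-consumer streaming that kernel channels make possible can be pictured with chained Python generators. This is only an analogy under stated assumptions: generators run sequentially rather than in parallel hardware, the stage bodies are placeholders that do no signal processing, and the function names are hypothetical. The sketch only shows that a downstream stage starts on a chunk of 8 points before the upstream stage has produced the next one.

```python
def producer(n_chunks, trace):
    """Stands in for the FFT kernel: emits one 8-point chunk per step."""
    for c in range(n_chunks):
        trace.append(("fft", c))
        yield [c * 8 + k for k in range(8)]


def middle(chunks, trace):
    """Stands in for the conjugate multiplication kernel."""
    for c, chunk in enumerate(chunks):
        trace.append(("conj_mul", c))
        yield chunk  # placeholder: the real kernel would multiply here


def consumer(chunks, trace):
    """Stands in for the IFFT kernel: drains the upstream channel."""
    received = []
    for c, chunk in enumerate(chunks):
        trace.append(("ifft", c))
        received.extend(chunk)
    return received


trace = []
received = consumer(middle(producer(3, trace), trace), trace)
for event in trace:
    print(event)
```

The recorded trace interleaves the three stages chunk by chunk instead of finishing one stage before the next begins, which is the behavior that exchanging data through global memory under host scheduling cannot provide.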
For pulse compression, parallelism can be extracted in two ways. (1) The PRT groups (echoes) are independent of one another, so their data can be processed in parallel. (2) Each PRT group can be processed with the FFT/IFFT kernel example provided by Intel, which implements in OpenCL a radix-4 FFT engine that outputs 8 data points per clock cycle and supports different FFT sizes through a parameter change. Because the three kernels in the pulse compression flow have a clear producer-consumer relationship, pipelined parallel processing can be carried out in units of 8 data points.
The implementation faces one technical difficulty, however. The FFT/IFFT kernel provided by Intel takes its input in eight-segment incrementing order and produces its output in bit-reversed order, so kernel channels cannot be used directly to pipeline the FFT, conjugate multiplication, and IFFT kernels. After the 8 data points output by the FFT have passed through conjugate multiplication, their order must be adjusted before IFFT processing can take place.
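The mismatch between the two orders can be made concrete with a short sketch. It assumes that the eight-segment input takes one point per segment in the segment order 0, 4, 2, 6, 1, 5, 3, 7, which is the order the description later gives for the stored data; the function names are hypothetical and N = 32 is an arbitrary small example.

```python
def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def engine_input_order(n):
    """Indices consumed per step: one point from each of eight segments."""
    seg = n // 8
    starts = [0, 4, 2, 6, 1, 5, 3, 7]  # assumed segment order (see lead-in)
    return [[i + s * seg for s in starts] for i in range(seg)]


def engine_output_order(n):
    """Natural index of each output sample when results leave bit-reversed."""
    bits = n.bit_length() - 1  # log2(n) for a power of two
    return [bit_reverse(p, bits) for p in range(n)]


N = 32
out = engine_output_order(N)
print(engine_input_order(N)[1])  # what the IFFT needs at its second step
print(out[8:16])                 # what the FFT delivers as its second chunk
```

For N = 32 the first 8 FFT outputs happen to match the first IFFT input step, but the second chunk of FFT outputs does not match the second IFFT input step, so the stream cannot simply be forwarded through a channel without reordering.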
A method is therefore urgently needed that achieves high-performance pipelined parallel processing in units of 8 data points while keeping the data order correct.
Summary of the invention
In view of this, the present invention provides a pulse compression processing method for an FPGA heterogeneous computing platform based on OpenCL. While keeping the data order correct, it achieves high-performance pipelined parallel processing of 8 data points per clock cycle, greatly improving pulse compression performance.
To achieve the above object, the technical solution of the present invention is as follows. The OpenCL-based FPGA heterogeneous computing platform comprises three kernels mapped onto a field programmable gate array (FPGA) chip: a Fourier transform (FFT) kernel, a conjugate multiplication kernel, and an inverse Fourier transform (IFFT) kernel. Kernel channels establish the inter-kernel data paths between the FFT kernel and the conjugate multiplication kernel, and between the conjugate multiplication kernel and the IFFT kernel. Two arrays, a first array local_buf_1 and a second array local_buf_2, are defined in the IFFT kernel as local caches; the length of each array equals the number of sampling points in one group of echo data PRT.
This method comprises the following steps:
S1. M groups of echo data PRT are input in sequence to the FFT kernel for Fourier transformation. The FFT result data output by the FFT kernel is sent directly through a kernel channel to the conjugate multiplication kernel, which performs conjugate multiplication to obtain conjugate multiplication result data. Every group of echo data PRT contains the same number of sampling points, N.
For the conjugate multiplication result data of the m-th group of echo data PRT, where m is initially 1, execute S2;
S2. When m is odd, the conjugate multiplication result data of the m-th group of echo data PRT is written sequentially into the first array local_buf_1. At the same time, the data in the second array local_buf_2 is divided evenly into eight segments, data is fetched from each segment in bit-reversed incrementing order, the IFFT is computed, and the IFFT result data is output. The data in the second array local_buf_2 is initially invalid.
Once the conjugate multiplication result data of the m-th group of echo data PRT has been written completely into the first array local_buf_1, the conjugate multiplication result data of the (m+1)-th group of echo data PRT is written sequentially into the second array local_buf_2. At the same time, the data in the first array local_buf_1 is divided evenly into eight segments, data is fetched from each segment in bit-reversed incrementing order, the IFFT is computed, and the IFFT result data is output.
S3. Determine whether all M groups of echo data have completed IFFT processing. If so, all IFFT result data output by the IFFT kernel is taken as the pulse compression result for the M groups of echo data PRT.
Otherwise, m is incremented by 2 and the method returns to S2.
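The alternation between the two arrays in S2 can be laid out as a schedule. The sketch below is one possible reading, assuming each phase lasts for one full group of N points and that a final phase drains the last group from its array; the function name and the tuple layout are hypothetical.

```python
def ping_pong_schedule(m_groups):
    """Return (write_group, write_buffer, read_group, read_buffer) per phase.

    Groups are numbered from 1 as in the text. None marks the initial
    read of invalid data and the final drain phase with nothing to write.
    """
    buffers = ("local_buf_1", "local_buf_2")
    schedule = []
    for phase in range(m_groups + 1):
        write_group = phase + 1 if phase < m_groups else None
        read_group = phase if phase >= 1 else None
        write_buf = buffers[phase % 2] if write_group else None
        read_buf = buffers[(phase + 1) % 2]
        schedule.append((write_group, write_buf, read_group, read_buf))
    return schedule


for row in ping_pong_schedule(3):
    print(row)
```

Odd-numbered groups land in local_buf_1 and even-numbered groups in local_buf_2, and in every phase the array being read is the one that was filled in the previous phase, so writing and reading never touch the same array at the same time.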
Further, in S2, dividing the data in the second array local_buf_2 evenly into eight segments and fetching data from each segment in bit-reversed incrementing order for IFFT computation is specifically as follows:
The data items of the (m+1)-th group of echo data PRT stored in the second array local_buf_2 are assigned subscripts in order;
When m ≠ 1, after the data in the second array local_buf_2 is divided evenly into eight segments, the starting subscripts of the segments are 0, 1, 2, 3, 4, 5, 6, and 7. One data item is taken from each segment in turn in bit-reversed incrementing order, so 8 data points are taken each time and N/8 fetches are made in total. The subscripts of the 8 data points taken in the i-th fetch are, in order, 0+(~i), 1+(~i), 2+(~i), 3+(~i), 4+(~i), 5+(~i), 6+(~i), 7+(~i), where i = 1, 2, ..., (N/8-1) and (~i) is the result of reversing the binary bits of i over log2(N) bits.
When m = 1, the data in the second array local_buf_2 is the initial invalid data, and the invalid data is not processed.
Further, in S2, dividing the data in the first array local_buf_1 evenly into eight segments, fetching data from each segment in bit-reversed incrementing order for IFFT computation, and outputting the IFFT result data is specifically as follows:
The data items of the m-th group of echo data PRT stored in the first array local_buf_1 are assigned subscripts in order.
After the data in the first array local_buf_1 is divided evenly into eight segments, the starting subscripts of the segments are 0, 1, 2, 3, 4, 5, 6, and 7. One data item is taken from each segment in turn in bit-reversed incrementing order, so 8 data points are taken each time and N/8 fetches are made in total. The subscripts of the 8 data points taken in the i-th fetch are, in order, 0+(~i), 1+(~i), 2+(~i), 3+(~i), 4+(~i), 5+(~i), 6+(~i), 7+(~i), where i = 1, 2, ..., (N/8-1) and (~i) is the result of reversing the binary bits of i over log2(N) bits.
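The subscript rule can be tried out on a small array. The following sketch is illustrative only; it assumes N is a power of two and lets i start at 0 so that exactly N/8 fetches are produced, whereas the text lists i = 1, 2, ..., (N/8-1). The function names are hypothetical.

```python
def bit_reverse(value, bits):
    """Reverse `value` over a fixed width of `bits` binary digits."""
    return int(format(value, f"0{bits}b")[::-1], 2)


def read_subscripts(n):
    """Subscripts of the 8 points taken in each of the N/8 fetches."""
    bits = n.bit_length() - 1          # log2(N) for a power of two
    reads = []
    for i in range(n // 8):
        base = bit_reverse(i, bits)    # (~i): i reversed over log2(N) bits
        reads.append([k + base for k in range(8)])
    return reads


for row in read_subscripts(32):
    print(row)
```

For N = 32 the four fetches start at subscripts 0, 16, 8, and 24, so each fetch reads 8 consecutive array entries and every entry of the array is read exactly once.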
Beneficial effects:
The present invention optimizes the pulse compression algorithm with kernel channels and uses ping-pong buffering to solve the problem that, because the FFT kernel outputs data in bit-reversed order, IFFT processing cannot be applied directly. Across the whole pulse compression flow it achieves high-performance pipelined parallel processing in units of 8 data points, so that the processing times of the FFT, conjugate multiplication, and IFFT stages overlap, greatly shortening the processing time of the pulse compression algorithm.
Brief description of the drawings
Fig. 1 is a flowchart of the existing pulse compression algorithm;
Fig. 2 is a schematic diagram of the operating mode of multiple kernels based on the typical OpenCL model;
Fig. 3 is a schematic diagram of the operating mode of multiple kernels based on kernel channels;
Fig. 4 is a schematic diagram of the structure of the OpenCL-based FPGA heterogeneous computing platform used by the present invention;
Fig. 5 is a flowchart of the pulse compression processing method of the present invention for the OpenCL-based FPGA heterogeneous computing platform;
Fig. 6 is a schematic diagram of the pipelined parallel operating mode of pulse compression based on kernel channels;
Fig. 7 is a schematic diagram of the pulse compression operating mode based on the typical OpenCL model.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings and embodiments.
The embodiment takes pulse compression of M × N granularity as an example: there are M groups of echo data PRT in total, and each group contains N sampling points.
The operating mode of the FFT/IFFT kernel and the order of its input and output data are described in detail below:
The present invention provides a pulse compression processing method for an FPGA heterogeneous computing platform based on OpenCL. As shown in Fig. 4, the platform comprises three kernels mapped onto a field programmable gate array (FPGA) chip: a Fourier transform (FFT) kernel, a conjugate multiplication kernel, and an inverse Fourier transform (IFFT) kernel. Kernel channels establish the inter-kernel data paths between the FFT kernel and the conjugate multiplication kernel, and between the conjugate multiplication kernel and the IFFT kernel. Two arrays, a first array local_buf_1 and a second array local_buf_2, are defined in the IFFT kernel as local caches; the length of each array equals the number of sampling points of one group of echo data PRT.
In the embodiment, the two arrays local_buf_1 and local_buf_2 defined in the IFFT kernel each have length N, equal to the number of sampling points of one group of echo data PRT. Specifically, the two arrays can be declared as local memory, and the compiler maps and implements them in the on-chip RAM of the FPGA.
On the basis of the above OpenCL-based FPGA heterogeneous computing platform, the flow of the pulse compression processing method provided by the present invention is shown in Fig. 5 and comprises the following steps:
S1. M groups of echo data PRT are input in sequence to the FFT kernel for Fourier transformation. The FFT result data output by the FFT kernel is sent directly through a kernel channel to the conjugate multiplication kernel, which performs conjugate multiplication to obtain conjugate multiplication result data. Every group of echo data PRT contains the same number of sampling points, N. In the conjugate multiplication kernel, the FFT result data of the echo data PRT is conjugate-multiplied with the FFT result of the reference signal.
For the conjugate multiplication result data of the m-th group of echo data PRT, where m is initially 1, execute S2.
S2. When m is odd, the conjugate multiplication result data of the m-th group of echo data PRT is written sequentially into the first array local_buf_1. At the same time, the data in the second array local_buf_2 is divided evenly into eight segments, data is fetched from each segment in bit-reversed incrementing order, the IFFT is computed, and the IFFT result data is output. The data in the second array local_buf_2 is initially invalid.
Specifically:
The data items of the (m+1)-th group of echo data PRT stored in the second array local_buf_2 are assigned subscripts in order;
When m ≠ 1, after the data in the second array local_buf_2 is divided evenly into eight segments, the starting subscripts of the segments are 0, 1, 2, 3, 4, 5, 6, and 7. One data item is taken from each segment in turn in bit-reversed incrementing order, so 8 data points are taken each time and N/8 fetches are made in total. The subscripts of the 8 data points taken in the i-th fetch are, in order, 0+(~i), 1+(~i), 2+(~i), 3+(~i), 4+(~i), 5+(~i), 6+(~i), 7+(~i), where i = 1, 2, ..., (N/8-1) and (~i) is the result of reversing the binary bits of i over log2(N) bits.
When m = 1, the data in the second array local_buf_2 is the initial invalid data; in the embodiment of the present invention, the invalid data is not processed.
Once the conjugate multiplication result data of the m-th group of echo data PRT has been written completely into the first array local_buf_1, the conjugate multiplication result data of the (m+1)-th group of echo data PRT is written sequentially into the second array local_buf_2. At the same time, the data in the first array local_buf_1 is divided evenly into eight segments, data is fetched from each segment in bit-reversed incrementing order, the IFFT is computed, and the IFFT result data is output.
Specifically:
The data items of the m-th group of echo data PRT stored in the first array local_buf_1 are assigned subscripts in order;
After the data in the first array local_buf_1 is divided evenly into eight segments, the starting subscripts of the segments are 0, 1, 2, 3, 4, 5, 6, and 7. One data item is taken from each segment in turn in bit-reversed incrementing order, so 8 data points are taken each time and N/8 fetches are made in total. The subscripts of the 8 data points taken in the i-th fetch are, in order, 0+(~i), 1+(~i), 2+(~i), 3+(~i), 4+(~i), 5+(~i), 6+(~i), 7+(~i), where i = 1, 2, ..., (N/8-1) and (~i) is the result of reversing the binary bits of i over log2(N) bits.
In the present invention, the principle of S2 is as follows:
The FFT kernel outputs its results in bit-reversed order, and the data of each PRT group that reaches the IFFT kernel after the FFT and conjugate multiplication kernels is written sequentially into local_buf_1 or local_buf_2. The data in local_buf_1 and local_buf_2 is therefore stored in bit-reversed order. For each PRT group, the data stored at the first 8 consecutive addresses of local_buf_1 or local_buf_2, starting at subscript 0, has the original positions 0, 4 × N/8, 2 × N/8, 6 × N/8, 1 × N/8, 5 × N/8, 3 × N/8, 7 × N/8, which is exactly the order of the first 8 data points that the IFFT engine fetches when it reads the original data in eight-segment incrementing order. In the IFFT kernel used for pulse compression, the data stored in local_buf_1 and local_buf_2 is therefore still divided into eight segments, but the starting subscripts of the segments are 0, 1, 2, 3, 4, 5, 6, 7 rather than 0, 4 × N/8, 2 × N/8, 6 × N/8, 1 × N/8, 5 × N/8, 3 × N/8, 7 × N/8. A for loop then takes the data out of local_buf_1 and local_buf_2 in bit-reversed incrementing order. Each loop iteration takes out the 8 data points with subscripts 0+(~i), 1+(~i), 2+(~i), 3+(~i), 4+(~i), 5+(~i), 6+(~i), 7+(~i), where (~i) is the result of reversing, over log2(N) bits, the sequentially incrementing value 1, 2, 3, ..., (N/8-1) of the subscript within the eight segments of each PRT group.
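This principle can be checked in software. The sketch below models an array filled sequentially with bit-reversed FFT output and confirms that reading subscripts k + (~i) yields the original positions i + s × N/8 for s = 0, 4, 2, 6, 1, 5, 3, 7. It is a verification sketch under stated assumptions: N is a power of two no smaller than 8, i runs from 0, and the function names are hypothetical.

```python
def bit_reverse(value, bits):
    """Reverse `value` over a fixed width of `bits` binary digits."""
    return int(format(value, f"0{bits}b")[::-1], 2)


def check_reorder(n):
    """True if every fetch returns the positions the IFFT engine expects."""
    bits = n.bit_length() - 1
    seg = n // 8
    # Sequential storage of bit-reversed output: subscript p holds the
    # data point whose original position is bit_reverse(p).
    stored_pos = [bit_reverse(p, bits) for p in range(n)]
    for i in range(seg):
        base = bit_reverse(i, bits)                      # (~i)
        fetched = [stored_pos[k + base] for k in range(8)]
        required = [i + s * seg for s in (0, 4, 2, 6, 1, 5, 3, 7)]
        if fetched != required:
            return False
    return True


print(all(check_reorder(n) for n in (8, 16, 64, 1024, 8192)))  # prints True
```

The check passes because reversing k + (~i) over log2(N) bits separates into the 3-bit reversal of k, scaled by N/8, plus i itself, so the 8 consecutive subscripts map onto the eight segments in exactly the required order.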
S3. Determine whether m = M holds. If so, all IFFT result data output by the IFFT kernel is taken as the pulse compression result for the M groups of echo data PRT.
Otherwise, m is incremented by 2 and the method returns to S2.
When pulse compression of M × N granularity is computed, the first N/8 for-loop iterations of the IFFT kernel are needed to cache the data of the first PRT group locally, and during them the IFFT engine computes on invalid data. For the output of the IFFT kernel, a delay of N/8 for-loop iterations must therefore be added on top of the original example, giving a total wait of (N/8 + N/8) iterations before the valid output of all results is obtained in the following M × N/8 iterations. The IFFT kernel thus executes (M × N/8 + N/8 + N/8) for-loop iterations in total. In the first M × N/8 iterations, the integer quotient of the loop index i divided by N/8 gives the group number base of each PRT group, base = 0, 1, 2, ..., M-1, and the remainder of i divided by N/8 gives the offset address offset of the data within each PRT group, offset = 0, 1, 2, ..., N/8-1. The group number base is used to implement ping-pong buffering, and the offset address offset is used to take the data of each PRT group stored in local_buf_1 and local_buf_2 out in eight segments in bit-reversed incrementing order and feed it to the IFFT engine.
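The loop bookkeeping described above can be sketched as follows. The iteration count and the base and offset formulas come from the text; the choice that an even base selects local_buf_1 for writing is an assumption made for illustration, and the function name is hypothetical.

```python
def ifft_loop_plan(m_groups, n):
    """Total iteration count and per-iteration (i, base, offset, write_buf)."""
    seg = n // 8
    total = m_groups * seg + seg + seg    # (M*N/8 + N/8 + N/8) iterations
    plan = []
    for i in range(m_groups * seg):       # iterations that buffer input data
        base = i // seg                   # group number of the incoming PRT
        offset = i % seg                  # position of the chunk in its group
        # Assumed mapping: even base -> local_buf_1, odd base -> local_buf_2.
        write_buf = "local_buf_1" if base % 2 == 0 else "local_buf_2"
        plan.append((i, base, offset, write_buf))
    return total, plan


total, plan = ifft_loop_plan(2, 32)
print(total)      # 16 iterations for M = 2, N = 32
print(plan[5])    # (5, 1, 1, 'local_buf_2')
```

With zero-based indexing the first group has base 0 and goes to local_buf_1, which matches the rule in S2 that odd-numbered groups (counted from 1) are written to the first array.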
With the above method, kernel channels enable high-performance pipelined parallel processing of 8 data points per clock cycle across the whole pulse compression algorithm. The operating mode is shown in Fig. 6, where the red arrows denote data exchange between the kernels and global memory: the FFT kernel reads the raw data from global memory, the conjugate multiplication kernel reads the reference signal from global memory, and the IFFT kernel stores the pulse compression results to global memory. The pulse compression algorithm of 4K × 8K granularity was implemented on a CPU + Arria10 FPGA heterogeneous computing platform, once based on the typical OpenCL model and once with the optimization method proposed by the present invention, and the kernel execution times were measured. The results are shown in Table 1.
Table 1. Test results of the pulse compression algorithm before and after optimization
As the data in Table 1 show, when the pulse compression algorithm is implemented on the typical OpenCL model, the three kernels execute serially; the operating mode is shown in Fig. 7, and the total time is the sum of the processing times of the three kernels. When the algorithm is implemented with the method proposed by the present invention, the FFT, conjugate multiplication, and IFFT kernels work in a pipelined parallel fashion, so the processing times of the three stages overlap, greatly shortening the processing time of the pulse compression algorithm and improving processing performance.
Specifically, for pulse compression of 4K × 8K granularity, the best performance currently achievable on the CPU + Arria10 FPGA heterogeneous computing platform was compared with the result of an eight-core parallel optimized implementation on the DSP C6678. The results are shown in Table 2.
Table 2. Performance comparison of the pulse compression algorithm on different processors
Item | Arria10 FPGA | DSP C6678 |
---|---|---|
Total time (unit: ms) | 42 | 1200 |
As the data in Table 2 show, for pulse compression of 4K × 8K granularity, the Arria10 FPGA achieves a 28.6-fold performance improvement over the DSP C6678.
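The 28.6-fold figure follows directly from the two total times in Table 2, as this short calculation shows (the variable names are illustrative):

```python
arria10_ms = 42     # total time on the Arria10 FPGA, from Table 2
c6678_ms = 1200     # total time on the DSP C6678, from Table 2

speedup = c6678_ms / arria10_ms
print(f"{speedup:.1f}x")  # prints 28.6x
```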
The present invention can therefore achieve high-performance pipelined parallel processing in units of 8 data points, so that the processing times of the FFT, conjugate multiplication, and IFFT stages overlap, greatly shortening the processing time of the pulse compression algorithm.
In summary, the above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (3)
1. A pulse compression processing method for an FPGA heterogeneous computing platform based on OpenCL, characterized in that the OpenCL-based FPGA heterogeneous computing platform comprises three kernels mapped onto a field programmable gate array (FPGA) chip, namely a Fourier transform (FFT) kernel, a conjugate multiplication kernel, and an inverse Fourier transform (IFFT) kernel; kernel channels establish the inter-kernel data paths between the FFT kernel and the conjugate multiplication kernel and between the conjugate multiplication kernel and the IFFT kernel; two arrays, a first array local_buf_1 and a second array local_buf_2, are defined in the IFFT kernel as local caches, the length of each array being equal to the number of sampling points of one group of echo data PRT;
the method comprises the following steps:
S1. M groups of echo data PRT are input in sequence to the FFT kernel for Fourier transformation; the FFT result data output by the FFT kernel is sent directly through the kernel channel to the conjugate multiplication kernel, which performs conjugate multiplication to obtain conjugate multiplication result data; every group of echo data PRT contains the same number of sampling points, N;
for the conjugate multiplication result data of the m-th group of echo data PRT, where m is initially 1, S2 is executed;
S2. when m is odd, the conjugate multiplication result data of the m-th group of echo data PRT is written sequentially into the first array local_buf_1; at the same time, the data in the second array local_buf_2 is divided evenly into eight segments, data is fetched from each segment in bit-reversed incrementing order, the IFFT is computed, and the IFFT result data is output; the data in the second array local_buf_2 is initially invalid;
once the conjugate multiplication result data of the m-th group of echo data PRT has been written completely into the first array local_buf_1, the conjugate multiplication result data of the (m+1)-th group of echo data PRT is written sequentially into the second array local_buf_2; at the same time, the data in the first array local_buf_1 is divided evenly into eight segments, data is fetched from each segment in bit-reversed incrementing order, the IFFT is computed, and the IFFT result data is output;
S3. it is determined whether all M groups of echo data have completed IFFT processing; if so, all IFFT result data output by the IFFT kernel is taken as the pulse compression result for the M groups of echo data PRT;
otherwise, m is incremented by 2 and the method returns to S2.
2. The method according to claim 1, characterized in that, in S2, dividing the data in the second array local_buf_2 evenly into eight segments and fetching data from each segment in bit-reversed incrementing order for IFFT computation is specifically:
the data items of the (m+1)-th group of echo data PRT stored in the second array local_buf_2 are assigned subscripts in order;
when m ≠ 1, after the data in the second array local_buf_2 is divided evenly into eight segments, the starting subscripts of the segments are 0, 1, 2, 3, 4, 5, 6, and 7; one data item is taken from each segment in turn in bit-reversed incrementing order, so 8 data points are taken each time and N/8 fetches are made in total; the subscripts of the 8 data points taken in the i-th fetch are, in order, 0+(~i), 1+(~i), 2+(~i), 3+(~i), 4+(~i), 5+(~i), 6+(~i), 7+(~i), where i = 1, 2, ..., (N/8-1) and (~i) is the result of reversing the binary bits of i over log2(N) bits;
when m = 1, the data in the second array local_buf_2 is the initial invalid data, and the invalid data is not processed.
3. the method as described in claim 1, which is characterized in that in the S2, by number in the first array local_buf_1
According to be equally divided into eight sections and in such a way that binary bits inverted sequence is incremented by from each section access according to carrying out IFFT calculating and defeated
IFFT result data out, specifically:
Subscript is marked in order for each data in the m group echo data PRT stored in the first array local_buf_1;
The eight segments obtained by dividing the data in the first array local_buf_1 equally have starting indices 0, 1, 2, 3, 4, 5, 6 and 7 respectively; one data item is taken from each segment in turn in binary bit-reversed incrementing order, so that 8 data points are taken per fetch and N/8 fetches are made in total; the indices of the 8 data points taken in the i-th fetch are, in order, 0+(~i), 1+(~i), 2+(~i), 3+(~i), 4+(~i), 5+(~i), 6+(~i), 7+(~i), where i = 1, 2, ..., (N/8-1), and (~i) is the result of reversing the binary bits of i using LOG(N) bits.
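The index pattern recited in claims 2 and 3 can be illustrated with a minimal Python sketch. It is not the patented OpenCL kernel: the function names `bit_reverse` and `fetch_indices` are hypothetical, LOG(N) is assumed to mean the base-2 logarithm, and N is assumed to be a power of two no smaller than 8. Under those assumptions (~i) is always a multiple of 8, so each fetch reads 8 consecutive indices, one from each of the eight interleaved segments.

```python
def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the non-negative integer i."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (i & 1)  # shift the lowest bit of i into result
        i >>= 1
    return result


def fetch_indices(n, i):
    """Return the 8 indices k + (~i), k = 0..7, read in the i-th fetch.

    Assumption: n is a power of two (n >= 8) and (~i) is i bit-reversed
    over log2(n) bits, as the claims describe.
    """
    bits = n.bit_length() - 1      # log2(n) when n is a power of two
    base = bit_reverse(i, bits)    # the (~i) term of the claims
    return [k + base for k in range(8)]


if __name__ == "__main__":
    N = 64  # illustrative size only
    for i in range(N // 8):
        print(i, fetch_indices(N, i))
```

The claims list i from 1 to N/8-1; the loop above also includes i = 0, which is an assumption made so that the N/8 fetches together cover every index from 0 to N-1 exactly once.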
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810778029.0A CN109101347B (en) | 2018-07-16 | 2018-07-16 | Pulse compression processing method of FPGA heterogeneous computing platform based on OpenCL |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810778029.0A CN109101347B (en) | 2018-07-16 | 2018-07-16 | Pulse compression processing method of FPGA heterogeneous computing platform based on OpenCL |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109101347A (en) | 2018-12-28 |
CN109101347B (en) | 2021-07-20 |
Family
ID=64846323
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810778029.0A Active CN109101347B (en) | 2018-07-16 | 2018-07-16 | Pulse compression processing method of FPGA heterogeneous computing platform based on OpenCL |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109101347B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112597432A (en) * | 2020-12-28 | 2021-04-02 | 华力智芯(成都)集成电路有限公司 | Method and system for realizing acceleration of complex sequence cross-correlation on FPGA (field programmable Gate array) based on FFT (fast Fourier transform) algorithm |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106095730A (en) * | 2016-06-23 | 2016-11-09 | 中国科学技术大学 | A kind of FFT floating-point optimization method based on ILP and DLP |
CN106484658A (en) * | 2016-09-26 | 2017-03-08 | 西安电子科技大学 | The device and method of 65536 points of pulse compressions is realized based on FPGA |
US20180150644A1 (en) * | 2016-11-29 | 2018-05-31 | Intel Corporation | Technologies for secure encrypted external memory for field-programmable gate arrays (fpgas) |
CN108132467A (en) * | 2017-12-23 | 2018-06-08 | 成都汇蓉国科微系统技术有限公司 | The biradical Forward-looking SAR imaging methods of DSP+FPGA and imaging device based on enhanced ADC |
Non-Patent Citations (1)
Title |
---|
JIACHENG YU et al.: "Realization and Optimization of Pulse Compression Algorithm on OpenCL-Based FPGA Heterogeneous Computing Platform", Signal and Information Processing, Networking and Computers * |
Also Published As
Publication number | Publication date |
---|---|
CN109101347B (en) | 2021-07-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103970718B (en) | Device and method is realized in a kind of fast Fourier transform | |
US6490672B1 (en) | Method for computing a fast fourier transform and associated circuit for addressing a data memory | |
Li et al. | An FPGA design framework for CNN sparsification and acceleration | |
CN201226025Y (en) | Processor for pulse Doppler radar signal | |
CN109613536B (en) | Satellite-borne SAR real-time processing device and method | |
CN109101347A (en) | Pulse compression processing method of FPGA heterogeneous computing platform based on OpenCL | |
CN112446471B (en) | Convolution acceleration method based on heterogeneous many-core processor | |
CN103728616A (en) | Field programmable gate array (FPGA) based inverse synthetic aperture radar (ISAP) imaging parallel envelope alignment method | |
CN106445472B (en) | A kind of character manipulation accelerated method, device, chip, processor | |
Zong-ling et al. | The design of lightweight and multi parallel CNN accelerator based on FPGA | |
CN102129419B (en) | Based on the processor of fast fourier transform | |
CN103838704A (en) | FFT accelerator with high throughput rate | |
US6549925B1 (en) | Circuit for computing a fast fourier transform | |
CN109633640A (en) | A kind of ISAR Processing Algorithm based on to marine origin picture | |
Yang et al. | A efficient design of a real-time FFT architecture based on FPGA | |
KR20010110202A (en) | Two cycle fft | |
CN108008665B (en) | Large-scale circular array real-time beam former based on single-chip FPGA and beam forming calculation method | |
CN110096672A (en) | Inexpensive pipeline-type fft processor implementation method based on FPGA | |
CN108508426B (en) | SAR echo signal generation method based on multi-core DSP and echo simulator | |
CN113203997B (en) | FPGA-based radar super-resolution direction finding method, system and application | |
CN103902506A (en) | FFTW3 optimization method based on loongson 3B processor | |
Bahtat et al. | Efficient implementation of a complete multi-beam radar coherent-processing on a telecom SoC | |
CN110320501A (en) | A kind of radar signal impulse compression method based on GPU | |
Yingxi et al. | Design of the high-powered digital pulse compression real-time processing system based on ADSP-TS203 | |
CN104820581B (en) | A kind of method for parallel processing of FFT and IFFT permutation numbers table |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||