CN109101347A - Pulse compression processing method for an OpenCL-based FPGA heterogeneous computing platform - Google Patents

Pulse compression processing method for an OpenCL-based FPGA heterogeneous computing platform

Info

Publication number
CN109101347A
Authority
CN
China
Prior art keywords
data
buf
kernel
ifft
local
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810778029.0A
Other languages
Chinese (zh)
Other versions
CN109101347B (en)
Inventor
胡善清
于嘉程
王雨薇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN201810778029.0A
Publication of CN109101347A
Application granted
Publication of CN109101347B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/54: Interprogram communication
    • G06F9/544: Buffers; Shared memory; Pipes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005: Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027: Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Ultrasonic Diagnosis Equipment (AREA)

Abstract

The invention discloses a pulse compression processing method for an OpenCL-based FPGA heterogeneous computing platform. A first array local_buf_1 and a second array local_buf_2, each of length N, are defined in the inverse Fourier transform (IFFT) kernel. Specifically, M groups of echo data (PRTs), each containing N sampling points, pass through the FFT kernel and the conjugate multiplication kernel to produce conjugate multiplication result data. When m is odd, the conjugate multiplication result data of the m-th PRT is stored sequentially in local_buf_1; at the same time, the data in local_buf_2 is divided evenly into eight segments, data is fetched from each segment in bit-reversed incremental order for IFFT calculation, and the IFFT result data is output. After the conjugate multiplication result data of the m-th PRT has been completely stored in local_buf_1, the conjugate multiplication result data of the (m+1)-th PRT is stored sequentially in local_buf_2; at the same time, the data in local_buf_1 is divided evenly into eight segments, data is fetched from each segment in bit-reversed incremental order for IFFT calculation, and the IFFT result data is output. The pulse compression results of all M PRTs are finally obtained.

Description

Pulse compression processing method for an OpenCL-based FPGA heterogeneous computing platform
Technical field
The present invention relates to the fields of signal processing and parallel computing, and in particular to a pulse compression processing method for an OpenCL-based FPGA heterogeneous computing platform.
Background
The development of modern radar signal processing places ever higher demands on processor performance. However, because Moore's law has hit a bottleneck, the computing capability of general-purpose processors increasingly fails to meet practical application requirements. Heterogeneous computing platforms can exploit the strengths of different processor types to accelerate tasks, and they show advantages over conventional architectures in computing performance, energy efficiency, and real-time behavior. The internal structure of an FPGA gives it strong parallel computing capability and low power consumption, so a heterogeneous processing platform that combines an FPGA with a CPU can effectively raise system computing performance. OpenCL is a cross-platform parallel programming model based on C/C++ that was designed for heterogeneous computing platforms, and it is the first industry standard of its kind. As a cross-platform development language, OpenCL offers a new development approach for FPGAs: the development cycle is short, the level of abstraction is high, and portability is good, which compensates for the shortcomings of traditional development flows. OpenCL-based FPGA heterogeneous computing platforms have become a research focus in both academia and industry.
Pulse compression is widely used in radar signal processing. For a radar system, the pulse width is directly proportional to the detection range and inversely proportional to the range resolution. Pulse compression makes it possible to achieve a long detection range while retaining high range resolution. The technique applies matched filtering to the echo of the transmitted pulse, compressing the echo into a narrow pulse and thereby improving the signal-to-noise ratio and range resolution of the received signal. As shown in Fig. 1, the existing pulse compression flow consists of three processing steps: (1) FFT, (2) conjugate multiplication, and (3) IFFT. The three steps have an explicit producer-consumer relationship: the output of each step is the input of the next.
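The three-step flow (FFT, conjugate multiplication, IFFT) can be sketched as a small reference model. This is only an illustration under stated assumptions, not the FPGA implementation: a naive O(N^2) DFT stands in for the FFT/IFFT kernels, and the values `N`, `PULSE_LEN`, and `DELAY` as well as the linear-FM reference pulse are hypothetical.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) DFT; a stand-in for the FFT/IFFT kernels."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out

def pulse_compress(echo, reference):
    """FFT -> conjugate multiplication -> IFFT for one PRT."""
    echo_f = dft(echo)                 # step 1: FFT of the echo
    ref_f = dft(reference)             # spectrum of the reference signal
    product = [e * r.conjugate() for e, r in zip(echo_f, ref_f)]  # step 2
    return dft(product, inverse=True)  # step 3: IFFT

# Hypothetical test signal: a 16-sample linear-FM pulse in a 64-sample window.
N, PULSE_LEN, DELAY = 64, 16, 20
reference = [cmath.exp(1j * cmath.pi * t * t / PULSE_LEN) if t < PULSE_LEN else 0j
             for t in range(N)]
# The echo is the reference pulse delayed (circularly) by DELAY samples.
echo = [reference[(t - DELAY) % N] for t in range(N)]

compressed = pulse_compress(echo, reference)
peak_index = max(range(N), key=lambda k: abs(compressed[k]))
print(peak_index)  # position of the compressed pulse
```

The compressed output concentrates the energy of the long pulse at the sample that corresponds to the echo delay, which is the narrow-pulse effect described above.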
When pulse compression is implemented on an FPGA with OpenCL, three kernels must be mapped onto the FPGA, one for each processing step of the pulse compression flow. As shown in Fig. 2, the global memory is the DDR chip outside the FPGA, and the kernels exchange data with it. In the typical OpenCL model, multiple kernels exchange data through global memory, and the host schedules the data. The three pulse compression kernels therefore run fully serially, the data scheduling introduces considerable processing latency, and this operating mode cannot fully exploit the parallel computing capability of the FPGA to reach optimal processing performance. As shown in Fig. 3, Intel FPGA extends the typical OpenCL model with the kernel channel, an inter-kernel communication mechanism that lets different kernels exchange data directly, without going through global memory and without host involvement in data scheduling. For processing steps that have a producer-consumer relationship, kernel channels can therefore be used to optimize the kernels, achieve pipelined parallel processing, and improve processing performance.
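As a loose software analogy for this producer-consumer streaming, the sketch below chains three Python generators so that each 8-point block moves through all stages before the next block is produced. It is not OpenCL and does not use the kernel channel API; the stage names, the `log` list, and the test data are hypothetical, and the first and last stages only forward data instead of computing real transforms.

```python
def fft_stage(blocks, log):
    """Placeholder producer: forwards 8-point blocks (no real FFT here)."""
    for idx, block in enumerate(blocks):
        log.append(("fft", idx))
        yield block

def conj_multiply_stage(stream, reference_blocks, log):
    """Multiplies each incoming block by the conjugate of the reference block."""
    for idx, block in enumerate(stream):
        log.append(("conj", idx))
        yield [x * r.conjugate() for x, r in zip(block, reference_blocks[idx])]

def ifft_stage(stream, log):
    """Placeholder consumer: collects the blocks it receives (no real IFFT here)."""
    results = []
    for idx, block in enumerate(stream):
        log.append(("ifft", idx))
        results.append(block)
    return results

# Hypothetical data: 4 blocks of 8 complex points and a constant reference.
blocks = [[complex(8 * b + k, 0) for k in range(8)] for b in range(4)]
reference_blocks = [[1j] * 8 for _ in range(4)]

log = []
results = ifft_stage(
    conj_multiply_stage(fft_stage(blocks, log), reference_blocks, log), log)
print(log[:6])
```

The log shows block 0 passing through all three stages before block 1 enters the first stage, with no intermediate array holding a whole data set between stages. On the FPGA the stages additionally run concurrently, which a single-threaded generator chain does not model.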
For pulse compression, parallelism can be extracted at two levels: (1) The PRT groups (echoes) are independent of one another, so their data can be processed in parallel. (2) Within each PRT, the FFT/IFFT kernel routine officially provided by Intel can be used. The routine implements, in OpenCL, a radix-4 FFT engine that outputs 8 data points per clock cycle, and FFTs of different sizes can be obtained by changing a parameter. Because the three kernels in the pulse compression flow have an explicit producer-consumer relationship, pipelined parallel processing can be carried out in units of 8 data points.
There is, however, a technical difficulty in the implementation. The FFT/IFFT kernel provided by Intel takes its input in eight-segment incremental order and produces its output in bit-reversed order, so kernel channels cannot be used directly to pipeline the FFT, conjugate multiplication, and IFFT kernels. After the 8 data points output by the FFT have gone through conjugate multiplication, their order must be adjusted before IFFT processing can be performed.
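The mismatch between the two orders can be made concrete with a short sketch. It assumes a power-of-two N and uses the segment order 0, 4, 2, 6, 1, 5, 3, 7 that the description gives for the engine's eight-segment input; `bit_reverse`, `fft_output_order`, and `ifft_input_indices` are hypothetical helper names, and N = 64 is an arbitrary small size.

```python
def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

N = 64
BITS = N.bit_length() - 1  # log2(N) for a power-of-two N

# Bit-reversed output: the a-th value leaving the FFT has natural index bit_reverse(a).
fft_output_order = [bit_reverse(a, BITS) for a in range(N)]

# Eight-segment incremental input: step i needs one point from each N/8-long segment.
SEGMENT_ORDER = [0, 4, 2, 6, 1, 5, 3, 7]

def ifft_input_indices(step):
    return [step + s * N // 8 for s in SEGMENT_ORDER]

print(fft_output_order[:8], ifft_input_indices(0))    # first block happens to match
print(fft_output_order[8:16], ifft_input_indices(1))  # second block does not
```

Only the first block of 8 points arrives in the order the IFFT engine expects; every later block does not, which is why the stream cannot be forwarded unchanged.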
A method is therefore urgently needed that achieves high-performance pipelined parallel processing in units of 8 data points while guaranteeing the correct data order.
Summary of the invention
In view of this, the present invention provides a pulse compression processing method for an OpenCL-based FPGA heterogeneous computing platform. While guaranteeing the correct data order, it achieves high-performance pipelined parallel processing of 8 data points per clock cycle, greatly improving pulse compression performance.
To achieve the above object, the technical solution of the present invention is as follows. The OpenCL-based FPGA heterogeneous computing platform comprises three kernels mapped onto a field programmable gate array (FPGA) chip: a Fourier transform (FFT) kernel, a conjugate multiplication kernel, and an inverse Fourier transform (IFFT) kernel. Kernel channels establish the inter-kernel data paths between the FFT kernel and the conjugate multiplication kernel and between the conjugate multiplication kernel and the IFFT kernel. Two arrays, a first array local_buf_1 and a second array local_buf_2, are defined in the IFFT kernel as local caches; the length of each array equals the number of sampling points in one group of echo data (PRT).
The method comprises the following steps:
S1. M groups of echo data (PRTs) are input sequentially into the FFT kernel for Fourier transformation. The FFT result data output by the FFT kernel is sent directly through a kernel channel to the conjugate multiplication kernel, which performs conjugate multiplication to obtain the conjugate multiplication result data. Every PRT contains the same number of sampling points, N.
For the conjugate multiplication result data of the m-th PRT, where m is initially 1, S2 is executed.
S2. When m is odd, the conjugate multiplication result data of the m-th PRT is stored sequentially in the first array local_buf_1. At the same time, the data in the second array local_buf_2 is divided evenly into eight segments, data is fetched from each segment in bit-reversed incremental order for IFFT calculation, and the IFFT result data is output. The data in local_buf_2 is initially invalid.
After the conjugate multiplication result data of the m-th PRT has been completely stored in the first array local_buf_1, the conjugate multiplication result data of the (m+1)-th PRT is stored sequentially in the second array local_buf_2. At the same time, the data in the first array local_buf_1 is divided evenly into eight segments, data is fetched from each segment in bit-reversed incremental order for IFFT calculation, and the IFFT result data is output.
S3. Determine whether all M groups of echo data have completed IFFT processing. If so, all IFFT result data output by the IFFT kernel is taken as the pulse compression result for the M groups of echo data (PRTs).
Otherwise, m is incremented by 2 and the method returns to S2.
Further, in S2, dividing the data in the second array local_buf_2 evenly into eight segments and fetching data from each segment in bit-reversed incremental order for IFFT calculation is specifically:
The data items of the (m+1)-th PRT stored in the second array local_buf_2 are labeled with subscripts in order;
When m ≠ 1, after the data in the second array local_buf_2 is divided evenly into eight segments, the starting subscripts of the segments are 0, 1, 2, 3, 4, 5, 6, and 7. One data item is taken from each segment in turn in bit-reversed incremental order, so 8 data points are taken per fetch and N/8 fetches are made in total. The subscripts of the 8 data points taken in the i-th fetch are, in order, 0+(~i), 1+(~i), 2+(~i), 3+(~i), 4+(~i), 5+(~i), 6+(~i), 7+(~i), where i = 1, 2, ..., (N/8-1) and (~i) is the result of bit-reversing i over LOG(N) bits.
When m = 1, the data in the second array local_buf_2 is initially invalid, and no processing is performed on the invalid data.
Further, in S2, dividing the data in the first array local_buf_1 evenly into eight segments, fetching data from each segment in bit-reversed incremental order for IFFT calculation, and outputting the IFFT result data is specifically:
The data items of the m-th PRT stored in the first array local_buf_1 are labeled with subscripts in order.
After the data in the first array local_buf_1 is divided evenly into eight segments, the starting subscripts of the segments are 0, 1, 2, 3, 4, 5, 6, and 7. One data item is taken from each segment in turn in bit-reversed incremental order, so 8 data points are taken per fetch and N/8 fetches are made in total. The subscripts of the 8 data points taken in the i-th fetch are, in order, 0+(~i), 1+(~i), 2+(~i), 3+(~i), 4+(~i), 5+(~i), 6+(~i), 7+(~i), where i = 1, 2, ..., (N/8-1) and (~i) is the result of bit-reversing i over LOG(N) bits.
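The subscript rule k+(~i) can be sketched as follows. The sketch reads LOG(N) as log2(N) for a power-of-two N and lets i run from 0 to N/8-1 so that N/8 fetches are made; both readings are assumptions, and `bit_reverse` and `fetch_subscripts` are hypothetical helper names.

```python
def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def fetch_subscripts(i, n):
    """Subscripts 0+(~i) ... 7+(~i) read from the local buffer in fetch i."""
    bits = n.bit_length() - 1      # log2(n), assuming n is a power of two
    rev_i = bit_reverse(i, bits)   # (~i): i bit-reversed over log2(n) bits
    return [k + rev_i for k in range(8)]

N = 64
all_fetches = [fetch_subscripts(i, N) for i in range(N // 8)]
for i, subscripts in enumerate(all_fetches):
    print(i, subscripts)
```

Because i is smaller than N/8, its bit reversal over log2(N) bits is always a multiple of 8, so each fetch reads 8 consecutive addresses and the N/8 fetches together touch every address of the buffer exactly once.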
Beneficial effects:
The present invention optimizes the pulse compression algorithm with kernel channels and uses ping-pong buffering to solve the problem that, during pulse compression, the FFT kernel outputs data in bit-reversed order and IFFT processing therefore cannot be performed directly. Over the whole pulse compression flow, it achieves high-performance pipelined parallel processing in units of 8 data points, so that the processing times of the FFT, conjugate multiplication, and IFFT stages overlap, which greatly shortens the processing time of the pulse compression algorithm.
Brief description of the drawings
Fig. 1 is a flow chart of the existing pulse compression algorithm;
Fig. 2 is a schematic diagram of the operating mode of multiple kernels under the typical OpenCL model;
Fig. 3 is a schematic diagram of the operating mode of multiple kernels using kernel channels;
Fig. 4 is a schematic diagram of the structure of the OpenCL-based FPGA heterogeneous computing platform used by the present invention;
Fig. 5 is a flow chart of the pulse compression processing method of the present invention for the OpenCL-based FPGA heterogeneous computing platform;
Fig. 6 is a schematic diagram of the pipelined parallel operating mode of pulse compression using kernel channels;
Fig. 7 is a schematic diagram of the pulse compression operating mode under the typical OpenCL model.
Detailed description of the embodiments
The present invention will now be described in detail with reference to the accompanying drawings and examples.
The embodiment of the present invention takes pulse compression of M × N granularity as an example, that is, M groups of echo data (PRTs) in total, each containing N sampling points.
The operating mode of the FFT/IFFT kernel and the order of its input and output data are described in detail below:
The present invention provides a pulse compression processing method for an OpenCL-based FPGA heterogeneous computing platform. As shown in Fig. 4, the platform comprises three kernels mapped onto a field programmable gate array (FPGA) chip: a Fourier transform (FFT) kernel, a conjugate multiplication kernel, and an inverse Fourier transform (IFFT) kernel. Kernel channels establish the inter-kernel data paths between the FFT kernel and the conjugate multiplication kernel and between the conjugate multiplication kernel and the IFFT kernel. Two arrays, a first array local_buf_1 and a second array local_buf_2, are defined in the IFFT kernel as local caches; the length of each array equals the number of sampling points in one group of echo data (PRT).
In the embodiment of the present invention, the two arrays defined in the IFFT kernel, the first array local_buf_1 and the second array local_buf_2, each have length N, equal to the number of sampling points in one PRT. Specifically, local_buf_1 and local_buf_2 can be declared as local memory, and the compiler maps and implements the two arrays in the FPGA's on-chip RAM.
On the basis of the above OpenCL-based FPGA heterogeneous computing platform, the flow of the pulse compression processing method provided by the present invention is shown in Fig. 5 and comprises the following steps:
S1. M groups of echo data (PRTs) are input sequentially into the FFT kernel for Fourier transformation. The FFT result data output by the FFT kernel is sent directly through a kernel channel to the conjugate multiplication kernel, which performs conjugate multiplication to obtain the conjugate multiplication result data. Every PRT contains the same number of sampling points, N. In the conjugate multiplication kernel, the FFT result data of the echo data PRT is conjugate-multiplied with the FFT result of the reference signal.
For the conjugate multiplication result data of the m-th PRT, where m is initially 1, S2 is executed.
S2. When m is odd, the conjugate multiplication result data of the m-th PRT is stored sequentially in the first array local_buf_1. At the same time, the data in the second array local_buf_2 is divided evenly into eight segments, data is fetched from each segment in bit-reversed incremental order for IFFT calculation, and the IFFT result data is output. The data in local_buf_2 is initially invalid.
Specifically:
The data items of the (m+1)-th PRT stored in the second array local_buf_2 are labeled with subscripts in order;
When m ≠ 1, after the data in the second array local_buf_2 is divided evenly into eight segments, the starting subscripts of the segments are 0, 1, 2, 3, 4, 5, 6, and 7. One data item is taken from each segment in turn in bit-reversed incremental order, so 8 data points are taken per fetch and N/8 fetches are made in total. The subscripts of the 8 data points taken in the i-th fetch are, in order, 0+(~i), 1+(~i), 2+(~i), 3+(~i), 4+(~i), 5+(~i), 6+(~i), 7+(~i), where i = 1, 2, ..., (N/8-1) and (~i) is the result of bit-reversing i over LOG(N) bits.
When m = 1, the data in the second array local_buf_2 is initially invalid; in the embodiment of the present invention, no processing is performed on the invalid data.
After the conjugate multiplication result data of the m-th PRT has been completely stored in the first array local_buf_1, the conjugate multiplication result data of the (m+1)-th PRT is stored sequentially in the second array local_buf_2. At the same time, the data in the first array local_buf_1 is divided evenly into eight segments, data is fetched from each segment in bit-reversed incremental order for IFFT calculation, and the IFFT result data is output.
Specifically:
The data items of the m-th PRT stored in the first array local_buf_1 are labeled with subscripts in order;
After the data in the first array local_buf_1 is divided evenly into eight segments, the starting subscripts of the segments are 0, 1, 2, 3, 4, 5, 6, and 7. One data item is taken from each segment in turn in bit-reversed incremental order, so 8 data points are taken per fetch and N/8 fetches are made in total. The subscripts of the 8 data points taken in the i-th fetch are, in order, 0+(~i), 1+(~i), 2+(~i), 3+(~i), 4+(~i), 5+(~i), 6+(~i), 7+(~i), where i = 1, 2, ..., (N/8-1) and (~i) is the result of bit-reversing i over LOG(N) bits.
In the present invention, the principle of S2 is as follows:
The FFT kernel outputs its results in bit-reversed order, and the data of each PRT that reaches the IFFT kernel after passing through the FFT and conjugate multiplication kernels is stored sequentially in the arrays local_buf_1 and local_buf_2. The data in local_buf_1 and local_buf_2 is therefore stored in bit-reversed order. For each PRT, the data stored at the first 8 consecutive addresses of local_buf_1 or local_buf_2, starting from subscript 0, has the position order 0, 4×N/8, 2×N/8, 6×N/8, 1×N/8, 5×N/8, 3×N/8, 7×N/8, which is exactly the position order of the first 8 data points that the IFFT engine fetches when it accesses the original data points in eight-segment incremental order. In the IFFT kernel of the pulse compression flow, the data stored in local_buf_1 and local_buf_2 is therefore likewise divided into eight segments, but the starting subscripts of the segments are 0, 1, 2, 3, 4, 5, 6, 7 rather than 0, 4×N/8, 2×N/8, 6×N/8, 1×N/8, 5×N/8, 3×N/8, 7×N/8. Each subsequent for-loop iteration takes data out of local_buf_1 or local_buf_2 in bit-reversed incremental order: the subscripts of the 8 data points taken in each iteration are 0+(~i), 1+(~i), 2+(~i), 3+(~i), 4+(~i), 5+(~i), 6+(~i), 7+(~i), where (~i) is the result of bit-reversing, over LOG(N) bits, the sequentially incrementing subscript value 1, 2, 3, ..., (N/8-1) within each of the eight segments of a PRT.
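This principle can be checked numerically. In the sketch below, the buffer holds at each address the natural index of the sample stored there, so reading the subscripts k+(~i) should return exactly the indices that the eight-segment input order asks for. The helper names and N = 1024 are hypothetical, LOG(N) is read as log2(N), and i runs from 0 to N/8-1 as an assumption.

```python
def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

N = 1024
BITS = N.bit_length() - 1          # log2(N)
SEGMENT_ORDER = [0, 4, 2, 6, 1, 5, 3, 7]

# Sequentially stored bit-reversed stream: address a holds natural index bit_reverse(a).
local_buf = [bit_reverse(a, BITS) for a in range(N)]

def fetched_indices(i):
    """Natural indices obtained by reading subscripts 0+(~i) ... 7+(~i)."""
    rev_i = bit_reverse(i, BITS)
    return [local_buf[k + rev_i] for k in range(8)]

def wanted_indices(i):
    """Natural indices the engine expects in step i of eight-segment input."""
    return [i + s * N // 8 for s in SEGMENT_ORDER]

all_match = all(fetched_indices(i) == wanted_indices(i) for i in range(N // 8))
print(all_match)
```

The match holds because bit-reversing the sum of a multiple of 8 and a 3-bit offset k separates into i plus the 3-bit reversal of k shifted into the top bits, and the 3-bit reversals of 0..7 are 0, 4, 2, 6, 1, 5, 3, 7.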
S3. Determine whether m = M holds. If so, all IFFT result data output by the IFFT kernel is taken as the pulse compression result for the M groups of echo data (PRTs).
Otherwise, m is incremented by 2 and the method returns to S2.
When pulse compression of M × N granularity is computed, the first N/8 for-loop iterations of the IFFT kernel are needed to cache the data of the first PRT locally, and during this time the IFFT engine operates on invalid data. For the output of the IFFT kernel, a delay of N/8 for-loop iterations must therefore be added on top of the delay of the original routine, so (N/8 + N/8) iterations of delay elapse in total before the valid output of all results is obtained in the following M × N/8 iterations. The IFFT kernel therefore executes (M × N/8 + N/8 + N/8) for-loop iterations in total. In the first M × N/8 iterations, integer division of the loop index i by N/8 gives the group number base of each PRT, base = 0, 1, 2, ..., M, and the remainder of i divided by N/8 gives the offset address offset of the data within each PRT, offset = 0, 1, 2, ..., N/8. The group number base is used to implement the ping-pong buffering, and the offset address offset is used to take the data of each PRT stored in local_buf_1 and local_buf_2 out in eight segments in bit-reversed incremental order and feed it to the IFFT engine.
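The loop structure can be modeled in software as follows. This is a sketch under stated assumptions rather than the kernel itself: it models only the N/8 iterations of buffering delay and not the internal latency of the IFFT engine, it skips the invalid first read instead of computing on it, and the names `ping_pong_feed`, `total_iterations`, `bufs`, and the tagged test data are hypothetical.

```python
def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def total_iterations(m, n):
    """Loop count stated in the description: M*N/8 + N/8 + N/8."""
    return m * n // 8 + n // 8 + n // 8

def ping_pong_feed(groups, n):
    """Return the 8-point blocks handed to the IFFT engine, tagged by PRT number."""
    bits = n.bit_length() - 1
    steps = n // 8
    bufs = [[None] * n, [None] * n]       # stand-ins for local_buf_1 and local_buf_2
    fed = []
    for i in range(len(groups) * steps + steps):   # extra N/8 iterations drain the last PRT
        base, offset = divmod(i, steps)   # group number and offset within the PRT
        if base >= 1:                     # read side: previous PRT, bit-reversed address
            read_buf = bufs[(base - 1) % 2]
            addr = bit_reverse(offset, bits)
            fed.append((base - 1, read_buf[addr:addr + 8]))
        if base < len(groups):            # write side: current PRT, sequential address
            write_buf = bufs[base % 2]
            write_buf[8 * offset:8 * offset + 8] = groups[base][8 * offset:8 * offset + 8]
    return fed

# Test data: PRT g arrives in bit-reversed order, each item tagged (g, natural index).
M, N = 3, 64
groups = [[(g, bit_reverse(a, 6)) for a in range(N)] for g in range(M)]
fed = ping_pong_feed(groups, N)
print(fed[0])
print(total_iterations(M, N))
```

In every iteration the read and the write target different buffers, since PRT `base` is written to buffer `base % 2` while PRT `base - 1` is read from the other one, so the sequential store and the bit-reversed fetch never collide.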
With the above method, kernel channels enable high-performance pipelined parallel processing of 8 data points per clock cycle over the entire pulse compression flow. The operating mode is shown in Fig. 6, where the red arrows denote data exchange between the kernels and global memory: the FFT kernel reads the raw data from global memory, the conjugate multiplication kernel reads the reference signal from global memory, and the IFFT kernel stores the pulse compression results in global memory. The pulse compression algorithm of 4K × 8K granularity was implemented on a CPU + Arria10 FPGA heterogeneous computing platform both with the typical OpenCL model and with the optimization method proposed by the present invention, and the kernel execution times were measured; the results are shown in Table 1.
Table 1. Test results of the pulse compression algorithm before and after the improvement
As the data in Table 1 shows, when the pulse compression algorithm is implemented with the typical OpenCL model, the three kernels execute serially, as shown in Fig. 7, and the total time is the sum of the three kernels' individual processing times. When the method proposed by the present invention is used, the FFT, conjugate multiplication, and IFFT kernels work in a pipelined parallel manner, so that the processing times of the three stages overlap, which greatly shortens the processing time of the pulse compression algorithm and improves processing performance.
Specifically, for pulse compression of 4K × 8K granularity, the best performance currently achievable on the CPU + Arria10 FPGA heterogeneous computing platform was compared with the result of an eight-core parallel optimized implementation on the DSP C6678; the results are shown in Table 2.
Table 2. Performance comparison of the pulse compression algorithm on different processors
                         Arria10 FPGA    DSP C6678
Total time (unit: ms)    42              1200
As the data in Table 2 shows, for pulse compression of 4K × 8K granularity, the Arria10 FPGA achieves a 28.6-fold performance improvement over the DSP C6678.
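The stated factor follows directly from the two total times in Table 2, as this short check shows (the variable names are hypothetical):

```python
arria10_total_ms = 42     # Table 2, Arria10 FPGA
c6678_total_ms = 1200     # Table 2, DSP C6678

speedup = c6678_total_ms / arria10_total_ms
print(round(speedup, 1))
```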
The present invention can therefore achieve high-performance pipelined parallel processing in units of 8 data points, so that the processing times of the FFT, conjugate multiplication, and IFFT stages overlap, which greatly shortens the processing time of the pulse compression algorithm.
In summary, the above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (3)

1. A pulse compression processing method for an OpenCL-based FPGA heterogeneous computing platform, characterized in that the OpenCL-based FPGA heterogeneous computing platform comprises three kernels mapped onto a field programmable gate array (FPGA) chip, namely a Fourier transform (FFT) kernel, a conjugate multiplication kernel, and an inverse Fourier transform (IFFT) kernel, and kernel channels establish the inter-kernel data paths between the FFT kernel and the conjugate multiplication kernel and between the conjugate multiplication kernel and the IFFT kernel; two arrays, a first array local_buf_1 and a second array local_buf_2, are defined in the IFFT kernel as local caches, wherein the array length of each of the first array local_buf_1 and the second array local_buf_2 equals the number of sampling points in one group of echo data (PRT);
The method comprises the following steps:
S1. M groups of echo data (PRTs) are input sequentially into the FFT kernel for Fourier transformation, and the FFT result data output by the FFT kernel is sent directly through the kernel channel to the conjugate multiplication kernel, which performs conjugate multiplication to obtain conjugate multiplication result data; every PRT contains the same number of sampling points, N;
For the conjugate multiplication result data of the m-th PRT, where m is initially 1, S2 is executed;
S2. When m is odd, the conjugate multiplication result data of the m-th PRT is stored sequentially in the first array local_buf_1; at the same time, the data in the second array local_buf_2 is divided evenly into eight segments, data is fetched from each segment in bit-reversed incremental order for IFFT calculation, and the IFFT result data is output; wherein the data in the second array local_buf_2 is initially invalid;
After the conjugate multiplication result data of the m-th PRT has been completely stored in the first array local_buf_1, the conjugate multiplication result data of the (m+1)-th PRT is stored sequentially in the second array local_buf_2; at the same time, the data in the first array local_buf_1 is divided evenly into eight segments, data is fetched from each segment in bit-reversed incremental order for IFFT calculation, and the IFFT result data is output;
S3. Determine whether all M groups of echo data have completed IFFT processing; if so, all IFFT result data output by the IFFT kernel is taken as the pulse compression result for the M groups of echo data (PRTs);
Otherwise, m is incremented by 2 and the method returns to S2.
2. The method of claim 1, characterized in that, in S2, dividing the data in the second array local_buf_2 evenly into eight segments and fetching data from each segment in bit-reversed incremental order for IFFT calculation is specifically:
The data items of the (m+1)-th PRT stored in the second array local_buf_2 are labeled with subscripts in order;
When m ≠ 1, after the data in the second array local_buf_2 is divided evenly into eight segments, the starting subscripts of the segments are 0, 1, 2, 3, 4, 5, 6, and 7; one data item is taken from each segment in turn in bit-reversed incremental order, so 8 data points are taken per fetch and N/8 fetches are made in total; the subscripts of the 8 data points taken in the i-th fetch are, in order, 0+(~i), 1+(~i), 2+(~i), 3+(~i), 4+(~i), 5+(~i), 6+(~i), 7+(~i), where i = 1, 2, ..., (N/8-1) and (~i) is the result of bit-reversing i over LOG(N) bits;
When m = 1, the data in the second array local_buf_2 is initially invalid, and no processing is performed on the invalid data.
3. The method according to claim 1, characterized in that, in S2, dividing the data in the first array local_buf_1 equally into eight segments, fetching data from each segment in binary bit-reversed incrementing order, performing the IFFT calculation, and outputting the IFFT result data is specifically:
marking subscripts in order for each datum of the m-th group of echo data PRT stored in the first array local_buf_1;
the starting subscripts of the eight segments obtained by dividing the data in the first array local_buf_1 equally are 0, 1, 2, 3, 4, 5, 6 and 7 respectively; one datum is taken from each segment in turn in binary bit-reversed incrementing order, so that 8 data points are taken each time and N/8 fetches are made in total; the subscripts of the 8 data points taken in the i-th fetch are, in order, 0+(~i), 1+(~i), 2+(~i), 3+(~i), 4+(~i), 5+(~i), 6+(~i), 7+(~i), where i = 1, 2, ..., (N/8-1), and (~i) is the result of bit-reversing i as a LOG(N)-bit binary number.
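The subscript rule recited in claims 2 and 3 can be checked with a short sketch. The Python below is an illustration only: the helper names `bit_reverse` and `fetch_subscripts` are hypothetical, LOG(N) is assumed to mean the base-2 logarithm, N is assumed to be a power of two no smaller than 8, and the fetch counter is assumed to start at i = 0 so that exactly N/8 fetches are produced.

```python
def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i, e.g. 0b0001 -> 0b1000 for bits=4."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (i & 1)
        i >>= 1
    return result


def fetch_subscripts(n):
    """Yield, for each of the n/8 fetches, the 8 subscripts k + (~i), k = 0..7."""
    if n < 8 or n & (n - 1):
        raise ValueError("n must be a power of two and at least 8")
    bits = n.bit_length() - 1            # log2(n) for a power of two
    for i in range(n // 8):              # assumption: i counts from 0
        base = bit_reverse(i, bits)      # (~i): i reversed over log2(n) bits
        yield [k + base for k in range(8)]


for i, subscripts in enumerate(fetch_subscripts(32)):
    print(i, subscripts)
```

Under these assumptions each base (~i) is a multiple of 8, so the N/8 fetches together visit every subscript from 0 to N-1 exactly once.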
CN201810778029.0A 2018-07-16 2018-07-16 Pulse compression processing method of FPGA heterogeneous computing platform based on OpenCL Active CN109101347B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810778029.0A CN109101347B (en) 2018-07-16 2018-07-16 Pulse compression processing method of FPGA heterogeneous computing platform based on OpenCL

Publications (2)

Publication Number Publication Date
CN109101347A (en) 2018-12-28
CN109101347B (en) 2021-07-20

Family

ID=64846323

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810778029.0A Active CN109101347B (en) 2018-07-16 2018-07-16 Pulse compression processing method of FPGA heterogeneous computing platform based on OpenCL

Country Status (1)

Country Link
CN (1) CN109101347B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112597432A (en) * 2020-12-28 2021-04-02 华力智芯(成都)集成电路有限公司 Method and system for realizing acceleration of complex sequence cross-correlation on FPGA (field programmable Gate array) based on FFT (fast Fourier transform) algorithm

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106095730A (en) * 2016-06-23 2016-11-09 中国科学技术大学 A kind of FFT floating-point optimization method based on ILP and DLP
CN106484658A (en) * 2016-09-26 2017-03-08 西安电子科技大学 The device and method of 65536 points of pulse compressions is realized based on FPGA
US20180150644A1 (en) * 2016-11-29 2018-05-31 Intel Corporation Technologies for secure encrypted external memory for field-programmable gate arrays (fpgas)
CN108132467A (en) * 2017-12-23 2018-06-08 成都汇蓉国科微系统技术有限公司 The biradical Forward-looking SAR imaging methods of DSP+FPGA and imaging device based on enhanced ADC

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIACHENG YU et al.: "Realization and Optimization of Pulse Compression Algorithm on OpenCL-Based FPGA Heterogeneous Computing Platform", Signal and Information Processing, Networking and Computers *

Also Published As

Publication number Publication date
CN109101347B (en) 2021-07-20

Similar Documents

Publication Publication Date Title
CN103970718B (en) Device and method is realized in a kind of fast Fourier transform
US6490672B1 (en) Method for computing a fast fourier transform and associated circuit for addressing a data memory
Li et al. An FPGA design framework for CNN sparsification and acceleration
CN201226025Y (en) Processor for pulse Doppler radar signal
CN109613536B (en) Satellite-borne SAR real-time processing device and method
CN109101347A (en) A kind of process of pulse-compression method of the FPGA heterogeneous computing platforms based on OpenCL
CN112446471B (en) Convolution acceleration method based on heterogeneous many-core processor
CN103728616A Field programmable gate array (FPGA) based inverse synthetic aperture radar (ISAR) imaging parallel envelope alignment method
CN106445472B (en) A kind of character manipulation accelerated method, device, chip, processor
Zong-ling et al. The design of lightweight and multi parallel CNN accelerator based on FPGA
CN102129419B (en) Based on the processor of fast fourier transform
CN103838704A (en) FFT accelerator with high throughput rate
US6549925B1 (en) Circuit for computing a fast fourier transform
CN109633640A (en) A kind of ISAR Processing Algorithm based on to marine origin picture
Yang et al. A efficient design of a real-time FFT architecture based on FPGA
KR20010110202A (en) Two cycle fft
CN108008665B (en) Large-scale circular array real-time beam former based on single-chip FPGA and beam forming calculation method
CN110096672A (en) Inexpensive pipeline-type fft processor implementation method based on FPGA
CN108508426B (en) SAR echo signal generation method based on multi-core DSP and echo simulator
CN113203997B (en) FPGA-based radar super-resolution direction finding method, system and application
CN103902506A (en) FFTW3 optimization method based on loongson 3B processor
Bahtat et al. Efficient implementation of a complete multi-beam radar coherent-processing on a telecom SoC
CN110320501A (en) A kind of radar signal impulse compression method based on GPU
Yingxi et al. Design of the high-powered digital pulse compression real-time processing system based on ADSP-TS203
CN104820581B (en) A kind of method for parallel processing of FFT and IFFT permutation numbers table

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant