CN112986944B - Radar MTI and MTD implementation method based on CUDA heterogeneous parallel acceleration - Google Patents

Radar MTI and MTD implementation method based on CUDA heterogeneous parallel acceleration

Info

Publication number
CN112986944B
CN112986944B
Authority
CN
China
Prior art keywords
gpu
mtd
mti
cuda
thread
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110238579.5A
Other languages
Chinese (zh)
Other versions
CN112986944A (en)
Inventor
贾宗衡
孙子棠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202110238579.5A priority Critical patent/CN112986944B/en
Publication of CN112986944A publication Critical patent/CN112986944A/en
Application granted granted Critical
Publication of CN112986944B publication Critical patent/CN112986944B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/02Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00
    • G01S7/41Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00 using analysis of echo signal for target characterisation; Target signature; Target cross-section
    • G01S7/414Discriminating targets with respect to background clutter
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

The invention relates to the technical field of radar signal processing and provides a radar MTI and MTD implementation method based on CUDA heterogeneous parallel acceleration. The method comprises the following steps: set the radar signal processing parameter values in the CPU and copy the pulse-compressed echo matrix data into GPU video memory; divide the kernel thread organization and execute the second-order canceller MTI kernel function on the GPU; execute the matrix transpose kernel on the GPU, complete the parallel FFT of multiple groups of Doppler channels using the cuFFT library, and finally execute the matrix transpose kernel again to obtain the output of the MTD parallel algorithm, which is transmitted back to the CPU; optimize the MTI and MTD kernels with CUDA code optimization strategies and plot the optimized speed-up ratio curve. The optimized parallelized algorithm reaches a speed-up ratio of 142.66x, which satisfies the real-time requirements of radar signal processing; being based on the CUDA software system and a Visual Studio development workflow, the method is also easy to extend and port.

Description

Radar MTI and MTD implementation method based on CUDA heterogeneous parallel acceleration
Technical Field
The invention belongs to the technical field of radar signal processing and in particular relates to a method for implementing radar MTI and MTD based on CUDA heterogeneous parallel acceleration. The method uses the parallel computing power of the GPU and the CUDA heterogeneous programming model to guarantee the real-time performance of the MTI and MTD algorithms when the radar processes large echo data volumes, and is easy to port across platforms.
Background
During signal processing the radar suppresses clutter by means of moving target indication (MTI) and moving target detection (MTD) techniques. MTI processing exploits the fact that, in the frequency domain, clutter has a lower Doppler frequency than the object the radar is detecting; a digital canceller cancels each range cell one by one, filtering out stationary clutter and improving the signal-to-clutter ratio. However, MTI cannot obtain the Doppler frequency of a moving object in advance, so MTD is further required to suppress clutter outside the echo band. The common practice in MTD processing is to cascade, after the MTI filter, a bank of adjacent narrowband Doppler filters matched to the coherent echo pulse train. As modern battlefield electromagnetic environments grow more complex, the ever-increasing echo data volume makes CPU serial processing time-consuming, and the real-time requirements of radar signal processing in the current battlefield environment are difficult to meet.
The GPU, as the core component of a graphics card, has a highly parallel hardware architecture and is particularly superior to the CPU for parallel computation. CUDA (Compute Unified Device Architecture) is a general-purpose parallel computing platform proposed by NVIDIA that supports heterogeneous cooperation between the CPU and the GPU; its programming model fully combines the logical control the CPU is good at with the parallel arithmetic the GPU is good at. There has already been some research on MTI and MTD algorithms for the CUDA platform.
Chen Dajiang of the University of Electronic Science and Technology of China presents a GPU-implemented MTI algorithm in his master's thesis, "GPU-based alert radar signal processing software design". The method is theoretically similar to the MTI parallelization in the present invention and comprises the following main steps. First: copy the pulse-compressed echo data from the CPU to the GPU in a first-in first-out storage mode. Second: use a first-order canceller at the GPU end to perform two-pulse cancellation within a pulse repetition period. Third: return the MTI-processed result to the CPU for scheduling. The method successfully reduces the time consumed by MTI processing, but the designed MTI filter has a narrow stop-band notch and a poor clutter suppression effect, and the lack of uniform data precision causes large estimation errors.
The patent "A fast implementation method for processing external radiation source radar signals based on GPU" (application number CN201310176310.4; publication number CN103308897B), filed by the Institute of Electronics, Chinese Academy of Sciences, discloses an MTD algorithm implemented on the GPU. The method mainly comprises the following steps. First: cross-reorganize the echo data to be processed, dividing the whole echo into N equal-length data blocks, and subdividing each block into L equal-length data segments of M data points each. Second: splice data with the same segment number from different blocks together in order, and splice the tail data of the i-th segment of the N-th block to the head data of the (i+1)-th segment of the 1st block, forming a new storage structure. Third: copy the echo data in the new storage structure to the GPU and launch M×N threads. Fourth: perform MTD processing on every M×N data points of the spliced data on the GPU side. The method effectively improves the parallelism of the MTD algorithm, but it does not consider optimizations such as thread allocation and latency hiding.
Disclosure of Invention
The invention provides a radar MTI and MTD implementation method based on CUDA heterogeneous parallel acceleration, using GPU hardware, the CUDA software system, and its programming model; the method also includes the optimization of code instructions and thread structures, which greatly improves the radar signal processing speed.
The technical idea of the invention is to combine the radar signal processing algorithms with GPU parallel processing on the CUDA acceleration platform, adopting a heterogeneous CPU+GPU programming mode to realize efficient MTI and MTD parallel algorithms. The whole system comprises a Host-side program executed by the CPU and a Device-side program executed by the GPU. The Host side is responsible for logical control and data management, specifically: setting simulation parameters, configuring the GPU thread hierarchy, allocating and releasing memory, reading radar echo data, copying data to the GPU, and launching kernel functions. The Device side is responsible for executing the kernel functions and CUDA library functions that implement the MTI and MTD parallel algorithms.
In order to achieve the above purpose, the invention adopts the following technical scheme:
the method for realizing the radar MTI and MTD based on CUDA isomerism parallel acceleration comprises the following steps:
Step 1: set the radar signal processing parameter values in the CPU, read the N_r×N_c echo data matrix X obtained after pulse compression as the initial data before MTI processing, and copy it into the allocated GPU video memory;
Step 2: use a 2-dimensional thread index to divide the CUDA thread Grid and thread Block sizes, execute the second-order canceller MTI kernel function on the GPU, and output the echo data with stationary clutter filtered out together with the range cells containing moving targets;
Step 3: for the N_r×N_c result matrix X_MTI obtained in step 2, first execute the matrix transpose kernel on the GPU, then call the cufftExecC2C function of the cuFFT library to complete the parallel FFT of multiple groups of Doppler channels, and finally execute the matrix transpose kernel again to obtain the N_r×N_c matrix X_MTD output by the MTD parallel algorithm and copy it from the GPU back to the CPU.
Step 4: optimize the MTI and MTD kernel functions of steps 2 and 3 using strategies such as code instruction optimization, optimal thread allocation, and aligned and coalesced global memory access, and compute the speed-up ratio of the optimized CUDA heterogeneous parallel algorithm over the CPU serial algorithm.
In this radar MTI and MTD implementation method based on CUDA heterogeneous parallel acceleration, the target echo data simultaneously carries range information (from time delay) and velocity information (from Doppler shift). First, the initial data matrix is stored in the GPU device's video memory and the CUDA thread model is laid out with a two-dimensional index. Next, the parallelized MTI kernel based on the second-order canceller principle is executed on the GPU. Before MTD processing, the output matrix of the previous stage is transposed so that the Doppler-dimension data addresses are contiguous; the parallel Doppler FFT is then completed on the GPU; finally, the matrix is transposed again to restore the expected target echo layout.
Compared with the prior art, the invention has the following advantages. First, all kernel functions are optimized according to CUDA optimization strategies, fully improving the speed of signal processing. Second, balancing arithmetic precision against acceleration, functions of low arithmetic intensity use single-precision floating point, for which the Turing-architecture GPU has higher throughput, giving a better overall cost-performance ratio. Third, being based on the CUDA software system and the Visual Studio development workflow, the invention is modular and easy to extend and port.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the CUDA heterogeneous parallel radar MTI and MTD implementation method;
fig. 2 is a schematic diagram of a secondary canceller implementing an MTI algorithm according to the present invention;
fig. 3 is a schematic diagram of a narrowband doppler filter bank for implementing an MTD algorithm according to the present invention;
FIG. 4 is a verification simulation diagram of the result of executing only the MTD kernel function in the GPU provided by the invention;
FIG. 5 is a verification simulation diagram of the result of executing MTI and MTD kernel functions in succession in a GPU provided by the invention;
FIG. 6 is an acceleration ratio curve of the optimized CUDA heterogeneous parallel algorithm and the CPU serial algorithm provided by the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The flow chart of the CUDA heterogeneous parallel MTI and MTD algorithm provided by the embodiment of the invention is shown in FIG. 1. Specifically, the method comprises the following steps:
Step 1: set the radar signal processing parameter values in the CPU, read the N_r×N_c echo data matrix X obtained after pulse compression, and copy it, as the initial data before MTI processing, into the allocated GPU video memory.
Specifically, the step 1 includes the following 2 substeps:
Step 1.1: set the transmit signal parameters in the CPU, obtain the N_r×N_c echo matrix X after pulse compression, and allocate GPU video memory with the cudaMalloc function.
Step 1.2: copy the pulse-compressed echo data from the CPU to the GPU using the cudaMemcpy function with the cudaMemcpyHostToDevice parameter; each CUDA thread stores the current sample value and the values of that sample after the delay lines.
Step 2: use a 2-dimensional thread index to divide the CUDA thread Grid and thread Block sizes, execute the second-order canceller MTI kernel function on the GPU, and output the echo data with stationary clutter filtered out together with the range cells containing moving targets. FIG. 2 is a schematic diagram of the second-order canceller used in MTI processing; its output signal Y(t) equals the convolution of the impulse response H(t) with the input X(t):

Y(t) = H(t) * X(t) = X(t) - 2X(t - T_r) + X(t - 2T_r)

and the transfer function is

H(z) = (1 - z^(-1))^2 = 1 - 2z^(-1) + z^(-2)
Specifically, the step 2 includes the following 3 sub-steps:
Step 2.1: divide the Grid and thread Block of the thread organization according to the length of the echo data copied to the GPU; each thread along the gridDim.x dimension is responsible for the two subtractions of one three-pulse cancellation group.
Step 2.2: execute the second-order canceller MTI kernel function, using the thread index values to complete, on the GPU, the two subtractions of the samples of the same range resolution cell across pulse repetition periods.
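The cancellation arithmetic of steps 2.1-2.2 can be sketched in plain C++ (a hedged illustration of the per-range-cell computation, not the patent's actual kernel; in the CUDA kernel the loop index would come from the 2-D thread index, and the function name is ours):

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Three-pulse (second-order) cancellation for one range cell:
// y[n] = x[n] - 2*x[n-1] + x[n-2], where x[n] are the samples of the
// same range resolution cell in successive pulse repetition periods.
// A constant (stationary-clutter) sequence cancels exactly to zero.
std::vector<std::complex<float>> mti_cancel(const std::vector<std::complex<float>>& x) {
    std::vector<std::complex<float>> y;
    for (std::size_t n = 2; n < x.size(); ++n)
        y.push_back(x[n] - 2.0f * x[n - 1] + x[n - 2]);
    return y;
}
```

Because H(z) = (1 - z^(-1))^2 has a double zero at z = 1, a zero-Doppler input is suppressed completely, which is why the stationary target disappears after MTI.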
Step 3: for the N_r×N_c result matrix X_MTI obtained in step 2, first execute the matrix transpose kernel on the GPU, then call the cufftExecC2C function of the cuFFT library to complete the parallel FFT of multiple groups of Doppler channels, and finally execute the matrix transpose kernel again to obtain the N_r×N_c matrix X_MTD output by the MTD parallel algorithm and copy it from the GPU back to the CPU. A schematic diagram of the MTD filter, built as an MTI-cascaded FFT using a bank of narrowband Doppler filters, is shown in FIG. 3; the amplitude-frequency characteristic of the k-th filter is

|H_k(f)| = |sin(Nπ(f·T_r - k/N)) / sin(π(f·T_r - k/N))|

where N is the number of target echo pulses, k indexes the k-th filter, and T_r is the pulse repetition period. Each range cell, together with N-1 delay elements of T_r, covers the whole Doppler band.
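As an illustrative check (a sketch under the assumption that the filter bank is the standard N-point DFT bank; the function name is ours), the magnitude response of the k-th Doppler filter peaks at f = k/(N·T_r) and nulls at the other channel centers:

```cpp
#include <cmath>
#include <complex>

// Magnitude response of the k-th filter of an N-point DFT Doppler
// filter bank with pulse repetition period Tr: the coherent sum of
// N pulses weighted by the k-th DFT steering phases.
double mtd_filter_mag(int N, int k, double f, double Tr) {
    const double pi = std::acos(-1.0);
    std::complex<double> acc(0.0, 0.0);
    for (int n = 0; n < N; ++n)
        acc += std::polar(1.0, 2.0 * pi * n * (static_cast<double>(k) / N - f * Tr));
    return std::abs(acc);
}
```

At the channel center the N pulses add coherently (magnitude N); at a neighboring channel center the phases sweep a full turn and cancel.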
Specifically, the step 3 includes the following 5 sub-steps:
Step 3.1: divide the Grid and thread Block sizes of the thread organization; the gridDim.x dimension processes the range-dimension data of multiple channels, and the gridDim.y dimension processes the Doppler-dimension data of multiple channels.
Step 3.2: for the matrix data produced by the second-order canceller MTI kernel, configure the kernel so that each Doppler channel's data maps into one thread block, and execute the matrix transpose kernel on the GPU so that the range-dimension and Doppler-dimension data addresses become contiguous.
Step 3.3: create a cuFFT handle, call the CUDA library function cufftPlan2d to configure a 2-dimensional cuFFT plan, and execute the complex-to-complex parallel FFT along the Doppler dimension using the library function cufftExecC2C with the parameter CUFFT_FORWARD.
Step 3.4: execute the matrix transpose kernel again on the matrix obtained from the parallel FFT, obtaining the N_r×N_c matrix X_MTD output by the MTD parallel algorithm. The Doppler shift f_d follows from the Doppler channel in which a moving target lies; the radial velocity and the velocity resolution of the moving target are respectively

v = c·f_d / (2·f_c),  Δv = c·Δf_d / (2·f_c) = c·f_r / (2·f_c·N_FFT)

where c is the speed of light, f_c is the carrier frequency, Δf_d is the Doppler resolution, f_r is the pulse repetition frequency, and N_FFT is the number of FFT points selected by the MTD.
Step 3.5: copy the MTD-processed target echo data from the GPU to the CPU using the cudaMemcpy function with the cudaMemcpyDeviceToHost parameter, call the cufftDestroy function to destroy the cuFFT handle, and call the free and cudaFree functions to release the memory resources occupied by the CPU and the GPU respectively.
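The role of the transposes in steps 3.2 and 3.4 is to make each Doppler channel contiguous before the batched FFT; the index mapping can be sketched as a CPU reference implementation (not the actual GPU kernel; the function name is ours):

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Row-major transpose of an nr x nc matrix: element (r, c) of the
// input moves to (c, r) of the nc x nr output, so samples that were
// strided by nc (one per pulse) become contiguous for the Doppler FFT.
std::vector<std::complex<float>> transpose(const std::vector<std::complex<float>>& in,
                                           int nr, int nc) {
    std::vector<std::complex<float>> out(in.size());
    for (int r = 0; r < nr; ++r)
        for (int c = 0; c < nc; ++c)
            out[static_cast<std::size_t>(c) * nr + r] =
                in[static_cast<std::size_t>(r) * nc + c];
    return out;
}
```

On the GPU the same mapping is computed from the 2-D thread index; a shared-memory tile is commonly used so that both the read and the write stay coalesced.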
Step 4: optimize the MTI and MTD kernel functions of steps 2 and 3 using strategies such as code instruction optimization, optimal thread allocation, and aligned and coalesced global memory access, and compute the speed-up ratio of the optimized CUDA heterogeneous parallel algorithm over the CPU serial algorithm.
Specifically, the following 4 CUDA optimization strategies are included in step 4:
(1) Code instruction optimization. The invention replaces arithmetic operators in the kernel functions with bit operators: for example, the left-shift operator << replaces multiplication by a power of two, and a modulo operation by 2^n is replaced by a bitwise AND with (2^n - 1). Meanwhile, the suffix 'f' is appended to all float-type literals, eliminating the unnecessary cost of implicit double-to-float type conversions.
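The two bit-operator substitutions can be sketched as follows (illustrative helper names; valid for non-negative integers and power-of-two divisors, which is the case for the thread and FFT sizes used here):

```cpp
// Left shift replaces multiplication by a power of two: x << 1 == x * 2.
unsigned int times2(unsigned int x) { return x << 1; }

// Bitwise AND with (2^n - 1) replaces modulo by 2^n: x & 7 == x % 8 for n = 3.
unsigned int mod_pow2(unsigned int x, unsigned int n) {
    return x & ((1u << n) - 1u);
}
```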
(2) Optimal thread allocation. The number of threads per thread block is configured as an integer multiple of 32, not exceeding 1024. For 4-byte int-type data, 256 threads per thread block are used; for 8-byte complex (float2) data, 128 threads per thread block, so that execution units can be better reused and the efficiency of the CUDA instruction pipeline improves.
(3) Aligned global memory access. The head address of each global memory transaction on the GPU device is an integer multiple of the cache granularity; accesses are served through 32-byte L2 cache segments or 128-byte L1 cache lines, and global memory accesses are always kept aligned, saving a portion of the bandwidth.
(4) Coalesced global memory access. The invention makes warps start from aligned memory addresses, and all 32 threads of each warp access one contiguous memory block; every transferred datum is needed by the warp, so the memory-access coalescing degree is 100%, which helps maximize memory throughput.
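The coalesced pattern follows from the standard 1-D global index computation (a sketch of the indexing rule, not the patent's kernel): consecutive threadIdx.x values map to consecutive element addresses, so one warp touches one contiguous 32-element block.

```cpp
// Global element index of a thread in a 1-D CUDA launch:
// idx = blockIdx.x * blockDim.x + threadIdx.x.
// Threads 0..31 of a warp therefore access 32 consecutive elements,
// which the hardware can serve as one aligned, coalesced transaction.
int global_index(int block_idx, int block_dim, int thread_idx) {
    return block_idx * block_dim + thread_idx;
}
```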
This completes the radar MTI and MTD implementation method based on CUDA heterogeneous parallel acceleration provided by the invention.
The effects of the present invention will be further described with reference to simulation experiments.
1. Simulation conditions:
In the simulation experiments of the present invention, the computer hardware and software environment is configured as follows: the GPU is an NVIDIA GeForce GTX 1660 Ti graphics card with 6 GB of video memory and 1536 CUDA cores; the CPU is an Intel(R) Core i7-9750H processor (6 cores, 12 threads, 2.6 GHz base clock); the operating system is 64-bit Windows 10 Professional; the heterogeneous parallel platform is CUDA Toolkit 10.2; the CUDA programming environment is Microsoft Visual Studio 2019; the algorithm verification platform is MATLAB R2020a.
The simulation parameters of the invention are as follows: a linear frequency-modulated signal is used as the radar transmit signal, with modulation bandwidth B = 20 MHz, pulse width τ = 10 μs, pulse repetition period T_r = 100 μs, sampling frequency f_s = 100 MHz, and transmitter carrier frequency f_c = 10 GHz. Three moving targets to be detected are set in the simulation: moving target 1 at a range of 3 km with a velocity of 250 m/s; moving target 2 at 6 km, 25 m/s; moving target 3 at 4 km, 75 m/s. Finally, a stationary target is set at a range of 1 km.
In the simulation, each radar transmission of 1000 pulse repetition periods T_r is taken as one data acquisition, so one acquisition takes 100 ms. From the sampling frequency f_s, the number of samples per pulse repetition period is N_s = 10^4. The CUDA code uses 4-byte single-precision floats, so the data volume to process in one acquisition is 10^3 · N_s · 4 bytes / 1024^2 ≈ 38.1 MB.
2. The simulation content:
fig. 4 is a simulation diagram of performing only MTD kernel functions in the GPU and then loading output data to the MATLAB platform for result verification.
Fig. 5 is a simulation diagram of sequentially executing MTI and MTD kernel functions in the GPU, and then loading output data to the MATLAB platform for result verification.
FIG. 6 is a graph of acceleration ratio of the CUDA heterogeneous parallel algorithm to the CPU serial algorithm for MTI+MTD using the optimization strategy of step 4.
3. Simulation result analysis:
according to the simulation parameters of the invention, the maximum unambiguous distance R of the radar can be solved max = (c·pri)/2=15 km, distance resolution Δr=c/2b=7.5 m. The invention adopts 32-point FFT to process MTD in GPU, then Doppler resolution Deltaf d =1/(pri·32), velocity resolution Δv= (c·Δf d )/(2·f c )≈4.7m/s。
In FIGS. 4 and 5, the x-axis represents range in m; the y-axis represents velocity in m/s; the z-axis represents normalized amplitude. As can be seen from FIG. 4, when only MTD processing is performed in the GPU without MTI processing, 4 targets are detected in total and the stationary target at 1 km is not filtered out. As can be seen from FIG. 5, when MTI and MTD processing are executed in sequence in the GPU, the stationary target is successfully filtered out, and within the allowable error the MTD result output by the GPU matches the simulation parameters of the 3 expected moving targets set by the invention, verifying the feasibility of the radar MTI and MTD implementation method based on CUDA heterogeneous parallel acceleration provided by the invention.
In FIG. 6, the horizontal axis represents the data size processed by the CPU or GPU for MTI+MTD; the vertical axis is the speed-up ratio, i.e. the average run time of the CPU serial algorithm divided by that of the CUDA heterogeneous parallel algorithm. As can be seen from FIG. 6, when the processed data volume is large the speed-up ratio tends to saturate; the optimized MTI and MTD parallelized algorithm reaches an overall speed-up ratio of 142.66x, which satisfies the real-time requirements of radar signal processing, and the development model based on the CUDA software system and the Visual Studio platform also favors porting the algorithm to other platforms.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (4)

1. A radar MTI and MTD implementation method based on CUDA heterogeneous parallel acceleration, characterized by comprising the following steps:
Step 1: set the radar signal processing parameter values in the CPU, read the N_r×N_c echo data matrix X obtained after pulse compression as the initial data before MTI processing, and copy it into the allocated GPU video memory;
Step 2: use a 2-dimensional thread index to divide the CUDA thread Grid and thread Block sizes, execute the second-order canceller MTI kernel function on the GPU, and output the echo data with stationary clutter filtered out together with the range cells containing moving targets;
Step 3: for the N_r×N_c result matrix X_MTI obtained in step 2, first execute the matrix transpose kernel on the GPU, then call the cufftExecC2C function of the cuFFT library to complete the parallel FFT of multiple groups of Doppler channels, and finally execute the matrix transpose kernel again to obtain the N_r×N_c matrix X_MTD output by the MTD parallel algorithm and copy it from the GPU back to the CPU;
Step 4: optimize the MTI and MTD kernel functions of steps 2 and 3 using strategies such as code instruction optimization, optimal thread allocation, and aligned and coalesced global memory access, and compute the speed-up ratio of the optimized CUDA heterogeneous parallel algorithm over the CPU serial algorithm.
2. The method according to claim 1, characterized in that step 1 comprises in particular the sub-steps of:
Step 1.1: set the transmit signal parameters in the CPU, obtain the N_r×N_c echo matrix X after pulse compression, and allocate GPU video memory with the cudaMalloc function;
Step 1.2: copy the pulse-compressed echo data from the CPU to the GPU using the cudaMemcpy function with the cudaMemcpyHostToDevice parameter; each CUDA thread stores the current sample value and the values of that sample after the delay lines.
3. The method according to claim 1, characterized in that step 2 comprises in particular the sub-steps of:
Step 2.1: divide the Grid and thread Block of the thread organization according to the length of the echo data copied to the GPU; each thread along the gridDim.x dimension is responsible for the two subtractions of one three-pulse cancellation group;
Step 2.2: execute the second-order canceller MTI kernel function, using the thread index values to complete, on the GPU, the two subtractions of the samples of the same range resolution cell across pulse repetition periods.
4. The method according to claim 1, characterized in that step 3 comprises the following sub-steps:
Step 3.1: divide the Grid and thread Block sizes of the thread organization; the gridDim.x dimension processes the range-dimension data of multiple channels, and the gridDim.y dimension processes the Doppler-dimension data of multiple channels;
Step 3.2: for the matrix data produced by the second-order canceller MTI kernel, configure the kernel so that each Doppler channel's data maps into one thread block, and execute the matrix transpose kernel on the GPU so that the range-dimension and Doppler-dimension data addresses become contiguous;
Step 3.3: create a cuFFT handle, call the CUDA library function cufftPlan2d to configure a 2-dimensional cuFFT plan, and execute the complex-to-complex parallel FFT along the Doppler dimension using the library function cufftExecC2C with the parameter CUFFT_FORWARD;
step 3.4, performing the matrix transposition kernel function again on the matrix obtained after the parallel FFT calculation to obtain the N_r×N_c-dimensional matrix X_MTD output by the MTD parallel algorithm; the Doppler shift f_d is obtained from the Doppler channel in which the moving target is located, and the radial velocity and velocity resolution of the moving target are respectively
v = c·f_d / (2·f_c) and Δv = c·Δf_d / (2·f_c), with Δf_d = f_r / N_FFT,
wherein c is the speed of light, f_c is the carrier frequency, Δf_d is the Doppler resolution, f_r is the pulse repetition frequency, and N_FFT is the FFT point number selected by the MTD;
and step 3.5, copying the MTD-processed target echo data from the GPU to the CPU by using the cudaMemcpy function with the cudaMemcpyDeviceToHost parameter, calling the cufftDestroy function to destroy the cuFFT handle, and calling the free function and the cudaFree function to release the memory resources occupied by the CPU and the GPU respectively.
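Steps 3.2-3.4 (transpose, Doppler-dimension FFT, transpose back) and the velocity formulas above can be sketched as a NumPy reference; the function names `mtd` and `radial_velocity` are illustrative, and the velocity formula follows the symbols listed in the claim under the assumption f_d = channel·f_r/N_FFT.

```python
import numpy as np

def mtd(Y, n_fft):
    """MTD: FFT across pulses for each range cell (steps 3.2-3.4).
    The transposes make the Doppler dimension contiguous before the FFT,
    mirroring the transpose kernels around cufftExecC2C in the patent."""
    Z = np.fft.fft(Y.T, n=n_fft, axis=0)   # FFT along the Doppler dimension
    return Z.T                             # back to range x Doppler layout

def radial_velocity(channel, n_fft, f_r, f_c, c=3e8):
    """v = c*f_d/(2*f_c), with f_d = channel*f_r/n_fft (assumed mapping)."""
    f_d = channel * f_r / n_fft
    return c * f_d / (2.0 * f_c)

# target with Doppler shift f_d = 2 kHz, sampled at f_r = 8 kHz over 16 pulses
n_r, n_c, f_r, f_c = 4, 16, 8e3, 10e9
m = np.arange(n_c)
Y = np.ones((n_r, 1)) * np.exp(2j * np.pi * 2e3 / f_r * m)
X_mtd = mtd(Y, n_c)
ch = int(np.argmax(np.abs(X_mtd[0])))
print(ch, radial_velocity(ch, n_c, f_r, f_c))  # channel 4, v = 30.0 m/s
```

With f_d/f_r = 0.25 the target energy lands in Doppler channel 4 of 16, and the assumed mapping then recovers v = 30 m/s for a 10 GHz carrier.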
CN202110238579.5A 2021-03-04 2021-03-04 Radar MTI and MTD implementation method based on CUDA isomerism parallel acceleration Active CN112986944B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110238579.5A CN112986944B (en) 2021-03-04 2021-03-04 Radar MTI and MTD implementation method based on CUDA isomerism parallel acceleration

Publications (2)

Publication Number Publication Date
CN112986944A CN112986944A (en) 2021-06-18
CN112986944B true CN112986944B (en) 2023-09-08

Family

ID=76352588

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110238579.5A Active CN112986944B (en) 2021-03-04 2021-03-04 Radar MTI and MTD implementation method based on CUDA isomerism parallel acceleration

Country Status (1)

Country Link
CN (1) CN112986944B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113704520B (en) * 2021-10-27 2022-03-08 天津(滨海)人工智能军民融合创新中心 Method and device for accelerating Anchor-based data processing by using cuda in parallel and electronic equipment
CN116502028B (en) * 2023-04-28 2023-10-20 中国科学院软件研究所 Large-scale FFT (fast Fourier transform) implementation method and device based on floating point number compression technology
CN117152259A (en) * 2023-11-01 2023-12-01 常熟理工学院 Micro-assembly positioning acceleration method and system based on multichannel microscopic vision guidance
CN117687779B (en) * 2023-11-30 2024-04-26 山东诚泉信息科技有限责任公司 Complex electric wave propagation prediction rapid calculation method based on heterogeneous multi-core calculation platform

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104849698A (en) * 2015-05-21 2015-08-19 中国人民解放军海军工程大学 Radar signal parallel processing method and system based on heterogeneous multinucleated system
WO2018045566A1 (en) * 2016-09-09 2018-03-15 深圳大学 Random pulse doppler radar angle-doppler imaging method based on compressed sensing
CN110187962A (en) * 2019-04-26 2019-08-30 中国人民解放军战略支援部队信息工程大学 A kind of Gridding algorithm optimization method and device based on CUDA
CN110208752A (en) * 2019-06-27 2019-09-06 电子科技大学 A kind of radar MTI/MTD implementation method based on GPU

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GPU-based software radar signal processing; Tian Qianyuan; Xu Chaoyang; Zhao Quan; Shipboard Electronic Countermeasure (Issue 01); full text *

Similar Documents

Publication Publication Date Title
CN112986944B (en) Radar MTI and MTD implementation method based on CUDA isomerism parallel acceleration
CN104237852B (en) For processing the methods, devices and systems of radar signal
CN105137428A (en) Dechirp signal polar format imaging algorithm FPGA (Field Programmable Gate Array) realization method
CN102435989B (en) Field programmable gate array (FPGA)-based general wave beam forming device
CN116483319A (en) Operator processing method, device, equipment and medium for software defined chip
CN109154651A (en) Ranging processing method, device and unmanned vehicle based on radar
CN111830478B (en) FPGA (field programmable Gate array) implementation method for MTD (maximum Transmission Difference) processing of LFMCW (Linear frequency modulation and continuous phase) radar
CN103956991A (en) FIR filter parallel realization method based on CPU/GPU heterogeneous platform
CN113407483B (en) Dynamic reconfigurable processor for data intensive application
CN113672380B (en) Phase interferometer direction-finding system for realizing FX cross-correlation phase discrimination by GPU and phase discrimination method thereof
CN103544111B (en) A kind of hybrid base FFT method based on real-time process
CN105116398A (en) Real time Hough transformation detection weak object method based on FPGA
Rabinovich et al. Particle swarm optimization on a GPU
CN109633613B (en) FPGA (field programmable Gate array) realization method for hypersonic platform combined pulse compression and spring speed compensation
CN108874547A (en) A kind of data processing method and device of astronomy software Gridding
CN109840306A (en) One kind being based on recursive parallel FFT communication optimization method and system
CN115951323A (en) Radar signal self-adaptive constant false alarm rate detection optimization method based on OpenCL
CN102901951A (en) GPU (graphics processing unit)-based radar signal intra-pulse characteristic real-time analysis realizing scheme
CN109239688B (en) High-efficiency Doppler filter bank based on FPGA
Faulkner et al. GPU synthesis of RF channeliser outputs for a variable bandwidth microwave digital receiver
Wang et al. The Optimization of Radar Echo Pulse Compression Algorithm Based on DSP
CN109614151B (en) Four-core parallel large-point pulse pressure data processing method
Damnjanović et al. On Hardware Implementations of Two-Dimensional Fast Fourier Transform for Radar Signal Processing
Li et al. Parallel Optimal Design of SSR Response Signal Processing Algorithms Based on GPU
Shao et al. Research and implementation of a high performance parallel computing digital down converter on graphics processing unit

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant