CN104483670B

CN104483670B - SAR (synthetic aperture radar) echo simulation method based on GPU (graphics processing unit)

Info

Publication number: CN104483670B
Application number: CN201410718693.8A
Authority: CN
Inventors: 梁毅; 邢孟道; 杜凡; 李震宇
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2014-12-01
Filing date: 2014-12-01
Publication date: 2017-02-01
Anticipated expiration: 2034-12-01
Also published as: CN104483670A

Abstract

The invention belongs to the technical field of radar signal processing, and particularly relates to a SAR (synthetic aperture radar) echo simulation method based on a GPU (graphics processing unit). The method comprises the following steps: 1, setting the working parameters of the SAR on the CPU (central processing unit); 2, on the CPU, establishing a coordinate system with the center of the SAR observation scene as the reference point, setting the position of each SAR array element at each azimuth time, and setting the coordinates and scattering coefficient of each scattering point in the SAR observation scene; 3, copying all data set on the CPU into the global memory of the GPU; 4, on the GPU, using three kernel functions to obtain the echo data of the SAR observation scene at the l-th azimuth time, where l = 1, 2, ..., Na; 5, letting l take the values 1, 2, ..., Na in turn and repeating step 4, so as to obtain the echo data of the SAR observation scene at all azimuth times.

Description

Synthetic aperture radar echo simulation method based on GPU

Technical field

The invention belongs to the technical field of radar signal processing, and particularly relates to a GPU-based synthetic aperture radar echo simulation method. The invention can be used for echo simulation for airborne and missile-borne SAR (synthetic aperture radar) imaging.

Background technology

Research on SAR imaging methods requires raw SAR echo signals under specific conditions. Such data often cannot be obtained by actual measurement with a radar platform, as in the case of missile-borne SAR, so obtaining the raw echo signals required for imaging through simulation is of great significance.

Echo simulation requires an echo calculation for every scattering point of the simulated scene, so the amount of computation is huge and echo generation takes a long time. Time-domain echo simulation algorithms have low computational efficiency and are not suitable for simulating the echo data of large-scene area targets.

Summary of the invention

The object of the invention is to propose a GPU-based synthetic aperture radar echo simulation method. The invention uses a frequency-domain algorithm based on concentric circles, which further improves the performance of echo simulation and shortens the simulation time.

To achieve the above technical purpose, the invention adopts the following technical solution.

The GPU-based synthetic aperture radar echo simulation method comprises the following steps:

Step 1: on the CPU side, set the working parameters of the synthetic aperture radar; the working parameters include the number of azimuth sampling points n_a;

Step 2: on the CPU side, set up the SAR observation scene, establish a coordinate system with the center of the SAR observation scene as the reference point, set the position of each SAR array element at each azimuth time, and set the coordinates and scattering coefficient of each scattering point in the SAR observation scene;

Step 3: copy all data set on the CPU in step 1 and step 2 into the global memory of the GPU;

Step 4: on the GPU side, use the 1st kernel function, the 2nd kernel function and the 3rd kernel function to obtain the echo data of the SAR observation scene at the l-th azimuth time, l = 1, 2, ..., n_a;

Step 5: let l take the values 1, 2, ..., n_a in turn and repeat step 4 to obtain the echo data of the SAR observation scene at all azimuth times.

The beneficial effects of the invention are as follows: the invention applies multiple optimisation strategies and fully exploits the high degree of parallelism of area-target echo simulation, greatly reducing the echo generation time while maintaining precision.

Brief description of the drawings

Fig. 1 is a flow chart of the GPU-based synthetic aperture radar echo simulation method of the invention;

Fig. 2a is a schematic diagram of the thread grid allocated to the 1st kernel function in the invention;

Fig. 2b is a schematic diagram of the thread grid allocated to the 2nd kernel function in the invention;

Fig. 3 is a schematic diagram of the tree-structured accumulation performed by each thread block of the 2nd kernel function on its corresponding data in the invention;

Fig. 4 is a schematic diagram of the thread grid allocated to the 3rd kernel function in the invention;

Fig. 5a is the magnitude map of the two-dimensional echo data of a scattering point obtained with the invention in the simulation experiment;

Fig. 5b is the phase map of the two-dimensional echo data of a scattering point obtained with the invention in the simulation experiment;

Fig. 6a is the original scene of the area-target simulation in the simulation experiment;

Fig. 6b is the two-dimensional magnitude map of the area-target echo obtained with the invention in the simulation experiment;

Fig. 6c is a schematic diagram of the area-target imaging result obtained with the invention in the simulation experiment.

Specific embodiment

The invention is further described below with reference to the accompanying drawings:

Referring to Fig. 1, which is a flow chart of the GPU-based synthetic aperture radar echo simulation method of the invention, the method comprises the following steps:

Step 1: on the CPU side, set the working parameters of the synthetic aperture radar. The working parameters include: the field angle of the SAR, the number of range sampling points (number of range cells) n_r, the number of azimuth sampling points n_a, the range sampling frequency f_s, the range sampling interval δr, the slant range r_0 of the first range sampling cell (the first range cell), the speed of light c, the bandwidth b of the transmitted signal, and the carrier wavelength λ of the transmitted signal.

Step 2: on the CPU side, set up the SAR observation scene, establish a coordinate system with the center of the SAR observation scene as the reference point, set the position of each SAR array element at each azimuth time, set the spacing between scattering points in the scene, and set the coordinates and scattering coefficient of each scattering point in the SAR observation scene.

Step 3: copy all data set on the CPU in step 1 and step 2 into the global memory of the GPU; global memory must be allocated on the GPU side in advance for the data coming from the CPU.

Step 4: on the GPU side, use the 1st kernel function, the 2nd kernel function and the 3rd kernel function to obtain the echo data of the SAR observation scene at the l-th azimuth time, l = 1, 2, ..., n_a.

The process of obtaining the echo data of the SAR observation scene at the l-th azimuth time comprises the following sub-steps:

(4.1) On the GPU side, allocate n1 thread blocks to the 1st kernel function, n1 = ⌈point_num / (256·n)⌉, where ⌈·⌉ denotes rounding up, point_num is the number of all scattering points in the SAR observation scene at the l-th azimuth time, and n is a preset natural number. Each thread block contains 256 threads (that is, the size block_size of each thread block is set to 256). This ensures that the total number of threads allocated to the 1st kernel function is a multiple of 256; surplus threads are not assigned any task. Fig. 2a is a schematic diagram of the thread grid allocated to the 1st kernel function in the invention. In Fig. 2a, block(0,0), block(1,0), block(2,0), ... denote the thread blocks allocated to the 1st kernel function, and thread0 to thread255 denote the 256 threads of each thread block.

Each thread of the 1st kernel function obtains the echo data of n scattering points at the l-th azimuth time. When computing the echo data of a scattering point at the l-th azimuth time, the thread computes the echo data of the range cell in which the scattering point lies and performs n_r sinc interpolations. The data obtained from the k-th sinc interpolation of that range cell's echo data is called the k-th sinc interpolated data of the scattering point, k = 1, ..., n_r. The 1st to n_r-th sinc interpolated data of the scattering point together form its echo data. The echo data of each scattering point is therefore a vector of length n_r (containing n_r data points).

After each thread of the 1st kernel function has obtained the echo data of its n scattering points at the l-th azimuth time, it accumulates them (the sinc interpolated data of the n scattering points that fall in the same range cell are added together), giving the accumulated results for the 1st through n_r-th range cells. These n_r accumulated results form the accumulated result of the echo data of the n scattering points at the l-th azimuth time, which is a vector of n_r complex elements (containing n_r data points).

In this embodiment, following the description above, the number of threads used by the 1st kernel function is ⌈point_num / n⌉, where ⌈·⌉ denotes rounding up. The accumulated result of the echo data of n scattering points obtained by each thread of the 1st kernel function is a vector of length n_r (n_r data points), where n_r is the number of range sampling points. After each thread has obtained its accumulated result, the result is stored in the global memory of the GPU, so n_r × point_num / n complex elements (each storing one complex value) must be allocated in GPU global memory for the 1st kernel function.
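The launch arithmetic described above can be sketched on the CPU as follows. This is a minimal Python illustration, not the CUDA host code of the invention. It assumes n1 = ⌈point_num / (256·n)⌉, which is what 256-thread blocks with n scattering points per thread imply; the function name kernel1_launch and the sample values of n and n_r are hypothetical.

```python
BLOCK_SIZE = 256  # threads per block, as stated in the text


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def kernel1_launch(point_num, n, n_r):
    """Return (n1, launched_threads, working_threads, complex_elements).

    Assumes n1 = ceil(point_num / (256 * n)); hypothetical helper.
    """
    working_threads = ceil_div(point_num, n)      # each handles n scattering points
    n1 = ceil_div(point_num, n * BLOCK_SIZE)      # number of thread blocks
    launched_threads = n1 * BLOCK_SIZE            # always a multiple of 256
    complex_elements = n_r * working_threads      # one length-n_r vector per working thread
    return n1, launched_threads, working_threads, complex_elements


# Illustrative values only: a 512*512 scene, n = 16, n_r = 2048.
print(kernel1_launch(point_num=512 * 512, n=16, n_r=2048))
```

With these illustrative values the sketch yields 64 thread blocks and 33554432 complex elements in global memory. When point_num is not a multiple of 256·n, launched_threads exceeds working_threads, and the difference is the set of surplus threads that receive no task.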

Clearly, the thread grid allocated to the 1st kernel function depends closely on n. To assign as much work as possible to each thread, which improves the efficiency of the 1st kernel function, while respecting the limit on global memory, n is varied so that the size of the accumulated results of the echo data of n scattering points obtained by the threads of the 1st kernel function stays below 1 GB; the largest n found in this way is taken as the set value of n.

(4.2) On the GPU side, allocate n2 thread blocks to the 2nd kernel function, n2 = n_r × ⌈point_num / (256·n)⌉. Each thread block allocated to the 2nd kernel function contains 256 threads. The n2 thread blocks can be represented as a two-dimensional thread-block grid with n_r rows and ⌈point_num / (256·n)⌉ columns. Fig. 2b is a schematic diagram of the thread grid allocated to the 2nd kernel function in the invention. In Fig. 2b, block(0,0), block(1,0), ..., block(0,1), block(1,1), ..., block(0,n_r), block(1,n_r), ... denote the n2 thread blocks allocated to the 2nd kernel function; thread0 to thread255 denote the 256 threads of each thread block.
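As a rough illustration of how such a two-dimensional grid maps onto the output of the 1st kernel function, the following Python sketch computes the grid shape and the slice of partial sums that one block column would reduce. It assumes n_r rows and ⌈point_num / (256·n)⌉ columns; kernel2_grid and chunk_bounds are hypothetical names, not functions of the invention.

```python
BLOCK_SIZE = 256  # threads per block, as stated in the text


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def kernel2_grid(point_num, n, n_r):
    """Return (rows, cols, n2) of the assumed two-dimensional block grid."""
    partial_sums = ceil_div(point_num, n)       # one per working thread of kernel 1
    cols = ceil_div(partial_sums, BLOCK_SIZE)   # chunks of up to 256 per range cell
    rows = n_r                                  # one row of blocks per range cell
    return rows, cols, rows * cols


def chunk_bounds(block_x, point_num, n):
    """Half-open range of partial sums handled by block column block_x."""
    partial_sums = ceil_div(point_num, n)
    start = block_x * BLOCK_SIZE
    return start, min(start + BLOCK_SIZE, partial_sums)


# Illustrative values only: 1000 scattering points, n = 3, n_r = 10.
print(kernel2_grid(1000, 3, 10))
print(chunk_bounds(1, 1000, 3))
```

In this example the last block column covers only 78 partial sums, which is the situation where a block has fewer than 256 data items to reduce.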

For the 256 × n_r complex elements obtained by each thread block of the 1st kernel function (that is, the 256 accumulated results of the echo data of n scattering points at the l-th azimuth time obtained by that thread block), the 2nd kernel function uses n_r thread blocks. For the 256 complex elements of the k-th range cell obtained by a thread block of the 1st kernel function, the k-th of the n_r corresponding thread blocks of the 2nd kernel function performs a tree-structured accumulation (parallel reduction summation) over those 256 complex elements, giving the accumulated sum of the k-th thread block, k = 1, ..., n_r. The 2nd kernel function thus obtains n2 accumulated sums for the l-th azimuth time.

Specifically, the 2nd kernel function performs the tree-structured accumulation on the 256 × n_r complex elements obtained by every thread block of the 1st kernel function simultaneously and in parallel. Among the n_r thread blocks of the 2nd kernel function that correspond to one thread block of the 1st kernel function, the k-th thread block proceeds as follows: the accumulated results of the k-th range cell of the n scattering points at the l-th azimuth time obtained by that thread block of the 1st kernel function (256 complex elements) are read from GPU global memory into shared memory, and then, in GPU shared memory, the k-th thread block sums them with a tree-structured accumulation (parallel reduction summation). Fig. 3 is a schematic diagram of the tree-structured accumulation performed by each thread block of the 2nd kernel function on its corresponding data. Each thread of a thread block of the 2nd kernel function reads its corresponding complex element from GPU global memory into GPU shared memory; when the parallel reduction finishes, the result is in the first element of shared memory, and thread 0 of the thread block writes it to GPU global memory. A parallel reduction in shared memory requires the number of data items handled by each thread block to be an integer power of 2. In practice, however, the last thread block of each range cell may have fewer data items to process than the block size of 256. Under CUDA's single-instruction, multiple-thread programming model, that block still allocates shared memory for 256 complex elements and its threads can still read global memory. Therefore, for the 2nd kernel function to execute correctly, before a thread loads data into shared memory it checks whether its thread index is less than the number of data items to be accumulated for the range cell: if so, the thread reads the corresponding global-memory element into the corresponding position in shared memory; otherwise it sets its shared-memory element to 0. In this way the 2nd kernel function can perform a parallel reduction over any number of data items per range cell, without requiring the data count to be an integer multiple of 256.
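The zero-padded tree reduction described above can be mimicked sequentially. The following Python sketch is a CPU stand-in for what one 256-thread block would do in shared memory, under the assumption that the stride is halved at every step; block_tree_sum is a hypothetical name, and the inner for loop stands in for threads that would run in parallel on the GPU.

```python
BLOCK_SIZE = 256  # threads per block, as stated in the text


def block_tree_sum(values, block_size=BLOCK_SIZE):
    """Sum up to block_size complex values the way one thread block would."""
    if len(values) > block_size:
        raise ValueError("one block reduces at most block_size values")
    # Load step: thread t copies element t; threads past the end store 0.
    shared = [values[t] if t < len(values) else 0j for t in range(block_size)]
    stride = block_size // 2
    while stride > 0:
        # One reduction step: thread t (t < stride) adds the element
        # that sits stride slots away. Reads and writes never overlap.
        for t in range(stride):
            shared[t] += shared[t + stride]
        stride //= 2
    return shared[0]  # the value thread 0 would write back to global memory


# Example with only 78 valid items, as in a partially filled last block.
partial = [complex(k, -k) for k in range(78)]
print(block_tree_sum(partial))
```

Because the padded slots hold 0, the result equals the plain sum of the valid items regardless of how many of the 256 slots are actually used.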

(4.3) On the GPU side, allocate n_r thread blocks to the 3rd kernel function; each thread block allocated to the 3rd kernel function contains 256 threads. Fig. 4 is a schematic diagram of the thread grid allocated to the 3rd kernel function in the invention. In Fig. 4, block(0,0), block(0,1), ..., block(0,n_r) denote the thread blocks allocated to the 3rd kernel function, and thread0 to thread255 denote the 256 threads of each thread block.

After the 2nd kernel function has been called to partially accumulate the echo data, the 3rd kernel function accumulates the output of the 2nd kernel function once more using shared memory. In this embodiment, the k-th thread block of the 3rd kernel function reads from GPU global memory the accumulated results of the k-th range cell at the l-th azimuth time obtained by the 2nd kernel function (256 complex elements), then performs a tree-structured accumulation (parallel reduction summation) over them to obtain the corresponding accumulated sum, k = 1, ..., n_r. The 3rd kernel function thus obtains n_r accumulated sums for the l-th azimuth time. Note that if the 3rd kernel function has more than 256 data items to accumulate in one range cell, the thread block first writes 256 data items of that range cell into shared memory and then adds all remaining data items of that range cell onto shared memory in a while loop; this guarantees that one thread block of the 3rd kernel function can accumulate all the data of one range cell. If it has fewer than 256 data items to accumulate in one range cell, the 3rd kernel function proceeds in the same way as the 2nd kernel function, padding the unused shared-memory elements with 0 before the reduction. When the 3rd kernel function finishes, n_r echo points have been produced over the n_r range cells.
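A sequential Python sketch of this final accumulation is given below. It assumes that thread t first accumulates items t, t + 256, t + 512, ... into its shared-memory slot and that a halving-stride tree reduction follows; the exact access pattern of the invention's kernel is not given in the text, and kernel3_block_sum is a hypothetical name.

```python
BLOCK_SIZE = 256  # threads per block, as stated in the text


def kernel3_block_sum(values, block_size=BLOCK_SIZE):
    """Sum any number of complex values with one simulated thread block."""
    shared = [0j] * block_size  # unused slots stay 0 (zero padding)
    for t in range(block_size):
        idx = t
        # First load plus the while loop over the remaining items:
        # slot t accumulates items t, t + block_size, t + 2 * block_size, ...
        while idx < len(values):
            shared[t] += values[idx]
            idx += block_size
    stride = block_size // 2
    while stride > 0:
        # Tree reduction over the block_size slots.
        for t in range(stride):
            shared[t] += shared[t + stride]
        stride //= 2
    return shared[0]


# Example: 1000 partial sums for one range cell, more than one block holds.
print(kernel3_block_sum([complex(k, 1) for k in range(1000)]))
```

The same routine covers both cases in the text: with more than 256 items the while loop folds the excess into shared memory, and with fewer than 256 the untouched slots act as the zero padding.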

In this embodiment, the 1st, 2nd and 3rd kernel functions are executed in turn, each with its own thread grid, which implements in parallel on the GPU the process of transmitting one pulse and receiving its echo, that is, the superposition of the echoes of all scattering points within the beam.

Compared with traditional echo simulation algorithms, the invention only needs to compute, for each scattering point, the echo of the range cell in which it lies and perform multi-point interpolation, which greatly reduces the amount of computation. The process of obtaining the echo data of each scattering point at the l-th azimuth time comprises the following sub-steps:

(a) First, according to the working parameters of the synthetic aperture radar, judge whether the scattering point lies within the beam of the synthetic aperture radar. Specifically, compute the squint angle of the scattering point; if it is less than 1/2 of the field angle, the scattering point lies within the beam of the synthetic aperture radar; otherwise, it does not.

(b) Compute the instantaneous slant range r of each scattering point that lies within the beam of the synthetic aperture radar.

(c) Taking the first range cell of the synthetic aperture radar as the reference cell, determine the index ind of the range cell in which each scattering point within the beam lies.

(d) Determine the interpolation length 2l+1 of the sinc interpolation. 2l+1 is typically chosen between 40 and 100; the larger l is, the higher the interpolation precision, but the efficiency (speed) of computing each scattering point's echo data decreases accordingly.

(e) Judge whether the scattering point lies within the interpolation sampling cells: if l+1 ≤ ind ≤ n_r − l, the scattering point lies within the interpolation sampling cells; otherwise, it does not. When l+1 ≤ ind ≤ n_r − l, go to sub-step (f).

(f) Let m denote the index of an interpolation point in the (2l+1)-point interpolation, with −l ≤ m ≤ l. Compute the echo data of the m-th interpolation point of the scattering point. The echo data of the m-th interpolation point of the i-th scattering point is written s_i(ind+m); when −l ≤ m ≤ l and 1 ≤ ind+m ≤ n_r, s_i(ind+m) is:

s_i(ind + m) = \sigma_i \, \operatorname{sinc}\left( b \left( \frac{2 (r_0 + (ind + m) \cdot \delta r)}{c} - \frac{2 r(i)}{c} \right) \right) \exp\left( -j \frac{4\pi}{\lambda} r(i) \right)

Otherwise, s_i(ind+m) is 0. σ_i denotes the scattering coefficient of the i-th scattering point, c denotes the speed of light, and r(i) denotes the slant range of the i-th scattering point.

The n_r values of s_i(ind+m) are combined to form the echo data s_i of the i-th scattering point.
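The per-point computation of sub-steps (c) to (f) can be illustrated with a short Python sketch. Several details are assumptions rather than statements of the text: sinc is taken as the normalised form sin(πx)/(πx), the range-cell index is taken as ind = round((r − r_0)/δr), the beam test of sub-step (a) is omitted, and the names point_echo and half_len (standing for l in 2l+1) are hypothetical.

```python
import cmath
import math


def sinc(x):
    """Normalised sinc, sin(pi x) / (pi x); the normalisation is an assumption."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def point_echo(sigma, r, r0, delta_r, b, c, lam, n_r, half_len):
    """Length-n_r echo vector of one scattering point (range cells are 1-based)."""
    echo = [0j] * n_r
    ind = round((r - r0) / delta_r)  # assumed rule for the range-cell index
    if not (half_len + 1 <= ind <= n_r - half_len):
        return echo  # point lies outside the interpolation sampling cells
    phase = cmath.exp(-1j * 4 * math.pi * r / lam)
    for m in range(-half_len, half_len + 1):
        cell = ind + m
        delay_diff = 2 * (r0 + cell * delta_r) / c - 2 * r / c
        echo[cell - 1] = sigma * sinc(b * delay_diff) * phase
    return echo


# Illustrative parameters only.
e = point_echo(sigma=2.0, r=1050.0, r0=1000.0, delta_r=1.0,
               b=150e6, c=3e8, lam=0.03, n_r=128, half_len=4)
print(abs(e[49]))
```

In this example the point sits exactly on range cell 50, so the magnitude there equals the scattering coefficient, and only the 2·half_len + 1 cells around it are written.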

Sub-steps (a) to (f) above are all carried out by the 1st kernel function. The 1st kernel function not only judges whether each scattering point lies within the beam of the synthetic aperture radar and computes the slant range and echo data of the scattering point, but also partially accumulates the echo data. Each thread executes a two-level loop: it computes the echoes of n scattering points and accumulates them in global memory, and each scattering point requires a (2l+1)-point interpolation.

The 2nd and 3rd kernel functions obtain the echo data of each range cell by parallel reduction summation, that is, by a tree-structured accumulation of the output of the 1st kernel function. In the 2nd kernel function, the echo data of each range sampling point is accumulated by multiple thread blocks, giving multiple values. In the 3rd kernel function, one thread block collects the echo results of each range sampling point into a single value, giving the echoes of the n_r range sampling points. After the 1st, 2nd and 3rd kernel functions have executed in turn, the n_r echo points of one azimuth time are obtained.

The invention optimises these kernel functions with the following strategies. The thread-block size of the 1st, 2nd and 3rd kernel functions is set to 256, which improves the occupancy of thread blocks on the streaming multiprocessors (SMs); testing showed that program execution time is shortest when the thread-block size is 256. Because each thread of the 1st kernel function executes a two-level loop and the computation is relatively complex, register use is optimised first, to reduce unnecessary repeated computation and reads and writes of global memory, which improves the performance of the 1st kernel function. At the same time, a large number of register variables per thread in the 1st kernel function limits the number of thread blocks that can execute concurrently on an SM, so register variables in the 1st kernel function are reused as far as possible to avoid unnecessary register use. To improve instruction-level parallelism in the GPU-side program, the invention reads the real and imaginary parts of complex numbers from global memory with two independent instructions, which effectively improves program performance. Complex arithmetic on the GPU is performed by computing the real and imaginary parts separately rather than through inline functions, which improves performance further. After the while loop that implements the tree-structured parallel reduction is unrolled, the execution time of the 2nd and 3rd kernel functions is reduced further.

Step 5: the n_r accumulated sums obtained by the 3rd kernel function are taken as the echo data of the SAR observation scene at the l-th azimuth time. Let l take the values 1, 2, ..., n_a in turn and repeat step 4 to obtain the echo data of the SAR observation scene at all azimuth times, and write this echo data into the global memory of the GPU. The echo data of the SAR observation scene at all azimuth times consists of the echo data of the SAR observation scene at the 1st through n_a-th azimuth times. The global memory of the GPU therefore needs n_a × n_r complex elements to store the echo data of all azimuth times. The echo data can be written as:

s(\hat{t}, t_m; r_s) = w_r(\hat{t}) \, w_a(t_m) \exp\left( -j \frac{4\pi}{\lambda} r(t_m; r_s) \right)

where \hat{t} is the range fast time, t_m is the azimuth slow time, r_s is the slant range when the beam center line sweeps across the target, r(t_m; r_s) is the instantaneous slant range, w_r(·) is the window function of the transmitted signal, and w_a(·) is the azimuth window function. s(\hat{t}, t_m; r_s) denotes the echo data corresponding to the range fast time \hat{t}, the azimuth slow time t_m and the slant range r_s when the beam center line sweeps across the target.

Step 6: on the GPU side, apply a range Fourier transform (FFT) to the echo data of the SAR observation scene at all azimuth times to obtain the range-frequency-domain echo data of all azimuth times; multiply this data by the range frequency-domain matching term to obtain the matched echo data of all azimuth times; then apply a range inverse Fourier transform (IFFT) to the matched data to obtain the time-domain echo data of all azimuth times after range frequency-domain matching. This completes the generation of the echo data of a large scene. Because the echo data of the same azimuth sampling point is contiguous in global memory, the range Fourier transform of the echo data of all azimuth times amounts to an n_r-point range Fourier transform of the echo data of each azimuth time. The FFT library functions built into CUDA can perform a batched FFT over multiple rows in a single call.

After the range FFT, the multiplication of the echo data by the range frequency-domain matching term is carried out by calling a GPU kernel function. The range frequency-domain matching term is a range vector of n_r points; because the amount of computation is small, it is computed on the CPU and then copied to the GPU, where parallel threads multiply each sampling cell of the echo data by the corresponding element of the matching term along range (a complex multiplication).
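The transform, multiply and inverse-transform sequence of step 6 can be sketched in plain Python as follows. A naive O(n²) DFT stands in for the batched GPU FFT, the matching term is passed in by the caller because its expression is not given in the text, and the names dft and range_match are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform; a stand-in for a library FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * q / n)
               for k in range(n))
           for q in range(n)]
    return [v / n for v in out] if inverse else out


def range_match(echo_rows, match_term):
    """Apply the matching term to every azimuth row along range."""
    result = []
    for row in echo_rows:                 # one row per azimuth time
        spectrum = dft(row)               # range FFT
        matched = [s * h for s, h in zip(spectrum, match_term)]  # complex multiply
        result.append(dft(matched, inverse=True))                # range IFFT
    return result


# Illustrative check: a linear-phase term delays a unit pulse by one cell.
n = 4
h = [cmath.exp(-2j * cmath.pi * q / n) for q in range(n)]
print(range_match([[1, 0, 0, 0]], h)[0])
```

Each row is processed independently, which is why the rows of all azimuth times can be handed to a batched FFT in one call on the GPU.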

After the time-domain echo data of all azimuth times after range frequency-domain matching has been obtained, space is allocated for it in CPU-side memory, the data is copied from GPU global memory to the CPU side, and it is written to a file for storage.

The effect of the invention is further illustrated by the following simulation experiment.

In the simulation experiment, echo simulation is carried out with the following two methods: Method 1 is the invention, and Method 2 performs echo simulation using the CPU only. The software platform, CPU and GPU used in the simulation are shown in the table below:

Software platform	Visual Studio 2010, CUDA Toolkit v5.5
CPU	Intel X5650
GPU	Tesla C2070

Fig. 5a shows the magnitude map of the two-dimensional echo data of a scattering point obtained with the invention in the simulation experiment.

Fig. 5b shows the phase map of the two-dimensional echo data of a scattering point obtained with the invention in the simulation experiment. In Fig. 5a and Fig. 5b, the horizontal axis is the azimuth index of the sampling cell and the vertical axis is the range index of the sampling cell. As Fig. 5a and Fig. 5b show, the linear frequency-modulated signal has a positive frequency sweep and the echo phase takes a two-dimensional hyperbolic form, which is consistent with the actual situation.

Scenes containing 512*512 and 2000*3000 scattering points were simulated, with a squint angle of 50 degrees. Fig. 6a shows the original scene of the area-target simulation, Fig. 6b shows the two-dimensional magnitude map of the area-target echo obtained with the invention, and Fig. 6c shows the area-target imaging result obtained with the invention in the simulation experiment. In Fig. 6b, the horizontal axis is the azimuth index of the sampling cell and the vertical axis is the range index of the sampling cell. As Fig. 6c shows, the area-target imaging result of the invention is well focused. The simulation times and speed-up ratios are shown in the table below:

Number of scattering points in scene	CPU version time (s)	Proposed method time (s)	Speed-up ratio
512*512	55701	138	404
2000*3000	1127424	1599	705
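The speed-up ratios in the table follow from dividing the CPU time by the GPU time. The short Python check below recomputes them from the tabulated times; rounding to the nearest integer is an assumption about how the table was prepared, and the function name speedup is hypothetical.

```python
# Times in seconds, copied from the table: (CPU version, proposed method).
timings = {
    "512*512": (55701, 138),
    "2000*3000": (1127424, 1599),
}


def speedup(cpu_seconds, gpu_seconds):
    """Speed-up ratio, defined here as CPU time divided by GPU time."""
    return cpu_seconds / gpu_seconds


for scene, (cpu_s, gpu_s) in timings.items():
    print(scene, round(speedup(cpu_s, gpu_s)))
```

The recomputed values round to 404 and 705, matching the table.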

In summary, the CPU implementation of the method is a three-level loop. The outermost loop runs over the azimuth sampling points, and the two inner loops compute the echo produced by all scattering points for one pulse. The echo is then range-FFTed, multiplied by the range frequency-domain matching term and transformed back to the time domain by IFFT; this step is carried out in the outermost loop. The invention takes the two inner loops of the CPU version, that is, the computation of the echo received for one transmitted radar pulse, and implements them in parallel on the GPU: a large number of threads perform a reduction summation over the echoes produced by the individual scattering points, giving the echo of that azimuth time, which is written to global memory. The outermost loop over azimuth sampling points is executed on the CPU and calls the GPU kernel functions, yielding the echo data of all azimuth times.

Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to include them as well.
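The three-level loop of the CPU version can be outlined in Python as below. This is a structural sketch only: contribution is a hypothetical callback that returns the (range cell, value) pairs one scattering point adds at one azimuth time, range cells are 0-based here, and the range FFT and matching step is left out.

```python
def simulate_cpu(n_a, n_r, points, contribution):
    """CPU reference structure: azimuth loop, point loop, interpolation loop."""
    echo = [[0j] * n_r for _ in range(n_a)]
    for l in range(n_a):                      # outer loop: azimuth sampling points
        for point in points:                  # middle loop: scattering points
            for cell, value in contribution(l, point):  # inner loop: interpolation points
                echo[l][cell] += value        # superpose echoes per range cell
    return echo


# Toy callback: each point writes into its own cell and the next one.
def toy_contribution(l, point):
    return [(point, 1 + 0j), (point + 1, 0.5j)]


print(simulate_cpu(n_a=2, n_r=4, points=[0, 1, 1], contribution=toy_contribution))
```

On the GPU, the two inner loops are what the three kernel functions replace, while the outer loop over l stays on the CPU and launches the kernels once per azimuth time.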

Claims

1. A GPU-based synthetic aperture radar echo simulation method, characterised in that it comprises the following steps:

Step 1: on the CPU side, set the working parameters of the synthetic aperture radar; the working parameters of the synthetic aperture radar include the number of azimuth sampling points n_a;

Step 2: on the CPU side, set up the SAR observation scene, establish a coordinate system with the center of the SAR observation scene as the reference point, set the position of each SAR array element at each azimuth time, and set the coordinates and scattering coefficient of each scattering point in the SAR observation scene;

Step 4: on the GPU side, use the 1st kernel function, the 2nd kernel function and the 3rd kernel function to obtain the echo data of the SAR observation scene at the l-th azimuth time, l = 1, 2, ..., n_a;

the working parameters of the synthetic aperture radar further include the number of range sampling points n_r; in step 4, the process of obtaining the echo data of the SAR observation scene at the l-th azimuth time comprises the following sub-steps:

(4.1) on the GPU side, allocate n1 thread blocks to the 1st kernel function, n1 = ⌈point_num / (256·n)⌉, where ⌈·⌉ denotes rounding up, point_num is the number of all scattering points in the SAR observation scene at the l-th azimuth time, and n is a preset natural number; each thread block allocated to the 1st kernel function contains 256 threads;

Each thread of the 1st kernel function obtains the echo data of n scattering points at the l-th azimuth time; when computing the echo data of each scattering point at the l-th azimuth time, the echo data of the range cell in which the corresponding scattering point lies at the l-th azimuth time is computed and sinc-interpolated, and the data obtained after the k-th sinc interpolation of the echo data of that range cell is denoted the k-th sinc interpolation data of the corresponding scattering point, k taking 1 to n_r; the 1st sinc interpolation data through the n_r-th sinc interpolation data of the corresponding scattering point are combined into the echo data of the corresponding scattering point;

After each thread of the 1st kernel function has obtained the echo data of n scattering points at the l-th azimuth time, the echo data of these n scattering points are accumulated, giving the accumulation results of the 1st range cell through the n_r-th range cell of the echo data of the n scattering points at the l-th azimuth time; these n_r accumulation results together form the accumulation result of the echo data of the n scattering points at the l-th azimuth time; in the course of obtaining the accumulation results of the echo data of the n scattering points at the l-th azimuth time, each thread block of the 1st kernel function produces 256 × n_r complex elements;

(4.2) At the GPU end, allocate n2 thread blocks to the 2nd kernel function, n2 = n1 × n_r; each thread block allocated to the 2nd kernel function comprises 256 threads;

For the 256 × n_r complex elements produced by each thread block of the 1st kernel function, n_r thread blocks are called in the 2nd kernel function; for the 256 complex elements of the k-th range cell of the n scattering points at the l-th azimuth time produced by each thread block of the 1st kernel function, the k-th thread block among the corresponding n_r thread blocks of the 2nd kernel function performs a parallel reduction sum over those 256 complex elements, giving the accumulated sum of the k-th thread block among the corresponding n_r thread blocks of the 2nd kernel function at the l-th azimuth time;

(4.3) At the GPU end, allocate n_r thread blocks to the 3rd kernel function; each thread block allocated to the 3rd kernel function comprises 256 threads;

After the 2nd kernel function has been called to partially accumulate the echo data, the 3rd kernel function accumulates the output of the 2nd kernel function once more using shared memory; the k-th thread block of the 3rd kernel function reads from the global memory of the GPU the accumulation results of the k-th range cell at the l-th azimuth time produced by the 2nd kernel function, and then performs a parallel reduction sum over these accumulation results, giving the corresponding accumulated sum; the n_r accumulated sums of the l-th azimuth time produced by the 3rd kernel function are the echo data of the SAR observation scene at the l-th azimuth time;

Step 5: let l take 1, 2, ..., n_a in turn and repeat step 4, obtaining the echo data of the SAR observation scene at all azimuth times.

2. The GPU-based synthetic aperture radar echo simulation method according to claim 1, characterized in that, after step 5, at the GPU end, a range Fourier transform is applied to the echo data of the SAR observation scene at all azimuth times, giving the range-frequency-domain echo data of all azimuth times; the range-frequency-domain echo data of all azimuth times is multiplied by the range-frequency matching term, giving the echo data of all azimuth times after frequency matching; and a range inverse Fourier transform is applied to the echo data of all azimuth times after frequency matching, giving the time-domain echo data of all azimuth times after frequency matching.