CN105045763A

CN105045763A - FPGA (Field Programmable Gate Array) and multi-core DSP (Digital Signal Processor) based PD (Pulse Doppler) radar signal processing system and parallel implementation method therefor

Info

Publication number: CN105045763A
Application number: CN201510411844.XA
Authority: CN
Inventors: 王俊; 吕栋; 张玉玺; 杨彬; 尹晗
Original assignee: Beihang University
Current assignee: Hangzhou Leishi Technology Co ltd
Priority date: 2015-07-14
Filing date: 2015-07-14
Publication date: 2015-11-11
Anticipated expiration: 2035-07-14
Also published as: CN105045763B

Abstract

An FPGA (Field Programmable Gate Array) and multi-core DSP (Digital Signal Processor) based PD (Pulse Doppler) radar signal processing system comprises an FPGA core chip and its peripheral minimum system circuit, a DSP chip and its peripheral minimum system circuit, a gigabit network interface chip, a power supply chip, and a level conversion chip. The FPGA core chip receives the radar direct-wave and echo signals acquired by a data acquisition chip, down-converts them, and stores them in memory. Once a frame of data has been received, it is transferred to the DDR3 (Double Data Rate 3) memory of the DSP chip through the SRIO (Serial RapidIO) interface between the FPGA core chip and the DSP chip. The DSP chip performs pulse compression, coherent integration, and constant false alarm rate detection to obtain the target-point information, which is finally uploaded to a host computer through a network port. The parallel implementation method for the system comprises six steps. The hardware circuit of the system is simple, an FPGA plus multi-core DSP architecture is adopted for processing, and the parallel processing performance of the system is fully exploited.

Description

A PD radar signal processing system based on FPGA + multi-core DSP and a parallel implementation method thereof

Technical field

The present invention relates to a PD radar signal processing system based on FPGA + multi-core DSP and a parallel implementation method thereof. It implements radar signal processing on a multi-core DSP using an FPGA + multi-core DSP hardware platform, and belongs to the field of digital signal processing.

Background art

A Doppler radar is a radar that uses the Doppler effect to extract and process target information. If the radar transmits a pulse-modulated radio-frequency signal, it is called a pulse Doppler radar, or PD radar for short. To obtain a signal with a large time-bandwidth product and improve velocity and range resolution, the radar usually transmits a linear frequency modulated (LFM) signal; a PD radar based on LFM signals combines the characteristics of pulse Doppler processing and pulse compression. Coherent integration is also applied during signal processing to raise the signal-to-noise ratio for detection. In engineering practice, Doppler filter-bank filtering is commonly realized by performing an FFT on the data of the same range gate. The magnitude of the output is then fed to a constant false alarm rate (CFAR) detector, which decides whether a target is present in a range gate according to whether the cell under test exceeds the threshold. The radar thus detects targets by improving the target's signal-to-noise ratio.

Radar signal detection processing is mainly divided into modules such as pulse compression, coherent integration, and constant false alarm rate (CFAR) detection. Although these greatly help target echo detection, they also increase the computational load, for example through large-point FFTs, which substantially raises the real-time computing requirements on the processor. In addition, new radar techniques and applications make radars increasingly capable, but at the same time place higher demands on the radar signal processor.

With the rapid development of semiconductor and memory technology, very high speed integrated circuit (VHSIC) and very large scale integration (VLSI) technology have advanced considerably. TI has introduced multi-core DSP chips with a new processor architecture and substantially higher computing performance, which makes fast implementation of a wide range of algorithms possible.

To meet the processor performance demands described above, the inventors designed a PD radar signal processing system based on FPGA + multi-core DSP. The system adopts an FPGA + multi-core DSP architecture; besides the peripheral minimum system circuits required for the FPGA and the DSP to operate, it also includes two network interface chips. The radar signal processing is implemented by programming the FPGA and the multi-core DSP, and can meet the real-time requirements of complex radar signal processing.

Summary of the invention

1. Object: the object of the present invention is to provide a PD radar signal processing system based on FPGA + multi-core DSP and a parallel implementation method thereof, in which the PD radar signal processing system is implemented through a hardware description language, C programming, and multi-core DSP program design.

2. Technical solution: the object of the present invention is achieved through the following technical solution.

(1) The PD radar signal processing system based on FPGA + multi-core DSP of the present invention comprises an FPGA core chip and its peripheral minimum system circuit, a DSP chip and its peripheral minimum system circuit, a gigabit network interface chip, a power supply chip, and a level conversion chip. The system architecture is shown in Fig. 1. The connections between the components and the signal flow are as follows: the FPGA core chip receives the radar direct-wave and echo signals acquired by the data acquisition chip, down-converts them, and stores them in memory; after one frame of data has been received, the data is transferred to the DDR3 of the DSP chip through the SRIO interface between the FPGA core chip and the DSP chip; the DSP chip then performs pulse compression, coherent integration, and constant false alarm rate (CFAR) detection to obtain the target-point information; finally the target information is uploaded to the host computer through the network port.

The FPGA core chip is an XC6VSX315T from the Xilinx Virtex-6 family. It is built on a 40 nm process with the third-generation Xilinx ASMBL architecture, provides efficient dual-register 6-input LUT (look-up table) logic, abundant I/O resources, and a large amount of on-chip memory, and supports DDR3. Compared with the previous generation, power consumption is reduced by 50% and cost by 20%. In addition, the chip has strong signal processing capability and serial connectivity based on low-power GTX 6.5 Gbps transceivers, which ensures high-speed serial transmission between the FPGA core chip and the DSP chip. After receiving the data sampled by the data acquisition chip, the FPGA core chip performs digital down-conversion and stores the data in memory; once one frame of data has been obtained, it is transferred to the DSP chip through SRIO.

The peripheral minimum system circuit of the FPGA core chip comprises a clock source and a program-loading FLASH, which assist the FPGA core chip in completing its processing functions. Because the program in the FPGA core chip is lost automatically at power-off, the program code must be stored in a program-loading FLASH; each time the system is powered on, the program in the FLASH is automatically loaded into the FPGA core chip so that it can operate normally. The clock source provides the system clock for the FPGA core chip: a crystal oscillator generates the required frequency and feeds it directly to the FPGA core chip.

The DSP chip is the TMS320C6678 multi-core processor from TI. The chip uses an improved Harvard bus structure: one 256-bit program bus, two 64-bit data buses, and one 32-bit dedicated DMA bus. The processing unit adopts a high-performance, advanced very long instruction word (VLIW) structure and can execute eight 32-bit instructions in parallel per clock cycle. It is built from 8 DSP cores running at up to 1.25 GHz, achieving 320 GMAC fixed-point and 160 GFLOP floating-point performance on a single chip. Besides the 32 KB L1P and 32 KB L1D, which can be configured as cache, each core also has 512 KB of LL2 SRAM that can be configured as RAM or cache. In addition, there is 4 MB of multi-core shared memory, which can be used as shared L2 SRAM or shared L3 SRAM, and a built-in DDR3 controller with 33-bit addressing for 8 GB of storage space. The TMS320C6678 provides a rich set of peripheral interfaces; according to the task requirements, the signal processing here mainly uses the Serial RapidIO, PCIe, HyperLink, and DDR3 interfaces. In the present invention, SRIO is used to receive one frame of radar pulse-train signals transferred from the FPGA core chip; the multi-core tasks are designed and allocated, and the multi-core parallel implementation of pulse compression, coherent integration, and CFAR detection is scheduled; finally the target-point information is obtained and uploaded to the host computer through the network port, improving the performance of the computation.

The peripheral minimum system circuit of the DSP chip comprises a clock source, a program-loading FLASH, and external DDR3 memory, which assist the DSP chip in completing its processing functions. Because the program in the DSP chip is lost automatically at power-off, the program code must be stored in a program-loading FLASH; each time the system is powered on, the program in the FLASH is automatically loaded into the DSP chip so that it can operate normally. Because the DSP chip needs to buffer and process large amounts of data, its storage space must be extended externally. Four external DDR3 memory devices attached to the DSP chip store the raw data and the buffered intermediate results. The clock source provides the system clock for the DSP chip: a crystal oscillator generates the required frequency and feeds it directly to the DSP chip.

The gigabit network interface chip is the Marvell 88E1111 Ethernet physical layer chip. Under the control of the EMAC module of the DSP chip, it exchanges raw information data with the host computer over gigabit Ethernet.

The power supply chips provide the voltages required by the whole system. An isolated +5 V supply is fed into the system from outside, and the power supply chips convert it into +3.3 V, +2.5 V, +1.8 V, +1.5 V, +1.2 V, +1.0 V, +0.75 V, CMGT_AVTT, and CMGT_AVCC. These are supplied respectively to the FPGA core chip (+3.3 V, +2.5 V, +1.8 V, +1.0 V), the program-loading FLASH (+3.3 V, +1.8 V), the DSP chip (+3.3 V, +1.8 V, +1.0 V), the DDR3 modules (+1.5 V, +0.75 V), the gigabit network interface chip (+3.3 V, +1.2 V), and the clock source (+3.3 V). CMGT_AVTT and CMGT_AVCC supply +1.2 V and +1.0 V, respectively, to the high-speed interface of the FPGA core chip.

The level conversion chip is the SN74ALVC164245 from TI. This chip supports level conversion from +2.5 V to +3.3 V and from +3.3 V to +5 V.

(2) The construction of the PD radar signal processing system based on FPGA + multi-core DSP and its parallel implementation method can be summarized as follows. The FPGA core chip receives the intermediate-frequency data obtained by data acquisition, performs digital down-conversion to obtain baseband signal data, and buffers the data in on-chip RAM. After one frame of pulse-train data has been obtained, the data is transferred over the high-speed serial SRIO link between the FPGA and the DSP. The DSP chip receives the frame of down-converted baseband signal and stores it in DDR3, and a multi-core parallel scheme performs pulse compression on the frame. The pulse-compressed data is stored in a DDR3 buffer, and a multi-core scheme then performs parallel coherent integration and CFAR detection to obtain the target-point information. Finally, the target-point information is sent to the host computer through the network port.

In summary, the specific steps of the parallel implementation method for the PD radar signal processing system based on FPGA + multi-core DSP of the present invention are as follows:

Step 1: perform digital down-conversion on the intermediate-frequency signal in the FPGA core chip

This step is performed by the digital down-conversion module in the FPGA core chip, which consists of data acquisition, modulo-2 decimation logic, a delay-correction filter, and a dual-port RAM module. The module adopts a polyphase filter structure: the sampled IF data is decimated by two into odd and even branches and then delay-corrected to obtain complex baseband data. The data acquisition module takes the data sampled by the data acquisition chip as a single-ended input. The modulo-2 decimation logic splits the input data into I and Q channels; a flag bit is inverted on every rising clock edge, and the data is negated when the flag is 1. The delay-correction filtering is implemented by a 12th-order FIR filter whose coefficients are generated in Matlab. After filtering, the upper 16 bits of the I and Q channels are concatenated into 32-bit baseband data.
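The decimation and packing stages described above can be sketched in a few lines of Python. This is only an illustrative model, not the FPGA logic itself: the function names `mod2_decimate` and `pack_iq` are hypothetical, the sign convention of the Q branch and the placement of I in the upper half of the 32-bit word are assumptions, and the 12th-order delay-correction FIR filter is omitted because its coefficients are not given.

```python
def mod2_decimate(samples):
    """Split a real IF sample stream into I (even-indexed) and Q (odd-indexed)
    branches, negating every other I/Q pair as the toggling flag bit would."""
    i_branch, q_branch = [], []
    negate = False  # models the flag bit; toggled once per I/Q pair (assumption)
    for k in range(0, len(samples) - 1, 2):
        sign = -1 if negate else 1
        i_branch.append(sign * samples[k])
        q_branch.append(sign * samples[k + 1])
        negate = not negate
    return i_branch, q_branch


def pack_iq(i_val, q_val):
    """Concatenate two signed 16-bit values into one 32-bit word.
    Placing I in the upper half is an assumption made for this sketch."""
    return ((i_val & 0xFFFF) << 16) | (q_val & 0xFFFF)


i_data, q_data = mod2_decimate([1, 2, 3, 4, 5, 6, 7, 8])
words = [pack_iq(i, q) for i, q in zip(i_data, q_data)]
print(i_data, q_data, [hex(w) for w in words])
```

In hardware the same effect is obtained with a one-bit flag and a two's-complement negation, so no multipliers are needed for the mixing stage.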

Step 2: buffer the data in the FPGA core chip and configure SRIO in preparation for data transmission

The FPGA core chip and the DSP chip are interconnected by x4 SRIO with a per-lane rate of 5 Gbps; taking 8b/10b encoding into account, the effective bandwidth is up to 16 Gbps (2 GB/s). As shown in Fig. 3, the present invention uses the Serial RapidIO IP core provided by Xilinx and designs a local side and a remote side. The structure comprises local data processing, remote data processing, and the IP core. Local data processing is responsible for sending local data request packets and receiving the data response packets that the remote side sends to the local side. Remote data processing is responsible for receiving packets from the remote side. The main functions of the IP core are packing and unpacking, initialization, and protocol implementation.
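The effect of the line coding on the link rate can be checked with a short calculation: 8b/10b carries 8 payload bits in every 10 line bits. The helper name `srio_payload_rate_gbps` is hypothetical, and the figure ignores packet header and protocol overhead, so it is an upper bound rather than a measured throughput.

```python
def srio_payload_rate_gbps(lanes, lane_rate_gbps, data_bits=8, coded_bits=10):
    """Aggregate payload rate of a multi-lane serial link after line coding."""
    return lanes * lane_rate_gbps * data_bits / coded_bits


rate_gbit = srio_payload_rate_gbps(lanes=4, lane_rate_gbps=5.0)
print(rate_gbit, "Gbit/s =", rate_gbit / 8, "GByte/s")
```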

When the local side sends data to the remote side, the data is written into the transmit buffer, and a start signal is given to the transmit controller once writing is complete. According to the configured SRIO packet header information, including packet type, packet size, number of packets, destination address, and the ID of the other side, the local data processing controls the request generation module to build packets from the transmit buffer. These packets are processed by the IP core and transferred to the remote side. When the remote side receives the packets and sends response packets back to the local side, the IP core decodes the received serial bit stream into SRIO packets and passes them to the local data response processing module. The remote data processing controls the remote data request processing module to write the data in the packets into the receive buffer and, once writing is complete, sends a completion signal to the module that needs the data, which can then read the data from the receive buffer.

Step 3: configure the SRIO registers in the DSP chip, receive the data, and store it in DDR3

The SRIO module at the DSP chip end is shown in Fig. 4. In this module, the local device is the DSP chip and the remote device is the FPGA core chip. The SRIO module in the DSP chip consists mainly of a load/store module and the physical layer. Under the control of the CPU/EDMA, the load/store module sends VBUSM requests to the DDR3 memory and receives VBUSM responses. Within the load/store module, the MMR command registers control the transmit buffer and the receive buffer and are connected to the FIFOs of the physical layer.

In the DSP chip, SRIO is usually configured by calling CSL (Chip Support Library) functions, which cover enabling, initialization, opening, and establishing communication. The SRIO setup can be divided into 4 steps: address mapping; configuring the ID, SRIO port, and interrupt vector; configuring the registers, including transfer mode and rate; and waiting for the link. Once the link is established, the DSP chip can receive and send SRIO packets. The DSP chip and the FPGA core chip each need to know the other's destination ID and start address in order to transfer data correctly. DirectIO mode is selected for data transfer, so only the address mapping between the TX and RX sides is needed to carry out a transfer.

Step 4: implement the pulse compression algorithm on multiple cores in the DSP chip

This step is completed in the DSP chip and requires a multi-core task-parallel scheme for the pulse compression algorithm. The radar signal processing flow is shown in Fig. 5. Pulse compression is computed pulse by pulse, whereas coherent integration and CFAR detection are computed per range slice of the pulse train; coherent integration and CFAR detection can only start after every pulse in the train has been compressed. The overall flow is therefore divided into two subtasks: one completes the pulse compression of all pulses, and the other performs coherent integration and CFAR detection. Because there is little data dependence between pulses during pulse compression, and little data dependence among the coherent integration and CFAR detection computations, the multi-core implementation uses a master-slave model: one core is responsible for scheduling and distributing tasks, and the other cores process their tasks in parallel.

The configuration of the memory system must be considered in the multi-core design. Signal processing requires data accesses to memory, and access performance directly affects the efficiency of the algorithm. Memory access performance depends on where code and data are stored as well as on the access method. By configuring the size of each memory, the cache size, and the data access method, the transfer rates under different conditions can be obtained. Analysis shows that external memory is usually accessed via EDMA, while internal memory is accessed directly by the core or via IDMA. The core's access performance to internal memory is good, with wider data and instruction buses, so critical data and variables are placed in LL2; configuring L1D and L1P as cache improves instruction and data caching between the core and memory. In the memory system designed in the present invention, the raw data is placed in DDR3, and the LL2 of each core is configured as SRAM to hold the data used during pulse compression, coherent integration, and CFAR detection; L1D and L1P are configured as cache to improve the core's access efficiency. The memory design of the system is shown in Fig. 6.

In the radar signal processing flow, the FPGA core chip sends the data to DDR3 through the SRIO interface; the pulse train is then divided into 8 parts, and each core processes one part of the pulse data. In engineering practice, pulse compression is implemented in the frequency domain: the spectrum of the input signal is multiplied by the complex conjugate of the spectrum of the local reference signal, and the result is obtained by an inverse Fourier transform. The main steps are input, FFT, complex multiplication, IFFT, and output. In the input stage, the EDMA module transfers the pulse data from DDR3 to the LL2 of each core; a data processing buffer is configured in the LL2 of each core and the data is processed there; in the output stage, the EDMA module transfers the data to a buffer in DDR3. The EDMA transfer mode must be configured for both input and output. The twiddle factors required by the FFT and IFFT are computed at initialization and stored in the LL2 of each core; the conjugate spectrum of the reference signal needed for the complex multiplication is likewise stored in the LL2 of each core at initialization and is used directly during computation.
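The FFT, complex multiplication, and IFFT sequence can be illustrated with a minimal pure-Python sketch. It is a model of the arithmetic only, not of the DSP code: the function names are hypothetical, the recursive radix-2 FFT requires a power-of-two length, and the EDMA transfers between DDR3 and LL2 are not modeled. As in the text, the conjugate reference spectrum is computed once up front and reused for every pulse.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(x):
    """Inverse FFT with 1/N scaling."""
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]


def reference_spectrum_conj(reference, n):
    """Zero-pad the reference pulse to n samples and return its conjugate
    spectrum (done once, at initialization)."""
    padded = list(reference) + [0j] * (n - len(reference))
    return [v.conjugate() for v in fft(padded)]


def pulse_compress(echo, ref_conj):
    """Frequency-domain matched filter: FFT, complex multiply, IFFT."""
    spectrum = fft(echo)
    product = [s * r for s, r in zip(spectrum, ref_conj)]
    return ifft(product)


# Example: a 16-sample chirp placed at delay 10 inside a 64-sample echo.
ref = [cmath.exp(1j * cmath.pi * 0.05 * k * k) for k in range(16)]
echo = [0j] * 64
for k, v in enumerate(ref):
    echo[10 + k] = v
out = pulse_compress(echo, reference_spectrum_conj(ref, 64))
mags = [abs(v) for v in out]
print("peak at range cell", mags.index(max(mags)))
```

The output is the circular cross-correlation of the echo with the reference, so the peak lands on the delay of the echo.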

The cores in a multi-core system are relatively independent and need to communicate with each other to schedule and complete the work. Each core needs to be assigned its subtasks and their execution order. Because pulse compression is implemented in a master-slave model and each core completes its own data processing, no inter-core data sharing or transfer is required, so inter-core interrupts are used for communication. Configuring IPCGRx generates an interrupt to core x, and SRCS0 to SRCS27 set the interrupt source flags; here SRCSx is used as the interrupt flag of core x. Configuring SRCC0 to SRCC27 of IPCARx clears the corresponding flags.

The structure of the parallel implementation must be considered when arranging the flow, and here the DDR3 bus becomes the bottleneck: the DDR3 data bus can be configured to at most 64 bits and there is only one bus, so when every core accesses it at once each access takes a long time and data cannot be returned promptly. The design therefore staggers the data input of the cores: after one core finishes its input, it uses an inter-core interrupt to trigger the input of the next core. For output, the output of the previous round and the input of the next round are combined into a single block that serves as the input stage of the next round, which avoids contention for the DDR3 bus. After core 7 has finished all of its data processing, the next round of pulse-train processing is started, until the pulse-train processing tasks assigned to each core are complete. The multi-core flow chart of pulse compression is shown in Fig. 7.
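The staggered input scheme can be modeled on a host machine with threads standing in for cores and `threading.Event` objects standing in for inter-core interrupts. This is a behavioral sketch under those assumptions only: `run_staggered` is a hypothetical name, the per-core "processing" is a placeholder, and nothing here reflects the actual IPCGR register programming.

```python
import threading


def run_staggered(n_cores=8):
    """Each 'core' waits for its trigger, performs its (exclusive) input
    phase, triggers the next core, then processes independently."""
    input_order = []
    results = [None] * n_cores
    go = [threading.Event() for _ in range(n_cores + 1)]

    def core(core_id):
        go[core_id].wait()                    # wait for the "inter-core interrupt"
        input_order.append(core_id)           # input phase: sole user of the shared bus
        go[core_id + 1].set()                 # trigger the next core's input
        results[core_id] = core_id * core_id  # placeholder for the real processing

    threads = [threading.Thread(target=core, args=(c,)) for c in range(n_cores)]
    for t in threads:
        t.start()
    go[0].set()                               # the master kicks off core 0
    for t in threads:
        t.join()
    return input_order, results


order, results = run_staggered()
print(order)
```

Because each core records its input before releasing the next one, the input phases never overlap, while the processing phases are free to run concurrently.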

Step 5: implement coherent integration and CFAR detection on multiple cores in the DSP chip

After pulse compression is complete, the data resides in the DDR3 buffer of the DSP chip. The output of each compressed pulse is stored column-wise in DDR3, so that coherent integration and CFAR detection can then process the data row-wise. The output stage of pulse compression thereby performs the row-column transposition of the two-dimensional data.
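One way to picture this transposition-on-output is as a strided write into a flat buffer, so that all pulses for a given range cell end up contiguous. The sketch below assumes that layout; the function names and the index formula `range_index * n_pulses + pulse_index` are illustrative choices, not taken from the source.

```python
def store_compressed_pulse(buffer, pulse_index, pulse, n_pulses):
    """Write one compressed pulse with a stride of n_pulses so that each
    range cell's samples across pulses are stored contiguously."""
    for range_index, value in enumerate(pulse):
        buffer[range_index * n_pulses + pulse_index] = value


def range_slice(buffer, range_index, n_pulses):
    """Read back all pulses for one range cell as a contiguous block."""
    start = range_index * n_pulses
    return buffer[start:start + n_pulses]


n_pulses, n_range = 3, 4
pulses = [[p * 10 + r for r in range(n_range)] for p in range(n_pulses)]
buf = [None] * (n_pulses * n_range)
for p, pulse in enumerate(pulses):
    store_compressed_pulse(buf, p, pulse, n_pulses)
print(range_slice(buf, 2, n_pulses))
```

With this layout a range slice can be fetched in a single contiguous transfer, which suits block DMA.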

Coherent integration and CFAR detection are implemented in the frequency domain; each core performs range-slice data input, coherent integration, complex magnitude computation, and CFAR detection.

The range slices of the pulse train are transferred as data from DDR3 to LL2 via EDMA. In engineering practice, coherent integration is implemented with an FFT; its twiddle factors are likewise computed at initialization and stored in LL2, and the FFT is computed entirely within LL2. The complex data obtained after coherent integration must be converted to magnitude. The conventional approach of summing squares and taking the square root is time-consuming and requires care about data overflow, so a linear approximation is used to compute the complex magnitude:

|X| ≈ g(I, Q) = a·max{|I|, |Q|} + b·min{|I|, |Q|}

where a and b are weighting coefficients whose values depend on the required relative error. An equiripple approximation is used. Suitable parameters a and b are chosen so that the error is below 0.8%; the formula is as follows, where TL = max{|I|, |Q|} and TS = min{|I|, |Q|}:

|X| ≈ max{TL + (1/8)·TS, (27/32)·TL + (9/16)·TS}
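As a rough illustration, coherent integration of one range slice followed by the approximate magnitude can be written as below. The names are hypothetical, a direct O(N²) DFT is used in place of an FFT purely for brevity, and TL and TS are taken to be the larger and smaller of |I| and |Q|. The helper `max_relative_error` is included so that the deviation of the approximation from the true magnitude can be checked numerically for a given coefficient set.

```python
import cmath
import math


def doppler_dft(range_slice):
    """Coherent integration of one range slice: DFT across the pulses
    (direct O(N^2) form, used here only for clarity)."""
    n = len(range_slice)
    return [sum(range_slice[p] * cmath.exp(-2j * cmath.pi * k * p / n)
                for p in range(n))
            for k in range(n)]


def approx_magnitude(i_val, q_val):
    """Two-segment linear approximation of sqrt(I^2 + Q^2)."""
    t_l = max(abs(i_val), abs(q_val))  # larger component
    t_s = min(abs(i_val), abs(q_val))  # smaller component
    return max(t_l + t_s / 8, 27 * t_l / 32 + 9 * t_s / 16)


def max_relative_error(steps=3600):
    """Worst-case deviation from 1.0 over points on the unit circle."""
    worst = 0.0
    for s in range(steps):
        angle = 2 * math.pi * s / steps
        worst = max(worst,
                    abs(approx_magnitude(math.cos(angle), math.sin(angle)) - 1.0))
    return worst


# Example: 16 pulses from a target whose Doppler falls in bin 3.
slice_data = [cmath.exp(2j * cmath.pi * 3 * p / 16) for p in range(16)]
spectrum = doppler_dft(slice_data)
mags = [approx_magnitude(v.real, v.imag) for v in spectrum]
print("Doppler bin:", mags.index(max(mags)), "worst error:", max_relative_error())
```

The coefficients are all sums of powers of two, so on fixed-point hardware the approximation reduces to shifts, adds, and one comparison.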

CFAR detection uses the conventional cell-averaging CFAR (CA-CFAR) method, which in practical data processing is usually implemented as a sliding-window detector. In each sliding-window evaluation, guard cells are placed on both sides of the cell under test to prevent interference from a target that spans several cells. To decide whether a target is present in a given cell, the reference cells on the left and right are averaged and multiplied by a threshold factor to obtain the detection threshold, which is then compared with the cell under test to determine whether a target is present. In engineering implementations, CFAR detection usually consumes a large number of cycles. The TMS320C6678 supports software pipelining to achieve parallel execution, so the CFAR detection process can be optimized by scheduling assembly instructions. The procedure is as follows: first write the C code and rewrite it as linear assembly; determine the iteration cycle count from the code; then draw the dependency graph, that is, determine the functional unit used by each instruction (a single TMS320C6678 core has 8 functional units, and the bus supports 8 parallel instructions per cycle). Finally, according to the dependency graph, assign the register file for each instruction and schedule the instructions, taking into account the loop prolog, loop kernel, and loop epilog of the pipeline as well as instruction delays and lifetimes, to obtain the parallel instruction schedule.
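The sliding-window CA-CFAR test itself is compact. The following sketch assumes a one-dimensional list of magnitudes, equal numbers of reference and guard cells on each side, and simply skips cells too close to the edges; the function name and parameter names are hypothetical, and no attempt is made to mirror the software-pipelined assembly version.

```python
def ca_cfar(power, n_ref, n_guard, threshold_factor):
    """Return indices of cells that exceed the cell-averaging CFAR threshold.

    n_ref:   reference cells on each side of the cell under test
    n_guard: guard cells on each side of the cell under test
    Cells without a full window on both sides are not tested (assumption).
    """
    detections = []
    half = n_ref + n_guard
    for cut in range(half, len(power) - half):
        left = power[cut - half: cut - n_guard]
        right = power[cut + n_guard + 1: cut + half + 1]
        noise = (sum(left) + sum(right)) / (2 * n_ref)
        if power[cut] > threshold_factor * noise:
            detections.append(cut)
    return detections


# Example: flat noise floor with one strong cell at index 20.
power = [1.0] * 40
power[20] = 30.0
print(ca_cfar(power, n_ref=8, n_guard=2, threshold_factor=5.0))
```

A production implementation would usually update the window sums incrementally instead of recomputing them for every cell.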

The DDR3 bus contention problem must also be considered when coherent integration and CFAR detection are implemented on multiple cores, so the accesses are staggered here as well. Each core then completes coherent integration and CFAR detection for its share of the range slices and obtains target-point information; finally, a centroid computation over the target-point information from the 8 cores yields the target result. The multi-core flow chart of coherent integration and CFAR detection is shown in Fig. 8.
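A final merging step of this kind might look like the sketch below, which takes the amplitude-weighted centroid of the detected points. The tuple layout (range index, Doppler index, amplitude), the amplitude weighting, and the function name are all assumptions made for illustration; the source does not specify how the target points are represented or combined.

```python
def centroid(points):
    """Amplitude-weighted centroid of (range_index, doppler_index, amplitude)
    tuples gathered from all cores (hypothetical data layout)."""
    total = sum(a for _, _, a in points)
    range_c = sum(r * a for r, _, a in points) / total
    doppler_c = sum(d * a for _, d, a in points) / total
    return range_c, doppler_c


# Example: two adjacent range cells reported for the same target.
print(centroid([(10, 4, 1.0), (11, 4, 3.0)]))
```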

Step 6: send the target information to the host computer through the network port

After coherent integration and CFAR detection, the target-point information is obtained and transferred to the host computer through the network port of the C6678. The C6678 contains a network coprocessor (NETCP), which mainly handles Ethernet packets. It consists of four parts: the PKTDMA controller, which controls packet DMA transfers; the packet accelerator (PA), for packet identification and classification; the security accelerator (SA), for packet encryption and decryption; and the gigabit Ethernet switch subsystem. Together they carry out fast packet exchange between the DSP chip and external devices. The structure of the network coprocessor is shown in Fig. 9.

When the DSP chip sends a packet to an external device, the data enters the network coprocessor through DMA and is encrypted in the SA; it then passes through the data stream switch into the PA, where the MAC header, IP header, and UDP/TCP header are added according to the preset descriptor. The data then passes through the data stream switch into the gigabit Ethernet switch subsystem, where it is identified, packed, and sent to the external device from the designated port.

(3) Advantages and effects: the advantages of the PD radar signal processing system based on FPGA + multi-core DSP and its parallel implementation method are that the hardware circuit is simple and compact; and the processing structure adopts an FPGA + multi-core DSP architecture in which digital down-conversion is implemented in the FPGA core chip while pulse compression, coherent integration, and CFAR detection are performed in the DSP chip, making full use of the parallel processing performance of the system.

Brief description of the drawings

Fig. 1 System architecture diagram.

Fig. 2 Schematic diagram of digital down-conversion in the FPGA.

Fig. 3 Structure of the SRIO module in the FPGA.

Fig. 4 Structure of the SRIO module in the DSP.

Fig. 5 Radar signal processing flow chart.

Fig. 6 Memory system design of the multi-core DSP.

Fig. 7 Flow chart of pulse compression on the multi-core DSP.

Fig. 8 Flow chart of coherent integration and CFAR detection on the multi-core DSP.

Fig. 9 Structure of the network coprocessor.

Fig. 10 Flow chart of radar signal processing on the multi-core DSP.

The symbols in the figures are explained as follows:

Fig. 1: SRIO stands for Serial RapidIO, a high-speed serial bus; PCIe stands for Peripheral Component Interconnect Express, a high-speed bus and interface; EMIF stands for external memory interface; DDR3 stands for double data rate 3 SDRAM, a synchronous dynamic RAM.

Fig. 2: AD_data denotes the 14-bit input data; AD_data_I and AD_data_Q denote the I and Q channel data obtained after decimation; AD_data_BB denotes the 32-bit data formed by taking 16 bits from each of the filtered I and Q channels and concatenating them.

Fig. 4: FIFO stands for first in, first out, a first-in first-out queue; VBUSM refers to a master on the VBUS; LSU is the name of a register in the DSP; MMR stands for memory-mapped register, a kind of DSP memory.

Fig. 5: LFM denotes the linear frequency modulated signal; FFT denotes the fast Fourier transform; IFFT denotes the inverse fast Fourier transform.

Fig. 6: CORE0 and CORE7 denote core 0 and core 7; L1D/32KB CACHE and L1P/32KB CACHE denote the level-1 data and program memories configured as 32 KB cache; LL2/512KB SRAM denotes the local level-2 memory configured as 512 KB of RAM; CACHE indicates memory that can be configured as cache; EDMA denotes enhanced direct memory access.

Fig. 7, Fig. 8: IPCGR0 to IPCGR7 denote generation of the inter-core interrupts for cores 0 to 7.

Fig. 9: PKTDMA denotes packet DMA transfer; SA denotes the security accelerator; PA denotes the packet accelerator; GbE Switch SubSystem denotes the gigabit Ethernet switch subsystem; SGMII denotes the Serial Gigabit Media Independent Interface.

Fig. 10: 0x90000000 and similar values are logical addresses in the DSP; IPC0 to IPC7 denote the interrupts of cores 0 to 7; CFAR denotes constant false alarm rate detection.

Detailed description of the embodiments

In accordance with the summary of the invention and with reference to the accompanying drawings, the PD radar signal processing system based on FPGA + DSP of the present invention and its multi-core implementation method are described in detail below:

The present invention implements a PD radar signal processing hardware system based on FPGA + DSP through a hardware description language, C programming, and multi-core DSP programming.

(1) The invention provides a PD radar signal processing system based on FPGA + multi-core DSP, comprising an FPGA core chip and its peripheral minimum system circuit, a DSP chip and its peripheral minimum system circuit, a gigabit network interface chip, a power supply chip, and a level conversion chip. The system architecture is shown in Fig. 1. The connections between the components and the signal flow are as follows: the FPGA receives the radar direct-wave and echo signals acquired by the data acquisition chip, down-converts them, and stores them in memory; after one frame of data has been received, the data is transferred to the DDR3 of the DSP chip through the SRIO interface between the FPGA core chip and the DSP chip; the DSP chip then performs pulse compression, coherent integration, and constant false alarm rate (CFAR) detection to obtain the target-point information; finally the target information is uploaded to the host computer through the network port.

The FPGA core chip is an XC6VSX315T from the Xilinx Virtex-6 family. It is built on a 40 nm process with the third-generation Xilinx ASMBL architecture, provides efficient dual-register 6-input LUT (look-up table) logic, abundant I/O resources, and a large amount of on-chip memory, and supports DDR3. Compared with the previous generation, power consumption is reduced by 50% and cost by 20%. In addition, the chip has strong signal processing capability and serial connectivity based on low-power GTX 6.5 Gbps transceivers, which ensures high-speed serial transmission between the FPGA and the DSP. After receiving the data sampled by the data acquisition chip, the FPGA core chip performs digital down-conversion and stores the data in memory; once one frame of data has been obtained, it is transferred to the DSP chip through SRIO. Fig. 2 is a schematic diagram of digital down-conversion in the FPGA.

The minimum system peripheral circuit of the FPGA chip comprises a clock source and a program-loading FLASH, which assist the FPGA core chip in completing its processing functions. Because the program in the FPGA is lost automatically at power-off, the program code must be stored in a program-loading FLASH; at each power-on, the program in the FLASH is loaded automatically into the FPGA core chip so that it works normally. The clock source provides the system clock for the FPGA core chip: the crystal oscillator generates the required frequency and sends it directly to the FPGA core chip.

The DSP chip is the TMS320C6678 multi-core processor from TI. The chip uses an improved Harvard bus structure: one 256-bit program bus, two 64-bit data buses, and one 32-bit dedicated DMA bus. The processing unit uses a high-performance, advanced very long instruction word (VLIW) architecture and can execute eight 32-bit instructions in parallel per clock cycle. The device is built from eight DSP cores running at up to 1.25 GHz, delivering 320 GMAC of fixed-point and 160 GFLOP of floating-point performance on a single chip. Besides 32 KB each of L1P and L1D, which can be configured as cache, each core has 512 KB of local L2 (LL2) SRAM that can be configured as RAM or cache. There is also 4 MB of multi-core shared memory, usable as shared L2 SRAM or shared L3 SRAM, and a built-in DDR3 controller with 33-bit addressing for 8 GB of memory space. The TMS320C6678 provides abundant peripheral interfaces; according to the task requirements, serial RapidIO, PCIe, HyperLink, and DDR3 are the interfaces mainly used for signal processing. In the present invention, one frame of radar pulse-train signal transferred from the FPGA is received over SRIO, the multi-core tasks are designed and distributed, and the multi-core parallel implementation of pulse compression, coherent accumulation, and CFAR detection is arranged. Finally, the target point information is obtained and uploaded to the host computer through the network port, improving the performance of the computation.

The minimum system peripheral circuit of the DSP chip comprises a clock source, a program-loading FLASH, and external DDR3 memory, which assist the DSP chip in completing its processing functions. Because the program in the DSP chip is lost automatically at power-off, the program code must be stored in a program-loading FLASH; at each power-on, the program in the FLASH is loaded automatically into the DSP chip so that it works normally. Because the DSP chip needs to buffer and process a large amount of data, its storage space must be extended externally. Four external DDR3 memory devices are attached to the DSP chip and store the raw data, the buffered intermediate results, and similar data. The clock source provides the system clock for the DSP core chip: the crystal oscillator generates the required frequency and sends it directly to the DSP chip.

The power supply chips provide the voltages required by the whole system. An isolated +5 V supply is input to the system from outside, and the power supply chips convert it into +3.3 V, +2.5 V, +1.8 V, +1.5 V, +1.2 V, +1.0 V, +0.75 V, CMGT_AVTT, and CMGT_AVCC. These are supplied to the FPGA core chip (+3.3 V, +2.5 V, +1.8 V, +1.0 V), the program-loading FLASH (+3.3 V, +1.8 V), the DSP core chip (+3.3 V, +1.8 V, +1.0 V), the DDR3 modules (+1.5 V, +0.75 V), the gigabit network interface chip (+3.3 V, +1.2 V), and the clock source (+3.3 V). CMGT_AVTT and CMGT_AVCC supply +1.2 V and +1.0 V, respectively, to the FPGA high-speed interface.

(2) The present invention is a PD radar signal processing system based on an FPGA and a multi-core DSP, together with its parallel implementation method. The construction process is summarized as follows: the FPGA core chip receives the intermediate-frequency data obtained by data acquisition, performs digital down-conversion to obtain baseband signal data, and buffers the data in on-chip RAM; after one frame of pulse-train data has been obtained, the data are transferred over the high-speed serial SRIO link between the FPGA core chip and the DSP chip; the DSP chip stores the down-converted baseband frame in DDR3, and a multi-core parallel scheme performs the pulse compression of the frame; the pulse-compressed data are stored in a DDR3 buffer, and a multi-core scheme then performs parallel coherent accumulation and CFAR detection to obtain the target point information; finally, the target point information is sent to the host computer through the network port.

Step 3: Configure the SRIO registers in the DSP chip, receive the data, and store it in DDR3

Step 4: Implement the multi-core pulse compression algorithm in the DSP chip

$$|X| \approx g(I,Q) = a\,\max\{|I|,|Q|\} + b\,\min\{|I|,|Q|\}$$

$$|X| \approx \max\left\{T_L + \tfrac{1}{8}T_S,\ \tfrac{27}{32}T_L + \tfrac{9}{16}T_S\right\}$$
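The approximation above can be tried out with a minimal sketch. It assumes that T_L and T_S denote the larger and the smaller of |I| and |Q| (this section does not define them), and the function names `approx_magnitude` and `worst_relative_error` are hypothetical. The weights 1/8, 27/32, and 9/16 are sums of powers of two, so a fixed-point version needs only shifts and additions. The second helper measures the worst-case relative error numerically instead of assuming a figure.

```python
import math

def approx_magnitude(i, q):
    """Linear approximation of sqrt(i*i + q*q).

    Assumption: T_L is the larger and T_S the smaller of |i| and |q|.
    """
    t_l = max(abs(i), abs(q))
    t_s = min(abs(i), abs(q))
    return max(t_l + t_s / 8, 27 * t_l / 32 + 9 * t_s / 16)

def worst_relative_error(steps=1000):
    """Sweep unit-magnitude inputs over one quadrant and return the largest error."""
    worst = 0.0
    for n in range(steps + 1):
        theta = (math.pi / 2) * n / steps
        i, q = math.cos(theta), math.sin(theta)
        worst = max(worst, abs(approx_magnitude(i, q) - 1.0))
    return worst

print(approx_magnitude(3, 4), math.hypot(3, 4))
print(worst_relative_error())
```

Comparing the printed error with the accuracy the application requires is a quick way to decide whether these particular weights are adequate.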

DDR3 bus contention must also be considered in the multi-core implementation of coherent accumulation and CFAR detection, so the memory accesses of the cores are again staggered. Each core then completes coherent accumulation and CFAR detection for its own share of the range slices and obtains target point information. Finally, the target point information from the 8 cores is evaluated and a centroid calculation is performed to obtain the target information. The flow chart of the multi-core implementation of coherent accumulation and CFAR detection is shown in Fig. 8.
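The final centroid step can be illustrated with a small sketch. The text does not specify the exact form of the centroid calculation, so an amplitude-weighted centroid over range bin and Doppler bin is assumed here; the tuple layout and the name `target_centroid` are hypothetical.

```python
def target_centroid(points):
    """Amplitude-weighted centroid of detections.

    points: list of (range_bin, doppler_bin, amplitude) tuples, for example
    the detections gathered from all cores (assumed layout).
    Returns (range_centroid, doppler_centroid), or None without detections.
    """
    total = sum(amp for _, _, amp in points)
    if total == 0:
        return None
    rng = sum(r * amp for r, _, amp in points) / total
    dop = sum(d * amp for _, d, amp in points) / total
    return rng, dop

# Two adjacent detections of one target; the stronger cell pulls the centroid.
print(target_centroid([(10, 3, 1.0), (11, 3, 3.0)]))
```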

Step 6: Send the target information to the host computer through the network port

Finally, the flow chart of the multi-core implementation of pulse compression, coherent accumulation, and CFAR detection in the DSP chip is shown in Fig. 10.

The main devices of the hardware circuit of the PD radar signal processing system based on FPGA+DSP and its multi-core implementation method are:

Selection of the FPGA core chip:

The Virtex-6 XC6VSX315T from Xilinx is selected.

Virtex-6 is a new generation of FPGA core chips released by Xilinx. The family offers a well-balanced combination of flexibility, hard IP cores, transceiver capability, and development tool support, and can provide a complete solution for applications in communications, networking, and digital processing.

The Virtex-6 XC6VSX315T is a member of the Virtex-6 family. Its main features are:

1) 49,200 slices;

2) 12 MMCM (Mixed-Mode Clock Manager) modules;

3) 25,344 Kbit of RAM;

4) 720 general-purpose I/O pins;

5) 24 GTX modules;

6) 2 PCIe interface modules.

In addition, Xilinx provides a powerful development platform (ISE), with which the developer completes the whole design.

Selection of the program-loading FLASH chip:

The XCF16P from Xilinx is selected.

The XCF16P has a capacity of 16 Mbit, which is sufficient for power-on program loading of a variety of Xilinx FPGA core chips.

Selection of the DSP chip:

The TMS320C6678 from TI is selected.

The TMS320C6678 uses an improved Harvard bus structure: one 256-bit program bus, two 64-bit data buses, and one 32-bit dedicated DMA bus. Its main features are as follows:

1) The processing unit uses a high-performance, advanced very long instruction word (VLIW) architecture and can execute eight 32-bit instructions in parallel per clock cycle;

2) The TMS320C6678 is built from eight DSP cores running at up to 1.25 GHz, and a single device delivers 320 GMAC of fixed-point and 160 GFLOP of floating-point performance.

3) The TMS320C6678 integrates large on-chip memory. Besides 32 KB each of L1P and L1D, which can be configured as cache, each core has 512 KB of L2 that can be configured as RAM or cache. There is also 4 MB of multi-core shared memory, which can be used as shared L2 SRAM or shared L3 SRAM.

4) The TMS320C6678 provides abundant peripheral interfaces, of which this system mainly uses SRIO and DDR3. SRIO is used for data communication between the DSP chip and the FPGA core chip, and DDR3 is used for the external storage of the DSP.

In addition, Texas Instruments provides an integrated development environment for the DSP chip (CCS5), in which the developer completes all design and debugging.

Selection of the power supply chips:

The power supply chips are the LTM4616 and the LTM4627 from Linear Technology.

The key characteristics of the LTM4616 are as follows:

1) input voltage range 2.7 V to 5.5 V;

2) output voltage range 0.6 V to 5 V;

3) overcurrent and overtemperature protection;

4) output overvoltage protection;

5) 15 mm × 15 mm × 2.82 mm LGA package and 15 mm × 15 mm × 3.42 mm BGA package.

The key characteristics of the LTM4627 are as follows:

1) input voltage range 4.5 V to 20 V;

2) output voltage range 0.6 V to 5 V;

3) differential remote-sense amplifier for precise voltage regulation;

4) output overvoltage protection;

5) 15 mm × 15 mm × 4.32 mm LGA package and 15 mm × 15 mm × 4.92 mm BGA package.

Gigabit network interface chip:

The gigabit network interface chip is the 88E1111 from Marvell. The chip fully supports the IEEE 802.3 protocol family, has a built-in 1.25 GHz serializer/deserializer (SERDES) suitable for gigabit fiber applications, is manufactured in a standard digital CMOS process, and provides auto-negotiation and ultra-low-power modes. It supports the Gigabit Media Independent Interface (GMII), the Reduced GMII (RGMII), and the Serial Gigabit Media Independent Interface (SGMII).

Level conversion chip:

The level conversion chip is the SN74ALVC164245 from TI. The chip belongs to TI's Widebus family and supports level conversion between +2.5 V and +3.3 V and between +3.3 V and +5 V, for asynchronous communication between data buses.

System implementation results

The design is programmed in the VHDL hardware description language and fixed-point C, and the resulting modules are downloaded to the Xilinx Virtex-6 XC6VSX315T and the TMS320C6678. In the experiment, one frame of simulated pulse-train data is fed into the FPGA module as input data, and the results are observed with ChipScope Pro (the logic analyzer bundled with the Xilinx ISE software) and a PC.

The resources used in the FPGA core chip are as follows:

Table 1. System resource usage of the FPGA core chip

In the multi-core DSP chip, the pulse compression algorithm for 200 pulses was implemented and compared with a single-core implementation. The measured clock cycles consumed are as follows:

Table 2. Clock cycles consumed by single-core and multi-core pulse compression

In the multi-core DSP chip, the coherent accumulation and CFAR detection algorithms for 200 pulses were implemented and compared with a single-core implementation. The measured clock cycles consumed are as follows:

Table 3. Clock cycles consumed by single-core and multi-core coherent accumulation and CFAR detection

The PD radar signal processing system based on an FPGA and a multi-core DSP, and its parallel implementation method, realize PD radar signal detection with VHDL, fixed-point C, and assembly code. Actual tests verified the multi-core parallel optimization and demonstrated the feasibility of the multi-core optimized design. The system has the following features:

The hardware circuit is simple and compact, providing exploration and a basis for future system integration.

The processing structure uses an FPGA + multi-core DSP architecture: digital down-conversion is implemented in the FPGA core chip, while pulse compression, coherent accumulation, and CFAR detection are performed in the DSP chip, making full use of the parallel processing capability of the system.

Tests confirm the multi-core optimization of the radar signal detection flow, which improves performance.

Thus the PD radar signal processing system based on an FPGA and a multi-core DSP has high practical value, improves real-time performance in practical applications, and has good application prospects.

Claims

1. A PD radar signal processing system based on an FPGA and a multi-core DSP, characterized in that: it comprises an FPGA core chip and its peripheral minimum system circuit, a DSP chip and its peripheral minimum system circuit, a gigabit network interface chip, a power supply chip, and a level conversion chip; the FPGA core chip receives the radar direct-wave and echo signals collected by the data acquisition chip, down-converts them, and stores the result in memory; once one frame of data has been received, it is transferred to the DDR3 memory of the DSP chip through the SRIO interface between the FPGA core chip and the DSP chip; the DSP chip then performs pulse compression, coherent accumulation, and constant false alarm rate detection to obtain the target point information; finally, the target information is uploaded to the host computer through the network port;

The FPGA core chip is the XC6VSX315T, which has efficient dual-register 6-input LUT logic, abundant I/O resources, a large amount of on-chip memory, and DDR3 support; in addition, the chip has strong signal processing capability and serial connectivity based on low-power GTX 6.5 Gbps transceivers, which guarantees high-speed serial transmission between the FPGA core chip and the DSP chip; after receiving the data sampled by the data acquisition chip, the FPGA core chip performs digital down-conversion, stores the data in memory, and, once one frame of data has been obtained, transfers it to the DSP chip via SRIO;

The peripheral minimum system circuit of the FPGA core chip comprises a clock source and a program-loading FLASH, which assist the FPGA core chip in completing its processing functions; because the program in the FPGA core chip is lost automatically at power-off, the program code must be stored in a program-loading FLASH, and at each power-on the program in the FLASH is loaded automatically into the FPGA core chip so that it works normally; the clock source provides the system clock for the FPGA core chip, the required frequency being generated by the crystal oscillator and sent directly to the FPGA core chip;

The DSP chip is the TMS320C6678 multi-core processor, which uses an improved Harvard bus structure: one 256-bit program bus, two 64-bit data buses, and one 32-bit dedicated DMA bus; the processing unit uses a high-performance, advanced very long instruction word architecture and can execute eight 32-bit instructions in parallel per clock cycle; the device is built from eight DSP cores running at up to 1.25 GHz, delivering 320 GMAC of fixed-point and 160 GFLOP of floating-point performance on a single chip; besides 32 KB each of L1P and L1D configured as cache, each core has 512 KB of LL2 SRAM configured as RAM or cache; there is also 4 MB of multi-core shared memory, usable as shared L2 SRAM or shared L3 SRAM, and a built-in DDR3 controller with 33-bit addressing for 8 GB of memory space; the TMS320C6678 provides abundant peripheral interfaces, of which, according to the task requirements, the serial RapidIO, PCIe, HyperLink, and DDR3 interfaces are mainly used for signal processing; one frame of radar pulse-train signal transferred from the FPGA core chip is received over SRIO, the multi-core tasks are designed and distributed, and the multi-core parallel implementation of pulse compression, coherent accumulation, and CFAR detection is arranged; finally, the target point information is obtained and uploaded to the host computer through the network port, improving the performance of the computation;

The peripheral minimum system circuit of the DSP chip comprises a clock source, a program-loading FLASH, and external DDR3 memory, which assist the DSP chip in completing its processing functions; because the program in the DSP chip is lost automatically at power-off, the program code must be stored in a program-loading FLASH, and at each power-on the program in the FLASH is loaded automatically into the DSP chip so that it works normally; because the DSP chip needs to buffer and process a large amount of data, its storage space must be extended externally, and four external DDR3 memory devices are attached to the DSP chip to store the raw data and the buffered intermediate results; the clock source provides the system clock for the DSP chip, the required frequency being generated by the crystal oscillator and sent directly to the DSP chip;

The gigabit network interface chip is the 88E1111 Ethernet physical layer chip, which, under the control of the EMAC module of the DSP chip, exchanges raw information data with the host computer over gigabit Ethernet;

The power supply chip provides the voltages required by the whole system; an isolated +5 V supply is input to the system from outside, and the power supply chip converts it into +3.3 V, +2.5 V, +1.8 V, +1.5 V, +1.2 V, +1.0 V, +0.75 V, CMGT_AVTT, and CMGT_AVCC, which are supplied to the FPGA core chip (+3.3 V, +2.5 V, +1.8 V, +1.0 V), the program-loading FLASH (+3.3 V, +1.8 V), the DSP chip (+3.3 V, +1.8 V, +1.0 V), the DDR3 modules (+1.5 V, +0.75 V), the gigabit network interface chip (+3.3 V, +1.2 V), and the clock source (+3.3 V), where CMGT_AVTT and CMGT_AVCC supply +1.2 V and +1.0 V, respectively, to the high-speed interface of the FPGA core chip;

The level conversion chip is the SN74ALVC164245, which supports level conversion from +2.5 V to +3.3 V and from +3.3 V to +5 V.

2. A parallel implementation method for the PD radar signal processing system based on an FPGA and a multi-core DSP, the specific steps of the method being as follows:

This step is performed by the digital down-conversion module in the FPGA core chip. The module consists of data acquisition, modulo-2 extraction logic, a delay-correction filter, and a dual-port RAM module. It uses a polyphase filter structure: the sampled intermediate-frequency (IF) data are decimated by two into even and odd streams and, after delay correction, down-converted into complex baseband data; the data acquisition module takes the data sampled by the data acquisition chip as a single-ended input; the modulo-2 extraction logic splits the input data into the I and Q channels, inverts a flag bit at each rising clock edge, and negates the data when the flag bit is 1; the delay-correction filtering is implemented by a 12th-order FIR filter whose coefficients are generated in Matlab; after filtering, the upper 16 bits of the I and Q channels are concatenated into 32-bit baseband data;
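The data path of this step can be sketched in software as follows. This is an illustration under stated assumptions, not the FPGA design: the flag is assumed to toggle once per extracted I/Q pair, I is assumed to occupy the upper half of the 32-bit word, and the helper names (`mod2_extract`, `fir`, `high16`, `pack_iq`) are hypothetical. The real 12th-order filter coefficients come from Matlab and are not given in the text, so `fir` takes them as a parameter.

```python
def mod2_extract(samples):
    """Split real IF samples into I (even) and Q (odd) streams.

    Assumption: the sign flag toggles once per I/Q pair, so successive
    pairs are multiplied by +1, -1, +1, ...
    """
    i_out, q_out = [], []
    flag = 0
    for k in range(0, len(samples) - 1, 2):
        sign = -1 if flag else 1
        i_out.append(sign * samples[k])
        q_out.append(sign * samples[k + 1])
        flag ^= 1
    return i_out, q_out

def fir(x, coeffs):
    """Direct-form FIR filter with zero initial state."""
    out = []
    for n in range(len(x)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * x[n - k]
        out.append(acc)
    return out

def high16(value, width=32):
    """Keep the upper 16 bits of a signed value of the given bit width."""
    return value >> (width - 16)

def pack_iq(i_val, q_val):
    """Pack two signed 16-bit values into one 32-bit word (I in the upper half)."""
    return ((i_val & 0xFFFF) << 16) | (q_val & 0xFFFF)

i_stream, q_stream = mod2_extract([1, 2, 3, 4, 5, 6, 7, 8])
print(i_stream, q_stream)
print(hex(pack_iq(high16(0x12345678), high16(-65536))))
```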

An x4 SRIO link interconnects the FPGA core chip and the DSP chip. Each lane runs at 5 Gbps; allowing for 8b/10b encoding, the effective bandwidth is up to 2 GB/s (16 Gbps). The serial RapidIO IP core provided by Xilinx is used, and a local side and a remote side are designed. The design comprises local data processing, remote data processing, and the IP core; local data processing sends local data request packets and receives the response packets that the remote side sends to the local side; remote data processing receives the data packets coming from the remote side; the main functions of the IP core are packing and unpacking, initialization, and protocol implementation;
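As a quick check of the link budget, the payload rate follows from the lane count, the lane rate, and the 8b/10b coding efficiency. The function name and its parameters are hypothetical, and packet header overhead is ignored.

```python
def srio_payload_rate_gbps(lanes=4, lane_rate_gbps=5.0, coding_efficiency=8 / 10):
    """Raw payload rate of an SRIO link after 8b/10b coding, in Gbit/s."""
    return lanes * lane_rate_gbps * coding_efficiency

rate_gbps = srio_payload_rate_gbps()
rate_gbytes = rate_gbps / 8  # 8 bits per byte
print(rate_gbps, "Gbit/s =", rate_gbytes, "GB/s")
```

With four lanes at 5 Gbps this gives 16 Gbit/s, that is 2 GB/s, before protocol overhead.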

When the local side sends data to the remote side, the data are written to the transmit buffer and, once writing is complete, a start signal is given to the transmit controller; according to the configured SRIO header information, which includes the packet type, packet size, number of packets, destination address, and destination ID, the local data processing side controls the request generation module to produce packets from the transmit buffer; these packets are processed by the IP core and transmitted to the remote side; when the remote side has received a packet and sends a response packet back to the local side, the IP core recovers the SRIO packet from the received serial bit stream and passes it to the local data response processing module; remote data processing controls the remote data request processing module to write the data in the packet to the receive buffer and, once writing is complete, sends a completion signal to the module that needs the data, which then reads the data from the receive buffer;

Step 3: Configure the SRIO registers in the DSP chip, receive the data, and store it in DDR3

For the SRIO module on the DSP side, the local device is the DSP chip and the remote device is the FPGA core chip; the SRIO module in the DSP chip consists mainly of the load/store unit and the physical layer; under the control of the CPU/EDMA, the load/store unit sends VBUSM requests to the DDR3 memory and accepts VBUSM responses; within the load/store unit, the MMR command registers control the transmit buffer and the receive buffer, which are connected to the FIFO of the physical layer;

In the DSP chip, SRIO is usually configured by calling functions of the chip support library (CSL), including the enable, initialization, open, and link setup functions; the SRIO setup is divided into 4 steps: address mapping; configuring the ID, SRIO port, and interrupt vector; configuring the registers, including the transmission mode and rate; and waiting for the link to come up; once the link is established, the DSP chip can receive and send SRIO packets, and the destination ID and start address of the other side must be known for data to be transferred correctly between the DSP chip and the FPGA core chip; the DirectIO mode is selected for data transfer, in which only the address mapping between the TX and RX sides is needed to complete the transfer;

Step 4: Implement the multi-core pulse compression algorithm in the DSP chip

This step is performed in the DSP chip and requires a multi-core task-parallel scheme for the data processing of the pulse compression algorithm; pulse compression is computed pulse by pulse, whereas coherent accumulation and CFAR detection are computed per range slice of the pulse train; coherent accumulation and CFAR detection can start only after the whole pulse train has been pulse-compressed, so the flow is divided into two subtasks: the first completes the pulse compression of all pulses, and the second performs coherent accumulation and CFAR detection; because the data of different pulses have little dependence on each other during pulse compression, and the data processed in the coherent accumulation and CFAR detection computations likewise have little dependence on each other, the multi-core implementation uses a master-slave model, in which one core is responsible for scheduling and distributing the tasks and the other cores execute the tasks in parallel;
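The distribution of pulses to cores can be sketched as a simple contiguous partition. The function name `partition_pulses` is hypothetical, and the sketch only shows how a master could compute the index ranges; it does not model the inter-core signalling.

```python
def partition_pulses(n_pulses, n_cores):
    """Return one (start, stop) index range per core.

    Pulses are split into contiguous blocks; when the division is not
    exact, the first cores each take one extra pulse.
    """
    base, extra = divmod(n_pulses, n_cores)
    ranges = []
    start = 0
    for core in range(n_cores):
        count = base + (1 if core < extra else 0)
        ranges.append((start, start + count))
        start += count
    return ranges

# 200 pulses over 8 cores gives 25 pulses per core.
for core, (start, stop) in enumerate(partition_pulses(200, 8)):
    print("core", core, "pulses", start, "to", stop - 1)
```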

The configuration of the memory system must be considered in the multi-core design; signal processing requires data accesses to memory, and access performance directly affects the efficiency of the algorithm; memory access performance depends on where code and data are stored and on the access mode; by configuring the size of each memory, the cache size, and the data access mode, the transfer rates under different conditions are obtained; analysis shows that external memory is usually accessed by EDMA, while internal memory is accessed directly by the core or by IDMA; the core's access to internal memory performs well, with wider data and instruction buses, so key data and variables are placed in LL2; L1D and L1P are configured as cache to speed up instruction and data accesses between the core and memory; in the designed memory system, raw data are placed in DDR3, the LL2 of each core is configured as SRAM to hold the data used during pulse compression, coherent accumulation, and CFAR detection, and L1D and L1P are configured as cache to improve the core's access efficiency; in the radar signal processing flow, the FPGA core chip sends data to DDR3 through the SRIO interface, the pulse train is then divided into 8 parts, and each core processes one part of the pulse data; in engineering practice, pulse compression is implemented in the frequency domain, that is, the spectrum of the input signal is multiplied by the complex conjugate of the spectrum of the local reference signal, and an inverse Fourier transform gives the result; the main steps are input, FFT, complex multiplication, IFFT, and output; in the input stage, the EDMA module transfers the pulse data from DDR3 to the LL2 of each core, a data-processing buffer is configured in each core's LL2, and the data are processed there; in the output stage, the EDMA module transfers the data to a buffer in DDR3; the EDMA transfer mode must be configured for both input and output; the twiddle factors required by the FFT and IFFT are computed at initialization and stored in the LL2 of each core, as is the conjugate spectrum of the reference signal required for the complex multiplication, so that both are used directly during computation;
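The FFT, complex multiplication, and IFFT chain can be illustrated with a short sketch. It uses a plain recursive radix-2 FFT rather than an optimized DSP library routine, a Barker-13 code as a stand-in reference waveform, and hypothetical names (`fft`, `ifft`, `pulse_compress`). As in the text, the conjugate reference spectrum is computed once up front and reused for every pulse.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def ifft(spectrum):
    """Inverse FFT with 1/N scaling."""
    n = len(spectrum)
    return [v / n for v in fft(spectrum, inverse=True)]

def pulse_compress(echo, ref_spectrum_conj):
    """Frequency-domain matched filter: IFFT(FFT(echo) * conj(FFT(ref)))."""
    echo_spectrum = fft(echo)
    return ifft([e * r for e, r in zip(echo_spectrum, ref_spectrum_conj)])

# Stand-in reference: Barker-13 code zero-padded to 32 samples (assumption).
barker13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
n_fft = 32
ref = [complex(v) for v in barker13] + [0j] * (n_fft - len(barker13))
ref_spectrum_conj = [v.conjugate() for v in fft(ref)]  # computed once

delay = 5
echo = ref[-delay:] + ref[:-delay]  # reference delayed by 5 samples
compressed = pulse_compress(echo, ref_spectrum_conj)
magnitudes = [abs(v) for v in compressed]
print("peak at sample", magnitudes.index(max(magnitudes)))
```

The compressed output peaks at the sample index equal to the echo delay, which is what the later range-slice processing relies on.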

The cores of a multi-core system are relatively independent and must communicate with each other to schedule the tasks; each core must be assigned its subtask and its order of execution; because pulse compression uses the master-slave model and each core completes its own data processing with no inter-core data sharing or transfer, inter-core interrupts are used for communication; configuring IPCGRx generates an interrupt to core x, and bits SRCS0~27 set the interrupt source flags, with SRCSx used here to denote the interrupt flag of core x; configuring bits SRCC0~27 of IPCARx clears the corresponding flags;

The parallel structure must be considered when arranging the flow, but in this design the DDR3 bus becomes the bottleneck: the DDR3 data bus can be configured to at most 64 bits and there is only one bus, so the cores must queue when accessing it and data cannot be returned in time; the data inputs of the cores are therefore staggered in the design, and after one core finishes its input, an inter-core interrupt triggers the input of the next core; for output, the output of the previous round and the input of the next round are combined into one module that serves as the input stage of the next round, which avoids contention for the DDR3 bus; after core 7 has finished its data processing, the next round of pulse processing is carried out, until each core has completed the pulse processing task assigned to it;
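The staggering can be illustrated with a toy timing model. It assumes a single shared bus, a fixed transfer time per core and round (the previous round's output combined with the next round's input), and a fixed compute time; the name `staggered_schedule` and all parameters are hypothetical and the numbers are not measurements.

```python
def staggered_schedule(n_cores, n_rounds, t_io, t_compute):
    """Return (round, core, bus_start, bus_end) slots on one shared bus.

    A core's transfer starts once the bus is free (the previous core has
    finished its transfer) and the core has finished its previous compute.
    """
    bus_free = 0.0
    core_ready = [0.0] * n_cores
    slots = []
    for rnd in range(n_rounds):
        for core in range(n_cores):
            start = max(bus_free, core_ready[core])
            end = start + t_io
            slots.append((rnd, core, start, end))
            bus_free = end                      # next core may start its input
            core_ready[core] = end + t_compute  # core computes out of LL2
    return slots

for slot in staggered_schedule(n_cores=8, n_rounds=2, t_io=1.0, t_compute=10.0):
    print(slot)
```

In this model no two transfers overlap on the bus, and while one core is transferring the others are computing, which is the effect the staggering aims for.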

After pulse compression is complete, the data reside in the DDR3 buffer of the DSP chip; the output of each compressed pulse is stored in DDR3 by column, so that coherent accumulation and CFAR detection can process the data by row, and the output stage of pulse compression thus completes the row-column transposition of the two-dimensional data; coherent accumulation and CFAR detection are implemented in the frequency domain, and each core performs the data input, coherent accumulation, complex modulus calculation, and CFAR detection for its range slices;
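The transposition and the per-range-slice coherent accumulation can be sketched as follows. A naive DFT stands in for the FFT used on the DSP, and the names `corner_turn`, `dft`, and `coherent_accumulate` are hypothetical.

```python
import cmath

def corner_turn(pulse_rows):
    """Transpose data indexed [pulse][range_bin] into [range_bin][pulse]."""
    return [list(column) for column in zip(*pulse_rows)]

def dft(x):
    """Naive discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def coherent_accumulate(pulse_rows):
    """Transform each range slice across pulses to get Doppler bins."""
    return [dft(range_slice) for range_slice in corner_turn(pulse_rows)]

# Synthetic target in range bin 2 with a Doppler shift of 3 bins (8 pulses).
n_pulses, n_range = 8, 4
data = [[cmath.exp(2j * cmath.pi * 3 * p / n_pulses) if r == 2 else 0j
         for r in range(n_range)]
        for p in range(n_pulses)]
range_doppler = coherent_accumulate(data)
magnitudes = [abs(v) for v in range_doppler[2]]
print("Doppler peak in bin", magnitudes.index(max(magnitudes)))
```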

The range slices of the pulse train are transferred as data from DDR3 to LL2 by EDMA; in engineering practice coherent accumulation is implemented by FFT, whose twiddle factors are likewise computed at initialization and stored in LL2, so that the FFT is computed entirely in LL2; the complex data obtained after coherent accumulation require a modulus calculation to obtain the amplitude; the conventional square root of the sum of squares is time-consuming and requires attention to data overflow, so the complex modulus is computed by a linear approximation;

$$|X| \approx g(I,Q) = a\,\max\{|I|,|Q|\} + b\,\min\{|I|,|Q|\}$$

where a and b are weighting coefficients whose values depend on the required relative error; using the equiripple approximation method and choosing suitable parameters a and b keeps the error below 0.8%, with the following formula:

$$|X| \approx \max\left\{T_L + \tfrac{1}{8}T_S,\ \tfrac{27}{32}T_L + \tfrac{9}{16}T_S\right\}$$

CFAR detection uses the conventional cell-averaging CFAR (CA-CFAR) method, which in practical data processing is usually implemented as a sliding-window detector; in each sliding-window evaluation, guard cells are placed on both sides of the cell under test to prevent a target that spans several cells from causing interference; to decide whether a cell contains a target, the reference cells on the left and right are averaged and multiplied by a threshold factor to obtain the detection threshold, which is then compared with the cell under test to decide whether a target is present; in an engineering implementation, CFAR detection usually consumes a large number of cycles, and since the TMS320C6678 supports software pipelining for parallel operation, the CFAR detection process is optimized by scheduling assembly instructions; the specific procedure is as follows: first the C code is written and then rewritten as linear assembly code, and the cycle count of one iteration is determined from the code; next the dependency graph is drawn, that is, the functional unit used by each instruction is determined, a single TMS320C6678 core having 8 functional units and the bus supporting 8 parallel instructions per cycle; finally, according to the dependency graph, the register file of each instruction is determined and the instructions are scheduled, taking into account the loop prolog, loop kernel, and loop epilog of the pipeline as well as instruction latency and live ranges, to obtain the final parallel instruction schedule;
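The sliding-window decision rule can be sketched in a few lines. This is a plain reference version for checking results, not the software-pipelined assembly; the name `ca_cfar` and its parameters are hypothetical, and cells near the edges simply use whatever reference cells exist.

```python
def ca_cfar(power, n_ref, n_guard, threshold_factor):
    """Cell-averaging CFAR over a 1-D list of power (or amplitude) values.

    n_ref: reference cells on each side; n_guard: guard cells on each side.
    Returns the indices of the cells declared as targets.
    """
    detections = []
    for cut in range(len(power)):  # cut = cell under test
        left = power[max(0, cut - n_guard - n_ref):max(0, cut - n_guard)]
        right = power[cut + n_guard + 1:cut + n_guard + 1 + n_ref]
        reference = left + right
        if not reference:
            continue
        threshold = threshold_factor * sum(reference) / len(reference)
        if power[cut] > threshold:
            detections.append(cut)
    return detections

# Flat noise floor with one strong cell at index 16.
cells = [1.0] * 32
cells[16] = 50.0
print(ca_cfar(cells, n_ref=4, n_guard=1, threshold_factor=5.0))
```

The threshold factor sets the false alarm rate; the value 5.0 here is only a placeholder for the example.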

DDR3 bus contention must also be considered in the multi-core implementation of coherent accumulation and CFAR detection, so the memory accesses of the cores are again staggered; each core then completes coherent accumulation and CFAR detection for its own share of the range slices and obtains target point information; finally, the target point information from the 8 cores is evaluated and a centroid calculation is performed to obtain the target information;

Step 6: Send the target information to the host computer through the network port

After the target point information has been obtained by coherent accumulation and CFAR detection, it is transferred to the host computer through the network port of the C6678; the C6678 contains a network coprocessor (NETCP) for processing Ethernet packets; this controller consists of four parts: the PKTDMA controller, which controls the DMA transfer of packets; the packet accelerator (PA), which identifies and classifies packets; the security accelerator (SA), which encrypts and decrypts packets; and the gigabit Ethernet switch subsystem; together they complete the fast exchange of packets between the DSP chip and external devices;

When the DSP chip sends a packet to an external device, the data enter the network coprocessor via DMA and are encrypted in the SA; they then pass through the packet streaming switch into the PA, where the MAC header, IP header, and UDP/TCP header are added according to the preset descriptor; they then pass through the packet streaming switch into the gigabit Ethernet switch subsystem, where the data are identified, packed, and sent to the external device from the designated port.