Summary of the invention
To improve the computational efficiency and precision of hardware implementations of the particle filter algorithm, the present invention proposes an FPGA-based hardware implementation method for the sample importance resampling particle filter, in which each module of the particle filter algorithm is designed on an FPGA. It thereby provides a novel approach to the efficient computation and hardware implementation of complex particle filter algorithms in engineering applications.
In the FPGA-based hardware implementation method of the sample importance resampling particle filter (Sample Importance Resampling Particle Filter, SIRF) of the present invention, the particle filter comprises a particle generation module, a particle update module, a resampling module and an output generation module, wherein:
(1) the particle generation module receives the input vector, generates particles, and outputs them to the particle update module and to the resampling module;
(2) the particle update module updates the particles generated in step (1), that is, it performs weight calculation and weight normalization, and outputs the result to the resampling module;
(3) the resampling module performs resampling and state update on the updated particles of step (2) or the particles generated in step (1), and feeds the result back to the particle generation module of step (1);
(4) the output generation module generates the output data from the updated particles of step (2) or the particles generated in step (1).
All inputs and outputs of the particle generation module of step (1) are M-dimensional vectors (M=4), and its buffer controllers all use the same parameters.
All input and output data of the resampling module are M-dimensional.
The particle update module of step (2) is divided into three processing modules, PU1, PU2 and PU3:
the PU1 processing module receives the input from the particle generation module and delivers the M-dimensional intermediate data t_PU1 to the PU2 processing module;
the PU2 processing module receives the M-dimensional intermediate data t_PU1 from the PU1 processing module together with the external observation input z(n), performs the weight calculation to form the output stream t_PU2, and at the same time generates the accumulated weight value sum;
the PU3 processing module receives the output stream t_PU2 and the accumulated weight value sum from the PU2 processing module, performs weight normalization, stores the normalized weights w in its output buffer, and outputs them to the resampling module and the particle generation module.
The normalized output of the output generation module of step (4) is:
where u_x is the output variable of the output generation module, sum is the sum of the weights, and t_PU2 is the output of the PU2 module.
The hardware design method proposed by the present invention can be extended to the dynamic reconfiguration of different particle filters. For each particle filter, the operation of each processing module is defined first, then the data flow architecture is defined, and finally the buffer controllers and the global controller are designed.
The most important part is the data center, which is responsible for the large volume of data transferred between processing modules. The whole filter uses a module-level pipeline design, which greatly simplifies the design flow. The module-level pipeline achieves synchronous execution through distributed controllers, which control data generation and transmission for each processing module.
The main beneficial effect of the present invention is that it gives a detailed procedure for designing particle filter hardware using the module-level pipeline approach. At the same time, loop fusion is used to eliminate the separate weight normalization step of the particle filter algorithm, which yields a parallel, distributed architecture for the sampling, weight calculation and resampling processes. This design method shortens the execution time of the filter and reduces the complexity of the particle filter algorithm.
Embodiment
The particle filter algorithm has two distinctive execution characteristics: (1) it can be expressed as a data flow diagram whose nodes (modules) can execute concurrently; although the modules differ in complexity, the data flow diagram clearly represents the data dependences between them; (2) each module in the data flow diagram processes one group of data per cycle.
To implement the particle filter in hardware, the present invention adopts a module-level pipeline design method. The particle filter is divided into a particle generation module, a particle update module, a resampling module and an output generation module, and the modules execute in parallel, which significantly improves the running efficiency of the algorithm. To make full use of the buffer controllers, the processing modules are designed according to the following three requirements. (1) The processing modules are designed so that control-signal dependences between them are eliminated, leaving only data dependences. If the control signals of any two processing modules depend on each other, the dependence is expressed through timing data. If a dependence between control signals is completely unavoidable, the control signal is treated as data and the control is realized through a buffer controller. (2) The size of each processing module is chosen so that the rates at which data are produced and consumed are fixed, and the quantities of data produced and consumed are consistent. (3) There is only one global clock, and the clock signals of all processing modules are derived from it.
The present invention adopts distributed controllers and uses the module-level pipeline design method to design a sample importance resampling particle filter (SIRF). Two-dimensional bearings-only target tracking is taken as the application. The unknown state to be estimated is the position and velocity of the tracked object in the Cartesian coordinate system, X_n = [x, V_x, y, V_y]^T, where x and y are the position coordinates of the target and V_x and V_y are the velocity components in the x and y directions, respectively. The whole particle filter is composed of several processing modules. Each processing module carries out complex arithmetic operations and has a local controller for controlling its operation. Distributed control handles the data dependences between the modules efficiently.
The operation of each processing module is defined first, and then the data flow architecture is defined. Buffer controllers and a global controller are designed at the same time; the global controller controls data generation and transmission for each processing module. With distributed control, the module-level pipeline executes synchronously, and the module-level pipeline design of the whole filter greatly simplifies the design flow.
The present invention designs each processing module of the SIRF, including the particle generation module (PG), the particle update module (PU), the mean calculation/output generation module (MC/OG) and the resampling module (RS). Fig. 1 shows the module data flow diagram of the SIRF.
(1) Module design
Particle generation module (PG): the main function of this module is to generate particles. The PG processing module has four buffers connected to the input vector and four buffers connected to the output vector (x, V_x, y, V_y). The output of the PG module is used by the resampling (RS) module. In addition, the two buffers holding (x, y) are used by the particle update module PU1. All inputs and outputs are M-dimensional vectors, and the buffer controllers all use the same parameters. The particle generation module computes all of its outputs in parallel.
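The text does not state the state-transition model used inside PG. As a minimal sketch, assuming a constant-velocity motion model with sampling period T and Gaussian acceleration noise (both are assumptions, as are the names `generate_particles`, `T` and `sigma`), one particle generation step over the state (x, V_x, y, V_y) could look like this:

```python
import random


def generate_particles(prev, T=1.0, sigma=0.1, rng=None):
    """Propagate every particle (x, vx, y, vy) by one time step.

    Assumed model (not taken from the source): constant velocity over a
    sampling period T, driven by zero-mean Gaussian acceleration noise
    with standard deviation sigma on each axis.
    """
    rng = rng or random.Random(0)
    out = []
    for x, vx, y, vy in prev:
        ax = rng.gauss(0.0, sigma)  # random acceleration along x
        ay = rng.gauss(0.0, sigma)  # random acceleration along y
        out.append((
            x + T * vx + 0.5 * T * T * ax,
            vx + T * ax,
            y + T * vy + 0.5 * T * T * ay,
            vy + T * ay,
        ))
    return out
```

The four components of each particle are computed independently of one another, which is what allows a hardware PG module to produce all four output streams in parallel.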
Particle update module (PU): the main function of the particle update module is weight calculation and weight normalization. The arithmetic operations performed by this module are multiplication, classification, division, the inverse trigonometric function arctan() and the exponential function exp(). Operators such as arctan() and exp() are implemented with the coordinate rotation digital computer (CORDIC) method. According to the size of the arithmetic units, the particle update operation is divided functionally into three processing modules: PU1, PU2 and PU3.
The PU1 processing module has two input buffers that receive (x, y) from the PG processing module and one buffer that delivers the output t_PU1 to the PU2 processing module. PU1 computes arctan(y/x) and generates the M-dimensional intermediate data t_PU1. For the arctan() operation, a constant π/2 and a multiplexer are used to adjust the angle so that (x, y) and (x, -y) can be distinguished. Because there is no data dependence between the PG and PU1 processing modules, PU1 computes its output as soon as its input buffers receive data.
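The arctan() computation in PU1 can be illustrated with a vectoring-mode CORDIC loop. The sketch below uses floating-point arithmetic for readability, whereas a hardware version would use fixed-point shifts and adds; the function names `cordic_atan` and `bearing`, the iteration count of 24, and the specific pre-rotation scheme are assumptions, chosen as one common way to combine a π/2 constant with a selection step.

```python
import math

# Elementary rotation angles atan(2^-i), stored in a lookup table
ATAN_TABLE = [math.atan(2.0 ** -i) for i in range(24)]


def cordic_atan(y, x, iterations=24):
    """Vectoring-mode CORDIC: angle of the vector (x, y), for x >= 0.

    Each iteration rotates the vector toward the x axis by atan(2^-i)
    and accumulates the angle that was rotated away.
    """
    angle = 0.0
    for i in range(iterations):
        shift = 2.0 ** -i  # a right shift by i bits in fixed-point hardware
        if y >= 0:
            x, y, angle = x + y * shift, y - x * shift, angle + ATAN_TABLE[i]
        else:
            x, y, angle = x - y * shift, y + x * shift, angle - ATAN_TABLE[i]
    return angle


def bearing(y, x):
    """Four-quadrant angle using a pi/2 pre-rotation and a selection step.

    The result is undefined for x == y == 0.
    """
    if x >= 0:
        return cordic_atan(y, x)
    if y >= 0:
        # rotate the vector by -90 degrees, then add pi/2 back
        return math.pi / 2 + cordic_atan(-x, y)
    # rotate the vector by +90 degrees, then subtract pi/2
    return -math.pi / 2 + cordic_atan(x, -y)
```

The `if` chain in `bearing` plays the role of the multiplexer: it selects which pre-rotated operands and which constant offset are applied, so the iterative core only ever sees a vector in the right half-plane.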
The PU2 module has two input buffers, one for t_PU1 from the PU1 processing module and one for the external observation input z(n). The value of z(n) is constant during the n-th iteration. PU2 has two output buffers, which deliver t_PU2 and sum to the PU3 processing module. The main function of the PU2 module is weight calculation; this module does not normalize the weights but emits them as the output stream t_PU2. At the end of the weight calculation, the accumulated weights give sum, which is passed to the PU3 processing module for weight normalization. The PU3 processing module has two input buffers from the PU2 processing module (t_PU2 and sum). PU3 normalizes each unnormalized t_PU2 by sum and stores the normalized weights w in its output buffer, for use by the RS processing module and by the PG processing module that generates the particles.
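The split between PU2 (unnormalized weights plus a running sum) and PU3 (normalization) can be sketched as follows. The Gaussian bearing likelihood, the noise parameter `sigma` and the function names are assumptions; the text only states that PU2 computes weights from t_PU1 and z(n) and accumulates sum.

```python
import math


def pu2_weights(t_pu1, z, sigma=0.05):
    """Compute unnormalized weights and their sum in a single pass.

    Assumed likelihood (not from the source): Gaussian in the bearing
    error between the observation z and each particle's predicted bearing.
    """
    t_pu2 = []
    total = 0.0
    for predicted in t_pu1:
        err = z - predicted
        err = math.atan2(math.sin(err), math.cos(err))  # wrap to [-pi, pi]
        w = math.exp(-0.5 * (err / sigma) ** 2)
        t_pu2.append(w)  # streamed out as t_PU2
        total += w       # accumulated alongside, available after the last weight
    return t_pu2, total


def pu3_normalize(t_pu2, total):
    """Divide every unnormalized weight by the accumulated sum."""
    return [w / total for w in t_pu2]
```

Because `total` is only complete once the last weight has been produced, normalization cannot start until the whole t_PU2 stream has passed through PU2, which is why the two steps sit in separate pipeline stages.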
Resampling module (RS): the RS module performs the resampling process and the state update calculation, so a separate state update module is not needed. The RS module has five input buffers: four buffers holding (x, V_x, y, V_y) from the PG processing module and one buffer holding the normalized weights (w) from the PU3 processing module. The resampled output is stored in four output buffers. All input and output data of the RS module are M-dimensional.
The RS processing module replicates particles with larger weights and eliminates particles with smaller weights. It does so by reading each weight and copying the corresponding particle according to that weight. Because all weights are normalized and sum to 1, the number of particles is the same after resampling as before. The whole resampling process needs at least M clock cycles, because in the worst case all of the weights read before the last one may be zero, so that valid particles are produced only after M cycles. The output of the RS processing module is therefore available only after M cycles, and the PG and OG processing modules must wait M cycles before reading valid data.
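The replicate-and-discard behaviour described above can be sketched with systematic resampling. The text does not name the resampling scheme, so the choice of systematic resampling, the fixed offset `u0` and the function name `resample` are assumptions.

```python
def resample(particles, weights, u0=0.5):
    """Systematic resampling over normalized weights (they must sum to 1).

    Weights are read one at a time; a particle is copied once for every
    threshold (j + u0) / M that its cumulative weight passes, so heavy
    particles are replicated and light ones are dropped. u0 lies in [0, 1).
    """
    M = len(particles)
    out = []
    cumulative = 0.0
    j = 0  # number of particles emitted so far
    for p, w in zip(particles, weights):
        cumulative += w
        while j < M and (j + u0) / M < cumulative:
            out.append(p)
            j += 1
    # Guard against rounding in the cumulative sum: pad with the last particle
    while len(out) < M:
        out.append(particles[-1])
    return out
```

With weights `[0, 0, 0, 1]` nothing is emitted until the fourth weight has been read, which is the worst case mentioned above: the first valid output appears only after all M weights have been consumed.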
Output generation module (MC/OG): to share module buffers and interconnections, this module is designed to use the data generated by the PG processing module together with the weights and sum calculated by the PU2 processing module. The module then normalizes the output by the value of sum as follows:
Fig. 2 shows the data relationships between the modules of the SIRF algorithm, that is, the data connections between the processing modules and the buffers. Resampling and state update are merged into the same module, and the output (estimate) calculation step obtains its data directly from the sampling step.
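The output normalization formula is not reproduced in this text. A minimal sketch consistent with the description, assuming the output is the weighted mean of each state component computed from the unnormalized weights t_PU2 and divided once by sum (the function name `generate_output` is hypothetical):

```python
def generate_output(particles, t_pu2, total):
    """Weighted mean of each state component, normalized once by `total`.

    particles: sequence of state tuples from PG, e.g. (x, vx, y, vy)
    t_pu2:     unnormalized weights from PU2, one per particle
    total:     accumulated weight sum from PU2
    """
    dims = len(particles[0])
    acc = [0.0] * dims
    for p, w in zip(particles, t_pu2):
        for k in range(dims):
            acc[k] += w * p[k]  # multiply-accumulate per component
    return [a / total for a in acc]
```

Working from the unnormalized weights means the estimate needs one division per state component instead of one per particle, and the module does not have to wait for PU3.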
(2) Controller design
The filter uses buffer controllers to carry out its overall operation. The parameters that determine the controller structure and the overall implementation are: L_maxi, L_i, nr_i, nw_i, M_i, C_i, P_i, F_i and D_i. Here L_maxi is the logic delay between processing modules, and the actual L_i lies in the range 0 < L_i < L_maxi; nr_i is the offset between writing a buffer and reading that buffer; nw_i is the offset between reading the buffer controller of the previous stage and writing the current buffer controller; C_i, P_i and F_i are, respectively, the data consumption rate, the data production rate and the processing speed of processing module i; D_i is the delay factor with which processing module i generates data. The parameter M_i is the dimension of the data stream of the data-generating module. The parameters (M_i, nr_i, nw_i) are obtained from the functional description, and the parameters (L_i, C_i, P_i, F_i, D_i) are obtained from the implementation of the processing module.
(3) FPGA implementation
Fig. 3 shows the data flow diagram of the SIRF implemented on the FPGA, including the connections between the processing modules and the buffers. Table 1 lists the main parameters of each processing module. The achievable speeds of the processing modules range from 206 MHz to 351 MHz; because the CORDIC method limits the speed, and in order to simplify the controller design, 206 MHz is chosen as the global clock. The delay values can be obtained from the internal data flow. The table also gives the FPGA resources occupied by each module.
Table 1. Processing module information
Table 2 gives the data dependences between the modules. In this table, all links except E3, E4, E6, E7 and E8 use the default parameters. For links E3 and E4, because sum is generated together with the last datum of t_PU2, nw_3 = M+1 and nw_4 = 2; as the timing diagram shows, the PU2 module generates sum after M cycles from the M data of t_PU2. For E6, nr_6 = M+60, where
nr_1 + L_PU1 + nr_2 + nw_2 + L_PU2 + nr_4 + nw_4 + L_PU3 + nr_5 = 1+23+1+1+20+2+1+10+1 = 60.
For E7 and E8, nw_7 = M and nw_8 = M, so as to wait for the RS module to generate its first datum. Because there is no rate mismatch, the value of D is 1 throughout. E5 transfers a single datum, while every other link transfers M data (that is, a data vector). About 5M synchronous buffers are used, where M is the number of particles used by the filter. So that the read and write logic can be assigned different addresses, each buffer needs several memory cells to store one datum.
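The constant 60 in nr_6 can be checked by summing the offsets and module delays along the path, using the values in the order they are listed in the sum above. The names `PATH_TERMS` and `nr6` are hypothetical.

```python
# Offsets and logic delays along the path, as listed in the sum for E6
PATH_TERMS = {
    "nr_1": 1,
    "L_PU1": 23,
    "nr_2": 1,
    "nw_2": 1,
    "L_PU2": 20,
    "nr_4": 2,
    "nw_4": 1,
    "L_PU3": 10,
    "nr_5": 1,
}


def nr6(M):
    """Read offset of link E6: M cycles plus the accumulated path delay."""
    return M + sum(PATH_TERMS.values())
```

The three module delays (23, 20 and 10 cycles) account for 53 of the 60 cycles, so the path latency is dominated by the logic delays of PU1, PU2 and PU3 rather than by the buffer offsets.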
Table 2. Link information table (EIT) of the SIRF
The parameters of all buffer controllers of the SIRF are derived from Table 1 and Table 2 and are shown in Table 3. This table gives the start time, the write start time and the read start time of each buffer controller.
Note several key synchronization points of the data flow architecture: (1) for the buffer controllers of E1 and E6, the start time is the same as the write start time; (2) because the RS module processes both data streams, the buffer controllers of E5 and E6 have the same read start time; (3) the buffer controllers of E7 and E8 are used at the same time; (4) the buffer controllers of E3 and E4 have the same start time.
Table 3. Buffer controller parameters of the SIRF