WO2017075868A1 - FIR filter bank and filtering method - Google Patents

An FIR filter bank and filtering method

Info

Publication number
WO2017075868A1
WO2017075868A1 (PCT application PCT/CN2015/098343)
Authority
WO
WIPO (PCT)
Prior art keywords
data
cache
alu
resource
input
Application number
PCT/CN2015/098343
Other languages
English (en)
French (fr)
Inventor
马传文
杨丽宁
温龙
Original Assignee
深圳市中兴微电子技术有限公司
Application filed by 深圳市中兴微电子技术有限公司
Publication of WO2017075868A1 publication Critical patent/WO2017075868A1/zh

Classifications

    • H — ELECTRICITY
    • H03 — ELECTRONIC CIRCUITRY
    • H03H — IMPEDANCE NETWORKS, e.g. RESONANT CIRCUITS; RESONATORS
    • H03H17/00 — Networks using digital techniques
    • H03H17/02 — Frequency selective networks

Definitions

  • The present invention relates to digital signal processing technologies, and in particular to a finite impulse response (FIR) filter bank and a filtering method.
  • FIR: finite impulse response
  • The FIR filter is the most basic building block in digital signal processing systems and is widely used in communication, image processing, pattern recognition, and similar fields; for example, the digital up-converter (DUC) and digital down-converter (DDC) chains in wireless communication systems contain large numbers of FIR filters.
  • DUC: Digital Up Converter
  • DDC: Digital Down Converter
  • ASIC: Application Specific Integrated Circuit
  • The embodiments of the present invention provide an FIR filter bank and a filtering method in which the internal hardware resources of the filter bank are reconfigurable, reusable, and flexibly configurable, and which can support different filter combinations at reasonable resource cost and speed.
  • An embodiment of the present invention provides an FIR filter bank that includes a control circuit and a data processing circuit coupled to each other. The data processing circuit includes a data stream bus array, a buffer resource pool, an arithmetic logic unit (ALU) resource pool, and an accumulator resource pool; the control circuit includes a data flow controller, a cache resource mapper, a filter coefficient memory, an ALU controller, an accumulation resource organizer, and an output timing controller; wherein:
  • the data stream bus array is configured to receive input data from an input port and output data from the accumulator resource pool, to transmit the input data and the output data to the cache resource pool under the control of the data flow controller, or to transmit the output data to an output port under the control of the output timing controller;
  • the buffer resource pool includes at least one cache resource block configured to receive the data transmitted by the data stream bus array under the control of the data flow controller; the data flow controller, according to the filter order, the number of filters, and the cascade relationship, controls the data transmitted by the data stream bus array so as to form the filter buffers to be calculated;
  • the ALU resource pool includes at least one ALU configured to perform multiply-add calculations on the filter buffers to be calculated according to the cache resource mapper, the filter coefficient memory, and the ALU controller, and to transmit the results of the multiply-add calculations to the accumulator resource pool through the accumulation resource organizer;
  • the accumulator resource pool includes at least one accumulator, each corresponding to one ALU in the ALU resource pool, configured to add the multiply-add results of its ALU according to the filter-resource allocation made by the accumulation resource organizer, so as to obtain a filtering result, and to transmit the filtering result to the data stream bus array.
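As a point of reference for the component descriptions above, the computation each filter in the bank ultimately performs is an ordinary FIR convolution. The sketch below is a minimal software model (names are illustrative only; the bus, pool, and controller machinery is deliberately omitted):

```python
def fir_direct(x, h):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n-k]."""
    n_taps = len(h)
    buf = [0] * n_taps               # models one filter's cache (shift register)
    out = []
    for sample in x:
        buf = [sample] + buf[:-1]    # each new sample shifts the buffer back
        out.append(sum(h[k] * buf[k] for k in range(n_taps)))
    return out

print(fir_direct([1, 0, 0, 0], [3, 2, 1]))  # impulse input returns the taps
```

Feeding an impulse returns the coefficient sequence followed by zeros, which is a quick sanity check for any buffer-and-multiply arrangement like the one described here.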
  • The data structure of a data stream bus in the data stream bus array includes: the data itself, the cache resource block identifier corresponding to the data, and an identifier bit characterizing the data as new.
  • Each cache resource block in the cache resource pool includes at least one serial register group and a cache cascade switch. The cache cascade switch of each cache resource block has three inputs and one output: the first input is connected to the data stream bus array, the second input is connected to the output of the preceding cache resource block, the third input is connected to the data flow controller, and the output of the cascade switch is connected to the input of the register group.
  • When the data flow controller turns the first input of the cache cascade switch off and the second input on, the input data of the register group of the cache resource block is provided by the cache resource block of the previous stage.
  • Each ALU in the ALU resource pool includes two ALU cache blocks, an adder, a multiplier, and a truncation circuit. The two ALU cache blocks respectively receive the filter buffers to be operated on that are output by two cache resource blocks, and the size of each ALU cache block is the same as the size of the register group in a cache resource block.
  • The two ALU cache blocks are respectively connected to the two input ports of the adder, and the cached data of the two ALU cache blocks is sent to the adder by the ALU controller;
  • the adder output is connected to the multiplier, and the other input of the multiplier is connected to the filter coefficient memory; the coefficients in the filter coefficient memory are initialized by software and input to the multiplier in a preset order to perform the filtering operation;
  • each accumulator in the accumulator resource pool includes an adder, a truncator, and a buffer; the adder is configured either to add the outputs of two ALUs or to self-accumulate;
  • the accumulation resource organizer, according to the filter-resource allocation, adds the multiply-add results of the ALUs through the adder, truncator, and buffer to obtain a filtering result.
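The adder-before-multiplier arrangement described above is the classic folding trick for linear-phase (symmetric-coefficient) FIR filters: two buffered samples that share a coefficient are added first, so each coefficient costs one multiply instead of two. A hedged sketch, assuming an even-length symmetric filter with a newest-first buffer (function and variable names are our own):

```python
def fir_folded(buf, h_half):
    """One output of an even-length symmetric FIR: sample pairs are added
    first, so each coefficient is multiplied only once (half the multiplies).
    buf holds the newest sample first; h_half is the first half of the taps."""
    n = len(h_half)
    acc = 0
    for k in range(n):
        acc += h_half[k] * (buf[k] + buf[2 * n - 1 - k])  # add, then multiply
    return acc

buf = [4, 3, 2, 1, 2, 3]          # six buffered samples
h_half = [1, 2, 3]                # half of the symmetric taps [1, 2, 3, 3, 2, 1]
full = sum(c * s for c, s in zip([1, 2, 3, 3, 2, 1], buf))
print(fir_folded(buf, h_half), full)  # both give the same result
```

This is why an N-coefficient symmetric filter needs roughly N/2 multiply-add cycles per output in the embodiments below.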
  • The data flow controller is configured to output, according to the configuration, the filtering result of an accumulator in the accumulator resource pool to the data stream bus array; or
  • the data flow controller is configured to output, according to the configuration, the filtering result of an accumulator in the accumulator resource pool to the output port.
  • In practical applications, the controllers described above, such as the data flow controller and the output timing controller, may be implemented by a central processing unit (CPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA).
  • CPU: central processing unit
  • DSP: digital signal processor
  • FPGA: field-programmable gate array
  • An embodiment of the present invention further provides a filtering method, applied to the FIR filter bank described in any of the foregoing aspects; the method includes:
  • the cache resource blocks in the cache resource pool receive the data transmitted by the data stream bus array under the control of the data flow controller, and the data flow controller, according to the filter order, number, and cascade relationship, controls them to form the filter buffers to be calculated;
  • the ALUs in the arithmetic logic unit (ALU) resource pool perform multiply-add calculations on the filter buffers to be calculated according to the cache resource mapper, the filter coefficient memory, and the ALU controller, and transmit the results of the multiply-add calculations to the accumulator resource pool through the accumulation resource organizer;
  • the accumulators in the accumulator resource pool add the multiply-add results of the ALUs according to the filter-resource allocation made by the accumulation resource organizer, to obtain the filtering results;
  • the method further includes: the data flow controller loops the filtering result back to a cache resource block by controlling the data stream bus array.
  • The embodiment of the invention provides an FIR filter bank and a filtering method. The internal hardware resources of the filter bank are reconfigurable, reusable, and flexibly configurable, and different filter combinations can be supported at reasonable resource cost and speed.
  • FIG. 1 is a schematic structural diagram of an FIR filter bank according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of another FIR filter bank according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of the data structure of a data stream bus in a data stream bus array according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of the structure of cache resource blocks and the connection relationships between cache resource blocks in a cache resource pool according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of an ALU in an ALU resource pool according to an embodiment of the present invention.
  • FIG. 6 is a schematic flowchart diagram of a filtering method according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of a data structure of a cache resource block according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of a data structure of another cache resource block according to an embodiment of the present invention.
  • The basic idea of the technical solution of the embodiments of the present invention is to consider all of the filters' hardware resources in a unified way, so that the filtering resources can be reorganized according to different application scenarios into different filter bank structures, thereby realizing the reconfigurability of the filter bank.
  • The FIR filter bank 10 includes two mutually coupled components: a data processing circuit 101 and a control circuit 102. It should be understood that this embodiment describes the structure of the FIR filter bank 10 by way of example; therefore, external circuits and electrical components related to the FIR filter bank 10 are not specifically described here, and those skilled in the art can design the related external circuits of the FIR filter bank 10 to meet the corresponding application requirements according to the needs of the actual application scenario.
  • The data processing circuit 101 may include a data stream bus array 1011, a cache resource pool 1012, an ALU resource pool 1013, and an accumulator resource pool 1014; the control circuit 102 may include a data flow controller 1021, a cache resource mapper 1022, a filter coefficient memory 1023, an ALU controller 1024, an accumulation resource organizer 1025, and an output timing controller 1026;
  • the data stream bus array 1011 is configured to receive input data from an input port and output data from the accumulator resource pool 1014, to transmit the input data and the output data to the cache resource pool 1012 under the control of the data flow controller 1021, or to transmit the output data to an output port under the control of the output timing controller 1026;
  • the cache resource pool 1012 includes at least one cache resource block configured to receive the data transmitted by the data stream bus array 1011 under the control of the data flow controller 1021, which controls the blocks according to the filter order, number, and cascade relationship to form the filter buffers to be calculated;
  • the ALU resource pool 1013 includes at least one ALU configured to perform multiply-add calculations on the filter buffers to be calculated according to the cache resource mapper 1022, the filter coefficient memory 1023, and the ALU controller 1024, and to transmit the multiply-add results to the accumulator resource pool 1014 through the accumulation resource organizer 1025;
  • the accumulator resource pool 1014 includes at least one accumulator, each corresponding to one ALU in the ALU resource pool 1013, configured to add the multiply-add results of its ALU according to the filter-resource allocation made by the accumulation resource organizer 1025, so as to obtain a filtering result, and to transmit the filtering result to the data stream bus array 1011.
  • The data structure of a data stream bus in the data stream bus array 1011 may include: the data, the cache resource block identifier corresponding to the data, and the identifier bit characterizing the data as new. The bit width of the data field is determined by the bit width of the input data received from the input port, and the bit width of the cache resource block identifier is determined by the number of cache resource blocks in the cache resource pool 1012.
  • At its entry, each cache resource block determines whether the cache resource block identifier (ID) carried on the data stream bus matches its own ID; if it matches, the cache cascade switch of the block takes the input data from the data stream bus array 1011 and the buffered data is shifted backwards; otherwise the data is ignored.
  • Traffic requirements also need to be considered: at least one group (for example, m groups) of data stream buses may form the data stream bus array 1011.
  • At the cache resource block entry, the selection may be controlled by the data flow controller 1021.
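The bus word and the ID-matching accept rule at a cache resource block entry can be modeled as follows. The field and class names are illustrative assumptions, not the patent's nomenclature:

```python
from dataclasses import dataclass

@dataclass
class BusWord:
    data: int
    block_id: int   # which cache resource block the sample targets
    dv: bool        # True when the word carries new data

class CacheBlock:
    def __init__(self, block_id, length):
        self.block_id = block_id
        self.regs = [0] * length
    def on_bus(self, word):
        # accept only new data addressed to this block; shift it in backwards
        if word.dv and word.block_id == self.block_id:
            self.regs = [word.data] + self.regs[:-1]
        # otherwise the word is ignored

blk = CacheBlock(block_id=0, length=3)
blk.on_bus(BusWord(data=7, block_id=0, dv=True))
blk.on_bus(BusWord(data=9, block_id=1, dv=True))   # wrong ID: ignored
print(blk.regs)   # [7, 0, 0]
```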
  • Each cache resource block includes at least one serial register group and one cache cascade switch; the cache cascade switch of each cache resource block has three inputs and one output. The first input of the cache cascade switch is connected to the data stream bus array 1011; it should be noted that, by means of the cache resource block ID carried with the data on the data stream bus, the first input determines which data of the data stream bus array 1011 is received by this cache resource block. The second input of the cache cascade switch is connected to the output of the preceding cache resource block; the third input is connected to the data flow controller 1021; and the output of the cache cascade switch is connected to the input of the register group.
  • The data flow controller 1021 can control the on/off state of the first and second inputs of the cache cascade switch according to the filter order, number, and cascade relationship, so that the source of the register group's input data is governed by the filter order, number, and cascade relationship; this also realizes the reconstruction of the cache resources.
  • When the data flow controller 1021 turns the first input of the cache cascade switch on and the second input off, the input data of the register group of the cache resource block is provided by the data stream bus array 1011; the data stream bus array 1011 may provide either the input data received at the input port or the output data received from the accumulator resource pool 1014. In the latter case, the input data of the register group is the output data of the accumulator resource pool 1014 delivered by the data stream bus array 1011.
  • When the data flow controller 1021 turns the first input of the cache cascade switch off and the second input on, the input data of the register group of the cache resource block is provided by the cache resource block of the previous stage, thereby realizing the cascading of the filter-internal buffers.
  • When the data flow controller 1021 has determined, by controlling the cache cascade switch, that the input data of the register group of a cache resource block is provided by the data stream bus array 1011, and the identifier bit on a data stream bus in the array characterizes the data as new, the first input of the cache resource block checks whether the cache resource block ID carried with the data matches this block; if it matches, the data in the cache resource block is shifted to the right, and any cache resource block cascaded behind it shifts to the right as well, forming the filter buffer to be operated on.
  • Each cache resource block can set the length of its buffer by configuring the number of registers connected in its register group; specifically, the last few registers can be bypassed. This guarantees the symmetry of the buffered data across the blocks holding the symmetric halves of a filter phase, which facilitates the multiplexed processing by the ALU arithmetic units in the subsequent ALU resource pool.
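The cascade-switch and register-bypass behavior can be sketched as two software shift registers in which the sample shifted out of one block feeds the next. The block lengths below are arbitrary illustrative choices:

```python
class Block:
    """One cache resource block as a plain shift register. A shorter 'length'
    models bypassing trailing registers, as described above."""
    def __init__(self, length):
        self.regs = [0] * length
    def push(self, sample):
        shifted_out = self.regs[-1]          # oldest sample leaves the block
        self.regs = [sample] + self.regs[:-1]
        return shifted_out                   # feeds a cascaded next-stage block

a, b = Block(4), Block(2)                    # b keeps only 2 of its registers
for s in [1, 2, 3, 4, 5, 6]:
    b.push(a.push(s))                        # a's oldest sample cascades into b
print(a.regs + b.regs)                       # behaves as one 6-deep delay line
```

After six pushes the two blocks together hold `[6, 5, 4, 3, 2, 1]`, i.e. the cascade behaves as a single longer buffer, which is the point of the second cascade-switch input.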
  • The structure of an ALU in the ALU resource pool 1013 is shown by the dotted-line frame in FIG. 5: one ALU in the ALU resource pool 1013 can contain two ALU cache blocks, an adder, a multiplier, and a truncation circuit.
  • The two ALU cache blocks respectively receive the filter buffers to be operated on that are output by two cache resource blocks, and the size of each ALU cache block is the same as the size of the register group in a cache resource block, so that under the control of the cache resource mapper 1022 the buffers output by the cache resource blocks can be mapped to the corresponding ALU cache blocks; the ALU controller 1024 sends the cached data of the two ALU cache blocks to the adder.
  • The adder output is coupled to the multiplier, and the other input of the multiplier is coupled to the filter coefficient memory 1023; the coefficients are input to the multiplier in a preset order to participate in the filtering operation, and the multiplication result is truncated by the truncation circuit and sent to the accumulator resource pool.
  • Under the control of the ALU controller 1024, the cached data on port a of the adder is sent to the adder stepping from address 0 (or a configured address) towards the high address; the step length defaults to 1 and can be set differently according to system requirements. Port b is sent to the adder stepping from the high address (or a configured address) towards address 0, and the constant 0 can also be selected for port b.
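The opposed address stepping of the two adder ports is what produces the symmetric sample pairs; selecting constant 0 on port b falls back to a plain, unfolded multiply path. A small illustrative model (the function name and signature are our own):

```python
def alu_pairs(block_a, block_b, step=1, b_is_zero=False):
    """Model of the adder's address stepping: port a walks block_a from
    address 0 upward while port b walks block_b from its top address
    downward, pairing symmetric samples; b can be forced to constant 0."""
    pairs = []
    hi = len(block_b) - 1
    for i in range(0, len(block_a), step):
        b = 0 if b_is_zero else block_b[hi - i]
        pairs.append(block_a[i] + b)         # adder output fed to the multiplier
    return pairs

print(alu_pairs([1, 2, 3], [4, 5, 6]))                  # [1+6, 2+5, 3+4]
print(alu_pairs([1, 2, 3], [4, 5, 6], b_is_zero=True))  # plain path: [1, 2, 3]
```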
  • An accumulator in the accumulator resource pool 1014 can include an adder, a truncator, and a buffer; the adder can be used both for adding the outputs of two ALUs and for self-accumulation.
  • The accumulation resource organizer 1025, according to the filter-resource allocation, adds the multiply-add results of the ALUs through the adder, truncator, and buffer to obtain a filtering result.
  • The accumulation resource organizer's control of the addition relationships between ALUs and of the number of self-accumulation cycles may be configured according to the multiplexing requirements of the ALUs.
  • The data flow controller 1021 can also be configured to output the filtering result of an accumulator in the accumulator resource pool 1014 to the data stream bus array 1011, so that the filtering result is looped back through the data stream bus array 1011 to the corresponding cache resource block for next-stage filtering; or, according to the configuration, the filtering result of an accumulator in the accumulator resource pool 1014 is output to the output port.
  • The output timing controller 1026 sorts the filtering results at the output port according to a preset timing before they are output.
  • This embodiment provides an FIR filter bank in which all hardware resources of the filters are considered in a unified way, so that the internal hardware resources of the filter bank are reconfigurable, reusable, and configurable, and different filter combinations can be supported at reasonable resource cost and speed.
  • The filtering method may include:
  • after receiving input data, the input port transmits the input data to the data stream bus array;
  • transmitting the input data to the data stream bus array may include: tagging the data with its corresponding cache resource block identifier ID, and setting the corresponding identifier bit dv, which characterizes the data as new, to valid.
  • The cache resource blocks in the cache resource pool receive the data transmitted by the data stream bus array under the control of the data flow controller, which controls them according to the filter order, number, and cascade relationship to form the filter buffers to be calculated.
  • Each cache resource block includes at least one serial register group and one cache cascade switch; the cache cascade switch has three inputs and one output. The first input is connected to the data stream bus array and, by means of the cache resource block ID carried with the data on the data stream bus, determines which bus data this cache resource block receives; the second input is connected to the output of the preceding cache resource block; the third input is connected to the data flow controller; and the output of the cache cascade switch is connected to the input of the register group.
  • The ALUs in the ALU resource pool perform multiply-add calculations on the filter buffers to be calculated according to the cache resource mapper, the filter coefficient memory, and the ALU controller, and transmit the results through the accumulation resource organizer to the accumulator resource pool.
  • Each ALU may contain two ALU cache blocks, an adder, a multiplier, and a truncation circuit.
  • The two ALU cache blocks respectively receive the filter buffers output by two cache resource blocks, and the size of each ALU cache block is the same as the size of the register group in a cache resource block, so that under the control of the cache resource mapper the buffers output by the cache resource blocks can be time-divisionally mapped to the corresponding ALU cache blocks. The two ALU cache blocks are connected to the two input ports of the adder, and their cached data is sent to the adder by the ALU controller.
  • The output of the adder is connected to the multiplier, whose other input is connected to the filter coefficient memory; the coefficients are input to the multiplier in a preset order to participate in the filtering operation, and the multiplication result is truncated by the truncation circuit and sent to the accumulator resource pool.
  • Each accumulator corresponds to one ALU in the ALU resource pool, and each accumulator may include an adder, a truncator, and a buffer; the adder can be used both for adding the outputs of two ALUs and for self-accumulation. The accumulation resource organizer, according to the filter-resource allocation, adds the multiply-add results of the ALUs through the adder, truncator, and buffer to obtain a filtering result.
  • S605: the filtering result is transmitted to the data stream bus array under the control of the data flow controller, and the filtering result on the data stream bus array is transmitted to the output port; the output port can then output the filtering result at the corresponding timing under the control of the output timing controller. Alternatively, the data flow controller can loop the filtering result back, by controlling the data stream bus array, to the corresponding cache resource block for the next stage of filtering, thereby implementing the cascading of filters.
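The method above, with the loop-back path feeding a second stage, amounts to cascading two FIR filters. A minimal software sketch under that interpretation (the bus and switch mechanics are abstracted away; filter choices are arbitrary):

```python
def fir(x, h):
    """Plain direct-form FIR, standing in for one configured filter stage."""
    buf = [0] * len(h)
    out = []
    for s in x:
        buf = [s] + buf[:-1]
        out.append(sum(c * v for c, v in zip(h, buf)))
    return out

stage1 = fir([1, 0, 0, 0, 0], [1, 1])   # first stage filters the input
stage2 = fir(stage1, [1, 1])            # its output loops back as stage-2 input
print(stage2)                           # [1, 2, 1, 0, 0]
```

The cascade of two `[1, 1]` stages yields the impulse response `[1, 2, 1]`, i.e. the convolution of the two stages' coefficient sets, exactly what looping one filter's output back into another block's buffer achieves in the bank.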
  • The foregoing is the flow of the method by which the FIR filter bank filters input data. The application of the FIR filter bank is briefly described below through four specific examples, in the third to sixth embodiments.
  • The first-stage filter is set to 12 symmetric coefficients, 2x decimation, and an input multiplexing ratio of 3; the second-stage filter has 47 symmetric coefficients, 2x decimation, and an input multiplexing ratio of 6.
  • The first-stage filter occupies two cache resource blocks, identified as ID0 and ID1. The cache cascade switch of cache resource block ID0 is set to connect to the data stream bus array, so the input data of the register group of ID0 is provided by the data stream bus array; the cache cascade switch of cache resource block ID1 is set to connect to the preceding cache resource block ID0, so the input data of the register group of ID1 is provided by the previous stage. The cache cascade switch of cache resource block ID2 is set to connect to the data stream bus array; the cache cascade switches of cache resource blocks ID3 to ID7 are set to connect to their preceding cache resource blocks.
  • Since the first-stage filtering can complete its 6 multiply-add operations in 6 cycles, only one ALU (ALU0) is needed from the ALU resource pool.
  • The last two registers of cache resource block ID0 are bypassed, so that cache resource blocks ID0 and ID1 can be mapped to the taps corresponding to the symmetric coefficients and their data added.
  • the cache resource blocks ID0 and ID1 are respectively mapped to the two ALU cache blocks of ALU0 through the cache resource mapper.
  • The additions of the symmetric data pairs (d0, d11), (d1, d10), (d2, d9), (d3, d8), (d4, d7), and (d5, d6) are completed sequentially in six cycles; each sum is then multiplied by its coefficient, accumulated, and truncated to obtain the output of the first-stage filter.
  • The output data of the first-stage filter is routed through the data stream bus array to cache resource block ID2 to form the buffer of the second-stage filter; the data structure of the second-stage filter's cache resource blocks is shown in the corresponding figure. Since the second-stage filter must perform 24 multiply-accumulate operations within 12 cycles, two ALUs from the ALU resource pool are required, denoted ALU1 and ALU2.
  • In the last 8 cycles, cache resource blocks ID3 and ID6 are mapped to the two ALU cache blocks of ALU1, and ID4 and ID5 to the two ALU cache blocks of ALU2, completing the remaining data operations. The outputs of ALU1 and ALU2 are then added in the accumulator resource pool and accumulated; finally, the accumulated result is truncated, sorted, and output through the output port, realizing two filters in series.
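The first-stage schedule of this embodiment can be checked numerically: the six pairwise additions followed by six multiplies reproduce the full 12-tap result. The sample values and coefficients below are arbitrary test data, not from the patent:

```python
d = list(range(12, 0, -1))            # d0..d11, newest sample first
h_half = [1, -2, 3, -4, 5, -6]        # first 6 of 12 symmetric coefficients
h_full = h_half + h_half[::-1]        # full symmetric tap set

folded = sum(h_half[i] * (d[i] + d[11 - i]) for i in range(6))   # 6 cycles
direct = sum(h_full[i] * d[i] for i in range(12))                # 12 multiplies
print(folded, direct)                 # identical results
```

Six add-multiply cycles on ALU0 thus cover the whole 12-tap first stage, matching the cycle count stated above.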
  • In the next embodiment, two filters in series multiplex a single ALU unit for processing.
  • The first-stage filter is set to 12 symmetric coefficients, 2x decimation, and an input multiplexing ratio of 12; the second-stage filter has 24 symmetric coefficients, 2x decimation, and an input multiplexing ratio of 24.
  • The first-stage filter occupies two cache resource blocks, identified as ID0 and ID1. The cache cascade switch of cache resource block ID0 is set to connect to the data stream bus array, so the input data of the register group of ID0 is provided by the data stream bus array; the cache cascade switch of cache resource block ID1 is set to connect to the preceding cache resource block ID0, so the input data of the register group of ID1 is provided by the previous stage. The cache cascade switch of cache resource block ID2 is set to connect to the data stream bus array; the cache cascade switches of cache resource blocks ID3 and ID4 are set to connect to their preceding cache resource blocks.
  • The processing of the first-stage filter is the same as in the foregoing embodiment and is not described here again.
  • The first-stage filter completes its operations in 6 cycles, and the second-stage filter's operations are completed in the remaining 18 cycles; specifically, the multiply-add operations on the 24 values of the second-stage filter can be completed within 12 beats. Cache resource block ID3 is first mapped simultaneously to both ALU cache blocks of ALU0 to complete the multiply operations on the middle 8 data values; in the last 8 cycles, cache resource blocks ID2 and ID4 are respectively mapped to the two ALU cache blocks of ALU0 to complete the remaining operations. This achieves a configuration in which two filters in series multiplex one ALU unit.
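The cycle accounting of this embodiment can be restated as a back-of-envelope check (our own model, not quoted from the text): with symmetric folding, an N-coefficient filter needs about N/2 multiply-adds per output.

```python
# Rough cycle budget for the two-stage, single-ALU configuration.
stage1_cycles = 12 // 2    # 12-coefficient first stage: 6 cycles per output
stage2_cycles = 24 // 2    # 24-coefficient second stage: 12 cycles per output
window = 24                # cycles per input at a 24x input multiplexing ratio
fits = stage1_cycles + stage2_cycles <= window
print(stage1_cycles, stage2_cycles, fits)
```

The combined 18 cycles fit inside the 24-cycle window, which is why one ALU suffices here, whereas the earlier embodiment (windows of 6 and 12 cycles) needed three.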
  • In this embodiment, the first group of filters may be two filters in series; the specific cache resource blocks, the ALU allocation in the ALU resource pool, and the processing are as described in the foregoing embodiment and are not repeated here.
  • The second group of filters is an interpolation filter in which two 16-value multiply-add operations are completed in 8 cycles; therefore, two ALUs from the ALU resource pool are required to perform the odd-phase and even-phase data operations respectively.
  • the cache resource blocks ID5 and ID6 are mapped to ALU1 and ALU2, respectively.
  • ALU1 multiplies the added data one by one by the odd-phase coefficients to obtain the odd-phase filtering result, which after accumulation and truncation is output to the output port; the even-phase filtering result is calculated by ALU2 in the same way. The output port sorts the results by timing and outputs the odd- and even-phase values in turn as required, giving the output of the second group of filters.
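The odd/even split described for the interpolation filter is a standard 2x polyphase decomposition: the even- and odd-indexed coefficients form two sub-filters whose outputs are interleaved, which is why the two phases map naturally onto two ALUs. A sketch with an arbitrary 4-tap prototype filter:

```python
def fir(x, h):
    """Plain direct-form FIR used as one polyphase branch."""
    buf = [0] * len(h)
    out = []
    for s in x:
        buf = [s] + buf[:-1]
        out.append(sum(c * v for c, v in zip(h, buf)))
    return out

h = [1, 2, 3, 4]                  # prototype interpolation filter
even, odd = h[0::2], h[1::2]      # per-phase coefficient sets: [1,3] and [2,4]
x = [1, 0, 0]                     # input at the low rate
interleaved = [v for pair in zip(fir(x, even), fir(x, odd)) for v in pair]
print(interleaved)                # [1, 2, 3, 4, 0, 0]
```

Interleaving the two branch outputs reproduces the impulse response of the full filter at the doubled output rate, the same result as zero-stuffing the input and filtering with `h` directly, but with each branch doing half the work.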
  • Since the sum of the reciprocals of the multiplexing ratios of the two filters in the first group equals 3/4, the first group of filters can be assigned to the first group of data stream buses, while the second group of filters uses the second group of data stream buses separately.
  • At cache resource blocks ID0 and ID1, the cache cascade switch is set to connect to the first group of data stream buses; at cache resource block ID2, the cache cascade switch is set to connect to the second group of data stream buses.
  • The output of the accumulator is routed through the first group of data stream buses to cache resource block ID1, connecting the two filters of the first group in series.
  • The filtering method proposed in the embodiments of the present invention is applied to the FIR filter bank described above, so that all hardware resources of the filters are considered in a unified way; the internal hardware resources of the filter bank are thus reconfigurable, reusable, and flexibly configurable, and different filter combinations can be supported at reasonable resource cost and speed.
  • Embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.
  • The computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • Embodiments of the invention provide an FIR filter bank and a filtering method in which the internal hardware resources of the filter bank are reconfigurable, reusable, and flexibly configurable, and different filter combinations can be satisfied at reasonable resource cost and speed.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Complex Calculations (AREA)

Abstract

An FIR filter bank and a filtering method. The FIR filter bank comprises a control circuit (102) and a data processing circuit (101) coupled to each other. The data processing circuit (101) comprises a data stream bus array (1011), a cache resource pool (1012), an arithmetic logic unit (ALU) resource pool (1013), and an accumulator resource pool (1014). The control circuit (102) comprises a data stream controller (1021), a cache resource mapper (1022), a filter coefficient memory (1023), an ALU controller (1024), an accumulation resource organizer (1025), and an output timing controller (1026).

Description

FIR Filter Bank and Filtering Method
Technical Field
The present invention relates to digital signal processing technology, and in particular to a finite impulse response (FIR) filter bank and a filtering method.
Background
In recent years, software-defined concepts such as software-defined networking (SDN), software-defined storage, and software-defined cloud computing have increased the demand for functional flexibility, extensibility, and reconfigurability in hardware products.
The FIR filter is the most basic component of a digital signal processing system and is widely used in communications, image processing, pattern recognition, and other fields. For example, the digital up converter (DUC) and digital down converter (DDC) chains of a wireless communication system contain a large number of FIR filters.
However, in current application-specific integrated circuit (ASIC) design, although reconfigurable improvements to individual filter structures have appeared, they cannot provide reconfigurability at the filter-bank level. Conversely, existing schemes that reconfigure a filter bank by modifying the connections between filters lack reconfigurability within individual filters and suffer from low resource utilization. Neither approach achieves reconfigurable, reusable, and flexibly configurable filtering while communication standards of multiple formats coexist over the long term.
Summary
To solve the above technical problems, embodiments of the present invention provide an FIR filter bank and a filtering method in which the internal hardware resources of the filter bank are reconfigurable, reusable, and flexibly configurable, and which can satisfy different filter combinations at reasonable resource cost and speed.
The technical solutions of the embodiments of the present invention are implemented as follows:
In a first aspect, an embodiment of the present invention provides an FIR filter bank comprising a control circuit and a data processing circuit coupled to each other. The data processing circuit comprises a data stream bus array, a cache resource pool, an arithmetic logic unit (ALU) resource pool, and an accumulator resource pool. The control circuit comprises a data stream controller, a cache resource mapper, a filter coefficient memory, an ALU controller, an accumulation resource organizer, and an output timing controller; wherein,
the data stream bus array is configured to receive input data from an input port and output data from the accumulator resource pool, and to transmit the input data and the output data to the cache resource pool under control of the data stream controller, or to transmit the output data to an output port under control of the output timing controller;
the cache resource pool comprises at least one cache resource block and is configured to receive the data transmitted over the data stream bus array under control of the data stream controller, the data stream controller controlling that data according to the filter order, the number of filters, and their cascade relationships so as to form the filter caches to be computed;
the ALU resource pool comprises at least one ALU and is configured to perform multiply-add computation on the filter caches to be computed according to the cache resource mapper, the filter coefficient memory, and the ALU controller, and to transmit the results of the multiply-add computation to the accumulator resource pool through the accumulation resource organizer;
the accumulator resource pool comprises at least one accumulator, each accumulator corresponding one-to-one to an ALU in the ALU resource pool, and is configured to add the multiply-add results of the ALUs through the accumulation resource organizer according to the allocation of filtering resources to obtain the filtering result, and to transmit the filtering result to the data stream bus array.
In the above solution, the data structure of a data stream bus in the data stream bus array comprises: the data, a cache resource block identifier corresponding to the data, and a flag bit indicating that the data is new.
In the above solution, each cache resource block in the cache resource pool comprises at least one serially connected register group and one cache cascade switch. The cache cascade switch of each cache resource block has three inputs and one output: the first input is connected to the data stream bus array; the second input is connected to the output of the preceding cache resource block; the third input is connected to the data stream controller; and the output is connected to the input of the register group.
In the above solution, when the data stream controller opens the first input of the cache cascade switch and closes the second input, the input data of the register group of the cache resource block is provided by the data stream bus array;
when the data stream controller closes the first input of the cache cascade switch and opens the second input, the input data of the register group of the cache resource block is provided by the preceding cache resource block.
In the above solution, each ALU in the ALU resource pool comprises two ALU cache blocks, an adder, a multiplier, and a truncation circuit; the two ALU cache blocks correspond respectively to the filter caches to be computed output by two cache resource blocks, and the size of each ALU cache block equals the size of the register group in a cache resource block.
In the above solution, the two ALU cache blocks are connected to the two input ports of the adder, and the ALU controller feeds the cached data of the two ALU cache blocks into the adder;
the output of the adder is connected to the multiplier, whose other input is connected to the filter coefficient memory, wherein, after being initialized by software, the coefficients in the filter coefficient memory are fed into the multiplier in a preset order for the filtering operation;
the result of the multiplication is truncated by the truncation circuit and then sent to the accumulator resource pool.
In the above solution, each accumulator in the accumulator resource pool comprises an adder, a truncator, and a buffer; the adder is configured either to add the data of two ALUs or to self-accumulate; the accumulation resource organizer adds the multiply-add results of the ALUs through the adder, the truncator, and the buffer according to the allocation of filtering resources to obtain the filtering result.
In the above solution, the data stream controller is configured to control, according to the configuration, the filtering result of an accumulator in the accumulator resource pool to be output to the data stream bus array; or,
the data stream controller is configured to control, according to the configuration, the filtering result of an accumulator in the accumulator resource pool to be output to the output port.
When performing processing, the data stream bus array, the cache resource pool, the ALU resource pool, the accumulator resource pool, the data stream controller, the cache resource mapper, the filter coefficient memory, the ALU controller, the accumulation resource organizer, and the output timing controller may be implemented by a central processing unit (CPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA).
In a second aspect, an embodiment of the present invention provides a filtering method applied to the FIR filter bank of any of the above solutions, the method comprising:
after input data is received through the input port, transmitting the input data to the data stream bus array;
a cache resource block in the cache resource pool receiving the data transmitted over the data stream bus array under control of the data stream controller, the data stream controller controlling the data according to the filter order, the number of filters, and their cascade relationships to form the filter caches to be computed;
an ALU in the arithmetic logic unit (ALU) resource pool performing multiply-add computation on the filter caches to be computed according to the cache resource mapper, the filter coefficient memory, and the ALU controller, and transmitting the results of the multiply-add computation to the accumulator resource pool through the accumulation resource organizer;
an accumulator in the accumulator resource pool adding the multiply-add results of the ALUs through the accumulation resource organizer according to the allocation of filtering resources to obtain the filtering result;
transmitting the filtering result to the data stream bus array under control of the data stream controller, and transmitting the filtering result on the data stream bus array to the output port.
In the above solution, the method further comprises: the data stream controller looping the filtering result back to a cache resource block by controlling the data stream bus array.
Embodiments of the present invention provide an FIR filter bank and a filtering method. By managing all hardware resources of the filters in a unified manner, the internal hardware resources of the filter bank become reconfigurable, reusable, and flexibly configurable, and different filter combinations can be satisfied at reasonable resource cost and speed.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of an FIR filter bank according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of another FIR filter bank according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the data structure of a data stream bus in the data stream bus array according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the structure of the cache resource blocks in the cache resource pool and the connections between them according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an ALU in the ALU resource pool according to an embodiment of the present invention;
FIG. 6 is a schematic flowchart of a filtering method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the data structure of a cache resource block according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of the data structure of another cache resource block according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings.
Embodiment One
The basic idea of the technical solutions of the embodiments of the present invention is to manage all hardware resources of the filters in a unified manner, so that the filtering resources can be reorganized for different application scenarios to form different filter bank structures, thereby making the filter bank reconfigurable.
Referring to FIG. 1, which shows the structure of an FIR filter bank 10 according to an embodiment of the present invention. As shown in FIG. 1, the FIR filter bank 10 comprises two mutually coupled components: a data processing circuit 101 and a control circuit 102. Understandably, since this embodiment illustrates the structure of the FIR filter bank 10 by example, the external circuits and electrical components related to the FIR filter bank 10 are not described in detail here; those skilled in the art may design the related external circuits of the FIR filter bank 10 of this embodiment as required by the actual application scenario.
Referring to the specific structure of the FIR filter bank 10 shown in FIG. 2, the data processing circuit 101 may comprise a data stream bus array 1011, a cache resource pool 1012, an ALU resource pool 1013, and an accumulator resource pool 1014, while the control circuit 102 may comprise a data stream controller 1021, a cache resource mapper 1022, a filter coefficient memory 1023, an ALU controller 1024, an accumulation resource organizer 1025, and an output timing controller 1026; wherein,
the data stream bus array 1011 is configured to receive input data from the input port and output data from the accumulator resource pool 1014, and to transmit the input data and the output data to the cache resource pool 1012 under control of the data stream controller 1021, or to transmit the output data to the output port under control of the output timing controller 1026;
the cache resource pool 1012 comprises at least one cache resource block and is configured to receive the data transmitted over the data stream bus array 1011 under control of the data stream controller 1021, the data stream controller 1021 controlling it according to the filter order, the number of filters, and their cascade relationships to form the filter caches to be computed;
the ALU resource pool 1013 comprises at least one ALU and is configured to perform multiply-add computation on the filter caches to be computed according to the cache resource mapper 1022, the filter coefficient memory 1023, and the ALU controller 1024, and to transmit the results of the multiply-add computation through the accumulation resource organizer 1025 to the accumulator resource pool 1014;
the accumulator resource pool 1014 comprises at least one accumulator, each corresponding to one ALU in the ALU resource pool 1013, and is configured to add the multiply-add results of the ALUs through the accumulation resource organizer 1025 according to the allocation of filtering resources to obtain the filtering result, and to transmit the filtering result to the data stream bus array 1011.
On the basis of the specific structure of the FIR filter bank 10 shown in FIG. 2, illustratively, referring to FIG. 3, the data structure of a data stream bus in the data stream bus array 1011 may comprise: the data, a cache resource block identifier ID corresponding to the data, and a flag bit dv indicating that the data is new. The data width is determined by the width of the input data received from the input port, and the width of the cache resource block identifier is determined by the number of cache resource blocks in the cache resource pool 1012. When the flag bit dv is valid, the entrance of a cache resource block checks whether the identifier ID on the data stream bus matches the block's own ID; if it matches and the block's cache cascade switch points to the input of the data stream bus array 1011, the data is shifted in, otherwise it is ignored. Understandably, to meet throughput requirements in a concrete design, the data stream bus array 1011 may consist of at least one group (e.g., m groups) of data stream buses, selected at the entrance of each cache resource block under control of the data stream controller 1021.
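The bus word and the acceptance rule at a cache resource block entrance can be made concrete with a minimal Python sketch. This is an illustrative software model only, not the hardware; the names `BusWord` and `accepts` are assumptions, not identifiers from the patent.

```python
from dataclasses import dataclass

@dataclass
class BusWord:
    data: int       # sample value; width set by the input port
    block_id: int   # target cache resource block identifier ID
    dv: bool        # flag bit: the data on the bus is new

def accepts(word, my_id, switch_to_bus):
    """A cache resource block shifts a bus word in only when the dv flag
    is valid, the ID on the bus matches its own ID, and its cascade
    switch points at the data stream bus array; otherwise it ignores it."""
    return word.dv and word.block_id == my_id and switch_to_bus

# A word addressed to block ID2 is taken by ID2 (switch on the bus)...
w = BusWord(data=7, block_id=2, dv=True)
hit = accepts(w, my_id=2, switch_to_bus=True)
# ...and ignored by block ID3, whose ID does not match.
miss = accepts(w, my_id=3, switch_to_bus=True)
```

The same check, replicated at every block entrance, is what lets several filters share one bus without explicit routing tables.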
On the basis of the specific structure of the FIR filter bank 10 shown in FIG. 2, illustratively, referring to FIG. 4, which shows the structure of the cache resource blocks in the cache resource pool 1012 and the connections between them. As shown by the dash-dotted box in FIG. 4, each cache resource block comprises at least one serially connected register group and one cache cascade switch. The cache cascade switch of each block has three inputs and one output: the first input is connected to the data stream bus array 1011 (it should be noted that the first input can identify the bus data received by this block by matching the cache resource block identifier ID carried on the data stream bus); the second input is connected to the output of the preceding cache resource block; the third input is connected to the data stream controller 1021; and the output is connected to the input of the register group.
The data stream controller 1021 can control the on/off state of the first and second inputs of the cache cascade switch according to the filter order, the number of filters, and their cascade relationships, thereby controlling the source of the register group's input data and so controlling the filter order, number, and cascade relationships; this also achieves reconfiguration of the cache resources.
Specifically, when the data stream controller 1021 opens the first input of the cache cascade switch and closes the second input, the input data of the register group is provided by the data stream bus array 1011, which may carry either input data received from the input port or output data received from the accumulator resource pool 1014; when the register group's input is output data from the accumulator resource pool 1014 delivered over the bus array, a cascade between filters is realized. When the data stream controller 1021 closes the first input of the cache cascade switch and opens the second input, the register group's input data is provided by the preceding cache resource block, realizing the cascade of buffers inside a filter.
For example, when the data stream controller 1021 determines, by controlling the cache cascade switch, that the register group's input data is provided by the data stream bus array 1011, and the flag bit dv on a data stream bus is valid, the first input of the cache resource block checks whether the identifier ID on the bus matches the block; if it matches, the whole cache resource block shifts right. If a following cache resource block is connected to the preceding one, it shifts right along with it, forming one filter cache to be computed.
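The shift-right behaviour of cascaded cache resource blocks can be sketched as follows. This is a behavioural model under simplifying assumptions (registers modelled as a Python list, two blocks of depth 3); `CacheBlock` is a hypothetical name.

```python
class CacheBlock:
    """Illustrative model of a cache resource block: a serial register
    group fed either from the data stream bus (switch = 'bus') or from
    the preceding block's output (switch = 'prev')."""
    def __init__(self, block_id, depth, switch):
        self.block_id = block_id
        self.regs = [0] * depth   # serially connected register group
        self.switch = switch      # 'bus' or 'prev'

    def shift_in(self, value):
        out = self.regs[-1]       # oldest sample falls out toward the next block
        self.regs = [value] + self.regs[:-1]
        return out

# Two blocks cascaded inside one filter: ID0 takes the bus, ID1 takes ID0's output.
id0 = CacheBlock(0, depth=3, switch='bus')
id1 = CacheBlock(1, depth=3, switch='prev')

for sample in [1, 2, 3, 4]:
    carry = id0.shift_in(sample)   # new bus sample enters ID0
    if id1.switch == 'prev':
        id1.shift_in(carry)        # ID1 shifts right along with ID0

# After four shifts the six registers hold the most recent samples in order.
window = id0.regs + id1.regs
```

Flipping `id1.switch` to `'bus'` would instead let ID1 start an independent filter fed directly from the bus array, which is exactly the reconfiguration the cascade switch provides.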
It should be noted that the cache length of each cache resource block can be set via the number of serially connected registers in the register group; specifically, the last few registers can be bypassed. This preserves the symmetry between the data of the front and rear cache resource blocks of a filter phase with symmetric coefficients, facilitating the subsequent reuse of the ALU computing units in the ALU resource pool.
On the basis of the specific structure of the FIR filter bank 10 shown in FIG. 2, illustratively, referring to FIG. 5, which shows the structure of an ALU in the ALU resource pool 1013. As shown by the dash-dotted box in FIG. 5, an ALU in the ALU resource pool 1013 may contain two ALU cache blocks, an adder, a multiplier, and a truncation circuit. The two ALU cache blocks correspond respectively to the filter caches to be computed output by two cache resource blocks, and each ALU cache block has the same size as the register group of a cache resource block, so that under control of the cache resource mapper 1022 the filter phases to be computed output by the cache resource blocks can be mapped onto the corresponding ALU cache blocks in a time-shared manner. The two ALU cache blocks are connected to the two input ports of the adder, for example port a and port b, and the ALU controller 1024 feeds the cached data of the two ALU cache blocks into the adder. The adder output is connected to the multiplier, whose other input is connected to the filter coefficient memory 1023. After being initialized by software, the coefficients in the filter coefficient memory 1023 are fed into the multiplier in a certain order to take part in the filtering operation. The multiplication result is truncated by the truncation circuit and then sent to the accumulator resource pool.
Specifically, under control of the ALU controller 1024, the cached data at adder port a is fed into the adder step by step from address 0 (or a configured address) toward the high address; the step length defaults to 1 and can be set to other values as the system requires. Port b is fed step by step from the high address (or a configured address) toward address 0, or the constant 0 can be selected instead.
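One ALU pass, with port a stepping up from address 0 and port b stepping down from the high address, can be sketched in software as below. This is a minimal model of the pre-add/multiply/truncate datapath just described; the function name `alu_pass`, the example coefficients, and the sample values are assumptions for illustration.

```python
def alu_pass(buf_a, buf_b, coeffs, shift=0):
    """One ALU pass sketched in software: per cycle, port a steps from
    address 0 upward and port b from the high address downward; the two
    samples are pre-added, multiplied by one coefficient, the products
    are summed, and the sum is truncated (right-shifted) at the end."""
    acc = 0
    for k, c in enumerate(coeffs):
        a = buf_a[k]                    # port a: low address -> high address
        b = buf_b[len(buf_b) - 1 - k]   # port b: high address -> low address
        acc += (a + b) * c
    return acc >> shift                 # truncation circuit

# Even-symmetric 12-tap filter: h[k] == h[11-k], so only 6 multiplies are
# needed. buf_a holds d0..d5 (block ID0), buf_b holds d6..d11 (block ID1).
h = [1, -2, 3, 4, 3, -2, -2, 3, 4, 3, -2, 1]   # hypothetical symmetric coefficients
d = list(range(12))                             # hypothetical samples d0..d11
y_fast = alu_pass(d[:6], d[6:], h[:6])
y_direct = sum(h[k] * d[k] for k in range(12))
```

Because the addressing directions of the two ports are opposite, the pre-adder automatically pairs each tap with its mirror tap, which is why a symmetric N-tap filter costs only N/2 multiplications on this datapath.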
Illustratively, an accumulator in the accumulator resource pool 1014 may comprise an adder, a truncator, and a buffer; the adder can be used either to add the data of two ALUs or for self-accumulation. The accumulation resource organizer 1025 adds the multiply-add results of the ALUs through the adder, the truncator, and the buffer according to the allocation of filtering resources to obtain the filtering result. Further, according to the reuse requirements of the ALUs, the accumulation resource organizer can be configured to control the addition relationships among the ALUs and the number of self-accumulation cycles.
Illustratively, the data stream controller 1021 may also be configured to control, according to the configuration, the filtering result of an accumulator in the accumulator resource pool 1014 to be output to the data stream bus array 1011, so that the filtering result is looped back through the bus array to the corresponding cache resource block for the next filtering stage; or to control, according to the configuration, the filtering result of an accumulator in the accumulator resource pool 1014 to be output to the output port.
Illustratively, the output timing controller 1026 controls the filtering results at the output port to be sorted according to a preset timing and then output.
This embodiment provides an FIR filter bank. By managing all hardware resources of the filters in a unified manner, the internal hardware resources of the filter bank become reconfigurable, reusable, and flexibly configurable, and different filter combinations can be satisfied at reasonable resource cost and speed.
Embodiment Two
Referring to FIG. 6, which shows a filtering method applied to the FIR filter bank of the foregoing embodiment; the specific structure of the FIR filter bank is as described above and is not repeated here. The filtering method may comprise:
S601: after the input port receives input data, transmitting the input data to the data stream bus array;
Specifically, transmitting the input data to the data stream bus array may comprise:
attaching the corresponding identifier ID when transmitting the input data onto a data stream bus in the array, and setting the corresponding flag bit dv, which indicates that the data is new, to valid.
S602: a cache resource block in the cache resource pool receives the data transmitted over the data stream bus array under control of the data stream controller, which controls it according to the filter order, the number of filters, and their cascade relationships to form the filter caches to be computed;
Here, each cache resource block comprises at least one serially connected register group and one cache cascade switch with three inputs and one output: the first input is connected to the data stream bus array (it should be noted that the first input can identify the bus data received by this block by matching the cache resource block identifier ID carried on the data stream bus); the second input is connected to the output of the preceding cache resource block; the third input is connected to the data stream controller; and the output of the switch is connected to the input of the register group.
S603: an ALU in the ALU resource pool performs multiply-add computation on the filter caches to be computed according to the cache resource mapper, the filter coefficient memory, and the ALU controller, and transmits the results of the multiply-add computation through the accumulation resource organizer to the accumulator resource pool;
Specifically, each ALU may contain two ALU cache blocks, an adder, a multiplier, and a truncation circuit. The two ALU cache blocks correspond respectively to the filter caches to be computed output by two cache resource blocks, and each has the same size as the register group of a cache resource block, so that under control of the cache resource mapper the filter phases to be computed can be mapped onto the corresponding ALU cache blocks in a time-shared manner. The two ALU cache blocks are connected to the two input ports of the adder, and the ALU controller feeds their cached data into the adder. The adder output is connected to the multiplier, whose other input is connected to the filter coefficient memory. After software initialization, the coefficients in the filter coefficient memory are fed into the multiplier in a certain order to take part in the filtering operation. The multiplication result is truncated by the truncation circuit and then sent to the accumulator resource pool.
S604: an accumulator in the accumulator resource pool adds the multiply-add results of the ALUs through the accumulation resource organizer according to the allocation of filtering resources to obtain the filtering result;
It should be noted that each accumulator corresponds to one ALU in the ALU resource pool and may comprise an adder, a truncator, and a buffer; the adder can be used either to add the data of two ALUs or for self-accumulation. The accumulation resource organizer 1025 adds the multiply-add results of the ALUs through the adder, the truncator, and the buffer according to the allocation of filtering resources to obtain the filtering result.
S605: transmitting the filtering result to the data stream bus array under control of the data stream controller, and transmitting the filtering result on the data stream bus array to the output port;
Understandably, under control of the output timing controller, the output port can output the filtering results with the corresponding timing.
In addition, the data stream controller can also loop the filtering result back to the corresponding cache resource block for the next filtering stage by controlling the data stream bus array, thereby realizing the cascade of filters.
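Steps S601 to S605 can be condensed into a short behavioural sketch. The model below assumes the simplest configuration (one filter, one cache, one ALU, one accumulator, direct output) and is illustrative only; `fir_filter_bank` is a hypothetical name, not an interface defined by the patent.

```python
def fir_filter_bank(samples, coeffs):
    """Behavioural sketch of S601-S605 for a single filter: each sample
    enters over the bus (S601), shifts through the filter cache (S602),
    is multiply-added against the coefficients (S603), the products are
    accumulated (S604), and the result goes to the output port (S605)."""
    cache = [0] * len(coeffs)   # filter cache to be computed
    out = []
    for s in samples:
        cache = [s] + cache[:-1]          # S601/S602: bus -> cache shift
        acc = 0
        for c, d in zip(coeffs, cache):   # S603: ALU multiply-add
            acc += c * d                  # S604: accumulator
        out.append(acc)                   # S605: output port
    return out

y = fir_filter_bank([1, 0, 0, 2], [3, 5, 7])
```

Feeding a unit impulse through the model returns the coefficients themselves, which is the defining property of an FIR filter and a quick sanity check on any configuration.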
The above is the method flow by which the FIR filter bank filters input data. To illustrate the detailed application of the technical solution of this embodiment, the application of the FIR filter bank is briefly described through four specific embodiments, Embodiments Three to Six.
Embodiment Three
Take two cascaded filters as an example. Let the first-stage filter have 12 coefficients with even symmetry, 2x decimation, and a 3x input multiplexing ratio; let the second-stage filter have 47 coefficients with odd symmetry, 2x decimation, and a 6x input multiplexing ratio. Based on the FIR filter bank and filtering method of the foregoing embodiments, the specific implementation is as follows:
The first-stage filter is set to occupy the two cache resource blocks ID0 and ID1. The cache cascade switch of block ID0 is set to connect to the data stream bus array, so that the input of ID0's register group is provided by the bus array; the cache cascade switch of block ID1 is set to connect to the preceding block ID0, so that the input of ID1's register group is provided by the preceding cache resource block.
The second-stage filter is set to occupy the six cache resource blocks ID2 to ID7. The cache cascade switch of block ID2 is set to connect to the data stream bus array; the cache cascade switches of blocks ID3 to ID7 are all set to connect to their preceding cache resource blocks.
Since the first-stage filtering can complete its 6 multiply-add operations within 6 cycles, only one ALU (ALU0) in the ALU resource pool is needed. As shown in FIG. 7, which shows the data structure of the cache resource blocks of the first-stage filter, the last two registers of block ID0 are bypassed, so that blocks ID0 and ID1 can add the tap data corresponding to the symmetric coefficients. Blocks ID0 and ID1 are mapped by the cache resource mapper onto the two ALU cache blocks of ALU0. Under control of the ALU controller, the symmetric data pairs (d0, d11), (d1, d10), (d2, d9), (d3, d8), (d4, d7), and (d5, d6) are added in turn over 6 cycles, then multiplied by the coefficients, accumulated, and truncated to obtain the output data of the first-stage filter.
Subsequently, under control of the data stream controller, the cache resource ID is attached to the output data of the first-stage filter, which is routed over the data stream bus array to cache resource block ID2, forming the cache storage of the second-stage filter, as shown in FIG. 8. Since the second-stage filter can complete its 24 multiply-add operations within 12 cycles, two ALUs in the ALU resource pool, designated ALU1 and ALU2, are needed. In the first four cycles, ID2 and ID7 are mapped simultaneously onto the two ALU cache blocks of ALU1 and ALU2: ALU1 completes the add-multiply operations for (d0, d46), (d1, d45), (d2, d44), and (d3, d43), while ALU2 completes those for (d4, d42), (d5, d41), (d6, d40), and (d7, d39). In the following 8 cycles, blocks ID3 and ID6 are mapped onto the two caches of ALU1, and ID4 and ID5 onto the two caches of ALU2, completing the remaining data operations. The outputs of ALU1 and ALU2 are then added in the accumulator resource pool and further accumulated; finally, the accumulated result is truncated and output through the output port after output sorting, realizing the two cascaded filters.
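The 24-operation budget of the second stage follows from the coefficient symmetry: a 47-tap filter with h[k] == h[46-k] reduces to 23 pre-added symmetric pairs plus the centre tap d23. The sketch below checks that count and the equivalence with the direct dot product; the coefficient and sample values are hypothetical, chosen only to make the identity testable.

```python
def symmetric_47tap(d, h):
    """47-tap symmetric FIR via pre-added pairs: 23 pairs (d0,d46) ...
    (d22,d24) plus the centre tap d23, i.e. 24 multiply-adds in total,
    which ALU1 and ALU2 split between them in the schedule above."""
    acc = 0
    ops = 0
    for k in range(23):                  # symmetric pairs
        acc += (d[k] + d[46 - k]) * h[k]
        ops += 1
    acc += d[23] * h[23]                 # centre tap
    ops += 1
    return acc, ops

# Hypothetical symmetric coefficients (mirror the first 24) and samples.
h = [(k % 7) - 3 for k in range(24)]
h = h + h[-2::-1]                        # now h[k] == h[46 - k], len(h) == 47
d = [(3 * k) % 11 for k in range(47)]
y_fast, ops = symmetric_47tap(d, h)
y_direct = sum(h[k] * d[k] for k in range(47))
```

With 24 multiply-adds and two ALUs, 12 cycles suffice, which fits inside the 12-cycle window that the 6x input multiplexing ratio and 2x decimation leave per output.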
Embodiment Four
Take two cascaded filters reusing one ALU unit as an example. Let the first-stage filter have 12 coefficients with even symmetry, 2x decimation, and a 12x input multiplexing ratio; let the second-stage filter have 24 coefficients with even symmetry, 2x decimation, and a 24x input multiplexing ratio. Based on the FIR filter bank and filtering method of the foregoing embodiments, the specific implementation is as follows:
The first-stage filter is set to occupy the two cache resource blocks ID0 and ID1. The cache cascade switch of block ID0 is set to connect to the data stream bus array, so that the input of ID0's register group is provided by the bus array; the cache cascade switch of block ID1 is set to connect to the preceding block ID0, so that the input of ID1's register group is provided by the preceding cache resource block.
The second-stage filter is set to occupy the three cache resource blocks ID2 to ID4. The cache cascade switch of block ID2 is set to connect to the data stream bus array; the cache cascade switches of blocks ID3 and ID4 are set to connect to their preceding cache resource blocks.
The processing of the first-stage filter is the same as in Embodiment Three and is not repeated here. The first-stage filter completes its operations within 6 cycles, and the second-stage filter's operations are completed in the remaining 18 cycles. Specifically, the 24 multiply-add operations of the second-stage filter can be completed within 12 beats. In the first 4 cycles, cache resource block ID3 is mapped onto both ALU cache blocks of ALU0 to complete the add-multiply operations of the middle 8 data; in the following 8 cycles, blocks ID2 and ID4 are mapped onto the two ALU cache blocks of ALU0 to complete the remaining data operations. This realizes a configuration in which two cascaded filters reuse one ALU unit.
Embodiment Five
Take two parallel groups of filters, one of them an interpolation filter, as an example. The first group may be two cascaded filters; the specific cache resource blocks, the allocation of ALUs in the ALU resource pool, and the specific processing are as described in Embodiment Three and are not repeated here.
Let the second group be an independent filter with 32 coefficients with even symmetry, an 8x input multiplexing ratio, and 2x interpolation. Since the second group performs 2x interpolation, only 16 data need to be stored, occupying the two cache resource blocks ID5 and ID6.
Since the second group is an interpolation filter that must complete two sets of 16-data multiply-add operations within 8 cycles, two ALUs in the ALU resource pool are needed to perform the odd-phase and even-phase data computations respectively. Cache resource blocks ID5 and ID6 are mapped onto ALU1 and ALU2 respectively. ALU1 multiplies the pre-added data one by one with the odd-phase coefficients to obtain the odd-phase filtering result, which, after accumulation and truncation, is output to the output port. The even-phase filtering result is computed by ALU2. At the output port, the timing is arranged so that the odd- and even-phase values are output in turn as required, yielding the output of the second filter group.
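The odd/even split is the standard polyphase decomposition of an interpolator: the 32 coefficients divide into two 16-tap phases that both read the same 16 stored samples, and interleaving the two phase outputs equals zero-stuffing the input and running the full 32-tap filter. The sketch below checks that identity; the coefficients and samples are hypothetical (the identity holds with or without coefficient symmetry), and the function names are assumptions.

```python
def interp2x(samples, h):
    """2x polyphase interpolation sketch: the 32 coefficients split into
    an even phase h[0::2] and an odd phase h[1::2]; each input sample
    yields two outputs from the same 16 stored data (in the embodiment,
    ALU2 computes the even phase and ALU1 the odd phase)."""
    h_even, h_odd = h[0::2], h[1::2]
    cache = [0] * 16
    out = []
    for s in samples:
        cache = [s] + cache[:-1]
        out.append(sum(c * d for c, d in zip(h_even, cache)))  # even phase
        out.append(sum(c * d for c, d in zip(h_odd, cache)))   # odd phase
    return out

def interp2x_direct(samples, h):
    """Reference: zero-stuff the input, then run the full 32-tap filter."""
    stuffed = []
    for s in samples:
        stuffed += [s, 0]
    cache = [0] * 32
    out = []
    for s in stuffed:
        cache = [s] + cache[:-1]
        out.append(sum(c * d for c, d in zip(h, cache)))
    return out

h = [(k * k) % 9 - 4 for k in range(32)]   # hypothetical coefficients
x = [5, -1, 2, 0, 7]
y_poly = interp2x(x, h)
y_ref = interp2x_direct(x, h)
```

This is why only 16 samples need to be cached for a 32-tap 2x interpolator: half of the zero-stuffed inputs are known to be zero and never stored.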
Embodiment Six
Take the handling of multiple groups of data stream buses in the data stream bus array as an example to describe two parallel filter groups. Let the first group be two cascaded filters with input multiplexing ratios of 2 and 4, and let the second group, in parallel with it, be a single filter with an input multiplexing ratio of 2. Each filter occupies one cache resource block and one ALU computing unit. In this case, a single group of data stream buses in the array clearly cannot satisfy the throughput requirement: since the sum of the reciprocals of the multiplexing ratios of the three filters equals 5/4, two groups of data stream buses in the data stream bus array are needed.
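The bus-count rule can be stated in a few lines: each filter with input multiplexing ratio r consumes 1/r of one bus group's bandwidth, so the number of bus groups needed is the ceiling of the sum of the reciprocals. The helper name `buses_needed` is an assumption for illustration.

```python
from fractions import Fraction
from math import ceil

def buses_needed(mux_ratios):
    """A filter with input multiplexing ratio r consumes 1/r of a bus
    group's bandwidth; the required number of bus groups is the ceiling
    of the summed reciprocals."""
    load = sum(Fraction(1, r) for r in mux_ratios)
    return load, ceil(load)

# Embodiment Six: ratios 2 and 4 (first group) plus 2 (second group).
load_all, n_buses = buses_needed([2, 4, 2])   # total load 5/4 -> 2 bus groups
load_first, _ = buses_needed([2, 4])          # first group alone: 3/4 of one bus
```

The same arithmetic motivates the assignment in the next paragraph: the first group's 3/4 load fits on one bus group, leaving the second filter its own bus group.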
Since the sum of the reciprocals of the multiplexing ratios of the two cascaded filters in the first group equals 3/4, the first group of filters can be assigned to the first group of data stream buses, while the second filter group uses the second group of data stream buses alone.
When there are multiple groups of data stream buses, an extra stage of selection and matching logic must be added at the interface between the cache resource blocks and the data stream buses. At cache resource blocks ID0 and ID1, the cache cascade switch is set to connect to the first group of data stream buses; at cache resource block ID2, it is set to connect to the second group of data stream buses. Likewise, the output of the accumulator is routed over the first group of data stream buses to cache resource block ID1, realizing the cascade of the two stages within the first group's two cascaded filters.
From the above descriptions of specific application scenarios, it can be seen that, because the filtering method proposed in the embodiments of the present invention is applied to the FIR filter bank of Embodiment One, all hardware resources of the filters can be managed in a unified manner, so that the internal hardware resources of the filter bank become reconfigurable, reusable, and flexibly configurable, and different filter combinations can be satisfied at reasonable resource cost and speed.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above are only preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention.
Industrial Applicability
Embodiments of the present invention provide an FIR filter bank and a filtering method. By managing all hardware resources of the filters in a unified manner, the internal hardware resources of the filter bank become reconfigurable, reusable, and flexibly configurable, and different filter combinations can be satisfied at reasonable resource cost and speed.

Claims (10)

  1. A finite impulse response (FIR) filter bank, comprising a control circuit and a data processing circuit coupled to each other; the data processing circuit comprising a data stream bus array, a cache resource pool, an arithmetic logic unit (ALU) resource pool, and an accumulator resource pool; the control circuit comprising: a data stream controller, a cache resource mapper, a filter coefficient memory, an ALU controller, an accumulation resource organizer, and an output timing controller; wherein,
    the data stream bus array is configured to receive input data from an input port and output data from the accumulator resource pool, and to transmit the input data and the output data to the cache resource pool under control of the data stream controller, or to transmit the output data to an output port under control of the output timing controller;
    the cache resource pool comprises at least one cache resource block and is configured to receive the data transmitted over the data stream bus array under control of the data stream controller, the data stream controller controlling that data according to the filter order, the number of filters, and their cascade relationships so as to form the filter caches to be computed;
    the ALU resource pool comprises at least one ALU and is configured to perform multiply-add computation on the filter caches to be computed according to the cache resource mapper, the filter coefficient memory, and the ALU controller, and to transmit the results of the multiply-add computation to the accumulator resource pool through the accumulation resource organizer;
    the accumulator resource pool comprises at least one accumulator, each accumulator corresponding one-to-one to an ALU in the ALU resource pool, and is configured to add the multiply-add results of the ALUs through the accumulation resource organizer according to the allocation of filtering resources to obtain the filtering result, and to transmit the filtering result to the data stream bus array.
  2. The FIR filter bank according to claim 1, wherein the data structure of a data stream bus in the data stream bus array comprises: the data, a cache resource block identifier corresponding to the data, and a flag bit indicating that the data is new.
  3. The FIR filter bank according to claim 1, wherein each cache resource block in the cache resource pool comprises at least one serially connected register group and one cache cascade switch; the cache cascade switch of each cache resource block has three inputs and one output, wherein the first input of the cache cascade switch is connected to the data stream bus array, the second input of the cache cascade switch is connected to the output of the preceding cache resource block, the third input of the cache cascade switch is connected to the data stream controller, and the output of the cache cascade switch is connected to the input of the register group.
  4. The FIR filter bank according to claim 3, wherein, when the data stream controller opens the first input of the cache cascade switch and closes the second input, the input data of the register group of the cache resource block is provided by the data stream bus array;
    when the data stream controller closes the first input of the cache cascade switch and opens the second input, the input data of the register group of the cache resource block is provided by the preceding cache resource block.
  5. The FIR filter bank according to claim 1, wherein each ALU in the ALU resource pool comprises two ALU cache blocks, an adder, a multiplier, and a truncation circuit; wherein the two ALU cache blocks correspond respectively to the filter caches to be computed output by two cache resource blocks, and the size of each ALU cache block equals the size of the register group in a cache resource block.
  6. The FIR filter bank according to claim 5, wherein the two ALU cache blocks are connected to the two input ports of the adder, and the ALU controller feeds the cached data of the two ALU cache blocks into the adder;
    the output of the adder is connected to the multiplier, and the other input of the multiplier is connected to the filter coefficient memory, wherein, after being initialized by software, the coefficients in the filter coefficient memory are input to the multiplier in a preset order for the filtering operation;
    the result of the multiplication is truncated by the truncation circuit and then sent to the accumulator resource pool.
  7. The FIR filter bank according to claim 1, wherein each accumulator in the accumulator resource pool comprises an adder, a truncator, and a buffer; wherein the adder is configured to add the data of two ALUs or to self-accumulate; the accumulation resource organizer adds the multiply-add results of the ALUs through the adder, the truncator, and the buffer according to the allocation of filtering resources to obtain the filtering result.
  8. The FIR filter bank according to claim 1, wherein the data stream controller is configured to control, according to the configuration, the filtering result of an accumulator in the accumulator resource pool to be output to the data stream bus array; or,
    the data stream controller is configured to control, according to the configuration, the filtering result of an accumulator in the accumulator resource pool to be output to the output port.
  9. A filtering method, applied to the FIR filter bank according to any one of claims 1 to 8, the method comprising:
    after input data is received through the input port, transmitting the input data to the data stream bus array;
    a cache resource block in the cache resource pool receiving the data transmitted over the data stream bus array under control of the data stream controller, the data stream controller controlling the data according to the filter order, the number of filters, and their cascade relationships to form the filter caches to be computed;
    an ALU in the arithmetic logic unit (ALU) resource pool performing multiply-add computation on the filter caches to be computed according to the cache resource mapper, the filter coefficient memory, and the ALU controller, and transmitting the results of the multiply-add computation to the accumulator resource pool through the accumulation resource organizer;
    an accumulator in the accumulator resource pool adding the multiply-add results of the ALUs through the accumulation resource organizer according to the allocation of filtering resources to obtain the filtering result;
    transmitting the filtering result to the data stream bus array under control of the data stream controller, and transmitting the filtering result on the data stream bus array to the output port.
  10. The method according to claim 9, further comprising: the data stream controller looping the filtering result back to a cache resource block by controlling the data stream bus array.
PCT/CN2015/098343 2015-11-03 2015-12-22 一种fir滤波器组及滤波方法 WO2017075868A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510745052.6A CN106656103B (zh) 2015-11-03 2015-11-03 一种fir滤波器组及滤波方法
CN201510745052.6 2015-11-03

Publications (1)

Publication Number Publication Date
WO2017075868A1 true WO2017075868A1 (zh) 2017-05-11

Family

ID=58661524

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/098343 WO2017075868A1 (zh) 2015-11-03 2015-12-22 一种fir滤波器组及滤波方法

Country Status (2)

Country Link
CN (1) CN106656103B (zh)
WO (1) WO2017075868A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI768504B (zh) * 2020-10-12 2022-06-21 瑞昱半導體股份有限公司 濾波器電路與信號處理方法

Families Citing this family (4)

Publication number Priority date Publication date Assignee Title
CN108039941B (zh) * 2017-12-13 2020-10-20 重庆邮电大学 Lte-a控制信道解资源映射的方法
CN109802691A (zh) * 2019-01-24 2019-05-24 中科驭数(北京)科技有限公司 序列数据的滤波方法及装置
CN114584108A (zh) * 2020-11-30 2022-06-03 深圳市中兴微电子技术有限公司 滤波器单元以及滤波器阵列
CN112822783B (zh) * 2020-12-31 2023-03-21 联想未来通信科技(重庆)有限公司 一种资源调度方法、装置和系统

Citations (6)

Publication number Priority date Publication date Assignee Title
US20030028569A1 (en) * 2000-01-14 2003-02-06 Brokish Charles W. Delayed adaptive least-mean-square digital filter
CN1866738A (zh) * 2006-06-12 2006-11-22 许金生 一种通用可编程数字滤波器及其工作方法
CN102510273A (zh) * 2011-12-27 2012-06-20 中国科学院自动化研究所 一种有限脉冲响应滤波器
CN103269212A (zh) * 2013-05-14 2013-08-28 邓晨曦 低成本低功耗可编程多级fir滤波器实现方法
CN103378820A (zh) * 2012-04-19 2013-10-30 中兴通讯股份有限公司 可编程数字滤波实现方法、装置、基带芯片及其终端
CN104539263A (zh) * 2014-12-25 2015-04-22 电子科技大学 一种可重构低功耗数字fir滤波器



Also Published As

Publication number Publication date
CN106656103B (zh) 2019-07-19
CN106656103A (zh) 2017-05-10

Similar Documents

Publication Publication Date Title
WO2017075868A1 (zh) 一种fir滤波器组及滤波方法
US9792118B2 (en) Vector processing engines (VPEs) employing a tapped-delay line(s) for providing precision filter vector processing operations with reduced sample re-fetching and power consumption, and related vector processor systems and methods
US9684509B2 (en) Vector processing engines (VPEs) employing merging circuitry in data flow paths between execution units and vector data memory to provide in-flight merging of output vector data stored to vector data memory, and related vector processing instructions, systems, and methods
TWI601066B (zh) 具有用於提供多模基-2x蝶形向量處理電路的可程式設計資料路徑的向量處理引擎以及相關的向量處理器、系統和方法
EP2972968B1 (en) Vector processing engines having programmable data path configurations for providing multi-mode vector processing, and related vector processors, systems, and methods
US9880845B2 (en) Vector processing engines (VPEs) employing format conversion circuitry in data flow paths between vector data memory and execution units to provide in-flight format-converting of input vector data to execution units for vector processing operations, and related vector processor systems and methods
WO2015073646A1 (en) Vector processing engine employing reordering circuitry in data flow paths between vector data memory and execution units, and related method
US7325123B2 (en) Hierarchical interconnect for configuring separate interconnects for each group of fixed and diverse computational elements
WO2002012978A2 (en) Configurable function processing cell linear array in computation engine coupled to host units
US20150143079A1 (en) VECTOR PROCESSING ENGINES (VPEs) EMPLOYING TAPPED-DELAY LINE(S) FOR PROVIDING PRECISION CORRELATION / COVARIANCE VECTOR PROCESSING OPERATIONS WITH REDUCED SAMPLE RE-FETCHING AND POWER CONSUMPTION, AND RELATED VECTOR PROCESSOR SYSTEMS AND METHODS
EP3069236A1 (en) Vector processing engine employing despreading circuitry in data flow paths between execution units and vector data memory, and related method
WO2021073137A1 (zh) 一种可重构处理器和可重构处理器系统
CN113064852B (zh) 一种可重构处理器及配置方法
JP6537823B2 (ja) ソフトウェア・デファインド・ネットワーク処理エンジンにおける並行かつ条件付きのデータ操作の方法および装置
CN105631013B (zh) 生成哈希值的装置和方法
US9727526B2 (en) Apparatus and method of vector unit sharing
JPH11266140A (ja) ディジタルフィルタを実現するプログラム可能な回路
JP2015503785A (ja) Fft/dftの逆順ソーティングシステム、方法およびその演算システム
US10826982B2 (en) Packet processing architecture and method therefor
US20160344373A1 (en) Resource-saving circuit structures for deeply pipelined systolic finite impulse response filters
CN106708467B (zh) 一种宽位累加器电路及其设计方法、可编程逻辑器件
CN114448390A (zh) 一种Biquad数字滤波器装置及实现方法
US6401106B1 (en) Methods and apparatus for performing correlation operations
CN206147622U (zh) 一种SoC系统中通用可配置加速单元的IP电路
CN103955353B (zh) 具有面向全分布式超长指令字的高能效局部互连结构的装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15907710

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15907710

Country of ref document: EP

Kind code of ref document: A1