CN107645287B - 6 parallel rapid FIR filter - Google Patents
- Publication number: CN107645287B (application number CN201710396331.5A)
- Authority: CN (China)
- Prior art keywords: parallel, fast, convolution, FIR filter, filter
- Legal status: Active (as listed by Google Patents; the status is an assumption, not a legal conclusion)
Landscapes
- Complex Calculations (AREA)
- Filters That Use Time-Delay Elements (AREA)
Abstract
The invention discloses a size-configurable hardware implementation of convolution based on a cascaded 6-parallel fast finite impulse response (FIR) filter structure. It can compute convolutions of four sizes, 3 × 3, 5 × 5, 7 × 7 and 11 × 11, reducing the complexity of the convolution calculation while keeping the throughput of the 6-parallel structure. The invention first introduces 2-parallel and 3-parallel fast FIR algorithm structures and then builds a 6-parallel fast FIR algorithm (FFA) by cascading a 2-parallel structure with 3-parallel substructures. On the basis of the 6-parallel FFA, configurable sub-filters are used to design a fast convolution hardware architecture that supports all four sizes, 3 × 3, 5 × 5, 7 × 7 and 11 × 11. Compared with a conventional 6-parallel FIR filter of the same throughput, the algorithm saves 50% of the multiplications at the cost of some additional additions. Because multipliers consume far more area and power than adders in hardware, the structure saves roughly 50% of the area and power consumption. The invention can be applied wherever convolutions of these typical sizes (3 × 3, 5 × 5, 7 × 7 and 11 × 11) are required, such as convolutional neural networks, video and image processing and wireless communications, to increase the effective throughput of an existing filter or to reduce its power consumption.
Description
Technical Field
The invention relates to the fields of integrated circuits and machine learning, and in particular to a 6-parallel fast FIR filter structure for the hardware implementation of a general convolution circuit that supports all four kernel sizes used in convolutional neural networks: 3 × 3, 5 × 5, 7 × 7 and 11 × 11.
Background
Convolutional neural networks (CNNs) are currently among the most studied and most widely used machine learning algorithms. Convolution is the most computationally expensive part of a CNN. In hardware it is carried out as multiply-accumulate operations, and multipliers are costly: a multiplier occupies roughly ten times the area and consumes roughly ten times the power of an adder, so optimizing the hardware implementation of convolution is significant. Most CNNs use 3 × 3 or 5 × 5 kernels, a small fraction use the larger 7 × 7 and 11 × 11 kernels, and other sizes are rarely used.
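The time-domain expression of an N-tap FIR filter is
$y(n) = \sum_{i=0}^{N-1} h(i)\,x(n-i), \quad n = 0, 1, 2, \ldots$
and its polynomial representation in the z domain is
$Y(z) = H(z)X(z) = \left(\sum_{i=0}^{N-1} h(i)z^{-i}\right)\left(\sum_{n=0}^{\infty} x(n)z^{-n}\right)$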
where x(n) is an infinitely long input sequence and h(n) holds the N FIR filter coefficients. It can be seen that if {h(n)} is regarded as the coefficients of an N-point discrete convolution, the FIR filter performs one N × N convolution calculation.
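As a minimal illustration of the time-domain expression, the Python sketch below computes each output with N multiply-accumulate steps. The function name `fir_direct` is hypothetical, and a finite input segment with zero initial state is assumed. This per-output cost of N multiplications is what the fast FIR algorithm aims to reduce.

```python
def fir_direct(x, h):
    """Direct-form N-tap FIR filter: y(n) = sum_{i=0}^{N-1} h(i) * x(n - i).

    Samples before x(0) are treated as zero and len(y) == len(x).
    """
    y = []
    for n in range(len(x)):
        acc = 0
        for i, hi in enumerate(h):
            if n - i >= 0:
                acc += hi * x[n - i]  # one multiply-accumulate per tap
        y.append(acc)
    return y


if __name__ == "__main__":
    h = [1, 2, 3]            # N = 3 coefficients
    x = [1, 0, 0, 4, 5]      # finite stretch of the input sequence
    print(fir_direct(x, h))  # [1, 2, 3, 4, 13]
```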
The fast FIR algorithm (FFA) is obtained by applying algorithmic strength reduction to an FIR filter; its core idea is to lower hardware complexity by sharing substructures.
Disclosure of Invention
The main innovative contents of the invention are as follows:
building on existing parallel fast FIR algorithms and on schemes that cascade FFAs to reach large block sizes, a hardware implementation of the 6-parallel fast FIR algorithm (FFA) is proposed for the first time;
on the basis of this 6-parallel fast convolution kernel, a general-purpose fast convolution hardware circuit is designed that is compatible with all four kernel sizes commonly used in CNNs: 3 × 3, 5 × 5, 7 × 7 and 11 × 11.
The theoretical analysis of the invention is as follows.
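In the z domain, the polynomial representation of an N-tap FIR filter is
$Y(z) = H(z)X(z) = \left(\sum_{i=0}^{N-1} h(i)z^{-i}\right)\left(\sum_{n=0}^{\infty} x(n)z^{-n}\right)$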
First, consider the one-stage 2-parallel fast FIR filter.
The input sequence {x(0), x(1), x(2), x(3), …} can be split into even and odd terms as follows:
$X(z) = x(0) + x(1)z^{-1} + x(2)z^{-2} + x(3)z^{-3} + \cdots$
$= [x(0) + x(2)z^{-2} + x(4)z^{-4} + \cdots] + z^{-1}[x(1) + x(3)z^{-2} + x(5)z^{-4} + \cdots]$
$= X_0(z^2) + z^{-1}X_1(z^2)$
where $X_0(z^2)$ and $X_1(z^2)$ are the z-transforms of x(2k) and x(2k+1), respectively. Similarly, an N-tap filter H(z) can be split into two parts:
$H(z) = H_0(z^2) + z^{-1}H_1(z^2)$
where $H_0(z^2)$ and $H_1(z^2)$, each of length N/2, are the even and odd sub-filters. The output sequence y(n) is likewise split into even and odd terms and computed as
$Y(z) = Y_0(z^2) + z^{-1}Y_1(z^2)$
$= (X_0 + z^{-1}X_1)(H_0 + z^{-1}H_1)$
$= (X_0H_0 + z^{-2}X_1H_1) + z^{-1}(X_1H_0 + X_0H_1)$
where
$Y_0 = X_0H_0 + z^{-2}X_1H_1$
$Y_1 = X_1H_0 + X_0H_1$
Applying the fast FIR algorithm (FFA) yields the one-stage structure, i.e. the 2-parallel fast FIR filter. Several 2-parallel FFA structures exist; a typical one is
$Y_0 = X_0H_0 + z^{-2}X_1H_1$
$Y_1 = (H_0 + H_1)(X_0 + X_1) - X_0H_0 - X_1H_1$
This form needs three sub-filters of length N/2, i.e. 3N/2 multiplications per two outputs instead of 2N, at the cost of extra additions.
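The 2-parallel FFA can be checked numerically. The following is a minimal sketch, not the patent's circuit: sequences are plain Python lists, the $z^{-2}$ delay is modeled as a one-sample delay of the decimated (block-rate) stream, and the names `ffa2`, `fir` and `delay` are hypothetical. It compares the three-sub-filter result with a direct-form reference.

```python
def fir(x, h):
    # reference direct-form FIR with zero initial state; len(y) == len(x)
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n >= i)
            for n in range(len(x))]


def delay(s, k=1):
    # k-sample delay at the block rate (z^-2 at full rate = 1 block here)
    return ([0] * k + s)[:len(s)]


def add(a, b):
    return [u + v for u, v in zip(a, b)]


def sub(a, b):
    return [u - v for u, v in zip(a, b)]


def ffa2(x, h):
    """2-parallel fast FIR filter using three sub-filters of length N/2."""
    assert len(x) % 2 == 0 and len(h) % 2 == 0
    x0, x1 = x[0::2], x[1::2]            # X0: x(2k), X1: x(2k+1)
    h0, h1 = h[0::2], h[1::2]            # even and odd sub-filters
    p0 = fir(x0, h0)                     # X0 H0
    p1 = fir(x1, h1)                     # X1 H1
    p2 = fir(add(x0, x1), add(h0, h1))   # (X0 + X1)(H0 + H1)
    y0 = add(p0, delay(p1))              # Y0 = X0H0 + z^-2 X1H1
    y1 = sub(sub(p2, p0), p1)            # Y1 = (H0+H1)(X0+X1) - X0H0 - X1H1
    return [v for pair in zip(y0, y1) for v in pair]  # interleave Y0, Y1


if __name__ == "__main__":
    x = list(range(1, 11))
    h = [1, -2, 3, 4]
    print(ffa2(x, h) == fir(x, h))  # True
```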
Next consider the 3-parallel fast FIR filter structure. With a three-phase polyphase decomposition, the input sequence x(n) and the filter coefficient sequence h(n) can be decomposed as
$X(z) = X_0(z^3) + z^{-1}X_1(z^3) + z^{-2}X_2(z^3)$
$H(z) = H_0(z^3) + z^{-1}H_1(z^3) + z^{-2}H_2(z^3)$
where $X_0(z^3)$, $X_1(z^3)$ and $X_2(z^3)$ correspond to the time-domain sequences x(3k), x(3k+1) and x(3k+2), respectively, and $H_0(z^3)$, $H_1(z^3)$ and $H_2(z^3)$ are the three sub-filters. The output of the system is
$Y(z) = Y_0(z^3) + z^{-1}Y_1(z^3) + z^{-2}Y_2(z^3)$
$= (X_0 + z^{-1}X_1 + z^{-2}X_2)(H_0 + z^{-1}H_1 + z^{-2}H_2)$
Many optimized 3-parallel fast FIR filter structures are possible; in matrix form each can be written as
$Y = Q \cdot H \cdot P \cdot X$
where P and Q are the pre-processing and post-processing matrices, respectively, and H is the (diagonal) sub-filter matrix. The hardware block diagram of a 3-parallel FFA follows directly from this expression; FIG. 1 shows the most common 3-parallel FFA structure as an example.
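FIG. 1 is not reproduced here, so the sketch below assumes the widely used 3-parallel FFA from the fast-FIR literature. It has six sub-filters, three input pre-adders (P), seven post-adders and two block-rate registers (Q), which are the adder and register counts claim 1 lists for each 3-parallel substructure. Function names are hypothetical, and $z^{-3}$ is modeled as a one-sample delay of the decimated stream.

```python
def fir(x, h):
    # reference direct-form FIR with zero initial state; len(y) == len(x)
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n >= i)
            for n in range(len(x))]


def delay(s, k=1):
    # k-sample delay at the block rate (z^-3 at full rate = 1 block here)
    return ([0] * k + s)[:len(s)]


def add(*seqs):
    return [sum(v) for v in zip(*seqs)]


def sub(a, b):
    return [u - v for u, v in zip(a, b)]


def ffa3(x, h):
    """3-parallel FFA: six sub-filters of length N/3 instead of nine."""
    assert len(x) % 3 == 0 and len(h) % 3 == 0
    x0, x1, x2 = x[0::3], x[1::3], x[2::3]   # x(3k), x(3k+1), x(3k+2)
    h0, h1, h2 = h[0::3], h[1::3], h[2::3]   # polyphase sub-filters
    # P: three pre-adders on the inputs (coefficient sums can be precomputed)
    x01 = add(x0, x1)
    x12 = add(x1, x2)
    x012 = add(x01, x2)
    # H: six sub-filters
    A = fir(x0, h0)
    B = fir(x1, h1)
    C = fir(x2, h2)
    D = fir(x01, add(h0, h1))
    E = fir(x12, add(h1, h2))
    F = fir(x012, add(h0, h1, h2))
    # Q: seven post-adders and two block-rate registers
    d = sub(D, B)            # (H0+H1)(X0+X1) - H1X1
    e = sub(E, B)            # (H1+H2)(X1+X2) - H1X1
    g = sub(A, delay(C))     # H0X0 - z^-3 H2X2
    y0 = add(g, delay(e))    # Y0
    y1 = sub(d, g)           # Y1
    y2 = sub(sub(F, d), e)   # Y2
    return [v for blk in zip(y0, y1, y2) for v in blk]


if __name__ == "__main__":
    x = list(range(1, 13))
    h = [2, -1, 3, 0, 1, 4]
    print(ffa3(x, h) == fir(x, h))  # True
```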
A 6-parallel FFA can be built by nesting any type of 3-parallel substructure inside any type of 2-parallel structure. Cascading the two most typical FFA structures gives the output
$Y = Y_0 + z^{-1}Y_1 + z^{-2}Y_2 + z^{-3}Y_3 + z^{-4}Y_4 + z^{-5}Y_5$
$= (X'_0 + z^{-1}X'_1)(H'_0 + z^{-1}H'_1)$
$= [X'_0H'_0 + z^{-2}X'_1H'_1] + z^{-1}[(X'_0 + X'_1)(H'_0 + H'_1) - X'_0H'_0 - X'_1H'_1]$
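The outer 2-parallel fast FIR filter structure uses
$X'_0 = X_0 + z^{-2}X_2 + z^{-4}X_4$
$X'_1 = X_1 + z^{-2}X_3 + z^{-4}X_5$
$H'_0 = H_0 + z^{-2}H_2 + z^{-4}H_4$
$H'_1 = H_1 + z^{-2}H_3 + z^{-4}H_5$
where $X_k$ and $H_k$ (k = 0, …, 5) are the six polyphase components, functions of $z^6$.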
Each of the three products is then computed by a 3-parallel FFA. Since the three substructures have the same output form, their outputs are
$X'_0H'_0 = a_0 + z^{-2}a_1 + z^{-4}a_2$
$X'_1H'_1 = a_3 + z^{-2}a_4 + z^{-4}a_5$
$(X'_0 + X'_1)(H'_0 + H'_1) = a_6 + z^{-2}a_7 + z^{-4}a_8$
Note that the three terms of each substructure output carry the factors $z^0$, $z^{-2}$ and $z^{-4}$. Substituting them into the parent 2-parallel structure gives
$Y_0 = a_0 + z^{-6}a_5$
$Y_1 = -a_0 - a_3 + a_6$
$Y_2 = a_1 + a_3$
$Y_3 = -a_1 - a_4 + a_7$
$Y_4 = a_2 + a_4$
$Y_5 = -a_2 - a_5 + a_8$
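To make the cascade concrete, the following minimal sketch computes $a_0, \ldots, a_8$ with three 3-parallel FFAs, one per product, and combines them with the six output expressions. It assumes the commonly used 3-parallel FFA for the substructures. The names are hypothetical, plain lists stand in for signals, and $z^{-6}$ is modeled as a one-sample delay of the block-rate stream.

```python
def fir(x, h):
    # reference direct-form FIR with zero initial state; len(y) == len(x)
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n >= i)
            for n in range(len(x))]


def delay(s, k=1):
    # k-sample delay of a block-rate stream (one register per sample)
    return ([0] * k + s)[:len(s)]


def add(*seqs):
    return [sum(v) for v in zip(*seqs)]


def sub(a, b):
    return [u - v for u, v in zip(a, b)]


def ffa3_phases(s, g):
    """3-parallel FFA on stream s with filter g; returns the 3 output phases."""
    s0, s1, s2 = s[0::3], s[1::3], s[2::3]
    g0, g1, g2 = g[0::3], g[1::3], g[2::3]
    A, B, C = fir(s0, g0), fir(s1, g1), fir(s2, g2)
    D = fir(add(s0, s1), add(g0, g1))
    E = fir(add(s1, s2), add(g1, g2))
    F = fir(add(s0, s1, s2), add(g0, g1, g2))
    d, e, c = sub(D, B), sub(E, B), sub(A, delay(C))
    return add(c, delay(e)), sub(d, c), sub(sub(F, d), e)


def ffa6(x, h):
    """6-parallel FFA: a 2-parallel FFA whose three products are 3-parallel FFAs."""
    assert len(x) % 6 == 0 and len(h) % 6 == 0
    xe, xo = x[0::2], x[1::2]        # X'0 and X'1 as sample streams
    he, ho = h[0::2], h[1::2]        # H'0 and H'1
    a0, a1, a2 = ffa3_phases(xe, he)                    # X'0 H'0
    a3, a4, a5 = ffa3_phases(xo, ho)                    # X'1 H'1
    a6, a7, a8 = ffa3_phases(add(xe, xo), add(he, ho))  # (X'0+X'1)(H'0+H'1)
    Y = [add(a0, delay(a5)),         # Y0 = a0 + z^-6 a5 (one block delay)
         sub(a6, add(a0, a3)),       # Y1 = -a0 - a3 + a6
         add(a1, a3),                # Y2 = a1 + a3
         sub(a7, add(a1, a4)),       # Y3 = -a1 - a4 + a7
         add(a2, a4),                # Y4 = a2 + a4
         sub(a8, add(a2, a5))]       # Y5 = -a2 - a5 + a8
    return [v for blk in zip(*Y) for v in blk]


if __name__ == "__main__":
    x = list(range(1, 13))
    h = [2, -1, 3, 0, 1, 4]
    print(ffa6(x, h) == fir(x, h))  # True
```

With a 6-tap filter every sub-filter has a single tap, so the 18 sub-filters account for the 18 multiplications per 6 outputs.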
The 6-parallel fast FIR filter circuit follows directly from these output expressions. The 6-parallel general convolution kernel contains three 3-parallel FFA substructures. The sub-filter part of the circuit can compute three independent channels of 3 × 3 convolution simultaneously, the whole filter can compute a single-channel 5 × 5 convolution, and with reconfigurable second-order FIR sub-filters the hardware supports all four convolution sizes, 3 × 3, 5 × 5, 7 × 7 and 11 × 11. Mode selection is implemented by adding multiplexers (MUXes). The detailed circuit schematic is shown in FIG. 2, and the schematic of the reconfigurable second-order FIR sub-filter is shown in FIG. 3.
The output module delivers 6 results in parallel per cycle. Computing 6 outputs with a conventional direct-form 6-parallel, 6-tap FIR filter takes 36 multiplications and 30 additions, whereas the 6-parallel fast FIR filter of the invention takes 18 multiplications and 42 additions. Because multipliers consume far more area and power than adders in hardware, the 6-parallel fast FIR filter saves roughly 50% of the hardware resources compared with the conventional direct-form filter. On this basis, a general circuit supporting all four convolution sizes used in convolutional neural networks is realized.
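The operation counts above can be reproduced with a few lines of arithmetic. In this sketch the structure-level adders follow the breakdown listed in claim 1: the 2-parallel stage has 3 pre-adders and 9 post-adders, and each of the three 3-parallel substructures has 3 pre-adders and 7 post-adders. The extension to longer sub-filters is an assumption: each of the 18 sub-filters is treated as a direct-form filter with N/6 taps. For 12 taps this naive count gives 36 multipliers and 60 adders, close to but not identical with the 35 and 59 reported in the summary. The sketch does not model whatever accounts for that difference.

```python
def direct_ops(L, N):
    """Multiplications and additions for L outputs of a direct-form N-tap FIR."""
    return L * N, L * (N - 1)


def ffa6_ops(N):
    """Multiplications and additions per 6 outputs of the 2x3 cascaded FFA.

    Assumption: each of the 18 sub-filters is a direct-form filter with
    N/6 taps (N/6 multiplications and N/6 - 1 additions).
    """
    assert N % 6 == 0
    m = N // 6
    structure_adds = (3 + 9) + 3 * (3 + 7)   # 2-parallel stage + three 3-parallel stages
    return 18 * m, structure_adds + 18 * (m - 1)


if __name__ == "__main__":
    d_mul, d_add = direct_ops(6, 6)
    f_mul, f_add = ffa6_ops(6)
    print(d_mul, d_add, f_mul, f_add)   # 36 30 18 42
    print(1 - f_mul / d_mul)            # 0.5
```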
Drawings
FIG. 1 is a block diagram of a 3-parallel fast FIR filter;
FIG. 2 is a detailed circuit diagram of the general 6-parallel fast FIR filter;
FIG. 3 is a circuit schematic of the second-order reconfigurable FIR sub-filter;
FIG. 4 is a block diagram of the 6-parallel fast FIR filter.
Detailed Description
When mode-select module A receives 0 and mode-select module B receives 0, the circuit performs three-channel 3 × 3 convolution. The input sequences are x_i{n} = {x_i0, x_i1, x_i2} and the convolution coefficient sequences are h_i{n} = {h_i0, h_i1, h_i2}, i = 0, 1, 2, and the inputs are mapped as follows:
X0←x00,X2←x01,X4←x02;H00←h00,H01←h01,H02←h02;
X6←x10,X7←x11,X8←x12;H10←h10,H11←h11,H12←h12;
X1←x20,X3←x21,X5←x22;H20←h20,H21←h21,H22←h22;
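A minimal sketch of this mode follows, under stated assumptions. Each channel drives one 3-parallel substructure with first-order (1-tap) sub-filters: channel 0 uses ports X0/X2/X4 with H00 to H02, channel 1 uses X6/X7/X8 with H10 to H12, and channel 2 uses X1/X3/X5 with H20 to H22. The outer 2-parallel post-adders are assumed to be bypassed, so each substructure's three outputs are read directly. The text states only that the sub-filter part computes three independent channels. The 3-parallel FFA form and all names are illustrative assumptions.

```python
def fir(x, h):
    # reference direct-form FIR with zero initial state; len(y) == len(x)
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n >= i)
            for n in range(len(x))]


def delay(s, k=1):
    # k-sample delay of a block-rate stream
    return ([0] * k + s)[:len(s)]


def add(*seqs):
    return [sum(v) for v in zip(*seqs)]


def sub(a, b):
    return [u - v for u, v in zip(a, b)]


def ffa3_sub(xp, hp):
    """One 3-parallel substructure.

    xp = [X0, X1, X2]: its three input phase streams;
    hp = [H0, H1, H2]: tap lists of its three coefficient registers.
    Returns the three output phase streams [Y0, Y1, Y2].
    """
    (x0, x1, x2), (h0, h1, h2) = xp, hp
    A, B, C = fir(x0, h0), fir(x1, h1), fir(x2, h2)
    D = fir(add(x0, x1), add(h0, h1))
    E = fir(add(x1, x2), add(h1, h2))
    F = fir(add(x0, x1, x2), add(h0, h1, h2))
    d, e, g = sub(D, B), sub(E, B), sub(A, delay(C))
    return [add(g, delay(e)), sub(d, g), sub(sub(F, d), e)]


def three_channel_3x3(channels, kernels):
    """Mode A = 0, B = 0: three independent 3-tap channels, one per substructure."""
    outputs = []
    for x, h in zip(channels, kernels):
        assert len(x) % 3 == 0 and len(h) == 3
        phases = [x[0::3], x[1::3], x[2::3]]   # three samples per cycle
        taps = [[h[0]], [h[1]], [h[2]]]        # first-order (1-tap) sub-filters
        y0, y1, y2 = ffa3_sub(phases, taps)
        outputs.append([v for blk in zip(y0, y1, y2) for v in blk])
    return outputs


if __name__ == "__main__":
    chans = [[1, 2, 3, 4, 5, 6], [0, 1, 0, 0, 2, 0], [3, 3, 3, 1, 1, 1]]
    kers = [[1, 0, -1], [2, 1, 0], [1, 1, 1]]
    outs = three_channel_3x3(chans, kers)
    print(all(o == fir(c, k) for o, c, k in zip(outs, chans, kers)))  # True
```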
When mode-select module A receives 1 and mode-select module B receives 0, the circuit performs a single-channel 5 × 5 convolution. The serial-to-parallel pre-processing circuit converts the single-channel input into six parallel input streams; the input sequence of the general convolution kernel is then x{n} = {x0, x1, x2, x3, x4, x5} and the coefficient sequence is h{n} = {h0, h1, h2, h3, h4, 0}. The 5 × 5 convolution is obtained as the special case of a 6 × 6 convolution whose coefficient h5 is 0, so the inputs are mapped as follows:
X0←x0;H00←h0
X2←x2;H01←h2
X4←x4;H02←h4
X6←z;H10←h0+h1
X7←z;H11←h2+h3
X8←z;H12←h4
X1←x1;H20←h1
X3←x3;H21←h3
X5←x5;H22←0
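The following is a minimal sketch of this mode, assuming the commonly used 3-parallel FFA for each substructure. The hypothetical `coeff_table` fills the nine registers H_ij from a zero-padded 6-tap kernel, with i = 0 for the even branch, 1 for the sum branch and 2 for the odd branch. `ffa6_table` then runs the 6-parallel FFA from that table. The sum-branch inputs are formed from X0+X1, X2+X3 and X4+X5; reading the "z" entries for X6 to X8 as "fed by the pre-adders" is an assumption. The result is checked against a direct 5-tap filter.

```python
def fir(x, h):
    # reference direct-form FIR with zero initial state; len(y) == len(x)
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n >= i)
            for n in range(len(x))]


def delay(s, k=1):
    # k-sample delay of a block-rate stream
    return ([0] * k + s)[:len(s)]


def add(*seqs):
    return [sum(v) for v in zip(*seqs)]


def sub(a, b):
    return [u - v for u, v in zip(a, b)]


def ffa3_sub(xp, hp):
    # one 3-parallel substructure; coefficient sums are formed from hp
    (x0, x1, x2), (h0, h1, h2) = xp, hp
    A, B, C = fir(x0, h0), fir(x1, h1), fir(x2, h2)
    D = fir(add(x0, x1), add(h0, h1))
    E = fir(add(x1, x2), add(h1, h2))
    F = fir(add(x0, x1, x2), add(h0, h1, h2))
    d, e, g = sub(D, B), sub(E, B), sub(A, delay(C))
    return [add(g, delay(e)), sub(d, g), sub(sub(F, d), e)]


def coeff_table(h):
    """Nine coefficient registers H_ij for a 6M-tap filter (M taps each)."""
    assert len(h) % 6 == 0
    t = {}
    for j in range(3):
        t[0, j] = h[2 * j::6]              # even branch: h(2j), h(2j+6), ...
        t[2, j] = h[2 * j + 1::6]          # odd branch: h(2j+1), h(2j+7), ...
        t[1, j] = add(t[0, j], t[2, j])    # sum branch
    return t


def ffa6_table(x, t):
    """6-parallel FFA driven by a coefficient table; len(x) % 6 == 0."""
    assert len(x) % 6 == 0
    X = [x[k::6] for k in range(6)]                           # X0 ... X5
    a0, a1, a2 = ffa3_sub([X[0], X[2], X[4]], [t[0, j] for j in range(3)])
    a3, a4, a5 = ffa3_sub([X[1], X[3], X[5]], [t[2, j] for j in range(3)])
    sums = [add(X[0], X[1]), add(X[2], X[3]), add(X[4], X[5])]  # pre-adders
    a6, a7, a8 = ffa3_sub(sums, [t[1, j] for j in range(3)])
    Y = [add(a0, delay(a5)), sub(a6, add(a0, a3)), add(a1, a3),
         sub(a7, add(a1, a4)), add(a2, a4), sub(a8, add(a2, a5))]
    return [v for blk in zip(*Y) for v in blk]


def conv5(x, k):
    """Mode A = 1, B = 0: a 5-tap kernel run as a 6-tap filter with h5 = 0."""
    assert len(k) == 5
    return ffa6_table(x, coeff_table(list(k) + [0]))


if __name__ == "__main__":
    k = [1, 2, 3, 4, 5]
    t = coeff_table(k + [0])
    print(t[1, 0], t[1, 2], t[2, 2])   # [3] [5] [0]
    x = list(range(1, 19))
    print(conv5(x, k) == fir(x, k))    # True
```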
When mode-select module A receives 1 and mode-select module B receives 1, the circuit performs a single-channel 11 × 11 convolution. The serial-to-parallel pre-processing circuit again converts the single-channel input into six parallel input streams; the input sequence is x{n} = {x0, x1, …, x5}, {x6, x7, …, x11} and the coefficient sequence is h{n} = {h0, h1, …, h10, 0}. The 11 × 11 convolution is obtained as the special case of a 12 × 12 convolution whose coefficient h11 is 0, and the inputs are mapped as follows:
X0←{x0,x6};H00←{h0,h6}
X2←{x2,x8};H01←{h2,h8}
X4←{x4,x10};H02←{h4,h10}
X6←z;H10←{h0+h1,h6+h7}
X7←z;H11←{h2+h3,h8+h9}
X8←z;H12←{h4+h5,h10}
X1←{x1,x7};H20←{h1,h7}
X3←{x3,x9};H21←{h3,h9}
X5←{x5,x11};H22←{h5,0}
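With a 12-tap filter each coefficient register holds two taps, which matches the second-order setting of the reconfigurable sub-filter. This minimal sketch uses the same illustrative assumptions as a table-driven 6-parallel FFA: the commonly used 3-parallel FFA per substructure, hypothetical names, and sum-branch inputs formed by pre-adders. It pads an 11-tap kernel with h11 = 0 and checks the result against a direct 11-tap filter.

```python
def fir(x, h):
    # reference direct-form FIR with zero initial state; len(y) == len(x)
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n >= i)
            for n in range(len(x))]


def delay(s, k=1):
    # k-sample delay of a block-rate stream (z^-6 at full rate = 1 block)
    return ([0] * k + s)[:len(s)]


def add(*seqs):
    return [sum(v) for v in zip(*seqs)]


def sub(a, b):
    return [u - v for u, v in zip(a, b)]


def ffa3_sub(xp, hp):
    # one 3-parallel substructure; every sub-filter here has two taps
    (x0, x1, x2), (h0, h1, h2) = xp, hp
    A, B, C = fir(x0, h0), fir(x1, h1), fir(x2, h2)
    D = fir(add(x0, x1), add(h0, h1))
    E = fir(add(x1, x2), add(h1, h2))
    F = fir(add(x0, x1, x2), add(h0, h1, h2))
    d, e, g = sub(D, B), sub(E, B), sub(A, delay(C))
    return [add(g, delay(e)), sub(d, g), sub(sub(F, d), e)]


def coeff_table(h):
    # H_ij registers: i = 0 even branch, 1 sum branch, 2 odd branch
    assert len(h) % 6 == 0
    t = {}
    for j in range(3):
        t[0, j] = h[2 * j::6]
        t[2, j] = h[2 * j + 1::6]
        t[1, j] = add(t[0, j], t[2, j])
    return t


def ffa6_table(x, t):
    assert len(x) % 6 == 0
    X = [x[k::6] for k in range(6)]
    a0, a1, a2 = ffa3_sub([X[0], X[2], X[4]], [t[0, j] for j in range(3)])
    a3, a4, a5 = ffa3_sub([X[1], X[3], X[5]], [t[2, j] for j in range(3)])
    sums = [add(X[0], X[1]), add(X[2], X[3]), add(X[4], X[5])]
    a6, a7, a8 = ffa3_sub(sums, [t[1, j] for j in range(3)])
    Y = [add(a0, delay(a5)), sub(a6, add(a0, a3)), add(a1, a3),
         sub(a7, add(a1, a4)), add(a2, a4), sub(a8, add(a2, a5))]
    return [v for blk in zip(*Y) for v in blk]


def conv11(x, k):
    """Mode A = 1, B = 1: an 11-tap kernel run as a 12-tap filter with h11 = 0."""
    assert len(k) == 11
    return ffa6_table(x, coeff_table(list(k) + [0]))


if __name__ == "__main__":
    k = list(range(1, 12))                  # h0 .. h10 = 1 .. 11
    t = coeff_table(k + [0])
    print(t[1, 0], t[1, 2], t[2, 2])        # [3, 15] [11, 11] [6, 0]
    x = list(range(1, 25))
    print(conv11(x, k) == fir(x, k))        # True
```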
When mode-select module A receives 1 and mode-select module B receives 1, changing the input mapping makes the circuit perform a single-channel 7 × 7 convolution. The serial-to-parallel pre-processing circuit again converts the single-channel input into six parallel input streams; the input sequence is x{n} = {x0, x1, …, x5}, {x6, x7, …, x11} and the coefficient sequence is h{n} = {h0, h1, …, h6, 0, 0, 0, 0, 0}. The 7 × 7 convolution is obtained as the special case of a 12 × 12 convolution whose coefficients h7, …, h11 are 0, and the inputs are mapped as follows:
X0←{x0,x6};H00←{h0,h6}
X2←{x2,x8};H01←{h2,0}
X4←{x4,x10};H02←{h4,0}
X6←z;H10←{h0+h1,h6}
X7←z;H11←{h2+h3,0}
X8←z;H12←{h4+h5,0}
X1←{x1,x7};H20←{h1,0}
X3←{x3,x9};H21←{h3,0}
X5←{x5,x11};H22←{h5,0}
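The coefficient loading for the 7 × 7 mode can be generated rather than written by hand. The hypothetical helper below zero-pads a kernel of up to 6 taps to 6 (first-order sub-filters) and a longer kernel to 12 (second-order sub-filters), then fills the nine registers. With numeric stand-ins h_i = 10 + i it reproduces the mapping listed above. It is a sketch of the table only. Whether the hardware gates the multipliers whose second tap is zero is not stated in the source.

```python
def add(a, b):
    return [u + v for u, v in zip(a, b)]


def load_coefficients(kernel):
    """Hypothetical helper: zero-pad a kernel to 6 or 12 taps and return
    the contents of the nine coefficient registers H_ij.

    Sub-filters get 1 tap (first order) for 6 taps, 2 taps (second order) for 12.
    """
    assert len(kernel) <= 12
    n = 6 if len(kernel) <= 6 else 12
    h = list(kernel) + [0] * (n - len(kernel))
    t = {}
    for j in range(3):
        t[0, j] = h[2 * j::6]            # even branch: H00, H01, H02
        t[2, j] = h[2 * j + 1::6]        # odd branch: H20, H21, H22
        t[1, j] = add(t[0, j], t[2, j])  # sum branch: H10, H11, H12
    return t


if __name__ == "__main__":
    kernel = [10, 11, 12, 13, 14, 15, 16]   # numeric stand-ins for h0 .. h6
    for (i, j), taps in sorted(load_coefficients(kernel).items()):
        print(f"H{i}{j} <- {taps}")
```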
In summary, if only the 3 × 3 and 5 × 5 modes are supported, the structure of the invention uses 18 multipliers, 42 adders and 7 delay units, saving about 50% of the hardware resources. By using second-order filters as the sub-filters, it can efficiently implement convolution for all four kernel sizes commonly used in convolutional neural networks. With 35 multipliers, 59 adders and 25 delay units, a modest cost at today's integration densities, it realizes an efficient general-purpose neural network convolution kernel that supports 3 × 3, 5 × 5, 7 × 7 and 11 × 11 convolutions.
Claims (3)
1. A 6-parallel fast FIR filter with a structure of cascaded 3-parallel fast FIR filters, comprising:
a mode selection module for selecting one of the four convolution calculation modes of 3 × 3, 5 × 5, 7 × 7 and 11 × 11;
a data input module for converting serial input data into the parallel input of the selected mode and sending it to the corresponding input channels;
a fast convolution module for performing reduced-complexity fast convolution on the parallel input data;
a data output module for outputting the parallel data of the selected mode;
wherein the fast convolution module further comprises:
a primary 2-parallel structure cascaded with 3-parallel fast FIR filter substructures;
the primary 2-parallel structure comprises 3 pre-adders, 9 post-adders, 1 data register and 3 secondary 3-parallel fast FIR filter substructures;
each of the 3 secondary 3-parallel fast FIR filter substructures comprises 3 pre-adders, 7 post-adders, 2 data registers and 6 second-order reconfigurable FIR sub-filters, 18 sub-filters in total;
and each of the 18 second-order reconfigurable FIR sub-filters comprises 2 multipliers, 1 adder, 1 data register and one 2-to-1 MUX unit.
2. The 6-parallel fast FIR filter according to claim 1, comprising:
a method of implementing a 5 × 5 fast convolution algorithm;
a method of implementing a 7 × 7 fast convolution algorithm; and
a method of implementing an 11 × 11 fast convolution algorithm.
3. The 6-parallel fast FIR filter according to claim 1, comprising a general reconfigurable FIR sub-filter selectable between first-order and second-order modes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710396331.5A CN107645287B (en) | 2017-05-24 | 2017-05-24 | 6 parallel rapid FIR filter |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107645287A | 2018-01-30 |
CN107645287B | 2020-12-22 |
Family
ID=61110124
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710396331.5A Active CN107645287B (en) | 2017-05-24 | 2017-05-24 | 6 parallel rapid FIR filter |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107645287B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108429546B (en) * | 2018-03-06 | 2021-11-05 | 深圳大学 | Design method of hybrid FIR filter |
CN110138358A (en) * | 2019-04-30 | 2019-08-16 | 南京大学 | Even-length linear-phase finite impulse response digital filter |
WO2021046709A1 (en) * | 2019-09-10 | 2021-03-18 | 深圳市南方硅谷半导体有限公司 | Fir filter optimization method and device, and apparatus |
CN111832717B (en) * | 2020-06-24 | 2021-09-28 | 上海西井信息科技有限公司 | Chip and processing device for convolution calculation |
CN112149351B (en) * | 2020-09-22 | 2023-04-18 | 吉林大学 | Microwave circuit physical dimension estimation method based on deep learning |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101661407B (en) * | 2009-09-30 | 2013-05-08 | 中兴通讯股份有限公司 | Finite impulse response filter with parallel structure and processing method thereof |
US9268742B2 (en) * | 2012-06-05 | 2016-02-23 | Intel Corporation | Reconfigurable variable length fir filters for optimizing performance of digital repeater |
CN102882491B (en) * | 2012-10-23 | 2016-04-13 | 南开大学 | Design method for a sparse linear-phase FIR notch filter without frequency offset |
CN103093052A (en) * | 2013-01-25 | 2013-05-08 | 复旦大学 | Design method of low-power dissipation parallel finite impulse response (FIR) digital filter |
US9893714B2 (en) * | 2015-09-01 | 2018-02-13 | Nxp Usa, Inc. | Configurable FIR filter with segmented cells |
Timeline: 2017-05-24, application CN201710396331.5A filed in CN (patent CN107645287B, active)
Also Published As
Publication number | Publication date |
---|---|
CN107645287A (en) | 2018-01-30 |
Similar Documents
Publication | Title | Publication date |
---|---|---|
CN107645287B (en) | 6 parallel rapid FIR filter | |
Mohanty et al. | A high-performance FIR filter architecture for fixed and reconfigurable applications | |
US5875122A (en) | Integrated systolic architecture for decomposition and reconstruction of signals using wavelet transforms | |
Coleman | Chebyshev stopbands for CIC decimation filters and CIC-implemented array tapers in 1D and 2D | |
US7127482B2 (en) | Performance optimized approach for efficient downsampling operations | |
NagaJyothi et al. | Distributed arithmetic architectures for fir filters-a comparative review | |
US10003324B2 (en) | Fast FIR filtering technique for multirate filters | |
Mohanty et al. | A high-performance VLSI architecture for reconfigurable FIR using distributed arithmetic | |
US7277479B2 (en) | Reconfigurable fir filter | |
Gardezi et al. | Design and VLSI Implementation of CSD based DA Architecture for 5/3 DWT | |
US20100077014A1 (en) | Second order real allpass filter | |
Hung et al. | Compact inverse discrete cosine transform circuit for MPEG video decoding | |
Ye et al. | A low cost and high speed CSD-based symmetric transpose block FIR implementation | |
Mayilavelane et al. | A Fast FIR filtering technique for multirate filters | |
Kumar et al. | FPGA Implementation of Systolic FIR Filter Using Single-Channel Method | |
Selvakumar et al. | FPGA based efficient fast FIR algorithm for higher order digital FIR filter | |
Shrivastava et al. | An efficient block-based architecture for reconfigurable fir filter using partial-product method | |
Roach et al. | Design of low power and area efficient ESPFFIR filter using multiple constant multiplier | |
Kadul et al. | High speed and low power FIR filter implementation using optimized adder and multiplier based on Xilinx FPGA | |
Narasimha et al. | Implementation of LOW Area and Power Efficient Architectures of Digital FIR filters | |
Kumar et al. | Design and implementation of pervasive DA based FIR filter and feeder register based multiplier for software definedradio networks | |
Mariammal et al. | A reconfigurable high-speed and low-complexity residue number system-based multiply-accumulate channel filter for software radio receivers | |
Shilparani et al. | FPGA implementation of FIR filter architecture using MCM technology with pipelining | |
Chandran et al. | NEDA based hybrid architecture for DCT—HWT | |
Fernández et al. | A new implementation of the discrete cosine transform in the residue number system |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 2019-04-29. Applicant after: Nanjing Fengxing Technology Co., Ltd., Room 816, Block B, Software Building, 9 Xinghuo Road, Jiangbei New District, Nanjing, Jiangsu Province. Applicant before: Nanjing University, Room 229, Electronic Building, Nanjing University, 163 Xianlin Avenue, Qixia District, Nanjing, Jiangsu Province, 210023. |
GR01 | Patent grant | |