CN107645287B - 6 parallel rapid FIR filter - Google Patents

Info

Publication number
CN107645287B
CN107645287B CN201710396331.5A CN201710396331A CN107645287B CN 107645287 B CN107645287 B CN 107645287B CN 201710396331 A CN201710396331 A CN 201710396331A CN 107645287 B CN107645287 B CN 107645287B
Authority
CN
China
Prior art keywords
parallel
fast
convolution
fir filter
filter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710396331.5A
Other languages
Chinese (zh)
Other versions
CN107645287A (en)
Inventor
王中风
王昊楠
林军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Fengxing Technology Co., Ltd.
Original Assignee
Nanjing Fengxing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Fengxing Technology Co ltd
Priority to CN201710396331.5A
Publication of CN107645287A
Application granted
Publication of CN107645287B
Active legal status
Anticipated expiration

Landscapes

  • Complex Calculations (AREA)
  • Filters That Use Time-Delay Elements (AREA)

Abstract

The invention discloses a size-configurable convolution hardware implementation based on a cascaded 6-parallel fast finite impulse response (FIR) filter structure. It performs convolution at four kernel sizes (3 × 3, 5 × 5, 7 × 7 and 11 × 11), reduces the computational complexity of convolution, and improves throughput through the 6-parallel structure. The invention first introduces the 2-parallel and 3-parallel fast FIR algorithm (FFA) structures, then derives a 6-parallel FFA by nesting 3-parallel substructures inside a 2-parallel structure. On the basis of the 6-parallel FFA, configurable sub-filters are used to design a fast convolution hardware architecture that supports all four sizes. Compared with a traditional 6-parallel FIR filter at the same throughput, the algorithm saves 50% of the multiplications at the cost of some additional additions. Because a multiplier occupies far more area and consumes far more power than an adder in hardware, the structure can save about 50% of area and power. The invention applies wherever convolutions of these typical sizes are required, such as convolutional neural networks, video and image processing, and wireless communications, either to increase the effective throughput of the original filter or to reduce its power consumption.

Description

6 parallel rapid FIR filter
Technical Field
The invention relates to the fields of integrated circuits and machine learning, and in particular to a 6-parallel fast FIR filter structure used to implement, in hardware, a general-purpose convolution circuit supporting all four kernel sizes (3 × 3, 5 × 5, 7 × 7 and 11 × 11) used in convolutional neural networks.
Background
Convolutional neural networks (CNNs) are among the most studied and most widely used machine learning algorithms today. Convolution is the most computationally expensive part of a CNN. In hardware, a convolution operation is carried out as multiply-accumulate computation, and multipliers are costly: a multiplier occupies about ten times the area and consumes about ten times the power of an adder. Optimizing the hardware implementation of convolution is therefore of great significance. Most convolutional networks use 3 × 3 or 5 × 5 kernels, a small fraction use the larger 7 × 7 and 11 × 11 kernels, and other sizes see little practical use.
In the time domain, an N-tap FIR filter is expressed as
$$y(n) = \sum_{i=0}^{N-1} h(i)\,x(n-i), \quad n = 0, 1, 2, \ldots$$
In the z-domain this becomes
$$Y(z) = H(z)\,X(z) = \sum_{n=0}^{N-1} h(n)\,z^{-n} \cdot \sum_{n=0}^{\infty} x(n)\,z^{-n}$$
where x(n) is an infinitely long input sequence and h(n) holds the FIR filter coefficients, of length N. If {h(n)} is regarded as the coefficients of a length-N discrete convolution, the FIR filter implements one pass of an N × N convolution calculation.
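The time-domain relation described above, where each output sample is a weighted sum of the N most recent input samples, can be sketched in a few lines of Python. The function name `fir_direct` and the sample values are illustrative assumptions, not part of the patent.

```python
def fir_direct(x, h):
    """Direct-form FIR: y(n) = sum over i of h(i) * x(n - i)."""
    n_taps = len(h)
    y = []
    for n in range(len(x) + n_taps - 1):
        acc = 0
        for i in range(n_taps):
            if 0 <= n - i < len(x):  # samples outside the input are treated as zero
                acc += h[i] * x[n - i]
        y.append(acc)
    return y


x = [1, 2, 3, 4]      # illustrative input samples
h = [1, 0, -1]        # illustrative 3-tap coefficients
print(fir_direct(x, h))  # [1, 2, 2, 2, -3, -4]
```

Each output costs N multiplications here, which is the baseline that the parallel fast FIR structures discussed later aim to reduce.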
The fast FIR algorithm (FFA) is obtained by applying algorithmic strength reduction to an FIR filter. Its core idea is to reduce hardware complexity by sharing substructures.
Disclosure of Invention
The main innovations of the invention are as follows:
based on existing parallel fast FIR algorithms and the cascading scheme for FFAs of large block size, a hardware implementation of the 6-parallel fast FIR algorithm (FFA) is proposed for the first time;
on the basis of the 6-parallel fast convolution kernel, a general-purpose fast convolution hardware circuit is designed that is compatible with all four kernel sizes commonly used in convolutional neural networks: 3 × 3, 5 × 5, 7 × 7 and 11 × 11.
The theoretical analysis of the invention is as follows.
In the z-domain, the polynomial representation of an N-tap FIR filter is
$$Y(z) = H(z)\,X(z) = \sum_{n=0}^{N-1} h(n)\,z^{-n} \cdot \sum_{n=0}^{\infty} x(n)\,z^{-n}$$
First, consider the one-level 2-parallel fast FIR filter.
The input sequence {x(0), x(1), x(2), x(3), …} can be split into even-indexed and odd-indexed terms:
$$\begin{aligned}
X(z) &= x(0) + x(1)z^{-1} + x(2)z^{-2} + x(3)z^{-3} + \cdots \\
&= \left[x(0) + x(2)z^{-2} + x(4)z^{-4} + \cdots\right] + z^{-1}\left[x(1) + x(3)z^{-2} + x(5)z^{-4} + \cdots\right] \\
&= X_0 + z^{-1}X_1
\end{aligned}$$
where $X_0$ and $X_1$ are the z-transforms of x(2k) and x(2k+1), respectively. Similarly, the N-tap filter H(z) can be split into two parts:
$$H(z) = H_0 + z^{-1}H_1$$
where $H_0(z^2)$ and $H_1(z^2)$ each have length $N/2$ and correspond to the even sub-filter and the odd sub-filter. The output sequence y(n) is likewise split into even and odd parts:
$$\begin{aligned}
Y(z) &= Y_0 + z^{-1}Y_1 \\
&= (X_0 + z^{-1}X_1)(H_0 + z^{-1}H_1) \\
&= (X_0H_0 + z^{-2}X_1H_1) + z^{-1}(X_1H_0 + X_0H_1)
\end{aligned}$$
where
$$Y_0 = X_0H_0 + z^{-2}X_1H_1$$
$$Y_1 = X_1H_0 + X_0H_1$$
Applying the fast FIR algorithm (FFA) to this one-level structure yields the 2-parallel fast FIR filter. Several 2-parallel FFA structures exist; a typical one, which needs three sub-filter products instead of four, is
$$Y_0 = X_0H_0 + z^{-2}X_1H_1$$
$$Y_1 = (H_0 + H_1)(X_0 + X_1) - X_0H_0 - X_1H_1$$
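As a software check of the 2-parallel FFA expressions above, the following sketch computes the three half-length products and recombines them, then compares the result with an ordinary convolution. It is only a behavioural illustration under assumed names (`conv`, `ffa2`) and assumed even-length inputs; it says nothing about the actual circuit.

```python
def conv(a, b):
    """Plain polynomial product (linear convolution) of two coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def ffa2(x, h):
    """2-parallel FFA: three half-length products instead of four."""
    assert len(x) % 2 == 0 and len(h) % 2 == 0  # assumption: even lengths
    x0, x1 = x[0::2], x[1::2]                   # X0, X1: even and odd input phases
    h0, h1 = h[0::2], h[1::2]                   # H0, H1: even and odd sub-filters
    p00 = conv(x0, h0)                          # X0 * H0
    p11 = conv(x1, h1)                          # X1 * H1
    pss = conv([a + b for a, b in zip(x0, x1)],
               [a + b for a, b in zip(h0, h1)])  # (X0 + X1)(H0 + H1)
    # Y0 = X0H0 + z^-2 X1H1: z^-2 is one sample of delay at the half rate
    y0 = p00 + [0]
    for k, v in enumerate(p11):
        y0[k + 1] += v
    # Y1 = (H0 + H1)(X0 + X1) - X0H0 - X1H1
    y1 = [s - a - b for s, a, b in zip(pss, p00, p11)] + [0]
    # interleave the two output phases back into a single sequence
    y = [0] * (2 * len(y0))
    y[0::2], y[1::2] = y0, y1
    return y[:len(x) + len(h) - 1]


x = [3, 1, 4, 1, 5, 9, 2, 6]
h = [2, 7, 1, 8]
print(ffa2(x, h) == conv(x, h))  # True
```

The pre-addition of the two phases and the two post-subtractions are the extra adders that the structure trades for the saved sub-filter.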
Next, consider the 3-parallel fast FIR filter structure. With a three-phase polyphase decomposition, the input sequence x(n) and the filter coefficient sequence h(n) can be decomposed as
$$X(z) = X_0(z^3) + z^{-1}X_1(z^3) + z^{-2}X_2(z^3)$$
$$H(z) = H_0(z^3) + z^{-1}H_1(z^3) + z^{-2}H_2(z^3)$$
where $X_0(z^3)$, $X_1(z^3)$, $X_2(z^3)$ correspond to the time-domain sequences x(3k), x(3k+1) and x(3k+2), respectively, and $H_0(z^3)$, $H_1(z^3)$, $H_2(z^3)$ correspond to the three sub-filters. The output of the system is
$$\begin{aligned}
Y(z) &= Y_0(z^3) + z^{-1}Y_1(z^3) + z^{-2}Y_2(z^3) \\
&= (X_0 + z^{-1}X_1 + z^{-2}X_2)(H_0 + z^{-1}H_1 + z^{-2}H_2)
\end{aligned}$$
In theory, many optimized 3-parallel fast FIR filter structures can be derived; in matrix form they can be written as
$$Y = Q\,H\,P \cdot X$$
where P and Q are the pre-processing and post-processing matrices, respectively, and H is the sub-filter matrix. A hardware block diagram of a 3-parallel FFA follows directly from this formula; the most common 3-parallel FFA structure is shown in FIG. 1.
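The text does not spell out the pre-processing and post-processing steps of the 3-parallel structure, so the sketch below uses one widely cited 3-parallel FFA variant with six sub-filter products (a direct 3-parallel filter needs nine). Whether this is exactly the structure of FIG. 1 is an assumption; the function names are hypothetical as well. The result is checked against an ordinary convolution.

```python
def conv(a, b):
    """Plain polynomial product (linear convolution) of two coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def add(a, b):
    return [u + v for u, v in zip(a, b)]   # equal-length lists only


def sub(a, b):
    return [u - v for u, v in zip(a, b)]   # equal-length lists only


def ffa3(x, h):
    """3-parallel FFA sketch (assumed textbook variant, six products)."""
    assert len(x) % 3 == 0 and len(h) % 3 == 0  # assumption: lengths divisible by 3
    x0, x1, x2 = x[0::3], x[1::3], x[2::3]
    h0, h1, h2 = h[0::3], h[1::3], h[2::3]
    # pre-additions feed the six sub-filters
    m0 = conv(x0, h0)
    m1 = conv(x1, h1)
    m2 = conv(x2, h2)
    m01 = conv(add(x0, x1), add(h0, h1))
    m12 = conv(add(x1, x2), add(h1, h2))
    m012 = conv(add(add(x0, x1), x2), add(add(h0, h1), h2))
    t01 = sub(m01, m1)   # (H0+H1)(X0+X1) - H1X1
    t12 = sub(m12, m1)   # (H1+H2)(X1+X2) - H1X1
    # post-additions; "[0] + v" delays v by one sample at the 1/3 rate (z^-3)
    y0 = add(m0 + [0], [0] + sub(t12, m2))
    y1 = add(sub(t01, m0) + [0], [0] + m2)
    y2 = sub(sub(m012, t01), t12) + [0]
    y = [0] * (3 * len(y0))
    y[0::3], y[1::3], y[2::3] = y0, y1, y2
    return y[:len(x) + len(h) - 1]


x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
h = [1, -2, 3, 0, 5, -1]
print(ffa3(x, h) == conv(x, h))  # True
```

In the matrix view, the pre-additions play the role of P, the six products the role of the diagonal sub-filter matrix H, and the post-additions the role of Q.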
The 6-parallel FFA structure can be realized by nesting any type of 3-parallel substructure inside any type of 2-parallel structure. Cascading the two most typical FFA structures gives the output expression
$$\begin{aligned}
Y &= Y_0 + z^{-1}Y_1 + z^{-2}Y_2 + z^{-3}Y_3 + z^{-4}Y_4 + z^{-5}Y_5 \\
&= (X'_0 + z^{-1}X'_1)(H'_0 + z^{-1}H'_1) \\
&= \left[X'_0H'_0 + z^{-2}X'_1H'_1\right] + z^{-1}\left[(X'_0 + X'_1)(H'_0 + H'_1) - X'_0H'_0 - X'_1H'_1\right]
\end{aligned}$$
The 2-parallel fast FIR filter structure is applied first, with
$$X'_0 = X_0 + z^{-2}X_2 + z^{-4}X_4$$
$$X'_1 = X_1 + z^{-2}X_3 + z^{-4}X_5$$
$$H'_0 = H_0 + z^{-2}H_2 + z^{-4}H_4$$
$$H'_1 = H_1 + z^{-2}H_3 + z^{-4}H_5$$
Each sub-term then corresponds to a 3-parallel FFA with the same output structure, so the three sub-filters output
$$X'_0H'_0 = a_0 + a_1 + a_2 = a_0 + z^{-2}b_1 + z^{-4}b_2$$
$$X'_1H'_1 = a_3 + a_4 + a_5 = a_3 + z^{-2}b_4 + z^{-4}b_5$$
$$(X'_0 + X'_1)(H'_0 + H'_1) = a_6 + a_7 + a_8 = a_6 + z^{-2}b_7 + z^{-4}b_8$$
Note that the three terms of each sub-filter output carry the factors $z^{0}$, $z^{-2}$ and $z^{-4}$, respectively. Substituting them into the output expression of the parent structure, i.e. the 2-parallel structure, gives
$$Y_0 = a_0 + z^{-6}a_5$$
$$Y_1 = -a_0 - a_3 + a_6$$
$$Y_2 = a_1 + a_3$$
$$Y_3 = -a_1 - a_4 + a_7$$
$$Y_4 = a_2 + a_4$$
$$Y_5 = -a_2 - a_5 + a_8$$
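The nesting described above can be illustrated in software by passing a 3-parallel FFA as the multiplier used inside a 2-parallel FFA. The sketch below is a behavioural model only: the helper names are hypothetical, and the 3-parallel variant is an assumed textbook form rather than necessarily the one in the figures. It checks the result against an ordinary convolution and counts the innermost sub-filter products.

```python
def conv(a, b):
    """Plain polynomial product (linear convolution)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def add(a, b):
    """Element-wise sum; the shorter list is zero-padded."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
            for i in range(n)]


def sub(a, b):
    return add(a, [-v for v in b])


def interleave(phases, length):
    """Merge polyphase outputs back into one sequence of the given length."""
    m = len(phases)
    out = [0] * (m * max(len(p) for p in phases))
    for k, p in enumerate(phases):
        out[k::m] = p + [0] * (len(out) // m - len(p))
    return out[:length]


def ffa2(x, h, mul):
    """2-parallel FFA; `mul` computes each of the three sub-products."""
    x0, x1, h0, h1 = x[0::2], x[1::2], h[0::2], h[1::2]
    p00, p11 = mul(x0, h0), mul(x1, h1)
    pss = mul(add(x0, x1), add(h0, h1))
    y0 = add(p00, [0] + p11)              # X'0H'0 + z^-2 X'1H'1
    y1 = sub(sub(pss, p00), p11)
    return interleave([y0, y1], len(x) + len(h) - 1)


def ffa3(x, h, mul):
    """3-parallel FFA (assumed six-product variant)."""
    x0, x1, x2 = x[0::3], x[1::3], x[2::3]
    h0, h1, h2 = h[0::3], h[1::3], h[2::3]
    m0, m1, m2 = mul(x0, h0), mul(x1, h1), mul(x2, h2)
    t01 = sub(mul(add(x0, x1), add(h0, h1)), m1)
    t12 = sub(mul(add(x1, x2), add(h1, h2)), m1)
    m012 = mul(add(add(x0, x1), x2), add(add(h0, h1), h2))
    y0 = add(m0, [0] + sub(t12, m2))
    y1 = add(sub(t01, m0), [0] + m2)
    y2 = sub(sub(m012, t01), t12)
    return interleave([y0, y1, y2], len(x) + len(h) - 1)


def ffa6(x, h, leaf=conv):
    """6-parallel FFA: 3-parallel substructures nested in a 2-parallel structure."""
    assert len(x) % 6 == 0 and len(h) % 6 == 0  # assumption: lengths divisible by 6
    return ffa2(x, h, lambda a, b: ffa3(a, b, leaf))


calls = []

def counted_leaf(a, b):
    calls.append(1)          # one entry per innermost sub-filter product
    return conv(a, b)


x = list(range(1, 13))       # 12 illustrative input samples
h = [1, 2, 3, 4, 5, 6]       # illustrative 6-tap filter
y = ffa6(x, h, counted_leaf)
print(y == conv(x, h), len(calls))  # True 18
```

With a 6-tap filter every innermost sub-filter has a single coefficient, so the 18 counted products correspond to 18 multiplications per block of 6 outputs.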
The circuit of the 6-parallel fast FIR filter follows from these output expressions. The 6-parallel general-purpose convolution kernel contains three 3-parallel FIR filters. The sub-filter part of the circuit can compute three independent channels of 3 × 3 convolution simultaneously, while the filter as a whole can compute single-channel 5 × 5 convolution. Using reconfigurable 2nd-order FIR sub-filters makes the hardware compatible with all four sizes: 3 × 3, 5 × 5, 7 × 7 and 11 × 11. Mode selection is implemented by adding MUX elements. The circuit schematic is shown in FIG. 2, and the schematic of the reconfigurable 2nd-order FIR sub-filter is shown in FIG. 3.
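A rough behavioural model of a reconfigurable sub-filter is sketched below, using the resources listed in the claims (2 multipliers, 1 adder, 1 data register, one 2-to-1 MUX). How the MUX is actually wired in FIG. 3 is not reproduced in this text, so the choice made here (the MUX selects between the single-tap product and the two-tap sum) is an assumption, as are the class and parameter names.

```python
class ReconfigurableSubFilter:
    """Behavioural sketch only; the MUX placement is an assumption."""

    def __init__(self, c0, c1=0):
        self.c0, self.c1 = c0, c1   # the two coefficients (one per multiplier)
        self.reg = 0                # the single data register: previous input sample

    def step(self, sample, second_order):
        tap0 = self.c0 * sample     # multiplier 1
        tap1 = self.c1 * self.reg   # multiplier 2, fed by the register
        self.reg = sample
        # 2-to-1 MUX: 1st-order mode passes tap0, 2nd-order mode passes the sum
        return tap0 + tap1 if second_order else tap0


one_tap = ReconfigurableSubFilter(2, 3)
two_tap = ReconfigurableSubFilter(2, 3)
print([one_tap.step(v, False) for v in [1, 2, 3]])  # [2, 4, 6]
print([two_tap.step(v, True) for v in [1, 2, 3]])   # [2, 7, 12]
```

In 1st-order mode the second coefficient is simply ignored, which matches the idea of using one physical sub-filter for both the short and the long kernel sizes.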
The output module delivers 6 results in parallel at a time. Computing 6 outputs with a traditional direct-form 6-tap FIR filter requires 36 multiplications and 30 additions, whereas the 6-parallel fast FIR filter of the invention requires 18 multiplications and 42 additions. Because a multiplier consumes far more area and power than an adder in hardware, the 6-parallel fast FIR filter saves about 50% of hardware resources compared with the traditional direct-form FIR filter. On this basis, a general-purpose circuit is realized that supports all four convolution sizes used in convolutional neural networks.
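The operation counts quoted above can be reproduced with simple arithmetic. The sketch below assumes a 6-tap filter, takes the pre-adder and post-adder counts of the two levels from the claims, and assumes 6 single-coefficient sub-filters per 3-parallel substructure; the function names are illustrative.

```python
def direct_counts(parallel, taps):
    """Direct-form cost for `parallel` outputs of a `taps`-tap filter."""
    return parallel * taps, parallel * (taps - 1)   # (multiplications, additions)


def ffa6_counts():
    """Cost of the 2-parallel by 3-parallel cascade for a 6-tap filter."""
    top_pre, top_post = 3, 9        # 2-parallel level, adder counts from the claims
    sub_pre, sub_post = 3, 7        # each 3-parallel substructure, from the claims
    n_sub3 = 3                      # three 3-parallel substructures
    leaves_per_sub3 = 6             # assumption: six 1-tap sub-filters per substructure
    mults = n_sub3 * leaves_per_sub3
    adds = top_pre + top_post + n_sub3 * (sub_pre + sub_post)
    return mults, adds


d_mul, d_add = direct_counts(6, 6)
f_mul, f_add = ffa6_counts()
print(d_mul, d_add)                 # 36 30
print(f_mul, f_add)                 # 18 42
print(1 - f_mul / d_mul)            # 0.5
```

The multiplier count halves while the adder count rises by 12, which is the trade the text relies on when it argues for the area and power saving.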
Drawings
FIG. 1 is a block diagram of the 3-parallel fast FIR filter;
FIG. 2 is a detailed circuit diagram of the general-purpose 6-parallel fast FIR filter;
FIG. 3 is a circuit schematic of the 2nd-order reconfigurable FIR sub-filter;
FIG. 4 is a block diagram of the modules of the 6-parallel fast FIR filter.
Detailed Description
When mode selection module A is set to 0 and mode selection module B is set to 0, the circuit performs three-channel 3 × 3 convolution. For each of the three channels i, the input sequence is x_i{n} = {x_i0, x_i1, x_i2} and the convolution coefficient sequence is h_i{n} = {h_i0, h_i1, h_i2}. The inputs are loaded as follows:
X0←x00,X2←x01,X4←x02;H00←h00,H01←h01,H02←h02
X6←x10,X7←x11,X8←x12;H10←h10,H11←h11,H12←h12
X1←x20,X3←x21,X5←x22;H20←h20,H21←h21,H22←h22
When mode selection module A is set to 1 and mode selection module B is set to 0, the circuit performs single-channel 5 × 5 convolution. The single-channel input sequence is still converted into 6-channel parallel input by the serial-to-parallel pre-processing circuit. The input sequence of the general convolution kernel is then x{n} = {x0, x1, x2, x3, x4, x5} and the parameter sequence is h{n} = {h0, h1, h2, h3, h4, 0}. The 5 × 5 convolution is realized as the special case of a 6 × 6 convolution whose coefficient h5 is 0, so the inputs are loaded as follows:
X0←x0;H00←h0
X2←x2;H01←h2
X4←x4;H02←h4
X6←z;H10←h0+h1
X7←z;H11←h2+h3
X8←z;H12←h4
X1←x1;H20←h1
X3←x3;H21←h3
X5←x5;H22←0
When mode selection module A is set to 1 and mode selection module B is set to 1, the circuit performs single-channel 11 × 11 convolution. The single-channel input data is still converted into 6-channel parallel input by the serial-to-parallel pre-processing circuit. The input sequence is x{n} = {x0, x1, …, x5}, {x6, x7, …, x11} and the parameter sequence is h{n} = {h0, h1, …, h10, 0}. The 11 × 11 convolution is realized as the special case of a 12 × 12 convolution whose coefficient h11 is 0, so the inputs are loaded as follows:
X0←{x0,x6};H00←{h0,h6}
X2←{x2,x8};H01←{h2,h8}
X4←{x4,x10};H02←{h4,h10}
X6←z;H10←{h0+h1,h6+h7}
X7←z;H11←{h2+h3,h8+h9}
X8←z;H12←{h4+h5,h10}
X1←{x1,x7};H20←{h1,h7}
X3←{x3,x9};H21←{h3,h9}
X5←{x5,x11};H22←{h5,0}
Also with mode selection module A set to 1 and mode selection module B set to 1, the circuit realizes the 7 × 7 single-channel convolution mode by changing how the inputs are loaded. The single-channel input sequence is still converted into 6-channel parallel input by the serial-to-parallel pre-processing circuit. The input sequence is x{n} = {x0, x1, …, x5}, {x6, x7, …, x11} and the parameter sequence is h{n} = {h0, h1, …, h6, 0, 0, 0, 0, 0}. The 7 × 7 convolution is realized as the special case of a 12 × 12 convolution whose coefficients h7, …, hn are 0, so the inputs are loaded as follows:
X0←{x0,x6};H00←{h0,h6}
X2←{x2,x8};H01←{h2,0}
X4←{x4,x10};H02←{h4,0}
X6←z;H10←{h0+h1,h6}
X7←z;H11←{h2+h3,0}
X8←z;H12←{h4+h5,0}
X1←{x1,x7};H20←{h1,0}
X3←{x3,x9};H21←{h3,0}
X5←{x5,x11};H22←{h5,0}
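The coefficient loading tables for the 5 × 5, 11 × 11 and 7 × 7 modes follow one pattern: zero-pad the kernel row to 6 or 12 taps, split it into 6 phases, and pre-add the even and odd phases for the middle bank. The sketch below reproduces that pattern with the slot names H00 to H22 from the tables; the function name, its second parameter and the numeric coefficients are illustrative assumptions.

```python
def load_coefficients(h, taps_per_subfilter):
    """Map one kernel row h onto the nine sub-filter slots H00..H22."""
    total = 6 * taps_per_subfilter
    assert len(h) <= total
    h = list(h) + [0] * (total - len(h))        # zero-pad, e.g. h5 = 0 or h11 = 0
    phase = [h[k::6] for k in range(6)]         # phase k holds h[k], h[k+6], ...
    even = [phase[0], phase[2], phase[4]]       # H00, H01, H02
    odd = [phase[1], phase[3], phase[5]]        # H20, H21, H22
    summed = [[a + b for a, b in zip(e, o)]     # H10, H11, H12: pre-added pairs
              for e, o in zip(even, odd)]
    names = ["H00", "H01", "H02", "H10", "H11", "H12", "H20", "H21", "H22"]
    return dict(zip(names, even + summed + odd))


# 5-tap row with illustrative values h0..h4 = 1..5, one tap per sub-filter
bank5 = load_coefficients([1, 2, 3, 4, 5], 1)
print(bank5["H10"], bank5["H12"], bank5["H22"])     # [3] [5] [0]

# 11-tap row with h0..h10 = 1..11, two taps per sub-filter
bank11 = load_coefficients(list(range(1, 12)), 2)
print(bank11["H10"], bank11["H12"], bank11["H22"])  # [3, 15] [11, 11] [6, 0]

# 7-tap row with h0..h6 = 1..7, two taps per sub-filter
bank7 = load_coefficients(list(range(1, 8)), 2)
print(bank7["H00"], bank7["H10"], bank7["H21"])     # [1, 7] [3, 7] [4, 0]
```

Because the padding coefficients are zero, the same loading routine serves all three single-channel modes; only the kernel length and the number of taps per sub-filter change.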
In summary, if only the 3 × 3 and 5 × 5 modes are supported, the structure of the invention uses 18 multipliers, 42 adders and 7 delay units, saving about 50% of hardware resources. By using 2nd-order filters in the sub-filter structure, it efficiently implements convolution for all four kernel sizes commonly used in convolutional neural networks, with 35 multipliers, 59 adders and 25 delay units. Given the high integration scale of today's circuits, this yields an efficient general-purpose neural network convolution kernel that supports 3 × 3, 5 × 5, 7 × 7 and 11 × 11 convolution.

Claims (3)

1. A 6-parallel fast FIR filter having a cascaded structure of 3-parallel fast FIR filters, comprising:
a mode selection module for selecting one of four convolution modes: 3 × 3, 5 × 5, 7 × 7 and 11 × 11;
a data input module for converting serial input data into parallel input for the selected mode and feeding it into the input channels of that mode;
a fast convolution module for performing reduced-complexity fast convolution on the parallel input data;
a data output module for outputting the parallel data of the selected mode;
wherein the fast convolution module further comprises:
a 2-parallel structure cascaded with 3-parallel fast FIR filter substructures;
the primary 2-parallel structure comprising 3 pre-adders, 9 post-adders, 1 data register and 3 secondary 3-parallel fast FIR filter substructures;
the 3 secondary 3-parallel fast FIR filter substructures, each comprising 3 pre-adders, 7 post-adders and 2 data registers, and together comprising 18 second-order reconfigurable FIR sub-filters, 6 per substructure;
each second-order reconfigurable FIR sub-filter comprising 2 multipliers, 1 adder, 1 data register and one 2-to-1 MUX unit.
2. The 6-parallel fast FIR filter according to claim 1, providing a method of implementing a 5 × 5 fast convolution algorithm;
a method of implementing a 7 × 7 fast convolution algorithm;
a method of implementing an 11 × 11 fast convolution algorithm.
3. The 6-parallel fast FIR filter according to claim 1, wherein a general-purpose reconfigurable FIR sub-filter is provided that supports two selectable modes, 1st order and 2nd order.
CN201710396331.5A 2017-05-24 2017-05-24 6 parallel rapid FIR filter Active CN107645287B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710396331.5A CN107645287B (en) 2017-05-24 2017-05-24 6 parallel rapid FIR filter

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710396331.5A CN107645287B (en) 2017-05-24 2017-05-24 6 parallel rapid FIR filter

Publications (2)

Publication Number Publication Date
CN107645287A CN107645287A (en) 2018-01-30
CN107645287B CN107645287B (en) 2020-12-22

Family

ID=61110124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710396331.5A Active CN107645287B (en) 2017-05-24 2017-05-24 6 parallel rapid FIR filter

Country Status (1)

Country Link
CN (1) CN107645287B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20190429
Address after: Room 816, Block B, Software Building 9 Xinghuo Road, Jiangbei New District, Nanjing, Jiangsu Province
Applicant after: Nanjing Fengxing Technology Co., Ltd.
Address before: 210023 Xianlin Avenue 163 Nanjing University Electronic Building 229, Qixia District, Nanjing City, Jiangsu Province
Applicant before: Nanjing University
GR01 Patent grant