CN107645287A - Size-configurable convolution hardware implementation based on a cascaded 6-parallel fast finite impulse response filter structure - Google Patents

Info

Publication number
CN107645287A
Authority
CN
China
Prior art keywords
parallel
fir
quick
fast convolution
convolutional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710396331.5A
Other languages
Chinese (zh)
Other versions
CN107645287B (en)
Inventor
王中风
王昊楠
林军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Fengxing Technology Co Ltd
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN201710396331.5A priority Critical patent/CN107645287B/en
Publication of CN107645287A publication Critical patent/CN107645287A/en
Application granted granted Critical
Publication of CN107645287B publication Critical patent/CN107645287B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Complex Calculations (AREA)
  • Filters That Use Time-Delay Elements (AREA)

Abstract

The invention discloses a size-configurable convolution hardware implementation based on a cascaded 6-parallel fast finite impulse response (FIR) filter structure. The structure can perform convolution at four kernel sizes, 3*3, 5*5, 7*7 and 11*11, reducing the computational complexity of convolution while raising throughput through the 6-parallel structure. The invention first describes the 2-parallel and 3-parallel fast FIR filter algorithm structures, and then derives a 6-parallel fast FIR algorithm (FFA) by cascading 3-parallel substructures inside a 2-parallel structure. On the basis of the 6-parallel FFA, and using configurable subfilters, a fast convolution hardware architecture is designed that can perform convolution at all four sizes. Compared with a traditional 6-parallel FIR filter at the same throughput, the algorithm saves 50% of the multiplications at the cost of some additional additions. Because a multiplier in hardware occupies far more area and consumes far more power than an adder, the architecture can save 50% of area and power. The invention can be used wherever convolution at several typical sizes (3*3, 5*5, 7*7 and 11*11) is needed, such as convolutional neural networks, computer vision and wireless communication, and can either raise the effective throughput of the original filter or reduce its power consumption.

Description

Size-configurable convolution hardware implementation based on a cascaded 6-parallel fast finite impulse response filter structure
Technical field
The present invention relates to the fields of integrated circuits and machine learning, and in particular to a 6-parallel fast FIR filter structure and a general-purpose circuit hardware implementation that uses it to perform convolution at all four sizes used in convolutional neural networks: 3*3, 5*5, 7*7 and 11*11.
Background
Convolutional neural networks (CNNs) are among the most studied and most widely applied machine learning algorithms today. Convolution is the part of a CNN that consumes the most computing resources. In hardware, a convolution operation appears as repeated multiply-accumulate computation, and multipliers are very resource-intensive: their area and power consumption are more than ten times those of an adder. Optimizing the hardware implementation of convolution is therefore highly worthwhile. The vast majority of convolutional networks use 3*3 or 5*5 kernels; a small number use the larger 7*7 and 11*11 kernels; other sizes have not yet seen effective use.
In the time domain, an N-tap FIR filter is expressed as
$$y(n) = \sum_{i=0}^{N-1} h(i)\,x(n-i), \quad n = 0, 1, 2, \ldots$$
and in the z domain as
$$Y(z) = H(z)X(z) = \sum_{n=0}^{N-1} h(n)z^{-n} \sum_{n=0}^{\infty} x(n)z^{-n}$$
where the sequence {x(n)} is an infinitely long input sequence and the sequence {h(n)} contains the N coefficients of the FIR filter. It can be seen that, if {h(n)} is regarded as the coefficients of an N-dimensional discrete convolution, the FIR filter realizes an N × N convolution computation.
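As a reference point for the fast structures discussed later, the N-tap sum y(n) = Σ h(i) x(n − i) can be sketched directly in Python. This is only an illustrative software model, not the patent's circuit; the function name `fir_direct` and the sample data are assumptions chosen for the example.

```python
def fir_direct(x, h):
    """Direct-form FIR: y(n) = sum over i of h(i) * x(n - i), with x(n) = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0
        for i, coeff in enumerate(h):
            if n - i >= 0:          # samples before the start of the input count as zero
                acc += coeff * x[n - i]
        y.append(acc)
    return y


# Hypothetical sample data: a 3-tap filter applied to six input samples.
x = [1, 2, 3, 4, 5, 6]
h = [1, 0, -1]
print(fir_direct(x, h))  # [1, 2, 2, 2, 2, 2]
```

Each output costs N multiplications and N − 1 additions in this form, which is the baseline the fast FIR algorithm is meant to improve on.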
Applying algorithmic strength reduction to the finite impulse response (FIR) filter yields the fast FIR algorithm (FFA), whose core idea is to reduce hardware complexity by sharing substructures.
Summary of the invention
The essential novel features of the present invention are:
● Building on existing parallel fast finite impulse response (FIR) algorithms and on the cascading of small-block FFA structures, a hardware implementation of the 6-parallel fast FIR algorithm (FFA) is proposed for the first time;
● On the basis of the 6-parallel fast convolution kernel, a general-purpose fast convolution hardware circuit is designed that is compatible with all four kernel sizes commonly used in convolutional neural networks: 3*3, 5*5, 7*7 and 11*11.
The theoretical analysis of the present invention is as follows:
In the z domain, an N-tap FIR filter is expressed as
$$Y(z) = H(z)X(z) = \sum_{n=0}^{N-1} h(n)z^{-n} \sum_{n=0}^{\infty} x(n)z^{-n}$$
First, we discuss the primary structure, the 2-parallel fast FIR filter.
The input sequence {x(0), x(1), x(2), x(3), ...} can be split into even-indexed and odd-indexed parts as follows
$$\begin{aligned}
X(z) &= x(0) + x(1)z^{-1} + x(2)z^{-2} + x(3)z^{-3} + \cdots \\
&= \left[x(0) + x(2)z^{-2} + x(4)z^{-4} + \cdots\right] + z^{-1}\left[x(1) + x(3)z^{-2} + x(5)z^{-4} + \cdots\right] \\
&= X_0 + z^{-1}X_1
\end{aligned}$$
where $X_0$ and $X_1$ are the z-transforms of x(2k) and x(2k+1) respectively. Similarly, the length-N filter H(z) can be split into two parts
$$H(z) = H_0 + z^{-1}H_1$$
where $H_0(z^2)$ and $H_1(z^2)$, each of length N/2, correspond to the even subfilter and the odd subfilter. Writing the output sequence y(n) in the same even/odd form gives
$$\begin{aligned}
Y(z) &= Y_0 + z^{-1}Y_1 \\
&= (X_0 + z^{-1}X_1)(H_0 + z^{-1}H_1) \\
&= (X_0H_0 + z^{-2}X_1H_1) + z^{-1}(X_1H_0 + X_0H_1)
\end{aligned}$$
where
$$Y_0 = X_0H_0 + z^{-2}X_1H_1$$
$$Y_1 = X_1H_0 + X_0H_1$$
Applying the fast FIR algorithm (FFA) to this primary structure, the 2-parallel fast FIR filter, yields many possible 2-parallel FFA structures. A typical one is
$$Y_0 = X_0H_0 + z^{-2}X_1H_1$$
$$Y_1 = (H_0 + H_1)(X_0 + X_1) - X_0H_0 - X_1H_1$$
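The 2-parallel FFA identity, Y0 = X0·H0 + z⁻²·X1·H1 and Y1 = (H0 + H1)(X0 + X1) − X0·H0 − X1·H1, can be checked numerically with a small polynomial model. The sketch below is an assumption-laden software check, not the hardware: the helper names (`poly_mul`, `poly_add`, `ffa2`) are hypothetical, and the subfilter products are computed by plain polynomial multiplication. Three subfilter products replace the four of the direct polyphase form.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_add(a, b, sign=1):
    """Return a + sign * b, padding the shorter list with zeros."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [u + sign * v for u, v in zip(a, b)]


def ffa2(x, h):
    """2-parallel FFA on even-length lists x and h; returns the interleaved output."""
    x0, x1 = x[0::2], x[1::2]          # even and odd input phases
    h0, h1 = h[0::2], h[1::2]          # even and odd subfilters
    p00 = poly_mul(x0, h0)             # X0 * H0
    p11 = poly_mul(x1, h1)             # X1 * H1
    pss = poly_mul(poly_add(x0, x1), poly_add(h0, h1))   # (X0 + X1)(H0 + H1)
    y0 = poly_add(p00, [0] + p11)      # z^-2 is a one-sample delay at the half rate
    y1 = poly_add(poly_add(pss, p00, -1), p11, -1)
    n = max(len(y0), len(y1))
    y0 = y0 + [0] * (n - len(y0))
    y1 = y1 + [0] * (n - len(y1))
    return [v for pair in zip(y0, y1) for v in pair]


# Hypothetical test data: the interleaved result matches ordinary convolution.
x = [1, 2, 3, 4, 5, 6, 7, 8]
h = [1, -2, 3, 4]
assert ffa2(x, h)[:len(x) + len(h) - 1] == poly_mul(x, h)
```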
Next we discuss the 3-parallel fast FIR filter structure. Using a three-phase polyphase decomposition, the input sequence x(n) and the filter coefficient sequence h(n) can be decomposed as
$$X(z) = X_0(z^3) + z^{-1}X_1(z^3) + z^{-2}X_2(z^3)$$
$$H(z) = H_0(z^3) + z^{-1}H_1(z^3) + z^{-2}H_2(z^3)$$
where $X_0(z^3)$, $X_1(z^3)$ and $X_2(z^3)$ correspond to the time-domain sequences x(3k), x(3k+1) and x(3k+2) respectively, and $H_0(z^3)$, $H_1(z^3)$ and $H_2(z^3)$ correspond to the three subfilters. The output of the system is then
$$\begin{aligned}
Y(z) &= Y_0(z^3) + z^{-1}Y_1(z^3) + z^{-2}Y_2(z^3) \\
&= (X_0 + z^{-1}X_1 + z^{-2}X_2)(H_0 + z^{-1}H_1 + z^{-2}H_2)
\end{aligned}$$
In theory, many optimized 3-parallel fast FIR filter structures can be obtained; in matrix form they can be written as
$$Y = QHPX$$
where P and Q are the pre-processing matrix and the post-processing matrix respectively, and H is the subfilter matrix. A hardware block diagram of a 3-parallel FFA can therefore be drawn readily from this formula; Fig. 1 shows the most commonly used 3-parallel FFA structure as an example.
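The text gives only the matrix form Y = QHPX and refers to Fig. 1 for the concrete structure. As an illustration, the sketch below assumes one commonly cited 3-parallel FFA factorization that uses six subfilter products instead of nine; whether it matches Fig. 1 exactly is not confirmed by the text. All helper names are hypothetical, and the products are modelled with plain polynomial multiplication.

```python
def conv(a, b):
    """Polynomial product of two coefficient lists (lowest power first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def add(*polys):
    """Sum any number of coefficient lists, treating missing entries as zero."""
    n = max(len(p) for p in polys)
    return [sum(p[i] for p in polys if i < len(p)) for i in range(n)]


def neg(p):
    return [-v for v in p]


def delay(p, k=1):
    """Delay by k samples at the sub-rate (z^-3 at the full rate for k = 1)."""
    return [0] * k + p


def interleave(phases):
    n = max(len(p) for p in phases)
    return [p[i] if i < len(p) else 0 for i in range(n) for p in phases]


def ffa3(x, h):
    """Assumed 3-parallel FFA: six products; len(x) and len(h) are multiples of 3."""
    x0, x1, x2 = x[0::3], x[1::3], x[2::3]
    h0, h1, h2 = h[0::3], h[1::3], h[2::3]
    m0, m1, m2 = conv(x0, h0), conv(x1, h1), conv(x2, h2)
    m01 = conv(add(x0, x1), add(h0, h1))
    m12 = conv(add(x1, x2), add(h1, h2))
    m012 = conv(add(x0, x1, x2), add(h0, h1, h2))
    t01 = add(m01, neg(m1))            # (H0+H1)(X0+X1) - H1X1
    t12 = add(m12, neg(m1))            # (H1+H2)(X1+X2) - H1X1
    a = add(m0, neg(delay(m2)))        # H0X0 - z^-3 H2X2
    y0 = add(a, delay(t12))
    y1 = add(t01, neg(a))
    y2 = add(m012, neg(t01), neg(t12))
    return interleave([y0, y1, y2])


# Hypothetical test data: the result agrees with ordinary convolution.
x = list(range(1, 10))
h = [2, -1, 3, 0, 1, 4]
assert ffa3(x, h)[:len(x) + len(h) - 1] == conv(x, h)
```

The pre-additions on the inputs, the six products and the post-additions play the roles of P, H and Q respectively in the matrix form.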
A 6-parallel FFA structure can be obtained by nesting any 3-parallel substructure inside any 2-parallel structure. Cascading the two most typical FFA structures gives the output expression
$$\begin{aligned}
Y &= Y_0 + z^{-1}Y_1 + z^{-2}Y_2 + z^{-3}Y_3 + z^{-4}Y_4 + z^{-5}Y_5 \\
&= (X'_0 + z^{-1}X'_1)(H'_0 + z^{-1}H'_1) \\
&= \left[X'_0H'_0 + z^{-2}X'_1H'_1\right] + z^{-1}\left[(X'_0 + X'_1)(H'_0 + H'_1) - X'_0H'_0 - X'_1H'_1\right]
\end{aligned}$$
The 2-parallel fast FIR filter structure is applied first, with
$$X'_0 = X_0 + z^{-2}X_2 + z^{-4}X_4$$
$$X'_1 = X_1 + z^{-2}X_3 + z^{-4}X_5$$
$$H'_0 = H_0 + z^{-2}H_2 + z^{-4}H_4$$
$$H'_1 = H_1 + z^{-2}H_3 + z^{-4}H_5$$
Each term then corresponds to one 3-parallel FFA, all with the same output structure. Let the outputs of the three subfilters be
$$X'_0H'_0 = a_0 + a_1 + a_2 = a_0 + z^{-2}b_1 + z^{-4}b_2$$
$$X'_1H'_1 = a_3 + a_4 + a_5 = a_3 + z^{-2}b_4 + z^{-4}b_5$$
$$(X'_0 + X'_1)(H'_0 + H'_1) = a_6 + a_7 + a_8 = a_6 + z^{-2}b_7 + z^{-4}b_8$$
Note that the three terms of each subfilter output expression, at $z^0$, $z^{-2}$ and $z^{-4}$, form a 3-parallel output structure. Substituting them into the parent structure, that is, the output expression of the 2-parallel structure, gives
$$\begin{aligned}
Y_0 &= a_0 + z^{-6}a_5 \\
Y_1 &= -a_0 - a_3 + a_6 \\
Y_2 &= a_1 + a_3 \\
Y_3 &= -a_1 - a_4 + a_7 \\
Y_4 &= a_2 + a_4 \\
Y_5 &= -a_2 - a_5 + a_8
\end{aligned}$$
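The six output expressions Y0 to Y5 can be checked against ordinary convolution with a short model. In this sketch the nine terms a0 to a8 are taken to be the three polyphase components (at z⁰, z⁻², z⁻⁴) of each of the three products, with the delay factor removed, which is an interpretation of the notation rather than something the text states outright. The products themselves are computed by plain polynomial multiplication as a stand-in for the 3-parallel FFA substructures; all function names are hypothetical.

```python
def conv(a, b):
    """Polynomial product of two coefficient lists (lowest power first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def add(*polys):
    """Sum any number of coefficient lists, treating missing entries as zero."""
    n = max(len(p) for p in polys)
    return [sum(p[i] for p in polys if i < len(p)) for i in range(n)]


def neg(p):
    return [-v for v in p]


def delay(p, k=1):
    """Delay by k samples at the 1/6 rate (z^-6 at the full rate for k = 1)."""
    return [0] * k + p


def interleave(phases):
    n = max(len(p) for p in phases)
    return [p[i] if i < len(p) else 0 for i in range(n) for p in phases]


def ffa6_recombine(x, h):
    """Form the three 2-parallel products, split each into 3 phases, recombine."""
    xe, xo = x[0::2], x[1::2]          # X'0 and X'1
    he, ho = h[0::2], h[1::2]          # H'0 and H'1
    prods = [conv(xe, he), conv(xo, ho), conv(add(xe, xo), add(he, ho))]
    a = [p[k::3] for p in prods for k in range(3)]   # a[0] .. a[8]
    y = [
        add(a[0], delay(a[5])),             # Y0 = a0 + z^-6 a5
        add(a[6], neg(a[0]), neg(a[3])),    # Y1 = -a0 - a3 + a6
        add(a[1], a[3]),                    # Y2 = a1 + a3
        add(a[7], neg(a[1]), neg(a[4])),    # Y3 = -a1 - a4 + a7
        add(a[2], a[4]),                    # Y4 = a2 + a4
        add(a[8], neg(a[2]), neg(a[5])),    # Y5 = -a2 - a5 + a8
    ]
    return interleave(y)


# Hypothetical test data: 12 input samples, 6-tap filter.
x = list(range(1, 13))
h = [1, -2, 3, 4, 0, 5]
assert ffa6_recombine(x, h)[:len(x) + len(h) - 1] == conv(x, h)
```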
The circuit of the 6-parallel fast FIR filter can then be drawn from the output expressions. The 6-parallel general-purpose convolution kernel contains three 3-parallel FIR filters, so the subfilter part of the circuit can perform three independent 3 × 3 convolutions on three channels at the same time, while the filter as a whole can perform a single-channel 5 × 5 convolution. By using reconfigurable 2-tap FIR subfilters, a hardware implementation compatible with all four convolution sizes, 3 × 3, 5 × 5, 7 × 7 and 11 × 11, can be achieved. Mode selection is accomplished by adding MUX elements. The detailed circuit schematic is shown in Fig. 2, and the detailed circuit schematic of the reconfigurable 2-tap FIR subfilter is shown in Fig. 3.
In the output module, six results are output in parallel at a time. A traditional direct-form 6-tap FIR filter needs 36 multiplications and 30 additions to compute 6 outputs, whereas the 6-parallel fast FIR filter of the present invention needs 18 multiplications and 42 additions. In hardware, a multiplier consumes far more area and power than an adder, so compared with the traditional direct-form FIR filter, the 6-parallel fast FIR filter introduced here can save 50% of hardware resources. On this basis, a general-purpose circuit is realized that supports all four convolution sizes used in convolutional neural networks.
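The operation counts quoted above can be reproduced with a few lines of arithmetic. The direct-form figures follow from 6 outputs of a 6-tap filter. For the fast structure, the sketch assumes 3 products in the 2-parallel stage times 6 products in each 3-parallel stage, and takes the adder counts per stage from claim 4 (3 pre-adders and 9 post-adders in the primary stage, 3 pre-adders and 7 post-adders in each of the three secondary stages). The function names are hypothetical.

```python
def direct_form_ops(outputs, taps):
    """Multiplications and additions needed for `outputs` results of a direct-form FIR."""
    return outputs * taps, outputs * (taps - 1)


def cascaded_ffa_ops():
    """Multiplier and adder tally for the 2-parallel x 3-parallel cascade (assumed split)."""
    multipliers = 3 * 6                  # 3 products per 2-parallel stage, 6 per 3-parallel stage
    adders = 3 + 9 + 3 * (3 + 7)         # per-stage pre/post adder counts as listed in claim 4
    return multipliers, adders


direct = direct_form_ops(6, 6)
fast = cascaded_ffa_ops()
saving = 1 - fast[0] / direct[0]
print(direct, fast, saving)  # (36, 30) (18, 42) 0.5
```

The tally trades 12 extra additions for 18 fewer multiplications, which is where the stated 50% multiplier saving comes from.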
Brief description of the drawings
Fig. 1 is a structural diagram of the 3-parallel fast FIR filter;
Fig. 2 is the detailed circuit diagram of the general-purpose 6-parallel fast FIR filter;
Fig. 3 is the circuit diagram of the reconfigurable 2-tap FIR subfilter;
Fig. 4 is a schematic diagram of the modules of the 6-parallel fast FIR filter.
Detailed description of the embodiments
When mode-select module A is set to 0 and mode-select module B is set to 0, the circuit performs three-channel 3 × 3 convolution. The input sequences are x_i{n} = {x_{i0}, x_{i1}, x_{i2}} and the convolution coefficient sequences are h_i{n} = {h_{i0}, h_{i1}, h_{i2}}, i = 0, 1, 2. The input pattern is
X0 ← x_{00}, X2 ← x_{01}, X4 ← x_{02}; H00 ← h_{00}, H01 ← h_{01}, H02 ← h_{02}
X6 ← x_{10}, X7 ← x_{11}, X8 ← x_{12}; H10 ← h_{10}, H11 ← h_{11}, H12 ← h_{12}
X1 ← x_{20}, X3 ← x_{21}, X5 ← x_{22}; H20 ← h_{20}, H21 ← h_{21}, H22 ← h_{22}
When mode-select module A is set to 1 and mode-select module B is set to 0, the circuit performs single-channel 5 × 5 convolution. The single-channel input sequence is still converted into 6 parallel inputs by the serial-to-parallel front-end signal processing circuit. The input sequence of the general-purpose convolution kernel is x{n} = {x_0, x_1, x_2, x_3, x_4, x_5} and the coefficient sequence is h{n} = {h_0, h_1, h_2, h_3, h_4, 0}; that is, the 5 × 5 convolution is realized as the special case of a 6 × 6 convolution with coefficient h_5 = 0. The input pattern is
X0 ← x_0; H00 ← h_0
X2 ← x_2; H01 ← h_2
X4 ← x_4; H02 ← h_4
X6 ← z; H10 ← h_0 + h_1
X7 ← z; H11 ← h_2 + h_3
X8 ← z; H12 ← h_4
X1 ← x_1; H20 ← h_1
X3 ← x_3; H21 ← h_3
X5 ← x_5; H22 ← 0
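The coefficient side of the 5 × 5 table can be generated mechanically: pad the five taps with h5 = 0 and form the nine subfilter coefficients of the 6-tap case. The sketch below is a software illustration only; the function name `subfilter_coeffs_6tap` and the sample taps are hypothetical, while the names H00 to H22 follow the table.

```python
def subfilter_coeffs_6tap(h):
    """Map up to 6 taps onto H00..H22; missing taps are treated as zero (h5 = 0 for 5 taps)."""
    if len(h) > 6:
        raise ValueError("at most 6 taps are supported in this mode")
    h = list(h) + [0] * (6 - len(h))
    return {
        "H00": h[0], "H01": h[2], "H02": h[4],                        # even taps
        "H10": h[0] + h[1], "H11": h[2] + h[3], "H12": h[4] + h[5],   # pre-added taps
        "H20": h[1], "H21": h[3], "H22": h[5],                        # odd taps
    }


# Hypothetical 5-tap kernel row.
coeffs = subfilter_coeffs_6tap([1, 2, 3, 4, 5])
assert coeffs["H12"] == 5 and coeffs["H22"] == 0   # h4 + 0 and h5 = 0, as in the table
```

With five taps the entries H12 and H22 reduce to h4 and 0, matching the last column of the table.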
When mode-select module A is set to 1 and mode-select module B is set to 1, the circuit performs single-channel 11 × 11 convolution. The single-channel input sequence is still converted into 6 parallel inputs by the serial-to-parallel front-end signal processing circuit. The input sequence is x{n} = {x_0, x_1, ..., x_5}, {x_6, x_7, ..., x_11} and the coefficient sequence is h{n} = {h_0, h_1, ..., h_10, 0}; that is, the 11 × 11 convolution is realized as the special case of a 12 × 12 convolution with coefficient h_11 = 0. The input pattern is
X0 ← {x_0, x_6}; H00 ← {h_0, h_6}
X2 ← {x_2, x_8}; H01 ← {h_2, h_8}
X4 ← {x_4, x_10}; H02 ← {h_4, h_10}
X6 ← z; H10 ← {h_0 + h_1, h_6 + h_7}
X7 ← z; H11 ← {h_2 + h_3, h_8 + h_9}
X8 ← z; H12 ← {h_4 + h_5, h_10}
X1 ← {x_1, x_7}; H20 ← {h_1, h_7}
X3 ← {x_3, x_9}; H21 ← {h_3, h_9}
X5 ← {x_5, x_11}; H22 ← {h_5, 0}
With mode-select module A set to 1 and mode-select module B set to 1, changing the input pattern lets the circuit realize the 7 × 7 single-channel convolution mode. The single-channel input sequence is still converted into 6 parallel inputs by the serial-to-parallel front-end signal processing circuit. The input sequence is x{n} = {x_0, x_1, ..., x_5}, {x_6, x_7, ..., x_11} and the coefficient sequence is h{n} = {h_0, h_1, ..., h_6, 0, 0, 0, 0, 0}; that is, the 7 × 7 convolution is realized as the special case of a 12 × 12 convolution with coefficients h_7, ..., h_11 = 0. The input pattern is
X0 ← {x_0, x_6}; H00 ← {h_0, h_6}
X2 ← {x_2, x_8}; H01 ← {h_2, 0}
X4 ← {x_4, x_10}; H02 ← {h_4, 0}
X6 ← z; H10 ← {h_0 + h_1, h_6}
X7 ← z; H11 ← {h_2 + h_3, 0}
X8 ← z; H12 ← {h_4 + h_5, 0}
X1 ← {x_1, x_7}; H20 ← {h_1, 0}
X3 ← {x_3, x_9}; H21 ← {h_3, 0}
X5 ← {x_5, x_11}; H22 ← {h_5, 0}
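Both the 11 × 11 and the 7 × 7 coefficient tables follow one rule: pad the taps with zeros to length 12, split them into a low half (h0 to h5) and a high half (h6 to h11), and load each 2-tap subfilter with the pair formed from the two halves. A minimal sketch of that rule, with hypothetical function names and sample taps, is:

```python
NAMES = ["H00", "H01", "H02", "H10", "H11", "H12", "H20", "H21", "H22"]


def nine_coeffs(h6):
    """The nine subfilter coefficients of one 6-tap half, in the order of NAMES."""
    return [
        h6[0], h6[2], h6[4],                          # even taps
        h6[0] + h6[1], h6[2] + h6[3], h6[4] + h6[5],  # pre-added taps
        h6[1], h6[3], h6[5],                          # odd taps
    ]


def subfilter_pairs_12tap(h):
    """Map up to 12 taps onto pairs for the 2-tap subfilters; missing taps are zero."""
    if len(h) > 12:
        raise ValueError("at most 12 taps are supported in this mode")
    h = list(h) + [0] * (12 - len(h))
    low, high = nine_coeffs(h[:6]), nine_coeffs(h[6:])
    return dict(zip(NAMES, zip(low, high)))


# Hypothetical 7-tap kernel row h0..h6 = 1..7.
pairs = subfilter_pairs_12tap([1, 2, 3, 4, 5, 6, 7])
assert pairs["H00"] == (1, 7)    # {h0, h6}
assert pairs["H10"] == (3, 7)    # {h0 + h1, h6}
assert pairs["H22"] == (6, 0)    # {h5, 0}
```

Passing 11 taps instead of 7 reproduces the 11 × 11 table, for example H12 becomes {h4 + h5, h10} because h11 is padded to zero.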
In summary, if only the 3 × 3 and 5 × 5 modes are supported, our structure uses 18 multipliers, 42 adders and 7 delay units, and can save 50% of hardware resources. By using 2-tap filters in the subfilter structure, we obtain a hardware-efficient implementation of all four convolution sizes commonly used in convolutional neural networks, using 35 multipliers, 59 adders and 25 delay units. Given today's high level of large-scale circuit integration, this realizes an efficient general-purpose neural network convolution kernel that supports all four kernel sizes: 3 × 3, 5 × 5, 7 × 7 and 11 × 11.

Claims (4)

1. A 6-parallel fast FIR filter, having a structure formed by cascading 3-parallel fast FIR filters, comprising:
a mode selection module, for selecting one of four convolution modes: 3*3, 5*5, 7*7 and 11*11;
a data input module, for parallelizing the serial input data according to the selected mode and feeding it into the input channels of that mode;
a fast convolution module, for performing the reduced-complexity fast convolution computation on the parallel input data;
a data output module, for outputting the parallel data of the selected mode.
2. The 6-parallel fast FIR filter according to claim 1, implementing a method for 5*5 fast convolution;
a method for 7*7 fast convolution;
a method for 11*11 fast convolution.
3. The 6-parallel fast FIR filter according to claim 1, implementing a general-purpose reconfigurable FIR subfilter selectable between a 1-tap mode and a 2-tap mode.
4. The 6-parallel fast FIR filter according to claim 1, wherein the fast convolution module further comprises:
● a 2-parallel fast FIR filter structure cascaded with 3-parallel fast FIR filter substructures;
● the primary 2-parallel structure, comprising 3 pre-adders, 9 post-adders, 1 data register and 3 secondary 3-parallel fast FIR filter substructures;
● the 3 secondary 3-parallel fast FIR filter substructures, each comprising 3 pre-adders, 7 post-adders, 2 data registers, and 18 reconfigurable 2-tap FIR subfilters;
● 6 reconfigurable 2-tap FIR subfilters, each comprising 2 multipliers, 1 adder, 1 data register and one 2-to-1 MUX unit.
CN201710396331.5A 2017-05-24 2017-05-24 6 parallel rapid FIR filter Active CN107645287B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710396331.5A CN107645287B (en) 2017-05-24 2017-05-24 6 parallel rapid FIR filter

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710396331.5A CN107645287B (en) 2017-05-24 2017-05-24 6 parallel rapid FIR filter

Publications (2)

Publication Number Publication Date
CN107645287A true CN107645287A (en) 2018-01-30
CN107645287B CN107645287B (en) 2020-12-22

Family

ID=61110124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710396331.5A Active CN107645287B (en) 2017-05-24 2017-05-24 6 parallel rapid FIR filter

Country Status (1)

Country Link
CN (1) CN107645287B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108429546A (en) * 2018-03-06 2018-08-21 深圳大学 A kind of mixed type FIR filter design method
CN110138358A (en) * 2019-04-30 2019-08-16 南京大学 A kind of long linear phase limited impulse response digital filter of idol
CN111832717A (en) * 2020-06-24 2020-10-27 上海西井信息科技有限公司 Chip structure and processing module for convolution calculation
CN112149351A (en) * 2020-09-22 2020-12-29 吉林大学 Microwave circuit physical dimension estimation method based on deep learning
WO2021046709A1 (en) * 2019-09-10 2021-03-18 深圳市南方硅谷半导体有限公司 Fir filter optimization method and device, and apparatus

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101661407A (en) * 2009-09-30 2010-03-03 中兴通讯股份有限公司 Finite impulse response filter with parallel structure and processing method thereof
CN102882491A (en) * 2012-10-23 2013-01-16 南开大学 Design method of sparse frequency-deviation-free linear phase FIR (finite impulse response) notch filter
CN103093052A (en) * 2013-01-25 2013-05-08 复旦大学 Design method of low-power dissipation parallel finite impulse response (FIR) digital filter
US20130332499A1 (en) * 2012-06-05 2013-12-12 P- Wave Holdings LLC Reconfigurable variable length fir filters for optimizing performance of digital repeater
US20170063346A1 (en) * 2015-09-01 2017-03-02 Freescale Semiconductor, Inc. Configurable fir filter with segmented cells

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
田晶晶: "基于快速卷积算法的低复杂度并行FIR滤波器的研究与实现", 《中国优秀硕士学位论文全文数据库信息科技辑》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108429546A (en) * 2018-03-06 2018-08-21 深圳大学 A kind of mixed type FIR filter design method
CN108429546B (en) * 2018-03-06 2021-11-05 深圳大学 Design method of hybrid FIR filter
CN110138358A (en) * 2019-04-30 2019-08-16 南京大学 A kind of long linear phase limited impulse response digital filter of idol
WO2021046709A1 (en) * 2019-09-10 2021-03-18 深圳市南方硅谷半导体有限公司 Fir filter optimization method and device, and apparatus
CN111832717A (en) * 2020-06-24 2020-10-27 上海西井信息科技有限公司 Chip structure and processing module for convolution calculation
CN112149351A (en) * 2020-09-22 2020-12-29 吉林大学 Microwave circuit physical dimension estimation method based on deep learning
CN112149351B (en) * 2020-09-22 2023-04-18 吉林大学 Microwave circuit physical dimension estimation method based on deep learning

Also Published As

Publication number Publication date
CN107645287B (en) 2020-12-22

Similar Documents

Publication Publication Date Title
CN107645287A (en) A kind of size based on 6 parallel rapid finite impact response filter cascade structures can configure convolution hardware and realize
Rashidi et al. Design and implementation of low power digital FIR filter based on low power multipliers and adders on Xilinx FPGA
Trimale A review: FIR filter implementation
Khan et al. VLSI implementation of reduced complexity wallace multiplier using energy efficient CMOS full adder
Safarian et al. FPGA implementation of LMS-based FIR adaptive filter for real time digital signal processing applications
Srinivasa Reddy et al. An approach for fixed coefficient RNS-based FIR filter
Jana et al. An area efficient vlsi architecture for 1-d and 2-d discrete wavelet transform (dwt) and inverse discrete wavelet transform (idwt)
Murthy et al. Optimized DA-reconfigurable FIR filters for software defined radio channelizer applications
Trimale et al. FIR filter implementation on FPGA using MCM design technique
Erdogan et al. High throughput FIR filter design for low power SoC applications
Liao et al. Novel architectures for the lifting-based discrete wavelet transform
Ye et al. A low cost and high speed CSD-based symmetric transpose block FIR implementation
Singh et al. Novel architecture for lifting discrete wavelet packet transform with arbitrary tree structure
Kumar et al. Design and implementation of pervasive DA based FIR filter and feeder register based multiplier for software definedradio networks
Kumar et al. Array Multiplier and CIA based FIR Filter for DSP applications
Zhang et al. Low-power reconfigurable FIR filter design based on common operation sharing
Chaudhary et al. Design of 64 bit High Speed Vedic Multiplier
Kamboh et al. An algorithmic transformation for FPGA implementation of high throughput filters
Narasimha et al. Implementation of LOW Area and Power Efficient Architectures of Digital FIR filters
Subathradevi et al. Delay optimized novel architecture of FIR filter using clustered-retimed MAC unit Cell for DSP applications
Malviya et al. Design of IIR filter using Wallace tree multiplier
Turner et al. Implementation of fixed DSP functions using the reduced coefficient multiplier
Shilparani et al. FPGA implementation of FIR filter architecture using MCM technology with pipelining
Dinesh et al. Survey on reconfigurable fir filter architecture
Jain et al. Analysis of fast FIR algorithms based area efficient FIR digital filters

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20190429

Address after: Room 816, Block B, Software Building 9 Xinghuo Road, Jiangbei New District, Nanjing, Jiangsu Province

Applicant after: Nanjing Fengxing Technology Co., Ltd.

Address before: 210023 Xianlin Avenue 163 Nanjing University Electronic Building 229, Qixia District, Nanjing City, Jiangsu Province

Applicant before: Nanjing University

GR01 Patent grant