CN107862381A - FIR filter implementation suitable for multiple convolution modes - Google Patents
- Publication number
- CN107862381A (application number CN201711101343.7A)
- Authority
- CN
- China
- Prior art keywords
- convolution
- hardware
- fir filter
- parallel
- patterns
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03H—IMPEDANCE NETWORKS, e.g. RESONANT CIRCUITS; RESONATORS
- H03H17/00—Networks using digital techniques
- H03H17/02—Frequency selective networks
- H03H17/06—Non-recursive filters
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03H—IMPEDANCE NETWORKS, e.g. RESONANT CIRCUITS; RESONATORS
- H03H17/00—Networks using digital techniques
- H03H2017/0072—Theoretical filter design
- H03H2017/0081—Theoretical filter design of FIR filters
Abstract
The invention discloses an FIR filter applicable to multiple convolution modes and its hardware implementation. The structure supports the mainstream convolution operations in current convolutional neural networks, such as 3*3 and 5*5 convolution with stride 1 and 3*3 convolution with stride 2. It uses a 6-parallel fast FIR algorithm to reduce hardware consumption, lower the computational complexity of convolution, and improve data throughput. The invention derives the hardware structure of a 3-parallel convolution with stride 2 and combines it with the 6-parallel fast FIR filter hardware structure without adding any multipliers or adders, so that the structure makes very full use of hardware resources in every mode. Through different configurations of this single hardware structure, the invention can carry out most current mainstream convolutional neural network computations, which improves hardware utilization, provides high generality, and simplifies the hardware implementation of convolutional neural networks.
Description
Technical field
The present invention relates to hardware implementation in the fields of computer and electronic science, and in particular to the hardware implementation of convolutional neural networks in deep learning: a general architecture and hardware implementation compatible with both stride-1 and stride-2 convolution.
Background art
Convolutional neural networks (CNNs) have become one of the most popular and most widely used deep learning algorithms because of their outstanding performance in fields such as image and audio processing. With the rapid development of CNNs in recent years, large convolution kernels are used less and less in models. The most widely used convolutions in current models are 3*3 and 5*5, and stride-2 convolution is used by more and more models. Yet stride-2 convolution has never had a good optimized hardware implementation. Traditional stride-1 convolution can use a fast FIR algorithm to raise parallelism and reduce multiplier resources.
An N-tap FIR filter is expressed in the time domain as:
$$y(n) = \sum_{i=0}^{N-1} h(i)\,x(n-i)$$
or, in the z domain, as:
$$Y(z) = H(z)X(z) = \left(\sum_{n=0}^{N-1} h(n)\,z^{-n}\right) X(z)$$
If the FIR filter coefficient sequence {h(n)} of length N is used as the coefficients of an N-point discrete convolution, the FIR filter realizes an N-point convolution. By combining N such filters, an N*N convolution in a convolutional neural network can be realized. The fast FIR algorithm achieves high parallelism, and it achieves low complexity by adding adders in order to reduce the number of multipliers. However, this method is not suitable for stride-2 convolution: computing with this method and then selectively outputting results to obtain a stride-2 convolution seriously wastes hardware resources, because in each cycle about 50% of the hardware resources have no effect on the result. A general hardware architecture that realizes both traditional stride-1 convolution and stride-2 convolution, with low complexity, high parallelism, and high hardware resource utilization, has therefore become a need.
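To make the waste described above concrete, the following is a minimal Python sketch, not taken from the patent. The function names `fir_direct` and `stride2_by_discarding` are illustrative assumptions. It implements the time-domain FIR sum directly and then obtains a stride-2 result the naive way, by computing every stride-1 output and keeping every second one.

```python
def fir_direct(x, h):
    """Direct-form FIR: y[n] = sum_i h[i] * x[n - i], with x[k] = 0 for k < 0."""
    y = []
    for n in range(len(x)):
        acc = 0
        for i, coeff in enumerate(h):
            if n - i >= 0:
                acc += coeff * x[n - i]
        y.append(acc)
    return y


def stride2_by_discarding(x, h):
    """Stride-2 convolution obtained by computing all stride-1 outputs
    and keeping only the even-indexed ones (illustrative, wasteful)."""
    full = fir_direct(x, h)
    kept = full[::2]
    wasted = len(full) - len(kept)  # outputs computed but never used
    return kept, wasted


if __name__ == "__main__":
    kept, wasted = stride2_by_discarding([1, 2, 3, 4, 5, 6], [1, 0, -1])
    print(kept, wasted)
```

For an input of six samples, three of the six computed outputs are thrown away, which corresponds to the roughly 50% of idle hardware mentioned in the text.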
Summary of the invention
In view of the above problems, the present invention proposes a convolution computing architecture, built on the fast FIR algorithm architecture, that is compatible with both stride 1 and stride 2, together with its hardware implementation. The invention realizes three computation modes on one hardware architecture: a 6-tap 6-parallel convolution; three independent 3-tap 3-parallel convolutions; and two independent 3-tap 3-parallel convolutions with stride 2. The invention is highly general: through different configurations of the hardware architecture, most current mainstream convolution operations can be realized. The specific content of the invention is as follows:
An FIR filter applicable to multiple convolution modes, whose hardware architecture comprises:
1) A data input selection unit, which, for the different convolution modes, reselects and rearranges the input data and feeds it to the corresponding convolution computing module.
2) A convolution computing unit, whose basic building block is the 3-tap 3-parallel fast FIR filter, with data selectors inserted to steer the data flow for the different convolution operations.
3) A post-convolution computing unit, which processes the outputs of the convolution computing unit so as to cascade the multiple independent building blocks within the convolution computing unit into a single fast FIR filter with higher parallelism and more taps.
4) A data output selection unit, which, for the different convolution modes, selects the corresponding computation results as the module output.
The second computation mode of the invention consists of three independent 3-tap 3-parallel fast FIR hardware structures, of which the 3-tap 3-parallel fast FIR hardware structure is the most basic building block. The relation between each output Y and the inputs X can be derived as:
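$$Y_0 = H_0X_0 - z^{-3}H_2X_2 + z^{-3}\left[(H_1+H_2)(X_1+X_2) - H_1X_1\right]$$
$$Y_1 = \left[(H_0+H_1)(X_0+X_1) - H_1X_1\right] - \left[H_0X_0 - z^{-3}H_2X_2\right]$$
$$Y_2 = \left[(H_0+H_1+H_2)(X_0+X_1+X_2)\right] - \left[(H_0+H_1)(X_0+X_1) - H_1X_1\right] - \left[(H_1+H_2)(X_1+X_2) - H_1X_1\right]$$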
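The three relations above can be checked numerically. The sketch below is an illustrative software model, not the patent's circuit: the names `ffa_3parallel` and `fir_direct` are assumptions, each sub-filter H0, H1, H2 is taken to be a single tap of a 3-tap filter, and the z^-3 delay is modeled as a value carried over from the previous block of three samples. It uses six multiplications per block, compared with nine for the direct form.

```python
def fir_direct(x, h):
    """Reference direct-form FIR with zero initial state."""
    return [sum(c * x[n - i] for i, c in enumerate(h) if n - i >= 0)
            for n in range(len(x))]


def ffa_3parallel(x, h):
    """3-tap, 3-parallel fast FIR; len(x) must be a multiple of 3."""
    h0, h1, h2 = h
    prev_m2 = 0   # z^-3 * H2*X2 from the previous block
    prev_t12 = 0  # z^-3 * [(H1+H2)(X1+X2) - H1*X1] from the previous block
    y = []
    for k in range(0, len(x), 3):
        x0, x1, x2 = x[k], x[k + 1], x[k + 2]
        # Six multiplications shared by the three outputs.
        m0 = h0 * x0
        m1 = h1 * x1
        m2 = h2 * x2
        m01 = (h0 + h1) * (x0 + x1)
        m12 = (h1 + h2) * (x1 + x2)
        m012 = (h0 + h1 + h2) * (x0 + x1 + x2)
        t01 = m01 - m1
        t12 = m12 - m1
        y0 = m0 - prev_m2 + prev_t12
        y1 = t01 - (m0 - prev_m2)
        y2 = m012 - t01 - t12
        y.extend([y0, y1, y2])
        prev_m2, prev_t12 = m2, t12
    return y


if __name__ == "__main__":
    x, h = [1, 2, 3, 4, 5, 6], [1, 2, 3]
    print(ffa_3parallel(x, h) == fir_direct(x, h))
```

The trade is the one the text describes: fewer multipliers in exchange for extra additions before and after the products.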
For the 3-tap 3-parallel convolution with stride 2, the relation between each output Y and the inputs X can be derived as:
$$Y_0 = H_0X_0 + z^{-6}(H_2X_4 + H_1X_5)$$
$$Y_1 = H_2X_0 + H_1X_1 + H_0X_2$$
$$Y_2 = H_2X_2 + H_1X_3 + H_0X_4$$
By reusing multipliers and adders and adding multiplexers, the stride-2 convolution is combined with the fast FIR algorithm, yielding a convolution hardware architecture with high hardware resource utilization, high parallelism, and high generality.
Brief description of the drawings
Fig. 1 is the hardware structure diagram of the 3-parallel convolution with stride 2.
Fig. 2 is the hardware structure diagram of the 3-tap 3-parallel fast FIR filter.
Fig. 3 is the overall architecture diagram of the invention.
Fig. 4 is the hardware structure diagram of the FIR filter suitable for multiple convolution modes according to the invention.
Detailed description of the embodiments
The specific implementation of the invention is further described below with reference to the drawings. Stride-2 convolution differs considerably from stride-1 convolution in implementation, because each input is spaced 2 samples apart from the previous one. When the fast FIR algorithm is used for parallel computation, this discontinuity in the data makes half of the computed results useless. Without the fast FIR algorithm, the relations between inputs and outputs are:
$$Y_0 = H_0X_0 + z^{-6}(H_2X_4 + H_1X_5)$$
$$Y_1 = H_2X_0 + H_1X_1 + H_0X_2$$
$$Y_2 = H_2X_2 + H_1X_3 + H_0X_4$$
From these relations, the stride-2 convolution can be made 3-parallel; its hardware structure is shown in Fig. 1. This structure uses 9 multipliers and 6 adders in total and computes 3 outputs for every 6 input samples. Compared with the fast FIR algorithm at the same degree of parallelism, it achieves high utilization of hardware resources.
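As a sanity check on the stride-2 relations, here is a short illustrative Python model. The names are assumptions, and the z^-6 delay is modeled as a value carried over from the previous block of six samples. Each block performs exactly 9 multiplications and 6 additions, and every computed output is used.

```python
def fir_direct(x, h):
    """Reference direct-form FIR with zero initial state."""
    return [sum(c * x[n - i] for i, c in enumerate(h) if n - i >= 0)
            for n in range(len(x))]


def conv_stride2_3parallel(x, h):
    """3-tap convolution with stride 2, three outputs per six inputs.
    len(x) must be a multiple of 6."""
    h0, h1, h2 = h
    delayed = 0  # z^-6 * (H2*X4 + H1*X5), carried from the previous block
    y = []
    for k in range(0, len(x), 6):
        x0, x1, x2, x3, x4, x5 = x[k:k + 6]
        y0 = h0 * x0 + delayed
        y1 = h2 * x0 + h1 * x1 + h0 * x2
        y2 = h2 * x2 + h1 * x3 + h0 * x4
        y.extend([y0, y1, y2])
        delayed = h2 * x4 + h1 * x5
    return y


if __name__ == "__main__":
    x, h = list(range(1, 13)), [1, 2, 3]
    # Same result as computing every stride-1 output and keeping every second one.
    print(conv_stride2_3parallel(x, h) == fir_direct(x, h)[::2])
```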
The hardware structure of a fast FIR filter is shown in Fig. 2. By adding adders in order to reduce the number of multipliers, this structure lowers hardware implementation complexity while achieving high parallelism. For traditional stride-1 convolution, this hardware structure computes efficiently.
By fusing the two hardware structures above, the invention efficiently supports both stride-1 and stride-2 convolution, and offers low complexity, high hardware utilization, and strong generality while maintaining high parallelism. Fig. 3 shows the hardware architecture of the invention. As shown in the figure, the architecture comprises four modules:
1) A data input selection unit, which, for the different convolution modes, reselects and rearranges the input data and feeds it to the corresponding convolution computing module.
2) A convolution computing unit, whose basic building block is the 3-tap 3-parallel fast FIR filter, with data selectors inserted to steer the data flow for the different convolution operations.
3) A post-convolution computing unit, which processes the outputs of the convolution computing unit so as to cascade the multiple independent building blocks within the convolution computing unit into a single fast FIR filter with higher parallelism and more taps.
4) A data output selection unit, which, for the different convolution modes, selects the corresponding computation results as the module output.
The specific hardware circuit is shown in Fig. 4. By reusing multipliers and adders and adding multiplexers, the invention fuses the 3-parallel stride-2 convolution hardware structure with the 3-parallel fast FIR filters, so that both the stride-2 and the stride-1 convolution modes are supported without adding a single multiplier or adder. The 6-parallel fast FIR filter is realized by cascading three 3-parallel fast FIR filters. The invention offers three selectable operating modes. The first is three independent 3-tap 3-parallel fast FIR filters, which realize 3*3 convolution by setting the tap coefficients H. The second is two independent 3-tap convolutions with stride 2. The third is a single 6-parallel fast FIR filter; setting its last tap coefficient to 0 realizes 5*5 convolution. Only the data input X, the tap coefficients H, and the operating mode M need to be fed to the hardware circuit. According to the selected mode, the invention automatically distributes X and H to the submodules for computation, and a multiplexer selects the results that match the mode as the output.
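The text does not spell out how the three 3-tap blocks are cascaded into the 6-tap filter. One standard possibility, shown here purely as an assumption, is a 2-parallel fast FIR step: split taps and samples into even and odd parts, run three 3-tap sub-filters (on the even stream, the odd stream, and their sum), then recombine. The sketch below uses hypothetical names (`fir6_from_three_3tap`, `pad_taps`) and a plain direct-form routine as a stand-in for each 3-tap block. It also shows that a 5-tap kernel row runs on the 6-tap path once the last coefficient is set to 0.

```python
def fir_direct(x, h):
    """Reference direct-form FIR with zero initial state."""
    return [sum(c * x[n - i] for i, c in enumerate(h) if n - i >= 0)
            for n in range(len(x))]


def pad_taps(h, n_taps=6):
    """Append zero coefficients so a shorter filter fits an n_taps structure."""
    if len(h) > n_taps:
        raise ValueError("filter has more taps than the structure supports")
    return list(h) + [0] * (n_taps - len(h))


def fir6_from_three_3tap(x, h, fir3=fir_direct):
    """6-tap FIR built from three 3-tap sub-filters (assumed decomposition).
    len(h) must be 6 and len(x) must be even."""
    if len(h) != 6 or len(x) % 2:
        raise ValueError("expected 6 taps and an even number of samples")
    h_even, h_odd = h[0::2], h[1::2]
    x_even, x_odd = x[0::2], x[1::2]
    h_sum = [a + b for a, b in zip(h_even, h_odd)]
    x_sum = [a + b for a, b in zip(x_even, x_odd)]
    p0 = fir3(x_even, h_even)   # H0 * X0
    p1 = fir3(x_odd, h_odd)     # H1 * X1
    p01 = fir3(x_sum, h_sum)    # (H0 + H1) * (X0 + X1)
    y, prev_p1 = [], 0
    for a, b, c in zip(p0, p1, p01):
        y.append(a + prev_p1)   # even output: H0*X0 + delayed H1*X1
        y.append(c - a - b)     # odd output: cross terms only
        prev_p1 = b
    return y


if __name__ == "__main__":
    x = list(range(1, 13))
    h5 = [1, -2, 3, -4, 5]      # one row of a 5*5 kernel
    print(fir6_from_three_3tap(x, pad_taps(h5)) == fir_direct(x, h5))
```

Under this assumed decomposition, three sub-filters of six multipliers each would total 18 multipliers, the same count as two 9-multiplier stride-2 blocks. That is consistent with the statement that no multipliers are added, but the figures themselves are not available here to confirm it.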
Of course, the invention is applicable not only to convolutional neural network computation; it also has many applicable scenarios in fields such as digital image processing, digital signal processing, and wireless communication. Improving other filters by the same approach of reusing multipliers and adders and inserting multiplexers can match still more convolution modes. For those of ordinary skill in the art, various improvements and refinements can be made without departing from the principle of the invention, and these improvements and refinements shall also be regarded as falling within the scope of protection of the invention. Any component not explicitly specified in this embodiment can be implemented with existing technology.
Claims (1)
1. An FIR filter applicable to multiple convolution modes, whose hardware architecture comprises:
1) A data input selection unit, which, for the different convolution modes, reselects and rearranges the input data and feeds it to the corresponding convolution computing module.
2) A convolution computing unit, whose basic building block is the 3-tap 3-parallel fast FIR filter, with data selectors inserted to steer the data flow for the different convolution operations.
3) A post-convolution computing unit, which processes the outputs of the convolution computing unit so as to cascade the multiple independent building blocks within the convolution computing unit into a single fast FIR filter with higher parallelism and more taps.
4) A data output selection unit, which, for the different convolution modes, selects the corresponding computation results as the module output.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711101343.7A CN107862381A (en) | 2017-11-06 | 2017-11-06 | FIR filter implementation suitable for multiple convolution modes |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711101343.7A CN107862381A (en) | 2017-11-06 | 2017-11-06 | FIR filter implementation suitable for multiple convolution modes |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107862381A true CN107862381A (en) | 2018-03-30 |
Family
ID=61700161
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711101343.7A Pending CN107862381A (en) | 2017-11-06 | 2017-11-06 | FIR filter implementation suitable for multiple convolution modes |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107862381A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040103133A1 (en) * | 2002-11-27 | 2004-05-27 | Spectrum Signal Processing Inc. | Decimating filter |
CN106845635A (en) * | 2017-01-24 | 2017-06-13 | 东南大学 | CNN convolution kernel hardware design method based on cascade form |
CN106936406A (en) * | 2017-03-10 | 2017-07-07 | 南京大学 | Implementation of a 5-parallel fast finite impulse response filter |
Non-Patent Citations (1)
Title |
---|
JICHEN WANG等: "《Efficient Convolution Architectures for Convolutional Neural Network》", 《2016 8TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS & SIGNAL PROCESSING (WCSP)》 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109104197A (en) * | 2018-11-12 | 2018-12-28 | 合肥工业大学 | The coding and decoding circuit and its coding and decoding method of non-reduced sparse data applied to convolutional neural networks |
CN109104197B (en) * | 2018-11-12 | 2022-02-11 | 合肥工业大学 | Coding and decoding circuit and coding and decoding method for non-reduction sparse data applied to convolutional neural network |
WO2021046709A1 (en) * | 2019-09-10 | 2021-03-18 | 深圳市南方硅谷半导体有限公司 | Fir filter optimization method and device, and apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20180330 |