CN105681628B - Convolutional network arithmetic unit and reconfigurable convolutional neural network processor and method for implementing image denoising - Google Patents
Convolutional network arithmetic unit and reconfigurable convolutional neural network processor and method for implementing image denoising Download PDF Info
- Publication number
- CN105681628B CN105681628B CN201610003960.2A CN201610003960A CN105681628B CN 105681628 B CN105681628 B CN 105681628B CN 201610003960 A CN201610003960 A CN 201610003960A CN 105681628 B CN105681628 B CN 105681628B
- Authority
- CN
- China
- Prior art keywords
- input
- reconfigurable
- convolution
- output
- neural networks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/14—Picture signal circuitry for video frequency region
- H04N5/21—Circuitry for suppressing or minimising disturbance, e.g. moiré or halo
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/14—Picture signal circuitry for video frequency region
- H04N5/21—Circuitry for suppressing or minimising disturbance, e.g. moiré or halo
- H04N5/213—Circuitry for suppressing or minimising impulsive noise
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N9/00—Details of colour television systems
- H04N9/64—Circuits for processing colour signals
- H04N9/73—Colour balance circuits, e.g. white balance circuits or colour temperature control
Abstract
The present invention discloses a convolutional network arithmetic unit, a reconfigurable convolutional neural network processor, and a method for implementing image denoising. The disclosed reconfigurable convolutional neural network processor includes a bus interface, a pre-processing unit, a reconfigurable hardware controller, an SRAM, an SRAM control unit, an input buffer module, an output buffer module, a memory, a data storage controller, and convolutional network arithmetic units; it consumes few resources, runs fast, and is applicable to common convolutional neural network architectures. The invention implements convolutional neural networks with fast processing, easy porting, and low resource consumption; it can restore images or video polluted by raindrops or dust, and can also serve as a pre-processing operation that aids subsequent image recognition or classification.
Description
Technical field
The present invention relates to the field of image processing, and in particular to a convolutional network arithmetic unit, a reconfigurable convolutional neural network processor, and a method for implementing image denoising.
Background technique
The removal of raindrops and dust from images is significant for image-processing applications, especially video surveillance and navigation systems. It can be used to restore images or video polluted by raindrops or dust, and can also serve as a pre-processing operation that aids subsequent image recognition or classification.
Current methods for removing image noise mostly rely on Gaussian filtering, median filtering, bilateral filtering, and the like. These methods perform poorly and usually cannot meet the demands of specific image-processing applications. A more effective method is therefore needed to remove image noise, and convolutional neural networks are a good choice.
Current deep-learning networks mostly run on GPUs, but GPUs are expensive and power-hungry, making them unsuitable for large-scale deployment. On CPUs the running speed is slow and large deep-learning networks execute inefficiently, failing to meet performance demands.
It can be seen that current technology for applying convolutional neural networks mainly suffers from large processor area, high cost, high power consumption, and poor performance. A reconfigurable convolutional neural network processor with low power consumption, small area, and good processing quality is therefore needed.
Summary of the invention
The purpose of the present invention is to provide a convolutional network arithmetic unit, a reconfigurable convolutional neural network processor, and a method for implementing image denoising, with low hardware-resource consumption and small area, capable of restoring images or video polluted by raindrops or dust.
To achieve the goals above, the present invention adopts the following technical scheme:
A convolutional network arithmetic unit, including 2 reconfigurable separable convolution modules, a nonlinear activation function unit, and a multiply-accumulator unit;
The output of the first reconfigurable separable convolution module is the input of the nonlinear activation function unit; the output of the nonlinear activation function unit is the input of the multiply-accumulator unit; and the output of the multiply-accumulator unit is the input of the second reconfigurable separable convolution module;
The image signal and the network-configuration parameter signal are input to the first reconfigurable separable convolution module; the first reconfigurable separable convolution module completes one 16 × 16 convolution operation; the nonlinear activation function unit computes the activation function of the convolutional neural network; the multiply-accumulator unit computes the connection layer of the convolutional neural network; the second reconfigurable separable convolution module completes 4 8 × 8 convolution operations simultaneously;
The multiply-accumulator unit includes several multiply-accumulators and several registers; the multiply-accumulators compute the sum of products of the previous convolutional layer's output values and the weight parameters; the registers feed the results of the previous convolutional layer into the multiply-accumulators.
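The multiply-accumulate step of the connection layer — the sum of products of the previous layer's outputs and the weight parameters — can be sketched functionally as follows (a minimal model, not the patented circuit; all names are illustrative):

```python
def mac(prev_outputs, weights, acc=0.0):
    """One multiply-accumulator: accumulate products of the previous
    convolutional layer's outputs and their weight parameters."""
    for x, w in zip(prev_outputs, weights):
        acc += x * w  # a register feeds x in; the MAC adds x*w to the running sum
    return acc

# e.g. three previous-layer outputs weighted into one connection-layer value
assert mac([1.0, 2.0, 3.0], [0.5, 0.5, 1.0]) == 4.5
```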
Further, the reconfigurable separable convolution module includes 16 4 × 4 reconfigurable one-dimensional convolution modules and a first register group; the first register group feeds the image signal and the convolutional network parameters into the reconfigurable one-dimensional convolution modules; the reconfigurable separable convolution module can complete 1 16 × 16 convolution or complete 4 8 × 8 convolution operations simultaneously; each 4 × 4 reconfigurable one-dimensional convolution module includes 4 first selectors, 4 first 2-input multipliers, a first 4-input adder, 4 second 2-input multipliers, and a second 4-input adder; the output of each of the 4 first selectors connects to one input of the corresponding first 2-input multiplier, and the other input of each first 2-input multiplier is a neural-network weight; the outputs of the 4 first 2-input multipliers connect to the inputs of the first 4-input adder; the inputs of the 4 second 2-input multipliers are the output of the first 4-input adder and neural-network weights; the inputs of the second 4-input adder are the outputs of the 4 second 2-input multipliers.
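The two multiplier/adder stages match a separable 2-D convolution: if a 4 × 4 template is the outer product of a vertical 4-tap vector and a horizontal 4-tap vector, the first stage applies the vertical weights and the second stage the horizontal ones. A sketch under that assumption (plain floating point rather than the hardware datapath; names are illustrative):

```python
def separable_4x4(patch, w_col, w_row):
    """One output sample of a separable 4x4 convolution:
    stage 1: 4 multipliers + a 4-input adder per column (vertical 1-D pass),
    stage 2: 4 multipliers + a 4-input adder across columns (horizontal pass)."""
    col_sums = [sum(patch[i][j] * w_col[i] for i in range(4)) for j in range(4)]
    return sum(col_sums[j] * w_row[j] for j in range(4))

patch = [[(i * 4 + j) * 0.1 for j in range(4)] for i in range(4)]
w_col, w_row = [1.0, 2.0, 3.0, 4.0], [0.5, 0.25, 0.25, 1.0]
# equivalent full 4x4 template: the outer product w_col * w_row
full = sum(patch[i][j] * w_col[i] * w_row[j] for i in range(4) for j in range(4))
assert abs(separable_4x4(patch, w_col, w_row) - full) < 1e-9
```

The two 1-D passes use 8 multiplications per output instead of the 16 a full 4 × 4 template needs.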
Further, the nonlinear activation function unit includes a QD generator and an arithmetic-unit group; the input of the QD generator is the output of the reconfigurable separable convolution, and the input of the arithmetic-unit group is the output of the QD generator; the QD generator generates the parameters needed by the activation function; the arithmetic-unit group computes the final value of the activation function;
The QD generator includes a first divider; the input signal is fed to the first divider, which outputs a quotient Q and a remainder D; the arithmetic-unit group includes a shift register, 2 first adders, and a second divider; the output of the shift register feeds the inputs of the 2 first adders; the outputs of the 2 first adders are the inputs of the second divider; the shift register, the first adders, and the second divider are connected in sequence.
A reconfigurable convolutional neural network processor, including a bus interface, a pre-processing unit, a reconfigurable hardware controller, an SRAM, an SRAM control unit, an input buffer module, an output buffer module, a memory, a data storage controller, and several convolutional network arithmetic units as described in any one of claims 1 to 3; the bus interface connects the pre-processing unit, the data storage controller, the reconfigurable hardware controller, the input buffer, and the output buffer; the memory connects to the data storage controller; the input buffer connects to the reconfigurable hardware controller and the SRAM control unit; the convolutional network arithmetic units connect to the input buffer module and the output buffer module;
The input of the pre-processing unit is an image or video signal; it completes pre-processing operations such as white balance and noise filtering;
The input buffer module and the output buffer module cache the inputs and outputs of the convolutional network arithmetic units, respectively;
The reconfigurable hardware controller configures the convolutional network arithmetic modules and controls their computation; during or at the end of computation it sends interrupt requests to complete the interaction with the external system;
The SRAM control unit controls the transfer of the convolutional network weight parameters.
Further, the processor includes 512 convolutional network arithmetic units and implements image denoising based on a convolutional neural network.
Further, the reconfigurable convolutional neural network processor implements a 3-layer convolutional neural network for removing raindrops and dust adhering to an image or video; the first layer of the convolutional neural network consists of 512 16 × 16 convolutions, the second layer is a neural-network connection layer, and the third layer consists of 512 8 × 8 convolutions.
A method for implementing image denoising with the reconfigurable convolutional neural network processor, comprising:
randomly reducing the number of convolutions during image denoising, which reduces hardware-resource consumption and improves processing speed;
or, during image denoising, dividing the 16 × 16 and 8 × 8 convolution arithmetic units into 16 and 4 4 × 4 convolution templates respectively, and applying one-dimensional convolution to each 4 × 4 convolution.
Compared with the existing technology, the invention has the following advantages: by using reconfiguration technology, the convolutional network arithmetic unit can complete one 16 × 16 convolution or complete 4 8 × 8 convolution operations simultaneously, improving hardware performance and flexibility. Using a deep-learning method, the invention removes raindrops and dust from images, and the processing quality meets demand. Without affecting the processing quality, the invention randomly reduces the number of convolution templates and also uses a block-wise one-dimensional convolution method, greatly reducing hardware-resource consumption and greatly improving processing speed. The processor implements a 3-layer convolutional neural network and can provide features for subsequent higher-level image recognition and classification. GPUs are expensive, power-hungry, and large; CPUs run slowly and execute large deep-learning networks inefficiently. By using reconfiguration technology together with the above template-reduction and block-wise one-dimensional convolution methods, the realized reconfigurable convolutional neural network processor has low resource consumption, is easy to implement in hardware, and can restore images or video polluted by raindrops or dust.
Detailed description of the invention
Fig. 1 is a structural diagram of the convolutional network arithmetic unit;
Fig. 2 is a structural diagram of the nonlinear activation function unit;
Fig. 3 is a structural diagram of the first 4 × 4 reconfigurable one-dimensional convolution module;
Fig. 4 is a structural diagram of the reconfigurable separable convolution module;
Fig. 5 is a structural diagram of the reconfigurable convolutional neural network processor;
Specific embodiment
The present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
Referring to Fig. 1, the convolutional network arithmetic unit used in the reconfigurable convolutional neural network processor of the present invention includes 2 reconfigurable separable convolution modules, a nonlinear activation function unit, and a multiply-accumulator unit; the output of the first reconfigurable separable convolution module is the input of the nonlinear activation function unit, the output of the nonlinear activation function unit is the input of the multiply-accumulator unit, and the output of the multiply-accumulator unit is the input of the second reconfigurable separable convolution module;
The image signal and the network-configuration parameter signal are input to the first reconfigurable separable convolution module; the first reconfigurable separable convolution module completes one 16 × 16 convolution operation; the nonlinear activation function unit computes the activation function of the convolutional neural network; the multiply-accumulator unit computes the connection layer of the convolutional neural network; the second reconfigurable separable convolution module completes 4 8 × 8 convolution operations simultaneously;
Referring to Fig. 2, the nonlinear activation function unit includes a QD generator and an arithmetic-unit group; the input of the QD generator is the output of the reconfigurable separable convolution, and the input of the arithmetic-unit group is the output of the QD generator; the QD generator generates the parameters needed by the activation function; the arithmetic-unit group computes the final result of the activation function.
The activation function of the neural network in the present invention is the hyperbolic tangent function

tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)) = (e^(2x) − 1) / (e^(2x) + 1)

By domain extension and Taylor-series expansion — writing x = Q·ln 2 + D, so that e^(2x) = 2^(2Q)·e^(2D) with e^(2D) approximated by its Taylor series — one obtains

tanh(x) = (2^(2Q)·e^(2D) − 1) / (2^(2Q)·e^(2D) + 1), where |D| < ln 2

The QD generator includes a first divider; the input signal is fed to the first divider, which divides by the fixed value 0.69 (≈ ln 2) and outputs the quotient Q and the remainder D; the arithmetic-unit group includes a shift register, 2 first adders, and a second divider; the output of the shift register feeds the inputs of the 2 first adders; the outputs of the 2 first adders are the inputs of the second divider; the shift register, the first adders, and the second divider are connected in sequence;
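A numerical sketch of this decomposition (floating point, with a short Taylor series standing in for the hardware's approximation; the function and constant names are illustrative, not the fixed-point datapath):

```python
import math

LN2 = math.log(2)

def tanh_qd(x):
    """Approximate tanh(x) by splitting x = Q*ln2 + D with |D| < ln2,
    so e^(2x) = 2^(2Q) * e^(2D) and 2^(2Q) reduces to a bit shift."""
    Q = int(x / LN2)                  # quotient from the first divider (÷ 0.69)
    D = x - Q * LN2                   # remainder
    t = 2.0 * D
    e2d = sum(t**n / math.factorial(n) for n in range(6))  # Taylor e^(2D)
    p = e2d * 2 ** (2 * Q)            # shift-register step: multiply by 2^(2Q)
    return (p - 1) / (p + 1)          # two adders form num/den; second divider divides
```

Here `int(x / LN2)` plays the role of the first divider, and the final `(p - 1) / (p + 1)` corresponds to the two first adders feeding the second divider.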
Referring to Fig. 3, each 4 × 4 reconfigurable one-dimensional convolution module includes 4 first selectors (MUX), 4 first 2-input multipliers, a first 4-input adder, 4 second 2-input multipliers, and a second 4-input adder. The two inputs of each first selector are the image signal and the result of the previous stage; the output of each of the 4 first selectors connects to one input of the corresponding first 2-input multiplier, and the other input of each first 2-input multiplier is a neural-network weight; the outputs of the 4 first 2-input multipliers connect to the inputs of the first 4-input adder; the inputs of the 4 second 2-input multipliers are the output of the first 4-input adder and neural-network weights; the inputs of the second 4-input adder are the outputs of the 4 second 2-input multipliers.
Referring to Fig. 4, the reconfigurable separable convolution module includes a first register group, 16 4 × 4 reconfigurable one-dimensional convolution modules, 4 4-input first adders, and 1 4-input second adder. Using reconfiguration technology, the reconfigurable separable convolution module can complete one 16 × 16 convolution or complete 4 8 × 8 convolution operations simultaneously. The image signal and the configuration signal are input to the first register group. The input of 4 × 4 convolution module 1 is image rows 1-4, and the input of 4 × 4 convolution module 5 is image rows 5-8.
When the convolution template is 16 × 16, the input of 4 × 4 convolution module 3 is the output of module 2, the input of module 7 is the output of module 6, the input of module 11 is the output of module 10, and the input of module 15 is the output of module 14. The input of module 9 is image rows 9-12, and the input of module 13 is image rows 13-16. The output of the reconfigurable separable convolution module is the result of the second adder.
When the convolution template is 8 × 8, the inputs of modules 3, 7, 11, and 15 are image rows 1-4. The input of module 9 is image rows 1-4, and the input of module 13 is image rows 5-8. The output of the reconfigurable separable convolution module is the results of the 4 first adders. One reconfigurable separable convolution module can thus complete 4 8 × 8 convolution operations simultaneously.
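The routing above amounts to summing 4 × 4 tile partial sums: in 16 × 16 mode all 16 tiles feed the final adder, while in 8 × 8 mode each group of 4 tiles forms an independent 8 × 8 result. A software model of the per-output arithmetic (illustrative only, not the hardware):

```python
def tile_partial_sums(patch, kernel, t=4):
    """Partial sums of one output sample, one per 4x4 tile of the template."""
    k = len(kernel)
    return [sum(patch[i + a][j + b] * kernel[i + a][j + b]
                for a in range(t) for b in range(t))
            for i in range(0, k, t) for j in range(0, k, t)]

def full_sum(patch, kernel):
    k = len(kernel)
    return sum(patch[i][j] * kernel[i][j] for i in range(k) for j in range(k))

patch16 = [[(i * 16 + j) % 7 * 0.5 for j in range(16)] for i in range(16)]
k16 = [[(i + j) % 5 * 0.2 for j in range(16)] for i in range(16)]
# 16x16 mode: the adder tree sums all 16 tile partial sums into one result
assert abs(sum(tile_partial_sums(patch16, k16)) - full_sum(patch16, k16)) < 1e-9

patch8 = [[(i * 8 + j) % 3 * 0.25 for j in range(8)] for i in range(8)]
k8 = [[(i - j) % 4 * 0.1 for j in range(8)] for i in range(8)]
# 8x8 mode: one group of 4 tiles forms each of the 4 independent convolutions
assert abs(sum(tile_partial_sums(patch8, k8)) - full_sum(patch8, k8)) < 1e-9
```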
Referring to Fig. 5, a reconfigurable convolutional neural network processor of the present invention includes a bus interface, a pre-processing unit, a reconfigurable hardware controller, an SRAM, an SRAM control unit, an input buffer, an output buffer, a memory, a data storage controller, and several convolutional network arithmetic units; the bus interface connects the pre-processing unit, the data storage controller, the reconfigurable hardware controller, the input buffer, and the output buffer; the memory connects to the data storage controller; the input buffer connects to the reconfigurable hardware controller and the SRAM control unit; the convolutional network arithmetic units connect to the input buffer module and the output buffer module.
The input of the pre-processing unit is an image or video signal; it completes pre-processing operations such as white balance and noise filtering. The input buffer module and the output buffer cache the inputs and outputs of the convolutional network arithmetic units, respectively. The reconfigurable hardware controller configures the convolutional network arithmetic units and controls their computation; during or at the end of computation it sends interrupt requests to complete the interaction with the external system. The SRAM control unit controls the transfer of the convolutional network weight parameters.
One embodiment, implementing the convolutional neural network for removing raindrops and dust from images, includes 512 convolutional network arithmetic units. To reduce resources and improve processing speed, the present invention uses the following two methods in the specific implementation: (1) randomly reducing the number of convolutions: under the premise of not affecting the processing quality, the number of convolutional network arithmetic units is reduced, which reduces hardware-resource consumption and improves processing speed; (2) block-wise one-dimensional convolution: the 16 × 16 and 8 × 8 convolution templates are divided into 16 and 4 4 × 4 convolution templates respectively, and one-dimensional convolution is applied to each 4 × 4 convolution.
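Assuming each 4 × 4 template is applied as two 1-D passes (4 + 4 multiplications per output sample), the block-wise method halves the multiplication count relative to the full 2-D template; a quick tally (helper name is illustrative):

```python
def mults_per_output(k, tile=4):
    """Multiplications per output sample: full 2-D template vs. block-wise 1-D."""
    direct = k * k                        # full k x k template
    tiles = (k // tile) ** 2              # 4x4 blocks covering the template
    separable = tiles * (tile + tile)     # two 1-D passes per 4x4 block
    return direct, separable

assert mults_per_output(16) == (256, 128)   # 16 blocks of 4x4
assert mults_per_output(8) == (64, 32)      # 4 blocks of 4x4
```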
Referring to Fig. 5, the reconfigurable 16 × 16 convolution arithmetic unit includes 16 4 × 4 reconfigurable one-dimensional convolution modules (1, 2, 3, ..., 16), a line-buffer module, and registers; the input of the line-buffer module is the image or video signal, the input of the register group is the output of the line-buffer module, and the input of the 4 × 4 reconfigurable one-dimensional convolution modules is the output of the register group; the line-buffer module stores the image; the registers store the image data serially input from the line buffer and feed it into the 4 × 4 reconfigurable one-dimensional convolution modules.
The reconfigurable 8 × 8 convolution arithmetic unit includes 4 4 × 4 reconfigurable one-dimensional convolution modules (1, 2, 3, 4), a line-buffer module, and registers; the input of the line-buffer module is the output of the multiply-accumulator, the input of the register group is the output of the line-buffer module, and the input of the 4 × 4 reconfigurable one-dimensional convolution modules is the output of the register group.
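The line buffer plus register group described above implements the standard sliding-window feed for a convolution unit; a behavioral sketch (illustrative model, not the RTL):

```python
from collections import deque

def sliding_windows(rows, k=4):
    """Hypothetical model of the line buffer + register group: keep the
    k most recent image rows, then slide a k-wide window along them."""
    buf = deque(maxlen=k)                   # line-buffer module: last k rows
    for row in rows:                        # rows arrive serially
        buf.append(row)
        if len(buf) < k:
            continue                        # pipeline still filling
        for c in range(len(row) - k + 1):   # register group shifts one pixel at a time
            yield [r[c:c + k] for r in buf] # k x k window fed to the conv module

rows = [[r * 10 + c for c in range(6)] for r in range(5)]
wins = list(sliding_windows(rows, k=4))     # 2 row positions x 3 column positions
```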
Claims (6)
1. A convolutional network arithmetic unit, characterized in that: it includes 2 reconfigurable separable convolution modules, a nonlinear activation function unit, and a multiply-accumulator unit;
the output of the first reconfigurable separable convolution module is the input of the nonlinear activation function unit, the output of the nonlinear activation function unit is the input of the multiply-accumulator unit, and the output of the multiply-accumulator unit is the input of the second reconfigurable separable convolution module;
the image signal and the network-configuration parameter signal are input to the first reconfigurable separable convolution module; the first reconfigurable separable convolution module completes one 16 × 16 convolution operation; the nonlinear activation function unit computes the activation function of the convolutional neural network; the multiply-accumulator unit computes the connection layer of the convolutional neural network; the second reconfigurable separable convolution module completes 4 8 × 8 convolution operations simultaneously;
the multiply-accumulator unit includes several multiply-accumulators and several registers; the multiply-accumulators compute the sum of products of the previous convolutional layer's output values and the weight parameters; the registers feed the results of the previous convolutional layer into the multiply-accumulators;
the reconfigurable separable convolution module includes 16 4 × 4 reconfigurable one-dimensional convolution modules and a first register group; the first register group feeds the image signal or the previous-stage output, together with the convolutional network parameters, into the reconfigurable one-dimensional convolution modules; the reconfigurable separable convolution module completes 1 16 × 16 convolution or completes 4 8 × 8 convolution operations simultaneously;
each 4 × 4 reconfigurable one-dimensional convolution module includes 4 first selectors, 4 first 2-input multipliers, a first 4-input adder, 4 second 2-input multipliers, and a second 4-input adder; the output of each of the 4 first selectors connects to one input of the corresponding first 2-input multiplier, and the other input of each first 2-input multiplier is a neural-network weight; the outputs of the 4 first 2-input multipliers connect to the inputs of the first 4-input adder; the inputs of the 4 second 2-input multipliers are the output of the first 4-input adder and neural-network weights; the inputs of the second 4-input adder are the outputs of the 4 second 2-input multipliers.
2. The convolutional network arithmetic unit according to claim 1, characterized in that: the nonlinear activation function unit includes a QD generator and an arithmetic-unit group; the input of the QD generator is the output of the reconfigurable separable convolution, and the input of the arithmetic-unit group is the output of the QD generator; the QD generator generates the parameters needed by the activation function; the arithmetic-unit group computes the final value of the activation function;
the QD generator includes a first divider; the input signal is fed to the first divider, which outputs a quotient Q and a remainder D; the arithmetic-unit group includes a shift register, 2 first adders, and a second divider; the output of the shift register feeds the inputs of the 2 first adders; the outputs of the 2 first adders are the inputs of the second divider; the shift register, the first adders, and the second divider are connected in sequence.
3. A reconfigurable convolutional neural network processor, characterized in that: it includes a bus interface, a pre-processing unit, a reconfigurable hardware controller, an SRAM, an SRAM control unit, an input buffer module, an output buffer module, a memory, a data storage controller, and several convolutional network arithmetic units according to any one of claims 1 to 2; the bus interface connects the pre-processing unit, the data storage controller, the reconfigurable hardware controller, the input buffer, and the output buffer; the memory connects to the data storage controller; the input buffer connects to the reconfigurable hardware controller and the SRAM control unit; the convolutional network arithmetic units connect to the input buffer module and the output buffer module;
the input of the pre-processing unit is an image or video signal; it completes pre-processing operations such as white balance and noise filtering;
the input buffer module and the output buffer module cache the inputs and outputs of the convolutional network arithmetic units, respectively;
the reconfigurable hardware controller configures the convolutional network arithmetic modules and controls their computation; during or at the end of computation it sends interrupt requests to complete the interaction with the external system;
the SRAM control unit controls the transfer of the convolutional network weight parameters.
4. The reconfigurable convolutional neural network processor according to claim 3, characterized in that: it includes 512 convolutional network arithmetic units and implements image denoising based on a convolutional neural network.
5. The reconfigurable convolutional neural network processor according to claim 3, characterized in that: the reconfigurable convolutional neural network processor implements a 3-layer convolutional neural network for removing raindrops and dust adhering to an image or video; the first layer of the convolutional neural network consists of 512 16 × 16 convolutions, the second layer is a neural-network connection layer, and the third layer consists of 512 8 × 8 convolutions.
6. A method for implementing image denoising with the reconfigurable convolutional neural network processor according to claim 3, characterized in that it includes:
randomly reducing the number of convolutions during image denoising, which reduces hardware-resource consumption and improves processing speed;
or, during image denoising, dividing the 16 × 16 and 8 × 8 convolution arithmetic units into 16 and 4 4 × 4 convolution templates respectively, and applying one-dimensional convolution to each 4 × 4 convolution.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610003960.2A CN105681628B (en) | 2016-01-05 | 2016-01-05 | Convolutional network arithmetic unit and reconfigurable convolutional neural network processor and method for implementing image denoising |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610003960.2A CN105681628B (en) | 2016-01-05 | 2016-01-05 | Convolutional network arithmetic unit and reconfigurable convolutional neural network processor and method for implementing image denoising |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105681628A CN105681628A (en) | 2016-06-15 |
CN105681628B true CN105681628B (en) | 2018-12-07 |
Family
ID=56298840
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610003960.2A Active CN105681628B (en) | 2016-01-05 | 2016-01-05 | Convolutional network arithmetic unit and reconfigurable convolutional neural network processor and method for implementing image denoising |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105681628B (en) |
Families Citing this family (61)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106203617B (en) * | 2016-06-27 | 2018-08-21 | 哈尔滨工业大学深圳研究生院 | A kind of acceleration processing unit and array structure based on convolutional neural networks |
CN106203621B (en) * | 2016-07-11 | 2019-04-30 | 北京深鉴智能科技有限公司 | The processor calculated for convolutional neural networks |
WO2018018470A1 (en) * | 2016-07-27 | 2018-02-01 | 华为技术有限公司 | Method, apparatus and device for eliminating image noise and convolutional neural network |
CN106250103A (en) * | 2016-08-04 | 2016-12-21 | 东南大学 | A kind of convolutional neural networks cyclic convolution calculates the system of data reusing |
US10832123B2 (en) | 2016-08-12 | 2020-11-10 | Xilinx Technology Beijing Limited | Compression of deep neural networks with proper use of mask |
US10936941B2 (en) | 2016-08-12 | 2021-03-02 | Xilinx, Inc. | Efficient data access control device for neural network hardware acceleration system |
US10802992B2 (en) | 2016-08-12 | 2020-10-13 | Xilinx Technology Beijing Limited | Combining CPU and special accelerator for implementing an artificial neural network |
US10810484B2 (en) | 2016-08-12 | 2020-10-20 | Xilinx, Inc. | Hardware accelerator for compressed GRU on FPGA |
US10762426B2 (en) | 2016-08-12 | 2020-09-01 | Beijing Deephi Intelligent Technology Co., Ltd. | Multi-iteration compression for deep neural networks |
US10621486B2 (en) | 2016-08-12 | 2020-04-14 | Beijing Deephi Intelligent Technology Co., Ltd. | Method for optimizing an artificial neural network (ANN) |
US10698657B2 (en) | 2016-08-12 | 2020-06-30 | Xilinx, Inc. | Hardware accelerator for compressed RNN on FPGA |
US10643124B2 (en) | 2016-08-12 | 2020-05-05 | Beijing Deephi Intelligent Technology Co., Ltd. | Method and device for quantizing complex artificial neural network |
CN107229967B (en) * | 2016-08-22 | 2021-06-15 | 赛灵思公司 | Hardware accelerator and method for realizing sparse GRU neural network based on FPGA |
US10984308B2 (en) | 2016-08-12 | 2021-04-20 | Xilinx Technology Beijing Limited | Compression method for deep neural networks with load balance |
CN106331433B (en) * | 2016-08-25 | 2020-04-24 | 上海交通大学 | Video denoising method based on deep recurrent neural network |
KR20180034853A (en) | 2016-09-28 | 2018-04-05 | 에스케이하이닉스 주식회사 | Apparatus and method test operating of convolutional neural network |
IE87469B1 (en) * | 2016-10-06 | 2024-01-03 | Google Llc | Image processing neural networks with separable convolutional layers |
JP2018067154A (en) * | 2016-10-19 | 2018-04-26 | ソニーセミコンダクタソリューションズ株式会社 | Arithmetic processing circuit and recognition system |
CN106529669A (en) | 2016-11-10 | 2017-03-22 | 北京百度网讯科技有限公司 | Method and apparatus for processing data sequences |
US10733505B2 (en) | 2016-11-10 | 2020-08-04 | Google Llc | Performing kernel striding in hardware |
CN108073977A (en) * | 2016-11-14 | 2018-05-25 | 耐能股份有限公司 | Convolution algorithm device and convolution algorithm method |
CN108073550A (en) * | 2016-11-14 | 2018-05-25 | 耐能股份有限公司 | Buffer unit and convolution algorithm apparatus and method |
US10438115B2 (en) * | 2016-12-01 | 2019-10-08 | Via Alliance Semiconductor Co., Ltd. | Neural network unit with memory layout to perform efficient 3-dimensional convolutions |
US10417560B2 (en) * | 2016-12-01 | 2019-09-17 | Via Alliance Semiconductor Co., Ltd. | Neural network unit that performs efficient 3-dimensional convolutions |
CN108241484B (en) * | 2016-12-26 | 2021-10-15 | Shanghai Cambricon Information Technology Co., Ltd. | Neural network computing device and method based on high-bandwidth memory |
US10140574B2 (en) * | 2016-12-31 | 2018-11-27 | Via Alliance Semiconductor Co., Ltd | Neural network unit with segmentable array width rotator and re-shapeable weight memory to match segment width to provide common weights to multiple rotator segments |
CN106909970B (en) * | 2017-01-12 | 2020-04-21 | Nanjing Fengxing Technology Co., Ltd. | Binary-weight convolutional neural network hardware accelerator computing device based on approximate computation |
CN106843809B (en) * | 2017-01-25 | 2019-04-30 | Peking University | A convolution operation method based on a NOR FLASH array |
CN106940815B (en) * | 2017-02-13 | 2020-07-28 | Xi'an Jiaotong University | Programmable convolutional neural network coprocessor IP core |
CN108629406B (en) * | 2017-03-24 | 2020-12-18 | Spreadtrum Communications (Shanghai) Co., Ltd. | Arithmetic device for convolutional neural network |
CN107248144B (en) * | 2017-04-27 | 2019-12-10 | Southeast University | Image denoising method based on a compressed convolutional neural network |
CN108804973B (en) * | 2017-04-27 | 2021-11-09 | Shenzhen Corerain Technologies Co., Ltd. | Hardware architecture for a deep-learning-based object detection algorithm and execution method thereof |
CN108804974B (en) * | 2017-04-27 | 2021-07-02 | Shenzhen Corerain Technologies Co., Ltd. | Method and system for estimating and configuring resources of a hardware architecture for an object detection algorithm |
CN107169563B (en) | 2017-05-08 | 2018-11-30 | Institute of Computing Technology, Chinese Academy of Sciences | Processing system and method for binary-weight convolutional networks |
CN107256424B (en) * | 2017-05-08 | 2020-03-31 | Institute of Computing Technology, Chinese Academy of Sciences | Ternary-weight convolutional network processing system and method |
CN109117945B (en) * | 2017-06-22 | 2021-01-26 | Shanghai Cambricon Information Technology Co., Ltd. | Processor and processing method thereof, chip packaging structure and electronic device |
CN107480782B (en) * | 2017-08-14 | 2020-11-10 | University of Electronic Science and Technology of China | On-chip learning neural network processor |
CN107609641B (en) * | 2017-08-30 | 2020-07-03 | Tsinghua University | Sparse neural network architecture and implementation method thereof |
CN107844826B (en) * | 2017-10-30 | 2020-07-31 | Institute of Computing Technology, Chinese Academy of Sciences | Neural network processing unit and processing system comprising same |
CN107862374B (en) * | 2017-10-30 | 2020-07-31 | Institute of Computing Technology, Chinese Academy of Sciences | Pipeline-based neural network processing system and processing method |
CN108304923B (en) * | 2017-12-06 | 2022-01-18 | Tencent Technology (Shenzhen) Co., Ltd. | Convolution operation processing method and related product |
CN107909148B (en) * | 2017-12-12 | 2020-10-20 | Nanjing Horizon Robotics Technology Co., Ltd. | Apparatus for performing convolution operations in a convolutional neural network |
CN108038815B (en) * | 2017-12-20 | 2019-12-17 | Shenzhen Intellifusion Technologies Co., Ltd. | Integrated circuit with a plurality of transistors |
CN108256628B (en) * | 2018-01-15 | 2020-05-22 | Hefei University of Technology | Convolutional neural network hardware accelerator based on multicast network-on-chip and working method thereof |
CN108154194B (en) * | 2018-01-18 | 2021-04-30 | Beijing University of Technology | Method for extracting high-dimensional features by using tensor-based convolutional network |
CN110147872B (en) * | 2018-05-18 | 2020-07-17 | Cambricon Technologies Corporation Limited | Code storage device and method, processor and training method |
CN108846420B (en) * | 2018-05-28 | 2021-04-30 | Beijing Moshanghua Technology Co., Ltd. | Network structure and client |
CN108764336A (en) * | 2018-05-28 | 2018-11-06 | Beijing Moshanghua Technology Co., Ltd. | Deep learning method and device for image recognition, client, and server |
CN109343826B (en) * | 2018-08-14 | 2021-07-13 | Xi'an Jiaotong University | Reconfigurable processor operation unit for deep learning |
CN110874632A (en) * | 2018-08-31 | 2020-03-10 | Beijing Jianan Jiesi Information Technology Co., Ltd. | Image recognition processing method and device |
CN109409512B (en) * | 2018-09-27 | 2021-02-19 | Xi'an Jiaotong University | Flexibly configurable neural network computing unit, computing array and construction method thereof |
TWI766193B (en) * | 2018-12-06 | 2022-06-01 | Egis Technology Inc. | Convolutional neural network processor and data processing method thereof |
CN109711533B (en) * | 2018-12-20 | 2023-04-28 | Xidian University | Convolutional neural network acceleration system based on FPGA |
CN109784483B (en) * | 2019-01-24 | 2022-09-09 | University of Electronic Science and Technology of China | In-memory computing accelerator for binary convolutional neural networks based on the FD-SOI (fully depleted silicon-on-insulator) process |
CN111626399B (en) * | 2019-02-27 | 2023-07-28 | Institute of Semiconductors, Chinese Academy of Sciences | Convolutional neural network computing device and data computing method |
CN110070178B (en) * | 2019-04-25 | 2021-05-14 | Beijing Jiaotong University | Convolutional neural network computing device and method |
CN111008697B (en) * | 2019-11-06 | 2022-08-09 | Beijing Zhongke Shengxin Technology Co., Ltd. | Convolutional neural network accelerator implementation architecture |
TWI734598B (en) * | 2020-08-26 | 2021-07-21 | Yuan Ze University | Method for removing rain streaks from images |
RU2764395C1 (en) | 2020-11-23 | 2022-01-17 | Самсунг Электроникс Ко., Лтд. | Method and apparatus for joint debayering and image noise elimination using a neural network |
CN113591025A (en) * | 2021-08-03 | 2021-11-02 | Shenzhen SmartMore Information Technology Co., Ltd. | Feature map processing method and device, convolutional neural network accelerator and medium |
CN115841416B (en) * | 2022-11-29 | 2024-03-19 | Baihezi (Shanghai) Microelectronics Technology Co., Ltd. | Reconfigurable intelligent image processor architecture for the autonomous driving domain |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4644488A (en) * | 1983-10-12 | 1987-02-17 | California Institute Of Technology | Pipeline active filter utilizing a booth type multiplier |
US4937774A (en) * | 1988-11-03 | 1990-06-26 | Harris Corporation | Fast image processing accelerator for real time image processing applications |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8442927B2 (en) * | 2009-07-30 | 2013-05-14 | Nec Laboratories America, Inc. | Dynamically configurable, multi-ported co-processor for convolutional neural networks |
- 2016-01-05: CN application CN201610003960.2A filed (granted as CN105681628B; status: Active)
Non-Patent Citations (4)
Title |
---|
A Deep Convolutional Neural Network Based on Nested Residue Number System; Hiroki Nakahara et al.; Field Programmable Logic and Applications (FPL); 2015-09-04; full text *
A Massively Parallel Coprocessor for Convolutional Neural Networks; Murugan Sankaradas et al.; 2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors; 2009-07-09; full text *
A reconfigurable interconnected filter for face recognition based on convolution neural network; Shefa A. Dawwd; Design and Test Workshop (IDT); 2009-11-17; full text *
Design of FPGA Parallel Acceleration Scheme for Convolutional Neural Networks; Fang Rui et al.; Computer Engineering and Applications; 2015-04-15 (No. 8); Chapters 2-4, Figures 1-4 *
Also Published As
Publication number | Publication date |
---|---|
CN105681628A (en) | 2016-06-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105681628B (en) | A kind of convolutional network arithmetic element and restructural convolutional neural networks processor and the method for realizing image denoising processing | |
CN111684473B (en) | Improving performance of neural network arrays | |
US10943167B1 (en) | Restructuring a multi-dimensional array | |
CN106529670B (en) | A neural network processor based on weight compression, design method and chip | |
CN105930902B (en) | A neural network processing method and system | |
CN106951926A (en) | Deep learning system method and device with a mixed architecture | |
CN110533164B (en) | Winograd convolution splitting method for convolution neural network accelerator | |
CN107463990A (en) | An FPGA parallel acceleration method for convolutional neural networks | |
CN108733348B (en) | Fused vector multiplier and method for performing operation using the same | |
CN106203617A (en) | An acceleration processing unit and array structure based on convolutional neural networks | |
CN111626403B (en) | Convolutional neural network accelerator based on CPU-FPGA memory sharing | |
CN113033794B (en) | Lightweight neural network hardware accelerator based on depthwise separable convolution | |
CN110163362A (en) | A computing device and method | |
CN109284824A (en) | A device based on reconfigurable technology for accelerating convolution and pooling operations | |
CN110276447A (en) | A computing device and method | |
Duan et al. | Energy-efficient architecture for FPGA-based deep convolutional neural networks with binary weights | |
Xiao et al. | FPGA-based scalable and highly concurrent convolutional neural network acceleration | |
CN109472734B (en) | Target detection network based on FPGA and implementation method thereof | |
CN102970545A (en) | Static image compression method based on two-dimensional discrete wavelet transform algorithm | |
CN105955896A (en) | Reconfigurable DBF algorithm hardware accelerator and control method | |
CN112988229B (en) | Convolutional neural network resource optimization configuration method based on heterogeneous computation | |
Yin et al. | FPGA-based high-performance CNN accelerator architecture with high DSP utilization and efficient scheduling mode | |
CN111886605B (en) | Processing for multiple input data sets | |
Jiang et al. | Hardware implementation of depthwise separable convolution neural network | |
CN114519425A (en) | Convolution neural network acceleration system with expandable scale |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||