CN105681628B - Convolutional network arithmetic unit and reconfigurable convolutional neural network processor and method for implementing image denoising - Google Patents
Convolutional network arithmetic unit and reconfigurable convolutional neural network processor and method for implementing image denoising Download PDF Info
- Publication number
- CN105681628B CN105681628B CN201610003960.2A CN201610003960A CN105681628B CN 105681628 B CN105681628 B CN 105681628B CN 201610003960 A CN201610003960 A CN 201610003960A CN 105681628 B CN105681628 B CN 105681628B
- Authority
- CN
- China
- Prior art keywords
- input
- reconfigurable
- convolution
- output
- neural networks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/14—Picture signal circuitry for video frequency region
- H04N5/21—Circuitry for suppressing or minimising disturbance, e.g. moiré or halo
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/14—Picture signal circuitry for video frequency region
- H04N5/21—Circuitry for suppressing or minimising disturbance, e.g. moiré or halo
- H04N5/213—Circuitry for suppressing or minimising impulsive noise
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N9/00—Details of colour television systems
- H04N9/64—Circuits for processing colour signals
- H04N9/73—Colour balance circuits, e.g. white balance circuits or colour temperature control
Abstract
The present invention discloses a convolutional network arithmetic unit, a reconfigurable convolutional neural network processor, and a method for implementing image denoising. The disclosed reconfigurable convolutional neural network processor includes a bus interface, a pre-processing unit, a reconfigurable hardware controller, an SRAM, an SRAM control unit, an input buffer module, an output buffer module, a memory, a data storage controller, and convolutional network arithmetic units; it consumes few resources, runs fast, and is applicable to common convolutional neural network architectures. The invention implements convolutional neural networks with fast processing, easy porting, and low resource consumption; it can restore images or video polluted by raindrops or dust, and can also serve as a pre-processing operation that aids subsequent image recognition or classification.
Description
Technical field
The present invention relates to the field of image processing, and in particular to a convolutional network arithmetic unit, a reconfigurable convolutional neural network processor, and a method for implementing image denoising.
Background technique
The removal of raindrops and dust from images is significant for image-processing applications, especially video surveillance and navigation systems. It can be used to restore images or video polluted by raindrops or dust, and can also serve as a pre-processing operation that aids subsequent image recognition or classification.
Current methods for removing image noise mostly rely on Gaussian filtering, median filtering, bilateral filtering, and the like. These methods perform poorly and usually cannot meet the demands of specific image-processing applications. A more effective method is therefore needed to remove image noise, and convolutional neural networks are a good choice.
Current deep-learning networks mostly run on GPUs, but GPUs are expensive and power-hungry, making them unsuitable for large-scale deployment. On CPUs the running speed is slow and large deep-learning networks execute inefficiently, failing to meet performance demands.
It can be seen that current technology for applying convolutional neural networks mainly suffers from large processor area, high cost, high power consumption, and poor performance. A reconfigurable convolutional neural network processor with low power consumption, small area, and good processing quality is therefore needed.
Summary of the invention
The purpose of the present invention is to provide a convolutional network arithmetic unit, a reconfigurable convolutional neural network processor, and a method for implementing image denoising, with low hardware-resource consumption and small area, capable of restoring images or video polluted by raindrops or dust.
To achieve the goals above, the present invention adopts the following technical scheme:
A convolutional network arithmetic unit, including 2 reconfigurable separable convolution modules, a nonlinear activation function unit, and a multiply-accumulator unit;
The output of the first reconfigurable separable convolution module is the input of the nonlinear activation function unit; the output of the nonlinear activation function unit is the input of the multiply-accumulator unit; and the output of the multiply-accumulator unit is the input of the second reconfigurable separable convolution module;
The image signal and the network-configuration parameter signal are input to the first reconfigurable separable convolution module; the first reconfigurable separable convolution module completes one 16 × 16 convolution operation; the nonlinear activation function unit computes the activation function of the convolutional neural network; the multiply-accumulator unit computes the connection layer of the convolutional neural network; the second reconfigurable separable convolution module completes 4 8 × 8 convolution operations simultaneously;
The multiply-accumulator unit includes several multiply-accumulators and several registers; the multiply-accumulators compute the sum of products of the previous convolutional layer's output values and the weight parameters; the registers feed the results of the previous convolutional layer into the multiply-accumulators.
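The multiply-accumulate step of the connection layer — the sum of products of the previous layer's outputs and the weight parameters — can be sketched functionally as follows (a minimal model, not the patented circuit; all names are illustrative):

```python
def mac(prev_outputs, weights, acc=0.0):
    """One multiply-accumulator: accumulate products of the previous
    convolutional layer's outputs and their weight parameters."""
    for x, w in zip(prev_outputs, weights):
        acc += x * w  # a register feeds x in; the MAC adds x*w to the running sum
    return acc

# e.g. three previous-layer outputs weighted into one connection-layer value
assert mac([1.0, 2.0, 3.0], [0.5, 0.5, 1.0]) == 4.5
```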
Further, the reconfigurable separable convolution module includes 16 4 × 4 reconfigurable one-dimensional convolution modules and a first register group; the first register group feeds the image signal and the convolutional network parameters into the reconfigurable one-dimensional convolution modules; the reconfigurable separable convolution module can complete 1 16 × 16 convolution or complete 4 8 × 8 convolution operations simultaneously; each 4 × 4 reconfigurable one-dimensional convolution module includes 4 first selectors, 4 first 2-input multipliers, a first 4-input adder, 4 second 2-input multipliers, and a second 4-input adder; the output of each of the 4 first selectors connects to one input of the corresponding first 2-input multiplier, and the other input of each first 2-input multiplier is a neural-network weight; the outputs of the 4 first 2-input multipliers connect to the inputs of the first 4-input adder; the inputs of the 4 second 2-input multipliers are the output of the first 4-input adder and neural-network weights; the inputs of the second 4-input adder are the outputs of the 4 second 2-input multipliers.
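The two multiplier/adder stages match a separable 2-D convolution: if a 4 × 4 template is the outer product of a vertical 4-tap vector and a horizontal 4-tap vector, the first stage applies the vertical weights and the second stage the horizontal ones. A sketch under that assumption (plain floating point rather than the hardware datapath; names are illustrative):

```python
def separable_4x4(patch, w_col, w_row):
    """One output sample of a separable 4x4 convolution:
    stage 1: 4 multipliers + a 4-input adder per column (vertical 1-D pass),
    stage 2: 4 multipliers + a 4-input adder across columns (horizontal pass)."""
    col_sums = [sum(patch[i][j] * w_col[i] for i in range(4)) for j in range(4)]
    return sum(col_sums[j] * w_row[j] for j in range(4))

patch = [[(i * 4 + j) * 0.1 for j in range(4)] for i in range(4)]
w_col, w_row = [1.0, 2.0, 3.0, 4.0], [0.5, 0.25, 0.25, 1.0]
# equivalent full 4x4 template: the outer product w_col * w_row
full = sum(patch[i][j] * w_col[i] * w_row[j] for i in range(4) for j in range(4))
assert abs(separable_4x4(patch, w_col, w_row) - full) < 1e-9
```

The two 1-D passes use 8 multiplications per output instead of the 16 a full 4 × 4 template needs.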
Further, the nonlinear activation function unit includes a QD generator and an arithmetic-unit group; the input of the QD generator is the output of the reconfigurable separable convolution, and the input of the arithmetic-unit group is the output of the QD generator; the QD generator generates the parameters needed by the activation function; the arithmetic-unit group computes the final value of the activation function;
The QD generator includes a first divider; the input signal is fed to the first divider, which outputs a quotient Q and a remainder D; the arithmetic-unit group includes a shift register, 2 first adders, and a second divider; the output of the shift register feeds the inputs of the 2 first adders; the outputs of the 2 first adders are the inputs of the second divider; the shift register, the first adders, and the second divider are connected in sequence.
A reconfigurable convolutional neural network processor, including a bus interface, a pre-processing unit, a reconfigurable hardware controller, an SRAM, an SRAM control unit, an input buffer module, an output buffer module, a memory, a data storage controller, and several convolutional network arithmetic units as described in any one of claims 1 to 3; the bus interface connects the pre-processing unit, the data storage controller, the reconfigurable hardware controller, the input buffer, and the output buffer; the memory connects to the data storage controller; the input buffer connects to the reconfigurable hardware controller and the SRAM control unit; the convolutional network arithmetic units connect to the input buffer module and the output buffer module;
The input of the pre-processing unit is an image or video signal; it completes pre-processing operations such as white balance and noise filtering;
The input buffer module and the output buffer module cache the inputs and outputs of the convolutional network arithmetic units, respectively;
The reconfigurable hardware controller configures the convolutional network arithmetic modules and controls their computation; during or at the end of computation it sends interrupt requests to complete the interaction with the external system;
The SRAM control unit controls the transfer of the convolutional network weight parameters.
Further, the processor includes 512 convolutional network arithmetic units and implements image denoising based on a convolutional neural network.
Further, the reconfigurable convolutional neural network processor implements a 3-layer convolutional neural network for removing raindrops and dust adhering to an image or video; the first layer of the convolutional neural network consists of 512 16 × 16 convolutions, the second layer is a neural-network connection layer, and the third layer consists of 512 8 × 8 convolutions.
A method for implementing image denoising with the reconfigurable convolutional neural network processor, comprising:
randomly reducing the number of convolutions during image denoising, which reduces hardware-resource consumption and improves processing speed;
or, during image denoising, dividing the 16 × 16 and 8 × 8 convolution arithmetic units into 16 and 4 4 × 4 convolution templates respectively, and applying one-dimensional convolution to each 4 × 4 convolution.
Compared with the existing technology, the invention has the following advantages: by using reconfiguration technology, the convolutional network arithmetic unit can complete one 16 × 16 convolution or complete 4 8 × 8 convolution operations simultaneously, improving hardware performance and flexibility. Using a deep-learning method, the invention removes raindrops and dust from images, and the processing quality meets demand. Without affecting the processing quality, the invention randomly reduces the number of convolution templates and also uses a block-wise one-dimensional convolution method, greatly reducing hardware-resource consumption and greatly improving processing speed. The processor implements a 3-layer convolutional neural network and can provide features for subsequent higher-level image recognition and classification. GPUs are expensive, power-hungry, and large; CPUs run slowly and execute large deep-learning networks inefficiently. By using reconfiguration technology together with the above template-reduction and block-wise one-dimensional convolution methods, the realized reconfigurable convolutional neural network processor has low resource consumption, is easy to implement in hardware, and can restore images or video polluted by raindrops or dust.
Detailed description of the invention
Fig. 1 is a structural diagram of the convolutional network arithmetic unit;
Fig. 2 is a structural diagram of the nonlinear activation function unit;
Fig. 3 is a structural diagram of the first 4 × 4 reconfigurable one-dimensional convolution module;
Fig. 4 is a structural diagram of the reconfigurable separable convolution module;
Fig. 5 is a structural diagram of the reconfigurable convolutional neural network processor;
Specific embodiment
The present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
Referring to Fig. 1, the convolutional network arithmetic unit used in the reconfigurable convolutional neural network processor of the present invention includes 2 reconfigurable separable convolution modules, a nonlinear activation function unit, and a multiply-accumulator unit; the output of the first reconfigurable separable convolution module is the input of the nonlinear activation function unit, the output of the nonlinear activation function unit is the input of the multiply-accumulator unit, and the output of the multiply-accumulator unit is the input of the second reconfigurable separable convolution module;
The image signal and the network-configuration parameter signal are input to the first reconfigurable separable convolution module; the first reconfigurable separable convolution module completes one 16 × 16 convolution operation; the nonlinear activation function unit computes the activation function of the convolutional neural network; the multiply-accumulator unit computes the connection layer of the convolutional neural network; the second reconfigurable separable convolution module completes 4 8 × 8 convolution operations simultaneously;
Referring to Fig. 2, the nonlinear activation function unit includes a QD generator and an arithmetic-unit group; the input of the QD generator is the output of the reconfigurable separable convolution, and the input of the arithmetic-unit group is the output of the QD generator; the QD generator generates the parameters needed by the activation function; the arithmetic-unit group computes the final result of the activation function.
The activation function of the neural network in the present invention is the hyperbolic tangent function

tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)) = (e^(2x) − 1) / (e^(2x) + 1)

By domain extension and Taylor-series expansion — writing x = Q·ln 2 + D, so that e^(2x) = 2^(2Q)·e^(2D) with e^(2D) approximated by its Taylor series — one obtains

tanh(x) = (2^(2Q)·e^(2D) − 1) / (2^(2Q)·e^(2D) + 1), where |D| < ln 2

The QD generator includes a first divider; the input signal is fed to the first divider, which divides by the fixed value 0.69 (≈ ln 2) and outputs the quotient Q and the remainder D; the arithmetic-unit group includes a shift register, 2 first adders, and a second divider; the output of the shift register feeds the inputs of the 2 first adders; the outputs of the 2 first adders are the inputs of the second divider; the shift register, the first adders, and the second divider are connected in sequence;
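A numerical sketch of this decomposition (floating point, with a short Taylor series standing in for the hardware's approximation; the function and constant names are illustrative, not the fixed-point datapath):

```python
import math

LN2 = math.log(2)

def tanh_qd(x):
    """Approximate tanh(x) by splitting x = Q*ln2 + D with |D| < ln2,
    so e^(2x) = 2^(2Q) * e^(2D) and 2^(2Q) reduces to a bit shift."""
    Q = int(x / LN2)                  # quotient from the first divider (÷ 0.69)
    D = x - Q * LN2                   # remainder
    t = 2.0 * D
    e2d = sum(t**n / math.factorial(n) for n in range(6))  # Taylor e^(2D)
    p = e2d * 2 ** (2 * Q)            # shift-register step: multiply by 2^(2Q)
    return (p - 1) / (p + 1)          # two adders form num/den; second divider divides
```

Here `int(x / LN2)` plays the role of the first divider, and the final `(p - 1) / (p + 1)` corresponds to the two first adders feeding the second divider.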
Referring to Fig. 3, each 4 × 4 reconfigurable one-dimensional convolution module includes 4 first selectors (MUX), 4 first 2-input multipliers, a first 4-input adder, 4 second 2-input multipliers, and a second 4-input adder. The two inputs of each first selector are the image signal and the result of the previous stage; the output of each of the 4 first selectors connects to one input of the corresponding first 2-input multiplier, and the other input of each first 2-input multiplier is a neural-network weight; the outputs of the 4 first 2-input multipliers connect to the inputs of the first 4-input adder; the inputs of the 4 second 2-input multipliers are the output of the first 4-input adder and neural-network weights; the inputs of the second 4-input adder are the outputs of the 4 second 2-input multipliers.
Referring to Fig. 4, the reconfigurable separable convolution module includes a first register group, 16 4 × 4 reconfigurable one-dimensional convolution modules, 4 4-input first adders, and 1 4-input second adder. Using reconfiguration technology, the reconfigurable separable convolution module can complete one 16 × 16 convolution or complete 4 8 × 8 convolution operations simultaneously. The image signal and the configuration signal are input to the first register group. The input of 4 × 4 convolution module 1 is image rows 1-4, and the input of 4 × 4 convolution module 5 is image rows 5-8.
When the convolution template is 16 × 16, the input of 4 × 4 convolution module 3 is the output of module 2, the input of module 7 is the output of module 6, the input of module 11 is the output of module 10, and the input of module 15 is the output of module 14. The input of module 9 is image rows 9-12, and the input of module 13 is image rows 13-16. The output of the reconfigurable separable convolution module is the result of the second adder.
When the convolution template is 8 × 8, the inputs of modules 3, 7, 11, and 15 are image rows 1-4. The input of module 9 is image rows 1-4, and the input of module 13 is image rows 5-8. The output of the reconfigurable separable convolution module is the results of the 4 first adders. One reconfigurable separable convolution module can thus complete 4 8 × 8 convolution operations simultaneously.
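The routing above amounts to summing 4 × 4 tile partial sums: in 16 × 16 mode all 16 tiles feed the final adder, while in 8 × 8 mode each group of 4 tiles forms an independent 8 × 8 result. A software model of the per-output arithmetic (illustrative only, not the hardware):

```python
def tile_partial_sums(patch, kernel, t=4):
    """Partial sums of one output sample, one per 4x4 tile of the template."""
    k = len(kernel)
    return [sum(patch[i + a][j + b] * kernel[i + a][j + b]
                for a in range(t) for b in range(t))
            for i in range(0, k, t) for j in range(0, k, t)]

def full_sum(patch, kernel):
    k = len(kernel)
    return sum(patch[i][j] * kernel[i][j] for i in range(k) for j in range(k))

patch16 = [[(i * 16 + j) % 7 * 0.5 for j in range(16)] for i in range(16)]
k16 = [[(i + j) % 5 * 0.2 for j in range(16)] for i in range(16)]
# 16x16 mode: the adder tree sums all 16 tile partial sums into one result
assert abs(sum(tile_partial_sums(patch16, k16)) - full_sum(patch16, k16)) < 1e-9

patch8 = [[(i * 8 + j) % 3 * 0.25 for j in range(8)] for i in range(8)]
k8 = [[(i - j) % 4 * 0.1 for j in range(8)] for i in range(8)]
# 8x8 mode: one group of 4 tiles forms each of the 4 independent convolutions
assert abs(sum(tile_partial_sums(patch8, k8)) - full_sum(patch8, k8)) < 1e-9
```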
Referring to Fig. 5, a reconfigurable convolutional neural network processor of the present invention includes a bus interface, a pre-processing unit, a reconfigurable hardware controller, an SRAM, an SRAM control unit, an input buffer, an output buffer, a memory, a data storage controller, and several convolutional network arithmetic units; the bus interface connects the pre-processing unit, the data storage controller, the reconfigurable hardware controller, the input buffer, and the output buffer; the memory connects to the data storage controller; the input buffer connects to the reconfigurable hardware controller and the SRAM control unit; the convolutional network arithmetic units connect to the input buffer module and the output buffer module.
The input of the pre-processing unit is an image or video signal; it completes pre-processing operations such as white balance and noise filtering. The input buffer module and the output buffer cache the inputs and outputs of the convolutional network arithmetic units, respectively. The reconfigurable hardware controller configures the convolutional network arithmetic units and controls their computation; during or at the end of computation it sends interrupt requests to complete the interaction with the external system. The SRAM control unit controls the transfer of the convolutional network weight parameters.
One embodiment, implementing the convolutional neural network for removing raindrops and dust from images, includes 512 convolutional network arithmetic units. To reduce resources and improve processing speed, the present invention uses the following two methods in the specific implementation: (1) randomly reducing the number of convolutions: under the premise of not affecting the processing quality, the number of convolutional network arithmetic units is reduced, which reduces hardware-resource consumption and improves processing speed; (2) block-wise one-dimensional convolution: the 16 × 16 and 8 × 8 convolution templates are divided into 16 and 4 4 × 4 convolution templates respectively, and one-dimensional convolution is applied to each 4 × 4 convolution.
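Assuming each 4 × 4 template is applied as two 1-D passes (4 + 4 multiplications per output sample), the block-wise method halves the multiplication count relative to the full 2-D template; a quick tally (helper name is illustrative):

```python
def mults_per_output(k, tile=4):
    """Multiplications per output sample: full 2-D template vs. block-wise 1-D."""
    direct = k * k                        # full k x k template
    tiles = (k // tile) ** 2              # 4x4 blocks covering the template
    separable = tiles * (tile + tile)     # two 1-D passes per 4x4 block
    return direct, separable

assert mults_per_output(16) == (256, 128)   # 16 blocks of 4x4
assert mults_per_output(8) == (64, 32)      # 4 blocks of 4x4
```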
Referring to Fig. 5, the reconfigurable 16 × 16 convolution arithmetic unit includes 16 4 × 4 reconfigurable one-dimensional convolution modules (1, 2, 3, ..., 16), a line-buffer module, and registers; the input of the line-buffer module is the image or video signal, the input of the register group is the output of the line-buffer module, and the input of the 4 × 4 reconfigurable one-dimensional convolution modules is the output of the register group; the line-buffer module stores the image; the registers store the image data serially input from the line buffer and feed it into the 4 × 4 reconfigurable one-dimensional convolution modules.
The reconfigurable 8 × 8 convolution arithmetic unit includes 4 4 × 4 reconfigurable one-dimensional convolution modules (1, 2, 3, 4), a line-buffer module, and registers; the input of the line-buffer module is the output of the multiply-accumulator, the input of the register group is the output of the line-buffer module, and the input of the 4 × 4 reconfigurable one-dimensional convolution modules is the output of the register group.
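The line buffer plus register group described above implements the standard sliding-window feed for a convolution unit; a behavioral sketch (illustrative model, not the RTL):

```python
from collections import deque

def sliding_windows(rows, k=4):
    """Hypothetical model of the line buffer + register group: keep the
    k most recent image rows, then slide a k-wide window along them."""
    buf = deque(maxlen=k)                   # line-buffer module: last k rows
    for row in rows:                        # rows arrive serially
        buf.append(row)
        if len(buf) < k:
            continue                        # pipeline still filling
        for c in range(len(row) - k + 1):   # register group shifts one pixel at a time
            yield [r[c:c + k] for r in buf] # k x k window fed to the conv module

rows = [[r * 10 + c for c in range(6)] for r in range(5)]
wins = list(sliding_windows(rows, k=4))     # 2 row positions x 3 column positions
```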
Claims (6)
1. A convolutional network arithmetic unit, characterized in that: it includes 2 reconfigurable separable convolution modules, a nonlinear activation function unit, and a multiply-accumulator unit;
the output of the first reconfigurable separable convolution module is the input of the nonlinear activation function unit, the output of the nonlinear activation function unit is the input of the multiply-accumulator unit, and the output of the multiply-accumulator unit is the input of the second reconfigurable separable convolution module;
the image signal and the network-configuration parameter signal are input to the first reconfigurable separable convolution module; the first reconfigurable separable convolution module completes one 16 × 16 convolution operation; the nonlinear activation function unit computes the activation function of the convolutional neural network; the multiply-accumulator unit computes the connection layer of the convolutional neural network; the second reconfigurable separable convolution module completes 4 8 × 8 convolution operations simultaneously;
the multiply-accumulator unit includes several multiply-accumulators and several registers; the multiply-accumulators compute the sum of products of the previous convolutional layer's output values and the weight parameters; the registers feed the results of the previous convolutional layer into the multiply-accumulators;
the reconfigurable separable convolution module includes 16 4 × 4 reconfigurable one-dimensional convolution modules and a first register group; the first register group feeds the image signal or the previous-stage output, together with the convolutional network parameters, into the reconfigurable one-dimensional convolution modules; the reconfigurable separable convolution module completes 1 16 × 16 convolution or completes 4 8 × 8 convolution operations simultaneously;
each 4 × 4 reconfigurable one-dimensional convolution module includes 4 first selectors, 4 first 2-input multipliers, a first 4-input adder, 4 second 2-input multipliers, and a second 4-input adder; the output of each of the 4 first selectors connects to one input of the corresponding first 2-input multiplier, and the other input of each first 2-input multiplier is a neural-network weight; the outputs of the 4 first 2-input multipliers connect to the inputs of the first 4-input adder; the inputs of the 4 second 2-input multipliers are the output of the first 4-input adder and neural-network weights; the inputs of the second 4-input adder are the outputs of the 4 second 2-input multipliers.
2. The convolutional network arithmetic unit according to claim 1, characterized in that: the nonlinear activation function unit includes a QD generator and an arithmetic-unit group; the input of the QD generator is the output of the reconfigurable separable convolution, and the input of the arithmetic-unit group is the output of the QD generator; the QD generator generates the parameters needed by the activation function; the arithmetic-unit group computes the final value of the activation function;
the QD generator includes a first divider; the input signal is fed to the first divider, which outputs a quotient Q and a remainder D; the arithmetic-unit group includes a shift register, 2 first adders, and a second divider; the output of the shift register feeds the inputs of the 2 first adders; the outputs of the 2 first adders are the inputs of the second divider; the shift register, the first adders, and the second divider are connected in sequence.
3. A reconfigurable convolutional neural network processor, characterized in that: it includes a bus interface, a pre-processing unit, a reconfigurable hardware controller, an SRAM, an SRAM control unit, an input buffer module, an output buffer module, a memory, a data storage controller, and several convolutional network arithmetic units according to any one of claims 1 to 2; the bus interface connects the pre-processing unit, the data storage controller, the reconfigurable hardware controller, the input buffer, and the output buffer; the memory connects to the data storage controller; the input buffer connects to the reconfigurable hardware controller and the SRAM control unit; the convolutional network arithmetic units connect to the input buffer module and the output buffer module;
the input of the pre-processing unit is an image or video signal; it completes pre-processing operations such as white balance and noise filtering;
the input buffer module and the output buffer module cache the inputs and outputs of the convolutional network arithmetic units, respectively;
the reconfigurable hardware controller configures the convolutional network arithmetic modules and controls their computation; during or at the end of computation it sends interrupt requests to complete the interaction with the external system;
the SRAM control unit controls the transfer of the convolutional network weight parameters.
4. The reconfigurable convolutional neural network processor according to claim 3, characterized in that: it includes 512 convolutional network arithmetic units and implements image denoising based on a convolutional neural network.
5. The reconfigurable convolutional neural network processor according to claim 3, characterized in that: the reconfigurable convolutional neural network processor implements a 3-layer convolutional neural network for removing raindrops and dust adhering to an image or video; the first layer of the convolutional neural network consists of 512 16 × 16 convolutions, the second layer is a neural-network connection layer, and the third layer consists of 512 8 × 8 convolutions.
6. A method for implementing image denoising with the reconfigurable convolutional neural network processor according to claim 3, characterized in that it includes:
randomly reducing the number of convolutions during image denoising, which reduces hardware-resource consumption and improves processing speed;
or, during image denoising, dividing the 16 × 16 and 8 × 8 convolution arithmetic units into 16 and 4 4 × 4 convolution templates respectively, and applying one-dimensional convolution to each 4 × 4 convolution.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610003960.2A CN105681628B (en) | 2016-01-05 | 2016-01-05 | Convolutional network arithmetic unit and reconfigurable convolutional neural network processor and method for implementing image denoising |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610003960.2A CN105681628B (en) | 2016-01-05 | 2016-01-05 | Convolutional network arithmetic unit and reconfigurable convolutional neural network processor and method for implementing image denoising |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105681628A CN105681628A (en) | 2016-06-15 |
CN105681628B true CN105681628B (en) | 2018-12-07 |
Family
ID=56298840
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610003960.2A Active CN105681628B (en) | 2016-01-05 | 2016-01-05 | Convolutional network arithmetic unit and reconfigurable convolutional neural network processor and method for implementing image denoising |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105681628B (en) |
Families Citing this family (61)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106203617B (en) * | 2016-06-27 | 2018-08-21 | 哈尔滨工业大学深圳研究生院 | A kind of acceleration processing unit and array structure based on convolutional neural networks |
CN106203621B (en) * | 2016-07-11 | 2019-04-30 | 北京深鉴智能科技有限公司 | The processor calculated for convolutional neural networks |
WO2018018470A1 (en) * | 2016-07-27 | 2018-02-01 | 华为技术有限公司 | Method, apparatus and device for eliminating image noise and convolutional neural network |
CN106250103A (en) * | 2016-08-04 | 2016-12-21 | 东南大学 | A kind of convolutional neural networks cyclic convolution calculates the system of data reusing |
US10832123B2 (en) | 2016-08-12 | 2020-11-10 | Xilinx Technology Beijing Limited | Compression of deep neural networks with proper use of mask |
US10936941B2 (en) | 2016-08-12 | 2021-03-02 | Xilinx, Inc. | Efficient data access control device for neural network hardware acceleration system |
US10802992B2 (en) | 2016-08-12 | 2020-10-13 | Xilinx Technology Beijing Limited | Combining CPU and special accelerator for implementing an artificial neural network |
US10810484B2 (en) | 2016-08-12 | 2020-10-20 | Xilinx, Inc. | Hardware accelerator for compressed GRU on FPGA |
US10762426B2 (en) | 2016-08-12 | 2020-09-01 | Beijing Deephi Intelligent Technology Co., Ltd. | Multi-iteration compression for deep neural networks |
US10621486B2 (en) | 2016-08-12 | 2020-04-14 | Beijing Deephi Intelligent Technology Co., Ltd. | Method for optimizing an artificial neural network (ANN) |
US10698657B2 (en) | 2016-08-12 | 2020-06-30 | Xilinx, Inc. | Hardware accelerator for compressed RNN on FPGA |
US10643124B2 (en) | 2016-08-12 | 2020-05-05 | Beijing Deephi Intelligent Technology Co., Ltd. | Method and device for quantizing complex artificial neural network |
CN107229967B (en) * | 2016-08-22 | 2021-06-15 | 赛灵思公司 | Hardware accelerator and method for realizing sparse GRU neural network based on FPGA |
US10984308B2 (en) | 2016-08-12 | 2021-04-20 | Xilinx Technology Beijing Limited | Compression method for deep neural networks with load balance |
CN106331433B (en) * | 2016-08-25 | 2020-04-24 | 上海交通大学 | Video denoising method based on deep recurrent neural network |
KR20180034853A (en) | 2016-09-28 | 2018-04-05 | 에스케이하이닉스 주식회사 | Apparatus and method test operating of convolutional neural network |
IE87469B1 (en) * | 2016-10-06 | 2024-01-03 | Google Llc | Image processing neural networks with separable convolutional layers |
JP2018067154A (en) * | 2016-10-19 | 2018-04-26 | ソニーセミコンダクタソリューションズ株式会社 | Arithmetic processing circuit and recognition system |
CN106529669A (en) | 2016-11-10 | 2017-03-22 | 北京百度网讯科技有限公司 | Method and apparatus for processing data sequences |
US10733505B2 (en) | 2016-11-10 | 2020-08-04 | Google Llc | Performing kernel striding in hardware |
CN108073977A (en) * | 2016-11-14 | 2018-05-25 | 耐能股份有限公司 | Convolution algorithm device and convolution algorithm method |
CN108073550A (en) * | 2016-11-14 | 2018-05-25 | 耐能股份有限公司 | Buffer unit and convolution algorithm apparatus and method |
US10438115B2 (en) * | 2016-12-01 | 2019-10-08 | Via Alliance Semiconductor Co., Ltd. | Neural network unit with memory layout to perform efficient 3-dimensional convolutions |
US10417560B2 (en) * | 2016-12-01 | 2019-09-17 | Via Alliance Semiconductor Co., Ltd. | Neural network unit that performs efficient 3-dimensional convolutions |
CN108241484B (en) * | 2016-12-26 | 2021-10-15 | Shanghai Cambricon Information Technology Co., Ltd. | Neural network computing device and method based on high-bandwidth memory |
US10140574B2 (en) * | 2016-12-31 | 2018-11-27 | Via Alliance Semiconductor Co., Ltd | Neural network unit with segmentable array width rotator and re-shapeable weight memory to match segment width to provide common weights to multiple rotator segments |
CN106909970B (en) * | 2017-01-12 | 2020-04-21 | Nanjing Fengxing Technology Co., Ltd. | Binary-weight convolutional neural network hardware accelerator computing device based on approximate computation |
CN106843809B (en) * | 2017-01-25 | 2019-04-30 | Peking University | A convolution operation method based on a NOR FLASH array |
CN106940815B (en) * | 2017-02-13 | 2020-07-28 | Xi'an Jiaotong University | Programmable convolutional neural network coprocessor IP core |
CN108629406B (en) * | 2017-03-24 | 2020-12-18 | Spreadtrum Communications (Shanghai) Co., Ltd. | Arithmetic device for convolutional neural network |
CN107248144B (en) * | 2017-04-27 | 2019-12-10 | Southeast University | Image denoising method based on a compressed convolutional neural network |
CN108804973B (en) * | 2017-04-27 | 2021-11-09 | Shenzhen Corerain Technologies Co., Ltd. | Hardware architecture for a deep-learning-based object detection algorithm and execution method thereof |
CN108804974B (en) * | 2017-04-27 | 2021-07-02 | Shenzhen Corerain Technologies Co., Ltd. | Method and system for estimating and configuring resources of a hardware architecture for an object detection algorithm |
CN107169563B (en) | 2017-05-08 | 2018-11-30 | Institute of Computing Technology, Chinese Academy of Sciences | Processing system and method for binary-weight convolutional networks |
CN107256424B (en) * | 2017-05-08 | 2020-03-31 | Institute of Computing Technology, Chinese Academy of Sciences | Ternary-weight convolutional network processing system and method |
CN109117945B (en) * | 2017-06-22 | 2021-01-26 | Shanghai Cambricon Information Technology Co., Ltd. | Processor and processing method thereof, chip packaging structure and electronic device |
CN107480782B (en) * | 2017-08-14 | 2020-11-10 | University of Electronic Science and Technology of China | On-chip learning neural network processor |
CN107609641B (en) * | 2017-08-30 | 2020-07-03 | Tsinghua University | Sparse neural network architecture and implementation method thereof |
CN107844826B (en) * | 2017-10-30 | 2020-07-31 | Institute of Computing Technology, Chinese Academy of Sciences | Neural network processing unit and processing system comprising same |
CN107862374B (en) * | 2017-10-30 | 2020-07-31 | Institute of Computing Technology, Chinese Academy of Sciences | Pipeline-based neural network processing system and processing method |
CN108304923B (en) * | 2017-12-06 | 2022-01-18 | Tencent Technology (Shenzhen) Co., Ltd. | Convolution operation processing method and related product |
CN107909148B (en) * | 2017-12-12 | 2020-10-20 | Nanjing Horizon Robotics Technology Co., Ltd. | Apparatus for performing convolution operations in a convolutional neural network |
CN108038815B (en) * | 2017-12-20 | 2019-12-17 | Shenzhen Intellifusion Technologies Co., Ltd. | Integrated circuit with a plurality of transistors |
CN108256628B (en) * | 2018-01-15 | 2020-05-22 | Hefei University of Technology | Convolutional neural network hardware accelerator based on multicast network-on-chip and working method thereof |
CN108154194B (en) * | 2018-01-18 | 2021-04-30 | Beijing University of Technology | Method for extracting high-dimensional features by using tensor-based convolutional network |
CN110147872B (en) * | 2018-05-18 | 2020-07-17 | Cambricon Technologies Corporation Limited | Code storage device and method, processor and training method |
CN108846420B (en) * | 2018-05-28 | 2021-04-30 | Beijing Moshanghua Technology Co., Ltd. | Network structure and client |
CN108764336A (en) * | 2018-05-28 | 2018-11-06 | Beijing Moshanghua Technology Co., Ltd. | Deep learning method and device for image recognition, client, and server |
CN109343826B (en) * | 2018-08-14 | 2021-07-13 | Xi'an Jiaotong University | Reconfigurable processor operation unit for deep learning |
CN110874632A (en) * | 2018-08-31 | 2020-03-10 | Beijing Jianan Jiesi Information Technology Co., Ltd. | Image recognition processing method and device |
CN109409512B (en) * | 2018-09-27 | 2021-02-19 | Xi'an Jiaotong University | Flexibly configurable neural network computing unit, computing array and construction method thereof |
TWI766193B (en) * | 2018-12-06 | 2022-06-01 | Egis Technology Inc. | Convolutional neural network processor and data processing method thereof |
CN109711533B (en) * | 2018-12-20 | 2023-04-28 | Xidian University | Convolutional neural network acceleration system based on FPGA |
CN109784483B (en) * | 2019-01-24 | 2022-09-09 | University of Electronic Science and Technology of China | In-memory computing accelerator for binary convolutional neural networks based on the FD-SOI (fully depleted silicon-on-insulator) process |
CN111626399B (en) * | 2019-02-27 | 2023-07-28 | Institute of Semiconductors, Chinese Academy of Sciences | Convolutional neural network computing device and data computing method |
CN110070178B (en) * | 2019-04-25 | 2021-05-14 | Beijing Jiaotong University | Convolutional neural network computing device and method |
CN111008697B (en) * | 2019-11-06 | 2022-08-09 | Beijing Zhongke Shengxin Technology Co., Ltd. | Convolutional neural network accelerator implementation architecture |
TWI734598B (en) * | 2020-08-26 | 2021-07-21 | Yuan Ze University | Method for removing rain streaks from images |
RU2764395C1 (en) | 2020-11-23 | 2022-01-17 | Самсунг Электроникс Ко., Лтд. | Method and apparatus for joint debayering and image noise elimination using a neural network |
CN113591025A (en) * | 2021-08-03 | 2021-11-02 | Shenzhen SmartMore Information Technology Co., Ltd. | Feature map processing method and device, convolutional neural network accelerator and medium |
CN115841416B (en) * | 2022-11-29 | 2024-03-19 | Baihezi (Shanghai) Microelectronics Technology Co., Ltd. | Reconfigurable intelligent image processor architecture for the autonomous driving domain |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4644488A (en) * | 1983-10-12 | 1987-02-17 | California Institute Of Technology | Pipeline active filter utilizing a booth type multiplier |
US4937774A (en) * | 1988-11-03 | 1990-06-26 | Harris Corporation | Fast image processing accelerator for real time image processing applications |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8442927B2 (en) * | 2009-07-30 | 2013-05-14 | Nec Laboratories America, Inc. | Dynamically configurable, multi-ported co-processor for convolutional neural networks |
- 2016-01-05: CN application CN201610003960.2A filed (granted as CN105681628B; status: Active)
Non-Patent Citations (4)
Title |
---|
A Deep Convolutional Neural Network Based on Nested Residue Number System; Hiroki Nakahara et al.; Field Programmable Logic and Applications (FPL); 2015-09-04; full text *
A Massively Parallel Coprocessor for Convolutional Neural Networks; Murugan Sankaradas et al.; 2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors; 2009-07-09; full text *
A reconfigurable interconnected filter for face recognition based on convolution neural network; Shefa A. Dawwd; Design and Test Workshop (IDT); 2009-11-17; full text *
Design of FPGA Parallel Acceleration Scheme for Convolutional Neural Networks; Fang Rui et al.; Computer Engineering and Applications; 2015-04-15 (No. 8); Chapters 2-4, Figures 1-4 *
Also Published As
Publication number | Publication date |
---|---|
CN105681628A (en) | 2016-06-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105681628B (en) | A kind of convolutional network arithmetic element and restructural convolutional neural networks processor and the method for realizing image denoising processing | |
CN111684473B (en) | Improving performance of neural network arrays | |
US10943167B1 (en) | Restructuring a multi-dimensional array | |
CN106529670B (en) | A neural network processor based on weight compression, design method and chip | |
CN105930902B (en) | A neural network processing method and system | |
CN106951926A (en) | Deep learning system method and device with a mixed architecture | |
CN110533164B (en) | Winograd convolution splitting method for convolution neural network accelerator | |
CN107463990A (en) | An FPGA parallel acceleration method for convolutional neural networks | |
CN108733348B (en) | Fused vector multiplier and method for performing operation using the same | |
CN106203617A (en) | An acceleration processing unit and array structure based on convolutional neural networks | |
CN111626403B (en) | Convolutional neural network accelerator based on CPU-FPGA memory sharing | |
CN113033794B (en) | Lightweight neural network hardware accelerator based on depthwise separable convolution | |
CN110163362A (en) | A computing device and method | |
CN109284824A (en) | A device based on reconfigurable technology for accelerating convolution and pooling operations | |
CN110276447A (en) | A computing device and method | |
Duan et al. | Energy-efficient architecture for FPGA-based deep convolutional neural networks with binary weights | |
Xiao et al. | FPGA-based scalable and highly concurrent convolutional neural network acceleration | |
CN109472734B (en) | Target detection network based on FPGA and implementation method thereof | |
CN102970545A (en) | Static image compression method based on two-dimensional discrete wavelet transform algorithm | |
CN105955896A (en) | Reconfigurable DBF algorithm hardware accelerator and control method | |
CN112988229B (en) | Convolutional neural network resource optimization configuration method based on heterogeneous computation | |
Yin et al. | FPGA-based high-performance CNN accelerator architecture with high DSP utilization and efficient scheduling mode | |
CN111886605B (en) | Processing for multiple input data sets | |
Jiang et al. | Hardware implementation of depthwise separable convolution neural network | |
CN114519425A (en) | Convolution neural network acceleration system with expandable scale |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||