CN114429203B - Convolution calculation method, convolution calculation device and application thereof - Google Patents


Info

Publication number
CN114429203B
Authority
CN
China
Prior art keywords
convolution
line
data
kernel
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210335610.1A
Other languages
Chinese (zh)
Other versions
CN114429203A (en)
Inventor
陆金刚
方伟
Current Assignee
Zhejiang Xinsheng Electronic Technology Co Ltd
Original Assignee
Zhejiang Xinsheng Electronic Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Xinsheng Electronic Technology Co Ltd
Priority to CN202210335610.1A
Publication of CN114429203A
Application granted
Publication of CN114429203B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10: Complex mathematical operations
    • G06F 17/15: Correlation function computation including computation of convolution operations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks


Abstract

The invention provides a convolution calculation method, a convolution calculation device and applications thereof. The convolution calculation method comprises three stages: a starting working stage, a stable working stage and a finishing working stage. In the starting working stage, the input feature map cache unit caches the data of each input feature map line by line, and once the number of cached data lines in each input feature map reaches kernel_size lines, the line convolution operation unit starts the convolution operation. In the stable working stage, each time the input feature map cache unit finishes caching a new line of data of each input feature map, the line convolution operation unit performs the convolution operation and outputs one new line of convolution results for each output feature map. In the finishing working stage, the line convolution operation unit is self-driven to complete the calculation of the remaining lines and outputs their convolution results. The convolution calculation method improves the real-time performance of image processing, saves storage resources, and avoids block boundary effects.

Description

Convolution calculation method, convolution calculation device and application thereof
Technical Field
The invention relates to the technical fields of convolutional neural networks and chip design, and in particular to a convolution calculation method, a convolution calculation device and applications thereof.
Background
Image noise reduction has always been a very important function in the field of Image Signal Processing (ISP): it makes images visually clearer and more pleasant and improves subjective picture quality, which in turn enables better image analysis and understanding. Image noise reduction technology is therefore widely applied in fields such as biology, medicine, the military and automatic driving. Mainstream image noise reduction methods currently fall into two categories. The first is the traditional image noise reduction method based on priors of a specific form; such traditional methods not only use complex models but also contain many parameters that require manual tuning, making the noise reduction process computationally complicated. In particular, when facing severe weather, complex lighting, intense motion and other complex scenes, traditional ISP-based noise reduction is approaching its bottleneck: the effect becomes harder and harder to tune, and the cost-effectiveness of the algorithm keeps decreasing. The second category is image noise reduction based on deep learning; an AI noise reduction algorithm based on a CNN can solve the problems encountered by traditional image noise reduction methods well.
When facing various complex application scenes, the AI noise reduction algorithm shows strong adaptability and stability, efficiently removing image noise while effectively retaining the real details in the image. Moreover, the AI noise reduction algorithm requires almost no parameter tuning, which lays a foundation for large-scale industrial application. In addition, the AI noise reduction algorithm can be updated iteratively; by contrast, once a traditional image noise reduction algorithm has been solidified in a chip it can no longer be updated, so the AI noise reduction algorithm offers better flexibility.
DnCNN (Denoising Convolutional Neural Network) is a well-known CNN-based AI noise reduction algorithm. It originates from the paper "Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising", which for the first time used an end-to-end residual-learning neural network model for denoising and obtained a better noise reduction effect than previous noise reduction algorithms. DnCNN can also be applied to tasks such as single-image super-resolution and JPEG deblocking.
As shown in fig. 1, fig. 1 is the network structure diagram of a common DnCNN. The input of the network is an original image containing noise, and the output is a residual image (i.e. a noise image); subtracting the residual image from the original image yields the denoised image. The whole network structure can be divided into three parts: the network layer connected to the original image is "Conv + ReLU" (Conv: Convolution; ReLU: Rectified Linear Unit), which generates N output feature maps from the input (one input feature map for a grayscale image, three for a color image); the network layer connected to the residual image is "Conv", which reconstructs the output (one output feature map for a grayscale image, three for a color image); the other M intermediate network layers take the form "Conv + BN + ReLU" (BN: Batch Normalization), where batch normalization is inserted between the convolution and the ReLU, and the number of input and output feature maps of each intermediate layer is N.
At present, CNN-based AI noise reduction is generally performed after ISP image processing: after an entire frame has been processed by the conventional ISP (with or without conventional noise reduction), the AI noise reduction method is applied. This processing order introduces a delay of at least one frame, which is unsuitable for delay-sensitive application scenes such as automatic driving. Meanwhile, as the DnCNN network structure shows, CNN-based AI noise reduction operates in a full-resolution-in, full-resolution-out manner, which causes a huge loss of memory resources in high-resolution application scenes (for a 4K image with 64 output feature maps, the output of the first convolution layer alone occupies about 500MB of memory). Some published patents reduce the memory requirement with image blocking technology: the CNN network operation is performed on each block separately and the results are finally stitched together. However, there is a certain difference between running the CNN on each block separately and running it on the whole image; especially for applications such as AI noise reduction, an obvious block boundary effect may appear, which degrades the overall noise reduction effect. These problems undoubtedly limit the large-scale industrial application of CNN-based AI noise reduction algorithms.
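The roughly 500MB figure above can be sanity-checked with a quick back-of-the-envelope computation (an illustrative estimate; the 8-bit activation width is an assumption, not stated in the patent):

```python
# Rough check of the memory figure quoted above: the output of one
# convolution layer on a 4K frame with 64 output feature maps.
width, height, num_maps = 3840, 2160, 64   # assumed 4K resolution
bytes_per_value = 1                        # assuming 8-bit activations

total_bytes = width * height * num_maps * bytes_per_value
total_mb = total_bytes / (1024 * 1024)
print(f"{total_mb:.2f} MB")  # about 506 MB, consistent with "about 500MB"
```

With 16-bit activations the figure would double to about 1GB, so the quoted number corresponds to the most compact common storage format.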
Therefore, there is a need for a CNN-based AI noise reduction algorithm that can be combined with conventional ISP processing without adding frame-level delay, and that effectively reduces the memory requirement and saves storage resources without weakening the AI noise reduction effect.
Disclosure of Invention
In order to solve the above technical problem, the present invention provides a convolution calculation method, characterized in that: the convolution calculation method comprises a starting working stage, a stable working stage and a finishing working stage. In the starting working stage, the input feature map cache unit caches the data of each input feature map line by line and judges whether the number of cached data lines in each input feature map has reached kernel_size lines, where kernel_size denotes the number of lines of the convolution kernel; when the number of cached data lines in each input feature map reaches kernel_size lines, the line convolution operation unit starts the convolution operation, and when every output feature map has output its first line of convolution results, the starting working stage ends. In the stable working stage, each time the input feature map cache unit finishes caching a new line of data of each input feature map, the line convolution operation unit fetches the input feature map data and the corresponding convolution kernel data required by the convolution operation, performs the convolution operation, and steadily outputs one new line of convolution results for each output feature map; when the input feature map cache unit has cached the last line of data of each input feature map and the convolution calculation for that line is finished, the stable working stage ends. In the finishing working stage, the line convolution operation unit is self-driven to complete the calculation of the remaining lines and outputs their convolution results.
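The three stages above can be sketched as a row-driven pipeline. The following is a minimal illustrative sketch, not the patented implementation: it simplifies the 2-D convolution to a 1-D vertical kernel with stride 1 and no padding, so that only the line-buffering behavior is shown; at most kernel_size lines are ever cached, and one output line is emitted per new input line once the starting stage ends.

```python
from collections import deque

def line_convolve(rows, kernel):
    """Row-driven convolution sketch: cache rows until kernel_size rows
    are available (starting stage), then emit one output row per new
    input row (stable stage). 1-D vertical kernel, stride 1, no padding."""
    k = len(kernel)            # kernel_size = number of kernel rows
    buf = deque(maxlen=k)      # only kernel_size rows are ever cached
    for row in rows:           # rows arrive one at a time, as from an ISP
        buf.append(row)
        if len(buf) == k:      # enough rows cached: compute one output row
            yield [sum(kernel[i] * buf[i][c] for i in range(k))
                   for c in range(len(row))]

# Four 2-pixel rows, kernel of three rows with weight 1:
out = list(line_convolve([[1, 1], [2, 2], [3, 3], [4, 4]], [1, 1, 1]))
# out == [[6, 6], [9, 9]]: rows 1+2+3 and rows 2+3+4
```

The deque's `maxlen` discards the oldest row automatically, which is exactly the cyclic reuse of the line cache that the method relies on.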
The convolution calculation method provided by the invention has the following advantages: (1) the convolution calculation can start as soon as the cached data in each input feature map reaches kernel_size lines, without caching the data of the whole input feature map, which enhances the real-time performance of image processing; (2) the input feature map cache unit only needs to set a cache with a capacity of (kernel_size + 1) × PIC_W for each input feature map instead of caching the whole input feature map, and for temporary data only one line of data capacity needs to be set, since the convolution results are output line by line; the method therefore greatly reduces the memory required for calculation and saves storage resources; (3) the data participating in the operation is exactly the same as that of a conventional convolution operation over the whole frame, so no obvious block boundary effect can occur.
Preferably, when determining whether the number of lines of cached data in each input feature map has reached kernel_size lines, the number of cached lines includes the number of lines of PAD data filled above each input feature map's data.
Preferably, the number of lines of the PAD data is (kernel_size - 1)/2.
Preferably, if the input feature map is original image data, the input feature map cache unit sets a cache space with a capacity of (kernel_size + 1) × PIC_W for each input feature map, where PIC_W is the data amount of each line of the input feature map; if the input feature map is the output feature map of the previous convolution layer, the input feature map cache unit sets a cache space with a capacity of kernel_size × PIC_W for each input feature map. When the cache of each input feature map holds kernel_size lines of data, the line convolution operation unit starts the line-by-line convolution operation, while the remaining line of cache space continues to receive data from the external input feature map.
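The (kernel_size + 1)-line cache can be managed as a simple ring buffer: while the convolution unit reads kernel_size lines, the one spare slot keeps accepting the incoming stream. A small sketch of the slot arithmetic (names here are illustrative, not from the patent):

```python
# Ring-buffer slot arithmetic for a (kernel_size + 1)-line cache.
kernel_size = 3

def row_slot(row_index):
    # physical slot for logical row `row_index`, reused cyclically
    return row_index % (kernel_size + 1)

def window_slots(first_row):
    # the kernel_size slots holding rows first_row .. first_row + kernel_size - 1
    return [row_slot(first_row + i) for i in range(kernel_size)]

# While rows 4..6 are being convolved, row 7 streams into the free slot:
assert window_slots(4) == [0, 1, 2]
assert row_slot(7) == 3   # spare slot, disjoint from the active window
```

This is why one extra line of capacity beyond kernel_size suffices: the write pointer never catches up with the read window.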
The invention also provides a convolution calculation device adopting the above convolution calculation method, characterized by comprising an input feature map cache unit, a convolution kernel coefficient cache unit, a convolution kernel coefficient reading unit, a line convolution operation unit and a line convolution result write-out unit. The input feature map cache unit is used for caching the line data of each input feature map; the convolution kernel coefficient cache unit is used for caching convolution kernel coefficients; the convolution kernel coefficient reading unit is used for reading the convolution kernel coefficients participating in the convolution operation from the convolution kernel coefficient cache unit; and the line convolution operation unit is used for reading the data participating in the convolution calculation from the input feature map cache unit and the convolution kernel coefficient reading unit, performing the convolution calculation, and outputting the convolution result to an external storage device through the line convolution result write-out unit.
The invention further provides a DnCNN network computing method, characterized by comprising a starting working stage, a stable working stage and a finishing working stage. In the starting working stage, the original image data input cache unit caches the data of each input feature map of the input image line by line and judges whether the number of cached data lines in each input feature map has reached kernel_size lines, where kernel_size denotes the number of lines of the convolution kernel; when the number of cached data lines in each input feature map reaches kernel_size lines, the line convolution operation unit starts the first layer of convolution operation; every other neural network layer waits until the number of cached lines in each output feature map of its preceding layer reaches kernel_size lines before starting its own calculation; when the first line of residual image data is output, the starting working stage ends. In the stable working stage, each time the original image data input cache unit caches a new line of data of each input feature map, the line convolution operation unit fetches the input feature map data and the corresponding convolution kernel data required by the convolution operation, runs each neural network layer in succession, and steadily outputs one new line of residual image result data; when the original image data input cache unit has cached the last line of data of each input feature map, the calculation of each neural network layer for that line is finished and the corresponding line of the residual image has been output, the stable working stage ends. In the finishing working stage, the line convolution operation unit is self-driven to complete the calculation of the remaining lines and outputs the remaining lines of the residual image.
The DnCNN network computing method provided by the invention has the following advantages: (1) the convolution calculation can start as soon as the cached data in each input feature map reaches kernel_size lines, without caching the whole input feature map, which enhances the real-time performance of image processing; (2) the original image data input cache unit only needs a cache with a capacity of (kernel_size + 1) × PIC_W per input feature map, and for intermediate results each output feature map only needs a data capacity of kernel_size × PIC_W to cache the intermediate result that serves as the input of the next neural network layer, so the method greatly reduces the memory required for calculation and saves storage resources; (3) the data participating in the calculation is exactly the same as that of a conventional convolution over the whole frame, so no obvious block boundary effect can occur; (4) the functional modules of traditional image-signal-processing algorithms operate line by line, and this line-wise CNN network computing method likewise inputs data and outputs results line by line, with a delay of only Layer_num + kernel_size - 1 - pad_num lines between the output result and the original image; the method is therefore compatible with traditional image signal processing algorithms for parallel computation, with a small delay and no frame-level delay.
Preferably, the line convolution operation unit performs calculation of BN and ReLU in addition to the convolution operation.
Preferably, when judging whether the number of cached data lines in each input feature map has reached kernel_size lines, the number of cached lines includes the number of lines of PAD data filled above each input feature map's data; likewise, when the other neural network layers wait for the number of cached lines in each output feature map of the preceding layer to reach kernel_size lines, the number of cached lines includes the number of lines of PAD data filled above each output feature map's data.
Preferably, the number of lines of the PAD data is (kernel_size - 1)/2.
Preferably, the original image data input cache unit sets a cache space with a capacity of (kernel_size + 1) × PIC_W for each input feature map, where PIC_W is the data amount of each line of the input feature map.
Preferably, the line delay between the first output line of the residual image and the original image input data is start_src_ln_num = Layer_num + kernel_size - 1 - pad_num, where start_src_ln_num denotes the line delay between the first output line of the residual image and the original image input data; Layer_num denotes the number of layers of the whole DnCNN network, including all "Conv + ReLU", "Conv + BN + ReLU" and "Conv" layers; kernel_size denotes the number of lines of the convolution kernel; and pad_num denotes the number of PAD lines filled above the input feature map. Therefore, the DnCNN calculation method of the present invention produces only a small delay.
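A worked instance of the delay formula (the 17-layer depth below is the configuration used in the original DnCNN paper for Gaussian denoising, taken here as an illustrative assumption rather than a value fixed by the patent):

```python
def start_delay(layer_num, kernel_size, pad_num):
    # start_src_ln_num = Layer_num + kernel_size - 1 - pad_num
    return layer_num + kernel_size - 1 - pad_num

# 17 layers, 3x3 kernels, pad_num = (3 - 1) / 2 = 1 line of top padding:
print(start_delay(17, 3, 1))  # 18 lines of delay before the first output line
```

Eighteen lines of a 4K frame is a tiny fraction of its 2160 lines, which is the sense in which the method incurs line-level rather than frame-level delay.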
Correspondingly, the invention also provides a DnCNN network computing device adopting the above DnCNN network computing method, characterized by comprising an original image data input cache unit, a convolution kernel coefficient cache unit, a convolution kernel coefficient reading unit, a line convolution intermediate result reading unit, a line convolution operation unit, a line convolution intermediate result cache unit and a line convolution result write-out unit. The original image data input cache unit is used for caching the line data of each input feature map of the original image; the convolution kernel coefficient cache unit is used for caching convolution kernel coefficients; the convolution kernel coefficient reading unit is used for reading the convolution kernel coefficients participating in the convolution operation from the convolution kernel coefficient cache unit; the line convolution intermediate result cache unit is used for caching the intermediate results of the DnCNN network calculation; the line convolution operation unit is used for reading the data participating in the convolution operation from the original image data input cache unit or the line convolution intermediate result reading unit, together with the convolution kernel coefficient reading unit, performing the convolution operation and outputting the result to the line convolution result write-out unit; and the line convolution result write-out unit outputs intermediate results to the line convolution intermediate result cache unit and transmits the final residual image data to an external memory.
The line-wise DnCNN network computing device provided by the invention has the following advantages: (1) the original image data input cache unit only needs a cache with a capacity of (kernel_size + 1) × PIC_W per input feature map rather than the whole input feature map, and the line convolution intermediate result cache unit only needs to configure a data capacity of kernel_size × PIC_W per output feature map to cache the intermediate result that serves as the input of the next neural network layer; the device therefore greatly reduces the storage space, enabling miniaturization and integration, and is more favorable for industrial application; (2) the data participating in the operation is exactly the same as that of a conventional convolution over the whole frame, so no obvious block boundary effect can occur; (3) the functional modules of traditional image-signal-processing devices operate line by line, and this line-wise CNN network computing device likewise inputs data and outputs results line by line, with a delay of only Layer_num + kernel_size - 1 - pad_num lines between the output result and the original input image; the device is therefore compatible with traditional image signal processing devices for parallel computation, with a small delay and no frame-level delay.
Preferably, the line convolution intermediate result cache unit sets a cache space with a capacity of kernel_size × PIC_W for each intermediate output feature map.
Preferably, the line convolution operation unit performs calculation of BN and ReLU in addition to the convolution operation.
Preferably, the DnCNN network computing device is placed at one of the following locations: the start of the image signal processing flow, an intermediate position of the image signal processing flow, or the end of the image signal processing flow. Like a conventional image signal processing device, the line-wise CNN network computing device of the present invention inputs data and outputs calculation results line by line; the device can therefore be placed at any position in the image signal processing pipeline and compute in parallel, compatibly, with conventional image signal processing algorithms, with a small delay and no frame-level delay.
Drawings
FIG. 1 is a diagram of a network architecture of a common DnCNN.
FIG. 2 is a diagram illustrating convolutional layer operation.
FIG. 3 is a schematic diagram of the convolution calculation method of the line-by-line processing in the present invention.
Fig. 4 is a diagram showing a convolution calculation apparatus according to the convolution calculation method for line-by-line processing of the present invention.
FIG. 5 is an embodiment of the present invention of line-wise processed DnCNN network operations.
FIG. 6 is a schematic diagram of row data flow in each neural network layer in a DnCNN network.
Fig. 7 is a CNN network computing device for line-by-line processing according to the present invention.
Detailed Description
FIG. 2 is a diagram illustrating convolutional layer operation. Digital images may use color encodings such as RGB, YUV and YCbCr. Taking RGB color encoding as an example, each pixel of a digital image is composed of a red sub-pixel, a green sub-pixel and a blue sub-pixel; that is, if the resolution of a digital image is W × H pixels, the image can be represented by N two-dimensional matrices. In other words, a digital image may be composed of feature maps T_in_1, T_in_2, …, T_in_N, with a total data volume of W × H × N, where W is the feature map width, H is the feature map height, and N is the dimension (or number of channels).
Assume a digital image needs to undergo convolution with M sets of convolution kernels, i.e. M output feature maps are generated by the convolution calculation. Each set of convolution kernels can likewise be represented by N two-dimensional matrices, so the data amount of one set of convolution kernel coefficients (the coefficients needed to compute one output feature map) is w × h × N. When computing M output feature maps, the total data amount of the convolution kernel coefficients is w × h × N × M, where w is the convolution kernel width, h is the convolution kernel height, N is the dimension (or number of channels), and M is the number of sets of convolution kernels.
In a single convolution step, the elements of the convolution kernel and the currently corresponding elements of the feature map undergo a multiply-accumulate (MAC) operation; the kernel then moves by one step (stride) and performs the multiply-accumulate with the corresponding elements at the next position, and so on until the kernel reaches the last element of the feature map. Note that when the convolution kernel moves beyond the feature map, some kernel elements have no corresponding feature-map element, so the range of the feature map must be extended. This is generally done by padding: filling values (for example 0) on the boundary of the feature map matrix to enlarge it and keep the convolution operation valid. In general, the size w × h of the convolution kernel is odd × odd with w equal to h; assuming this size is k, i.e. the convolution kernel is k × k, the padding width is (k - 1)/2, meaning (k - 1)/2 rows of padding values are added above and below the feature map matrix and (k - 1)/2 columns are added on its left and right. With this padding (and stride 1), the output feature map has the same size as the input feature map.
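The padding rule described above can be written down directly. A minimal sketch in pure Python (zero padding only):

```python
def pad_feature_map(fmap, k):
    """Zero-pad a feature map by (k - 1) // 2 on each side so that a
    k x k, stride-1 convolution keeps the output the same size."""
    p = (k - 1) // 2
    w = len(fmap[0]) + 2 * p
    padded = [[0] * w for _ in range(p)]                  # p rows above
    padded += [[0] * p + row + [0] * p for row in fmap]   # p columns left/right
    padded += [[0] * w for _ in range(p)]                 # p rows below
    return padded

# A 2x2 map padded for a 3x3 kernel becomes 4x4:
assert pad_feature_map([[1, 2], [3, 4]], 3) == [
    [0, 0, 0, 0],
    [0, 1, 2, 0],
    [0, 3, 4, 0],
    [0, 0, 0, 0],
]
```

For a 3 × 3 kernel the padding width is (3 - 1)/2 = 1, matching the one row of PAD data per side that the method sections above assume.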
For example, given a set of input feature maps T_in_1, T_in_2, …, T_in_N and a set of convolution kernels W_x,y, where W_x,y denotes the convolution kernel between the x-th input feature map and the y-th output feature map, a convolution operation with stride 1 generates a set of output feature maps T_out_1, T_out_2, …, T_out_M. In short, the convolution operation architecture of fig. 2 implements a convolutional neural network computation.
The convolutional layer calculation is illustrated in FIG. 2, which includes N input feature maps T_in_1, T_in_2, …, T_in_N and M output feature maps T_out_1, T_out_2, …, T_out_M; W_x,y denotes the convolution kernel between the x-th input feature map and the y-th output feature map, and a 3 × 3 convolution kernel contains 9 coefficients. During the calculation, every output feature map needs all input feature maps to participate. Taking T_out_1 as an example, the conventional calculation proceeds as follows: (1) filter T_in_1 with convolution kernel W_1,1 to obtain temporary feature map data T_out_1_tmp and cache it; (2) filter T_in_2 with convolution kernel W_2,1, accumulate the result point-to-point onto the previously cached T_out_1_tmp, and cache the sum as the new T_out_1_tmp; (3) process all remaining input feature maps as in step (2); the final T_out_1_tmp is the final T_out_1.
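Steps (1)-(3) can be condensed into a short sketch (illustrative names; a plain stride-1 "valid" convolution with no padding):

```python
def conv2d_valid(fmap, kernel):
    # plain 2-D stride-1 convolution (as cross-correlation), no padding
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(fmap) - kh + 1, len(fmap[0]) - kw + 1
    return [[sum(kernel[i][j] * fmap[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)] for r in range(oh)]

def output_feature_map(in_maps, kernels):
    """Accumulate T_in_x filtered by W_x,1 over all input maps x into one
    output map T_out_1, mirroring steps (1)-(3) above."""
    acc = None
    for fmap, kern in zip(in_maps, kernels):
        part = conv2d_valid(fmap, kern)     # step (1): filter one input map
        if acc is None:
            acc = part                      # first partial result is cached
        else:                               # step (2): point-to-point add
            acc = [[a + b for a, b in zip(ra, rb)]
                   for ra, rb in zip(acc, part)]
    return acc                              # step (3): final T_out_1
```

Note that the temporary map `acc` is as large as a full output feature map, which is exactly the memory cost the line-wise method of the invention avoids.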
As the above calculation process shows, the conventional calculation method can only start the convolution calculation after the data of the whole input feature map has been input, and during the calculation every piece of temporary feature map data has the same data volume as a whole input feature map. The conventional method therefore requires huge memory resources, especially in high-resolution application scenes where it occupies even more memory space, which in turn limits the large-scale industrial application of CNN-based AI noise reduction methods.
In order to solve the above-mentioned problems of the conventional convolution calculation method, the present invention provides a line-wise convolution calculation method, as shown in fig. 3. Taking a convolution with a 3 × 3 kernel and a stride of 1 as an example: once T_in_1, T_in_2, …, T_in_N each have three lines of data cached, the corresponding line of T_out_1, T_out_2, …, T_out_M can be computed using the 3 × 3 filtering scheme. Thereafter, each time only one further line of data is input, the next line of T_out_1, T_out_2, …, T_out_M can be computed. The convolution calculation process can be divided into three stages: a starting working stage, a stable working stage and a finishing working stage.
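The row-wise scheme of fig. 3, three cached lines in and one output line out, looks like this for a single input map (a simplified sketch: 3 × 3 kernel, stride 1, one column of zero padding on each side):

```python
def output_line_from_three(r0, r1, r2, kernel):
    """Compute one output line of a 3x3, stride-1 convolution from three
    cached input lines, with one column of zero padding left and right."""
    rows, w, out = [r0, r1, r2], len(r0), []
    for c in range(w):
        s = 0
        for i in range(3):
            for j in range(3):
                cc = c + j - 1          # input column under kernel tap j
                if 0 <= cc < w:         # zero padding outside the line
                    s += kernel[i][j] * rows[i][cc]
        out.append(s)
    return out

ones_kernel = [[1, 1, 1]] * 3
# Three lines of ones give [6, 9, 6]: edge taps see one zero-padded column.
```

Feeding this routine one new cached line at a time, and reusing the two previous lines, reproduces the "one line in, one line out" behavior of the stable working stage.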
In the starting working phase, the input feature map caching unit caches the data in each input feature map line by line and judges whether the number of cached lines in each input feature map reaches kernel_size lines (where kernel_size denotes the number of lines of the convolution kernel). When the cached data in each input feature map reaches kernel_size lines, the line convolution operation unit starts the convolution operation; the starting working phase ends when each output feature map has output its first line of convolution calculation results. The cached data amount includes the PAD data filled above each input feature map: if the input feature maps are padded (Padding) during the convolution calculation, it is judged whether the sum of the number of cached lines and the number of PAD lines in each input feature map reaches kernel_size, and when this sum reaches kernel_size lines the line convolution operation unit starts the convolution operation. When the input feature map caching unit caches data, if the input feature map is original image data, a cache BUF with a capacity of (kernel_size + 1) × PIC_W may be set for each input feature map, where PIC_W is the data amount of each line of each input feature map. When the cache of each input feature map reaches kernel_size lines of data, the line convolution operation unit starts the line-by-line convolution operation, while the remaining line of cache space can continue to receive data input from the external input feature map.
When the input feature map caching unit caches data, if the input feature map is the output feature map of the previous convolution layer, a cache BUF with a capacity of kernel_size × PIC_W may be set for each input feature map to receive the convolution output results of the previous layer. The cache BUF is used cyclically during the stable working phase until all line data in the input feature map have been input.
In the stable working phase, each time the input feature map caching unit finishes caching a new line of data in each input feature map, the line convolution operation unit acquires the input feature map data and the corresponding convolution kernel data required for the convolution operation, performs the convolution operation, and steadily outputs one new line of convolution results for each output feature map. When the input feature map caching unit has cached the last line of data of each input feature map and the convolution calculation of that line is finished, the stable working phase ends and the ending working phase begins.
Taking the output feature map T_out_1 in FIG. 3 as an example, in the starting working phase the operation process of the line convolution operation unit is as follows: (1) obtain the data PAD, PIC_1 and PIC_2 of the first input feature map T_in_1 that participate in the operation, obtain the corresponding convolution kernel W_1,1, and perform the convolution operation to obtain the temporary data PIC_1_tmp of the first line PIC_1 of the output feature map T_out_1; (2) obtain the data PAD, PIC_1 and PIC_2 of the second input feature map T_in_2 that participate in the operation, obtain the corresponding convolution kernel W_2,1, perform the convolution operation, and accumulate the result point-to-point onto the previous temporary data PIC_1_tmp of the first line PIC_1 of the output feature map T_out_1; (3) continue by analogy until all input feature map data have been calculated; the final temporary data PIC_1_tmp of the first line PIC_1 of the output feature map T_out_1 is the final first line of T_out_1, which is output through the line convolution result output unit. In the stable working phase, the lines of the other output feature maps are obtained in the same way. As can be seen from the above operation process, the line convolution operation unit needs only 1 line of data capacity for buffering the temporary data PIC_1_tmp, rather than a memory of the same size as the input feature map that the traditional convolution calculation method needs for intermediate result buffering, which further reduces the storage space required by the convolution calculation method provided in the present invention.
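Steps (1)-(3) of the starting working phase can be sketched in the following hedged example (names such as `conv_row` and `line_conv_output_row` are illustrative, not from the patent): one output line is produced from only kernel_size buffered input lines per input feature map, and a single-line temporary buffer, standing in for PIC_1_tmp, accumulates across the N input feature maps. A `None` entry stands for a zero-valued PAD line.

```python
def conv_row(rows3, kernel, w):
    """Convolve one output line from kernel_size (= 3) buffered input lines.
    rows3[i] is either a full line of length w or None (a zero PAD line)."""
    out = [0.0] * w
    for kr in range(3):
        row = rows3[kr]
        if row is None:          # PAD line contributes nothing (all zeros)
            continue
        for c in range(w):
            for kc in range(3):
                cc = c + kc - 1  # one zero PAD column on each side
                if 0 <= cc < w:
                    out[c] += row[cc] * kernel[kr][kc]
    return out

def line_conv_output_row(inputs_rows, kernels, w):
    """Accumulate all N input feature maps into a single-line temporary
    buffer instead of a whole-frame buffer."""
    pic_tmp = [0.0] * w                      # only 1 line of temporary storage
    for x, rows3 in enumerate(inputs_rows):
        part = conv_row(rows3, kernels[x], w)
        for c in range(w):
            pic_tmp[c] += part[c]            # point-to-point accumulation
    return pic_tmp
```

For a single input map whose buffered window is [PAD, PIC_1, PIC_2] and an all-ones kernel, each output element is simply the sum of its 3 × 3 neighbourhood restricted to the two real lines.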
In the ending working phase, the line convolution operation unit completes the calculation of the remaining lines by self-driving and outputs their convolution calculation results. Taking the 3 × 3 convolution kernel in FIG. 3 as an example, the last line of data of each output feature map is calculated from lines PIC_H-1 and PIC_H of each input feature map together with the PAD line, and is then output.
The convolution calculation method provided by the invention has the following advantages: (1) the convolution calculation can start as soon as the cached data amount in each input feature map reaches kernel_size lines, without caching the whole input feature map, which enhances the real-time performance of image processing; (2) the input feature map caching unit only needs a cache with a capacity of (kernel_size + 1) × PIC_W or kernel_size × PIC_W for each input feature map rather than the whole input feature map, and temporary data caching requires only 1 line of data capacity, with convolution results output line by line; the convolution calculation method therefore greatly reduces the memory required for the calculation and saves storage resources; (3) the data participating in the operation is exactly the same as in conventional convolution over the whole frame picture, so no visible block boundary effect occurs.
Fig. 4 is a diagram showing a convolution calculation apparatus 100 according to the convolution calculation method of the present invention. The convolution calculation device includes an input feature map buffer unit 101, a convolution kernel coefficient reading unit 102, a convolution kernel coefficient buffer unit 103, a line convolution operation unit 104, and a line convolution result writing-out unit 105. The input characteristic diagram caching unit 101 is used for caching line data in each input characteristic diagram; the convolution kernel coefficient buffer unit 103 is used for buffering convolution kernel coefficients; the convolution kernel coefficient reading unit 102 is configured to read a convolution kernel coefficient participating in convolution operation from the convolution kernel coefficient buffering unit 103; the line convolution operation unit 104 is configured to read data involved in convolution calculation from the input feature map buffer unit 101 and the convolution kernel coefficient reading unit 102, perform convolution calculation, and output a line convolution operation result to an external storage device through the line convolution result writing unit 105.
During the convolution calculation, in the starting working phase, the input feature map caching unit 101 receives the data input line by line in each input feature map and judges whether the number of cached input feature map lines reaches kernel_size; the convolution kernel coefficient reading unit 102 reads the convolution kernel coefficients required for the calculation from the convolution kernel coefficient buffering unit 103; when the cached data amount in each input feature map reaches kernel_size lines, the line convolution operation unit 104 starts the convolution operation, and when each output feature map has output its first line of convolution results, the starting working phase ends. The minimum capacity of the input feature map buffer unit 101 is (kernel_size + 1) × PIC_W × N, where kernel_size is the number of lines of the convolution kernel, PIC_W is the data amount of each line of the input feature map, and N is the number of input feature maps.
If the input feature map is original image data, the input feature map caching unit 101 sets a cache with a capacity of (kernel_size + 1) × PIC_W for each input feature map to cache its data; the convolution calculation can be performed once the cached data amount reaches kernel_size lines, while the remaining line of cache space continues to accept data input from the external input feature map. If the input feature map is the output feature map of the previous convolution layer, the input feature map caching unit 101 sets a cache with a capacity of kernel_size × PIC_W for each input feature map to receive the convolution output results of the previous layer. In the subsequent stable working phase, the cache BUF is used cyclically until all line data in the input feature map have been input.
In the stable working stage, after the input feature map buffer unit 101 buffers a new line of data in each input feature map, the line convolution operation unit 104 obtains the input feature map data required by the convolution operation and the corresponding convolution kernel data to perform the convolution operation, and outputs the result of the convolution operation to the line convolution result writing-out unit 105.
In the ending working phase, the line convolution operation unit 104 completes the calculation of the remaining lines by self-driving and, after the calculation is completed, outputs the last line of each output feature map to the line convolution result writing-out unit 105. In the stable working phase, the line convolution result writing-out unit 105 outputs the data in the output feature maps to the external memory line by line.
The line-processed convolution calculation method can be applied to the DnCNN operation, and the whole DnCNN network can be calculated line by line. Because BN can be merged into the convolution calculation (Conv) in the DnCNN network structure shown in FIG. 1, and ReLU is a simple decision-and-assignment operation, the essence of the entire DnCNN network calculation is the continuous calculation of multiple convolution layers.
As shown in FIG. 5, FIG. 5 is an embodiment of the line-processed DnCNN network operation of the present invention. This embodiment simplifies the DnCNN network in order to better describe the DnCNN network calculation method proposed in this patent. In the DnCNN network, the input image is a single grayscale image (i.e. the number of input feature maps is 1) of size PIC_W × PIC_H; the input image is followed by a "Conv + ReLU" layer with 2 output feature maps and a 3 × 3 convolution kernel; the "Conv + ReLU" layer is followed by 2 "Conv + BN + ReLU" layers, each with 2 input/output feature maps and a 3 × 3 convolution kernel; the second "Conv + BN + ReLU" layer is followed by a "Conv" layer with 1 output feature map (i.e. the final residual image) and a 3 × 3 convolution kernel. In FIG. 5, in order to keep the image size unchanged during the continuous convolution operations, the images during the calculation are padded with PAD lines. The PAD lines in the figure are not real feature data: according to the requirements of the 3 × 3 filtering calculation, a PAD line with data value 0 must be added above and below each feature map to keep the image height unchanged after the 3 × 3 filtering calculation, and a PAD column with data value 0 must likewise be added at the leftmost and rightmost edges of each feature map to keep the image width unchanged (not shown in FIG. 5 and not described further, since it is not closely related to the scheme in this patent).
As can be seen from FIG. 5, when two lines of data SRC_LN1 and SRC_LN2 of the input image have been input, the padded first PAD line is added, the convolution operation of the "Conv + ReLU" layer can be performed, and the first lines LAYER1_OFMAP1_LN1 and LAYER1_OFMAP2_LN1 of the output feature maps of the "Conv + ReLU" layer are calculated.
When both output feature maps of the "Conv + ReLU" layer have two lines of data, LAYER1_OFMAP1_LN1, LAYER1_OFMAP1_LN2 and LAYER1_OFMAP2_LN1, LAYER1_OFMAP2_LN2, combined with the PAD line in the first row of each output feature map in the layer, the convolution calculation of the "first Conv + BN + ReLU" layer can be performed and the first lines LAYER2_OFMAP1_LN1 and LAYER2_OFMAP2_LN1 of its two output feature maps are calculated. At this time, the input image data that has entered and participated in the calculation is: SRC_LN1, SRC_LN2 and SRC_LN3.
When both output feature maps of the "first Conv + BN + ReLU" layer have two lines of data, LAYER2_OFMAP1_LN1, LAYER2_OFMAP1_LN2 and LAYER2_OFMAP2_LN1, LAYER2_OFMAP2_LN2, combined with the PAD line in the first row of each output feature map in the layer, the convolution calculation of the "second Conv + BN + ReLU" layer can be performed and the first lines LAYER3_OFMAP1_LN1 and LAYER3_OFMAP2_LN1 of its two output feature maps are calculated. At this time, the input image data that has entered and participated in the calculation is: SRC_LN1, SRC_LN2, SRC_LN3 and SRC_LN4.
When both output feature maps of the "second Conv + BN + ReLU" layer have two lines of data, LAYER3_OFMAP1_LN1, LAYER3_OFMAP1_LN2 and LAYER3_OFMAP2_LN1, LAYER3_OFMAP2_LN2, combined with the PAD line in the first row of each output feature map in the layer, the convolution calculation of the "Conv" layer can be performed and the first line LAYER4_OFMAP1_LN1 of the "Conv" layer is calculated, which is the first line of data in the final residual image. At this time, the input image data that has entered and participated in the calculation is: SRC_LN1, SRC_LN2, SRC_LN3, SRC_LN4 and SRC_LN5.
According to the above calculation process, as soon as 5 lines of original image data have been input, the whole simplified DnCNN network can be operated and the first line of the residual image obtained; each subsequently input line of original image data yields one more line of residual image data.
Referring to FIG. 6, FIG. 6 is a schematic diagram of the line data flow in each neural network layer of the DnCNN network. Similar to the calculation principle in FIG. 5, when the input data of a neural network layer plus the PAD line data reaches kernel_size = 3 lines, the convolution calculation of that layer can start. According to FIG. 6, with the simplified DnCNN network, when the input data of the input image reaches 5 lines, the first line of the residual image can be output, and subsequently one line of residual image data can be output for every additional line of original image data input. By cyclically invoking the same line-processed convolution operation unit in this way, the complete DnCNN operation can be realized. The resulting line delay relative to the original image input data can be calculated by the following formula:
start_src_ln_num = Layer_num + kernel_size - 1 - pad_num;
where start_src_ln_num represents the number of input image lines required when the first line of residual image data is output, i.e. the line delay between the first line of the residual image and the original image input data; Layer_num represents the number of layers of the whole DnCNN network, including all "Conv + ReLU", "Conv + BN + ReLU" and "Conv" layers; kernel_size represents the number of lines of the convolution kernel, e.g. 3 for a 3 × 3 convolution kernel; pad_num represents the number of PAD lines filled above the input feature map. As mentioned above, (kernel_size - 1)/2 lines of padding values are filled above and below the input feature map, so pad_num = (kernel_size - 1)/2; when kernel_size is 3, pad_num is 1.
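The formula can be checked against the simplified network of FIG. 5 (4 convolution layers, 3 × 3 kernels) and cross-checked by a small simulation of line availability. This is an illustrative sketch, not part of the patent; the function names are our own.

```python
def start_src_ln_num(layer_num, kernel_size, pad_num):
    # start_src_ln_num = Layer_num + kernel_size - 1 - pad_num
    return layer_num + kernel_size - 1 - pad_num

def residual_lines_available(src_lines, n_layers, kernel_size=3, pad_num=1):
    """With pad_num PAD lines on top, a layer holding n input lines can emit
    n + pad_num - (kernel_size - 1) output lines; chain this over all layers."""
    n = src_lines
    for _ in range(n_layers):
        n = max(0, n + pad_num - (kernel_size - 1))
    return n

# Simplified DnCNN of FIG. 5: Layer_num = 4, kernel_size = 3, pad_num = (3 - 1) // 2 = 1
assert start_src_ln_num(4, 3, 1) == 5        # 5 input lines before the first residual line
assert residual_lines_available(5, 4) == 1   # the simulation confirms the formula
assert residual_lines_available(4, 4) == 0   # with only 4 lines, no residual line yet
```

The simulation reproduces the walkthrough above: with 4 input lines the "Conv" layer has nothing to emit, and the 5th input line releases the first residual line.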
Similar to the above convolution calculation method, the calculation process of the line-by-line CNN network calculation method can also be divided into three stages: a starting working phase, a stable working phase and an ending working phase.
In the starting working phase, the original image data input buffer unit buffers the data of each input feature map of the input image line by line and judges whether the number of cached lines in each input feature map reaches kernel_size (where kernel_size denotes the number of lines of the convolution kernel). When the cached data in each input feature map reaches kernel_size lines, the line convolution operation unit starts the first convolution layer, i.e. the convolution calculation of the "Conv + ReLU" layer in FIG. 5. The cached data amount includes the PAD data: if the input feature map is padded (Padding) during the convolution calculation, it is judged whether the sum of the number of cached lines and the number of PAD lines in each input feature map reaches kernel_size, and when this sum reaches kernel_size lines the line convolution operation unit starts the convolution operation. At this time, only the first layer, i.e. the "Conv + ReLU" layer, can be calculated. To start the calculation of the next "Conv + BN + ReLU" layer, it is necessary to wait until the original image data input buffer unit has buffered a new line of data and the number of data lines in each output feature map calculated by the "Conv + ReLU" layer, plus the number of PAD lines, reaches kernel_size. Similar conditions apply to the subsequent "Conv + BN + ReLU" layers and the "Conv" layer: the calculation of the current neural network layer can start only when the number of data lines in each output feature map of the previous layer plus the number of PAD lines reaches kernel_size. The starting working phase lasts until the "Conv" layer outputs the first line of residual image data.
When the "Conv" layer outputs the first line of residual image data, the stable working phase begins.
In the stable working phase, each time the original image data input buffer unit finishes buffering a new line of data of each input feature map, the line convolution operation unit obtains the input feature map data and the corresponding convolution kernel data required for the convolution operation, performs the continuous operation of all neural network layers, and steadily outputs a new line of residual image result data. When the original image data input buffer unit has buffered the last line of data of each input feature map, the calculation of each neural network layer for that line has been completed, and the corresponding line of the residual image has been output, the stable working phase ends and the ending working phase begins.
In the ending working phase, the line convolution operation unit completes the calculation of the remaining lines by self-driving and outputs the remaining lines of the residual image. In this phase there is no more original image input; the calculation process of the line convolution operation unit in the ending phase is described below, taking the simplified DnCNN network in FIG. 5 as an example. Step (1): the line convolution operation unit first performs the convolution operation using lines PIC_H-1 and PIC_H of the original image combined with the PAD line to generate the last line of each output feature map of the "Conv + ReLU" layer, and then completes in sequence the calculation of line PIC_H-1 of each output feature map of the "first Conv + BN + ReLU" layer, line PIC_H-2 of each output feature map of the "second Conv + BN + ReLU" layer, and line PIC_H-3 of the "Conv" layer. Step (2): the line convolution operation unit performs the convolution operation using lines PIC_H-1 and PIC_H of each output feature map of the "Conv + ReLU" layer combined with the PAD line to generate the last line of each output feature map of the "first Conv + BN + ReLU" layer, and then completes in sequence the calculation of line PIC_H-1 of each output feature map of the "second Conv + BN + ReLU" layer and line PIC_H-2 of the "Conv" layer. Step (3): the line convolution operation unit performs the convolution operation using lines PIC_H-1 and PIC_H of each output feature map of the "first Conv + BN + ReLU" layer combined with the PAD line to generate the last line of each output feature map of the "second Conv + BN + ReLU" layer, and then completes the calculation of line PIC_H-1 of the "Conv" layer.
Step (4): the line convolution operation unit performs the convolution operation using lines PIC_H-1 and PIC_H of each output feature map of the "second Conv + BN + ReLU" layer combined with the PAD line, completing the calculation of the last line of the residual image in the "Conv" layer.
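Steps (1)-(4) above follow a regular drain pattern, sketched below as an illustrative model (our own naming: layer 0 is the "Conv + ReLU" layer, layer 3 the final "Conv" layer; `done[i]` counts the output lines layer i has produced). At the end of the stable phase each layer lags its predecessor by one line; each ending-phase step flushes the last line of one more layer using the bottom PAD line and advances every deeper layer by one line.

```python
def ending_phase_schedule(pic_h, n_layers=4):
    """Return, per ending-phase step, the (layer, line) pairs produced."""
    # state at the end of the stable phase: layer i has produced pic_h - 1 - i lines
    done = [pic_h - 1 - i for i in range(n_layers)]
    steps = []
    for k in range(n_layers):              # step (k+1) in the text
        produced = []
        done[k] += 1                       # layer k flushes its last line via the bottom PAD line
        produced.append((k, done[k]))
        for j in range(k + 1, n_layers):
            done[j] += 1                   # every deeper layer advances one line
            produced.append((j, done[j]))
        steps.append(produced)
    assert all(d == pic_h for d in done)   # every layer has emitted all pic_h lines
    return steps
```

For PIC_H = 10, step (1) yields layer 0 line 10 (= PIC_H), layer 1 line 9 (= PIC_H-1), layer 2 line 8 and layer 3 line 7, matching the textual walkthrough.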
The line-processed CNN network calculation method provided by the invention has the following advantages: (1) the convolution calculation can start as soon as the cached data amount in each input feature map reaches kernel_size lines, without caching the whole input feature map, which enhances the real-time performance of image processing; (2) the original image data input buffer unit only needs a buffer with a capacity of (kernel_size + 1) × PIC_W for each input feature map rather than the whole input feature map, and for intermediate result caching each output feature map only needs kernel_size × PIC_W of data capacity, which also serves as the input of the next neural network layer; the calculation method therefore greatly reduces the memory required for the calculation and saves storage resources; (3) the data participating in the calculation is exactly the same as in conventional convolution over the whole frame picture, so no visible block boundary effect occurs; (4) the traditional image signal processing algorithm function modules are processed line by line, and the line-processed CNN network calculation method also inputs data and outputs calculation results line by line, with a delay of only Layer_num + kernel_size - 1 - pad_num lines between the output results and the original image; the method can therefore be computed in parallel with traditional image signal processing algorithms, with small delay and no frame-level delay.
As shown in FIG. 7, the present invention provides a line-processed CNN network computing device 200, which includes an original image data input buffer unit 201, a convolution kernel coefficient buffer unit 203, a convolution kernel coefficient reading unit 202, a line convolution intermediate result reading unit 207, a line convolution operation unit 204, a line convolution intermediate result buffer unit 206 and a line convolution result writing-out unit 205. The original image data input buffer unit 201 buffers the line data of each input feature map of the original image; the convolution kernel coefficient buffer unit 203 buffers the convolution kernel coefficients; the convolution kernel coefficient reading unit 202 reads the convolution kernel coefficients participating in the convolution operation from the convolution kernel coefficient buffer unit 203; the line convolution intermediate result buffer unit 206 buffers the intermediate results of the CNN network calculation; the line convolution intermediate result reading unit 207 reads line convolution intermediate results from the line convolution intermediate result buffer unit 206; the line convolution operation unit 204 reads the data participating in the convolution operation from the original image data input buffer unit 201 or the line convolution intermediate result reading unit 207 and from the convolution kernel coefficient reading unit 202, performs the convolution operation, and outputs the result to the line convolution result writing-out unit 205; the line convolution result writing-out unit 205 outputs intermediate results to the line convolution intermediate result buffer unit 206 and sends the final residual image data to the external memory.
The original image data input buffer unit 201 receives the data input line by line in each input feature map and judges whether the number of cached input feature map lines reaches kernel_size; when the cached data amount in each input feature map reaches kernel_size lines, the original image data input buffer unit notifies the line convolution operation unit to start the line-by-line convolution operation. The original image data input buffer unit 201 internally provides a buffer space with a capacity of (kernel_size + 1) × PIC_W for each input feature map of the original image, where kernel_size is the number of lines of the convolution kernel and PIC_W is the data amount of each line of the input feature map. When the cached data amount of each input feature map reaches kernel_size lines, the convolution calculation can be performed, while the remaining line of buffer space continues to receive data input from the external input feature map. The buffer BUF is then used cyclically until all line data of the original image input feature maps have been input.
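The (kernel_size + 1)-line buffer can be sketched as a ring buffer; this is an illustrative sketch (the class name `LineBuffer` and its method names are ours, not from the patent): kernel_size cached lines feed the row convolution while the spare line keeps accepting external input, and the oldest line is overwritten once it is no longer needed.

```python
from collections import deque

class LineBuffer:
    """Per-input-feature-map cache of (kernel_size + 1) lines of width pic_w."""
    def __init__(self, kernel_size, pic_w):
        self.kernel_size = kernel_size
        self.pic_w = pic_w
        # deque with maxlen drops the oldest line automatically (cyclic reuse)
        self.rows = deque(maxlen=kernel_size + 1)

    def push(self, row):
        assert len(row) == self.pic_w
        self.rows.append(row)

    def ready(self, pad_rows=0):
        # convolution may start once cached lines + PAD lines reach kernel_size
        return len(self.rows) + pad_rows >= self.kernel_size

    def window(self):
        # the newest kernel_size lines participating in the current output line
        return list(self.rows)[-self.kernel_size:]
```

With a top PAD line counted in, two cached lines already suffice to start a 3 × 3 convolution, and a fourth pushed line overwrites the oldest one while the window of three newest lines stays valid.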
The convolution kernel coefficient buffer unit 203 buffers all convolution kernel coefficients of the entire DnCNN network, which are loaded before the device starts and do not change during operation.
The convolution kernel coefficient reading unit 202 reads the convolution kernel coefficients cached in the convolution kernel coefficient buffer unit 203, reading the convolution kernel coefficients of the corresponding neural network layer according to the cyclic calling process of the line convolution operation unit 204.
The line convolution operation unit 204 completes the whole DnCNN network calculation line by line in a cyclic calling mode. Besides the convolution operation, the line convolution operation unit 204 also performs the BN and ReLU calculations, but since the BN and ReLU calculations do not affect the operation process of the present apparatus, only the convolution operation is described in this patent. In the starting working phase, when the cached data amount in each input feature map of the original image reaches kernel_size lines, the line convolution operation unit 204 starts the calculation. At this time, only the first layer, i.e. the "Conv + ReLU" layer, can be calculated. To start the calculation of the next "Conv + BN + ReLU" layer, it is necessary to wait until the original image data input buffer unit 201 has buffered a new line of data and the number of data lines in each output feature map calculated by the "Conv + ReLU" layer, plus the number of PAD lines, reaches kernel_size. Similar conditions apply to the subsequent "Conv + BN + ReLU" layers and the "Conv" layer: the operation of the current neural network layer can start only when the number of data lines in each output feature map of the previous layer plus the number of PAD lines reaches kernel_size. The starting working phase lasts until the "Conv" layer outputs the first line of residual image data, after which the stable working phase begins.
In the stable working phase, each time the original image data input buffer unit 201 finishes buffering a new line of data of each input feature map, the line convolution operation unit 204 obtains the input feature map data and the corresponding convolution kernel data required for the convolution operation, performs the continuous operation of all neural network layers, and steadily outputs a new line of residual image result data. When the original image data input buffer unit has buffered the last line of data of each input feature map, the calculation of each neural network layer for that line has been completed, and the corresponding line of the residual image has been output, the stable working phase ends and the ending working phase begins. In the ending working phase, the line convolution operation unit 204 completes the calculation of the remaining lines by self-driving and outputs the remaining lines of the residual image.
The line convolution intermediate result buffer unit 206 buffers the line data of the output feature maps of each intermediate layer of the DnCNN network. Taking the simplified DnCNN network in FIG. 5 as an example, the line convolution intermediate result buffer unit 206 mainly buffers the line data of the output feature maps of the "Conv + ReLU" layer, the "first Conv + BN + ReLU" layer and the "second Conv + BN + ReLU" layer. To save buffer space, the data storage space of the intermediate output feature maps is also multiplexed: each output feature map only needs kernel_size × PIC_W of data capacity for intermediate result caching, which also serves as the input of the next neural network layer. Compared with the traditional frame-based operation mode, in which the output feature map of each layer needs a buffer space of the same size as the input feature map, the line-based processing method adopted by the invention greatly saves storage capacity, especially at large resolutions.
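The saving can be quantified with a line of arithmetic. The figures below are illustrative (assuming, for concreteness, a 1920 × 1080 feature map stored at one byte per element): frame-based buffering of an intermediate output feature map costs PIC_W × PIC_H per map, while the line-based scheme costs only kernel_size × PIC_W.

```python
def frame_based_bytes(n_maps, pic_w, pic_h, bytes_per_elem=1):
    # traditional mode: each intermediate output feature map buffers a full frame
    return n_maps * pic_w * pic_h * bytes_per_elem

def line_based_bytes(n_maps, pic_w, kernel_size, bytes_per_elem=1):
    # line-based mode: only kernel_size lines per intermediate output feature map
    return n_maps * pic_w * kernel_size * bytes_per_elem

# 2 intermediate feature maps of 1920 x 1080 with 3 x 3 kernels:
assert frame_based_bytes(2, 1920, 1080) == 4_147_200   # ~4 MB
assert line_based_bytes(2, 1920, 3) == 11_520          # ~11 KB, a 360x reduction
```

The ratio is simply PIC_H / kernel_size, so the saving grows linearly with image height, which is why the method pays off most at large resolutions.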
The line convolution intermediate result reading unit 207 is configured to read data in the line convolution intermediate result cache unit 206, and complete reading of the corresponding neural network layer characteristic line data according to a loop calling process of the line convolution operation unit 204.
The line-processed CNN network computing device provided in the present invention can be placed anywhere in the whole image signal processing flow: at the beginning, in the middle, or at the end, without limitation. Because the image processing algorithms of a traditional ISP module are processed line by line while a traditional AI algorithm is generally processed frame by frame, the AI noise reduction algorithm can start only after the ISP module has finished processing one frame of data line by line; that is, the CNN-based AI noise reduction introduced in the background art is generally performed after the ISP image processing, so in that connection mode the output result of the AI noise reduction algorithm has a delay of at least one frame and is unsuitable for application scenarios with a small delay tolerance, such as automatic driving. In contrast, because the line-processed CNN network computing device provided in the present invention inputs data and outputs calculation results line by line, it can be placed anywhere in the image signal processing device, and its output results have no frame-level delay. Therefore, compared with the traditional AI noise reduction device, the noise reduction device provided by the invention has stronger adaptability and better application prospects.
The line-processing CNN network computing device provided by the invention has the following advantages: (1) the original image data input buffer unit only needs a buffer of capacity (kernel_size + 1) × PIC_W for each input feature map, rather than buffering the entire input feature map; for intermediate results, the line convolution intermediate result buffer unit only needs a data capacity of kernel_size × PIC_W per output feature map, which then serves as the input of the next neural network layer; the device therefore greatly reduces storage space, enables miniaturization and integration, and is more favorable for industrial application. (2) During computation, the data participating in each operation is exactly the same as in a conventional convolution over the whole frame, so no visible block boundary effect appears. (3) The functional modules of traditional image signal processing algorithms operate line by line, and the line-processing CNN network computing device likewise inputs data and outputs results line by line; the number of delay lines between the output result and the original input image is only Layer_num + kernel_size - 1 - pad_num, so the device can compute in parallel with a traditional image signal processing device, with small delay and no frame-level delay.
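To make advantage (1) concrete, a back-of-envelope comparison with assumed full-HD figures (the resolution and kernel size are illustrations, not values from the patent):

```python
# Per-feature-map buffer capacity: frame-based vs. the line-based scheme.
PIC_W, PIC_H = 1920, 1080          # assumed full-HD feature map size
kernel_size = 3                    # assumed 3x3 convolution kernel

frame_based = PIC_H * PIC_W                 # whole input feature map buffered
line_input = (kernel_size + 1) * PIC_W      # raw-image input buffer per map
line_mid = kernel_size * PIC_W              # intermediate result buffer per map

print(frame_based, line_input, line_mid)    # 2073600 7680 5760
print(frame_based // line_mid)              # 360
```

Under these assumptions the intermediate buffers are smaller than full-frame buffers by a factor of 360.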
Although the present invention has been described with reference to the preferred embodiments, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (13)

1. A convolution calculation method, characterized in that the convolution calculation method comprises a starting working phase, a stable working phase and an ending working phase; wherein:
in the starting working phase, the input feature map buffer unit buffers the data of each input feature map line by line and judges whether the number of buffered data lines in each input feature map has reached kernel_size lines, where kernel_size represents the number of rows of the convolution kernel; when the number of buffered data lines in each input feature map reaches kernel_size lines, the line convolution operation unit starts the convolution operation, and when each output feature map has output the convolution result of its first line, the starting working phase ends;
in the stable working phase, each time the input feature map buffer unit finishes buffering a new line of data of each input feature map, the line convolution operation unit acquires the input feature map data and the corresponding convolution kernel data required by the convolution operation, performs the convolution operation, and stably outputs a new line of convolution results for each output feature map; if the input feature map is original image data, the input feature map buffer unit sets a buffer space of capacity (kernel_size + 1) × PIC_W for each input feature map, where PIC_W is the data volume of each line of each input feature map; if the input feature map is the output feature map of the previous convolution layer, the input feature map buffer unit sets a buffer space of capacity kernel_size × PIC_W for each input feature map; when the input feature map buffer unit has buffered the last line of data of each input feature map and the convolution calculation of that line is completed, the stable working phase ends;
and in the ending working phase, the line convolution operation unit completes the calculation of the remaining lines in a self-driven manner and outputs their convolution results.
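Functionally, the three working phases of claim 1 can be sketched as the following illustrative Python model (not the patented hardware; a single-channel feature map with "same" padding and pad = (kernel_size - 1)/2 is assumed):

```python
import numpy as np

def line_conv2d(lines, kernel):
    """Line-by-line 'same'-padded 2D cross-correlation of one feature map."""
    k = kernel.shape[0]            # kernel_size, assumed square and odd
    pad = (k - 1) // 2             # PAD rows/columns, as in claim 3
    buf, out = [], []

    def conv_line(window, width):
        padded = np.pad(np.array(window), ((0, 0), (pad, pad)))
        return np.array([np.sum(padded[:, x:x + k] * kernel)
                         for x in range(width)])

    for line in lines:
        # Starting phase: buffer lines; PAD rows above count toward kernel_size.
        buf.append(np.asarray(line, dtype=float))
        if len(buf) + pad >= k:
            # Stable phase: one output line per newly buffered input line.
            window = ([np.zeros_like(buf[0])] * max(0, k - len(buf)) + buf)[-k:]
            out.append(conv_line(window, len(buf[0])))
    for _ in range(pad):
        # Ending phase: self-driven computation of the remaining padded lines.
        buf.append(np.zeros_like(buf[0]))
        out.append(conv_line(buf[-k:], len(buf[0])))
    return np.array(out)
```

With a 3×3 kernel, the first output line appears as soon as two input lines (plus one PAD line) are buffered, and the last line is flushed by the ending phase after the final input line.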
2. The convolution calculation method of claim 1, wherein, when judging whether the number of buffered data lines in each input feature map has reached kernel_size lines, the number of buffered data lines includes the number of lines of PAD data filled above the data of each input feature map.
3. The convolution calculation method of claim 2, wherein the number of rows of the PAD data is (kernel_size - 1)/2.
4. A convolution calculation apparatus used in the convolution calculation method of claim 1, wherein the convolution calculation apparatus comprises an input feature map buffer unit, a convolution kernel coefficient buffer unit, a convolution kernel coefficient reading unit, a line convolution operation unit, and a line convolution result writing-out unit; wherein:
the input feature map buffer unit is used for buffering the line data of each input feature map;
the convolution kernel coefficient buffer unit is used for buffering convolution kernel coefficients;
the convolution kernel coefficient reading unit is used for reading the convolution kernel coefficients participating in the convolution operation from the convolution kernel coefficient buffer unit;
and the line convolution operation unit is used for reading the data participating in the convolution calculation from the input feature map buffer unit and the convolution kernel coefficient reading unit, performing the convolution calculation, and outputting the convolution result to an external storage device through the line convolution result writing-out unit.
5. A DnCNN network computing method, characterized in that the DnCNN network computing method comprises a starting working phase, a stable working phase and an ending working phase; wherein:
in the starting working phase, the original image data input buffer unit buffers the data of each input feature map of the input image line by line and judges whether the number of buffered data lines in each input feature map has reached kernel_size lines, where kernel_size represents the number of rows of the convolution kernel; when the number of buffered data lines in each input feature map reaches kernel_size lines, the line convolution operation unit starts the first-layer convolution operation; each of the other neural network layers waits until the number of buffered data lines in each output feature map of its previous layer reaches kernel_size lines before starting its own computation; when the first line of residual image data is output, the starting working phase ends;
in the stable working phase, each time the original image data input buffer unit buffers a new line of data of each input feature map, the line convolution operation unit acquires the input feature map data and the corresponding convolution kernel data required by the convolution operation, performs the successive computations of each neural network layer, and stably outputs a new line of residual image result data; the original image data input buffer unit sets a buffer space of capacity (kernel_size + 1) × PIC_W for each input feature map, where PIC_W is the data volume of each line of each input feature map; when the original image data input buffer unit has buffered the last line of data of each input feature map, the computation of each neural network layer is completed, and the residual image output data corresponding to that line is output, the stable working phase ends;
and in the ending working phase, the line convolution operation unit completes the calculation of the remaining lines in a self-driven manner and outputs the output data of the remaining lines of the residual image.
6. The DnCNN network computing method of claim 5, wherein the line convolution operation unit performs BN and ReLU calculations in addition to the convolution operation.
7. The DnCNN network computing method of claim 5, wherein, when judging whether the number of buffered data lines in each input feature map has reached kernel_size lines, the number of buffered data lines includes the number of lines of PAD data filled above the data of each input feature map; and, when judging for the other neural network layers whether the number of buffered data lines in each output feature map of the previous layer has reached kernel_size lines, the number of buffered data lines includes the number of lines of PAD data filled above the data of each output feature map.
8. The DnCNN network computing method of claim 7, wherein the number of rows of the PAD data is (kernel_size - 1)/2.
9. The DnCNN network computing method of claim 5, wherein the line delay of the first line of the residual image relative to the original image input data is
start_src_ln_num = Layer_num + kernel_size - 1 - pad_num;
where start_src_ln_num represents the number of delay lines between the first output line of the residual image and the original image input data; Layer_num represents the number of layers of the whole DnCNN network, including all Conv + ReLU layers, Conv + BN + ReLU layers and Conv layers; kernel_size represents the number of rows of the convolution kernel; and pad_num represents the number of PAD rows filled above the input feature map.
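Plugging illustrative numbers into the formula of claim 9 (a three-layer network with 3×3 kernels; the figures are assumptions for exposition, not values stated in the patent):

```python
Layer_num = 3                        # assumed total number of conv layers
kernel_size = 3                      # assumed 3x3 convolution kernels
pad_num = (kernel_size - 1) // 2     # PAD rows above the feature map (claim 8)

# Line delay of the first residual-image line relative to the input, per claim 9.
start_src_ln_num = Layer_num + kernel_size - 1 - pad_num
print(start_src_ln_num)              # 4
```

Four lines of delay, compared with at least one full frame (e.g. 1080 lines for full HD) when the AI stage must wait for frame-based processing to finish.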
10. A DnCNN network computing device used in the DnCNN network computing method of claim 5, wherein the DnCNN network computing device comprises an original image data input buffer unit, a convolution kernel coefficient buffer unit, a convolution kernel coefficient reading unit, a line convolution intermediate result reading unit, a line convolution operation unit, a line convolution intermediate result buffer unit, and a line convolution result writing-out unit; wherein:
the original image data input buffer unit is used for buffering the line data of each input feature map of the original image;
the convolution kernel coefficient buffer unit is used for buffering convolution kernel coefficients;
the convolution kernel coefficient reading unit is used for reading the convolution kernel coefficients participating in the convolution operation from the convolution kernel coefficient buffer unit;
the line convolution intermediate result buffer unit is used for buffering the intermediate results of the DnCNN network computation;
the line convolution intermediate result reading unit is used for reading line convolution intermediate results from the line convolution intermediate result buffer unit;
the line convolution operation unit is used for reading the data participating in the convolution operation from the original image data input buffer unit or the line convolution intermediate result reading unit, together with the convolution kernel coefficient reading unit, performing the convolution operation, and outputting the obtained result to the line convolution result writing-out unit; and
the line convolution result writing-out unit outputs intermediate results to the line convolution intermediate result buffer unit and transmits the final residual image data to an external memory.
11. The DnCNN network computing device of claim 10, wherein the line convolution intermediate result buffer unit sets a buffer space of capacity kernel_size × PIC_W, i.e. kernel_size lines, for each intermediate output feature map.
12. The DnCNN network computing device of claim 10, wherein the line convolution operation unit performs BN and ReLU calculations in addition to the convolution operation.
13. The DnCNN network computing device of claim 10, wherein the DnCNN network computing device is located at one of: the start of the image signal processing flow, an intermediate position of the image signal processing flow, or the end of the image signal processing flow.
CN202210335610.1A 2022-04-01 2022-04-01 Convolution calculation method, convolution calculation device and application thereof Active CN114429203B (en)



