CN114429203B - Convolution calculation method, convolution calculation device and application thereof - Google Patents
- Publication number
- CN114429203B CN114429203B CN202210335610.1A CN202210335610A CN114429203B CN 114429203 B CN114429203 B CN 114429203B CN 202210335610 A CN202210335610 A CN 202210335610A CN 114429203 B CN114429203 B CN 114429203B
- Authority
- CN
- China
- Prior art keywords
- convolution
- line
- data
- kernel
- feature map
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/15—Correlation function computation including computation of convolution operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Computational Mathematics (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Mathematical Optimization (AREA)
- Artificial Intelligence (AREA)
- Mathematical Analysis (AREA)
- Pure & Applied Mathematics (AREA)
- Algebra (AREA)
- Neurology (AREA)
- Databases & Information Systems (AREA)
- Complex Calculations (AREA)
- Image Processing (AREA)
Abstract
The invention provides a convolution calculation method, a convolution calculation device and applications thereof. The convolution calculation method comprises a starting working phase, a stable working phase and an ending working phase. In the starting working phase, the input feature map cache unit caches the data of each input feature map line by line, and once the number of cached lines in each input feature map reaches kernel_size lines, the line convolution operation unit starts the convolution operation. In the stable working phase, each time the input feature map cache unit finishes caching a new line of data in each input feature map, the line convolution operation unit performs the convolution operation and outputs the convolution result of a new line of each output feature map. In the ending working phase, the line convolution operation unit completes the calculation of the remaining lines on its own and outputs their convolution results. This convolution calculation method improves the real-time performance of image processing, saves storage resources, and produces no block boundary effect.
Description
Technical Field
The invention relates to the technical field of convolutional neural network and chip design, in particular to a convolutional calculation method, a convolutional calculation device and application thereof.
Background
Image noise reduction has always been a very important function in the field of Image Signal Processing (ISP). It makes an image visually clearer and more pleasing and improves picture quality, which in turn enables better image analysis and understanding; image noise reduction technology is therefore widely applied in fields such as biology, medicine, the military and automatic driving. Current mainstream image noise reduction methods fall into two categories. The first is the traditional image noise reduction method based on a specific prior form; such traditional methods not only use complex models but also contain many parameters that need manual tuning, making the noise reduction process computationally complex. Especially when facing complex scenes such as severe weather, complex lighting and intense motion, traditional ISP-based noise reduction is approaching its bottleneck: the effect becomes harder and harder to tune, and the cost-effectiveness of the algorithm keeps decreasing. The second category is image noise reduction based on deep learning; an AI noise reduction algorithm based on a CNN can address the problems encountered by the traditional methods well.
When facing various complex application scenarios, the AI noise reduction algorithm shows strong adaptability and stability, efficiently removing image noise while effectively retaining real details in the image. Meanwhile, the AI noise reduction algorithm requires almost no parameter tuning, which lays a foundation for large-scale industrial application. In addition, the AI noise reduction algorithm can be updated iteratively, whereas a traditional image noise reduction algorithm, once solidified in the chip, cannot be updated; the AI algorithm therefore offers better flexibility.
DnCNN is a well-known AI noise reduction algorithm based on a CNN. It is a convolutional neural network for denoising, originating from the paper "Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising", which for the first time used an end-to-end residual-learning neural network model for denoising and achieved a better noise reduction effect than previous algorithms. DnCNN can also be applied to tasks such as single-image super-resolution and JPEG deblocking.
As shown in fig. 1, fig. 1 is the network structure diagram of a common DnCNN. The input of the network is an original image with noise, the output is a residual image (i.e. the noise image), and the residual image is subtracted from the original image to obtain the denoised image. The entire network structure can be divided into three parts: the network layer connected to the original image is "Conv + ReLU" (Conv: Convolution; ReLU: Rectified Linear Unit), which generates N output feature maps (there is one input feature map for a grayscale image and three for a color image); the network layer connected to the residual image is "Conv", which reconstructs the output (one output feature map for a grayscale image, three for a color image); the other M intermediate network layers take the form "Conv + BN + ReLU" (BN: Batch Normalization), with batch normalization inserted between the convolution and the ReLU, and the number of input and output feature maps of each intermediate layer is N.
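The three-part structure above can be summarized in a rough sketch. The function name `dncnn_layers` and the concrete layer counts are illustrative, not taken from the patent:

```python
def dncnn_layers(M, N, in_channels):
    """Schematic layer list of the DnCNN structure: one Conv+ReLU layer,
    M Conv+BN+ReLU middle layers, and one reconstructing Conv layer.
    Each entry is (layer type, input maps, output maps)."""
    layers = [("Conv+ReLU", in_channels, N)]      # layer connected to the original image
    layers += [("Conv+BN+ReLU", N, N)] * M        # M intermediate layers
    layers += [("Conv", N, in_channels)]          # layer connected to the residual image
    return layers

# Grayscale image (1 channel), 15 intermediate layers, N = 64 feature maps:
for name, cin, cout in dncnn_layers(15, 64, 1):
    print(name, cin, cout)
```

The denoised image is then obtained outside the network as `original - residual`, as described above.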
At present, CNN-based AI noise reduction is generally performed after ISP image processing; that is, after an entire frame has been processed by the conventional ISP (with or without conventional noise reduction), the AI noise reduction method is applied. This processing order causes a delay of at least one frame, which is unsuitable for delay-sensitive applications such as automatic driving. Meanwhile, as the DnCNN network structure shows, CNN-based AI noise reduction works in a full-resolution-in, full-resolution-out manner, which incurs a huge memory cost for high-resolution scenarios (for example, for a 4K image with 64 output feature maps, the output of the first convolution layer alone occupies about 500 MB of memory). Some published patents use image-blocking techniques to reduce the memory requirement: the CNN operation is performed on each block separately and the results are finally combined. However, there is a difference between running the CNN on each block separately and running it on the whole image; in particular, for applications such as AI noise reduction, an obvious block boundary effect may appear, degrading the overall noise reduction result. These problems undoubtedly limit the large-scale industrial application of CNN-based AI noise reduction algorithms.
Therefore, there is a need for a CNN-based AI noise reduction algorithm that can be combined with conventional ISP processing without adding frame-level delay, and that effectively reduces the memory requirement and saves storage resources without weakening the AI noise reduction effect.
Disclosure of Invention
In order to solve the above technical problem, the present invention provides a convolution calculation method, characterized in that: the convolution calculation method comprises a starting working phase, a stable working phase and an ending working phase. In the starting working phase, the input feature map cache unit caches the data of each input feature map line by line and judges whether the number of cached lines in each input feature map has reached kernel_size lines, where kernel_size is the number of lines of the convolution kernel; when the number of cached lines in each input feature map reaches kernel_size, the line convolution operation unit starts the convolution operation, and the starting working phase ends when each output feature map has output its first line of convolution results. In the stable working phase, each time the input feature map cache unit finishes caching a new line of data in each input feature map, the line convolution operation unit fetches the input feature map data and the corresponding convolution kernel data required by the convolution operation, performs the operation, and steadily outputs the convolution result of a new line of each output feature map; the stable working phase ends when the input feature map cache unit has cached the last line of each input feature map and the convolution calculation of that line is finished. In the ending working phase, the line convolution operation unit completes the calculation of the remaining lines on its own and outputs their convolution results.
The convolution calculation method provided by the invention has the following advantages: (1) convolution calculation can start as soon as kernel_size lines have been cached in each input feature map, without caching the whole input feature map, which enhances the real-time performance of image processing; (2) the input feature map cache unit only needs a cache of capacity (kernel_size + 1) × PIC_W per input feature map rather than the whole feature map, and temporary data caching needs only one line's worth of capacity, with convolution results output line by line; the method therefore greatly reduces the memory required for calculation and saves storage resources; (3) the data participating in the operation is exactly the same as in a conventional whole-frame convolution, so no obvious block boundary effect can occur.
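The storage saving claimed in advantage (2) can be quantified with a short sketch. The figures below are illustrative assumptions (a 4K frame, 64 feature maps, one byte per value, roughly matching the ~500 MB figure in the background section), not values taken from the patent:

```python
# Memory needed to buffer feature maps: whole-frame caching vs. the
# (kernel_size + 1)-line buffer described above. All values are assumptions.
PIC_W, PIC_H = 3840, 2160   # 4K frame: width and height in pixels
N = 64                      # number of feature maps
kernel_size = 3             # number of lines of the convolution kernel
bytes_per_value = 1         # assumed storage per feature map value

full_frame_bytes = PIC_W * PIC_H * N * bytes_per_value
line_buffer_bytes = (kernel_size + 1) * PIC_W * N * bytes_per_value

print(full_frame_bytes / 2**20)   # ~506 MiB for whole-frame caching
print(line_buffer_bytes / 2**10)  # 960 KiB for the line-buffer scheme
```

Under these assumptions the line-buffer scheme needs roughly three orders of magnitude less memory than caching whole frames.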
Preferably, when judging whether the number of cached lines in each input feature map has reached kernel_size, the count includes the lines of PAD data filled above the input feature map data.
Preferably, the number of lines of the PAD data is (kernel_size - 1)/2.
Preferably, if the input feature map is original image data, the input feature map cache unit allocates a cache space of capacity (kernel_size + 1) × PIC_W for each input feature map, where PIC_W is the amount of data per line of each input feature map; if the input feature map is the output feature map of the previous convolution layer, the cache unit allocates a cache space of capacity kernel_size × PIC_W per input feature map. Once kernel_size lines of each input feature map have been cached, the line convolution operation unit starts the line-by-line convolution operation, while the remaining line of cache space continues to receive data of the external input feature map.
The invention also provides a convolution calculation device using the above convolution calculation method, characterized by comprising an input feature map cache unit, a convolution kernel coefficient cache unit, a convolution kernel coefficient reading unit, a line convolution operation unit and a line convolution result write-out unit. The input feature map cache unit caches the line data of each input feature map; the convolution kernel coefficient cache unit caches the convolution kernel coefficients; the convolution kernel coefficient reading unit reads the coefficients participating in the convolution operation from the convolution kernel coefficient cache unit; and the line convolution operation unit reads the data participating in the convolution calculation from the input feature map cache unit and the convolution kernel coefficient reading unit, performs the convolution calculation, and outputs the result to an external storage device through the line convolution result write-out unit.
The invention further provides a DnCNN network calculation method, characterized by comprising a starting working phase, a stable working phase and an ending working phase. In the starting working phase, the original image data input cache unit caches the data of each input feature map of the input image line by line and judges whether the number of cached lines in each input feature map has reached kernel_size lines, where kernel_size is the number of lines of the convolution kernel; when the number of cached lines in each input feature map reaches kernel_size, the line convolution operation unit starts the first convolution layer, while each subsequent neural network layer starts its calculation only after the number of cached lines in each output feature map of its previous layer has reached kernel_size; the starting working phase ends when the first line of residual image data is output. In the stable working phase, each time the original image data input cache unit finishes caching a new line of each input feature map, the line convolution operation unit fetches the input feature map data and the corresponding convolution kernel data required by the convolution operation, runs every neural network layer in sequence, and steadily outputs a new line of residual image data; when the original image data input cache unit has cached the last line of each input feature map, every neural network layer has finished its calculation and the corresponding line of the residual image has been output, the stable working phase ends and the ending working phase begins. In the ending working phase, the line convolution operation unit completes the calculation of the remaining lines on its own and outputs the remaining lines of the residual image.
The DnCNN network calculation method provided by the invention has the following advantages: (1) convolution calculation can start as soon as kernel_size lines have been cached in each input feature map, without caching the whole input feature map, which enhances the real-time performance of image processing; (2) the original image data input cache unit only needs a cache of capacity (kernel_size + 1) × PIC_W per input feature map rather than the whole feature map, and each intermediate output feature map needs only kernel_size × PIC_W of capacity to cache the intermediate result that serves as the input of the next neural network layer; the method therefore greatly reduces the memory required for calculation and saves storage resources; (3) the data participating in the calculation is exactly the same as in a conventional whole-frame convolution, so no obvious block boundary effect can occur; (4) traditional image-signal-processing algorithm modules process data line by line, and this line-based CNN network calculation method also inputs data and outputs results line by line, with an output delay of only Layer_num + kernel_size - 1 - pad_num lines relative to the original image; the method is therefore compatible with traditional image signal processing algorithms and can compute in parallel with them, with a small delay and no frame-level delay.
Preferably, the line convolution operation unit performs calculation of BN and ReLU in addition to the convolution operation.
Preferably, when judging whether the number of cached lines in each input feature map has reached kernel_size, the count includes the lines of PAD data filled above the input feature map data; likewise, when a subsequent neural network layer waits for the number of cached lines in each output feature map of its previous layer to reach kernel_size, the count includes the lines of PAD data filled above the output feature map data.
Preferably, the number of lines of the PAD data is (kernel_size - 1)/2.
Preferably, the original image data input cache unit allocates a cache space of capacity (kernel_size + 1) × PIC_W for each input feature map, where PIC_W is the amount of data per line of each input feature map.
Preferably, the line delay between the first output line of the residual image and the original image input data is start_src_ln_num = Layer_num + kernel_size - 1 - pad_num, where start_src_ln_num is that line delay; Layer_num is the number of layers of the whole DnCNN network, including all Conv + ReLU layers, Conv + BN + ReLU layers and Conv layers; kernel_size is the number of lines of the convolution kernel; and pad_num is the number of PAD lines filled above the input feature map. The DnCNN calculation method of the invention therefore introduces only a small delay.
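The delay formula above can be evaluated directly. The concrete configuration below (a 17-layer network, 3 × 3 kernels, one PAD line) is an illustrative assumption, not a value taken from the patent:

```python
def start_delay_lines(layer_num, kernel_size, pad_num):
    """Line delay between the first output residual-image line and the
    original input, per the formula
    start_src_ln_num = Layer_num + kernel_size - 1 - pad_num."""
    return layer_num + kernel_size - 1 - pad_num

# Assumed DnCNN-style configuration: 17 layers, 3x3 kernels, 1 PAD line.
print(start_delay_lines(17, 3, 1))  # 18 lines of delay
```

For a 2160-line 4K frame, a delay of a few tens of lines is far below one frame, which is the point of advantage (4).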
Correspondingly, the invention also provides a DnCNN network calculation device using the above DnCNN network calculation method, characterized by comprising an original image data input cache unit, a convolution kernel coefficient cache unit, a convolution kernel coefficient reading unit, a line convolution intermediate result reading unit, a line convolution operation unit, a line convolution intermediate result cache unit and a line convolution result write-out unit. The original image data input cache unit caches the line data of each input feature map of the original image; the convolution kernel coefficient cache unit caches the convolution kernel coefficients; the convolution kernel coefficient reading unit reads the coefficients participating in the convolution operation from the coefficient cache unit; the line convolution intermediate result cache unit caches the intermediate results of the DnCNN network calculation; the line convolution operation unit reads the data participating in the convolution operation from the original image data input cache unit or the line convolution intermediate result reading unit, together with the coefficients from the convolution kernel coefficient reading unit, performs the convolution operation and passes the result to the line convolution result write-out unit; and the line convolution result write-out unit outputs intermediate results to the line convolution intermediate result cache unit and transmits the final residual image data to an external memory.
The line-based DnCNN network calculation device provided by the invention has the following advantages: (1) the original image data input cache unit only needs a cache of capacity (kernel_size + 1) × PIC_W per input feature map rather than the whole feature map, and for intermediate results the line convolution intermediate result cache unit only needs to allocate kernel_size × PIC_W of capacity per output feature map, which serves as the input of the next neural network layer; the device therefore greatly reduces the required storage space, enabling miniaturization and integration and favoring industrial application; (2) the data participating in the operation is exactly the same as in a conventional whole-frame convolution, so no obvious block boundary effect can occur; (3) traditional image-signal-processing algorithm modules process data line by line, and this line-based CNN calculation device also inputs data and outputs results line by line, with an output delay of only Layer_num + kernel_size - 1 - pad_num lines relative to the original input image; the device is therefore compatible with traditional image signal processing devices and can compute in parallel with them, with a small delay and no frame-level delay.
Preferably, the line convolution intermediate result cache unit allocates a cache space of kernel_size lines, i.e. of capacity kernel_size × PIC_W, for each intermediate output feature map.
Preferably, the line convolution operation unit performs calculation of BN and ReLU in addition to the convolution operation.
Preferably, the DnCNN network calculation device is placed at one of the following locations: the start, an intermediate position, or the end of the image signal processing flow. Like a conventional image signal processing device, the line-based CNN calculation device of the invention inputs data and outputs results line by line; it can therefore be placed at any position in the image signal processing pipeline, computes in parallel and compatibly with traditional image signal processing algorithms, and has a small delay with no frame-level delay.
Drawings
FIG. 1 is a diagram of a network architecture of a common DnCNN.
FIG. 2 is a diagram illustrating convolutional layer operation.
FIG. 3 is a schematic diagram of the convolution calculation method of the line-by-line processing in the present invention.
Fig. 4 is a diagram showing a convolution calculation apparatus according to the convolution calculation method for line-by-line processing of the present invention.
FIG. 5 shows an embodiment of the line-based DnCNN network operation of the present invention.
FIG. 6 is a schematic diagram of row data flow in each neural network layer in a DnCNN network.
Fig. 7 is a CNN network computing device for line-by-line processing according to the present invention.
Detailed Description
FIG. 2 is a diagram illustrating the convolutional layer operation. The color coding methods of digital images include RGB, YUV, YCbCr and so on. Taking RGB color coding as an example, each pixel of a digital image may be composed of a red sub-pixel, a green sub-pixel and a blue sub-pixel; that is, if the resolution of a digital image is W × H pixels, the image can be represented by N two-dimensional matrices. For example, a digital image may be composed of multiple feature maps T_in_1, T_in_2, …, T_in_N, and the total data volume of the digital image is W × H × N, where W is the feature map width, H is the feature map height, and N is the dimension (or number of channels).
Assume that the digital image needs to undergo a convolution operation with M sets of convolution kernels, i.e. M output feature maps are generated by the convolution calculation. Similarly, each set of convolution kernels can be represented by N two-dimensional matrices, and the data volume of one set of convolution kernel coefficients, i.e. the coefficients needed to compute one output feature map, is w × h × N. When computing M output feature maps, the total data volume of the convolution kernel coefficients is therefore w × h × N × M, where w is the convolution kernel width, h is the convolution kernel height, N is the dimension (or number of channels), and M is the number of sets of convolution kernels.
In a single convolution step, each element of the convolution kernel is multiplied with the currently corresponding element of the feature map and the products are accumulated (a multiply-accumulate, MAC, operation); the kernel then moves by one step (stride) and repeats the operation with the elements it now covers, and so on until the kernel has moved past the last element of the feature map. Note that when the kernel extends beyond the feature map, some kernel elements have no corresponding feature map element, so the range of the feature map must be extended, generally by padding: filling values (e.g. 0) along the boundary of the feature map matrix to enlarge it and keep the convolution valid. In general, the kernel size w × h is odd × odd with w equal to h; denoting the size by k (i.e. a k × k kernel), the padding width is (k - 1)/2, that is, (k - 1)/2 rows of padding values are added above and below the feature map matrix and (k - 1)/2 columns are added to its left and right. With this padding, the output feature map has the same size as the input feature map.
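The padding rule above can be checked with a small sketch. `conv2d_same` is a hypothetical helper name; the sketch assumes an odd square kernel and stride 1:

```python
import numpy as np

def conv2d_same(fmap, kernel):
    """'Same' convolution: zero-pad by (k - 1)/2 on every side so the
    output has the same size as the input. Assumes an odd k x k kernel
    and stride 1; a plain-loop sketch, not an optimized implementation."""
    k = kernel.shape[0]
    pad = (k - 1) // 2
    padded = np.pad(fmap, pad)            # zero padding on all four sides
    H, W = fmap.shape
    out = np.zeros((H, W))
    for i in range(H):                    # slide the kernel with stride 1
        for j in range(W):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return out

fmap = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0            # 3x3 mean filter
out = conv2d_same(fmap, kernel)
print(out.shape)  # (5, 5): same size as the input, as stated above
```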
For example, given a set of input feature maps T_in_1, T_in_2, …, T_in_N and a set of convolution kernels W_{x,y}, where W_{x,y} denotes the convolution kernel between the x-th input feature map and the y-th output feature map, with a convolution stride of 1, a set of output feature maps T_out_1, T_out_2, …, T_out_M can be generated. In short, the convolution operation architecture of fig. 2 implements a convolutional neural network calculation.
FIG. 2 illustrates the convolutional layer calculation with N input feature maps T_in_1, T_in_2, …, T_in_N and M output feature maps T_out_1, T_out_2, …, T_out_M; W_{x,y} denotes the convolution kernel between the x-th input feature map and the y-th output feature map, and a 3 × 3 kernel contains 9 coefficients. In the calculation process, every output feature map requires all input feature maps to participate. Taking T_out_1 as an example, the conventional calculation proceeds as follows: (1) filter T_in_1 with convolution kernel W_{1,1} to obtain temporary feature map data T_out_1_tmp and cache it; (2) filter T_in_2 with convolution kernel W_{2,1} and accumulate the result point-to-point onto the previously cached temporary feature map data T_out_1_tmp, caching the sum as the new T_out_1_tmp; (3) process all remaining input feature map data in the same manner as step (2); the final T_out_1_tmp is then the final T_out_1.
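Steps (1) to (3) can be sketched as a loop over input maps. `filt_same` and `output_map` are hypothetical helper names, and the code is a minimal model of the point-to-point accumulation, not the patent's hardware datapath:

```python
import numpy as np

def filt_same(fmap, kernel):
    # 'same' zero-padded filtering with stride 1 (minimal helper)
    k = kernel.shape[0]
    p = (k - 1) // 2
    padded = np.pad(fmap, p)
    H, W = fmap.shape
    return np.array([[np.sum(padded[i:i + k, j:j + k] * kernel)
                      for j in range(W)] for i in range(H)])

def output_map(inputs, kernels):
    """Compute one output feature map by accumulating, point-to-point,
    the filtered result of every input map, mirroring steps (1)-(3)."""
    acc = np.zeros_like(inputs[0], dtype=float)   # T_out_1_tmp
    for fmap, ker in zip(inputs, kernels):        # one kernel W_{x,1} per input
        acc += filt_same(fmap, ker)               # point-to-point accumulation
    return acc

# Three 4x4 input maps of ones filtered with three all-ones 3x3 kernels:
ins = [np.ones((4, 4)) for _ in range(3)]
kers = [np.ones((3, 3)) for _ in range(3)]
out = output_map(ins, kers)
print(out[1, 1], out[0, 0])  # interior: 9 per map x 3 maps; corner: 4 x 3
```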
According to the above process, the conventional calculation method can only start the convolution calculation after the whole input feature map has been input, and during the calculation the temporary feature map data has the same volume as a whole input feature map. The method therefore requires huge memory resources, especially in high-resolution application scenarios, which limits the large-scale industrial application of CNN-based AI noise reduction.
To solve the above problems of the conventional convolution calculation method, the present invention provides a line-based convolution calculation method, as shown in fig. 3. Taking a convolution with a 3 × 3 kernel and a stride of 1 as an example: once T_in_1, T_in_2, …, T_in_N each hold three lines of data, the corresponding line of T_out_1, T_out_2, …, T_out_M can be computed according to the 3 × 3 filtering scheme. After that, each additional input line allows the next line of T_out_1, T_out_2, …, T_out_M to be computed. The convolution calculation process can be divided into three phases: a starting working phase, a stable working phase and an ending working phase.
In the starting working phase, the input feature map cache unit caches the data of each input feature map line by line and judges whether the number of cached lines in each input feature map has reached kernel_size lines (where kernel_size is the number of lines of the convolution kernel); when kernel_size lines have been cached in each input feature map, the line convolution operation unit starts the convolution operation, and the starting working phase ends when each output feature map has output its first line of convolution results. The cached line count includes the PAD data filled above each input feature map: if the input feature maps are padded during the convolution calculation, the unit judges whether the sum of the cached lines and PAD lines in each input feature map has reached kernel_size, and the line convolution operation unit starts once it has. When caching data, if the input feature map is original image data, a cache BUF of capacity (kernel_size + 1) × PIC_W may be allocated per input feature map, where PIC_W is the amount of data per line of each input feature map. Once kernel_size lines of each input feature map have been cached, the line convolution operation unit starts the line-by-line convolution operation, while the remaining line of cache space continues to receive data of the external input feature map.
When the input feature map caching unit caches data, if the input feature map is the output feature map of a previous convolution layer, a cache BUF with a capacity of kernel_size × PIC_W may be set for each input feature map to receive the convolution output results of the previous layer. This buffer BUF is used cyclically during the stable working stage until all line data in the input feature map have been input.
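The cyclic reuse of the kernel_size-line buffer BUF can be sketched as a simple ring buffer; this is only a behavioral model under the assumptions above, not the hardware design:

```python
class LineRingBuffer:
    """A kernel_size-line buffer reused cyclically: each new line overwrites
    the oldest one, so only kernel_size lines are ever stored."""

    def __init__(self, kernel_size: int, pic_w: int):
        self.lines = [[0] * pic_w for _ in range(kernel_size)]
        self.count = 0  # total lines received so far

    def push(self, line):
        # Overwrite the oldest slot (cyclic reuse of BUF).
        self.lines[self.count % len(self.lines)] = list(line)
        self.count += 1

    def window(self):
        """The last kernel_size lines, oldest first: the rows fed to one
        step of the line convolution."""
        k = len(self.lines)
        return [self.lines[(self.count - k + i) % k] for i in range(k)]
```

After pushing the fourth line of a 3-line buffer, the window holds lines 2, 3 and 4: the first line's storage has already been reclaimed.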
In the stable working stage, after the input feature map caching unit finishes caching a new line of data in each input feature map, the line convolution operation unit acquires the input feature map data and corresponding convolution kernel data required by the convolution operation, performs the convolution operation, and stably outputs a new line of convolution results for each output feature map. When the input feature map caching unit has cached the last line of data of each input feature map and the convolution calculation of that line is complete, the stable working stage ends and the ending working phase begins.
Taking the output feature map T_out_1 in FIG. 3 as an example, in the starting working stage the line convolution operation unit proceeds as follows: (1) obtain the data of the first input feature map T_in_1 participating in the operation (PAD, PIC_1 and PIC_2) and the corresponding convolution kernel W_1,1, and perform the convolution operation to obtain temporary data PIC_1_tmp for the first row PIC_1 of the output feature map T_out_1; (2) obtain the data of the second input feature map T_in_2 participating in the operation (PAD, PIC_1 and PIC_2) and the corresponding convolution kernel W_2,1, perform the convolution operation on them as in step (1), and accumulate the calculated result point-to-point into the temporary data PIC_1_tmp of the first row PIC_1 of the output feature map T_out_1; (3) continue by analogy until all input feature map data have been calculated; the final temporary data PIC_1_tmp is then the final first-row output of T_out_1, and is output through the line convolution result output unit. In the stable working phase, the rows of the other output feature maps are obtained by the same method. It can be seen from the above process that the line convolution operation unit only needs 1 line of data capacity for the temporary buffer PIC_1_tmp, rather than a memory of the same size as the input feature map for intermediate-result buffering as in the traditional convolution calculation method, which further reduces the storage space required by the convolution calculation method provided in the present invention.
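The three steps above amount to accumulating each input feature map's contribution into a single one-line temporary buffer. A minimal NumPy sketch of this accumulation (assuming stride 1, zero padding, and hypothetical array layouts) might look like:

```python
import numpy as np


def conv_output_line(in_maps: np.ndarray, kernels: np.ndarray,
                     row: int, pad: int = 1) -> np.ndarray:
    """Compute one output line of one output feature map.

    in_maps: (N, PIC_H, PIC_W) input feature maps
    kernels: (N, k, k) one convolution kernel per input feature map
    Only a single line of temporary storage (PIC_1_tmp in the text) is used.
    """
    n, _, w = in_maps.shape
    k = kernels.shape[-1]
    padded = np.pad(in_maps, ((0, 0), (pad, pad), (pad, pad)))
    tmp = np.zeros(w)  # the one-line temporary buffer
    for i in range(n):  # steps (1)-(3): accumulate point-to-point per input map
        for dy in range(k):
            for dx in range(k):
                tmp += kernels[i, dy, dx] * padded[i, row + dy, dx:dx + w]
    return tmp
```

With an all-zero kernel except a 1 at the centre, the output line equals the sum of the input maps' corresponding rows, which makes the accumulation easy to check.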
In the ending working phase, the line convolution operation unit completes the calculation of the remaining rows by self-driving and outputs their convolution calculation results. Taking the 3 × 3 convolution kernel in fig. 3 as an example, the last line of data of each output feature map is generated by calculation using lines PIC_H-1 and PIC_H of each input feature map together with the PAD line, and is then output.
The convolution calculation method provided by the present invention has the following advantages: (1) the convolution calculation can start as soon as the cached data in each input feature map reaches kernel_size lines, without caching the whole input feature map, which enhances the real-time performance of image processing; (2) the input feature map caching unit only needs a cache with a capacity of (kernel_size + 1) × PIC_W or kernel_size × PIC_W for each input feature map, and does not need to cache the whole input feature map; for temporary data caching, only 1 row of data capacity is needed, and the convolution calculation results are output by rows; the convolution calculation method therefore greatly reduces the memory required for calculation and saves storage resources; (3) in the calculation process of the method, the data participating in the operation is exactly the same as in a conventional convolution operation over the whole frame picture, so no visible block boundary effect can occur.
Fig. 4 is a diagram showing a convolution calculation apparatus 100 according to the convolution calculation method of the present invention. The convolution calculation device includes an input feature map buffer unit 101, a convolution kernel coefficient reading unit 102, a convolution kernel coefficient buffer unit 103, a line convolution operation unit 104, and a line convolution result writing-out unit 105. The input characteristic diagram caching unit 101 is used for caching line data in each input characteristic diagram; the convolution kernel coefficient buffer unit 103 is used for buffering convolution kernel coefficients; the convolution kernel coefficient reading unit 102 is configured to read a convolution kernel coefficient participating in convolution operation from the convolution kernel coefficient buffering unit 103; the line convolution operation unit 104 is configured to read data involved in convolution calculation from the input feature map buffer unit 101 and the convolution kernel coefficient reading unit 102, perform convolution calculation, and output a line convolution operation result to an external storage device through the line convolution result writing unit 105.
In the process of performing the convolution calculation, in the starting working stage, the input feature map caching unit 101 receives the data input by lines in each input feature map and judges whether the number of cached input feature map lines reaches kernel_size; the convolution kernel coefficient reading unit 102 reads the convolution kernel coefficients required for the calculation from the convolution kernel coefficient buffering unit 103; when the cached data in each input feature map reaches kernel_size lines, the line convolution operation unit 104 starts the convolution operation, and when each output feature map outputs its first line of convolution calculation results, the starting working phase ends. The minimum capacity of the input feature map buffer unit 101 is (kernel_size + 1) × PIC_W × N, where kernel_size is the number of rows of the convolution kernel, PIC_W is the data amount of each row of the input feature map, and N is the number of input feature maps.
If the input feature map is original image data, the input feature map caching unit 101 sets a cache with a capacity of (kernel_size + 1) × PIC_W for each input feature map for caching its data; when the cached data amount reaches kernel_size lines the convolution calculation can be performed, while the remaining line of cache space continues to accept data input from the external input feature map. If the input feature map is the output feature map of the previous convolution layer, the input feature map caching unit 101 sets a cache with a capacity of kernel_size × PIC_W for each input feature map for receiving the convolution output results of the previous layer. In the subsequent stable working stage, the buffer BUF is used cyclically until all line data in the input feature map have been input.
In the stable working stage, after the input feature map buffer unit 101 buffers a new line of data in each input feature map, the line convolution operation unit 104 obtains the input feature map data required by the convolution operation and the corresponding convolution kernel data to perform the convolution operation, and outputs the result of the convolution operation to the line convolution result writing-out unit 105.
In the end working phase, the row convolution operation unit 104 completes the calculation of the remaining rows by self-driving, and after the calculation is completed, the row convolution operation unit outputs the last row output result of each output feature map to the row convolution result writing-out unit 105. The line convolution result writing-out unit 105 can output the data in the output feature map to the external memory by lines in the stationary operation stage.
The convolution calculation method processed by lines can be applied to DnCNN operation, and the whole DnCNN network can be calculated and processed in the same line-by-line manner. Because BN can be merged into the convolution calculation (Conv) in the DnCNN network structure shown in fig. 1, and ReLU is a simple comparison-and-assignment operation, the whole DnCNN network calculation is in essence a continuous calculation of multiple convolution layers.
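The merging of BN into Conv mentioned here is the standard "BN folding" identity; a sketch (not from the patent) for a convolution with weights w of shape (out_c, in_c, kh, kw) and bias b:

```python
import numpy as np


def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm parameters into the preceding convolution.

    BN(conv(x)) = scale * (conv(x) - mean) + beta with scale = gamma/sqrt(var+eps),
    which is itself a convolution with rescaled weights and a shifted bias.
    """
    scale = gamma / np.sqrt(var + eps)          # per-output-channel scale
    w_folded = w * scale[:, None, None, None]   # rescale each output channel
    b_folded = (b - mean) * scale + beta
    return w_folded, b_folded
```

After folding, each "Conv + BN" pair behaves as a single convolution layer, which is why the network reduces to a chain of convolutions.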
As shown in fig. 5, fig. 5 is an embodiment of a line-processed DnCNN network operation according to the present invention. This embodiment simplifies the DnCNN network in order to better describe the DnCNN network calculation method proposed in this patent. In this DnCNN network, the input image is a single gray image (that is, the number of input feature maps is 1), and the image size is PIC_W × PIC_H; the input image is followed by a "Conv + ReLU" layer with 2 output feature maps and a 3 × 3 convolution kernel; the "Conv + ReLU" layer is followed by 2 consecutive "Conv + BN + ReLU" layers, each with 2 input/output feature maps and a 3 × 3 convolution kernel; the second "Conv + BN + ReLU" layer is followed by a "Conv" layer whose number of output feature maps is 1 (that is, the final residual image), with a 3 × 3 convolution kernel. In fig. 5, in order to keep the image size unchanged during the continuous convolution operations, the images during the calculation are padded with PAD rows. The PAD lines in the figure are not real feature data; according to the requirements of the 3 × 3 filter calculation, a PAD line with data value 0 needs to be added above and below each feature map to ensure that the image height remains unchanged after the 3 × 3 filter calculation; the leftmost and rightmost edges of each feature map likewise each need a PAD column with data value 0 to keep the image width unchanged after the 3 × 3 filter calculation (not shown in fig. 5 and not described further, since it is not closely relevant to the scheme in this patent).
As can be seen from fig. 5, when two lines of data SRC_LN1 and SRC_LN2 have been input to the input image, together with the filled first PAD line, the convolution operation of the "Conv + ReLU" layer can be performed, and the first line of data of the output feature maps of the "Conv + ReLU" layer, LAYER1_OFMAP1_LN1 and LAYER1_OFMAP2_LN1, is calculated.
When both output profiles of the "Conv + ReLU" LAYER have two lines of data, LAYER1_ OFMAP1_ LN1、LAYER1_OFMAP1_LN2And LAYER1_ OFMAP2_ LN1、LAYER1_OFMAP2_LN2In combination with the PAD row in the first row of each output feature map in the LAYER, it is possible to perform convolution calculation of the "first Conv + BN + ReLU" LAYER and calculate the first row data LAYER2_ ofamap 1_ LN in the two output feature maps in the "first Conv + BN + ReLU" LAYER1And LAYER2_ OFMAP2_ LN1. At this time, the data that the input image has entered and participated in the calculation is: SRC _ LN1、SRC_LN2And SRC _ LN3。
When both output feature maps of the "first Conv + BN + ReLU" layer have two lines of data, LAYER2_OFMAP1_LN1, LAYER2_OFMAP1_LN2 and LAYER2_OFMAP2_LN1, LAYER2_OFMAP2_LN2, combined with the first-row PAD line of each output feature map in that layer, the convolution calculation of the "second Conv + BN + ReLU" layer can be performed, and the first line of data in its two output feature maps, LAYER3_OFMAP1_LN1 and LAYER3_OFMAP2_LN1, is calculated. At this time, the input image data that has entered and participated in the calculation is: SRC_LN1, SRC_LN2, SRC_LN3 and SRC_LN4.
When both output profiles of the "second Conv + BN + ReLU" LAYER have two lines of data LAYER3_ OFMAP1_ LN1、LAYER3_OFMAP1_LN2And LAYER3_ OFMAP2_ LN1、LAYER3_OFMAP2_LN2In combination with the PAD row of the first row in each output profile in the LAYER, a convolution calculation of the "Conv" LAYER may be performed and the first row data LAYER4_ ofamap 1_ LN in the "Conv" LAYER is calculated1The data is the first line of data in the final residual image. At this time, the data that the input image has entered and participated in the calculation is: SRC _ LN1、SRC_LN2、SRC_LN3、SRC_LN4And SRC _ LN5。
According to the above calculation process, once 5 lines of original image data have been input, the operation of the whole simplified DnCNN network can be carried out and the first line of the residual image obtained; each subsequent line of original image data input then yields one line of residual image data.
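The line-availability bookkeeping behind this start-up process can be sketched with a simple counter model (a hypothetical helper, assuming "same" padding of (kernel_size - 1)/2 rows at the top of every map):

```python
def lines_ready_per_layer(src_lines: int, layer_num: int = 4, kernel_size: int = 3):
    """How many output lines each layer has produced once src_lines input
    lines have arrived, during the starting working phase."""
    pad = (kernel_size - 1) // 2
    ready = src_lines
    produced = []
    for _ in range(layer_num):
        # A layer emits one line per full kernel_size-row window over the
        # top PAD row plus the lines its input layer has produced so far.
        ready = max(0, ready + pad - kernel_size + 1)
        produced.append(ready)
    return produced
```

For the simplified 4-layer network, 5 input lines yield [4, 3, 2, 1]: the final "Conv" layer has produced exactly its first residual line, matching the walk-through above.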
Referring to fig. 6, fig. 6 is a schematic diagram showing the row data flow in each neural network layer in the DnCNN network. Similar to the calculation principle in fig. 5, when the input data lines in a neural network layer plus the PAD rows reach kernel_size rows (3 rows here), the convolution calculation of that layer can start. According to fig. 6, with the simplified DnCNN network, when the input image data reaches 5 lines, the first line of the residual image can be output, and subsequently one line of residual image data can be output for each additional line of original image data input. By cyclically calling the same line-processed convolution operation unit in this way, the complete operation of the DnCNN can be realized. The resulting line delay relative to the original image input data can be calculated by the following formula:
start_src_ln_num = Layer_num + kernel_size - 1 - pad_num;
where start_src_ln_num represents the number of input image lines when the first line of residual image data is output, that is, the line delay between the first line of residual image data and the original image input data; Layer_num represents the number of layers of the whole DnCNN network, including all "Conv + ReLU" layers, "Conv + BN + ReLU" layers and "Conv" layers; kernel_size represents the number of rows of the convolution kernel, e.g., 3 for a 3 × 3 convolution kernel; pad_num represents the number of PAD rows filled above the input feature map; as mentioned above, (kernel_size - 1)/2 rows of padding values are filled above and below the input feature map, so pad_num = (kernel_size - 1)/2, and when kernel_size is 3, pad_num is 1.
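The formula can be checked directly; a small helper using the names from the formula above:

```python
def start_src_ln_num(layer_num: int, kernel_size: int) -> int:
    """Number of input image lines needed before the first residual line appears."""
    pad_num = (kernel_size - 1) // 2  # top padding rows, as defined above
    return layer_num + kernel_size - 1 - pad_num
```

For the simplified 4-layer DnCNN of fig. 5 with a 3 × 3 kernel this gives 4 + 3 - 1 - 1 = 5 lines, in agreement with the walk-through above.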
Similar to the above convolution calculation method, the calculation process of the line-by-line CNN network calculation method can also be divided into three stages: a starting working phase, a stable working phase and an ending working phase.
In the starting working phase, the original image data input buffer unit buffers the data in each input feature map of the input image by lines, and judges whether the number of cached data lines in each input feature map reaches kernel_size lines (kernel_size represents the number of lines of the convolution kernel); when the cached data in each input feature map reaches kernel_size lines, the line convolution operation unit starts the first layer of convolution operation, namely the convolution calculation of the "Conv + ReLU" layer in FIG. 5. The cached data amount includes the PAD data: if the input feature map is padded (Padding) during the convolution calculation, the unit judges whether the sum of the number of cached data lines and the number of PAD lines in each input feature map reaches kernel_size lines, and the line convolution operation unit starts the convolution operation once it does. At this time, only the first layer of convolution, i.e., the convolution operation of the "Conv + ReLU" layer, can be performed. To continue with the calculation of the next "Conv + BN + ReLU" layer, it is necessary to wait until the original image data input buffer unit finishes buffering a new line of data and the number of data lines in each output feature map calculated by the "Conv + ReLU" layer plus the number of PAD lines reaches kernel_size. Similar conditions hold for the subsequent "Conv + BN + ReLU" layers and the "Conv" layer: the calculation of the current neural network layer cannot start until the number of data lines in each output feature map of the previous layer plus the number of PAD lines reaches kernel_size. The starting working phase lasts until the "Conv" layer outputs the first row of residual image data.
When the "Conv" layer outputs the first row of residual image data, the stable working stage begins.
In the stable working stage, after the original image data input buffer unit has cached a new line of data of each input feature map, the line convolution operation unit obtains the input feature map data and corresponding convolution kernel data required by the convolution operation, performs the continuous operation of each neural network layer, and stably outputs a new line of residual image result data. When the original image data input buffer unit has cached the last line of data of each input feature map, the calculation of each neural network layer for that line is finished and the corresponding residual image output data is output, the stable working stage ends and the ending working phase begins.
In the ending working phase, the line convolution operation unit completes the calculation of the remaining lines by self-driving and outputs the remaining lines of the residual image. In this phase there is no more original image input; the calculation process of the line convolution operation unit in the ending phase is described below, taking the simplified DnCNN network in fig. 5 as an example. Step (1): the line convolution operation unit first performs a convolution operation using lines PIC_H-1 and PIC_H of the original image combined with the PAD line to generate the last line of each output feature map in the "Conv + ReLU" layer, and then sequentially completes the calculation of line PIC_H-1 of each output feature map in the "first Conv + BN + ReLU" layer, line PIC_H-2 of each output feature map in the "second Conv + BN + ReLU" layer, and line PIC_H-3 of the residual image in the "Conv" layer. Step (2): the line convolution operation unit performs a convolution operation using lines PIC_H-1 and PIC_H of each output feature map in the "Conv + ReLU" layer combined with the PAD line to generate the last line of each output feature map in the "first Conv + BN + ReLU" layer, and then sequentially completes the calculation of line PIC_H-1 of each output feature map in the "second Conv + BN + ReLU" layer and line PIC_H-2 of the residual image in the "Conv" layer. Step (3): the line convolution operation unit performs a convolution operation using lines PIC_H-1 and PIC_H of each output feature map in the "first Conv + BN + ReLU" layer combined with the PAD line to generate the last line of each output feature map in the "second Conv + BN + ReLU" layer, and then completes the calculation of line PIC_H-1 of the residual image in the "Conv" layer.
Step (4): the line convolution operation unit performs a convolution operation using lines PIC_H-1 and PIC_H of each output feature map in the "second Conv + BN + ReLU" layer combined with the PAD line to complete the calculation of the last line of the residual image in the "Conv" layer.
The line-processed CNN network calculation method provided by the present invention has the following advantages: (1) the convolution calculation can start as soon as the cached data in each input feature map reaches kernel_size lines, without caching the whole input feature map, which enhances the real-time performance of image processing; (2) the original image data input buffer unit only needs a buffer with a capacity of (kernel_size + 1) × PIC_W for each input feature map, and does not need to buffer the whole input feature map; for intermediate-result caching, each output feature map only needs kernel_size × PIC_W of data capacity, which also serves as the input of the next neural network layer; the convolution calculation method therefore greatly reduces the memory required for calculation and saves storage resources; (3) in the calculation process, the data participating in the calculation is exactly the same as in a conventional convolution calculation over the whole frame picture, so no visible block boundary effect can occur; (4) the traditional image signal processing algorithm function modules process by rows, and the line-processed CNN network calculation method likewise inputs data and outputs calculation results by rows, with an output delay of only Layer_num + kernel_size - 1 - pad_num lines relative to the original image; the method can therefore compute in parallel with and compatibly alongside traditional image signal processing algorithms, with small delay and no frame-level delay.
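The memory saving claimed in advantage (2) is easy to quantify; a sketch comparison with hypothetical helpers, counting buffer elements rather than bytes:

```python
def line_mode_elems(kernel_size: int, pic_w: int, n_intermediate_maps: int) -> int:
    """Line-based buffering: a (kernel_size + 1)-line input buffer plus
    kernel_size lines per intermediate output feature map."""
    return (kernel_size + 1) * pic_w + n_intermediate_maps * kernel_size * pic_w


def frame_mode_elems(pic_w: int, pic_h: int, n_intermediate_maps: int) -> int:
    """Conventional whole-frame buffering of the input and every intermediate map."""
    return (1 + n_intermediate_maps) * pic_w * pic_h
```

For a 1920 × 1080 image with a 3 × 3 kernel and 2 feature maps per intermediate layer (6 intermediate maps in the simplified network), line-based buffering needs tens of kilo-elements where frame-based buffering needs millions.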
As shown in fig. 7, the present invention provides a line-processed CNN network computing device 200, which includes an original image data input buffer unit 201, a convolution kernel coefficient buffer unit 203, a convolution kernel coefficient reading unit 202, a line convolution intermediate result reading unit 207, a line convolution operation unit 204, a line convolution intermediate result buffer unit 206, and a line convolution result writing-out unit 205. The original image data input buffer unit 201 is configured to buffer the line data of each input feature map in the original image; the convolution kernel coefficient buffer unit 203 is used to buffer each convolution kernel coefficient; the convolution kernel coefficient reading unit 202 is configured to read the convolution kernel coefficients participating in the convolution operation from the convolution kernel coefficient buffer unit 203; the line convolution intermediate result buffer unit 206 is used to buffer the intermediate results of the CNN network calculation; the line convolution intermediate result reading unit 207 is configured to read line convolution intermediate results from the line convolution intermediate result buffer unit 206; the line convolution operation unit 204 is configured to read the data participating in the convolution operation from the original image data input buffer unit 201 or the line convolution intermediate result reading unit 207, together with the convolution kernel coefficient reading unit 202, perform the convolution operation, and output the obtained results to the line convolution result writing-out unit 205; the line convolution result writing-out unit 205 outputs intermediate results to the line convolution intermediate result buffer unit 206 and sends the final residual image data to the external memory.
The original image data input buffer unit 201 receives the data input by lines in each input feature map, and judges whether the number of buffered input feature map lines reaches kernel_size; when the buffered data in each input feature map reaches kernel_size lines, the original image data input buffer unit notifies the line convolution operation unit to start the line-by-line convolution operation. The original image data input buffer unit 201 internally provides a buffer space with a capacity of (kernel_size + 1) × PIC_W for each input feature map of the original image, where kernel_size is the number of lines of the convolution kernel and PIC_W is the data amount of each line of the input feature map. When the buffered data amount of each input feature map reaches kernel_size lines, the convolution calculation can be carried out, while the remaining line of buffer space continues to receive data input from the external input feature map. The buffer BUF is subsequently used cyclically until all line data of the original image input feature maps have been input.
The convolution kernel coefficient buffer unit 203 is used to buffer all the convolution kernel coefficients of the entire DnCNN network, which are loaded before the device starts and do not change during operation.
The convolution kernel coefficient reading unit 202 is configured to read the convolution kernel coefficients cached in the convolution kernel coefficient caching unit 203 and, following the cyclic calling process of the row convolution operation unit 204, complete the reading of the convolution kernel coefficients of the corresponding neural network layer.
The row convolution operation unit 204 completes the calculation of the whole DnCNN network by rows in a cyclic calling mode. The line convolution operation unit 204 also performs the BN and ReLU calculations in addition to the convolution operation, but since the BN and ReLU calculations do not affect the operation process of the present device, only the convolution operation is described in this patent. In the starting working stage, when the cached data in each input feature map of the original image reaches kernel_size lines, the row convolution operation unit 204 starts the calculation. At this time, only the first layer of convolution, i.e., the convolution operation of the "Conv + ReLU" layer, can be performed. To continue with the calculation of the next "Conv + BN + ReLU" layer, it is necessary to wait until the original image data input buffer unit 201 finishes buffering a new line of data and the number of data lines in each output feature map calculated by the "Conv + ReLU" layer plus the number of PAD lines reaches kernel_size. Similar conditions hold for the subsequent "Conv + BN + ReLU" layers and the "Conv" layer: the operation of the current neural network layer cannot start until the number of data lines in each output feature map of the previous layer plus the number of PAD lines reaches kernel_size. The starting working phase lasts until the "Conv" layer outputs the first row of residual image data. When the "Conv" layer outputs the first row of residual image data, the stable working stage begins.
In the stable working stage, after the original image data input buffer unit 201 has cached a new line of data of each input feature map, the line convolution operation unit 204 obtains the input feature map data and corresponding convolution kernel data required by the convolution operation, performs the continuous operation of each neural network layer, and stably outputs a new line of residual image result data. When the original image data input buffer unit has cached the last line of data of each input feature map, the calculation of each neural network layer for that line is finished and the corresponding residual image output data is output, the stable working stage ends and the ending working phase begins. In the ending working stage, the row convolution operation unit 204 completes the calculation of the remaining rows by self-driving and outputs the remaining rows of the residual image.
The line convolution intermediate result buffer unit 206 is used to buffer the line data of the output feature maps of each intermediate layer of the DnCNN network. Taking the simplified DnCNN network in fig. 5 as an example, the line convolution intermediate result buffer unit 206 mainly buffers the line data of the output feature maps of the "Conv + ReLU" layer, the "first Conv + BN + ReLU" layer, and the "second Conv + BN + ReLU" layer. In order to save buffer space, the data storage space of the intermediate output feature maps is also multiplexed: each output feature map only needs kernel_size × PIC_W of data capacity for intermediate-result buffering, which also serves as the input of the next neural network layer. In the traditional frame-based operation mode, by contrast, the output feature map of each layer needs a buffer space of the same size as the input feature map; especially at large resolutions, the line-based processing method adopted by the present invention greatly saves storage capacity.
The line convolution intermediate result reading unit 207 is configured to read data in the line convolution intermediate result cache unit 206, and complete reading of the corresponding neural network layer characteristic line data according to a loop calling process of the line convolution operation unit 204.
The line-processed CNN network computing device provided in the present invention can be placed anywhere in the overall image signal processing flow: at the beginning, in the middle, or at the end of the flow, without limitation. Because the image processing algorithms of a traditional ISP module process by rows while a traditional AI algorithm generally processes by frames, an AI noise reduction algorithm can start only after the ISP module has finished processing a whole frame of data row by row; that is, the CNN-based AI noise reduction introduced in the background art is generally performed after the ISP image processing, so in that connection mode the output of the AI noise reduction algorithm has at least one frame of delay and is unsuitable for application scenarios with small delay tolerance, such as automatic driving. In contrast, the line-processed CNN network computing device provided in the present invention, because it inputs data and outputs calculation results line by line, can be placed anywhere in the whole image signal processing device, and its output results have no frame-level delay. Therefore, compared with a traditional AI noise reduction device, the noise reduction device provided by the present invention has stronger adaptability and better application prospects.
The line-processed CNN network computing device provided by the present invention has the following advantages: (1) the original image data input buffer unit only needs a buffer with a capacity of (kernel_size + 1) × PIC_W for each input feature map, and does not need to buffer the whole input feature map; for intermediate-result caching, the line convolution intermediate result buffer unit only needs to configure kernel_size × PIC_W of data capacity for each output feature map, which also serves as the input of the next neural network layer; the device therefore greatly reduces the storage space, achieving miniaturization and integration, which is more favorable for industrial application; (2) in the calculation process of the device, the data participating in the operation is exactly the same as in a conventional convolution operation over the whole frame picture, so no visible block boundary effect can occur; (3) the traditional image signal processing algorithm function modules process by rows, and the line-processed CNN network computing device likewise inputs data and outputs calculation results by rows, with an output delay of only Layer_num + kernel_size - 1 - pad_num lines relative to the original input image; the device can therefore compute in parallel with and compatibly alongside traditional image signal processing devices, with small delay and no frame-level delay.
Although the present invention has been described with reference to the preferred embodiments, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (13)
1. A convolution calculation method, characterized in that the convolution calculation method comprises a starting working stage, a stable working stage, and an ending working stage; wherein,
in the starting working stage, the input feature map buffer unit buffers the data of each input feature map line by line and judges whether the number of buffered lines in each input feature map has reached kernel_size lines, where kernel_size denotes the number of rows of the convolution kernel; when the number of buffered lines in each input feature map reaches kernel_size lines, the line convolution operation unit starts the convolution operation, and when each output feature map has output its first line of convolution results, the starting working stage ends;
in the stable working stage, after the input feature map buffer unit finishes buffering a new line of data for each input feature map, the line convolution operation unit fetches the input feature map data and the corresponding convolution kernel data required by the convolution operation, performs the convolution, and stably outputs a new line of convolution results for each output feature map; if the input feature map is original image data, the input feature map buffer unit allocates a buffer of capacity (kernel_size + 1) × PIC_W per input feature map, where PIC_W is the amount of data in each line of each input feature map; if the input feature map is the output feature map of the previous convolution layer, the input feature map buffer unit allocates a buffer of capacity kernel_size × PIC_W per input feature map; when the input feature map buffer unit has buffered the last line of each input feature map and the convolution calculation of that line has finished, the stable working stage ends;
and in the ending working stage, the line convolution operation unit completes the calculation of the remaining lines in a self-driven manner and outputs the convolution results of the remaining lines.
2. The convolution calculation method of claim 1, wherein, when judging whether the number of buffered lines in each input feature map has reached kernel_size lines, the count includes the PAD data lines filled above each input feature map.
3. The convolution calculation method of claim 2, wherein the number of PAD data lines is (kernel_size - 1)/2.
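Claims 1 to 3 together describe a three-phase, line-streaming convolution whose output is meant to match whole-frame "same" convolution exactly. The following pure-Python sketch is our own illustration of that behavior (computing CNN-style cross-correlation, i.e. no kernel flip; the function name and structure are assumptions, not the patented circuit), assuming an odd kernel_size and an image with at least kernel_size rows:

```python
def conv_rows_streaming(image, kernel):
    """Three-phase line-streaming 'same' convolution sketch."""
    k = len(kernel)                   # kernel_size
    pad = (k - 1) // 2                # PAD rows/cols above and below, as in claim 3
    w = len(image[0])
    zero = [0] * (w + 2 * pad)
    out = []
    window = [zero] * pad             # starting stage: PAD rows count as buffered lines

    def emit():
        # Convolve the current kernel_size-row window into one output line.
        row = []
        for x in range(w):
            acc = 0
            for dy in range(k):
                for dx in range(k):
                    acc += window[dy][x + dx] * kernel[dy][dx]
            row.append(acc)
        out.append(row)

    for line in image:                # stable stage: one output line per new input line
        window.append([0] * pad + list(line) + [0] * pad)
        if len(window) == k:
            emit()
            window.pop(0)
    for _ in range(pad):              # ending stage: self-driven remaining lines
        window.append(zero)
        emit()
        window.pop(0)
    return out
```

Because every multiply-accumulate uses exactly the same operands as a whole-frame convolution, the streaming output is numerically identical to frame-at-a-time processing, so no block boundary effect can appear.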
4. A convolution calculation apparatus used in the convolution calculation method according to claim 1, wherein the convolution calculation apparatus comprises an input feature map buffer unit, a convolution kernel coefficient cache unit, a convolution kernel coefficient reading unit, a line convolution operation unit, and a line convolution result writing-out unit, wherein,
the input feature map buffer unit is used for buffering line data of each input feature map;
the convolution kernel coefficient cache unit is used for caching convolution kernel coefficients;
the convolution kernel coefficient reading unit is used for reading the convolution kernel coefficients participating in convolution operation from the convolution kernel coefficient cache unit;
and the line convolution operation unit is used for reading the data participating in the convolution calculation from the input feature map buffer unit and the convolution kernel coefficient reading unit, performing the convolution calculation, and outputting the convolution result to an external storage device through the line convolution result writing-out unit.
5. A DnCNN network computing method, characterized in that the DnCNN network computing method comprises a starting working stage, a stable working stage, and an ending working stage; wherein,
in the starting working stage, the original image data input buffer unit buffers the data of each input feature map of the input image line by line and judges whether the number of buffered lines in each input feature map has reached kernel_size lines, where kernel_size denotes the number of rows of the convolution kernel; when the number of buffered lines in each input feature map reaches kernel_size lines, the line convolution operation unit starts the first layer of convolution; every other neural network layer waits until the number of buffered lines in each output feature map of its previous layer reaches kernel_size lines before starting its own computation; when the first line of residual image data is output, the starting working stage ends;
in the stable working stage, after the original image data input buffer unit buffers a new line of data for each input feature map, the line convolution operation unit fetches the input feature map data and the corresponding convolution kernel data required by the convolution operation, performs the computation of each neural network layer in succession, and stably outputs a new line of residual image result data; the original image data input buffer unit allocates a buffer of capacity (kernel_size + 1) × PIC_W per input feature map, where PIC_W is the amount of data in each line of each input feature map; when the original image data input buffer unit has buffered the last line of each input feature map, the computation of every neural network layer for that line has finished, and the corresponding line of the residual image has been output, the stable working stage ends;
and in the ending working stage, the line convolution operation unit completes the calculation of the remaining lines in a self-driven manner and outputs the remaining lines of the residual image.
6. The DnCNN network computing method of claim 5, wherein the line convolution operation unit performs BN and ReLU calculations in addition to the convolution operation.
7. The DnCNN network computing method of claim 5, wherein, when judging whether the number of buffered lines in each input feature map has reached kernel_size lines, the count includes the PAD data lines filled above each input feature map; likewise, for the other neural network layers, the number of buffered lines in each output feature map of the previous layer that must reach kernel_size lines includes the PAD data lines filled above each output feature map.
8. The DnCNN network computing method of claim 7, wherein the number of PAD data lines is (kernel_size - 1)/2.
9. The DnCNN network computing method of claim 5, wherein the line delay between the first line of the residual image and the original image input data is

start_src_ln_num = Layer_num + kernel_size - 1 - pad_num;

where start_src_ln_num denotes the line delay between the first output line of the residual image and the original image input data; Layer_num denotes the number of layers of the whole DnCNN network, including all Conv+ReLU layers, Conv+BN+ReLU layers, and Conv layers; kernel_size denotes the number of rows of the convolution kernel; and pad_num denotes the number of PAD lines filled above the input feature map.
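As a numeric illustration of the claim-9 formula (the parameter values below are our example assumptions, not figures from the patent): a 17-layer DnCNN with 3×3 kernels and pad_num = 1 yields an 18-line delay, far below one frame.

```python
def start_src_ln_num(layer_num: int, kernel_size: int, pad_num: int) -> int:
    # Claim 9: line delay of the first residual-image line vs. the raw input.
    return layer_num + kernel_size - 1 - pad_num

# Example: 17-layer DnCNN, 3x3 kernels, one PAD row above each feature map.
print(start_src_ln_num(17, 3, 1))  # -> 18 lines of delay
```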
10. A DnCNN network computing device used in the DnCNN network computing method of claim 5, wherein the DnCNN network computing device comprises an original image data input buffer unit, a convolution kernel coefficient cache unit, a convolution kernel coefficient reading unit, a line convolution intermediate result reading unit, a line convolution operation unit, a line convolution intermediate result buffer unit, and a line convolution result writing-out unit; wherein,
the original image data input buffer unit is used for buffering line data of each input feature map of the original image;
the convolution kernel coefficient caching unit is used for caching convolution kernel coefficients;
the convolution kernel coefficient reading unit is used for reading the convolution kernel coefficients participating in convolution operation from the convolution kernel coefficient cache unit;
the line convolution intermediate result buffer unit is used for buffering intermediate results of the DnCNN network computation;
the line convolution intermediate result reading unit is used for reading a line convolution intermediate result from the line convolution intermediate result buffer unit;
the line convolution operation unit is used for reading the data participating in the convolution operation from the original image data input buffer unit or the line convolution intermediate result reading unit, and from the convolution kernel coefficient reading unit, performing the convolution operation, and outputting the obtained result to the line convolution result writing-out unit; and
the line convolution result writing-out unit outputs intermediate results to the line convolution intermediate result buffer unit and transmits the final residual image data to an external memory.
11. The DnCNN network computing device of claim 10, wherein the line convolution intermediate result buffer unit allocates a buffer space of capacity kernel_size × PIC_W for each intermediate output feature map.
12. The DnCNN network computing device of claim 10, wherein the line convolution operation unit performs BN and ReLU calculations in addition to the convolution operation.
13. The DnCNN network computing device of claim 10, wherein the DnCNN network computing device is located at one of: the start of the image signal processing flow, an intermediate position of the image signal processing flow, or the end of the image signal processing flow.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210335610.1A CN114429203B (en) | 2022-04-01 | 2022-04-01 | Convolution calculation method, convolution calculation device and application thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114429203A (en) | 2022-05-03
CN114429203B (en) | 2022-07-01
Family
ID=81314321
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210335610.1A Active CN114429203B (en) | 2022-04-01 | 2022-04-01 | Convolution calculation method, convolution calculation device and application thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114429203B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109934339A (en) * | 2019-03-06 | 2019-06-25 | 东南大学 | A kind of general convolutional neural networks accelerator based on a dimension systolic array |
CN110097174A (en) * | 2019-04-22 | 2019-08-06 | 西安交通大学 | Preferential convolutional neural networks implementation method, system and device are exported based on FPGA and row |
CN110991609A (en) * | 2019-11-27 | 2020-04-10 | 天津大学 | Line buffer for improving data transmission efficiency |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102415508B1 (en) * | 2017-03-28 | 2022-07-01 | 삼성전자주식회사 | Convolutional neural network processing method and apparatus |
US20220012587A1 (en) * | 2020-07-09 | 2022-01-13 | Shanghai Zhaoxin Semiconductor Co., Ltd. | Convolution operation method and convolution operation device |
CN112200300B (en) * | 2020-09-15 | 2024-03-01 | 星宸科技股份有限公司 | Convolutional neural network operation method and device |
CN113807509B (en) * | 2021-09-14 | 2024-03-22 | 绍兴埃瓦科技有限公司 | Neural network acceleration device, method and communication equipment |
CN114169514B (en) * | 2022-02-14 | 2022-05-17 | 浙江芯昇电子技术有限公司 | Convolution hardware acceleration method and convolution hardware acceleration circuit |
Non-Patent Citations (2)
Title |
---|
A convolutional neural network accelerator based on FPGA for buffer optimization;Haotian Wang et al;《IAEAC 2021》;20210405;2362-2367页 * |
FPGA implementation of real-time image convolution computation; Zhang Fan; Electronic Design Engineering; 20210131; Vol. 29, No. 1; pp. 132-137, 142 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |