CN114596229A - Image noise reduction method and system

Info

Publication number: CN114596229A
Application number: CN202210215638.1A
Authority: CN
Inventors: 邢宇; 隋凌志; 王怀亮
Original assignee: Shanghai Yili Health Technology Co ltd
Current assignee: Shanghai Yili Health Technology Co ltd
Priority date: 2022-03-07
Filing date: 2022-03-07
Publication date: 2022-06-07

Abstract

The present disclosure provides an image noise reduction method and a system for performing the same. The image noise reduction method comprises the following steps: selecting a reference block in an original input image, and determining the N candidate blocks in a search window that are most similar to the reference block; performing collaborative filtering with the set of candidate blocks as input, wherein the collaborative filtering comprises a 3D frequency domain transform, filtering, and a 3D inverse frequency domain transform; and taking the spatial domain pixel values obtained by collaboratively filtering each candidate block, together with the weights obtained in the filtering process, as input, and performing aggregation and normalization with the original input image, wherein the calculations in the method use shift and multiply-add operations on fixed-point integers and are carried out in the Bayer domain.

Description

Image noise reduction method and system

Technical Field

The present application relates to an image processing method, and more particularly, to an image denoising method.

Background

In a broad sense, any signal component that is mixed in and causes a difference between the actual output and the desired output may be referred to as "noise". The dominant noise in an imaging system comes from the optics (e.g., ghosting, chromatic aberration), the sensor and photoelectric conversion (e.g., shot noise, moiré, dark current), and the circuitry (e.g., amplifier noise, power switching noise).

With the rapid development of fields such as mobile devices, artificial intelligence, medical care and high technology, new and more varied requirements are being placed on image acquisition and processing. Image denoising algorithms can be classified into three categories: filtering-based methods, model-based methods, and learning-based methods. At present, algorithms other than deep learning (DL) are generally called traditional methods; in practice, both traditional and DL algorithms can achieve good results on image denoising tasks.

Filter-based noise reduction algorithms remove noise in an image by means of artificially designed filters. In general, filter-based noise reduction algorithms can be classified into two categories, namely, spatial domain feature denoising and transform domain denoising, and are respectively processed in an image spatial domain and a transform domain. Common spatial domain noise reduction methods include mean filtering, median filtering, bilateral filtering, and Non-Local Means (NLM) filtering.

Model-based methods mathematically model natural images or their noise, and are generally strongly mathematically motivated. Various models have been used as image priors, including sparse models, gradient models, non-local self-similarity models, and Markov random field models. For example, Low-Rank Matrix Recovery (LRMR), which is commonly used in image restoration, decomposes image features into low-rank components and sparse components: a natural image should be approximately low-rank, while components that are large in amplitude and sparse tend to be noise.

Among the conventional algorithms, BM3D is recognized as one of those with the best noise reduction effect. With the development of processors and algorithms in recent years, a large number of learning-based noise reduction algorithms have appeared. For example, several deep-learning-based noise reduction works from 2009 to 2012 used fully-connected networks of 2 to 3 layers; in 2014-2015, discriminative learning methods such as the Cascade of Shrinkage Fields (CSF) and Trainable Nonlinear Reaction Diffusion (TNRD) appeared, achieving effects similar to BM3D; since 2016, several neural network variants based on VGG (a deep convolutional neural network structure), ResNet (Residual Network), UNet (a U-shaped deep convolutional neural network structure) and the like have been applied to image noise reduction tasks, achieving effects exceeding BM3D.

Although the learning-based convolutional neural network performs well in image noise reduction and image recovery, training of the neural network model requires a large number of training image sample pairs, which cannot cover all image scenes, sensor configurations, circuit parameters. Therefore, limited by the generalization capability of the neural network, in many real noise scenarios, without the targeted fine tuning in such scenarios, the effect of the neural network may not be comparable to that of the conventional algorithm, and how to obtain the data sets (including noise data, Ground Truth (real data), etc.) in different scenarios is also a potential problem. In addition, the large scale of many neural network algorithms (including image size, network depth and parameter size) can make it difficult to meet the requirements of the application on real-time processing. Therefore, for the task of image noise reduction of an unknown scene, BM3D can be preferentially used as a reference algorithm.

Disclosure of Invention

According to an aspect of the present disclosure, there is provided an image noise reduction method, including: selecting a reference block in an original input image, and determining the N candidate blocks in a search window that are most similar to the reference block; performing collaborative filtering with the set of candidate blocks as input, wherein the collaborative filtering comprises a 3D frequency domain transform, filtering, and a 3D inverse frequency domain transform; and taking the spatial domain pixel values obtained by collaboratively filtering each candidate block, together with the weights obtained in the filtering process, as input, and performing aggregation and normalization with the original input image, wherein the calculations in the method use shift and multiply-add operations on fixed-point integers and are carried out in the Bayer domain. In one embodiment, the filtering process includes only a single filtering pass, and the single filtering pass includes hard thresholding.

In one embodiment, the 3D frequency domain transform and the 3D frequency domain inverse transform comprise a two-dimensional cosine transform in integer form.

In one embodiment, the size of the reference block is set to 8x8 and the step size is set to 6.

In one embodiment, the search window size is set to 9x9 and the step size is set to 2.

In one embodiment, determining the N candidate blocks that are most similar to the reference block comprises using the manhattan distance to determine the similarity of the candidate blocks to the reference block.

In a further embodiment, N is 16, and when the number of candidate blocks satisfying the condition is less than 16, the reference block is padded among the set of candidate blocks.

According to another aspect of the present disclosure, a system configured to perform the above method is provided.

In one embodiment, the system comprises at least one of: a central processing unit (CPU), a graphics processing unit (GPU), an image signal processor (ISP) pipeline, a field programmable gate array (FPGA), and other video pipelines.

According to another aspect of the present disclosure, there is provided a non-transitory computer storage medium having stored thereon instructions, which when executed by one or more processors, are configured to perform the above-described method.

Drawings

Fig. 1 shows the processing procedure of the existing BM3D.

Fig. 2 shows the grouping operation in step 1 of the existing BM3D.

Fig. 3 shows a flow diagram of a method according to an embodiment of the present disclosure.

Fig. 4-7 respectively illustrate in detail the specific steps of the method of fig. 3, according to an embodiment of the present disclosure.

Fig. 8A, 8B and 8C illustrate noise reduction effects obtained by processing for three pictures respectively using a method according to an embodiment of the present disclosure.

Fig. 9 shows a simplified method flow diagram according to an embodiment of the present disclosure.

Detailed Description

The following description with reference to the accompanying drawings is provided to facilitate a thorough understanding of various embodiments of the present disclosure as defined by the claims and equivalents thereof. This description includes various specific details to aid understanding, but these should be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the various embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Moreover, descriptions of well-known functions and constructions may be omitted for clarity and conciseness.

The terms and phrases used in the following specification and claims are not limited to their dictionary meanings but are used only by the inventor to enable a clear and consistent understanding of the present disclosure. Accordingly, it should be apparent to those skilled in the art that the following descriptions of the various embodiments of the present disclosure are provided for illustration only and not for the purpose of limiting the disclosure as defined by the appended claims and their equivalents.

The terms "comprises" or "comprising" refer to the presence of the respective disclosed functions, operations, or components that may be used in various embodiments of the present disclosure, and do not limit the presence of one or more additional functions, operations, or features. Furthermore, the terms "comprises" or "comprising" may be interpreted as referring to certain features, integers, steps, operations, elements, components, or groups thereof, but should not be interpreted as excluding the possibility of one or more other features, integers, steps, operations, elements, components, or groups thereof.

The term "or" as used in various embodiments of the present disclosure includes any and all combinations of the listed terms. For example, "A or B" may include A, may include B, or may include both A and B.

Unless otherwise defined, all terms (including technical or scientific terms) used in this disclosure have the same meaning as understood by one of ordinary skill in the art to which this disclosure belongs. General terms, as defined in dictionaries, are to be interpreted as having a meaning that is consistent with their context in the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is further described in detail below by referring to the accompanying drawings and examples.

Some of the terms used are explained first.

FPGA: Field Programmable Gate Array. An FPGA is a semi-custom integrated circuit consisting of an array of Configurable Logic Blocks (CLBs) and programmable interconnects. Using a hardware description language, an algorithm can be quickly compiled and programmed into a combination of circuits, so that the algorithm can be rapidly deployed and accelerated on a hardware platform. FPGAs offer a very high ratio of computation to energy consumption, fast iteration and customizability, and are particularly suitable for current artificial intelligence and data center applications.

Image denoising: image noise usually appears as pixels or pixel blocks that cause poor visual effects in an image. It arises throughout the acquisition, transmission, compression and display of the image, and takes various forms such as salt-and-pepper noise and Gaussian noise. Image denoising is a necessary step for improving image clarity and perceived quality; it is generally performed before more advanced image processing algorithms and is a foundation of image processing.

BM3D: Block-Matching and 3D Filtering, an image noise reduction algorithm based on block matching and collaborative filtering in a sparse three-dimensional transform domain. The algorithm was published by Kostadin Dabov et al. in IEEE Transactions on Image Processing in 2007. BM3D consists of three steps: similar-block grouping (Grouping), collaborative filtering (Collaborative Filtering), and aggregation (Aggregation). Similar-block grouping selects a reference block of a certain size in the noisy image, searches a suitably sized neighborhood around it for the blocks that differ least from the reference block, and stacks these blocks into a three-dimensional matrix. The reference block is slid across the whole image and grouping is repeated, producing many three-dimensional matrices. In collaborative filtering, each three-dimensional matrix is filtered: each two-dimensional block in the matrix is transformed from the spatial domain to the frequency domain by a wavelet transform or Discrete Cosine Transform (DCT), the third dimension is then transformed, usually by a Hadamard transform, and hard thresholding is applied to the transformed three-dimensional matrix. The filtered spatial domain values are then restored by the inverse transforms corresponding to the forward ones (Hadamard transform and DCT), and a weight is calculated from the count of non-zero components. In the final aggregation, at each image position, the spatial domain results recovered after collaborative filtering of the similar blocks obtained from different reference blocks are multiplied by their respective weights, accumulated and normalized, and then fused into the corresponding position of the original image, thereby reducing the noise intensity in the original image. BM3D is recognized as one of the most effective conventional algorithms in the field of image noise reduction.

PSNR: Peak Signal-to-Noise Ratio, one of the commonly used criteria for evaluating image quality. Given a clean image F and a noisy image K, each of width w and height h, the Mean Squared Error (MSE) and the peak signal-to-noise ratio are as follows (PSNR is in dB, and MAX is the maximum possible pixel value of the picture):

MSE = (1 / (w * h)) * Σ_i Σ_j (F(i, j) - K(i, j))^2

PSNR = 10 * log10(MAX^2 / MSE)
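As a minimal sketch of the standard MSE and PSNR definitions, the following pure-Python functions operate on images stored as lists of rows. The function names and the default `max_value=255` (an 8-bit image) are assumptions made for illustration and do not come from the disclosure.

```python
import math


def mse(clean, noisy):
    """Mean squared error between two equally sized images (lists of rows)."""
    h = len(clean)
    w = len(clean[0])
    total = 0
    for row_f, row_k in zip(clean, noisy):
        for f, k in zip(row_f, row_k):
            total += (f - k) ** 2
    return total / (w * h)


def psnr(clean, noisy, max_value=255):
    """PSNR in dB. max_value is the largest possible pixel value (assumed 8-bit)."""
    err = mse(clean, noisy)
    if err == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / err)


clean = [[100, 100], [100, 100]]
noisy = [[101, 99], [100, 102]]
print(mse(clean, noisy))   # squared errors 1 + 1 + 0 + 4 over 4 pixels
print(psnr(clean, noisy))
```

A higher PSNR means the noisy image is closer to the clean one; for raw sensor data with a larger bit depth, `max_value` would be set accordingly.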

SSIM: Structural Similarity. Structural similarity is another index for measuring how similar two digital images are; compared with PSNR, SSIM agrees better with image quality as judged by the human eye. SSIM consists of three parts, namely luminance comparison, contrast comparison and structure comparison, where C1, C2 and C3 are constants, μ is the mean gray level, σ is the standard deviation of the gray level, and σxy is the covariance of x and y:

l(x, y) = (2 * μx * μy + C1) / (μx^2 + μy^2 + C1)

c(x, y) = (2 * σx * σy + C2) / (σx^2 + σy^2 + C2)

s(x, y) = (σxy + C3) / (σx * σy + C3)

The final SSIM index is:

SSIM(x, y) = l(x, y) * c(x, y) * s(x, y).
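The product of the three components can be sketched as below for two flattened pixel sequences, computed over a single global window. The constants are not given in the text; the values used here (K1 = 0.01, K2 = 0.03, C3 = C2 / 2) are the commonly used defaults and are assumptions, as is the function name `ssim_global`.

```python
import math


def ssim_global(x, y, max_value=255, k1=0.01, k2=0.03):
    """Single-window SSIM of two equal-length pixel sequences.

    C1, C2, C3 follow the commonly used defaults (an assumption here).
    """
    n = len(x)
    c1 = (k1 * max_value) ** 2
    c2 = (k2 * max_value) ** 2
    c3 = c2 / 2

    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    sd_x = math.sqrt(var_x)
    sd_y = math.sqrt(var_y)

    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sd_x * sd_y + c2) / (var_x + var_y + c2)
    structure = (cov_xy + c3) / (sd_x * sd_y + c3)
    return luminance * contrast * structure


patch = [10, 20, 30, 40]
print(ssim_global(patch, patch))        # identical inputs give the maximum score
print(ssim_global(patch, patch[::-1]))  # reversed structure scores lower
```

Practical SSIM implementations usually average this quantity over many local windows; the single-window form is used here only to keep the sketch short.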

CFA: Color Filter Array. An image sensor is essentially a monochrome sensor that responds to light within its sensitive wavelength range. Therefore, to recover a color image, a color separation technique is required, typically an on-chip color filter array (CFA) placed over the photodiodes. The most common primary color filter pattern is the Bayer pattern, in which there are twice as many green filters as blue or red ones, because human vision derives visual detail primarily from the green part of the spectrum.

One of the objectives of the present disclosure is to improve BM3D, a conventional filter-based image denoising algorithm. BM3D adds a frequency domain filtering operation on top of non-local means, and uses frequency domain information to screen out similar pixel values that clearly do not meet the requirements, thereby alleviating the blurring problem of spatial filtering algorithms. It is recognized as one of the best conventional image noise reduction algorithms at present.

Fig. 1 shows the processing procedure of the conventional BM3D. BM3D is divided into two steps, step 1 and step 2, each of which consists of three operations: grouping, collaborative filtering, and aggregation.

The first operation is grouping, described in detail below with reference to Fig. 2. From the input picture, a k × k image block is taken in turn, at a certain step size, as the reference block (dotted-line block). For each reference block, the N candidate blocks (dotted-line blocks) most similar to it are searched for in an n × n window (colored frame) centered on the reference block. The N candidate blocks form a group, which is a three-dimensional matrix of size k × k × N and is the input to the collaborative filtering.

The collaborative filtering operation consists of three steps: 3D frequency domain transform, filtering, and 3D inverse frequency domain transform. The input is a k × k × N three-dimensional matrix block composed of the N candidate blocks. The 3D frequency domain transform first performs a 2D DCT (Discrete Cosine Transform) or wavelet transform on each k × k candidate block, and then performs a Hadamard transform along the third dimension. In total, N DCTs of size k × k and k Hadamard transforms of size N × k are performed, yielding a frequency domain three-dimensional matrix block of the same size (k × k × N). The filtering step presets a threshold for the frequency components, and components below the threshold are set to 0 for filtering. The number of components above the threshold is counted, and through a certain operation this count serves as the basis for assigning a weight to the pixel values in the group; the weight is used in the aggregation step. After filtering, the 3D inverse frequency domain transform is performed to obtain the filtered spatial domain pixel values, which, together with the weights obtained during filtering, are the input to the aggregation. The aggregation operation accumulates and normalizes the collaborative filtering results of the overlapping parts according to their respective weights to obtain the final output pixel values. Step 1 and step 2 of BM3D (as shown in Fig. 1) differ in that the grouping process of step 2 considers both the noisy image and the noise reduction result of step 1, and the filtering operation changes from thresholding to Wiener filtering.
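The third-dimension part of the collaborative filtering described above can be sketched as follows for a single frequency position across the N stacked blocks. This is an illustrative sketch only: the function names are hypothetical, the transform is left unnormalized, and comparing coefficient magnitudes against the threshold is an assumption about how "below the threshold" is evaluated.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform; length must be a power of 2."""
    out = list(values)
    n = len(out)
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                a, b = out[i], out[i + half]
                out[i], out[i + half] = a + b, a - b
        half *= 2
    return out


def hard_threshold(coeffs, threshold):
    """Zero coefficients whose magnitude does not exceed the threshold.

    Returns the kept coefficients and the count of non-zero survivors,
    which later feeds the aggregation weight.
    """
    kept = [c if abs(c) > threshold else 0 for c in coeffs]
    count = sum(1 for c in kept if c != 0)
    return kept, count


def filter_stack(stack, threshold):
    """Filter the N values found at one frequency position of N stacked blocks."""
    n = len(stack)
    spectrum = fwht(stack)
    kept, count = hard_threshold(spectrum, threshold)
    # Applying the unnormalized transform twice scales by N, so divide it out.
    restored = [v // n for v in fwht(kept)]
    return restored, count


print(filter_stack([5, 6, 5, 4], threshold=3))
```

In the example call, the small fluctuations around 5 end up in low-magnitude Hadamard coefficients, which the threshold removes, so the restored stack is flat.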

However, because BM3D has very high computational complexity, it is difficult to run the algorithm in real time on a hardware platform, and the problem becomes more challenging when the resolution of the input picture is high. Many acceleration schemes for BM3D exist. One scheme (referred to herein as scheme one) uses OpenMP to provide a CPU-based multithreaded acceleration method. Another scheme (referred to herein as scheme two) uses the OpenCL and CUDA frameworks to provide a first GPU-based acceleration method, improving performance by 7.5 times compared with the CPU multithreaded method. Another scheme (referred to herein as scheme three) optimizes the data caching and sharing mechanism for the block matching operation, further improving performance. Another scheme (referred to herein as scheme four) mainly reduces the extra memory overhead. Nevertheless, a CPU has difficulty meeting the real-time requirement of the algorithm, and a GPU is unsuitable for embedded application scenarios such as endoscopes and surgical robots because of its high cost and energy consumption.

In addition, compared with general-purpose processors such as CPUs (central processing units) and GPUs (graphics processing units), an FPGA is customizable and can map an algorithm directly to hardware, so that hardware resources are fully utilized, giving a higher ratio of computation to energy consumption and a lower cost. In another scheme (referred to herein as scheme five), an FPGA-based BM3D accelerator is designed, which improves performance by 12 times compared with the GPU implementation of scheme two. Table 1 below compares these implementations: the GPU-OpenCL method comes from scheme two, the GPU-CUDA method from scheme three, and the FPGA-OpenCL method from scheme five. The GPU implementations run on an Intel i7-6700K CPU with an NVIDIA Titan Xp GPU with 12 GB of video memory, and the FPGA implementation runs on an Intel Arria 10 GX1150 FPGA development board with 8 GB of DDR3 SDRAM.

TABLE 1

The existing methods can only barely meet the real-time processing requirement for small images, and are far from meeting it for 1080P/4K video.

The present disclosure performs software-hardware co-optimization for BM3D and, while preserving the noise reduction effect, achieves real-time noise reduction of 1080P/4K video at 60 fps (frames per second), for example on an FPGA (field programmable gate array) hardware platform.

First, optimization is carried out at the software level. Because the image noise level is not high in the application scenario of the present disclosure, testing showed that the Wiener filtering used in step 2 of BM3D brings almost no improvement in the perceived noise reduction effect, shows no overall upward trend in PSNR, and even produces side effects in a number of test scenes. Therefore, the present disclosure retains only step 1 of BM3D and omits step 2, thereby greatly reducing the amount of calculation while largely preserving the noise reduction effect.

Second, owing to the characteristics of the BM3D algorithm, a great deal of redundant calculation remains even if only step 1 is used, and the choice of the following parameters affects not only the effect of the algorithm but also, to a great extent, the amount of calculation: 1. the size and step size of the reference block; 2. the range searched for blocks similar to the reference block, and the search step size; 3. the number of candidate blocks most similar to the reference block that are selected; and 4. the transform from the spatial domain to the frequency domain.

Therefore, in the present disclosure, the algorithm is selected and its parameters optimized for the application. To further reduce the computational load while preserving the noise reduction effect as far as possible, the parameters of the present disclosure are set as follows:

BM3D noise reduction is carried out in the Bayer domain, where it has a noise reduction effect similar to that in the RGB or YUV domain;

the reference block size is 8x8, with a step size of 6;

the search window for block matching is 9x9, with a step size of 2;

candidate blocks are searched using the Manhattan distance instead of the L2 norm, and the 16 blocks most similar to the reference block are selected; if fewer than 16 blocks have a distance below the threshold, the remaining slots of the candidate block set are filled with the reference block;

the spatial domain to frequency domain transform uses the discrete cosine transform (DCT), which is more hardware friendly.
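The parameter choices listed above can be illustrated with a small block matching sketch. It is not the disclosed hardware design: the function names are hypothetical, the 9x9 window is interpreted here as offsets from -4 to +4 around the reference block position, and candidates that would fall outside the image are simply skipped rather than padded.

```python
def manhattan(img, y0, x0, y1, x1, k=8):
    """Sum of absolute differences between two k x k blocks."""
    return sum(
        abs(img[y0 + dy][x0 + dx] - img[y1 + dy][x1 + dx])
        for dy in range(k)
        for dx in range(k)
    )


def match_blocks(img, ref_y, ref_x, k=8, radius=4, step=2,
                 n_keep=16, threshold=None):
    """Return n_keep block positions most similar to the reference block.

    Positions are (row, col) of each block's top-left corner. If fewer than
    n_keep candidates pass the threshold, the reference block fills the rest.
    """
    h, w = len(img), len(img[0])
    found = []
    for oy in range(-radius, radius + 1, step):
        for ox in range(-radius, radius + 1, step):
            y, x = ref_y + oy, ref_x + ox
            if 0 <= y <= h - k and 0 <= x <= w - k:
                d = manhattan(img, ref_y, ref_x, y, x, k)
                if threshold is None or d < threshold:
                    found.append((d, y, x))
    found.sort()  # smallest distance first
    coords = [(y, x) for _, y, x in found[:n_keep]]
    coords += [(ref_y, ref_x)] * (n_keep - len(coords))
    return coords


ramp = [[x + 16 * y for x in range(16)] for y in range(16)]
group = match_blocks(ramp, 4, 4, threshold=100)
print(group[:3])
```

On the ramp image every shifted block differs from the reference by at least 128, so only the reference passes the threshold of 100 and the group is filled with 16 copies of it.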

In addition, common BM3D implementations use 32-bit or 64-bit floating-point numbers. Floating-point operations occupy a large amount of computing resources, and the FPGA development board used in the present disclosure has no floating-point unit. Therefore, in the present disclosure the algorithm is quantized: all calculations in the algorithm are converted from floating-point to fixed-point numbers, which greatly reduces the amount of calculation and saves hardware resources. The floating-point operations of BM3D are concentrated mainly in the 2-dimensional discrete cosine transform and in the aggregation coefficients used in the multiply-accumulate of the aggregation process, and quantization is implemented for each of these two aspects.

The 2D discrete cosine transform uses a butterfly calculation and is performed as two passes of a 1D integer discrete cosine transform. Denote the coefficient matrix as M; the coefficient matrix is:

Let the input matrix be X, and let I be an 8x8 diagonal matrix whose diagonal values are

The DCT2D process is as follows, where @ denotes matrix multiplication and .T denotes the matrix transpose:

DCT2D(X) = I@M@X@(I@M).T

Because I is a diagonal matrix, this equation can be rewritten as:

DCT2D(X) = (M@X@M.T)·E

where · denotes the element-wise product and E is a compensation matrix whose entry E(i, j) is the product of the i-th and j-th diagonal values of I. E is quantized to an 8-bit integer matrix, and the multiplication of the input X by M and M.T is converted into the following calculation:

a[0] = in[0] + in[7];
a[1] = in[1] + in[6];
a[2] = in[2] + in[5];
a[3] = in[3] + in[4];
b[0] = a[0] + a[3];
b[1] = a[1] + a[2];
b[2] = a[0] - a[3];
b[3] = a[1] - a[2];
a[4] = in[0] - in[7];
a[5] = in[1] - in[6];
a[6] = in[2] - in[5];
a[7] = in[3] - in[4];
b[4] = a[5] + a[6] + ((a[4] >> 1) + a[4]);
b[5] = a[4] - a[7] - ((a[6] >> 1) + a[6]);
b[6] = a[4] + a[7] - ((a[5] >> 1) + a[5]);
b[7] = a[5] - a[6] + ((a[7] >> 1) + a[7]);
out[0] = b[0] + b[1];
out[2] = b[2] + (b[3] >> 1);
out[4] = b[0] - b[1];
out[6] = (b[2] >> 1) - b[3];
out[1] = b[4] + (b[7] >> 2);
out[3] = b[5] + (b[6] >> 2);
out[5] = b[6] - (b[5] >> 2);
out[7] = -b[7] + (b[4] >> 2);
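The butterfly listed above can be transcribed directly into a runnable form. The sketch below wraps it in a function and applies it first to the rows and then to the columns of an 8x8 block, which is one way to realize the two 1D passes; the function names and the row-then-column order are assumptions, and the compensation by E is not included.

```python
def int_dct8(inp):
    """One 1D pass of the 8-point integer butterfly, using only adds and shifts."""
    a = [0] * 8
    b = [0] * 8
    out = [0] * 8
    a[0] = inp[0] + inp[7]
    a[1] = inp[1] + inp[6]
    a[2] = inp[2] + inp[5]
    a[3] = inp[3] + inp[4]
    b[0] = a[0] + a[3]
    b[1] = a[1] + a[2]
    b[2] = a[0] - a[3]
    b[3] = a[1] - a[2]
    a[4] = inp[0] - inp[7]
    a[5] = inp[1] - inp[6]
    a[6] = inp[2] - inp[5]
    a[7] = inp[3] - inp[4]
    b[4] = a[5] + a[6] + ((a[4] >> 1) + a[4])
    b[5] = a[4] - a[7] - ((a[6] >> 1) + a[6])
    b[6] = a[4] + a[7] - ((a[5] >> 1) + a[5])
    b[7] = a[5] - a[6] + ((a[7] >> 1) + a[7])
    out[0] = b[0] + b[1]
    out[2] = b[2] + (b[3] >> 1)
    out[4] = b[0] - b[1]
    out[6] = (b[2] >> 1) - b[3]
    out[1] = b[4] + (b[7] >> 2)
    out[3] = b[5] + (b[6] >> 2)
    out[5] = b[6] - (b[5] >> 2)
    out[7] = -b[7] + (b[4] >> 2)
    return out


def int_dct8x8(block):
    """Apply the 1D butterfly to every row, then to every column."""
    rows = [int_dct8(r) for r in block]
    cols = [int_dct8([rows[r][c] for r in range(8)]) for c in range(8)]
    # cols[c][u] is the coefficient at frequency row u, column c.
    return [[cols[c][u] for c in range(8)] for u in range(8)]


flat_block = [[1] * 8 for _ in range(8)]
coeffs = int_dct8x8(flat_block)
print(coeffs[0][0])  # a flat block has only a DC coefficient
```

Python's `>>` on negative integers is an arithmetic (flooring) shift, which matches the usual behavior of signed shifts in hardware and in most C compilers.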

The inverse DCT2D is handled in the same way. Let the input to the inverse DCT be Y:

IDCT2D(Y) = (I@M).T@Y@(I@M)

Because I is a diagonal matrix, the above equation can be rewritten as:

IDCT2D(Y) = M.T@(Y·E)@M

In this way, the floating-point two-dimensional cosine transform is converted into a two-dimensional cosine transform in integer form. Only integer shifts and multiply-add operations are involved in the calculation, which greatly reduces the consumption of hardware resources.

The final aggregation process of BM3D can be described by the following equation:

where I is the value produced by the inverse DCT2D, whose bit width matches that of the algorithm's input image, and ω is determined by the number (count) of values greater than the threshold after the Hadamard transform. In the original implementation, ω is calculated as follows:

The present disclosure likewise quantizes ω to an 8-bit unsigned number. At this point, at the algorithm level, all floating-point operations in BM3D have been converted to fixed-point operations.
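The step of pulling the diagonal matrix out of the forward and inverse transforms can be checked numerically. The sketch below uses small hypothetical 2x2 stand-ins for M, the diagonal of I, X and Y (the actual 8x8 values are not reproduced here) and assumes the compensation matrix has entries E(i, j) = d[i] * d[j], where d is the diagonal of I.

```python
def matmul(a, b):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def elementwise(a, b):
    """Element-wise product of two equally sized matrices."""
    return [[p * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical stand-ins, chosen only to keep the arithmetic exact.
M = [[1, 1], [1, -1]]
d = [2, 3]                      # diagonal of I
X = [[5, 7], [11, 13]]
Y = [[1, 2], [3, 4]]

I = [[d[i] if i == j else 0 for j in range(2)] for i in range(2)]
E = [[d[i] * d[j] for j in range(2)] for i in range(2)]
IM = matmul(I, M)

# Forward: (I@M) @ X @ (I@M).T  versus  (M @ X @ M.T) · E
fwd_lhs = matmul(matmul(IM, X), transpose(IM))
fwd_rhs = elementwise(matmul(matmul(M, X), transpose(M)), E)

# Inverse: (I@M).T @ Y @ (I@M)  versus  M.T @ (Y · E) @ M
inv_lhs = matmul(matmul(transpose(IM), Y), IM)
inv_rhs = matmul(matmul(transpose(M), elementwise(Y, E)), M)

print(fwd_lhs == fwd_rhs, inv_lhs == inv_rhs)
```

Because the scaling is isolated in E, the matrix products with M can stay in pure integer arithmetic, and only the element-wise multiplication by the quantized E carries the scale factors.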

In one embodiment, the present disclosure may implement the fast BM3D algorithm described above on an FPGA, achieving real-time BM3D noise reduction at 60 fps for input images of 4K resolution and below. In other embodiments, the present disclosure may also be implemented in modules, systems or devices other than FPGAs, such as CPUs and GPUs. Alternatively, the methods and hardware implementations of the present disclosure may be embedded in an image pipeline, such as an Image Signal Processor (ISP) pipeline or another video pipeline, to perform the filtering function. Moreover, a module, system or device implementing the method of the present disclosure can keep intermediate image processing results in on-chip storage, avoiding large data exchanges with the off-chip storage system. The overall hardware processing flow is shown in Fig. 3.

A detailed description will now be given using one embodiment.

p1. In the present disclosure, data flows row by row: an entire row of data must be read in before the next row is read. As shown in Fig. 4, at least 16 rows of data must be buffered before the search for candidate blocks can begin. The line buffer size used is k_buf = 16 × img_w.
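The row-wise buffering can be modeled in software with a fixed-depth queue. This is only a behavioral sketch of the idea, not the on-chip design; the class name `LineBuffer` and its methods are hypothetical.

```python
from collections import deque


class LineBuffer:
    """Rolling buffer holding the most recent `rows` image rows."""

    def __init__(self, img_w, rows=16):
        self.img_w = img_w
        self.rows = deque(maxlen=rows)  # the oldest row drops out automatically

    @property
    def capacity(self):
        """Buffer size in pixels: rows x img_w."""
        return self.rows.maxlen * self.img_w

    def push(self, row):
        if len(row) != self.img_w:
            raise ValueError("row width does not match img_w")
        self.rows.append(list(row))

    def ready(self):
        """True once enough rows are buffered to start block matching."""
        return len(self.rows) == self.rows.maxlen


buf = LineBuffer(img_w=4)
for r in range(17):
    buf.push([r] * 4)
print(buf.ready(), buf.capacity, buf.rows[0])
```

After 17 rows have been pushed, the buffer still holds 16 rows and the oldest remaining one is row 1, which mirrors how a streaming design discards rows that are no longer needed.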

p2. Next is the block matching stage of BM3D, shown in Fig. 4. The reference block is the blue block in the middle, the colored squares represent the area searched for candidate blocks, and the green squares are the candidate blocks most similar to the reference block. The search area is 9x9. Because the present disclosure performs noise reduction in the Bayer domain, the search step size is set to 2 so that the color channels of the blocks correspond.
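Why a step of 2 keeps the channels aligned can be seen from a small sketch. It assumes an RGGB layout for the 2x2 Bayer tile and interprets the 9x9 search area as offsets from -4 to +4; both are assumptions for illustration, and the helper names are hypothetical.

```python
def cfa_color(y, x, pattern=("R", "G", "G", "B")):
    """Color of the filter at pixel (y, x) for a 2x2 Bayer tile (RGGB assumed)."""
    return pattern[(y % 2) * 2 + (x % 2)]


def search_offsets(radius=4, step=2):
    """Candidate offsets inside the search area, visited with the given step."""
    return [(oy, ox)
            for oy in range(-radius, radius + 1, step)
            for ox in range(-radius, radius + 1, step)]


ref = (10, 7)
offsets = search_offsets()
same = all(cfa_color(ref[0] + oy, ref[1] + ox) == cfa_color(*ref)
           for oy, ox in offsets)
print(len(offsets), same)
```

With even offsets, every pixel of a candidate block sits on the same filter color as the corresponding pixel of the reference block, so the Manhattan distance compares like with like; an odd offset would compare, for example, a green sample with a red one.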

In particular, for block matching at the edge of an image, the search area falls outside the original image, so the original image must be padded, usually with constant, mirror (reflect) or symmetric padding. In the present disclosure, the region outside the original image is filled with the constant 0x80.
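Constant padding of this kind can be sketched as follows. The function name `pad_constant` and the `margin` parameter are hypothetical; only the fill value 0x80 comes from the text.

```python
def pad_constant(img, margin, value=0x80):
    """Surround an image (list of rows) with `margin` pixels of a constant value."""
    width = len(img[0]) + 2 * margin
    top = [[value] * width for _ in range(margin)]
    bottom = [[value] * width for _ in range(margin)]
    body = [[value] * margin + list(row) + [value] * margin for row in img]
    return top + body + bottom


padded = pad_constant([[1, 2], [3, 4]], margin=1)
for row in padded:
    print(row)
```

A hardware implementation would more likely substitute the constant on the fly when a read address falls outside the image rather than materialize a padded copy; the explicit copy here just makes the result easy to inspect.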

p3. During block matching, the candidate blocks whose Manhattan distance to the reference block is less than the set threshold are selected, and the first 16 of them are kept; if fewer than 16 candidate blocks are below the threshold, the reference block is used to fill the candidate set. An odd-even sorting algorithm is used here and implemented on the FPGA. After the odd-even sorting, the 16 candidate blocks are stacked into one (8 x 8 x 16) three-dimensional pixel block and cached on chip. The two-dimensional discrete cosine transform (DCT2D) is performed on each (8 x 8) pixel block to convert it to the frequency domain, and then the Hadamard transform and hard threshold filtering are performed on the three-dimensional pixel block. This process is shown in Fig. 5.
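The selection step can be sketched with an odd-even transposition sort, one common form of odd-even sorting whose fixed compare-and-swap pattern maps well to hardware. Whether the disclosed FPGA design uses this exact variant is not stated, so treat it as an assumption; the function names are hypothetical.

```python
def odd_even_sort(items):
    """Odd-even transposition sort: n phases of neighbor compare-and-swap."""
    data = list(items)
    n = len(data)
    for phase in range(n):
        start = phase % 2  # even phases pair (0,1),(2,3)...; odd phases (1,2),(3,4)...
        for i in range(start, n - 1, 2):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data


def select_candidates(distances, threshold, n_keep=16, reference_index=0):
    """Keep the n_keep closest candidates below the threshold.

    Returns candidate indices; missing slots are filled with the reference.
    """
    passing = [(d, i) for i, d in enumerate(distances) if d < threshold]
    ranked = odd_even_sort(passing)
    chosen = [i for _, i in ranked[:n_keep]]
    chosen += [reference_index] * (n_keep - len(chosen))
    return chosen


print(select_candidates([0, 50, 10, 999, 30], threshold=100, n_keep=6))
```

In the example, candidate 3 is rejected by the threshold, the four remaining candidates are ordered by distance, and the last two slots are filled with the reference index 0.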

p4. It should be emphasized that DCT2D is performed in the 8x8 dimensions, whereas the Hadamard transform is performed along the stacking direction, i.e. in the 16x8 dimensions. After the Hadamard transform, hard threshold filtering sets the parts smaller than the threshold to 0 for noise reduction, and the number of values larger than the threshold in the whole three-dimensional pixel block is counted and used to calculate the contribution of the candidate blocks to the final image, i.e. the weight ω_i. This process is shown in Fig. 6. The frequency domain values are then converted back to spatial domain values I_i(x, y) by the inverse DCT, where x and y are the coordinates in the original image, as shown in Fig. 7.

p5. In the final aggregation process, each pixel in the entire image has been used one or more times as part of a reference block or candidate block, so after each inverse DCT there is a corresponding spatial domain value I_i(x, y) and weight ω_i; these values and weights are accumulated and normalized to obtain the final output image values. In the hardware implementation, a buffer for the output image is maintained throughout to hold the result of each multiply-accumulate, and a line is output once its result can no longer be affected by subsequent window operations.
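The accumulate-and-normalize step can be sketched with integer arithmetic only. The sketch assumes each contribution carries an already quantized integer weight, rounds the weighted mean to the nearest integer, and falls back to the original pixel where nothing was accumulated; these choices and the function name `aggregate` are assumptions, and the mapping from the threshold count to the weight is not modeled.

```python
def aggregate(height, width, contributions, fallback):
    """Weighted accumulation and normalization of filtered blocks.

    contributions: iterable of (top, left, weight, block), where block is a
    list of rows of filtered spatial values and weight is a non-negative int.
    fallback: image used where no contribution landed (here the input image).
    """
    num = [[0] * width for _ in range(height)]
    den = [[0] * width for _ in range(height)]
    for top, left, weight, block in contributions:
        for dy, row in enumerate(block):
            for dx, value in enumerate(row):
                y, x = top + dy, left + dx
                if 0 <= y < height and 0 <= x < width:
                    num[y][x] += weight * value   # multiply-accumulate
                    den[y][x] += weight
    out = []
    for y in range(height):
        out_row = []
        for x in range(width):
            if den[y][x]:
                # Integer division with rounding to nearest.
                out_row.append((num[y][x] + den[y][x] // 2) // den[y][x])
            else:
                out_row.append(fallback[y][x])
        out.append(out_row)
    return out


original = [[1, 2, 3], [4, 5, 6]]
contribs = [
    (0, 0, 3, [[10, 10], [10, 10]]),
    (0, 0, 1, [[20, 20], [20, 20]]),
]
print(aggregate(2, 3, contribs, original))
```

In the example, the two overlapping blocks give a weighted mean of 12.5 over the left 2x2 region, which rounds to 13, while the untouched right column keeps the original values.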

It will be appreciated by those skilled in the art that the present disclosure can be embodied in other specific forms without changing the technical spirit or essential characteristics thereof. Accordingly, it should be understood that the above-described embodiments are only examples and are not limiting. The scope of the present disclosure is defined by the appended claims rather than by the detailed description. Therefore, it is to be understood that all modifications or variations derived from the meaning and scope of the appended claims and equivalents thereof are within the scope of the present disclosure.

In the above-described embodiments of the present disclosure, all operations and messages may be selectively performed or may be omitted. Further, the operations in each embodiment need not be performed sequentially, and the order of the operations may be varied. The messages need not be delivered in sequence and the order of delivery of the messages may vary. Each operation and each message transfer may be performed independently.

The present disclosure relies on at least the following: optimization of the BM3D algorithm, including the selection of algorithm parameters and a quantized fixed-point implementation; and a customized acceleration hardware design, including streaming processing and the design of on-chip computing units and caches. On this basis, the present disclosure may provide at least one of the following advantages. First, through the selection and parameter optimization of the algorithm, for example omitting step 2 of the original algorithm and retaining only step 1, the amount of calculation is greatly reduced while the algorithm effect is maintained. Second, the method uses only shift and multiply-add operations on fixed-point integers during calculation, which further reduces the amount of calculation and makes the method applicable to an FPGA that supports only fixed-point calculation. Third, the BM3D noise reduction is carried out in the Bayer domain, which gives a better noise reduction effect than the RGB or YUV domain. Finally, the parameters of the denoising method are selected so that the amount of calculation is further reduced while the denoising effect is preserved as far as possible. In addition, the method performs software-hardware co-optimization: customized hardware acceleration of the optimized algorithm is implemented on the FPGA, so that real-time noise reduction is achieved for video, in particular 1080P/4K video at 60 FPS.

Fig. 8A, 8B and 8C respectively illustrate the noise reduction effect obtained for three pictures using the method according to an embodiment of the present disclosure; in each figure, from left to right, are the original image, the noisy image, and the result after the optimized configuration and quantization of the present disclosure. Specifically, using the parameters recommended for the original algorithm, the PSNR of the finally obtained picture is 38.17 in fig. 8A, 38.81 in fig. 8B, and 37.43 in fig. 8C; using the optimized parameters of the present disclosure, the PSNR is 38.71 in fig. 8A, 39.56 in fig. 8B, and 37.48 in fig. 8C. Therefore, the algorithm still achieves effective noise reduction with the parameters adopted by the method. Based on these optimized parameters, embodiments of the present disclosure enable real-time processing of 1080P/4K video at 60 FPS.
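For reference, PSNR is conventionally computed from the mean squared error between the original image and the processed image. The following Python sketch uses that standard definition; the peak value of 255 (8-bit samples) is an assumption, since the bit depth and the domain in which the above figures were measured are not stated in this passage.

```python
import math


def psnr(reference, processed, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists.

    `peak` is the maximum possible sample value (255 assumes 8-bit data).
    """
    if len(reference) != len(processed) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - p) ** 2 for r, p in zip(reference, processed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak * peak / mse)
```

A higher PSNR indicates that the processed image is closer to the original.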

As shown in the figure, an image denoising method according to the present disclosure includes: selecting a reference block for an original input image, and determining the N candidate blocks that are most similar to the reference block within a search window; performing collaborative filtering with the set of candidate blocks as input, wherein the collaborative filtering comprises a 3D frequency domain transform, filtering, and a 3D frequency domain inverse transform; and taking the spatial domain pixel values obtained by collaboratively filtering each candidate block and the weights obtained in the filtering process as input, and performing aggregation and normalization with the original input image, wherein the calculation in the method uses shift and multiply-add operations on fixed-point integers and is carried out in the Bayer domain. In one embodiment, the filtering process includes only a single filtering, and the single filtering includes hard thresholding.
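The block-matching step can be sketched in Python as follows, using the Manhattan distance (sum of absolute differences) as the similarity measure and padding the group with the reference block when too few candidates qualify, as recited in the claims. The function names, the `max_distance` acceptance condition, and the interpretation of the search window as a radius around the reference position are assumptions made for illustration.

```python
def manhattan(image, ay, ax, by, bx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(image[ay + dy][ax + dx] - image[by + dy][bx + dx])
        for dy in range(size)
        for dx in range(size)
    )


def match_blocks(image, ref_y, ref_x, size, radius, step, n, max_distance):
    """Return the top-left corners of the n blocks most similar to the reference.

    Candidates are sampled every `step` pixels within `radius` of the
    reference position. If fewer than n candidates have a distance of at
    most `max_distance`, the reference block itself fills the remainder.
    """
    height, width = len(image), len(image[0])
    scored = []
    for y in range(ref_y - radius, ref_y + radius + 1, step):
        for x in range(ref_x - radius, ref_x + radius + 1, step):
            # Skip candidates that do not fit entirely inside the image.
            if 0 <= y <= height - size and 0 <= x <= width - size:
                d = manhattan(image, ref_y, ref_x, y, x, size)
                if d <= max_distance:
                    scored.append((d, y, x))
    scored.sort()
    best = [(y, x) for _, y, x in scored[:n]]
    best += [(ref_y, ref_x)] * (n - len(best))
    return best
```

With a step of 2, every sampled candidate shares the same Bayer color phase as the reference block, because an even offset preserves the position within the 2x2 color filter pattern.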

According to another aspect of the present disclosure, there is provided a non-transitory computer storage medium having stored thereon instructions that, when executed by one or more processors, are configured to perform the above-described method.

While the disclosure has been shown and described with reference to various embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the appended claims and their equivalents.

Claims

1. An image denoising method, comprising:

selecting a reference block for an original input image, and determining the N candidate blocks that are most similar to the reference block within a search window;

performing collaborative filtering by taking the set of candidate blocks as input, wherein the collaborative filtering comprises 3D frequency domain transformation, filtering and 3D frequency domain inverse transformation;

taking the spatial domain pixel values obtained after each candidate block is collaboratively filtered and the weights obtained in the filtering process as input, and performing aggregation and normalization with the original input image,

wherein the calculation in the method uses shift and multiply-add operations on fixed-point integers and is performed in the Bayer domain.

2. The method of claim 1, wherein the filtering process comprises only a single filtering, and wherein the single filtering comprises hard thresholding.

3. The method of claim 1 or 2, wherein the 3D frequency domain transform and the 3D frequency domain inverse transform comprise a two-dimensional cosine transform in integer form.

4. The method of any preceding claim, wherein the size of the reference block is set to 8x8 and the step size is set to 6.

5. The method of any of the preceding claims, wherein the search window size is set to 9x9 and the step size is set to 2.

6. The method of any one of the preceding claims, wherein determining the N candidate blocks that are most similar to the reference block comprises using a manhattan distance to determine a similarity of the candidate blocks to the reference block.

7. The method of any one of the preceding claims, wherein N is 16, and when the number of candidate blocks satisfying the condition is less than 16, the reference block is padded among the set of candidate blocks.

8. A system configured to perform the method of any one of claims 1 to 7.

9. The system of claim 8, wherein the system comprises at least one of: a central processing unit (CPU), a graphics processing unit (GPU), an image signal processor chain, a field programmable gate array (FPGA), and other video chains.

10. A non-transitory computer storage medium having stored thereon instructions, which when executed by one or more processors, are configured to perform the method of any one of claims 1-7.