CN116017172A

CN116017172A - Raw domain image noise reduction method and device, camera and terminal

Info

Publication number: CN116017172A
Application number: CN202211673079.5A
Authority: CN
Inventors: 王琳琳
Original assignee: Spreadtrum Communications Shanghai Co Ltd
Current assignee: Spreadtrum Communications Shanghai Co Ltd
Priority date: 2022-12-26
Filing date: 2022-12-26
Publication date: 2023-04-25

Abstract

The invention provides a noise reduction method for Raw domain images, together with a corresponding device, camera and terminal. The method operates directly on multiple Bayer Raw domain frames captured at the same exposure. Alignment and fusion are performed on the Bayer raw frames themselves, which preserves a higher pixel bit depth, and the frames are fused into a single frame that serves as the noise-reduced output and can then be passed to hardware ISP processing, making the pipeline more efficient. Multi-frame alignment uses an algorithm based on the fast Fourier transform, which is more stable, reduces computation, and reduces edge artifacts caused by alignment. Multi-frame fusion uses a method based on Kalman filtering, which effectively reduces motion ghosting, better removes noise on moving objects and reduces motion blur, giving better and more efficient temporal denoising with stronger robustness.

Description

Raw domain image noise reduction method and device, camera and terminal

Technical Field

The present invention relates to the field of image processing technologies, and in particular, to a method and apparatus for denoising a Raw domain image, a camera, and a terminal.

Background

At present, a major obstacle to taking high-quality photographs with mobile phone cameras is insufficient light, especially indoors or at night, when the scene itself is poorly lit. One common way to obtain more signal is to apply analog and digital gain, which further amplifies noise; another is to extend the exposure time, which can cause motion blur from camera shake or subject motion. Daytime shots of high dynamic range scenes can also suffer from insufficient light: when exposure is reduced to avoid clipping highlights, too little light is collected in shadow (dark) regions, and the noise there is amplified again when local tone mapping brightens them. Large-aperture lenses and flash can also provide more light, but each is a trade-off. Because a phone camera is limited by the device thickness, its aperture cannot be enlarged indefinitely and can only be increased within a limited range; flash is the least popular option, since although it momentarily increases the light, it is generally an undesirable form of illumination. On the one hand, insufficient light leads to more noise in low-light shots; on the other hand, because the imaging element of a phone camera (typically a CMOS sensor) is inevitably affected by thermal noise and similar effects, images taken in low light are typically noisy. Image denoising is therefore an indispensable task.

There are many image denoising methods, but single-frame denoising tends to lose considerable detail while removing noise. Combining spatial denoising within one frame with temporal denoising across multiple frames preserves as much image detail as possible while still denoising effectively, and multi-frame noise reduction has become the mainstream approach in mobile phone cameras. Current multi-frame methods mostly operate in the YUV domain and are widely used, but their large computational cost and long processing time have long been a pain point. Balancing noise against detail, and avoiding motion blur and ghosting, are further common problems of current multi-frame noise reduction.

Disclosure of Invention

The invention provides a method and a device for denoising Raw domain images, a camera, and a terminal that achieve a better denoising effect.

In a first aspect, the present invention provides a method for denoising Raw domain images, which includes: receiving an input multi-frame Raw image, where the multi-frame Raw image consists of multiple Bayer Raw domain frames captured at the same exposure; selecting one Raw frame as the reference frame and using the other Raw frames as frames to be aligned; performing multi-frame alignment on the Raw frames with an alignment algorithm based on the fast Fourier transform, to obtain position information of the aligned blocks in all frames to be aligned and in the reference frame; and performing multi-frame fusion on the Raw frames with a fusion method based on Kalman filtering, to obtain the noise-reduced output image.

In this scheme, the Bayer Raw domain frames captured at the same exposure are processed directly. That is, alignment and fusion are not performed on RGB or YUV frames that the hardware image signal processor (ISP) has already subjected to color correction, gamma, demosaicing and similar operations; instead they are performed on the Bayer raw frames, which preserves a higher pixel bit depth. The frames are fused into one frame that serves as the noise-reduced output image, after which hardware ISP processing can follow. In addition, aligning the frames with an algorithm based on the fast Fourier transform makes the method more stable, reduces computation, and reduces edge artifacts caused by alignment. The Kalman-filter-based fusion effectively reduces motion ghosting, better removes noise on moving objects and reduces motion blur, yielding better and more efficient temporal denoising with stronger robustness. Furthermore, the method uses an improved raised cosine window, which significantly reduces computation and storage and thus significantly shortens the running time of the algorithm.

In a specific embodiment, before the input multi-frame Raw image is received, the method further includes: determining, according to the shooting scene, the number of Bayer Raw domain frames to capture at the same exposure. Detecting the shooting scene before processing to set the frame count prevents too few frames from degrading the noise reduction effect, and too many frames from unnecessarily increasing computation.

In a specific embodiment, determining, according to the shooting scene, the number of Bayer Raw domain frames to capture at the same exposure includes: dividing the brightness range into BV segments in advance; presetting, for each BV segment, an underexposure gray threshold, an underexposure proportion threshold, an overexposure gray threshold and an overexposure proportion threshold; receiving the detected ambient brightness BV value of the shooting scene and determining the BV segment to which it belongs; using the underexposure and overexposure gray thresholds of that BV segment to compute the proportion of underexposed pixels and the proportion of overexposed pixels in each of the R, G and B channels, and selecting the maximum underexposure proportion and the maximum overexposure proportion among the three channels; and determining the number of Bayer Raw domain frames to capture at the same exposure from the underexposure and overexposure proportion thresholds of that BV segment together with the maximum underexposure and overexposure proportions. Because the shooting scene is classified by dynamic range as well as by brightness, the frame count is determined more accurately, which improves the noise reduction effect.

In a specific embodiment, selecting one Raw frame as the reference frame includes: selecting the sharpest Raw frame as the reference frame, which avoids blur caused by camera shake and scene motion.

In a specific embodiment, selecting the reference frame further includes: selecting the sharpest frame from among the first 3 Raw frames as the reference frame, which reduces the influence of shutter lag.

In a specific embodiment, performing multi-frame alignment on the Raw frames with an alignment algorithm based on the fast Fourier transform, to obtain position information of the aligned blocks in all frames to be aligned and in the reference frame, includes:

preprocessing a multi-frame Raw image;

downsampling each Raw frame K times to obtain a pyramid of K+1 layers of different sizes;

dividing layers 1 to K of the pyramid of each Raw frame into blocks;

for each block of the reference frame, searching within a preset range of the coarsest pyramid layer of the current frame to be aligned for the best-matching position, and propagating the search result to each successively finer layer until the final position is found in the finest layer;

calculating, for each frame to be aligned, the distance measure D_p(u, v) of each block of the reference frame as follows:

$$D_p(u, v) = \sum_{y=0}^{n-1} \sum_{x=0}^{n-1} \left| T(x, y) - I(x + u + u_0,\; y + v + v_0) \right|^p$$

where T(x, y) is the block at coordinates (x, y) in the reference frame, I is the search area in the frame to be aligned, p is the power of the norm used for alignment, n is the block size, and (u₀, v₀) is the initial alignment position inherited from the coarser pyramid layer;

obtaining, from the distance measure D_p(u, v) of each block of the reference frame, the position information of the aligned blocks in all frames to be aligned and in the reference frame. Deferring the problems caused by undersampling to the subsequent alignment step makes them easier to resolve, and avoids or reduces to some extent problems such as edge misalignment and edge ghosting.
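The coarse-to-fine block search and the distance measure D_p(u, v) above can be illustrated with a minimal NumPy sketch. It assumes a single-channel image (for example a grayscale proxy derived from the Bayer mosaic), 2 × 2 box averaging as the pyramid filter, the same block size n at every layer, a search radius of 2 around the inherited offset, and doubling of the offset when it is passed to the next finer layer. None of these choices, nor the helper names `downsample2`, `build_pyramid`, `block_distance` and `align_block`, come from the source; the search is brute force for clarity.

```python
import numpy as np


def downsample2(img):
    """Halve the resolution by averaging 2x2 neighbourhoods (assumed pyramid filter)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def build_pyramid(img, K):
    """Downsample K times, giving K+1 layers; index 0 is the finest layer."""
    layers = [np.asarray(img, dtype=np.float64)]
    for _ in range(K):
        layers.append(downsample2(layers[-1]))
    return layers


def block_distance(T, I, x, y, u, v, p):
    """D_p for one candidate offset: sum of |T - I|^p over the block, where the
    block of I starts at (x + u, y + v). Returns inf if it leaves the image."""
    h, w = T.shape
    x0, y0 = x + u, y + v
    if x0 < 0 or y0 < 0 or x0 + w > I.shape[1] or y0 + h > I.shape[0]:
        return np.inf
    return float(np.sum(np.abs(T - I[y0:y0 + h, x0:x0 + w]) ** p))


def align_block(ref, alt, bx, by, n=16, K=2, radius=2, p=2):
    """Coarse-to-fine search for the n x n reference block whose top-left corner
    is (bx, by) in the finest layer. Returns the integer offset (u, v)."""
    ref_pyr, alt_pyr = build_pyramid(ref, K), build_pyramid(alt, K)
    u0, v0 = 0, 0  # no prior offset at the coarsest layer
    for level in range(K, -1, -1):
        x, y = bx >> level, by >> level
        T = ref_pyr[level][y:y + n, x:x + n]
        best = (np.inf, u0, v0)
        for dv in range(-radius, radius + 1):
            for du in range(-radius, radius + 1):
                d = block_distance(T, alt_pyr[level], x, y, u0 + du, v0 + dv, p)
                if d < best[0]:
                    best = (d, u0 + du, v0 + dv)
        _, u, v = best
        if level > 0:
            u0, v0 = 2 * u, 2 * v  # pass the estimate to the next finer layer
    return u, v
```

`align_block(ref, alt, bx, by)` returns the integer offset of one reference block in `alt`; in a full pipeline it would be run for every block, and `p` selects the norm power exactly as in D_p(u, v).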

In a specific embodiment, computing the offset that minimizes the distance measure D_p(u, v) includes:

when p = 2, computing D₂ with box filtering and convolution as follows:

$$D_2(u, v) = \lVert T \rVert_2^2 + \operatorname{box}(I \circ I, n) - 2\,\mathcal{F}^{-1}\left\{ \mathcal{F}\{I\} \circ \overline{\mathcal{F}\{T\}} \right\}$$

where the first term ‖T‖₂² is the sum of squares of the elements of T; the second term box(I ∘ I, n) is the sum of squares of the elements of I, computed with an unnormalized n × n box filter; and the third term is proportional to the cross-correlation of I and T and is computed with the fast Fourier transform (∘ denotes element-wise multiplication and the overbar complex conjugation);

determining the integer displacement

$$(\hat{u}, \hat{v}) = \arg\min_{(u, v)} D_2(u, v)$$

that minimizes the displacement error. This improves computational efficiency.
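The three-term decomposition of D₂ can be checked numerically. The sketch below, an illustration assuming NumPy, evaluates D₂(u, v) for every offset at which an n × n block fits inside the search area I: the box term uses a summed-area table, and the cross-correlation is computed with real FFTs after zero-padding T to the size of I, keeping only offsets that do not wrap around. The function names `box_sum` and `d2_all_offsets` are hypothetical.

```python
import numpy as np


def box_sum(a, n):
    """Unnormalized n x n box filter ('valid' positions only) via a summed-area table."""
    s = np.pad(a, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    return s[n:, n:] - s[:-n, n:] - s[n:, :-n] + s[:-n, :-n]


def d2_all_offsets(T, I):
    """D_2 for every placement of the n x n block T inside the search area I.
    Entry [v, u] is the distance when the block starts at row v, column u of I:
        D_2 = ||T||^2 + box(I o I, n) - 2 * xcorr(I, T)."""
    n = T.shape[0]
    H, W = I.shape
    term1 = np.sum(T * T)                       # ||T||_2^2, a scalar
    term2 = box_sum(I * I, n)                   # shape (H - n + 1, W - n + 1)
    # Cross-correlation c[v, u] = sum_{x,y} T[y, x] * I[y + v, x + u] via FFT,
    # with T zero-padded to the size of I.
    FT = np.fft.rfft2(T, s=(H, W))
    FI = np.fft.rfft2(I)
    corr = np.fft.irfft2(FI * np.conj(FT), s=(H, W))
    term3 = 2.0 * corr[:H - n + 1, :W - n + 1]  # offsets without circular wrap-around
    return term1 + term2 - term3
```

The result matches direct summation of (T − I)² at every offset, and `np.unravel_index(np.argmin(D), D.shape)` then yields the integer displacement as (row, column), that is (v̂, û). The cost is dominated by the FFTs rather than by a per-offset sum over the block.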

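In a specific embodiment, determining the integer displacement (û, v̂) that minimizes the displacement error further includes sub-pixel refinement:

fitting a bivariate quadratic polynomial to D₂(u, v) in the neighborhood of (û, v̂), with (u, v) measured relative to (û, v̂), and finding the minimum of the polynomial to complete the sub-pixel estimate for each block:

$$D_2(u, v) \approx \frac{1}{2} \begin{bmatrix} u & v \end{bmatrix} A \begin{bmatrix} u \\ v \end{bmatrix} + b^{\mathsf{T}} \begin{bmatrix} u \\ v \end{bmatrix} + c$$

where A is a 2 × 2 positive semi-definite matrix, b is a 2 × 1 vector, and c is a scalar;

fitting A, b and c by weighted least squares. When A is positive definite, the minimum of the fitted polynomial lies at

$$\mu = -A^{-1} b$$

which gives the sub-pixel displacement of moving content and minimizes the displacement error.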
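A sketch of the sub-pixel step follows. It assumes the quadratic is fitted to the 3 × 3 patch of D₂ values centered on the integer minimum (the source does not state the patch size) and that the least-squares weights default to uniform because the source does not give them. The offset is the stationary point −A⁻¹b; rejecting fits where A is not positive definite and clamping the result to ±1 pixel are assumed safeguards, not part of the described method. The name `subpixel_refine` is hypothetical.

```python
import numpy as np


def subpixel_refine(D3, weights=None):
    """Fit D(u, v) ~ 0.5*[u v] A [u v]^T + b^T [u v]^T + c to a 3x3 patch of
    distance values centred on the integer minimum and return the sub-pixel
    offset (du, dv) = -A^{-1} b.  D3[j, i] holds D at (u, v) = (i - 1, j - 1)."""
    v, u = np.mgrid[-1:2, -1:2]
    u, v = u.ravel().astype(float), v.ravel().astype(float)
    d = np.asarray(D3, dtype=float).ravel()
    # Design matrix for the unknowns (a11, a12, a22, b1, b2, c).
    M = np.stack([0.5 * u * u, u * v, 0.5 * v * v, u, v, np.ones_like(u)], axis=1)
    w = np.ones(9) if weights is None else np.asarray(weights, dtype=float).ravel()
    sw = np.sqrt(w)  # weighted least squares: scale rows by sqrt(weight)
    coef, *_ = np.linalg.lstsq(M * sw[:, None], d * sw, rcond=None)
    a11, a12, a22, b1, b2, _ = coef
    A = np.array([[a11, a12], [a12, a22]])
    b = np.array([b1, b2])
    # Only trust the fit if A is positive definite (assumed safeguard).
    if a11 <= 0 or np.linalg.det(A) <= 0:
        return 0.0, 0.0
    du, dv = np.clip(-np.linalg.solve(A, b), -1.0, 1.0)
    return float(du), float(dv)
```

The refined displacement is (û + du, v̂ + dv). Passing a 3 × 3 `weights` array lets the fit trust some neighbors more than others, which is the role the weighted least squares plays in the text.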

In a specific embodiment, performing multi-frame fusion on the Raw frames with a Kalman-filter-based fusion method to obtain the noise-reduced output image includes: taking each Raw frame in turn as the current frame and processing each block of the current frame together with the corresponding block of the preceding frame using Kalman filtering, where the frame preceding the first frame is taken to be the last frame; and fusing the reference frame and the other frames to be aligned according to preset weights to obtain the noise-reduced output image. This effectively removes temporal noise while preserving as much image detail as possible.
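The Kalman-filter fusion is described only at a high level, so the following is one possible reading rather than the patented filter: each aligned block of frame k is updated against the co-located block of frame k − 1 (frame 0 against the last frame), treating the previous block as the prediction and the current block as the measurement, both with noise variance R. The motion-dependent process-noise term (`motion_gain`), the function and parameter names, and the final normalized weighted blend are all assumptions. With zero process noise the gain is 0.5, so each update averages the two blocks; as the difference between blocks grows, the gain approaches 1 and the output follows the current frame, which is one way such a filter can suppress ghosting.

```python
import numpy as np


def kalman_pair(prev_blk, cur_blk, R, Q0=0.0, motion_gain=1.0):
    """Per-pixel scalar Kalman update: the aligned previous block is the prediction,
    the current block is the measurement, both with noise variance R. The process
    noise grows with the squared block difference (hypothetical motion term)."""
    Q = Q0 + motion_gain * (cur_blk - prev_blk) ** 2
    P_pred = R + Q                   # variance of the prediction
    gain = P_pred / (P_pred + R)     # Kalman gain, in [0.5, 1) for R > 0
    return prev_blk + gain * (cur_blk - prev_blk)


def kalman_fuse(blocks, R, weights, **kw):
    """Filter block k against block k-1 (block 0 against the last block), then
    blend the filtered blocks with preset weights normalized to sum to 1."""
    filtered = [kalman_pair(blocks[k - 1], blocks[k], R, **kw) for k in range(len(blocks))]
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()
    return sum(wk * fk for wk, fk in zip(w, filtered))
```

With `motion_gain=0` and equal weights, every frame enters two pair averages with weight 0.5, so the result reduces to the plain mean of the blocks; that property is a convenient sanity check before tuning R and the motion term.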

In a second aspect, the present invention further provides a noise reduction device for Raw domain images, comprising a receiving module, a screening module, an alignment module and a fusion module. The receiving module receives the input multi-frame Raw image, which consists of multiple Bayer Raw domain frames captured at the same exposure. The screening module selects one Raw frame as the reference frame and uses the other Raw frames as frames to be aligned. The alignment module performs multi-frame alignment on the Raw frames with an alignment algorithm based on the fast Fourier transform, to obtain position information of the aligned blocks in all frames to be aligned and in the reference frame. The fusion module performs multi-frame fusion on the Raw frames with a fusion method based on Kalman filtering, to obtain the noise-reduced output image.

In a specific embodiment, the device further includes a determining module, which determines, according to the shooting scene and before the input multi-frame Raw image is received, the number of Bayer Raw domain frames to capture at the same exposure. Detecting the shooting scene before processing to set the frame count prevents too few frames from degrading the noise reduction effect, and too many frames from unnecessarily increasing computation.

In a specific embodiment, when the determining module determines, according to the shooting scene, the number of Bayer Raw domain frames to capture at the same exposure, it proceeds as follows: dividing the brightness range into BV segments in advance; presetting, for each BV segment, an underexposure gray threshold, an underexposure proportion threshold, an overexposure gray threshold and an overexposure proportion threshold; receiving the detected ambient brightness BV value of the shooting scene and determining the BV segment to which it belongs; using the underexposure and overexposure gray thresholds of that BV segment to compute the proportion of underexposed pixels and the proportion of overexposed pixels in each of the R, G and B channels, and selecting the maximum underexposure proportion and the maximum overexposure proportion among the three channels; and determining the number of Bayer Raw domain frames to capture at the same exposure from the underexposure and overexposure proportion thresholds of that BV segment together with the maximum underexposure and overexposure proportions. Because the shooting scene is classified by dynamic range as well as by brightness, the frame count is determined more accurately, which improves the noise reduction effect.

In a specific embodiment, the screening module selects the sharpest Raw frame as the reference frame, which avoids blur caused by camera shake and scene motion.

In a specific embodiment, the screening module selects the sharpest frame from among the first 3 Raw frames as the reference frame, which reduces the influence of shutter lag.

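In a specific embodiment, when the alignment module performs multi-frame alignment on the Raw frames with an alignment algorithm based on the fast Fourier transform, to obtain position information of the aligned blocks in all frames to be aligned and in the reference frame, it proceeds as follows:

preprocessing the multi-frame Raw image;

downsampling each Raw frame K times to obtain a pyramid of K+1 layers of different sizes;

dividing layers 1 to K of the pyramid of each Raw frame into blocks;

for each block of the reference frame, searching within a preset range of the coarsest pyramid layer of the current frame to be aligned for the best-matching position, and propagating the search result to each successively finer layer until the final position is found in the finest layer;

calculating, for each frame to be aligned, the distance measure D_p(u, v) of each block of the reference frame as follows:

$$D_p(u, v) = \sum_{y=0}^{n-1} \sum_{x=0}^{n-1} \left| T(x, y) - I(x + u + u_0,\; y + v + v_0) \right|^p$$

where T(x, y) is the block at coordinates (x, y) in the reference frame, I is the search area in the frame to be aligned, p is the power of the norm used for alignment, n is the block size, and (u₀, v₀) is the initial alignment position inherited from the coarser pyramid layer;

obtaining, from the distance measure D_p(u, v) of each block of the reference frame, the position information of the aligned blocks in all frames to be aligned and in the reference frame. Deferring the problems caused by undersampling to the subsequent alignment step makes them easier to resolve, and avoids or reduces to some extent problems such as edge misalignment and edge ghosting.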

In a specific embodiment, when the alignment module computes the offset that minimizes the distance measure D_p(u, v) with p = 2, it first computes D₂ using box filtering and convolution:

$$D_2(u, v) = \lVert T \rVert_2^2 + \operatorname{box}(I \circ I, n) - 2\,\mathcal{F}^{-1}\left\{ \mathcal{F}\{I\} \circ \overline{\mathcal{F}\{T\}} \right\}$$

where the first term ‖T‖₂² is the sum of squares of the elements of T; the second term box(I ∘ I, n) is the sum of squares of the elements of I, computed with an unnormalized n × n box filter; and the third term is proportional to the cross-correlation of I and T and is computed with the fast Fourier transform (∘ denotes element-wise multiplication and the overbar complex conjugation);

it then determines the integer displacement

$$(\hat{u}, \hat{v}) = \arg\min_{(u, v)} D_2(u, v)$$

that minimizes the displacement error. This improves computational efficiency.

In a specific embodiment, when the alignment module determines the integer displacement (û, v̂) that minimizes the displacement error, it further performs sub-pixel refinement as follows:

it first fits the following bivariate quadratic polynomial to D₂(u, v) in the neighborhood of (û, v̂), with (u, v) measured relative to (û, v̂):

$$D_2(u, v) \approx \frac{1}{2} \begin{bmatrix} u & v \end{bmatrix} A \begin{bmatrix} u \\ v \end{bmatrix} + b^{\mathsf{T}} \begin{bmatrix} u \\ v \end{bmatrix} + c$$

where A is a 2 × 2 positive semi-definite matrix, b is a 2 × 1 vector, and c is a scalar;

it then fits A, b and c by weighted least squares and, when A is positive definite, takes the minimum of the polynomial, located at μ = −A⁻¹b, as the sub-pixel displacement.

In a specific embodiment, when the fusion module performs multi-frame fusion on the Raw frames with a Kalman-filter-based fusion method to obtain the noise-reduced output image, it proceeds as follows: taking each Raw frame in turn as the current frame and processing each block of the current frame together with the corresponding block of the preceding frame using Kalman filtering, where the frame preceding the first frame is taken to be the last frame; and fusing the reference frame and the other frames to be aligned according to preset weights to obtain the noise-reduced output image. This effectively removes temporal noise while preserving as much image detail as possible.

In a third aspect, the present invention also provides a camera comprising an image acquisition module and any one of the above noise reduction devices for Raw domain images, the image acquisition module being communicatively connected to the noise reduction device.

In a fourth aspect, the present invention also provides a terminal, which includes any one of the cameras described above.

Drawings

FIG. 1 is a flowchart of a method for denoising a Raw domain image according to an embodiment of the present invention;

FIG. 2 is a flowchart of another method for denoising a Raw domain image according to an embodiment of the present invention;

FIG. 3 is a schematic flow chart of another method for denoising a Raw domain image according to an embodiment of the present invention;

FIG. 4 is an example of a pixel value distribution histogram of an image according to an embodiment of the present invention;

FIG. 5 is a diagram illustrating an example division of shooting scenes according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of reference frame selection according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of pyramid sampling according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of alignment according to an embodiment of the present invention;

FIG. 9 is a schematic block diagram of a noise reduction device for Raw domain images according to an embodiment of the present invention.

Reference numerals:

10-receiving module 20-screening module 30-alignment module 40-fusion module 50-determination module

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

In order to facilitate understanding of the method for denoising a Raw domain image provided by the embodiment of the present invention, an application scenario of the method for denoising a Raw domain image provided by the embodiment of the present invention is first described below, where the method for denoising a Raw domain image is applied to a camera of a terminal such as, but not limited to, a mobile phone, a notebook computer, and the like, and is used for denoising a captured image. The following describes the denoising method of the Raw domain image in detail with reference to the accompanying drawings.

Referring to fig. 1, the method for denoising a Raw domain image provided by the embodiment of the invention includes:

Step 10: receiving an input multi-frame Raw image, where the multi-frame Raw image consists of multiple Bayer Raw domain frames captured at the same exposure;

Step 20: selecting one Raw frame as the reference frame and using the other Raw frames as frames to be aligned;

Step 30: performing multi-frame alignment on the Raw frames with an alignment algorithm based on the fast Fourier transform, to obtain position information of the aligned blocks in all frames to be aligned and in the reference frame;

Step 40: performing multi-frame fusion on the Raw frames with a fusion method based on Kalman filtering, to obtain the noise-reduced output image.

In this scheme, the Bayer Raw domain frames captured at the same exposure are processed directly. That is, alignment and fusion are not performed on RGB or YUV frames that the hardware image signal processor (ISP) has already subjected to color correction, gamma, demosaicing and similar operations; instead they are performed on the Bayer raw frames, which preserves a higher pixel bit depth. The frames are fused into one frame that serves as the noise-reduced output image, after which hardware ISP processing can follow. In addition, aligning the frames with an algorithm based on the fast Fourier transform makes the method more stable, reduces computation, and reduces edge artifacts caused by alignment. The Kalman-filter-based fusion effectively reduces motion ghosting, better removes noise on moving objects and reduces motion blur, yielding better and more efficient temporal denoising with stronger robustness. Furthermore, the method uses an improved raised cosine window, which significantly reduces computation and storage and thus significantly shortens the running time of the algorithm. The steps are described in detail below with reference to the accompanying drawings.

As shown in FIG. 2, before the input multi-frame Raw image is received, the method may further include determining, according to the shooting scene, the number of Bayer Raw domain frames to capture at the same exposure. Detecting the shooting scene before processing to set the frame count prevents too few frames from degrading the noise reduction effect, and too many frames from unnecessarily increasing computation.

For low dynamic range scenes, the darker the ambient brightness, the greater the noise, and the brighter the ambient brightness, the less the noise. For high dynamic range scenes, tone mapping may later be used to brighten dark regions, and the more the dark-region brightness is raised, the more noise becomes visible. When multiple frames are used for temporal noise suppression, the denoising effect is positively correlated with the number of fused frames: in general, the more frames are fused, the better the denoising. The number of Bayer Raw domain frames to capture can therefore be set according to the ambient brightness BV value and the dynamic range of the shooting scene. The dynamic range can be classified from the gray value histogram of a captured image, as shown in FIG. 4.

Specifically, the number of Bayer Raw domain frames to capture at the same exposure can be determined from the shooting scene as follows. First, BV segments are defined in advance. FIG. 5 shows an example with 5 segments: 0-400, 300-700, 600-1000, 900-1300 and 1200-1600. The BV segments are of course not limited to those shown in FIG. 5. Then, for each segment, the following are preset: the underexposure gray threshold (Th1 in FIG. 4), the underexposure proportion threshold (the interval boundaries used to judge A in FIG. 5), the overexposure gray threshold (Th2 in FIG. 4), the overexposure proportion threshold (the interval boundaries used to judge B in FIG. 5), and the number of frames corresponding to each threshold interval. The underexposure gray threshold Th1, the overexposure gray threshold Th2, the underexposure and overexposure proportion thresholds, and the frame counts for the different dynamic ranges in each BV segment are all adjustable parameters. These BV segments and preset parameters are then used in practice as follows.

Referring to FIG. 5, the detected ambient brightness BV value of the shooting scene is first received, and the BV segment to which it belongs is determined. Then, using the preset underexposure and overexposure gray thresholds of that segment, the underexposure proportion (A in FIG. 5) and the overexposure proportion (B in FIG. 5) of the current scene are computed for each of the R, G and B channels. That is, within a given BV segment, the proportion of pixels whose values fall in [0, Th1] is counted from the histogram as the underexposure proportion A, A ∈ [0, 1], and the proportion of pixels whose values fall in [Th2, 255] as the overexposure proportion B, B ∈ [0, 1]. Because the image is a Raw domain image, histograms are collected separately for the R, G and B channels. The maximum underexposure proportion and the maximum overexposure proportion over the three channels are then selected. Finally, the preset underexposure and overexposure proportion thresholds of the BV segment containing the current BV value are looked up, the threshold intervals in which the maximum underexposure and overexposure proportions fall are determined, and the number of Bayer Raw domain frames to capture at the same exposure is determined from those intervals.
The number of frames to capture is thus set according to both the dynamic range and the brightness BV value. Because scene classification in the shooting scene detection stage considers dynamic range as well as brightness, the frame count is determined more accurately, which improves the noise reduction effect.
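The frame-count decision can be sketched as follows. The BV segment boundaries are taken from the FIG. 5 example; everything else is a hypothetical placeholder: the RGGB channel layout, 8-bit sample values (matching the [Th2, 255] range in the text), the threshold values, the frame tables, and resolving overlapping BV segments by taking the first match. In a real tuning each segment would carry its own thresholds and frame table; here they are identical for brevity, and the names `bv_segment`, `bayer_channels` and `frames_to_capture` are illustrative only.

```python
import bisect

import numpy as np

# BV segments from the FIG. 5 example; they overlap, and taking the first match is an assumption.
BV_SEGMENTS = [(0, 400), (300, 700), (600, 1000), (900, 1300), (1200, 1600)]

# Hypothetical placeholder tuning, identical for every segment here. frames[i][j] is the
# frame count when the max underexposure share lies in interval i of under_bounds and the
# max overexposure share lies in interval j of over_bounds.
PARAMS = [dict(th1=16, th2=235, under_bounds=[0.05, 0.20], over_bounds=[0.02, 0.10],
               frames=[[4, 5, 6], [5, 6, 7], [6, 7, 8]]) for _ in BV_SEGMENTS]


def bv_segment(bv):
    """Index of the first BV segment containing bv."""
    for idx, (lo, hi) in enumerate(BV_SEGMENTS):
        if lo <= bv <= hi:
            return idx
    raise ValueError(f"BV value {bv} is outside every segment")


def bayer_channels(raw):
    """Split an RGGB mosaic (assumed layout) into R, G (both sites) and B samples."""
    r = raw[0::2, 0::2].ravel()
    g = np.concatenate([raw[0::2, 1::2].ravel(), raw[1::2, 0::2].ravel()])
    b = raw[1::2, 1::2].ravel()
    return r, g, b


def frames_to_capture(raw, bv):
    """Return (frame count, max underexposure share A, max overexposure share B)."""
    p = PARAMS[bv_segment(bv)]
    chans = bayer_channels(raw)
    A = max(float(np.mean(c <= p["th1"])) for c in chans)  # share in [0, Th1]
    B = max(float(np.mean(c >= p["th2"])) for c in chans)  # share in [Th2, 255]
    i = bisect.bisect_right(p["under_bounds"], A)  # underexposure interval index
    j = bisect.bisect_right(p["over_bounds"], B)   # overexposure interval index
    return p["frames"][i][j], A, B
```

Taking the maximum over the three channels means a scene is treated as high dynamic range as soon as any single color channel is heavily under- or overexposed.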

Referring to fig. 1, an input multi-frame Raw image is received, and the multi-frame Raw image is a multi-frame Bayer original-domain Raw image captured under the same exposure. Specifically, the camera module needs to capture a plurality of frames of Bayer original domain Raw images under the same exposure environment and serve as original images for noise reduction processing. The captured multi-frame Bayer original domain Raw image can be respectively the sequence of a first frame, a second frame and a third frame according to the capturing sequence.

Next, referring to figs. 1, 2 and 3, one of the Raw frames is selected as the reference frame, and the remaining frames are used as frames to be aligned. The frame with the greatest sharpness may be selected as the reference frame, which avoids blur caused by hand shake and scene motion. In a more preferred embodiment, the sharpest frame is chosen from the first 3 frames only, which reduces the influence of shutter lag. The manner of selecting the reference frame is not limited to these examples; other approaches may also be used.

Specifically, a simple gradient calculation can be performed on each captured Raw frame; see the frame selection diagram in fig. 6. First, each Raw frame is divided into 13×10 large blocks. Within each large block, gradients in the horizontal, vertical and diagonal directions are calculated over each neighborhood of 3×3 adjacent pixels, and the gradients of all 3×3 neighborhoods in the block are averaged to give one gradient value for the block. The 13×10 block gradient values are then sorted, and the block A with the largest gradient is selected. The mean gradient of the 3×3 large blocks centered on block A is taken as the final sharpness value of the frame. The sharpness of each of the first 3 input frames is calculated in this way, and the sharpest of them is selected as the reference frame.
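The sharpness measure can be sketched in Python with NumPy as below. The source does not define the exact gradient operator, so the sum of absolute central differences in the horizontal, vertical and two diagonal directions over each 3×3 neighborhood is an assumption, as is reading 13×10 as 13 blocks across and 10 down; all function names are hypothetical. Frames must be at least 12 rows by 15 columns so that every large block is non-empty.

```python
import numpy as np


def block_gradients(raw, blocks_x=13, blocks_y=10):
    """Mean 3x3-neighborhood gradient of each large block (blocks_y x blocks_x)."""
    img = np.asarray(raw, dtype=np.float64)
    # One gradient value per 3x3 neighborhood, centered on each interior pixel.
    gh = np.abs(img[1:-1, 2:] - img[1:-1, :-2])      # horizontal
    gv = np.abs(img[2:, 1:-1] - img[:-2, 1:-1])      # vertical
    gd = (np.abs(img[2:, 2:] - img[:-2, :-2])        # diagonal
          + np.abs(img[2:, :-2] - img[:-2, 2:]))     # anti-diagonal
    grad = gh + gv + gd
    ys = np.linspace(0, grad.shape[0], blocks_y + 1).astype(int)
    xs = np.linspace(0, grad.shape[1], blocks_x + 1).astype(int)
    out = np.empty((blocks_y, blocks_x))
    for i in range(blocks_y):
        for j in range(blocks_x):
            out[i, j] = grad[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
    return out


def frame_sharpness(raw):
    """Mean gradient of the 3x3 large blocks centered on the strongest block A."""
    g = block_gradients(raw)
    i, j = np.unravel_index(np.argmax(g), g.shape)
    # The 3x3 neighborhood is clipped at the border (border handling assumed).
    return g[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2].mean()


def select_reference(frames, candidates=3):
    """Index of the sharpest frame among the first `candidates` frames."""
    scores = [frame_sharpness(f) for f in frames[:candidates]]
    return int(np.argmax(scores))
```

Restricting the candidates to the first three frames is what the `candidates` argument models: a sharper frame later in the burst is deliberately ignored.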

Next, as shown in figs. 1, 2 and 3, multi-frame alignment is performed on the Raw frames using an alignment algorithm based on the fast Fourier transform, yielding the positions of the aligned blocks in every frame to be aligned relative to the reference frame.

This alignment may be carried out as follows:

(1) Preprocess the Raw frames. Each input frame is an original Bayer domain Raw image in which each of the four color planes is undersampled, so alignment faces particular challenges. One solution is to demosaic every input frame to estimate an RGB value at each pixel, but running even a simple demosaicing algorithm on multiple frames occupies a large amount of storage and degrades the performance of the algorithm. In a more preferred embodiment, the estimated displacement is restricted to a multiple of 2 pixels, so that shifted Bayer samples always land on samples of the same color. In effect, the undersampling problem is deferred to the subsequent alignment step, where it is easier to solve.
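Why even displacements keep Bayer colors consistent can be checked with a small sketch, assuming an RGGB color filter array. The 2×2-average grayscale conversion (`bayer_to_gray`) is an assumed way of producing the grayscale images used for alignment, and all function names are hypothetical: an integer offset found on the half-resolution gray image maps to an even offset on the mosaic, which never mixes colors.

```python
import numpy as np


def cfa_color(y, x, pattern="RGGB"):
    """Color of the mosaic sample at row y, column x for a 2x2 repeating CFA."""
    return pattern[2 * (y % 2) + (x % 2)]


def shift_keeps_colors(dy, dx, size=4):
    """True if shifting by (dy, dx) maps every sample onto a sample of the same color."""
    return all(cfa_color(y, x) == cfa_color(y + dy, x + dx)
               for y in range(size) for x in range(size))


def bayer_to_gray(raw):
    """Half-resolution grayscale image: the average of each 2x2 Bayer quad (assumed)."""
    h, w = raw.shape[0] // 2 * 2, raw.shape[1] // 2 * 2
    a = np.asarray(raw, dtype=np.float64)[:h, :w]
    return 0.25 * (a[0::2, 0::2] + a[0::2, 1::2] + a[1::2, 0::2] + a[1::2, 1::2])


def gray_offset_to_bayer(u, v):
    """An integer offset on the half-resolution image is an even offset on the mosaic."""
    return 2 * u, 2 * v
```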

(2) Perform pyramid downsampling K times on each Raw frame to obtain a pyramid of K+1 layers of different sizes. Pyramid downsampling is performed as shown in fig. 7; the example there uses K = 4, although K is not limited to 4. During frame selection, the sharpest of the first 3 frames was chosen as the reference frame, so the other frames are frames to be aligned, and the purpose of the alignment algorithm is to align them with the reference frame. Because some camera shake is unavoidable when shooting with a terminal such as a mobile phone, the captured frames are never identical and exhibit misalignment and offsets. Alignment is therefore critical, and the alignment algorithm used here limits problems such as edge misalignment and edge ghosting. As shown in fig. 7, the grayscale images of the reference frame and of each frame to be aligned are downsampled four times to obtain a 5-layer pyramid, so that alignment can proceed from coarse to fine. The alignment operation then begins.
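A sketch of the K-fold pyramid construction, assuming that each downsampling halves the resolution with a 2×2 box average. The source does not state the downsampling factor or filter, and `downsample2`/`build_pyramid` are hypothetical names.

```python
import numpy as np


def downsample2(img):
    """Halve the resolution with a 2x2 box average (assumed filter and factor)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    a = img[:h, :w]
    return 0.25 * (a[0::2, 0::2] + a[0::2, 1::2] + a[1::2, 0::2] + a[1::2, 1::2])


def build_pyramid(gray, k=4):
    """K downsamplings give K + 1 layers; layers[0] is finest, layers[k] coarsest."""
    layers = [np.asarray(gray, dtype=np.float64)]
    for _ in range(k):
        layers.append(downsample2(layers[-1]))
    return layers
```

With k = 4 this produces the 5-layer pyramid of the fig. 7 example.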

(3) Referring to fig. 8, layers 1 to K of the pyramid obtained for each Raw frame are divided into blocks.

(4) For each block of the reference frame, search within a preset range in the coarsest pyramid layer of the current frame to be aligned to find the matching position. Then pass the result down to the next finer layer as its starting point, and continue until the final position is found in the finest layer. Through this coarse-to-fine alignment, every block obtains its final position information at the finest layer.

(5) Each block of the reference frame is aligned to the block of the frame to be aligned at the offset that minimizes the distance measurement below. For the reference frame and each frame to be aligned, the distance measurement D_p(u, v) of each reference block is calculated as:

$$D_p(u, v) = \sum_{y=0}^{n-1}\sum_{x=0}^{n-1}\left|T(x, y) - I(x + u + u_0,\; y + v + v_0)\right|^{p}$$

where T(x, y) is the pixel at coordinates (x, y) of the block in the reference frame, I is the search area of the frame to be aligned, p is the power of the norm used for alignment (p = 1 or 2), n is the block size (for example 8 or 16), and (u₀, v₀) is the initial alignment position inherited from the coarser pyramid layer (the search starts at the coarsest layer, e.g., layer 4 in fig. 7).
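The distance measurement and an exhaustive search over a square window can be sketched as follows. Here I is taken to be the whole frame to be aligned and (bx, by) the top-left corner of the reference block, so the candidate patch starts at (bx + u + u₀, by + v + v₀); this indexing, the square window of half-width `radius` and the function names are assumptions.

```python
import numpy as np


def distance_p(T, I, bx, by, u, v, u0, v0, p):
    """D_p(u, v) between the n x n reference block T and a patch of I."""
    n = T.shape[0]
    y, x = by + v + v0, bx + u + u0
    return float(np.sum(np.abs(T - I[y:y + n, x:x + n]) ** p))


def best_offset(T, I, bx, by, u0=0, v0=0, radius=4, p=1):
    """Total offset (u0 + u, v0 + v) minimizing D_p over |u|, |v| <= radius."""
    n = T.shape[0]
    best = (np.inf, u0, v0)
    for v in range(-radius, radius + 1):
        for u in range(-radius, radius + 1):
            y, x = by + v + v0, bx + u + u0
            if y < 0 or x < 0 or y + n > I.shape[0] or x + n > I.shape[1]:
                continue  # candidate patch falls outside the search area
            d = distance_p(T, I, bx, by, u, v, u0, v0, p)
            if d < best[0]:
                best = (d, u0 + u, v0 + v)
    return best[1], best[2], best[0]
```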

(6) Based on distance measurement D of each block in the reference frame when aligned _p (u, v) obtaining the position information of the aligned blocks in all the frames to be aligned and the reference frame. The problems caused by undersampling are solved more easily in the subsequent alignment step, and the problems such as edge dislocation, edge ghost and the like can be avoided or reduced in a limited manner.

The block size, the search radius and the power p in the D_p(u, v) formula are chosen differently at different pyramid layers. At the coarse layers, sub-pixel alignment is computed, the L2 residual is minimized and a larger search radius is used, which accommodates pyramid decimation. At the finest layer, pixel-level alignment is computed, the L1 residual is minimized and the search radius is limited to a smaller range.
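A minimal coarse-to-fine sketch for a single block, assuming a 2×2-average pyramid in which each layer halves the resolution. The per-layer (p, radius) pairs are hypothetical values chosen only to reflect the pattern described (L2 residual and a larger radius at the coarse layers, L1 and a small radius at the finest layer). Sub-pixel refinement at the coarse layers is omitted, the block size is kept fixed across layers for simplicity, and all names are hypothetical.

```python
import numpy as np


def downsample2(img):
    """2x2 box average (assumed pyramid filter)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    a = img[:h, :w]
    return 0.25 * (a[0::2, 0::2] + a[0::2, 1::2] + a[1::2, 0::2] + a[1::2, 1::2])


def build_pyramid(gray, k):
    layers = [np.asarray(gray, dtype=np.float64)]
    for _ in range(k):
        layers.append(downsample2(layers[-1]))
    return layers  # layers[0] finest, layers[k] coarsest


def search(T, I, bx, by, u0, v0, radius, p):
    """Integer offset within `radius` of (u0, v0) minimizing the L_p distance."""
    n = T.shape[0]
    best_d, best = np.inf, (u0, v0)
    for v in range(v0 - radius, v0 + radius + 1):
        for u in range(u0 - radius, u0 + radius + 1):
            y, x = by + v, bx + u
            if y < 0 or x < 0 or y + n > I.shape[0] or x + n > I.shape[1]:
                continue
            d = np.sum(np.abs(T - I[y:y + n, x:x + n]) ** p)
            if d < best_d:
                best_d, best = d, (u, v)
    return best


def align_block(ref_pyr, alt_pyr, bx, by, n=8, params=None):
    """Offset (u, v) at full resolution of the block whose corner is (bx, by)."""
    k = len(ref_pyr) - 1
    if params is None:
        # Hypothetical (p, radius): L1 with radius 1 at the finest layer,
        # L2 with radius 4 at every coarser layer.
        params = [(1, 1)] + [(2, 4)] * k
    u = v = 0
    for level in range(k, -1, -1):  # coarsest to finest
        p, radius = params[level]
        lx, ly = bx >> level, by >> level
        T = ref_pyr[level][ly:ly + n, lx:lx + n]
        u, v = search(T, alt_pyr[level], lx, ly, u, v, radius, p)
        if level > 0:
            u, v = 2 * u, 2 * v  # scale the estimate for the next finer layer
    return u, v
```

Doubling the estimate at each step is what lets a small radius suffice at the finest layer: a large displacement is found cheaply on the coarse layers and only refined at full resolution.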

In step (5), when the offset is obtained by minimizing the distance measurement D_p(u, v) with p = 1, the formula of step (5) can be evaluated directly. When p = 2, however, evaluating that formula directly is computationally inefficient at coarse scales. In a more preferred approach, when p = 2, the following processing can be used as an alternative to the formula of step (5).

First, in a manner similar to the acceleration of normalized cross-correlation, D₂ is calculated using box filtering and convolution as follows:

$$D_2 = \lVert T \rVert_2^2 + \operatorname{box}(I \circ I, n) - 2\,\mathcal{F}^{-1}\left\{\mathcal{F}\{I\} \circ \mathcal{F}\{T\}^{*}\right\}$$

where the first term, $\lVert T \rVert_2^2$, is the sum of squares of the elements of T; the second term, $\operatorname{box}(I \circ I, n)$, is the sum of squares of the elements of I filtered with an unnormalized n×n box filter; and the third term, $2\mathcal{F}^{-1}\{\mathcal{F}\{I\} \circ \mathcal{F}\{T\}^{*}\}$, is proportional to the cross-correlation of I and T and is calculated with the fast Fourier transform. Here ∘ denotes element-wise multiplication and * complex conjugation. Box filtering quickly sums the pixel values within each sliding window of a given size.
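The three-term decomposition of D₂ can be checked numerically with a sketch that evaluates it for every integer offset in a search area at once. It uses an integral image for the unnormalized box filter and NumPy's FFT for the cross-correlation. The conjugate is placed on the template's spectrum so that output index [v, u] corresponds to displacement (u, v). The function names are hypothetical, (u₀, v₀) is taken as 0, and I is the search area.

```python
import numpy as np


def box_sum(a, n):
    """Unnormalized n x n box filter over every fully contained window (integral image)."""
    s = np.pad(np.cumsum(np.cumsum(a, axis=0), axis=1), ((1, 0), (1, 0)))
    return s[n:, n:] - s[:-n, n:] - s[n:, :-n] + s[:-n, :-n]


def d2_map(T, I):
    """D_2[v, u] = sum over the block of (T - I shifted by (u, v))^2, for all valid (u, v)."""
    n = T.shape[0]
    H, W = I.shape
    term1 = np.sum(T * T)                        # ||T||^2
    term2 = box_sum(I * I, n)                    # box(I o I, n)
    # Circular cross-correlation via FFT; T is zero-padded to I's shape, and the
    # valid offsets (u <= W - n, v <= H - n) never wrap around.
    corr = np.real(np.fft.ifft2(np.fft.fft2(I) * np.conj(np.fft.fft2(T, s=I.shape))))
    term3 = 2.0 * corr[: H - n + 1, : W - n + 1]
    return term1 + term2 - term3
```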

After D₂ has been computed with box filtering and convolution, the integer displacement $(\hat u, \hat v)$ that minimizes the displacement error is determined, which improves computational efficiency. To obtain a sub-pixel estimate, a bivariate quadratic polynomial is fitted to the 3×3 window of D₂ centered on $(\hat u, \hat v)$:

$$\tilde D_2(\mathbf{u}) = \tfrac{1}{2}\,\mathbf{u}^{\top} A\,\mathbf{u} + \mathbf{b}^{\top}\mathbf{u} + c$$

where A is a 2×2 matrix, b is a 2×1 vector and c is a scalar. The fit is performed by weighted least squares, which makes the sub-pixel estimate for each block easy to complete. Solving this system is equivalent to taking the inner product of D₂ with a derived set of six 3×3 filters, each corresponding to one free parameter of (A, b, c). Once the parameters of the quadratic have been recovered, its minimum is found by completing the square:

$$\boldsymbol{\mu} = -A^{-1}\mathbf{b}$$

where the vector μ is the sub-pixel translation that must be added to the integer displacement $(\hat u, \hat v)$.
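The sub-pixel step can be sketched with NumPy. The weighted least-squares solution is precomputed as six 3×3 filters, the parameters of the quadratic are obtained as inner products of those filters with the D₂ window, and μ = −A⁻¹b is then solved. Uniform weights are the default because the source does not specify them. Rejecting non-convex fits and clamping μ to ±1 pixel are additional safeguards assumed here, not taken from the source, and the names are hypothetical.

```python
import numpy as np

# Offsets (du, dv) of the 3x3 window around the integer minimum, row-major
# (rows vary dv, columns vary du).
_DV, _DU = np.mgrid[-1:2, -1:2]
_DU, _DV = _DU.ravel().astype(float), _DV.ravel().astype(float)

# Design matrix for D(u) = 0.5 u^T A u + b^T u + c with A = [[a11, a12], [a12, a22]]:
# columns correspond to a11, a12, a22, b1, b2, c.
_M = np.stack([0.5 * _DU**2, _DU * _DV, 0.5 * _DV**2, _DU, _DV, np.ones(9)], axis=1)


def fit_filters(weights=None):
    """Six 3x3 filters: params = (M^T W M)^-1 M^T W d, one filter per parameter."""
    w = np.ones(9) if weights is None else np.asarray(weights, dtype=float).ravel()
    W = np.diag(w)
    return (np.linalg.inv(_M.T @ W @ _M) @ _M.T @ W).reshape(6, 3, 3)


def subpixel_offset(window, weights=None):
    """Sub-pixel correction mu = (du, dv) from the 3x3 D_2 window."""
    a11, a12, a22, b1, b2, _c = np.tensordot(fit_filters(weights), window,
                                             axes=([1, 2], [0, 1]))
    A = np.array([[a11, a12], [a12, a22]])
    b = np.array([b1, b2])
    if a11 <= 0 or np.linalg.det(A) <= 0:
        return np.zeros(2)  # not a convex bowl: keep the integer estimate
    mu = -np.linalg.solve(A, b)  # minimum of the quadratic (completing the square)
    return np.clip(mu, -1.0, 1.0)
```

Because the fitting is linear in the window values, the filters depend only on the weights and can be computed once and reused for every block.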

Next, referring to figs. 1, 2 and 3, multi-frame fusion is performed on the Raw frames using a fusion method based on Kalman filtering. This performs temporal denoising and yields the noise-reduced output image. Kalman filtering is an algorithm that optimally estimates the state of a system from a linear state equation of the system together with its input and output observations. Given the measurement variance, a Kalman filter can estimate the state of a dynamic system from a series of measurements corrupted by noise.

To obtain the noise-reduced output image with the Kalman-filter-based fusion method, the following steps may be used. First, each Raw frame is taken in turn as the current frame, and Kalman filtering is applied block by block to the current frame and the frame before it. The frame before the first frame is taken to be the last frame of the sequence. For example, if s frames, numbered 1, 2, …, s, are captured in a scene, each frame is processed together with its preceding frame, and frame 1 is paired with frame s. Processing is block-based. Using the position of each block computed in the alignment step, the corresponding pixel values in the current frame and the previous frame are located, and Kalman filtering re-estimates a value that becomes the new value of that pixel in the current frame. In this way the pixel values within each block of the current frame are predicted (estimated). Every frame is processed once, which amounts to applying Kalman temporal denoising to each frame. The reference frame and the other aligned frames are then fused according to preset weights to obtain the noise-reduced output image. This effectively removes temporal noise while preserving as much image detail as possible.
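The per-frame loop and the final weighted fusion can be sketched as below under simplifying assumptions. The frames are assumed to be already warped into alignment, so the block lookup via alignment positions is skipped. Each pixel is treated as a static scalar state (Φ = H = 1). The variances `q`, `r` and `p0` and the function names are hypothetical.

```python
import numpy as np


def kalman_pair(prev, cur, p_prev, q, r):
    """Re-estimate `cur` from the previous frame (prediction) and `cur` (measurement)."""
    p_pred = p_prev + q              # predict: static-pixel model, Phi = 1
    k = p_pred / (p_pred + r)        # Kalman gain with H = 1
    est = prev + k * (cur - prev)    # update with the current frame as the measurement
    return est, (1.0 - k) * p_pred


def temporal_denoise(frames, q=1.0, r=4.0, p0=4.0):
    s = len(frames)
    out = []
    for i in range(s):
        prev = frames[(i - 1) % s]   # frame 1 uses the last frame as its predecessor
        est, _ = kalman_pair(prev, frames[i], p0, q, r)
        out.append(est)
    return out


def fuse(frames, weights):
    """Weighted average of the processed frames with preset (normalized) weights."""
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()
    return np.tensordot(w, np.stack(frames), axes=1)
```

For example, with three constant frames of value 0, 10 and 20 and the default parameters, the gain is 5/9, so frame 1 (paired with frame 3) is re-estimated as 20 + (5/9)(0 − 20) = 80/9, and equal-weight fusion of the three re-estimated frames returns 10.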

In practical photography, the fusion algorithm must be robust to failures of the alignment algorithm. Although alignment is important for motion compensation, it can fail for various reasons, such as non-rigid motion, illumination changes or sudden occlusion, producing ghosting, edge ghosting and similar artifacts. The Kalman-filter-based fusion method is chosen with performance in mind: Kalman filtering is a temporal filter applied to input image blocks, so it effectively removes temporal noise while preserving as much image detail as possible. The alignment algorithm yields the positions of all blocks of the frames to be aligned relative to the reference frame, and the fusion algorithm then fuses all blocks of the different scales. The principle of the Kalman-filter fusion method is described below.

The fusion algorithm operates on blocks that overlap by half in each spatial dimension: an extra tile_scale0/2 pixels are taken on the top, bottom, left and right of each tile_scale0-sized block, and blending the overlapping blocks smoothly avoids discontinuities at block boundaries. A window function must also be applied to each block to avoid edge artifacts. A modified raised cosine window is used:

$$w(x) = \frac{1}{2} - \frac{1}{2}\cos\left(\frac{2\pi\,(x + 1/2)}{n}\right), \qquad 0 \le x < n$$

Unlike the conventional window, when this function is repeated with n/2 samples of overlap, the contributions from all blocks sum to exactly 1 at every position. In addition, the window is shifted by half a sample, which avoids the zero that the conventional window has at x = 0. A zero in the window corresponds to a pixel that contributes nothing to the output, meaning a smaller block could have achieved the same effect; avoiding zeros therefore avoids wasted computation.
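The partition-of-unity property of the shifted raised cosine window can be verified with a short sketch (function names hypothetical, n assumed even). A 2-D block window can be formed as `np.outer(w, w)`; its overlapped sum is the product of two 1-D sums and is therefore also 1.

```python
import numpy as np


def raised_cosine(n):
    """Modified raised cosine window, shifted by half a sample (n even)."""
    x = np.arange(n)
    return 0.5 - 0.5 * np.cos(2.0 * np.pi * (x + 0.5) / n)


def overlap_sum(n, num_tiles):
    """Sum of 1-D windows for tiles placed every n/2 samples."""
    w = raised_cosine(n)
    hop = n // 2
    total = np.zeros(hop * (num_tiles - 1) + n)
    for t in range(num_tiles):
        total[t * hop:t * hop + n] += w
    return total
```

The identity behind it: w(x) + w(x + n/2) = 1 − ½[cos θ + cos(θ + π)] = 1, and the smallest value, w(0) = ½ − ½cos(π/n), is strictly positive.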

First, Kalman filtering is applied to each image block of the current frame and of the frame preceding it; the principle is to re-estimate the current frame from the current frame and the past (previous) frame. This operation is performed on all input frames, and the fusion weight of each frame is determined from the final estimated values.

Each input frame is processed as follows.

Let $Z_i$ denote the pixel value at coordinates (x, y) computed from the fused frame; it contains noise. Let $X_i$ denote the pixel value at coordinates (x, y) of the aligned input frame, which likewise contains noise.

$Z_i$ is treated as the observation in the Kalman filter and $X_i$ as the state. The observation matrix $H_i$ relates the observation to the state by extracting the coordinate values from the state, and the measurement noise $V_i$ follows a zero-mean normal distribution whose covariance R describes the error σ:

$$Z_i = H_i X_i + V_i, \qquad V_i \sim N(0, R)$$

The process noise $W_i$ follows a Gaussian distribution with mean 0 and covariance $Q_i$, the state transition used to propagate the position estimate is denoted Φ, and the frame-to-frame motion of pixels is assumed to follow a linear model, so that

$$X_i = \Phi_{i-1} X_{i-1} + W_{i-1}$$

where $\Phi_{i-1}$ is the state transition matrix from frame i−1 to frame i.

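Prediction is performed first:

$$\hat X_i^{-} = \Phi_{i-1}\hat X_{i-1}, \qquad P_i^{-} = \Phi_{i-1} P_{i-1} \Phi_{i-1}^{\top} + Q_{i-1}$$

where $(\cdot)^{-}$ denotes a predicted estimate and $\hat{(\cdot)}$ a final state estimate. $\hat X_{i-1}$ and $P_{i-1}$ are the state estimate at time i−1 (the earlier frame) and its covariance, and $\hat X_i^{-}$ and $P_i^{-}$ are the predicted state at time i (the following frame) and its covariance.

Finally, the final state estimate and its covariance are calculated with the Kalman update equations:

$$K_i = P_i^{-} H_i^{\top}\left(H_i P_i^{-} H_i^{\top} + R\right)^{-1}$$

$$\hat X_i = \hat X_i^{-} + K_i\left(Z_i - H_i \hat X_i^{-}\right)$$

$$P_i = \left(\mathbb{I} - K_i H_i\right) P_i^{-}$$

where $\mathbb{I}$ is the identity matrix.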
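The predict and update steps of a standard linear Kalman filter, for the model Z_i = H_i X_i + V_i and X_i = Φ_{i−1} X_{i−1} + W_{i−1}, can be sketched in NumPy as follows. `kalman_step` is a hypothetical name, and the matrices are supplied by the caller.

```python
import numpy as np


def kalman_step(x_prev, P_prev, z, Phi, Q, H, R):
    """One predict/update cycle of a linear Kalman filter.

    x_prev, P_prev: state estimate and covariance at time i-1
    z: observation Z_i at time i
    Phi, Q: state transition and process-noise covariance from i-1 to i
    H, R: observation matrix and measurement-noise covariance
    """
    # Prediction
    x_pred = Phi @ x_prev
    P_pred = Phi @ P_prev @ Phi.T + Q
    # Update
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(P_pred.shape[0]) - K @ H) @ P_pred
    return x_new, P_new
```

In the scalar case with P = 4, Q = 1, R = 4 and Φ = H = 1, the gain is 5/9, so an observation of 10 moves a prior estimate of 0 to 50/9, and the covariance shrinks from 5 to 20/9.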

After Kalman filtering has been applied to every input frame, temporal noise is already effectively suppressed. The processed frames, that is, the reference frame and the other aligned frames, are then fused according to preset weights to obtain the noise-reduced output image.

Because Kalman filtering is a temporal filter that acts only on temporal noise, spatial filtering in the DFT domain can be added after the Kalman filtering step to handle spatial noise.
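One possible DFT-domain spatial filter is a Wiener-style shrinkage of each block's Fourier coefficients. The source does not specify the spatial filter, so this technique, the strength parameter `c`, the function name and the choice to leave the DC coefficient untouched (which preserves the block mean) are all assumptions.

```python
import numpy as np


def dft_wiener_tile(tile, noise_var, c=1.0):
    """Shrink each DFT coefficient by |F|^2 / (|F|^2 + c * N * noise_var)."""
    F = np.fft.fft2(tile)
    power = np.abs(F) ** 2
    # White noise of variance noise_var has expected power N * noise_var per
    # unnormalized FFT coefficient, N being the number of pixels in the tile.
    gain = power / (power + c * noise_var * tile.size)
    gain[0, 0] = 1.0  # keep the DC term so the tile mean is preserved (assumed choice)
    return np.real(np.fft.ifft2(F * gain))
```

Coefficients whose power is well above the expected noise power pass almost unchanged, while those near the noise floor are attenuated, so flat regions are smoothed more strongly than detailed ones.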

In the embodiments above, the multiple Bayer original domain Raw frames captured under the same exposure are processed directly: the inputs are not RGB or YUV frames that have already undergone color correction, gamma, demosaicing and similar processing on the hardware image signal processor (ISP). Aligning and fusing the Bayer raw frames preserves more bits per pixel. Because the frames are fused into a single frame that serves as the noise-reduced output, hardware ISP processing can follow, making the pipeline more efficient. Performing multi-frame alignment with an alignment algorithm based on the fast Fourier transform makes the method more stable, reduces computation and reduces edge artifacts caused by alignment. The Kalman-filter-based multi-frame fusion effectively reduces motion ghosting and better removes noise on moving objects while keeping them sharp, giving better, more efficient temporal denoising with stronger robustness. In addition, the improved raised cosine window significantly reduces computation and storage, and thus the running time of the algorithm.

In addition, an embodiment of the invention provides a noise reduction device for Raw domain images. Referring to figs. 1 and 9, the device includes a receiving module 10, a screening module 20, an alignment module 30 and a fusion module 40. The receiving module 10 receives the input multi-frame Raw image, consisting of multiple Bayer original domain Raw frames captured under the same exposure. The screening module 20 selects one Raw frame as the reference frame, with the other frames serving as frames to be aligned. The alignment module 30 performs multi-frame alignment on the Raw frames using an alignment algorithm based on the fast Fourier transform, obtaining the positions of the aligned blocks in all frames to be aligned relative to the reference frame. The fusion module 40 performs multi-frame fusion on the Raw frames using a fusion method based on Kalman filtering, obtaining the noise-reduced output image.

Referring to figs. 2 and 9, the device may further include a determining module 50. Before the input multi-frame Raw image is received, the determining module 50 determines from the shooting scene the number of Bayer original domain Raw frames to capture under the same exposure. Performing scene detection before processing to set the frame count prevents too few frames from degrading the noise reduction effect and too many frames from adding unnecessary computation.

When determining, from the shooting scene, the number of Bayer original domain Raw frames to capture under the same exposure, the determining module 50 may proceed as follows: divide the BV range into segments in advance; preset, for each BV segment, an underexposure gray threshold, an underexposed-area proportion threshold, an overexposure gray threshold and an overexposed-area proportion threshold; receive the detected ambient brightness BV value of the shooting scene and determine the BV segment to which it belongs; using the underexposure and overexposure gray thresholds of that segment, calculate the underexposed-area and overexposed-area proportions in each of the R, G and B channels, and select the maximum underexposed-area proportion and the maximum overexposed-area proportion among the three channels; and determine the number of Bayer original domain Raw frames to capture under the same exposure from these maxima and the segment's underexposed-area and overexposed-area proportion thresholds. Because scene detection considers dynamic range as well as brightness when classifying scenes, the frame count is determined more accurately, which improves the noise reduction effect.

The screening module 20 may select the sharpest Raw frame as the reference frame, to avoid blur caused by hand shake and scene motion. Further, the screening module 20 may select the sharpest frame from among the first 3 frames only, to reduce the influence of shutter lag.

When the alignment module 30 performs multi-frame alignment on the Raw frames using the alignment algorithm based on the fast Fourier transform, to obtain the positions of all aligned blocks in the frames to be aligned relative to the reference frame, the following steps may be used:

preprocessing a multi-frame Raw image;

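When the alignment module 30 obtains the offset by minimizing the distance measurement D_p(u, v) with p = 2, it may first calculate D₂ using box filtering and convolution as follows:

$$D_2 = \lVert T \rVert_2^2 + \operatorname{box}(I \circ I, n) - 2\,\mathcal{F}^{-1}\left\{\mathcal{F}\{I\} \circ \mathcal{F}\{T\}^{*}\right\}$$

where the first term is the sum of squares of the elements of T, the second term is the sum of squares of the elements of I filtered with an unnormalized n×n box filter, and the third term is proportional to the cross-correlation of I and T. It then determines the integer displacement $(\hat u, \hat v)$ that minimizes the displacement error, which improves computational efficiency.

When determining the integer displacement $(\hat u, \hat v)$ that minimizes the displacement error, the alignment module 30 may first fit the bivariate polynomial

$$\tilde D_2(\mathbf{u}) = \tfrac{1}{2}\,\mathbf{u}^{\top} A\,\mathbf{u} + \mathbf{b}^{\top}\mathbf{u} + c$$

to the 3×3 window of D₂ around $(\hat u, \hat v)$, performing the fit by weighted least squares.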

When the fusion module 40 performs multi-frame fusion on the Raw frames using the Kalman-filter-based fusion method to obtain the noise-reduced output image, the following steps may be used: take each Raw frame in turn as the current frame and apply Kalman filtering block by block to the current frame and its preceding frame, where the frame preceding the first frame is the last frame of the sequence; then fuse the reference frame and the other aligned frames according to preset weights to obtain the noise-reduced output image. This effectively removes temporal noise while preserving as much image detail as possible.

It should be noted that each of the determining module 50, receiving module 10, screening module 20, alignment module 30 and fusion module 40 is a functional module comprising not only software code that implements the corresponding function but also a storage medium storing that code and a processor executing it.

In addition, an embodiment of the invention provides a camera comprising an image acquisition module and any of the above noise reduction devices for Raw domain images, communicatively connected to the image acquisition module. The image acquisition module may be a camera module, and the noise reduction device may be a functional device, integrating software algorithms and hardware, that is communicatively connected to the camera module.

Furthermore, an embodiment of the invention provides a terminal comprising any of the above cameras. The terminal may be a mobile phone, a notebook computer, a tablet computer or another terminal.

The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims

1. A method for denoising a Raw domain image, comprising:

receiving an input multi-frame Raw image, wherein the multi-frame Raw image is a multi-frame Bayer original domain Raw image grabbed under the same exposure;

selecting one frame of Raw image from the multi-frame Raw images as a reference frame, and other frames of Raw images as frames to be aligned;

performing multi-frame alignment processing on the multi-frame Raw image by adopting an alignment algorithm based on fast Fourier transform to obtain position information of aligned blocks in all frames to be aligned and the reference frame;

and carrying out multi-frame fusion processing on the multi-frame Raw image by adopting a fusion method based on Kalman filtering to obtain an output image after noise reduction.

2. The noise reduction method of claim 1, further comprising, prior to said receiving the input multi-frame Raw image:

confirming, according to the shooting scene, the number of frames of Bayer original domain Raw images to be captured under the same exposure.

3. The noise reduction method as set forth in claim 2, wherein confirming, according to the shooting scene, the number of frames of Bayer original domain Raw images to be captured under the same exposure comprises:

dividing different BV segments in advance;

presetting an underexposure gray threshold, an underexposed-area proportion threshold, an overexposure gray threshold and an overexposed-area proportion threshold for each BV segment;

receiving the detected ambient brightness BV value of the shooting scene, and determining the BV segment to which the ambient brightness BV value belongs;

calculating the underexposed-area proportion and the overexposed-area proportion in each of the R, G and B channels according to the underexposure gray threshold and the overexposure gray threshold of the BV segment to which the BV value belongs, and selecting the maximum underexposed-area proportion and the maximum overexposed-area proportion among the R, G and B channels;

determining the number of frames of Bayer original domain Raw images to be captured under the same exposure according to the underexposed-area proportion threshold and the overexposed-area proportion threshold of the BV segment to which the ambient brightness BV value belongs, together with the maximum underexposed-area proportion and the maximum overexposed-area proportion.

4. The method of noise reduction according to claim 1, wherein selecting a frame Raw image from the plurality of frames Raw image as a reference frame comprises:

and selecting a frame Raw image with the largest sharpness from the multi-frame Raw images as the reference frame.

5. The method of noise reduction according to claim 4, wherein selecting a frame Raw image from the plurality of frames Raw image as a reference frame further comprises:

and selecting a frame Raw image with the largest sharpness from the first 3 frames in the multi-frame Raw images as the reference frame.

6. The method of noise reduction according to claim 1, wherein said performing multi-frame alignment processing on the multi-frame Raw image using an alignment algorithm based on a fast Fourier transform to obtain position information of aligned blocks in all the frames to be aligned and the reference frame includes:

preprocessing the multi-frame Raw image;

calculating, for the reference frame and each frame to be aligned, the distance measurement D_p(u, v) of each block in the reference frame when aligned, according to the following formula:

$$D_p(u, v) = \sum_{y=0}^{n-1}\sum_{x=0}^{n-1}\left|T(x, y) - I(x + u + u_0,\; y + v + v_0)\right|^{p}$$

wherein T(x, y) is the block at coordinates (x, y) in the reference frame, I is a search area of the frame to be aligned, p is a power of a norm for alignment, n is a block size, and (u₀, v₀) is an initial alignment position inherited from the coarsest layer of the pyramid;

obtaining the position information of all the blocks aligned in the frame to be aligned and the reference frame based on the distance measurement D_p(u, v) of each block in the reference frame when aligned.

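7. The method of noise reduction according to claim 6, wherein minimizing the distance measurement D_p(u, v) to obtain the offset comprises:

when p = 2, calculating D₂ using box filtering and convolution as follows:

$$D_2 = \lVert T \rVert_2^2 + \operatorname{box}(I \circ I, n) - 2\,\mathcal{F}^{-1}\left\{\mathcal{F}\{I\} \circ \mathcal{F}\{T\}^{*}\right\}$$

wherein the first term is the sum of squares of the elements of T, the second term is the sum of squares of the elements of I filtered with an unnormalized n×n box filter, and the third term is proportional to the cross-correlation of I and T; and

determining the integer displacement $(\hat u, \hat v)$ that minimizes the displacement error.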

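8. The method of noise reduction according to claim 7, wherein determining the integer displacement $(\hat u, \hat v)$ that minimizes the displacement error comprises:

fitting the bivariate polynomial

$$\tilde D_2(\mathbf{u}) = \tfrac{1}{2}\,\mathbf{u}^{\top} A\,\mathbf{u} + \mathbf{b}^{\top}\mathbf{u} + c$$

to the window of D₂ around $(\hat u, \hat v)$ by weighted least squares, to minimize the displacement error.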

9. The noise reduction method according to claim 1, wherein the performing a multi-frame fusion process on the multi-frame Raw image by using a fusion method based on Kalman filtering to obtain a noise-reduced output image includes:

sequentially selecting one frame of the multi-frame Raw images as the current Raw frame, and processing each block of the current Raw frame and of the Raw frame preceding it by using Kalman filtering, wherein the Raw frame preceding the first frame of the multi-frame Raw images is the last frame of the multi-frame Raw images;

and fusing the reference frame with other frames to be aligned according to preset weights to obtain the noise-reduced output image.

10. A noise reduction device for a Raw domain image, comprising:

the receiving module is used for receiving an input multi-frame Raw image, wherein the multi-frame Raw image is a multi-frame Bayer original domain Raw image grabbed under the same exposure;

the screening module is used for selecting one frame of Raw image from the multi-frame Raw images as a reference frame, and other frames of Raw images as frames to be aligned;

the alignment module is used for carrying out multi-frame alignment processing on the multi-frame Raw image by adopting an alignment algorithm based on fast Fourier transform to obtain the position information of the aligned blocks in all the frames to be aligned and the reference frame;

And the fusion module is used for carrying out multi-frame fusion processing on the multi-frame Raw image by adopting a fusion method based on Kalman filtering to obtain an output image after noise reduction.

11. A camera, comprising:

an image acquisition module;

the noise reduction device for the Raw domain image of claim 10 in communication with the image acquisition module.

12. A terminal, comprising: the camera of claim 11.