CN114070960A - Noise reduction method and device for video - Google Patents

Noise reduction method and device for video

Info

Publication number
CN114070960A
CN114070960A
Authority
CN
China
Prior art keywords
neural network
noise reduction
video frames
foreground
foreground video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111622158.9A
Other languages
Chinese (zh)
Inventor
马强
程志威
周少华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Industrial Park Zhizai Tianxia Technology Co ltd
Original Assignee
Suzhou Industrial Park Zhizai Tianxia Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Industrial Park Zhizai Tianxia Technology Co ltd filed Critical Suzhou Industrial Park Zhizai Tianxia Technology Co ltd
Priority to CN202111622158.9A priority Critical patent/CN114070960A/en
Publication of CN114070960A publication Critical patent/CN114070960A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 Details of television systems
    • H04N 5/14 Picture signal circuitry for video frequency region
    • H04N 5/21 Circuitry for suppressing or minimising disturbance, e.g. moiré or halo
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/70 Denoising; Smoothing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/90 Dynamic range modification of images or parts thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Picture Signal Circuits (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a noise reduction method and a device thereof for a video, wherein the noise reduction method comprises the following steps: receiving T video frames arranged in time sequence, each video frame consisting of M rows and N columns of pixels; decomposing the T video frames into T foreground video frames and T background video frames; and performing noise reduction processing on the T foreground video frames and the T background video frames, and then combining the denoised foreground and background frames into T denoised video frames. In summary, the noise reduction method accomplishes video noise reduction without relying on supervised learning.

Description

Noise reduction method and device for video
Technical Field
The invention relates to the technical field of video noise reduction, in particular to a noise reduction method and device for a video.
Background
In practice, it is difficult to collect clean video (video data without noise pollution), and in some cases such data cannot be obtained at all, so video denoising methods based on supervised learning are generally difficult to apply in practice. Many video noise reduction techniques based on supervised learning have been proposed in the prior art, but because supervised learning inherently has the above disadvantage, these techniques cannot be used on a large scale. It is therefore necessary to design a noise reduction method that does not require supervised learning.
Disclosure of Invention
In view of the above, the present invention provides a method and an apparatus for reducing noise in a video.
In order to achieve the purpose, the technical scheme of the invention is realized as follows: a method of noise reduction for video, comprising the steps of: receiving T video frames Y_1, Y_2, ..., Y_T arranged in time sequence, each video frame Y_i consisting of M rows and N columns of pixels, where T, M, N and i are natural numbers, T ≥ 2 and 1 ≤ i ≤ T; creating T foreground video frames F_1, F_2, ..., F_T and T background video frames B_1, B_2, ..., B_T, with F_1 = 0 and B_1 = Y_1; processing video frame Y_1 based on partialSVD, i.e. U_1, X_1, V_1 = partialSVD(B_1, r), where U_1 is a matrix of M rows and M columns, V_1 is a matrix of N rows and N columns, X_1 is a diagonal matrix with M rows and N columns, and r is a preset rank value; initializing m = 1 and t = 2, and repeatedly executing a first operation: increase the value of m by 1; compute U_t, X_t, V_t = incSVD(Y_t, U_{t-1}, X_{t-1}, V_{t-1}); execute the second operation n times; when m is greater than or equal to a preset threshold R, execute dwnSVD(1, U_t, X_t, V_t) and the update shown in formula image BDA0003438486540000011, then set m = 1; finally increase the value of t by 1. The second operation specifically includes: B_t = U_t' · X_t · V_t^T, where U_t' is the submatrix formed by the M rows and columns 1 to r of U_t and V_t^T is the transpose of V_t; F_t = sgn(Y_t − B_t) · max(0, |Y_t − B_t| − γ), where γ is a preset parameter and sgn() is the sign function; and when the second operation has been executed n times, execute U_t, X_t, V_t = repSVD(Y_t, F_t, U_t, X_t, V_t). The method further includes: performing noise reduction processing on the T foreground video frames F_1, F_2, ..., F_T and on the T background video frames B_1, B_2, ..., B_T, and then combining the denoised T foreground video frames and T background video frames into T denoised video frames.
As an improvement of the embodiment of the present invention, the step of "performing noise reduction processing on the T foreground video frames F_1, F_2, ..., F_T and on the T background video frames B_1, B_2, ..., B_T" specifically includes: obtaining a preset lightweight neural network, inputting the T foreground video frames F_1, F_2, ..., F_T into the lightweight neural network for noise reduction processing, and inputting the T background video frames B_1, B_2, ..., B_T into the lightweight neural network for noise reduction; the lightweight neural network is trained from a plurality of clean images and noise images matched with the clean images; the lightweight neural network includes one of: a SqueezeNet, ShuffleNet, MnasNet, MobileNet, CondenseNet, ESPNet, ChannelNets, PeleeNet, IGC, FBNet, EfficientNet, GhostNet, WeightNet, MicroNet or U-Net neural network.
As an improvement of the embodiment of the present invention, the "training of the lightweight neural network by a plurality of clean images and noise images matched with the clean images" specifically includes: inputting a plurality of noise images into a deep neural network D_θ, which outputs a clean image matching each noise image, and then training the lightweight neural network based on the plurality of clean images and the noise images matched with them.
As an improvement of the embodiment of the present invention, the step of "performing noise reduction processing on the T foreground video frames F_1, F_2, ..., F_T" specifically includes: dividing the T foreground video frames F_1, F_2, ..., F_T into a plurality of foreground video frame groups, each foreground video frame group comprising a plurality of foreground video frames, and performing noise reduction processing on the plurality of foreground video frames in each foreground video frame group based on an optical flow method.
As an improvement of the embodiment of the present invention, the "performing noise reduction processing on a plurality of foreground video frames in each foreground video frame group based on an optical flow method" specifically includes: and performing the following processing on each foreground video frame group: selecting a key foreground video frame from the foreground video frame group, calculating optical flow information of all foreground video frames in the foreground video frame group relative to the key foreground video frame, performing registration processing on all foreground video frames relative to the key foreground video frame based on the optical flow information, and then performing noise reduction processing on all foreground video frames after registration.
The embodiment of the invention also provides a noise reduction device for video, which comprises the following modules: a data acquisition module for receiving T video frames Y_1, Y_2, ..., Y_T arranged in time sequence, each video frame Y_i consisting of M rows and N columns of pixels, where T, M, N and i are natural numbers, T ≥ 2 and 1 ≤ i ≤ T; an initialization module for creating T foreground video frames F_1, F_2, ..., F_T and T background video frames B_1, B_2, ..., B_T, with F_1 = 0 and B_1 = Y_1, and for processing video frame Y_1 based on partialSVD, i.e. U_1, X_1, V_1 = partialSVD(B_1, r), where U_1 is a matrix of M rows and M columns, V_1 is a matrix of N rows and N columns, X_1 is a diagonal matrix with M rows and N columns, and r is a preset rank value; a foreground and background extraction module for initializing m = 1 and t = 2 and repeatedly executing a first operation: increase the value of m by 1; compute U_t, X_t, V_t = incSVD(Y_t, U_{t-1}, X_{t-1}, V_{t-1}); execute the second operation n times; when m is greater than or equal to a preset threshold R, execute dwnSVD(1, U_t, X_t, V_t) and the update shown in formula image BDA0003438486540000021, then set m = 1; finally increase the value of t by 1; the second operation specifically includes: B_t = U_t' · X_t · V_t^T, where U_t' is the submatrix formed by the M rows and columns 1 to r of U_t and V_t^T is the transpose of V_t; F_t = sgn(Y_t − B_t) · max(0, |Y_t − B_t| − γ), where γ is a preset parameter and sgn() is the sign function; and when the second operation has been executed n times, execute U_t, X_t, V_t = repSVD(Y_t, F_t, U_t, X_t, V_t); and a noise reduction module for performing noise reduction processing on the T foreground video frames F_1, F_2, ..., F_T and on the T background video frames B_1, B_2, ..., B_T, and then combining the denoised T foreground video frames and T background video frames into T denoised video frames.
As an improvement of the embodiment of the present invention, the noise reduction module is further configured to: obtain a preset lightweight neural network, input the T foreground video frames F_1, F_2, ..., F_T into the lightweight neural network for noise reduction processing, and input the T background video frames B_1, B_2, ..., B_T into the lightweight neural network for noise reduction; the lightweight neural network is trained from a plurality of clean images and noise images matched with the clean images; the lightweight neural network includes one of: a SqueezeNet, ShuffleNet, MnasNet, MobileNet, CondenseNet, ESPNet, ChannelNets, PeleeNet, IGC, FBNet, EfficientNet, GhostNet, WeightNet, MicroNet or U-Net neural network.
As an improvement of the embodiment of the present invention, the noise reduction module is further configured to: input a plurality of noise images into a deep neural network D_θ, which outputs a clean image matching each noise image, and then train the lightweight neural network based on the plurality of clean images and the noise images matched with them.
As an improvement of the embodiment of the present invention, the noise reduction module is further configured to: divide the T foreground video frames F_1, F_2, ..., F_T into a plurality of foreground video frame groups, each foreground video frame group comprising a plurality of foreground video frames, and perform noise reduction processing on the plurality of foreground video frames in each foreground video frame group based on an optical flow method.
As an improvement of the embodiment of the present invention, the noise reduction module is further configured to: and performing the following processing on each foreground video frame group: selecting a key foreground video frame from the foreground video frame group, calculating optical flow information of all foreground video frames in the foreground video frame group relative to the key foreground video frame, performing registration processing on all foreground video frames relative to the key foreground video frame based on the optical flow information, and then performing noise reduction processing on all foreground video frames after registration.
The method and the device for reducing the noise of a video provided by the embodiment of the invention have the following advantages: the noise reduction method comprises receiving T video frames arranged in time sequence, each video frame consisting of M rows and N columns of pixels; decomposing the T video frames into T foreground video frames and T background video frames; performing noise reduction processing on the T foreground video frames and the T background video frames; and then combining the denoised foreground and background frames into T denoised video frames. In summary, the noise reduction method accomplishes video noise reduction without relying on supervised learning.
Drawings
Fig. 1 is a schematic flowchart of a noise reduction method for video according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a noise reduction method for video according to an embodiment of the present invention;
fig. 3, 4, 5 and 6 are graphs of experimental results of a noise reduction method for a video according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to embodiments shown in the drawings. The present invention is not limited to the embodiment, and structural, methodological, or functional changes made by one of ordinary skill in the art according to the embodiment are included in the scope of the present invention.
The following description and the drawings sufficiently illustrate specific embodiments herein to enable those skilled in the art to practice them. Portions and features of some embodiments may be included in or substituted for those of others. The scope of the embodiments herein includes the full ambit of the claims, as well as all available equivalents of the claims. The terms "first," "second," and the like, herein are used solely to distinguish one element from another without requiring or implying any actual such relationship or order between such elements. In practice, a first element can also be referred to as a second element, and vice versa. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a structure, apparatus, or device that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such structure, apparatus, or device. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other like elements in a structure, device or apparatus that comprises the element. The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
The terms "longitudinal," "lateral," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like herein, as used herein, are defined as orientations or positional relationships based on the orientation or positional relationship shown in the drawings, and are used for convenience in describing and simplifying the description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus should not be construed as limiting the present invention. In the description herein, unless otherwise specified and limited, the terms "mounted," "connected," and "connected" are to be construed broadly, and may include, for example, mechanical or electrical connections, communications between two elements, direct connections, and indirect connections via intermediary media, where the specific meaning of the terms is understood by those skilled in the art as appropriate.
An embodiment of the present invention provides a method for reducing noise of a video, as shown in fig. 1, including the following steps:
step 101: receiving T video frames Y_1, Y_2, ..., Y_T arranged in time sequence, each video frame Y_i consisting of M rows and N columns of pixels; wherein T, M, N and i are natural numbers, T ≥ 2 and 1 ≤ i ≤ T;
step 102: creating T foreground video frames F_1, F_2, ..., F_T and T background video frames B_1, B_2, ..., B_T, with F_1 = 0 and B_1 = Y_1; processing video frame Y_1 based on partialSVD, i.e. U_1, X_1, V_1 = partialSVD(B_1, r), where U_1 is a matrix of M rows and M columns, V_1 is a matrix of N rows and N columns, X_1 is a diagonal matrix with M rows and N columns, and r is a preset rank value;
step 103: initializing m = 1 and t = 2, and repeatedly executing a first operation: increase the value of m by 1; compute U_t, X_t, V_t = incSVD(Y_t, U_{t-1}, X_{t-1}, V_{t-1}); execute the second operation n times; when m is greater than or equal to a preset threshold R, execute dwnSVD(1, U_t, X_t, V_t) and the update shown in formula image BDA0003438486540000051, then set m = 1; finally increase the value of t by 1; the second operation specifically includes: B_t = U_t' · X_t · V_t^T, where U_t' is the submatrix formed by the M rows and columns 1 to r of U_t and V_t^T is the transpose of V_t; F_t = sgn(Y_t − B_t) · max(0, |Y_t − B_t| − γ), where γ is a preset parameter and sgn() is the sign function; and when the second operation has been executed n times, execute U_t, X_t, V_t = repSVD(Y_t, F_t, U_t, X_t, V_t);
step 104: performing noise reduction processing on the T foreground video frames F_1, F_2, ..., F_T and on the T background video frames B_1, B_2, ..., B_T, and then combining the denoised T foreground video frames and T background video frames into T denoised video frames.
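For illustration, the following Python sketch approximates steps 102-104 with NumPy, replacing the incremental variants (incSVD, dwnSVD, repSVD) by a plain truncated SVD recomputed over the whole clip; `denoise_fg` and `denoise_bg` are hypothetical placeholders for the foreground and background denoisers described below, and `rank_r` and `gamma` correspond to r and γ above. This is a minimal offline sketch, not the patent's online frame-by-frame recursion.

```python
import numpy as np

def soft_threshold(x, gamma):
    # F_t = sgn(x) * max(0, |x| - gamma): the sparse foreground update above
    return np.sign(x) * np.maximum(0.0, np.abs(x) - gamma)

def decompose_and_denoise(Y, rank_r, gamma, denoise_fg, denoise_bg):
    """Y: (T, M, N) noisy frames. Returns denoised frames of the same shape.

    Offline stand-in for steps 102-104: the low-rank background comes from a
    rank-r truncated SVD of the reshaped video (instead of the incSVD/dwnSVD/
    repSVD recursion), and the sparse foreground from soft-thresholding.
    """
    T, M, N = Y.shape
    Ymat = Y.reshape(T, M * N).T                     # (MN, T): one column per frame
    U, s, Vt = np.linalg.svd(Ymat, full_matrices=False)
    B = (U[:, :rank_r] * s[:rank_r]) @ Vt[:rank_r]   # low-rank background
    F = soft_threshold(Ymat - B, gamma)              # sparse foreground
    B = B.T.reshape(T, M, N)
    F = F.T.reshape(T, M, N)
    return denoise_fg(F) + denoise_bg(B)             # recombine after denoising

# Usage with identity placeholders for the two denoisers:
Y = np.random.rand(10, 64, 64)
X_hat = decompose_and_denoise(Y, rank_r=1, gamma=0.1,
                              denoise_fg=lambda f: f, denoise_bg=lambda b: b)
```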
Here, a clean video data sample X can be understood as X ∈ R^{T×C×M×N}, where T is the number of video frames, C is the number of color channels of a video frame (e.g., 3 channels for RGB and Lab images, 4 channels for CMYK images, etc.), and the size of each video frame is M × N (i.e., M rows and N columns of pixels).
The video noise reduction problem can be stated by the formula Y_t = X_t + N_t, where t ∈ {1, ..., T}, Y_t ∈ R^{C×M×N} is the t-th contaminated video frame, X_t ∈ R^{C×M×N} is the t-th clean video frame, and N_t ∈ R^{C×M×N} is noise. Different noise types follow different random distributions: (1) for additive Gaussian noise, the noisy frame Y_t obeys Y_{t,m,n} ~ N(X_{t,m,n}, σ²), where t ∈ {1, ..., T}, m ∈ {1, ..., M}, n ∈ {1, ..., N}, σ is the standard deviation, and the expectation of Y satisfies E[Y] = X; (2) for Poisson noise, the noisy video frame Y_t obeys the Poisson distribution P(Y_{t,m,n} = k) = λ^k e^{−λ} / k!, where λ is the average event count and the expectation of Y satisfies E[Y] = λ; the variance of Poisson noise is signal dependent, i.e., it varies with the original signal value.
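As a concrete illustration of the two noise models, the sketch below adds Gaussian noise of standard deviation σ and signal-dependent Poisson noise to a clean frame; the `peak` rescaling used for the Poisson case is a common simulation convention and an assumption here, not something the text specifies.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian(X, sigma):
    # Y ~ N(X, sigma^2): additive noise, E[Y] = X
    return X + rng.normal(0.0, sigma, X.shape)

def add_poisson(X, peak):
    # Scale the clean signal so its maximum event count is `peak`,
    # draw Y ~ Poisson(lambda), then scale back.
    # Variance equals the mean, so the noise is signal dependent.
    lam = X / X.max() * peak
    return rng.poisson(lam) * (X.max() / peak)

X = rng.random((64, 64)) * 255.0   # toy clean frame
Yg = add_gaussian(X, sigma=50)
Yp = add_poisson(X, peak=75)
```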
Step 102 and step 103 are responsible for video decomposition (also called background modeling), which aims at decomposing a video Y into a low-rank background B ∈ R^{T×C×M×N} and a sparse foreground F ∈ R^{T×C×M×N}. They attempt to solve the following optimization problem:

min_{B,F} rank(B) + γ‖F‖₁  s.t.  Y = B + F,

where Y ∈ R^{CMN×T} is the reshaped input video whose t-th column is vec(Y_t), B, F ∈ R^{CMN×T} are the low-rank matrix and the sparse matrix, and γ is a positive parameter. However, the rank operator in the above formula is non-differentiable and discontinuous. A commonly used convex relaxation of this problem is Robust Principal Component Analysis (RPCA), which replaces the rank operator with the convex nuclear norm to obtain: min_{B,F} ‖B‖_* + γ‖F‖₁ s.t. Y = B + F. Although many algorithms have been proposed for solving the RPCA problem, most of them require a batch or even the entire video to be fed into the algorithm as input; such batch operation is very time consuming and inadequate for practical application scenarios requiring real-time performance.
The video decomposition method in the embodiment of the invention is a fast online background modeling algorithm that can decompose the video in real time, frame by frame. The algorithm considers the following relaxation of the problem above:

min_{B,F} ½‖B + F − Y‖_F² + γ‖F‖₁  s.t.  rank(B) ≤ r.

The iterative algorithm in step 103 solves this problem. The algorithm finally returns a low-rank matrix B and a sparse matrix F, which are further reshaped into the background B and the foreground F of the video.
Here, partialSVD, incSVD, repSVD and downSVD are all variants of Singular Value Decomposition (SVD).
SVD (singular value decomposition): assuming Matrix is an m × n matrix, SVD decomposes it as Matrix = U X V^T, where U is an m × m matrix, V^T is the transpose of the n × n matrix V, and X is an m × n diagonal matrix whose diagonal elements are called singular values.
partialSVD, also known as thin SVD (in Rodriguez & Wohlberg 2016, partialSVD is named thinSVD; it is actually a truncated SVD), is an SVD variant that reduces time and space complexity. It decomposes an m × n matrix as Matrix ≈ U_k X_k V_k^T, where U_k is an m × k matrix, V_k^T is a k × n matrix, X_k is a k × k diagonal matrix, and k is any integer. Compared with the ordinary SVD, partialSVD has lower computational cost and higher computation speed.
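A minimal NumPy sketch of partialSVD as described above: keep only the first k singular triplets of the decomposition (a practical implementation would avoid forming the full SVD, e.g. via randomized or Lanczos methods):

```python
import numpy as np

def partial_svd(matrix, k):
    """Rank-k truncated SVD: matrix ~= U_k @ X_k @ V_k.T."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k], np.diag(s[:k]), Vt[:k].T   # U_k (m,k), X_k (k,k), V_k (n,k)

A = np.random.rand(100, 80)
Uk, Xk, Vk = partial_svd(A, k=5)
print(np.linalg.norm(A - Uk @ Xk @ Vk.T))       # rank-5 approximation error
```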
incSVD, short for incremental SVD: given the known singular value decomposition Matrix = U₀ X₀ V₀^T of an m × n matrix (where X₀ is an r × r matrix) and an m-dimensional vector d, incSVD can quickly find the singular value decomposition [Matrix, d] = U₁ X₁ V₁^T of the m × (n+1) matrix [Matrix, d], where X₁ is an (r+1) × (r+1) matrix. In this context, if Matrix is an existing n-frame video and the vector d is a new video frame, incSVD can directly update the original decomposition U₀ X₀ V₀^T to the singular value decomposition of the (n+1)-frame video [Matrix, d].
dwnSVD (downSVD) is the reverse of incSVD: given the known singular value decomposition [Matrix, d] = U₁ X₁ V₁^T, dwnSVD can quickly find the singular value decomposition Matrix = U₀ X₀ V₀^T of Matrix.
repSVD, short for replace SVD, is similar to the previous two variants: for an m × n matrix Matrix, an m-dimensional vector a and a vector b, given the known singular value decomposition [Matrix, a] = U₀ X₀ V₀^T, repSVD can quickly find the singular value decomposition [Matrix, b] = U₁ X₁ V₁^T.
The use of the SVD variant described above greatly reduces the run time required for the algorithm.
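Since the text only names the operations, the following sketch reconstructs incSVD as the classical rank-one column update (in the style of Brand's incremental SVD); it is an illustrative assumption, not the patent's exact routine:

```python
import numpy as np

def inc_svd(U, X, V, d):
    """Given Matrix = U @ X @ V.T (U: m*k, X: k*k, V: n*k) and a new
    column d (m,), return SVD factors of [Matrix, d]."""
    p = U.T @ d                    # projection of d onto the current subspace
    r = d - U @ p                  # residual orthogonal to the subspace
    rho = np.linalg.norm(r)
    j = r / rho if rho > 1e-12 else np.zeros_like(r)
    k = X.shape[0]
    # Small (k+1)x(k+1) core matrix whose SVD yields the update
    K = np.zeros((k + 1, k + 1))
    K[:k, :k] = X
    K[:k, k] = p
    K[k, k] = rho
    Uk, sk, Vkt = np.linalg.svd(K)
    U_new = np.hstack([U, j[:, None]]) @ Uk
    V_pad = np.zeros((V.shape[0] + 1, k + 1))
    V_pad[:-1, :k] = V
    V_pad[-1, k] = 1.0
    V_new = V_pad @ Vkt.T
    return U_new, np.diag(sk), V_new

# Usage: start from the thin SVD of an n-frame matrix, then append one frame.
A = np.random.rand(50, 10)
d = np.random.rand(50)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U2, X2, V2 = inc_svd(U, np.diag(s), Vt.T, d)
print(np.linalg.norm(U2 @ X2 @ V2.T - np.hstack([A, d[:, None]])))  # ~0
```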
Video decomposition has two advantages for noise reduction: (1) after decomposition, i.e. the video is decomposed into foreground and background, the noise on the video frame is also divided into two parts: background noise and foreground noise. Since the background is always a low rank matrix in time, the energy of the noise related to the background can be considered very low, otherwise the low rank assumption will not hold; this will make the background have a high signal to noise ratio and make the noise reduction task simpler, which is also the finding in experiments, as shown in fig. 2. (2) Most of noise energy in the original picture is reserved in the foreground, and the foreground contains a plurality of sparse moving objects, so that the noise of the foreground is separated from the image content to a great extent, and the noise of the foreground is removed more easily.
Given the advantages based on the low-rank and sparsity assumptions, video decomposition yields a better result. Given F, B = incPCP(Y), let F̂ and B̂ denote the foreground and background after noise reduction. The denoised video can thus be defined as X̂ = F̂ + B̂. Based on the foreground F̂ and the known sparsity of the video, X̂ can be estimated as:

X̂ = M ⊙ (F̂ + B̂) + (1 − M) ⊙ B̂,

where M ∈ R^{T×C×M×N} is a mask of the non-zero elements in F̂ and ⊙ denotes element-wise multiplication between matrices. Since the foreground is sparse, assume ‖M‖_F << ‖1 − M‖_F, where 1 ∈ R^{T×C×M×N} is a tensor whose elements are all 1.
Given a clean video X, let X̃ denote the noise reduction result of the noisy video without the decomposition operation. Based on the low-rank assumption, the background contains low-energy noise, which means the background can be denoised to a better quality than the original video. The noise reduction results X̂ and X̃ can be measured by ‖X̂ − X‖_F and ‖X̃ − X‖_F. Together with the sparsity assumption ‖M‖_F << ‖1 − M‖_F, which holds under the low-rank hypothesis, it can be derived that ‖X̂ − X‖_F ≤ ‖X̃ − X‖_F. This means that decomposing the video yields a better result.
In this embodiment, the step of "performing noise reduction processing on the T foreground video frames F_1, F_2, ..., F_T and on the T background video frames B_1, B_2, ..., B_T" specifically includes:
obtaining a preset lightweight neural network, inputting the T foreground video frames F_1, F_2, ..., F_T into the lightweight neural network for noise reduction processing, and inputting the T background video frames B_1, B_2, ..., B_T into the lightweight neural network for noise reduction; the lightweight neural network is trained from a plurality of clean images and noise images matched with the clean images;
the lightweight neural network includes one of: a SqueezeNet, ShuffleNet, MnasNet, MobileNet, CondenseNet, ESPNet, ChannelNets, PeleeNet, IGC, FBNet, EfficientNet, GhostNet, WeightNet, MicroNet or U-Net neural network.
Here, the lightweight neural network is trained from a plurality of clean images and the noise images matched with them, so that the lightweight network can learn the corresponding mapping. By using a high-precision but time-consuming network to guide the training of the lightweight network, the noise reduction quality of the high-precision network and the speed advantage of the lightweight network can be retained at the same time.
In this embodiment, the training of the lightweight neural network from a plurality of clean images and noise images matched with the clean images specifically includes: inputting a plurality of noise images into a deep neural network D_θ, which outputs a clean image matching each noise image, and then training the lightweight neural network based on the plurality of clean images and the noise images matched with them.
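A minimal PyTorch-style sketch of this distillation step, assuming a pretrained, frozen high-precision teacher `D_theta` and a lightweight `student` (both hypothetical module names); the teacher's outputs serve as pseudo-clean training targets:

```python
import torch

def distill_step(D_theta, student, optimizer, noisy_batch):
    """One training step: the frozen teacher D_theta turns noisy images into
    pseudo-clean targets, and the lightweight student regresses onto them."""
    with torch.no_grad():
        pseudo_clean = D_theta(noisy_batch)   # "clean image matching each noise image"
    denoised = student(noisy_batch)
    loss = torch.nn.functional.mse_loss(denoised, pseudo_clean)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```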
Here, most existing self-supervised noise reduction methods, such as Noise2Void and Noise2Self, are inspired by the blind-spot idea: a portion of pixels is first randomly masked out of an input noise image, with the masked pixel values set to zero or to substitute values. Through training, a deep neural network D_θ with learnable parameters θ is made to predict these missing pixels. The input to the network is the masked image and the output is the original noise image. In this way, the method learns a denoising map from the noise data rather than an identity map.
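A sketch of the blind-spot idea in PyTorch: mask a random fraction of pixels, predict them from the remaining context, and compute the loss only at the masked positions. The masking ratio and zero replacement value are assumptions for illustration:

```python
import torch

def blind_spot_step(net, optimizer, noisy, mask_ratio=0.03):
    """noisy: (B, C, H, W). Mask random pixels and train net to predict them."""
    mask = (torch.rand_like(noisy[:, :1]) < mask_ratio).float()  # shared across channels
    masked_input = noisy * (1 - mask)          # masked pixel values set to zero
    pred = net(masked_input)
    # Self-supervised loss: only the masked (blind-spot) pixels contribute,
    # so the network cannot learn the identity map.
    loss = ((pred - noisy) ** 2 * mask).sum() / mask.sum().clamp(min=1)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```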
In this embodiment, the step of "performing noise reduction processing on the T foreground video frames F_1, F_2, ..., F_T" specifically includes: dividing the T foreground video frames F_1, F_2, ..., F_T into a plurality of foreground video frame groups, each foreground video frame group comprising a plurality of foreground video frames, and performing noise reduction processing on the plurality of foreground video frames in each foreground video frame group based on an optical flow method.
Here, since the foreground pictures mainly contain sparse content, they are well suited to multi-frame noise reduction based on the optical flow method. Therefore, after single-frame denoising of the foreground video, a certain frame is first selected as the key frame and the optical flow images of several consecutive frames before and after it are computed; the optical flow information is then used to register the adjacent frames to the key frame, which serves as the reference frame; finally, the registered consecutive multi-frame images are weighted and averaged.
In this embodiment, the "performing noise reduction processing on a plurality of foreground video frames in each foreground video frame group based on an optical flow method" specifically includes: and performing the following processing on each foreground video frame group: selecting a key foreground video frame from the foreground video frame group, calculating optical flow information of all foreground video frames in the foreground video frame group relative to the key foreground video frame, performing registration processing on all foreground video frames relative to the key foreground video frame based on the optical flow information, and then performing noise reduction processing on all foreground video frames after registration.
Here, optical flow refers to the motion displacement of an object: the motion field in space, projected onto the image, is expressed as an optical flow field. Optical flow can be computed by matching and similar techniques based on the assumptions of constant brightness and temporal continuity (or "small motion"), for example with the Horn-Schunck algorithm or the Lucas-Kanade (LK) optical flow method. Neural-network-based methods use motion videos paired with corresponding synthetic optical flow maps as training data for supervised training, for example PWC-Net and FlowNet. Once the optical flow is obtained, the neighboring frames can be registered to obtain the temporal noise reduction result for the intermediate frame.
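The registration-and-average step can be sketched with OpenCV as below; Farneback flow is substituted for the estimators named above (Lucas-Kanade, PWC-Net, etc.) to keep the example self-contained, and a uniform average stands in for the weighted average:

```python
import cv2
import numpy as np

def register_to_key(frames, key_idx):
    """frames: list of (H, W) float32 foreground frames in [0, 255].
    Warp every frame onto the key frame using dense optical flow."""
    key8 = np.clip(frames[key_idx], 0, 255).astype(np.uint8)
    h, w = key8.shape
    gx, gy = np.meshgrid(np.arange(w), np.arange(h))
    registered = []
    for f in frames:
        f8 = np.clip(f, 0, 255).astype(np.uint8)
        # Flow from the key frame to f: f(x + flow(x)) ~ key(x)
        flow = cv2.calcOpticalFlowFarneback(key8, f8, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        map_x = (gx + flow[..., 0]).astype(np.float32)
        map_y = (gy + flow[..., 1]).astype(np.float32)
        registered.append(cv2.remap(f, map_x, map_y, cv2.INTER_LINEAR))
    return registered

def multi_frame_denoise(frames, key_idx):
    # Temporal average of the registered frames (uniform weights here,
    # standing in for the weighted average described in the text).
    return np.mean(register_to_key(frames, key_idx), axis=0)
```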
In practical experiments, the inventors evaluated the proposed noise reduction method using surveillance videos with Gaussian and Poisson noise as well as general videos. In addition, the proposed noise reduction method was evaluated on X-ray coronary angiography, i.e., coronary vessel videos with noise from a real acquisition scenario.
Surveillance video noise reduction
Among datasets related to surveillance video, the CDNet2014 dataset is used for training and testing. CDNet2014 is a dynamic surveillance dataset containing 53 videos of different kinds, covering dynamic backgrounds, camera jitter, bad weather, and so on. The videos are taken from relatively fixed camera views, which is very suitable for low-rank background extraction.
For training, the dataset was divided into 31 videos for training and 22 for testing. To use the framework of the noise reduction method in this embodiment, each training video is decomposed into foreground and background by the above-described noise reduction device implemented on a GPU (Graphics Processing Unit); the rank r of the background is set to 1 and the parameter γ of the noise reduction device is set to γ = 1/√(max(M, N)), where each frame is of size M × N. For each video, 200 frames are randomly chosen for training. Each frame is resized to 512 × 512 and noise is added to it. In the experimental part, the Gaussian noise level σ is set to 50, 75, 100 and the Poisson noise λ is set to 50, 75, 100. Two U-Nets are trained with the Noise2Self framework to denoise the foreground and the background respectively, using the Adam optimizer with a learning rate of 10⁻⁴. Each U-Net was trained for 100 epochs with 8 samples per batch. All experiments used a GeForce RTX 2080Ti GPU.
During testing, 22 CDNet2014 videos are used to test the performance of the noise reduction framework in this embodiment. 200 consecutive frames were extracted from each test video, and the foreground was guaranteed to be dynamic for a fair evaluation. The method is compared with several other deep noise reduction networks, including the convolutional blind-spot network (BSN), Noise2Void, Noise2Self, Noise2Truth, ViDeNN, and FastDVDnet, where BSN, Noise2Void and Noise2Self are self-supervised single-image noise reduction networks, Noise2Truth is a supervised learning method, and ViDeNN and FastDVDnet are state-of-the-art supervised video noise reduction networks. The Bayesian training part of the BSN model was discarded, as was the noise map input of FastDVDnet, for blind denoising and a fair comparison. For multi-frame noise reduction, a pre-trained PWC-Net is used to compute the optical flow. The parameter ρ used for calculating the weights β_k of the bilateral filter was set to ρ = 0, 0.02, 0.05 for the comparative experiments. All experiments were performed on an NVIDIA GeForce RTX 2080Ti GPU.
Restoration. PSNR and SSIM scores are computed between the denoised videos and their ground-truth values to measure performance. The comparison results are shown in fig. 3, where the best results are in bold.
Fig. 3 shows the PSNR (dB)/SSIM scores and the temporal consistency metric on the CDNet2014 dataset. The temporal consistency metric (the third number in a cell, where present) is provided only for Gaussian noise σ = 75 and Poisson noise λ = 75. The best score is shown in bold and the second-best score is underlined.
As shown in fig. 3, for all ρ = 0, 0.02, 0.05, the noise reduction framework in this embodiment outperforms all the self-supervised and supervised methods, achieving the best PSNR on both Gaussian and Poisson noise. Compared with existing self-supervised denoising methods including Noise2Self, the framework improves PSNR by about 3 dB, demonstrating the effectiveness of the video decomposition method. Likewise, the proposed framework achieves the best performance in terms of SSIM. The results show that the noise reduction method in the embodiment of the invention is robust across different noise levels and performs even better on noise with larger variance. Another finding is that Noise2Self achieves performance similar to Noise2Truth even though it uses no ground-truth data.
Ablation experiments. Three components of the noise reduction framework are studied through comparative experiments. First, without the multi-frame denoising step (i.e., ρ = 0), fig. 3 shows that the PSNR of the method in this embodiment is still higher than that of Noise2Self, indicating that video decomposition plays a crucial role in denoising. Second, without the single-frame denoising step, the optical flow computation is severely disturbed by noise, leading to inaccurate results. Third, comparing different values of the parameter ρ, multi-frame denoising brings a slight improvement when ρ = 0.02, while for ρ = 0.05 the PSNR drops due to over-smoothing.
Temporal consistency. Although multi-frame denoising contributes little to the PSNR and SSIM scores, it guarantees the temporal consistency of the denoised video. Given a video Y, the following Temporal Consistency (TC) metric is introduced:

TC(Y) = (1/(T−1)) Σ_{t=1}^{T−1} ‖W(Y_{t+1}, O_{t+1→t}) − Y_t‖_F / (M·N),

where W(·) is the registration function, O_{t+1→t} is the optical flow from the (t+1)-th frame to the t-th frame, and M, N are the size of a video frame. The TC metric thus measures the difference between every two consecutive frames in the video; the lower the TC metric, the better the temporal coherence. TC metrics computed on the CDNet2014 test set are also listed in fig. 3. As can be seen, the temporal consistency of the method in this embodiment is better than that of all the existing self-supervised and supervised noise reduction methods. The TC metric improves as the value of ρ increases, which verifies the necessity of multi-frame denoising from a quantitative point of view.
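A sketch of the TC metric under the reconstruction above, again using Farneback flow as the estimator (the experiments use PWC-Net) and cv2.remap as the registration function W(·):

```python
import cv2
import numpy as np

def temporal_consistency(frames):
    """frames: (T, H, W) uint8 video; lower TC means better temporal coherence.
    Averages the normalized difference between each frame and the next frame
    warped back onto it, matching the TC formula above."""
    T, h, w = frames.shape
    gx, gy = np.meshgrid(np.arange(w), np.arange(h))
    total = 0.0
    for t in range(T - 1):
        # Optical flow O_{t+1 -> t}, estimated here with Farneback
        flow = cv2.calcOpticalFlowFarneback(frames[t], frames[t + 1], None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        map_x = (gx + flow[..., 0]).astype(np.float32)
        map_y = (gy + flow[..., 1]).astype(np.float32)
        warped = cv2.remap(frames[t + 1], map_x, map_y, cv2.INTER_LINEAR)
        total += np.linalg.norm(warped.astype(np.float64) - frames[t]) / (h * w)
    return total / (T - 1)
```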
Runtime. The noise reduction framework in this embodiment satisfies the requirements of real-time processing. Fig. 4 shows the runtime of the framework at different video resolutions. As can be seen from fig. 4 (left), most of the running time is spent in multi-frame denoising (MFD), more precisely in the optical flow computation, which will be further accelerated as state-of-the-art optical flow frameworks develop. Video decomposition and single-frame denoising achieve high running speeds at all sizes. As shown in fig. 4 (right), the framework achieves 13.5 fps and 50.4 fps on 512 × 512 video, both of which satisfy real-time processing.
General video noise reduction
It is demonstrated that the noise reduction framework in this embodiment improves video denoising performance even in the absence of the low-rank background assumption. The method was tested on the Davis2017 test set using the model trained on the CDNet2014 dataset; most Davis2017 videos have dynamically moving backgrounds. Each video has approximately 80 frames and is resized to 768 × 512. PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural SIMilarity) scores are used as the evaluation indexes. The results are compared against the frameworks above, except BSN, which requires the frame width and height to be equal. The comparison results are shown in fig. 5. The method in this embodiment improves over single-frame N2S by 0.5 dB on average. Compared with supervised video denoising methods, its denoising results are superior to ViDeNN and FastDVDnet on Gaussian noise when the noise level σ = 75, 100. The experiments show that the framework achieves better performance even when the background is not strictly low rank.
Fig. 5 shows the PSNR (dB)/SSIM scores on the Davis2017 dataset. The best score is shown in bold and the second-best score is underlined.
X-ray coronary angiography noise reduction
Here, the effectiveness of the noise reduction framework in this embodiment in a practical application is shown. Consider the denoising of X-ray Coronary Angiography (XCA), i.e., videos of coronary vessels taken by X-rays with a fixed imaging geometry, satisfying the low-rank assumption. The low X-ray dose causes signal-dependent noise in the videos. The XCA dataset has 22 gray-level videos, divided into 16 videos for training and 6 for testing. 50 frames are extracted from each video for training, and the size of each frame is 1024 × 1024. The single-frame denoising method still uses Noise2Self and is trained for 10 epochs to avoid overfitting.
Since there is no ground-truth data, the contrast-to-noise ratio (CNR) of each frame in the test set, a blind image quality estimate, is computed to evaluate the denoising quality. The higher the CNR score, the better the denoising quality. The average CNR of the original noisy videos is CNR_input = 0.402; after single-frame denoising by Noise2Self it is CNR_Noise2Self = 0.671. The noise reduction framework in this embodiment (with ρ = 0.04) achieves the best performance, CNR_Noise2FgBg = 0.938, which verifies the superiority of the model in this application. Other supervised and self-supervised methods are not compared, since there is no clean data for supervision, and other self-supervised methods can be used directly inside the framework. Fig. 6 provides a comparison of visual quality; the denoising results of the method in this embodiment contain less noise, qualitatively demonstrating its advantage.
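The text does not define its CNR variant; a common contrast-to-noise formulation, given foreground (vessel) and background masks, is sketched below as an assumption:

```python
import numpy as np

def cnr(frame, fg_mask, bg_mask):
    """Contrast-to-noise ratio: contrast between vessel (foreground) and
    background regions, normalized by background noise. One common variant;
    the patent's exact definition is not specified."""
    fg = frame[fg_mask]
    bg = frame[bg_mask]
    return abs(fg.mean() - bg.mean()) / (bg.std() + 1e-8)
```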
The embodiment of the invention provides a noise reduction device for a video, which comprises the following modules:
a data acquisition module for receiving T video frames Y_1, Y_2, ..., Y_T arranged in time sequence, each video frame Y_i consisting of M rows and N columns of pixels; wherein T, M, N and i are natural numbers, T ≥ 2 and 1 ≤ i ≤ T;
an initialization module for creating T foreground video frames F_1, F_2, ..., F_T and T background video frames B_1, B_2, ..., B_T, with F_1 = 0 and B_1 = Y_1, and for processing video frame Y_1 based on partialSVD, i.e. U_1, X_1, V_1 = partialSVD(B_1, r), where U_1 is a matrix of M rows and M columns, V_1 is a matrix of N rows and N columns, X_1 is a diagonal matrix with M rows and N columns, and r is a preset rank value;
a foreground and background extraction module for initializing m = 1 and t = 2 and repeatedly executing a first operation: increase the value of m by 1; compute U_t, X_t, V_t = incSVD(Y_t, U_{t-1}, X_{t-1}, V_{t-1}); execute the second operation n times; when m is greater than or equal to a preset threshold R, execute dwnSVD(1, U_t, X_t, V_t) and the update shown in formula image BDA0003438486540000121, then set m = 1; finally increase the value of t by 1; the second operation specifically includes: B_t = U_t' · X_t · V_t^T, where U_t' is the submatrix formed by the M rows and columns 1 to r of U_t and V_t^T is the transpose of V_t; F_t = sgn(Y_t − B_t) · max(0, |Y_t − B_t| − γ), where γ is a preset parameter and sgn() is the sign function; and when the second operation has been executed n times, execute U_t, X_t, V_t = repSVD(Y_t, F_t, U_t, X_t, V_t);
a noise reduction module for performing noise reduction processing on the T foreground video frames F_1, F_2, ..., F_T and on the T background video frames B_1, B_2, ..., B_T, and then combining the denoised T foreground video frames and T background video frames into T denoised video frames.
In this embodiment, a preset lightweight neural network is obtained; the T foreground video frames F_1, F_2, ..., F_T are input into the lightweight neural network for noise reduction processing, and the T background video frames B_1, B_2, ..., B_T are input into the lightweight neural network for noise reduction; the lightweight neural network is trained from a plurality of clean images and noise images matched with the clean images;
the lightweight neural network includes one of: a SqueezeNet, ShuffleNet, MnasNet, MobileNet, CondenseNet, ESPNet, ChannelNets, PeleeNet, IGC, FBNet, EfficientNet, GhostNet, WeightNet, MicroNet or U-Net neural network.
In this embodiment, the noise reduction module is further configured to: input a plurality of noise images into a deep neural network D_θ, which outputs a clean image matching each noise image, and then train the lightweight neural network based on the plurality of clean images and the noise images matched with them.
In this embodiment, the noise reduction module is further configured to: divide the T foreground video frames F_1, F_2, ..., F_T into a plurality of foreground video frame groups, each foreground video frame group comprising a plurality of foreground video frames, and perform noise reduction processing on the plurality of foreground video frames in each foreground video frame group based on an optical flow method.
In this embodiment, the noise reduction module is further configured to: and performing the following processing on each foreground video frame group: selecting a key foreground video frame from the foreground video frame group, calculating optical flow information of all foreground video frames in the foreground video frame group relative to the key foreground video frame, performing registration processing on all foreground video frames relative to the key foreground video frame based on the optical flow information, and then performing noise reduction processing on all foreground video frames after registration.
It should be understood that although the present description refers to embodiments, not every embodiment contains only a single technical solution, and such description is for clarity only, and those skilled in the art should make the description as a whole, and the technical solutions in the embodiments can also be combined appropriately to form other embodiments understood by those skilled in the art.
The above-listed detailed description is only a specific description of a possible embodiment of the present invention, and they are not intended to limit the scope of the present invention, and equivalent embodiments or modifications made without departing from the technical spirit of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method for noise reduction of video, comprising the steps of:
receiving T video frames Y_1, Y_2, ..., Y_T arranged in time sequence, each video frame Y_i consisting of M rows and N columns of pixels; wherein T, M, N and i are natural numbers, T ≥ 2 and 1 ≤ i ≤ T;
creating T foreground video frames F_1, F_2, ..., F_T and T background video frames B_1, B_2, ..., B_T, with F_1 = 0 and B_1 = Y_1; processing video frame Y_1 based on partialSVD, i.e. U_1, X_1, V_1 = partialSVD(B_1, r), where U_1 is a matrix of M rows and M columns, V_1 is a matrix of N rows and N columns, X_1 is a diagonal matrix with M rows and N columns, and r is a preset rank value;
initializing m = 1 and t = 2, and repeatedly executing a first operation: increase the value of m by 1; compute U_t, X_t, V_t = incSVD(Y_t, U_{t-1}, X_{t-1}, V_{t-1}); execute the second operation n times; when m is greater than or equal to a preset threshold R, execute dwnSVD(1, U_t, X_t, V_t) and the update shown in formula image FDA0003438486530000011, then set m = 1; finally increase the value of t by 1; the second operation specifically includes: B_t = U_t' · X_t · V_t^T, where U_t' is the submatrix formed by the M rows and columns 1 to r of U_t and V_t^T is the transpose of V_t; F_t = sgn(Y_t − B_t) · max(0, |Y_t − B_t| − γ), where γ is a preset parameter and sgn() is the sign function; and when the second operation has been executed n times, executing U_t, X_t, V_t = repSVD(Y_t, F_t, U_t, X_t, V_t);
performing noise reduction processing on the T foreground video frames F_1, F_2, ..., F_T and on the T background video frames B_1, B_2, ..., B_T, and then combining the denoised T foreground video frames and T background video frames into T denoised video frames.
2. The method of claim 1, wherein "performing noise reduction processing on the T foreground video frames F_1, F_2, ..., F_T and on the T background video frames B_1, B_2, ..., B_T" specifically includes:
obtaining a preset lightweight neural network, inputting the T foreground video frames F_1, F_2, ..., F_T into the lightweight neural network for noise reduction processing, and inputting the T background video frames B_1, B_2, ..., B_T into the lightweight neural network for noise reduction; the lightweight neural network is trained from a plurality of clean images and noise images matched with the clean images;
the lightweight neural network includes one of: a SqueezeNet, ShuffleNet, MnasNet, MobileNet, CondenseNet, ESPNet, ChannelNets, PeleeNet, IGC, FBNet, EfficientNet, GhostNet, WeightNet, MicroNet or U-Net neural network.
3. The noise reduction method according to claim 2, wherein the "training of the lightweight neural network from a plurality of clean images and noise images matched with the clean images" specifically comprises:
inputting a plurality of noise images into a deep neural network D_θ, which outputs a clean image matching each noise image, and then training the lightweight neural network based on the plurality of clean images and the noise images matched with them.
4. The method of claim 2, wherein "performing noise reduction processing on the T foreground video frames F_1, F_2, ..., F_T" specifically includes:
dividing the T foreground video frames F_1, F_2, ..., F_T into a plurality of foreground video frame groups, each foreground video frame group comprising a plurality of foreground video frames, and performing noise reduction processing on the plurality of foreground video frames in each foreground video frame group based on an optical flow method.
5. The method according to claim 4, wherein the performing the noise reduction processing on the plurality of foreground video frames in each foreground video frame group based on the optical flow method specifically includes:
and performing the following processing on each foreground video frame group: selecting a key foreground video frame from the foreground video frame group, calculating optical flow information of all foreground video frames in the foreground video frame group relative to the key foreground video frame, performing registration processing on all foreground video frames relative to the key foreground video frame based on the optical flow information, and then performing noise reduction processing on all foreground video frames after registration.
6. A noise reduction apparatus for video, comprising:
a data acquisition module for receiving T video frames Y_1, Y_2, ..., Y_T arranged in time sequence, each video frame Y_i consisting of M rows and N columns of pixels; wherein T, M, N and i are natural numbers, T ≥ 2 and 1 ≤ i ≤ T;
an initialization module for creating T foreground video frames F_1, F_2, ..., F_T and T background video frames B_1, B_2, ..., B_T, with F_1 = 0 and B_1 = Y_1, and for processing video frame Y_1 based on partialSVD, i.e. U_1, X_1, V_1 = partialSVD(B_1, r), where U_1 is a matrix of M rows and M columns, V_1 is a matrix of N rows and N columns, X_1 is a diagonal matrix with M rows and N columns, and r is a preset rank value;
a foreground and background extraction module for initializing m = 1 and t = 2 and repeatedly executing a first operation: increase the value of m by 1; compute U_t, X_t, V_t = incSVD(Y_t, U_{t-1}, X_{t-1}, V_{t-1}); execute the second operation n times; when m is greater than or equal to a preset threshold R, execute dwnSVD(1, U_t, X_t, V_t) and the update shown in formula image FDA0003438486530000021, then set m = 1; finally increase the value of t by 1; the second operation specifically includes: B_t = U_t' · X_t · V_t^T, where U_t' is the submatrix formed by the M rows and columns 1 to r of U_t and V_t^T is the transpose of V_t; F_t = sgn(Y_t − B_t) · max(0, |Y_t − B_t| − γ), where γ is a preset parameter and sgn() is the sign function; and when the second operation has been executed n times, execute U_t, X_t, V_t = repSVD(Y_t, F_t, U_t, X_t, V_t);
a noise reduction module for performing noise reduction processing on the T foreground video frames F_1, F_2, ..., F_T and on the T background video frames B_1, B_2, ..., B_T, and then combining the denoised T foreground video frames and T background video frames into T denoised video frames.
7. The noise reduction device of claim 6, wherein the noise reduction module is further configured to:
obtain a preset lightweight neural network, input the T foreground video frames F_1, F_2, ..., F_T into the lightweight neural network for noise reduction processing, and input the T background video frames B_1, B_2, ..., B_T into the lightweight neural network for noise reduction; the lightweight neural network is trained from a plurality of clean images and noise images matched with the clean images;
the lightweight neural network includes one of: a SqueezeNet, ShuffleNet, MnasNet, MobileNet, CondenseNet, ESPNet, ChannelNets, PeleeNet, IGC, FBNet, EfficientNet, GhostNet, WeightNet, MicroNet or U-Net neural network.
8. The noise reduction device of claim 7, wherein the noise reduction module is further configured to:
input a plurality of noise images into a deep neural network D_θ, which outputs a clean image matching each noise image, and then train the lightweight neural network based on the plurality of clean images and the noise images matched with them.
9. The noise reduction device of claim 7, wherein the noise reduction module is further configured to:
divide the T foreground video frames F_1, F_2, ..., F_T into a plurality of foreground video frame groups, each foreground video frame group comprising a plurality of foreground video frames, and perform noise reduction processing on the plurality of foreground video frames in each foreground video frame group based on an optical flow method.
10. The noise reduction device of claim 9, wherein the noise reduction module is further configured to:
and performing the following processing on each foreground video frame group: selecting a key foreground video frame from the foreground video frame group, calculating optical flow information of all foreground video frames in the foreground video frame group relative to the key foreground video frame, performing registration processing on all foreground video frames relative to the key foreground video frame based on the optical flow information, and then performing noise reduction processing on all foreground video frames after registration.
CN202111622158.9A 2021-12-28 2021-12-28 Noise reduction method and device for video Pending CN114070960A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111622158.9A CN114070960A (en) 2021-12-28 2021-12-28 Noise reduction method and device for video

Publications (1)

Publication Number Publication Date
CN114070960A true CN114070960A (en) 2022-02-18

Family

ID=80230459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111622158.9A Pending CN114070960A (en) 2021-12-28 2021-12-28 Noise reduction method and device for video

Country Status (1)

Country Link
CN (1) CN114070960A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104599292A (en) * 2015-02-03 2015-05-06 中国人民解放军国防科学技术大学 Noise-resistant moving target detection algorithm based on low rank matrix
CN105260992A (en) * 2015-10-09 2016-01-20 清华大学 Traffic image denoising algorithm based on robust principal component decomposition and feature space reconstruction
US20170154413A1 (en) * 2015-11-27 2017-06-01 Toshiba Medical Systems Corporation Dynamic image denoising using a sparse representation
CN111754417A (en) * 2020-05-14 2020-10-09 北京迈格威科技有限公司 Noise reduction method and device for video image, video matting method and device and electronic system
CN113627309A (en) * 2021-08-04 2021-11-09 苏州工业园区智在天下科技有限公司 Video noise reduction method and device, computer system and readable storage medium

Similar Documents

Publication Publication Date Title
CN109064507B (en) Multi-motion-stream deep convolution network model method for video prediction
Simon et al. Rethinking the CSC model for natural images
Wang et al. Enhancing low light videos by exploring high sensitivity camera noise
Li et al. Single image rain streak decomposition using layer priors
Davy et al. A non-local CNN for video denoising
US10235571B2 (en) Method for video matting via sparse and low-rank representation
CN110287819B (en) Moving target detection method based on low rank and sparse decomposition under dynamic background
CN109685045B (en) Moving target video tracking method and system
US9247139B2 (en) Method for video background subtraction using factorized matrix completion
CN114463218B (en) Video deblurring method based on event data driving
CN108492312B (en) Visual tracking method based on reverse sparse representation under illumination change
Lin et al. Low-light enhancement using a plug-and-play Retinex model with shrinkage mapping for illumination estimation
Li et al. A maximum a posteriori estimation framework for robust high dynamic range video synthesis
Xia et al. Training image estimators without image ground truth
Meng et al. Perception inspired deep neural networks for spectral snapshot compressive imaging
Huang et al. Underwater image enhancement based on color restoration and dual image wavelet fusion
Yang et al. Detail-aware near infrared and visible fusion with multi-order hyper-Laplacian priors
Zhang et al. Dehazing with improved heterogeneous atmosphere light estimation and a nonlinear color attenuation prior model
CN114070960A (en) Noise reduction method and device for video
Cui et al. Multi-stream attentive generative adversarial network for dynamic scene deblurring
Zheng et al. Unsupervised deep video denoising with untrained network
CN112529815B (en) Method and system for removing raindrops in real image after rain
EP3992902A1 (en) Method and image processing device for improving signal-to-noise of image frame sequences
Xue et al. Bwin: A bilateral warping method for video frame interpolation
Ye et al. A survey on learning-based low-light image and video enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination