KR20220154578A - Image Processing Device for Image Denoising

Info

Publication number: KR20220154578A
Application number: KR1020210102630A
Authority: KR
Inventors: 소재웅; 장영일; 조남익
Original assignee: 삼성전자주식회사; 서울대학교산학협력단
Priority date: 2021-05-13
Filing date: 2021-08-04
Publication date: 2022-11-22

Abstract

Provided is an image processing device that performs image denoising capable of coping with a wide range of noise levels using a single neural network. According to the present invention, the image processing device comprises: an image sensor that senses an input image of a subject; a neural processor that includes a plurality of convolutional neural networks and removes noise from the input image; and a memory that stores instructions executable by the neural processor. The neural processor uses an encoding convolutional neural network to infer a latent variable from the image received from the image sensor, and uses a denoising convolutional neural network to remove noise on the basis of a preset latent variable, thereby generating a denoised image.

Description

Image processing device for performing image denoising {Image Processing Device for Image Denoising}

The present invention relates to a method for removing image noise in an image processing device.

When an image acquired by an imaging device is corrupted by unavoidable noise, the image processing device needs to reconstruct a noise-free image from the acquired image. Here, unavoidable noise generally refers to noise accumulated from several sources, such as image capture by the image sensor and image processing within the device. The noise generation process is therefore difficult to model accurately, and the noise is generally assumed to be additive white Gaussian noise (AWGN) in accordance with the central limit theorem.
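The AWGN assumption above can be illustrated with a minimal Python sketch. The function name `add_awgn`, the nested-list image layout, and the sample values are illustrative assumptions and are not taken from the disclosure.

```python
import random

def add_awgn(image, sigma, seed=None):
    """Return a copy of `image` with zero-mean Gaussian noise of standard
    deviation `sigma` added independently to every pixel."""
    rng = random.Random(seed)
    return [[pixel + rng.gauss(0.0, sigma) for pixel in row] for row in image]

# Hypothetical 2x2 clean image with intensities in [0, 1]
clean = [[0.2, 0.4], [0.6, 0.8]]
noisy = add_awgn(clean, sigma=0.1, seed=0)
```

The noise is additive (it is summed with the clean pixel), white (each pixel draws independently), and Gaussian, which is exactly the simplification that real sensor noise tends to violate.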

Recently, convolutional neural networks (CNNs) have been widely used to remove AWGN from acquired images. However, most conventional CNN-based image denoising methods are non-blind methods, which require separate models trained for each of the various noise levels.

In addition, when the input image has a noise level different from that of the training images, performance may degrade because of the domain discrepancy between the test image and training data composed of images containing only a predetermined noise level. This lowers reliability and can limit the range of practical applications.

To solve the above problems, a technical object of the present invention is to provide an image processing device and an imaging device that perform image denoising capable of coping with a wide range of noise levels using a single neural network.

Another technical object of the present invention is to provide an image processing device and an imaging device that perform image denoising and include a flexible convolutional neural network applicable to various noise levels without additional information about the noise.

The technical objects of the present invention are not limited to those mentioned above, and other technical objects not mentioned will be clearly understood by those skilled in the art from the following description.

To solve the above technical problem, an image processing device according to some embodiments includes an encoder that infers a latent variable from an input noisy image based on preset noise, a denoiser that generates a denoised image by removing the noise from the input noisy image, and a decoder that restores a noisy image using the inferred latent variable. The latent variable is learned based on the difference between the restored noisy image and the input noisy image.

To solve the above technical problem, an image processing device according to some embodiments includes an image sensor that senses a first input image of a subject, a neural processor that includes a plurality of convolutional neural networks and removes noise from the first input image, and a memory that stores instructions executable by the neural processor. The neural processor infers a latent variable from the first input image using an encoding convolutional neural network, and generates a first denoised image by using a denoising convolutional neural network to remove, from the first input image, noise based on the inferred latent variable.

To solve the above technical problem, an imaging device according to some embodiments includes a processor and a memory that stores instructions executable by the processor. The processor sets noise, infers noise information from an input noisy image, generates a denoised image by removing from the input noisy image the noise based on the inferred noise information, and generates a restored noisy image based on the inferred noise information.

Details of other embodiments are included in the detailed description and the drawings.

FIGS. 1A to 1C are conceptual diagrams for explaining various comparative examples of methods of performing CNN-based image denoising.
FIG. 2 is a conceptual diagram for explaining an image denoising method according to embodiments of the present invention.
FIG. 3 is a conceptual diagram showing the influence of each loss function on the learning of the latent variable.
FIG. 4 is a conceptual diagram for explaining the image denoising method of FIG. 3 in more detail.

Unless otherwise defined, all terms (including technical and scientific terms) used in this specification have the meanings commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive manner unless explicitly and specifically defined.

A single-model blind denoising method can remove noise while coping with the various noise levels contained in an input image. However, a single-model blind denoising method generally performs worse than a non-blind denoising method fitted to a preset range of noise levels, because learning the conditional mapping between noisy images with various noise levels and denoised images is harder than training on noise concentrated at a specific level. According to various embodiments, the single-model blind denoising method may also be called a naive blind model.

Another approach is to develop a flexible network that can handle various noise levels by using additional information. In this case, noise-level estimation is needed alongside the convolutional neural network (CNN) that performs denoising. That is, a two-stage model may be considered, consisting of a noise-level estimation stage and a denoising CNN that uses the estimated noise level. However, the two-stage model may not work properly when the noise-level estimate does not provide accurate information.

The noise-level distribution in practice deviates greatly from the distribution assumed in model design, for example a Gaussian distribution. Two difficulties stand out. The first is that, for removing real noise, it is difficult to build a dataset of paired noisy and noise-free images. The distribution of real noise has more complex characteristics than Gaussian noise (for example, it is multimodal, signal-dependent, and spatially variant), so accurate modeling of the noise is difficult.

The second is learning efficiently from the data. Real noise takes a great variety of forms: not only its intensity but also its distribution varies widely, for example with the camera hardware, the camera settings, and the shooting environment. Learning a conditional probability distribution of such complexity is very difficult.

Accordingly, the present invention provides an image denoising method that divides the complex distribution of noisy images into simpler sub-distributions and performs CNN denoising according to those sub-distributions. The present invention also provides an image denoising method, and an image processing device therefor, that perform blind image denoising based on a single CNN without any additional information.

In the following description, the denoised image is denoted by x and the noisy image by y, and the data distribution is also assigned its own symbol. The image data is assumed to comprise N sub-datasets.

The description below also relies on conditional probability distributions and on expected values of probability distributions. For example, p(x|y) is a conditional density function, meaning the distribution of the random variable x given the random variable y, and E[·] denotes the expected value of the expression inside the brackets.

FIGS. 1A to 1C are conceptual diagrams for explaining various comparative examples of methods of performing CNN-based image denoising.

Image denoising methods can be divided into, for example, three categories: non-blind models, naive blind models, and two-stage blind models.

FIG. 1A is a conceptual diagram for explaining a specific non-blind model.

A specific non-blind model comprises separate networks, each trained at a specified noise level. The specified noise level here is the noise level intended by the image processing device, that is, a noise level within a preset range. Specific non-blind models include, for example, DnCNN-S, RED, and NLRN. A specific non-blind model can be expressed as in Equation 1.

<Equation 1>

In Equation 1, the parameter symbol denotes the parameters of the i-th network.

The noise level contained in the image data for the i-th network is confined to the standard deviation specified for that network, with one such value for each of the N networks.

For example, suppose that the first convolutional neural network is assigned one standard deviation of the noise level and the second convolutional neural network is assigned another. A specific non-blind model can then perform suitable image denoising when a noisy image of the corresponding noise level is input.

A specific non-blind model can perform image denoising if the current noise level belongs to one of the specified networks. However, applying a specific non-blind model requires noise-level information about the test image. In addition, because real-world noise levels vary, many networks must be prepared to cover the range of noise levels, which increases the memory needed to store them.
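The lookup that a specific non-blind model implies can be sketched as follows. This is only an illustration: the noise levels 15, 25, and 50, the stub denoisers, and the nearest-level selection rule are assumptions introduced here, not details of the disclosure.

```python
def make_stub_denoiser(trained_sigma):
    """Placeholder standing in for a CNN trained at one noise level.
    It does no denoising; it only reports which network handled the image."""
    def denoise(image):
        return {"trained_sigma": trained_sigma, "output": image}
    return denoise

def select_network(networks, sigma):
    """Pick the network whose training noise level is closest to `sigma`."""
    nearest = min(networks, key=lambda trained_sigma: abs(trained_sigma - sigma))
    return networks[nearest]

# One stored network per supported noise level (hypothetical levels)
bank = {s: make_stub_denoiser(s) for s in (15, 25, 50)}
result = select_network(bank, sigma=22)([[0.1, 0.2]])
```

The sketch makes both drawbacks visible: `sigma` must be supplied by the caller, and the bank grows by one full network for every noise level to be covered.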

FIG. 1B is a conceptual diagram for explaining a naive blind model.

Referring to FIG. 1B, most blind denoisers rely entirely on the expressive power of the CNN and fall into the category of naive blind models. For real noise in particular, the noise level is very difficult to know, so denoising may be attempted with a naive blind model expressed as in Equation 2.

<Equation 2>

A naive blind model trains a single parameterized network to capture the conditional distribution across noise levels; that is, it is trained on all noise levels. Because the noise-level distribution that the naive blind model must capture is more complex than that of a specific non-blind model, the naive blind model performs worse than a specific non-blind model. Examples of naive blind models include DnCNN-B, UNLNet, GCBD, and RIDNet.

FIG. 1C is a conceptual diagram for explaining a two-stage blind model.

Referring to FIG. 1C, the two-stage blind model first estimates a noise-level parameter c and then performs image denoising with a network. The two-stage blind model can be expressed as in Equations 3 and 4.

<Equation 3>

<Equation 4>

In Equation 3, the parameter symbol denotes the parameters of the noise estimator. For a noisy image y, Equation 3 models the selection, among the parameters c of the two-stage blind model, of the network whose c maximizes the expectation of the probability of c given y.

In Equation 4, the parameter symbol denotes the parameters of the denoiser. Equation 4 models the selected network so that, when noise is removed from the noisy image y to generate the denoised image x, the network estimates the denoised image x that maximizes the expectation of the probability of x given y and the c obtained from Equation 3.

To apply such a two-stage blind model, a parameter c of a known noise level, as with Gaussian noise, must be chosen. However, real noise is more complex than Gaussian noise, so the parameter c describing its level is hard to choose, and the two-stage blind model is difficult to apply without additional information.
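The two stages can be illustrated with a toy one-dimensional sketch. Everything here is an assumption made for illustration: the difference-based noise estimator, the moving-average "denoiser", and the thresholds stand in for the estimator network and the denoising CNN, which the source does not specify at this level.

```python
import statistics

def estimate_sigma(signal):
    """Stage 1: rough noise-level estimate c from first differences.
    For i.i.d. noise on a flat signal, std(diff) = sqrt(2) * sigma."""
    diffs = [b - a for a, b in zip(signal, signal[1:])]
    return statistics.pstdev(diffs) / 2 ** 0.5

def denoise(signal, sigma):
    """Stage 2: moving average whose window widens with the estimated level."""
    radius = 0 if sigma < 0.01 else (1 if sigma < 0.1 else 2)
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

def two_stage_denoise(signal):
    c = estimate_sigma(signal)          # stage 1: estimate c from y
    return denoise(signal, c), c        # stage 2: denoise y conditioned on c
```

Because stage 2 depends entirely on `c`, an inaccurate estimate in stage 1 selects the wrong amount of smoothing, which is the weakness of two-stage models noted above.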

When the noisy image is y and the denoised image is x, the maximum a posteriori (MAP) estimate can be inferred as in Equation 5.

<Equation 5>

The estimate in Equation 5 is the denoised image x that maximizes the probability of x given the noisy image y. Equation 5 can be rearranged, as in Equation 6, into a likelihood function and a prior probability.

<Equation 6>

For the case of Gaussian noise, Equation 6 separates the posterior p(x|y) into the likelihood function p(y|x) and the prior probability p(x), and it can be rewritten as an expression in the parameters of the noise estimator described above.
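Since the equation images are not reproduced in this text, the MAP estimate and its split into likelihood and prior can be written in the standard form below. This is the textbook Bayes decomposition, offered as a reading consistent with the description rather than as the exact notation of Equations 5 and 6; the hat on x is an assumed symbol.

```latex
\hat{x} = \arg\max_{x} \log p(x \mid y)
        = \arg\max_{x} \left[ \log p(y \mid x) + \log p(x) \right]
```

The second equality follows from p(x|y) = p(y|x) p(x) / p(y), because log p(y) does not depend on x and drops out of the maximization.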

However, as described above, even with Equation 6, real noise is more complex and varied than Gaussian noise, so additional information about the noise level is still required.

FIG. 2 is a conceptual diagram for explaining an image denoising method according to embodiments of the present invention, FIG. 3 is a conceptual diagram for explaining the image denoising method of FIG. 2 in more detail, and FIG. 4 is a conceptual diagram for explaining the image denoising method of FIG. 3 in more detail.

To approximate the posterior distribution p(c|x,y), some embodiments of the present invention assume a tractable probability function q(c|y). The joint log-distribution log p(x,y) of the noisy image y and the denoised image x can be expressed as in Equation 7.

<Equation 7>

In Equation 7, the divergence term is the Kullback-Leibler (KL) divergence. The KL divergence is a function used to measure the difference between two probability distributions: it gives the difference in information entropy incurred when an ideal distribution is sampled using another distribution that approximates it. For two distributions, the KL divergence equals their cross entropy minus the entropy of the first distribution, so it expresses how much the two distributions differ.
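The identity stated above, KL divergence as cross entropy minus entropy, can be checked on small discrete distributions. The distributions `p` and `q` below are arbitrary illustrative values, not quantities from the disclosure.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross entropy H(p, q); requires q_i > 0 wherever p_i > 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL(p || q) computed as cross entropy minus entropy."""
    return cross_entropy(p, q) - entropy(p)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
```

The divergence is zero when the two distributions coincide and positive otherwise, and it is not symmetric in its arguments.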

In Equation 7, two of the quantities can be learned by training. The posterior of the latent variable c, however, concerns the noise level itself contained in the noisy image and is difficult to obtain. To approximate the intractable Equation 7, a variational lower bound L is defined as in Equation 8, and L is then a lower bound of the joint log-distribution log p(x,y).

<Equation 8>

Rearranging the joint log-distribution log p(x,y) of Equation 7 on the basis of Equation 8 gives Equation 9.

Because the KL divergence is non-negative by definition, the lower bound of log p(x,y) is the variational lower bound L. Because each term of L can be expressed with tractable random variables, L can be learned with a CNN-based model.

<Equation 9>

That is, log p(x,y) is the sum of the variational lower bound L and a KL divergence, so log p(x,y) is greater than or equal to L, as in Equation 10. Given the properties of probability (for example, all probabilities in the same dataset sum to 1), the problem of maximizing log p(x,y) can be approximated by the problem of maximizing the variational lower bound L.

<Equation 10>
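The relation described here, that log p(x,y) equals the variational lower bound plus a non-negative KL divergence, can be verified numerically for a discrete latent variable. The three joint values and the distribution `q` are hypothetical numbers chosen only for the check.

```python
import math

# Hypothetical values of p(x, y, c) for one fixed pair (x, y) and c in {0, 1, 2}
joint = [0.02, 0.05, 0.03]
# An arbitrary tractable distribution q(c | y) over the same three values
q = [0.3, 0.4, 0.3]

evidence = sum(joint)                                   # p(x, y)
log_evidence = math.log(evidence)                       # log p(x, y)
posterior = [j / evidence for j in joint]               # p(c | x, y)

# Variational lower bound: E_q[ log p(x, y, c) - log q(c | y) ]
lower_bound = sum(qi * math.log(ji / qi) for qi, ji in zip(q, joint))
# Gap between the bound and the log evidence: KL( q(c | y) || p(c | x, y) )
gap = sum(qi * math.log(qi / pi) for qi, pi in zip(q, posterior))
```

The gap closes only when q matches the true posterior, so raising the lower bound both fits the data and pulls q toward p(c|x,y).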

Meanwhile, in Equation 7, the distribution p(c|x,y), which derives only the latent variable c (the noise component) from the noisy image y and the denoised image x, is difficult to know, so an arbitrary q(c|y) is defined as its log posterior probability. q(c|y) may be set to a tractable probability distribution, for example a Gaussian distribution. The posterior distribution p(x|y) that is ultimately sought is the probability distribution of the denoised image x given the noisy image y, and rearranging Equation 7 for p(x|y) yields Equation 11.

<Equation 11>

In Equation 11, given the target noisy image, the probability of the denoised image, that is, the MAP estimate, can be obtained from p(x|y,c) and q(c|y) learned through Equation 8.

In Equation 8, the first term of the variational lower bound L, which involves p(x|y,c), concerns the denoised image x given the noisy image y and the latent variable c in denoising and image reconstruction. It is the only term that involves the denoised image x.

The remaining components of the variational lower bound L concern the latent variable c. The second term is a KL divergence that constrains the latent distribution, and the third term concerns auto-encoder reconstruction of the noisy image y.
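One way to write a lower bound with exactly these three terms is shown below. It is a reconstruction under an assumed factorization p(x,y,c) = p(x|y,c) p(y|c) p(c), since the original equation image is not available; the symbol \mathcal{L} for the bound and the prior p(c) are written in conventional notation.

```latex
\mathcal{L} =
  \mathbb{E}_{q(c \mid y)}\left[ \log p(x \mid y, c) \right]
  - D_{\mathrm{KL}}\left( q(c \mid y) \,\|\, p(c) \right)
  + \mathbb{E}_{q(c \mid y)}\left[ \log p(y \mid c) \right]
```

Under that factorization the expression follows from \mathcal{L} = \mathbb{E}_{q(c \mid y)}[\log p(x,y,c) - \log q(c \mid y)]: the first term is the denoising term, the second constrains the latent distribution, and the third is the auto-encoder reconstruction of y.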

Referring to FIG. 2, the terms of Equation 8 described above are learned by three separate convolutional neural networks, whose parameters are those of the denoising convolutional neural network, the encoding convolutional neural network, and the decoding convolutional neural network, respectively. Assuming that training data is given in order to obtain the latent variable c relating to the noise level, the image denoising method according to some embodiments can use the training data, which follows an empirical data distribution, to derive Equation 12, the final objective function for these parameters.

<Equation 12>

The first term of Equation 12 applies the mean absolute error (MAE) to minimize the distortion between the denoised image x and the output inferred from the noisy image y. Rewritten as a loss function, the objective function for the first term concerns image denoising and, to maximize the peak signal-to-noise ratio (PSNR), uses the L1 loss function under a Laplacian distribution, as in Equation 13.

<Equation 13>
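The L1 (MAE) loss and the PSNR metric mentioned above can be sketched in a few lines. The flattened-list image layout, the function names, and the peak value of 1.0 are assumptions for illustration.

```python
import math

def mean_absolute_error(a, b):
    """L1 loss: mean of |a_i - b_i| over two equally long pixel lists."""
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)

def psnr(a, b, peak=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    mse = sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)

target = [0.0, 1.0]
estimate = [0.5, 0.5]
loss = mean_absolute_error(target, estimate)
```

Minimizing the L1 loss corresponds to maximizing a Laplacian likelihood of the target given the network output, which is the connection to the Laplacian distribution drawn in the text.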

Assume that the prior distribution of the latent variable c is the normal distribution N(0,1) of Equation 14. The objective function corresponding to the second term of Equation 12 is then a KL divergence, expressed as in Equation 14.

<Equation 14>

Applying the reparameterization trick to Equation 14 so that training is differentiable gives Equation 15. Based on Equation 15, the neural network can learn to predict the mean and standard deviation of the latent variable c.

<Equation 15>
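A scalar sketch of the two ingredients named above, the KL divergence to an N(0,1) prior and the reparameterization trick, is given below. It uses the standard closed form for a Gaussian q(c|y); having the encoder output a log-variance rather than a standard deviation is a common convention assumed here, not something the source states.

```python
import math
import random

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ) with sigma^2 = exp(log_var)."""
    return 0.5 * (mu ** 2 + math.exp(log_var) - 1.0 - log_var)

def sample_latent(mu, log_var, rng):
    """Reparameterization: c = mu + sigma * eps with eps ~ N(0, 1).
    The randomness sits in eps, so c is a differentiable function of mu and sigma."""
    eps = rng.gauss(0.0, 1.0)
    return mu + math.exp(0.5 * log_var) * eps

rng = random.Random(0)
c = sample_latent(mu=0.3, log_var=-2.0, rng=rng)
```

Sampling c directly from N(mu, sigma^2) would block gradients to the encoder; moving the noise into `eps` is what lets the network learn the mean and standard deviation.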

The objective function of the third term of Equation 12 restores the noisy image y from itself, that is, it generates a restored noisy image. By comparing this restored image with the input noisy image y, the network is trained so that, through the latent variable c, the MAE between the input noisy image y and the output of the decoder 300 is minimized. In addition, an adversarial loss function is adopted so that the Jensen-Shannon (JS) divergence between the distribution of restored noisy images and that of input noisy images is minimized.

According to some embodiments, the objective function corresponding to the third term uses both an L1 loss function and the adversarial loss of a generative adversarial network (GAN), as in Equation 16. In one embodiment, when prior information can additionally be reflected for synthetic noisy images (for example, in the case of AWGN), an L1 loss function that predicts the noise level from the latent variable c through a simple convolutional neural network EST may be added. Here, EST refers to a simple two-layer CNN (Conv-ReLU-Conv).

In another embodiment, for camera noise images without prior information about the actual noise level, only the L1 loss function and the adversarial loss function may be used as the objective function.

<Equation 16>

Rearranging the final objective function of Equation 12 as described above, the final loss function becomes a weighted sum of Equations 13, 15, and 16, in which the normalized KL divergence is multiplied by an arbitrary weight, and can be expressed as in Equation 17.

<Equation 17>
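The way the individual losses combine can be sketched as plain arithmetic. The unit weights on the non-KL terms, the default `kl_weight`, and the function names are assumptions; the source states only that the total is a weighted sum and that the KL term carries an arbitrary weight.

```python
def reconstruction_loss(l1, adversarial, noise_level_l1=None):
    """Third-term loss: L1 plus adversarial loss, plus the optional
    noise-level L1 loss when the true level is known (e.g. synthetic AWGN)."""
    loss = l1 + adversarial
    if noise_level_l1 is not None:
        loss += noise_level_l1
    return loss

def total_loss(denoise_loss, kl_loss, recon_loss, kl_weight=0.01):
    """Weighted sum of the denoising, KL, and reconstruction losses."""
    return denoise_loss + kl_weight * kl_loss + recon_loss

recon = reconstruction_loss(l1=0.25, adversarial=0.5)
total = total_loss(denoise_loss=1.0, kl_loss=2.0, recon_loss=recon, kl_weight=0.5)
```

For real camera noise, where no ground-truth noise level exists, `noise_level_l1` is simply left out, matching the embodiment that uses only the L1 and adversarial losses.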

Referring to FIG. 3, the image denoising method according to some embodiments can, on the basis of Equations 12 and 17 described above, efficiently remove noise using a variational neural network without prior information about the noise level.

That is, the image denoising method uses an encoding convolutional neural network to infer a latent variable, which is noise information, from the input noisy image received from the image sensor, and uses a denoising convolutional neural network to remove noise based on a preset latent variable, thereby generating a denoised image. The method can also restore a noisy image from the inferred latent variable, compare the restored noisy image with the actual input noisy image, and train the networks to correct the previously inferred latent variable.

Specifically, to handle various noise levels, the image processing device of the present invention includes a denoising convolutional neural network that denoises the noisy image y of the training data into the denoised image x, an encoding convolutional neural network that encodes the noisy image y to infer the latent variable c in the training data, and a decoding convolutional neural network that decodes the inferred latent variable c to restore the input noisy image y.

The denoising convolutional neural network forms a loss function from the difference between the actual denoising image x and the inferred denoising image, and the gradient of this loss is back-propagated to the denoising convolutional neural network and the encoding convolutional neural network (FIG. 3(c)). The encoding convolutional neural network infers the latent variable c from the input noise image and the inferred denoising image (FIG. 3(a)). The decoding convolutional neural network restores a noise image using the inferred latent variable c, and the gradient of the loss function for the difference between the restored noise image and the input noise image y is back-propagated to the encoding convolutional neural network and the decoding convolutional neural network (FIG. 3(b)), so that it is reflected in learning the latent variable c that was previously inferred and used.
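The two training signals of FIG. 3 can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the embodiments: images are flat lists of pixel values, mean absolute error is assumed as the difference measure for both losses, and the names `training_losses` and `GRADIENT_ROUTES` are hypothetical. The network outputs are supplied as plain numbers rather than computed by real networks.

```python
def mean_abs_error(a, b):
    """Mean absolute difference between two equally sized flat images."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)

# Which networks receive the gradient of each loss, as described for FIG. 3.
GRADIENT_ROUTES = {
    "denoising": ("denoiser", "encoder"),      # FIG. 3(c)
    "reconstruction": ("encoder", "decoder"),  # FIG. 3(b)
}

def training_losses(x, x_hat, y, y_rec):
    """Return both loss values for one training sample.

    x: actual denoising image, x_hat: inferred denoising image,
    y: input noise image, y_rec: noise image restored by the decoder.
    The error measure is an assumption made for this sketch.
    """
    return {
        "denoising": mean_abs_error(x, x_hat),
        "reconstruction": mean_abs_error(y, y_rec),
    }

# Toy values standing in for real network outputs.
x = [0.5, 0.5, 0.5, 0.5]
y = [0.6, 0.4, 0.55, 0.45]
x_hat = [0.52, 0.48, 0.51, 0.49]
y_rec = [0.58, 0.42, 0.55, 0.45]

losses = training_losses(x, x_hat, y, y_rec)
for name, value in losses.items():
    print(name, round(value, 4), "->", GRADIENT_ROUTES[name])
```

The encoder appears in both routes, which reflects the description above: the latent variable c is shaped both by how well the image is denoised and by how well the noise image can be restored from it.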

Referring to FIG. 4, the image processing device 10 according to some embodiments includes a denoiser 100, an encoder 200, and a decoder 300. The image processing device 10 may include a processor and a memory. The processor uses a convolutional neural network to remove noise from an input noise image and outputs a denoising image. According to various embodiments, the processor may include a neural processor. The memory may store instructions (or programs) executable by the image processing device 10. For example, the instructions may include instructions for executing operations of the image processing device 10 and/or operations of each component of the image processing device 10.

The image processing device 10 may process data stored in the memory 310. The image processing device 10 may execute computer-readable code (for example, software) stored in the memory 310 and instructions triggered by the image processing device 10.

The image processing device 10 may be a data processing device implemented in hardware, having circuitry with a physical structure for executing desired operations. For example, the desired operations may include code or instructions included in a program.

For example, the data processing device implemented in hardware may include a microprocessor, a central processing unit, a processor core, a multi-core processor, a multiprocessor, an application-specific integrated circuit (ASIC), and a field programmable gate array (FPGA).

The denoiser 100 receives the noise image y with the latent variable c concatenated along the channel axis, and infers the denoising image x. In some embodiments, the denoiser 100 is fully convolutional and therefore highly scalable.

The denoiser 100 includes D RIR blocks 120-1 to 120-D and a convolution layer 130.

The RIR block (residual-in-residual block, RIRBlock) 120-1 includes N residual blocks (ResBlock) and one convolution layer 130.

Residual blocks (ResBlock, 1 to N) are adopted as the basic building block. Specifically, within each RIR block 120, identical residual blocks (ResBlock) are connected in series to constitute a rectified linear unit (ReLU). Like the other convolution layers, the rectified linear unit is implemented with 3x3 convolution layers of 64 filters in the order Conv(110-1)-ReLU-Conv(120-1). The input is then added to the output of the convolution layers, forming a skip connection. The convolution layer 130 infers a residual image (noise, 140) rather than the denoising image x itself. The denoiser 100 operates in correspondence with the first objective function described with reference to FIG. 2 or the denoising convolutional neural network described with reference to FIG. 3.
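The Conv-ReLU-Conv ordering and the skip connection described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the network of the embodiments: it uses a single channel instead of 64 filters, zero padding, and hand-written kernels, and the outer skip connection in `rir_block` is an assumption suggested by the name "residual-in-residual". The function names are hypothetical.

```python
def conv3x3(image, kernel):
    """Single-channel 3x3 convolution with zero padding (same output size)."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += image[ii][jj] * kernel[di + 1][dj + 1]
            out[i][j] = acc
    return out

def relu(image):
    """Element-wise rectifier: negative values become zero."""
    return [[max(0.0, v) for v in row] for row in image]

def add(a, b):
    """Element-wise sum of two equally sized images."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def res_block(x, k1, k2):
    """Conv -> ReLU -> Conv, then add the block input (skip connection)."""
    return add(x, conv3x3(relu(conv3x3(x, k1)), k2))

def rir_block(x, kernel_pairs, k_last):
    """N residual blocks, one convolution, and an assumed outer skip."""
    h = x
    for k1, k2 in kernel_pairs:
        h = res_block(h, k1, k2)
    return add(x, conv3x3(h, k_last))

ZERO = [[0.0] * 3 for _ in range(3)]
IDENTITY = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]

x = [[1.0, 2.0], [3.0, 4.0]]
same = res_block(x, ZERO, ZERO)              # skip path passes x through
doubled = res_block(x, IDENTITY, IDENTITY)   # x + relu(x) for non-negative x
nested = rir_block(x, [(IDENTITY, IDENTITY)], IDENTITY)
print(same, doubled, nested)
```

With all-zero kernels the block returns its input unchanged, which shows why the skip connection lets the convolution path learn only a correction to the input.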

The encoder 200 and the decoder 300 may be simple feedforward convolutional networks without skip connections. The encoder 200 and the decoder 300 use spectral normalization together with a patch discriminator, which examines an image in units of patches of a predetermined size and determines authenticity for each such patch.
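Patch-wise authenticity judgment can be illustrated with a small sketch. It is a simplification under stated assumptions: the patches here do not overlap, whereas a convolutional patch discriminator would typically score overlapping receptive fields, and `mean_score` is a hypothetical stand-in for a trained scoring network.

```python
def split_into_patches(image, size):
    """Yield (patch_row, patch_col, patch) for non-overlapping patches."""
    h, w = len(image), len(image[0])
    for r in range(0, h - size + 1, size):
        for c in range(0, w - size + 1, size):
            patch = [row[c:c + size] for row in image[r:r + size]]
            yield r // size, c // size, patch

def patch_decisions(image, size, score_fn, threshold=0.5):
    """Return a grid holding one real/fake decision per patch."""
    rows, cols = len(image) // size, len(image[0]) // size
    grid = [[False] * cols for _ in range(rows)]
    for r, c, patch in split_into_patches(image, size):
        grid[r][c] = score_fn(patch) >= threshold
    return grid

def mean_score(patch):
    """Hypothetical scorer: the mean pixel value of the patch."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)

image = [
    [0.9, 0.8, 0.1, 0.2],
    [0.7, 0.9, 0.0, 0.1],
    [0.2, 0.1, 0.6, 0.7],
    [0.0, 0.3, 0.8, 0.9],
]
decisions = patch_decisions(image, 2, mean_score)
print(decisions)
```

The output is a grid of decisions rather than a single value for the whole image, which is the property the description attributes to the patch discriminator.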

The encoder 200 reduces the spatial size of the feature map by a factor of 2 (1/4 of the height and width), and its output c has four channels. To allow differentiable Monte Carlo estimation, the latent variable c may be computed as in Equation 18 by applying the reparameterization trick to the input noise image y.

<Equation 18>

In Equation 18, the operator symbol denotes the Hadamard product, and the two parameter terms of the equation may each be an output of the encoder 200.
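The reparameterization trick is commonly written as c = mu + sigma ⊙ eps, with eps drawn from a standard normal distribution. The sketch below assumes Equation 18 has this standard form; the symbols mu, sigma and eps are assumptions, since the equation itself is not reproduced in this text, and `reparameterize` is a hypothetical name. Randomness enters only through eps, so c stays a differentiable function of the two encoder outputs.

```python
import random

def reparameterize(mu, sigma, rng):
    """Sample c = mu + sigma * eps element-wise, with eps ~ N(0, 1)."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + s * e for m, s, e in zip(mu, sigma, eps)]

# Toy encoder outputs (assumed values, one entry per latent element).
mu = [0.2, -0.1, 0.0, 0.5]
sigma = [0.1, 0.2, 0.3, 0.4]

c = reparameterize(mu, sigma, random.Random(0))
c_again = reparameterize(mu, sigma, random.Random(0))

# With sigma set to zero the sample collapses to the mean.
c_mean = reparameterize(mu, [0.0] * len(mu), random.Random(1))
print(c, c_mean)
```

The multiplication of sigma and eps is element-wise, which is the role the Hadamard product plays in the description above.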

The decoder 300 may be implemented symmetrically to the encoder 200. The decoder 300 may receive the latent variable c from the encoder 200 and restore the noise image y'.

The encoder 200 operates in correspondence with the second objective function described with reference to FIG. 2 or the encoding convolutional neural network described with reference to FIG. 3. The decoder 300 operates in correspondence with the third objective function described with reference to FIG. 2 or the decoding convolutional neural network described with reference to FIG. 3.

According to the image denoising method of some embodiments described with reference to FIGS. 2 to 4, a new latent variable relating to noise information is introduced, loss functions for training the network variationally are derived, and convolutional neural networks are trained with the derived loss functions. Through training, the neural network finds a latent variable c based on the noise information and keeps optimizing that latent variable by means of the restored noise image. This image denoising method has the advantage that, when image degradation occurs, noise can be removed more efficiently without any additional information about the noise being supplied. In addition, because the latent variable c continues to be learned even when the image contains noise of various levels, denoising optimized for the noise level can be performed. Furthermore, because a single neural network is used, the method can be implemented with a smaller memory capacity.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, the present invention is not limited to these embodiments and may be implemented in various different forms. A person of ordinary skill in the art to which the present invention pertains will understand that it may be embodied in other specific forms without changing its technical spirit or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive.

10: image processing device
100: denoiser 200: encoder
300: decoder

Claims

1. An image processing device comprising:
an encoder that infers a latent variable from an input noise image based on a preset noise;
a denoiser that generates a denoising image by removing the noise from the input noise image; and
a decoder that restores a noise image using the inferred latent variable,
wherein the latent variable is learned based on a difference between the restored noise image and the input noise image.

2. The image processing device of claim 1, wherein the denoiser comprises a plurality of residual-in-residual (RIR) blocks and a first convolution block.

3. The image processing device of claim 1, wherein the encoder infers a mean and a standard deviation of the latent variable from the input noise image using the preset noise and Kullback-Leibler divergence.

4. The image processing device of claim 1, wherein the denoiser generates a current-stage denoising image such that a peak signal-to-noise ratio (PSNR) is maximized in a loss function between a previous-stage denoising image and the current-stage denoising image generated based on the learned latent variable.

5. An image processing device comprising:
an image sensor that senses a first input image of a subject;
a neural processor including a plurality of convolutional neural networks to remove noise from the first input image; and
a memory storing instructions executable by the neural processor,
wherein the neural processor
infers a latent variable based on the first input image using an encoding convolutional neural network, and
generates a first denoising image by removing, from the first input image, noise based on the inferred latent variable using a denoising convolutional neural network.

6. The image processing device of claim 5, wherein the latent variable includes level and distribution information of the noise.

7. The image processing device of claim 5, wherein the denoising convolutional neural network comprises:
a plurality of residual-in-residual (RIR) blocks comprising a plurality of rectified linear units, each comprising a plurality of residual blocks and a convolution block; and
a convolutional layer,
wherein each linear unit is connected by a skip connection and infers a residual image by adding the first input image to an output of the linear unit.

8. The image processing device of claim 5, wherein the encoding convolutional neural network reduces the spatial size of a feature map for the first input image by a preset ratio and infers the latent variable by applying a reparameterization trick.

9. The image processing device of claim 5, wherein the neural processor
generates a first reconstructed image from the latent variable and the first denoising image using a decoding convolutional neural network,
the denoising convolutional neural network re-infers the latent variable so that a mean absolute error (MAE) between the first denoising image and the first reconstructed image becomes small, and
the re-inferred latent variable is used when performing denoising on a second input image of a next operation section.

10. The image processing device of claim 9, wherein the denoising convolutional neural network is a generative adversarial network (GAN) that minimizes the Jensen-Shannon (JS) divergence between a training data set including the first input image and the first reconstructed image.
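For reference, the quantities named in claims 3, 4 and 9 can be computed as in the following sketch. It is illustrative only and rests on assumptions not stated in the claims: images are flat lists with pixel values in [0, 1], and the Kullback-Leibler term is taken between a diagonal Gaussian and a standard normal prior, which is one common reading of "preset noise". The function names are hypothetical.

```python
import math

def mae(a, b):
    """Mean absolute error between two equally sized flat images."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)

def psnr(a, b, peak=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    mse = sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)

def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent elements."""
    return 0.5 * sum(
        m * m + s * s - 1.0 - math.log(s * s) for m, s in zip(mu, sigma)
    )

clean = [0.0, 0.0, 0.5, 1.0]
noisy = [0.1, 0.1, 0.6, 0.9]

print("MAE :", mae(clean, noisy))
print("PSNR:", psnr(clean, noisy))
print("KL  :", kl_to_standard_normal([1.0, 0.0], [1.0, 1.0]))
```

A lower MAE and a higher PSNR both indicate that two images are closer, and the KL term is zero exactly when the inferred mean is 0 and the standard deviation is 1.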