KR102467092B1

KR102467092B1 - Super-resolution image processing method and system robust coding noise using multiple neural networks

Info

Publication number: KR102467092B1
Application number: KR1020210165814A
Authority: KR
Inventors: 김시중; 김주한; 이웅원
Original assignee: 블루닷 주식회사
Priority date: 2021-11-26
Filing date: 2021-11-26
Publication date: 2022-11-16

Abstract

A super-resolution image processing method robust against coding noise using a plurality of neural networks is disclosed. The method comprises the steps of: receiving, by a processor, a compressed low-resolution image; decoding, by the processor, the compressed low-resolution image; generating, by the processor, a noise-removed low-resolution image by removing noise from the decoded low-resolution image using a first neural network; when the difference between the noise level of the decoded low-resolution image and the noise level of the noise-removed low-resolution image is smaller than a certain value, converting, by the processor, the decoded low-resolution image into a decoded high-resolution image using a second neural network, and when the difference is greater than the certain value, converting, by the processor, the noise-removed low-resolution image into the decoded high-resolution image using the second neural network; encoding, by the processor, the decoded high-resolution image; and transmitting, by the processor, the encoded high-resolution image.

Description

Super-resolution image processing method and system robust coding noise using multiple neural networks

The present invention relates to a super-resolution image processing method and system robust against coding noise using a plurality of neural networks, and more particularly, to such a method and system capable of generating a super-resolution image according to the level of coding noise generated in the process of compressing a low-resolution image.

Image upscaling, or image super-resolution, refers to a technique for generating a high-resolution image from a low-resolution image.

The image upscaling operation may also be performed on a low-resolution image obtained by decoding a compressed low-resolution image. Noise may be generated in the process of compressing an image. This noise is defined as coding noise. When an image upscaling operation is performed on a low-resolution image from which the coding noise has not been removed, the coding noise may become artifacts. That is, the coding noise may be noticeable in the converted high-resolution image.

Therefore, when an image upscaling operation is performed on a low-resolution image that includes coding noise, a new method capable of removing the coding noise and generating a high-resolution image is required. The method should also avoid removing the texture included in the low-resolution image along with the noise.

Korean Registered Patent Publication No. 10-1996730 (2019.06.28.)

A technical problem to be solved by the present invention is to provide a super-resolution image processing method and system robust against coding noise, using a plurality of neural networks capable of generating a high-resolution image according to the level of coding noise.

A super-resolution image processing method robust against coding noise using a plurality of neural networks according to an embodiment of the present invention includes the steps of: receiving, by a processor, a compressed low-resolution image; decoding, by the processor, the compressed low-resolution image; generating, by the processor, a noise-removed low-resolution image by removing noise from the decoded low-resolution image using a first neural network; converting, by the processor, either the decoded low-resolution image or the noise-removed low-resolution image into a decoded high-resolution image using a second neural network, according to the noise level of the decoded low-resolution image and the noise level of the noise-removed low-resolution image; encoding, by the processor, the decoded high-resolution image; and transmitting, by the processor, the encoded high-resolution image.

When the difference between the noise level of the decoded low-resolution image and the noise level of the noise-removed low-resolution image is smaller than a certain value, the processor converts the decoded low-resolution image into the decoded high-resolution image using the second neural network.

When the difference is greater than the certain value, the processor converts the noise-removed low-resolution image into the decoded high-resolution image using the second neural network.

The noise level of the decoded low-resolution image is generated using a third neural network.

The noise level of the noise-removed low-resolution image is generated using a fourth neural network.
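The selection rule described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `select_sr_input`, the two estimator callables (standing in for the third and fourth neural networks), the use of an absolute difference, and the handling of an exactly equal difference are all assumptions, since the text does not specify them.

```python
from typing import Callable, List

Image = List[List[float]]  # hypothetical stand-in for an image tensor


def select_sr_input(
    decoded_lr: Image,
    denoised_lr: Image,
    estimate_decoded_noise: Callable[[Image], float],   # stands in for the third network
    estimate_denoised_noise: Callable[[Image], float],  # stands in for the fourth network
    threshold: float,
) -> Image:
    """Return the image that should be passed to the second (super-resolution) network."""
    # Assumption: the "difference" is taken as an absolute difference.
    diff = abs(estimate_decoded_noise(decoded_lr) - estimate_denoised_noise(denoised_lr))
    # Small difference: denoising changed little, so keep the decoded image
    # (and its texture). Otherwise use the noise-removed image.
    # Assumption: a difference exactly equal to the threshold selects the
    # noise-removed image; the source leaves this case open.
    return decoded_lr if diff < threshold else denoised_lr
```

With a constant-valued toy image and an estimator that simply reads one pixel, a large threshold returns the decoded image and a small threshold returns the noise-removed image.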

According to an embodiment, the super-resolution image processing method robust against coding noise using the plurality of neural networks may include: calculating, by the processor, an image similarity between the noise-removed low-resolution image and a lossless low-resolution training image as a first score; calculating, by the processor, an image similarity between the decoded high-resolution image and a high-resolution training image as a second score; and, when the first score is greater than the second score, training, by the processor, the first neural network more than the second neural network.

The more similar the noise-removed low-resolution image is to the lossless low-resolution training image, the lower the first score.

The more similar the decoded high-resolution image is to the high-resolution training image, the lower the second score.
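The score comparison above can be illustrated with a small sketch. It assumes that each score is a mean squared error over flattened pixel lists, which is consistent with "lower means more similar" but is not stated as the scoring function at this point in the text. The names `mse` and `should_favor_first_network` are hypothetical.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized, flattened images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of elements")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def should_favor_first_network(
    dlr: Sequence[float],  # noise-removed low-resolution image
    llr: Sequence[float],  # lossless low-resolution training image
    sr: Sequence[float],   # decoded high-resolution image
    hr: Sequence[float],   # high-resolution training image
) -> bool:
    """True when the first (denoising) network should be trained more."""
    first_score = mse(dlr, llr)   # lower = DLR closer to LLR
    second_score = mse(sr, hr)    # lower = SR closer to HR
    return first_score > second_score
```

How "training more" is realized (extra iterations, a larger loss weight, and so on) is left open by the text, so the sketch only returns the decision.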

A super-resolution image processing system robust against coding noise using a plurality of neural networks according to an embodiment of the present invention includes an image processing device.

The image processing device includes a processor that executes instructions and a memory that stores the instructions.

The instructions are implemented to generate a noise-removed low-resolution image by removing noise from a low-resolution image using a first neural network, and to convert either the low-resolution image or the noise-removed low-resolution image into a high-resolution image using a second neural network, according to the noise level of the low-resolution image and the noise level of the noise-removed low-resolution image.

When the difference between the noise level of the low-resolution image and the noise level of the noise-removed low-resolution image is smaller than a certain value, the instructions are implemented to convert the low-resolution image into the high-resolution image using the second neural network.

When the difference is greater than the certain value, the instructions are implemented to convert the noise-removed low-resolution image into the high-resolution image using the second neural network.

The noise level of the low-resolution image is generated using a third neural network.

The instructions may be implemented to calculate an image similarity between the noise-removed low-resolution image and a lossless low-resolution training image as a first score, to calculate an image similarity between the high-resolution image and a high-resolution training image as a second score, and, when the first score is greater than the second score, to train the first neural network more than the second neural network.

The more similar the noise-removed low-resolution image is to the lossless low-resolution training image, the lower the first score; the more similar the high-resolution image is to the high-resolution training image, the lower the second score.

The super-resolution image processing method and system robust against coding noise using a plurality of neural networks according to an embodiment of the present invention selectively convert either the low-resolution image or the low-resolution image from which coding noise has been removed into a high-resolution image according to the coding noise level, which has the effect of preventing texture from being removed along with the noise.

A detailed description of each drawing is provided so that the drawings cited in the detailed description of the present invention may be more fully understood.
FIG. 1 shows a block diagram of a super-resolution image processing system using a neural network according to an embodiment of the present invention.
FIG. 2 shows a block diagram of a network for explaining a training operation of the neural network module shown in FIG. 1.
FIG. 3 shows a detailed block diagram of the generator network shown in FIG. 2.
FIG. 4 shows images for explaining the effect of the super-resolution image processing method using a neural network according to an embodiment of the present invention.
FIG. 5 shows a flowchart of a super-resolution image processing method using a neural network according to an embodiment of the present invention.
FIG. 6 shows a block diagram of a network for explaining a training operation of a neural network module according to another embodiment of the present invention.
FIG. 7 shows a flowchart of a super-resolution image processing method robust against coding noise using a plurality of neural networks according to another embodiment of the present invention.

The specific structural or functional descriptions of the embodiments according to the concept of the present invention disclosed in this specification are given only for the purpose of explaining those embodiments. The embodiments according to the concept of the present invention may be implemented in various forms and are not limited to the embodiments described herein.

Because the embodiments according to the concept of the present invention may be modified in various ways and may take various forms, the embodiments are illustrated in the drawings and described in detail in this specification. However, this is not intended to limit the embodiments according to the concept of the present invention to the specific forms disclosed; they include all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention.

Terms such as first or second may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of rights according to the concept of the present invention, a first component may be termed a second component, and similarly a second component may be termed a first component.

When a component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between. In contrast, when a component is referred to as being "directly connected" or "directly coupled" to another component, it should be understood that no other component exists in between. Other expressions describing the relationship between components, such as "between" and "directly between" or "adjacent to" and "directly adjacent to", should be interpreted in the same way.

The terms used in this specification are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" are intended to specify the presence of the described features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood not to preclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art and, unless explicitly defined in this specification, are not to be interpreted in an idealized or excessively formal sense.

Hereinafter, the present invention will be described in detail by describing preferred embodiments of the present invention with reference to the accompanying drawings.

FIG. 1 shows a block diagram of a super-resolution image processing system using a neural network according to an embodiment of the present invention.

Referring to FIG. 1, a super-resolution image processing system 100 using a neural network is a system that converts a low-resolution image transmitted by a user into a high-resolution image and transmits the converted high-resolution image back to the user.

The super-resolution image processing system 100 using a neural network includes an image processing device 10 and a computing device 20. The image processing device 10 and the computing device 20 communicate with each other through a network 101.

The image processing device 10 may be referred to by various terms, such as a computing device or a server. The image processing device 10 receives a compressed low-resolution image CLR from the computing device 20, converts the compressed low-resolution image CLR into an encoded high-resolution image ESR, and outputs the encoded high-resolution image ESR to the computing device 20. The conversion from the compressed low-resolution image CLR to the encoded high-resolution image ESR may be defined as an upscaling operation. Detailed operations of the image processing device 10 are described later.

The computing device 20 refers to an electronic device such as a notebook computer, a tablet PC, or a desktop computer. According to an embodiment, the computing device 20 may be referred to as a client. The computing device 20 is used to obtain a high-resolution image from a low-resolution image. To obtain the high-resolution image, the computing device 20 compresses the low-resolution image under the user's control and transmits the compressed low-resolution image CLR to the image processing device 10. Noise may be generated in the process of compressing the low-resolution image. This noise is defined as coding noise. The coding noise should be removed in the process of converting the low-resolution image into a high-resolution image. If the low-resolution image is converted into a high-resolution image without removing the coding noise, the quality of the high-resolution image will be low. That is, the coding noise may become artifacts. The present invention proposes methods for removing the coding noise included in a low-resolution image in the process of upscaling a compressed low-resolution image to a high-resolution image.

The image processing device 10 includes a processor 11 and a memory 13. The processor 11 executes instructions related to the super-resolution image processing methods using a neural network. The memory 13 stores the instructions. The memory 13 includes a decoding module 15, a neural network module 17, and an encoding module 19. The decoding module 15, the neural network module 17, and the encoding module 19 refer to instructions executed by the processor 11. Hereinafter, the operations of the decoding module 15, the neural network module 17, and the encoding module 19 may be understood as being performed by the processor 11.

The decoding module 15 decodes the compressed low-resolution image CLR. That is, the decoding module 15 receives the compressed low-resolution image CLR and outputs a decoded low-resolution image LR.

The neural network module 17 converts the decoded low-resolution image LR into a decoded high-resolution image SR using a pair of neural networks.
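The data flow through the three modules (CLR to LR to SR to ESR) can be sketched as a simple composition. The function name `upscale_compressed_image` and the caller-supplied stage callables are hypothetical; the actual codec and network are not specified here.

```python
from typing import Any, Callable


def upscale_compressed_image(
    clr: bytes,
    decode: Callable[[bytes], Any],       # role of decoding module 15
    neural_network: Callable[[Any], Any],  # role of neural network module 17
    encode: Callable[[Any], bytes],       # role of encoding module 19
) -> bytes:
    """Run CLR -> LR -> SR -> ESR using caller-supplied stages."""
    lr = decode(clr)         # compressed low-resolution image -> decoded LR
    sr = neural_network(lr)  # decoded LR -> decoded high-resolution SR
    return encode(sr)        # decoded SR -> encoded high-resolution ESR
```

Keeping the stages as parameters makes the sketch independent of any particular codec or model.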

FIG. 2 shows a block diagram of a network for explaining a training operation of the neural network module shown in FIG. 1.

Referring to FIGS. 1 and 2, a pair of neural networks 300 and 400 must first be trained in order to convert the decoded low-resolution image LR into the decoded high-resolution image SR. The operations of the network 200 are performed by the neural network module 17.

The network 200 shown in FIG. 2 is a generative adversarial network (GAN). The network 200 includes a generator network 500 and a discriminator network 550.

The generator network 500 is not trained to minimize its own loss function, but rather to fool the discriminator network 550. This allows the network 200 to be trained in an unsupervised manner.

In a first step, the generator network 500 is trained, and in a second step, the discriminator network 550 is trained. The first step and the second step are performed repeatedly.

The generator network 500 is trained to receive the decoded low-resolution image LR, which includes coding noise, and to output the decoded high-resolution image SR.

The discriminator network 550 receives a high-resolution training image HR and the decoded high-resolution image SR output from the generator network 500, and outputs a Wasserstein loss (EM), which is the difference between the distribution of the high-resolution training image HR and the distribution of the decoded high-resolution image SR. The generator network 500 and the discriminator network 550 may be trained according to the Wasserstein loss (EM). That is, the parameters (weights) of the generator network 500 and the discriminator network 550 may be adjusted. The high-resolution training image HR and the high-resolution image SR may have the same resolution. The discriminator network 550 includes convolution layers and activation layers.
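The text does not give a formula for the Wasserstein loss. The sketch below uses the formulation commonly found in Wasserstein GAN training, which is an assumption here: the discriminator (critic) assigns a scalar score to each image, and the distance is estimated as the difference between the mean score on HR images and the mean score on SR images. All function names are hypothetical.

```python
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def wasserstein_estimate(scores_hr: Sequence[float], scores_sr: Sequence[float]) -> float:
    """Estimated distance: mean critic score on HR minus mean critic score on SR."""
    return mean(scores_hr) - mean(scores_sr)


def critic_loss(scores_hr: Sequence[float], scores_sr: Sequence[float]) -> float:
    """The critic is trained to maximize the estimate, i.e. minimize its negative."""
    return -wasserstein_estimate(scores_hr, scores_sr)


def generator_loss(scores_sr: Sequence[float]) -> float:
    """The generator is trained to raise the critic's score on its SR outputs."""
    return -mean(scores_sr)
```

In this formulation the two losses pull in opposite directions on the SR scores, which matches the alternating generator and discriminator steps described earlier.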

The generator network 500 includes the pair of neural networks 300 and 400. The pair of neural networks 300 and 400 are connected in sequence.

FIG. 3 shows a detailed block diagram of the generator network shown in FIG. 2.

Referring to FIGS. 1 to 3, the first neural network 300 receives the decoded low-resolution image LR, which includes coding noise, and outputs a noise-removed low-resolution image DLR. The first neural network 300 is used to remove coding noise from the low-resolution image LR. The first neural network 300 may be referred to as a coding noise reduction network.

To receive the decoded low-resolution image LR including coding noise and output the noise-removed low-resolution image DLR, the first neural network 300 includes a plurality of layers 310-1 to 310-N (N is a natural number), a shuffle layer 320, and an adder 330. Each of the plurality of layers 310-1 to 310-N is a convolution layer or an activation layer. The activation layer may use a rectified linear unit (ReLU) activation function. Some of the plurality of layers 310-1 to 310-N form a residual block. The number of layers 310-1 to 310-N may vary depending on the embodiment.

According to an embodiment, the first neural network 300 may further include a space-to-depth layer 305 that converts the low-resolution image LR expressed in the form (H, W, C) into the form (H/2, W/2, C*2*2). The space-to-depth layer 305 is used to improve the processing speed of the first neural network 300. Here, H is the height of the image, W is the width of the image, and C is the number of channels.
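The shape change (H, W, C) to (H/2, W/2, C*2*2) can be illustrated with a plain-Python space-to-depth sketch on nested lists. The function name and the ordering of the stacked channels are assumptions; deep learning frameworks differ in how they order the resulting channels.

```python
from typing import List

Tensor3D = List[List[List[float]]]  # indexed as [row][column][channel]


def space_to_depth(image: Tensor3D, block: int = 2) -> Tensor3D:
    """Move each block x block spatial patch into the channel dimension."""
    h, w = len(image), len(image[0])
    if h % block or w % block:
        raise ValueError("height and width must be divisible by the block size")
    out: Tensor3D = []
    for i in range(0, h, block):
        row = []
        for j in range(0, w, block):
            pixel: List[float] = []
            # Stack the channels of the patch in row-major order (assumed ordering).
            for di in range(block):
                for dj in range(block):
                    pixel.extend(image[i + di][j + dj])
            row.append(pixel)
        out.append(row)
    return out
```

The total number of values is unchanged; only the layout changes, so later convolutions operate on a spatial grid with a quarter as many positions.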

The shuffle layer 320 shuffles a tensor that includes a plurality of channels in order to generate an image that includes one channel.
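One plausible reading of the shuffle layer, offered here as an assumption rather than as the patent's definition, is a depth-to-space (pixel shuffle) rearrangement that trades channels for spatial resolution. The sketch below, with the hypothetical name `depth_to_space`, shows a multi-channel tensor becoming a single-channel image with twice the height and width.

```python
from typing import List

Tensor3D = List[List[List[float]]]  # indexed as [row][column][channel]


def depth_to_space(tensor: Tensor3D, block: int = 2) -> Tensor3D:
    """Spread groups of channels over block x block spatial patches."""
    h, w, c = len(tensor), len(tensor[0]), len(tensor[0][0])
    if c % (block * block):
        raise ValueError("channel count must be divisible by block * block")
    out_c = c // (block * block)
    out: Tensor3D = [[[] for _ in range(w * block)] for _ in range(h * block)]
    for i in range(h):
        for j in range(w):
            pixel = tensor[i][j]
            for di in range(block):
                for dj in range(block):
                    # Assumed channel ordering: row-major over the patch.
                    start = (di * block + dj) * out_c
                    out[i * block + di][j * block + dj] = pixel[start:start + out_c]
    return out
```

Under this reading, a (1, 1, 4) tensor becomes a (2, 2, 1) image, which is the inverse of a 2x2 space-to-depth step with the same channel ordering.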

The adder 330 adds the image output from the shuffle layer 320 to the decoded low-resolution image LR and outputs the noise-removed low-resolution image DLR.

The first loss function 340 receives the noise-removed low-resolution image DLR and a lossless low-resolution training image LLR and calculates a loss. The loss may be calculated as the mean squared error (MSE) between the noise-removed low-resolution image DLR and the lossless low-resolution training image LLR. The lossless low-resolution training image LLR is an image obtained by reducing the resolution of the high-resolution training image HR. The lossless low-resolution training image LLR and the high-resolution training image HR are images used for training the neural networks.

The second neural network 400 receives the noise-removed low-resolution image DLR and outputs the decoded high-resolution image SR. The second neural network 400 is used to convert the noise-removed low-resolution image DLR into the high-resolution image SR. The second neural network 400 may be referred to as a video super-resolution network.

To receive the noise-removed low-resolution image DLR and output the decoded high-resolution image SR, the second neural network 400 includes a connection layer 405, a plurality of layers 410-1 to 410-M (M is a natural number), a shuffle layer 420, an upscaling layer 425, and an adder 430.

The connection layer 405 concatenates the high-resolution image PSR of the previous frame, output through the second neural network 400, with the noise-removed low-resolution image DLR output from the first neural network 300. According to an embodiment, the second neural network 400 may further include a space-to-depth layer 403. The high-resolution image PSR of the previous frame may be input to the space-to-depth layer 403. The high-resolution image PSR of the previous frame is expressed as (H*scale factor, W*scale factor, C). The noise-removed low-resolution image DLR is expressed as (H, W, C). Here, H is the height of the image, W is the width of the image, and C is the number of channels. The scale factor is a constant. That is, the high-resolution image PSR of the previous frame and the noise-removed low-resolution image DLR differ in size. Therefore, the space-to-depth layer 403 changes the high-resolution image PSR of the previous frame from the form (H*scale factor, W*scale factor, C) into the form (H, W, C*scale factor*scale factor).

The image output from the space-to-depth layer 403 and the noise-removed low-resolution image DLR may be input to the connection layer 405, which may concatenate the two.
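The shape change performed by the space-to-depth layer 403 and the subsequent concatenation can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the function names `space_to_depth` and `concat_inputs` are hypothetical, images are assumed to be stored as (height, width, channel) arrays, and the ordering of the rearranged channels is one possible choice that the description does not specify.

```python
import numpy as np

def space_to_depth(image: np.ndarray, scale: int) -> np.ndarray:
    """Rearrange (H*scale, W*scale, C) into (H, W, C*scale*scale)."""
    hs, ws, c = image.shape
    if hs % scale or ws % scale:
        raise ValueError("height and width must be divisible by scale")
    h, w = hs // scale, ws // scale
    # Split each spatial axis into (block index, offset inside the block).
    blocks = image.reshape(h, scale, w, scale, c)
    # Move both in-block offsets next to the channel axis, then merge them.
    return blocks.transpose(0, 2, 1, 3, 4).reshape(h, w, scale * scale * c)

def concat_inputs(psr: np.ndarray, dlr: np.ndarray, scale: int) -> np.ndarray:
    """Concatenate the rearranged previous SR frame with the denoised LR frame."""
    return np.concatenate([space_to_depth(psr, scale), dlr], axis=-1)

# Assumed toy sizes: H=4, W=6, C=3, scale factor 2.
psr = np.zeros((8, 12, 3), dtype=np.float32)  # previous-frame SR image (PSR)
dlr = np.zeros((4, 6, 3), dtype=np.float32)   # noise-removed LR image (DLR)
stacked = concat_inputs(psr, dlr, scale=2)
print(stacked.shape)  # (4, 6, 15): 3*2*2 channels from PSR plus 3 from DLR
```

After the rearrangement both inputs share the spatial size (H, W), so they can be joined along the channel axis without resampling either image.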

Each of the plurality of layers 410-1 to 410-M is a convolution layer or an activation layer. The activation layer may use a Rectified Linear Unit (ReLU) activation function. Some of the plurality of layers 410-1 to 410-M form a residual block. The number of layers 410-1 to 410-M may vary depending on the embodiment.

The shuffle layer 420 shuffles a tensor including a plurality of channels to generate an image including one channel.

The upscaling layer 425 receives the noise-removed low-resolution image DLR output from the first neural network 300 and upscales it using bicubic interpolation.

The adder 430 adds the image output from the shuffle layer 420 and the image output from the upscaling layer 425, and outputs the decoded high-resolution image SR.
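The output stage described above (shuffle layer 420, upscaling layer 425, adder 430) can be sketched as follows. The names `depth_to_space`, `upscale_nearest`, and `reconstruct` are hypothetical. Note in particular that `upscale_nearest` is a nearest-neighbor stand-in used only to keep the example dependency-free; the description specifies bicubic interpolation for the upscaling layer 425. The channel ordering in `depth_to_space` is likewise an assumption.

```python
import numpy as np

def depth_to_space(features: np.ndarray, scale: int) -> np.ndarray:
    """Rearrange (H, W, C*scale*scale) into (H*scale, W*scale, C)."""
    h, w, depth = features.shape
    if depth % (scale * scale):
        raise ValueError("channel count must be divisible by scale*scale")
    c = depth // (scale * scale)
    # Split the channel axis into (row offset, column offset, channel).
    blocks = features.reshape(h, w, scale, scale, c)
    # Interleave the offsets with the spatial axes, then merge them.
    return blocks.transpose(0, 2, 1, 3, 4).reshape(h * scale, w * scale, c)

def upscale_nearest(image: np.ndarray, scale: int) -> np.ndarray:
    """Stand-in upscaler (nearest neighbor); the source uses bicubic here."""
    return image.repeat(scale, axis=0).repeat(scale, axis=1)

def reconstruct(features: np.ndarray, dlr: np.ndarray, scale: int) -> np.ndarray:
    """Add the shuffled network output to the upscaled denoised LR image."""
    return depth_to_space(features, scale) + upscale_nearest(dlr, scale)

# Assumed toy sizes: a 1x1 single-channel LR image and scale factor 2.
features = np.arange(4, dtype=np.float64).reshape(1, 1, 4)
dlr = np.ones((1, 1, 1))
sr = reconstruct(features, dlr, scale=2)
print(sr.shape)  # (2, 2, 1)
```

Because the upscaled image is added at the end, the convolutional layers only need to produce the detail that interpolation alone cannot recover.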

The second loss function 440 receives the decoded high-resolution image SR and the high-resolution training image HR and calculates a loss. The loss may be calculated as the mean squared error (MSE) between the decoded high-resolution image SR and the high-resolution training image HR.
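For reference, the mean squared error between two images of the same size is the average of the squared per-pixel differences. A minimal sketch follows; the function name `mse_loss` is hypothetical, and no particular value range for the pixels is assumed.

```python
import numpy as np

def mse_loss(sr, hr) -> float:
    """Mean squared error between an SR image and its HR training target."""
    sr = np.asarray(sr, dtype=np.float64)
    hr = np.asarray(hr, dtype=np.float64)
    if sr.shape != hr.shape:
        raise ValueError("SR and HR must have the same shape")
    # Average the squared difference over every pixel and channel.
    return float(np.mean((sr - hr) ** 2))
```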

According to an embodiment, Learned Perceptual Image Patch Similarity (LPIPS) units 335 and 435 may be used to train the first neural network 300 and the second neural network 400 efficiently.

The first LPIPS unit 335 is included in the first neural network 300. The first LPIPS unit 335 calculates the image similarity between the noise-removed low-resolution image DLR and the lossless low-resolution training image LLR as a first score (a). The more similar the noise-removed low-resolution image DLR and the lossless low-resolution training image LLR are, the lower the first score (a). The first LPIPS unit 335 is implemented as a neural network.

The second LPIPS unit 435 is included in the second neural network 400. The second LPIPS unit 435 calculates the image similarity between the decoded high-resolution image SR and the high-resolution training image HR as a second score (b). The more similar the decoded high-resolution image SR and the high-resolution training image HR are, the lower the second score (b).

When the first score (a) is greater than the second score (b), the processor 11 trains the first neural network 300 more than the second neural network 400. The processor 11 normalizes the first score (a) and the second score (b) so that their sum is 1.
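The normalization and comparison of the two LPIPS scores can be sketched as below. The function names are hypothetical, and dividing each score by their sum is an assumed way of making them add up to 1; the description states only the resulting property, not the procedure. How a tie (a equal to b) is handled is also not stated, so the sketch reports only the case the text does describe.

```python
def normalize_scores(a: float, b: float) -> tuple[float, float]:
    """Scale two non-negative scores so that they sum to 1."""
    total = a + b
    if total <= 0:
        raise ValueError("scores must be non-negative and not both zero")
    return a / total, b / total

def first_network_needs_more_training(a: float, b: float) -> bool:
    """True when the first score exceeds the second (lower means more similar)."""
    return a > b

# Assumed raw scores for illustration only.
a, b = normalize_scores(3.0, 1.0)
print(a, b)  # 0.75 0.25
```

Since a lower LPIPS score means the output is closer to its target, the larger normalized score points to the network that is currently further from its target.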

Using the LPIPS units 335 and 435 makes it possible to determine which neural network needs more training, and training that network more makes the overall training more efficient.

A loss function is proposed for training the generator network 500. The processor 11 may train the generator network 500 using the loss function expressed as Equation 1.

In Equation 1, LossFunction denotes the loss function of the generator network 500, a and b denote the first score and the second score, LossFunction1 denotes the first loss function 340 of the first neural network 300, LossFunction2 denotes the second loss function 440 of the second neural network 400, corr1 denotes the correlation of the first neural network 300, and corr2 denotes the correlation of the second neural network 400.

The correlation of the first neural network 300 is defined by Equation 2.

In Equation 2, corr1 denotes the correlation of the first neural network 300, and crosscorrelation1 denotes the cross-correlation between channels of the feature map of the first neural network 300. The term maskmap_c(x,y) is expressed by Equation 3.

In Equation 3, LR denotes a pixel value of the decoded low-resolution image LR, which is the input of the first neural network 300, and gaussianblurkernel(LR) denotes a pixel value of the Gaussian kernel used to blur the low-resolution image LR. That is, the processor 11 calculates a first absolute value, which is the absolute value of the difference between the pixel value of the decoded low-resolution image LR and the pixel value of the Gaussian kernel.

When the region (x,y) of the decoded low-resolution image LR includes a flat region with low contrast, the correlation of features may be disregarded. When the region (x,y) of the decoded low-resolution image LR includes a texture region with high contrast, the correlation of features must not be disregarded.

Equation 3 is used to determine whether the region (x,y) of the image LR includes a flat region or a texture region.

In Equation 3, when the difference between the pixel value of the decoded low-resolution image LR and the pixel value of the Gaussian kernel used to blur the low-resolution image LR is smaller than a first threshold value, the processor 11 determines that the region (x,y) of the image LR includes a flat region and outputs 0 as the value of maskmap_c(x,y). In other words, when the first absolute value is smaller than the first threshold value, the processor 11 sets the first absolute value to 0. Here, (x,y) denotes a pixel position in the image LR.

In Equation 3, when the difference between the pixel value of the decoded low-resolution image LR and the pixel value of the Gaussian kernel used to blur the low-resolution image LR is greater than the first threshold value, the processor 11 determines that the region (x,y) of the image LR includes a texture region and outputs 1 as the value of maskmap_c(x,y). In other words, when the first absolute value is greater than the first threshold value, the processor 11 sets the first absolute value to 1. The value of maskmap_c(x,y) is therefore either 0 or 1.
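The flat/texture decision described above can be sketched as a binary mask computed from a single-channel image. This is a sketch under several assumptions that the description does not confirm: gaussianblurkernel(LR) is read as the Gaussian-blurred version of the image, the kernel parameters `sigma` and `radius` are arbitrary illustrative values, borders are handled by edge replication, and a difference exactly equal to the threshold is treated as flat (the text covers only the "smaller" and "greater" cases). All function names are hypothetical.

```python
import numpy as np

def gaussian_kernel_1d(sigma: float, radius: int) -> np.ndarray:
    """Normalized 1-D Gaussian kernel of length 2*radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def gaussian_blur(image: np.ndarray, sigma: float = 1.0, radius: int = 2) -> np.ndarray:
    """Separable Gaussian blur of a 2-D image with edge replication."""
    kernel = gaussian_kernel_1d(sigma, radius)
    padded = np.pad(image.astype(np.float64), radius, mode="edge")
    # Blur along each row, then along each column of the row-blurred result.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def mask_map(image: np.ndarray, threshold: float,
             sigma: float = 1.0, radius: int = 2) -> np.ndarray:
    """Return 1 where |image - blur(image)| exceeds the threshold, else 0."""
    diff = np.abs(image.astype(np.float64) - gaussian_blur(image, sigma, radius))
    return (diff > threshold).astype(np.uint8)

# A uniform image is flat everywhere; a step edge is marked near the edge.
flat = np.full((6, 6), 0.5)
edge = np.zeros((6, 6))
edge[:, 3:] = 1.0
flat_mask = mask_map(flat, threshold=0.05)
edge_mask = mask_map(edge, threshold=0.05)
```

Blurring changes a pixel only when its neighborhood varies, so a small difference from the blurred image indicates a low-contrast (flat) region and a large difference indicates texture or edges.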

The processor 11 calculates the value of a first cross-correlation between channels of the feature map FM1 of the first neural network 300. In Equation 2, crosscorrelation1 denotes this cross-correlation between channels of the feature map of the first neural network 300. The feature map FM1 of the first neural network 300 is the feature map that is input to the shuffle layer 320 in the first neural network 300. The feature map FM1 input to the shuffle layer 320 contains more features than the feature maps output from the other layers (e.g., 310-1, 310-2, ..., or 310-4). Features refer to points, edges, or objects in an image. A feature is expressed as (x, y, channel), where x and y denote the position of a pixel in the image and channel denotes a channel of the image. When the features contained in the channels are highly cross-correlated, the channels carry mutually similar features, which they do not all need to contain; such redundancy is inefficient for obtaining the high-resolution image SR. Therefore, to assess the cross-correlation of the features contained in the channels, the processor 11 calculates the value of the first cross-correlation between the channels of the feature map FM1 of the first neural network 300.
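The description does not give a formula for the cross-correlation between channels. One plausible way to quantify how redundant the channels of a feature map are is the mean absolute Pearson correlation over all pairs of distinct channels, sketched below. This specific measure, the function name `channel_cross_correlation`, and the (height, width, channel) layout are all assumptions made for illustration, not the patented computation.

```python
import numpy as np

def channel_cross_correlation(feature_map: np.ndarray) -> float:
    """Mean absolute Pearson correlation between distinct channels of (H, W, C)."""
    h, w, c = feature_map.shape
    if c < 2:
        raise ValueError("at least two channels are required")
    # One column per channel, one row per spatial position.
    flat = feature_map.reshape(h * w, c).astype(np.float64)
    corr = np.corrcoef(flat, rowvar=False)          # (C, C) correlation matrix
    off_diagonal = corr[~np.eye(c, dtype=bool)]     # drop self-correlations
    return float(np.mean(np.abs(off_diagonal)))

# Two identical channels are fully redundant; two uncorrelated ones are not.
x = np.array([[1.0, -1.0], [1.0, -1.0]])
y = np.array([[1.0, 1.0], [-1.0, -1.0]])
redundant = channel_cross_correlation(np.stack([x, x], axis=-1))
independent = channel_cross_correlation(np.stack([x, y], axis=-1))
```

A value near 1 means the channels largely duplicate one another, which is the situation the loss term is meant to discourage. Note that a channel with constant values has undefined correlation and would yield NaN in this sketch.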

The correlation of the second neural network 400 is defined by Equation 4.

In Equation 4, corr2 denotes the correlation of the second neural network 400, and crosscorrelation2 denotes the cross-correlation between channels of the feature map of the second neural network 400. The term maskmap_s(x,y) is expressed by Equation 5.

In Equation 5, DLR denotes a pixel value of the noise-removed low-resolution image DLR, which is the input of the second neural network 400, and gaussianblurkernel(DLR) denotes a pixel value of the Gaussian kernel used to blur the noise-removed low-resolution image DLR. That is, the processor 11 calculates a second absolute value, which is the absolute value of the difference between the pixel value of the noise-removed low-resolution image DLR and the pixel value of the Gaussian kernel.

When the region (x,y) of the noise-removed low-resolution image DLR includes a flat region with low contrast, the correlation of features may be disregarded. When the region (x,y) of the noise-removed low-resolution image DLR includes a texture region with high contrast, the correlation of features must not be disregarded.

Equation 5 is used to determine whether the region (x,y) of the noise-removed low-resolution image DLR includes a flat region or a texture region.

In Equation 5, when the difference between the pixel value of the noise-removed low-resolution image DLR and the pixel value of the Gaussian kernel used to blur the noise-removed low-resolution image DLR is smaller than a second threshold value, the processor 11 determines that the region (x,y) of the noise-removed low-resolution image DLR includes a flat region and outputs 0 as the value of maskmap_s(x,y). In other words, when the second absolute value is smaller than the second threshold value, the processor 11 sets the second absolute value to 0. Here, (x,y) denotes a pixel position in the image DLR.

In Equation 5, when the difference between the pixel value of the noise-removed low-resolution image DLR and the pixel value of the Gaussian kernel used to blur the noise-removed low-resolution image DLR is greater than the second threshold value, the processor 11 determines that the region (x,y) of the noise-removed low-resolution image DLR includes a texture region and outputs 1 as the value of maskmap_s(x,y). In other words, when the second absolute value is greater than the second threshold value, the processor 11 sets the second absolute value to 1. The value of maskmap_s(x,y) is therefore either 0 or 1.

The processor 11 calculates the value of a second cross-correlation between channels of the feature map FM2 of the second neural network 400. In Equation 4, crosscorrelation2 denotes this cross-correlation between channels of the feature map of the second neural network 400. The feature map FM2 of the second neural network 400 is the feature map that is input to the shuffle layer 420 in the second neural network 400. The feature map FM2 input to the shuffle layer 420 contains more features than the feature maps output from the other layers (e.g., 410-1, 410-2, ..., or 410-4). As with the first neural network, when the features contained in the channels are highly cross-correlated, the channels do not all need to contain mutually similar features. Therefore, to assess the cross-correlation of the features contained in the channels, the processor 11 calculates the value of the second cross-correlation between the channels of the feature map FM2 of the second neural network 400.

The generator network 500 may be referred to as a third neural network 500. The third neural network 500 includes the first neural network 300 and the second neural network 400.

The loss function of the third neural network 500 is defined as in Equation 1. That is, the loss function of the third neural network 500 is defined using two products: the first absolute value according to the first threshold value multiplied by the value of the first cross-correlation, and the second absolute value according to the second threshold value multiplied by the value of the second cross-correlation.

The first absolute value according to the first threshold value is the value of maskmap_c(x,y) in Equation 3, that is, 0 or 1. The value of the first cross-correlation is crosscorrelation1 in Equation 2, the cross-correlation between channels of the feature map of the first neural network 300. The product of the first absolute value according to the first threshold value and the value of the first cross-correlation is corr1 in Equation 2, the correlation of the first neural network 300.

The second absolute value according to the second threshold value is the value of maskmap_s(x,y) in Equation 5, that is, 0 or 1. The value of the second cross-correlation is crosscorrelation2 in Equation 4, the cross-correlation between channels of the feature map of the second neural network 400. The product of the second absolute value according to the second threshold value and the value of the second cross-correlation is corr2 in Equation 4, the correlation of the second neural network 400.
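Read together, the verbal definitions above suggest the following form for the mask maps and correlation terms. This is a reading of the prose only, not a reproduction of the published equations: the symbols T_1 and T_2 for the first and second threshold values are introduced here for illustration, the case where the difference equals the threshold is not covered by the text, and whether the product is further aggregated over pixel positions (x,y) is not stated. The form of Equation 1 itself cannot be recovered from the prose and is not attempted.

```latex
\mathrm{maskmap}_c(x,y) =
\begin{cases}
0, & \left| LR(x,y) - \mathrm{gaussianblurkernel}(LR)(x,y) \right| < T_1 \\
1, & \left| LR(x,y) - \mathrm{gaussianblurkernel}(LR)(x,y) \right| > T_1
\end{cases}
\qquad
\mathrm{corr1} = \mathrm{maskmap}_c(x,y) \cdot \mathrm{crosscorrelation1}

\mathrm{maskmap}_s(x,y) =
\begin{cases}
0, & \left| DLR(x,y) - \mathrm{gaussianblurkernel}(DLR)(x,y) \right| < T_2 \\
1, & \left| DLR(x,y) - \mathrm{gaussianblurkernel}(DLR)(x,y) \right| > T_2
\end{cases}
\qquad
\mathrm{corr2} = \mathrm{maskmap}_s(x,y) \cdot \mathrm{crosscorrelation2}
```

Under this reading, the correlation term vanishes in flat regions (mask 0) and equals the channel cross-correlation in texture regions (mask 1), which matches the statement that feature correlation may be disregarded only where contrast is low.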

FIG. 4 shows images for explaining the effect of the super-resolution image processing method using a neural network according to an embodiment of the present invention. (a) of FIG. 4 shows a portion of an image transmitted from the computing device 20 to the image processing device 10; it is a low-resolution image. (b) of FIG. 4 shows a high-resolution image converted from the low-resolution image according to the prior art. (c) of FIG. 4 shows a high-resolution image converted from the low-resolution image according to the present invention. Because coding noise is not removed in the prior art, the high-resolution image shown in (b) of FIG. 4 has lower image quality than the high-resolution image shown in (c) of FIG. 4.

FIG. 5 is a flowchart of a super-resolution image processing method using a neural network according to an embodiment of the present invention.

Referring to FIGS. 1 to 5, the processor 11 receives the compressed low-resolution image CLR (S10).

The processor 11 decodes the compressed low-resolution image CLR (S20).

The processor 11 converts the decoded low-resolution image LR into a decoded high-resolution image SR using a pair of neural networks 50 and 60 (S30).

The processor 11 encodes the decoded high-resolution image SR (S40).

The processor 11 transmits the encoded high-resolution image ESR to the computing device 20 through the network 101 (S50).
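Steps S10 to S50 form a linear pipeline, which can be sketched as below. The function `process_frame` and its callable parameters are hypothetical placeholders: the description does not name a codec, a network API, or a transport, so each stage is passed in by the caller rather than tied to any real library.

```python
from typing import Any, Callable

def process_frame(
    clr: Any,
    decode: Callable[[Any], Any],
    super_resolve: Callable[[Any], Any],
    encode: Callable[[Any], Any],
    transmit: Callable[[Any], None],
) -> Any:
    """Run one compressed LR frame (already received, S10) through S20 to S50."""
    lr = decode(clr)         # S20: decode the compressed LR image
    sr = super_resolve(lr)   # S30: convert LR to SR with the neural networks
    esr = encode(sr)         # S40: encode the SR image
    transmit(esr)            # S50: send the encoded SR image
    return esr

# Toy stand-ins that only tag a string, to show the order of the stages.
sent = []
result = process_frame(
    "CLR",
    decode=lambda x: x + ">LR",
    super_resolve=lambda x: x + ">SR",
    encode=lambda x: x + ">ESR",
    transmit=sent.append,
)
```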

FIG. 6 is a block diagram of a network for explaining a training operation of a neural network module according to another embodiment of the present invention.

The network 600 shown in FIG. 6 is a generative adversarial network (GAN). Referring to FIG. 6, the network 600 includes a generator network 700 and a discriminator network 770. The network 600 shown in FIG. 6 is similar in configuration and operation to the network 200 shown in FIG. 2. Accordingly, only the differences from the network 200 shown in FIG. 2 are described below.

The generator network 700 shown in FIG. 6 differs from the generator network 500 shown in FIG. 2. The generator network 500 shown in FIG. 2 may remove not only coding noise but also texture included in the decoded low-resolution image LR.

The generator network 700 shown in FIG. 6 is intended to solve this problem. When the coding noise included in the decoded low-resolution image LR is high, the generator network 700 removes noise from the decoded low-resolution image LR and converts the resulting noise-removed low-resolution image DLR into the decoded high-resolution image SR using the second neural network 720. When the coding noise included in the decoded low-resolution image LR is low, the generator network 700 converts the decoded low-resolution image LR itself, rather than the noise-removed low-resolution image DLR, into the decoded high-resolution image SR using the second neural network 720. The generator network 700 shown in FIG. 6 can thereby avoid removing the texture included in the decoded low-resolution image LR.

The generator network 700 is trained to receive the decoded low-resolution image LR, which contains coding noise, and to output the decoded high-resolution image SR.

The discriminator network 770 receives the high-resolution training image HR and the decoded high-resolution image SR output from the generator network 700, and outputs a Wasserstein loss (EM), which is the difference between the distribution of the high-resolution training image HR and the distribution of the decoded high-resolution image SR.
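The description does not say how the Wasserstein loss is computed. In Wasserstein GAN training it is commonly estimated from the discriminator (critic) outputs as the difference between the mean score on real samples and the mean score on generated samples. The sketch below assumes that common formulation; the function name and the idea that the discriminator network 770 yields one scalar score per image are assumptions.

```python
import numpy as np

def wasserstein_estimate(scores_hr, scores_sr) -> float:
    """Mean critic score on HR images minus mean critic score on SR images."""
    scores_hr = np.asarray(scores_hr, dtype=np.float64)
    scores_sr = np.asarray(scores_sr, dtype=np.float64)
    # A larger gap means the critic separates the two distributions more easily.
    return float(scores_hr.mean() - scores_sr.mean())

# Assumed critic scores for a batch of two HR and two SR images.
em = wasserstein_estimate([1.0, 3.0], [0.5, 1.5])
```

In that formulation the critic is trained to widen this gap while the generator is trained to narrow it.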

The generator network 700 includes a first neural network 710, a second neural network 720, a third neural network 730, a fourth neural network 740, a selection signal generator 750, and a selector 760.

The first neural network 710 includes a plurality of layers. The first neural network 710 corresponds to the first neural network 300 shown in FIG. 2. Accordingly, the first neural network 710 may be implemented as the first neural network 300 shown in FIG. 3. According to an embodiment, the first neural network 710 may be implemented by modifying the first neural network 300 shown in FIG. 3.

The second neural network 720 includes a plurality of layers. The second neural network 720 corresponds to the second neural network 400 shown in FIG. 2. Accordingly, the second neural network 720 may be implemented as the second neural network 400 shown in FIG. 3. According to an embodiment, the second neural network 720 may be implemented by modifying the second neural network 400 shown in FIG. 3.

The LPIPS units 335 and 435 shown in FIG. 3 may be used to train the first neural network 710 and the second neural network 720 efficiently. Since the operations of the LPIPS units 335 and 435 have been described above, a detailed description is omitted.

Equation 1 described above may be applied to the generator network 700. That is, the loss function of the generator network 700 may be expressed as Equation 1.

In Equation 1, LossFunction then denotes the loss function of the generator network 700, a and b denote the first score and the second score, LossFunction1 denotes the first loss function of the first neural network 710, LossFunction2 denotes the second loss function of the second neural network 720, corr1 denotes the correlation of the first neural network 710, and corr2 denotes the correlation of the second neural network 720. Because Equation 1 is applied to the generator network 700, Equations 2 to 5 are applied to the generator network 700 as well.

The technical features of the first neural network 300, the second neural network 400, and the generator network 500 shown in FIG. 2 apply equally to the first neural network 710, the second neural network 720, and the generator network 700 shown in FIG. 6.

Each of the third neural network 730 and the fourth neural network 740 may be implemented as a general convolutional neural network (CNN). The third neural network 730 may be trained using the original image before encoding and the decoded low-resolution image LR as training data, while the encoding bitrate is varied in the process of encoding the original image. The fourth neural network 740 may be trained using the original image before encoding and the noise-removed low-resolution image DLR as training data, while the encoding bitrate is varied in the process of encoding the original image.

The processor 11 removes noise from the decoded low-resolution image LR using the first neural network 710 to generate the noise-removed low-resolution image DLR.

Depending on the noise level of the decoded low-resolution image LR and the noise level of the noise-removed low-resolution image DLR, the processor 11 converts either the decoded low-resolution image LR or the noise-removed low-resolution image DLR into the decoded high-resolution image SR using the second neural network 720.

The selector 760 outputs either the decoded low-resolution image LR or the noise-removed low-resolution image DLR to the second neural network 720 according to a selection signal SEL. The selector 760 may be implemented as program instructions. According to an embodiment, the selector 760 may be implemented in hardware.

When the difference between the noise level of the decoded low-resolution image LR and the noise level of the noise-removed low-resolution image DLR is smaller than a certain value, the processor 11 converts the decoded low-resolution image LR into the decoded high-resolution image SR using the second neural network 720.

When the difference between the noise level of the decoded low-resolution image LR and the noise level of the noise-removed low-resolution image DLR is greater than the certain value, the processor 11 converts the noise-removed low-resolution image DLR into the decoded high-resolution image SR using the second neural network 720.

The noise level of the decoded low-resolution image LR is generated using the third neural network 730. That is, the processor 11 applies the third neural network 730 to the decoded low-resolution image LR to generate the noise level of the decoded low-resolution image LR. The noise level may have a value between 0 and 1.

The noise level of the noise-removed low-resolution image DLR is generated using the fourth neural network 740. That is, the processor 11 applies the fourth neural network 740 to the noise-removed low-resolution image DLR to generate the noise level of the noise-removed low-resolution image DLR. The noise level may have a value between 0 and 1.

The selection signal generator 750 compares the noise level of the decoded low-resolution image LR with the noise level of the noise-removed low-resolution image DLR, and generates the selection signal SEL according to the comparison result.

When the difference between the noise level of the decoded low-resolution image LR and the noise level of the noise-removed low-resolution image DLR is smaller than the arbitrary value, the processor 11 selects, according to the selection signal SEL generated by the selection signal generator 750, the decoded low-resolution image LR from among the decoded low-resolution image LR and the noise-removed low-resolution image DLR, and converts the decoded low-resolution image LR into the decoded high-resolution image SR using the second neural network 720.

When the difference between the noise level of the decoded low-resolution image LR and the noise level of the noise-removed low-resolution image DLR is greater than the arbitrary value, the processor 11 selects, according to the selection signal SEL generated by the selection signal generator 750, the noise-removed low-resolution image DLR from among the decoded low-resolution image LR and the noise-removed low-resolution image DLR, and converts the noise-removed low-resolution image DLR into the decoded high-resolution image SR using the second neural network 720.
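The selection behavior of the selection signal generator 750 and the selector 760 can be illustrated with a minimal Python sketch. This is not the implementation of the invention. The function names, the string values used for the selection signal, the use of an absolute difference, and the handling of the case where the difference exactly equals the arbitrary value (which the description leaves unspecified) are all assumptions made for illustration.

```python
from typing import Any, Literal

# Hypothetical encoding of the selection signal SEL.
Selection = Literal["LR", "DLR"]


def generate_selection_signal(
    noise_lr: float, noise_dlr: float, threshold: float
) -> Selection:
    """Illustrative stand-in for the selection signal generator 750.

    noise_lr  : noise level of the decoded low-resolution image LR
    noise_dlr : noise level of the noise-removed low-resolution image DLR
    threshold : the "arbitrary value" the difference is compared against
    """
    for level in (noise_lr, noise_dlr):
        if not 0.0 <= level <= 1.0:
            raise ValueError("noise levels are expected to lie between 0 and 1")

    # Assumption: "difference" is taken as the absolute difference.
    difference = abs(noise_lr - noise_dlr)

    # Greater than the threshold: denoising changed the noise level
    # substantially, so the noise-removed image DLR is selected.
    # Assumption: the equal case, unspecified in the text, selects LR.
    return "DLR" if difference > threshold else "LR"


def select_input(sel: Selection, lr_image: Any, dlr_image: Any) -> Any:
    """Illustrative stand-in for the selector 760: forwards one image
    to the second neural network 720 according to SEL."""
    return dlr_image if sel == "DLR" else lr_image
```

For example, with noise levels 0.8 and 0.1 and a threshold of 0.3, the sketch selects DLR; with noise levels 0.2 and 0.15 and the same threshold, it selects LR, so the decoded image is passed to the second neural network 720 unchanged.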

FIG. 7 is a flowchart of a super-resolution image processing method robust against coding noise using a plurality of neural networks according to another embodiment of the present invention.

Referring to FIGS. 1, 6, and 7, the processor 11 receives the compressed low-resolution image LR (S100).

The processor 11 decodes the compressed low-resolution image LR (S110).

The processor 11 uses the first neural network 710 to remove noise from the decoded low-resolution image LR, thereby generating a noise-removed low-resolution image DLR (S120).

According to the noise level of the decoded low-resolution image LR and the noise level of the noise-removed low-resolution image DLR, the processor 11 converts either the decoded low-resolution image LR or the noise-removed low-resolution image DLR into a decoded high-resolution image SR using the second neural network 720 (S130).

The processor 11 encodes the decoded high-resolution image SR (S140).

The processor 11 transmits the encoded high-resolution image ESR to the computing device 20 (S150).
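Steps S100 to S150 can be summarized as a short Python sketch. It is a hedged illustration only: the class name `Pipeline`, its field names, and the idea of passing each stage in as a callable are hypothetical and do not come from the description. The four neural networks, the decoder, the encoder, and the transmission to the computing device 20 are represented by placeholder callables, and the comparison uses an absolute difference with the equal case routed to LR, both of which are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable

Image = Any  # placeholder type; the description does not fix a representation


@dataclass
class Pipeline:
    """Hypothetical container wiring together the stages of FIG. 7."""

    decode: Callable[[bytes], Image]            # decoding (S110)
    denoise: Callable[[Image], Image]           # first neural network 710
    upscale: Callable[[Image], Image]           # second neural network 720
    estimate_noise_lr: Callable[[Image], float]   # third neural network 730
    estimate_noise_dlr: Callable[[Image], float]  # fourth neural network 740
    encode: Callable[[Image], bytes]            # encoding (S140)
    transmit: Callable[[bytes], None]           # send to computing device 20
    threshold: float                            # the "arbitrary value"

    def run(self, compressed_lr: bytes) -> bytes:
        # S100: compressed_lr is the received compressed low-resolution image.
        lr = self.decode(compressed_lr)          # S110
        dlr = self.denoise(lr)                   # S120

        # S130: compare noise levels and pick the input of network 720.
        # Assumption: absolute difference; equality selects LR.
        difference = abs(self.estimate_noise_lr(lr) - self.estimate_noise_dlr(dlr))
        chosen = dlr if difference > self.threshold else lr
        sr = self.upscale(chosen)

        esr = self.encode(sr)                    # S140
        self.transmit(esr)                       # S150
        return esr
```

Keeping every stage as an injected callable lets the control flow be exercised with simple stubs, independently of any particular codec or network architecture.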

Although the present invention has been described with reference to an embodiment shown in the drawings, this is merely exemplary, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible therefrom. Therefore, the true technical protection scope of the present invention should be determined by the technical spirit of the appended claims.

100: super-resolution image processing system using a neural network;
101: network;
10: image processing device;
11: processor;
13: memory;
15: decoding module;
17: neural network module;
19: encoding module;
20: computing device;

Claims

A super-resolution image processing method robust against coding noise using a plurality of neural networks, the method comprising:
receiving, by a processor, a compressed low-resolution image;
decoding, by the processor, the compressed low-resolution image;
generating, by the processor, a noise-removed low-resolution image by removing noise from the decoded low-resolution image using a first neural network;
converting, by the processor, the decoded low-resolution image into a decoded high-resolution image using a second neural network when the difference between the noise level of the decoded low-resolution image and the noise level of the noise-removed low-resolution image is smaller than an arbitrary value, and converting, by the processor, the noise-removed low-resolution image into the decoded high-resolution image using the second neural network when the difference is greater than the arbitrary value;
encoding, by the processor, the decoded high-resolution image; and
transmitting, by the processor, the encoded high-resolution image.

delete

The method of claim 1, wherein the noise level of the decoded low-resolution image is generated using a third neural network, and
the noise level of the noise-removed low-resolution image is generated using a fourth neural network.

The method of claim 1, further comprising:
calculating, by the processor, an image similarity between the noise-removed low-resolution image and a lossless low-resolution training image as a first score;
calculating, by the processor, an image similarity between the decoded high-resolution image and a high-resolution training image as a second score; and
training, by the processor, the first neural network more than the second neural network when the first score is greater than the second score,
wherein the lossless low-resolution training image is an image used for training the first neural network, and the high-resolution training image is an image used for training the second neural network,
the first score is lower the more similar the noise-removed low-resolution image is to the low-resolution training image,
the second score is lower the more similar the decoded high-resolution image is to the high-resolution training image, and
the high-resolution training image and the decoded high-resolution image have the same resolution.

delete

A super-resolution image processing system robust against coding noise using a plurality of neural networks, the system comprising an image processing device,
wherein the image processing device comprises:
a processor that executes instructions; and
a memory that stores the instructions,
wherein the instructions are implemented to:
generate a noise-removed low-resolution image by removing noise from a low-resolution image using a first neural network;
convert the low-resolution image into a high-resolution image using a second neural network when the difference between the noise level of the low-resolution image and the noise level of the noise-removed low-resolution image is smaller than an arbitrary value; and
convert the noise-removed low-resolution image into the high-resolution image using the second neural network when the difference is greater than the arbitrary value.

delete

The system of claim 6, wherein the noise level of the low-resolution image is generated using a third neural network, and
the noise level of the noise-removed low-resolution image is generated using a fourth neural network.

The system of claim 6, wherein the instructions are further implemented to:
calculate an image similarity between the noise-removed low-resolution image and a lossless low-resolution training image as a first score;
calculate an image similarity between the high-resolution image and a high-resolution training image as a second score; and
train the first neural network more than the second neural network when the first score is greater than the second score,
wherein the first score is lower the more similar the noise-removed low-resolution image is to the lossless low-resolution training image,
the second score is lower the more similar the high-resolution image is to the high-resolution training image, and
the high-resolution training image and the decoded high-resolution image have the same resolution.

delete