KR20150107360A

KR20150107360A - Method and apparatus for generating of super resolution image

Info

Publication number: KR20150107360A
Application number: KR1020140030137A
Authority: KR
Inventors: 임성창; 이대열; 정세윤; 조숙희; 김휘용; 김종호; 최진수; 김진웅; 김종옥; 전재희; 박세진
Original assignee: 한국전자통신연구원; 고려대학교 산학협력단
Priority date: 2014-03-14
Filing date: 2014-03-14
Publication date: 2015-09-23
Also published as: KR101695900B1

Abstract

Disclosed is a method for generating an image using a super resolution technique, comprising the following steps: decomposing an input image into a low frequency image and a high frequency image by using a filter; generating an enlarged image of the low frequency component by interpolating the input image; searching the low frequency image, for each patch of the enlarged image, for the similar patch with the highest similarity; copying and mapping the patch in the high frequency image that corresponds to the found similar patch into a high frequency image of a higher layer, which has a higher resolution than the high frequency image; and merging the enlarged image and the high frequency image of the higher layer to generate a high resolution image which has a higher resolution than the input image.

Description

Method and apparatus for generating an image using a super resolution technique

The present invention relates to image processing, and more particularly, to parallel, high-speed processing of images using a super resolution technique.

With the recent emergence of a wide variety of devices and UHDTV (Ultra High Definition TV), the need for high resolution images has grown. The super resolution technique is one of several methods for improving image resolution; it constructs a high resolution image from one or more input images.

In addition, as image resolution has increased with advances in camera technology, the amount of information to be processed has grown sharply. Conventional CPUs (Central Processing Units) have shown limits in processing performance, and interest in GPGPU (General Purpose computing on Graphics Processing Units) has risen sharply as a way to compensate. GPGPU refers to using a GPU (Graphics Processing Unit) to perform the computations of general applications that are normally executed by a CPU. GPUs generally run at lower clock frequencies than CPUs, but they can accelerate the highly parallel portions of an application by using many cores.

The main platforms for GPGPU are CUDA (Compute Unified Device Architecture), developed by NVIDIA, and OpenCL (Open Computing Language), developed and distributed by the Khronos Group. Of these, OpenCL is an open, general-purpose parallel framework that is highly regarded for its versatility.

The present invention provides a method of reducing the processing time of a super resolution algorithm using OpenCL, and presents the results of applying optimization techniques based on the memory model.

The present invention provides a method and apparatus for generating a high resolution image using a super resolution technique.

The present invention provides a method for implementing a parallelized super resolution technique based on OpenCL.

The present invention provides a memory optimization method applicable to a super resolution technique.

According to an embodiment of the present invention, an image generation method using a super resolution technique is provided. The method includes: decomposing an input image into a low frequency image and a high frequency image using a filter; interpolating the input image to generate an enlarged image of the low frequency component; searching the low frequency image, for each patch of the enlarged image, for the similar patch with the highest similarity; copying and mapping the patch in the high frequency image that corresponds to the found similar patch into a high frequency image of an upper layer, which has a higher resolution than the high frequency image; and fusing the enlarged image with the high frequency image of the upper layer to generate a high resolution image with a higher resolution than the input image.

Implementing the super resolution technique with OpenCL enables parallelization and high-speed processing. Applying the memory optimization techniques resolves the synchronization problems and bottlenecks that arise in global memory. In addition, the OpenCL implementation of the super resolution technique showed a speed-up of about 90 times over the CPU implementation.

FIG. 1 is a conceptual diagram illustrating a method for generating a high-resolution image using a super resolution algorithm according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating a system configuration according to an embodiment of the present invention for implementing the method of FIG. 1.
FIG. 3 shows an example of a kernel design for implementing a super resolution algorithm according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating an example of a system configuration for memory optimization that can be applied to a super resolution algorithm according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating a method for solving the race condition problem when using global memory according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating the race condition problem that arises when using texture memory.
FIG. 7 is a diagram illustrating another example of a system configuration for memory optimization that can be applied to a super resolution algorithm according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. In describing the embodiments, detailed descriptions of related known structures or functions may be omitted where they would obscure the subject matter of this specification.

In this specification, when an element is said to be connected or coupled to another element, it may be directly connected or coupled to that element, or other elements may be present in between. In addition, a statement that a specific configuration is included does not exclude other configurations; it means that additional configurations may fall within the practice of the present invention or the scope of its technical idea.

Terms such as first and second may be used to describe various configurations, but the configurations are not limited by these terms. The terms are used only to distinguish one configuration from another. For example, without departing from the scope of the present invention, a first configuration may be called a second configuration, and similarly, a second configuration may be called a first configuration.

In addition, the components shown in the embodiments of the present invention are illustrated independently to represent different characteristic functions; this does not mean that each component consists of separate hardware or a single software unit. That is, the components are listed separately for convenience of explanation. At least two components may be combined into one, or one component may be divided into several components that together perform the function. Both integrated and separated embodiments of the components are included in the scope of the present invention as long as they do not depart from its essence.

In addition, some components may not be essential components that perform the essential functions of the present invention, but optional components that merely improve performance. The present invention can be implemented with only the components essential to realizing its essence, excluding components used only for performance improvement, and a structure that includes only the essential components without the optional performance-improving components is also included in the scope of the present invention.

Compared with conventional interpolation, the super resolution technique performs very well at restoring high resolution images, and is particularly suited to restoring the high-frequency detail of an image. Despite these advantages, the technique requires a large amount of computation because of the complexity of the algorithm. GPGPU (General Purpose computing on Graphics Processing Units) technology performs non-graphics computation on highly parallel GPUs and is widely used to reduce the computation time of high-complexity applications. The present invention uses OpenCL (Open Computing Language), one of the platforms for GPGPU, to parallelize an existing super resolution technique and reduce its computation time, and proposes optimization techniques based on the memory model to resolve bottlenecks.

The OpenCL platform is an open standard parallel computing framework for writing programs that run on processors such as CPUs (Central Processing Units), GPUs (Graphics Processing Units), and DSPs (Digital Signal Processors). In the OpenCL model, the host processor manages a context, through which multiple devices can operate in parallel. A task is divided into several work groups, and each work group consists of multiple work items that execute concurrently.

OpenCL defines a hierarchical memory model. Global memory has the largest capacity but is very slow, with an access time of 400 to 800 cycles, whereas local memory and private memory have much faster access times. All work items can access global memory, and work items in the same work group share local memory.

FIG. 1 is a conceptual diagram illustrating a method for generating a high-resolution image using a super resolution algorithm according to an embodiment of the present invention.

Referring to FIG. 1, an image generation method using a super resolution technique according to an embodiment of the present invention may include an LF (Low Frequency)/HF (High Frequency) decomposition step, a linear interpolation step, a similar patch search step, a patch copy and mapping step, an LF/HF fusion step, and a back-projection step.

In the LF/HF decomposition step, the input image 100 can be decomposed into a low-frequency (LF) image 110 and a high-frequency (HF) image 120 using a Gaussian filter.
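The decomposition can be illustrated with a minimal Python sketch. It assumes a grayscale image stored as a list of rows, a 3x3 binomial approximation of the Gaussian filter, and border pixels replicated at the edges; the function names `gaussian_blur` and `decompose` are hypothetical, and the HF image is taken to be the input minus the LF image, consistent with the convolution and subtraction kernels described later in the text.

```python
# Sketch only: 3x3 binomial weights approximate a Gaussian (assumption).
GAUSS_3X3 = [[1, 2, 1],
             [2, 4, 2],
             [1, 2, 1]]  # weights sum to 16


def gaussian_blur(image):
    """Return the low-frequency (LF) image: a 3x3 Gaussian-weighted average."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp to the border
                    xx = min(max(x + dx, 0), w - 1)
                    acc += GAUSS_3X3[dy + 1][dx + 1] * image[yy][xx]
            out[y][x] = acc / 16.0
    return out


def decompose(image):
    """Split an image into (LF, HF) so that LF + HF reproduces the input."""
    lf = gaussian_blur(image)
    hf = [[image[y][x] - lf[y][x] for x in range(len(image[0]))]
          for y in range(len(image))]
    return lf, hf
```

Because HF is defined as the residual, adding the two images back together recovers the input, which is what the later fusion step relies on.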

In the linear interpolation step, a conventional interpolation technique is applied to the original image (input image) 100 to generate an enlarged image 130, for example one enlarged by a factor of 1.25.

In the similar patch search step, for each patch 135 of the enlarged image, the most similar patch 115 can be searched for in the lower low-frequency image layer. The similarity between patches can be judged by their MSE (Mean Squared Error).
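A minimal Python sketch of MSE-based patch matching follows. The exhaustive scan over every position of the low-frequency image is an assumption made for illustration, since the text does not state the search range, and the names `mse`, `crop`, and `find_best_patch` are hypothetical.

```python
def mse(a, b):
    """Mean squared error between two equally sized patches."""
    n = len(a) * len(a[0])
    return sum((pa - pb) ** 2
               for ra, rb in zip(a, b)
               for pa, pb in zip(ra, rb)) / n


def crop(image, top, left, size):
    """Cut a size x size patch whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def find_best_patch(query, lf_image, size):
    """Scan lf_image for the patch closest to `query`.

    Returns (top, left, error) of the lowest-MSE candidate.
    """
    h, w = len(lf_image), len(lf_image[0])
    best = None
    for top in range(h - size + 1):
        for left in range(w - size + 1):
            err = mse(query, crop(lf_image, top, left, size))
            if best is None or err < best[2]:
                best = (top, left, err)
    return best
```

Every query patch runs this search independently of the others, which is why the step maps naturally onto one work item per patch on a GPU.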

Once the patch 115 with the best similarity is found, the corresponding high-frequency component 125 can be used to estimate the upper-layer high-frequency image 140. In other words, in the patch copy and mapping step, the high-frequency patch 125 that corresponds to the most similar patch 115 found in the lower low-frequency image layer is copied and mapped into the upper-layer high-frequency image 140, thereby generating the upper-layer high-frequency image 140.
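One way to picture the copy and mapping step is the Python sketch below. It assumes that overlapping patches are summed and then divided by a per-pixel count; the text states only that this step uses image addition and image division kernels, so the averaging scheme, the role of the count buffer, and the name `paste_patches` are assumptions.

```python
def paste_patches(height, width, patches, size):
    """Accumulate HF patches into an upper-layer image and average overlaps.

    `patches` is a list of (top, left, patch) tuples; each patch is size x size.
    """
    hf_sum = [[0.0] * width for _ in range(height)]  # running sum of HF values
    count = [[0.0] * width for _ in range(height)]   # patches covering each pixel
    for top, left, patch in patches:
        for dy in range(size):
            for dx in range(size):
                hf_sum[top + dy][left + dx] += patch[dy][dx]  # addition step
                count[top + dy][left + dx] += 1.0
    # Division step: normalize by the count; uncovered pixels stay at zero.
    return [[hf_sum[y][x] / count[y][x] if count[y][x] else 0.0
             for x in range(width)]
            for y in range(height)]
```

For two 2x2 patches that share one column, the shared pixels end up holding the mean of the two contributions while the remaining pixels keep their single value.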

In the LF/HF fusion step, the low-frequency image 130 and the high-frequency image 140 are fused to produce a high-resolution image 150 that is one layer above the input image 100.

In the back-projection step, the high-resolution image 150 can be back-projected onto the input image 100.

The process described above is repeated over several layers to obtain a high-resolution image.
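As a small worked example, the Python sketch below counts how many layers are needed to reach a target magnification when each layer enlarges by a fixed factor. The factor 1.25 is the example value given in the text; the target of 2x and the name `layers_needed` are assumptions for illustration.

```python
def layers_needed(target_scale, step=1.25):
    """Count the upscaling layers of factor `step` needed to reach `target_scale`.

    Returns (number of layers, magnification actually reached).
    """
    layers, scale = 0, 1.0
    while scale < target_scale:
        scale *= step
        layers += 1
    return layers, scale


print(layers_needed(2.0))  # (4, 2.44140625)
```

Three layers give only about 1.95x, so a fourth is required; the result would then typically be scaled down to the exact target size, although the text does not describe that step.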

FIG. 2 is a block diagram illustrating a system configuration according to an embodiment of the present invention for implementing the method of FIG. 1.

Referring to FIG. 2, a system for the super resolution technique according to the present invention may consist of a host 210 that uses a CPU and at least one device 220 that uses a GPU.

The host 210 can use the CPU to perform file input/output, context initialization, workflow control, clean-up, and so on.

When an image is input from the host 210, the device 220 can use the GPU to perform the process of FIG. 1 described above in order to generate a high-resolution image with a higher resolution than the input image.

That is, the device 220 can perform the LF/HF decomposition step (step 1), the linear interpolation step (step 2), the similar patch search step (step 3), the patch copy and mapping step (step 4), the LF/HF fusion step (step 5), and the back-projection step (step 6) on the input image. The high-resolution image generated through steps 1 to 6 can then be transferred to the host 210 and output.

GPGPU technology can be used to parallelize and optimize the super resolution technique according to an embodiment of the present invention. To do so, the number of work items and work groups must be specified with the GPU specification and the image size in mind. In the embodiment of the present invention, [32x32] work items were assigned to one work group.
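The relationship between image size and work-group count can be sketched in Python as follows. The sketch assumes one work item per pixel and rounds the global size up to a multiple of the 32x32 work-group size; the 1920x1080 image and the name `ndrange` are hypothetical.

```python
def ndrange(width, height, local=(32, 32)):
    """Round the global work size up to a multiple of the work-group size.

    Returns ((groups_x, groups_y), (global_x, global_y)).
    """
    groups = tuple((n + l - 1) // l for n, l in zip((width, height), local))
    global_size = tuple(g * l for g, l in zip(groups, local))
    return groups, global_size


print(ndrange(1920, 1080))  # ((60, 34), (1920, 1088))
```

A 32x32 work group contains 1,024 work items, so it only fits on devices whose maximum work-group size is at least that large. When the global size is padded, as with the extra 8 rows here, each kernel needs a bounds check so that padded work items do nothing.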

To implement the super resolution algorithm according to an embodiment of the present invention, a total of seven kernels can be designed as shown below, split according to task and synchronization points. Table 1 shows the name of each kernel, the time taken to execute it, and its share of the total execution time.

Table 1

Kernel name             Time (μs)      Ratio (%)
Image convolution         5,998.678       1.77
Similar patch search    323,730.725      96.05
Image scaler              3,329.460       0.99
Image addition            1,089.007       0.32
Image subtraction           718.804       0.22
Image division            1,299.361       0.39
Image initialization        883.618       0.26

Table 1 shows that similar patch search accounts for by far the largest share of the total execution time. It follows that a technique for optimizing the time taken by the similar patch search kernel is needed.

FIG. 3 shows an example of a kernel design for implementing a super resolution algorithm according to an embodiment of the present invention.

A device that performs the super resolution algorithm according to the present invention can split the work into kernels as shown in FIG. 3, based on task unit, parallelization size, global synchronization, and so on.

For example, in the super resolution algorithm according to the present invention, the LF/HF decomposition step can be executed by the “_kernel Convolute” kernel for convolution and the “_kernel Sub” kernel for image subtraction. The linear interpolation step can be executed by the “_kernel Scaler” kernel for image scaling. The similar patch search step can be executed by the “_kernel SSSR” kernel for super resolution restoration. The patch copy and mapping step can be executed by the “_kernel Add” kernel for image addition and the “_kernel Div” kernel for image division. The LF/HF fusion step can be executed by the “_kernel Add” kernel for image addition. The back-projection step can be executed by the “_kernel Convolute” kernel for convolution, the “_kernel Scaler” kernel for image scaling, and the “_kernel Sub” kernel for image subtraction.

Using kernels for basic arithmetic operators such as image division, addition, and subtraction minimizes memory usage. The image scaler kernel (linear interpolation) can perform both up-scaling and down-scaling. The convolution kernel can perform, for example, 3x3 Gaussian filtering or 11x11 anti-alias filtering. The super resolution restoration kernel performs self-similarity based high-resolution restoration.

As described above, in the kernel design for implementing the super resolution technique according to the present invention, similar patch search takes the most execution time. A memory access optimization technique is therefore needed to minimize the time taken by the similar patch search kernel. To this end, the present invention provides a method of optimizing the super resolution technique by applying the OpenCL memory model.

In the hierarchical memory model of OpenCL, global memory is off-chip memory with the largest capacity but the slowest access time of all the memory types, while local memory and private memory are on-chip memory with faster access but limited size.

There are two ways to allocate memory: as a normal buffer or as an image buffer. When memory is set up as an image buffer, it is generally allocated in the texture memory inside the GPU, which gives faster access than global memory.

FIG. 4 is a diagram illustrating an example of a system configuration for memory optimization that can be applied to a super resolution algorithm according to an embodiment of the present invention.

Referring to FIG. 4, the host 410 may declare the GPU memory space once at the start and then keep it in an isolated state until the super resolution algorithm according to an embodiment of the present invention finishes.

All data other than the input and the output can be stored in the memory space of the GPU.

The device 420 can allocate six memory spaces and reuse them to implement the super resolution technique according to an embodiment of the present invention. For example, as shown in FIG. 4, the memory spaces I0, L0, H0, L1, H1, and W1 that are needed to perform the steps of the super resolution technique (steps 1 to 6) can be allocated.

The memory (I0, L0, H0, L1, H1, W1) allocated on the device 420, that is, on the GPU, may be either global memory or texture memory.

When the memory allocated on the GPU is global memory, every work item can access it, so a global memory synchronization problem, that is, a race condition, can occur.

FIG. 5 is a diagram illustrating a method for solving the race condition problem when using global memory according to an embodiment of the present invention.

A race condition can occur in memory that two or more work items can access at the same time, such as global memory.

For example, during similar patch search in the super resolution technique according to the present invention, a race condition can occur when nine work items refer to the address of a single pixel, as shown in FIG. 5 (a).

A race condition is a situation in which the result becomes unpredictable because several work items access one pixel of an image in global memory at the same time. There are three main ways to solve this problem.

The first is to resynchronize through the kernel. The drawback is that the data to be synchronized must be stored separately and an additional kernel must be executed.

The second is to use local memory. Local memory offers fast access, but it is a limited resource and the boundary regions must additionally be synchronized.

The third is serialization through atomic instructions. Its advantage is that it can be implemented simply by using the data value exchange operations that are provided as built-in functions.

Because the third method has less overhead than the second, the present invention can use serialization through atomic instructions (atomic functions) when global memory in the GPU is allocated.

As shown in FIG. 5 (b), when nine work items access global memory at the same time, the embodiment of the present invention serializes them by having the nine work items exchange data values through atomic instructions, which resolves the race condition caused by simultaneous access.
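The idea of serializing updates through an atomic exchange can be modeled on the CPU with the Python sketch below. It is a toy model, not OpenCL code: threads stand in for work items, a lock stands in for the atomicity that the hardware provides, and the names `AtomicCell`, `atomic_add`, and `run_work_items` are hypothetical. Nine workers update one shared cell through a compare-and-exchange retry loop.

```python
import threading


class AtomicCell:
    """Toy stand-in for one pixel in global memory."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()  # models the hardware atomicity guarantee

    def load(self):
        return self._value

    def compare_exchange(self, expected, new):
        """Store `new` only if the cell still holds `expected`; return the value seen."""
        with self._lock:
            seen = self._value
            if seen == expected:
                self._value = new
            return seen


def atomic_add(cell, delta):
    """Retry until the read-modify-write was not interrupted by another worker."""
    while True:
        old = cell.load()
        if cell.compare_exchange(old, old + delta) == old:
            return


def run_work_items(count=9, delta=1, repeats=1000):
    """Let `count` workers each add `delta` to one shared cell `repeats` times."""
    cell = AtomicCell(0)

    def worker():
        for _ in range(repeats):
            atomic_add(cell, delta)

    threads = [threading.Thread(target=worker) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cell.load()
```

Whatever the interleaving, no update is lost, so nine workers adding 1 a thousand times each always leave 9000 in the cell. A plain unsynchronized read-modify-write offers no such guarantee.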

As described above, when memory in the GPU is set up as an image buffer, it is generally allocated in texture memory, which is faster to access than global memory. However, an image buffer can be opened only for reading or only for writing, so the race condition described above may still occur.

FIG. 6 is a diagram illustrating the race condition problem that arises when using texture memory.

Because texture memory is generally set up as an image buffer, multiple work items can access it either for reading or for writing. However, as shown in FIG. 6, a race condition can occur when multiple work items request reads and writes on the texture memory (a) at the same time.

In this case, the present invention combines global memory with texture memory and reuses the race condition solution described above, which resolves the race condition caused by simultaneous read and write requests on texture memory.

For example, in the super resolution technique according to the present invention, the input of the kernel that performs similar patch search can be set up as an image buffer and its output as a normal buffer. To use the intermediate result in the layer immediately above in the image pyramid, the race condition on the image buffer can be resolved by serializing through a built-in function that performs an internal transfer from the normal buffer to the image buffer. For example, the “clBuffertoImage” function supported by the OpenCL API (Application Programming Interface) can be used for this transfer (the corresponding call in the standard OpenCL API is clEnqueueCopyBufferToImage).

FIG. 7 is a diagram illustrating another example of a system configuration for memory optimization that can be applied to a super resolution algorithm according to an embodiment of the present invention.

Referring to FIG. 7, the host 710 declares the GPU memory space once at the start and can then keep it in an isolated state until the super resolution algorithm according to the embodiment of the present invention terminates.

The device 720 can allocate eight memory spaces for implementing the super resolution technique according to the embodiment of the present invention and reuse them. For example, as shown in FIG. 7, it can allocate the texture memory spaces I0, L0, H0, L1, H1, and W1 and the global memory spaces W1_G and H1_G that are needed as each step (step 1 to step 6) of the super resolution technique described above is performed. In other words, GPU memory can be optimized by using texture memory space and global memory space together.
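The allocate-once, reuse-throughout scheme can be modelled as below. This is a hedged host-side sketch in plain Python, not the patented implementation: the eight space names are the ones listed above, while the class `DeviceMemoryPool`, the uniform byte size, the layer count, and the use of `bytearray` as a stand-in for GPU memory are assumptions made for illustration.

```python
from typing import Dict

TEXTURE_SPACES = ("I0", "L0", "H0", "L1", "H1", "W1")   # image buffers
GLOBAL_SPACES = ("W1_G", "H1_G")                        # general buffers


class DeviceMemoryPool:
    """Allocates the eight named spaces once and hands them out for reuse."""

    def __init__(self, size_bytes: int) -> None:
        # size_bytes is a hypothetical uniform size; real sizes differ per space.
        self.allocations = 0
        self._spaces: Dict[str, bytearray] = {}
        for name in TEXTURE_SPACES + GLOBAL_SPACES:
            self._spaces[name] = bytearray(size_bytes)
            self.allocations += 1

    def get(self, name: str) -> bytearray:
        """Return the existing space; this never allocates."""
        return self._spaces[name]


pool = DeviceMemoryPool(size_bytes=16)
for layer in range(3):                  # assumed number of pyramid layers
    h1_g = pool.get("H1_G")             # kernel output (general buffer)
    h1_g[0] = layer + 1                 # pretend a kernel wrote a result
    pool.get("H1")[:] = h1_g            # in-place copy into the image space
```

The allocation counter stays at eight no matter how many layers are processed, which is the property the reuse scheme relies on.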

Meanwhile, memory transfer between the PC and the GPU is slower than memory transfer inside the GPU. Therefore, for memory optimization applicable to the super resolution algorithm according to the embodiment of the present invention, memory transfer can be minimized by reusing the memory declared on the GPU. In addition, frequently used kernel coefficients and function arguments can be stored in constant memory, which is fast to access, to improve access speed.

The super resolution technique according to the present invention described above was implemented and tested using an AMD-A8 3870 CPU @ 3.00 GHz, 8 GB of RAM, and a GeForce 780 GPU. The implementation used OpenCL version 1.1.

Table 2 shows the results of converting a 1920x1080 FHD (Full High Definition) image into a 3840x2160 UHD (Ultra High Definition) image using the proposed super resolution technique according to the present invention. Table 2 lists the average execution time of each layer and the total execution time.

| | Run time (s) | Layer 1 (s) | Layer 2 (s) | Layer 3 (s) |
| CPU | 25.209 | 5.055 | 7.893 | 12.261 |
| GPU (global) | 0.388 | 0.074 | 0.118 | 0.182 |
| GPU (image) | 0.273 | 0.040 | 0.065 | 0.128 |

As shown in Table 2, the OpenCL implementation ran about 64 times faster than the CPU implementation, and using the image buffer raised the speedup to about 92 times.
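The speedup figures follow directly from the total run times in Table 2. The short calculation below reproduces them; the variable and function names are chosen here for illustration, and only the three timing values come from the table.

```python
# Total run times in seconds, taken from Table 2.
TOTAL_S = {"CPU": 25.209, "GPU (global)": 0.388, "GPU (image)": 0.273}


def speedup(baseline_s: float, accelerated_s: float) -> float:
    """Return how many times faster the accelerated run is than the baseline."""
    return baseline_s / accelerated_s


global_x = speedup(TOTAL_S["CPU"], TOTAL_S["GPU (global)"])
image_x = speedup(TOTAL_S["CPU"], TOTAL_S["GPU (image)"])
image_vs_global = speedup(TOTAL_S["GPU (global)"], TOTAL_S["GPU (image)"])

print(f"global buffer vs CPU: {global_x:.2f}x")
print(f"image buffer vs CPU:  {image_x:.2f}x")
print(f"image vs global:      {image_vs_global:.2f}x")
```

Truncating the first two ratios to whole numbers gives the 64x and 92x figures quoted above. The third ratio isolates the gain attributable to switching from the global buffer to the image buffer.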

In the embodiments described above, the methods are described on the basis of flowcharts as a series of steps or blocks, but the present invention is not limited to the order of the steps, and some steps may occur in a different order from, or simultaneously with, the other steps described above. Those skilled in the art will also understand that the steps shown in the flowcharts are not exclusive, that other steps may be included, and that one or more steps of a flowchart may be deleted without affecting the scope of the present invention.

The foregoing description merely illustrates the technical idea of the present invention, and those skilled in the art to which the present invention pertains can make various modifications and variations without departing from its essential characteristics. Accordingly, the embodiments disclosed herein are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present invention should be construed according to the claims, and all technical ideas within the equivalent scope should be construed as falling within the scope of rights of the present invention.

Claims

Decomposing an input image into a low-frequency image and a high-frequency image using a filter;
Interpolating the input image to generate an enlarged image of a low frequency component;
Searching the low-frequency image, using the enlarged image in patch units, for a similar patch having the highest degree of similarity;
Copying and mapping the patch in the high-frequency image corresponding to the retrieved similar patch to a higher-layer high-frequency image having a higher resolution than the high-frequency image; and
Generating a high-resolution image having a higher resolution than the input image by merging the enlarged image and the higher-layer high-frequency image.