KR20210128605A

KR20210128605A - Method and Apparatus for Image Transform

Info

Publication number: KR20210128605A
Application number: KR1020200046435A
Authority: KR
Inventors: 전진; 나태영; 김재일; 배주한
Original assignee: SK Telecom Co., Ltd. (에스케이텔레콤 주식회사)
Priority date: 2020-04-17
Filing date: 2020-04-17
Publication date: 2021-10-27

Abstract

Provided are an image transformation method and apparatus for generating a high-resolution image from a low-resolution image. This embodiment uses a deep neural network-based inference model pre-trained on pairs of low-resolution and high-resolution training images whose characteristics, such as resolution, aspect ratio, and alignment, have been matched. The image transformation apparatus comprises an input unit that acquires a low-resolution image, and an image conversion unit that generates a high-resolution image from it.

Description

Image transformation apparatus and method {Method and Apparatus for Image Transform}

The present invention relates to an image conversion apparatus and method for generating a high-resolution image from a low-resolution image. More specifically, it relates to an image conversion apparatus and method in which a deep neural network-based inference model, trained in advance on pairs of low-resolution and high-resolution training images, generates a high-resolution image from a low-resolution image.

The content described below merely provides background information related to the present invention and does not constitute prior art.

In recent years, super-resolution (SR) techniques based on deep learning and deep neural networks have been actively developed, and SR performance has improved substantially over earlier techniques. However, existing deep neural network-based SR research mostly trains networks on distortion-free images, whose quality differs from that of actual broadcast images. Consequently, when SR models from existing research are applied to broadcast content, it is difficult to obtain images of the target quality.

A representative super-resolution deep neural network is the Enhanced Deep Super Resolution network (EDSR) (see Non-Patent Document 1), which improves on ResNet (see Non-Patent Document 2), a network widely used for image transformation. To implement super-resolution, EDSR includes 32 residual blocks (RBs), convolution layers, and an up-sampler.

Compared with ResNet's residual block (RB), EDSR's RB removes the batch normalization (BN) layers and has a single rectified linear unit (ReLU) layer. When EDSR was published with this structural change, it achieved state-of-the-art (SOTA) performance in super-resolution. However, whereas SR networks typically use 64 feature map channels, EDSR uses 256 channels and therefore contains about 43 × 10⁶ parameters, which makes it slow to compute.
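To make the parameter argument concrete, the short calculation below counts the weights and biases of a single convolution layer, assuming 3×3 kernels (the kernel size is not stated here). It shows that widening from 64 to 256 channels multiplies each layer's weights by (256/64)² = 16, and that the 32-block residual body alone accounts for roughly 37.8 × 10⁶ parameters; the remainder of EDSR's total presumably sits in the head, tail, and up-sampler, which this sketch does not count.

```python
def conv_params(in_ch: int, out_ch: int, k: int = 3) -> int:
    """Number of weights plus biases in one k x k 2-D convolution layer."""
    return k * k * in_ch * out_ch + out_ch


wide = conv_params(256, 256)    # one 3x3 conv at EDSR's 256-channel width
narrow = conv_params(64, 64)    # one 3x3 conv at the usual 64-channel width
weight_ratio = (256 * 256) // (64 * 64)  # weights grow with the channel count squared

# Residual body only: 32 residual blocks x 2 convolutions each.
# The head convolution, tail convolution and up-sampler are not counted.
body = 32 * 2 * wide

print(wide, narrow, weight_ratio)  # 590080 36928 16
print(body)                        # 37765120
```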

Another super-resolution deep neural network is the Residual Channel Attention Network (RCAN) model, which substantially increases network depth for super-resolution (see Non-Patent Document 3). Because deep networks were difficult to train with previously studied SR network structures, the RCAN model includes structures that make a deep network trainable.

The RCAN model includes a residual in residual (RIR) structure that uses a long skip connection; residual groups (RGs) inside the RIR, each using a short skip connection; and residual channel attention blocks (RCABs) inside each RG, which use channel attention (CA) to rescale channel-wise features based on the interdependencies among channels. RCAN uses roughly 800 convolution layers to implement this RIR, RG, RCAB, and CA structure. In comparative experiments against existing methods, RCAN showed improved SR results. However, although its parameter count is reduced to 16 × 10⁶ compared with EDSR, RCAN is also slow to compute because it uses about 800 convolution layers.
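The channel attention (CA) step can be sketched in plain Python as below. This is a minimal illustration in the squeeze-and-excitation style described for RCAN: global average pooling per channel, a ReLU bottleneck, a sigmoid gate, then per-channel rescaling. Feature maps are modeled as lists of flattened channels, biases are omitted, and the function and parameter names are hypothetical.

```python
import math


def channel_attention(x, w_down, w_up):
    """Rescale each channel of x by a gate in (0, 1) computed from all channels.

    x:      list of C channels, each a flat list of pixel values
    w_down: C//r rows of C weights (squeeze to a bottleneck of size C//r)
    w_up:   C rows of C//r weights (expand back to one gate per channel)
    Biases are omitted for brevity (a simplification of this sketch).
    """
    # 1) Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in x]
    # 2) Bottleneck with ReLU models inter-channel dependencies.
    h = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w_down]
    # 3) Expand and squash to per-channel gates with a sigmoid.
    s = [1.0 / (1.0 + math.exp(-sum(w * hi for w, hi in zip(row, h)))) for row in w_up]
    # 4) Rescale every pixel of channel c by its gate s[c].
    return [[s_c * v for v in ch] for s_c, ch in zip(s, x)]
```

With all-zero weights every gate equals sigmoid(0) = 0.5, so each channel is simply halved; trained weights would instead amplify or suppress channels depending on the pooled statistics of all channels.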

Accordingly, there is a need for an image conversion apparatus that implements SR capable of obtaining target-quality images from heavily distorted low-resolution broadcast images, while having a simple structure that improves computation speed.

Non-Patent Document 1: B. Lim, S. Son, H. Kim, S. Nah, K. Lee. Enhanced Deep Residual Networks for Single Image Super Resolution. In CVPR Workshops, 2017.
Non-Patent Document 2: K. He, X. Zhang, S. Ren, J. Sun. Deep Residual Learning for Image Recognition. In CVPR, 2016.
Non-Patent Document 3: Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, Y. Fu. Image Super-Resolution Using Very Deep Residual Channel Attention Networks. In ECCV, 2018.

The main object of the present disclosure is to provide an image conversion apparatus and method that generate a high-resolution image from a low-resolution image using a deep neural network-based inference model trained in advance on pairs of low-resolution and high-resolution training images whose characteristics, such as resolution, aspect ratio, and alignment, have been matched.

According to an embodiment of the present invention, there is provided an image conversion apparatus comprising: an input unit that acquires a low-resolution image from video content; and an image conversion unit that generates a high-resolution image of the video content from the low-resolution image using a deep neural network-based inference model, wherein the inference model includes a first convolution layer, a plurality of long residual blocks (LRBs), a second convolution layer, a first skip connection, and a third convolution layer, and generates the high-resolution image by processing the low-resolution image through the first convolution layer, the plurality of LRBs, the second convolution layer, and the third convolution layer, in that order.

According to another embodiment of the present invention, there is provided an image conversion method performed by an image conversion apparatus, the method comprising: acquiring a low-resolution image from video content; and generating a high-resolution image of the video content from the low-resolution image using an inference model that is implemented as a deep neural network; includes a first convolution layer, a plurality of LRBs, a second convolution layer, a first skip connection, and a third convolution layer; is trained in advance on pairs of low-resolution and high-resolution training images based on the same video content; and processes the low-resolution image through the first convolution layer, the plurality of LRBs, the second convolution layer, and the third convolution layer, in that order.

According to another embodiment of the present invention, there is provided a computer program stored in a computer-readable recording medium to execute each step of the image conversion method.

As described above, the present embodiment provides an image conversion apparatus and method that generate a high-resolution image from a low-resolution image using a deep neural network-based inference model pre-trained on pairs of low-resolution and high-resolution training images. This makes it possible to implement SR that obtains high-quality, high-resolution images from heavily distorted low-resolution broadcast images.

In addition, because the present embodiment uses an inference model pre-trained on pairs of low-resolution and high-resolution training images whose characteristics, such as resolution, aspect ratio, and alignment, have been matched, it becomes possible to reduce the complexity of the inference model and improve training efficiency.

FIG. 1 is a block diagram of an image conversion apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram of an inference model according to an embodiment of the present invention.
FIG. 3 is a block diagram of an LRB included in an inference model according to an embodiment of the present invention.
FIG. 4 is a block diagram of an SRB included in an inference model according to an embodiment of the present invention.
FIG. 5 is a flowchart of an image conversion method according to an embodiment of the present invention.
FIG. 6 is a block diagram of a learning model of an image conversion apparatus according to an embodiment of the present invention.
FIG. 7 is a flowchart of an image pair generation process according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the exemplary drawings. When reference numerals are added to the components in each drawing, identical components are given identical reference numerals wherever possible, even when they appear in different drawings. In addition, in describing the present embodiments, detailed descriptions of related well-known configurations or functions are omitted when they are judged likely to obscure the gist of the embodiments.

In addition, terms such as first, second, A, B, (a), and (b) may be used to describe the components of the present embodiments. These terms only distinguish one component from another and do not limit the nature, sequence, or order of the components. Throughout the specification, when a part "includes" or "comprises" a component, this means that it may further include other components rather than excluding them, unless stated otherwise. In addition, terms such as "…unit" and "module" in the specification refer to a unit that processes at least one function or operation, which may be implemented in hardware, software, or a combination of hardware and software.

The detailed description set forth below in conjunction with the accompanying drawings is intended to describe exemplary embodiments of the present invention and is not intended to represent the only embodiments in which the present invention may be practiced.

This embodiment discloses an image conversion apparatus and method for generating a high-resolution image from a low-resolution image. More specifically, it provides an image conversion apparatus and method that generate a high-resolution image from a low-resolution image using a deep neural network-based inference model pre-trained on pairs of low-resolution and high-resolution training images whose characteristics, such as resolution, aspect ratio, and alignment, have been matched.

FIG. 1 is a block diagram of an image conversion apparatus according to an embodiment of the present invention.

In an embodiment of the present invention, the image conversion apparatus 100 generates a high-resolution image from a low-resolution image using a pre-trained inference model. The image conversion apparatus 100 includes all or some of an input unit 102, an image conversion unit 104, and an output unit 106. The image conversion unit 104 includes an inference model 110. The components of the image conversion apparatus 100 according to the present embodiment are not necessarily limited to these. For example, the image conversion apparatus 100 may additionally include a training unit (not shown) for training the inference model, or may be implemented to interwork with an external training unit.

Here, image transformation refers to generating a high-quality, sharp image by processing a degraded image with an appropriate filter. Accordingly, the image conversion apparatus 100 according to the present embodiment performs image transformation that infers a high-quality image from a low-quality image by using the inference model as the filter.

The configuration shown in FIG. 1 is exemplary. Depending on the form of the input and output, the form of the inference model, and the method used to train it, various implementations are possible, including other components or other connections between components.

The input unit 102 according to the present embodiment acquires a low-resolution image. The low-resolution image may be a standard definition (SD) image obtained from broadcast video content stored in a storage device (not shown), but is not necessarily limited thereto; it may be any type of image whose resolution and quality need improvement. A low-resolution image obtained from broadcast video content may contain distortion.

The image conversion unit 104 according to the present embodiment uses the deep neural network-based inference model 110 to generate, from the low-resolution image, a high-resolution image of the same video content.

The high-resolution image may be a full high-definition (FHD) image, but is not necessarily limited thereto; it may be any type of image with higher resolution and quality than the input image.

The output unit 106 according to the present embodiment outputs the high-resolution image. The output unit 106 may present the high-resolution image to a user in visual and/or audible form, or store it in a storage device.

The structure and operation of the inference model 110 are described below with reference to FIGS. 2 to 4.

FIG. 2 is a block diagram of an inference model according to an embodiment of the present invention.

The inference model 110 according to the present embodiment automatically generates a high-resolution image from a low-resolution image. The inference model 110 includes all or some of a plurality of long residual blocks (LRBs) 202, a plurality of convolution layers 206, 208, and 210, and a first skip connection 204.

The inference model 110 according to the present embodiment is implemented as a deep neural network built from multiple convolution layers. The inference model 110 may be trained in advance using a deep learning-based learning model. The structure of the learning model and its training process are described later.

As shown in FIG. 2, the inference model 110 may generate the high-resolution image by processing the low-resolution image through the first convolution layer 206, the plurality of LRBs, the second convolution layer 208, and the third convolution layer 210, in that order.

The first convolution layer 206, located at the input side of the inference model 110, extracts a plurality of feature maps from the low-resolution image. The number of feature map channels extracted by the first convolution layer 206 may be set as a trade-off between the computation speed of the inference model 110 and the quality of the high-resolution image. In FIGS. 2 to 4, c (a natural number) denotes the number of feature map channels; for example, c may be 64.

As shown in FIG. 2, the inference model 110 according to the present embodiment includes eight LRBs, but is not necessarily limited thereto; the number of LRBs may be set as a trade-off between the computation speed of the inference model 110 and the quality of the high-resolution image. All LRBs in the inference model 110 have the same structure and perform the same function, so the structure and operation of a single LRB 202 are described below.

FIG. 3 is a block diagram of an LRB included in an inference model according to an embodiment of the present invention.

The LRB 202 according to the present embodiment includes all or some of two short residual blocks (SRBs) 302, a second skip connection 304, and a fourth convolution layer 306. The LRB 202 processes its input through the two SRBs and then the fourth convolution layer 306, in that order.

Both SRBs in the LRB 202 have the same structure and perform the same function, so the structure and operation of a single SRB 302 are described below.

FIG. 4 is a block diagram of an SRB included in an inference model according to an embodiment of the present invention.

The SRB 302 according to the present embodiment includes all or some of a third skip connection 402, a first convolution group 404, a concatenation layer 406, and a second convolution group 408. Each of the first convolution group 404 and the second convolution group 408 includes two convolution layers and one rectified linear unit (ReLU). Here, ReLU is an activation function that limits the range of its output by setting negative inputs to zero.

The SRB 302 processes its input through the first convolution group 404, the concatenation layer 406, and the second convolution group 408, in that order.

The first convolution group 404 generates a residual output from the input of the SRB 302. The concatenation layer 406 concatenates the input of the SRB 302 with the residual output of the first convolution group 404 and passes the result to the second convolution group 408. The second convolution group 408 generates a residual output from the output of the concatenation layer 406. The third skip connection 402 adds the input of the SRB 302 to the residual output of the second convolution group 408.

The fourth convolution layer 306 in the LRB 202 generates a residual output from the output of the second SRB (the sum of that SRB's input and the residual output of its second convolution group). The second skip connection 304 then adds the input of the LRB 202 to the residual output of the fourth convolution layer 306.
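The SRB and LRB data flow described above can be sketched in plain Python. To stay short and dependency-free, this sketch makes several assumptions that the text does not state: feature maps are lists of flattened channels, every convolution is 1×1 (the actual kernel sizes are not given), the ReLU sits between the two convolutions of each group, and the first convolution of the second group maps the 2c concatenated channels back to c. All names are hypothetical.

```python
from typing import List, Tuple

Feature = List[List[float]]                   # C channels, each a flat list of pixels
Conv = Tuple[List[List[float]], List[float]]  # (weights[out][in], biases[out])


def conv1x1(x: Feature, conv: Conv) -> Feature:
    """1x1 convolution: each output channel is a weighted sum of input channels."""
    w, b = conv
    n = len(x[0])
    return [[sum(w_oi * x[i][j] for i, w_oi in enumerate(w_o)) + b_o for j in range(n)]
            for w_o, b_o in zip(w, b)]


def relu(x: Feature) -> Feature:
    return [[max(0.0, v) for v in ch] for ch in x]


def add(x: Feature, y: Feature) -> Feature:
    """Element-wise sum used by every skip connection."""
    return [[a + c for a, c in zip(cx, cy)] for cx, cy in zip(x, y)]


def conv_group(x: Feature, convs: Tuple[Conv, Conv]) -> Feature:
    """Two convolution layers and one ReLU (ReLU placement is an assumption)."""
    return conv1x1(relu(conv1x1(x, convs[0])), convs[1])


def srb(x: Feature, p: dict) -> Feature:
    r1 = conv_group(x, p["g1"])      # first convolution group 404 -> residual
    cat = x + r1                     # concatenation layer 406: list concat, C -> 2C channels
    r2 = conv_group(cat, p["g2"])    # second convolution group 408: 2C -> C
    return add(x, r2)                # third skip connection 402


def lrb(x: Feature, p: dict) -> Feature:
    y = srb(srb(x, p["srb1"]), p["srb2"])  # two SRBs 302 in sequence
    r = conv1x1(y, p["conv4"])             # fourth convolution layer 306 -> residual
    return add(x, r)                       # second skip connection 304


def zeros(out_ch: int, in_ch: int) -> Conv:
    """Hypothetical helper: an all-zero 1x1 convolution."""
    return [[0.0] * in_ch for _ in range(out_ch)], [0.0] * out_ch


# Usage: with all-zero weights every residual branch is zero.
c = 2
srb_p = {"g1": (zeros(c, c), zeros(c, c)), "g2": (zeros(c, 2 * c), zeros(c, c))}
lrb_p = {"srb1": srb_p, "srb2": srb_p, "conv4": zeros(c, c)}
x = [[1.0, -2.0, 3.0], [0.5, 0.0, -1.0]]
print(lrb(x, lrb_p) == x)  # True
```

With all-zero weights each skip connection passes its input through unchanged, so `lrb` returns `x` exactly; trained weights would instead add learned corrections on top of that identity path.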

The second convolution layer 208 of the inference model 110 generates a residual output from the output of the last LRB (the sum of that LRB's input and the residual output of its fourth convolution layer). The first skip connection 204 adds the output of the first convolution layer 206 to the residual output of the second convolution layer 208, thereby carrying the features of the low-resolution image to the output side of the inference model 110.

The inference model 110 may generate the high-resolution image by applying the third convolution layer 210 to the sum of the output of the first convolution layer 206 (the features of the low-resolution image) and the residual output of the second convolution layer 208.
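The top-level wiring of FIG. 2 can be sketched the same way. The text does not say how the third convolution layer 210 enlarges the image (for example, whether the input is resized beforehand or an up-sampler follows), so this sketch keeps the spatial size unchanged, again uses hypothetical 1×1 convolutions on flattened channels, and treats the LRBs as opaque callables. All names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Feature = List[List[float]]                   # C channels, each a flat list of pixels
Conv = Tuple[List[List[float]], List[float]]  # (weights[out][in], biases[out])


def conv1x1(x: Feature, conv: Conv) -> Feature:
    """1x1 convolution: each output channel is a weighted sum of input channels."""
    w, b = conv
    n = len(x[0])
    return [[sum(w_oi * x[i][j] for i, w_oi in enumerate(w_o)) + b_o for j in range(n)]
            for w_o, b_o in zip(w, b)]


def add(x: Feature, y: Feature) -> Feature:
    return [[a + c for a, c in zip(cx, cy)] for cx, cy in zip(x, y)]


def inference_model(image: Feature, conv1: Conv,
                    lrbs: Sequence[Callable[[Feature], Feature]],
                    conv2: Conv, conv3: Conv) -> Feature:
    f = conv1x1(image, conv1)     # first convolution layer 206: c feature maps
    y = f
    for block in lrbs:            # LRBs 202 applied in order (eight in FIG. 2)
        y = block(y)
    r = conv1x1(y, conv2)         # second convolution layer 208 -> residual
    y = add(f, r)                 # first skip connection 204
    return conv1x1(y, conv3)      # third convolution layer 210 -> output image


def identity(t: Feature) -> Feature:
    """Stand-in for a trained LRB."""
    return t


out = inference_model(
    [[1.0, 2.0]],                               # one-channel "image" with two pixels
    conv1=([[1.0], [2.0]], [0.0, 0.0]),         # 1 -> 2 channels
    lrbs=[identity] * 8,
    conv2=([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]),
    conv3=([[1.0, 1.0]], [0.0]),                # 2 -> 1 channel
)
print(out)  # [[3.0, 6.0]]
```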

As shown in FIGS. 2 to 4, the number of layers grows and the network deepens as the structure expands from the SRB 302 to the LRB 202 to the inference model 110. Nevertheless, because each of these levels includes its own skip connection (204, 304, and 402), the learning model described later can be trained effectively.

A residual block (RB) of ResNet (see Non-Patent Document 2) includes two convolution layers, two batch normalization (BN) layers, and two ReLU layers. Compared with ResNet, the RB of EDSR (see Non-Patent Document 1) removes all BN layers and includes only one ReLU layer. In contrast, the SRB 302 according to the present embodiment includes four convolution layers and a concatenation layer.
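To give a rough sense of scale, the calculation below estimates the parameter and convolution-layer counts of the model in FIGS. 2 to 4. The estimate rests on assumptions that the text does not state: 3×3 kernels throughout, three-channel (RGB) input and output, c = 64, eight LRBs of two SRBs each, the first convolution of the second group mapping 2c channels back to c, and no separate up-sampler. Under these assumptions the model has about 3.3 × 10⁶ parameters and 75 convolution layers, compared with the 43 × 10⁶ parameters of EDSR and roughly 800 convolution layers of RCAN cited earlier.

```python
def conv_params(in_ch: int, out_ch: int, k: int = 3) -> int:
    """Weights plus biases of one k x k convolution layer."""
    return k * k * in_ch * out_ch + out_ch


def srb_params(c: int, k: int = 3) -> int:
    group1 = 2 * conv_params(c, c, k)
    # Assumption: group 2's first conv reduces the 2c concatenated channels to c.
    group2 = conv_params(2 * c, c, k) + conv_params(c, c, k)
    return group1 + group2


def lrb_params(c: int, k: int = 3, n_srb: int = 2) -> int:
    return n_srb * srb_params(c, k) + conv_params(c, c, k)  # SRBs + fourth conv layer


def model_params(c: int = 64, n_lrb: int = 8, img_ch: int = 3, k: int = 3) -> int:
    return (conv_params(img_ch, c, k)     # first convolution layer 206
            + n_lrb * lrb_params(c, k)    # LRBs 202
            + conv_params(c, c, k)        # second convolution layer 208
            + conv_params(c, img_ch, k))  # third convolution layer 210 (no up-sampler)


def conv_layer_count(n_lrb: int = 8, n_srb: int = 2) -> int:
    # 3 top-level convs + per LRB: 4 convs per SRB plus the fourth conv layer
    return 3 + n_lrb * (n_srb * 4 + 1)


print(srb_params(64), lrb_params(64))        # 184576 406080
print(model_params(), conv_layer_count())    # 3289091 75
```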

The BN layers used in ResNet are effective for image classification but are known to be of little benefit for super-resolution, which transforms images pixel by pixel, so they are not used in the SRB 302 according to the present embodiment. In addition, the SRB 302 uses the concatenation layer 406 to place more emphasis on transforming the high-frequency regions of the image than existing methods do.

FIG. 5 is a flowchart of an image conversion method according to an embodiment of the present invention.

The image conversion apparatus 100 according to the present embodiment acquires a low-resolution image (S500). The low-resolution image may be an SD image obtained from broadcast video content, but is not necessarily limited thereto; it may be any type of image whose resolution and quality need improvement.

The image conversion apparatus 100 generates a high-resolution image from the low-resolution image using the pre-trained inference model (S502).

The inference model 110 is implemented as a deep neural network and includes all or some of a plurality of LRBs, a plurality of convolution layers, and skip connections. The inference model 110 may be trained in advance on pairs of low-resolution and high-resolution training images.

The high-resolution image generated by the image conversion apparatus 100 may be an FHD image, but is not necessarily limited thereto; it may be any type of image with higher resolution and quality than the input image.

The image conversion apparatus 100 outputs the high-resolution image (S504). The high-resolution image may be presented to a user in visual and/or audible form, or stored in a storage device.
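Steps S500 to S504 can be sketched as a simple loop. Processing a stream frame by frame is an assumption made for illustration (the text describes a single image), and `convert_frames`, `model`, and `sink` are hypothetical names; `model` stands in for the pre-trained inference model 110 and `sink` for whatever displays or stores the result.

```python
from typing import Callable, Iterable, TypeVar

Frame = TypeVar("Frame")


def convert_frames(frames: Iterable[Frame],
                   model: Callable[[Frame], Frame],
                   sink: Callable[[Frame], None]) -> int:
    """Run S500-S504 for each frame and return the number of frames converted."""
    count = 0
    for low_res in frames:         # S500: acquire a low-resolution image
        high_res = model(low_res)  # S502: pre-trained inference model
        sink(high_res)             # S504: output (display or store)
        count += 1
    return count


outputs = []
n = convert_frames([1, 2, 3], model=lambda f: f * 2, sink=outputs.append)
print(n, outputs)  # 3 [2, 4, 6]
```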

As described above, the image conversion apparatus 100 according to the present embodiment may include a deep learning-based learning model and use it to train the inference model. This learning model may be a model trained in advance, on images of the same subject at different resolutions, so that it can generate a high-resolution image from a low-resolution image.

FIG. 6 is a block diagram of the learning model of the image conversion apparatus according to an embodiment of the present invention.

The learning model according to the present embodiment generates a training image pair from an SD image and an FHD image obtained from the same video content, and trains the inference model 110 using the generated image pair. The learning model includes all or some of an input unit 602, an image pair generator 604, and the inference model 110. As described above, the learning model may further include a training unit (not shown) for training the inference model 110, or may be implemented so as to interwork with an external training unit.

The input unit 602 acquires an SD image as the low-resolution image and an FHD image as the high-resolution image. Here, the SD image and the FHD image are obtained from the same broadcast video content. The SD image has a resolution of 720×480 and a 4:3 (width:height, likewise below) aspect ratio, whereas the FHD image has a resolution of 1920×1080 and a 16:9 aspect ratio, so the two images do not share the same characteristics in terms of resolution, aspect ratio, and alignment. In addition, a low-resolution image obtained from broadcast video content may contain distortion.

The image pair generator 604 performs pre-processing that corrects the resolution, aspect ratio, and alignment between the two input images to generate a training image pair with matching characteristics, that is, a training SD image and a training FHD image. The training SD image may be used as the input of the inference model 110, and the training FHD image may be used as the target image for training the inference model 110.

When the two images do not share the same characteristics, the difference between them may exhibit strong edge components. The more closely the characteristics of the two images match, the weaker these edge components become. Because deep learning-based training fits the neural network to the target image, matching characteristics between the training images is critical for reducing the size of the inference model 110 and improving training efficiency.
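A toy one-dimensional illustration (hypothetical values, not from the source) of why mismatched pairs produce edge-like differences: shifting the same scanline by a single pixel leaves a residual that is non-zero only at the object boundary, which the network would otherwise have to learn to reproduce.

```python
from typing import List, Sequence


def abs_diff(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Per-pixel absolute difference between two equally sized scanlines."""
    return [abs(x - y) for x, y in zip(a, b)]


# One scanline crossing a vertical object boundary (hypothetical values).
target = [0, 0, 0, 0, 10, 10, 10, 10]
aligned = list(target)            # same content, same position
shifted = [0] + target[:-1]       # same content, misaligned by one pixel

print(abs_diff(target, aligned))  # all zeros
print(abs_diff(target, shifted))  # non-zero only at the boundary
```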

FIG. 7 is a flowchart of the image pair generation process according to an embodiment of the present invention.

The image pair generator 604 matches the start frames of the SD video and the FHD video (S700). When the SD and FHD videos of the same video content have different numbers of frames, the image pair generator 604 may align the start frames of the two videos. When the two videos have the same number of frames, the image pair generator 604 may proceed with the subsequent steps from the first frame.
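The source does not say how the start frames are matched. One hypothetical approach, sketched below, reduces each frame to a scalar signature that is comparable across resolutions (for example, mean luminance) and picks the frame offset with the smallest mean absolute difference over a window.

```python
from typing import Sequence


def best_start_offset(sd_sig: Sequence[float], hd_sig: Sequence[float],
                      max_offset: int, window: int) -> int:
    """Return k such that SD frame i is taken to match FHD frame i + k.

    sd_sig / hd_sig hold one scalar per frame (hypothetically, the mean
    luminance), which is comparable across the two resolutions.
    """
    best_k, best_cost = 0, float("inf")
    for k in range(-max_offset, max_offset + 1):
        pairs = [(sd_sig[i], hd_sig[i + k])
                 for i in range(min(window, len(sd_sig)))
                 if 0 <= i + k < len(hd_sig)]
        if not pairs:
            continue
        cost = sum(abs(a - b) for a, b in pairs) / len(pairs)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k
```

With a large `max_offset` a short, coincidental overlap can win; a practical version would require a minimum number of overlapping frames.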

After matching frame numbers between the SD video and the FHD video, the image pair generator 604 creates a list of frames to be used for training (S702). Not every frame of the two videos need be used; frames may be selected at intervals of N steps (N is a natural number). In addition, when compressed video is used, I-frame (intra frame) pairs, which suffer relatively little quality degradation, may be selected preferentially to build the list.
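A minimal sketch of building the frame list, assuming the per-frame picture types of both compressed streams are already known. Taking frames that are intra-coded in both streams and then thinning them by N is one possible way to combine the two selection rules; the source describes them separately.

```python
from typing import List, Sequence


def select_training_frames(sd_types: Sequence[str], hd_types: Sequence[str],
                           n_step: int, prefer_intra: bool = True) -> List[int]:
    """Return frame indices (after start-frame alignment) to use as training pairs.

    sd_types / hd_types: picture type per frame, e.g. "I", "P", "B"
    (hypothetical input, e.g. parsed from the compressed streams).
    """
    if n_step < 1:
        raise ValueError("n_step must be a natural number")
    count = min(len(sd_types), len(hd_types))
    if prefer_intra:
        # Frames intra-coded in both streams suffer relatively little
        # compression degradation, so they are taken first.
        both_intra = [i for i in range(count)
                      if sd_types[i] == "I" and hd_types[i] == "I"]
        if both_intra:
            return both_intra[::n_step]
    # Fallback: plain N-step subsampling of all aligned frames.
    return list(range(0, count, n_step))
```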

The image pair generator 604 matches the aspect ratio of the SD image and the FHD image (S704). For each image pair in the list, the image pair generator 604 may match the aspect ratios of the SD image and the FHD image. For example, when the SD image has a 4:3 ratio, the image pair generator 604 crops the width of the 16:9 FHD image to fit the SD image. Alternatively, when the SD image contains black regions because it carries 16:9 content, the image pair generator 604 crops the SD image to exclude the black regions so that it matches the FHD content.
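The crop arithmetic for the 4:3 case is exact: at full height, a 4:3 window of the 1080-line FHD frame is 1080 × 4/3 = 1440 pixels wide. The sketch below assumes a centred crop and, for the letterbox case, a hypothetical luma threshold of 16 to detect black rows; neither detail is given in the source.

```python
from fractions import Fraction
from typing import List, Tuple


def center_crop_width(width: int, height: int,
                      target_ratio: Fraction) -> Tuple[int, int]:
    """Return (x_offset, crop_width) so that crop_width:height == target_ratio.

    Cropping symmetrically about the centre is an assumption.
    """
    crop_w = height * target_ratio
    if crop_w.denominator != 1 or crop_w > width:
        raise ValueError("target ratio not reachable with an integer crop")
    crop_w = int(crop_w)
    return (width - crop_w) // 2, crop_w


def non_black_rows(luma: List[List[int]], threshold: int = 16) -> Tuple[int, int]:
    """Return (first, last) rows whose peak luma exceeds threshold (letterbox removal)."""
    rows = [i for i, row in enumerate(luma) if max(row) > threshold]
    if not rows:
        raise ValueError("frame is entirely black")
    return rows[0], rows[-1]


print(center_crop_width(1920, 1080, Fraction(4, 3)))  # (240, 1440)
```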

The image pair generator 604 matches the resolution of the SD image and the FHD image (S706). Because the FHD resolution cannot be obtained by scaling the SD resolution by an integer factor, the image pair generator 604 may scale the horizontal and vertical resolutions of the SD image by different factors to match the resolutions of the two images.
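The arithmetic below shows why the two axes need different, non-integer factors: 720×480 maps onto 1920×1080 by 8/3 horizontally and 9/4 vertically, and even after the FHD frame is cropped to 4:3 (1440×1080) the vertical factor remains 9/4. Nearest-neighbour resampling is used only to keep the sketch short; the source does not state which interpolation the image pair generator uses.

```python
from fractions import Fraction
from typing import List, Tuple


def scale_factors(src: Tuple[int, int],
                  dst: Tuple[int, int]) -> Tuple[Fraction, Fraction]:
    """Exact (horizontal, vertical) factors mapping a src (w, h) grid onto dst."""
    (sw, sh), (dw, dh) = src, dst
    return Fraction(dw, sw), Fraction(dh, sh)


def resize_nearest(img: List[List[float]], out_w: int,
                   out_h: int) -> List[List[float]]:
    """Nearest-neighbour resize with independent horizontal/vertical factors."""
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


print(scale_factors((720, 480), (1920, 1080)))  # full FHD frame
print(scale_factors((720, 480), (1440, 1080)))  # FHD frame cropped to 4:3
```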

Using an image alignment algorithm, the image pair generator 604 generates an image pair aligned to the same position, that is, a training SD image and a training FHD image (S708). To align the two images whose aspect ratio and resolution have been matched, an image alignment algorithm based on luminance information (the brightness of the image) or on edge-line information within the image may be used.
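The source names luminance-based or edge-based alignment without specifying an algorithm. As one simple, assumed instance of a luminance-based method, the sketch below exhaustively searches integer shifts and keeps the one with the smallest mean absolute luma difference over the overlapping region. Running the same search on edge (gradient-magnitude) maps would give an edge-based variant; sub-pixel refinement is not covered.

```python
from typing import List, Tuple

Plane = List[List[float]]  # a luma plane as rows of pixel values


def mean_abs_diff(ref: Plane, mov: Plane, dx: int, dy: int) -> float:
    """Mean |ref[y][x] - mov[y + dy][x + dx]| over the overlapping region."""
    total, count = 0.0, 0
    for y, row in enumerate(ref):
        yy = y + dy
        if not 0 <= yy < len(mov):
            continue
        for x, value in enumerate(row):
            xx = x + dx
            if 0 <= xx < len(mov[yy]):
                total += abs(value - mov[yy][xx])
                count += 1
    return total / count if count else float("inf")


def find_shift(ref: Plane, mov: Plane, max_shift: int) -> Tuple[int, int]:
    """Exhaustively search the integer (dx, dy) that best aligns mov to ref."""
    best, best_cost = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            cost = mean_abs_diff(ref, mov, dx, dy)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best
```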

Since the structure and operation of the deep neural network-based inference model 110 have already been described in detail, further description is omitted.

Of each image pair generated by the image pair generator 604, the training unit according to the present embodiment uses the training SD image as the input of the inference model 110 and the training FHD image as the target image for training the inference model 110.

The training unit trains the inference model 110 by updating its parameters based on a distance metric between the target image and the inferred image that the inference model 110 generates from the training SD image. Any metric that can express the difference between the two compared images, such as cross entropy or the L1 or L2 metric, may be used as the distance metric.
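To make the update rule concrete, the sketch below implements L1 and L2 distances and a gradient-descent step for a toy one-parameter "model" (a single gain). The toy model, the per-pixel averaging, and the learning rate are assumptions for illustration only; the inference model 110 has millions of parameters and would in practice be trained with an automatic-differentiation framework. Cross entropy, which the source also permits, is not shown.

```python
from typing import Sequence


def l1_distance(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between inferred and target pixels."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def l2_distance(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between inferred and target pixels."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def sgd_step_gain(gain: float, lr_pixels: Sequence[float],
                  target: Sequence[float], learning_rate: float) -> float:
    """One gradient-descent step for a toy model pred = gain * x under l2_distance."""
    n = len(lr_pixels)
    # d/dgain of mean((gain*x - t)^2) = mean(2 * (gain*x - t) * x)
    grad = sum(2 * (gain * x - t) * x for x, t in zip(lr_pixels, target)) / n
    return gain - learning_rate * grad


gain = 0.0
for _ in range(100):
    gain = sgd_step_gain(gain, [1.0, 2.0], [2.0, 4.0], learning_rate=0.1)
# gain now approaches 2.0, the value that makes the toy prediction match the target
```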

An experimental example demonstrating the performance of the image conversion apparatus 100 according to the present embodiment is described below.

Broadcast drama content was used for the training image pairs in the experiment; their resolutions are shown in Table 1.

For comparison, the RCAN model, which shows the best performance among prior-art techniques, was used. The image conversion apparatus 100 according to the present embodiment includes the inference model 110 trained with the learning model shown in FIG. 6. The inference model 110 consists of 75 convolutional layers and has about 3×10⁶ parameters. Compared with the RCAN model, which has more than 800 convolutional layers and about 16×10⁶ parameters, the inference model 110 according to the present embodiment has reduced complexity.
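Parameter counts such as 3×10⁶ come from summing k·k·C_in·C_out weights (plus C_out biases) over all convolutional layers. The source does not give kernel sizes or channel widths, so the 3×3 kernels and 64 channels below are purely hypothetical and only show how 75 layers can land on the order of 10⁶ parameters.

```python
def conv2d_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Weights (+ biases) of one 2-D convolution with a k x k kernel."""
    return k * k * c_in * c_out + (c_out if bias else 0)


# Hypothetical: 75 layers, all 3x3 with 64 input and 64 output channels.
total = 75 * conv2d_params(64, 64, 3)
print(total)
```

Layers that follow a concatenation (as in the SRB) would take more input channels, so the real count depends on the widths the authors chose.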

For the performance comparison, the average time per image to convert SD images obtained from the content in Table 1 into FHD images, and the peak signal-to-noise ratio (PSNR) of the converted FHD images, were measured. For average processing time, the image conversion apparatus 100 according to the present embodiment took 0.51 s per image versus 1.79 s per image for the RCAN model, roughly 3.5 times faster. For the converted FHD images, the present embodiment achieved a PSNR of 33.31 dB versus 33.35 dB for the RCAN model, a difference of only 0.04 dB, confirming comparable image quality.
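For reference, PSNR is 10·log10(MAX²/MSE). The sketch below computes it and reproduces the speed and quality gaps from the reported figures. MAX = 255 assumes 8-bit samples; the source does not state whether PSNR was measured on luma only or on all channels.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(max_value ** 2 / mse)


# Figures reported in the experiment above.
speedup = 1.79 / 0.51        # RCAN time per image / proposed time per image
psnr_gap_db = 33.35 - 33.31  # RCAN PSNR minus proposed PSNR
print(f"speed-up {speedup:.2f}x, PSNR gap {psnr_gap_db:.2f} dB")
```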

Compared with the RCAN model, the inference model 110 included in the image conversion apparatus 100 consists of simple components based on LRBs, SRBs, and the like, plus multiple skip connections for effective training. Owing to this simple yet effective structure, the image conversion apparatus 100 according to the present embodiment can increase processing speed while suppressing quality degradation of the converted image.
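Claims 5 to 8 fix the internal order of the LRB and SRB. The sketch below traces that order under a simplified, hypothetical representation: each feature map is a flat list of floats, list concatenation stands in for channel-wise concatenation, and the convolution groups are arbitrary callables. Each block's final layer must return a result the same size as the block input so that the skip connection can add it.

```python
from typing import Callable, List

Features = List[float]  # hypothetical simplification of a multi-channel tensor
Layer = Callable[[Features], Features]


def add(a: Features, b: Features) -> Features:
    """Element-wise addition performed by a skip connection."""
    if len(a) != len(b):
        raise ValueError("shape mismatch in skip connection")
    return [x + y for x, y in zip(a, b)]


def srb(x: Features, conv_group1: Layer, conv_group2: Layer) -> Features:
    """Short Residual Block: group 1 -> concatenation -> group 2 -> third skip."""
    r1 = conv_group1(x)    # residual output for the SRB input
    cat = x + r1           # concatenation layer (list concat ~ channel concat)
    r2 = conv_group2(cat)  # must map back to the size of x
    return add(x, r2)      # third skip connection


def lrb(x: Features, srb1: Layer, srb2: Layer, conv4: Layer) -> Features:
    """Long Residual Block: two SRBs -> fourth conv layer -> second skip."""
    residual = conv4(srb2(srb1(x)))
    return add(x, residual)  # second skip connection
```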

As described above, the present embodiment provides an image conversion apparatus and method that generate a high-resolution image from a low-resolution image using an inference model trained in advance on pairs of training low-resolution and high-resolution images. This makes it possible to implement SR that obtains a target-quality image from broadcast low-resolution images containing substantial distortion.

In addition, the present embodiment provides an image conversion apparatus and method that use an inference model trained in advance on pairs of training low-resolution and high-resolution images whose characteristics, such as resolution, aspect ratio, and alignment, have been matched. This makes it possible to reduce the complexity of the inference model and improve training efficiency.

Although each flowchart according to the present embodiment describes its processes as executed sequentially, the present invention is not limited thereto. Since the processes described in a flowchart may be executed in a different order, or one or more of them may be executed in parallel, the flowcharts are not limited to a time-series order.

Various implementations of the systems and techniques described herein may be realized in digital electronic circuitry, integrated circuits, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs executable on a programmable system. The programmable system includes at least one programmable processor (which may be a special-purpose or a general-purpose processor) coupled to receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device. Computer programs (also known as programs, software, software applications, or code) contain instructions for a programmable processor and are stored on a "computer-readable recording medium".

A computer-readable recording medium includes every type of recording device that stores data readable by a computer system. Such a medium may be a non-volatile or non-transitory medium such as ROM, CD-ROM, magnetic tape, a floppy disk, a memory card, a hard disk, a magneto-optical disk, or a storage device, and may further include transitory media such as carrier waves (e.g., transmission over the Internet) and data transmission media. A computer-readable recording medium may also be distributed over networked computer systems so that computer-readable code is stored and executed in a distributed manner.

Various implementations of the systems and techniques described herein may be realized by a programmable computer. Here, the computer includes a programmable processor, a data storage system (including volatile memory, non-volatile memory, other types of storage systems, or combinations thereof), and at least one communication interface. For example, the programmable computer may be a server, a network appliance, a set-top box, an embedded device, a computer expansion module, a personal computer, a laptop, a personal data assistant (PDA), a cloud computing system, or a mobile device.

The above description merely illustrates the technical idea of the present embodiment, and those skilled in the art to which the present embodiment pertains may make various modifications and variations without departing from its essential characteristics. Accordingly, the present embodiments are intended to explain, not to limit, the technical idea of the present embodiment, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present embodiment should be interpreted according to the following claims, and all technical ideas within their equivalent scope should be interpreted as falling within the scope of rights of the present embodiment.

100: image conversion apparatus 102: input unit
104: image conversion unit 106: output unit
110: inference model
202: LRB 204: first skip connection
302: SRB 304: second skip connection
402: third skip connection

Claims

1. An image conversion apparatus comprising:
an input unit for obtaining a low-resolution image from video content; and
an image conversion unit that generates a high-resolution image of the video content from the low-resolution image using an inference model based on a deep neural network,
wherein the inference model includes a first convolutional layer, a plurality of Long Residual Blocks (LRBs), a second convolutional layer, a first skip connection, and a third convolutional layer, and generates the high-resolution image by processing the low-resolution image in the order of the first convolutional layer, the plurality of LRBs, the second convolutional layer, and the third convolutional layer.

2. The image conversion apparatus of claim 1, wherein the inference model is trained in advance using pairs of a training low-resolution image and a training high-resolution image based on the same video content.

3. The image conversion apparatus of claim 1, wherein the first convolutional layer extracts a plurality of feature maps from the low-resolution image.

4. The image conversion apparatus of claim 1, wherein the second convolutional layer generates a residual output for the output of the last LRB among the plurality of LRBs, the first skip connection adds the output of the first convolutional layer and the residual output, and the third convolutional layer generates the high-resolution image from the result of the addition.

5. The image conversion apparatus of claim 1, wherein each of the plurality of LRBs includes two Short Residual Blocks (SRBs), a second skip connection, and a fourth convolutional layer, and processes the input of the LRB in the order of the two SRBs and the fourth convolutional layer.

6. The image conversion apparatus of claim 5, wherein the fourth convolutional layer generates a residual output for the output of the second of the two SRBs, and the second skip connection adds the input of the LRB and the residual output.

7. The image conversion apparatus of claim 5, wherein each of the two SRBs includes a first convolution group including at least one convolutional layer, a concatenation layer, a second convolution group including at least one convolutional layer, and a third skip connection, and processes the input of the SRB in the order of the first convolution group, the concatenation layer, and the second convolution group.

8. The image conversion apparatus of claim 7, wherein the first convolution group generates a residual output for the input of the SRB, the concatenation layer concatenates the input of the SRB and the residual output of the first convolution group and passes the result to the second convolution group, the second convolution group generates a residual output for the output of the concatenation layer, and the third skip connection adds the input of the SRB and the residual output of the second convolution group.

9. An image conversion method performed by an image conversion apparatus, the method comprising:
obtaining a low-resolution image from video content; and
generating a high-resolution image of the video content from the low-resolution image using an inference model that is implemented based on a deep neural network; includes a first convolutional layer, a plurality of Long Residual Blocks (LRBs), a second convolutional layer, a first skip connection, and a third convolutional layer; is trained in advance on pairs of a training low-resolution image and a training high-resolution image based on the same video content; and processes the low-resolution image in the order of the first convolutional layer, the plurality of LRBs, the second convolutional layer, and the third convolutional layer.

10. A computer program stored in a computer-readable recording medium to execute each step of the image conversion method according to claim 9.