KR102597298B1

KR102597298B1 - Image warping network system using local texture estimator

Info

Publication number: KR102597298B1
Application number: KR1020220067008A
Authority: KR
Inventors: 진경환; 이재원
Original assignee: 재단법인대구경북과학기술원
Priority date: 2022-05-31
Filing date: 2022-05-31
Publication date: 2023-11-02

Abstract

The present invention relates to an image warping network system using local texture estimation and to a technique for correcting image distortion and providing super-resolution (SR) image data by utilizing both Fourier characteristics and Jacobian matrices of spatially varying coordinate transformations.

Description

Image warping network system using local texture estimator

The present invention relates to an image warping network system using local texture estimation. More specifically, it relates to a system that uses both Fourier characteristics and the Jacobian matrices of spatially varying coordinate transformations to correct image distortion and provide super-resolution (SR) image data.

Image warping is a geometric processing technique that moves pixel positions. It is widely used in computer vision and graphics tasks such as image editing, optical flow, image alignment, and omnidirectional vision.

Geometric processing such as enlarging or reducing an image applies one fixed rule to every pixel and therefore yields a uniform transformation. Image warping, by contrast, can move each pixel by a different amount, and it is used to restore distorted images correctly or to deform images of normal shape.

Accordingly, image warping can be viewed as a generalized form of super-resolution (SR). Given input data as in a) of Figure 1, SR expands the scale factor by applying the same rule to every region, as in b) of Figure 1, whereas image warping applies a different scale factor at each location, as in c) of Figure 1.

A common image warping approach finds the corresponding spatial location in the input space and applies an interpolation kernel to compute the missing RGB values. Such interpolation-based warping has the advantage of being differentiable and easy to implement. However, the output image contains blurry regions, so accuracy and reliability are low.
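The interpolation-based approach described above can be sketched as follows. This is a minimal illustration rather than the method of any cited work: it assumes a grayscale image stored as a list of rows, a bilinear kernel, and a hypothetical `inverse_map` callable that returns the input-space location for each output pixel.

```python
import math

def bilinear_sample(img, x, y):
    """Sample a grayscale image (list of rows) at a real-valued (x, y)."""
    h, w = len(img), len(img[0])
    # Clamp so that the 2x2 neighborhood stays inside the image.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom

def backward_warp(img, inverse_map, out_h, out_w):
    """For each output pixel, look up its source location and interpolate."""
    out = []
    for v in range(out_h):
        row = []
        for u in range(out_w):
            x, y = inverse_map(u, v)  # location in the input space
            row.append(bilinear_sample(img, x, y))
        out.append(row)
    return out

src = [[0.0, 10.0], [20.0, 30.0]]
# Hypothetical inverse map: a 2x upscale, so output (u, v) maps to (u/2, v/2).
warped = backward_warp(src, lambda u, v: (u / 2.0, v / 2.0), 3, 3)
```

Because every output value is a weighted average of at most four input pixels, sharp edges are smoothed, which is the source of the blur noted above.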

The 'SRWarp' technique was recently proposed to address this problem. SRWarp handles homography transforms using an adaptive warping layer, applying super-resolution to conventional image warping in order to raise the resolution of the output image. However, it performs poorly on transformations not seen during training.

To address this, another recent prior-art approach applied Implicit Neural Representation to super-resolution so that high performance is maintained even for transformations not seen during training. However, it also fails to capture high-frequency information, so the output images look blurry.

To address this failure to capture high-frequency information, another prior-art technique, LTE (Local Texture Estimator), was recently proposed. However, LTE cannot perform image warping.

Image warping is harder to implement than super-resolution because every location must be restored at high resolution while the scale factor differs from location to location. A warping technique is therefore needed that performs well on transformations not seen during training and that captures high-frequency information.

In this regard, "Local Texture Estimator for Implicit Representation Function" (CVPR 2022) proposes the LTE technique described above.

CVPR 2022, Local Texture Estimator for Implicit Representation Function

The present invention was devised to solve the problems of the prior art described above. Its object is to provide an image warping network system using local texture estimation that uses both Fourier characteristics and the Jacobian matrices of spatially varying coordinate transformations to correct image distortion and provide super-resolution (SR) image data.

An image warping network system using local texture estimation according to an embodiment of the present invention preferably includes: an encoder module that receives base image data and computes the feature map contained in it; an image processing module that receives previously input local grid information, shape information, and the feature map, estimates Fourier information that accounts for the coordinate transformation, and outputs a resampled image using the estimated Fourier information and the Jacobian matrices of the coordinate transformation; a decoder module that receives the resampled image and outputs estimated RGB information for the resampled image; a bilinear interpolation module that computes bilinear interpolation image data of the base image data; and a summation module that adds the bilinear interpolation image data and the estimated RGB information of the resampled image to provide a warped, upscaled image.

Furthermore, the encoder module is preferably composed of a deep SR (Super-Resolution) network that does not include an upscaling module, and computes the feature map through the deep SR network.

Furthermore, the deep SR network is preferably any one selected from EDSR (Enhanced Deep Super-Resolution Network), RCAN (Residual Channel Attention Network), and RRDB (Residual-in-Residual Dense Block network).

Furthermore, the image processing module preferably further includes an amplitude estimator that estimates amplitude using the feature map, a frequency estimator that estimates frequency using the feature map, and a phase estimator that estimates phase from the shape information, and preferably takes the vector inner product of the local grid information and the estimated frequency.

Furthermore, the amplitude estimator and the frequency estimator are each preferably formed as a 3×3 convolutional layer with 256 channels, and the phase estimator is preferably formed as a single linear layer with 128 channels.

With the configuration described above, the image warping network system using local texture estimation of the present invention uses both Fourier characteristics and the Jacobian matrices of spatially varying coordinate transformations, and thereby has the advantage of correcting image distortion and providing super-resolution (SR) image data (warped, upscaled image data).

Figure 1 is a configuration diagram of an image warping network system using local texture estimation according to an embodiment of the present invention.
Figure 2 is an example diagram in which image warping data is represented by a neural network in the image warping network system using local texture estimation according to an embodiment of the present invention.
Figure 3 is an example diagram showing the Jacobian matrix and the Hessian tensor applied in the image warping network system using local texture estimation according to an embodiment of the present invention.
Figure 4 is an example diagram showing the training process of the image warping network system using local texture estimation according to an embodiment of the present invention.
Figures 5 to 10 compare the output images of the image warping network system using local texture estimation according to an embodiment of the present invention with those of other image warping network systems.
Figure 11 is an example diagram visualizing the Fourier information applied in the image warping network system using local texture estimation according to an embodiment of the present invention.

Hereinafter, the image warping network system using local texture estimation of the present invention is described in detail with reference to the accompanying drawings. The drawings introduced below are provided as examples so that the idea of the present invention can be fully conveyed to those skilled in the art. Accordingly, the present invention is not limited to the drawings presented below and may be embodied in other forms. Like reference numerals denote like elements throughout the specification.

Unless otherwise defined, the technical and scientific terms used here have the meanings commonly understood by those of ordinary skill in the art to which this invention pertains. Descriptions of well-known functions and configurations that could unnecessarily obscure the gist of the present invention are omitted from the following description and the accompanying drawings.

In addition, a system means a set of components, including devices, mechanisms, and means, that are organized and interact regularly to perform a required function.

As described above, image warping is a technique that reshapes an image defined on a rectangular grid into an arbitrary shape. Implicit neural representations (INR) are being applied to represent such images continuously.

Conventional computer vision research usually represents an image as a matrix holding an RGB value at each pixel position, so the storage required and the size of the model able to handle the image depend on the image resolution. More recently, an image is instead expressed as a function that maps the pixel coordinates x, y to r, g, b values, and representing and learning this function with a neural network is the core of implicit neural representation. Expressing an image as a function through implicit neural representation allows the image to be stored independently of resolution and applied to computer vision models, and it also allows SR, which converts an image to high definition, to be expressed simply.

For example, each pixel of an image has a specific RGB value, and the pixels together form the image. To express such an image as a function, the function must take coordinates (x, y) as input and output the value of the corresponding pixel (an RGB value or an intensity value); this function representation is itself the image data.

A multi-layer perceptron (MLP) parameterizes such an INR and is designed to take coordinates as input.

However, as noted in the background, a stand-alone multi-layer perceptron (MLP) that forms the neural network for such a function representation (implicit neural representation) has difficulty learning the high-frequency region (high-frequency Fourier coefficients).

In other words, a drawback of implicit neural representation is that a stand-alone MLP with ReLU learns mainly the low-frequency region.

Image warping is used in a variety of computer vision and graphics tasks, such as image editing, optical flow, image alignment, and omnidirectional vision.

Conventionally, image warping was performed by applying the inverse coordinate transformation and interpolating the missing RGB values in the input space. However, such interpolation-based warping leaves blurry regions in the output image.

To address this, the recent SRWarp technique adopts a deep single image super-resolution (SISR) architecture as its backbone and is designed to reconstruct images with high-frequency detail.

The technical purpose of SISR is to reconstruct degraded low-resolution image data into high-resolution image data: deep feature maps are extracted and then upscaled at the end of the network. By its nature this approach can output visually sharp high-resolution image data, but it faces a further limitation in warping, because each local region of a warped image is expanded by a different scale factor.

To solve the problems of the prior art described above, the image warping network system using local texture estimation according to an embodiment of the present invention includes, as shown in Figure 1, an encoder module, an image processing module, a decoder module, a bilinear interpolation module, and a summation module. Each component runs on a single computing means including a computer, or on separate computing means.

The image warping network system using local texture estimation according to an embodiment of the present invention considers both the Fourier information of the input image and the spatially varying nature of the coordinate transformation, and formulates a frequency response estimator for image warping through an INR.

Briefly, given an input image and a differentiable, invertible coordinate transformation, the image warping network system using local texture estimation according to an embodiment of the present invention concerns an algorithm for a local texture estimator for image warping ('LTEW', Local Texture Estimator for image Warping), used for an INR that represents the warped image I^WARP, as shown in Figure 2.

In geometry, the determinant of the Jacobian represents the local magnification. Accordingly, the spatially varying Jacobian matrix is multiplied into the Fourier information for each pixel before the MLP represents I^WARP.
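The statement that the Jacobian determinant gives the local magnification can be checked with a small sketch. The 2×2 matrix below is a hypothetical local Jacobian chosen for illustration, not a value from the invention; the sketch maps the unit square through it and compares the resulting area with the determinant.

```python
def apply_linear(m, p):
    """Apply a 2x2 matrix to a 2D point."""
    return (m[0][0] * p[0] + m[0][1] * p[1],
            m[1][0] * p[0] + m[1][1] * p[1])

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def polygon_area(pts):
    """Area of a simple polygon by the shoelace formula."""
    total = 0.0
    for i in range(len(pts)):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % len(pts)]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
jacobian = [[2.0, 0.5], [0.0, 3.0]]  # hypothetical local Jacobian (scale plus shear)
mapped = [apply_linear(jacobian, p) for p in unit_square]

# The unit square has area 1, so the mapped area equals |det J|.
area_ratio = polygon_area(mapped)
magnification = abs(det2(jacobian))
```

For a nonlinear warp the Jacobian changes from pixel to pixel, so the magnification does too, which is why a single global scale factor cannot describe the warp.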

The image warping network system using local texture estimation according to an embodiment of the present invention preferably estimates Fourier information from the input image (base image data) and uses the estimated Fourier information together with the Jacobian matrix of the coordinate transformation.

Accordingly, by using both the Fourier information and the Jacobian matrix of the spatially varying coordinate transformation, the image warping network system using local texture estimation according to an embodiment of the present invention can provide an algorithm for a continuous neural representation for image warping. That is, by using both the feature map of the deep neural network encoder and the relative coordinates (local grid information), it can generalize to regions not covered in training.

Before the detailed description, some further background on INR: motivated by the fact that neural networks are universal function approximators, INR is widely applied to represent continuous-domain signals. In general, the memory required for data grows quadratically or cubically with signal resolution. In contrast, INR provides a memory-efficient framework for storing continuous signals, because its storage size is proportional to the number of model parameters rather than to the signal resolution.
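The storage argument can be made concrete with a little arithmetic. The sketch below assumes a hypothetical coordinate MLP with layer widths 2, 256, 256, 256, 3 (these widths are illustrative only) and compares its parameter count with the number of values in an RGB pixel grid at several resolutions.

```python
def mlp_param_count(widths):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))

def grid_value_count(height, width, channels=3):
    """Number of stored values in a dense pixel grid."""
    return height * width * channels

# Hypothetical INR: (x, y) in, three hidden layers of 256 units, (r, g, b) out.
params = mlp_param_count([2, 256, 256, 256, 3])

for side in (256, 512, 1024):
    # The grid grows quadratically with the side length; the MLP size stays fixed.
    print(side, grid_value_count(side, side), params)
```

Doubling the side length quadruples the grid storage, while the parameter count of the function representation is unchanged.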

In addition, to resolve the bias in the frequency domain, the image warping network system using local texture estimation according to an embodiment of the present invention uses a local texture estimator (LTE), motivated by Fourier analysis, which estimates Fourier features for high-resolution image data from low-resolution image data.

As shown in Figure 1, the input coordinate space is the set X, and the output coordinate space is the set Y.

In the local neural representation, the neural representation is parameterized by an MLP with trainable parameters θ. The decoding function predicts the RGB value at a query point according to Equation 1 below.

Here, the terms of Equation 1 are the set of indices j, the local ensemble coefficient, the latent variable for index j, and the coordinates of that latent variable.

Recent studies show that a standard MLP structure has difficulty learning high-frequency content. To resolve this spectral bias problem, Equation 1 above is modified as in Equation 2 below.

Here, the function introduced in Equation 2 is the local texture estimator (LTE).

The LTE contains at least two estimators: an amplitude estimator and a frequency estimator.

The estimation function is defined as in Equation 3 below.

Here, the terms of Equation 3 are the local grid, the amplitude vector, and the frequency matrix for index j; <·,·> denotes the vector inner product, and ⊙ denotes element-wise multiplication.
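The roles of the amplitude vector, frequency matrix, inner product, and element-wise product can be illustrated with a short sketch. It assumes Equation 3 takes the sinusoidal form used in the LTE paper cited above, that is, amplitudes multiplied element-wise with the stacked cosines and sines of π times the inner product of each frequency row with the local grid. The factor π, the cosine-then-sine ordering, and the function name are assumptions of this sketch.

```python
import math

def local_texture_features(amplitude, frequency, delta):
    """Fourier-style features for one latent code (assumed form).

    amplitude: list of 2K values (amplitude vector)
    frequency: list of K rows, each a pair (frequency matrix)
    delta:     pair (dx, dy), the local grid (relative coordinate)
    """
    k = len(frequency)
    if len(amplitude) != 2 * k:
        raise ValueError("amplitude must hold 2K values for K frequencies")
    # Inner product <F, delta> for each frequency row, scaled by pi.
    phases = [math.pi * (f[0] * delta[0] + f[1] * delta[1]) for f in frequency]
    basis = [math.cos(p) for p in phases] + [math.sin(p) for p in phases]
    # Element-wise multiplication of amplitudes and sinusoids.
    return [a * b for a, b in zip(amplitude, basis)]

features = local_texture_features(
    amplitude=[1.0, 2.0, 3.0, 4.0],
    frequency=[[1.0, 0.0], [0.0, 0.5]],
    delta=(0.5, 1.0),
)
```

In this example both phases equal π/2, so the cosine entries vanish (up to rounding) and the sine entries pass their amplitudes through unchanged.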

In the equations above, F_j is the frequency response of the input image I^IN. Because this differs from the frequency response of the warped image I^WARP, these equations alone cannot produce a high-resolution warped image.

To resolve this, the image warping network system using local texture estimation according to an embodiment of the present invention linearizes the given coordinate transformation into affine transformations.

To this end, the linear approximation of the local grid near the point x_j is computed by Equation 4 below.

Here, the terms of Equation 4 are the Jacobian matrix of the coordinate transformation, the terms of order x² and higher, and the local grid in the input space X.
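The linearization can be illustrated numerically. The sketch below is not Equation 4 itself: it assumes a hypothetical smooth map `to_input` from output to input coordinates, estimates its Jacobian by central differences, and compares the exact input-space offset with the Jacobian-times-offset approximation for a small local grid.

```python
def to_input(u, v):
    """Hypothetical smooth, invertible map from output to input coordinates."""
    w = 1.0 + 0.1 * u
    return (u / w, v / w)

def numerical_jacobian(g, u, v, eps=1e-5):
    """2x2 Jacobian of g at (u, v) by central differences."""
    gu1, gu0 = g(u + eps, v), g(u - eps, v)
    gv1, gv0 = g(u, v + eps), g(u, v - eps)
    return [[(gu1[0] - gu0[0]) / (2 * eps), (gv1[0] - gv0[0]) / (2 * eps)],
            [(gu1[1] - gu0[1]) / (2 * eps), (gv1[1] - gv0[1]) / (2 * eps)]]

def exact_offset(g, u, v, du, dv):
    """Exact input-space offset produced by an output-space offset."""
    a, b = g(u + du, v + dv), g(u, v)
    return (a[0] - b[0], a[1] - b[1])

def linear_offset(jac, du, dv):
    """First-order (affine) approximation: Jacobian times the offset."""
    return (jac[0][0] * du + jac[0][1] * dv,
            jac[1][0] * du + jac[1][1] * dv)

jac = numerical_jacobian(to_input, 1.0, 1.0)
exact = exact_offset(to_input, 1.0, 1.0, 0.01, 0.01)
approx = linear_offset(jac, 0.01, 0.01)
error = max(abs(exact[0] - approx[0]), abs(exact[1] - approx[1]))
```

The remaining error comes from the dropped second-order and higher terms, so it shrinks roughly quadratically as the offset shrinks.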

By the affine theorem and Equation 4 above, the frequency response of the warped image I^WARP near the point is approximated by Equation 5 below.

Thus, using Equations 4 and 5, the estimation function of Equation 3 is generalized as in Equation 6 below.

Comparing Equation 3 with Equation 6 shows that the LTEW representation instead uses the local grid in the input coordinate space, which allows it to extract Fourier information for the warped image.

For SISR with a symmetric scale factor, the pixels of the upsampled image are square and spatially invariant.

In image warping, however, the pixels of the resampled image can take arbitrary shapes and vary spatially, as shown in a) of Figure 3.

To handle this, the present invention represents the pixel shape by the gradient of the coordinate transformation at the point y, as in Equation 7 below.

Here, [·,·] denotes concatenation after flattening, and the two concatenated terms are, respectively, a numerical Jacobian matrix indicating the orientation of the pixel and a numerical Hessian tensor specifying its degree of curvature.

To express the shape information, the inverse coordinate transformation is applied to the query point and its eight nearest points, and the differences are taken to compute the numerical derivatives shown in b) of Figure 3.

The given coordinate transformation is assumed to be of class C², which means that its mixed second-order partial derivatives are symmetric.

Accordingly, only six variables of the Hessian tensor are used for the shape representation.
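One way to compute such a shape descriptor is sketched below. It evaluates a hypothetical inverse map `g` on the 3×3 stencil formed by the query point and its eight neighbors, then forms central-difference first derivatives (four Jacobian entries) and second derivatives (six unique Hessian entries, using the symmetry of the mixed partials). The stencil spacing `h`, the ordering of the ten values, and the function names are assumptions of this sketch.

```python
def shape_vector(g, u, v, h=1.0):
    """Flattened numerical Jacobian (4 values) followed by unique Hessian entries (6 values)."""
    # Evaluate g on the 3x3 stencil: the query point plus its 8 nearest neighbors.
    p = {(i, j): g(u + i * h, v + j * h) for i in (-1, 0, 1) for j in (-1, 0, 1)}
    jac, hess = [], []
    for c in (0, 1):  # c selects the x or y component of g
        d_u = (p[(1, 0)][c] - p[(-1, 0)][c]) / (2 * h)
        d_v = (p[(0, 1)][c] - p[(0, -1)][c]) / (2 * h)
        d_uu = (p[(1, 0)][c] - 2 * p[(0, 0)][c] + p[(-1, 0)][c]) / h ** 2
        d_vv = (p[(0, 1)][c] - 2 * p[(0, 0)][c] + p[(0, -1)][c]) / h ** 2
        # The mixed partial is stored once because d_uv equals d_vu for a C^2 map.
        d_uv = (p[(1, 1)][c] - p[(1, -1)][c]
                - p[(-1, 1)][c] + p[(-1, -1)][c]) / (4 * h ** 2)
        jac += [d_u, d_v]
        hess += [d_uu, d_uv, d_vv]
    return jac + hess

def quadratic_map(u, v):
    """Hypothetical test map whose derivatives are known in closed form."""
    return (u * u + u * v, 3.0 * v)

s = shape_vector(quadratic_map, 1.0, 2.0)
```

For the quadratic test map the finite differences are exact, so `s` reproduces the analytic derivatives at (1, 2): first derivatives 4, 1, 0, 3 and second derivatives 2, 1, 0 for the first component and 0, 0, 0 for the second.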

The shape information includes information about edge positions or pixel shape; accordingly, Equation 6 above is redefined as in Equation 8 below.

Here, the function introduced in Equation 8 is the phase estimator.

In addition, the present invention further uses bilinear interpolation image data to stabilize network convergence and to help the LTEW learn high-frequency information.

Accordingly, the local neural representation of the warped image I^WARP using the LTEW proposed by the present invention is defined as in Equation 9 below.

To this end, each component of the image warping network system using local texture estimation according to an embodiment of the present invention is described in detail below.

The encoder module preferably receives base image data and computes the feature map z contained in the base image data.

The encoder module is preferably composed of a deep SR network that does not include an upscaling module, and it computes the feature map using this deep SR network.

The deep SR network is preferably any one selected from the existing EDSR (Enhanced Deep Super-Resolution Network), RCAN (Residual Channel Attention Network), and RRDB (Residual-in-Residual Dense Block network), but is not limited to these.

The image processing module receives the previously input local grid information, the shape information s, and the feature map z from the encoder module, and estimates Fourier information that accounts for the coordinate transformation. It then uses the estimated Fourier information and the Jacobian matrices of the coordinate transformation to output a resampled image.

As shown in Figure 1, the image processing module includes an amplitude estimator, a frequency estimator, and a phase estimator.

The amplitude estimator estimates the amplitude using the feature map, and the frequency estimator estimates the frequency using the feature map.

The phase estimator estimates the phase from the shape information. As in Equation 8 above, the vector inner product is preferably taken between the local grid information and the estimated frequency.

The amplitude estimator and the frequency estimator are each preferably formed as a 3×3 convolutional layer with 256 channels, and the phase estimator is preferably formed as a single linear layer with 128 channels.

The present invention assumes that the warped image has the same texture in the neighborhood of a point, and the amplitude estimator and the frequency estimator preferably each estimate the Fourier information at that point using nearest-neighbor interpolation.

In addition, before generating the resampled image, the image processing module preferably multiplies the amplitude by the sinusoidal activation output.
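
The interplay of the three estimators can be sketched for a single query point as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `local_texture`, the argument layout, the factor of π inside the sinusoids, and the cosine/sine concatenation are hypothetical choices, and Equation 8 remains the authoritative definition.

```python
import math


def local_texture(amplitude, frequency, local_grid, phase):
    """Amplitude-weighted sinusoidal features for one query point.

    amplitude  : 2*K floats (assumed layout: K cosine terms, then K sine terms)
    frequency  : K (fx, fy) pairs estimated from the feature map
    local_grid : (dx, dy) offset of the query point from its nearest feature
    phase      : K floats estimated from the shape information
    """
    k = len(frequency)
    if len(amplitude) != 2 * k or len(phase) != k:
        raise ValueError("expected 2*K amplitudes and K phases for K frequencies")
    dx, dy = local_grid
    # Vector inner product between each estimated frequency and the local grid,
    # shifted by the estimated phase (the factor pi is an assumption).
    angles = [math.pi * (fx * dx + fy * dy + p)
              for (fx, fy), p in zip(frequency, phase)]
    sinusoid = [math.cos(a) for a in angles] + [math.sin(a) for a in angles]
    # Multiply the amplitude by the sinusoidal activation output.
    return [a * s for a, s in zip(amplitude, sinusoid)]
```

With K = 128 this layout would line up with the channel counts given above (256 amplitude channels, 256 frequency channels read as 128 two-dimensional frequencies, 128 phase channels), although that reading is itself an assumption.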

The decoder module receives the resampled image from the image processing module and outputs estimated RGB information of the resampled image.

The decoder module is formed as a four-layer MLP with ReLU activations and has 256 hidden dimensions.
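
A four-layer MLP of this kind can be sketched in plain Python as below. The helper names `make_mlp` and `mlp_forward`, the input width of 256, the random initialization, and the absence of an activation on the final layer are assumptions made for illustration; only the layer count, the ReLU activation, and the hidden width of 256 come from the description.

```python
import random


def make_mlp(in_dim, hidden_dim=256, out_dim=3, num_layers=4, seed=0):
    """Build (weights, biases) for each linear layer: in -> 256 -> 256 -> 256 -> 3."""
    rng = random.Random(seed)
    dims = [in_dim] + [hidden_dim] * (num_layers - 1) + [out_dim]
    layers = []
    for n_in, n_out in zip(dims[:-1], dims[1:]):
        bound = 1.0 / n_in ** 0.5  # simple uniform initialization (assumption)
        weights = [[rng.uniform(-bound, bound) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def mlp_forward(layers, x):
    """Run one feature vector through the MLP and return the estimated RGB triple."""
    for index, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            x = [max(0.0, v) for v in x]  # ReLU on all but the last layer
    return x
```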

The bilinear interpolation module computes bilinearly interpolated image data from the basic image data.

The summation module adds the bilinearly interpolated image data to the estimated RGB information of the resampled image and outputs the warped, up-scaled image.
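
The bilinear branch and the summation can be sketched per query point as follows. The function names, the nested-list image layout, and the border clamping are assumptions for illustration; the decoder output is treated here as a residual that is added to the bilinear estimate.

```python
def bilinear_sample(image, x, y):
    """Bilinearly interpolate `image` (rows of (r, g, b) tuples) at pixel coords (x, y)."""
    height, width = len(image), len(image[0])
    # Clamp to the valid pixel range (border handling is an assumption).
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    wx, wy = x - x0, y - y0
    return tuple(
        (1 - wy) * ((1 - wx) * image[y0][x0][c] + wx * image[y0][x1][c])
        + wy * ((1 - wx) * image[y1][x0][c] + wx * image[y1][x1][c])
        for c in range(3)
    )


def warped_pixel(image, x, y, decoder_rgb):
    """Sum the bilinear estimate and the decoder's estimated RGB information."""
    base = bilinear_sample(image, x, y)
    return tuple(b + r for b, r in zip(base, decoder_rgb))
```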

The training process of the image warping network system using local texture estimation, which includes the components described above, is as follows.

Assume two groups of size B: one is a group of images and the other is a group of coordinate transformations, where each coordinate transformation is differentiable and invertible.

Input image data is prepared by applying the inverse coordinate transformation, as in Equation 10 below.

The input image data is then cropped so as to avoid void pixels, producing the cropped image data.

Meanwhile, for the i-th group element, M query points are randomly sampled from the valid coordinates to prepare the ground truth (GT) data.
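
The query point sampling can be sketched as below. The function name `sample_query_points`, the boolean validity mask, and sampling without replacement are assumptions; the description only states that M query points are drawn at random from the valid coordinates.

```python
import random


def sample_query_points(valid_mask, num_points, seed=None):
    """Randomly pick `num_points` distinct (row, col) coordinates marked valid."""
    rng = random.Random(seed)
    # Collect every coordinate that is not a void pixel.
    valid = [(r, c)
             for r, row in enumerate(valid_mask)
             for c, ok in enumerate(row) if ok]
    if num_points > len(valid):
        raise ValueError("not enough valid coordinates to sample from")
    return rng.sample(valid, num_points)
```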

Then, as shown in FIG. 4, the warped, up-scaled image estimated by the present invention at each query point is evaluated by comparing it with the ground truth data and computing the loss (see Equation 11 below), where the compared quantities are RGB information.
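
As a rough illustration of a per-query-point comparison against the ground truth, the sketch below averages absolute RGB differences. The choice of an L1 (mean absolute error) loss and the name `l1_loss` are assumptions; Equation 11 defines the loss actually used.

```python
def l1_loss(predicted, ground_truth):
    """Mean absolute RGB error over matching query points (assumed loss form)."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("need two non-empty lists of equal length")
    total = sum(abs(p - g)
                for pred, gt in zip(predicted, ground_truth)
                for p, g in zip(pred, gt))
    return total / (3 * len(predicted))
```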

For asymmetric-scale super-resolution (asymmetric-scale SR), each scale factor (sx, sy) is randomly sampled from u(0.25, 1). For the homography transform, the inverse coordinate transformation is randomly sampled from the distribution given by Equation 12 below.

Here, hx, hy ~ u(-0.25, 0.25) denote shearing,

θ denotes rotation,

sx, sy ~ u(0.35, 0.5) denote scaling, and

tx ~ u(-0.75W, 0.125W), ty ~ u(-0.75H, 0.125H), px ~ u(-0.6W, 0.6W), and py ~ u(-0.6H, 0.6H) denote projection.

To evaluate the generalization performance of the present invention on unseen transformations, for both asymmetric-scale SR and the homography transform, the scale factors of the unlearned coordinate transformations are sampled from sx, sy ~ u(0.125, 0.25), while the other parameters (hx, hy, θ, tx, ty, px, py) remain the same.
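
The in-scale and out-of-scale sampling of the asymmetric scale factors can be sketched as follows. The constant and function names are hypothetical, and treating each scale factor as a multiplier on the high-resolution width and height is an assumption about how the low-resolution input is produced.

```python
import random

IN_SCALE = (0.25, 1.0)        # training range for sx, sy (asymmetric-scale SR)
OUT_OF_SCALE = (0.125, 0.25)  # unseen range used to test generalization


def sample_scale_factors(scale_range, rng):
    """Draw independent horizontal and vertical scale factors from u(low, high)."""
    low, high = scale_range
    return rng.uniform(low, high), rng.uniform(low, high)


def downscaled_size(width, height, sx, sy):
    """Size of the low-resolution input, assuming sx and sy multiply the HR size."""
    return max(1, round(width * sx)), max(1, round(height * sy))
```

For example, under this assumption a 192 × 192 ground truth crop with (sx, sy) = (0.25, 0.5) would give a 48 × 96 input.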

Furthermore, various experiments were performed to evaluate the performance of the present invention.

Specifically, for asymmetric-scale SR, the PSNR (Peak Signal-to-Noise Ratio) was analyzed on Set5, Set14, B100, and Urban100 to evaluate the network (see Table 1 below). 48 × 48 patches were used as input data, and bicubic resizing in PyTorch was used for arbitrary-scale downsampling during training. The encoder was designed without an upscaling module.
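
For reference, PSNR follows the standard definition 10·log10(MAX² / MSE). The sketch below is a plain-Python illustration; the function name `psnr`, the flat pixel sequences, and the optional mask for excluding void pixels are assumptions rather than the evaluation code used in the experiments.

```python
import math


def psnr(reference, estimate, max_value=1.0, mask=None):
    """PSNR in dB between two equally sized flat sequences of pixel values.

    If `mask` is given, only positions where mask[i] is truthy are compared
    (for example, to exclude void pixels; this option is an assumption).
    """
    pairs = [(r, e)
             for i, (r, e) in enumerate(zip(reference, estimate))
             if mask is None or mask[i]]
    if not pairs:
        raise ValueError("no pixels to compare")
    mse = sum((r - e) ** 2 for r, e in pairs) / len(pairs)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)
```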

For the homography transform, the Set5-warping, Set14-warping, B100-warping, and Urban100-warping data sets, obtained by applying homography transforms to the high-resolution images of the existing Set5, Set14, B100, and Urban100 data sets, were analyzed to evaluate the network. Bicubic resampling was used for the random homography transforms during training, the maximum size of an input patch was 48 × 48, and the encoder was designed without an upscaling module.

In addition, the image warping network system using local texture estimation according to an embodiment of the present invention is trained on homography transforms and then applied to ERP (equirectangular projection) perspective projection, so that its generalization performance is evaluated without additional training.

For asymmetric-scale SR, the comparison between the image warping network system using local texture estimation according to an embodiment of the present invention and the conventional RCAN, MetaSR (Magnification-Arbitrary Network for Super-Resolution), and ArbSR (Scale-Arbitrary Super-Resolution) is shown in Table 1 and FIG. 5 (in-scale) and in Table 2 and FIG. 6 (out-of-scale) below.

Here, red indicates the best performance and blue indicates the second-best performance.

For RCAN, the low-resolution image is upsampled by a factor of 4 and resampled using bicubic interpolation. For MetaSR, the input image is upsampled by a factor of max(sx, sy) and downsampled using the bicubic method.

Except on Set14 and B100, the image warping network system using local texture estimation according to an embodiment of the present invention outperforms the conventional network systems in both quantitative performance and visual quality for all scale factors and all data sets.

For the homography transform, the comparison between the image warping network system using local texture estimation according to an embodiment of the present invention and the conventional RRDB and SRWarp is shown in Table 3, FIG. 7, and FIG. 8 below.

For RRDB, the input image is super-sampled by a factor of 4 and transformed using bicubic resampling.

The image warping network system using local texture estimation according to an embodiment of the present invention outperforms the conventional network systems in mPSNR and visual quality for both in-scale and out-of-scale transformations.

The results of verifying the generalization ability with ERP perspective projection are shown in FIG. 9. For RRDB, the input ERP image was upsampled by a factor of 4 and interpolated using the bicubic method. The resolution of the input ERP image is 1644 × 832.

However, given the limited hardware resources of devices such as HMDs, storing and transmitting ERP images at full resolution is not feasible. The high-resolution ERP image is therefore downsampled by a factor of 4 and projected to an 832 × 832 image with a field of view (FOV) of 120°.

As shown in FIG. 9, RRDB is clearly limited in reproducing high-frequency regions, and SRWarp produces spurious structures (artifacts) near boundaries. In contrast, the result of the image warping network system using local texture estimation according to an embodiment of the present invention shows fine details without such structures near boundaries.

In FIG. 10, a high-resolution ERP image of size 8190 × 2529 was projected to a 4095 × 4095 image with an FOV of 90°. RRDB and SRWarp were unable to process this case.

That is, both RRDB and SRWarp must upsample the input ERP image by a factor of 4, which requires an enormous amount of memory and is therefore practically infeasible.

In contrast, the image warping network system using local texture estimation according to an embodiment of the present invention can query evaluation points sequentially, so it can perform the reconstruction while using memory efficiently.
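
Sequential querying of this kind can be sketched as a simple chunked loop, so that only one chunk of query points is processed at a time. The names `query_in_chunks` and `predict_fn` and the default chunk size are hypothetical; `predict_fn` stands in for whatever callable maps a list of coordinates to RGB values.

```python
def query_in_chunks(coordinates, predict_fn, chunk_size=4096):
    """Evaluate `predict_fn` on `coordinates` one chunk at a time.

    Peak memory for intermediate features then scales with `chunk_size`
    rather than with the total number of output pixels.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    outputs = []
    for start in range(0, len(coordinates), chunk_size):
        chunk = coordinates[start:start + chunk_size]
        outputs.extend(predict_fn(chunk))
    return outputs
```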

In addition, FIG. 11 visualizes the discrete Fourier transform (DFT) feature space in order to verify the images warped by the image warping network system using local texture estimation according to an embodiment of the present invention.

Specifically, the estimated frequencies are scattered in 2D space and colored according to their magnitude. Before the resampled image is generated from the Fourier information (Fj) estimated from the input image, the Fourier space (Fj) is transformed by the Jacobian matrix so that it matches the frequency response of the output image.
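
The role of the Jacobian can be illustrated with a local linearization. If input coordinates are x = f(y) for output coordinates y, then near a point y0 a plane wave cos(ω·x) behaves like a plane wave with frequency Jᵀω in y, where J is the Jacobian of f at y0. The sketch below approximates J by central differences for a homography; the function names, the direction of the mapping, and the use of Jᵀ are assumptions for illustration, and the patent's own equations define the exact convention.

```python
def homography_apply(h, x, y):
    """Apply a 3x3 homography (row-major nested lists) to the point (x, y)."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def numerical_jacobian(transform, x, y, eps=1e-5):
    """2x2 Jacobian of `transform` at (x, y) by central differences."""
    xp, xm = transform(x + eps, y), transform(x - eps, y)
    yp, ym = transform(x, y + eps), transform(x, y - eps)
    return [
        [(xp[0] - xm[0]) / (2 * eps), (yp[0] - ym[0]) / (2 * eps)],
        [(xp[1] - xm[1]) / (2 * eps), (yp[1] - ym[1]) / (2 * eps)],
    ]


def transform_frequency(jacobian, freq):
    """Local frequency after the change of coordinates: J^T applied to freq."""
    return (jacobian[0][0] * freq[0] + jacobian[1][0] * freq[1],
            jacobian[0][1] * freq[0] + jacobian[1][1] * freq[1])
```

For a purely affine map the Jacobian is constant, whereas for a general homography it varies from point to point, which is why the frequencies have to be transformed per query point.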

As described above, the present invention has been explained with reference to specific details, such as particular components, and to a limited set of embodiments and drawings. These are provided only to aid a general understanding of the present invention. The present invention is not limited to the embodiment described above, and a person of ordinary skill in the art to which the invention pertains can make various modifications and variations based on this description.

Accordingly, the spirit of the present invention should not be limited to the described embodiments. The claims set out below, together with everything that is equal or equivalent to those claims, fall within the scope of the spirit of the present invention.

Claims

An image warping network system using local texture estimation, comprising:
an encoder module that receives basic image data and calculates a feature map contained in the basic image data;
an image processing module that receives pre-entered local grid information, shape information, and the feature map, estimates Fourier information considering coordinate transformation, and outputs a resampled image using the estimated Fourier information and the Jacobian matrices resulting from the coordinate transformation;
a decoder module that receives the resampled image and outputs estimated RGB information of the resampled image;
a bilinear interpolation module that calculates bilinearly interpolated image data of the basic image data; and
a summation module that adds the bilinearly interpolated image data and the estimated RGB information of the resampled image to provide a warped, up-scaled image.

The image warping network system using local texture estimation according to claim 1,
wherein the encoder module consists of a deep SR (Super-Resolution) network that does not include an upscaling module, and calculates the feature map through the deep SR network.

The image warping network system using local texture estimation according to claim 2,
wherein the deep SR network is one of EDSR (Enhanced Deep Super-Resolution Network), RCAN (Residual Channel Attention Network), and RRDB (Residual-in-Residual Dense Block network).

The image warping network system using local texture estimation according to claim 2,
wherein the image processing module further includes:
an amplitude estimator that estimates amplitude using the feature map;
a frequency estimator that estimates frequency using the feature map; and
a phase estimator that estimates the phase of the shape information,
and performs a vector inner product using the local grid information and the estimated frequency.

The image warping network system using local texture estimation according to claim 4,
wherein the amplitude estimator and the frequency estimator are each formed of a 3×3 convolutional layer with 256 channels, and
the phase estimator is formed of a single linear layer with 128 channels.