KR102136468B1

KR102136468B1 - Deep Learning based Video Frame Rate Conversion Method and Apparatus

Info

Publication number: KR102136468B1
Application number: KR1020180144595A
Authority: KR
Inventors: 정진우; 안하은; 김제우
Original assignee: 전자부품연구원
Priority date: 2018-11-21
Filing date: 2018-11-21
Publication date: 2020-07-21
Also published as: KR20200059627A; WO2020105753A1

Abstract

Provided are a video frame rate conversion method and apparatus that, based on deep learning, support various color image formats while interpolating frames at high quality and high speed. A video frame rate conversion apparatus according to an embodiment of the present invention includes: an adjustment unit that adjusts the resolution of the image of each channel; a generation unit that generates an intermediate image from two consecutive images, at t seconds and t+1 seconds, whose resolution has been adjusted by the adjustment unit; and an enhancement unit that enhances the resolution of the intermediate image generated by the generation unit.
As a result, in deep learning-based video frame rate conversion, the deep learning input can be in various color formats such as YCbCr as well as the RGB color format, and high-quality, high-speed frame rate conversion becomes possible by using a frame interpolation method and a resolution enhancement method together.

Description

Deep Learning based Video Frame Rate Conversion Method and Apparatus

The present invention relates to a method for converting the frame rate of a video, and more particularly, to a method and apparatus for increasing the frame rate at high speed by applying a deep learning-based frame interpolation technique and a resolution increase technique.

1) Overview of video frame rate conversion

A video consists of a sequence of still images. In a video, each still image is called a frame, and the number of frames per unit time is called the frame rate of the video. For example, a video consisting of 24 frames per second has a frame rate of 24 fps (frames per second).

The frame rate is determined by the intention of the person shooting the video, the image format, the limitations of the camera, and so on. For a viewer to perceive the images as continuous motion, a frame rate above a certain level is required; below that level, motion does not look smooth. This effect may vary with display size, lighting, viewing distance, and the like.

Increasing the frame rate of a video by post-processing to improve this is called video frame rate conversion.

2) Prior art

The simplest way to increase the video frame rate is to repeat frames. For example, to convert a 30 fps video to 60 fps, each frame is output twice. With this method, however, the amount of information in the video is unchanged and the continuity of motion is not improved, so the discomfort the viewer feels remains the same.
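As a minimal illustration of frame repetition, the following Python sketch doubles the frame count of a short clip. The function name `repeat_frames` and the use of string labels in place of image data are assumptions made for illustration only.

```python
def repeat_frames(frames, factor=2):
    """Raise the frame rate by an integer factor by emitting each frame `factor` times."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out = []
    for frame in frames:
        out.extend([frame] * factor)
    return out


# Three frames of a 30 fps clip, represented here by labels only.
clip_30fps = ["f0", "f1", "f2"]
clip_60fps = repeat_frames(clip_30fps, factor=2)
print(clip_60fps)  # ['f0', 'f0', 'f1', 'f1', 'f2', 'f2']
```

The output has twice as many frames, but every added frame is a copy, which is why the motion does not become any smoother.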

To solve this, techniques have been developed that generate virtual frames from consecutive frames. That is, an intermediate image at t+0.5 seconds is newly generated using the images at t seconds and t+1 seconds; this is called frame interpolation.

Various frame interpolation methods have been developed, and they generally involve two steps. The first step obtains the motion, or optical flow, and the second step synthesizes the intermediate frame based on the motion information.

For object motion in a video to look smooth, the position of each object in the intermediate image must lie midway between its positions in the two images. It is therefore very important to accurately find the optical flow, which carries the object motion information. Various techniques based on this have been proposed.

Recently, deep learning algorithms have emerged and are widely used in fields such as computer vision and speech recognition, showing far better performance than conventional methods. In step with this, various frame interpolation techniques using deep learning have appeared. Because they produce better interpolation results than conventional methods based on optical flow and synthesis, they continue to be actively studied.

3) Problems in the prior art

Conventional deep learning-based techniques assume that the input image is in the RGB color format. However, most videos are compressed and stored in the YCbCr color format. In addition, the color difference signals Cb and Cr are sub-sampled, so their resolution differs from that of the brightness signal Y.

Therefore, to apply an ordinary video to an existing method, the color difference signals must first be up-sampled to match the image size of the brightness signal, and the YCbCr color format must then be converted to RGB. These color conversion operations are inefficient and, especially as the image size grows, become an obstacle to real-time operation.
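The extra work that the conventional approach requires can be sketched as follows for a YCbCr 4:2:0 frame. This is only an illustration under stated assumptions: the text does not name a conversion matrix or an up-sampling filter, so the sketch assumes 8-bit full-range BT.601 coefficients (as used in JFIF) and nearest-neighbor up-sampling, and the function names are hypothetical.

```python
def upsample_nearest(plane, fy, fx):
    """Repeat each sample fy times vertically and fx times horizontally."""
    out = []
    for row in plane:
        wide = [v for v in row for _ in range(fx)]
        out.extend([list(wide) for _ in range(fy)])
    return out


def clip8(x):
    """Round and clamp to the 8-bit range."""
    return max(0, min(255, int(round(x))))


def ycbcr420_to_rgb(y, cb, cr):
    """Up-sample Cb/Cr to the size of Y, then convert every pixel to RGB.

    Assumes 8-bit full-range BT.601 coefficients; other standards use other values.
    """
    cb_up = upsample_nearest(cb, 2, 2)
    cr_up = upsample_nearest(cr, 2, 2)
    rgb = []
    for i, row in enumerate(y):
        out_row = []
        for j, yv in enumerate(row):
            d_cb = cb_up[i][j] - 128
            d_cr = cr_up[i][j] - 128
            r = yv + 1.402 * d_cr
            g = yv - 0.344136 * d_cb - 0.714136 * d_cr
            b = yv + 1.772 * d_cb
            out_row.append((clip8(r), clip8(g), clip8(b)))
        rgb.append(out_row)
    return rgb
```

Both steps touch every pixel of the full-resolution frame, which is the per-frame overhead the passage above describes.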

The second problem of existing methods stems from the network structure: at large resolutions such as 4K (3840x2160), computation is either impossible due to insufficient GPU memory or very slow. This makes it difficult to apply deep learning-based frame interpolation to commercial applications that require real-time processing.
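To give a sense of scale, the short Python calculation below compares the number of pixels per channel in a 4K frame with the number in the same frame halved in each direction. It is a rough illustration only; the helper name `pixels` is hypothetical, and actual memory use and speed depend on the network, which is not modeled here.

```python
def pixels(height, width):
    """Number of samples in one image plane."""
    return height * width


full = pixels(2160, 3840)            # 4K plane
half = pixels(2160 // 2, 3840 // 2)  # same plane halved horizontally and vertically
print(full, half, full / half)  # 8294400 2073600 4.0
```

Halving both dimensions leaves one quarter of the samples for the network to process.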

The present invention has been devised to solve the above problems, and an object of the present invention is to provide a video frame rate conversion method and apparatus that, based on deep learning, support various color image formats while interpolating frames at high quality and high speed.

According to an embodiment of the present invention for achieving the above object, a video frame rate conversion apparatus includes: an adjustment unit that adjusts the resolution of the image of each channel; a generation unit that generates an intermediate image from two consecutive images, at t seconds and t+1 seconds, whose resolution has been adjusted by the adjustment unit; and an enhancement unit that enhances the resolution of the intermediate image generated by the generation unit.

The adjustment unit may sub-sample the image of each channel so that all channels have the same resolution.

The generation unit may include: an encoder unit that computes features of the resolution-adjusted images; a decoder unit that restores the computed features to the resolution of the images; and a synthesis unit that generates the intermediate image using the two images whose resolution has been adjusted by the adjustment unit and the output of the decoder unit.

The enhancement unit may enhance the resolution of the intermediate image generated by the generation unit to the resolution of the original image.

The enhancement unit may include: a first adjustment unit that adjusts the resolution of the intermediate image whose detail is to be improved; a detail enhancement unit that improves the detail of the resolution-adjusted intermediate image using the original images; and a second adjustment unit that adjusts the resolution of the intermediate image whose resolution was not adjusted by the first adjustment unit.

The detail enhancement unit may receive only selectively chosen channels as input.

The selected channel may be the brightness signal.

The detail enhancement unit may improve the detail of the intermediate image by referring to intermediate information generated in the generation unit.

The detail enhancement unit may use a deep learning network.

Meanwhile, according to another embodiment of the present invention, a video frame rate conversion method includes: adjusting the resolution of the image of each channel; generating an intermediate image from two consecutive images, at t seconds and t+1 seconds, whose resolution has been adjusted; and enhancing the resolution of the generated intermediate image.

Meanwhile, according to another embodiment of the present invention, a video frame rate conversion apparatus includes: an adjustment unit that adjusts the resolution of the image of each channel; and an enhancement unit that enhances the resolution of an intermediate image generated from two consecutive images, at t seconds and t+1 seconds, whose resolution has been adjusted by the adjustment unit.

Meanwhile, according to another embodiment of the present invention, a video frame rate conversion method includes: adjusting the resolution of the image of each channel; and enhancing the resolution of an intermediate image generated from two consecutive images, at t seconds and t+1 seconds, whose resolution has been adjusted.

As described above, according to embodiments of the present invention, in deep learning-based video frame rate conversion, the deep learning input can be in various color formats such as YCbCr as well as the RGB color format, and high-quality, high-speed frame rate conversion becomes possible by using a frame interpolation method and a resolution enhancement method together.

FIG. 1: Frame interpolation technique
FIG. 2: Drawing of the invention
FIG. 3: Color difference signal sub-sampling (Y: brightness signal; Cb, Cr: color difference signals)
FIG. 4: Representative drawing of the present invention
FIG. 5: Conceptual diagram of the intermediate frame generation unit and the resolution enhancement unit
FIG. 6: Relationship between the input and output resolutions of the input image adjustment unit and the detail enhancement unit
FIG. 7: Example of the present invention for the YCbCr 4:2:0 format
FIG. 8: Example 1 of the present invention for the YCbCr 4:2:2 format
FIG. 9: Example 2 of the present invention for the YCbCr 4:2:2 format
FIG. 10: Example 1 of the present invention for the YCbCr 4:4:4 or RGB format
FIG. 11: Example 2 of the present invention for the YCbCr 4:4:4 or RGB format

Hereinafter, the present invention will be described in more detail with reference to the drawings.

As shown in FIG. 2, the video frame rate conversion apparatus according to an embodiment of the present invention includes an input image adjustment unit 110, an intermediate frame generation unit 120, and a resolution enhancement unit 130.

Most videos are converted from the RGB format to the YCbCr format for use. In YCbCr, Y is the brightness signal, and Cb and Cr are the color difference signals. In most videos, the color difference signals of the YCbCr image are sub-sampled.

Representative sub-sampling schemes for YCbCr are YCbCr 4:4:4, YCbCr 4:2:2, and YCbCr 4:2:0, as shown in FIG. 3. YCbCr 4:4:4 is an image format in which the color difference signals Cb and Cr are not sub-sampled, so the resolution of Cb and Cr is the same as that of Y. In YCbCr 4:2:2, the color difference signals are sub-sampled by a factor of 2 in the horizontal direction, and in YCbCr 4:2:0 they are sub-sampled by a factor of 2 both horizontally and vertically. Thus, the resolutions of the color difference signals and the brightness signal differ depending on the color format.
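The per-channel resolutions implied by the three formats can be tabulated with a short Python sketch. The table `CHROMA_FACTORS` and the function `plane_shapes` are illustrative names, and the sketch assumes the frame height and width are even.

```python
# (horizontal factor, vertical factor) by which Cb and Cr are sub-sampled relative to Y
CHROMA_FACTORS = {
    "4:4:4": (1, 1),
    "4:2:2": (2, 1),
    "4:2:0": (2, 2),
}


def plane_shapes(height, width, fmt):
    """Return the (height, width) of each channel for a given YCbCr format."""
    fx, fy = CHROMA_FACTORS[fmt]
    chroma = (height // fy, width // fx)
    return {"Y": (height, width), "Cb": chroma, "Cr": chroma}


for fmt in ("4:4:4", "4:2:2", "4:2:0"):
    print(fmt, plane_shapes(2160, 3840, fmt))
```

For a 4K frame this gives Cb and Cr planes of 2160x3840, 2160x1920, and 1080x1920 (height x width) respectively, while Y stays at 2160x3840.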

Let each of Y, Cb, and Cr be regarded as one color channel. In general, the input to a deep learning network must have the same resolution across channels so that all color channels can be processed together.

However, as noted above, the resolution differs between color channels depending on the color format. To match the resolution across color channels, an embodiment of the present invention proposes a way of adjusting the input signal.

In FIG. 2, the input image adjustment unit 110 makes the image resolution of every channel the same before the images are fed to the deep learning network. That is, each channel is sub-sampled individually so that all channels have the same resolution.

To be specific, suppose the input signal consists of N channels and the resolutions of the channels are H_1 x W_1, ..., H_N x W_N, where H denotes the vertical resolution of the image and W the horizontal resolution.

The input image adjustment unit 110 converts the horizontal and vertical resolutions of all channels to the same resolution H_M x W_M through sub-sampling. It is recommended that H_M x W_M be no larger than the resolution of the channel with the largest resolution, because the amount of computation grows in proportion to the input resolution of the deep learning network.

By adjusting the image resolution, the input image adjustment unit 110 can reduce the amount of computation of the interpolation processor. Note that if each channel were processed by a separate network, the motion of the brightness signal and that of the color difference signals could be predicted differently, and the signal obtained by combining them could be inconsistent. In this way, the input image adjustment unit 110 adjusts the resolution of the signal input for each channel to produce the image that will be used as the network input.
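The behavior described for the input image adjustment unit 110 can be sketched in Python as follows. This is a simplified illustration, not the disclosed implementation: it assumes integer sub-sampling factors and plain decimation (keeping every k-th sample) with no anti-aliasing filter, since the text does not specify the sub-sampling filter. The names `subsample` and `adjust_input` are hypothetical.

```python
def subsample(plane, target_h, target_w):
    """Reduce a 2-D plane to target_h x target_w by keeping every k-th sample."""
    h, w = len(plane), len(plane[0])
    if h % target_h or w % target_w:
        raise ValueError("this sketch only handles integer sub-sampling factors")
    sy, sx = h // target_h, w // target_w
    return [row[::sx] for row in plane[::sy]]


def adjust_input(channels, target_h, target_w):
    """Bring every channel to the common resolution H_M x W_M."""
    return {name: subsample(p, target_h, target_w) for name, p in channels.items()}


# A tiny YCbCr 4:2:0 frame: Y is 4x4, Cb and Cr are 2x2.
frame = {
    "Y": [[r * 4 + c for c in range(4)] for r in range(4)],
    "Cb": [[128, 128], [128, 128]],
    "Cr": [[128, 128], [128, 128]],
}
adjusted = adjust_input(frame, 2, 2)
print(adjusted["Y"])  # [[0, 2], [8, 10]]
```

After adjustment all three channels are 2x2, so they can be stacked and fed to a single network.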

The network structure proposed in the embodiment of the present invention broadly consists of two stages, as shown in FIG. 5. The first is a network that generates an intermediate image and is referred to as the intermediate frame generation unit 120. An existing intermediate image generation network may be used for it. The second is a network that adjusts the generated intermediate image to match the original resolution and is referred to as the resolution enhancement unit 130.

The video frame rate conversion apparatus according to an embodiment of the present invention will now be described in more detail with reference to FIG. 4.

The intermediate frame generation unit 120 receives as input the images of two consecutive frames, at t seconds and t+1 seconds. Having passed through the input image adjustment unit 110, the two frames have matching image resolutions across all channels.

The intermediate frame generation unit 120 consists of an encoder unit 121, a decoder unit 122, and a synthesis unit 123.

The encoder unit 121 is composed of convolutional layers, pooling layers, activation layers, and the like, but is not limited thereto.

The output of the encoder unit 121 becomes the input of the decoder unit 122. The decoder unit 122 is composed of deconvolutional layers, activation layers, and the like, but is not limited thereto.

The output of the decoder unit 122 and the input of the network are fed to the synthesis unit 123, which generates the primary intermediate (interpolated) image. The resolution of the primary intermediate image is the same as that of the input to the intermediate frame generation unit 120. The decoder unit 122 can receive not only the output of the encoder unit 121 but also its intermediate results, and can use them to improve performance.

FIG. 5 illustrates the concept of the encoder unit 121, the decoder unit 122, and the synthesis unit 123.

In the encoder unit 121, the image resolution is reduced as the features of the image are computed, and the decoder unit 122 restores the features to the resolution of the image. The intermediate information of the encoder unit 121 is passed directly to the decoder unit 122, which is indicated by the arrows in FIG. 5.
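The way resolution shrinks through the encoder and is restored by the decoder can be traced with a shape-only Python sketch. It contains no learned layers; it assumes each encoder stage halves the resolution and each decoder stage doubles it, and the number of stages (`levels`) and the function name are hypothetical.

```python
def encoder_decoder_shapes(height, width, levels):
    """Trace feature-map resolutions through an encoder/decoder with skip links.

    Each encoder stage halves the resolution; each decoder stage doubles it and
    lines up with the encoder stage of the same size, which is where the
    encoder's intermediate information can be handed across.
    """
    if height % (2 ** levels) or width % (2 ** levels):
        raise ValueError("height and width must be divisible by 2**levels")
    encoder = [(height, width)]
    h, w = height, width
    for _ in range(levels):
        h, w = h // 2, w // 2
        encoder.append((h, w))
    decoder = []
    for _ in range(levels):
        h, w = h * 2, w * 2
        decoder.append((h, w))
    return encoder, decoder


enc, dec = encoder_decoder_shapes(1080, 1920, levels=3)
print(enc)  # [(1080, 1920), (540, 960), (270, 480), (135, 240)]
print(dec)  # [(270, 480), (540, 960), (1080, 1920)]
```

Each decoder resolution matches an encoder resolution, and the last decoder stage is back at the input resolution, which is consistent with the primary intermediate image having the same resolution as the input to the intermediate frame generation unit 120.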

In addition, the intermediate information of the encoder unit 121 and the decoder unit 122 is also passed to the detail enhancement unit 132 and reused there.

The resolution enhancement unit 130 increases the resolution of the channel signals whose resolution was reduced. Every channel of the primary intermediate frame generated by the intermediate frame generation unit 120 has a resolution of H_M x W_M. Since the resolutions of the original image are H_1 x W_1, ..., H_N x W_N, the resolution enhancement unit 130 resizes the image to match the resolution of the original image. Because it is recommended that the resolution of the primary intermediate frame be smaller than the original resolution, the resolution enhancement unit 130 mostly performs resolution increase.

The resolution enhancement unit 130 consists of an intermediate image adjustment unit 131, a detail enhancement unit 132, and an output image adjustment unit 133. Of these, only the detail enhancement unit 132 uses a deep learning network, and it therefore requires a large amount of computation.

Accordingly, the input to the detail enhancement unit 132 may be limited to selectively chosen channels rather than all channels. Since most of the detail is generally concentrated in the brightness signal, it is recommended to use only the brightness signal as the input to the detail enhancement unit 132.

The intermediate image adjustment unit 131 adjusts the resolution of the input to be used by the detail enhancement unit 132. FIG. 6 shows the relationship between the input and output resolutions of the input image adjustment unit 110 and the detail enhancement unit 132. Suppose the n-th channel of the original image has a resolution of H_n x W_n and the n-th channel is applied to the detail enhancement unit 132.

The input and output resolutions of the detail enhancement unit 132 must be the same as the original resolution. To this end, the intermediate image adjustment unit 131 adjusts the resolution of the primary intermediate frame so that its image size equals the original resolution. Filters such as bilinear or bicubic can be used for this adjustment, and the method is not limited to any particular one.
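As one example of the resizing filters mentioned above, a bilinear resize of a single channel can be sketched in plain Python. The text names bilinear filtering but not its sampling convention, so this sketch assumes the corners of the input and output grids are aligned; the function name `resize_bilinear` is hypothetical.

```python
def resize_bilinear(plane, out_h, out_w):
    """Resize a 2-D plane with bilinear interpolation (corner-aligned sampling)."""
    in_h, in_w = len(plane), len(plane[0])

    def src_coord(i, n_out, n_in):
        # Map an output index to a source coordinate so that the corners coincide.
        return 0.0 if n_out == 1 else i * (n_in - 1) / (n_out - 1)

    out = []
    for i in range(out_h):
        y = src_coord(i, out_h, in_h)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = src_coord(j, out_w, in_w)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Interpolate horizontally on the two neighboring rows, then vertically.
            top = plane[y0][x0] * (1 - wx) + plane[y0][x1] * wx
            bottom = plane[y1][x0] * (1 - wx) + plane[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


small = [[0, 10], [20, 30]]
print(resize_bilinear(small, 3, 3))
# [[0.0, 5.0, 10.0], [10.0, 15.0, 20.0], [20.0, 25.0, 30.0]]
```

A resize of this kind restores the image size but not the lost detail, which is why the detail enhancement unit 132 follows it.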

The inputs to the detail enhancement unit 132 are the primary intermediate image and the original input images at t seconds and t+1 seconds. Although the original input images are from different points in time, they preserve the details at the original resolution. The detail enhancement unit 132 therefore uses this information to improve the detail of the primary intermediate image.

In addition, the resolution enhancement unit 130 receives the intermediate results of the intermediate frame generation unit 120 as input; specifically, the detail enhancement unit 132 receives the intermediate results of the intermediate frame generation unit 120. These inputs contain features of the image, and using them in the detail enhancement unit 132 provides higher performance.

Finally, the output image adjustment unit 133 adjusts the resolution of each channel of the interpolated frame to equal that of the corresponding original channel. A channel whose resolution was enhanced by the detail enhancement unit 132 already matches the original resolution, but the other channels may differ from it. In other words, the output image adjustment unit 133 adjusts the resolution only of those channels of the interpolated image that differ from the original resolution.
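The selective behavior of the output image adjustment unit 133 can be sketched as follows. The sketch assumes nearest-neighbor resizing purely for brevity (the text does not fix the filter), and the names `resize_nearest`, `shape`, and `adjust_output` are hypothetical.

```python
def resize_nearest(plane, out_h, out_w):
    """Resize a 2-D plane by nearest-neighbor sampling."""
    in_h, in_w = len(plane), len(plane[0])
    return [
        [plane[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


def shape(plane):
    return (len(plane), len(plane[0]))


def adjust_output(channels, original_shapes):
    """Resize only the channels whose resolution differs from the original."""
    out = {}
    for name, plane in channels.items():
        target = original_shapes[name]
        if shape(plane) == target:
            out[name] = plane  # already at the original resolution; left untouched
        else:
            out[name] = resize_nearest(plane, *target)
    return out


# YCbCr 4:2:2 original (Y: 4x4, Cb/Cr: 4x2) processed at a 2x2 working resolution.
original_shapes = {"Y": (4, 4), "Cb": (4, 2), "Cr": (4, 2)}
interpolated = {
    "Y": [[50] * 4 for _ in range(4)],  # already restored to 4x4 upstream
    "Cb": [[1, 2], [3, 4]],             # still at the 2x2 working resolution
    "Cr": [[5, 6], [7, 8]],
}
final = adjust_output(interpolated, original_shapes)
print(final["Cb"])  # [[1, 2], [1, 2], [3, 4], [3, 4]]
```

In this example the Y channel passes through unchanged, while Cb and Cr are stretched vertically back to their original 4x2 size.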

In an embodiment of the present invention, it is recommended to train the network by first training the intermediate frame generation unit 120 and then training the detail enhancement unit 132. Alternatively, after training in this way, it is recommended to train the entire network end-to-end in a single pass. As another alternative, it is recommended to train the intermediate frame generation unit 120 first and then train the intermediate frame generation unit 120 and the detail enhancement unit 132 together.

The following shows various implementation examples of the present invention.

The input to the overall network is two consecutive frames of the video, the frame at time t and the frame at time t+1. The output of the overall network is a frame at time t+t', where t' is a value between 0 and 1; that is, the output image lies between times t and t+1. If the input images are in the YCbCr color format, each of the two input frames can be expressed by its brightness signal Y and its color difference signals Cb and Cr. The input to the input image adjustment unit 110 is these Y, Cb, and Cr channels of the two frames, and its output is the sub-sampled version of each channel. The sub-sampled channels must all have the same resolution; in other words, the resolution must be the same across channels.

Consider an embodiment of the input image adjustment unit 110 for a video in the YCbCr 4:2:0 color format, as shown in FIG. 7. If the resolution of the brightness signal Y is H x W, the color difference signals Cb and Cr have a resolution of H/2 x W/2. The output resolution of the input image adjustment unit 110 is set to H/2 x W/2. In this case, only the Y signal is sub-sampled by a factor of 2 horizontally and vertically to produce an H/2 x W/2 image, so that all sub-sampled channels of both frames have a resolution of H/2 x W/2. The output resolution must be less than or equal to that of the channel with the smallest resolution among the color channels. That is, for a YCbCr 4:2:0 image, the output resolution must be H/2 x W/2 or smaller.

As another embodiment, consider a YCbCr 4:2:2 image. If the resolution of Y is H x W, the resolution of Cb and Cr is H x W/2. The output resolution of the input image adjustment unit 110 must then be H x W/2 or smaller. Depending on the user's setting, the resolution of the sub-sampled frames may therefore be H x W/2, H/2 x W/2, and so on.

중간 프레임 생성부(120)의 입력은

과

이다. 도 7의 YCbCr 4:2:0 포맷의 경우에는

과

이 같고 t 프레임도 마찬가지다. 중간 프레임 생성부의 출력은

이 된다.

는 t와 t+1 프레임 사이에서 보간된 중간 프레임이며

와

The input of the intermediate frame generation unit 120 is […] and […]. In the case of the YCbCr 4:2:0 format of FIG. 7, […] and […] are the same, and the same holds for the t frame. The output of the intermediate frame generation unit 120 is […]. […] is the intermediate frame interpolated between the t and t+1 frames, and the resolutions of […] and […] are the same.

The intermediate frame generation unit 120 consists of an encoder unit, a decoder unit, and a synthesis unit, but is not limited thereto; an existing frame interpolation method may also be used. The encoder unit consists of convolutional layers, pooling layers, and activation layers. As shown in the figure below, it applies a convolution to the input image, then a pooling layer, then an activation layer. The pooling layer reduces the resolution of the image, typically by a factor of 2 in each of the horizontal and vertical directions. The output of the convolutional layer is denoted […], the output of the pooling layer […], and the output of the activation layer […]. L denotes the order of the layer, and the output of each of these layers becomes an input of the resolution increasing unit.

The decoder unit extracts features while increasing again the resolution that was reduced in the encoder unit. The resolution may be increased by various methods such as a bilinear filter or a deconvolution layer, and is not limited to a specific method. The output of each upsampling layer is defined as […], where L denotes the order of the layer. The output […] of each upsampling layer is used as an input of the resolution enhancement unit 130. The final output of the decoder unit […], together with […] and […], is the input of the synthesis unit. Synthesis may use various methods such as a convolution filter or warping, and the present invention is not limited to a specific method. The output of the synthesis unit is […], which is called the first intermediate frame.
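The resolution bookkeeping of the encoder and decoder described above can be sketched as follows. This is only an illustration, assuming that every pooling layer halves each dimension and every upsampling layer doubles it (the text says a factor of 2 is typical, not mandatory). The function name `encoder_decoder_shapes` and the number of levels are hypothetical and do not come from the patent.

```python
def encoder_decoder_shapes(height, width, levels):
    """Return (encoder_shapes, decoder_shapes) for a symmetric encoder/decoder.

    Assumption: each pooling layer halves both dimensions and each
    upsampling layer doubles them. Shapes are (rows, cols).
    """
    if height % (2 ** levels) or width % (2 ** levels):
        raise ValueError("height and width must be divisible by 2**levels")

    encoder_shapes = []
    h, w = height, width
    for _ in range(levels):
        h, w = h // 2, w // 2          # pooling: 2x reduction per axis
        encoder_shapes.append((h, w))

    decoder_shapes = []
    for _ in range(levels):
        h, w = h * 2, w * 2            # upsampling: 2x increase per axis
        decoder_shapes.append((h, w))

    return encoder_shapes, decoder_shapes


enc, dec = encoder_decoder_shapes(540, 960, levels=2)
print(enc)  # [(270, 480), (135, 240)]
print(dec)  # [(270, 480), (540, 960)]
```

The last decoder shape equals the input shape, which is why the decoder output can be combined with the two resolution-adjusted input frames in the synthesis unit.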

The first intermediate frame has a lower resolution than the original image. The resolution enhancement unit 130 therefore increases the resolution of the first intermediate frame so that it matches that of the original image. The inputs and outputs of the resolution enhancement unit 130 may be defined in various ways, and several embodiments are described below.

First, suppose the original input image is in the YCbCr 4:2:0 color format, where the resolution of Y is H x W and the resolution of Cb and Cr is H/2 x W/2. The input image adjustment unit 110 subsamples Y so that its resolution matches the H/2 x W/2 resolution of Cb and Cr. The intermediate frame generation unit 120 therefore operates at H/2 x W/2 for Y, Cb, and Cr. The resolution of Y is reduced relative to the original image while that of Cb and Cr is unchanged, so only the Y image is passed through the resolution enhancement unit 130 to increase its resolution. The resolution enhancement unit 130 consists of two stages: an intermediate image adjustment unit 131 and a detail enhancement unit 132. The intermediate image adjustment unit 131 upsamples the first intermediate frame to the size of the original image.
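The two resizing steps for the YCbCr 4:2:0 case can be sketched in plain Python. This is a minimal illustration under stated assumptions: the patent does not fix the subsampling or upsampling filter, so 2x2 averaging and nearest-neighbour repetition are used here purely as stand-ins, and the helper names `subsample_2x`, `upsample_2x`, and `shape` are hypothetical.

```python
def subsample_2x(plane):
    """Halve a plane in both directions by averaging each 2x2 block.

    Assumption: the averaging filter is an illustrative choice only.
    """
    rows, cols = len(plane), len(plane[0])
    return [
        [
            (plane[r][c] + plane[r][c + 1]
             + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def upsample_2x(plane):
    """Double a plane in both directions by nearest-neighbour repetition."""
    out = []
    for row in plane:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def shape(plane):
    """Return (rows, cols) of a plane stored as a list of rows."""
    return (len(plane), len(plane[0]))


# Toy 4:2:0 frame: Y is H x W = 4 x 4, Cb (and Cr) are 2 x 2.
y = [[r * 4 + c for c in range(4)] for r in range(4)]
cb = [[128, 128], [128, 128]]

y_small = subsample_2x(y)      # role of the input image adjustment unit 110
# Stand-in: a real system would upsample the generated intermediate Y here.
y_full = upsample_2x(y_small)  # role of the intermediate image adjustment unit 131

print(shape(y_small), shape(cb))  # (2, 2) (2, 2)
print(shape(y_full))              # (4, 4)
```

After subsampling, Y has the same shape as Cb and Cr, so all three planes can be fed to one network; after upsampling, Y is back at the original H x W size and ready for detail enhancement.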

In the above case, the input of the intermediate image adjustment unit 131 is the first intermediate frame […], which has a resolution of H/2 x W/2, and its output is the image […], which has a resolution of H x W. The inputs of the detail enhancement unit 132 are […] and the original images […] and […]. The details of […] can be restored more precisely with the help of […] and […], which contain high-resolution detail. Conventional multi-frame resolution enhancement methods use several low-resolution images to obtain a high-resolution image. The present invention, in contrast, improves the resolution by using low-resolution and high-resolution images together.

The detail enhancement unit 132 is a deep learning network whose output is […]. The network consists of convolutional layers, pooling layers, activation layers, and the like, and the present invention does not limit it to a specific network form. The detail enhancement unit 132 may receive intermediate results of the intermediate frame generation unit 120 as input and either add them to its own intermediate results or concatenate them, which can improve performance. The intermediate results of the intermediate frame generation unit 120 and those of the detail enhancement unit 132 must have the same resolution.

[…] and the chrominance signals […] and […] of the first intermediate frame form the final result.
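The fusion of intermediate results described above (addition or concatenation, with matching resolutions) can be sketched as follows. This is an illustrative sketch only: features are modelled as a list of 2-D channels in plain Python, and the function name `fuse_features` and its `mode` parameter are hypothetical, not part of the patent.

```python
def fuse_features(enhancer_feats, generator_feats, mode="concat"):
    """Fuse two feature stacks, each a list of 2-D channels (lists of rows).

    mode="add" sums the stacks channel by channel; mode="concat" stacks
    the channels. The resolution check uses the first channel of each stack.
    """
    def spatial_shape(feats):
        return (len(feats[0]), len(feats[0][0]))

    if spatial_shape(enhancer_feats) != spatial_shape(generator_feats):
        raise ValueError("intermediate results must have the same resolution")

    if mode == "concat":
        return enhancer_feats + generator_feats
    if mode == "add":
        if len(enhancer_feats) != len(generator_feats):
            raise ValueError("addition needs equal channel counts")
        return [
            [[a + b for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(ch_a, ch_b)]
            for ch_a, ch_b in zip(enhancer_feats, generator_feats)
        ]
    raise ValueError("mode must be 'add' or 'concat'")


a = [[[1, 2], [3, 4]]]          # one 2 x 2 channel from the enhancer
b = [[[10, 20], [30, 40]]]      # one 2 x 2 channel from the generator
print(fuse_features(a, b, "add"))          # [[[11, 22], [33, 44]]]
print(len(fuse_features(a, b, "concat")))  # 2
```

Addition keeps the channel count fixed, while concatenation grows it and leaves the mixing to the following layer; both require the spatial sizes to agree, which is the constraint stated in the text.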

Next, consider an embodiment in which the original input image is in the YCbCr 4:2:2 color format. Let the resolution of Y be H x W and the resolution of Cb and Cr be H x W/2. As shown in FIG. 8, the input image adjustment unit 110 subsamples Y so that its resolution matches the H x W/2 resolution of Cb and Cr. The intermediate frame generation unit 120 therefore operates at H x W/2 for Y, Cb, and Cr. The resolution of Y is reduced relative to the original image while that of Cb and Cr is unchanged, so only the Y image is passed through the resolution enhancement unit 130 to increase its resolution. The resolution enhancement unit 130 operates in the same way as for the YCbCr 4:2:0 format above. The final result consists of […], which has a resolution of H x W, and the chrominance signals […] and […] of the first intermediate frame, which have a resolution of H x W/2.

Another example for the YCbCr 4:2:2 color format is shown in FIG. 9. Let the resolution of Y be H x W and the resolution of Cb and Cr be H x W/2. The input image adjustment unit 110 subsamples Y, Cb, and Cr all to H/2 x W/2, and the result is then processed in the same way as for YCbCr 4:2:0 above. After this process, the Y signal at the output of the detail enhancement unit 132 has a resolution of H x W, while the Cb and Cr signals have a resolution of H/2 x W/2. Because the Cb and Cr resolution differs from that of the original signal, the output image adjustment unit 133 adjusts the resolution of Cb and Cr to match the original. A bicubic filter or the like may be used for this adjustment, which is not limited to a specific method. Compared with the method above, the input image processed by the network is smaller, which saves computation and memory. A further advantage is that the network used for YCbCr 4:2:0 can be shared.
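The two YCbCr 4:2:2 options (FIG. 8 and FIG. 9) differ only in the resolution the network sees and in which channels are resized afterwards. The sketch below tabulates this; the function name `plan_422` and its dictionary keys are hypothetical and serve only to illustrate the shapes given in the text.

```python
def plan_422(height, width, share_420_network):
    """Describe both YCbCr 4:2:2 processing options from the text.

    Returns the per-channel resolution seen by the network and the
    channels that must be resized afterwards. Shapes are (rows, cols).
    """
    chroma = (height, width // 2)            # original Cb/Cr resolution
    if share_420_network:
        net = (height // 2, width // 2)      # FIG. 9: all channels to H/2 x W/2
        post = {"Y": (height, width), "Cb": chroma, "Cr": chroma}
    else:
        net = chroma                         # FIG. 8: only Y is subsampled
        post = {"Y": (height, width)}        # chroma is already at its target size
    return {"network_input": net, "resized_after": post}


for shared in (False, True):
    plan = plan_422(1080, 1920, shared)
    rows, cols = plan["network_input"]
    print(shared, plan["network_input"], rows * cols)
# False (1080, 960) 1036800
# True (540, 960) 518400
```

For a 1080 x 1920 frame, the FIG. 9 option halves the number of samples per channel that the network processes, at the price of an extra chroma resize in the output image adjustment unit 133.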

Now consider the case where the input is a YCbCr 4:4:4 or RGB image. In YCbCr 4:4:4 or RGB video all channels have the same resolution, so an intermediate frame can be generated without using the input image adjustment unit 110 and the resolution enhancement unit 130. To reduce the amount of computation, however, the present invention may still use both units. For example, suppose the input image is in RGB format and each channel has a resolution of H x W. The input image adjustment unit 110 subsamples each channel to H/2 x W/2 and feeds it to the intermediate frame generation unit 120. The first intermediate frame, with a resolution of H/2 x W/2, is then restored to a resolution of H x W by the resolution enhancement unit 130. The speed advantage of the overall process comes from the fact that the intermediate frame generation unit 120 requires more computation than the resolution enhancement unit 130. The computationally expensive intermediate frame generation unit 120 therefore runs at a reduced resolution, and the resolution enhancement unit 130 restores the result, which lowers the total amount of computation.

YCbCr 4:4:4 can be handled in a similar way. If each channel has a resolution of H x W, the input image adjustment unit 110 converts it to H/2 x W/2, and the intermediate frame generation unit 120 produces Y, Cb, and Cr results at H/2 x W/2. The first intermediate frame can be passed to the resolution enhancement unit 130 in two ways. The first, as for the RGB image described above, passes all of the Y, Cb, and Cr channels through the deep learning network of the detail enhancement unit 132, as shown in FIG. 10. The second, like the YCbCr 4:2:2 scheme described above, passes only the Y channel through the deep learning network to restore a high-quality image and restores Cb and Cr by simple upscaling in the output image adjustment unit 133, as shown in FIG. 11. Cb and Cr generally contain little detail, so simple upscaling is sufficient to restore them. This method requires less computation than applying the deep learning network to all channels. It can also share the network with YCbCr 4:2:0, which makes it more general.
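The speed argument above can be made concrete with a toy cost model. This is a sketch under strong assumptions that are not in the patent: the cost of each unit is taken to be proportional to the number of pixels it processes, the enhancement unit is assumed to run at the full output resolution, and the per-pixel costs passed in are hypothetical numbers.

```python
def relative_cost(gen_cost_per_pixel, enh_cost_per_pixel, scale=2):
    """Cost of the reduced-resolution pipeline relative to full resolution.

    Full resolution:   gen * H * W
    Reduced pipeline:  gen * (H / scale) * (W / scale) + enh * H * W
    H * W cancels, so the ratio does not depend on the frame size.
    """
    reduced = gen_cost_per_pixel / (scale * scale) + enh_cost_per_pixel
    return reduced / gen_cost_per_pixel


print(relative_cost(10.0, 2.0))  # 0.45
print(relative_cost(4.0, 3.0))   # 1.0
```

Under this model, with a scale factor of 2 the reduced-resolution pipeline is cheaper only when the enhancement unit costs less than three quarters of the generation unit per pixel, which matches the text's premise that the generation unit is the more expensive of the two.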

The technical idea of the present invention can of course also be applied to a computer-readable recording medium containing a computer program that performs the functions of the apparatus and method according to the present embodiment. The technical idea according to various embodiments of the present invention may also be implemented as computer-readable code recorded on a computer-readable recording medium. The computer-readable recording medium may be any data storage device that can be read by a computer and can store data, for example a ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical disk, or hard disk drive. Computer-readable code or programs stored on a computer-readable recording medium may also be transmitted over a network connecting computers.

Although preferred embodiments of the present invention have been illustrated and described above, the present invention is not limited to the specific embodiments described. Those of ordinary skill in the art to which the invention pertains can make various modifications without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical idea or prospect of the present invention.

110: input image adjustment unit
120: intermediate frame generation unit
121: encoder unit
122: decoder unit
123: synthesis unit
130: resolution enhancement unit
131: intermediate image adjustment unit
132: detail enhancement unit
133: output image adjustment unit

Claims

A video frame rate conversion apparatus comprising:
an adjustment unit that adjusts the resolution of the image of each channel;
a generation unit that generates an intermediate image from two consecutive images, at t seconds and t+1 seconds, whose resolution has been adjusted by the adjustment unit; and
an enhancement unit that improves the resolution of the intermediate image generated by the generation unit,
wherein the intermediate image is an image at t+0.5 seconds, and
the adjustment unit sub-samples the resolution of each channel of the video.

(Deleted)

The apparatus according to claim 1,
wherein the generation unit comprises:
an encoder unit that computes features of the resolution-adjusted images;
a decoder unit that restores the computed features to the resolution size of the images; and
a synthesis unit that generates the intermediate image using the output of the decoder unit and the two images whose resolution has been adjusted by the adjustment unit.

The apparatus according to claim 1,
wherein the enhancement unit
improves the resolution of the intermediate image generated by the generation unit to the resolution of the original image.

The apparatus according to claim 4,
wherein the enhancement unit comprises:
a first adjustment unit that adjusts the resolution of the intermediate image for detail enhancement;
a detail enhancement unit that uses the original image to improve the details of the intermediate image whose resolution has been adjusted; and
a second adjustment unit that adjusts the resolution of the intermediate image whose resolution has not been adjusted by the first adjustment unit.

The apparatus according to claim 5,
wherein the detail enhancement unit
receives only a selected channel as input.

The apparatus according to claim 6,
wherein the selected channel
is the luminance signal.

The apparatus according to claim 5,
wherein the detail enhancement unit
improves the details of the intermediate image by referring to intermediate information generated by the generation unit.

The apparatus according to claim 5,
wherein the detail enhancement unit
uses a deep learning network.

A video frame rate conversion method comprising:
adjusting the resolution of the image of each channel;
generating an intermediate image from two consecutive images, at t seconds and t+1 seconds, whose resolution has been adjusted; and
improving the resolution of the generated intermediate image,
wherein the intermediate image is an image at t+0.5 seconds, and
the adjusting step sub-samples the resolution of each channel of the video.

A video frame rate conversion apparatus comprising:
an adjustment unit that adjusts the resolution of the image of each channel; and
an enhancement unit that improves the resolution of an intermediate image generated from two consecutive images, at t seconds and t+1 seconds, whose resolution has been adjusted by the adjustment unit,
wherein the intermediate image is an image at t+0.5 seconds, and
the adjustment unit sub-samples the resolution of each channel of the video.

A video frame rate conversion method comprising:
adjusting the resolution of the image of each channel; and
improving the resolution of an intermediate image generated from two consecutive images, at t seconds and t+1 seconds, whose resolution has been adjusted in the adjusting step,
wherein the intermediate image is an image at t+0.5 seconds, and
the adjusting step sub-samples the resolution of each channel of the video.