KR101791573B1

KR101791573B1 - Super resolution system and method with convolution neural network

Info

Publication number: KR101791573B1
Application number: KR1020160137913A
Authority: KR
Inventors: 낭종호; 김상철
Original assignee: 서강대학교산학협력단
Priority date: 2016-10-21
Filing date: 2016-10-21
Publication date: 2017-10-31

Abstract

The present invention relates to a super-resolution method and system for a video stream. The super-resolution method and apparatus work in four steps:
(1) They learn super resolution on pre-training images with a convolutional neural network to obtain a pre-trained filter set.
(2) They group the frames of the video stream on which super resolution is to be performed into shots and gradual shot change regions.
(3) They retrain the pre-trained filter sets for each group with the convolutional neural network to obtain a fine-tuned filter set optimized for that group.
(4) They perform super resolution on the frames of each group using that group's fine-tuned filter set.

Description

[0001] The present invention relates to a super-resolution apparatus and method for a video stream using a convolutional neural network.

The present invention relates to a super-resolution apparatus and method for a video stream. More particularly, it relates to an apparatus and method that group a video stream into shots and gradual shot change regions. They then train a convolutional neural network for a small number of iterations on each group to obtain super-resolution images of the video stream.

Ultra HD has become an international standard, yet most video content is still produced in Full HD. When such existing content is played on an Ultra HD display, its resolution is scaled up by linear interpolation. Linear interpolation produces pronounced blurring, which degrades picture quality. Super resolution has been studied to solve this problem. However, super resolution is designed to preserve quality when enlarging still images, and still images and video frames differ in their image characteristics. Applied to video, super resolution can therefore degrade quality even more than linear interpolation does.
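To make the blurring concrete, the sketch below upsamples a single row of pixels by 2x with linear interpolation. It is a minimal 1-D illustration. It assumes that 2-D (bilinear) scaling applies the same rule along each axis, and the function name `upscale_linear_2x` is hypothetical. A hard edge between 0 and 100 acquires an intermediate value of 50, so the edge is spread over more pixels instead of staying sharp.

```python
def upscale_linear_2x(samples):
    """Upsample a 1-D signal by 2x with linear interpolation (hypothetical helper).

    Original samples are kept; each new half-way sample is the average of its
    two neighbours, which is what linear interpolation gives at t = 0.5.
    """
    if not samples:
        return []
    out = []
    for left, right in zip(samples, samples[1:]):
        out.append(left)
        out.append((left + right) / 2)
    out.append(samples[-1])
    return out


# One row of pixels containing a hard edge from 0 to 100.
edge = [0.0, 0.0, 100.0, 100.0]
print(upscale_linear_2x(edge))  # [0.0, 0.0, 0.0, 50.0, 100.0, 100.0, 100.0]
```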

Super resolution (SR) refers to techniques that increase resolution while minimizing degradation of image quality. Increasing the resolution of an image increases blur and noise, so SR methods look for de-blurring and de-noising filters. The main families are:
(1) Single-image-based SR learns de-blurring and de-noising filters from a single image.
(2) Multi-frame-based SR learns common de-blurring and de-noising filters from several consecutive images.
(3) Example-based SR learns a suitable filter set in advance from a large database.
All of these methods share the goal of finding optimized de-blurring and de-noising filters that render sharp edges. However, such learning-based SR inherits the problems of machine learning. CNN-based super resolution, which learns these filter sets with deep learning, has therefore been studied recently and has shown large performance gains.
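The sketch below illustrates what "learning a de-blurring filter from examples" means. It fits a symmetric 3-tap filter `[b, a, b]` by ordinary least squares so that the filter maps a blurred signal back to its sharp original. This is a deliberately simplified linear stand-in. It is not any of the SR methods listed above and it is not a CNN. The blur kernel `[0.25, 0.5, 0.25]`, the test signal, and all function names are assumptions made for the example. In this example the fitted side taps come out negative, which is the sharpening (edge-restoring) shape the paragraph refers to.

```python
def blur(signal):
    """Blur with the kernel [0.25, 0.5, 0.25]; edge samples are replicated."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [0.25 * padded[i - 1] + 0.5 * padded[i] + 0.25 * padded[i + 1]
            for i in range(1, len(padded) - 1)]


def fit_symmetric_filter(blurred, sharp):
    """Least-squares fit of a filter [b, a, b] mapping blurred -> sharp.

    Minimises sum_i (a*x[i] + b*(x[i-1] + x[i+1]) - y[i])**2 over interior
    samples by solving the 2x2 normal equations with Cramer's rule.
    """
    saa = sab = sbb = say = sby = 0.0
    for i in range(1, len(blurred) - 1):
        c = blurred[i]                       # centre-tap input
        n = blurred[i - 1] + blurred[i + 1]  # sum of the two side-tap inputs
        y = sharp[i]
        saa += c * c
        sab += c * n
        sbb += n * n
        say += c * y
        sby += n * y
    det = saa * sbb - sab * sab
    a = (say * sbb - sab * sby) / det
    b = (saa * sby - sab * say) / det
    return a, b


def apply_filter(signal, a, b):
    """Apply [b, a, b] with replicated edges."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [a * padded[i] + b * (padded[i - 1] + padded[i + 1])
            for i in range(1, len(padded) - 1)]


def mse(u, v):
    return sum((p - q) ** 2 for p, q in zip(u, v)) / len(u)


sharp = [0.0] * 4 + [100.0] * 4 + [0.0] * 4 + [100.0] * 4
blurred = blur(sharp)
a, b = fit_symmetric_filter(blurred, sharp)
print(a, b)  # centre tap above 1, negative side taps
print(mse(blurred, sharp), mse(apply_filter(blurred, a, b), sharp))
```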

The CNN (Convolutional Neural Network) model is a neural network built from a multilayer structure of convolutional layers and subsampling layers. The region covered by each feature map expands progressively toward the upper layers. In the process, the receptive-field connection structure lets the network learn feature maps that are robust to shifts in the position of feature points. The learned feature maps are used for classification, image reconstruction, similarity discrimination, and other problems.
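The progressive widening of the region each unit "sees" can be computed with the standard receptive-field recurrence. The sketch assumes each layer is described only by its kernel size and stride; padding does not change the receptive-field size. The layer lists are illustrative and are not taken from the patent.

```python
def receptive_field(layers):
    """Receptive field, in input pixels, of one unit in the last layer.

    `layers` is a list of (kernel_size, stride) pairs, input side first.
    Recurrence: r_out = r_in + (k - 1) * j_in and j_out = j_in * s, where j is
    the spacing, in input pixels, between adjacent units of the current layer.
    """
    r, j = 1, 1
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r


# conv 3x3 -> 2x2 subsampling -> conv 3x3 -> 2x2 subsampling
print(receptive_field([(3, 1), (2, 2), (3, 1), (2, 2)]))  # 10
```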

CNN-based SR that learns filter sets with deep learning can greatly improve SR performance by using a very deep neural network. However, a large deep network needs long training time per iteration and a large amount of training data, which increases the overall processing cost.

The video in a video stream is filmed in units of shots and consists of a consecutive concatenation of those shots. A shot is the smallest unit of video that carries semantic meaning. Many shot change detection methods for separating shots have been studied. One approach finds places where consecutive frames change abruptly and judges them by changes in brightness or optical flow. However, such methods cannot locate the shot boundary in a gradual change, where one shot turns into the next slowly. Methods have therefore been studied that find shot boundaries and identify gradual changes using the cumulative rate of change of brightness and motion vectors.
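The abrupt-change approach described above can be sketched as a histogram comparison of consecutive frames. This minimal example makes several assumptions that do not come from the patent: grayscale frames given as flat lists of 8-bit luminance values, an 8-bin histogram, and an arbitrary threshold of 0.5. As the text notes, a single per-pair threshold like this can miss slow gradual transitions, because each individual frame difference stays small.

```python
def luminance_histogram(frame, bins=8, max_value=256):
    """Normalized histogram of 8-bit luminance values in a flat list of pixels."""
    hist = [0] * bins
    for pixel in frame:
        hist[pixel * bins // max_value] += 1
    return [count / len(frame) for count in hist]


def histogram_difference(frame_a, frame_b, bins=8):
    """Sum of absolute bin differences: 0 for identical, 2 for disjoint histograms."""
    ha = luminance_histogram(frame_a, bins)
    hb = luminance_histogram(frame_b, bins)
    return sum(abs(a - b) for a, b in zip(ha, hb))


def detect_cuts(frames, threshold=0.5, bins=8):
    """Indices i where frame i starts a new shot (abrupt change from frame i - 1)."""
    return [i for i in range(1, len(frames))
            if histogram_difference(frames[i - 1], frames[i], bins) > threshold]


dark, bright = [10] * 16, [200] * 16
print(detect_cuts([dark, dark, dark, bright, bright]))  # [3]
```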

Patent literature:
Korean Patent Publication No. 10-2011-0049570
Korean Patent Publication No. 10-2003-0020357
Korean Patent Publication No. 10-2013-0112500
US Patent No. 9,020,302 B2

The present invention was devised to solve the problems described above. Its object is to provide a super-resolution method and apparatus for a video stream composed of a series of frames. The method and apparatus improve resolution while minimizing image degradation, by training a CNN with only a small number of iterations.

To achieve the technical object described above, a super-resolution apparatus for a video stream according to a first aspect of the present invention comprises the following modules:
(1) A video stream input module receives a video stream composed of a series of frames.
(2) A pre-training module pre-trains a preset learning model on pre-training images to obtain filter sets for super resolution.
(3) A grouping module classifies and clusters the frames of the video stream into shots and gradual shot change regions. According to the result, it groups the frames into shots and gradual shot change regions.
(4) A super-resolution module processes each group classified and clustered by the grouping module. It retrains the pre-trained filter sets on the frames of that group using the preset learning model, which yields fine-tuned filter sets optimized for that group. It then uses the fine-tuned filter sets to obtain super-resolution images of that group's frames.
The apparatus thereby performs super resolution on the frames making up the video stream.

In the super-resolution apparatus for a video stream according to the first aspect, the filter set preferably includes at least a de-blurring filter and a de-noising filter.

In the super-resolution apparatus for a video stream according to the first aspect, the learning model used by the pre-training module and the super-resolution module is preferably a convolutional neural network (CNN).

In the super-resolution apparatus for a video stream according to the first aspect, the super-resolution module preferably extracts low-resolution images of the frames constituting each group. These low-resolution images serve as the training data set (train set), and the original images of the frames serve as the label set. The module trains the pre-trained filter set with the training data set and label set, thereby obtaining a fine-tuned filter set optimized for each group. It then uses each group's fine-tuned filter set to obtain super-resolution images of that group's frames.

In the super-resolution apparatus for a video stream according to the first aspect, the convolutional neural network preferably has a shallow three-layer structure.

A super-resolution method for a video stream according to a second aspect of the present invention comprises the following steps:
(a) receiving a video stream composed of a series of frames;
(b) pre-training a preset learning model on pre-training images to obtain filter sets for super resolution;
(c) classifying and clustering the frames constituting the video stream into shots and gradual shot change regions, and grouping the frames into shots and gradual shot change regions according to the result; and
(d) for each group classified and clustered in step (c), retraining the pre-trained filter sets on the frames of that group using the preset learning model to obtain fine-tuned filter sets optimized for that group, and obtaining super-resolution images of each group's frames using that group's fine-tuned filter sets.
The method thereby performs super resolution on the frames making up the video stream.

In the super-resolution method for a video stream according to the second aspect, the filter set preferably includes at least a de-blurring filter and a de-noising filter.

In the super-resolution method for a video stream according to the second aspect, the learning model used in steps (b) and (d) is preferably a convolutional neural network (CNN).

In the super-resolution method for a video stream according to the second aspect, step (d) preferably extracts low-resolution images of the frames constituting each group. These low-resolution images serve as the training data set (train set), and the original images of the frames serve as the label set. The step trains the pre-trained filter set with each group's training data set and label set, thereby obtaining a fine-tuned filter set optimized for each group. Each group's fine-tuned filter set is then used to obtain super-resolution images of that group's frames.

In the super-resolution method for a video stream according to the second aspect, the convolutional neural network preferably has a shallow three-layer structure.

The super-resolution method and apparatus for a video stream according to the present invention classify the frames of a video stream into shots and gradual shot change regions, which have different characteristics. They cluster these frames into groups and train filter sets for each group with a CNN. The resulting filter sets are optimized for each group, which improves resolution while minimizing the image degradation caused by super resolution.

In addition, the method and apparatus secure filter sets for super resolution in advance, by CNN training on a pre-training data set of general images. They then retrain those filter sets on each group of the video stream to obtain fine-tuned filter sets optimized for each group. This yields high-quality super-resolution images while minimizing the number of training iterations.

FIG. 1 is a block diagram of a super-resolution apparatus for a video stream according to a preferred embodiment of the present invention.
FIG. 2 shows the super-resolution CNN structure used in the super-resolution method for a video stream according to a preferred embodiment of the present invention.
FIG. 3 shows super-resolution results of the apparatus according to a preferred embodiment of the present invention, compared with those of conventional methods.
FIG. 4 is a graph showing the relationship between the number of training iterations and the rate of PSNR increase in the apparatus according to a preferred embodiment of the present invention.
FIG. 5 shows results obtained with different numbers of training iterations in the apparatus according to a preferred embodiment of the present invention.

The super-resolution method and system for a video stream according to the present invention group the frames of a video stream into shots and gradual shot change regions. For each group, they retrain the pre-trained initial filter sets with a CNN to obtain a fine-tuned filter set optimized for that group. They then perform super resolution on each group's frames using that group's fine-tuned filter set.

Hereinafter, the structure and operation of the super-resolution method and apparatus for a video stream according to a preferred embodiment of the present invention will be described in more detail with reference to the accompanying drawings.

FIG. 1 is a block diagram showing the structure of the super-resolution apparatus for a video stream according to a preferred embodiment of the present invention.

Referring to FIG. 1, the super-resolution apparatus 10 according to the present invention includes a video stream input module 100, a pre-training module 110, a grouping module 120, and a super-resolution module 130.

The video stream input module 100 receives a video stream, composed of a series of frames, on which super resolution is to be performed.

The pre-training module 110 pre-trains a preset learning model on pre-training images to obtain initial filter sets for super resolution. The learning model used in the present invention is preferably a convolutional neural network (CNN).

The grouping module 120 classifies and clusters the frames of the video stream into shot regions and gradual shot change regions. According to the result, it groups the frames and thereby sets up a plurality of groups, each of which is either a shot region or a gradual shot change region. The grouping module can classify and cluster the frames using the cumulative rate of change of the motion vectors and brightness values of the video. Many methods for classifying the frames of a video stream into shot regions and gradual shot change regions have been studied, and the grouping module can be implemented with any one of them.
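Once shot boundaries and gradual shot change regions have been detected, the grouping step reduces to splitting the frame index range into contiguous groups. The helper below is hypothetical. It assumes that a detector has already produced the cut positions and the inclusive frame ranges of gradual transitions. It only turns those into `(kind, first_frame, last_frame)` groups.

```python
def build_groups(num_frames, cuts, gradual_regions):
    """Split frames 0..num_frames-1 into contiguous shot / gradual groups.

    cuts: frame indices at which an abrupt cut starts a new shot.
    gradual_regions: (start, end) inclusive frame ranges of gradual transitions.
    Returns a list of (kind, first_frame, last_frame) tuples, bounds inclusive.
    """
    label = ["shot"] * num_frames
    for start, end in gradual_regions:
        for i in range(start, end + 1):
            label[i] = "gradual"
    cut_set = set(cuts)
    groups = []
    start = 0
    for i in range(1, num_frames + 1):
        # A group ends at the last frame, where the label changes, or before a cut.
        if i == num_frames or label[i] != label[i - 1] or i in cut_set:
            groups.append((label[start], start, i - 1))
            start = i
    return groups
```

For example, `build_groups(12, cuts=[4], gradual_regions=[(7, 9)])` returns `[("shot", 0, 3), ("shot", 4, 6), ("gradual", 7, 9), ("shot", 10, 11)]`.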

In more detail, the basic unit of a video stream is the shot. A shot is the smallest unit of video data, obtained by continuous camera movement without any editing by the producer. It is a meaningful set of consecutive frames related to the motion of objects. Shots are joined by various transitions such as cuts, dissolves, fades, and wipes. Finding the shot boundaries at these transitions is the most fundamental core technology for effectively organizing whole video data for content-based video retrieval.

Much research has been done on shot boundary detection. Many studies define separate detection models for abrupt and gradual scene changes, and they select features and set parameters for each model separately.

For example, H. J. Zhang, A. Kankanhalli, and S. W. Smoliar, "Automatic partitioning of full-motion video," ACM Multimedia Systems, 1: pp. 10-28, 1993, detected abrupt and gradual scene changes with a twin-comparison method based on histogram differences between consecutive frames.

Two other examples target dissolves, one kind of gradual scene change: A. M. Alattar, "Detecting and compressing dissolve regions in video sequences with DVI multimedia image compression algorithm," ISCAS, 13-16, 1993; and J. Meng, Y. Juan, and S. F. Chang, "Scene change detection in a MPEG compressed video sequence," IS&T/SPIE Symposium Proceedings, vol. 2419, Feb. 1995. Both detected dissolves using features of the variance curve, which is built from the variance of each frame of the video. Over a dissolve, the variance curve forms a downward-convex (U-shaped) parabola.

Alattar showed that the second derivative of the variance curve reaches negative minima at the start and end of a dissolve. He considered intervals bounded by negative minima that exceed a threshold. Among these, an interval counts as a dissolve if its mean is greater than another threshold and its length exceeds a given length. Meng located dissolves from the difference between adjacent maxima and minima of the first derivative of the variance curve.

These shot boundary detection methods are widely known in the art and remain an active area of research and development, so a more detailed description is omitted.
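A simplified version of the twin-comparison idea can be sketched over precomputed frame differences. The sketch assumes that `diffs[i]` is the histogram difference between frames i and i + 1. It sums consecutive differences inside a candidate transition, whereas Zhang et al. compare the candidate start frame with each later frame. The thresholds in the example are arbitrary.

```python
def twin_comparison(diffs, t_high, t_low):
    """Simplified twin comparison over consecutive-frame differences.

    diffs[i] is the difference between frame i and frame i + 1.
    Returns (cuts, graduals): cuts are indices i with diffs[i] > t_high;
    graduals are (first, last) index ranges of consecutive diffs in
    (t_low, t_high] whose summed difference exceeds t_high.
    """
    cuts, graduals = [], []
    i, n = 0, len(diffs)
    while i < n:
        if diffs[i] > t_high:            # abrupt change: one large difference
            cuts.append(i)
            i += 1
        elif diffs[i] > t_low:           # candidate start of a gradual transition
            start, total = i, 0.0
            while i < n and t_low < diffs[i] <= t_high:
                total += diffs[i]
                i += 1
            if total > t_high:           # small steps that add up to a big change
                graduals.append((start, i - 1))
        else:
            i += 1
    return cuts, graduals
```

With `diffs = [0.1, 0.1, 2.5, 0.1, 0.4, 0.5, 0.4, 0.5, 0.1, 0.3, 0.1]`, `t_high = 1.0` and `t_low = 0.2`, the function reports a cut at difference index 2. It also reports a gradual transition over difference indices 4 to 7, which spans frames 4 to 8. The short run at index 9 is rejected because its total stays below `t_high`.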

Accordingly, the super-resolution apparatus and method according to the present invention can implement the grouping module with any one of the shot boundary detection methods described above. The grouping module classifies and clusters the shots and gradual shot change regions of a video stream.

A shot is the smallest editing unit of video, and its frames show the same background and objects, so similar patterns repeat within it. It is therefore preferable to group the video stream by shots and to train the CNN learning model for super resolution on each shot. Some shot boundaries, however, are not cuts but gradual shot change regions. Such a region exhibits strong alpha blending and blurring. Training it with the filter set used for ordinary shots therefore intensifies image degradation.
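The alpha blending in a gradual region can be illustrated with a synthetic dissolve. The sketch assumes a linear dissolve, F_t = (1 - alpha_t) A + alpha_t B, between two tiny 4-pixel frames. The frames are chosen so that their deviations from the mean are uncorrelated. The code computes the per-frame variance curve and its second difference, the quantities used by the dissolve detectors cited above. The blended frames have lower contrast than either shot, so the variance dips in the middle. This is one way to see that frames in such a region differ statistically from ordinary shot frames.

```python
def dissolve(frame_a, frame_b, num_frames):
    """Linear dissolve: frame t is (1 - alpha) * A + alpha * B, alpha = t / (num_frames - 1)."""
    frames = []
    for t in range(num_frames):
        alpha = t / (num_frames - 1)
        frames.append([(1 - alpha) * a + alpha * b for a, b in zip(frame_a, frame_b)])
    return frames


def variance(frame):
    mean = sum(frame) / len(frame)
    return sum((p - mean) ** 2 for p in frame) / len(frame)


def second_difference(curve):
    """d[k] = curve[k] - 2 * curve[k + 1] + curve[k + 2], i.e. centred on frame k + 1."""
    return [curve[i - 1] - 2 * curve[i] + curve[i + 1] for i in range(1, len(curve) - 1)]


A, B = [0, 100, 0, 100], [0, 0, 100, 100]
sequence = [A] * 3 + dissolve(A, B, 5) + [B] * 3   # dissolve occupies frames 3..7
curve = [variance(f) for f in sequence]
print(curve)                     # dips to 1250.0 at frame 5 (mid-dissolve)
print(second_difference(curve))  # -937.5 at frames 3 and 7, +625.0 in between
```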

Accordingly, the super-resolution apparatus and method according to the present invention classify the frames of the video stream into shots and gradual shot change regions and group them. They then train a CNN on each group to obtain a filter set optimized for that group.

The super-resolution module 130 trains separately on each group classified and clustered by the grouping module. It obtains filter sets optimized for each group and produces super-resolution images. Specifically, for each group it proceeds as follows:
(1) It extracts low-resolution images of the frames constituting the group.
(2) It uses the extracted low-resolution images as the training data set (train set) and the original high-resolution frames as the label set.
(3) It retrains the pre-trained initial filter sets with the convolutional neural network, the preset learning model, which yields fine-tuned filter sets optimized for that group.
(4) It uses these fine-tuned filter sets to obtain super-resolution images of that group's frames.
Finally, the super-resolution apparatus according to the present invention combines the super-resolution images of all groups to produce a super-resolution video stream.
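The construction of the training and label sets described above can be sketched as follows. The patent does not say how the low-resolution images are produced. This sketch assumes block-average downscaling by a factor `s`, followed by nearest-neighbour upscaling back to the original size. SRCNN-style networks are commonly fed an upscaled low-resolution image, often a bicubic one. The sketch also crops each frame to a multiple of `s`. Frames are plain lists of rows, and all names are hypothetical.

```python
def downscale(frame, s):
    """Average non-overlapping s x s blocks (frame is a list of rows)."""
    h, w = len(frame) // s, len(frame[0]) // s
    return [[sum(frame[y * s + dy][x * s + dx]
                 for dy in range(s) for dx in range(s)) / (s * s)
             for x in range(w)]
            for y in range(h)]


def upscale_nearest(frame, s):
    """Repeat every pixel s times horizontally and every row s times vertically."""
    return [[v for v in row for _ in range(s)] for row in frame for _ in range(s)]


def make_training_pairs(group_frames, s=2):
    """Build (network input, label) pairs for one group of frames.

    Label: the original frame cropped to a multiple of s.
    Input: the label degraded to low resolution and brought back to label size.
    """
    pairs = []
    for frame in group_frames:
        h, w = len(frame) // s * s, len(frame[0]) // s * s
        label = [row[:w] for row in frame[:h]]
        low_res = downscale(label, s)
        pairs.append((upscale_nearest(low_res, s), label))
    return pairs


checker = [[0, 100, 0, 100], [100, 0, 100, 0], [0, 100, 0, 100], [100, 0, 100, 0]]
net_input, label = make_training_pairs([checker])[0]
print(net_input)  # fine checkerboard detail is lost: every pixel becomes 50.0
```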

The learning model used in the present invention is preferably a convolutional neural network (CNN). FIG. 2 shows the super-resolution CNN structure used in the super-resolution method for a video stream according to a preferred embodiment of the present invention. The CNN computes feature maps from training images and learns from them. This yields filter sets that remove the noise and blur arising when a low-resolution image is scaled up to a high-resolution image. These filter sets are then used to obtain a high-resolution image from a low-resolution image.
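To show how small a shallow three-layer super-resolution CNN is, the sketch below counts its weights and its receptive field. The patent does not state its layer sizes, so these numbers are illustrative only. The sketch assumes the 9-1-5 configuration with 64 and 32 filters on a single (luminance) channel, which is the default of the original SRCNN paper by Dong et al. In SRCNN the three layers act as patch extraction, non-linear mapping, and reconstruction.

```python
# Assumed SRCNN 9-1-5 configuration: (kernel_size, in_channels, out_channels).
SRCNN_915 = [(9, 1, 64), (1, 64, 32), (5, 32, 1)]


def count_parameters(layers):
    """Weights plus one bias per output channel for each convolution layer."""
    return sum(k * k * c_in * c_out + c_out for k, c_in, c_out in layers)


def receptive_field_stride1(layers):
    """Receptive field of stacked stride-1 convolutions: 1 + sum(k - 1)."""
    return 1 + sum(k - 1 for k, _, _ in layers)


print(count_parameters(SRCNN_915))         # 8129
print(receptive_field_stride1(SRCNN_915))  # 13
```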

A conventional CNN for super resolution targets the general case, using a training data set, label set, and test set made of high-quality still images. The frames of the shots in a video stream are different. They suffer heavily from data loss due to compression and from blurring due to camera work. Moreover, the frames within one shot share the same objects, the same background, the same camera work, and similar editing. For these reasons, the filters used for super resolution within one shot are all similar to one another, and they differ from filters learned on still images. The filters can therefore outperform filter sets pre-trained on general images only if they are optimized for the frames in each group. Accordingly, the super-resolution apparatus and method according to the present invention start training each group by fine-tuning the pre-trained model, and they end training early. This keeps the training time low even though an SRCNN is trained for every group.
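The cost argument, that starting from pre-trained filters lets training stop early, can be illustrated with a one-parameter toy model. The sketch is a heavily simplified stand-in for SRCNN fine-tuning, built on the following assumptions:
(1) The "model" is y = w * x.
(2) The group's data follow y = 1.2 x.
(3) The "pre-trained" weight is 1.0 and the from-scratch weight is 0.0.
(4) Gradient descent stops as soon as the mean squared error drops below a tolerance.

```python
def fine_tune(w, xs, ys, lr=0.1, tol=1e-6, max_iters=10_000):
    """Gradient descent on mean squared error for y ~ w * x.

    Returns (w, iterations) where iterations is the number of updates made
    before the loss fell below tol (early stopping) or max_iters was reached.
    """
    n = len(xs)
    for it in range(max_iters):
        loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
        if loss < tol:
            return w, it
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w, max_iters


xs = [0.5, 1.0, 1.5]
ys = [1.2 * x for x in xs]               # this group's (toy) optimum is w = 1.2
w_pre, it_pre = fine_tune(1.0, xs, ys)   # warm start from "general" filters
w_new, it_new = fine_tune(0.0, xs, ys)   # training from scratch
print(it_pre, it_new)                    # the warm start stops earlier
```

Under these assumptions the warm start reaches the tolerance in fewer iterations. With a real network, the gap depends on how close the general-purpose filters already are to each group's optimum.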

A super-resolution method for a video stream according to the present invention comprises the following steps:
(a) receiving a video stream composed of a series of frames;
(b) pre-training a preset learning model on pre-training images to obtain initial filter sets for super resolution;
(c) classifying and clustering the frames of the video stream into shot regions and gradual shot change regions, and grouping the frames accordingly to set up a plurality of groups consisting of shot regions and gradual shot change regions; and
(d) for each of the plurality of groups set up in step (c), retraining the initial filter sets on the frames constituting that group using the preset learning model to obtain fine-tuned filter sets optimized for that group, and using those fine-tuned filter sets to obtain super-resolution images of the frames constituting that group.
The method thereby performs super resolution on the frames making up the video stream.
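Steps (c) and (d) can be sketched as a loop that gives every group its own copy of the pre-trained parameters. The sketch is framework-agnostic and hypothetical. `make_pairs`, `fine_tune`, and `apply_model` are caller-supplied functions standing in for training-pair extraction, a short CNN fine-tuning run, and SR inference. The default of 2500 iterations simply mirrors the fine-tuning count reported in the experiments below.

```python
import copy


def super_resolve_stream(frames, groups, pretrained, make_pairs, fine_tune,
                         apply_model, iterations=2500):
    """Sketch of steps (c)-(d): per-group fine-tuning from shared pre-trained filters.

    groups: (kind, first_frame, last_frame) tuples with inclusive bounds.
    Returns the super-resolved frames in their original order, plus the
    fine-tuned parameters of each group keyed by (first_frame, last_frame).
    """
    output = [None] * len(frames)
    tuned = {}
    for _kind, first, last in groups:
        group_frames = frames[first:last + 1]
        # Each group starts from an independent copy of the pre-trained filters.
        params = copy.deepcopy(pretrained)
        params = fine_tune(params, make_pairs(group_frames), iterations)
        tuned[(first, last)] = params
        for offset, frame in enumerate(group_frames):
            output[first + offset] = apply_model(params, frame)
    return output, tuned
```

The deep copy matters. If all groups shared one parameter object, each group would start from the previous group's fine-tuned filters instead of from the general pre-trained ones.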

Various experiments were performed to verify the performance of the super-resolution apparatus and method for a video stream described above. The experiments used 10 groups comprising 823 frames randomly selected from Korean documentary videos. The 10 groups consist of 7 shots and 3 gradual shot change regions. To deliberately overfit within each group, the test interval was made identical to the training interval, and the loss was confirmed to decrease.

Table 1 shows the average PSNR (peak signal-to-noise ratio) of the scaled-up images in the output data set for each learning model applied to SRCNN.
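The document reports average PSNR without giving the formula. The standard definition is PSNR = 10·log10(MAX² / MSE), with MAX = 255 for 8-bit images. The sketch below assumes 8-bit grayscale (or luminance) frames and averages per-frame PSNR over a data set; whether the reported figures average per-frame PSNR or pool the MSE first is not stated, so `average_psnr` is an assumption.

```python
import math
from typing import Iterable, List, Tuple

Frame = List[List[float]]

def psnr(reference: Frame, estimate: Frame, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val**2 / MSE)."""
    sq_err, count = 0.0, 0
    for ref_row, est_row in zip(reference, estimate):
        for r, e in zip(ref_row, est_row):
            sq_err += (r - e) ** 2
            count += 1
    mse = sq_err / count
    # Identical images have zero error, so PSNR is unbounded.
    return math.inf if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)

def average_psnr(pairs: Iterable[Tuple[Frame, Frame]], max_val: float = 255.0) -> float:
    """Mean of per-frame PSNR values over (reference, upscaled) pairs."""
    values = [psnr(ref, est, max_val) for ref, est in pairs]
    return sum(values) / len(values)
```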

In Table 1, (a) is the PSNR of high resolution images obtained by linear interpolation. (b) is the case where the shot frames of the video were used as both the training data set and the test label set and trained for 2,500 iterations; because training mainly learned blur filters, most results appear blurred, and performance drops considerably compared with (a). (c) is the case of SRCNN using a filter set pre-trained on general images, trained for 3,000 iterations, and its performance drops slightly. This is because (c) learned its filters on a data set of clean still images, so no filters were learned for conditions involving blur or ghosting caused by object motion and camera movement. (d) shows that the method according to the present invention, which fine-tunes filters optimized for the grouped frames for only 2,500 iterations starting from a model in which general filter sets have already been learned, achieves higher performance than (a).
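For a 2-D frame, the linear-interpolation baseline (a) is usually bilinear interpolation. A minimal sketch follows; the sampling convention (pixel centres aligned, source coordinates clamped at the border) is an assumption, since the document does not specify it, and `bilinear_upscale` is a hypothetical name.

```python
from typing import List

Frame = List[List[float]]

def bilinear_upscale(frame: Frame, s: int) -> Frame:
    """Upscale by integer factor s with bilinear interpolation (pixel-centre convention)."""
    h, w = len(frame), len(frame[0])
    out: Frame = []
    for r in range(h * s):
        # Map the output pixel centre back into source coordinates and clamp.
        y = min(max((r + 0.5) / s - 0.5, 0.0), h - 1)
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for c in range(w * s):
            x = min(max((c + 0.5) / s - 0.5, 0.0), w - 1)
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            top = frame[y0][x0] * (1 - fx) + frame[y0][x1] * fx
            bottom = frame[y1][x0] * (1 - fx) + frame[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out
```

Because every output pixel is a weighted average of at most four neighbours, the result is smooth but cannot restore detail lost in downscaling, which is the gap the learned filters in (b) to (d) try to close.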

Next, the relationship between the number of training iterations and the super resolution result is examined. Because SRCNN is a simple network, the loss of the loss layer converges at around the 500th iteration. FIG. 3 shows images illustrating super resolution results of the super resolution apparatus for a video stream according to a preferred embodiment of the present invention, compared with conventional methods. In FIG. 3, (a) is the original image, (b) is the result after 2,500 training iterations, and (c) is the result after 300K training iterations.
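For reference, the shallow SRCNN structure (patch extraction, non-linear mapping, reconstruction) can be sketched in plain Python. The default sizes (9×9 conv to 64 maps, 1×1 conv to 32 maps, 5×5 conv to one output, ReLU after the first two layers, input already upscaled by interpolation) follow the original SRCNN paper by Dong et al., not this document. Zero "same" padding and the small-Gaussian initialisation are simplifying assumptions, and all names are hypothetical.

```python
import random
from typing import List, Tuple

Image = List[List[float]]                   # one channel, H x W
Tensor = List[Image]                        # C channels
Weight = List[List[List[List[float]]]]      # [out_ch][in_ch][k][k]
Layer = Tuple[Weight, List[float]]

def conv2d(x: Tensor, weight: Weight, bias: List[float]) -> Tensor:
    """Zero-padded 'same' 2-D cross-correlation (odd kernel size), as in CNN libraries."""
    h, w = len(x[0]), len(x[0][0])
    out: Tensor = []
    for kernels, b in zip(weight, bias):
        k = len(kernels[0])
        pad = k // 2
        plane = [[b] * w for _ in range(h)]
        for src, kern in zip(x, kernels):       # sum over input channels
            for r in range(h):
                for c in range(w):
                    acc = 0.0
                    for i in range(k):
                        rr = r + i - pad
                        if not 0 <= rr < h:
                            continue
                        for j in range(k):
                            cc = c + j - pad
                            if 0 <= cc < w:
                                acc += kern[i][j] * src[rr][cc]
                    plane[r][c] += acc
        out.append(plane)
    return out

def relu(x: Tensor) -> Tensor:
    return [[[max(0.0, v) for v in row] for row in ch] for ch in x]

def init_srcnn(n1=64, n2=32, f1=9, f2=1, f3=5, seed=0) -> List[Layer]:
    """Three conv layers: f1 x f1 -> n1 maps, f2 x f2 -> n2 maps, f3 x f3 -> 1 output."""
    rng = random.Random(seed)
    def layer(out_ch: int, in_ch: int, k: int) -> Layer:
        weight = [[[[rng.gauss(0.0, 1e-3) for _ in range(k)] for _ in range(k)]
                   for _ in range(in_ch)] for _ in range(out_ch)]
        return weight, [0.0] * out_ch
    return [layer(n1, 1, f1), layer(n2, n1, f2), layer(1, n2, f3)]

def srcnn_forward(y: Image, layers: List[Layer]) -> Image:
    """Run the 3-layer network on an interpolated (already upscaled) luminance image."""
    (w1, b1), (w2, b2), (w3, b3) = layers
    x = relu(conv2d([y], w1, b1))   # patch extraction
    x = relu(conv2d(x, w2, b2))     # non-linear mapping
    return conv2d(x, w3, b3)[0]     # reconstruction
```

With only three layers the network has few parameters, which is consistent with the observation that its loss converges early.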

Referring to FIG. 3, when the number of training iterations is low, as in (b), blur filters are learned, so the loss is low but the objective quality is poor. In contrast, as iterations accumulate, as in (c), edge filters are learned and the sharpening effect becomes pronounced. Therefore, fine-tuning a pre-trained model that has already undergone large-scale iteration with only a small number of additional iterations adds denoise filters suited to the frames of a shot and removes unnecessary edge filters.
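The idea of adapting pre-trained filters with a short run of extra iterations can be shown in miniature. This toy is not SRCNN: it fine-tunes a single 3-tap 1-D linear filter by plain gradient descent on mean squared error, starting from a copy of hypothetical pre-trained weights and training only on one group's (input, label) pairs.

```python
from typing import List, Sequence, Tuple

Signal = List[float]

def apply_filter(kernel: Sequence[float], x: Signal) -> Signal:
    """'Valid' 1-D correlation: out[i] = sum_j kernel[j] * x[i + j]."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]

def mse_loss(kernel: Sequence[float], pairs: Sequence[Tuple[Signal, Signal]]) -> float:
    errs = [(p - t) ** 2 for x, y in pairs for p, t in zip(apply_filter(kernel, x), y)]
    return sum(errs) / len(errs)

def fine_tune(pretrained: Sequence[float], pairs: Sequence[Tuple[Signal, Signal]],
              iters: int = 2500, lr: float = 0.01) -> List[float]:
    """Gradient descent on MSE, starting from (a copy of) the pre-trained filter."""
    w = list(pretrained)                     # the pre-trained weights stay untouched
    k = len(w)
    for _ in range(iters):
        grad = [0.0] * k
        n = 0
        for x, y in pairs:
            for i, (p, t) in enumerate(zip(apply_filter(w, x), y)):
                err = p - t
                for j in range(k):
                    grad[j] += 2.0 * err * x[i + j]   # d(err^2)/dw_j
                n += 1
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w
```

Starting close to a good solution is what lets a short schedule suffice; in the full method the same principle is applied to all filter sets of the CNN.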

FIG. 4 is a graph showing the correlation between the number of training iterations and the PSNR increase rate in the super resolution apparatus for a video stream according to a preferred embodiment of the present invention. Referring to FIG. 4, after only 500 training iterations the PSNR increase rate relative to the training-data model exceeds 10%; beyond that point, increasing the number of training iterations yields little further improvement. On average, people judge the quality of the result to be improved from 2,500 iterations onward.
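The document does not define the "PSNR increase rate". Assuming it is the relative gain over a baseline PSNR, (PSNR_new − PSNR_base) / PSNR_base × 100%, locating the iteration at which it first exceeds 10% in a training log could look like this. Both function names and the log format (a list of (iteration, PSNR) tuples) are hypothetical.

```python
from typing import Iterable, Optional, Tuple

def psnr_increase_rate(base_psnr: float, new_psnr: float) -> float:
    """Relative PSNR gain in percent over a baseline (assumed definition)."""
    return (new_psnr - base_psnr) / base_psnr * 100.0

def first_iteration_above(history: Iterable[Tuple[int, float]], base_psnr: float,
                          threshold_pct: float = 10.0) -> Optional[int]:
    """First logged iteration whose PSNR gain exceeds threshold_pct, else None."""
    for iteration, value in history:
        if psnr_increase_rate(base_psnr, value) > threshold_pct:
            return iteration
    return None
```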

FIG. 5 shows images illustrating results for different numbers of training iterations in the super resolution apparatus for a video stream according to a preferred embodiment of the present invention. FIG. 5 shows that image quality improves when the number of fine-tuning iterations is 1,500 or 2,500.

Table 2 shows the PSNR when super resolution is performed on only the groups classified as gradual shot change regions.

In Table 2, (a) is the case using linear interpolation, (b) is the case of training only on the classified gradual shot change regions, and (c) is the case of training on frames that include both shots and a gradual shot change region. Table 2 shows that (c), trained on two shots together with the gradual change between them, performs worse than (b), which consists only of gradual shot change regions. This is because a gradual shot change region is short (about 20 frames on average), the two shots joined by the gradual change are very dissimilar, and, although the editing in a gradual shot change region involves heavy blurring and alpha blending, the weights of the edge filters that improve performance on frames within ordinary shots remain strong. In contrast, when training only on gradually changing frames, as in (b), filters suited to blurring and alpha blending are overfitted and performance improves.
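To see why frames inside a gradual shot change behave differently from frames within a shot, a dissolve can be modelled as alpha blending between the last frame A of one shot and the first frame B of the next: F_t = (1 − α_t)·A + α_t·B. The linear ramp α_t = t / (L + 1) and the default length of 20 frames (matching the average reported above) are assumptions for illustration; `dissolve` is a hypothetical name.

```python
from typing import List

Frame = List[List[float]]

def dissolve(last_of_a: Frame, first_of_b: Frame, length: int = 20) -> List[Frame]:
    """Synthesise `length` transition frames F_t = (1 - alpha_t) * A + alpha_t * B."""
    frames = []
    for t in range(1, length + 1):
        alpha = t / (length + 1)          # strictly between 0 and 1
        frames.append([[(1.0 - alpha) * pa + alpha * pb for pa, pb in zip(row_a, row_b)]
                       for row_a, row_b in zip(last_of_a, first_of_b)])
    return frames
```

An edge that exists only in shot A keeps just (1 − α_t) of its contrast in frame t, so transition frames contain systematically weakened edges from two unrelated images, which edge filters tuned on ordinary shot frames do not match well.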

The above experiments show that grouping a video stream into shots and gradual shot change regions as in the present invention, and performing super resolution with the frames constituting each group as the data set, greatly improves performance compared with a model trained on conventional general image data. In addition, fine-tuning each group from the pre-trained filter set according to the present invention yields a subjective improvement in quality after only 2,500 training iterations.
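The document does not say how step (c) classifies frames into shots and gradual shot change regions. As an assumption, one well-known option is the twin-comparison method on histogram differences: a difference above a high threshold marks a cut, while a run of moderate differences whose sum exceeds the high threshold marks a gradual transition. The simplified sketch below (16-bin grey-level histograms, no tolerance for dips inside a run, hypothetical names and thresholds) returns groups of frame indices labelled "shot" or "gradual".

```python
from typing import List, Sequence, Tuple

Frame = List[List[float]]

def histogram(frame: Frame, bins: int = 16, max_val: float = 256.0) -> List[int]:
    """Grey-level histogram with equal-width bins over [0, max_val)."""
    hist = [0] * bins
    for row in frame:
        for p in row:
            hist[min(max(int(p * bins / max_val), 0), bins - 1)] += 1
    return hist

def frame_diff(a: Frame, b: Frame) -> float:
    """Sum of absolute bin differences between two frames' histograms."""
    return float(sum(abs(x - y) for x, y in zip(histogram(a), histogram(b))))

def group_frames(frames: Sequence[Frame], t_high: float,
                 t_low: float) -> List[Tuple[str, List[int]]]:
    """Simplified twin-comparison: split frames into 'shot' and 'gradual' groups."""
    n = len(frames)
    diffs = [frame_diff(frames[k], frames[k + 1]) for k in range(n - 1)]
    groups: List[Tuple[str, List[int]]] = []
    start = 0                                    # first frame of the current shot
    k = 0
    while k < n - 1:
        if diffs[k] >= t_high:                   # abrupt cut between k and k + 1
            groups.append(("shot", list(range(start, k + 1))))
            start = k + 1
            k += 1
        elif diffs[k] >= t_low:                  # candidate start of a gradual change
            j, acc = k, 0.0
            while j < n - 1 and t_low <= diffs[j] < t_high:
                acc += diffs[j]
                j += 1
            if acc >= t_high:                    # accumulated change is as large as a cut
                groups.append(("shot", list(range(start, k + 1))))
                groups.append(("gradual", list(range(k + 1, j))))
                start = j
            k = j
        else:
            k += 1
    if start < n:
        groups.append(("shot", list(range(start, n))))
    return groups
```

Each returned group could then be fine-tuned separately, as described for step (d).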

Although the present invention has been described above with reference to its preferred embodiments, these are merely examples and do not limit the present invention. Those of ordinary skill in the art to which the present invention pertains will appreciate that various modifications and applications not illustrated above are possible without departing from the essential characteristics of the present invention. Differences relating to such modifications and applications should be construed as falling within the scope of the present invention as defined in the appended claims.

10: super resolution apparatus
100: video stream input module
110: pre-training module
120: grouping module
130: super resolution module

Claims

1. A super resolution apparatus for a video stream, comprising:
a video stream input module for receiving a video stream composed of a series of frames;
a pre-training module for pre-training on pre-training images using a preset learning model to obtain initial filter sets for super resolution;
a grouping module for classifying and clustering the frames of the video stream into shot regions and gradual shot change regions, and grouping each frame into a shot region or a gradual shot change region according to the clustering result, thereby setting a plurality of groups; and
a super resolution module for re-learning, for each of the groups set by the grouping module as shot regions or gradual shot change regions, the initial filter sets on the frames constituting that group using the preset learning model to obtain fine-tuned filter sets optimized for that group, and for obtaining super resolution images of the frames of each group using that group's fine-tuned filter sets;
wherein super resolution is performed on the frames constituting the video stream.

2. The super resolution apparatus of claim 1, wherein the initial filter sets and the fine-tuned filter sets include at least a de-blurring filter and a de-noising filter.

3. The super resolution apparatus of claim 1, wherein the learning model used by the pre-training module and the super resolution module is a convolutional neural network (CNN).

4. The super resolution apparatus of claim 3, wherein the super resolution module extracts low resolution images of the frames constituting each group and sets them as a training data set, sets the frames constituting each group as a label set, and re-learns the initial filter sets using the training data set and the label set to obtain fine-tuned filter sets for each group,
and wherein super resolution images of the frames constituting each group are obtained using the fine-tuned filter sets for that group.

5. The super resolution apparatus of claim 3, wherein the convolutional neural network has a shallow 3-layer structure.

6. A super resolution method for a video stream, comprising:
(a) receiving a video stream composed of a series of frames;
(b) pre-training on pre-training images using a preset learning model to obtain initial filter sets for super resolution;
(c) classifying the frames constituting the video stream into shot regions and gradual shot change regions and clustering them, and grouping each frame into shot regions and gradual shot change regions according to the result, thereby setting a plurality of groups; and
(d) for each group set as a shot region or a gradual shot change region in step (c), re-learning the initial filter sets on the frames of that group using the preset learning model to obtain fine-tuned filter sets optimized for that group, and obtaining super resolution images of the frames constituting each group using that group's fine-tuned filter sets;
wherein super resolution is performed on the frames constituting the video stream.

7. The super resolution method of claim 6, wherein the initial filter sets and the fine-tuned filter sets include at least a de-blurring filter and a de-noising filter.

8. The super resolution method of claim 6, wherein the learning model used in steps (b) and (d) is a convolutional neural network (CNN).

9. The super resolution method of claim 8, wherein step (d) comprises extracting low resolution images of the frames constituting each group and setting them as a training data set, setting the frames constituting each group as a label set, and re-learning the initial filter sets using the training data set and the label set to obtain fine-tuned filter sets optimized for each group,
and wherein super resolution images of the frames constituting each group are obtained using the fine-tuned filter sets for that group.

10. The super resolution method of claim 8, wherein the convolutional neural network has a shallow 3-layer structure.