KR20220046808A

KR20220046808A - Method and Apparatus for Video Inpainting

Info

Publication number: KR20220046808A
Application number: KR1020200129912A
Authority: KR
Inventors: 윤종길; 김동원; 임정연; 허재호
Original assignee: SK Telecom Co., Ltd. (에스케이텔레콤 주식회사)
Priority date: 2020-10-08
Filing date: 2020-10-08
Publication date: 2022-04-15

Abstract

Disclosed are a method and apparatus for video inpainting that generate a high-quality reconstructed video by performing video inpainting based on inter-frame and/or intra-frame correlation. According to one aspect of the present disclosure, a method of restoring a region occluded by an object included in a video comprises: selecting, from among the frames of the video, a target frame from which the object is to be removed and a reference frame containing information about the region of the target frame occluded by the object; and inpainting the target frame by performing at least one of inter-inpainting and intra-inpainting according to the number of frames selected as reference frames.

Description

Image restoration method and apparatus {Method and Apparatus for Video Inpainting}

This embodiment relates to an image restoration method and apparatus.

The content described in this section merely provides background information on the present invention and does not constitute prior art.

Various subtitles are added to videos to aid viewer comprehension or to add entertainment. For example, when a foreign video is imported, subtitles translating the performers' speech are added. Subtitles also help convey content in noisy environments.

In some cases, however, subtitles can get in the way of viewing. For example, a viewer watching a video to learn a foreign language may be hindered by the subtitles. Excessive subtitles also cover part of the screen, making it difficult to focus on the video. Furthermore, if the added subtitles are in a foreign language the viewer does not understand, it would be better to have no subtitles at all.

When subtitles and video exist as separate channels, editing or removing the subtitles is straightforward. The problem arises when the subtitles are burned into the video and the original video without subtitles has not been retained. In this case, the subtitles are hidden by blurring the region containing them, by overlaying a translucent and/or opaque band on that region, or, in extreme cases, by cropping the entire subtitle region out of the video. All of these approaches damage the video in order to hide the subtitles.

Accordingly, there is growing demand for techniques that restore the region covered by subtitles without damaging the video.

Existing image restoration techniques include diffusion-based methods, which restore the occluded region by referring to the pixels surrounding the region to be removed, and patch-based methods, which divide the frame into multiple regions and replace the occluded region with a suitably selected one.
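For orientation, here is a minimal sketch of the diffusion-based approach mentioned above, in Python with NumPy. It is a generic textbook variant that performs harmonic filling by Jacobi iteration over the 4-neighbourhood. It is not part of the disclosed method, and the function name and iteration count are illustrative assumptions.

```python
import numpy as np


def diffusion_inpaint(image: np.ndarray, mask: np.ndarray, iterations: int = 500) -> np.ndarray:
    """Fill masked pixels by repeatedly averaging their 4-neighbours (Jacobi iteration).

    image: 2-D grayscale array; mask: boolean array, True where pixels are missing.
    Known pixels are never modified, so they act as fixed boundary values.
    """
    out = image.astype(float)  # astype returns a copy, so the input is untouched
    out[mask] = out[~mask].mean()  # crude initial guess for the hole
    for _ in range(iterations):
        p = np.pad(out, 1, mode="edge")  # replicate borders so edge pixels have 4 neighbours
        neighbour_avg = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0
        out[mask] = neighbour_avg[mask]  # update only the missing pixels
    return out


# A horizontal ramp with a 3x3 hole: harmonic filling recovers a linear ramp exactly.
ramp = np.tile(np.arange(10, dtype=float), (10, 1))
hole = np.zeros_like(ramp, dtype=bool)
hole[4:7, 4:7] = True
damaged = ramp.copy()
damaged[hole] = 0.0
restored = diffusion_inpaint(damaged, hole)
print(np.abs(restored - ramp).max())
```

Purely local averaging like this blurs any texture inside the hole. That limitation is one reason the disclosure draws on other frames and learned networks.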

The main purpose of embodiments of the present disclosure is to provide an image restoration method and apparatus that select, from an input video, a target frame from which subtitles are to be removed and a reference frame from which information about the region covered by the subtitles can be obtained, and that generate a high-quality restored video by performing inter-frame correlation-based and/or intra-frame correlation-based image restoration on that basis.

According to one aspect of this embodiment, there is provided a method of restoring a region covered by an object included in a video, the method comprising: selecting, from among the frames of the video, a target frame from which the object is to be removed and a reference frame containing information about the region of the target frame covered by the object; and restoring the target frame by performing at least one of inter inpainting and intra inpainting according to the number of frames selected as reference frames.

According to another aspect of this embodiment, there is provided an apparatus for restoring a region covered by an object included in a video, the apparatus comprising: a frame selection unit that selects a target frame from which the object is to be removed and a reference frame containing information about the region of the target frame covered by the object; and an inpainting unit that generates a restored frame by performing at least one of inter inpainting and intra inpainting according to the number of reference frames.

As described above, according to embodiments of the present disclosure, a target frame from which subtitles are to be removed and a reference frame from which information about the region covered by the subtitles can be obtained are selected from the input video. Inter-frame and/or intra-frame correlation-based image restoration is then performed on that basis, so that a high-quality restored video can be generated.

Furthermore, according to embodiments of the present disclosure, the original video before subtitle editing can be recovered from the subtitled video. This reduces the cost of purchasing the original video, the cost of storing it, and the cost of video processing.

FIG. 1 is a block diagram schematically showing an image restoration apparatus according to an embodiment of the present disclosure.
FIGS. 2 to 4C are exemplary views for explaining the frame selection unit according to a first embodiment of the present disclosure.
FIG. 5 is an exemplary view for explaining the frame selection unit according to a second embodiment of the present disclosure.
FIG. 6 is a flowchart illustrating a frame selection process according to the second embodiment of the present disclosure.
FIG. 7 is an exemplary diagram for explaining the network of an inter inpainting unit according to an embodiment of the present disclosure.
FIGS. 8A and 8B are exemplary diagrams for explaining the calculation of similarity between a target frame and a reference frame according to an embodiment of the present disclosure.
FIG. 9 is an exemplary diagram illustrating feature indexing of a reference frame for attention matching according to an embodiment of the present disclosure.
FIG. 10 is an exemplary diagram for explaining an autoencoder network having an asymmetric input/output structure according to an embodiment of the present disclosure.
FIG. 11 is an exemplary diagram for explaining an intra inpainting unit according to an embodiment of the present disclosure.
FIG. 12 is an exemplary diagram for explaining a coarse prediction unit according to an embodiment of the present disclosure.
FIG. 13 is an exemplary diagram for explaining a fine processing unit according to an embodiment of the present disclosure.
FIG. 14 is an exemplary diagram for explaining training of a merge network according to an embodiment of the present disclosure.

Hereinafter, some embodiments of the present disclosure are described in detail with reference to the exemplary drawings. Note that, in assigning reference numerals to the components in each drawing, identical components are given the same reference numerals wherever possible, even when they appear in different drawings. In describing the present disclosure, detailed descriptions of related well-known configurations or functions are omitted when they might obscure the gist of the present disclosure.

In describing the components of the present disclosure, terms such as first, second, A, B, (a), and (b) may be used. These terms serve only to distinguish one component from another and do not limit the nature, sequence, or order of the components. Throughout the specification, when a part is said to 'include' or 'comprise' a component, this means that it may further include other components rather than excluding them, unless stated otherwise. In addition, terms such as '...unit' and 'module' used in the specification refer to a unit that processes at least one function or operation. Such a unit may be implemented in hardware, software, or a combination of hardware and software.

Hereinafter, various embodiments of the present disclosure are described using subtitles as an example of the object removed by the image restoration apparatus. This choice is only for convenience of description, and the present disclosure is not limited to these embodiments. For example, the image restoration apparatus according to an embodiment of the present disclosure may remove a specific trademark and/or logo from a video.

FIG. 1 is a block diagram schematically showing an image restoration apparatus according to an embodiment of the present disclosure.

Referring to FIG. 1, the image restoration apparatus 10 according to an embodiment of the present disclosure includes all or some of the following: an input frame DB 100, a frame metadata DB 110, a mask generation unit 120, a frame selection unit 130, an inpainting unit 140, a frame merging unit 150, and an output frame DB 160. Not all blocks shown in FIG. 1 are essential, and in other embodiments some blocks of the image restoration apparatus 10 may be added, changed, or deleted. For example, when only one of inter inpainting and intra inpainting is performed on a target frame, the image restoration apparatus may omit the frame merging unit 150.

The input frame DB 100 stores the input video containing the subtitles to be removed, with each frame stored as an image.

The frame metadata DB 110 stores information extracted from the input video: scene change information, subtitle position information, the text used in the subtitles, and/or the font used in the subtitles.
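The description does not say how scene change information is extracted. One common heuristic, shown here purely as an assumption, compares the intensity histograms of consecutive frames and declares a cut when their L1 distance exceeds a threshold. The function name, bin count, and threshold are illustrative.

```python
import numpy as np


def detect_scene_changes(frames, bins: int = 32, threshold: float = 0.5):
    """Return indices i such that a scene change is declared between frame i-1 and frame i.

    frames: iterable of 2-D uint8 grayscale arrays.
    Uses the L1 distance between normalised intensity histograms (range 0..2).
    """
    cuts = []
    prev_hist = None
    for i, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
        hist = hist / hist.sum()  # normalise so frame size does not matter
        if prev_hist is not None and np.abs(hist - prev_hist).sum() > threshold:
            cuts.append(i)
        prev_hist = hist
    return cuts


dark = np.full((8, 8), 20, dtype=np.uint8)
bright = np.full((8, 8), 220, dtype=np.uint8)
print(detect_scene_changes([dark, dark, bright, bright, dark]))  # [2, 4]
```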

The mask generation unit 120 generates a mask image representing the subtitle region based on the text and font used in the subtitles.

Based on the frame metadata and/or the mask image, the frame selection unit 130 selects from among the input frames a target frame from which subtitles are to be removed and a reference frame from which information about the region covered by the subtitles can be obtained. The frame selection unit 130 is described in detail with reference to FIGS. 2 to 6B.

The inpainting unit 140 removes the subtitles from the target frame based on the target frame, the reference frame, and the mask image, and restores the region that was covered by the subtitles. In the present disclosure, 'restoration' is used synonymously with inpainting, that is, reconstructing the pixels of the region covered by the subtitle 202 by referring to pixels in other frames and/or pixels at other positions within the input frame. The inpainting unit includes an inter inpainting unit 142 and an intra inpainting unit 144. The inter inpainting unit 142 restores the target frame based on inter-frame correlation, whereas the intra inpainting unit 144 restores the target frame based on intra-frame correlation. Hereinafter, restoration based on inter-frame correlation is referred to as inter inpainting, and restoration based on intra-frame correlation is referred to as intra inpainting.

The inpainting unit 140 according to an embodiment of the present disclosure performs at least one of inter inpainting and intra inpainting, based on information about the target frame and reference frame selected by the frame selection unit 130. The inter inpainting unit 142 and the intra inpainting unit 144 are described in detail with reference to FIGS. 7 to 13.

The frame merging unit 150 either combines the output of the inter inpainting unit 142 with the output of the intra inpainting unit 144, or selects one of the two outputs, and uses the result as the restored frame. The frame merging unit 150 is described in detail with reference to FIG. 14.
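In the disclosure the frame merging unit 150 is a learned merge network (FIG. 14). As a rough, hypothetical stand-in, the sketch below combines the two outputs with a per-pixel weight inside the subtitle mask, and a weight of 1.0 or 0.0 reduces to selecting one output. The weight map, like the function name, is an assumption. In the described apparatus the corresponding decision would come from the trained network.

```python
import numpy as np


def merge_outputs(target, inter_out, intra_out, mask, weight=None):
    """Combine inter- and intra-inpainting results inside the subtitle mask.

    weight: scalar or per-pixel array in [0, 1] applied to the inter result;
            1.0 selects the inter output, 0.0 selects the intra output.
    Pixels outside the mask are copied from the target frame unchanged.
    """
    if weight is None:
        weight = np.full(target.shape, 0.5)  # equal blend by default
    blended = weight * inter_out + (1.0 - weight) * intra_out
    return np.where(mask, blended, target)
```

Usage: `merge_outputs(frame, inter_result, intra_result, subtitle_mask, weight=1.0)` behaves like "select the inter output" inside the mask.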

The output frame DB 160 stores the frames of the output video from which the subtitles have been removed.

Hereinafter, the frame selection unit according to the first embodiment of the present disclosure is described with reference to FIGS. 2 to 4C.

FIGS. 2 to 4C are exemplary views for explaining the frame selection unit according to the first embodiment of the present disclosure.

Referring to FIG. 2, the frame selection unit 130 according to an embodiment of the present disclosure selects a forward reference frame and a backward reference frame with high similarity to the target frame. These are chosen from previous frames located temporally before the target frame and/or subsequent frames located temporally after it. The frame selection unit 130 then extracts similar-region information for those reference frames. Here, the forward reference frame is a frame among the previous frames with high similarity to the target frame, and the backward reference frame is a frame among the subsequent frames with high similarity to the target frame.

The frame selection unit 130 provides the target frame, the reference frames, and the similar-region information for the reference frames as inputs to a deep neural network included in the inpainting unit 140.

Before describing how the frame selection unit 130 according to an embodiment of the present disclosure extracts similar-region information, the terms used in the present disclosure are explained with reference to FIG. 3.

Referring to (a) of FIG. 3, the detection region 302 is the region in which a subtitle is detected within a frame or within a specific region 300 of a frame. According to an embodiment of the present disclosure, the frame selection unit 130 may detect subtitles using optical character recognition (OCR), but is not limited to this example. The neighbor area 304 is an area extending a predetermined distance from the detection region 302 and/or a rectangular area enclosing that area.
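One way to realise the neighbor area is to grow the detection box by a fixed margin, clip it to the frame, and take the ring around the box. The sketch below assumes an axis-aligned, half-open box `(x0, y0, x1, y1)` and a margin in pixels. Both choices, and the function name, are assumptions for illustration.

```python
import numpy as np


def neighbor_area(box, margin, height, width):
    """Expand a detection box (x0, y0, x1, y1), half-open, by `margin` pixels, clipped to the frame.

    Returns the enclosing rectangle and a boolean mask of the ring between it and the box.
    """
    x0, y0, x1, y1 = box
    nx0, ny0 = max(x0 - margin, 0), max(y0 - margin, 0)
    nx1, ny1 = min(x1 + margin, width), min(y1 + margin, height)
    ring = np.zeros((height, width), dtype=bool)
    ring[ny0:ny1, nx0:nx1] = True
    ring[y0:y1, x0:x1] = False  # exclude the detection region itself
    return (nx0, ny0, nx1, ny1), ring
```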

Referring to (b) of FIG. 3, a frame or a specific region 300 of a frame may be divided, based on the detection region 302 and the subtitle, into an outer region 300a, an inner region 300b, and a subtitle region 300c. The outer region 300a is the region in which no subtitle is detected, that is, the region outside the detection region 302. The inner region 300b is the part of the detection region 302 that does not belong to the subtitle. The subtitle region 300c is the part of the detection region 302 that belongs to the subtitle, that is, the region in which subtitle removal and restoration are to be performed.

Using these terms, the method by which the frame selection unit 130 according to an embodiment of the present disclosure extracts similar-region information is described with reference to FIGS. 4A to 4C. For convenience, the description below extracts similar-region information from two previous frames. The same method applies equally to one or more previous frames and/or one or more subsequent frames.

FIG. 4A shows an example in which a circular object moves toward the lower right of the frame over time. (c) of FIG. 4A shows the target frame f_t (420) from which the subtitle is to be removed at time t, and (a) and (b) of FIG. 4A show the previous frames f_{t-2} (400) and f_{t-1} (410), located temporally before the target frame. In the example of (a) and (b) of FIG. 4A, subtitles have been added to the previous frames 400 and 410, which therefore contain detection regions 402 and 412. This is only for convenience of explanation. The previous frames 400 and 410 may instead have no subtitles, or may have subtitles whose content differs from that of the target frame 420.

Referring to FIG. 4B, the frame selection unit 130 according to an embodiment of the present disclosure uses the neighbor area 424 of the target frame 420 to perform template matching. The matching searches the previous frames 400 and 410 for the similar regions 430 and 440 that have the highest similarity to the target frame 420.

FIG. 4B shows an example in which template matching uses all the pixels surrounding the detection region 422, but template matching may also be performed by dividing those surrounding pixels into blocks. Furthermore, template matching encompasses any method readily adopted by those skilled in the art and is not limited to a specific method.

The frame selection unit 130 according to an embodiment of the present disclosure may calculate the similarity between regions using the sum of absolute differences (SAD) and/or the mean square difference (MSD), but is not limited to these measures; a person skilled in the art could compute the similarity between regions by other methods. The frame selection unit 130 does not calculate similarity over the inner regions of the target frame 420 and the previous frames 400 and 410.
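A minimal sketch of the matching step, assuming an exhaustive search and a masked SAD normalised by the number of valid pixels. The template is the target frame's neighbor area, and pixels of the detection region are marked invalid so they do not contribute, as the text requires. The function name and the normalisation are assumptions, and a practical implementation would use an optimised routine rather than Python loops.

```python
import numpy as np


def masked_template_match(template, valid, search):
    """Exhaustive template matching with a masked, count-normalised SAD.

    template: (h, w) patch from the target frame (e.g. its neighbor area).
    valid:    (h, w) bool mask, False for pixels to ignore (detection/subtitle region).
    search:   (H, W) previous frame, with H >= h and W >= w.
    Returns ((row, col), score) of the best position; a lower score means more similar.
    """
    h, w = template.shape
    t = template.astype(float)
    s = search.astype(float)
    n = valid.sum()
    best_pos, best_score = None, np.inf
    for r in range(s.shape[0] - h + 1):
        for c in range(s.shape[1] - w + 1):
            diff = np.abs(s[r:r + h, c:c + w] - t)
            score = diff[valid].sum() / n  # SAD over valid pixels only
            if score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


rng = np.random.default_rng(0)
prev = rng.integers(0, 256, size=(20, 20))
template = prev[5:11, 7:15].copy()
valid = np.ones_like(template, dtype=bool)
valid[2:4, 2:6] = False
template[~valid] = 255  # simulated subtitle pixels, ignored by the mask
print(masked_template_match(template, valid, prev))
```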

The frame selection unit 130 according to an embodiment of the present disclosure selects as backward reference frames the previous frames 400 and 410, in which similar regions 430 and 440 with a similarity higher than a preset threshold were detected. In other words, if a previous frame contains no region whose similarity exceeds the preset threshold, that previous frame is not selected as a backward reference frame.

For the previous frames 400 and 410 selected as backward reference frames, the frame selection unit 130 according to an embodiment of the present disclosure extracts similar-region information and the distance between the target frame and each previous frame.

The similar-region information may include the similarity, the position of the similar region, and/or information about one or more sub-regions into which the similar region is subdivided.
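The selection rule and the extracted information can be summarised as follows. This sketch assumes a similarity score where higher means more similar; for example, a SAD-based cost could be converted as 1 / (1 + cost). The record fields mirror the items listed above. The class name, function name, and input format are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SimilarRegionInfo:
    frame_index: int
    position: tuple   # top-left (row, col) of the matched similar region
    similarity: float
    distance: int     # temporal distance from the target frame


def select_backward_references(target_index, matches, threshold):
    """matches: {previous_frame_index: (position, similarity)}; higher similarity is better.

    Keeps only previous frames whose best similar region exceeds the threshold.
    """
    refs = []
    for idx, (pos, sim) in sorted(matches.items()):
        if idx < target_index and sim > threshold:
            refs.append(SimilarRegionInfo(idx, pos, sim, target_index - idx))
    return refs


refs = select_backward_references(10, {8: ((5, 7), 0.9), 9: ((4, 6), 0.4)}, threshold=0.5)
print(refs)
```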

Referring to FIG. 4C, the frame selection unit 130 according to an embodiment of the present disclosure divides the similar regions 430 and 440 into outer regions 430a and 440a, inner regions 430b and 440b, and a subtitle region 440c, and assigns a different index and/or weight to each region. For example, a value of 2 may be assigned to the outer regions 430a and 440a, a value of 1 to the inner regions 430b and 440b, and a value of 0 to the subtitle region 440c. Using the assigned indices and/or weights, the frame selection unit 130 may further classify the similar regions 430 and 440 into regions that are valid, important, or worth referencing during restoration.
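A minimal sketch of the index assignment above. It assumes the detection region is an axis-aligned, half-open box and that subtitle pixels are given as a boolean mask, for example one produced by the mask generation unit 120. The values 2, 1, and 0 follow the example in the text, and the function name is hypothetical.

```python
import numpy as np

OUTER, INNER, SUBTITLE = 2, 1, 0  # example index values from the description


def region_index_map(shape, detection_box, subtitle_mask):
    """Label each pixel of a (similar) region as outer, inner, or subtitle.

    detection_box: (x0, y0, x1, y1), half-open, in the region's coordinates.
    subtitle_mask: bool array of `shape`, True on subtitle (text) pixels.
    """
    labels = np.full(shape, OUTER, dtype=np.uint8)
    x0, y0, x1, y1 = detection_box
    labels[y0:y1, x0:x1] = INNER
    inside = np.zeros(shape, dtype=bool)
    inside[y0:y1, x0:x1] = True
    labels[inside & subtitle_mask] = SUBTITLE  # subtitle pixels count only inside the box
    return labels
```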

In another embodiment of the present disclosure, the frame selection unit 130 may use the motion vectors of a video codec instead of template matching. The motion vectors are used to select reference frames and to extract information about the similar regions and the distance between the target frame and each reference frame.

Hereinafter, the frame selection unit according to the second embodiment of the present disclosure is described with reference to FIGS. 5 and 6.

FIG. 5 is an exemplary view for explaining the frame selection unit according to the second embodiment of the present disclosure.

Referring to FIG. 5, the frame selection unit 130 according to an embodiment of the present disclosure includes all or some of the following: a flow control unit 500, a backward reference frame queue 510, a target frame list 520, and a forward reference frame queue 530. Not all blocks shown in FIG. 5 are essential, and in other embodiments some blocks of the frame selection unit 130 may be added, changed, or deleted.

The flow control unit 500 determines a target frame, a backward reference frame, and a forward reference frame from among the input frames based on the frame metadata. Here, a reference frame is a frame that is temporally adjacent to the target frame and whose pixel values are highly correlated with the target frame, so that they can replace the target frame's subtitle pixels.

Specifically, a reference frame either contains no subtitle or, if it does, has a subtitle region that does not overlap the subtitle region of the target frame. A backward reference frame is a reference frame located temporally before the target frame, and a forward reference frame is a reference frame located temporally after the target frame.
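The eligibility condition above reduces to a mask-overlap test. The sketch assumes each frame's subtitle region is available as a boolean mask of the same shape, and the function name is hypothetical.

```python
import numpy as np


def can_serve_as_reference(target_subtitle_mask, candidate_subtitle_mask):
    """True if the candidate has no subtitle pixels, or none overlapping the target's subtitle region."""
    return not np.any(target_subtitle_mask & candidate_subtitle_mask)
```

An empty candidate mask (no subtitle) trivially passes, which covers the first case in the text.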

According to an embodiment of the present disclosure, the flow control unit 500 examines the input frames one by one in chronological order to determine target frames. It then ensures that the backward and/or forward reference frames temporally closest to each determined target frame are loaded into the backward reference frame queue 510 and/or the forward reference frame queue 530.

According to an embodiment of the present disclosure, the flow control unit 500 determines target frames and reference frames scene by scene, based on the scene change information provided by the frame metadata DB 110. That is, the flow control unit 500 determines target frames and reference frames from among the input frames within the same scene. Specifically, when a scene change occurs, the flow control unit 500 initializes the backward reference frame queue 510 and the forward reference frame queue 530, so that frames with low correlation to the target frame are not used as reference frames.

According to an embodiment of the present disclosure, the flow control unit 500 treats N frames (N a natural number) as one frame processing unit. When the number of target frames (T), the number of backward reference frames (B), and the number of forward reference frames (F) sum to N, it provides the target frames and/or reference frames to at least one of the inter inpainting unit 142 and the intra inpainting unit 144.

Specifically, if there is no reference frame within the N-frame processing unit (B + F = 0), the flow control unit 500 provides M (M a natural number) of the T target frames in the target frame list 520 to the intra inpainting unit 144. It then adds the M restored frames to the backward reference frame queue 510 so that they can be used as reference frames. If, on the other hand, there are reference frames within the N-frame processing unit (B + F > 0), the flow control unit 500 provides the T target frames, the B backward reference frames, and the F forward reference frames to the inter inpainting unit 142. In this case, the flow control unit 500 may also provide the T target frames to the intra inpainting unit 144.
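The routing rule above can be sketched as a small decision function. Frames are represented by identifiers, and the helper name, the returned dictionary layout, and the `also_intra` flag (modelling the optional extra intra path) are all hypothetical. The key `restored_to_backward_queue` lists the targets whose restored versions would be pushed onto the backward queue.

```python
def dispatch_processing_unit(targets, backward_refs, forward_refs, n, m, also_intra=False):
    """Route one N-frame processing unit (hypothetical helper; frames are identified by IDs).

    Returns None until T + B + F reaches N, otherwise a dict describing the routing.
    """
    t, b, f = len(targets), len(backward_refs), len(forward_refs)
    if t + b + f < n:
        return None  # keep accumulating frames into the current processing unit
    if b + f == 0:
        # No reference frame: intra-inpaint M targets; their restored outputs are then
        # pushed onto the backward reference queue so later targets can use inter inpainting.
        chosen = list(targets[:m])
        return {"intra": chosen, "inter": None, "restored_to_backward_queue": chosen}
    # At least one reference frame: all T targets plus the B and F references go to
    # inter inpainting; the targets may optionally also go to intra inpainting.
    return {
        "intra": list(targets) if also_intra else [],
        "inter": (list(targets), list(backward_refs), list(forward_refs)),
        "restored_to_backward_queue": [],
    }
```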

The process by which the flow control unit 500 determines the target frame and the reference frames is described in detail with reference to FIGS. 6A and 6B.

The backward reference frame queue 510 stores the backward reference frames determined by the flow control unit 500, and the forward reference frame queue 530 stores the forward reference frames determined by the flow control unit 500. Each reference frame queue 510, 530 holds up to a preset maximum number of frames: it inserts new reference frames and/or removes the oldest reference frame so that the backward and/or forward reference frames closest in time to the target frame remain loaded in the queue. Under the control of the flow control unit 500, each reference frame queue 510, 530 provides at least one backward reference frame and/or at least one forward reference frame to the inter restoration unit 142.
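A bounded first-in, first-out container gives the insert-new, drop-oldest behaviour described above. The sketch below assumes frames are integer indices and uses Python's `collections.deque` with `maxlen`; the class name `ReferenceQueue` is hypothetical. The `reset` method corresponds to the queue initialization performed on a scene change.

```python
from collections import deque
from typing import Deque, List


class ReferenceQueue:
    """Bounded reference frame queue (hypothetical helper, frames as ints)."""

    def __init__(self, max_frames: int) -> None:
        self._frames: Deque[int] = deque(maxlen=max_frames)

    def push(self, frame: int) -> None:
        # A full deque with maxlen drops its oldest entry on append.
        self._frames.append(frame)

    def reset(self) -> None:
        # Called on a scene change so frames from the old scene are never referenced.
        self._frames.clear()

    def frames(self) -> List[int]:
        return list(self._frames)

    def __len__(self) -> int:
        return len(self._frames)
```

Because frames are pushed in chronological order, the entries that survive in a backward queue are always the ones nearest in time to the current target.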

The target frame list 520 stores the target frames determined by the flow control unit 500. Under the control of the flow control unit 500, the target frame list 520 provides at least one target frame to at least one of the inter restoration unit 142 and the intra restoration unit 144.

FIG. 6 is a flowchart illustrating a frame selection process according to a second embodiment of the present disclosure.

The flow control unit 500 checks whether the n-th input frame (n being a natural number) contains a subtitle to be removed (S600). If it does not, the flow control unit 500 adds the n-th frame to the backward reference frame queue 510 (S602), increments n by 1, and restarts the frame selection process from the beginning (S604). In other words, the flow control unit 500 examines the input frames one by one in chronological order to determine whether each contains a subtitle to be removed.

If the n-th frame contains a subtitle to be removed, the flow control unit 500 checks whether the backward reference frame queue 510 holds any backward reference frame (S610).

If there is no backward reference frame, the flow control unit 500 detects backward reference frames among the frames whose subtitles have already been removed, and adds them to the backward reference frame queue 510 (S612). Specifically, among the already-processed frames, the flow control unit 500 selects as backward reference frames those whose subtitle region does not overlap the subtitle region of the n-th frame, and adds them to the backward reference frame queue 510.
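The overlap test in S612 can be sketched with axis-aligned boxes. One plausible reading of why the test matters: in an already-processed frame the subtitle region holds synthesized pixels, so only frames whose subtitle region lies elsewhere still carry original pixels at the n-th frame's subtitle position. Representing subtitle regions as `(x0, y0, x1, y1)` boxes with exclusive upper bounds, and the names `overlaps` and `find_backward_refs`, are assumptions for illustration.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def overlaps(a: Box, b: Box) -> bool:
    # Two half-open rectangles intersect iff they overlap on both axes.
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def find_backward_refs(target_box: Box, processed: List[Tuple[int, Box]]) -> List[int]:
    """processed: (frame index, subtitle box) for frames already restored."""
    return [idx for idx, box in processed if not overlaps(box, target_box)]
```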

The flow control unit 500 detects forward reference frames among the frames that follow the n-th frame in time, and adds them to the forward reference frame queue 530 (S620). Specifically, the flow control unit 500 checks, in order, each frame from the (n+1)-th frame up to the last frame before the next scene change, and adds the frames that contain no subtitle to be removed to the forward reference frame queue 530. That is, the flow control unit 500 adds forward reference frames starting from the one nearest in time to the n-th frame, up to the preset maximum number of frames.
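The forward scan in S620 amounts to a bounded linear search. In this sketch, subtitle presence per frame is given as a list of booleans, `scene_end` is the index of the last frame before the next scene change, and the function name `find_forward_refs` is hypothetical.

```python
from typing import List


def find_forward_refs(n: int, has_subtitle: List[bool], scene_end: int,
                      max_frames: int) -> List[int]:
    refs: List[int] = []
    for k in range(n + 1, scene_end + 1):   # frames n+1 .. last frame of the scene
        if len(refs) == max_frames:         # stop at the preset maximum
            break
        if not has_subtitle[k]:             # subtitle-free frames only
            refs.append(k)
    return refs
```

Scanning outward from n + 1 guarantees that, when the cap is hit, the frames kept are the ones nearest in time to the target.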

The flow control unit 500 adds the n-th frame to the target frame list 520 (S630).

The flow control unit 500 checks whether the sum (T + B + F) of the number of target frames stored in the target frame list 520 and the number of reference frames stored in the backward reference frame queue 510 and the forward reference frame queue 530 is greater than the frame processing unit N (S640).

If the sum of the number of target frames and the number of reference frames does not exceed N, the flow control unit 500 checks whether the n-th frame is the last frame (S642). Here, the last frame means either the last of all input frames or the last frame before the next scene change.

If the n-th frame is not the last frame, the flow control unit 500 increments n by 1 and restarts the frame selection process from the beginning (S604). In this way, the flow control unit 500 adjusts the number of frames to be restored according to the frame processing unit and the scene change points.

If the sum of the number of target frames and the number of reference frames is greater than N, or if the n-th frame is the last frame, the flow control unit 500 checks whether the number of reference frames is 0 (S650).

If the number of reference frames is 0, the flow control unit 500 provides M of the target frames stored in the target frame list 520 (M being a natural number) to the intra restoration unit 144, and removes those M frames from the target frame list 520 (S652). The intra restoration unit 144 then performs intra restoration on the M target frames.

The flow control unit 500 adds the M frames restored by the intra restoration unit 144 to the backward reference frame queue 510 so that they can be used as reference frames (S654).

If the check in step S650 finds that the number of reference frames is not 0, or after reference frames have been added to the backward reference frame queue 510 in step S654, the flow control unit 500 checks whether the number of target frames is 0 (S660).

If the number of target frames is not 0, the flow control unit 500 provides the target frames stored in the target frame list 520, together with the reference frames stored in the backward reference frame queue 510 and the forward reference frame queue 530, to the inter restoration unit 142, and initializes the target frame list 520 (S662). The inter restoration unit 142 then performs inter restoration on all target frames that were stored in the target frame list 520. Meanwhile, according to another embodiment of the present disclosure, the flow control unit 500 may also provide the target frames stored in the target frame list 520 to the intra restoration unit 144 before initializing the target frame list 520.

The flow control unit 500 checks whether the n-th frame is the last frame (S670). Here, too, the last frame means either the last of all input frames or the last frame before the next scene change.

If the n-th frame is not the last frame, the flow control unit 500 increments n by 1 and restarts the frame selection process from the beginning (S604). That is, among the frames that follow the n-th frame in time, the flow control unit 500 selects one or more target frames from which subtitles are to be removed, and selects frames highly correlated with those target frames as reference frames.

If the n-th frame is the last frame, the flow control unit 500 ends the frame selection process. However, if the n-th frame is the last frame before the next scene change, the flow control unit 500 may instead initialize the backward reference frame queue 510 and the forward reference frame queue 530 and restart the frame selection process from the beginning.
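The loop from S600 to S670 can be simulated end to end. This is a simplified sketch, not the disclosed implementation, and it rests on several assumptions: frames are integer indices, subtitle presence and scene boundaries are given as inputs, step S612 (recovering backward references from already-processed frames) is omitted, the forward queue is rebuilt at each subtitle frame, and pending targets are always flushed at the end of a scene. The function `select_frames` and its parameters are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple

Record = Tuple[str, List[int], List[int]]  # (mode, target frames, reference frames)


def select_frames(has_sub: List[bool], scene_ends: List[int],
                  n_unit: int, m: int, max_refs: int) -> List[Record]:
    """Hypothetical simulation of the frame selection loop (S600 to S670).

    has_sub[k] is True when frame k carries a subtitle to remove.
    scene_ends lists, in ascending order, the last frame index of each scene;
    its final entry must be len(has_sub) - 1.
    """
    backward: Deque[int] = deque(maxlen=max_refs)  # backward reference frame queue 510
    forward: List[int] = []                        # forward reference frame queue 530
    targets: List[int] = []                        # target frame list 520
    log: List[Record] = []

    for n, sub in enumerate(has_sub):
        scene_end = next(e for e in scene_ends if e >= n)
        if sub:
            # S620: nearest subtitle-free frames after n, within the same scene.
            forward = [k for k in range(n + 1, scene_end + 1) if not has_sub[k]][:max_refs]
            targets.append(n)                      # S630
        else:
            backward.append(n)                     # S602
        refs = len(backward) + len(forward)
        # S640/S642 (simplified): dispatch when the unit overflows or the scene ends.
        if targets and ((sub and len(targets) + refs > n_unit) or n == scene_end):
            if refs == 0:                          # S650 -> S652, S654
                batch, targets = targets[:m], targets[m:]
                log.append(("intra", batch, []))
                backward.extend(batch)             # restored frames become references
            if targets:                            # S660 -> S662
                log.append(("inter", list(targets), list(backward) + forward))
                targets = []
        if n == scene_end:                         # scene change: reset both queues
            backward.clear()
            forward = []
    return log
```

For a scene in which every frame carries a subtitle, the sketch first intra-restores M frames and then inter-restores the rest against them, which mirrors the S652, S654, S662 path.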

Hereinafter, the inter restoration unit according to an embodiment of the present disclosure is described with reference to FIGS. 7 to 10.

The inter restoration unit 142 exploits the similarity between the target frame and the reference frames: from reference frames highly correlated with the target frame, it obtains new pixel values to replace the pixel values at the position of the subtitle to be removed from the target frame, and synthesizes them to perform local image restoration.

FIG. 7 is an exemplary diagram illustrating the network of the inter restoration unit according to an embodiment of the present disclosure.

Referring to FIG. 7, the inter restoration unit 142 uses encoder networks 700 to 706, trained as part of a neural-network-based autoencoder, to extract a value feature, a key feature, and a query feature from the target frame and the reference frames.

The inter restoration unit 142 uses an attention matching unit 710 to perform attention matching between the query features of the surrounding region adjacent to the subtitle region in the target frame and the key features of the reference frames, and calculates attention scores. Based on the attention scores, the inter restoration unit 142 finds, in the reference frames, regions highly similar to the detected region of the target frame. The attention matching unit 710 according to an embodiment of the present disclosure is described later with reference to FIGS. 8A to 9.

The inter restoration unit 142 generates a compensated value feature for filling the subtitle region of the target frame, and inputs it to a decoder network 720 to generate the final restored pixels.
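The scoring-and-weighting step can be illustrated with the common point-wise form of dot-product attention, which this disclosure later treats as the conventional approach and refines with a region-level convolution (FIG. 8A). In this minimal sketch, each feature is a plain list of floats, the query comes from one position near the subtitle region, and keys and values come from reference-frame positions; `compensated_value` is a hypothetical name and no scaling factor is applied.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    peak = max(xs)                            # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def compensated_value(query: Sequence[float], keys: List[Sequence[float]],
                      values: List[Sequence[float]]) -> List[float]:
    # Attention score: dot product between the query and each key.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    # Compensated value feature: attention-weighted sum of the values.
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
```

When one key matches the query far better than the others, the result is close to that key's value, which is how a similar reference region ends up supplying the replacement content.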

Hereinafter, the attention matching unit 710 according to an embodiment of the present disclosure is described with reference to FIGS. 8A to 9.

First, a method is described in which the attention matching unit 710 according to an embodiment of the present disclosure calculates the similarity between a target frame and a reference frame using a modified 3D convolution.

FIGS. 8A and 8B are exemplary diagrams illustrating the calculation of the similarity between a target frame and a reference frame according to an embodiment of the present disclosure.

FIG. 8A illustrates a similarity calculation method that uses feature information of the inpainting area according to an embodiment of the present disclosure.

A conventional attention matching unit uses matrix multiplication to obtain the similarity between a key feature vector and a query feature vector, that is, the similarity between vectors. However, because this approach computes the correlation only for a single point of the network's feature map, which corresponds to a small image region, it cannot use the pixels surrounding the area to be restored to compute the similarity between larger regions. In contrast, the attention matching unit 710 according to an embodiment of the present disclosure uses convolution to find the similarity between a key feature matrix and a query feature matrix, that is, the similarity between matrices.

Referring to FIG. 8A, the encoder according to an embodiment of the present disclosure receives a target frame and a reference frame of size (W, H) and generates a three-dimensional matrix of size (W', H', F), where W' and H' are the frame dimensions scaled by the encoder and F is the number of features extracted by the encoder. Similarly, if the restoration area in the target frame has size (iW, iH), the restoration area scaled by the encoder has size (iW', iH'). Accordingly, the query feature matrix, which keeps only the part of the target frame's three-dimensional matrix that covers the area to be restored, has size (iW', iH', F). The key feature matrix, on the other hand, is the three-dimensional matrix of the reference frame and has size (W', H', F).

The attention matching unit 710 according to an embodiment of the present disclosure computes a similarity matrix by applying a modified 3D convolution between the query feature matrix and the key feature matrix. The modified 3D convolution is expressed as Equation 1.

Here, S is the similarity matrix, Q is the query feature matrix, and K is the key feature matrix.
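Equation 1 itself is not reproduced in this text, so the sketch below assumes one plausible form consistent with the shapes above: the (iW', iH', F) query block slides over the (W', H', F) key map, and at each offset the element-wise products are summed over all three axes, S(x, y) = sum over i, j, f of Q(i, j, f) * K(x + i, y + j, f). Only fully overlapping ("valid") offsets are computed, with no padding or normalization; `region_similarity` is a hypothetical name.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # indexed [x][y][f]


def region_similarity(q: Tensor3, k: Tensor3) -> List[List[float]]:
    """Assumed form: S[x][y] = sum_{i,j,f} q[i][j][f] * k[x+i][y+j][f]."""
    qw, qh, nf = len(q), len(q[0]), len(q[0][0])
    kw, kh = len(k), len(k[0])
    assert len(k[0][0]) == nf, "query and key must share the feature depth F"
    out: List[List[float]] = []
    for x in range(kw - qw + 1):          # valid offsets along W'
        row: List[float] = []
        for y in range(kh - qh + 1):      # valid offsets along H'
            s = 0.0
            for i in range(qw):
                for j in range(qh):
                    for f in range(nf):   # the kernel spans the full feature depth
                        s += q[i][j][f] * k[x + i][y + j][f]
            row.append(s)
        out.append(row)
    return out
```

Because the kernel covers the whole feature depth, the output S is two-dimensional, of size (W' - iW' + 1, H' - iH' + 1): one score per candidate region of the reference frame rather than one per point.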

The attention matching unit 710 according to an embodiment of the present disclosure converts the similarity matrix into probability information by applying a SoftMax or SparseMax function to it.
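Both functions map a vector of scores to a probability distribution. SoftMax gives every entry some positive weight, whereas SparseMax is the Euclidean projection onto the probability simplex and can assign exactly zero to weak matches. The sketch assumes the similarity matrix is flattened into one list before normalization; that flattening choice is an assumption, not something the text specifies.

```python
import math
from typing import List


def softmax(z: List[float]) -> List[float]:
    peak = max(z)
    e = [math.exp(v - peak) for v in z]
    total = sum(e)
    return [v / total for v in e]


def sparsemax(z: List[float]) -> List[float]:
    zs = sorted(z, reverse=True)
    cumsum = 0.0
    k_z, cum_k = 0, 0.0
    # Support size: the largest k with 1 + k * z_(k) > sum of the top-k scores.
    for k, v in enumerate(zs, start=1):
        cumsum += v
        if 1 + k * v > cumsum:
            k_z, cum_k = k, cumsum
    tau = (cum_k - 1) / k_z              # threshold subtracted from every score
    return [max(v - tau, 0.0) for v in z]
```

For scores [2.0, 1.0, 0.1], SparseMax puts all mass on the first entry, while SoftMax still spreads some weight over the weaker matches.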

As described above, a conventional attention matching unit converts the three-dimensional matrices into two-dimensional matrices and multiplies them, so it computes only the similarity between specific points of the target frame and the reference frame. By contrast, the attention matching unit 710 according to an embodiment of the present disclosure convolves the three-dimensional matrices and can therefore compute the similarity between specific regions of the target frame and the reference frame.

FIG. 8B illustrates a similarity calculation method that uses feature information of the neighbor area according to an embodiment of the present disclosure.

As described with reference to FIG. 8A, the attention matching unit 710 according to an embodiment of the present disclosure calculates the similarity from features extracted from the restoration area itself. Referring to FIG. 8B, the attention matching unit 710 according to another embodiment of the present disclosure instead builds the query feature matrix from features extracted from the area surrounding the restoration area rather than from the restoration area.
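One simple way to define the surrounding area is a band of fixed width around the restoration box on the feature map. The sketch below assumes a rectangular restoration area `(x0, y0, x1, y1)` with exclusive upper bounds and a hypothetical `margin` parameter; the text does not specify the band's shape or width.

```python
from typing import List, Tuple


def neighbor_positions(box: Tuple[int, int, int, int], margin: int,
                       width: int, height: int) -> List[Tuple[int, int]]:
    x0, y0, x1, y1 = box
    # Grow the restoration box by `margin` cells, clipped to the feature map.
    ex0, ey0 = max(0, x0 - margin), max(0, y0 - margin)
    ex1, ey1 = min(width, x1 + margin), min(height, y1 + margin)
    # Keep only cells of the grown box that lie outside the restoration area.
    return [(x, y) for x in range(ex0, ex1) for y in range(ey0, ey1)
            if not (x0 <= x < x1 and y0 <= y < y1)]
```

Features gathered at these positions carry real image content even while the restoration area itself is still occluded by the subtitle.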

Hereinafter, a method is described in which the attention matching unit 710 according to an embodiment of the present disclosure adaptively performs feature indexing according to the amount of motion estimated between the target frame and the reference frame.

FIG. 9 is an exemplary diagram illustrating feature indexing of a reference frame for attention matching according to an embodiment of the present disclosure.

To perform attention matching with the features of the target frame's restoration area, a conventional attention matching unit indexes and uses all valid features of the reference frame. Statistically, however, the amount of motion between adjacent frames of a video sequence is small, so among all valid features of the reference frame, those extracted near the position of the target frame's restoration area are the most likely to yield high attention scores. Indexing the features of the entire reference frame therefore requires a large amount of computation and can cause unexpected quality degradation through incorrect attention matching.

The attention matching unit 710 according to an embodiment of the present disclosure estimates the amount of motion between the target frame and the reference frame and varies the feature indexing of the reference frame accordingly.

Referring to FIG. 9, the attention matching unit 710 according to an embodiment of the present disclosure expands valid areas 914, 924, and 934 outward from the areas 912, 922, and 932 of the reference frame that are co-located with the restoration area 902 of the target frame. Here, a valid area is the area subject to feature indexing; that is, the attention matching unit 710 performs feature indexing only on the features contained in the valid areas 914, 924, and 934.

The attention matching unit 710 according to an embodiment of the present disclosure estimates the amount of motion between the target frame and the reference frame and determines how far to expand the valid area based on that estimate. That is, the attention matching unit 710 adaptively adjusts the size of the valid areas 914, 924, and 934 according to the estimated motion: when the estimated motion between the target frame and the reference frame is small, it shrinks the valid area, and when the estimated motion is large, it enlarges the valid area.

Referring to FIG. 9, the first reference frame feature map 910 is extracted from a reference frame with a small amount of motion, the second reference frame feature map 920 from a reference frame with a medium amount of motion, and the third reference frame feature map 930 from a reference frame with a large amount of motion. Accordingly, as shown in FIG. 9, the valid area 914 of the first reference frame feature map 910 is the smallest, and the valid area 934 of the third reference frame feature map 930 is the largest.
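The adaptive valid area can be sketched as the co-located restoration box grown by a margin that scales with the estimated motion and is clipped to the feature map. The linear mapping from motion to margin, the `gain` parameter, and the name `valid_area` are assumptions; the text only requires that larger motion yields a larger valid area.

```python
import math
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) on the feature map, x1/y1 exclusive


def valid_area(box: Box, motion: float, width: int, height: int,
               gain: float = 1.0) -> Box:
    # Assumed rule: margin grows linearly with the estimated motion.
    margin = math.ceil(gain * motion)
    x0, y0, x1, y1 = box
    # Expand around the co-located box and clip to the feature map bounds.
    return (max(0, x0 - margin), max(0, y0 - margin),
            min(width, x1 + margin), min(height, y1 + margin))
```

Small, medium, and large motion estimates then produce nested valid areas, matching the ordering of areas 914, 924, and 934 in FIG. 9.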

The attention matching unit 710 according to an embodiment of the present disclosure may estimate the amount of motion from the temporal distance between the target frame and the reference frame. For example, the attention matching unit 710 may assume that the greater the temporal distance between the target frame and the reference frame, the greater the amount of motion. The attention matching unit 710 according to another embodiment of the present disclosure may estimate the amount of motion from how far the pixels in the area adjacent to the target frame's restoration area have moved relative to the reference frame. The motion estimation method of the attention matching unit 710 is not limited to these examples; a person skilled in the art could estimate the amount of motion between frames by other methods.
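Both estimators can be written in a few lines. The first assumes motion grows linearly with frame distance, using a hypothetical `per_frame` constant. The second assumes displacement vectors `(dx, dy)` for pixels near the restoration area are already available (for example from some optical-flow or block-matching step, which is outside this sketch) and takes their mean magnitude. Both function names are hypothetical.

```python
import math
from typing import List, Tuple


def motion_from_temporal_distance(target_idx: int, ref_idx: int,
                                  per_frame: float = 2.0) -> float:
    # Assumption: expected motion grows linearly with the frame distance.
    return per_frame * abs(target_idx - ref_idx)


def motion_from_displacements(vectors: List[Tuple[float, float]]) -> float:
    # Mean magnitude of the displacements of pixels near the restoration area.
    if not vectors:
        return 0.0
    return sum(math.hypot(dx, dy) for dx, dy in vectors) / len(vectors)
```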

Hereinafter, an autoencoder network having an asymmetric input/output structure according to an embodiment of the present disclosure is described with reference to FIG. 10.

FIG. 10 is an exemplary diagram illustrating an autoencoder network having an asymmetric input/output structure according to an embodiment of the present disclosure.

A conventional autoencoder network is designed with a symmetric structure in which the input image and the output image have the same resolution. As a result, the network's computation and memory usage grow substantially with the resolution of the input image. To address this, the conventional approach divides image restoration into four stages, handled by a downsampler, an autoencoder, an upsampler, and a synthesis unit.

The downsampler downsamples an input image of size (W, H) containing the restoration area to size (W', H') and inputs it to the autoencoder network. The autoencoder network restores the restoration area contained in the input image and may correspond to the network of the inter restoration unit 142 described with reference to FIG. 7. The upsampler upsamples the (W', H') output of the autoencoder network back to the original (W, H) resolution. The synthesis unit generates the final restored image by combining the area of the input image outside the restoration area with the restoration area of the output image.
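The four conventional stages can be sketched on plain grayscale lists. For brevity this sketch assumes integer-factor subsampling, nearest-neighbour upsampling in place of bicubic, and image dimensions divisible by the factor `s`; `inpaint` is a hypothetical callable standing in for the autoencoder network.

```python
from typing import Callable, List

Image = List[List[float]]   # grayscale image indexed [row][col]
Mask = List[List[bool]]     # True where the pixel belongs to the restoration area


def downsample(img: Image, s: int) -> Image:
    # Keep every s-th pixel in both directions: (W, H) -> (W / s, H / s).
    return [row[::s] for row in img[::s]]


def upsample(img: Image, s: int) -> Image:
    # Nearest-neighbour upsampling: repeat each pixel s times in both directions.
    return [[v for v in row for _ in range(s)] for row in img for _ in range(s)]


def composite(original: Image, restored: Image, mask: Mask) -> Image:
    # Restored pixels inside the mask, original pixels everywhere else.
    return [[r if m else o for o, r, m in zip(orow, rrow, mrow)]
            for orow, rrow, mrow in zip(original, restored, mask)]


def conventional_pipeline(img: Image, mask: Mask, s: int,
                          inpaint: Callable[[Image], Image]) -> Image:
    low = downsample(img, s)           # 1. downsampler
    low_out = inpaint(low)             # 2. autoencoder at reduced resolution
    up = upsample(low_out, s)          # 3. upsampler back to (W, H)
    return composite(img, up, mask)    # 4. synthesis unit
```

The fixed interpolation in stage 3 is exactly what limits sharpness inside the restored area, since the network never learns how to upsample.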

Referring to FIG. 10, the autoencoder network 1000 according to an embodiment of the present disclosure has an asymmetric input/output structure in which the input image and the output image have different resolutions. The output stage of the decoder network 1010 according to an embodiment of the present disclosure includes an upsampling network layer 1020 so that the network outputs an image at the same resolution as the original input image. That is, according to an embodiment of the present disclosure, upsampling the output back to the original resolution is performed not by a separate upsampler working alongside the autoencoder network 1000, but by the upsampling network layer 1020 fused into the decoder network 1010 within the autoencoder network 1000. The autoencoder network can therefore learn image restoration and upsampling jointly, producing results with higher resolving power than general-purpose upsampling methods such as bicubic interpolation.
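The text does not specify the internal form of the upsampling network layer 1020. One widely used learnable option is sub-pixel convolution: a convolution produces r² times as many channels, and a depth-to-space rearrangement turns them into an r-times larger image. The sketch below shows only the rearrangement step, following the channel ordering of PyTorch's `PixelShuffle`; treating this as the layer used in FIG. 10 is an assumption, and `depth_to_space` is a hypothetical name.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # indexed [channel][row][col]


def depth_to_space(x: Tensor3, r: int) -> Tensor3:
    c, h, w = len(x), len(x[0]), len(x[0][0])
    assert c % (r * r) == 0, "channel count must be divisible by r * r"
    oc = c // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(oc)]
    for k in range(oc):
        for i in range(r):
            for j in range(r):
                # Channel k*r*r + i*r + j fills sub-pixel (i, j) of every r x r block.
                src = x[k * r * r + i * r + j]
                for y in range(h):
                    for xx in range(w):
                        out[k][y * r + i][xx * r + j] = src[y][xx]
    return out
```

Because the preceding convolution is trained together with the rest of the decoder, the network can learn how to fill in the extra pixels instead of relying on a fixed interpolation kernel.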

Hereinafter, the intra restoration unit according to an embodiment of the present disclosure is described with reference to FIGS. 11 to 13.

FIG. 11 is an exemplary diagram illustrating the intra restoration unit according to an embodiment of the present disclosure.

The intra restoration unit 144 according to an embodiment of the present disclosure exploits the similarity within the target frame: it obtains, from other positions in the target frame, new pixel values to replace the pixel values at the position of the subtitle to be removed, and synthesizes them to perform local image restoration.

Referring to FIG. 11, the intra restoration unit 144 according to an embodiment of the present disclosure includes a coarse prediction unit 1100 and a refinement processing unit 1110. The coarse prediction unit 1100 first generates a coarse predicted frame from the target frame and a mask image. The refinement processing unit 1110 receives the coarse predicted frame and generates the final inpainted frame.

FIG. 12 is an exemplary diagram illustrating the coarse prediction unit according to an embodiment of the present disclosure.

The coarse prediction unit 1100 according to an embodiment of the present disclosure analyzes the image characteristics of the target frame and roughly predicts new pixel values to replace the area from which the subtitle is removed.

Referring to FIG. 12, the coarse prediction unit 1100 according to an embodiment of the present disclosure includes an encoder network 1200 and a decoder network 1210 trained as a neural-network-based autoencoder. The coarse prediction unit 1100 receives the target frame and the mask image and generates a coarse predicted frame in which the subtitle removal area is filled with new pixel values.
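The input/output contract of the coarse stage (target frame plus mask in, coarsely filled frame out) can be illustrated without a trained network. The sketch below swaps in a simple non-learned stand-in, iterative 4-neighbour averaging (a diffusion fill), purely to show the interface; it is not the encoder-decoder of FIG. 12, and `coarse_fill` and `iters` are hypothetical names.

```python
from typing import List

Image = List[List[float]]
Mask = List[List[bool]]  # True where the subtitle was removed


def coarse_fill(img: Image, mask: Mask, iters: int = 100) -> Image:
    h, w = len(img), len(img[0])
    known = [img[y][x] for y in range(h) for x in range(w) if not mask[y][x]]
    start = sum(known) / len(known)
    # Start the masked pixels at the mean of the known pixels.
    out = [[start if mask[y][x] else img[y][x] for x in range(w)] for y in range(h)]
    for _ in range(iters):
        nxt = [row[:] for row in out]
        for y in range(h):
            for x in range(w):
                if mask[y][x]:
                    nb = [out[yy][xx]
                          for yy, xx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                          if 0 <= yy < h and 0 <= xx < w]
                    nxt[y][x] = sum(nb) / len(nb)   # average of the 4-neighbours
        out = nxt
    return out
```

A smooth fill like this is only a rough guess; the refinement stage then replaces it with content borrowed from similar regions of the frame.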

FIG. 13 is an exemplary diagram illustrating the refinement processing unit according to an embodiment of the present disclosure.

Referring to FIG. 13, the refinement processing unit 1110 according to an embodiment of the present disclosure uses encoder networks 1300 to 1308, trained as part of a neural-network-based autoencoder, to extract a value feature, a key feature, and a query feature from the target frame.

The refinement processing unit 1110 uses an attention matching unit 1310 to perform attention matching between the query features of the caption-removed region and the key features of the remaining region of the target frame, and calculates attention scores. Based on the attention scores, the refinement processing unit 1110 searches the target frame for a similar region having high similarity to the caption region. The attention matching unit 1310 according to an embodiment of the present disclosure may perform attention matching in the same manner as the attention matching unit 710 of the inter restoration unit 142, described above with reference to FIGS. 8A to 9.
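A minimal sketch of the attention matching step follows, assuming features flattened to N spatial positions by C channels and assuming scaled dot-product similarity with a softmax; the exact scoring used by attention matching units 710 and 1310 is not reproduced here, and `attention_match` is a hypothetical name.

```python
import numpy as np

def attention_match(query, key, mask):
    """Score each caption-region position against each non-caption position.

    query, key: N x C feature maps flattened over N spatial positions.
    mask: length-N boolean array, True inside the caption-removed region.
    Returns (scores, best): scores[i, j] is the softmax attention of the i-th
    masked position over the j-th unmasked position; best[i] is the flat index
    of the most similar unmasked position (the "similar region").
    """
    q = query[mask]                                  # M x C caption-region queries
    k = key[~mask]                                   # K x C keys from the rest of the frame
    logits = q @ k.T / np.sqrt(q.shape[1])           # scaled dot-product (assumed metric)
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    scores = np.exp(logits)
    scores /= scores.sum(axis=1, keepdims=True)
    best = np.flatnonzero(~mask)[scores.argmax(axis=1)]
    return scores, best
```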

The refinement processing unit 1110 generates a compensation feature vector for filling the caption region of the target frame and inputs it to the decoder network 1320 to generate the final restored pixels.
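The text does not say how the compensation feature vector is formed. One common choice, assumed here, is an attention-weighted sum of the value features outside the caption region, written into the caption positions before the map is passed to a decoder (decoder 1320 itself is not sketched). `compensation_features` is a hypothetical name.

```python
import numpy as np

def compensation_features(value, scores, mask):
    """Fill caption positions with attention-weighted value features (assumed scheme).

    value: N x C value features; scores: M x K attention scores whose rows sum to 1
    (M caption positions, K non-caption positions); mask: length-N boolean mask.
    Returns a new N x C feature map; the input array is left unchanged.
    """
    v = value[~mask]                 # K x C values outside the caption region
    filled = value.copy()
    filled[mask] = scores @ v        # M x C compensation vectors
    return filled
```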

Hereinafter, the frame merging unit according to an embodiment of the present disclosure is described with reference to FIG. 14.

The frame merging unit 150 according to an embodiment of the present disclosure either synthesizes the output of the inter restoration unit 142 with the output of the intra restoration unit 144, or selects one of the two outputs, and uses the result as the final output frame.

The frame merging unit 150 according to an embodiment of the present disclosure may use attention scores to synthesize the two restored frames or to select one of them. Specifically, the frame merging unit 150 may select, as the final output frame, the restored frame output by whichever of the inter restoration unit 142 and the intra restoration unit 144 calculated the higher attention score. Alternatively, the frame merging unit 150 may generate the final output frame by synthesizing the two restored frames in proportion to the attention scores calculated by the inter restoration unit 142 and the intra restoration unit 144, respectively.

For example, when the attention score of the inter restoration unit 142 is 80 and the attention score of the intra restoration unit 144 is 30, the frame merging unit 150 may select the restored frame output by the inter restoration unit 142 as the final output frame, or may generate the final restored frame by synthesizing the restored frames output by the inter restoration unit 142 and the intra restoration unit 144 at a ratio of 80:30.
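Both attention-score strategies fit in a few lines. In this sketch an 80:30 ratio becomes weights 80/110 and 30/110; the tie-break in favour of the inter frame and the error on non-positive totals are assumptions, and `merge_by_attention` is a hypothetical name.

```python
import numpy as np

def merge_by_attention(inter_frame, intra_frame, inter_score, intra_score, mode="blend"):
    """Merge two restored frames using their attention scores.

    mode="select": return the frame whose restoration unit scored higher
                   (ties go to the inter frame, an assumption).
    mode="blend":  weighted sum with weights proportional to the scores.
    """
    if mode == "select":
        return inter_frame if inter_score >= intra_score else intra_frame
    total = inter_score + intra_score
    if total <= 0:
        raise ValueError("attention scores must sum to a positive value")
    w_inter = inter_score / total               # e.g. 80 / 110 for scores 80 and 30
    return w_inter * inter_frame + (1.0 - w_inter) * intra_frame
```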

The frame merging unit 150 according to another embodiment of the present disclosure may use a pretrained model to synthesize the two restored frames or to select one of them. Specifically, the frame merging unit 150 uses a pretrained segmentation network to extract features from the restored frame output by the inter restoration unit 142 and from the restored frame output by the intra restoration unit 144. The frame merging unit 150 compares the extracted features and either selects, as the final output frame, the restored frame from which more features were extracted, or generates the final output frame by synthesizing the two restored frames according to the ratio of the extracted features.

For example, the frame merging unit 150 may use VGGNet-16 as the pretrained model. The frame merging unit 150 inputs the restored frame output by the inter restoration unit 142 and the restored frame output by the intra restoration unit 144 into VGGNet to extract features, and determines which restored frame is better restored by comparing the extracted feature values. For example, when the sum of the features (or their total energy) extracted from the restored frame output by the inter restoration unit 142 is greater than that extracted from the restored frame output by the intra restoration unit 144, the restored frame output by the inter restoration unit 142 is used as the final output frame.
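The feature-sum comparison can be sketched with a pluggable extractor. Here `toy_extractor` (absolute finite differences, i.e. edge responses) is a hypothetical placeholder; a real pipeline would pass a pretrained VGG-16 feature extractor instead. Defining "energy" as the sum of squared activations is an assumption.

```python
import numpy as np

def toy_extractor(frame):
    """Hypothetical stand-in for a pretrained network: edge responses."""
    gx = np.abs(np.diff(frame, axis=1))
    gy = np.abs(np.diff(frame, axis=0))
    return np.concatenate([gx.ravel(), gy.ravel()])

def feature_score(frame, extractor, use_energy=False):
    """Sum of extracted features, or their energy (sum of squares, assumed)."""
    f = np.asarray(extractor(frame), dtype=np.float64)
    return np.square(f).sum() if use_energy else f.sum()

def select_by_features(inter_frame, intra_frame, extractor=toy_extractor, use_energy=False):
    """Use the inter frame only if its feature score is strictly greater."""
    inter_val = feature_score(inter_frame, extractor, use_energy)
    intra_val = feature_score(intra_frame, extractor, use_energy)
    return inter_frame if inter_val > intra_val else intra_frame
```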

The frame merging unit 150 according to another embodiment of the present disclosure may train a neural network to synthesize the two restored frames or to select one of them. Specifically, the frame merging unit 150 may train a merge network to select the better-restored frame from between the restored frame output by the inter restoration unit 142 and the restored frame output by the intra restoration unit 144.

FIG. 14 is an exemplary diagram illustrating training of the merge network according to an embodiment of the present disclosure.

Referring to FIG. 14, the merge network 1400 is a convolutional neural network (CNN)-based artificial neural network that takes two frames as input and outputs one frame. According to an embodiment of the present disclosure, the learning unit 1410 calculates the loss between the frame output by the merge network and the original frame or ground-truth (GT) frame, and trains the merge network 1400 in the direction that reduces this loss. That is, the learning unit 1410 trains the merge network 1400 to output a frame similar to the original frame or the GT frame.
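The loss-reduction principle of the learning unit can be shown with a deliberately tiny model. This is not the CNN merge network 1400: as a hypothetical toy substitute, the "network" is a single blend weight alpha = sigmoid(theta), trained by plain gradient descent on the mean squared difference to the GT frame.

```python
import numpy as np

def train_toy_merge(inter_frame, intra_frame, gt_frame, lr=0.5, steps=500):
    """Fit alpha so that alpha*inter + (1 - alpha)*intra approaches the GT frame.

    Toy stand-in for merge network 1400; loss = mean((merged - gt)**2).
    Returns (alpha, final_loss).
    """
    theta = 0.0                                     # alpha = sigmoid(theta), starts at 0.5
    d = inter_frame - intra_frame
    for _ in range(steps):
        alpha = 1.0 / (1.0 + np.exp(-theta))
        residual = intra_frame + alpha * d - gt_frame
        grad_alpha = 2.0 * np.mean(residual * d)    # dLoss/dalpha
        theta -= lr * grad_alpha * alpha * (1.0 - alpha)  # chain rule through sigmoid
    alpha = 1.0 / (1.0 + np.exp(-theta))
    loss = np.mean((intra_frame + alpha * d - gt_frame) ** 2)
    return alpha, loss
```

A CNN replaces the single scalar with learned, spatially varying merging, but the update direction (downhill on the loss against the GT frame) is the same idea.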

According to an embodiment of the present disclosure, the learning unit 1410 may calculate, as the loss, the difference between the frame output by the merge network and the original frame before the caption was edited in (or the GT frame). Specifically, the learning unit 1410 may calculate the loss using a method such as mean absolute difference (MAD) or mean squared difference (MSD), or may calculate, as the loss, the difference between the sums or energies of the features extracted using a pretrained network.
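The three loss options named above can be written directly. For the feature-based loss, `extractor` is any callable returning a feature array (a pretrained network in practice), and treating energy as the sum of squared features is an assumption.

```python
import numpy as np

def mad_loss(output, target):
    """Mean absolute difference."""
    return np.mean(np.abs(output - target))

def msd_loss(output, target):
    """Mean squared difference."""
    return np.mean((output - target) ** 2)

def feature_loss(output, target, extractor, use_energy=False):
    """Absolute difference between the feature sums (or energies) of two frames."""
    fo, ft = extractor(output), extractor(target)
    if use_energy:
        return abs(np.sum(fo ** 2) - np.sum(ft ** 2))
    return abs(np.sum(fo) - np.sum(ft))
```

For example, with output [1, 2, 3] and target [1, 1, 1], MAD is 1.0 and MSD is 5/3.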

Although FIGS. 6A and 6B describe the processes as being executed sequentially, this merely illustrates the technical idea of an embodiment of the present disclosure. In other words, a person of ordinary skill in the art to which an embodiment of the present disclosure pertains could apply various modifications and variations, such as changing the order described in FIGS. 6A and 6B or executing one or more of the processes in parallel, without departing from the essential characteristics of the embodiment. Accordingly, FIGS. 6A and 6B are not limited to a time-series order.

Various implementations of the systems and techniques described herein may be realized in digital electronic circuitry, integrated circuits, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs executable on a programmable system. The programmable system includes at least one programmable processor (which may be a special-purpose processor or a general-purpose processor) coupled to receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device. Computer programs (also known as programs, software, software applications, or code) contain instructions for a programmable processor and are stored on a "computer-readable recording medium".

A computer-readable recording medium includes all types of recording devices that store data readable by a computer system. Such media may be non-volatile or non-transitory media such as ROM, CD-ROM, magnetic tape, floppy disks, memory cards, hard disks, magneto-optical disks, and storage devices, and may further include transitory media such as data transmission media. In addition, a computer-readable recording medium may be distributed across network-connected computer systems so that computer-readable code is stored and executed in a distributed manner.

Various implementations of the systems and techniques described herein may be implemented by a programmable computer. Here, the computer includes a programmable processor, a data storage system (including volatile memory, non-volatile memory, another type of storage system, or a combination thereof), and at least one communication interface. For example, the programmable computer may be a server, a network appliance, a set-top box, an embedded device, a computer expansion module, a personal computer, a laptop, a personal data assistant (PDA), a cloud computing system, or a mobile device.

The above description merely illustrates the technical idea of the present embodiment, and a person of ordinary skill in the art to which the present embodiment pertains may make various modifications and variations without departing from its essential characteristics. Accordingly, the present embodiments are intended to describe, not to limit, the technical idea of the present embodiment, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present embodiment should be interpreted according to the claims below, and all technical ideas within a scope equivalent thereto should be interpreted as falling within the scope of rights of the present embodiment.

10: image restoration apparatus 100: input frame DB
110: frame meta-information DB 120: mask generator
130: frame selection unit 140: restoration unit
142: inter restoration unit 144: intra restoration unit
150: frame merging unit 160: output frame DB

Claims

1. A method of restoring a region occluded by an object included in an image, the method comprising:
selecting, from among frames of the image, a target frame from which the object is to be removed and a reference frame including information about the region of the target frame occluded by the object; and
inpainting the target frame by performing at least one of inter-inpainting and intra-inpainting according to the number of frames selected as the reference frame.

2. The image restoration method of claim 1, wherein the selecting comprises:
calculating a similarity between the target frame and a frame adjacent to the target frame by using pixels adjacent to the region in which the object is detected in the target frame; and
selecting a frame whose similarity is higher than a preset threshold similarity as the reference frame, and generating information on a similar region having the highest similarity within the reference frame,
and wherein the inpainting comprises restoring the target frame by receiving the target frame, the reference frame, and the information on the similar region.
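The similarity-and-threshold selection of claim 2 can be illustrated roughly as below. The one-pixel, 4-connected ring around the object mask, the metric (1 minus mean absolute difference on [0, 1] pixels), and the threshold of 0.9 are all assumptions; the search for the most similar region inside each reference frame is not sketched. Function names are hypothetical.

```python
import numpy as np

def boundary_ring(mask, width=1):
    """Pixels within `width` (4-connected) steps of a boolean mask, excluding the mask."""
    grown = mask.copy()
    for _ in range(width):
        p = np.pad(grown, 1)                       # pads with False
        grown = p[:-2, 1:-1] | p[2:, 1:-1] | p[1:-1, :-2] | p[1:-1, 2:] | grown
    return grown & ~mask

def select_reference_frames(target, mask, neighbors, threshold=0.9):
    """Compare the target with adjacent frames on the pixels around the object.

    target: H x W (or H x W x C) frame in [0, 1]; mask: H x W boolean object mask;
    neighbors: dict frame_index -> adjacent frame of the same shape.
    Returns (indices above threshold, dict of similarities).
    """
    ring = boundary_ring(mask)
    sims = {idx: 1.0 - np.mean(np.abs(frame[ring] - target[ring]))
            for idx, frame in neighbors.items()}
    refs = [idx for idx, s in sims.items() if s > threshold]
    return refs, sims
```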

3. The image restoration method of claim 2,
wherein the generating comprises dividing the similar region into at least one region according to the position of the object in the reference frame, and assigning a different weight to each divided region.

4. The image restoration method of claim 1, wherein the selecting comprises:
a first selection process of checking the frames of the image in chronological order, selecting a frame not including the object as a backward reference frame, and selecting a frame including the object as the target frame; and
a second selection process of checking, in chronological order, at least one frame located after the target frame in time, and selecting a frame not including the object as a forward reference frame.
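The two chronological passes of claim 4 can be sketched as one scan over per-frame flags. Grouping consecutive object-containing frames into one set of targets, and using None when no object-free frame exists on a side, are assumptions; `select_references` is a hypothetical name.

```python
def select_references(has_object):
    """has_object[i] is True if frame i contains the object (e.g. a caption).

    Returns one (backward_ref, targets, forward_ref) tuple per run of
    consecutive object-containing frames.
    """
    groups = []
    backward = None
    i, n = 0, len(has_object)
    while i < n:
        if not has_object[i]:
            backward = i                      # latest object-free frame seen so far
            i += 1
            continue
        targets = []
        while i < n and has_object[i]:        # first selection: collect target frames
            targets.append(i)
            i += 1
        forward = i if i < n else None        # second selection: next object-free frame
        groups.append((backward, targets, forward))
    return groups
```

For flags [False, True, True, False, True], this yields [(0, [1, 2], 3), (3, [4], None)].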

5. The image restoration method of claim 1, wherein the selecting comprises selecting the reference frame from among previously processed frames whose restoration has been completed, a previously processed frame whose restored region does not overlap the region occluded by the object in the target frame being selected as the reference frame.
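The non-overlap test in claim 5 reduces to a boolean mask intersection. This sketch assumes each processed frame is represented only by a mask of the region it had restored; `pick_processed_references` is a hypothetical name.

```python
import numpy as np

def pick_processed_references(target_mask, processed_masks):
    """target_mask: H x W boolean mask of the occluded region in the target frame.
    processed_masks: dict frame_index -> H x W boolean mask of the region restored
    in that already-processed frame.
    Returns indices whose restored region does not overlap the target's occluded
    region, i.e. frames that keep original pixels where the target needs them.
    """
    return [idx for idx, restored in processed_masks.items()
            if not np.any(restored & target_mask)]
```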

6. The image restoration method of claim 1, wherein the selecting comprises selecting the target frame and the reference frame from among frames in the same scene, based on scene change information extracted from the image.

7. The image restoration method of claim 1, wherein the inpainting comprises performing at least one of the inter-inpainting and the intra-inpainting when the sum of the number of target frames and the number of reference frames is greater than a preset frame processing unit.

8. The image restoration method of claim 1, wherein the inpainting comprises performing the intra-inpainting when no frame is selected as the reference frame, and performing the inter-inpainting when a frame is selected as the reference frame.

9. The image restoration method of claim 1, wherein the inpainting comprises:
generating a restored frame by performing the intra-inpainting on some of the target frames when no frame is selected as the reference frame; and
performing the inter-inpainting on the remaining target frames by using the restored frame as the reference frame.
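The choice between intra- and inter-inpainting in claims 8 and 9 can be expressed as a small planning function. `n_intra_seed` (how many targets are intra-inpainted first) is a hypothetical parameter not given in the text, and the function only plans the order; it does not perform inpainting.

```python
def plan_inpainting(targets, references, n_intra_seed=1):
    """Return a list of (frame_index, mode, reference_frames) steps.

    With reference frames available, every target is inter-inpainted.
    Without any, the first n_intra_seed targets are intra-inpainted and then
    serve as references for inter-inpainting the remaining targets.
    """
    if references:
        return [(t, "inter", list(references)) for t in targets]
    seeds = targets[:n_intra_seed]
    plan = [(t, "intra", []) for t in seeds]
    plan += [(t, "inter", list(seeds)) for t in targets[n_intra_seed:]]
    return plan
```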

10. An apparatus for restoring a region occluded by an object included in an image, the apparatus comprising:
a frame selection unit configured to select a target frame from which the object is to be removed and a reference frame including information about the region of the target frame occluded by the object; and
an inpainting unit configured to generate a restored frame by performing at least one of inter-inpainting and intra-inpainting according to the number of reference frames.

11. A computer program stored in a computer-readable recording medium to execute each process included in the image restoration method according to any one of claims 1 to 9.