KR102180478B1

KR102180478B1 - Apparatus and method for detecting caption

Info

Publication number: KR102180478B1
Application number: KR1020140133552A
Authority: KR
Inventors: 이동혁; 문영식; 박기태; 김상호
Original assignee: 삼성전자주식회사; 한양대학교 에리카산학협력단
Priority date: 2014-10-02
Filing date: 2014-10-02
Publication date: 2020-11-18
Also published as: KR20160040043A

Abstract

A caption detection method is provided. The method includes: obtaining a caption area of a first frame and a caption area of a second frame, the first frame and the second frame being extracted from among the frames constituting a video; when the caption area of the first frame and the caption area of the second frame are determined to be the same, predicting that all frames between the first frame and the second frame include the same caption area as the first frame; and when the caption area of the first frame and the caption area of the second frame are determined not to be the same, obtaining a caption area of an intermediate frame located between the first frame and the second frame and, when the caption area of the first frame and the caption area of the intermediate frame are determined to be the same, predicting that all frames between the first frame and the intermediate frame include the same caption area as the first frame.

Description

Caption detection apparatus and method thereof {APPARATUS AND METHOD FOR DETECTING CAPTION}

Embodiments of the present invention relate to a caption detection apparatus and a method thereof.

With the recent release of smart TVs, various methods are needed to provide the content that users want. Conventionally, information supplied by a broadcaster was delivered to users in cooperation with that broadcaster, but this is inconvenient because a person has to enter the information manually. Methods that automatically extract information from video are therefore being studied. Because a caption inserted into a video contains important semantic information, such as supplementary explanations or topics that help the viewer understand the content, various image processing techniques for detecting captions in video are being studied.

The present invention relates to an apparatus and method that, considering that ordinary advertisement and news videos contain many repeated caption images, can detect video captions efficiently without searching every video frame.

The present invention also relates to an apparatus and method that reduce the high computational cost of existing texture-based methods and detect captions effectively by taking the characteristics of characters into account.

As a technical means for achieving the above technical objects, according to an embodiment of the present invention, a caption detection method may be provided that includes: obtaining a caption area of a first frame and a caption area of a second frame, the first frame and the second frame being extracted from among the frames constituting a video; when the caption area of the first frame and the caption area of the second frame are determined to be the same, predicting that all frames between the first frame and the second frame include the same caption area as the first frame; and when the caption area of the first frame and the caption area of the second frame are determined not to be the same, obtaining a caption area of an intermediate frame located between the first frame and the second frame and, when the caption area of the first frame and the caption area of the intermediate frame are determined to be the same, predicting that all frames between the first frame and the intermediate frame include the same caption area as the first frame.

In the caption detection method, obtaining the caption area of the first frame may further include: obtaining a first temporary caption area by performing a predetermined operation on the first frame; generating a second temporary caption area by performing the predetermined operation on an image obtained by down-sampling the first frame; obtaining an integrated image by performing an addition operation on the first temporary caption area and the second temporary caption area; binarizing the integrated image; and generating a final caption area by removing noise and non-caption areas from the binarized image.

In the binarizing, if the complexity of the first frame is equal to or greater than a predetermined value, binarization is performed by applying a first threshold, and if the complexity is less than the predetermined value, binarization is performed by applying a second threshold, wherein the first threshold is greater than the second threshold.

Obtaining the first temporary caption area may further include: converting the first frame into the YCbCr color space; in the converted YCbCr color space, performing a vertical DCT (Discrete Cosine Transform) on each of Y, Cb, and Cr, removing the low-frequency band, and combining the Y, Cb, and Cr from which the low-frequency band has been removed to convert them into a vertical-direction RGB color space; in the converted YCbCr color space, performing a horizontal DCT on each of Y, Cb, and Cr, removing the low-frequency band, and combining the Y, Cb, and Cr from which the low-frequency band has been removed to convert them into a horizontal-direction RGB color space; and combining the vertical-direction RGB color space and the horizontal-direction RGB color space through a multiplication operation.

The combining through the multiplication operation may further include performing blurring on the vertical-direction RGB color space and the horizontal-direction RGB color space.
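The directional filtering described above can be illustrated with a minimal sketch. It is not the patented implementation: it works on a single channel (for example Y) rather than on Y, Cb, and Cr separately, it omits the color-space conversions and the blurring, and the names `high_pass_1d`, `directional_response`, and the `cutoff` parameter are hypothetical. Taking the absolute value of the filtered signal is also an assumption made here so that the two directional maps can be multiplied as magnitudes.

```python
import math
from typing import List

def dct(signal: List[float]) -> List[float]:
    """Orthonormal 1-D DCT-II."""
    n = len(signal)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(signal))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out

def idct(coeffs: List[float]) -> List[float]:
    """Inverse of the orthonormal DCT-II above."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = coeffs[0] * math.sqrt(1.0 / n)
        s += sum(c * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k, c in enumerate(coeffs) if k > 0)
        out.append(s)
    return out

def high_pass_1d(signal: List[float], cutoff: int) -> List[float]:
    """DCT -> zero the lowest `cutoff` coefficients -> inverse DCT -> magnitude."""
    coeffs = dct(signal)
    for k in range(min(cutoff, len(coeffs))):
        coeffs[k] = 0.0  # remove the low-frequency band
    return [abs(v) for v in idct(coeffs)]

def directional_response(channel: List[List[float]], cutoff: int = 1) -> List[List[float]]:
    """Multiply the horizontal (row-wise) and vertical (column-wise) high-pass maps."""
    horizontal = [high_pass_1d(row, cutoff) for row in channel]
    columns = [high_pass_1d(list(col), cutoff) for col in zip(*channel)]
    vertical = [list(row) for row in zip(*columns)]  # transpose back to row-major
    return [[h * v for h, v in zip(h_row, v_row)]
            for h_row, v_row in zip(horizontal, vertical)]
```

Because the two maps are multiplied, a pixel keeps a high value only where both the horizontal and the vertical high-frequency responses are strong, which is the property that makes character-like structures stand out against one-directional edges.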

Generating the final caption area may further include detecting a caption from the generated final caption area.

According to another aspect of an embodiment of the present invention, a caption detection apparatus may be provided that includes: a frame extraction unit that extracts a predetermined number of frames from among the frames constituting a video; a caption area acquisition unit that obtains, from among the extracted frames, a caption area of a first frame from the first frame and a caption area of a second frame from the second frame; and a control unit that, when the caption area of the first frame and the caption area of the second frame are determined to be the same, predicts that all frames between the first frame and the second frame include the same caption area as the first frame, and that, when the caption area of the first frame and the caption area of the second frame are determined not to be the same, obtains a caption area of an intermediate frame located between the first frame and the second frame and, when the caption area of the first frame and the caption area of the intermediate frame are determined to be the same, predicts that all frames between the first frame and the intermediate frame include the same caption area as the first frame.

The caption area acquisition unit may obtain a first temporary caption area by performing a predetermined operation on the first frame, obtain a second temporary caption area by performing the predetermined operation on an image obtained by down-sampling the first frame, obtain an integrated image by performing an addition operation on the first temporary caption area and the second temporary caption area, binarize the integrated image, and generate a final caption area by removing noise and non-caption areas from the binarized image.

In the binarization, if the complexity of the first frame is equal to or greater than a predetermined value, binarization is performed by applying a first threshold, and if the complexity is less than the predetermined value, binarization is performed by applying a second threshold, wherein the first threshold may be greater than the second threshold.

The caption area acquisition unit may convert the first frame into the YCbCr color space; in the converted YCbCr color space, perform a vertical DCT (Discrete Cosine Transform) on each of Y, Cb, and Cr, remove the low-frequency band, and combine the Y, Cb, and Cr from which the low-frequency band has been removed to convert them into a vertical-direction RGB color space; in the converted YCbCr color space, perform a horizontal DCT on each of Y, Cb, and Cr, remove the low-frequency band, and combine the Y, Cb, and Cr from which the low-frequency band has been removed to convert them into a horizontal-direction RGB color space; and combine the vertical-direction RGB color space and the horizontal-direction RGB color space through a multiplication operation.

The caption area acquisition unit may perform blurring on the vertical-direction RGB color space and the horizontal-direction RGB color space before performing the multiplication operation.

The control unit may further include a function of extracting a caption from the generated final caption area.

According to still another aspect of an embodiment of the present invention, a computer-readable recording medium may be provided that stores computer program code which, when read and executed by a processor, performs a caption detection method, the caption detection method including: obtaining a caption area of a first frame and a caption area of a second frame, the first frame and the second frame being extracted from among the frames constituting a video; when the caption area of the first frame and the caption area of the second frame are determined to be the same, predicting that all frames between the first frame and the second frame include the same caption area as the first frame; and when the caption area of the first frame and the caption area of the second frame are determined not to be the same, obtaining a caption area of an intermediate frame located between the first frame and the second frame and, when the caption area of the first frame and the caption area of the intermediate frame are determined to be the same, predicting that all frames between the first frame and the intermediate frame include the same caption area as the first frame.

FIG. 1 is a block diagram of a caption detection apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating an example of a method of efficiently detecting video captions without searching all video frames, according to an embodiment of the present invention.
FIG. 3 is a flowchart of a method of efficiently detecting video captions without searching all video frames, according to an embodiment of the present invention.
FIG. 4 is a flowchart of a method of obtaining a caption area according to an embodiment of the present invention.
FIG. 5 is a detailed flowchart illustrating a method of obtaining a caption area according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a process of obtaining a caption area according to an embodiment of the present invention.
FIG. 7 is another diagram illustrating a process of obtaining a caption area according to an embodiment of the present invention.
FIG. 8 illustrates a process of combining Y, Cb, and Cr images on each of which DCT, HPF, and IDCT have been performed, according to an embodiment of the present invention.
FIG. 9 illustrates a process of combining an image on which DCT has been performed in the vertical direction with an image on which DCT has been performed in the horizontal direction, according to an embodiment of the present invention.
FIG. 10 is a diagram showing the effect of Gaussian blur according to an embodiment of the present invention.
FIG. 11 is a diagram illustrating a process of finally obtaining a caption area by combining an original-size image with a down-sampled image, according to an embodiment of the present invention.
FIG. 12 is a diagram illustrating a process of obtaining and binarizing a caption area in a low-complexity image, according to an embodiment of the present invention.
FIG. 13 is a diagram illustrating a process of obtaining and binarizing a caption area in a high-complexity image, according to an embodiment of the present invention.
FIG. 14 is a diagram illustrating a process of obtaining a caption area and detecting a caption by post-processing the obtained caption area, according to an embodiment of the present invention.
FIG. 15 is a diagram showing the overall process of detecting a caption according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described below together with the accompanying drawings. However, the present invention is not limited to the embodiments presented below and may be implemented in various different forms. These embodiments are provided only to inform those of ordinary skill in the art to which the present invention pertains of the scope of the invention, and the present invention is defined only by the scope of the claims.

The terms used in this specification will be briefly explained, and the present invention will then be described in detail.

The terms used in the present invention have been selected, as far as possible, from general terms that are currently in wide use, taking into account their function in the present invention. These may vary, however, depending on the intention of those working in the field, precedent, the emergence of new technologies, and the like. In certain cases a term has been arbitrarily selected by the applicant, and in such cases its meaning is described in detail in the corresponding part of the description. Accordingly, the terms used in the present invention should be defined based on their meaning and on the overall content of the present invention, not simply on the name of the term.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components, rather than excluding them, unless specifically stated otherwise. In addition, the term "unit" used in the specification refers to a software component or a hardware component such as an FPGA or an ASIC, and a "unit" performs certain roles. However, a "unit" is not limited to software or hardware. A "unit" may be configured to reside in an addressable storage medium or may be configured to execute on one or more processors. Thus, as an example, a "unit" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The functionality provided within the components and "units" may be combined into a smaller number of components and "units" or further separated into additional components and "units".

In this specification, a "frame" refers to a rectangular image that can be projected on a screen or the like; in film, it may mean a single shot projected from the film. When many frames are projected onto a screen in succession, the afterimage effect perceived by the audience can make the image appear to move.

In this specification, "obtaining a caption area" may mean obtaining a partial area of the original image that is determined to be highly likely to contain a caption.

Also in this specification, "caption extraction" may mean extracting the caption as text from the obtained caption area.

That is, the caption area before caption extraction is an image rather than text, and it can be recognized as text only after the caption has been extracted.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily practice them. Parts unrelated to the description are omitted from the drawings in order to describe the present invention clearly.

FIG. 1 is a block diagram of a caption detection apparatus according to an embodiment of the present invention.

The caption detection apparatus 100 may include: a frame extraction unit 110 that extracts a predetermined number of frames from among the frames constituting a video; a caption area acquisition unit 130 that obtains, from among the extracted frames, a caption area of a first frame from the first frame and a caption area of a second frame from the second frame; and a control unit 150 that, when the caption area of the first frame and the caption area of the second frame are determined to be the same, predicts that all frames between the first frame and the second frame include the same caption area as the first frame, and that, when the caption area of the first frame and the caption area of the second frame are determined not to be the same, obtains a caption area of an intermediate frame located between the first frame and the second frame and, when the caption area of the first frame and the caption area of the intermediate frame are determined to be the same, predicts that all frames between the first frame and the intermediate frame include the same caption area as the first frame.

The frame extraction unit 110 may extract, one at every predetermined interval, the frames that will be targets of caption area acquisition from among all frames constituting the video. For example, the frame extraction unit 110 may extract three frames per second.
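As a rough illustration of interval-based extraction, the sketch below picks frame indices at a fixed step. The function name `sample_frame_indices` and its parameters are hypothetical, and deriving the step from the frame rate is an assumption; the specification only states that one frame is taken per predetermined interval, for example three frames per second.

```python
from typing import List

def sample_frame_indices(total_frames: int, fps: float,
                         samples_per_second: float = 3.0) -> List[int]:
    """Return indices of frames taken at a fixed interval (hypothetical helper)."""
    if total_frames <= 0:
        return []
    # Number of source frames between two extracted frames, at least 1.
    step = max(1, round(fps / samples_per_second))
    return list(range(0, total_frames, step))
```

For a 30 fps video, this selects every tenth frame, so only a tenth of the frames are passed to caption area acquisition directly.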

FIG. 2 is a diagram illustrating an example of a method of efficiently detecting video captions without searching all video frames, according to an embodiment of the present invention.

In general, a single video consists of a large number of frames. As shown in FIG. 2, frames 230 that will be targets of caption area acquisition may be extracted, according to a predetermined criterion, from among all frames 210 constituting one video.

Here, the predetermined criterion may include a time interval, a number of frames, and the like, and may be applied differently depending on the type and length of the video.

FIG. 2 shows a situation in which three frames, namely a first frame 250, a second frame 270, and a third frame 290, have been extracted from among all frames 210 constituting the video.

When the same caption persists over several frames, obtaining caption areas only for the extracted frames 230, rather than for all frames 210, can be efficient in terms of both caption detection speed and resource usage.

FIG. 3 is a flowchart of a method of efficiently detecting video captions without searching all video frames, according to an embodiment of the present invention.

In step S310, the caption detection apparatus 100 may obtain a caption area of a first frame from the first frame and a caption area of a second frame from the second frame, the frames having been extracted from among the frames constituting the video.

The caption detection apparatus 100 may compare the obtained caption area of the first frame with the caption area of the second frame to determine whether they are the same (S320).

If the caption area of the first frame and the caption area of the second frame are the same, the caption detection apparatus 100 may predict that all frames between the first frame and the second frame include the same caption area as the first frame and the second frame (S330).

On the other hand, if the caption area of the first frame and the caption area of the second frame are determined not to be the same, the caption detection apparatus 100 may obtain a caption area of an intermediate frame located between the first frame and the second frame (S340).

In some embodiments, instead of obtaining the caption area of the intermediate frame located midway between the first frame and the second frame, the caption area may be obtained from an arbitrarily selected frame located between the first frame and the second frame.

The caption detection apparatus 100 may then compare the caption area of the first frame with the caption area of the intermediate frame to determine whether they are the same (S350).

If the caption area of the first frame and the caption area of the intermediate frame are determined to be the same, it may be predicted that all frames between the first frame and the intermediate frame include the same caption area as the first frame and the intermediate frame (S360).

However, if the caption area of the first frame and the caption area of the intermediate frame are still determined not to be the same, the second frame may be replaced with the intermediate frame (S370).

In this case, the caption detection apparatus 100 may again obtain the caption area of an intermediate frame located between the first frame and the replaced second frame (S340), and repeat steps S350 and S360 or S370.

By repeating all the steps of FIG. 3 in this way, the caption areas of all the extracted frames 230 of FIG. 2 and of the non-extracted frames among all frames 210 can be determined in a short execution time, without performing caption detection on every frame.
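The flow of FIG. 3 resembles a bisection search, and can be sketched as follows under stated assumptions. The function `predict_caption_areas` and the `detect` callback are hypothetical names; `detect(i)` stands in for caption area acquisition on frame `i` and is assumed to return a value that can be compared for equality. How the remaining span between the intermediate frame and the original second frame is handled is not spelled out in the text, so the sketch simply restarts from the last resolved frame.

```python
from typing import Callable, Dict, Hashable, Tuple

def predict_caption_areas(first: int, second: int,
                          detect: Callable[[int], Hashable]
                          ) -> Tuple[Dict[int, Hashable], int]:
    """Assign a caption area to every frame in [first, second].

    Returns the per-frame areas and the number of frames on which
    detection actually ran.
    """
    detected: Dict[int, Hashable] = {}

    def area_of(index: int) -> Hashable:
        if index not in detected:          # run detection at most once per frame
            detected[index] = detect(index)
        return detected[index]

    predicted: Dict[int, Hashable] = {}
    lo = first
    while lo < second:
        hi = second
        # S340/S350/S370: while the two ends differ, the intermediate frame
        # replaces the second frame.
        while hi - lo > 1 and area_of(lo) != area_of(hi):
            hi = (lo + hi) // 2
        if area_of(lo) == area_of(hi):
            # S330/S360: every frame in between is predicted to share the area.
            for i in range(lo, hi + 1):
                predicted[i] = area_of(lo)
        else:
            predicted[lo] = area_of(lo)    # adjacent frames with different areas
        lo = hi                            # assumption: continue from the resolved frame
    predicted[second] = area_of(second)
    return predicted, len(detected)
```

As in the method itself, the prediction relies on the caption not changing and then changing back between two compared frames; such a case would go unnoticed because both ends would match.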

FIG. 4 is a flowchart of a method of obtaining a caption area according to an embodiment of the present invention.

In step S410, the caption detection apparatus 100 may extract, from among all frames constituting the video, a frame that will be a target of caption area acquisition.

The caption detection apparatus 100 may obtain a first temporary caption area by performing a predetermined operation on the extracted target frame (S420) and, at the same time, obtain a second temporary caption area by performing the predetermined operation on an image obtained by down-sampling that frame (S430).

Here, the frame may be down-sampled by a factor of 1/2, although the down-sampling is not limited to this factor.

The predetermined operation may include operations such as a DCT (Discrete Cosine Transform), an HPF (High-Pass Filter), and an IDCT (Inverse DCT), which are described in detail later.

When the size of the image differs, the size of the caption differs as well. In this embodiment, the first temporary caption area and the second temporary caption area are obtained from the original-size image and from the image down-sampled to 1/2 size, respectively, so caption areas of different sizes can be obtained.

In step S440, the caption detection apparatus 100 may perform an addition operation on the first temporary caption area and the second temporary caption area.

When a caption is large, it may fail to be detected as a caption area in the original image yet be detected as a caption area in the down-sampled image. In this case, if the down-sampled image is resized to the original image size and its brightness values are then added to those of the original image, the caption area that was missed in the original-size image can be obtained through the caption area of the down-sampled image.
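The two-scale combination described above can be sketched as follows. The helper names are hypothetical, and several details are assumptions made only for illustration: 2x2 averaging for the 1/2 down-sampling, nearest-neighbor resizing back to the original size, and clipping of the sum at 255.

```python
def downsample_half(img):
    """Halve width and height by averaging each 2x2 block (assumed method)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) // 4
             for x in range(w)] for y in range(h)]


def resize_nearest(img, out_h, out_w):
    """Resize with nearest-neighbor sampling (assumed method)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w]
             for x in range(out_w)] for y in range(out_h)]


def add_brightness(a, b, max_value=255):
    """Add two same-size brightness maps, clipping at max_value."""
    return [[min(pa + pb, max_value) for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def combine_scales(full_map, half_map):
    """Bring the half-size caption map to full size and add it to the full-size map."""
    enlarged = resize_nearest(half_map, len(full_map), len(full_map[0]))
    return add_brightness(full_map, enlarged)
```

Here `full_map` and `half_map` stand for the first and second temporary caption areas. A caption that responds only in `half_map` still appears in the combined result.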

In step S450, the caption detection apparatus 100 may binarize the image obtained in step S440 by adding the brightness values of the original image and the down-sampled image, in order to detect the emphasized portions. Binarization refers to a series of processes that represent an image in black and white only, and the criterion that separates black from white is called a threshold.

Depending on the complexity of the image, the caption detection apparatus 100 may apply a high threshold to a complex image and a low threshold to a simple image.

In Equation 1, m denotes the mean brightness value, std denotes the variance of the image, and the value 2.5 may be an experimentally obtained value.

In step S460, the caption detection apparatus 100 may obtain the final caption area by removing noise and non-caption areas from the binarized image.

FIG. 5 is a detailed flowchart illustrating a method of obtaining a caption area according to an embodiment of the present invention.

In step S510, the caption detection apparatus 100 may convert the target frame into the YCbCr color space.

The YCbCr color space is a color space widely used in digital image compression formats such as MPEG and JPEG, in which Y represents the luminance component, that is, the light density in a specific direction, and Cb and Cr represent the color-difference components. A color space is a three-dimensional representation of a color system: on the view that a color is formed by three attributes, it may include conceptual information about how colors are arranged in the three-dimensional space made up of those attributes.

Converting a frame in the RGB color space into the YCbCr color space may mean separating the frame into Y, Cb, and Cr images.
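The separation into Y, Cb, and Cr images can be sketched as follows. The text does not state which YCbCr variant is used; the sketch assumes the full-range BT.601 equations used by JPEG/JFIF, and the function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr (JFIF/BT.601, assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def split_ycbcr(frame):
    """Split a frame (rows of (r, g, b) tuples) into separate Y, Cb, Cr images."""
    planes = ([], [], [])
    for row in frame:
        converted = [rgb_to_ycbcr(*pixel) for pixel in row]
        # zip(*converted) regroups the row into its Y, Cb, and Cr values.
        for plane, channel in zip(planes, zip(*converted)):
            plane.append(list(channel))
    return planes
```

Each returned plane has the same height and width as the input frame, so the later per-channel operations can treat the three planes identically.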

In step S530, the caption detection apparatus 100 may perform a DCT and low-frequency region removal in the vertical direction on each of the separated Y, Cb, and Cr images.

The DCT may be a process of converting the Y, Cb, and Cr data into a sum of cosine functions. Performing a DCT on an image separates it into a low-frequency region, which contains areas with little or no variation such as a monotonous background, and a high-frequency region, which contains complex portions. Captions typically fall in the high-frequency region.

From the image separated into low-frequency and high-frequency regions, the caption detection apparatus 100 may use a high-pass filter (HPF) to remove the low-frequency region, which is unlikely to contain captions.

The caption detection apparatus 100 may perform an IDCT on the image from which the low-frequency region has been removed, converting the DCT components back into Y, Cb, and Cr data.
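The DCT, HPF, and IDCT sequence for a single line of one channel can be sketched as follows. The sketch assumes an orthonormal DCT-II with its matching inverse and an ideal high-pass filter that simply zeroes the lowest `cutoff` coefficients; the text specifies neither the DCT normalization nor the cutoff, so both are illustrative choices.

```python
import math


def dct_1d(signal):
    """Orthonormal DCT-II of a sequence of samples."""
    n = len(signal)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(signal))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct_1d(coeffs):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = coeffs[0] * math.sqrt(1.0 / n)
        s += sum(math.sqrt(2.0 / n) * c * math.cos(math.pi * (i + 0.5) * k / n)
                 for k, c in enumerate(coeffs) if k > 0)
        out.append(s)
    return out


def high_pass_line(signal, cutoff):
    """DCT, zero the `cutoff` lowest-frequency coefficients, then IDCT."""
    coeffs = dct_1d(signal)
    filtered = [0.0 if k < cutoff else c for k, c in enumerate(coeffs)]
    return idct_1d(filtered)
```

A flat line, such as a uniform background, is reduced to approximately zero, whereas a line with rapid brightness changes, such as one crossing caption strokes, keeps most of its variation. Applying `high_pass_line` to every row gives the horizontal-direction result, and applying it to every column gives the vertical-direction result.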

In step S570, the caption detection apparatus 100 may perform the addition Y + Cb + Cr on the Y, Cb, and Cr images to merge the YCbCr images back into an RGB image.

The caption detection apparatus 100 may likewise perform a DCT and low-frequency region removal in the horizontal direction on each of the separated Y, Cb, and Cr images (S550), and perform the addition Y + Cb + Cr on the resulting Y, Cb, and Cr images to merge the YCbCr images back into an RGB image (S580).

In step S590, the caption detection apparatus 100 may multiply the image that has passed through steps S530 and S570 by the image that has passed through steps S550 and S580.

That is, the caption detection apparatus 100 may perform a multiplication operation on the vertical-direction result image and the horizontal-direction result image.

FIG. 6 is a diagram illustrating a process of obtaining a caption area according to an embodiment of the present invention.

Reference numeral 6110 denotes an input image, in which the boxed portion indicates the caption area.

Reference numeral 6130 denotes the result of performing a DCT on the input image in the horizontal direction.

When a DCT is applied to an image, a two-dimensional DCT is generally used. However, a two-dimensional DCT not only increases the amount of computation but may also make it difficult to identify which frequency band of the whole image contains the caption area, so a one-dimensional DCT may instead be performed on the image one line at a time.

Reference numeral 6130 is the result of performing the DCT on 6110 one line at a time in the horizontal direction. Positions further to the left correspond to lower frequencies, and positions further to the right correspond to higher frequencies. In 6130, lines that pass through the caption area have far more values distributed in the high-frequency region than other lines do.

Reference numeral 6150 denotes the result of removing the low-frequency region from 6130 with the HPF. That is, 6150 is an image in which only the high-frequency region remains.

Finally, reference numeral 6170 denotes the result of performing the IDCT on 6150. Converting back to the spatial domain through the IDCT yields, as in 6170, a caption candidate area in which most of the background has been removed and only the caption area and areas with strong edges remain.

Because performing the DCT, HPF, and IDCT on each of the Y, Cb, and Cr images allows a caption candidate area that is missed in the Y image to be detected in the other two images, some embodiments may perform the DCT, HPF, and IDCT on each of the Y, Cb, and Cr images.

FIG. 7 is another diagram illustrating a process of obtaining a caption area according to an embodiment of the present invention.

Reference numeral 7110 denotes an input image, and 7130 and 7150 denote the caption candidate areas obtained by performing the DCT in the horizontal and vertical directions, respectively.

When the DCT is performed separately in the horizontal and vertical directions, each result contains only the horizontal components or only the vertical components of the characters, so a later step of combining the horizontal and vertical components may be required.

FIG. 8 illustrates a process of combining Y, Cb, and Cr images on which the DCT, HPF, and IDCT have each been performed, according to an embodiment of the present invention.

Reference numeral 8110 denotes a Y image on which the DCT, HPF, and IDCT have been performed, 8130 denotes a Cb image processed in the same way as 8110, and 8150 denotes a Cr image on which the DCT, HPF, and IDCT have been performed.

Reference numeral 8170 denotes the image obtained by combining, through the addition Y + Cb + Cr, the Y, Cb, and Cr images on which the DCT, HPF, and IDCT have each been performed. In this case, the caption detection apparatus 100 may compute the sum of the brightness values of the images and then normalize the sum to the range 0 to 255.

In some embodiments, the Y, Cb, and Cr images may be combined separately for the images on which the DCT was performed in the vertical direction and for the images on which the DCT was performed in the horizontal direction.

FIG. 9 illustrates a process of combining an image on which the DCT was performed in the vertical direction with an image on which the DCT was performed in the horizontal direction, according to an embodiment of the present invention.

Reference numeral 9110 may denote the image obtained by combining, through the addition Y + Cb + Cr, the Y, Cb, and Cr images on which the DCT, HPF, and IDCT were each performed in the vertical direction, and 9130 may denote the image obtained in the same way from the Y, Cb, and Cr images processed in the horizontal direction.

Reference numeral 9150 may denote the image obtained by combining the images 9110 and 9130 through a multiplication operation.

In some embodiments, because the two images contain only the horizontal components and only the vertical components of the caption, respectively, the caption area might not be detected when the two images are combined; a Gaussian blur may therefore be applied before the brightness values of the two images are multiplied. The Gaussian blur is described in detail below with reference to FIG. 10.

Brightness-value multiplication has the property that the increase in brightness produced by multiplying high brightness values is larger than the increase produced by multiplying low brightness values.

FIG. 10 is a diagram showing the effect of the Gaussian blur according to an embodiment of the present invention.

Reference numeral 10110 denotes an input image, and 10130 may denote the result of adding the image containing only horizontal components and the image containing only vertical components without performing a Gaussian blur.

Reference numeral 10150 denotes the result of adding the image containing only horizontal components and the image containing only vertical components after performing a Gaussian blur, and 10170 denotes the result of multiplying those two images after performing a Gaussian blur.

In this specification, the Gaussian blur may refer to a method of applying a Gaussian function to an image so that, much like a mosaic effect, the area to be obscured is rendered soft and blurry.

Comparing 10130, 10150, and 10170 shows that in 10170 (multiplication after the Gaussian blur) the caption area is emphasized more strongly and the other object areas are suppressed more effectively than in 10130 (addition without the Gaussian blur) and 10150 (addition after the Gaussian blur), so that the caption area can be obtained most efficiently.

Accordingly, in some embodiments, the multiplication operation may be performed after the Gaussian blur.
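The effect of blurring before multiplying can be sketched as follows. The kernel parameters (`sigma`, `radius`), the edge handling by clamping, and the division by 255 that keeps the product within the brightness range are all assumptions for illustration; the function names are hypothetical.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(img, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    w = len(img[0])
    return [[sum(kernel[j + radius] * row[min(max(x + j, 0), w - 1)]
                 for j in range(-radius, radius + 1))
             for x in range(w)] for row in img]


def transpose(img):
    return [list(col) for col in zip(*img)]


def gaussian_blur(img, sigma=1.0, radius=2):
    """Separable Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma, radius)
    return transpose(blur_rows(transpose(blur_rows(img, kernel)), kernel))


def multiply(a, b, max_value=255.0):
    """Pixel-wise brightness product, scaled back into 0..max_value."""
    return [[pa * pb / max_value for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]
```

If a horizontal stroke and a vertical stroke of the same character lie next to each other without sharing a pixel, `multiply(h, v)` is zero everywhere, while `multiply(gaussian_blur(h), gaussian_blur(v))` is positive where the blurred strokes overlap.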

FIG. 11 is a diagram illustrating a process of finally obtaining a caption area by combining the original-size image with the down-sampled image, according to an embodiment of the present invention.

Reference numeral 11110 denotes the caption candidate area obtained from the original-size image, 11130 denotes the caption candidate area obtained from the image down-sampled by 1/2, and 11150 may denote the result of adding 11110 and the image obtained by enlarging 11130 to the original size.

Because the size of the caption differs with the size of the image, at some image sizes the caption area may not be emphasized sufficiently even after a Gaussian blur, and detection may fail.

In this case, detecting captions at different sizes and then adding the results makes it likely that a caption area missed at one size is detected at another, so the caption area can be obtained more effectively.

FIG. 12 is a diagram illustrating a process of obtaining and binarizing a caption area in an image with low complexity, according to an embodiment of the present invention.

Reference numeral 12110 may correspond to an image whose complexity is relatively low in view of its background and the like. An image with low complexity may have a relatively small entropy in Equation 1 (for example, an entropy of 0.982). Accordingly, the threshold that separates black from white during binarization may also be relatively small.

Reference numeral 12130 may be the final caption area obtained for 12110.

Reference numeral 12150 may be an image obtained by binarizing 12130. Binarization may refer to a series of processes that, in order to detect the emphasized portions, represent an image in black and white only by applying a threshold that is either calculated by Equation 1 or arbitrarily specified.

Binarization may discard the relatively less emphasized portions of the image area obtained as the final caption area and leave only the more strongly emphasized portions as the caption area.

FIG. 13 is a diagram illustrating a process of obtaining and binarizing a caption area in an image with high complexity, according to an embodiment of the present invention.

Reference numeral 13110 may correspond to an image whose complexity is relatively high in view of its background and the like. An image with high complexity may have a relatively large entropy in Equation 1 (for example, an entropy of 5.478). Accordingly, the threshold that separates black from white during binarization may also be large.

Reference numeral 13130 may be the final caption area obtained for 13110. In 13130, the spectator seating, although a non-caption area, can be seen to have been obtained as a caption area.

Reference numeral 13150 may be an image obtained by binarizing 13130. A relatively large threshold may be applied here, and as a result much of the spectator seating, which was obtained as a caption area in 13130 despite being a non-caption area, can be seen to have been removed.
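Complexity-dependent binarization, as contrasted in FIGS. 12 and 13, can be sketched as follows. Equation 1 is not reproduced in this text, so the sketch does not attempt to implement it: it assumes the Shannon entropy of the gray-level histogram as the complexity measure, and the cutoff (`entropy_cutoff`) and the two thresholds (`low`, `high`) are hypothetical values chosen only to show the selection logic.

```python
import math
from collections import Counter


def shannon_entropy(img):
    """Entropy in bits of the image's gray-level histogram (assumed measure)."""
    counts = Counter(p for row in img for p in row)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def choose_threshold(img, entropy_cutoff=3.0, low=96, high=160):
    """Pick the higher threshold for complex images, the lower one otherwise.

    entropy_cutoff, low, and high are illustrative, not values from the source.
    """
    return high if shannon_entropy(img) >= entropy_cutoff else low


def binarize(img, threshold):
    """Map pixels at or above the threshold to white (255), others to black (0)."""
    return [[255 if p >= threshold else 0 for p in row] for row in img]
```

With these illustrative values, a two-level image (entropy of 1 bit) is binarized with the lower threshold, while an image spread evenly over sixteen gray levels (entropy of 4 bits) is binarized with the higher one, which removes more of the weakly emphasized background.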

FIG. 14 is a diagram illustrating a process of obtaining a caption area and detecting a caption by post-processing the obtained caption area, according to an embodiment of the present invention.

Reference numeral 14110 shows the caption areas obtained from the original image, 14130 is an enlarged view of one particular caption area in 14110, 14150 is the image obtained by performing a stroke width transform (SWT) on 14130, and 14170 may be the image showing the result of post-processing 14150.

Once the predetermined operation has been performed on the original image and the resulting caption area has been binarized, the caption detection apparatus 100 may find connected components through labeling. The minimum bounding rectangle (MBR) of each connected component found is then computed and, after a final filtering step, the remaining MBRs may become the final caption areas (14110).
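Labeling, MBR computation, and filtering can be sketched as follows. The sketch assumes 4-connectivity and a simple minimum-area filter (`min_area`); the text does not specify the connectivity or the filtering criteria, so both are illustrative, and `bounding_rectangles` is a hypothetical name.

```python
from collections import deque


def bounding_rectangles(binary, min_area=1):
    """Return (top, left, bottom, right) for each 4-connected white component.

    Rectangles whose area is below min_area are filtered out.
    """
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    rects = []
    for sy in range(h):
        for sx in range(w):
            if not binary[sy][sx] or seen[sy][sx]:
                continue
            # Breadth-first search over one connected component.
            seen[sy][sx] = True
            queue = deque([(sy, sx)])
            top, left, bottom, right = sy, sx, sy, sx
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if (bottom - top + 1) * (right - left + 1) >= min_area:
                rects.append((top, left, bottom, right))
    return rects
```

Each returned rectangle can then be cropped from the original frame and handed to the per-MBR caption extraction described next.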

Reference numeral 14130 may be one example of an MBR remaining as a final caption area. Once the final caption area has been obtained, the caption detection apparatus 100 may perform a caption extraction process on that caption area.

In some embodiments, the caption extraction process may be performed separately for each MBR in consideration of execution speed.

To extract the caption, the caption detection apparatus 100 may extract the characters (text) by performing the SWT on each MBR (image).

The SWT is an algorithm that analyzes the edges of an image to find connected components having the same stroke width. By its structure, the SWT algorithm can detect captions in only one of two ways: finding dark captions in a bright area, or finding bright captions in a dark area.

Accordingly, the caption detection apparatus 100 may use the obtained binarized image to compare the average brightness of the original image over the white area of the binarized image with its average brightness over the black area. If the average brightness over the white area is higher, the method of finding bright captions in a dark area may be used; otherwise, the method of finding dark captions in a bright area may be used.
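The polarity decision can be sketched as follows. The function name and the returned labels `"bright_on_dark"` and `"dark_on_bright"` are hypothetical; the SWT itself is not implemented here.

```python
def choose_swt_polarity(gray, binary):
    """Decide which SWT search direction to use for one caption area.

    gray:   brightness values of the original image
    binary: binarized image of the same size (non-zero = white)
    """
    white = [g for grow, brow in zip(gray, binary)
             for g, b in zip(grow, brow) if b]
    black = [g for grow, brow in zip(gray, binary)
             for g, b in zip(grow, brow) if not b]
    if not white or not black:
        raise ValueError("binary image must contain both white and black pixels")
    white_mean = sum(white) / len(white)
    black_mean = sum(black) / len(black)
    # Brighter pixels under the white mask suggest bright text on a dark background.
    return "bright_on_dark" if white_mean > black_mean else "dark_on_bright"
```

The result would select which of the two SWT variants runs on the MBR, so that only one pass is needed per caption area.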

However, characters detected with the SWT, as in 14150, may be incomplete. In some embodiments, therefore, the lost character regions may be restored through post-processing.

The post-processing may be a process of restoring, using a probability-distribution histogram, the character regions lost from the caption detected through the SWT. In some embodiments, the histogram may be constructed by quantizing the 2²⁴ colors, formed from 256 levels each of R, G, and B, to 8 levels per channel, for a total of 512 colors.
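The 512-color quantization can be sketched as follows. Uniform quantization, with 32 consecutive levels per bin, is an assumption; the text states only that each channel is reduced to 8 levels. How the histogram is then used to restore lost character regions is not described in this passage and is not shown. The function names are hypothetical.

```python
from collections import Counter


def quantize_color(r, g, b):
    """Map an 8-bit RGB color to one of 8 * 8 * 8 = 512 bins.

    Each channel keeps its top 3 bits (256 levels -> 8 levels).
    """
    return (r >> 5) * 64 + (g >> 5) * 8 + (b >> 5)


def color_probability_histogram(pixels):
    """Relative frequency of each occupied color bin among the given pixels."""
    counts = Counter(quantize_color(*p) for p in pixels)
    total = len(pixels)
    return {bin_index: n / total for bin_index, n in counts.items()}
```

Built from the pixels the SWT did mark as text, such a histogram estimates how likely each quantized color is to belong to a character.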

After the post-processing, complete characters such as those in 14170 can be obtained.

Because the detected caption is text, the detected caption can also be used to search the web for information related to it.

FIG. 15 is a diagram illustrating the overall process of detecting a caption according to an embodiment of the present invention.

To detect a caption in the original image 1510, the caption detection apparatus 100 may select, through a frame sampling operation 1520, the frames on which caption detection is to be performed.

Once a frame has been selected, the caption detection apparatus 100 may convert the original-size image 1530 and the image 1540 down-sampled by 1/2 into YCbCr channels.

At 1550, the caption detection apparatus 100 may perform a one-dimensional DCT in the horizontal direction on each of the separated Y, Cb, and Cr images, remove the low-frequency band with the HPF, and restore the images to the spatial domain with the IDCT. It may then add the resulting Y, Cb, and Cr images to obtain a first temporary caption area.

At 1560, the caption detection apparatus 100 may perform a one-dimensional DCT in the vertical direction on each of the separated Y, Cb, and Cr images, remove the low-frequency band with the HPF, and restore the images to the spatial domain with the IDCT. It may then add the resulting Y, Cb, and Cr images to obtain a second temporary caption area.

The caption detection apparatus 100 may also multiply the first temporary caption area by the second temporary caption area to obtain the caption area of the original-size image.

At 1570, the caption detection apparatus 100 may perform a one-dimensional DCT in the horizontal direction on each of the Y, Cb, and Cr images separated from the image obtained by down-sampling the original image by 1/2, remove the low-frequency band with the HPF, and restore the images to the spatial domain with the IDCT. It may then add the resulting Y, Cb, and Cr images to obtain a third temporary caption area.

At 1580, the caption detection apparatus 100 may perform a one-dimensional DCT in the vertical direction on each of the Y, Cb, and Cr images separated from the image obtained by down-sampling the original image by 1/2, remove the low-frequency band with the HPF, and restore the images to the spatial domain with the IDCT. It may then add the resulting Y, Cb, and Cr images to obtain a fourth temporary caption area.

The caption detection apparatus 100 may also multiply the third temporary caption area by the fourth temporary caption area to obtain the caption area of the image obtained by down-sampling the original image by 1/2.

The caption detection apparatus 100 may obtain the final caption area by adding the caption area of the original-size image and the caption area of the image obtained by down-sampling the original image by 1/2.

The caption detection apparatus 100 may detect the caption by performing the SWT and the post-processing on the final caption area.

Meanwhile, the present invention can be implemented by storing computer-readable code on a computer-readable storage medium. The computer-readable storage medium includes all types of storage devices that store data readable by a computer system.

The computer-readable code is configured to perform the steps of the image processing method according to the present invention when it is read from the computer-readable storage medium and executed by a processor. The computer-readable code may be implemented in various programming languages. Functional programs, code, and code segments for implementing the embodiments of the present invention can be readily programmed by those of ordinary skill in the art to which the present invention pertains.

Examples of the computer-readable storage medium include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, and also include implementation in the form of a carrier wave (for example, transmission over the Internet). The computer-readable storage medium may also be distributed over computer systems connected by a network, so that the computer-readable code is stored and executed in a distributed manner.

The foregoing description of the present invention is provided for illustration, and those of ordinary skill in the art to which the present invention pertains will understand that it can readily be modified into other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not limiting. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present invention is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

Claims

A caption detection method comprising:
obtaining a caption area of a first frame from the first frame, which is extracted from among frames constituting an image, and obtaining a caption area of a second frame from the second frame;
when it is determined that the caption area of the first frame and the caption area of the second frame are the same, predicting that all frames between the first frame and the second frame include the same caption area as the first frame; and
when it is determined that the caption area of the first frame and the caption area of the second frame are not the same, obtaining a caption area of an intermediate frame positioned between the first frame and the second frame, and, when it is determined that the caption area of the first frame and the caption area of the intermediate frame are the same, predicting that all frames between the first frame and the intermediate frame include the same caption area as the first frame,
wherein the obtaining of the caption area of the first frame comprises:
obtaining a first temporary caption area by performing a predetermined operation on the first frame;
generating a second temporary caption area by performing the predetermined operation on an image obtained by down-sampling the first frame;
obtaining an integrated image by performing an addition operation on the first temporary caption area and the second temporary caption area;
binarizing the integrated image; and
generating a final caption area by removing noise and non-caption areas from the binarized image.
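
The frame-interval prediction recited in this claim can be illustrated with a minimal Python sketch. The function name `predict_caption_areas` and the callback `get_area` are hypothetical and do not appear in the patent. The claim only specifies the comparison between the first frame and the intermediate frame; choosing the midpoint as the intermediate frame and recursing into both halves are assumptions made here so that every frame in the interval receives a label.

```python
from typing import Callable, Dict


def predict_caption_areas(
    first: int,
    second: int,
    get_area: Callable[[int], object],
) -> Dict[int, object]:
    """Label every frame index in [first, second] with a caption area.

    get_area is a hypothetical, possibly expensive detector that returns a
    comparable description of the caption area of one frame.
    """
    areas: Dict[int, object] = {}
    cache: Dict[int, object] = {}

    def area_of(index: int) -> object:
        # Run the detector at most once per frame.
        if index not in cache:
            cache[index] = get_area(index)
        return cache[index]

    def solve(lo: int, hi: int) -> None:
        a_lo, a_hi = area_of(lo), area_of(hi)
        areas[lo], areas[hi] = a_lo, a_hi
        if hi - lo <= 1:
            return  # no frames in between
        if a_lo == a_hi:
            # Same caption area at both ends: predict it for all frames between.
            for i in range(lo + 1, hi):
                areas[i] = a_lo
            return
        # Different areas: examine an intermediate frame (assumed: the midpoint)
        # and repeat on both halves (assumed extension of the claim).
        mid = (lo + hi) // 2
        solve(lo, mid)
        solve(mid, hi)

    solve(first, second)
    return areas
```

Under these assumptions the detector runs only on the sampled frames, so an interval in which the caption rarely changes needs far fewer detections than a frame-by-frame scan.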

(Deleted)

The method of claim 1,
wherein, in the binarizing,
when the complexity of the first frame is greater than or equal to a predetermined value, binarization is performed by applying a first threshold, and when the complexity is less than the predetermined value, binarization is performed by applying a second threshold, the first threshold being greater than the second threshold.
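
The complexity-dependent binarization can be sketched as follows, under the reading that the first threshold is the larger of the two. The claim does not define how complexity is measured, so `frame_complexity` below (mean absolute difference between horizontally adjacent pixels) is a hypothetical stand-in, and the default values of `complexity_limit`, `first_threshold`, and `second_threshold` are illustrative only.

```python
from typing import List

Image = List[List[float]]


def frame_complexity(frame: Image) -> float:
    """Hypothetical complexity measure: mean absolute horizontal difference."""
    total, count = 0.0, 0
    for row in frame:
        for left, right in zip(row, row[1:]):
            total += abs(right - left)
            count += 1
    return total / count if count else 0.0


def binarize(
    integrated: Image,
    frame: Image,
    complexity_limit: float = 20.0,   # assumed "predetermined value"
    first_threshold: float = 128.0,   # assumed; used for complex frames
    second_threshold: float = 64.0,   # assumed; used for simple frames
) -> List[List[int]]:
    """Binarize the integrated image with a threshold chosen by frame complexity."""
    if first_threshold <= second_threshold:
        raise ValueError("first_threshold must be greater than second_threshold")
    if frame_complexity(frame) >= complexity_limit:
        threshold = first_threshold
    else:
        threshold = second_threshold
    return [[1 if value >= threshold else 0 for value in row] for row in integrated]
```

A busy frame therefore has to clear a higher bar before a pixel is kept as a caption candidate, while a flat frame is binarized more permissively.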

The method of claim 1,
wherein the obtaining of the first temporary caption area comprises:
converting the first frame into a YCbCr color space;
performing, in the converted YCbCr color space, a vertical-direction DCT (Discrete Cosine Transform) on each of Y, Cb, and Cr to remove a low-frequency region, and combining the Y, Cb, and Cr from which the low-frequency region has been removed to convert them into a vertical-direction RGB color space;
performing, in the converted YCbCr color space, a horizontal-direction DCT (Discrete Cosine Transform) on each of Y, Cb, and Cr to remove the low-frequency region, and combining the Y, Cb, and Cr from which the low-frequency region has been removed to convert them into a horizontal-direction RGB color space; and
combining the vertical-direction RGB color space and the horizontal-direction RGB color space through a multiplication operation.
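
The directional DCT filtering and multiplicative combination can be sketched on a single channel as shown below. This is a simplified illustration, not the claimed procedure in full: the YCbCr conversion and the recombination of Y, Cb, and Cr into RGB are omitted, the `cutoff` parameter (number of low-frequency coefficients zeroed) is an assumption, and taking magnitudes before multiplying is also an assumption. All function names are hypothetical.

```python
import math
from typing import List

Channel = List[List[float]]


def dct(x: List[float]) -> List[float]:
    """Orthonormal 1-D DCT-II."""
    n = len(x)
    return [
        math.sqrt((1.0 if k == 0 else 2.0) / n)
        * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i, v in enumerate(x))
        for k in range(n)
    ]


def idct(c: List[float]) -> List[float]:
    """Inverse of dct (orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(
            math.sqrt((1.0 if k == 0 else 2.0) / n)
            * v
            * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k, v in enumerate(c)
        )
        for i in range(n)
    ]


def remove_low_freq(signal: List[float], cutoff: int) -> List[float]:
    """Zero the first `cutoff` DCT coefficients and transform back."""
    coeffs = dct(signal)
    for k in range(min(cutoff, len(coeffs))):
        coeffs[k] = 0.0
    return idct(coeffs)


def horizontal_highpass(channel: Channel, cutoff: int) -> Channel:
    # Horizontal-direction DCT: filter each row independently.
    return [remove_low_freq(row, cutoff) for row in channel]


def vertical_highpass(channel: Channel, cutoff: int) -> Channel:
    # Vertical-direction DCT: filter each column, then transpose back.
    columns = [remove_low_freq(list(col), cutoff) for col in zip(*channel)]
    return [list(row) for row in zip(*columns)]


def combine_by_multiplication(channel: Channel, cutoff: int = 1) -> Channel:
    """Multiply the magnitudes of the vertical and horizontal responses."""
    vert = vertical_highpass(channel, cutoff)
    horiz = horizontal_highpass(channel, cutoff)
    return [
        [abs(a) * abs(b) for a, b in zip(row_v, row_h)]
        for row_v, row_h in zip(vert, horiz)
    ]
```

Because the two responses are multiplied, a pixel scores highly only when it shows high-frequency content in both directions; a pattern of pure horizontal stripes, for example, yields a zero horizontal response and is suppressed.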

The method of claim 4,
wherein the combining through the multiplication operation comprises
performing blurring on the vertical-direction RGB color space and the horizontal-direction RGB color space.
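
A minimal sketch of blurring the two directional responses and then multiplying them is given below. The claim does not name a blur kernel, so the 3x3 box filter is an assumption, and the function names `box_blur` and `blur_then_multiply` are hypothetical. Single-channel images are used for brevity.

```python
from typing import List

Image = List[List[float]]


def box_blur(image: Image) -> Image:
    """3x3 box blur (assumed kernel); border pixels average the neighbours that exist."""
    height, width = len(image), len(image[0])
    blurred: Image = []
    for y in range(height):
        row = []
        for x in range(width):
            total, count = 0.0, 0
            for ny in range(max(0, y - 1), min(height, y + 2)):
                for nx in range(max(0, x - 1), min(width, x + 2)):
                    total += image[ny][nx]
                    count += 1
            row.append(total / count)
        blurred.append(row)
    return blurred


def blur_then_multiply(vertical: Image, horizontal: Image) -> Image:
    """Blur both directional responses, then combine them element-wise by multiplication."""
    blurred_v = box_blur(vertical)
    blurred_h = box_blur(horizontal)
    return [
        [a * b for a, b in zip(row_v, row_h)]
        for row_v, row_h in zip(blurred_v, blurred_h)
    ]
```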

The method of claim 1,
wherein the generating of the final caption area further comprises
detecting a caption from the generated final caption area.

A caption detection apparatus comprising:
a frame extracting unit for extracting a predetermined number of frames from among frames constituting an image;
a caption area obtaining unit for obtaining a caption area of a first frame from the first frame among the extracted frames and a caption area of a second frame from the second frame; and
a control unit that, when it is determined that the caption area of the first frame and the caption area of the second frame are the same, predicts that all frames between the first frame and the second frame include the same caption area as the first frame, and,
when it is determined that the caption area of the first frame and the caption area of the second frame are not the same, obtains a caption area of an intermediate frame positioned between the first frame and the second frame, and, when it is determined that the caption area of the first frame and the caption area of the intermediate frame are the same, predicts that all frames between the first frame and the intermediate frame include the same caption area as the first frame,
wherein the caption area obtaining unit
obtains a first temporary caption area by performing a predetermined operation on the first frame, obtains a second temporary caption area by performing the predetermined operation on an image obtained by down-sampling the first frame, obtains an integrated image by performing an addition operation on the first temporary caption area and the second temporary caption area, binarizes the integrated image, and generates a final caption area by removing noise and non-caption areas from the binarized image.
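
The two-scale processing performed by the caption area obtaining unit can be sketched as follows. The "predetermined operation" is passed in as a hypothetical callable `operation`; the down-sampling factor, the nearest-neighbour up-sampling used to bring the second result back to full size before the addition, and the fixed threshold are all assumptions that the claim does not specify. Noise and non-caption area removal are omitted.

```python
from typing import Callable, List

Image = List[List[float]]


def downsample(image: Image, factor: int = 2) -> Image:
    """Keep every `factor`-th pixel in both directions."""
    return [row[::factor] for row in image[::factor]]


def upsample(image: Image, factor: int, height: int, width: int) -> Image:
    """Nearest-neighbour resize back to height x width (assumed step)."""
    last_y, last_x = len(image) - 1, len(image[0]) - 1
    return [
        [image[min(y // factor, last_y)][min(x // factor, last_x)] for x in range(width)]
        for y in range(height)
    ]


def integrate(frame: Image, operation: Callable[[Image], Image], factor: int = 2) -> Image:
    """Add the operation's result at full resolution and at reduced resolution."""
    first_temp = operation(frame)
    second_small = operation(downsample(frame, factor))
    second_temp = upsample(second_small, factor, len(frame), len(frame[0]))
    return [
        [a + b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(first_temp, second_temp)
    ]


def binarize(image: Image, threshold: float) -> List[List[int]]:
    """Mark pixels at or above the threshold as caption candidates."""
    return [[1 if value >= threshold else 0 for value in row] for row in image]
```

Running the same operation at two scales and adding the results lets both small and large text strokes contribute to the integrated image before binarization.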

(Deleted)

The apparatus of claim 7,
wherein, in the binarizing,
when the complexity of the first frame is greater than or equal to a predetermined value, binarization is performed by applying a first threshold, and when the complexity is less than the predetermined value, binarization is performed by applying a second threshold, the first threshold being greater than the second threshold.

The apparatus of claim 7,
wherein the caption area obtaining unit
converts the first frame into a YCbCr color space,
performs, in the converted YCbCr color space, a vertical-direction DCT (Discrete Cosine Transform) on each of Y, Cb, and Cr to remove a low-frequency region, and combines the Y, Cb, and Cr from which the low-frequency region has been removed to convert them into a vertical-direction RGB color space,
performs, in the converted YCbCr color space, a horizontal-direction DCT (Discrete Cosine Transform) on each of Y, Cb, and Cr to remove the low-frequency region, and combines the Y, Cb, and Cr from which the low-frequency region has been removed to convert them into a horizontal-direction RGB color space, and
combines the vertical-direction RGB color space and the horizontal-direction RGB color space through a multiplication operation.

The apparatus of claim 10,
wherein the caption area obtaining unit
performs blurring on the vertical-direction RGB color space and the horizontal-direction RGB color space before performing the multiplication operation.

The apparatus of claim 7,
wherein the control unit
further has a function of extracting a caption from the generated final caption area.

A computer-readable recording medium storing computer program code that, when read and executed by a processor, performs a caption detection method, the caption detection method comprising:
obtaining a caption area of a first frame from the first frame, which is extracted from among frames constituting an image, and obtaining a caption area of a second frame from the second frame;
when it is determined that the caption area of the first frame and the caption area of the second frame are the same, predicting that all frames between the first frame and the second frame include the same caption area as the first frame; and
when it is determined that the caption area of the first frame and the caption area of the second frame are not the same, obtaining a caption area of an intermediate frame positioned between the first frame and the second frame, and, when it is determined that the caption area of the first frame and the caption area of the intermediate frame are the same, predicting that all frames between the first frame and the intermediate frame include the same caption area as the first frame,
wherein the obtaining of the caption area of the first frame comprises:
obtaining a first temporary caption area by performing a predetermined operation on the first frame;
generating a second temporary caption area by performing the predetermined operation on an image obtained by down-sampling the first frame;
obtaining an integrated image by performing an addition operation on the first temporary caption area and the second temporary caption area;
binarizing the integrated image; and
generating a final caption area by removing noise and non-caption areas from the binarized image.