KR19990047501A - Method for extracting and recognizing news video subtitles

Info

Publication number: KR19990047501A
Application number: KR1019970065943A
Authority: KR
Inventors: 배영래; 이재연; 정세윤; 전병태; 왕민
Original assignee: 정선종; 한국전자통신연구원
Priority date: 1997-12-04
Filing date: 1997-12-04
Publication date: 1999-07-05
Also published as: KR100243350B1

Abstract

The present invention relates to a method for automatically extracting and recognizing subtitles from video images. Conventional systems rely on simple index-based retrieval, which cannot express the varied meanings of an image and therefore fails to satisfy the diverse search needs of users.

To address this, when one frame of the video input image (a grayscale image) arrives, the invention extracts the locations in the full image that show signs of containing a news subtitle, and then checks whether each extracted candidate region persists across consecutive frames in order to select the final candidate regions. To confirm that an extracted candidate region is a true subtitle region, large-unit region grouping and verification are performed. Finally, the individual character regions obtained during region grouping are passed to a recognition means in the extracted subtitle recognition step. This makes real-time extraction and recognition of news video subtitles possible.

Description

Method for extracting and recognizing news video subtitles

The present invention relates to a method for automatically extracting and recognizing subtitles from video images. More specifically, it concerns a video (or image) database technology, a branch of multimedia databases, that locates subtitle regions in real time.

Conventional video (or image) database technology attaches a simple index to each video, stores the video as a compressed file, and retrieves information by index search in response to user requests. Such simple index retrieval cannot express the varied meanings of the video and therefore fails to satisfy users' diverse search needs. The problem can be solved by extracting semantic information from the video, storing it as metadata, and searching that metadata. In news video, the important semantic information includes subtitle information, color information, and information about the people shown.

One existing subtitle extraction method applies region split-and-merge to the entire image to extract subtitle characters. Another divides the entire image into fixed regions and treats the parts of each region where the intensity changes sharply as subtitle regions. Both methods process the entire image, so they are slow and unreliable.

Extracting the semantic information of the video (subtitles, color, motion, shape, and so on) is one of the key steps in building a content-based video (news video) database system. Subtitle information is especially important in news video, because subtitles are used to emphasize what the anchor or reporter is explaining.

Accordingly, to solve these problems, the object of the present invention is to provide a method that extracts subtitles from large volumes of news video in real time by extracting subtitle regions without processing the entire image.

FIG. 1 is a system configuration diagram for news video subtitle extraction according to the present invention.

FIG. 2 is a flowchart of subtitle candidate region extraction according to the present invention.

FIG. 3 is a flowchart of subtitle region verification and region grouping according to the present invention.

FIG. 4 is a flowchart of large-unit region extraction according to the present invention.

FIG. 5 is a flowchart of large-unit character region verification and per-character region segmentation according to the present invention.

* Description of reference numerals for the main parts of the drawings *

100: news video input image
200: subtitle candidate region extraction step
300: subtitle region verification and region grouping step
400: extracted subtitle recognition step

To achieve the above object, the present invention determines the final candidate regions by considering the continuity between the subtitle candidate regions extracted from each frame. Verification is then performed to decide whether each final candidate region is a true subtitle region. The verification uses a large-unit region grouping method that clusters subtitle characters, so that the characters of each subtitle word are grouped into a region and then verified. The grouping method first sorts the extracted character components in ascending order of their min-x coordinate. The first component becomes the first large-unit region. The second component is tested against the existing large-unit region: if it meets the membership condition it joins that region, and if it does not, a new large-unit region is created. Repeating this for every component yields large-unit regions made up of subtitle words. Some large-unit regions that satisfy the subtitle feature conditions may still be non-subtitle regions. To remove them, the feature conditions of each large-unit region are compared, and regions unsuitable as subtitle text are discarded. Because non-subtitle regions may remain even among the regions that pass, the true subtitle regions are extracted through per-character region segmentation and a further condition check.

FIG. 1 shows the overall flow of the news video subtitle extraction method according to the present invention. As shown, the method comprises: a candidate region extraction step (200) which, when one frame of the video input image (grayscale image) (100) arrives, extracts the locations in the full image that show signs of containing a news subtitle and checks whether each extracted candidate region persists across consecutive frames to determine the final candidate regions; a subtitle region verification and region grouping step (300) which performs large-unit region grouping and verification to confirm whether each extracted candidate region is a true subtitle region; and an extracted subtitle recognition step (400) which applies a recognition means to the individual character regions obtained during region grouping.

The news video subtitle extraction and recognition method configured as above proceeds as follows. When one frame of the video input image (grayscale image) arrives (100), the locations in the full image that show signs of containing a news subtitle are extracted, and each extracted candidate region is checked for persistence across consecutive frames to obtain the final candidate regions (200). Large-unit region grouping and verification are then performed to confirm whether each extracted candidate region is a true subtitle region (300). This subtitle region verification and region grouping uses the RGB color information of the news video input image. Finally, the extracted subtitles are recognized by applying a recognition means to the individual character regions obtained during region grouping (400).

FIG. 2 shows the candidate region extraction method according to the present invention. Candidate regions are extracted with an existing candidate region extraction method, and each extracted candidate is then checked for persistence across consecutive frames to determine the final candidate regions. As shown, candidate regions are first extracted from consecutive frames per time step (210); the candidates extracted from the frame at time t are written F_t(x, y)_{1~n}, where n is the number of candidates extracted at time t. That is, the candidates at times t-1 and t are extracted first, and the final candidates for time t are decided once time t+1 is reached. Each candidate at time t, F_t(x, y)_{1~n}, is compared against all candidates at time t-1, F_{t-1}(x, y)_{1~s}, and all candidates at time t+1, F_{t+1}(x, y)_{1~k} (230). If it matches one of them, its coordinates are taken as a final candidate region (240). If it does not match, the candidate index is incremented and the comparison is repeated for the next candidate (250, 220). Once all candidates at time t have been examined, the candidates at time t+1 become the candidates at time t (260).
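The continuity check of FIG. 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say how two candidate regions are judged to "match", so the sketch compares bounding-box corners and exposes a pixel tolerance `tol` (0 means exact equality). The names `Box`, `boxes_match`, and `confirm_candidates` are hypothetical and do not appear in the original.

```python
from typing import List, Tuple

# Assumed representation of a candidate region: (min_x, min_y, max_x, max_y).
Box = Tuple[int, int, int, int]


def boxes_match(a: Box, b: Box, tol: int = 0) -> bool:
    """Assumed matching rule: every corner coordinate differs by at most tol pixels."""
    return all(abs(p - q) <= tol for p, q in zip(a, b))


def confirm_candidates(prev: List[Box], curr: List[Box], nxt: List[Box],
                       tol: int = 0) -> List[Box]:
    """Keep each candidate of frame t that matches a candidate of frame t-1 or t+1."""
    neighbours = prev + nxt
    return [c for c in curr if any(boxes_match(c, n, tol) for n in neighbours)]


# Example: only the first candidate of frame t also appears in frame t-1.
prev_frame = [(10, 200, 300, 230)]
curr_frame = [(10, 200, 300, 230), (50, 50, 80, 70)]
next_frame = [(400, 10, 420, 20)]
final = confirm_candidates(prev_frame, curr_frame, next_frame)
```

After frame t has been processed, a caller would shift the window (the candidates of t+1 become those of t), mirroring step 260.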

FIG. 3 describes the step that groups the extracted candidate regions into regions and verifies whether they are true subtitle regions. Because subtitle characters in the news video input image take a variety of color values (RGB color), color is used to extract the subtitle characters (310). To simplify color processing, the RGB image is converted to an HSI image, and the subtitle characters are binarized using the characteristic values of H, S, and I (320). Labeling is performed on the binarized image to extract the coordinates of the character-level components (330). Large-unit region grouping then clusters the extracted character-level components (340). Finally, each extracted large-unit region is checked to see whether it is a true subtitle region, non-subtitle regions are removed, and per-character individual regions are extracted to obtain the subtitle region (350).
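Steps 320 and 330 can be illustrated with a small pure-Python sketch. The RGB-to-HSI conversion uses the common textbook formulas. The binarization rule is an assumption, since the text does not give the H, S, I values it relies on: here a pixel counts as a character pixel when its saturation is low and its intensity is high, which suits white captions only. The thresholds `s_max` and `i_min`, the 4-connectivity, and all function names are hypothetical.

```python
import math
from collections import deque


def rgb_to_hsi(r, g, b):
    """Convert 8-bit RGB to HSI: H in degrees [0, 360), S and I in [0, 1]."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    i = (r + g + b) / 3.0
    s = 0.0 if i == 0 else 1.0 - min(r, g, b) / i
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        h = 0.0  # hue is undefined for grays; 0 is used by convention
    else:
        theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
        h = 360.0 - theta if b > g else theta
    return h, s, i


def binarize(image, s_max=0.2, i_min=0.8):
    """Assumed rule for white captions: 1 where S <= s_max and I >= i_min, else 0."""
    mask = []
    for row in image:
        out = []
        for (r, g, b) in row:
            _, s, i = rgb_to_hsi(r, g, b)
            out.append(1 if (s <= s_max and i >= i_min) else 0)
        mask.append(out)
    return mask


def label_components(mask):
    """Return bounding boxes (min_x, min_y, max_x, max_y) of 4-connected components."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            min_x = max_x = x
            min_y = max_y = y
            while queue:  # breadth-first flood fill of one component
                cx, cy = queue.popleft()
                min_x, max_x = min(min_x, cx), max(max_x, cx)
                min_y, max_y = min(min_y, cy), max(max_y, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < width and 0 <= ny < height \
                            and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((min_x, min_y, max_x, max_y))
    return boxes
```

The bounding boxes returned by `label_components` play the role of the character-level components that the large-unit region grouping step (340) consumes.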

FIG. 4 shows the large-unit region extraction method. The extracted character-level components are first sorted in ascending order of their min-x coordinate, giving the sorted result R_{1~k} (341, 342). R_1 is unconditionally assigned to large-unit region LR_1 (343). R_2 is then tested for whether it can belong to LR_1 (345): if it can, it is added to LR_1 (346); if not, a new region LR_2 is created and R_2 is assigned to it (347). An arbitrary R_i can be assigned to one of LR_{1~m} when the y extent of R_i (min-y to max-y) overlaps the y extent (min-y to max-y) of some LR_s (1 ≤ s ≤ m) by 85% or more; in that case R_i is assigned to LR_s. This process is repeated over R_{1~k} to generate the regions LR_m (348, 344). The generated large-unit regions then undergo large-unit character region verification and per-character region segmentation, which confirms the subtitles and produces the individual regions (350).
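The grouping procedure of FIG. 4 might be sketched as follows. Two points are assumptions rather than statements from the text: the 85% overlap is measured relative to the height of the component R_i, and the y extent of a large-unit region is taken as the union of its members' extents. The names `LargeRegion`, `y_overlap_ratio`, and `group_components` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (min_x, min_y, max_x, max_y)


@dataclass
class LargeRegion:
    members: List[Box] = field(default_factory=list)

    @property
    def y_range(self) -> Tuple[int, int]:
        # Assumption: a region's y extent is the union of its members' extents.
        return (min(b[1] for b in self.members), max(b[3] for b in self.members))


def y_overlap_ratio(box: Box, y_range: Tuple[int, int]) -> float:
    """Fraction of the box's height that lies inside y_range (assumed definition)."""
    top = max(box[1], y_range[0])
    bottom = min(box[3], y_range[1])
    overlap = max(0, bottom - top + 1)
    return overlap / (box[3] - box[1] + 1)


def group_components(components: List[Box], min_overlap: float = 0.85) -> List[LargeRegion]:
    regions: List[LargeRegion] = []
    # Steps 341-342: sort components by min-x in ascending order.
    for box in sorted(components, key=lambda b: b[0]):
        for region in regions:
            # Steps 345-346: join the first region whose y extent overlaps enough.
            if y_overlap_ratio(box, region.y_range) >= min_overlap:
                region.members.append(box)
                break
        else:
            # Steps 343 and 347: the first component, or one that fits nowhere,
            # starts a new large-unit region.
            regions.append(LargeRegion([box]))
    return regions


# Example: three components on one text line and one far below it.
parts = [(40, 100, 60, 120), (10, 100, 30, 120), (70, 102, 90, 121), (15, 300, 35, 320)]
groups = group_components(parts)
```

Because the components are processed in min-x order, each region's members end up sorted left to right, which is convenient for the later per-character segmentation.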

FIG. 5 describes the large-unit character region verification and per-character region segmentation step in detail. For each large-unit region LR_i, the following region coefficients are computed (351, 352, 353).

Region size = (max-x - min-x + 1) * (max-y - min-y + 1)

Region aspect ratio = (max-y - min-y + 1) / (max-x - min-x + 1)

Character-to-background ratio = number of character pixels / region size
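As a quick illustration, the three coefficients can be computed directly from a region's bounding box and its character pixel count. The function name `region_coefficients`, the parameter `char_pixels`, and the dictionary keys are hypothetical; the arithmetic follows the three formulas as written.

```python
def region_coefficients(min_x, min_y, max_x, max_y, char_pixels):
    """Compute the size, aspect ratio, and character-to-background ratio of a region."""
    width = max_x - min_x + 1
    height = max_y - min_y + 1
    size = width * height
    return {
        "size": size,
        "aspect_ratio": height / width,     # height over width, as defined in the text
        "char_ratio": char_pixels / size,   # character pixels over region size
    }


# Example: a 100 x 20 pixel region containing 600 character pixels.
coeffs = region_coefficients(10, 100, 109, 119, 600)
```

Note that the aspect ratio is height over width, so a wide, single-line caption yields a value well below 1.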

If the region coefficients do not satisfy the subtitle conditions, the next large-unit region is examined (354, 358). If they do, individual character regions are extracted within the large-unit character region (354, 355). Individual character regions are extracted by combining the graphemes (jamo) of each character, that is, the initial consonant, medial vowel, and final consonant, into a single character unit. In news broadcasts, the news icon subtitles shown with the anchor consist of 4 to 9 characters, so a region is extracted as a subtitle region when it contains four or more individual character regions (356). The coordinates of the individual regions of the extracted LR_i are then stored (357).

A character recognizer then performs subtitle recognition on the individual character regions extracted in this way (400).

According to the present invention, subtitle regions can be extracted in real time without processing the entire image. Large volumes of news video subtitles can therefore be processed in real time, which reduces subtitle processing time for large video collections and in turn improves the quality of the service.

Claims

A candidate region extraction step of, when one frame of a video input image (grayscale image) is received, extracting the locations in the full image that show signs of containing a news subtitle, and determining the final candidate regions by checking whether each extracted candidate region persists across consecutive frames;

A subtitle region verification and region grouping step of extracting large-unit regions, extracting large-unit character regions, and extracting per-character individual regions to confirm whether the extracted subtitle candidate region is a true subtitle region;

And a subtitle recognition step of recognizing the extracted subtitles by applying a recognition means to the individual character regions obtained in the subtitle region verification and region grouping step.

The method of claim 1,

wherein the candidate region extraction step comprises:

extracting candidate regions from consecutive frames per time step (the candidate regions extracted from the frame at time t being F_t(x, y)_{1~n}, where n is the number of candidate regions extracted at time t), such that the candidate regions at times t-1 and t are extracted first and the final candidate regions for time t are determined at time t+1;

checking, for every candidate region F_t(x, y)_{1~n} at time t, whether it matches any candidate region F_{t-1}(x, y)_{1~s} at time t-1 or any candidate region F_{t+1}(x, y)_{1~k} at time t+1;

determining the corresponding coordinates as a final candidate region if they match, and advancing to the next candidate region if they do not; and

after all candidate regions at time t have been examined, changing the candidate regions at time t+1 into the candidate regions at time t.

The method of claim 1,

wherein the subtitle region verification and region grouping step comprises:

a binarization step of converting the RGB image into an HSI image in order to handle the varied color values of the subtitle characters in the news video input image, and binarizing the subtitle characters using the characteristic values of H, S, and I;

a labeling step of labeling the binarized image to extract the coordinates of character-level components;

a large-unit region extraction step of extracting large-unit regions that cluster the extracted character-level components; and

a large-unit character region verification and per-character region segmentation step of examining whether each extracted large-unit region is a true subtitle region, removing non-subtitle regions, and extracting per-character individual regions to obtain the subtitle region.

The method of claim 3,

wherein the large-unit region extraction step comprises:

obtaining a sorted result R_{1~k} by sorting the character-level components in ascending order of their min-x coordinate;

unconditionally assigning R_1 to large-unit region LR_1;

testing whether region R_2 can belong to large-unit region LR_1, adding R_2 to LR_1 if it can, and otherwise creating a new region LR_2 and assigning R_2 to it; and

repeating the above process over R_{1~k}, testing whether each R_i can be assigned to one of LR_{1~m}, to generate the regions LR_m.

The method of claim 4,

wherein, in generating LR_m,

the condition under which R_i can be assigned to one of LR_{1~m} is that

the y extent of R_i (min-y to max-y) overlaps the y extent (min-y to max-y) of some LR_s (1 ≤ s ≤ m) by 85% or more, in which case R_i is assigned to LR_s.

The method of claim 3,

wherein the large-unit character region verification and per-character region segmentation step comprises:

computing, for each large-unit region LR_i, region coefficients including the region size, the region aspect ratio, and the character-to-background ratio;

examining the next large-unit region if the region coefficients of LR_i do not satisfy the subtitle conditions, and extracting individual character regions within the large-unit character region if they do; and

extracting the region as a subtitle region if the number of extracted individual character regions is four or more, and storing the coordinates of the individual regions of LR_i.

The method of claim 6,

wherein the individual character region extraction step

extracts character regions by combining the graphemes (jamo) of each character, that is, the initial consonant, medial vowel, and final consonant, into a single character unit.