KR20010050596A

KR20010050596A - A Video Summary Description Scheme and A Method of Video Summary Description Generation for Efficient Overview and Browsing

Info

Publication number: KR20010050596A
Application number: KR1020000055781A
Authority: KR
Inventors: 김재곤; 장현성; 김문철; 김진웅
Original assignee: 오길록; 한국전자통신연구원
Priority date: 1999-10-11
Filing date: 2000-09-22
Publication date: 2001-06-15
Also published as: CN101398843B; EP1222634A1; CN1382288A; CN101398843A; WO2001027876A1; JP2003511801A; AU7689200A; EP1222634A4; CA2387404A1; KR100371813B1; CN100485721C; JP4733328B2

Abstract

PURPOSE: A hierarchical summary description scheme for efficient video overview and browsing, and a method and system for generating summary video description data according to the scheme, are provided so that the user can easily overview and browse a video, with support for user customization. CONSTITUTION: The hierarchical summary description scheme (201) includes at least one highlight level description scheme (202) and zero or one summary theme list description scheme (203). The summary theme list description scheme (203) lists the theme or event information by which the user can browse. The highlight level description scheme (202) includes highlight segment description schemes (204) and zero or more highlight level description schemes. The number of highlight segment description schemes (204) equals the number of segments composing the summary video of that level. Each highlight segment description scheme describes the information related to one summary video segment. It includes one video segment locator description scheme (205), zero or more image locator description schemes (206), and zero or more sound locator description schemes (207) and audio segment locator description schemes (208).

Description

A video summary description scheme for efficient video overview and browsing, and a method and system for generating summary video description data according to the scheme {A Video Summary Description Scheme and A Method of Video Summary Description Generation for Efficient Overview and Browsing}

The present invention relates to a summary video description scheme for efficient video overview and browsing. It also relates to a method and system for generating a summary video, by which a summary video is described according to that description scheme.

The technical field of the invention is content-based video indexing and browsing/retrieval, specifically summarizing a video on the basis of its content and describing that summary. Video summaries fall broadly into dynamic summaries and static summaries. The video description scheme according to the invention is intended to describe both kinds effectively within a single unified description scheme.

In general, existing summary videos and description schemes provide only information on the video intervals included in the summary video, so they are limited to conveying the content of the whole video by playing back the summary. In many cases, however, an overview of the whole content through the summary video is not enough: the user also needs to browse, so that parts of interest found through the overview can be examined again.

Moreover, an existing summary video gives the user only those video intervals judged important under criteria set by the summary video provider. If the user's criteria differ from the provider's, or the user has specific criteria of their own, the user cannot obtain a summary video of the desired form. In other words, existing summary videos are offered at several levels so that the user can pick a level, but the user cannot choose according to the content of the summary, so the range of choice is limited.

The patent entitled "method and apparatus for video browsing based on content and structure", registration number US5821945, provides a browsing function in which a video is represented compactly and video of the desired content is accessed through that representation. However, it is a static summary based on representative frames. Existing static summaries summarize a video using a representative frame of each video shot, and because a representative frame provides only the image information representing that shot, the information that such a summary can convey is limited. By contrast, the video description scheme and browsing method according to the present invention use a dynamic summary based on video segments.

The summary video description scheme proposed in "MPEG-7 Description Scheme (V0.5)", published in 1999 as ISO/IEC JTC1/SC29/WG11 MPEG-7 Output Document No. N2844, describes only the interval information of each video segment of a dynamic summary video. This provides the basic function of describing a dynamic summary, but it has the following problems. First, it provides no access from the summary segments that make up the summary video back to the original video. Users want to move from the overview given by the summary video to the original video in order to examine the content in more detail, and the prior art does not support this. Second, it does not adequately support the description of audio summaries. Finally, when an event-based summary is to be expressed, duplicated description and complex searching become unavoidable.

Accordingly, to remedy the above problems, the object of the present invention is to provide a hierarchical summary video description scheme, together with a method and system for generating summary video description data using that scheme. The scheme includes, along with the summary video, representative frame information and representative sound information for each video interval in the summary video, and it enables both a user-customized, event-based summary, in which the user can choose among the contents of the summary video, and effective browsing.

FIG. 1 is a block diagram of a system for generating summary video description data according to the description scheme (DS) of the present invention.

FIG. 2 shows, in UML (Unified Modeling Language), the data structure of the hierarchical description scheme for describing a summary video according to the present invention.

FIG. 3 shows one embodiment of the user interface of a summary video playback and browsing tool according to the present invention.

FIG. 4 is a diagram of the data and control flow for hierarchical browsing using summary video description data according to the present invention.

To achieve this object, a hierarchical summary description scheme (HierarchicalSummary DS) according to one embodiment of the present invention includes at least one highlight level description scheme that describes a highlight level, and the highlight level description scheme includes one or more highlight segment description schemes (HighlightSegment DS) that describe the information of the highlight segments making up the summary video of that level.

Preferably, the highlight level description scheme is composed of a plurality of lower highlight level description schemes, so that a highlight level is made up of multiple sub-levels.

Preferably, the highlight segment description scheme includes a video segment locator description scheme that describes the time information of the corresponding highlight segment and the video itself.

Preferably, the hierarchical summary description scheme includes a SummaryComponentTypeList attribute that lists the SummaryComponentType values indicating every kind of summary the hierarchical summary description scheme contains.

Preferably, the hierarchical summary description scheme includes a summary theme list description scheme (SummaryThemeList DS) that lists the events (or themes) included in the summary and describes their IDs, so that an event-based summary can be described and the user can browse the summary video by the themes or events described in the summary theme list.

The present invention also provides a computer-readable recording medium storing a summary video described in the hierarchical summary description scheme set out above.

The present invention also provides a method of generating summary video description data, which receives an original video and generates summary video description data according to a summary description scheme. The method comprises: a video analysis step of receiving and analyzing the original video and outputting a video analysis result; a summary rule definition step of defining summary rules for selecting summary video intervals; a summary video interval selection step of receiving the analysis result and the summary rules, selecting from the original video the video intervals that can summarize its content, and composing summary video interval information; and a summary video description step of receiving the summary video interval information produced in the selection step and generating video summary description data according to the hierarchical summary description scheme.

The present invention also provides a system for generating summary video description data, which receives an original video and generates summary video description data according to a summary description scheme. The system comprises: video analysis means for receiving and analyzing the original video and outputting a video analysis result; summary rule definition means for defining summary rules for selecting summary video intervals; summary video interval selection means for receiving the analysis result and the summary rules, selecting from the original video the video intervals that can summarize its content, and composing summary video interval information; and summary video description means for receiving the summary video interval information produced by the selection means and generating video summary description data having the hierarchical summary description scheme.

The present invention also provides a computer-readable recording medium on which is recorded a program for operating a summary video description data generation system that hierarchically summarizes a video by the summary video description data generation method described above.

A video browsing system in a server/client environment according to the present invention comprises: a server having a summary video description data generation system that receives an original video, generates summary video description data based on the hierarchical summary description scheme, and links the original video with the summary video description data; and a client that uses the summary video description data to overview the original video and accesses the original video on the server to browse and navigate it.

A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram of a system for generating summary video description data according to the description scheme of the present invention. As shown in FIG. 1, the summary video description data generation apparatus according to the invention consists of a feature extraction unit (101), an event detection unit (102), an episode detection unit (103), a summary video interval selection unit (104), a summary rule definition unit (105), a representative frame extraction unit (106), a representative sound extraction unit (107), and a summary video description unit (108).

The feature extraction unit (101) receives the original video and extracts the features needed to generate a summary video. Typical features include shot boundaries, camera motion, caption regions, and face regions. The feature extraction step extracts these features and passes the feature type and the video time interval in which each feature was detected to the event detection step, in the form (feature type, feature serial number, time interval). For camera motion, for example, (camera zoom, 1, 100~150) indicates that camera zoom no. 1 (the first zoom) was detected in frames 100 to 150.

The event detection unit (102) detects the main events contained in the original video. Because these events must represent the content of the original video well and serve as the basis for generating the summary video, they are generally defined differently for each genre of original video. An event may represent a high semantic level, or it may be a visual feature from which a high-level meaning can be directly inferred. In a soccer video, for example, goals, shots, captions, and replays can be defined as events.

The event detection unit (102) outputs the type of each detected event and its time interval in the form (event type, event number, time interval). For example, the information that the first goal occurred between frames 200 and 300 is output as (goal, 1, 200~300).
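The feature and event records described above share the same (type, serial number, time interval) shape, which can be sketched as a small Python record. This is only an illustration: the name `Detection`, the helper `number_detections`, and the sample intervals other than (camera zoom, 1, 100~150) and (goal, 1, 200~300) are assumptions, not part of the patent.

```python
from typing import NamedTuple

class Detection(NamedTuple):
    """One (type, serial number, time interval) record; intervals are in frames."""
    kind: str    # e.g. "camera_zoom" for a feature, "goal" for an event
    serial: int  # 1-based occurrence count within its kind
    start: int   # first frame of the interval
    end: int     # last frame of the interval

def number_detections(raw):
    """Assign per-kind serial numbers to (kind, start, end) triples in time order."""
    counts = {}
    out = []
    for kind, start, end in sorted(raw, key=lambda r: r[1]):
        counts[kind] = counts.get(kind, 0) + 1
        out.append(Detection(kind, counts[kind], start, end))
    return out

# The two examples from the text, plus hypothetical extra events.
features = number_detections([("camera_zoom", 100, 150)])
events = number_detections([("goal", 200, 300), ("shot", 500, 700), ("goal", 900, 950)])
```

Numbering per kind mirrors the "first zoom" and "first goal" wording in the text; a real detector could number its records differently.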

The episode detection unit (103) uses the detected events to divide the video into episodes, which follow the story flow and are larger units than events. It first detects a main event and then groups that event with its accompanying events into a single episode. In a soccer video, for example, goals and shots are main events, and the bench scene, the crowd scene, the goal celebration scene, and the replay of the goal scene that follow a goal or shot are its accompanying events. Episodes are thus detected around goals and shots.

Episode detection information is output in the form (episode number, time interval, priority, feature shot, event link information). The episode number is the serial number of the episode, and the time interval gives the extent of the episode in units of shots. The priority indicates the importance of the episode. The feature shot is the number of the shot that carries the most important information among the shots making up the episode, and the event link information gives the event numbers of the events related to the episode. For example, (episode 1, 4~6, 1, 5, goal 1, caption 3) indicates that the first episode covers shots 4 to 6, has high priority (1), has shot 5 as its feature shot, and is linked to goal no. 1 and caption no. 3.
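As a rough illustration of the episode record just described, the example (episode 1, 4~6, 1, 5, goal 1, caption 3) could be held in a structure like the one below. The class name `Episode`, its field names, and the helper `shot_numbers` are hypothetical; the patent fixes only the order and meaning of the fields.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Episode:
    number: int                    # episode serial number
    shots: Tuple[int, int]         # first and last shot of the episode, inclusive
    priority: int                  # importance; the example treats 1 as high
    feature_shot: int              # shot carrying the most important information
    events: List[Tuple[str, int]]  # linked events as (event type, event number)

def shot_numbers(ep: Episode) -> List[int]:
    """Expand the shot-unit time interval into the individual shot numbers."""
    first, last = ep.shots
    return list(range(first, last + 1))

# The example from the text: (episode 1, 4~6, 1, 5, goal 1, caption 3).
episode1 = Episode(number=1, shots=(4, 6), priority=1, feature_shot=5,
                   events=[("goal", 1), ("caption", 3)])
```

Here `shot_numbers(episode1)` expands the interval 4~6 into shots 4, 5, and 6, and the feature shot falls inside that range.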

The summary video interval selection unit (104) uses the detected episodes to select video intervals that summarize the content of the original video well. The selection is carried out according to the summary rules predefined by the summary rule definition unit (105).

The summary rule definition unit (105) defines the rules for selecting summary intervals and outputs a control signal for that selection. It also outputs to the summary video description unit (108) the summary event types on which the selection of summary video intervals is based.

The summary video interval selection unit (104) outputs the time information of the selected summary video intervals in frame units, together with the event type of each interval. For example, output of the form (100~200, goal), (500~700, shot), ... indicates that the video segments selected for the summary video are frames 100 to 200, frames 500 to 700, and so on, and that the events of those segments are a goal and a shot. Alternatively, it may output information such as a file name so that a separate video consisting only of the summary video intervals can be accessed.
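The output format above can be illustrated with a deliberately simplified selection step. The sketch assumes the summary rule is just a set of wanted event types applied directly to event records; the patent's selection works from episodes and leaves the rule itself open, so the function `select_summary_intervals`, the rule representation, and the caption record are all assumptions.

```python
def select_summary_intervals(events, wanted_types):
    """Keep the intervals of events whose type the (hypothetical) rule asks for.

    events: list of (event type, event number, start frame, end frame).
    Returns a list of ((start, end), event type) pairs in input order.
    """
    return [((start, end), kind)
            for kind, _number, start, end in events
            if kind in wanted_types]

detected = [
    ("goal", 1, 100, 200),
    ("caption", 1, 300, 350),  # hypothetical event the rule does not select
    ("shot", 1, 500, 700),
]
summary_intervals = select_summary_intervals(detected, {"goal", "shot"})
```

With this input the result corresponds to the (100~200, goal), (500~700, shot) example in the text.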

Once the summary video intervals have been selected, the representative frame extraction unit (106) and the representative sound extraction unit (107) use the interval information to extract a representative frame and a representative sound from each interval. The representative frame extraction unit (106) outputs either the frame number of the image that represents the summary video interval or the image data itself. The representative sound extraction unit (107) outputs either the sound data that represents the summary video interval or the time interval of that sound.

The summary video description unit (108) describes the relevant information according to the hierarchical summary description scheme (Hierarchical Summary Description Scheme) of the present invention shown in FIG. 2, so that effective summarization and browsing are possible. The main information included is the summary event types of the summary video and the information describing each summary video interval: time information, representative frame, representative sound, and the event type of the interval.

The summary video description unit (108) outputs summary video description data that conform to the description scheme shown in FIG. 2.

FIG. 2 shows, in UML (Unified Modeling Language), the data structure of the hierarchical summary description scheme (HierarchicalSummary DS) for describing summary video description data according to the present invention.

The hierarchical summary description scheme (HierarchicalSummary DS) (201), which describes a summary video, consists of one or more highlight level description schemes (HighlightLevel DS) (202) and zero or one summary theme list description scheme (SummaryThemeList DS) (203). The summary theme list (SummaryThemeList) lists the theme or event information that makes up the summary, and thereby provides event-based summarization and browsing.

The highlight level description scheme (HighlightLevel DS) (202) consists of as many highlight segment description schemes (HighlightSegment DS) (204) as there are video intervals in the summary video of that level, and zero or more highlight level description schemes (HighlightLevel DS). Each HighlightSegment DS describes the information for one summary video interval. A HighlightSegment DS consists of one video segment locator description scheme (VideoSegmentLocator DS) (205), zero or more image locator description schemes (ImageLocator DS) (206), and zero or more sound locator description schemes (SoundLocator DS) (207) and audio segment locator description schemes (AudioSegmentLocator DS) (208).

The hierarchical summary description scheme is described in more detail below.

The hierarchical summary description scheme (HierarchicalSummary DS) has an attribute called SummaryComponentTypeList, which states explicitly which forms of summary the HierarchicalSummary DS contains. SummaryComponentTypeList is derived from SummaryComponentType and lists every SummaryComponentType included.

SummaryComponentType has five values: keyFrames, keyVideoClips, keyAudioClips, keyEvents, and unconstrained. keyFrames denotes a summary made up of representative frames. keyVideoClips denotes a summary made up of a set of key video intervals. keyEvents denotes a summary made up of the video intervals that correspond to an event or theme. keyAudioClips denotes a summary made up of a set of representative audio intervals. unconstrained denotes a user-defined form of summary other than those above.
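The five values above map naturally onto an enumeration. The sketch below assumes, purely for illustration, that a SummaryComponentTypeList is serialized as a whitespace-separated string of those values; the patent does not specify the serialization, and the function name `parse_component_type_list` is hypothetical.

```python
from enum import Enum
from typing import List

class SummaryComponentType(Enum):
    KEY_FRAMES = "keyFrames"             # summary made up of representative frames
    KEY_VIDEO_CLIPS = "keyVideoClips"    # summary made up of key video intervals
    KEY_AUDIO_CLIPS = "keyAudioClips"    # summary made up of representative audio intervals
    KEY_EVENTS = "keyEvents"             # summary of intervals tied to events or themes
    UNCONSTRAINED = "unconstrained"      # user-defined form of summary

def parse_component_type_list(text: str) -> List[SummaryComponentType]:
    """Parse an assumed whitespace-separated list; unknown values raise ValueError."""
    return [SummaryComponentType(token) for token in text.split()]

component_types = parse_component_type_list("keyVideoClips keyEvents")
```

Rejecting unknown tokens keeps a description that claims an unsupported summary form from being silently accepted.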

To describe an event-based summary, the hierarchical summary description scheme (HierarchicalSummary DS) may also include a summary theme list description scheme (SummaryThemeList DS) that lists the events (or themes) included in the summary and describes their IDs.

A SummaryThemeList has any number of SummaryTheme elements. A SummaryTheme has an attribute named id, of type ID, and optionally an attribute named parentId.
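One way to picture this element structure is to build it with Python's standard `xml.etree.ElementTree`. The element and attribute names come from the text; putting a human-readable label in the element text, and the ids `E0`, `E1`, `E2`, are assumptions made for the sketch.

```python
import xml.etree.ElementTree as ET

def build_theme_list(themes):
    """Build a SummaryThemeList element from (id, parent id or None, label) triples."""
    root = ET.Element("SummaryThemeList")
    for theme_id, parent_id, label in themes:
        attrs = {"id": theme_id}
        if parent_id is not None:      # parentId is optional
            attrs["parentId"] = parent_id
        theme = ET.SubElement(root, "SummaryTheme", attrs)
        theme.text = label             # label placement is an assumption
    return root

theme_list = build_theme_list([
    ("E0", None, "soccer"),
    ("E1", "E0", "goal"),
    ("E2", "E0", "shot"),
])
xml_text = ET.tostring(theme_list, encoding="unicode")
```

The top-level theme carries no parentId, while the two themes below it point back to `E0`.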

The SummaryThemeList DS lets the user browse the summary video by the themes or events described in the SummaryThemeList. That is, an application tool that takes the description data as input parses the SummaryThemeList DS and presents this information to the user, who can then select the desired theme. If the themes are presented as a simple flat list, however, a large number of themes can make it hard for the user to find the one wanted.

The themes are therefore represented as a tree structure similar to a ToC (table of contents), so that the user can find the desired theme more efficiently and browse by theme. For this purpose the invention allows SummaryTheme to carry an optional attribute named parentId. The parentId is the id of the parent element (the parent theme) in the tree structure.
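The parentId mechanism described above can be sketched as follows: a flat list of (id, parentId) pairs is turned into a tree and rendered as an indented table of contents. The function names and the sample theme ids are hypothetical, and for readability the sketch uses theme names directly as ids.

```python
from collections import defaultdict

def build_theme_tree(themes):
    """Group (id, parent id or None) pairs into root ids and a parent -> children map."""
    children = defaultdict(list)
    roots = []
    for theme_id, parent_id in themes:
        if parent_id is None:
            roots.append(theme_id)
        else:
            children[parent_id].append(theme_id)
    return roots, dict(children)

def render_toc(ids, children, depth=0):
    """Return the tree as indented lines, depth-first, two spaces per level."""
    lines = []
    for theme_id in ids:
        lines.append("  " * depth + theme_id)
        lines.extend(render_toc(children.get(theme_id, []), children, depth + 1))
    return lines

roots, children = build_theme_tree([
    ("soccer", None), ("goal", "soccer"), ("shot", "soccer"), ("penalty", "goal"),
])
toc = render_toc(roots, children)
```

A browsing tool could show `toc` as a collapsible menu, so the user reaches "penalty" through "soccer" and "goal" instead of scanning a flat list.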

The hierarchical summary description scheme (HierarchicalSummary DS) of the present invention includes a highlight level description scheme (HighlightLevel DS), and this highlight level description scheme includes one or more highlight segment description schemes (HighlightSegment DS), which contain the one or more video segments (or intervals) that make up the summary video.

The HighlightLevel DS has an attribute named themeIds, of type IDREFS. themeIds gives the ids of the themes and events that are common to all HighlightSegment DSs contained in that HighlightLevel, or to the child HighlightLevel DSs of that HighlightLevel DS; these ids are the ones described in the SummaryThemeList DS. themeIds can refer to several events. By placing on the level a themeIds attribute that indicates the themes common to the highlight segments of that level, the invention avoids the problem, in an event-based summary, of the same id being repeated unnecessarily on every segment of the level.

The HighlightSegment DS includes one video segment locator description scheme (VideoSegmentLocator DS), zero or more image locator description schemes (ImageLocator DS), zero or one sound locator description scheme (SoundLocator DS), and zero or one audio segment locator description scheme (AudioSegmentLocator DS).

Here, the VideoSegmentLocator DS describes the time information of the interval of a video segment that makes up the summary video, and the video information itself. The ImageLocator DS describes the image data information of a representative frame of that video segment. The SoundLocator DS describes the sound information that represents the video segment interval. The AudioSegmentLocator DS describes the time information of the interval of an audio segment that makes up an audio summary, and the audio information itself.
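The containment relations described above can be sketched as plain Python dataclasses. This is a minimal model under stated assumptions: each locator is reduced to a frame number or a (start, end) frame pair, the cardinalities follow the preceding paragraph (one video segment, any number of images, at most one sound and one audio segment), and all field names and sample frame numbers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (start frame, end frame); a stand-in for a real locator

@dataclass
class HighlightSegment:
    video_segment: Interval                              # VideoSegmentLocator DS, exactly one
    key_frames: List[int] = field(default_factory=list)  # ImageLocator DS, zero or more
    key_sound: Optional[Interval] = None                 # SoundLocator DS, zero or one
    audio_segment: Optional[Interval] = None             # AudioSegmentLocator DS, zero or one

@dataclass
class HighlightLevel:
    segments: List[HighlightSegment]
    sub_levels: List["HighlightLevel"] = field(default_factory=list)

@dataclass
class HierarchicalSummary:
    levels: List[HighlightLevel]

    def __post_init__(self):
        # The scheme requires at least one HighlightLevel DS.
        if not self.levels:
            raise ValueError("HierarchicalSummary needs at least one HighlightLevel")

def count_segments(level: HighlightLevel) -> int:
    """Count the highlight segments in a level and all of its sub-levels."""
    return len(level.segments) + sum(count_segments(s) for s in level.sub_levels)

coarse = HighlightLevel(
    segments=[HighlightSegment((100, 200), key_frames=[150], key_sound=(120, 130))],
    sub_levels=[HighlightLevel(segments=[HighlightSegment((100, 200)),
                                         HighlightSegment((500, 700))])],
)
summary = HierarchicalSummary(levels=[coarse])
```

Nesting `HighlightLevel` inside itself is what gives the summary its hierarchy: a coarse level can hold finer levels with more segments.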

The HighlightSegment DS has a themeIds attribute, which uses the ids defined in the SummaryThemeList DS to state which of the themes and events described there the highlight segment belongs to. themeIds can refer to several events, so a single highlight segment can belong to several themes. This is the efficient description method of the present invention: it removes the duplicated description that was unavoidable in existing event-based summaries, where video segments were described separately under each event (or theme).
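Theme-based browsing with themeIds on both levels and segments might look like the sketch below. It assumes that a theme declared on a level applies to every segment in that level and in its sub-levels, which is one reading of "common to all segments or child levels"; the class names, the function `segments_for_theme`, and the sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Segment:
    interval: Tuple[int, int]                            # (start frame, end frame)
    theme_ids: List[str] = field(default_factory=list)   # themes of this segment only

@dataclass
class Level:
    segments: List[Segment] = field(default_factory=list)
    sub_levels: List["Level"] = field(default_factory=list)
    theme_ids: List[str] = field(default_factory=list)   # themes common to the level

def segments_for_theme(level, theme_id, inherited=frozenset()):
    """Collect intervals tagged with theme_id directly or through an enclosing level."""
    active = set(inherited) | set(level.theme_ids)
    hits = [s.interval for s in level.segments
            if theme_id in active or theme_id in s.theme_ids]
    for sub in level.sub_levels:
        hits.extend(segments_for_theme(sub, theme_id, active))
    return hits

top = Level(
    segments=[Segment((100, 200), ["goal"]), Segment((500, 700), ["shot"])],
    sub_levels=[Level(segments=[Segment((120, 160)), Segment((130, 150), ["replay"])],
                      theme_ids=["goal"])],
)
goal_intervals = segments_for_theme(top, "goal")
```

The sub-level declares "goal" once, so its two segments need no "goal" id of their own, while the second of them still adds "replay" individually.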

Whereas the existing hierarchical summary description structure describes only the time information of each highlight video interval when describing the highlight segments constituting the summary video, the present invention introduces the HighlightSegment DS for describing those highlight segments. So that the HighlightSegment DS can describe the video interval information, representative frame information, and representative sound information of each highlight segment, it contains the VideoSegmentLocator DS, ImageLocator DS, and SoundLocator DS as described above. This enables an overview through the highlight segment video, as well as efficient navigation and browsing using the representative frame and representative sound of each segment.

The SoundLocator DS describes a representative sound for a video interval. A characteristic sound that can represent the interval (for example, a gunshot, a shout, an announcer's call such as "goal" or "shoot" in football, an actor's name in a drama, or a specific word) lets the user roughly determine in a short time, without actually playing the interval, whether it is an important interval containing the desired content and what content it contains. This enables efficient browsing.

FIG. 3 is a block diagram of the user interface of a summary video playback and browsing tool that takes as input summary video description data described with the description structure of FIG. 2. The video playback unit (301) plays the original video or the summary video under the user's control. The original video representative frame unit (305) displays the representative frames of the shots of the original video, that is, a series of reduced-size images. The representative frames of the shots of the original video are not described by the HierarchicalSummary DS of the present invention but by a separate description structure, and can be used when that description data is provided together with the summary description data described by the HierarchicalSummary DS. The user clicks a representative frame to access the shot of the original video corresponding to it. The summary video level 0 representative frame and representative sound unit (307) and the summary video level 1 representative frame and representative sound unit (306) present the frames and sound information that represent each video interval of summary video level 0 and summary video level 1, respectively. Each consists of a series of reduced-size images and mark images that indicate sounds. When the user clicks a representative frame in one of these units, the original video interval corresponding to that representative frame is accessed. When the user clicks the representative sound mark associated with a representative frame of the summary video, the representative sound of that video interval is played.

The summary video control unit (302) receives the user's control input for selecting and playing the summary video. When a multi-level summary video is provided, the user selects the summary of the desired level through the level selection unit (303) for overview and browsing. The event selection unit (304) lists the events and themes provided by the SummaryThemeList, and the user selects the desired event for overview and browsing. In effect, this realizes a user-customized summary.
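The combined effect of the level selection unit and the event selection unit can be sketched as a function that turns the two user choices into a playlist. This is a minimal illustration under assumed names (HighlightLevel, build_playlist) and invented sample data; it is not the tool's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Segment:
    start_sec: float
    end_sec: float
    theme_ids: Tuple[str, ...] = ()


@dataclass
class HighlightLevel:
    name: str
    segments: List[Segment] = field(default_factory=list)


def build_playlist(levels: List[HighlightLevel],
                   level_index: int,
                   theme_id: Optional[str] = None) -> List[Tuple[float, float]]:
    """Return (start, end) intervals for the chosen level, optionally
    restricted to segments tagged with the chosen theme id."""
    chosen = levels[level_index].segments
    if theme_id is not None:
        chosen = [s for s in chosen if theme_id in s.theme_ids]
    return [(s.start_sec, s.end_sec) for s in chosen]


# Hypothetical two-level summary: level 1 contains more content than level 0.
levels = [
    HighlightLevel("level 0", [
        Segment(300.0, 330.0, ("E0", "E1")),
        Segment(900.0, 915.0, ("E0",)),
    ]),
    HighlightLevel("level 1", [
        Segment(10.0, 25.0, ("E1",)),
        Segment(300.0, 330.0, ("E0", "E1")),
        Segment(620.0, 640.0, ("E1",)),
        Segment(900.0, 915.0, ("E0",)),
    ]),
]
```

Calling build_playlist(levels, 0) plays the short summary, while build_playlist(levels, 1, "E1") plays only the level 1 segments tagged with theme "E1".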

FIG. 4 is a diagram of the data and control flow for hierarchical browsing using the summary video of the present invention. Browsing is performed through the user interface of FIG. 3, accessing the browsing data in the manner shown in FIG. 4. The browsing data are the summary video and its representative frames, the original video (406), and the original video representative frames (405). The summary video is assumed here to have two levels, although it may of course have more. Summary video level 0 (401) is a shorter summary than summary video level 1 (403); that is, summary video level 1 contains more content than summary video level 0. The summary video level 0 representative frames (402) are the representative frames of summary video level 0, and the summary video level 1 representative frames (404) are the representative frames of summary video level 1.

The summary video and the original video are played through the video playback unit (301) of FIG. 3. The summary video level 0 representative frames are displayed in the summary video level 0 representative frame and representative sound unit (306), and the summary video level 1 representative frames are displayed in the summary video level 1 representative frame and representative sound unit (307). The original video representative frames are displayed in the original video representative frame unit (305).

The hierarchical browsing method of the present invention shown in FIG. 4 can follow various hierarchical paths, as in the following examples.

Case 1) (1) - (2)

Case 2) (1) - (3) - (5)

Case 3) (1) - (3) - (4) - (6)

Case 4) (7) - (5)

Case 5) (7) - (4) - (6)

The overall browsing procedure is as follows. First, the summary video of the original video is played to grasp the overall content of the original video. Either summary video level 0 or summary video level 1 may be played. After playing the summary video, a user who wants to browse in more detail identifies the video interval of interest through the summary video representative frames. If the exact scene sought is identified among the summary video representative frames, the user goes directly to the interval of the original video linked to that representative frame and plays it. Otherwise, if more detailed information is needed, the user examines the representative frames of the next level or the representative frames of the original video, hierarchically, to reach the desired part of the original video. Browsing by playing back the original video until the desired content is found can take a long time. Because this hierarchical browsing technique accesses the content of the original video directly through the layered representative frames, it can reduce browsing time considerably.

Conventional video indexing and browsing techniques divide the original video into shots, build a representative frame for each shot, and let the user recognize the desired shot from the representative frames and access it. Because the original video contains a very large number of shots, browsing for the desired content among so many representative frames takes considerable time and effort. The present invention builds hierarchical representative frames from the representative frames of the summary video, so that the desired video can be reached more easily and quickly.

Case 1) plays summary video level 0 and accesses the original video directly from a summary video level 0 representative frame. Case 2) plays summary video level 0, selects the representative frame of greatest interest among the summary video level 0 representative frames and, to obtain more detail before accessing the original video, confirms the desired scene among the summary video level 1 representative frames near that frame and then accesses the original video. Case 3) applies when, in case 2), it is difficult to access the original video directly from the summary video level 1 representative frames: to obtain more detail, the user selects the representative frame of greatest interest, confirms the desired scene among the original video representative frames near it, and accesses the original video through an original video representative frame. Cases 4) and 5) start from playing summary video level 1 and follow paths similar to those described above.
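The five cases can be read as paths through a small directed graph. The sketch below assumes a mapping from the arrow labels (1) to (7) to pairs of browsing data that was inferred from the case descriptions alone, since FIG. 4 is not reproduced here; the node names are likewise invented for illustration.

```python
from typing import Dict, List, Tuple

# Assumed meaning of the arrow labels of FIG. 4, inferred from cases 1) to 5).
# Nodes: summary videos (401, 403), their representative frames (402, 404),
# original video representative frames (405) and the original video (406).
EDGES: Dict[int, Tuple[str, str]] = {
    1: ("summary_L0", "frames_L0"),
    2: ("frames_L0", "original"),
    3: ("frames_L0", "frames_L1"),
    4: ("frames_L1", "frames_orig"),
    5: ("frames_L1", "original"),
    6: ("frames_orig", "original"),
    7: ("summary_L1", "frames_L1"),
}


def follow(path: List[int]) -> List[str]:
    """Translate a sequence of arrow labels into the nodes visited,
    raising ValueError if consecutive arrows do not connect."""
    nodes = [EDGES[path[0]][0]]
    for label in path:
        src, dst = EDGES[label]
        if src != nodes[-1]:
            raise ValueError(f"arrow ({label}) does not start at {nodes[-1]}")
        nodes.append(dst)
    return nodes


def all_paths(start: str, goal: str = "original") -> List[List[int]]:
    """Enumerate every arrow sequence leading from start to goal."""
    if start == goal:
        return [[]]
    found: List[List[int]] = []
    for label, (src, dst) in sorted(EDGES.items()):
        if src == start:
            found += [[label] + rest for rest in all_paths(dst, goal)]
    return found
```

Under this assumed mapping, enumerating the paths from the two summary videos to the original video yields exactly the five listed cases: three starting from summary video level 0 and two starting from summary video level 1.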

Applying the present invention to a server/client environment provides a system in which multiple clients access a single server to overview and browse video. The server is equipped with a summary video description data generation system that receives the original video, generates summary video description data based on the hierarchical summary description structure, and links the original video with the summary video description data. A client accesses the server over a communication network, overviews the video using the summary video description data, and accesses the original video to browse and navigate it.

Although the technical idea of the present invention has been described specifically with reference to the above preferred embodiment, it should be noted that the embodiment is for illustration and not for limitation. Those of ordinary skill in the art will also understand that various embodiments are possible within the scope of the technical idea of the present invention.

As described above, the present invention makes it possible, through the generation of a summary video and its description structure, to grasp the overall content of a video quickly, and enables effective hierarchical browsing using the representative frame information and representative sound information of each video interval of the summary video. Through event-based summary video description, it also provides a user-customized function that offers users summary video and browsing according to events and themes.

Claims

A hierarchical summary description structure (HierarchicalSummary DS) describing a summary video, comprising:

at least one highlight level description structure describing a highlight level,

wherein the highlight level description structure comprises one or more highlight segment description structures (HighlightSegment DS) describing information on the highlight segments constituting the summary video of that level.

The hierarchical summary description structure of claim 1,

wherein the highlight level description structure comprises a plurality of lower-level highlight level description structures.

The hierarchical summary description structure of claim 1,

wherein the highlight segment description structure

comprises a video segment locator description structure describing the time information of the corresponding highlight segment and the video information itself.

The hierarchical summary description structure of claim 3,

wherein the highlight segment description structure

further comprises an image locator description structure describing a representative frame of the corresponding highlight segment.

The hierarchical summary description structure of claim 3,

wherein the highlight segment description structure

further comprises a sound locator description structure describing representative sound information of the corresponding highlight segment.

The hierarchical summary description structure of claim 3,

wherein the highlight segment description structure

further comprises an image locator description structure describing a representative frame of the corresponding highlight segment, and a sound locator description structure describing representative sound information of the corresponding highlight segment.

The hierarchical summary description structure of claim 4 or 6,

wherein the image locator description structure

describes time information and image data of a representative frame of the video interval corresponding to the highlight segment.

The hierarchical summary description structure of any one of claims 3 to 6,

wherein the highlight segment description structure

further comprises an audio segment locator description structure describing audio segment information constituting an audio summary of the corresponding highlight segment.

The hierarchical summary description structure of claim 8,

wherein the audio segment locator description structure

describes time information and audio data information of the audio interval of the corresponding highlight segment.

The hierarchical summary description structure of claim 1,

wherein the hierarchical summary description structure

comprises a SummaryComponentTypeList attribute that lists the SummaryComponentTypes representing all kinds of summaries included in the hierarchical summary description structure.

The hierarchical summary description structure of claim 10, wherein the SummaryComponentType

comprises keyFrames, representing a key frame summary composed of representative frames; keyVideoClips, representing a key video clip summary composed of a set of major video segments; keyEvents, representing a summary composed of video segments corresponding to an event or theme; keyAudioClips, representing a key audio clip summary composed of a set of representative audio intervals; and unconstrained, representing a user-defined form of summary other than the summaries above.
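By way of illustration only, the five SummaryComponentType values can be modelled as an enumeration. The whitespace-separated serialization parsed below is an assumption made for this sketch; the text does not specify how the SummaryComponentTypeList attribute is written.

```python
from enum import Enum
from typing import List


class SummaryComponentType(Enum):
    KEY_FRAMES = "keyFrames"
    KEY_VIDEO_CLIPS = "keyVideoClips"
    KEY_EVENTS = "keyEvents"
    KEY_AUDIO_CLIPS = "keyAudioClips"
    UNCONSTRAINED = "unconstrained"


def parse_component_list(text: str) -> List[SummaryComponentType]:
    """Parse a whitespace-separated list of type names (assumed format).
    An unknown name raises ValueError."""
    return [SummaryComponentType(token) for token in text.split()]
```

A browsing tool could use such a list to decide up front which controls to show, for example hiding the event selector when keyEvents is absent.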

The hierarchical summary description structure of claim 1,

further comprising a SummaryThemeList DS that lists the events (or themes) included in the summary and describes their ids, thereby describing an event-based summary and enabling the user to browse the summary video by event or theme.

The hierarchical summary description structure of claim 11,

wherein the SummaryThemeList DS has an arbitrary number of SummaryTheme elements, and

each SummaryTheme has an id attribute representing the corresponding event or theme.

The hierarchical summary description structure of claim 13,

wherein the SummaryTheme further comprises a parentID attribute describing the id of the upper-level event or theme.
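The id and parentID attributes let themes form a tree. The sketch below shows one way a browser might use that tree, collecting a theme together with the themes beneath it. The label field, the sample themes, and the assumption that parent links contain no cycles are all illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SummaryTheme:
    id: str
    label: str                       # hypothetical field for display text
    parent_id: Optional[str] = None  # corresponds to the parentID attribute


# Hypothetical theme list for a football match.
THEMES: List[SummaryTheme] = [
    SummaryTheme("E0", "scoring chance"),
    SummaryTheme("E1", "goal", parent_id="E0"),
    SummaryTheme("E2", "shot saved", parent_id="E0"),
    SummaryTheme("E3", "foul"),
]


def descendants(theme_id: str, themes: List[SummaryTheme]) -> List[str]:
    """Return theme_id followed by the ids of all themes below it
    (depth first). Assumes the parent links contain no cycles."""
    result = [theme_id]
    for theme in themes:
        if theme.parent_id == theme_id:
            result.extend(descendants(theme.id, themes))
    return result
```

Selecting the upper-level theme "E0" would then cover segments tagged "E1" or "E2" as well.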

The hierarchical summary description structure of claim 13,

wherein the highlight level description structure

includes, as an attribute, themeIds describing the id attribute, and

when the highlight segments or highlight levels constituting the highlight level share the same event or theme, the id of that event is described in the highlight level description structure.

The hierarchical summary description structure of claim 13,

wherein the highlight segment description structure

includes, as an attribute, themeIds describing the id attribute,

and thereby describes the event or theme of the corresponding highlight segment.

A computer-readable recording medium storing a hierarchical summary description structure of a video, the hierarchical summary description structure comprising:

a highlight level description structure having at least one level and describing a highlight level,

wherein the highlight level description structure includes one or more highlight segment description structures (HighlightSegment DS) describing information on each highlight segment constituting the summary video of that level, and

the highlight segment description structure includes a video segment locator description structure describing the time information of the corresponding highlight segment and the video information itself.

A summary video description data generation method of receiving an original video and generating summary video description data according to a summary description structure, the method comprising:

a video analysis step of receiving and analyzing the original video and outputting a video analysis result;

a summary rule definition step of defining a summary rule for selecting summary video intervals;

a summary video interval selection step of receiving the original video analysis result and the summary rule and selecting, from the original video, video intervals that summarize the video content, to compose summary video interval information; and

a summary video description step of receiving the summary video interval information composed in the summary video interval selection step and generating the video summary description data according to the hierarchical summary description structure.
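The four steps recited above can be sketched as a chain of functions. Everything concrete in this sketch is an assumption: the analysis step is a stand-in that receives already detected events, the summary rule is a simple set of event types with a duration budget applied greedily, and the output is a plain dictionary whose keys merely echo the names of the description structures.

```python
from typing import Dict, List, Set, Tuple

Event = Tuple[str, float, float]   # (event type, start_sec, end_sec)


def analyze_video(detected: List[Event]) -> List[Event]:
    """Stand-in for the video analysis step: sorts given events by start time.
    A real system would derive them from the original video."""
    return sorted(detected, key=lambda e: e[1])


def define_summary_rule(event_types: Set[str], max_total_sec: float) -> Dict:
    """Summary rule (assumed form): wanted event types and a duration budget."""
    return {"event_types": set(event_types), "max_total_sec": max_total_sec}


def select_intervals(events: List[Event], rule: Dict) -> List[Event]:
    """Greedy selection (assumed policy): keep matching events in time order
    while the total duration stays within the budget."""
    selected: List[Event] = []
    used = 0.0
    for kind, start, end in events:
        length = end - start
        if kind in rule["event_types"] and used + length <= rule["max_total_sec"]:
            selected.append((kind, start, end))
            used += length
    return selected


def describe_summary(selected: List[Event]) -> Dict:
    """Emit a one-level description as nested dictionaries (assumed layout)."""
    segments = [
        {"VideoSegmentLocator": {"start_sec": s, "end_sec": e}, "themeIds": [k]}
        for k, s, e in selected
    ]
    return {"HierarchicalSummary": {"HighlightLevel": [{"HighlightSegment": segments}]}}


# Hypothetical run over four detected events with a 50 second budget.
events = analyze_video([
    ("goal", 300.0, 330.0), ("foul", 100.0, 110.0),
    ("goal", 900.0, 915.0), ("goal", 1200.0, 1260.0),
])
rule = define_summary_rule({"goal"}, max_total_sec=50.0)
description = describe_summary(select_intervals(events, rule))
```

In this run the two shorter goal events are selected and the 60 second one is dropped because it would exceed the budget. Other selection policies would fit the same four-step shape.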

The method of claim 18,

wherein, in the hierarchical summary description structure,

the highlight segment description structure includes a video segment locator description structure describing the time information of the corresponding highlight segment and the video information itself.

The method of claim 18,

wherein the video analysis step comprises:

a feature extraction step of receiving the original video, extracting features, and outputting the types of the extracted features and the time intervals in which the features are detected;

an event detection step of receiving the types of the features and the time intervals in which they are detected, and detecting the major events included in the original video; and

an episode detection step of detecting episodes by dividing the original video according to its story flow on the basis of the detected major events.

The method of claim 18,

wherein the summary rule definition step defines the summary event types on which the selection of summary video intervals is based, and provides them to the summary video description step.

The method of claim 18,

further comprising a representative frame extraction step of receiving the summary video interval information, extracting representative frames, and providing them to the summary video description step.

The method of claim 18,

further comprising a representative sound extraction step of receiving the summary video interval information, extracting representative sounds, and providing them to the summary video description step.

A computer-readable recording medium having recorded thereon a program for causing a computer to execute:

a feature extraction step of extracting features from an input original video and outputting the types of the extracted features and the time intervals in which the features are detected;

an event detection step of receiving the types of the features and the time intervals in which they are detected, and detecting the major events included in the original video;

an episode detection step of detecting episodes by dividing the original video according to its story flow on the basis of the detected major events;

a summary video interval selection step of receiving the detected episodes and a summary rule and selecting, from the original video, video intervals that summarize the video content, to compose summary video interval information; and

a summary video description step of receiving the summary video interval information composed in the summary video interval selection step and generating video summary description data having a hierarchical summary description structure.

A summary video description data generation system that receives an original video and generates summary video description data according to a summary description structure, the system comprising:

video analysis means for receiving and analyzing the original video and outputting a video analysis result;

summary rule definition means for defining a summary rule for selecting summary video intervals;

summary video interval selection means for receiving the original video analysis result and the summary rule and selecting, from the original video, video intervals that summarize the video content, to compose summary video interval information; and

summary video description means for receiving the summary video interval information composed by the summary video interval selection means and generating video summary description data having a hierarchical summary description structure.

The system of claim 25,

wherein the hierarchical summary description structure,

The system of claim 25, wherein the video analysis means comprises:

feature extraction means for receiving the original video, extracting features, and outputting the types of the extracted features and the time intervals in which the features are detected;

event detection means for receiving the types of the features and the time intervals in which they are detected, and detecting the major events included in the original video; and

episode detection means for detecting episodes by dividing the original video according to its story flow on the basis of the detected major events.

The system of claim 25,

wherein the summary rule definition means defines the summary event types on which the selection of summary video intervals is based, and provides them to the summary video description means.

The system of claim 25,

further comprising representative frame extraction means for receiving the summary video interval information from the summary video interval selection means, extracting representative frames, and providing them to the summary video description means.

The system of claim 25,

further comprising representative sound extraction means for receiving the summary video interval information from the summary video interval selection means, extracting representative sounds, and providing them to the summary video description means.

In a computer,

feature extraction means for extracting features from an input original video and outputting the types of the extracted features and the time intervals in which the features are detected;

episode detection means for detecting episodes by dividing the original video according to its story flow on the basis of the detected major events;

summary rule definition means for defining a summary rule for selecting summary video intervals;

summary video interval selection means for receiving the detected episodes and the summary rule and selecting, from the original video, video intervals that summarize the video content, to compose summary video interval information; and

summary video description means for receiving the summary video interval information composed by the summary video interval selection means and generating video summary description data having a hierarchical summary description structure.

A server having a summary video description data generation system that receives the original video, generates summary video description data based on the hierarchical summary description structure, and links the original video with the summary video description data; and

a client that overviews the original video using the summary video description data and accesses the original video on the server to browse and navigate the video.