KR100370249B1

KR100370249B1 - A system for video skimming using shot segmentation information

Info

Publication number: KR100370249B1
Application number: KR10-2000-0048036A
Authority: KR
Inventors: 전성배; 윤경로; 강배근; 배소영; 유재신
Original assignee: LG Electronics Inc. (엘지전자 주식회사)
Priority date: 2000-08-19
Filing date: 2000-08-19
Publication date: 2003-01-29
Also published as: KR20020014857A

Abstract

The present invention relates to a video skimming method that uses shot segmentation information so that the content of an entire video can be understood in a short time, and so that the user can quickly search the video stream and move to a desired position.

According to the present invention, skimming is performed by playing only part of each shot in the video (the front part, the rear part, the middle part, or a combination of these) and skipping the rest. The length of the portion played in each shot is either the same for all shots or differs from shot to shot based on the similarity within the shot (average image/motion/audio similarity). The segments selected for playback in each shot are then either played continuously, or only some frames in each segment (e.g., I frames in the case of MPEG) are decoded, so that the segment is played back with further skipping inside it.

Description

A system for video skimming using shot segmentation information

The present invention relates to a system for multimedia search and browsing, and in particular to a video skimming method that uses video shot segmentation information to give a summarized understanding of the entire content of a video and to allow quick movement to a desired part (position).

More specifically, the present invention relates to a video skimming method that skims the content of a video on the basis of the shots produced by video segmentation. During skimming, a specific part of each shot is played continuously, or played partially using a skipping technique, so that the user can sufficiently understand the content of the whole video in a short time and can quickly search the video stream to move to a desired position.

With the development of mass media and the growing ease of producing multimedia content, the amount of media the public encounters every day has become very large. This has created demand for automated systems that select the data a user wants, and research on such systems is active. In particular, with the development of digital technology, video content is increasingly stored and distributed in digital form, and this digitization will accelerate as digital broadcasting becomes widespread.

For such digital video content, one user may want to watch only the sports stories in a news program, while another may want only the stock-market stories. Yet another user may want to watch only the scenes of a show in which a particular person appears. Various studies are under way to accommodate such diverse needs.

Users also want to grasp the entire content of a video within a limited time. Highlights address this need. A highlight can generally be understood as new content assembled from the important scenes of a video; familiar examples are sports highlights, movie trailers, and headline news. However, extracting highlights from video content is very difficult to automate with current technology, so it usually relies on manual work. As noted above, with the explosive growth in the amount of media, manually producing highlights for all video content would require enormous manpower and is practically impossible. An automated system that lets users quickly understand the gist of the content is therefore needed.

With the development of digital technology, key frames are used to let the user move to a desired position in video content. A key-frame-based video summary has the advantage that the user can jump quickly to the desired location. However, finding the desired location easily requires a large number of key frames, and because it is difficult to display many key frames in a limited display space, the user is forced to make many selections, which is inconvenient. In addition, key-frame methods generally make it difficult to understand the content of the video as a whole.

Recently, various video indexing techniques have been studied for finding desired scenes in digital video. For users who want only the scenes featuring a specific person, research is under way on detecting the scenes in which people appear, recognizing who they are, and indexing each person's appearances in the video; other work extracts and indexes key scenes in movies, sports, and so on. However, video spans a wide variety of genres, the data that must be indexed differs greatly by genre, and it is known to be very difficult with current technology to build an automated system that extracts information meaningful to users with a high level of accuracy.

Unlike analog video, digital video can perform fast-forward and fast-rewind without degrading image quality.

Fast playback of digital video is commonly implemented in one of two ways: increasing the number of frames decoded per unit time and displaying only some of them, or skipping over portions of the stream and decoding and displaying only the remaining frames.

The first method, increasing the number of frames decoded per unit time, has the drawback that the maximum speed is limited by the performance of the terminal device. Fast-forward and fast-rewind of digital video therefore generally use the second method, decoding and displaying frames while skipping portions of the stream. Among existing techniques, digital fast-forward/rewind is the most reasonable way to meet the needs of users who want to grasp the entire content within a limited time or move to a desired position. However, because the skipping is mostly done at fixed time intervals, the user may miss a desired scene, or relatively unimportant parts may take up a large share of playback.

The present invention proposes a method of building an automated video skimming system for a digital video environment that meets a user's need to grasp the content of an entire video within a limited time or to move to a desired position, while minimizing the probability that relatively unimportant portions take up much of the playback or that the user misses a scene he or she actually wants to watch.

The present invention is further characterized in that video skimming is based on video segmentation, which is known to be automatable with high accuracy. By skimming with the video's shot segmentation information, the invention provides a search and browsing environment that summarizes the video enough for the user to understand its content without watching all of it, and at the same time lets the user move quickly to a desired position.

FIG. 1 is a diagram explaining the concept of shot segmentation.

FIG. 2 is a diagram explaining the concept of the skimming method of the present invention using shot segmentation information.

FIG. 3 shows various examples of how the portions to be played and the portions to be skipped are selected in video skimming.

FIG. 4 shows an example of a method for dynamically selecting the unit playback length using the dissimilarity characteristics of a shot.

FIG. 5 is a diagram explaining a high-speed skimming method using skipping.

FIG. 6 shows an example of a system configuration for video skimming using shot information.

The present invention is a video skimming method using shot segmentation information, comprising: recognizing the interval of each individual shot from the shot segmentation information of a video stream; selecting a specific portion within each recognized shot interval as the video information to be played, reflecting the content of that shot; and playing the video information selected for each individual shot in succession.

In the present invention, the shot segmentation information represents each shot, the unit in which a video stream is produced and edited, together with temporal information such as a start position and a duration, or a start position and an end position.
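The three steps of the method and the two time encodings of a shot can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Shot` class, the `skim_playlist` helper, and the choice of playing a fixed-length segment from the rear of each shot are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    """One shot from the shot segmentation index (times in seconds)."""
    start: float
    end: float

    @classmethod
    def from_duration(cls, start: float, duration: float) -> "Shot":
        # Alternative encoding: start position plus duration.
        return cls(start, start + duration)

    @property
    def duration(self) -> float:
        return self.end - self.start


def skim_playlist(shots, unit_length):
    """Recognize each shot interval, pick the last `unit_length` seconds of it
    (never more than the shot itself), and return the portions in playback order."""
    playlist = []
    for shot in shots:
        length = min(unit_length, shot.duration)
        playlist.append((shot.end - length, shot.end))
    return playlist


shots = [Shot(0.0, 12.0), Shot.from_duration(12.0, 3.0), Shot(15.0, 40.0)]
print(skim_playlist(shots, unit_length=4.0))
# [(8.0, 12.0), (12.0, 15.0), (36.0, 40.0)]
```

Playing the returned intervals back to back yields the skim; the second shot is shorter than the unit length, so it is played in full.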

In the present invention, the portion to be played in each shot is selected from the front part, the rear part, or the middle part of the shot, or from the front and rear parts together.

In the present invention, the length to be played in each shot is determined either by selecting a segment of the same length in every shot, or on the basis of the average image/motion/audio similarity within each shot: the higher the similarity, the shorter the played length, and the lower the similarity, the longer the played length for that shot.

In the present invention, the image/motion/audio similarity within a shot is the similarity between frames, motion vectors, and audio data at different temporal positions within the shot.

In the present invention, if the length computed for the segment to be played in an individual shot exceeds the length of that shot, the played length is reduced to no more than the shot length.

In the present invention, the sections selected for video skimming are played at the normal playback rate; or at high speed by decoding more frames per unit time than normal; or at high speed by skipping several frames at a time instead of decoding every frame in the section.

In the present invention, when high-speed skimming with skipping is applied to a video stream coded with inter-frame compression such as MPEG, the frames to be decoded are I frames, whose frame data can be obtained by decoding that frame alone, without decoding any other frame.

The video skimming method using shot segmentation information of the present invention, configured as described above, is explained in more detail below with reference to the accompanying drawings.

With the development of digital video technology and of image/video recognition technology, users can search (search/filter) and browse only the desired parts of a desired video at a desired time.

The most fundamental techniques for non-linear video browsing and retrieval are shot segmentation and shot clustering, and these two are the core technologies of video analysis. To date, much research has concentrated on shot segmentation, and research on shot clustering is just beginning.

Various research results indicate that shot segmentation can be automated, and that most algorithms achieve accuracy above 90%.

A shot is a sequence of video frames captured by one camera without interruption; it is the most basic unit for analyzing or constructing video.

A video generally consists of many shots joined together. Shot segmentation is the technique of dividing a video into its individual shots; FIG. 1 illustrates the shot segmentation process. Most shot segmentation algorithms rely on the property that frames within the same shot are similar in image, motion, and audio, whereas measurable image/motion/audio dissimilarity exists between two different shots.
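As a rough illustration of this dissimilarity principle, the sketch below cuts a video wherever the dissimilarity between consecutive frames exceeds a fixed threshold. It assumes the per-frame-pair dissimilarity values are already computed; the function name and the threshold value are hypothetical, and practical shot segmentation algorithms typically add refinements such as adaptive thresholds and detection of gradual transitions.

```python
def segment_shots(dissimilarities, threshold):
    """Split frames into shots wherever consecutive frames differ by more
    than `threshold`.

    dissimilarities[i] compares frame i with frame i + 1, so a video of
    n frames has n - 1 values. Returns (first_frame, last_frame) pairs.
    """
    shots = []
    start = 0
    for i, d in enumerate(dissimilarities):
        if d > threshold:          # cut between frame i and frame i + 1
            shots.append((start, i))
            start = i + 1
    shots.append((start, len(dissimilarities)))  # last frame index is n - 1
    return shots


# Low values inside shots, spikes at the two cuts.
d = [0.02, 0.03, 0.9, 0.01, 0.02, 0.85, 0.04]
print(segment_shots(d, threshold=0.5))  # [(0, 2), (3, 5), (6, 7)]
```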

A video highlight generally selects the segments that are semantically important to the development of the video's content and plays them consecutively.

However, selecting the semantically important segments is very difficult to automate across diverse video content.

The present invention proposes a skimming method that, based on the shots present in every video, plays only a certain portion of each shot and skips the rest, so that playback is shorter than the original video stream.

Because shot segmentation can be automated, this skimming method has the advantage that a fully automated system can be built. It also minimizes the problems of ordinary digital fast-forward/rewind, namely that unimportant scenes play for a long time or that important scenes are missed.

FIG. 2 summarizes the concept of the present invention. The gray portions in FIG. 2 are played by the skimming method using shot segmentation information, and the remaining portions are skipped.

The video skimming method of the present invention shown in FIG. 2 addresses three questions: how to select the portions to be played and skipped in each shot, how to select the playback length of the portions to be played, and how to play back within each playback section.

First, the method of selecting the portions to be played and skipped in each shot is described.

The portion to be played in each shot can be chosen unconditionally as the front, rear, or middle part of the shot. FIG. 3 shows the portions played and skipped when video skimming uses the front, rear, or middle part of each shot, or the front and rear parts together.
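A minimal sketch of the four position choices in FIG. 3, assuming shot boundaries and the playback length are given in seconds and that the length has already been limited to the shot. The `select_portion` name and the even split used for the front-plus-rear mode are illustrative assumptions.

```python
def select_portion(start, end, length, mode="rear"):
    """Return the (start, end) sub-intervals of a shot to play.

    mode is "front", "rear", "middle", or "front+rear" (the length is
    split evenly between the two ends). `length` is assumed not to
    exceed end - start.
    """
    if mode == "front":
        return [(start, start + length)]
    if mode == "rear":
        return [(end - length, end)]
    if mode == "middle":
        mid = (start + end) / 2
        return [(mid - length / 2, mid + length / 2)]
    if mode == "front+rear":
        half = length / 2
        return [(start, start + half), (end - half, end)]
    raise ValueError(f"unknown mode: {mode}")


for m in ("front", "rear", "middle", "front+rear"):
    print(m, select_portion(10.0, 30.0, 4.0, m))
```

Because the mode is a per-call parameter, a genre-dependent choice (rear for sports or news, front for problem-solving lectures, as discussed below) amounts to passing a different `mode`.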

Although results vary with the genre of the video, experiments showed that skipping the front part of a shot and playing the rear part generally yields higher user satisfaction. This is because the ending of a shot (e.g., the goal in a soccer scene) is generally more important to the content of the shot than its introduction or development, and because in programs such as news that explain a chart step by step, only part of the content appears at the start of the shot while the whole appears at the end.

In some genres, however, the front part of the shot is generally the important one; an example is educational broadcasting centered on problem solving. In such programs the front part of a shot states which problem is being treated and the solution follows, so playing the front part of each shot gives the user more information for finding the desired part than playing the rear part.

Therefore, in the present invention the playback position within a shot can be chosen differently according to the genre of the video, and skimming can also combine the front, middle, and rear parts of the same shot.

Next, the method of selecting the playback length in the present invention is described.

The playback length of each shot can be chosen in two ways: selecting a segment of the same length in every selected shot, or choosing a different length for each shot using the shot's characteristics.

The shot characteristic used here can be based on the average image/motion/audio similarity within the shot. The greater the image/motion/audio similarity within a shot, the more monotonous the scene can be judged to be, so more of it is skipped; the smaller the similarity, the more complex the content can be judged to be, so less of it is skipped. In this way the length of the unit segment to be played can be adjusted dynamically.

Independently of the temporal length of the shot, this method skips little where there is much content and skips much where there is little content, so it can provide video skimming that users understand better than playing a segment of the same length in every selected shot.
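The source does not give a formula for this adjustment. One hypothetical way to realize it is a linear map from a shot's average similarity to a playback length between two bounds; the function name, the linear form, and the default bounds below are all assumptions.

```python
def playback_length(avg_similarity, min_len=1.0, max_len=6.0):
    """Map a shot's average intra-shot similarity to a playback length (s).

    avg_similarity is assumed to lie in [0, 1]: 1 means a static,
    monotonous shot, 0 a highly varied one. High similarity yields a
    short segment (more skipping), low similarity a long one.
    """
    s = max(0.0, min(1.0, avg_similarity))  # clamp to [0, 1]
    return min_len + (max_len - min_len) * (1.0 - s)


print(playback_length(0.75))  # monotonous shot -> 2.25
print(playback_length(0.25))  # complex shot    -> 4.75
```

Any monotonically decreasing map would satisfy the rule in the text; the linear one is simply the easiest to tune.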

FIG. 4 shows an example of selecting the lengths to play and skip based on the image/motion/audio similarity within a shot.

In the graph of FIG. 4, the horizontal axis is time and the vertical axis is the cumulative value of the image/motion/audio dissimilarity measured within the shot. Such dissimilarity data characterizes the shot and can generally be extracted by the shot segmentation algorithm.

One example of dissimilarity is the difference between the color histograms of adjacent frames, or of frames a fixed interval apart.
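A sketch of this dissimilarity measure, assuming frames are given as lists of 8-bit (r, g, b) tuples: each frame is reduced to a normalized, coarsely quantized RGB histogram, and two frames are compared by half the L1 distance between their histograms. The bin count and the choice of L1 distance are assumptions; other histogram distances would serve equally well.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram of a frame of 8-bit (r, g, b) pixels;
    each channel is quantized into `bins_per_channel` bins."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bin_: c / total for bin_, c in counts.items()}


def histogram_difference(frame_a, frame_b, bins_per_channel=4):
    """Half the L1 distance between the two normalized histograms:
    0.0 for identical color distributions, 1.0 for disjoint ones."""
    ha = color_histogram(frame_a, bins_per_channel)
    hb = color_histogram(frame_b, bins_per_channel)
    keys = ha.keys() | hb.keys()
    return sum(abs(ha.get(k, 0.0) - hb.get(k, 0.0)) for k in keys) / 2


red = [(250, 10, 10)] * 4
blue = [(10, 10, 250)] * 4
mixed = [(250, 10, 10)] * 2 + [(10, 10, 250)] * 2
print(histogram_difference(red, red))    # 0.0
print(histogram_difference(red, blue))   # 1.0
print(histogram_difference(red, mixed))  # 0.5
```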

In FIG. 4, shots A and B are similar in length, but the average rate of change of shot B is greater than that of shot A, so more of shot B is played than of shot A.
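The average rate of change in FIG. 4 can be read as the slope of the cumulative dissimilarity curve, that is, the total accumulated dissimilarity divided by the shot's duration. The sketch below assumes one dissimilarity value per consecutive frame pair and a constant frame interval; the function names and sample values are hypothetical.

```python
from itertools import accumulate


def cumulative_curve(dissimilarities):
    """Running sum of the dissimilarity values: the curve plotted in FIG. 4."""
    return list(accumulate(dissimilarities))


def average_change_rate(dissimilarities, frame_interval):
    """Slope of the cumulative curve over the whole shot (per second).

    dissimilarities holds one value per consecutive frame pair, and
    frame_interval is the time between frames in seconds.
    """
    if not dissimilarities:
        return 0.0
    curve = cumulative_curve(dissimilarities)
    duration = len(dissimilarities) * frame_interval
    return curve[-1] / duration


shot_a = [0.25] * 8          # little change: a monotonous shot
shot_b = [0.5, 1.0] * 4      # more change over the same length
print(average_change_rate(shot_a, 0.5))  # 0.5
print(average_change_rate(shot_b, 0.5))  # 1.5
```

Shot B's steeper slope is what would earn it the longer playback section in FIG. 4.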

If the shot length is not taken into account when setting playback sections in this way, an error can occur in which the section to be played is longer than the shot itself (when the shot is very short). Therefore, in the skimming method of the present invention, when the unit section would be longer than the shot, the entire shot is exceptionally selected as the playback section, or a portion is selected with the shot's length taken into account.
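A sketch of the two fallbacks, assuming the section is taken from the rear of the shot: `policy="whole"` plays the entire shot, and `policy="fraction"` plays a fixed fraction of it. The function name and the 0.5 default fraction are hypothetical choices for illustration.

```python
def clamp_to_shot(start, end, length, policy="whole", fraction=0.5):
    """Return the (start, end) playback section taken from the rear of a
    shot, making sure it never extends beyond the shot.

    If the requested length exceeds the shot, policy="whole" plays the
    entire shot and policy="fraction" plays `fraction` of it.
    """
    shot_len = end - start
    if length > shot_len:
        length = shot_len if policy == "whole" else fraction * shot_len
    return (end - length, end)


print(clamp_to_shot(10.0, 30.0, 5.0))                     # (25.0, 30.0)
print(clamp_to_shot(10.0, 12.0, 5.0))                     # (10.0, 12.0)
print(clamp_to_shot(10.0, 12.0, 5.0, policy="fraction"))  # (11.0, 12.0)
```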

Next, the playback method within a playback section is described.

Video skimming according to the present invention can be applied in the reverse direction as well as the forward direction.

Playing the segments selected in each shot continuously lets the user obtain an overview of the content in a short time while still understanding the whole, and no separate intervention is needed to search for a desired position.

In the video skimming method of the present invention, the segments selected for playback in each shot can be played in two main ways.

The first plays each segment in the same way as normal playback; the second decodes only some frames in the playback section and plays it back with further skipping within the section.

Normal playback is standard, so its detailed description is omitted; the method that decodes only some frames in a section and skips within it is described below.

Playback that decodes only some frames in a section and skips within it implements high-speed skimming. The frames to be displayed may be frames spaced at fixed time intervals or, for schemes using inter-frame compression such as MPEG, I frames, which have no inter-frame dependency.
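The sketch below picks the frames to display in a playback section, either at a fixed frame spacing or, for an inter-frame-coded stream, only the I frames. It assumes the picture type of every frame is known in advance as a string such as `"IBBP..."`; the function name and the GOP pattern are illustrative. With fixed spacing in an MPEG stream, displaying a P or B frame still requires decoding its reference frames, which is why selecting I frames is attractive.

```python
def frames_to_display(frame_types, first, last, step=None):
    """Pick the frames to display inside the playback section [first, last]
    (inclusive frame indices).

    With `step`, every step-th frame is taken (fixed temporal spacing).
    With step=None, only I frames are kept; in an inter-frame-coded stream
    these are also the only frames that need decoding.
    `frame_types` is a string of per-frame picture types, e.g. "IBBP...".
    """
    if step is not None:
        return list(range(first, last + 1, step))
    return [i for i in range(first, last + 1) if frame_types[i] == "I"]


gop = "IBBPBBPBBPBB" * 3        # hypothetical 12-frame GOP, repeated
print(frames_to_display(gop, 0, 35))          # [0, 12, 24]
print(frames_to_display(gop, 0, 35, step=6))  # [0, 6, 12, 18, 24, 30]
```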

FIG. 5 illustrates an example of the high-speed skimming method using skipping within a playback section. With this method, the user experiences the video file playing at high speed while still obtaining a large amount of information.

A video skimming system according to the video skimming method using shot segmentation information of the present invention, configured as described above, comprises: user interface means for inputting user commands for video skimming, so that digital video data can be searched and browsed as multimedia data; control means for skimming the corresponding video file on the basis of indexing information according to the user commands input through the user interface means; a video information file for providing the digital video data and the index information for that video to the control means; and display means on which the video skimmed by the control means is played.

In the present invention, the index information includes shot segmentation information for the video stream.

In the present invention, the user interface means includes means for specifying a summary level as the degree of video skimming, or means for specifying the playback speed of the sections played during skimming, so that the summary level or playback speed can be selected when video skimming is performed.

In the present invention, the control means uses the user's input or default settings to read, according to the skimming conditions, the video index information related to shot segmentation from the index file; computes the segments to be played that satisfy those conditions; and plays the corresponding segments of the associated media file continuously, outputting them to the display means.

FIG. 6 shows an embodiment of the configuration of the video skimming system of the present invention.

As shown in FIG. 6, the video skimming system of the present invention comprises: a user interface unit 601 for inputting user commands such as the degree of video skimming and the playback speed to be used in skimming; a main control unit 602 that skims the corresponding video file on the basis of indexing information according to the user commands input through the user interface unit 601; a media file 603 that provides the digital video stream to the main control unit 602; an index file 604 that provides the indexing information corresponding to the media file; and a display device unit 605 on which the video skimmed by the main control unit 602 is played.

In the video skimming system of FIG. 6, the index file 604 may be included in the media file 603. The display device unit 605 is an output device, such as a monitor or speaker, that presents the video stream, and the user interface unit 601 is an input means, such as a keyboard, mouse, remote control, or buttons, that accepts the user's input.

The media file 603 stores the video (and audio) data. The index file 604 stores index information for the video, including the shot segmentation information.
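The patent says only that each shot in the index carries either a start position and a duration or a start position and an end position. The sketch below therefore assumes a hypothetical one-line-per-shot text format:

- `D <start> <duration>` gives a start and a duration.
- `E <start> <end>` gives a start and an end.
- Lines starting with `#` are comments.

The `Shot` class and `parse_index` function are illustrative names, not part of any real index format. Both record forms are normalized to start plus duration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Shot:
    """One shot entry from a hypothetical index file; times in seconds."""
    start: float
    duration: float

    @property
    def end(self) -> float:
        return self.start + self.duration

    @classmethod
    def from_start_end(cls, start: float, end: float) -> "Shot":
        if end < start:
            raise ValueError("shot end precedes its start")
        return cls(start, end - start)


def parse_index(lines: List[str]) -> List[Shot]:
    """Parse 'D <start> <duration>' or 'E <start> <end>' records (assumed format)."""
    shots: List[Shot] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        kind, a, b = line.split()
        if kind == "D":
            shots.append(Shot(float(a), float(b)))
        elif kind == "E":
            shots.append(Shot.from_start_end(float(a), float(b)))
        else:
            raise ValueError(f"unknown record type: {kind}")
    return shots
```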

The user requests video skimming through the user interface unit 601.

When requesting video skimming, the user can specify the level of summarization (the degree of skimming) and the playback speed to be used during skimming. That is, the user specifies through the user interface unit 601 how many minutes the entire video should be condensed into.

In response, the main control unit 602 uses the information in the media file 603 and its corresponding index file 604 to decide two things: which part of which shot is to be played, and at what speed each segment is to be played. Once this is done, the main control unit 602 decodes the media file 603 and displays the corresponding frames on the display device unit 605, thereby providing the video skimming function to the user.
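The patent does not specify how the requested viewing time and speed become a per-shot length. One possible approach is a "water-filling" split, sketched below under these assumptions:

- At a speed multiplier s, every second of viewing covers s seconds of source content.
- Each shot receives the same length L, capped at its own duration.
- L is chosen so that the capped lengths exactly fill the content budget.

The function name is hypothetical.

```python
from typing import List


def equal_length_budget(durations: List[float], target_seconds: float,
                        speed: float = 1.0) -> float:
    """Return a common per-shot length L with sum(min(L, d)) == content budget.

    target_seconds: how long the user wants to spend watching the skim.
    speed: playback speed multiplier; at 2x, each viewing second covers 2 s of content.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    budget = target_seconds * speed          # seconds of source content we may play
    if budget >= sum(durations):
        return max(durations, default=0.0)   # budget covers everything: play every shot whole
    remaining = budget
    ds = sorted(durations)
    n = len(ds)
    for i, d in enumerate(ds):
        share = remaining / (n - i)          # equal split among the shots not yet fixed
        if d >= share:
            return share                     # every remaining shot is at least this long
        remaining -= d                       # short shot is played whole; pass on the rest
    return ds[-1]                            # not reached when budget < sum(durations)
```

With shot durations of 2 s, 10 s, and 10 s and a 10 s viewing target at normal speed, the sketch returns L = 4 s. The short shot is played whole (2 s) and the two long shots get 4 s each, which fills the 10 s exactly.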

As described above, the present invention proposes a video skimming method for digital video environments. It meets two user needs at once: grasping the entire content within a limited time, and moving to a desired position.

Conventional video skimming can play a relatively large amount of unimportant material, or cause the user to miss the scene actually wanted. The present invention minimizes these problems by allocating the segments to be played on a per-shot basis.
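Per-shot allocation can also give different shots different lengths based on their average within-shot similarity, as the abstract describes. The sketch below assumes:

- Similarity is on a 0 to 1 scale.
- A less self-similar (more changing) shot should receive more playback time, using the weight `max(1 - similarity, floor)`.
- Shots whose share exceeds their duration are played whole, and the surplus is redistributed to the others.

The direction of the weighting, the weight formula, and the function name are all assumptions, not the patent's stated formula.

```python
from typing import List


def similarity_weighted_lengths(durations: List[float], similarities: List[float],
                                budget: float, floor: float = 0.05) -> List[float]:
    """Split a content budget (seconds) across shots, weighting by (1 - similarity).

    similarities: assumed average within-shot similarity per shot, in [0, 1].
    floor: minimum weight, so even near-static shots get some playback time.
    """
    if len(durations) != len(similarities):
        raise ValueError("one similarity value per shot is required")
    if budget < 0 or floor <= 0:
        raise ValueError("budget must be >= 0 and floor must be > 0")
    weights = [max(1.0 - s, floor) for s in similarities]
    lengths = [0.0] * len(durations)
    open_shots = list(range(len(durations)))
    remaining = min(budget, sum(durations))
    while open_shots:
        total_w = sum(weights[i] for i in open_shots)
        # Shots whose proportional share would exceed their own duration.
        over = [i for i in open_shots
                if remaining * weights[i] / total_w >= durations[i]]
        if not over:
            for i in open_shots:
                lengths[i] = remaining * weights[i] / total_w
            break
        for i in over:                        # cap at the shot length, play the shot whole
            lengths[i] = durations[i]
            remaining -= durations[i]
        open_shots = [i for i in open_shots if i not in over]
    return lengths
```

Capping shots that overflow and then re-splitting the remainder only raises the share of the shots still open. The loop therefore ends after at most one pass per shot.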

The video skimming method of the present invention minimizes the user input needed when a user wants to move to a desired position.

With the video skimming function of the present invention, the user can grasp the entire content in a short time. Tedious parts can be passed over quickly without missing the parts that matter for understanding the whole.

The user can also use the video skimming of the present invention to move to a desired position, and it requires far less user input than keyframe-based methods.

Consequently, the present invention can be used for purposes such as video highlight playback. When the sections selected in each shot are played with a high-speed playback method, it can also serve to find a desired scene quickly while keeping the required user input to a minimum.
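One high-speed mode in the claims skips frames by decoding only the I frames of an inter-frame-coded stream such as MPEG. This works because I frames can be decoded without reference to any other frame.

The sketch below selects those frames inside the chosen playback segments. It assumes:

- The picture type of every frame is already known, as a list of 'I', 'P', or 'B' in display order.
- Segments are half-open frame-index ranges.

The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def frames_to_decode(frame_types: Sequence[str],
                     segments: List[Tuple[int, int]],
                     i_frames_only: bool = True) -> List[int]:
    """Frame indices to decode for each [start, end) segment, in playback order.

    frame_types: picture type per frame ('I', 'P', 'B'), assumed known from the stream.
    i_frames_only: keep only intra-coded frames, which decode independently.
    """
    result: List[int] = []
    for start, end in segments:
        for idx in range(max(start, 0), min(end, len(frame_types))):
            if not i_frames_only or frame_types[idx] == "I":
                result.append(idx)
    return result
```

With a 9-frame GOP such as `IBBPBBPBB`, keeping only I frames decodes one frame in nine. This is roughly what speeds up skimming of a playback section.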

Claims

1. A video skimming method using shot segmentation information, in which a video is divided into individual shots, the method comprising:

recognizing the section of each individual shot from the shot segmentation information for the video stream; selecting part of each recognized individual shot section as the video information to be reproduced, so that the segments to be played are allocated shot by shot; and continuously playing the video information selected in each shot section.

2. The video skimming method using shot segmentation information of claim 1, wherein the shot segmentation information represents each shot, which is the unit in which a video stream is produced and edited, together with temporal information consisting of either a start position and a duration or a start position and an end position.

3. The video skimming method using shot segmentation information of claim 1, wherein the portion to be played in each shot is selected as the front part, the rear part, or the middle part of the shot, or as the front and rear parts of the shot together.

4. The video skimming method using shot segmentation information of claim 1, wherein the length to be played in each shot is determined by selecting and playing segments of the same length in every shot.

5. The video skimming method using shot segmentation information of claim 1, wherein the length to be played in each shot is determined on the basis of the average image/motion/audio similarity within each individual shot, the playback length of each shot being increased according to that average similarity.

6. The video skimming method using shot segmentation information of claim 5, wherein the image/motion/audio similarity within a shot is the similarity among frames, motion vectors, and audio data at different temporal positions within the shot.

7. The video skimming method using shot segmentation information of claim 4 or 5, wherein, when the length of the segment selected as the portion to be played in an individual shot is calculated to exceed the length of that shot, the length of the portion to be played is reduced to no more than the length of the shot.

8. The video skimming method using shot segmentation information of claim 1, wherein the sections selected for video skimming are played at normal playback speed.

9. The video skimming method using shot segmentation information of claim 1, wherein the sections selected for video skimming are played at a speed faster than normal playback speed.

10. The video skimming method using shot segmentation information of claim 9, wherein a playback section is played at high speed by decoding more frames per unit time than in normal playback.

11. The video skimming method using shot segmentation information of claim 9, wherein a playback section is played at high speed by skipping some frames within it instead of decoding every frame of the section to be played.

12. The video skimming method using shot segmentation information of claim 11, wherein, when the skip-based fast skimming is applied to a video stream coded with an inter-frame compression scheme such as MPEG, the frames to be decoded are I frames, each of which can be decoded without decoding any other frame.

13. A video skimming system using shot segmentation information, comprising:

- user interface means for entering user commands for video skimming;
- a media information file providing digital video data and index information, the index information including shot segmentation information for recognizing the individual shot sections and playback sections of the video data stream;
- control means for selecting playback sections within the individual shot sections of the corresponding video file, on the basis of the indexing information and according to a user command entered through the user interface means, and for continuously playing the selected playback sections;
- display means for reproducing the video skimmed by the control means.

14. The video skimming system using shot segmentation information of claim 13, wherein the index information is the shot segmentation information for the video stream. For each shot, the unit in which the video stream is produced and edited, it includes temporal information consisting of either a start position and a duration or a start position and an end position.

15. The video skimming system using shot segmentation information of claim 13 or 14, wherein the user interface means includes means for designating a summary level as the degree of video skimming, or means for designating the playback speed of the playback sections during video skimming, so that the summary level or the playback speed can be selected.

16. The video skimming system using shot segmentation information of claim 13 or 14, wherein the control means reads, from the index file, the video index information related to the shot segmentation information according to the skimming conditions given by the user's input or the default settings. It then calculates the segments to be played in accordance with those skimming conditions and continuously plays the corresponding segments of the associated media file, outputting them to the display means.
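Claim 6 defines within-shot similarity as the similarity among frames, motion vectors, and audio data at different temporal positions in the shot. The sketch below covers only the image part, under these assumptions:

- Each sampled frame is summarized by a color histogram.
- Image similarity is the histogram intersection of normalized histograms. This is a common measure, swapped in here for illustration; the patent does not name one.
- The shot's score is the mean over successive sampled frames.

Motion-vector or audio similarity could be averaged the same way. The function names are hypothetical.

```python
from typing import List, Sequence


def normalise(h: Sequence[float]) -> List[float]:
    """Scale a histogram so its bins sum to 1 (all-zero histograms stay zero)."""
    total = float(sum(h))
    return [x / total for x in h] if total > 0 else [0.0] * len(h)


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1] between two normalised histograms: sum of bin-wise minima."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


def average_shot_similarity(frame_histograms: List[Sequence[float]]) -> float:
    """Mean similarity between frames sampled at successive temporal positions in a shot."""
    hists = [normalise(h) for h in frame_histograms]
    if len(hists) < 2:
        return 1.0  # a single sampled frame is trivially self-similar
    pairs = list(zip(hists, hists[1:]))
    return sum(histogram_intersection(a, b) for a, b in pairs) / len(pairs)
```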