KR101328270B1

KR101328270B1 - Annotation method and augmenting video process in video stream for smart tv contents and system thereof

Info

Publication number: KR101328270B1
Application number: KR1020120030296A
Authority: KR
Inventors: 조근식
Original assignee: 인하대학교 산학협력단
Priority date: 2012-03-26
Filing date: 2012-03-26
Publication date: 2013-11-14
Also published as: KR20130108684A

Abstract

A video annotation and augmentation method for a smart TV is disclosed. The video annotation method may include: receiving, from a first terminal or from a second terminal of a different type from the first terminal, a scene selected by a user in video content being played on the first terminal, together with annotation information written by the user for that scene; recognizing the video segment of the scene selected by the user in the video content; and generating a video stream that includes the annotation information by inserting the annotation information into the video segment.

Description

Smart TV video annotation and augmentation method and system thereof {ANNOTATION METHOD AND AUGMENTING VIDEO PROCESS IN VIDEO STREAM FOR SMART TV CONTENTS AND SYSTEM THEREOF}

Embodiments of the present invention relate to a video annotation method and an automatic annotation system that allow annotation information to be written onto smart TV content.

For video annotation, a video must be divided into segments over time, and a video annotation tool must allow annotation information to be written for each segment of the video.

Korean Patent Laid-Open Publication No. 10-2010-0123204 (published November 24, 2010) discloses a technique that uses face recognition to divide a video into segments by elapsed time, so that the situation in a given scene can be annotated.

When objects in a particular scene are annotated (object annotation), problems arise that conventional annotation tools cannot solve.

Conventional annotation is time-consuming and difficult to automate. When several people annotate the same video, merging the individually written annotation information into a single video stream and collaborating on it is difficult and takes a great deal of time.

Moreover, writing an annotation or playing back a video segment requires locating that segment, which usually depends on keyword matching against existing annotations or on time marks entered by the user.

An organization that provides video content can collect annotation information written by many users and use it as useful information for later users. However, the video segment for each annotation that an individual wrote on a particular scene has to be located one by one and the annotation attached to it, so integrating annotations written by many users into a single video stream takes considerable time and effort.

Accordingly, a technique is needed that automatically integrates individual annotations written by multiple users into a single video stream.

The present invention provides a video annotation method and an automatic annotation system that can write annotations onto video segments and augment objects of a particular scene in the video.

The present invention provides a video annotation method and an automatic annotation system that write annotation information onto a video using a feature-based image matching method used in augmented reality.

The present invention provides a video annotation method and an automatic annotation system that can integrate individual annotations, written by individuals on content that multiple users have consumed together, into a single video stream.

According to one aspect of the present invention, there is provided an automatic annotation system including: a recognition unit that recognizes, in video content being played on a user's first terminal, the video segment of a scene for which the user has written annotation information; and a generation unit that inserts the annotation information into the video segment to generate a video stream that includes the annotation information.

The automatic annotation system may further include an input unit that receives, from the first terminal or from a second terminal of a different type from the first terminal, the scene that the user selected in the video content and the annotation information that the user wrote for the scene.

The recognition unit may recognize the start point of the scene by comparing the video content with the scene.

The generation unit may generate the video stream using a feature-based image matching algorithm used in augmented reality, or using time frame information of the video content.

The automatic annotation system may further include: a database that stores the video stream; and a providing unit that provides the video stream to the first terminal, or to a second terminal of a different type from the first terminal, at the user's request.

When the video stream is played, the providing unit may provide the annotation information as an augmented object in the corresponding scene.

According to another aspect of the present invention, there is provided a video annotation method including: receiving, from a first terminal or from a second terminal of a different type from the first terminal, a scene selected by a user in video content being played on the first terminal, together with annotation information written by the user for that scene; recognizing the video segment of the scene selected by the user in the video content; and generating a video stream that includes the annotation information by inserting the annotation information into the video segment.

According to embodiments of the present invention, still images of video segments are annotated, and objects of a particular scene can be augmented in the video using a feature-based image matching technique or time frame information. This gives video broadcasters a new annotation method with which they can collect views for users, directly show customers' opinions or other annotation information, update those opinions or views, and deliver annotation information in various forms through P2P networking or applications immediately after news or advertisements.

FIG. 1 is a block diagram showing the internal configuration of an automatic annotation system that writes annotations onto video segments of TV content, according to an embodiment of the present invention.
FIG. 2 is an example screen illustrating a video annotation tool, according to an embodiment of the present invention.
FIG. 3 is an example screen illustrating the process of generating a video that includes annotation information, according to an embodiment of the present invention.
FIG. 4 is an example screen illustrating the generation of a video that includes annotation information and the form in which it is served, according to an embodiment of the present invention.
FIG. 5 is an example screen showing a list of images for which annotation information has been written, according to an embodiment of the present invention.
FIG. 6 is a flowchart showing a video annotation method that writes annotations onto video segments of TV content, according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

The present embodiments can be applied to an annotation system that can write annotations onto video segments of content. In particular, they concern a technique that annotates still images of video segments and augments objects of a particular scene in the video, using either a feature-based image matching technique used in augmented reality or time frame information of the video content.

In this specification, 'annotation' refers to attaching a specific image, audio, animation, or the like to a video object, so that the video can respond interactively when the user takes some action (for example, a click or a touch) on the corresponding region.

On a smart TV, a given drama or advertisement is displayed frame by frame, so it is possible to determine how many frames from the start point a particular paused picture lies. An augmented object can therefore be displayed in the video without separate image matching.
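
As a rough illustration of the frame arithmetic described above, the following sketch converts between a pause time and a frame index. It assumes a constant frame rate, which the description does not state, and the function names `frame_index_at` and `timestamp_of` are hypothetical.

```python
import math
from fractions import Fraction


def frame_index_at(seconds: Fraction, fps: Fraction) -> int:
    """Return the zero-based index of the frame on screen at `seconds`.

    Assumes a constant frame rate (an assumption of this sketch).
    """
    if seconds < 0 or fps <= 0:
        raise ValueError("seconds must be >= 0 and fps must be > 0")
    return math.floor(seconds * fps)


def timestamp_of(frame_index: int, fps: Fraction) -> Fraction:
    """Return the time, in seconds from the start, at which a frame begins."""
    if frame_index < 0 or fps <= 0:
        raise ValueError("frame_index must be >= 0 and fps must be > 0")
    return Fraction(frame_index) / fps
```

Exact fractions are used so that rates such as 30000/1001 frames per second do not accumulate floating-point error over long content.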

FIG. 1 is a block diagram showing the internal configuration of an automatic annotation system that writes annotations onto video segments of TV content, according to an embodiment of the present invention.

For example, while watching video content on a smart TV 101 in which a man is riding a skateboard, a user may write an annotation on a scene or a particular product of interest.

To this end, the automatic annotation system 100 may receive, from the user watching video content on the smart TV 101, a still image of the scene the user wants and a selection of objects in that scene. The objects in the selected scene and the video segment of that scene may then be annotated with the still image that contains the user's interest.

Through such annotations, personal opinions or explanations, as well as two-dimensional or three-dimensional graphics, can be attached to the scene corresponding to the still image. A single scene may contain several annotations, and these annotations may be linked to a specific website or text message.

An automated computer can match already-selected or individually annotated scenes against the video images, and thereby automatically collect the individual annotations and merge them into the video stream.

Through this process, the automatic annotation system 100 can combine the scene that the user selected in the video content being watched with the annotation information that the user wrote for that scene, and generate a new video stream that includes the annotation information.

Once a video image has been annotated and stored on a network server, the user can access the network server from a terminal such as a smartphone or tablet and view the annotated video image at any time. That is, the annotated image can be displayed on the terminal, and the user can click the image to see the additional information or comments attached to it.

In particular, the automatic annotation system 100 automatically recognizes the video segment on which a user wrote an annotation. When several people annotate the same video content, the individually written annotations can therefore be collected and easily integrated into a single video stream. By applying this technique for automatically integrating individual annotation information into a single video stream, organizations that supply video content, such as broadcasters, can make collaborative annotation among multiple users easier to create and use, with no additional work.

As shown in FIG. 1, the automatic annotation system 100 may include an input unit 110, a recognition unit 120, a generation unit 130, a database 140, and a providing unit 150.

The input unit 110 may receive the scene that the user selected for annotation in the video content being watched, and the annotation information that the user wrote for that scene. As one example, the user may select a desired scene from the video image and enter annotation information on the first terminal, on which the video content is being played. As another example, the user may use a second terminal of a different type from the first terminal to select a desired scene from the video image being played on the first terminal and enter annotation information. For instance, if the user uses an N-screen service environment across the first and second terminals (here, 'N-screen service' means a service that lets the same video content be viewed on a smart TV and a smart pad), the user can freely write annotations from the second terminal on a particular scene of the video being played on the first terminal. In addition, when the first terminal and the second terminal can interoperate with the automatic annotation system 100, the user may capture a particular scene of the video being played on the first terminal with the second terminal, and send the captured scene to the automatic annotation system 100 together with the annotation information the user wrote.

Accordingly, the input unit 110 may receive, from the first terminal or the second terminal, a still image of the scene selected by the user and the annotation information written by the user. Here, annotation information may mean any form of information that can be attached to a video image, such as text, an image, or a URL.
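
To make the shape of the input concrete, the following sketch models what a terminal might send to the input unit: one still image plus any number of annotation entries. The description does not define a data format, so every class and field name here (`AnnotationItem`, `AnnotationSubmission`, `kind`, `source_terminal`) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class AnnotationItem:
    """One piece of annotation information, e.g. text, an image reference, or a URL."""
    kind: str   # free-form label such as "text", "image", or "url" (illustrative)
    value: str  # the text itself, an image reference, or the URL


@dataclass
class AnnotationSubmission:
    """A still image of the selected scene plus the user's annotation entries."""
    user_id: str
    source_terminal: str  # "first" or "second" terminal (naming is an assumption)
    still_image: bytes    # encoded still image of the selected scene
    items: List[AnnotationItem] = field(default_factory=list)

    def values_of_kind(self, kind: str) -> List[str]:
        """Return the values of all entries with the given kind, in input order."""
        return [item.value for item in self.items if item.kind == kind]
```

The `kind` field is deliberately left open-ended, because the description allows any form of information that can be attached to a video image.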

To this end, an annotation tool or a dedicated annotation application that supports interoperation and interfacing with the automatic annotation system 100 may be installed on the first terminal or the second terminal. Any device with a CPU suffices as the first or second terminal. Examples include smart TVs as well as mobile smart devices such as tablets, smartphones, and smart pads, but the terminals are not limited to these.

FIG. 2 shows the interface screen of a video annotation tool and is an example screen intended to aid understanding of the tool. Using the video annotation tool, the user can add annotation information aligned to time segments of the video. The tool can support tasks such as viewing a video clip, marking time segments, returning to a previous segment, and adding information to a segment. The video annotation tool shown in FIG. 2 is used to add information to a video file or to play it back, and its UI consists of four parts. As shown in FIG. 2, the 'Chooser' 201 is used to select and open an annotation document, change the title of a document, or save a copy of a document. The 'Playback Control' 203 is used to play a video clip on the screen corresponding to the 'Video Display' 202; this panel provides the controls, including sliders, for play, pause, forward, back, restart, volume, playback position, and so on. The 'Note Editor' 204 is used to write annotation information (text, images, and the like). Finally, the 'Timeline' 205 shows the time scale of the video clip, and can show the current playback time and the tracked times of annotation segments. The 'Timeline' 205 may also support adding or deleting annotations and undoing or redoing edits. A video annotation tool with these functions may be implemented as an application.

The recognition unit 120 recognizes, in the video content being played on the first terminal, the video segment of the scene for which the user wrote annotation information (that is, of the still image of the scene selected by the user). As one example, the recognition unit 120 can interoperate with a content server (not shown) that provides video content to the first terminal; when the user requests an annotation, it may connect to the content server and fetch the video content the user is currently watching. The recognition unit 120 may compare the video content being watched with the scene the user selected for annotation and recognize the start point of that scene. Because an ordinary video stream is made up of frames, it is possible to determine how many frames from the start point a particular still picture lies, and thus to recognize the video segment of the scene for which the user wrote annotation information.
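
The comparison step can be sketched as a search for the frame that best matches the submitted still image. The sketch below uses a plain mean absolute pixel difference as a simple stand-in; it is not the feature-based image matching that the description refers to. Frames are assumed to be flattened grayscale pixel lists of equal size, and the names `mean_abs_diff`, `locate_still`, and `max_diff` are hypothetical.

```python
from typing import Optional, Sequence

Frame = Sequence[int]  # flattened grayscale pixels in the range 0-255 (assumption)


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two same-sized frames."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def locate_still(still: Frame, frames: Sequence[Frame],
                 max_diff: float = 10.0) -> Optional[int]:
    """Return the index of the first best-matching frame, or None if no frame
    is within `max_diff` of the still image."""
    best_index: Optional[int] = None
    best_diff: Optional[float] = None
    for index, frame in enumerate(frames):
        diff = mean_abs_diff(still, frame)
        if diff <= max_diff and (best_diff is None or diff < best_diff):
            best_index, best_diff = index, diff
    return best_index
```

A threshold is included because a still captured by photographing the TV screen with a second terminal will rarely match any frame exactly; the value 10.0 is arbitrary.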

The generation unit 130 may insert the annotation information written by the user into the video segment recognized by the recognition unit 120 to generate a new video stream that includes the annotation information. In other words, the generation unit 130 may generate the video stream that includes the annotation information using a feature-based image matching technique used in augmented reality, or using time frame information. The video stream may include multiple pieces of annotation information written not only by the user but also by other users.
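
One way to picture the merge is as combining each user's annotations, already resolved to frame ranges, into a single ordered timeline for the stream. The description does not say how annotations are stored inside the stream, so the sketch below keeps them as metadata alongside it; `SegmentAnnotation` and `merge_into_stream` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SegmentAnnotation:
    """An annotation bound to a recognized video segment (frame range, inclusive)."""
    start_frame: int
    end_frame: int
    user: str
    note: str


def merge_into_stream(*per_user: Iterable[SegmentAnnotation]) -> List[SegmentAnnotation]:
    """Combine several users' annotations into one timeline ordered by segment."""
    merged = [annotation for annotations in per_user for annotation in annotations]
    # Order by position in the video; break ties by user name for a stable result.
    merged.sort(key=lambda a: (a.start_frame, a.end_frame, a.user))
    return merged
```

Because each annotation already carries its frame range, no manual alignment is needed when a new user's annotations arrive, which is the collaborative benefit the description points to.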

FIG. 3 is an example screen illustrating the process of generating a video that includes annotation information, according to an embodiment of the present invention.

Referring to FIG. 3, the recognition unit 120 may compare a still image 301 of the scene for which the user wrote annotation information with the video stream 302 of the content the user is watching, and recognize the video segment of the video stream 302 in which the still image 301 appears. The generation unit 130 may then merge the annotation information written by the user into that segment of the video stream 302 to generate a video stream 303 that includes the annotation information.

The database 140 may store and maintain the video stream, generated by the generation unit 130, that includes the annotation information. In addition, when the automatic annotation system 100 is applied to the content server that provides video content to the first terminal, the automatic annotation system may store and maintain the video content directly in the database 140.

The providing unit 150 may provide the video stream that includes the annotation information to the first terminal or the second terminal at the user's request. When the user clicks a particular scene or image, the providing unit 150 may provide the annotation information attached to that scene as an augmented object on the screen being played. That is, when the video stream that includes the annotation information is played on the first terminal or the second terminal, the annotation information can be rendered as an augmented object in the corresponding scene.
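
The lookup behind this behavior can be sketched as follows: given the frame that is on screen when the user clicks, return the annotations whose segment covers that frame. Rendering the result as an augmented object is outside the scope of the sketch, and the names `Overlay` and `overlays_for_frame` are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Overlay(NamedTuple):
    """Annotation text attached to an inclusive range of frames."""
    start_frame: int
    end_frame: int
    text: str


def overlays_for_frame(overlays: Iterable[Overlay], frame_index: int) -> List[str]:
    """Return the texts of all overlays whose segment contains `frame_index`."""
    return [o.text for o in overlays
            if o.start_frame <= frame_index <= o.end_frame]
```

A linear scan is enough for a handful of annotations per stream; an interval index would be a natural replacement if a stream carried many thousands.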

FIG. 4 is an example screen illustrating the generation of a video that includes annotation information and the form in which it is served, according to an embodiment of the present invention.

The automatic annotation system 100 configured as described above may use the video stream 401 of the content the user is watching to generate a video stream 402 that includes annotation information. The automatic annotation system 100 then stores the video stream 403 that includes the annotation information and, at the user's request, may provide the user's terminal with a video 404 in which the annotation information of a particular scene is rendered as an augmented object through the feature-based image matching technique or time frame information.

In a single video stream, annotation information may be attached to many scenes, and many pieces of annotation information may be attached to a single scene. The providing unit 150 may therefore provide a list of the annotations attached to the video stream. As shown in FIG. 5, the providing unit 150 may provide a UI that includes a video player 503 for playing the specific content the user requested, a list 501 of the images for which annotation information has been written for the content being played, and the annotation information 502 attached to each image.
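
The one-to-many relationship between scenes and annotations suggests grouping annotations by scene before listing them. The following is a minimal sketch under the assumption that each scene is identified by the frame index of its still image; the function name `annotation_list` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def annotation_list(entries: Iterable[Tuple[int, str]]) -> Dict[int, List[str]]:
    """Group (scene_frame, note) pairs by scene, ordered by position in the video.

    Notes for the same scene keep the order in which they were supplied.
    """
    grouped: Dict[int, List[str]] = defaultdict(list)
    for scene_frame, note in entries:
        grouped[scene_frame].append(note)
    return dict(sorted(grouped.items()))
```

Each key of the result would correspond to one image in a list such as 501, and its value to the annotation information shown for that image, as in 502.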

The automatic annotation system 100 configured as described above can be used for a variety of purposes according to the following procedure.

1. While watching the scenes that appear in a video, each user watching the video can capture a paused picture or a screenshot of the various objects that interest them.

2. Each user can write annotations, together with the various objects, for each scene.

3. The individually annotated scenes and objects can be sent to a designated server (the automatic annotation system), where the annotated scenes are collected and used for autonomous video annotation.

4. The server that performs the video annotation can mechanically compare the video the user watched with the scene the user annotated, recognize the start point of the annotated scene, and automatically insert the video annotations and object annotations into the video stream of that content. In this way, collaborative annotation can take place automatically.

5. Using a feature point detection algorithm used in augmented reality, or time frame information, the annotated still images collected from users can be used to generate a video stream that includes the annotation information.

6. A video stream that includes annotation information shows additional information about a particular scene or object. Using this annotation information, scenes or objects in the video stream can be integrated with resources on the World Wide Web and with digital content annotated by individual users.

7. When the video stream is played, an annotated scene may be shown without exposing the annotation information. If the user views the scene using a terminal, however, the annotation information may be shown augmented on the terminal's screen.

The solution of the present invention, including the scenario above, integrates the TV screen with annotation information for participants such as commercial sponsors and viewers, and can thus be applied to advertising, e-learning, interactive TV, and the like for personalized content services.

도 6은 본 발명의 일실예에 있어서, TV 콘텐츠의 비디오 단락에 어노테이션을 작성하는 비디오 어노테이션 방법을 도시한 흐름도이다. 일실시예에 따른 비디오 어노테이션 방법은 도 1을 통해 설명한 자동 어노테이션 시스템(100)에 의해 각각의 단계가 수행될 수 있다.6 is a flowchart illustrating a video annotation method for annotating video paragraphs of TV content in one embodiment of the present invention. In the video annotation method according to an embodiment, each step may be performed by the automatic annotation system 100 described with reference to FIG. 1.

단계(610)에서 자동 어노테이션 시스템은 사용자가 시청 중인 비디오 스트림에서 사용자에 의해 어노테이션 정보가 작성된 스틸 이미지의 비디오 단락을 인식할 수 있다. 이를 위하여, 자동 어노테이션 시스템은 제1 단말 또는 제1 단말과 이기종의 제2 단말로부터 제1 단말을 통해 재생 중인 비디오 콘텐츠에서 사용자가 선택한 장면 및 해당 장면에 대하여 사용자가 작성한 어노테이션 정보를 입력 받을 수 있다. 그리고, 자동 어노테이션 시스템은 사용자가 시청 중인 비디오 콘텐츠와 사용자가 어노테이션을 위해 선택한 장면을 비교하여 해당 장면의 시작 지점을 인식함으로써 사용자에 의해 어노테이션 된 장면의 이미지 단락을 인식할 수 있다.In step 610, the automatic annotation system may recognize a video paragraph of the still image in which the annotation information was created by the user in the video stream that the user is watching. To this end, the automatic annotation system may receive input of the scene selected by the user and annotation information created by the user with respect to the scene selected from the video content being played through the first terminal from the first terminal or the first terminal and the heterogeneous second terminal. . In addition, the automatic annotation system may recognize the image paragraph of the scene annotated by the user by comparing the video content being watched by the user with the scene selected by the user to recognize the starting point of the scene.

In step 620, the automatic annotation system may merge the annotation information created by the user into the video segment recognized in step 610 to generate a new video stream that includes the annotation information. In doing so, the automatic annotation system may generate the annotated video stream using either a feature-based image matching technique of the kind used in augmented reality or time frame information.
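As a rough illustration of the time-frame variant of step 620, the following Python sketch attaches an annotation to whichever segment contains the selected scene's timestamp. The `VideoSegment` class, its field names, and `attach_annotation` are assumptions made for this example, and annotations are kept as metadata alongside the segment rather than encoded into an actual stream.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VideoSegment:
    """A time-delimited segment of the video (hypothetical structure)."""
    start_s: float
    end_s: float  # exclusive upper bound, in seconds
    annotations: List[str] = field(default_factory=list)


def attach_annotation(segments: List[VideoSegment], scene_time_s: float,
                      text: str) -> VideoSegment:
    """Attach text to the segment whose time range contains scene_time_s."""
    for segment in segments:
        if segment.start_s <= scene_time_s < segment.end_s:
            segment.annotations.append(text)
            return segment
    raise LookupError(f"no segment contains t={scene_time_s}s")


# Usage: a scene selected at 12.5 s falls into the second segment.
timeline = [VideoSegment(0.0, 10.0), VideoSegment(10.0, 25.0)]
annotated = attach_annotation(timeline, 12.5, "Product shown here")
```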

In step 630, the automatic annotation system may store and maintain the video stream that includes the annotation information and, at the user's request, provide that video stream to the first terminal or the second terminal. The annotated video stream is implemented so that the annotation information of a specific scene is displayed as an augmented object, by means of the feature-based image matching technique or the time frame information. That is, when the user plays the video, or clicks a specific scene or a specific image, the automatic annotation system may present the annotation information attached to that scene as an augmented object on the screen being played.
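The store-and-provide behavior of step 630 can be sketched in Python as an in-memory lookup keyed by playback time. This is only an illustration under assumed names (`AnnotationStore`, `save`, `annotations_at`); the specification does not describe the database schema or how the augmented object is rendered.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class AnnotationStore:
    """In-memory stand-in for the database of annotated video streams."""

    def __init__(self) -> None:
        # video_id -> list of (start_s, end_s, annotation text)
        self._by_video: DefaultDict[str, List[Tuple[float, float, str]]] = defaultdict(list)

    def save(self, video_id: str, start_s: float, end_s: float, text: str) -> None:
        """Record an annotation for the time range [start_s, end_s)."""
        if end_s <= start_s:
            raise ValueError("end_s must be greater than start_s")
        self._by_video[video_id].append((start_s, end_s, text))

    def annotations_at(self, video_id: str, playback_s: float) -> List[str]:
        """Return the annotations to overlay at the given playback time."""
        return [text for start, end, text in self._by_video.get(video_id, [])
                if start <= playback_s < end]


# Usage: the player asks which annotations to overlay at 12 s.
store = AnnotationStore()
store.save("news-clip", 10.0, 25.0, "Viewer comment")
overlay_texts = store.annotations_at("news-clip", 12.0)
```

A terminal would call something like `annotations_at` on each playback tick or click event and draw the returned texts over the current frame.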

As described above, according to an embodiment of the present invention, still images of a video segment can be annotated, and an object of a specific scene can be augmented onto the video using a feature-based image matching technique or time frame information. By providing video broadcasters with this new annotation method, it becomes possible to collect views on behalf of users, show customers' opinions or other annotation information directly, update those opinions or views, and, immediately after a news item or advertisement, serve the annotation information in various forms through P2P networking or applications.

The methods according to embodiments of the present invention may be implemented in the form of program instructions executable by various computer systems and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the art of computer software. In addition, the file system described above may be recorded on a computer-readable recording medium.

Although the present invention has been described above with reference to limited embodiments and drawings, it is not limited to those embodiments, and a person of ordinary skill in the art to which the present invention pertains can make various modifications and variations based on this description.

Therefore, the scope of the present invention should not be limited to the described embodiments, but should be determined by the claims below and by their equivalents.

100: automatic annotation system
110: input unit
120: recognition unit
130: generation unit
140: database
150: providing unit

Claims

An automatic annotation system comprising:
a recognition unit configured to recognize a video segment of a scene for which annotation information has been created by a user in video content being played through a first terminal of the user; and
a generation unit configured to generate a video stream including the annotation information by inserting the annotation information into the video segment,
wherein the generation unit generates the video stream using a feature-based image matching algorithm used in augmented reality or time frame information of the video content.

The automatic annotation system of claim 1, further comprising
an input unit configured to receive, from the first terminal or a second terminal heterogeneous to the first terminal, a scene selected by the user from the video content and the annotation information created by the user for the scene.

The automatic annotation system of claim 1,
wherein the recognition unit
compares the video content with the scene to recognize a starting point of the scene.

delete

An automatic annotation system comprising:
a recognition unit configured to recognize a video segment of a scene for which annotation information has been created by a user in video content being played through a first terminal of the user;
a generation unit configured to insert the annotation information into the video segment to generate a video stream including the annotation information;
a database configured to store the video stream; and
a providing unit configured to provide the video stream to the first terminal or a second terminal heterogeneous to the first terminal at the request of the user,
wherein the providing unit provides the annotation information as an augmented object in the scene when the video stream is played.

A video annotation method comprising:
receiving, from a first terminal or a second terminal heterogeneous to the first terminal, a scene selected by a user in video content being played through the first terminal and annotation information created by the user for the scene;
recognizing a video segment of the scene selected by the user in the video content; and
inserting the annotation information into the video segment to generate a video stream including the annotation information,
wherein the generating of the video stream comprises
generating the video stream using a feature-based image matching algorithm used in augmented reality or time frame information of the video content.

The video annotation method of claim 7,
wherein the recognizing of the video segment comprises
comparing the video content with the scene to recognize a starting point of the scene.

delete

A video annotation method comprising:
receiving, from a first terminal or a second terminal heterogeneous to the first terminal, a scene selected by a user in video content being played through the first terminal and annotation information created by the user for the scene;
recognizing a video segment of the scene selected by the user in the video content;
inserting the annotation information into the video segment to generate a video stream including the annotation information; and
providing the video stream to the first terminal or the second terminal at the request of the user, with the annotation information provided as an augmented object in the scene.