KR101313285B1

KR101313285B1 - Method and Device for Authoring Information File of Hyper Video and Computer-readable Recording Medium for the same

Info

Publication number: KR101313285B1
Application number: KR1020110054169A
Authority: KR
Inventors: 김호
Original assignee: 주식회사 에이치비솔루션
Priority date: 2011-06-03
Filing date: 2011-06-03
Publication date: 2013-09-30
Also published as: KR20120134936A

Abstract

The present invention relates to an apparatus and method for authoring a hyper video information file, and a recording medium therefor. For a hyper video in which additional information about a person or object in a video file is authored as an information file and linked by hyperlink, the hyper video information file authoring method, executed in an information file authoring apparatus, comprises: (a) extracting frames from an input video file; (b) detecting shot boundaries in the extracted frames, extracting a representative frame for each detected shot, and storing information about the shots and representative frames in a video hierarchy DB; (c) grouping similar shots into one scene, generating video hierarchy information in which the frames, shots, and scenes are structured, and storing it in the video hierarchy DB; (d) receiving from a user a selection of an object region of interest in an arbitrary frame, detecting that object region in the representative frames, and storing it in an object position DB; (e) tracking positions in adjacent frames starting from the representative positions of the object detected in each representative frame, and additionally storing them in the object position DB; (f) storing object information input by the user in an object information DB; and (g) generating a hyper video information file by mapping the contents of the object position DB and the object information DB.

Description

Apparatus and method for producing hyper video information file, recording medium {Method and Device for Authoring Information File of Hyper Video and Computer-readable Recording Medium for the same}

The present invention relates to an apparatus and method for authoring a hyper video information file, and a recording medium therefor. More specifically, it relates to an apparatus, method, and recording medium for authoring the information file of a hyper video, in which additional information about persons or objects in the video data is linked by hyperlink, configured so that scene segmentation, face detection and grouping, object detection, and object tracking are performed automatically and the information file can be authored efficiently.

Hyper video is video that contains hyperlinks: video edited so that hyperlinks are attached to the face regions of persons appearing in it, or to other object regions of interest, through which additional information about those persons or objects can be viewed.

Hyper video can be regarded as an extension to video of the hypertext concept widely used on Internet web pages. Unlike ordinary video, hyper video lets the user, during playback, easily select and view additional information about the hyperlinked persons and objects.

The range of applications for hyper video has recently been expanding considerably, centered on interactive services such as interactive advertising on IPTV that applies the product placement (PPL) concept.

Examples of the additional information linked to a video by hyperlink include supplementary images, text, audio, video, and web page addresses (URLs) related to a person in the video.

Such hyper video is implemented as follows: a hyper video information file authoring apparatus produces an information file that stores the position, in each video frame, of the persons (e.g., faces) or objects (e.g., glasses, earrings, clothing, bags) in the video data together with the related additional information, and a hyper video player plays back the video data with that information file hyperlinked to it.

An important problem in authoring a hyper video information file is defining and verifying the person or object regions to which hyperlinks are to be attached.

In this regard, conventional hyper video information file authoring tools, such as VideoClix for MacOS, have the operator manually specify and edit the definition of each target object region and its changes in every frame of the video, so authoring an information file consumes a great deal of time and effort.

As a technique addressing this, Korean Patent Laid-Open Publication No. 10-2009-0044221 (published 2009.05.07), for example, proposed a method for providing an interactive advertisement information file authoring service.

However, in that method the information file author manually selects the advertisement object region in each extracted frame, and subsequent frames are extracted sequentially at a preset frame extraction frequency. It therefore does not fundamentally escape the limits of manual work, and the working time required to cover all the frames remains considerable.

That method also includes a scene change determination routine: it measures the degree of change between the extracted frame and the previous frame, judges that a scene change has occurred when the change exceeds a threshold, and stores the frames extracted up to that point as one scene. A video, however, ordinarily has a multi-level structure of frames, shots, and scenes that follows from how it was shot with the camera. Because the method ignores this distinction and simply decides whether a scene change occurred from the measured frame change, its practical value in separating scenes was slight.

Moreover, because it uses a preset, fixed threshold to identify scene changes, it cannot cope with varied kinds of screen change, and the precision of scene change determination suffers.

Finally, because the technique is aimed at creating interactive advertisement information files, it is configured to locate only the product objects being advertised and offers no effective means of locating the faces of persons.

To solve these problems of the prior art, an object of the present invention is to provide an apparatus and method for authoring a hyper video information file, and a recording medium therefor, that author the information file of a hyper video, in which additional information about persons or objects in the video data is linked by hyperlink, while automatically performing scene segmentation, face detection and grouping, object detection, and object tracking so that authoring proceeds efficiently.

To achieve the above object, one embodiment of the present invention is a hyper video information file authoring method executed in an information file authoring apparatus, for a hyper video in which additional information about a person or object in a video file is authored as an information file and linked by hyperlink, the method comprising: (a) extracting frames from an input video file; (b) detecting shot boundaries in the extracted frames, extracting a representative frame for each detected shot, and storing information about the shots and representative frames in a video hierarchy DB; (c) grouping similar shots into one scene, generating video hierarchy information in which the frames, shots, and scenes are structured, and storing it in the video hierarchy DB; (d) receiving from a user a selection of an object region of interest in an arbitrary frame, detecting that object region in the representative frames, and storing it in an object position DB; (e) tracking positions in adjacent frames starting from the representative positions of the object detected in each representative frame, and additionally storing them in the object position DB; (f) storing object information input by the user in an object information DB; and (g) generating a hyper video information file by mapping the contents of the object position DB and the object information DB.
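
Step (g), the mapping of the object position DB onto the object information DB, can be pictured as a join on an object identifier. The following Python sketch is illustrative only: the text does not specify the DB schemas, so the `ObjectPosition` record, the `object_id` key, and the `name`/`url` fields are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectPosition:
    """One row of a hypothetical object position DB."""
    object_id: int
    frame: int
    box: tuple  # (x, y, width, height) in pixels


def map_positions_to_info(positions, info_by_id):
    """Join position rows with user-entered object info by object_id."""
    merged = {}
    for pos in positions:
        info = info_by_id.get(pos.object_id)
        if info is None:
            continue  # positions exist, but the user has entered no info yet
        entry = merged.setdefault(pos.object_id, {"info": info, "positions": []})
        entry["positions"].append((pos.frame, pos.box))
    for entry in merged.values():
        entry["positions"].sort()  # order each object's positions by frame
    return merged


positions = [
    ObjectPosition(1, 12, (40, 30, 64, 64)),
    ObjectPosition(1, 10, (38, 30, 64, 64)),
    ObjectPosition(2, 10, (200, 120, 50, 80)),
]
# Hypothetical object information DB: only object 1 has been described.
info_by_id = {1: {"name": "Bag", "url": "http://example.com/bag"}}
hyper_video_records = map_positions_to_info(positions, info_by_id)
```

In this sketch an object that has positions but no user-entered information is simply left out of the result; an actual authoring tool might instead flag it for the user.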

An embodiment according to another aspect of the present invention is a hyper video information file authoring method executed in an information file authoring apparatus, for a hyper video in which additional information about a person or object in a video file is authored as an information file and linked by hyperlink, the method comprising: detecting shot boundaries in frames extracted from an input video file, extracting a representative frame for each detected shot, and storing information about the shots and representative frames in a video hierarchy DB; grouping similar shots into one scene, generating video hierarchy information in which the frames, shots, and scenes are structured, and storing it in the video hierarchy DB; detecting face regions in each representative frame, grouping similar faces among the detected face regions and storing the result in an object position DB, detecting in the representative frames an object region selected by the user, and tracking the positions of the object and the grouped faces in adjacent frames and storing them in the object position DB; and storing object information input by the user in an object information DB, and generating a hyper video information file by mapping the contents of the object position DB and the object information DB.

According to another aspect of the present invention, there is provided a computer-readable recording medium on which is recorded a program for executing each step of the above hyper video information file authoring method.

An embodiment according to yet another aspect of the present invention is a hyper video information file authoring apparatus for constructing a hyper video in which additional information about a person or object in a video file is linked by hyperlink, the apparatus comprising: a scene grouping module that extracts frames from an input video file, detects shot boundaries in the extracted frames, extracts a representative frame for each detected shot, stores information about the shots and representative frames in a video hierarchy DB, groups similar shots into one scene, generates video hierarchy information in which the frames, shots, and scenes are structured, and stores it in the video hierarchy DB; an object detection module that detects the object region in the representative frames and stores it in an object position DB; an object tracking module that tracks positions in adjacent frames starting from the representative positions of the object detected in each representative frame and additionally stores them in the object position DB; a user UI module that receives from the user a selection of an object region of interest in an arbitrary frame and stores object information input by the user in an object information DB; and a control module that at least generates a hyper video information file by mapping the contents of the object position DB and the object information DB.

Preferably, the apparatus may further comprise a face region detection module that detects face regions in each detected representative frame, and a face grouping module that groups similar faces among the detected face regions and stores the result in the object position DB.

An embodiment according to yet another aspect of the present invention is a hyper video information file authoring apparatus for constructing a hyper video in which additional information about a person or object in a video file is linked by hyperlink, which may be configured to execute at least: a function of detecting shot boundaries in frames extracted from an input video file, extracting a representative frame for each detected shot, and storing information about the shots and representative frames in a video hierarchy DB; a function of grouping similar shots into one scene, generating video hierarchy information in which the frames, shots, and scenes are structured, and storing it in the video hierarchy DB; a function of detecting face regions in each representative frame, grouping similar faces among the detected face regions and storing the result in an object position DB, detecting in the representative frames an object region selected by the user, and tracking the positions of the object and the grouped faces in adjacent frames and storing them in the object position DB; and a function of storing object information input by the user in an object information DB and generating a hyper video information file by mapping the contents of the object position DB and the object information DB.

The present invention thus has the advantage that a hyper video information file can be authored easily, because scene segmentation, face detection and grouping, object detection, and object tracking are performed automatically.

In particular, because the invention detects shot boundaries in the frames and groups similar shots into scenes to build a hierarchical structure of the video, individual shots are easier to reach during editing even in a long video with many frames or shots, which in turn raises the efficiency of hyper video authoring.

In addition, object positions in the representative frames are either extracted by the user or detected automatically by object detection; then, for all detected objects or for a particular object, the positions in the subsequent adjacent frames within the same shot are tracked as a background process, starting from the positions in all the representative frames. For frames on which scene grouping has already been completed, the user's editing operations can therefore proceed in parallel with, and separately from, the time-consuming object tracking.
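
The idea of running object tracking as a background process while the user keeps editing can be sketched with a thread pool. This is a minimal illustration, not the apparatus's actual scheduling: `track_shot` is a hypothetical stand-in that merely sleeps to simulate slow tracking, and the per-shot granularity is an assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def track_shot(shot_id, frame_range):
    """Hypothetical stand-in for slow per-shot object tracking."""
    time.sleep(0.05)  # simulate a long-running tracker
    return shot_id, list(frame_range)


def run_background_tracking(shots):
    """Submit one tracking job per shot and return the futures at once."""
    executor = ThreadPoolExecutor(max_workers=2)
    futures = [executor.submit(track_shot, sid, rng) for sid, rng in shots.items()]
    # Do not block the caller; already submitted jobs still run to completion.
    executor.shutdown(wait=False)
    return futures


shots = {0: range(0, 5), 1: range(5, 9)}
futures = run_background_tracking(shots)

# The foreground (user editing) continues while tracking runs.
edits = ["rename object 1"]

# Collect tracking results only when they are needed.
tracked = dict(f.result() for f in futures)
```

Submitting the jobs and returning immediately is what lets editing continue; the results are gathered only at the point where the tracked positions are actually required.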

In addition, the invention allows the threshold used in shot boundary detection to be set actively rather than fixed in advance, so shot boundaries can be detected stably and with high precision regardless of the content of the video.
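
To make the contrast with a fixed threshold concrete, the sketch below derives the threshold at each position from the recent frame-to-frame differences. This passage does not state how the threshold is actually computed, so the rule used here (mean plus `k` standard deviations over a sliding window, with a lower bound `floor`) is purely an assumption for illustration, as are all parameter names and values.

```python
from statistics import mean, pstdev


def detect_shot_boundaries(diffs, window=4, k=3.0, floor=0.05):
    """Return indices i where diffs[i] suggests a cut between frames i and i+1.

    diffs[i] is a dissimilarity between consecutive frames (for example a
    histogram distance). The threshold is recomputed from the preceding
    `window` differences instead of being one fixed constant.
    """
    boundaries = []
    for i, d in enumerate(diffs):
        recent = diffs[max(0, i - window):i]
        if len(recent) < window:
            continue  # not enough history yet to estimate a threshold
        threshold = max(floor, mean(recent) + k * pstdev(recent))
        if d > threshold:
            boundaries.append(i)
    return boundaries


# High-motion footage: ordinary differences are already around 0.3.
busy = [0.30, 0.28, 0.33, 0.31, 0.29, 0.95, 0.30, 0.32]
# Calm footage: ordinary differences are tiny, and the cut is only 0.20.
calm = [0.02, 0.03, 0.02, 0.04, 0.20, 0.03]

busy_cuts = detect_shot_boundaries(busy)
calm_cuts = detect_shot_boundaries(calm)
```

With these sample values a single fixed threshold of 0.25 would flag almost every frame of `busy` and miss the cut in `calm`, whereas the windowed threshold finds one cut in each sequence.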

In addition, the invention automatically locates not only the product objects being advertised but also, through face detection and grouping, the faces of persons for the purpose of providing varied person-related information, which further broadens the usefulness of hyper video.

In addition, because the invention generates the hyper video information file as metadata in XML format, a single player can play back different hyper video editing results.
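
As a rough picture of what XML metadata of this kind might look like, the sketch below serializes object information and per-frame positions with Python's standard `xml.etree.ElementTree`. The element and attribute names (`hypervideo`, `object`, `link`, `position`) and the file name are invented for this example; the text does not define the actual schema.

```python
import xml.etree.ElementTree as ET


def build_info_file(video_name, objects):
    """Serialize objects and their per-frame boxes to an XML string.

    The schema used here is hypothetical, chosen only for illustration.
    """
    root = ET.Element("hypervideo", {"source": video_name})
    for obj in objects:
        node = ET.SubElement(
            root, "object", {"id": str(obj["id"]), "name": obj["name"]}
        )
        ET.SubElement(node, "link").text = obj["url"]
        for frame, (x, y, w, h) in obj["positions"]:
            ET.SubElement(node, "position", {
                "frame": str(frame),
                "x": str(x), "y": str(y),
                "width": str(w), "height": str(h),
            })
    return ET.tostring(root, encoding="unicode")


xml_text = build_info_file("sample_video.avi", [
    {"id": 1, "name": "Bag", "url": "http://example.com/bag",
     "positions": [(10, (38, 30, 64, 64)), (11, (39, 30, 64, 64))]},
])

# A player would parse the same file back to find clickable regions.
parsed = ET.fromstring(xml_text)
```

Because the information file is kept apart from the video data, a player that understands the schema can load any such file alongside the same video, which is the property the paragraph above relies on.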

FIG. 1 is a block diagram of a hyper video authoring apparatus according to an embodiment of the present invention;
FIG. 2 is a conceptual diagram of the video hierarchy of the hyper video authoring apparatus according to an embodiment of the present invention;
FIG. 3 is an overall flowchart of a hyper video authoring method according to an embodiment of the present invention;
FIG. 4 is a flowchart of the scene grouping step of the hyper video authoring method according to an embodiment of the present invention;
FIG. 5 is a flowchart of the face detection step of the hyper video authoring method according to an embodiment of the present invention;
FIG. 6 is a flowchart of the face grouping step of the hyper video authoring method according to an embodiment of the present invention;
FIG. 7 is a flowchart of the object detection step of the hyper video authoring method according to an embodiment of the present invention;
FIG. 8 is a flowchart of the object tracking step of the hyper video authoring method according to an embodiment of the present invention;
FIG. 9 is a reference diagram for explaining shot boundary detection in the hyper video authoring method according to an embodiment of the present invention;
FIG. 10 is a reference diagram for explaining the attraction relationship in a sequence of consecutive shots in the hyper video authoring method according to an embodiment of the present invention;
FIG. 11 is a photograph illustrating the concept of the rectangular feature points used for face region detection in the hyper video authoring method according to an embodiment of the present invention;
FIGS. 12A and 12B are example frame screens before and after (#91, #93) a shot boundary change in the hyper video authoring method according to an embodiment of the present invention;
FIGS. 13A and 13B are example frame screens before and after (#5350, #5352) a scene change in the hyper video authoring method according to an embodiment of the present invention;
FIG. 14 is an example of faces grouped as similar faces in the hyper video authoring method according to an embodiment of the present invention;
FIG. 15 is an example screen in which the user selects an object region in the hyper video authoring method according to an embodiment of the present invention;
FIG. 16 is an example screen showing object detection in each representative frame for the object selected by the user in the hyper video authoring method according to an embodiment of the present invention;
FIGS. 17A to 17C are example screens in which an object detected in a representative frame is tracked in adjacent frames (#404, #420, #479) in the hyper video authoring method according to an embodiment of the present invention;
FIGS. 18A and 18B are examples of the basic screen of the user UI in the hyper video authoring method according to an embodiment of the present invention;
FIG. 19 is an example of the object information editing screen of the user UI in the hyper video authoring method according to an embodiment of the present invention;
FIG. 20 is an example of the object DB export window of the user UI in the hyper video authoring method according to an embodiment of the present invention;
FIG. 21 is an example of the media output window of the user UI in the hyper video authoring method according to an embodiment of the present invention;
FIGS. 22A to 22C are examples of the object information display and hyperlink screens of the user UI in the hyper video authoring method according to an embodiment of the present invention.

The present invention may be embodied in various other forms without departing from its technical spirit or essential features. The embodiments of the present invention are therefore merely illustrative in every respect and must not be construed restrictively.

Terms such as first and second may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be named a second component, and similarly a second component may be named a first component. The term "and/or" includes any combination of a plurality of related listed items, or any one of them.

When a component is said to be "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between. By contrast, when a component is said to be "directly connected" or "directly coupled" to another component, it should be understood that no other component exists in between.

The terms used in this application are used only to describe particular embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise. In this application, terms such as "comprise", "include", and "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof stated in the specification, and are to be understood as not excluding in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries are to be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless expressly so defined in this application.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. Identical or corresponding components are given the same reference numerals regardless of the figure, and redundant descriptions of them are omitted. In describing the present invention, detailed descriptions of related known techniques are omitted where they could obscure the gist of the invention.

FIG. 1 is a block diagram of a hyper video authoring apparatus according to an embodiment of the present invention, and FIG. 2 is a conceptual diagram of the video hierarchy of the hyper video authoring apparatus according to an embodiment of the present invention.

A hyper video information file authoring apparatus 1000 is provided for constructing a hyper video in which additional information about a person or object in a video file is linked by hyperlink.

The hyper video information file authoring apparatus 1000 may be an ordinary computer system with computing elements such as a central processing unit, a system DB, system memory, and interfaces; such a system can be regarded as functioning as the hyper video information file authoring apparatus 1000 once a hyper video information file authoring program is installed and run on it. Description of the ordinary configuration of such a computer system is omitted, and the following description centers on the functional configuration needed to explain the embodiments of the present invention.

The hyper video information file producing apparatus 1000 includes a scene grouping module 110.

The scene grouping module 110 extracts frames from an input video file, detects shot boundaries in the extracted frames, extracts a representative frame for each detected shot, and stores information about the shots and representative frames in the video hierarchy DB 170. These functions may be implemented by shot detection means (not shown).

In addition, the scene grouping module 110 groups similar shots into one scene, generates video hierarchy information in which the frames, shots, and scenes are structured as shown in FIG. 2, and stores it in the video hierarchy DB 170. These functions may be implemented by scene segmentation means (not shown).

The video hierarchy DB 170 is a DB that manages the hierarchical structure of the video, including shot and scene information and the representative frames of the shots.
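For illustration only, the scene-shot-frame hierarchy held in the video hierarchy DB 170 might be modeled as in the following Python sketch. The class names `Shot` and `Scene`, their fields, and the helper `find_shot` are assumptions introduced for this sketch and do not appear in the embodiment.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Shot:
    start_frame: int           # index of the first frame of the shot
    end_frame: int             # index of the last frame of the shot (inclusive)
    representative_frame: int  # index of the frame chosen to represent the shot


@dataclass
class Scene:
    shots: List[Shot] = field(default_factory=list)

    def frame_range(self) -> Tuple[int, int]:
        """Return the (first, last) frame index covered by this scene."""
        return self.shots[0].start_frame, self.shots[-1].end_frame


def find_shot(scenes: List[Scene], frame_index: int) -> Optional[Tuple[int, int]]:
    """Walk scene -> shot -> frame and return (scene index, shot index), or None."""
    for scene_idx, scene in enumerate(scenes):
        for shot_idx, shot in enumerate(scene.shots):
            if shot.start_frame <= frame_index <= shot.end_frame:
                return scene_idx, shot_idx
    return None
```

A structure of this kind lets a user interface list scenes first and expand them into shots and frames only on demand.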

The hyper video information file producing apparatus 1000 also includes an object detection module 140 that detects the object region in the representative frames and stores it in the object position DB 180.

The hyper video information file producing apparatus 1000 also includes an object tracking module 150 that, starting from the representative positions of the objects detected in the representative frames, tracks their positions in adjacent frames and additionally stores them in the object position DB 180.

The object position DB 180 is a DB that manages the positions of the object regions detected in each frame, including the representative frames, and the positions of the face regions (or grouped face positions) described later.

The hyper video information file producing apparatus 1000 also includes a user UI (User Interface) module 160 that receives from the user a selection of an object region of interest in an arbitrary frame and stores object information entered by the user in the object information DB 190. Depending on the function provided, the user UI module 160 offers various user UI means such as video playback means, scene viewing means, object viewing means, object information viewing means, and automatic processing information viewing means. The object information DB 190 is a DB that manages the various pieces of additional information assigned to objects (or faces).

The user UI module 160 may also present the information stored in the object position DB 180 or the object information DB 190 to the user and receive corrections and edits to that information.

In particular, the user UI module 160 may be configured so that, even if object detection or tracking has not been completed for all frames, its functions are available for the frames of any scene for which scene grouping, object detection, and object tracking have already been performed.

In addition, in the user UI module 160, presenting the information stored in the object position DB 180 or the object information DB 190 to the user and receiving corrections and edits may follow a hierarchical scene-shot-frame approach according to the structured video hierarchy stored in the video hierarchy DB 170.

The hyper video information file producing apparatus 1000 also includes a control module 100 that at least performs the function of generating a hyper video information file by mapping the contents of the object position DB and the object information DB. In addition to this function, the control module 100 controls the overall operation of the hyper video information file producing apparatus 1000 of this embodiment.
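As a rough illustration of this mapping, the following Python sketch joins per-object positions with per-object information and serializes the result as XML. The function name `build_info_file`, the dictionary layouts, and the element and attribute names (`hypervideo`, `object`, `info`, `position`) are hypothetical; the embodiment does not specify the file schema.

```python
import xml.etree.ElementTree as ET


def build_info_file(position_db, info_db):
    """Join object positions with object information and return an XML string.

    position_db: {object_id: [(frame, x, y, w, h), ...]}   (assumed layout)
    info_db:     {object_id: {"name": ..., "url": ...}}    (assumed layout)
    """
    root = ET.Element("hypervideo")
    for object_id, positions in sorted(position_db.items()):
        obj = ET.SubElement(root, "object", id=str(object_id))
        # Attach every piece of additional information registered for this object.
        for key, value in sorted(info_db.get(object_id, {}).items()):
            ET.SubElement(obj, "info", name=key).text = str(value)
        # Attach the position of the object in each frame where it was found.
        for frame, x, y, w, h in positions:
            ET.SubElement(obj, "position", frame=str(frame),
                          x=str(x), y=str(y), w=str(w), h=str(h))
    return ET.tostring(root, encoding="unicode")
```

Keying both inputs by the same object identifier is what makes the mapping a simple join; a player could then look up the object under a clicked pixel by frame and bounding box.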

Meanwhile, the hyper video information file producing apparatus 1000 of this embodiment includes a face region detection module 120 that detects face regions contained in each of the detected representative frames.

The hyper video information file producing apparatus 1000 also includes a face grouping module 130 that groups similar faces among the detected face regions and stores the result in the object position DB 180.

In the hyper video information file producing apparatus 1000 of this embodiment, the representative frame is preferably a frame within a set range in the early part of each shot, measured from the shot's start frame, but not the first frame itself.

In the hyper video information file producing apparatus 1000 of this embodiment, the object information DB preferably manages the object list in a tree structure.

FIG. 3 is an overall flowchart of a hyper video production method according to an embodiment of the present invention.

As shown, the hyper video production method according to this embodiment starts at the start step (S10) of the production process and proceeds through a video file input step (S20), a scene grouping step (S30), a face region detection step (S40), a face grouping step (S50), an object detection step (S60), an object tracking step (S70), an object information input step (S80), a result generation step (S90), and an FLV (Flash Video) and XML (Extensible Markup Language) conversion output step (S92), before reaching the end step (S94).

FIG. 4 is a flowchart of the scene grouping step of the hyper video production method according to an embodiment of the present invention. The scene grouping step (S30) is described in detail with reference to this drawing.

Frames are extracted from the input video file (S310). For example, after the user creates a project for the target video file, frames can be extracted from the video file using the DirectX Sample Grabber (SampleGrabber). When frame extraction is done through DirectX in this way, the hyper video production apparatus can handle a codec of any format as long as a DirectX filter is available for it.

Shot boundaries are detected in the extracted frames (S320 to S340), a representative frame is extracted for each detected shot, and information about the shots and representative frames is stored in the video hierarchy DB (S350).

Here, the representative frame is preferably a frame within a set range in the early part of each shot, measured from the shot's start frame, but not the first frame. The first frame is avoided because, in a fade shot, the first frame of the shot may not be clean. Preferably, a frame that comes after the first five frames from the start of the shot, yet still lies in the early part of the shot (e.g., the sixth frame), is used.
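A minimal Python sketch of this selection rule is given below for illustration. The function name `pick_representative_frame` and the clamping to the last frame for very short shots are assumptions; the embodiment only states the preference for a frame about five frames after the shot start.

```python
def pick_representative_frame(shot_start, shot_end, offset=5):
    """Return the index of a representative frame for the shot [shot_start, shot_end].

    offset=5 skips the first five frames so that a fade-in does not spoil the
    chosen frame. Falling back to shot_end when the shot is shorter than the
    offset is an assumption made for this sketch.
    """
    return min(shot_start + offset, shot_end)
```

With a shot starting at frame 100, the sketch picks frame 105, which is the sixth frame of the shot.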

In this process, shot boundary detection (S320 to S340) measures the degree of change between each extracted frame and the previous frame (S320) and, if the degree of change exceeds a first threshold (S330), detects that frame as a shot boundary (S340).

The degree of change between frames can be regarded as the distance between frames. Representative known methods for computing the inter-frame distance quantitatively include pixel-based methods, block-based methods, and methods based on color information.

Preferably, this embodiment uses a method based on color information. A typical color-based method first builds a color histogram in one of several color spaces (RGB, HSV, YIQ, L*a*b*, L*u*v*, or gray), then computes the distance between the color histograms of adjacent frames and compares it with a threshold T.

Methods for measuring the degree of change between frames are described in T. Y. Liu, K. T. Lo, X. D. Zhang, and J. Feng, "A new cut detection algorithm with constant false-alarm ratio for video segmentation" (J. Vis. Commun. Image R., 15(2): 132-144, 2004) and R. A. Joyce and B. Liu, "Temporal Segmentation of Video Using Frame and Histogram Space" (IEEE Trans. Multimedia, vol. 8, no. 1, pp. 130-140, 2006).

For example, the color histogram H of a frame can be obtained as follows.

H(m) = (number of pixels with color m) / (total number of pixels), where m = 1...M and M is the number of color values representable in the RGB color space.

The distance between two frames a and b is then calculated as in Equation 1 below.

[Equation 1]

[Equation image not reproduced in this text. It defines D(a, b), the distance between frames a and b, in terms of the color histogram at frame a and the color histogram at frame b.]
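The histogram H(m) and an inter-frame distance can be sketched in Python as follows. Because the exact form of Equation 1 is not legible in this text, the sum of absolute bin differences is assumed here as the distance; the quantization into `bins_per_channel` levels per RGB channel and both function names are likewise assumptions made to keep the sketch small.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Return the normalized color histogram H of a frame.

    pixels is an iterable of (r, g, b) tuples with components in 0..255.
    Colors are quantized to bins_per_channel levels per channel (assumption),
    so the histogram has M = bins_per_channel ** 3 entries that sum to 1.
    """
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        index = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[index] += 1.0
        count += 1
    return [h / count for h in hist] if count else hist


def histogram_distance(hist_a, hist_b):
    """Assumed distance D(a, b): sum of absolute differences between the bins."""
    return sum(abs(x - y) for x, y in zip(hist_a, hist_b))
```

Under this assumed distance, two identical frames give 0 and two frames with no color in common give 2, so the value is easy to compare against a threshold.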

As a preferred example, the degree of change is the distance between the color histograms computed for each frame, and the first threshold T(i) is defined adaptively for the i-th frame by Equation 2 below.

[Equation 2]

T(i) = μ(i) + α·σ(i)

(where μ(i): mean of the color histogram distance values at the i-th frame,

σ(i): standard deviation of the color histogram distance values at the i-th frame,

α: weight (constant))

To compute the mean and standard deviation of the distance values, a window of fixed size can be set at the i-th frame and the mean and standard deviation computed within that window. If the weight is set to α = 3, shot boundaries can be extracted as outliers according to the '3σ rule' of probability and mathematical statistics.
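The adaptive thresholding can be illustrated with the following Python sketch, which assumes a threshold of the form mean plus α times the standard deviation, as the 3σ remark suggests. The function name `detect_shot_boundaries`, the window length, and the choice of a window made of the preceding frames only are assumptions, since the embodiment does not fix how the window is placed.

```python
from statistics import mean, pstdev


def detect_shot_boundaries(distances, window=10, alpha=3.0):
    """Return the frame indices whose distance to the previous frame is an outlier.

    distances[i] is the histogram distance between frame i and frame i - 1.
    The local mean and standard deviation are taken over up to `window`
    preceding values (assumed window placement).
    """
    boundaries = []
    for i, d in enumerate(distances):
        local = distances[max(0, i - window):i]
        if len(local) < 2:
            continue  # not enough history to estimate a spread
        threshold = mean(local) + alpha * pstdev(local)
        if d > threshold:
            boundaries.append(i)
    return boundaries
```

Because the threshold follows the local statistics, a scene with steady camera motion raises the threshold while a static scene lowers it, which is the point of defining T(i) per frame.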

In this regard, FIG. 9 is a reference diagram for explaining shot boundary detection in the hyper video production method according to an embodiment of the present invention, and illustrates shot boundaries being detected.

Next, similar shots are grouped into one scene, and video hierarchy information structuring the frames, shots, and scenes is generated and stored in the video hierarchy DB (S360 to S380). Scene grouping is used to make access easier for the user. If the video is long and several hundred or more shots are detected, examining the shots one by one without scenes takes a great deal of time. Once similar shots are grouped into scenes, the user can navigate hierarchically from scene to shot to frame.

In the above process, scene grouping (S360 to S380) preferably works as follows: a color histogram is obtained for the representative frame of each shot; for the representative frame of each shot, the mutual similarity with the representative frames of a predetermined number of neighboring shots in both the forward and backward directions is obtained (S360); and if the attraction ratio, derived from the attraction exerted by the neighboring shots in proportion to the mutual similarity, exceeds a set second threshold (S370), the shot containing that representative frame is added as the start of a new scene and stored in the video hierarchy DB (S380).

In more detail, to evaluate the similarity of shots quantitatively, the mutual similarity calculated as in Equation 3 below from the color histograms of the shots' representative frames is used.

[Equation 3]

[Equation image not reproduced in this text. It defines Cor(a, b), the mutual similarity between shots a and b, in terms of the following quantities: the color histogram of the representative frame of shot a; the color histogram of the representative frame of shot b; a weight w, whose defining expression is likewise not reproduced; d, the minimum distance between shots a and b (the distance from the end frame of the earlier shot to the start frame of the later shot); and C, a constant that specifies the shot length.]

FIG. 10 is a reference diagram for explaining the attraction relationship in a sequence of consecutive shots in the hyper video production method according to an embodiment of the present invention. As shown, a shot of interest in a sequence of consecutive shots can be assumed to receive, from three adjacent shots, an attraction proportional to the mutual similarity. The attraction ratio at the shot of interest i is then calculated as in Equation 4 below.

[Equation 4]

R(i) = (right(i) + right(i + 1)) / (left(i) + left(i + 1))

(where left(i) = max{Cor(i, i - 1), Cor(i, i - 2), Cor(i, i - 3)},

left(i + 1) = max{Cor(i + 1, i - 1), Cor(i + 1, i - 2)},

right(i) = max{Cor(i, i + 1), Cor(i, i + 2), Cor(i, i + 3)},

right(i + 1) = max{Cor(i + 1, i + 2), Cor(i + 1, i + 3), Cor(i + 1, i + 4)})

Once the attraction ratio R(i) at the shot of interest i has been calculated in this way, shot i is set as the start of a new scene if it satisfies the condition 'R(i) > T AND R(i) > R(i - 1) AND R(i) > R(i + 1)' for a predefined threshold T. If the condition is not satisfied, shot i is added to the previous scene.
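The attraction ratio of Equation 4 and the decision rule above can be sketched in Python as follows. The mutual similarity is passed in as a function `cor(a, b)` because its defining equation is not available in this text. The function names, the handling of neighbors that fall outside the shot sequence, and the convention that the first shot always opens a scene are assumptions made for this sketch.

```python
def attraction_ratio(cor, i, n):
    """Compute R(i) of Equation 4 for shot i in a sequence of n shots.

    cor(a, b) returns the mutual similarity of shots a and b. Neighbors
    outside 0..n-1 are skipped (assumed edge handling).
    """
    def best(a, others):
        if not 0 <= a < n:
            return 0.0
        values = [cor(a, b) for b in others if 0 <= b < n]
        return max(values) if values else 0.0

    left = best(i, [i - 1, i - 2, i - 3]) + best(i + 1, [i - 1, i - 2])
    right = best(i, [i + 1, i + 2, i + 3]) + best(i + 1, [i + 2, i + 3, i + 4])
    return right / left if left > 0 else float("inf")


def scene_starts(cor, n, threshold):
    """Return the indices of the shots that open a new scene."""
    ratios = [attraction_ratio(cor, i, n) for i in range(n)]
    starts = [0]  # the first shot always opens a scene (assumption)
    for i in range(1, n - 1):
        # Decision rule: R(i) > T and R(i) is a local maximum.
        if ratios[i] > threshold and ratios[i] > ratios[i - 1] and ratios[i] > ratios[i + 1]:
            starts.append(i)
    return starts
```

A shot that resembles the shots after it far more than the shots before it is pulled to the right, so its ratio peaks exactly where the content changes.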

FIG. 5 is a flowchart of the face detection step of the hyper video production method according to an embodiment of the present invention, and FIG. 6 is a flowchart of the face grouping step of the same method. The face region detection step (S40) and the face grouping step (S50) are described in detail with reference to these drawings.

In these steps, face regions contained in each detected representative frame are detected (S40), similar faces among the detected face regions are grouped and stored in the object position DB (S50), and the object tracking step (S70) described later is performed with each grouped face region as a tracking target. Whereas general object detection requires the user to designate the target object, face detection runs automatically on the representative frames during scene grouping, which further raises the degree of automation and the efficiency of the work.

First, in the face region detection step (S40), face regions contained in each detected representative frame are detected (S410 to S430).

If a face region is detected (S440), it is added to the unknown face list (S450).

If a face similar to the detected face region is determined to exist in the object position DB (S460), the face region is additionally registered in the corresponding face group of the object position DB (S470).

Known face detection methods include knowledge-based methods, feature-based methods, template-matching methods, and appearance-based methods.

Preferably, this embodiment uses an appearance-based method. An appearance-based method collects face regions and non-face regions from various images, trains a model on the collected regions, and detects faces by comparing an input image against the trained model. It is known to perform relatively well for frontal and profile face detection.

Such face detection is described in Jianxin Wu, S. Charles Brubaker, Matthew D. Mullin, and James M. Rehg, "Fast Asymmetric Learning for Cascade Face Detection" (IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 30, No. 3, March 2008) and Paul Viola and Michael Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features" (Accepted Conference on Computer Vision and Pattern Recognition 2001).

In particular, as a preferred example, face region detection consists of two steps. In the first step, a YCbCr color model is built from the RGB color information of the representative frame, the color information and the brightness information are separated in that model, and face candidate regions are detected from the brightness information. In the second step, a rectangular feature point model is defined for the detected face candidate regions, and face regions are detected based on training data obtained by training the rectangular feature point model with the AdaBoost learning algorithm. The AdaBoost learning algorithm is known for producing a strong classifier with high detection performance through a linear combination of weak classifiers.
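Two building blocks of this step can be sketched in Python for illustration: separating brightness (Y) from color (Cb, Cr), and combining weak classifiers into a strong one. The conversion below uses the full-range ITU-R BT.601 coefficients common in JPEG, which is an assumption because the embodiment does not state which YCbCr variant is used. The function names and the half-of-total-weight decision rule (the usual form in boosted cascade detectors) are also assumptions.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to (Y, Cb, Cr), assuming full-range BT.601 coefficients.

    Y carries the brightness information; Cb and Cr carry the color information.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def strong_classify(weak_classifiers, sample):
    """Evaluate a boosted strong classifier on one sample.

    weak_classifiers is a list of (alpha, h) pairs, where h(sample) returns
    0 or 1 and alpha is the weight learned for h. The sample is accepted when
    the weighted vote reaches half of the total weight (assumed rule).
    """
    total = sum(alpha for alpha, _ in weak_classifiers)
    score = sum(alpha * h(sample) for alpha, h in weak_classifiers)
    return score >= 0.5 * total
```

In a real detector each weak classifier would threshold the response of one rectangular feature; here a weak classifier is any function returning 0 or 1.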

In this regard, FIG. 11 is a photograph illustrating the concept of rectangular feature points for face region detection in the hyper video production method according to an embodiment of the present invention, showing rectangular feature points being used for face region detection.

Next, in the face grouping step (S50), once face region detection has been completed for all representative frames through the above process, similar faces among the unknown faces stored in the unknown face list are grouped (S510) and stored in the object position DB (S520 to S530).

Unlike general objects, faces go through an additional grouping step. Grouping faces and automatically setting each group as a face object makes per-person object editing more efficient. Moreover, if the features of previously edited persons are exported to the object position DB and referred to in later editing, the same person does not have to be edited again, which further improves editing efficiency.

Known face grouping methods include geometric matching, which recognizes a face from the positions and sizes of geometric facial features such as the eyes, nose, and mouth, or from the distances between them; template pattern matching, which recognizes a face by comparing face data with template images stored in a database and analyzing their correlation; methods using an artificial neural network (ANN); the support vector machine (SVM) method; and the hidden Markov model (HMM) method.

Template pattern matching methods in particular include principal component analysis (PCA), independent component analysis (ICA), linear discriminant analysis (LDA), 2DPCA (2-dimensional PCA), PCA/LDA, the component-based (DCT/LDA) method, and local feature analysis (LFA).

In particular, as a preferred example, face grouping is performed by extracting a Gabor representation of the face image through a Gabor wavelet transform, applying a nonlinear mapping, performing linear discriminant analysis in kernel space, and grouping the faces with a sequential grouping algorithm.

Gabor wavelets can effectively represent local yet discriminative features and are therefore known to be useful in pattern detection and face recognition. Linear discriminant analysis in kernel space (KLDA, GDA) improves recognition ability by carrying out linear discriminant analysis (LDA) in kernel space.

In the sequential grouping algorithm, faces are grouped using the feature vectors reduced to L dimensions by linear discriminant analysis in kernel space. Grouping proceeds by calculating the similarity between a new face and every face that has already been grouped.
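A sequential grouping pass of this kind might look like the following Python sketch. The cosine similarity, the fixed `threshold`, and the rule of joining the group containing the most similar face are assumptions; the embodiment only states that a new face is compared with every face already grouped.

```python
import math


def cosine(u, v):
    """Cosine similarity of two feature vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def sequential_grouping(features, threshold=0.9):
    """Group faces one by one; return a list of groups of face indices.

    features[i] is the reduced (L-dimensional) feature vector of face i.
    """
    groups = []
    for index, feature in enumerate(features):
        best_group, best_similarity = None, threshold
        for group in groups:
            # Compare the new face with every face already in this group.
            similarity = max(cosine(feature, features[j]) for j in group)
            if similarity >= best_similarity:
                best_group, best_similarity = group, similarity
        if best_group is None:
            groups.append([index])  # no group is similar enough: open a new one
        else:
            best_group.append(index)
    return groups
```

Because faces are processed in order and never reassigned, the result depends on the input order, which is the usual trade-off of a single-pass grouping scheme.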

Such face grouping is described in G. Baudat and F. Anouar, "Generalized discriminant analysis using a kernel approach" (Neural Comput., vol. 12, no. 10, pp. 2385-2404, 2000).

FIG. 7 is a flowchart of the object detection step of the hyper video production method according to an embodiment of the present invention. The object detection step (S60) proceeds as follows.

An object region of interest in an arbitrary frame is selected by the user, and the object region is detected in the representative frames and stored in the object position DB (S610 to S650).

In this object input process, the user selects an object in an arbitrary frame. For example, the user can move to any frame, not just a representative frame, in the 'media player and work window' and select the object region in the work area with the rectangle selection tool. The selected object is added to the object list as a new object. It can also be merged into an already added object by drag and drop.

When an object has been selected in an arbitrary frame in this way, its position in the subsequent consecutive frames within the shot containing that frame can be tracked by clicking 'Object tracking' in the tool window.

Object selection in the above process can use any generally known method. For example, the user may move the pointer to the object of interest with the mouse and then draw a selection shape such as a polygon or a circle around it.

As a preferred example, object region detection consists of detecting SIFT (Scale Invariant Feature Transform) features, performing an initial matching using the diffusion distance, and performing a final matching using SIFT descriptors.

SIFT feature detection consists of basic computation steps such as scale-space extrema detection, keypoint localization, orientation assignment, and keypoint descriptor computation.

Such object region detection is described in D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints" (IJCV, 60(2), pp. 91-110, 2004), Haibin Ling and Kazunori Okada, "Diffusion Distance for Histogram Comparison" (2006), and V. Ferrari, T. Tuytelaars, and L. van Gool, "Simultaneous Object Recognition and Segmentation by Image Exploration" (ECCV, 2004).

FIG. 8 is a flowchart of the object tracking step of the hyper video production method according to an embodiment of the present invention.

In the object tracking step (S70), starting from the representative positions of the objects detected in the representative frames, their positions in adjacent frames are tracked and additionally stored in the object position DB.

In more detail, taking one representative frame as the reference, the next frame is input (S710 to S730).

If the next frame is determined not to be outside the range of the shot containing the representative frame (S740), the object position is tracked within that frame (S750).

If the next frame is determined to be outside the range of the shot containing the representative frame (S740), the next representative frame is received and the process returns to step S710.

If tracking the object position (S750) finds the object (S760), the tracked position of that object is additionally stored in the object position DB (S770).
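The flow of steps S710 to S770 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function `track_from_representatives`, the `track` callback, and the box format are hypothetical names introduced here, tracking runs forward only, and stopping within a shot once the object is lost is an assumption, since the text does not state what happens on that branch.

```python
from typing import Callable, Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); format is an assumption


def track_from_representatives(
    shots: List[Tuple[int, int]],                 # (first_frame, last_frame), inclusive
    rep_positions: Dict[int, Box],                # representative frame -> detected box
    track: Callable[[int, Box], Optional[Box]],   # hypothetical per-frame tracker
) -> Dict[int, Box]:
    """Propagate object positions from representative frames through their shots."""
    positions = dict(rep_positions)  # stands in for the object position DB
    for rep, box in sorted(rep_positions.items()):
        shot = next((s for s in shots if s[0] <= rep <= s[1]), None)
        if shot is None:
            continue
        frame, current = rep + 1, box             # S710-S730: take the next frame
        while frame <= shot[1]:                   # S740: still inside the shot?
            found = track(frame, current)         # S750: track the object
            if found is None:                     # S760: object not found (assumed: stop)
                break
            positions[frame] = found              # S770: store the tracked position
            current = found
            frame += 1
    return positions
```

Leaving the `while` loop corresponds to moving on to the next representative frame, which is the branch taken when the next frame falls outside the current shot.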

Known object tracking methods include low-level analysis, feature analysis, active shape models, linear subspace methods, neural networks, and statistical methods.

In particular, as a preferred example, the object tracking consists of: selecting a representative position region of the object in the representative frame; building a color model of the object based on the selected position region and the representative frame; tracking the object's movement by the Mean Shift method when the next frame is input for tracking; and determining the position of the search window as the position of the object and estimating the change in the object's size. The Mean Shift method models an object by its color and ascends the gradient of the color probability distribution to find its mode.
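The window-shifting part of Mean Shift can be illustrated with a short sketch. It assumes a per-pixel weight map has already been computed from the object's color model (for example by histogram back-projection, which is not shown), uses a flat kernel, and omits size estimation. The function name `mean_shift` and the window format are illustrative choices, not taken from the patent.

```python
import math
from typing import List, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height); format is an assumption


def mean_shift(weights: List[List[float]], window: Window, max_iter: int = 20) -> Window:
    """Move the search window to the weighted centroid until it stops moving.

    weights[row][col] is the likelihood that the pixel belongs to the object.
    The window is assumed to fit inside the weight map.
    """
    rows, cols = len(weights), len(weights[0])
    x, y, w, h = window
    for _ in range(max_iter):
        total = sum_x = sum_y = 0.0
        for r in range(max(y, 0), min(y + h, rows)):
            for c in range(max(x, 0), min(x + w, cols)):
                total += weights[r][c]
                sum_x += weights[r][c] * c
                sum_y += weights[r][c] * r
        if total == 0.0:
            break  # no object evidence under the window
        # Re-center the window on the centroid (round half up, then clamp).
        new_x = math.floor(sum_x / total - (w - 1) / 2 + 0.5)
        new_y = math.floor(sum_y / total - (h - 1) / 2 + 0.5)
        new_x = min(max(new_x, 0), cols - w)
        new_y = min(max(new_y, 0), rows - h)
        if (new_x, new_y) == (x, y):
            break  # converged on a local mode
        x, y = new_x, new_y
    return (x, y, w, h)


# Toy weight map: a 3x3 blob of object-colored pixels at rows 5-7, columns 5-7.
weights = [[0.0] * 10 for _ in range(10)]
for r in range(5, 8):
    for c in range(5, 8):
        weights[r][c] = 1.0
print(mean_shift(weights, (3, 3, 3, 3)))
```

Each iteration moves the window toward denser object evidence, which is the gradient ascent toward the mode described above.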

Next, after object tracking has been performed through the above process, the object information input step (S80) stores the object information input by the user in the object information DB. The object information may include, for example, images, text, audio, video, and web addresses (URLs) related to the object.

The object information DB manages the object list in a tree structure. For example, like the relationship between directories and files in a file system, virtual object groups (corresponding to directories) are arranged in a tree, and objects (corresponding to files) are stored in them. Managing a large number of objects in a tree structure in this way improves the efficiency of object management. Such a tree structure may be illustrated in the form of Table 1 below.

All objects
    Face
        Actor
            Hong Gil-dong
            Hwang Bi-hong
        Singer
            Hwang Jin-yi
    Clothes
        Pants
            Jeans
            ...
        ...
    ...
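A tree of object groups of this kind could be represented, for illustration, as nested groups holding named objects. The class `ObjectGroup` and its methods are hypothetical and only mirror the directory/file analogy in the text; the patent does not specify the data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class ObjectGroup:
    """A virtual object group (like a directory) holding objects (like files)."""
    name: str
    groups: Dict[str, "ObjectGroup"] = field(default_factory=dict)
    objects: List[str] = field(default_factory=list)

    def add(self, path: str, obj: str) -> None:
        """Store obj under a slash-separated group path, creating groups as needed."""
        node = self
        for part in path.split("/"):
            node = node.groups.setdefault(part, ObjectGroup(part))
        node.objects.append(obj)

    def walk(self, prefix: str = "") -> Iterator[str]:
        """Yield the full path of every object in the tree."""
        here = f"{prefix}/{self.name}" if prefix else self.name
        for obj in self.objects:
            yield f"{here}/{obj}"
        for group in self.groups.values():
            yield from group.walk(here)


root = ObjectGroup("All objects")
root.add("Face/Actor", "Hong Gil-dong")
root.add("Face/Singer", "Hwang Jin-yi")
root.add("Clothes/Pants", "Jeans")
for path in root.walk():
    print(path)
```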

Meanwhile, in the object information input step (S80), manual editing is preferably supported: the information stored in the object position DB or the object information DB is presented to the user, and when correction information for the presented information is received, the correction is saved to the object position DB or the object information DB.

For example, the results of automatic object tracking or automatic face region detection are displayed as object positions in the frame. If an object position is inaccurate, the user can, for example, move to the relevant frame in the 'media player and work window' and use the mouse to change or delete the object position in that frame.

The object position is displayed, for example, as a rotated rectangle in the frame, with a total of nine handles (handlers): four at the corners, four on the sides, and one for rotation. The position can be corrected by dragging these handles with the mouse, or by clicking inside the object position and moving it. The object position can also be deleted from the frame by pressing the delete button.
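As an illustration of the nine handles, the sketch below computes their frame coordinates for a rotated rectangle. The parameterization (center, width, height, angle) and the distance of the rotation handle from the top edge (`rot_offset`) are assumptions made for this example; the patent does not define them.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def handle_points(cx: float, cy: float, w: float, h: float,
                  angle_deg: float, rot_offset: float = 20.0) -> List[Point]:
    """Return 4 corner handles, 4 side handles, and 1 rotation handle, in that order."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)

    def to_frame(dx: float, dy: float) -> Point:
        # Rotate the offset about the rectangle center, then translate.
        return (cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)

    hw, hh = w / 2, h / 2
    corners = [to_frame(sx * hw, sy * hh)
               for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1))]
    sides = [to_frame(dx, dy)
             for dx, dy in ((0, -hh), (hw, 0), (0, hh), (-hw, 0))]
    rotation = [to_frame(0, -hh - rot_offset)]  # placed beyond the top edge
    return corners + sides + rotation
```

A UI built on this would hit-test mouse clicks against these points and update the center, size, or angle depending on which handle is dragged.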

Meanwhile, if face grouping has placed different faces in the same group, the user can select the face in the object list and drag and drop the face from the representative frame list displayed on one side of the screen onto the correct face object, or delete it with the delete button.

Also preferably, even when object detection or tracking has not yet been completed for all frames, the correction information can be received for frames in scenes for which scene grouping, object detection, and object tracking have already been performed.

In more detail, automatic processes such as scene segmentation, object detection, and object tracking take a long time. With this in mind, the production method of this embodiment runs these automatic processes in the background, so that the user can carry out other editing work while they are in progress.
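One way to picture this background arrangement is a worker thread that pushes finished results onto a queue while the editing side consumes whatever is already available. This is only a sketch under that assumption; `run_in_background` and the `None` end marker are hypothetical, and the patent does not say how background processing is implemented.

```python
import queue
import threading
from typing import Callable, Iterable


def run_in_background(frames: Iterable[int],
                      process: Callable[[int], object],
                      results: "queue.Queue") -> threading.Thread:
    """Run a slow per-frame process on a worker thread, publishing results as they finish."""
    def worker() -> None:
        for frame in frames:
            results.put((frame, process(frame)))
        results.put(None)  # end marker (an assumption of this sketch)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


# The editing side can take finished results without waiting for the whole video.
results: "queue.Queue" = queue.Queue()
worker_thread = run_in_background(range(3), lambda f: f * f, results)
worker_thread.join()
done = []
while (item := results.get()) is not None:
    done.append(item)
print(done)
```

In an interactive tool the consumer would poll with `get_nowait()` from the UI loop instead of joining the thread, so frames in finished scenes become editable as soon as their results arrive.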

Meanwhile, preferably, presenting the information stored in the object position DB or the object information DB to the user and receiving correction information may be done through hierarchical scene-shot-frame access, following the structured video hierarchy stored in the video hierarchy DB.
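The scene-shot-frame hierarchy can be pictured as nested records through which an editor drills down from a scene to a shot to a frame. The classes `Scene` and `Shot` and the helper `locate` are hypothetical names for this sketch; the frame numbers in the usage lines are arbitrary.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Shot:
    first_frame: int
    last_frame: int            # inclusive
    representative_frame: int


@dataclass
class Scene:
    shots: List[Shot] = field(default_factory=list)


def locate(scenes: List[Scene], frame: int) -> Optional[Tuple[int, int]]:
    """Return (scene index, shot index) of the shot containing the frame, if any."""
    for scene_idx, scene in enumerate(scenes):
        for shot_idx, shot in enumerate(scene.shots):
            if shot.first_frame <= frame <= shot.last_frame:
                return scene_idx, shot_idx
    return None


scenes = [
    Scene([Shot(0, 90, 3), Shot(91, 200, 94)]),
    Scene([Shot(201, 300, 204)]),
]
print(locate(scenes, 93))
```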

Next, in the result generation step (S90), a hyper video information file is generated by mapping the contents of the object position DB to those of the object information DB. The hyper video information file is generated as metadata in XML format.
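The mapping of the two databases into XML metadata might look like the following sketch. The element and attribute names (`hypervideo`, `object`, `position`, and so on) and the dictionary shapes are invented for illustration, since the patent does not disclose the XML schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); format is an assumption


def build_info_file(positions: Dict[int, Dict[int, Box]],
                    info: Dict[int, Dict[str, str]]) -> str:
    """Join per-object positions with per-object information into one XML string.

    positions: object id -> {frame number -> box}   (stands in for the object position DB)
    info:      object id -> {"name": ..., "url": ...} (stands in for the object information DB)
    """
    root = ET.Element("hypervideo")
    for obj_id, meta in info.items():
        obj = ET.SubElement(root, "object", id=str(obj_id),
                            name=meta["name"], url=meta.get("url", ""))
        for frame, (x, y, w, h) in sorted(positions.get(obj_id, {}).items()):
            ET.SubElement(obj, "position", frame=str(frame),
                          x=str(x), y=str(y), w=str(w), h=str(h))
    return ET.tostring(root, encoding="unicode")


xml_text = build_info_file(
    {1: {404: (10, 20, 30, 40), 420: (12, 20, 30, 40)}},
    {1: {"name": "Jeans", "url": "http://example.com"}},
)
print(xml_text)
```

A player reading such a file can look up the object under the pointer at the current frame and follow its link.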

Then, when the user exports the result, the video file is converted to FLV format and the XML metadata containing the contents of the object position DB and the object information DB is output.

The video file is converted to FLV format so that it can be displayed by a terminal player built with Flash (file extension *.swf). The terminal player is provided as an swf file so that playback is possible in a web browser without installing any additional ActiveX control.

FIGS. 12A to 22C are example screens for the respective steps described above.

Each of the following figures relates to the hyper video production method according to an embodiment of the present invention.
FIGS. 12A and 12B are example frame screens immediately before and after a shot boundary change (frames #91 and #93).
FIGS. 13A and 13B are example frame screens immediately before and after a scene change (frames #5350 and #5352).
FIG. 14 is an example of faces grouped as similar faces.
FIG. 15 is an example screen in which the user selects an object region.
FIG. 16 is an example screen in which the object selected by the user has been detected in each representative frame.
FIGS. 17A to 17C are example screens in which the object detected in a representative frame is tracked in adjacent frames (#404, #420, and #479).
FIGS. 18A and 18B are example basic screens of the user UI.
FIG. 19 is an example object information editing screen of the user UI.
FIG. 20 is an example object DB export window of the user UI.
FIG. 21 is an example media output window of the user UI.
FIGS. 22A to 22C are example screens of the user UI showing object information display and hyperlinks.

Embodiments of the present invention include a computer-readable recording medium containing program instructions for performing various computer-implemented operations. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The recording medium may be specially designed and constructed for the present invention, or may be of a kind known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. The recording medium may also be a transmission medium, such as an optical or metal wire or a waveguide, that includes a carrier wave transmitting a signal specifying program instructions, data structures, and the like. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

1000: hyper video information file production apparatus
110: scene grouping module 120: face region detection module
130: face grouping module 140: object detection module
150: object tracking module 160: user UI module
170: video hierarchy DB 180: object position DB
190: object information DB

Claims

A method of producing a hyper video information file, executed in an information file production apparatus, for a hyper video in which additional information about a person or object included in a video file is produced as an information file and linked by a hyperlink, the method comprising:
(a) extracting frames from an input video file;
(b) detecting shot boundaries in the extracted frames, extracting a representative frame for each detected shot, and storing information about the shots and the representative frames in a video hierarchy DB, wherein the representative frame is a frame within a set range corresponding to the beginning of the shot, measured from the start frame of each shot;
(c) grouping similar shots among the shots into one scene, generating video hierarchy information in which the frames, shots, and scenes are structured, and storing it in the video hierarchy DB, wherein the similarity of the shots is determined, for the representative frame of each shot, based on the mutual similarity obtained with the representative frames of a predetermined number of neighboring shots adjacent in the forward and backward directions;
(d) receiving from a user a selection of an object region of interest in an arbitrary frame, detecting the object region in the representative frames, and storing it in an object position DB;
(e) tracking, from the positions of the object region detected in each representative frame, the position in adjacent frames, and additionally storing it in the object position DB;
(f) storing object information input by the user in an object information DB; and
(g) generating a hyper video information file by mapping the contents of the object position DB to those of the object information DB.

The method of claim 1,
wherein the shot boundary detection of step (b) measures the degree of change between each extracted frame and the previous frame, and detects the frame as a shot boundary when the degree of change exceeds a first threshold value.

The method of claim 2,
wherein the degree of change is a distance value between the color histograms calculated for each frame, and
the first threshold value T(i) is defined for the i-th frame by Equation 2.
[Equation 2]

(μ(i): mean of the color histogram distance values at the i-th frame,
σ(i): standard deviation of the color histogram distance values at the i-th frame,
α: weight (constant))

The method of claim 1,
wherein the scene grouping of step (c):
obtains a color histogram for each frame of each shot;
obtains, for the representative frame of each shot, the mutual similarity with the representative frames of a predetermined number of neighboring shots adjacent in the forward and backward directions; and
when the attraction ratio received from the neighboring shots in proportion to the mutual similarity exceeds a second threshold value, registers the shot containing the representative frame as the beginning of a new scene and stores it in the video hierarchy DB.

The method of claim 1,
wherein the object region detection of step (d) comprises detecting Scale Invariant Feature Transform (SIFT) features, performing initial matching using the diffusion distance, and performing final matching using SIFT descriptors.

The method of claim 1,
wherein step (e) comprises:
(e1) receiving the next frame with one representative frame as the reference;
(e2) if the next frame is determined in step (e1) to be within the range of the shot containing the representative frame, tracking the object position within that frame;
(e3) if the next frame is determined in step (e1) to be outside the range of the shot containing the representative frame, receiving the next representative frame and returning to step (e1); and
(e4) if an object position was tracked in step (e2), additionally storing the tracked position of the object in the object position DB.

The method of claim 1,
wherein the object position tracking of step (e) comprises:
receiving a selection of a position region of the object in the representative frame;
building a color model of the object based on the selected position region and the representative frame;
tracking the object's movement by the Mean Shift method when the next frame is input for tracking; and
determining the position of the search window as the position of the object and estimating the change in the object's size.

The method of claim 1,
further comprising, after step (c):
(h) detecting the face regions included in the representative frames extracted in step (b), grouping the detected face regions by a face grouping method, and storing them in the object position DB; and proceeding to step (e) with each grouped face region as an object to be tracked.

(Deleted)

The method of claim 8,
wherein the face region detection comprises:
creating a YCbCr color model from the RGB color information of the representative frame, separating color information and brightness information in the created color model, and detecting a face candidate region based on the brightness information; and
defining a rectangular feature point model for the detected face candidate region, and detecting the face region based on training data obtained by training on the rectangular feature point model with the AdaBoost learning algorithm.

The method of claim 8,
wherein the face grouping extracts a Gabor representation of the face image by Gabor wavelet transform, performs a nonlinear mapping, performs linear discriminant analysis in the kernel space, and groups the faces by a sequential grouping algorithm.

The method of claim 1,
further comprising: (i) presenting the information stored in the object position DB or the object information DB to the user and, when correction information for the presented information is received, saving the correction to the object position DB or the object information DB.

The method of claim 12,
wherein step (i) can be performed for frames in scenes for which scene grouping, object region detection, and object position tracking have been performed, even when object region detection or object position tracking has not been completed for all frames.

The method of claim 12,
wherein presenting the information stored in the object position DB or the object information DB to the user and receiving correction information are performed through hierarchical scene-shot-frame access according to the structured video hierarchy stored in the video hierarchy DB.

The method of claim 1,
wherein the representative frame is a frame, other than the first frame, within a set range corresponding to the beginning of the shot, measured from the start frame of each shot.

The method of claim 1,
wherein the object information DB manages the object list in a tree structure.

The method of claim 1,
wherein the hyper video information file is generated as metadata in XML format.

A method of producing a hyper video information file, executed in an information file production apparatus, for a hyper video in which additional information about a person or object included in a video file is produced as an information file and linked by a hyperlink, the method comprising:
detecting shot boundaries in frames extracted from an input video file, extracting a representative frame for each detected shot, and storing information about the shots and the representative frames in a video hierarchy DB, wherein the representative frame is a frame within a set range corresponding to the beginning of the shot, measured from the start frame of each shot;
grouping similar shots among the shots into one scene, generating video hierarchy information in which the frames, shots, and scenes are structured, and storing it in the video hierarchy DB, wherein the similarity of the shots is determined, for the representative frame of each shot, based on the mutual similarity obtained with the representative frames of a predetermined number of neighboring shots adjacent in the forward and backward directions;
detecting the face regions included in each representative frame, grouping the detected face regions by a face grouping method and storing them in an object position DB, detecting in the representative frames an object region selected by a user, and tracking the positions of the object and the grouped faces in adjacent frames and storing them in the object position DB; and
storing object information input by the user in an object information DB, and generating a hyper video information file by mapping the contents of the object position DB to those of the object information DB.

A computer-readable recording medium having recorded thereon a program for executing each step of the method according to any one of claims 1 to 8 and 10 to 18.

An apparatus for producing a hyper video information file for composing a hyper video in which additional information about a person or object included in a video file is linked by a hyperlink, the apparatus comprising:
a scene grouping module that performs a function of extracting frames from an input video file, detecting shot boundaries in the extracted frames, extracting a representative frame for each detected shot, and storing information about the shots and the representative frames in a video hierarchy DB, wherein the representative frame is a frame within a set range corresponding to the beginning of the shot, measured from the start frame of each shot, and a function of grouping similar shots among the shots into one scene, generating video hierarchy information in which the frames, shots, and scenes are structured, and storing it in the video hierarchy DB, wherein the similarity of the shots is determined, for the representative frame of each shot, based on the mutual similarity obtained with the representative frames of a predetermined number of neighboring shots adjacent in the forward and backward directions;
an object detection module that detects an object region in the representative frames and stores it in an object position DB;
an object tracking module that tracks, from the positions of the object region detected in each representative frame, the position in adjacent frames, and additionally stores it in the object position DB;
a user UI module that receives from a user a selection of an object region of interest in an arbitrary frame and stores object information input by the user in an object information DB; and
a control module that generates a hyper video information file by mapping the contents of the object position DB to those of the object information DB.

The apparatus of claim 20, further comprising:
a face region detection module that detects the face regions included in each representative frame extracted by the scene grouping module; and
a face grouping module that groups the detected face regions by a face grouping method and stores them in the object position DB.

The apparatus of claim 20,
wherein the user UI module
presents the information stored in the object position DB or the object information DB to the user, and receives correction information for the presented information.

The apparatus of claim 22,
wherein the user UI module can present the information and receive the correction information for frames in scenes for which scene grouping, object region detection, and object position tracking have been performed, even when object region detection or object position tracking has not been completed for all frames.

The apparatus of claim 22,
wherein presenting the information stored in the object position DB or the object information DB to the user and receiving correction information are performed through hierarchical scene-shot-frame access according to the structured video hierarchy stored in the video hierarchy DB.

The apparatus of claim 20,
wherein the representative frame is a frame, other than the first frame, within a set range corresponding to the beginning of the shot, measured from the start frame of each shot.

The apparatus of claim 20,
wherein the object information DB manages the object list in a tree structure.

An apparatus for producing a hyper video information file for composing a hyper video in which additional information about a person or object included in a video file is linked by a hyperlink, the apparatus being configured to:
detect shot boundaries in frames extracted from an input video file, extract a representative frame for each detected shot, and store information about the shots and the representative frames in a video hierarchy DB, wherein the representative frame is a frame within a set range corresponding to the beginning of the shot, measured from the start frame of each shot;
group similar shots among the shots into one scene, generate video hierarchy information in which the frames, shots, and scenes are structured, and store it in the video hierarchy DB, wherein the similarity of the shots is determined, for the representative frame of each shot, based on the mutual similarity obtained with the representative frames of a predetermined number of neighboring shots adjacent in the forward and backward directions;
detect the face regions included in each representative frame, group the detected face regions by a face grouping method and store them in an object position DB, detect in the representative frames an object region selected by a user, and track the positions of the object and the grouped faces in adjacent frames and store them in the object position DB; and
store object information input by the user in an object information DB, and generate a hyper video information file by mapping the contents of the object position DB to those of the object information DB.