KR20190079993A

KR20190079993A - Method for authoring stereoscopic contents and application thereof

Info

Publication number: KR20190079993A
Application number: KR1020170182142A
Authority: KR
Inventors: 박승민; 박준서; 곽남훈
Original assignee: 박승민; 곽남훈; 박준서
Priority date: 2017-12-28
Filing date: 2017-12-28
Publication date: 2019-07-08
Also published as: KR102058228B1

Abstract

Disclosed are a method for authoring stereophonic sound content and an application therefor. According to one embodiment of the present invention, the method for authoring stereophonic sound content comprises the steps of: providing a visual interface for applying a binaural effect to audio synchronized with a video played on a mobile terminal; and generating metadata synchronized with the video based on a touch input to the visual interface, wherein feedback audio to which the binaural effect is applied is output in immediate response to the touch input.

Description

METHOD FOR AUTHORING STEREOSCOPIC CONTENTS AND APPLICATION THEREOF

The present invention relates to stereophonic sound content authoring technology, and more particularly to a technique for applying a binaural effect to the audio of a video played on a mobile terminal.

In general, applying a three-dimensional sound effect to a video relies on a multi-channel method that feeds in sound to be output through multiple speakers. Systems that realize a stereophonic effect with a two-channel method using two speakers or headphones are also being developed, but these are mainly hardware-based techniques.

Software techniques for implementing stereophonic effects have recently been introduced, but they merely place the sound image at a two-dimensional position relative to the listener; techniques that implement a stereophonic effect using a three-dimensional position are hard to find.

Korean Registered Patent No. 10-1599554, registered February 25, 2016 (Title: 3D binaural filtering system and method using SAC side information)

Korean Registered Patent No. 10-0971700, registered July 15, 2010 (Title: Spatial cue-based binaural stereo synthesis apparatus and method, and binaural stereo decoding apparatus using the same)

An object of the present invention is to author, on a mobile terminal, stereophonic sound content including audio to which a binaural effect is applied, through an application that includes an easy and intuitive interface.

Another object of the present invention is to perform binaural rendering for generating stereophonic sound content using metadata on touch input to a mobile terminal.

Another object of the present invention is to provide audio feedback in real time so that the user can operate the interface more easily in step with the movement of an object.

Another object of the present invention is to provide a sense of direction, distance, and space when audio is heard through an external audio device connected to the mobile terminal.

To achieve the above objects, a method for authoring stereophonic sound content according to the present invention includes: providing a visual interface for applying a binaural effect to audio synchronized with a video played on a mobile terminal; and generating metadata synchronized with the video based on a touch input to the visual interface, wherein the generating of the metadata outputs feedback audio to which the binaural effect is applied in immediate response to the touch input.

Here, the method for authoring stereophonic sound content may further include rendering, based on the metadata, stereophonic sound content including audio to which the binaural effect is applied.

Here, the rendering of the stereophonic sound content may generate a container including the video, the audio to which the binaural effect is applied, and additional data, and the video and the audio to which the binaural effect is applied may be synchronized.

Here, the metadata may correspond to the three-dimensional position of the sound image corresponding to the binaural effect.

Here, the visual interface may include a first interface for specifying the position of the sound image on a two-dimensional plane, and a second interface for specifying the position of the sound image on a straight line perpendicular to the two-dimensional plane, and the three-dimensional position may be generated by combining a first input of the user to the first interface and a second input of the user to the second interface.

Here, the visual interface may display the sound image corresponding to the metadata in real time by overlaying it on the video.

Here, the sound image corresponding to the metadata may be represented as a position on a hemisphere set with reference to the position of the user's head.

Here, the visual interface may include an object tracking interface that highlights, among the objects in the video, the object corresponding to the sound image.

Here, the object tracking interface may determine the object corresponding to the sound image, among the objects in the video, based on correlation with the audio.

Here, the audio to which the binaural effect is applied may be two-channel audio corresponding to a left channel and a right channel.

In addition, a stereophonic sound content authoring application stored in a computer-readable recording medium according to an embodiment of the present invention executes the steps of: providing a visual interface for applying a binaural effect to audio synchronized with a video played on a mobile terminal; and generating metadata synchronized with the video based on a touch input to the visual interface, and outputs feedback audio to which the binaural effect is applied in immediate response to the touch input.

Here, stereophonic sound content including audio to which the binaural effect is applied may be rendered based on the metadata.

Here, the rendering may generate a container including the video, the audio to which the binaural effect is applied, and additional data, and the video and the audio to which the binaural effect is applied may be synchronized.

Here, the visual interface may include a first interface for specifying the position of the sound image on a two-dimensional plane, and a second interface for specifying the position of the sound image on a straight line perpendicular to the two-dimensional plane, and the three-dimensional position may be generated by combining a first input of the user to the first interface and a second input of the user to the second interface.

Here, the audio to which the binaural effect is applied may correspond to two-channel audio corresponding to a left channel and a right channel.

According to the present invention, stereophonic sound content including audio to which a binaural effect is applied can be authored on a mobile terminal through an application that includes an easy and intuitive interface.

In addition, the present invention can perform binaural rendering for generating stereophonic sound content using metadata on touch input to a mobile terminal.

In addition, the present invention can provide audio feedback in real time so that the user can operate the interface more easily in step with the movement of an object.

In addition, the present invention can provide a sense of direction, distance, and space when audio is heard through an external audio device connected to the mobile terminal.

FIG. 1 is a view showing an execution screen of a stereophonic sound content authoring application according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an example of a mobile terminal according to the present invention.
FIG. 3 is a view showing an example of metadata synchronized with a video according to the present invention.
FIG. 4 is a view showing an example of a visual interface according to the present invention.
FIG. 5 is a view showing another example of a visual interface according to the present invention.
FIG. 6 is a view showing an example of a three-dimensional space in which a sound image is located according to the present invention.
FIG. 7 is a view showing an example of a sound image overlaid on a video according to the present invention.
FIGS. 8 and 9 are views showing examples of a sound image represented with reference to the position of the user's head according to the present invention.
FIG. 10 is a view showing an example of a process of highlighting an object through an object tracking interface according to the present invention.
FIG. 11 is a block diagram showing an example of a rendering process according to the present invention.
FIG. 12 is a flowchart showing a method for authoring stereophonic sound content according to an embodiment of the present invention.
FIG. 13 is a flowchart showing in detail a process of authoring stereophonic sound content according to an embodiment of the present invention.
FIGS. 14 and 15 are views showing other examples of an execution screen of the stereophonic sound content authoring application according to the present invention.
FIG. 16 is a view showing a stereophonic sound content authoring system according to an embodiment of the present invention.

The present invention will now be described in detail with reference to the accompanying drawings. Repeated descriptions, and detailed descriptions of well-known functions and configurations that could unnecessarily obscure the gist of the present invention, are omitted. The embodiments of the present invention are provided to explain the invention more completely to those of ordinary skill in the art. Accordingly, the shapes and sizes of elements in the drawings may be exaggerated for clarity.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a view showing an execution screen of a stereophonic sound content authoring application according to an embodiment of the present invention.

Referring to FIG. 1, a stereophonic sound content authoring application stored in a computer-readable recording medium according to an embodiment of the present invention provides a visual interface for applying a binaural effect to audio synchronized with a video played on a mobile terminal.

Here, the stereophonic sound content authoring application corresponds to a tool for authoring stereophonic sound content, and may correspond to a computer program installed and executed on a computer.

Here, the binaural effect provides a three-dimensional sound effect such that, when the user listens to audio through headphones or earphones, the sound image of the audio seems to be located outside the head. Content including audio to which the binaural effect is applied can therefore give the user a more realistic and immersive experience than ordinary content.

Here, the video may correspond to a video stored on the mobile terminal. For example, the video may be one that the user recorded directly with a camera installed in the mobile terminal, or one downloaded over a network.

Here, because the visual interface is provided by running the stereophonic sound content authoring application, it may be provided through any mobile terminal capable of running the application.

For example, the visual interface may be configured to correspond to the execution screen shown in FIG. 1.

Here, the mobile terminal may correspond to a terminal capable of performing data communication over a network.

For example, referring to FIG. 2, a mobile terminal according to an embodiment of the present invention includes a communication unit 210, a processor 220, and a memory 230.

The communication unit 210 transmits and receives the information required for authoring stereophonic sound content over a communication network. In particular, the communication unit 210 according to an embodiment of the present invention may obtain the stereophonic sound content authoring application from a server that provides applications over a network.

Here, the server may provide the stereophonic sound content authoring application as well as various content and services related to running it.

For example, referring to FIG. 16, a mobile terminal 1610 and a server 1620 may exchange data over a network 1630. The mobile terminal 1610 may also download from the server 1620 various content or data needed for authoring stereophonic sound content.

Here, the network 1630 is a concept that encompasses both existing networks and networks that may be developed in the future. For example, the network may be any one, or a combination of two or more, of the following: an IP network that provides data services based on the Internet Protocol (IP), a wired network, a WiBro (Wireless Broadband) network, a third-generation mobile communication network including WCDMA, a 3.5-generation mobile communication network including HSDPA (High Speed Downlink Packet Access) and LTE networks, a fourth-generation mobile communication network including LTE Advanced, a satellite communication network, and a Wi-Fi network.

The processor 220 corresponds to a central processing unit, and may execute and control the stereophonic sound content authoring application stored in the memory 230.

Here, the memory 230 may store various applications, including the stereophonic sound content authoring application, together with an operating system (OS). The stereophonic sound content authoring application may therefore correspond to a computer program installed and executed on the mobile terminal.

In addition, the memory 230 may support functions for authoring stereophonic sound content according to an embodiment of the present invention. The memory 230 may operate as separate mass storage and may include a control function for performing operations. In one implementation, the memory is a computer-readable medium. In one implementation, the memory may be a volatile memory unit; in another implementation, the memory may be a non-volatile memory unit. In various implementations, the memory may include, for example, a hard disk device, an optical disk device, or some other mass storage device.

In addition, the stereophonic sound content authoring application stored in a computer-readable recording medium according to an embodiment of the present invention generates metadata synchronized with the video based on a touch input to the visual interface.

For example, the metadata may be generated based on touch input values on the visual interface. That is, the user of the mobile terminal can adjust the sound image of the audio synchronized with the video by operating the visual interface while watching the movement of an object in the video.

Here, the touch input value may be a multi-touch input value. To specify a position in three-dimensional coordinates effectively, a single touch that sets a single point on the screen is insufficient, and multi-touch, which can set several points on the screen at the same time, may be required.

That is, when multi-touch is not supported, or when metadata is generated from keyboard or mouse input rather than touch input on the screen, it may be difficult for the user to generate metadata intuitively and efficiently while watching the video.

Here, the metadata may be generated in synchronization with the video played on the mobile terminal.

For example, as shown in FIG. 3, the metadata 320-1 to 320-N generated while the video is playing may each be matched to the video frames 310-1 to 310-N that were being played at the time of the corresponding touch input, thereby synchronizing the metadata with the video.
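The frame matching described above can be illustrated with a minimal sketch. The `MetadataTrack` class, its method names, and the use of a fixed frame rate to derive the frame index are all assumptions made for illustration; the source only states that each piece of metadata is matched to the frame played at the time of the touch input.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Position = Tuple[float, float, float]  # (x, y, z) of the sound image


@dataclass
class MetadataTrack:
    """Hypothetical store mapping video frame indices to sound image positions."""
    fps: float
    entries: Dict[int, Position] = field(default_factory=dict)

    def record(self, playback_time_s: float, position: Position) -> int:
        """Attach a position to the frame being shown at the moment of the touch."""
        frame_index = int(playback_time_s * self.fps)
        self.entries[frame_index] = position
        return frame_index

    def lookup(self, frame_index: int) -> Optional[Position]:
        """Return the most recent position recorded at or before the given frame."""
        earlier = [i for i in self.entries if i <= frame_index]
        return self.entries[max(earlier)] if earlier else None
```

Holding the last recorded position until the next touch is one possible design choice; interpolating between recorded positions would be another.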

Here, the stereophonic sound content authoring application may output feedback audio to which the binaural effect is applied in immediate response to the touch input.

Here, the feedback audio may correspond to the existing audio, to which no binaural effect has been applied, modified based on the metadata. That is, when the user makes a touch input, the metadata generated in response is reflected in real time and the feedback audio is output. If the feedback audio were not output immediately, the user could not hear the modified audio until rendering was complete, so the binaural effect might be applied in a way that does not match the video.

Accordingly, by outputting the feedback audio immediately, the present invention allows the user to perform touch input in step with the movement of an object in the video.

Here, the feedback audio to which the binaural effect is applied may be output through an external audio device, such as earphones or headphones, that lets the user perceive the three-dimensional effect of the binaural effect.
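The source does not specify how the binaural effect is computed. As a rough illustration of how a block of feedback audio could be produced from a direction, the sketch below substitutes two simple interaural cues for a full binaural renderer: a constant-power level difference and a time difference from the Woodworth approximation. The function names, the head radius, and the restriction to the frontal half-plane are assumptions, and this is not the method of the invention.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average head radius
SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air


def interaural_cues(azimuth_rad):
    """Return (left_gain, right_gain, itd_s); azimuth 0 is front, positive is right."""
    az = max(-math.pi / 2, min(math.pi / 2, azimuth_rad))  # frontal half only
    pan_angle = (math.sin(az) + 1.0) * math.pi / 4          # 0 (left) .. pi/2 (right)
    left_gain, right_gain = math.cos(pan_angle), math.sin(pan_angle)
    # Woodworth approximation of the interaural time difference
    itd_s = HEAD_RADIUS_M / SPEED_OF_SOUND_M_S * (az + math.sin(az))
    return left_gain, right_gain, itd_s


def render_feedback(mono, azimuth_rad, sample_rate=48000):
    """Turn a mono block into (left, right) sample lists using the cues above."""
    left_gain, right_gain, itd_s = interaural_cues(azimuth_rad)
    delay = int(round(abs(itd_s) * sample_rate))
    near = list(mono) + [0.0] * delay   # ear facing the source
    far = [0.0] * delay + list(mono)    # opposite ear hears the sound later
    left_src, right_src = (far, near) if itd_s >= 0 else (near, far)
    left = [left_gain * s for s in left_src]
    right = [right_gain * s for s in right_src]
    return left, right
```

A practical renderer would more likely filter the audio with head-related transfer functions; the point here is only that each touch input can be turned into audible left and right signals block by block.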

Here, the metadata may correspond to the three-dimensional position of the sound image corresponding to the binaural effect. The three-dimensional position of the sound image may be defined relative to the user of the mobile terminal who experiences the binaural effect.

For example, the three-dimensional position of the sound image according to the present invention may be represented in a hemispherical three-dimensional space centered on the user's head, as shown in FIG. 6. Audio to which the binaural effect is applied can therefore give the effect of sound being heard three-dimensionally from all 360 degrees around the user.

In this way, by placing the sound image in a three-dimensional space defined by X, Y, and Z axes, the present invention can provide deeper immersion than existing sound techniques.
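A position on X, Y, and Z axes can also be expressed as a direction and a distance from the head, which is the form many spatial audio tools work with. The sketch below shows the conversion in both directions. The axis convention (x to the right, y to the front, z up) and the function names are assumptions for illustration; the source does not define the axes.

```python
import math


def to_spherical(x, y, z):
    """Convert head-centered (x, y, z) to (azimuth, elevation, distance) in radians."""
    distance = math.sqrt(x * x + y * y + z * z)
    azimuth = math.atan2(x, y)                   # 0 = straight ahead, positive = right
    elevation = math.atan2(z, math.hypot(x, y))  # 0 = ear level, pi/2 = overhead
    return azimuth, elevation, distance


def from_spherical(azimuth, elevation, distance):
    """Inverse of to_spherical under the same assumed axis convention."""
    horizontal = distance * math.cos(elevation)
    return (horizontal * math.sin(azimuth),
            horizontal * math.cos(azimuth),
            distance * math.sin(elevation))
```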

Here, the visual interface may include a first interface for specifying the position of the sound image on a two-dimensional plane and a second interface for specifying the position of the sound image on a straight line perpendicular to the two-dimensional plane, and the three-dimensional position may be generated by combining a first input of the user to the first interface and a second input of the user to the second interface.

For example, referring to FIG. 4, the user may operate the first interface 410 and the second interface 420 according to the movement of an object 400 in the video. The user may move the control of the first interface 410 and the control of the second interface 420 by touch input in the direction in which the object 400 moves. The three-dimensional position generated by combining the first input and the second input from the touch input may then be recorded as metadata.
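Combining the two inputs into one three-dimensional position can be sketched as follows. The function name, the normalized value ranges, and the choice to clamp the result to an upper hemisphere of a given radius are assumptions for illustration; the source only states that the first and second inputs are combined.

```python
import math


def combine_inputs(plane_xy, height, radius=1.0):
    """Combine a 2D-plane input and a perpendicular-axis input into (x, y, z).

    plane_xy: (x, y) from the hypothetical first interface.
    height:   value along the perpendicular axis from the second interface.
    The result is kept inside the upper hemisphere of the given radius.
    """
    x, y = plane_xy
    z = max(0.0, height)  # hemisphere: positions below the plane are not allowed
    norm = math.sqrt(x * x + y * y + z * z)
    if norm > radius:     # pull points outside the hemisphere back onto its surface
        scale = radius / norm
        x, y, z = x * scale, y * scale, z * scale
    return (x, y, z)
```

For example, `combine_inputs((0.3, 0.4), 0.0)` leaves the position unchanged because it already lies inside the unit hemisphere.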

As another example, referring to FIG. 5, the user may operate a third interface 510, which includes the position of the user's head, according to the movement of an object 500 in the video. When the user selects a sound image by touch input at a three-dimensional position in the third interface 510, the three-dimensional position corresponding to the selected position may be recorded as metadata. For ease of operation, the third interface 510 may also allow the user to touch the sound image while rotating the direction of the head.

Here, the visual interface is not limited to the forms shown in FIGS. 4 and 5, and may be provided in various forms that allow the three-dimensional position of the sound image to be entered.

Here, based on a mode change button included in the visual interface, either a play mode, which plays the video with the feedback audio, or an edit mode, which adjusts the position of the sound image through the visual interface, may be selectively provided. The mode change button may operate as a toggle.

Here, the visual interface may display the sound image corresponding to the metadata in real time by overlaying it on the video.

For example, as shown in FIG. 7, the sound image 710, which changes as the user operates the first interface and the second interface included in the visual interface, may be displayed overlaid on the object shown in the video. When the user operates the first interface or the second interface by touch input, the position of the sound image 710 may also be shown moving in real time.

Here, the sound image corresponding to the metadata may be represented as a position on a hemisphere set with reference to the position of the user's head.

For example, as shown in FIG. 7, the changing sound image may also be shown through a separate sound image tracking interface 720. The sound image tracking interface 720 may be provided in a form corresponding to a hemispherical space generated with reference to the position of the user's head, as shown in FIG. 8. The sound image may also be displayed while the direction of the user's head is rotated, so that the three-dimensional position of the sound image can be identified more intuitively.

As another example, the position may be shown on a two-dimensional plane centered on the user's head, as shown in FIG. 9. The position of the sound image shown in FIG. 9 may correspond to the input value of the first interface included in the visual interface.

Here, the visual interface may include an object tracking interface that highlights, among the objects in the video, the object corresponding to the sound image.

Here, the object tracking interface may determine the object corresponding to the sound image, among the objects in the video, based on correlation with the audio. That is, at a moment when the audio synchronized with the video changes greatly, the changes in the objects in the video are considered, and an object judged to be highly correlated with the change in the audio may be determined to be the object corresponding to the sound image.

For example, the object tracking interface may determine that an object newly appearing at a point in time when the audio synchronized with the video changes by at least a preset reference amount is the object corresponding to the sound image, and highlight it. Referring to FIG. 10, in a section where the audio 1010 synchronized with the video shows no large change, no new object appears and no large change occurs in the video frame 1020 either. As shown in the video frame 1030, however, the audio 1010 changes greatly when the car object 1000 appears. The object tracking interface may then determine that the car is the object 1000 corresponding to the sound image, and highlight the object 1000 so that the user can easily identify it.
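The selection rule in this example can be sketched in a few lines. The per-frame audio energy values, the per-frame lists of detected object labels, and both function names are hypothetical inputs introduced for illustration; the source does not say how audio change is measured or how objects are detected.

```python
def find_onset_frames(frame_energy, threshold):
    """Return indices of frames whose audio energy differs from the previous
    frame by at least the preset reference amount."""
    return [i for i in range(1, len(frame_energy))
            if abs(frame_energy[i] - frame_energy[i - 1]) >= threshold]


def pick_sound_object(objects_by_frame, onset_frame):
    """Return the first object present at the onset frame but absent from the
    previous frame, or None if no new object appeared."""
    previous = set(objects_by_frame[onset_frame - 1])
    for obj in objects_by_frame[onset_frame]:
        if obj not in previous:
            return obj
    return None
```

With energies `[0.1, 0.1, 0.12, 0.9, 0.95]` and a threshold of `0.5`, only frame 3 qualifies, and if a car first appears in that frame it is chosen as the object to highlight.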

In addition, according to an embodiment of the present invention, the object tracking interface may change the position of the sound image in accordance with the movement of the object recognized as corresponding to the sound image. That is, even if the user does not change the sound image directly, metadata for the sound image may be generated automatically while tracking the object that the object tracking interface has recognized as corresponding to the sound image.
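One way to picture this automatic metadata generation is to derive a sound image position from the tracked object's bounding box in each frame. The sketch below is an assumption made for illustration only: the bounding-box format, the normalization to the range -1 to 1, and the name `metadata_from_track` do not come from the specification.

```python
def metadata_from_track(track, frame_width, frame_height):
    """Map a tracked object's bounding boxes to per-frame sound image positions.

    track: dict of frame_index -> (left, top, right, bottom) in pixels.
    Returns dict of frame_index -> (x, y), where x runs from -1 (left edge)
    to +1 (right edge) and y from -1 (bottom edge) to +1 (top edge).
    """
    metadata = {}
    for frame_index, (left, top, right, bottom) in sorted(track.items()):
        center_x = (left + right) / 2
        center_y = (top + bottom) / 2
        x = 2 * center_x / frame_width - 1
        y = 1 - 2 * center_y / frame_height
        metadata[frame_index] = (x, y)
    return metadata


# Toy track: an object moving from the left half to the right half of a 200x100 frame.
track = {0: (0, 0, 100, 100), 1: (100, 0, 200, 100)}
positions = metadata_from_track(track, frame_width=200, frame_height=100)
```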

In addition, the stereophonic content authoring application stored in a computer-readable recording medium according to an embodiment of the present invention renders stereophonic content including audio to which the binaural effect is applied, based on the metadata.

In this case, the stereophonic content may be content that can be played through the mobile terminal, and a user watching the stereophonic content may perceive a sound effect as if the audio were output in an environment with multiple speakers installed around the user.

In this case, the audio to which the binaural effect is applied may be two-channel audio corresponding to a left channel and a right channel. Therefore, even a terminal or device that supports only two-channel audio can provide stereophonic content with a sound effect resembling 5.1-channel or 10.2-channel audio.

In this case, the audio to which the binaural effect is applied according to an embodiment of the present invention may be generated by applying existing techniques or techniques developed in the future.

Korean Registered Patent Publication No. 10-1599554 discloses outputting a three-dimensional binaural signal based on MPEG Surround, an international standard for multi-channel audio coding. In 10-1599554, a multi-channel audio reproduction characteristic parameter is extracted based on the MPEG Surround (MPS) international standard, and HRTF (Head-Related Transfer Function) filtering is performed on a downmix audio signal using that parameter to output a 3D binaural signal. Here, the HRTF filtering may be filtering that uses a dummy head microphone modeling the human auditory organ to obtain left and right impulse responses for each position, sampled at specific intervals over 360 degrees of azimuth and 180 degrees of elevation.

In this case, the multi-channel audio reproduction characteristic parameter relates to the output level difference between the front and rear channel signals for each frequency band. It may be extracted based on spatial parameters that, under the MPEG Surround (MPS) international standard, are derived from a multi-channel audio input and expressed as, for example, the sound level difference between the two ears and the correlation between channels.
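In the time domain, HRTF filtering of the kind described above amounts to convolving a source signal with the left-ear and right-ear impulse responses measured for the desired direction. The following is a minimal sketch of that idea only; it does not reproduce the method of 10-1599554, and the impulse responses are toy values chosen for illustration rather than measured data. The names `convolve` and `binauralize` are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def binauralize(mono, hrir_left, hrir_right):
    """Return (left, right) channels for a mono source and one pair of
    head-related impulse responses."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses for a source on the listener's left:
# the right ear receives the sound later and more quietly.
hrir_left = [1.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.5]
left, right = binauralize([1.0, 0.5], hrir_left, hrir_right)
```

The result is two-channel audio in which the interaural delay and level difference carry the direction cue.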

In addition, Korean Registered Patent Publication No. 10-0971700 discloses filtering left and right audio signals in the frequency domain based on position information of a virtual sound source and binaural filter coefficients for each channel, and decoding the filtered signals into a binaural stereo signal. In this case, the input time-domain stereo left/right audio signals are converted into frequency-domain signals using a DFT (Discrete Fourier Transform) or FFT (Fast Fourier Transform). The frequency-domain stereo left/right signals may then be filtered into a binaural stereo signal based on the power gain value of each channel for each subband, assigned according to the position information of the virtual sound source, and on the frequency-domain left/right HRTF coefficient block for each channel.

In this case, the power gain value of each channel for each subband may be calculated by synthesizing spatial cue information based on the virtual source location information (VSLI). For an arbitrary subband m, the VSLI-based spatial cue information for a stereo signal may include a left half-plane angle LHA(m), a left subsequent angle LSA(m), a right half-plane angle RHA(m), and a right subsequent angle RSA(m).

Accordingly, the present invention may likewise generate audio to which the binaural effect is applied in accordance with the metadata, based on techniques such as those described above.

For example, spatial parameters for the audio synchronized with the video may be extracted based on the MPEG Surround (MPS) international standard, and HRTF (Head-Related Transfer Function) filtering may be performed on the audio based on those spatial parameters and the three-dimensional position of the sound source given by the metadata, thereby generating an audio signal to which the binaural effect is applied.

As another example, the audio synchronized with the video may be converted into frequency-domain stereo left/right audio signals, spatial cue information may be synthesized based on the position information given by the metadata to calculate the power gain value of each channel for each subband, and audio to which the binaural effect is applied may then be generated based on the frequency-domain left/right HRTF coefficient block for each channel.
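The frequency-domain variant can be pictured as transforming a block of samples, scaling each frequency bin by a gain and an HRTF coefficient, and transforming back. The sketch below is a simplified assumption for illustration: it uses a naive DFT, applies one amplitude gain per bin rather than per-subband power gains derived from VSLI, and does not implement the scheme of 10-0971700. The function names are hypothetical, and the gains and coefficients must be conjugate-symmetric for the output to be purely real; only the real part is returned.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a sample sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT; returns the real part of each output sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def filter_block(block, gains, hrtf):
    """Scale each frequency bin of one block by a gain and an HRTF coefficient."""
    if not (len(block) == len(gains) == len(hrtf)):
        raise ValueError("block, gains and hrtf must have the same length")
    spectrum = dft(block)
    shaped = [s * g * h for s, g, h in zip(spectrum, gains, hrtf)]
    return idft(shaped)


# Toy example: a flat gain of 0.5 and a flat (all-pass) HRTF halve the block.
block = [1.0, 0.0, -1.0, 0.0]
filtered = filter_block(block, gains=[0.5] * 4, hrtf=[1.0] * 4)
```

A practical implementation would use an FFT and overlapping blocks; the naive transform is used here only to keep the sketch self-contained.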

In this case, a container including the video, the audio to which the binaural effect is applied, and additional data may be generated, and the video and the audio to which the binaural effect is applied may be synchronized.

For example, referring to FIG. 11, assume that metadata 1120 for the audio 1112 (S₁) synchronized with the video 1111 (M₁) has been generated based on touch input to the visual interface according to an embodiment of the present invention. In this case, the stereophonic content authoring application may use the audio 1112 (S₁) synchronized with the video 1111 (M₁) and the metadata 1120 to generate the audio 1121 (S₂) to which the binaural effect is applied, and may perform rendering to generate a container 1140 including the video 1111 (M₁), the audio 1121 (S₂) to which the binaural effect is applied, and additional data 1130.

In this case, the additional data 1130 may include information related to the format of the stereophonic content, parameters for rendering, and the like.
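The relationship among the elements of FIG. 11 can be summarized in a small data-structure sketch. It is an illustration under assumptions, not a container format defined by the specification: the `Container` class, its field names, the `render` function, and the stand-in `fake_binaural` processor are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Hypothetical stand-in for the container 1140."""
    video: list                  # M1: video frames (1111)
    binaural_audio: tuple        # S2: (left, right) channels (1121)
    additional_data: dict = field(default_factory=dict)  # 1130


def render(video, audio, metadata, apply_binaural, additional_data):
    """Generate S2 from S1 and the metadata, then bundle it with the video."""
    binaural_audio = apply_binaural(audio, metadata)
    return Container(video, binaural_audio, dict(additional_data))


def fake_binaural(audio, metadata):
    """Toy processor that copies the audio to both channels."""
    return (list(audio), list(audio))


container = render(
    video=["frame0", "frame1"],
    audio=[0.1, 0.2],                 # S1 (1112)
    metadata={0: (0.0, 0.0, 0.0)},    # 1120
    apply_binaural=fake_binaural,
    additional_data={"format": "2ch"},
)
```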

In this case, rendering may be started based on touch input to the visual interface.

For example, when the user touches the save button (SAVE) included in the visual interface, rendering for generating the stereophonic content may start. In this case, the stereophonic content generated by rendering may be stored in the mobile terminal by the stereophonic content authoring application.

In this case, rendering may be performed together with the process of generating the audio 1121 (S₂) to which the binaural effect is applied.

In addition, when the user pauses or exits the visual interface without performing rendering, the metadata generated so far may be either kept or deleted.

For example, when the user touches the pause button (PAUSE) included in the visual interface, the metadata generated so far may be stored together with the video.

As another example, when the user touches the end button (End) or the exit button (EXIT) included in the visual interface, the metadata generated so far may be deleted and the visual interface may be closed.
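The button behavior described in these examples can be sketched as a small dispatcher. This is an illustrative assumption only: the session dictionary, its keys, and the button identifiers as strings are hypothetical and are not taken from the specification.

```python
def handle_button(button, session):
    """Apply the effect of one button press to an authoring session.

    session keys (all hypothetical): 'metadata' (list), 'stored_metadata'
    (list or None), 'rendering' (bool), 'open' (bool).
    """
    if button == "SAVE":
        # Start rendering the stereophonic content.
        session["rendering"] = True
    elif button == "PAUSE":
        # Keep the metadata generated so far alongside the video.
        session["stored_metadata"] = list(session["metadata"])
    elif button in ("END", "EXIT"):
        # Discard the metadata and close the visual interface.
        session["metadata"] = []
        session["open"] = False
    else:
        raise ValueError(f"unknown button: {button}")
    return session


session = {"metadata": [(0, 0, 1)], "stored_metadata": None,
           "rendering": False, "open": True}
handle_button("PAUSE", session)
paused_copy = session["stored_metadata"]
handle_button("EXIT", session)
```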

Using such a stereophonic content authoring application, stereophonic content that conveys a sense of direction, distance, and space can be generated.

In addition, based on an easy and intuitive interface, anyone can easily author stereophonic content to which the binaural effect is applied.

FIG. 12 is a flowchart illustrating a stereophonic content authoring method according to an embodiment of the present invention.

Referring to FIG. 12, the stereophonic content authoring method according to an embodiment of the present invention provides a visual interface for applying a binaural effect to audio synchronized with a video played through a mobile terminal (S1210).

In this case, the binaural effect makes audio heard through headphones or earphones sound three-dimensional, as if the sound image of the audio were located externally. Content to which the binaural effect is applied can therefore give the user a more realistic and immersive experience than ordinary content.

In this case, the visual interface may be provided upon execution of the stereophonic content authoring application, and thus may be provided through a mobile terminal capable of running the application.

In this case, the mobile terminal may obtain the stereophonic content authoring application from a server that provides it, through data communication over a network. Accordingly, the mobile terminal may be a terminal capable of performing data communication over a network.

In this case, the memory of the mobile terminal may store various applications, including the stereophonic content authoring application, together with an operating system (OS). Accordingly, the stereophonic content authoring application may be a computer program installed and executed on the mobile terminal.

In addition, the stereophonic content authoring method according to an embodiment of the present invention generates metadata synchronized with the video based on touch input to the visual interface, and outputs feedback audio to which the binaural effect is applied in immediate response to the touch input (S1220).

In this case, the user of the mobile terminal may adjust the sound image of the audio synchronized with the video by operating the visual interface while watching the movement of an object in the video, and the metadata may be generated based on the values input at that time.

For example, as shown in FIG. 3, the metadata 320-1 to 320-N generated while the video is played may each be matched to the frame 310-1 to 310-N of the video that was being played at the time of the corresponding touch input, thereby synchronizing the metadata with the video.
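The frame matching described here can be sketched as converting each touch timestamp into a frame index. The sketch assumes a constant frame rate and timestamps in seconds measured from the start of playback; these assumptions and the name `match_metadata_to_frames` are introduced for illustration and are not stated in the specification.

```python
def match_metadata_to_frames(touch_events, fps):
    """Map touch inputs to the frames being played when they occurred.

    touch_events: list of (time_in_seconds, position) in chronological order.
    Returns dict of frame_index -> position; if several inputs fall within
    the same frame, the latest one is kept.
    """
    matched = {}
    for time_in_seconds, position in touch_events:
        frame_index = int(time_in_seconds * fps)
        matched[frame_index] = position
    return matched


# Toy input: two touches land in frame 15 of a 30 fps video.
touch_events = [(0.0, (0, 0, 0)), (0.5, (1, 0, 0)), (0.52, (1, 1, 0))]
frame_metadata = match_metadata_to_frames(touch_events, fps=30)
```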

In this case, feedback audio to which the binaural effect is applied may be output in immediate response to the touch input.

In this case, the feedback audio may be the existing audio modified based on the metadata. That is, when a touch input by the user occurs, the metadata generated in response is reflected in real time and the feedback audio is output. If the feedback audio were not output immediately, the user could not hear the modified audio until rendering was complete, and the binaural effect might therefore be applied in a way that does not match the video.

In this case, the visual interface may include a first interface for specifying the position of the sound image on a two-dimensional plane and a second interface for specifying its position on a straight line perpendicular to that plane, and the three-dimensional position may be generated by combining the user's first input to the first interface and the user's second input to the second interface.
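Combining the two inputs can be sketched as joining a planar coordinate pair with a height value. The conversion to azimuth, elevation, and distance that follows is an added assumption, shown because HRTF data is typically indexed by direction; the axis convention (listener at the origin, +y in front, +z up) and the function names are hypothetical.

```python
import math


def combine_position(first_input, second_input):
    """Join the (x, y) from the first interface with the z from the second."""
    x, y = first_input
    return (x, y, second_input)


def to_spherical(position):
    """Convert (x, y, z) to (azimuth_deg, elevation_deg, distance),
    with azimuth measured from the +y axis toward +x."""
    x, y, z = position
    azimuth = math.degrees(math.atan2(x, y))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    distance = math.sqrt(x * x + y * y + z * z)
    return azimuth, elevation, distance


# Toy inputs: a point to the front right at ear height, and a point
# straight ahead and raised.
front_right = combine_position((1.0, 1.0), 0.0)
ahead_up = combine_position((0.0, 1.0), 1.0)
front_right_angles = to_spherical(front_right)
ahead_up_angles = to_spherical(ahead_up)
```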

As another example, referring to FIG. 5, the user may operate a third interface 510 that includes the user's head position, in accordance with the movement of the object 500 in the video. In this case, when the user selects, by touch input, a three-dimensional position for the sound image in the third interface 510, the three-dimensional position corresponding to the selected position may be recorded as metadata. For ease of operation, the third interface 510 may allow the sound image to be touched while the direction of the head is rotated.

In this case, the visual interface may use a mode change button (MODE) to selectively provide either a play mode for playing the video with the feedback audio or an edit mode for adjusting the position of the sound image through the visual interface. The mode change button may operate as a toggle.

In this case, the visual interface may display the sound image corresponding to the metadata in real time by overlaying it on the video.

In addition, according to an embodiment of the present invention, the object tracking interface may change the position of the sound image in accordance with the movement of the object recognized as corresponding to the sound image. That is, even if the user does not change the sound image directly, metadata for the sound image may be generated automatically while tracking the object that the object tracking interface has recognized as corresponding to the sound image.

In addition, although not shown in FIG. 12, the stereophonic content authoring method according to an embodiment of the present invention renders stereophonic content including audio to which the binaural effect is applied, based on the metadata.

In this case, a container including the video, the audio to which the binaural effect is applied, and additional data may be generated, and the video and the audio to which the binaural effect is applied may be synchronized.

For example, referring to FIG. 11, assume that metadata 1120 for the audio 1112 (S₁) synchronized with the video 1111 (M₁) has been generated based on touch input to the visual interface according to an embodiment of the present invention. In this case, the audio 1112 (S₁) synchronized with the video 1111 (M₁) and the metadata 1120 may be used to generate the audio 1121 (S₂) to which the binaural effect is applied, and rendering may be performed to generate a container 1140 including the video 1111 (M₁), the audio 1121 (S₂) to which the binaural effect is applied, and additional data 1130.

In this case, rendering may be performed together with the process of generating the audio 1121 (S₂) to which the binaural effect is applied.

In addition, although not shown in FIG. 12, the stereophonic content authoring method according to an embodiment of the present invention may store the various information generated during the stereophonic content authoring process described above in a separate storage module.

Using such a stereophonic content authoring method, stereophonic content that conveys a sense of direction, distance, and space can be generated.

FIG. 13 is a flowchart illustrating in detail a stereophonic content authoring process according to an embodiment of the present invention.

Referring to FIG. 13, in the stereophonic content authoring process according to an embodiment of the present invention, the stereophonic content authoring application installed on the mobile terminal is first executed (S1310), and the application then provides a visual interface based on a video selected by the user (S1320).

In this case, the visual interface may include interfaces for applying the binaural effect to the audio synchronized with the video selected by the user.

Thereafter, metadata synchronized with the video may be generated based on touch input to the interfaces for applying the binaural effect, and at the same time feedback audio to which the binaural effect is applied may be output to the user in immediate response to the touch input (S1330).

In this case, outputting the feedback audio in real time lets the user check the binaural effect corresponding to the generated metadata.

Thereafter, when the user presses the save button included in the visual interface, rendering is performed to generate a container including the video, the audio to which the binaural effect is applied, and additional data (S1340).

In this case, the video and the audio to which the binaural effect is applied may be synchronized on a time basis.

Thereafter, the stereophonic content corresponding to the generated container may be stored in the memory of the mobile terminal by the stereophonic content authoring application (S1350).

In this case, when the stereophonic content is played on a mobile terminal connected to an external audio device such as earphones or headphones, the user can enjoy the content realistically, as if the sound came from a source located externally.

FIGS. 14 and 15 are views showing another example of an execution screen of the stereophonic content authoring application according to the present invention.

First, referring to FIG. 14, the stereophonic content authoring application according to the present invention may be executed on a mobile terminal and, once executed, may provide an authoring menu (Authoring) for authoring stereophonic content, a content download menu (Contents Download) for downloading content needed for authoring stereophonic content, an information menu (Information), and the like.

For example, when the authoring menu (Authoring) is selected, a menu for selecting the ordinary video content to be authored may be provided, as shown in FIG. 15.

In this case, the ordinary video content may be a video stored in the memory of the mobile terminal on which the stereophonic content authoring application is stored. The video content search results may also be presented with content previously authored through the stereophonic content authoring application separated from ordinary video content that has not yet been authored.

In this case, the ordinary video content may be presented together with a thumbnail that identifies the content of the video, as shown in FIG. 15.

For example, when the user selects one item of the ordinary video content, the stereophonic content authoring application may provide the visual interface for authoring the selected video content as stereophonic content.

As described above, the stereophonic content authoring method and the application therefor according to the present invention are not limited to the configurations and methods of the embodiments described above; all or part of the embodiments may be selectively combined so that various modifications can be made.

110, 410: first interface
120, 420: second interface
210: communication unit
220: processor
230: memory
310-1 to 310-N, 920, 930: frame
320-1 to 320-N: metadata
400, 500, 900: object
510: third interface
710: sound image
720: sound image tracking interface
1010: audio
1020, 1030: video frame
1111: video
1112: audio
1120: metadata
1121: audio to which the binaural effect is applied
1130: additional data
1140: container
1610: mobile terminal
1620: server
1630: network

Claims

1. A stereophonic content authoring method comprising:
providing a visual interface for applying a binaural effect to audio synchronized with a video played through a mobile terminal; and
generating metadata synchronized with the video based on a touch input to the visual interface,
wherein the generating of the metadata outputs feedback audio to which the binaural effect is applied in immediate response to the touch input.

2. The method of claim 1,
wherein the stereophonic content authoring method further comprises
rendering stereophonic content including audio to which the binaural effect is applied, based on the metadata.

3. The method of claim 2,
wherein the rendering of the stereophonic content
generates a container including the video, the audio to which the binaural effect is applied, and additional data, and the video and the audio to which the binaural effect is applied are synchronized.

4. The method of claim 1,
wherein the metadata
corresponds to a three-dimensional position of a sound image corresponding to the binaural effect.

5. The method of claim 4,
wherein the visual interface comprises:
a first interface for specifying a position of the sound image on a two-dimensional plane; and
a second interface for specifying a position of the sound image on a straight line perpendicular to the two-dimensional plane,
wherein the three-dimensional position is generated by combining a first input of a user to the first interface and a second input of the user to the second interface.

6. The method of claim 5,
wherein the visual interface
displays the sound image corresponding to the metadata in real time by overlaying it on the video.

7. The method of claim 6,
wherein the sound image corresponding to the metadata
is represented as a position on a hemisphere set with reference to the position of the user's head.

8. The method of claim 7,
wherein the visual interface
includes an object tracking interface that highlights, among objects in the video, an object corresponding to the sound image.

9. The method of claim 8,
wherein the object tracking interface
determines the object corresponding to the sound image based on the correlation between the objects in the video and the audio.

10. The method of claim 3,
wherein the audio to which the binaural effect is applied is two-channel audio corresponding to a left channel and a right channel.

11. A stereophonic content authoring application stored in a computer-readable recording medium, the application executing the steps of:
providing a visual interface for applying a binaural effect to audio synchronized with a video played through a mobile terminal; and
generating metadata synchronized with the video based on a touch input to the visual interface,
and outputting feedback audio to which the binaural effect is applied in immediate response to the touch input.

12. The application of claim 11,
wherein stereophonic content including audio to which the binaural effect is applied is rendered based on the metadata.

13. The application of claim 12,
wherein a container including the video, the audio to which the binaural effect is applied, and additional data is generated through the rendering, and the video and the audio to which the binaural effect is applied are synchronized.

14. The application of claim 11,
wherein the metadata
corresponds to a three-dimensional position of a sound image corresponding to the binaural effect.

15. The application of claim 14,
wherein the visual interface comprises:
a first interface for specifying a position of the sound image on a two-dimensional plane; and
a second interface for specifying a position of the sound image on a straight line perpendicular to the two-dimensional plane,
wherein the three-dimensional position is generated by combining a first input of a user to the first interface and a second input of the user to the second interface.

16. The application of claim 15,
wherein the visual interface
displays the sound image corresponding to the metadata in real time by overlaying it on the video.

17. The application of claim 16,
wherein the sound image corresponding to the metadata
is represented as a position on a hemisphere set with reference to the position of the user's head.

18. The application of claim 17,
wherein the visual interface
includes an object tracking interface that highlights, among objects in the video, an object corresponding to the sound image.

19. The application of claim 18,
wherein the object tracking interface
determines the object corresponding to the sound image based on the correlation between the objects in the video and the audio.

20. The application of claim 13,
wherein the audio to which the binaural effect is applied is two-channel audio corresponding to a left channel and a right channel.