KR20180035194A

KR20180035194A - Method for generating and playing object-based audio contents and computer readable recording medium for recording data having file format structure for object-based audio service

Info

Publication number: KR20180035194A
Application number: KR1020180024785A
Authority: KR
Inventors: 장인선; 서정일; 김휘용; 이태진; 강경옥; 홍진우; 김진웅; 안치득; 함승철
Original assignee: 한국전자통신연구원; (주)오디즌
Priority date: 2008-04-23
Filing date: 2018-02-28
Publication date: 2018-04-05
Also published as: KR101999351B1; KR20150100585A; KR101724326B1; KR20160150616A

Abstract

Disclosed are a method for generating and playing object-based audio content and a computer-readable recording medium for recording data having a file format structure for an object-based audio service. The method for generating object-based audio content includes the steps of: receiving a plurality of audio objects; generating at least one preset by using the received audio objects; and storing the plurality of audio objects and a preset parameter describing an attribute of the at least one preset. The preset parameter is stored in the form of a box defined in a media file format for the object-based audio content. Accordingly, the present invention can efficiently store presets for a plurality of audio objects.

Description

TECHNICAL FIELD [0001] The present invention relates to a method for generating and reproducing object-based audio content, and to a file format structure for an object-based audio service.

The present invention relates to a method for generating and reproducing object-based audio content that can efficiently store preset information for that content, and to a computer-readable recording medium on which data having a file format structure for an object-based audio service is recorded.

The present invention is derived from research conducted as part of the IT core technology development program of the Ministry of Knowledge Economy and the Institute for Information Technology Advancement [Project management number: 2008-F-011-01; Project title: Development of Next-Generation DTV Core Technology (Standardization-Linked) - Development of Glasses-Free Personal 3D Broadcasting Technology (Continued)].

In conventional broadcasting services such as TV broadcasting, radio broadcasting, and DMB (Digital Multimedia Broadcasting), audio signals obtained from various sound sources are mixed and then stored or transmitted as a single audio signal.

In such an environment, the viewer can adjust the level of the overall audio signal, but cannot control the characteristics of individual sound sources, for example by adjusting the level of each sound source contained in the audio signal.

However, if the audio signal of each sound source is stored independently, without being mixed, when the audio content is authored, the user of a content playback terminal can enjoy the content while controlling the level and other properties of each sound source's audio signal.

An audio service in which the storage/transmission side stores or transmits a plurality of audio signals independently, and the user listens while appropriately controlling each audio signal at the receiver (content playback apparatus), is referred to as an object-based audio service.

In such an object-based audio service, attributes such as the position of each object, its sound level, and the acoustic characteristics that depend on object position are defined and provided as presets, so that the user can apply them when playing the audio content. That is, if several pieces of preset audio information are generated and included in the file, the receiving side can play the object-based audio service more efficiently.

The existing ISO Base Media File Format (ISO-BMFF) defines a file structure that can contain various types of media, such as audio, video, and still images. This file structure is flexible and extensible for the interchange, management, editing, and presentation of media.

If audio tracks and preset information are added to this ISO base media file format for storage or transmission, object-based audio services can be provided more efficiently.

An object of embodiments of the present invention is to provide a method of generating object-based audio content that can efficiently store presets for a plurality of audio objects.

A method of generating object-based audio content according to an embodiment of the present invention includes: receiving a plurality of audio objects; generating at least one preset using the received audio objects; and storing the plurality of audio objects and a preset parameter for an attribute of the at least one preset, wherein the preset parameter is stored in the form of a box defined in a media file format for the object-based audio content.

In this case, the media file format may be the ISO base media file format structure.

In addition, the box includes a moov box; the moov box includes a first box defined within the moov box; the first box includes a second box defined within the first box; the preset parameter includes a first preset parameter and a second preset parameter; the first preset parameter includes at least one of the number of the at least one preset and the preset ID of any one of the at least one preset; the first preset parameter may be stored in the first box; and the second preset parameter may be stored in the second box.

In addition, a method of reproducing object-based audio content according to an embodiment of the present invention includes: restoring a plurality of audio objects and at least one preset from the object-based audio content; mixing the plurality of audio objects based on the at least one preset to generate an output audio signal; and playing the output audio signal, wherein each of the at least one preset includes a preset parameter, and the preset parameter may be stored in the object-based audio content in the form of a box defined in a media file format for the object-based audio content.
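The mixing step of this reproduction method can be pictured with a minimal Python sketch. It assumes that each audio object is a list of float samples of equal length, that the preset supplies a linear gain per object plus one global gain (compare the per-object volume and preset global volume items described below), and that mixing is a plain weighted sum into a single output signal. The function `mix_with_preset` is hypothetical; frequency gains, sound-image parameters, and the output channel layout are ignored.

```python
from typing import Dict, List

def mix_with_preset(objects: Dict[int, List[float]],
                    volumes: Dict[int, float],
                    global_volume: float = 1.0) -> List[float]:
    """Mix audio objects into one output signal using preset gains.

    objects:       object ID -> samples (all objects must have the same length)
    volumes:       object ID -> linear gain taken from the preset
                   (objects without an entry keep a gain of 1.0)
    global_volume: preset-wide linear gain applied after summing
    """
    if not objects:
        return []
    lengths = {len(samples) for samples in objects.values()}
    if len(lengths) != 1:
        raise ValueError("all audio objects must have the same number of samples")
    out = [0.0] * lengths.pop()
    for obj_id, samples in objects.items():
        gain = volumes.get(obj_id, 1.0)
        for i, x in enumerate(samples):
            out[i] += gain * x
    return [global_volume * y for y in out]

# Example: object 1 (e.g. vocal) boosted, object 3 (e.g. drums) muted.
objects = {1: [0.5, 0.5], 2: [0.25, -0.25], 3: [1.0, 1.0]}
print(mix_with_preset(objects, {1: 2.0, 3: 0.0}, global_volume=0.5))  # [0.625, 0.375]
```

A weighted sum keeps the sketch short; a real renderer would also apply the equalization and spatial parameters carried in the preset.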

In addition, a computer-readable recording medium according to an embodiment of the present invention, on which data having a file format structure for an object-based audio service is recorded, includes: an ftyp box that stores specification information of the object-based audio content; an mdat box that stores a plurality of audio objects constituting the object-based audio content; and a moov box that stores metadata for presenting the stored plurality of audio objects, wherein preset parameters for attributes of at least one preset generated using the plurality of audio objects are stored in either the ftyp box or the moov box.

According to the present invention, presets for a plurality of audio objects can be stored efficiently.

FIG. 1 is a diagram showing the basic form of a media file format structure for storing object-based audio content according to an embodiment of the present invention.
FIG. 2 is a diagram showing the relationship between tracks and channels according to an embodiment of the present invention.
FIG. 3 is a flowchart of a method of generating object-based audio content according to an embodiment of the present invention.
FIG. 4 is a diagram showing the structure of 'moov' according to an embodiment of the present invention.
FIG. 5 is a flowchart of a method of reproducing object-based audio content according to an embodiment of the present invention.
FIG. 6 is a flowchart of a method of reproducing object-based audio content according to another embodiment of the present invention.
FIGS. 7 and 8 are diagrams showing the structure of a file format for storing object-based audio content that includes description information according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. However, the embodiments illustrated below may be modified into various other forms, and the scope of the present invention is not limited to them. The embodiments are provided so that the present invention can be explained more fully to those of ordinary skill in the art.

FIG. 1 is a diagram showing the basic form of a media file format structure for storing object-based audio content according to an embodiment of the present invention.

Referring to FIG. 1, the media file format structure for storing object-based audio content broadly consists of: an ftyp box (hereinafter 'ftyp'), which stores the specification information of the object-based audio content (i.e., the type information of the object-based audio content file); a moov box (hereinafter 'moov'), which stores metadata (e.g., decoding time) for presenting the plurality of audio object data constituting the content; and an mdat box (hereinafter 'mdat'), which stores the plurality of audio object data.

'ftyp' and 'moov' each include a meta box (hereinafter 'meta'). In general, 'meta' stores descriptive metadata about the plurality of audio object data stored in 'mdat'.

Here, the media file format structure for storing object-based audio content is preferably the ISO base media file format (ISO-BMFF) structure.

Hereinafter, a method is described for generating object-based audio content by storing presets related to its playback, together with a plurality of audio objects, according to the ISO-BMFF structure. However, as mentioned above, the generation method described below is not limited to object-based audio content with the ISO-BMFF structure; it can also be extended to multichannel audio content having a media file format structure for storing multimedia data, such as MP4 files.

Before describing the method of generating object-based audio content according to an embodiment of the present invention, the preset parameters, which represent the attributes of a preset stored in the object-based audio content, are described first. A preset parameter may include at least one of the items of preset information listed below.

1. Preset name, preset ID

'Preset name' is a string corresponding to a preset, and 'preset ID' is an integer corresponding to each preset.

2. Number of presets, default preset ID

'Number of presets' is the number of presets included in the object-based audio content.

'Default preset ID' is the ID of the preset to be played first, in the initial state before any user interaction, when the object-based audio content is played. The default preset ID may correspond to any one of the preset IDs included in the object-based audio content.

3. Whether preset information is displayed

'Whether preset information is displayed' indicates whether preset information (for example, the per-input-track or per-input-channel volume information, or the per-input-track or per-input-channel frequency gain information described below) is shown to the user when the object-based audio content is played.

4. Whether the preset is editable

'Whether the preset is editable' indicates whether the user can edit the preset when the object-based audio content is played.

5. Number of input tracks, input track ID, number of input channels per input track

'Number of input tracks' is the number of input tracks stored in the object-based audio content. An input track may correspond to a sound source. For example, when the object-based audio content consists of vocals, piano, and drums, each of the vocals, piano, and drums may form one track.

'Input track ID' is an integer corresponding to each input track.

'Number of input channels per input track' is the number of channels included in each input track.

Hereinafter, the relationship between tracks and channels is described with reference to FIG. 2.

FIG. 2 is a diagram showing the relationship between tracks and channels according to an embodiment of the present invention.

FIG. 2 shows a vocal track 210, a piano track 220, and a drum track 230.

When each sound source is recorded in two channels (i.e., stereo), each track may include two channels. That is, when vocals, piano, and drums are recorded in two channels, the vocal track 210 may consist of a first channel 211 and a second channel 212, the piano track 220 of a first channel 221 and a second channel 222, and the drum track 230 of a first channel 231 and a second channel 232. Although FIG. 2 shows every track with the same number of channels, the number of channels per track may differ.

Here, when the author of the object-based audio content sets presets per track, each of the plurality of audio objects may correspond to a track; when presets are set per channel, each audio object may correspond to a channel.
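The track/channel relationship above can be sketched in Python with a simple in-memory model. `InputTrack`, `audio_objects`, and the `WHOLE_TRACK` marker are hypothetical names, and the track IDs are arbitrary integers rather than the reference numerals of FIG. 2; the sketch only shows how the set of audio objects changes when presets are set per track versus per channel.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class InputTrack:
    track_id: int      # input track ID (an integer, as in item 5)
    label: str         # illustrative label such as "vocal"; not a stored field
    num_channels: int  # number of input channels in this track

WHOLE_TRACK = -1       # hypothetical marker: the audio object is the entire track

def audio_objects(tracks: List[InputTrack], per_channel: bool) -> List[Tuple[int, int]]:
    """List audio objects as (track_id, channel_index) pairs.

    per_channel=False: presets are set per track, so each track is one object.
    per_channel=True:  presets are set per channel, so each channel is one object.
    """
    if per_channel:
        return [(t.track_id, ch) for t in tracks for ch in range(t.num_channels)]
    return [(t.track_id, WHOLE_TRACK) for t in tracks]

# FIG. 2: vocal, piano, and drum tracks, each recorded in stereo.
tracks = [InputTrack(1, "vocal", 2), InputTrack(2, "piano", 2), InputTrack(3, "drum", 2)]
print(audio_objects(tracks, per_channel=False))      # [(1, -1), (2, -1), (3, -1)]
print(len(audio_objects(tracks, per_channel=True)))  # 6
```

Because `num_channels` is stored per track, tracks with different channel counts are handled without changes.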

6. Output channel type, number of output channels

'Output channel type' indicates the channel configuration through which the object-based audio content is played, and 'number of output channels' is the number of output channels for that output channel type.

7. Number of frequency bands for sound equalization, center frequency of each frequency band, bandwidth of each frequency band

'Number of frequency bands' is the number of frequency bands to which sound equalization is applied to compensate for signal distortion that occurs during amplification or transmission.

8. Volume information per input track or per input channel

'Volume information' is information about the volume of each of the plurality of audio objects. When the audio objects correspond to input tracks, 'volume information per input track' is stored in the object-based audio content; when the audio objects correspond to input channels, 'volume information per input channel' is stored.

9. Frequency gain information per input track or per input channel

'Frequency gain information' is information about the frequency gain applied during sound equalization. When the audio objects correspond to input tracks, 'frequency gain information per input track' is stored in the object-based audio content; when the audio objects correspond to input channels, 'frequency gain information per input channel' is stored.

10. Preset global volume information

'Preset global volume information' is information for adjusting the overall volume of the plurality of audio objects as a whole.

11. Size of the sound image, angle of the sound image

'Sound image size' and 'sound image angle' are the size value and angle value of the sound image formed by the plurality of channels stored in the object-based audio content.
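For illustration, preset information items 1 to 11 can be gathered into one record. This is a Python sketch under assumptions, not the stored binary layout: the field names, types, and defaults are hypothetical, per-object values are keyed by an integer object ID, item 5 (input track information) is left out, and the number of presets is derived from the list instead of being stored. The `validate` methods check two constraints implied by the text: the default preset ID must match one of the preset IDs, and each object needs one gain per equalization band.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Preset:
    preset_id: int                                   # item 1
    name: str                                        # item 1
    display_info: bool = True                        # item 3
    editable: bool = False                           # item 4
    output_channel_type: str = "stereo"              # item 6 (string encoding assumed)
    num_output_channels: int = 2                     # item 6
    eq_bands: List[Tuple[float, float]] = field(default_factory=list)  # item 7: (center Hz, bandwidth Hz)
    volumes: Dict[int, float] = field(default_factory=dict)            # item 8: object ID -> gain
    eq_gains: Dict[int, List[float]] = field(default_factory=dict)     # item 9: object ID -> gain per band
    global_volume: float = 1.0                       # item 10
    image_size: float = 0.0                          # item 11
    image_angle: float = 0.0                         # item 11

    def validate(self) -> None:
        for obj_id, gains in self.eq_gains.items():
            if len(gains) != len(self.eq_bands):
                raise ValueError(f"object {obj_id}: one gain is needed per EQ band")

@dataclass
class PresetSet:
    presets: List[Preset]
    default_preset_id: int                           # item 2

    @property
    def num_presets(self) -> int:                    # item 2, derived from the list
        return len(self.presets)

    def validate(self) -> None:
        ids = [p.preset_id for p in self.presets]
        if len(set(ids)) != len(ids):
            raise ValueError("preset IDs must be unique")
        if self.presets and self.default_preset_id not in ids:
            raise ValueError("default preset ID must be one of the preset IDs")
        for p in self.presets:
            p.validate()

    def default(self) -> Preset:
        """Preset to play first, before any user interaction."""
        return next(p for p in self.presets if p.preset_id == self.default_preset_id)

karaoke = Preset(1, "Karaoke", volumes={1: 0.0})    # mute object 1 (e.g. vocal)
original = Preset(2, "Original")
presets = PresetSet([karaoke, original], default_preset_id=2)
presets.validate()
print(presets.num_presets, presets.default().name)  # 2 Original
```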

The author of the object-based audio content can generate the content by storing, in any of several ways permitted by the ISO base media file format structure, preset parameters that include at least one of the items listed above.

FIG. 3 is a flowchart of a method of generating object-based audio content according to an embodiment of the present invention.

First, in step 310, a plurality of audio objects is received.

Next, in step 320, at least one preset is generated using the received audio objects.

Finally, in step 330, the plurality of audio objects and the preset parameters for the attributes of the preset are stored. As mentioned above, a preset parameter may include at least one of the items listed above.

Here, the preset parameters are stored in the form of a box defined in the media file format for the object-based audio content.
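Storing data in the form of a box follows the generic ISO-BMFF box layout: a 32-bit big-endian size that counts the whole box including its 8-byte header, a four-character type, and the box body. The Python sketch below applies that layout to steps 310 to 330 with stated simplifications: the helper names are hypothetical, the 'ftyp' brand values are placeholders, the preset payload is treated as opaque bytes placed in a 'prco' box (one of the storage options described below), and the mandatory 'mvhd' and 'trak' boxes that a real 'moov' needs are omitted.

```python
import struct
from typing import Sequence

def make_box(box_type: str, payload: bytes = b"", children: Sequence[bytes] = ()) -> bytes:
    """Serialize one ISO-BMFF box: size (uint32, big-endian), type, body.

    The body is the payload followed by any already-serialized child boxes.
    64-bit 'largesize' boxes are not handled in this sketch.
    """
    if len(box_type) != 4:
        raise ValueError("box type must be four characters")
    body = payload + b"".join(children)
    size = 8 + len(body)
    if size > 0xFFFFFFFF:
        raise ValueError("box too large for a 32-bit size field")
    return struct.pack(">I4s", size, box_type.encode("ascii")) + body

def build_file(audio_objects: Sequence[bytes], preset_payload: bytes) -> bytes:
    """Steps 310-330 in miniature: audio objects go into 'mdat',
    preset parameters go into a box inside 'moov'."""
    # major_brand, minor_version, one compatible brand (placeholder values)
    ftyp = make_box("ftyp", b"isom" + struct.pack(">I", 0) + b"isom")
    moov = make_box("moov", children=[make_box("prco", preset_payload)])
    mdat = make_box("mdat", b"".join(audio_objects))
    return ftyp + moov + mdat

data = build_file([b"\x01\x02", b"\x03\x04"], preset_payload=bytes([1, 1]))
print(len(data))     # 50
print(data[24:28])   # b'moov'
```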

The process of storing the preset parameters in step 330 is described in detail below.

Storing the preset parameters in the 'meta' within 'ftyp' or the 'meta' within 'moov'

According to an embodiment of the present invention, the preset parameters may be stored in the 'meta' within 'ftyp' (hereinafter the first 'meta') or in the 'meta' within 'moov' (hereinafter the second 'meta').

That is, as mentioned above, the first 'meta' or the second 'meta' can store description information (or descriptive metadata) giving general information about the object-based audio content, such as the song title, artist name, and album name; the preset parameters can be stored together with this description information.

Storing the preset parameters in a 'meta' separate from the 'meta' that stores the description information

According to an embodiment of the present invention, the preset parameters may be stored in a 'meta' that is separate from the 'meta' storing the description information of the object-based audio content.

This is because the description information relates to identifying the object-based audio content, whereas the preset parameters relate to playing it; since the two kinds of information have different attributes, it is preferable to handle them separately.

As an example, the description information may be stored in the first 'meta' and the preset parameters in the second 'meta'.

Because the ISO base media file format specifies that only one 'meta' may exist at a given level, 'ftyp' and 'moov' can each contain only one 'meta' at the level below them. Therefore, for the description information and the preset parameters to be stored separately, they must be stored in 'meta' boxes at different levels (i.e., the first 'meta' and the second 'meta'). In this case, since the preset parameters have the attributes of metadata for presentation, the description information may be stored in the first 'meta' and the preset parameters in the second 'meta'.

As another example, the description information may remain in 'meta' (the first 'meta' and the second 'meta'), while the preset parameters are stored in a meco box (hereinafter 'meco') within 'ftyp' or 'moov'.

'meco' is the Additional Metadata Container Box defined in the ISO base media file format for storing additional metadata; it can hold separate metadata that the ISO base media file format itself does not define. Therefore, the preset parameters may be stored in either the 'meco' within 'ftyp' or the 'meco' within 'moov'.

Storing the preset parameters in a newly defined box within 'moov'

According to an embodiment of the present invention, the preset parameters may be stored in a box newly defined within 'moov'.

As mentioned above, since the preset parameters and the description information have different attributes, the preset parameters are preferably handled separately from the description information. In addition, since the preset parameters have the attributes of metadata for presentation, they are preferably stored in 'moov'. Therefore, to manage the preset parameters efficiently, it is preferable to define a new box within 'moov' and store the preset parameters in that newly defined box.

FIG. 4 is a diagram showing the structure of 'moov' according to an embodiment of the present invention.

As shown in FIG. 4, two boxes may be defined in 'moov'.

The first box is defined within 'moov' and stores the first preset parameter, which represents overall information about the presets. Hereinafter, the first box is referred to as the preset container box, i.e., 'prco'.

For example, the first preset parameter may include at least one of the number of presets and the default preset ID mentioned above. The default preset ID is the ID of the preset to be played first, in the initial state before any user interaction, when the object-based audio content is played. The default preset ID may correspond to any one of the preset IDs included in the object-based audio content.

The second box is defined within 'prco' and stores the second preset parameter, which describes the attributes of a preset.

For example, the second preset parameter may include the items listed above other than the number of presets and the default preset ID. Hereinafter, the second box is referred to as the preset box, i.e., 'prst'.

'prco' contains as many 'prst' boxes as there are presets in the object-based audio content. If no preset is stored in the object-based audio content, 'prco' contains no 'prst'.

For example, 'prst' may store a preset parameter containing all of the preset information mentioned above except the number of presets and the default preset ID.

According to an embodiment of the present invention, when 'moov' includes 'prco' and 'prst', the structure of the ISO base media file format may be as shown in Table 1.

Table 1
ftyp - file type and compatibility
moov - container for all the metadata
    mvhd - movie header, overall declarations
    trak - container for an individual track or stream
        tkhd - track header, overall information about the track
        tref - track reference container
        edts - edit list container
            elst - an edit list
        mdia - container for the media information in a track
            mdhd - media header, overall information about the media
            hdlr - handler, declares the media (handler) type
            minf - media information container
                smhd - sound media header, overall information (sound track only)
                hmhd - hint media header, overall information (hint track only)
                nmhd - null media header, overall information (some tracks only)
                dinf - data information box, container
                    dref - data reference box, declares source(s) of media data in track
                stbl - sample table box, container for the time/space map
                    stsd - sample descriptions (codec types, initialization etc.)
                    stts - (decoding) time-to-sample
                    stsc - sample-to-chunk, partial data-offset information
                    stsz - sample sizes (framing)
                    stz2 - compact sample sizes (framing)
                    stco - chunk offset, partial data-offset information
                    co64 - 64-bit chunk offset
    prco - container for the presets
        prst - preset box, container for the preset information
mdat - media data container
free - free space
skip - free space
meta - metadata
    hdlr - handler, declares the metadata (handler) type
    dinf - data information box, container
        dref - data reference box, declares source(s) of metadata items
    iloc - item location
    iinf - item information
    xml - XML container
    bxml - binary XML container
    pitm - primary item reference

Embodiments of the syntax and semantics of 'prco' and 'prst' are described in detail below.

Table 2 shows an embodiment of the syntax of 'prco'.

Preset Container Box
Box type: 'prco'
Container: Movie Box ('moov')
Mandatory: Yes
Quantity: Exactly one

syntax

aligned(8) class PresetContainerBox extends Box('prco') {
    unsigned int(8) num_preset;
    unsigned int(8) default_preset_ID;
}

The semantics of the syntax in Table 2 are as follows.

'num_preset' denotes the number of presets in 'prco'.

'default_preset_ID' denotes the default preset ID. If the author does not set 'default_preset_ID', it may be set to the smallest preset ID among the presets.
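A minimal Python sketch of writing and reading a 'prco' box, under stated assumptions: a plain 32-bit size box header as in ISO/IEC 14496-12 (no largesize), child 'prst' boxes passed in as already-serialized bytes, and the smallest-ID fallback above applied when the author supplies no default. The names `build_prco` and `parse_prco` are hypothetical.

```python
import struct

def build_prco(preset_ids, default_preset_id=None, children=b""):
    """Serialize a PresetContainerBox ('prco') following Table 2.

    preset_ids: IDs of the presets carried in the child 'prst' boxes.
    default_preset_id: None means the author did not set one, so the
    smallest preset ID is used, as described above.
    children: already-serialized 'prst' boxes (hypothetical input).
    """
    if len(preset_ids) > 255:
        raise ValueError("num_preset must fit in unsigned int(8)")
    if default_preset_id is None:
        if not preset_ids:
            raise ValueError("no preset to fall back to")
        default_preset_id = min(preset_ids)
    payload = struct.pack(">BB", len(preset_ids), default_preset_id) + children
    # Box header: 32-bit size (header included) followed by the 4-byte type.
    return struct.pack(">I4s", 8 + len(payload), b"prco") + payload

def parse_prco(data):
    """Read the two 'prco' fields; any child boxes are left unparsed."""
    size, box_type = struct.unpack_from(">I4s", data, 0)
    if box_type != b"prco":
        raise ValueError("not a 'prco' box")
    num_preset, default_preset_id = struct.unpack_from(">BB", data, 8)
    return {"size": size, "num_preset": num_preset,
            "default_preset_ID": default_preset_id}
```

For example, `build_prco([3, 1, 2])` yields a 10-byte box whose default_preset_ID is 1, while `build_prco([3, 1, 2], 0)` keeps the explicit value 0.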

If 'default_preset_ID' is set to '0', the object-based audio content may be played back according to the preset stored inside the bitstream of those audio objects, among the plurality of audio objects in the content, that were encoded and stored with MPEG-D Spatial Audio Object Coding (SAOC). This is described in more detail with reference to FIG. 6.

Table 3 shows the general syntax of 'prst'.

Preset Box
Box type: 'prst'
Container: Preset Container Box ('prco')
Mandatory: No
Quantity: zero or more

syntax

aligned(8) class PresetBox extends FullBox('prst', version = 0, flags) {
    unsigned int(8) preset_ID;
    unsigned int(8) num_preset_track;
    unsigned int(8) preset_track_ID[num_preset_track];
    unsigned int(8) preset_type;
    unsigned int(8) preset_global_volume;

    if (preset_type == 0) {}
    if (preset_type == 1) {}
    if (preset_type == 2) {}
    if (preset_type == 3) {}
    if (preset_type == 4) {}
    if (preset_type == 5) {}
    if (preset_type == 6) {}
    if (preset_type == 7) {}
    if (preset_type == 8) {}
    if (preset_type == 9) {}
    if (preset_type == 10) {}
    if (preset_type == 11) {}
    string preset_name;
}

The semantics of the syntax in Table 3 are as follows.

'version' denotes the version of 'prst'.

'flags' indicates, for playback of the object-based audio content, whether the information stored in 'prst' is displayed to the user and whether the user is allowed to edit that information.

'flags' is flag information of 8-bit integer type and can take the values listed in Table 4.

Flags    Display    Edit
0x01     disable    disable
0x02     enable     disable
0x03     enable     enable

That is, if 'flags' is 0x01, the preset-related information stored in 'prst' is not displayed to the user during playback of the object-based audio content, and the user cannot edit it.

If 'flags' is 0x02, the preset-related information stored in 'prst' is displayed to the user during playback, but the user cannot edit it.

If 'flags' is 0x03, the information stored in 'prst' is displayed to the user during playback, and the user can edit it.
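Table 4 can be read as a lookup from the 'flags' value to the two permissions. The sketch below is hypothetical Python. Note that ISO/IEC 14496-12 defines the FullBox flags field as 24 bits wide; this sketch assumes the Table 4 values are carried in that field unchanged, and it rejects any value Table 4 does not define instead of guessing.

```python
# Table 4: flags value -> (display allowed, edit allowed)
PRST_FLAG_PERMISSIONS = {
    0x01: (False, False),
    0x02: (True, False),
    0x03: (True, True),
}

def prst_permissions(flags):
    """Return (display, edit) for a 'prst' flags value, per Table 4."""
    try:
        return PRST_FLAG_PERMISSIONS[flags]
    except KeyError:
        raise ValueError(f"flags value {flags:#04x} is not defined in Table 4") from None
```

A player would call `prst_permissions(flags)` once per 'prst' box and hide or lock the preset controls accordingly.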

'preset_ID' denotes the preset ID and takes a value of 1 or more.

'num_preset_track' denotes the number of input tracks associated with the preset.

'preset_track_ID[num_preset_track]' denotes an array that stores the IDs of those input tracks.

'preset_name' denotes the preset name.

'preset_global_volume' denotes preset global volume information.
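A hypothetical Python parser for the fields every 'prst' box carries ahead of its preset_type-specific part. It assumes a 32-bit box size, the ISO/IEC 14496-12 FullBox layout (8-bit version followed by 24-bit flags), and the 8-bit fields of Table 3. Parsing stops before the type-specific part, so 'preset_name' is not read here.

```python
def parse_prst_header(data):
    """Parse the 'prst' fields that precede the preset_type-specific part (Table 3).

    Returns (fields, offset); offset points at the type-specific payload.
    """
    size = int.from_bytes(data[0:4], "big")
    if data[4:8] != b"prst":
        raise ValueError("not a 'prst' box")
    version = data[8]                           # FullBox version (8 bits)
    flags = int.from_bytes(data[9:12], "big")   # FullBox flags (24 bits)
    preset_id, num_preset_track = data[12], data[13]
    if preset_id < 1:
        raise ValueError("preset_ID must be 1 or more")
    offset = 14
    track_ids = list(data[offset:offset + num_preset_track])
    offset += num_preset_track
    if len(data) < offset + 2:
        raise ValueError("box truncated before preset_type")
    preset_type, preset_global_volume = data[offset], data[offset + 1]
    offset += 2
    return {
        "size": size, "version": version, "flags": flags,
        "preset_ID": preset_id, "preset_track_ID": track_ids,
        "preset_type": preset_type,
        "preset_global_volume": preset_global_volume,
    }, offset
```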

In general, to emphasize the rhythm of object-based audio content, an author creates a preset in which percussion instruments such as drums are louder than the other instruments.

However, if the relative volume difference between the percussion and the other instruments is small, the rhythm is not felt strongly enough. Conversely, if the relative volume difference is large, the overall volume becomes lower. This is because percussion sounds generally behave like sound effects (effectors): over the entire playback duration of the object-based audio content, high-frequency components make up a larger share of the percussion sound than of the other instrument sounds.

For example, if a preset consisting of [vocal, piano, drum] has volume values [250, 200, 400], the overall volume is adequate but the rhythm is not emphasized; if the volume values are [100, 150, 400], the rhythm is emphasized but the overall volume drops.

This can be solved by additionally storing preset global volume information in the object-based audio content. The preset global volume information is used to adjust the overall volume of the audio objects that make up the preset.

That is, suppose the volume values of all input tracks are stored relative to the basic global volume value set in the object-based audio content, and the preset is created with a preset global volume value larger than that basic global volume value. During playback, the relative volume differences then grow by the ratio $G_{\mathrm{preset}} / G_{\mathrm{base}}$, where $G_{\mathrm{preset}}$ is the preset global volume value and $G_{\mathrm{base}}$ is the basic global volume value.

For example, suppose the basic global volume value is '50' and a preset composed of [vocal, piano, drum] has volume values [100, 150, 400]. If the preset global volume value is set to 100, the volume of each instrument is doubled. The vocal and piano, which carry the main melody, become twice as loud, so the overall volume of the object-based audio content reaches an appropriate level, and the drum also doubles, so the rhythm remains emphasized.
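The example can be checked with a short Python sketch that scales every track volume of a preset by the ratio of the preset global volume to the basic global volume. The function name is hypothetical; the sketch works on the unquantized numbers of the example and ignores clipping.

```python
def apply_preset_global_volume(track_volumes, preset_global_volume, base_global_volume):
    """Scale every track volume of a preset by G_preset / G_base."""
    if base_global_volume <= 0:
        raise ValueError("basic global volume must be positive")
    ratio = preset_global_volume / base_global_volume
    return [volume * ratio for volume in track_volumes]
```

With [vocal, piano, drum] = [100, 150, 400], a basic global volume of 50 and a preset global volume of 100, the result is [200.0, 300.0, 800.0]: every track doubles, and the drum-to-vocal difference grows from 300 to 600.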

Amplifying the volume with the preset global volume value can degrade sound quality, for example through clipping. However, experiments indicate that when a percussion sound is raised beyond a certain level, users hardly perceive the resulting degradation of that percussion sound, so the degradation caused by using preset global volume information is not expected to be a problem.

The preset global volume information can also be used to raise the overall volume when the basic global volume value is already at its maximum.

That is, in ordinary playback of object-based audio content, once the basic global volume value is at its maximum, the volume of the individual audio objects cannot be adjusted any further. If preset global volume information is stored in the object-based audio content, however, the content can be played back at a volume higher than the maximum basic global volume value.

'preset_type' denotes the type of the preset.

According to an embodiment of the present invention, the preset type can be determined based on the kind of mixing information, the target to which the mixing information is applied, and whether the mixing information changes over the playback time of the object-based audio content. The method of determining the preset type is described in detail below.

First, the preset type can be determined based on the kind of mixing information.

For example, the mixing information may include at least one of volume information and sound equalization information. Hereinafter, a preset created from volume information only is called a volume preset, a preset created from equalization information only is called an equalization preset, and a preset created from both volume information and equalization information is called a volume/equalization preset.

Next, the preset type can be determined based on the target to which the mixing information is applied.

That is, the preset type can be determined by whether the mixing information is applied treating each input track as an audio object or treating each input channel as an audio object. Hereinafter, a preset created by treating input tracks as audio objects is called a track preset, and a preset created by treating input channels as audio objects is called a channel preset.

Finally, the preset type can be determined based on whether the mixing information changes over the playback time of the object-based audio content.

That is, the preset type can be determined by whether the mixing information keeps a constant value or changes as the object-based audio content is played back. Hereinafter, a preset whose mixing information does not change is called a static preset, and a preset whose mixing information changes is called a dynamic preset.

According to an embodiment of the present invention, when a dynamic preset is stored in the object-based audio content, 'prst' may include a table that maps an input track ID to the mixing information of that input track. In this case, the mixing information for each sample number of the input track can be derived from the mixing information stored in this table together with the 'stts' (decoding time-to-sample) box defined in the existing ISO-BMFF, which stores the relationship between decoding time and sample number. This enables random access during playback of the object-based audio content and reduces the amount of mixing information stored in the content.
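A sketch of the random-access lookup described above, in Python. It assumes the 'stts' entries are available as (sample_count, sample_delta) pairs, which is how ISO/IEC 14496-12 defines that box, with decoding times in media timescale units. The run-length layout of the per-track mapping table, a list of (sample count, value) pairs in the style of Table 18, and both function names are assumptions for illustration.

```python
def sample_for_time(stts_entries, t):
    """Return the 1-based sample number whose decoding interval contains time t."""
    if t < 0:
        raise ValueError("decoding time must be non-negative")
    sample, start = 1, 0
    for sample_count, sample_delta in stts_entries:
        span = sample_count * sample_delta
        if t < start + span:
            # Inside this run every sample lasts sample_delta ticks.
            return sample + (t - start) // sample_delta
        sample += sample_count
        start += span
    raise ValueError("time lies beyond the last sample")

def mixing_value_at(stts_entries, runs, t):
    """Look up the mixing value for time t in (sample_count, value) runs."""
    n = sample_for_time(stts_entries, t)
    first = 1
    for count, value in runs:
        if n < first + count:
            return value
        first += count
    raise ValueError("sample not covered by the mapping table")
```

A seek to an arbitrary time therefore needs only the 'stts' entries and the compact table, not a mixing value stored for every sample.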

When presets are generated using the information described above, the preset types can be classified as shown in Table 5. Although Table 5 lists 12 preset types, the classification can be extended further with additional classification criteria.

preset_type  static (S)/dynamic (D)  track (T)/channel (C)  volume (Vol)  equalization (Eq)  meaning
0            S                       T                      Vol           -                  static track volume preset
1            S                       T                      Vol           Eq                 static track volume preset with equalization
2            S                       T                      -             Eq                 static track equalization preset
3            D                       T                      Vol           -                  dynamic track volume preset
4            D                       T                      Vol           Eq                 dynamic track volume preset with equalization
5            D                       T                      -             Eq                 dynamic track equalization preset
6            S                       C                      Vol           -                  static object volume preset
7            S                       C                      Vol           Eq                 static object volume preset with equalization
8            S                       C                      -             Eq                 static object equalization preset
9            D                       C                      Vol           -                  dynamic object volume preset
10           D                       C                      Vol           Eq                 dynamic object volume preset with equalization
11           D                       C                      -             Eq                 dynamic object equalization preset

Table 5 shows that the mixing information consists of volume information and equalization information, which are stored in 'prst' in different forms depending on the preset type. The storage form of the mixing information is broadly divided by whether the preset is a static preset or a dynamic preset.
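Table 5 follows a regular pattern: values 0 to 5 are track presets and 6 to 11 channel presets, each half splits into three static and three dynamic types, and within each group of three the mixing information is volume, volume with equalization, then equalization. The hypothetical Python helper below decodes a preset_type by that arithmetic. It is valid only for the 12 values listed; if the classification is extended, a table lookup would be the safer choice.

```python
from typing import NamedTuple

class PresetType(NamedTuple):
    dynamic: bool        # S (False) / D (True)
    per_channel: bool    # T (False) / C (True)
    volume: bool         # Vol column of Table 5
    equalization: bool   # Eq column of Table 5

def decode_preset_type(preset_type: int) -> PresetType:
    """Decode preset_type 0..11 using the regular layout of Table 5."""
    if not 0 <= preset_type <= 11:
        raise ValueError("preset_type outside the 12 values of Table 5")
    mix = preset_type % 3  # 0: Vol, 1: Vol + Eq, 2: Eq
    return PresetType(
        dynamic=(preset_type // 3) % 2 == 1,
        per_channel=preset_type >= 6,
        volume=mix in (0, 1),
        equalization=mix in (1, 2),
    )
```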

1. If the preset type is a static preset

If the preset type is a static preset, the mixing information is the same in every frame of the object-based audio content, so a single set of mixing information is stored for each audio object. The storage form of the mixing information is further divided by whether the preset is a track preset or a channel preset.

1.1. If the preset type is a static/track preset ('preset_type' is 0, 1, or 2)

When the mixing information is stored per track, the output channel type may be determined by the input track that has the most channels. For example, if the first input track has two channels and the second input track has one channel, the first input track has more channels, so the output channel type may be set to stereo.
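The selection rule can be sketched in Python as follows. The mapping from channel count to layout name covers only the mono, stereo and 5-channel cases named in this description; the function and table names are hypothetical, and a widest track of any other channel count is rejected rather than guessed.

```python
# Output layouts named in this description, keyed by channel count.
LAYOUT_BY_CHANNEL_COUNT = {1: "mono", 2: "stereo", 5: "5 channel"}

def output_layout_for_tracks(channels_per_track):
    """Choose the output channel type from the input track with the most channels."""
    if not channels_per_track:
        raise ValueError("at least one input track is required")
    widest = max(channels_per_track)
    if widest not in LAYOUT_BY_CHANNEL_COUNT:
        raise ValueError(f"no output channel type for {widest} channels")
    return LAYOUT_BY_CHANNEL_COUNT[widest]
```

The example in the text corresponds to `output_layout_for_tracks([2, 1])`, which returns "stereo".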

In this case, the syntax of the preset in 'prst' may be as shown in Tables 6 to 8.

if (preset_type == 0) {  // static track volume preset
    for (i = 0; i < num_preset_track; i++) {
        unsigned int(8) preset_volume;
    }
}

if (preset_type == 1) {  // static track volume preset with equalization
    for (i = 0; i < num_preset_track; i++) {
        unsigned int(8) preset_volume;
        unsigned int(8) num_freq_band;
        for (j = 0; j < num_freq_band; j++) {
            unsigned int(16) center_freq;
            unsigned int(16) bandwidth;
            unsigned int(8) preset_freq_gain;
        }
    }
}

if (preset_type == 2) {  // static track equalization preset
    for (i = 0; i < num_preset_track; i++) {
        unsigned int(8) num_freq_band;
        for (j = 0; j < num_freq_band; j++) {
            unsigned int(16) center_freq;
            unsigned int(16) bandwidth;
            unsigned int(8) preset_freq_gain;
        }
    }
}

The semantics of the syntax in Tables 6 to 8 are as follows.

'preset_volume' denotes volume information.

The volume information may include a volume gain value between the input volume value of the input track and the output volume value of the output track. The volume gain value may be expressed as a percentage or in decibels (dB).

Volume gain values expressed as a percentage or in decibels may also be quantized before being stored. In this case, the quantized volume gain values can be expressed as shown in Tables 9 and 10.

index          0     1     2     3     …     199   200
value (ratio)  0     0.02  0.04  0.06  …     3.98  4.00

index       0    1    2    3    4    5   6   7   8   9   10  11  12  13
value (dB)  -25  -21  -18  -15  -12  -8  -5  -3  -1  0   1   2   3   4
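A Python sketch of dequantizing the stored indices. It assumes the ratio table is linear in steps of 0.02 from index 0 to index 200, as its listed entries indicate; that Table 10 covers only the 14 indices listed here; and that decibel values are amplitude gains, converted with the 20·log10 convention. All names are hypothetical.

```python
# Tables 9 and 11: index 0..200 maps linearly to a ratio of 0.00..4.00.
def ratio_from_index(index):
    if not 0 <= index <= 200:
        raise ValueError("ratio index must be in 0..200")
    return index / 50  # step of 1/50 == 0.02

# Table 10: only the indices listed in this description are covered.
DB_BY_INDEX = (-25, -21, -18, -15, -12, -8, -5, -3, -1, 0, 1, 2, 3, 4)

def db_from_index(index):
    if not 0 <= index < len(DB_BY_INDEX):
        raise ValueError("dB index not covered by Table 10")
    return DB_BY_INDEX[index]

def amplitude_from_db(db):
    """Linear amplitude factor for a gain in dB (assumes 20*log10 convention)."""
    return 10 ** (db / 20)
```

Dividing by 50 rather than multiplying by 0.02 keeps results such as index 3 equal to the float literal 0.06.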

'num_freq_band' denotes the number of frequency bands to which sound equalization is applied; it is an integer from 0 to 32 inclusive.

'center_freq' denotes the center frequency of each frequency band; it is an integer from 0 to 20,000 (unit: Hz).

'bandwidth' denotes the bandwidth of each frequency band; it is an integer from 0 to 20,000 (unit: Hz).

'preset_freq_gain' denotes the frequency gain value in each frequency band.

Like the volume gain value, the frequency gain value can be expressed as a percentage or in decibels (dB), and can be quantized before being stored. In this case, the quantized frequency gain values can be expressed as shown in Table 11.

index  0     1     2     3     …     199   200
gain   0     0.02  0.04  0.06  …     3.98  4.00
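A hypothetical Python reader for the type-specific part of a static track preset (Tables 6 to 8). It assumes big-endian fields with no padding, leaves values as stored (quantized indices), and enforces the value ranges given above for num_freq_band, center_freq and bandwidth.

```python
import struct

def parse_static_track_preset(payload, preset_type, num_preset_track):
    """Parse preset_type 0, 1 or 2; return (per-track entries, bytes consumed)."""
    if preset_type not in (0, 1, 2):
        raise ValueError("only static track presets (0, 1, 2) are handled")
    has_volume = preset_type in (0, 1)
    has_eq = preset_type in (1, 2)
    offset, tracks = 0, []
    for _ in range(num_preset_track):
        entry = {}
        if has_volume:
            entry["preset_volume"] = payload[offset]
            offset += 1
        if has_eq:
            num_freq_band = payload[offset]
            offset += 1
            if num_freq_band > 32:
                raise ValueError("num_freq_band must be 0..32")
            bands = []
            for _ in range(num_freq_band):
                # center_freq (16), bandwidth (16), preset_freq_gain (8)
                center, bandwidth, gain = struct.unpack_from(">HHB", payload, offset)
                offset += 5
                if center > 20000 or bandwidth > 20000:
                    raise ValueError("center_freq and bandwidth must be 0..20000 Hz")
                bands.append((center, bandwidth, gain))
            entry["bands"] = bands
        tracks.append(entry)
    return tracks, offset
```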

1.2. If the preset type is a static/channel preset ('preset_type' is 6, 7, or 8)

When the mixing information is stored per channel, it may be stored taking into account the number of input tracks, the number of channels per input track, and the output channel type. In this case, the syntax of the preset in 'prst' may be as shown in Tables 12 to 14.

if (preset_type == 6) {  // static object volume preset
    unsigned int(8) num_input_channel[num_preset_track];
    unsigned int(8) output_channel_type;  // num_output_channel follows from Table 15
    for (i = 0; i < num_preset_track; i++) {
        for (j = 0; j < num_input_channel[i]; j++) {
            for (k = 0; k < num_output_channel; k++) {
                unsigned int(8) preset_volume;
            }
        }
    }
}

if (preset_type == 7) {  // static object volume preset with equalization
    unsigned int(8) num_input_channel[num_preset_track];
    unsigned int(8) output_channel_type;  // num_output_channel follows from Table 15
    for (i = 0; i < num_preset_track; i++) {
        for (j = 0; j < num_input_channel[i]; j++) {
            for (k = 0; k < num_output_channel; k++) {
                unsigned int(8) preset_volume;
                unsigned int(8) num_freq_band;
                for (m = 0; m < num_freq_band; m++) {
                    unsigned int(16) center_freq;
                    unsigned int(16) bandwidth;
                    unsigned int(8) preset_freq_gain;
                }
            }
        }
    }
}

if (preset_type == 8) {  // static object equalization preset
    unsigned int(8) num_input_channel[num_preset_track];
    unsigned int(8) output_channel_type;  // num_output_channel follows from Table 15
    for (i = 0; i < num_preset_track; i++) {
        for (j = 0; j < num_input_channel[i]; j++) {
            for (k = 0; k < num_output_channel; k++) {
                unsigned int(8) num_freq_band;
                for (m = 0; m < num_freq_band; m++) {
                    unsigned int(16) center_freq;
                    unsigned int(16) bandwidth;
                    unsigned int(8) preset_freq_gain;
                }
            }
        }
    }
}

The semantics of the syntax in Tables 12 to 14 are as follows.

'num_input_channel[num_preset_track]' denotes an array that stores the number of channels of each input track.

For example, 'num_input_channel[num_preset_track]' can be built from the 'channel_count' information in 'moov'/'trak'/'mdia'/'minf'/'stbl'/'stsd'. An entry is '1' if the input track is mono, '2' if it is stereo, and '5' if it has five channels.

'output_channel_type' denotes the output channel type, and 'num_output_channel' denotes the number of output channels. For example, 'output_channel_type' and 'num_output_channel' may be related as shown in Table 15.

output_channel_type  meaning         num_output_channel
0                    mono channel    1
1                    stereo channel  2
2                    5 channel       5

According to an embodiment of the present invention, when the preset type is a static object volume preset ('preset_type' 6) and there are five output channels, the mixing information stored in 'prst' can be expressed as shown in Table 16.

                        preset_track_ID = 1    preset_track_ID = 7
output channel volume   L        R             M
L                       50       0             50
R                       0        80            50
C                       50       80            0
Ls                      0        0             30
Rs                      0        0             30

In this case, the parameters stored in 'prst' are as follows.

num_preset_track = 2

preset_track_ID[2] = [1, 7]

num_input_channel[2] = [2, 1]

num_output_channel = 5

preset_volume = [50, 0, 50, 0, 0, 0, 80, 80, 0, 0, 50, 50, 0, 30, 30]
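The loop order of Table 12 (track i, then input channel j, then output channel k) fixes how the columns of Table 16 are flattened into 'preset_volume'. The Python sketch below, with hypothetical function names, flattens a per-track gain matrix in that order and rebuilds it from num_input_channel and the Table 15 mapping; the test reproduces the parameters above.

```python
# Table 15: output_channel_type -> num_output_channel
NUM_OUTPUT_CHANNEL = {0: 1, 1: 2, 2: 5}

def flatten_channel_volumes(matrices):
    """matrices[i][j][k]: gain from input channel j of track i to output channel k."""
    return [v for track in matrices for in_channel in track for v in in_channel]

def unflatten_channel_volumes(preset_volume, num_input_channel, output_channel_type):
    """Rebuild matrices[i][j][k] from the flat preset_volume list (Table 12 order)."""
    n_out = NUM_OUTPUT_CHANNEL[output_channel_type]
    expected = sum(num_input_channel) * n_out
    if len(preset_volume) != expected:
        raise ValueError(f"expected {expected} preset_volume values, got {len(preset_volume)}")
    values = iter(preset_volume)
    return [[[next(values) for _ in range(n_out)] for _ in range(n_in)]
            for n_in in num_input_channel]
```

Each inner list is one column of Table 16 read top to bottom (L, R, C, Ls, Rs).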

Looking at 'preset_volume', some mixing values are stored redundantly. This unnecessarily increases the amount of stored information, so a way to reduce the amount of information stored in 'prst' is needed. This is described in more detail in sections 2-B, 2-C, and 2-D below.

2. If the preset type is a dynamic preset

If the preset type is a dynamic preset, the mixing information changes across the frames of the object-based audio content, so different mixing information may be stored for different frames.

Accordingly, the mixing information can be expressed as a matrix indexed by frame number (or sample number), and this matrix can in turn be expressed as a table that maps each frame of an input track to its mixing information.

The following describes in detail how to store the mixing information when the changing mixing information is given as a mapping table such as Table 17.

sampling number   input track: preset_track_ID = 1   preset_track_ID = 3
1                 50                                 20
2                 50                                 20
…                 …                                  …
9                 50                                 20
10                50                                 20
11                50                                 10
12                50                                 10
…                 …                                  …
19                50                                 10
20                50                                 10
21                70                                 60
22                70                                 60
…                 …                                  …
29                70                                 60
30                70                                 60

A. Storing the mixing information value for each frame number as is

B. Storing the mixing information values as a reference value and the differences from that reference value

The reference value is the reference mixing information value in a reference frame. The reference mixing information value in the reference frame, and the differences between the mixing information in the other frames and that reference value, can then be stored in 'prst'.
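The arithmetic of method B can be sketched in Python as below. The description does not fix a byte layout or how the reference frame is chosen, so the reference frame index is a parameter here and the function names are hypothetical.

```python
def encode_with_reference(values, ref_index=0):
    """Return the reference value and the differences of every other frame from it."""
    ref = values[ref_index]
    diffs = [v - ref for i, v in enumerate(values) if i != ref_index]
    return ref, diffs

def decode_with_reference(ref, diffs, ref_index=0):
    """Invert encode_with_reference: add the reference back, re-insert its frame."""
    values = [ref + d for d in diffs]
    values.insert(ref_index, ref)
    return values
```

For the preset_track_ID = 1 column of Table 17 with the first frame as reference, the reference is 50 and the differences are 0 for samples 2 to 20 and 20 for samples 21 to 30.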

If the reference value is 0, Table 17 can be expressed compactly as Table 18.

sampling count   input track: preset_track_ID = 1
20               50
10               70

sampling count   input track: preset_track_ID = 3
10               20
10               10
10               60

Storing the mixing information in 'prst' as a table like Table 18 therefore reduces the amount of stored information.
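The step from Table 17 to Table 18 is a run-length encoding of each per-sample column into (sampling count, value) pairs. A Python sketch with hypothetical function names, using `itertools.groupby` to collapse consecutive equal values:

```python
from itertools import groupby

def to_sample_count_table(per_sample_values):
    """Collapse a per-sample column (Table 17) into (sampling count, value) runs (Table 18)."""
    return [(len(list(group)), value) for value, group in groupby(per_sample_values)]

def from_sample_count_table(runs):
    """Expand (sampling count, value) runs back into one value per sample."""
    return [value for count, value in runs for _ in range(count)]
```

Applied to the two columns of Table 17, this yields [(20, 50), (10, 70)] for preset_track_ID = 1 and [(10, 20), (10, 10), (10, 60)] for preset_track_ID = 3, matching Table 18.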

C. Storing the mixing information using flag information that indicates redundancy

In this scheme, when the mixing information value of the current frame equals that of the previous frame, the value itself is not stored. Instead, flag information indicating that the two values are identical is stored, which reduces the amount of information stored in 'prst'.

Even when the mixing information value changes over time, however, it is unlikely to change at every frame. Assigning a flag to every frame is therefore not efficient.

Therefore, according to the object-based audio content generation method of an embodiment of the present invention, the mixing information values and the flag information may be stored based on information about the frame interval at which the mixing information changes.

For example, when the mixing information changes as shown in Table 17, the mixing information (i.e., the volume information) can be regarded as changing in units of 10 frames. Table 17 can therefore be expressed compactly as shown in Table 19.

preset_volume | 50 | 50 | 70 | 20 | 10 | 60
volume_flag | 0 | 1 | 0 | 0 | 0 | 0
modified preset_volume | 50 | _ | 70 | 20 | 10 | 60

Accordingly, the parameters stored in 'prst' are related as follows.

dynamic_interval = 10

volume_flag = [0, 1, 0, 0, 0, 0]

preset_volume = [50, 70, 20, 10, 60]

Here, 'dynamic_interval' denotes the frame interval and 'volume_flag' denotes the volume flag information. If the mixing information of the previous frame and that of the current frame are the same, 'volume_flag' has the value '1'. If they differ, 'volume_flag' has the value '0'.

In other words, the plurality of frames in the object-based audio content is divided into frame groups according to a specific frame interval, and the mixing information is stored per frame group.

That is, according to an embodiment of the present invention, suppose the first group mixing information for a first frame group differs from the second group mixing information for a second frame group. The preset parameters stored in 'prst' then include the following:
- the first group mixing information;
- the second group mixing information;
- first flag information indicating that the first and second group mixing information differ;
- the number of frames in each of the frame groups (i.e., the frame interval).

Conversely, when the first and second group mixing information are identical, the preset parameters stored in 'prst' include the following:
- the first group mixing information;
- second flag information indicating that the first and second group mixing information are identical;
- the number of frames in each of the frame groups.
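The Python sketch below shows one way a player might undo scheme C. Flag '1' repeats the previous frame group's value, and flag '0' consumes the next stored 'preset_volume'. Each group is then expanded to 'dynamic_interval' frames.

The sketch assumes, as Table 19 suggests, that the groups of preset track 1 are followed by those of preset track 3, with three groups per track. The helper names are hypothetical.

```python
from typing import List


def expand_flags(volume_flag: List[int], preset_volume: List[int]) -> List[int]:
    """Recover one mixing value per frame group from flags plus stored values."""
    groups: List[int] = []
    pos = 0
    for flag in volume_flag:
        if flag == 1:
            if not groups:
                raise ValueError("the first flag must be 0")
            groups.append(groups[-1])  # same value as the previous group
        else:
            if pos >= len(preset_volume):
                raise ValueError("not enough preset_volume values")
            groups.append(preset_volume[pos])
            pos += 1
    if pos != len(preset_volume):
        raise ValueError("unused preset_volume values")
    return groups


def groups_to_frames(groups: List[int], dynamic_interval: int) -> List[int]:
    """Repeat every group value for dynamic_interval frames."""
    return [value for value in groups for _ in range(dynamic_interval)]


groups = expand_flags([0, 1, 0, 0, 0, 0], [50, 70, 20, 10, 60])
print(groups)  # [50, 50, 70, 20, 10, 60]
track_1 = groups_to_frames(groups[0:3], dynamic_interval=10)
print(track_1[9], track_1[10], track_1[20])  # 50 50 70
```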

D. Storing the mixing information using the number of times it changes and the frame numbers at which it changes

This scheme stores three things: the number of times the mixing information changes, the frame numbers of the frames at which it changes, and the corresponding mixing information. In terms of random access, it is therefore more efficient than scheme C described above. The mixing information for any frame can be located directly from the stored frame numbers, without reading the flags of all preceding frames.

For example, when the mixing information changes as shown in Table 17, 'prst' stores the number of changes, the frame numbers at which the mixing information changes, and the mixing information (i.e., volume information) as follows.

num_updates = 3

updated_sample_number = [1, 11, 21]

preset_volume = [50, 20, 50, 10, 70, 60]

Here, 'num_updates' is the number of times the mixing information changes (is updated), and 'updated_sample_number' is the frame number at which the mixing information changes (is updated).
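The random-access benefit of scheme D comes from being able to binary-search 'updated_sample_number' for the last update at or before a requested frame. The Python sketch below makes two assumptions:
- the frame numbers are sorted in ascending order;
- 'preset_volume' holds one value per preset track per update, following the loop order of Table 22 (two tracks in the example).

'volumes_at' is a hypothetical helper.

```python
from bisect import bisect_right
from typing import List


def volumes_at(frame: int, updated_sample_number: List[int],
               preset_volume: List[int], num_preset_track: int) -> List[int]:
    """Return the preset_volume of every preset track in effect at `frame`."""
    if not updated_sample_number or frame < updated_sample_number[0]:
        raise ValueError("frame precedes the first update")
    # Index of the last update whose frame number is <= frame.
    update = bisect_right(updated_sample_number, frame) - 1
    start = update * num_preset_track
    return preset_volume[start:start + num_preset_track]


updated_sample_number = [1, 11, 21]
preset_volume = [50, 20, 50, 10, 70, 60]
print(volumes_at(15, updated_sample_number, preset_volume, num_preset_track=2))  # [50, 10]
```

The lookup costs O(log num_updates) per seek. Scheme C, by contrast, has to replay the flags from the first frame.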

The schemes above store mixing parameters efficiently when the mixing information changes with playback time. They also apply when the preset type is a static preset and the stored mixing information contains duplicate values.

For example, suppose the mixing information stored in 'prst' is as shown in Table 16 and is stored using the flag-based scheme C described above. Table 16 can then be transformed as shown in Table 20.

preset_volume | 50 | 0 | 50 | 0 | 0 | 0 | 80 | 80 | 0 | 0 | 50 | 50 | 0 | 30 | 30
volume_flag | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1
modified preset_volume | 50 | 0 | 50 | 0 | _ | _ | 80 | _ | 0 | _ | 50 | _ | 0 | 30 | _

volume_flag = [0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1]

preset_volume = [50, 0, 50, 0, 80, 0, 50, 0, 30]
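The flag encoding used for Table 20 takes a few lines of Python. An entry equal to the one before it gets 'volume_flag' = 1 and is dropped. Every other entry gets 0 and is kept. The input below is the preset_volume row of Table 20, and 'encode_flags' is a hypothetical name.

```python
from typing import List, Tuple


def encode_flags(values: List[int]) -> Tuple[List[int], List[int]]:
    """Split a value sequence into volume_flag and the values that must be stored."""
    volume_flag: List[int] = []
    stored: List[int] = []
    for i, value in enumerate(values):
        if i > 0 and value == values[i - 1]:
            volume_flag.append(1)  # repeat of the previous entry, nothing stored
        else:
            volume_flag.append(0)
            stored.append(value)
    return volume_flag, stored


table_16 = [50, 0, 50, 0, 0, 0, 80, 80, 0, 0, 50, 50, 0, 30, 30]
flags, stored = encode_flags(table_16)
print(flags)   # [0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1]
print(stored)  # [50, 0, 50, 0, 80, 0, 50, 0, 30]
```

Only nine of the fifteen values need to be stored. Whether this shrinks the box overall depends on how many bits each flag occupies.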

In this case, the preset syntax in 'prst' shown in Table 12 can be modified as shown in Table 21.

if(preset_type == 6){ // static object volume preset
    unsigned int(8) num_input_channel[num_preset_track];
    unsigned int(8) output_channel_type;
    unsigned int(16) num_volume_flag;
    for(i=0; i<num_volume_flag; i++){
        unsigned int(8) volume_flag;
        if(volume_flag == 0){
            unsigned int(8) preset_volume;
        }
    }
}

The semantics of the syntax in Table 21 are as follows.

'volume_flag' is the volume flag information and has a 1-bit integer data type. If the mixing information of the previous frame and that of the current frame are the same, 'volume_flag' has the value '1'. If they differ, it has the value '0'.

'num_volume_flag' is the length of the 'volume_flag' array.
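To make the layout concrete, the Python below reads the Table 21 fields from a byte string. It makes three assumptions:
- It follows the field widths of the syntax table, including an 8-bit 'volume_flag', even though the semantics describe a 1-bit integer.
- It assumes big-endian byte order, as is usual for ISO-BMFF boxes.
- It takes 'num_preset_track' as an argument, because that count is defined outside this excerpt.

The example bytes are made up.

```python
import struct
from typing import Dict, List


def _u8(payload: bytes, offset: int) -> int:
    """Read one unsigned 8-bit field, failing cleanly on truncated input."""
    if offset >= len(payload):
        raise ValueError("truncated payload")
    return payload[offset]


def parse_static_object_volume(payload: bytes, num_preset_track: int) -> Dict[str, object]:
    """Parse the preset_type == 6 fields of Table 21."""
    offset = 0
    num_input_channel: List[int] = []
    for _ in range(num_preset_track):
        num_input_channel.append(_u8(payload, offset))
        offset += 1
    output_channel_type = _u8(payload, offset)
    offset += 1
    if offset + 2 > len(payload):
        raise ValueError("truncated payload")
    (num_volume_flag,) = struct.unpack_from(">H", payload, offset)  # unsigned int(16)
    offset += 2
    volume_flag: List[int] = []
    preset_volume: List[int] = []
    for _ in range(num_volume_flag):
        flag = _u8(payload, offset)
        offset += 1
        volume_flag.append(flag)
        if flag == 0:  # a new preset_volume follows only when the flag is 0
            preset_volume.append(_u8(payload, offset))
            offset += 1
    return {
        "num_input_channel": num_input_channel,
        "output_channel_type": output_channel_type,
        "volume_flag": volume_flag,
        "preset_volume": preset_volume,
    }


payload = bytes([1, 2, 0]) + struct.pack(">H", 3) + bytes([0, 50, 1, 0, 30])
print(parse_static_object_volume(payload, num_preset_track=2))
# {'num_input_channel': [1, 2], 'output_channel_type': 0, 'volume_flag': [0, 1, 0], 'preset_volume': [50, 30]}
```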

The following describes in detail embodiments in which the mixing information of a dynamic preset is stored in 'prst', based on the preset storage schemes described above.

2.1. When the preset type is a dynamic/track preset ('preset_type' value of 3, 4, or 5)

As mentioned above, when the preset type is a track preset, the output channel type need not be considered when storing the mixing information.

According to an embodiment of the present invention, the preset syntax in 'prst' may be as shown in Tables 22 to 24. The syntax in these tables stores the mixing information using scheme D described above.

if(preset_type == 3){ // dynamic track volume preset
    unsigned int(16) num_updates;
    for(i=0; i<num_updates; i++){
        unsigned int(16) updated_sample_number;
        for(j=0; j<num_preset_track; j++){
            unsigned int(8) preset_volume;
        }
    }
}

if(preset_type == 4){ // dynamic track volume preset with equalization
    unsigned int(16) num_updates;
    for(i=0; i<num_updates; i++){
        unsigned int(16) updated_sample_number;
        for(j=0; j<num_preset_track; j++){
            unsigned int(8) preset_volume;
            unsigned int(16) num_freq_band;
            for(k=0; k<num_freq_band; k++){
                unsigned int(16) center_freq;
                unsigned int(16) bandwidth;
                unsigned int(8) preset_freq_gain;
            }
        }
    }
}

if(preset_type == 5){ // dynamic track equalization preset
    unsigned int(16) num_updates;
    for(i=0; i<num_updates; i++){
        unsigned int(16) updated_sample_number;
        for(j=0; j<num_preset_track; j++){
            unsigned int(16) num_freq_band;
            for(k=0; k<num_freq_band; k++){
                unsigned int(16) center_freq;
                unsigned int(16) bandwidth;
                unsigned int(8) preset_freq_gain;
            }
        }
    }
}

The semantics of the syntax in Tables 22 to 24 are as follows.

'num_updates' is the number of times the mixing information changes (is updated).

'updated_sample_number' is the frame number at which the mixing information changes (is updated).
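A reader for Table 22 only has to walk the two loops. The Python sketch below makes two assumptions:
- fields are big-endian;
- the caller supplies 'num_preset_track', which is declared elsewhere in 'prst'.

The payload is built from the scheme D example above, with preset_volume grouped per update.

```python
import struct
from typing import List, Tuple


def parse_dynamic_track_volume(payload: bytes,
                               num_preset_track: int) -> List[Tuple[int, List[int]]]:
    """Parse preset_type == 3 (Table 22) into (updated_sample_number, volumes) pairs.

    struct.error is raised if a 16-bit field is cut off.
    """
    offset = 0
    (num_updates,) = struct.unpack_from(">H", payload, offset)
    offset += 2
    updates: List[Tuple[int, List[int]]] = []
    for _ in range(num_updates):
        (updated_sample_number,) = struct.unpack_from(">H", payload, offset)
        offset += 2
        volumes = list(payload[offset:offset + num_preset_track])  # one byte per track
        if len(volumes) != num_preset_track:
            raise ValueError("truncated payload")
        offset += num_preset_track
        updates.append((updated_sample_number, volumes))
    return updates


# The scheme D example: 3 updates for 2 preset tracks.
payload = struct.pack(">H", 3)
for sample, volumes in [(1, [50, 20]), (11, [50, 10]), (21, [70, 60])]:
    payload += struct.pack(">H", sample) + bytes(volumes)
print(parse_dynamic_track_volume(payload, num_preset_track=2))
# [(1, [50, 20]), (11, [50, 10]), (21, [70, 60])]
```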

Alternatively, when the mixing information is stored using scheme C described above, the syntax of Table 22 can be modified as shown in Table 25.

if(preset_type == 3){ // dynamic track volume preset
    unsigned int(8) dynamic_interval;
    unsigned int(32) num_volume_flag;
    for(i=0; i<num_volume_flag; i++){
        unsigned int(8) volume_flag;
        if(volume_flag == 0){
            unsigned int(8) preset_volume;
        }
    }
}

The semantics of the syntax in Table 25 are as follows.

'dynamic_interval' is the frame interval.
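The Python sketch below writes the Table 25 fields for the Table 19 example ('dynamic_interval' = 10). The field widths are those of the syntax table. Big-endian byte order and the function name are assumptions.

```python
import struct
from typing import List


def build_dynamic_interval_volume(dynamic_interval: int, volume_flag: List[int],
                                  preset_volume: List[int]) -> bytes:
    """Serialize the Table 25 fields that follow preset_type == 3."""
    # dynamic_interval: unsigned int(8); num_volume_flag: unsigned int(32)
    out = bytearray(struct.pack(">BI", dynamic_interval, len(volume_flag)))
    pos = 0
    for flag in volume_flag:
        out.append(flag)  # volume_flag: unsigned int(8)
        if flag == 0:
            if pos >= len(preset_volume):
                raise ValueError("not enough preset_volume values")
            out.append(preset_volume[pos])  # preset_volume: unsigned int(8)
            pos += 1
    if pos != len(preset_volume):
        raise ValueError("unused preset_volume values")
    return bytes(out)


data = build_dynamic_interval_volume(10, [0, 1, 0, 0, 0, 0], [50, 70, 20, 10, 60])
print(len(data))  # 16
```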

2.2. When the preset type is a dynamic/channel preset ('preset_type' value of 9, 10, or 11)

As described above, if the mixing information is stored per channel, it can be stored taking into account the number of input tracks, the number of channels per input track, and the type of the output channel.

In this case, the preset syntax in 'prst' may be as shown in Tables 26 to 28. The syntax of Tables 26 to 28 stores the mixing information using scheme D described above.

if(preset_type == 9){ // dynamic object volume preset
    unsigned int(16) num_updates;
    for(i=0; i<num_updates; i++){
        unsigned int(16) updated_sample_number;
        for(j=0; j<num_preset_track; j++){
            for(k=0; k<num_input_channel[j]; k++){
                for(m=0; m<num_output_channel; m++){
                    unsigned int(8) preset_volume;
                }
            }
        }
    }
}

if(preset_type == 10){ // dynamic object volume preset with equalization
    unsigned int(16) num_updates;
    for(i=0; i<num_updates; i++){
        unsigned int(16) updated_sample_number;
        for(j=0; j<num_preset_track; j++){
            for(k=0; k<num_input_channel[j]; k++){
                for(m=0; m<num_output_channel; m++){
                    unsigned int(8) preset_volume;
                    unsigned int(8) num_freq_band;
                    for(n=0; n<num_freq_band; n++){
                        unsigned int(16) center_freq;
                        unsigned int(16) bandwidth;
                        unsigned int(8) preset_freq_gain;
                    }
                }
            }
        }
    }
}

if(preset_type == 11){ // dynamic object equalization preset
    unsigned int(16) num_updates;
    for(i=0; i<num_updates; i++){
        unsigned int(16) updated_sample_number;
        for(j=0; j<num_preset_track; j++){
            for(k=0; k<num_input_channel[j]; k++){
                for(m=0; m<num_output_channel; m++){
                    unsigned int(8) num_freq_band;
                    for(n=0; n<num_freq_band; n++){
                        unsigned int(16) center_freq;
                        unsigned int(16) bandwidth;
                        unsigned int(8) preset_freq_gain;
                    }
                }
            }
        }
    }
}
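In the per-channel case, each update carries (total input channels × output channels) 'preset_volume' entries. The Python below parses Table 26 (preset_type 9) under two assumptions:
- 'num_input_channel' per track and 'num_output_channel' are already known. The latter presumably follows from 'output_channel_type', which this excerpt does not map.
- Fields are big-endian.

The channel counts in the example are hypothetical.

```python
import struct
from typing import List, Tuple

Matrix = List[List[int]]  # [input channel][output channel]


def payload_size(num_updates: int, num_input_channel: List[int], num_output_channel: int) -> int:
    """Byte size of the Table 26 fields: 16-bit counters plus 8-bit volumes."""
    return 2 + num_updates * (2 + sum(num_input_channel) * num_output_channel)


def parse_dynamic_object_volume(payload: bytes, num_input_channel: List[int],
                                num_output_channel: int) -> List[Tuple[int, List[Matrix]]]:
    """Parse preset_type == 9 into (updated_sample_number, per-track matrices)."""
    offset = 0
    (num_updates,) = struct.unpack_from(">H", payload, offset)
    offset += 2
    updates = []
    for _ in range(num_updates):
        (sample,) = struct.unpack_from(">H", payload, offset)
        offset += 2
        per_track: List[Matrix] = []
        for channels in num_input_channel:  # loop j over preset tracks
            matrix: Matrix = []
            for _ in range(channels):  # loop k over input channels
                row = list(payload[offset:offset + num_output_channel])  # loop m
                if len(row) != num_output_channel:
                    raise ValueError("truncated payload")
                offset += num_output_channel
                matrix.append(row)
            per_track.append(matrix)
        updates.append((sample, per_track))
    return updates


# A mono track and a stereo track rendered to 2 output channels.
payload = struct.pack(">HH", 1, 1) + bytes([100, 0, 50, 0, 0, 50])
print(parse_dynamic_object_volume(payload, [1, 2], 2))
# [(1, [[[100, 0]], [[50, 0], [0, 50]]])]
print(payload_size(1, [1, 2], 2) == len(payload))  # True
```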

The description above treats the mixing information as containing only volume information and equalization information. According to an embodiment of the present invention, however, the mixing information may further include a size value of a sound image formed by at least one input channel and an angle value of that sound image. The size value and the angle value are preset parameters that determine the virtual position of the sound image.

In this case, the angle value of the sound image may be quantized before it is stored. For example, the angle values may be represented as a table such as Table 29.

index | 0 | 1 | 2 | 3 | 4 | 5 | 6
value (°) | 0 | 5 | 10 | 15 | 20 | 25 | 30
index | 7 | 8 | 9 | 10 | 11 | 12 | 13
value (°) | 40 | 50 | 60 | 70 | 80 | 90 | 100
index | 14 | 15 | 16 | 17 | 18 | 19 | 20
value (°) | 110 | 120 | 130 | 140 | 150 | 160 | 170
index | 21 | 22 | 23 | 24 | 25 | 26 | 27
value (°) | 180 | 190 | 200 | 210 | 220 | 230 | 240
index | 28 | 29 | 30 | 31 | 32 | 33 | 34
value (°) | 250 | 260 | 270 | 280 | 290 | 300 | 310
index | 35 | 36 | 37 | 38 | 39 | 40 | 41
value (°) | 320 | 330 | 335 | 340 | 345 | 350 | 355
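Table 29 uses 5° steps from 0° to 30° and from 330° to 355°, with 10° steps in between. Its 42 indices would fit in 6 bits. The text only says that the angle is quantized, so the rounding rule is an assumption: the Python sketch below picks the nearest table entry on the circle. The function names are hypothetical.

```python
from typing import List

# Table 29: 5-degree steps at both ends of the circle, 10-degree steps elsewhere.
ANGLE_TABLE: List[int] = (
    [0, 5, 10, 15, 20, 25, 30]
    + list(range(40, 331, 10))
    + [335, 340, 345, 350, 355]
)


def circular_distance(a: float, b: float) -> float:
    """Smallest angular difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def quantize_angle(degrees: float) -> int:
    """Return the Table 29 index whose angle is closest to `degrees`."""
    degrees %= 360.0
    return min(range(len(ANGLE_TABLE)),
               key=lambda i: circular_distance(degrees, ANGLE_TABLE[i]))


def dequantize_angle(index: int) -> int:
    return ANGLE_TABLE[index]


print(len(ANGLE_TABLE))                         # 42
print(quantize_angle(33), quantize_angle(358))  # 6 0
```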

According to an embodiment of the present invention, the object-based audio content may further include a mono/stereo audio signal. This signal is a downmix of the audio signal mixed according to one of the at least one preset.

The mono/stereo audio signal is stored for compatibility with audio playback devices that cannot play object-based audio content.

When the object-based audio content includes such a signal, playback depends on the device:
- a device that can play object-based audio content plays it based on the plurality of audio objects and the at least one preset;
- a device that cannot play object-based audio content plays the mono/stereo audio signal instead.

Object-based audio content can thus be played regardless of the type of audio device.

As an example, the mono/stereo audio signal may be stored in 'mdat'. In this case, the semantics of the flags in 'moov'/'trak'/'tkhd' can be modified as shown in Table 30. In Table 30, underlined parts are semantics to be deleted and parts in bold are semantics to be added.

flags - is a 24-bit integer with flags; the following values are defined:

- Track_enabled: Indicates that the track is enabled. Flag value is 0x000001. A disabled track (the low bit is zero) is treated as if it were not present.
- Track_in_movie: Indicates that the track is used in the presentation. Flag value is 0x000002.
- Track_in_interaction_movie: Indicates that the track is used in the presentation by an interactive music player. Flag value is 0x000002.
- Track_in_non_interaction_movie: Indicates that the track is used in the presentation by a non-interactive music player. Flag value is 0x000003.
- Track_in_preview: Indicates that the track is used when previewing the presentation. Flag value is 0x000004.

Storing preset parameters in 'trak' within 'moov' using MPEG-4 BIFS (Binary Format for Scenes)

According to an embodiment of the present invention, the preset parameters may be stored, using MPEG-4 BIFS, in a track box (hereinafter 'trak') within 'moov'.

In this case, some preset parameters carry overall information about the presets, for example the number of presets and the default preset ID. Such a first preset parameter may be stored in 'prco' described above, or it may be stored using a node newly defined in BIFS.

When the first preset parameter is stored using a node newly defined in BIFS, the node interface can be represented as shown in Table 31. In Table 31, 'PresetSound' is the newly defined node.

node interface

PresetSound {
    exposedField SFNode  source            NULL
    exposedField SFInt32 numPresets        1
    exposedField SFInt32 default_preset_ID 1
}

The semantics of the node interface in Table 31 are as follows.

The 'source' field follows the semantics of subclause 7.2.2.116 of ISO/IEC 14496-11:2005.

The 'numPresets' and 'default_preset_ID' fields follow the semantics of 'prco' described above.

Among the preset parameters, those representing volume information can be stored using an appropriate combination of the AudioMix node and the WideSound node.

Among the preset parameters, those representing equalization information may be stored in one of two ways: using PROTO audioEcho of the existing AudioRXProto node, or using a node newly defined in BIFS.

When the equalization information (more precisely, the frequency gain values) is stored using a node newly defined in BIFS, the node interface can be represented as shown in Table 32. In Table 32, 'PresetAudioEqualizer' is the newly defined node.

node interface

PresetAudioEqualizer {
    eventIn      MFNode  addChildren
    eventIn      MFNode  removeChildren
    exposedField MFNode  children  []
    exposedField SFInt32 numInputs 1
    exposedField MFFloat params    []
}

The semantics of the node interface in Table 32 are as follows.

The 'children' field holds the nodes whose outputs are mixed together. Examples of such child nodes are AudioSource and AudioMix.

'addChildren' is the list of nodes to be added to the 'children' field.

'removeChildren' is the list of nodes to be removed from the 'children' field.

The 'numInputs' field is the number of input tracks.

The 'params' field is a matrix of size [numInputs × (3·numFreqBands + 1)]. Each row stores the equalization parameters (equalization information) for the frequency bands applied to one input track. This can be represented as shown in Table 33.

Data Type | Function | Default value | Range
float | numFreqBands | 2 | 0, …, 32
float[] | centerFreq | [] | 0, …, 20000
float[] | bandwidth | [] | 0, …, 20000
float[] | gain | 1 | 0.1, …, 10

Here, 'numFreqBands' is the number of frequency bands, 'centerFreq' is the center frequency of each band, 'bandwidth' is the bandwidth of each band, and 'gain' is the gain value of each band.

That is, each row of the 'params' field is structured as follows.

numFreqBands = params[0]

centerFreq[0 ... numFreqBands-1] = params[1 ... numFreqBands]

bandwidth[0 ... numFreqBands-1] = params[numFreqBands+1 ... 2·numFreqBands]

gain[0 ... numFreqBands-1] = params[2·numFreqBands+1 ... 3·numFreqBands]
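The index arithmetic above maps directly onto slices. The Python sketch below assumes that the rows of the 'params' matrix are stored back to back in the flat MFFloat array, with each row starting with its own 'numFreqBands'. The function names are hypothetical, and the example values are invented but kept within the ranges of Table 33.

```python
from typing import Dict, List, Sequence


def unpack_eq_row(row: Sequence[float]) -> Dict[str, List[float]]:
    """Split one 'params' row into centerFreq, bandwidth and gain lists."""
    n = int(row[0])  # numFreqBands = params[0]
    if len(row) < 1 + 3 * n:
        raise ValueError("row shorter than 3 * numFreqBands + 1")
    return {
        "centerFreq": list(row[1:n + 1]),          # params[1 ... n]
        "bandwidth": list(row[n + 1:2 * n + 1]),   # params[n+1 ... 2n]
        "gain": list(row[2 * n + 1:3 * n + 1]),    # params[2n+1 ... 3n]
    }


def split_params(params: Sequence[float], num_inputs: int) -> List[Dict[str, List[float]]]:
    """Walk a flat MFFloat array row by row; each row announces its own band count."""
    rows = []
    pos = 0
    for _ in range(num_inputs):
        if pos >= len(params):
            raise ValueError("params ended early")
        end = pos + 1 + 3 * int(params[pos])
        if end > len(params):
            raise ValueError("params ended early")
        rows.append(unpack_eq_row(params[pos:end]))
        pos = end
    return rows


params = [2, 100.0, 1000.0, 50.0, 200.0, 1.0, 2.0,  # input track 1: two bands
          1, 4000.0, 500.0, 0.5]                     # input track 2: one band
for row in split_params(params, num_inputs=2):
    print(row)
```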

Storing preset parameters in 'xml' within 'meta' using MPEG-4 LASeR (Lightweight Application Scene Representation)

According to an embodiment of the present invention, the preset parameters may be stored, using MPEG-4 LASeR, in an XML box (hereinafter 'xml') within 'meta'.

In this case, the preset parameters can be stored by newly defining the elements and attributes shown in Table 34.

i. presetContainer element

semantics

The presetContainer element stores the same information as 'prco' described above.

attribute

'numPreset' is the number of presets.
'defaultPresetID' is the default preset ID.

ii. preset element

semantics

The preset element stores the same information as 'prst' described above. Preset elements exist as children of the presetContainer element.

attribute

The syntax and semantics of the ISO-BMFF 'prst' box described above are used as attributes.
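Below is a minimal Python sketch of the Table 34 elements, using the standard library's ElementTree. Only 'numPreset' and 'defaultPresetID' come from the table. The 'preset_ID' attribute and the chosen preset types are hypothetical stand-ins for 'prst' fields, and no namespace or schema is applied.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_preset_container(default_preset_id: int, presets: List[Dict[str, object]]) -> str:
    """Serialize a presetContainer element with one preset child per preset."""
    container = ET.Element("presetContainer", {
        "numPreset": str(len(presets)),
        "defaultPresetID": str(default_preset_id),
    })
    for attrs in presets:
        # 'prst' fields become attributes of the preset element.
        ET.SubElement(container, "preset", {k: str(v) for k, v in attrs.items()})
    return ET.tostring(container, encoding="unicode")


xml_text = build_preset_container(1, [
    {"preset_ID": 1, "preset_type": 3},  # hypothetical attribute values
    {"preset_ID": 2, "preset_type": 6},
])
print(xml_text)
```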

Other

According to an embodiment of the present invention, preset information may already be described in a file composed of a plurality of audio objects. In that case, the object-based audio content format may reference that information. Alternatively, the preset information may be converted to fit the object-based audio content format and stored as preset parameters in that format.

Also, according to an embodiment of the present invention, preset information may be described in a file written in a scene description language such as BIFS or LASeR. In that case, the object-based audio content format may reference that information. Alternatively, the preset information may be converted to fit the object-based audio content format schema and stored as preset parameters in that format.

Also, according to an embodiment of the present invention, when preset information is obtained from a file consisting only of presets, the object-based audio content format may reference it. In addition, the preset information stored in such a preset-only file may be stored in the object-based audio content format.

As mentioned above, description information (or description metadata) is additionally stored in the object-based audio content. The stored description information can be used for searching and filtering object-based audio content. A method of storing the description information is described below with reference to FIGS. 7 and 8.

FIGS. 7 and 8 show the structure of a file format for storing object-based audio content that includes description information, according to an embodiment of the present invention.

In the ISO-based object-based audio content file format, the description information may consist of three kinds of metadata:
- metadata describing an album (hereinafter 'album level metadata');
- metadata describing a song (hereinafter 'song level metadata');
- metadata describing a track (hereinafter 'track level metadata').

These metadata are summarized in Table 35.

Description | album level | song level | track level
title | o | o | o
singer | o | o | -
composer | - | o | -
lyricist | - | o | -
performing musician | - | - | o
genre | o | o | -
file date | o | o | o
CD track number of the song | - | o | -
production | o | o | -
publisher | o | o | -
copyright information | o | o | -
ISRC (International Standard Recording Code) | - | o | -
image | o | o | -
URL (site address related to the music and the artist, e.g. album homepage, fan cafe, music video) | o | o | -

상기의 메타데이터는 "노래(song) 및 트랙을 표현하기 위한 메타데이터"와 "앨범을 표현하기 위한 메타데이터"의 2가지 타입으로 분류될 수 있다. 여기서, "앨범을 표현하기 위한 메타데이터"는 객체기반 오디오 컨텐츠 내에 저장된 노래(song) 중에서 같은 앨범 내에 수록되어 있는 노래(song)들에 대한 공통되는 정보들을 표현한다. The above metadata can be classified into two types: " metadata for expressing a song and a track "and" metadata for representing an album ". Here, the "metadata for representing an album" represents common information about songs stored in the same album among songs stored in the object-based audio content.

Album level metadata may be stored in the file-level 'meta' box (at the top level of the file, after 'ftyp'). Song level metadata may be stored in 'moov'/'meta', and track level metadata in 'moov'/'trak'/'meta'. This is summarized in Table 36.

Table 36
Metadata | Location
track level | trak/meta box
song level | moov/meta box
album level | meta box of the file

FIGS. 7 and 8 show the forms of the ISO-based object-based audio content file format in which this metadata is stored. FIG. 7 shows a single type file structure, and FIG. 8 shows a multiple type file structure.

Here, the metadata may be handled as 'mp7t' (MPEG-7 type) metadata.
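The following Python sketch makes the box locations in Table 36 concrete. It writes a toy file in which album, song and track level metadata are placed in the file-level 'meta', 'moov'/'meta' and 'moov'/'trak'/'meta' boxes. Each 'meta' box carries an 'mp7t' handler and an 'xml ' box. The sketch then walks the box tree and lists the box paths. This is a minimal illustration under stated assumptions, not a conforming writer:

- The MPEG-7 documents are placeholders.
- The 'trak' box is left without its mandatory 'tkhd' and 'mdia' children.
- The helper names are hypothetical.
- Box sizes of 0 (to end of file) and 1 (64-bit) are not handled.

```python
import struct

def box(box_type: str, payload: bytes = b"") -> bytes:
    """Basic ISO BMFF box: 32-bit big-endian size, four-character type, payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload

def full_box(box_type: str, payload: bytes = b"", version: int = 0, flags: int = 0) -> bytes:
    """FullBox: a box whose payload starts with an 8-bit version and 24-bit flags."""
    return box(box_type, struct.pack(">I", (version << 24) | flags) + payload)

def hdlr(handler_type: str) -> bytes:
    """'hdlr' FullBox: pre_defined, handler_type, three reserved words, empty name."""
    body = struct.pack(">I4s3I", 0, handler_type.encode("ascii"), 0, 0, 0) + b"\x00"
    return full_box("hdlr", body)

def mpeg7_meta(xml_text: str) -> bytes:
    """'meta' FullBox holding an 'mp7t' handler and an 'xml ' box with MPEG-7 text."""
    xml_box = full_box("xml ", xml_text.encode("utf-8") + b"\x00")
    return full_box("meta", hdlr("mp7t") + xml_box)

CONTAINERS = {"moov", "trak", "meta"}

def box_paths(data: bytes, prefix: str = ""):
    """Yield the slash-separated path of every box, descending into container boxes."""
    offset = 0
    while offset + 8 <= len(data):
        size, raw_type = struct.unpack_from(">I4s", data, offset)
        if size < 8:  # size 0 (to end of file) and 1 (64-bit) are not handled here
            break
        box_type = raw_type.decode("ascii")
        path = f"{prefix}/{box_type}" if prefix else box_type
        yield path
        if box_type in CONTAINERS:
            # 'meta' is a FullBox, so skip its 4-byte version/flags before the children
            start = offset + 8 + (4 if box_type == "meta" else 0)
            yield from box_paths(data[start:offset + size], path)
        offset += size

album_xml = song_xml = track_xml = "<Mpeg7/>"  # placeholder MPEG-7 documents
file_bytes = (
    box("ftyp", struct.pack(">4sI4s", b"isom", 0, b"isom"))
    + mpeg7_meta(album_xml)                      # album level: file-level 'meta'
    + box("moov", mpeg7_meta(song_xml)           # song level: 'moov'/'meta'
          + box("trak", mpeg7_meta(track_xml)))  # track level: 'moov'/'trak'/'meta'
)
for path in box_paths(file_bytes):
    print(path)
```

The walker treats only 'moov', 'trak' and 'meta' as containers, which is enough to show where each metadata level ends up.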

More specifically, the MPEG-7 'CreationInformation DS', 'MediaInformation DS' and 'Semantics DS' may be used for track level metadata and song level metadata. For album level metadata, the MPEG-7 'ContentCollection DS' and 'CreationInformation DS' may be used, because album level metadata includes structural information about the plurality of songs contained in one album.

These are summarized in Tables 37 to 39.

Table 37 (track level metadata)
Tag name | Semantics
CreationInformation/Creation/Creator[@type="Instrument"] | The title of the track
CreationInformation/Creation/Creator[Role/@href="urn:mpeg:mpeg7:RoleCS:2001:PERFORMER"]/Agent[@xsi:type="PersonType"]/Name/{FamilyName, GivenName} (artist name), or CreationInformation/Creation/Creator[Role/@href="urn:mpeg:mpeg7:RoleCS:2001:PERFORMER"]/Agent[@xsi:type="PersonGroupType"]/Name (group name) | The name of a musician who performs a part such as vocals, guitar or keyboard
CreationInformation/CreationCoordinates/Date/TimePoint | Time point of the recording

Table 38 (song level metadata)
Tag name | Semantics
CreationInformation/Creation/Title[@type="songTitle"] | The title of the song
CreationInformation/Creation/Creator[Role/@href="urn:mpeg:mpeg7:RoleCS:2001:PERFORMER"]/Agent[@xsi:type="PersonType"]/Name/{FamilyName, GivenName} (artist name), or CreationInformation/Creation/Creator[Role/@href="urn:mpeg:mpeg7:RoleCS:2001:PERFORMER"]/Agent[@xsi:type="PersonGroupType"]/Name (group name) | The name of a musician such as the singer, composer or lyricist
CreationInformation/Classification/Genre[@href="urn:id3:v1:genreID"] | Genre
CreationInformation/CreationCoordinates/Date/TimePoint | Time point when the song was released
Semantics/SemanticBase[@xsi:type="SemanticStateType"]/AttributeValuePair | CD track number of the song
CreationInformation/Creation/Abstract/FreeTextAnnotation | Information on production, the publisher, and site addresses related to the music and the artist (e.g. album homepage, fan cafe and music video)
CreationInformation/Creation/CopyrightString | Textual label indicating information that may be displayed or otherwise made known to the end user
MediaInformation/MediaIdentification/EntityIdentifier | ISRC
CreationInformation/Creation/TitleMedia[@type="TitleImage"] |

Table 39 (album level metadata)
Tag name | Semantics
ContentCollection/CreationInformation/Creation/Title[@type="albumTitle"] | The title of the album
ContentCollection/CreationInformation/Creation/Creator[Role/@href="urn:mpeg:mpeg7:RoleCS:2001:PERFORMER"]/Agent[@xsi:type="PersonType"]/Name/{FamilyName, GivenName} (artist name), or CreationInformation/Creation/Creator[Role/@href="urn:mpeg:mpeg7:RoleCS:2001:PERFORMER"]/Agent[@xsi:type="PersonGroupType"]/Name (group name) | The name of the representative musician of the album
ContentCollection/CreationInformation/Classification/Genre[@href="urn:id3:v1:genreID"] | Genre
ContentCollection/CreationInformation/CreationCoordinates/Date/TimePoint | Time point when the album was released
ContentCollection/CreationInformation/Creation/Abstract/FreeTextAnnotation | Information on production, the publisher, and site addresses related to the music and the artist (e.g. album homepage, fan cafe and music video)
ContentCollection/CreationInformation/Creation/CopyrightString | Textual label indicating information that may be displayed or otherwise made known to the end user
ContentCollection/CreationInformation/Creation/TitleMedia[@type="TitleImage"] | The title of the multimedia content in image form

In addition, the object-based audio content may include information related to the audio content, such as the lyrics of a song. If this information is displayed on the audio content reproducing apparatus while the object-based audio content is played, the object-based audio service can be provided to the user more effectively. This information may change with the playback time of the object-based audio content. Hereinafter, audio-content-related information that changes with playback time is referred to as 'Timed Text'.

The object-based audio content file format can provide Timed Text by using a Timed Text standard such as 3GPP TS 26.245 (hereinafter '3GPP Timed Text') or the MPEG-4 Streaming Text Format.

For example, when Timed Text is provided using 3GPP Timed Text, it may consist of text samples and a sample description.

A text sample may consist of a text string and sample modifiers. The sample modifiers contain information on how the text string is to be rendered.
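The text sample layout can be illustrated as follows. This Python sketch assumes the 3GPP TS 26.245 text sample structure:

- a 16-bit byte length,
- the UTF-8 text string,
- zero or more modifier boxes.

As an example, it attaches a single 'hlit' (highlight) modifier with start offset 0 and end offset 3. The function names are hypothetical. The length prefix counts bytes, not characters, which matters for Korean lyrics.

```python
import struct

def encode_text_sample(text: str, modifier_boxes: bytes = b"") -> bytes:
    """Text sample: 16-bit byte length, UTF-8 text string, then sample modifier boxes."""
    raw = text.encode("utf-8")
    return struct.pack(">H", len(raw)) + raw + modifier_boxes

def decode_text_sample(sample: bytes):
    """Split a text sample into its text string and a list of (box type, payload) modifiers."""
    (length,) = struct.unpack_from(">H", sample, 0)
    text = sample[2:2 + length].decode("utf-8")
    modifiers, offset = [], 2 + length
    while offset + 8 <= len(sample):
        size, box_type = struct.unpack_from(">I4s", sample, offset)
        if size < 8:  # malformed or unsupported size: stop parsing
            break
        modifiers.append((box_type.decode("ascii"), sample[offset + 8:offset + size]))
        offset += size
    return text, modifiers

# 'hlit' modifier box: size 12, type, 16-bit start offset 0, 16-bit end offset 3
highlight = struct.pack(">I4sHH", 12, b"hlit", 0, 3)
sample = encode_text_sample("사랑해 사랑해", highlight)
print(decode_text_sample(sample))
```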

In ISO-BMFF, text samples are stored in 'mdat' as a separate track (a text track). The stored text samples are played back in synchronization with timed media such as the audio tracks. Synchronization uses the information stored in boxes such as 'stts', 'stsc' and 'stco' within 'moov'/'trak'/'mdia'/'minf'/'stbl'.
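The following sketch shows only the timing half of this synchronization. It expands the 'stts' (sample_count, sample_delta) runs of a text track into per-sample start times, then looks up which lyric line is active at a given playback time. Locating the sample bytes through 'stsc' and 'stco' is omitted. The lyric lines, 'stts' entries and timescale are hypothetical.

```python
from bisect import bisect_right

def sample_times(stts_entries, timescale):
    """Expand (sample_count, sample_delta) runs into per-sample start times and the end time, in seconds."""
    starts, t = [], 0
    for sample_count, sample_delta in stts_entries:
        for _ in range(sample_count):
            starts.append(t / timescale)
            t += sample_delta
    return starts, t / timescale

def text_at(text_samples, stts_entries, timescale, playback_time):
    """Return the text sample whose time interval contains playback_time, or None."""
    starts, end = sample_times(stts_entries, timescale)
    i = bisect_right(starts, playback_time) - 1  # last sample starting at or before playback_time
    if i < 0 or playback_time >= end:
        return None
    return text_samples[i]

lyrics = ["", "First line of the lyrics", "Second line of the lyrics"]
stts = [(1, 5000), (2, 4000)]  # hypothetical: 5 s without text, then two 4 s lines (timescale 1000)
for t in (1.0, 6.2, 9.0, 14.0):
    print(t, repr(text_at(lyrics, stts, 1000, t)))
```

A player would evaluate this against the audio track's clock, so the lyric shown changes at the same media times as the audio.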

The sample description contains information on how the text is rendered, for example the position of the displayed text, the text color and the background color. The sample description may be described in 'stsd' by extending 'SampleEntry' to 'TextSampleEntry'.
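A simplified model of the rendering information in the sample description is sketched below. It assumes the field order of the 3GPP TS 26.245 'tx3g' TextSampleEntry that follows the generic SampleEntry header:

1. display flags,
2. horizontal and vertical justification,
3. background color,
4. a BoxRecord for the text region,
5. a StyleRecord for the default style.

The SampleEntry header and the FontTableBox are not written. The dataclass and function names are hypothetical.

```python
import struct
from dataclasses import dataclass

@dataclass
class TextRenderingDefaults:
    """Hypothetical container for the rendering defaults of a 'tx3g' TextSampleEntry."""
    text_box: tuple                    # (top, left, bottom, right) of the text region, in pixels
    text_rgba: tuple                   # default text color
    background_rgba: tuple             # background color
    font_id: int = 1
    font_size: int = 18
    horizontal_justification: int = 1  # 0 = left, 1 = centered, -1 = right
    vertical_justification: int = -1   # 0 = top, 1 = centered, -1 = bottom
    display_flags: int = 0

def pack_tx3g_fields(d: TextRenderingDefaults) -> bytes:
    """Pack the fixed fields after the SampleEntry header (FontTableBox not included)."""
    out = struct.pack(">Ibb", d.display_flags, d.horizontal_justification, d.vertical_justification)
    out += bytes(d.background_rgba)                                # background-color-rgba
    out += struct.pack(">hhhh", *d.text_box)                       # BoxRecord: top, left, bottom, right
    out += struct.pack(">HHHBB", 0, 0, d.font_id, 0, d.font_size)  # StyleRecord: startChar, endChar, font-ID, face-style-flags, font-size
    out += bytes(d.text_rgba)                                      # StyleRecord: text-color-rgba
    return out

defaults = TextRenderingDefaults(text_box=(400, 0, 480, 640),
                                 text_rgba=(255, 255, 255, 255),
                                 background_rgba=(0, 0, 0, 128))
fields = pack_tx3g_fields(defaults)
print(len(fields), fields.hex())
```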

The method of generating object-based audio content according to an embodiment of the present invention has been described above. A method of reproducing object-based audio content generated by that method is described below with reference to FIG. 5.

FIG. 5 is a flowchart illustrating a method of reproducing object-based audio content according to an embodiment of the present invention.

First, in step 510, a plurality of audio objects and at least one preset are restored from the object-based audio content.

Here, the object-based audio content is content generated by the object-based audio content generation method described with reference to FIG. 3.

In step 520, the plurality of audio objects are mixed on the basis of the at least one preset to generate an output audio signal.

In step 530, the generated output audio signal is reproduced.
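Step 520 can be illustrated with a deliberately simplified sketch. Here a preset is reduced to a gain matrix with one row per output channel and one gain per audio object. The presets in this document can also carry the output channel type and the virtual position and angle of each object; those parameters are not modeled here. The object signals and presets are hypothetical.

```python
def mix_objects(objects, gains):
    """Mix audio objects into output channels: out[c][n] = sum over o of gains[c][o] * objects[o][n].

    objects: one list of samples per audio object (all of equal length)
    gains:   one row per output channel, holding one gain per audio object
    """
    num_samples = len(objects[0])
    return [
        [sum(g * obj[n] for g, obj in zip(row, objects)) for n in range(num_samples)]
        for row in gains
    ]

vocal = [1.0, 0.5, -0.5]
guitar = [0.2, 0.2, 0.2]
karaoke_preset = [[0.0, 1.0], [0.0, 1.0]]   # hypothetical stereo preset: vocal muted
vocal_up_preset = [[1.0, 0.5], [1.0, 0.5]]  # hypothetical stereo preset: guitar attenuated
print(mix_objects([vocal, guitar], karaoke_preset))
print(mix_objects([vocal, guitar], vocal_up_preset))
```

Switching presets changes only the gain matrix, while the stored audio objects stay the same. This is what lets one file offer several mixes.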

As mentioned above, when the default preset ID included in the preset parameters has the value '0', the object-based audio content may be reproduced according to a preset stored in the bitstream of the audio objects encoded and stored with the multi-object audio compression technique (SAOC). The process of reproducing object-based audio content on the basis of a preset stored in such an SAOC bitstream is described in detail below with reference to FIG. 6.

FIG. 6 is a flowchart illustrating a method of reproducing object-based audio content according to another embodiment of the present invention.

First, in step 610, it is determined whether a preset exists in the object-based audio content.

If it is determined in step 610 that a preset exists (that is, 'num_preset' has a value other than '0'), it is determined in step 620 whether a default preset ID exists in the object-based audio content.

If it is determined in step 620 that a default preset ID exists (that is, 'default_preset_ID' has a value other than '0'), then in step 630 the plurality of audio objects are mixed on the basis of the preset whose preset ID equals the default preset ID, to generate an output audio signal. In step 670, the generated output audio signal is reproduced.

If it is determined in step 610 that no preset exists (that is, 'num_preset' has the value '0'), or in step 620 that no default preset ID exists (that is, 'default_preset_ID' has the value '0'), it is determined in step 640 whether an SAOC bitstream exists.

If it is determined in step 640 that an SAOC bitstream exists, it is determined in step 650 whether a preset exists in the SAOC bitstream.

If it is determined in step 650 that a preset exists in the SAOC bitstream, then in step 660 the plurality of audio objects are mixed on the basis of the first preset contained in the SAOC bitstream, to generate an output audio signal. In step 670, the generated output audio signal is reproduced.

If it is determined in step 640 that no SAOC bitstream exists, or in step 650 that no preset exists in the SAOC bitstream, the object-based audio content is judged to contain no preset and is not reproduced.
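The decision order of FIG. 6 can be summarized in a short Python sketch. It makes two hypothetical assumptions:

- The presets stored in the file are available as a dictionary keyed by preset ID.
- The presets decoded from the SAOC bitstream are passed as a list, or as None when there is no SAOC bitstream.

The function returns the preset to mix with, or None when the content is not reproduced.

```python
from typing import Dict, Optional, Sequence

def select_preset(num_preset: int,
                  default_preset_id: int,
                  file_presets: Dict[int, object],
                  saoc_presets: Optional[Sequence[object]]) -> Optional[object]:
    """Pick the preset used for playback, following the decision order of FIG. 6."""
    # Steps 610 and 620: presets exist in the file and a non-zero default preset ID is set
    if num_preset != 0 and default_preset_id != 0:
        return file_presets[default_preset_id]  # step 630: preset whose ID equals the default ID
    # Step 640: no SAOC bitstream, so no preset is available
    if saoc_presets is None:
        return None
    # Step 650: use the first preset carried in the SAOC bitstream, if any
    if len(saoc_presets) > 0:
        return saoc_presets[0]
    return None  # no preset anywhere: the content is not reproduced

print(select_preset(2, 1, {1: "live", 2: "karaoke"}, None))
print(select_preset(2, 0, {1: "live"}, ["saoc-0", "saoc-1"]))
print(select_preset(0, 0, {}, None))
```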

In addition, embodiments of the present invention may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and constructed for the present invention, or they may be known and available to those skilled in the art of computer software.

Examples of computer-readable recording media include:
- magnetic media such as hard disks, floppy disks and magnetic tape;
- optical media such as CD-ROMs and DVDs;
- magneto-optical media such as floptical disks;
- hardware devices specially configured to store and execute program instructions, such as ROM, RAM and flash memory.

Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments of the present invention, and vice versa.

As described above, the present invention has been explained with reference to specific details such as particular components, limited embodiments and drawings. These are provided only to aid a more general understanding of the present invention. The present invention is not limited to the above embodiments, and those of ordinary skill in the art to which the present invention pertains can make various modifications and variations from this description. Therefore, the spirit of the present invention should not be limited to the described embodiments. Not only the claims set forth below, but also everything equal or equivalent to those claims, falls within the scope of the spirit of the present invention.

Claims

1. An object-based audio content generation apparatus, comprising:
a processor configured to generate at least one preset associated with a plurality of audio objects, and to generate object-based audio content comprising the plurality of audio objects and the at least one preset,
wherein the preset includes information related to boxes and offsets associated with synchronization.

2. The apparatus of claim 1,
wherein the preset is editable by a user, and
wherein the object-based audio content is provided according to a media file format including track information.

3. The apparatus of claim 1,
Wherein the object-
(ii) the number of output channels according to the type of the output channel, (iii) gain information for the audio object, and (iv) a virtual position and an angle of the audio object.

4. An object-based audio content reproducing apparatus, comprising:
a processor configured to extract, from object-based audio content, a plurality of audio objects and at least one preset associated with the plurality of audio objects, and to reproduce the object-based audio content using the preset and the audio objects,
wherein the preset includes information related to boxes and offsets associated with synchronization.

5. The apparatus of claim 4,
wherein the preset is editable by a user, and
wherein the object-based audio content is provided according to a media file format including track information.

6. The apparatus of claim 4,
Wherein the object-
(ii) the number of output channels according to the type of the output channel, (iii) gain information for the audio object, and (iv) a virtual position and an angle of the audio object.

7. An object-based audio content generation method, comprising:
generating at least one preset associated with a plurality of audio objects; and
generating object-based audio content comprising the plurality of audio objects and the at least one preset,
wherein the preset includes information related to boxes and offsets associated with synchronization.

8. The method of claim 7,
wherein the preset is editable by a user, and
wherein the object-based audio content is provided according to a media file format including track information.

9. The method of claim 7,
Wherein the object-
(ii) the number of output channels according to the type of the output channel, (iii) gain information for the audio object, and (iv) a virtual position and an angle of the audio object.

10. An object-based audio content reproducing method, comprising:
extracting, from object-based audio content, a plurality of audio objects and at least one preset associated with the plurality of audio objects; and
reproducing the object-based audio content using the preset and the audio objects,
wherein the preset includes information related to boxes and offsets associated with synchronization.

11. The method of claim 10,
wherein the preset is editable by a user, and
wherein the object-based audio content is provided according to a media file format including track information.

12. The method of claim 10,
Wherein the object-
(ii) the number of output channels according to the type of the output channel, (iii) gain information for the audio object, and (iv) a virtual position and an angle of the audio object.