KR20070012808A

KR20070012808A - Audio bitstream format in which the bitstream syntax is described by an ordered transversal of a tree hierarchy data structure

Info

Publication number: KR20070012808A
Application number: KR1020067021713A
Authority: KR
Inventors: Pierre-Anthony Stivell Lemieux
Original assignee: Dolby Laboratories Licensing Corporation
Priority date: 2004-04-21
Filing date: 2005-04-13
Publication date: 2007-01-29
Also published as: MXPA06010867A; BRPI0509985A; JP2007537464A; CN1942931A; US20070208571A1; CA2561352A1; AU2005241905A1; EP1743327A1; IL178123A0; WO2005109403A1

Abstract

A bitstream format for representing audio information in which the bitstream syntax is described by an ordered transversal of a tree hierarchy data structure, has a tree hierarchy comprising a plurality of tree hierarchy levels, each having one or more nodes, in which at least some progressively smaller subdivisions of the audio information are represented in progressively lower levels of the tree hierarchy, wherein the audio information is included among nodes in one or more of said levels. ® KIPO & WIPO 2007

Description

AUDIO BITSTREAM FORMAT IN WHICH THE BITSTREAM SYNTAX IS DESCRIBED BY AN ORDERED TRANSVERSAL OF A TREE HIERARCHY DATA STRUCTURE

The present invention relates to a bitstream format for representing audio information in which the bitstream syntax is described by an ordered traversal of a tree hierarchy data structure; to a bitstream formatted according to that bitstream format; to a medium for storing or transmitting such a bitstream; to a system for encoding and decoding a bitstream formatted according to the bitstream format; to an encoder for encoding such a bitstream; to a decoder for decoding such a bitstream; to a process for encoding and decoding such a bitstream; to a process for generating such a bitstream; to a process for encoding such a bitstream; and to a process for decoding such a bitstream.

<Disclosure of the Invention>

In accordance with aspects of the present invention, a bitstream format for representing audio information, in which the bitstream syntax is described by an ordered traversal of a tree hierarchy data structure, has a tree hierarchy comprising a plurality of tree hierarchy levels, each having one or more nodes. At least some progressively smaller subdivisions of the audio information are represented in progressively lower levels of the tree hierarchy, and the audio information is included among the nodes in one or more of the levels. The progressively smaller subdivisions of the audio may include one or more of temporal subdivisions, spatial subdivisions, and resolution subdivisions. A first level of the tree hierarchy may comprise a root node representing all of the audio information, at least one lower level may comprise a plurality of nodes representing a temporal segmentation of the audio information, and at least one further lower level may comprise a plurality of nodes representing a spatial segmentation of the audio information. Alternatively, or in addition, the audio information may be layered to provide a plurality of resolutions, such that a base resolution layer of the audio information is included in one level and one or more resolution enhancement layers are included in the same level or in one or more other levels. Other aspects of the invention are disclosed throughout this description and the claims.

A bitstream format according to aspects of the present invention may be useful for one or more of the following:

- minimizing audio processing latency,

- adding, removing, or otherwise manipulating metadata without extensive modification of the bitstream,

- associating arbitrary metadata with specific aspects of the audio material contained in the bitstream,

- minimizing the structural overhead of the bitstream,

- providing a flexible bitstream structure for forward and backward compatibility,

- enabling efficient transport over a variety of interfaces,

- enabling frame-based editing, and

- enabling the encapsulation of encoded or unencoded audio information.

Definitions and examples of tree hierarchy data structures can be found in the "Dictionary of Algorithms and Data Structures" on the website of NIST, the National Institute of Standards and Technology (http://nist.gov/dads/). A demonstration of the ordered traversal of a tree hierarchy data structure can be found under Data Structures, Algorithms, Binary Tree Traversal Algorithm on the website of the Department of Computer Science, University of Canterbury, New Zealand (http://www.cosc.cantebury.ac.nz/people/mukundan/dsal/BTree.html).

FIGS. 1a and 1b are schematic diagrams showing, respectively, the audio information (sometimes referred to herein as "audio essence") components of a bitstream and a hierarchical tree representation of that bitstream in accordance with aspects of the present invention.

FIG. 2 is a schematic diagram showing an example of a hierarchical tree representation similar to that of FIG. 1b but including metadata.

FIG. 3 is a schematic diagram showing a serialized bitstream, in accordance with aspects of the present invention, resulting from an ordered traversal of the tree hierarchy of FIG. 2. FIG. 2 differs from FIG. 1b in that it also shows segments of metadata attached to the beginning and/or end of each node.

FIGS. 4a to 4d are schematic diagrams showing a transcoding process using a bitstream in accordance with aspects of the present invention.

FIG. 5 is a schematic diagram of the structure of a node in the tree hierarchy in accordance with aspects of the present invention.

FIG. 6 is a schematic diagram of the structure of a short node.

FIG. 7 is a schematic diagram of an example of a hierarchical tree in accordance with the present invention.

FIG. 8a is a schematic diagram showing the mapping of two AC-3 sync frames into a bitstream in accordance with aspects of the present invention.

FIG. 8b is a schematic diagram of the AC-3 encapsulating bitstream of FIG. 8a with two supplemental audio channels added.

FIG. 9 is a flowchart or functional block diagram showing various functional aspects of an encoder or encoding process for generating a bitstream similar to the example of FIG. 3, in accordance with aspects of the present invention.

FIG. 10 is a flowchart or functional block diagram showing various functional aspects of a decoder or decoding process for deriving audio essence and metadata from a bitstream such as that of the examples of FIGS. 3 and 9, in accordance with aspects of the present invention.

FIGS. 1a and 1b are schematic diagrams showing, respectively, the audio information (sometimes referred to herein as "audio essence") components of a bitstream and a hierarchical tree representation of that bitstream in accordance with aspects of the present invention. The bitstream representation of FIG. 1a shows two consecutive audio frames, each having first and second channels, channel 1 and channel 2. The channels may correspond, for example, to audio information to be reproduced by left and right loudspeakers, respectively. Channels 1 and 2 are labeled 1a and 2a in the first frame and 1b and 2b in the second frame. In FIG. 1a, the vertical direction represents channels and the horizontal direction represents frames and time.

In the example of FIG. 1b, the tree hierarchy underlying the bitstream of FIG. 1a, in accordance with aspects of the present invention, has three levels: level 1, level 2, and level 3. At level 1, a single root node (3) represents the audio material of the entire bitstream. In practice, as described below, the bitstream format and the tree hierarchy data structure representation underlying the "audio material" may include audio information or audio "essence", "metadata" (information about the audio essence), and other data. In this simple example, however, only audio essence is shown with respect to the tree hierarchy of the bitstream.

At level 2 of the hierarchy of this example, the audio material may be decomposed into any number of individual audio frames, each having a fixed or variable duration or bit length (for simplicity of presentation, only two frames are shown in the example of FIGS. 1a and 1b). Frame nodes (4, 5), each having the root node (3) as its parent, represent the first and second audio frames, respectively, at level 2 of the hierarchy of this example. Each audio frame is decomposed into any number of audio channels (for simplicity of presentation, only two channels per frame are shown in the example of FIGS. 1a and 1b), each of which corresponds to a spatial direction such as "left" or "right". Channel nodes (6, 7, 8, 9), each having the frame node to which it belongs as its parent, represent audio channels 1a, 2a, 1b, and 2b of the successive frames, respectively, at level 3 of the hierarchy.
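By way of illustration only, the three-level hierarchy of FIG. 1b can be modeled as a small tree. The following Python sketch is not part of the disclosure; the `Node` class and the helper names are assumptions chosen for the example, and the labels are simply the reference numerals used above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                   # reference numeral used in FIG. 1b
    children: list = field(default_factory=list)


def build_fig_1b_tree():
    """Root (3) -> frame nodes (4, 5) -> channel leaf nodes (6 to 9)."""
    frame_1 = Node("4", [Node("6"), Node("7")])  # channels 1a and 2a
    frame_2 = Node("5", [Node("8"), Node("9")])  # channels 1b and 2b
    return Node("3", [frame_1, frame_2])


def labels_by_level(node, level=1, table=None):
    """Collect node labels per hierarchy level, visiting parents before children."""
    if table is None:
        table = {}
    table.setdefault(level, []).append(node.label)
    for child in node.children:
        labels_by_level(child, level + 1, table)
    return table


print(labels_by_level(build_fig_1b_tree()))
# {1: ['3'], 2: ['4', '5'], 3: ['6', '7', '8', '9']}
```

Level 2 holds the temporal subdivision (frames) and level 3 the spatial subdivision (channels), matching the description above.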

In the example of FIG. 1b, the channel nodes (6-9) are leaf nodes, and each carries audio essence in the form of at least one essence element. In principle, audio essence need not be contained in leaf nodes; in practice, however, as will be appreciated upon reading and understanding this description, there are advantages to placing audio essence in leaf nodes (and, in the case of "layered" audio, in which a base resolution layer is provided along with one or more resolution enhancement layers, to placing audio essence in leaf nodes and in nodes of one or more next-higher levels of the hierarchy).

It is a feature of the present invention that, wherever it is located in the hierarchy, the audio essence resides within one or more nodes of the hierarchy and is therefore present in the resulting bitstream. This does not exclude the possibility that, for example, information relating to encoding, decoding, or the audio essence is located outside the bitstream and its underlying hierarchy. For example, a pointer in the metadata associated with audio essence might point to a particular decoding process outside the bitstream and its underlying hierarchy.

As mentioned above, the bitstream format and the underlying tree hierarchy data structure representation of the "audio material" may include not only audio information or audio "essence" but also "metadata" (information about the audio essence) and other data.

Useful discussions of audio metadata can be found in "Exploring the AC-3 Audio Standard for ATSC" in Audio Notes by Tim Carroll, June 26, 2002, at http://tvtechnology.com/features/audio_notes/f-TC-AC3-06.26.02.shtml; "A Closer Look at Audio Metadata" in Audio Notes by Tim Carroll, July 24, 2002, at http://tvtechnology.com/features/audio_notes/f-tc-metadata.shtml; and "Audio Metadata: You Can Get There From Here" in Audio Notes by Tim Carroll, August 21, 2002, at http://tvtechnology.com/features/audio_notes/f-TC-metadata-08.21.02.shtml. Each of these documents is incorporated herein by reference in its entirety.

A bitstream based on a hierarchical representation in accordance with aspects of the present invention allows any metadata information to be precisely associated with, and thus synchronized with, the audio essence it describes. This may be achieved by placing the metadata to be associated with particular audio essence within the same node as the audio essence or within any parent node (that is, any ancestor) of the node containing the audio essence. In accordance with embodiments of the present invention, as described below, one or more metadata elements may be attached to the beginning or end of any node in the hierarchy. Thus, in a three-level hierarchy such as that of the example of FIG. 1b, metadata associated with particular audio essence may be attached to the beginning or end of the audio material of the entire bitstream in the root node at level 1, to the beginning or end of an individual frame in the frame node at level 2 that is the parent of the channel carrying the particular audio essence, and/or to the beginning or end of the channel in the channel (leaf) node at level 3 that contains the particular audio essence. Examples of such arrangements are described below in connection with the example of FIG. 2.
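The association rule described above (metadata applies to essence in the same node or in any descendant node) can be sketched as a walk from a node up to the root. This Python sketch is illustrative only; the `Node` class, the `applicable_metadata` helper, the sample values, and the choice that a nearer node overrides an ancestor on a key clash are all assumptions, not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    metadata: dict = field(default_factory=dict)
    parent: Optional["Node"] = None


def applicable_metadata(node):
    """Merge metadata from the root down to `node`.

    Assumption for this sketch: on a key clash the nearer node wins.
    """
    chain = []
    while node is not None:          # walk up: channel, frame, root
        chain.append(node)
        node = node.parent
    merged = {}
    for ancestor in reversed(chain):  # apply root first, the node itself last
        merged.update(ancestor.metadata)
    return merged


root = Node("root", {"title": "Demo"})
frame = Node("frame 1", {"timecode": "00:00:00:00"}, parent=root)
channel = Node("channel 1", {"downmix": "-3 dB"}, parent=frame)

print(applicable_metadata(channel))
# {'title': 'Demo', 'timecode': '00:00:00:00', 'downmix': '-3 dB'}
```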

Preferably, metadata is distributed among the hierarchy levels in a manner that contributes to the "semantic independence" of individual nodes. For example, in an arrangement of the type shown in FIG. 1b, metadata in the root node preferably applies only to the audio material as a whole, metadata in a frame node preferably applies only to that particular frame and its channels, and metadata in a channel node preferably applies only to that particular channel. By suitable definition of the metadata information, it can be ensured that manipulation of a given node does not require modification of metadata carried in other nodes. For example, when a frame node contains no metadata specific to any particular channel node and a channel node contains no metadata required by another channel node, not only the channel metadata but the entire channel node can be added, removed, or modified without modifying metadata located in other nodes. In this respect, features of the present invention allow nodes to be semantically independent. That is, in terms of metadata and essence, any given node can be independent of its siblings if it carries only metadata applicable to itself and equally to all of its children (if any). Consequently, a bitstream in accordance with the present invention with suitably distributed metadata facilitates transcoding, as described below.

A bitstream in accordance with the present invention is generated by using an ordered traversal of the tree hierarchy data structure to serialize the hierarchical representation of the audio material. Preferably, the ordered traversal resembles a preorder traversal (also known as a "prefix traversal"). A preorder traversal algorithm may be defined as one that processes all nodes of a tree by processing the root node and then recursively processing all subtrees. In particular, if body tags are not employed (see below regarding "body tags"), a suitable preorder traversal algorithm for serializing the hierarchy in accordance with aspects of the present invention may be described by applying the following algorithm, starting at the root node.

a) A "start tag" segment marking the beginning of the node may be written to the bitstream.

b) Each of the one or more metadata or essence elements attached to the beginning of the node may be written as an individual segment.

c) Starting at step "a", the algorithm is applied to each of the child nodes of the node under consideration.

d) Each of the one or more metadata or essence elements attached to the end of the node may be written as an individual segment.

e) An "end tag" segment marking the end of the node may be written to the bitstream.

The traversal algorithm may also be expressed in simplified C-like pseudocode.
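Steps a) to e) can be sketched as a short recursive routine. The following is a minimal Python illustration rather than the C-like pseudocode of the disclosure; the `Node` fields `head`, `children`, and `tail`, and the representation of each segment as a `(kind, value)` tuple, are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    head: list = field(default_factory=list)      # elements attached to the beginning
    children: list = field(default_factory=list)
    tail: list = field(default_factory=list)      # elements attached to the end


def serialize(node, out=None):
    """Append the segments of `node` and its subtree to `out` in bitstream order."""
    if out is None:
        out = []
    out.append(("start", node.name))               # step a: start tag
    for element in node.head:                      # step b: leading elements
        out.append(("element", element))
    for child in node.children:                    # step c: recurse into children
        serialize(child, out)
    for element in node.tail:                      # step d: trailing elements
        out.append(("element", element))
    out.append(("end", node.name))                 # step e: end tag
    return out


tree = Node("root", head=["title"], children=[
    Node("frame", head=["timecode"],
         children=[Node("channel", head=["essence"])],
         tail=["loudness"]),
])

for segment in serialize(tree):
    print(segment)
```

Because every node is bracketed by its own start and end tags, the nesting of the tree can be recovered from the flat segment sequence without any length fields.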

If body tags are employed, a suitable preorder traversal algorithm may be described by applying the following algorithm, starting at the root node.

a) A "start tag" segment marking the beginning of the node may be written to the bitstream.

b) Each of the one or more metadata or essence elements attached to the beginning of the node may be written as an individual segment.

c) If the node under consideration has no child nodes and no metadata or essence elements attached to its end, steps d) to g) may be skipped.

d) A "body start tag" segment marking the beginning of the child nodes of the node may be written to the bitstream.

e) Starting at step "a", the algorithm is applied to each of the child nodes of the node under consideration.

f) A "body end tag" segment marking the end of the child nodes of the node may be written to the bitstream.

g) Each of the one or more metadata or essence elements attached to the end of the node may be written as an individual segment.

h) An "end tag" segment marking the end of the node may be written to the bitstream.
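A minimal Python sketch of the body-tag variant follows. It reads the steps as: start tag, leading elements, an optional body (body start tag, children, body end tag) followed by trailing elements, then the end tag. The `Node` fields and the `(kind, value)` segment tuples are assumptions made for the example and are not taken from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    head: list = field(default_factory=list)      # elements attached to the beginning
    children: list = field(default_factory=list)
    tail: list = field(default_factory=list)      # elements attached to the end


def serialize_with_body_tags(node, out=None):
    """Serialize `node` and its subtree, delimiting the children with body tags."""
    out = [] if out is None else out
    out.append(("start", node.name))                      # start tag
    out.extend(("element", e) for e in node.head)         # leading elements
    if node.children or node.tail:                        # otherwise skip the body steps
        out.append(("body_start", node.name))             # body start tag
        for child in node.children:                       # recurse into children
            serialize_with_body_tags(child, out)
        out.append(("body_end", node.name))               # body end tag
        out.extend(("element", e) for e in node.tail)     # trailing elements
    out.append(("end", node.name))                        # end tag
    return out


frame = Node("frame", head=["timecode"],
             children=[Node("channel", head=["essence"])],
             tail=["loudness"])

for segment in serialize_with_body_tags(frame):
    print(segment)
```

A leaf node with nothing attached to its end produces no body tags at all, so the variant adds overhead only where children or trailing elements exist.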

FIG. 2 shows a simple example of a hierarchical tree representation similar to that of FIG. 1b but including metadata. FIG. 3 shows the serialized bitstream resulting from an ordered traversal of the tree hierarchy of FIG. 2.

FIG. 2 differs from FIG. 1b in that it also shows segments of metadata attached to the beginning and/or end of each node. Primed reference numerals are used in FIG. 2 to indicate that the nodes are modified versions of the nodes of FIG. 1b. Thus, the root node (3') has, for example, title and copyright metadata attached to its beginning. The frame nodes (4', 5') have, for example, timecode metadata attached to the beginning of each node and loudness metadata attached to the end of each node. The channel nodes (6', 7', 8', 9') have, for example, downmix metadata attached to the beginning of each node.

FIG. 3 shows an example of a serialized bitstream in accordance with the present invention and of the hierarchy in accordance with the present invention. The bitstream has segments (10 to 37) (a segment is also referred to as an "atomic element") resulting from an ordered traversal of the hierarchy of FIG. 2 according to the "no body tags" algorithm above. Each element, whether it carries audio essence, metadata, or other data, is labeled using a unique identifier indicating its content. Suitable identifiers are described below.

As described below, the root node (3') contains segments 10 to 37, which constitute all of the audio material. The nesting of the frame nodes (4', 5') within the root node (3'), and in turn the nesting of the channel nodes within each frame node, is shown in FIG. 3. The bitstream of the example of FIG. 3 begins with a root node start tag segment (10), marking the beginning of the audio material, followed by a metadata (title) segment (11) and a metadata (copyright) segment (12) attached to the beginning of the root node. Next, the first child, frame node (4'), is visited, as indicated by the frame node start tag (13), which is followed by the metadata (timecode) segment (14) attached to the beginning of frame node (4'). Next, the first child of the frame node, channel node (6'), is visited, as indicated by the channel node start tag (15). The channel node start tag segment is followed by the metadata (downmix) segment (16) attached to the beginning of channel node (6'). The metadata segment (16) is followed by the (channel 1) audio essence (17) of channel node (6') and the channel node end tag (18). Next, the second child of frame node (4'), channel node (7'), is visited, as indicated by the channel node start tag (19). The channel node start tag segment is followed by the metadata (downmix) segment (20) attached to the beginning of channel node (7'). The metadata segment (20) is followed by the (channel 2) audio essence (21) of channel node (7') and the channel node end tag (22). Because frame node (4') has no other children and channel nodes (6', 7') are leaf nodes, frame node (4') is revisited, allowing the loudness metadata (23) to be written (the loudness metadata depends on a process having visited the audio essence of channels 1 and 2 in order to determine its value). The frame node end tag segment (24) is then written to the bitstream. The next frame node (5') is then visited.

In a manner similar to that just described for the subtree of frame node (4'), the bitstream is written from frame node (5') and its child leaf nodes (8', 9'), generating a frame node start tag segment (25), a metadata (timecode) segment (26), a channel node start tag (27), a channel node metadata (downmix) segment (28), a (channel 1) channel node audio essence segment (29), a channel node end tag (30), a channel node start tag (31), a channel node metadata (downmix) segment (32), a (channel 2) channel node audio essence segment (33), a channel node end tag (34), frame node end metadata (loudness) (35), and a frame node end tag segment (36). Because this simple example has only two frames, the root node is then revisited. In the absence of metadata attached to the end of the root node, the root node end tag segment (37) is written, marking the end of the audio material.
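The segment sequence 10 to 37 described above can be reproduced by applying the "no body tags" traversal to a model of the FIG. 2 hierarchy. The Python sketch below is illustrative only; the class, helper names, and segment label strings are assumptions, while the numbering from 10 follows the reference numerals of FIG. 3.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    head: list = field(default_factory=list)      # elements attached to the beginning
    children: list = field(default_factory=list)
    tail: list = field(default_factory=list)      # elements attached to the end


def serialize(node, out):
    """Preorder traversal without body tags, one list entry per segment."""
    out.append(f"{node.kind} start tag")
    out.extend(node.head)
    for child in node.children:
        serialize(child, out)
    out.extend(node.tail)
    out.append(f"{node.kind} end tag")
    return out


def channel(number):
    return Node("channel", head=["metadata (downmix)",
                                 f"audio essence (channel {number})"])


def frame():
    return Node("frame", head=["metadata (timecode)"],
                children=[channel(1), channel(2)],
                tail=["metadata (loudness)"])


root = Node("root", head=["metadata (title)", "metadata (copyright)"],
            children=[frame(), frame()])

# Number the segments from 10, as FIG. 3 does.
segments = dict(enumerate(serialize(root, []), start=10))
for number, name in segments.items():
    print(number, name)
```

The loudness segments land at 23 and 35, after both channels of their frame, which is why that metadata is attached to the end of the frame node rather than the beginning.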

In addition to being semantically independent, as mentioned above, each segment is structurally independent in that it carries its own type and length, does not carry those of other segments, and is not nested within another segment. A segment can therefore be processed without prior knowledge of other segments, and as a result the bitstream can be parsed one segment at a time, enabling low-latency operation. Moreover, adding, deleting, or modifying a node or segment does not necessarily require manipulation of any other node or segment.
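Parsing one segment at a time can be sketched as follows. The byte layout used here (a one-byte type followed by a two-byte big-endian length and the payload) is purely an assumption for illustration and is not the segment syntax of the disclosure; the function names are likewise hypothetical.

```python
import io
import struct

HEADER = struct.Struct(">BH")   # assumed layout: 1-byte type, 2-byte big-endian length


def write_segment(stream, seg_type, payload):
    """Write one self-describing segment: header, then payload."""
    stream.write(HEADER.pack(seg_type, len(payload)))
    stream.write(payload)


def iter_segments(stream):
    """Yield (type, payload) pairs one at a time, reading no further than needed."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return                                  # clean end of stream
        if len(header) < HEADER.size:
            raise ValueError("truncated segment header")
        seg_type, length = HEADER.unpack(header)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated segment payload")
        yield seg_type, payload


buf = io.BytesIO()
write_segment(buf, 1, b"title")
write_segment(buf, 2, b"")
write_segment(buf, 3, b"\x00\x01")
buf.seek(0)

for seg_type, payload in iter_segments(buf):
    print(seg_type, payload)
```

Because each segment states only its own type and length, the reader never has to look ahead or keep state about other segments.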

Given this structural flexibility, if the metadata and audio essence are optimally distributed, segments, and indeed entire nodes, can be added, removed, and manipulated without affecting other segments and nodes. This makes it possible, for example, to remove a particular audio channel from audio material without remastering the entire bitstream. In particular, nodes preferably do not contain any length or synchronization information that would require systemic modification (that is, modification of other nodes in the bitstream). Length information is unnecessary because start tags and end tags delimit the extent of a node. Synchronization information is unnecessary because the presence of a segment within a node clearly synchronizes it with the content of the node. On the other hand, metadata and/or audio essence may be distributed so as to create dependencies between nodes, for example at a particular level of the hierarchy, in which case latency will increase. For example, a particular embodiment of aspects of the present invention might require that each frame node contain a timestamp and that the timestamps be consecutive. Removing one frame node would then require modifying all subsequent frame nodes, which makes such a requirement an undesirable design decision.
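Removing a channel node from the serialized stream, relying only on start and end tags, can be sketched as a single pass over the segments. This Python illustration assumes segments are `(kind, value)` tuples and that the frame carries no metadata that depends on the removed channel; the function name `drop_nodes` is hypothetical.

```python
def drop_nodes(segments, should_drop):
    """Copy segments through, omitting every node whose start tag matches."""
    skipping = 0                        # nesting depth inside a node being dropped
    for kind, value in segments:
        if skipping:
            if kind == "start":
                skipping += 1           # nested node inside the dropped node
            elif kind == "end":
                skipping -= 1           # reaches 0 at the dropped node's end tag
            continue
        if kind == "start" and should_drop(value):
            skipping = 1
            continue
        yield kind, value


stream = [
    ("start", "frame"), ("element", "timecode"),
    ("start", "channel 1"), ("element", "essence 1"), ("end", "channel 1"),
    ("start", "channel 2"), ("element", "essence 2"), ("end", "channel 2"),
    ("end", "frame"),
]

for segment in drop_nodes(stream, lambda name: name == "channel 2"):
    print(segment)
```

No other segment is rewritten: the frame node and channel 1 pass through unchanged, since nothing in them records a length or a count of sibling nodes.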

As mentioned above, each element in the hierarchy, whether it contains audio essence, metadata, or other data, is preferably labeled with a unique identifier that indicates its content. An application that receives a bitstream formatted according to the present invention can therefore ignore elements it does not recognize. This allows new types of elements to be introduced into the bitstream without disturbing existing applications. For example, one or more audio essence enhancement layers may be added to the bitstream along with their associated metadata, providing both backward and forward compatibility. Alternatively, one or more enhancement layers may be carried in the metadata.

FIGS. 4A through 4D illustrate a transcoding process that uses a bitstream in accordance with aspects of the present invention. Segments are processed serially, in the order in which they appear in the bitstream. FIG. 4A shows a two-channel bitstream according to the present invention before transcoding. Segments (a) and (b) carry the audio information for channel 1 and channel 2 of frame 1. In FIG. 4B, the transcoding process has read six segments when it encounters segment (a), which contains audio information. It reads the segment from the bitstream, extracts the audio information, transcodes it into the target format, and re-wraps it in a segment (a'), which it writes to the bitstream in place of (a). Provided the channel nodes are not interdependent in the transcoding context, the process needs no knowledge of earlier or later nodes. This matters for low-latency operation, because transcoding can begin before all or most of the bitstream has been received. In FIG. 4C, the transcoding process reaches segment (b), which it writes as segment (b') in the manner described in connection with FIG. 4B. FIG. 4D shows the fully transcoded bitstream.
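The single-pass behaviour described for FIGS. 4A through 4D can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Segment` record, the `transcode_stream` function, and the segment labels are hypothetical names that do not appear in the format, and segments are modelled as already-parsed records rather than as bytes.

```python
from typing import Callable, Iterable, Iterator, NamedTuple


class Segment(NamedTuple):
    """Hypothetical in-memory view of one segment (not the wire format)."""
    kind: str            # "tag", "metadata" or "essence"
    ident: str           # illustrative label such as "FRAME" or "PCM"
    payload: bytes = b""


def transcode_stream(
    segments: Iterable[Segment],
    convert: Callable[[bytes], bytes],
    target_ident: str,
) -> Iterator[Segment]:
    """Rewrite essence segments one at a time; pass everything else through.

    Each segment is handled as soon as it is read, so no earlier or later
    node has to be known (assuming channel nodes are not interdependent).
    """
    for seg in segments:
        if seg.kind == "essence":
            # Extract, transcode, and re-wrap in a replacement segment.
            yield Segment("essence", target_ident, convert(seg.payload))
        else:
            yield seg
```

Because the function is a generator, it can start emitting output before the whole input has arrived, which is the low-latency property the text relies on.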

The following describes an embodiment of aspects of the present invention. The invention is not limited to this or any other embodiment. Although the following description discloses the syntax and grammar of the bitstream, the structure of its atomic elements, and the conforming arrangement of those elements, it does not describe the semantic content of the bitstream, such as the relationship between metadata and audio essence. Such relationships are outside the scope of the present invention.

The terms employed herein, and in particular in connection with this embodiment, may be defined as follows.

Inherent audio material (audio material): audio information represented by a self-contained bitstream that comprises nodes and segments and is formatted according to aspects of the present invention.

Node: zero or more consecutive bitstream segments that belong to a hierarchy level and are delimited by a start tag and end tag pair. Nodes can be nested.

Segment (atomic element): the smallest bitstream element that can be manipulated (e.g., packaged or encrypted) as a distinct entity. There are three types of segments: audio essence segments, metadata segments (audio essence and metadata segments are "content" segments), and tag segments (tag segments are "structural" segments that, for example, allow the bitstream and the tree hierarchy to be related to each other). A segment may carry information about its own length, type, and/or content.

Audio essence segment: a content segment carrying audio essence (audio information). An audio essence segment may contain, for example, a series of unencoded pulse code modulation (PCM) audio data or encoded PCM audio data (e.g., perceptually encoded PCM).

Metadata segment: a content segment carrying metadata information related to the associated audio essence.

Tag segment: a non-content segment used to delimit nodes.

Frame: a bitstream node comprising one or more audio essence segments that represent a time interval of the audio material, together with one or more metadata segments related to those audio essence segments.

Group of frames: a series of frames preceded by one or more metadata segments and optionally accompanied by one or more additional metadata segments.

A bitstream formatted according to the present invention is defined independently of audio coding, audio metadata, and transport methods, and therefore need not include features such as error correction or compression-specific metadata.

Segment

As noted above, a segment, or atomic element, is the smallest bitstream element that can be manipulated (e.g., packaged or encrypted) as a distinct entity. In practice, each segment may be a byte-aligned structure comprising a header that carries type and size information and, for audio essence and metadata segments, a payload. Tag segments carry structural information and have no payload. Content segments carry metadata or essence information as their payload. A segment's type and meaning can be further specified using unique identifiers. The segment syntax is specified in detail below.

Node

Segments are arranged into nodes, which are hierarchically nested structures. In this embodiment, a node may consist of a series of segments delimited by a matching start tag segment and end tag segment. As shown in FIG. 5, the structure of a node in the tree hierarchy has three distinct contexts (or portions): the header (40), body (41), and trailer (42) contexts. The header and trailer contexts may each have one or more content segments, and the body context has zero or more child nodes. Optionally, the body context may be delimited by body start and end tag segments.

Referring to the details of FIG. 5, the node structure begins with a start tag segment (43) and ends with an end tag segment (44). Each of the tag segments (43, 44) is marked "X" because the type of tag depends on the position of the node in the hierarchy. For a root node in this embodiment, the tag segments may be Group of Frames (GOF) tags. After the start tag (43), the header context (40) may have one or more content segments (45). Next, a body start tag (46) may open a body context (41) containing one or more nodes (47) nested at one or more hierarchy levels below the node shown in FIG. 5. A body end tag (48) may delimit the end of the body context (41). After the body end tag (48), the trailer context (42) may have one or more content segments (49). Finally, the node structure ends with the end tag segment (44).

When both the body context and the trailer context are empty, as may happen for a leaf node that contains audio essence and, possibly, related metadata, the body tags may be omitted and the node becomes a short node, as shown in FIG. 6. A short node is limited to a header context (40') because, without body tags, the header context and the trailer context cannot be told apart. Referring to the details of FIG. 6, the node structure begins with a start tag segment (50) and ends with an end tag segment (51). As in the example of FIG. 5, the tag segments are marked "X" because the type of tag depends on the position of the node in the hierarchy. For a leaf node in this embodiment, the tag segments may be channel tags. The header context (40') lies between the start tag and the end tag and contains one or more content segments (45').
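The long and short node layouts of FIGS. 5 and 6 can be illustrated with a small Python sketch. The function name `serialize_node`, the angle-bracket labels, and the choice to emit body tags whenever the body or trailer is non-empty are assumptions made for illustration only; the text above says only that body tags are optional and may be omitted when both contexts are empty.

```python
def serialize_node(tag, header=(), children=(), trailer=()):
    """Flatten one node into an ordered list of segment labels.

    tag      -- label used for the start/end tag pair (the "X" of FIG. 5)
    header   -- content segment labels of the header context
    children -- already-serialized child nodes (lists of labels)
    trailer  -- content segment labels of the trailer context
    """
    out = [f"<{tag}>"]
    out += list(header)
    if children or trailer:
        # Long node: body tags separate header, body and trailer contexts.
        out.append("<BODY>")
        for child in children:
            out += child
        out.append("</BODY>")
        out += list(trailer)
    # Short node: with no body and no trailer, only the header context remains.
    out.append(f"</{tag}>")
    return out
```

For example, `serialize_node("CHAN", header=["DM", "PCM"])` yields the short form `["<CHAN>", "DM", "PCM", "</CHAN>"]`.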

Hierarchy

The hierarchical structure of the bitstream can be specified by the structure of the body context of its nodes. The content and semantics of the header and trailer contexts associated with the nodes are specific to the environment in which the bitstream format of the present invention is employed and do not form part of the present invention.

For ease of extensibility, an application that receives and processes a bitstream formatted according to aspects of the present invention may skip and ignore out-of-context content segments and nodes. However, nodes that are in context but out of order may be treated as errors. "In context" refers to segments and nodes defined as belonging to a particular node context. For example, as described below, a Top of Channels (TOC) node is in context when it appears in a frame body but would be out of context in a GOF node. This approach provides forward compatibility: future applications can insert additional content segments and nodes while remaining compatible with older applications.

As shown in FIG. 7, a bitstream in accordance with aspects of the present invention is a hierarchy whose root holds a series of one or more Group of Frames (GOF) nodes. Only GOF nodes are in context at the root in this example.

Group of Frames (GOF) nodes

A GOF node (60...61) (FIG. 7) is an entity that carries the information needed to reproduce correctly the portion of the audio material carried in the bitstream. Frame nodes are nested within each GOF node. Ideally, a GOF node carries enough information that bitstreams can easily be manipulated (e.g., spliced) at GOF boundaries.

Node name: GOF
tag_id: (see below)
Nodes in context: Frame
Body structure: Zero or more frame nodes

Frame nodes

A frame node (62...63) (FIG. 7) contains the audio essence and metadata information corresponding to a time interval. One Top of Channels (TOC) node and one Bottom of Channels (BOC) node may be nested within each frame node. Metadata at the frame level may complement that already found at the GOF level and may change from frame node to frame node. Frame nodes can be independent if the frame-level metadata does not change across frames. Although it is not a requirement, frames may be synchronized with accompanying picture essence. Alternatively, channels may be grouped under three or more nodes, or channels may be nested directly under each frame so that channel nodes are the in-context nodes.

Node name: Frame
tag_id: (see below)
Nodes in context: TOC and BOC
Body structure: One TOC node followed by one BOC node

TOC and BOC nodes

The TOC and BOC nodes each carry the metadata and essence information corresponding to approximately half of the information contained in the frame. This arrangement can reduce latency by allowing encoders and decoders to begin processing a frame before the whole frame has been received or transmitted. The TOC and BOC body contexts have zero or more channel nodes.

Node name: TOC
tag_id: (see below)
Nodes in context: Channel
Body structure: Zero or more channel nodes

Node name: BOC
tag_id: (see below)
Nodes in context: Channel
Body structure: Zero or more channel nodes

Channel node

Each channel node represents a single independent essence entity and typically has one or more essence segments accompanied by zero or more metadata segments. In this bitstream format embodiment, if the body of the channel node is empty and no trailer is defined, the node structure can take the short node form.

Node name: Channel
tag_id: (see below)
Nodes in context: <None>
Body structure: <Empty>
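The node tables above, combined with the rule that out-of-context nodes are ignored while in-context but out-of-order nodes may be treated as errors, can be sketched in Python as follows. The `IN_CONTEXT` dictionary, the `"ROOT"` key, and the `in_context_children` function are hypothetical names introduced for illustration. The check covers ordering only; it does not enforce how many times a node may occur.

```python
# In-context child nodes per parent, in the order required by "Body structure".
IN_CONTEXT = {
    "ROOT": ["GOF"],
    "GOF": ["FRAME"],
    "FRAME": ["TOC", "BOC"],   # one TOC node followed by one BOC node
    "TOC": ["CHANNEL"],
    "BOC": ["CHANNEL"],
    "CHANNEL": [],             # empty body
}


def in_context_children(parent, children):
    """Return the children a receiver would keep for `parent`.

    Nodes that are out of context are silently dropped (extensibility);
    nodes that are in context but out of order raise ValueError.
    """
    allowed = IN_CONTEXT[parent]
    kept = [name for name in children if name in allowed]
    ranks = [allowed.index(name) for name in kept]
    if ranks != sorted(ranks):
        raise ValueError(f"in-context nodes out of order under {parent}: {kept}")
    return kept
```

With this table, a TOC node placed directly in a GOF node is simply skipped, whereas a BOC node that precedes the TOC node inside a frame is reported as an error.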

Segment specification

Segments can be specified in more detail by the following pseudocode, which is based on a simplified C language syntax. For elements larger than one bit, the bits always arrive MSB first. Fields or elements contained in the frame are shown in bold.

Tag segment parameters

"is_tag" parameter

Word size: 1

Valid range: 1

A tag segment always has an is_tag value of 1.

"start_or_end" parameter

Word size: 1

Valid range: 0 (start), 1 (end)

The value of this parameter indicates whether the tag is a start tag (0) or an end tag (1).

"is_long_id" parameter

Word size: 1

Valid range: 0 (5-bit id field), 1 (13-bit id field)

The value of this parameter indicates whether the tag_id field is 5 bits or 13 bits wide.

"tag_id" parameter

Word size: 5 or 13 (see previous parameter)

Valid range: [0..31] or [0..2¹³-1]

The value of this parameter indicates which tag the segment represents. The following tags may be defined.

Tag: Tag description
0: Frame tag
1: Top of Channels (TOC) tag
2: Bottom of Channels (BOC) tag
3: Channel tag
4: Group of Frames (GOF) tag
5: Body tag

Content segment parameters

"is_tag" parameter

Word size: 1

Valid range: 0

A content segment always has an is_tag value of 0.

"metadata_or_essence" parameter

Word size: 1

Valid range: 0 (metadata), 1 (essence)

The value of this parameter indicates whether the segment contains metadata (0) or essence (1).

"is_long_id" parameter

Word size: 1

Valid range: 0 (5-bit id field), 1 (13-bit id field)

The value of this parameter indicates whether the content_id field is 5 bits or 13 bits wide.

"content_id" parameter

Valid range: [0..31] or [0..2¹³-1]

The value of this parameter uniquely identifies the type of information contained in the segment.

"content_length_class" parameter

Word size: 2

Valid range: [0..3]

The content_length_class parameter determines the width of the content_length field, and hence the maximum length of the segment, according to the following table.

content_length_class: content_length parameter depth
0: 6 bits
1: 14 bits
2: 22 bits
3: 30 bits

"content_length" parameter

Word size: (content_length_class+1)*8-2

Valid range: [0..63] (content_length_class==0)

[0..16383] (content_length_class==1)

[0..2^22-1] (content_length_class==2)

[0..2^30-1] (content_length_class==3)

The content_length parameter gives the total length of the payload in bytes.
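The content segment header can be sketched in Python in the same way. The sketch assumes that the fields are packed in the order listed (is_tag, metadata_or_essence, is_long_id, content_id, content_length_class, content_length), MSB first, and that the payload follows immediately. Under that assumption the two length fields together always fill a whole number of bytes, since 2 + (content_length_class+1)*8-2 is a multiple of 8. The function names and the choice of the smallest adequate length class are illustrative assumptions.

```python
def encode_content(content_id: int, payload: bytes, is_essence: bool) -> bytes:
    """Build a content segment: id field, length field, then the payload."""
    if not 0 <= content_id < 1 << 13:
        raise ValueError("content_id must fit in 13 bits")
    is_long = content_id > 31
    id_bits = 13 if is_long else 5
    # is_tag = 0, so the top bit of the id word stays clear.
    id_word = (int(is_essence) << (id_bits + 1)) | (int(is_long) << id_bits) | content_id
    n = len(payload)
    for length_class in range(4):
        length_bits = (length_class + 1) * 8 - 2
        if n < 1 << length_bits:   # smallest class that can hold n
            break
    else:
        raise ValueError("payload too long for a 30-bit content_length")
    len_word = (length_class << length_bits) | n
    return (
        id_word.to_bytes((id_bits + 3) // 8, "big")
        + len_word.to_bytes(length_class + 1, "big")
        + payload
    )


def decode_content(data: bytes) -> tuple[int, bool, bytes, int]:
    """Return (content_id, is_essence, payload, bytes_consumed)."""
    first = data[0]
    if first & 0x80:
        raise ValueError("not a content segment (is_tag == 1)")
    is_essence = bool(first & 0x40)
    content_id, pos = first & 0x1F, 1
    if first & 0x20:  # is_long_id
        content_id, pos = (content_id << 8) | data[1], 2
    length_class = data[pos] >> 6
    n = data[pos] & 0x3F
    for byte in data[pos + 1 : pos + 1 + length_class]:
        n = (n << 8) | byte
    start = pos + 1 + length_class
    return content_id, is_essence, data[start : start + n], start + n
```

Because the header states the payload length, a receiver that does not recognize a content_id can skip the segment by advancing `bytes_consumed` bytes without interpreting the payload.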

Example of encapsulation of AC-3 serial coded audio bitstreams

As mentioned above, encoded audio information can be encapsulated as segments of a bitstream formatted according to aspects of the present invention. As an example, the essential parts of an AC-3 serial coded audio bitstream may be encapsulated in the following manner.

The AC-3 digital audio compression standard is described in ATSC Standard: Digital Audio Compression (AC-3), Revision A, Document A/52A, Advanced Television Systems Committee, 20 August 2001 (the "A/52A Document"). The A/52A Document is incorporated herein by reference in its entirety.

The AC-3 bitstream syntax is described in section 5 (and elsewhere) of the A/52A Document. An AC-3 serial coded audio bitstream consists of a sequence of synchronization frames ("sync frames"). FIG. 8A illustrates the mapping of two AC-3 sync frames into a bitstream in accordance with aspects of the present invention. Each AC-3 sync frame contains six coded audio blocks (AB0 through AB5), each of which represents 256 new audio samples. A synchronization information (SI) header at the beginning of each frame carries the information needed to acquire and maintain synchronization. A bitstream information (BSI) header follows the SI and carries parameters that describe the coded audio service. The coded audio blocks may be followed by an auxiliary data (Aux) field. Often, the auxiliary data consists of the null "padding" bits needed to adjust the bit length of the AC-3 frame. In some cases, however, the auxiliary data carries information. At the end of each frame is an error check field that contains a CRC word for error detection. An additional CRC word is located in the SI header; its use is optional.

FIG. 8A shows the mapping of two AC-3 sync frames into a bitstream consisting of one Group of Frames node that contains two frame nodes, each of which represents one or more AC-3 channels. The metadata items contained in the SI and BSI headers are divided into two groups: (1) generic frame metadata items, for example timecode, and (2) metadata specific to AC-3 and to each of its channels. The generic metadata is wrapped in a "GFM" metadata segment, and the specific metadata is wrapped in an "AC3M" metadata segment. The Aux block is wrapped in an Aux segment if it carries user bits and may be omitted if it is used only for padding. Because a given bitstream may travel across a variety of interfaces, some of which provide their own error correction mechanisms, error correction and detection information may be omitted (the CRC block is shown as omitted).

In particular, FIG. 8A shows two AC-3 sync frames, each containing, in order, the SI, AB0 through AB5, Aux, and CRC elements. The bitstream according to aspects of the present invention into which the two AC-3 sync frames are mapped for encapsulation contains first a GOF start tag, followed by a frame start tag (FRM), generic frame metadata (GFM), an AC-3 channel start tag (AC3), AC-3 specific metadata (AC3M), AC-3 content segments (AB0 through AB5 and Aux), an AC-3 channel end tag (AC3), a frame end tag (FRM), and then the same sequence mapped from the second AC-3 sync frame.

FIG. 8B shows the AC-3 encapsulated bitstream of FIG. 8A with two supplemental audio channels added. Each channel may be contained in a generic channel (GCH) node. The first channel may be a director's commentary (DC) channel, which may contain linear PCM samples. A generic channel metadata (GCM) segment identifies this channel as containing DC material. The second channel may be a visually impaired (VI) channel, which may consist of audio encoded with CELP (Code-Excited Linear Prediction, a lossy speech audio coding format). Again, a generic channel metadata (GCM) segment can identify the channel as containing VI material. The duration of the audio content in each additional channel preferably matches the duration of the audio content in the AC3 node, which has a fixed duration. In addition, metadata identifying the bitstream may be added in a Group of Frames metadata segment (GOFM).

In particular, FIG. 8B shows the details of the mapped first AC-3 sync frame with the supplemental director's commentary and visually impaired audio channels added. The bitstream contains first a GOF start tag, followed by bitstream identification metadata (GOFM), a frame start tag (FRM), generic frame metadata (GFM), an AC-3 channel start tag (AC3), AC-3 specific metadata (AC3M), AC-3 content segments (AB0 through AB5 and Aux), an AC-3 channel end tag (AC3), a generic channel start tag (GCH), generic channel metadata (GCM), a linear PCM audio essence segment (PCM), a generic channel end tag (GCH), a generic channel start tag (GCH), generic channel metadata (GCM), CELP-encoded audio essence (CELP), a generic channel end tag (GCH), and a frame end tag (FRM). The second frame (only partly shown) repeats the same sequence with the second frame's information.

An advantage of the format of the present invention is that inserting the two additional channels requires no modification of the AC3 data, and the insertion could have been performed while the original bitstream was being streamed; that is, inserting the VI channel in the second frame (not shown) requires no knowledge of the content of the first frame. In addition, decoders that cannot interpret the VI and/or DC channels can simply ignore them. For example, the VI and DC channels might have been added in a revision of the specification that defines the content of the bitstream. The bitstream therefore remains compatible with existing decoders.
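The insertion and skipping behaviour described for FIG. 8B can be sketched in Python as follows. Segments are modelled as hypothetical `(kind, name)` pairs, and the helper names `start`, `end`, `content`, `insert_before_end`, and `drop_unknown_nodes` are assumptions made for illustration. The skipping logic assumes a well-formed stream in which every start tag has a matching end tag.

```python
def start(name):
    return ("start", name)


def end(name):
    return ("end", name)


def content(name):
    return ("content", name)


def insert_before_end(frame, node):
    """Return a copy of `frame` with `node` placed just before its end tag.

    The segments already in the frame are copied unchanged.
    """
    return frame[:-1] + node + frame[-1:]


def drop_unknown_nodes(segments, known):
    """Skip every node whose start tag name is not in `known`."""
    out, depth = [], 0
    for kind, name in segments:
        if depth:
            # Inside a skipped node: track nesting until its end tag.
            depth += (kind == "start") - (kind == "end")
            continue
        if kind == "start" and name not in known:
            depth = 1
            continue
        out.append((kind, name))
    return out
```

A decoder that knows only the FRM and AC3 nodes would call `drop_unknown_nodes(frame, {"FRM", "AC3"})` and recover the frame as it was before the GCH nodes were inserted.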

FIG. 9 is a flow chart or functional block diagram showing various functional aspects of an encoder or encoding process for generating a bitstream similar to the example of FIG. 3, in accordance with aspects of the present invention. An audio essence stream (91), which may be, for example, samples of linear PCM encoded audio, is applied to an audio segmentation and processing function or device (93), which segments the audio into blocks of suitable duration (fixed or variable) and may apply further processing such as compression (e.g., bit rate reduction encoding). The resulting audio data may be wrapped in audio content segments, one example of which (95) is shown schematically. Information about the audio essence may be supplied to a metadata generator (97). The latter uses this information, and possibly other information such as information from a user or from other functions or devices (not shown), to generate metadata segments, which may or may not be synchronized with the audio essence, for insertion into the bitstream.

The audio content segments are passed to a channel node serializer function or device (99), which generates a channel node (compare level 3 of the hierarchy of FIG. 2) comprising one or more audio content segments and one or more associated metadata segments obtained from the metadata generator (97) (in this example, one segment of downmixing (DM) metadata), together with channel node start and end tags. An example channel node (101) is shown schematically as comprising a channel start tag (CHAN), downmix metadata (DM), an audio essence segment, and a channel end tag (CHAN).

채널 노드는 프레임 노드 시리얼화기(103)에 공급되어 입력 채널 노드 및 메타데이터 생성기(97)로부터 얻어진 연관된 프레임 레벨 메타데이터(이 예에서는 타임코드(TC) 메타데이터의 한 세그먼트)을 포함하는 프레임 노드(도 2의 계층의 레벨 2와 비교)를 프레임 노드 시작- 및 끝-태그들과 함께 생성한다. 프레임 노드의 예(105)가 프레임 시작-태그(REAM), 타임코드 메타데이터(TC), 채널 노드 시퀀스, 및 프레임 끝-태그(FRAM)을 포함하는 것으로서 개략적으로 도시되었다.The channel node is supplied to the frame node serializer 103 and includes the frame node metadata (in this example, a segment of timecode (TC) metadata) obtained from the input channel node and the metadata generator 97. (Compared to level 2 of the layer of FIG. 2) with frame node start- and end-tags. An example 105 of a frame node is schematically illustrated as including a frame start-tag (REAM), timecode metadata (TC), channel node sequence, and frame end-tag (FRAM).

The frame nodes are supplied to a group-of-frames (gof) node serializer function or device 107, which assembles successive frame nodes and an associated segment of title (TITL) metadata obtained from the metadata generator 97, together with group-of-frames start- and end-tags, into a complete bitstream (compare level 1 of the hierarchy of FIG. 2). An example of a complete bitstream is shown schematically as comprising a group-of-frames start tag (GOF), title metadata (TITL), a sequence of two frames, and a group-of-frames end tag (GOF).
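
The three serialization stages described above (channel node, frame node, group-of-frames node) might be sketched, purely as an illustration, by nesting start-tag, metadata, body, and end-tag in that order. The sketch models the bitstream as a flat list of hypothetical tokens, `("start", id)`, `("end", id)`, and `("content", id, payload)`, rather than as bits, because no binary layout is assumed here. The function names, the `"AUDIO"` identifier, and the sample timecode and downmix values are assumptions made for this example.

```python
from typing import Any, List, Tuple

# Hypothetical token model (not a binary layout from the specification):
#   ("start", id) | ("end", id) | ("content", id, payload)
Token = Tuple[Any, ...]


def make_node(tag: str, metadata: List[Token], body: List[Token]) -> List[Token]:
    """Wrap metadata and body tokens between a start-tag and an end-tag."""
    return [("start", tag), *metadata, *body, ("end", tag)]


def channel_node(dm: Any, audio: Any) -> List[Token]:
    """Level 3: downmix metadata followed by one audio essence segment."""
    return make_node("CHAN", [("content", "DM", dm)], [("content", "AUDIO", audio)])


def frame_node(timecode: str, channels: List[List[Token]]) -> List[Token]:
    """Level 2: timecode metadata followed by a sequence of channel nodes."""
    body = [tok for ch in channels for tok in ch]
    return make_node("FRAM", [("content", "TC", timecode)], body)


def gof_node(title: str, frames: List[List[Token]]) -> List[Token]:
    """Level 1: title metadata followed by successive frame nodes."""
    body = [tok for fr in frames for tok in fr]
    return make_node("GOF", [("content", "TITL", title)], body)


# Two frames of two channels each; all payload values are illustrative.
frames = [
    frame_node("00:00:00:00", [channel_node(0.7, (1, 2)), channel_node(0.7, (3, 4))]),
    frame_node("00:00:00:01", [channel_node(0.7, (5, 6)), channel_node(0.7, (7, 8))]),
]
bitstream = gof_node("Demo", frames)

# The order of start-tags follows an ordered traversal of the tree hierarchy.
tag_order = [tok[1] for tok in bitstream if tok[0] == "start"]
print(tag_order)
```

Because each stage only wraps the output of the stage below it, the flat token order is the ordered traversal of the tree, and each level can be produced without knowledge of the internal structure of its child nodes.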

FIG. 10 is a flow diagram or functional block diagram showing various functional aspects of a decoder or decoding process for deriving audio essence and metadata from a bitstream such as that of the examples of FIGS. 3 and 9, in accordance with aspects of the present invention.

A bitstream such as that generated by the example of FIG. 9 is applied to a group-of-frames (gof) node deserializer 121. The gof node deserializer recognizes and strips the gof start and end tags and the gof metadata (in this example, title (TITL) metadata), sends the metadata to a metadata interpreter 123, and sends the frame nodes to a frame node deserializer 125. An example of a frame node 105, which may be essentially the same as the frame node 105 of FIG. 9, is shown schematically.

The frame node deserializer 125 recognizes and strips the frame node start and end tags and the frame metadata (in this example, timecode (TC) metadata), sends the metadata to the metadata interpreter 123, and sends the channel nodes to a channel node deserializer 127. An example of a channel node 101, which may be essentially the same as the channel node 101 of FIG. 9, is shown schematically.

The channel node deserializer 127 recognizes and strips the channel node start and end tags and the channel metadata (in this example, downmix (DM) metadata), sends the metadata to the metadata interpreter 123, and sends the audio essence segments to an audio rendering process or device 129, which reassembles an audio essence stream 91 that may be essentially the same as the audio essence applied to the encoder or encoding process of FIG. 9.
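
As a non-limiting sketch, the staged deserialization described above might be modeled as follows. The bitstream is represented as a flat list of hypothetical tokens, `("start", id)`, `("end", id)`, and `("content", id, payload)`; this token model, the helper names `unwrap`, `split_nodes`, and `decode`, and the sample payload values are assumptions made for the example and do not reflect any binary layout. The sketch also assumes that each node carries exactly one metadata segment directly after its start-tag and that nodes of the same type are not nested within one another.

```python
from typing import Any, List, Tuple

Token = Tuple[Any, ...]  # hypothetical token model, see the lead-in


def unwrap(tokens: List[Token], tag: str, meta_id: str) -> Tuple[Any, List[Token]]:
    """Strip a node's start/end tags and leading metadata segment.
    Returns (metadata payload, remaining inner tokens)."""
    if len(tokens) < 3 or tokens[0] != ("start", tag) or tokens[-1] != ("end", tag):
        raise ValueError(f"not a well-formed {tag} node")
    if tokens[1][:2] != ("content", meta_id):
        raise ValueError(f"{tag} node lacks {meta_id} metadata")
    return tokens[1][2], tokens[2:-1]


def split_nodes(tokens: List[Token], tag: str) -> List[List[Token]]:
    """Split a run of sibling nodes of one type into separate nodes."""
    nodes: List[List[Token]] = []
    current = None
    for tok in tokens:
        if current is None:
            if tok != ("start", tag):
                raise ValueError(f"expected start of {tag} node")
            current = [tok]
        else:
            current.append(tok)
            if tok == ("end", tag):
                nodes.append(current)
                current = None
    if current is not None:
        raise ValueError(f"unterminated {tag} node")
    return nodes


def decode(bitstream: List[Token]):
    """gof -> frame -> channel deserialization. Metadata is collected for an
    interpreter; audio payloads are collected in order for a renderer."""
    metadata, audio = [], []
    title, frame_run = unwrap(bitstream, "GOF", "TITL")
    metadata.append(("GOF", "TITL", title))
    for frame in split_nodes(frame_run, "FRAM"):
        tc, channel_run = unwrap(frame, "FRAM", "TC")
        metadata.append(("FRAM", "TC", tc))
        for chan in split_nodes(channel_run, "CHAN"):
            dm, content = unwrap(chan, "CHAN", "DM")
            metadata.append(("CHAN", "DM", dm))
            audio.extend(payload for _, _, payload in content)
    return metadata, audio


# Illustrative input: one frame containing two channel nodes.
bitstream = [
    ("start", "GOF"), ("content", "TITL", "Demo"),
    ("start", "FRAM"), ("content", "TC", "00:00:00:00"),
    ("start", "CHAN"), ("content", "DM", 0.7), ("content", "AUDIO", (1, 2)), ("end", "CHAN"),
    ("start", "CHAN"), ("content", "DM", 0.7), ("content", "AUDIO", (3, 4)), ("end", "CHAN"),
    ("end", "FRAM"),
    ("end", "GOF"),
]
metadata, audio = decode(bitstream)
print(metadata)
print(audio)
```

Each stage inspects only the tags and metadata of its own level and passes the inner tokens on unchanged, mirroring the hand-off from deserializer 121 to 125 to 127.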

The metadata interpreter 123 interprets the various metadata and may apply it to functions and/or devices (not shown) and to the audio rendering process or device 129.

The present invention and its various aspects may be implemented in a variety of ways, for example as software functions performed in digital signal processors, programmed general-purpose digital computers, and/or special-purpose digital computers. Interfaces between analog and/or digital signal streams may be performed in suitable hardware and/or as functions in software and/or firmware. Although the present invention and its various aspects may have analog audio signals as their source, most or all of the processing functions that practice aspects of the invention are likely to be performed in the digital domain, on digital signal streams in which the audio signals are represented by samples.

A bitstream formatted in accordance with aspects of the present invention may be stored or transmitted by any one or more known data storage and transmission media.

It should be understood that the implementation of other variations and modifications of the invention and its various aspects will be apparent to those skilled in the art, and that the invention is not limited to the specific embodiments described. The present invention is therefore intended to cover any and all modifications, variations, or equivalents that fall within the spirit and scope of the basic underlying principles disclosed and claimed herein.

Claims

1. A bitstream format for representing audio information in which the bitstream syntax is described by an ordered traversal of a tree hierarchy data structure, the tree hierarchy comprising:

a plurality of tree hierarchy levels, each having one or more nodes, in which at least some progressively smaller subdivisions of the audio information are represented in progressively lower levels of the tree hierarchy, wherein the audio information is included among nodes in one or more of said levels.

2. The bitstream format of claim 1, wherein the progressively smaller subdivisions of the audio information comprise one or more of temporal subdivisions, spatial subdivisions, and resolution subdivisions.

3. The bitstream format of claim 1, wherein the first level of the tree hierarchy includes a root node representing all of the audio information and at least one lower level comprises a plurality of nodes representing time intervals of the audio information.

4. The bitstream format of claim 3 wherein the at least one lower level comprises a plurality of nodes representing spatial subdivision of audio information.

5. The bitstream format of claim 1, wherein the bitstream comprises a series of independent tag segments and content segments, each tag segment serving as a delimiter and each content segment containing a payload of audio information or of data associated with audio information, wherein said segments are arranged into hierarchically nested nodes that are structurally independent between levels of said tree hierarchy.

6. The bitstream format of claim 5 wherein each node is delimited by start and end tag segments.

7. The bitstream format of claim 6, wherein the start- and end-tag segments delimit header and footer contexts within a node.

8. The bitstream format of any preceding claim, wherein a node comprising one or more content segments carrying audio information comprises one or more content segments carrying metadata related to the audio information in the one or more content segments carrying audio information.

9. A bitstream formatted in accordance with the bitstream format of claim 1.

10. A system for encoding and decoding a bitstream formatted in accordance with the bitstream format of claim 1.

11. An encoder for encoding a bitstream formatted in accordance with the bitstream format of claim 1.

12. A decoder for decoding a bitstream formatted in accordance with the bitstream format of claim 1.

13. An apparatus for transcoding a bitstream formatted in accordance with the bitstream format of claim 1.

14. A process for generating a bitstream formatted in accordance with the bitstream format of claim 1.

15. A process for encoding and decoding a bitstream formatted in accordance with the bitstream format of claim 1.

16. A process for encoding a bitstream formatted in accordance with the bitstream format of claim 1.

17. A process for decoding a bitstream formatted in accordance with the bitstream format of claim 1.

18. A process for transcoding a bitstream formatted in accordance with the bitstream format of claim 1.

19. A medium for storing or transmitting the bitstream of claim 9.