KR20060122734A

KR20060122734A - Encoding / Decoding Method of Audio Signal for Selecting Spatial Information Transmission Method

Info

Publication number: KR20060122734A
Application number: KR1020060046972A
Authority: KR
Inventors: 오현오; 김효진; 김동수; 정양원; 임재현; 방희석
Original assignee: 엘지전자 주식회사
Priority date: 2005-05-26
Filing date: 2006-05-25
Publication date: 2006-11-30
Also published as: KR20060122694A; KR20060122693A; KR20060122692A

Abstract

The present invention relates to a method of encoding and decoding spatial information of a multichannel audio signal, and more particularly to a method of encoding and decoding a multichannel audio signal in which the method of transmitting the spatial information bitstream can be selected.

To this end, the present invention provides an encoding method in which the transmission method can be selected from among embedding the spatial information bitstream in the downmixed audio signal, sending the spatial information bitstream separately, and both embedding it in the downmix audio signal and sending it separately, so that a multichannel audio signal can be transmitted over various types of transmission media. The present invention also provides a method of decoding the spatial information bitstream according to each of these schemes, that is, when the spatial information bitstream is embedded in the downmix audio signal, when it is sent separately, or when it is both embedded and sent separately, so that a spatial information bitstream delivered in any of these ways can be used.

Description

ENCODING AND DECODING METHOD OF AUDIO SIGNAL WITH SELECTABLE TRANSMISSION METHOD OF SPATIAL BITSTREAM

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram illustrating how a human perceives spatial information of an audio signal in the present invention.

FIG. 2 is a block diagram of a spatial encoder according to the present invention.

FIG. 3 is a detailed block diagram of the embedding unit of the spatial encoder of FIG. 2 according to the present invention.

FIG. 4 is a diagram of a first method of rearranging a spatial information bitstream according to the present invention.

FIG. 5 is a diagram of a second method of rearranging a spatial information bitstream according to the present invention.

FIG. 6A is a diagram of the configuration of a spatial information bitstream for embedding according to the present invention.

FIG. 6B is a detailed diagram of the configuration of the spatial information bitstream for embedding of FIG. 6A.

FIG. 7 is a block diagram of a spatial decoder according to the present invention.

FIG. 8 is a detailed block diagram of the embedded signal decoder of the spatial decoder according to the present invention.

FIG. 9 is a diagram of a case in which an audio signal according to the present invention is reproduced by a general PCM decoder.

FIG. 10 is a flowchart of an encoding method that embeds a spatial information bitstream in a downmixed audio signal according to the present invention.

FIG. 11 is a flowchart of a method of decoding a spatial information bitstream embedded in a downmixed audio signal according to the present invention.

FIG. 12 is a diagram showing the frame (or subframe) size of a spatial information bitstream embedded in a downmixed audio signal according to the present invention.

FIG. 13 is a diagram showing a spatial information bitstream embedded at a fixed size in a downmixed audio signal according to the present invention.

FIG. 14A is a diagram showing a first method for solving the time-alignment problem of a spatial information bitstream embedded at a fixed size.

FIG. 14B is a diagram showing a second method for solving the time-alignment problem of a spatial information bitstream embedded at a fixed size.

FIG. 15 is a diagram showing a spatial information bitstream being combined with the downmix audio signal in units of the frame size applied in the decoding stage.

FIG. 16 is a flowchart of a method of encoding a spatial information bitstream embedded at various sizes in a downmixed audio signal according to the present invention.

FIG. 17 is a flowchart of a method of encoding a spatial information bitstream embedded at a fixed size in a downmixed audio signal according to the present invention.

FIG. 18 is a diagram showing a first method of embedding a spatial information bitstream in an audio signal downmixed to at least two channels according to the present invention.

FIG. 19 is a diagram showing a second method of embedding a spatial information bitstream in an audio signal downmixed to at least two channels according to the present invention.

FIG. 20 is a diagram showing a third method of embedding a spatial information bitstream in an audio signal downmixed to at least two channels according to the present invention.

FIG. 21 is a diagram showing a fourth method of embedding a spatial information bitstream in an audio signal downmixed to at least two channels according to the present invention.

FIG. 22 is a diagram showing a fifth method of embedding a spatial information bitstream in an audio signal downmixed to at least two channels according to the present invention.

FIG. 23 is a diagram showing a sixth method of embedding a spatial information bitstream in an audio signal downmixed to at least two channels according to the present invention.

FIG. 24 is a diagram showing a seventh method of embedding a spatial information bitstream in an audio signal downmixed to at least two channels according to the present invention.

FIG. 25 is a flowchart of a method of encoding a spatial information bitstream embedded in a two-channel downmixed audio signal according to the present invention.

FIG. 26 is a flowchart of a method of decoding a spatial information bitstream embedded in a two-channel downmixed audio signal according to the present invention.

FIG. 27 is a diagram of a first spatial encoder capable of selecting the method of transmitting a spatial information bitstream according to the present invention.

FIG. 28 is a diagram of a second spatial encoder capable of selecting the method of transmitting a spatial information bitstream according to the present invention.

FIG. 29 is a bitstream structure diagram for the case in which a spatial information bitstream is sent separately in the form of an MPEG bitstream according to the present invention.

FIG. 30 is a flowchart of a method of transmitting a spatial information bitstream according to the present invention.

FIG. 31 is a diagram of a first spatial decoder capable of decoding a spatial information bitstream transmitted according to the present invention.

FIG. 32 is a diagram of a second spatial decoder capable of decoding a spatial information bitstream transmitted according to the present invention.

FIG. 33 is a diagram of a third spatial decoder capable of decoding a spatial information bitstream transmitted according to the present invention.

FIG. 34 is a flowchart of a decoding method using the first spatial decoder according to the present invention.

FIG. 35 is a flowchart of a decoding method using the second spatial decoder according to the present invention.

FIG. 36 is a flowchart of a decoding method using the third spatial decoder according to the present invention.

Explanation of reference numerals for the main parts of the drawings

101. Remote sound source
102. Direct sound wave
104. Reflected sound wave
201. Multichannel audio signal
202. Artistic downmix signal
203. Downmix unit
204. Spatial information extraction unit
205. Downmixed audio signal
206. Spatial information encoding unit
207. Embedding unit
208. Embedded audio signal
303. Buffer
304. Masking threshold calculation unit
305. Bitstream reconstruction unit
306. Encoding unit
401. Spatial information bitstream
402. Bit plane
405. Frame
601. Header
602. Frame data
603. Sync word
604. K value
606. Error detection code or error correction code
702. Embedded signal decoder
703. Spatial information decoder
704. Multichannel generation unit
902. Sync word search unit
903. Header decoding unit
904. Data inverse-transform unit
1302. TS header
1303. TS (Transport Stream)
1406. TS sync word
1408. Flag
1409. Position value
2704. Spatial information extraction unit
2708. Transmission method selection unit
2903. Extension bitstream
2906. Auxiliary data header
2907. Auxiliary data bitstream

The present invention relates to a method of encoding and decoding spatial information of a multichannel audio signal.

Various coding techniques and methods for digital audio signals have recently been developed, and related products are being manufactured. Coding methods have also been developed that use spatial information of a multichannel audio signal to convert a mono or stereo audio signal into a multichannel signal at the decoding stage, and products based on them are in practical use. In such cases, the mono or stereo audio signal and the spatial information are compressed with a compression method such as AAC (Advanced Audio Coding) and then transmitted.

However, some recording media have no area in which spatial information can be stored. In such cases only the mono or stereo audio signal is stored or transmitted, so the decoded audio is reproduced as a mono or stereo signal rather than a multichannel signal, and the sound is perceived as flat. In addition, when the spatial information is transmitted together with the downmixed audio signal by only one method, not all types of decoders can be used.

Accordingly, an object of the present invention, proposed to solve the above problems, is to provide an encoding method for multichannel audio signals in which the method of transmitting the spatial information can be selected. Another object of the present invention is to provide a method of decoding spatial information that has been encoded and transmitted in this manner.

To achieve the above objects, the present invention provides a method of encoding a multichannel audio signal, comprising: downmixing the multichannel audio signal and extracting spatial information from the multichannel audio signal; generating a spatial information bitstream using the spatial information; selecting a method of transmitting the generated spatial information bitstream; and generating, according to the selected method, an entire bitstream that includes the spatial information bitstream and the downmix audio signal. Here, the transmission method may be a method of embedding the spatial information bitstream in the downmix audio signal, a method of transmitting the spatial information bitstream separately from the downmix audio signal, or a method of both embedding the spatial information bitstream in the downmix audio signal and transmitting it separately from the downmix audio signal.
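The selection step can be illustrated with a minimal sketch. The names `TransmissionMode`, `passthrough_embed`, and `build_entire_bitstream`, the numeric identification codes, and the dictionary layout are all hypothetical and are not taken from the specification; the embedding step is replaced by a placeholder that returns the samples unchanged.

```python
from enum import Enum


class TransmissionMode(Enum):
    """Hypothetical identification codes for the three transmission methods."""
    EMBEDDED = 0   # spatial bitstream embedded in the downmix samples
    SEPARATE = 1   # spatial bitstream carried separately from the downmix
    BOTH = 2       # embedded and also carried separately


def passthrough_embed(downmix, spatial):
    """Placeholder for the embedding step; returns the samples unchanged."""
    return list(downmix)


def build_entire_bitstream(downmix, spatial, mode, embed=passthrough_embed):
    """Assemble the entire bitstream according to the selected method."""
    samples = list(downmix)
    if mode in (TransmissionMode.EMBEDDED, TransmissionMode.BOTH):
        samples = embed(samples, spatial)
    # The separate copy is present only when the selected method asks for it.
    side = spatial if mode in (TransmissionMode.SEPARATE, TransmissionMode.BOTH) else None
    return {"mode": mode, "downmix": samples, "side_spatial": side}
```

Carrying the mode alongside the payload corresponds to the identification information that the entire bitstream may include, so that a decoder can tell which of the three cases it has received.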

To achieve the above objects, the present invention also provides a method of encoding a multichannel audio signal, comprising: downmixing the multichannel audio signal and extracting spatial information from the multichannel audio signal; generating a spatial information bitstream using the spatial information; embedding the generated spatial information bitstream in the downmix audio signal; and generating an entire bitstream that includes the downmix audio signal in which the spatial information bitstream is embedded, while determining whether the entire bitstream is also to include the spatial information bitstream separately. Here, the entire bitstream may be configured either not to include or to include the spatial information bitstream separately. The entire bitstream may also include identification information on the transmission method.

To achieve the above objects, the present invention also provides a method of decoding into a multichannel audio signal, comprising: receiving an entire bitstream that includes a downmix audio signal in which a spatial information bitstream is embedded; extracting the spatial information bitstream embedded in the downmix audio signal; and converting the downmix audio signal into a multichannel audio signal using the spatial information obtained by decoding the spatial information bitstream. Here, identification information indicating that the spatial information bitstream is embedded in the downmix audio signal may be extracted from the entire bitstream, and the spatial information bitstream may be extracted using that identification information.

To achieve the above objects, the present invention also provides a method of decoding into a multichannel audio signal, comprising: receiving an entire bitstream that includes a spatial information bitstream and a downmix audio signal; extracting from the entire bitstream the spatial information bitstream transmitted separately from the downmix audio signal; and converting the downmix audio signal into a multichannel audio signal using the spatial information obtained by decoding the spatial information bitstream. The spatial information bitstream may be stored and transmitted in a data track of a CD, in an extension bitstream of an MPEG bitstream, or in an auxiliary data area within the core codec bitstream of an MPEG bitstream. The decoding method further includes, before the extraction step (step (b)), extracting from the entire bitstream identification information indicating that the spatial information bitstream was transmitted separately from the downmix audio signal.

To achieve the above objects, the present invention also provides a method of decoding into a multichannel audio signal, comprising: receiving an entire bitstream that includes a downmix audio signal in which a spatial information bitstream is embedded and a separately transmitted spatial information bitstream; either extracting the spatial information bitstream from the downmix audio signal and decoding the extracted bitstream, or decoding the separately transmitted spatial information bitstream; and converting the downmix audio signal into a multichannel audio signal using the spatial information obtained by decoding the spatial information bitstream. In this case, identification information indicating that the spatial information bitstream was both embedded in the downmix audio signal and transmitted separately from it may be extracted from the entire bitstream.
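The three decoding cases above can be summarized as a dispatch on the identification information. The sketch below is illustrative only: the function name `choose_spatial_bitstream`, the string identifiers, and the `extract_embedded` callback are hypothetical, and the preference for the separate copy in the "both" case is an assumption, since the text allows either copy to be decoded.

```python
def choose_spatial_bitstream(mode, extract_embedded, side_spatial=None):
    """Return the spatial information bitstream the decoder should decode.

    mode             -- identification information: "embedded", "separate" or "both"
    extract_embedded -- zero-argument callable that pulls the embedded bitstream
                        out of the downmix samples (hypothetical helper)
    side_spatial     -- separately transmitted bitstream, or None if absent
    """
    if mode == "embedded":
        return extract_embedded()
    if mode == "separate":
        if side_spatial is None:
            raise ValueError("mode is 'separate' but no separate bitstream is present")
        return side_spatial
    if mode == "both":
        # Either copy may be used; this sketch prefers the separate copy and
        # falls back to extraction when it is missing.
        return side_spatial if side_spatial is not None else extract_embedded()
    raise ValueError(f"unknown transmission mode: {mode!r}")
```

A decoder that cannot extract embedded data could pass a callback that raises an error, which keeps the "separate" and "both" cases usable for it.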

To achieve the above objects, the present invention also provides a method of encoding a multichannel audio signal, comprising: downmixing the multichannel audio signal and extracting spatial information from the multichannel audio signal; generating a spatial information bitstream for each frame using the spatial information; and combining the spatial information bitstream with the downmixed audio signal to form an entire bitstream, wherein the spatial information bitstream, whose frame size is N, is combined with the downmix audio signal in units of the frame size (S) at which the spatial information is decoded and applied. Here, position information on the range to which the spatial information bitstream applies may be inserted in the header of the spatial information bitstream. A sync word marking the start position of the spatial information bitstream may also be inserted in the header of the spatial information bitstream.

To achieve the above objects, the present invention also provides a method of decoding into a multichannel audio signal, comprising: receiving an entire bitstream that includes a downmixed audio signal combined with a spatial information bitstream; and extracting from the entire bitstream, and decoding, the spatial information bitstream that was combined with the downmixed audio signal in units of the frame size (S) at which the spatial information is decoded and applied. Here, position information indicating where the spatial information bitstream is to be applied may be read from the entire bitstream.

To achieve the above objects, the present invention also provides a method of generating an audio signal, wherein the audio signal is generated so as to include a downmixed audio signal and a spatial information bitstream, and the spatial information bitstream is combined with the downmixed audio signal to form an entire bitstream in units of the frame size (S) at which the spatial information bitstream is decoded and applied.

Preferred embodiments of the present invention that can concretely realize the above objects are described below with reference to the accompanying drawings.

FIG. 1 illustrates how a human perceives spatial information of an audio signal in the present invention. Coding methods for multichannel audio signals exploit the fact that, because humans perceive an audio signal in three-dimensional space, the audio signal can be represented as three-dimensional spatial information by means of a plurality of parameter sets. These parameters, called "spatial parameters", represent the spatial information of a multichannel audio signal and include CLD (Channel Level Differences), ICC (Inter-Channel Coherences), and CTD (Channel Time Difference). The CLD is the energy difference between two channels, the ICC is the correlation between two channels, and the CTD is the time difference between two channels.
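As an illustration of these three definitions, the sketch below computes full-band estimates for one pair of channels. It is a simplification under stated assumptions: the specification gives no formulas, the function names `cld_db`, `icc`, and `ctd_samples` are hypothetical, and practical coders typically evaluate such parameters per time and frequency band rather than over the whole signal.

```python
import math


def cld_db(x, y, eps=1e-12):
    """Channel level difference: energy ratio of two channels, in dB."""
    ex = sum(s * s for s in x)
    ey = sum(s * s for s in y)
    return 10.0 * math.log10((ex + eps) / (ey + eps))


def icc(x, y, eps=1e-12):
    """Inter-channel coherence: normalized cross-correlation at lag 0."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / (den + eps)


def ctd_samples(x, y, max_lag):
    """Channel time difference: lag in samples that maximizes cross-correlation.

    A positive result means y is delayed relative to x.
    """
    def xcorr(lag):
        return sum(x[n] * y[n + lag] for n in range(len(x)) if 0 <= n + lag < len(y))
    return max(range(-max_lag, max_lag + 1), key=xcorr)
```

For example, a channel pair in which one channel has twice the amplitude of the other yields a CLD of about 6 dB, and an impulse delayed by two samples yields a CTD of 2.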

FIG. 1 shows how a human perceives an audio signal spatially and how the concept of the spatial parameters arises. A direct sound wave 103 from a remote sound source 101 reaches the left ear 107, while another direct sound wave 102 is diffracted around the head and reaches the right ear 106. The two sound waves 102 and 103 differ in arrival time and energy level, and these differences give rise to the CTD and CLD parameters. If reflected sound waves 104 and 105 reach both ears, or if the sound source 101 is diffuse, mutually uncorrelated sound waves reach both ears, which gives rise to the ICC parameter. Using spatial parameters generated on this principle, a multichannel audio signal can be transmitted as a mono or stereo signal and then output again as a multichannel signal. The present invention provides a method in which the spatial information, that is, the spatial parameters, is embedded in the mono or stereo audio signal for transmission and the signal is then reproduced as a multichannel audio signal.

FIG. 2 is a block diagram of a spatial encoder according to the present invention. As shown, the spatial encoder first receives a multichannel audio signal 201, where N denotes the number of input channels. The multichannel audio signal 201 is converted by the downmix unit 203 into a downmixed audio signal (Lo and Ro, 205). The downmixed audio signal is a PCM signal but is referred to in this specification simply as the downmixed audio signal. The downmixed audio signal 205 may be a mono or a stereo audio signal; a stereo audio signal is used as the example in this specification for convenience, but the present invention is not limited thereto. The spatial information of the multichannel audio signal, that is, the spatial parameters, is extracted from the multichannel audio signal 201 by the spatial information extraction unit 204. Here, spatial information means the information about the audio channels that is used when a multichannel (for example, left, right, center, left surround, right surround) audio signal is downmixed, the downmixed audio signal 205 is transmitted, and the transmitted downmixed audio signal is upmixed back to multichannel. Optionally, the downmixed audio signal 205 may be generated using a downmixed audio signal supplied directly from outside, for example an artistic downmix signal 202.

The spatial information extracted by the spatial information extraction unit 204 is encoded by the spatial information encoding unit 206 into a spatial information bitstream for transmission and storage. The spatial information bitstream is suitably transformed and then inserted by the embedding unit 207 directly into the signal to be transmitted, that is, the downmixed audio signal 205, for which a "digital audio embedding method" may be used. For example, when the downmixed audio signal 205 is a raw PCM audio signal that is to be stored on a storage medium on which spatial information is difficult to store (for example, a stereo compact disc) or transmitted over an interface such as SPDIF (Sony/Philips Digital Interface), there is, unlike in the case of compression coding such as AAC, no ancillary data field in which the spatial information bitstream can be stored. With a digital audio embedding method, the spatial information bitstream can be embedded in the raw PCM audio signal without degrading sound quality, and a general decoder cannot distinguish the audio signal carrying the embedded bitstream from the original signal. That is, from the standpoint of a general PCM decoder, the output signal Lo'/Ro' 208 in which the spatial information bitstream is embedded can be regarded as the same signal as the input signal Lo/Ro 205.

Digital audio embedding methods include the bit replacement coding method, the echo hiding method, and the spread-spectrum-based method. The bit replacement coding method inserts the desired information by modifying the least significant bits of quantized audio samples, exploiting the property that modifying the least significant bits has almost no effect on the quality of the audio signal. The echo hiding method inserts into the audio signal an echo small enough to be inaudible to the human ear. The spread-spectrum-based method transforms the audio signal into the frequency domain through a discrete cosine transform, a discrete Fourier transform, or the like, spreads the desired binary information with a PN (pseudo noise) sequence, and adds the result to the frequency-domain audio signal. The description below centers on the bit replacement coding method, but the present invention is not limited to that method.
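The bit replacement idea can be sketched as follows for signed 16-bit PCM samples. This is a minimal illustration, not the embedding procedure of the specification: the function names `embed_lsb` and `extract_lsb`, the 16-bit sample width, and the most-significant-bit-first packing order are assumptions.

```python
def embed_lsb(samples, payload_bits, k=1):
    """Overwrite the k least significant bits of each 16-bit sample with payload bits."""
    mask = (1 << k) - 1
    out = list(samples)
    bits = list(payload_bits)
    bits += [0] * (-len(bits) % k)          # pad with zeros to a multiple of k
    if len(bits) // k > len(out):
        raise ValueError("payload does not fit in the given samples")
    for i in range(len(bits) // k):
        chunk = 0
        for b in bits[i * k:(i + 1) * k]:   # pack k bits, most significant first
            chunk = (chunk << 1) | b
        u = out[i] & 0xFFFF                 # two's-complement view of the sample
        u = (u & ~mask & 0xFFFF) | chunk    # clear the k low bits, then insert
        out[i] = u - 0x10000 if u & 0x8000 else u
    return out


def extract_lsb(samples, n_bits, k=1):
    """Read n_bits payload bits back from the k low bits of each sample."""
    mask = (1 << k) - 1
    bits = []
    for s in samples:
        chunk = (s & 0xFFFF) & mask
        for shift in range(k - 1, -1, -1):
            bits.append((chunk >> shift) & 1)
        if len(bits) >= n_bits:
            break
    return bits[:n_bits]
```

Each modified sample differs from the original by at most 2^k - 1 quantization steps, which is why a small k leaves the signal effectively unchanged for a general PCM decoder.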

FIG. 3 is a detailed block diagram of the embedding unit (207) of the spatial encoder of FIG. 2 according to the present invention. When the spatial information bitstream is embedded in the downmixed audio signal by the bit replacement coding method, the number of insertion bits available for embedding (hereinafter, the K value) need not be limited to the single least significant bit; K bits (K > 0) may be used according to a predetermined method. The K bits may be the least significant bits of the downmixed audio signal, but are not limited to them. The predetermined method is, for example, to obtain a masking threshold according to a psychoacoustic model and to allocate a suitable number of bits according to that threshold. As shown, the downmixed audio signal Lo/Ro (301) is passed through the buffer (303) in the embedding unit to the encoding unit (306). The masking threshold calculation unit (304) divides the input audio signal into sections (for example, blocks) and obtains a masking threshold for each section. From the masking threshold it also obtains the insertion bit value of the downmixed audio signal that can be altered without audible distortion, that is, the K value. In other words, the number of bits that can be used to embed the spatial information bitstream in the downmixed audio signal is allocated block by block. In this specification, a block is a unit of data within a frame that is inserted using a single insertion bit value (K value). A frame may contain one or more blocks; hence, if the frame length is fixed, the block length decreases as the number of blocks increases. Once the usable K value is determined, it may be included in the spatial information bitstream: the bitstream reconstruction unit (305) may reconstruct the spatial information bitstream so that it includes the K value. The spatial information bitstream may then also include a sync word, an error detection code, or an error correction code.

The reconstructed spatial information bitstream may be rearranged into an embeddable form. The rearranged bitstream is embedded in the downmixed audio signal by the encoding unit (306) and output as the audio signal Lo'/Ro' (307) in which the spatial information bitstream is embedded. The spatial information bitstream may be embedded within K bits of the downmixed audio signal. The K value may be a single fixed value within a block, or it may vary within the block according to the shape of the audio signal (for example, its temporal shape). In either case, the K value is inserted as side information during the reconstruction or rearrangement of the spatial information bitstream and transmitted to the decoder, which can use it to restore the spatial information bitstream data.

As described above, the spatial information bitstream is combined with the downmixed audio signal block by block. Various combining methods may be used. The first method simply replaces the lower K bits of the downmixed audio signal with zeros and then adds the modified spatial information bitstream data. For example, if K is 3, one sample of the downmixed audio signal is 11101101, and the spatial information bitstream data to be embedded is 111, the lower 3 bits of 11101101 are replaced with 0 to give 11101000, and the data 111 is then added to give 11101111.
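
The first combining method can be sketched in a few lines. This is only an illustration for non-negative integer samples; the function names `embed_replace` and `extract` are hypothetical and do not come from the specification.

```python
def embed_replace(sample: int, data: int, k: int) -> int:
    """Zero the lower k bits of a sample, then add the k-bit data value."""
    if not 0 <= data < (1 << k):
        raise ValueError("data must fit in k bits")
    mask = (1 << k) - 1
    return (sample & ~mask) + data


def extract(sample: int, k: int) -> int:
    """Read back the lower k bits of a sample."""
    return sample & ((1 << k) - 1)


# Worked example from the text: K = 3, sample 11101101, data 111.
embedded = embed_replace(0b11101101, 0b111, 3)
print(format(embedded, "08b"))
```

Because the data occupies exactly the cleared bits, the addition is equivalent to a bitwise OR here.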

The second method uses dithering: the modified spatial information bitstream data is first subtracted from the downmixed audio signal, the result is requantized based on the K value, and the modified spatial information bitstream data is then added to the requantized signal. For example, if K is 3, one sample of the downmixed audio signal is 11101101, and the spatial information bitstream data to be embedded is 111, then 111 is subtracted from 11101101 to give 11100110, which is requantized above the lower 3 bits to 11101000 (with rounding), and 111 is then added to give 11101111.
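
A minimal sketch of the subtract, requantize, and add sequence follows. It assumes integer samples, rounding to the nearest multiple of 2^K with ties rounded up, and no clipping at the edges of the sample range; the name `embed_requantize` is hypothetical.

```python
def embed_requantize(sample: int, data: int, k: int) -> int:
    """Subtract the data, requantize to a multiple of 2**k, add the data back."""
    if not 0 <= data < (1 << k):
        raise ValueError("data must fit in k bits")
    step = 1 << k
    diff = sample - data
    # Round diff to the nearest multiple of step (assumption: ties round up).
    requantized = ((diff + step // 2) >> k) << k
    return requantized + data


# Worked example from the text: 11101101 - 111 = 11100110 -> 11101000 -> 11101111.
print(format(embed_requantize(0b11101101, 0b111, 3), "08b"))
```

The lower K bits of the result still equal the data, but the result stays within half a quantization step of the original sample. For the sample 11101000 with data 111, this sketch yields 11100111 (error 1), whereas plain replacement yields 11101111 (error 7).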

Because the spatial information bitstream to be embedded in the downmixed audio signal is an arbitrary bitstream, it may not have white-noise characteristics. Since adding a white-noise-like signal to the downmixed audio signal is preferable for sound quality, the spatial information bitstream may be whitened before it is added to the downmixed audio signal. The whitening may be applied to the spatial information bitstream excluding the sync word. Whitening here means converting the signal into a random signal whose level is equal or nearly equal across all frequency regions. In addition to whitening, the second method may apply noise shaping to the requantization step to minimize audible distortion. Noise shaping means either modifying the noise characteristics so that the energy of the quantization noise produced by requantization is moved to a high-frequency band above the audible band, or obtaining a masking threshold from the audio signal, generating a time-varying filter corresponding to that threshold, and using the filter to modify the characteristics of the noise produced by requantization.
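
The specification does not say how whitening is performed. One common way to make a bitstream look noise-like is to XOR it with a pseudo-random sequence, and the sketch below assumes that approach purely for illustration. The 16-bit LFSR, its taps, and the seed are all assumptions, not part of the described method.

```python
def pn_sequence(n: int, seed: int = 0xACE1) -> list:
    """Generate n pseudo-random bits from a 16-bit Fibonacci LFSR (assumed)."""
    state = seed
    out = []
    for _ in range(n):
        out.append(state & 1)
        feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (feedback << 15)
    return out


def whiten(bits, seed: int = 0xACE1) -> list:
    """XOR the payload bits with the PN sequence; applying it twice undoes it."""
    return [b ^ p for b, p in zip(bits, pn_sequence(len(bits), seed))]


payload = [1, 0, 1, 1, 0, 0, 0, 1] * 4   # sync word assumed to be excluded
scrambled = whiten(payload)
restored = whiten(scrambled)             # the decoder applies the same XOR
```

Since XOR with a fixed sequence is its own inverse, a decoder that knows the seed can reverse the operation with the same function.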

FIG. 4 illustrates a first method of rearranging the spatial information bitstream according to the present invention. As described above, the spatial information bitstream may be rearranged into an embeddable form using the K value. The bitstream may be rearranged in various ways before being embedded in the downmixed audio signal, and FIG. 4 shows one of them. The first method rearranges the spatial information bitstream so that the bitstream for a given block is distributed in units of K bits and embedded sequentially. As shown, if K is 4 and one block (405) consists of N samples (403), the spatial information bitstream (401) may be rearranged so that it is embedded sequentially in the lower 4 bits of each sample. As noted above, the present invention is not limited to embedding the spatial information bitstream only in the lower 4 bits of each sample. Within the lower K bits of each sample, the spatial information bitstream may be embedded starting from the upper bit (MSB (most significant bit) first), as shown, or starting from the lower bit (LSB (least significant bit) first).

In FIG. 4, the arrow (404) indicates the embedding direction, and the numbers in parentheses indicate the data rearrangement order. The bit plane (402) shown is a bit layer made up of a number of bits. If the number of bits of the spatial information bitstream to be embedded is smaller than the number of bits that can be embedded in the block (405), the remaining bits may be filled with zeros (406), filled with a random signal, or left as the original downmixed audio signal. For example, if the number of samples in a block (N) is 100 and K is 4, the number of bits that can be embedded in the block (W) is W = N × K = 100 × 4 = 400. If the number of bits of the spatial information bitstream to be embedded (V) is 390 (that is, V < W), the remaining 10 bits may be filled with zeros, filled with a random signal, left as the original downmixed audio signal, or filled with a tail sequence that marks the end of the data. The tail sequence is a bit string indicating the end of the spatial information bitstream in the block.
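
The sample-by-sample rearrangement with zero padding might look like the following sketch. It assumes MSB-first placement within each K-bit group and zero fill for the unused capacity; `pack_samplewise` is a hypothetical name.

```python
def pack_samplewise(bits, n_samples: int, k: int, pad: int = 0) -> list:
    """Split a bit list into n_samples K-bit words, MSB first, padding the tail."""
    capacity = n_samples * k          # W = N * K
    if len(bits) > capacity:
        raise ValueError("bitstream exceeds block capacity")
    padded = list(bits) + [pad] * (capacity - len(bits))
    words = []
    for i in range(n_samples):
        value = 0
        for b in padded[i * k:(i + 1) * k]:
            value = (value << 1) | b  # the first bit lands in the upper position
        words.append(value)
    return words


# Example from the text: N = 100, K = 4, so W = 400; V = 390 leaves 10 pad bits.
words = pack_samplewise([1] * 390, n_samples=100, k=4)
print(words[96:])
```

Each returned word is the value to be placed in the lower K bits of the corresponding sample.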

FIG. 5 illustrates a second method of rearranging the spatial information bitstream according to the present invention. The second method rearranges the spatial information bitstream (501) in bit plane (502) order. That is, the spatial information bitstream is embedded, block by block, sequentially in the bits (that is, the LSBs) of the downmixed audio signal that are to receive it. The bitstream may be embedded block by block starting from the lower bits of the downmixed audio signal, but the present invention is not limited to this. For example, if the number of samples in a block (N) is 100 and K is 4, the 100 least significant bits forming bit plane (502) 0 are filled first, then the 100 bits forming bit plane (502) 1, and so on.
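
A bit-plane-order rearrangement can be sketched as below, assuming plane 0 is the least significant bit of every sample and that unused capacity is zero-filled. The name `pack_bitplane` is hypothetical.

```python
def pack_bitplane(bits, n_samples: int, k: int, pad: int = 0) -> list:
    """Fill bit plane 0 of all samples first, then plane 1, and so on."""
    capacity = n_samples * k
    if len(bits) > capacity:
        raise ValueError("bitstream exceeds block capacity")
    padded = list(bits) + [pad] * (capacity - len(bits))
    words = [0] * n_samples
    for j, b in enumerate(padded):
        plane, i = divmod(j, n_samples)   # plane index, sample index
        words[i] |= b << plane
    return words


# N = 100, K = 4, 150 payload bits: plane 0 is full, plane 1 is half full.
words = pack_bitplane([1] * 150, n_samples=100, k=4)
print(words[0], words[99])
```

With 150 payload bits only the two lowest planes are touched, which matches the observation that this ordering tends to use as few LSBs as the payload size requires.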

In FIG. 5, the arrow (505) indicates the embedding direction, and the numbers in parentheses indicate the data rearrangement order. The second method may be particularly advantageous for extracting the sync word at an arbitrary position: the start position of the embedded spatial information bitstream can be found in the rearranged and encoded signal by extracting and searching only the LSBs. It can also be expected to use only the minimum number of LSBs for a given number of bits (V) of the spatial information bitstream to be embedded. Here too, if the number of bits that can be embedded in the block (504), W, is larger than V, the remaining bits may be filled with zeros (506), filled with a random signal, left as the original downmixed audio signal, or filled with a tail sequence marking the end of the data, as described above; leaving the downmixed audio signal as it is is particularly advantageous.
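
Searching only the LSB plane for the sync word can be illustrated as follows. The sync pattern used here is made up for the example, and a real detector would also have to guard against ordinary audio LSBs that happen to match the pattern (for instance by checking the header error detection code afterwards).

```python
def find_sync(samples, sync_bits) -> int:
    """Return the first sample index where the LSB plane matches sync_bits, or -1."""
    lsbs = [s & 1 for s in samples]       # extract only the LSB plane
    pattern = list(sync_bits)
    n = len(pattern)
    for start in range(len(lsbs) - n + 1):
        if lsbs[start:start + n] == pattern:
            return start
    return -1


# LSBs of these samples are 0,0,1,0,1,1,0; the (hypothetical) sync word is 1011.
samples = [10, 12, 7, 4, 9, 3, 8]
position = find_sync(samples, [1, 0, 1, 1])
```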

FIG. 6A illustrates the structure of the spatial information bitstream for embedding according to the present invention. As described above, the spatial information bitstream (607) may be reconstructed by the bitstream reconstruction unit (305) to include a sync word (603) and a K value (604) for the spatial information bitstream, yielding the spatial information bitstream for embedding. During reconstruction, an error detection code or error correction code (606, 608), which makes it possible to determine whether the spatial information bitstream (607) has been damaged during transmission or storage, may also be included in the spatial information bitstream for embedding. The error detection code may include a CRC (cyclic redundancy check). The error detection or correction code may be included in two stages: error detection code 1 or error correction code 1 (606) for the header (601) containing the K values, and error detection code 2 or error correction code 2 (608) for the frame data (602) of the spatial information bitstream, may be included separately. Other information (605) may also be included separately in the spatial information bitstream for embedding; it may include, for example, identification information for the method of rearranging the spatial information bitstream.

FIG. 6B is a detailed view of the spatial information bitstream for embedding of FIG. 6A. FIG. 6B shows an embodiment in which one frame of the spatial information bitstream (610) consists of two blocks, but the present invention is not limited to this embodiment. The spatial information bitstream for embedding in FIG. 6B may likewise include a sync word (612), K values K1, K2, K3, and K4 (613, 614, 615, and 616), other information (617), and error detection or error correction codes (618 and 623). The spatial information bitstream (610) consists of two blocks. For a stereo audio signal, block 1 consists of block 1 for the left channel (619) and block 1 for the right channel (620), and block 2 likewise consists of block 2 for the left channel (621) and block 2 for the right channel (622). Although FIG. 6B shows a stereo audio signal, the present invention is not limited to stereo audio signals. The insertion bit values (K values) for the blocks are contained in the header: K1 (613) is the insertion bit value for the left channel of block 1, K2 (614) for the right channel of block 1, K3 (615) for the left channel of block 2, and K4 (616) for the right channel of block 2. The error detection or correction code may again be included in two stages: error detection code 1 or error correction code 1 (618) for the header (609) containing the K values, and error detection code 2 or error correction code 2 (623) for the frame data (611) of the spatial information bitstream, may be included separately.
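
The two-stage protection (one code for the header, one for the frame data) can be sketched as follows. The specification gives no field widths, so everything concrete here is an assumption made for illustration: the 2-byte sync value, one byte per K value, a 1-byte "other information" field, and CRC-32 from Python's `zlib` as the error detection code.

```python
import struct
import zlib

SYNC = b"\xA5\x5A"  # hypothetical sync word


def build_frame(k_values, other: int, data: bytes) -> bytes:
    """Header (sync, K values, other) + header CRC + frame data + data CRC."""
    header = SYNC + bytes(k_values) + bytes([other])
    return (header + struct.pack(">I", zlib.crc32(header))
            + data + struct.pack(">I", zlib.crc32(data)))


def parse_frame(frame: bytes, n_k: int = 4):
    """Check the header code first, then the frame-data code."""
    hlen = len(SYNC) + n_k + 1
    header, hcrc = frame[:hlen], frame[hlen:hlen + 4]
    if header[:2] != SYNC or struct.unpack(">I", hcrc)[0] != zlib.crc32(header):
        raise ValueError("header damaged")
    data, dcrc = frame[hlen + 4:-4], frame[-4:]
    if struct.unpack(">I", dcrc)[0] != zlib.crc32(data):
        raise ValueError("frame data damaged")
    return list(header[2:2 + n_k]), header[-1], data


frame = build_frame([4, 3, 2, 4], other=1, data=b"spatial-params")
k_values, other, data = parse_frame(frame)
```

Protecting the header separately means the K values can be trusted (or rejected) before any frame data is extracted with them.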

FIG. 7 is a block diagram of a spatial decoder according to the present invention. As shown, the spatial decoder receives the audio signal Lo'/Ro' (701) in which the spatial information bitstream is embedded. This audio signal may be a mono or stereo audio signal; for convenience, the description here is based on a stereo audio signal, without being limited to it. The embedded signal decoder (702) can detect and decode the spatial information bitstream from the audio signal (701). The embedded spatial information bitstream obtained by the embedded signal decoder (702) is an encoded spatial information bitstream, which is input to the spatial information decoder (703). The spatial information decoder (703) decodes the encoded spatial information bitstream and outputs the result to the multichannel generation unit (704). The multichannel generation unit (704) receives the downmixed audio signal (701) and the spatial information obtained by decoding, and can output a multichannel audio signal (705).

FIG. 8 illustrates in detail the embedded signal decoder of the spatial decoder according to the present invention. The audio signal Lo'/Ro' (801) with embedded spatial information is input to the embedded signal decoder, and the sync word search unit (802) detects the sync word in the input audio signal (801). Once the sync word is detected, the header decoding unit (803) decodes the header that follows. If whitening or noise shaping has been applied, the data inverse transformation unit (804) may reverse it before processing continues. Header decoding yields side information such as the K value, which is used to put the rearranged spatial information bitstream back in order and recover the original spatial information bitstream (805). The spatial information bitstream (805) may be identical to the spatial information bitstream before it was modified at the encoding stage. Detecting the sync word also yields sync position information, that is, frame alignment information (806), with which the frames of the downmixed audio signal and of the spatial information bitstream can be aligned.
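
Once the K value is known from the header, undoing a sample-by-sample, MSB-first arrangement is a matter of reading the lower K bits of each sample in order. The sketch below assumes that arrangement and assumes the payload length `n_bits` is also known to the decoder; `unpack_samplewise` is a hypothetical name.

```python
def unpack_samplewise(samples, k: int, n_bits: int) -> list:
    """Read the lower k bits of each sample, MSB first, and keep n_bits of payload."""
    bits = []
    for s in samples:
        for shift in range(k - 1, -1, -1):
            bits.append((s >> shift) & 1)
    return bits[:n_bits]   # drop padding beyond the payload


# Two samples with K = 3: lower bits are 111 and 010; the payload is 5 bits long.
recovered = unpack_samplewise([0b11101111, 0b00001010], k=3, n_bits=5)
```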

FIG. 9 illustrates the case in which an audio signal according to the present invention is reproduced by a conventional PCM decoder, that is, the case in which the audio signal Lo'/Ro' (901) with the embedded spatial information bitstream is applied to the input of a conventional PCM decoder. The conventional PCM decoder treats the audio signal Lo'/Ro' (901) as an ordinary stereo audio signal and reproduces it, and in terms of sound quality the result is indistinguishable from the audio signal (902) before the spatial information was embedded. The audio signal with embedded spatial information according to the present invention is therefore compatible with conventional PCM decoders, which reproduce it as an ordinary stereo audio signal, while a decoder capable of multichannel decoding can provide a multichannel audio signal from it.

FIG. 10 is a flowchart of an encoding method for embedding a spatial information bitstream in a downmixed audio signal according to the present invention. First, an audio signal is downmixed (1002) from the multichannel audio signal (1001). The downmixed audio signal may be a mono or stereo signal. Spatial information is also extracted (1003) from the multichannel audio signal (1001), and a spatial information bitstream is generated (1004) from it. The spatial information bitstream is then embedded (1005) in the downmixed audio signal, and the entire bitstream, including the downmixed audio signal with the embedded spatial information bitstream, is transmitted (1006). The present invention may include obtaining the K value from the downmixed audio signal and embedding the spatial information bitstream in those K bits.

FIG. 11 is a flowchart of a method of decoding a spatial information bitstream embedded in a downmixed audio signal according to the present invention. First, the spatial decoder receives (1101) a bitstream that includes a downmixed audio signal with an embedded spatial information bitstream, and detects (1102) the downmixed audio signal in the bitstream. It also detects and decodes (1103) the spatial information bitstream from the entire bitstream. Spatial information is extracted (1104) through this decoding, and the downmixed audio signal is decoded (1105) using the extracted spatial information. The downmixed audio signal may be decoded into two channels or into multiple channels. The present invention may include reading, from the downmixed audio signal, information on the embedding method and K value of the spatial information bitstream, and decoding the spatial information bitstream using the embedding method and K value so read.

FIG. 12 illustrates frame sizes of a spatial information bitstream embedded at various sizes in a downmixed audio signal according to the present invention. In the present invention, a "frame" is an independently decodable unit of a given length having one header; in this specification, "frame" generally means the "insertion frame" defined below. As shown, the frame size (N) that is the unit in which the spatial information bitstream is embedded in the downmixed audio signal (hereinafter, the "insertion frame size") may be made equal to the frame size (S) of the spatial information bitstream that is the unit in which the spatial information is decoded and applied (hereinafter, the "decoding frame size") (case (a)), may be made a multiple of S (case (b)), or may be chosen so that S is a multiple of N (case (c)). When N = S, as in (a), the decoding frame size (S, 1201) and the insertion frame size (N, 1202) coincide, which has the advantage of making decoding straightforward. When N > S, as in (b), several decoding frames (1203) are bundled and transmitted as one insertion frame (N, 1204), which reduces the number of bits added by headers, error detection codes (for example, CRC), and the like. In this case, the frame size value and similar items may be inserted in the header as side information. When N < S, as in (c), several insertion frames (N, 1206) may be bundled to form one decoding frame (S, 1205).

FIG. 13 illustrates a spatial information bitstream embedded at a fixed size in a downmixed audio signal according to the present invention. In cases (a), (b), and (c) of FIG. 12, the insertion frames and decoding frames are all arranged to align with each other. In some cases, however, a higher-level packet of fixed size may be formed and transmitted, for example in the manner of a transport stream (TS, 1303). That is, regardless of the size of the decoding frame (1301) of the spatial information bitstream, the spatial information bitstream (1301) may be grouped into packets of a fixed size, and information such as a TS header (1302) may be added to each packet for transmission.

This method is needed to vary the transmission rate of the spatial information bitstream, given that the masking threshold differs from block to block according to the characteristics of the downmixed audio signal, and that the maximum number of bits that can be allocated without degrading the sound quality of the downmixed audio signal (K_max) therefore differs as well. For example, if K_max is insufficient to carry all of the spatial information bitstream needed for a given block, data is transmitted only up to K_max and the remainder may be transmitted later in another block. If K_max leaves room to spare, the spatial information bitstream for the next block may be carried in advance. Each TS packet has an independent header, which may include a sync word, the length of the TS packet, and the number of bits allocated within the packet (K). The present invention may also include embedding side information (for example, spatial information) in an uncompressed PCM audio signal, with the insertion frame size of the embedded side information being a fixed size regardless of the frame size (S) to which the side information is applied. The present invention also includes combining the spatial information bitstream with the downmix audio signal to form the entire bitstream. The size (N) of the combined spatial information bitstream may be determined in fixed-size units regardless of the frame size (S) in which the spatial information is decoded and applied, and position information for the frame that is decoded and applied may be included in the spatial information bitstream. The position information may include information on the range to which the spatial information bitstream applies. The present invention further includes determining the start position of the frame to which the spatial information bitstream is to be applied independently of the start position of the frame with which the spatial information bitstream is combined, that is, the frame of the downmix audio signal. Position information for the start position of the spatial information bitstream may then be inserted in the entire bitstream.
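
The carry-over behaviour described here (send what fits, defer the rest, and use spare capacity for data that belongs to a later block) amounts to slicing one continuous bitstream by each block's capacity. The sketch below is a simplified model under that assumption; `distribute` and the capacity figures are hypothetical, and header overhead is ignored.

```python
def distribute(stream_bits, capacities):
    """Slice a continuous bitstream into per-block chunks limited by each capacity."""
    chunks, pos = [], 0
    for cap in capacities:            # cap = samples in block * K_max for that block
        chunks.append(stream_bits[pos:pos + cap])
        pos += cap
    return chunks, stream_bits[pos:]  # the leftover waits for later blocks


# Ten payload bits over three blocks that can carry 3, 5, and 4 bits.
chunks, leftover = distribute(list(range(10)), [3, 5, 4])
```

A block with a small K_max simply carries a shorter slice, and a block with a large K_max pulls bits forward from the next decoding frame.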

FIG. 14(a) illustrates a first method for solving the time-alignment problem of a spatial information bitstream embedded at a fixed size. When the spatial information bitstream is embedded at a fixed size, the frame start position of the embedded spatial information bitstream is not time-aligned with the downmixed audio signal, so a method for resolving this misalignment is needed. In the first method, shown in FIG. 14(a), a separate sync word (1402) (hereinafter, the "data frame sync word") is provided for the spatial information bitstream (1403). The data frame sync word (1402) may be used to identify a frame of the spatial information bitstream (1403) (hereinafter, a "spatial information frame").

Taking the case of TS packets (1404 and 1405) as an example, the TS packet header (1404) carries identification information (1408) (for example, a flag) indicating whether the current packet contains a data frame sync word (1402) for a spatial information frame. If the identification information (1408) is 1 (that is, a data frame sync word (1402) is present), the TS packet header (1404) may also carry position information (1409) (for example, delay information) identifying the downmix sample with which the spatial information frame is to be aligned. That is, if a new spatial information frame (1403) starts within the current TS packet (1404 and 1405), the position information (1409) indicates how many samples before or after the first sample of the TS packet (1404 and 1405) the spatial information frame (1403) is located. As described above, the position information may include information on the range to which the spatial information bitstream applies. If the identification information (1411) is 0, the position information may be omitted from the header of the TS packet. Since the spatial information bitstream (1403) should generally arrive before the corresponding downmixed audio signal (1401), the position information (1409) will usually be a delay expressed in samples. To keep a large delay from requiring an excessive number of bits, a sample group unit that bundles a fixed number of samples (for example, a granule) may be defined, and the position information may be expressed in that unit. As described above, the TS header may include a TS sync word (1406) (hereinafter, the "transmission frame sync word"), an insertion bit value (1407), and other information (1410).
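A TS header of the kind described above (sync word, insertion bit value K, a flag for the presence of a data frame sync word, and a delay sent only when the flag is set) might be packed and parsed as in the sketch below. The field widths, the sync word value `0xA55A`, and the granule size of 32 samples are hypothetical choices for illustration; the source does not specify them, and the sketch handles only non-negative delays.

```python
SYNC_WORD = 0xA55A  # hypothetical 16-bit transmission frame sync word
GRANULE = 32        # hypothetical number of samples per sample group


def pack_header(k, frame_starts_here, delay_samples=0):
    """Build a header bit string: sync word, K, flag, optional delay."""
    if not 0 <= k < 16:
        raise ValueError("k must fit in 4 bits")
    bits = format(SYNC_WORD, "016b") + format(k, "04b")
    bits += "1" if frame_starts_here else "0"
    if frame_starts_here:
        granules = delay_samples // GRANULE  # rounds down to a whole granule
        if not 0 <= granules < 256:
            raise ValueError("delay does not fit in 8 bits of granules")
        bits += format(granules, "08b")
    return bits


def parse_header(bits):
    """Return (k, flag, delay_in_samples or None) from a header bit string."""
    if int(bits[:16], 2) != SYNC_WORD:
        raise ValueError("sync word not found")
    k = int(bits[16:20], 2)
    flag = bits[20] == "1"
    delay = int(bits[21:29], 2) * GRANULE if flag else None
    return k, flag, delay
```

Expressing the delay in granules rather than samples keeps the field at 8 bits here while still covering delays of several thousand samples, at the cost of granule-level resolution.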

FIG. 14(b) illustrates a second method for solving the time-alignment problem of a spatial information bitstream embedded at a fixed size. Taking TS packets as an example, the second method makes the start of the spatial information frame (1413), the start of the TS packet, and the start of the corresponding downmixed audio signal (1412) coincide. Where they coincide, identification information (1420 or 1422) (for example, a flag) indicating that the three start points are aligned may be included in the header (1415) of the TS packet. FIG. 14(b) shows the three start points coinciding at the n-th frame (1412) of the downmixed audio signal. In that case, the identification information (1422) may have a value of 1. If the three start points do not coincide, the identification information (1420) may have a value of 0. To make the three start points coincide, the portion (1417) following the previous TS packet (the fill portion) may be filled with zeros, filled with a random signal, or replaced with the original downmixed audio signal. As described above, the TS header (1415) may include a TS sync word (1418), an insertion bit value (1419), and other information (1421).
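The size of the fill portion needed to bring the next TS packet onto a frame boundary can be computed as in this small sketch. The function name and the convention of measuring positions in samples from the start of the stream are assumptions for illustration.

```python
def fill_length(prev_packet_end, frame_size):
    """Samples of fill needed so the next TS packet starts on a frame boundary.

    prev_packet_end -- sample index just past the previous TS packet
    frame_size      -- length of a downmix frame in samples
    """
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    return (-prev_packet_end) % frame_size
```

When the previous packet already ends on a boundary, the result is zero and no fill is inserted.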

FIG. 15 illustrates a method of combining a downmixed audio signal with a spatial information bitstream according to the present invention. As shown, the spatial information bitstream (1504) may be inserted in units of the frame size of the downmixed audio signal, that is, the insertion frame size (N). For example, if the decoding frame size (S) (1503) differs from the insertion frame size (N) (1501), the spatial information bitstream (1504) is not cut arbitrarily to fit the insertion frames (1501); instead it may be placed, undivided, in a single insertion frame (1501). In this case, the spatial information bitstream (1504) may be combined with the downmix audio signal rather than embedded in it. For example, since the downmix audio signal may itself be represented as a compressed bitstream, a compressed downmixed audio signal bitstream (1502) exists as shown, and the spatial information bitstream (1504) may be combined with it in units of the decoding frame size. Thus, in the present invention the spatial information bitstream (1504) may be transmitted in bursts. The present invention also includes forming the spatial information bitstream (1504) into a compressed TS bitstream (1506) and combining it with the compressed downmixed audio signal bitstream (1502). In this case, a TS header (1505) for the TS bitstream (1506) may be included. The TS header (1505) may include one or more of a TS sync word (1507), an insertion bit value (1508), identification information (1510), a position value (1510), and other information (1511).

FIG. 16 is a flowchart of a method of encoding a spatial information bitstream that is embedded at various sizes in a downmixed audio signal according to the present invention. First, the audio signal is downmixed (1602) from a multichannel audio signal (1601). The downmixed audio signal may be a mono or stereo signal. Spatial information is also extracted (1603) from the multichannel audio signal (1601), and a spatial information bitstream is generated (1604) from the spatial information. If the decoding frame size (S) is larger than the insertion frame size (N) (1605), a plurality of insertion frames of size N are grouped so that together they equal one S (1607). If the decoding frame size (S) is smaller than the insertion frame size (N) (1606), a plurality of S are grouped so that together they equal one N (1608). If N and S are equal, the insertion frame size (N) is set equal to the decoding frame size (S) (1609). The spatial information bitstream formed in this way is embedded (1610) in the downmixed audio signal, and the entire bitstream, including the downmixed audio signal with the embedded spatial information bitstream, is then transmitted (1611). The present invention may include embedding information on the insertion frame size of the spatial information bitstream in the entire bitstream. The present invention may also include obtaining a K value from the downmixed audio signal and embedding the spatial information bitstream in the K bits.
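The three-way comparison of S and N in steps 1605 to 1609 can be sketched as below. The function name and return labels are hypothetical, and the requirement that the larger size be an integer multiple of the smaller one is an assumption; the source only says that a plurality of frames are grouped to equal one frame of the other kind.

```python
def group_frames(s, n):
    """Describe how decoding frames (S) and insertion frames (N) are grouped.

    Returns (label, count): how many of the smaller frames make up one of
    the larger. Assumes the larger size is an integer multiple of the smaller.
    """
    if s <= 0 or n <= 0:
        raise ValueError("frame sizes must be positive")
    if s > n:
        if s % n:
            raise ValueError("S is not a multiple of N")
        return ("bundle_N_into_S", s // n)  # step 1607
    if s < n:
        if n % s:
            raise ValueError("N is not a multiple of S")
        return ("bundle_S_into_N", n // s)  # step 1608
    return ("equal", 1)                     # step 1609
```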

FIG. 17 is a flowchart of a method of encoding a spatial information bitstream that is embedded at a fixed size in a downmixed audio signal according to the present invention. First, the audio signal is downmixed (1702) from a multichannel audio signal (1701). The downmixed audio signal may be a mono or stereo signal. Spatial information is also extracted (1703) from the multichannel audio signal (1701), and a spatial information bitstream is generated (1704) from the spatial information. The spatial information bitstream is packed into a bitstream of fixed size (packet units), for example a transport stream (TS) (1705), and the fixed-size spatial information bitstream is then embedded (1706) in the downmixed audio signal. The entire bitstream, including the downmixed audio signal with the embedded spatial information bitstream, is then transmitted (1707). Here, the present invention may include obtaining a K value from the downmixed audio signal and embedding the spatial information bitstream in the K bits.

FIG. 18 illustrates a first method of embedding a spatial information bitstream in an audio signal downmixed to at least two channels according to the present invention. When the downmixed audio signal has at least two channels, the spatial information bitstream can be regarded as data common to those channels. A method of distributing the spatial information bitstream over the at least two channels is therefore needed; FIG. 18 shows the case in which the spatial information bitstream is embedded in only one of the two or more channels. As shown, the spatial information bitstream is embedded in the K bits of the downmixed audio signal, in one channel only and not in the other. The K value may differ from block to block. As described above, the K bits may correspond to the lower bits of the downmixed audio signal, but the present invention is not limited to this. Here, the spatial information bitstream may be placed in the one channel in bit-plane order starting from the LSB, or in sample order.
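Embedding in the lower K bits of a single channel, in sample order, can be sketched as follows. This is a minimal illustration, assuming integer PCM samples, a payload given as a string of '0' and '1' characters, and zero padding of a final partial chunk; none of these details come from the source.

```python
def embed_lsb(samples, bits, k):
    """Overwrite the lower k bits of successive PCM samples with payload bits."""
    if k < 1:
        raise ValueError("k must be at least 1")
    out = list(samples)
    mask = (1 << k) - 1
    pos = 0
    for i in range(len(out)):
        if pos >= len(bits):
            break
        chunk = bits[pos:pos + k].ljust(k, "0")  # pad a final partial chunk
        out[i] = (out[i] & ~mask) | int(chunk, 2)
        pos += k
    return out


def extract_lsb(samples, k, nbits):
    """Read back nbits of payload from the lower k bits of each sample."""
    mask = (1 << k) - 1
    stream = "".join(format(x & mask, "0{}b".format(k)) for x in samples)
    return stream[:nbits]
```

With a two-channel downmix, only one channel's samples would be passed to `embed_lsb`, leaving the other channel untouched, and each sample changes by less than 2 to the power K.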

FIG. 19 illustrates a second method of embedding a spatial information bitstream in an audio signal downmixed to at least two channels according to the present invention. Only two channels are shown in the figure for convenience, but the present invention is not limited to this. As shown, the second method embeds the spatial information bitstream first in block n of one channel (here, the left channel), then in block n of the other channel (here, the right channel), and then in block n+1 of the first channel (the left channel), and so on. Because the signal characteristics of the two channels differ, the masking threshold may be obtained separately for each channel and a different K value assigned to each. That is, as shown, K1 may be assigned to one channel and K2 to the other. The K value may also differ from block to block. Here too, the spatial information bitstream may be embedded in each channel in bit-plane order starting from the LSB, or in sample order.

FIG. 20 illustrates a third method of embedding a spatial information bitstream in an audio signal downmixed to at least two channels according to the present invention. As shown, the third method distributes the spatial information bitstream over at least two channels, alternating between the two channels sample by sample. Because the signal characteristics of the two channels differ, the masking threshold may be obtained separately for each channel and a different K value assigned to each. That is, as shown, K1 may be assigned to one channel and K2 to the other. The K value may also differ from block to block. For example, the lower K1 bits of sample 1 of one channel (here, the left channel) are filled first, followed by the lower K2 bits of sample 1 of the other channel (here, the right channel). Next, the lower K1 bits of sample 2 of the first channel (the left channel) are filled, followed by the lower K2 bits of sample 2 of the other channel (the right channel). The numbers inside the blocks in the figure indicate the order in which the spatial information bitstream is filled. FIG. 20 shows filling from the MSB, but filling from the LSB is also possible.
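The sample-alternating fill order of the third method, with per-channel widths K1 and K2, might look like the sketch below. The function names, the bit-string payload, and the zero padding of the last chunk are illustrative assumptions; within each K-bit field the first payload bit goes to the most significant position, matching the MSB-first order the figure is said to show.

```python
def embed_sample_interleaved(left, right, bits, k1, k2):
    """Fill L sample 0 (k1 bits), R sample 0 (k2 bits), L sample 1, and so on."""
    left, right = list(left), list(right)
    pos = 0
    for i in range(min(len(left), len(right))):
        for chan, k in ((left, k1), (right, k2)):
            if k == 0 or pos >= len(bits):
                continue
            chunk = bits[pos:pos + k].ljust(k, "0")
            chan[i] = (chan[i] & ~((1 << k) - 1)) | int(chunk, 2)
            pos += k
    return left, right


def extract_sample_interleaved(left, right, k1, k2, nbits):
    """Read the payload back in the same L/R alternating sample order."""
    parts = []
    for l, r in zip(left, right):
        if k1:
            parts.append(format(l & ((1 << k1) - 1), "0{}b".format(k1)))
        if k2:
            parts.append(format(r & ((1 << k2) - 1), "0{}b".format(k2)))
    return "".join(parts)[:nbits]
```

Because the order follows the L/R sample interleaving of the carrier, a decoder can consume the payload as the samples arrive without buffering a whole block.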

FIG. 21 illustrates a fourth method of embedding a spatial information bitstream in an audio signal downmixed to at least two channels according to the present invention. As shown, the fourth method distributes the spatial information bitstream over at least two channels, alternating between the two channels bit by bit starting from the LSB. Because the signal characteristics of the two channels differ, the masking threshold may be obtained separately for each channel and different K values (K1 and K2) assigned to the channels. That is, as shown, K1 may be assigned to one channel and K2 to the other. The K value may also differ from block to block. For example, the least significant bit of sample 1 of one channel (here, the left channel) is filled first, followed by the least significant bit of sample 1 of the other channel (here, the right channel). Next, the least significant bit of sample 2 of the first channel (the left channel) is filled, followed by the least significant bit of sample 2 of the other channel (the right channel). The numbers inside the blocks in the figure indicate the order in which the spatial information bitstream is filled.
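The bit-plane fill order of the fourth method can be sketched as a generator of (channel, sample, bit) slots. The names are hypothetical, and the handling of unequal K1 and K2 (a channel simply drops out once its planes are exhausted) is an assumption, since the source does not spell this case out.

```python
def bitplane_slots(block_len, k1, k2):
    """Yield (channel, sample index, bit index) in bit-plane fill order."""
    for plane in range(max(k1, k2)):      # LSB plane first
        for i in range(block_len):
            if plane < k1:
                yield ("L", i, plane)
            if plane < k2:
                yield ("R", i, plane)


def embed_bitplane(left, right, bits, k1, k2):
    """Write payload bits one at a time, alternating channels within a plane."""
    chans = {"L": list(left), "R": list(right)}
    slots = bitplane_slots(min(len(left), len(right)), k1, k2)
    for bit, (ch, i, plane) in zip(bits, slots):
        cleared = chans[ch][i] & ~(1 << plane)
        chans[ch][i] = cleared | (int(bit) << plane)
    return chans["L"], chans["R"]


def extract_bitplane(left, right, k1, k2, nbits):
    """Read nbits back in the same bit-plane order."""
    chans = {"L": left, "R": right}
    slots = bitplane_slots(min(len(left), len(right)), k1, k2)
    out = []
    for _, (ch, i, plane) in zip(range(nbits), slots):
        out.append(str((chans[ch][i] >> plane) & 1))
    return "".join(out)
```

A short payload touches only the lowest bit plane in this ordering, so the higher of the K bits are disturbed only when the payload is long enough to need them.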

When an audio signal is stored on a storage medium that has no auxiliary data area (for example, a stereo CD) or is transmitted over an interface such as SPDIF, the L and R channels are interleaved sample by sample, so storing the data according to the third or fourth method is advantageous because the decoder can process it in the order received. The fourth method may also be applied when the spatial information bitstream is rearranged and stored in bit-plane units. As described above, when the spatial information bitstream is divided between two channels, a different K value may be assigned to each channel, and in that case the K value for each channel may be transmitted separately in the bitstream. When a plurality of K values are transmitted, differential coding may be used to encode them.
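Differential coding of several K values can be illustrated with the sketch below: the first value is sent as-is and each later value as its difference from the previous one. The source names differential coding without detailing it, so the exact scheme and the function names here are assumptions.

```python
def encode_k_differential(ks):
    """Return the first K value followed by successive differences."""
    if not ks:
        return []
    return [ks[0]] + [b - a for a, b in zip(ks, ks[1:])]


def decode_k_differential(deltas):
    """Rebuild the K values by accumulating the differences."""
    ks = []
    total = 0
    for d in deltas:
        total += d
        ks.append(total)
    return ks
```

Since neighboring channels and blocks tend to have similar masking thresholds, the differences are usually small numbers that an entropy coder could represent compactly.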

FIG. 22 illustrates a fifth method of embedding a spatial information bitstream in an audio signal downmixed to at least two channels according to the present invention. As shown, the fifth method distributes the spatial information bitstream over at least two channels by inserting the same value repeatedly into each of them. The value may be inserted into the channels with the same sign or with opposite signs. For example, a value of 1 may be inserted into each of the at least two channels, or values of 1 and -1 may be inserted alternately. The fifth method has the advantage that transmission errors can be detected easily by comparing the lowest insertion bits (for example, the K bits) of the at least two channels. In particular, when a mono audio signal is carried on a stereo medium such as a CD, the L (left) and R (right) channels of the downmixed audio signal are identical, so making the inserted spatial information identical as well can improve robustness. Here too, the spatial information bitstream may be embedded in each channel in bit-plane order starting from the LSB, or in sample order.
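The error check that the fifth method enables can be sketched as follows: the same K-bit chunk is written into both channels, and the receiver flags any sample where the two copies disagree. Only the same-sign variant is shown; the function names and the `count` parameter (the number of samples that carry payload) are assumptions for illustration.

```python
def embed_duplicate(left, right, bits, k):
    """Write identical k-bit payload chunks into the lower bits of both channels."""
    if k < 1:
        raise ValueError("k must be at least 1")
    left, right = list(left), list(right)
    mask = (1 << k) - 1
    for i in range(min(len(left), len(right))):
        chunk = bits[i * k:(i + 1) * k]
        if not chunk:
            break
        value = int(chunk.ljust(k, "0"), 2)
        left[i] = (left[i] & ~mask) | value
        right[i] = (right[i] & ~mask) | value
    return left, right


def mismatched_samples(left, right, k, count):
    """Indices among the first count samples whose lower k bits differ."""
    mask = (1 << k) - 1
    pairs = list(zip(left, right))[:count]
    return [i for i, (l, r) in enumerate(pairs) if (l & mask) != (r & mask)]
```

A mismatch shows that one of the two copies was corrupted but not which one, so this gives error detection rather than correction.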

FIG. 23 illustrates a sixth method of embedding a spatial information bitstream in an audio signal downmixed to at least two channels according to the present invention. As shown, the sixth method concerns inserting the spatial information bitstream into at least two channels when the frame of each channel consists of a plurality of blocks (of length B). As shown, the insertion bit value (that is, the K value) may differ for each channel and block, or may be the same. The insertion bit values (for example, K1, K2, K3 and K4) may be stored in a frame header that is transmitted once for the whole frame, and the frame header may be located in the LSB. In this case, the header may be inserted in bit-plane units, and the spatial information bitstream data may be inserted alternately sample by sample or alternately block by block. FIG. 23 shows the case in which the number of blocks per frame is 2, so the block size (B) is N/2. In this case, the number of bits inserted in the frame is (K1 + K2 + K3 + K4)*B.
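The frame capacity formula (K1 + K2 + K3 + K4)*B generalizes to any number of channels and equal-length blocks, as in this small sketch. The function name and the nested-list layout of the K values are assumptions for illustration.

```python
def embedded_bits_per_frame(k_values, frame_len):
    """Total embeddable bits in one frame.

    k_values[c][b] -- insertion bit value for channel c, block b
    frame_len      -- frame length N in samples, split into equal blocks
    """
    blocks = len(k_values[0])
    if any(len(channel) != blocks for channel in k_values):
        raise ValueError("every channel needs one K value per block")
    if frame_len % blocks:
        raise ValueError("frame length must divide evenly into blocks")
    block_len = frame_len // blocks  # B = N / number of blocks
    return sum(sum(channel) for channel in k_values) * block_len
```

For two channels and two blocks with N = 1024, K values of 2, 3, 1 and 2 give B = 512 and (2 + 3 + 1 + 2) * 512 = 4096 bits per frame.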

FIG. 24 illustrates a seventh method of embedding a spatial information bitstream in an audio signal downmixed to at least two channels according to the present invention. As shown, the seventh method distributes the spatial information bitstream over at least two channels using a mixture of two orderings: alternating between the channels bit by bit starting from the LSB (or MSB), and alternating between the channels sample by sample. The method may be applied frame by frame or, as shown, block by block. As shown in FIG. 24, positions 1 to C (the hatched portion) correspond to the header and may be inserted in the LSB (or MSB) in bit-plane order to make it easier to search for the data frame sync word. Positions C+1 and above (the unhatched portion) are everything other than the header and may be inserted alternately between the channels sample by sample so that the data is easy to read out. The insertion bit values (for example, the K values) may differ for each channel and block, or may be the same. All of the insertion bit values are contained in the header, so the decoder knows which data to read.
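One possible reading of the mixed ordering is sketched below: the first C slots (the header) follow bit-plane order from the LSB, and the remaining slots follow sample-alternating order. The source does not define how the two orderings share the K-bit fields, so skipping header-occupied slots in the payload pass, the MSB-first order within each payload field, and all names are assumptions.

```python
def hybrid_slot_order(block_len, k1, k2, header_bits):
    """Return (channel, sample, bit) slots: header first, then payload."""
    widths = (("L", k1), ("R", k2))
    # Header: bit-plane order from the LSB, alternating channels.
    plane_order = [
        (ch, i, plane)
        for plane in range(max(k1, k2))
        for i in range(block_len)
        for ch, k in widths
        if plane < k
    ]
    header = plane_order[:header_bits]
    used = set(header)
    # Payload: sample order, alternating channels, skipping header slots.
    payload = [
        (ch, i, plane)
        for i in range(block_len)
        for ch, k in widths
        for plane in reversed(range(k))
        if (ch, i, plane) not in used
    ]
    return header + payload
```

Keeping the header in a single bit plane means a decoder can search for the data frame sync word by scanning only the LSBs, before it knows the K values needed to read the payload.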

FIG. 25 is a flowchart of a method of encoding a spatial information bitstream that is embedded in a two-channel downmixed audio signal according to the present invention. First, the audio signal is downmixed (2502) from a multichannel audio signal (2501) to at least two channels. Spatial information is also extracted (2503) from the multichannel audio signal (2501), and a spatial information bitstream is generated (2504) from the spatial information. The spatial information bitstream is embedded (2505) in the audio signal downmixed to at least two channels. Here, one or more of the methods described above for embedding the spatial information bitstream in two channels may be used. The entire bitstream, including the downmixed audio signal with the embedded spatial information bitstream, is then transmitted (2506). Here, the present invention may include obtaining a K value from the downmixed audio signal and embedding the spatial information bitstream in the K bits.

FIG. 26 is a flowchart of a method of decoding a spatial information bitstream embedded in a two-channel downmixed audio signal according to the present invention. First, the spatial decoder receives (2601) a bitstream containing a downmixed audio signal in which a spatial information bitstream is embedded, and detects (2602) the downmixed audio signal in the bitstream. It also extracts and decodes (2603) the spatial information bitstream embedded in the two-channel downmixed audio signal. The downmixed audio signal is then converted (2604) to multichannel using the spatial information obtained by the decoding. The present invention may include extracting identification information on the order in which the spatial information bitstream was embedded, and using that identification information to extract and decode the spatial information bitstream. The present invention may also include reading information on the K value from the bitstream and using the K value to decode the spatial information bitstream.

FIG. 27 illustrates a first spatial encoder that can select the method of transmitting the spatial information bitstream according to the present invention. As shown, the first spatial encoder first receives a multichannel audio signal (2701), where n denotes the number of input channels. The multichannel audio signal (2701) is downmixed in a downmix unit (2703) into a downmixed audio signal (Lo and Ro, 2705). The downmix audio signal (2705) may be a mono or stereo audio signal. For convenience, this specification describes the stereo case as an example, but the present invention is not limited to this. The spatial information of the multichannel audio signal, that is, the spatial parameters, is extracted from the multichannel audio signal (2701) by a spatial information extraction unit (2704). Optionally, the downmixed audio signal (2705) may be generated using a downmix signal provided directly from outside, for example an artistic downmix signal (2702).

The spatial information extracted by the spatial information extraction unit (2704) is encoded into a spatial information bitstream by a spatial information encoding unit (2707). A transmission method selection unit (2708) may then select whether to embed the spatial information bitstream in the downmix audio signal, to transmit it as a separate spatial information bitstream, or to do both.

If the spatial information bitstream cannot be transmitted separately, it is inserted by an embedding unit (2706) directly into the signal to be transmitted, that is, the downmix signal (2705); a "Digital Audio Embedded Method" may be used for this. With this method, the spatial information bitstream can be embedded in the raw PCM audio signal without degrading sound quality, and an ordinary decoder cannot distinguish the audio signal carrying the embedded spatial information bitstream from the original signal. That is, from the standpoint of an ordinary PCM decoder, the output signal Lo'/Ro' (2709) carrying the embedded spatial information bitstream can be regarded as the same signal as the input signal Lo/Ro (2705).

If the spatial information bitstream can be transmitted separately, the spatial information bitstream (2710) may be transmitted alongside the downmix audio signal without being embedded in it. That is, in this case the downmixed audio signal is encoded into a core codec bitstream and transmitted, and the spatial information bitstream (2710) may be transmitted together with the core codec bitstream. For example, the spatial information bitstream may be placed in a location such as the data track of a CD, or transmitted in the form of an ordinary MPEG bitstream. The present invention also includes embedding the spatial information bitstream in the downmix audio signal and, in addition, transmitting the spatial information bitstream separately. That is, the spatial information bitstream is transmitted in two places. For example, the downmix audio signal with the embedded spatial information bitstream is stored on one layer of a storage medium such as a CD (Compact Disc), and the spatial information bitstream is stored separately on another layer. The present invention includes placing identification information indicating whether a spatial information bitstream is present in the entire bitstream formed in this way. When the multichannel audio signal is organized in this way, not only a decoder that can decode the embedded spatial information bitstream but also a decoder that cannot decode it is able to convert the downmixed audio signal to multichannel.

FIG. 28 illustrates a second spatial encoder according to the present invention that can select how the spatial information bitstream is transmitted. As shown, the second spatial encoder is similar to the first spatial encoder. The difference is that the first spatial encoder can choose whether to embed the spatial information bitstream in the downmix audio signal, whereas in the second spatial encoder the spatial information bitstream generated by the spatial information encoding unit 2807 is always embedded in the downmix audio signal by the embedding unit 2806. The downmix audio signal 2808 with the embedded spatial information bitstream and the spatial information bitstream 2809 generated by the spatial information encoding unit 2807 are each sent to the transmission method selection unit 2810. The transmission method selection unit 2810 decides whether to generate an entire bitstream that contains the downmix audio signal 2808 with the embedded spatial information bitstream and no separate copy, or an entire bitstream that contains both the downmix audio signal 2808 and the spatial information bitstream 2809.
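The decision made by the transmission method selection unit 2810 can be sketched as follows. The criterion `channel_has_side_data` is an assumption added for illustration, since this description does not say how the unit decides, and `EntireBitstream` and `select_transmission` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntireBitstream:
    downmix_with_embedded: bytes        # signal 2808: downmix carrying the embedded copy
    separate_spatial: Optional[bytes]   # signal 2809, or None when not sent separately


def select_transmission(downmix_with_embedded: bytes,
                        spatial_bitstream: bytes,
                        channel_has_side_data: bool) -> EntireBitstream:
    """Mimic selector 2810: the embedded copy is always present,
    the separate copy is added only when the (assumed) criterion allows it."""
    separate = spatial_bitstream if channel_has_side_data else None
    return EntireBitstream(downmix_with_embedded, separate)
```

Because the embedded copy is always present, a receiver that ignores the separate copy loses nothing in this arrangement.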

FIG. 29 shows the bitstream structure used when the spatial information bitstream according to the present invention is sent separately in the form of an MPEG bitstream. FIG. 29(a) shows the case in which an extension bitstream 2903 exists separately from, and independent of, the core codec bitstream 2902; the spatial information bitstream can be inserted into the extension bitstream 2903. FIG. 29(b) shows the case in which the core codec bitstream 2905 contains an auxiliary data area consisting of an auxiliary data header 2906 and an auxiliary data bitstream 2907; the spatial information bitstream can be inserted into the auxiliary data bitstream 2907. In the structure of FIG. 29(a), identification information indicating whether a spatial information bitstream is present may be inserted in the header 2901 or in the extension bitstream 2903. In the structure of FIG. 29(b), identification information indicating whether the spatial information bitstream is present may be included in the auxiliary data area.
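A layout in the style of FIG. 29(a) can be sketched as a header followed by the core codec bitstream and an extension bitstream. The field widths (a 1-byte flag field and two 2-byte lengths) and the flag value `HAS_SPATIAL` are assumptions chosen for this example; the description states only that the header may carry identification information on whether a spatial information bitstream is present.

```python
import struct

HAS_SPATIAL = 0x01        # hypothetical flag bit: extension carries spatial information
HEADER_FORMAT = ">BHH"    # flags, core length, extension length (big-endian, assumed)


def pack_frame(core: bytes, spatial: bytes = b"") -> bytes:
    """Build header + core codec bitstream + extension bitstream."""
    flags = HAS_SPATIAL if spatial else 0
    header = struct.pack(HEADER_FORMAT, flags, len(core), len(spatial))
    return header + core + spatial


def parse_frame(frame: bytes):
    """Return (core, spatial); spatial is None when the flag is not set."""
    flags, core_len, ext_len = struct.unpack_from(HEADER_FORMAT, frame, 0)
    offset = struct.calcsize(HEADER_FORMAT)
    core = frame[offset:offset + core_len]
    ext = frame[offset + core_len:offset + core_len + ext_len]
    spatial = ext if flags & HAS_SPATIAL else None
    return core, spatial
```

A decoder that does not understand the extension can read the two lengths, take the core codec bitstream, and skip the rest.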

FIG. 30 is a flowchart of a method for transmitting a spatial information bitstream according to the present invention. First, the multichannel audio signal 3001 is downmixed (3002). The downmixed audio signal may be a mono or stereo signal. Spatial information is also extracted from the multichannel audio signal 3001 (3003), and a spatial information bitstream is generated from the extracted spatial information (3004). Next, a transmission method for the spatial information bitstream is selected (3005). In the first mode, the spatial information bitstream is embedded in the downmix audio signal (3006) and transmitted. In the second mode, the spatial information bitstream is transmitted separately. The spatial information bitstream may also be transmitted in both the first mode and the second mode at once. Finally, an entire bitstream is composed so as to include the downmix audio signal and the spatial information bitstream produced in the selected mode or modes, and is transmitted (3007).
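The flow of FIG. 30 can be sketched with the two modes modeled as combinable flags. Everything concrete here is an assumption for illustration: the downmix is a plain channel average, the "spatial information" is per-channel energy standing in for real spatial cues, the serialization is a toy, and `Mode`, `downmix`, `extract_spatial_info` and `encode` are hypothetical names.

```python
from enum import Flag, auto


class Mode(Flag):
    EMBEDDED = auto()   # first mode: carried inside the downmix signal
    SEPARATE = auto()   # second mode: carried next to the downmix signal


def downmix(channels):
    """Step 3002, illustrated as a plain average of all channels (mono downmix)."""
    n = len(channels)
    return [sum(frame) / n for frame in zip(*channels)]


def extract_spatial_info(channels):
    """Step 3003, illustrated as per-channel energy (a stand-in for real cues)."""
    return [sum(x * x for x in ch) for ch in channels]


def encode(channels, mode):
    mono = downmix(channels)
    # Step 3004: toy serialization of the spatial information.
    spatial_bitstream = repr(extract_spatial_info(channels)).encode()
    entire = {"mode": mode, "downmix": mono}
    if Mode.EMBEDDED in mode:
        # Step 3006 would hide these bytes inside the downmix samples.
        entire["embedded"] = spatial_bitstream
    if Mode.SEPARATE in mode:
        entire["separate"] = spatial_bitstream
    return entire       # step 3007: the entire bitstream to transmit
```

Passing `Mode.EMBEDDED | Mode.SEPARATE` corresponds to sending the spatial information bitstream in both modes at once.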

FIG. 31 illustrates a first spatial decoder that can decode a spatial information bitstream transmitted according to the present invention. The first spatial decoder is used when the spatial information bitstream is embedded in the downmix audio signal. First, the first spatial decoder receives the downmix audio signal 3101 in which the spatial information bitstream is embedded. The identification information extraction unit 3102 then extracts, from the downmix audio signal 3101, identification information on how the spatial information bitstream was transmitted. The identification information may indicate that the spatial information bitstream is embedded in the downmix audio signal. For example, the identification information may be a syncword: if the syncword is present in the insertion bits of the downmix audio signal, a spatial information bitstream is embedded, and if it is absent, no spatial information bitstream is embedded. According to the identification information, the embedded signal decoder 3103 extracts the spatial information bitstream embedded in the downmix audio signal. The extracted spatial information bitstream is decoded by the spatial information decoder 3104 to obtain the spatial information, which the multichannel generator 3105 can use to convert the downmix audio signal into a multichannel audio signal 3107. If no spatial information bitstream is embedded in the downmix audio signal, the first spatial decoder can output the downmix audio signal directly (3106).
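The syncword check can be sketched as a search over the insertion bits. The 8-bit pattern `SYNCWORD` and the choice of the least-significant bit of each sample as the insertion bit are assumptions made for this example; the description specifies neither.

```python
SYNCWORD = [1, 0, 1, 1, 0, 1, 0, 0]   # hypothetical 8-bit pattern


def insertion_bits(samples):
    """Assumed insertion bit: the least-significant bit of each PCM sample."""
    return [s & 1 for s in samples]


def find_syncword(samples, syncword=SYNCWORD):
    """Return the sample index where the syncword starts, or None if absent."""
    bits = insertion_bits(samples)
    n = len(syncword)
    for start in range(len(bits) - n + 1):
        if bits[start:start + n] == syncword:
            return start
    return None


def decode_or_passthrough(samples):
    """Choose between the multichannel path and direct downmix output."""
    start = find_syncword(samples)
    if start is None:
        return "downmix-only", None     # corresponds to output 3106
    payload = insertion_bits(samples)[start + len(SYNCWORD):]
    return "multichannel", payload      # bits handed on for spatial decoding
```

A short pattern such as this one can also occur by chance in ordinary audio, so a practical syncword would presumably be longer or be confirmed by additional checks.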

FIG. 32 illustrates a second spatial decoder that can decode a spatial information bitstream transmitted according to the present invention. The second spatial decoder may be used when the spatial information bitstream is transmitted separately. First, the second spatial decoder receives a core codec bitstream 3201 and a spatial information bitstream 3202. The identification information extraction unit 3204 then extracts identification information from the core codec bitstream 3201 or the spatial information bitstream 3202. For example, the identification information may indicate that a spatial information bitstream is present on a data track, in an extension bitstream, or in an auxiliary data area. The core codec bitstream 3201 is decoded by the core codec decoder 3203, and the spatial information bitstream is decoded by the spatial information decoder 3205. If the spatial information bitstream is on a data track, the core codec decoder 3203 may not be needed. The spatial information obtained by decoding the spatial information bitstream 3202 can be used by the multichannel generator 3206 to convert the downmix audio signal, obtained by decoding the core codec bitstream 3201, into a multichannel audio signal 3208.

FIG. 33 illustrates a third spatial decoder that can decode a spatial information bitstream transmitted according to the present invention. The third spatial decoder may be used when the spatial information bitstream is both embedded in the downmix audio signal and transmitted separately. The third spatial decoder receives the downmix audio signal 3301 in which the spatial information bitstream is embedded and the separately transmitted spatial information bitstream 3302. The identification information extraction unit 3303 then extracts, from at least one of the core codec bitstream 3301 and the spatial information bitstream 3302, identification information on how the spatial information bitstream was transmitted. The identification information may indicate that the spatial information bitstream was not only embedded in the downmix audio signal but also transmitted separately alongside the downmix audio signal 3301. The identification information extraction unit 3303 can also extract the spatial information bitstream from at least one of the downmix audio signal with the embedded spatial information bitstream and the separately transmitted spatial information bitstream 3302. If extracting and decoding the spatial information from the downmix audio signal 3301 is judged advantageous, the embedded signal decoder 3304 extracts the spatial information bitstream embedded in the downmix audio signal, and the spatial information decoder 3305 decodes it. If decoding the separately transmitted spatial information bitstream 3302 is judged advantageous, the spatial information decoder 3305 decodes that bitstream directly. The spatial information obtained by decoding the spatial information bitstream can be used by the multichannel generator 3306 to convert the downmix audio signal into a multichannel audio signal 3308. If multichannel output is not available, the downmix audio signal can be output directly (3307).

FIG. 34 is a flowchart of a decoding method using the first spatial decoder according to the present invention. First, the first spatial decoder receives an entire bitstream that includes a downmix audio signal with an embedded spatial information bitstream (3401) and extracts the downmix audio signal from the entire bitstream (3402). It also reads, from the entire bitstream, identification information indicating that a spatial information bitstream is embedded in the downmix audio signal (3403). Using the identification information, it extracts the spatial information bitstream from the downmix audio signal (3404) and decodes the extracted spatial information bitstream (3405). The decoding yields the spatial information (3406), which is then used to convert the downmix audio signal into a multichannel signal (3407).

FIG. 35 is a flowchart of a decoding method using the second spatial decoder according to the present invention. First, the second spatial decoder receives an entire bitstream that includes a core codec bitstream for the downmix audio signal and a spatial information bitstream (3501). Here, the spatial information bitstream is not embedded in the downmix audio signal but is transmitted separately, together with the core codec bitstream. Next, the core codec bitstream in the entire bitstream is decoded (3502) to extract the downmix audio signal (3504). Identification information indicating that the spatial information bitstream was transmitted separately rather than embedded in the downmix audio signal is also extracted from the entire bitstream (3503). Using the identification information, the spatial information bitstream is decoded (3505) to extract the spatial information (3506). The extracted spatial information is then used to convert the downmix audio signal into a multichannel signal (3507).

FIG. 36 is a flowchart of a decoding method using the third spatial decoder according to the present invention. First, the third spatial decoder receives an entire bitstream that includes a core codec bitstream for the downmix audio signal with the embedded spatial information bitstream, and a separate spatial information bitstream (3601). The downmix audio signal is then extracted from the entire bitstream (3603). Identification information indicating that the spatial information bitstream was both embedded in the downmix audio signal and transmitted separately is also extracted from the entire bitstream (3602). Next, a decoding method for the spatial information bitstream is selected according to the identification information (3604). The decoding methods are a first mode, which decodes the spatial information bitstream embedded in the downmix audio signal, and a second mode, which decodes the separately transmitted spatial information bitstream. If the first mode is judged advantageous, the spatial information bitstream is extracted from the core codec bitstream (3605) and the extracted bitstream is decoded (3606). If the second mode is judged advantageous, the separately transmitted spatial information bitstream is decoded (3606). The decoding yields the spatial information (3607), which is then used to convert the downmix audio signal into a multichannel signal (3608).
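The mode selection of the third spatial decoder can be sketched as a choice between two optional sources. The description does not state how one mode is judged advantageous, so the `prefer` parameter is an assumption standing in for that judgment, and `choose_spatial_source` is a hypothetical name. Falling back to the plain downmix when neither source is present is likewise an assumption.

```python
def choose_spatial_source(embedded, separate, prefer="separate"):
    """Return (mode, bitstream) for the spatial information to decode.

    `embedded` and `separate` are bytes, or None when that copy is missing.
    `prefer` names the mode tried first; the other mode is the fallback.
    """
    candidates = {"embedded": embedded, "separate": separate}
    order = [prefer] + [m for m in candidates if m != prefer]
    for mode in order:
        if candidates[mode] is not None:
            return mode, candidates[mode]
    return None, None   # no spatial information: play the downmix as it is
```

With both copies present the preferred mode wins, and if that copy is missing the decoder can still use the other one.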

Although the present invention has been described in detail with reference to several embodiments, these embodiments are presented only to aid understanding of the invention, and the scope of the invention is not limited to them. Those skilled in the art will understand that various modifications are possible without departing from the technical idea of the invention, and the scope of the invention should be interpreted according to the appended claims.

As described above, in coding a multichannel audio signal, the present invention provides an encoding method that can choose among sending the spatial information bitstream embedded in the downmixed audio signal, sending it separately, or both embedding it in the downmix audio signal and sending it separately. As a result, a multichannel audio signal can be transmitted over various kinds of transmission media, including media that can carry spatial information separately and media that cannot.

The present invention also provides methods of decoding the spatial information bitstream for each of these cases, namely when the bitstream is embedded in the downmix audio signal, when it is sent separately, and when it is both embedded and sent separately, so that a spatial information bitstream delivered in any of these ways can be used.

Claims

A method of encoding a multichannel audio signal, comprising:

(a) downmixing the multichannel audio signal and extracting spatial information from the multichannel audio signal;

(b) generating a spatial information bitstream using the spatial information;

(c) selecting a method of transmitting the generated spatial information bitstream; and

(d) generating, according to the selected method, an entire bitstream comprising the spatial information bitstream and the downmix audio signal.

The method of claim 1,

wherein the transmission method embeds the spatial information bitstream in the downmix audio signal and transmits it.

The method of claim 2,

wherein the spatial information bitstream is embedded in insertion bits of the downmix audio signal.

The method of claim 3,

wherein the insertion bits are obtained using a masking threshold of the downmixed audio signal.

The method of claim 1,

wherein the transmission method transmits the spatial information bitstream separately from the downmix audio signal.

The method of claim 1,

wherein the transmission method embeds the spatial information bitstream in the downmix audio signal and also transmits it separately from the downmixed audio signal.

The method of claim 1,

wherein identification information on the transmission method is included in the entire bitstream.

A method of encoding a multichannel audio signal, comprising:

(b) generating a spatial information bitstream using the spatial information;

(c) embedding the generated spatial information bitstream into the downmix audio signal; and

(d) generating an entire bitstream that includes the downmix audio signal in which the spatial information bitstream is embedded, and determining whether to additionally include the spatial information bitstream separately within the entire bitstream.

The method of claim 8,

wherein step (c) comprises embedding the spatial information bitstream in insertion bits of the downmix audio signal.

The method of claim 8,

wherein step (d) comprises configuring the entire bitstream such that the spatial information bitstream is not separately included in the entire bitstream.

The method of claim 8,

wherein step (d) comprises configuring the entire bitstream such that the spatial information bitstream is separately included in the entire bitstream.

The method of claim 8,

A method of decoding into a multichannel audio signal, comprising:

(a) receiving an entire bitstream including a downmix audio signal in which a spatial information bitstream is embedded;

(b) extracting the spatial information bitstream embedded in the downmix audio signal; and

(c) converting the downmix audio signal into a multichannel audio signal using spatial information obtained by decoding the spatial information bitstream.

The method of claim 13,

further comprising, before step (b), extracting from the entire bitstream identification information indicating that the spatial information bitstream is embedded in the downmix audio signal.

A method of decoding into a multichannel audio signal, comprising:

(a) receiving an entire bitstream comprising a spatial information bitstream and a downmix audio signal;

(b) extracting, from the entire bitstream, the spatial information bitstream transmitted separately from the downmix audio signal; and

The method of claim 15,

wherein the spatial information bitstream is stored and transmitted in a data track of a compact disc.

The method of claim 15,

wherein the spatial information bitstream is stored and transmitted in an extension bitstream.

The method of claim 15,

wherein the spatial information bitstream is stored and transmitted in an auxiliary data region within a core codec bitstream.

The method of claim 15,

further comprising, before step (b), extracting from the entire bitstream identification information indicating that the spatial information bitstream is transmitted separately from the downmix audio signal.

A method of decoding into a multichannel audio signal, comprising:

(a) receiving an entire bitstream comprising a downmix audio signal having a spatial information bitstream embedded therein and a spatial information bitstream transmitted separately;

(b) either extracting a spatial information bitstream from the downmix audio signal and decoding the extracted spatial information bitstream, or decoding the separately transmitted spatial information bitstream; and

The method of claim 20,

further comprising, before step (b), extracting from the entire bitstream identification information indicating that the spatial information bitstream is both embedded in the downmix audio signal and transmitted separately from the downmix audio signal.

A method of encoding a multichannel audio signal, comprising:

(b) generating a spatial information bitstream for each frame using the spatial information; and

(c) composing an entire bitstream by combining the spatial information bitstream and the downmixed audio signal, wherein the spatial information bitstream, having a frame size (N), is combined with the downmix audio signal in units of the frame size (S) at which the spatial information is decoded and applied.

The method of claim 22,

wherein position information on the range to which the spatial information bitstream applies is inserted in a header of the spatial information bitstream.

The method of claim 22,

wherein a syncword marking the start position of the spatial information bitstream is inserted in a header of the spatial information bitstream.

A method of decoding into a multichannel audio signal, comprising:

(a) receiving an entire bitstream comprising a downmixed audio signal combined with a spatial information bitstream; and

(b) extracting, from the entire bitstream, the spatial information bitstream combined with the downmixed audio signal, and decoding it in units of the frame size (S) at which the spatial information is decoded and applied.

The method of claim 25,

wherein step (b) further comprises converting the downmixed audio signal into a multichannel audio signal using the spatial information obtained by decoding the spatial information bitstream.

The method of claim 25,

further comprising, before step (b), reading from the entire bitstream position information indicating where the spatial information bitstream is to be applied.

A method of generating an audio signal, wherein:

the audio signal is generated to include a downmixed audio signal and a spatial information bitstream; and

the spatial information bitstream is combined with the downmixed audio signal to form an entire bitstream, the spatial information bitstream being generated so as to be combined with the downmix audio signal in units of the frame size (S) at which the decoded spatial information is applied.