KR102119239B1

KR102119239B1 - Method for creating binaural stereo audio and apparatus using the same

Info

Publication number: KR102119239B1
Application number: KR1020180010874A
Authority: KR
Inventors: 구본희
Original assignee: 구본희
Priority date: 2018-01-29
Filing date: 2018-01-29
Publication date: 2020-06-04
Also published as: WO2019147041A1; KR20190091824A

Abstract

Disclosed are a method for generating binaural stereo audio and an apparatus therefor. A method for generating binaural stereo audio according to an embodiment of the present invention includes: generating a 3D layer binaural output by performing 3D layer binaural encoding corresponding to a 3D binaural layer; generating a planar layer audio output by performing audio processing corresponding to a planar layer; and generating a binaural stereo output by summing the 3D layer binaural output and the planar layer audio output.

Description

Method and apparatus for generating binaural stereo audio {METHOD FOR CREATING BINAURAL STEREO AUDIO AND APPARATUS USING THE SAME}

The present invention relates to technology for generating binaural stereo audio, and in particular to technology that combines a binaural output based on a 3D layer with an audio output based on a planar layer to generate binaural stereo audio that can be played back universally.

As multimedia technology advances, content carrying multi-channel audio signals with more channels than 5.1, such as 7.1, 10.2, 11.1, and 22.2 channels, is increasingly used. However, the user terminals on which such content is consumed mostly reproduce audio in stereo form, through stereo speakers, headphones, or earphones, so high-quality multi-channel audio signals need to be converted into stereo audio signals.

Korean Patent Application Publication No. 10-2015-0013073, published February 4, 2015 (Title: Binaural rendering method and apparatus for multi-channel audio signals)

An object of the present invention is to provide a method for generating binaural stereo audio that maximizes the binaural effect by mixing various sound elements.

Another object of the present invention is to provide a binaural engine in which the sound elements used to create an effective binaural effect can easily be added, removed, or adjusted.

A further object of the present invention is to improve compatibility with various kinds of content on the basis of natural up-mixing and down-mixing.

To achieve the above objects, a method for generating binaural stereo audio according to the present invention includes: generating a 3D layer binaural output by performing 3D layer binaural encoding corresponding to a 3D binaural layer; generating a planar layer audio output by performing audio processing corresponding to a planar layer; and generating a binaural stereo output by summing the 3D layer binaural output and the planar layer audio output.

Here, the planar layer may be either a surround layer, which performs surround layer binaural encoding to generate a surround layer binaural output and provides that output as the planar layer audio output, or a proximity stereo layer, which receives a stereo signal and generates the planar layer audio output corresponding to the stereo signal.

Here, the 3D layer binaural output may be generated to correspond to a 3D vector for a binaural point located on an 8-channel 3D cubic composed of four up channels and four down channels.

Here, generating the binaural stereo output may include applying a 3D weight to the 3D layer binaural output and applying a planar weight to the planar layer audio output, and the 3D weight and the planar weight may be set independently of each other.

Here, generating the binaural stereo output may include summing a subwoofer output corresponding to a subwoofer layer together with the 3D layer binaural output and the planar layer audio output to generate the binaural stereo output.

Here, the 3D cubic may be generated by changing the positions of the eight dynamic speakers at the vertices of the 3D cubic in accordance with a size parameter for the 3D binaural layer.

Here, the 3D vector may be generated with respect to a reference listening point that lies inside the 3D cubic and corresponds to the center of the 2D plane corresponding to the surround layer.

Here, generating the 3D layer binaural output may include applying the direction information of the 3D vector to the 3D cubic rotated in accordance with head tracking information, where the head tracking information may be obtained from at least one of a tracking input from a head tracking module and a user input through a user interface.

Here, the 3D cubic may be rotated in accordance with at least one rotation parameter among pan, tilt, and roll.

Here, the planar layer may be located between the four up channels and the four down channels.

In addition, an apparatus for generating binaural stereo audio according to an embodiment of the present invention includes: a processor that generates a 3D layer binaural output by performing 3D layer binaural encoding corresponding to a 3D binaural layer, generates a planar layer audio output by performing audio processing corresponding to a planar layer, and generates a binaural stereo output by summing the 3D layer binaural output and the planar layer audio output; and a memory that stores the 3D layer binaural output and the planar layer audio output.

Here, the processor may apply a 3D weight to the 3D layer binaural output and a planar weight to the planar layer audio output, and the 3D weight and the planar weight may be set independently of each other.

Here, the processor may generate the binaural stereo output by summing a subwoofer output corresponding to a subwoofer layer together with the 3D layer binaural output and the planar layer audio output.

Here, the processor may generate the 3D layer binaural output by applying the direction information of the 3D vector to the 3D cubic rotated in accordance with head tracking information, where the head tracking information may be obtained from at least one of a tracking input from a head tracking module and a user input through a user interface.

According to the present invention, it is possible to provide a method for generating binaural stereo audio that maximizes the binaural effect by mixing various sound elements.

In addition, the present invention can provide a binaural engine in which the sound elements used to create an effective binaural effect can easily be added, removed, or adjusted.

In addition, the present invention can improve compatibility with various kinds of content on the basis of natural up-mixing and down-mixing.

FIG. 1 is a view showing the structure of a binaural engine according to an embodiment of the present invention.
FIG. 2 is a view showing the structure of a conventional binaural engine.
FIG. 3 is a block diagram showing a binaural stereo audio generating apparatus according to an embodiment of the present invention.
FIG. 4 is a view showing a detailed structure for generating a 3D layer binaural output according to an embodiment of the present invention.
FIG. 5 is a view showing an example of an 8-channel 3D cubic according to the present invention.
FIG. 6 is a view showing an example of 3D cubics of various sizes generated by changing the positions of the dynamic speakers according to the present invention.
FIG. 7 is a view showing an example of a 3D vector according to the present invention.
FIG. 8 is a view showing an example in which the direction information of a 3D vector is applied to a 3D cubic rotated in accordance with head tracking information according to the present invention.
FIG. 9 is a view showing an example of rotation parameters according to the present invention.
FIG. 10 is a view showing a detailed structure for generating a surround layer binaural output according to an embodiment of the present invention.
FIG. 11 is a view showing an example of a 5-channel surround layer according to the present invention.
FIG. 12 is a view showing a detailed structure for generating a stereo signal according to an embodiment of the present invention.
FIGS. 13 and 14 are views showing an example of a proximity stereo layer according to the present invention.
FIG. 15 is a view showing an example of a planar layer positioned between the up channels and the down channels of a 3D cubic according to the present invention.
FIGS. 16 and 17 are views showing an example of a proximity stereo layer used as some of the channels of a surround layer according to the present invention.
FIG. 18 is a view showing a detailed structure for generating a subwoofer output according to an embodiment of the present invention.
FIG. 19 is a view showing an example of a structure in which a 3D binaural layer, a planar layer, and a subwoofer layer are combined according to the present invention.
FIG. 20 is a view showing an example of sound expressed through a conventional binaural engine.
FIG. 21 is a view showing an example of sound expressed through a binaural engine according to the present invention.
FIG. 22 is a flowchart showing a method for generating binaural stereo audio according to an embodiment of the present invention.

The present invention is described in detail below with reference to the accompanying drawings. Repeated descriptions, and detailed descriptions of well-known functions and configurations that could unnecessarily obscure the gist of the invention, are omitted. The embodiments are provided to describe the invention more completely to those of ordinary skill in the art. Accordingly, the shapes and sizes of elements in the drawings may be exaggerated for clarity.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a view showing the structure of a binaural engine according to an embodiment of the present invention, and FIG. 2 is a view showing the structure of a conventional binaural engine.

First, referring to FIG. 2, a conventional binaural engine binaurally encodes a multi-channel audio file with a binaural encoder 210 and provides the resulting binaural output by decoding it in a dedicated player 220. Because binaural encoding according to the prior art relies on fixed speakers placed a certain distance from the listening position, it is difficult to enlarge or shrink the spatial image by adjusting the speaker positions.

Moreover, the conventional binaural engine is specialized for content that contains both video and audio, such as surround movie content, and is difficult to apply to sources that have no spatial image, such as music content. In addition, because binaurally encoded content can be played only with the dedicated player 220, the engine is inefficient in terms of usability. For example, music content by its nature must deliver sufficient loudness to the listener, but the binaural encoder 210 shown in FIG. 2 alone is limited in its ability to provide sound effects optimized for music content.

Furthermore, because the conventional binaural engine uses only a single encoder specialized for the effects mainly used with a given type of content, it could not apply a variety of production effects. For example, since music content typically does not use a subwoofer, there has been almost no attempt to add subwoofer-based bass reproduction to music content through a conventional binaural engine.

In contrast, the binaural engine according to an embodiment of the present invention shown in FIG. 1 can mix outputs containing various binaural sound effects with outputs produced by audio processing, generating a binaural stereo output with more dramatic production.

For example, as shown in FIG. 1, a binaural encoder 111 corresponding to the multi-channel 3D binaural layer 110 may perform binaural encoding to generate a 3D layer binaural output. A binaural encoder 121 corresponding to the surround layer 120 may perform binaural encoding to generate a surround layer binaural output. A stereo bus 131 corresponding to the proximity stereo layer 130 may generate an audio output corresponding to a stereo signal. An LFE bus 141 corresponding to the subwoofer layer 140 may generate a subwoofer output. The binaural mixer 150 may then sum these outputs, namely the 3D layer binaural output, the surround layer binaural output, the audio output corresponding to the stereo signal, and the subwoofer output, to generate the binaural stereo output. The binaural stereo output combined by the binaural mixer 150 may be delivered to a listener or user in a form that can be played through a general-purpose decoder.
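The summing performed by the binaural mixer 150 can be illustrated with a minimal sketch. It assumes each layer output is already a two-channel buffer of equal length and that each layer has one scalar weight, in the spirit of the independently set 3D weight and planar weight mentioned in the summary. The function name `mix_layers` and the buffer representation are hypothetical and are not taken from the patent.

```python
from typing import List, Sequence, Tuple

StereoBuffer = Sequence[Tuple[float, float]]  # one (left, right) pair per sample


def mix_layers(layers: Sequence[StereoBuffer],
               weights: Sequence[float]) -> List[Tuple[float, float]]:
    """Weighted sum of equal-length stereo buffers (hypothetical helper).

    Each layer (3D layer, surround, proximity stereo, subwoofer, ...) gets its
    own weight, so one layer can be emphasized without touching the others.
    """
    if not layers or len(layers) != len(weights):
        raise ValueError("need one weight per layer")
    length = len(layers[0])
    if any(len(layer) != length for layer in layers):
        raise ValueError("all layers must have the same number of samples")

    mixed = []
    for i in range(length):
        left = sum(w * layer[i][0] for layer, w in zip(layers, weights))
        right = sum(w * layer[i][1] for layer, w in zip(layers, weights))
        mixed.append((left, right))
    return mixed


# Example: a 3D layer output weighted 0.5 and a planar layer output weighted 2.0.
three_d = [(1.0, 0.0)]
planar = [(0.0, 1.0)]
print(mix_layers([three_d, planar], [0.5, 2.0]))
```

A real mixer would also have to guard against clipping after summation; that step is omitted here.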

Because the binaural engine according to the present invention generates binaural stereo audio by mixing the outputs of various encoders, it can be used in a general-purpose form not specialized for any particular content, and it can provide high compatibility with existing content.

For example, for movie content containing both video and audio, the surround layer binaural output, which can be generated from the movement of objects in the video, may be mixed with at least one of the 3D layer binaural output, the stereo output, and the subwoofer output to enable more dramatic sound production.

As another example, for music content containing only audio, the 3D layer binaural output generated from the 3D binaural layer may be mixed with the stereo output or the subwoofer output to provide dynamic music.

FIG. 3 is a block diagram showing a binaural stereo audio generating apparatus according to an embodiment of the present invention.

Referring to FIG. 3, a binaural stereo audio generating apparatus according to an embodiment of the present invention includes a communication unit 310, a processor 320, and a memory 330.

The communication unit 310 transmits and receives the information needed to generate binaural stereo audio over a communication network. In particular, the communication unit 310 according to an embodiment of the present invention may receive the source or content to be input for binaural stereo audio generation, the head tracking information to be applied in binaural encoding, and information related to user input, and may provide the binaural stereo audio corresponding to the binaural stereo output.

The processor 320 generates a 3D layer binaural output by performing 3D layer binaural encoding corresponding to the 3D binaural layer.

Here, the 3D binaural layer corresponds to the element that creates a 3D spatial image. For example, referring to FIG. 4, a binaural encoder 420 based on the 3D cubic scheme may perform 3D layer binaural encoding corresponding to the multiple channels included in the 3D binaural layer.

Here, the 3D binaural layer may include four up channels 411 and four down channels 412 corresponding to an 8-channel 3D cubic.

Accordingly, the 3D layer binaural output 430 may correspond to the output generated by binaurally encoding 8-channel audio and, as shown in FIG. 4, may be output as two channels. The two channels of the 3D layer binaural output 430 may correspond to a left channel and a right channel, respectively.

Although the embodiment shown in FIG. 4 uses an 8-channel 3D cubic layer as the 3D binaural layer, the 3D binaural layer is not limited to this. That is, the binaural stereo audio generating apparatus or binaural engine according to an embodiment of the present invention may also be configured with another available 3D binaural layer or one developed in the future.

For example, referring to FIG. 5, the 8-channel 3D cubic may be a hexahedral structure whose vertices are four dynamic speakers 511 to 514 corresponding to the four up channels and four dynamic speakers 515 to 518 corresponding to the four down channels. Because the positions of the eight dynamic speakers 511 to 518 can be changed, the range of the binaural effect produced by the 3D cubic can also be changed dynamically.

As another example, the 3D cubic may be generated using the existing binaural VBAP (vector base amplitude panning) method, so that immersive sound is implemented with eight dynamic speakers. That is, each of the eight dynamic speakers is assigned X, Y, and Z position values, and a vector-based virtual track point can be expressed relative to the center of the 3D cubic. The virtual track point may be expressed in accordance with the parameter values included in the head tracking information.
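For readers unfamiliar with VBAP, the following sketch shows the textbook three-speaker form of the technique: the source direction is written as a linear combination of three speaker direction vectors, and the coefficients become the speaker gains. This is the generic formulation, not necessarily the variant used in the patent, which does not give its equations. Selecting which three of the eight cubic speakers to use is not shown, and the name `vbap_gains` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def det3(a: Vec3, b: Vec3, c: Vec3) -> float:
    """Determinant of the 3x3 matrix whose rows are a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def vbap_gains(source: Vec3, speakers: Sequence[Vec3]) -> Tuple[float, float, float]:
    """Solve source = g1*l1 + g2*l2 + g3*l3 by Cramer's rule, then
    normalize the gains to unit power (g1^2 + g2^2 + g3^2 = 1)."""
    l1, l2, l3 = speakers
    d = det3(l1, l2, l3)
    if abs(d) < 1e-12:
        raise ValueError("speaker vectors must be linearly independent")
    g = (det3(source, l2, l3) / d,
         det3(l1, source, l3) / d,
         det3(l1, l2, source) / d)
    norm = math.sqrt(sum(x * x for x in g))
    if norm == 0.0:
        raise ValueError("source vector must be non-zero")
    return (g[0] / norm, g[1] / norm, g[2] / norm)


# A source halfway between the first two speakers gets equal gains on both.
axes = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
print(vbap_gains((1.0, 1.0, 0.0), axes))
```

A negative gain indicates that the source direction lies outside the chosen speaker triplet, in which case a different triplet should be selected.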

Such a 3D cubic makes it possible to generate a spatial image for music content containing only audio and to express the movement of sound, providing a more three-dimensional effect.

Here, the 3D cubic may be generated by changing the positions of the eight dynamic speakers at its vertices in accordance with a size parameter for the 3D binaural layer. That is, the 3D cubic can be generated efficiently by freely repositioning the dynamic speakers, which are variable rather than fixed, in accordance with the size parameter.

For example, by setting the size parameter to a constant and processing the 3D cubic by multiplying that constant by a binaural function, 3D cubics 610, 620, and 630 with various ranges can be generated, as shown in FIG. 6.
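The effect of the size parameter on the speaker positions can be sketched as follows. The patent does not define the binaural function, so this sketch makes the simplest possible assumption: the size parameter is the edge length of a cube centered on a given point, and the eight dynamic speakers sit at its vertices. The name `cubic_vertices` and the axis convention (z pointing up) are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def cubic_vertices(size: float, center: Vec3 = (0.0, 0.0, 0.0)) -> List[Vec3]:
    """Return the 8 speaker positions of a cube with edge length `size`.

    The first four entries are the up channels (above the center), the last
    four are the down channels (below the center).
    """
    if size <= 0:
        raise ValueError("size must be positive")
    half = size / 2.0
    cx, cy, cz = center
    return [(cx + sx * half, cy + sy * half, cz + sz * half)
            for sz, sy, sx in product((1, -1), (1, -1), (1, -1))]


# Three cubics of different extent, loosely analogous to 610, 620, 630 in FIG. 6.
for size in (1.0, 2.0, 4.0):
    print(size, cubic_vertices(size)[0])
```

Because only one scalar changes, the spatial extent of the binaural effect can be widened or narrowed without redefining the individual speakers.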

Here, the 3D vector may be generated with respect to a reference listening point that lies inside the 3D cubic and corresponds to the center of the 2D plane corresponding to the surround layer.

For example, referring to FIG. 7, the reference listening point 700, which virtually represents the position of the user or listener hearing the binaural stereo audio, may be located inside the 3D cubic 710 whose vertices are the eight dynamic speakers, at the center of the surround layer 720. Assuming that the binaural point 730 is located on the top face of the 3D cubic 710 as shown in FIG. 7, the 3D vector 740 corresponding to the 3D layer binaural output may be generated in the direction from the reference listening point 700 toward the binaural point 730.
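The relationship between the reference listening point, the binaural point, and the 3D vector can be sketched in a few lines. The sketch assumes Cartesian coordinates with z pointing up and the surround layer lying in the horizontal plane through the listening point. The function names are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def direction_vector(listening_point: Vec3, binaural_point: Vec3) -> Vec3:
    """Unit vector pointing from the reference listening point to the binaural point."""
    dx = binaural_point[0] - listening_point[0]
    dy = binaural_point[1] - listening_point[1]
    dz = binaural_point[2] - listening_point[2]
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    if norm == 0.0:
        raise ValueError("binaural point coincides with the listening point")
    return (dx / norm, dy / norm, dz / norm)


def image_height(listening_point: Vec3, binaural_point: Vec3) -> str:
    """Whether the sound image sits above, below, or level with the surround layer."""
    dz = binaural_point[2] - listening_point[2]
    if dz > 0:
        return "above"
    if dz < 0:
        return "below"
    return "level"


# A binaural point on the top face of the cubic, directly over the listener.
print(direction_vector((0.0, 0.0, 0.0), (0.0, 0.0, 2.0)))
print(image_height((0.0, 0.0, 0.0), (0.0, 0.0, 2.0)))
```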

As will be described in detail with reference to FIGS. 10 and 11, the surround layer 720 corresponds to the element that creates a surround image corresponding to the surround effect. Although FIG. 7 shows the surround layer 720 as a plane for convenience of description, it is not limited to a planar shape.

As shown in FIG. 7, when the binaural point 730 on the 3D cubic 710 is positioned higher than the surround layer 720 in which the reference listening point 700 lies, the output sound may be localized above the listener. Conversely, when the binaural point 730 is positioned lower than the surround layer 720, the output sound may be localized below the listener.

In this way, the present invention can produce more varied audio by changing the position of the binaural point 730 on the 3D cubic 710 relative to the reference listening point 700.

Here, the 3D layer binaural output may be generated by applying the direction information of the 3D vector to the 3D cubic rotated in accordance with the head tracking information. That is, because the binaural point is a position set relative to the listener's head at the reference listening point, when the position or angle of the listener's head changes, the position of the binaural point on the 3D cubic may change as well.

For example, suppose the 3D cubic 710 shown in FIG. 7 has been rotated as shown in FIG. 8 in accordance with the head tracking information. By applying the direction information of the 3D vector 740 shown in FIG. 7 unchanged to the 3D cubic shown in FIG. 8, the position of the binaural point as changed by the rotation can be detected.

Here, the head tracking information corresponds to data that tracks the head movement of the user or listener, and may be obtained from at least one of a tracking input from a separate head tracking module and a user input through a user interface.

For example, when a user or listener wearing a head tracking module moves their head, the module may measure the distance, angle, and so on by which the head has moved, generate head tracking information from the measurement, and transmit it.

As another example, the head tracking information may be supplied manually by the user or listener through the user interface. That is, to rotate the spatial image deliberately, the user or listener may enter head tracking information through the user interface regardless of whether head tracking information is being received from a head tracking module. The user or listener may also enter and revise the head tracking information while listening to the mixing process that generates the binaural stereo output, or to the binaural stereo output as it changes with the entered information.

For example, when the listener rotates their head in at least one of pan, tilt, and roll as shown in FIG. 9, the rotation value may be acquired as a rotation parameter and applied to the 3D cubic.
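As a rough illustration of applying pan, tilt, and roll as rotation parameters to the eight vertices of the 3D cubic, the following Python sketch may help. The description does not define axes, rotation order, or sign conventions, so everything here is an assumption: x points right, y forward, z up; pan rotates about z, tilt about x, roll about y; and the function names and the unit cube are hypothetical.

```python
import math


def _matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_matrix(pan_deg, tilt_deg, roll_deg):
    """Build a rotation matrix R = Rz(pan) * Rx(tilt) * Ry(roll).

    The axis assignment and composition order are assumptions made for
    this sketch; the patent text does not specify them.
    """
    p, t, r = (math.radians(v) for v in (pan_deg, tilt_deg, roll_deg))
    rz = [[math.cos(p), -math.sin(p), 0.0],
          [math.sin(p), math.cos(p), 0.0],
          [0.0, 0.0, 1.0]]
    rx = [[1.0, 0.0, 0.0],
          [0.0, math.cos(t), -math.sin(t)],
          [0.0, math.sin(t), math.cos(t)]]
    ry = [[math.cos(r), 0.0, math.sin(r)],
          [0.0, 1.0, 0.0],
          [-math.sin(r), 0.0, math.cos(r)]]
    return _matmul(rz, _matmul(rx, ry))


def rotate(point, matrix):
    """Apply a 3x3 rotation matrix to an (x, y, z) point."""
    return tuple(sum(matrix[i][k] * point[k] for k in range(3))
                 for i in range(3))


# Hypothetical unit cube: 4 down channels (z = -1) and 4 up channels (z = +1).
CUBE_VERTICES = [(x, y, z)
                 for x in (-1.0, 1.0)
                 for y in (-1.0, 1.0)
                 for z in (-1.0, 1.0)]


def rotate_cubic(pan_deg=0.0, tilt_deg=0.0, roll_deg=0.0,
                 vertices=CUBE_VERTICES):
    """Return the cube vertices after applying the rotation parameters."""
    m = rotation_matrix(pan_deg, tilt_deg, roll_deg)
    return [rotate(v, m) for v in vertices]
```

Because only the eight cube vertices are transformed, the same matrix could also be applied to a binaural point or direction vector expressed in the cube's coordinates.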

The effect produced in this way, by rotating the 3D cubic or moving it up, down, left, or right according to the head tracking information, may subsequently be mixed with the flat layer audio output to generate the binaural stereo output. An immersive effect based on head tracking can therefore be produced more efficiently than with the conventional approach of rotating or moving the surround layer or proximity stereo layer corresponding to the flat layer, or the subwoofer layer.

In addition, the processor 320 performs audio processing corresponding to the flat layer to generate the flat layer audio output.

Here, the flat layer is a layer whose structure differs from that of the 3D binaural layer, and may correspond to an element that creates an image corresponding to a surround effect or a stereo effect.

Accordingly, the flat layer may be either of the following: a surround layer, which performs surround layer binaural encoding to generate a surround layer binaural output and provides that output as the flat layer audio output; or a proximity stereo layer, which receives a stereo signal and generates a flat layer audio output corresponding to the stereo signal.

For example, referring to FIG. 10, a binaural encoder 1020 may be used to perform surround layer binaural encoding for a surround layer of 5 or 7 channels 1010. As will be described with reference to FIGS. 13 and 14, 7-channel surround layer binaural encoding may be performed by including the 2 channels of the proximity stereo layer in the surround layer.

The surround layer may, for example, have a structure including five speakers 1111 to 1115 as shown in FIG. 11. The surround layer binaural output 1030 may correspond to a binaural point located on the surround layer. Assuming the listener hears the sound at a reference listening point located at the center of the surround layer, the surround layer binaural output 1030 may be generated by binaural encoding so that the sound appears to come from the binaural point on the surround layer.

The surround layer binaural output 1030 may be output as 2 channels, as shown in FIG. 10. These 2 channels may correspond to a left channel and a right channel, respectively.

Although FIGS. 10 and 11 show a surround layer of 5 or 7 channels 1010, the number of surround layer channels is not limited to 5 or 7. Likewise, although FIG. 11 shows the surround layer as a rectangular plane, it is not limited to this form and may be represented in various ways, varying in line thickness, planar shape, distance from the reference listening point, and so on.

As another example, referring to FIG. 12, audio processing for a proximity stereo layer of 2 channels 1210 may be performed using a stereo bus 1220. That is, the stereo signal 1230 corresponding to the flat layer audio output may be the output generated by processing stereo audio based on the 2 channels 1210, and may be output as 2 channels.

Here, the proximity stereo layer corresponds to an element that creates a stereo image corresponding to a stereo effect, and may also be represented as part of the surround layer.

For example, as shown in FIGS. 13 and 14, a proximity stereo layer corresponding to two speakers (1311 and 1312, or 1411 and 1412) may be included on a surround layer based on five speakers, giving a layer structure with seven speakers in total.

As shown in FIG. 13, the proximity stereo layer may be placed close to the reference listening point 1300 located on the surround layer. Alternatively, as shown in FIG. 14, the proximity stereo layer may be used as left and right side speakers of the reference listening point 1400.

The stereo signal output from the proximity stereo layer can provide a sense of damping that is difficult to produce with the spatial parameters used in binaural encoding. The binaural stereo output according to an embodiment of the present invention can therefore provide this sense of damping together with the immersive effect of binaural encoding.

Thus, compared with the 3D layer binaural output, a flat layer audio output corresponding to the surround layer binaural output or to the stereo signal may simply be an output containing different sound effects. That is, even though the flat layer audio output is not an output corresponding to a 3D layer, it may contain more varied values than the 3D layer binaural output.

Here, the flat layer may be located between the four up channels and the four down channels of the 3D cubic.

For example, referring to FIG. 15, the flat layers 1510 to 1530 according to an embodiment of the present invention may be located between the four up channels and the four down channels included in the 3D cubic corresponding to the 3D binaural layer.

The four up channels may correspond to four speakers at the top of the 3D cubic, and the four down channels to four speakers at its bottom.

That is, as shown in FIG. 15, the flat layers 1510 to 1530 may be located within the height range of the hexahedron corresponding to the 3D cubic.

Accordingly, the individual speakers of the surround layer or proximity stereo layer corresponding to the flat layers 1510 to 1530 may also be located between the four up channels and the four down channels of the 3D cubic. Although FIG. 15 shows the flat layers 1510 to 1530 as planes for convenience of description, the form of the flat layer according to an embodiment of the present invention is not necessarily limited to a plane.

FIGS. 16 and 17 each show the 3D cubic corresponding to the 3D binaural layer and the flat layer 1610 as viewed from above. They show that the speakers 1621 and 1622 of the proximity stereo layer included in the flat layer 1610 are also located between the up channels and down channels of the 3D cubic.

As shown in FIG. 17, the speakers 1721 and 1722 of the proximity stereo layer may be placed to the left and right of the reference listening point 1700, an arrangement that may be applied for compatibility with video content that includes images.

In addition, the processor 320 generates the binaural stereo output by summing the 3D layer binaural output and the flat layer audio output. That is, a binaural stereo output with a maximized binaural effect can be generated by mixing the immersive element of the 3D layer binaural output with the proximity playback element, object element, and the like of the flat layer audio output.

If only immersive sound is desired, the binaural stereo output may be generated from the 3D layer binaural output alone.

A subwoofer output corresponding to a subwoofer layer may also be summed with the 3D layer binaural output and the flat layer audio output to generate the binaural stereo output. Adding the subwoofer output can maximize the immersive effect of the binaural stereo output and provide a dynamic bass playback element.

For example, referring to FIG. 18, the signal of the single channel or 2 channels 1810 included in the subwoofer layer may be processed using an LFE (Low Frequency Effects) bus 1820. That is, the subwoofer output 1830 may be the output generated by processing audio based on the single channel or 2 channels 1810, and may correspond to a single channel or 2 channels, as shown in FIG. 12.

For example, the subwoofer layer may correspond to a single channel, as in the 5.1, 7.1, and 11.1 channel formats, or to 2 channels, as in the 10.2 and 22.2 channel formats.
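The relationship between a channel format name and the number of subwoofer channels can be sketched as follows. This is only an illustration: the function name is hypothetical, and it assumes the format is written as "main.lfe" (for example "5.1" or "22.2"), which is the conventional reading of such names rather than something the description defines.

```python
def subwoofer_channel_count(layout: str) -> int:
    """Return the number of subwoofer (LFE) channels in a 'main.lfe' name.

    Assumption for this sketch: the digits after the dot give the LFE
    channel count, so '5.1' yields 1 and '22.2' yields 2.
    """
    main, _, lfe = layout.partition(".")
    if not main.isdigit() or not lfe.isdigit():
        raise ValueError(f"unsupported layout name: {layout!r}")
    return int(lfe)
```

Names with additional fields, such as height-channel suffixes, are deliberately rejected rather than guessed at.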

The subwoofer layer may be located apart from the 3D cubic corresponding to the 3D binaural layer and from the flat layer.

For example, as shown in FIG. 19, the subwoofer layer 1940 may be located away from the 3D cubic 1910 corresponding to the 3D binaural layer, the surround layer 1920, and the proximity stereo layer 1930. The structure shown in FIG. 19 is only one embodiment, and the way the layers are combined is not limited to it.

Here, a 3D weight may be applied to the 3D layer binaural output and a flat weight to the flat layer audio output, and the two weights may be set independently of each other. That is, by finely adjusting the level of each layer's output before mixing, a more dramatic binaural stereo output can be generated and the binaural effect maximized.
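The independently weighted mix of the two layer outputs can be sketched as a per-sample weighted sum. This is a minimal sketch under stated assumptions: both outputs are already 2-channel signals of equal length, represented as lists of (left, right) sample pairs, and the function and parameter names are hypothetical. The description does not say how the weights are chosen or whether any limiting is applied afterward.

```python
def mix_binaural_stereo(layer3d_out, flat_out, weight_3d=1.0, weight_flat=1.0):
    """Sum two stereo signals after scaling each by its own weight.

    layer3d_out, flat_out: lists of (left, right) sample pairs.
    weight_3d, weight_flat: independent gains (assumed linear scale).
    """
    if len(layer3d_out) != len(flat_out):
        raise ValueError("both layer outputs must have the same length")
    mixed = []
    for (l3, r3), (lf, rf) in zip(layer3d_out, flat_out):
        left = weight_3d * l3 + weight_flat * lf
        right = weight_3d * r3 + weight_flat * rf
        mixed.append((left, right))
    return mixed
```

Setting `weight_flat` to 0.0 reproduces the case in which only the 3D layer binaural output is used.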

In addition, because the present invention can support natural upmix and downmix functions based on the processor 320 described above, it can improve compatibility between content supporting different kinds of sound. For example, a surround image represented through the 3D cubic may be downmixed to the surround layer, and the surround layer may in turn be downmixed to the proximity stereo layer. Performing the downmix region by region in this way preserves sound quality more effectively.

The memory 330 stores the 3D layer binaural output and the flat layer audio output.

The memory 330 also stores the various information generated in the process of generating binaural stereo audio according to an embodiment of the present invention, as described above.

Depending on the embodiment, the memory 330 may be configured independently of the binaural stereo audio generation apparatus to support the binaural stereo audio generation function. In this case, the memory 330 may operate as separate mass storage and may include a control function for performing operations.

Meanwhile, the binaural stereo audio generation apparatus may be equipped with memory and store information within the apparatus. In one implementation, the memory is a computer-readable medium. In one implementation, the memory may be a volatile memory unit; in another, it may be a non-volatile memory unit. In one implementation, the storage device is a computer-readable medium. In various implementations, the storage device may include, for example, a hard disk device, an optical disk device, or some other mass storage device.

With such a binaural stereo audio generation apparatus, the binaural effect can be maximized by mixing various sound elements. Compatibility with various kinds of content can also be improved through natural upmixing and downmixing.

FIG. 20 shows an example of sound represented by a conventional binaural engine, and FIG. 21 shows an example of sound represented by a binaural engine according to the present invention.

First, referring to FIG. 20, binaural mixing with a conventional binaural engine is limited in its ability to express the proximity of a sound. Because binaural mixing provides a spatial image of the sound, the only way to express proximity through binaural mixing is to adjust the sound's volume.

Consequently, when binaural mixing is performed with a conventional binaural engine, the mix may be rendered along the actual sound direction 2020 even if the engineer mixes for the intended sound direction 2010. That is, binaural mixing meant to make a sound travel from front to back, or from back to front, through the reference listening point 2000 is in practice rendered along the surface of the binaural engine, which may be regarded as a limitation of VBAP (vector base amplitude panning).
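The limitation attributed to VBAP can be illustrated with the standard pairwise 2D formulation, in which the gains for two speakers are solved from the source direction only and then power-normalized. The description gives no panning equations, so this is a generic sketch rather than the conventional engine's actual implementation; the function name, the speaker angles, and the coordinate convention (angles measured from the x axis) are assumptions.

```python
import math


def vbap_2d_gains(source_xy, speaker1_deg, speaker2_deg):
    """Pairwise 2D VBAP: solve p = g1*l1 + g2*l2, then normalize the gains.

    Only the direction of source_xy is used, so a source's distance from
    the listening point (the origin) has no influence on the result.
    """
    l1 = (math.cos(math.radians(speaker1_deg)), math.sin(math.radians(speaker1_deg)))
    l2 = (math.cos(math.radians(speaker2_deg)), math.sin(math.radians(speaker2_deg)))
    norm = math.hypot(source_xy[0], source_xy[1])
    if norm == 0.0:
        raise ValueError("a source at the listening point has no direction")
    p = (source_xy[0] / norm, source_xy[1] / norm)
    det = l1[0] * l2[1] - l1[1] * l2[0]
    if abs(det) < 1e-12:
        raise ValueError("speaker directions must not be collinear")
    # Cramer's rule for the 2x2 system whose columns are l1 and l2.
    g1 = (p[0] * l2[1] - p[1] * l2[0]) / det
    g2 = (l1[0] * p[1] - l1[1] * p[0]) / det
    scale = math.hypot(g1, g2)
    return g1 / scale, g2 / scale
```

In this sketch a source 1 unit away and a source 5 units away in the same direction receive identical gains, and a source located exactly at the listening point cannot be panned at all, which is consistent with the observation that proximity can only be suggested through volume.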

Referring to FIG. 21, however, the binaural engine according to the present invention can also mix in the flat layer audio output generated using the surround layer 2110 and the proximity stereo layer 2120, in addition to the 3D binaural layer. That is, the proximity of a sound, which a conventional binaural engine could adjust only through volume, can be adjusted through the surround layer binaural output and the stereo signal.

Accordingly, with the binaural engine according to the present invention, an actual sound direction 2130 that matches the engineer's intended sound direction 2010 in FIG. 20 can be rendered. That is, the sound passes through the reference listening point 2100, producing the impression of sound passing through the listener's body.

FIG. 22 is a flowchart of a method for generating binaural stereo audio according to an embodiment of the present invention.

Referring to FIG. 22, the method for generating binaural stereo audio according to an embodiment of the present invention generates a 3D layer binaural output by performing 3D layer binaural encoding corresponding to the 3D binaural layer (S2210).

For example, referring to FIG. 7, the reference listening point 700, which virtually represents the position of the user or listener hearing the binaural stereo audio, may be located inside the 3D cubic 710 whose vertices are eight dynamic speakers, at the center of the surround layer 720. Assuming the binaural point 730 lies on the top face of the 3D cubic 710 as shown in FIG. 7, the 3D vector 740 corresponding to the 3D layer binaural output may be generated in the direction from the reference listening point 700 toward the binaural point 730.
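The 3D vector from the reference listening point toward the binaural point can be sketched as a simple difference of coordinates. The description does not define a coordinate system, so the (x, y, z) tuples, the placement of the listening point at the origin, and the function name are assumptions made for illustration.

```python
import math


def vector_to_binaural_point(listening_point, binaural_point):
    """Return (unit_direction, distance) from the listening point to the binaural point.

    Both arguments are (x, y, z) tuples in the same, assumed, coordinate system.
    """
    delta = tuple(b - a for a, b in zip(listening_point, binaural_point))
    distance = math.sqrt(sum(c * c for c in delta))
    if distance == 0.0:
        raise ValueError("binaural point coincides with the listening point")
    unit = tuple(c / distance for c in delta)
    return unit, distance
```

For a listening point at the center of a unit cube and a binaural point at the middle of its top face, `vector_to_binaural_point((0.0, 0.0, 0.0), (0.0, 0.0, 1.0))` gives a direction pointing straight up.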

As will be described in detail with reference to FIGS. 10 and 11, the surround layer 720 corresponds to an element that creates a surround image corresponding to a surround effect. Although FIG. 7 shows the surround layer 720 as a plane for convenience of description, it is not necessarily limited to a planar form.

Here, the head tracking information corresponds to data obtained by tracking the head movement of the user or listener, and may be input through a separate head tracking module or a user interface.

As another example, the head tracking information may be supplied manually by the user or listener. That is, to rotate the spatial image deliberately, the user or listener may enter head tracking information through the user interface regardless of whether head tracking information is being received from the head tracking module. In doing so, the user or listener may enter and revise the head tracking information while listening to the mixing process that generates the binaural stereo output, or to the binaural stereo output as it changes with the entered information.

The effect produced in this way, by rotating the 3D cubic or moving it up, down, left, or right according to the head tracking information, may subsequently be mixed with the flat layer audio output to generate the binaural stereo output. An immersive effect based on head tracking can therefore be produced more efficiently than with the conventional approach of rotating or moving the surround layer or proximity stereo layer corresponding to the flat layer, or the subwoofer layer.

In addition, the method for generating binaural stereo audio according to an embodiment of the present invention performs audio processing corresponding to the flat layer to generate a flat layer audio output (S2220).

Here, the flat layer is a layer whose structure differs from that of the 3D binaural layer, and may correspond to an element that creates an image corresponding to a surround effect or a stereo effect.

For example, referring to FIG. 10, a binaural encoder 1020 may be used to perform surround layer binaural encoding for a surround layer of 5 or 7 channels 1010. As will be described with reference to FIGS. 13 and 14, 7-channel surround layer binaural encoding may be performed by including the 2 channels of the proximity stereo layer in the surround layer.

Although FIGS. 10 and 11 show a surround layer of 5 or 7 channels 1010, the number of surround layer channels is not limited to 5 or 7. Likewise, although FIG. 11 shows the surround layer as a rectangular plane, it is not limited to this form and may be represented in various ways, varying in line thickness, planar shape, distance from the reference listening point, and so on.

As another example, referring to FIG. 12, audio processing for a proximity stereo layer of 2 channels 1210 may be performed using a stereo bus 1220. That is, the stereo signal 1230 corresponding to the flat layer audio output may be the output generated by processing stereo audio based on the 2 channels 1210, and may be output as 2 channels.

Thus, compared with the 3D layer binaural output, a flat layer audio output corresponding to the surround layer binaural output or to the stereo signal may simply be an output containing different sound effects. That is, even though the flat layer audio output is not an output corresponding to a 3D layer, it may contain more varied values than the 3D layer binaural output.

In addition, the method for generating binaural stereo audio according to an embodiment of the present invention generates a binaural stereo output by summing the 3D layer binaural output and the flat layer audio output (S2230). That is, a binaural stereo output with a maximized binaural effect can be generated by mixing the immersive element of the 3D layer binaural output with the proximity playback element, object element, and the like of the flat layer audio output.

The subwoofer layer may be located apart from the 3D cubic corresponding to the 3D binaural layer and from the flat layer.

In addition, because the present invention can support natural upmix and downmix functions based on the functions disclosed above, it can improve compatibility between content supporting different kinds of sound. For example, a surround image represented through the 3D cubic may be downmixed to the surround layer, and the surround layer may in turn be downmixed to the proximity stereo layer. Performing the downmix region by region in this way preserves sound quality more effectively.

Although not shown in FIG. 22, the method for generating binaural stereo audio according to an embodiment of the present invention may transmit and receive the information needed for binaural stereo audio generation over a communication network. In particular, it may receive head tracking information, user input, or information on the content to which the binaural effect is to be applied, and may provide the binaural stereo output.

Although not shown in FIG. 22, the method for generating binaural stereo audio according to an embodiment of the present invention also stores the various information generated in the process of generating binaural stereo audio, as described above.

Accordingly, an embodiment of the present invention may be implemented as a computer-implemented method or as a non-transitory computer-readable medium on which computer-executable instructions are recorded. When executed by a processor, the computer-readable instructions may perform a method according to at least one aspect of the present invention.

As described above, the method for generating binaural stereo audio according to the present invention and the apparatus therefor are not limited to the configurations and methods of the embodiments described above; all or some of the embodiments may be selectively combined so that various modifications can be made.

110, 1910: 3D binaural layer
111, 121, 210, 420, 1020: binaural encoder
120, 720, 1610, 1710, 1920, 2110: surround layer
130, 2120: proximity stereo layer
131, 1220: stereo bus
140: subwoofer layer
141, 1820: LFE (Low Frequency Effects) bus
150: binaural mixer
220: dedicated player
310: communication unit
320: processor
330: memory
411: four up channels
412: four down channels
430: 3D layer binaural output
511 to 518: dynamic speakers
610 to 630, 710: 3D cubic
700, 1300, 1400, 1600, 1700, 2000, 2100: reference listening point
730: binaural point
740: 3D vector
1010: 5 or 7 channels
1030: surround layer binaural output
1111 to 1115, 1311, 1312, 1411, 1412, 1621, 1622, 1721, 1722: speakers
1210: 2 channels
1230: stereo signal
1510 to 1530: flat layer
1810: single channel or 2 channels
1830: subwoofer output
1930: proximity stereo layer
1940: subwoofer layer
2010: sound direction intended by the engineer
2020, 2130: actual sound direction

Claims

A method for generating binaural stereo audio, the method comprising:
generating a 3D layer binaural output by performing 3D layer binaural encoding corresponding to a 3D binaural layer;
generating a flat layer audio output by performing audio processing corresponding to a flat layer; and
generating a binaural stereo output by combining the 3D layer binaural output and the flat layer audio output,
wherein the 3D binaural layer corresponds to an 8-channel 3D cubic shape composed of four up channels and four down channels located at the front and rear, centered on the listener's position, and
wherein, when the flat layer is a proximity stereo layer, generating the binaural stereo output comprises combining a binaural signal, output from a plurality of dynamic speakers on the 3D cubic that are located farther from the listener's position than at least one proximity speaker positioned on the proximity stereo layer, with a stereo signal output from the at least one proximity speaker.

The method according to claim 1,
wherein the flat layer is any one of:
a surround layer that generates a surround layer binaural output by performing surround layer binaural encoding and provides the generated surround layer binaural output as the flat layer audio output; and
the proximity stereo layer, which receives the stereo signal and generates the flat layer audio output corresponding to the stereo signal.

The method according to claim 2,
wherein the 3D layer binaural output is generated corresponding to a 3D vector for a binaural point located on the 3D cubic.

The method according to claim 2,
wherein generating the binaural stereo output comprises applying a 3D weight to the 3D layer binaural output and a plane weight to the flat layer audio output, the 3D weight and the plane weight being set independently of each other.

The method according to claim 1,
wherein generating the binaural stereo output comprises adding a subwoofer output corresponding to a subwoofer layer to the 3D layer binaural output and the flat layer audio output.
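
The independently weighted combination and the optional subwoofer addition recited in the two preceding claims can be illustrated with a minimal sketch. It is not the patented implementation: the function name `mix_binaural_stereo`, the parameter names, and the representation of each layer output as a list of (left, right) sample pairs are assumptions made only for illustration.

```python
def mix_binaural_stereo(layer_3d, layer_flat, weight_3d=1.0, weight_flat=1.0,
                        subwoofer=None):
    """Combine two-channel layer outputs into one binaural stereo signal.

    Each input is a list of (left, right) sample pairs (an assumed format).
    weight_3d and weight_flat are independent gains; nothing forces them
    to sum to 1. subwoofer, if given, is added unweighted.
    """
    if len(layer_3d) != len(layer_flat):
        raise ValueError("layer outputs must have the same length")
    if subwoofer is not None and len(subwoofer) != len(layer_3d):
        raise ValueError("subwoofer output must match the layer length")

    mixed = []
    for i, ((l3, r3), (lf, rf)) in enumerate(zip(layer_3d, layer_flat)):
        # Weighted sum of the 3D layer and the flat layer, per channel.
        left = weight_3d * l3 + weight_flat * lf
        right = weight_3d * r3 + weight_flat * rf
        if subwoofer is not None:
            # Optional subwoofer (LFE) contribution.
            left += subwoofer[i][0]
            right += subwoofer[i][1]
        mixed.append((left, right))
    return mixed
```

Because the two gains are separate arguments, raising one leaves the other layer's contribution unchanged, which is the sense in which the weights are set independently.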

The method according to claim 3,
wherein the 3D cubic is generated by changing the positions of eight dynamic speakers, which correspond to the vertices of the 3D cubic, according to a size parameter for the 3D binaural layer.
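
As a rough illustration of how a size parameter could determine the eight speaker positions, the sketch below places the vertices of a cube centered on the listener. The claim does not define the size parameter or a coordinate system, so both are assumptions here: `size` is treated as the cube's edge length, and coordinates are (x, y, z) with x to the right, y to the front, and z up. The function name `cube_speaker_positions` is hypothetical.

```python
from itertools import product


def cube_speaker_positions(size):
    """Return the eight vertices of a cube centered on the listener at (0, 0, 0).

    size is assumed to be the edge length. The first four positions have
    z > 0 (the four up channels); the last four have z < 0 (the four down
    channels).
    """
    if size <= 0:
        raise ValueError("size must be positive")
    half = size / 2.0
    # product((1, -1), repeat=3) enumerates all sign combinations, with the
    # first element (used for z) changing slowest.
    return [(sx * half, sy * half, sz * half)
            for sz, sy, sx in product((1, -1), repeat=3)]
```

Doubling `size` doubles the distance of every speaker from the listener while keeping the cubic arrangement intact.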

The method according to claim 3,
wherein the 3D vector is included in the 3D cubic and is generated based on a reference listening point corresponding to the center of a 2D plane corresponding to the surround layer.

The method according to claim 3,
wherein generating the 3D layer binaural output comprises applying direction information of the 3D vector to the 3D cubic rotated according to head tracking information, the head tracking information being obtained from at least one of a head tracking module and user input through a user interface.

The method according to claim 8,
wherein the 3D cubic is rotated according to at least one rotation parameter among pan, tilt, and roll.
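
The rotation of the 3D cubic by pan, tilt, and roll can be sketched as three successive axis rotations applied to each vertex. The claims do not fix axis conventions or the order of application, so the following are assumptions: coordinates are (x, y, z) with x to the right, y to the front, and z up; pan rotates about z, tilt about x, and roll about y; angles are in radians and follow the right-hand rule; roll is applied first, then tilt, then pan. The function names are hypothetical.

```python
import math


def rotate_point(point, pan=0.0, tilt=0.0, roll=0.0):
    """Rotate one (x, y, z) point by roll, then tilt, then pan (radians)."""
    x, y, z = point

    # Roll: rotation about the front (y) axis.
    c, s = math.cos(roll), math.sin(roll)
    x, z = x * c + z * s, -x * s + z * c

    # Tilt: rotation about the right (x) axis.
    c, s = math.cos(tilt), math.sin(tilt)
    y, z = y * c - z * s, y * s + z * c

    # Pan: rotation about the up (z) axis.
    c, s = math.cos(pan), math.sin(pan)
    x, y = x * c - y * s, x * s + y * c

    return (x, y, z)


def rotate_cube(vertices, pan=0.0, tilt=0.0, roll=0.0):
    """Apply the same rotation to every vertex of the cube."""
    return [rotate_point(v, pan, tilt, roll) for v in vertices]
```

Each step is a pure rotation, so every vertex keeps its distance from the listener. Only the directions of the speakers change.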

The method according to claim 3,
wherein the flat layer is located between the four up channels and the four down channels.

An apparatus for generating binaural stereo audio, the apparatus comprising:
a processor configured to generate a 3D layer binaural output by performing 3D layer binaural encoding corresponding to a 3D binaural layer, to generate a flat layer audio output by performing audio processing corresponding to a flat layer, and to generate a binaural stereo output by combining the 3D layer binaural output and the flat layer audio output; and
a memory configured to store the 3D layer binaural output and the flat layer audio output,
wherein the 3D binaural layer corresponds to an 8-channel 3D cubic shape composed of four up channels and four down channels located at the front and rear, centered on the listener's position, and
wherein, when the flat layer is a proximity stereo layer, the processor generates the binaural stereo output by combining a binaural signal, output from a plurality of dynamic speakers on the 3D cubic that are located farther from the listener's position than at least one proximity speaker positioned on the proximity stereo layer, with a stereo signal output from the at least one proximity speaker.

The apparatus according to claim 11,
wherein the flat layer is any one of:
a surround layer that generates a surround layer binaural output by performing surround layer binaural encoding and provides the generated surround layer binaural output as the flat layer audio output; and
the proximity stereo layer, which receives the stereo signal and generates the flat layer audio output corresponding to the stereo signal.

The apparatus according to claim 12,
wherein the 3D layer binaural output is generated corresponding to a 3D vector for a binaural point located on the 3D cubic.

The apparatus according to claim 12,
wherein the processor applies a 3D weight to the 3D layer binaural output and a plane weight to the flat layer audio output, the 3D weight and the plane weight being set independently of each other.

The apparatus according to claim 11,
wherein the processor generates the binaural stereo output by adding a subwoofer output corresponding to a subwoofer layer to the 3D layer binaural output and the flat layer audio output.

The apparatus according to claim 13,
wherein the 3D cubic is generated by changing the positions of eight dynamic speakers, which correspond to the vertices of the 3D cubic, according to a size parameter for the 3D binaural layer.

The apparatus according to claim 13,
wherein the 3D vector is included in the 3D cubic and is generated based on a reference listening point corresponding to the center of a 2D plane corresponding to the surround layer.

The apparatus according to claim 13,
wherein the processor generates the 3D layer binaural output by applying direction information of the 3D vector to the 3D cubic rotated according to head tracking information, the head tracking information being obtained from at least one of a head tracking module and user input through a user interface.

The apparatus according to claim 18,
wherein the 3D cubic is rotated according to at least one rotation parameter among pan, tilt, and roll.

The apparatus according to claim 13,
wherein the flat layer is located between the four up channels and the four down channels.