KR20190091824A

KR20190091824A - Method for creating binaural stereo audio and apparatus using the same

Info

Publication number: KR20190091824A
Application number: KR1020180010874A
Authority: KR
Inventors: 구본희
Original assignee: 구본희
Priority date: 2018-01-29
Filing date: 2018-01-29
Publication date: 2019-08-07
Also published as: KR102119239B1; WO2019147041A1

Abstract

Disclosed are a method for generating binaural stereo audio and an apparatus therefor. According to one embodiment of the present invention, the method for generating binaural stereo audio comprises the steps of: generating a three-dimensional layer binaural output by performing three-dimensional layer binaural encoding corresponding to a three-dimensional binaural layer; generating a planar layer audio output by performing audio processing corresponding to a planar layer; and generating a binaural stereo output by combining the three-dimensional layer binaural output and the planar layer audio output.

Description

Method for generating binaural stereo audio and a device therefor {METHOD FOR CREATING BINAURAL STEREO AUDIO AND APPARATUS USING THE SAME}

The present invention relates to a technique for generating binaural stereo audio, and more particularly, to a technique for generating universally reproducible binaural stereo audio by combining a binaural output based on a 3D layer with an audio output based on a planar layer.

As multimedia technology improves, the use of content containing multichannel audio signals with more channels than 5.1, such as 7.1, 10.2, 11.1, and 22.2 channels, is increasing. However, the user terminals owned by the users of such content can generally reproduce only stereo audio signals, through stereo speakers, headphones, or earphones, so high-quality multichannel audio signals need to be converted into stereo audio signals.

Korean Patent Application Publication No. 10-2015-0013073, published February 4, 2015 (title: Method and apparatus for binaural rendering of multi-channel audio signals)

An object of the present invention is to provide a method for generating binaural stereo audio that can maximize the binaural effect by mixing various sound elements.

Another object of the present invention is to provide a binaural engine in which the sound elements used to produce an effective binaural effect can easily be added, removed, or adjusted.

Another object of the present invention is to improve compatibility with various types of content based on natural upmixing and downmixing.

To achieve the above objects, a method for generating binaural stereo audio according to the present invention includes: generating a 3D layer binaural output by performing 3D layer binaural encoding corresponding to a 3D binaural layer; generating a planar layer audio output by performing audio processing corresponding to a planar layer; and generating a binaural stereo output by summing the 3D layer binaural output and the planar layer audio output.

In this case, the planar layer may be either a surround layer, which performs surround layer binaural encoding to generate a surround layer binaural output and provides that output as the planar layer audio output, or a proximity stereo layer, which receives a stereo signal and generates the planar layer audio output corresponding to the stereo signal.

In this case, the 3D layer binaural output may be generated to correspond to a 3D vector for a binaural point located on an 8-channel-based 3D cubic (Cubic) composed of four up channels and four down channels.

In this case, the step of generating the binaural stereo output may apply a 3D weight to the 3D layer binaural output and a planar weight to the planar layer audio output, and the 3D weight and the planar weight may be set independently of each other.

In this case, the step of generating the binaural stereo output may generate the binaural stereo output by summing a subwoofer output corresponding to a subwoofer layer together with the 3D layer binaural output and the planar layer audio output.

In this case, the 3D cubic may be generated by changing the positions of the eight dynamic speakers at the vertices of the 3D cubic in accordance with a size parameter for the 3D binaural layer.

In this case, the 3D vector may be generated with respect to a reference listening point that lies inside the 3D cubic and corresponds to the center of the two-dimensional plane corresponding to the surround layer.

In this case, the step of generating the 3D layer binaural output may generate the 3D layer binaural output by applying the direction information of the 3D vector to the 3D cubic rotated in accordance with head tracking information, and the head tracking information may be obtained from at least one of a tracking input based on a head tracking module and a user input based on a user interface.

In this case, the 3D cubic may be rotated in accordance with at least one rotation parameter among pan, tilt, and roll.

In this case, the planar layer may be located between the four up channels and the four down channels.

In addition, an apparatus for generating binaural stereo audio according to an embodiment of the present invention includes: a processor that generates a 3D layer binaural output by performing 3D layer binaural encoding corresponding to a 3D binaural layer, generates a planar layer audio output by performing audio processing corresponding to a planar layer, and generates a binaural stereo output by summing the 3D layer binaural output and the planar layer audio output; and a memory that stores the 3D layer binaural output and the planar layer audio output.

In this case, the processor may apply a 3D weight to the 3D layer binaural output and a planar weight to the planar layer audio output, and the 3D weight and the planar weight may be set independently of each other.

In this case, the processor may generate the binaural stereo output by summing a subwoofer output corresponding to a subwoofer layer together with the 3D layer binaural output and the planar layer audio output.

In this case, the processor may generate the 3D layer binaural output by applying the direction information of the 3D vector to the 3D cubic rotated in accordance with head tracking information, and the head tracking information may be obtained from at least one of a tracking input based on a head tracking module and a user input based on a user interface.

According to the present invention, it is possible to provide a method for generating binaural stereo audio that can maximize the binaural effect by mixing various sound elements.

In addition, the present invention can provide a binaural engine in which the sound elements used to produce an effective binaural effect can easily be added, removed, or adjusted.

In addition, the present invention can improve compatibility with various types of content based on natural upmixing and downmixing.

FIG. 1 is a diagram showing the structure of a binaural engine according to an embodiment of the present invention.
FIG. 2 is a diagram showing the structure of a conventional binaural engine.
FIG. 3 is a block diagram showing an apparatus for generating binaural stereo audio according to an embodiment of the present invention.
FIG. 4 is a diagram showing a detailed structure for generating a 3D layer binaural output according to an embodiment of the present invention.
FIG. 5 is a diagram showing an example of an 8-channel-based 3D cubic (Cubic) according to the present invention.
FIG. 6 is a diagram showing an example of 3D cubics of various sizes generated by changing the positions of the dynamic speakers according to the present invention.
FIG. 7 is a diagram showing an example of a 3D vector according to the present invention.
FIG. 8 is a diagram showing an example in which the direction information of a 3D vector is applied to a 3D cubic rotated in accordance with head tracking information according to the present invention.
FIG. 9 is a diagram showing an example of rotation parameters according to the present invention.
FIG. 10 is a diagram showing a detailed structure for generating a surround layer binaural output according to an embodiment of the present invention.
FIG. 11 is a diagram showing an example of a 5-channel-based surround layer according to the present invention.
FIG. 12 is a diagram showing a detailed structure for generating a stereo signal according to an embodiment of the present invention.
FIGS. 13 and 14 are diagrams showing an example of a proximity stereo layer according to the present invention.
FIG. 15 is a diagram showing an example of a planar layer located between the up channels and the down channels of a 3D cubic according to the present invention.
FIGS. 16 and 17 are diagrams showing an example of a proximity stereo layer used as some of the channels of a surround layer according to the present invention.
FIG. 18 is a diagram showing a detailed structure for generating a subwoofer output according to an embodiment of the present invention.
FIG. 19 is a diagram showing an example of a structure combining a 3D binaural layer, a planar layer, and a subwoofer layer according to the present invention.
FIG. 20 is a diagram showing an example of sound rendered through a conventional binaural engine.
FIG. 21 is a diagram showing an example of sound rendered through a binaural engine according to the present invention.
FIG. 22 is a flowchart showing a method for generating binaural stereo audio according to an embodiment of the present invention.

The present invention is described in detail below with reference to the accompanying drawings. Repeated descriptions, and detailed descriptions of well-known functions and configurations that could unnecessarily obscure the gist of the present invention, are omitted. The embodiments of the present invention are provided to describe the invention more completely to those having ordinary skill in the art. Accordingly, the shapes and sizes of elements in the drawings may be exaggerated for clarity.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram showing the structure of a binaural engine according to an embodiment of the present invention, and FIG. 2 is a diagram showing the structure of a conventional binaural engine.

First, referring to FIG. 2, a conventional binaural engine binaurally encodes a multichannel audio file with a binaural encoder 210 and provides the resulting binaural output by decoding it with a dedicated player 220. Because binaural encoding according to the prior art uses fixed speakers placed at a set distance from the listening position, it is difficult to enlarge or reduce the spatial image by adjusting the speaker positions.

In addition, the conventional binaural engine is specialized for content that contains both video and audio, such as surround movie content, and is difficult to apply to sources that have no spatial image, such as music content. Furthermore, because binaurally encoded content can be played only with the dedicated player 220, the engine may be inefficient in practical use. For example, music content by its nature must deliver sufficient loudness to the listener, but using only the binaural encoder 210 shown in FIG. 2 limits the ability to provide sound effects optimized for music content.

In addition, because the conventional binaural engine uses only a single encoder specialized for the effect mainly used with a given type of content, it could not apply a variety of production effects. For example, since music content typically does not use a subwoofer, providing subwoofer-style bass reproduction for music content through a conventional binaural engine has rarely been attempted.

In contrast, the binaural engine according to an embodiment of the present invention, shown in FIG. 1, can generate a binaural stereo output with more dramatic production by mixing outputs containing various binaural sound effects with outputs produced by audio processing.

For example, as shown in FIG. 1, a 3D layer binaural output may be generated by performing binaural encoding with a binaural encoder 111 corresponding to a multichannel 3D binaural layer 110. A surround layer binaural output may be generated by performing binaural encoding with a binaural encoder 121 corresponding to a surround layer 120. An audio output corresponding to a stereo signal may be generated by a stereo bus 131 corresponding to a proximity stereo layer 130. A subwoofer output may be generated by an LFE bus 141 corresponding to a subwoofer layer 140. A binaural mixer 150 may then sum the individual outputs, that is, the 3D layer binaural output, the surround layer binaural output, the audio output corresponding to the stereo signal, and the subwoofer output, to generate the binaural stereo output. The binaural stereo output combined by the binaural mixer 150 may be delivered to the listener or user in a form that can be played through a general-purpose decoder.
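The summation performed by the binaural mixer 150 can be illustrated with a minimal Python sketch. It assumes each layer output is already a two-channel (left, right) pair of equal-length sample lists, and it applies one gain per layer in the spirit of the independently set 3D and planar weights described earlier. The function name `mix_layers` and the data layout are hypothetical and are not taken from the patent.

```python
from typing import List, Sequence, Tuple

# A stereo layer output: (left samples, right samples). Assumed layout.
Stereo = Tuple[Sequence[float], Sequence[float]]


def mix_layers(layers: Sequence[Stereo],
               weights: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return the weighted sum of several equal-length stereo layer outputs."""
    if not layers or len(layers) != len(weights):
        raise ValueError("need one weight per layer")
    n = len(layers[0][0])
    left = [0.0] * n
    right = [0.0] * n
    for (layer_left, layer_right), weight in zip(layers, weights):
        if len(layer_left) != n or len(layer_right) != n:
            raise ValueError("all layers must have the same length")
        for i in range(n):
            # Each layer contributes independently of the others.
            left[i] += weight * layer_left[i]
            right[i] += weight * layer_right[i]
    return left, right
```

Calling `mix_layers([cube_out, surround_out, stereo_out, lfe_out], [w3d, w_sur, w_st, w_lfe])` would then yield a single left/right pair; changing one weight leaves the contribution of the other layers untouched.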

As described above, because the binaural engine according to the present invention generates binaural stereo audio by mixing the outputs of various encoders, it can be used in a general-purpose form that is not specialized for particular content, and it can offer high compatibility with existing content.

For example, for movie content that contains both video and audio, a surround layer binaural output, which can be generated from the movement of objects in the video, may be mixed with at least one of the 3D layer binaural output, the stereo output, and the subwoofer output, enabling more dramatic sound production.

As another example, for music content that contains only audio, a 3D layer binaural output generated from the 3D binaural layer may be mixed with the stereo output or the subwoofer output to provide dynamic music.

FIG. 3 is a block diagram showing an apparatus for generating binaural stereo audio according to an embodiment of the present invention.

Referring to FIG. 3, the apparatus for generating binaural stereo audio according to an embodiment of the present invention includes a communication unit 310, a processor 320, and a memory 330.

The communication unit 310 transmits and receives, over a communication network, the information needed to generate binaural stereo audio. In particular, the communication unit 310 according to an embodiment of the present invention may receive a source or content to be used as input for generating binaural stereo audio, head tracking information to be applied in binaural encoding, and information related to user input, and may provide binaural stereo audio corresponding to the binaural stereo output.

The processor 320 generates a 3D layer binaural output by performing 3D layer binaural encoding corresponding to the 3D binaural layer.

Here, the 3D binaural layer corresponds to the element that creates a 3D spatial image. For example, referring to FIG. 4, a binaural encoder 420 based on the 3D cubic method may be used to perform 3D layer binaural encoding for the multiple channels included in the 3D binaural layer.

In this case, the 3D binaural layer may include four up channels 411 and four down channels 412 corresponding to the 8-channel-based 3D cubic.

Accordingly, the 3D layer binaural output 430 may correspond to the output generated by binaurally encoding 8-channel-based audio, and may be output as two channels, as shown in FIG. 4. The two channels of the 3D layer binaural output 430 may correspond to the left channel and the right channel, respectively.

Although the embodiment shown in FIG. 4 uses an 8-channel-based 3D cubic layer as the 3D binaural layer, the 3D binaural layer is not limited to this. That is, the apparatus for generating binaural stereo audio, or the binaural engine, according to an embodiment of the present invention may also be configured with another available 3D binaural layer or with a 3D binaural layer developed in the future.

For example, referring to FIG. 5, the 8-channel-based 3D cubic may be a hexahedral structure whose vertices are four dynamic speakers 511 to 514 corresponding to the four up channels and four dynamic speakers 515 to 518 corresponding to the four down channels. Because the positions of the eight dynamic speakers 511 to 518 can be changed, the range of the binaural effect produced by the 3D cubic can also be changed dynamically.

As another example, immersive sound may be implemented with eight dynamic speakers by generating the 3D cubic using the existing binaural VBAP (vector base amplitude panning) method. That is, each of the eight dynamic speakers is assigned X, Y, and Z position values, and a vector-based virtual track point can be expressed relative to the center of the 3D cubic. The virtual track point may be expressed in accordance with the parameter values included in the head tracking information.

With such a 3D cubic, a spatial image can be generated for music content that contains only audio, and the movement of sound can be expressed, providing a more three-dimensional effect.

In this case, the 3D cubic may be generated by changing the positions of the eight dynamic speakers at its vertices in accordance with a size parameter for the 3D binaural layer. That is, a 3D cubic can be generated efficiently by freely changing the positions of the dynamic speakers, which are variable rather than fixed, in accordance with the size parameter.

For example, by processing the 3D cubic so that the size parameter is set as a constant and the binaural function is multiplied by it, 3D cubics 610, 620, and 630 with various ranges can be generated, as shown in FIG. 6.
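One simple reading of the size parameter is a scalar that scales the eight speaker positions about the cube center. The Python sketch below follows that reading only; the patent does not define the binaural function it multiplies, so this is an illustration of the geometry rather than the disclosed processing. The function name `cube_vertices`, the axis convention (z vertical), and the vertex ordering are assumptions.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def cube_vertices(size: float) -> List[Vec3]:
    """Return eight speaker positions of a cube centred on the origin.

    `size` is the assumed half edge length. The first four vertices are the
    up channels (z > 0) and the last four are the down channels (z < 0).
    """
    if size <= 0:
        raise ValueError("size must be positive")
    unit = [(x, y, z)
            for z in (1.0, -1.0)      # up layer first, then down layer
            for y in (1.0, -1.0)      # front, then back
            for x in (-1.0, 1.0)]     # left, then right
    return [(size * x, size * y, size * z) for x, y, z in unit]
```

Under this reading, `cube_vertices(0.5)`, `cube_vertices(1.0)`, and `cube_vertices(2.0)` would play the role of cubes of different ranges such as 610, 620, and 630.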

In this case, the 3D vector may be generated with respect to a reference listening point that lies inside the 3D cubic and corresponds to the center of the two-dimensional plane corresponding to the surround layer.

For example, referring to FIG. 7, a reference listening point 700, which virtually represents the position of the user or listener hearing the binaural stereo audio, may be located inside a 3D cubic 710 whose vertices are the eight dynamic speakers, at the center of a surround layer 720. Assuming that a binaural point 730 is located on the top face of the 3D cubic 710 as shown in FIG. 7, a 3D vector 740 corresponding to the 3D layer binaural output may be generated in the direction from the reference listening point 700 toward the binaural point 730.
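The vector from the reference listening point to the binaural point can be written out as a short Python sketch. The coordinate convention (z vertical, with the surround layer in the plane through the listening point) and the function name `direction_vector` are assumptions made for illustration.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def direction_vector(listening_point: Vec3,
                     binaural_point: Vec3) -> Tuple[Vec3, float]:
    """Return (unit direction, distance) from the listening point to the binaural point."""
    dx, dy, dz = (b - a for a, b in zip(listening_point, binaural_point))
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    if length == 0.0:
        raise ValueError("binaural point coincides with the listening point")
    return (dx / length, dy / length, dz / length), length
```

With this convention, a positive z component of the returned direction means the binaural point lies above the plane of the listening point, and a negative z component means it lies below.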

As will be described in detail with reference to FIGS. 10 and 11, the surround layer 720 corresponds to the element that creates a surround image corresponding to a surround effect. In FIG. 7 the surround layer 720 is drawn as a plane for convenience of description, but it is not limited to a planar form.

When, as shown in FIG. 7, the binaural point 730 on the 3D cubic 710 is higher than the surround layer 720 in which the reference listening point 700 lies, the output sound may be localized above the listener. Conversely, when the binaural point 730 on the 3D cubic 710 is lower than the surround layer 720 in which the reference listening point 700 lies, the output sound may be localized below the listener.

As described above, in the present invention a wider variety of audio can be produced by changing the position of the binaural point 730 on the 3D cubic 710 relative to the reference listening point 700.

In this case, the 3D layer binaural output may be generated by applying the direction information of the 3D vector to the 3D cubic rotated in accordance with the head tracking information. That is, because the binaural point is a position set relative to the listener's head at the reference listening point, the position of the binaural point on the 3D cubic may also change when the position or angle of the listener's head changes.

For example, suppose that the 3D cubic 710 shown in FIG. 7 is rotated as shown in FIG. 8 in accordance with the head tracking information. In this case, the position of the binaural point as changed by the rotation can be detected by applying the direction information of the 3D vector 740 shown in FIG. 7, unchanged, to the 3D cubic shown in FIG. 8.
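One way to read this step is as a ray cast: the direction stays fixed while the cube rotates, so the point where the direction meets the cube surface moves in the cube's own coordinates. The Python sketch below illustrates that reading under several assumptions that are not in the patent: the cube is centred on the listening point, the z axis is vertical, and the rotation is limited to pan for brevity. The function names are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def rotate_pan(v: Vec3, pan_deg: float) -> Vec3:
    """Rotate v about the vertical (z) axis by pan_deg degrees."""
    a = math.radians(pan_deg)
    c, s = math.cos(a), math.sin(a)
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)


def point_on_rotated_cube(direction: Vec3, half_size: float,
                          pan_deg: float) -> Vec3:
    """Return, in cube coordinates, where a fixed direction meets the surface
    of a cube that has been panned by pan_deg."""
    # Expressing a fixed direction in the rotated cube's own frame is the
    # same as applying the inverse rotation to the direction.
    d = rotate_pan(direction, -pan_deg)
    largest = max(abs(c) for c in d)
    if largest == 0.0:
        raise ValueError("direction must be non-zero")
    # Scale the ray until its largest component reaches a cube face.
    t = half_size / largest
    return (t * d[0], t * d[1], t * d[2])
```

For a direction pointing straight up, the result is the center of the top face for any pan angle, which matches the intuition that panning the head does not move a source directly overhead.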

Here, the head tracking information corresponds to data obtained by tracking the head movement of the user or listener, and may be obtained from at least one of a tracking input based on a separate head tracking module and a user input based on a user interface.

For example, when a user or listener moves their head while wearing a head tracking module, the head tracking module may measure the distance or angle by which the head moved, generate head tracking information from it, and transmit that information.

As another example, the head tracking information may be supplied manually by the user or listener through a user interface. That is, to rotate the spatial image deliberately, the user or listener may enter head tracking information through the user interface regardless of whether head tracking information is being received from a head tracking module. The user or listener may also enter and modify the head tracking information while listening to the mixing process that generates the binaural stereo output, or to the binaural stereo output as it changes with the entered information.

예를 들어, 도 9에 도시된 것과 같이 청취자가 팬(Pan), 틸트(tilt) 및 롤(roll) 중 적어도 하나에 상응하게 머리를 회전하는 경우, 이 값을 회전 파라미터로 획득하여 3차원 큐빅에 적용할 수 있다. For example, when the listener rotates the head corresponding to at least one of pan, tilt and roll, as shown in FIG. 9, the value is obtained as a rotation parameter to obtain a three-dimensional cubic. Applicable to
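The application of pan, tilt, and roll rotation parameters to the eight vertices of the 3D cube can be sketched as follows. This is only an illustration under stated assumptions: the axis convention (x to the right, y to the front, z up), the rotation order, and the names `rotation_matrix`, `rotate_cube`, and `CUBE` are hypothetical and are not specified in the description.

```python
import math

def _matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation_matrix(pan_deg, tilt_deg, roll_deg):
    """Compose a rotation from pan (about z), tilt (about x) and roll (about y).

    The axis assignment and the order Rz * Rx * Ry are assumptions made
    for this sketch only.
    """
    p, t, r = (math.radians(v) for v in (pan_deg, tilt_deg, roll_deg))
    rz = [[math.cos(p), -math.sin(p), 0.0],
          [math.sin(p), math.cos(p), 0.0],
          [0.0, 0.0, 1.0]]
    rx = [[1.0, 0.0, 0.0],
          [0.0, math.cos(t), -math.sin(t)],
          [0.0, math.sin(t), math.cos(t)]]
    ry = [[math.cos(r), 0.0, math.sin(r)],
          [0.0, 1.0, 0.0],
          [-math.sin(r), 0.0, math.cos(r)]]
    return _matmul(rz, _matmul(rx, ry))

def rotate_cube(vertices, pan_deg=0.0, tilt_deg=0.0, roll_deg=0.0):
    """Return the vertices rotated by the given head-rotation parameters."""
    m = rotation_matrix(pan_deg, tilt_deg, roll_deg)
    return [tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))
            for v in vertices]

# Hypothetical unit cube: four up channels (z = 1) and four down channels (z = -1).
CUBE = [(x, y, z) for z in (1.0, -1.0) for y in (1.0, -1.0) for x in (-1.0, 1.0)]
```

Because only the cube is rotated, a direction vector expressed relative to the cube can be reused unchanged after the rotation, which is the behavior described for FIG. 7 and FIG. 8.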

The effect produced by rotating the 3D cube, or by moving it up, down, left, or right, according to the head tracking information may subsequently be mixed with the planar layer audio output to generate the binaural stereo output. An immersive effect based on head tracking can therefore be produced more efficiently than with the conventional approach of rotating or moving a surround layer or proximity stereo layer corresponding to the planar layer, a subwoofer layer, or the like.

In addition, the processor 320 generates a planar layer audio output by performing audio processing corresponding to the planar layer.

Here, the planar layer is a layer whose structure differs from that of the 3D binaural layer, and may correspond to an element that creates an image corresponding to a surround effect or a stereo effect.

Accordingly, the planar layer may be either of the following: a surround layer, which performs surround layer binaural encoding to generate a surround layer binaural output and provides that output as the planar layer audio output; or a proximity stereo layer, which receives a stereo signal and generates a planar layer audio output corresponding to the stereo signal.

For example, referring to FIG. 10, a binaural encoder 1020 may be used to perform surround layer binaural encoding on a surround layer of 5 or 7 channels 1010. As described later with reference to FIGS. 13 and 14, the 2 channels of the proximity stereo layer may be included in the surround layer so that 7-channel surround layer binaural encoding is performed.

Here, the surround layer may correspond to a structure including five speakers 1111 to 1115, as shown in FIG. 11. The surround layer binaural output 1030 may correspond to a binaural point located on the surround layer. Assuming that the listener hears the sound at a reference listening point at the center of the surround layer, the surround layer binaural output 1030 may be generated by binaural encoding so that the sound seems to come from the binaural point on the surround layer.

The surround layer binaural output 1030 may be output as 2 channels, as shown in FIG. 10, and these 2 channels may correspond to a left channel and a right channel, respectively.

Although FIGS. 10 and 11 show a surround layer of 5 or 7 channels 1010, the number of channels of the surround layer is not limited to 5 or 7. Likewise, although FIG. 11 shows the surround layer as a rectangular plane, it is not limited to this form and may be represented in various ways, varying in line thickness, planar shape, distance from the reference listening point, and so on.

As another example, referring to FIG. 12, audio processing for the 2-channel 1210 proximity stereo layer may be performed based on a stereo bus 1220. That is, the stereo signal 1230 corresponding to the planar layer audio output may be the output generated by processing the 2-channel 1210 stereo audio, and may be output as 2 channels.

Here, the proximity stereo layer corresponds to an element that creates a stereo image corresponding to a stereo effect, and may also be represented as part of the surround layer.

For example, as shown in FIGS. 13 and 14, a proximity stereo layer corresponding to two speakers (1311 and 1312, or 1411 and 1412) may be included on a surround layer based on five speakers, giving a layer structure with seven speakers in total.

As shown in FIG. 13, the proximity stereo layer may be placed close to the reference listening point 1300 located on the surround layer. Alternatively, as shown in FIG. 14, the proximity stereo layer may be used as the left and right side speakers of the reference listening point 1400.

The stereo signal output by the proximity stereo layer can provide a sense of damping that is difficult to produce with the spatial parameters used in binaural encoding. The binaural stereo output according to an embodiment of the present invention can therefore provide the immersive effect of binaural encoding together with a sense of damping.

Thus, compared with the 3D layer binaural output, a planar layer audio output corresponding to a surround layer binaural output or to a stereo signal may simply be an output that contains different sound effects. That is, even though the planar layer audio output does not correspond to a 3D layer, it may contain a wider variety of values than the 3D layer binaural output.

Here, the planar layer may be located between the four up channels and the four down channels of the 3D cube.

For example, referring to FIG. 15, the planar layers 1510 to 1530 according to an embodiment of the present invention may be located between the four up channels and the four down channels included in the 3D cube corresponding to the 3D binaural layer.

The four up channels may correspond to four speakers at the top of the 3D cube, and the four down channels to four speakers at the bottom of the 3D cube.

That is, as shown in FIG. 15, the planar layers 1510 to 1530 may be located within the height range of the hexahedron corresponding to the 3D cube.

Accordingly, each speaker included in the surround layer or proximity stereo layer corresponding to the planar layers 1510 to 1530 may also be located between the four up channels and the four down channels of the 3D cube. Although FIG. 15 shows the planar layers 1510 to 1530 as planes for convenience of description, the planar layer according to an embodiment of the present invention is not necessarily limited to a plane.

FIGS. 16 and 17 each show a top view of the 3D cube corresponding to the 3D binaural layer together with the planar layer 1610. The speakers 1621 and 1622 of the proximity stereo layer included in the planar layer 1610 are also located between the up channels and the down channels of the 3D cube.

As shown in FIG. 17, the speakers 1721 and 1722 of the proximity stereo layer may be placed to the left and right of the reference listening point 1700, an arrangement that may be applied when compatibility with video content that includes images is required.

In addition, the processor 320 generates a binaural stereo output by summing the 3D layer binaural output and the planar layer audio output. That is, a binaural stereo output with a maximized binaural effect can be generated by mixing the immersive element from the 3D layer binaural output with the proximity playback element, object element, and so on from the planar layer audio output.

When only immersive sound is desired, the binaural stereo output may be generated from the 3D layer binaural output alone.

A subwoofer output corresponding to a subwoofer layer may be summed with the 3D layer binaural output and the planar layer audio output to generate the binaural stereo output. Summing the subwoofer output can maximize the immersive effect of the binaural stereo output and produce a dynamic bass playback element.

For example, referring to FIG. 18, the single-channel or 2-channel 1810 signal of the subwoofer layer may be processed based on an LFE (Low Frequency Effects) bus 1820. That is, the subwoofer output 1830 may be the output generated by processing the single-channel or 2-channel 1810 audio, and may correspond to a single channel or 2 channels, as shown in FIG. 12.

For example, the subwoofer layer may correspond to a single channel, as in the 5.1, 7.1, and 11.1 channel formats, or to 2 channels, as in the 10.2 and 22.2 channel formats.
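The relationship between a channel format label and the number of subwoofer channels can be illustrated with a short helper. The function name `lfe_channel_count` and the convention of reading the count from the digits after the first dot are assumptions of this sketch, not part of the description.

```python
def lfe_channel_count(layout: str) -> int:
    """Return the subwoofer (LFE) channel count encoded in a label such as '5.1'.

    The label is assumed to follow the 'main.lfe' convention; a label
    without a dot is treated as having no subwoofer channel.
    """
    parts = layout.split(".")
    return int(parts[1]) if len(parts) > 1 else 0


# Formats named in the description: single-channel and 2-channel subwoofer layers.
single = [fmt for fmt in ("5.1", "7.1", "11.1") if lfe_channel_count(fmt) == 1]
dual = [fmt for fmt in ("10.2", "22.2") if lfe_channel_count(fmt) == 2]
```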

The subwoofer layer may be located separately from the 3D cube corresponding to the 3D binaural layer and from the planar layer.

For example, as shown in FIG. 19, the subwoofer layer 1940 may be located apart from the 3D cube 1910 corresponding to the 3D binaural layer, the surround layer 1920, and the proximity stereo layer 1930. The structure shown in FIG. 19 is only one embodiment, and the way the layers are combined is not limited to it.

A 3D weight may be applied to the 3D layer binaural output and a planar weight to the planar layer audio output, and the 3D weight and the planar weight may be set independently of each other. That is, by finely adjusting the output level of each layer before mixing, a more dramatic binaural stereo output can be generated and the binaural effect maximized.
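The weighted summation of the layer outputs can be sketched as follows. This is a minimal illustration, assuming that each layer output is a list of (left, right) sample pairs of equal length and that an optional mono subwoofer output is added equally to both channels. The function name `mix_layers` and the subwoofer weight `w_sub` are hypothetical; the description names only the 3D weight and the planar weight.

```python
def mix_layers(layer_3d, layer_planar, subwoofer=None,
               w_3d=1.0, w_planar=1.0, w_sub=1.0):
    """Sum stereo frames from each layer using independent per-layer weights.

    layer_3d, layer_planar: lists of (left, right) sample pairs.
    subwoofer: optional list of mono samples (an assumption of this sketch).
    """
    if len(layer_3d) != len(layer_planar):
        raise ValueError("layer outputs must have the same number of frames")
    if subwoofer is not None and len(subwoofer) != len(layer_3d):
        raise ValueError("subwoofer output must match the layer length")

    mixed = []
    for i, (frame_3d, frame_planar) in enumerate(zip(layer_3d, layer_planar)):
        # Apply the 3D weight and the planar weight independently, then sum.
        left = w_3d * frame_3d[0] + w_planar * frame_planar[0]
        right = w_3d * frame_3d[1] + w_planar * frame_planar[1]
        if subwoofer is not None:
            # Add the low-frequency content equally to both output channels.
            left += w_sub * subwoofer[i]
            right += w_sub * subwoofer[i]
        mixed.append((left, right))
    return mixed
```

Setting `w_planar=0.0` corresponds to the case in which the binaural stereo output is built from the 3D layer binaural output alone.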

In addition, because the present invention can support natural upmix and downmix functions based on the processor 320 described above, it can improve compatibility between content that supports different kinds of sound. For example, the surround image expressed through the 3D cube may be downmixed to the surround layer, and the surround layer may in turn be downmixed to the proximity stereo layer. Performing the downmix region by region in this way preserves sound quality more effectively.

The memory 330 stores the 3D layer binaural output and the planar layer audio output.

The memory 330 also stores various information generated in the process of generating binaural stereo audio according to an embodiment of the present invention, as described above.

Depending on the embodiment, the memory 330 may be configured independently of the binaural stereo audio generating apparatus to support the binaural stereo audio generating function. In that case, the memory 330 may operate as separate mass storage and may include a control function for performing operations.

Meanwhile, the binaural stereo audio generating apparatus may be equipped with a memory and store information within the apparatus. In one implementation, the memory is a computer-readable medium. In one implementation, the memory may be a volatile memory unit; in another, it may be a non-volatile memory unit. In one implementation, the storage device is a computer-readable medium. In various implementations, the storage device may include, for example, a hard disk device, an optical disk device, or some other mass storage device.

With such a binaural stereo audio generating apparatus, the binaural effect can be maximized by mixing various sound elements. Compatibility with various kinds of content can also be improved through natural upmixing and downmixing.

FIG. 20 shows an example of sound expressed through a conventional binaural engine, and FIG. 21 shows an example of sound expressed through a binaural engine according to the present invention.

First, referring to FIG. 20, a binaural mix made with a conventional binaural engine is limited in its ability to express the proximity of sound. Because binaural mixing provides a spatial image of the sound, the only way to express proximity through binaural mixing is to adjust the volume of the sound.

Consequently, when binaural mixing is performed with a conventional binaural engine, the mix may be rendered along the actual sound direction 2020 even if the engineer mixes for the intended sound direction 2010. That is, binaural mixing intended to make sound flow from front to back, or from back to front, relative to the reference listening point 2000 is in practice rendered as movement along the surface of the binaural engine, which may be a limitation of VBAP (vector base amplitude panning) technology.
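The surface-following behavior can be illustrated with a simplified model in which a panner keeps only the direction of a source and discards its distance. This is not a VBAP implementation; the function `project_to_surface`, the unit radius, and the sample path are assumptions made for illustration only.

```python
import math

def project_to_surface(point, radius=1.0):
    """Keep only the direction of a source position by scaling it to the surface.

    Returns None at the listening point itself, where no direction is defined.
    """
    norm = math.sqrt(sum(c * c for c in point))
    if norm == 0.0:
        return None
    return tuple(radius * c / norm for c in point)

# Intended path: straight from front (y > 0) to back (y < 0) through the
# reference listening point at the origin.
intended_path = [(0.0, y, 0.0) for y in (2.0, 1.0, 0.0, -1.0, -2.0)]

# Rendered path under the direction-only model: every position collapses
# onto the surface, so near and far positions become indistinguishable.
rendered_path = [project_to_surface(p) for p in intended_path]
```

In this model the positions at distances 2 and 1 in front of the listener map to the same surface point, so proximity would have to be suggested by level changes alone.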

Referring to FIG. 21, however, the binaural engine according to the present invention can mix in the planar layer audio output generated using the surround layer 2110 and the proximity stereo layer 2120, in addition to the 3D binaural layer. That is, the proximity of sound, which a conventional binaural engine could adjust only through volume, can be adjusted through the surround layer binaural output and the stereo signal.

Accordingly, with the binaural engine according to the present invention, an actual sound direction 2130 that matches the engineer's intended sound direction 2010 of FIG. 20 can be expressed. That is, by passing through the reference listening point 2100, the sound can be made to seem as if it passes through the listener's body.

FIG. 22 is a flowchart of a method for generating binaural stereo audio according to an embodiment of the present invention.

Referring to FIG. 22, the method for generating binaural stereo audio according to an embodiment of the present invention generates a 3D layer binaural output by performing 3D layer binaural encoding corresponding to a 3D binaural layer (S2210).

For example, referring to FIG. 7, the reference listening point 700, which virtually represents the position of the user or listener hearing the binaural stereo audio, may be located inside the 3D cube 710 whose vertices are eight dynamic speakers, at the center of the surround layer 720. Assuming that the binaural point 730 lies on the top face of the 3D cube 710, as shown in FIG. 7, the 3D vector 740 corresponding to the 3D layer binaural output may be generated in the direction from the reference listening point 700 toward the binaural point 730.
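The 3D vector from the reference listening point toward the binaural point can be computed as in the following sketch. The coordinate values and the function name `direction_vector` are hypothetical; the description does not define a coordinate system.

```python
import math

def direction_vector(listening_point, binaural_point):
    """Return the unit vector from the listening point to the binaural point,
    together with the distance between the two points."""
    delta = tuple(b - a for a, b in zip(listening_point, binaural_point))
    distance = math.sqrt(sum(c * c for c in delta))
    if distance == 0.0:
        raise ValueError("binaural point coincides with the listening point")
    return tuple(c / distance for c in delta), distance

# Hypothetical example: listening point at the cube center, binaural point
# on the top face of a cube whose faces lie at +/-1 on each axis.
unit, dist = direction_vector((0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
```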

As described in detail with reference to FIGS. 10 and 11, the surround layer 720 corresponds to an element that creates a surround image corresponding to a surround effect. Although FIG. 7 shows the surround layer 720 as a plane for convenience of description, it is not necessarily limited to a plane.

Here, the head tracking information corresponds to data obtained by tracking the head movement of a user or listener, and may be input through a separate head tracking module or a user interface.

As another example, the user or listener may supply head tracking information manually. That is, to rotate the spatial image deliberately, the user or listener may enter head tracking information through the user interface regardless of whether head tracking information is being received from the head tracking module. The user or listener may also enter and adjust the head tracking information while monitoring the mixing process that generates the binaural stereo output, or while listening to the binaural stereo output as it changes with the entered information.

The effect produced by rotating the 3D cube, or by moving it up, down, left, or right, according to the head tracking information may subsequently be mixed with the planar layer audio output to generate the binaural stereo output. An immersive effect based on head tracking can therefore be produced more efficiently than with the conventional approach of rotating or moving a surround layer or proximity stereo layer corresponding to the planar layer, a subwoofer layer, or the like.

In addition, the method for generating binaural stereo audio according to an embodiment of the present invention generates a planar layer audio output by performing audio processing corresponding to the planar layer (S2220).

Here, the planar layer is a layer whose structure differs from that of the 3D binaural layer, and may correspond to an element that creates an image corresponding to a surround effect or a stereo effect.

For example, referring to FIG. 10, a binaural encoder 1020 may be used to perform surround layer binaural encoding on a surround layer of 5 or 7 channels 1010. As described with reference to FIGS. 13 and 14, the 2 channels of the proximity stereo layer may be included in the surround layer so that 7-channel surround layer binaural encoding is performed.

Although FIGS. 10 and 11 show a surround layer of 5 or 7 channels 1010, the number of channels of the surround layer is not limited to 5 or 7. Likewise, although FIG. 11 shows the surround layer as a rectangular plane, it is not limited to this form and may be represented in various ways, varying in line thickness, planar shape, distance from the reference listening point, and so on.

As another example, referring to FIG. 12, audio processing for the 2-channel 1210 proximity stereo layer may be performed based on a stereo bus 1220. That is, the stereo signal 1230 corresponding to the planar layer audio output may be the output generated by processing the 2-channel 1210 stereo audio, and may be output as 2 channels.

Thus, compared with the 3D layer binaural output, a planar layer audio output corresponding to a surround layer binaural output or to a stereo signal may simply be an output that contains different sound effects. That is, even though the planar layer audio output does not correspond to a 3D layer, it may contain a wider variety of values than the 3D layer binaural output.

In addition, the method for generating binaural stereo audio according to an embodiment of the present invention generates a binaural stereo output by summing the 3D layer binaural output and the planar layer audio output (S2230). That is, a binaural stereo output with a maximized binaural effect can be generated by mixing the immersive element from the 3D layer binaural output with the proximity playback element, object element, and so on from the planar layer audio output.

The subwoofer layer may be located separately from the 3D cube corresponding to the 3D binaural layer and from the planar layer.

In addition, because the present invention can support natural upmix and downmix functions based on the functions disclosed above, it can improve compatibility between content that supports different kinds of sound. For example, the surround image expressed through the 3D cube may be downmixed to the surround layer, and the surround layer may in turn be downmixed to the proximity stereo layer. Performing the downmix region by region in this way preserves sound quality more effectively.

Although not shown in FIG. 22, the method for generating binaural stereo audio according to an embodiment of the present invention may transmit and receive the information needed to generate binaural stereo audio over a communication network. In particular, it may receive head tracking information, user input, or information about the content to which the binaural effect is to be applied, and provide the binaural stereo output.

Also, although not shown in FIG. 22, the method for generating binaural stereo audio according to an embodiment of the present invention stores various information generated in the process of generating binaural stereo audio, as described above.

Accordingly, embodiments of the present invention may be implemented as a computer-implemented method or as a non-transitory computer-readable medium on which computer-executable instructions are recorded. When executed by a processor, the computer-readable instructions may perform a method according to at least one aspect of the present invention.

As described above, the method and apparatus for generating binaural stereo audio according to the present invention are not limited to the configurations and methods of the embodiments described above; all or part of the embodiments may be selectively combined so that various modifications can be made.

110, 1910: 3D binaural layer
111, 121, 210, 420, 1020: binaural encoder
120, 720, 1610, 1710, 1920, 2110: surround layer
130, 2120: proximity stereo layer
131, 1220: stereo bus
140: subwoofer layer
141, 1820: LFE (Low Frequency Effects) bus
150: binaural mixer
220: dedicated player
310: communication unit
320: processor
330: memory
411: four up channels
412: four down channels
430: 3D layer binaural output
511 to 518: dynamic speakers
610 to 630, 710: 3D cube
700, 1300, 1400, 1600, 1700, 2000, 2100: reference listening point
730: binaural point
740: 3D vector
1010: 5 channels or 7 channels
1030: surround layer binaural output
1111 to 1115, 1311, 1312, 1411, 1412, 1621, 1622, 1721, 1722: speakers
1210: 2 channels
1230: stereo signal
1510 to 1530: planar layer
1810: single channel or 2 channels
1830: subwoofer output
1930: proximity stereo layer
1940: subwoofer layer
2010: sound direction intended by the engineer
2020, 2130: actual sound direction

Claims

A method for generating binaural stereo audio, the method comprising:
generating a three-dimensional layer binaural output by performing three-dimensional layer binaural encoding corresponding to a three-dimensional binaural layer;
generating a planar layer audio output by performing audio processing corresponding to a planar layer; and
generating a binaural stereo output by combining the three-dimensional layer binaural output and the planar layer audio output.

The method according to claim 1,
wherein the planar layer is any one of:
a surround layer that performs surround layer binaural encoding to generate a surround layer binaural output and provides the generated surround layer binaural output as the planar layer audio output; and
a proximity stereo layer that receives a stereo signal and generates the planar layer audio output corresponding to the stereo signal.

The method according to claim 2,
wherein the three-dimensional layer binaural output
is generated corresponding to a three-dimensional vector for a binaural point located on an eight-channel three-dimensional cubic consisting of four up channels and four down channels.

The method according to claim 2,
wherein generating the binaural stereo output comprises
applying a three-dimensional weight to the three-dimensional layer binaural output and applying a plane weight to the planar layer audio output, the three-dimensional weight and the plane weight being set independently of each other.

The method according to claim 1,
wherein generating the binaural stereo output comprises
generating the binaural stereo output by summing a subwoofer output corresponding to a subwoofer layer together with the three-dimensional layer binaural output and the planar layer audio output.
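As an illustration only, the combination of layer outputs recited above (independent weights on the three-dimensional and planar layers, with an optional subwoofer output summed in) might be sketched as follows. The function name `mix_binaural_stereo`, the per-sample `(left, right)` representation, the default weights, and the choice to add a mono subwoofer signal equally to both ears are all assumptions made for this sketch; the claims do not specify them.

```python
from typing import List, Optional, Sequence, Tuple

# One (left, right) pair per sample. This representation is an assumption.
Stereo = List[Tuple[float, float]]


def mix_binaural_stereo(
    layer_3d: Stereo,
    layer_planar: Stereo,
    weight_3d: float = 1.0,
    weight_planar: float = 1.0,
    subwoofer: Optional[Sequence[float]] = None,
) -> Stereo:
    """Sum the weighted layer outputs into one binaural stereo output.

    weight_3d and weight_planar are independent of each other.
    subwoofer is a hypothetical mono signal added equally to both ears.
    """
    if len(layer_3d) != len(layer_planar):
        raise ValueError("layer outputs must have the same number of samples")
    if subwoofer is not None and len(subwoofer) != len(layer_3d):
        raise ValueError("subwoofer output must match the layer length")

    mixed: Stereo = []
    for i, ((l3, r3), (lp, rp)) in enumerate(zip(layer_3d, layer_planar)):
        lfe = subwoofer[i] if subwoofer is not None else 0.0
        left = weight_3d * l3 + weight_planar * lp + lfe
        right = weight_3d * r3 + weight_planar * rp + lfe
        mixed.append((left, right))
    return mixed
```

Because the two weights are separate parameters, the three-dimensional layer can be attenuated without touching the planar layer, and vice versa.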

The method according to claim 3,
wherein the three-dimensional cubic
is generated by changing positions of eight dynamic speakers, which correspond to vertices of the three-dimensional cubic, in accordance with a size parameter for the three-dimensional binaural layer.
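A minimal sketch of how eight dynamic speaker positions could follow a size parameter is given below. It assumes that the size parameter is the edge length of an axis-aligned cube, that the cube is centered on a given point, and that the z axis points up so the first four vertices are the up channels. None of these conventions is stated in the claims, and `cubic_speaker_positions` is a hypothetical name.

```python
from itertools import product
from typing import List, Tuple

Point = Tuple[float, float, float]


def cubic_speaker_positions(
    size: float, center: Point = (0.0, 0.0, 0.0)
) -> List[Point]:
    """Return the 8 vertices of a cube with edge length `size`.

    The first four vertices have z above the center (up channels),
    the last four have z below it (down channels).
    """
    if size <= 0:
        raise ValueError("size must be positive")
    half = size / 2.0
    cx, cy, cz = center
    # product() varies the last element fastest, so z is +half for the
    # first four sign triples and -half for the remaining four.
    return [
        (cx + sx * half, cy + sy * half, cz + sz * half)
        for sz, sy, sx in product((1.0, -1.0), repeat=3)
    ]
```

Calling the function again with a different `size` moves all eight speakers at once, which is one way to read "changing positions in accordance with a size parameter".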

The method according to claim 3,
wherein the three-dimensional vector
is generated based on a reference listening point corresponding to a center of a two-dimensional plane corresponding to the surround layer.
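For illustration, a three-dimensional vector from the reference listening point to a binaural point could be computed as below. The coordinate convention (x to the right, y to the front, z up), the normalization to unit length, and the conversion to azimuth and elevation are assumptions of this sketch; the function names are hypothetical and the claims do not prescribe this representation.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]


def direction_vector(
    binaural_point: Point, reference: Point = (0.0, 0.0, 0.0)
) -> Point:
    """Unit vector pointing from the reference listening point to the point."""
    dx = binaural_point[0] - reference[0]
    dy = binaural_point[1] - reference[1]
    dz = binaural_point[2] - reference[2]
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    if norm == 0.0:
        raise ValueError("binaural point coincides with the reference point")
    return (dx / norm, dy / norm, dz / norm)


def to_azimuth_elevation(v: Point) -> Tuple[float, float]:
    """Convert a direction to (azimuth, elevation) in degrees.

    Azimuth 0 is straight ahead (+y) and grows toward the right (+x);
    elevation is measured upward from the horizontal plane.
    """
    x, y, z = v
    azimuth = math.degrees(math.atan2(x, y))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return (azimuth, elevation)
```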

The method according to claim 3,
wherein generating the three-dimensional layer binaural output comprises
generating the three-dimensional layer binaural output by applying direction information of the three-dimensional vector to the three-dimensional cubic rotated in accordance with head tracking information, and the head tracking information is obtained from at least one of a tracking input based on a head tracking module and a user input based on a user interface.

The method according to claim 8,
wherein the three-dimensional cubic
is rotated in accordance with a rotation parameter of at least one of pan, tilt, and roll.
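The rotation of the three-dimensional cubic by pan, tilt, and roll could be sketched as follows. The axis assignment (pan about the vertical z axis, tilt about the left-right x axis, roll about the front-back y axis), the order in which the rotations are applied, and the sign conventions are assumptions chosen for this example; `rotate_point` and `rotate_cubic` are hypothetical helpers, not part of the disclosure.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]


def rotate_point(
    p: Point, pan_deg: float = 0.0, tilt_deg: float = 0.0, roll_deg: float = 0.0
) -> Point:
    """Rotate p by roll (y axis), then tilt (x axis), then pan (z axis)."""
    x, y, z = p

    a = math.radians(roll_deg)  # rotation about the front-back (y) axis
    x, z = x * math.cos(a) + z * math.sin(a), -x * math.sin(a) + z * math.cos(a)

    b = math.radians(tilt_deg)  # rotation about the left-right (x) axis
    y, z = y * math.cos(b) - z * math.sin(b), y * math.sin(b) + z * math.cos(b)

    c = math.radians(pan_deg)  # rotation about the vertical (z) axis
    x, y = x * math.cos(c) - y * math.sin(c), x * math.sin(c) + y * math.cos(c)

    return (x, y, z)


def rotate_cubic(
    vertices: Iterable[Point],
    pan_deg: float = 0.0,
    tilt_deg: float = 0.0,
    roll_deg: float = 0.0,
) -> List[Point]:
    """Apply the same rotation to every vertex of the cubic."""
    return [rotate_point(v, pan_deg, tilt_deg, roll_deg) for v in vertices]
```

Any one of the three parameters may be left at zero, which matches the "at least one of pan, tilt, and roll" wording.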

The method according to claim 3,
wherein the planar layer is located between the four up channels and the four down channels.

An apparatus for generating binaural stereo audio, the apparatus comprising:
a processor configured to generate a three-dimensional layer binaural output by performing three-dimensional layer binaural encoding corresponding to a three-dimensional binaural layer, to generate a planar layer audio output by performing audio processing corresponding to a planar layer, and to generate a binaural stereo output by combining the three-dimensional layer binaural output and the planar layer audio output; and
a memory configured to store the three-dimensional layer binaural output and the planar layer audio output.

The apparatus according to claim 11,
wherein the planar layer is any one of:
a surround layer that performs surround layer binaural encoding to generate a surround layer binaural output and provides the generated surround layer binaural output as the planar layer audio output; and
a proximity stereo layer that receives a stereo signal and generates the planar layer audio output corresponding to the stereo signal.

The apparatus according to claim 12,
wherein the three-dimensional layer binaural output
is generated corresponding to a three-dimensional vector for a binaural point located on an eight-channel three-dimensional cubic consisting of four up channels and four down channels.

The apparatus according to claim 12,
wherein the processor
applies a three-dimensional weight to the three-dimensional layer binaural output and applies a plane weight to the planar layer audio output,
the three-dimensional weight and the plane weight being set independently of each other.

The apparatus according to claim 11,
wherein the processor
generates the binaural stereo output by summing a subwoofer output corresponding to a subwoofer layer together with the three-dimensional layer binaural output and the planar layer audio output.

The apparatus according to claim 13,
wherein the three-dimensional cubic
is generated by changing positions of eight dynamic speakers, which correspond to vertices of the three-dimensional cubic, in accordance with a size parameter for the three-dimensional binaural layer.

The apparatus according to claim 13,
wherein the three-dimensional vector
is generated based on a reference listening point that is included in the three-dimensional cubic and corresponds to a center of a two-dimensional plane corresponding to the surround layer.

The apparatus according to claim 13,
wherein the processor
generates the three-dimensional layer binaural output by applying direction information of the three-dimensional vector to the three-dimensional cubic rotated in accordance with head tracking information, and the head tracking information is obtained from at least one of a tracking input based on a head tracking module and a user input based on a user interface.

The apparatus according to claim 18,
wherein the three-dimensional cubic
is rotated in accordance with a rotation parameter of at least one of pan, tilt, and roll.

The apparatus according to claim 13,
wherein the planar layer is located between the four up channels and the four down channels.