KR102119240B1

KR102119240B1 - Method for up-mixing stereo audio to binaural audio and apparatus using the same

Info

Publication number: KR102119240B1
Application number: KR1020180010877A
Authority: KR
Inventors: 김동준
Original assignee: 김동준
Priority date: 2018-01-29
Filing date: 2018-01-29
Publication date: 2020-06-05
Also published as: WO2019147040A1; KR20190091825A

Abstract

Disclosed are a method of up-mixing stereo audio to binaural audio and an apparatus for the same. A method of up-mixing stereo audio to binaural audio according to an embodiment of the present invention includes: generating a binaural output by performing binaural encoding based on a treble region and a bass region separated from a stereo signal; generating a wide stereo output by performing stereo wide processing based on a mid-range region separated from the stereo signal; and generating an up-mix stereo output by summing the stereo signal, the binaural output, and the wide stereo output.

Description

Method for up-mixing stereo audio to binaural audio and apparatus using the same {METHOD FOR UP-MIXING STEREO AUDIO TO BINAURAL AUDIO AND APPARATUS USING THE SAME}

The present invention relates to technology for up-mixing stereo audio to binaural audio, and more particularly to technology for up-mixing stereo audio by combining a binaural output that uses the treble and bass with a wide stereo output that uses the mid-range.

As multimedia technology has advanced, the use of content containing multi-channel audio signals with more channels than 5.1, such as 7.1, 10.2, 11.1, and 22.2 channels, has been increasing. However, the user terminals owned by the people who consume such content generally reproduce audio in stereo form, through stereo speakers, headphones, or earphones, so high-quality multi-channel audio signals need to be converted into stereo audio signals.

Korean Patent Application Publication No. 10-2015-0013073, published February 4, 2015 (Title: Method and apparatus for binaural rendering of multi-channel audio signals)

An object of the present invention is to provide a method of up-mixing an existing stereo file to immersive audio without performing immersive mixing.

Another object of the present invention is to reduce the time and cost required to mix a stereo file into an immersive file.

A further object of the present invention is to improve compatibility with various types of content based on a natural up-mix.

To achieve the above objects, a method of up-mixing stereo audio to binaural audio according to the present invention includes: generating a binaural output by performing binaural encoding based on a treble region and a bass region separated from a stereo signal; generating a wide stereo output by performing stereo wide processing based on a mid-range region separated from the stereo signal; and generating an up-mix stereo output by summing the stereo signal, the binaural output, and the wide stereo output.

Here, the binaural output may be generated so as to correspond to a 3D vector for a binaural point located in an 8-channel 3D cubic (Cubic) composed of four up channels and four down channels, where the positions of the four up channels are set based on the treble region and the positions of the four down channels are set based on the bass region.

Here, the positions of the four up channels may be set using one treble frequency detected in the treble region based on the magnitude of a transient, and the positions of the four down channels may be set using one bass frequency detected in the bass region based on the magnitude of a transient.

Here, the distance between the upper layer of the 3D cubic, composed of the four up channels, and the lower layer of the 3D cubic, composed of the four down channels, may be set based on an equalizer value of the stereo signal.

Here, the wide stereo output may be generated based on a wide stereo layer corresponding to the mid-range region, and the wide stereo layer may correspond to a stereo layer whose image space is expanded in accordance with a reverb value and a delay value.

Here, the 3D vector may be generated with respect to a reference listening point located inside the 3D cubic.

Here, generating the binaural output may include generating the binaural output by applying direction information of the 3D vector to the 3D cubic rotated in accordance with head tracking information.

Here, the 3D cubic may be rotated in accordance with at least one rotation parameter among pan, tilt, and roll.

Here, the binaural output may include harmonics based on the fundamental frequency of the upper layer.

Here, the up-mixing method may further include separating the stereo signal into the treble region, the mid-range region, and the bass region by inputting the stereo signal into each of a treble pass filter, a mid pass filter, and a bass pass filter.

In addition, an up-mix apparatus according to an embodiment of the present invention includes: a processor that generates a binaural output by performing binaural encoding based on a treble region and a bass region separated from a stereo signal, generates a wide stereo output by performing stereo wide processing based on a mid-range region separated from the stereo signal, and generates an up-mix stereo output by summing the stereo signal, the binaural output, and the wide stereo output; and a memory that stores the stereo signal, the binaural output, and the wide stereo output.

Here, the processor may generate the binaural output by applying direction information of the 3D vector to the 3D cubic rotated in accordance with head tracking information.

Here, the processor may separate the stereo signal into the treble region, the mid-range region, and the bass region by inputting the stereo signal into each of a treble pass filter, a mid pass filter, and a bass pass filter.

According to the present invention, it is possible to provide a method of up-mixing an existing stereo file to immersive audio without performing immersive mixing.

In addition, the present invention can reduce the time and cost required to mix a stereo file into an immersive file.

In addition, the present invention can improve compatibility with various types of content based on a natural up-mix.

FIG. 1 is a diagram showing a stereo audio up-mix structure according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an up-mix apparatus according to an embodiment of the present invention.
FIGS. 3 to 5 are diagrams showing examples of filters that separate the treble region, mid-range region, and bass region of a stereo signal according to the present invention.
FIG. 6 is a diagram showing a detailed structure for generating a binaural output according to an embodiment of the present invention.
FIG. 7 is a diagram showing an example of an upper layer and a lower layer in an 8-channel 3D cubic (Cubic) according to the present invention.
FIG. 8 is a diagram conceptually showing an example of a stereo audio up-mix effect according to the present invention.
FIG. 9 is a diagram showing the distance between the top layer and the bottom layer in a 3D cubic according to the present invention.
FIG. 10 is a diagram showing an example of a 3D vector according to the present invention.
FIG. 11 is a diagram showing an example of applying direction information of a 3D vector to a 3D cubic rotated in accordance with head tracking information according to the present invention.
FIG. 12 is a diagram showing an example of rotation parameters according to the present invention.
FIG. 13 is a diagram showing a detailed structure for generating a wide stereo output according to an embodiment of the present invention.
FIG. 14 is a diagram showing an example of expanding a stereo image according to the present invention.
FIG. 15 is a diagram showing an example of a structure that combines the upper and lower layers of a 3D cubic with a wide stereo layer according to the present invention.
FIG. 16 is a flowchart showing a method of up-mixing stereo audio to binaural audio according to an embodiment of the present invention.

The present invention is described in detail below with reference to the accompanying drawings. Repeated descriptions, and detailed descriptions of well-known functions and configurations that could unnecessarily obscure the gist of the present invention, are omitted. The embodiments of the present invention are provided to describe the invention more completely to those of ordinary skill in the art. Accordingly, the shapes and sizes of elements in the drawings may be exaggerated for clarity.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram showing a stereo audio up-mix structure according to an embodiment of the present invention.

Referring to FIG. 1, in the stereo audio up-mix structure according to an embodiment of the present invention, a two-channel stereo signal 110 may be input to each of a treble pass filter 121, a mid pass filter 122, and a bass pass filter 123.

Of the stereo signal 110 input to the treble pass filter 121, only the treble region passes and may be input to the binaural encoder 130. Likewise, of the stereo signal 110 input to the mid pass filter 122, only the mid-range region passes and may be input to the stereo wider 140. Finally, of the stereo signal 110 input to the bass pass filter 123, only the bass region passes and may be input to the binaural encoder 130 together with the treble region.

Here, the bass region input to the binaural encoder (Binaural Encoder) 130 has no directionality, whereas the treble region can have directionality. The treble region and the bass region are therefore separated, and binaural encoding may be performed on them to give an immersive effect.

For example, the binaural encoder 130 may generate a 3D layer using the two stereo channels of the treble region and the two stereo channels of the bass region, and may perform binaural encoding in accordance with the 3D layer.

The mid-range region input to the stereo wider (Stereo Wider) 140 is not binaurally encoded; instead, stereo wide processing may be performed on it to expand the stereo image area.

Thereafter, the binaural mixer (Binaural Mixer) 150 may generate an up-mix stereo output by summing the binaural output from the binaural encoder 130, the wide stereo output from the stereo wider 140, and the stereo signal 110.
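The summing performed by the binaural mixer 150 can be illustrated with a minimal Python sketch. The function name `upmix_sum`, the sample representation, and the optional `gains` weights are assumptions made for illustration; the description only states that the three signals are summed.

```python
from typing import List, Sequence, Tuple

# One (left, right) pair per sample; this representation is an assumption.
Stereo = List[Tuple[float, float]]


def upmix_sum(stereo: Stereo, binaural: Stereo, wide: Stereo,
              gains: Sequence[float] = (1.0, 1.0, 1.0)) -> Stereo:
    """Sum the original stereo signal, the binaural output, and the
    wide stereo output sample by sample.

    `gains` is a hypothetical per-input weight; with the default of
    (1, 1, 1) the function is a plain sum.
    """
    if not (len(stereo) == len(binaural) == len(wide)):
        raise ValueError("all three signals must have the same length")
    g_s, g_b, g_w = gains
    return [
        (g_s * s[0] + g_b * b[0] + g_w * w[0],   # left channel
         g_s * s[1] + g_b * b[1] + g_w * w[1])   # right channel
        for s, b, w in zip(stereo, binaural, wide)
    ]
```

In practice the summed signal may exceed the nominal full-scale range, so a real mixer would likely need level control; that is outside what this sketch covers.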

Here, the up-mix stereo output may correspond to a stereo signal or stereo audio that includes an immersive effect. That is, according to the present invention, an immersive effect can be produced for stereo audio or stereo audio content without performing separate immersive mixing.

Cost and time can therefore be reduced compared with generating immersive content or immersive audio by performing immersive mixing in the conventional manner.

FIG. 2 is a block diagram showing an up-mix apparatus according to an embodiment of the present invention.

Referring to FIG. 2, the up-mix apparatus according to an embodiment of the present invention includes a communication unit 210, a processor 220, and a memory 230.

The communication unit 210 transmits and receives, over a communication network, the information needed to generate up-mix stereo audio. In particular, the communication unit 210 according to an embodiment of the present invention may receive the stereo signal or content that serves as the source for generating the up-mix stereo audio, as well as head tracking information for binaural encoding that is input through a head tracking module or a user interface, and may provide up-mix stereo audio corresponding to the up-mix stereo output.

The processor 220 generates a binaural output by performing binaural encoding based on the treble region and the bass region separated from the stereo signal.

Here, the stereo signal may be separated into the treble region, the mid-range region, and the bass region by inputting the stereo signal into each of a treble pass filter, a mid pass filter, and a bass pass filter.

For example, referring to FIGS. 3 to 5, the processor 220 may input a two-channel stereo signal to each of a treble pass filter 300, a mid pass filter 400, and a bass pass filter 500.

The treble pass filter 300 corresponds to a filter that passes only the treble region of the frequency range of the input stereo signal, and may output a stereo signal of the treble region as shown in FIG. 3.

The mid pass filter 400 corresponds to a filter that passes only the mid-range region of the frequency range of the input stereo signal, and may output a stereo signal of the mid-range region as shown in FIG. 4.

The bass pass filter 500 corresponds to a filter that passes only the bass region of the frequency range of the input stereo signal, and may output a stereo signal of the bass region as shown in FIG. 5.

The treble pass filter 300, mid pass filter 400, and bass pass filter 500 used in the present invention are not limited to a specific filtering method, and may operate using any technique that is currently available or that is developed in the future.
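Because the description leaves the filtering method open, the following Python sketch shows just one possible way to split a single channel into three bands. It uses two first-order (one-pole) low-pass filters and subtraction, which is a choice made here for simplicity and not the filter design of the patent. The function names and the crossover frequencies of 200 Hz and 4000 Hz are hypothetical. For a stereo signal the function would be applied to the left and right channels separately.

```python
import math
from typing import List, Tuple


def one_pole_lowpass(x: List[float], cutoff_hz: float,
                     sample_rate: float) -> List[float]:
    """First-order low-pass filter (illustrative, gentle 6 dB/octave slope)."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, state = [], 0.0
    for sample in x:
        state = (1.0 - a) * sample + a * state
        out.append(state)
    return out


def split_three_bands(x: List[float], sample_rate: float,
                      bass_cutoff_hz: float = 200.0,      # assumed value
                      treble_cutoff_hz: float = 4000.0    # assumed value
                      ) -> Tuple[List[float], List[float], List[float]]:
    """Return (bass, mid, treble) such that bass + mid + treble == x."""
    bass = one_pole_lowpass(x, bass_cutoff_hz, sample_rate)
    below_treble = one_pole_lowpass(x, treble_cutoff_hz, sample_rate)
    # Treble is what the upper low-pass removed; mid is what lies between.
    treble = [s - lo for s, lo in zip(x, below_treble)]
    mid = [lo - b for lo, b in zip(below_treble, bass)]
    return bass, mid, treble
```

Building the mid and treble bands by subtraction means the three bands always add back up to the input, which keeps the later summing stage free of gaps between bands. Steeper crossover filters would separate the bands more cleanly.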

Here, the binaural output may be generated so as to correspond to a 3D vector for a binaural point located in an 8-channel 3D cubic (Cubic) composed of four up channels and four down channels, where the positions of the four up channels are set based on the treble region and the positions of the four down channels are set based on the bass region.

Here, the 8-channel 3D cubic corresponds to the element that creates the 3D spatial image, and may correspond to a 3D layer composed of an upper layer of four up channels and a lower layer of four down channels.

For example, referring to FIG. 6, a binaural encoder 620 that uses the 3D cubic scheme may perform binaural encoding on the two channels of the treble region 611 and the two channels of the bass region 612 separated from the stereo signal.

Here, the positions of the four up channels may be set using one treble frequency detected in the treble region 611 based on the magnitude of a transient, and the positions of the four down channels may be set using one bass frequency detected in the bass region 612 based on the magnitude of a transient.

Here, a transient may refer to the initial rise in amplitude that appears in the waveform of a sound when the sound first begins.

For example, in the present invention, one frequency with a strong transient is searched for in each of the treble region 611 and the bass region 612, and the four up channels and four down channels may be generated based on the left channel and right channel produced by separating the found frequencies into left and right in real time. To heighten the stereo effect, stereo enhancement (Stereo Enhance) may also be applied to the panning value of the upper layer corresponding to the treble region 611 so that a natural sound is produced.
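The description does not say how the frequency with the strongest transient is found. As one hedged illustration, the Python sketch below measures the power of each candidate frequency frame by frame with the Goertzel algorithm and picks the candidate whose power rises the most between two consecutive frames. The function names, the fixed list of candidate frequencies, and the frame length are all assumptions made for this example.

```python
import math
from typing import List, Optional, Sequence


def goertzel_power(frame: Sequence[float], sample_rate: float,
                   freq_hz: float) -> float:
    """Power of `frame` at `freq_hz`, computed with the Goertzel algorithm."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s1 = s2 = 0.0
    for x in frame:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def strongest_transient_frequency(samples: Sequence[float],
                                  sample_rate: float,
                                  candidates_hz: Sequence[float],
                                  frame_len: int = 80) -> Optional[float]:
    """Return the candidate frequency with the largest frame-to-frame
    rise in power, used here as a simple stand-in for transient strength."""
    frames: List[Sequence[float]] = [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    best_freq, best_rise = None, float("-inf")
    for freq in candidates_hz:
        powers = [goertzel_power(f, sample_rate, freq) for f in frames]
        rise = max((b - a for a, b in zip(powers, powers[1:])), default=0.0)
        if rise > best_rise:
            best_freq, best_rise = freq, rise
    return best_freq
```

A steady tone produces almost no rise in power from frame to frame, while a tone that starts abruptly produces a large one, so the abruptly starting tone is selected. In the structure described above, such a search would be run once on the treble region 611 and once on the bass region 612.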

Referring to FIG. 7, the treble frequency detected in the treble region 611 is first separated into left and right to obtain positions corresponding to a left channel L and a right channel R; as shown in FIG. 7, a speaker 711 may be placed at the position of the left channel and a speaker 712 at the position of the right channel. A speaker 713 is then placed at the position obtained by combining the left channel L and the right channel R according to 'L-(L-R)', and a speaker 714 is placed at the position obtained by combining them according to 'R-(L-R)', thereby forming the upper layer 710 of the 3D cubic.

Similarly, the bass frequency detected in the bass region 612 is separated into left and right to obtain positions corresponding to a left channel L and a right channel R; as shown in FIG. 7, a speaker 721 may be placed at the position of the left channel and a speaker 722 at the position of the right channel. A speaker 723 is then placed at the position obtained by combining the left channel L and the right channel R according to 'L-(L-R)', and a speaker 724 is placed at the position obtained by combining them according to 'R-(L-R)', thereby forming the lower layer 720 of the 3D cubic.

Accordingly, the binaural output 630 shown in FIG. 6 may correspond to the output generated by binaurally encoding the 8-channel audio corresponding to the eight speakers (711 to 714, 721 to 724) shown in FIG. 7, and may be output in a two-channel stereo format as shown in FIG. 6. The two channels of the binaural output 630 may correspond to a left channel and a right channel, respectively.

That is, as shown in FIG. 8, binaurally encoding the stereo signals of the treble region and the bass region, which were merely two channels 810, can generate a binaural output that carries a binaural effect equivalent to eight channels 820.

Although the embodiment of the present invention uses an 8-channel 3D cubic as the 3D layer, the 3D layer used for binaural encoding is not limited to this. That is, the up-mix apparatus according to an embodiment of the present invention may also be configured with another available 3D layer or a 3D layer developed in the future.

Here, the distance between the upper layer of the 3D cubic, composed of the four up channels, and the lower layer of the 3D cubic, composed of the four down channels, may be set based on an equalizer value of the stereo signal.

Here, the equalizer (EQ) value of the stereo signal adjusts the frequency ranges to control the sense of space of the sound, and the distance 910 between the upper layer and the lower layer of the 3D cubic shown in FIG. 9 may be set according to the equalizer value. That is, the image space can be adjusted vertically by changing the distance 910 between the upper layer and the lower layer, either by adjusting the frequency (Hz) of the treble range corresponding to the upper layer or by adjusting the frequency of the bass range corresponding to the lower layer.
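The geometry of the two layers can be sketched in Python as follows. The description does not give a formula that maps an equalizer value to the distance 910, so this sketch simply takes the layer distance as an input parameter. The function name `cubic_vertices`, the coordinate axes (x to the right, y to the front, z upward), and the default dimensions are assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def cubic_vertices(width: float = 2.0, depth: float = 2.0,
                   layer_distance: float = 2.0
                   ) -> Tuple[List[Point], List[Point]]:
    """Return (upper_layer, lower_layer), each holding four speaker
    positions, centered on the origin.

    `layer_distance` plays the role of the distance between the upper
    and lower layers; how it is derived from an equalizer value is not
    modeled here.
    """
    hx, hy, hz = width / 2.0, depth / 2.0, layer_distance / 2.0
    # Order: front-left, front-right, rear-left, rear-right.
    corners = [(-hx, hy), (hx, hy), (-hx, -hy), (hx, -hy)]
    upper = [(x, y, hz) for x, y in corners]    # four up channels
    lower = [(x, y, -hz) for x, y in corners]   # four down channels
    return upper, lower
```

Increasing `layer_distance` moves the upper and lower layers apart and so stretches the image space vertically, which is the effect the paragraph above attributes to the equalizer adjustment.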

Here, the 3D vector may be generated with respect to a reference listening point located inside the cubic.

For example, referring to FIG. 10, a reference listening point 1010, which virtually represents the position of the user or listener, may be located at the inner center of a 3D cubic 1000 whose vertices are eight dynamic speakers. Assuming that a binaural point 1020 is located on the upper layer of the 3D cubic 1000 as shown in FIG. 10, the 3D vector 1030 corresponding to the binaural output may be generated in the direction from the reference listening point 1010 toward the binaural point 1020.
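A minimal Python sketch of such a vector is given below. It normalizes the difference between the two points so that only the direction remains. The function name and the coordinate convention (z pointing upward) are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple


def direction_vector(listening_point: Sequence[float],
                     binaural_point: Sequence[float]
                     ) -> Tuple[float, ...]:
    """Unit vector pointing from the reference listening point toward
    the binaural point."""
    diff = [b - a for a, b in zip(listening_point, binaural_point)]
    norm = math.sqrt(sum(c * c for c in diff))
    if norm == 0.0:
        raise ValueError("binaural point coincides with the listening point")
    return tuple(c / norm for c in diff)
```

With the listening point at the origin, a binaural point on the upper layer yields a vector with a positive z component, and a point on the lower layer yields a negative one.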

Here, when the binaural point 1020 is located higher than the reference listening point 1010 in the 3D cubic 1000 as shown in FIG. 10, the output sound may be localized above the listener. Conversely, when the binaural point 1020 is located lower than the reference listening point 1010 in the 3D cubic 1000, the output sound may be localized below the listener.

In this way, the present invention makes it possible to produce a wider variety of audio by changing the position of the binaural point 1020 relative to the reference listening point 1010 in the 3D cubic 1000.

Although not shown in FIG. 10, the reference listening point 1010 is located inside the 3D cubic 1000 and may also be located on the wide stereo layer corresponding to the mid-range region of the stereo signal. That is, the wide stereo layer, which corresponds to a two-channel stereo layer, may be located between the upper layer and the lower layer of the 3D cubic 1000.

이 때, 3차원 벡터의 방향 정보를 헤드 트래킹 정보에 상응하게 회전된 3차원 큐빅에 적용하여 바이노럴 출력을 생성할 수 있다. 즉, 바이노럴 포인트는 기준 청취점에 해당하는 청취자의 머리를 기준으로 설정된 위치이므로 청취자의 머리 위치나 각도가 변경되는 경우, 3차원 큐빅 상에서 바이노럴 포인트의 위치도 변경될 수 있다.At this time, the binaural output may be generated by applying the direction information of the 3D vector to the rotated 3D cubic corresponding to the head tracking information. That is, since the binaural point is a position set based on the listener's head corresponding to the reference listening point, when the listener's head position or angle is changed, the position of the binaural point on the 3D cubic can also be changed.

예를 들어, 도 10에 도시된 3차원 큐빅(1000)을 헤드 트래킹 정보에 상응하게 도 11에 도시된 것처럼 회전시켰다고 가정할 수 있다. 이 때, 도 10에 도시된 3차원 벡터(1030)의 방향 정보를 그대로 도 11에 도시된 3차원 큐빅에 적용함으로써 회전에 따라 변경된 바이노럴 포인트의 위치를 검출할 수 있다. For example, it can be assumed that the 3D cubic 1000 shown in FIG. 10 is rotated as shown in FIG. 11 corresponding to the head tracking information. At this time, by applying the direction information of the 3D vector 1030 shown in FIG. 10 to the 3D cubic shown in FIG. 11 as it is, the position of the binaural point changed according to the rotation can be detected.

이 때, 헤드 트래킹 정보는 사용자나 청취자의 머리 움직임을 트래킹한 데이터에 상응하는 것으로, 별도의 헤드 트래킹 모듈에 기반한 트래킹 입력 및 사용자 인터페이스에 기반한 사용자 입력 중 적어도 하나에 상응하게 획득될 수 있다.At this time, the head tracking information corresponds to data tracking the head movement of a user or a listener, and may be obtained corresponding to at least one of a tracking input based on a separate head tracking module and a user input based on a user interface.

예를 들어, 사용자나 청취자가 헤드 트래킹 모듈을 직접 착용한 상태에서 머리를 움직이면, 헤드 트래킹 모듈에서 사용자의 머리가 움직인 거리나 각도 등을 측정하여 헤드 트래킹 정보로 생성하고 전송할 수 있다.For example, if the user or the listener moves the head while wearing the head tracking module directly, the head tracking module may measure and measure the distance or angle of the user's head movement and generate and transmit the head tracking information.

다른 예를 들어, 헤드 트래킹 정보는 사용자나 청취자가 사용자 인터페이스를 통해 인위적으로 부여할 수도 있다. 즉, 사용자나 청취자가 인위적으로 공간 이미지를 회전시키기 위해, 헤드 트래킹 모듈에 의한 헤드 트래킹 정보의 수신 여부와 상관없이 사용자 인터페이스를 기반으로 헤드 트래킹 정보를 입력할 수도 있다. 이 때, 사용자나 청취자는 업 믹스 스테레오 출력을 생성하는 믹싱과정 또는 입력되는 정보에 따라 변화하는 업 믹스 스테레오 출력을 청취하면서 헤드 트래킹 정보를 입력 및 수정할 수도 있다.For another example, the head tracking information may be artificially provided by a user or a listener through a user interface. That is, in order to artificially rotate a spatial image by a user or a listener, head tracking information may be input based on a user interface regardless of whether head tracking information is received by the head tracking module. At this time, the user or the listener may input and modify the head tracking information while listening to the mixing process generating the upmix stereo output or the upmix stereo output changing according to the input information.

Here, the 3D cube may be rotated according to at least one rotation parameter among pan, tilt, and roll.

For example, as shown in FIG. 12, when the listener rotates their head in at least one of pan, tilt, and roll, the rotation value can be obtained as a rotation parameter and applied to the 3D cube.
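
The rotation described above can be sketched in code. The following Python example is only an illustration under stated assumptions, not the patented implementation: the axis convention (x to the listener's right, y to the front, z up), the order in which pan, tilt, and roll are composed, and the names `rotation_matrix` and `rotated_binaural_point` are hypothetical and do not come from the specification. It rotates the direction of the 3D vector from the reference listening point together with the 3D cube and returns the resulting binaural point position.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(m, v):
    """Apply a 3x3 matrix to a 3-component vector."""
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


def rotation_matrix(pan=0.0, tilt=0.0, roll=0.0):
    """Build a rotation from pan, tilt, and roll angles in radians.

    Assumed convention (not from the specification):
    pan rotates about z (up), tilt about x (right), roll about y (front),
    composed as pan * tilt * roll.
    """
    cp, sp = math.cos(pan), math.sin(pan)
    ct, st = math.cos(tilt), math.sin(tilt)
    cr, sr = math.cos(roll), math.sin(roll)
    rz = [[cp, -sp, 0.0], [sp, cp, 0.0], [0.0, 0.0, 1.0]]
    rx = [[1.0, 0.0, 0.0], [0.0, ct, -st], [0.0, st, ct]]
    ry = [[cr, 0.0, sr], [0.0, 1.0, 0.0], [-sr, 0.0, cr]]
    return matmul(rz, matmul(rx, ry))


def rotated_binaural_point(listening_point, direction, pan=0.0, tilt=0.0, roll=0.0):
    """Rotate the direction vector with the cube and offset it from the listening point."""
    rotated = apply(rotation_matrix(pan, tilt, roll), direction)
    return tuple(p + d for p, d in zip(listening_point, rotated))
```

For instance, under these assumptions a pan of 90 degrees (`math.pi / 2`) moves a binaural point that was directly to the listener's right, direction `(1, 0, 0)`, to directly in front, approximately `(0, 1, 0)`.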

In this way, the effect produced by rotating the 3D cube or moving it up, down, left, or right according to the head tracking information is later mixed with the wide stereo output and the stereo signal to generate the upmix stereo output. An immersive effect based on head tracking can therefore be produced more efficiently than with the conventional approach of rotating or moving the stereo layer.

Here, the binaural output may include harmonics based on the fundamental frequency of the upper layer.

Here, harmonics are overtones whose frequencies are integer multiples of the reference frequency; they may be included in the binaural output and used in mixing to make the result sound musically natural.
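
As a small worked illustration of the integer-multiple relationship described above, the following Python sketch lists the overtones of a reference frequency. The function name `overtone_frequencies`, the 20 kHz `max_hz` cutoff, and the choice to start at the second multiple are assumptions made for illustration; the specification does not state how many harmonics are used or how they are generated.

```python
def overtone_frequencies(reference_hz, count, max_hz=20000.0):
    """Return up to `count` overtones at 2x, 3x, ... the reference frequency.

    Overtones above `max_hz` (an assumed audibility limit) are dropped.
    """
    if reference_hz <= 0 or count < 0:
        raise ValueError("reference_hz must be positive and count non-negative")
    overtones = []
    for multiple in range(2, count + 2):
        freq = reference_hz * multiple
        if freq > max_hz:
            break
        overtones.append(freq)
    return overtones
```

With a 440 Hz reference, the first three overtones are 880 Hz, 1320 Hz, and 1760 Hz.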

In addition, the processor 220 generates a wide stereo output by performing stereo wide processing based on the mid-range region separated from the stereo signal.

Here, the wide stereo output may be generated based on a wide stereo layer corresponding to the mid-range region.

For example, referring to FIG. 13, stereo wide processing corresponding to the wide stereo layer may be performed based on the mid-range region 1310 of the stereo signal input to the stereo widener 1320. The wide stereo output 1330 may be output in a two-channel stereo format, as shown in FIG. 13.

Here, the wide stereo layer is an element that forms the stereo image, and may correspond to a stereo layer whose image space is expanded according to a reverb value and a delay value.

For example, referring to FIG. 14, the stereo image area 1400 may be expanded based on reverb 1410 and delay or pan 1420.

Here, the reverb 1410 corresponds to reverberation that reaches the ear after sound from the source has reflected two or more times off surfaces such as walls, the floor, or the ceiling, and serves as a value for adjusting the size of the space corresponding to the stereo image area 1400 in the front/back direction. The reverb 1410 value may be adjusted based on a pre-delay value, which is the time between hearing the original sound and hearing the reverb 1410. Besides the pre-delay, the reverb 1410 value may also be adjusted through an early reflection parameter, which corresponds to the initial reflected sound.
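
The role of the pre-delay can be illustrated with a minimal Python sketch. It assumes the reverberant ("wet") signal has already been produced by some reverb and simply delays it by the pre-delay before adding it to the original ("dry") signal. The function name `mix_with_pre_delay`, the `wet_gain` parameter, and the sample-based delay are hypothetical choices for illustration, not details from the specification.

```python
def mix_with_pre_delay(dry, wet, pre_delay_samples, wet_gain=0.5):
    """Add a reverberant signal to the dry signal after a pre-delay.

    dry, wet: lists of mono samples (floats).
    pre_delay_samples: gap between the original sound and the onset of reverb.
    wet_gain: assumed scaling of the reverberant signal.
    """
    if pre_delay_samples < 0:
        raise ValueError("pre_delay_samples must be non-negative")
    # Extend the output so the delayed reverb tail is not cut off.
    tail = pre_delay_samples + max(0, len(wet) - len(dry))
    out = list(dry) + [0.0] * tail
    for i, sample in enumerate(wet):
        out[i + pre_delay_samples] += wet_gain * sample
    return out
```

A longer pre-delay pushes the reverb further behind the original sound, which is one way the perceived depth of the space could be varied.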

In the delay or pan 1420, the delay corresponds to the delay values for the left and right channels; setting these values differently from each other adjusts the size of the space corresponding to the stereo image area 1400 in the left/right direction. Pan is a value that determines how far the sound spreads horizontally, so in the present invention the left/right size of the space corresponding to the stereo image area 1400 can be adjusted through the delay or pan 1420.
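
Setting different delays on the left and right channels can be sketched as follows. This Python example is a simplified illustration, not the specification's stereo widener: the function names `delay_channel` and `apply_channel_delays`, the millisecond units, and the default 48 kHz sample rate are assumptions.

```python
def delay_channel(samples, n):
    """Delay one channel by n samples, padding with silence and keeping the length."""
    if n < 0:
        raise ValueError("delay must be non-negative")
    if n == 0:
        return list(samples)
    return ([0.0] * n + list(samples))[:len(samples)]


def apply_channel_delays(left, right, left_delay_ms, right_delay_ms, sample_rate=48000):
    """Apply independent delays (in milliseconds) to the left and right channels."""
    left_n = round(left_delay_ms * sample_rate / 1000)
    right_n = round(right_delay_ms * sample_rate / 1000)
    return delay_channel(left, left_n), delay_channel(right, right_n)
```

Calling `apply_channel_delays(left, right, 0.0, 2.0)` leaves the left channel untouched and delays the right channel by 2 ms, so the two channels no longer arrive at the same time.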

Referring to FIG. 15, the wide stereo layer 1530 according to an embodiment of the present invention may be positioned in combination with the 3D cube composed of the surround-type upper layer 1510 and lower layer 1520. The structure shown in FIG. 15 is only one embodiment, and the way the layers are combined is not limited to this structure.

In addition, the processor 220 generates an upmix stereo output by summing the stereo signal, the binaural output, and the wide stereo output.

That is, an upmix stereo output containing an immersive effect can be generated by mixing the immersive element provided by the binaural output and the expanded stereo effect provided by the wide stereo output together with the stereo signal used as the source.
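
The summing step can be sketched in Python as a sample-by-sample addition of three two-channel signals. This is a minimal illustration under assumptions: the function name `mix_upmix_output`, the representation of audio as lists of `(left, right)` pairs, and the optional per-signal `gains` are hypothetical, and the specification does not say whether any weighting or level control is applied when the signals are summed.

```python
def mix_upmix_output(stereo, binaural, wide, gains=(1.0, 1.0, 1.0)):
    """Sum the source stereo signal, the binaural output, and the wide stereo output.

    Each input is a list of (left, right) sample pairs of equal length.
    gains: assumed weights for (stereo, binaural, wide); all 1.0 gives a plain sum.
    """
    if not (len(stereo) == len(binaural) == len(wide)):
        raise ValueError("all three signals must have the same length")
    g_s, g_b, g_w = gains
    return [
        (g_s * s[0] + g_b * b[0] + g_w * w[0],
         g_s * s[1] + g_b * b[1] + g_w * w[1])
        for s, b, w in zip(stereo, binaural, wide)
    ]
```

The result stays a two-channel signal, so it can be played back on ordinary stereo headphones or speakers.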

In addition, because the present invention can support a natural upmix function based on the processor 220 described above, it can improve compatibility between content supporting various types of sound.

The memory 230 stores the stereo signal, the binaural output, and the wide stereo output.

The memory 230 also stores various information generated in the process of generating the upmix stereo output according to an embodiment of the present invention, as described above.

Depending on the embodiment, the memory 230 may be configured independently of the upmix apparatus to support the upmix stereo audio generation function. In this case, the memory 230 may operate as separate mass storage and may include a control function for performing operations.

Meanwhile, the upmix apparatus may include a memory and store information within the apparatus. In one implementation, the memory is a computer-readable medium. In one implementation the memory may be a volatile memory unit, and in another implementation it may be a non-volatile memory unit. In one implementation, the storage device is a computer-readable medium. In various implementations, the storage device may include, for example, a hard disk device, an optical disk device, or some other mass storage device.

With such an upmix apparatus, an existing stereo file can be upmixed to immersive audio without performing a separate immersive mix, and the time and cost required to mix a stereo file into an immersive file can be reduced.

FIG. 16 is an operational flowchart illustrating a method of upmixing stereo audio to binaural audio according to an embodiment of the present invention.

Referring to FIG. 16, the method of upmixing stereo audio to binaural audio according to an embodiment of the present invention generates a binaural output by performing binaural encoding based on the treble region and the bass region separated from the stereo signal (S1610).

In addition, the bass frequency detected in the bass region 612 is separated into left and right to obtain positions corresponding to the left channel L and the right channel R; as shown in FIG. 7, the speaker 721 may be placed at the position of the left channel and the speaker 722 at the position of the right channel. Then, the lower layer 720 of the 3D cube can be constructed by placing the speaker 723 at a position obtained by combining the left channel L and the right channel R as 'L-(L-R)' and the speaker 724 at a position obtained by combining them as 'R-(L-R)'.

Here, harmonics are overtones whose frequencies are integer multiples of the reference frequency; they may be included in the binaural output and used in mixing to make the result sound musically natural.

In addition, the method of upmixing stereo audio to binaural audio according to an embodiment of the present invention generates a wide stereo output by performing stereo wide processing based on the mid-range region separated from the stereo signal (S1620).

In addition, the method of upmixing stereo audio to binaural audio according to an embodiment of the present invention generates an upmix stereo output by summing the stereo signal, the binaural output, and the wide stereo output (S1630).

In addition, because the present invention can support a natural upmix function based on the functions described above, it can improve compatibility between content supporting various types of sound.

In addition, although not shown in FIG. 16, the method of upmixing stereo audio to binaural audio according to an embodiment of the present invention transmits and receives the information necessary for generating upmix stereo audio over a communication network. In particular, it may receive the stereo signal or content serving as the source for upmix stereo audio generation and the head tracking information entered through the head tracking module or the user interface for binaural encoding, and may provide upmix stereo audio corresponding to the upmix stereo output.

In addition, the method of upmixing stereo audio to binaural audio according to an embodiment of the present invention stores various information generated in the process of generating the upmix stereo output, as described above.

Embodiments of the present invention may be implemented as a computer-implemented method or as a non-transitory computer-readable medium on which computer-executable instructions are recorded. When executed by a processor, the computer-readable instructions may perform a method according to at least one aspect of the present invention.

As described above, the method and apparatus for upmixing stereo audio to binaural audio according to the present invention are not limited to the configurations and methods of the embodiments described above; all or part of the embodiments may be selectively combined so that various modifications can be made.

110: stereo signal
121, 300: treble pass filter
122, 400: mid pass filter
123, 500: bass pass filter
130, 620: binaural encoder
140, 1320: stereo widener
150: binaural mixer
210: communication unit
220: processor
230: memory
611: treble region signal
612: bass region signal
630: binaural output
710, 1510: upper layer
711~714, 721~724: speakers
720, 1520: lower layer
810: 2 channels
820: 8 channels
910: distance
1000: 3D cube
1010: reference listening point
1020: binaural point
1030: 3D vector
1310: mid-range region signal
1330: wide stereo output
1400: stereo image area
1410: reverb
1420: delay or pan
1530: wide stereo layer

Claims

A method of upmixing stereo audio to binaural audio, comprising:
generating a binaural output by performing binaural encoding based on a treble region and a bass region separated from a stereo signal;
generating a wide stereo output by performing stereo wide processing based on a mid-range region separated from the stereo signal; and
generating an upmix stereo output, in which the stereo signal is made multi-channel, by summing the stereo signal, the binaural output, and the wide stereo output,
wherein the binaural output is generated based on a binaural point located in an 8-channel 3D cube composed of four up channels and four down channels,
wherein the four up channels and the four down channels are generated based on a left channel and a right channel obtained by separating into left and right the frequencies of strong transients found in the treble region and the bass region, respectively, and
wherein the upmix stereo output corresponds to an immersive content type capable of expressing the directionality of the treble region based on the binaural point.

The method according to claim 1, wherein the binaural output is generated according to a 3D vector for the binaural point, the positions of the four up channels being set based on the treble region and the positions of the four down channels being set based on the bass region.

The method according to claim 2, wherein the positions of the four up channels are set using any one treble frequency detected based on the magnitude of a transient in the treble region, and the positions of the four down channels are set using any one bass frequency detected based on the magnitude of a transient in the bass region.

The method according to claim 3, wherein the distance between the upper layer of the 3D cube, composed of the four up channels, and the lower layer of the 3D cube, composed of the four down channels, is set based on an equalizer value of the stereo signal.

The method according to claim 1, wherein the wide stereo output is generated based on a wide stereo layer corresponding to the mid-range region, and the wide stereo layer corresponds to a stereo layer whose image space is expanded according to a reverb value and a delay value.

The method according to claim 2, wherein the 3D vector is generated with reference to a reference listening point located inside the 3D cube.

The method according to claim 6, wherein generating the binaural output comprises generating the binaural output by applying direction information of the 3D vector to the 3D cube rotated according to head tracking information.

The method according to claim 7, wherein the 3D cube is rotated according to at least one rotation parameter among pan, tilt, and roll.

The method according to claim 4, wherein the binaural output includes harmonics based on the fundamental frequency of the upper layer.

The method according to claim 1, further comprising inputting the stereo signal into a treble pass filter, a mid pass filter, and a bass pass filter, respectively, to separate the stereo signal into the treble region, the mid-range region, and the bass region.

An upmix apparatus comprising:
a processor that generates a binaural output by performing binaural encoding based on a treble region and a bass region separated from a stereo signal, generates a wide stereo output by performing stereo wide processing based on a mid-range region separated from the stereo signal, and generates a multi-channel upmix stereo output by summing the stereo signal, the binaural output, and the wide stereo output; and
a memory that stores the stereo signal, the binaural output, and the wide stereo output,
wherein the binaural output is generated based on a binaural point located in an 8-channel 3D cube composed of four up channels and four down channels,
wherein the four up channels and the four down channels are generated based on a left channel and a right channel obtained by separating into left and right the frequencies of strong transients found in the treble region and the bass region, respectively, and
wherein the upmix stereo output corresponds to an immersive content type capable of expressing the directionality of the treble region based on the binaural point.

The upmix apparatus according to claim 11, wherein the binaural output is generated according to a 3D vector for the binaural point, the positions of the four up channels being set based on the treble region and the positions of the four down channels being set based on the bass region.

The upmix apparatus according to claim 12, wherein the positions of the four up channels are set using any one treble frequency detected based on the magnitude of a transient in the treble region, and the positions of the four down channels are set using any one bass frequency detected based on the magnitude of a transient in the bass region.

The upmix apparatus according to claim 13, wherein the distance between the upper layer of the 3D cube, composed of the four up channels, and the lower layer of the 3D cube, composed of the four down channels, is set based on an equalizer value of the stereo signal.

The upmix apparatus according to claim 11, wherein the wide stereo output is generated based on a wide stereo layer corresponding to the mid-range region, and the wide stereo layer corresponds to a stereo layer whose image space is expanded according to a reverb value and a delay value.

The upmix apparatus according to claim 12, wherein the 3D vector is generated with reference to a reference listening point located inside the 3D cube.

The upmix apparatus according to claim 16, wherein the processor generates the binaural output by applying direction information of the 3D vector to the 3D cube rotated according to head tracking information.

The upmix apparatus according to claim 17, wherein the 3D cube is rotated according to at least one rotation parameter among pan, tilt, and roll.

The upmix apparatus according to claim 14, wherein the binaural output includes harmonics based on the fundamental frequency of the upper layer.

The upmix apparatus according to claim 11, wherein the processor inputs the stereo signal into a treble pass filter, a mid pass filter, and a bass pass filter, respectively, to separate the stereo signal into the treble region, the mid-range region, and the bass region.