KR100932791B1

KR100932791B1 - Method of generating head-related transfer function for sound image externalization, and apparatus and method for processing 3D audio signal using same

Info

Publication number: KR100932791B1
Application number: KR1020080040073A
Authority: KR
Inventors: 장인선; 이용주; 장대영; 이태진; 서정일; 강경옥; 홍진우; 김진웅; 안치득
Original assignee: 한국전자통신연구원
Priority date: 2008-02-21
Filing date: 2008-04-29
Publication date: 2009-12-21
Also published as: KR20090090975A

Abstract

The present invention relates to a method of generating a head-related transfer function (HRTF) for sound image externalization, and to a three-dimensional (3D) audio signal processing apparatus and method using the same. By generating a 3D audio signal with an HRTF modeled from multichannel room impulse responses measured with a sphere microphone, the invention aims to remove sound image internalization and thereby increase the sense of presence (realism) of the 3D audio signal.

To this end, the present invention provides a 3D audio signal processing apparatus using multichannel impulse responses, comprising: audio decoding means for decoding audio data to restore the original audio signal; and 3D audio generating means for generating a 3D audio signal from the restored audio signal using a head-related transfer function (HRTF) modeled from multichannel room impulse responses measured with a sphere microphone.

3D audio, stereophonic sound, high realism, head-related transfer function, HRTF, multichannel impulse response, sphere microphone, sound image externalization

Description

Method of generating head-related transfer function for sound image externalization, and apparatus and method for processing 3D audio signal using same {METHOD FOR CREATING HRTF FOR SOUND EXTERNALIZATION, APPARATUS AND METHOD FOR PROCESSING 3D AUDIO SIGNAL}

The present invention relates to 3D audio signal processing for highly realistic multimedia playback, and more particularly to a method of generating a head-related transfer function (HRTF) for sound image externalization, and a 3D audio signal processing apparatus and method using the same, in which a 3D audio signal is generated with an HRTF modeled from multichannel room impulse responses measured with a sphere microphone, thereby removing sound image internalization and increasing the sense of presence (realism) of the 3D audio signal.

The present invention is derived from research conducted as part of the IT Growth Engine Technology Development Project of the Ministry of Information and Communication and the Institute for Information Technology Advancement [Project No.: 2007-S-004-01, Project title: Development of glasses-free personal 3D broadcasting technology].

3D audio technology gives the listener the impression of being at the place where the sound was captured. It comprises 3D audio reproduction technology and 3D audio acquisition technology, both of which have been regarded as important.

3D audio acquisition techniques include acquisition with a dummy head, acquisition with multichannel microphones, and acquisition with multichannel microphones mounted on a sphere (a sphere microphone).

These acquisition techniques have the advantage of giving the listener a rich sense of space and presence. In practice, however, it is quite difficult to capture audio content with each of these methods in order to obtain a 3D audio effect, and they are also of limited use as a means of adding a 3D effect to existing stereo content.

Meanwhile, in acquisition methods using multichannel microphones, when more than two microphones are used the captured signal has three or more channels. To reproduce it through the headphones or earphones of a portable terminal, it must be converted to a stereo signal (stereo downmix); that is, post-processing is required.

FIG. 1 illustrates a conventional method of acquiring stereo audio with a sphere microphone: an audio signal is captured with a five-channel sphere microphone (11), and the five-channel audio signal (u₁, u₂, ..., u₅) is converted into a stereo audio output signal by a post-processing module (12). Here, "1" to "5" indicate the positions of the microphones on the sphere.

Because this conventional acquisition technique requires a complex post-processing (stereo downmix) step for 3D audio reproduction on a portable terminal, its practical applicability is limited.

Accordingly, emphasis is shifting to "reproduction technology" that adds a 3D (stereophonic) effect to existing content.

Recently, the use of multimedia data on portable terminals such as MP3 players, portable multimedia players (PMPs), mobile phones, and DMB players has been increasing rapidly.

On such portable terminals, audio is usually heard through headphones or earphones. In that case, inside-the-head localization (IHL) occurs, in which the sound image forms inside the head.

This IHL phenomenon reduces the sense of space and depth, degrading the realism of the sound, and can also cause the listener to tire easily. Various techniques have therefore emerged to overcome it so that the listener can perceive a 3D (stereophonic) effect.

In other words, a technique that resolves the IHL problem so that the sound image forms outside the head (OHL: out-of-the-head localization) when listening through headphones or earphones is called sound externalization. Approaches studied in this area use the spatial acoustic characteristics produced by reflections and reverberation in a room, the acoustic transfer characteristics of the human body such as the individual's head and pinnae, and the changes in acoustic transfer characteristics caused by head movement.

Among these, externalization based on spatial acoustic characteristics is supported by studies showing that reflections and reverberation strongly affect perceived presence. Based on such findings, and because reflections and reverberation, unlike HRTFs, do not require personalization and are computationally inexpensive, most externalization technologies now in commercial use rely on reflections and reverberation.

Conventional externalization techniques based on spatial acoustic characteristics apply reflections and reverberation in various ways, for example by adding reflections using several HRTFs together with gains and delays, or by assuming reflections from arbitrary angles and adding artificial reverberation.
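As a rough illustration of the gain/delay approach mentioned above, the following sketch adds a single delayed, attenuated copy of a signal to itself. It is a minimal example only: the function name `add_reflection` and its parameters are hypothetical, and the HRTF filtering that the conventional methods also apply to each reflection is left out.

```python
def add_reflection(dry, delay_samples, gain):
    """Return the dry signal plus one delayed, attenuated copy (a single reflection)."""
    out = list(dry) + [0.0] * delay_samples
    for n, x in enumerate(dry):
        out[n + delay_samples] += gain * x
    return out

# A unit impulse with one reflection 3 samples later at half amplitude.
wet = add_reflection([1.0, 0.0, 0.0], delay_samples=3, gain=0.5)
print(wet)  # [1.0, 0.0, 0.0, 0.5, 0.0, 0.0]
```

The delay and gain here are arbitrary design parameters rather than measured quantities.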

However, because these conventional reflection/reverberation methods increase the sense of space "artificially", they reduce the sense of presence, degrade the sound field and sound quality, and can feel unnatural to the listener.

In the conventional 3D audio technologies described above, 3D audio acquisition is of limited practical use, and 3D audio reproduction degrades the sense of presence and the sound field and sound quality because it increases the sense of space artificially. The present invention addresses these problems.

Accordingly, an object of the present invention is to provide a method of generating a head-related transfer function (HRTF) for sound image externalization, and a 3D audio signal processing apparatus and method using the same, which generate a 3D audio signal with an HRTF modeled from multichannel room impulse responses measured with a sphere microphone, thereby removing sound image internalization and increasing the sense of presence (realism) of the 3D audio signal.

The objects of the present invention are not limited to those mentioned above. Other objects and advantages not mentioned can be understood from the following description and will become clearer from the embodiments. It will also be readily apparent that the objects and advantages can be realized by the means recited in the claims and combinations thereof.

To achieve the above object, the present invention is characterized by removing sound image internalization using a head-related transfer function (HRTF) modeled from multichannel room impulse responses measured with a sphere microphone.

More specifically, the present invention provides a method of generating a head-related transfer function (HRTF) for sound image externalization, comprising: obtaining a conversion function capable of converting a multichannel audio signal into a stereo signal; obtaining multichannel room impulse responses using a sphere microphone; and generating the HRTF using the conversion function and the multichannel room impulse responses.

The present invention also provides a 3D audio signal processing apparatus using multichannel impulse responses, comprising: audio decoding means for decoding audio data to restore the original audio signal; and 3D audio generating means for generating a 3D audio signal from the restored audio signal using a head-related transfer function (HRTF) modeled from multichannel room impulse responses measured with a sphere microphone.

The present invention further provides a 3D audio signal processing method using multichannel impulse responses, comprising: decoding audio data to restore the original audio signal; and generating a 3D audio signal from the restored audio signal using a head-related transfer function (HRTF) modeled from multichannel room impulse responses measured with a sphere microphone.

Unlike existing techniques that use an HRTF recorded in an anechoic chamber with a dummy head, the invention uses multichannel room impulse responses recorded in a particular room with a five-channel sphere microphone, and can thereby provide good externalization performance and natural reverberation.

Accordingly, rather than providing an artificial 3D effect, the invention performs 3D audio signal processing with an HRTF (enhanced HRTF) modeled from room impulse responses measured in an actual space with a sphere microphone, which can markedly increase the sense of presence of the 3D audio signal.

In addition, the invention can be incorporated into portable multimedia devices such as MP3 players, PMPs, DMB players, and mobile phones, or into multimedia playback programs running on a personal computer (PC), so that each user is provided with more realistic 3D audio when playing back multimedia data through headphones or earphones.

The above objects, features, and advantages will become clearer from the following detailed description taken with the accompanying drawings, so that a person of ordinary skill in the art to which the invention pertains can readily practice its technical idea. Where a detailed description of known technology related to the invention would unnecessarily obscure its gist, that description is omitted. A preferred embodiment of the invention is described in detail below with reference to the accompanying drawings.

FIGS. 2A and 2B illustrate an embodiment of the method of generating a head-related transfer function (HRTF) for sound image externalization according to the present invention.

First, the sphere microphone (21) used in the present invention is described. The sphere microphone (21) simplifies the shape of the human head to a sphere. Compared with the dummy head conventionally used to measure HRTFs, it imposes fewer constraints on size and shape as well as on the number and positions of the microphones. Depending on the microphone positions, it can also resolve several problems of the conventional dummy head.

The invention obtains multichannel room impulse responses in a particular room using a five-channel sphere microphone (that is, a sphere microphone with five microphones arranged on the horizontal plane of the sphere), models an HRTF from them, and performs 3D audio signal processing using the modeled HRTF (enhanced HRTF).

In the sphere microphone used in the present invention, microphone 1 is located at the front center of the sphere to capture the frontal audio signal. The side microphones are placed two on each of the left and right sides, 15° in front of and 15° behind the side position, to compensate for the left/right head movements a person makes when judging direction. That is, the front microphone is at position 1, the left microphones are at positions 2 and 4, and the right microphones are at positions 3 and 5.

With the microphone arrangement shown in FIG. 2, the center microphone emphasizes the frontal sound image, which resolves the front-back confusion of binaural processing, a representative existing 3D audio processing technique, and the side microphones compensate for head movement.

Therefore, performing 3D audio signal processing with the enhanced HRTF modeled from the room impulse responses of the sphere microphone, as in the present invention, makes it possible to implement a highly realistic multimedia playback device.

First, the five-channel room impulse responses h₁, h₂, ..., h₅ are measured using the five-channel sphere microphone (21).

That is, using the five-channel sphere microphone (21) with microphones at 0°, ±75°, and ±105°, the room impulse responses h₁, h₂, ..., h₅ are measured from sound sources located at ±30° in front. The virtual source positions of ±30° in front are assumed so that the sound image of existing stereo content is externalized naturally.
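The microphone and source angles given above can be written out as a small lookup table. The sketch assumes that negative azimuth means the listener's left (as for the -30° left source); the text fixes only which side each microphone index is on, so which of the two microphones on a side sits in front of the ear axis is an assumption, and all names are hypothetical.

```python
# Assumed sign convention: negative azimuth = listener's left, as for the -30° source.
SIDE = 90     # azimuth of the ear axis, in degrees
OFFSET = 15   # each side microphone sits 15° in front of or behind that axis

# Hypothetical index-to-angle mapping; front/back order within a side is assumed.
MIC_AZIMUTH_DEG = {
    1: 0,                  # front center
    2: -(SIDE - OFFSET),   # left, in front of the ear axis
    4: -(SIDE + OFFSET),   # left, behind the ear axis
    3: +(SIDE - OFFSET),   # right, in front of the ear axis
    5: +(SIDE + OFFSET),   # right, behind the ear axis
}
SOURCE_AZIMUTH_DEG = {"left": -30, "right": +30}

print(sorted(MIC_AZIMUTH_DEG.values()))  # [-105, -75, 0, 75, 105]
```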

More specifically, the room impulse responses [formula not reproduced in source] are measured from the sound source located at -30° (left source), and the room impulse responses [formula not reproduced in source] are measured from the sound source located at +30° (right source).

Next, the HRTF modeling unit (HRTF generation unit) (22) uses the room impulse responses measured as above to model the channel-to-ear filter function [formula not reproduced in source] for externalization, as in [Equation 1] below.

[Equation 1: not reproduced in source]

Unlike a conventional HRTF recorded in an anechoic chamber with a standard dummy head, this function provides an improved (externalization) effect and may therefore be called an "enhanced HRTF".

Here, [formula not reproduced in source] denotes the channel-to-ear filter function, and SCF (Sphere Conversion Filter) denotes the filter that converts the multichannel output of the sphere microphone into a stereo signal; it is computed as [formula not reproduced in source] (see FIG. 2B). SIR denotes the sphere impulse response. It is measured by installing a microphone at the 0° position on the horizontal plane of the sphere and generating an impulse while changing the direction of the loudspeaker in 5° steps in the horizontal plane. Of the measured impulse responses (SIR), the inverse of the 0° response, for which the microphone and the loudspeaker are aligned, is computed and then convolved (conv) with each impulse response to obtain the SCF. The symbol * denotes the convolution operation.
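The SCF computation described above (the inverse of the 0° response convolved with each measured response) can be sketched as follows. The source does not say how the inverse is obtained, so this sketch makes a simplifying assumption: the 0° response has a nonzero first sample and is minimum phase, which lets a truncated causal inverse be built by a simple recursion. Practical systems more commonly use a regularized frequency-domain inversion. All function names are hypothetical.

```python
def convolve(a, b):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def causal_inverse(h, length):
    """First `length` taps of g such that convolve(h, g) approximates a unit impulse.

    Assumes h[0] != 0 and that h is minimum phase, so the recursion stays bounded.
    """
    g = [1.0 / h[0]]
    for n in range(1, length):
        acc = sum(h[k] * g[n - k] for k in range(1, min(n, len(h) - 1) + 1))
        g.append(-acc / h[0])
    return g

def sphere_conversion_filter(sir_theta, sir_0, inv_length=64):
    """SCF for one direction: the response at that direction convolved with the
    (truncated) inverse of the 0° response."""
    return convolve(sir_theta, causal_inverse(sir_0, inv_length))

# Toy data: the response at some angle is the 0° response delayed by one
# sample and scaled by 0.8, so the SCF should be close to [0, 0.8, 0, ...].
sir_0 = [1.0, 0.5]
sir_theta = [0.0, 0.8, 0.4]
scf = sphere_conversion_filter(sir_theta, sir_0)
print([round(v, 6) for v in scf[:4]])
```

In this construction the SCF for the 0° direction itself reduces to (approximately) a unit impulse, which is what normalizing by the 0° response implies.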

"LT" denotes the point 90° to the left (-90°) on the sphere, and "RT" denotes the point 90° to the right (+90°). For example, SCF₁_LT denotes the impulse response from the center speaker to "LT" (see FIG. 2B).
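Equation 1 itself is not reproduced in this text. One reading consistent with the definitions above is that each channel-to-ear filter sums, over the five microphones, the room impulse response at that microphone convolved with the matching SCF toward the ear point (LT or RT). The sketch below implements that assumed form only; the function names and data layout are hypothetical and should not be taken as the patent's exact formula.

```python
def convolve(a, b):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def channel_to_ear_filter(room_irs, scfs):
    """Assumed form: sum over i of (h_i * SCF_i_ear), with * denoting convolution.

    room_irs: room impulse responses h_1..h_N for one source (left or right).
    scfs:     sphere conversion filters SCF_1_ear..SCF_N_ear for one ear point.
    """
    if len(room_irs) != len(scfs):
        raise ValueError("need exactly one SCF per microphone")
    terms = [convolve(h, s) for h, s in zip(room_irs, scfs)]
    total = [0.0] * max(len(t) for t in terms)
    for t in terms:
        for n, v in enumerate(t):
            total[n] += v
    return total

# Toy example with two microphones instead of five.
h_left_source = [[1.0], [0.0, 1.0]]
scf_to_lt = [[0.5], [0.25]]
print(channel_to_ear_filter(h_left_source, scf_to_lt))  # [0.5, 0.25]
```

Under this reading, four such filters result: left or right source combined with the LT or RT ear point.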

FIG. 3 is a block diagram of an embodiment of the highly realistic multimedia playback system according to the present invention.

As shown in FIG. 3, the highly realistic multimedia playback system (30) comprises a demultiplexer (31), a video decoder (32), an audio decoder (33), and a 3D audio generator (34). Each component is described below. The audio decoder (33) and the 3D audio generator (34) together may be referred to as the "3D audio signal processing apparatus".

When the demultiplexer (31) separates (demultiplexes) the multimedia data into video data and audio data, the video decoder (32) restores the separated video data to the original video signal, and the audio decoder (33) decodes the separated audio data to restore the original audio signal (a stereo signal with no 3D effect applied).

The 3D audio generator (34) comprises an HRTF storage unit (341) and an externalization filter (342). Using the head-related transfer function (HRTF) of Equation 1, modeled from multichannel room impulse responses measured with a sphere microphone, it generates an externalized 3D audio signal from the audio signal restored by the audio decoder (33).

That is, the HRTF storage unit (341) stores the HRTF of [Equation 1], and the externalization filter (342) uses the stored HRTF to perform externalization while adding a 3D effect, as described in detail with reference to FIG. 4.

FIG. 4 is a detailed block diagram of the externalization filter of FIG. 3, showing the sound image externalization process based on the channel-to-ear filter functions (HRTFs) computed as in [Equation 1].

When a stereo audio signal is input to the externalization filter (342), each channel signal is filtered by the channel-to-ear filters (41 to 44) modeled from the multichannel room impulse responses, and the signals for the left and right ears (L/R ear) are then summed to produce the externalized stereo signal. The audio signal output by the externalization filter (342) is thus an externalized audio signal with a 3D effect.

The left-channel-to-left-ear filter (41), the left-channel-to-right-ear filter (42), the right-channel-to-left-ear filter (43), and the right-channel-to-right-ear filter (44) each perform signal conversion using the corresponding head-related transfer function [formulas not reproduced in source].

The outputs of the left-channel-to-left-ear filter (41) and the right-channel-to-left-ear filter (43) are combined in adder (45), and the outputs of the left-channel-to-right-ear filter (42) and the right-channel-to-right-ear filter (44) are combined in adder (46) and output.

FIG. 4 takes as an example the case where the audio signal input to the externalization filter (342) is a stereo signal with separate left and right channels. If the input is a mono signal, it is simply processed with identical left and right channel signals.
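The filter structure of FIG. 4 described above, four channel-to-ear filters feeding two adders, can be sketched as plain convolutions and sums, including the mono case. This is a minimal illustration under the assumption that each filter is a finite impulse response applied by direct convolution; the function names and toy filter taps are hypothetical, and a real-time implementation would likely use block-based FFT convolution instead.

```python
def convolve(a, b):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def add_signals(x, y):
    """Sample-wise sum, zero-padding the shorter signal (adders 45 and 46)."""
    n = max(len(x), len(y))
    x = list(x) + [0.0] * (n - len(x))
    y = list(y) + [0.0] * (n - len(y))
    return [p + q for p, q in zip(x, y)]

def externalize(left, right, h_ll, h_lr, h_rl, h_rr):
    """Filters 41-44 followed by adders 45 and 46.

    h_ll: left channel to left ear (41), h_lr: left channel to right ear (42),
    h_rl: right channel to left ear (43), h_rr: right channel to right ear (44).
    """
    left_ear = add_signals(convolve(left, h_ll), convolve(right, h_rl))   # adder 45
    right_ear = add_signals(convolve(left, h_lr), convolve(right, h_rr))  # adder 46
    return left_ear, right_ear

def externalize_mono(mono, h_ll, h_lr, h_rl, h_rr):
    """Mono input: feed the same signal to both channels."""
    return externalize(mono, mono, h_ll, h_lr, h_rl, h_rr)

# Toy filters: direct path is a unit tap, cross path is delayed and attenuated.
direct, cross = [1.0], [0.0, 0.5]
print(externalize([1.0, 0.0], [0.0, 0.0], direct, cross, cross, direct))
# ([1.0, 0.0, 0.0], [0.0, 0.5, 0.0])
```

With a left-only impulse, the left ear receives the direct tap and the right ear the delayed cross tap, which is the crosstalk path the structure is meant to reproduce.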

As shown in FIG. 4, the invention performs sound image externalization with the externalization filter (342) built from multichannel room impulse responses recorded with a sphere microphone. This emphasizes the frontal sound image and captures reflection and diffraction around the head as well as natural crosstalk and reverberation, providing effective externalization performance.

FIG. 5 shows the results of the sound image externalization listening test for the present invention, and FIGS. 6A and 6B show the results of the sound image localization listening test. FIG. 6A shows the results per angle, and FIG. 6B shows the averaged results.

The following summarizes and analyzes the experimental method and results for the externalization and localization performance of the invention, and verifies its performance through a comparative listening test against SRS Headphone, a commercial technology.

[Experimental Environment and Method]

To evaluate the externalization method according to the invention, experiments were carried out to assess the externalization distance and to measure the change in the sound image. Because applying signal processing to an audio signal can shift the sound image, the experiment measuring this shift is needed to verify how well the externalization technique preserves the sound image. The externalization technique (algorithm) was evaluated with a total of 15 subjects.

The externalization distance was measured separately for a mono source and a stereo source. This is because, for a stereo signal with highly correlated left and right channels, the externalization effect is generally not as large as for a mono signal, so the two were expected to show different externalization performance.

The externalization distance for the mono source was measured as follows. First, a loudspeaker was placed 1 m in front of the subject at an angle of 30°, and a mono white-noise signal was played through it so that the subject could perceive sound actually reproduced 1 m away at 30°. Next, the subject put on earphones and listened to the original mono white-noise signal in order to perceive the internalization of the sound image. Once the subject was fully familiar with both, the subject listened to the mono white-noise signal processed by the externalization algorithm and indicated by hand the perceived externalization distance, which was then measured.

The externalization distance experiment for the stereo source was performed in a similar way. First, loudspeakers were placed 1 m in front of the subject at ±30°, and a stereo white noise signal was played through them so that the subject could perceive a sound actually reproduced 1 m away. Because both the left and right channels carry white noise, the sound image forms directly in front of the subject. Next, the subject put on earphones and listened to the original stereo white noise signal in order to perceive sound image internalization. Once the subject was thoroughly familiar with the sound reproduced by the left and right loudspeakers 1 m away and with the degree of internalization when wearing earphones, the subject listened to the stereo white noise signal processed by the externalization algorithm and indicated by hand the distance at which the sound image was perceived, and this distance was measured.

The experiment measuring how much the externalization algorithm shifts the sound image was performed at seven angles, from 0° to 180° in 30° steps, because the amount of shift after externalization processing can differ depending on the angle of the sound image.

The sound image shift experiment followed this procedure. First, a reference signal, obtained by rendering white noise with the HRTF for the angle under test, was played over earphones so that the subject became thoroughly familiar with that angle. Then the signal obtained by applying the externalization algorithm to the reference signal was played, the subject indicated the perceived localization angle, and that angle was measured.

In addition, to compare externalization performance against an existing commercial technology, a comparative listening test against SRS Headphone was conducted with a total of eight subjects. No reference signal was given in this test. Subjects compared only the degree of externalization of the output signals of the two systems (A: the present invention, B: SRS Headphone), which yields a relative measure of externalization performance.

The externalization algorithm according to the present invention uses measured room impulse responses, and no additional pre-processing or post-processing was applied to compensate for the timbre change that arises when those responses are modeled as an externalization filter. For this reason, subjects in the comparative listening test were instructed to disregard the timbre of the test content and to rate only the degree of externalization.

[Table 1] below is the score table used for the comparative evaluation. The test content consisted of audio clips about 20 to 25 seconds long (44.1 kHz sampling rate) of three types: classical music, popular songs, and speech.

[Table 1] Score table for the comparative listening evaluation

Score   Meaning
  1     A is more externalized than B
  0     There is little difference between A and B
 -1     B is more externalized than A

[ Experimental Results ]

The listening evaluation results for the externalization distance of the present invention are shown in FIG. 5. As shown there, the sound image was externalized in front of the listener's forehead by an average of 16.74 cm for the mono signal and an average of 13.43 cm for the stereo signal.

In other words, with the present invention a sound image that had been localized inside the head is localized more than 10 cm outside the head. Externalization performance is lower for the stereo signal than for the mono signal. As explained above, this is thought to be because the sound image forms closer to the subject when the correlation between the L and R signals is high.
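The L/R similarity mentioned above can be quantified in a few lines. The following is a minimal sketch, assuming the Pearson correlation at zero lag as the similarity measure; the patent does not say which measure it has in mind, and the function names and noise lengths here are illustrative only. It contrasts a stimulus whose two channels carry the same white noise with one whose channels carry independent noise.

```python
import math
import random


def correlation(a, b):
    """Pearson correlation at zero lag between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def white_noise(n, seed):
    """Gaussian white noise from a seeded generator (illustrative stimulus)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


N = 10_000  # assumed length, roughly 0.23 s at 44.1 kHz
noise = white_noise(N, seed=1)
other = white_noise(N, seed=2)

same_lr = correlation(noise, noise)   # identical noise in L and R
indep_lr = correlation(noise, other)  # independent noise in L and R
print(f"identical L/R: {same_lr:.3f}, independent L/R: {indep_lr:.3f}")
```

A value near 1 corresponds to the highly correlated case that the text associates with a sound image forming closer to the listener; a value near 0 corresponds to decorrelated channels.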

The results of the experiment measuring how much the externalization algorithm according to the present invention shifts the sound image are shown in FIGS. 6A and 6B.

FIG. 6A shows the mean sound image shift at each angle together with its 95% confidence interval, and FIG. 6B shows the overall mean. Before the listening evaluation, subjects were trained until they were thoroughly familiar with the reference signal, so that no front-back confusion occurred.

The results show that the sound image shift is larger for sounds behind the listener than for sounds in front, which is consistent with the known fact that human sound localization has higher resolution in front than behind. Although the amount varies with angle, the sound image shifted on average by about ±10.65° around a center of -0.75°. This indicates that the externalization algorithm according to the present invention does alter the original sound image, but that the change is not large.

Finally, the results of the comparative listening evaluation of externalization against the existing SRS Headphone are given in [Table 2] below. Across the classical, popular song, and speech content, 75% of the subjects perceived the present invention as providing better sound image externalization than SRS Headphone.

[Table 2]

Relative distance   Percentage (%)
A > B               75
A = B               8.33
A < B               16.67
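The percentages in Table 2 are consistent with 24 individual ratings, which is what eight subjects each rating the three content types once would produce. That reading is an assumption: the text does not state the number of ratings, and the counts 18, 2, and 4 below are inferred from the percentages rather than reported. A short check:

```python
# Assumption: 8 subjects x 3 content types = 24 ratings in total.
# The counts are inferred from the reported percentages, not taken from the patent.
SUBJECTS = 8
CONTENT_TYPES = 3
total = SUBJECTS * CONTENT_TYPES

inferred_counts = {"A>B": 18, "A=B": 2, "A<B": 4}

# Convert each inferred count to a percentage, rounded as in Table 2.
percentages = {k: round(100 * v / total, 2) for k, v in inferred_counts.items()}
print(percentages)
```

Under this assumption the rounded values match the 75, 8.33, and 16.67 listed in the table.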

In conclusion, instead of the generic HRTF used conventionally, the present invention builds its externalization filter from multichannel room impulse responses recorded in a specific space with a five-channel sphere microphone. This emphasizes the frontal sound image and also captures reflection and diffraction by the head together with natural crosstalk and reverberation, providing more effective externalization. Frontal externalization performance and the degree of sound field preservation were confirmed experimentally through listening evaluations, and the comparative listening test against SRS Headphone, a commercial externalization technology, demonstrated that the externalization performance of the present invention is superior.
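The filter construction summarized above, and recited in claim 3 as a convolution of the conversion function with the multichannel room impulse response, can be sketched as follows. This is an illustrative reading, not the patented implementation: it assumes one conversion filter per microphone channel and per ear, and that the per-channel results are summed to form each ear's response. All names and the toy coefficients are hypothetical, and the result is handled as a pair of time-domain impulse responses.

```python
def convolve(x, h):
    """Full linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def mix(responses):
    """Sum impulse responses of possibly different lengths sample by sample."""
    out = [0.0] * max(len(r) for r in responses)
    for r in responses:
        for i, v in enumerate(r):
            out[i] += v
    return out


def model_hrtf(rirs, scf_left, scf_right):
    """Convolve each channel's room impulse response with its conversion
    filter, then sum over channels (the summation is an assumption)."""
    left = mix([convolve(r, f) for r, f in zip(rirs, scf_left)])
    right = mix([convolve(r, f) for r, f in zip(rirs, scf_right)])
    return left, right


def externalize(signal, hrtf):
    """Filter a mono signal with the modeled left and right responses."""
    left_ir, right_ir = hrtf
    return convolve(signal, left_ir), convolve(signal, right_ir)


# Toy data, hypothetical: 5 channels ordered front, left 1, left 2, right 1, right 2.
rirs = [[1.0, 0.5] for _ in range(5)]           # direct sound plus one reflection
scf_left = [[0.5], [0.25], [0.25], [0.0], [0.0]]
scf_right = [[0.5], [0.0], [0.0], [0.25], [0.25]]

hrtf = model_hrtf(rirs, scf_left, scf_right)
out_left, out_right = externalize([1.0, 0.0, -1.0], hrtf)
print(hrtf, out_left, out_right)
```

In practice the measured responses would be thousands of samples long and an FFT-based convolution would be the natural choice; the direct form is used here only to keep the sketch dependency-free.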

The method of the present invention as described above can be written as a computer program. The code and code segments constituting the program can be readily inferred by a computer programmer skilled in the art. The program is stored in a computer-readable recording medium (information storage medium) and is read and executed by a computer to implement the method of the present invention. The recording medium includes every form of computer-readable recording medium.

Because those of ordinary skill in the art to which the present invention pertains can make various substitutions, modifications, and changes without departing from the technical spirit of the invention, the present invention described above is not limited by the foregoing embodiments or the accompanying drawings.

FIG. 1 is a diagram illustrating a conventional method of acquiring stereo audio using a sphere microphone;

FIGS. 2A and 2B are diagrams illustrating an embodiment of the method of generating a head transfer function (HRTF) for sound image externalization according to the present invention;

FIG. 3 is a block diagram of an embodiment of a high-realism multimedia playback system according to the present invention;

FIG. 4 is a detailed block diagram of the externalization filter of FIG. 3 according to the present invention;

FIG. 5 shows the results of the sound image externalization listening evaluation of the present invention; and

FIGS. 6A and 6B show the results of the sound image localization listening evaluation of the present invention.

* Description of reference numerals for the main parts of the drawings

11, 21: sphere microphone     22: HRTF modeling unit

33: audio decoder             34: three-dimensional audio generation unit

341: HRTF storage unit        342: externalization filter

Claims

1. A method of generating a head transfer function (HRTF) for sound image externalization, the method comprising:

obtaining a conversion function capable of converting a multichannel audio signal into a stereo signal;

obtaining a multichannel room impulse response using a sphere microphone; and

generating the head transfer function (HRTF) using the conversion function and the multichannel room impulse response.

2. The method of claim 1, wherein the conversion function is a sphere conversion filter (SCF) function for converting the multichannel audio signal of the sphere microphone into a stereo signal.

3. The method of claim 1 or 2, wherein the generating of the head transfer function comprises convolving the conversion function with the multichannel room impulse response to obtain the head transfer function (HRTF).

4. The method of claim 1, wherein the sphere microphone has five microphones arranged on a sphere: a front microphone for emphasizing the frontal sound image, and two side microphones on each of the left and right sides for compensating for head movement.

5. A three-dimensional audio signal processing apparatus using a multichannel impulse response, the apparatus comprising:

audio decoding means for decoding audio data to restore an original audio signal; and

three-dimensional audio generating means for generating a three-dimensional audio signal for the restored audio signal using a head transfer function (HRTF) modeled through a multichannel room impulse response measured with a sphere microphone.

6. The apparatus of claim 5, wherein the three-dimensional audio generating means comprises:

storage means for storing the head transfer function (HRTF) modeled through the multichannel room impulse response measured with the sphere microphone; and

an externalization filter for generating an externalized three-dimensional audio signal for the restored audio signal using the stored head transfer function (HRTF).

7. The apparatus of claim 5 or 6, wherein the head transfer function (HRTF) is generated by convolving a conversion function, which converts the multichannel audio signal of the sphere microphone into a stereo signal, with the multichannel room impulse response obtained using the sphere microphone.

8. The apparatus of claim 5, wherein the sphere microphone has five microphones arranged on a sphere: a front microphone for emphasizing the frontal sound image, and two side microphones on each of the left and right sides for compensating for head movement.

9. A three-dimensional audio signal processing method using a multichannel impulse response, the method comprising:

decoding audio data to restore an original audio signal; and

a three-dimensional audio generation step of generating a three-dimensional audio signal for the restored audio signal using a head transfer function (HRTF) modeled through a multichannel room impulse response measured with a sphere microphone.

10. The method of claim 9, wherein the three-dimensional audio generation step generates an externalized three-dimensional audio signal for the restored audio signal using the head transfer function (HRTF) modeled through the multichannel room impulse response measured with the sphere microphone.

11. The method of claim 9 or 10, wherein the head transfer function (HRTF) is generated by convolving a conversion function, which converts the multichannel audio signal of the sphere microphone into a stereo signal, with the multichannel room impulse response obtained using the sphere microphone.