KR101096072B1

KR101096072B1 - Method and apparatus for enhancement of audio reconstruction

Info

Publication number: KR101096072B1
Application number: KR1020097019538A
Authority: KR
Inventors: Ville Pulkki
Original assignee: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Priority date: 2007-03-21
Filing date: 2008-02-01
Publication date: 2011-12-20
Also published as: BRPI0808225A2; RU2416172C1; BRPI0808225B1; WO2008113427A1; DE602008002066D1; EP2130403B1; JP2010521909A; TWI456569B; CN101658052A; CN101658052B; HK1138977A1; KR20090121348A; EP2130403A1; US20080232601A1; TW200841326A; ATE476835T1; JP5455657B2

Abstract

An audio signal having at least one audio channel and associated direction parameters indicating a direction of origin of a portion of the audio channel with respect to a recording position is reconstructed to derive a reconstructed audio signal. A desired direction of origin with respect to the recording position is selected. The portion of the audio channel is modified for deriving a reconstructed portion of the reconstructed audio signal, wherein the modifying comprises increasing an intensity of the portion of the audio channel having direction parameters indicating a direction of origin close to the desired direction of origin with respect to another portion of the audio channel having direction parameters indicating a direction of origin further away from the desired direction of origin.

Description

METHOD AND APPARATUS FOR ENHANCEMENT OF AUDIO RECONSTRUCTION

The present invention relates to a technique for improving the perception of the direction of origin of a reconstructed audio signal. In particular, the present invention proposes an apparatus and a method for reproducing recorded audio signals such that a selectable direction of audio sources can be emphasized relative to audio signals coming from other directions.

Generally, in multi-channel reproduction and listening, a listener is surrounded by multiple loudspeakers. Various methods exist to capture audio signals for specific set-ups. One general goal in reproduction is to reproduce the spatial composition of the originally recorded signal, i.e. the origin of each individual audio source, such as the location of a trumpet within an orchestra. Several loudspeaker set-ups are fairly common and can create different spatial impressions. Without special post-production techniques, the commonly known two-channel stereo set-up can only recreate auditory events on the line between the two loudspeakers. This is mainly achieved by so-called "amplitude panning", where the amplitude of the signal associated with one audio source is distributed between the two loudspeakers depending on the position of the audio source with respect to the loudspeakers. This is usually done during recording or subsequent mixing. That is, an audio source coming from the far left with respect to the listening position will be reproduced mainly by the left loudspeaker, whereas an audio source in front of the listening position will be reproduced with identical amplitude (level) by both loudspeakers. However, sound emanating from other directions cannot be reproduced.
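The amplitude panning described above can be illustrated with a minimal sketch. The sine/cosine (constant-power) law, the function name `stereo_pan_gains` and the position range [-1, 1] are assumptions chosen for illustration; the text does not prescribe a particular panning law.

```python
import math

def stereo_pan_gains(position):
    """Return (left, right) gains for a pan position in [-1, 1].

    -1 is the far left, 0 is the front (centre), +1 is the far right.
    The constant-power sine/cosine law is an illustrative assumption.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must lie in [-1, 1]")
    angle = (position + 1.0) * math.pi / 4.0  # maps [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)

# A source in front of the listener: both loudspeakers get the same level.
left, right = stereo_pan_gains(0.0)
# A source at the far left: reproduced by the left loudspeaker only.
far_left = stereo_pan_gains(-1.0)
```

Whatever the position, the two gains place the source somewhere on the line between the two loudspeakers, which is the limitation noted in the paragraph above.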

Consequently, by using more loudspeakers positioned around the listener, more directions can be covered and a more natural spatial impression can be created. Probably the best-known multi-channel loudspeaker layout is the 5.1 standard (ITU-R BS.775-1), which consists of five loudspeakers whose azimuth angles with respect to the listening position are predetermined to be 0°, ±30° and ±110°. This means that, during recording or mixing, the signal is tailored to that specific loudspeaker configuration, and deviations of a reproduction set-up from the standard will result in decreased reproduction quality.

Numerous other systems with varying numbers of loudspeakers located at different directions have also been proposed. Professional and special systems, particularly in theaters and sound installations, also include loudspeakers at different heights.

Corresponding to the different reproduction set-ups, several recording methods have been designed and proposed for the previously mentioned loudspeaker systems, in order to record and reproduce the spatial impression in the listening situation as it would have been perceived in the recording environment. The theoretically ideal way of recording spatial sound for a chosen multi-channel loudspeaker system would be to use the same number of microphones as there are loudspeakers. In such a case, the directivity patterns of the microphones should also correspond to the loudspeaker layout, such that sound from any single direction would be recorded with only a small number of microphones (1, 2 or more). Each microphone is associated with a specific loudspeaker. The more loudspeakers are used in reproduction, the narrower the directivity patterns of the microphones have to be. However, narrow directional microphones are rather expensive and typically have a non-flat frequency response, which degrades the quality of the recorded sound in an undesirable manner. Furthermore, using several microphones with too broad directivity patterns as input to multi-channel reproduction results in a colored and blurred auditory perception, because sound emanating from a single direction is always reproduced with more loudspeakers than necessary, as it is recorded with microphones associated with different loudspeakers. Generally, currently available microphones are best suited for two-channel recording and reproduction, that is, they are designed without the goal of reproducing a surrounding spatial impression.

From the point of view of microphone design, several approaches have been discussed to adapt the directivity patterns of microphones to the demands of spatial audio reproduction. Generally, all microphones capture sound differently depending on the direction of arrival of the sound at the microphone. That is, microphones have a different sensitivity depending on the direction of arrival of the recorded sound. In some microphones this effect is minor, as they capture sound almost independently of its direction. These microphones are generally called omnidirectional microphones. In a typical microphone design, a circular diaphragm is attached to a small airtight enclosure. If the diaphragm is not attached to an enclosure and sound reaches it equally from each side, its directional pattern has two lobes. That is, such a microphone captures sound with equal sensitivity from both the front and the back of the diaphragm, but with inverse polarities. Such a microphone does not capture sound arriving in the plane of the diaphragm, i.e. perpendicular to the direction of maximum sensitivity. Such a directional pattern is called dipole or figure-of-eight.

Omnidirectional microphones may also be modified into directional microphones by using a non-airtight enclosure for the microphone. The enclosure is specially constructed such that sound waves are allowed to propagate through the enclosure and reach the diaphragm, with some directions of propagation being preferred, such that the directional pattern of such a microphone becomes a pattern between omnidirectional and dipole. Those patterns may, for example, have two lobes. Some commonly known microphones have patterns with only a single lobe. The most important example is the cardioid pattern, whose directional function D can be expressed as D = 1 + cos(θ), where θ is the direction of arrival of the sound. The directional function thus quantifies the fraction of the incoming sound amplitude that is captured, depending on the direction.
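The three directional functions mentioned so far can be compared numerically. The following is a minimal sketch; the function names are illustrative, the cardioid uses D = 1 + cos(θ) exactly as stated in the text (unnormalised), and modelling the dipole as cos(θ) is the usual idealisation rather than something the text spells out.

```python
import math

def omni(theta):
    """Omnidirectional pattern: same sensitivity for every direction."""
    return 1.0

def figure_of_eight(theta):
    """Idealised dipole: front and back have equal magnitude, opposite sign."""
    return math.cos(theta)

def cardioid(theta):
    """Cardioid as given in the text: D = 1 + cos(theta)."""
    return 1.0 + math.cos(theta)

# Print the sensitivity for sound arriving from the front, side and back.
for degrees in (0, 90, 180):
    theta = math.radians(degrees)
    print(degrees,
          omni(theta),
          round(figure_of_eight(theta), 3),
          round(cardioid(theta), 3))
```

The cardioid has a single lobe with a null at the back, while the dipole has a null at the side and inverts polarity between front and back.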

The omnidirectional pattern discussed previously is also called a zeroth-order pattern, and the other patterns mentioned previously (dipole and cardioid) are called first-order patterns. None of the previously discussed microphone designs allows arbitrary shaping of the directivity pattern, since the directivity pattern is entirely determined by the mechanical construction.

To partly overcome this problem, some specialized acoustic structures have been designed that can be used to create narrower directional patterns than those of first-order microphones. For example, when a tube with holes in it is attached to an omnidirectional microphone, a microphone with a narrow directional pattern is created. These microphones are called shotgun or rifle microphones. However, they typically do not have a flat frequency response, that is, the directivity pattern is narrowed at the cost of the quality of the recorded sound. Furthermore, the directivity pattern is predetermined by the geometric construction, so the directivity pattern of a recording made with such a microphone cannot be controlled after the recording.

Therefore, other methods have been proposed that allow the directivity pattern to be altered after the actual recording. Generally, this relies on the basic idea of recording sound with an array of omnidirectional or directional microphones and applying signal processing afterwards. Various such techniques have recently been proposed. A fairly simple example is to record sound with two omnidirectional microphones placed close to each other and to subtract one signal from the other. This creates a virtual microphone signal with a directional pattern equivalent to a dipole.
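The two-microphone subtraction can be checked with a small narrowband model. This is a sketch under stated assumptions: a single plane wave, ideal omnidirectional capsules on a common axis, and illustrative values for the spacing, the frequency and the speed of sound, none of which come from the text.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value

def difference_response(theta, freq_hz=500.0, spacing_m=0.01):
    """Complex response of (mic A - mic B) to a unit plane wave from theta.

    The two omnidirectional capsules sit spacing_m apart on the axis
    theta = 0; each sees the wave with a direction-dependent phase.
    """
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    phase = 0.5 * k * spacing_m * math.cos(theta)
    mic_a = cmath.exp(1j * phase)
    mic_b = cmath.exp(-1j * phase)
    return mic_a - mic_b

def normalised_pattern(theta, **kwargs):
    """Magnitude of the difference signal relative to the on-axis value."""
    return abs(difference_response(theta, **kwargs)) / abs(
        difference_response(0.0, **kwargs))
```

For a spacing that is small compared with the wavelength, `normalised_pattern(theta)` is close to |cos(theta)|, and the front and back responses have opposite sign, which matches the dipole pattern described earlier.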

In other, more sophisticated schemes, the microphone signals can also be delayed or filtered before they are summed. Using beamforming, a technique also known from wireless LAN, a signal corresponding to a narrow beam is formed by filtering each microphone signal with a specially designed filter and summing the signals after the filtering. However, these techniques are blind to the signal itself, that is, they are not aware of the direction of arrival of the sound. Thus, a predetermined directional pattern has to be defined, which is independent of the actual presence of a sound source in the predetermined direction. Generally, the estimation of the "direction of arrival" of sound is a task of its own.
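As an illustration of the filter-then-sum idea, the sketch below uses the simplest possible filter, a pure phase shift per microphone (narrowband delay-and-sum). The uniform linear array, its size and spacing, the frequency and the function name are all assumptions made for this example.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value

def array_response(arrival_deg, steer_deg, n_mics=8, spacing_m=0.04,
                   freq_hz=2000.0):
    """Normalised output magnitude of a delay-and-sum beamformer.

    arrival_deg is the true direction of the plane wave, steer_deg the
    predetermined direction the beam is pointed at. The beamformer itself
    never looks at the signal to find out where the sound comes from.
    """
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    arrival = math.radians(arrival_deg)
    steer = math.radians(steer_deg)
    total = 0j
    for m in range(n_mics):
        position = m * spacing_m
        # Phase of the incoming plane wave at microphone m.
        incoming = cmath.exp(1j * k * position * math.cos(arrival))
        # Per-microphone filter: compensate the phase for the steering angle.
        weight = cmath.exp(-1j * k * position * math.cos(steer))
        total += weight * incoming
    return abs(total) / n_mics
```

Sound from the steered direction adds up coherently (response 1), whereas sound from other directions is attenuated, regardless of whether a source is actually present in the steered direction.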

Generally, numerous different spatial directional characteristics can be formed with the above techniques. However, forming arbitrary spatially selective sensitivity patterns (i.e. forming narrow directional patterns) requires a large number of microphones.

An alternative way to create multi-channel recordings is to place a microphone close to each sound source and to recreate the spatial impression by controlling the levels of the close-up microphone signals in the final mix. However, such a system demands a large number of microphones and a lot of user interaction in creating the final down-mix.

A method to overcome the above problems has recently been proposed, called Directional Audio Coding (DirAC), which may be used with different microphone systems and which is able to record sound for reproduction with arbitrary loudspeaker set-ups. The purpose of DirAC is to reproduce the spatial impression of an existing acoustical environment as precisely as possible, using a multi-channel loudspeaker system with an arbitrary geometrical set-up. Within the recording environment, the responses of the environment (which may be continuously recorded sound or impulse responses) are measured with an omnidirectional microphone (W) and with a set of microphones that allows the direction of arrival of the sound and the diffuseness of the sound to be measured. In the following paragraphs and within this application, the term "diffuseness" is to be understood as a measure of the non-directivity of sound. That is, sound arriving at the listening or recording position with equal strength from all directions is maximally diffuse. A common way of quantifying diffuseness is to use diffuseness values from the interval [0, 1], where a value of 1 describes maximally diffuse sound and a value of 0 describes perfectly directional sound, i.e. sound arriving from one clearly distinguishable direction only. One commonly known method of measuring the direction of arrival of sound is to apply three figure-of-eight microphones (X, Y, Z) aligned with the Cartesian coordinate axes. Special microphones, so-called "SoundField microphones", have been designed that directly yield all the desired responses. However, as mentioned above, the W, X, Y and Z signals may also be computed from a set of discrete omnidirectional microphones.
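The text does not give formulas for the direction and diffuseness estimates, so the following is only a sketch of one common intensity-based approach. It assumes an idealised signal convention in which a plane wave s from azimuth φ and elevation 0 produces W = s, X = s·cos(φ), Y = s·sin(φ), Z = 0 (no extra scaling of W); the function name and the exact diffuseness expression are assumptions made for illustration.

```python
import math

def analyse_block(w, x, y, z):
    """Estimate (azimuth_deg, elevation_deg, diffuseness) for one block.

    w, x, y, z are equally long lists of samples of the omnidirectional
    and the three figure-of-eight signals (assumed convention, see lead-in).
    """
    n = len(w)
    # Time-averaged products of W with X, Y, Z: under the assumed
    # convention this vector points towards the source.
    ix = sum(a * b for a, b in zip(w, x)) / n
    iy = sum(a * b for a, b in zip(w, y)) / n
    iz = sum(a * b for a, b in zip(w, z)) / n
    # Time-averaged energy of the four signals.
    energy = 0.5 * sum(a * a + b * b + c * c + d * d
                       for a, b, c, d in zip(w, x, y, z)) / n
    norm = math.sqrt(ix * ix + iy * iy + iz * iz)
    diffuseness = 1.0 - norm / energy if energy > 0.0 else 1.0
    diffuseness = min(1.0, max(0.0, diffuseness))  # keep within [0, 1]
    azimuth = math.degrees(math.atan2(iy, ix))
    elevation = math.degrees(math.atan2(iz, math.hypot(ix, iy)))
    return azimuth, elevation, diffuseness
```

With this convention a single plane wave yields a diffuseness of 0 and its true azimuth, while two equally strong uncorrelated sources from opposite directions cancel in the averaged vector and yield a diffuseness close to 1.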

In DirAC analysis, the recorded sound signal is divided into frequency channels that correspond to the frequency selectivity of human auditory perception. That is, the signal is processed, for example, by a filter bank or a Fourier transform to divide it into numerous frequency channels whose bandwidths are adapted to the frequency selectivity of human hearing. The frequency-band signals are then analyzed to determine the direction of origin of the sound and a diffuseness value for each frequency channel with a predetermined time resolution. This time resolution does not have to be fixed and may, of course, be adapted to the recording environment. In DirAC, one or more audio channels are recorded or transmitted together with the analyzed direction and diffuseness data.
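The division into frequency channels can be sketched with a plain discrete Fourier transform whose bins are grouped into bands that widen with frequency. The roughly logarithmic band edges below are only a stand-in for a perceptually motivated scale; the text does not specify the band layout, and all function names are hypothetical.

```python
import cmath
import math

def dft(frame):
    """Naive DFT of a real frame; returns bins 0 .. len(frame) // 2."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n // 2 + 1)]

def band_edges(n_bins, n_bands):
    """Bin indices of band boundaries, widening roughly logarithmically.

    Assumes n_bands is much smaller than n_bins.
    """
    edges = [0]
    for b in range(1, n_bands + 1):
        edge = round(n_bins ** (b / n_bands))
        edges.append(max(edge, edges[-1] + 1))
    edges[-1] = n_bins
    return edges

def band_energies(frame, n_bands=6):
    """Energy of one time frame in each frequency band."""
    spectrum = dft(frame)
    edges = band_edges(len(spectrum), n_bands)
    return [sum(abs(c) ** 2 for c in spectrum[lo:hi])
            for lo, hi in zip(edges, edges[1:])]

# One 64-sample frame containing a tone in DFT bin 4.
frame = [math.sin(2.0 * math.pi * 4 * t / 64) for t in range(64)]
energies = band_energies(frame)
```

Each band of each frame would then be analysed separately for direction and diffuseness, giving one parameter pair per time/frequency portion.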

In synthesis or decoding, the audio channels finally applied to the loudspeakers can be based on the omnidirectional channel W (recorded with high quality owing to the omnidirectional directivity pattern of the microphone used), or the sound for each loudspeaker may be computed as a weighted sum of W, X, Y and Z, thus forming a signal with a certain directional characteristic for each loudspeaker. Corresponding to the encoding, each audio channel is divided into frequency channels, which are optionally further divided into diffuse and non-diffuse streams, depending on the analyzed diffuseness. If the diffuseness has been measured to be high, a diffuse stream may be reproduced using a technique that produces a diffuse perception of sound, such as the decorrelation techniques also used in Binaural Cue Coding. Non-diffuse sound is reproduced using a technique that aims to produce a point-like virtual audio source located in the direction indicated by the direction data found in the analysis, i.e. in the generation of the DirAC signal. That is, spatial reproduction is not tailored to one specific "ideal" loudspeaker set-up, as in the prior-art techniques. This is particularly the case because the origin of the sound is determined as direction parameters using knowledge of the directivity patterns of the microphones used in the recording. As previously discussed, the origin of the sound in three-dimensional space is parameterized in a frequency-selective manner. As such, the directional impression may be reproduced with high quality for arbitrary loudspeaker set-ups, as long as the geometry of the loudspeaker set-up is known. DirAC is therefore not limited to special loudspeaker geometries and allows more flexible spatial reproduction.
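The weighted sum of W, X, Y and Z and the split into diffuse and non-diffuse streams might look as follows. This is a sketch under assumptions: the same idealised signal convention as a plane wave with W = s, X = s·cos(φ), Y = s·sin(φ), a hypothetical `pattern` parameter that blends omnidirectional and figure-of-eight components, and square-root weights for the split, which preserve energy but are not taken from the text.

```python
import math

def virtual_microphone(w, x, y, z, azimuth_deg, elevation_deg=0.0,
                       pattern=0.5):
    """Weighted sum of W, X, Y, Z aimed at a loudspeaker direction.

    pattern = 1.0 gives an omnidirectional signal, 0.5 a cardioid-like
    shape and 0.0 a figure-of-eight (illustrative parameterisation).
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    ux = math.cos(az) * math.cos(el)
    uy = math.sin(az) * math.cos(el)
    uz = math.sin(el)
    return pattern * w + (1.0 - pattern) * (ux * x + uy * y + uz * z)

def split_streams(sample, diffuseness):
    """Split one band sample into (non_diffuse, diffuse) parts."""
    non_diffuse = math.sqrt(1.0 - diffuseness) * sample
    diffuse = math.sqrt(diffuseness) * sample
    return non_diffuse, diffuse

# Unit plane wave from 30 degrees azimuth under the assumed convention.
w, x, y, z = 1.0, math.cos(math.radians(30.0)), math.sin(math.radians(30.0)), 0.0
towards = virtual_microphone(w, x, y, z, 30.0)    # aimed at the source
opposite = virtual_microphone(w, x, y, z, 210.0)  # aimed away from it
```

The non-diffuse part would then be positioned, for example by panning, and the diffuse part decorrelated before being fed to the loudspeakers.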

Although numerous techniques have been developed to reproduce multi-channel audio recordings and to record appropriate signals for later multi-channel reproduction, none of the prior-art techniques allows an already recorded signal to be influenced such that the direction of origin of audio signals can be emphasized during reproduction, for example so that the intelligibility of a signal from one distinct desired direction can be enhanced.

According to one embodiment of the present invention, an audio signal having at least one audio channel and associated direction parameters indicating a direction of origin of a portion of the audio channel with respect to a recording position can be reconstructed in a way that enhances the perceptibility of the signal coming from one distinct direction or from several distinct directions.

That is, in reproduction, a desired direction of origin with respect to the recording position can be selected. While deriving a reconstructed portion of the reconstructed audio signal, the portion of the audio channel is modified such that the intensity of portions of the audio channel having direction parameters indicating a direction of origin close to the desired direction of origin is increased with respect to other portions of the audio channel having direction parameters indicating a direction of origin further away from the desired direction of origin. Directions of origin of portions of an audio channel or of a multi-channel signal can thus be emphasized, allowing better perception of audio objects that were located in the selected direction during the recording.

According to a further embodiment of the present invention, a user may choose which direction or which directions shall be emphasized, such that portions of the audio channel or of the multiple audio channels associated with the chosen direction are emphasized, i.e. their intensity or amplitude is increased with respect to the remaining portions. According to an embodiment, emphasis or attenuation of sound from a specific direction can be performed with a sharper spatial resolution than with systems that do not implement direction parameters. According to a further embodiment of the present invention, arbitrary spatial weighting functions can be specified that cannot be achieved with regular microphones. Furthermore, the weighting functions may be time-variant and frequency-variant, so that further embodiments of the present invention can be used with high flexibility. Moreover, the weighting functions are very easy to implement and to update, since they only have to be loaded into the system instead of exchanging hardware (for example, microphones).
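A spatial weighting function of this kind is just a mapping from a direction parameter to a gain. The sketch below uses a bell-shaped window around the desired direction and a frequency-variant version whose window narrows with the band index; the shape, the parameter values and the function names are hypothetical and only illustrate that such a function can be defined freely in software.

```python
import math

def angular_distance_deg(a, b):
    """Smallest absolute difference between two azimuths, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def directional_weight(azimuth_deg, desired_deg, width_deg=30.0,
                       boost=4.0, floor=0.25):
    """Gain `boost` in the desired direction, decaying towards `floor`."""
    distance = angular_distance_deg(azimuth_deg, desired_deg)
    bell = math.exp(-0.5 * (distance / width_deg) ** 2)
    return floor + (boost - floor) * bell

def band_dependent_weight(azimuth_deg, desired_deg, band_index, n_bands):
    """Frequency-variant weighting: a narrower window in higher bands."""
    width = 40.0 - 20.0 * band_index / max(1, n_bands - 1)
    return directional_weight(azimuth_deg, desired_deg, width_deg=width)
```

Changing the emphasised direction or the shape of the window only requires different parameters or a different function, not different microphones.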

According to a further embodiment of the present invention, audio signals having associated diffuseness parameters indicating the diffuseness of the portion of the audio channel are reconstructed such that the intensity of a portion of the audio channel with high diffuseness is decreased with respect to another portion of the audio channel with lower associated diffuseness.

Thus, in reconstructing an audio signal, the diffuseness of individual portions of the audio signal can be taken into account to further increase the directional perception of the reconstructed signal. This may additionally improve the redistribution of audio sources compared with techniques that use diffuse sound portions only to increase the overall diffuseness of the signal, rather than using the diffuseness information for a better redistribution of the audio sources. It should be noted that the present invention conversely also allows portions of the recorded sound that are of diffuse origin, such as ambient signals, to be emphasized.
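Diffuseness-dependent scaling can be sketched in a few lines. The linear gain 1 − ψ (or ψ when ambience is to be emphasized) is an assumption for illustration; the text only requires that highly diffuse portions be decreased relative to less diffuse ones, or the converse.

```python
def diffuseness_gain(diffuseness, emphasise_ambience=False):
    """Gain for a portion with the given diffuseness value in [0, 1]."""
    if not 0.0 <= diffuseness <= 1.0:
        raise ValueError("diffuseness must lie in [0, 1]")
    return diffuseness if emphasise_ambience else 1.0 - diffuseness

def apply_diffuseness_weighting(portions, emphasise_ambience=False):
    """Scale each (amplitude, diffuseness) portion by its gain."""
    return [amplitude * diffuseness_gain(psi, emphasise_ambience)
            for amplitude, psi in portions]

# One directional portion and one mostly diffuse portion.
portions = [(1.0, 0.0), (1.0, 0.8)]
directional_focus = apply_diffuseness_weighting(portions)
ambience_focus = apply_diffuseness_weighting(portions, emphasise_ambience=True)
```

With the default setting the diffuse portion is attenuated relative to the directional one; with `emphasise_ambience=True` the relation is reversed.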

According to a further embodiment, at least one audio channel is up-mixed to multiple audio channels. The multiple audio channels may correspond to the number of loudspeakers available for playback. Arbitrary loudspeaker set-ups may be used to enhance the redistribution of audio sources, while it is guaranteed that the direction of an audio source is always reproduced as well as possible with the existing equipment, irrespective of the number of loudspeakers available.

According to another embodiment of the present invention, reproduction may even be performed via a monophonic loudspeaker. Of course, in that case the direction of origin of the signal will be the physical location of the loudspeaker. However, by selecting a desired direction of origin of the signal with respect to the recording position, the audibility of the signal stemming from the selected direction can be significantly increased compared with the playback of a simple down-mix.

According to a further embodiment of the present invention, the direction of origin of the signal can be accurately reproduced when one or more audio channels are up-mixed to the number of channels corresponding to the loudspeakers. The direction of origin can be reconstructed as well as possible by using, for example, amplitude panning techniques. To further increase the perceptual quality, additional phase shifts may be introduced, which also depend on the selected direction.
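For a horizontal loudspeaker ring, amplitude panning during up-mixing can be sketched as pairwise panning between the two adjacent loudspeakers that enclose the source direction. The pairwise vector formulation below is one possible technique, not necessarily the one intended by the text; it assumes that no gap between neighbouring loudspeakers is 180° or wider, and it omits the optional phase shifts.

```python
import math

# Azimuths of the five main loudspeakers of the 5.1 layout, in degrees.
FIVE_CHANNEL_AZIMUTHS = [0.0, 30.0, -30.0, 110.0, -110.0]

def unit(deg):
    """Unit vector in the horizontal plane for an azimuth in degrees."""
    r = math.radians(deg)
    return math.cos(r), math.sin(r)

def pan_to_ring(source_deg, speaker_degs):
    """Return one gain per loudspeaker; only the enclosing pair is non-zero."""
    order = sorted(range(len(speaker_degs)),
                   key=lambda i: speaker_degs[i] % 360.0)
    px, py = unit(source_deg)
    gains = [0.0] * len(speaker_degs)
    for n, i in enumerate(order):
        j = order[(n + 1) % len(order)]
        ax, ay = unit(speaker_degs[i])
        bx, by = unit(speaker_degs[j])
        det = ax * by - ay * bx
        if abs(det) < 1e-9:
            continue  # loudspeakers in line with each other: skip this pair
        # Solve gi * a + gj * b = p for the two gains.
        gi = (px * by - py * bx) / det
        gj = (ax * py - ay * px) / det
        if gi >= -1e-9 and gj >= -1e-9:
            norm = math.hypot(gi, gj)  # normalise to constant power
            gains[i] = max(gi, 0.0) / norm
            gains[j] = max(gj, 0.0) / norm
            return gains
    raise ValueError("no loudspeaker pair encloses the source direction")
```

Because the loudspeaker azimuths are an input, the same direction parameter can be rendered on any known layout, for example `pan_to_ring(15.0, FIVE_CHANNEL_AZIMUTHS)`.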

Certain embodiments of the present invention may additionally decrease the cost of the microphone capsules used for recording the audio signal without seriously affecting the audio quality, since the microphones used to determine the direction and diffuseness do not necessarily need to have a flat frequency response.

Several embodiments of the present invention will be described in the following with reference to the accompanying drawings.

Fig. 1 shows an embodiment of a method for reconstructing an audio signal;

Fig. 2 shows a block diagram of an apparatus for reconstructing an audio signal;

Fig. 3 shows a block diagram of a further embodiment;

Fig. 4 shows an example of the application of an inventive method or an inventive apparatus in a teleconferencing scenario;

Fig. 5 shows an embodiment of a method for enhancing the directional perception of an audio signal;

Fig. 6 shows an embodiment of a decoder for reconstructing an audio signal; and

Fig. 7 shows an embodiment of a system for enhancing the directional perception of an audio signal.

Fig. 1 shows an embodiment of a method for reconstructing an audio signal having at least one audio channel and associated direction parameters indicating a direction of origin of a portion of the audio channel with respect to a recording position. In a selection step 10, a desired direction of origin with respect to the recording position is selected for a reconstructed portion of the reconstructed audio signal, the reconstructed portion corresponding to a portion of the audio channel. That is, for the signal portion to be processed, a desired direction of origin is selected from which signal portions shall be clearly audible after reconstruction. The selection can be made directly by user input or automatically, as detailed below.

The portion may be a time portion, a frequency portion, or a time portion of a specific frequency interval of the audio channel. In a modification step 12, the portion of the audio channel is modified to derive the reconstructed portion of the reconstructed audio signal. The modification comprises increasing the intensity of a portion of the audio channel whose direction parameters indicate a direction of origin close to the desired direction of origin, relative to another portion of the audio channel whose direction parameters indicate a direction of origin further away from the desired direction. That is, such portions of the audio channel are emphasized by increasing their intensity or level, which can be implemented, for example, by multiplying the portion of the audio channel by a scaling factor. According to an embodiment, portions originating from a direction close to the selected (desired) direction are multiplied by a large scale factor, which emphasizes these signal portions in the reconstruction and improves the audibility of the recorded audio objects the listener is interested in. In general, in the context of this application, increasing the intensity of a signal or channel is to be understood as any measure that renders the signal more audible. This may be, for example, increasing the amplitude of the signal or the energy carried by the signal, or multiplying the signal by a scale factor greater than unity. Alternatively, the loudness of competing signals may be decreased to achieve the same effect.
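The direction-dependent scaling described above can be sketched as follows. This is only an illustration under stated assumptions: the Gaussian shape of the gain and the names `emphasis_gain`, `width_deg`, `boost` and `floor` are hypothetical and do not come from the description, which only requires that portions close to the desired direction receive a larger scale factor than portions further away.

```python
import math

def angular_distance_deg(a, b):
    """Smallest absolute difference between two azimuths, in degrees (0..180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def emphasis_gain(azi_deg, desired_deg, width_deg=30.0, boost=2.0, floor=0.5):
    """Hypothetical gain rule (an assumption, not taken from the description):
    `boost` at the desired direction, decaying smoothly towards `floor`
    as the analyzed direction moves away from it."""
    d = angular_distance_deg(azi_deg, desired_deg)
    return floor + (boost - floor) * math.exp(-0.5 * (d / width_deg) ** 2)

def modify_portions(portions, azimuths_deg, desired_deg):
    """Multiply each signal portion by the gain for its analyzed direction."""
    return [x * emphasis_gain(a, desired_deg)
            for x, a in zip(portions, azimuths_deg)]

# Two portions of equal level: one from the desired direction, one from behind.
scaled = modify_portions([1.0, 1.0], [40.0, 220.0], desired_deg=40.0)
```

With these illustrative defaults the first portion is scaled above unity and the second below unity, which covers both variants mentioned above: raising the wanted portion and lowering the competing one.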

The desired direction may be selected directly by a user at the listening site via a user interface. According to alternative embodiments, however, the selection can be performed automatically, for example by analyzing the direction parameters, so that frequency portions having roughly the same origin are emphasized while the remaining portions of the audio channel are suppressed. The signal can thus be focused automatically on the predominant audio sources without requiring additional user input at the listening end.
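One conceivable way to perform such an automatic selection is an energy-weighted histogram over the analyzed azimuths. The sketch below is an assumption made for illustration only; the description does not specify how the direction parameters are analyzed, and the function name `dominant_direction` and the bin width are hypothetical.

```python
def dominant_direction(azimuths_deg, energies, bin_width_deg=10.0):
    """Return the centre (in degrees) of the azimuth bin that collects the
    most energy across all analyzed signal portions (hypothetical helper)."""
    n_bins = int(round(360.0 / bin_width_deg))
    hist = [0.0] * n_bins
    for azi, energy in zip(azimuths_deg, energies):
        # Wrap the azimuth into [0, 360) and accumulate energy in its bin.
        hist[int((azi % 360.0) // bin_width_deg) % n_bins] += energy
    best = max(range(n_bins), key=hist.__getitem__)
    return (best + 0.5) * bin_width_deg

# Three portions near 45 degrees outweigh a single stronger one at 200 degrees.
desired = dominant_direction([42.0, 44.0, 200.0, 47.0], [1.0, 2.0, 2.5, 1.0])
```

The returned direction could then serve as the desired direction of origin for the modification step.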

According to further embodiments, the selection step is omitted because a direction of origin is set. That is, the intensity of portions of the audio channel whose direction parameters indicate a direction of origin close to the set direction is increased. The set direction may, for example, be hardwired, i.e. predetermined. If, for example, only the central talker is of interest in a teleconferencing scenario, this can be implemented using a predetermined set direction. Alternative embodiments may read the set direction from a memory, which may also store a number of alternative directions to be used as the set direction. One of these may be read, for example, when the inventive apparatus is switched on.

According to alternative embodiments, the desired direction may also be selected at the encoder side, i.e. at the recording of the signal, such that additional parameters indicating the desired direction for reproduction are transmitted with the audio signal. The spatial perception of the reconstructed signal can thus be preselected at the encoder, independently of the specific loudspeaker setup used for reproduction.

Since the method for reconstructing the audio signal is independent of the specific loudspeaker setup intended to reproduce the reconstructed audio signal, it may be applied to monophonic as well as to stereo or multi-channel loudspeaker configurations. That is, according to a further embodiment, the spatial impression of a reproduced environment is post-processed to improve the perceptibility of the signal.

When used for monophonic reproduction, the effect can be interpreted as recording the signal with a new type of microphone capable of forming arbitrary directional patterns. However, this effect is achieved entirely at the receiving end, i.e. during reproduction of the signal, without any change to the recording setup.

FIG. 2 shows an embodiment of an apparatus (decoder) for reconstructing an audio signal, i.e. an embodiment of a decoder 20. The decoder 20 comprises a direction selector 22 and an audio portion modifier 24. In the embodiment of FIG. 2, a multi-channel audio input 26 recorded by several microphones is analyzed by a direction analyzer 28, which derives direction parameters indicating the direction of origin of a portion of the audio channels, i.e. the direction of origin of the analyzed signal portion. According to one embodiment, the direction from which most of the energy arrives at the microphones is selected, and it is determined relative to the recording position for each specific signal portion. This can be done, for example, using the DirAC microphone techniques. Other directional analysis methods based on the recorded audio information may of course be used instead. As a result, the direction analyzer 28 derives direction parameters 30 indicating the direction of origin of a portion of an audio channel or of the multi-channel signal. Furthermore, the direction analyzer 28 may be operative to derive a diffuseness parameter 32 for each signal portion (for example, for each frequency interval or each time frame of the signal).

The direction parameters 30 and, optionally, the diffuseness parameter 32 are transmitted to the direction selector 22, which is implemented to select a desired direction of origin with respect to the recording position for the reconstructed portion of the reconstructed audio signal. Information on the desired direction is transmitted to the audio portion modifier 24. The audio portion modifier 24 receives at least one audio channel 34 having a portion for which the direction parameters have been derived. The at least one channel modified by the audio portion modifier may be, for example, a downmix of the multi-channel signal generated by a conventional multi-channel downmix algorithm. One very simple case would be the direct sum of the signals of the multi-channel audio input 26. However, since embodiments of the invention are not limited by the number of input channels, in an alternative embodiment all audio input channels 26 may be processed simultaneously by the audio decoder 20.

The audio portion modifier 24 modifies the audio portion to derive the reconstructed portion of the reconstructed audio signal. The modification comprises increasing the intensity of a portion of the audio channel whose direction parameters indicate a direction of origin close to the desired direction of origin, relative to another portion of the audio channel whose direction parameters indicate a direction of origin further away from the desired direction. In the example of FIG. 2, the modification is performed by multiplying the portion of the audio channel to be modified by a scaling factor 36 (q). That is, if the portion of the audio channel is analyzed as originating from a direction close to the selected desired direction, the audio portion is multiplied by a large scaling factor. At its output 38, the audio portion modifier thus provides a reconstructed portion of the reconstructed audio signal that corresponds to the portion of the audio channel provided at its input. As further indicated by the dashed lines at the output of the audio portion modifier 24, this may be done not only for a mono output signal but also for multi-channel output signals, for which the number of output channels is neither fixed nor predetermined.

In other words, embodiments of the audio decoder 20 take as input the output of a directional analysis such as the one used in DirAC. The audio signals from a microphone array may be divided into frequency bands according to the frequency resolution of the human auditory system. The direction of sound and, optionally, the diffuseness of sound are analyzed as a function of time in each frequency channel. These attributes are delivered, for example, as an azimuth angle (azi), an elevation angle (ele) and a diffuseness index (psi) that varies between zero and one.

The intended or selected directional characteristic is then imposed on the acquired signals by a weighting operation that depends on the direction angles (azi and/or ele) and, optionally, on the diffuseness. Evidently, this weighting may be specified differently for different frequency bands and will, in general, vary over time.

FIG. 3 shows a further embodiment of the present invention, based on DirAC synthesis. In that sense, the embodiment of FIG. 3 can be interpreted as an enhancement of DirAC reproduction that allows the level of the sound to be controlled depending on the analyzed direction. This makes it possible to emphasize sound coming from one or several directions, or to suppress sound coming from one or several directions. When applied to multi-channel reproduction, a post-processing of the reproduced sound image is achieved. If only one channel is used as output, the effect is equivalent to using a directional microphone with an arbitrary directional pattern during the recording of the signal. The embodiment shown in FIG. 3 illustrates the derivation of the direction parameters as well as of the one transmitted audio channel. The analysis is performed on the basis of the B-format microphone channels W, X, Y and Z, as recorded, for example, by a sound-field microphone.

The processing is performed frame-wise. The continuous audio signals are therefore divided into frames, which are scaled by a windowing function to avoid discontinuities at the frame boundaries. The windowed signal frames are subjected to a Fourier transform in a Fourier transform block 40, which divides the microphone signals into N frequency bands. For simplicity, the processing of one arbitrary frequency band is described in the following paragraphs, since the remaining frequency bands are processed in the same way. The Fourier transform block 40 derives coefficients describing the strength of the frequency components present in each of the B-format microphone channels W, X, Y and Z within the analyzed windowed frame. These frequency parameters 42 are input into an audio encoder 44, which derives an audio channel and associated direction parameters. In the embodiment shown in FIG. 3, the transmitted audio channel is chosen to be the omnidirectional channel 46, which carries information on the signal from all directions. Based on the coefficients for the omnidirectional and the directional portions of the B-format microphone channels, a directional and diffuseness analysis is performed by a directional analysis block 48.
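As a rough illustration of how a direction could be obtained from the B-format coefficients of one frequency band, the sketch below estimates azimuth and elevation from the cross terms between W and the directional channels. It rests on assumptions that are not stated in the description: a single plane wave per band and the encoding convention W = s/√2, X = s·cos(azi)·cos(ele), Y = s·sin(azi)·cos(ele), Z = s·sin(ele). The function names are hypothetical, and the diffuseness estimate, which needs averaging over time, is omitted.

```python
import math

def encode_plane_wave(s, azi_deg, ele_deg):
    """Encode one complex coefficient `s` as B-format (W, X, Y, Z),
    using the convention assumed in the lead-in above."""
    azi, ele = math.radians(azi_deg), math.radians(ele_deg)
    return (s / math.sqrt(2.0),
            s * math.cos(azi) * math.cos(ele),
            s * math.sin(azi) * math.cos(ele),
            s * math.sin(ele))

def estimate_direction(w, x, y, z):
    """Estimate (azimuth, elevation) in degrees for one frequency band.
    Re(conj(W) * X), Re(conj(W) * Y) and Re(conj(W) * Z) are proportional
    to the Cartesian components of the arrival direction for a plane wave."""
    ix = (w.conjugate() * x).real
    iy = (w.conjugate() * y).real
    iz = (w.conjugate() * z).real
    azi = math.degrees(math.atan2(iy, ix))
    ele = math.degrees(math.atan2(iz, math.hypot(ix, iy)))
    return azi, ele

# Round trip: a plane wave encoded from (60, 20) degrees is recovered.
w, x, y, z = encode_plane_wave(1.0 + 0.5j, 60.0, 20.0)
azi, ele = estimate_direction(w, x, y, z)
```

In a real recording several sources and reverberation overlap in each band, so the estimate would be computed per frame and per band and would fluctuate over time.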

The direction of origin of the sound in the analyzed portion of the audio channel 46 is transmitted, together with the omnidirectional channel 46, to an audio decoder 50 for reconstructing the audio signal. When a diffuseness parameter 52 is present, the signal path is split into a non-diffuse path 54a and a diffuse path 54b. The non-diffuse path 54a is scaled according to the diffuseness parameter such that most of the energy or amplitude remains in the non-diffuse path when the diffuseness ψ is low. Conversely, when the diffuseness is high, most of the energy is shifted to the diffuse path 54b. In the diffuse path 54b, the signal is decorrelated or diffused using decorrelators 56a or 56b. Decorrelation can be performed with conventionally known techniques, such as convolution with a white noise signal, where the white noise signal may differ from frequency channel to frequency channel. As long as the decorrelation preserves energy, the final output can be regenerated by simply adding the signals of the non-diffuse signal path 54a and the diffuse signal path 54b at the output, since the signals in the signal paths have already been scaled as indicated by the diffuseness parameter ψ. The diffuse signal path 54b may be scaled depending on the number of loudspeakers, using an appropriate scaling rule. For example, the signals in the diffuse path may be scaled by 1/√N, where N is the number of loudspeakers.
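The split into a non-diffuse and a diffuse path can be sketched as follows. The square-root weights are an assumption chosen so that the two paths together carry the energy of the input; the description only states that the paths are scaled according to the diffuseness parameter ψ and that the diffuse path may be scaled by 1/√N. The function name `split_streams` is hypothetical, and the decorrelators are left out.

```python
import math

def split_streams(coeff, psi, n_loudspeakers):
    """Split one frequency-band coefficient into a non-diffuse part and a
    per-loudspeaker diffuse part (energy-preserving weights are assumed)."""
    if not 0.0 <= psi <= 1.0:
        raise ValueError("diffuseness must lie between 0 and 1")
    direct = math.sqrt(1.0 - psi) * coeff        # non-diffuse path
    diffuse = math.sqrt(psi) * coeff             # diffuse path before distribution
    diffuse_per_speaker = diffuse / math.sqrt(n_loudspeakers)  # 1/sqrt(N) rule
    return direct, diffuse_per_speaker

direct, diffuse_each = split_streams(1.0, psi=0.25, n_loudspeakers=4)
# If the N diffuse loudspeaker signals are mutually decorrelated, their
# energies add, so the total equals the energy of the input coefficient.
total_energy = abs(direct) ** 2 + 4 * abs(diffuse_each) ** 2
```

With a low ψ most of the energy stays in the non-diffuse path, and with a high ψ it moves to the diffuse path, matching the behaviour described above.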

When the reconstruction is performed for a multi-channel setup, the direct signal path 54a as well as the diffuse signal path 54b are split into a number of sub-paths corresponding to the individual loudspeaker signals (at split positions 58a and 58b). The split at the split positions 58a and 58b can thus be interpreted as an up-mixing of the at least one audio channel to multiple channels for playback via a loudspeaker system having multiple loudspeakers. Each of the multiple channels therefore carries a channel portion of the audio channel 46. The direction of origin of the individual audio portions is reconstructed by a redirection block 60, which additionally increases or decreases the intensity or amplitude of the channel portions corresponding to the loudspeakers used for playback. To this end, the redirection block 60 generally requires knowledge of the loudspeaker setup used for playback. The actual redistribution (redirection) and the derivation of the associated weighting factors can be implemented, for example, using techniques such as vector-based amplitude panning. By supplying different geometric loudspeaker setups to the redistribution block 60, arbitrary configurations of playback loudspeakers can be used to implement the inventive concept without loss of reproduction quality. After the processing, multiple inverse Fourier transforms are performed on the frequency-domain signals by inverse Fourier transform blocks 62 to derive time-domain signals that can be played back by the individual loudspeakers. Prior to playback, an overlap-and-add technique must be performed by summation units 64, which concatenate the individual audio frames to derive continuous time-domain signals ready to be played back by the loudspeakers.
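A two-dimensional, pair-wise form of vector-based amplitude panning can be sketched as below. This is a generic textbook formulation given as an assumption for illustration, not the specific redistribution used in the redirection block 60; the function name `pan_pair` is hypothetical. The source direction is written as a weighted sum of the two loudspeaker unit vectors, and the weights are normalised to constant power.

```python
import math

def pan_pair(source_deg, spk1_deg, spk2_deg):
    """Return gains (g1, g2) for a loudspeaker pair so that
    g1 * l1 + g2 * l2 points towards the source direction.
    Gains are normalised so that g1**2 + g2**2 == 1. A source outside
    the arc spanned by the pair yields a negative gain."""
    s, a, b = (math.radians(v) for v in (source_deg, spk1_deg, spk2_deg))
    det = math.cos(a) * math.sin(b) - math.sin(a) * math.cos(b)
    if abs(det) < 1e-9:
        raise ValueError("loudspeakers must not be collinear with the origin")
    # Cramer's rule for [l1 l2] @ [g1, g2] = p, with unit vectors (cos, sin).
    g1 = (math.cos(s) * math.sin(b) - math.sin(s) * math.cos(b)) / det
    g2 = (math.cos(a) * math.sin(s) - math.sin(a) * math.cos(s)) / det
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm

# A source at 45 degrees between loudspeakers at 0 and 60 degrees:
# the loudspeaker at 60 degrees, being closer, receives the larger gain.
g_near_0, g_near_60 = pan_pair(45.0, 0.0, 60.0)
```

For a full loudspeaker ring, the pair enclosing the analyzed direction would be selected first and all other loudspeakers would receive zero gain for the non-diffuse stream.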

According to the embodiment of the present invention shown in FIG. 3, the DirAC signal processing is enhanced by introducing an audio portion modifier 66, which modifies the portion of the audio channel actually processed and allows the intensity of portions of the audio channel whose direction parameters indicate a direction of origin close to a desired direction to be increased. This is achieved by applying an additional weighting factor to the direct signal path. That is, if the processed frequency portion originates from the desired direction, the signal is emphasized by applying an additional gain to that specific signal portion. The gain can be applied before the split point 58a, since the effect is to contribute equally to all channel portions.

In an alternative embodiment, the additional weighting factor may be applied within the redistribution block 60, which in that case applies redistribution gain factors that are increased or decreased by the additional weighting factor.

When directional enhancement is used in the reconstruction of a multi-channel signal, the reproduction may be performed in the style of DirAC rendering, as shown for example in FIG. 3. The audio signal to be reproduced is divided into frequency bands equal to those used for the directional analysis. These frequency bands are then divided into diffuse and non-diffuse streams. The diffuse stream is reproduced, for example, by applying the sound to each loudspeaker after convolution with noise bursts 30 ms wide. The noise bursts differ from loudspeaker to loudspeaker. The non-diffuse stream is applied to the direction delivered by the directional analysis, which is of course time-dependent. To achieve a directional perception in multi-channel loudspeaker systems, simple pair-wise or triplet-wise amplitude panning may be used. Furthermore, each frequency channel is multiplied by a gain factor or scaling factor that depends on the analyzed direction. In general terms, a function can be specified that defines a desired directional pattern for the reproduction. This may be, for example, only one single direction to be emphasized. However, arbitrary directional patterns can easily be implemented with the embodiment of FIG. 3.

In the following, a further embodiment of the present invention is described as a list of processing steps. The list is based on the assumption that the sound is recorded with a B-format microphone and is then processed for listening over multi-channel or monophonic loudspeaker setups, using DirAC-style rendering or any rendering that supplies direction parameters indicating the direction of origin of portions of the audio channel. The processing is as follows.

1. Divide the microphone signals into frequency bands and analyze the direction and, optionally, the diffuseness in each band depending on frequency. As an example, the direction may be parameterized by an azimuth angle and an elevation angle.

2. Specify a function F that describes the desired directional pattern. The function may have an arbitrary shape. It typically depends on the direction. It may furthermore depend on the diffuseness, if diffuseness information is available. The function can be different for different frequencies, and it may also change over time. For each frequency band, derive from the function F a directional factor q for each time instance, which is used for the subsequent weighting (scaling) of the audio signal.

3. Multiply the audio sample values by the q values of the directional factors corresponding to each time and frequency portion to form the output signal. This may be done in a time-domain and/or a frequency-domain representation. Furthermore, this processing may, for example, be implemented as part of a DirAC rendering to any number of desired output channels.

As explained above, the result can be listened to using a multi-channel or monophonic loudspeaker system.
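Steps 2 and 3 of the list above can be sketched as follows, assuming that step 1 has already produced per-frame, per-band coefficients together with azimuth and diffuseness values. The cardioid-shaped pattern, the way diffuseness is mixed in, and the names `pattern_F`, `apply_pattern`, `look_deg`, `order` and `diffuse_gain` are illustrative assumptions; the description leaves the shape of F open.

```python
import math

def pattern_F(azi_deg, psi, look_deg=0.0, order=1, diffuse_gain=0.5):
    """Hypothetical directional pattern F: a cardioid of the given order
    pointing at `look_deg`, blended towards a fixed gain for diffuse sound.
    Returns the directional factor q for one time-frequency portion."""
    cardioid = (0.5 * (1.0 + math.cos(math.radians(azi_deg - look_deg)))) ** order
    return (1.0 - psi) * cardioid + psi * diffuse_gain

def apply_pattern(frames, azi, psi, look_deg=0.0):
    """frames[t][k] is the coefficient of frame t and band k; azi and psi
    hold the matching direction and diffuseness values. Each coefficient
    is multiplied by its directional factor q (step 3)."""
    return [[x * pattern_F(a, p, look_deg) for x, a, p in zip(fx, fa, fp)]
            for fx, fa, fp in zip(frames, azi, psi)]

# One frame with two bands: band 0 arrives from the look direction,
# band 1 from the opposite side and is therefore suppressed.
out = apply_pattern([[1.0 + 1.0j, 2.0]], [[90.0, 270.0]], [[0.0, 0.0]], look_deg=90.0)
```

Because q is computed per frame and per band, the same routine covers patterns that differ between frequencies or change over time, simply by varying the arguments passed to `pattern_F`.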

FIG. 4 illustrates how the inventive methods and apparatuses can strongly increase the perceptibility of a participant in a teleconferencing scenario. On the recording side 100, four talkers 102a-102d are illustrated, each with a distinct orientation with respect to a recording position 104. That is, an audio signal originating from talker 102c has a fixed direction of origin with respect to the recording position 104. Assuming that the audio signal recorded at the recording position 104 contains a contribution from talker 102c and some "background" noise originating, for example, from a discussion between talkers 102a and 102b, a broadband signal recorded and transmitted to a listening site 110 will comprise both signal components.

As an example, a listening setup is depicted that has six loudspeakers 112a-112f surrounding a listener located at a listening position 114. In principle, therefore, sound emanating from almost arbitrary positions around the listener 114 can be reproduced by the setup depicted in FIG. 4. Conventional multi-channel systems would reproduce the sound using these six loudspeakers 112a-112f so as to reconstruct, as closely as possible, the spatial perception experienced at the recording position 104 during the recording. When the sound is reproduced with conventional techniques, the contribution of talker 102c as well as the "background" of the discussing talkers 102a and 102b would therefore both be clearly audible, decreasing the intelligibility of the signal of talker 102c.

According to an embodiment of the present invention, a direction selector can be used to select a desired direction of origin with respect to the recording position, which is used for the reconstructed version of the audio signal played back by the loudspeakers 112a-112f. A listener 114 can thus select the desired direction 116 corresponding to the position of talker 102c. The audio portion modifier can then modify the portion of the audio channel to derive the reconstructed portion of the reconstructed audio signal such that the intensity of portions of the audio channel originating from a direction close to the selected direction is emphasized. The listener can decide, at the receiving end, which direction of origin is to be reproduced. With this selection, only those signal portions originating from the direction of talker 102c are emphasized, and the discussing talkers 102a and 102b therefore become less disturbing. Apart from emphasizing the signal from the selected direction, the direction may be reproduced by amplitude panning, as symbolized by waveforms 120a and 120b. Since talker 102c is located closer to loudspeaker 112d than to loudspeaker 112c, amplitude panning leads to reproduction of the emphasized signal via loudspeakers 112c and 112d, whereas the remaining loudspeakers are nearly silent (possibly playing back diffuse signal portions). Because talker 102c is closer to loudspeaker 112d, amplitude panning will raise the level of loudspeaker 112d relative to loudspeaker 112c.

FIG. 5 shows a block diagram of an embodiment of a method for enhancing the directional perception of an audio signal. In a first analysis step 150, at least one audio channel and associated direction parameters indicating a direction of origin of a portion of the audio channel with respect to a recording position are derived.

In a modification step 154, the portion of the audio channel is modified to derive a reconstructed portion of the reconstructed audio signal. The modification comprises increasing the intensity of a portion of the audio channel whose direction parameters indicate a direction of origin close to the desired direction of origin, relative to another portion of the audio channel whose direction parameters indicate a direction of origin further away from the desired direction of origin.

FIG. 6 shows an embodiment of an audio decoder for reconstructing an audio signal having at least one audio channel 160 and associated direction parameters indicating a direction of origin of a portion of the audio channel with respect to a recording position.

The audio decoder 158 comprises a direction selector 164 for selecting a desired direction of origin with respect to the recording position for a reconstructed portion of the reconstructed audio signal, the reconstructed portion corresponding to the portion of the audio channel. The decoder 158 further comprises an audio portion modifier 166 for modifying the portion of the audio channel to derive the reconstructed portion of the reconstructed audio signal. The modification comprises increasing the intensity of a portion of the audio channel whose direction parameters indicate a direction of origin close to the desired direction of origin, relative to another portion of the audio channel whose direction parameters indicate a direction of origin further away from the desired direction of origin.

As indicated in FIG. 6, a single reconstructed portion 168 may be derived, or multiple reconstructed portions may be derived simultaneously when the decoder is used in a multi-channel reproduction setup. The embodiment of a system 180 for enhancing the directional perception of an audio signal, shown in FIG. 7, is based on the decoder 158 of FIG. 6. Therefore, only the additionally introduced elements are described in the following. The system 180 receives as input an audio signal 182, which may be a monophonic signal or a multi-channel signal recorded by multiple microphones. An audio encoder 184 derives an audio signal having at least one audio channel 160 and associated direction parameters 162 indicating a direction of origin of a portion of the audio channel with respect to a recording position. The at least one audio channel 160 and the associated direction parameters are then processed further, as already described for the audio decoder of FIG. 6, to derive a perceptually enhanced output signal 170.

Although the present invention has been described mainly in the field of multi-channel audio reproduction, other fields of application may also benefit from the inventive methods and apparatuses. As one example, the inventive concept may be used to focus (by boosting or attenuating) on the speech of a particular individual in a teleconferencing scenario. It may furthermore be used to reject (or amplify) ambient components, as well as for de-reverberation or reverberation enhancement. Further possible application scenarios include noise cancellation of ambient noise signals. A further possible use is directional enhancement of signals intended for listening.

Depending on the requirements of a particular implementation, the inventive methods can be implemented in hardware or in software. The implementation can use a digital storage medium, in particular a disk, DVD or CD, having electronically readable control signals stored thereon, which cooperate with a programmable computer system such that the inventive methods are performed. Generally, the present invention is, therefore, a computer program product with program code stored on a machine-readable carrier, the program code being operative to perform the inventive methods when the computer program product runs on a computer. In other words, the present invention is a computer program having program code for performing at least one of the inventive methods when the computer program runs on a computer.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that it may be modified and changed in various ways without departing from its spirit and scope. Various changes may be made in adapting to different embodiments without departing from the broader concepts disclosed herein and comprehended by the claims that follow.

Claims

A method for reproducing an audio signal having at least one audio channel and an associated directional parameter indicating the direction of origin of a portion of the audio channel with respect to a recording position, the portion of the audio channel being a time portion, a frequency portion or a time portion of a constant frequency interval of the audio channel, the method comprising:

selecting a specified direction of origin with respect to the recording position; and

modifying the portion of the audio channel to obtain a reproduced portion of the reproduced audio signal,

wherein the reproduced portion of the audio signal is a time portion, a frequency portion or a time portion of a constant frequency interval of the audio channel, and

wherein the modifying comprises:

increasing the intensity of a portion of the audio channel having a directional parameter indicating a direction of origin close to the specified direction of origin, relative to another portion of the audio channel having a directional parameter indicating a direction of origin further away from the specified direction of origin.

In the method of claim 1,

the selecting comprises reading the specified direction from a memory device.

In the method of claim 1,

the modifying comprises modifying a frequency-domain representation of the portion of the audio channel.

In the method of claim 1,

the modifying comprises a time-domain modification of the portion of the audio channel.

In the method of claim 1,

the modifying comprises:

obtaining a scaling factor for each portion of the audio channel,

wherein the scaling factor is obtained such that a scaled portion of the audio channel having an associated directional parameter indicating a direction of origin close to the desired direction of origin has increased intensity relative to another scaled portion of the audio channel having an associated directional parameter indicating a direction of origin further away from the desired direction of origin,

The scaled portion,

At least one audio channel and an associated directional parameter indicative of the direction of the origin of the audio channel portion relative to the recording position.

In the method of claim 1,

the method further comprises deriving a frequency representation of the at least one audio channel.

In the method of claim 6,

the deriving further comprises deriving representations of first and second finite-width frequency intervals of the at least one audio channel, wherein the width of the first frequency interval differs from the width of the second frequency interval.

In the method of claim 1,

selecting the specified direction of origin comprises receiving, as input, a user input parameter indicating the desired direction.

In the method of claim 1,

selecting the specified direction of origin comprises receiving a directional parameter, associated with the audio signal, indicating the desired direction.

In the method of claim 1,

selecting the specified direction of origin comprises determining the direction of origin of a finite-width frequency interval of the at least one audio channel.

In the method of claim 1,

the method further comprises receiving a diffuseness parameter, associated with the audio channel, indicating the diffuseness of the portion of the audio channel,

wherein modifying the portion of the audio channel comprises decreasing the intensity of a portion of the audio channel having a diffuseness parameter indicating high diffuseness, relative to another portion of the audio channel having a diffuseness parameter indicating low diffuseness.

In the method according to claim 1,

the method further comprises upmixing the at least one audio channel to a plurality of channels for playback via a loudspeaker system having a plurality of loudspeakers,

wherein each of the plurality of channels has a channel portion corresponding to the portion of the at least one audio channel.

In the method of claim 12,

the modifying comprises:

increasing the intensity of each channel portion upmixed from a portion of the audio channel having a directional parameter indicating a direction of origin close to the desired direction of origin, relative to other channel portions of the plurality of channels that are upmixed from other portions of the audio channel having a directional parameter indicating a direction of origin further away from the desired direction of origin.

In the method of claim 13,

amplitude panning of the audio channel is performed such that, in a reproduction using a predetermined loudspeaker setup, the perceived direction of origin of the reproduced channel portion corresponds to the direction of origin.

A method for enhancing the directional perception of an audio signal, comprising:

deriving at least one audio channel and an associated directional parameter indicating the direction of origin of a portion of the audio channel with respect to a recording position;

selecting a specified direction of origin with respect to the recording position; and

modifying the portion of the audio channel to obtain a portion of an enhanced audio signal,

wherein the portion of the audio channel is a time portion, a frequency portion or a time portion of a constant frequency interval of the audio channel,

wherein the portion of the enhanced audio signal is a time portion, a frequency portion or a time portion of a constant frequency interval of the enhanced audio signal, and

wherein the modifying comprises:

increasing the intensity of a first portion of the audio channel having a directional parameter indicating a direction of origin close to the specified direction of origin, relative to a second portion of the audio channel having a directional parameter indicating a direction of origin further away from the specified direction of origin.

An audio decoder for reproducing an audio signal having at least one audio channel and an associated directional parameter indicating the direction of origin of a portion of the audio channel with respect to a recording position, the portion of the audio channel being a time portion, a frequency portion or a time portion of a constant frequency interval of the audio channel, the audio decoder comprising:

a direction selector adapted to select a specified direction of origin with respect to the recording position; and

an audio portion modifier for modifying the portion of the audio channel to obtain a reproduced portion of the reproduced audio signal,

wherein the modifying comprises increasing the intensity of a first portion of the audio channel having a directional parameter indicating a direction of origin close to the specified direction of origin, relative to a second portion of the audio channel having a directional parameter indicating a direction of origin further away from the specified direction of origin.

An audio encoder for enhancing the directional perception of an audio signal, comprising:

a signal generator for deriving at least one audio channel and an associated directional parameter indicating the direction of origin of a portion of the audio channel with respect to a recording position; and

a signal modifier for modifying the portion of the audio channel to obtain a portion of an enhanced audio signal,

wherein the portion of the enhanced audio signal is a time portion, a frequency portion or a time portion of a constant frequency interval of the audio channel, and

wherein the modifying comprises increasing the intensity of a first portion of the audio channel having a directional parameter indicating a direction of origin close to the specified direction of origin, relative to a second portion of the audio channel having a directional parameter indicating a direction of origin further away from the specified direction of origin.

A system for enhancing an audio signal to be reproduced, comprising:

an audio encoder for deriving at least one audio channel and an associated directional parameter indicating the direction of origin of a portion of the audio channel with respect to a recording position; and

an audio decoder having an audio portion modifier for modifying the portion of the audio channel to obtain a reproduced portion of the reproduced audio signal,

wherein the reproduced portion of the audio signal is a time portion, a frequency portion or a time portion of a constant frequency interval of the reproduced audio signal, and

wherein the modifying comprises increasing the intensity of a first portion of the audio channel having a directional parameter indicating a direction of origin close to the specified direction of origin, relative to a second portion of the audio channel having a directional parameter indicating a direction of origin further away from the specified direction of origin.

A computer-readable recording medium having recorded thereon a program for executing the method of claim 1 or 15 on a computer system.

(Deleted)