KR20140010468A

KR20140010468A - System for spatial extraction of audio signals

Info

Publication number: KR20140010468A
Application number: KR1020147000011A
Authority: KR
Inventors: Gilbert Arthur Joseph Soulodre
Original assignee: Harman International Industries, Incorporated
Priority date: 2009-10-05
Filing date: 2010-10-04
Publication date: 2014-01-24
Also published as: CN102687536A; EP2486737A1; KR20120064104A; JP2013507048A; KR101387195B1; CA2774415C; US20110081024A1; US9372251B2; WO2011044064A1; EP2486737B1; JP5400225B2; CA2774415A1; CN102687536B

Abstract

A sound processing system receives an audio input signal that includes at least two different input channels of audio content. The sound processing system analyzes the audio input signal to separate the audible sound sources included in it into sound source vectors. The separation of the audible sound sources into sound source vectors may be based on the perceptual position of each audible sound source within a listener perceived sound stage. The sound source vectors may represent spatial slices across the listener perceived sound stage that can be processed individually and independently by the sound processing system. After processing, the sound source vectors can be selectively combined to form an audio output signal having output channels used to drive respective loudspeakers. Because the audible sound sources are separated and independent, each audible sound source can be included in any one or more of the output channels.

Description

System for spatial extraction of audio signals {SYSTEM FOR SPATIAL EXTRACTION OF AUDIO SIGNALS}

This application claims the benefit of priority from U.S. Provisional Application No. 61/248,770, filed October 5, 2009, which is incorporated herein by reference.

The present invention relates generally to audio systems and, more particularly, to a system for spatially extracting the content of an audio signal.

Generating audible sound from an audio signal with a sound system is well known. The audio signal may be a pre-recorded audio signal or a live audio signal. Upon receiving the audio signal, the sound system can process it and provide it, typically in amplified form, to loudspeakers to generate audible sound. An example of a live audio signal is a live stage performance by a singer and a band, such as an orchestra. An example of a pre-recorded audio signal is a compact disc or electronic data file on which a song by a singer and band has been recorded. Other audio sources may be provided in a similar way.

Compact discs, electronic data files, and other forms of audio signal storage are typically made from master recordings of audio sources, such as a singer and band performing in a studio or at a live concert venue. The singer and band may perform using microphones, amplifiers, and recording equipment that receive and capture the live music they produce. During recording, a sound mixing engineer may strategically position any number of microphones among the members of the band to pick up the desired live sound for recording. The recording equipment includes any number of input channels configured to receive live audio inputs from the microphones and from other instruments played by the band.

The sound mixing engineer then mixes or adjusts the channels on which the audio signals were received to obtain the desired overall sound of the singer and the band. The engineer may also remix or otherwise adjust the recorded audio to specify how the recording will sound when it is later played back. For example, the engineer may adjust the individual audio signals so that, when the recording is played through the loudspeakers of an audio system, the listener perceives the singer at a center point, the violin to the left of the singer, and the guitar to the right of the singer.

Audio systems can also receive an audio input signal of two or more channels, such as a stereo signal, and form more output channels than the number of input channels received. Such audio systems include the system known as "Logic 7™", manufactured by Harman International Industries, a corporation of Northridge, California. Such systems distribute the audio input signal to the output channels based on the phasing of the audio input signals with respect to each other.

A sound processing system may receive an audio input signal that includes at least two separate audio channels. The audio input signal may be analyzed to determine the perceptual positions of the audible sound sources, or audio sources, included in the audio input signal. The perceptual positions can be identified relative to a listener perceived sound stage. Conceptually, the listener perceived sound stage may be based on playback of the audio input signal through a stereo audio system, a surround sound audio system, or any other form of audio playback system capable of outputting audible sound that creates a listener perceived sound stage from the audio input signal.

The sound processing system may divide the listener perceived sound stage into any predetermined number of perceptual positions, which may also be called spatial slices. For example, where the audio input signal is a stereo input signal, the number of perceptual positions may equal the desired number of output audio channels, such as seven audio output channels representing a left front, right front, center, right side, left side, right rear, and left rear output channel. In addition, the audio input signal can be divided into a plurality of predetermined frequency bands, and the perceptual positions of the audible sound sources can be identified within those predetermined frequency bands.
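The following Python sketch illustrates, under stated assumptions, one simple way to divide a sound stage into a predetermined number of spatial slices and a frequency range into predetermined bands. It assumes a hypothetical position convention in which a perceptual position is a number from -1.0 (far left) through 0.0 (center) to +1.0 (far right), equal-width slices, and equal-width bands from 0 Hz to the Nyquist frequency. The functions `slice_index` and `band_edges` are illustrative names, not part of the described system.

```python
def slice_index(position: float, num_slices: int = 7) -> int:
    """Map a perceptual position to a 0-based spatial slice index.

    Assumed convention: -1.0 = far left, 0.0 = center, +1.0 = far right.
    """
    if num_slices < 1:
        raise ValueError("num_slices must be >= 1")
    # Clamp positions outside the stage (e.g. out-of-phase content) to the edges.
    p = min(max(position, -1.0), 1.0)
    # Scale [-1, 1] onto [0, num_slices) and truncate to a slice number.
    idx = int((p + 1.0) / 2.0 * num_slices)
    # p == 1.0 would land exactly on num_slices; fold it into the last slice.
    return min(idx, num_slices - 1)


def band_edges(sample_rate: float, num_bands: int) -> list[tuple[float, float]]:
    """Split 0 Hz .. Nyquist into equal-width (low, high) frequency bands."""
    nyquist = sample_rate / 2.0
    width = nyquist / num_bands
    return [(k * width, (k + 1) * width) for k in range(num_bands)]
```

With seven slices, a centered source maps to index 3, that is, the fourth slice, which is consistent with the central point 208 falling in the fourth spatial slice 224 in the example of FIG. 2. Real systems would more likely use perceptually spaced bands than equal-width ones.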

To separate the audio input signal into spatial slices, the sound processing system may determine and generate a gain vector for each spatial slice. Each gain vector includes gain values covering predetermined frequency bands within the total frequency range of the audio input signal. The gain values may be generated from the content of the audio input signal so that the audible sound sources included in it are separated into spatial slices according to their positions in the listener perceived sound stage. The gain vectors may be formed by a plurality of position filters that make up a position filter bank. In one example, the number of position filters in the position filter bank may correspond to the number of spatial slices and to the desired number of audio output channels.
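As a hedged illustration of a position filter bank, the sketch below builds one gain vector per spatial slice from per-band position estimates. The triangular, overlapping filter shape, the position convention (-1.0 left to +1.0 right), and the function name `position_filter_bank` are assumptions chosen for clarity; the text above does not specify the filter shapes.

```python
def position_filter_bank(positions: list[float], num_slices: int = 7) -> list[list[float]]:
    """Return one gain vector per spatial slice.

    positions[k] is an (assumed) perceptual-position estimate in [-1, 1]
    for frequency band k. Each slice's filter is a triangular window
    centered on the slice; neighbouring windows overlap so that, for every
    band, the gains across all slices sum to 1.
    """
    # Slice centers are evenly spaced across the stage.
    centers = [-1.0 + (2.0 * i + 1.0) / num_slices for i in range(num_slices)]
    spacing = 2.0 / num_slices
    gains = [[0.0] * len(positions) for _ in range(num_slices)]
    for k, pos in enumerate(positions):
        # Clamp to the outermost centers so edge bands still sum to 1.
        p = min(max(pos, centers[0]), centers[-1])
        for i, c in enumerate(centers):
            # Linear fall-off from 1 at the slice center to 0 one spacing away.
            gains[i][k] = max(0.0, 1.0 - abs(p - c) / spacing)
    return gains
```

Because the per-band gains sum to 1, summing all slices reproduces the input exactly; a hard 0-or-1 mask would be another possible design choice, at the cost of audible artifacts at slice boundaries.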

The position filter bank may be applied to the audio input signal to divide it into separate and independent sound source vectors, such that each spatial slice may include a corresponding sound source vector. Each sound source vector may include the portion of the audio input signal that represents one or more audible sound sources included in that spatial slice of the listener perceived sound stage.
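A minimal sketch of applying such gain vectors, assuming each element of a channel's spectrum stands for one predetermined frequency band and that the gains are applied to each input channel separately. `sound_source_vectors` and `reconstruct` are hypothetical helpers; when the per-band gains sum to 1 across slices, summing the sound source vectors recovers the original spectrum.

```python
Spectrum = list[complex]  # one channel's frequency-domain values, one per band (assumed layout)


def sound_source_vectors(spectrum: Spectrum, gains: list[list[float]]) -> list[Spectrum]:
    """Split one channel's spectrum into one sound source vector per spatial slice."""
    vectors = []
    for slice_gains in gains:
        if len(slice_gains) != len(spectrum):
            raise ValueError("each gain vector needs one gain per frequency band")
        # Per-band multiplication: the gain vector acts as a frequency-domain filter.
        vectors.append([g * x for g, x in zip(slice_gains, spectrum)])
    return vectors


def reconstruct(vectors: list[Spectrum]) -> Spectrum:
    """Sum all sound source vectors band by band."""
    return [sum(bands) for bands in zip(*vectors)]
```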

The sound source vectors can be processed independently by the audio processing system. The processing may include classifying the audible sound sources included in each sound source vector. For example, classification may identify an audible sound source in a first sound source vector, in a first spatial slice, as a musical instrument such as a trumpet, and identify an audible sound source in a second sound source vector, in a second spatial slice, as a human voice. The processing may also include equalization, delay, or any other sound processing technique.

After processing, the sound source vectors can be combined to form an audio output signal containing a number of audio output channels from which loudspeakers can be driven. The combining may include summing sound source vectors, splitting sound source vectors, simply passing sound source vectors through as audio output channels, or any other cooperative use of the sound source vectors that produces an audio output signal containing a number of audio output channels.
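The combining step can be pictured as a routing matrix. The sketch below is an assumption-based illustration (the routing matrix and the function `combine` are hypothetical) showing how one matrix covers summing vectors, splitting a vector across channels, and passing a vector straight through.

```python
def combine(vectors: list[list[complex]], routing: list[list[float]]) -> list[list[complex]]:
    """Mix sound source vectors into output channels.

    routing[c][s] is the (hypothetical) weight with which sound source
    vector s contributes to output channel c. A row with a single 1.0
    passes a vector straight through; a column with several non-zero
    weights splits one vector across channels; a row with several
    non-zero weights sums vectors together.
    """
    if not vectors:
        raise ValueError("need at least one sound source vector")
    num_bands = len(vectors[0])
    outputs = []
    for weights in routing:
        if len(weights) != len(vectors):
            raise ValueError("each routing row needs one weight per sound source vector")
        channel = [0j] * num_bands
        for w, vec in zip(weights, vectors):
            for k, x in enumerate(vec):
                channel[k] += w * x
        outputs.append(channel)
    return outputs
```

For example, routing a left, center, and right sound source vector into two outputs with rows [1.0, 0.5, 0.0] and [0.0, 0.5, 1.0] passes the outer vectors through and splits the center vector equally between both channels. Equal 0.5 weights keep the coherent sum constant; a constant-power split would use about 0.707 instead.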

Other systems, methods, features, and advantages of the invention will be, or will become, apparent to one skilled in the art upon examination of the following figures and detailed description. All such additional systems, methods, features, and advantages are intended to be included within this description, to be within the scope of the invention, and to be protected by the following claims.

The invention may be better understood with reference to the following drawings and description. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. Moreover, in the figures, like reference numerals designate corresponding parts throughout the different views.
FIG. 1 is a block diagram of an example audio system that includes an audio processing system.
FIG. 2 is an example of a listener perceived sound stage.
FIG. 3 is another example of a listener perceived sound stage.
FIG. 4 is a graph illustrating an example relationship between an estimated perceptual position and a listener perceived sound stage.
FIG. 5 is an example of a position filter bank.
FIG. 6 is an example of a listener perceived sound stage and a plurality of gain vectors in a plurality of spatial slices.
FIG. 7 is an example block diagram of the audio processing system of FIG. 1.
FIG. 8 is another example block diagram of the audio processing system of FIG. 1.
FIG. 9 is another example block diagram of the audio processing system of FIG. 1.
FIG. 10 is another example of a listener perceived sound stage.
FIG. 11 is an example operational flow diagram of the audio processing system of FIG. 1.
FIG. 12 is a second part of the operational flow diagram of FIG. 11.

FIG. 1 is an example audio system 100 that includes an audio processing system 102. The audio system 100 may also include at least one audio content source 104, at least one amplifier 106, and a plurality of loudspeakers 108. The audio system 100 may be any system capable of producing audible audio content. Examples of the audio system 100 include a vehicle audio system; a stationary consumer audio system such as a home theater system; an audio system for a multimedia system such as a movie theater or television; a multi-room audio system; a public address system such as in a stadium or convention center; an outdoor audio system; or an audio system in any other venue where reproduction of audible audio sound is desired.

The audio content source 104 may be any form of one or more devices capable of generating and outputting different audio signals on at least two channels. Examples of the audio content source 104 include a media player such as a compact disc or video disc player, a video system, a radio, a cassette tape player, a wireless or wired communication device, a navigation system, a personal computer, a codec such as an MP3 player or an IPOD™, or any other form of audio-related device capable of outputting different audio signals on at least two channels.

In FIG. 1, the audio content source 104 produces two or more audio signals on respective audio input channels 110 from source material such as pre-recorded audible sound. The audio signals may be the audio input signal produced by the audio content source 104, and may be analog signals based on analog source material or digital signals based on digital source material. Accordingly, the audio content source 104 may include signal conversion capability, such as an analog-to-digital or digital-to-analog converter. In one example, the audio content source 104 may produce a stereo audio signal consisting of two substantially different audio signals, representing a right and a left channel, provided on two audio input channels 110. In another example, the audio content source 104 may produce more than two audio signals on more than two audio input channels 110, such as 5.1, 6.1, or 7.1 channels, or any other number of different audio signals produced on an equal number of audio input channels 110.

The amplifier 106 may be any circuit or standalone device that receives an audio input signal of relatively small amplitude and outputs a similar audio signal of relatively larger amplitude. Two or more audio input signals may be received on two or more amplifier input channels 112 and output on two or more audio output channels 114. In addition to amplifying the audio signals, the amplifier 106 may include signal processing capability to shift phase, adjust frequency equalization, adjust delay, or perform any other form of manipulation or adjustment of the audio signals. The amplifier 106 may also include the capability to adjust the volume, balance, and/or fade of the audio signals provided on the audio output channels 114. In an alternative example, the amplifier may be omitted, such as when the loudspeakers 108 are in the form of a set of headphones or when the audio output channels serve as inputs to another audio device. In yet another example, the loudspeakers 108 may include an amplifier, such as when the loudspeakers 108 are standalone loudspeakers.

The loudspeakers 108 may be positioned in a listening space, such as a room, a vehicle, or any other space in which the loudspeakers 108 can be operated. The loudspeakers 108 may be of any size and may operate over any range of frequencies. Each audio output channel 114 may supply a signal to drive one or more loudspeakers 108. Each loudspeaker 108 may include a single transducer or multiple transducers. The loudspeakers 108 may also operate in different frequency ranges, for example as a subwoofer, woofer, midrange, or tweeter. Two or more loudspeakers 108 may be included in the audio system 100.

The audio processing system 102 may receive the audio input signals from the audio content source 104 on the audio input channels 110. After processing, the audio processing system 102 provides the processed audio signals on the amplifier input channels 112. The audio processing system 102 may be a separate unit or may be combined with the audio content source 104, the amplifier 106, and/or the loudspeakers 108. In another example, the audio processing system 102 may communicate over a network or communication bus to interface with the audio content source 104, the audio amplifier 106, the loudspeakers 108, and/or any other device or mechanism, including other audio processing systems 102.

One or more audio processors 118 may be included in the audio processing system 102. The audio processor 118 may be one or more computing devices capable of processing audio and/or video signals, such as a computer processor, microprocessor, digital signal processor, or any other device, series of devices, or other mechanism capable of performing logical operations. The audio processor 118 may operate in association with a memory 120 to execute instructions stored in the memory. The instructions may be in the form of software, firmware, computer code, or some combination thereof, and, when executed by the audio processor 118, may provide the functionality of the audio processing system 102. The memory 120 may be any form of one or more data storage devices, such as volatile memory, non-volatile memory, electronic memory, magnetic memory, optical memory, or any other form of data storage device. In addition to instructions, operating parameters and data may also be stored in the memory 120. The audio processing system 102 may also include electronic, electromechanical, or mechanical devices, such as devices for conversion between analog and digital signals, filters, a user interface, a communication port, and/or any other operational functionality, accessible to a user and/or programmer within the audio system 100.

During operation, the audio processing system 102 receives and processes the audio input signal. In general, while processing the audio input signal, the audio processor 118 identifies a plurality of perceptual positions, one for each of a plurality of audible sound sources represented within the audio input signal. The perceptual positions represent the physical locations of the respective audible sound sources within the listener perceived sound stage. Thus, if a listener were present at a live performance on an actual stage, the perceptual positions would align with the on-stage locations of the performers, such as guitarists, drummers, singers, and any other performers or objects producing sound within the audio signal.

The audio processor 118 decomposes the audio input signal into a set of spatial audio streams, or spatial slices, each containing audio content from at least one respective perceptual position. Any sound sources co-located within a given perceptual position may be included in the same spatial audio stream. Any number of different spatial audio streams may be created across the listener perceived sound stage. The spatial audio streams may be processed independently by the audio processor 118.

During operation, the audio processor 118 may generate a plurality of filters, one for each of a plurality of output channels, based on the identified perceptual positions of the respective audible sound sources. The audio processor 118 may apply the filters to the audio input signal to generate the spatial audio streams. The spatial audio streams may be processed independently. After processing, the spatial audio streams may be assembled or otherwise recombined to generate an audio output signal having a plurality of respective audio output channels. The audio output channels are provided on the amplifier input lines 112. The audio processing system 102 may provide more or fewer audio output channels than the number of input channels included in the audio input signal. Alternatively, the audio processing system 102 may provide the same number of audio output channels as there are input channels.

FIG. 2 is an example illustrating perception across a listener perceived sound stage 200 formed with a stereo system configuration that receives an audio input signal, such as a stereo audio input signal. In FIG. 2, a left loudspeaker 202 and a right loudspeaker 204 are driven by the respective left and right channels of an audio content source to produce sound that is received by a listener at a listening position 206. In other examples, additional channels and respective loudspeakers, loudspeaker locations, and additional or differently sized listening positions may be illustrated.

In FIG. 2, the listening position 206 is located at a central point 208 substantially between the loudspeakers 202, 204, so that the distances to each loudspeaker 202, 204 are substantially equal. In this example, three factors may combine to allow the listener to determine the perceptual position of any number of sound sources within the listener perceived sound stage 200 from the audible sound emitted by the loudspeakers 202, 204. The factors are the relative amplitude level of a sound source in the left and right channels, the relative delay (time of arrival) of the sound source in the left and right channels, and the relative phase of the sound source in the left and right channels.
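The three cues named above can be measured for a single frequency bin of the left and right spectra. The sketch below is illustrative only: it assumes complex, STFT-style bin values and derives the delay from the phase difference at that bin's frequency, which is ambiguous by whole periods at higher frequencies. `interchannel_cues` is a hypothetical name.

```python
import cmath
import math


def interchannel_cues(left: complex, right: complex, freq_hz: float) -> tuple[float, float, float]:
    """Return (level difference in dB, phase difference in rad, delay in s) for one bin."""
    if left == 0 or right == 0:
        raise ValueError("both channels need non-zero energy in this bin")
    # Relative amplitude level: > 0 dB means the source is louder on the left.
    level_db = 20.0 * math.log10(abs(left) / abs(right))
    # Relative phase of left with respect to right, in [-pi, pi].
    phase = cmath.phase(left * right.conjugate())
    # Relative delay implied by that phase: > 0 means the right channel lags.
    # Only unambiguous while the true delay is under half a period at freq_hz.
    delay_s = phase / (2.0 * math.pi * freq_hz)
    return level_db, phase, delay_s
```

For a bin at 1 kHz where the right channel is half as loud and 0.1 ms late, this returns about +6.02 dB, 0.2π rad, and 0.0001 s.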

If a sound source is perceived at the listening position 206 as louder in the left channel (left loudspeaker 202), the listener will perceive the sound source as located at a first perceptual position (S1) 210 in the listener perceived sound stage 200 that is closer to the left loudspeaker 202. Similarly, if a sound source arrives at the listening position 206 first from the right loudspeaker 204, the sound source will be perceived as located in the listener perceived sound stage 200 at a second perceptual position (S2) 212 that is closer to the right loudspeaker 204. Thus, depending on loudness and time of arrival, the listener may perceive different sound sources as occupying different perceptual positions in the listener perceived sound stage 200. In addition, if the loudspeakers 202, 204 are driven by audio signals with a significant phase shift between them, a sound source may be perceived as located at a third perceptual position (S3) 214 beyond the right loudspeaker 204. FIG. 2 is a simple illustration of a few example locations of sound sources within the listener perceived sound stage 200; in other examples, any number of sound sources located at any number of perceptual positions may be present.
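The level cue is what ordinary amplitude panning exploits. The sketch below uses the common sine/cosine (constant-power) pan law, a standard mixing technique rather than anything this document prescribes, to show how unequal left and right gains move a source toward the louder loudspeaker. The position convention (-1.0 left, +1.0 right) is an assumption.

```python
import math


def constant_power_pan(position: float) -> tuple[float, float]:
    """Sine/cosine pan law: position -1 (left) .. +1 (right) -> (left_gain, right_gain)."""
    p = min(max(position, -1.0), 1.0)
    theta = (p + 1.0) * math.pi / 4.0  # 0 at hard left, pi/2 at hard right
    # cos^2 + sin^2 == 1, so total power stays constant as the source moves.
    return math.cos(theta), math.sin(theta)
```

At position 0.0 both gains are about 0.707 and the source images at the center; as the position moves right, the right gain grows, which corresponds to the louder-channel case that places S2 212 nearer the right loudspeaker. Placing a source beyond a loudspeaker, like S3 214, relies on the phase cue instead and is not captured by this law.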

In FIG. 2, the listener perceived sound stage 200 is divided into seven zones, also referred to as spatial slices or perceptual positions 218, 220, 222, 224, 226, 228, 230. In other examples, the listener perceived sound stage 200 may be divided into any other number of perceptual positions. In FIG. 2, the audio processor 118 estimates that the first perceptual position (S1) 210 is located in the third spatial slice 222, that the second perceptual position (S2) 212 is located in the fifth spatial slice 226, and that the central point 208 is located in the fourth spatial slice 224.

FIG. 3 is another example of a listener perceived sound stage 300 decomposed into spatial slices. The sound stage 300 is formed with a surround sound system configuration that receives a multichannel audio signal, such as a 5.1, 6.1, 7.1, or other surround sound audio signal. In FIG. 3, a left speaker 302, a right speaker 304, a center speaker 306, a left side speaker 308, a right side speaker 310, a left rear speaker 312, and a right rear speaker 314 are positioned away from a listener position 316. Because the loudspeakers 302, 304, 306, 308, 310, 312, 314 are arranged in a circle, the listening position 316 is located at substantially the center of that circle. In other examples, any other number of loudspeakers and/or loudspeaker locations, as well as other listener positions, may be illustrated.

In FIG. 3, seven spatial slices or perceptual locations 320, 322, 324, 326, 328, 330, and 332, corresponding respectively to the loudspeakers 302, 304, 306, 308, 310, 312, and 314, surround the listener position 316. In other examples, any number of spatial slices may be used. The width of each spatial slice may also differ between examples; for instance, spatial slices may overlap or be spaced apart within the listener perceived sound stage 300.

The audio processor 118 may estimate that a first perceived sound source S1 (336) within the listener perceived sound stage 300 lies in the third spatial slice 324, and that a second perceived sound source S2 (338) lies in the sixth spatial slice 330. In other examples, any number of perceived sound sources may be located in the spatial slices 320, 322, 324, 326, 328, 330, and 332.

The location of a sound source within the listener perceived sound stage may be estimated by comparing the relative amplitude, phase, and arrival time of the channels of the audio input signal. For a stereo audio input signal consisting of a right channel (R) and a left channel (L), the audio processor computes the estimated location using Equation 1.

In Equation 1, S(ω) is the estimated location in the respective listener perceived sound stage 300. L(ω) is the complex representation (real and imaginary components) of the left audio input signal in the frequency domain, R(ω) is the complex representation (real and imaginary components) of the right audio input signal in the frequency domain, and B is a balance function. V_L(ω) and V_R(ω) are separate complex vectors (real and imaginary components), each of unit magnitude. V_L(ω) and V_R(ω) may be used to apply a frequency-dependent delay to L(ω) and R(ω). The delay, and therefore the values of V_L(ω) and V_R(ω), may be chosen to offset any difference in the arrival time of a given sound source in the left (L) and right (R) input channels. V_L(ω) and V_R(ω) can therefore be used to time-align a given sound source across the two input channels. Alternatively, the delay provided by V_L(ω) and V_R(ω) can be applied in the time domain before the left and right audio input signals are converted to the frequency domain. The variable ω denotes a frequency or a frequency range.

The balance function may be used to identify whether a sound source in the listener perceived sound stage lies to the left or to the right of the center of the sound stage. The balance function B may be expressed by Equation 2.

In Equation 2, A represents the audio processor 118's comparison of the amplitude of the left audio input signal (L) with the amplitude of the right audio input signal (R). In one example, the audio processor 118 may set A to 1 when the amplitude of the left audio input signal is greater than that of the right audio input signal. It may set A to 0 when the two amplitudes are equal, and to -1 when the amplitude of the left audio input signal is less than that of the right audio input signal.
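The two ingredients defined around Equations 1 and 2 can be sketched in NumPy: the unit-magnitude time-alignment vectors and the amplitude comparison A. This is a minimal sketch, not the patent's implementation, and it makes two assumptions. First, the frequency-dependent delay is modeled as a pure delay τ, so that V(ω) = e^(-jωτ), which has the unit magnitude the text requires. Second, A is computed separately for each frequency bin. The function names are hypothetical, and Equation 1 itself is not reproduced here.

```python
import numpy as np


def delay_vector(freqs_hz, delay_s):
    """Unit-magnitude complex vector e^{-j 2 pi f tau}.

    Multiplying a DFT spectrum by this vector delays the underlying signal by
    delay_s seconds (circularly), one way to realise V_L(w) or V_R(w).
    """
    return np.exp(-2j * np.pi * np.asarray(freqs_hz, dtype=float) * delay_s)


def amplitude_comparison(left_spec, right_spec):
    """Per-bin A: +1 where |L| > |R|, 0 where equal, -1 where |L| < |R|."""
    diff = np.abs(left_spec) - np.abs(right_spec)
    return np.sign(diff).astype(int)


# Example: the right channel is the left channel delayed by 3 samples.
fs, n = 48_000, 64
rng = np.random.default_rng(0)
x = rng.standard_normal(n)
L = np.fft.fft(x)
R = np.fft.fft(np.roll(x, 3))        # circular 3-sample delay
freqs = np.fft.fftfreq(n, d=1 / fs)
V_L = delay_vector(freqs, 3 / fs)    # delay L by the same 3 samples
aligned_L = L * V_L                  # now time-aligned with R
```

Delaying the earlier channel and advancing the later channel are equivalent ways to offset the arrival-time difference. Applying a negative `delay_s` to R would align the pair just as well.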

When there are more than two input channels, such as in a five- or six-channel surround audio source, Equation 3 may be used instead of Equations 1 and 2 to account for all of the input channels:

$$S(\omega) = \frac{2}{\pi}\,\angle\!\left(\sum_{k=1}^{C} M_k(\omega)\,V_k(\omega)\right) \qquad (3)$$

In Equation 3, S(ω) is the estimated location in the respective listener perceived sound stage 300. M_k(ω) is the complex representation (real and imaginary components) of the k-th audio input signal in the frequency domain, and V_k(ω) is a complex direction vector (real and imaginary components). C is an integer greater than 1 giving the number of input channels; for a five-channel surround audio source, C = 5. The direction vectors V_k(ω) may be chosen to represent the speaker angles intended for the multichannel input signal. Take a five-channel input signal as an example. It is reasonable to assume that the signal was produced for a typical playback configuration with a center speaker straight ahead at 0 degrees, left and right speakers at ±30 degrees, and left and right rear surround speakers at ±110 degrees.

For this configuration, a suitable choice of direction vectors is V_Center(ω) = 1 + 0i, V_Left(ω) = 0.866 + 0.5i, V_Right(ω) = 0.866 - 0.5i, V_LeftSurround(ω) = -0.342 + 0.940i, and V_RightSurround(ω) = -0.342 - 0.940i. Here i is the imaginary unit (the square root of -1), and each vector equals cos θ + i sin θ for its speaker angle θ. Equation 3 sums the contributions of every input channel to the composite sound field to obtain a single composite signal vector, which is complex valued (real and imaginary components). The angle function in Equation 3 computes the angle of this composite vector. In this example, the angle is measured so that the center channel speaker corresponds to 0 degrees; in other examples, 0 degrees may be placed elsewhere. The factor 2/π scales S(ω) so that it lies between -2 and +2. Equation 3 can be used for input signals with two or more channels.
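Equation 3 and the five-channel direction vectors translate directly into a few lines of NumPy. The sketch below is illustrative, not the patent's implementation. It assumes each channel spectrum M_k(ω) is already available per frequency bin, builds each V_k as the unit phasor at its nominal speaker angle, and uses `np.angle` for the angle function. The names `SPEAKER_ANGLES_DEG`, `direction_vectors`, and `estimate_location` are hypothetical.

```python
import numpy as np

# Nominal speaker angles (degrees) for the five-channel layout described above.
SPEAKER_ANGLES_DEG = {"C": 0.0, "L": 30.0, "R": -30.0, "LS": 110.0, "RS": -110.0}


def direction_vectors(angles_deg):
    """V_k = cos(theta_k) + i sin(theta_k); e.g. L -> 0.866 + 0.5i."""
    return {name: np.exp(1j * np.deg2rad(a)) for name, a in angles_deg.items()}


def estimate_location(channel_spectra, dirs):
    """S(w) = (2/pi) * angle(sum_k M_k(w) V_k(w)), evaluated per frequency bin."""
    composite = sum(np.asarray(channel_spectra[name]) * v for name, v in dirs.items())
    return (2.0 / np.pi) * np.angle(composite)


dirs = direction_vectors(SPEAKER_ANGLES_DEG)
# Three frequency bins: bin 0 only in C, bin 1 only in L, bin 2 only in LS.
spectra = {
    "C": np.array([1.0, 0.0, 0.0]),
    "L": np.array([0.0, 1.0, 0.0]),
    "R": np.zeros(3),
    "LS": np.array([0.0, 0.0, 1.0]),
    "RS": np.zeros(3),
}
S = estimate_location(spectra, dirs)
```

With these vectors, a source panned toward the left yields a positive S(ω): 1/3 for the left speaker at 30 degrees and 11/9 for the left surround speaker at 110 degrees. To get the negative-is-left convention of FIG. 4, negate the speaker angles.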

Alternatively, multiple input channels may be split into pairs, and Equations 1 and 2 applied to each pair, creating several separate perceived sound stages. For example, sound stages may be created between left front and right front, between left front and left side, between left side and left rear, and so on. In still other examples, an audio source with three or more input channels may be downmixed to a two-channel audio source, for instance by downmixing a five- or six-channel surround source to two-channel stereo. After extraction and processing, the audio can be upmixed again to two or more audio output channels.

FIG. 4 is an example graph showing the relationship between the estimated location S(ω) 402 and a listener perceived sound stage 404, such as the listener perceived sound stage 200 of FIG. 2. The listener perceived sound stage may be divided into a number of predetermined zones, each with position values within a predetermined range. In FIG. 4, the position values of the sound stage 404 span the predetermined range -2 to +2. The zones are:

- a center location 406 at the middle of the sound stage 404, identified as the zero position zone;
- a left side location 408, identified as the -1 position zone;
- a far left location 410, identified as the -2 position zone;
- a right side location 412, identified as the +1 position zone;
- a far right location 414, identified as the +2 position zone.

In other examples, other listener perceived sound stages, such as the one shown in FIG. 3, may be used. Different ranges of position values may also be used to identify zones across the listener perceived sound stage, and there may be more or fewer zones.

In FIG. 4, the estimated perceptual location S(ω) 402 is computed to lie between -2 and +2 so that it corresponds to locations in the listener perceived sound stage 404. In other examples, other values may be used to represent S(ω) 402. The value of S(ω) 402 is computed with Equation 1 and is positive, negative, or zero depending on the amplitude comparison A (Equation 2).
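The text does not state where the boundaries between the FIG. 4 zones fall. The sketch below assumes that each estimate S(ω) is assigned to the nearest integer position zone. That assumption, the zone labels, and the function name are all hypothetical.

```python
import numpy as np

# Zone labels from FIG. 4, keyed by integer position value.
ZONES = {-2: "far left", -1: "left side", 0: "center", 1: "right side", 2: "far right"}


def position_zone(s):
    """Assign each estimate S(w) to the nearest integer zone in [-2, 2].

    np.rint rounds exact halves to the nearest even integer, so 0.5 -> 0
    and 1.5 -> 2; values outside [-2, 2] are clipped to the end zones.
    """
    return np.clip(np.rint(s), -2, 2).astype(int)


zones = position_zone(np.array([-1.9, -0.7, 0.2, 0.6, 2.3]))
labels = [ZONES[z] for z in zones]
```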

Operation and signal processing in the audio system may take place in the frequency domain or in the time domain, based on analysis of the audio input signal. For brevity, this discussion focuses mainly on frequency-domain implementations. Time-domain implementations, and combined time- and frequency-domain implementations, are also possible and within the scope of the system.

The audio input signal can be converted to a frequency-domain representation by applying overlapping window analysis to blocks of time samples. The samples are then transformed with a discrete Fourier transform (DFT), a wavelet transform, or another transform process. Each block of time samples may be called an instant in time, or snapshot, of the audio input signal, and an instant in time or snapshot can be any predetermined time period or time window. The audio input signal can therefore be divided into snapshots, that is, a sequence of contiguous or non-contiguous segments. Each segment has a start time and an end time separated by a predetermined amount of time. The end time of one segment may adjoin the start time of the next segment so that the segments are arranged end to end.

In one example, each segment may represent a time window or snapshot about 10 milliseconds long; snapshots typically last about 5 to 50 milliseconds. In the frequency domain, each snapshot of the audio input signal can be divided into a plurality of frequency bins across a predetermined frequency spectrum. The frequency bins may each have a predetermined size, such as about 50 Hz, and together cover a predetermined frequency range, such as the audible range of 0 Hz to 24 kHz. For example, with a predetermined sample rate of 48 kHz and a predetermined number of bins such as 1024, each bin may have a bandwidth of 46.875 Hz (48,000 Hz / 1024). In other examples, the audio processing system may change the bin size dynamically and automatically based on the sample rate of the audio input signal. Suppose, for instance, that the audio input signal is a digital signal that may be sampled at 44.1 kHz, 48 kHz, 88.2 kHz, or 96 kHz. The audio processing system can then detect the sample rate and adjust the frequency bin size so that it operates at the sample rate of the audio input signal.
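The snapshot and frequency-bin analysis described above can be sketched as a short-time Fourier transform. This is one possible realization, not the patent's code. The Hann window, the 50% overlap, and the zero-padded 1024-point DFT are assumptions, since the text only specifies overlapping windows, roughly 10 ms snapshots, and a 1024-bin example at 48 kHz. The function name is hypothetical.

```python
import numpy as np


def snapshots_to_spectra(x, fs, snapshot_ms=10.0, overlap=0.5, n_fft=1024):
    """Split x into overlapping windowed snapshots and return their DFTs.

    Returns (spectra, bin_width_hz); spectra has shape
    (n_snapshots, n_fft // 2 + 1). Assumes len(x) is at least one snapshot
    long and that the snapshot is no longer than n_fft samples.
    """
    frame_len = int(round(fs * snapshot_ms / 1000.0))   # 480 samples at 48 kHz
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    spectra = np.fft.rfft(frames, n=n_fft, axis=-1)     # zero-pads each frame to n_fft
    return spectra, fs / n_fft


fs = 48_000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000.0 * t)                      # one second of a 1 kHz tone
spectra, bin_width = snapshots_to_spectra(x, fs)
peak_bin = int(np.argmax(np.abs(spectra[0])))
```

Because the 480-sample snapshot is shorter than the 1024-point transform, each frame is zero-padded. The bin spacing is therefore fs / n_fft = 46.875 Hz, and a real-valued input yields 513 non-redundant bins from 0 Hz up to 24 kHz.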

In one embodiment, there may be 1024 frequency bins across the audible range of 0 Hz to 24 kHz. Alternatively, a snapshot of the audio input signal may be divided into frequency bands in the time domain using a bank of parallel band-pass filters. Based on Equations 1 and 2, the audio input signal may also be divided into a predetermined number of perceptual locations or spatial slices across the listener perceived sound stage. Each perceptual location then contains its share of the divided audio input signal.

FIG. 5 shows an example position filter bank 500 generated by the audio processing system 102 based on the listener perceived sound stage and Equations 1 and 2. In FIG. 5, six position filters are shown. The number of position filters may match the number of audio output channels in the audio output signal that drive the loudspeakers. Alternatively, any number of filters may be used to generate a corresponding number of output channels for further processing or use before the audio output channels that drive the loudspeakers are formed. Thus any number of position filters may be used. Their output channels may be further processed and then combined or split to match the number of audio output channels used to drive the loudspeakers. For example, an audible sound source in the audio input signal may not lie at a location in the listener perceived sound stage that corresponds to an audio output channel. In that case, signals may be generated for the audio output channels on either side of that location. In another example, an audible sound source in the audio input signal may lie at a location that corresponds to two or more audio output channels. The signal may then be duplicated in those two or more channels.

In FIG. 5, the position filters may correspond to output channels of the audio output signal. The position filters therefore include a center channel output filter 502, a right front output filter 504, a left front output filter 506, a right side output filter 508, a left side output filter 510, a right rear output filter 512, and a left rear output filter 514. In this example, the output filters 502, 504, 506, 508, 510, 512, and 514 may correspond to output channels that drive respective loudspeakers. These can be the loudspeakers designated center, right front, left front, right side, left side, right rear, and left rear in a surround sound audio system. They can also include one or more speakers that create a perception of height above or below the listener's ears, or any other speakers that provide a desired effect. In other examples, the output filters 502, 504, 506, 508, 510, 512, and 514 may correspond to intermediate output channels that are further processed to ultimately become part of two or more audio output channels. In still other examples, fewer or more position filters may be provided and used as needed.

The position filter bank 500 has a first axis, identified as the gain axis 518, and a second axis, identified as the estimated perceptual location axis 520, which corresponds to the estimated perceptual location S(ω) (FIG. 4). In FIG. 5, the gain axis 518 is the vertical axis and the estimated perceptual location axis 520 is the horizontal axis.

Each filter may be constructed from the estimated perceptual location of a sound source across the listener perceived sound stage and implemented by the audio processor 118. The audio processor 118 may compute the filters in the frequency domain or in the time domain based on analysis of the audio input signal. In the frequency domain, the estimated perceptual location can be computed with Equations 1 and 2. As described above, in one example the computed estimated perceptual location may take a value between -2 and +2; in other examples, any other range of values may be used. A corresponding gain value can then be determined from each computed estimated perceptual location.

In FIG. 5, the crossover point 524 lies at a gain of about 0.5 on the gain axis 518. The crossover point 524 may mark the start of a transition of sound energy away from a first location and toward a second location. For position filters that represent output channels, the crossover point 524 may indicate a transition of sound energy between a first output channel and a second output channel. In other words, in this example, as the gain in one channel decreases, the gain in the other channel may increase correspondingly. At any given instant, the sound output to adjacent output channels can therefore be allocated between those channels based on the computed estimated perceptual location values. For example, the center channel output filter 502 has a gain of 1 when the computed estimated perceptual location is 0. When the computed estimated perceptual location is -0.5, the gain of the center channel output filter 502 is about 0.15 and the gain of the left front channel output filter 506 is about 0.85. The crossover point 524 can be adjusted through the filter structure, which is characterized by the slopes of the lines representing each position filter.
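A position filter bank of this kind can be sketched with triangular filters that crossfade linearly between neighbors. Adjacent filters then cross at a gain of 0.5, and the gains for any S(ω) sum to 1. This is a simplified sketch under stated assumptions, not the exact curves of FIG. 5. It assumes seven evenly spaced filter centers across -2 to +2 and a hypothetical left-to-right ordering of the filters, with negative S on the left as in FIG. 4. Because its slopes differ from the figure's, it does not reproduce the example gains of 0.15 and 0.85 at S = -0.5. `FILTER_NAMES` and `position_filter_gains` are hypothetical names.

```python
import numpy as np

# Hypothetical ordering of the seven FIG. 5 filters along the S(w) axis,
# assuming negative S is to the listener's left as in FIG. 4.
FILTER_NAMES = ["left rear", "left side", "left front", "center",
                "right front", "right side", "right rear"]


def position_filter_gains(s, centers):
    """Triangular (linear crossfade) position filter gains.

    Each filter has gain 1 at its own center and falls linearly to 0 at the
    neighbouring centers, so adjacent filters cross at 0.5 and the gains sum
    to 1 for any s inside the range. Assumes evenly spaced centers.
    Returns shape (7,) for scalar s, or (len(s), 7) for an array of s.
    """
    centers = np.asarray(centers, dtype=float)
    spacing = centers[1] - centers[0]
    s = np.clip(np.asarray(s, dtype=float), centers[0], centers[-1])
    gains = 1.0 - np.abs(np.expand_dims(s, -1) - centers) / spacing
    return np.clip(gains, 0.0, 1.0)


centers = np.linspace(-2.0, 2.0, len(FILTER_NAMES))
g_center = position_filter_gains(0.0, centers)          # all energy in "center"
g_between = position_filter_gains(-1.0 / 3.0, centers)  # halfway: left front / center
```

Steeper slopes narrow each filter and move the crossover point, while overlaps spanning three or more filters would need a different kernel than this triangle.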

Thus, by computing the estimated perceptual location 520 at an instant in time, the audio processing system can produce the corresponding gain values of the output filters for that same instant. As described above, the audio input signal is divided into frequency bands. The gain values are therefore computed within each frequency band by applying Equations 1 and 2 to the portion of the audio input signal in that band to obtain the estimated perceptual location 520. The crossover point 524 shown in FIG. 5 may occur at gain values other than 0.5. In the example of FIG. 5, each of the position filters 502, 504, 506, 508, 510, 512, and 514 overlaps only its adjacent filters. Other position filter structures with more or less overlap between adjacent filters may be used. For instance, a structure can be designed in which three or more position filters have non-zero gains for a given estimated perceptual location S(ω) of a sound source across the listener perceived sound stage. Additionally or alternatively, the gain values of the position filters may be positive or negative.

FIG. 6 is an example diagram of a listener perceived sound stage 600 at an instant in time. It shows a predetermined number (x) of perceptual locations or spatial slices 602 across the listener perceived sound stage 600. Seven spatial slices are shown, but any number (x) of spatial slices 602 is possible. In FIG. 6, the listener perceived sound stage 600 includes a left loudspeaker 604 and a right loudspeaker 606 arranged roughly symmetrically about a center 608. In other examples, other configurations of the listener perceived sound stage may be implemented, such as the surround sound stage shown in FIG. 3.

As described above, Equations 1 and 2 are applied to an audio input signal divided into predetermined frequency bands or frequency bins. Gain values can then be derived from the computed estimated perceptual location values, as described above. The gain values may be included in position filters, represented by a gain location vector 610 for each of the spatial slices 602. Each gain location vector 610 may contain gain values 612, for example ranging from 0 to 1.

In FIG. 6, the gain values 612 are denoted Gsn. Here "s" is the spatial slice number, and "n" is the frequency band position within each gain location vector 610, corresponding to the frequency bin number. Each gain location vector 610 may represent, in the vertical direction, the frequency range of the audio input signal within one of the spatial slices 602. That range runs from a first predetermined frequency (f1) to a second predetermined frequency (f2). The number of gain values 612 in each gain location vector 610 may correspond to the number of frequency bins (Bn) into which the audio input signal is divided. As described above, the audio input signal may be divided into a predetermined number of frequency bins, such as 1024, across a predetermined frequency range (f1 to f2), such as 0 Hz to 20 kHz. Thus, in one example, each gain location vector 610 may contain 1024 gain values 612 (n = 0 to 1023) across a frequency range of 0 Hz to 24 kHz. This gives one gain value for each predetermined portion of the bandwidth (or frequency bin), such as increments about 46.875 Hz wide when the audio input signal is sampled at 48 kHz.

In operation, the audio input signal may be applied to the gain position filters. At each instant in time, each gain value 612 in each gain location vector 610 may be multiplied by the portion of the audio input signal (In) in the corresponding frequency bin (Bn):

$$S_{sn} = G_{sn} \cdot I_n$$

where S_sn is the sound source value of spatial slice number "s" for frequency bin number "n".

The resulting sound source vector (Ss), formed from the array of sound source values (Ssn) in each spatial slice, places each sound source in its spatial slice 602 for that instant in time. Like the gain values, the "n" sound source values of a sound source vector (Ss) are distributed across the predetermined frequency range (f1 to f2) according to the frequency bins (Bn). The frequency content of a sound source in a particular spatial slice 602 can therefore be fully represented across the predetermined range f1 to f2. Conversely, reading horizontally across the "s" spatial slices 602 in any frequency band corresponding to a frequency bin (Bn) represents every sound source in the audio input signal across the listener perceived sound stage 600. The gain values 612 for a given frequency bin (Bn) are applied horizontally across the listener perceived sound stage. Summing them across the spatial slices 602 in a given frequency band (n) may therefore yield the maximum gain value. For example, if gain values range from 0 to 1, the horizontal sum of the gain values 612 across all spatial slices 602 for the first frequency bin (B1) may be 1.
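The per-bin multiplication S_sn = G_sn · I_n and the unit horizontal sum can be illustrated with a small gain matrix. This is a minimal sketch with hypothetical names and numbers. It assumes a single complex spectrum I per snapshot. For a stereo input, the text here does not say which combination of L and R serves as I_n; applying the same gains to each input channel is one possible assumption.

```python
import numpy as np


def sound_source_vectors(gains, spectrum):
    """Return S with S[s, n] = G[s, n] * I[n].

    gains:    (num_slices, num_bins) real gain values G_sn
    spectrum: (num_bins,) complex spectrum I_n of one snapshot
    """
    gains = np.asarray(gains, dtype=float)
    spectrum = np.asarray(spectrum)
    if gains.shape[1] != spectrum.shape[0]:
        raise ValueError("gains must have one column per frequency bin")
    return gains * spectrum[np.newaxis, :]


# Three slices x three bins; each column of G sums to 1.
G = np.array([[1.0, 0.25, 0.0],
              [0.0, 0.75, 0.4],
              [0.0, 0.00, 0.6]])
I = np.array([1 + 1j, 2.0 + 0j, -3j])
S = sound_source_vectors(G, I)
```

Because every column of G sums to 1, summing S over the slices recovers the input spectrum bin by bin. The extraction therefore redistributes the signal across the sound stage without adding or losing energy in the combined output.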

The sound source vector (Ss) in each spatial slice 602 may represent one or more sound sources, or audio sound sources, across the listener perceived sound stage. The audio input signal (audio source material) may have been produced or mixed by a mixing engineer to place each sound source perceptually. For example, a sound engineer may try to produce (or mix) a stereo audio recording with a particular impression in mind. When the recording is played back through an audio system, listeners perceive themselves as seated near the front of a concert hall, near the center of a stage on which a group of musicians is playing instruments and singing. In this example, the sound engineer may mix the recording to distribute the band members across the listener perceived sound stage. The singer might be placed near the center of the sound stage, the bass guitar on the left, and the piano on the right. In another example, the recording is produced as a surround sound audio recording. The sound engineer may then want listeners to perceive themselves as part of the audience in a concert hall, with other audience members captured in the recording perceived as being behind and/or beside them.

Each sound source may now be included in a separate sound source vector Ss in a respective spatial slice. Individual sound sources may therefore be adjusted and further processed by further processing the individual sound source vectors Ss. If the number of position filters in the position filter bank equals the number of audio output channels, each sound source vector Ss may be used as the source material to drive a loudspeaker. If instead there are more or fewer audio output channels than sound source vectors Ss, the sound source vectors Ss may be aggregated, combined, divided, duplicated, passed through, and/or otherwise processed. This produces an audio output signal with the required number of audio output channels, each containing one or more sound source vectors. The audio output channels included in the audio output signal may also be further processed before being output to drive one or more respective loudspeakers.
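One way to picture the aggregation and combination step is a mixing matrix that maps S slice vectors onto M output channels. This is a hypothetical sketch. The function name `mix_to_outputs` and the weights used below, such as splitting a slice equally between two channels, are illustrative choices and not values from the patent.

```python
from typing import List, Sequence


def mix_to_outputs(slices: Sequence[Sequence[float]],
                   matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Combine slice signals into output channels.

    slices[s][t]: sample t of spatial slice s.
    matrix[m][s]: weight of slice s in output channel m (hypothetical layout).
    """
    num_slices = len(slices)
    length = len(slices[0])
    outputs = []
    for row in matrix:
        if len(row) != num_slices:
            raise ValueError("each matrix row needs one weight per slice")
        # Each output sample is a weighted sum of the slice samples at that instant.
        outputs.append([sum(row[s] * slices[s][t] for s in range(num_slices))
                        for t in range(length)])
    return outputs


# Five slices folded into three outputs (left, center, right).
slices = [[1, 0], [0, 2], [3, 0], [0, 4], [5, 0]]
lcr = [[1.0, 0.5, 0.0, 0.0, 0.0],   # L: slice 0 plus half of slice 1
       [0.0, 0.5, 1.0, 0.5, 0.0],   # C: slice 2 plus halves of slices 1 and 3
       [0.0, 0.0, 0.0, 0.5, 1.0]]   # R: half of slice 3 plus slice 4
outputs = mix_to_outputs(slices, lcr)
```

An identity matrix passes each slice straight through to its own output. That corresponds to the case where the number of position filters equals the number of output channels.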

FIG. 7 is an example block diagram of the functional processing blocks of an audio processing system 102 operating in the frequency domain. The audio processing system 102 includes an audio input signal analysis module 700 and a post-processing module 702. The audio input signal analysis module 700 includes an audio input pre-processing module 704, a sound source vector generation module 706, and a parameter input controller module 708. In other examples, more or fewer modules may be used to describe the functionality of the audio processing system 102. As used herein, the term "module" or "modules" is defined as software (computer code, instructions), hardware (e.g., circuits, electrical components, and/or logic), or a combination of software and hardware.

In FIG. 7, the audio input pre-processing module 704 may receive the audio input signal 712. The audio input signal 712 may take several forms:
- a stereo pair of input signals;
- a multi-channel audio input signal, such as a 5-channel, 6-channel, or 7-channel input signal;
- any other number of audio input signals, two or more.

The audio input pre-processing module 704 may include any form of time-domain to frequency-domain conversion process. In FIG. 7, the audio input pre-processing module 704 includes a windowing module 714 and a converter 716 for each of the audio input signals 712. The windowing module 714 and the converter 716 may perform overlapping window analysis on blocks of time samples and convert the samples using a discrete Fourier transform (DFT) or another transform process. In other examples, the audio input signal may be processed in the time domain. In that case the audio input pre-processing module 704 may be omitted from the audio input signal analysis module 700 and replaced by a time-domain filter bank.
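The overlapping window analysis described above can be sketched as follows.

Assumptions not stated in the patent:
- a periodic Hann window;
- a hop of half the frame length;
- a naive DFT, used for readability (a real system would use an FFT).

The function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def hann(n: int) -> List[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def dft(frame: Sequence[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform."""
    n = len(frame)
    return [sum(frame[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n))
            for b in range(n)]


def analyze(signal: Sequence[float], frame_len: int, hop: int) -> List[List[complex]]:
    """Split one input channel into overlapping windowed blocks and transform each.

    Each returned spectrum corresponds to one snapshot of the channel.
    """
    window = hann(frame_len)
    snapshots = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + k] * window[k] for k in range(frame_len)]
        snapshots.append(dft(frame))
    return snapshots


# A cosine that completes exactly 2 cycles per 8-sample frame.
sig = [math.cos(2 * math.pi * 2 * t / 8) for t in range(16)]
snaps = analyze(sig, frame_len=8, hop=4)   # 50% overlap gives 3 snapshots
```

In each snapshot the energy peaks in frequency bin 2. The window leaks half of that peak magnitude into bins 1 and 3.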

The pre-processed (or non-pre-processed) audio input signal may be provided to the sound source vector generation module 706, which may generate the sound source vectors Ss. The sound source vector generation module 706 may include a gain vector generation module 720, a signal classifier module 722, and a vector processing module 724. The gain vector generation module 720 may generate a gain position vector 610 for each of the spatial slices 602, as discussed with reference to FIG. 6.

Generation of the gain position vectors by the gain vector generation module 720 may involve processing by several modules:
- an estimated position generation module 728;
- a position filter bank generation module 730;
- a balance module 732;
- a perceptual model 734;
- a source model 736;
- a genre detection module 738.

The estimated position generation module 728 may calculate the estimated perceptual location values using Equation 1, as described above. The position filter bank generation module 730 may calculate the position filter bank 500 as described above with reference to FIG. 5. The balance module 732 may use Equation 2 to calculate the sound source vectors Ss.
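The shape of the position filters is not reproduced in this section, so the following rests on an assumption. It uses a bank of triangular filters with evenly spaced centers on a position axis running from -1 (left) to +1 (right). Adjacent triangles that overlap by exactly one spacing add to 1 at every position, which matches the property that the gains in each bin sum to the maximum gain. The estimated perceptual location of each bin is taken as an input; Equation 1 itself is not reimplemented. The function names are hypothetical.

```python
from typing import List, Sequence


def position_filter_bank_gains(position: float, num_slices: int) -> List[float]:
    """Gain of each triangular position filter at one estimated location."""
    if num_slices < 2:
        raise ValueError("need at least two slices")
    p = min(1.0, max(-1.0, position))          # clamp to the sound stage
    spacing = 2.0 / (num_slices - 1)
    centers = [-1.0 + s * spacing for s in range(num_slices)]
    # Each filter falls linearly to zero one spacing away from its center.
    return [max(0.0, 1.0 - abs(p - c) / spacing) for c in centers]


def gain_position_vectors(positions: Sequence[float], num_slices: int) -> List[List[float]]:
    """Return gains[s][n] given an estimated location for each frequency bin n."""
    per_bin = [position_filter_bank_gains(p, num_slices) for p in positions]
    return [[per_bin[n][s] for n in range(len(positions))] for s in range(num_slices)]


# Bin 0 sits at the center of the stage; bin 1 sits between center and right-center.
gains = gain_position_vectors([0.0, 0.25], num_slices=5)
```

Moving the centers or changing the widths shifts the crossover points between neighbouring filters. That is the kind of adjustment the text describes making to the position filter bank 500.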

The perceptual model 734 and the source model 736 may be used to improve the processing that forms the gain position vectors. That processing is performed by the estimated position generation module 728, the position filter bank generation module 730, and the balance module 732. In general, the perceptual model 734 and the source model 736 may work cooperatively to enable snapshot-by-snapshot adjustments in the calculation of the gain position vectors. These adjustments compensate for abrupt changes in the calculated locations of audible sound sources within the listener perceived sound stage. For example, the perceptual model 734 and the source model 736 may compensate for abrupt changes in the presence and amplitude of a particular sound source in the listener perceived sound stage that would otherwise cause abrupt shifts in its perceptual location. The perceptual model may smooth the gain position vectors over time (for example, over multiple snapshots) as they are generated. This smoothing may be based on at least one of a time-based auditory masking estimate and a frequency-based auditory masking estimate. The source model 736 may monitor the audio input signal and provide smoothing to avoid exceeding a predetermined rate of change in the amplitude and frequency of the audio input signal over a predetermined number of snapshots.

Monitoring may be performed for each snapshot, or instant in time, of the audio input signal on a per-frequency-bin basis, taking into account at least one previous snapshot. In one example, the two previous snapshots are individually weighted by predetermined weighting factors, averaged, and used for comparison with the current snapshot. The most recent previous snapshot may have a higher predetermined weight than the older snapshot. Suppose the source model 736 identifies a change in amplitude or frequency that exceeds the predetermined rate of change. The perceptual model 734 may then automatically and dynamically smooth the gain values of the gain position vectors. This reduces the rate of change in the perceptual location of the audible sound sources, or audio sources, included in the perceived sound stage of the audio input signal. For example, multiple audio sources may sometimes share the same perceptual location or spatial slice and at other instants occupy different perceptual locations. In that situation, smoothing may be used to keep the audio sources from appearing to "jump" between perceptual locations. Without smoothing, such rapid movement between perceptual locations may be perceived by the listener as an audio source jumping from a loudspeaker driven by a first output channel to another loudspeaker driven by a second output channel.
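The monitoring and smoothing rule above can be sketched per frequency bin for one spatial slice. The structure follows the text: two previous snapshots are weighted (the more recent one more heavily) and averaged, the result is compared with the current snapshot, and gains are smoothed only when the change is too large. Several details are hypothetical:
- the magnitude-ratio test;
- the weights 0.7/0.3;
- the threshold 2.0;
- the smoothing factor 0.8;
- the function name.

```python
from typing import List, Sequence


def smooth_gain_vector(gains: Sequence[float],
                       prev_gains: Sequence[float],
                       mag: Sequence[float],
                       mag_prev1: Sequence[float],
                       mag_prev2: Sequence[float],
                       max_change: float = 2.0,
                       alpha: float = 0.8,
                       w_recent: float = 0.7,
                       w_older: float = 0.3) -> List[float]:
    """Smooth one slice's per-bin gains when the input changes too abruptly.

    mag_prev1 is the most recent previous snapshot and gets the larger weight.
    All thresholds and weights are illustrative assumptions.
    """
    smoothed = []
    for n in range(len(gains)):
        # Weighted average of the two previous snapshots in this bin.
        ref = (w_recent * mag_prev1[n] + w_older * mag_prev2[n]) / (w_recent + w_older)
        lo, hi = min(ref, mag[n]), max(ref, mag[n])
        # Ratio of larger to smaller magnitude; silence-to-sound counts as infinite.
        jump = hi / lo if lo > 0 else (float("inf") if hi > 0 else 1.0)
        if jump > max_change:
            # Change too fast: move only part of the way toward the new gain.
            smoothed.append(alpha * prev_gains[n] + (1 - alpha) * gains[n])
        else:
            smoothed.append(gains[n])
    return smoothed


# Bin 0 is steady; bin 1 jumps tenfold, so its new gain is held back.
out = smooth_gain_vector(gains=[1.0, 1.0], prev_gains=[0.0, 0.0],
                         mag=[1.0, 10.0], mag_prev1=[1.0, 1.0], mag_prev2=[1.0, 1.0])
```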

Alternatively, or in addition, the source model 736 may be used to define the boundaries of the perceptual locations, or spatial slices. The perceptual locations can then be adjusted automatically according to the audio sources identified in the audio input signal, based on the sources included in the source model 736. Thus, if an audio source is identified as being in more than one perceptual location, the area representing a perceptual location may be increased or decreased by adjusting its boundaries. For example, the area of a perceptual location may be widened by adjusting the crossover points of the filters in the position filter bank 500 (FIG. 5) so that the entire audio source falls within a single perceptual location. In another example, if two or more audio sources are determined to be in the same perceptual location, the boundaries of that perceptual location, or spatial slice, may be gradually reduced until the audio sources appear in separate spatial slices. Multiple audio sources in a single perceptual location may be identified, for example, by identifying sources in the source model that correspond to the different operating frequency ranges of the identified sources. The boundaries of other spatial slices may also be adjusted automatically. As discussed above, the boundaries of the perceptual locations may overlap, be spaced apart from one another, or be contiguously aligned.

The perceptual model 734 may also smooth the gain values included in the gain position vectors over time to maintain smooth transitions from one instant in time to the next. The source model 736 may include models of the different audio sources that may be included in the audio input signal. During operation, the source model 736 may monitor the audio input signal and adjust the smoothing performed by the perceptual model 734. As an example, the source model 736 may detect the sudden onset of a sound source such as a drum and cause the perceptual model 734 to reduce the amount of smoothing. The onset of the drum is then captured at a distinct location in space rather than smeared across spatial slices. Using the models included in the source model 736, the perceptual model 734 may account for the physical characteristics of the sound sources included in the audio input signal when determining how much a given frequency band should be attenuated. Although shown in FIG. 7 as separate models, the perceptual model 734 and the source model 736 may be combined in other examples.
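The onset-dependent smoothing described above can be sketched with a standard onset detector. This is a swapped-in technique: the detector used here is spectral flux, the sum of positive magnitude increases between snapshots, whereas the patent does not say how the source model 736 detects onsets. Exponential smoothing is slow by default and fast when an onset is detected. The coefficients, the threshold, and the function names are hypothetical.

```python
from typing import List, Sequence


def spectral_flux(mag: Sequence[float], prev_mag: Sequence[float]) -> float:
    """Sum of per-bin magnitude increases between two snapshots."""
    return sum(max(0.0, m - p) for m, p in zip(mag, prev_mag))


def smooth_with_onsets(target_gains: Sequence[float],
                       mags: Sequence[Sequence[float]],
                       slow: float = 0.9,
                       fast: float = 0.1,
                       onset_threshold: float = 1.0) -> List[float]:
    """Exponentially smooth one gain track, smoothing less at detected onsets.

    target_gains[t]: raw gain for one slice/bin at snapshot t.
    mags[t]: magnitude spectrum at snapshot t.
    """
    out = [target_gains[0]]
    for t in range(1, len(target_gains)):
        onset = spectral_flux(mags[t], mags[t - 1]) > onset_threshold
        alpha = fast if onset else slow   # small alpha means little smoothing
        out.append(alpha * out[-1] + (1 - alpha) * target_gains[t])
    return out


targets = [0.0, 1.0, 1.0]
quiet = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]   # no onset: gain creeps up
hit = [[0.0, 0.0], [5.0, 5.0], [5.0, 5.0]]     # drum-like onset at t=1
slow_path = smooth_with_onsets(targets, quiet)
fast_path = smooth_with_onsets(targets, hit)
```

At the detected onset the gain reaches 0.9 immediately. Without an onset it rises only to about 0.1, which would smear the source across slices.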

The genre detection module 738 may detect the genre of the audio input signal, such as classical music, jazz, rock, or talk. The genre detection module 738 may analyze the audio input signal in order to classify it. Alternatively or additionally, the genre detection module 738 may receive and decode information from which it determines and classifies the audio input signal as a particular genre. That information may be data included in the audio input signal, radio data system (RDS) data, or any other form of externally provided information. The genre information determined by the genre detection module 738 may also be provided to the other modules in the gain vector generation module 720. For example, in a surround sound application, the position filter bank generation module 730 may receive an indication from the genre detection module 738 that the genre is classical music. It may then automatically adjust the crossover points of the filters in the position filter bank 500 (FIG. 5) to prevent any portion of the audio input signal from being output to the right rear and left rear audio output channels.

The signal classifier module 722 may operate on each of the perceptual locations (spatial slices) across the listener perceived sound stage to identify the one or more audio sources included in each perceptual location. The signal classifier module 722 may identify sound sources from the sound source vectors Ss. For example, the signal classifier module 722 may identify:
- the audio source in a first perceptual location as a singer's voice;
- the audio source in a second perceptual location as a particular instrument, such as a trumpet;
- multiple audio sources in a third perceptual location, such as a voice and a particular instrument;
- the audio source in a fourth perceptual location of the listener perceived sound stage as audience noise, such as applause.

Identification of the audio sources may be based on signal analysis of the audible sound included in a particular perceptual location.

The signal classifier module 722 may base its identification of sound sources on input information received from the parameter input controller 708, the output signals of the gain vector generation module 720, and/or the output signals of the vector processing module 724. For example, the identification may be based on the frequency, amplitude, and spectral characteristics of the sound source vectors Ss. These characteristics may be considered in view of the gain position vectors and of parameters such as an RDS data signal provided by the parameter input controller 708. Accordingly, the signal classifier module 722 may classify the one or more audio sources included in each of the perceptual locations in the listener perceived sound stage. The classification may be based on, for example, comparison with a library of predefined sound sources, frequencies, or timbre characteristics. Alternatively or additionally, the classification may be based on frequency analysis, timbre characteristics, or any other mechanism or technique for performing source classification. For example, sound sources may be classified based on:
- extraction and/or analysis of reverberant content included in the input signal;
- use of a noise estimate of the input signal;
- detection of speech included in the input signal;
- detection of a particular audio source included in the input signal based on its known distinguishing characteristics, such as the relatively abrupt onset characteristic of a drum.

The signal classifier module 722 may cause the vector processing module 724 to assign particular sound sources in particular spatial slices to particular output channels. For example, a voice signal may be assigned to a particular output channel (e.g., the center output channel) regardless of where the voice signal is located in the listener perceived sound stage. In another example, a signal identified as conversational speech (e.g., talk) may be assigned to more than one output channel to obtain a desired sound field. The reason may be to make the speech more pleasant, to increase intelligibility, or any other reason.
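The class-based assignment described above can be sketched as a routing table. Each slice normally follows a default route, and a classification label can override it. The labels, channel names, and function name are hypothetical.

```python
from typing import Dict, List, Sequence


def route_slices(slice_classes: Sequence[str],
                 default_routes: Sequence[List[str]],
                 class_overrides: Dict[str, List[str]]) -> Dict[str, List[int]]:
    """Return a mapping from output channel name to the slice indices it carries.

    slice_classes[s]: classifier label for spatial slice s (hypothetical labels).
    default_routes[s]: channels slice s would feed based on its position alone.
    class_overrides: label -> channels, applied regardless of position.
    """
    routing: Dict[str, List[int]] = {}
    for s, label in enumerate(slice_classes):
        targets = class_overrides.get(label, default_routes[s])
        for channel in targets:
            routing.setdefault(channel, []).append(s)
    return routing


# A voice mixed toward the right is still sent to the center channel.
routes = route_slices(["music", "voice", "music"],
                      [["L"], ["R"], ["R"]],
                      {"voice": ["C"]})
```

Because an override can list several channels, the same mechanism also covers spreading conversational speech over more than one output channel.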

In FIG. 7, the classification of the spatial slices may be provided as a feedback audio source classification signal to each of 1) the position filter bank generation module 730, 2) the perceptual model 734, 3) the source model 736, and 4) the genre detection module 738. The feedback audio source classification signal may include an identification of each perceptual location across the listener perceived sound stage and an identification of the one or more audio sources included in each perceptual location. Each module may use the feedback audio source classification signal when performing its respective processing of subsequent snapshots of the audio input signal.

For example, the position filter bank generation module 730 may adjust the area of a perceptual location by adjusting the position and/or width of the output filters in the position filter bank. The aim is to capture all or substantially all of the frequency components of a given sound source within a predetermined number of spatial slices, such as a single spatial slice. For example, the position and/or width of a spatial slice may be adjusted by adjusting the crossover points of the filters in the position filter bank 500 (FIG. 5). This lets the slice track and capture an identified audio source in the audio input signal, such as an audio source identified as a voice signal. The perceptual model 734 may use the audio source classification signal to adjust the masking estimates based on predetermined parameters. Example predetermined parameters include whether a sound source has a strong harmonic structure and/or whether it has sharp onsets. The source model 736 may use the feedback audio source classification signal to identify the audio sources in the spatial slices of the listener perceived sound stage. For example, the feedback audio source classification signal may indicate voice audio sources in some perceptual locations and music audio sources in others. The source model 736 may then apply voice-based and music-based models to the corresponding perceptual locations of the audio input signal.

The signal classifier module 722 may also provide an indication of the classification of the spatial slices on a classifier output line 726. The classification data output on the classifier output line 726 may be in any format compatible with the receiver of the classification data. The classification data may include an identification of the spatial slices and an indication of the sound source(s) included in each spatial slice. The receiver of the classification data may be any of the following:
- a storage device having a database or other data retention and organization mechanism;
- a computing device;
- any other internal module, or external device or module.

The classification data may be stored in association with other data, such as the audio data from which the classification data was generated. For example, the classification data may be stored in a header or side chain of the audio data. The classification data may also be used for offline or real-time processing of individual spatial slices, or of all the spatial slices together, in one or more snapshots. Offline processing may be performed by devices and systems having computing capability. Once stored in association with the audio data, such as in a header or side chain, the classification data may be used as part of the processing of the audio data by other devices and systems. Real-time processing by other computing devices, audio-related devices, or audio-related systems may also use the classification data provided on the output line 726 to process the corresponding audio data.
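Because the patent leaves the classification data format open ("any format compatible with the receiver"), the record layout below is purely hypothetical. It shows how per-snapshot, per-slice labels could be serialized as JSON for storage in a header or side chain and read back later. The class and function names are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SliceClassification:
    slice_index: int
    sources: List[str]          # e.g. ["voice"] or ["trumpet", "voice"]


@dataclass
class SnapshotClassification:
    snapshot_index: int
    slices: List[SliceClassification] = field(default_factory=list)


def to_side_chain(records: List[SnapshotClassification]) -> str:
    """Serialize classification records to a JSON string."""
    return json.dumps([asdict(r) for r in records], sort_keys=True)


def from_side_chain(text: str) -> List[SnapshotClassification]:
    """Rebuild classification records from the JSON string."""
    return [SnapshotClassification(r["snapshot_index"],
                                   [SliceClassification(**s) for s in r["slices"]])
            for r in json.loads(text)]


records = [SnapshotClassification(0, [SliceClassification(0, ["voice"]),
                                      SliceClassification(1, ["trumpet", "voice"])])]
text = to_side_chain(records)
```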

The genre detection module 738 may use the audio source classification signal to identify the genre of the audio input signal. For example, if the audio source classification signal indicates only voice in the various perceptual locations, the genre detection module 738 may identify the genre as talk.

The gain vector generation module 720 may generate the gain position vectors on a gain vector output line 744 for receipt by the vector processing module 724. The vector processing module 724 may also receive the audio input signal 712 as a feed-forward audio signal on an audio input signal feed-forward line 746. In FIG. 7, the feed-forward audio signal is in the frequency domain. In other examples, the vector processing module 724 may operate in the time domain or in a combination of the frequency and time domains, and the audio input signal may be provided to the vector processing module 724 in the time domain.

The vector processing module 724 may use Equation 4 to apply the gain position vectors to the audio input signal (the feed-forward signal) in each frequency bin. This generates a sound source vector Ss for each spatial slice across the listener perceived sound stage. Individual and independent processing of the sound source vectors Ss may also be performed within the vector processing module 724. For example, individual sound source vectors Ss may be filtered or amplitude-adjusted before being output by the vector processing module 724. In addition, effects may be added to some of the sound source vectors Ss; for example, additional reverberation may be added to a singer's voice. Individual sound source vectors Ss may also be independently delayed, altered, reconstructed, enhanced, or repaired as part of the processing by the vector processing module 724. The sound source vectors Ss may also be smoothed or otherwise individually processed before being output by the vector processing module 724. In addition, the vector processing module 724 may aggregate the sound source vectors Ss, for example by combining or dividing them, before they are output. Thus, the original recording may be "adjusted" at the level of individual spatial slices to improve the quality of playback.
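Independent processing of individual slices can be sketched in the time domain for simplicity. One slice is delayed and gain-adjusted while another is left untouched, and the slices are then recombined. The function names are hypothetical. The delay truncates the block to its original length, which is an assumption suited to fixed-size blocks; a streaming implementation would carry the tail into the next block.

```python
from typing import List, Sequence


def process_slice(samples: Sequence[float], gain: float = 1.0, delay: int = 0) -> List[float]:
    """Apply an independent delay (in samples) and gain to one slice's block."""
    delayed = [0.0] * delay + list(samples)
    # Keep the block length fixed; samples pushed past the end are dropped here.
    return [gain * x for x in delayed[:len(samples)]]


def recombine(slices: Sequence[Sequence[float]]) -> List[float]:
    """Sum processed slices sample by sample."""
    return [sum(vals) for vals in zip(*slices)]


vocal = [1.0, 0.0, 0.0, 0.0]
piano = [0.0, 1.0, 0.0, 0.0]
vocal_out = process_slice(vocal, gain=2.0, delay=1)   # louder and one sample later
mix = recombine([vocal_out, process_slice(piano)])
```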

After processing by the vector processing module 724, the processed sound source vectors Ss may be output as sound source vector signals on a vector output line 748. Each sound source vector signal may represent one or more separate audio sources from within the audio input signal. The sound source vector signals may be provided as input signals to the signal classifier module 722 and to the post-processing module 702.

The parameter input controller 708 may selectively provide parameter inputs to the gain vector generation module 720, the signal classifier module 722, and the vector processing module 724. A parameter input may be any signal or indication usable by a module to influence, modify, and/or improve the processing that generates the gain position vectors and/or the processed sound source vectors Ss. For example, in the case of a vehicle, parameter inputs may include:
- external signals such as engine noise and road noise;
- signals from microphones and accelerometers located inside and outside the vehicle;
- vehicle speed;
- climate control settings;
- whether a convertible top is open or closed;
- the volume setting of the sound system;
- RDS data;
- the source of the audio input signal, such as a compact disc (CD), digital versatile disc (DVD), AM/FM/satellite radio, cellular telephone, Bluetooth connection, MP3 player, iPod®, or any other source of an audio input signal.

Other parameter inputs may include an indication that the audio signal has been compressed by a lossy perceptual audio codec, the type of codec used (e.g., MP3), and/or the bit rate at which the input signal was encoded. Similarly, in the case of speech signals, parameter inputs may include an indication of the type of speech codec employed, the encoded bit rate, and/or an indication of voice activity within the input signal. In other examples, any other parameters useful for audio processing may be provided.

Within the gain vector generation module 720, the parameter inputs may provide information that helps the genre detection module 738 detect the genre of the audio input signal. For example, if the parameter inputs indicate that the audio input signal comes from a cellular telephone, the genre detection module 738 may indicate that the audio input signal is a voice signal. Parameter inputs provided to the signal classifier 722 may be used to classify the individual audio sources in the spatial slices. For example, if the parameter inputs indicate that the audio source is a navigation system, the signal classifier 722 may look for spatial slices that include speech as the audio source and ignore the other spatial slices. In addition, the parameters may allow the signal classifier 722 to recognize noise or other audio content included with the audio source in a particular spatial slice. The vector processing module 724 may adjust the processing of the spatial slices based on the parameters. For example, in the case of a vehicle, a speed parameter may be used to increase, at higher speeds, the amplitude of low-frequency audio sources, of particular spatial slices, or of particular sound source vectors.
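The vehicle-speed example above can be sketched as a speed-dependent low-frequency boost applied to the bins of one sound source vector. The mapping itself is an assumption, as is each of its values:
- a linear ramp up to 6 dB at 120 km/h;
- a 200 Hz cutoff;
- the function names.

The patent says only that the amplitude may be increased at higher speeds.

```python
from typing import List, Sequence


def speed_to_boost_db(speed_kmh: float, max_boost_db: float = 6.0,
                      full_speed_kmh: float = 120.0) -> float:
    """Map vehicle speed to a boost in dB, clamped to [0, max_boost_db]."""
    return max_boost_db * min(1.0, max(0.0, speed_kmh / full_speed_kmh))


def apply_low_frequency_boost(bins: Sequence[complex],
                              bin_freqs_hz: Sequence[float],
                              speed_kmh: float,
                              cutoff_hz: float = 200.0) -> List[complex]:
    """Scale the bins below cutoff_hz by a speed-dependent linear gain."""
    gain = 10 ** (speed_to_boost_db(speed_kmh) / 20)   # dB to amplitude ratio
    return [x * gain if f < cutoff_hz else x for x, f in zip(bins, bin_freqs_hz)]


bins = [1 + 0j, 1 + 0j]
freqs = [100.0, 1000.0]
highway = apply_low_frequency_boost(bins, freqs, speed_kmh=120.0)
```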

In FIG. 7, the sound source vector signals may be processed by the post-processing module 702 to convert them from the frequency domain to the time domain, using a process similar to that of the pre-processing module 704. Accordingly, the post-processing module 702 may include a converter 752 and a windowing module 754 for the sound source vector signals. The converter 752 and the windowing module 754 may use a discrete Fourier transform (DFT) or another transform process to convert blocks of time samples. In other examples, a different frequency-domain-to-time-domain conversion process may be used. In yet another example, the sound source vector signals provided on the vector output line 748 may already be in the time domain because the processing by the sound source vector processing module 706 is performed at least partly in the time domain. The sound source vector signals, or the post-processed sound source vector signals, represent the audio sources divided into spatial slices, and may be subjected to further processing, used to drive loudspeakers in a listening space, or used for any other audio-processing-related activity.
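One common way to realize a windowed DFT analysis and its inverse is a short-time Fourier transform with overlap-add resynthesis. The sketch below assumes a square-root periodic Hann window at both analysis and synthesis with 50% overlap, which reconstructs the input exactly away from the edges; the frame size and window choice are assumptions, not details taken from the converter 752 and windowing module 754.

```python
import numpy as np


def sqrt_hann(n):
    # Square root of a periodic Hann window; squared copies offset by n/2 sum to 1.
    return np.sqrt(0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n) / n))


def stft(x, n_fft=512):
    # Analysis (pre-processing side): window each block of time samples, then take its DFT.
    hop = n_fft // 2
    w = sqrt_hann(n_fft)
    return np.array([np.fft.rfft(w * x[i:i + n_fft])
                     for i in range(0, len(x) - n_fft + 1, hop)])


def istft(frames, n_fft=512):
    # Synthesis (post-processing side): inverse DFT, window again, and overlap-add.
    hop = n_fft // 2
    w = sqrt_hann(n_fft)
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    for k, spec in enumerate(frames):
        out[k * hop:k * hop + n_fft] += w * np.fft.irfft(spec, n_fft)
    return out
```

Each sound source vector would be passed through `istft` on its own; the first and last half-frames are not fully reconstructed because they are covered by only one window.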

FIG. 8 is a block diagram of another example of the audio processing system 102, which may include an audio input signal analysis module 700, a sound source vector processing module 802, and a post-processing module 804. The audio input signal analysis module 700 may include the pre-processing module 704, the sound source vector generation module 706, and the parameter input controller 708. As described above, the sound source vector generation module 706 may also include the gain vector generation module 720, the signal classifier module 722, and the vector processing module 724.

In FIG. 8, the pre-processing module 704 receives an audio input signal 806 in the form of a left stereo signal (L) and a right stereo signal (R). In other examples, any number of audio input signals may be provided. The audio input signal 806 may be converted to the frequency domain by the pre-processing module 704, as described above, or may be received directly by the sound source vector generation module 706 in the time domain.

As described above, the sound source vector generation module 706 may generate the sound source vectors (Ss) on the vector output line 748 using the gain vector generation module 720, the signal classifier module 722, and the vector processing module 724. The sound source vectors (Ss) on the vector output line 748 may be received by the sound source vector processing module 802. The sound source vector processing module 802 may also receive, from the signal classifier module 722, audio classification signals that indicate the identity of the audio sources in each spatial slice (sound source vector Ss).

The sound source vector processing module 802 may generate audio output channels on the output channel lines 810 based on the processed sound source vectors (Ss). The sound source vector processing module 802 may include a sound source vector modification module 812 and an assembly module 814.

The sound source vector modification module 812 may include functionality similar to that described above for the vector processing module 724. The sound source vector modification module 812 includes a plurality of modification blocks 813, each of which may operate individually on one of the processed sound source vectors (Ss). The sound source vector modification module 812 may therefore be used, on a per-sound-source-vector (Ss) basis, to add reverberation, perform equalization, add delay, add effects, perform dynamic range compression or expansion, enhance transients, extend the signal bandwidth, interpolate and/or extrapolate to reconstruct missing signal components, and/or perform any other audio-processing-related activity. Processing within the sound source vector modification module 812 may be used to correct, restore, and enhance degraded audio signals. Individual spatial slices across the listener perceived sound stage may thus be modified, adjusted, and/or compensated independently, without affecting the audio sources in any other sound source vector (Ss). For example, a delay may be applied to a particular spatial slice to emphasize the perception of that slice or to change the perceived width of the sound stage.
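The independence of the modification blocks 813 can be sketched as a list of per-vector processing functions, one per sound source vector, where each function sees only its own vector. This assumes time-domain vectors and uses a plain integer-sample delay and a fixed gain as stand-ins for real effects; the names `delay` and `apply_modification_blocks` are hypothetical.

```python
import numpy as np


def delay(x, n):
    # Integer-sample delay that keeps the original length (hypothetical stand-in effect).
    x = np.asarray(x, dtype=float)
    return np.concatenate([np.zeros(n), x])[:len(x)]


def apply_modification_blocks(vectors, blocks):
    # One block per sound source vector; None means pass-through. Blocks never see other vectors.
    if len(vectors) != len(blocks):
        raise ValueError("need exactly one modification block per sound source vector")
    return [np.asarray(x, dtype=float) if blk is None else blk(x)
            for x, blk in zip(vectors, blocks)]
```

For example, `apply_modification_blocks(vs, [None, lambda x: delay(x, 2), lambda x: 0.5 * x])` delays only the second slice and attenuates only the third.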

The sound source vector modification module 812 may also modify individual sound source vectors (Ss) based on the identification of the audio sources in those vectors. As described above, the signal classification module 722 may operate on each perceptual location across the listener perceived sound stage to identify the one or more audio sources included in each perceptual location. After an audio source has been identified, the corresponding sound source vector (Ss) may be modified based on the identified audio source. Unlike the vector processing module 724, which uses the identification of the audio sources as feedback for processing after a snapshot, the sound source vector modification module 812 receives the identification of the audio sources as a feed-forward input. Accordingly, the sound source vector modification module 812 may process the individual sound source vectors (Ss) based on the identification of each audio source provided by the signal classification module 722.

Modifications based on the identification of audio sources may include correction of individual audio sources; adjustment of the width of the perceived sound stage and/or of individual audio sources included in the input signal; adjustment of the reverberation level; adjustment of the level of speech sources; reduction or removal of vocal sources; enhancement of percussive sources; dynamic range compression or expansion; bandwidth extension; extrapolation and/or interpolation to reconstruct missing components of individual audio sources; audio-source-specific effects or enhancements; and adjustment of perceptual location across the listener perceived sound stage. Correction of an individually identified audio source may include replacing part of the audio output of that source with audio from a library or from another audio source regeneration device, such as a MIDI player. For example, an audio source identified as a saxophone playing notes that contain noise at particular frequencies may have those notes replaced with the same notes, at the same frequencies, taken from saxophone audio in a library or from a source capable of reproducing saxophone audio. The input audio signal may be corrupted or degraded as a result of processing by a perceptual audio codec, such as an MP3 codec, or by any other form of lossy compression. Other sources of degradation or damage include poor audio recording and/or storage practices, AM/FM and satellite radio broadcasting, television broadcasting, video codecs, wireless connections such as Bluetooth, speech codecs, and telephone networks, including cellular networks.

An audio-source-specific effect or enhancement may include changes to the sound source values contained in the particular sound source vector (Ss) that is specific to the identified audio source. For example, an audio source identified as speech may be increased in amplitude, or adjusted in a particular frequency band, so that the listener can recognize the speech more easily. A particular sound source vector (Ss) may be compressed by applying a dynamic range compressor to increase the intelligibility of an audio source that appears in two or more sound source vectors (Ss). For example, if a talker's voice is present in the center sound source vector (Ss) and also in the adjacent left and right sound source vectors, which also contain musical instruments or background noise, the center sound source vector may be dynamically compressed or have its level changed. In another example, an instrument such as a trumpet in a particular sound source vector (Ss) may be equalized to improve its clarity.
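A dynamic range compressor of the kind mentioned for the center sound source vector can be sketched with a static gain curve: above a threshold, every extra input dB produces only 1/ratio dB at the output. This is a simplified, instantaneous version with no attack or release smoothing (a practical compressor would follow a smoothed envelope), and the threshold and ratio are hypothetical defaults.

```python
import numpy as np


def compress(x, threshold_db=-20.0, ratio=4.0, eps=1e-12):
    # Static downward compressor applied sample by sample to one sound source vector.
    x = np.asarray(x, dtype=float)
    level_db = 20.0 * np.log10(np.abs(x) + eps)           # instantaneous level
    over_db = np.maximum(level_db - threshold_db, 0.0)     # amount above threshold
    gain_db = -over_db * (1.0 - 1.0 / ratio)               # reduce the excess by the ratio
    return x * 10.0 ** (gain_db / 20.0)
```

With these defaults, a full-scale sample (0 dB) comes out at -20 + 20/4 = -15 dB, while samples below -20 dB pass unchanged.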

Perceptual location adjustment may include moving an identified audio source from one location to another in the listener perceived sound field. For example, a sound source such as a singer's voice may be located in the center channel of the listener perceived sound stage, with a second sound source, such as a guitar, present in an adjacent sound source vector (Ss). Once the signal classification module 722 has identified the two sources as the singer's voice and the guitar, the sound source vector modification module 812 may move the guitar farther away from the singer's voice in the listener perceived sound stage. For example, the sound source vector modification module 812 may move the guitar toward the right loudspeaker by moving that audio source into another sound source vector (Ss) that has been identified as containing no audio source. The vector processing module 724 operates to identify and/or isolate the sound sources and spatial slices as well as possible, whereas the sound source vector modification module 812 is used to modify the identified and/or isolated sound sources and spatial slices.

Generating the output channels may include the assembly module 814 combining or splitting sound source vectors (Ss) depending on the perceptual location from which each sound source vector (Ss) originates, that is, on the location of its spatial slice in the listener perceived sound stage. For example, in a system with five output channels, the sound source vectors (Ss) from several perceptual locations near the center of the listener perceived sound stage may be combined to form a center output channel that drives a center loudspeaker. In another example, in a five-channel surround sound output system in which only four spatial slices are present, two of the spatial slices may be combined to form a side or rear output channel. In other examples, the number of perceptual locations or spatial slices may match the number of output channels. As described above, this allows a two-channel stereo recording to be converted to 5, 6, 7, or any number of output channels.
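The combining step can be expressed as a mixing matrix with one row per output channel and one column per spatial slice, so that the output channels are the matrix times the stacked sound source vectors. The sketch below assumes equal-weight summing and a hypothetical grouping (seven slices into left, center and right); the function names are illustrative only.

```python
import numpy as np


def mapping_matrix(groups, n_slices):
    # (n_channels x n_slices) 0/1 matrix; groups[c] lists the slices summed into channel c.
    m = np.zeros((len(groups), n_slices))
    for ch, idx in enumerate(groups):
        m[ch, idx] = 1.0
    return m


def assemble(slices, groups):
    # slices: array of shape (n_slices, n_samples); returns (n_channels, n_samples).
    slices = np.asarray(slices, dtype=float)
    return mapping_matrix(groups, slices.shape[0]) @ slices
```

With `groups = [[0, 1], [2, 3, 4], [5, 6]]`, the three slices nearest the middle of the sound stage are summed into the center channel. Listing a slice in more than one group gives a one-to-many mapping, and replacing the 1.0 entries with other weights gives proportional assignment.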

The assembly module 814, operating in conjunction with the sound source vector modification module 812, may rearrange or remap the sound source vectors (Ss) to move audio sources in the original audio input signal to different locations in the listener perceived sound stage. Because each audio source in the listener perceived sound stage may be contained in a separate sound source vector (Ss), the sound sources may be moved or mapped to different locations in the listener perceived sound stage. In other words, the location of each audio source of the audio input signal in the listener perceived sound stage has been determined and captured, and the audio sources have been separated into individual perceptual locations or spatial slices by the sound source vectors (Ss). It can therefore be decided whether a sound source should be placed at substantially the same location in the output audio channels or moved to a new perceptual location in the output audio channels.

For example, if a first perceptual location or spatial slice contains a singer's voice, and a second perceptual location adjacent to the first contains a guitar, the singer's voice may be assigned or mapped to the center output channel, while the guitar may be assigned or mapped to both the left and right sides of the listener perceived sound stage, away from the singer's voice. The singer's voice and the guitar may be separated by having the assembly module 814 map the sound source vector (Ss) containing the singer's voice to the center output channel, and map the sound source vector (Ss) containing the guitar to the left and right front, side, and/or rear output channels. Thus, the audio processing system 102 can not only convert a two-channel audio input signal into any number of multichannel output signals, such as a surround sound output signal, but can also allow individual audio sources in the audio input signal to be assigned to any one or more of the desired output channels.

In addition, a sound source vector (Ss) may be assigned to two different output channels so that, when those output channels drive adjacently positioned loudspeakers, the audio source contained in the sound source vector (Ss) is perceived as being located between the two loudspeakers. Further, in particular applications, such as when loudspeakers are positioned at different heights and orientations within a vehicle (for example, in the door panels, the dashboard, or the rear deck), the sound source vectors (Ss) may be selectively assigned in proportion to the loudspeaker positions to optimize the listening experience in the driver and passenger seats of the vehicle. Groups of sound source vectors (Ss) may also be statically mapped to one or more output channels. Alternatively, the assembly module 814 may group the sound source vectors (Ss) dynamically, so that different sound source vectors (Ss) appear in one or more output channels for a period of time and are then moved automatically based on external parameters from the parameter input controller 708, the content of the audio input signal, or any other criterion useful for triggering a change in the mapping of the sound source vectors (Ss) to the output channels. The mapping of the sound source vectors (Ss) to the output channels may therefore be one-to-one, one-to-many, or many-to-one. The mapping of some or all of the sound source vectors (Ss) may be such that the left input signal is mapped to output channels (and, in turn, loudspeakers) on the left side of the playback loudspeaker array, and the right input signal is mapped to output channels (and, in turn, loudspeakers) on the right side of the playback loudspeaker array. Additionally or alternatively, the mapping of some or all of the sound source vectors (Ss) may be such that the left input signal is mapped to output channels on the right side of the loudspeaker array and/or the right input signal is mapped to output channels on the left side of the loudspeaker array. Additionally or alternatively, the mapping of some or all of the sound source vectors (Ss) may be such that the left input signal is mapped to output channels on both sides of the loudspeaker array and/or the right input signal is mapped to output channels on both sides of the loudspeaker array. The choice of mapping may be predetermined and set by the user as needed to obtain the desired listener perceived sound stage for the output signal.
The mapping of the sound source vectors (Ss) to the output channels may be frequency dependent, so that the mapping varies with frequency. In one example, frequency-dependent mapping may be used to obtain a better and more stable spatial image in the reproduced sound stage.
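Placing a sound source vector between two adjacent loudspeakers, as described above, is commonly done with constant-power (sine/cosine) amplitude panning. The patent does not specify the panning law, so the sketch below is an assumed substitute. It also accepts one position per frequency bin, which is one simple way to make the mapping frequency dependent. The function name `pan_between` and the 0-to-1 position convention are hypothetical.

```python
import numpy as np


def pan_between(ss, position):
    # Constant-power pan of one sound source vector between two adjacent output channels.
    # position 0 -> first channel only, 1 -> second channel only; g1**2 + g2**2 == 1.
    # position may be a scalar or an array with one value per frequency bin of ss.
    theta = np.clip(np.asarray(position, dtype=float), 0.0, 1.0) * (np.pi / 2.0)
    ss = np.asarray(ss)
    return np.cos(theta) * ss, np.sin(theta) * ss
```

Keeping the sum of squared gains at 1 holds the perceived loudness of the source roughly constant as it moves between the two loudspeakers.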

The audio output channels on the output channel lines 810 may be received by the post-processing module 804. The post-processing module 804 may use any form of frequency-domain-to-time-domain conversion process to convert the frequency-based audio output channels into time-based audio output channels. In FIG. 8, the post-processing module 804 includes a converter 816 and a windowing module 818 for each of the audio output channels included in the audio output signal. The converter 816 and the windowing module 818 may use a discrete Fourier transform (DFT) or another transform process to convert blocks of time samples. In other examples, the audio output channels provided on the output channel lines may already be in the time domain because the processing by the sound source vector processing module 706 is performed at least partly in the time domain, in which case the post-processing module 804 may be omitted.

FIG. 9 is a block diagram of another example of the audio processing system 102, which may include the audio input signal analysis module 700 and a system management module 902. As described above, the audio input signal analysis module 700 may include the pre-processing block 704, the sound source vector generation module 706, and the parameter input controller 708. The sound source vector generation module 706 may also include the gain vector generation module 720, the signal classifier 722, and the vector processing module 724. Based on an audio input signal 904, the audio input signal analysis module 700 may generate sound source vectors (Ss) on the vector output line 748. In FIG. 9, the audio input signal 904 is illustrated as a left/right stereo pair provided in the time domain. In other examples, any number of audio input signals may be present, in either the frequency domain or the time domain.

The sound source vectors (Ss) present on the vector output line 748 may be received by the system management module 902. The system management module 902 may include an energy measurement module 906 and a system control module 908. The energy measurement module 906 may include vector measurement modules 910, each receiving one of the sound source vectors (Ss) on the vector output line 748. Each vector measurement module 910 may measure the energy level of its respective sound source vector (Ss). The vector measurement modules 910 may measure signal level using root-mean-square (RMS)-based or peak-based measures. Additionally or alternatively, the vector measurement modules 910 may measure the perceived loudness of the signal.
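The RMS-based and peak-based level measures mentioned above can be sketched for a block of time-domain sound source vectors as follows, with levels expressed in dB relative to a full-scale amplitude of 1.0. The block-based measurement and the dB reference are assumptions. A perceived-loudness measure, such as the K-weighted method of ITU-R BS.1770, would need additional filtering and is not shown.

```python
import numpy as np


def vector_levels_db(vectors, eps=1e-12):
    # Rows are sound source vectors, columns are time samples of one measurement block.
    v = np.asarray(vectors, dtype=float)
    rms = np.sqrt(np.mean(v ** 2, axis=1))   # RMS-based measure
    peak = np.max(np.abs(v), axis=1)         # peak-based measure
    return 20.0 * np.log10(rms + eps), 20.0 * np.log10(peak + eps)
```

Comparing these per-vector levels gives the relative energy of each spatial slice.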

The system control module 908 may include a controller 912, a user interface 914, and a data storage module 916. The controller 912 may be a standard processor similar to the processor 120 described with reference to FIG. 1, or may represent functionality performed by the processor 120 (FIG. 1). The user interface 914 may include any visual, audible, and/or tactile mechanism, process, or device that enables a user to provide information to, and receive information from, the audio signal processing system 102. For example, the user interface 914 may include a display that converts electrical signals into information presented to the user in a visually perceivable form. Examples of displays include a liquid crystal display (LCD), a cathode ray tube (CRT) display, an electroluminescent display (ELD), a head-up display (HUD), a plasma display panel (PDP), a light emitting diode (LED) display, and a vacuum fluorescent display (VFD).
The user interface 914 may receive electrical signals from, and transmit them to, the controller 912 to represent the user's interaction with the audio signal processing system 102. In one example, the user interface 914 may include a user input device electrically connected to the controller 912. The input device may be a wheel button, joystick, keypad, touch screen configuration, or any other device or mechanism capable of receiving input from a user and providing that input to the controller 912 as an input signal. In another example, the display may be a touch screen display that transmits signals to the controller 912 or to any other module or device included in the audio signal processing system 102. Information such as the area of the display touched by the user, the length of time the user touched the display, and the direction in which the user moves a finger across the display may be passed to the audio signal processing system 102 as further signal inputs.

The user interface 914 may also include a voice-based interface that allows the user to interact audibly with the audio signal processing system 102. The voice-based interface may allow the user to provide input to the audio signal processing system 102 using a microphone and speech recognition software. The user's speech may be converted into an electronic signal by the microphone and processed by the speech recognition software to generate text data for the controller 912.

The data storage module 916 may include computer code that enables data logging and storage. The computer code may be in the form of logic and/or instructions executable by the controller 912. Execution of the instructions by the controller 912 may provide functionality for logging the energy level of each of the sound source vectors (Ss). Any other data or parameters provided to, or generated by, the audio signal processing system 102 may also be logged by the data storage module 916. The data storage module 916 may also include database maintenance and control tools, or any other form of data organization and storage device. The data storage module 916 may be included as part of the memory 118 described with reference to FIG. 1.

The audio processing system 102 of FIG. 9 may be used in conjunction with any of the other capabilities of the audio processing system 102 described above. Accordingly, the user interface 914 may include functionality that allows a user of the audio processing system 102 to influence or control any of the previously described functionality of the audio processing system 102. For example, the user interface 914 may allow the user to manually adjust the width and slope of the individual position filters described with reference to FIG. 5. The user can thereby manually adjust, for example with a simple control knob, which sound source vector Ss a particular sound source included in the audio input signal is located in. In another example, the user may be able to manually adjust how the sound source vectors Ss are grouped, split, or otherwise manipulated by the assembly module 814 described with reference to FIG. 8. Thus, if there are more sound source vectors Ss than audio output channels, the user can adjust the perceived position of an audio source within the listener-perceived sound stage by adjusting which loudspeakers the sound source appears in. In addition, manual input from the user to the parameter input controller module 708 described with reference to FIG. 7 may be entered through the user interface 914. Manual adjustment of the sound source vectors Ss, or of the audio sources included in them, may also be performed through the user interface 914. Such adjustment uses the sound source vector modification module 812 described with reference to FIG. 8. The output of the vector measurement module 910 of FIG. 9 may be used by the modification block 813 of FIG. 8 to adjust the level of the processed sound source vectors Ss or of the audio sources they contain. For example, the modification block 813 may amplify the energy level of the sound source vectors Ss used to generate a surround audio signal, based on the relative energy levels measured by the vector measurement module 910.
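The sketch below shows one way relative energy levels, like those measured by the vector measurement module 910, could drive a level change in the modification block 813. The function names, the `target_share` parameter, and the 6 dB boost cap are all assumptions made for illustration. The source only says that surround-bound sound source vectors may be amplified based on the measured relative energy levels.

```python
import numpy as np


def relative_energy_levels(source_vectors):
    """Energy of each sound source vector Ss as a fraction of the snapshot total."""
    energies = np.sum(np.abs(source_vectors) ** 2, axis=1)
    total = energies.sum()
    if total == 0:
        return np.zeros_like(energies, dtype=float)
    return energies / total


def boost_surround_vectors(source_vectors, surround_slices, target_share, max_boost_db=6.0):
    """Hypothetical rule: raise each surround-bound Ss toward target_share of the
    measured total energy. Vectors are never attenuated and never boosted by more
    than max_boost_db."""
    out = np.array(source_vectors, dtype=complex)
    rel = relative_energy_levels(out)
    max_gain = 10.0 ** (max_boost_db / 20.0)
    for s in surround_slices:
        if rel[s] > 0:
            # Energy scales with amplitude squared, so take the square root.
            gain = np.sqrt(target_share / rel[s])
            out[s] *= np.clip(gain, 1.0, max_gain)
    return out
```

Clamping the gain to the range from 1 to `max_gain` keeps a quiet surround slice from being raised so far that it becomes mostly noise.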

FIG. 10 shows examples of certain adjustments within the audio processing system 102 that can be made to the sound source vectors Ss to achieve particular effects. These adjustments may be made manually by a user, automatically by a processor, or by some combination of manual and automatic control. FIG. 10 includes a listener-perceived audio output sound stage 1002 formed by output audio channels that drive a right loudspeaker 1004 and a left loudspeaker 1006. The listener-perceived audio output sound stage 1002 includes a center position 1008. In other examples, the listener-perceived sound stage 1002 may be a surround sound stage similar to that of FIG. 3.

A center spatial slice 1010 is illustrated in FIG. 10. In other examples, any other output channel may be adjusted in a similar way. As indicated by arrow 1012, the center spatial slice may be repositioned within the listener-perceived audio output sound stage 1002 by the sound source vector modification module 812 or the assembly module 814. In addition, as indicated by arrow 1014, the width of the listener-perceived sound stage 1002 attributed to the center spatial slice 1010 may be adjusted. This is done by changing the slope of the center spatial filter 502 in the position filter bank 500 (FIG. 5). Adjusting the slope of any spatial filter changes its crossover points with the adjacent spatial filters. Thus, in the example of FIG. 10, steepening the center spatial filter 502 narrows the center spatial slice 1010. This may move audio sources away from the center 1008 and toward one or both of the right and left loudspeakers 1004 and 1006. Conversely, adjusting the slope of the center spatial filter 502 to widen the center spatial slice 1010 may move audio sources closer to the center 1008 within the listener-perceived sound stage 1002.
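To make the relationship between slope and crossover concrete, here is a minimal sketch. It assumes a three-slice position filter bank over perceptual positions from -1 (left) to +1 (right). The center filter is triangular, and its steepness is set by a hypothetical `center_slope` parameter. The side filters are defined as the complement of the center filter, so the gains always sum to 1. This filter shape is an assumption, not the shape of the filters in FIG. 5.

```python
import numpy as np


def position_filter_bank(positions, center_slope):
    """Hypothetical three-slice position filter bank.

    positions: perceptual positions in [-1, 1] (-1 = left loudspeaker, 0 = center,
    +1 = right loudspeaker). Returns gains of shape (3, len(positions)) ordered
    (left, center, right).
    """
    p = np.asarray(positions, dtype=float)
    # Triangular center filter; a larger slope makes the center slice narrower.
    center = np.clip(1.0 - center_slope * np.abs(p), 0.0, 1.0)
    side = 1.0 - center  # complement, so the three gains sum to 1 everywhere
    left = np.where(p < 0, side, 0.0)
    right = np.where(p > 0, side, 0.0)
    return np.stack([left, center, right])


def center_crossover(center_slope):
    """|position| at which the center and side gains are both 0.5."""
    return 0.5 / center_slope
```

With `center_slope=1.0`, a source at position 0.3 lands mostly in the center slice, with a gain of 0.7. Doubling the slope moves the crossover from 0.5 to 0.25. The same source then shifts mostly to the right slice, with a gain of 0.6. This is the narrowing effect described above.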

Additionally or alternatively, the amplitude, or level, of an audio output channel may be adjusted independently of any other audio output channel. In FIG. 10, adjustment of the amplitude of the center output channel is illustrated by arrow 1016. The amplitude may be adjusted in the sound source vector processing module 802 (FIG. 8). This is done by adjusting the amplitude of the sound vectors included in the sound source vectors Ss identified as being in the center output channel 1010.

One specific application of such amplitude adjustment is in video broadcasts that include audio. Many video broadcasts, such as television programming, place audio dialogue at the center of the listener-perceived sound stage of the audio input signal. A user may therefore be able to amplify the dialogue of received television programming while leaving the other audio sources in the audio input signal unchanged. Consider a user who has difficulty hearing dialogue because of background noise in the audio input signal, such as a user with a hearing aid. That user may amplify the sound source vector Ss associated with the center spatial slice, relative to the other sound source vectors Ss, by a predetermined amount such as 6 dB. This effectively amplifies the dialogue while leaving the amplitude of the remaining sound source vectors Ss substantially unchanged. Once the sound source vector Ss associated with the center spatial slice has been amplified, the sound source vectors Ss may be recombined to form one or more output channels, such as a pair of stereo output channels. Alternatively or additionally, spatial slices other than the center spatial slice that are identified as containing speech may be amplified. In addition, amplification may be applied selectively, based on detecting the absence of speech in a spatial slice in which speech was previously present. In another example, commercials received within television programming may be attenuated in amplitude if they use compression to make the announcer's voice louder.
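The arithmetic behind the 6 dB dialogue boost, and one possible recombination into a stereo pair, can be sketched as follows. The three-slice layout (left, center, right) is an assumption made for this example. So is the equal-power split of the center slice into both output channels. The source does not specify how the vectors are recombined.

```python
import numpy as np


def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def enhance_dialogue(ss_left, ss_center, ss_right, boost_db=6.0):
    """Boost the center-slice Ss by boost_db and recombine into a stereo pair.

    Assumes a (left, center, right) slice layout and an equal-power split of the
    center slice into both outputs; both choices are illustrative.
    """
    boosted_center = db_to_gain(boost_db) * np.asarray(ss_center)
    center_share = np.sqrt(0.5)  # equal-power pan of the center slice
    left_out = np.asarray(ss_left) + center_share * boosted_center
    right_out = np.asarray(ss_right) + center_share * boosted_center
    return left_out, right_out
```

A 6 dB boost multiplies the center-slice amplitude by 10^(6/20), which is roughly 2. The left and right slices pass through untouched.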

In one example, the audio processing system may adjust the region covered by the spatial slice 1010 automatically, based on identifying the audio sources within the listener-perceived sound stage as described above. The adjustment may change the position and/or width of the spatial slice 1010, as well as the amplitude, or level, of the audio output channel. Additionally or alternatively, such adjustments may be made manually. For example, a user may have a first adjuster, such as a rotary knob or another form of user interface, that moves the spatial slice 1010 back and forth across the listener-perceived sound stage 1002. The user may also have a second adjuster that adjusts the width of the spatial slice 1010, and a third adjuster that adjusts the loudness of the audio content within the spatial slice 1010. The user may thus turn the first adjuster to sweep the spatial slice 1010 across the listener-perceived sound stage 1002. The user continues until one or more sources of audible sound, such as a guitar, fall roughly within the slice. Once the sources are located, the user can turn the second adjuster to set the width of the spatial slice 1010 so that it fully encompasses those sources. Finally, once the region of the spatial slice has been set as desired with the first and second adjusters, the user can turn the third adjuster. This increases or decreases the loudness of the sources of audible sound now enclosed within the spatial slice 1010.
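The three adjusters can be modelled as a small set of parameters applied to frequency bins according to their estimated perceptual positions. In the sketch below, `SliceControls` and its fields are hypothetical. The hard in/out slice boundary is a simplification; a real position filter would normally taper at its edges.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class SliceControls:
    """Hypothetical model of the three adjusters for spatial slice 1010."""

    position: float  # first adjuster: slice center, in perceptual units [-1, 1]
    width: float     # second adjuster: slice width, in the same units
    level_db: float  # third adjuster: loudness change inside the slice, in dB

    def contains(self, positions):
        """True for bins whose estimated perceptual position lies inside the slice."""
        p = np.asarray(positions, dtype=float)
        return np.abs(p - self.position) <= self.width / 2.0

    def apply(self, spectrum, positions):
        """Scale only the bins inside the slice; all other bins pass unchanged."""
        x = np.asarray(spectrum)
        gain = 10.0 ** (self.level_db / 20.0)
        return np.where(self.contains(positions), gain * x, x)
```

Consider a guitar at position 0.6. A slice centered at 0.5 with a width of 0.1 misses it. Widening the slice to 0.3 captures it, after which `level_db` raises or lowers only the bins attributed to the guitar.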

FIG. 11 is a flowchart illustrating an example of audio processing by the audio processing system 102 described with reference to FIGS. 1-10. In this example, the audio signal is provided in the time domain and converted to the frequency domain. In other examples, the audio signal may be received in the frequency domain. The processing may also be performed in both the time and frequency domains, in the time domain only, or in the frequency domain only. At block 1102, the audio processing system 102 receives an audio input signal from an audio source. At block 1104, a snapshot of the audio input signal (an instant in time) is converted from the time domain to the frequency domain and divided into frequency bins. At block 1106, the estimated position generation module 728 may determine the estimated perceptual position S(ω) of each sound source across the listener-perceived sound stage. The estimated perceptual positions S(ω) may be determined based on Equations 1 and 2 described above. At block 1108, the estimated perceptual positions S(ω) are applied to the position filter bank 500.
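Blocks 1104 and 1106 can be sketched for a stereo snapshot as below. A Hann window followed by a real FFT is a common choice for the time-to-frequency conversion. The magnitude-ratio panning index is a stand-in chosen for illustration. It is not a reproduction of Equations 1 and 2, which are not part of this excerpt.

```python
import numpy as np


def snapshot_to_bins(left, right):
    """Block 1104 (sketch): window one stereo snapshot and convert it to frequency bins."""
    window = np.hanning(len(left))
    return np.fft.rfft(window * np.asarray(left)), np.fft.rfft(window * np.asarray(right))


def estimate_positions(left_bins, right_bins, eps=1e-12):
    """Block 1106 (stand-in): per-bin panning index in [-1, 1].

    -1 means the bin is entirely in the left channel, +1 entirely in the right,
    and 0 means equal level in both. This is an assumed estimator, not
    Equations 1 and 2.
    """
    l = np.abs(left_bins)
    r = np.abs(right_bins)
    return (r - l) / (r + l + eps)
```

The small `eps` only guards against division by zero in silent bins.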

At block 1110, a gain value is obtained for each frequency bin to form a respective gain position vector for one of a plurality of predetermined or user-selected spatial slices. At block 1112, the perceptual model 734 and the source model 736 are applied to the gain position vector. At block 1114, it is determined whether gain position vectors have been formed for all spatial slices. If not, the next spatial slice is selected at block 1116 and blocks 1110, 1112, and 1114 are repeated. Once gain position vectors have been determined for all spatial slices at block 1114, the operation proceeds to block 1118 to form a sound source vector Ss for each spatial slice. For each spatial slice, the portion of the audio input signal in each frequency bin may be multiplied by the corresponding gain value in that slice's gain position vector. This produces sound source values Ssn, which form the sound source vector Ss for that slice.
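Blocks 1110 and 1118 reduce to two steps: evaluate each position filter at every bin's estimated position, then multiply bin by bin. This sketch assumes evenly spaced, overlapping triangular filters, so the gains across slices sum to 1 in each bin. It omits the smoothing applied by the perceptual model 734 and source model 736 at block 1112. The filter shape and function names are illustrative.

```python
import numpy as np


def gain_position_vectors(positions, num_slices):
    """Block 1110 (sketch): one gain position vector per spatial slice.

    Assumes num_slices >= 2 triangular position filters centred on evenly spaced
    positions in [-1, 1]. Neighbouring filters cross at 0.5, so in every bin the
    gains across all slices sum to 1. Returns shape (num_slices, num_bins).
    """
    centers = np.linspace(-1.0, 1.0, num_slices)
    spacing = centers[1] - centers[0]
    p = np.clip(np.asarray(positions, dtype=float), -1.0, 1.0)
    return np.maximum(0.0, 1.0 - np.abs(p[None, :] - centers[:, None]) / spacing)


def sound_source_vectors(input_bins, gains):
    """Block 1118: Ssn = gain_n * X_n in every frequency bin n, for every slice."""
    return gains * np.asarray(input_bins)[None, :]
```

Because the gains sum to 1, adding all the sound source vectors back together returns the original spectrum. This is a convenient check that the split neither loses nor duplicates content.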

At block 1120, it is determined whether a sound source vector Ss has been determined for each spatial slice. If not, at block 1122 the operation moves to the next spatial slice for which no sound source vector Ss has yet been determined. Blocks 1118 and 1120 are then repeated until a sound source vector Ss has been derived for each spatial slice. Once sound source vectors Ss have been derived for all spatial slices at block 1120, the operation proceeds to block 1124 of FIG. 12. At block 1124, the signal classifier 722 analyzes each sound source vector Ss to identify the sound sources represented by one or more of the sound source vectors Ss.

At block 1126, it is determined whether the sound sources have been identified for each spatial slice. If not all spatial slices have been analyzed, the operation returns to block 1124, where the signal classification module 722 identifies additional sound sources in the spatial slices. Once all spatial slices have been considered, a feedback audio source classification signal may be generated for each spatial slice at block 1128. This signal is provided to the position filter bank generation module 730, the perceptual model 734, and the source model 736 for use in processing subsequent snapshots of the audio input signal.

At block 1130, a feedforward audio source classification signal is provided to the sound source vector modification module 812. The module uses it to further process the sound source vectors Ss of the snapshot of the audio input signal currently being processed. At block 1132, the sound source vector modification module 812 may modify the sound source vectors Ss based on the feedforward audio source classification signal. At block 1134, the assembly module 814 may assemble the sound source vectors Ss, for example by recombining them, to form an audio output signal that includes the audio output channels. At block 1136, the audio output channels may be converted from the frequency domain to the time domain. The operation may then return to block 1104 to convert another snapshot of the audio input signal and repeat the operations.
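The remaining per-snapshot steps, blocks 1124 through 1136, can be outlined as the function below. The classifier is passed in as a hypothetical callable standing in for the signal classifier 722. Per-class gains stand in for the sound source vector modification module 812, and a mixing matrix stands in for the assembly module 814. The feedback path of block 1128 is omitted. All names are illustrative assumptions.

```python
import numpy as np


def process_snapshot(source_vectors, classify, class_gains, assembly_matrix, frame_length):
    """Sketch of blocks 1124-1136 for one snapshot.

    source_vectors : complex array (num_slices, num_bins), one Ss per spatial slice
    classify       : hypothetical callable mapping one Ss to a class label
                     (stand-in for the signal classifier 722)
    class_gains    : dict of label -> linear gain (stand-in for module 812)
    assembly_matrix: (num_outputs, num_slices) mixing weights (stand-in for module 814)
    frame_length   : number of time-domain samples per snapshot
    """
    # Block 1124: classify the content of each spatial slice.
    labels = [classify(ss) for ss in source_vectors]
    # Blocks 1130-1132: feedforward modification of each Ss based on its class.
    modified = np.array([class_gains.get(label, 1.0) * ss
                         for label, ss in zip(labels, source_vectors)])
    # Block 1134: assemble the Ss into audio output channels.
    output_bins = assembly_matrix @ modified
    # Block 1136: convert each output channel back to the time domain.
    outputs = np.fft.irfft(output_bins, n=frame_length, axis=1)
    return labels, outputs
```

A mixing matrix with more rows than there are input channels performs an upmix. One with fewer rows performs a downmix.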

With the audio processing system described above, any audio input signal of two or more channels may be divided into multiple spatial slices across the listener-perceived sound stage. Analyzing these slices identifies the perceptual positions of the audio sources included in the signal. The current snapshot of the audio input signal may be split into multiple spatial slices, each containing a sound source vector Ss, to identify the audio sources. Once the audio sources have been divided into sound source vectors Ss, each audio source may be classified and then further processed based on its classification. Alternatively, each spatial slice of the audio input signal, with its sound source vector Ss, may be processed independently. In other systems, this kind of division is not possible, so the portions of an audio input signal that represent individual sound sources cannot be processed separately. Once independent processing of the individual spatial slices has been performed, the spatial slices may be further manipulated to form the output audio channels. The manipulation may include moving, combining, or splitting spatial slices to form the audio output channels.

While various embodiments of the invention have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the invention. Accordingly, the invention is not to be restricted except in light of the attached claims and their equivalents.

Claims

1. An audio processing system comprising:
a processor;
a gain vector generation module executable by the processor to analyze an audio input signal intended to drive a plurality of loudspeakers in a listening space, and to estimate a respective perceptual position, within a listener-perceived sound stage, of each of a plurality of audio sources included in the audio input signal;
the gain vector generation module further executable by the processor to generate a position filter bank comprising a plurality of position filters based on the respective perceptual positions in the listener-perceived sound stage; and
a vector processing module executable by the processor to apply the position filter bank to the audio input signal to produce a plurality of sound source vectors, each representing one of the respective perceptual positions.

2. The audio processing system of claim 1, further comprising a sound source vector processing module executable by the processor to modify the sound source vectors and to combine them to generate an audio output signal configured to drive a plurality of loudspeakers. The sound source vector processing module is configurable to selectively combine the sound source vectors so that the number of audio channels in the audio input signal is less than, more than, or equal to the number of channels in the audio output signal.

3. The audio processing system of claim 1 or 2, wherein:
each of the position filters comprises a plurality of gain position vectors;
each gain position vector has a plurality of gain values, each corresponding to a portion of the total frequency range of the audio input signal; and
the gain values are applied to the corresponding portions of the total frequency range of the audio input signal when the position filter bank is applied to the audio input signal.

4. The audio processing system of claim 1 or 2, wherein the analysis of the audio input signal by the gain vector generation module comprises dividing the audio input signal into a plurality of frequency bins, each frequency bin comprising a band of frequencies included in the audio input signal.

5. The audio processing system of claim 1 or 2, wherein the gain vector generation module is further executable to use the perceptual positions at a predetermined time to develop a corresponding gain position vector for each of the position filters at that predetermined time.

6. The audio processing system of claim 1 or 2, further comprising a signal classification module executable by the processor to identify each of the audio sources within each of the perceptual positions.

7. The audio processing system of claim 1, further comprising a source model executable by the processor to smooth the audio input signal. The smoothing prevents the amplitude and frequency of the audio input signal from changing by more than a predetermined rate over a predetermined number of snapshots.

8. The audio processing system of claim 1, wherein:
the position filters and the sound source vectors are generated repeatedly at each of a plurality of points in time;
the audio processing system further comprises a perceptual model and a source model executable by the processor;
the source model is executable to identify a change in amplitude or frequency of the audio input signal that exceeds a predetermined rate of change; and
the perceptual model is executable to dynamically smooth the gain position vectors included in each of the position filters, based on the identified change in amplitude or frequency that exceeds the predetermined rate of change.

9. The audio processing system of any one of the preceding claims, further comprising a sound source vector processing module executable by the processor to:
identify a sound source vector representing a predetermined one of the perceptual positions;
adjust the gain of the identified sound source vector so as to adjust the amplitude of the sound source represented in it; and
combine the adjusted sound source vector with the remaining sound source vectors to generate an audio output signal to be provided to a plurality of loudspeakers.

10. An audio signal processing method comprising:
receiving, with an audio processor, an audio input signal configured to drive a plurality of loudspeakers in a listening space;
identifying, with the audio processor, a plurality of perceptual positions of a respective plurality of sources of audible sound provided within the audio input signal, the perceptual positions representing the physical locations of the respective sources of audible sound in a listener-perceived sound stage;
generating, with the audio processor, a plurality of filters for each of a plurality of respective output channels based on the identified perceptual positions of the respective sources of audible sound; and
applying, with the audio processor, the filters to the audio input signal to generate a plurality of sound source vectors, each representing a portion of the audio input signal.

11. The method of claim 10, further comprising separately and independently modifying each of the sound source vectors so as to separately and independently modify the respective portions of the audio input signal.

12. The method of claim 11, further comprising processing, with the audio processor, the modified sound source vectors to produce an audio output signal adapted to drive a respective loudspeaker in each of a plurality of respective audio output channels.

13. The method of claim 12, wherein processing the modified sound source vectors comprises combining subsets of the modified sound source vectors together to form each respective audio output channel.

14. The method of claim 12 or 13, wherein the audio input signal comprises a plurality of audio input channels, and the number of respective audio output channels is greater than or less than the number of audio input channels.

15. The method of any one of claims 10 to 13, wherein identifying the plurality of perceptual positions comprises:
dividing the audio input signal into a plurality of predetermined bands of frequencies; and
identifying the perceptual position of one or more of the plurality of sources of audible sound in at least one of the predetermined bands of frequencies.

16. The method of claim 15, wherein identifying the perceptual positions of the one or more sources of audible sound comprises assigning a position value, from a predetermined range of position values, to each of a plurality of predetermined regions forming the listener-perceived sound stage.

17. The method of claim 10, wherein applying the filters to the audio input signal comprises separating a group of sources of audible sound combined in the audio input signal into a plurality of different sound source vectors. The sources in the group are separated based on the identified respective perceptual position of each of them.

18. A computer-readable medium comprising instructions executable by a processor, the instructions comprising:
instructions to receive an audio input signal configured to drive a plurality of loudspeakers in a listening space;
instructions to generate a plurality of gain position vectors, wherein each gain position vector corresponds to a position in the perceptual sound stage created when the audio input signal is output as audible sound in the listening space, and wherein each gain position vector comprises, at its corresponding position, a gain value for each of a plurality of predetermined bands of frequencies of the audio input signal;
instructions to generate a plurality of position filters for each of a plurality of respective output channels, the position filters being generated from the gain position vectors; and
instructions to apply each of the position filters to the audio input signal to form a respective one of a plurality of sound source vectors.

19. The computer-readable medium of claim 18, further comprising instructions to identify a respective audio source in each of the sound source vectors, and instructions to separately and independently process each of the sound source vectors in accordance with the identified respective audio source.

20. The computer-readable medium of claim 19, further comprising instructions to independently process each of the sound source vectors, and instructions to combine the sound source vectors to form an audio output signal. The audio output signal comprises a plurality of audio output channels adapted to independently drive respective loudspeakers.

21. The computer-readable medium of any one of claims 18 to 20, wherein the instructions to generate a plurality of gain position vectors comprise instructions to convert the audio input signal into the frequency domain and to divide it into the predetermined bands of frequencies.

22. The computer-readable medium of any one of claims 18 to 20, wherein the instructions to apply each of the position filters to the audio input signal comprise instructions to apply each gain value, defined for a respective one of the predetermined bands of frequencies, to the corresponding predetermined band of frequencies of the audio input signal.

23. The computer-readable medium of any one of claims 18 to 20, wherein the instructions to generate a plurality of gain position vectors comprise instructions to generate, at each of the positions, a gain value for each of the predetermined bands of frequencies of the audio input signal.

24. An audio signal processing method comprising:
receiving an audio input signal adapted to drive a plurality of loudspeakers in a listening space;
dividing the audio input signal into a plurality of sound source position vectors using a position filter bank, wherein:
the position filter bank has a plurality of position filters constructed based on the estimated perceptual positions of a plurality of sources of audible sound included in the audio input signal;
each sound source position vector represents a perceptual position across a listener-perceived sound stage; and
at least some of the sound source position vectors include sources of audible sound included in the audio input signal;
independently modifying the sound source position vectors; and
combining the sound source position vectors to produce an audio output signal comprising a plurality of audio output channels, each configured to drive a respective loudspeaker.

25. The method of claim 24, wherein dividing the audio input signal into a plurality of sound source position vectors further comprises:
dividing the audio input signal into a plurality of predetermined bands of frequencies; and
generating a plurality of sound source values for each of the predetermined bands of frequencies, each sound source position vector being formed from the sound source values for a particular one of the perceptual positions.

26. The method of claim 24, wherein dividing the audio input signal into a plurality of sound source position vectors comprises applying the audio input signal to the position filters to generate the sound source position vectors.

27. The method of claim 24, wherein combining the sound source position vectors to produce an audio output signal comprises combining the sound source position vectors to form each of the audio output channels.

28. The method of any one of claims 24 to 27, wherein combining the sound source position vectors to produce an audio output signal comprises forming one of the audio output channels from one of the sound source position vectors.

29. The method of any one of claims 24 to 27, wherein combining the sound source position vectors to produce an audio output signal comprises including one of the sound source position vectors in at least two of the audio output channels.

30. The method of any one of claims 24 to 27, wherein independently modifying the sound source position vectors comprises independently adjusting only the sources of audible sound included in a particular one of the sound source position vectors.

31. The method of any one of claims 24 to 27, wherein independently modifying the sound source position vectors comprises moving a source of audible sound included in a first one of the sound source position vectors to a second one of the sound source position vectors.

32. An audio processing system comprising a processor configured to:
receive an audio input signal configured to drive a plurality of loudspeakers in a listening space;
convert the audio input signal into the frequency domain;
divide the audio input signal into a plurality of predetermined bands of frequencies;
generate a position filter bank comprising a plurality of position filters, each corresponding to one of a plurality of perceptual positions across a listener-perceived sound stage;
apply the position filter bank to the audio input signal to separate the sources of audible sound included in the audio input signal into the plurality of perceptual positions;
separately and independently process the separated sources of audible sound; and
combine the separated sources of audible sound to form an audio output signal comprising a plurality of audio output channels.