KR20170016488A

KR20170016488A - Apparatus and method for enhancing an audio signal, sound enhancing system

Info

Publication number: KR20170016488A
Application number: KR1020177000895A
Authority: KR
Inventors: Christian Uhle; Patrick Gampp; Oliver Hellmuth; Stefan Varga; Sebastian Scharrer
Original assignee: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Priority date: 2014-07-30
Filing date: 2015-07-27
Publication date: 2017-02-13
Also published as: JP6377249B2; RU2017106093A3; US20170133034A1; AU2015295518B2; WO2016016189A1; CN106796792A; US10242692B2; KR101989062B1; ES2797742T3; BR112017000645A2; EP3175445B1; RU2666316C2; AU2015295518A1; CN106796792B; EP2980789A1; MX362419B; CA2952157C; CA2952157A1; EP3175445B8; MX2017001253A

Abstract

An apparatus for enhancing an audio signal is disclosed. It comprises a signal processor for processing the audio signal in order to reduce or eliminate transient and tonal portions of the processed signal, and a decorrelator for generating a first decorrelated signal and a second decorrelated signal from the processed signal. The apparatus further comprises a combiner for weightedly combining the first and second decorrelated signals and the audio signal, or a signal derived from the audio signal by coherence enhancement, using time-variant weighting factors, and for obtaining a two-channel audio signal. The apparatus further comprises a controller for controlling the time-variant weighting factors by analyzing the audio signal, so that different portions of the audio signal are multiplied by different weighting factors and the two-channel audio signal has a time-variant degree of decorrelation.

Description

The present invention relates to an apparatus and a method for enhancing an audio signal, and to a sound enhancing system.

This application relates to audio signal processing and, in particular, to audio processing of mono or dual-mono signals.

An auditory scene can be modeled as a mixture of direct sounds and ambient sounds. Direct (or directional) sounds are emitted by sound sources, e.g., a musical instrument, a vocalist or a loudspeaker, and arrive at the receiver, e.g., the listener's ear or a microphone, on the shortest possible path. When a direct sound is captured using a set of spaced microphones, the received signals are coherent. In contrast, ambient (or diffuse) sounds are emitted by many spaced sound sources or sound-reflecting boundaries that contribute, for example, to room reverberation, applause or noise. When an ambient sound field is captured using a set of spaced microphones, the received signals are at least partially incoherent.

Monophonic sound reproduction may be considered appropriate in some reproduction scenarios (e.g., dance clubs) or for some types of signals (e.g., speech recordings), but the majority of music recordings, movie sound and TV sound are stereophonic signals. Stereophonic signals can create the sensation of ambient (or diffuse) sounds and of the directions and widths of sound sources. This is achieved by means of stereophonic information that is encoded by spatial cues. The most important spatial cues are inter-channel level differences (ICLD), inter-channel time differences (ICTD) and inter-channel coherence (ICC). Consequently, stereophonic signals and the corresponding sound reproduction systems have more than one channel. ICLD and ICTD contribute to the sensation of direction. ICC evokes the sensation of width of a sound and, in the case of ambient sounds, the sensation that the sound is coming from all directions.
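As a rough illustration of two of the spatial cues named above, the following sketch computes an ICLD in dB and a simplified ICC for a pair of channel signals. It is not taken from the source: the function names are hypothetical, and the ICC is evaluated at zero lag only, whereas coherence measures are often taken as the maximum of the normalised cross-correlation over a range of lags.

```python
import math

def icld_db(left, right):
    """Inter-channel level difference in dB (positive: left is louder)."""
    p_left = sum(v * v for v in left)
    p_right = sum(v * v for v in right)
    return 10.0 * math.log10(p_left / p_right)

def icc_zero_lag(left, right):
    """Simplified ICC: normalised cross-correlation at lag 0, in [-1, 1]."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den

mono = [math.sin(0.1 * n) for n in range(1000)]
half_level = [0.5 * v for v in mono]

dual_mono_icc = icc_zero_lag(mono, mono)   # identical channels: fully coherent
level_diff = icld_db(mono, half_level)     # amplitude ratio 2, i.e. about 6 dB
```

For a dual-mono signal both channels are identical, so the ICC is 1 and the ICLD is 0 dB; this is the absence of spatial cues that the enhancement described in this document is meant to address.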

Although multichannel sound reproduction exists in various formats, the majority of audio recordings and sound reproduction systems still have two channels. Two-channel stereophonic sound is the standard for entertainment systems, and listeners are used to it. However, stereophonic signals are not restricted to having only two channel signals; they have more than one channel signal. Similarly, monophonic signals are not restricted to having only one channel signal but can have multiple, identical channel signals. For example, an audio signal comprising two identical channel signals may be called a dual-mono signal.

There are various reasons why monophonic signals instead of stereophonic signals are available to the listener. First, old recordings are monophonic because stereophonic techniques were not in use at that time. Second, bandwidth restrictions of a transmission or storage medium can lead to a loss of stereophonic information. A prominent example is radio broadcasting using frequency modulation (FM). Here, interfering sources, multipath distortions or other transmission impairments can lead to noisy stereophonic information, which, for the transmission of two-channel signals, is typically encoded as the difference signal between both channels. It is common practice to partially or completely discard the stereophonic information when reception conditions are poor.

The loss of stereophonic information may lead to a reduction of sound quality. In general, an audio signal comprising a higher number of channels may have a higher sound quality than an audio signal comprising a lower number of channels. Listeners may prefer to listen to audio signals of high sound quality. For efficiency reasons, such as the data rate transmitted over or stored on media, sound quality is often reduced.

Therefore, there is a need to increase (enhance) the sound quality of audio signals.

It is therefore an object of the present invention to provide an apparatus or a method for enhancing audio signals and/or for increasing the sensation of reproduced audio signals.

This object is achieved by an apparatus for enhancing an audio signal according to claim 1, a method for enhancing an audio signal according to claim 14, and a sound enhancing system according to claim 13 or a computer program according to claim 15.

The present invention is based on the finding that a received audio signal can be enhanced by artificially generating spatial cues, namely by splitting the received audio signal into at least two shares and decorrelating at least one of the shares of the received signal. A weighted combination of the shares allows obtaining an audio signal that is perceived as stereophonic and is therefore enhanced. Controlling the applied weights allows for a variable degree of decorrelation and therefore a variable degree of enhancement, so that the level of enhancement can be lowered when decorrelation would lead to annoying effects that reduce sound quality. Thus, a varying audio signal can be enhanced that comprises portions or time intervals to which low or no decorrelation is applied, as for speech signals, and portions or time intervals to which more or a high degree of decorrelation is applied, as for music signals.

An embodiment of the present invention provides an apparatus for enhancing an audio signal. The apparatus comprises a signal processor for processing the audio signal in order to reduce or eliminate transient and tonal portions of the processed signal. The apparatus further comprises a decorrelator for generating a first decorrelated signal and a second decorrelated signal from the processed signal. The apparatus further comprises a combiner and a controller. The combiner is configured to weightedly combine the first decorrelated signal, the second decorrelated signal and the audio signal, or a signal derived from the audio signal by coherence enhancement, using time-variant weighting factors, and to obtain a two-channel audio signal. The controller is configured to control the time-variant weighting factors by analyzing the audio signal, so that different portions of the audio signal are multiplied by different weighting factors and the two-channel audio signal has a time-variant degree of decorrelation.

An audio signal with little or no stereophonic (or multichannel) information, e.g., a signal with one channel or a signal with multiple but almost identical channel signals, may be perceived as a multichannel, e.g., stereophonic, signal after the enhancement has been applied. A received mono or dual-mono audio signal may be processed differently in different paths, wherein in one path transient and/or tonal portions of the audio signal are reduced or eliminated. Decorrelating the signal processed in this way, and weightedly combining the decorrelated signal with a second path comprising the audio signal or a signal derived from it, allows obtaining two signal channels that may have a high degree of decorrelation with respect to each other, so that the two channels are perceived as a stereophonic signal.

By controlling the weighting factors used for weightedly combining the decorrelated signal and the audio signal (or the signal derived from it), a time-variant degree of decorrelation can be obtained, so that the enhancement can be reduced or skipped in situations in which enhancing the audio signal would possibly lead to unwanted effects. For example, enhancing the signal of a radio speaker or other prominent sound source signals is undesirable, because perceiving a speaker from multiple source locations may be annoying to the listener.

According to a further embodiment, an apparatus for enhancing an audio signal comprises a signal processor for processing the audio signal in order to reduce or eliminate transient and tonal portions of the processed signal. The apparatus further comprises a decorrelator, a combiner and a controller. The decorrelator is configured to generate a first decorrelated signal and a second decorrelated signal from the processed signal. The combiner is configured to weightedly combine the first decorrelated signal and the audio signal, or a signal derived from the audio signal by coherence enhancement, using time-variant weighting factors, and to obtain a two-channel audio signal. The controller is configured to control the time-variant weighting factors by analyzing the audio signal, so that different portions of the audio signal are multiplied by different weighting factors and the two-channel audio signal has a time-variant degree of decorrelation. This allows a mono signal or a signal similar to a mono signal (e.g., dual-mono or multi-mono) to be perceived as a stereo-channel audio signal.

For processing the audio signal, the controller and/or the signal processor may be configured to process a representation of the audio signal in the frequency domain. The representation may comprise a plurality of frequency bands (subbands), each comprising a part, i.e., a spectral portion, of the audio signal. For each of the frequency bands, the controller may be configured to predict a perceived level of decorrelation in the two-channel audio signal. The controller may further be configured to increase the weighting factors for portions (frequency bands) of the audio signal that allow a higher degree of decorrelation and to decrease the weighting factors for portions of the audio signal that allow a lower degree of decorrelation. For example, a portion comprising a non-prominent sound source signal, such as applause or noise, may be combined using a weighting factor that allows a higher decorrelation than a portion comprising a prominent sound source signal. Here, the term prominent sound source signal is used for portions of the signal that are perceived as direct sounds, such as speech, a musical instrument, a vocalist or a loudspeaker.

The processor may be configured to determine, for each of some or all of the frequency bands, whether the frequency band comprises transient or tonal components, and to determine spectral weights that allow a reduction of the transient or tonal portions. The spectral weights and the scaling factors may each take a multitude of possible values, so that annoying effects due to binary decisions can be reduced and/or avoided.
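The difference between a binary decision and a weight with many possible values can be sketched as follows. This is only an illustration under assumptions: the tonality measure in [0, 1], the threshold, the floor value and the linear mapping are all hypothetical and are not specified in this section.

```python
def binary_weight(tonality, threshold=0.5):
    """Hard decision: remove the band entirely once it counts as tonal."""
    return 0.0 if tonality >= threshold else 1.0

def graded_weight(tonality, floor=0.1):
    """Graded decision: attenuate progressively as the band gets more tonal.

    `tonality` is a hypothetical per-band measure in [0, 1]; `floor` is the
    smallest weight applied to a fully tonal band.
    """
    t = min(1.0, max(0.0, tonality))
    return 1.0 - (1.0 - floor) * t

# Two bands whose tonality measures lie just either side of the threshold.
jump_binary = abs(binary_weight(0.49) - binary_weight(0.51))
jump_graded = abs(graded_weight(0.49) - graded_weight(0.51))
```

With the hard decision, two nearly identical bands receive weights of 1 and 0, whereas the graded weight changes by less than 0.02, which is the kind of abrupt switching the multi-valued weights are meant to avoid.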

The controller may further be configured to scale the weighting factors so that the perceived level of decorrelation in the two-channel audio signal remains within a range around a target value. The range may extend, for example, to ±20 %, ±10 % or ±5 % of the target value. The target value may be, for example, a previously determined value for a measure of the tonal and/or transient portion, so that an audio signal comprising varying transient and tonal portions yields varying target values. This allows performing a low or even no decorrelation when the audio signal is already decorrelated or when no decorrelation is aimed for, as for prominent sound sources such as speech, and a high decorrelation when the signal is not decorrelated and/or decorrelation is aimed for. The weighting factors and/or the spectral weights may be determined and/or adjusted to multiple values or even almost continuously.
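One way to picture keeping the perceived level of decorrelation within a range around a target value is a simple step-wise adjustment of the weight applied to the decorrelated signal. The sketch below is an assumption for illustration only: the function name, the fixed step size and the clamping to [0, 1] are hypothetical, and the prediction of the perceived level is taken as a given number.

```python
def adjust_weight(b, predicted, target, tolerance=0.1, step=0.05):
    """Nudge weight `b` so the predicted level moves towards the target range.

    predicted : predicted perceived level of decorrelation (assumed given)
    target    : target value for that level
    tolerance : half-width of the accepted range, e.g. 0.1 for +/-10 %
    """
    upper = target * (1.0 + tolerance)
    lower = target * (1.0 - tolerance)
    if predicted > upper:
        b -= step          # too much perceived decorrelation: back off
    elif predicted < lower:
        b += step          # too little: allow more decorrelated signal
    return min(1.0, max(0.0, b))

too_high = adjust_weight(0.5, predicted=1.2, target=1.0)   # reduced
in_range = adjust_weight(0.5, predicted=1.05, target=1.0)  # unchanged
too_low = adjust_weight(0.5, predicted=0.8, target=1.0)    # increased
```

Applied once per frame and subband, such a rule leaves the weight untouched while the prediction stays inside the tolerance band and changes it gradually otherwise.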

The decorrelator may be configured to generate the first decorrelated signal based on a reverberation or a delay of the audio signal. The controller may be configured to generate a test decorrelated signal, also based on a reverberation or a delay of the audio signal. A reverberation may be performed by delaying the audio signal and combining the audio signal with its delayed version, similar to a finite impulse response filter structure; the reverberation may also be implemented as an infinite impulse response filter. The delay time and/or the number of delays and combinations may vary. The delay time used for delaying or reverberating the audio signal for the test decorrelated signal may be shorter than the delay time used for the first decorrelated signal, resulting, for example, in fewer filter coefficients of the delay filter. For predicting the perceived intensity of decorrelation, a lower degree of decorrelation and thus a shorter delay time may be sufficient, so that computational effort and/or computational power can be reduced by reducing the delay time and/or the filter coefficients.
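The two filter structures mentioned above can be sketched with a single delay each. The functions below are a minimal illustration, not the decorrelator of the described apparatus: the names, the single delay tap and the gain value are assumptions.

```python
def fir_delay_mix(x, delay, gain):
    """Feed-forward structure: y[n] = x[n] + gain * x[n - delay]."""
    return [x[n] + (gain * x[n - delay] if n >= delay else 0.0)
            for n in range(len(x))]

def iir_delay_mix(x, delay, gain):
    """Feedback structure: y[n] = x[n] + gain * y[n - delay]."""
    y = []
    for n in range(len(x)):
        feedback = gain * y[n - delay] if n >= delay else 0.0
        y.append(x[n] + feedback)
    return y

impulse = [1.0] + [0.0] * 9
fir_out = fir_delay_mix(impulse, delay=3, gain=0.5)
iir_out = iir_delay_mix(impulse, delay=3, gain=0.5)
```

For a unit impulse, the feed-forward version produces exactly one delayed copy, while the feedback version produces a decaying series of copies every three samples. A shorter delay or fewer taps, as suggested for the test decorrelated signal, directly reduces the work per output sample.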

In the following, preferred embodiments of the present invention are described with reference to the accompanying drawings.
FIG. 1 shows a schematic block diagram of an apparatus for enhancing an audio signal.
FIG. 2 shows a schematic block diagram of a further apparatus for enhancing an audio signal.
FIG. 3 shows an exemplary table indicating the computation of scaling factors (weighting factors) based on the level of the predicted perceived intensity of decorrelation.
FIG. 4a shows a schematic flowchart of a part of a method that may be executed for partially determining weighting factors.
FIG. 4b shows a schematic flowchart of further steps of the method of FIG. 4a, depicting the case in which the measure for the perceived level of decorrelation is compared to threshold values.
FIG. 5 shows a schematic block diagram of a decorrelator that may be configured to operate as the decorrelator of FIG. 1.
FIG. 6a shows a schematic diagram comprising a spectrum of an audio signal comprising at least one transient (short-time) signal portion.
FIG. 6b shows a schematic spectrum of an audio signal comprising a tonal component.
FIG. 7a shows a schematic table illustrating a possible transient processing performed by a transient processing stage.
FIG. 7b shows an exemplary table illustrating a possible tonal processing that may be executed by a tonal processing stage.
FIG. 8 shows a schematic block diagram of a sound enhancing system comprising an apparatus for enhancing an audio signal.
FIG. 9a shows a schematic block diagram of the processing of an input signal according to a foreground/background processing.
FIG. 9b illustrates the separation of the input signal into a foreground signal and a background signal.
FIG. 10 shows a schematic block diagram and an apparatus configured to apply spectral weights to an input signal.
FIG. 11 shows a schematic flowchart of a method for enhancing an audio signal.
FIG. 12 illustrates an apparatus for determining a measure for a perceived level of reverberation/decorrelation in a mix signal comprising a direct signal component or dry signal component and a reverberation signal component.
FIGS. 13a to 13c show implementations of a loudness model processor.
FIG. 14 illustrates an implementation of the loudness model processor that has already been discussed in some aspects with respect to FIGS. 12, 13a, 13b and 13c.

Equal or equivalent elements, or elements with equal or equivalent functionality, are denoted in the following description by equal or equivalent reference numerals, even if they occur in different figures.

In the following description, numerous details are set forth to provide a more thorough explanation of embodiments of the present invention. However, it will be apparent to those skilled in the art that embodiments of the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form rather than in detail, in order to avoid obscuring embodiments of the present invention. In addition, features of the different embodiments described hereinafter may be combined with each other, unless specifically noted otherwise.

In the following, reference will be made to processing an audio signal. An apparatus or a component thereof may be configured to receive, provide and/or process an audio signal. The respective audio signal may be received, provided or processed in the time domain and/or the frequency domain. An audio signal representation in the time domain may be transformed into a frequency representation of the audio signal, for example, by Fourier transforms or the like. The frequency representation may be obtained, for example, by using a short-time Fourier transform (STFT), a discrete cosine transform and/or a fast Fourier transform (FFT). Alternatively or in addition, the frequency representation may be obtained by a filterbank, which may comprise quadrature mirror filters (QMF). A frequency domain representation of the audio signal may comprise a plurality of frames, each comprising a plurality of subbands, as is known from Fourier transforms. Each subband comprises a portion of the audio signal. As the time representation and the frequency representation of the audio signal can be converted into one another, the following description shall not be limited to the audio signal being either the time domain representation or the frequency domain representation.
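The frame and subband structure of such a frequency representation can be sketched with a very small short-time Fourier transform. This is an illustrative sketch only: the frame length, hop size and Hann window are assumptions, and a direct DFT is used instead of an FFT to keep the example dependency-free.

```python
import cmath
import math

def stft(x, frame_len=8, hop=4):
    """Return a list of frames; each frame is a list of complex subband bins."""
    # Periodic Hann window (an assumed, common choice).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(frame_len)]
        # Direct DFT of the windowed segment, non-negative frequencies only.
        bins = [sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len))
                for k in range(frame_len // 2 + 1)]
        frames.append(bins)
    return frames

# A tone with exactly 2 cycles per 8-sample frame.
tone = [math.cos(2 * math.pi * 2 * n / 8) for n in range(32)]
spec = stft(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: abs(spec[0][k]))
```

Each entry `spec[m][k]` is the portion of the signal in frame `m` and subband `k`; for the test tone the energy concentrates around bin 2. Per-frame, per-subband processing as described in this document would operate on these entries.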

FIG. 1 shows a schematic block diagram of an apparatus 10 for enhancing an audio signal 102. The audio signal 102 is, for example, a mono signal or a mono-like signal, such as a dual-mono signal, represented in the frequency domain or the time domain. The apparatus 10 comprises a signal processor 110, a decorrelator 120, a controller 130 and a combiner 140. The signal processor 110 is configured to receive the audio signal 102 and to process it to obtain a processed signal 112, in order to reduce or eliminate transient and tonal portions of the processed signal 112 when compared to the audio signal 102.

The decorrelator 120 is configured to receive the processed signal 112 and to generate a first decorrelated signal 122 and a second decorrelated signal 124 from the processed signal 112. The decorrelator 120 may be configured to generate the first decorrelated signal 122 and the second decorrelated signal 124 at least partially by reverberating the processed signal 112. The first decorrelated signal 122 and the second decorrelated signal 124 may comprise different time delays for the reverberation, so that the first decorrelated signal 122 comprises a shorter or longer time delay (reverberation time) than the second decorrelated signal 124. The first or second decorrelated signal 122 or 124 may also be processed without a delay or reverberation filter.
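The effect of giving the two decorrelated signals different time delays can be sketched with plain delays in place of a reverberation. Everything specific here is an assumption for illustration: the delay values, the noise-like stand-in for the processed signal 112 and the zero-lag correlation measure.

```python
import math
import random

def delayed(x, delay):
    """Return x delayed by `delay` samples, zero-padded at the start."""
    return [0.0] * delay + x[:len(x) - delay]

def correlation(u, v):
    """Normalised zero-lag correlation of two equal-length sequences."""
    num = sum(p * q for p, q in zip(u, v))
    den = math.sqrt(sum(p * p for p in u) * sum(q * q for q in v))
    return num / den

rng = random.Random(0)  # fixed seed for repeatability
# Noise-like stand-in for the processed signal 112 (hypothetical data).
processed = [rng.gauss(0.0, 1.0) for _ in range(4000)]

r1 = delayed(processed, 7)    # hypothetical delay for signal 122
r2 = delayed(processed, 13)   # hypothetical, different delay for signal 124

corr_same = correlation(r1, r1)
corr_r1_r2 = correlation(r1, r2)
```

For noise-like input the two differently delayed copies are almost uncorrelated at zero lag, while each is perfectly correlated with itself. A pure delay would not have this effect on a steady tone, so this sketch should not be read as a general-purpose decorrelator.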

The decorrelator 120 is configured to provide the first decorrelated signal 122 and the second decorrelated signal 124 to the combiner 140. The controller 130 is configured to receive the audio signal 102 and to control time-variant weighting factors a and b by analyzing the audio signal 102, so that different portions of the audio signal 102 are multiplied by different weighting factors a or b. To this end, the controller 130 comprises a controlling unit 132 configured to determine the weighting factors a and b. The controller 130 may be configured to operate in the frequency domain. The controlling unit 132 may be configured to transform the audio signal 102 into the frequency domain by using a short-time Fourier transform (STFT), a fast Fourier transform (FFT) and/or a regular Fourier transform (FT). A frequency domain representation of the audio signal 102 may comprise a plurality of subbands, as is known from Fourier transforms. Each subband comprises a portion of the audio signal. Alternatively, the audio signal 102 may already be a representation of a signal in the frequency domain. The controlling unit 132 may be configured to control and/or determine a pair of weighting factors a and b for each subband of the digital representation of the audio signal.

The combiner is configured to weightedly combine the first decorrelated signal 122, the second decorrelated signal 124 and a signal 136 derived from the audio signal 102, using the weighting factors a and b. The signal 136 derived from the audio signal 102 may be provided by the controller 130. To this end, the controller 130 may comprise an optional deriving unit 134. The deriving unit 134 may be configured, for example, to adapt, modify or enhance portions of the audio signal 102. In particular, the deriving unit 134 may be configured to amplify portions of the audio signal 102 that are attenuated, reduced or eliminated by the signal processor 110.

The signal processor 110 may also be configured to operate in the frequency domain and to process the audio signal 102 such that transient and tonal portions are reduced or eliminated for each subband of a spectrum of the audio signal 102. This may lead to little or even no processing for subbands comprising few or no transient portions or few or no tonal portions (i.e., noisy subbands). Alternatively, the combiner 140 may receive the audio signal 102 instead of the derived signal, i.e., the controller 130 can be implemented without the deriving unit 134. The signal 136 may then be equal to the audio signal 102.

The combiner 140 is then configured to receive a weighting signal 138 comprising the weighting factors a and b. The combiner 140 is further configured to obtain an output audio signal 142 comprising a first channel y1 and a second channel y2, i.e., the audio signal 142 is a two-channel audio signal.

The signal processor 110, the decorrelator 120, the controller 130 and the combiner 140 may be configured to process the audio signal 102, the signal 136 derived therefrom and/or the processed signals 112, 122 and/or 124 frame-wise and subband-wise, such that they execute the operations described above for each frequency band by processing one or more frequency bands (portions of the signal) at a time.

FIG. 2 shows a schematic block diagram of an apparatus 200 for enhancing the audio signal 102. The apparatus 200 comprises a signal processor 210, the decorrelator 120, a controller 230 and a combiner 240. The decorrelator 120 is configured to generate the first decorrelated signal 122, denoted r1, and the second decorrelated signal 124, denoted r2.

The signal processor 210 comprises a transient processing stage 211, a tonal processing stage 213 and a combining stage 215. The signal processor 210 is configured to process a representation of the audio signal 102 in the frequency domain. The frequency-domain representation of the audio signal 102 comprises a plurality of subbands (frequency bands), and the transient processing stage 211 and the tonal processing stage 213 are configured to process each of the frequency bands. Alternatively, the spectrum obtained by frequency conversion of the audio signal 102 may be reduced, i.e., cut, to exclude certain frequency ranges or frequency bands from further processing, for example frequency bands below 20 Hz, 50 Hz or 100 Hz and/or above 16 kHz, 18 kHz or 22 kHz. This may reduce the computational effort and thus allow faster and/or more precise processing.

The transient processing stage 211 is configured to determine, for each of the processed frequency bands, whether the frequency band comprises transient portions. The tonal processing stage 213 is configured to determine, for each of the frequency bands, whether the audio signal 102 comprises tonal portions in that frequency band. The transient processing stage 211 is configured to determine spectral weighting factors 217 at least for the frequency bands comprising transient portions, each spectral weighting factor 217 being associated with a respective frequency band. As described with reference to FIGS. 6A and 6B, transient and tonal characteristics may be identified by spectral processing. The level of transiency and/or tonality may be measured by the transient processing stage 211 and/or the tonal processing stage 213 and converted into a spectral weight. The tonal processing stage 213 is configured to determine spectral weighting factors 219 at least for the frequency bands comprising tonal portions. The spectral weighting factors 217 and 219 may take one of a multitude of possible values, and the magnitude of the spectral weighting factors 217 and/or 219 indicates the amount of transient and/or tonal portions in the frequency band.

The spectral weighting factors 217 and 219 may comprise an absolute value or a relative value. For example, the absolute value may comprise a value of the energy of transient and/or tonal sound in the frequency band. Alternatively, the spectral weighting factors 217 and/or 219 may comprise a relative value such as a value between 0 and 1, where the value 0 indicates that the frequency band comprises no or almost no transient or tonal portions, and the value 1 indicates a frequency band comprising a large amount of, or exclusively, transient and/or tonal portions. The spectral weighting factors may take one of a multitude of values, such as 3, 5, 10 or more values (steps), for example (0, 0.3 and 1), (0.1, 0.2, ..., 1) or the like. The size of the scale, i.e., the number of steps between the minimum value and the maximum value, may be at least zero, but is preferably at least one and more preferably at least five. Preferably, the multitude of values of the spectral weights 217 and 219 comprises at least three values: a minimum value, a maximum value and a value between the minimum and the maximum value. A larger number of values between the minimum and the maximum value allows a more continuous weighting of each of the frequency bands. The minimum and the maximum value may be scaled to a scale between 0 and 1 or to other values. The maximum value may indicate either the highest or the lowest level of transiency and/or tonality.

The combining stage 215 is configured to combine the spectral weights for each of the frequency bands, as described later. The signal processor 210 is configured to apply the combined spectral weights to each of the frequency bands. For example, the spectral weights 217 and/or 219, or a value derived therefrom, may be multiplied by the spectral values of the audio signal 102 in the processed frequency band.
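The application of combined spectral weights can be sketched as follows. This is only an illustration: the function name `apply_spectral_weights` is hypothetical, and combining the transient weight 217 and the tonal weight 219 by multiplication is an assumption made for this sketch, since the combination rule is described elsewhere in the document.

```python
import numpy as np

def apply_spectral_weights(spectrum, w_transient, w_tonal):
    """Weight the spectral values of one frame, band by band.

    spectrum    : complex spectral values, one per frequency band
    w_transient : spectral weights 217 in [0, 1], one per band
    w_tonal     : spectral weights 219 in [0, 1], one per band
    """
    spectrum = np.asarray(spectrum, dtype=complex)
    # Assumption: the two weights are combined by multiplication.
    combined = np.asarray(w_transient, dtype=float) * np.asarray(w_tonal, dtype=float)
    # Multiply the combined weight with the spectral value of each band.
    return combined * spectrum

X = np.array([1 + 1j, 2 + 0j, 0.5j, 4.0])      # example frame with four bands
w_217 = np.array([1.0, 0.3, 1.0, 0.0])          # low weight = strongly transient band
w_219 = np.array([1.0, 1.0, 0.5, 1.0])          # low weight = strongly tonal band
S = apply_spectral_weights(X, w_217, w_219)
```

A band with both weights equal to 1 passes unchanged, while a band with a weight of 0 is removed from the processed signal.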

The controller 230 is configured to receive the spectral weighting factors 217 and 219, or information relating to them, from the signal processor 210. The derived information may be, for example, an index number of a table, the index number being associated with the spectral weighting factors. The controller is configured to enhance the audio signal 102 with respect to coherent signal portions, i.e., portions that are not, or only partially, reduced or eliminated by the transient processing stage 211 and/or the tonal processing stage 213. Put simply, the deriving unit 234 may amplify the portions that are not reduced or eliminated by the signal processor 210.

The deriving unit 234 is configured to provide a signal 236, denoted z, derived from the audio signal 102. The combiner 240 is configured to receive the signal z (236). The decorrelator 120 is configured to receive a processed signal 212, denoted s, from the signal processor 210.

The combiner 240 is configured to combine the decorrelated signals r1 and r2 with the weighting factors (scaling factors) a and b to obtain a first channel signal y1 and a second channel signal y2. The signal channels y1 and y2 may be combined into an output signal 242 or may be output separately.

In other words, the output signal 242 is a combination of a (typically) correlated signal z (236) and a decorrelated signal r (r1 or r2, respectively). The decorrelated signal is obtained in two steps: first, suppression (reduction or elimination) of transient and tonal signal components, and second, decorrelation. The suppression of transient and tonal signal components is performed by means of spectral weighting. The signal is processed frame-wise in the frequency domain. The spectral weights are computed for each frequency bin (frequency band) and time frame. Thus, the audio signal is processed full-band, i.e., all portions under consideration are processed.

The input signal of the processing may be a single-channel signal x (102), and the output signal may be a two-channel signal y = [y1, y2], where the indices denote the first and the second channel, for example the left and the right channel of a stereo signal. The output signal y may be computed by linearly combining the two-channel signal r = [r1, r2] with the single-channel signal z using the scaling factors a and b according to

y1 = a × z + b × r1    (1)

y2 = a × z + b × r2    (2)

where "×" denotes the multiplication operator in equations (1) and (2).

Equations (1) and (2) are to be interpreted qualitatively: they indicate that the shares of the signals z, r1 and r2 can be controlled (varied) by varying the weighting factors. The same or equivalent results may be obtained by performing different operations, for example by forming inverse operations such as dividing by the reciprocal value. Alternatively or additionally, a lookup table containing values for the scaling factors a and b and/or for y1 and/or y2 may be used to obtain the two-channel signal y.
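A minimal sketch of the weighted combination, assuming the form y1 = a × z + b × r1 and y2 = a × z + b × r2, in which a scales the correlated signal z and b scales the decorrelated signals. The function name `combine` and the example values are hypothetical and serve only as an illustration.

```python
import numpy as np

def combine(z, r1, r2, a, b):
    """Return the two-channel signal y = [y1, y2] as an array of shape (2, N)."""
    z, r1, r2 = (np.asarray(v, dtype=float) for v in (z, r1, r2))
    y1 = a * z + b * r1   # first channel
    y2 = a * z + b * r2   # second channel
    return np.stack([y1, y2])

z = np.array([1.0, 0.5, -0.5])     # signal derived from the audio signal
r1 = np.array([0.2, -0.1, 0.0])    # first decorrelated signal
r2 = np.array([-0.2, 0.1, 0.3])    # second decorrelated signal
y = combine(z, r1, r2, a=0.8, b=0.2)
```

With b = 0 both channels are identical (dual-mono); increasing b raises the share of the decorrelated signals and therefore the difference between the channels.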

The scaling factors a and/or b may be computed such that they decrease monotonically with the perceived intensity of correlation. A scalar value predicted for the perceived intensity may be used to control the scaling factors.

The decorrelated signal r comprising r1 and r2 may be computed in two steps. First, the attenuation of transient and tonal signal components yields the signal s. Second, the decorrelation of the signal s may be performed.

The attenuation of transient signal components and of tonal signal components is performed, for example, by means of spectral weighting. The signal is processed frame-wise in the frequency domain. The spectral weights are computed for each frequency bin and time frame. The attenuation serves two purposes.

1. Transient or tonal signal components typically belong to so-called foreground signals, and their position within the stereo image is therefore often the center.

2. Decorrelating signals with strong transient signal components results in perceivable artifacts. Decorrelating signals with strong tonal signal components also results in perceivable artifacts when the tonal components (i.e., sinusoids) are frequency modulated, at least when the frequency modulation is slow enough to be perceived as a change of frequency rather than as a change of timbre caused by the enrichment of the signal spectrum with (possibly inharmonic) overtones.

The correlated signal z may be obtained by applying processing that enhances transient and tonal signal components, for example, qualitatively, the inverse of the suppression used for computing the signal s. Alternatively, the unprocessed input signal may, for example, be used as it is. Note that there may be cases in which z is also a two-channel signal. In fact, many storage media (e.g., the compact disc) use two channels even when the signal is mono. A signal with two identical channels is referred to as "dual-mono". There may also be cases in which the input signal z is a stereo signal and the purpose of the processing is to increase the stereo effect.

The perceived intensity of decorrelation may be predicted in a manner similar to the perceived intensity of late reverberation, which is predicted using computational models of loudness as described in EP 2 541 542 A1.

FIG. 3 shows an exemplary table illustrating the computation of the scaling factors (weighting factors) a and b based on the level of the predicted perceived intensity of decorrelation.

For example, the perceived intensity of decorrelation may be predicted as a scalar value that can vary between a value of 0, indicating a low level of perceived decorrelation or none at all, and a value of 10, indicating a high level of decorrelation. The levels may be determined, for example, based on listener tests or on predictive simulations. Alternatively, the value of the decorrelation level may cover a range between a minimum value and a maximum value. The value of the perceived decorrelation level may be configured to take more values than only the minimum and the maximum value. Preferably, the perceived decorrelation level can take at least three different values and more preferably at least seven different values.

The weighting factors a and b to be applied based on the determined level of perceived decorrelation may be stored in a memory that is accessible to the controller 130 or 230. With increasing levels of perceived decorrelation, the scaling factor a, by which the combiner multiplies the audio signal or the signal derived therefrom, may also increase. An increased level of perceived decorrelation may be interpreted as "the signal is already (partially) decorrelated", so that with increasing levels of decorrelation the audio signal or the signal derived therefrom has a higher share in the output signal 142 or 242. With increasing levels of decorrelation, the weighting factor b is configured to decrease, i.e., the signals r1 and r2 generated by the decorrelator from the output signal of the signal processor may have a lower share when combined in the combiner 140 or 240.

The weighting factor a is shown as comprising a scalar value of at least 1 (minimum value) and at most 9 (maximum value). The weighting factor b is shown as comprising a scalar value in a range from a minimum value of 2 to a maximum value of 8. However, both weighting factors a and b may take any value within a range bounded by a minimum and a maximum value, preferably including at least one value between the minimum and the maximum. As an alternative to the values of the weighting factors a and b shown in FIG. 3, the weighting factor a may increase linearly with an increasing level of perceived decorrelation. Alternatively or additionally, the weighting factor b may decrease linearly with an increasing level of perceived decorrelation. In addition, for a given level of perceived decorrelation, the sum of the weighting factors a and b determined for a frame may be constant or almost constant. For example, with an increasing level of perceived decorrelation, the weighting factor a may increase from 0 to 10 and the weighting factor b may decrease from 10 to 0. If both weighting factors increase or decrease linearly, for example with a step size of 1, the sum of the weighting factors a and b may be 10 for each level of perceived decorrelation. The weighting factors a and b to be applied may be determined by simulation or by experiment.
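The linear alternative can be sketched as a small lookup table. This is an illustration only: the function name `linear_weights` is hypothetical, and the values are those of the linear example in the text (a from 0 to 10, b from 10 to 0, step size 1), not the values of FIG. 3.

```python
def linear_weights(level, max_level=10):
    """Map a perceived decorrelation level to the weighting factors (a, b).

    a increases linearly with the level and b decreases linearly,
    so that a + b stays constant at max_level.
    """
    if not 0 <= level <= max_level:
        raise ValueError("level must lie between 0 and max_level")
    a = level               # share of the signal derived from the audio signal
    b = max_level - level   # share of the decorrelated signals
    return a, b

# Lookup table: one row (level, a, b) per integer level from 0 to 10.
table = [(level, *linear_weights(level)) for level in range(11)]
```

A controller could store such a table in memory and read the pair (a, b) for each predicted level.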

FIG. 4A shows a schematic flow diagram of a part of a method 400 that may be executed, for example, by the controller 130 and/or 230. In a step 410, the controller is configured to determine a measure for the perceived level of decorrelation, yielding, for example, a scalar value as shown in FIG. 3. In a step 420, the controller is configured to compare the determined measure to a threshold. If the measure is higher than the threshold, the controller is configured to modify or adapt the weighting factors a and/or b in a step 430. In step 430, the controller is configured to decrease the weighting factor b, to increase the weighting factor a, or to decrease the weighting factor b and increase the weighting factor a, each with respect to a reference value for a and b. The threshold may vary, for example, across the frequency bands of the audio signal. For example, the threshold may have a low value for frequency bands comprising a prominent sound source signal, indicating that a low level of decorrelation is preferred or targeted. Alternatively or additionally, the threshold may have a high value for frequency bands comprising a non-prominent sound source signal, indicating that a high level of decorrelation is preferred.

The aim may be to increase the decorrelation of frequency bands comprising non-prominent sound source signals and to limit the decorrelation of frequency bands comprising prominent sound source signals. The threshold may be, for example, 20 %, 50 % or 70 % of the range of values that the weighting factors a and/or b can take. For example, with reference to FIG. 3, the threshold may be lower than 7, lower than 5 or lower than 3 for a frequency frame comprising a prominent sound source signal. If the perceived level of decorrelation is too high, it may be reduced by executing step 430. The weighting factors a and b may be varied individually or both at once. The table shown in FIG. 3 may, for example, contain initial values for the weighting factors a and/or b that are to be adapted by the controller.

FIG. 4B shows a schematic flow diagram of further steps of the method 400, depicting the case in which the measure for the perceived level of decorrelation (determined in step 410) is compared to the threshold and is lower than the threshold (step 440). The controller is configured to increase b, to decrease a, or to increase b and decrease a, each with respect to a reference for a and b, in order to increase the perceived level of decorrelation such that the measure attains a value of at least the threshold.

Alternatively or additionally, the controller may be configured to scale the weighting factors a and b such that the perceived level of decorrelation of the two-channel audio signal remains within a range around a target value. The target value may be, for example, the threshold, and the threshold may vary based on the type of signal contained in the frequency band for which the weighting factors and/or the spectral weights are determined. The range around the target value may extend to ± 20 %, ± 10 % or ± 5 % of the target value. This allows the adaptation of the weighting factors to stop once the perceived decorrelation is approximately at the target value (threshold).
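One adaptation step of the controller, covering the cases of FIGS. 4A and 4B and the tolerance range around the target value, might look like the sketch below. The function name `adapt_weights`, the step size, and the clamping of a and b to the range 0 to 10 are assumptions made for illustration; the document does not specify them.

```python
def adapt_weights(a, b, measure, target, tolerance=0.1, step=1.0, lo=0.0, hi=10.0):
    """Adapt the weighting factors a and b for one frame and frequency band.

    measure   : measure for the perceived level of decorrelation (step 410)
    target    : target value (threshold) for that measure
    tolerance : relative half-width of the range around the target (0.1 = ±10 %)
    """
    lower = target * (1.0 - tolerance)
    upper = target * (1.0 + tolerance)
    if measure > upper:
        # Too much perceived decorrelation: raise a, lower b (step 430).
        a, b = a + step, b - step
    elif measure < lower:
        # Too little perceived decorrelation: lower a, raise b (FIG. 4B).
        a, b = a - step, b + step
    # Within the range around the target, the factors are left unchanged.
    a = min(max(a, lo), hi)   # assumed clamping to the valid range
    b = min(max(b, lo), hi)
    return a, b

a, b = adapt_weights(5, 5, measure=8, target=5)   # measure above range
```

Calling the function once per frame moves the weighting factors step by step until the measure settles inside the tolerance range.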

FIG. 5 shows a schematic block diagram of a decorrelator 520 that may be configured to operate as the decorrelator 120. The decorrelator 520 comprises a first decorrelation filter 526 and a second decorrelation filter 528. Both the first decorrelation filter 526 and the second decorrelation filter 528 are configured to receive the processed signal s (512), for example from the signal processor. The decorrelator 520 is configured to combine the processed signal 512 with an output signal 523 of the first decorrelation filter 526 to obtain the first decorrelated signal 522 (r1), and with an output signal 525 of the second decorrelation filter 528 to obtain the second decorrelated signal 524 (r2). To combine the signals, the decorrelator 520 may be configured to convolve signals with impulse responses and/or to multiply spectral values by real and/or imaginary values. Alternatively or additionally, other operations such as division, summation, subtraction or the like may be executed.

The decorrelation filters 526 and 528 may be configured to reverberate or delay the processed signal 512. The decorrelation filters 526 and 528 may comprise a finite impulse response (FIR) filter and/or an infinite impulse response (IIR) filter. For example, the decorrelation filters 526 and 528 may be configured to convolve the processed signal 512 with an impulse response obtained from a noise signal that decays, for example exponentially, over time and/or frequency. This allows generating decorrelated signals 523 and/or 525 comprising a reverberation of the signal 512. The reverberation time of the reverberated signal may be, for example, between 50 and 1000 ms, between 80 and 500 ms and/or between 120 and 200 ms. The reverberation time may be understood as the duration it takes for the power of the reverberation to decay to a small value after excitation by an impulse, for example to 60 dB below the initial power. Preferably, the decorrelation filters 526 and 528 comprise IIR filters. This allows the amount of computation to be reduced when at least some of the filter coefficients are set to zero, because computations for those (zero) coefficients can be skipped. Optionally, a decorrelation filter may comprise more than one filter, the filters being connected in series and/or in parallel.
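A minimal sketch of one such filter pair, assuming FIR filters whose impulse responses are exponentially decaying white noise. The function names, the sampling rate of 16 kHz, the reverberation time of 150 ms and the random seeds are illustrative choices, not values prescribed by the document.

```python
import numpy as np

def decaying_noise_ir(rt60_s, sample_rate, seed):
    """Impulse response from white noise with an exponential decay.

    The amplitude envelope reaches -60 dB (a factor of 1e-3) at t = rt60_s.
    """
    n = int(round(rt60_s * sample_rate))
    t = np.arange(n) / sample_rate
    envelope = 10.0 ** (-3.0 * t / rt60_s)
    noise = np.random.default_rng(seed).standard_normal(n)
    ir = noise * envelope
    return ir / np.linalg.norm(ir)   # normalize to unit energy

def decorrelate(s, sample_rate=16000, rt60_s=0.15):
    """Return two decorrelated signals r1 and r2 for the processed signal s."""
    h1 = decaying_noise_ir(rt60_s, sample_rate, seed=1)   # first filter
    h2 = decaying_noise_ir(rt60_s, sample_rate, seed=2)   # second filter
    return np.convolve(s, h1), np.convolve(s, h2)

s = np.random.default_rng(0).standard_normal(16000)   # 1 s of test noise
r1, r2 = decorrelate(s)
```

Because the two impulse responses are drawn from independent noise sequences, r1 and r2 are only weakly correlated with each other although both are derived from the same signal s.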

In other words, reverberation has a decorrelating effect. The decorrelator may be configured not only to decorrelate but also to alter the sonority only slightly. Technically, reverberation may be regarded as a linear time-invariant (LTI) system that can be characterized by its impulse response. For reverberation, the length of the impulse response is often stated as RT60, i.e., the time after which the impulse response has decayed by 60 dB. Reverberation may have a length of up to one second or even up to several seconds. The decorrelator may be implemented with a structure similar to that of a reverberator but with different settings for the parameters that affect the length of the impulse response.

FIG. 6A shows a schematic diagram of a spectrum of an audio signal 602a comprising at least one transient (short-term) signal portion. A transient signal portion results in a broadband spectrum. The spectrum is depicted as magnitudes S(f) over frequencies f and is subdivided into a plurality of frequency bands b1-3. The transient signal portion may be determined in one or more of the frequency bands b1-3.

FIG. 6B shows a schematic spectrum of an audio signal 602b comprising a tonal component. The example spectrum is depicted in seven frequency bands fb1-7. The frequency band fb4 is arranged in the center of the frequency bands fb1-7 and has the maximum magnitude S(f) compared with the other frequency bands fb1-3 and fb5-7. Frequency bands at increasing distance from the center frequency (frequency band fb4) comprise harmonic repetitions of the tonal signal with decreasing magnitudes. The signal processor may be configured to determine the tonal component, for example, by evaluating the magnitude S(f). An increasing magnitude S(f) of a tonal component may be taken into account by the signal processor through decreasing spectral weighting factors. Thus, the higher the share of transient and/or tonal components within a frequency band, the less that frequency band may contribute to the processed signal of the signal processor. For example, the spectral weight for the frequency band fb4 may have a value of zero or close to zero, or another value indicating that the frequency band fb4 is to be considered with a low share.

FIG. 7A shows a schematic table illustrating a possible transient processing 211 performed by a signal processor such as the signal processor 110 and/or 210. The signal processor is configured to determine an amount, e.g., a share, of transient components in each of the considered frequency bands of the frequency-domain representation of the audio signal. The evaluation may comprise determining the amount of transient components as a value between at least a minimum value (e.g., 1) and at most a maximum value (e.g., 15), where a higher value may indicate a larger amount of transient components within the frequency band. The larger the amount of transient components in the frequency band, the lower the respective spectral weight, for example the spectral weight 217, may be. For example, the spectral weight may take a value between a minimum value such as 0 and a maximum value such as 1. The spectral weight may take a plurality of values between the minimum and the maximum value, and it may indicate the factor with which the frequency band is considered in later processing. For example, a spectral weight of 0 may indicate that the frequency band is to be attenuated completely. Alternatively, other scaling ranges may be implemented, i.e., the table shown in FIG. 7A may be scaled and/or converted into tables with other step sizes for the evaluation of a frequency band as being a transient frequency band and/or for the spectral weight. The spectral weight may even vary continuously.

FIG. 7b shows an exemplary table illustrating a possible tonal processing as it may be executed, for example, by the tonal processing stage 213. The larger the amount of tonal components within a frequency band, the lower the respective spectral weight 219 may be. For example, the amount of tonal components in the frequency band may be scaled between a minimum value of 1 and a maximum value of 8, where the minimum value indicates that no or almost no tonal components are contained in the frequency band. The maximum value may indicate that the frequency band comprises a large amount of tonal components. The respective spectral weight, such as the spectral weight 219, may also comprise a minimum value and a maximum value. The minimum value (for example, 0.1) may indicate that the frequency band is attenuated almost completely or completely. The maximum value may indicate that the frequency band is almost unattenuated or completely unattenuated. The spectral weight 219 may take one of a plurality of values including the minimum value, the maximum value and preferably at least one value between the minimum and the maximum value. Alternatively, the spectral weight may decrease for a decreased share of tonal frequency bands such that the spectral weight is a consideration factor.

The signal processor may be configured to combine the spectral weight for the transient processing and/or the spectral weight for the tonal processing with the spectral values of the frequency band, as described for the signal processor 210. For example, for a processed frequency band, a mean value of the spectral weights 217 and/or 219 may be determined by the combining stage 215. The spectral weights of the frequency band may be combined with, for example multiplied by, the spectral values of the audio signal 102. Alternatively, the combining stage may be configured to compare both spectral weights 217 and 219, and/or to select the lower or the higher of the two, and to combine the selected spectral weight with the spectral values. Alternatively, the spectral weights may be combined differently, for example as a sum, a difference, a quotient or a factor.
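The combination options named above (mean value, selection of the lower or higher weight, or a product) can be sketched as follows. This is only an illustrative sketch: the function names `combine_weights` and `apply_weights`, the `mode` parameter and all numeric values are assumptions made for the example and do not appear in the description.

```python
from statistics import mean


def combine_weights(w_transient, w_tonal, mode="mean"):
    """Combine per-band transient and tonal spectral weights (hypothetical helper)."""
    combiners = {
        "mean": lambda a, b: mean((a, b)),   # average of both weights
        "min": min,                          # select the lower weight
        "max": max,                          # select the higher weight
        "product": lambda a, b: a * b,       # combine as a factor
    }
    combine = combiners[mode]
    return [combine(a, b) for a, b in zip(w_transient, w_tonal)]


def apply_weights(spectral_values, weights):
    """Multiply each band's (possibly complex) spectral value by its weight."""
    return [g * x for g, x in zip(weights, spectral_values)]


# Made-up weights for three frequency bands.
w217 = [1.0, 0.5, 0.0]   # transient weights (assumed values)
w219 = [1.0, 0.1, 0.4]   # tonal weights (assumed values)
bands = [2 + 0j, 4 + 4j, 1 - 1j]

selected = combine_weights(w217, w219, mode="min")
weighted = apply_weights(bands, selected)
```

Selecting the lower weight attenuates a band as soon as either analysis flags it, whereas the mean value yields a softer compromise between the two analyses.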

The characteristics of an audio signal may vary over time. For example, a radio broadcast signal may first comprise a speech signal (a prominent sound source signal) and afterwards a music signal (a non-prominent sound source signal), or vice versa. Variations within a speech signal and/or a music signal may also occur. This may lead to rapid changes of the spectral weights and/or weighting factors. The signal processor and/or the controller may be configured to additionally adapt the spectral weights and/or the weighting factors so as to reduce or limit variations between two frames, for example, by limiting a maximum step size between two signal frames. One or more frames of the audio signal may be grouped into a time period, and the signal processor and/or the controller may be configured to compare the spectral weights and/or weighting factors of a previous time period, e.g., one or more previous frames, with those determined for the current time period, and to determine whether their difference exceeds a threshold value. The threshold value may represent, for example, a value that leads to annoying effects for the listener. The signal processor and/or the controller may be configured to limit the variations such that such annoying effects are reduced or prevented. Alternatively, instead of the difference, other mathematical expressions such as a ratio may be determined for comparing the spectral weights and/or the weighting factors of the previous and the current time period.
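Limiting the maximum step size between two frames can be sketched as a simple clamp on the frame-to-frame difference. The function name `limit_step`, the threshold of 0.25 and the weight values are assumptions chosen for illustration only.

```python
def limit_step(previous, current, max_step):
    """Clamp each weight so it differs from the previous frame's value by at most max_step."""
    limited = []
    for prev, cur in zip(previous, current):
        delta = cur - prev
        if abs(delta) > max_step:
            # The change exceeds the threshold: move only by max_step in the same direction.
            delta = max_step if delta > 0 else -max_step
        limited.append(prev + delta)
    return limited


prev_weights = [1.0, 0.2, 0.5]   # weights of the previous frame (assumed values)
new_weights = [0.0, 0.3, 0.9]    # weights determined for the current frame
smoothed = limit_step(prev_weights, new_weights, max_step=0.25)
```

In this sketch the first and third bands are held back to 0.75, while the small change in the second band passes unchanged.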

In other words, a feature comprising the amount of tonal and/or transient characteristics is assigned to each frequency band.

FIG. 8 shows a schematic block diagram of a sound enhancing system 800 comprising an apparatus 801 for enhancing the audio signal 102. The sound enhancing system 800 comprises a signal input 106 configured to receive the audio signal and to provide the audio signal to the apparatus 801. The sound enhancing system 800 comprises two loudspeakers 808a and 808b. The loudspeaker 808a is configured to receive the signal y1. The loudspeaker 808b is configured to receive the signal y2, such that the signals y1 and y2 can be converted into sound waves by means of the loudspeakers 808a and 808b. The signal input 106 may be a wired or wireless signal input, for example, a radio antenna. The apparatus 801 may be, for example, the apparatus 100 and/or 200.

The correlated signal z may be obtained by applying a processing that enhances transient and tonal components (qualitatively the inverse of the suppression used for computing the signal s). The combination performed by the combiner may be expressed linearly as y (y1/y2) = scaling factor 1 · z + scaling factor 2 · scaling factor (r1/r2). The scaling factors may be obtained by predicting the perceived intensity of the decorrelation.
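The linear combination can be illustrated with a short sketch. It assumes that the last term of the expression refers to the decorrelated signals r1 and r2 and that one pair of scaling factors, here called `a` and `b`, is shared by both output channels; neither assumption is stated explicitly in the text, and all signal values are made up.

```python
def combine_channels(z, r1, r2, a, b):
    """Form y1 = a*z + b*r1 and y2 = a*z + b*r2 sample by sample (assumed reading)."""
    y1 = [a * zi + b * ri for zi, ri in zip(z, r1)]
    y2 = [a * zi + b * ri for zi, ri in zip(z, r2)]
    return y1, y2


z = [1.0, 2.0]      # correlated signal (made-up samples)
r1 = [0.5, -0.5]    # first decorrelated signal
r2 = [-0.5, 0.5]    # second decorrelated signal
y1, y2 = combine_channels(z, r1, r2, a=0.8, b=0.2)
```

A larger `b` relative to `a` increases the share of decorrelated content and thus the difference between the two channels.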

Alternatively, the signals y1 and/or y2 may be further processed before being received by the loudspeakers 808a and/or 808b. For example, the signals y1 and/or y2 may be amplified, equalized or the like, such that a signal or signals derived by processing the signals y1 and/or y2 are provided to the loudspeakers 808a and/or 808b.

Artificial reverberation added to the audio signal may be implemented such that the level of the reverberation is audible but not too loud (intense). Levels that are audible or annoying may be determined in tests and/or simulations. A level that is too high does not sound good because clarity suffers, percussive sounds are smeared in time, and so on. The target level may depend on the input signal. When the input signal comprises a small amount of transients and a small amount of tones with frequency modulations, the reverberation is audible to a lower degree and the level may be increased. The same applies to decorrelation, since a decorrelator may be based on a similar working principle. Thus, the optimal intensity of the decorrelator may depend on the input signal. The computation may be the same, with modified parameters. The decorrelation executed in the signal processor and in the controller may be performed by two decorrelators that may be structurally identical but are operated with different sets of parameters. The decorrelation processing is not limited to two-channel stereo signals but may also be applied to signals with more than two channels. The decorrelation may be quantified with correlation metrics, which may comprise up to all values for the decorrelation of all pairs of signals.

The finding of the invented method is to generate spatial cues and to introduce them into the signal such that the processed signal creates the sensation of a stereo signal. The processing may be regarded as designed according to the following criteria.

1. Direct sound sources having high intensity (or loudness level) are localized in the center. These are prominent direct sound sources, for example, a singer or a loud instrument in a musical recording.

2. Ambient sounds are perceived as diffuse.

3. Diffuseness is added to direct sound sources having low intensity (i.e., low loudness levels), possibly to a smaller extent than to ambient sounds.

4. The processing should sound natural and should not introduce artifacts.

The design criteria are consistent with common practice in the production of audio recordings and with the signal characteristics of stereo signals.

1. Prominent direct sounds are typically panned to the center, i.e., they are mixed with negligible ICLD and ICTD. These signals exhibit high coherence.

2. Ambient sounds exhibit low coherence.

3. When recording multiple direct sources in a reverberant environment, e.g., opera singers with an orchestra, the amount of diffuseness of each direct sound is related to the distance of the source to the microphones, since the ratio between the direct signal and the reverberation decreases as the distance to the microphone increases. Sounds captured with low intensity are therefore typically less coherent (or, conversely, more diffuse) than the prominent direct sounds.

The processing generates the spatial information by means of decorrelation. In other words, the ICC of the input signals is decreased. Only in extreme cases does the decorrelation lead to completely uncorrelated signals. Typically, a partial decorrelation is achieved and desired. The processing does not manipulate the directional cues (i.e., ICLD and ICTD). The reason for this restriction is that no information about the original or intended position of the direct sound sources is available.

According to the design criteria above, the decorrelation is applied selectively to the signal components of the mixture signal as follows:

1. No or little decorrelation is applied to the signal components discussed in design criterion 1.

2. Decorrelation is applied to the signal components discussed in design criterion 2. This decorrelation contributes largely to the perceived width of the mixture signal obtained at the output of the processing.

3. Decorrelation is applied to the signal components discussed in design criterion 3, but to a lesser extent than to the signal components discussed in design criterion 2.

This processing is illustrated using a signal model in which the input signal x is expressed as an additive mixture of a foreground signal x_a and a background signal x_b, i.e., x = x_a + x_b. The foreground signal comprises all signal components discussed in design criterion 1. The background signal comprises all signal components discussed in design criterion 2. The signal components discussed in design criterion 3 are not exclusively assigned to either of the separated signal components but are partially contained in both the foreground signal and the background signal.

The output signal y is computed as y = y_a + y_b, where y_b is computed by decorrelating x_b, and y_a is either set to y_a = x_a or computed by decorrelating x_a. In other words, the background signal is processed by means of decorrelation, and the foreground signal is either not processed by means of decorrelation or is decorrelated to a lesser extent than the background signal. FIG. 9b illustrates this processing.

This approach does more than meet the design criteria above. An additional advantage is that the foreground signal can be prone to undesired coloration when decorrelation is applied, whereas the background can be decorrelated without introducing such audible artifacts. The described processing therefore yields better sound quality than processing that applies decorrelation uniformly to all signal components of the mixture.

So far, the input signal has been decomposed into two signals, denoted "foreground signal" and "background signal", which are processed separately and combined into the output signal. It should be noted that equivalent methods following the same rationale are feasible.

The signal decomposition is not necessarily a processing that outputs audio signals, i.e., signals that resemble the shape of a waveform over time. Instead, the signal decomposition may yield any other signal representation that can be used as the input to the decorrelation processing and subsequently be transformed into a waveform signal. An example of such a signal representation is a spectrogram computed by means of a short-time Fourier transform. In general, invertible and linear transforms yield appropriate signal representations.

Alternatively, the spatial cues are generated selectively without a preceding signal decomposition, by generating the stereo information based on the input signal x. The derived stereo information is weighted with time-variant and frequency-selective values and combined with the input signal. The time-variant and frequency-selective weighting factors are computed such that they are large in time-frequency regions where the background signal is dominant and small in time-frequency regions where the foreground signal is dominant. This can be formalized by quantifying the time-variant and frequency-selective ratio of the background signal and the foreground signal. The weighting factors can be computed from the background-to-foreground ratio, e.g., by means of monotonically increasing functions.
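One possible monotonically increasing mapping from the background-to-foreground ratio to a weighting factor is sketched below. The specific function r / (1 + r), the use of power estimates as inputs, the small constant `eps` and the name `weight_from_ratio` are all assumptions for illustration; the description does not prescribe a particular function.

```python
def weight_from_ratio(background_power, foreground_power, eps=1e-12):
    """Map the background-to-foreground ratio r to a weight r / (1 + r) in [0, 1)."""
    ratio = background_power / (foreground_power + eps)  # eps avoids division by zero
    return ratio / (1.0 + ratio)


# Made-up power estimates for three time-frequency regions.
w_foreground_dominant = weight_from_ratio(0.1, 1.0)
w_balanced = weight_from_ratio(1.0, 1.0)
w_background_dominant = weight_from_ratio(9.0, 1.0)
```

The weight approaches 1 where the background dominates and approaches 0 where the foreground dominates, which matches the behaviour described above.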

Alternatively, the preceding signal decomposition may yield more than two separated signals.

FIGS. 9a and 9b illustrate the separation of the input signal into a foreground signal and a background signal, e.g., by suppressing (reducing or eliminating) tonal and transient portions in one of the signals.

A simplified processing is derived by using the assumption that the input signal is an additive mixture of the foreground signal and the background signal. FIG. 9b illustrates this. Here, separation 1 denotes the separation of either the foreground signal or the background signal. If the foreground signal is separated, output 1 denotes the foreground signal and output 2 is the background signal. If the background signal is separated, output 1 denotes the background signal and output 2 is the foreground signal.

The design and implementation of the signal separation method are based on the finding that foreground signals and background signals have distinct characteristics. However, deviations from an ideal separation, i.e., leakage of signal components of prominent direct sound sources into the background signal or leakage of ambient signal components into the foreground signal, are acceptable and do not necessarily impair the sound quality of the final result.

With regard to temporal characteristics, it can be observed that, in general, the temporal envelopes of the subband signals of foreground signals feature stronger amplitude modulations than the temporal envelopes of the subband signals of background signals. In contrast, background signals are typically less transient (or percussive), i.e., more sustained, than foreground signals.

With regard to spectral characteristics, it can be observed that, in general, foreground signals may be more tonal. In contrast, background signals are typically noisier than foreground signals.

With regard to phase characteristics, it can be observed that, in general, the phase information of background signals is noisier than that of foreground signals. The phase information for many examples of foreground signals is congruent across multiple frequency bands.

Signals featuring characteristics similar to those of prominent sound source signals are more likely to be foreground signals than background signals. Prominent sound source signals are characterized by transitions between tonal and noisy signal components, where the tonal signal components are time-variant filtered pulse trains whose fundamental frequency is strongly modulated. The spectral processing may be based on these characteristics, and the decomposition may be implemented by means of spectral subtraction or spectral weighting.

Spectral subtraction is performed, for example, in the frequency domain, where the spectra of short frames of successive (possibly overlapping) portions of the input signal are processed. The basic principle is to subtract an estimate of the magnitude spectrum of an interfering signal from the magnitude spectra of the input signal, which is assumed to be an additive mixture of a desired signal and an interfering signal. For the separation of the foreground signal, the desired signal is the foreground and the interfering signal is the background signal. For the separation of the background signal, the desired signal is the background and the interfering signal is the foreground signal.
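The basic principle of spectral subtraction can be sketched for a single frame as follows. Reusing the phase of the input spectrum and flooring the result at zero are common choices that are assumed here and not taken from the description; the function name `spectral_subtraction` and the example values are likewise illustrative.

```python
import cmath


def spectral_subtraction(frame_spectrum, interference_magnitude, floor=0.0):
    """Subtract an interference magnitude estimate from each bin, keeping the input phase."""
    output = []
    for x, n_mag in zip(frame_spectrum, interference_magnitude):
        magnitude = max(abs(x) - n_mag, floor)            # avoid negative magnitudes
        output.append(cmath.rect(magnitude, cmath.phase(x)))
    return output


spectrum = [3 + 4j, 1 + 0j]          # made-up spectral values of one frame
interference = [2.0, 2.0]            # assumed magnitude estimate of the interfering signal
desired = spectral_subtraction(spectrum, interference)
```

In the first bin the magnitude drops from 5 to 3 while the phase is preserved; in the second bin the estimate exceeds the input magnitude, so the bin is set to zero.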

Spectral weighting (or short-term spectral attenuation) follows the same principle and attenuates the interfering signal by scaling the input signal representation. The input signal x(t) is transformed using a short-time Fourier transform (STFT), a filter bank or any other means for deriving a signal representation with multiple frequency bands X(n, k), with frequency band index n and time index k. The frequency-domain representations of the input signals are processed such that the subband signals are scaled with time-variant weights G(n, k),

$$Y(n, k) = G(n, k)\, X(n, k). \qquad (3)$$

The result of the weighting operation, Y(n, k), is the frequency-domain representation of the output signal. The output time signal y(t) is computed using the inverse processing of the frequency-domain transform, e.g., the inverse STFT. FIG. 10 illustrates the spectral weighting.
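The chain of transform, weighting and inverse transform can be sketched for a single frame. The sketch uses a plain discrete Fourier transform written out by hand instead of a full STFT with windowing and overlap-add, which is a simplification; the function names and the test signal are assumptions for illustration.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of one frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def spectral_weighting(frame, gains):
    """Scale each frequency bin by its gain and return the real time-domain frame."""
    weighted = [g * x for g, x in zip(gains, dft(frame))]
    return [value.real for value in idft(weighted)]


# One frame holding a constant offset plus one cosine period (made-up signal).
frame = [1.0 + math.cos(2 * math.pi * t / 8) for t in range(8)]
# Zero the bins that carry the cosine (bin 1 and its mirror bin 7), keep the rest.
dc_only = spectral_weighting(frame, [1, 0, 1, 1, 1, 1, 1, 0])
```

The gains are chosen symmetrically so that the output stays real; with all gains equal to one, the frame is reproduced unchanged.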

Decorrelation refers to processing one or more identical input signals such that multiple output signals are obtained which are mutually (partially or completely) uncorrelated but which sound similar to the input signal. The correlation between two signals can be measured by means of the correlation coefficient or the normalized correlation coefficient. The normalized correlation coefficient NCC in frequency bands for two signals X_1(n, k) and X_2(n, k) is defined as

$$\mathrm{NCC}(n, k) = \frac{\left|\Phi_{1,2}(n, k)\right|}{\sqrt{\Phi_{1,1}(n, k)\,\Phi_{2,2}(n, k)}}, \qquad (4)$$

where $\Phi_{1,1}$ and $\Phi_{2,2}$ are the auto power spectral densities (PSD) of the first and the second input signal, respectively, and $\Phi_{1,2}$ is the cross-PSD given by

$$\Phi_{1,2}(n, k) = \mathcal{E}\{X_1(n, k)\, X_2^*(n, k)\}, \qquad (5)$$

where $\mathcal{E}\{\cdot\}$ is the expectation operation and X^* denotes the complex conjugate of X.
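A normalized correlation coefficient of this kind can be estimated in practice by replacing the expectation with an average over frames. The sketch below assumes the usual form, i.e., the magnitude of the cross-PSD divided by the square root of the product of the two auto-PSDs; the function name `ncc` and the sample values are illustrative only.

```python
import math


def ncc(x1, x2):
    """Estimate the normalized correlation coefficient of two complex subband sequences.

    x1 and x2 hold the samples of one frequency band over several frames; the
    expectation is approximated by the average over those frames.
    """
    n = len(x1)
    cross_psd = sum(a * b.conjugate() for a, b in zip(x1, x2)) / n
    auto_psd_1 = sum(abs(a) ** 2 for a in x1) / n
    auto_psd_2 = sum(abs(b) ** 2 for b in x2) / n
    return abs(cross_psd) / math.sqrt(auto_psd_1 * auto_psd_2)


band = [1 + 0j, 0 + 1j, -1 + 0j, 0 - 1j]           # made-up subband samples
scaled_copy = [2j * v for v in band]                # same signal, scaled and phase-shifted
mirrored = [v.conjugate() for v in band]            # uncorrelated with `band` in this example

fully_correlated = ncc(band, scaled_copy)
uncorrelated = ncc(band, mirrored)
```

Because of the magnitude in the numerator and the normalization, a scaled and phase-shifted copy still yields a value of one, while the mirrored sequence yields zero here.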

Decorrelation can be implemented by using decorrelation filters or by manipulating the phase of the input signals in the frequency domain. An example of a decorrelation filter is the all-pass filter, which by definition does not change the magnitude spectrum of the input signals but only their phase. This leads to neutrally sounding output signals in the sense that the output signals sound similar to the input signals. Another example is reverberation, which can also be modeled as a filter or a linear time-invariant system. In general, decorrelation can be achieved by adding multiple delayed (and possibly filtered) copies of the input signal to the input signal. In mathematical terms, artificial reverberation can be implemented as the convolution of the input signal with the impulse response of the reverberating (or decorrelating) system. When the delay time is small, e.g., smaller than 50 ms, the delayed copies of the signal are not perceived as separate signals (echoes). The exact value of the delay time that leads to the sensation of an echo is the echo threshold, and it depends on spectral and temporal signal characteristics. It is, for example, smaller for impulse-like sounds than for sounds whose envelope rises slowly. For the problem at hand, it is desirable to use delay times that are smaller than the echo threshold.
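Adding delayed copies of the input signal to itself can be sketched as a sparse convolution. The tap delays and gains, the sampling rate of 48 kHz and the helper names are assumptions chosen for illustration; the only property taken from the text is that the delays stay below 50 ms.

```python
def ms_to_samples(delay_ms, sample_rate):
    """Convert a delay in milliseconds to a whole number of samples."""
    return round(delay_ms * sample_rate / 1000.0)


def add_delayed_copies(x, taps):
    """Add delayed, scaled copies of x to x; taps is a list of (delay_in_samples, gain)."""
    y = list(x)
    for delay, gain in taps:
        for n in range(delay, len(x)):
            y[n] += gain * x[n - delay]
    return y


sample_rate = 48000                                  # assumed sampling rate in Hz
# Two hypothetical tap sets with different delays yield two different output signals.
taps_first = [(ms_to_samples(7, sample_rate), 0.5), (ms_to_samples(19, sample_rate), -0.3)]
taps_second = [(ms_to_samples(11, sample_rate), -0.5), (ms_to_samples(23, sample_rate), 0.3)]

# Response to a unit impulse with short example delays of 2 and 5 samples.
impulse = [1.0] + [0.0] * 9
response = add_delayed_copies(impulse, [(2, 0.5), (5, -0.25)])
```

The impulse response shows the direct signal followed by the two scaled copies, which is the convolution view described above.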

In the general case, the decorrelation processes an input signal having N channels and outputs a signal having M channels such that the channel signals of the output are mutually (partially or completely) uncorrelated.

In many application scenarios for the described method it is not appropriate to process the input signal constantly; instead, the method should be activated and its impact controlled based on an analysis of the input signal. One example is FM broadcasting, where the described method is applied only when impairments of the transmission lead to a complete or partial loss of stereo information. Another example is listening to a collection of musical recordings, where a subset of the recordings is monophonic and another subset consists of stereo recordings. Both scenarios are characterized by a time-varying amount of stereo information in the audio signals. This requires control of the activation and the impact of the stereo enhancement, i.e., control of the algorithm.

The control is implemented by means of an analysis of the audio signals which estimates the spatial cues (ICLD, ICTD and ICC, or a subset thereof) of the audio signals. The estimation can be performed in a frequency-selective manner. The output of the estimation is mapped to a scalar value which controls the activation or the impact of the processing. The signal analysis processes the input signal or, alternatively, the separated background signal.

A straightforward way of controlling the impact of the processing is to decrease its impact by adding a (possibly scaled) copy of the input signal to the (possibly scaled) output signal of the stereo enhancement. Smooth transitions of the control are obtained by low-pass filtering the control signal over time.
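A minimal sketch of such a control is given below. It assumes a first-order recursive low-pass filter for the control signal and a crossfade in which the input is scaled by (1 - c) and the enhanced signal by c; both choices, the names `smooth_control` and `mix`, and all values are illustrative assumptions.

```python
def smooth_control(control, alpha):
    """Low-pass filter a control signal over time with a one-pole recursion."""
    smoothed = []
    state = control[0]
    for c in control:
        state = alpha * state + (1.0 - alpha) * c
        smoothed.append(state)
    return smoothed


def mix(input_frame, enhanced_frame, c):
    """Add scaled copies: c = 0 keeps only the input, c = 1 only the enhanced signal."""
    return [(1.0 - c) * x + c * e for x, e in zip(input_frame, enhanced_frame)]


# The raw control jumps from 0 to 1; the smoothed control approaches 1 gradually.
raw_control = [0.0, 1.0, 1.0, 1.0]
control = smooth_control(raw_control, alpha=0.5)

blended = mix([1.0, 1.0], [3.0, 3.0], c=0.5)
```

A value of `alpha` closer to one gives slower and smoother transitions at the cost of a longer reaction time.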

FIG. 9a shows a schematic block diagram of a processing 900 of the input signal 102 according to the background/foreground processing. The input signal 102 is separated such that a foreground signal 914 can be processed. In a step 916, decorrelation is performed on the foreground signal 914. Step 916 is optional. Alternatively, the foreground signal 914 may remain unprocessed, i.e., not decorrelated. In a step 922 of a processing path 920, a background signal 924 is extracted, i.e., filtered. In a step 926, the background signal 924 is decorrelated. In a step 904, the decorrelated foreground signal 918 (alternatively, the foreground signal 914) and the decorrelated background signal 928 are mixed such that an output signal 906 is obtained. In other words, FIG. 9a shows a block diagram of the stereo enhancement. A foreground signal and a background signal are computed. The background signal is processed by decorrelation. Optionally, the foreground signal may be processed by decorrelation, but to a lesser extent than the background signal. The processed signals are combined into the output signal.

FIG. 9b shows a schematic block diagram of a processing 900' comprising a separation step 912' of the input signal 102. The separation step 912' may be performed as described above. A foreground signal (output signal 1) 914' is obtained by the separation step 912'. A background signal (output signal 2) 928' is obtained in a combining step 926' by combining the foreground signal 914', the weighting factors a and/or b, and the input signal 102.
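Under the additive signal model, the second output can be derived from the input signal and the first output. The sketch below assumes that the combining step is a linear combination a·x + b·(output 1); with a = 1 and b = -1 this reduces to the plain difference. The linear form, the default values and the function name are assumptions, since the text only states that the weighting factors a and/or b are involved.

```python
def derive_second_output(x, output_1, a=1.0, b=-1.0):
    """Combine the input signal and the first output as a*x + b*output_1 (assumed form)."""
    return [a * xi + b * oi for xi, oi in zip(x, output_1)]


x = [1.0, 0.5, -0.25]            # input signal (made-up samples)
foreground = [0.75, 0.5, 0.0]    # output 1 of a hypothetical separation step
background = derive_second_output(x, foreground)
```

With the default factors, the two outputs add up to the input signal again, so only one separation has to be computed explicitly.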

FIG. 10 shows a schematic block diagram of an apparatus 1000 configured to apply spectral weights to an input signal 1002, which may be, for example, the input signal 102. The time-domain input signal 1002 is divided into subbands X(1, k) ... X(N, k) in the frequency domain. A filterbank 1004 is configured to divide the input signal 1002 into the N subbands. The apparatus 1000 comprises N computation instances configured to determine a transient spectral weight and/or a tonal spectral weight G(1, k) ... G(N, k) for each of the N subbands at time instance (frame) k. The spectral weights G(1, k) ... G(N, k) are combined with the subband signals X(1, k) ... X(N, k) such that weighted subband signals are obtained. The apparatus 1000 comprises an inverse processing unit 1008 configured to combine the weighted subband signals to obtain a filtered output signal 1012, denoted Y(t), in the time domain. The apparatus 1000 may be part of the signal processor 110 or 210. In other words, FIG. 10 illustrates the decomposition of an input signal into a foreground signal and a background signal.
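
The weighting scheme of FIG. 10 can be sketched as follows. This is only a minimal illustration under simplifying assumptions, not the implementation of the apparatus 1000: non-overlapping frames and a plain FFT stand in for the filterbank 1004, and the names `apply_spectral_weights`, `weights_fn` and `frame_len` are hypothetical.

```python
import numpy as np

def apply_spectral_weights(x, weights_fn, frame_len=256):
    """Weight each frequency bin of each frame and resynthesize the time signal.

    Assumption: non-overlapping rectangular frames replace the filterbank;
    trailing samples that do not fill a whole frame are dropped.
    """
    x = np.asarray(x, dtype=float)
    n_frames = len(x) // frame_len
    y = np.zeros(n_frames * frame_len)
    for k in range(n_frames):
        frame = x[k * frame_len:(k + 1) * frame_len]
        X = np.fft.rfft(frame)      # subband signals X(1, k) ... X(N, k)
        G = weights_fn(X)           # spectral weights G(1, k) ... G(N, k)
        # inverse processing: back to the time domain, giving Y(t) for this frame
        y[k * frame_len:(k + 1) * frame_len] = np.fft.irfft(G * X, n=frame_len)
    return y
```

With `weights_fn` returning all ones the input frames are reproduced unchanged; weights between 0 and 1 attenuate the corresponding bins, which is how transient or tonal portions would be reduced in this sketch.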

FIG. 11 shows a schematic flowchart of a method 1100 for enhancing an audio signal. The method 1100 comprises a first step 1110 in which the audio signal is processed in order to reduce or eliminate transient and tonal portions of the processed signal. The method 1100 comprises a second step 1120 in which a first decorrelated signal and a second decorrelated signal are generated from the processed signal. In a step 1130 of the method 1100, the first decorrelated signal, the second decorrelated signal and the audio signal, or a signal derived from the audio signal by coherence enhancement, are weightedly combined using time-variant weighting factors to obtain a two-channel audio signal. In a step 1140 of the method 1100, the time-variant weighting factors are controlled by analyzing the audio signal such that different portions of the audio signal are multiplied by different weighting factors and the two-channel audio signal has a time-variant degree of decorrelation.
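
The weighted combination of step 1130 can be illustrated with a short sketch. The mixing rule used here (y1 = a*x + b*r1, y2 = a*x + b*r2) is an assumption for illustration, since this passage does not spell out the formula; the function name `combine_two_channel` is hypothetical.

```python
import numpy as np

def combine_two_channel(x, r1, r2, a, b):
    """Weightedly combine the audio signal x with two decorrelated signals.

    Assumed mixing rule: y1 = a*x + b*r1 and y2 = a*x + b*r2.
    a and b may be scalars or per-sample arrays (time-variant weights).
    Returns an array of shape (2, len(x)) holding the two channels.
    """
    x, r1, r2 = (np.asarray(s, dtype=float) for s in (x, r1, r2))
    y1 = a * x + b * r1
    y2 = a * x + b * r2
    return np.stack([y1, y2])
```

In this sketch, b = 0 yields two identical channels (no decorrelation), and a larger b raises the share of the decorrelated signals, so varying a and b over time gives a time-variant degree of decorrelation.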

In the following, details are set forth to illustrate a possibility of determining the perceived level of decorrelation based on a loudness measure. As will be shown, a loudness measure may allow a perceived level of reverberation to be predicted. As stated above, reverberation also implies decorrelation, so that the perceived level of reverberation may also be regarded as a perceived level of decorrelation; for decorrelation, the reverberation may be shorter than one second, for example shorter than 500 ms, shorter than 250 ms or shorter than 200 ms.

FIG. 12 illustrates an apparatus for determining a measure for a perceived level of reverberation in a mix signal comprising a direct signal component or dry signal component 1201 and a reverberation signal component 1202. The dry signal component 1201 and the reverberation signal component 1202 are input into a loudness model processor 1204. The loudness model processor is configured to receive the direct signal component 1201 and the reverberation signal component 1202 and additionally comprises a perceptual filter stage 1204a and a subsequently connected loudness calculator 1204b, as illustrated in FIG. 13a. The loudness model processor generates, at its output, a first loudness measure 1206 and a second loudness measure 1208. Both loudness measures are input into a combiner 1210, which combines the first loudness measure 1206 and the second loudness measure 1208 to finally obtain a measure 1212 for the perceived level of reverberation. Depending on the implementation, the measure 1212 for the perceived level can be input into a predictor 1214 for predicting the perceived level of reverberation based on an average value of at least two measures for the perceived loudness for different signal frames. The predictor 1214 in FIG. 12 is optional, however; it transforms the measure for the perceived level into a certain value range or unit range, such as the sone unit range, which is useful for giving quantitative values related to loudness. Other uses of the measure 1212 that has not been processed by the predictor 1214 are also possible, for example in the controller, which does not necessarily have to rely on a value output by the predictor 1214 but can process the measure 1212 for the perceived level directly, either in direct form or, preferably, in a smoothed form, where smoothing over time is preferred in order to avoid strongly changing level corrections of the reverberated signal or of the gain factor g.
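
The smoothing over time mentioned above can be sketched with a first-order recursive (one-pole) smoother. The choice of this particular smoother, the function name `smooth_measure` and the parameter `alpha` are assumptions for illustration; the text does not prescribe a specific smoothing method.

```python
def smooth_measure(values, alpha=0.9):
    """Smooth a sequence of per-frame measures with a one-pole recursion.

    Assumed rule: s[k] = alpha * s[k-1] + (1 - alpha) * v[k], with s[0] = v[0].
    A larger alpha gives stronger smoothing and slower reaction to changes.
    """
    smoothed = []
    state = None
    for v in values:
        state = v if state is None else alpha * state + (1.0 - alpha) * v
        smoothed.append(state)
    return smoothed
```

Feeding the smoothed values rather than the raw per-frame measure to the controller would, in this sketch, limit abrupt changes of the derived gain or weighting factors.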

In particular, the perceptual filter stage is configured to filter the direct signal component, the reverberation signal component or the mix signal component, where the perceptual filter stage is configured to model an auditory perception mechanism of an entity such as a human being in order to obtain a filtered direct signal, a filtered reverberation signal or a filtered mix signal. Depending on the implementation, the perceptual filter stage may comprise two filters operating in parallel, or it may comprise a storage and a single filter, since one and the same filter can actually be used for filtering each of the three signals, i.e., the reverberation signal, the mix signal and the direct signal. In this context, it should be noted that although FIG. 13a illustrates n filters modeling the auditory perception mechanism, two filters, or a single filter that filters two signals from the group comprising the reverberation signal component, the mix signal component and the direct signal component, will actually be sufficient.

The loudness calculator 1204b, or loudness estimator, is configured to estimate the first loudness-related measure using the filtered direct signal and to estimate the second loudness measure using the filtered reverberation signal or the filtered mix signal, where the mix signal is derived from a superposition of the direct signal component and the reverberation signal component.

FIG. 13c illustrates four preferred modes of calculating the measure for the perceived level of reverberation. One implementation relies on the partial loudness, where both the direct signal component x and the reverberation signal component r are used in the loudness model processor, but where, in order to determine the first measure EST1, the reverberation signal is used as the stimulus and the direct signal is used as the noise. To determine the second loudness measure EST2, the situation is reversed: the direct signal component is used as the stimulus and the reverberation signal component is used as the noise. The measure for the perceived level of reverberation generated by the combiner is then the difference between the first loudness measure EST1 and the second loudness measure EST2.
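
The role swap between stimulus and noise and the final difference can be sketched as follows. The function `toy_partial_loudness` is a deliberately crude, hypothetical stand-in (an energy share scaled by a compressive power law) and is not a psychoacoustic loudness model; only the structure of `perceived_level_measure`, i.e. EST1 minus EST2 with the roles exchanged, reflects the description above.

```python
import numpy as np

def toy_partial_loudness(stimulus, noise, eps=1e-12):
    """Hypothetical placeholder for a partial-loudness model (illustration only)."""
    e_s = float(np.mean(np.square(stimulus)))   # stimulus energy
    e_n = float(np.mean(np.square(noise)))      # masking noise energy
    return e_s / (e_s + e_n + eps) * (e_s + e_n) ** 0.3

def perceived_level_measure(direct, reverb, partial_loudness=toy_partial_loudness):
    """Return EST1 - EST2 using the stimulus/noise roles described for FIG. 13c."""
    est1 = partial_loudness(stimulus=reverb, noise=direct)   # reverberation as stimulus
    est2 = partial_loudness(stimulus=direct, noise=reverb)   # direct signal as stimulus
    return est1 - est2
```

A real implementation would pass a proper partial-loudness model as `partial_loudness`; the difference is positive when the reverberation dominates and negative when the direct signal dominates.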

However, there are further, computationally more efficient embodiments, indicated in lines 2, 3 and 4 of FIG. 13c. These computationally more efficient measures rely on calculating the total loudness of three signals: the mix signal m, the direct signal x and the reverberation signal r. Depending on the calculation to be performed by the combiner, as indicated in the last column of FIG. 13c, the first loudness measure EST1 is the total loudness of the mix signal or of the reverberation signal, and the second loudness measure EST2 is the total loudness of the direct signal component x or of the mix signal component m, with the actual combinations as illustrated in FIG. 13c.

FIG. 14 illustrates an implementation of the loudness model processor that has already been discussed in some aspects with respect to FIGS. 12, 13a, 13b and 13c. In particular, the perceptual filter stage 1204a comprises a time-frequency converter 1401 for each branch, where, in the embodiment of FIG. 3, x[k] denotes the stimulus and n[k] denotes the noise. The time/frequency-converted signal is forwarded into an ear transfer function block 1402 (note that the ear transfer function can alternatively be computed before the time-frequency converter, with similar results but a higher computational load), and the output of block 1402 is input into an excitation pattern computing block 1404, followed by a temporal integration block 1406. Then, in block 1408, the specific loudness is calculated in this embodiment, where block 1408 corresponds to the loudness calculator block 1204b of FIG. 13a. Subsequently, an integration over frequency is performed in block 1410, where block 1410 corresponds to the adder already described as 1204c and 1204d in FIG. 13b. Block 1410 generates a first measure for a first set of stimulus and noise and a second measure for a second set of stimulus and noise. In particular, when FIG. 13b is considered, the stimulus for calculating the first measure is the reverberation signal and the noise is the direct signal, whereas for calculating the second measure the situation is reversed: the stimulus is the direct signal component and the noise is the reverberation signal component. Hence, to generate the two different loudness measures, the procedure illustrated in FIG. 14 is performed twice. However, the calculation differs only in block 1408, which operates differently, so that the steps illustrated by blocks 1401 to 1406 need to be performed only once, and the result of the temporal integration block 1406 can be stored in order to compute the first estimated loudness and the second estimated loudness for the implementation depicted in FIG. 13c. For the other implementations, block 1408 may be replaced by an individual block "compute total loudness" for each branch, and in these implementations it does not matter whether a signal is considered a stimulus or a noise.

Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block, item or feature of a corresponding apparatus.

Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.

Some embodiments according to the invention comprise a data carrier having electronically readable control signals which are capable of cooperating with a programmable computer system such that one of the methods described herein is performed.

Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine-readable carrier.

Other embodiments comprise a computer program for performing one of the methods described herein, stored on a machine-readable carrier.

In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein when the computer program runs on a computer.

A further embodiment of the inventive methods is, therefore, a data carrier (for example a digital storage medium or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.

A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example via the Internet.

A further embodiment comprises a processing means, for example a computer or a programmable logic device, configured or adapted to perform one of the methods described herein.

A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.

The embodiments described above are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and details described herein will be apparent to those skilled in the art. It is therefore intended that the invention be limited only by the scope of the following patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.

Claims

1. An apparatus (100; 200) for enhancing an audio signal (102), comprising:
a signal processor (110; 210) for processing the audio signal (102) in order to reduce or eliminate transient and tonal portions of the processed signal (112; 212);
a decorrelator (120; 520) for generating a first decorrelated signal (122; 522, r1) and a second decorrelated signal (124; r2) from the processed signal (112; 212);
a combiner (140; 240) for weightedly combining the first decorrelated signal (122; 522, r1), the second decorrelated signal (124; r2) and the audio signal (102) or a signal (136; 236) derived from the audio signal (102) by coherence enhancement, using time-variant weighting factors (a, b), to obtain a two-channel audio signal (142; 242); and
a controller (130; 230) for controlling the time-variant weighting factors (a, b) by analyzing the audio signal (102) such that different portions (fb1-fb7) of the audio signal are multiplied by different weighting factors (a, b) and the two-channel audio signal (142; 242) has a time-variant degree of decorrelation.

2. The apparatus according to claim 1,
wherein the controller (130; 230) is configured to increase the weighting factors (a, b) for portions (fb1-fb7) of the audio signal (102) that allow a higher degree of decorrelation and to decrease the weighting factors (a, b) for portions (fb1-fb7) of the audio signal (102) that allow a lower degree of decorrelation.

3. The apparatus according to claim 1 or 2,
wherein the controller (130; 230) is configured to scale the weighting factors (a, b) such that a perceived level of decorrelation in the two-channel audio signal (142; 242) remains within a range around a target value, the range extending to ±20% of the target value.

4. The apparatus according to claim 3,
wherein the controller (130; 230) is configured to determine the target value by reverberating the audio signal (102) to obtain a reverberated audio signal and by comparing the reverberated audio signal with the audio signal (102) to obtain a comparison result, and wherein the controller is configured to determine the perceived level of decorrelation (232) based on the comparison result.

5. The apparatus according to any one of claims 1 to 4,
wherein the controller (130; 230) is configured to determine a prominent sound source signal portion in the audio signal (102) and to decrease the weighting factors (a, b) for the prominent sound source signal portion compared to a portion of the audio signal (102) in which no prominent sound source signal is determined; and
wherein the controller (130; 230) is configured to determine a non-prominent sound source signal portion in the audio signal (102) and to increase the weighting factors (a, b) for the non-prominent sound source signal portion compared to a portion of the audio signal (102) in which no non-prominent sound source signal is determined.

6. The apparatus according to any one of claims 1 to 5,
wherein the controller (130; 230) is configured to:
generate a test decorrelated signal from a portion of the audio signal (102);
derive a measure for a perceived level of decorrelation from the portion of the audio signal and the test decorrelated signal; and
derive the weighting factors (a, b) from the measure for the perceived level of decorrelation.

7. The apparatus according to claim 6,
wherein the decorrelator (120; 520) is configured to generate the first decorrelated signal (122; r1) based on a reverberation of the audio signal (102) having a first reverberation time, and wherein the controller (130; 230) is configured to generate the test decorrelated signal based on a reverberation of the audio signal (102) having a second reverberation time, the second reverberation time being shorter than the first reverberation time.

8. The apparatus according to any one of claims 1 to 7,
wherein the controller (130; 230) is configured to control the weighting factors (a, b) such that each of the weighting factors (a, b) comprises one value of a first plurality of possible values, the first plurality comprising at least three values comprising a minimum value, a maximum value and a value between the minimum value and the maximum value; and
wherein the signal processor (110; 210) is configured to determine spectral weights (217, 219) for a second plurality of frequency bands each representing a portion of the audio signal (102) in the frequency domain, wherein the spectral weights (217, 219) each comprise one value of a third plurality of possible values, the third plurality comprising at least three values comprising a minimum value, a maximum value and a value between the minimum value and the maximum value.

9. The apparatus according to any one of claims 1 to 8,
wherein the signal processor (110; 210) is configured to:
process the audio signal (102) such that the audio signal (102) is transferred into the frequency domain and such that a second plurality of frequency bands (fb1-fb7) represents a second plurality of portions of the audio signal (102) in the frequency domain;
determine, for each frequency band (fb1-fb7), a first spectral weight (217) representing a processing value for transient processing (211) of the audio signal (102);
determine, for each frequency band (fb1-fb7), a second spectral weight (219) representing a processing value for tonal processing (213) of the audio signal (102); and
apply, for each frequency band (fb1-fb7), at least one of the first spectral weight (217) and the second spectral weight (219) to spectral values of the audio signal (102) in the frequency band (fb1-fb7);
wherein the first spectral weight (217) and the second spectral weight (219) each comprise one value of a third plurality of possible values, the third plurality comprising at least three values comprising a minimum value, a maximum value and a value between the minimum value and the maximum value.

10. The apparatus according to claim 9,
wherein, for each of the second plurality of frequency bands (fb1-fb7), the signal processor (110; 210) is configured to compare the first spectral weight (217) and the second spectral weight (219) determined for the frequency band (fb1-fb7), to determine whether one of the two values comprises a smaller value, and to apply the spectral weight (217; 219) comprising the smaller value to the spectral values of the audio signal (102) in the frequency band (fb1-fb7).

11. The apparatus according to any one of claims 1 to 10,
wherein the decorrelator (520) comprises a first decorrelating filter (526) configured to filter the processed audio signal (512, s) to obtain the first decorrelated signal (522, r1) and a second decorrelating filter (528) configured to filter the processed audio signal (512, s) to obtain the second decorrelated signal (524, r2), and wherein the combiner (140; 240) is configured to weightedly combine the first decorrelated signal (522, r1), the second decorrelated signal (524, r2) and the audio signal (102) or the signal (136; 236) derived from the audio signal (102) to obtain the two-channel audio signal (142; 242).

12. The apparatus according to any one of claims 1 to 11,
wherein, for a second plurality of frequency bands (fb1-fb7), each of the frequency bands (fb1-fb7) comprises a portion of the audio signal (102) represented in the frequency domain and having a first time period;
wherein the controller (130; 230) is configured to control the weighting factors (a, b) such that each of the weighting factors (a, b) comprises one value of a first plurality of possible values, the first plurality comprising at least three values comprising a minimum value, a maximum value and a value between the minimum value and the maximum value, and to adapt the weighting factors (a, b) determined for a current time period, if a ratio or a difference based on a value of the weighting factors (a, b) determined for the current time period and a value of the weighting factors (a, b) determined for a previous time period is greater than or equal to a threshold value, such that the ratio or the difference is reduced; and
wherein the signal processor (110; 210) is configured to determine the spectral weights (217, 219) each comprising one value of a third plurality of possible values, the third plurality comprising at least three values comprising a minimum value, a maximum value and a value between the minimum value and the maximum value.

13. A sound enhancing system (800), comprising:
an apparatus (801) for enhancing an audio signal according to any one of claims 1 to 12;
a signal input (106) configured to receive the audio signal (102); and
at least two loudspeakers (808a, 808b) configured to receive the two-channel audio signal (y1/y2) or a signal derived from the two-channel audio signal (y1/y2) and to generate acoustic signals from the two-channel audio signal (y1/y2) or from the signal derived from the two-channel audio signal (y1/y2).

14. A method (1100) for enhancing an audio signal (102), comprising:
processing (1110) the audio signal (102) in order to reduce or eliminate transient and tonal portions of the processed signal (112; 212);
generating (1120) a first decorrelated signal (122, r1) and a second decorrelated signal (124; r2) from the processed signal (112; 212);
weightedly combining (1130) the first decorrelated signal (122, r1), the second decorrelated signal (124, r2) and the audio signal (102) or a signal (136; 236) derived from the audio signal (102) by coherence enhancement, using time-variant weighting factors (a, b), to obtain a two-channel audio signal (142; 242); and
controlling (1140) the time-variant weighting factors (a, b) by analyzing the audio signal (102) such that different portions of the audio signal are multiplied by different weighting factors (a, b) and the two-channel audio signal (142; 242) has a time-variant degree of decorrelation.

15. A non-transitory storage medium having stored thereon a computer program,
the computer program comprising program code for performing, when executed by a computer, the method for enhancing an audio signal according to claim 14.