KR100978015B1 - Stationary Spectral Power Dependent Audio Enhancement System

Info

Publication number: KR100978015B1
Application number: KR1020047021716A
Authority: KR
Inventors: David A. C. M. Roovers
Original assignee: Koninklijke Philips Electronics N.V.
Priority date: 2002-07-01
Filing date: 2003-06-19
Publication date: 2010-08-25
Anticipated expiration: 2023-06-19
Also published as: KR20050016719A

Abstract

An audio enhancement system (1) for speech recognition or voice control is disclosed. It comprises a signal input for carrying a distorted desired signal z, a reference signal input, and a spectral processor (SP) coupled to both signal inputs for processing the distorted desired signal by means of a reference signal x that serves as an estimate of the distortion of the desired signal. The spectral processor SP determines a factor C' such that the estimate is a function of the product of C' and the spectral power of the reference signal, and such that C' is determined by the spectral ratio between the components of the signals z and x that are essentially stationary over time. Because the factor is determined from the stationary parts of the signals, the audio enhancement system does not need a critical speech detector.

Spectral processor, distorted desired signal, spectral ratio

Description

Stationary spectral power dependent audio enhancement system

The present invention relates to an audio enhancement system comprising a signal input for carrying a distorted desired signal, a reference signal input for a reference signal, and a spectral processor coupled to both signal inputs for processing the distorted desired signal by means of the reference signal, which serves as an estimate of the distortion of the desired signal. The invention also relates to signals suitable for use therein.

The present invention further relates to a system, in particular a communication system, for example a hands-free communication device such as a mobile telephone, a speech recognition system, or a voice control system, which system is provided with such an audio enhancement system. It also relates to a method for enhancing a distorted desired signal, in which the signal is spectrally processed by means of a reference signal that serves as an estimate of the distortion of the desired signal.

Such an audio enhancement system, embodied as an arrangement for suppressing an interfering component such as distorting noise, is known from WO 97/45995. The known system comprises a number of microphones coupled to audio signal inputs. The microphones comprise a primary microphone for the distorted desired signal and one or more reference microphones for receiving an interfering signal. The system also comprises a spectral processor, embodied as a signal processing arrangement coupled to the microphones through the audio signal inputs. The interfering signal is spectrally subtracted from the distorted signal, so that the output of the system presents an output signal with a reduced interfering noise component.

A disadvantage of the known audio enhancement system is that its interference cancelling capability depends on the use of a speech detector coupled to the speech processing device. The operation of the known system depends critically on proper speech detection by that detector.

It is therefore an object of the present invention to provide an improved audio enhancement system, and an associated method, that is not complicated by the presence and critical operation of a speech detector.

Accordingly, the audio enhancement system according to the invention is characterized in that the spectral processor is arranged to perform said processing such that a factor C' is determined whereby the estimate is a function of C' times the spectral power of the reference signal, and whereby C' is determined by the spectral ratio between the components of the signals z and x that are essentially stationary over time.

Similarly, the method according to the invention is characterized in that the estimate is a function of C' times the spectral power of the reference signal, where C' is determined by the spectral ratio between the components of the signals z and x that are essentially stationary over time.

The inventor has found that the factor C' as defined is essentially insensitive to the desired signal, because C' accounts only for the ratio of the stationary components in the signals z and x. With the factor C', a reliable estimate of the distortion in the distorted desired signal actually entering the audio enhancement system can be provided without the need for a speech detector. This leads to improved and less critical distortion cancelling properties of the simplified audio enhancement system according to the invention. Distortion cancellation is improved in particular when one or more reference signals contain distortions such as noise, echoes, competing speech, or reverberation of the desired speech signal. Moreover, a frequency-dependent estimate of the distortion can be computed in any scenario in which some reference signal(s) are available.

A further advantage is that no explicit estimate of individual distortion components, such as the noise floor or the echo tail, is needed; if required, the approach is easily combined with such component estimates. This is particularly advantageous where no good estimation techniques exist, as in microphone beamforming applications.

An embodiment of the audio enhancement system according to the invention has the characterizing features outlined in claim 2.

In general, the measured spectral powers take the form of an average spectral power covering a certain number of time frames. Over a certain time period, the minimum of each of the two spectral powers is determined, which places no real burden on the computational complexity of the audio enhancement system according to the invention.

In another embodiment of the audio enhancement system according to the invention, the time period comprises at least one pause in the distorted desired signal. This yields well-determined minimum values of the stationary spectral components of the distorted desired input signal, and these minima accurately represent the stationary distortion in the input signal.
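Why a pause inside the time period pins down the minimum can be seen from a simple additive power model. The symbols P_ss (power of the desired signal) and P_nn (power of the stationary distortion), and the assumption that the two simply add in power, are introduced here for illustration only and do not appear in the original text.

```latex
\begin{align*}
P_{zz}(mB, lw_0) &= P_{ss}(mB, lw_0) + P_{nn}(lw_0), \qquad P_{ss}(mB, lw_0) \ge 0 \\
\min_{m} P_{zz}(mB, lw_0) &= P_{nn}(lw_0)
  \quad \text{if } P_{ss}(mB, lw_0) = 0 \text{ for at least one frame } m \text{ in the period}
\end{align*}
```

Under this idealized model the minimum equals the stationary distortion power regardless of how loud the desired signal is in the remaining frames.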

Preferably, the time period lasts 4 to 5 seconds, so that a speech pause in the distorted desired signal input to the audio enhancement system is normally included.
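As a rough illustration, the 4 to 5 second period can be converted into a number of time frames once a frame shift and sample rate are chosen. The function name and the values used below (8 kHz sampling, a frame shift of 128 samples) are assumptions for the sake of the example, not values taken from the patent.

```python
import math


def frames_for_period(period_s: float, frame_shift_samples: int, sample_rate_hz: int) -> int:
    """Return the number of time frames needed to span period_s seconds.

    Hypothetical helper: the frame count is the period expressed in
    frame shifts, rounded up.
    """
    return math.ceil(period_s * sample_rate_hz / frame_shift_samples)


# Assumed example values: 8 kHz sampling, frame shift of 128 samples (16 ms).
frames_4s = frames_for_period(4.0, 128, 8000)
frames_5s = frames_for_period(5.0, 128, 8000)
```

With these assumed settings the history buffer would hold a few hundred values per frequency bin, which is modest in memory terms.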

A further embodiment of the audio enhancement system has the characterizing features outlined in claim 5.

In general, the estimate of the distortion of the desired signal may advantageously be expressed as a positive function, for example signal power or signal energy, defined by one of said spectral quantities.

A practically preferred embodiment has the characterizing features of claim 6.

In that case, the audio enhancement system includes shift registers for storing values of the spectral powers and/or the smoothed spectral powers; such registers are cost-effective and easy to implement.

The audio enhancement system and method according to the invention will now be elucidated further, together with their additional advantages, with reference to the appended drawings, in which similar components are referred to by the same reference numerals.

FIG. 1 shows a basic diagram of an audio enhancement system according to the invention.

FIG. 2 shows a basic diagram of a further embodiment of the audio enhancement system according to the invention, having a filter-and-sum beamformer.

FIG. 3 shows a detailed embodiment of the audio enhancement system according to the invention.

FIG. 1 shows a basic diagram of an audio enhancement system 1, embodied by a spectral processor SP, with frequency domain input signals z and x and output signal q. These frequency domain signals are computed block-wise in the processor SP by a Discrete Fourier Transform (DFT), such as a short-time DFT, referred to simply as STFT. The STFT is a function of both time and frequency, expressed by the arguments kB and lw₀, or sometimes by the argument w_k alone. Here k denotes the discrete time frame index, B the frame shift, l the (discrete) frequency index, w₀ the elementary frequency spacing, and w_k the spectral component of index k. The input signal z is the distorted desired signal: the sum of the desired signal, generally speech, and distortions such as noise, echo, competing speech, or reverberation of the desired signal. The signal x is the reference signal from which an estimate of the distortion in the distorted desired signal z is derived. The signals z and x may originate from one or more microphones 2, as shown in FIGS. 1 and 2. In a multi-microphone audio enhancement system 1 there are two or more separate microphones 2, and the reference signal is derived from one or more of them.
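The block-wise transform described above can be sketched in a few lines. This is a minimal illustration rather than the patent's implementation: the function name, the Hann window, and the direct (non-FFT) DFT are assumptions chosen to keep the example self-contained.

```python
import cmath
import math


def stft(signal, frame_len, frame_shift):
    """Return frames[k][l]: DFT bin l of the windowed block starting at sample k * frame_shift.

    k plays the role of the time frame index, frame_shift the role of B,
    and l the role of the frequency index. The Hann window is an assumption.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        block = [signal[start + n] * window[n] for n in range(frame_len)]
        bins = []
        # Only the non-negative frequencies are kept for a real-valued input.
        for l in range(frame_len // 2 + 1):
            acc = 0j
            for n, value in enumerate(block):
                acc += value * cmath.exp(-2j * math.pi * l * n / frame_len)
            bins.append(acc)
        frames.append(bins)
    return frames


# Example: a cosine that falls exactly on bin 4 of a 32-point frame.
example_signal = [math.cos(2 * math.pi * 4 * n / 32) for n in range(64)]
example_frames = stft(example_signal, frame_len=32, frame_shift=16)
```

A production system would use an FFT instead of the quadratic-time loop shown here; the loop is kept only because it makes the definition of each bin explicit.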

The audio enhancement system 1 may comprise adaptive filter means (not shown) for deriving the reference signal x. In that case, the reference signal originates from the far end of the communication system.

In the embodiment of FIG. 1, the signal x contains only the reference or noise signal, whereas the signal z contains both the desired signal and the noise signal. FIG. 2 shows an embodiment of the audio enhancement system 1 for the case in which both microphones 2 sense speech as well as noise, delivered as microphone array signals u₁ and u₂. A filter-and-sum beamformer 3 is now coupled between the microphones 2 and the spectral processor SP. The spectral processor again receives the signals z and x described above, where x contains only the reference or noise and z contains the desired signal and the noise signal. The beamformer 3 is designed such that the distorted desired signal z is obtained as a linear combination of the microphone array signals u₁ and u₂ through the respective transfer functions f₁(w) and f₂(w). The reference signal x is derived from the individual microphone array signals by a blocking matrix B(w), which projects these signals onto a subspace orthogonal to the desired signal. Ideally, the output signal x of the matrix B(w) contains no desired speech, only distortions. The signals z and x are then fed to the spectral processor SP, which spectrally processes the distorted desired signal z by means of the reference signal x. The signal q from the processor SP is an output signal that is virtually free of distortion: q = G×z, where G is the gain function described below.
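For a single frequency bin, the filter-and-sum beamformer and the blocking matrix can be sketched as follows. The sketch assumes, purely for illustration, that the desired signal arrives identically at both microphones, so that equal weights f₁ = f₂ = 0.5 pass it unchanged and the difference u₁ − u₂ blocks it; the patent does not prescribe these particular values.

```python
def beamform_and_block(u1: complex, u2: complex, f1: complex = 0.5, f2: complex = 0.5):
    """Return (z, x) for one frequency bin of a two-microphone array.

    z: filter-and-sum output, a linear combination of the array signals.
    x: blocking output; with the assumed blocking row [1, -1], any component
       that is identical on both microphones cancels.
    """
    z = f1 * u1 + f2 * u2
    x = u1 - u2
    return z, x


# Example bin: common desired component s plus different noise on each microphone.
s = 1.0 + 2.0j
n1 = 0.3 - 0.1j
n2 = -0.2 + 0.4j
z_bin, x_bin = beamform_and_block(s + n1, s + n2)
```

In this idealized case x contains only noise, which is the property the spectral processor relies on. With real acoustics the desired signal leaks into x to some degree.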

The audio enhancement system 1 may be included in a system, in particular a communication system, for example a hands-free communication device such as a mobile telephone, a speech recognition system, or a voice control system.

The spectral processor SP acts as a controllable gain function for the successive frequency bins generated by the Discrete Fourier Transform (DFT) described above. This gain function is applied to the distorted desired speech signal z, leaving the phase of z unchanged. For good audio enhancement performance, the gain function type, that is, the estimate of the distortion present in the input signal, is important. Various gain functions may be used, depending on the optimization criterion being applied. Examples are spectral subtraction, Wiener filtering, and Minimum Mean-Square Error (MMSE) or log-MMSE estimation, based for example on the spectral amplitude or magnitude, the squared spectral magnitude, the power spectral density, or the Mel-scale smoothed spectral density of the signals involved. These techniques may be combined with the applications described above for audio enhancement systems 1 having one or more microphones and/or loudspeakers.
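Applying a real, non-negative gain per frequency bin scales the magnitude and leaves the phase untouched, which is the behaviour described above. The helper below is a minimal sketch with a hypothetical name; it is not taken from the patent.

```python
import cmath


def apply_gain(z_bins, gains):
    """Return q = G * z per bin for real, non-negative gains.

    A negative gain would flip the phase by 180 degrees, so it is rejected.
    """
    out = []
    for g, z in zip(gains, z_bins):
        if g < 0.0:
            raise ValueError("gain must be non-negative to preserve phase")
        out.append(g * z)
    return out


# Example: two bins, attenuated by different amounts.
z_example = [1.0 + 1.0j, -2.0 + 0.5j]
q_example = apply_gain(z_example, [0.5, 0.25])
```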

For example, for the Wiener-filter type described here, the gain function implemented in the spectral processor SP has the following form:

$$G(kB, lw_0) = 1 - \gamma \, \frac{P_{zz,n}(kB, lw_0)}{P_{zz}(kB, lw_0)} \qquad (1)$$

where P_zz,n(kB,lw₀) and P_zz(kB,lw₀) are estimates of the power distribution of the distortion in the input signal z and of the power distribution of the input signal z itself. γ is the so-called over-subtraction factor, which allows the amount of suppression applied to the distortion to be adjusted. In this way a trade-off can be made between the amount of distortion suppression and the perceptual quality of the processor output signal.

In equation (1), P_zz,n(kB,lw₀) is generally unknown and therefore has to be estimated. The estimate $\hat{P}_{zz,n}(kB, lw_0)$ is proposed as follows:

$$\hat{P}_{zz,n}(kB, lw_0) = C(kB, lw_0) \, P_{xx}(kB, lw_0) \qquad (2)$$

where the ratio term is:

$$C(kB, lw_0) = \frac{\bar{P}_{zz,n}(kB, lw_0)}{\bar{P}_{xx}(kB, lw_0)} \qquad (3)$$

where $\bar{P}_{zz,n}(kB, lw_0)$ is the time-averaged spectral power of the distortion in the distorted desired signal z, measured while the desired signal, such as speech, is absent, and $\bar{P}_{xx}(kB, lw_0)$ is the time-averaged spectral power of the reference signal x. Any positive measure of spectral power may be used, for example the spectral amplitude or magnitude, the squared spectral magnitude, the power spectral density, or the Mel-scale smoothed spectral density of the signals involved. Implementing equation (3) in the processor SP requires a speech detector. If the speech detector does not perform accurately, the desired speech may be affected, which results in audible artifacts that should be avoided. Reliable speech detection, however, is a difficult task in noisy conditions such as in a car or a factory.
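A Wiener-type gain with an over-subtraction factor γ, in which the distortion power is estimated as the ratio term times the spectral power of the reference signal, can be sketched as below. The explicit form 1 − γ·C·P_xx/P_zz, the lower limit g_min, and all names are assumptions made for illustration; the text does not state how negative values are handled.

```python
def wiener_gain(p_zz: float, p_xx: float, c: float, gamma: float = 1.0, g_min: float = 0.0) -> float:
    """Return the gain for one frequency bin.

    p_zz:  spectral power of the distorted desired signal z.
    p_xx:  spectral power of the reference signal x.
    c:     ratio term, so that c * p_xx estimates the distortion power in z.
    gamma: over-subtraction factor.
    g_min: assumed lower limit that keeps the gain from going negative.
    """
    if p_zz <= 0.0:
        return g_min
    gain = 1.0 - gamma * c * p_xx / p_zz
    return max(gain, g_min)


# Example: distortion estimate 2.0 in a bin with total power 4.0.
example_gain = wiener_gain(p_zz=4.0, p_xx=1.0, c=2.0)
```

Raising gamma suppresses more distortion at the cost of more attenuation of the desired signal, which is the trade-off the text describes.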

In general, a robust algorithm that requires no speech detector is obtained by proposing, as an estimate of the factor C, a new factor C' that is virtually insensitive to the desired speech. This is achieved by concentrating on the stationary parts of the ratio in equation (3). In a practical implementation of this idea, C' is defined as the ratio of the minimum spectral power of the distorted desired signal z to the minimum spectral power of the reference signal x, both minima being determined over one time period. Expressed as a formula:

$$C'(kB, lw_0) = \frac{\min_{k-L \le m \le k} P_{zz}(mB, lw_0)}{\min_{k-L \le m \le k} P_{xx}(mB, lw_0)} \qquad (4)$$

The time period between time frame k−L and time frame k covers L time frames and includes at least one pause in the distorted desired signal. If the desired signal is a speech signal, this is generally a speech pause. The minima determined in this way focus the ratio of equation (4) on the stationary components of the signals z and x, the minima being representative of the stationary components of the distortion or noise. The time period generally lasts at least 4 to 5 seconds. The factor C' given by equation (4) is thus determined from the stationary components of the signals z and x. The ratio is assumed to hold also when non-stationary components, such as speech, are present in these signals, and the operation performed by the spectral processor SP is based on that assumption.
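The effect of taking minima rather than averages can be seen in a small numeric sketch for one frequency bin. The function name, the guard value eps, and the power values are made up for illustration.

```python
def c_prime(p_zz_frames, p_xx_frames, eps: float = 1e-12) -> float:
    """Ratio of the minimum power of z to the minimum power of x over one time period.

    eps is an assumed guard against division by a very small denominator.
    """
    return min(p_zz_frames) / max(min(p_xx_frames), eps)


# Made-up example: stationary noise power 2.0 in z and 1.0 in x,
# with speech bursts added to z in some frames and pauses in others.
noise_z, noise_x = 2.0, 1.0
speech_power = [0.0, 50.0, 80.0, 0.0, 60.0, 0.0]
p_zz = [noise_z + s for s in speech_power]
p_xx = [noise_x for _ in speech_power]

ratio_from_minima = c_prime(p_zz, p_xx)
ratio_from_means = (sum(p_zz) / len(p_zz)) / (sum(p_xx) / len(p_xx))
```

In this example the minima recover the ratio of the stationary noise powers, while the ratio of plain averages is dominated by the speech bursts. That is why an average-based estimate would need a speech detector and the minimum-based one does not.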

The spectra in the numerator and denominator of the factor C' in equation (4) are obtained by smoothing the power spectra in first-order recursions, implemented in blocks LPF1 and LPF2, each with a smoothing constant β. The recursive implementation in these blocks comprises multipliers (X), adders (+), and delay lines (z⁻¹), coupled as shown, to obtain smoothed power spectral density versions of the input signals x and z. The z signal, for example, then obeys the smoothing rule:

$$P_{zz}(kB, lw_0) = \beta \, P_{zz}((k-1)B, lw_0) + (1 - \beta) \, |z(kB, lw_0)|^2 \qquad (5)$$

Here the smoothing constant β takes a value between 0 and 1. The same rule applies to the spectrum of the x signal. The value of β may be controlled in any desired way; it generally corresponds to a time constant of 50 to 200 ms. For every time frame index m, each of these smoothed quantities is stored in a buffer in the form of shift registers SR1 and SR2, respectively. From the L smoothed values held in the register positions, the respective minima are fed to the divider D, which delivers the computed value of C' according to equation (4). Naturally, appropriate measures are taken to prevent division by a small value in the denominator.
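The chain of smoothing filter, shift register, minimum search, and guarded division can be sketched for one frequency bin as follows. Several details are assumptions made for illustration: the recursion is written with β weighting the previous value, the mapping from a time constant to β uses an exponential-decay convention, the guard against small denominators is a simple floor eps, and all names are hypothetical.

```python
import math
from collections import deque


def beta_from_time_constant(tau_s: float, frame_shift_s: float) -> float:
    """Map a time constant to a per-frame smoothing constant (assumed convention)."""
    return math.exp(-frame_shift_s / tau_s)


def smooth_power(prev_power: float, bin_value: complex, beta: float) -> float:
    """First-order recursion: weight beta on the old value, 1 - beta on the new power."""
    return beta * prev_power + (1.0 - beta) * abs(bin_value) ** 2


class MinimumRatioTracker:
    """Keeps the last L smoothed powers of z and x and returns the ratio of their minima."""

    def __init__(self, num_frames: int, eps: float = 1e-12):
        # deque with maxlen behaves like a shift register: the oldest value drops out.
        self.z_register = deque(maxlen=num_frames)
        self.x_register = deque(maxlen=num_frames)
        self.eps = eps

    def update(self, p_zz: float, p_xx: float) -> float:
        self.z_register.append(p_zz)
        self.x_register.append(p_xx)
        # Floor the denominator to avoid division by a very small value.
        return min(self.z_register) / max(min(self.x_register), self.eps)


# Example: 100 ms time constant with an assumed 16 ms frame shift.
beta = beta_from_time_constant(0.1, 0.016)
tracker = MinimumRatioTracker(num_frames=3)
ratios = [tracker.update(pz, px) for pz, px in [(4.0, 2.0), (10.0, 2.0), (6.0, 3.0), (8.0, 4.0)]]
```

One tracker instance would be needed per frequency bin. A linear scan for the minimum over a few hundred values per frame is cheap; a sliding-window minimum structure could replace it if needed.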

A problem may arise when the average level of the desired speech signal is very high relative to the average level of the distortion. The averaged outputs of LPF1 and LPF2 are then dominated by the desired speech, because these averages take a long time to return to the low distortion level after a high speech level has occurred. The estimate of C' may then be affected by the desired speech, which results in unwanted suppression of the desired speech signal. This effect is reduced by applying a multivariable compression function f_c in the recursions, for example as follows:

The same rule applies to the x signal. The compression function is chosen so as to reduce the update step of the recursion when the new input power value is relatively large compared with the values in the filters LPF1 and LPF2. The compression function therefore reduces the influence of high desired speech levels on the average signal power. An example of a suitable compression function is given below:

where δ is a positive constant. The smaller the value of δ, the more slowly the recursive filters LPF1 and LPF2 follow a rise in the signal value. An embodiment including the compression block f_c is shown in FIG. 3. If compression is not needed, it can simply be omitted.
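The explicit formulas for the compressed recursion and for f_c are not reproduced in this text. One form that is consistent with the description is sketched below, strictly as an assumption: f_c takes the update step and the current filter value as its two arguments and limits the step to δ times the current value. The names and the example numbers are hypothetical.

```python
def compress(step: float, level: float, delta: float) -> float:
    """Hypothetical f_c: cap the update step at delta times the current filter value."""
    return min(step, delta * level)


def smooth_power_compressed(prev_power: float, new_power: float, beta: float, delta: float) -> float:
    """First-order recursion with a compressed update step (assumed form).

    prev_power should be initialized to a positive value; with this assumed f_c
    a filter state of exactly zero could never rise.
    """
    step = new_power - prev_power
    return prev_power + (1.0 - beta) * compress(step, prev_power, delta)


# Example: a sudden 100-fold power jump, as at the onset of loud speech.
plain = 1.0 + (1.0 - 0.9) * (101.0 - 1.0)             # uncompressed update
limited = smooth_power_compressed(1.0, 101.0, 0.9, 0.5)
slower = smooth_power_compressed(1.0, 101.0, 0.9, 0.1)
falling = smooth_power_compressed(1.0, 0.5, 0.9, 0.5)  # downward steps pass through
```

With this choice, upward jumps are limited while decreases are followed at the normal rate, so the filter state stays close to the distortion level during loud speech. A smaller δ gives a slower rise, which matches the behaviour described above.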

Although the above has been described with reference to essentially preferred embodiments and best possible modes, it will be understood that these embodiments are not to be construed as limiting examples of the system and method concerned, because various modifications, features, and combinations of features falling within the scope of the appended claims are within the reach of the person skilled in the art.

Claims

An audio enhancement system comprising:

a signal input for a distorted desired signal z,

a reference signal input for a reference signal x, and

a spectral processor coupled to both signal inputs for processing the distorted desired signal z by means of the reference signal x, from which a distortion estimate of the distorted desired signal z is derived, to yield a substantially distortion-free output signal q,

wherein the spectral processor is arranged to process the distorted desired signal z using a factor C',

the distortion estimate of the distorted desired signal z is a function of the product of the factor C' and the spectral power of the reference signal x, and

the factor C' is determined by the spectral ratio of the essentially stationary components of the distorted desired signal z and of the reference signal x.

The audio enhancement system of claim 1,

wherein the factor C' is defined by the ratio of the minimum of the spectral power of the distorted desired signal to the minimum of the spectral power of the reference signal,

both minima being determined over a specific time period.

The audio enhancement system of claim 2,

wherein said time period comprises at least one pause in said distorted desired signal.

The audio enhancement system of claim 3,

wherein said time period lasts at least 4 to 5 seconds.

The audio enhancement system according to any one of claims 1 to 4,

wherein the spectral power is defined by a function of any quantity related to the spectral power, such as spectral amplitude, squared spectral magnitude, power spectral density, or Mel-scale smoothed spectral density.

The audio enhancement system according to any one of claims 1 to 4,

wherein the spectral processor comprises shift registers for storing values of spectral power.

The audio enhancement system according to any one of claims 1 to 4,

wherein the spectral power is a smoothed spectral power.

A communication system provided with an audio enhancement system according to claim 1.

A method of enhancing a distorted desired signal z,

comprising processing the distorted desired signal z by means of a reference signal x, from which a distortion estimate of the distorted desired signal z is derived, to yield a substantially distortion-free output signal q,

wherein said processing step comprises processing said distorted desired signal z using a factor C', and

said factor C' is determined by the spectral ratio of the essentially stationary components of the distorted desired signal z and of said reference signal x.

(Deleted)

A speech recognition system provided with an audio enhancement system according to claim 1.

A voice control system provided with an audio enhancement system according to claim 1.