EP2158588B1 - Spectral smoothing method for noisy signals - Google Patents

Spectral smoothing method for noisy signals

Info

Publication number
EP2158588B1
EP2158588B1 (application EP08784249A)
Authority
EP
European Patent Office
Prior art keywords
short
smoothing method
transformation
smoothing
term
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Not-in-force
Application number
EP08784249A
Other languages
German (de)
French (fr)
Other versions
EP2158588A1 (en)
Inventor
Rainer Martin
Timo Gerkmann
Colin Breithaupt
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sivantos GmbH
Ruhr Universitaet Bochum
Original Assignee
Siemens Audiologische Technik GmbH
Ruhr Universitaet Bochum
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siemens Audiologische Technik GmbH and Ruhr Universitaet Bochum
Publication of EP2158588A1
Application granted
Publication of EP2158588B1
Legal status: Not-in-force
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique

Definitions

  • the invention relates to a smoothing method for suppressing fluctuating artifacts in noise reduction.
  • noise suppression is an important aspect.
  • the audio signals recorded with a microphone and subsequently digitized contain, in addition to the useful signal ( FIG. 1 ), ambient noise that is superimposed on the useful signal ( FIG. 2 ).
  • in hearing aids, the ambient noise is constantly changing, for example traffic noise or people talking in the background, such as in a restaurant.
  • the noise reduction accordingly aims to ease speech understanding. Therefore, reducing the noise should not audibly distort the speech signal.
  • the spectral representation is a favorable representation of the signal.
  • the signal is broken down and displayed by frequency.
  • a practical realization of the spectral representation are short-term spectra which result from a division of the signal into short frames ( FIG. 3 ), which are subjected to a spectral transformation separately from each other ( FIG. 4 ).
  • a transformed frame then consists of M so-called frequency bins.
  • the squared amplitude value of a frequency bin corresponds to the energy that the signal contains in the narrow frequency slice of about 31 Hz bandwidth represented by the respective frequency bin.
  • due to the symmetry properties of the spectral transformation, only M/2 + 1 of the M frequency bins, i.e. 129 bins in the previous example, are relevant for the signal representation. With 129 relevant bins and 31 Hz bandwidth per bin, a spectral band from 0 Hz to about 4000 Hz is covered in total. This is sufficient to describe many speech sounds with sufficient spectral resolution. Another common bandwidth is 8000 Hz, which can be achieved by a higher sampling rate and thus more frequency bins at the same frame duration.
  • the frequency bins are indexed with μ. The index for frames is λ.
  • the amplitudes of the short-term spectrum of a frame λ are generally written here as the spectral quantity G_μ(λ).
  • a common form of presentation of short-term spectra are so-called spectrograms, which are formed by juxtaposing temporally successive short-term spectra (cf., for example FIGS. 6 to 9 ).
  • the advantage of the spectral representation is that the essential speech energy is concentrated in a relatively small number of frequency bins ( FIGS. 4 and 6 ), while in the time signal all digital samples are equally relevant ( FIG. 3 ).
  • the signal energy of the disturbance is in most cases distributed over a larger number of frequency bins. Since the frequency bins contain different amounts of speech energy, it is possible to suppress the noise in those bins that contain little speech energy. The narrower the frequency bins are, the better this separation succeeds.
  • a spectral weighting function is estimated, which can be calculated according to different optimization criteria. It yields low values or zero in frequency bins that contain mainly interference, and values close to or equal to one in bins where speech energy dominates ( FIG. 5 ).
  • the weighting function is generally re-estimated for each signal frame in each frequency bin.
  • the totality of the weighting values of all the frequency bins of a frame is also referred to herein as the "short-term spectrum of the weighting function" or simply as the "weighting function".
  • Multiplication of the weighting function with the short-term spectrum of the noisy signal yields the filtered spectrum in which the amplitudes of the frequency bins in which interference dominates are greatly reduced, while speech components remain almost unaffected ( FIGS. 8 and 9 ).
  • isolated outliers are heard as tonal artifacts (musical noise) when a time signal is synthesized from the filtered short-term spectra ( FIGS. 10 and 11 ).
  • a single tonal artifact has the duration of one signal frame, and its frequency is determined by the frequency bin in which the outlier occurred.
  • these spectral magnitudes can be smoothed by an averaging method and thus freed from excessive values.
  • spectral magnitudes of several spectrally adjacent or temporally successive frequency bins are combined into an average, so that the amplitude of individual outliers is relativized. Smoothing over frequency [1: Tim Fingscheidt, Christophe Beaugeant and Suhadi Suhadi. Overcoming the statistical independence assumption w.r.t. frequency in speech enhancement. Proceedings, IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), 1:1081-1084, 2005], over time [2], or as a combination of temporal and spectral averaging [3] is known.
  • the cepstrum consists in principle of a non-linear mapping, namely the logarithm, applied to a spectral magnitude, and a subsequent transformation of this logarithmized magnitude spectrum.
  • the advantage of a cepstral representation of the amplitudes is that speech is no longer distributed comb-like over frequency ( FIGS. 4 and 6 ); instead, the essential information about the speech signal is represented in the cepstral bins with small index. In addition, substantial speech information is still represented in the relatively easily detectable cepstral bin of higher index, which represents the so-called pitch frequency of the speaker.
  • a smoothed short-term spectrum can be calculated by setting cepstral bins with relatively small magnitudes to zero and then transforming the altered cepstrum back into a short-term spectrum.
  • however, since strong fluctuations or outliers lead to correspondingly high amplitudes in the cepstrum, these artifacts cannot be detected and suppressed by such methods.
  • the object of the invention is to provide a smoothing method for suppressing fluctuations in the weighting function or in spectral intermediate variables or outliers in filtered short-term spectra, which neither reduces the frequency resolution of the short-term spectra nor impairs the temporal dynamics of the speech signal.
  • the smoothing method according to the invention makes use of a transformation such as cepstrum in order to describe a broadband speech signal with as few transformation coefficients as possible in its essential structure.
  • the transformation coefficients are not set to zero independently of one another when they fall below a threshold value. Instead, the values of transformation coefficients from at least two consecutive frames are combined by smoothing over time.
  • the degree of smoothing is made dependent on the extent to which the spectral structure represented by the coefficient is decisive for the description of the useful signal.
  • the degree of temporal smoothing of a coefficient therefore depends, for example, on whether a transformation coefficient contains a lot of speech energy or little. This is easier to determine in cepstrum or similar transformations than in the short-term spectrum.
  • coefficients with a large amount of speech information are smoothed only to the extent that their temporal dynamics do not become lower than those of a noise-free speech signal. If appropriate, these coefficients are not smoothed at all. Speech distortions are thus prevented. Since spectral fluctuations and outliers represent a short-term change in the fine structure of a short-term spectrum, they appear in the transformed short-term spectrum as a short-term change of those transformation coefficients that represent the fine structure of the short-term spectrum.
  • the smoothed quantity is written as G^cepst_{μ',smooth}(λ).
  • transformations differ in the basis functions they use.
  • the process of transformation means that the signal is correlated with the various basis functions.
  • the resulting degree of correlation between the signal and a basis function is then the associated transformation coefficient.
  • orthogonal transformation bases contain only basis functions that are uncorrelated. If the signal is identical to one of the basis functions, an orthogonal transformation produces transformation coefficients with the value zero, except for the one coefficient whose basis function is identical to the signal. The selectivity of an orthogonal transformation is therefore high.
  • Non-orthogonal transforms use function bases that are correlated with each other.
  • another feature is that the basis functions for the application considered here are discrete and finite, since the processed signal frames are discrete signals of the length of one frame.
  • DFT Discrete Fourier Transform
  • FFT Fast Fourier Transform
  • DCT discrete cosine transform
  • DST discrete sine transform
  • the invertibility of the transforms also makes it possible to interchange the transform and its inverse in the back and forth transformations.
  • the use of the DFT from (2) is thus also possible, for example, if the IDFT from (1) is used in (2).
  • the spectral coefficients of the short-term spectra are advantageously mapped non-linearly before the forward transformation.
  • a principal property of the non-linear mapping that is advantageous for the invention is a dynamic compression of relatively large amplitudes and a dynamic expansion of relatively small amplitudes.
  • the spectral coefficients of the smoothed short-term spectra can be mapped non-linearly after the inverse transformation, the non-linear mapping after the inverse transformation being the inverse of the non-linear mapping before the forward transformation.
  • the spectral coefficients are mapped non-linearly by taking the logarithm before the forward transformation.
  • the smoothing method is applied to the magnitude or a power of the magnitude of the short-term spectra.
  • the time constants can be chosen such that the transformation coefficients that primarily represent speech are smoothed only weakly.
  • the transformation coefficients that mainly describe fluctuating background noise and artifacts of noise reduction algorithms, in contrast, are smoothed strongly.
  • the spectral weighting function of a noise reduction algorithm can be provided as the short-term spectrum.
  • the spectral weighting function of a postfilter for multi-channel noise reduction methods can also be used as the short-term spectrum.
  • the spectral weighting function results here from the minimization of an error criterion.
  • a filtered short-term spectrum can also be provided.
  • a spectral weighting function of a multi-channel method for noise reduction is provided as the short-term spectrum.
  • an estimated coherence or an estimated magnitude squared coherence can also be provided between at least two microphone channels.
  • a spectral weighting function of a multi-channel method for speaker or source separation is provided as the short-term spectrum.
  • a spectral weighting function of a multi-channel method for speaker separation on the basis of phase differences of signals in the different channels is provided as a short-term spectrum.
  • GCC generalized cross-correlation
  • spectral magnitudes containing both speech and noise components can also be provided.
  • an estimate of the signal-to-noise ratio in the individual frequency bins can also be provided as the short-term spectrum. Further, as the short-term spectrum, an estimate of the noise power can be used.
  • the line of an image is interpreted as a signal frame that can be transformed into the spectral range.
  • the resulting frequency bins are called spatial frequency bins here.
  • algorithms equivalent to those used in audio signal processing are used. Fluctuations that these algorithms generate in the spatial frequency range result in optical artifacts in the processed image. These are equivalent to tonal artifacts in audio processing.
  • signals that can be noisy, such as acoustic signals, are derived from the human body.
  • the noisy signal can be transformed into the spectral range frame by frame.
  • the resulting spectrograms can be processed like audio spectra.
  • the smoothing method can be used in a telecommunications network and / or in broadcasting to improve speech and / or picture quality and to suppress artifacts.
  • in speech coding (redundancy-reducing speech compression), artifacts arise on the one hand from the associated quantization noise and on the other hand from interference caused by the transmission channel.
  • the latter fluctuate strongly in time and frequency and lead to a noticeable deterioration of the speech quality.
  • signal processing used on the receiver side or in the network can ensure that these quasi-random artifacts are reduced.
  • so-called postfilters and error concealment methods have hitherto been used for this purpose.
  • the smoothing method can thus be used as a postfilter, in a postfilter, in combination with a postfilter, as part of an error concealment method or in connection with a method for speech and / or picture coding (decompression method or decoding method), in particular on the receiver side.
  • by use as a postfilter it is meant that the method is used for post-filtering, that is to say that the data produced in the applications are processed with an algorithm implementing the method.
  • FIG. 1 shows a clean (noise-free) signal in the form of amplitude over time.
  • the duration of the signal is 4 seconds, the amplitudes range from about -0.18 to about 0.18.
  • in FIG. 2 the signal is shown in noisy form. A random background noise is recognizable over the entire time course.
  • in FIG. 3 the signal of a single signal frame λ is shown.
  • the signal frame has a segment duration of 32 milliseconds.
  • the amplitude of both graphs ranges between -0.1 and 0.1.
  • the individual samples of the digital signals are connected to graphs.
  • the noisy graph represents the noisy input signal, which contains the speech signal. A separation of speech and noise is hardly possible in this representation of the signal.
  • FIG. 4 is a representation of the same signal frame after the transformation into the frequency domain.
  • the individual frequency bins μ are connected into graphs.
  • the frequency bins of both the clean and the noisy frame are shown; the noisy one again contains the speech signal.
  • the abscissa shows the frequency bins μ from 0 to 128. The amplitudes range from about -40 decibels (dB) to about 10 dB. A comparison of the graphs shows that the energy of the speech signal is concentrated in a comb-like structure in some frequency bins, while the noise is also present in the intervening bins.
  • in FIG. 5 a weighting function for the noisy frame of FIG. 4 is shown.
  • a factor between 0 and 1 results depending on the ratio of speech and noise energy.
  • the individual weighting factors are connected to a graph. One recognizes the comb-like structure of the speech spectrum again.
  • FIGS. 6 and 7 show spectrograms formed from a series of clean and noisy short-term spectra, respectively (cf. FIG. 4 ).
  • on the abscissa the frame index λ is plotted, on the ordinate the frequency bin index μ.
  • the amplitudes of the individual frequency bins are shown as gray values.
  • from FIGS. 6 and 7 it becomes clear how speech is concentrated in a few frequency bins and also forms regular structures there. The noise, however, is distributed over all frequency bins.
  • in FIG. 8 the spectrogram of a filtered signal is shown.
  • the axes correspond to those of FIGS. 6 and 7 . A comparison with FIG. 6 shows that estimation errors in the weighting function leave high amplitudes in frequency bins that contain no speech. Suppressing these outliers is the aim of the method according to the invention.
  • FIG. 9 shows the spectrogram of a signal that has been filtered with a smoothed weighting function according to a preferred embodiment of the method according to the invention.
  • the axes correspond to those of the previous spectrograms.
  • the outliers are greatly reduced.
  • the speech components in the spectrogram are preserved in their essential form.
  • FIGS. 10 and 11 show the time signals that result from the filtered spectra of FIGS. 8 and 9 , respectively. Plotted is the amplitude over time. The signals are 4 seconds long and have amplitudes between about -0.18 and 0.18.
  • the outliers in the spectrogram of FIG. 8 produce clearly perceptible tonal artifacts in the associated time signal in FIG. 10 , which are not present in the clean signal of FIG. 1 .
  • the time signal in FIG. 11 has a much quieter course of residual noise. This time signal results from the spectrogram of FIG. 9 that was generated by filtering with the smoothed weighting function.
  • in FIG. 12 the unsmoothed weighting function is shown for all frames. Frequency bins μ are plotted along the ordinate for each frame λ. The values of the weighting function are shown as gray tones. The fluctuations resulting from estimation errors are recognizable as irregular patches.
  • FIG. 13 the smoothed weighting function is shown for all frames.
  • the axes correspond to those from FIG. 12 . Due to the smoothing, the fluctuations are smeared and greatly reduced in value. The structure of the voice frequency bins, however, remains clearly recognizable.
  • FIG. 14 shows the magnitude of the cepstrum of a clean signal over all frames.
  • the cepstral bins μ' are plotted along the ordinate.
  • the values of the magnitudes of the cepstral coefficients G^cepst_μ'(λ) are shown as shades of gray.
  • a comparison with FIG. 6 shows that, in the cepstrum, speech is concentrated on an even smaller number of coefficients. In addition, these coefficients vary less in their position.
  • Clearly recognizable is the course of the cepstral coefficient, which represents the pitch frequency.
  • in FIG. 15 a signal flow graph according to a preferred embodiment of the invention is shown.
  • a noisy input signal is transformed into a sequence of short-term spectra, and a weighting function for filtering is then estimated via spectral intermediate quantities. One frame at a time is processed.
  • the short-term spectra of the weighting function are subjected to a non-linear, logarithmic mapping. This is followed by a transformation into the cepstral area.
  • the short-term spectra transformed in this way are then represented by the transformation coefficients of the basis functions.
  • the transformation coefficients calculated in this way are smoothed separately using different time constants.
  • the recursive nature of the smoothing is indicated by the return of the output of the smoothing to its input.


Abstract

A smoothing method for suppressing fluctuating artifacts in the reduction of interference noise includes the following steps: providing short-term spectra for a sequence of signal frames, transforming each short-term spectrum by way of a forward transformation which describes the short-term spectrum using transformation coefficients that represent the short-term spectrum subdivided into its coarse and fine structures; smoothing the transformation coefficients with the respective same coefficient indices by combining at least two successive transformed short-term spectra; and transforming the smoothed transformation coefficients into smoothed short-term spectra by way of a backward transformation.

Description

The invention relates to a smoothing method for suppressing fluctuating artifacts in noise reduction.

In digital speech signal transmission, noise suppression is an important aspect. The audio signals picked up by a microphone and subsequently digitized contain, in addition to the useful signal (FIG. 1), ambient noise that is superimposed on the useful signal (FIG. 2). In hands-free systems in vehicles, for example, engine and wind noise are captured along with the speech signals; in hearing aids it is constantly changing ambient noise such as traffic noise or people talking in the background, for instance in a restaurant. As a result, the speech signal can be understood only with increased effort. Noise reduction accordingly aims to ease speech understanding. Reducing the noise must therefore not audibly distort the speech signal.

For noise reduction, the spectral representation is a favorable representation of the signal. Here, the signal is broken down and displayed by frequency. A practical realization of the spectral representation are short-term spectra, which result from dividing the signal into short frames (FIG. 3) that are subjected to a spectral transformation separately from one another (FIG. 4). At a sampling rate of f_s = 8000 Hz, a signal frame may comprise, for example, M = 256 consecutive digital samples, which then corresponds to a duration of 32 ms. A transformed frame then consists of M so-called frequency bins. The squared amplitude value of a frequency bin corresponds to the energy that the signal contains in the narrow frequency slice of about 31 Hz bandwidth represented by the respective frequency bin. Due to the symmetry properties of the spectral transformation, only M/2 + 1 of the M frequency bins, i.e. 129 bins in the previous example, are relevant for the signal representation. With 129 relevant bins and 31 Hz bandwidth per bin, a spectral band from 0 Hz to about 4000 Hz is covered in total. This is sufficient to describe many speech sounds with sufficient spectral resolution. Another common bandwidth is 8000 Hz, which can be achieved by a higher sampling rate and thus more frequency bins at the same frame duration. In a short-term spectrum, the frequency bins are indexed with μ. The frame index is λ. The amplitudes of the short-term spectrum of a frame λ are generally written here as the spectral quantity G_μ(λ). A complete short-term spectrum consisting of the M frequency bins of a frame is given by the amplitudes G_μ(λ) for indices μ = 0 to μ = M-1, i.e. μ = 0 ... M-1. For real-valued time signals, short-term spectra satisfy the symmetry condition G_μ(λ) = G_{M-μ}(λ). A common form of presenting short-term spectra are so-called spectrograms, which are formed by juxtaposing temporally successive short-term spectra (cf., for example, FIGS. 6 to 9).
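As an illustration of this framing and transformation step, a minimal numerical sketch follows, using the example values f_s = 8000 Hz and M = 256 from the text. The Hann window, the 50 % frame overlap and the function name are assumptions made only for this sketch; the text does not prescribe a particular window or frame advance.

```python
import numpy as np

fs = 8000        # sampling rate in Hz (example value from the text)
M = 256          # frame length in samples, i.e. 32 ms per frame
hop = M // 2     # frame advance; the 50 % overlap is an assumption

def short_term_spectra(x):
    """Split the signal x into frames and transform each frame separately.

    Returns an array of shape (num_frames, M//2 + 1) with the amplitudes
    |G_mu(lambda)|; only M/2 + 1 = 129 bins are kept because of the conjugate
    symmetry of the DFT of a real-valued frame.
    """
    window = np.hanning(M)                    # analysis window (assumption)
    num_frames = 1 + (len(x) - M) // hop
    spectra = np.empty((num_frames, M // 2 + 1))
    for lam in range(num_frames):             # lam is the frame index lambda
        frame = x[lam * hop: lam * hop + M] * window
        spectra[lam] = np.abs(np.fft.rfft(frame))
    return spectra

# Each bin covers fs / M = 31.25 Hz, so bins 0 ... 128 span 0 Hz to about 4000 Hz.
```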

The advantage of the spectral representation is that the essential speech energy is concentrated in a relatively small number of frequency bins (FIGS. 4 and 6), while in the time signal all digital samples are equally relevant (FIG. 3). The signal energy of the disturbance is in most cases distributed over a larger number of frequency bins. Since the frequency bins contain different amounts of speech energy, it is possible to suppress the noise in those bins that contain little speech energy. The narrower the frequency bins are, the better this separation succeeds.

For noise reduction, a spectral weighting function is estimated, which can be calculated according to different optimization criteria. It yields low values or zero in frequency bins that contain mainly interference, and values close to or equal to one for bins in which speech energy dominates (FIG. 5). The weighting function is generally re-estimated for each signal frame and each frequency bin. The totality of the weighting values of all frequency bins of a frame is also referred to here as the "short-term spectrum of the weighting function" or simply as the "weighting function".

Multiplying the weighting function by the short-term spectrum of the noisy signal yields the filtered spectrum, in which the amplitudes of the frequency bins dominated by interference are greatly reduced, while speech components remain almost unaffected (FIGS. 8 and 9).
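The text does not fix a particular estimator for the weighting function; the sketch below therefore uses a simple Wiener-type gain rule, named and parameterized here purely as an assumption, only to show how a per-bin weight between zero and one is multiplied onto the noisy short-term spectrum.

```python
import numpy as np

def wiener_gain(noisy_power, noise_power, floor=0.05):
    """Per-bin weighting function between 0 and 1.

    The Wiener rule and the spectral floor are assumptions for this sketch;
    the smoothing method works with any spectral weighting function.
    """
    snr = np.maximum(noisy_power / np.maximum(noise_power, 1e-12) - 1.0, 0.0)
    return np.maximum(snr / (snr + 1.0), floor)

def filter_frame(noisy_spectrum, noise_power):
    """Multiply the weighting function onto the noisy short-term spectrum."""
    gain = wiener_gain(np.abs(noisy_spectrum) ** 2, noise_power)
    return gain * noisy_spectrum
```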

Estimation errors in the calculation of the spectral weighting function, so-called fluctuations, occasionally result in excessively high weighting values for frequency bins that mainly contain interference (FIG. 8). This happens independently of spectrally adjacent or temporally preceding values. Fluctuations also occur already in spectral intermediate quantities such as the estimate of the signal-to-noise ratio (SNR). After multiplying the estimation-error-prone weighting function with the noisy short-term spectrum, the filtered spectrum contains individual frequency bins that mainly contain interference and nevertheless have relatively high amplitudes. These bins are called outliers. When a time signal is synthesized from the filtered short-term spectra, the isolated outliers are heard as tonal artifacts (musical noise), which are perceived as particularly disturbing because of their tonality (FIGS. 10 and 11). A single tonal artifact has the duration of one signal frame, and its frequency is determined by the frequency bin in which the outlier occurred.

To suppress fluctuations in the weighting function or in spectral intermediate quantities, or to suppress outliers in the filtered spectrum, these spectral quantities can be smoothed by an averaging method and thus freed from excessive values. Spectral quantities of several spectrally adjacent or temporally successive frequency bins are combined into an average, so that the amplitude of individual outliers is relativized. Smoothing is known over frequency [1: Tim Fingscheidt, Christophe Beaugeant and Suhadi Suhadi. Overcoming the statistical independence assumption w.r.t. frequency in speech enhancement. Proceedings, IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), 1:1081-1084, 2005], over time [2: Harald Gustafsson, Sven Erik Nordholm and Ingvar Claesson. Spectral subtraction using reduced delay convolution and adaptive averaging. IEEE Transactions on Speech and Audio Processing, 9(8):799-807, November 2001], and as a combination of temporal and spectral averaging [3: Zenton Goh, Kah-Chye Tan and B.T.G. Tan. Postprocessing method for suppressing musical noise generated by spectral subtraction. IEEE Transactions on Speech and Audio Processing, 6(3):287-292, May 1998]. The disadvantage of smoothing over frequency is that combining several frequency bins reduces the spectral resolution, making the distinction between speech bins and noise bins more difficult. Temporal smoothing by combining successive values of a bin reduces the temporal dynamics of the spectral values, i.e. their ability to follow rapid temporal changes of the speech. Distortion of the speech signal is the consequence (clipping). In addition, an irritating residual noise correlated with the speech signal can become audible (noise shaping). These smoothing methods in the spectral domain therefore generally have to be adapted to the speech signal at considerable expense.

Another known form of smoothing individual short-term spectra over frequency is a method known as "liftering" [4: Andrzej Czyzewski. Multitask noisy speech enhancement system. http://sound.eti.pg.gda.pl/denoise/main.html, 2004], [5: Francois Thibault. High-level control of singing voice timbre transformations. http://www.music.mcgill.ca/thibault/Thesis/node43.html, 2004]. Here, the short-term spectrum of a frame λ is first transformed into the so-called cepstral domain. The cepstral representation of the spectral amplitudes G_μ(λ) is calculated as

G^{\mathrm{cepst}}_{\mu'}(\lambda) = \mathrm{IDFT}\{\log G_{\mu}(\lambda)\}, \qquad \mu' = 0 \ldots M-1, \; \mu = 0 \ldots M-1     (1)

with IDFT{·} the inverse discrete Fourier transform (DFT) of a sequence of values of length M. This transformation results in M transformation coefficients G^cepst_μ'(λ), the so-called cepstral bins with index μ'. According to equation (1), the cepstrum consists in principle of a non-linear mapping, namely the logarithm, applied to a spectral magnitude, and a subsequent transformation of this logarithmized magnitude spectrum. The advantage of a cepstral representation of the amplitudes (FIG. 14) is that speech is no longer distributed comb-like over frequency (FIGS. 4 and 6); instead, the essential information about the speech signal is represented in the cepstral bins with small index. In addition, substantial speech information is still represented in the relatively easily detectable cepstral bin of higher index, which represents the so-called pitch frequency (fundamental frequency) of the speaker.
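Equation (1) translates directly into a few lines of code. The sketch below computes the cepstral coefficients of one frame from its spectral amplitudes; the helper that mirrors the M/2 + 1 relevant bins back to a full length-M spectrum uses the symmetry condition G_μ(λ) = G_{M-μ}(λ) stated above, and the small constant eps is an implementation detail of the sketch, not part of equation (1).

```python
import numpy as np

def full_spectrum(G_half):
    """Mirror the M/2 + 1 relevant bins to a full length-M spectrum,
    using the symmetry G_mu = G_{M-mu} of real-valued time signals."""
    return np.concatenate([G_half, G_half[-2:0:-1]])

def cepstrum(G, eps=1e-12):
    """Cepstral representation per Eq. (1):
    G_cepst[mu'] = IDFT{ log G[mu] },  mu', mu = 0 ... M-1.
    G is the full, symmetric amplitude spectrum of one frame."""
    return np.real(np.fft.ifft(np.log(np.maximum(G, eps))))
```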

A smoothed short-term spectrum can be calculated by setting cepstral bins with relatively small magnitudes to zero and then transforming the altered cepstrum back into a short-term spectrum. However, since strong fluctuations or outliers lead to correspondingly high amplitudes in the cepstrum, these artifacts cannot be detected and suppressed by such methods.

As an alternative to liftering, there is also the method according to [6: Petre Stoica and Niclas Sandgren. Smoothed nonparametric spectral estimation via cepstrum thresholding. IEEE Signal Processing Magazine, pages 34-45, November 2006]. Here, cepstral bins selected according to a criterion are not set to zero but to a value that is optimal for estimating long-term spectra of stationary signals from short-term spectra. This form of signal spectrum estimation generally brings no advantages for strongly non-stationary signals such as speech.

The document US 5,365,592 discloses a further method similar to liftering.

Proceeding from this, the object of the invention is to provide, for noise reduction, a smoothing method for suppressing fluctuations in the weighting function or in spectral intermediate quantities, or outliers in filtered short-term spectra, which neither reduces the frequency resolution of the short-term spectra nor impairs the temporal dynamics of the speech signal.

This object is achieved by a smoothing method having the features of claim 1. Advantageous developments are the subject of the dependent claims.

The smoothing method according to the invention comprises the following steps:

  • providing short-term spectra of signal frames, each comprising a sequence of digital samples,
  • transforming each short-term spectrum by a forward transformation that describes the short-term spectrum by transformation coefficients which represent the short-term spectrum subdivided into its coarse and its fine structures,
  • smoothing the transformation coefficients of equal coefficient index by combining at least two successive transformed short-term spectra, and
  • transforming the smoothed transformation coefficients into smoothed short-term spectra by a backward transformation.

The smoothing method according to the invention makes use of a transformation such as the cepstrum in order to describe a broadband speech signal in its essential structure with as few transformation coefficients as possible. Unlike in known methods, however, the transformation coefficients are not set to zero independently of one another when they fall below a threshold value. Instead, the values of transformation coefficients from at least two successive frames are combined by smoothing over time. The degree of smoothing is made dependent on the extent to which the spectral structure represented by the coefficient is decisive for the description of the useful signal. The degree of temporal smoothing of a coefficient therefore depends, for example, on whether a transformation coefficient contains much or little speech energy. This is easier to determine in the cepstrum or similar transformations than in the short-term spectrum. For example, it can be assumed that the first four cepstral coefficients with indices μ' = 0 ... 3, and additionally the coefficient with maximum magnitude and index μ' greater than 16 and smaller than 160 at f_s = 8000 Hz (pitch), represent speech. Coefficients with a large amount of speech information are smoothed only to the extent that their temporal dynamics do not become lower than those of a noise-free speech signal. If appropriate, these coefficients are not smoothed at all. Speech distortions are thus prevented. Since spectral fluctuations and outliers represent a short-term change in the fine structure of a short-term spectrum, they appear in the transformed short-term spectrum as a short-term change of those transformation coefficients that represent the fine structure of the short-term spectrum. Since these transformation coefficients have a relatively low rate of temporal change for noise-free speech, precisely these coefficients can be smoothed more strongly. Increased temporal smoothing thus counteracts the formation of outliers without affecting the structure of the speech. The smoothing method therefore does not result in a reduced spectral resolution for speech sounds. The change of the fine structure of the short-term spectrum over successive frames is delayed in such a way that only narrowband spectral changes with time constants smaller than those of noise-free speech are suppressed.
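A minimal sketch of this coefficient-dependent temporal smoothing follows. The first-order recursive averaging and the choice of speech-dominated bins (indices μ' = 0 ... 3 plus the pitch bin searched between indices 16 and 160 at f_s = 8000 Hz) follow the text above; the two smoothing constants and the function name are assumptions, chosen only to contrast weak and strong smoothing.

```python
import numpy as np

ALPHA_SPEECH = 0.2   # weak smoothing for speech-dominated coefficients (assumed value)
ALPHA_NOISE  = 0.8   # strong smoothing for fine-structure coefficients (assumed value)

def smooth_cepstrum(c, c_prev_smoothed):
    """Recursive temporal smoothing of the cepstral coefficients of one frame.

    c               : cepstral coefficients G_cepst[mu'](lambda) of the current frame
    c_prev_smoothed : smoothed coefficients of the previous frame (None for the first frame)
    """
    if c_prev_smoothed is None:
        return c.copy()
    M = len(c)
    alpha = np.full(M, ALPHA_NOISE)
    alpha[:4] = ALPHA_SPEECH                          # coarse structure, mu' = 0 ... 3
    pitch_bin = 17 + np.argmax(np.abs(c[17:160]))     # pitch coefficient, 16 < mu' < 160
    alpha[pitch_bin] = ALPHA_SPEECH
    alpha[M - pitch_bin] = ALPHA_SPEECH               # its symmetric counterpart
    # first-order recursive averaging, applied per coefficient index
    return alpha * c_prev_smoothed + (1.0 - alpha) * c
```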

From the smoothed quantity, written as G^cepst_{μ',smooth}(λ), a spectral representation of the smoothed short-term spectrum can be recovered by a backward transformation. For a cepstral representation as described under (1), one possible backward transformation is

G_{\mu,\mathrm{smooth}}(\lambda) = \exp\big(\mathrm{DFT}\{G^{\mathrm{cepst}}_{\mu',\mathrm{smooth}}(\lambda)\}\big), \qquad \mu = 0 \ldots M-1, \; \mu' = 0 \ldots M-1     (2)

with DFT{·} the discrete Fourier transform and exp(·) the exponential function, which in (2) is applied element by element.
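Equation (2) is just as direct to sketch; the exponential undoes the logarithm of equation (1) and is applied element by element. A complete smoothing pass over a sequence of frames then chains the cepstrum of equation (1), the coefficient-wise temporal smoothing, and this backward transformation, frame by frame.

```python
import numpy as np

def inverse_cepstrum(c_smooth):
    """Backward transformation per Eq. (2):
    G_smooth[mu] = exp( DFT{ G_cepst_smooth[mu'] } ), applied element-wise.
    Returns the smoothed spectral amplitudes of one frame."""
    return np.exp(np.real(np.fft.fft(c_smooth)))
```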

The advantages resulting from the smoothing of short-term spectra according to the invention are:

  • effective suppression of fluctuations and outliers,
  • preservation of the spectral resolution for speech signals, and
  • no audible impairment of the speech.

It is important to note that the inverse DFT used for the cepstrum in (1) and the DFT used for the backward transformation in (2) can be replaced by other transformations without losing the principal properties of the transformation coefficients with respect to the compact representation of speech. The same applies to the logarithm in (1) and the corresponding inverse function in (2), the exponential function. Here, too, other non-linear mappings and also linear mappings are conceivable.

Transformations differ in the basis functions they use. The process of transformation means that the signal is correlated with the various basis functions. The resulting degree of correlation between the signal and a basis function is then the associated transformation coefficient. A transformation produces as many transformation coefficients as there are basis functions; their number is denoted here by M. Transformations that are important for the invention are those whose basis functions break down the short-term spectrum to be transformed into its coarse structure and its fine structure.

One distinguishing feature of transformations is orthogonality. Orthogonal transformation bases contain only basis functions that are uncorrelated. If the signal is identical to one of the basis functions, an orthogonal transformation produces transformation coefficients with the value zero, except for the one coefficient whose basis function is identical to the signal. The selectivity of an orthogonal transformation is accordingly high. Non-orthogonal transformations use function bases that are correlated with one another.

Another feature is that the basis functions for the application considered here are discrete and finite, since the processed signal frames are discrete signals of the length of one frame.

An important feature of a transformation is its invertibility. If an inverse transformation exists for a transformation (forward transformation), then transforming a signal into transformation coefficients and subsequently applying the inverse transformation (backward transformation) to these coefficients yields the original signal again, provided the transformation coefficients were not changed.

In signal processing as described here, the discrete Fourier transform (DFT) is a preferred transformation. An important associated algorithm in discrete signal processing is the fast Fourier transform (FFT). The discrete cosine transform (DCT) and the discrete sine transform (DST) are also frequently used transformations. These transformations are summarized here under the term "standard transformations". A property of the standard transformations that has already been mentioned and is decisive for the invention is that the amplitudes of the different transformation coefficients represent different degrees of fine structure of the transformed signal. Coefficients with small indices describe the coarse structures of the transformed signal, because the associated basis functions are low-frequency harmonic functions. The higher the index of a transformation coefficient, up to µ' = M/2, the finer the structures of the transformed signal that are described by this coefficient. For coefficients beyond that, this property reverses because of the symmetry of the coefficients. As a rule, only the coefficients with indices µ' = 0 to µ' = M/2 are processed in signal processing, and the remaining values are obtained by mirroring the results.
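
Purely by way of illustration (an assumed numpy sketch, not taken from the patent), the cepstral coefficients of a single short-term magnitude spectrum can be computed as a logarithm followed by an inverse DFT, and the division into coarse and fine structure as well as the symmetry around µ' = M/2 can then be observed directly:

```python
# Sketch: cepstral coefficients of one short-term spectrum.
import numpy as np

M = 256
n = np.arange(M)
# Synthetic voiced-like magnitude spectrum: smooth envelope plus harmonic ripple
envelope = 1.0 + 0.5 * np.cos(2 * np.pi * n / M)          # coarse structure
ripple = 1.0 + 0.3 * np.cos(2 * np.pi * 20 * n / M)       # fine structure
spectrum = envelope * ripple + 1e-3

# Forward transformation into the cepstral domain (log, then inverse DFT)
cepstrum = np.fft.ifft(np.log(spectrum))

# Low indices (mu' = 0..3) carry the coarse spectral envelope, higher indices
# up to mu' = M/2 carry progressively finer structure (peak near mu' = 20 here).
print(np.abs(cepstrum[:4]))
print(np.abs(cepstrum[18:22]))

# Symmetry: coefficient M - mu' mirrors coefficient mu', so only the indices
# mu' = 0 .. M/2 need to be processed.
assert np.allclose(cepstrum[1:M // 2], np.conj(cepstrum[-1:M // 2:-1]))
```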

The invertibility of the transformations also makes it possible to interchange the transformation and its inverse between the forward and backward transformations. For example, the DFT from (2) may also be used in (1), provided that the IDFT from (1) is then used in (2).

Advantageously, the spectral coefficients of the short-term spectra are mapped non-linearly before the forward transformation. The essential property of the non-linear mapping that is advantageous for the invention is a dynamic compression of relatively large amplitudes and a dynamic expansion of relatively small amplitudes.

Correspondingly, the spectral coefficients of the smoothed short-term spectra can be mapped non-linearly after the backward transformation, the non-linear mapping after the backward transformation being the inverse of the non-linear mapping before the forward transformation.

Expediently, the spectral coefficients are mapped non-linearly before the forward transformation by taking the logarithm.

A form of temporal smoothing can be achieved by a recursive system, preferably of first order:

G^cepst_µ',smooth(λ) = β_µ' · G^cepst_µ',smooth(λ − 1) + (1 − β_µ') · G^cepst_µ'(λ).

Possible values of the smoothing constants for the coefficients of the standard transformations in the case of speech signals are βµ' = 0 for µ' = 0 ... 3, βµ' = 0.8 for µ' = 4 ... M/2 with the exception of the transformation coefficients that represent the pitch frequency of a speaker, and βµ' = 0.4 for the transformation coefficients representing the pitch frequency. Numerous methods for determining the pitch coefficient are available in the literature. For example, the pitch coefficient can be chosen as the coefficient whose index lies between µ' = 16 and µ' = 160 and which has the maximum amplitude of all coefficients in this index range. For the remaining transformation coefficients with indices µ' = M/2 + 1 ... M − 1, the symmetry condition βM−µ' = βµ' applies. These values are suitable for the standard transformations and for short-term spectra derived from signals with fs = 8000 Hz; they can be adapted to other systems by proportional conversion. The choice βµ' = 0 means that the coefficients concerned are not smoothed. It is a decisive property of the invention that coefficients describing the coarse course of the short-term spectrum are smoothed as little as possible when speech signals are denoised. In this way the coarse structures of the broadband speech spectrum are protected from smoothing effects. With the standard transformations, the fine structures of fluctuations and spectral outliers map onto the transformation coefficients between µ' = 4 and µ' = M/2, which is why these coefficients, apart from the pitch of the speech, are smoothed strongly.
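
A minimal sketch of this recursive, coefficient-wise smoothing is given below, assuming numpy, the smoothing constants stated above, and a simple maximum search for the pitch coefficient between µ' = 16 and µ' = 160; the helper names are chosen freely for illustration:

```python
import numpy as np

def smoothing_constants(M, pitch_index):
    """Per-coefficient smoothing constants beta_mu' for one frame (M even)."""
    beta = np.empty(M)
    beta[0:4] = 0.0                      # coarse structure: not smoothed
    beta[4:M // 2 + 1] = 0.8             # fine structure: smoothed strongly
    beta[pitch_index] = 0.4              # pitch coefficient: smoothed less
    beta[M // 2 + 1:] = beta[1:M // 2][::-1]   # symmetry beta_(M-mu') = beta_mu'
    return beta

def smooth_frame(G_cepst, G_prev):
    """One recursive smoothing step over all cepstral coefficients of a frame."""
    M = len(G_cepst)
    lo, hi = 16, min(160, M // 2)        # search range for the pitch coefficient
    pitch_index = lo + int(np.argmax(np.abs(G_cepst[lo:hi + 1])))
    beta = smoothing_constants(M, pitch_index)
    return beta * G_prev + (1.0 - beta) * G_cepst

# Example: smooth a sequence of cepstral frames recursively, the first frame
# serving as the initial state of the recursion.
frames = [np.fft.ifft(np.log(np.abs(np.random.randn(256)) + 1e-3)) for _ in range(5)]
G_prev = frames[0]
for G in frames[1:]:
    G_prev = smooth_frame(G, G_prev)
```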

Advantageously, the smoothing method is applied to the magnitude or to a power of the magnitude of the short-term spectra.

It is particularly advantageous if different time constants are used to smooth the respective transformation coefficients. The time constants can be chosen such that the transformation coefficients that primarily represent speech are smoothed only slightly. Expediently, the transformation coefficients that mainly describe fluctuating background noise and artifacts of the noise reduction algorithms can be smoothed strongly.

The spectral weighting function of a noise reduction algorithm can be provided as the short-term spectrum. Advantageously, the spectral weighting function of a postfilter for multi-channel noise reduction methods can also be used as the short-term spectrum. Expediently, the spectral weighting function here results from the minimization of an error criterion.

A filtered short-term spectrum can also be provided as the short-term spectrum.

According to another development of the method, a spectral weighting function of a multi-channel noise reduction method is provided as the short-term spectrum.

An estimated coherence or an estimated "Magnitude Squared Coherence" between at least two microphone channels can also be provided as the short-term spectrum.

Advantageously, a spectral weighting function of a multi-channel method for speaker or source separation is provided as the short-term spectrum.

It is further provided that a spectral weighting function of a multi-channel method for speaker separation based on phase differences of the signals in the different channels (Phase Transform - PHAT) is provided as the short-term spectrum.

It is also possible to use a spectral weighting function of a multi-channel method based on a "Generalized Cross-Correlation" (GCC) as the short-term spectrum.

Spectral quantities that contain both speech and noise components can also be provided as the short-term spectrum.

Thus, an estimate of the signal-to-noise ratio in the individual frequency bins can also be provided as the short-term spectrum. Furthermore, an estimate of the noise power can be used as the short-term spectrum.

The problem of fluctuations in short-term spectra is known not only from audio signal processing. Further advantageous fields of application are image processing and medical signal processing.

In image processing, for example, a row of an image can be interpreted as a signal frame that can be transformed into the spectral domain. The resulting frequency bins are referred to here as spatial frequency bins. When images are processed in the spatial frequency domain, algorithms equivalent to those of audio signal processing are used. Fluctuations that these algorithms may produce in the spatial frequency domain result in optical artifacts in the processed image. These are equivalent to the tonal noise in audio processing.
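
Purely as an illustrative assumption of how this analogy could be realized with the standard transformations, each image row can be treated as one frame and consecutive rows can be smoothed recursively; a single constant β is used here for brevity instead of the coefficient-dependent constants discussed above:

```python
import numpy as np

def smooth_rows(image, beta=0.8):
    """Recursively smooth the cepstral representation of consecutive image rows."""
    smoothed_spectra = []
    G_prev = None
    for row in image.astype(float):
        spectrum = np.abs(np.fft.fft(row)) + 1e-12      # spatial frequency bins
        G_cepst = np.fft.ifft(np.log(spectrum))         # forward transformation
        G_prev = G_cepst if G_prev is None else beta * G_prev + (1 - beta) * G_cepst
        smoothed_spectra.append(np.exp(np.fft.fft(G_prev).real))  # backward transf.
    return np.array(smoothed_spectra)

out = smooth_rows(np.random.rand(64, 128))    # e.g. 64 rows of 128 pixels each
```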

In medical signal processing, signals are derived from the human body that, like acoustic signals, can be noisy. The noisy signal can accordingly be transformed into the spectral domain frame by frame. The resulting spectrograms can be processed like audio spectra.

The smoothing method can be used in a telecommunications network and/or in broadcast transmission to improve speech and/or image quality and to suppress artifacts. In mobile voice communication, distortions of the speech signal occur that are caused, on the one hand, by the speech coding methods used (redundancy-reducing speech compression) and the associated quantization noise and, on the other hand, by the disturbances introduced by the transmission channel. The latter in turn fluctuate strongly in time and frequency and lead to a clearly perceptible degradation of speech quality. Here, too, the signal processing used at the receiver or in the network must ensure that the quasi-random artifacts are reduced. So far, so-called postfilters and error concealment methods have been used for quality improvement. While the postfilter mainly has the task of reducing quantization noise, error concealment methods are used to suppress transmission-related channel disturbances. In both applications, improvements can be achieved if the smoothing method according to the invention is integrated into the postfilter or the concealment method. The smoothing method can thus be used as a postfilter, in a postfilter, in combination with a postfilter, as part of an error concealment method or in connection with a method for speech and/or image coding (decompression or decoding method), in particular at the receiver end. Using the method as a postfilter means that the method is employed for post-filtering, i.e. the data arising in these applications are processed with an algorithm implementing the method. Furthermore, it is possible to improve the quality of the speech signal in the telecommunications network by smoothing the speech signal spectrum, or a quantity derived from it, with the smoothing method according to the invention.

The invention is explained in more detail below with reference to the illustrations shown in the figures. They show:

Figure 1
a noise-free time signal;
Figure 2
a noisy time signal;
Figure 3
a single signal frame in the time domain;
Figure 4
a single signal frame in the spectral domain;
Figure 5
a weighting function for a single frame;
Figure 6
the spectrogram of a noise-free signal;
Figure 7
the spectrogram of a noisy signal;
Figure 8
the spectrogram of a signal filtered with the unsmoothed weighting function;
Figure 9
the spectrogram of a signal filtered with a weighting function smoothed according to the invention;
Figure 10
a filtered time signal with tonal artifacts;
Figure 11
a time signal filtered according to the invention;
Figure 12
the spectrogram of an unsmoothed weighting function;
Figure 13
the spectrogram of a weighting function smoothed according to the invention;
Figure 14
the magnitude of the cepstrum of a noise-free speech signal; and
Figure 15
the signal flow graph according to a preferred embodiment of the invention.

Figure 1 shows a noise-free signal in the form of its amplitude over time. The duration of the signal is 4 seconds; the amplitudes range from approximately -0.18 to approximately 0.18. Figure 2 shows the signal in noisy form. A random noise floor can be seen over the entire time course.

Figure 3 shows the signal of a single signal frame λ. The signal frame has a segment duration of 32 milliseconds. The amplitudes of both graphs lie between -0.1 and 0.1. The individual samples of the digital signals are connected to form graphs. The noisy graph represents the input signal, which contains the noise-free signal. Separating signal and noise in the noisy signal is hardly possible in this representation of the signal.

Figure 4 shows the same signal frame after transformation into the frequency domain. The individual frequency bins µ are connected to form graphs. In this figure, too, the frequency bins are shown both noisy and noise-free, the noise-free signal again being the speech signal contained in the noisy signal. The frequency bins µ from 0 to 128 are plotted along the abscissa. Their amplitudes range from approximately -40 decibels (dB) to approximately 10 dB. A comparison of the graphs shows that the energy of the speech signal is concentrated in a comb-like structure in some frequency bins, while the noise is also present in the bins in between.

Figure 5 shows a weighting function for the noisy frame of Figure 4. For each frequency bin µ, a factor between 0 and 1 results, depending on the ratio of speech energy to noise energy. The individual weighting factors are connected to form a graph. The comb-like structure of the speech spectrum can be recognized again.

Figures 6 and 7 show spectrograms formed from a sequence of noise-free and noisy short-term spectra (Figure 4), respectively. The frame index λ is plotted on the abscissa, the frequency bin index µ on the ordinate. The amplitudes of the individual frequency bins are shown as gray values. A comparison of Figures 6 and 7 makes clear how speech is concentrated in a few frequency bins. It also forms regular structures. The noise, by contrast, is distributed over all frequency bins.

Figure 8 shows the spectrogram of a filtered signal. The axes correspond to those of Figures 6 and 7. A comparison with Figure 6 shows that, because of estimation errors in the weighting function, high amplitudes remain in frequency bins that contain no speech. Suppressing these outliers is the aim of the method according to the invention.

Figure 9 shows the spectrogram of a signal that has been filtered with a smoothed weighting function in accordance with a preferred development of the method according to the invention. The axes correspond to those of the previous spectrograms. Compared with Figure 8, the outliers are greatly reduced, while the speech components in the spectrogram are preserved in their essential form.

Figures 10 and 11 show the time signals that result from the filtered spectra of Figures 8 and 9, respectively. The amplitude is plotted over time. The signals are 4 seconds long and have amplitudes between approximately -0.18 and 0.18. The outliers in the spectrogram of Figure 8 produce clearly visible tonal artifacts in the corresponding time signal in Figure 10, artifacts that are not present in the noise-free signal of Figure 1. The time signal in Figure 11 shows a much calmer course of the residual noise. This time signal results from the spectrogram of Figure 9, which was generated by filtering with the smoothed weighting function.

Figure 12 shows the unsmoothed weighting function for all frames. For each frame λ, the frequency bins µ are plotted along the ordinate. The values of the weighting function are shown as gray tones. The fluctuations resulting from estimation errors are recognizable as irregular patches.

Figure 13 shows the smoothed weighting function for all frames. The axes correspond to those of Figure 12. The smoothing smears the fluctuations and greatly reduces their values, whereas the structure of the speech frequency bins remains clearly recognizable.

Figure 14 shows the magnitude of the cepstrum of a noise-free signal over all frames. For each frame λ, the cepstral bins µ' are plotted along the ordinate. The values of the magnitudes of the cepstral coefficients G^cepst_µ'(λ) are shown as gray tones. A comparison with Figure 6 shows that, in the cepstrum, speech is concentrated in an even smaller number of coefficients. Moreover, the positions of these coefficients vary less. The course of the cepstral coefficient that represents the pitch frequency is also clearly recognizable.

Figure 15 shows a signal flow graph according to a preferred embodiment of the invention. A noisy input signal is transformed into a sequence of short-term spectra, from which a weighting function for filtering is then estimated via intermediate spectral quantities. One frame is processed at a time. First, the short-term spectra of the weighting function are subjected to a non-linear, logarithmic mapping. A forward transformation into the cepstral domain follows. The short-term spectra transformed in this way are thus represented by transformation coefficients of the basis functions. The transformation coefficients calculated in this way are smoothed separately from one another using different time constants. The recursive character of the smoothing is indicated by feeding the output of the smoothing back to its input. Of the signal paths of the M transformation coefficients in total, only 3 are shown; the rest are replaced by three dots "...". After the smoothing, a backward transformation and then the non-linear inverse mapping are performed. In this way, a sequence of smoothed short-term spectra of the weighting function is obtained as the result. These smoothed short-term spectra of the weighting function can be multiplied by the noisy short-term spectra, producing filtered short-term spectra with few outliers. These can then be converted into a time signal with a reduced noise level. The part of the signal flow graph that describes the smoothing according to the invention is outlined with dashed lines.
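
The processing chain of Figure 15 might be sketched roughly as follows. This is an assumed numpy illustration rather than the reference implementation, and estimate_weights() is a hypothetical placeholder for whichever noise reduction algorithm supplies the spectral weighting function:

```python
import numpy as np

def smooth_weighting_function(noisy_spectra, estimate_weights, beta=None):
    """Process one frame at a time, as in the signal flow graph of Figure 15."""
    M = noisy_spectra.shape[1]
    if beta is None:
        beta = np.full(M, 0.8)                     # assumed per-coefficient constants
        beta[:4] = 0.0                             # protect the coarse structure
        beta[M // 2 + 1:] = beta[1:M // 2][::-1]   # symmetry
    filtered = np.empty_like(noisy_spectra)
    G_prev = None
    for lam, Y in enumerate(noisy_spectra):
        G = np.clip(estimate_weights(Y), 1e-3, 1.0)   # spectral weighting function
        G_cepst = np.fft.ifft(np.log(G))              # log mapping + forward transform
        G_prev = G_cepst if G_prev is None else beta * G_prev + (1 - beta) * G_cepst
        G_smooth = np.exp(np.fft.fft(G_prev).real)    # backward transform + exp mapping
        filtered[lam] = G_smooth * Y                  # apply the smoothed weights
    return filtered

# Hypothetical usage with a crude Wiener-like weight estimate and random data
noise_power = 1.0
spectra = np.abs(np.random.randn(100, 256)) ** 2 + noise_power
weights = lambda Y: np.maximum(1.0 - noise_power / Y, 0.0)
out = smooth_weighting_function(spectra, weights)
```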

Claims (34)

  1. Smoothing method for suppressing fluctuating artifacts during noise reduction, having the following steps:
    • short-term spectra for a series of signal frames comprising digital samples are provided,
    • each short-term spectrum is transformed by forward transformation, which describes the short-term spectrum using transformation coefficients which represent the short-term spectrum divided into its coarse and its fine structures,
    • the transformation coefficients with the same coefficient indices in each case are smoothed by combining at least two successive transformed short-term spectra, and
    • the smoothed transformation coefficients are transformed into smoothed short-term spectra by backward transformation.
  2. Smoothing method according to the preceding claim, characterized in that the inverse of the forward transformation is used for the backward transformation.
  3. Smoothing method according to Claim 1 or 2, characterized in that transformation with an orthogonal base is used.
  4. Smoothing method according to Claim 1 or 2, characterized in that transformation with a nonorthogonal base is used.
  5. Smoothing method according to Claim 1 or 2, characterized in that the discrete Fourier transformation and the inverse thereof are used for the transformations.
  6. Smoothing method according to Claim 1 or 2, characterized in that the fast Fourier transformation and the inverse thereof are used for the transformations.
  7. Smoothing method according to Claim 1 or 2, characterized in that the discrete cosine transformation and the inverse thereof are used for the transformations.
  8. Smoothing method according to Claim 1 or 2, characterized in that the discrete sine transformation and the inverse thereof are used for the transformations.
  9. Smoothing method according to one of the preceding claims, characterized in that the short-term spectra are mapped nonlinearly before the forward transformation.
  10. Smoothing method according to the preceding claim, characterized in that the smoothed short-term spectra are mapped nonlinearly after the backward transformation, wherein the nonlinear mapping of the backward transformation is the reversal of the nonlinear mapping of the forward transformation.
  11. Smoothing method according to one of the two preceding claims, characterized in that the short-term spectra are mapped nonlinearly before the forward transformation by logarithmization.
  12. Smoothing method according to one of Claims 1 to 11, characterized in that recursive smoothing is used for smoothing the transformation coefficients.
  13. Smoothing method according to one of Claims 1 to 11, characterized in that nonrecursive smoothing is used for smoothing the transformation coefficients.
  14. Smoothing method according to one of the preceding claims, characterized in that the smoothing is applied to the absolute value or to a power of the absolute value of the short-term spectra.
  15. Smoothing method according to one of the preceding claims, characterized in that different time constants are used for smoothing the respective transformation coefficients.
  16. Smoothing method according to the preceding claim, characterized in that time constants are chosen such that the transformation coefficients which typically describe spectral structures of voice are smoothed only slightly.
  17. Smoothing method according to one of the two preceding claims, characterized in that time constants are chosen such that the transformation coefficients which describe spectral structures of fluctuating spectral magnitudes and of artifacts of noise reduction algorithms are smoothed strongly.
  18. Smoothing method according to one of Claims 1 to 17, characterized in that the short-term spectrum provided is a spectral weighting function of a noise reduction algorithm.
  19. Smoothing method according to one of Claims 1 to 17, characterized in that the short-term spectrum used is a spectral weighting function of a post filter for multichannel methods for noise reduction.
  20. Smoothing method according to one of the two preceding claims, characterized in that the spectral weighting function results from the minimization of an error criterion.
  21. Smoothing method according to one of Claims 1 to 17, characterized in that the short-term spectrum provided is a filtered short-term spectrum.
  22. Smoothing method according to one of Claims 1 to 17, characterized in that the short-term spectrum provided is a spectral weighting function of a multichannel method for noise reduction.
  23. Smoothing method according to one of Claims 1 to 17, characterized in that the short-term spectrum provided is an estimated coherence or an estimated "Magnitude Squared Coherence" between at least two microphone channels.
  24. Smoothing method according to one of Claims 1 to 17, characterized in that the short-term spectrum provided is a spectral weighting function of a multichannel method for speaker or source separation.
  25. Smoothing method according to one of Claims 1 to 17, characterized in that the short-term spectrum provided is a spectral weighting function of a multichannel method for speaker separation on the basis of phase differences for signals in the different channels (Phase Transform - PHAT).
  26. Smoothing method according to one of Claims 1 to 17, characterized in that the short-term spectrum provided is a spectral weighting function of a multichannel method for noise reduction on the basis of a "Generalized Cross-Correlation" (GCC).
  27. Smoothing method according to one of Claims 1 to 17, characterized in that the short-term spectrum provided is spectral magnitudes which contain both voice and noise components.
  28. Smoothing method according to one of Claims 1 to 17, characterized in that the short-term spectrum provided is an estimate of the signal-to-noise ratio.
  29. Smoothing method according to one of Claims 1 to 17, characterized in that the short-term spectrum provided is an estimate of the noise power.
  30. Smoothing method according to one of Claims 1 to 15, characterized in that the short-term spectrum provided is transformed signal frames of an image signal, and the coefficients of the transformed image signal which are calculated row by row or column by column or two-dimensionally are subjected to spatial smoothing with different smoothing parameters.
  31. Smoothing method according to the preceding claim, characterized in that the image signal is a video signal.
  32. Smoothing method according to one of Claims 1 to 15, characterized in that the short-term spectrum used is a transformed medical signal derived from the human body.
  33. Smoothing method according to one of Claims 1 to 32, characterized in that the smoothing method is used in a post filter, in combination with a post filter, as part of an error masking method or in connection with a method for voice and/or image coding, particularly at the receiver end.
  34. Smoothing method according to one of Claims 1 to 33, characterized in that the smoothing method is used in a telecommunication network and/or during a broadcast transmission to improve the voice and/or image quality and to suppress artifacts.
EP08784249A 2007-06-27 2008-06-25 Spectral smoothing method for noisy signals Not-in-force EP2158588B1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102007030209A DE102007030209A1 (en) 2007-06-27 2007-06-27 smoothing process
PCT/DE2008/001047 WO2009000255A1 (en) 2007-06-27 2008-06-25 Spectral smoothing method for noisy signals

Publications (2)

Publication Number Publication Date
EP2158588A1 EP2158588A1 (en) 2010-03-03
EP2158588B1 true EP2158588B1 (en) 2010-10-13

Family

ID=39767094

Family Applications (1)

Application Number Title Priority Date Filing Date
EP08784249A Not-in-force EP2158588B1 (en) 2007-06-27 2008-06-25 Spectral smoothing method for noisy signals

Country Status (6)

Country Link
US (1) US8892431B2 (en)
EP (1) EP2158588B1 (en)
AT (1) ATE484822T1 (en)
DE (2) DE102007030209A1 (en)
DK (1) DK2158588T3 (en)
WO (1) WO2009000255A1 (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ATE454696T1 (en) * 2007-08-31 2010-01-15 Harman Becker Automotive Sys RAPID ESTIMATION OF NOISE POWER SPECTRAL DENSITY FOR SPEECH SIGNAL IMPROVEMENT
US8588138B2 (en) * 2009-07-23 2013-11-19 Qualcomm Incorporated Header compression for relay nodes
US8577186B1 (en) * 2011-02-14 2013-11-05 DigitalOptics Corporation Europe Limited Forward interpolation approach using forward and backward mapping
US8675115B1 (en) 2011-02-14 2014-03-18 DigitalOptics Corporation Europe Limited Forward interpolation approach for constructing a second version of an image from a first version of the image
EP2689419B1 (en) * 2011-03-21 2015-03-04 Telefonaktiebolaget L M Ericsson (PUBL) Method and arrangement for damping dominant frequencies in an audio signal
JP5774191B2 (en) * 2011-03-21 2015-09-09 テレフオンアクチーボラゲット エル エム エリクソン(パブル) Method and apparatus for attenuating dominant frequencies in an audio signal
GB201114737D0 (en) * 2011-08-26 2011-10-12 Univ Belfast Method and apparatus for acoustic source separation
US9026451B1 (en) * 2012-05-09 2015-05-05 Google Inc. Pitch post-filter
JP5772723B2 (en) * 2012-05-31 2015-09-02 ヤマハ株式会社 Acoustic processing apparatus and separation mask generating apparatus
JP6544234B2 (en) * 2013-04-11 2019-07-17 日本電気株式会社 Signal processing apparatus, signal processing method and signal processing program
US20150179181A1 (en) * 2013-12-20 2015-06-25 Microsoft Corporation Adapting audio based upon detected environmental accoustics
DE102014210760B4 (en) * 2014-06-05 2023-03-09 Bayerische Motoren Werke Aktiengesellschaft operation of a communication system
WO2016157270A1 (en) * 2015-03-31 2016-10-06 日本電気株式会社 Spectral analysis device, spectral analysis method, and readable medium
US9721581B2 (en) * 2015-08-25 2017-08-01 Blackberry Limited Method and device for mitigating wind noise in a speech signal generated at a microphone of the device
US9972134B2 (en) 2016-06-30 2018-05-15 Microsoft Technology Licensing, Llc Adaptive smoothing based on user focus on a target object
WO2019213769A1 (en) 2018-05-09 2019-11-14 Nureva Inc. Method, apparatus, and computer-readable media utilizing residual echo estimate information to derive secondary echo reduction parameters
EP3573058B1 (en) * 2018-05-23 2021-02-24 Harman Becker Automotive Systems GmbH Dry sound and ambient sound separation
JP7278092B2 (en) * 2019-02-15 2023-05-19 キヤノン株式会社 Image processing device, imaging device, image processing method, imaging device control method, and program
CN113726348B (en) * 2021-07-21 2022-06-21 湖南艾科诺维科技有限公司 Smoothing filtering method and system for radio signal frequency spectrum

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02195400A (en) * 1989-01-24 1990-08-01 Canon Inc Speech recognition device
US5365592A (en) * 1990-07-19 1994-11-15 Hughes Aircraft Company Digital voice detection apparatus and method using transform domain processing
US5737485A (en) * 1995-03-07 1998-04-07 Rutgers The State University Of New Jersey Method and apparatus including microphone arrays and neural networks for speech/speaker recognition systems
US6070140A (en) * 1995-06-05 2000-05-30 Tran; Bao Q. Speech recognizer
DE19629132A1 (en) * 1996-07-19 1998-01-22 Daimler Benz Ag Method of reducing speech signal interference
US7272556B1 (en) * 1998-09-23 2007-09-18 Lucent Technologies Inc. Scalable and embedded codec for speech and audio signals
US6226606B1 (en) * 1998-11-24 2001-05-01 Microsoft Corporation Method and apparatus for pitch tracking
US6766292B1 (en) 2000-03-28 2004-07-20 Tellabs Operations, Inc. Relative noise ratio weighting techniques for adaptive noise cancellation
JP3566197B2 (en) * 2000-08-31 2004-09-15 松下電器産業株式会社 Noise suppression device and noise suppression method
US7054810B2 (en) * 2000-10-06 2006-05-30 International Business Machines Corporation Feature vector-based apparatus and method for robust pattern recognition
US6964023B2 (en) * 2001-02-05 2005-11-08 International Business Machines Corporation System and method for multi-modal focus detection, referential ambiguity resolution and mood classification using multi-modal input
US7124075B2 (en) * 2001-10-26 2006-10-17 Dmitry Edward Terez Methods and apparatus for pitch determination
US20040002856A1 (en) * 2002-03-08 2004-01-01 Udaya Bhaskar Multi-rate frequency domain interpolative speech CODEC system
US7725314B2 (en) * 2004-02-16 2010-05-25 Microsoft Corporation Method and apparatus for constructing a speech filter using estimates of clean speech and noise
US7689419B2 (en) * 2005-09-22 2010-03-30 Microsoft Corporation Updating hidden conditional random field model parameters after processing individual training samples
US7680663B2 (en) * 2006-08-21 2010-03-16 Micrsoft Corporation Using a discretized, higher order representation of hidden dynamic variables for speech recognition
US8145488B2 (en) * 2008-09-16 2012-03-27 Microsoft Corporation Parameter clustering and sharing for variable-parameter hidden markov models

Also Published As

Publication number Publication date
WO2009000255A9 (en) 2010-05-14
ATE484822T1 (en) 2010-10-15
DK2158588T3 (en) 2011-02-07
WO2009000255A1 (en) 2008-12-31
US8892431B2 (en) 2014-11-18
EP2158588A1 (en) 2010-03-03
US20100182510A1 (en) 2010-07-22
DE102007030209A1 (en) 2009-01-08
DE502008001543D1 (en) 2010-11-25

Similar Documents

Publication Publication Date Title
EP2158588B1 (en) Spectral smoothing method for noisy signals
DE60024501T2 (en) Improvement of Perceptual Quality of SBR (Spectral Band Replication) AND HFR (Radio Frequency Reconstruction) Coding method by adaptively adding noise floor and limiting the noise substitution
DE60131639T2 (en) Apparatus and methods for determining noise cancellation performance values for a voice communication system
DE112009000805B4 (en) noise reduction
DE19747885B4 (en) Method for reducing interference of acoustic signals by means of the adaptive filter method of spectral subtraction
DE60225130T2 (en) IMPROVED TRANSIENT PERFORMANCE FOR LOW-BITRATE CODERS THROUGH SUPPRESSION OF THE PREVIOUS NOISE
EP1143416B1 (en) Time domain noise reduction
DE60116255T2 (en) NOISE REDUCTION DEVICE AND METHOD
DE60027438T2 (en) IMPROVING A HARMFUL AUDIBLE SIGNAL
DE602005000539T2 (en) Gain-controlled noise cancellation
DE69630580T2 (en) Noise canceller and method for suppressing background noise in a noisy speech signal and a mobile station
DE3689035T2 (en) NOISE REDUCTION SYSTEM.
DE60104091T2 (en) Method and device for improving speech in a noisy environment
DE60031354T2 (en) Noise cancellation before voice coding
DE112012006876T5 (en) Formant-dependent speech signal enhancement
AT509570B1 (en) METHOD AND APPARATUS FOR ONE-CHANNEL LANGUAGE IMPROVEMENT BASED ON A LATEN-TERM REDUCED HEARING MODEL
EP3197181B1 (en) Method for reducing latency of a filter bank for filtering an audio signal and method for low latency operation of a hearing system
EP3065417B1 (en) Method for suppressing interference noise in an acoustic system
EP1239455A2 (en) Method and system for implementing a Fourier transformation which is adapted to the transfer function of human sensory organs, and systems for noise reduction and speech recognition based thereon
EP2080197B1 (en) Apparatus for noise suppression in an audio signal
DE602004006912T2 (en) A method for processing an acoustic signal and a hearing aid
DE102019102414B4 (en) Method and system for detecting fricatives in speech signals
EP1453355B1 (en) Signal processing in a hearing aid
AT408286B (en) METHOD FOR SUPPRESSING NOISE IN A SIGNAL FIELD
DE102018131687B4 (en) METHODS AND DEVICES FOR REDUCING CLOPPING NOISE

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20091210

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MT NL NO PL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL BA MK RS

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

DAX Request for extension of the european patent (deleted)
AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MT NL NO PL PT RO SE SI SK TR

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

Free format text: NOT ENGLISH

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

Free format text: LANGUAGE OF EP DOCUMENT: GERMAN

REF Corresponds to:

Ref document number: 502008001543

Country of ref document: DE

Date of ref document: 20101125

Kind code of ref document: P

REG Reference to a national code

Ref country code: CH

Ref legal event code: NV

Representative=s name: SIEMENS SCHWEIZ AG

REG Reference to a national code

Ref country code: DK

Ref legal event code: T3

REG Reference to a national code

Ref country code: NL

Ref legal event code: VDEP

Effective date: 20101013

LTIE Lt: invalidation of european patent or patent extension

Effective date: 20101013

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20110113

Ref country code: LT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20101013

REG Reference to a national code

Ref country code: IE

Ref legal event code: FD4D

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LV

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20101013

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20101013

Ref country code: BG

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20110113

Ref country code: IS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20110213

Ref country code: HR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20101013

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20101013

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20110214

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20101013

Ref country code: SI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20101013

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20110114

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CZ

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20101013

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20110124

Ref country code: IE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20101013

Ref country code: EE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20101013

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: RO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20101013

Ref country code: PL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20101013

Ref country code: SK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20101013

26N No opposition filed

Effective date: 20110714

REG Reference to a national code

Ref country code: DE

Ref legal event code: R097

Ref document number: 502008001543

Country of ref document: DE

Effective date: 20110714

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20101013

BERE Be: lapsed

Owner name: RUHR-UNIVERSITAT BOCHUM

Effective date: 20110630

Owner name: SIEMENS AUDIOLOGISCHE TECHNIK G.M.B.H.

Effective date: 20110630

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: BE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20110630

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: IT

Payment date: 20120626

Year of fee payment: 5

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MC

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20110630

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LU

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20110625

Ref country code: CY

Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION

Effective date: 20101013

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: TR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20101013

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: HU

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20101013

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IT

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20130625

REG Reference to a national code

Ref country code: AT

Ref legal event code: MM01

Ref document number: 484822

Country of ref document: AT

Kind code of ref document: T

Effective date: 20130625

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: AT

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20130625

REG Reference to a national code

Ref country code: DE

Ref legal event code: R082

Ref document number: 502008001543

Country of ref document: DE

Representative's name: FDST PATENTANWAELTE FREIER DOERR STAMMLER TSCH, DE

REG Reference to a national code

Ref country code: DE

Ref legal event code: R082

Ref document number: 502008001543

Country of ref document: DE

Representative's name: FDST PATENTANWAELTE FREIER DOERR STAMMLER TSCH, DE

Ref country code: DE

Ref legal event code: R081

Ref document number: 502008001543

Country of ref document: DE

Owner name: RUHR-UNIVERSITAET BOCHUM, DE

Free format text: FORMER OWNERS: RUHR-UNIVERSITAET BOCHUM, 44801 BOCHUM, DE; SIEMENS AUDIOLOGISCHE TECHNIK GMBH, 91058 ERLANGEN, DE

Ref country code: DE

Ref legal event code: R081

Ref document number: 502008001543

Country of ref document: DE

Owner name: SIVANTOS GMBH, DE

Free format text: FORMER OWNERS: RUHR-UNIVERSITAET BOCHUM, 44801 BOCHUM, DE; SIEMENS AUDIOLOGISCHE TECHNIK GMBH, 91058 ERLANGEN, DE

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 9

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 10

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 11

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DK

Payment date: 20190624

Year of fee payment: 12

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20190626

Year of fee payment: 12

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: CH

Payment date: 20190624

Year of fee payment: 12

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20190624

Year of fee payment: 12

Ref country code: GB

Payment date: 20190624

Year of fee payment: 12

REG Reference to a national code

Ref country code: DE

Ref legal event code: R119

Ref document number: 502008001543

Country of ref document: DE

REG Reference to a national code

Ref country code: DK

Ref legal event code: EBP

Effective date: 20200630

REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20200625

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200630

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200625

Ref country code: CH

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200630

Ref country code: LI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200630

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20210101

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DK

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200630