EP1435089A1

EP1435089A1 - Method and system for reducing a voice signal noise

Info

Publication number: EP1435089A1
Application number: EP02776772A
Authority: EP
Inventors: Marc Ihle; Frank Walter
Original assignee: Siemens AG
Current assignee: BenQ Corp
Priority date: 2001-10-12
Filing date: 2002-10-02
Publication date: 2004-07-07
Anticipated expiration: 2022-10-02
Also published as: EP1435089B1; US8005669B2; CN1241172C; DE10150519B4; DE10150519A1; DE50206411D1; US20090132241A1; US7392177B2; CN1568503A; WO2003034407A1; US20040186711A1

Abstract

The invention concerns a method in which, before being subjected to low-rate voice coding, an incoming digital voice signal s(k) is segmented in time (101) into blocks (block, m); the blocks (block, m) are each broken down (102), in chronological order, into frequency components f(i, m) by a transformation into the frequency domain; and the frequency components are multiplied by frequency-dependent, time-variable weight factors, a frequency component being multiplied by the weight factor last calculated for that frequency component if that factor is smaller than the current weight factor.

Description

METHOD AND ARRANGEMENT FOR NOISE REDUCTION OF A VOICE SIGNAL

The invention relates to a method and an arrangement for speech processing, in particular of a disturbed speech signal.

The rapid technical development in the field of mobile communication has led to constantly increasing demands on speech processing, in particular speech coding and noise suppression, which is attributable not least to an increasing shortage of bandwidth and ever increasing demands on speech quality.

An essential part of speech processing consists in estimating the interference signal or interference noise with which a speech signal recorded by a microphone, for example, is usually afflicted and, if necessary, suppressing it in the input signal in order to transmit only the speech signal if possible. However, with common methods for noise suppression, undesired artifacts, also called musical tones, often result in the background signal.

The object of the invention is to provide a technical teaching for speech processing which enables the transmission of speech with a low data rate and high quality.

This object is solved by the features of the independent claims. Advantageous and expedient further developments result from the dependent claims.

The invention is therefore initially based on the idea of multiplying the frequency components of a speech signal subject to an interference signal, before encoding by a low-rate speech codec, by time-variable, frequency-dependent weighting factors, wherein a frequency component is multiplied by the current weighting factor if this is smaller than the weighting factor last calculated for this frequency component, and wherein a frequency component is multiplied by the weighting factor last calculated for this frequency component if that factor is smaller than the current weighting factor. A low-rate speech codec is understood in particular to mean a speech codec that delivers a data rate of less than 5 kbit per second.

It is thereby achieved that the interference signal applied to a speech signal is damped in such a way that speech with good quality can be transmitted with little computation or storage effort.

The invention is initially based on the knowledge that when using low-rate speech codecs, good speech quality can only be achieved if the artifacts - already explained above - are avoided or reduced as far as possible. This could be recognized by the use of complex simulation tools specially created for this purpose.

Furthermore, the invention is based on the knowledge that - as also complex simulations showed - artifacts in the background signal, in particular during speech pauses, are reduced by the special use of current or most recently calculated weighting factors.

This advantageous effect of the invention, i.e. the combination of a special method for noise suppression with a low-rate speech codec, which in particular delivers a data rate between 3 kbit per second and 5 kbit per second, was finally also confirmed by extensive simulations. The further developments, embodiments and variants described in the dependent claims can be included in this invention both in combination with the method and in combination with the arrangements.

The invention is described in more detail below on the basis of preferred exemplary embodiments, the features contained therein also being able to be included in other combinations by the invention. The figures listed below are intended to explain these exemplary embodiments:

Figure 1 simplified block diagram of a method for speech processing;

Figure 2 flow diagram of a method for noise suppression;

Figure 3 simplified block diagram of an arrangement for speech processing.

Figure 1 shows a block diagram of a method for speech processing. This process can be roughly divided into the interacting blocks noise suppression and downstream low-rate speech codec NSC. A low-rate speech codec, which, for example, delivers a data rate of 4 kbit per second, is known as such, which is why it is not discussed in more detail here.

The process for noise suppression can be divided into several function blocks, which are explained below.

The blocks analysis AN and synthesis SY form the framework of the method for noise suppression. A segmentation (not shown in the figure) of the input signal before the analysis AN and the block sizes used are matched to the low-rate speech codec in such a way that the algorithmic delay of the signal caused by the noise suppression remains as low as possible. The segmentation of the input signal x(k) takes place, for example, in blocks of 20 ms at a sampling rate of 8 kHz. The processed data can also be passed on to the speech codec in segments with the specified block length.
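As a small illustration of the figures given above, 20 ms at a sampling rate of 8 kHz corresponds to 160 samples per block. The following is a minimal sketch only; the function name `segment` and the decision to drop a trailing partial block are assumptions made for the example, not details taken from the description.

```python
def segment(signal, sample_rate_hz=8000, block_ms=20):
    """Split a sequence of samples into consecutive, non-overlapping blocks."""
    block_len = sample_rate_hz * block_ms // 1000  # 8000 * 20 / 1000 = 160 samples
    n_blocks = len(signal) // block_len            # a trailing partial block is dropped
    return [signal[m * block_len:(m + 1) * block_len] for m in range(n_blocks)]


# One second of dummy samples yields 50 blocks of 160 samples each.
x = [0.0] * 8000
blocks = segment(x)
print(len(blocks), len(blocks[0]))  # prints "50 160"
```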

The analysis AN can include a windowing, zero padding and a transformation into the frequency domain by means of a Fourier transformation, and the synthesis SY can include a transformation back into the time domain by an inverse Fourier transformation and a signal reconstruction according to the overlap-add method.
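The analysis and synthesis steps can be sketched as follows. This is an illustrative example under stated assumptions, not the implementation of the description: the frame length, hop size, FFT length, the periodic Hann window and all function names are chosen for the example, and a naive DFT is used so that the sketch needs only the standard library. No weighting is applied between analysis and synthesis, so the sketch only shows that the overlap-add round trip reconstructs the signal.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (fine for a short sketch)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * math.pi * i * k / n) for k in range(n))
           for i in range(n)]
    return [v / n for v in out] if inverse else out


def analysis(frame, window, nfft):
    """Window the frame, zero-pad it to nfft samples and transform it."""
    padded = [s * w for s, w in zip(frame, window)] + [0.0] * (nfft - len(frame))
    return dft(padded)


def synthesis(spectra, hop, nfft):
    """Inverse-transform each spectrum and overlap-add the results."""
    out = [0.0] * (hop * (len(spectra) - 1) + nfft)
    for m, spectrum in enumerate(spectra):
        for k, v in enumerate(dft(spectrum, inverse=True)):
            out[m * hop + k] += v.real
    return out


frame_len, hop, nfft = 8, 4, 16  # assumed sizes, kept tiny for readability
# Periodic Hann window: copies shifted by frame_len / 2 sum to one.
window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
x = [math.sin(0.3 * k) for k in range(40)]
starts = range(0, len(x) - frame_len + 1, hop)
spectra = [analysis(x[s:s + frame_len], window, nfft) for s in starts]
y = synthesis(spectra, hop, nfft)
# Away from the first and last half frame, two windows overlap and sum to one,
# so the unmodified signal is reconstructed up to rounding error.
max_err = max(abs(y[k] - x[k]) for k in range(hop, len(x) - hop))
print(max_err < 1e-9)  # prints "True"
```

In a noise suppressor, the spectra would be multiplied by the weighting factors between `analysis` and `synthesis`; the zero padding leaves room for the time-domain spreading that such a multiplication causes.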

The frequency components resulting from the analysis AN have a real and an imaginary part, or a magnitude and a phase. To reduce the computational effort, the magnitudes of adjacent frequency components are first combined into frequency groups, for example on the basis of a Bark table (frequency group summary FGZU1).

A gain calculation VB is carried out for each frequency group on the basis of an a priori and an a posteriori signal-to-noise ratio, which results in weighting factors for the magnitudes of the individual frequency groups. The a priori signal-to-noise ratio can be derived from the power density spectrum of the disturbed input signal and the a priori noise estimate GS. The a posteriori signal-to-noise ratio can be calculated from the power density spectrum of the disturbed input signal and the output signal of a buffering P, which in turn is supplied with the corrected frequency components combined by a frequency group summary FGZU2.

Before the decomposition FGZE of the frequency components previously combined into frequency groups, and before the multiplication of the frequency components by the weighting factor calculated for the corresponding frequency group for noise reduction, the weighting factors are subjected to a so-called minimum filtering MF, which is explained in more detail later with reference to Figure 2.

To estimate the interference noise, the power density of the background noise is estimated essentially from the input signal. To reduce the required computing power and memory consumption, the a priori noise estimation, the gain calculation, the buffering of the signal magnitude modified for interference signal suppression and the minimum filter are carried out in only a few sub-bands. For this purpose, the magnitudes of the input signal transformed into the frequency domain and of the signal modified for interference signal suppression are combined into bands in the two frequency group summary blocks. The width of the sub-bands is based on the Bark scale and therefore varies with frequency. The output signal of the minimum filter for each frequency group is distributed to the corresponding frequency components, or Fourier coefficients, by the frequency group decomposition block. In another embodiment variant, the input signal of the buffering block can be calculated by multiplying the magnitude of the input signal combined in frequency groups, element by element, by the output signal of the minimum filter, instead of by a frequency group summary of the signal modified for interference signal suppression.
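The frequency group summary and the frequency group decomposition can be pictured with the short sketch below. It is illustrative only: the band edges are hypothetical bin indices whose widths merely grow with frequency in the spirit of the Bark scale, averaging is an assumed way of combining magnitudes (the description does not say whether a sum or a mean is used), and the function names are invented for the example.

```python
def summarize_groups(magnitudes, band_edges):
    """Combine adjacent bin magnitudes into one value per frequency group (mean)."""
    return [sum(magnitudes[lo:hi]) / (hi - lo)
            for lo, hi in zip(band_edges[:-1], band_edges[1:])]


def decompose_groups(group_values, band_edges):
    """Distribute each group's value to all bins belonging to that group."""
    per_bin = []
    for value, lo, hi in zip(group_values, band_edges[:-1], band_edges[1:]):
        per_bin.extend([value] * (hi - lo))
    return per_bin


# Hypothetical band edges (bin indices); band widths 1, 1, 2 and 4 bins.
band_edges = [0, 1, 2, 4, 8]
magnitudes = [4.0, 2.0, 1.0, 3.0, 1.0, 1.0, 2.0, 4.0]

groups = summarize_groups(magnitudes, band_edges)
print(groups)  # prints "[4.0, 2.0, 2.0, 2.0]"

# Assumed per-group weighting factors, e.g. as delivered by the minimum filter.
gains = [0.5, 1.0, 0.25, 0.5]
per_bin_gains = decompose_groups(gains, band_edges)
weighted = [g * a for g, a in zip(per_bin_gains, magnitudes)]
print(weighted)  # prints "[2.0, 2.0, 0.25, 0.75, 0.5, 0.5, 1.0, 2.0]"
```

Working on four groups instead of eight bins is what saves computation and memory in the noise estimation, gain calculation, buffering and minimum filter.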

In addition to the noise estimate, an a posteriori estimate of the speech signal component is made. For this purpose, the signal of the magnitude values modified for noise reduction, combined in frequency groups, is stored in the buffering block. The output signals of the a priori noise estimation and of the buffering are used, in addition to the magnitude values of the input signal combined in frequency groups, for the gain calculation. The gain calculation results in weighting factors that are fed to the minimum filter. The minimum filter finally determines the weighting factors provided for multiplication by the frequency components of the frequency groups.

A simplified embodiment for noise suppression of a speech signal will now be explained in more detail with reference to the flowchart shown in Figure 2. The blocks frequency group summary FGZU1, FGZU2 and frequency group decomposition shown in Figure 1 are not used.

Disturbed speech signals recorded by a microphone are converted by a sampling device and a downstream analog-to-digital conversion into an incoming digital speech signal s(k) which is subject to interference n(k). This input signal is segmented (101) in time into blocks (block, m), and the blocks (block, m) are mapped (102) in chronological order, by transformation into the frequency domain, onto I frequency components f(i, m), where m represents time and i represents frequency. This can be done, for example, by a Fourier transformation. If the Fourier coefficients of the input signal are designated X(i, m), then the values |X(i, m)|^2 are called frequency components.

The frequency components f(i, m) of a speech signal are multiplied by a weighting factor H(i, m) after the segmentation 101 and the transformation into the frequency domain 102 explained above, it being possible to derive the weighting factor, for example, from the estimated a priori and a posteriori signal-to-noise ratios already explained above. The a priori signal-to-noise ratio can be derived from the power density spectrum of the disturbed input signal and the a priori noise estimate. The a posteriori signal-to-noise ratio can be calculated from the power density spectrum of the disturbed input signal and the output signal of the buffering.

The weighting factor, which depends on the frequency or frequency component, is time-variable and is continuously determined in accordance with the time-varying frequency components. In order to avoid undesired artifacts in the background signal, a minimum filter is implemented: the weighting factor H(i, m) currently calculated for a frequency component is not always used for multiplication by that frequency component f(i, m). Instead, if the weighting factor H(i, m-1) last calculated for this frequency component in the previous step is smaller than the current weighting factor, the weighting factor H(i, m-1) calculated in that previous step is used.
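The minimum filter described here amounts to applying min(H(i, m), H(i, m-1)) to each frequency component. The sketch below illustrates this under assumptions: the function names and the nested-list layout are invented for the example, the first block (which has no predecessor) simply uses its own factors, and the factor remembered for the next step is the calculated one rather than the filtered one, which is how the wording "last calculated" is read here.

```python
def minimum_filter(current, previous):
    """Per frequency component, pick the smaller of H(i, m) and H(i, m-1)."""
    return [min(h_now, h_prev) for h_now, h_prev in zip(current, previous)]


def weight_blocks(components, factors):
    """Weight f(i, m) block by block with minimum-filtered factors.

    components[m][i] and factors[m][i] hold f(i, m) and H(i, m).
    The first block has no predecessor, so its own factors are used (assumption).
    """
    weighted = []
    previous = factors[0]
    for f_m, h_m in zip(components, factors):
        applied = minimum_filter(h_m, previous)
        weighted.append([h * f for h, f in zip(applied, f_m)])
        previous = h_m  # remember the calculated factor, not the filtered one
    return weighted


# One frequency component over five blocks: the isolated spike of the
# weighting factor at m = 2 is suppressed by the minimum filter.
factors = [[0.2], [0.2], [0.9], [0.2], [0.2]]
components = [[1.0]] * 5
print(weight_blocks(components, factors))
# prints "[[0.2], [0.2], [0.2], [0.2], [0.2]]"
```

Short-lived peaks of the weighting factor in an otherwise attenuated background are the kind of fluctuation associated with musical tones, which is why removing them in this way is of interest during speech pauses.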

An embodiment variant of the invention provides that a frequency component is multiplied by the current weighting factor if the frequency-dependent weighting factor is above a threshold value, even if the weighting factor last calculated for this frequency component is smaller than the current weighting factor.

This can be implemented by means of a filter which compares the current weight factor with the previous weight factor at the same frequency and selects the smaller of the two values for application to the frequency component. If the fixed threshold value 0.76 is exceeded by the current weighting factor, no modification of the frequency component takes place.
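The threshold variant can be sketched as a small selection function. This is a hedged illustration: the function name is hypothetical, and the behavior above the threshold follows the reading of the variant in which the current weighting factor is then applied regardless of the previous one.

```python
THRESHOLD = 0.76  # fixed threshold value named in the text


def select_factor(h_now, h_prev, threshold=THRESHOLD):
    """Return the weighting factor to apply to one frequency component."""
    if h_now > threshold:
        return h_now           # above the threshold: minimum filter is bypassed
    return min(h_now, h_prev)  # otherwise the smaller of the two factors


print(select_factor(0.9, 0.2))  # prints "0.9": current factor exceeds the threshold
print(select_factor(0.5, 0.2))  # prints "0.2": previous factor is smaller
print(select_factor(0.3, 0.6))  # prints "0.3": current factor is smaller
```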

Figure 3 shows a program-controlled processor device PE, such as a microcontroller, which can also include a processor CPU and a storage device SPE. Depending on the embodiment variant, further components can be arranged inside or outside the processor device PE, which are assigned to the processor device, belong to the processor device, are controlled by the processor device or control the processor device, and whose function in connection with a processor device is well known to a person skilled in the art and will therefore not be discussed further here. The different components can exchange data with the processor device PE via a bus system BUS or input/output interfaces IOS and, where appropriate, suitable controllers (not shown). The processor device PE can be part of an electronic device, such as a communication terminal or a cell phone, and can also control other methods and applications specific to the electronic device.

Depending on the embodiment variant, the storage device SPE, which can also comprise one or more volatile or non-volatile RAM or ROM memory modules, or parts of the storage device SPE, can be implemented as part of the processor device (shown in the figure) or as an external storage device (not shown in the figure), which is located outside the processor device PE or even outside the device containing the processor device PE and is connected to the processor device PE by lines or a bus system.

The program data used to control the device and the method for speech processing and interference signal suppression are stored in the storage device SPE. It is within the ordinary skill of a person skilled in the art to implement the above-mentioned functional components using program-controlled processors or microcircuits specially provided for this purpose. The digital voice signals, which are subject to interference, can be fed to the processor device PE via the input/output interface IOS. In addition to the processor CPU, a digital signal processor DSP can be provided in order to carry out the steps of the methods explained above in whole or in part.

Claims

1. Method for speech processing,

- in which an incoming digital voice signal s(k) is segmented (101) in time into blocks (block, m),

- in which the blocks (block, m) are mapped (102) in chronological order, by a transformation into the frequency domain, onto frequency components (f, i),

- in which the frequency components are multiplied by time-variable, frequency-dependent weighting factors,

- in which a frequency component is multiplied by the current weighting factor if this is smaller than the weighting factor last calculated for this frequency component, and a frequency component is multiplied by the weighting factor last calculated for this frequency component if that factor is smaller than the current weighting factor, and

- in which the frequency components weighted in this way are fed to a low-rate speech codec after a transformation back into the time domain.

2. The method of claim 1, in which a frequency component is multiplied by the current weighting factor if the frequency-dependent weighting factor is above a threshold value, even if the weighting factor last calculated for this frequency component is smaller than the current weighting factor.

3. Arrangement for noise suppression,

- with an input (IOS) for digital voice signals, and

- with a processor device (PE), which is set up in such a way that

- an incoming digital speech signal s(k) is segmented (101) in time into blocks (block, m),

- the blocks (block, m) are each mapped (102) in chronological order, by a transformation into the frequency domain, onto frequency components (f, i),

- a frequency component is multiplied by the current weighting factor if this is smaller than the weighting factor last calculated for this frequency component, and a frequency component is multiplied by the weighting factor last calculated for this frequency component if that factor is smaller than the current weighting factor, and that

- the frequency components weighted in this way are subjected to low-rate speech coding after a transformation back into the time domain.

4. Arrangement according to claim 3, wherein

- a frequency component is multiplied by the current weighting factor if the frequency-dependent weighting factor is above a threshold value, even if the weighting factor last calculated for this frequency component is smaller than the current weighting factor.