EP0929065A2 - Modular speech enhancement with an application to speech coding - Google Patents
Modular speech enhancement with an application to speech coding
- Publication number
- EP0929065A2 (application EP99100141A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- speech
- digitized
- speech enhancement
- coder
- components
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
Definitions
- Speech enhancement is an effort to process the noisy speech for the benefit of the intended listener, be it a human, speech recognition module, or anything else. For a human listener, it is desirable to increase the perceptual quality and intelligibility of the perceived speech, so that the listener understands the communication with minimal effort and fatigue.
- Speech enhancement can be broadly defined as the removal of additive noise from a corrupted speech signal in an attempt to increase the intelligibility or quality of speech. In most speech enhancement techniques, the noise and speech are generally assumed to be uncorrelated. Single channel speech enhancement is the simplest scenario, where only one version of the noisy speech is available, which is typically the result of recording someone speaking in a noisy environment with a single microphone.
- FIG. 1 illustrates a speech enhancement setup for N noise sources for a single-channel system.
- Exact reconstruction of the clean speech signal is usually impossible in practice.
- Speech enhancement algorithms must strike a balance between the amount of noise they attempt to remove and the degree of distortion that is introduced as a side effect. Since any noise component at the microphone cannot in general be distinguished as coming from a specific noise source, the sum of the responses at the microphone from each noise source is denoted as a single additive noise term.
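The single-channel observation model above (FIG. 1) can be sketched in a few lines; `mic_signal` is a hypothetical helper name for illustration, not from the patent:

```python
def mic_signal(speech, noise_sources):
    """Single-channel observation: clean speech plus one additive noise term.

    The responses of the individual noise sources cannot be separated at
    the microphone, so they are summed into a single noise signal n[t].
    """
    n = [sum(src[t] for src in noise_sources) for t in range(len(speech))]
    return [s + v for s, v in zip(speech, n)]
```

For example, two noise sources whose second samples cancel still reach the microphone as one combined term: `mic_signal([1.0, 2.0], [[0.5, 0.5], [0.5, -0.5]])` yields `[2.0, 2.0]`.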
- Speech enhancement has a number of potential applications.
- In some applications, a human listener observes the output of the speech enhancement directly, while in others speech enhancement is merely the first stage in a communications channel and might be used as a preprocessor for a speech coder or speech recognition module.
- Each application makes very different demands on the performance of the speech enhancement module, so any speech enhancement scheme ought to be developed with the intended application in mind.
- many well-known speech enhancement processes perform very differently with different speakers and noise conditions, making robustness in design a primary concern. Implementation issues such as delay and computational complexity are also considered.
- Speech can be modeled as the output of an acoustic filter (i.e., the vocal tract) where the frequency response of the filter carries the message. Humans constantly change properties of the vocal tract to convey messages by changing the frequency response of the vocal tract.
- the input signal to the vocal tract is a mixture of harmonically related sinusoids and noise.
- Pitch is the fundamental frequency of the sinusoids.
- Formants correspond to the resonant frequencies of the vocal tract.
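The source-filter description above can be illustrated with a toy synthesis: a harmonic-plus-noise excitation (fundamental = pitch) driven through a two-pole resonator whose centre frequency stands in for a single formant. All function names and parameter values here are illustrative assumptions, not taken from the patent:

```python
import math
import random

def voiced_excitation(f0, fs, n, harmonics=8, noise_gain=0.05, seed=0):
    # Mixture of harmonically related sinusoids (fundamental = pitch f0)
    # plus a small white-noise component.
    rng = random.Random(seed)
    return [sum(math.sin(2 * math.pi * k * f0 * t / fs)
                for k in range(1, harmonics + 1))
            + noise_gain * rng.gauss(0.0, 1.0)
            for t in range(n)]

def resonator(x, fc, bw, fs):
    # Two-pole filter; its resonant frequency fc plays the role of a formant.
    r = math.exp(-math.pi * bw / fs)
    a1 = -2.0 * r * math.cos(2.0 * math.pi * fc / fs)
    a2 = r * r
    y = [0.0, 0.0]
    for sample in x:
        y.append(sample - a1 * y[-1] - a2 * y[-2])
    return y[2:]
```

Changing `f0` changes the perceived pitch; changing `fc` moves the resonance, which is how the vocal tract shapes the message.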
- a speech coder works in the digital domain, typically deployed after an analog-to-digital (A/D) converter, to process a digitized speech input to the speech coder.
- the speech coder breaks the speech into constituent parts on an interval-by-interval basis. Intervals are chosen based on the amount of compression or complexity of the digitized speech. The intervals are commonly referred to as frames or sub-frames.
- the constituent parts include: (a) gain components to indicate the loudness of the speech; (b) spectrum components to indicate the frequency response of the vocal tract, where the spectrum components are typically represented by linear prediction coefficients ("LPCs") and/or cepstral coefficients; and (c) excitation signal components, which include a sinusoidal or periodic part, from which pitch is captured, and a noise-like part.
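As a hedged sketch of how the spectrum components in (b) might be obtained: the classical autocorrelation method with the Levinson-Durbin recursion is the standard way LPCs are computed, though the patent does not prescribe any particular algorithm:

```python
def autocorrelation(x, order):
    # r[k] = sum of x[t] * x[t - k] over one frame, for k = 0..order.
    return [sum(x[t] * x[t - k] for t in range(k, len(x)))
            for k in range(order + 1)]

def levinson_durbin(r, order):
    # Solves the normal equations for the LPC coefficients a[1..order];
    # the predictor is x[t] ~= -sum(a[j] * x[t - j] for j in 1..order).
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        a_new = a[:]
        for j in range(1, i):
            a_new[j] = a[j] + k * a[i - j]
        a_new[i] = k
        a = a_new
        err *= (1.0 - k * k)
    return a, err
```

For the autocorrelation sequence `[1.0, 0.5, 0.25]` the recursion returns `a = [1.0, -0.5, 0.0]` with residual energy `0.75`, i.e. a first-order predictor suffices for that frame.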
- gain is measured for an interval to normalize speech into a typical range. This is important for processing the speech on a fixed-point processor.
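A minimal sketch of the gain measurement and normalization step; the RMS definition and the target level are assumptions chosen for illustration:

```python
import math

def frame_gain(frame):
    # RMS level of one interval (frame) of digitized speech.
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def normalize_frame(frame, target=1.0, eps=1e-9):
    # Scale the frame into a typical range so a fixed-point processor
    # can operate on it without overflow or loss of precision.
    g = frame_gain(frame)
    return [s * target / (g + eps) for s in frame], g
```

The measured gain `g` is kept as one of the coded components; the normalized samples go on to the spectrum and excitation analyses.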
- the bandwidth of a telephone channel is limited to 3.5 kHz. Upper (higher-frequency) formants can be lost in coding.
- the speech spectrum is flattened out by noise, and formants can be lost in coding.
- Calculation of the LPC and the cepstral coefficients can be affected.
- the excitation signal (or “residual signal”) components are determined after or separate from the gain components and the spectrum components by breaking the speech into a periodic part (the fundamental frequency) and a noise part.
- the processor looks back one pitch period (1/F, where F is the fundamental frequency) to capture the periodic part, and generates the noise part from white noise. A sinusoidal or periodic part and a noise-like part are thus obtained.
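The look-back described above is, in effect, a long-term (pitch) predictor. A toy sketch, with hypothetical names and a scaling parameter `beta` that the patent does not specify:

```python
import random

def build_excitation(history, period, beta, n, noise_gain=0.0, seed=0):
    # Periodic part: look back one pitch period (period = fs / F samples)
    # and repeat it, scaled by beta.  Noise-like part: white noise.
    rng = random.Random(seed)
    exc = list(history)
    for _ in range(n):
        periodic = beta * exc[-period]
        exc.append(periodic + noise_gain * rng.gauss(0.0, 1.0))
    return exc[len(history):]
```

With an impulse in the history, `beta = 1.0`, and no noise, the output simply repeats every `period` samples: `build_excitation([0, 0, 0, 1], 4, 1.0, 8)` yields `[0, 0, 0, 1, 0, 0, 0, 1]`.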
- Speech enhancement is needed because the more the speech coder is based on a speech production model, the less able it is to render faithful reproductions of non-speech sounds that are passed through the speech coder. Noise does not fit traditional speech production models. Non-speech sounds sound peculiar and annoying. The noise itself may be considered annoying by many people. Speech enhancement has never been shown to improve intelligibility but has often been shown to improve the quality of uncoded speech.
- speech enhancement was performed prior to speech coding, in a speech enhancement system separated from a speech coder/decoder, as shown in FIG. 2.
- the speech enhancement module 6 is separated from the speech coder/decoder 8.
- the speech enhancement module 6 receives input speech.
- the speech enhancement module 6 enhances (e.g., removes noise from) the input speech and produces enhanced speech.
- the speech coder/decoder 8 receives the already enhanced speech from the speech enhancement module 6.
- the speech coder/decoder 8 generates output speech based on the already-enhanced speech.
- the speech enhancement module 6 is not integral with the speech coder/decoder 8.
- a system for enhancing and coding speech performs the steps of receiving digitized speech and enhancing the digitized speech to extract component parts of the digitized speech.
- the digitized speech is enhanced differently for each of the component parts extracted.
- an apparatus for enhancing and coding speech includes a speech coder that receives digitized speech.
- a spectrum signal processor within the speech coder determines spectrum components of the digitized speech.
- An excitation signal processor within the speech coder determines excitation signal components of the digitized speech.
- a first speech enhancement system within the speech coder processes the spectrum components.
- a second speech enhancement system within the speech coder processes the excitation signal components.
- a speech enhancement system is integral with a speech coder such that differing speech enhancement processes are used for particular (e.g., gain, spectrum and excitation) components of the digitized speech while the speech is being coded.
- Speech enhancement is performed within the speech coder using one speech enhancement system as a preprocessor for the LPC filter computer and a different speech enhancement system as a preprocessor for the speech signal from which the residual signal is computed.
- the two speech enhancement processes are both within the speech coder.
- the combined speech enhancement and speech coding method is applicable to both time-domain coders and frequency-domain coders.
- FIG. 3 is a schematic view of an apparatus which integrates speech enhancement into a speech coder in accordance with the principles of the invention.
- the apparatus illustrated in FIG. 3 includes a first speech enhancement system 10.
- the first speech enhancement system 10 receives an input speech signal, which has been digitized.
- An LPC analysis computer (LPC analyzer) 20 is coupled to the first speech enhancement system 10.
- An LPC quantizer 30 is coupled to the LPC analysis computer 20.
- An LPC synthesis filter (LPC synthesizer) 40 is coupled to the LPC quantizer 30.
- a second speech enhancement system 50 receives the digitized input speech signal.
- a first perceptual weighting filter 60 is coupled to the second speech enhancement system 50 and to the LPC analyzer 20.
- a second perceptual weighting filter 70 is coupled to the LPC analyzer 20 and to the LPC synthesizer 40.
- a subtractor 100 is coupled to the first perceptual weighting filter 60 and the second perceptual weighting filter 70.
- the subtractor 100 produces an error signal based on the difference of two inputs.
- An error minimization processor 90 is coupled to the subtractor 100.
- An excitation generation processor 80 is coupled to the error minimization processor 90.
- the LPC synthesis filter 40 is coupled to the excitation generation processor 80.
- the first speech enhancement system 10 and the second speech enhancement system 50 are integral with the rest of the apparatus illustrated in FIG. 3.
- the first speech enhancement system 10 and the second speech enhancement system 50 can be entirely different or can represent different "tunings" that give different amounts of enhancement using the same basic system.
- the first speech enhancement system 10 enhances speech prior to computation of spectral parameters, which in this example is an LPC analysis.
- the LPC analysis system 20 carries out the LPC spectral analysis.
- the LPC analysis system 20 determines the best acoustic filter, which is represented as a sequence of LPC parameters.
- the output LPC parameters of the LPC spectral analysis are used for two different purposes in this example.
- the unquantized LPC parameters are used to compute coefficient values in the first perceptual weighting filter 60 and the second perceptual weighting filter 70.
- the unquantized LPC values are also quantized in the LPC quantizer 30.
- the LPC quantizer 30 produces the best estimate of the spectral information as a series of bits.
- the quantized values produced by the LPC quantizer 30 are used as the filter coefficients in the LPC synthesis filter (LPC synthesizer) 40.
- the LPC synthesizer 40 combines the excitation signal, indicating pulse amplitudes and locations, produced by the excitation generation processor 80 with the quantized values representing the best estimate of the spectral information that are output from the LPC quantizer 30.
- the second speech enhancement system 50 is used in determining the excitation signal produced by the excitation generation processor 80.
- the digitized speech signal is input to the second speech enhancement system 50.
- the enhanced speech signal output from the second speech enhancement system 50 is perceptually weighted in the first perceptual weighting filter 60.
- the first perceptual weighting filter 60 weights the speech with respect to perceptual quality to a listener.
- the perceptual quality continually changes based on the acoustic filter (i.e., based on the frequency response of the vocal tract) represented by the output of the LPC analyzer 20.
- the first perceptual weighting filter 60 thus operates in the psychophysical domain, in a "perceptual space" where mean square error differences are relevant to the coding distortion that a listener hears.
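One common realization of such a filter is the pole-zero form W(z) = A(z/g1)/A(z/g2) built from the LPC polynomial A(z). This specific form and the values of g1 and g2 are assumptions here; the patent only says the filter maps errors into a perceptual space:

```python
def perceptual_weight(x, a, g1=0.9, g2=0.6):
    # Apply W(z) = A(z/g1) / A(z/g2), where a[0] = 1.0 and a[1..p] are the
    # unquantized LPC coefficients; g1 > g2 deemphasizes error near formants,
    # where the ear is less sensitive to coding distortion.
    num = [a[j] * g1 ** j for j in range(len(a))]
    den = [a[j] * g2 ** j for j in range(len(a))]
    y = []
    for t in range(len(x)):
        acc = sum(num[j] * x[t - j] for j in range(len(num)) if t - j >= 0)
        acc -= sum(den[j] * y[t - j] for j in range(1, len(den)) if t - j >= 0)
        y.append(acc)
    return y
```

When g1 equals g2 the numerator and denominator cancel and the filter passes the signal through unchanged, which is a handy sanity check.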
- all possible excitation sequences are generated in the excitation generation processor 80.
- the possible excitation sequences generated by excitation generator 80 are input to the LPC synthesizer 40.
- the LPC synthesizer 40 generates possible coded output signals based on the quantized values representing the best estimate of the spectral information generated by LPC quantizer 30 and the possible excitation sequences generated by excitation generation processor 80.
- the possible coded output signals from the LPC synthesizer 40 can be sent to a digital-to-analog (D/A) converter for further processing.
- the possible coded output signals from the LPC synthesizer 40 are passed through the second perceptual weighting filter 70.
- the second perceptual weighting filter 70 has the same coefficients as the first perceptual weighting filter 60.
- the first perceptual weighting filter 60 filters the enhanced speech signal whereas the second perceptual weighting filter 70 filters possible speech output signals.
- in this way, the coder tries all of the different possible excitation signals to find the one that yields the best decoded speech.
- the perceptually weighted possible output speech signals from the second perceptual weighting filter 70 and the perceptually weighted enhanced input speech signal from the first perceptual weighting filter 60 are input to the subtractor 100.
- the subtractor 100 determines a signal representing a difference between perceptually weighted possible output speech signals from the second perceptual weighting filter 70 and the perceptually weighted enhanced input speech signal from the first perceptual weighting filter 60.
- the subtractor 100 produces an error signal based on the signal representing such difference.
- the output of the subtractor 100 is coupled to the error minimization processor 90.
- the error minimization processor 90 selects the excitation signal that minimizes the error signal output from the subtractor 100 as the optimal excitation signal.
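The closed loop just described (excitation generation 80 → LPC synthesis 40 → perceptual weighting 70 → error minimization 90) can be sketched as an exhaustive search. The identity weighting and the tiny candidate set below are placeholders; a real coder uses a perceptual weighting filter and a structured excitation codebook:

```python
def lpc_synthesize(exc, a):
    # All-pole synthesis: y[t] = exc[t] - sum(a[j] * y[t - j], j = 1..p).
    y = []
    for t, e in enumerate(exc):
        acc = e
        for j in range(1, len(a)):
            if t - j >= 0:
                acc -= a[j] * y[t - j]
        y.append(acc)
    return y

def select_excitation(target, candidates, a, weight=lambda s: s):
    # Try every candidate excitation, synthesize, weight, and keep the
    # one minimizing the weighted squared error against the target.
    wt = weight(target)
    best, best_err = None, float("inf")
    for cand in candidates:
        wy = weight(lpc_synthesize(cand, a))
        err = sum((u - v) ** 2 for u, v in zip(wt, wy))
        if err < best_err:
            best, best_err = cand, err
    return best, best_err
```

The selected candidate index, together with the quantized LPC values, is what would be transmitted to the decoder.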
- the quantized LPC values from LPC quantizer 30 and the optimal excitation signal from the error minimization processor 90 are the values that are transmitted to the speech decoder and can be used to re-synthesize the output speech signal.
- the first speech enhancement system 10 and the second speech enhancement system 50 within the apparatus illustrated in FIG. 3 can (i) apply differing amounts of the same speech enhancement process, or (ii) apply different speech enhancement processes.
- the principles of the invention can be applied to frequency-domain coders as well as time-domain coders, and are particularly useful in a cellular telephone environment, where bandwidth is limited. Because the bandwidth is limited, transmissions of cellular telephone calls use compression and often require speech enhancement. The noisy acoustic environment of a cellular telephone favors the use of a speech enhancement process. Generally, speech coders that use a great deal of compression need a lot of speech enhancement, while those using less compression need less speech enhancement.
- the invention combines the strengths of multiple speech enhancement systems in order to generate a robust and flexible speech enhancement and coding process that exhibits better performance.
- Experimental data indicate that a combination enhancement approach leads to a more robust and flexible system that shares the benefits of each constituent speech enhancement process.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12041298P | 1998-01-09 | 1998-01-09 | |
US120412 | 1998-01-09 |
Publications (2)
Publication Number | Publication Date |
---|---|
EP0929065A2 true EP0929065A2 (de) | 1999-07-14 |
EP0929065A3 EP0929065A3 (de) | 1999-12-22 |
Family
ID=22390111
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP99100141A Withdrawn EP0929065A3 (de) | 1998-01-09 | 1999-01-08 | Modular speech enhancement with an application to speech coding |
Country Status (1)
Country | Link |
---|---|
EP (1) | EP0929065A3 (de) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH08130513A (ja) * | 1994-10-28 | 1996-05-21 | Fujitsu Ltd | Speech coding and decoding system |
EP0732687A2 (de) * | 1995-03-13 | 1996-09-18 | Matsushita Electric Industrial Co., Ltd. | Vorrichtung zur Erweiterung der Sprachbandbreite |
EP0742548A2 (de) * | 1995-05-12 | 1996-11-13 | Mitsubishi Denki Kabushiki Kaisha | Vorrichtung und Verfahren zur Sprachkodierung unter Verwendung eines Filters zur Verbesserung der Signalqualität |
Non-Patent Citations (1)
Title |
---|
PATENT ABSTRACTS OF JAPAN vol. 1996, no. 09, 30 September 1996 (1996-09-30) & JP 08 130513 A (FUJITSU), 21 May 1996 (1996-05-21) & US 5 717 724 A (YAMAZAKI ET AL.) 10 February 1998 (1998-02-10) * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017177782A1 (zh) * | 2016-04-15 | 2017-10-19 | Tencent Technology (Shenzhen) Company Limited | Speech signal cascade processing method, terminal, and computer-readable storage medium |
EP3444819A4 (de) * | 2016-04-15 | 2019-04-24 | Tencent Technology (Shenzhen) Company Limited | Sprachsignalkaskadenverarbeitungsverfahren und -endgerät und computerlesbares speichermedium |
US11605394B2 (en) | 2016-04-15 | 2023-03-14 | Tencent Technology (Shenzhen) Company Limited | Speech signal cascade processing method, terminal, and computer-readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
EP0929065A3 (de) | 1999-12-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6182033B1 (en) | Modular approach to speech enhancement with an application to speech coding | |
US8554550B2 (en) | Systems, methods, and apparatus for context processing using multi resolution analysis | |
US7680653B2 (en) | Background noise reduction in sinusoidal based speech coding systems | |
EP0993670B1 (de) | Verfahren und vorrichtung zur sprachverbesserung in einem sprachübertragungssystem | |
US20060031066A1 (en) | Isolating speech signals utilizing neural networks | |
US6182035B1 (en) | Method and apparatus for detecting voice activity | |
WO2001056021A1 (en) | System and method for modifying speech signals | |
EP1386313B1 (de) | Vorrichtung zur sprachverbesserung | |
US7392180B1 (en) | System and method of coding sound signals using sound enhancement | |
EP0929065A2 (de) | Modulare Sprachverbesserung mit Anwendung an der Sprachkodierung | |
GB2343822A (en) | Using LSP to alter frequency characteristics of speech | |
KR20060109418A (ko) | 인지 가중 필터를 이용한 전처리 방법 및 전처리기 | |
Hayashi et al. | A subtractive-type speech enhancement using the perceptual frequency-weighting function | |
Aoki et al. | Enhancing the naturalness of synthesized speech by using the random fractalness of vowel source signals | |
Loizou et al. | A multi-band spectral subtraction method for speech enhancement
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PUAI | Public reference made under article 153(3) EPC to a published international application that has entered the European phase | Original code: 0009012 |
| AK | Designated contracting states (kind code of ref document: A2) | DE FI FR GB IT SE |
| AX | Request for extension of the European patent | AL; LT; LV; MK; RO; SI |
| PUAL | Search report despatched | Original code: 0009013 |
| AK | Designated contracting states (kind code of ref document: A3) | AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE |
| AX | Request for extension of the European patent | AL; LT; LV; MK; RO; SI |
2000-05-17 | 17P | Request for examination filed | |
| AKX | Designation fees paid | DE FI FR GB IT SE |
| STAA | Information on the status of an EP patent application or granted EP patent | Status: the application has been withdrawn |
2001-11-26 | 18W | Application withdrawn | |