WO2004059614A2 - Method and apparatus for enhancing the perceptual quality of synthesized speech signals - Google Patents

Info

Publication number
WO2004059614A2
Authority
WO
WIPO (PCT)
Prior art keywords
speech signal
speech
signal
filter
component
Prior art date
Application number
PCT/DK2003/000917
Other languages
English (en)
Other versions
WO2004059614A3 (fr)
Inventor
Kjeld Hermansen
Original Assignee
Microsound A/S
Priority date
2002-12-31
Filing date
2003-12-19
Publication date
2004-07-15
Application filed by Microsound A/S
Priority to AU2003287927A (AU2003287927A1, en)
Publication of WO2004059614A2 (fr)
Publication of WO2004059614A3 (fr)

Classifications

    • G PHYSICS
      • G10 MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L13/00 Speech synthesis; Text to speech systems
            • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
              • G10L13/04 Details of speech synthesis systems, e.g. synthesiser structure or memory management
          • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
            • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
              • G10L21/0208 Noise filtering
          • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
            • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
              • G10L25/15 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being formant information

Definitions

  • The present invention relates to a method and apparatus for enhancing the perceptual quality of synthesized speech signals.
  • Noise reduction of a noise-corrupted target speech signal is obtained through dynamic smoothing or filtration of time-varying speech model parameters, so-called f, b, g parameters, based on a priori knowledge of human speech production.
  • The filtering of the f, b, g parameters may take into account typical maximum and minimum formant frequencies of human speech, the durations of voiced sounds (such as phonemes) or unvoiced sounds, and/or the frequency differences between the formants of a present speech signal and those of a previous speech signal.
  • The synthesized speech signals may originate from a preceding model-based noise reduction process, which may comprise deriving a so-called f, b, g model representing the respective frequencies, bandwidths and gains of the formants in a segment of a noise-corrupted target speech signal, individually smoothing the f, b, g parameters over time, and re-synthesizing a noise-reduced version of the target speech signal.
  • A speech synthesis method in accordance with the present invention may be used to improve the perceptual quality of speech that has been subjected to the noise reduction methodology disclosed in WO 00/72305.
  • The present invention may also be used to improve the perceptual quality of synthesized speech in general, for applications such as text-to-speech systems and voice-recognition and response systems.
  • The present invention relates to a method of synthesizing speech signals, an apparatus for implementing the method, and a computer programme product enabling a programmable computer to perform the method.
  • The time-varying synthesis filter comprises one or more filter sections of fractional order, i.e. filter sections of non-integer order.
  • Conventionally, synthesis filters employed for the synthesis of parametric speech data have been of integer order, often composed of parallel or cascaded second-order filter sections.
  • In the latter case, the speech synthesis filter may be composed of a set of cascaded or parallel second-order filters or filter sections, as disclosed in lines 12-16 of page 3 of WO 00/72305.
  • A significant problem associated with integer-order filter sections in the synthesis filter is that individual formants of the speech signal to be synthesized or resynthesized, such as the second and third formants, tend to overlap or fuse. This overlap arises because the formants of a speech segment are often spaced too closely in frequency for the roll-off rate of a traditional second-order filter section to keep their responses apart. The fusing of individual formants reduces the intelligibility and/or perceptual quality of the synthesized speech signal. Listening experiments have shown that noise-free speech signals are very well modelled by all-pole vocal tract transfer functions, but problems are encountered with noise-corrupted speech. Desired manipulation of the speech signal through the f, b, g parameters, for example for speech enhancement, transposition or noise cancellation, eventually leads to non-all-pole speech models, and therefore to a need to vary the filter section orders continually as functions of their respective bandwidths.
  • A solution to the above-mentioned technical problem in accordance with the present invention is to include at least one, but preferably several, filter sections of fractional order in the speech synthesis filter.
  • The fractional-order filter or filters are preferably digital filters designed with a 3 dB bandwidth that represents the formants to be modelled.
  • The order of the at least one fractional-order filter may advantageously lie between 2 and 6, more preferably between 2 and 4, such as between 2.5 and 3.5.
  • The generator signal component of the speech signal comprises a transient part of the speech signal and preferably additionally comprises glottal pulse components and unvoiced components of the speech signal.
  • The generator signal component may have been determined from the speech signal by conventional techniques known in the art, such as inverse filtering. In case of heavy noise contamination of the target speech signal, the generator signal component may have been composed of a synthetic glottal pulse combined with the proper pitch period, thereby removing or reducing the noise inevitably present in an inverse-filter-based generator signal.
  • The quasi-stationary component of the speech signal has preferably been determined by mapping the speech signal into an LPC model of predetermined order, such as 8, 10 or 12.
  • The quasi-stationary component of the speech signal and/or the generator signal component used for synthesis of the speech signal may be stored in digital form in electronic memory means such as RAM, ROM, EPROM, EEPROM or Flash memory, and/or on a magnetic or optical memory disc.
  • The electronic memory means holding the quasi-stationary component of the speech signal and/or the generator signal component may form part of a personal computer, hand-held computer, mobile or cellular phone, headset, hearing prosthesis etc. adapted to reproduce the synthesized speech signal to a listener through suitable loudspeaker means.
  • The method according to the present invention may be fully or partly implemented by a software program running on a programmable signal processor, such as a microprocessor and/or an industrial or proprietary Digital Signal Processor (DSP).
  • The software program may be loaded at run-time from a nonvolatile memory of the apparatus into a suitable Program RAM storage space and then executed from the Program RAM.
  • The apparatus according to the present invention may comprise a portable, battery-powered communication device such as a hearing prosthesis, headset, handset, mobile or cellular phone etc.
  • The LPC model of the quasi-stationary component of the speech signal is subjected to pseudo-decomposition into second-order and fractional-order filter sections having respective f, b, g parameter sets representing the formant frequencies, formant bandwidths and formant gains of a segment of the speech signal.
  • A significant advantage of the f, b, g representation of the speech signal is that it is closely coupled to physiologically identifiable, time-varying parameters of the vocal tract of the individual producing the synthesized speech signal in question.
  • The present methodology comprises a preceding speech analysis stage during which the generator signal component and the quasi-stationary component of the speech signal are derived.
  • This embodiment is particularly useful during real-time processing of incoming signals such as microphone input signals to a hearing prosthesis or headset.
  • The incoming signal may be subjected to linear or non-linear signal processing between the analysis and synthesis steps in order to improve the audibility, intelligibility and/or comfort of the incoming signal before a synthesized resulting signal is provided to a user, e.g. a hearing-impaired user.
  • The methodology comprises the above-mentioned speech analysis stage and a noise reduction step during which the f, b, g parameters are subjected to band-pass or low-pass filtering with a filter having a cut-off frequency below 15 Hz or 10 Hz.
  • The filtering step comprises individually filtering each parameter of each set of f, b, g parameters with respective band-pass filters having a pass band between 1 and 10 Hz.
  • The noise reduction process may additionally comprise a step of inserting a synthetic glottal pulse signal into at least a part of the generator signal component of the speech signal.
  • A noise-robust pitch detector of conventional design may be used to determine the pitch of the speech signal, and the processor may use this information to select a preferred period of the synthetic glottal pulse signal.
  • The noise-reduction method is particularly well suited to work in cooperation with the noise reduction method and apparatus disclosed in international patent application WO 00/72305, owned by the present applicant.
  • The invention disclosed in this prior art document relates to a model-based noise reduction method and apparatus.
  • The present invention may be operated to improve the noise-reduction results obtained with the prior art method, by further reducing artefacts in the target speech signal and improving the suppression of the noise component in the target speech signal.
  • The improvement is obtained by including statistical modelling of phoneme development over time, based on one or more Hidden Markov Models (HMMs) that have been adapted to model the development of the f, b, g parameters over time. Since such statistical properties of phoneme development may be language specific, HMMs for a particular language may be provided in apparatuses adapted to implement the present invention for a particular market.
  • Fig. 1 shows an empirically determined relationship between the order, x, of a section of a speech synthesis filter and the bandwidth of the formant to be modelled by that filter section.
  • Fig. 2 illustrates two different exemplary magnitude responses of a particular section of a speech synthesis filter.
  • Response 1 is obtained by using a prior art second-order filter section to model the response of a formant located at 400 Hz.
  • Response 2 is obtained by using a fractional-order filter of order 3.4 to model the same formant. Note that the bandwidth of the formant is approximately equal for both filter responses, but the undesired frequency spread of the filter "skirts" is much smaller for the fractional (higher) order filter.
  • The speech synthesis filter comprises at least one fractional-order digital filter that models a formant of a segment of the speech signal.
  • Formants with small bandwidths, such as bandwidths less than 50 Hz or 30 Hz, are preferably represented by filter sections of relatively low order, while formants with larger bandwidths may use a progressively increasing order for the filter section in question.
  • Fig. 1 illustrates a preferred relation between filter section order, x, and formant bandwidth, BW.
  • The synthesis filter preferably achieves the target 3 dB bandwidth of a standard second-order model filter substantially independently of the order, x, selected for the fractional-order section.
  • The bandwidth of the filter section is preferably modified according to the following formula:
  • $\text{Amplitude} = \left( \frac{1}{1.0 + y^{2}} \right)^{0.5}$
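
The LPC mapping (order 8, 10 or 12) and the pseudo-decomposition of the LPC model into sections with formant frequencies and bandwidths, described in the bullets above, can be illustrated with a minimal sketch. It assumes the autocorrelation method with the Levinson-Durbin recursion for the LPC fit, and the textbook pole-to-formant mapping f = θ·fs/(2π), b = -ln(r)·fs/π for each complex pole r·e^{jθ}. The Durand-Kerner root finder, all function names and the example values are illustrative choices, not taken from the patent, and the formant gains g are not estimated here.

```python
import cmath
import math


def autocorrelation(x, order):
    """Biased autocorrelation r[0..order] of a (windowed) speech frame."""
    n = len(x)
    return [sum(x[t] * x[t - lag] for t in range(lag, n)) for lag in range(order + 1)]


def levinson_durbin(r, order):
    """LPC polynomial [1, a1, ..., ap] for A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k  # prediction error power after stage i
    return a


def poly_roots(coeffs, iterations=500):
    """Durand-Kerner roots of the monic polynomial coeffs[0] z^p + ... + coeffs[p]."""
    p = len(coeffs) - 1
    roots = [complex(0.4, 0.9) ** k for k in range(p)]  # standard distinct start points

    def evaluate(z):
        value = 0j
        for c in coeffs:
            value = value * z + c  # Horner's scheme
        return value

    for _ in range(iterations):
        updated = []
        for i, zi in enumerate(roots):
            denom = 1 + 0j
            for j, zj in enumerate(roots):
                if j != i:
                    denom *= zi - zj
            updated.append(zi - evaluate(zi) / denom)
        roots = updated
    return roots


def poles_to_formants(roots, fs):
    """(frequency Hz, 3 dB bandwidth Hz) per upper-half-plane pole (assumed mapping)."""
    formants = []
    for z in roots:
        if z.imag > 1e-9:  # one pole of each conjugate pair; real poles are skipped
            freq = cmath.phase(z) * fs / (2.0 * math.pi)
            bw = -math.log(abs(z)) * fs / math.pi
            formants.append((freq, bw))
    return sorted(formants)


def formants_to_poly(formants, fs):
    """Inverse mapping: multiply second-order sections into one LPC polynomial."""
    poly = [1.0]
    for freq, bw in formants:
        r = math.exp(-math.pi * bw / fs)
        section = [1.0, -2.0 * r * math.cos(2.0 * math.pi * freq / fs), r * r]
        poly = [sum(poly[i] * section[k - i] for i in range(len(poly)) if 0 <= k - i <= 2)
                for k in range(len(poly) + 2)]
    return poly


if __name__ == "__main__":
    fs = 8000.0
    a = formants_to_poly([(500.0, 80.0), (1500.0, 120.0)], fs)  # hypothetical formants
    print(poles_to_formants(poly_roots(a), fs))
```

For a real speech frame the polynomial would instead come from `levinson_durbin(autocorrelation(frame, 10), 10)`; the round trip in the demo should recover values very close to the two hypothetical formants.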
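
The noise-reduction step in which each f, b, g parameter track is low-pass filtered with a cut-off below 10 or 15 Hz can be sketched as follows. This assumes parameters are estimated once per frame at a hypothetical rate of 100 frames per second, and uses a simple one-pole (exponential) low-pass with a 10 Hz cut-off as a stand-in for whatever filter the patent intends; the 1-10 Hz band-pass variant is not shown.

```python
import math


def smooth_track(track, frame_rate_hz, cutoff_hz=10.0):
    """One-pole low-pass filter applied to a single parameter track (one value per frame)."""
    # Assumed filter type: exponential smoothing with the given -3 dB cut-off.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / frame_rate_hz)
    state = track[0]  # initialise on the first value to avoid a start-up transient
    out = []
    for value in track:
        state += alpha * (value - state)
        out.append(state)
    return out


def smooth_fbg(frames, frame_rate_hz, cutoff_hz=10.0):
    """Individually smooth f, b and g of every formant over time.

    frames: one list per frame of (f, b, g) tuples, same number of formants per frame.
    """
    n_formants = len(frames[0])
    tracks = {}
    for k in range(n_formants):
        for p in range(3):  # 0 = frequency, 1 = bandwidth, 2 = gain
            raw = [frame[k][p] for frame in frames]
            tracks[k, p] = smooth_track(raw, frame_rate_hz, cutoff_hz)
    return [
        [tuple(tracks[k, p][t] for p in range(3)) for k in range(n_formants)]
        for t in range(len(frames))
    ]


if __name__ == "__main__":
    # Hypothetical single-formant track with frame-to-frame jitter on f.
    frames = [[(500.0 + (15.0 if t % 2 else -15.0), 80.0, 1.0)] for t in range(50)]
    smoothed = smooth_fbg(frames, frame_rate_hz=100.0)
    print([round(fr[0][0], 1) for fr in smoothed[-4:]])
```

Filtering each parameter separately mirrors the "individually filtering each parameter" wording; the jitter in the demo is strongly attenuated because it sits at half the frame rate, far above the cut-off.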
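
The text does not specify the shape of the synthetic glottal pulse that is inserted into the generator signal. As an illustrative stand-in, the sketch below uses the classic Rosenberg glottal flow model (raised-cosine opening, quarter-cosine closing) and repeats it at the period derived from a pitch estimate. The open and speed quotients, and the assumption that a pitch detector supplies f0 in Hz, are hypothetical.

```python
import math


def rosenberg_pulse(period, open_quotient=0.6, speed_quotient=2.0):
    """One period of a Rosenberg-style glottal flow pulse (continuous peak value 1).

    open_quotient:  fraction of the period during which the glottis is open (assumed).
    speed_quotient: opening time divided by closing time (assumed).
    """
    open_len = open_quotient * period
    t_close = open_len / (1.0 + speed_quotient)
    t_open = open_len - t_close
    pulse = []
    for n in range(period):
        if n < t_open:  # opening phase: raised cosine rising from 0 to 1
            pulse.append(0.5 * (1.0 - math.cos(math.pi * n / t_open)))
        elif n < open_len:  # closing phase: quarter cosine falling from 1 to 0
            pulse.append(math.cos(0.5 * math.pi * (n - t_open) / t_close))
        else:  # closed phase
            pulse.append(0.0)
    return pulse


def glottal_excitation(f0_hz, fs_hz, duration_s):
    """Periodic synthetic glottal excitation whose period follows the pitch estimate f0_hz."""
    period = max(2, round(fs_hz / f0_hz))  # period in whole samples
    one_period = rosenberg_pulse(period)
    n_samples = int(round(duration_s * fs_hz))
    return [one_period[n % period] for n in range(n_samples)]


if __name__ == "__main__":
    excitation = glottal_excitation(f0_hz=120.0, fs_hz=8000.0, duration_s=0.05)
    print(len(excitation), max(excitation))
```

Rounding the period to whole samples keeps the sketch simple; a real implementation would likely need fractional-period handling and smooth pitch changes between frames.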
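
The bandwidth rule above states the goal (keep the 3 dB bandwidth of a second-order section whatever order x is chosen) but leaves y undefined. The sketch below assumes y is the detuning from the formant frequency normalised to half the section bandwidth, so that Amplitude = (1/(1 + y^2))^0.5 is the near-resonance magnitude of a second-order section, and it models a section of order x as that magnitude raised to the power x/2, i.e. (1 + y^2)^(-x/4). Under these assumptions the 3 dB point moves to y = sqrt(2^(2/x) - 1), so the section bandwidth must be widened by the factor 1/sqrt(2^(2/x) - 1) to keep the target bandwidth. This is an illustrative derivation, not the patent's stated design rule.

```python
import math


def amplitude(detune_hz, section_bw_hz, order):
    """Assumed near-resonance magnitude of a section of (possibly fractional) order.

    y is the detuning normalised to half the section bandwidth; for order 2 this
    reduces to Amplitude = (1 / (1.0 + y**2)) ** 0.5.
    """
    y = 2.0 * detune_hz / section_bw_hz
    return (1.0 / (1.0 + y * y)) ** (order / 4.0)


def compensated_bandwidth(target_bw_hz, order):
    """Section bandwidth that keeps the 3 dB bandwidth equal to target_bw_hz."""
    return target_bw_hz / math.sqrt(2.0 ** (2.0 / order) - 1.0)


def to_db(value):
    return 20.0 * math.log10(value)


if __name__ == "__main__":
    target_bw = 80.0  # Hz, hypothetical formant bandwidth
    for order in (2.0, 3.4):
        bw = compensated_bandwidth(target_bw, order)
        edge = to_db(amplitude(target_bw / 2.0, bw, order))  # about -3 dB for every order
        skirt = to_db(amplitude(2.0 * target_bw, bw, order))  # two bandwidths from the peak
        print(f"order {order}: section bw {bw:.1f} Hz, edge {edge:.2f} dB, skirt {skirt:.1f} dB")
```

With these assumptions the order-3.4 section keeps its -3 dB edge where the second-order section has it, while its skirt two bandwidths from the peak is about 4 dB lower, which is qualitatively the behaviour described for Fig. 2.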

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to a method and an apparatus for enhancing the perceptual quality of synthesized speech signals. According to the invention, the time-varying synthesis filter comprises one or more filter sections of fractional order, i.e. filter sections of non-integer order. In general, the synthesis filters employed for the synthesis of parametric speech data are of integer order and are often composed of parallel or cascaded second-order filter sections. In the latter case, the speech synthesis filter may be composed of a set of parallel or cascaded second-order filters or filter sections as described in patent WO 00/72305.
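
The conventional structure mentioned in the abstract, a cascade of integer (second) order sections with one section per formant, can be sketched as below. The pole mapping r = exp(-π·b/fs), θ = 2π·f/fs is a standard textbook choice assumed here, g is simply applied as each section's input gain, and the fractional-order sections that the invention adds are not modelled.

```python
import cmath
import math


def section_coeffs(freq_hz, bw_hz, gain, fs_hz):
    """Coefficients (g, a1, a2) of y[n] = g*x[n] - a1*y[n-1] - a2*y[n-2]."""
    r = math.exp(-math.pi * bw_hz / fs_hz)  # assumed: pole radius from bandwidth
    theta = 2.0 * math.pi * freq_hz / fs_hz  # assumed: pole angle from formant frequency
    return gain, -2.0 * r * math.cos(theta), r * r


def synthesize_cascade(excitation, formants, fs_hz):
    """Filter the excitation through one all-pole second-order section per (f, b, g)."""
    signal = list(excitation)
    for freq, bw, gain in formants:
        g, a1, a2 = section_coeffs(freq, bw, gain, fs_hz)
        y1 = y2 = 0.0
        out = []
        for x in signal:
            y = g * x - a1 * y1 - a2 * y2
            out.append(y)
            y1, y2 = y, y1
        signal = out
    return signal


def cascade_response_db(formants, fs_hz, freq_hz):
    """Magnitude of the cascade's transfer function at freq_hz, in dB."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / fs_hz)
    h = 1.0 + 0j
    for freq, bw, gain in formants:
        g, a1, a2 = section_coeffs(freq, bw, gain, fs_hz)
        h *= g / (1.0 + a1 * z_inv + a2 * z_inv * z_inv)
    return 20.0 * math.log10(abs(h))


if __name__ == "__main__":
    fs = 8000.0
    formants = [(500.0, 80.0, 1.0), (1500.0, 120.0, 1.0)]  # hypothetical f, b, g values
    response = synthesize_cascade([1.0] + [0.0] * 255, formants, fs)
    print("impulse response peak:", round(max(abs(v) for v in response), 3))
    for f in (500.0, 1000.0, 1500.0):
        print(f, round(cascade_response_db(formants, fs, f), 1))
```

Evaluating the response between two closely spaced formants shows how much the second-order skirts fill the gap, which is the formant-fusing problem the fractional-order sections are meant to reduce.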

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2003287927A AU2003287927A1 (en) 2002-12-31 2003-12-19 A method and apparatus for enhancing the perceptual quality of synthesized speech signals

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DKPA200202019 2002-12-31
DKPA200202019 2002-12-31

Publications (2)

Publication Number Publication Date
WO2004059614A2 (fr) 2004-07-15
WO2004059614A3 (fr) 2004-09-23

Family

ID=32668640

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/DK2003/000917 WO2004059614A2 (fr) 2002-12-31 2003-12-19 Method and apparatus for enhancing the perceptual quality of synthesized speech signals

Country Status (2)

Country Link
AU (1) AU2003287927A1 (fr)
WO (1) WO2004059614A2 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999001942A2 (fr) * 1997-07-01 1999-01-14 Partran Aps Method of reducing noise in speech signals and apparatus for carrying out the method
WO2000072305A2 (fr) * 1999-05-19 2000-11-30 Noisecom Aps Method and device for reducing noise in speech signals

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HARTMANN U ET AL: "Model based spectral subtraction used for noise suppression in speech with low SNR", NORSIG2000, Nordic Signal Processing Symposium, Vildmarkshotellet Kolmarden, Sweden, 13-15 June 2000, pages 129-132, XP002278902, Linkoping University, Linkoping, Sweden, ISBN 91-7219-789-7 *

Also Published As

Publication number Publication date
AU2003287927A8 (en) 2004-07-22
AU2003287927A1 (en) 2004-07-22
WO2004059614A3 (fr) 2004-09-23

Similar Documents

Publication Publication Date Title
US8265940B2 (en) Method and device for the artificial extension of the bandwidth of speech signals
US8311842B2 (en) Method and apparatus for expanding bandwidth of voice signal
US6889182B2 (en) Speech bandwidth extension
EP2375785B1 (fr) Improvements in the stability of hearing aids
JP3653826B2 (ja) Speech decoding method and apparatus
US20020128839A1 (en) Speech bandwidth extension
WO2008032828A1 (fr) Audio encoding device and audio encoding method
WO2001056021A1 (fr) System and method for modifying speech signals
TW201308316A (zh) Adaptive voice intelligibility processor
EP1333700A2 (fr) Method of frequency transposition in a hearing prosthesis, and such a hearing prosthesis
EP1008984A2 (fr) Wideband speech synthesis from a narrowband speech signal
DE102008031150B3 (de) Method for suppressing interference noise and associated hearing aid
US20080027708A1 (en) Method and system for FFT-based companding for automatic speech recognition
EP2675191A2 (fr) Frequency translation in hearing assistance devices using additive spectral synthesis
WO2004059614A2 (fr) Method and apparatus for enhancing the perceptual quality of synthesized speech signals
EP0570362B1 (fr) Digitized speech decoder using a postfilter with reduced spectral distortion
EP4133482A1 (fr) Reduced-bandwidth speech enhancement with bandwidth extension
WO2010078938A2 (fr) Method and device for processing acoustic voice signals
JP3197975B2 (ja) Pitch control method and apparatus
JP6159570B2 (ja) Speech enhancement device and program
JPH08110796A (ja) Speech enhancement method and apparatus
JP2001249676A (ja) Method for extracting the fundamental period or fundamental frequency of a periodic waveform with added noise
Fulop et al. Signal Processing in Speech and Hearing Technology
EP2506254A1 (fr) Method for improving speech intelligibility with a hearing device, and hearing device
US11967334B2 (en) Method for operating a hearing device based on a speech signal, and hearing device

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
121 EP: the EPO has been informed by WIPO that EP was designated in this application
122 EP: PCT application non-entry into the European phase
NENP Non-entry into the national phase in:

Ref country code: JP

WWW WIPO information: withdrawn in national office

Country of ref document: JP