US6006180A - Method and apparatus for recognizing deformed speech - Google Patents

Method and apparatus for recognizing deformed speech Download PDF

Info

Publication number
US6006180A
US6006180A US08/379,870 US37987095A US6006180A US 6006180 A US6006180 A US 6006180A US 37987095 A US37987095 A US 37987095A US 6006180 A US6006180 A US 6006180A
Authority
US
United States
Prior art keywords
speech
deformed
signal
signals
simulated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
US08/379,870
Other languages
English (en)
Inventor
Philippe Bardaud
Gerard Chollet
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Orange SA
Original Assignee
France Telecom SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by France Telecom SA filed Critical France Telecom SA
Application granted granted Critical
Publication of US6006180A publication Critical patent/US6006180A/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0364 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
    • G10L2021/03643 Diver speech

Definitions

  • the present invention relates to a method and to a system for processing a speech signal.
  • the technical field of the invention is that of methods and systems for processing signals.
  • the invention relates more particularly to a method and to apparatus for processing speech signals in order to facilitate the recognition of speech signals that have been disturbed or deformed, in particular speech uttered by an undersea diver.
  • the composition of the gas mixture breathed by the diver is very different from that of the normal earth atmosphere (such mixtures generally being made up of nitrogen, oxygen, helium, hydrogen, . . . ); this composition, the pressure of the gas mixture, and other parameters all modify and deform the speech uttered by the speaker (the diver), and consequently change the appearance and the characteristics of the corresponding speech signals delivered by a microphone into which the diver speaks.
  • the modifications include variations in pitch and in formants.
  • Presently known systems operate in real time on the deformed speech signals delivered by a microphone into which a diver is speaking, using electronic devices of greater or lesser sophistication that rely on a (necessarily approximate) model of the mechanisms whereby the speech is deformed, so as to enable speech uttered by the diver to be understood by a second party, generally situated on land or at the surface of the water.
  • Those signal-correction systems are generally situated on land or at the surface (e.g. on a ship or a platform), or else they include a portion situated close to the diver, as in U.S. Pat. No. 4,342,104 (Jack), for example.
  • the problem consists in providing a method and a system for processing speech signals and designed to be incorporated in a system for recognizing deformed speech, for the purpose of facilitating or improving intelligibility of speech signals.
  • the solution to the problem posed consists in providing a system for processing a speech signal (P), which system includes electronic and preferably essentially digital means for increasing, preferably in substantially linear manner, the formant frequencies (f1, f2) of said speech signal by a factor close to 2 or 3, i.e. means for multiplying said formant frequencies by a number close to 2 or 3 (the formants being frequencies about which a significant fraction of the energy of said speech signal is concentrated, the human vocal tract being resonant at those frequencies).
  • a system of the invention includes:
  • extraction means responsive to said speech signal (P) for extracting (i.e. computing on the basis of said speech signal) an excitation signal (or a residual signal) representative of the sound and vibration sources of the speech (vocal cords, flows of gas being breathed, . . . );
  • envelope determination means responsive to said speech signal to compute coefficients characteristic of the shape of the spectrum envelope of said speech signal (or characteristics of said formants);
  • interpolation means responsive to said excitation signal to generate an interpolated excitation signal having a waveform (or appearance) identical to the waveform of said excitation signal and having a (time) density of points (or samples or values) that is two or three times the density of points in said excitation signal;
  • synthesis means responsive to said interpolated excitation signal and to said characteristic coefficients to synthesize a simulated deformed speech signal (D) (or simulated hyperbaric speech).
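The four means above (excitation extraction, envelope determination, interpolation, synthesis) can be sketched end to end with numpy. This is a minimal illustration rather than the patent's implementation: the prediction coefficients a(j) are estimated by least squares, the analysis residual serves as the excitation signal, the excitation is linearly interpolated to three times the point density, and the same all-pole envelope resynthesizes the simulated deformed signal. All function names are illustrative.

```python
import numpy as np

def lpc_coefficients(x, k):
    """Least-squares estimate of k prediction coefficients a(1..k)
    such that x(n) ~ a(1)x(n-1) + ... + a(k)x(n-k)."""
    rows = np.array([x[n - 1::-1][:k] for n in range(k, len(x))])
    a, *_ = np.linalg.lstsq(rows, x[k:], rcond=None)
    return a

def excitation(x, a):
    """Residual e(n) = x(n) - sum_j a(j) x(n-j): the excitation signal."""
    k = len(a)
    return np.array([x[n] - a @ x[n - 1::-1][:k] for n in range(k, len(x))])

def interpolate(e, factor):
    """Raise the point density of e by `factor` with linear interpolation;
    the waveform shape is unchanged."""
    n = len(e)
    fine = np.linspace(0, n - 1, factor * (n - 1) + 1)
    return np.interp(fine, np.arange(n), e)

def synthesize(ei, a):
    """All-pole resynthesis: s(n) = ei(n) + sum_j a(j) s(n-j)."""
    k, s = len(a), np.zeros(len(ei))
    for n in range(len(s)):
        hist = s[n - 1::-1][:k] if n else np.array([])  # s(n-1), s(n-2), ...
        s[n] = ei[n] + a[:len(hist)] @ hist
    return s

# Toy check: the impulse response of an AR(2) filter is analyzed,
# and the known coefficients are recovered.
a_true = np.array([1.3, -0.4])
impulse = np.zeros(50)
impulse[0] = 1.0
x = synthesize(impulse, a_true)
a_est = lpc_coefficients(x, 2)
```

Feeding `interpolate(excitation(x, a_est), 3)` back through `synthesize` with the same coefficients yields three times as many samples with the original spectral envelope, which is exactly what the converter stage then plays out at a higher rate.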
  • a system of the invention includes a linear prediction coding module that combines (or constitutes) said extraction means and said envelope determination means.
  • a system of the invention includes preprocessor means for preprocessing said speech signal and comprising:
  • pre-emphasis means for boosting the high-frequency components of said speech signal a little; and
  • windowing means for weighting a signal segment (i.e. a window) or a time sequence of said speech signal in application of a curve of predetermined shape, e.g. a "Hamming" window.
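These two preprocessing steps can be sketched as follows. The first-order pre-emphasis filter y(n) = x(n) - 0.95·x(n-1) is an assumption (the text only requires that high frequencies be boosted "a little"); the Hamming window is the one the text names.

```python
import numpy as np

def preprocess(x, alpha=0.95):
    """Pre-emphasis (slight high-frequency boost via a first-order
    difference), then Hamming-window weighting of the whole segment."""
    emphasized = np.append(x[0], x[1:] - alpha * x[:-1])
    return emphasized * np.hamming(len(emphasized))
```

The window tapers the segment ends toward the Hamming endpoint weight of 0.08, reducing the spectral leakage that an abrupt cut would introduce into the envelope estimate.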
  • the invention also consists in providing a system for recognizing deformed speech signals (A) (i.e. speech uttered in an atmosphere whose gaseous composition and/or pressure is different from that of normal earth atmosphere) delivered by a microphone, which system includes a module for comparing said deformed signal (A) with simulated deformed signals (D) obtained (generated) from previously digitized non-deformed speech signals (P) stored in a memory, and in which the frequencies of the formants of said simulated deformed signals are close to two or three times the frequencies of the formants of said digitized non-deformed signals.
  • the recognition system comprises:
  • apparatus for generating data (s) representative of simulated deformed signals (D) obtained from non-deformed speech signals (P) and for storing data in a memory (e.g. so as to constitute a database or data file specific to a given speaker), the apparatus comprising:
  • conversion means for digitizing (or sampling) an analog speech signal into a time sequence of samples or numerical values x(n) of non-deformed speech;
  • pre-emphasis digital means for boosting the high-frequency components of the sampled speech signal x(n) a little;
  • windowing means for weighting a window (or a time sequence of samples of said non-deformed speech signal) in application of a curve of predetermined shape;
  • extraction means responsive to said speech data x(n) representative of said non-deformed speech signal to extract therefrom excitation digital data e(n) representative of an excitation signal;
  • envelope determination means responsive to said non-deformed speech data to compute characteristic coefficients a(i) of the shape of the spectrum envelope of said non-deformed speech signal (or characteristics of said formants);
  • linear interpolation means responsive to said excitation data e(n) to generate interpolated excitation data ei(n) having a waveform (or appearance) identical to the waveform of said excitation data and having a (time) density of points (or samples or values) that is two or three times the density of points of said excitation data;
  • synthesis means responsive to said interpolated excitation data ei(n) and to said characteristic coefficients a(i) to synthesize (by computation) data s(n) representative of a simulated deformed speech signal;
  • a comparator module for comparing the deformed speech signals (A) with said simulated deformed signals (D).
  • the invention also consists in implementing a method of recognizing deformed speech signals (A) delivered by a microphone, in which:
  • non-deformed speech signals (P) uttered by at least one given speaker under conditions (pressure and gas mixture being breathed, in particular) similar or identical to those of the average or normal earth atmosphere are digitized and stored in a memory;
  • simulated deformed signals are generated on the basis of said non-deformed speech signals (P), the frequencies of the formants of said simulated deformed signals being close to two or three times the frequencies of the formants of said non-deformed signals (that have been digitized); and
  • a comparator module is used to compare said deformed signals (A) with said simulated deformed signals (D), e.g. by computing distances.
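The comparison step can be sketched as a nearest-template search: each stored word is represented by a feature vector of its simulated deformed signal, and the incoming deformed signal A is matched to the closest one. The Euclidean distance and the dictionary layout below are illustrative assumptions; the patent only says that distances are computed, and the feature vectors (e.g. cepstral or envelope coefficients) are assumed precomputed.

```python
import numpy as np

def recognize(query, templates):
    """Return the label of the simulated-deformed template whose feature
    vector is closest (Euclidean distance) to the query's features."""
    return min(templates, key=lambda label: np.linalg.norm(query - templates[label]))

# Illustrative two-word vocabulary with 2-D feature vectors.
templates = {"up": np.array([0.0, 1.0]), "stop": np.array([3.0, 3.0])}
```

A rejection threshold on the winning distance could then flag utterances that match no stored word, which is what would let the diver know a word was not recognized.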
  • said signal (P) is sampled and digitized at a first frequency (fe) e.g. close to 10 kHz (10,000 Hz), with the successive data values (y(1), . . . , y(n)) obtained being stored in a memory;
  • said signal (D) is obtained by subjecting data (s(1), . . . , s(3n)) representative of said simulated deformed signal (D) to digital-to-analog conversion and sampling at a second frequency (fs) close to two or three times said first frequency;
  • said data representative of said signal (D) is obtained by synthesis or superposition of data (ei(1), . . . , ei(3n)) representative of an interpolated excitation signal (ei) and of a spectrum envelope determined by coefficients (ai(1), . . . , ai(k));
  • said data representative of said interpolated excitation signal (ei) is obtained by interpolation of data (e(1), . . . , e(n)) representative of a non-interpolated excitation signal (e);
  • said data representative of said non-interpolated excitation signal (e) and said characteristic coefficients of the spectrum envelope are calculated (i.e. extracted) from said non-deformed speech signal (P) by a linear predictive coding (LPC) method.
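In the steps above, samples analyzed at the first frequency fe are converted back to analog at fs close to two or three times fe; reinterpreting stored samples at a higher rate rescales every spectral component, formants included, by the same factor. A small numpy check (the tone frequency and rates are illustrative):

```python
import numpy as np

fe = 10_000                        # analysis sampling rate, ~10 kHz as in the text
n = 2000
t = np.arange(n) / fe
x = np.sin(2 * np.pi * 500 * t)    # a 500 Hz tone, chosen to fall on an FFT bin

peak = np.abs(np.fft.rfft(x)).argmax()
f_played_at_fe = np.fft.rfftfreq(n, d=1 / fe)[peak]          # ~500 Hz
f_played_at_2fe = np.fft.rfftfreq(n, d=1 / (2 * fe))[peak]   # same data at 2*fe: ~1000 Hz
```

Nothing in the stored data changes; only the digital-to-analog clock does, which is why driving the converter 12 from a faster clock 14 doubles or triples the formant frequencies.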
  • said simulated deformed signals are generated by making use of a linear multiple regression (LMR) method applied to the cepstral vectors of said digitized non-deformed speech signals.
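As a sketch of that alternative, a linear multiple regression can be fitted by least squares to map clean cepstral vectors to deformed ones. The cepstra, the training pairs, and the bias term below are all illustrative assumptions; the patent does not specify the regression setup.

```python
import numpy as np

def fit_lmr(clean, deformed):
    """Fit W (with a bias row) minimizing ||[clean, 1] W - deformed||^2."""
    X = np.hstack([clean, np.ones((len(clean), 1))])
    W, *_ = np.linalg.lstsq(X, deformed, rcond=None)
    return W

def apply_lmr(W, c):
    """Map one clean cepstral vector to a simulated deformed one."""
    return np.append(c, 1.0) @ W

# Toy demo: an exactly linear "deformation" is recovered from examples.
rng = np.random.default_rng(0)
clean = rng.standard_normal((40, 4))            # 40 cepstral vectors, dim 4
A, b = rng.standard_normal((4, 4)), rng.standard_normal(4)
deformed = clean @ A + b
W = fit_lmr(clean, deformed)
```

Once fitted on a few example pairs, the map can turn any stored clean-speech cepstra into simulated hyperbaric cepstra without rerunning the full analysis-synthesis chain.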
  • An advantage obtained by the invention is that it is easy to build up files or a database of signals representative of simulated deformed speech from speech signals that are "clean", i.e. that are not deformed.
  • the simulated deformed speech signals (e.g. several tens or hundreds of words) can then be recorded on a recording medium suitable for use in an "on-board" system, i.e. placed or immersed close to the diver while diving.
  • This can enable the diver himself to monitor whether the deformed speech he utters is recognized by the recognition system, either in real time or with only a slight delay. This constitutes a significant improvement over known voice recognition systems, whose results (whether or not a word uttered in deformed manner has been recognized) cannot be known by the speaker.
  • This provides advantages particularly with respect to safety for the diver himself when he desires to communicate with a party situated at the surface (or another diver), and can also enable the diver to control underwater tools by means of an on-board voice recognition system, which is practically impossible if the recognition system is located on the surface or on land.
  • Another advantage obtained is that it is possible to generate files (or databases) of simulated deformed speech at smaller cost and suitable for use in testing the performance of other recognition systems.
  • FIG. 1 is a block diagram of the essential components of a system of the invention for use in application of the method of the invention.
  • FIG. 2 is a diagrammatic waveform of a non-deformed speech signal P
  • FIG. 3 is a diagram showing the appearance of the spectrum of said signal.
  • FIG. 4 is a diagrammatic waveform of an excitation signal E and of an interpolated excitation signal Ei obtained from the excitation signal E.
  • FIG. 5 is a diagram of a spectrum envelope curve used in synthesizing a simulated deformed speech signal.
  • FIG. 6 is a diagram showing how a simulator operates that makes use of a linear predictive coding method.
  • a recognition system of the invention comprises:
  • a microphone 1 responsive to speech uttered by a speaker, which microphone delivers non-deformed speech signals P to an analog-to-digital converter 2 that digitizes the signal P at a sampling frequency fe controlled by a clock 5;
  • the (n) digital data samples obtained (FIGS. 1 and 2) y(1), . . . , y(n) are stored in a memory 3 which may be structured as a database;
  • apparatus for simulating deformation of the speech signal P comprising:
  • a digital pre-emphasis filter 4 which receives the data y(1), . . . , y(n), and which then applies the filtered data to a windowing or weighting module 6 which delivers n preprocessed data samples x(1), . . . x(n) representative of a preprocessed version of said signal P;
  • a linear prediction coding module 9 itself comprising a module 7 for computing or extracting n data samples e(1), . . . , e(n) representative of an excitation signal E (FIG. 4), and a module 8 for computing k coefficients a(1), . . . , a(k) representative of the spectrum envelope (FIG. 5) of the signal P (with the omission of the peak centered on the frequency f0 which corresponds to pitch, see FIG. 3);
  • an interpolation module 10 which doubles or triples the density of points (or values) in the signal E to deliver an interpolated excitation signal Ei having the same waveform as the signal E (FIG. 4);
  • a synthesizer module 11 which computes data samples s(1), . . . , s(3n) representative of a simulated deformed signal S by superposing the interpolated excitation signal and the spectrum envelope defined by the k coefficients a(1), . . . a(k); and
  • the data s(1), . . . , s(3n) representative of the simulated deformed signal S can then be presented to the input of a digital-to-analog converter 12 controlled by a clock 14 that delivers a sampling signal at an output sampling frequency fs that is two or three times the input sampling frequency fe of the converter 2;
  • the analog simulated deformed speech signal D obtained this way is compared with a genuinely deformed speech signal A delivered by a microphone 15.
  • the deformed speech simulator may optionally use means that perform LMR computation.
  • the method is used on the following principles (see FIG. 6 in particular):
  • the production of a speech signal X by a given speaker is schematically modelled as follows:
  • the speech signal X results from the excitation signal E being filtered by an all-pole filter H(z), which can be defined as follows: H(z) = 1/(1 - a(1)·z^(-1) - a(2)·z^(-2) - . . . - a(k)·z^(-k));
  • H(z) is representative of the transformation applied by the resonances of the speaker's vocal tract to the sound excitation signal E produced by the vocal cords, which transformation gives rise, in the spectrum of the speech signal P (FIG. 3), to maxima centered on frequencies f1 and f2 known as "formants";
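The resonant behavior described here can be checked numerically: an all-pole H(z) with a complex-conjugate pole pair shows a spectral maximum near the corresponding formant frequency. The pole radius and the 1 kHz formant below are illustrative choices, not values from the patent.

```python
import numpy as np

fe = 10_000                                   # sampling rate (Hz)
r, f_formant = 0.95, 1_000.0                  # pole radius, target formant
theta = 2 * np.pi * f_formant / fe            # pole angle for a 1 kHz formant
# (1 - p z^-1)(1 - conj(p) z^-1) expands to these prediction coefficients:
a = np.array([2 * r * np.cos(theta), -r**2])

w = np.linspace(0, np.pi, 4096)               # digital frequency grid
z = np.exp(1j * w)
H = 1.0 / (1.0 - a[0] * z**-1 - a[1] * z**-2)

f_peak = w[np.abs(H).argmax()] * fe / (2 * np.pi)   # close to 1000 Hz
```

The sharper the resonance (radius closer to 1), the closer the magnitude peak sits to the pole angle, i.e. to the formant frequency.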
  • the n-th sample x(n) of the speech signal is a linear combination of the excitation signal sample e(n) and of the k preceding samples, which can be written: x(n) = e(n) + a(1)·x(n-1) + . . . + a(k)·x(n-k);
  • the coefficients a(j) are prediction coefficients; if the excitation sample e(n) is zero, it is possible in application of the above formula to predict the value x(n) of the speech signal sample with a prediction error whose value err(n) is given by the following: err(n) = x(n) - a(1)·x(n-1) - . . . - a(k)·x(n-k);
  • the k values of the k coefficients a(1), . . . , a(k) that enable the prediction error to be minimized are calculated over the samples x(n), which is equivalent to minimizing the quadratic error Q given by: Q = Σn err(n)^2;
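One classical way to minimize Q, standard in LPC though not mandated by the patent, is the autocorrelation method: the normal equations form a Toeplitz system solved in O(k^2) by the Levinson-Durbin recursion. A sketch, cross-checked against a direct solve of the same system (the autocorrelation values are illustrative):

```python
import numpy as np

def levinson_durbin(r, k):
    """Solve sum_j a(j) r(|i-j|) = r(i), i = 1..k, for prediction
    coefficients a(1..k), given autocorrelations r(0..k)."""
    a = np.zeros(k + 1)              # a[0] is a placeholder, unused
    err = r[0]                       # running prediction-error power
    for i in range(1, k + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        ki = acc / err               # reflection coefficient
        new_a = a.copy()
        new_a[i] = ki
        for j in range(1, i):
            new_a[j] = a[j] - ki * a[i - j]
        a, err = new_a, err * (1 - ki * ki)
    return a[1:], err

# Cross-check against solving the Toeplitz normal equations directly.
r = np.array([2.0, 1.0, 0.4])        # illustrative autocorrelations r(0..2)
T = np.array([[r[abs(i - j)] for j in range(2)] for i in range(2)])
direct = np.linalg.solve(T, r[1:3])
a_est, final_err = levinson_durbin(r, 2)
```

The recursion also yields the residual error power at each order for free, which is useful for choosing k.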
  • An output signal as deformed by the above simulation is synthesized from the interpolated excitation signal Ei and from the coefficients a(j) by the formula: s(n) = ei(n) + a(1)·s(n-1) + . . . + a(k)·s(n-k).
  • Simulation and recognition systems can be implemented, for example, using a PC microcomputer fitted with a motherboard based on an 80386 microprocessor (manufactured by Intel) clocked at 33 MHz and associated with an 80387 arithmetic coprocessor (same manufacturer).
  • a signal processing card installed in the PC microcomputer takes care of the hyperbaric simulation processing and of speech recognition.
  • Such a card is essentially made up of a TMS320C30 processor (manufactured by Texas Instruments), an analog input, and analog-to-digital and digital-to-analog converters operating at a maximum sampling frequency of 200 kHz.
  • Hyperbaric speech simulation consists in isolating words uttered in an ambient earth atmosphere and transforming them so as to simulate the same words as uttered in a hyperbaric atmosphere. This type of processing is not subject to real-time constraints; nevertheless, for reasons of computation time, it is performed by the signal processor card.
  • the recognition system, by contrast, must operate in real time; as a result, it is installed on and optimized for the signal processor card.
US08/379,870 1994-01-28 1995-01-27 Method and apparatus for recognizing deformed speech Expired - Lifetime US6006180A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR9401235A FR2715755B1 (fr) 1994-01-28 1994-01-28 Method and device for speech recognition.
FR9401235 1994-01-28

Publications (1)

Publication Number Publication Date
US6006180A true US6006180A (en) 1999-12-21

Family

ID=9459753

Family Applications (1)

Application Number Title Priority Date Filing Date
US08/379,870 Expired - Lifetime US6006180A (en) 1994-01-28 1995-01-27 Method and apparatus for recognizing deformed speech

Country Status (4)

Country Link
US (1) US6006180A (fr)
EP (1) EP0665531B1 (fr)
DE (1) DE69518674T2 (fr)
FR (1) FR2715755B1 (fr)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6233549B1 (en) * 1998-11-23 2001-05-15 Qualcomm, Inc. Low frequency spectral enhancement system and method
US6505152B1 (en) * 1999-09-03 2003-01-07 Microsoft Corporation Method and apparatus for using formant models in speech systems
US20030135362A1 (en) * 2002-01-15 2003-07-17 General Motors Corporation Automated voice pattern filter
US20070100630A1 (en) * 2002-03-04 2007-05-03 Ntt Docomo, Inc Speech recognition system, speech recognition method, speech synthesis system, speech synthesis method, and program product
US20170047080A1 (en) * 2014-02-28 2017-02-16 National Institute of Information and Communications Technology Speech intelligibility improving apparatus and computer program therefor
US11062708B2 (en) * 2018-08-06 2021-07-13 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for dialoguing based on a mood of a user

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10311913B3 (de) * 2003-03-17 2004-11-25 Forschungszentrum Jülich GmbH Method and device for the analysis of speech signals
DE102004046045B3 (de) * 2004-09-21 2005-12-29 Drepper, Friedhelm R., Dr. Method and device for the analysis of non-stationary speech signals

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3995116A (en) * 1974-11-18 1976-11-30 Bell Telephone Laboratories, Incorporated Emphasis controlled speech synthesizer
US4246617A (en) * 1979-07-30 1981-01-20 Massachusetts Institute Of Technology Digital system for changing the rate of recorded speech
US4342104A (en) * 1979-11-02 1982-07-27 University Court Of The University Of Edinburgh Helium-speech communication
US4566117A (en) * 1982-10-04 1986-01-21 Motorola, Inc. Speech synthesis system
US4624012A (en) * 1982-05-06 1986-11-18 Texas Instruments Incorporated Method and apparatus for converting voice characteristics of synthesized speech
US4852168A (en) * 1986-11-18 1989-07-25 Sprague Richard P Compression of stored waveforms for artificial speech
US4969192A (en) * 1987-04-06 1990-11-06 Voicecraft, Inc. Vector adaptive predictive coder for speech and audio
US5163110A (en) * 1990-08-13 1992-11-10 First Byte Pitch control in artificial speech
US5327521A (en) * 1992-03-02 1994-07-05 The Walt Disney Company Speech transformation system
US5528726A (en) * 1992-01-27 1996-06-18 The Board Of Trustees Of The Leland Stanford Junior University Digital waveguide speech synthesis system and method
US5577160A (en) * 1992-06-24 1996-11-19 Sumitomo Electric Industries, Inc. Speech analysis apparatus for extracting glottal source parameters and formant parameters


Non-Patent Citations (10)

* Cited by examiner, † Cited by third party
Title
Fifth International Conference on Electronics for Ocean Technology; Duncan, "Correction of the helium speech effect by short time autoregressive signal processing", pp. 125-130, Mar. 1987.
ICASSP 83, Boston, pp. 1160-1163, E. O. Belcher et al., "Helium Speech Enhancement by Frequency-Domain Processing", Jun. 1983.
IEEE Engineering in Medicine and Biology Magazine, vol. 12; Mackay, "Speaking and whistling with non-air gases", pp. 114-115, Dec. 1993.
The Radio and Electronic Engineer, vol. 52, No. 5, May 1982, pp. 211-223, M. A. Jack et al., "The Helium Speech Effect and Electronic Techniques for Enhancing Intelligibility in a Helium-Oxygen Environment".

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6233549B1 (en) * 1998-11-23 2001-05-15 Qualcomm, Inc. Low frequency spectral enhancement system and method
US6694291B2 (en) 1998-11-23 2004-02-17 Qualcomm Incorporated System and method for enhancing low frequency spectrum content of a digitized voice signal
US6505152B1 (en) * 1999-09-03 2003-01-07 Microsoft Corporation Method and apparatus for using formant models in speech systems
US6708154B2 (en) 1999-09-03 2004-03-16 Microsoft Corporation Method and apparatus for using formant models in resonance control for speech systems
US20030135362A1 (en) * 2002-01-15 2003-07-17 General Motors Corporation Automated voice pattern filter
US7003458B2 (en) * 2002-01-15 2006-02-21 General Motors Corporation Automated voice pattern filter
US20070100630A1 (en) * 2002-03-04 2007-05-03 Ntt Docomo, Inc Speech recognition system, speech recognition method, speech synthesis system, speech synthesis method, and program product
US7680666B2 (en) * 2002-03-04 2010-03-16 Ntt Docomo, Inc. Speech recognition system, speech recognition method, speech synthesis system, speech synthesis method, and program product
US20170047080A1 (en) * 2014-02-28 2017-02-16 National Institute of Information and Communications Technology Speech intelligibility improving apparatus and computer program therefor
US9842607B2 (en) * 2014-02-28 2017-12-12 National Institute Of Information And Communications Technology Speech intelligibility improving apparatus and computer program therefor
US11062708B2 (en) * 2018-08-06 2021-07-13 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for dialoguing based on a mood of a user

Also Published As

Publication number Publication date
DE69518674T2 (de) 2001-06-13
DE69518674D1 (de) 2000-10-12
FR2715755A1 (fr) 1995-08-04
EP0665531A1 (fr) 1995-08-02
FR2715755B1 (fr) 1996-04-12
EP0665531B1 (fr) 2000-09-06

Similar Documents

Publication Publication Date Title
Valbret et al. Voice transformation using PSOLA technique
US7765101B2 (en) Voice signal conversation method and system
Ye et al. Quality-enhanced voice morphing using maximum likelihood transformations
US7792672B2 (en) Method and system for the quick conversion of a voice signal
AU639394B2 (en) Speech synthesis using perceptual linear prediction parameters
EP0458859A1 (fr) Text-to-speech synthesis system and method using context-dependent vowel allophones.
JPH04313034A (ja) Synthesized speech generation method and text-to-speech synthesis apparatus
US6006180A (en) Method and apparatus for recognizing deformed speech
EP2507794B1 (fr) Darkened speech synthesis
CN113345415A (zh) Speech synthesis method, apparatus, device, and storage medium
EP0515709A1 (fr) Method and device for representing segmental units for text-to-speech conversion
JPH08248994A (ja) Voice quality conversion speech synthesis apparatus
Lerner Computers: Products that talk: Speech-synthesis devices are being incorporated into dozens of products as difficult technical problems are solved
US6829577B1 (en) Generating non-stationary additive noise for addition to synthesized speech
Venkatagiri et al. Digital speech synthesis: Tutorial
Iwano et al. Noise robust speech recognition using F0 contour extracted by Hough transform
Mittal et al. An impulse sequence representation of the excitation source characteristics of nonverbal speech sounds
Dhanoa et al. PERFORMANCE COMPARISON OF MFCC BASED TECHNIQUES FOR RECOGNITION OF SPOKEN HINDI WORDS
El-imam A personal computer-based speech analysis and synthesis system
JP2001100777A (ja) Speech synthesis method and apparatus
Smith Production Rate and Weapon System Cost: Research Review, Case Studies, and Planning Models
Mittal et al. A sparse representation of the excitation source characteristics of nonnormal speech sounds
Rodet Sound analysis, processing and synthesis tools for music research and production
Farrús et al. Speaker recognition robustness to voice conversion
Pai et al. A two-dimensional cepstrum approach for the recognition of Mandarin syllable initials

Legal Events

Date Code Title Description
STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

FPAY Fee payment

Year of fee payment: 12