WO2000016310A1 - Vorrichtung und Verfahren zur digitalen Sprachbearbeitung (Device and method for digital speech processing) - Google Patents
- Publication number
- WO2000016310A1 (PCT/EP1999/006712)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- generating
- melody
- speech
- generated
- modifying
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
Definitions
- the present invention relates to an apparatus and a method for digital speech processing or speech generation.
- Current systems for digital speech output have so far been used in environments in which a synthetic-sounding voice is acceptable.
- The present invention, by contrast, relates to a system which enables natural-sounding speech to be generated synthetically.
- the commands built into the text stream can also contain information on the characteristics of the speaker (i.e. parameters of the speaker model).
- EP 0762384 describes a system in which these speaker characteristics can be entered on the screen on a graphical user interface.
- the speech synthesis is carried out using auxiliary information which is stored in a database (for example as a “waveform sequence” in EP 0831460).
- rules for pronunciation must nevertheless be present in the program
- the composition of the individual sequences leads to distortions and acoustic artifacts if no measures are taken to suppress them.
- This problem (one speaks of "segmental quality") is considered largely solved today (cf. e.g. Volker Kraft: Linking natural language modules for speech synthesis: requirements, techniques and evaluation, Progr.-Ber. VDI series 10 no. 468, VDI-Verlag 1997), but modern speech synthesis systems still face a number of other problems.
- One problem in digital speech output is, for example, the ability to speak multiple languages.
- The applications range from the creation of simple texts for multimedia applications to film dubbing (synchronization), radio plays and audio books.
- The sentence melody generated from the text can be modified using an editor.
- the starting point is the written text. However, in order to achieve sufficient (in particular prosodic) quality and to achieve dramaturgical effects, the user is given extensive options for intervention in a preferred embodiment.
- the user is in the role of the director, who defines the speakers on the system and specifies the rhythm and sentence melody, pronunciation and emphasis.
- The present invention also includes generating a phonetic transcription for a written text and providing the possibility of modifying the generated phonetic transcription, or of generating the phonetic transcription based on modifiable rules. This can be used, for example, to give a speaker a special accent.
- The invention comprises a dictionary device in which the words of one or more languages are stored together with their pronunciation. In the latter case, this enables multilingual capability, i.e. the processing of texts in different languages.
- the generated phonetic transcription or sentence melody is preferably edited using an easy-to-use editor, for example a graphical user interface.
- Speech processing includes speaker models, which can either be predefined or defined or modified by the user. Characteristics of different speakers can be realized, be it male or female voices, or different accents of a speaker, such as a Bavarian, Swabian or North German accent.
- The device consists of a dictionary in which the pronunciation of all words is also stored in phonetic transcription. (Where phonetic transcription is mentioned below, this means any phonetic notation, such as the SAMPA notation, cf. e.g. "Multilingual speech input/output assessment, methodology and standardization, Standard computer-compatible transcription, pp. 29-31, in Esprit Project 2589 (SAM) Fin. Report SAM-UCC-037", or the international phonetic alphabet known from language teaching aids, cf. e.g. "The Principles of the International Phonetic Association: A description of the International Phonetic Alphabet and the Manner of Using it. International Phonetic Association, Dept. Phonetics, Univ."
- a translator which converts typed texts into phonetic transcription and generates a sentence melody
- an editor with which texts can be entered and speakers can be assigned, and in which both the generated phonetic transcription and the sentence melody can be displayed and changed
- an input module in which speaker models can be defined; a system for digital speech generation which generates, from the phonetic transcription together with the sentence melody, signals or data representing speech, and which is able to process various speaker models; a system of digital filters and other devices (for reverb, echo, etc.) with which special effects can be generated; a sound archive; and a mixing device in which the generated speech signals can be mixed together with sounds from the archive and effects can be added
- The invention can be implemented either as a hybrid of software and hardware, or entirely in software.
- the generated digital voice signals can be output via a special device for digital audio or via a PC sound card.
- FIG. 1 shows a block diagram of a device for digital speech generation according to an exemplary embodiment of the present invention.
- this consists of several individual components which can be implemented by means of one or more digital computing systems, the functioning and interaction of which is described in more detail below.
- the dictionary 100 consists of simple tables (one for each language) in which the words of a language are stored together with their pronunciation.
- the tables can be expanded to include additional words and their pronunciation.
- additional tables with different phonetic entries can also be created in one language.
- a table from the dictionary is assigned to each speaker.
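The table structure described above (one pronunciation table per language, with optional additional tables, one of which is assigned to each speaker) can be sketched roughly as follows. This is a minimal illustration; the class and method names and the SAMPA-like transcription strings are assumptions, not details from the patent.

```python
# Minimal sketch of the dictionary device 100: one pronunciation table per
# language, plus optional extra tables (e.g. per-speaker variants).
# Names and transcription strings are illustrative assumptions.

class Dictionary:
    def __init__(self):
        # language -> table name -> word -> phonetic transcription
        self.tables = {}

    def add_word(self, language, word, phonetic, table="default"):
        # Tables can be expanded at any time with additional words.
        self.tables.setdefault(language, {}).setdefault(table, {})[word] = phonetic

    def lookup(self, language, word, table="default"):
        # Returns None for unknown words.
        return self.tables.get(language, {}).get(table, {}).get(word)

d = Dictionary()
d.add_word("de", "sprache", "S p r a: x @")
d.add_word("en", "speech", "s p i: tS")
```

A speaker would then simply be associated with one (language, table) pair, matching the per-speaker table assignment described above.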
- The translator 110 generates the phonetic transcription by replacing the words of the entered text with their phonetic counterparts from the dictionary. If the speaker model contains modifiers, which are described in more detail later, the translator uses them to modify the pronunciation.
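The word-replacement step of the translator, including speaker-model modifiers, might look like this in outline. The function names, the pass-through fallback for unknown words, and the toy accent rule are assumptions for illustration; a real system would apply pronunciation rules to unknown words.

```python
# Sketch of the translator's word-replacement step: each word is replaced
# by its phonetic counterpart from the dictionary; modifier functions from
# the speaker model then rewrite the transcription (e.g. accent rules).
# All names and the modifier signature are illustrative assumptions.

def translate(text, pronunciations, modifiers=()):
    phonemes = []
    for word in text.lower().split():
        phonetic = pronunciations.get(word, word)  # unknown words pass through
        for modify in modifiers:                   # speaker-model modifiers
            phonetic = modify(phonetic)
        phonemes.append(phonetic)
    return " | ".join(phonemes)

# A toy "accent" modifier: a speaker who turns every "s" into "S" (sh).
sh_accent = lambda p: p.replace("s", "S")

lex = {"das": "d a s", "ist": "I s t"}
standard = translate("Das ist", lex)
accented = translate("Das ist", lex, [sh_accent])
```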
- Heuristics for deriving the sentence melody are, e.g., the Fujisaki (1992) model or other acoustic methods, and furthermore the perceptual models, e.g. that of d'Alessandro and Mertens (1995).
- These, as well as older linguistic models, are described e.g. in "Thierry Dutoit: An Introduction to Text-to-Speech Synthesis, Kluwer 1997".
- segmentation (setting breaks)
- the user has an instrument in his hand with which he can enter and change pronunciation, intonation, emphasis, tempo, volume, pauses, etc.
- The translator responds to this assignment by adapting the phonetics and, if necessary, the prosody to the speaker model and generating them anew.
- The phonetics are displayed to the user in phonetic transcription, the prosody e.g. in a symbolism taken from music (musical notation).
- the user then has the option of changing these specifications, listening to individual text sections and improving his entries again, etc.
- Speaker models 130 are, for example, parameterizations for speech generation.
- the models reproduce the characteristics of the human speech tract.
- the function of the vocal cords is represented by a pulse train, of which only the frequency (pitch) can be changed.
- the remaining characteristics (oral cavity, nasal cavity) of the speech tract are realized with digital filters.
- Their parameters are stored in the speaker model.
- Standard models are stored (child, young lady, old man, etc.). The user can generate additional models from them by selecting or changing the parameters appropriately and saving the model.
- the parameters stored here are used together with the prosody information for the intonation during the speech generation, which will be explained in more detail later.
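The source-filter idea described above (a pulse train whose only free parameter is the pitch, shaped by digital filters standing in for the oral and nasal cavities) can be sketched as follows. The one-pole low-pass is a deliberately crude stand-in for the real vocal-tract filters, and all names are illustrative assumptions.

```python
# Sketch of the source-filter model described above: the vocal cords are a
# pulse train (only the pitch is variable), the rest of the speech tract
# is realized with digital filters. A one-pole low-pass stands in for the
# tract filters here; its coefficient would come from the speaker model.

def pulse_train(pitch_hz, duration_s, sample_rate=8000):
    """Unit impulses spaced at the pitch period."""
    period = int(sample_rate / pitch_hz)
    n = int(round(duration_s * sample_rate))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def one_pole_lowpass(signal, a=0.9):
    """y[n] = (1-a)*x[n] + a*y[n-1]; a crude stand-in for tract resonances."""
    y, prev = [], 0.0
    for x in signal:
        prev = (1.0 - a) * x + a * prev
        y.append(prev)
    return y

source = pulse_train(100.0, 0.02)   # 100 Hz pitch, 20 ms -> two pulses
voiced = one_pole_lowpass(source)   # filtered "voiced" signal
```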
- A speaker model can, for example, relate to the rules according to which the translator generates the phonetic transcription; different speaker models can operate according to different rules. However, it can also correspond to a specific set of filter parameters, in order to process the speech signals in accordance with the speaker characteristics thus specified. Any combination of these two aspects of a speaker model is of course also conceivable.
- The task of the speech generation unit 140 is to produce, from the given text together with the additional phonetic and prosodic information generated by the translator and edited by the user, a numerical data stream representing the text to be output. This data stream can then be converted into analog sound signals by an output device 150, for example a digital audio device or a sound card in the PC.
- A conventional text-to-speech system can be used for speech generation.
- Rule-based synthesizers work with rules for generating the speech sounds and their transitions.
- Concatenation-based synthesizers are easier to use. They work with a database that stores all possible pairs of sounds. These can easily be linked, although high-quality systems require considerable computing time. Such systems are described in "Thierry Dutoit: An Introduction to Text-to-Speech Synthesis, Kluwer 1997" and in "Volker Kraft: Linking natural language building blocks for speech synthesis: requirements, techniques and evaluation. Progress Report VDI Series 10 No. 468, VDI-Verlag 1997".
- digital filters, e.g. band-pass filters for a telephone effect
- reverb generators, etc.
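A band-pass "telephone" effect of the kind mentioned above can be approximated as the difference of two one-pole low-passes. This is a sketch only: real systems would use properly designed filters (telephony band-limits speech to roughly 300–3400 Hz), and the coefficients below are illustrative assumptions.

```python
# Crude band-pass for a telephone-like effect: subtract a narrow low-pass
# from a wide one, which removes both DC/lows and (partly) highs.
# Coefficients are illustrative, not a designed telephony filter.

def one_pole(signal, a):
    y, prev = [], 0.0
    for x in signal:
        prev = (1.0 - a) * x + a * prev
        y.append(prev)
    return y

def telephone_effect(signal, a_wide=0.5, a_narrow=0.95):
    wide = one_pole(signal, a_wide)      # keeps more of the spectrum
    narrow = one_pole(signal, a_narrow)  # keeps only the lows
    return [w - n for w, n in zip(wide, narrow)]

filtered = telephone_effect([1.0] + [0.0] * 7)  # impulse response
```

Because the two low-passes have equal DC gain, the difference blocks DC, which is the qualitative behaviour a telephone band-pass needs.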
- The sound archive 170 stores sounds such as street noise, railroad, children shouting, ocean waves, background music, etc.
- The archive can be expanded with the user's own sounds.
- the archive can simply be a collection of files with digitized sounds, but it can also be a database in which the sounds are stored as blobs (binary large objects).
- The generated speech signals are mixed together with the background sounds.
- The volume of each signal can be regulated before mixing. It is also possible to add effects to each signal individually, or to all of them together.
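The mixing step, with per-signal volume control and optional per-signal effects, might look like this in outline. The track tuple layout (samples, gain, effect) and all names are assumptions for illustration.

```python
# Sketch of the mixing device 180: each track carries its own gain and an
# optional effect; the processed tracks are summed sample by sample.
# The (samples, gain, effect) layout is an illustrative assumption.

def mix(tracks):
    """tracks: list of (samples, gain, effect_or_None). Returns the summed signal."""
    length = max(len(samples) for samples, _, _ in tracks)
    out = [0.0] * length
    for samples, gain, effect in tracks:
        if effect is not None:
            samples = effect(samples)   # per-signal effect before mixing
        for i, x in enumerate(samples):
            out[i] += gain * x          # per-signal volume
    return out

speech = [0.5, 0.5, 0.5]
street_noise = [0.2, 0.2, 0.2, 0.2]        # e.g. taken from the sound archive
halve = lambda s: [x * 0.5 for x in s]     # a trivial stand-in "effect"
mixed = mix([(speech, 1.0, None), (street_noise, 0.5, halve)])
```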
- The signal generated in this way can be transferred to a suitable device for digital audio 150, for example a sound card of a PC, and can thus be checked acoustically or output.
- A storage device is provided to store the signal so that it can later be transferred to the target medium in a suitable manner.
- a device that is classically implemented in hardware can be used as a mixing device, or it can be implemented in software and integrated into the entire program.
- The output device 150 may be replaced by a further computer which is coupled to the mixing device 180 by means of a network connection; via a computer network, such as the Internet, the generated voice signals can be transferred to another computer. The speech signal generated by the speech generating device 140 can also be transmitted directly to the output device 150, without the detour via the mixing device 180. Further comparable modifications readily suggest themselves to the person skilled in the art.
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU60813/99A AU769036B2 (en) | 1998-09-11 | 1999-09-10 | Device and method for digital voice processing |
EP99947314A EP1110203B1 (de) | 1998-09-11 | 1999-09-10 | Vorrichtung und verfahren zur digitalen sprachbearbeitung |
AT99947314T ATE222393T1 (de) | 1998-09-11 | 1999-09-10 | Vorrichtung und verfahren zur digitalen sprachbearbeitung |
DE59902365T DE59902365D1 (de) | 1998-09-11 | 1999-09-10 | Vorrichtung und verfahren zur digitalen sprachbearbeitung |
CA002343071A CA2343071A1 (en) | 1998-09-11 | 1999-09-10 | Device and method for digital voice processing |
JP2000570766A JP2002525663A (ja) | 1998-09-11 | 1999-09-10 | ディジタル音声処理装置及び方法 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE19841683.0 | 1998-09-11 | ||
DE19841683A DE19841683A1 (de) | 1998-09-11 | 1998-09-11 | Vorrichtung und Verfahren zur digitalen Sprachbearbeitung |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2000016310A1 true WO2000016310A1 (de) | 2000-03-23 |
Family
ID=7880683
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP1999/006712 WO2000016310A1 (de) | 1998-09-11 | 1999-09-10 | Vorrichtung und verfahren zur digitalen sprachbearbeitung |
Country Status (7)
Country | Link |
---|---|
EP (1) | EP1110203B1 (de) |
JP (1) | JP2002525663A (de) |
AT (1) | ATE222393T1 (de) |
AU (1) | AU769036B2 (de) |
CA (1) | CA2343071A1 (de) |
DE (2) | DE19841683A1 (de) |
WO (1) | WO2000016310A1 (de) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002318593A (ja) * | 2001-04-20 | 2002-10-31 | Sony Corp | 言語処理装置および言語処理方法、並びにプログラムおよび記録媒体 |
US7167824B2 (en) | 2002-02-14 | 2007-01-23 | Sail Labs Technology Ag | Method for generating natural language in computer-based dialog systems |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE10117367B4 (de) * | 2001-04-06 | 2005-08-18 | Siemens Ag | Verfahren und System zur automatischen Umsetzung von Text-Nachrichten in Sprach-Nachrichten |
DE10207875A1 (de) * | 2002-02-19 | 2003-08-28 | Deutsche Telekom Ag | Parametergesteuerte Sprachsynthese |
WO2005088606A1 (en) * | 2004-03-05 | 2005-09-22 | Lessac Technologies, Inc. | Prosodic speech text codes and their use in computerized speech systems |
DE102004012208A1 (de) * | 2004-03-12 | 2005-09-29 | Siemens Ag | Individualisierung von Sprachausgabe durch Anpassen einer Synthesestimme an eine Zielstimme |
DE102008044635A1 (de) | 2008-07-22 | 2010-02-04 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Vorrichtung und Verfahren zum Bereitstellen einer Fernsehsequenz |
US10424288B2 (en) | 2017-03-31 | 2019-09-24 | Wipro Limited | System and method for rendering textual messages using customized natural voice |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1996008813A1 (fr) * | 1994-09-12 | 1996-03-21 | Arcadia, Inc. | Convertisseur de caracteristiques sonores, dispositif d'association son/marque et leur procede de realisation |
US5559927A (en) * | 1992-08-19 | 1996-09-24 | Clynes; Manfred | Computer system producing emotionally-expressive speech messages |
EP0762384A2 (de) * | 1995-09-01 | 1997-03-12 | AT&T IPM Corp. | Verfahren und Vorrichtung zur Veränderung von Stimmeigenschaften synthetisch erzeugter Sprache |
DE19610019A1 (de) * | 1996-03-14 | 1997-09-18 | Data Software Gmbh G | Digitales Sprachsyntheseverfahren |
US5956685A (en) * | 1994-09-12 | 1999-09-21 | Arcadia, Inc. | Sound characteristic converter, sound-label association apparatus and method therefor |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS5695295A (en) * | 1979-12-28 | 1981-08-01 | Sharp Kk | Voice sysnthesis and control circuit |
FR2494017B1 (fr) * | 1980-11-07 | 1985-10-25 | Thomson Csf | Procede de detection de la frequence de melodie dans un signal de parole et dispositif destine a la mise en oeuvre de ce procede |
JPS58102298A (ja) * | 1981-12-14 | 1983-06-17 | キヤノン株式会社 | 電子機器 |
US4623761A (en) * | 1984-04-18 | 1986-11-18 | Golden Enterprises, Incorporated | Telephone operator voice storage and retrieval system |
DE19503419A1 (de) * | 1995-02-03 | 1996-08-08 | Bosch Gmbh Robert | Verfahren und Einrichtung zur Ausgabe von digital codierten Verkehrsmeldungen mittels synthetisch erzeugter Sprache |
JPH08263094A (ja) * | 1995-03-10 | 1996-10-11 | Winbond Electron Corp | メロディを混合した音声を発生する合成器 |
JP3616250B2 (ja) * | 1997-05-21 | 2005-02-02 | 日本電信電話株式会社 | 合成音声メッセージ作成方法、その装置及びその方法を記録した記録媒体 |
US6226614B1 (en) * | 1997-05-21 | 2001-05-01 | Nippon Telegraph And Telephone Corporation | Method and apparatus for editing/creating synthetic speech message and recording medium with the method recorded thereon |
-
1998
- 1998-09-11 DE DE19841683A patent/DE19841683A1/de not_active Withdrawn
-
1999
- 1999-09-10 CA CA002343071A patent/CA2343071A1/en not_active Abandoned
- 1999-09-10 JP JP2000570766A patent/JP2002525663A/ja not_active Withdrawn
- 1999-09-10 AT AT99947314T patent/ATE222393T1/de not_active IP Right Cessation
- 1999-09-10 DE DE59902365T patent/DE59902365D1/de not_active Expired - Fee Related
- 1999-09-10 EP EP99947314A patent/EP1110203B1/de not_active Expired - Lifetime
- 1999-09-10 AU AU60813/99A patent/AU769036B2/en not_active Ceased
- 1999-09-10 WO PCT/EP1999/006712 patent/WO2000016310A1/de active IP Right Grant
Also Published As
Publication number | Publication date |
---|---|
DE59902365D1 (de) | 2002-09-19 |
AU769036B2 (en) | 2004-01-15 |
EP1110203A1 (de) | 2001-06-27 |
DE19841683A1 (de) | 2000-05-11 |
AU6081399A (en) | 2000-04-03 |
EP1110203B1 (de) | 2002-08-14 |
CA2343071A1 (en) | 2000-03-23 |
JP2002525663A (ja) | 2002-08-13 |
ATE222393T1 (de) | 2002-08-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP0886853B1 (de) | Auf mikrosegmenten basierendes sprachsyntheseverfahren | |
DE60216069T2 (de) | Sprache-zu-sprache erzeugungssystem und verfahren | |
DE69821673T2 (de) | Verfahren und Vorrichtung zum Editieren synthetischer Sprachnachrichten, sowie Speichermittel mit dem Verfahren | |
Jilka | The contribution of intonation to the perception of foreign accent | |
DE60112512T2 (de) | Kodierung von Ausdruck in Sprachsynthese | |
DE60035001T2 (de) | Sprachsynthese mit Prosodie-Mustern | |
DE69909716T2 (de) | Formant Sprachsynthetisierer unter Verwendung von Verkettung von Halbsilben mit unabhängiger Überblendung im Filterkoeffizienten- und Quellenbereich | |
DE60118874T2 (de) | Prosodiemustervergleich für Text-zu-Sprache Systeme | |
DE112004000187T5 (de) | Verfahren und Vorrichtung der prosodischen Simulations-Synthese | |
EP3010014B1 (de) | Verfahren zur interpretation von automatischer spracherkennung | |
EP1105867B1 (de) | Verfahren und vorrichtungen zur koartikulationsgerechten konkatenation von audiosegmenten | |
EP1110203B1 (de) | Vorrichtung und verfahren zur digitalen sprachbearbeitung | |
EP0058130B1 (de) | Verfahren zur Synthese von Sprache mit unbegrenztem Wortschatz und Schaltungsanordnung zur Durchführung des Verfahrens | |
EP1344211B1 (de) | Vorrichtung und verfahren zur differenzierten sprachausgabe | |
DE60311482T2 (de) | Verfahren zur steuerung der dauer bei der sprachsynthese | |
JP2577372B2 (ja) | 音声合成装置および方法 | |
DE19837661C2 (de) | Verfahren und Vorrichtung zur koartikulationsgerechten Konkatenation von Audiosegmenten | |
EP3144929A1 (de) | Synthetische erzeugung eines natürlich klingenden sprachsignals | |
EP1212748A1 (de) | Digitales sprachsyntheseverfahren mit intonationsnachbildung | |
WO2023222287A1 (de) | Sprachsynthesizer und verfahren zur sprachsynthese | |
EP2325836A1 (de) | Verfahren und System für das Training von Sprachverarbeitungseinrichtungen | |
Vanderslice et al. | Synthetic Intonation. | |
DE10334105A1 (de) | Verfahren zur Generierung von Gesichts-Animationsparametern zur Darstellung gesprochener Sprache mittels graphischer Computermodelle | |
DE3314674A1 (de) | Sprachsynthesator mit variabler rate | |
DE2306816A1 (de) | Sprachgenerator |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AU CA JP US |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
ENP | Entry into the national phase |
Ref document number: 2343071 Country of ref document: CA Ref country code: CA Ref document number: 2343071 Kind code of ref document: A Format of ref document f/p: F |
|
ENP | Entry into the national phase |
Ref country code: JP Ref document number: 2000 570766 Kind code of ref document: A Format of ref document f/p: F |
|
WWE | Wipo information: entry into national phase |
Ref document number: 09786888 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 1999947314 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 60813/99 Country of ref document: AU |
|
WWP | Wipo information: published in national office |
Ref document number: 1999947314 Country of ref document: EP |
|
WWG | Wipo information: grant in national office |
Ref document number: 1999947314 Country of ref document: EP |
|
WWG | Wipo information: grant in national office |
Ref document number: 60813/99 Country of ref document: AU |