EP1110203B1 - Device and method for digital voice processing - Google Patents

Device and method for digital voice processing

Info

Publication number
EP1110203B1
EP1110203B1 (application EP99947314A)
Authority
EP
European Patent Office
Prior art keywords
prosody
generating
speaker
speech
generated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
EP99947314A
Other languages
German (de)
French (fr)
Other versions
EP1110203A1 (en)
Inventor
Hans Kull
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Publication of EP1110203A1
Application granted
Publication of EP1110203B1
Anticipated expiration
Expired - Lifetime

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 — Speech synthesis; Text to speech systems
    • G10L13/02 — Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 — Voice editing, e.g. manipulating the voice of the synthesiser

Definitions

  • The present invention relates to an apparatus and a method for digital speech processing and speech generation.
  • Current systems for digital voice output have so far been used in environments in which a synthetic voice is acceptable or even desired.
  • The present invention relates to a system that makes it possible to generate natural-sounding speech synthetically.
  • The commands embedded in the text stream can also contain information about the characteristics of the speaker (i.e. parameters of the speaker model).
  • EP 0762384 describes a system in which these speaker characteristics can be entered on screen via a graphical user interface.
  • The speech synthesis takes place using auxiliary information stored in a database (e.g. as a "waveform sequence" in EP 0831460).
  • The concatenation of the individual sequences leads to distortions and acoustic artifacts if no measures are taken to suppress them.
  • This problem (known as "segmental quality") is considered largely solved today (see e.g. Volker Kraft: Linking Natural-Language Building Blocks for Speech Synthesis: Requirements, Techniques and Evaluation. Fortschr.-Ber. VDI Series 10 No. 468, VDI-Verlag 1997). Nevertheless, even modern speech synthesis systems still exhibit a number of further problems.
  • One problem in digital speech output is, for example, multi-language capability.
  • Another problem is the improvement of prosodic quality, i.e. the quality of the intonation; compare for example Volker Kraft: Linking Natural-Language Building Blocks for Speech Synthesis: Requirements, Techniques and Evaluation. Fortschr.-Ber. VDI Series 10 No. 468, VDI-Verlag 1997.
  • The difficulty is due to the fact that the intonation can only be reconstructed inadequately from the orthographic input information. It also depends on higher levels such as semantics and pragmatics, as well as the speaker situation and type of speaker.
  • DE-A-196 10 019 discloses a method and an apparatus for digital speech processing with a sentence-melody generation device for generating a sentence melody for a text. It also makes it possible to display the speech signals in the time and frequency domains and, by marking the time signal, to change the fundamental frequencies of certain segments.
  • The applications range from the creation of simple texts for multimedia applications up to film scoring (dubbing), radio plays, and audiobooks.
  • A further object of the present invention is therefore to provide such intervention options.
  • The object of the invention is achieved in that the sentence melody generated for a text can be modified by means of an editor.
  • Particular embodiments of the invention allow, in addition to editing the sentence melody, editing of further characteristics of the synthetically generated speech.
  • The starting point is the written text. However, in order to achieve sufficient (especially prosodic) quality, as well as dramaturgical effects, the user is given extensive options for intervention in a preferred embodiment.
  • The user acts as a director who defines the speakers on the system and prescribes their speaking rhythm, sentence melody, pronunciation and emphasis.
  • The present invention also includes generating a phonetic transcription for a written text, as well as providing the possibility to modify the generated phonetic transcription, or to generate the phonetic transcription based on modifiable rules.
  • In this way, for example, a particular accent of a speaker can be generated.
  • The invention comprises a dictionary facility in which the words of one or more languages are stored together with their pronunciation. In the latter case, this enables multi-language capability, i.e. the editing of texts in different languages.
  • The speech processing incorporates speaker models, which are either predefined or can be defined or modified by the user. This allows characteristics of different speakers to be realized, be they male or female voices, or different accents of a speaker, such as a Bavarian, Swabian or North German accent.
  • The device consists of a dictionary in which the pronunciation of all words is also stored in phonetic transcription (where phonetic transcription is mentioned below, this means any phonetic notation, such as the SAMPA notation, cf. e.g. "Multilingual speech input/output assessment, methodology and standardization, standard computer-compatible transcription, pp. 29-31, in Esprit Project 2589 (SAM) Final Report SAM-UCC-037", or the international phonetic alphabet known from language teaching aids, cf. e.g. "The Principles of the International Phonetic Association: A description of the International Phonetic Alphabet and the Manner of Using it. International Phonetic Association, Dept. Phonetics, Univ. College of London").
  • The invention can be realized either as a hybrid of software and hardware or entirely in software.
  • The generated digital speech signals can be output via a special device for digital audio or via a PC sound card.
  • Figure 1 shows a block diagram of a device for digital speech generation according to an embodiment of the present invention.
  • In the embodiment described below, the invention consists of several individual components which can be realized by means of one or more digital computing systems, and whose operation and interaction are described in more detail below.
  • The dictionary 100 consists of simple tables (one for each language) in which the words of a language are stored together with their pronunciation.
  • The tables can be expanded as required to include additional words and their pronunciation.
  • For special purposes, e.g. for creating accents, additional tables with different phonetic entries can also be generated within one language. Each speaker is assigned one table of the dictionary.
  • The translator 110 generates the phonetic transcription by replacing the words of the entered text with their phonetic counterparts from the dictionary. If modifiers, which are described in more detail later, are stored in the speaker model, it uses them to modify the pronunciation.
  • In addition, it generates the prosody using heuristics known in speech processing; such heuristics are e.g. the model by Fujisaki (1992) or other acoustic methods, as well as the perceptual models, e.g. that of d'Alessandro and Mertens (1995).
  • These, but also older linguistic models, are described e.g. in "Thierry Dutoit: An Introduction to Text-to-Speech Synthesis, Kluwer 1997".
  • With the editor, the user has an instrument at hand with which pronunciation, intonation, emphasis, tempo, volume, pauses, etc. can be entered and changed.
  • He assigns a speaker model 130 to the text sections to be processed; its structure and mode of operation are explained in more detail later.
  • The translator responds to this assignment by adapting the phonetics and, if necessary, the prosody to the speaker model and regenerating them.
  • The phonetics are displayed to the user in phonetic transcription, the prosody e.g. in a symbolism taken from music (musical notation).
  • The user then has the option of changing these specifications, listening to individual sections of text, improving his entries again, etc.
  • Speaker models 130 are, for example, parameterizations for speech production.
  • The function of the vocal cords is represented by a pulse sequence, of which only the frequency (pitch) can be changed.
  • The other characteristics (oral cavity, nasal cavity) of the vocal tract are captured by digital filters.
  • Their parameters are stored in the speaker model. Standard models are provided (child, young lady, old man, etc.).
  • The user can generate additional models from them by suitably choosing or changing the parameters and saving the model.
  • The parameters stored here are used, together with the prosody information for intonation, during speech generation, which will be explained in more detail later.
  • A speaker model can, for example, concern the rules according to which the translator creates the phonetic transcription; different speaker models can follow different rules. However, it can also correspond to a certain set of filter parameters by which the speech signals are processed in accordance with the speaker characteristics specified thereby. Of course, any combination of these two aspects is also conceivable as a speaker model.
  • The task of the speech generation unit 140 is to generate, from the predefined text together with the phonetic and prosodic additional information created by the translator and edited by the user, a numerical data stream that represents digital speech signals.
  • This data stream can then be converted by an output device 150, such as a digital audio device or a sound card in the PC, into analog sound signals for output.
  • A conventional text-to-speech conversion procedure can be used for speech generation, but with the pronunciation and the sentence melody already created. Generally, one differentiates between rule-based and concatenation-based synthesizers.
  • Concatenation-based synthesizers are easier to use. They work with a database that stores all possible pairs of sounds. These can easily be concatenated, but high-quality systems have high computing-time requirements. Such systems are described in "Thierry Dutoit: An Introduction to Text-to-Speech Synthesis, Kluwer 1997" and in "Volker Kraft: Linking Natural-Language Building Blocks for Speech Synthesis: Requirements, Techniques and Evaluation. Fortschr.-Ber. VDI Series 10 No. 468, VDI-Verlag 1997".
  • Digital filters, e.g. a bandpass filter for a telephone effect
  • Reverb generators, etc.
  • Sounds stored in an archive 170 can also be used.
  • In the archive 170, sounds such as road noise, railroads, children crying, ocean waves, background music, etc. are stored.
  • The archive can be expanded with the user's own sounds.
  • The archive can simply be a collection of files with digitized sounds, but it can also be a database in which the sounds are stored as BLOBs (binary large objects).
  • In the mixing device 180, the generated speech signals are combined with the background noise.
  • The volume of all signals can be regulated during composition. It is also possible to provide each signal individually, or all of them, with effects.
  • The signal generated in this way can be sent to a suitable device for digital audio 150, such as the sound card of a PC, and thus be acoustically checked or output.
  • A storage device (not shown) is provided to store the signal so that it can later be appropriately transferred to the target medium.
  • A device classically implemented in hardware can be used as the mixing device, or it can be implemented in software and integrated into the overall program.
  • The output device 150 can be replaced by another computer which is coupled to the mixing device 180 by a network connection.
  • Via a computer network such as the Internet, the generated speech signal can thus be output on another computer.
  • The speech signal generated by the speech generator 140 can also be transmitted directly to the output device 150 without the detour via the mixing device 180. Further comparable modifications occur readily to the person skilled in the art.
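The source-filter speaker model described in the points above (vocal cords as a pulse sequence whose only free parameter is the pitch, and the vocal tract as digital filters whose parameters are stored in the speaker model) can be sketched as follows. This is an illustrative toy, not the patent's implementation; the filter coefficients, sample rate and pitch values are invented examples.

```python
import math

def pulse_train(pitch_hz, duration_s, sample_rate=8000):
    # Vocal-cord model: an impulse every pitch period; only the pitch varies.
    period = int(sample_rate / pitch_hz)
    n = int(duration_s * sample_rate)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def vocal_tract(excitation, a=(0.6, -0.2)):
    # Simple all-pole (IIR) filter standing in for oral/nasal cavity
    # resonances; the coefficients `a` would come from the speaker model.
    y = []
    for x in excitation:
        for k, ak in enumerate(a, start=1):
            if len(y) >= k:
                x += ak * y[-k]
        y.append(x)
    return y

# A "young lady" model might use a higher pitch than an "old man" model.
signal = vocal_tract(pulse_train(pitch_hz=220, duration_s=0.05))
```

Changing only `pitch_hz` changes the perceived voice height, while changing the filter coefficients changes the timbre, which mirrors the separation the speaker model makes between pulse sequence and filter parameters.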

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention relates to a device for digital voice processing which comprises a sentence melody generating device for generating a sentence melody for a text, and an editing device for displaying and modifying the generated sentence melody.

Description

The present invention relates to an apparatus and a method for digital speech processing and speech generation. Current systems for digital voice output have so far been used in environments in which a synthetic voice is acceptable or even desired. The present invention, by contrast, relates to a system that makes it possible to generate natural-sounding speech synthetically.

In current systems for digital speech generation, the information about the sentence melody and the emphasis is generated automatically, as described e.g. in EP 0689706. In some systems it is possible to embed additional commands in the text stream before it is handed over to the speech generator, e.g. in EP 0598599. These commands are entered e.g. as (non-pronounceable) special characters, as described in EP 0598598.

The commands embedded in the text stream can also contain information about the characteristics of the speaker (i.e. parameters of the speaker model). EP 0762384 describes a system in which these speaker characteristics can be entered on screen via a graphical user interface.

The speech synthesis takes place using auxiliary information stored in a database (e.g. as a "waveform sequence" in EP 0831460). For the pronunciation of words that are not stored in the database, however, pronunciation rules must still be present in the program. The concatenation of the individual sequences leads to distortions and acoustic artifacts if no measures are taken to suppress them. This problem (known as "segmental quality") is considered largely solved today (cf. e.g. Volker Kraft: Linking Natural-Language Building Blocks for Speech Synthesis: Requirements, Techniques and Evaluation. Fortschr.-Ber. VDI Series 10 No. 468, VDI-Verlag 1997). Nevertheless, even modern speech synthesis systems still exhibit a number of further problems.

One problem in digital speech output is, for example, multi-language capability.

Another problem is the improvement of prosodic quality, i.e. the quality of the intonation; compare for example Volker Kraft: Linking Natural-Language Building Blocks for Speech Synthesis: Requirements, Techniques and Evaluation. Fortschr.-Ber. VDI Series 10 No. 468, VDI-Verlag 1997. The difficulty is due to the fact that the intonation can only be reconstructed inadequately from the orthographic input information. It also depends on higher levels such as semantics and pragmatics, as well as the speaker situation and type of speaker.

DE-A-196 10 019 discloses a method and an apparatus for digital speech processing with a sentence-melody generation device for generating a sentence melody for a text. It also makes it possible to display the speech signals in the time and frequency domains and, by marking the time signal, to change the fundamental frequencies of certain segments.

In general it can be said that the quality of today's speech output systems meets the requirements where the listener expects or accepts a synthetic voice. In many cases, however, the quality of synthetic speech is perceived as insufficient or unsatisfactory.

It is therefore an object of the present invention to provide an apparatus and a method for digital speech processing that make it possible to produce synthetic speech of better quality.

It is a further object of the invention to generate natural-sounding speech synthetically. The applications range from the creation of simple texts for multimedia applications up to film scoring (dubbing), radio plays, and audiobooks.

Even if the synthetically generated speech sounds natural, intervention options are sometimes required for the creation of dramaturgical effects. A further object of the present invention is therefore to provide such intervention options.

The present invention is defined in the independent claims. The dependent claims define particular embodiments of the invention.

Essentially, the object of the invention is achieved in that the sentence melody generated for a text can be modified by means of an editor.

Particular embodiments of the invention allow, in addition to editing the sentence melody, editing of further characteristics of the synthetically generated speech.

The starting point is the written text. However, in order to achieve sufficient (especially prosodic) quality, as well as dramaturgical effects, the user is given extensive options for intervention in a preferred embodiment. The user acts as a director who defines the speakers on the system and prescribes their speaking rhythm, sentence melody, pronunciation and emphasis.

Preferably, the present invention also includes generating a phonetic transcription for a written text, as well as providing the possibility to modify the generated phonetic transcription, or to generate the phonetic transcription based on modifiable rules. In this way, for example, a particular accent of a speaker can be generated.

In a further preferred embodiment, the invention comprises a dictionary facility in which the words of one or more languages are stored together with their pronunciation. In the latter case, this enables multi-language capability, i.e. the editing of texts in different languages.

Preferably, the generated phonetic transcription or sentence melody is edited by means of an easy-to-use editor, such as a graphical user interface.

In a further preferred embodiment, speaker models are incorporated into the speech processing; these can be either predefined or defined or modified by the user. This allows characteristics of different speakers to be realized, be they male or female voices, or different accents of a speaker, such as a Bavarian, Swabian or North German accent.

In a particularly preferred embodiment, the device consists of: a dictionary in which the pronunciation of all words is also stored in phonetic transcription (where phonetic transcription is mentioned below, this means any phonetic notation, such as the SAMPA notation, cf. e.g. "Multilingual speech input/output assessment, methodology and standardization, standard computer-compatible transcription, pp. 29-31, in Esprit Project 2589 (SAM) Final Report SAM-UCC-037", or the international phonetic alphabet known from language teaching aids, cf. e.g. "The Principles of the International Phonetic Association: A description of the International Phonetic Alphabet and the Manner of Using it. International Phonetic Association, Dept. Phonetics, Univ. College of London"); a translator that converts entered texts into phonetic transcription and generates a sentence melody; an editor with which texts can be entered and speakers assigned, and in which both the generated phonetic transcription and the sentence melody can be displayed and changed; an input module in which speaker models can be defined; a system for digital speech generation that produces, from the phonetic transcription together with the sentence melody, signals representing spoken speech, or data representing such signals, and that is able to process different speaker models; a system of digital filters and other devices (for reverb, echo, etc.) with which special effects can be generated; a sound archive; and a mixing device in which the generated speech signals can be mixed together with sounds from the archive and provided with effects.
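The component chain enumerated in the preceding paragraph can be illustrated with a toy data-flow sketch. This is a hedged illustration only: every function is an invented stand-in for the corresponding unit of the embodiment, and the numeric "signals" are placeholders for real PCM streams.

```python
LEXICON = {"hallo": "halo:"}                      # dictionary (toy table)

def translate(text):                              # translator
    return [LEXICON[w] for w in text.lower().split()]

def synthesize(phonemes):                         # speech generation (stub)
    # one "sample" per phoneme symbol, standing in for a digital speech stream
    return [1.0] * sum(len(p) for p in phonemes)

def telephone_effect(signal):                     # digital filter stage (stub)
    return [0.5 * s for s in signal]

def mix(signals, gains):                          # mixing device
    n = max(len(s) for s in signals)
    out = [0.0] * n
    for sig, g in zip(signals, gains):
        for i, s in enumerate(sig):
            out[i] += g * s
    return out

speech = telephone_effect(synthesize(translate("Hallo")))
noise = [0.1] * len(speech)                       # from the sound archive (stub)
master = mix([speech, noise], gains=[1.0, 0.5])
```

The point of the sketch is the ordering: dictionary lookup and translation happen before synthesis, effects are applied per signal, and the mixing device combines speech and archive sounds with individually adjustable volumes.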

The invention can be realized either as a hybrid of software and hardware or entirely in software. The generated digital speech signals can be output via a special device for digital audio or via a PC sound card.

The present invention is described in detail below with reference to several embodiments and to the accompanying drawing.

Figure 1 shows a block diagram of a device for digital speech generation according to an embodiment of the present invention.

In the embodiment of the present invention described below, it consists of several individual components which can be realized by means of one or more digital computing systems, and whose operation and interaction are described in more detail below.

The dictionary 100 consists of simple tables (one for each language) in which the words of a language are stored together with their pronunciation. The tables can be expanded as required to include additional words and their pronunciation. For special purposes, e.g. for creating accents, additional tables with different phonetic entries can also be generated within one language. Each speaker is assigned one table of the dictionary.
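One way such per-language pronunciation tables with accent variants and per-speaker assignment could be organized is sketched below. All class and table names, and the SAMPA-like entries, are invented for illustration; they are not the patent's actual data structures.

```python
class Dictionary:
    """Toy model of dictionary 100: per-language tables of pronunciations."""

    def __init__(self):
        self.tables = {}         # tables[language][table_name] -> {word: sampa}
        self.speaker_table = {}  # speaker -> (language, table_name)

    def add_table(self, language, table="standard"):
        self.tables.setdefault(language, {})[table] = {}

    def add_word(self, language, word, sampa, table="standard"):
        # Tables can be expanded as required with additional words.
        self.tables[language][table][word] = sampa

    def assign_speaker(self, speaker, language, table="standard"):
        # Each speaker is assigned exactly one table of the dictionary.
        self.speaker_table[speaker] = (language, table)

    def lookup(self, speaker, word):
        language, table = self.speaker_table[speaker]
        return self.tables[language][table].get(word)

d = Dictionary()
d.add_table("de")
d.add_table("de", "bavarian")            # accent variant within one language
d.add_word("de", "sprache", "SpRa:x@")
d.add_word("de", "sprache", "SpRo:x@", table="bavarian")
d.assign_speaker("narrator", "de")
d.assign_speaker("farmer", "de", "bavarian")
```

Because accents live in separate tables of the same language, assigning a speaker a different table is enough to change every lookup for that speaker.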

The translator 110 firstly generates the phonetic transcription by replacing the words of the entered text with their phonetic counterparts from the dictionary. If modifiers, which are described in more detail later, are stored in the speaker model, it uses them to modify the pronunciation.
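A minimal sketch of this replacement-plus-modifier step follows, assuming a modifier is a simple substitution rule stored in the speaker model; the lexicon entries and the rule are invented SAMPA-like examples, not the patent's actual notation.

```python
lexicon = {"guten": "gu:t@n", "morgen": "mOrg@n"}   # dictionary excerpt (toy)

# Hypothetical speaker-model modifier: colour unstressed "@n" endings
# towards a dialect vowel.
speaker_modifiers = [("@n", "a")]

def translate(text):
    # 1) replace each word with its phonetic counterpart from the dictionary
    phonemes = [lexicon[w] for w in text.lower().split()]
    # 2) apply the speaker model's pronunciation modifiers
    for old, new in speaker_modifiers:
        phonemes = [p.replace(old, new) for p in phonemes]
    return " ".join(phonemes)

print(translate("Guten Morgen"))  # gu:ta mOrga
```

With an empty `speaker_modifiers` list the same code yields the unmodified standard pronunciation, which is how different speaker models could share one dictionary table.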

In addition, the translator generates the prosody using heuristics known from speech processing. Such heuristics include the model of Fujisaki (1992) and other acoustic methods, as well as perceptual models such as that of d'Alessandro and Mertens (1995). These, together with older linguistic models, are described for example in "Thierry Dutoit: An Introduction to Text-to-Speech Synthesis, Kluwer 1997". That reference also covers methods for segmentation (the placing of pauses), which is likewise generated by the translator.
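As a rough, hedged stand-in for the cited prosody models (not an implementation of Fujisaki or d'Alessandro/Mertens), a default contour could place each word on a falling declination line and insert pause markers at punctuation for segmentation; `default_prosody` and the `<pause>` marker are illustrative assumptions:

```python
def default_prosody(words, f0_start=180.0, decline=3.0):
    """Assign each word a pitch target on a falling declination line
    and mark pauses after punctuation (segmentation). A deliberately
    simple default: the user can still edit the result."""
    contour = []
    for i, word in enumerate(words):
        contour.append((word, f0_start - i * decline))
        if word.endswith((",", ".", "?", "!")):
            contour.append(("<pause>", 0.0))
    return contour
```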

The choice of method is of minor importance here, since the translator merely produces a default prosody, which the user can still modify.

The editor 120 gives the user an instrument with which to enter and change pronunciation, intonation, stress, tempo, volume, pauses and so on.

First, the user assigns a speaker model 130, whose structure and operation are explained in more detail later, to the text sections to be processed. The translator responds to this assignment by adapting the phonetics and, where necessary, the prosody to the speaker model and regenerating them. The phonetics are displayed to the user in phonetic transcription, the prosody e.g. in a symbolic notation borrowed from music (musical notation). The user then has the option of modifying these defaults, listening to individual text sections, refining the input further, and so on.

Of course, texts can also be entered in the editor itself if they cannot be imported directly from another word-processing system.

Speaker models 130 are, for example, parameterizations for speech generation. The models reproduce the characteristics of the human vocal tract. The function of the vocal cords is represented by a pulse train, of which only the frequency (pitch) can be varied. The remaining characteristics of the vocal tract (oral cavity, nasal cavity) are realized with digital filters, whose parameters are stored in the speaker model. Standard models are provided (child, young lady, old man, etc.). The user can derive additional models from them by suitably choosing or modifying the parameters and saving the model. The parameters stored here are used during speech generation, explained in more detail later, together with the prosody information for the intonation.
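The source-filter idea above — a pulse train for the vocal cords, digital filters for the rest of the tract — can be sketched minimally; the one-pole filter, the parameter names and the example models are illustrative assumptions, not the filters actually used:

```python
def impulse_train(pitch_hz, n_samples, sr=16000):
    """Vocal-cord excitation as a pulse train; only the frequency
    (pitch) is variable, per the speaker model."""
    period = int(sr / pitch_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]

def one_pole_filter(signal, a):
    """Crude stand-in for the vocal-tract digital filters; the
    coefficient `a` would come from the stored speaker-model parameters."""
    out, prev = [], 0.0
    for x in signal:
        prev = x + a * prev
        out.append(prev)
    return out

# Hypothetical standard models (child, old man, ...) as parameter sets.
speaker_models = {
    "child": {"pitch": 300.0, "a": 0.5},
    "old_man": {"pitch": 110.0, "a": 0.7},
}
```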

Particularities of the speaker, such as accents or speech defects, can also be entered. These are used by the translator to modify the pronunciation. A simple example of such a modifier is the rule of replacing every "∫t" (in the phonetic transcription) by "st" (to produce the accent of a Hamburg native).
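The "∫t" → "st" rule is a plain substring substitution on the transcription; `apply_modifiers` and `hamburg_rules` are hypothetical names for illustration:

```python
def apply_modifiers(phonetic: str, rules) -> str:
    """Apply speaker-specific pronunciation rules to a phonetic
    transcription, e.g. the Hamburg-accent rule replacing '∫t' by 'st'."""
    for old, new in rules:
        phonetic = phonetic.replace(old, new)
    return phonetic

hamburg_rules = [("∫t", "st")]
```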

A speaker model can thus, for example, concern the rules by which the translator generates the phonetic transcription; different speaker models may follow different rules. It can, however, also correspond to a particular set of filter parameters used to process the speech signals according to the speaker characteristics they define. Of course, arbitrary combinations of these two aspects of a speaker model are also conceivable.

The task of the speech generation unit 140 is to produce, from the given text together with the additional phonetic and prosodic information generated by the translator and edited by the user, a numerical data stream representing digital speech signals. This data stream can then be converted by an output device 150, such as a digital audio device or a PC sound card, into analog sound signals, i.e. the text to be output.

A conventional text-to-speech conversion method can be used for speech generation, with the difference that the pronunciation and the sentence melody have already been generated. In general, a distinction is made between rule-based and concatenation-based synthesizers.

Rule-based synthesizers work with rules for generating the sounds and the transitions between them. These synthesizers operate with up to 60 parameters, whose determination is very laborious; in return, very good results can be achieved with them. An overview of such systems and pointers to further literature can be found in "Thierry Dutoit: An Introduction to Text-to-Speech Synthesis, Kluwer 1997".

Concatenation-based synthesizers, by contrast, are easier to handle. They work with a database storing all possible pairs of sounds, which can simply be concatenated, although systems of good quality have high computing-time requirements. Such systems are described in "Thierry Dutoit: An Introduction to Text-to-Speech Synthesis, Kluwer 1997" and in "Volker Kraft: Verkettung natürlichsprachlicher Bausteine zur Sprachsynthese: Anforderungen, Techniken und Evaluierung. Fortschr.-Ber. VDI Reihe 10 Nr. 468, VDI-Verlag 1997".
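The concatenation step itself can be sketched as a lookup of adjacent sound pairs in the database followed by simple joining; `concatenate` and the toy waveform values are assumptions (real systems smooth the joins and superimpose the prosody):

```python
def concatenate(phones, diphone_db):
    """Concatenation-based synthesis sketch: look up every adjacent
    pair of sounds in the database and chain the stored waveforms."""
    samples = []
    for left, right in zip(phones, phones[1:]):
        samples.extend(diphone_db[(left, right)])
    return samples
```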

In principle, both types of system can be used. In rule-based synthesizers the prosodic information flows directly into the rule set, while in concatenation-based systems it is superimposed in a suitable manner.

Known techniques from digital signal processing are used to create special effects 160, e.g. digital filters (such as a band-pass filter for a telephone effect), reverberation generators, etc. These can also be applied to sounds stored in an archive 170.
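A minimal sketch of the telephone effect, assuming a crude band-pass of roughly 300–3400 Hz built from two one-pole filters; production systems would use properly designed digital filters, and all names and cutoff values here are illustrative:

```python
import math

def telephone_effect(signal, sr=16000, lo=300.0, hi=3400.0):
    """Band-limit a signal to the telephone band by cascading a
    one-pole low-pass at `hi` with a one-pole high-pass at `lo`."""
    # One-pole low-pass at the upper cutoff.
    alpha = math.exp(-2 * math.pi * hi / sr)
    lp, y = [], 0.0
    for x in signal:
        y = (1 - alpha) * x + alpha * y
        lp.append(y)
    # High-pass at the lower cutoff: subtract a low-pass at `lo`.
    beta = math.exp(-2 * math.pi * lo / sr)
    out, z = [], 0.0
    for x in lp:
        z = (1 - beta) * x + beta * z
        out.append(x - z)
    return out
```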

The archive 170 stores sounds such as street noise, railway, children shouting, ocean waves, background music and so on. The archive can be extended at will with one's own sounds. It can simply be a collection of files containing digitized sounds, but it can also be a database in which the sounds are stored as BLOBs (binary large objects).

In the mixing device 180 the generated speech signals are assembled with the background sounds. The volume of every signal can be regulated before they are combined. In addition, it is possible to apply effects to each signal individually or to all of them together.
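The assembly step above amounts to applying a per-signal volume factor and summing; `mix` and the (samples, gain) track format are illustrative assumptions for a software realization of the mixing device:

```python
def mix(tracks):
    """Mix speech and background tracks of possibly different lengths;
    each track is a (samples, gain) pair, the gain being the volume
    regulated before the signals are combined."""
    length = max(len(samples) for samples, _ in tracks)
    out = [0.0] * length
    for samples, gain in tracks:
        for i, x in enumerate(samples):
            out[i] += gain * x
    return out
```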

The resulting signal can be passed to a suitable digital audio device 150, such as a PC sound card, and thus checked acoustically or output. In addition, a storage device (not shown) is provided to store the signal so that it can later be transferred to the target medium in a suitable way.

The mixing device can be a classic device realized in hardware, or it can be realized in software and integrated into the overall program.

Modifications of the embodiment described above will readily occur to the person skilled in the art. For example, in a further embodiment of the present invention the output device 150 can be replaced by a further computer coupled to the mixing device 180 via a network connection. The generated speech signal can thus be transferred to another computer over a computer network, such as the Internet.

In a further embodiment, the speech signal generated by the speech generation unit 140 can also be transferred directly to the output device 150, without the detour via the mixing device 180. Further comparable modifications will occur naturally to the person skilled in the art.

Claims (21)

  1. A digital speech processing apparatus comprising:
    a prosody generation means for generating a prosody for a text; characterized by
    an editing means for displaying and modifying the generated prosody.
  2. The apparatus of claim 1 further comprising:
    translation means for translating the text into a phonetic transcription, said translation means further comprising:
    means for displaying and modifying the generated phonetic transcription.
  3. The apparatus of claim 1 or 2, wherein
    said prosody generating means and/or said translation means generates said prosody and/or said phonetic transcription in dependence on a particular speaker's model.
  4. The apparatus of one of claims 1 to 3, further comprising:
    means for displaying and/or modification of one or more speaker's models.
  5. The apparatus of claim 4, wherein said speaker's model modification means comprises:
    means for modifying phonetic transcription elements for the generation of accents.
  6. An apparatus for generating digital speech comprising:
    an apparatus for digital speech processing according to one of claims 1 to 4; and
    means for generating speech signals based on said phonetic transcription which may have been edited using said editing means and/or based on said prosody.
  7. The apparatus of claim 6, wherein said speech signal generating means further comprises:
    a speaker's model processing means for generating said speech signals depending on a particular speaker's model.
  8. The apparatus of claim 7, wherein said speaker's model processing means comprises one or more of the following:
    a digital filter system;
    means for adopting a set of filter parameters representing a particular speaker's model.
  9. The apparatus of claim 7 or 8, wherein said speaker's model processing means further comprises:
    means for selecting and/or modifying a speaker's model.
  10. The apparatus of one of claims 6 to 9, further comprising:
    effect generating means for generating sound effects.
  11. The apparatus of claim 10, wherein said effect generating means comprises one or more of the following:
    digital filter means for modifying the generated speech signals, and/or
    a reverberation (hall) generator for generating a reverberation effect.
  12. The apparatus of one of claims 6 to 11, further comprising:
    archive means for storing sounds; and
    mixing means for mixing the generated speech signals with the sounds stored in said archive means.
  13. The apparatus of one of the preceding claims, further comprising:
    a graphical user interface for editing the generated phonetic transcription and/or prosody.
  14. The apparatus of one of the preceding claims, further comprising:
    means for modifying speech rhythm and/or pronunciation and/or intonation.
  15. The apparatus of one of the preceding claims, further comprising:
    display means for displaying the prosody by means of a symbolic notation.
  16. The apparatus of one of the preceding claims, further comprising:
    dictionary means in which the words of one or more languages are stored together with their pronunciation.
  17. The apparatus of claim 16, wherein for at least one dictionary entry different phonetic entries are stored in said dictionary means.
  18. The apparatus of one of claims 6 to 17, further comprising:
    means for converting said digital speech signals into acoustic signals.
  19. A digital speech processing method comprising:
    generating a prosody for a text;
    displaying said generated prosody; said method being characterized by:
    editing said generated and displayed prosody.
  20. The method of claim 19, further comprising:
    using an apparatus according to one of claims 1 to 18 for generating digital speech.
  21. A computer program product comprising:
    a medium, in particular a data carrier, for storing and/or transmitting digital data readable by a computer, characterized in that said stored and/or transmitted data comprise:
    a sequence of computer-executable instructions causing said computer to carry out a method according to one of claims 19 or 20.
EP99947314A 1998-09-11 1999-09-10 Device and method for digital voice processing Expired - Lifetime EP1110203B1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
DE19841683 1998-09-11
DE19841683A DE19841683A1 (en) 1998-09-11 1998-09-11 Device and method for digital speech processing
PCT/EP1999/006712 WO2000016310A1 (en) 1998-09-11 1999-09-10 Device and method for digital voice processing

Publications (2)

Publication Number Publication Date
EP1110203A1 EP1110203A1 (en) 2001-06-27
EP1110203B1 true EP1110203B1 (en) 2002-08-14

Family

ID=7880683

Family Applications (1)

Application Number Title Priority Date Filing Date
EP99947314A Expired - Lifetime EP1110203B1 (en) 1998-09-11 1999-09-10 Device and method for digital voice processing

Country Status (7)

Country Link
EP (1) EP1110203B1 (en)
JP (1) JP2002525663A (en)
AT (1) ATE222393T1 (en)
AU (1) AU769036B2 (en)
CA (1) CA2343071A1 (en)
DE (2) DE19841683A1 (en)
WO (1) WO2000016310A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10117367B4 (en) * 2001-04-06 2005-08-18 Siemens Ag Method and system for automatically converting text messages into voice messages
JP2002318593A (en) * 2001-04-20 2002-10-31 Sony Corp Language processing system and language processing method as well as program and recording medium
AT6920U1 (en) 2002-02-14 2004-05-25 Sail Labs Technology Ag METHOD FOR GENERATING NATURAL LANGUAGE IN COMPUTER DIALOG SYSTEMS
DE10207875A1 (en) * 2002-02-19 2003-08-28 Deutsche Telekom Ag Parameter-controlled, expressive speech synthesis from text, modifies voice tonal color and melody, in accordance with control commands
US7877259B2 (en) 2004-03-05 2011-01-25 Lessac Technologies, Inc. Prosodic speech text codes and their use in computerized speech systems
DE102004012208A1 (en) 2004-03-12 2005-09-29 Siemens Ag Individualization of speech output by adapting a synthesis voice to a target voice
DE102008044635A1 (en) 2008-07-22 2010-02-04 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for providing a television sequence
US10424288B2 (en) 2017-03-31 2019-09-24 Wipro Limited System and method for rendering textual messages using customized natural voice

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5695295A (en) * 1979-12-28 1981-08-01 Sharp Kk Voice sysnthesis and control circuit
FR2494017B1 (en) * 1980-11-07 1985-10-25 Thomson Csf METHOD FOR DETECTING THE MELODY FREQUENCY IN A SPEECH SIGNAL AND DEVICE FOR CARRYING OUT SAID METHOD
JPS58102298A (en) * 1981-12-14 1983-06-17 キヤノン株式会社 Electronic appliance
US4623761A (en) * 1984-04-18 1986-11-18 Golden Enterprises, Incorporated Telephone operator voice storage and retrieval system
US5559927A (en) * 1992-08-19 1996-09-24 Clynes; Manfred Computer system producing emotionally-expressive speech messages
US5956685A (en) * 1994-09-12 1999-09-21 Arcadia, Inc. Sound characteristic converter, sound-label association apparatus and method therefor
WO1996008813A1 (en) * 1994-09-12 1996-03-21 Arcadia, Inc. Sound characteristic convertor, sound/label associating apparatus and method to form them
DE19503419A1 (en) * 1995-02-03 1996-08-08 Bosch Gmbh Robert Method and device for outputting digitally coded traffic reports using synthetically generated speech
JPH08263094A (en) * 1995-03-10 1996-10-11 Winbond Electron Corp Synthesizer for generation of speech mixed with melody
EP0762384A2 (en) * 1995-09-01 1997-03-12 AT&T IPM Corp. Method and apparatus for modifying voice characteristics of synthesized speech
DE19610019C2 (en) * 1996-03-14 1999-10-28 Data Software Gmbh G Digital speech synthesis process
JP3616250B2 (en) * 1997-05-21 2005-02-02 日本電信電話株式会社 Synthetic voice message creation method, apparatus and recording medium recording the method
US6226614B1 (en) * 1997-05-21 2001-05-01 Nippon Telegraph And Telephone Corporation Method and apparatus for editing/creating synthetic speech message and recording medium with the method recorded thereon

Also Published As

Publication number Publication date
JP2002525663A (en) 2002-08-13
DE59902365D1 (en) 2002-09-19
AU6081399A (en) 2000-04-03
AU769036B2 (en) 2004-01-15
CA2343071A1 (en) 2000-03-23
EP1110203A1 (en) 2001-06-27
WO2000016310A1 (en) 2000-03-23
DE19841683A1 (en) 2000-05-11
ATE222393T1 (en) 2002-08-15

Similar Documents

Publication Publication Date Title
EP0886853B1 (en) Microsegment-based speech-synthesis process
DE60216069T2 (en) LANGUAGE-TO-LANGUAGE GENERATION SYSTEM AND METHOD
DE60112512T2 (en) Coding of expression in speech synthesis
DE69821673T2 (en) Method and apparatus for editing synthetic voice messages, and storage means with the method
DE60035001T2 (en) Speech synthesis with prosody patterns
EP1105867B1 (en) Method and device for the concatenation of audiosegments, taking into account coarticulation
EP3010014B1 (en) Method for interpretation of automatic speech recognition
EP1110203B1 (en) Device and method for digital voice processing
Schröder Can emotions be synthesized without controlling voice quality
EP0058130B1 (en) Method for speech synthesizing with unlimited vocabulary, and arrangement for realizing the same
EP1344211B1 (en) Device and method for differentiated speech output
DE60305944T2 (en) METHOD FOR SYNTHESIS OF A STATIONARY SOUND SIGNAL
DE60311482T2 (en) METHOD FOR CONTROLLING DURATION OF LANGUAGE SYNTHESIS
JP2008058379A (en) Speech synthesis system and filter device
JP2577372B2 (en) Speech synthesis apparatus and method
Pearson et al. Combining concatenation and formant synthesis for improved intelligibility and naturalness in text-to-speech systems
DE19837661C2 (en) Method and device for co-articulating concatenation of audio segments
EP3144929A1 (en) Synthetic generation of a naturally-sounding speech signal
WO2023222287A1 (en) Speech synthesiser and method for speech synthesis
EP1212748A1 (en) Digital speech synthesis method with intonation reproduction
Murray Emotion in concatenated speech
EP2325836A1 (en) Method and system for training speech processing devices
JPH07129188A (en) Voice synthesizing device
DE10334105A1 (en) Face animation parameters generation method in which a person's mood is determined from a spoken word or word sequence and then used to generate face animation parameters used in animating a graphical face image
Denes Automatic voice answerback using text to speech conversion by rule

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20010322

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

17Q First examination report despatched

Effective date: 20011015

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20020814

Ref country code: IT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT;WARNING: LAPSES OF ITALIAN PATENTS WITH EFFECTIVE DATE BEFORE 2007 MAY HAVE OCCURRED AT ANY TIME BEFORE 2007. THE CORRECT EFFECTIVE DATE MAY BE DIFFERENT FROM THE ONE RECORDED.

Effective date: 20020814

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20020814

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20020814

REF Corresponds to:

Ref document number: 222393

Country of ref document: AT

Date of ref document: 20020815

Kind code of ref document: T

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

Free format text: NOT ENGLISH

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LU

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20020910

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

Free format text: GERMAN

REF Corresponds to:

Ref document number: 59902365

Country of ref document: DE

Date of ref document: 20020919

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CY

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20020930

Ref country code: BE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20020930

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20021114

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20021114

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20021202

GBT Gb: translation of ep patent filed (gb section 77(6)(a)/1977)

Effective date: 20021223

NLV1 Nl: lapsed or annulled due to failure to fulfill the requirements of art. 29p and 29m of the patents act
PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20030228

ET Fr: translation filed
BERE Be: lapsed

Owner name: *KULL HANS

Effective date: 20020930

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MC

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20030401

REG Reference to a national code

Ref country code: CH

Ref legal event code: NV

Representative=s name: SCHMAUDER & PARTNER AG PATENTANWALTSBUERO

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: IE

Payment date: 20030730

Year of fee payment: 5

26N No opposition filed

Effective date: 20030515

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20030821

Year of fee payment: 5

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20030918

Year of fee payment: 5

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: AT

Payment date: 20030922

Year of fee payment: 5

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: CH

Payment date: 20030923

Year of fee payment: 5

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20030930

Year of fee payment: 5

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20040910

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20040910

Ref country code: AT

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20040910

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20040930

Ref country code: CH

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20040930

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20050401

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20040910

REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20050531

REG Reference to a national code

Ref country code: IE

Ref legal event code: MM4A

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST