EP0157903B1 - Method and apparatus for speech synthesizing - Google Patents

Method and apparatus for speech synthesizing

Publication number
EP0157903B1
Authority
EP
European Patent Office
Prior art keywords
vowel
consonant
stored
uniform
vowels
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
EP19840109492
Other languages
German (de)
French (fr)
Other versions
EP0157903A1 (en)
Inventor
Christian Deforeit
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Matth Hohner AG
Original Assignee
Matth Hohner AG
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matth Hohner AG
Publication of EP0157903A1
Application granted
Publication of EP0157903B1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination


Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

1. Process for the synthesis of speech in which combinations of a consonant followed by a vowel are stored and read out as required, characterised in that - every consonant occurring in the language is stored together with a weak uniform vowel "&", - all the vowels occurring in the language are stored individually, - the desired consonant/vowel combination is formed by staggered reading out of consonant and vowel in two channels, with the uniform vowel "&" being masked by the vowel that is read out.

Description

The invention relates to a method for speech synthesis and to an arrangement for carrying it out.

It is known to store the phonemes (consonants and vowels) occurring in a language individually and then to read them out sequentially as required. Since the number of phonemes is relatively small, this makes it possible to work with relatively little storage capacity. The disadvantage, however, is that playback produces a sound that deviates strongly from natural speech, because a "click" noise becomes audible between successive consonants and vowels, so that this principle is not used in practice.

In order to produce synthetic speech that closely approximates human speech, the usual procedure is therefore to store and read out combinations formed from the occurring consonants and the occurring vowels, so-called di-phonemes. It goes without saying that this requires a very considerable storage capacity, which for an Indo-European language is of the order of about 500 or more di-phonemes. The use of this method is accordingly limited to cases in which considerable expenditure is justified.

Starting from the last-mentioned method, the object of the invention is to reduce the memory requirement considerably, so that the method becomes applicable even to mass consumer goods, e.g. toys (dolls' voices and the like).

Claim 1 defines the method according to the invention. It can be seen that each consonant is stored only in combination with a single uniform vowel, which is referred to here and below as "&" and which corresponds approximately to the "e" in the German word "alle", in the English word "the", or in the French word "le". It has been found that, when the readout is suitably staggered in time, the & is masked by the actual vowel then reproduced, either completely or at least to such an extent that the resulting syllable comes close to natural pronunciation.
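To make the storage saving concrete, the following sketch compares how many sound units each scheme has to hold. The consonant and vowel counts are illustrative assumptions chosen so that the di-phoneme count lands near the "about 500" mentioned above; they are not figures from the patent, and the function name `stored_units` is hypothetical.

```python
def stored_units(num_consonants: int, num_vowels: int) -> dict:
    """Count stored sound units for the two storage schemes (illustrative only)."""
    # Conventional scheme: one stored entry per consonant/vowel combination.
    diphonemes = num_consonants * num_vowels
    # Scheme of claim 1: each consonant once (with the uniform vowel "&")
    # plus each vowel once on its own.
    uniform_vowel = num_consonants + num_vowels
    return {"diphonemes": diphonemes, "uniform_vowel": uniform_vowel}


# Assumed counts for a hypothetical language: 25 consonants, 20 vowels.
counts = stored_units(num_consonants=25, num_vowels=20)
print(counts)
```

Under these assumed counts the number of stored entries grows with the sum rather than the product of the two inventories, which is the effect the description relies on.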

With the usual digital storage of the phonemes, it is expedient to store a command at the appropriate point in the consonant data which initiates the start of the readout of the vowel memory. It can also be expedient to use suitably designed filters to amplify or attenuate the frequencies of the consonants on the one hand and of the vowels on the other differently, in order to improve the masking of the &.

Claim 5 defines an arrangement according to the invention with which the speech synthesis can be carried out.

The invention is explained in detail below with reference to the accompanying drawings.

  • Fig. 1 shows the principle of the method by way of an example,
  • Fig. 2 shows frequency responses of filters for vowels and consonants,
  • Fig. 3 schematically shows a possible storage format for consonants and vowels,
  • Fig. 4 shows a preferred envelope for consonant generation,
  • Fig. 5 is a block diagram of an arrangement for carrying out the method, and
  • Fig. 6 is a diagram showing the timing of the synthesis of a simple word.

Since the readout is staggered in time, that is, the readout of a vowel already begins while the consonant di-phoneme (namely its & part) is still being read out, two readout channels are used. In Fig. 1, the upper diagram shows the envelope of the consonant channel and the lower one that of the vowel channel, the simple word "DATO" being chosen as an example. It can be seen that the & portions of the consonant channel and the vowels are reproduced simultaneously, and this alone strongly masks the weak & sounds. This masking can, however, be supported by further measures.
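The staggered readout can be pictured as adding two sample streams with an offset. The sketch below is a minimal illustration, not the patented circuit: the function name `mix_staggered`, the toy sample values, and the choice of offset are all assumptions.

```python
def mix_staggered(consonant, vowel, vowel_start):
    """Sum two sample lists; the vowel channel starts at index
    vowel_start of the consonant channel (the start of its "&" part)."""
    length = max(len(consonant), vowel_start + len(vowel))
    out = [0.0] * length
    for i, sample in enumerate(consonant):
        out[i] += sample
    for i, sample in enumerate(vowel):
        out[vowel_start + i] += sample
    return out


# Hypothetical toy data: three samples of consonant sound followed by
# three weak "&" samples, and a louder vowel of four samples.
d_and = [0.5, -0.5, 0.5, 0.1, -0.1, 0.1]
a = [0.8, -0.8, 0.8, -0.8]

# The vowel starts exactly where the "&" part begins, so both overlap.
syllable = mix_staggered(d_and, a, vowel_start=3)
print(syllable)
```

In the overlapping region the weak & samples are small compared with the vowel samples, which is the masking situation the paragraph above describes.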

Fig. 2 shows a first means for this. It is known that the frequency spectra of consonants and vowels differ; for a male voice, for example, the maxima of the consonants lie in the range of approximately 600 to 3000 Hz and those of the vowels in the range of approximately 200 to 1000 Hz. Accordingly, filters with the passbands shown in Fig. 2 are assigned to the two channels; the filtering can be carried out either during recording or during playback.
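As a rough illustration of channel-specific filtering, the sketch below builds one digital band-pass filter per channel. Several things here are assumptions rather than content of the patent: the filter type (a second-order band-pass using the widely published "Audio EQ Cookbook" coefficient formulas), the use of the stated ranges of spectral maxima as band edges, the 10 kHz sample rate, and all function names.

```python
import math


def bandpass_coeffs(f_low, f_high, fs):
    """Second-order band-pass (0 dB peak gain) centred between the band edges."""
    f0 = math.sqrt(f_low * f_high)          # geometric centre frequency
    q = f0 / (f_high - f_low)               # quality factor from the bandwidth
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2 * math.cos(w0) / a0, (1 - alpha) / a0)
    return b, a


def biquad(samples, coeffs):
    """Apply the filter in direct form I."""
    (b0, b1, b2), (_, a1, a2) = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def tone(freq, fs, n):
    """Generate n samples of a unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * freq * i / fs) for i in range(n)]


def rms(samples):
    """Root-mean-square level of a sample list."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


FS = 10_000                                          # assumed sample rate in Hz
consonant_filter = bandpass_coeffs(600, 3000, FS)    # consonant channel
vowel_filter = bandpass_coeffs(200, 1000, FS)        # vowel channel

high = tone(2000, FS, 2000)   # lies in the consonant band only
low = tone(300, FS, 2000)     # lies in the vowel band only
print(rms(biquad(high, consonant_filter)), rms(biquad(high, vowel_filter)))
print(rms(biquad(low, vowel_filter)), rms(biquad(low, consonant_filter)))
```

A 2000 Hz test tone passes the consonant filter with more energy than the vowel filter, and a 300 Hz tone does the opposite, which is the kind of separation the passbands are meant to provide.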

Fig. 3 schematically shows the storage format. During recording, the sounds are digitized, that is, amplitude-sampled at a clock rate of e.g. 10 kHz or more, and the data thus obtained are stored in successive memory locations for serial readout. However, two memory locations are kept free for command data, namely a "continue" command and an "end" command. The "continue" command marks the point in time at which the other channel is to proceed with its readout; in the consonant data this command is located at the transition from the actual consonant sound to the & part, while in the vowel data it is located near the end of the data string. The "end" command is self-explanatory, but it is necessary because the individual phonemes differ in duration. The "continue" command can also be used to strengthen the masking effect further: when it occurs, the envelope of the channel currently being read out is attenuated, as indicated in Fig. 4, for which a conventional analog attenuator consisting of a diode, a resistor and a capacitor can be used.
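One possible reading of this storage format is a serial record in which two code values are reserved for the commands. The sketch below assumes 8-bit samples and uses 0 and 1 as the reserved codes, which the description later names as example values; clamping samples into the range 2 to 255 and the function names are assumptions made for illustration.

```python
CONTINUE, END = 0, 1   # reserved command codes (example values from the description)
MIN_SAMPLE, MAX_SAMPLE = 2, 255   # assumed 8-bit sample range that avoids the codes


def encode_phoneme(samples, continue_at):
    """Build a serial record: samples with a CONTINUE marker inserted
    before index continue_at, terminated by END."""
    body = [max(MIN_SAMPLE, min(MAX_SAMPLE, s)) for s in samples]
    return body[:continue_at] + [CONTINUE] + body[continue_at:] + [END]


def decode_phoneme(record):
    """Yield ('data', value), ('continue', None) or ('end', None) events."""
    for word in record:
        if word == CONTINUE:
            yield ("continue", None)
        elif word == END:
            yield ("end", None)
            return                      # nothing after END belongs to this phoneme
        else:
            yield ("data", word)


# Toy consonant record: two consonant samples, then two "&" samples.
record = encode_phoneme([10, 0, 200, 300], continue_at=2)
print(record)
print(list(decode_phoneme(record)))
```

Because the phoneme length is implied by the END marker rather than stored separately, records of different durations can simply follow one another in memory.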

Fig. 5 shows, in block form, an embodiment of a speech synthesizer which, as can be seen, is of very simple construction. The phonemes to be reproduced are selected by external means, for example a microprocessor, and this selection does not form part of the present invention; an external control circuit is therefore only indicated here as block 1.

The arrangement comprises two mutually identical channels, only one of which is described below.

A memory address counter 2 is set by the control circuit 1 to a specific phoneme start address. A phoneme memory 3 contains all the phonemes required for a given language, thirty-six phonemes being sufficient for many languages. The phonemes are filtered during recording (as explained above), digitized, and stored in the format shown in Fig. 3; the codes "0" and "1", for example, can be reserved for the commands "continue" and "end" respectively. A clock generator 5 generates the readout clock of e.g. 10 kHz for both channels. The data read out pass to a decoder 4, which determines whether they are data or one of the commands "continue" or "end". Data pass via a digital-to-analog converter 6 and a multiplier 7 to a summing element 8 and from there to an amplifier-loudspeaker unit 9.

When the "end" command is decoded, the incrementing of the address counter 2 is inhibited via an AND gate 10.

If the "continue" command is decoded, a phoneme request flip-flop 11 is set for the other channel; it is reset by the external control circuit when the next start address is entered. In addition, the "continue" command toggles a damping flip-flop 12, which connects an envelope generator 13 to one channel via its output F and to the other channel via its complementary output F̅; the envelope generator acts on the multiplier 7, so that the output of the channel concerned is attenuated with a gentle decay, without the "click" noise arising. The outputs of both channels are combined in the summing element 8.
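The interplay of address counter, decoder, request flip-flop, envelope and summing element can be mimicked in software. The following is a simplified sketch under stated assumptions, not the circuit of Fig. 5: both channels share one memory list, the envelope is modelled as a fixed per-sample decay factor, a command word occupies one clock period with zero output, and the control circuit is reduced to a queue of start addresses. All class and function names are hypothetical.

```python
CONTINUE, END = 0, 1   # reserved command codes (example values from the description)


class Channel:
    def __init__(self, memory, decay=0.5):
        self.memory = memory      # stands in for phoneme memory 3
        self.address = None       # address counter 2; None while the channel is idle
        self.gain = 1.0           # envelope value applied by multiplier 7
        self.damping = False
        self.decay = decay        # assumed stand-in for envelope generator 13

    def start(self, address):
        self.address, self.gain, self.damping = address, 1.0, False

    def tick(self):
        """Read one word; return (output_sample, continue_seen)."""
        if self.address is None:
            return 0.0, False
        word = self.memory[self.address]
        if word == END:
            self.address = None   # counting stops, the channel becomes free
            return 0.0, False
        self.address += 1
        if word == CONTINUE:
            self.damping = True   # from now on this channel fades out
            return 0.0, True
        if self.damping:
            self.gain *= self.decay
        return word * self.gain, False


def synthesize(memory, start_addresses):
    """Play phonemes alternately on two channels, each one started by the
    CONTINUE command of its predecessor."""
    channels = [Channel(memory), Channel(memory)]
    pending = list(start_addresses)
    channels[0].start(pending.pop(0))
    out = []
    while any(ch.address is not None for ch in channels):
        total, requests = 0.0, []
        for index, ch in enumerate(channels):
            sample, continue_seen = ch.tick()
            total += sample                     # summing element 8
            if continue_seen:
                requests.append(1 - index)      # request a phoneme for the other channel
        for index in requests:
            if pending:                         # the control circuit answers the request
                channels[index].start(pending.pop(0))
        out.append(total)
    return out


memory = [5, 5, CONTINUE, 2, 2, END,        # toy "D&" record at address 0
          8, 8, 8, CONTINUE, 8, END]        # toy "a" record at address 6
print(synthesize(memory, [0, 6]))
```

In this toy run the vowel starts one clock period after the consonant's "continue" marker, and the & samples are faded while the vowel plays at full level, so the two contributions overlap in the summed output.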

The current state of the flip-flop 12 is also transmitted to the external control circuit in order to signal to it which of the two channels can be loaded, for instance at the start of a readout cycle after the circuit has been powered up.

Before a synthesis process is explained in detail with reference to Fig. 6, possible modifications of the block circuit shown in Fig. 5 should be pointed out.

The memory requirement can be halved if only one phoneme memory 3 is provided for both channels and the readout is time-division multiplexed. The multiplier 7 is already included in certain commercially available digital-to-analog converters, so that the output of the envelope generators 13 only needs to be connected to the corresponding input of the converter. The circuit can also largely be implemented in a microprocessor, in which case either the two envelope generators and the two converters remain external, or only a single, common converter does, while all other operations are carried out digitally by the microprocessor.
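Time-division multiplexing of a single memory can be sketched as two memory accesses per sample period, one for each channel. The function below is a hypothetical illustration; the slot order and the names are assumptions, and it ignores command decoding to keep the access pattern visible.

```python
def multiplexed_readout(memory, start_a, start_b, num_periods):
    """Read two channels from one memory by alternating access slots.

    Returns the data seen by each channel and the order in which the
    single memory was addressed."""
    addresses = [start_a, start_b]
    channel_data = ([], [])
    bus_log = []
    for _ in range(num_periods):
        for slot in (0, 1):             # two accesses per sample period
            address = addresses[slot]
            bus_log.append(address)
            channel_data[slot].append(memory[address])
            addresses[slot] += 1
    return channel_data, bus_log


memory = list(range(100, 110))          # toy memory contents
data, bus = multiplexed_readout(memory, start_a=0, start_b=5, num_periods=3)
print(data)
print(bus)
```

A consequence of this access pattern is that the shared memory has to be read at twice the per-channel sample clock, for example 20 kHz for a 10 kHz readout clock.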

Fig. 6 shows

  • in line (a) the clock of the clock generator 5; this clock can be fixed, but it can also be varied by the control circuit 1 in order to achieve phrasing that is even closer to natural speech,
  • in line (b) formats from the first channel, here the phonemes "D&" and "T&",
  • in line (c) the logic level at the output of the flip-flop 12,
  • in line (d) the logic level at the output of the flip-flop 11 of the second channel,
  • in line (e) formats from the second channel, here the phonemes "a" and "o",
  • in line (f) the logic level at the output of the flip-flop 11 of the first channel,
  • in lines (g) and (h) the envelopes generated by the envelope generators 13 of the first and second channels respectively, and
  • in lines (i) and (k) the analog output signals of the first and second channels respectively; the envelopes are not to be understood as representative of the sounds "D", "A", "T" or "O" actually produced; the diagram serves only to explain the timing.
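The timing for the example word "DATO" can be approximated with a small scheduling sketch: each phoneme starts when the preceding phoneme reaches its "continue" marker, and the phonemes alternate between the two channels. The durations and marker positions below are invented for illustration (in clock periods); they are not taken from Fig. 6, and the names `Slot` and `schedule` are hypothetical.

```python
from typing import NamedTuple


class Slot(NamedTuple):
    phoneme: str
    channel: int    # 1 = first channel, 2 = second channel
    start: int      # clock period at which readout begins
    end: int        # clock period at which the "end" command is reached


def schedule(phonemes):
    """phonemes: list of (name, duration, continue_position) tuples,
    all measured in clock periods."""
    slots, start = [], 0
    for index, (name, duration, continue_position) in enumerate(phonemes):
        slots.append(Slot(name, index % 2 + 1, start, start + duration))
        start += continue_position      # the next phoneme starts at the marker
    return slots


# Assumed timings for "DATO": consonant records hand over early (at the
# start of their "&" part), vowel records hand over near their end.
DATO = [("D&", 600, 300), ("a", 1500, 1400), ("T&", 600, 300), ("o", 1500, 1400)]

for slot in schedule(DATO):
    print(slot)
```

With these assumed numbers, "a" overlaps the & part of "D&", and "T&" begins shortly before "a" ends, so each channel is free again by the time it is asked to play its next phoneme.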

Claims (8)

1. Process for the synthesis of speech in which combinations of a consonant followed by a vowel are stored and read out as required, characterised in that
- every consonant occurring in the language is stored together with a weak uniform vowel "&",
- all the vowels occurring in the language are stored individually,
- the desired consonant/vowel combination is formed by staggered reading out of consonant and vowel in two channels, with the uniform vowel "&" being masked by the vowel that is read out.
2. Process according to claim 1, characterised in that when the combinations of consonant and uniform vowel are stored vowel-typical frequencies are attenuated and that when the vowels are stored consonant-typical frequencies are attenuated.
3. Process according to claim 1 or claim 2, characterised in that the amplitudes of the uniform vowels "&" are attenuated during reading out.
4. Process according to claim 1, 2 or 3, characterised in that the reading out is effected at a variable pulse frequency.
5. Arrangement for the synthesis of speech, comprising a memory device for storing consonants and vowels and a reading out circuit for reading out combinations of a consonant and a following vowel, characterised in that
- every consonant occurring in the language is stored in the memory device together with a weak uniform vowel "&",
- all the vowels occurring in the language are stored individually in the memory device,
- the reading out circuit comprises two read-out channels that can be activated alternately and a switching over circuit that can be operated by a command read out in one channel for activating the other channel, and reads out the desired consonant/vowel combination from the two channels in a staggered manner, and that
- an envelope generator that can be activated by the switch-over command is provided for each channel, which generator effects attenuation of the amplitude in order for the uniform vowel "&" to be masked by the vowel that is read out.
6. Arrangement according to claim 5, characterised in that each channel includes a memory for all the sounds required (phonemes and diphonemes).
7. Arrangement according to claim 5 or claim 6, in which the sounds (phonemes and diphonemes) are stored digitally in each memory and can be read out sequentially, characterised in that at least in the case of the diphonemes the changeover command is stored in the read-out sequence between the consonant interval and the uniform vowel interval.
8. Arrangement according to claim 5 in which the sounds (phonemes and diphonemes) are stored digitally in each memory and can be read out sequentially, characterised in that at the end of each read-out sequence a command "end" can be read out, by means of which command a subsequent reading out operation can be initiated.
EP19840109492 1984-02-23 1984-08-09 Method and apparatus for speech synthesizing Expired EP0157903B1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE19843406540 DE3406540C1 (en) 1984-02-23 1984-02-23 Method and arrangement for speech synthesis
DE3406540 1984-02-23

Publications (2)

Publication Number Publication Date
EP0157903A1 (en) 1985-10-16
EP0157903B1 (en) 1988-01-13

Family

ID=6228601

Family Applications (1)

Application Number Title Priority Date Filing Date
EP19840109492 Expired EP0157903B1 (en) 1984-02-23 1984-08-09 Method and apparatus for speech synthesizing

Country Status (3)

Country Link
EP (1) EP0157903B1 (en)
JP (1) JPS60211499A (en)
DE (1) DE3406540C1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE4111781A1 (en) * 1991-04-11 1992-10-22 Ibm COMPUTER SYSTEM FOR VOICE RECOGNITION
IT1263756B (en) * 1993-01-15 1996-08-29 Alcatel Italia AUTOMATIC METHOD FOR IMPLEMENTATION OF INTONATIVE CURVES ON VOICE MESSAGES CODED WITH TECHNIQUES THAT ALLOW THE ASSIGNMENT OF THE PITCH
CN105895076B (en) * 2015-01-26 2019-11-15 科大讯飞股份有限公司 A kind of phoneme synthesizing method and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4658424A (en) * 1981-03-05 1987-04-14 Texas Instruments Incorporated Speech synthesis integrated circuit device having variable frame rate capability
EP0114123B1 (en) * 1983-01-18 1987-04-22 Matsushita Electric Industrial Co., Ltd. Wave generating apparatus

Also Published As

Publication number Publication date
EP0157903A1 (en) 1985-10-16
JPS60211499A (en) 1985-10-23
DE3406540C1 (en) 1985-09-05


Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Designated state(s): FR GB IT NL

17P Request for examination filed

Effective date: 19851102

17Q First examination report despatched

Effective date: 19870202

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): FR GB IT NL

ET Fr: translation filed
ITF It: translation for a ep patent filed

Owner name: STUDIO TORTA SOCIETA' SEMPLICE

GBT Gb: translation of ep patent filed (gb section 77(6)(a)/1977)
PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed
PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Effective date: 19890809

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NL

Effective date: 19900301

GBPC Gb: european patent ceased through non-payment of renewal fee
NLV4 Nl: lapsed or anulled due to non-payment of the annual fee
PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FR

Effective date: 19900427

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST