EP1078354A1 - Method and device for determining spectral voice characteristics in a spoken expression - Google Patents

Method and device for determining spectral voice characteristics in a spoken expression

Info

Publication number
EP1078354A1
EP1078354A1 EP99929088A
Authority
EP
European Patent Office
Prior art keywords
transformation
utterance
speaker
speech
wavelet transformation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
EP99929088A
Other languages
German (de)
French (fr)
Other versions
EP1078354B1 (en)
Inventor
Martin Holzapfel
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Siemens AG
Original Assignee
Siemens AG
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siemens AG filed Critical Siemens AG
Publication of EP1078354A1 publication Critical patent/EP1078354A1/en
Application granted granted Critical
Publication of EP1078354B1 publication Critical patent/EP1078354B1/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00Speech synthesis; Text to speech systems
    • G10L13/06Elementary speech units used in speech synthesisers; Concatenation rules
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique

Definitions

  • the invention relates to a method and an arrangement for determining spectral speech characteristics in a spoken utterance.
  • a wavelet transformation is known from [1].
  • in the wavelet transformation, a wavelet filter ensures that the high-pass component and the low-pass component of a subsequent transformation stage completely reconstruct the signal of the current transformation stage.
  • the resolution of the high-pass component or low-pass component is reduced from one transformation stage to the next (technical term: "subsampling").
  • the number of transformation levels is finite due to subsampling.
  • the object of the invention is to specify a method and an arrangement for determining spectral speech characteristics with which, in particular, a natural-sounding synthetic speech output can be obtained.
  • within the scope of the invention, a method for determining spectral speech characteristics in a spoken utterance is specified.
  • the spoken utterance is digitized and subjected to a wavelet transformation.
  • the speaker-specific characteristics are determined on the basis of different transformation levels of the wavelet transformation.
  • in the wavelet transformation, the utterance is split by means of a high-pass filter and a low-pass filter, and different high-pass or low-pass components of different transformation stages contain speaker-specific characteristics.
  • the individual high-pass or low-pass components of different transformation stages stand for predetermined speaker-specific characteristics; both the high-pass component and the low-pass component of a given transformation stage, that is to say the respective characteristic, can be modified separately from the other characteristics. If the original signal is reassembled from the respective high-pass and low-pass components of the individual transformation stages in the inverse wavelet transformation, it is guaranteed that exactly the desired characteristic has been changed. It is thus possible to change certain predetermined properties of the utterance without influencing the rest of the utterance.
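  • As an illustration of this separability and lossless reconstruction, the sketch below uses the open-source PyWavelets package with a simple Haar filter; neither the library nor the filter is prescribed by the patent, which works with its own complex wavelet.

```python
# Minimal sketch (not the patent's implementation): decompose a short-term
# spectrum into wavelet stages, modify exactly one stage, and reconstruct.
import numpy as np
import pywt

spectrum = np.random.default_rng(0).random(256)   # stand-in for a 256-value spectrum

# Cascaded high-pass/low-pass split with subsampling, all stages at once.
coeffs = pywt.wavedec(spectrum, "haar", level=4)  # [approx, detail_4, ..., detail_1]

# Unmodified coefficients reconstruct the original signal exactly.
assert np.allclose(pywt.waverec(coeffs, "haar"), spectrum)

# Amplify a single transformation stage; after the inverse transform only the
# characteristic carried by that stage changes, the rest stays untouched.
coeffs[1] = 1.5 * coeffs[1]
modified = pywt.waverec(coeffs, "haar")
```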
  • One embodiment consists in windowing the utterance before the wavelet transformation, that is to say cutting out a predetermined quantity of samples, and transforming it into the frequency domain.
  • a Fast Fourier Transform (FFT) is used in particular for this purpose.
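  • A minimal sketch of this windowing step follows; the frame length, the hop position and the Hann window are illustrative choices, not values taken from the patent.

```python
# Cut a frame out of the digitized utterance, window it, and move to the
# frequency domain; the 256 magnitude values then feed the wavelet transformation.
import numpy as np

def short_term_spectrum(samples: np.ndarray, start: int, frame_len: int = 512) -> np.ndarray:
    assert start + frame_len <= len(samples), "frame must lie inside the utterance"
    frame = samples[start:start + frame_len] * np.hanning(frame_len)
    return np.abs(np.fft.rfft(frame))[:frame_len // 2]   # e.g. 256 magnitude values
```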
  • Another embodiment consists in splitting the high-pass component of a transformation stage into a real part and an imaginary part.
  • the high-pass component of the wavelet transformation corresponds to the difference signal between the current low-pass component and the low-pass component of the previous transformation stage.
  • a further development consists in determining the number of transformation stages of the wavelet transformation to be carried out such that the last transformation stage, which consists of low-pass filters connected in series, contains the constant (DC) component of the utterance. The signal as a whole can then be represented by its wavelet coefficients. This corresponds to the complete transformation of the information of the signal section into the wavelet space.
  • if only the respective low-pass component is transformed further (by means of a high-pass and a low-pass filter), the difference signal remains as the high-pass component of a transformation stage, as explained above. If the difference signals (high-pass components) are accumulated over the transformation stages, the information of the spoken utterance without its constant component is obtained in the last transformation stage as the cumulative high-pass component.
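  • The following sketch, again using PyWavelets with a Haar filter as an illustrative stand-in, checks both statements numerically: for a 256-value spectrum and a subsampling rate of 1/2, log2(256) = 8 stages leave a single low-pass value (the constant component, up to the Haar scaling), and discarding that value before the inverse transform yields the spectrum without its constant component.

```python
# Check numerically: 8 stages leave one low-pass value (the constant component),
# and dropping it yields the spectrum without its constant component.
import numpy as np
import pywt

spectrum = np.random.default_rng(1).random(256)
levels = int(np.log2(len(spectrum)))                    # 8 stages for 256 values

coeffs = pywt.wavedec(spectrum, "haar", level=levels)
dc = coeffs[0][0]                                       # single value of the last low-pass stage
assert np.isclose(dc, spectrum.mean() * 2 ** (levels / 2))   # Haar scaling of the mean

coeffs[0][:] = 0.0                                      # discard the constant component
without_dc = pywt.waverec(coeffs, "haar")
assert np.allclose(without_dc, spectrum - spectrum.mean())
```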
  • the speaker-specific characteristics can be identified as:
  • a) Fundamental frequency: the oscillation of the high-pass component of the first or the second transformation stage of the wavelet transformation reveals the fundamental frequency of the utterance. The fundamental frequency indicates whether the speaker is a man or a woman.
  • b) Shape of the spectral envelope: the spectral envelope contains information about a transfer function of the vocal tract during articulation. In a voiced region, the spectral envelope is dominated by the formants. The high-pass component of a higher transformation stage of the wavelet transformation contains this spectral envelope.
  • c) Spectral tilt (smokiness): the smokiness of a voice becomes visible as a negative slope in the course of the penultimate low-pass component; a sketch of how these three characteristics could be read off the transformation stages follows below.
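  • The sketch below is an assumption-laden illustration of reading off these characteristics: the patent describes the stages only qualitatively, so the stage indices, the autocorrelation-based period estimate and the max/min slope are heuristics chosen here, implemented with NumPy and PyWavelets.

```python
# Assumed heuristics for characteristics a) to c); stage indices are illustrative.
import numpy as np
import pywt

def speaker_characteristics(spectrum: np.ndarray):
    levels = pywt.dwt_max_level(len(spectrum), "db2")
    coeffs = pywt.wavedec(spectrum, "db2", level=levels)
    details = coeffs[:0:-1]        # details[0] = stage 1 high-pass, details[1] = stage 2, ...
    penultimate_lp = pywt.wavedec(spectrum, "db2", level=levels - 1)[0]

    # a) Fundamental frequency: dominant periodicity of an early high-pass stage,
    #    estimated from the autocorrelation after its first dip below zero.
    d = details[1] - details[1].mean()
    acf = np.correlate(d, d, mode="full")[len(d) - 1:]
    start = int(np.argmax(acf < 0)) if np.any(acf < 0) else 1
    period_bins = int(np.argmax(acf[start:]) + start)

    # b) Spectral envelope: high-pass component of a higher transformation stage.
    envelope = details[min(4, len(details) - 1)]

    # c) Spectral tilt ("smokiness"): slope between the maximum Mx and the minimum Mi
    #    of the penultimate low-pass component.
    i_max, i_min = int(np.argmax(penultimate_lp)), int(np.argmin(penultimate_lp))
    tilt = 0.0 if i_max == i_min else float(
        (penultimate_lp[i_min] - penultimate_lp[i_max]) / (i_min - i_max))

    return period_bins, envelope, tilt
```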
  • the speaker-specific characteristics a) to c) are of great importance in speech synthesis.
  • concatenative speech synthesis uses large quantities of real spoken utterances, from which example sounds are cut out and later put together to form new words (synthesized speech).
  • Discontinuities between compound sounds are disadvantageous because they are perceived by the human ear as unnatural.
  • An advantage of the invention is that the spectral envelope reflects the articulation tract of the speaker and is not based on formants, as for example a pole position model is. Furthermore, since the wavelet transformation is a nonparametric representation, no data is lost and the utterance can always be completely reconstructed.
  • the data resulting from the individual transformation stages of the wavelet transformation are linearly independent of one another; they can thus be influenced separately from one another and later recombined, without loss, into the modified utterance.
  • furthermore, an arrangement for determining spectral speech characteristics is specified which has a processor unit set up in such a way that an utterance can be digitized.
  • the utterance is then subjected to a wavelet transformation and speaker-specific characteristics are determined using different transformation levels.
  • Fig.1 a wavelet function
  • Fig.1 shows a wavelet function that is determined by equation (1), where
  • f is the frequency,
  • σ is a standard deviation, and
  • c is a given normalization constant.
  • the standard deviation σ is determined by the predeterminable position of the sideband minimum 101 in Fig.1.
  • the complex wavelet function is given by Ψ(f) = ψ(f) + j · H{ψ(f)}   (2), i.e. the real wavelet ψ(f) plus j times its Hilbert transform H{ψ(f)}.
  • the constant c from equation (1) is used to normalize the complex wavelet function, where Ψ̄ denotes the conjugate complex wavelet function.
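  • A small sketch of the construction in equation (2) follows, using SciPy's Hilbert transform. Because equation (1) itself is not reproduced in this text, a generic Gaussian-windowed cosine stands in for the real wavelet ψ(f); that prototype and the discretized unit-norm normalization are assumptions of this sketch.

```python
# Build a complex wavelet Psi(f) = psi(f) + j*H{psi(f)} and normalize it.
import numpy as np
from scipy.signal import hilbert

f = np.linspace(-4.0, 4.0, 1024)
sigma = 1.0

# Stand-in for the real wavelet psi(f) of equation (1), which is not reproduced
# in this text; any real, integrable prototype illustrates the construction.
psi = np.exp(-f**2 / (2.0 * sigma**2)) * np.cos(2.0 * np.pi * f)

# Equation (2): scipy.signal.hilbert returns the analytic signal psi + j*H{psi}.
Psi = hilbert(psi)

# Normalize so that the discretized integral of Psi * conj(Psi) over f equals one.
df = f[1] - f[0]
Psi = Psi / np.sqrt(np.sum(np.abs(Psi) ** 2) * df)
assert np.isclose(np.sum(np.abs(Psi) ** 2) * df, 1.0)
```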
  • Fig.3 shows the cascaded application of the wavelet transformation.
  • a signal 301 is filtered both by a high-pass HP1 302 and by a low-pass TP1 305. In particular, subsampling takes place, i.e. the number of values to be stored is reduced per filter.
  • an inverse wavelet transformation guarantees that the original signal 301 can be reconstructed from the low-pass component TP1 305 and the high-pass component HP1 304.
  • in the high-pass HP1 302, filtering is carried out separately for the real part Re1 303 and the imaginary part Im1 304.
  • the signal 310 after the low-pass filter TP1 305 is again filtered both by a high-pass HP2 306 and by a low-pass TP2 309.
  • the high-pass HP2 306 again comprises a real part Re2 307 and an imaginary part Im2 308.
  • the signal after the second transformation stage 311 is filtered again, etc.
  • FIG. 4 shows various transformation stages of the wavelet transformation, divided into low-pass components (FIGS. 4A, 4C and 4E) and high-pass components (FIGS. 4B, 4D and 4F).
  • the basic frequency of the spoken utterance can be seen from the high-pass component in accordance with FIG. 4B.
  • in addition to the fluctuations in amplitude, a dominant periodicity is clearly visible in the wavelet-filtered spectrum: the fundamental frequency of the speaker. On the basis of the fundamental frequency it is possible, in speech synthesis, to adapt given utterances to one another or to determine suitable utterances from a database of predefined utterances.
  • in the low-pass component according to FIG. 4C, the formants of the speech signal section are shown as pronounced minima and maxima (the length of the speech signal section corresponds approximately to twice the fundamental frequency).
  • the formants represent resonance frequencies in the speaker's vocal tract. The clear representability of the formants enables adaptation and / or selection of suitable phonetic components in concatenative speech synthesis.
  • the smokiness of a voice can be determined in the low-pass portion of the penultimate transformation stage (with 256 frequency values in the original signal: TP7).
  • the descent of the curve between maximum Mx and minimum Mi indicates the degree of smokiness.
  • the three speaker-specific characteristics mentioned are thus identified and can be influenced in a targeted manner for speech synthesis. It is particularly important that, in the inverse wavelet transformation, the manipulation of a single speaker-specific characteristic influences only that characteristic; the other perceptually relevant variables remain unaffected. In this way, the fundamental frequency can be adjusted in a targeted manner without affecting the smokiness of the voice.
  • Another option is to select a suitable sound section for concatenative linking with another sound section, both sound sections originally being recorded by different speakers in different contexts.
  • by determining spectral speech characteristics, a suitable sound section to be linked can be found, since the characteristics provide known criteria that allow sound sections to be compared with one another and thus a suitable sound section to be selected automatically according to certain specifications.
  • a database is created with a predetermined amount of naturally spoken language from different speakers, sound sections in the naturally spoken language being identified and stored. This yields numerous representatives of the different sound sections of a language, which the database can access.
  • the sound sections are in particular phonemes of a language or a sequence of such phonemes. The smaller the sound section, the greater the possibilities for composing new words. For example, the German language comprises a predetermined set of approximately 40 phonemes, which suffice to synthesize almost all words of the language. Different acoustic contexts must be taken into account, depending on the word in which the respective phoneme occurs.
  • it is now important to embed the individual phonemes in the acoustic context in such a way that discontinuities, which the human ear perceives as unnatural and "synthetic", are avoided.
  • the sound sections come from different speakers and thus have different speaker-specific characteristics.
  • FIG. 5 shows two sounds A 507 and B 508, each of which has individual sound sections 505 and 506, for example.
  • the sounds A 507 and B 508 each come from a spoken utterance, the sound A 507 being clearly different from the sound B 508.
  • a dividing line 509 indicates where the sound A 507 should be linked with the sound B 508. In the present case, the first three sound sections of sound A 507 are to be concatenated with the last three sound sections of sound B 508.
  • a temporal stretching or compressing (see arrow 503) of the successive sound sections is carried out along the dividing line 509 in order to reduce the discontinuous impression at the transition 509.
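  • The patent does not specify how the stretching or compressing is carried out. As a crude stand-in, the sketch below simply resamples each adjoining section to a common length with SciPy; a real system would more likely use a pitch-preserving time-scale modification (e.g. overlap-add), so this only illustrates equalizing section lengths at the dividing line.

```python
# Illustrative only: equalize the lengths of two adjoining sound sections by
# resampling before joining them at the dividing line.
import numpy as np
from scipy.signal import resample

rng = np.random.default_rng(2)
a_tail = rng.standard_normal(180)    # last section of sound A (stand-in data)
b_head = rng.standard_normal(220)    # first section of sound B (stand-in data)

target_len = (len(a_tail) + len(b_head)) // 2
joined = np.concatenate([resample(a_tail, target_len), resample(b_head, target_len)])
```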
  • a variant consists in an abrupt transition of the sounds divided along the dividing line 509. However, this leads to the discontinuities mentioned, which human hearing perceives as disturbing. If, on the other hand, a sound C is assembled such that the sound sections within a transition area 501 or 502 are taken into account, a spectral distance measure between two mutually assignable sound sections is adapted in the respective transition area 501 or 502 (gradual transition between the sound sections).
  • the Euclidean distance between the coefficients relevant to this area is used as the distance measure, particularly in the wavelet space.
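  • A sketch of such an automatic selection follows: each candidate's short-term spectrum is mapped to a vector of wavelet coefficients and the candidate with the smallest Euclidean distance to the target is chosen. Restricting the distance to the coefficients relevant to the transition area, as the text suggests, is omitted here for brevity; the wavelet and decomposition depth are assumptions.

```python
# Unit selection sketch: pick the candidate closest to the target in wavelet space.
import numpy as np
import pywt

def wavelet_vector(spectrum: np.ndarray, level: int = 4) -> np.ndarray:
    """Concatenated wavelet coefficients of a short-term spectrum."""
    return np.concatenate(pywt.wavedec(spectrum, "db2", level=level))

def best_match(target: np.ndarray, candidates: list) -> int:
    """Index of the candidate whose coefficients have the smallest Euclidean distance."""
    t = wavelet_vector(target)
    distances = [np.linalg.norm(t - wavelet_vector(c)) for c in candidates]
    return int(np.argmin(distances))

# All spectra are assumed to share one length (e.g. 256 values) so that the
# coefficient vectors are directly comparable.
```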

Landscapes

  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Machine Translation (AREA)
  • Sorting Of Articles (AREA)
  • Pallets (AREA)
  • Navigation (AREA)
  • Traffic Control Systems (AREA)

Abstract

According to the invention, spectral voice characteristics are determined in a spoken utterance, the utterance being digitized and subjected to a wavelet transformation. The speaker-specific characteristics arise from the different transformation stages of the wavelet transformation. Within the scope of speech synthesis, these characteristics can be compared with characteristics of other utterances in order to generate a synthetic speech signal that sounds continuous to the human ear. Alternatively, the characteristics can also be modified in a targeted manner in order to counteract a perceptual dissonance.

Description

Method and arrangement for determining spectral speech characteristics in a spoken utterance
The invention relates to a method and an arrangement for determining spectral speech characteristics in a spoken utterance.
In concatenative speech synthesis, individual sounds are assembled from speech databases. In order to obtain a speech flow that sounds natural to the human ear, discontinuities must be avoided at the points where the sounds are joined (concatenation points). The sounds are in particular phonemes of a language or a combination of several phonemes.
A wavelet transformation is known from [1]. In the wavelet transformation, a wavelet filter guarantees that the high-pass component and the low-pass component of a subsequent transformation stage completely reconstruct the signal of the current transformation stage. The resolution of the high-pass or low-pass component is reduced from one transformation stage to the next (technical term: "subsampling"). In particular, the number of transformation stages is finite because of the subsampling.
The object of the invention is to specify a method and an arrangement for determining spectral speech characteristics with which, in particular, a natural-sounding synthetic speech output can be determined.
This object is achieved according to the features of the independent claims. Within the scope of the invention, a method is specified for determining spectral speech characteristics in a spoken utterance. For this purpose, the spoken utterance is digitized and subjected to a wavelet transformation. The speaker-specific characteristics are determined on the basis of different transformation stages of the wavelet transformation.
It is a particular advantage that in the wavelet transformation the utterance is split by means of a high-pass filter and a low-pass filter, and that different high-pass or low-pass components of different transformation stages contain speaker-specific characteristics.
The individual high-pass or low-pass components of different transformation stages stand for predetermined speaker-specific characteristics; both the high-pass component and the low-pass component of a given transformation stage, that is to say the respective characteristic, can be modified separately from the other characteristics. If the original signal is reassembled from the respective high-pass and low-pass components of the individual transformation stages in the inverse wavelet transformation, it is guaranteed that exactly the desired characteristic has been changed. It is thus possible to change certain predetermined properties of the utterance without thereby influencing the rest of the utterance.
One embodiment consists in windowing the utterance before the wavelet transformation, that is to say cutting out a predetermined quantity of samples, and transforming it into the frequency domain. A Fast Fourier Transform (FFT) is used in particular for this purpose.
A further embodiment consists in splitting the high-pass component of a transformation stage into a real part and an imaginary part. The high-pass component of the wavelet transformation corresponds to the difference signal between the current low-pass component and the low-pass component of the preceding transformation stage.
In particular, a further development consists in determining the number of transformation stages of the wavelet transformation to be carried out such that the last transformation stage, which consists of low-pass filters connected in series, contains the constant (DC) component of the utterance. The signal as a whole can then be represented by its wavelet coefficients. This corresponds to the complete transformation of the information of the signal section into the wavelet space.
If, in particular, only the respective low-pass component is transformed further (by means of a high-pass and a low-pass filter), the difference signal remains as the high-pass component of a transformation stage, as explained above. If the difference signals (high-pass components) are accumulated over the transformation stages, the information of the spoken utterance without its constant component is obtained in the last transformation stage as the cumulative high-pass component.
Within the scope of an additional development, the speaker-specific characteristics can be identified as:
a) Fundamental frequency: the oscillation of the high-pass component of the first or the second transformation stage of the wavelet transformation reveals the fundamental frequency of the utterance. The fundamental frequency indicates whether the speaker is a man or a woman.
b) Shape of the spectral envelope: the spectral envelope contains information about a transfer function of the vocal tract during articulation. In a voiced region, the spectral envelope is dominated by the formants. The high-pass component of a higher transformation stage of the wavelet transformation contains this spectral envelope.
c) Spectral tilt (smokiness): the smokiness of a voice becomes visible as a negative slope in the course of the penultimate low-pass component.
The speaker-specific characteristics a) to c) are of great importance in speech synthesis. As mentioned at the outset, concatenative speech synthesis uses large quantities of real spoken utterances, from which example sounds are cut out and later assembled into a new word (synthesized speech). Discontinuities between the assembled sounds are disadvantageous, because they are perceived by the human ear as unnatural. In order to counteract the discontinuities, it is advantageous to capture the perceptually relevant quantities directly and, if necessary, to compare and/or adapt them to one another.
This can be done by direct manipulation, in that a speech sound is adapted in at least one of its speaker-specific characteristics so that it is not perceived as disturbing in the acoustic context of the concatenatively linked sounds. It is also possible to base the selection of a suitable sound on the speaker-specific characteristics of the sounds to be linked matching one another as well as possible, e.g. the sounds having the same or a similar smokiness.
An advantage of the invention is that the spectral envelope reflects the articulation tract of the speaker and is not based on formants, as for example a pole position model is. Furthermore, since the wavelet transformation is a nonparametric representation, no data is lost, and the utterance can always be completely reconstructed. The data resulting from the individual transformation stages of the wavelet transformation are linearly independent of one another; they can thus be influenced separately from one another and later recombined, without loss, into the modified utterance.
Furthermore, an arrangement for determining spectral speech characteristics is specified which has a processor unit set up in such a way that an utterance can be digitized. The utterance is then subjected to a wavelet transformation, and speaker-specific characteristics are determined on the basis of different transformation stages.
This arrangement is particularly suitable for carrying out the method according to the invention or one of its developments explained above.
Further developments of the invention also result from the dependent claims.
Exemplary embodiments of the invention are illustrated and explained below with reference to the drawing.
The figures show:
Fig.1 a wavelet function;
Fig.2 a wavelet function, divided into real part and imaginary part;
Fig.3 a cascaded filter structure representing the transformation steps of the wavelet transformation;
Fig.4 low-pass components and high-pass components of different transformation stages;
Fig.5 steps of concatenative speech synthesis.
Fig.1 shows a wavelet function that is determined by equation (1), where f denotes the frequency, σ a standard deviation and c a given normalization constant.
In particular, the standard deviation σ is determined by the predeterminable position of the sideband minimum 101 in Fig.1.
Fig.2 shows a wavelet function with a real part according to equation (1) and a Hilbert transform H of the real part as the imaginary part. The complex wavelet function is thus given by
Ψ(f) = ψ(f) + j · H{ψ(f)}   (2).
The constant c from equation (1) is used to normalize the complex wavelet function:
∫ Ψ(f) · Ψ̄(f) df = 1, integrated over f from -∞ to +∞   (3),
where Ψ̄ denotes the conjugate complex wavelet function.
Fig.3 shows the cascaded application of the wavelet transformation. A signal 301 is filtered both by a high-pass HP1 302 and by a low-pass TP1 305. In particular, subsampling takes place, i.e. the number of values to be stored is reduced per filter. An inverse wavelet transformation guarantees that the original signal 301 can be reconstructed from the low-pass component TP1 305 and the high-pass component HP1 304.
In the high-pass HP1 302, filtering is carried out separately for the real part Re1 303 and the imaginary part Im1 304.
The signal 310 after the low-pass filter TP1 305 is again filtered both by a high-pass HP2 306 and by a low-pass TP2 309. The high-pass HP2 306 again comprises a real part Re2 307 and an imaginary part Im2 308. The signal after the second transformation stage 311 is filtered again, and so on.
Assuming an (FFT-transformed) short-term spectrum with 256 values, eight transformation steps are carried out (subsampling rate: 1/2) until the signal from the last low-pass filter TP8 corresponds to the DC component.
Fig.4 shows various transformation stages of the wavelet transformation, divided into low-pass components (Figures 4A, 4C and 4E) and high-pass components (Figures 4B, 4D and 4F).
The fundamental frequency of the spoken utterance can be seen from the high-pass component according to Fig.4B. In addition to the fluctuations in amplitude, a dominant periodicity is clearly visible in the wavelet-filtered spectrum: the fundamental frequency of the speaker. On the basis of the fundamental frequency it is possible, in speech synthesis, to adapt given utterances to one another or to determine suitable utterances from a database of predefined utterances.
In the low-pass component of Fig.4C, the formants of the speech signal section are shown as pronounced minima and maxima (the length of the speech signal section corresponds approximately to twice the fundamental frequency). The formants represent resonance frequencies in the speaker's vocal tract. The clear representability of the formants enables adaptation and/or selection of suitable sound building blocks in concatenative speech synthesis.
The smokiness of a voice can be determined in the low-pass component of the penultimate transformation stage (with 256 frequency values in the original signal: TP7). The descent of the curve between maximum Mx and minimum Mi characterizes the degree of smokiness.
The three speaker-specific characteristics mentioned are thus identified and can be influenced in a targeted manner for speech synthesis. It is particularly important that, in the inverse wavelet transformation, the manipulation of a single speaker-specific characteristic influences only that characteristic; the other perceptually relevant quantities remain unaffected. Thus the fundamental frequency can be adjusted in a targeted manner without thereby affecting the smokiness of the voice.
Another possible use is the selection of a suitable sound section for concatenative linking with another sound section, where both sound sections were originally recorded from different speakers in different contexts. By determining spectral speech characteristics, a suitable sound section to be linked can be found, since the characteristics provide known criteria that allow sound sections to be compared with one another and thus a suitable sound section to be selected automatically according to certain specifications.
Fig.5 shows steps of a concatenative speech synthesis. A database is created with a predetermined amount of naturally spoken language from different speakers, sound sections in the naturally spoken language being identified and stored. This yields numerous representatives of the different sound sections of a language, which the database can access. The sound sections are in particular phonemes of a language or a sequence of such phonemes. The smaller the sound section, the greater the possibilities for composing new words. For example, the German language comprises a predetermined set of approximately 40 phonemes, which suffice to synthesize almost all words of the language. Different acoustic contexts must be taken into account, depending on the word in which the respective phoneme occurs.
It is now important to embed the individual phonemes in the acoustic context in such a way that discontinuities, which the human ear perceives as unnatural and "synthetic", are avoided. As mentioned, the sound sections come from different speakers and thus have different speaker-specific characteristics. In order to synthesize an utterance that sounds as natural as possible, it is important to minimize the discontinuities. This can be done by adapting the identifiable and modifiable speaker-specific characteristics or by selecting suitable sound sections from the database, the speaker-specific characteristics likewise being a decisive aid in the selection.
Fig.5 shows by way of example two sounds A 507 and B 508, each of which has individual sound sections 505 and 506 respectively. The sounds A 507 and B 508 each come from a spoken utterance, the sound A 507 being clearly different from the sound B 508. A dividing line 509 indicates where the sound A 507 is to be linked with the sound B 508. In the present case, the first three sound sections of sound A 507 are to be concatenated with the last three sound sections of sound B 508.
A temporal stretching or compressing (see arrow 503) of the successive sound sections is carried out along the dividing line 509 in order to reduce the discontinuous impression at the transition 509.
One variant consists in an abrupt transition of the sounds divided along the dividing line 509. However, this leads to the discontinuities mentioned, which the human ear perceives as disturbing. If, on the other hand, a sound C is assembled such that the sound sections within a transition area 501 or 502 are taken into account, a spectral distance measure between two mutually assignable sound sections is adapted in the respective transition area 501 or 502 (gradual transition between the sound sections). As the distance measure, in particular the Euclidean distance in the wavelet space between the coefficients relevant to this area is used.
Bibliography:
[1] I. Daubechies: "Ten Lectures on Wavelets", SIAM, 1992, ISBN 0-89871-274-2, Chapter 5.1, pages 129-137.

Claims

1. Method for determining spectral speech characteristics in a spoken utterance, a) in which the utterance is digitized, b) in which the digitized utterance is subjected to a wavelet transformation, c) in which the speaker-specific characteristics are determined on the basis of different transformation stages of the wavelet transformation.
2. Method according to claim 1, in which, before the wavelet transformation, a windowed transformation of the digitized utterance into a frequency domain is carried out.
3. Method according to claim 2, in which the transformation into the frequency domain is carried out by means of a Fast Fourier Transform.
4. Method according to one of the preceding claims, in which in each stage of the wavelet transformation a low-pass component and a high-pass component of a signal to be transformed are determined.
5. Method according to one of the preceding claims, in which a high-pass component is divided into a real part and an imaginary part.
6. Method according to one of the preceding claims, in which the wavelet transformation comprises several transformation stages, the last transformation stage delivering a constant component of the utterance in a repeated low-pass filtering corresponding to the number of transformation stages.
7. Method according to one of the preceding claims, in which the speaker-specific characteristics are determined by: a) a fundamental frequency of the spoken utterance; b) a spectral envelope; c) a smokiness of the spoken utterance.
8. Use of the method according to one of claims 1 to 7 for speech synthesis, individual speaker-specific characteristics being adapted with a view to a natural-sounding sequence of speech sounds.
9. Use of the method according to one of claims 1 to 7 for speech synthesis, those speech sounds which ensure a natural-sounding sequence of speech sounds being selected from a predetermined set of data on the basis of individual spectral speech characteristics.
10. Arrangement for determining spectral speech characteristics in a spoken utterance, having a processor unit which is set up in such a way that the following steps can be carried out: a) the utterance is digitized; b) the digitized utterance is subjected to a wavelet transformation; c) the speaker-specific characteristics are determined on the basis of different transformation stages of the wavelet transformation.
EP99929088A 1998-05-11 1999-05-03 Method and device for determining spectral voice characteristics in a spoken expression Expired - Lifetime EP1078354B1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
DE19821031 1998-05-11
DE19821031 1998-05-11
PCT/DE1999/001308 WO1999059134A1 (en) 1998-05-11 1999-05-03 Method and device for determining spectral voice characteristics in a spoken expression

Publications (2)

Publication Number Publication Date
EP1078354A1 true EP1078354A1 (en) 2001-02-28
EP1078354B1 EP1078354B1 (en) 2002-03-20

Family

ID=7867382

Family Applications (1)

Application Number Title Priority Date Filing Date
EP99929088A Expired - Lifetime EP1078354B1 (en) 1998-05-11 1999-05-03 Method and device for determining spectral voice characteristics in a spoken expression

Country Status (6)

Country Link
EP (1) EP1078354B1 (en)
JP (1) JP2002515608A (en)
AT (1) ATE214831T1 (en)
DE (1) DE59901018D1 (en)
ES (1) ES2175988T3 (en)
WO (1) WO1999059134A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10031832C2 (en) 2000-06-30 2003-04-30 Cochlear Ltd Hearing aid for the rehabilitation of a hearing disorder
US8483854B2 (en) 2008-01-28 2013-07-09 Qualcomm Incorporated Systems, methods, and apparatus for context processing using multiple microphones
JP6251145B2 (en) * 2014-09-18 2017-12-20 株式会社東芝 Audio processing apparatus, audio processing method and program
JP2018025827A (en) * 2017-11-15 2018-02-15 株式会社東芝 Interactive system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2678103B1 (en) * 1991-06-18 1996-10-25 Sextant Avionique VOICE SYNTHESIS PROCESS.
GB2272554A (en) * 1992-11-13 1994-05-18 Creative Tech Ltd Recognizing speech by using wavelet transform and transient response therefrom
JP3093113B2 (en) * 1994-09-21 2000-10-03 日本アイ・ビー・エム株式会社 Speech synthesis method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO9959134A1 *

Also Published As

Publication number Publication date
WO1999059134A1 (en) 1999-11-18
ES2175988T3 (en) 2002-11-16
EP1078354B1 (en) 2002-03-20
JP2002515608A (en) 2002-05-28
ATE214831T1 (en) 2002-04-15
DE59901018D1 (en) 2002-04-25

Similar Documents

Publication Publication Date Title
DE69613646T2 (en) Method for speech detection in case of strong ambient noise
DE69028072T2 (en) Method and device for speech synthesis
DE60000074T2 (en) Linear predictive cepstral features organized in hierarchical subbands for HMM-based speech recognition
DE69811656T2 (en) VOICE TRANSFER AFTER A TARGET VOICE
DE69521955T2 (en) Method of speech synthesis by chaining and partially overlapping waveforms
DE69718284T2 (en) Speech synthesis system and waveform database with reduced redundancy
DE4237563C2 (en) Method for synthesizing speech
DE69031165T2 (en) SYSTEM AND METHOD FOR TEXT-LANGUAGE IMPLEMENTATION WITH THE CONTEXT-DEPENDENT VOCALALLOPHONE
DE68919637T2 (en) Method and device for speech synthesis by covering and summing waveforms.
DE69719654T2 (en) Prosody databases for speech synthesis containing fundamental frequency patterns
DE69909716T2 (en) Formant speech synthesizer using concatenation of half-syllables with independent cross-fading in the filter coefficient and source range
DE69932786T2 (en) PITCH DETECTION
DE3687815T2 (en) METHOD AND DEVICE FOR VOICE ANALYSIS.
DE69720861T2 (en) Methods of sound synthesis
DE69933188T2 (en) Method and apparatus for extracting formant based source filter data using cost function and inverted filtering for speech coding and synthesis
DE69627865T2 (en) VOICE SYNTHESIZER WITH A DATABASE FOR ACOUSTIC ELEMENTS
DE69614233T2 (en) Speech adaptation system and speech recognizer
WO2003012779A1 (en) Method for analysing audio signals
DE69631037T2 (en) VOICE SYNTHESIS
DE3228757A1 (en) METHOD AND DEVICE FOR PERIODIC COMPRESSION AND SYNTHESIS OF AUDIBLE SIGNALS
DE69723930T2 (en) Method and device for speech synthesis and data carriers therefor
EP1435087B1 (en) Method for producing reference segments describing voice modules and method for modelling voice units of a spoken test model
WO2001086634A1 (en) Method for creating a speech database for a target vocabulary in order to train a speech recognition system
EP1078354B1 (en) Method and device for determining spectral voice characteristics in a spoken expression
DE69607928T2 (en) METHOD AND DEVICE FOR PROVIDING AND USING DIPHONES FOR MULTI-LANGUAGE TEXT-BY-LANGUAGE SYSTEMS

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20000919

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE DE ES FR GB NL

RIC1 Information provided on ipc code assigned before grant

Free format text: 7G 10L 13/06 A

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

17Q First examination report despatched

Effective date: 20010904

REG Reference to a national code

Ref country code: GB

Ref legal event code: IF02

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AT BE DE ES FR GB NL

REF Corresponds to:

Ref document number: 214831

Country of ref document: AT

Date of ref document: 20020415

Kind code of ref document: T

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: AT

Payment date: 20020424

Year of fee payment: 4

REF Corresponds to:

Ref document number: 59901018

Country of ref document: DE

Date of ref document: 20020425

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: ES

Payment date: 20020523

Year of fee payment: 4

Ref country code: BE

Payment date: 20020523

Year of fee payment: 4

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20020528

Year of fee payment: 4

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20020722

Year of fee payment: 4

ET Fr: translation filed
REG Reference to a national code

Ref country code: ES

Ref legal event code: FG2A

Ref document number: 2175988

Country of ref document: ES

Kind code of ref document: T3

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed

Effective date: 20021223

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20030503

Ref country code: AT

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20030503

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: ES

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20030505

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: BE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20030531

BERE Be: lapsed

Owner name: *SIEMENS A.G.

Effective date: 20030531

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NL

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20031201

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20031202

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20030503

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20040130

NLV4 Nl: lapsed or anulled due to non-payment of the annual fee

Effective date: 20031201

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST

REG Reference to a national code

Ref country code: ES

Ref legal event code: FD2A

Effective date: 20030505