EP0919052B1 - Method and system for speech-to-speech conversion (Verfahren und System zur Sprache-in-Sprache-Umsetzung)

Method and system for speech-to-speech conversion

Info

Publication number
EP0919052B1
EP0919052B1 (application EP97919840A)
Authority
EP
European Patent Office
Prior art keywords
speech
information
input
model
fundamental tone
Prior art date
Legal status
Expired - Lifetime
Application number
EP97919840A
Other languages
English (en)
French (fr)
Other versions
EP0919052A1 (de)
Inventor
Bertil Lyberg
Current Assignee
Telia AB
Original Assignee
Telia AB
Priority date
Filing date
Publication date
Application filed by Telia AB
Publication of EP0919052A1
Application granted
Publication of EP0919052B1
Anticipated expiration
Legal status: Expired - Lifetime (current)


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 Voice editing, e.g. manipulating the voice of the synthesiser
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L13/10 Prosody rules derived from text; Stress or intonation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/04 Details of speech synthesis systems, e.g. synthesiser structure or memory management
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/90 Pitch determination of speech signals

Definitions

  • The invention relates to a speech-to-speech conversion system and method which are capable of matching the dialect of speech outputs to that of the respective speech inputs, and to a voice responsive communication system which includes such a speech-to-speech conversion system and operates in accordance with such a speech-to-speech conversion method.
  • The speech information which is stored in a database, and used to provide appropriate synthesised spoken responses to voice inputs utilising a speech-to-speech conversion system, is normally reproduced in a dialect which conforms to a standard national dialect.
  • It may therefore be difficult for the database of known voice responsive communication systems to interpret received speech information, i.e. the voice inputs. It may also be difficult for the person making the voice inputs to fully understand the spoken response. Even if such responses are understandable to a recipient, it would be more user friendly if the dialect of the spoken response were the same as the dialect of the related voice input.
  • A word can have widely different meanings depending on language stress.
  • The meaning of one and the same sentence can be given a different significance depending on where the stress is placed.
  • The stressing of sentences, or parts thereof, determines sections which are emphasised in the language and which may be of importance in determining the precise meaning of the spoken language.
  • Document WO-A-96/00962 discloses a speech recognition system for recognising dialectal variations in a language.
  • In order to overcome these difficulties, it would be necessary for a voice responsive communication system to be capable of interpreting the received speech information, irrespective of dialect, and of matching the dialect of speech outputs to that of the respective speech inputs. Also, in order to be able to determine the meaning of single words, or phrases, in an unambiguous manner in a spoken sequence, it would be necessary for the speech-to-speech converters used in a voice responsive communication system to be capable of determining, and taking account of, stresses in the spoken sequence.
  • The invention as claimed in claims 1-26 provides a speech-to-speech conversion system for providing, at the output thereof, spoken responses to speech inputs to the system, including speech recognition means for the input speech; interpretation means for interpreting the content of the recognised input speech; and a database containing speech information data for use in the formulation of said spoken responses, the output of said interpretation means being used to access said database and obtain speech information data therefrom, characterised in that the system further includes extraction means for extracting prosody information from the input speech; means for obtaining dialectal information from said prosody information; and text-to-speech conversion means for converting the speech information data obtained from said database into a spoken response using said dialectal information, the dialect of the spoken response being matched to that of the input speech.
  • The speech recognition means may be adapted to identify a number of phonemes from a segment of the input speech and to interpret the phonemes, as possible words, or word combinations, to establish a model of the speech, the speech model having word and sentence accents according to a standardised pattern for the language of the input speech.
  • The prosody information extracted from the input speech is preferably the fundamental tone curve of the input speech.
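The patent does not specify how the fundamental tone curve is obtained. As a minimal sketch, a short-time autocorrelation pitch tracker of the following kind could produce such a curve; the frame length, hop, pitch range and the 0.3 voicing threshold are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def f0_curve(signal, sr, frame_len=0.04, hop=0.01, fmin=60.0, fmax=400.0):
    """Short-time autocorrelation estimate of the fundamental tone curve.
    Returns (times, f0); unvoiced frames are reported as 0.0 Hz.
    All numeric parameters are illustrative choices, not patent values."""
    n, h = int(frame_len * sr), int(hop * sr)
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    window = np.hanning(n)
    times, f0 = [], []
    for start in range(0, len(signal) - n, h):
        frame = signal[start:start + n] * window
        ac = np.correlate(frame, frame, mode="full")[n - 1:]  # lags 0..n-1
        times.append(start / sr)
        if ac[0] <= 0:                        # silent frame
            f0.append(0.0)
            continue
        ac = ac / ac[0]                       # normalise so lag 0 == 1
        lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
        f0.append(sr / lag if ac[lag] > 0.3 else 0.0)
    return np.array(times), np.array(f0)
```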
  • The means for obtaining dialectal information from said prosody information includes: first analysing means for determining the intonation pattern of the fundamental tone of the input speech, and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; second analysing means for determining the intonation pattern of the fundamental tone curve of the speech model, and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; and comparison means for comparing the intonation pattern of the input speech with the intonation pattern of the speech model to identify a time difference between the occurrence of the maximum and minimum values of the fundamental tone curve of the input speech in relation to the maximum and minimum values of the fundamental tone curve of the speech model, the identified time difference being indicative of dialectal characteristics of the input speech.
  • The time difference may be determined in relation to an intonation pattern reference point, for example, the point at which a consonant/vowel boundary occurs.
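A rough sketch of that comparison follows. It assumes the extrema of the two curves have already been paired one-to-one and that a single CV-boundary time is known for each curve; `intonation_events` and `dialect_offset` are illustrative names, not terms from the patent.

```python
import numpy as np

def intonation_events(times, f0):
    """Times of the local maxima and minima of a voiced F0 curve.
    Flat stretches and octave errors are ignored; this is a sketch."""
    voiced = f0 > 0
    t, f = times[voiced], f0[voiced]
    d = np.sign(np.diff(f))                     # +1 rising, -1 falling
    turns = np.where(np.diff(d) != 0)[0] + 1    # slope changes sign here
    maxima = [float(t[i]) for i in turns if d[i - 1] > 0]
    minima = [float(t[i]) for i in turns if d[i - 1] < 0]
    return maxima, minima

def dialect_offset(input_events, input_cv, model_events, model_cv):
    """Mean time difference between paired input and model F0 extrema,
    each measured from its own consonant/vowel (CV) boundary. Assumes the
    two event lists are already paired one-to-one and in order."""
    diffs = [(a - input_cv) - (b - model_cv)
             for a, b in zip(input_events, model_events)]
    return sum(diffs) / len(diffs)
```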
  • The speech-to-speech conversion system may include means for obtaining information on sentence accents from said prosody information.
  • The speech recognition means includes checking means for lexically checking the words in the speech model and for syntactically checking the phrases in the speech model, the words and phrases which are not linguistically possible being excluded from the speech model.
  • The checking means are, with this arrangement, adapted to check the orthography and phonetic transcription of the words in the speech model, the transcription information including lexically abstracted accent information, of type stressed syllables, and information relating to the location of the secondary accent.
  • The accent information may, for example, relate to tonal word accent I and accent II.
  • The sentence accent information and/or sentence stressing may be used, to advantage, in the interpretation of the content of the recognised input speech.
  • The speech-to-speech conversion system may include dialogue management means for managing a dialogue with the database, said dialogue being initiated by the interpretation means.
  • The dialogue with the database results in the application of speech information data to the text-to-speech conversion means.
  • The invention also provides, in a voice responsive communication system, a method for providing a spoken response to a speech input to the system, said response having a dialect to match that of the speech input, said method including the steps of recognising and interpreting the input speech, and utilising the interpretation to obtain speech information data from a database for use in the formulation of said spoken response, characterised in that said method further includes the steps of extracting prosody information from the input speech, obtaining dialectal information from said prosody information, and converting the speech information data obtained from said database into said spoken response using said dialectal information.
  • The recognition and interpretation of the input speech includes the steps of identifying a number of phonemes from a segment of the input speech and interpreting the phonemes, as possible words, or word combinations, to establish a model of the speech, the speech model having word and sentence accents according to a standardised pattern for the language of the input speech.
  • The prosody information extracted from the input speech is the fundamental tone curve of the input speech.
  • The method according to the present invention includes the steps of determining the intonation pattern of the fundamental tone of the input speech, and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; determining the intonation pattern of the fundamental tone curve of a speech model, and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; and comparing the intonation pattern of the input speech with the intonation pattern of the speech model to identify a time difference between the occurrence of the maximum and minimum values of the fundamental tone curve of the input speech in relation to the maximum and minimum values of the fundamental tone curve of the speech model, the identified time difference being indicative of dialectal characteristics of the input speech.
  • The time difference may be determined in relation to an intonation pattern reference point, for example, the point at which a consonant/vowel boundary occurs.
  • The method may include the step of obtaining information on sentence accents from said prosody information.
  • The words in the speech model are checked lexically and the phrases in the speech model are checked syntactically, the words and phrases which are not linguistically possible being excluded from the speech model.
  • The orthography and phonetic transcription of the words in the speech model may be checked, the transcription information including lexically abstracted accent information, of type stressed syllables, and information relating to the location of the secondary accent.
  • The accent information may relate to tonal word accent I and accent II.
  • Sentence accent information and/or sentence stressing may be used in the interpretation of the content of the recognised input speech.
  • The method according to the present invention may include the step of initiating a dialogue with the database to obtain speech information data for formulating said spoken response, said dialogue being initiated following the interpretation of the input speech.
  • The dialogue with the database may result in the application of speech information data to text-to-speech conversion means.
  • The invention further provides a voice responsive communication system which includes a speech-to-speech conversion system as outlined in the preceding paragraphs, or which utilises a method as outlined in the preceding paragraphs for providing a spoken response to a speech input to the system.
  • The characteristic features of the speech-to-speech conversion system and method according to the present invention are outlined in the following paragraphs.
  • A speech-to-speech conversion system includes, at the input 1 thereof, a speech recognition unit 2 and an extraction unit 3 for extracting prosody information, i.e. the fundamental tone curve, from speech applied to the system input 1.
  • Speech inputs applied to the input 1 are simultaneously applied to the units 2 and 3.
  • The output of the speech recognition unit 2 and an output of the extraction unit 3 are connected to separate inputs of an interpretation unit 4, the output of which is connected to a database management unit 5.
  • The database management unit 5, which is adapted for two-way communication with a database 6, is connected at the output thereof to the input of a text-to-speech converter 7.
  • The dialogue between the database 6 and the database management unit 5 can be effected in any known database communication language, for example, SQL (Structured Query Language).
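The patent names SQL only as one possible language for this dialogue. Purely as an illustration, the exchange between units 5 and 6 might reduce to parameterised queries of the following kind; the schema, table and topic names are hypothetical.

```python
import sqlite3

# Hypothetical schema standing in for the speech-information database (unit 6).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (topic TEXT PRIMARY KEY, reply_text TEXT)")
conn.execute("INSERT INTO responses VALUES (?, ?)",
             ("opening_hours", "The office is open from nine to five."))

def fetch_response(topic):
    """One query the database management unit (5) might send to the database (6)."""
    row = conn.execute("SELECT reply_text FROM responses WHERE topic = ?",
                       (topic,)).fetchone()
    return row[0] if row else None

print(fetch_response("opening_hours"))  # -> The office is open from nine to five.
```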
  • The output of the text-to-speech converter 7 provides a synthesised speech output for the speech-to-speech conversion system.
  • A further output of the extraction unit 3 is connected to the input of a prosody analyzer unit 8 which is adapted for two-way communication with the text-to-speech converter 7.
  • The prosody analyzer unit 8 is adapted, as a part of the text-to-speech conversion process of the converter 7, to analyze the prosody information, i.e. the fundamental tone curve, of the synthesised speech and to make any necessary corrections to the intonation pattern of the synthesised speech in accordance with the dialectal information extracted from the input speech.
  • The dialect of the synthesised speech output of the speech-to-speech conversion system will therefore match that of the input speech.
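Summarising the unit layout just described, the control flow can be skeletonised as below. Every field is a stand-in callable; this sketches the wiring between the numbered units, not any unit's internals, and the class and parameter names are invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class SpeechToSpeechSystem:
    """Data flow of the figure: input 1 feeds units 2 and 3 in parallel,
    unit 4 interprets, unit 5 talks to database 6, unit 8 supplies the
    dialect correction that unit 7 applies when synthesising."""
    recognizer: Callable[[Any], str]           # unit 2
    prosody_extractor: Callable[[Any], Any]    # unit 3 (fundamental tone curve)
    interpreter: Callable[[str, Any], Any]     # unit 4
    db_manager: Callable[[Any], str]           # unit 5 (+ database 6)
    prosody_analyzer: Callable[[Any], Any]     # unit 8 (dialect information)
    tts: Callable[[str, Any], Any]             # unit 7

    def respond(self, speech_input):
        text = self.recognizer(speech_input)            # units 2 and 3 both
        prosody = self.prosody_extractor(speech_input)  # receive the input
        query = self.interpreter(text, prosody)
        data = self.db_manager(query)
        dialect = self.prosody_analyzer(prosody)
        return self.tts(data, dialect)
```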
  • The present invention is adapted to provide a spoken response to a speech input to the speech-to-speech conversion system which has a dialect matching that of the speech input, and this conversion process includes the steps of recognising and interpreting the input speech, utilising the interpretation to obtain speech information data from a database for use in the formulation of the spoken response, extracting prosody information from the input speech, obtaining dialectal information from the prosody information, and converting the speech information data obtained from said database into the spoken response using the dialectal information.
  • This will be outlined in the following paragraphs.
  • The speech inputs to the speech-to-speech conversion system, which may take many forms, for example, requests for information on particular topics, such as banking or telephone services, or general enquiries concerning such services, are applied to the input 1 and thereby to the inputs of the units 2 and 3.
  • The speech recognition unit 2 and interpretation unit 4 are adapted to operate, in a manner well known to persons skilled in the art, to recognise and interpret the speech inputs to the system.
  • The speech recognition unit 2 may, for example, operate by using a Hidden Markov model, or an equivalent speech model.
  • The function of the units 2 and 4 is to convert speech inputs to the system into a form which is a faithful representation of the content of the speech inputs and which is suitable for application to the input of the database management unit 5.
  • The content of the textual information data at the output of the interpretation unit 4 must be an accurate representation of the speech input and be usable by the database management unit 5 to access, and extract speech information data from, the database 6 for use in the formulation of a synthesised spoken response to the speech input.
  • This process would, in essence, be effected by identifying a number of phonemes from a segment of the input speech, which are combined into allophone strings, the phonemes being interpreted as possible words, or word combinations, to establish a model of the speech.
  • The established speech model will have word and sentence accents according to a standardised pattern for the language of the input speech.
  • The information concerning the recognised words and word combinations generated by the speech recognition unit 2 may, in practice, be checked both lexically (using a lexicon, with orthography and transcription) and syntactically.
  • The purpose of these checks is to identify and exclude any words which do not exist in the language concerned, and/or any phrase whose syntax does not correspond with the language concerned.
  • The speech recognition unit 2 ensures that only those words, and word combinations, which are found to be acceptable both lexically and syntactically, are used to create a model of the input speech.
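As a toy illustration of this lexical and syntactic filtering: the lexicon, the candidate phrases and the one-line "grammar" test below are all invented for the example and stand in for a real lexicon and parser.

```python
# Hypothetical lexicon and candidates; a real system would use a full
# lexicon with orthography and transcription, and a proper syntax check.
LEXICON = {"the", "train", "leaves", "at", "ten"}

def lexical_filter(words):
    """Exclude hypothesised words that do not exist in the language."""
    return [w for w in words if w in LEXICON]

def syntactic_filter(phrases, is_grammatical):
    """Exclude phrases whose syntax is not possible in the language."""
    return [p for p in phrases if is_grammatical(p)]

candidates = [["the", "train", "leaves"], ["train", "the", "leaves"], ["thr", "trian"]]
candidates = [lexical_filter(p) for p in candidates]
kept = syntactic_filter(candidates, lambda p: len(p) > 0 and p[0] == "the")
print(kept)  # -> [['the', 'train', 'leaves']]
```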
  • The intonation pattern of the speech model is a standardised intonation pattern for the language concerned, or an intonation pattern which has been established, by training or by explicit knowledge, using a number of dialects of the language concerned.
  • The prosody information, i.e. the fundamental tone curve, extracted from the input speech by the extraction unit 3 can be used to obtain dialectal, sentence accent and sentence stressing information for use by the speech-to-speech conversion system and method of the present invention.
  • The dialectal information can be used by the speech-to-speech conversion system and method to match the dialect of the output speech to that of the input speech, and the sentence accent and stressing information can be used in the recognition and interpretation of the input speech.
  • The means for obtaining dialectal information from the prosody information includes the first and second analysing means and the comparison means outlined earlier.
  • The time difference may be determined in relation to an intonation pattern reference point.
  • The difference, in terms of intonation pattern, between different dialects can be described by different points in time for word and sentence accent, i.e. the time difference can be determined in relation to an intonation pattern reference point, for example, the point at which a consonant/vowel boundary occurs.
  • The reference against which the time difference is measured is the point at which the consonant/vowel boundary, i.e. the CV-boundary, occurs.
  • The identified time difference which, as stated above, is indicative of the dialect of the input speech, i.e. the spoken language, is applied to the text-to-speech converter 7 to enable the intonation pattern, and thereby the dialect, of the speech output of the system to be corrected so that it corresponds to the intonation pattern of the corresponding words and/or phrases of the input speech.
  • This corrective process enables the dialectal information in the input speech to be incorporated into the output speech.
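The simplest conceivable form of this correction is a single uniform time shift of the synthesiser's fundamental tone curve by the measured difference. That is a strong simplification of the description above, which measures the difference per word and sentence accent relative to CV-boundaries, so treat the following as a sketch only.

```python
import numpy as np

def shift_f0(times, f0, offset_s):
    """Move the synthesised fundamental tone curve later in time by
    offset_s seconds so its extrema line up better with the input speech.
    A real implementation would warp locally, accent by accent, rather
    than apply one global shift."""
    return np.interp(times, times + offset_s, f0, left=f0[0], right=f0[-1])
```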
  • The fundamental tone curve of the speech model is based on information resulting from the lexical (orthography and transcription) and syntactic checks.
  • The transcription information includes lexically abstracted accent information, of type stressed syllables, i.e. tonal word accents I and II, and information relating to the location of the secondary accent, i.e. information given, for instance, in dictionaries. This information can be used to adjust the recognition pattern of the speech recognition model, for example, the Hidden Markov model, to take account of the transcription information. A more exact model of the input speech is, therefore, obtained during the interpretation process.
  • The speech model is compared with a spoken input sequence, and any difference therebetween can be determined and used to bring the speech model into conformity with the spoken sequence and/or to determine stresses in the spoken sequence.
  • Relative sentence stresses can be determined by classifying the ratio between the variations and the declination of the fundamental tone curve, whereby emphasised sections, or individual words, can be determined.
  • The pitch of the speech can be determined from the declination of the fundamental tone curve.
  • The extraction unit 3, in association with the interpretation unit 4, is adapted to determine the information outlined below.
  • Classification of the ratio between the variation and the declination of the fundamental tone curve makes it possible to identify/determine relative sentence stresses, and emphasised sections or words.
  • The relation between the variation and the declination of the fundamental tone curve can be utilised to determine the dynamic range of the fundamental tone curve.
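A compact way to see these quantities together is to fit the declination as a straight line and measure excursions around it. The linear-fit baseline and the 1.5 threshold on the variation are illustrative assumptions, not values from the patent.

```python
import numpy as np

def declination_analysis(times, f0, stress_factor=1.5):
    """Fit the declination (overall fall of F0 across the utterance) as a
    straight line, then flag frames whose excursion above that line is
    large, relative to the typical variation, as candidate stressed or
    emphasised sections. Returns (declination slope, dynamic range,
    times of candidate stresses)."""
    voiced = f0 > 0
    t, f = times[voiced], f0[voiced]
    slope, intercept = np.polyfit(t, f, 1)      # declination line
    variation = f - (slope * t + intercept)     # excursion around the line
    dynamic_range = variation.max() - variation.min()
    stressed_times = t[variation > stress_factor * variation.std()]
    return slope, dynamic_range, stressed_times
```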
  • The information obtained in respect of the fundamental tone curve, concerning dialect, sentence accent and stressing, can be used for the interpretation of speech by the interpretation unit 4, i.e. the information can be used, in the manner outlined above, to obtain a better understanding of the content of the input speech and to bring the intonation pattern of the speech model into conformity with the input speech.
  • Since the corrected speech model exhibits the language characteristics (including dialect information, sentence accent and stressing) of the input speech, it can be used to give an increased understanding of the input speech and can be effectively used by the database management unit 5 to obtain the required speech information data from the database 6 to formulate a response to a voice input to the speech-to-speech conversion system.
  • The ability to detect speech, irrespective of dialect variations, in accordance with the system and method of the present invention makes it possible to use speech in many different voice-responsive applications.
  • The system is, therefore, adapted to recognise and accurately interpret the content of speech inputs and to tailor the dialect of the voice response to match the dialect of the voice input.
  • This process provides a user friendly system because the language of the man-machine dialogue is in accordance with the dialect of the user concerned.

Claims (26)

  1. A speech-to-speech conversion system for producing, at the output thereof, spoken responses to speech inputs applied to the system, comprising speech recognition means for the input speech; interpretation means for interpreting the content of the recognised input speech; and a database containing speech information data for use in the formulation of the spoken responses, the output of the interpretation means being used to access the database and to obtain speech information data therefrom,
    characterised in that the system further comprises extraction means for extracting prosody information from the input speech; means for obtaining dialectal information from the prosody information; and text-to-speech conversion means for converting the speech information data obtained from the database into a spoken response using the dialectal information, the dialect of the spoken response being matched to that of the input speech, wherein the means for obtaining dialectal information from the prosody information comprise first analysing means for determining the intonation pattern of the fundamental tone of the input speech, and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; second analysing means for determining the intonation pattern of the fundamental tone curve of the speech model, and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; and comparison means for comparing the intonation pattern of the input speech with the intonation pattern of the speech model in order to identify the time difference between the occurrence of the maximum and minimum values of the fundamental tone curve of the input speech in relation to the maximum and minimum values of the fundamental tone curve of the speech model, the identified time difference being indicative of the dialectal characteristics of the input speech.
  2. A speech-to-speech conversion system according to claim 1,
    characterised in that the speech recognition means are adapted to identify a number of phonemes from a segment of the input speech and comprise interpretation means for interpreting the phonemes as possible words or word combinations in order to establish a speech model, the speech model having word and sentence accents according to a standardised pattern for the language of the input speech.
  3. A speech-to-speech conversion system according to claim 2,
    characterised in that the prosody information extracted from the input speech is the fundamental tone curve of the input speech.
  4. A speech-to-speech conversion system according to claim 3,
    characterised in that the time difference is determined in relation to an intonation pattern reference point.
  5. A speech-to-speech conversion system according to claim 4,
    characterised in that the intonation pattern reference point, relative to which the time difference is measured, is the point at which a consonant/vowel boundary occurs.
  6. A speech-to-speech conversion system according to any one of the preceding claims,
    characterised in that the system further comprises means for obtaining information on sentence accents from the prosody information.
  7. A speech-to-speech conversion system according to claim 6,
    characterised in that the speech recognition means comprise checking means for lexically checking the words in the speech model and for checking the syntax of the phrases in the speech model, the words and phrases which are not linguistically possible being excluded from the speech model, in which the checking means are adapted to check the orthographic and phonetic transcription of the words in the speech model, the transcription information comprising lexically abstracted accent information, information relating to the type of stressed syllables and information relating to the location of the secondary accent.
  8. A speech-to-speech conversion system according to claim 7,
    characterised in that the accent information relates to tonal word accent I and accent II.
  9. A speech-to-speech conversion system according to any one of claims 6 to 8,
    characterised in that the sentence accent information is used in the interpretation of the content of the recognised input speech.
  10. A speech-to-speech conversion system according to any one of the preceding claims,
    characterised in that sentence stresses are determined and used in the interpretation of the content of the recognised input speech.
  11. A speech-to-speech conversion system according to any one of the preceding claims,
    characterised in that the system further comprises dialogue management means for managing a dialogue with the database, the dialogue being initiated by the interpretation means.
  12. A speech-to-speech conversion system according to claim 11,
    characterised in that the dialogue with the database results in the application of speech information data to the text-to-speech conversion means.
  13. A speech-to-speech conversion system according to claim 10 or 11,
    characterised in that the dialogue with the database is conducted using SQL.
  14. A voice responsive communication system comprising a speech-to-speech conversion system according to any one of the preceding claims.
  15. A method for providing a spoken response to a speech input in a voice responsive communication system, the response having a dialect which is matched to that of the speech input, the method comprising the steps of recognising and interpreting the input speech and using the interpretation to obtain speech information data from a database for use in the formulation of the spoken response,
    characterised in that the method further comprises the steps of extracting prosody information from the input speech, obtaining dialectal information from the prosody information and converting the speech information data obtained from the database into the spoken response using the dialectal information, and the steps of determining the intonation pattern of the fundamental tone of the input speech, and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; determining the intonation pattern of the fundamental tone curve of the speech model, and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; and comparing the intonation pattern of the input speech with the intonation pattern of the speech model in order to identify the time difference between the occurrence of the maximum and minimum values of the fundamental tone curve of the input speech in relation to the maximum and minimum values of the fundamental tone curve of the speech model, the identified time difference being indicative of the dialectal characteristics of the input speech.
  16. A method according to claim 15,
    characterised in that the recognition and interpretation comprise the steps of identifying a number of phonemes from a segment of the input speech and interpreting the phonemes as possible words or word combinations in order to establish a speech model, the speech model having word and sentence accents according to a standardised pattern for the language of the input speech.
  17. A method according to claim 16,
    characterised in that the prosody information extracted from the input speech is the fundamental tone curve of the input speech.
  18. A method according to claim 15,
    characterised in that the time difference is determined in relation to an intonation pattern reference point.
  19. A method according to claim 18,
    characterised in that the intonation pattern reference point, against which the time difference is measured, is the point at which a consonant/vowel boundary occurs.
  20. A method according to any one of claims 15 to 19,
    characterised by the step of obtaining information on sentence accents from the prosody information.
  21. A method according to claim 20,
    characterised in that the words in the speech model are checked lexically, in that the phrases in the speech model are checked with regard to syntax, in that the words and phrases which are not linguistically possible are excluded from the speech model, in that the orthographic and phonetic transcription of the words in the speech model is checked, and in that the transcription information comprises lexically abstracted accent information of the type of stressed syllables and information relating to the location of the secondary accent.
  22. A method according to claim 21,
    characterised in that the accent information relates to tonal word accent I and accent II.
  23. A method according to any one of claims 20 to 22,
    characterised by the step of using the sentence accent information in the interpretation of the input speech.
  24. A method according to any one of claims 15 to 23,
    characterised by the step of initiating a dialogue with the database in order to obtain speech information data for formulating the spoken response, the dialogue being initiated following the interpretation of the input speech.
  25. A method according to claim 24,
    characterised in that the dialogue with the database results in the application of speech information data to the text-to-speech conversion means.
  26. A voice responsive communication system which is adapted to use a method as claimed in any one of claims 15 to 25 in order to produce a spoken response to a speech input to the system.
EP97919840A 1996-05-13 1997-04-08 Method and system for speech-to-speech conversion Expired - Lifetime EP0919052B1 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
SE9601811A SE9601811L (sv) 1996-05-13 1996-05-13 Method and system for speech-to-speech conversion with extraction of prosody information
SE9601811 1996-05-13
PCT/SE1997/000583 WO1997043756A1 (en) 1996-05-13 1997-04-08 A method and a system for speech-to-speech conversion

Publications (2)

Publication Number Publication Date
EP0919052A1 (de) 1999-06-02
EP0919052B1 (de) 2003-07-09 (granted)

Family

ID=20402543

Family Applications (1)

Application Number Title Priority Date Filing Date
EP97919840A Expired - Lifetime EP0919052B1 (de) 1996-05-13 1997-04-08 Method and system for speech-to-speech conversion

Country Status (6)

Country Link
EP (1) EP0919052B1 (de)
DE (1) DE69723449T2 (de)
DK (1) DK0919052T3 (de)
NO (1) NO318557B1 (de)
SE (1) SE9601811L (de)
WO (1) WO1997043756A1 (de)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1159702C (zh) * 2001-04-11 2004-07-28 International Business Machines Corporation Speech-to-speech translation system and method with emotion
US7181397B2 (en) 2005-04-29 2007-02-20 Motorola, Inc. Speech dialog method and system
DE102007011039B4 (de) * 2007-03-07 2019-08-29 Man Truck & Bus Ag Hands-free device in a motor vehicle
US8150020B1 (en) * 2007-04-04 2012-04-03 At&T Intellectual Property Ii, L.P. System and method for prompt modification based on caller hang ups in IVRs
US8024179B2 (en) * 2007-10-30 2011-09-20 At&T Intellectual Property Ii, L.P. System and method for improving interaction with a user through a dynamically alterable spoken dialog system
JP5282469B2 (ja) 2008-07-25 2013-09-04 Yamaha Corporation Audio processing device and program
EP3389043A4 (de) * 2015-12-07 2019-05-15 Yamaha Corporation Voice interaction device and voice interaction method
CN113470670A (zh) * 2021-06-30 2021-10-01 Guangzhou Ziyun Technology Co., Ltd. Method and system for fast switching of electronic voice pitch

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2165969B (en) * 1984-10-19 1988-07-06 British Telecomm Dialogue system
JPH0772840B2 (ja) * 1992-09-29 1995-08-02 IBM Japan, Ltd. Speech model construction method, speech recognition method, speech recognition apparatus, and speech model training method
SE9301596L (sv) * 1993-05-10 1994-05-24 Televerket Device for increasing speech comprehension when translating speech from a first language to a second language
SE504177C2 (sv) * 1994-06-29 1996-12-02 Telia Ab Method and device for adapting speech recognition equipment to dialectal variations in a language

Also Published As

Publication number Publication date
DE69723449T2 (de) 2004-04-22
SE506003C2 (sv) 1997-11-03
NO985179L (no) 1998-11-11
SE9601811L (sv) 1997-11-03
DE69723449D1 (de) 2003-08-14
EP0919052A1 (de) 1999-06-02
WO1997043756A1 (en) 1997-11-20
NO985179D0 (no) 1998-11-06
DK0919052T3 (da) 2003-11-03
SE9601811D0 (sv) 1996-05-13
NO318557B1 (no) 2005-04-11

Similar Documents

Publication Publication Date Title
US5752227A (en) Method and arrangement for speech to text conversion
US5806033A (en) Syllable duration and pitch variation to determine accents and stresses for speech recognition
JP4536323B2 (ja) 音声−音声生成システムおよび方法
US6910012B2 (en) Method and system for speech recognition using phonetically similar word alternatives
US7013276B2 (en) Method of assessing degree of acoustic confusability, and system therefor
AU2009249165B2 (en) Systems and methods of improving automated speech recognition accuracy using statistical analysis of search terms
US7191132B2 (en) Speech synthesis apparatus and method
EP0767950B1 (de) Verfahren und vorrichtung zur anpassung eines spracherkenners an dialektische sprachvarianten
JPH09500223A (ja) 多言語音声認識システム
GB2380380A (en) Speech synthesis method and apparatus
US5677992A (en) Method and arrangement in automatic extraction of prosodic information
EP0919052B1 (de) Verfahren und system zur sprache-in-sprache-umsetzung
Badino et al. Language independent phoneme mapping for foreign TTS
Kadambe et al. Language identification with phonological and lexical models
Chomphan et al. Tone correctness improvement in speaker-independent average-voice-based Thai speech synthesis
WO1997043707A1 (en) Improvements in, or relating to, speech-to-speech conversion
Chou et al. Automatic segmental and prosodic labeling of Mandarin speech database.
Alam et al. Development of annotated Bangla speech corpora
US11817079B1 (en) GAN-based speech synthesis model and training method
Gros et al. SI-PRON pronunciation lexicon: a new language resource for Slovenian
Wangchuk et al. Developing a Text to Speech System for Dzongkha
Potisuk et al. Using stress to disambiguate spoken Thai sentences containing syntactic ambiguity
Williams The segmentation and labelling of speech databases
KR0136423B1 (ko) 발음 제어 기호의 유효성 판정을 이용한 음운 변동 처리 방법
Martin et al. Cross Lingual Modelling Experiments for Indonesian

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 19981214

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): CH DE DK FI FR GB LI NL

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

RIC1 Information provided on ipc code assigned before grant

Free format text: 7G 10L 13/08 A, 7G 06F 3/16 B

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Designated state(s): CH DE DK FI FR GB LI NL

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20030709

Ref country code: LI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20030709

Ref country code: CH

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20030709

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REF Corresponds to:

Ref document number: 69723449

Country of ref document: DE

Date of ref document: 20030814

Kind code of ref document: P

REG Reference to a national code

Ref country code: DK

Ref legal event code: T3

NLV1 Nl: lapsed or annulled due to failure to fulfill the requirements of art. 29p and 29m of the patents act
REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

ET Fr: translation filed
PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed

Effective date: 20040414

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DK

Payment date: 20080411

Year of fee payment: 12

Ref country code: DE

Payment date: 20080418

Year of fee payment: 12

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FI

Payment date: 20080415

Year of fee payment: 12

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20080412

Year of fee payment: 12

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20080421

Year of fee payment: 12

REG Reference to a national code

Ref country code: DK

Ref legal event code: EBP

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20090408

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST

Effective date: 20091231

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20090408

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20091103

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20090408

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20091222

Ref country code: DK

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20090430