EP0919052B1 - Verfahren und System zur Sprache-in-Sprache-Umsetzung (Method and System for Speech-to-Speech Conversion) - Google Patents
- Publication number
- EP0919052B1 (application EP97919840A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- speech
- information
- input
- model
- fundamental tone
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L13/10—Prosody rules derived from text; Stress or intonation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/04—Details of speech synthesis systems, e.g. synthesiser structure or memory management
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/90—Pitch determination of speech signals
Definitions
- the invention relates to a speech-to-speech conversion system and method which are capable of matching the dialect of speech outputs to that of the respective speech inputs, and to a voice responsive communication system including a speech-to-speech conversion system and operating in accordance with a speech-to-speech conversion method.
- the speech information, which is stored in a database and used to provide appropriate synthesised spoken responses to voice inputs utilising a speech-to-speech conversion system, is normally reproduced in a dialect which conforms to a standard national dialect.
- it may be difficult for the database of known voice responsive communication systems to interpret received speech information, i.e. the voice inputs. It may also be difficult for the person making the voice inputs to fully understand the spoken response. Even if such responses are understandable to a recipient, it would be more user friendly if the dialect of the spoken response were the same as the dialect of the related voice input.
- a word can have widely different meanings depending on language stress.
- the meaning of one and the same sentence can be given a different significance depending on where the stress is placed.
- the stressing of sentences, or parts thereof, determines the sections which are emphasised in the language and which may be of importance in determining the precise meaning of the spoken language.
- document WO-A-96/00962 discloses a speech recognition system for recognizing dialectal variations in a language.
- In order to overcome these difficulties, it would be necessary for a voice responsive communication system to be capable of interpreting the received speech information, irrespective of dialect, and to match the dialect of speech outputs to that of the respective speech inputs. Also, in order to be able to determine the meaning of single words, or phrases, in an unambiguous manner in a spoken sequence, it would be necessary for the speech-to-speech converters used in a voice responsive communication system to be capable of determining, and taking account of, stresses in the spoken sequence.
- the invention as claimed in claims 1-26 provides a speech-to-speech conversion system for providing, at the output thereof, spoken responses to speech inputs to the system including speech recognition means for the input speech; interpretation means for interpreting the content of the recognised input speech; and a database containing speech information data for use in the formulation of said spoken responses, the output of said interpretation means being used to access said database and obtain speech information data therefrom, characterised in that the system further includes extraction means for extracting prosody information from the input speech; means for obtaining dialectal information from said prosody information; and text-to-speech conversion means for converting the speech information data obtained from said database into a spoken response using said dialectal information, the dialect of the spoken response being matched to that of the input speech.
- the speech recognition means may be adapted to identify a number of phonemes from a segment of the input speech and to interpret the phonemes, as possible words, or word combinations, to establish a model of the speech, the speech model having word and sentence accents according to a standardised pattern for the language of the input speech.
- the prosody information extracted from the input speech is preferably the fundamental tone curve of the input speech.
- the means for obtaining dialectal information from said prosody information includes first analysing means for determining the intonation pattern of the fundamental tone of the input speech and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; second analysing means for determining the intonation pattern of the fundamental tone curve of the speech model and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; and comparison means for comparing the intonation pattern of the input speech with the intonation pattern of the speech model to identify a time difference between the occurrence of the maximum and minimum values of the fundamental tone curve of the input speech and the maximum and minimum values of the fundamental tone curve of the speech model, the identified time difference being indicative of dialectal characteristics of the input speech.
- the time difference may be determined in relation to an intonation pattern reference point, for example, the point at which a consonant/vowel limit occurs.
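The comparison described above can be pictured with a short sketch. This is not the patent's implementation: the F0 contours, the 10 ms frame rate, and the single-maximum/single-minimum assumption are all invented for illustration, and a real analyser would work on pitch-tracked speech and handle several accents per phrase.

```python
def extremum_offsets(f0, times, cv_boundary):
    """Times of the F0 maximum and minimum, measured relative to the
    consonant/vowel (CV) boundary used as the intonation reference point."""
    i_max = max(range(len(f0)), key=lambda i: f0[i])
    i_min = min(range(len(f0)), key=lambda i: f0[i])
    return times[i_max] - cv_boundary, times[i_min] - cv_boundary

def dialect_time_difference(f0_input, f0_model, times, cv_boundary):
    """Time differences between the input-speech and model F0 extrema; a
    non-zero value is taken as indicative of a dialectal intonation pattern."""
    in_max, in_min = extremum_offsets(f0_input, times, cv_boundary)
    mo_max, mo_min = extremum_offsets(f0_model, times, cv_boundary)
    return in_max - mo_max, in_min - mo_min

# Toy contours sampled every 10 ms: the input peaks 20 ms later than the model.
times = [i * 0.01 for i in range(8)]
f0_model = [100, 120, 140, 130, 115, 105, 100, 95]   # peak at 20 ms
f0_input = [100, 105, 115, 130, 140, 120, 100, 90]   # peak at 40 ms
d_max, d_min = dialect_time_difference(f0_input, f0_model, times, cv_boundary=0.0)
```

Here `d_max` comes out at roughly +0.02 s, i.e. the input speaker places the fundamental-tone peak 20 ms later than the standardised model, which is the kind of shift the comparison means would report as a dialectal characteristic.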
- the speech-to-speech conversion system may include means for obtaining information on sentence accents from said prosody information.
- the speech recognition means includes checking means for lexically checking the words in the speech model and for syntactically checking the phrases in the speech model, the words and phrases which are not linguistically possible being excluded from the speech model.
- the checking means are, with this arrangement, adapted to check the orthography and phonetic transcription of the words in the speech model, the transcription information including lexically abstracted accent information, of type stressed syllables, and information relating to the location of secondary accent.
- the accent information may, for example, relate to tonal word accent I and accent II.
- the sentence accent information and/or sentence stressing may be used, to advantage, in the interpretation of the content of the recognised input speech.
- the speech-to-speech conversion system may include dialogue management means for managing a dialogue with the database, said dialogue being initiated by the interpretation means.
- the dialogue with the database results in the application of speech information data to the text-to-speech conversion means.
- the invention also provides, in a voice responsive communication system, a method for providing a spoken response to a speech input to the system, said response having a dialect to match that of the speech input, said method including the steps of recognising and interpreting the input speech, and utilising the interpretation to obtain speech information data from a database for use in the formulation of said spoken response, characterised in that said method further includes the steps of extracting prosody information from the input speech, obtaining dialectal information from said prosody information, and converting the speech information data obtained from said database into said spoken response using said dialectal information.
- the recognition and interpretation of the input speech includes the steps of identifying a number of phonemes from a segment of the input speech and interpreting the phonemes, as possible words, or word combinations, to establish a model of the speech, the speech model having word and sentence accents according to a standardised pattern for the language of the input speech.
- the prosody information extracted from the input speech is the fundamental tone curve of the input speech.
- the method according to the present invention includes the steps of determining the intonation pattern of the fundamental tone of the input speech and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; determining the intonation pattern of the fundamental tone curve of a speech model and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; and comparing the intonation pattern of the input speech with the intonation pattern of the speech model to identify a time difference between the occurrence of the maximum and minimum values of the fundamental tone curve of the input speech and the maximum and minimum values of the fundamental tone curve of the speech model, the identified time difference being indicative of dialectal characteristics of the input speech.
- the time difference may be determined in relation to an intonation pattern reference point, for example, the point at which a consonant/vowel limit occurs.
- the method may include the step of obtaining information on sentence accents from said prosody information.
- the words in the speech model are checked lexically and the phrases in the speech model are checked syntactically, the words and phrases which are not linguistically possible being excluded from the speech model.
- the orthography and phonetic transcription of the words in the speech model may be checked, the transcription information including lexically abstracted accent information, of type stressed syllables, and information relating to the location of secondary accent.
- the accent information may relate to tonal word accent I and accent II.
- sentence accent information and/or sentence stressing may be used in the interpretation of the content of the recognised input speech.
- the method according to the present invention may include the step of initiating a dialogue with the database to obtain speech information data for formulating said spoken response, said dialogue being initiated following the interpretation of the input speech.
- the dialogue with the database may result in the application of speech information data to text-to-speech conversion means.
- the invention further provides a voice responsive communication system which includes a speech-to-speech conversion system as outlined in the preceding paragraphs, or utilises a method as outlined in the preceding paragraphs for providing a spoken response to a speech input to the system.
- the characteristic features of the speech-to-speech conversion system and method according to the present invention are set out in the following paragraphs.
- a speech-to-speech conversion system includes, at the input 1 thereof, a speech recognition unit 2 and an extraction unit 3 for extracting prosody information from speech applied to the system input 1, i.e. the fundamental tone curve of the input speech.
- speech inputs, applied to the input 1 are simultaneously applied to the units 2 and 3.
- the output of the speech recognition unit 2 and an output of the extraction unit 3 are connected to separate inputs of an interpretation unit 4, the output of which is connected to a database management unit 5.
- the database management unit 5 which is adapted for two way communication with a database 6, is connected at the output thereof to the input of a text-to-speech converter 7.
- the dialogue between the database 6 and the database management unit 5 can be effected by any known database communication language, for example, SQL (Structured Query Language).
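As an illustration of such a dialogue, the sketch below uses SQLite with an invented `responses` table; the patent only specifies that any known database communication language, such as SQL, may be used, and fixes no schema.

```python
import sqlite3

# Invented schema and data: the database management unit 5 would hold
# speech information data keyed by the topic interpreted from the input.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (topic TEXT, text TEXT)")
conn.execute("INSERT INTO responses VALUES ('balance', 'Your balance is ...')")

def fetch_response_text(topic):
    """Map the interpreted topic of the voice input to stored speech
    information data, as the dialogue with database 6 might do."""
    row = conn.execute(
        "SELECT text FROM responses WHERE topic = ?", (topic,)
    ).fetchone()
    return row[0] if row else None
```

The retrieved text would then be handed to the text-to-speech converter 7 for synthesis in the caller's dialect.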
- the output of the text-to-speech converter 7 provides a synthesised speech output for the speech-to-speech conversion system.
- a further output of the extraction unit 3 is connected to the input of a prosody analyzer unit 8 which is adapted for two way communication with the text-to-speech converter 7.
- the prosody analyzer unit 8 is adapted, as a part of the text-to-speech conversion process of the converter 7, to analyze the prosody information, i.e. the fundamental tone curve, of the synthesised speech and make any necessary corrections to the intonation pattern of the synthesised speech in accordance with the dialectal information extracted from the input speech.
- the dialect of the synthesised speech output of the speech-to-speech conversion system will match that of the input speech.
- the present invention is adapted to provide a spoken response to a speech input to the speech-to-speech conversion system which has a dialect to match that of the speech input and that this conversion process includes the steps of recognising and interpreting the input speech, utilising the interpretation to obtain speech information data from a database for use in the formulation of the spoken response, extracting prosody information from the input speech, obtaining dialectal information from the prosody information, and converting the speech information data obtained from said database into the spoken response using the dialectal information.
- This will be outlined in the following paragraphs.
- the speech inputs to the speech-to-speech conversion system which may be in many forms, for example, requests for information on particular topics, such as banking or telephone services, or general enquiries concerning such services, are applied to the input 1 and thereby to the inputs of the units 2 and 3.
- the speech recognition unit 2 and interpretation unit 4 are adapted to operate, in a manner well known to persons skilled in the art, to recognise and interpret the speech inputs to the system.
- the speech recognition unit 2 may, for example, operate by using a Hidden Markov model, or an equivalent speech model.
- the function of the units 2 and 4 is to convert speech inputs to the system into a form which is a faithful representation of the content of the speech inputs and suitable for application to the input of the database management unit 5.
- the content of the textual information data at the output of the interpretation unit 4 must be an accurate representation of the speech input and be usable by the database management unit 5 to access, and extract speech information data from, the database 6 for use in the formulation of a synthesised spoken response to the speech input.
- this process would, in essence, be effected by identifying a number of phonemes from a segment of the input speech which are combined into allophone strings, the phonemes being interpreted as possible words, or word combinations, to establish a model of the speech.
- the established speech model will have word and sentence accents according to a standardised pattern for the language of the input speech.
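The interpretation of phoneme strings as candidate words can be pictured as a pronunciation-lexicon lookup. The sketch below is purely illustrative: the lexicon entries are invented, and a real recogniser scores many competing hypotheses rather than performing an exact lookup.

```python
# Hypothetical pronunciation lexicon mapping phoneme strings to the
# possible words they may represent (invented example entries).
PRON_LEXICON = {
    ("t", "uː"): ["two", "too", "to"],
    ("f", "ɔː"): ["four", "for"],
}

def words_for_phonemes(phonemes):
    """Return the candidate words for one phoneme string, or an empty
    list when the string matches nothing in the language."""
    return PRON_LEXICON.get(tuple(phonemes), [])
```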
- the information, concerning the recognised words and word combinations, generated by the speech recognition unit 2 may, in practice, be checked both lexically (using a lexicon, with orthography and transcription) and syntactically.
- the purpose of these checks is to identify and exclude any words which do not exist in the language concerned, and/or any phrase whose syntax does not correspond with the language concerned.
- the speech recognition unit 2 ensures that only those words, and word combinations, which are found to be acceptable both lexically and syntactically, are used to create a model of the input speech.
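A minimal sketch of the lexical filtering step, with an invented lexicon and word hypotheses; an actual system would also apply the syntactic check to whole phrases before admitting them to the speech model.

```python
# Invented lexicon standing in for the orthography/transcription lexicon
# consulted by the speech recognition unit 2.
LEXICON = {"the", "train", "leaves", "at", "ten", "rain"}

def lexical_filter(candidates):
    """Exclude word hypotheses that do not exist in the language."""
    return [w for w in candidates if w in LEXICON]

hypotheses = ["train", "trane", "rain"]
accepted = lexical_filter(hypotheses)  # "trane" is not a word and is dropped
```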
- the intonation pattern of the speech model is a standardised intonation pattern for the language concerned, or an intonation pattern which has been established by training, or explicit knowledge, using a number of dialects of the language concerned.
- the prosody information, i.e. the fundamental tone curve, extracted from the input speech by the extraction unit 3 can be used to obtain dialectal, sentence-accent and sentence-stressing information for use by the speech-to-speech conversion system and method of the present invention.
- the dialectal information can be used by the speech-to-speech conversion system and method to match the dialect of the output speech to that of the input speech and the sentence accent and stressing information can be used in the recognition and interpretation of the input speech.
- the means for obtaining dialectal information from the prosody information comprises the first and second analysing means and the comparison means described above; the time difference identified by the comparison may be determined in relation to an intonation pattern reference point.
- the difference, in terms of intonation pattern, between different dialects can be described by different points in time for word and sentence accent, i.e. the time difference can be determined in relation to an intonation pattern reference point, for example, the point at which a consonant/vowel limit occurs.
- the reference against which the time difference is measured is the point at which the consonant/vowel boundary, i.e. the CV-boundary, occurs.
- the identified time difference which, as stated above, is indicative of the dialect in the input speech, i.e. the spoken language, is applied to the text-to-speech converter 7 to enable the intonation pattern, and thereby the dialect, of the speech output of the system to be corrected so that it corresponds to the intonation pattern of the corresponding words and/or phrase of the input speech.
- this corrective process enables the dialectal information in the input speech to be incorporated into the output speech.
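One crude way to picture this correction is to displace the synthesiser's F0 contour by the measured time difference so that its extrema fall where the speaker's dialect places them. The whole-frame shift below is an invented simplification; the converter 7 would warp the contour locally rather than shifting it wholesale.

```python
def shift_f0_peak(f0, frames_shift):
    """Delay (positive shift) or advance (negative shift) an F0 contour by a
    whole number of frames, padding with the edge values, so the peak lands
    at the dialectally correct position."""
    n = len(f0)
    if frames_shift >= 0:
        return [f0[0]] * frames_shift + f0[: n - frames_shift]
    return f0[-frames_shift:] + [f0[-1]] * (-frames_shift)

f0_model = [100, 120, 140, 130, 115, 105, 100, 95]   # model peak at frame 2
corrected = shift_f0_peak(f0_model, 2)               # dialect peak 2 frames later
```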
- the fundamental tone curve of the speech model is based on information resulting from the lexical (orthography and transcription) and syntactic checks.
- the transcription information includes lexically abstracted accent information, of type stressed syllables, i.e. tonal word accents I and II, and information relating to the location of secondary accent, i.e. information given, for instance, in dictionaries. This information can be used to adjust the recognition pattern of the speech recognition model, for example, the Hidden Markov model, to take account of the transcription information. A more exact model of the input speech is, therefore, obtained during the interpretation process.
- the speech model is compared with a spoken input sequence, and any difference there between can be determined and used to bring the speech model into conformity with the spoken sequence and/or to determine stresses in the spoken sequence.
- relative sentence stresses can be determined by classifying the ratio between variations and declination of the fundamental tone curve, whereby emphasised sections, or individual words can be determined.
- the pitch of the speech can be determined from the declination of the fundamental tone curve.
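A hedged sketch of this classification: fit a straight line to the F0 contour as the declination, then flag frames whose excursion above that line is large relative to the average variation. The threshold and the contour are invented; the patent does not specify a particular classifier.

```python
def declination_line(f0):
    """Least-squares straight line through the F0 contour; its downward
    slope models the gradual fall (declination) of the fundamental tone."""
    n = len(f0)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(f0) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, f0)) / \
            sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return [intercept + slope * x for x in xs]

def emphasised_frames(f0, threshold=1.5):
    """Frames whose excursion above the declination line exceeds `threshold`
    times the mean absolute deviation -- a crude stand-in for classifying the
    ratio between variation and declination of the fundamental tone curve."""
    base = declination_line(f0)
    dev = [y - b for y, b in zip(f0, base)]
    mad = sum(abs(d) for d in dev) / len(dev)
    return [i for i, d in enumerate(dev) if d > threshold * mad]
```

On a falling contour with one prominent excursion, such as `[120, 118, 116, 140, 112, 110, 108, 106]`, only the excursion frame is flagged, marking the stressed section.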
- the extraction unit 3, in association with the interpretation unit 4, is adapted to determine the following:
- classification of the ratio between the variation and declination of the fundamental tone curve makes it possible to identify/determine relative sentence stresses, and emphasised sections, or words.
- the relation between the variation and declination of the fundamental tone curve can be utilised to determine the dynamic range of the fundamental tone curve.
- the information obtained in respect of the fundamental tone curve concerning dialect, sentence accent and stressing can be used for the interpretation of speech by the interpretation unit 4, i.e. the information can be used, in the manner outlined above, to obtain a better understanding of the content of the input speech and bring the intonation pattern of the speech model into conformity with the input speech.
- since the corrected speech model exhibits the language characteristics (including dialect information, sentence accent and stressing) of the input speech, it can be used to give an increased understanding of the input speech and can be used effectively by the database management unit 5 to obtain the required speech information data from the database 6 to formulate a response to a voice input to the speech-to-speech conversion system.
- the ability to detect speech, irrespective of dialect variations, in accordance with the system and method of the present invention makes it possible to use speech in many different voice-responsive applications.
- the system is, therefore, adapted to recognise and accurately interpret the content of speech inputs and to tailor the dialect of the voice response to match the dialect of the voice input.
- This process provides a user friendly system because the language of the man-machine dialogue is in accordance with the dialect of the user concerned.
Claims (26)
- A speech-to-speech conversion system for providing, at the output thereof, spoken responses to speech inputs to the system, including speech recognition means for the input speech; interpretation means for interpreting the content of the recognised input speech; and a database containing speech information data for use in the formulation of said spoken responses, the output of said interpretation means being used to access said database and obtain speech information data therefrom, characterised in that the system further includes extraction means for extracting prosody information from the input speech; means for obtaining dialectal information from said prosody information; and text-to-speech conversion means for converting the speech information data obtained from said database into a spoken response using said dialectal information, the dialect of the spoken response being matched to that of the input speech, wherein the means for obtaining dialectal information from the prosody information comprise first analysing means for determining the intonation pattern of the fundamental tone of the input speech and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; second analysing means for determining the intonation pattern of the fundamental tone curve of the speech model and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; and comparison means for comparing the intonation pattern of the input speech with the intonation pattern of the speech model to identify the time difference between the occurrence of the maximum and minimum values of the fundamental tone curve of the input speech and the maximum and minimum values of the fundamental tone curve of the speech model, the identified time difference being indicative of the dialectal characteristics of the input speech.
- A speech-to-speech conversion system as claimed in claim 1, characterised in that the speech recognition means are adapted to identify a number of phonemes from a segment of the input speech and include interpretation means for interpreting the phonemes as possible words, or word combinations, to establish a speech model, the speech model having word and sentence accents according to a standardised pattern for the language of the input speech.
- A speech-to-speech conversion system as claimed in claim 2, characterised in that the prosody information extracted from the input speech is the fundamental tone curve of the input speech.
- A speech-to-speech conversion system as claimed in claim 3, characterised in that the time difference is determined in relation to an intonation pattern reference point.
- A speech-to-speech conversion system as claimed in claim 4, characterised in that the intonation pattern reference point, against which the time difference is measured, is the point at which a consonant/vowel boundary occurs.
- A speech-to-speech conversion system as claimed in any one of the preceding claims, characterised in that the system further includes means for obtaining information on sentence accents from the prosody information.
- A speech-to-speech conversion system as claimed in claim 6, characterised in that the speech recognition means include checking means for lexically checking the words in the speech model and for syntactically checking the phrases in the speech model, the words and phrases which are not linguistically possible being excluded from the speech model, the checking means being adapted to check the orthography and phonetic transcription of the words in the speech model, the transcription information including lexically abstracted accent information, information relating to the type of stressed syllables, and information relating to the location of the secondary accent.
- A speech-to-speech conversion system as claimed in claim 7, characterised in that the accent information relates to tonal word accent I and accent II.
- A speech-to-speech conversion system as claimed in any one of claims 6 to 8, characterised in that the sentence accent information is used in the interpretation of the content of the recognised input speech.
- A speech-to-speech conversion system as claimed in any one of the preceding claims, characterised in that sentence stresses are determined and used in the interpretation of the content of the recognised input speech.
- A speech-to-speech conversion system as claimed in any one of the preceding claims, characterised in that the system further includes dialogue management means for managing a dialogue with the database, the dialogue being initiated by the interpretation means.
- A speech-to-speech conversion system as claimed in claim 11, characterised in that the dialogue with the database results in the application of speech information data to the text-to-speech conversion means.
- A speech-to-speech conversion system as claimed in claim 10 or 11, characterised in that the dialogue with the database is effected using SQL.
- A voice responsive communication system including a speech-to-speech conversion system as claimed in any one of the preceding claims.
- A method for providing, in a voice responsive communication system, a spoken response to a speech input to the system, the response having a dialect matched to that of the speech input, the method including the steps of recognising and interpreting the input speech and utilising the interpretation to obtain speech information data from a database for use in the formulation of the spoken response, characterised in that the method further includes the steps of extracting prosody information from the input speech, obtaining dialectal information from the prosody information, and converting the speech information data obtained from the database into the spoken response using the dialectal information, and the steps of determining the intonation pattern of the fundamental tone of the input speech and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; determining the intonation pattern of the fundamental tone curve of the speech model and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; and comparing the intonation pattern of the input speech with the intonation pattern of the speech model to identify the time difference between the occurrence of the maximum and minimum values of the fundamental tone curve of the input speech and the maximum and minimum values of the fundamental tone curve of the speech model, the identified time difference being indicative of the dialectal characteristics of the input speech.
- A method as claimed in claim 15, characterised in that the recognition and interpretation include the steps of identifying a number of phonemes from a segment of the input speech and interpreting the phonemes as possible words, or word combinations, to establish a speech model, the speech model having word and sentence accents according to a standardised pattern for the language of the input speech.
- A method as claimed in claim 16, characterised in that the prosody information extracted from the input speech is the fundamental tone curve of the input speech.
- A method as claimed in claim 15, characterised in that the time difference is determined in relation to an intonation pattern reference point.
- A method as claimed in claim 18, characterised in that the intonation pattern reference point, against which the time difference is measured, is the point at which a consonant/vowel boundary occurs.
- A method as claimed in any one of claims 15 to 19, characterised by the step of obtaining information on sentence accents from the prosody information.
- A method as claimed in claim 20, characterised in that the words in the speech model are checked lexically, the phrases in the speech model are checked syntactically, the words and phrases which are not linguistically possible are excluded from the speech model, the orthography and phonetic transcription of the words in the speech model are checked, and the transcription information includes lexically abstracted accent information relating to the type of stressed syllables and information relating to the location of the secondary accent.
- A method as claimed in claim 21, characterised in that the accent information relates to tonal word accent I and accent II.
- A method as claimed in any one of claims 20 to 22, characterised by the step of using the sentence accent information in the interpretation of the input speech.
- A method as claimed in any one of claims 15 to 23, characterised by the step of initiating a dialogue with the database to obtain speech information data for formulating the spoken response, the dialogue being initiated following the interpretation of the input speech.
- A method as claimed in claim 24, characterised in that the dialogue with the database results in the application of speech information data to the text-to-speech conversion means.
- A voice responsive communication system adapted to use a method as claimed in any one of claims 15 to 25 to provide a spoken response to a speech input to the system.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
SE9601811A SE9601811L (sv) | 1996-05-13 | 1996-05-13 | Metod och system för tal-till-tal-omvandling med extrahering av prosodiinformation |
SE9601811 | 1996-05-13 | ||
PCT/SE1997/000583 WO1997043756A1 (en) | 1996-05-13 | 1997-04-08 | A method and a system for speech-to-speech conversion |
Publications (2)
Publication Number | Publication Date |
---|---|
EP0919052A1 EP0919052A1 (de) | 1999-06-02 |
EP0919052B1 true EP0919052B1 (de) | 2003-07-09 |
Family
ID=20402543
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP97919840A Expired - Lifetime EP0919052B1 (de) | 1996-05-13 | 1997-04-08 | Method and system for speech-to-speech conversion |
Country Status (6)
Country | Link |
---|---|
EP (1) | EP0919052B1 (de) |
DE (1) | DE69723449T2 (de) |
DK (1) | DK0919052T3 (de) |
NO (1) | NO318557B1 (de) |
SE (1) | SE9601811L (de) |
WO (1) | WO1997043756A1 (de) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1159702C (zh) * | 2001-04-11 | 2004-07-28 | International Business Machines Corporation | Speech-to-speech translation system and method with emotion |
US7181397B2 (en) | 2005-04-29 | 2007-02-20 | Motorola, Inc. | Speech dialog method and system |
DE102007011039B4 (de) * | 2007-03-07 | 2019-08-29 | Man Truck & Bus Ag | Hands-free device in a motor vehicle |
US8150020B1 (en) * | 2007-04-04 | 2012-04-03 | At&T Intellectual Property Ii, L.P. | System and method for prompt modification based on caller hang ups in IVRs |
US8024179B2 (en) * | 2007-10-30 | 2011-09-20 | At&T Intellectual Property Ii, L.P. | System and method for improving interaction with a user through a dynamically alterable spoken dialog system |
JP5282469B2 (ja) | 2008-07-25 | 2013-09-04 | Yamaha Corporation | Sound processing device and program |
EP3389043A4 (de) * | 2015-12-07 | 2019-05-15 | Yamaha Corporation | Speech interaction device and speech interaction method |
CN113470670A (zh) * | 2021-06-30 | 2021-10-01 | Guangzhou Ziyun Technology Co., Ltd. | Method and system for fast switching of electronic voice pitch |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2165969B (en) * | 1984-10-19 | 1988-07-06 | British Telecomm | Dialogue system |
JPH0772840B2 (ja) * | 1992-09-29 | 1995-08-02 | IBM Japan, Ltd. | Speech model construction method, speech recognition method, speech recognition device, and speech model training method |
SE9301596L (sv) * | 1993-05-10 | 1994-05-24 | Televerket | Device for increasing speech comprehension in translation of speech from a first language to a second language |
SE504177C2 (sv) * | 1994-06-29 | 1996-12-02 | Telia Ab | Method and device for adapting speech recognition equipment to dialectal variations in a language |
-
1996
- 1996-05-13 SE SE9601811A patent/SE9601811L/xx unknown
-
1997
- 1997-04-08 WO PCT/SE1997/000583 patent/WO1997043756A1/en active IP Right Grant
- 1997-04-08 DE DE69723449T patent/DE69723449T2/de not_active Expired - Fee Related
- 1997-04-08 EP EP97919840A patent/EP0919052B1/de not_active Expired - Lifetime
- 1997-04-08 DK DK97919840T patent/DK0919052T3/da active
-
1998
- 1998-11-06 NO NO19985179A patent/NO318557B1/no unknown
Also Published As
Publication number | Publication date |
---|---|
DE69723449T2 (de) | 2004-04-22 |
SE506003C2 (sv) | 1997-11-03 |
NO985179L (no) | 1998-11-11 |
SE9601811L (sv) | 1997-11-03 |
DE69723449D1 (de) | 2003-08-14 |
EP0919052A1 (de) | 1999-06-02 |
WO1997043756A1 (en) | 1997-11-20 |
NO985179D0 (no) | 1998-11-06 |
DK0919052T3 (da) | 2003-11-03 |
SE9601811D0 (sv) | 1996-05-13 |
NO318557B1 (no) | 2005-04-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5752227A (en) | Method and arrangement for speech to text conversion | |
US5806033A (en) | Syllable duration and pitch variation to determine accents and stresses for speech recognition | |
JP4536323B2 (ja) | Speech-to-speech generation system and method | |
US6910012B2 (en) | Method and system for speech recognition using phonetically similar word alternatives | |
US7013276B2 (en) | Method of assessing degree of acoustic confusability, and system therefor | |
AU2009249165B2 (en) | Systems and methods of improving automated speech recognition accuracy using statistical analysis of search terms | |
US7191132B2 (en) | Speech synthesis apparatus and method | |
EP0767950B1 (de) | Method and device for adapting a speech recognizer to dialectal speech variants | |
JPH09500223A (ja) | Multilingual speech recognition system | |
GB2380380A (en) | Speech synthesis method and apparatus | |
US5677992A (en) | Method and arrangement in automatic extraction of prosodic information | |
EP0919052B1 (de) | Method and system for speech-to-speech conversion | |
Badino et al. | Language independent phoneme mapping for foreign TTS | |
Kadambe et al. | Language identification with phonological and lexical models | |
Chomphan et al. | Tone correctness improvement in speaker-independent average-voice-based Thai speech synthesis | |
WO1997043707A1 (en) | Improvements in, or relating to, speech-to-speech conversion | |
Chou et al. | Automatic segmental and prosodic labeling of Mandarin speech database. | |
Alam et al. | Development of annotated Bangla speech corpora | |
US11817079B1 (en) | GAN-based speech synthesis model and training method | |
Gros et al. | SI-PRON pronunciation lexicon: a new language resource for Slovenian | |
Wangchuk et al. | Developing a Text to Speech System for Dzongkha | |
Potisuk et al. | Using stress to disambiguate spoken Thai sentences containing syntactic ambiguity | |
Williams | The segmentation and labelling of speech databases | |
KR0136423B1 (ko) | Phonological variation processing method using validity determination of pronunciation control symbols | |
Martin et al. | Cross Lingual Modelling Experiments for Indonesian |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 19981214 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): CH DE DK FI FR GB LI NL |
|
GRAH | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOS IGRA |
|
RIC1 | Information provided on ipc code assigned before grant |
Free format text: 7G 10L 13/08 A, 7G 06F 3/16 B |
|
GRAH | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOS IGRA |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
AK | Designated contracting states |
Designated state(s): CH DE DK FI FR GB LI NL |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: NL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20030709 Ref country code: LI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20030709 Ref country code: CH Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20030709 |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: EP |
|
REF | Corresponds to: |
Ref document number: 69723449 Country of ref document: DE Date of ref document: 20030814 Kind code of ref document: P |
|
REG | Reference to a national code |
Ref country code: DK Ref legal event code: T3 |
|
NLV1 | Nl: lapsed or annulled due to failure to fulfill the requirements of art. 29p and 29m of the patents act | ||
REG | Reference to a national code |
Ref country code: CH Ref legal event code: PL |
|
ET | Fr: translation filed | ||
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
26N | No opposition filed |
Effective date: 20040414 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: DK Payment date: 20080411 Year of fee payment: 12 Ref country code: DE Payment date: 20080418 Year of fee payment: 12 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: FI Payment date: 20080415 Year of fee payment: 12 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: FR Payment date: 20080412 Year of fee payment: 12 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: GB Payment date: 20080421 Year of fee payment: 12 |
|
REG | Reference to a national code |
Ref country code: DK Ref legal event code: EBP |
|
GBPC | Gb: european patent ceased through non-payment of renewal fee |
Effective date: 20090408 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: ST Effective date: 20091231 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: FI Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20090408 Ref country code: DE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20091103 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: GB Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20090408 Ref country code: FR Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20091222 Ref country code: DK Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20090430 |