DE69723449T2 - METHOD AND SYSTEM FOR SPEECH-TO-SPEECH CONVERSION

Info

Publication number: DE69723449T2
Application number: DE69723449T
Authority: DE
Inventors: Bertil Lyberg
Original assignee: Telia AB
Current assignee: Telia AB
Priority date: 1996-05-13
Filing date: 1997-04-08
Publication date: 2004-04-22
Anticipated expiration: 2017-04-09
Also published as: EP0919052B1; DE69723449D1; SE506003C2; SE9601811L; WO1997043756A1; NO985179L; SE9601811D0; NO985179D0; EP0919052A1; NO318557B1; DK0919052T3

Description

The invention relates to a speech-to-speech conversion system and method capable of matching the dialect of speech outputs to that of the received speech inputs, and to a voice-responsive communication system that includes such a speech-to-speech conversion system and operates in accordance with such a speech-to-speech conversion method.

In known voice-responsive communication systems, the speech information that is stored in a database and used to provide appropriate synthesized spoken responses to speech inputs, by means of a speech-to-speech conversion system, is normally reproduced in a dialect conforming to a national standard dialect. Where there are significant differences between the dialect of the speech inputs and the national standard dialect, it may in certain circumstances be difficult for the database of known voice-responsive communication systems to interpret the received speech information, i.e. the speech inputs. It may also be difficult for the person making the speech inputs to fully understand the spoken response. Even if such responses are understandable to a recipient, it would be more user-friendly if the dialect of the spoken response were the same as the dialect of the related speech input.

Also, in the artificial reproduction of a spoken language, the language must be reproduced naturally and with the correct accentuation. In particular, a word can have widely different meanings depending on the stress. Likewise, one and the same sentence can be given a considerably different significance depending on where the stress is placed. Furthermore, the stressing of sentences, or parts thereof, determines which sections are emphasized in the language and may be important in determining the precise meaning of the spoken language.

The need for artificially produced speech to be as natural as possible and to have the correct stress is of particular importance in voice-responsive communication devices and/or systems that produce speech in various contexts. With known voice-responsive arrangements, the reproduced speech is difficult to understand and interpret. There is therefore a need for a speech-to-speech conversion system in which the artificial speech outputs are natural, have the correct stress, and are readily understandable.

In languages having well-developed sentence accent stress and/or pitch in individual words, identifying the natural meaning of words and sentences is very difficult. The fact that stresses can be incorrectly placed increases the risk of misinterpretation, or of the meaning being completely lost to the listening party.

Various types of speech recognition systems are known. In such systems it is usual for the speech recognition equipment to be trained to recognize speech from a large number of persons. The speech training also follows a particular dialect or dialects. The information collected by this process is then used by the system to interpret incoming speech. Such a system therefore cannot normally understand dialectal variations of speech that lie outside the particular dialect(s) for which it has been trained.

As an example, document WO-A-96-00962 discloses a speech recognition system for recognizing dialectal variations in a language.

In languages having tonal word accents, and in tone languages, the intonation pattern of the speech is a very important part of understanding it, yet known systems take no account of these speech characteristics. As a consequence, the recognition of words and phrases by known speech recognition systems can give rise to misinterpretations. The known speech recognition systems that are adapted to take account of dialects in speech are tailored to one specific dialect and are therefore not adapted to recognize different dialects in a language.

In the future, speech recognition equipment will be used to an ever-increasing extent in a great many different applications that require the ability to recognize different dialects in a language. Dialectal variations in a language have, in the past, been difficult to determine, and as a consequence difficulties have been experienced in obtaining a correct understanding of artificially produced speech. Furthermore, the known speech recognition equipment cannot generally be used with different languages.

Although known speech recognition systems are adapted, through training, to recognize a particular dialect in a language, such systems cannot recognize different dialects in that language, or dialects in different languages, using the same speech recognition equipment without further training.

The artificial interpretation of speech has therefore been difficult, if not impossible, to carry out with known speech recognition equipment, owing to the inability of such systems to recognize dialectal variations.

Apart from the technical problems of interpreting speech correctly, speech recognition and voice control systems require verbal instructions or commands to be correctly interpreted; otherwise it would not be possible to provide appropriate responses or to properly control different types of equipment.

To overcome these difficulties, a voice-responsive communication system would need to be able to interpret the received speech information irrespective of dialect and to match the dialect of the speech outputs to that of the corresponding speech inputs. Also, in order to understand unambiguously the meaning of single words or phrases in a spoken sentence, the speech-to-speech converters used in a voice-responsive communication system would need to be able to determine, and take account of, stresses in the spoken sequence.

It is an object of the present invention to provide a system and method for speech-to-speech conversion that are capable of matching the dialect of the speech outputs to that of the corresponding speech inputs.

It is a further object of the present invention to provide a system and method for speech-to-speech conversion that are adapted to recognize and interpret speech inputs, and in particular the dialect, sentence accent and stress of spoken sequences, using the fundamental tone curve of the spoken sequence.

It is a further object of the present invention to provide a voice-responsive communication system including a speech-to-speech conversion system that is capable of matching the dialect of speech outputs to that of the corresponding speech inputs.

The invention, as claimed in claims 1 to 26, provides a speech-to-speech conversion system for providing, at its output, spoken responses to speech inputs to the system, including speech recognition means for the input speech; interpretation means for interpreting the content of the recognized input speech; and a database containing speech information data for use in formulating the spoken responses, the output of the interpretation means being used to access the database and obtain speech information data from it, characterized in that the system further includes extraction means for extracting prosody information from the input speech; means for obtaining dialect information from the prosody information; and text-to-speech conversion means for converting the speech information data obtained from the database into a spoken response using the dialect information, the dialect of the spoken response being matched to that of the input speech.

The speech recognition means may be adapted to identify a number of phonemes from a segment of the input speech and to interpret the phonemes as possible words, or word combinations, in order to establish a model of the speech, the speech model having word and sentence accents according to a standardized pattern for the language of the input speech.
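The step of interpreting a phoneme sequence as possible words or word combinations can be illustrated with a minimal sketch. It is not taken from the patent: the function name `segmentations`, the toy lexicon and the phoneme symbols are all assumptions chosen for illustration, and a real recognizer would work with scored hypotheses rather than exact matches.

```python
# Hypothetical toy lexicon: phoneme tuples mapped to candidate words.
# The phoneme symbols are simplified placeholders, not a real transcription standard.
LEXICON = {
    ("ay",): ["I"],
    ("ay", "s"): ["ice"],
    ("s", "k", "r", "iy", "m"): ["scream"],
    ("k", "r", "iy", "m"): ["cream"],
}


def segmentations(phonemes, lexicon):
    """Return every way of splitting the phoneme sequence into lexicon words."""
    if not phonemes:
        return [[]]  # one way to segment nothing: the empty word list
    results = []
    for end in range(1, len(phonemes) + 1):
        prefix = tuple(phonemes[:end])
        for word in lexicon.get(prefix, []):
            # Extend each segmentation of the remainder with this word.
            for rest in segmentations(phonemes[end:], lexicon):
                results.append([word] + rest)
    return results


candidates = segmentations(["ay", "s", "k", "r", "iy", "m"], LEXICON)
```

Each returned word list is one candidate word combination for the speech model; deciding between such candidates is where the accent and stress information described later would come in.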

The prosody information extracted from the input speech is preferably the fundamental tone curve of the input speech. In this case, the means for obtaining dialect information from the prosody information include first analyzing means for determining the intonation pattern of the fundamental tone of the input speech and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; second analyzing means for determining the intonation pattern of the fundamental tone curve of the speech model and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; and comparison means for comparing the intonation pattern of the input speech with the intonation pattern of the speech model to identify a time difference between the occurrence of the maximum and minimum values of the fundamental tone curve of the incoming speech and the maximum and minimum values of the fundamental tone curve of the speech model, the identified time difference being indicative of dialectal characteristics of the input speech. The time difference may be determined relative to an intonation pattern reference point, e.g. the point at which a consonant/vowel boundary occurs.
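The comparison of extrema in the two fundamental tone curves can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the names `find_extrema` and `timing_offsets` are hypothetical, the curves are assumed to be sampled as plain lists, and extrema of the same kind are simply paired in order of occurrence.

```python
from dataclasses import dataclass


@dataclass
class Extremum:
    kind: str     # "max" or "min"
    time: float   # position on the time axis, in seconds
    value: float  # fundamental tone value, in Hz


def find_extrema(times, f0):
    """Return the local maxima and minima of a sampled fundamental tone curve."""
    extrema = []
    for i in range(1, len(f0) - 1):
        if f0[i] > f0[i - 1] and f0[i] >= f0[i + 1]:
            extrema.append(Extremum("max", times[i], f0[i]))
        elif f0[i] < f0[i - 1] and f0[i] <= f0[i + 1]:
            extrema.append(Extremum("min", times[i], f0[i]))
    return extrema


def timing_offsets(input_extrema, model_extrema, input_ref=0.0, model_ref=0.0):
    """Time difference of each input extremum against the matching model extremum.

    Times are measured relative to a reference point in each curve
    (e.g. a consonant/vowel boundary). Pairing extrema of the same kind
    in order of occurrence is an assumption of this sketch.
    """
    offsets = []
    for kind in ("max", "min"):
        inp = [e for e in input_extrema if e.kind == kind]
        mod = [e for e in model_extrema if e.kind == kind]
        for a, b in zip(inp, mod):
            offsets.append((kind, (a.time - input_ref) - (b.time - model_ref)))
    return offsets


times = [0.0, 0.1, 0.2, 0.3, 0.4]
model_f0 = [100.0, 120.0, 140.0, 120.0, 100.0]  # model peak at 0.2 s
input_f0 = [100.0, 110.0, 120.0, 140.0, 110.0]  # input peak at 0.3 s
offsets = timing_offsets(find_extrema(times, input_f0), find_extrema(times, model_f0))
```

In this toy data the input peak occurs about 0.1 s later than the model peak. How such a delay maps to a particular dialect is not specified here and would need dialect-specific reference data.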

The speech-to-speech conversion system may include means for obtaining information on sentence accents from the prosody information. In this case, the speech recognition means include checking means for lexically checking the words in the speech model and for syntactically checking the phrases in the speech model, the words and phrases that are not linguistically possible being excluded from the speech model. In this arrangement, the checking means are adapted to check the orthography and phonetic transcription of the words in the speech model, the transcription information including lexically abstracted accent information, of the type stressed syllables, and information relating to the location of the secondary accent. The accent information may, for example, relate to tonal word accent I and accent II.
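A minimal sketch of the lexical and syntactic checks might look like the following. Everything here is an assumption for illustration: the `LexEntry` fields, the toy lexicon, the placeholder transcriptions and the tiny set of allowed part-of-speech patterns are hypothetical stand-ins for a real lexicon and grammar.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LexEntry:
    orthography: str
    transcription: str               # placeholder phonetic transcription
    stressed_syllable: int           # index of the primary-stressed syllable
    secondary_accent: Optional[int]  # index of the secondary accent, if any
    tonal_accent: Optional[str]      # "I", "II", or None when left unspecified
    pos: str                         # part of speech, used by the syntax check


# Toy lexicon. "anden" appears twice because the two tonal word accents
# distinguish two different words with the same spelling.
LEXICON = {
    "anden": [
        LexEntry("anden", "an-den", 0, None, "I", "noun"),
        LexEntry("anden", "an-den", 0, 1, "II", "noun"),
    ],
    "simmar": [LexEntry("simmar", "sim-mar", 0, None, None, "verb")],
}

# Hypothetical grammar: only noun followed by verb is accepted.
ALLOWED_PATTERNS = {("noun", "verb")}


def filter_model(phrases, lexicon, allowed_patterns):
    """Drop phrases that fail the lexical or the syntactic check."""
    kept = []
    for phrase in phrases:
        # Lexical check: every word must exist in the lexicon.
        if not all(word in lexicon for word in phrase):
            continue
        # Syntactic check: the part-of-speech sequence must be allowed.
        pos_sequence = tuple(lexicon[word][0].pos for word in phrase)
        if pos_sequence in allowed_patterns:
            kept.append(phrase)
    return kept


hypotheses = [["anden", "simmar"], ["simmar", "anden"], ["andxn", "simmar"]]
kept = filter_model(hypotheses, LEXICON, ALLOWED_PATTERNS)
```

The accent fields are carried along but not used by the filter itself; in the arrangement described above they would be available to later stages that compare the lexical accent pattern with the measured prosody.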

The sentence accent information and/or the sentence stress may advantageously be used in interpreting the content of the recognized input speech.

The speech-to-speech conversion system may include dialogue management means for managing a dialogue with the database, the dialogue being initiated by the interpretation means. In a preferred embodiment, the dialogue with the database results in the application of speech information data to the text-to-speech conversion means.

The invention also provides, in a voice-responsive communication system, a method for providing a spoken response to a speech input to the system, the response having a dialect matched to that of the speech input, the method including the steps of recognizing and interpreting the input speech and using the interpretation to obtain speech information data from a database for use in formulating the spoken response, characterized in that the method further includes the steps of extracting prosody information from the input speech, obtaining dialect information from the prosody information, and converting the speech information data obtained from the database into the spoken response using the dialect information.

The recognition and interpretation of the input speech include the steps of identifying a number of phonemes from a segment of the input speech and interpreting the phonemes as possible words, or word combinations, in order to establish a model of the speech, the speech model having word and sentence accents according to a standardized pattern for the language of the input speech.

In a preferred method, the prosody information extracted from the input speech is the fundamental tone curve of the input speech. In this case, the method of the present invention includes the steps of determining the intonation pattern of the fundamental tone of the input speech and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; determining the intonation pattern of the fundamental tone curve of a speech model and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; and comparing the intonation pattern of the input speech with the intonation pattern of the speech model to identify a time difference between the occurrence of the maximum and minimum values of the fundamental tone curve of the incoming speech and the maximum and minimum values of the fundamental tone curve of the speech model, the identified time difference being indicative of dialectal characteristics of the input speech. The time difference may be determined relative to an intonation pattern reference point, e.g. the point at which a consonant/vowel boundary occurs.

The method of the present invention may include the step of obtaining information on sentence accents from the prosody information. In accordance with this method, the words in the speech model are checked lexically and the phrases in the speech model are checked syntactically, the words and phrases that are not linguistically possible being excluded from the speech model. Also in accordance with this method, the orthography and phonetic transcription of the words in the speech model may be checked, the transcription information including lexically abstracted accent information, of the type stressed syllables, and information relating to the location of the secondary accent. The accent information may relate to tonal word accent I and accent II.

In accordance with the method of the present invention, sentence accent information and/or sentence stress may be used in interpreting the content of the recognized input speech.

The method of the present invention may include the step of initiating a dialogue with a database to obtain speech information data for formulating the spoken response, the dialogue being initiated after interpretation of the input speech. The dialogue with the database may result in the application of the speech information data to text-to-speech conversion means.

The invention further provides a voice-responsive communication system that includes a speech-to-speech conversion system as outlined in the preceding paragraphs, or that uses a method as outlined in the preceding paragraphs, for providing a spoken response to speech input to the system. In essence, the characteristic features of the speech-to-speech conversion system and method according to the present invention are the following:

- prosody information is extracted from the speech applied to the input of the system and processed by the method;
- the prosody information takes the form of the fundamental tone curve of the input speech;
- the fundamental tone curve is used to obtain dialect, sentence accent and sentence stress information for the input speech;
- the sentence accent and sentence stress information is used in interpreting the input speech, the result of the interpretation being used to obtain, from a database, speech information data for use in formulating voice responses to the speech inputs; and
- the dialect information is used to ensure that the voice responses to the speech inputs have a dialect matched to that of the speech inputs.

The foregoing and other features of the present invention will be better understood from the following description with reference to the single figure of the accompanying drawings, which shows, in the form of a block diagram, a speech-to-speech conversion system according to the invention.

It will be seen from the single figure of the accompanying drawings that a speech-to-speech conversion system according to the invention includes, at its input 1, a speech recognition unit 2 and an extraction unit 3 for extracting prosody information, i.e. the fundamental tone curve, from the speech applied to the system input 1. Speech inputs applied to input 1 are thus applied simultaneously to units 2 and 3.

The output of the speech recognition unit 2 and one output of the extraction unit 3 are connected to separate inputs of the interpretation unit 4, whose output is connected to a database management unit 5. The database management unit 5, which is adapted for two-way communication with a database 6, is connected at its output to the input of a text-to-speech converter 7. The dialogue between the database 6 and the database management unit 5 can be carried out in any known database communication language, e.g. SQL (Structured Query Language). The output of the text-to-speech converter 7 provides a synthesized speech output for the speech-to-speech conversion system.
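As an illustration of the dialogue between the database management unit 5 and the database 6, the following minimal sketch uses Python's built-in sqlite3 module. The table name `responses`, its columns, the topic keys and the reply texts are hypothetical and do not come from the patent, which states only that any known database communication language such as SQL may be used.

```python
import sqlite3
from typing import Optional


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical table of reply texts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE responses (topic TEXT PRIMARY KEY, reply TEXT)")
    conn.executemany(
        "INSERT INTO responses (topic, reply) VALUES (?, ?)",
        [
            ("balance", "Your balance is available at any branch."),
            ("opening_hours", "We are open on weekdays."),
        ],
    )
    return conn


def fetch_reply(conn: sqlite3.Connection, topic: str) -> Optional[str]:
    """Return the reply text for an interpreted topic, or None if none is stored."""
    row = conn.execute(
        "SELECT reply FROM responses WHERE topic = ?", (topic,)
    ).fetchone()
    return row[0] if row else None
```

In this sketch the `topic` argument stands in for the output of the interpretation unit 4, and the returned text stands in for the speech information data passed on to the text-to-speech converter 7.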

As shown in the single figure of the drawings, a further output of the extraction unit 3 is connected to the input of a prosody analysis unit 8, which is adapted for two-way communication with the text-to-speech converter 7. The prosody analysis unit 8 is adapted, as part of the text-to-speech conversion process of the converter 7, to analyze the prosody information, i.e. the fundamental tone curve, of the synthesized speech and to make any necessary corrections to the intonation pattern of the synthesized speech in accordance with the dialect information extracted from the input speech. The dialect of the synthesized speech output of the speech-to-speech conversion system will therefore be matched to that of the input speech.

It will therefore be seen from the foregoing that the present invention is adapted to provide a spoken response to speech input to the speech-to-speech conversion system, the response having a dialect matched to that of the input speech, and that this conversion process comprises the steps of recognizing and interpreting the input speech, using the interpretation to obtain speech information data from a database for use in formulating the spoken response, extracting prosody information from the input speech, obtaining dialect information from the prosody information, and converting the speech information data obtained from the database into the spoken response using the dialect information. The manner in which this can be effected is set out in the following paragraphs.

In practice, the speech inputs to the speech-to-speech conversion system, which may take many forms, e.g. requests for information on particular topics such as banking or telephone services, or general inquiries concerning such services, are applied to input 1 and thereby to the inputs of units 2 and 3.

The speech recognition unit 2 and the interpretation unit 4 are adapted to recognize and interpret the speech inputs to the system in a manner well known to those skilled in the art. The speech recognition unit 2 may, for example, operate using a Hidden Markov model or an equivalent speech model. In essence, the function of units 2 and 4 is to convert speech inputs to the system into a form that is a faithful representation of the content of the speech inputs and is suitable for application to the input of the database management unit 5. In other words, the content of the text information data at the output of the interpretation unit 4 must be an accurate representation of the speech input and must be usable by the database management unit 5 to access the database 6 and extract speech information data from it for use in formulating the synthesized spoken response to the speech input. In practice, this process would essentially be effected by identifying a number of phonemes from a segment of the input speech, which are combined into allophone strings, the phonemes being interpreted as possible words or word combinations in order to establish a model of the speech. The established speech model will have word and sentence accents according to a standardized pattern for the language of the input speech.

The information concerning the recognized words and word combinations generated by the speech recognition unit 2 can, in practice, be checked both lexically (using a lexicon with orthography and transcription) and syntactically. The purpose of these checks is to identify and exclude any words that do not exist in the language concerned, and/or any phrases whose syntax does not conform to that language.
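The two checks described above can be sketched as a simple filter over recognition hypotheses. This is a minimal illustration only: the toy lexicon, the part-of-speech tags and the set of permitted tag sequences are hypothetical stand-ins for a real lexicon and grammar, and the patent does not prescribe how either check is implemented.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon: word -> part-of-speech tag.
LEXICON: Dict[str, str] = {
    "check": "VERB",
    "my": "DET",
    "the": "DET",
    "balance": "NOUN",
}

# Hypothetical toy grammar: the tag sequences accepted as syntactically valid.
ALLOWED_PATTERNS: Set[Tuple[str, ...]] = {("VERB", "DET", "NOUN")}


def is_acceptable(hypothesis: List[str]) -> bool:
    """Accept a word sequence only if it passes both checks."""
    tags = []
    for word in hypothesis:
        tag = LEXICON.get(word)
        if tag is None:  # lexical check: the word is not in the lexicon
            return False
        tags.append(tag)
    # syntactic check: the tag sequence must be a permitted pattern
    return tuple(tags) in ALLOWED_PATTERNS


def filter_hypotheses(hypotheses: List[List[str]]) -> List[List[str]]:
    """Keep only hypotheses that are lexically and syntactically acceptable."""
    return [h for h in hypotheses if is_acceptable(h)]
```

Only hypotheses that survive both checks would then contribute to the speech model.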

In accordance with the present invention, the speech recognition unit 2 thus ensures that only those words and word combinations found to be both lexically and syntactically acceptable are used to create a model of the input speech. In practice, the intonation pattern of the speech model is a standardized intonation pattern for the language concerned, or an intonation pattern established by training, or by explicit knowledge, using a number of dialects of the language concerned.

The prosody information, i.e. the fundamental tone curve, extracted from the input speech by the extraction unit 3 can be used to obtain dialect information, sentence accent information and sentence stress information for use by the speech-to-speech conversion system and method of the present invention. In particular, the dialect information can be used by the speech-to-speech conversion system and method to match the dialect of the output speech to that of the input speech, and the sentence accent and stress information can be used in recognizing and interpreting the input speech.

In accordance with the present invention, the means for obtaining dialect information from the prosody information include:

- first analysis means for determining the intonation pattern of the fundamental tone of the input speech and thereby the maximum and minimum values of the fundamental tone curve and their respective positions;
- second analysis means for determining the intonation pattern of the fundamental tone curve of the speech model and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; and
- comparison means for comparing the intonation pattern of the input speech with the intonation pattern of the speech model in order to identify a time difference between the occurrence of the maximum and minimum values of the fundamental tone curve of the input speech and the maximum and minimum values of the fundamental tone curve of the speech model, the identified time difference being indicative of the dialect characteristics of the input speech.

The time difference referred to above can be determined in relation to an intonation pattern reference point.

In the Swedish language, the difference in intonation pattern between different dialects can be described by different points in time for word and sentence accent, i.e. the time difference can be determined in relation to an intonation pattern reference point, e.g. the point at which the consonant/vowel boundary occurs.

In a preferred embodiment of the present invention, the reference against which the time difference is measured is the point at which the consonant/vowel boundary, i.e. the CV boundary, occurs.
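The comparison described above can be illustrated with a short sketch. It assumes, purely for illustration, that each fundamental tone curve is available as a list of sampled values on a common time axis in milliseconds, that a single global maximum and minimum per segment are of interest, and that the CV boundary time of each curve is already known; none of these representations is specified in the patent, and the function names are hypothetical.

```python
from typing import List, Tuple


def extrema_times(times_ms: List[int], f0_hz: List[float]) -> Tuple[int, int]:
    """Return (time of the maximum, time of the minimum) of a sampled F0 curve."""
    i_max = max(range(len(f0_hz)), key=f0_hz.__getitem__)
    i_min = min(range(len(f0_hz)), key=f0_hz.__getitem__)
    return times_ms[i_max], times_ms[i_min]


def timing_differences(
    times_ms: List[int],
    f0_input: List[float],
    cv_input_ms: int,
    f0_model: List[float],
    cv_model_ms: int,
) -> Tuple[int, int]:
    """Time difference (input minus model) of the F0 maximum and minimum.

    Each extremum time is first expressed relative to the CV boundary of its
    own curve, so the result does not depend on where the segment starts.
    """
    in_max, in_min = extrema_times(times_ms, f0_input)
    mo_max, mo_min = extrema_times(times_ms, f0_model)
    d_max = (in_max - cv_input_ms) - (mo_max - cv_model_ms)
    d_min = (in_min - cv_input_ms) - (mo_min - cv_model_ms)
    return d_max, d_min
```

A positive first value means the peak of the input speech occurs later, relative to the CV boundary, than the peak of the speech model; in the scheme described here such an offset would serve as the dialect indicator.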

The identified time difference, which, as mentioned above, is indicative of the dialect of the input speech, i.e. the spoken language, is applied to the text-to-speech converter 7 so that the intonation pattern, and thereby the dialect, of the speech output of the system can be corrected to correspond to the intonation pattern of the corresponding words and/or phrases of the input speech. This correction process therefore allows the dialect information in the input speech to be incorporated into the output speech.

As stated above, the fundamental tone curve of the speech model is based on information resulting from the lexical (orthography and transcription) and syntactic checks. In addition, the transcription information includes lexically abstracted accent information of the stressed-syllable type, i.e. tonal word accents I and II, and information relating to the location of the secondary accent, i.e. information of the kind given, for example, in dictionaries. This information can be used to adjust the recognition pattern of the speech recognition model, e.g. the Hidden Markov model, to take account of the transcription information. A more exact model of the input speech is therefore obtained during the interpretation process.

A further consequence of this speech model correction process is that, over time, the speech model will have a stress pattern established by a training process.

With the system and method of the present invention, the speech model is compared with a spoken input sequence, and any difference between them can be determined and used to bring the speech model into conformity with the spoken sequence and/or to determine stresses in the spoken sequence.

Identifying the stresses in a spoken sequence makes it possible to determine the precise meaning of the spoken sequence unambiguously. In particular, relative sentence stresses can be determined by classifying the ratio between the variation and the declination of the fundamental tone curve, whereby emphasized sections or individual words can be determined. In addition, the pitch of the speech can be determined from the declination of the fundamental tone curve.

In order to take sentence stresses into account in the recognition and interpretation of the speech inputs to the speech-to-speech conversion system of the present invention, the extraction unit 3, in conjunction with the interpretation unit 4, is therefore adapted to:

- determine a first ratio between the variation and the declination of the fundamental tone curve of the input speech;
- determine a second ratio between the variation and the declination of the fundamental tone curve of the speech model; and
- compare the first and second ratios, any identified difference being used to determine the placement of sentence accents.
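The ratio comparison in the steps above can be sketched as follows. The patent does not define how variation and declination are measured, so this sketch makes two labeled assumptions: declination is taken as the total fall of a least-squares line fitted to the fundamental tone curve, and variation as the spread of the curve around that line. The function names are hypothetical.

```python
from typing import List, Tuple


def fit_line(times: List[float], f0: List[float]) -> Tuple[float, float]:
    """Least-squares fit f0 ~ slope * t + intercept; returns (slope, intercept)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_f = sum(f0) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in zip(times, f0))
    slope = sxy / sxx
    return slope, mean_f - slope * mean_t


def variation_declination_ratio(times: List[float], f0: List[float]) -> float:
    """Ratio of F0 variation to F0 declination under the assumptions above."""
    slope, intercept = fit_line(times, f0)
    residuals = [f - (slope * t + intercept) for t, f in zip(times, f0)]
    variation = max(residuals) - min(residuals)   # assumed definition
    declination = abs(slope) * (times[-1] - times[0])  # assumed definition
    if declination == 0:
        raise ValueError("flat contour: declination is zero")
    return variation / declination
```

Under these assumptions, an input contour whose ratio clearly exceeds that of the speech model contains a local excursion that the standardized pattern does not predict, which is the kind of difference the comparison step would use to locate a sentence accent.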

In addition, classifying the ratio between the variation and the declination of the fundamental tone curve makes it possible to identify relative sentence stresses and emphasized sections or words.

The relationship between the variation and the declination of the fundamental tone curve can also be used to determine the dynamic range of the fundamental tone curve.

The information on dialect, sentence accent and stress obtained from the fundamental tone curve can be used by the interpretation unit 4 in interpreting the speech, i.e. the information can be used, in the manner described above, to obtain a better understanding of the content of the input speech and to bring the intonation pattern of the speech model into conformity with the input speech.

Since the corrected speech model exhibits the speech characteristics (including dialect information, sentence accent and stress) of the input speech, it can be used to obtain an improved understanding of the input speech and can be used effectively by the database management unit 5 to obtain the required speech information data from the database 6 in order to formulate a response to a speech input to the speech-to-speech conversion system.

The ability to readily interpret different dialects of a language using fundamental tone curve information is of some significance, because such interpretations can be effected without having to train the speech recognition system. As a result, the size, and thereby the cost, of a speech recognition system made in accordance with the invention can be much lower than is possible with known systems. There are therefore distinct advantages over known speech recognition systems.

The ability to detect speech irrespective of dialect variations, in accordance with the system and method of the present invention, also makes it possible to use speech in many voice-responsive applications. The system is therefore adapted to recognize and accurately interpret the content of speech inputs and to tailor the dialect of the spoken response to match the dialect of the input speech. This method provides a user-friendly system, because the language of the human-machine dialogue matches the dialect of the user concerned.

The present invention is not limited to the embodiments described above, but may be modified within the scope of the appended claims.

Claims

A speech-to-speech conversion system for providing, at its output, spoken responses to speech inputs applied to the system, comprising speech recognition means for the speech input; interpretation means for interpreting the content of the recognized speech input; and a database containing speech information data for use in formulating the spoken responses, the output of the interpretation means being used to access the database and obtain speech information data from it, characterized in that the system further comprises extraction means for extracting prosody information from the speech input; means for obtaining dialect information from the prosody information; and text-to-speech conversion means for converting the speech information data obtained from the database into a spoken response using the dialect information, the dialect of the spoken response being matched to that of the speech input, wherein the means for obtaining the dialect information from the prosody information comprise first analysis means for determining the intonation pattern of the fundamental tone of the speech input and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; second analysis means for determining the intonation pattern of the fundamental tone curve of the speech model and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; and comparison means for comparing the intonation pattern of the speech input with the intonation pattern of the speech model in order to identify the time difference between the occurrence of the maximum and minimum values of the fundamental tone curve of the speech input and the maximum and minimum values of the fundamental tone curve of the speech model, the identified time difference being indicative of the dialect characteristics of the speech input.

A speech-to-speech conversion system according to claim 1, characterized in that the speech recognition means are adapted to identify a number of phonemes from a segment of the speech input and include interpretation means for interpreting the phonemes as possible words or word combinations in order to establish a speech model, the speech model having word and sentence accents according to a standardized pattern for the language of the speech input.

A speech-to-speech conversion system according to claim 2, characterized in that the prosody information extracted from the speech input is the fundamental tone curve of the speech input.

A speech-to-speech conversion system according to claim 3, characterized in that the time difference is determined in relation to an intonation pattern reference point.

A speech-to-speech conversion system according to claim 4, characterized in that the intonation pattern reference point against which the time difference is measured is the point at which a consonant/vowel boundary occurs.

A speech-to-speech conversion system according to any one of the preceding claims, characterized in that the system further comprises means for obtaining sentence accent information from the prosody information.

A speech-to-speech conversion system according to claim 6, characterized in that the speech recognition means include checking means for lexically checking the words in the speech model and for syntactically checking the phrases in the speech model, words and phrases that are not linguistically possible being excluded from the speech model; in that the checking means are adapted to check the orthography and phonetic transcription of the words in the speech model; and in that the transcription information includes lexically abstracted accent information of the stressed-syllable type and information relating to the location of the secondary accent.

A speech-to-speech conversion system according to claim 7, characterized in that the accent information relates to tonal word accent I and accent II.

A speech-to-speech conversion system according to any one of claims 6 to 8, characterized in that the sentence accent information is used in interpreting the content of the recognized speech input.

A speech-to-speech conversion system according to any one of the preceding claims, characterized in that sentence stresses are determined and used in interpreting the content of the recognized speech input.

A speech-to-speech conversion system according to any one of the preceding claims, characterized in that the system further comprises dialogue management means for managing a dialogue with the database, the dialogue being initiated by the interpretation means.

A speech-to-speech conversion system according to claim 11, characterized in that the dialogue with the database results in the speech information data being applied to the text-to-speech conversion means.

A speech-to-speech conversion system according to claim 10 or 11, characterized in that the dialogue with the database is conducted using SQL.

A voice-responsive communication system including a speech-to-speech conversion system according to any one of the preceding claims.

A method of providing a spoken response to a speech input in a voice-responsive communication system, the response having a dialect matched to that of the speech input, the method comprising the steps of recognizing and interpreting the speech input and using the interpretation to obtain speech information data from a database for use in formulating the spoken response, characterized in that the method further comprises the steps of extracting prosody information from the speech input, obtaining dialect information from the prosody information, and converting the speech information data obtained from the database into the spoken response using the dialect information, and by the steps of determining the intonation pattern of the fundamental tone of the speech input and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; determining the intonation pattern of the fundamental tone curve of the speech model and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; and comparing the intonation pattern of the speech input with the intonation pattern of the speech model in order to identify the time difference between the occurrence of the maximum and minimum values of the fundamental tone curve of the speech input and the maximum and minimum values of the fundamental tone curve of the speech model, the identified time difference being indicative of the dialect characteristics of the speech input.

A method according to claim 15, characterized in that the recognizing and interpreting steps comprise identifying a number of phonemes from a segment of the speech input and interpreting the phonemes as possible words or word combinations in order to build a speech model, the speech model having word and sentence accents according to a standardized pattern for the language of the speech input.

A method according to claim 16, characterized in that the prosody information extracted from the speech input is the fundamental tone curve of the speech input.

A method according to claim 15, characterized in that the time difference is determined in relation to an intonation pattern reference point.

Method according to claim 18, characterized in that the intonation pattern reference point, against which the time difference is measured, is the point at which a consonant/vowel boundary occurs.
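Measuring against a consonant/vowel boundary can be illustrated with a short sketch. It assumes that the time of an F0 extremum and the time of the relevant consonant/vowel boundary are already known, in seconds, for both the speech input and the speech model; the function names and the example values are hypothetical.

```python
def offset_from_reference(extremum_time_s: float, cv_boundary_time_s: float) -> float:
    """Time of an F0 extremum measured from the consonant/vowel boundary."""
    return extremum_time_s - cv_boundary_time_s


def dialect_shift(
    spoken_extremum_s: float,
    spoken_cv_s: float,
    model_extremum_s: float,
    model_cv_s: float,
) -> float:
    """Difference between the boundary-relative extremum times of the
    speech input and the speech model (positive: later in the speech input)."""
    return (
        offset_from_reference(spoken_extremum_s, spoken_cv_s)
        - offset_from_reference(model_extremum_s, model_cv_s)
    )


if __name__ == "__main__":
    # Illustrative values only: peak 120 ms after the boundary in the input,
    # 50 ms after the boundary in the model.
    print(dialect_shift(0.42, 0.30, 0.35, 0.30))
```

Anchoring both measurements to the same kind of segmental landmark makes the comparison independent of where the utterance starts in each recording.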

Method according to one of claims 15 to 19, characterized by the step of obtaining information regarding the sentence accents from the prosody information.

A method according to claim 20, characterized in that the words in the speech model are checked lexically, that the phrases in the speech model are checked syntactically, that words and phrases which are not linguistically possible are excluded from the speech model, that the orthography and phonetic transcription of the words in the speech model are checked, and that the transcription information comprises lexically abstracted accent information on the type of stressed syllables and information regarding the location of the secondary accent.

A method according to claim 21, characterized in that the accent information relates to tonal word accent I and accent II.

Method according to one of claims 20 to 22, characterized by the step of using the sentence accent information in the interpretation of the speech input.

Method according to one of claims 15 to 23, characterized by initiating a dialogue with the database to obtain speech information data for formulating the spoken response, the dialogue being initiated following the interpretation of the speech input.

A method according to claim 24, characterized in that the dialogue with the database results in the application of speech information data to the text-to-speech conversion means.

Voice-responsive communication system designed so that it can use a method as claimed in any one of claims 15 to 25 to create a spoken response to a speech input to the system.