EP0919052B1 - Method and system for speech-to-speech conversion
- Publication number
- EP0919052B1 (application EP97919840A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- speech
- information
- input
- model
- fundamental tone
- Prior art date: 1996-05-13
- Legal status: Expired - Lifetime
Classifications
- G: PHYSICS
- G10: MUSICAL INSTRUMENTS; ACOUSTICS
- G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00: Speech synthesis; Text to speech systems
- G10L13/02: Methods for producing synthetic speech; Speech synthesisers
- G10L13/033: Voice editing, e.g. manipulating the voice of the synthesiser
- G10L13/04: Details of speech synthesis systems, e.g. synthesiser structure or memory management
- G10L13/08: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L13/10: Prosody rules derived from text; Stress or intonation
- G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/90: Pitch determination of speech signals
Definitions
- the invention relates to a speech-to-speech conversion system and method which are capable of matching the dialect of speech outputs to that of the respective speech inputs, and to a voice responsive communication system including a speech-to-speech conversion system and operating in accordance with a speech-to-speech conversion method.
- the speech information which is stored in a database and used to provide appropriate synthesised spoken responses to voice inputs utilising a speech-to-speech conversion system, is normally reproduced in a dialect which conforms to a standard national dialect.
- it may be difficult for the database of known voice responsive communication systems to interpret received speech information, i.e. the voice inputs, and it may also be difficult for the person making the voice inputs to fully understand the spoken response. Even if such responses are understandable to a recipient, it would be more user friendly if the dialect of the spoken response were the same as the dialect of the related voice input.
- a word can have widely different meanings depending on language stress.
- similarly, one and the same sentence can be given a different significance depending on where the stress is placed.
- the stressing of sentences, or parts thereof, determines sections which are emphasised in the language and which may be of importance in determining the precise meaning of the spoken language.
- document WO-A-96/00962 discloses a speech recognition system for recognizing dialectal variations in a language.
- in order to overcome these difficulties, it would be necessary for a voice responsive communication system to be capable of interpreting the received speech information, irrespective of dialect, and of matching the dialect of speech outputs to that of the respective speech inputs. Also, in order to be able to determine the meaning of single words, or phrases, in an unambiguous manner in a spoken sequence, it would be necessary for the speech-to-speech converters used in a voice responsive communication system to be capable of determining, and taking account of, stresses in the spoken sequence.
- the invention as claimed in claims 1-26 provides a speech-to-speech conversion system for providing, at the output thereof, spoken responses to speech inputs to the system including speech recognition means for the input speech; interpretation means for interpreting the content of the recognised input speech; and a database containing speech information data for use in the formulation of said spoken responses, the output of said interpretation means being used to access said database and obtain speech information data therefrom, characterised in that the system further includes extraction means for extracting prosody information from the input speech; means for obtaining dialectal information from said prosody information; and text-to-speech conversion means for converting the speech information data obtained from said database into a spoken response using said dialectal information, the dialect of the spoken response being matched to that of the input speech.
- the speech recognition means may be adapted to identify a number of phonemes from a segment of the input speech and to interpret the phonemes, as possible words, or word combinations, to establish a model of the speech, the speech model having word and sentence accents according to a standardised pattern for the language of the input speech.
- the prosody information extracted from the input speech is preferably the fundamental tone curve of the input speech.
- the means for obtaining dialectal information from said prosody information includes first analysing means for determining the intonation pattern of the fundamental tone of the input speech and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; second analysing means for determining the intonation pattern of the fundamental tone curve of the speech model and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; comparison means for comparing the intonation pattern of the input speech with the intonation pattern of the speech model to identify a time difference between the occurrence of the maximum and minimum values of the fundamental tone curves of the incoming speech in relation to the maximum and minimum values of the fundamental tone curve of the speech model, the identified time difference being indicative of dialectal characteristics of the input speech.
- the time difference may be determined in relation to an intonation pattern reference point, for example, the point at which a consonant/vowel limit occurs.
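By way of illustration only (the patent does not specify an implementation), the analysis and comparison just described might be sketched as follows, assuming the fundamental tone curve is available as an array of F0 values at a fixed frame rate with NaN marking unvoiced frames; all function names and the CV-boundary inputs are invented for this sketch:

```python
import numpy as np

def intonation_extrema(f0, frame_rate_hz):
    """Locate the maximum and minimum of a fundamental tone (F0) curve.
    Returns ((t_max, f0_max), (t_min, f0_min)); NaN (unvoiced) frames ignored."""
    i_max = int(np.nanargmax(f0))
    i_min = int(np.nanargmin(f0))
    return ((i_max / frame_rate_hz, f0[i_max]),
            (i_min / frame_rate_hz, f0[i_min]))

def dialect_time_offsets(f0_input, f0_model, frame_rate_hz,
                         cv_input_s, cv_model_s):
    """Time differences between the F0 extrema of the input speech and those
    of the standardised speech model, each measured relative to its own
    consonant/vowel (CV) boundary, the intonation-pattern reference point."""
    (t_max_in, _), (t_min_in, _) = intonation_extrema(f0_input, frame_rate_hz)
    (t_max_mo, _), (t_min_mo, _) = intonation_extrema(f0_model, frame_rate_hz)
    return {
        "max_offset_s": (t_max_in - cv_input_s) - (t_max_mo - cv_model_s),
        "min_offset_s": (t_min_in - cv_input_s) - (t_min_mo - cv_model_s),
    }
```

A positive offset would then mean that the corresponding extreme value occurs later in the input dialect than in the standardised pattern.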
- the speech-to-speech conversion system may include means for obtaining information on sentence accents from said prosody information.
- the speech recognition means includes checking means for lexically checking the words in the speech model and for syntactically checking the phrases in the speech model, the words and phrases which are not linguistically possible being excluded from the speech model.
- the checking means are, with this arrangement, adapted to check the orthography and phonetic transcription of the words in the speech model, the transcription information including lexically abstracted accent information, of type stressed syllables, and information relating to the location of secondary accent.
- the accent information may, for example, relate to tonal word accent I and accent II.
- the sentence accent information and/or sentence stressing may be used, to advantage, in the interpretation of the content of the recognised input speech.
- the speech-to-speech conversion system may include dialogue management means for managing a dialogue with the database, said dialogue being initiated by the interpretation means.
- the dialogue with the database results in the application of speech information data to the text-to-speech conversion means.
- the invention also provides, in a voice responsive communication system, a method for providing a spoken response to a speech input to the system, said response having a dialect to match that of the speech input, said method including the steps of recognising and interpreting the input speech, and utilising the interpretation to obtain speech information data from a database for use in the formulation of said spoken response, characterised in that said method further includes the steps of extracting prosody information from the input speech, obtaining dialectal information from said prosody information, and converting the speech information data obtained from said database into said spoken response using said dialectal information.
- the recognition and interpretation of the input speech includes the steps of identifying a number of phonemes from a segment of the input speech and interpreting the phonemes, as possible words, or word combinations, to establish a model of the speech, the speech model having word and sentence accents according to a standardised pattern for the language of the input speech.
- the prosody information extracted from the input speech is the fundamental tone curve of the input speech.
- the method according to the present invention includes the steps of determining the intonation pattern of the fundamental tone of the input speech and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; determining the intonation pattern of the fundamental tone curve of a speech model and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; comparing the intonation pattern of the input speech with the intonation pattern of the speech model to identify a time difference between the occurrence of the maximum and minimum values of the fundamental tone curves of the incoming speech in relation to the maximum and minimum values of the fundamental tone curve of the speech model, the identified time difference being indicative of dialectal characteristics of the input speech.
- the time difference may be determined in relation to an intonation pattern reference point, for example, the point at which a consonant/vowel limit occurs.
- the method may include the step of obtaining information on sentence accents from said prosody information.
- the words in the speech model are checked lexically and the phrases in the speech model are checked syntactically, the words and phrases which are not linguistically possible being excluded from the speech model.
- the orthography and phonetic transcription of the words in the speech model may be checked, the transcription information including lexically abstracted accent information, of type stressed syllables, and information relating to the location of secondary accent.
- the accent information may relate to tonal word accent I and accent II.
- sentence accent information and/or sentence stressing may be used in the interpretation of the content of the recognised input speech.
- the method according to the present invention may include the step of initiating a dialogue with the database to obtain speech information data for formulating said spoken response, said dialogue being initiated following the interpretation of the input speech.
- the dialogue with the database may result in the application of speech information data to text-to-speech conversion means.
- the invention further provides a voice responsive communication system which includes a speech-to-speech conversion system as outlined in the preceding paragraphs, or utilises a method as outlined in the preceding paragraphs for providing a spoken response to a speech input to the system.
- the characteristic features of the speech-to-speech conversion system and method according to the present invention are that prosody information is extracted from the input speech, that dialectal information is obtained from this prosody information, and that the dialect of the synthesised spoken response is thereby matched to that of the input speech.
- a speech-to-speech conversion system includes, at the input 1 thereof, a speech recognition unit 2 and an extraction unit 3 for extracting prosody information from speech applied to the system input 1, i.e. the fundamental tone curve of the input speech.
- speech inputs applied to the input 1 are simultaneously applied to the units 2 and 3.
- the output of the speech recognition unit 2 and an output of the extraction unit 3 are connected to separate inputs of an interpretation unit 4, the output of which is connected to a database management unit 5.
- the database management unit 5, which is adapted for two-way communication with a database 6, is connected at the output thereof to the input of a text-to-speech converter 7.
- the dialogue between the database 6 and the database management unit 5 can be effected by any known database communication language, for example, SQL (Structured Query Language).
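Purely for illustration, such a dialogue could be as simple as a single SQL lookup keyed by the interpreted content; the table and column names below are invented for this sketch and do not come from the patent:

```python
import sqlite3

def fetch_speech_information(conn: sqlite3.Connection, topic: str) -> str:
    """Hypothetical SQL dialogue between the database management unit 5 and
    the database 6: the interpreted content of the voice input selects the
    speech information data used to formulate the spoken response."""
    row = conn.execute(
        "SELECT response_text FROM speech_information WHERE topic = ?",
        (topic,),
    ).fetchone()
    return row[0] if row else ""
```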
- the output of the text-to-speech converter 7 provides a synthesised speech output for the speech-to-speech conversion system.
- a further output of the extraction unit 3 is connected to the input of a prosody analyser unit 8 which is adapted for two-way communication with the text-to-speech converter 7.
- the prosody analyser unit 8 is adapted, as a part of the text-to-speech conversion process of the converter 7, to analyse the prosody information, i.e. the fundamental tone curve, of the synthesised speech and make any necessary corrections to the intonation pattern of the synthesised speech in accordance with the dialectal information extracted from the input speech.
- the dialect of the synthesised speech output of the speech-to-speech conversion system will match that of the input speech.
- the present invention is adapted to provide a spoken response to a speech input to the speech-to-speech conversion system which has a dialect to match that of the speech input and that this conversion process includes the steps of recognising and interpreting the input speech, utilising the interpretation to obtain speech information data from a database for use in the formulation of the spoken response, extracting prosody information from the input speech, obtaining dialectal information from the prosody information, and converting the speech information data obtained from said database into the spoken response using the dialectal information.
- This will be outlined in the following paragraphs.
- the speech inputs to the speech-to-speech conversion system which may be in many forms, for example, requests for information on particular topics, such as banking or telephone services, or general enquiries concerning such services, are applied to the input 1 and thereby to the inputs of the units 2 and 3.
- the speech recognition unit 2 and interpretation unit 4 are adapted to operate, in a manner well known to persons skilled in the art, to recognise and interpret the speech inputs to the system.
- the speech recognition unit 2 may, for example, operate by using a Hidden Markov model, or an equivalent speech model.
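The patent leaves the recogniser itself open; as a generic reminder of how a Hidden Markov model is decoded, the following is a textbook Viterbi search over, say, phoneme states (toy code under standard HMM assumptions, not the patent's own method):

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """Most likely state (e.g. phoneme) sequence under an HMM.
    log_init: (S,) initial log-probabilities; log_trans: (S, S) transition
    log-probabilities; log_emit: (T, S) per-frame emission log-probabilities."""
    T, S = log_emit.shape
    delta = log_init + log_emit[0]           # best score ending in each state
    back = np.zeros((T, S), dtype=int)       # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # scores[i, j]: i at t-1 -> j at t
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):            # trace the best path backwards
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```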
- the function of the units 2 and 4 is to convert speech inputs to the system into a form which is a faithful representation of the content of the speech inputs and suitable for application to the input of the database management unit 5.
- the content of the textual information data at the output of the interpretation unit 4 must be an accurate representation of the speech input and be usable by the database management unit 5 to access, and extract speech information data from, the database 6 for use in the formulation of a synthesised spoken response to the speech input.
- this process would, in essence, be effected by identifying a number of phonemes from a segment of the input speech which are combined into allophone strings, the phonemes being interpreted as possible words, or word combinations, to establish a model of the speech.
- the established speech model will have word and sentence accents according to a standardised pattern for the language of the input speech.
- the information, concerning the recognised words and word combinations, generated by the speech recognition unit 2 may, in practice, be checked both lexically (using a lexicon, with orthography and transcription) and syntactically.
- the purpose of these checks is to identify and exclude any words which do not exist in the language concerned, and/or any phrase whose syntax does not correspond with the language concerned.
- the speech recognition unit 2 ensures that only those words, and word combinations, which are found to be acceptable both lexically and syntactically, are used to create a model of the input speech.
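A minimal sketch of this filtering step, under the assumption that the lexicon is a plain word set and the syntax check an arbitrary predicate (both stand-ins; the patent only requires that linguistically impossible words and phrases be excluded):

```python
def filter_candidates(candidates, lexicon, is_syntactically_valid):
    """Keep only recognised word sequences whose every word exists in the
    lexicon (lexical check) and whose word order is grammatical for the
    language (syntactic check), before the speech model is built."""
    return [words for words in candidates
            if all(w in lexicon for w in words)
            and is_syntactically_valid(words)]

# Toy usage with a tiny lexicon and a trivial stand-in syntax check.
lexicon = {"the", "bank", "opens", "at", "ten"}
kept = filter_candidates(
    [["the", "bank", "opens", "at", "ten"], ["the", "bonk", "opens"]],
    lexicon,
    is_syntactically_valid=lambda words: len(words) > 2,
)
```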
- the intonation pattern of the speech model is a standardised intonation pattern for the language concerned, or an intonation pattern which has been established by training, or explicit knowledge, using a number of dialects of the language concerned.
- the prosody information, i.e. the fundamental tone curve, extracted from the input speech by the extraction unit 3 can be used to obtain dialectal, sentence accent and sentence stressing information for use by the speech-to-speech conversion system and method of the present invention.
- the dialectal information can be used by the speech-to-speech conversion system and method to match the dialect of the output speech to that of the input speech, and the sentence accent and stressing information can be used in the recognition and interpretation of the input speech.
- the means for obtaining dialectal information from the prosody information includes the first and second analysing means and the comparison means outlined above, which locate the extreme values of the fundamental tone curves of the input speech and of the speech model and identify the time difference between them.
- the time difference may be determined in relation to an intonation pattern reference point.
- the difference, in terms of intonation pattern, between different dialects can be described by different points in time for word and sentence accent, i.e. the time difference can be determined in relation to an intonation pattern reference point, for example, the point at which a consonant/vowel limit occurs.
- the reference against which the time difference is measured is the point at which the consonant/vowel boundary, i.e. the CV-boundary, occurs.
- the identified time difference which, as stated above, is indicative of the dialect in the input speech, i.e. the spoken language, is applied to the text-to-speech converter 7 to enable the intonation pattern, and thereby the dialect, of the speech output of the system to be corrected so that it corresponds to the intonation pattern of the corresponding words and/or phrase of the input speech.
- this corrective process enables the dialectal information in the input speech to be incorporated into the output speech.
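On one simple reading, the correction could amount to re-timing the synthesised F0 contour by the identified offset; the patent states only that the intonation pattern is corrected using the time difference, so the uniform shift below is an assumption of this sketch:

```python
import numpy as np

def retime_f0(f0, frame_rate_hz, offset_s):
    """Shift a synthesised F0 contour in time by the dialectal offset
    identified from the input speech, padding with NaN (unvoiced)."""
    shift = int(round(offset_s * frame_rate_hz))
    shift = max(-f0.size, min(f0.size, shift))  # clamp to contour length
    out = np.full_like(f0, np.nan)
    if shift >= 0:
        out[shift:] = f0[:f0.size - shift]
    else:
        out[:f0.size + shift] = f0[-shift:]
    return out
```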
- the fundamental tone curve of the speech model is based on information resulting from the lexical (orthography and transcription) and syntactic checks.
- the transcription information includes lexically abstracted accent information, of type stressed syllables, i.e. tonal word accents I and II, and information relating to the location of secondary accent, i.e. information given, for instance, in dictionaries. This information can be used to adjust the recognition pattern of the speech recognition model, for example, the Hidden Markov model, to take account of the transcription information. A more exact model of the input speech is, therefore, obtained during the interpretation process.
- the speech model is compared with a spoken input sequence, and any difference therebetween can be determined and used to bring the speech model into conformity with the spoken sequence and/or to determine stresses in the spoken sequence.
- relative sentence stresses can be determined by classifying the ratio between variations and declination of the fundamental tone curve, whereby emphasised sections, or individual words, can be determined.
- the pitch of the speech can be determined from the declination of the fundamental tone curve.
- the extraction unit 3, in association with the interpretation unit 4, is adapted to determine the following characteristics of the fundamental tone curve:
- classification of the ratio between the variation and declination of the fundamental tone curve makes it possible to identify and determine relative sentence stresses and emphasised sections or words.
- the relation between the variation and declination of the fundamental tone curve can be utilised to determine the dynamic range of the fundamental tone curve.
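As an illustrative computation only (the patent does not prescribe how the declination is estimated; the straight-line fit below is an assumption of this sketch):

```python
import numpy as np

def declination_and_variation(f0, frame_rate_hz):
    """Fit the declination (overall downward trend) of an F0 curve and relate
    the residual variation to it: the variation/declination ratio points to
    emphasised sections, and the residual spread gives the dynamic range."""
    t = np.arange(f0.size) / frame_rate_hz
    voiced = ~np.isnan(f0)
    slope, intercept = np.polyfit(t[voiced], f0[voiced], 1)  # declination line
    residual = f0[voiced] - (slope * t[voiced] + intercept)  # local variation
    dynamic_range = residual.max() - residual.min()
    ratio = float("inf") if slope == 0 else np.std(residual) / abs(slope)
    return slope, dynamic_range, ratio
```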
- the information obtained in respect of the fundamental tone curve concerning dialect, sentence accent and stressing can be used for the interpretation of speech by the interpretation unit 4, i.e. the information can be used, in the manner outlined above, to obtain a better understanding of the content of the input speech and bring the intonation pattern of the speech model into conformity with the input speech.
- since the corrected speech model exhibits the language characteristics (including dialect information, sentence accent and stressing) of the input speech, it can be used to give an increased understanding of the input speech and be effectively used by the database management unit 5 to obtain the required speech information data from the database 6 to formulate a response to a voice input to the speech-to-speech conversion system.
- the ability to detect speech, irrespective of dialect variations, in accordance with the system and method of the present invention makes it possible to use speech in many different voice-responsive applications.
- the system is, therefore, adapted to recognise and accurately interpret the content of speech inputs and to tailor the dialect of the voice response to match the dialect of the voice input.
- This process provides a user-friendly system because the language of the man-machine dialogue is in accordance with the dialect of the user concerned.
Claims (26)
- Speech-to-speech conversion system for providing, at the output thereof, spoken responses to speech inputs to the system, comprising: speech recognition means for the input speech; interpretation means for interpreting the content of the recognised input speech; and a database containing speech information data for use in the formulation of said spoken responses, the output of said interpretation means being used to access said database and obtain speech information data therefrom, characterised in that the system further includes extraction means for extracting prosody information from the input speech; means for obtaining dialectal information from said prosody information; and text-to-speech conversion means for converting the speech information data obtained from said database into a spoken response using said dialectal information, the dialect of the spoken response being matched to that of the input speech, the means for obtaining dialectal information from said prosody information comprising first analysing means for determining the intonation pattern of the fundamental tone of the input speech and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; second analysing means for determining the intonation pattern of the fundamental tone curve of the speech model and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; and comparison means for comparing the intonation pattern of the input speech with the intonation pattern of the speech model in order to identify a time difference between the occurrence of the maximum and minimum values of the fundamental tone curves of the incoming speech in relation to the maximum and minimum values of the fundamental tone curve of the speech model, the identified time difference being indicative of dialectal characteristics of the input speech.
- Speech-to-speech conversion system according to claim 1, characterised in that the speech recognition means are adapted to identify a plurality of phonemes from a segment of the input speech and include interpretation means for interpreting the phonemes, as possible words or word combinations, in order to establish a model of the speech, the speech model having word and sentence accents according to a standardised pattern for the language of the input speech.
- Speech-to-speech conversion system according to claim 2, characterised in that the prosody information extracted from the input speech is the fundamental tone curve of the input speech.
- Speech-to-speech conversion system according to claim 3, characterised in that the time difference is determined in relation to an intonation pattern reference point.
- Speech-to-speech conversion system according to claim 4, characterised in that the intonation pattern reference point, in relation to which the time difference is measured, is the point at which a consonant/vowel boundary occurs.
- Speech-to-speech conversion system according to any one of the preceding claims, characterised in that the system further includes means for obtaining information on sentence accents from said prosody information.
- Speech-to-speech conversion system according to claim 6, characterised in that the speech recognition means include checking means for lexically checking the words in the speech model and for syntactically checking the phrases in the speech model, the words and phrases which are not linguistically possible being excluded from the speech model; in that the checking means are adapted to check the orthography and phonetic transcription of the words in the speech model; and in that the transcription information includes lexically abstracted accent information, of the stressed-syllable type, and information relating to the location of a secondary accent.
- Speech-to-speech conversion system according to claim 7, characterised in that the accent information relates to tonal word accent I and accent II.
- Speech-to-speech conversion system according to any one of claims 6 to 8, characterised in that said sentence accent information is used in the interpretation of the content of the recognised input speech.
- Speech-to-speech conversion system according to any one of the preceding claims, characterised in that sentence stresses are determined and used in the interpretation of the content of the recognised input speech.
- Speech-to-speech conversion system according to any one of the preceding claims, characterised in that the system further includes dialogue management means for managing a dialogue with the database, said dialogue being initiated by the interpretation means.
- Speech-to-speech conversion system according to claim 11, characterised in that the dialogue with the database results in the application of speech information data to the text-to-speech conversion means.
- Speech-to-speech conversion system according to claim 10 or claim 11, characterised in that the dialogue with the database is effected by means of the structured query language SQL.
- Voice responsive communication system which includes a speech-to-speech conversion system according to any one of the preceding claims.
- Method for providing a spoken response to a speech input in a voice responsive communication system, said response having a dialect matching that of the speech input, said method including the steps of recognising and interpreting the input speech, and of utilising the interpretation to obtain speech information data from a database for use in the formulation of said spoken response, characterised in that said method further includes the steps of extracting prosody information from the input speech, obtaining dialectal information from said prosody information, and converting the speech information data obtained from said database into said spoken response using said dialectal information, together with the steps of determining the intonation pattern of the fundamental tone of the input speech and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; determining the intonation pattern of the fundamental tone curve of a speech model and thereby the maximum and minimum values of the fundamental tone curve and their respective positions; and comparing the intonation pattern of the input speech with the intonation pattern of the speech model in order to identify a time difference between the occurrence of the maximum and minimum values of the fundamental tone curves of the incoming speech in relation to the maximum and minimum values of the fundamental tone curve of the speech model, the identified time difference being indicative of dialectal characteristics of the input speech.
- Method according to claim 15, characterised in that the recognition and interpretation of the input speech include the steps of identifying a plurality of phonemes from a segment of the input speech and of interpreting the phonemes, as possible words or word combinations, in order to establish a model of the speech, the speech model having word and sentence accents conforming to a standardised pattern for the language of the input speech.
- Method according to claim 16, characterised in that the prosody information extracted from the input speech is the fundamental tone curve of the input speech.
- Method according to claim 15, characterised in that the time difference is determined in relation to an intonation pattern reference point.
- Method according to claim 18, characterised in that the intonation pattern reference point, in relation to which the time difference is measured, is the point at which a consonant/vowel boundary occurs.
- Method according to any one of claims 15 to 19, characterised by the step of obtaining information on sentence stresses from said prosody information.
- Method according to claim 20, characterised in that the words in the speech model are checked lexically, in that the phrases in the speech model are checked syntactically, in that the words and phrases which are not linguistically possible are excluded from the speech model, in that the orthography and phonetic transcription of the words in the speech model are checked, and in that the transcription information includes lexically abstracted accent information, of the stressed-syllable type, and information relating to the location of a secondary accent.
- Method according to claim 21, characterised in that the accent information relates to tonal word accent I and accent II.
- Method according to any one of claims 20 to 22, characterised by the step of using said sentence accent information in the interpretation of the input speech.
- Method according to any one of claims 15 to 23, characterised by the step of initiating a dialogue with the database in order to obtain speech information data for formulating said spoken response, said dialogue being initiated following the interpretation of the input speech.
- Method according to claim 24, characterised in that the dialogue with the database results in the application of speech information data to the text-to-speech conversion means.
- Voice responsive communication system which is adapted to use a method according to any one of claims 15 to 25 in order to provide a spoken response to a speech input to the system.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
SE9601811A SE506003C2 (sv) | 1996-05-13 | 1996-05-13 | Method and system for speech-to-speech conversion with extraction of prosody information |
SE9601811 | 1996-05-13 | ||
PCT/SE1997/000583 WO1997043756A1 (fr) | 1996-05-13 | 1997-04-08 | Method and system for speech-to-speech conversion |
Publications (2)
Publication Number | Publication Date |
---|---|
EP0919052A1 (fr) | 1999-06-02 |
EP0919052B1 (fr) | 2003-07-09 |
Family
ID=20402543
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP97919840A Expired - Lifetime EP0919052B1 (fr) | 1997-04-08 | Method and system for speech-to-speech conversion |
Country Status (6)
Country | Link |
---|---|
EP (1) | EP0919052B1 (fr) |
DE (1) | DE69723449T2 (fr) |
DK (1) | DK0919052T3 (fr) |
NO (1) | NO318557B1 (fr) |
SE (1) | SE506003C2 (fr) |
WO (1) | WO1997043756A1 (fr) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1159702C (zh) * | 2001-04-11 | 2004-07-28 | International Business Machines Corporation | Speech-to-speech translation system and method with emotion |
US7181397B2 (en) * | 2005-04-29 | 2007-02-20 | Motorola, Inc. | Speech dialog method and system |
DE102007011039B4 (de) * | 2007-03-07 | 2019-08-29 | Man Truck & Bus Ag | Hands-free device in a motor vehicle |
US8150020B1 (en) | 2007-04-04 | 2012-04-03 | At&T Intellectual Property Ii, L.P. | System and method for prompt modification based on caller hang ups in IVRs |
US8024179B2 (en) * | 2007-10-30 | 2011-09-20 | At&T Intellectual Property Ii, L.P. | System and method for improving interaction with a user through a dynamically alterable spoken dialog system |
JP5282469B2 (ja) | 2008-07-25 | 2013-09-04 | Yamaha Corporation | Speech processing apparatus and program |
EP3389043A4 (fr) | 2015-12-07 | 2019-05-15 | Yamaha Corporation | Voice interaction device and voice interaction method |
CN113470670B (zh) * | 2021-06-30 | 2024-06-07 | Guangzhou Ziyun Technology Co., Ltd. | Method and system for fast switching of electronic voice pitch |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2165969B (en) * | 1984-10-19 | 1988-07-06 | British Telecomm | Dialogue system |
JPH0772840B2 (ja) * | 1992-09-29 | 1995-08-02 | IBM Japan, Ltd. | Speech model construction method, speech recognition method, speech recognition apparatus, and speech model training method |
SE9301596L (sv) * | 1993-05-10 | 1994-05-24 | Televerket | Device for increasing speech comprehension when translating speech from a first language to a second language |
SE504177C2 (sv) * | 1994-06-29 | 1996-12-02 | Telia Ab | Method and device for adapting speech recognition equipment to dialectal variations in a language |
1996
- 1996-05-13: SE application SE9601811A filed; published as SE506003C2 (status: unknown)
1997
- 1997-04-08: EP application EP97919840A filed; published as EP0919052B1 (not active; expired - lifetime)
- 1997-04-08: DK application DK97919840T filed; published as DK0919052T3 (active)
- 1997-04-08: DE application DE69723449T filed; published as DE69723449T2 (not active; expired - fee related)
- 1997-04-08: WO application PCT/SE1997/000583 filed; published as WO1997043756A1 (active IP right grant)
1998
- 1998-11-06: NO application NO19985179A filed; published as NO318557B1 (status: unknown)
Also Published As
Publication number | Publication date |
---|---|
SE9601811D0 (sv) | 1996-05-13 |
DE69723449T2 (de) | 2004-04-22 |
NO985179L (no) | 1998-11-11 |
WO1997043756A1 (fr) | 1997-11-20 |
EP0919052A1 (fr) | 1999-06-02 |
NO318557B1 (no) | 2005-04-11 |
SE9601811L (sv) | 1997-11-03 |
DK0919052T3 (da) | 2003-11-03 |
SE506003C2 (sv) | 1997-11-03 |
DE69723449D1 (de) | 2003-08-14 |
NO985179D0 (no) | 1998-11-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5752227A (en) | Method and arrangement for speech to text conversion | |
US5806033A (en) | Syllable duration and pitch variation to determine accents and stresses for speech recognition | |
JP4536323B2 (ja) | Speech-to-speech generation system and method | |
US6910012B2 (en) | Method and system for speech recognition using phonetically similar word alternatives | |
AU2009249165B2 (en) | Systems and methods of improving automated speech recognition accuracy using statistical analysis of search terms | |
US7191132B2 (en) | Speech synthesis apparatus and method | |
US20030069729A1 (en) | Method of assessing degree of acoustic confusability, and system therefor | |
EP0767950B1 (fr) | Method and device for adapting speech recognition equipment to dialectal variations in a language | |
JPH09500223A (ja) | Multilingual speech recognition system | |
GB2380380A (en) | Speech synthesis method and apparatus | |
US5677992A (en) | Method and arrangement in automatic extraction of prosodic information | |
EP0919052B1 (fr) | Method and system for speech-to-speech conversion | |
Badino et al. | Language independent phoneme mapping for foreign TTS | |
US11817079B1 (en) | GAN-based speech synthesis model and training method | |
Kadambe et al. | Language identification with phonological and lexical models | |
Chomphan et al. | Tone correctness improvement in speaker-independent average-voice-based Thai speech synthesis | |
WO1997043707A1 (fr) | Improvements in, or relating to, speech-to-speech conversion | |
Chou et al. | Automatic segmental and prosodic labeling of Mandarin speech database | |
Alam et al. | Development of annotated Bangla speech corpora | |
Gros et al. | SI-PRON pronunciation lexicon: a new language resource for Slovenian | |
Wangchuk et al. | Developing a Text to Speech System for Dzongkha | |
Potisuk et al. | Using stress to disambiguate spoken Thai sentences containing syntactic ambiguity | |
KR20220036237A (ko) | 딥러닝을 기반으로 하는 가이드 음성 제공 시스템 | |
Williams | The segmentation and labelling of speech databases | |
Meinedo et al. | The use of syllable segmentation information in continuous speech recognition hybrid systems applied to the Portuguese language. |
Legal Events

- PUAI: Public reference made under article 153(3) EPC to a published international application that has entered the European phase (original code: 0009012)
- 17P: Request for examination filed (effective date: 1998-12-14)
- AK: Designated contracting states (kind code of ref document: A1; designated states: CH DE DK FI FR GB LI NL)
- GRAH: Despatch of communication of intention to grant a patent (original code: EPIDOS IGRA)
- RIC1: Information provided on IPC code assigned before grant (7G 10L 13/08 A, 7G 06F 3/16 B)
- GRAH: Despatch of communication of intention to grant a patent (original code: EPIDOS IGRA)
- GRAA: (Expected) grant (original code: 0009210)
- AK: Designated contracting states (designated states: CH DE DK FI FR GB LI NL)
- PG25: Lapsed in a contracting state [announced via postgrant information from national office to EPO]: NL, LI and CH lapsed because of failure to submit a translation of the description or to pay the fee within the prescribed time limit (effective date: 2003-07-09)
- REG: Reference to a national code (GB, legal event code FG4D)
- REG: Reference to a national code (CH, legal event code EP)
- REF: Corresponds to ref document number 69723449, country DE, date 2003-08-14, kind code P
- REG: Reference to a national code (DK, legal event code T3)
- NLV1: NL, lapsed or annulled due to failure to fulfil the requirements of art. 29p and 29m of the patents act
- REG: Reference to a national code (CH, legal event code PL)
- ET: FR, translation filed
- PLBE: No opposition filed within time limit (original code: 0009261)
- STAA: Information on the status of an EP patent application or granted EP patent (status: no opposition filed within time limit)
- 26N: No opposition filed (effective date: 2004-04-14)
- PGFP: Annual fee paid to national office (DK, payment date 2008-04-11, year of fee payment 12; DE, payment date 2008-04-18, year of fee payment 12)
- PGFP: Annual fee paid to national office (FI, payment date 2008-04-15, year of fee payment 12)
- PGFP: Annual fee paid to national office (FR, payment date 2008-04-12, year of fee payment 12)
- PGFP: Annual fee paid to national office (GB, payment date 2008-04-21, year of fee payment 12)
- REG: Reference to a national code (DK, legal event code EBP)
- GBPC: GB, European patent ceased through non-payment of renewal fee (effective date: 2009-04-08)
- REG: Reference to a national code (FR, legal event code ST, effective date: 2009-12-31)
- PG25: Lapsed in a contracting state because of non-payment of due fees: FI (effective date: 2009-04-08); DE (effective date: 2009-11-03)
- PG25: Lapsed in a contracting state because of non-payment of due fees: GB (effective date: 2009-04-08); FR (effective date: 2009-12-22); DK (effective date: 2009-04-30)