WO2001018788A2 - Method for determining sentence ends in automatic speech processing - Google Patents
- Publication number
- WO2001018788A2 (international application PCT/DE2000/002979)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- tokens
- token
- sentence
- assessment
- category
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims description 51
- 238000013528 artificial neural network Methods 0.000 claims description 18
- 230000002194 synthesizing effect Effects 0.000 claims description 2
- 230000003936 working memory Effects 0.000 claims 2
- 238000004590 computer program Methods 0.000 description 2
- 239000002245 particle Substances 0.000 description 2
- 101100154785 Mus musculus Tulp2 gene Proteins 0.000 description 1
- 230000006978 adaptation Effects 0.000 description 1
- 230000015572 biosynthetic process Effects 0.000 description 1
- 210000001072 colon Anatomy 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 238000003786 synthesis reaction Methods 0.000 description 1
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
Definitions
- the present invention relates to a method for end-of-sentence determination in automatic speech processing.
- the two main areas of application for automatic speech processing are automatic speech recognition and automatic speech synthesis.
- Methods for synthesizing speech are known for example from EP 793 218 A2, EP 821 344 A2 or WO 96/42079.
- in speech synthesis, a text present in the form of a text file is converted into an audio file, which is output as speech by means of an acoustic output unit.
- an attempt is made to reproduce human speech as faithfully as possible.
- the two main criteria for this are the intelligibility of the generated speech and its prosody.
- prosody is essentially determined by the fundamental frequency (pitch), the sound energy (loudness) and the sound duration (lengthening and pauses).
- a complex problem in generating correct prosody is recognizing the end of a sentence in arbitrary text. This requires the punctuation marks of the respective language to be interpreted correctly. Until now, this problem has been solved by rule-based routines implemented in a corresponding speech-generation program. Setting up such a rule-based routine requires a language expert who creates a rule set for the respective language. Creating the rule set involves considerable effort, which must be repeated for every language in which the method is to be used.
- the object of the invention is to provide a method for end-of-sentence determination in automatic speech processing that can be adapted to different languages more easily than the known methods while still recognizing sentence ends correctly with a low error rate.
- the method according to the invention for determining the end of a sentence in automatic speech processing comprises the following steps:
- the assessment of the flagged tokens can be carried out using a data-driven routine, that is, a learning program component that can adapt itself to a language largely autonomously.
- data-driven routines are either routines that independently compile statistics and evaluate them when making a decision, or neural networks.
- the disambiguation of the tokens can also be implemented by means of data-driven routines.
- the method according to the invention is particularly suitable for data-driven routines because the flagged tokens are assessed only after the tokens have been disambiguated on the basis of their assigned categories. The linguistic categories determined for the individual tokens are therefore almost completely correct, and the flagged tokens can be assessed precisely.
- the two procedural steps of disambiguating the tokens and assessing the flagged tokens are implemented as neural networks that use the same context, e.g. each accesses the three tokens before and the three tokens after the token under examination.
- Fig. 4 shows the structure of a neural network for assessing sentence endings.
- a text file is divided into tokens.
- tokens are all text elements that are located between two token separators.
- the token separators include spaces, tabs and end-of-line characters.
- A token begins with a character that is not a separator and ends with the character that is followed by a separator. These separators can be stored in a separate file for each language.
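As a minimal Python sketch of this tokenization step, assuming the separator set is passed in as a per-language parameter (the names `DEFAULT_SEPARATORS` and `tokenize` are illustrative, not from the patent), a token is simply a maximal run of non-separator characters:

```python
from typing import Iterable, List

# Hypothetical default separators: space, tab and end-of-line characters.
# Per the description, a separator set could be loaded per language instead.
DEFAULT_SEPARATORS = frozenset(" \t\n\r")


def tokenize(text: str, separators: Iterable[str] = DEFAULT_SEPARATORS) -> List[str]:
    """Split text into tokens, i.e. maximal runs of non-separator characters."""
    seps = frozenset(separators)
    tokens: List[str] = []
    current: List[str] = []
    for ch in text:
        if ch in seps:
            # A separator closes the token being built, if any.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:  # the text may end without a trailing separator
        tokens.append("".join(current))
    return tokens


print(tokenize("Es ist 13:20 Uhr.\tDr. Meier kommt."))
# ['Es', 'ist', '13:20', 'Uhr.', 'Dr.', 'Meier', 'kommt.']
```

Punctuation stays attached to its token (`Uhr.`, `13:20`), which is what the subsequent flagging of possible sentence ends relies on.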
- the tokens that can represent the end of the sentence are marked with a corresponding flag.
- Flags in the sense of the invention are any data assignments by which individual tokens can be identified simply and quickly as possible sentence ends. This flag is called PEOS (possible end of sentence). All tokens containing a character that may be interpreted as the end of a sentence are assessed as tokens that can represent the end of a sentence.
- among end-of-sentence characters, a distinction is made between characters that always mark the end of a sentence, such as the question mark or exclamation mark, and characters that can also have other uses, such as the period, which also appears in abbreviations, acronyms and numbers.
- the colon is a special case for determining prosody: it never stands at the end of a grammatical sentence, but for prosody, in particular for a pause in speech, it generally has the same meaning as the period at the end of a sentence.
- the end-of-sentence character is at the end of the token and the following token begins with a lowercase letter. In this case, it is not the end of a sentence.
- the punctuation mark is inside the token, i.e. it is not followed by a token separator. This case occurs, e.g., in numbers (1.5, 13:20).
- in this case, the end-of-sentence character never marks the end of a sentence.
- the end-of-sentence character is at the end of the token and the next token does not begin with a lowercase letter.
- This token, which has the end-of-sentence character at its end, represents a possible sentence end and is marked with the PEOS flag (PEOS: possible end of sentence).
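The three cases can be expressed as a small flagging function. This is a sketch under assumptions: the candidate character set (`.`, `!`, `?` and, following the remark on the colon, `:`) and the names `SENTENCE_END_CHARS` and `peos_flags` are hypothetical, and the lowercase rule is applied uniformly to all candidate characters, even to `?` and `!`, which the description says always end a sentence; a real implementation might treat those separately.

```python
from typing import List, Optional

# Assumed candidate characters: '.', '!', '?' and, following the remark on
# prosody, ':' (treated like a period because it causes a speech pause).
SENTENCE_END_CHARS = frozenset(".!?:")


def starts_lowercase(token: Optional[str]) -> bool:
    """True if a following token exists and begins with a lowercase letter."""
    return token is not None and token[:1].islower()


def peos_flags(tokens: List[str]) -> List[bool]:
    """Return one flag per token: True = PEOS (possible end of sentence)."""
    flags: List[bool] = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if not tok or tok[-1] not in SENTENCE_END_CHARS:
            # Includes the case of a mark inside the token (1.5, 13:20):
            # it is not at the token's end, so the token is never flagged.
            flags.append(False)
        elif starts_lowercase(nxt):
            # End character at the token's end, but a lowercase token follows.
            flags.append(False)
        else:
            # End character at the token's end, next token not lowercase.
            flags.append(True)
    return flags


print(peos_flags(["Es", "ist", "13:20", "Uhr.", "Dr.", "Meier", "kommt."]))
# [False, False, False, True, True, False, True]
```

Note that `Dr.` is flagged here; deleting such flags is left to the later processing operations, such as abbreviation handling.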
- linguistic categories are assigned to the individual tokens.
- the linguistic categories include word classes and other characters that can be contained in a text.
- the linguistic categories used in the present exemplary embodiment are listed:
- the division of the linguistic categories given above is only an example. Other subdivisions of linguistic categories can also be used. For example, up to 40 linguistic categories are used in speech recognition. In the present invention, however, a division with fewer categories is advantageous since the neural networks explained in more detail below can be implemented more easily and can be trained faster.
- the linguistic categories belonging to the respective tokens are read from a lexicon. Several linguistic categories may be assigned to a single token. As a rule, however, not all tokens of a text are present in the lexicon, so the corresponding categories cannot be determined for every token with the help of the lexicon.
- the linguistic category of tokens that cannot be clearly assigned to a category is determined using a so-called OOV (out-of-vocabulary) routine.
- this OOV routine is designed as a neural network, which uses the last four letters of the respective token to infer its category. However, this OOV routine can also be based on another data-driven method.
- the neural network of the OOV routine can also evaluate the last three or five characters of the token to infer its category. In another language, it may be appropriate to determine the category not from the ending but from another part of the token.
- both the categorization using the lexicon and the categorization using the OOV routine can be ambiguous, that is, a token may be assigned several linguistic categories.
- the lexicons for the individual languages are in turn language-specific, so that the lexicon must be replaced accordingly when the method according to the invention is transferred to another language.
- such lexicons are known for most languages, which is why the exchange of the lexicons is not a serious problem when the method according to the invention is transferred to another language.
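A sketch of the lexicon lookup with an OOV fallback follows. The patent's OOV routine is a neural network over the last four letters; here it is replaced, for illustration only, by a hypothetical suffix lookup table (`SUFFIX_TABLE`). The lexicon entries, the category labels and the stripping of trailing punctuation before lookup are all assumptions.

```python
from typing import Dict, FrozenSet

# Hypothetical toy lexicon: word -> possible linguistic categories.
# A word may have several categories (German "die": article or pronoun).
LEXICON: Dict[str, FrozenSet[str]] = {
    "die": frozenset({"ART", "PRON"}),
    "Stadt": frozenset({"NOUN"}),
    "ist": frozenset({"VERB"}),
}

# Illustrative stand-in for the OOV neural network: a hypothetical table
# mapping the last four letters of a word to candidate categories.
SUFFIX_TABLE: Dict[str, FrozenSet[str]] = {
    "keit": frozenset({"NOUN"}),
    "chen": frozenset({"NOUN"}),
    "lich": frozenset({"ADJ", "ADV"}),
}
UNKNOWN = frozenset({"UNKNOWN"})


def categorize(token: str, suffix_len: int = 4) -> FrozenSet[str]:
    """Look the token up in the lexicon; fall back to its ending if absent."""
    word = token.rstrip(".!?:,;")  # assumption: ignore trailing punctuation
    for key in (word, word.lower()):
        if key in LEXICON:
            return LEXICON[key]
    # Out of vocabulary: infer the categories from the last letters.
    return SUFFIX_TABLE.get(word[-suffix_len:].lower(), UNKNOWN)


for t in ["Die", "Stadt", "freundlich.", "xyz"]:
    print(t, sorted(categorize(t)))
```

Both paths can return more than one category, which is exactly the ambiguity the disambiguation step has to resolve.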
- the tokens can be subjected to further processing operations, which are summarized as step S4 in the flow chart shown in FIG. 1.
- abbreviations, acronyms and formulas contained in the text can be evaluated. This can show that a token marked with a flag as a potential end of a sentence cannot be an end of a sentence. In such a case, the corresponding flag is deleted during these processing operations.
- Other such processing operations are, for example, normalizing or expanding the tokens.
- when tokens are normalized, tokens that contain characters of different categories, such as "54jährig" ("54-year-old"), are categorized.
- When tokens are expanded, several tokens such as "New" and "York" are combined into a single token "New York". These processing operations can also reveal that a flag set in step S2 must be deleted, in which case it is deleted accordingly.
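The following sketch illustrates how these processing operations can delete flags. The abbreviation list, the multi-word list and the function name `postprocess` are hypothetical; a real system would need more context to decide, for instance, whether an abbreviation at the end of a sentence also ends it.

```python
from typing import List, Set, Tuple

# Hypothetical language-specific resources.
ABBREVIATIONS: Set[str] = {"Dr.", "z.B.", "Nr."}
MULTIWORD_UNITS: List[Tuple[str, ...]] = [("New", "York")]


def postprocess(tokens: List[str], flags: List[bool]) -> Tuple[List[str], List[bool]]:
    """Merge multi-word units and delete PEOS flags on known abbreviations."""
    out_tokens: List[str] = []
    out_flags: List[bool] = []
    i = 0
    while i < len(tokens):
        unit = next((u for u in MULTIWORD_UNITS
                     if tuple(tokens[i:i + len(u)]) == u), None)
        if unit is not None:
            # Expanding: several tokens become one; keep the last part's flag.
            out_tokens.append(" ".join(unit))
            out_flags.append(flags[i + len(unit) - 1])
            i += len(unit)
            continue
        tok = tokens[i]
        # The period of a known abbreviation is not treated as a sentence end.
        out_tokens.append(tok)
        out_flags.append(flags[i] and tok not in ABBREVIATIONS)
        i += 1
    return out_tokens, out_flags


tokens = ["Er", "war", "in", "New", "York", "bei", "Dr.", "Meier."]
flags = [False, False, False, False, False, False, True, True]
print(postprocess(tokens, flags))
```

Here the flag on `Dr.` is deleted, "New" and "York" become one token, and only `Meier.` remains a possible sentence end.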
- the ambiguous tokens, that is, the tokens to which several linguistic categories are assigned, are disambiguated.
- this is carried out by a neural network which is based on a standard feed-forward architecture with a hidden layer.
- This neural network is shown schematically and in simplified form in FIG. 3.
- On the input side, it has nodes for the word to be disambiguated and for its predecessors and successors.
- the three tokens preceding and the three tokens following the token to be disambiguated are taken into account. For each of the three predecessor tokens, 14 nodes are provided, one for each linguistic category.
- 13 nodes are provided for the token to be disambiguated, since the category of punctuation marks does not have to be taken into account here.
- 3 × 14 = 42 nodes are thus provided for the successors and likewise for the predecessors. Each of these nodes represents one linguistic category of one specific token.
- the input signal +1 is applied to a node if the respective category is assigned to the respective token, or -1 if it is not. If there is no predecessor or successor token, as happens at the beginning and end of the text, the respective nodes are assigned the value 0.
- 13 nodes are provided for the respective categories of the word to be disambiguated.
- a hidden layer is located between the output nodes and the input nodes.
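The input layout can be sketched as follows. The 14 category names are placeholders (the embodiment's actual list is not reproduced here), and punctuation is assumed to be the category left out for the token under examination; the node count works out to 3 × 14 + 13 + 3 × 14 = 97.

```python
from typing import FrozenSet, List, Sequence

# Placeholder names for 14 linguistic categories; the last one (punctuation)
# is assumed to be the category left out for the token being disambiguated.
CATEGORIES = ["NOUN", "VERB", "ADJ", "ADV", "ART", "PRON", "PREP",
              "CONJ", "NUM", "PART", "INTERJ", "ABBR", "OTHER", "PUNCT"]
TARGET_CATEGORIES = CATEGORIES[:-1]  # 13 input nodes for the token itself
CONTEXT = 3                          # tokens before and after


def slot(cats: FrozenSet[str], names: Sequence[str]) -> List[float]:
    """+1 for each category assigned to the token, -1 for each one that is not."""
    return [1.0 if name in cats else -1.0 for name in names]


def context_slot(token_cats: Sequence[FrozenSet[str]], j: int) -> List[float]:
    """14 context nodes for position j, or all zeros beyond the text's edges."""
    if 0 <= j < len(token_cats):
        return slot(token_cats[j], CATEGORIES)
    return [0.0] * len(CATEGORIES)


def encode(token_cats: Sequence[FrozenSet[str]], i: int) -> List[float]:
    """Input vector for token i: 3*14 + 13 + 3*14 = 97 values."""
    vec: List[float] = []
    for j in range(i - CONTEXT, i):              # predecessors
        vec += context_slot(token_cats, j)
    vec += slot(token_cats[i], TARGET_CATEGORIES)
    for j in range(i + 1, i + CONTEXT + 1):      # successors
        vec += context_slot(token_cats, j)
    return vec


cats = [frozenset({"ART", "PRON"}), frozenset({"NOUN"}), frozenset({"VERB"})]
print(len(encode(cats, 0)))  # 97
```

The explicit bounds check in `context_slot` matters in Python: a negative index would otherwise silently wrap around to the end of the text instead of producing the zero padding.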
- FIG. 4 shows the neural network that assesses whether a flagged token is an end of sentence or not.
- on the input side, this neural network likewise has 13 nodes for the token to be assessed, 42 nodes for the predecessors (3 tokens) and 42 nodes for the successors (3 tokens).
- a hidden layer is arranged above the input nodes, and on the output side there is a single node representing the binary result: the token is or is not an end of sentence.
- This structure shows that the assessment of the flagged token takes into account not only the linguistic category of the token itself but also the categories of its predecessors and successors.
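A forward pass through such a network might look like the sketch below. The hidden-layer size (16), the tanh and sigmoid activations, the random initialization and the 0.5 decision threshold are assumptions; the description only fixes the 97 input nodes, one hidden layer and the single output node.

```python
import math
import random
from typing import List

N_INPUT = 3 * 14 + 13 + 3 * 14  # 97: predecessors, assessed token, successors
N_HIDDEN = 16                    # hypothetical hidden-layer size


class SentenceEndNet:
    """Feed-forward net: 97 inputs -> one hidden layer -> one output node."""

    def __init__(self, n_in: int = N_INPUT, n_hidden: int = N_HIDDEN, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x: List[float]) -> float:
        """Return a score in (0, 1) read as 'this token ends a sentence'."""
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        z = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return 1.0 / (1.0 + math.exp(-z))

    def is_sentence_end(self, x: List[float], threshold: float = 0.5) -> bool:
        # The single output node is turned into the binary decision.
        return self.forward(x) >= threshold


net = SentenceEndNet()
print(net.forward([0.0] * N_INPUT))  # 0.5 for an all-zero input
```

Until the network is trained, its outputs carry no information; the weights only become meaningful after the training phase.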
- An audio file can thus be generated on the basis of this data (step S7); further parameters for determining the prosody must be taken into account here, but they are not the subject of the present invention.
- the neural networks or other data-driven routines of the method according to the invention are first trained in a training phase using a training text.
- the linguistic categories of the tokens and the sentence ends of this training text are known and are supplied to the routines being trained during training.
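Training can be sketched as plain stochastic gradient descent with backpropagation on a cross-entropy loss. The patent does not specify the training algorithm, so this choice, the class name `TinyNet` and all hyperparameters are assumptions; in practice the inputs would be the encoded context vectors and the labels the known sentence ends of the training text. The toy data below is purely illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """One hidden tanh layer, one sigmoid output node (binary decision)."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sigmoid(sum(w * v for w, v in zip(self.w2, h)) + self.b2)

    def train_step(self, x: Sequence[float], y: float, lr: float) -> None:
        """One SGD step on the binary cross-entropy loss."""
        h, p = self.forward(x)
        d_out = p - y  # dLoss/dz for a sigmoid output with cross-entropy
        for j, hj in enumerate(h):
            d_hidden = d_out * self.w2[j] * (1.0 - hj * hj)  # uses the old w2[j]
            self.w2[j] -= lr * d_out * hj
            self.b1[j] -= lr * d_hidden
            row = self.w1[j]
            for k, xk in enumerate(x):
                row[k] -= lr * d_hidden * xk
        self.b2 -= lr * d_out


def mean_loss(net: TinyNet, data) -> float:
    eps = 1e-12
    total = 0.0
    for x, y in data:
        _, p = net.forward(x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def train(net: TinyNet, data, epochs: int = 200, lr: float = 0.1, seed: int = 0) -> None:
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            net.train_step(x, y, lr)


# Toy labelled data: the label depends only on the first input value.
data = [([a, b, c, d], 1.0 if a > 0 else 0.0)
        for a in (-1.0, 1.0) for b in (-1.0, 1.0)
        for c in (-1.0, 1.0) for d in (-1.0, 1.0)]
net = TinyNet(n_in=4, n_hidden=4, seed=1)
before = mean_loss(net, data)
train(net, data, epochs=300, lr=0.1)
print(before, mean_loss(net, data))
```

Because the training text supplies both the categories and the sentence ends, this is ordinary supervised learning; no language-specific rules enter the loop.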
- the method according to the invention thus learns the regularities of a language automatically; only known and readily available knowledge (division into tokens, assignment of flags for possible sentence ends, lexicon) has to be supplied as expert knowledge.
- during training, the method according to the invention learns those regularities of the language that are difficult to formulate as rules in practice. The method can therefore be transferred to another language quickly and easily.
- the method according to the invention is implemented as a computer program on a computer system, as is shown schematically in a simplified manner in FIG. 2.
- the computer program can also be stored on an electronically readable data carrier and can thus be transferred to another computer system.
- the computer system 1 has an internal bus 2, which is connected to a memory area 3, a central processor unit 4 and an interface 5.
- the interface 5 establishes a data connection to further computer systems via a data line 6.
- the acoustic output unit 7 is connected to a loudspeaker 10, the graphic output unit 8 to a screen 11, and the input unit 9 to a keyboard 12.
- Texts can be transmitted to the computer system 1 via the data line 6 and the interface 5 and stored in the memory 3.
- the memory area 3 is subdivided into several areas in which texts, audio files, application programs for carrying out the method according to the invention and further application and auxiliary programs are stored.
- the texts stored as text files are converted into audio files by the application programs for carrying out the method according to the invention; the audio files are transmitted via the internal bus 2 to the acoustic output unit 7, which outputs them as speech via the loudspeaker 10.
- the invention is explained in more detail above using an exemplary embodiment for the German language.
- the invention is not restricted to the use of the German language, but can be very easily transferred to other languages in comparison with known methods.
- An essential advantage of the method according to the invention over known methods is that it also enables end-of-sentence recognition in languages for which expert knowledge of the rules for determining token categories and sentence ends is not yet available.
- the method according to the invention can thus also be used easily for less widely spoken languages that have therefore been little researched.
- the two neural networks of the exemplary embodiment described above can also be combined into a single neural network for disambiguation and for assessing sentence ends. Any other statistical, data-driven method can also be used instead of neural networks.
Landscapes
- Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
Abstract
According to the invention, a text divided into lexical units (tokens) is processed such that the individual tokens are first assigned to predetermined linguistic categories, ambiguous tokens are disambiguated in a separate step, and the final determination of sentence ends is carried out on the basis of the linguistic categories.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE19942171.4 | 1999-09-03 | ||
DE1999142171 DE19942171A1 (de) | 1999-09-03 | 1999-09-03 | Verfahren zur Satzendebestimmung in der automatischen Sprachverarbeitung |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2001018788A2 true WO2001018788A2 (fr) | 2001-03-15 |
WO2001018788A3 WO2001018788A3 (fr) | 2001-09-07 |
Family
ID=7920746
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/DE2000/002979 WO2001018788A2 (fr) | 1999-09-03 | 2000-08-31 | Procede de determination de fins de phrase dans le traitement vocal automatique |
Country Status (2)
Country | Link |
---|---|
DE (1) | DE19942171A1 (fr) |
WO (1) | WO2001018788A2 (fr) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE102016008855A1 (de) | 2016-07-20 | 2018-01-25 | Audi Ag | Verfahren zum Durchführen einer Sprachübertragung |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE3733674A1 (de) * | 1986-10-03 | 1988-04-21 | Ricoh Kk | Sprachanalysator |
US4773009A (en) * | 1986-06-06 | 1988-09-20 | Houghton Mifflin Company | Method and apparatus for text analysis |
EP0327266A2 (fr) * | 1988-02-05 | 1989-08-09 | AT&T Corp. | Méthode pour la détermination des élements de langage et utilisation |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5860064A (en) * | 1993-05-13 | 1999-01-12 | Apple Computer, Inc. | Method and apparatus for automatic generation of vocal emotion in a synthetic text-to-speech system |
US6330538B1 (en) * | 1995-06-13 | 2001-12-11 | British Telecommunications Public Limited Company | Phonetic unit duration adjustment for text-to-speech system |
JPH09230896A (ja) * | 1996-02-28 | 1997-09-05 | Sony Corp | 音声合成装置 |
JPH1039895A (ja) * | 1996-07-25 | 1998-02-13 | Matsushita Electric Ind Co Ltd | 音声合成方法および装置 |
-
1999
- 1999-09-03 DE DE1999142171 patent/DE19942171A1/de not_active Withdrawn
-
2000
- 2000-08-31 WO PCT/DE2000/002979 patent/WO2001018788A2/fr active Application Filing
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4773009A (en) * | 1986-06-06 | 1988-09-20 | Houghton Mifflin Company | Method and apparatus for text analysis |
DE3733674A1 (de) * | 1986-10-03 | 1988-04-21 | Ricoh Kk | Sprachanalysator |
EP0327266A2 (fr) * | 1988-02-05 | 1989-08-09 | AT&T Corp. | Méthode pour la détermination des élements de langage et utilisation |
Non-Patent Citations (4)
Title |
---|
EDGINGTON M ET AL: "OVERVIEW OF CURRENT TEXT-TO-SPEECH TECHNIQUES: PART II - PROSODY AND SPEECH GENERATION" BT TECHNOLOGY JOURNAL,GB,BT LABORATORIES, vol. 14, no. 1, 1996, pages 84-99, XP000554641 ISSN: 1358-3948 * |
PALMER D D ET AL: "Adaptive multilingual sentence boundary disambiguation" COMPUTATIONAL LINGUISTICS, JUNE 1997, MIT PRESS FOR ASSOC. COMPUT. LINGUISTICS, USA, vol. 23, no. 2, pages 241-267, XP002164114 ISSN: 0891-2017 * |
SPROAT R W ET AL: "TEXT-TO-SPEECH SYNTHESIS" AT & T TECHNICAL JOURNAL,US,AMERICAN TELEPHONE AND TELEGRAPH CO. NEW YORK, vol. 74, no. 2, 1 March 1995 (1995-03-01), pages 35-44, XP000495044 ISSN: 8756-2324 * |
YUKIKO YAMAGUCHI ET AL: "A NEURAL NETWORK APPROACH TO MULTI-LANGUAGE TEXT-TO-SPEECH SYSTEM" PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING (ICSLP),TOKYO, JP, 18 November 1990 (1990-11-18), pages 325-328, XP000503375 * |
Also Published As
Publication number | Publication date |
---|---|
DE19942171A1 (de) | 2001-03-15 |
WO2001018788A3 (fr) | 2001-09-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
DE69908047T2 (de) | Verfahren und System zur automatischen Bestimmung von phonetischen Transkriptionen in Verbindung mit buchstabierten Wörtern | |
DE69519328T2 (de) | Verfahren und Anordnung für die Umwandlung von Sprache in Text | |
DE60203705T2 (de) | Umschreibung und anzeige eines eingegebenen sprachsignals | |
DE60020434T2 (de) | Erzeugung und Synthese von Prosodie-Mustern | |
DE69618503T2 (de) | Spracherkennung für Tonsprachen | |
DE69923379T2 (de) | Nicht-interaktive Registrierung zur Spracherkennung | |
DE69821673T2 (de) | Verfahren und Vorrichtung zum Editieren synthetischer Sprachnachrichten, sowie Speichermittel mit dem Verfahren | |
DE60200857T2 (de) | Erzeugung einer künstlichen Sprache | |
DE69413052T2 (de) | Sprachsynthese | |
DE69712216T2 (de) | Verfahren und gerät zum übersetzen von einer sparche in eine andere | |
DE3337353C2 (de) | Sprachanalysator auf der Grundlage eines verborgenen Markov-Modells | |
DE60216069T2 (de) | Sprache-zu-sprache erzeugungssystem und verfahren | |
DE69937176T2 (de) | Segmentierungsverfahren zur Erweiterung des aktiven Vokabulars von Spracherkennern | |
DE3783154T2 (de) | Spracherkennungssystem. | |
DE69831991T2 (de) | Verfahren und Vorrichtung zur Sprachdetektion | |
DE3876207T2 (de) | Spracherkennungssystem unter verwendung von markov-modellen. | |
DE69828141T2 (de) | Verfahren und Vorrichtung zur Spracherkennung | |
DE3236832C2 (de) | Verfahren und Gerät zur Sprachanalyse | |
DE19825205C2 (de) | Verfahren, Vorrichtung und Erzeugnis zum Generieren von postlexikalischen Aussprachen aus lexikalischen Aussprachen mit einem neuronalen Netz | |
EP1273003B1 (fr) | Procede et dispositif de determination de marquages prosodiques | |
EP1217610A1 (fr) | Méthode et système pour la reconnaissance vocale multilingue | |
EP0994461A2 (fr) | Procédé de reconnaissance automatique d'une expression vocale épellée | |
DE2212472A1 (de) | Verfahren und Anordnung zur Sprachsynthese gedruckter Nachrichtentexte | |
DE10306599B4 (de) | Benutzeroberfläche, System und Verfahren zur automatischen Benennung von phonischen Symbolen für Sprachsignale zum Korrigieren von Aussprache | |
EP1214703B1 (fr) | Procede d'apprentissage des graphemes d'apres des regles de phonemes pour la synthese vocale |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A2 Designated state(s): JP US |
|
AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
AK | Designated states |
Kind code of ref document: A3 Designated state(s): JP US |
|
AL | Designated countries for regional patents |
Kind code of ref document: A3 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
|
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
122 | Ep: pct application non-entry in european phase | ||
NENP | Non-entry into the national phase |
Ref country code: JP |