DE19857070A1

DE19857070A1 - Determining orthographic text reproduction involves determining constituent hypothesis sequence by comparing digital data with vocabulary for representing text, rule-based processing

Info

Publication number: DE19857070A1
Application number: DE1998157070
Authority: DE
Inventors: Michael Mende; Ralph Ostwald; Thomas Pachunke
Original assignee: Individual
Current assignee: Individual
Priority date: 1998-12-10
Filing date: 1998-12-10
Publication date: 2000-06-15

Abstract

The method involves providing a recognizer vocabulary containing words and word parts as the word constituents to be recognized; determining a sequence of constituent hypotheses by comparing digital data representing the text with the recognizer vocabulary; rule-based processing of the constituent hypothesis sequence to group it into individual words and so determine a word sequence hypothesis; determining a constituent of the word sequence hypothesis that requires correction; and providing at least one further constituent hypothesis for the constituent requiring correction. Independent claims are also included for an arrangement for determining an orthographic reproduction of a text and for a computer program product.

Description

The present invention relates to a method and an apparatus for determining an orthographic reproduction of a text, and in particular to the application of such a method in a system for automatic speech recognition (ASE).

ASE systems usually work by converting spoken text into electronic signals by means of a microphone. These signals are then stored and analyzed in order to determine from them an orthographic reproduction or representation of the spoken text. Usually the speech signal, or data derived from it, is compared with a dictionary in which a so-called recognizer vocabulary is stored. Depending on the degree of similarity between the digital data and the contents of the recognizer vocabulary, terms are then selected from the vocabulary that are intended to represent the spoken text.

The problems with such an ASE system are extraordinarily varied, which is essentially due to the complexity, variety, and high ambiguity of spoken language. For example, the German word "arbeiten" can be a verb or a noun, which is practically undetectable from the spoken text but has consequences for the orthographic reproduction. A further problem is the large number of compounds in German, because it is extraordinarily difficult for an ASE system to distinguish whether two separate words or one compound is present.

Conventional systems for automatic speech recognition (ASE) work in a word-form-oriented manner. The smallest unit visible to the user in the recognition process is thus the inflected or derived form of a lexeme, the latter being comparable to a conventional dictionary entry. This is therefore also the smallest unit accessible to the ASE user: the user must either accept a recognized form of a lexeme as correct or reject it as wrong. In particular, conventional systems offer no way of exploiting the fact that several word forms can belong to one lexeme, for example the word forms Wand, Wände, Wänden (and possibly Wandung) to the lexeme Wand ("wall"). Conventional ASE systems are restricted to recognizing concrete word forms. They cannot deal flexibly with the fact that a recognized derived word form may be wrong while the associated underived lexeme has been recognized correctly.

A further reason for the sometimes unsatisfactory performance of conventional ASE systems is their fundamental restriction to a maximum set of recognizable word forms, referred to below as the word-form vocabulary (= WfVok). Even if such vocabularies comprise tens or even hundreds of thousands of word forms, they are never a superset of the vocabulary of a human speaker. Besides the abundance of technical terms, neologisms (new coinages), and above all names, the main contributors are inflection (the grammatically regular change of the word ending) and, as a special feature of German, compounding (the formation of words from sub-words). Compounding is a linguistically productive process that is subject to constant change. Compounds have not only arisen historically by joining two or more sub-words; a language user can form new ones at any time. Such ad hoc compounds can usually be decoded immediately by other language users, but not by a word-form-oriented ASE system, since by their nature they cannot occur in a recognizer vocabulary.

Because of the described limitation of the vocabularies, and because recognition is stochastically oriented and therefore inherently error-prone, the user of an ASE system constantly needs to correct words. (A recognition rate of 98% must be regarded as optimal.) In common ASE systems this correction is, consistently, word-form-oriented: for each recognized word form the user is shown, on request, a list of the statistically most probable alternatives, from which the correct form can be selected. If the correct form is not in the list (for instance because it does not belong to the vocabulary), it must be entered on the keyboard, even if it is contained in the background vocabulary that an ASE system usually has. In conventional ASE systems such a background vocabulary supplies only the phonetic transcription during correction, and only after the user has manually entered the orthographic form.

The actual speech recognition is conventionally carried out by probabilistic methods. This applies both to the sequence of sounds or smaller units within a spoken word (i.e. the comparison of this determined sequence with ideal-typical pronunciations of the word forms of the recognizer vocabulary) and to the word sequences (i.e. the comparison of word candidate sequences with word sequences from text corpora analyzed offline). Because of the very high number of combinatorially possible bigrams and trigrams of word forms, not all of them can be observed even in very large corpora. On the basis of these methods it is therefore impossible to rule out irregular word sequences from the outset, which means that a more or less extensive correction of a recognized text is always necessary.

It is therefore an object of the present invention to provide a method and an apparatus for determining an orthographic reproduction of a text in which, in particular, errors in the orthographic reproduction of the text can be corrected in a particularly simple, flexible, and versatile manner.

A method and an apparatus according to the present invention are defined in the independent claims. The dependent claims define particular embodiments of the invention.

The object of the invention is essentially achieved by providing a recognizer vocabulary that consists of words and word parts as word constituents, and by determining, on the basis of this vocabulary, a constituent hypothesis sequence, which is then converted into a word sequence hypothesis by applying rules. Finally, a constituent of this word sequence hypothesis that requires correction is determined, and at least one further constituent hypothesis is supplied for it.

In a preferred embodiment, the further constituent hypotheses are supplied on the basis of a phonetic-acoustic similarity calculus. For this purpose the vocabulary, which may include the background vocabulary, is searched for entries whose phonetic transcription resembles that of the constituent hypothesis. This avoids having to keep the audio data of the recognition session stored in order to supply further similar hypotheses. Instead, the search for further hypotheses can also take place "offline", i.e. starting from the first recognition result as the underlying hypothesis and applying algorithms that deliver phonetically similar results and access the acoustic representations of the constituents only through their phonetic transcriptions.
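The offline lookup described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the patent does not specify the similarity calculus, so plain Levenshtein distance over phoneme symbols is swapped in here, and the vocabulary `VOCAB`, its transcriptions, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of phoneme symbols."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similar_constituents(hypothesis, vocabulary, max_distance=1):
    """Return vocabulary entries whose transcription is close to the hypothesis.

    Only transcriptions are consulted; no audio data is needed.
    """
    target = vocabulary[hypothesis]
    scored = []
    for entry, transcription in vocabulary.items():
        if entry == hypothesis:
            continue
        distance = edit_distance(target, transcription)
        if distance <= max_distance:
            scored.append((distance, entry))
    # Closest entries first, ties broken alphabetically.
    return [entry for _, entry in sorted(scored)]


# Hypothetical vocabulary with simplified, made-up transcriptions.
VOCAB = {
    "Wand": ("v", "a", "n", "t"),
    "Hand": ("h", "a", "n", "t"),
    "Land": ("l", "a", "n", "t"),
    "Wind": ("v", "I", "n", "t"),
    "Haus": ("h", "aU", "s"),
}

print(similar_constituents("Wand", VOCAB))
```

A real system would presumably weight substitutions by acoustic confusability rather than counting every mismatch as 1; the uniform cost is a simplification of this sketch.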

Advantageously, further hypotheses can also be supplied by applying inflection and/or derivation paradigms, i.e. other possible inflections of the first hypothesis are offered as further hypotheses. This exploits the fact that the lexeme "Haus" ("house") may have been recognized correctly but in the wrong case, namely in the nominative instead of the dative "Hause".
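As a minimal sketch of paradigm-based alternatives, assume the paradigms are available as a simple lookup table. The table `PARADIGMS` and the function name are hypothetical; the patent does not say how paradigms are stored.

```python
# Hypothetical paradigm table: lexeme -> its word forms.
PARADIGMS = {
    "Haus": ["Haus", "Hauses", "Hause", "Häuser", "Häusern"],
    "Wand": ["Wand", "Wände", "Wänden"],
}


def inflection_alternatives(word_form):
    """Return the other forms of the lexeme that the recognized form belongs to."""
    for forms in PARADIGMS.values():
        if word_form in forms:
            return [form for form in forms if form != word_form]
    return []  # unknown form: no paradigm-based alternatives


print(inflection_alternatives("Haus"))
```

If "Haus" was recognized where the dative "Hause" was spoken, the correct form appears among the alternatives without any new acoustic search.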

In a further advantageous embodiment, supplying further hypotheses can also comprise segmenting words of a word sequence hypothesis into their constituents, or the reverse, so as to split a wrongly recognized compound into its components or to form, from its components, a word that was erroneously not recognized as a compound.
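Both directions of this correction can be illustrated with a short sketch. It assumes a hypothetical set of known constituents (`CONSTITUENTS`) and ignores linking elements such as the German Fugen-s; the function names are made up for illustration.

```python
def segmentations(word, constituents):
    """Return every way to write `word` as a sequence of known constituents."""
    word = word.lower()
    if not word:
        return [[]]
    results = []
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in constituents:
            for tail in segmentations(word[end:], constituents):
                results.append([head] + tail)
    return results


def join_constituents(parts):
    """Form a compound from its constituents (simple concatenation)."""
    return "".join(parts).capitalize()


# Hypothetical constituent inventory.
CONSTITUENTS = {"steuer", "erklärung", "haus", "wand"}

print(segmentations("Steuererklärung", CONSTITUENTS))
print(join_constituents(["haus", "wand"]))
```

The first call splits a compound that was wrongly recognized as one word; the second builds a compound from two constituents that were wrongly recognized as separate words.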

It is particularly advantageous if several ways of supplying a further hypothesis are provided and the user can choose between them, e.g. through a menu. Such a menu is advantageously structured hierarchically, so that on the first level the user can choose, for example, between phonetic similarity and inflection/derivation, while on a second level the menu item inflection/derivation is further subdivided, for example, into inflection, affixation, and segmentation, where segmentation stands for the splitting or forming of compounds.

A further way of supplying further hypotheses comprises subdividing constituent hypotheses into words and/or morphs and/or syllables, with new hypotheses being supplied for each of these subunits. This makes it possible, for example, to focus the correction on individual wrongly recognized syllables while correctly recognized syllables (or words or morphs) are kept unchanged. The further syllable, word, or morph hypotheses can in turn be supplied in various ways, for example on the basis of a phonetic-acoustic similarity calculus.

In a preferred embodiment, words or constituents requiring correction are determined by a rule-based evaluation of the n-grams of the constituent hypothesis sequence and/or of the word sequence hypothesis. Elements requiring correction can thus be determined automatically, without user intervention. In a further advantageous embodiment, such a rule-based evaluation can be triggered by a low recognition probability, so that for a hypothesis sequence that was probably recognized correctly, the rule-based evaluation of the n-grams is prevented from introducing errors.
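The following sketch illustrates one possible reading of this triggering logic for bigrams. Everything concrete in it is an assumption: the rule set is reduced to a set of forbidden bigrams, each hypothesis carries a confidence value between 0 and 1, and the threshold of 0.8 is arbitrary.

```python
def flag_for_correction(hypotheses, forbidden_bigrams, threshold=0.8):
    """Return indices of hypotheses that the bigram rules mark as suspect.

    `hypotheses` is a list of (word, confidence) pairs. A bigram is only
    checked when at least one of its members was recognized with a
    confidence below `threshold`; confident bigrams are left alone.
    """
    flagged = []
    for i in range(1, len(hypotheses)):
        prev_word, prev_conf = hypotheses[i - 1]
        word, conf = hypotheses[i]
        if min(prev_conf, conf) >= threshold:
            continue  # probably correct: do not apply the rules
        if (prev_word, word) in forbidden_bigrams:
            # Blame the member of the bigram with the lower confidence.
            flagged.append(i if conf <= prev_conf else i - 1)
    return flagged


# Hypothetical rule: "der" directly followed by "Haus" is treated as irregular.
RULES = {("der", "Haus")}

uncertain = [("in", 0.95), ("der", 0.6), ("Haus", 0.9)]
confident = [("in", 0.95), ("der", 0.9), ("Haus", 0.9)]

print(flag_for_correction(uncertain, RULES))
print(flag_for_correction(confident, RULES))
```

In the second call the same rule would match, but it is never applied because both members of the bigram were recognized with high confidence.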

Preferably, word parts are included in the recognizer vocabulary as constituents only if they have a certain minimum acoustic weight. This prevents the recognizer vocabulary from "fragmenting" into too many constituents, which would increase the number of possible permutations, raise the demands on recognition, and thus lower its hit rate.

Advantageously, a background vocabulary is provided in addition to the recognizer vocabulary. It stores a large number of words together with their transcriptions and can be consulted to find further hypotheses. Preferably this background vocabulary, like the recognizer vocabulary, is built on a constituent basis, in particular so that hypotheses for constituents can be supplied.

It is also advantageous if an unrecognized word, word form, or constituent that was found by applying one of the mechanisms for supplying further hypotheses can be adopted into the background vocabulary (or into the recognizer vocabulary). The available vocabulary thereby becomes dynamically extensible, and the limits on vocabulary size found in conventional ASE systems can be extended considerably. The newly found entry can then itself become a constituent of yet another new compound.
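A minimal sketch of such a dynamically extensible vocabulary is shown below. The class `BackgroundVocabulary`, its methods, and the transcriptions are hypothetical and only illustrate how a newly adopted constituent immediately becomes usable for further compounds.

```python
class BackgroundVocabulary:
    """Constituent-based vocabulary that can grow during correction."""

    def __init__(self, entries=None):
        # Maps each constituent to its phonetic transcription.
        self._entries = dict(entries or {})

    def __contains__(self, constituent):
        return constituent in self._entries

    def add(self, constituent, transcription):
        """Adopt a newly found constituent together with its transcription."""
        self._entries[constituent] = tuple(transcription)

    def can_compose(self, parts):
        """True if every part of a candidate compound is a known constituent."""
        return all(part in self for part in parts)


vocab = BackgroundVocabulary({"haus": ("h", "aU", "s")})
before = vocab.can_compose(["haus", "wand"])   # "wand" is still unknown
vocab.add("wand", ("v", "a", "n", "t"))         # adopted after a correction
after = vocab.can_compose(["haus", "wand"])    # now the compound is available
print(before, after)
```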

It is particularly advantageous if all of the various elements that can serve to correct a hypothesis are implemented and, in addition, a new hypothesis found through correction itself serves as the basis for a further correction. In this way the ultimately correct orthographic reproduction of the text can be determined or generated iteratively.

In a particularly preferred embodiment, the present invention consists of a method that automatically processes the orthographic output of a given automatic speech recognition (ASE) system for continuous dictation, where that system works with a language model built according to the invention, i.e. on a constituent basis (the language model being the combination of WfVok and word[-form] sequence statistics). Independently of the speech recognition engine used, and on the basis of linguistic word-context rules, the processing (1) corrects irregular word sequences, (2) recognizes arbitrary compounds that are not in the recognizer vocabulary but are contained in a background vocabulary of any size, and (3) enables the user to make a correction when needed. This correction is independent of whether audio data is available and of whether word boundaries were recognized correctly. It offers the user, in a dynamic and approximative manner, word candidates or word-part candidates that are selected automatically from a set of word forms or partial word forms of any size that is independent of the recognizer vocabulary used, namely the dynamically variable background vocabulary.

The invention is described in detail below by means of several embodiments with reference to the accompanying drawings, in which:

Fig. 1 shows the schematic structure of an ASE system or method according to the invention, which supplies a constituent hypothesis sequence, with an associated correction module;

Fig. 2 schematically shows the structure of the correction module or the sequence of the correction method.

An ASE system according to a first embodiment of the invention is based, on the side of the vocabulary to be recognized, on linguistic units, here called constituents, which correspond to word forms or to parts of word forms. These constituents are the primary elements of the speech signal; joined together they form, for example, the compounds.

It necessarily follows that the system must be able to recognize the word boundaries of spoken language without discrete dictation. Whereas in ASE systems for discrete dictation the user must "reproduce" the spaces between orthographic word forms by pausing in the flow of speech, there is no such need in systems for continuous dictation, including the one described here. With the pause-free manner of speaking that is possible here, the resulting speech signal contains no segmental counterpart of word boundaries that can be demonstrated signal-phonetically. The difference between a compound and the corresponding word group, e.g. Steuererklärung and Steuer-Erklärung ("tax return"), can be regarded as the result of purely orthographic convention and thus as irrelevant to the spoken language. The components of the compounds, the constituents, are the primary elements of the speech signal on which the word sequence statistics of the base ASE rely.

Two prerequisites are preferably met for this. First, the underlying language model must be prepared so that it is based on constituents instead of conventional word forms. Second, the constituents of the word forms to be recognized must be sensibly determined beforehand. A fully automatic decomposition of all compounds is not appropriate, since nonsensical decompositions (e.g. of Verbraucher, "consumer") must be reliably excluded, which cannot be achieved even with morphologically based grammars (see e.g. T. Pachunke et al., "Broad Coverage Automatic Morphological Segmentation of German Words", Proceedings of the Fifteenth International Conference on Computational Linguistics, Nantes, France, July 1992, Vol. IV, pp. 1218-122). If inadequate decompositions of word forms were admitted, the corpus and hence the word[-form] sequence statistics would be distorted.

Constituents should have word character, i.e. they must be represented phonetically by at least one syllable and contain at least one stem morph that carries a secondary or primary word accent. Depending on the recognizer, each constituent should have a certain "acoustic weight", i.e. a certain expectable quantum of energy in the speech signal. In addition, it should be possible to interpret the constituents as forming syntagmas in the respective context. For example, the decomposition Vor-Wand for Vorwand ("pretext") is correct both orthographically and phonetically, but it distorts the word[-form] sequence statistics for the sub-words vor ("before") and Wand ("wall"), since these do not underlie the word Vorwand. Considering the frequent case of determinative compounds built hypotactically from a head word and a modifier, of the type Hauswand ("house wall") or Hinterwand ("rear wall"), it becomes clear that the context of the head word -wand will be syntactically comparable to that of Wand, but fundamentally different from that of the word Vorwand.

The constituent boundary is preferably always both a morph boundary and a syllable boundary; the word segments found are thus the smallest unit on which both morphological and phonetic, syllable-oriented mechanisms can operate.
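The boundary condition can be illustrated with a minimal Python sketch. It assumes that morph and syllable boundaries are already available as character offsets; the function names and the hand-annotated offsets for Kinderwagen are hypothetical and not taken from the patent.

```python
def constituent_boundaries(morph_bounds, syllable_bounds):
    """Keep only offsets that are both a morph and a syllable boundary."""
    return sorted(set(morph_bounds) & set(syllable_bounds))


def split_at(word, boundaries):
    """Cut a word at the given character offsets."""
    pieces, start = [], 0
    for offset in boundaries:
        pieces.append(word[start:offset])
        start = offset
    pieces.append(word[start:])
    return pieces


# Hand-annotated example (assumption): Kind|er|wagen vs. Kin|der|wa|gen
morph_bounds = [4, 6]
syllable_bounds = [3, 6, 8]
segments = split_at("Kinderwagen",
                    constituent_boundaries(morph_bounds, syllable_bounds))
print(segments)  # ['Kinder', 'wagen']
```

Only offset 6 is shared by both analyses, so the word is cut once and both resulting segments can be handled by morphological and by syllable-oriented processing alike.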

After the 'decomposition-critical' compounds have been identified, i.e. those compounds for which a decomposition into their components appears sensible, the vocabulary is adapted accordingly, i.e. compounds of the form ab are replaced by the word sequence 'a-b' in the corpora used for the word [form] sequence statistics.
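A minimal sketch of this corpus rewriting step, in Python, might look as follows. The segmentation table, the function names and the use of bigram counts as the sequence statistics are assumptions made for illustration; the patent does not specify the data structures or the order of the statistics.

```python
from collections import Counter

# Hypothetical table of decomposition-critical compounds and their constituents.
# Vorwand is deliberately absent: splitting it would distort the statistics.
SEGMENTATION = {
    "Hauswand": ["Haus", "wand"],
    "Hinterwand": ["Hinter", "wand"],
}


def rewrite_corpus(tokens, segmentation):
    """Replace each listed compound 'ab' by its constituent sequence 'a', 'b'."""
    rewritten = []
    for token in tokens:
        rewritten.extend(segmentation.get(token, [token]))
    return rewritten


def bigram_counts(tokens):
    """Count adjacent token pairs as a simple stand-in for sequence statistics."""
    return Counter(zip(tokens, tokens[1:]))


corpus = ["die", "Hauswand", "ist", "ein", "Vorwand"]
constituent_corpus = rewrite_corpus(corpus, SEGMENTATION)
stats = bigram_counts(constituent_corpus)
```

After rewriting, the statistics are collected over constituents, so the pair (Haus, wand) is counted while Vorwand remains a single unit.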

A lexicon of word forms segmented into constituents according to the principles described above is used, covering the corpora required for vocabulary generation. Since almost any number of further compounds can be formed from the constituents used, a certain proportion of which are words in actual use, the user effectively obtains a larger vocabulary than with a word-form-oriented ASE system. This word-form lexicon, which is integrated into the ASE system, is designed to be extensible, both by new compounds formed from constituents already stored and by new constituents. The latter become available by being added to the constituent vocabulary. The user cannot tell these two different addition processes apart; via a corresponding front end, the user operates only the dynamic word-form lexicon, which is internally built on a constituent basis.

Based on this environment, the ASE system first delivers a first hypothesis for a text to be recognized. This hypothesis usually needs correction; the correction is then carried out by a complex of rule systems and program modules that processes the first hypothesis further.

The actual complex of rule systems and program modules that post-processes the constituent output of the ASE, referred to below as the "post-processing module", is connected downstream of the basic recognition, as shown in Fig. 1.

The individual components of the "post-processing module" are shown schematically in Fig. 2.

The input of the "post-processing module" is a sequence of n recognized word constituents, including the next-best hypotheses to which the ASE engine has assigned probability values. In an arbitrarily repeatable sequence of stages, this input (referred to below simply as "constituents") is modified and enriched with information from external sources. One potential, but not necessary, source is word correction by the user.

In a first (analysis) stage, the constituents are segmented into syllables. The result is a weighted analysis of the phonetic constituent structure. In the next step, a morphological analysis of the constituents is added, which on the one hand performs the actual composition (assembling the constituents into compounds) and on the other hand supplies, for later steps, information about further inflected, affixed or derived forms as well as the part of speech.

In a syntactic stage, word order rules are applied to the n-gram by partial parsing, drawing among other things on the part-of-speech information obtained previously. These rules can be triggered by "zones of low probability", i.e. they are processed only when the constituent sequence determined by the ASE engine is improbable, that is, when its probability lies below a certain threshold. In an integrative phonetic stage, hypotheses about the constituents are in turn formed by means of an engine-dependent, optimized similarity calculus.
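The triggering by "zones of low probability" could be sketched in Python as follows. The sketch assumes one probability value per recognized constituent and a fixed threshold; both the per-constituent scoring and the threshold value of 0.5 are illustrative assumptions, and the function name is hypothetical.

```python
def low_probability_zones(probabilities, threshold):
    """Return (start, end) index ranges, end exclusive, of consecutive
    constituents whose probability lies below the threshold."""
    zones, start = [], None
    for index, probability in enumerate(probabilities):
        if probability < threshold:
            if start is None:
                start = index          # a new zone begins here
        elif start is not None:
            zones.append((start, index))  # the current zone ends here
            start = None
    if start is not None:
        zones.append((start, len(probabilities)))
    return zones


# Hypothetical engine scores for a sequence of five constituents
scores = [0.9, 0.2, 0.3, 0.8, 0.1]
zones = low_probability_zones(scores, threshold=0.5)
print(zones)  # [(1, 3), (4, 5)]
```

Word order rules would then be applied only to the n-grams that overlap one of the returned zones, which leaves well-recognized stretches untouched.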

The correction module makes the results of the word, syllable and morph segmentation available to the user, who can carry out the following actions via suitable graphical interfaces:

- Combining or separating constituents in any way
- Changing capitalization (upper/lower case)
- Inflection / derivation / prefixation

Through the interplay of these operations with a suitable presentation of the hypotheses, the user can perform word morphing: the user selects a word-form or constituent hypothesis, which then takes the place of what was originally recognized, and, depending on this selection, receives newly generated, specific hypotheses from which the most suitable one is again selected. This process can be repeated until the desired word is reached. This provides a keyboard-free correction that can also be operated conveniently by voice, for example.
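The word morphing loop can be sketched in Python as an iteration of "generate hypotheses, let the user choose, replace". The hypothesis table, the simulated user and all names below are hypothetical; in the described system the hypotheses would come from the similarity calculus and the inflection and derivation paradigms, not from a fixed table.

```python
def word_morph(initial, hypotheses_for, choose, max_steps=10):
    """Repeatedly replace the current word by the hypothesis the user picks.

    hypotheses_for(word) returns candidate replacements for the word;
    choose(word, candidates) returns the selection, or None to stop.
    """
    current, path = initial, [initial]
    for _ in range(max_steps):
        selection = choose(current, hypotheses_for(current))
        if selection is None:
            break
        current = selection
        path.append(current)
    return path


# Toy hypothesis table (assumption, for illustration only)
TABLE = {"Haus": ["Maus", "Hauses"], "Maus": ["Mäuse", "Mauer"]}


def simulated_user(target):
    """Build a chooser that steers towards a target word."""
    def choose(current, candidates):
        if current == target or not candidates:
            return None
        return target if target in candidates else candidates[0]
    return choose


path = word_morph("Haus", lambda w: TABLE.get(w, []), simulated_user("Mäuse"))
print(path)  # ['Haus', 'Maus', 'Mäuse']
```

Each selection changes the set of hypotheses offered next, so the target word is reached by picking from lists rather than by typing.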

Further modifications of the described embodiment will readily occur to the person skilled in the art. For example, the individual correction mechanisms can be carried out in a different order. Furthermore, the described method can be used not only in an ASE system but also, for example, in a system for automatic spell checking and correction.

The described embodiment can be put into practice in various ways. For example, the modules described can be implemented by programs running on a computer and thus purely in software; likewise, a hybrid implementation partly in software and partly in hardware is readily conceivable for the person skilled in the art.

Application to a speech recognition system implicitly requires various hardware components, for example a microphone and corresponding components for converting speech into digital signals; providing and implementing these means, however, poses no problems for the person skilled in the art. It is further understood that, in addition to being implemented as a method, the invention can also be implemented by a corresponding device and by a data carrier containing program code that causes a computer to carry out the method according to the invention.

Claims

1. A method, running on a computer, for determining an orthographic reproduction of a text represented by a sequence of digital data, which comprises:
Providing a recognizer vocabulary which includes words and parts of words as word constituents to be recognized;
Determining a sequence of constituent hypotheses by comparing the digital data with the recognizer vocabulary to represent the text by means of the determined constituent hypothesis sequence;
Rule-based processing of the constituent hypothesis sequence for grouping the constituent hypothesis sequence into individual words in order to determine a word sequence hypothesis;
Determining a constituent of the word sequence hypothesis in need of correction; and
Providing at least one further constituent hypothesis for the constituent in need of correction.

2. The method according to claim 1, characterized in that determining the constituent hypothesis in need of correction comprises:
Determining a word of the word sequence hypothesis in need of correction;
Segmentation of the word in need of correction into its constituents;
Determination of one of the constituents resulting from the segmentation as the constituent in need of correction.

3. The method according to claim 1 or 2, characterized in that the at least one further constituent hypothesis is supplied, on the basis of a phonetic-acoustic similarity calculus, as a phonetically-acoustically similar constituent hypothesis.

4. The method according to claim 3, characterized in that the phonetic-acoustic similarity calculus is performed on the basis of phonetic-acoustic representations of the constituents without recourse to any audio data of a speech recognition device.

5. The method according to any one of the preceding claims, characterized in that it further comprises:
Application of inflection and/or derivation paradigms for the delivery of at least one further inflection and/or derivation hypothesis for the constituent hypothesis in need of correction.

6. The method according to any one of the preceding claims, further comprising:
Segmentation of at least the constituent hypothesis in need of correction into words and/or morphs and/or syllables;
Determination of a word or morph of the constituent hypothesis in need of correction;
Delivery of at least one further word and/or morph and/or syllable hypothesis for the word and/or the morph and/or the syllable in need of correction of the constituent hypothesis in need of correction.

7. The method according to any one of the preceding claims, characterized in that it further comprises:
Rule-based evaluation of the n-grams of the constituent hypothesis sequence and/or the word sequence hypothesis;
Determination, based on the evaluation, of words and/or constituents of the constituent hypothesis sequence and/or the word sequence hypothesis that are in need of correction;
Delivery of at least one further constituent hypothesis and/or word hypothesis for the determined constituent in need of correction and/or the determined word in need of correction.

8. The method according to any one of the preceding claims, characterized in that several further hypotheses are supplied for each hypothesis in need of correction.

9. The method according to any one of the preceding claims, characterized by an interface by means of which the user determines a constituent hypothesis and/or word hypothesis in need of correction.

10. The method of claim 9, wherein the interface further comprises:
a choice of the following corrective actions:
Assembling or separating constituents;
Changing capitalization (upper/lower case);
Delivery of further hypotheses through inflection and/or derivation and/or prefixation.

11. The method according to any one of the preceding claims, characterized in that parts of words are included as constituents in the recognizer vocabulary when they have a minimum acoustic weight in the form of an expected quantum of energy in the speech signal.

12. The method according to any one of the preceding claims, characterized by:
Provision of a background vocabulary which can be accessed by means of a phonetic-acoustic similarity calculus for determining further constituent hypotheses or for determining word hypotheses.

13. The method according to any one of the preceding claims, characterized in that the constituents of the recognizer vocabulary are components of compounds.

14. The method according to any one of the preceding claims, characterized in that the constituents of the recognizer vocabulary
are represented by at least one syllable that bears a word accent, and/or
contain at least one stem morph, and/or
can be interpreted as syntagma-forming in the respective context.

15. The method according to any one of the preceding claims, characterized in that it further comprises:
Use of word-form sequence statistics to determine correction hypotheses for words and/or constituents, wherein, in the word-form sequence statistics, the compounds formed from constituents are replaced by the respective constituents.

16. The method according to any one of the preceding claims, characterized in that it further comprises:
Inclusion of new constituents and/or new compounds in the recognizer vocabulary and/or a background vocabulary.

17. The method according to any one of the preceding claims, characterized in that the sequence of digital data comprises the audio data of a speech recognition device or the text data of a written text.

18. The method according to any one of the preceding claims, characterized by carrying out the method according to any one of the preceding claims on the basis of a constituent hypothesis sequence that results from replacing a constituent hypothesis of a constituent hypothesis sequence with a new hypothesis.

19. Device for determining an orthographic reproduction of a text represented by a sequence of digital data, which comprises:
a recognizer vocabulary device which comprises words and parts of words as word constituents to be recognized;
a device for determining a sequence of constituent hypotheses by comparing the digital data with the recognizer vocabulary for representing the text by the determined constituent hypothesis sequence;
means for rule-based processing of the constituent hypothesis sequence for grouping the constituent hypothesis sequence into individual words in order to determine a word sequence hypothesis;
means for determining a constituent of the word sequence hypothesis in need of correction; and
a device for supplying at least one further constituent hypothesis for the constituent in need of correction.

20. The apparatus of claim 19, further comprising:
a device for performing a method according to one of claims 2 to 18.

21. Computer program product which comprises:
a computer readable or interpretable code which causes the computer to carry out a method according to any one of claims 1 to 18.