DE10253868B3

DE10253868B3 - Test and reference pattern synchronization method e.g. for speech recognition system, has test pattern potential synchronization points associated with reference synchronization points

Info

Publication number: DE10253868B3
Application number: DE2002153868
Authority: DE
Inventors: Jos Wallers; Christian Saam
Original assignee: Digital Design GmbH
Current assignee: Digital Design GmbH
Priority date: 2002-11-15
Filing date: 2002-11-15
Publication date: 2004-07-29
Anticipated expiration: 2022-11-16

Abstract

In the synchronization method, synchronization is effected via predetermined or automatically determined synchronization points within the test and reference patterns. Selected vectors of the reference pattern are used as reference synchronization points, and corresponding vectors of a test pattern are identified as potential synchronization points and associated with one or more of the reference synchronization points. Synchronization is established automatically via the associated synchronization points when given synchronization criteria are fulfilled. Independent claims are also included for the following: (a) a processor device for synchronization of test and reference patterns, and a computer program product with a memory medium storing a synchronization program for test and reference patterns; (b) a computer-readable memory medium storing a program for synchronization of test and reference patterns; (c) a databank for storing speech pattern synchronization fragments.

Description

The invention relates to a method and an arrangement for synchronizing test and reference patterns, and to a corresponding computer program product and a corresponding computer-readable storage medium, which can be used in particular for finding speech-based keywords or notes.

Methods for comparing speech patterns are already used in word-spotting applications, in speech recognition systems and, for example, in keyword search methods such as the one described in patent application DE 100 54 583 A1.

DE 195 10 095 A1 discloses a method for segmenting voiced sounds in speech signals. For each word, prominent points in the time signal are defined as synchronization times and used during training to search for and define segment boundaries. DE 694 23 588 T2 discloses a speech recognition device that compensates for environmental differences between input speech and a reference pattern. Furthermore, WO 01/31835 A1 discloses a speech synchronization device which synchronizes a test signal and a reference signal.

In a typical conventional system for comparing speech patterns, a microphone converts the acoustic speech signals into analog electrical signals. The analog signals are sampled and digitized. The resulting values are divided into a sequence of equal-sized time segments of typically 10 to 20 ms. In their upper parts, Figures 1 and 2 each show the waveform 101, 201 of a speech pattern of the word "Jupiter", spoken by the same speaker. The upper scale 102, 202 gives the numbers of the time segments. The length of a time segment is 20 ms in this case. The lower half of each figure shows the associated spectrogram 103, 203. The lower scale 104, 204 gives the time.

The speech parameters are determined for each segment. Depending on the type of analysis, these parameters can be, for example, the spectral energy, the LPC coefficients, the cepstrum coefficients or the like. The speech parameters of each time segment i are combined into a feature vector a(i). The speech pattern is then available as a sequence A of I feature vectors a(1), a(2), ..., a(I). The reference patterns are likewise available as sequences B(k) of feature vectors b(1,k), b(2,k), ..., b(J_k,k). The index k denotes the k-th reference pattern, and B(k) is the associated sequence. J_k is the number of vectors of the k-th reference pattern.

Because even identical words can be spoken at different speeds, so that the sequences A and B(k) can differ in length, a time alignment between test pattern and reference pattern is required. This is usually done on the basis of the time segments. The goal is the optimal assignment of the time indices i of the test pattern to the time indices j of the various reference patterns k. This requires dynamic time warping (DTW). A widespread method is "dynamic programming". The start points (word beginnings: i = 1, j = 1) and the end points (word ends: i = I, j = J_k) are fixed. A local distance measure d(i,j,k) between two feature vectors a(i) and b(j,k) determines the distance between the vectors of the two segments i and j. The larger the distance value, the more dissimilar the two speech segments.
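The segmentation step described above can be illustrated with a minimal Python sketch. It assumes non-overlapping 20 ms segments and a toy two-element feature vector (log energy and zero-crossing count); the function names `frame_signal` and `feature_vector` are hypothetical and not taken from the patent, which leaves the choice of speech parameters open.

```python
import math

def frame_signal(samples, sample_rate, frame_ms=20):
    """Split samples into equal-length, non-overlapping time segments."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len  # an incomplete trailing segment is dropped
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]

def feature_vector(frame):
    """Toy feature vector (an assumption): [log energy, zero-crossing count]."""
    energy = sum(x * x for x in frame)
    crossings = sum(1 for x, y in zip(frame, frame[1:]) if (x >= 0) != (y >= 0))
    return [math.log(energy + 1e-12), float(crossings)]

# 100 ms of a 100 Hz sine sampled at 8 kHz gives 5 segments of 160 samples each.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 100 * t / sample_rate) for t in range(800)]
A = [feature_vector(f) for f in frame_signal(signal, sample_rate)]
```

The list `A` plays the role of the sequence a(1), ..., a(I); a real system would use LPC or cepstrum coefficients instead of these two toy features.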

A large number of distance functions have been proposed in the literature. A classic distance measure is the squared Euclidean distance:

$$d(i,j,k) = \sum_{n=1}^{N} \bigl(a_n(i) - b_n(j,k)\bigr)^2$$

Here a_n(i) is the n-th of the N elements of the feature vector a(i), and b_n(j,k) is the n-th element of the feature vector b(j,k).
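As a small illustration of this distance measure, the following Python sketch computes the squared Euclidean distance between two feature vectors and the full matrix of local distances for one reference pattern. The function names are hypothetical, and feature vectors are assumed to be plain lists of floats.

```python
def squared_euclidean(a, b):
    """Sum over n of (a_n - b_n)^2 for two feature vectors of equal length N."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same number of elements")
    return sum((x - y) ** 2 for x, y in zip(a, b))

def distance_matrix(A, B):
    """Local distances d(i, j) for every test segment i and reference segment j."""
    return [[squared_euclidean(a, b) for b in B] for a in A]
```

For example, `squared_euclidean([1.0, 2.0], [4.0, 6.0])` gives 9 + 16 = 25.0.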

In the simplest form of time alignment, all distance values d(i,j,k), i = 1 to I, j = 1 to J_k, are determined. Then, starting from the start point i = 1, j = 1, the possible paths are tried out. For each path l, the sum D(l,k) of the distance values d(i,j,k) of the traversed time segment pairs (i,j) is formed. The path with the smallest sum D_min(l,k) corresponds to the optimal time alignment between the test pattern and reference pattern k. In some applications D_min(l,k) is subsequently normalized, for example by dividing it by the number of segments traversed or by the number of segments of the reference word. Figure 3 illustrates the time alignment of the two speech patterns from Figures 1 and 2. One of the patterns contains 59 segments, the other 94. Small distance values are shown lighter, larger values darker. The small squares mark the optimal path 301. For better orientation, the diagrams 302, 303 below and to the right of the matrix show the course of the total energy of the individual segments.
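The minimum path sum can be obtained without enumerating every path. The sketch below is a textbook dynamic-programming formulation, not code from the patent. It assumes fixed start and end points and the three step types "advance i", "advance j" and "advance both"; the names `dtw` and `sq_dist` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dtw(A, B, dist):
    """Smallest accumulated distance over all monotone paths from (0, 0) to (I-1, J-1)."""
    I, J = len(A), len(B)
    INF = float("inf")
    D = [[INF] * J for _ in range(I)]
    D[0][0] = dist(A[0], B[0])
    for i in range(I):
        for j in range(J):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                D[i - 1][j] if i > 0 else INF,                 # step along the test axis
                D[i][j - 1] if j > 0 else INF,                 # step along the reference axis
                D[i - 1][j - 1] if i > 0 and j > 0 else INF,   # diagonal step
            )
            D[i][j] = dist(A[i], B[j]) + best_prev
    return D[I - 1][J - 1]
```

One of the normalizations mentioned in the text would then be `dtw(A, B, sq_dist) / len(B)`, i.e. division by the number of segments of the reference word.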

Depending on the application, the search is, for example, for the first reference pattern k in a set of reference patterns for which the sum of the optimal path D_min(k) is smaller than a threshold value. In other applications, the search is for the reference pattern k with the smallest D_min(k) of all reference patterns in a given set. In still other applications, the n best hits are required, i.e. the n reference patterns with the smallest sums D_min(k).
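The three selection strategies can be written down compactly. The following Python sketch assumes that the values D_min(k) have already been computed and stored in a list `scores` indexed by k; the function names are hypothetical.

```python
import heapq

def first_below(scores, threshold):
    """Index of the first reference pattern whose D_min is below the threshold, or None."""
    for k, score in enumerate(scores):
        if score < threshold:
            return k
    return None

def best_match(scores):
    """Index of the reference pattern with the smallest D_min."""
    return min(range(len(scores)), key=scores.__getitem__)

def n_best(scores, n):
    """Indices of the n reference patterns with the smallest D_min, best first."""
    return heapq.nsmallest(n, range(len(scores)), key=scores.__getitem__)

scores = [4.0, 1.5, 0.7, 2.0]
```

With these example scores, `first_below(scores, 2.0)` returns 1, `best_match(scores)` returns 2 and `n_best(scores, 2)` returns `[2, 1]`. The first strategy can stop early, whereas the other two need a score for every reference pattern.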

Calculating all distance values d(i,j,k) and trying out all possible paths requires a large computational effort. In real systems, the number of possible paths, and thus the search space, is therefore restricted. A first restriction is that at each step of a path at least one index, i or j, or both simultaneously, is increased. This rules out backward movement. A further possible restriction concerns the step size of the indices i or j at each individual step (for example, limiting it to 1); another restriction limits the number of successive steps parallel to an axis. In still other applications, the search is limited to the diagonal and a predetermined number of time segments on either side of the diagonal. Over time, a large number of further restrictions have been proposed and used. Nevertheless, the computational effort remains considerable, especially when there are many reference patterns. Figure 4 shows, by way of example, the same time alignment as Figure 3 using one of the common methods, but with a restricted search space.
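The restriction to a band around the diagonal can be sketched as follows. This is a generic banded variant of dynamic time warping under stated assumptions, not the specific method of Figure 4: the diagonal is scaled to the two sequence lengths, `band` is the number of segments allowed on either side, and the function also counts how many cells it evaluates. All names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def banded_dtw(A, B, dist, band):
    """DTW restricted to cells within `band` segments of the scaled diagonal.

    Returns (accumulated distance, number of cells evaluated). The result is
    infinite if the band is too narrow for any path to reach the end point.
    """
    I, J = len(A), len(B)
    INF = float("inf")
    D = [[INF] * J for _ in range(I)]
    cells = 0
    for i in range(I):
        centre = round(i * (J - 1) / (I - 1)) if I > 1 else 0
        for j in range(max(0, centre - band), min(J, centre + band + 1)):
            cells += 1
            if i == 0 and j == 0:
                prev = 0.0
            else:
                prev = min(
                    D[i - 1][j] if i > 0 else INF,
                    D[i][j - 1] if j > 0 else INF,
                    D[i - 1][j - 1] if i > 0 and j > 0 else INF,
                )
            D[i][j] = dist(A[i], B[j]) + prev
    return D[I - 1][J - 1], cells
```

For two identical three-segment patterns, `band=0` evaluates only the 3 diagonal cells and `band=1` evaluates 7 of the 9 cells. The saving grows with pattern length, at the risk of excluding the true optimal path when the band is too narrow.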

Another problem is determining the word boundaries. For individually spoken words, common methods determine the word boundaries from the energy content of several successive segments. Because some word beginnings 105, 205 and word endings 106, 206 (Figures 1 and 2) have a low energy content, these algorithms can determine the beginning and end of a word only inadequately when disturbing background noise is present (Figures 1 and 2 contain no significant background noise). The problems are even more pronounced when the beginning and end of a word are to be found in fluently spoken speech, where the individual words often merge into one another without a recognizable pause.

To reduce the computational effort, it is also advantageous if the number of reference patterns to be compared can be limited by a preselection.

The object of the invention is therefore to remedy the mentioned disadvantages of the known solution and, in particular, to provide a method and an arrangement for synchronizing test and reference patterns, as well as a corresponding computer program product and a corresponding computer-readable storage medium, which reduce the computational effort of the time alignment. A further aim of the invention is to increase the precision with which the word boundaries of individually spoken words are determined, and to provide a solution, improved over the prior art, for finding the word boundaries of keywords in fluent speech.

According to the invention, this object is achieved by the features in the characterizing part of claims 1, 35 and 41 to 43 in conjunction with the features in the preamble. Expedient embodiments of the invention are contained in the subclaims.

A particular advantage of the method for synchronizing test and reference patterns, in particular for synchronizing speech patterns, in which test and reference patterns are each available as a sequence A = (a(1), a(2), ..., a(I)) or B = (b(1), b(2), ..., b(J)) of feature vectors a(i) (i = 1, 2, ..., I) or b(j) (j = 1, 2, ..., J) assigned to time segments, is that the computational effort of the time alignment is reduced because the synchronization takes place via synchronization points (SP) that are predetermined and/or automatically determined in the test and reference patterns, wherein

- some of the feature vectors b(j) of the reference patterns are marked as reference SPs,
- the feature vectors a(i) of a test pattern are searched for predefinable features, and those feature vectors which have at least one of the predefined features are marked as potential SPs,
- at least some of the potential SPs determined are assigned to one or more reference SPs according to predefinable rules, and
- when predefinable criteria are met, the synchronization of test and reference pattern is established automatically via the SPs assigned to one another.

An arrangement for synchronizing test and reference patterns is advantageously set up such that it comprises at least one processor configured to carry out a method for synchronizing test and reference patterns, in particular for synchronizing speech patterns, in which test and reference patterns are each available as a sequence A = (a(1), a(2), ..., a(I)) or B = (b(1), b(2), ..., b(J)) of feature vectors a(i) (i = 1, 2, ..., I) or b(j) (j = 1, 2, ..., J) assigned to time segments, the synchronization taking place via synchronization points (SP) that are predetermined and/or automatically determined in the test and reference patterns, wherein

A computer program product for synchronizing test and reference patterns comprises a computer-readable storage medium on which a program is stored which, after being loaded into the memory of a computer, enables the computer to carry out a method for synchronizing test and reference patterns, in particular for synchronizing speech patterns, in which test and reference patterns are each available as a sequence A = (a(1), a(2), ..., a(I)) or B = (b(1), b(2), ..., b(J)) of feature vectors a(i) (i = 1, 2, ..., I) or b(j) (j = 1, 2, ..., J) assigned to time segments, the synchronization taking place via synchronization points (SP) that are predetermined and/or automatically determined in the test and reference patterns, wherein

To carry out a synchronization of test and reference patterns, a computer-readable storage medium is advantageously used on which a program is stored which, after being loaded into the memory of a computer, enables the computer to carry out a method for synchronizing test and reference patterns, in particular for synchronizing speech patterns, in which test and reference patterns are each available as a sequence A = (a(1), a(2), ..., a(I)) or B = (b(1), b(2), ..., b(J)) of feature vectors a(i) (i = 1, 2, ..., I) or b(j) (j = 1, 2, ..., J) assigned to time segments, the synchronization taking place via synchronization points (SP) that are predetermined and/or automatically determined in the test and reference patterns, wherein

A preferred embodiment of the invention provides that potential SPs are determined using the following features:

- the difference in energy between successive time segments and/or
- the difference in the energy of particular frequency bands between successive time segments and/or
- the change in the number of zero crossings of the speech signal in successive time segments and/or
- cepstrum, LPC and/or PARCOR coefficients and/or the derivatives of these coefficients.
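The first of these features, the energy difference between successive time segments, lends itself to a short illustration. The Python sketch below is an assumption-laden example rather than the patent's procedure: it takes the logarithm of the segment energies (the text mentions the logarithm as one possible combining function) and flags every segment whose log energy differs from its predecessor by at least `min_jump`. The function name and the threshold value are hypothetical.

```python
import math

def potential_sync_points(energies, min_jump):
    """Indices i whose log energy differs from segment i-1 by at least min_jump."""
    log_e = [math.log(e + 1e-12) for e in energies]
    return [
        i for i in range(1, len(log_e))
        if abs(log_e[i] - log_e[i - 1]) >= min_jump
    ]

# Quiet, quiet, loud, loud, quiet, quiet: one onset and one offset.
energies = [0.1, 0.1, 5.0, 5.2, 0.2, 0.1]
sync_candidates = potential_sync_points(energies, min_jump=2.0)
```

Here `sync_candidates` is `[2, 4]`, the onset and the offset of the loud stretch. Abrupt energy changes of this kind are typical of stop consonants, which the description names as suitable reference synchronization points.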

Advantageously, frequency bands are weighted differently when determining potential SPs.

Another preferred embodiment of the invention provides that, for at least some of the potential SPs determined, the predefinable rules for assigning potential SPs to reference SPs prescribe an analysis of additional feature vectors of the test pattern, preferably ones located in the temporal vicinity of the potential SPs, and that potential SPs are assigned to reference SPs depending on the results of this analysis.

A further embodiment uses one of the known smoothing functions when calculating the differences or derivatives listed above.

A further embodiment of the method according to the invention provides that the features analyzed to determine potential SPs are combined with one another and/or combined with mathematical functions, in particular the logarithm function.

It proves advantageous if the synchronization includes determining a degree of similarity (score) between reference and test pattern.

It also proves advantageous if feature vectors characterizing stop consonants and/or plosives serve as reference synchronization points.

Another preferred embodiment of the invention provides that, after synchronization has taken place, the determined reference pattern(s) and/or the test pattern(s) are output, made available for retrieval or passed to other applications; or that a dynamic time warping (DTW) of the test and reference pattern(s), or an analysis of the test pattern by a hidden Markov model (HMM), is carried out and the determined reference pattern(s) and/or the test pattern(s) are then output, made available for retrieval or passed to other applications. The output can be acoustic and/or visual.

It is also advantageous if the number of possible paths considered in the dynamic time warping, and thus the search space, is restricted by increasing at least one index, i or j, by a predefinable value at each step of a path, and/or by increasing both indices simultaneously by respective predefinable values at each step of a path, and/or by limiting the number of successive steps parallel to an axis, and/or by limiting the search for the optimal path to the diagonal and a predetermined number of time segments on either side of the diagonal, and/or by performing the dynamic time warping only for the intervals between the SPs. To reduce inaccuracies in determining the temporal position of synchronization points, it is advantageous if, in a dynamic time warping for intervals between SPs, the search space is extended by adding time segments from the temporal vicinity of the SPs.
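Performing the time alignment only for the intervals between the SPs can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the SP assignments are given as index pairs `(i, j)` that increase monotonically in both indices, the path is forced through every pair, and the extension of the search space around the SPs is omitted. The names `dtw`, `segmented_dtw` and `sq_dist` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dtw(A, B, dist):
    """Plain dynamic-programming DTW with fixed start and end points."""
    I, J = len(A), len(B)
    INF = float("inf")
    D = [[INF] * J for _ in range(I)]
    for i in range(I):
        for j in range(J):
            if i == 0 and j == 0:
                prev = 0.0
            else:
                prev = min(
                    D[i - 1][j] if i > 0 else INF,
                    D[i][j - 1] if j > 0 else INF,
                    D[i - 1][j - 1] if i > 0 and j > 0 else INF,
                )
            D[i][j] = dist(A[i], B[j]) + prev
    return D[I - 1][J - 1]

def segmented_dtw(A, B, dist, sync_pairs):
    """Sum of DTW costs over the intervals between consecutive synchronization pairs."""
    anchors = [(0, 0)] + sorted(sync_pairs) + [(len(A) - 1, len(B) - 1)]
    total = 0.0
    for (i0, j0), (i1, j1) in zip(anchors, anchors[1:]):
        total += dtw(A[i0:i1 + 1], B[j0:j1 + 1], dist)
        total -= dist(A[i1], B[j1])  # this end anchor is counted again as the next start
    return total + dist(A[-1], B[-1])  # restore the final end point once
```

Each interval needs only a small distance matrix, so the number of evaluated cells drops from I·J to the sum of the interval areas. Because the path is forced through the SPs, the result can never be smaller than the unrestricted DTW cost, and it matches that cost when the SPs lie on the optimal path.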

For carrying out the method according to the invention, it proves advantageous if dynamic time warping (DTW) is performed by means of "dynamic programming" (DP), or with the aid of the Viterbi algorithm or hidden Markov models, using the SPs to synchronize the DP or the HMMs.

It also proves advantageous if the following parameters are determined in the method, in particular for calculating distance functions:

- cepstrum coefficients and/or
- LPC (linear predictive coding) coefficients and/or
- PARCOR coefficients and/or
- LAR coefficients and/or
- LSP coefficients and/or
- LSF coefficients and/or
- spectral energy distribution and/or
- mel spectrum and/or
- zero crossing rate and/or
- mel or Bark transformations of the aforementioned coefficients and/or
- time derivatives of the aforementioned coefficients and/or of their mel or Bark transformations and/or
- combinations of these coefficients and/or parameters in smoothed and unsmoothed form.

For an automatic determination of the word boundaries of a speech pattern, it is advantageous if the path search in the dynamic time warping proceeds from the first SP of a test pattern toward the beginning of the word (backward) and/or from the last SP of a test pattern toward the end of the word (forward). A particularly favorable procedure for fixing the word boundaries is to terminate the path search at those time segments at which no assignments of time segments can be found for which the distance value d(i,j) and a predeterminable threshold D_S satisfy the condition d(i,j) < D_S, or when the distance value d(i,j) exceeds the threshold D_S in a predeterminable number of successive time segments, and to mark the time segments at which the search was terminated as the word boundary. The threshold D_S is advantageously specified depending on the application, preferably taking background noise into account. It is particularly simple here to determine the threshold D_S from the values of d() in speech pauses. The quality of the method can also be improved if, where several reference speech patterns representing the same word are available, the automatic determination of the word boundaries of a speech pattern is carried out with several of these reference speech patterns.
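The termination rule for the boundary search can be illustrated with a short sketch. It assumes, for illustration only, that the path search has already produced for every test segment i the smallest distance d(i,j) it could find, stored in `best_dist`; the function name and the parameter `max_run` (the predeterminable number of successive segments) are hypothetical.

```python
def find_word_boundary(best_dist, start, d_s, max_run, step=-1):
    """Walk from `start` in direction `step` (-1 backward, +1 forward).

    The walk stops once `max_run` consecutive segments fail the
    condition d(i, j) < d_s and returns the segment at which it stopped.
    If that never happens, the last segment visited is returned
    (the edge of the recording).
    """
    run = 0
    i = start
    last = start
    while 0 <= i < len(best_dist):
        last = i
        if best_dist[i] < d_s:
            run = 0              # a matching segment resets the counter
        else:
            run += 1
            if run >= max_run:
                return i         # search terminated here: word boundary
        i += step
    return last
```

Calling it with `step=-1` from the first SP gives a word-start candidate, and with `step=+1` from the last SP a word-end candidate.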

The method is further improved if reference SPs are assigned to at least one class of synchronization fragments (SF), an SF comprising feature vectors located in the temporal vicinity of a reference SP of a given reference pattern. SPs and/or synchronization fragments are advantageously classified such that SPs and/or synchronization fragments with rising energy are assigned to a first class, and SPs and/or synchronization fragments with falling energy to a second class of SPs.

It is also advantageous if reference patterns are assigned a list of data structures (templates) containing information on the reference SPs of the reference pattern, in particular information on assignments already made of SPs of test patterns to reference SPs of the reference pattern and/or on the degree of similarity (score) between reference and test pattern. Depending on the embodiment of the method, a list may well be empty initially.

The invention is explained in more detail below with reference to an exemplary embodiment. The accompanying drawings show:

Fig. 1 and Fig. 2: waveforms with associated spectrograms of two speech patterns of the word "Jupiter" spoken by the same speaker;

Fig. 3: illustration of the time alignment of the two speech patterns from Figs. 1 and 2;

Fig. 4: illustration of a time alignment with restricted search space;

Fig. 5: illustration of a time alignment with the search space restricted by synchronization points;

Fig. 6: illustration of a time alignment in which the search space limited by synchronization points is additionally restricted with conventional methods;

Fig. 7: examples of structures used to determine classes of synchronization fragments;

Fig. 8: example of pseudo-code that can be used for the preselection of reference words;

Fig. 9: example of an extension of the template structure that permits an assessment of the similarity between reference word and test pattern;

Fig. 10: illustration of a dynamic time warping (DTW) with overlapping time segments;

Fig. 11: illustration of the method steps for determining the class membership of synchronization points;

Fig. 12: exemplary structure of a device for pattern comparison;

Fig. 13: illustration of an exemplary processing of the signals in a digital signal processor (DSP).

Possible embodiments of the method according to the invention are realized by the procedures described below, using a device that is likewise described below.

To establish synchronization between test and reference patterns, the speech patterns are searched for points in time at which the time segments of the test pattern and of the reference pattern to be compared can be synchronized. These are called synchronization points (SP) in the following. Suitable SPs are large changes in speech energy that occur within a short time interval, or abrupt spectral changes, e.g. the energy changes at stop consonants (plosives). In German these include the consonants b, p, d, t, g and k.

Each test pattern is searched for possible SPs. For this purpose, for example, the absolute value of the difference in energy content between two successive time segments is determined. If this value exceeds a certain threshold, for example 1/3 of the mean energy content of, say, twenty surrounding time segments, the segment is considered a possible synchronization point. The energy content of selected frequency bands, or other methods able to extract abrupt energetic or spectral changes, are equally suitable. A test or reference pattern may contain several SPs. The SPs of the reference patterns are determined in advance and stored together with their feature vectors. The comparison by means of "dynamic programming" can then be restricted to the intervals between the SPs.

Instead of "dynamic programming", hidden Markov models can equally be used for time alignment in all variants of the exemplary method according to the invention described below.

Possible synchronization points are marked in Figs. 1 and 2: 110, 111 and 112 in Fig. 1, and 210, 211, 212 and 213 in Fig. 2. Like Fig. 3, Fig. 5 shows the DTW for the speech patterns from Figs. 1 and 2, but with the search space restricted by synchronization points. In addition to the search space limits imposed by SPs, the search space can be restricted with the conventional methods. Fig. 6 shows such an additional limitation (as in Fig. 4) with the same speech patterns.
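Restricting the DTW to the intervals between matched synchronization points can be sketched as below. The sketch assumes that the SP assignments are already known as index pairs `(test_index, ref_index)` lying strictly inside both patterns; the function names and the simple step set are illustrative assumptions, not taken from the patent.

```python
import math

def dtw(a, b, dist):
    """Plain accumulated DTW distance with steps (1,0), (0,1) and (1,1)."""
    inf = math.inf
    prev = None
    for i, x in enumerate(a):
        cur = []
        for j, y in enumerate(b):
            if i == 0 and j == 0:
                best = 0.0
            else:
                best = min(
                    prev[j] if prev is not None else inf,
                    cur[j - 1] if j > 0 else inf,
                    prev[j - 1] if prev is not None and j > 0 else inf,
                )
            cur.append(dist(x, y) + best)
        prev = cur
    return prev[-1]

def dtw_between_sps(test, ref, sp_pairs, dist):
    """Sum of the DTW distances over the intervals delimited by matched SPs.

    sp_pairs: (test_index, ref_index) pairs, strictly increasing in both
    components. Each interval is warped on its own, so only the blocks
    of the distance matrix between consecutive SPs are ever visited.
    """
    anchors = [(0, 0)] + list(sp_pairs) + [(len(test) - 1, len(ref) - 1)]
    total = 0.0
    for k, ((i0, j0), (i1, j1)) in enumerate(zip(anchors, anchors[1:])):
        total += dtw(test[i0:i1 + 1], ref[j0:j1 + 1], dist)
        if k > 0:
            # The shared SP cell was already counted by the previous interval.
            total -= dist(test[i0], ref[j0])
    return total
```

With correctly assigned SPs the result equals the cost of a path through the full matrix that passes through the SP cells, while a wrong assignment forces a poor alignment and raises the accumulated distance.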

The path search between the beginning of the word and the first synchronization point can be carried out backward. It starts at the first synchronization point and looks for a path back to the beginning of the word, subject to the constraints on step size etc. known from the forward search. Since the beginning of the word is fixed in the reference patterns, the search can be terminated once the most favorable path from the first synchronization point back to the word start b(1,k) of the reference pattern has been found. This solves the problem of determining the beginning of the word in the test pattern. Likewise, the (forward) search for the optimal path in the last part of the test pattern (from the last synchronization point to the end of the word) is terminated once the optimal path to the last vector b(J_k,k) of the respective reference pattern has been found. This solves the problem of determining the end of the word.

In the next step, the speech parameters in the immediate vicinity of the synchronization points are used to preselect the reference patterns to be compared. For this purpose, classes of synchronization fragments (SF) around the SPs are used. In a preferred application, an SF is a short sequence of feature vectors with a predetermined number of vectors; see Fig. 11. At synchronization points with rising energy 902, 903, the SF begins with the vector of the SP. At SPs with falling energy 901, the feature vectors immediately preceding the SP, together with the vector at the SP, form the SF 911. The SFs 911, 912, 913 are compared with the SF representatives 930 of predefined SF classes 921, 922, 923.
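How a fragment might be cut out around an SP and compared with class representatives is sketched below. The function names are hypothetical, and the squared Euclidean distance with a nearest-representative decision is an assumption made for the example; the text does not fix the comparison measure.

```python
def extract_sf(vectors, sp_index, rising, length):
    """Return the synchronization fragment belonging to one SP.

    Rising energy: the fragment starts with the vector at the SP.
    Falling energy: the vectors just before the SP plus the SP vector.
    Fragments are shortened at the edges of the pattern.
    """
    if rising:
        return vectors[sp_index:sp_index + length]
    end = sp_index + 1
    return vectors[max(0, end - length):end]

def classify_sf(sf, representatives):
    """Return the class id whose representative fragment is closest to sf."""
    def distance(a, b):
        return sum((x - y) ** 2
                   for va, vb in zip(a, b)
                   for x, y in zip(va, vb))
    return min(representatives, key=lambda cid: distance(sf, representatives[cid]))
```

The class ids used as dictionary keys can be any hashable labels; the reference numerals 921 and 922 would serve equally well.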

During a search, once the SPs 901, 902 and 903 have been determined, the associated SFs 911, 912 and 913 are assembled and their class membership is determined. Each SF class has a set of SF objects 931. Each SF object belongs to exactly one SF class and contains a pointer to an RP object 941, 942 (RP = reference pattern). One RP object exists for each reference word. The pointers of the SF objects 931 of an SF class 921, 922, 923 point to the RP objects 941, 942 of those reference words that contain synchronization fragments of this class. After the class of a fragment has been determined, a structure, referred to below as a template 951, is created for each reference word referenced in said list, unless a suitable one already exists. The newly created template 951 is added to the list of active templates. Each reference word can have such a list. The templates 951 record for which SPs of the reference pattern matching SFs have been found in the test pattern currently being examined. When an SF matching the last SP of a reference pattern has been found in the test pattern, or when a certain time has elapsed since an earlier SF matched, the template 951 is removed from the list of active templates and passed on for further processing or output.

For example, for templates 951 in which all SPs are marked as found, a time alignment by means of "dynamic programming" (DP) is then carried out. Depending on the application, the DP between test pattern and reference word can also be carried out for templates 951 in which not all SPs but only a predeterminable number of them matched. Whether a DP is carried out can likewise depend on the ratio between the number of matching SPs found and the number of SPs of the reference word. Another extension of the method determines, at each comparison of reference and test-pattern SFs, the "similarity" (score) of the synchronization fragments assigned to one another. A structure that takes the "similarity" (d[NmOfSP]) into account is shown by way of example in Fig. 9. These "similarities" can then be used to establish a ranking among the possible reference words. The ranking in turn determines which reference patterns are compared first by means of DP. These procedures ensure that the DTW between reference word and test pattern need not be carried out for all reference words, i.e. the reference words are preselected.
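The bookkeeping with SF classes, reference words and templates can be sketched roughly as follows. This is a simplified illustration under stated assumptions: each reference word holds at most one active template (the text allows a list per word), the time-out rule is omitted, and all names (`Template`, `feed_sf`, `class_index`, `num_sp`) as well as the word "pluto" are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Template:
    word: str
    num_sp: int                                  # SPs in the reference pattern
    found: dict = field(default_factory=dict)    # ref SP index -> test segment

    def complete(self):
        return len(self.found) == self.num_sp

def feed_sf(active, class_index, num_sp, sf_class, segment):
    """Record one classified test SF and return the templates that are
    ready for further processing.

    class_index maps an SF class to (word, ref_sp_index) pairs, i.e. to
    the reference words containing a fragment of that class.
    """
    finished = []
    for word, ref_sp in class_index.get(sf_class, []):
        tpl = active.get(word)
        if tpl is None:                          # no suitable template yet
            tpl = active[word] = Template(word, num_sp[word])
        tpl.found[ref_sp] = segment
        if ref_sp == tpl.num_sp - 1:             # matched the last reference SP
            finished.append(active.pop(word))
    return finished
```

A finished template whose `complete()` is true would then be handed to the DP comparison; templates with only some SPs found could be passed on or dropped depending on the application.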

The DTW search can begin, for example, with the reference words that have the highest number of hits, the best ratio of hits to the number of synchronization points of the reference pattern, or the best mean similarity between the synchronization fragments (SF) of the reference pattern and those of the test pattern. The mean similarity can be defined, for example, as the sum of the similarities of the individual SFs to their respective class, divided by the number of synchronization points. If a reference word has several synchronization points, the DTW comparison between reference pattern and test pattern is started once a synchronization point of the test pattern has been assigned to the last synchronization point of the reference pattern. In addition, the comparison can be started when more time segments have elapsed since the last assignment of a synchronization point between test pattern and reference pattern than the DTW can compensate at maximum stretching or compression of the patterns.
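The three ranking criteria can be written down compactly. The sketch below assumes that a larger score means greater similarity (if the scores were distances, the sort order would have to be reversed) and that each candidate carries one score per matched SP; the dictionary layout and function name are illustrative.

```python
def rank_reference_words(candidates, criterion="hits"):
    """Order candidate reference words for the subsequent DTW search.

    Each candidate is a dict with the keys "word", "num_sp" (SPs in the
    reference pattern) and "scores" (one similarity per matched SP).
    """
    keys = {
        "hits": lambda c: len(c["scores"]),
        "ratio": lambda c: len(c["scores"]) / c["num_sp"],
        # Sum of the SF similarities divided by the number of SPs.
        "mean_similarity": lambda c: sum(c["scores"]) / c["num_sp"],
    }
    ordered = sorted(candidates, key=keys[criterion], reverse=True)
    return [c["word"] for c in ordered]
```

Dividing by the number of reference SPs rather than by the number of hits penalizes words for which only a few of their SPs were found.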

To limit the computing load caused by the computationally intensive DTW calculations, it is possible, for example, to perform only the DTW comparisons with the greatest similarity of the SFs, or to carry out the DTW calculations in order of the mean similarity of the SFs and stop as soon as the first reference pattern k is found for which D_min(k) falls below a predetermined threshold.
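The early termination amounts to a short loop over the ranked candidates. In this sketch `dtw_distance` is a hypothetical callable that returns D_min(k) for a reference word; how it is computed is left open.

```python
def first_match(ranked_words, dtw_distance, threshold):
    """Run the DTW comparisons in ranked order and stop at the first
    reference word whose D_min falls below `threshold`.
    Returns that word, or None if no candidate qualifies."""
    for word in ranked_words:
        if dtw_distance(word) < threshold:
            return word
    return None
```

Because the loop returns immediately, the expensive comparison is never run for candidates ranked below the first acceptable match.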

In other applications, the synchronization points can be used for synchronization in hidden Markov models, analogously to their use for synchronizing two patterns in "dynamic programming".

The synchronization by means of SPs described above actually constitutes the first step of pattern recognition. In some applications, however, this first synchronization step is already sufficient to determine the reference word matching the test pattern. A comparison by means of DP, or the use of a hidden Markov model, is then no longer necessary. This applies in particular to applications with a small number of reference words.

Creating templates 951 in the manner described, searching backward for the beginning of the word in the first part of the word, and terminating the search once the most favorable path to the word start or word end of the reference pattern has been found together mean that words are found in fluent speech even when no clear pauses between individual words are discernible.

The exemplary embodiment explained above presented the special case in which an SF can be assigned to exactly one SF class and exactly one RP object exists for each reference word, along with one possible form of the data structures. For other applications, however, variations of the method may be advantageous: for example, an SF class can have several representatives, each SF object can have its own representative, or each SF object can be assigned to several classes, each with a specific probability (fuzzy clustering).

Detailed description of an exemplary arrangement according to the invention and of the synchronization steps of an exemplary embodiment:

Fig. 12 shows an example of the structure of a device according to the invention. The acoustic speech signals are picked up by a microphone 401 and converted into electrical signals. An amplifier 402 amplifies the signal. A bandpass filter 403 suppresses any DC component present, as well as frequencies below 300 Hz and above 3 kHz. This corresponds to the bandwidth limits of many telephone networks. Other limits are equally possible. The signal is then sampled in an analog-to-digital converter 404. The sampling rate is 8,000 samples/s at a resolution of 13 bits. This sampling rate is also used in many digital telephone systems. Other sampling rates and resolutions are possible. The signals are then processed further by a digital signal processor (DSP) 405. The results are evaluated in a processor 406 with RAM 407 and read-only memory (ROM) 408. Depending on the application, the results are output 409 by the processor 406 or by the DSP 405. In still other applications, input and/or output takes place remotely in a separate device.

Fig. 13 shows the processing of the signals in the DSP 405 in a preferred embodiment. First, a framer 421 divides the samples into time segments (frames) of 20 ms (corresponding to 160 samples). Other segment lengths are possible, as are embodiments with overlapping time segments. The signals are then passed through a high-pass filter 422, which in particular filters out any DC components. Next, the individual values of a segment are multiplied by a window function. This reduces the effect that dividing the signal into time segments has on the calculation of the speech parameters. In the preferred application, the Hamming window function 423 is used for this. Other functions such as Hanning, Kaiser, etc. are possible, as is omitting the window function altogether. The filter 422 and the window function 423 can also be computed in a single step.
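The framing and windowing steps can be sketched as follows, assuming non-overlapping frames of 160 samples (20 ms at 8,000 samples/s) and the usual Hamming coefficients 0.54 and 0.46. The high-pass filter 422 is left out because the text does not specify it; the function names are illustrative.

```python
import math

def split_frames(samples, frame_len=160):
    """Split the samples into non-overlapping frames of frame_len values;
    an incomplete frame at the end is dropped."""
    return [samples[k:k + frame_len]
            for k in range(0, len(samples) - frame_len + 1, frame_len)]

def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def apply_window(frame):
    """Multiply every value of a frame by the matching window weight."""
    return [x * w for x, w in zip(frame, hamming(len(frame)))]
```

The window tapers each frame toward its edges, which is what reduces the effect of cutting the signal into segments.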

The speech parameters of the speech segments are then determined. The preferred implementation uses cepstrum coefficients. These are formed by first computing a DFT (discrete Fourier transform) 424 of the time segment. The logarithm of the DFT results 425 is then computed. From these values, the cepstrum coefficients are obtained by an inverse discrete Fourier transform 426. Other methods of calculating the cepstrum coefficients can equally be used, for example calculation from the autocorrelation coefficients and/or the LPC coefficients.
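The chain DFT, logarithm, inverse DFT can be sketched in a few lines. The sketch assumes that the logarithm is taken of the magnitude of the DFT values (the real cepstrum), which the text does not state explicitly, and it uses a direct O(n²) DFT for clarity where a real system would use an FFT. The small constant `eps` is an added safeguard against log(0).

```python
import cmath
import math

def dft(x):
    """Direct discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def real_cepstrum(frame, eps=1e-12):
    """Cepstrum coefficients: inverse DFT of the log magnitude spectrum."""
    n = len(frame)
    log_mag = [math.log(abs(s) + eps) for s in dft(frame)]
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]
```

For an impulse of height e the magnitude spectrum is flat with logarithm 1, so only the zeroth cepstrum coefficient is non-zero.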

Instead of the cepstrum coefficients, other speech parameters can also be used, such as the LPC coefficients (LPC = Linear Predictive Coding), PARCOR coefficients, energy spectrum or MEL-scaled spectrum, MEL cepstrum coefficients, number of zero crossings, and comparable parameters.

In the preferred implementation, the first ten cepstrum coefficients, their first derivatives (differences c_j(i) - c_j(i - 1) of the j-th cepstrum coefficient of the i-th and (i - 1)-th time segments), the total energy and its first derivative (calculated in the unit 428 for forming the first derivative) are used as elements of the feature vector of a time segment. The use of other parameter sets and their derivatives is likewise possible according to the invention. Other methods can also be used to calculate the derivatives, e.g. the known methods with smoothing functions.
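A short Python sketch of how such a feature vector could be assembled from per-frame cepstrum coefficients and energies. The ordering of the elements and the zero derivative for the very first frame (which has no predecessor) are assumptions; the function name is hypothetical.

```python
def build_feature_vectors(cepstra, energies):
    """Build one feature vector per frame.

    cepstra:  list of per-frame coefficient lists (ten coefficients each in the
              preferred implementation)
    energies: list of per-frame total energies
    Layout (assumed): coefficients, their differences, energy, energy difference.
    """
    vectors = []
    for i, coeffs in enumerate(cepstra):
        if i == 0:
            # Assumption: no previous frame, so the first differences are set to zero.
            d_coeffs = [0.0] * len(coeffs)
            d_energy = 0.0
        else:
            d_coeffs = [c - p for c, p in zip(coeffs, cepstra[i - 1])]
            d_energy = energies[i] - energies[i - 1]
        vectors.append(list(coeffs) + d_coeffs + [energies[i], d_energy])
    return vectors
```

With ten coefficients per frame, each vector has 10 + 10 + 1 + 1 = 22 elements.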

The search for synchronization points in the SP detector 429 is carried out on the basis of the time segments. In a simple version, the difference Δ_i between the energies of segments i and i - 1 is formed. The mean energy E_i of the surrounding twenty segments is determined by adding the energies of segments i - 10 to i + 9 and dividing by the number of segments, in this case 20. For the first segments of a recording, the mean energy is formed from correspondingly fewer elements: for the second segment, the energy values of segments 1 to 11 are summed and divided by 11; for the third segment, the energies of segments 1 to 12 are summed and divided by 12; and so on. With this procedure, the first segment of a recording cannot have a synchronization point. A synchronization point is present when the absolute value of the ratio Δ_i/E_i exceeds a certain threshold. The preferred implementation uses a threshold of 1/3, i.e. a synchronization point is present when |Δ_i/E_i| > 1/3. Other thresholds also give usable results. Likewise, instead of the difference in total energy of two adjacent segments, the difference in the energy of these segments in certain frequency bands (e.g. 300 to 1,000 Hz) can be used. Individual frequency bands can be weighted differently. Furthermore, the change in the number of zero crossings of the speech signal between two adjacent time segments, as well as the cepstrum coefficients, the LPC coefficients, the PARCOR coefficients and/or their derivatives, or the sum or weighted sum (or other functions, such as the sum of the log values of the listed coefficients) of combinations of these parameters, can be used to determine the SP.

In another embodiment of this invention, the synchronization points are not determined on the basis of a fixed segment division. Instead, the mean energy is determined continuously from the samples (moving average), over both a short and a longer time window. If the ratio of the absolute value of the change in the short-window mean to the long-window mean exceeds a predetermined threshold, a synchronization point is present.
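The simple energy-based detector can be sketched as follows. Indices are 0-based here, so the first segment of the text is index 0. Clipping the averaging window at the end of the recording and skipping segments whose mean energy is zero are assumptions; the text only describes the shortened window at the start.

```python
def mean_energy(energies, i):
    """Mean energy over segments i-10 .. i+9, clipped to the recording."""
    lo = max(0, i - 10)
    hi = min(len(energies), i + 10)  # exclusive upper bound, i.e. up to i+9
    window = energies[lo:hi]
    return sum(window) / len(window)


def find_sync_points(energies, threshold=1 / 3):
    """Return the indices i with |delta_i / mean_i| > threshold.

    Index 0 can never be a synchronization point because it has no predecessor.
    """
    points = []
    for i in range(1, len(energies)):
        delta = energies[i] - energies[i - 1]
        mean = mean_energy(energies, i)
        if mean > 0 and abs(delta / mean) > threshold:  # mean > 0 guard is an assumption
            points.append(i)
    return points
```

For a recording of fifteen quiet segments followed by fifteen loud ones, only the segment at the jump is reported, since the energy difference is zero everywhere else.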

Subsequently, unit 430 for determining synchronization fragments assembles matching synchronization fragments and, where applicable, determines their class, and comparisons between the test pattern and one or more reference patterns 432 are calculated by means of DTW 431. These steps can be carried out in the DSP unit 405 or in the processor unit 406 of the device. They are described in more detail in the following sections.

The embodiments described above are examples of specific arrangements with which the method according to the invention can be carried out. They are by no means an exhaustive presentation. Rather, in certain cases arrangements that dispense with individual components, such as a bandpass filter 403 or an amplifier 402, may prove practical. Likewise, the processor 406 can take over the tasks of the DSP 405, or vice versa. New developments in the field of microelectronics may also make further embodiments of the arrangement according to the invention expedient, which likewise implement the method according to the invention.

Calculation of the DTW with synchronization points

In the simplest version of the method according to the invention, once the speech parameters of the time segments have been calculated and possible synchronization points determined, the test pattern is compared with the reference patterns by means of "dynamic programming". The synchronization points of the reference patterns (reference SP) are already available. They are generally stored together with the feature vectors of the reference patterns.

First, the number of synchronization points of the test pattern is compared with the number of reference synchronization points of the respective reference pattern. If they are equal, the SP are assigned to one another in the order of their occurrence in the two patterns, i.e. the first reference SP of the reference pattern is assigned to the first SP of the test pattern, the second reference SP to the second SP of the test pattern, and so on. To determine the similarity of the two patterns, "dynamic programming" is then used to find the path with the smallest sum of distances between the individual feature vectors. The DTW is carried out as individual DTWs between the synchronization points, or between the start of the word and the first SP and between the last SP and the end of the word. The sum of the smallest sums of the individual sections gives the similarity of the test word and the reference word.
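The section-wise comparison can be sketched in Python as follows. This is an illustration under several assumptions, not the patented implementation: a Euclidean distance stands in for the distance function, the basic three-predecessor DTW recursion is used without step-size constraints, the frame at a synchronization point is counted as the first frame of the following section, and the fallback to plain DTW when the SP counts differ (or no SP exist) follows the simplest version described in the text.

```python
def dist(a, b):
    """Euclidean distance between two feature vectors (assumed distance measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def dtw(test, ref):
    """Smallest accumulated distance over all warping paths (basic recursion)."""
    inf = float("inf")
    n, m = len(test), len(ref)
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
            acc[i][j] = dist(test[i - 1], ref[j - 1]) + best
    return acc[n][m]


def dtw_with_sync_points(test, ref, test_sp, ref_sp):
    """Sum of per-section DTW costs, with sections delimited by paired SP.

    test_sp, ref_sp: ascending frame indices of the synchronization points.
    """
    if not test_sp or len(test_sp) != len(ref_sp):
        return dtw(test, ref)  # conventional DTW without synchronization
    total = 0.0
    t_prev = r_prev = 0
    for t, r in zip(list(test_sp) + [len(test)], list(ref_sp) + [len(ref)]):
        total += dtw(test[t_prev:t], ref[r_prev:r])
        t_prev, r_prev = t, r
    return total
```

Restricting each DTW to one section shrinks the search space and forces the warping path through the paired synchronization points.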

In principle, the search between two synchronization points can run in the direction of the time axes or backwards. A backward search is particularly advantageous for the first time section, from the beginning of the word to the first synchronization point, because the beginning of the word can often be determined only imprecisely (cf. the explanations above). The backward search proceeds exactly like classic time warping, only with reversed signs in the individual steps. It starts at a synchronization point. At each step, at least one index (i or j) is then decremented. To restrict the search space, the same constraints regarding step size etc. can be used as in the forward search.

In the simplest version, if the numbers of synchronization points in the test pattern and in the reference pattern do not match, "dynamic time warping" is carried out in the conventional manner, without synchronization by synchronization points. The same applies if no synchronization points are found.

Distinction between two types of SP

A possible extension of the method is to distinguish between two types of synchronization points: points with rising speech energy and points with falling speech energy (typical of plosives or stop consonants). Before two patterns are compared by a DP method, the synchronization points of the two patterns are assigned to one another as described above. It is then checked whether all pairs of synchronization points are of the same type, i.e. whether both points of a pair represent rising or both represent falling energy values. If at least one pair differs, the DP comparison between the two patterns is either not performed (i.e. the two patterns are assumed to be unequal), or the DP is carried out without using the non-matching synchronization points for synchronization. Fig. 5 shows an example: the (third) SP 212 from Fig. 2 is not used because no matching SP was found in Fig. 1. With this procedure, the assignment between the SP of the reference words and those of the test pattern cannot be determined unambiguously. Either the possible combinations can be tried out, or the time intervals between the reference SP of the reference word and the time intervals between the SP of the test pattern can be used for the assignment.
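The type check on already paired synchronization points can be sketched as follows. The labels "rising" and "falling", the tuple layout, and the `strict` switch between the two behaviours described in the text are illustrative assumptions; the sketch expects both lists to have the same length and to be paired in order of occurrence.

```python
def usable_sync_pairs(test_sp, ref_sp, strict=False):
    """Filter paired synchronization points by type.

    test_sp, ref_sp: equally long lists of (frame_index, kind) tuples, where
    kind is "rising" or "falling" (assumed labels).
    Returns a list of (test_index, ref_index) pairs of equal kind. With
    strict=True, None is returned as soon as one pair differs, meaning the
    two patterns are treated as unequal and no DP comparison is run.
    """
    pairs = list(zip(test_sp, ref_sp))
    matching = [(t[0], r[0]) for t, r in pairs if t[1] == r[1]]
    if strict and len(matching) != len(pairs):
        return None
    return matching
```

In the non-strict case the remaining pairs can be passed on to a section-wise DP comparison, while the dropped points are simply not used for synchronization.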

Synchronization fragments

A further modification of the method allows a better assignment between the SP of the reference and test patterns: to subdivide the synchronization points into several classes, each synchronization point is assigned a synchronization fragment. Synchronization fragments consist of a sequence of vectors containing the speech features of successive time segments. In the preferred implementation the sequence consists of four vectors; other lengths can also give good results. For synchronization points characterized by a drop in speech energy, the preferred implementation forms the sequence from the feature vectors of the three time segments preceding the synchronization point together with the feature vector of the synchronization point itself. For synchronization points characterized by a rise in speech energy, the sequence is formed from the vector of the time segment of the synchronization point together with the feature vectors of the three following time segments. Other methods of selecting the vectors are likewise possible. For example, regardless of whether the synchronization point lies at a moment of rising or falling speech energy, the sequence can consist of the feature vectors of the two time segments before, the one at, and the two after the synchronization point.

Once the synchronization fragments of a test pattern have been determined, their class membership is determined. To this end, each synchronization fragment of the test pattern is compared with a representative of each class. The comparison is made by calculating the sum of the distances between the feature vectors. The preferred implementation uses the same function d(i,j,k) as the distance measure as in time warping by means of "dynamic programming". According to the invention it is equally possible to use other distance functions, e.g. functions that are particularly informative in the region of phonetic transitions. Likewise, different distance functions can be used for synchronization fragments with rising energy content than for those with falling energy content. A synchronization fragment is assigned to the class C_i for which the distance to the synchronization fragment of that class's representative is smallest.

Instead of the sum of the distances of the feature vectors with a fixed assignment of the vectors of a reference word's fragments to the vectors of the test pattern's fragments, a DTW comparison between the beginning and end of the two vector sequences can be used. This procedure is necessary in particular when the sequences consist of a variable number of vectors. For example, fragments consisting of the sequence of all vectors between two SP, or of half of the vectors between two SP, can be used for classification. In this variant of the method, sequences of vectors that lie before the SP in time are compared backwards from the SP (cf. above) by means of "dynamic programming", the backward comparison ending when the best path to the first vector of the sequence belonging to the reference word has been found.
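The fixed-length variant (four vectors, fixed vector-to-vector assignment) can be sketched in Python. The Euclidean distance stands in for the function d(i,j,k), which is not reproduced in this section; the labels "rising" and "falling", the dictionary of class representatives, and returning None for fragments that would reach beyond the recording are assumptions made for illustration.

```python
def dist(a, b):
    """Euclidean distance between two feature vectors (stand-in for d(i,j,k))."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def extract_fragment(vectors, sp, kind, length=4):
    """Cut the synchronization fragment for the SP at frame index sp.

    falling energy: the (length - 1) frames before the SP plus the SP frame
    rising energy:  the SP frame plus the (length - 1) following frames
    """
    start = sp - (length - 1) if kind == "falling" else sp
    if start < 0 or start + length > len(vectors):
        return None  # assumption: incomplete fragments are skipped
    return vectors[start:start + length]


def fragment_distance(a, b):
    """Sum of vector distances with a fixed one-to-one assignment."""
    return sum(dist(x, y) for x, y in zip(a, b))


def classify(fragment, representatives):
    """Return the key of the class whose representative fragment is closest.

    representatives: dict mapping a class name to its representative fragment.
    """
    return min(representatives,
               key=lambda name: fragment_distance(fragment, representatives[name]))
```

For fragments of variable length, `fragment_distance` would have to be replaced by a DTW comparison, as the text notes.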

Overlaps at the synchronization points

A further variant of the method, which can be used in combination with all of the methods set out above for determining the SP assignment, extends the search space around the SP. This remedies problems caused by inaccuracies in determining the temporal position of the SP. The search space is extended by letting the two search spaces (before and after the SP) overlap by a few time segments. This overlap can occur along one axis or along both axes at once. Fig. 10 shows a DTW with overlapping time segments 310 at the matching SP, using the speech patterns known from Fig. 1 and Fig. 2.

One way of carrying out the DP with overlaps is to limit the search area as in Fig. 5 or Fig. 6 and then to extend it by a few fields by adding individual fields immediately around the SP, for example by adding one field each at (n, m + 1) and (n + 1, m) for the SP at (n, m). For the "backward search", the procedure can be as follows: assume the SP is at (n, m) and the overlap points have been set to (n - 1, m) and (n + 1, m). First, the optimal path is searched backwards, starting from (n, m). In doing so, the accumulated distance D(i,j) (i.e. the sum of the distance values d along the path up to and including (i,j)) of each field traversed by the optimal path is stored. Then the "backward search" for the optimal paths of the other overlap points is started. If one of these paths (P_b) crosses a previously found optimal path P_a (from another starting point), the search for that path can be terminated. The remainder of the optimal path P_b (towards the beginning) corresponds to the previously found remaining section of P_a. The total distance for P_b can be calculated from the stored accumulated distances of P_a and the distance values of P_b accumulated up to the crossing point.

Besides the algorithm described for searching with the proposed overlap, modifications of known procedures can alternatively be applied.

Preselection of reference words

The method for preselecting reference words by means of classes of synchronization fragments uses the structures listed by way of example in Fig. 7.

The structure feature vector 701 contains a vector with the speech parameters of a time segment. The constant NumberOfParams specifies the number of parameters used. In the preferred implementation it is twenty-two: ten cepstrum coefficients, ten first derivatives of the cepstrum coefficients, the energy and the first derivative of the energy. feature vector is used to represent the time segments of both the test patterns and the reference patterns.

The structure sync fragment 702 contains the data of a synchronization fragment at a synchronization point. The pointer RefPattern points to the reference word in which the synchronization fragment occurs. type indicates whether the SP has rising speech energy (START) or falling energy (STOP). PositionInPattern gives the position of the synchronization fragment in the reference pattern, counted as the number of synchronization fragments from the beginning of the reference pattern. NumberOfContextVectors gives the number of vectors used to classify the synchronization fragment. The preferred implementation uses four vectors.

For each class of synchronization fragments, an object of the structure fragment class 704 is created. Besides the kind of synchronization fragment (START, STOP) in type, it contains an array of vectors B[NumberOfContextVectors]. These vectors form the reference pattern of the class. The set *Fragments contains pointers to all sync-fragment objects belonging to the class.

For each reference pattern, an object of the structure reference pattern 705 is created. NumberOfFrames specifies how many time segments (frames) the pattern consists of. NumberOfSP specifies how many synchronization points the pattern contains. The pointer template-list points to the beginning of a chain of templates 951. The array of vectors B[NumberOfFrames] contains the feature vectors of the reference pattern. The entries in the array TimeToEnd[NumberOfSP] give, for each SP, the time between that SP and the end of the reference pattern.

When the test pattern is searched for synchronization fragments, an object of the structure template 706 is created for each reference word that has a synchronization fragment of the same class as a fragment found, provided that no such object exists yet or the existing ones are not suitable. These objects are the templates 951 described above. They are managed in lists. Each reference word can have such a list. In the data structure reference pattern 705 belonging to the reference word, template list points to the first object of the respective list. In each template object, the pointer next points to the next object in the list. The variable PreviousSyncPoint indicates which synchronization fragment of the template 951 matched most recently during the ongoing search for synchronization fragments, i.e. how far the comparison of reference word and test pattern has progressed. timeout gives the time at which the object is removed from the list at the latest.
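The structures described above can be mirrored with Python dataclasses. Fig. 7 itself is not reproduced in this text, so the following is only a sketch derived from the prose: the field names are Python-style adaptations of the names mentioned (RefPattern, PositionInPattern, PreviousSyncPoint, TimeToEnd, and so on), the field types are assumptions, and the counts NumberOfFrames and NumberOfSP are derived from list lengths instead of being stored.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NUMBER_OF_PARAMS = 22           # ten cepstrum coefficients, ten derivatives, energy, its derivative
NUMBER_OF_CONTEXT_VECTORS = 4   # fragment length in the preferred implementation

FeatureVector = List[float]     # one vector of NUMBER_OF_PARAMS speech parameters


@dataclass
class Template:
    previous_sync_point: int            # last fragment of the template that matched
    timeout: float                      # latest time at which the template is dropped
    next: Optional["Template"] = None   # next template in the list


@dataclass
class ReferencePattern:
    B: List[FeatureVector]              # feature vectors of all frames
    time_to_end: List[float]            # per SP: time from the SP to the pattern end
    template_list: Optional[Template] = None

    @property
    def number_of_frames(self) -> int:
        return len(self.B)

    @property
    def number_of_sp(self) -> int:
        return len(self.time_to_end)


@dataclass
class SyncFragment:
    ref_pattern: ReferencePattern       # reference word containing the fragment
    type: str                           # "START" (rising) or "STOP" (falling)
    position_in_pattern: int            # counted in fragments from the pattern start


@dataclass
class FragmentClass:
    type: str                           # "START" or "STOP"
    B: List[FeatureVector]              # representative fragment of the class
    fragments: List[SyncFragment] = field(default_factory=list)
```

A singly linked list is kept for the templates to stay close to the description; in idiomatic Python a plain list of templates per reference pattern would serve the same purpose.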

The preselection of reference words is now explained using the pseudo-code in Fig. 8.

The function call FirstSyncPoint(TestPattern) 810 searches for the first synchronization point in TestPattern, determines the class of the associated synchronization fragment, and returns a pointer to the data structure of the class to which the synchronization fragment belongs. The subsequent while loop (812 to 894) is executed once for each synchronization fragment in TestPattern, with c always pointing to the class of the respective synchronization fragment.

Each class of synchronization fragments 704 (cf. Fig. 7) has a set of objects with the structure sync fragment 702 (cf. Fig. 7). Each of these objects has, among other things, a pointer to a reference word (RefPattern) in which a synchronization fragment of this class occurs, and a variable (PositionInPattern) that gives the position of the synchronization fragment within that reference word. If a reference word contains several synchronization fragments of the same class, the set *Fragments contains, in addition to the sync fragment objects for other reference words, a corresponding number of sync fragment objects 702 for that reference word. The for loop 814 to 890 is executed once for each element of the set *Fragments. On each pass the pointer Fragment_i points to a different sync fragment object. In 816 the pointer w_i is set to the reference pattern object 705 of the reference word belonging to *Fragment_i; in 818 the position of the synchronization fragment in the reference word is assigned to the variable PosInPat_i. Each reference word can have a list of templates (objects of the structure template 706 (cf. Fig. 7)). In 820 the pointer p is set to the start of the list of templates of the current reference word. If (test in 822) the reference word does not yet have a list of templates, a new object for a template of the reference word is created in 826. In 828 the new object is entered as the first (and currently only) element in the list of templates of the reference word. In 830 the position of the current fragment (in the reference word) is assigned to the variable PreviousSyncPoint. This variable serves to ensure the order of the synchronization fragments in the respective template. In 831 a time (timeout) is set up to which the pattern belonging to the template's reference word can be stretched at most. The calculation starts from the current time at the SP (time). The constant MaxSprF gives the factor by which the remaining time (SP to end of word) can be stretched. In the preferred implementation, MaxSprF = 3.
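
The timeout calculation in 831 can be illustrated with a short sketch. The text names only the inputs (the current time at the SP, the remaining time from the SP to the end of the word, and MaxSprF); the additive formula below is an assumption about how they are combined, and the function name `compute_timeout` is hypothetical. Times are counted in time segments here, which is also an assumption.

```python
MAX_SPR_F = 3  # MaxSprF as set in the preferred implementation


def compute_timeout(time_at_sp: int, time_to_end: int,
                    max_spr_f: int = MAX_SPR_F) -> int:
    """Latest time by which the stretched reference pattern must have ended.

    Assumed formula: the remaining duration (SP to end of word) may be
    stretched by at most max_spr_f, counted from the current time at the SP.
    """
    return time_at_sp + max_spr_f * time_to_end


# An SP found at segment 120 with 40 segments remaining in the reference word
# gives a timeout at segment 240 under this assumption.
print(compute_timeout(120, 40))
```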

If the list of templates of the reference word was not empty (test 822, else branch 832), the while loop (834 to 868) is executed once for each object in the list. In a template, PreviousSyncPoint gives the position of the last matching synchronization fragment. If (tests 836 and 840) the position of the new fragment in *Fragment_i lies after PreviousSyncPoint, but one or more further synchronization points of the reference word lie between the last matching synchronization point PreviousSyncPoint and the current synchronization point (PosInPat ≠ PreviousSyncPoint + 1), a duplicate of the template is created in 846 and 848. The duplicate is inserted into a temporary list NewTemplates in line 847. The construction and management of the list NewTemplates is straightforward and therefore not shown. In 854 the variable PreviousSyncPoint of the template is set to the position of the current synchronization fragment, and in 855 the value of timeout is recomputed from the new synchronization fragment. If the current synchronization point in the reference word does not directly follow the previously entered synchronization point, there are now two templates: one that has entered the new synchronization fragment (PreviousSyncPoint and timeout changed) and one without it. This procedure prevents a pattern that is rated as an SP in the test pattern, but not in the corresponding part of the reference word, from confusing the recognition of the synchronization-fragment order.

If (test 836, else branch 856) the position of the new fragment lies before or at PreviousSyncPoint, this reference does not fit into the current template. In 858 to 863 a new template is created. Its variable PreviousSyncPoint is set in line 862 to the position of the synchronization fragment of the current pattern fragment. In 863 the time timeout is determined (as in 831).

After all active templates of a reference word have been processed in the loop 836 to 868, any newly created templates in the list NewTemplates are inserted into the list template list in 874. The list NewTemplates is then empty again.
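
The template update described for lines 822 to 874 can be sketched roughly as follows. This is not the pseudo-code of Fig. 8 but a simplified Python rendering under assumptions: a plain list replaces the linked list, SP positions are 0-based indices, the timeout formula (time + MaxSprF × TimeToEnd) is assumed, and fresh templates from the else branch are collected in the temporary list as well. The name `update_templates` is hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List

MAX_SPR_F = 3  # MaxSprF in the preferred implementation


@dataclass
class Template:
    previous_sync_point: int
    timeout: int


def update_templates(templates: List[Template], pos: int, time: int,
                     time_to_end: List[int]) -> None:
    """Enter a found fragment at reference position `pos` into the templates."""
    timeout = time + MAX_SPR_F * time_to_end[pos]      # assumed formula
    if not templates:                                   # test 822: no list yet
        templates.append(Template(pos, timeout))
        return
    new_templates: List[Template] = []                  # NewTemplates
    for t in templates:
        if pos > t.previous_sync_point:                 # test 836
            if pos != t.previous_sync_point + 1:        # test 840: SPs were skipped
                new_templates.append(replace(t))        # keep a copy without the new SP
            t.previous_sync_point = pos
            t.timeout = timeout
        else:                                           # else branch 856
            new_templates.append(Template(pos, timeout))
    templates.extend(new_templates)                     # step 874
```

Following the description literally, the else branch may create several identical templates when more than one existing template rejects the fragment; the text does not say whether such duplicates are merged, so the sketch leaves them in place.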

In the subsequent for loop 876 to 888, all templates of the reference word are checked against two criteria: timeout exceeded, or last synchronization point of the reference word reached. If either criterion is met, the reference word is compared with the test pattern by the procedure call executeDTW(p, Testpattern) in 884. The determination of the "finished" templates and the subsequent DTW comparison of reference word and test pattern can also take place elsewhere in the overall procedure. It was inserted here only to illustrate how the preselection of reference words works.
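
The two completion criteria can be expressed compactly. The sketch below is an assumption-laden illustration: it uses 0-based SP indices, treats "timeout exceeded" as a strict comparison against the current time, and returns the finished templates instead of calling executeDTW directly. The function name `finished_templates` is hypothetical.

```python
from typing import List, NamedTuple


class Template(NamedTuple):
    previous_sync_point: int   # index of the last matched SP (0-based, assumed)
    timeout: float             # latest admissible time for this template


def finished_templates(templates: List[Template], now: float,
                       number_of_sp: int) -> List[Template]:
    """Select templates whose timeout has passed or whose last SP was reached."""
    last_sp = number_of_sp - 1
    return [t for t in templates
            if now > t.timeout or t.previous_sync_point == last_sp]
```

Each returned template would then be handed to the DTW comparison of its reference word with the test pattern.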

The function NextSyncPoint(TestPattern) 892 searches for the next SP in the test pattern, determines the class of the associated synchronization fragment, and returns a pointer to the data structure of the class to which the synchronization fragment belongs.

Fig. 9 shows an extension of the structure template. When a new template is created, the number of synchronization points of the corresponding reference word is passed as the parameter NmOfSP. The structure now additionally contains, for each SP, a flag in the array SPoint[] and one element each in the arrays RPframe[], TPframe[] and d[]. The elements of the array SPoint[] are initialized to FALSE. When a matching pair of synchronization fragments is found at the i-th SP, the corresponding element SPoint[i] is set to TRUE, the elements RPframe[i] and TPframe[i] are assigned the numbers of the time segment of the SP in the reference word and in the test pattern, respectively, and the distance between the two synchronization fragments is stored in d[i]. The arrays SPoint[], RPframe[] and TPframe[] are used in the subsequent DTW comparison to subdivide the search space. The sum of the distances in d[] gives a first indication of the similarity of reference word and test pattern. It can be used to further narrow down the reference words to be compared by DTW.
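
A minimal sketch of the extended template, assuming Python lists for the per-SP arrays. The method names `record_match`, `anchors` and `total_distance` are hypothetical; they only illustrate how the stored frame numbers could serve as anchor points for subdividing the DTW search space and how the summed distances could serve as a first similarity score.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ExtendedTemplate:
    nm_of_sp: int                                  # NmOfSP
    s_point: List[bool] = field(init=False)        # SPoint[]
    rp_frame: List[int] = field(init=False)        # RPframe[]
    tp_frame: List[int] = field(init=False)        # TPframe[]
    d: List[float] = field(init=False)             # d[]

    def __post_init__(self) -> None:
        n = self.nm_of_sp
        self.s_point = [False] * n                 # all flags start as FALSE
        self.rp_frame = [0] * n
        self.tp_frame = [0] * n
        self.d = [0.0] * n

    def record_match(self, i: int, rp_frame: int, tp_frame: int,
                     distance: float) -> None:
        """Store a matching fragment pair found at the i-th SP."""
        self.s_point[i] = True
        self.rp_frame[i] = rp_frame
        self.tp_frame[i] = tp_frame
        self.d[i] = distance

    def anchors(self) -> List[Tuple[int, int]]:
        """(reference frame, test frame) pairs of all matched SPs."""
        return [(self.rp_frame[i], self.tp_frame[i])
                for i in range(self.nm_of_sp) if self.s_point[i]]

    def total_distance(self) -> float:
        """Sum of the distances of all matched SPs."""
        return sum(self.d[i] for i in range(self.nm_of_sp) if self.s_point[i])
```

A DTW restricted to the intervals between consecutive anchors only has to search the rectangles spanned by neighbouring pairs rather than the full frame grid.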

Formation of the SF classes

The synchronization fragment classes can be formed from a set of speech patterns using the known methods of class formation and class assignment. In the simplest version, synchronization points are located in existing patterns and the associated synchronization fragments are collected. Both the methods known in the literature as "dynamic clustering" and those known as "hierarchical clustering" can then be used.

In a modified form, the classes are formed from several speech patterns of one word. First, all potential SFs of all speech patterns of a word are located; then the SFs of the different speech patterns are assigned to one another. The assignment is based on their position in the word and on the energy profile at the associated SP. The SFs assigned to one another then form a class. SFs that cannot be assigned unambiguously are ignored. As a consequence of this assignment, each word has its own classes. This can lead to a comparatively good preselection of reference words (in later searches). The higher cost of determining the SF class (in later searches) can be reduced by an appropriate design of the search procedure, e.g. by search trees. In a second step, the many (word-specific) classes can be merged into cross-word classes using the known methods.

Determination of the word start, formation of the boundaries of the reference patterns

In some applications, for example in the method described in patent application DE 100 54 583 A1, a number of speech patterns that are known to represent the same word accumulate over time. In other applications such patterns are available from the outset, e.g. from speaking the words aloud during training. In these applications the problem can arise of determining the start or end of the respective word in the speech pattern. Patterns with known word boundaries may be needed, e.g., as reference patterns.

To determine the word boundaries, the synchronization points in the speech patterns are determined first. For speech patterns known to represent the same word, the procedure is then as follows: the synchronization points of two speech patterns at a time are assigned to one another. This can be done on the basis of their order and (additionally) through the classification of the synchronization fragments of the SPs. The start of a word is then determined by a "backward search" starting from the first synchronization point, the end of the word by a "forward search" starting from the last synchronization point. The search uses "dynamic programming", i.e. the time segments of the two patterns are assigned to one another so as to produce a path with the smallest possible distance values between the feature vectors of the assigned segments. The search is terminated at the time segments at which no assignments of time segments with distance values d(i,j) < D_S can be found. The threshold value D_S depends on the application (background noise, etc.). It can be determined, e.g., by measuring the values of d() in speech pauses. Alternatively, the search can be terminated when d(i,j) exceeds the threshold value in a certain number n_s of m consecutive segments. The time segments at which the search is terminated form the word boundaries in the corresponding speech pattern. If several speech patterns of a word are available, this procedure can be repeated with other pairings of speech patterns. Outliers should be disregarded; outliers are comparisons in which the word boundaries deviate substantially from those determined by other pairings. In addition, the known methods can be used, e.g. determining the start and end of a word by criteria on the total energy or on the energy content in individual frequency bands.
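
The alternative termination criterion (threshold exceeded in n_s of m consecutive segments) can be sketched as a sliding-window test. This is a simplification under assumptions: it operates on a one-dimensional sequence of distance values already collected along the search path, outward from the SP, rather than on the full dynamic-programming grid, and it reports the step at which the criterion first fires. The function name `find_stop_index` is hypothetical.

```python
from collections import deque
from typing import Optional, Sequence


def find_stop_index(path_distances: Sequence[float], d_s: float,
                    n_s: int, m: int) -> Optional[int]:
    """Return the first step at which at least n_s of the last m distance
    values exceed the threshold d_s, or None if that never happens."""
    window = deque(maxlen=m)          # flags for the most recent m segments
    for k, d in enumerate(path_distances):
        window.append(d > d_s)
        if sum(window) >= n_s:
            return k
    return None


# Distances along a forward search from the last SP (made-up values).
distances = [0.1, 0.2, 0.9, 0.1, 0.8, 0.9, 0.95]
print(find_stop_index(distances, d_s=0.5, n_s=3, m=4))
```

With n_s = m = 1 the function reduces to stopping at the first value above the threshold; larger windows tolerate isolated high distances inside the word.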

The invention is not limited to the exemplary embodiments presented here. Rather, further variants can be realized by combining and modifying the means and features mentioned without leaving the scope of the invention. In particular, instead of "similarities" or distance functions, "probabilities" can be used analogously, such as the "probability that two segments match". Furthermore, time segments of 10–20 ms are used by way of example in the present description. According to the invention it is equally possible to use other lengths of time segments. In particular, it is possible to work on the basis of individual samples (the length of a time segment then corresponds to one sample), with the speech features then computed in a sliding manner using the surrounding samples. The individual time segments can also have different lengths; for example, each time segment can represent one phone.

101, 201: waveform of the word "Jupiter"
102, 202: scale of the time segments
103, 203: spectrogram of the word "Jupiter"
104, 204: time scale
105, 205: energy content at the start of the word
106, 206: energy content at the end of the word
110, 111, 112: synchronization points
210, 211, 212, 213: synchronization points
301: optimal path
302, 303: diagrams of the total energy of the speech patterns
310: overlap of time segments
401: means for data input, microphone
402: amplifier
403: band-pass filter
404: analog-to-digital converter, A/D converter
405: digital signal processor (DSP)
406: processor, CPU
407: RAM
408: ROM
409: means for data output
421: unit for dividing the samples into time segments, framer
422: high-pass filter
423: unit for multiplying the segment values by a window function
424: unit for performing a discrete Fourier transform (DFT)
425: unit for taking the logarithm of the DFT results
426: unit for performing an inverse DFT
427: unit for energy calculation
428: unit for forming the first derivative
429: unit for determining synchronization points
430: unit for determining synchronization fragments
431: unit for performing dynamic time warping (DTW)
432: storage means, preferably for storing reference patterns
701: feature vector structure
702: sync fragment structure
704: fragment class structure
705: reference pattern structure
706: template structure
810 ... 894: line numbers of the pseudo-code
901: synchronization point with falling energy
902, 903: synchronization points with rising energy
906: extended template structure
911, 912, 913: synchronization fragments
921, 922, 923: synchronization fragment classes
930: synchronization fragment representative
931: synchronization fragment object
941, 942: reference pattern object, RP object
951: template

Claims

Method for the synchronization of test and reference patterns, in particular for the synchronization of speech patterns, wherein test and reference patterns are each present as a sequence A = (a(1), a(2), ..., a(I)) or B = (b(1), b(2), ..., b(J)) of feature vectors a(i) (i = 1, 2, ..., I) or b(j) (j = 1, 2, ..., J) (assigned to time segments), characterized in that the synchronization takes place via synchronization points (SP) that are predetermined and/or automatically determined in the test and reference patterns, wherein - a part of the feature vectors b(j) of the reference patterns is marked as reference SP, - the feature vectors a(i) of a test pattern are searched for predefinable features, and those feature vectors that have at least one of the predefined features are identified as potential SP, - at least a part of the potential SP determined is assigned, according to predefinable rules, to one or more reference SP, and - if predefinable criteria are met, the synchronization of test and reference patterns is established automatically via the SP assigned to one another.

Method according to claim 1, characterized in that the potential SP are determined using the following features: - difference in the energy of successive time segments and/or - difference in the energy of certain frequency bands of successive time segments and/or - change in the number of zero crossings of the speech signal in successive time segments and/or - on the basis of cepstrum, LPC and/or PARCOR coefficients and/or on the basis of the derivatives of these coefficients.

Method according to claim 2, characterized in that frequency bands are weighted differently for the determination of potential SP.

Method according to one of the preceding claims, characterized in that the predefinable rules for the assignment of potential SP to reference SP prescribe, for at least a part of the potential SP determined, an analysis of additional feature vectors of the test pattern, preferably ones located in the temporal vicinity of the potential SP, and the assignment of potential SP to reference SP takes place depending on the results of this analysis.

Method according to one of the preceding claims, characterized in that the features analyzed to determine potential SP are combined with one another and/or with mathematical functions, such as in particular the logarithm function.

Method according to one of the preceding claims, characterized in that, to determine potential SP, the features of at least two directly successive time segments are analyzed.

Method according to one of the preceding claims, characterized in that the synchronization comprises determining a degree of similarity (score) of reference and test patterns.

Method according to one of the preceding claims, characterized in that an average of the features analyzed to determine potential SP is determined continuously from samples lying within a predefinable time window (moving average).

Method according to one of the preceding claims, characterized in that feature vectors characterizing stop consonants and/or plosives serve as reference synchronization points.

Method according to one of the preceding claims, characterized in that, to determine potential SP, the absolute value of the ratio Δ_i / e_i is evaluated, where - Δ_i denotes the difference in the energy of the time segments i and i-1 and - e_i denotes the mean energy of a predefinable number of time segments surrounding the time segment i, and a time segment i is marked as SP when |Δ_i / e_i| exceeds a predefinable threshold value.

Method according to one of the preceding claims, characterized in that, for synchronization, - the determined reference pattern(s) and/or the test pattern(s) are output, made available or handed over to other applications, or - a dynamic time warping (DTW) of test and reference pattern(s) or an analysis of the test pattern by a hidden Markov model (HMM) is carried out, and the determined reference pattern(s) and/or the test pattern(s) are then output, made available or handed over to other applications.

Method according to claim 11, characterized in that the output takes place acoustically and/or visually.

Method according to one of the preceding claims, characterized in that a dynamic time warping (DTW) between a reference pattern and a test pattern takes place when - for all reference SP of the reference pattern or - for a predefinable number of reference SP of the reference pattern an assignment to SP of the test pattern has been established, or - a predefinable ratio between the total number of reference SP of the reference pattern and the number of reference SP of the reference pattern for which an assignment to SP of a test pattern has been established is reached or exceeded.

Method according to claim 13, characterized in that the number of possible paths considered in the dynamic time warping (DTW), and thus the search space, is restricted in that - in a path, at least one index, i or j, is increased by a predefinable value at each step and/or - in a path, both indices are increased simultaneously by predefinable values at each step and/or - the number of successive steps parallel to an axis is limited and/or - the search for the optimal path is limited to the diagonal and a predefined number of time segments on both sides of the diagonal and/or - the dynamic time warping is carried out only for the intervals between the SP.

Method according to one of claims 13 or 14, characterized in that, when the dynamic time adjustment is carried out for intervals between SP, the search space is extended by adding time segments from the temporal vicinity of the SP to the search space.

Method according to one of claims 13 to 15, characterized in that a dynamic time adjustment (DTW) is carried out using "Dynamic Programming" (DP), using the Viterbi algorithm or using hidden Markov models.

Method according to one of claims 13 to 16, characterized in that the path search for the dynamic time adjustment proceeds from the first SP of a test pattern towards the beginning of the word (backwards) and/or from the last SP of a test pattern towards the end of the word (forwards).

A method according to claim 17, characterized in that, in the case of an automatic determination of word boundaries of a speech pattern, the path search is terminated for those time segments - for which no assignments of time segments can be found whose distance value d(i, j) and a predeterminable threshold value D_S fulfill the condition d(i, j) < D_S, or - at which the distance value d(i, j) exceeds the threshold value D_S in a predeterminable number of successive time segments, and the time segments at which the search was terminated are marked as a word boundary.
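The second termination condition can be sketched as a scan over the distance values met while the path search moves away from an SP. The sketch assumes these values are already available as a list; the name `find_word_boundary` and the parameter `max_run` are hypothetical, and treating d(i, j) equal to D_S as a failure is an assumption.

```python
def find_word_boundary(distances, threshold, max_run=3):
    """Return the index at which the path search is terminated, or None.

    distances -- d(i, j) values for successive time segments, ordered in
                 the direction of the search (assumed input format)
    threshold -- the threshold value D_S
    max_run   -- predeterminable number of successive segments that may
                 fail the condition d(i, j) < D_S before the search stops
    """
    run = 0
    for k, d in enumerate(distances):
        if d < threshold:
            run = 0          # condition fulfilled, reset the counter
        else:
            run += 1
            if run >= max_run:
                return k     # search terminated here: word boundary
    return None
```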

A method according to claim 18, characterized in that the threshold value D_S is predetermined as a function of the application, preferably taking background noise into account.

Method according to claim 18 or 19, characterized in that the threshold value D_S is determined by determining the values of d() in speech pauses.

Method according to one of claims 18 to 20, characterized in that, if several reference speech patterns representing the same word are present, the automatic determination of word boundaries of a speech pattern is carried out with several of these reference speech patterns.

Method according to one of the preceding claims, characterized in that, in the method, in particular for calculating distance functions, the following parameters are determined: - cepstrum coefficients and/or - LPC coefficients (Linear Predictive Coding) and/or - PARCOR coefficients and/or - LAR coefficients and/or - LSP coefficients and/or - LSF coefficients and/or - spectral energy distribution and/or - Mel spectrum and/or - zero crossing rate and/or - Mel or Bark transformations of the aforementioned coefficients and/or - time derivatives of the aforementioned coefficients and/or of their Mel or Bark transformations and/or - combinations of these coefficients and/or parameters in smoothed and unsmoothed form.
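As an illustration of one of the simpler parameters in this list, the zero crossing rate of a time segment can be computed as below. The sketch assumes the segment is a list of signed samples and counts a sign change between neighbouring samples as one crossing; the name `zero_crossing_rate` is hypothetical.

```python
def zero_crossing_rate(frame):
    """Fraction of neighbouring sample pairs in which the sign changes.

    Samples equal to zero are treated as non-negative (assumption).
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for prev, cur in zip(frame, frame[1:]) if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(frame) - 1)
```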

Method according to one of the preceding claims, characterized in that reference SP are assigned to at least one class of synchronization fragments (SF), an SF comprising feature vectors that lie in the temporal vicinity of a reference SP of a predetermined reference pattern.

A method according to claim 23, characterized in that SP and/or synchronization fragments are classified such that SP and/or synchronization fragments with rising energy are assigned to a first class and SP and/or synchronization fragments with falling energy are assigned to a second class of SP.

Method according to one of claims 23 or 24, characterized in that synchronization fragments are formed from immediately successive feature vectors of a speech pattern.

Method according to one of claims 23 to 25, characterized in that synchronization fragments associated with an SP with rising energy are formed from feature vectors that are arranged temporally after the SP in the pattern, and/or synchronization fragments associated with an SP with falling energy are formed from feature vectors that are arranged before the SP in the pattern.
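The classification by rising or falling energy and the resulting choice of feature vectors can be sketched together. The sketch assumes one energy value and one feature vector per time segment, a fixed fragment length, and that the SP itself is not part of the fragment; the name `build_fragment` and the parameter `length` are hypothetical.

```python
def build_fragment(features, energy, sp, length=3):
    """Classify the SP at index sp and collect its synchronization fragment.

    Rising energy  -> class "rising",  fragment taken after the SP.
    Falling energy -> class "falling", fragment taken before the SP.
    Requires sp >= 1 so that the preceding energy value exists.
    """
    if energy[sp] > energy[sp - 1]:
        # immediately successive feature vectors following the SP
        return "rising", features[sp + 1 : sp + 1 + length]
    # immediately successive feature vectors preceding the SP
    return "falling", features[max(0, sp - length) : sp]
```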

Method according to one of claims 23 to 26, characterized in that the membership of an SP in a class of synchronization fragments is determined by dynamic time adjustment.

Method according to one of claims 23 to 27, characterized in that, to determine the membership of an SP in a class of synchronization fragments, - distance functions are used which are particularly meaningful in the region of phonetic transitions, or - different synchronization functions are used for synchronization fragments with rising energy content than for synchronization fragments with falling energy content.

Method according to one of the preceding claims, characterized in that, after the SP of a test pattern have been determined, the number of these SP is compared with the number of reference SP of at least some of the reference patterns and, if the numbers agree, an assignment between SP and reference SP is made in the temporal order of their occurrence in the respective speech pattern.

A method according to claim 29, characterized in that, after the assignment of SP and reference SP, the SP pairs are tested as to whether both SP of a pair belong to the same class of synchronization fragments.
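The count comparison, the assignment in temporal order and the subsequent class test can be sketched as follows. Each SP is represented here as a tuple (time index, class label), which is an assumed data layout; the names `assign_by_order` and `pairs_share_class` are hypothetical.

```python
def assign_by_order(test_sps, ref_sps):
    """Pair test SP with reference SP in temporal order.

    Returns a list of (test_sp, ref_sp) pairs, or None if the two
    patterns do not contain the same number of SP.
    """
    if len(test_sps) != len(ref_sps):
        return None
    # sorting tuples orders them by time index first
    return list(zip(sorted(test_sps), sorted(ref_sps)))


def pairs_share_class(pairs):
    """Check that both SP of every pair carry the same class label."""
    return all(test_sp[1] == ref_sp[1] for test_sp, ref_sp in pairs)
```

A reference pattern whose pairs fail the class test would then presumably be treated as a weaker candidate, although the claims leave the consequence open.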

Method according to one of the preceding claims, characterized in that reference patterns are assigned a list of data structures (templates) which comprises information on the reference SP of the reference pattern, in particular information - on assignments that have been made of SP from test patterns to reference SP of the reference pattern and/or - on the degree of similarity (score) of reference and test patterns.

Method according to one of the preceding claims, characterized in that, after an SP of a test pattern has been assigned to a reference SP, the SP is included in the set of reference SP.

Method according to one of the preceding claims, characterized in that, after a test pattern and a reference pattern have been synchronized, the test pattern is included in the set of reference patterns.

Method according to one of claims 11 to 33, characterized in that the execution of the DTW starts with the reference pattern - which has the highest number of assigned SP or - which has the best ratio of assignments to the number of reference SP of the reference pattern or - for which the best mean similarity of the synchronization fragments of the reference pattern to the synchronization fragments of the test pattern was determined, the mean similarity being defined as the sum of the similarities of the individual fragments for each class divided by the number of reference SP of the reference pattern.
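The three selection criteria can be sketched as interchangeable ranking keys. The sketch assumes each candidate reference pattern is described by a dictionary with the hypothetical keys `name`, `ref_sp`, `assigned` and `similarities`, and that a larger similarity value means a better match.

```python
def mean_similarity(fragment_similarities, num_reference_sp):
    """Sum of the fragment similarities divided by the number of reference SP."""
    return sum(fragment_similarities) / num_reference_sp


def pick_start_reference(candidates, criterion):
    """Return the name of the reference pattern the DTW should start with.

    criterion -- "count", "ratio" or "similarity" (hypothetical labels
                 for the three alternatives)
    """
    keys = {
        "count": lambda c: c["assigned"],
        "ratio": lambda c: c["assigned"] / c["ref_sp"],
        "similarity": lambda c: mean_similarity(c["similarities"], c["ref_sp"]),
    }
    return max(candidates, key=keys[criterion])["name"]
```

The criteria can disagree: a pattern with many reference SP may win on the absolute count while a shorter pattern wins on the ratio.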

Arrangement with at least one processor that is set up in such a way that a method for the synchronization of test and reference patterns, in particular for the synchronization of speech patterns, can be carried out, test and reference patterns each being present as a sequence A = (a(1), a(2), ..., a(I)) or B = (b(1), b(2), ..., b(J)) of feature vectors a(i) (i = 1, 2, ..., I) or b(j) (j = 1, 2, ..., J) assigned to time segments, the synchronization taking place via synchronization points (SP) that are predefined and/or automatically determined in the test and reference patterns, wherein - a part of the feature vectors b(j) of the reference pattern is designated as reference SP, - the feature vectors a(i) of a test pattern are searched for predeterminable features and those feature vectors which have at least one of the predetermined features are marked as potential SP, - at least a part of the potential SP is assigned to one or more reference SP according to predeterminable rules and - if predeterminable criteria are fulfilled, the synchronization of test and reference patterns is established automatically via the SP assigned to one another.

Arrangement according to claim 35, characterized by means for data input (401) and data output (409), at least one amplifier (402), at least one bandpass filter (403), at least one analog-to-digital converter (404), at least one digital signal processor (DSP) (405) and at least one processor (406) with RAM (407) and read-only memory (ROM) (408), data inputs and outputs of these units being connected to one another in such a way that signal transmission is possible from the means for data input (401) via the amplifier (402), bandpass filter (403) and analog-to-digital converter (404) to the digital signal processor(s) (DSP) (405), and the RAM (407) and read-only memory (ROM) (408) being connected to the processor(s) (406), the processor(s) (406) to the digital signal processor(s) (DSP) (405), and at least one processor (406) and/or digital signal processor (DSP) (405) to the means for data output (409), in each case via means for data exchange.

Arrangement according to one of claims 35 or 36, characterized in that the arrangement comprises means for data exchange with external data processing devices and/or for acoustic and/or visual data output (409).

Arrangement according to one of claims 36 or 37, characterized in that the digital signal processor (DSP) (405) and/or processor (406) comprises - at least one framer (421), - at least one unit for calculating speech parameters of the speech segments, - at least one unit for determining the synchronization points (429), - at least one unit for compiling the synchronization fragments (430), - at least one unit for performing a dynamic time adjustment (DTW, Dynamic Time Warping) (431), - storage means (432), preferably for storing reference patterns, data inputs and outputs of these units being connected to one another in such a way that signal transmission is possible from the framer(s) (421) via the unit(s) for calculating speech parameters of the speech segments, the unit(s) for determining the synchronization points (429) and the unit(s) for compiling the synchronization fragments (430) to the unit(s) for performing a dynamic time adjustment (DTW, Dynamic Time Warping) (431), the storage means (432) being connected via means for data exchange to the unit(s) for performing a DTW (431), the data input(s) of the framer(s) being connected to data inputs of the digital signal processor (DSP) (405) and/or processor (406), and the data output(s) of the unit(s) for performing a DTW (431) being connected to data outputs of the digital signal processor (DSP) (405) and/or processor (406).

Arrangement according to claim 38, characterized in that a unit for calculating speech parameters of the speech segments comprises - means for calculating the cepstrum coefficients and/or - means for calculating the LPC coefficients (Linear Predictive Coding) and/or - means for calculating the PARCOR coefficients and/or - means for calculating the LAR coefficients and/or - means for calculating the LSP coefficients and/or - means for calculating the LSF coefficients and/or - means for calculating the spectral energy distribution and/or - means for calculating the Mel spectrum and/or - means for calculating the zero crossing rate and/or - means for calculating the Mel or Bark transformations of the aforementioned coefficients and/or - means for calculating the time derivatives of the aforementioned coefficients and/or of their Mel or Bark transformations and/or - means for forming combinations of these coefficients and/or parameters in smoothed and unsmoothed form.

Arrangement according to claim 39, characterized in that a means for calculating the cepstrum coefficients comprises - at least one unit for performing a discrete Fourier transformation (DFT) (424), - at least one unit for taking the logarithm of the DFT results (425), - at least one unit for performing an inverse DFT (426), - at least one unit for forming the first derivative (428), data inputs and outputs of these units being connected to one another in such a way that signal transmission is possible from the unit(s) for performing a discrete Fourier transformation (DFT) (424) via the unit(s) for taking the logarithm of the DFT results (425) and the unit(s) for performing an inverse DFT (426) to the unit(s) for forming the first derivative (428), the data input of the unit for calculating speech parameters of the speech segments being formed by the data input(s) of the unit(s) for performing a discrete Fourier transformation (DFT) (424) and the data output of the unit for calculating speech parameters of the speech segments being formed by the data output(s) of the unit(s) for forming the first derivative (428).
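The chain of units (424) to (428) corresponds to a standard cepstrum computation, which can be sketched in software as follows. The sketch assumes that the logarithm is taken of the DFT magnitude (real cepstrum) and that the first derivative is a frame-to-frame difference of the coefficients; neither detail is fixed by the claim, and the names `dft`, `real_cepstrum` and `delta` are hypothetical. A direct O(N^2) DFT is used to keep the example free of external libraries.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def real_cepstrum(frame, floor=1e-12):
    """DFT -> logarithm of the magnitude -> inverse DFT."""
    spectrum = dft(frame)
    # floor avoids log(0) for empty spectral bins (assumption)
    log_mag = [math.log(max(abs(v), floor)) for v in spectrum]
    return [v.real for v in dft(log_mag, inverse=True)]


def delta(cepstra):
    """First derivative as the difference between successive frames."""
    return [
        [cur_c - prev_c for prev_c, cur_c in zip(prev, cur)]
        for prev, cur in zip(cepstra, cepstra[1:])
    ]
```

For a unit impulse the magnitude spectrum is flat, so all cepstrum coefficients vanish; scaling the impulse only changes the zeroth coefficient.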

Computer program product comprising a computer-readable storage medium on which a program is stored that, after it has been loaded into the memory of a computer, enables the computer to carry out a method for the synchronization of test and reference patterns, in particular for the synchronization of speech patterns, test and reference patterns each being present as a sequence A = (a(1), a(2), ..., a(I)) or B = (b(1), b(2), ..., b(J)) of feature vectors a(i) (i = 1, 2, ..., I) or b(j) (j = 1, 2, ..., J) assigned to time segments, the synchronization taking place via synchronization points (SP) that are predefined and/or automatically determined in the test and reference patterns, wherein - a part of the feature vectors b(j) of the reference pattern is designated as reference SP, - the feature vectors a(i) of a test pattern are searched for predeterminable features and those feature vectors which have at least one of the predetermined features are marked as potential SP, - at least a part of the potential SP is assigned to one or more reference SP according to predeterminable rules and - if predeterminable criteria are fulfilled, the synchronization of test and reference patterns is established automatically via the SP assigned to one another.

Computer-readable storage medium on which a program is stored that, after it has been loaded into the memory of a computer, enables the computer to carry out a method for the synchronization of test and reference patterns, in particular for the synchronization of speech patterns, test and reference patterns each being present as a sequence A = (a(1), a(2), ..., a(I)) or B = (b(1), b(2), ..., b(J)) of feature vectors a(i) (i = 1, 2, ..., I) or b(j) (j = 1, 2, ..., J) assigned to time segments, the synchronization taking place via synchronization points (SP) that are predefined and/or automatically determined in the test and reference patterns, wherein - a part of the feature vectors b(j) of the reference pattern is designated as reference SP, - the feature vectors a(i) of a test pattern are searched for predeterminable features and those feature vectors which have at least one of the predetermined features are marked as potential SP, - at least a part of the potential SP is assigned to one or more reference SP according to predeterminable rules and - if predeterminable criteria are fulfilled, the synchronization of test and reference patterns is established automatically via the SP assigned to one another.

Database containing information, in particular lists of similarities of speech patterns or synchronization fragments and/or class information on synchronization fragments, which was obtained by a method according to one of claims 1 to 34.