EP0815553B1 - Method of detecting a pause between two signal patterns on a time-variable measurement signal - Google Patents

Publication number
EP0815553B1
EP0815553B1 EP96905679A EP96905679A EP0815553B1 EP 0815553 B1 EP0815553 B1 EP 0815553B1 EP 96905679 A EP96905679 A EP 96905679A EP 96905679 A EP96905679 A EP 96905679A EP 0815553 B1 EP0815553 B1 EP 0815553B1
Authority
EP
European Patent Office
Prior art keywords
pause
signal
pattern
measurement signal
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
EP96905679A
Other languages
German (de)
French (fr)
Other versions
EP0815553A2 (en)
Inventor
Abdulmesih Aktas
Klaus ZÜNKLER
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Siemens AG
Original Assignee
Siemens AG
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siemens AG
Publication of EP0815553A2
Application granted
Publication of EP0815553B1
Anticipated expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals

Definitions

  • Pattern recognition is becoming increasingly important in many technical processes, since it allows a higher degree of automation to be achieved.
  • Pattern recognition processes can usually be reduced to a time-variable measurement signal that is derived in a suitable manner from the patterns to be recognized.
  • These measurement signals are not present in pure form but are often overlaid by stationary or non-stationary interference signals.
  • These interference components of the measurement signal can be caused, for example, by background noise, breathing noise, machine noise, or by the recording medium and the transmission path.
  • Because the measurement signal is never present in pure form, it is particularly important to distinguish between the portions of the measurement signal that contain the pattern to be recognized and the portions in which no pattern is present. For better recognition of the patterns it is therefore especially important to know exactly when patterns are present in the measurement signal and when they are not, i.e. when signals not originating from the pattern are present in the measurement signal as pause signals.
  • Pause detection is also important, for example, for reducing the amount of transmitted data, e.g. on voice communication channels and in satellite transmission, for the general useful-signal/interference-signal decision in signal processing, or for finding the end of an utterance in automatic speech recognition systems.
  • A robust pause detector improves the performance of voice-controlled systems. This applies particularly to speech recognition systems, since the task there is to compare a spoken utterance, as a pattern, with an already existing version.
  • The problem of pause determination, specifically in automatic speech recognition, has been described in detail by Rabiner [1], who also specified an algorithm for pause detection. That algorithm uses information computed directly from the sampled time signal (energy, zero-crossing rate, etc.). This approach is common to all known pause detectors [2].
  • The object underlying the invention is to specify an improved method for detecting pauses between patterns that are present in a measurement signal and that have been modeled with the aid of hidden Markov models.
  • One advantage of the method according to the invention is that, for the first time, information that is obtained in different signal processing stages and that occurs successively in time is used for pause detection.
  • That is, the pause information is obtained in a comparison stage by comparing a special pause model with the feature vectors of the measurement signal and is fed back to the feature extraction stage of the pattern recognition, so that the pause state can be taken into account in the measurement signal analysis in a subsequent time slice in the feature extraction stage.
  • The method according to the invention advantageously exploits the information that certain pattern groups belong together (in the case of words these are groups of phoneme patterns), which ensures that a pause must follow at least after the pattern group.
  • This information is then advantageously exploited in the feature extraction stage as the first processing stage of the method.
  • The method according to the invention also advantageously ensures that a pause must have occurred before the arrival of a pattern sequence to be recognized. This fact is likewise exploited in the pattern recognition.
  • The method according to the invention can advantageously be combined with known pause detection methods that evaluate properties of the measurement signal in the time domain and in the spectral domain. In this way a higher detection rate can be achieved in the pattern recognition.
  • Speech patterns, writing patterns or signaling patterns can be analyzed particularly advantageously, since they occur in a wide range of technical applications and can be modeled in a suitable manner.
  • The method according to the invention can advantageously ensure that a pause must be present if no patterns are recognized. This yields a higher detection rate in the pattern recognition, because pause information can thus be made available to the feature extraction stage even more reliably.
  • Figure 1 shows a schematic example of a speech recognition system equipped with pause detection.
  • Figure 2 illustrates the pause detection process using various hidden Markov models.
  • The method according to the invention is based in particular on the fact that the signal states and the feature vectors do not change excessively from one time slice of the analysis interval to the next.
  • Thus, information obtained in the classification stage Klass, for example the finding that the comparison with the hidden Markov models yields a higher probability for pause than for a pattern to be recognized, can be forwarded to the feature extraction stage as pause information Pa. The time slice in which the pause is detected will very probably be followed by a further time slice containing a pause. With this procedure, unwanted disturbances present in the measurement signal can be suppressed with great reliability when the feature vectors are formed, even at a low signal-to-noise ratio.
  • This knowledge can be obtained, for example, from a speech signal via the acoustic-phonetic modeling stage (hidden Markov models), which has already been trained with a set of training data for speech recognition.
  • The modeling is more refined, and therefore better, when the phoneme context is taken into account, i.e. the knowledge of which phoneme follows another.
  • If, for example, the pause decision of the acoustic-phonetic modeling stage is combined with common criteria for pause estimation, the pause decision can be improved.
  • Figure 2 shows the various Viterbi paths V1 to V3 for different hidden Markov models.
  • First, the measurement signal, which is for example a speech signal, a writing signal, or a signal emitted by signaling methods, is transformed into a feature vector space by one or more suitable signal transformations.
  • For the method according to the invention, the training can be realized, for example, with the hidden Markov model method.
  • The pause detection method can, however, equally be carried out with other pattern recognition methods, such as dynamic programming or neural networks.
  • If hidden Markov models are used, the distribution functions of the feature vectors can, for example, be estimated for each recognition unit.
  • In automatic speech recognition, recognition units in this context means speech sounds (phonemes).
  • The method was realized, for example, for automatic speech recognition, but it is conceivable that it can be used for any kind of pattern recognition. It only has to be ensured that signal patterns can be provided and that pause states exist in which the interference signals can be determined, in order to train the hidden Markov models for pause states.
  • Examples of other patterns are those that occur as pressure-dependent or time-dependent writing signals when a document is signed, or signal sequences used in automatic telecommunications signaling methods.
  • In the recognition phase, for example, a continuous pattern comparison is carried out in order to calculate the probability of generation for each recognition unit in every analysis interval or time slice.
  • A simple solution is to evaluate these probabilities. If the probability for pause, i.e. for the hidden Markov model for pause or its equivalent, is the highest, the analysis interval in question can be used to re-estimate the distribution functions or for filtering in noise suppression.
  • The method according to the invention becomes even more robust if the result of a pattern recognizer is taken into account as an additional source of knowledge. Assuming, for example, that the pattern recognizer is able to recognize every possible useful signal, the method according to the invention can take advantage of this and define all other analysis intervals, which are not classified as useful signals, as pauses. Such a time period is designated T p in FIG. If there is no requirement for real-time processing, as is the case, for example, in simulations, the method according to the invention can already be considered sufficient for pattern recognition in this form. In practice, the applications mentioned impose real-time criteria and require the earliest possible assignment to the useful signal or the noise signal. The method must therefore be integrated into the recognition process itself, for example.
  • The recognition method is thus expanded in accordance with the invention in such a way that after each analysis step, for example, it is evaluated which of the patterns (for example words) composed of the recognition units is the most probable.
  • The probability that a larger analysis interval contains a signal pause is also calculated, for example.
  • The analysis interval is dimensioned such that it is in any case longer than short pauses in the useful signal, for example plosive pauses. This probability is then compared with that of the most probable pattern, both being related to an equally long time interval. The result of this comparison can already be used as a decision.
  • The information about the presence of a pause that exists in the classifier Klass in the different time slices is supplied to the feature extraction stage Merk.
  • During recognition, a dynamic pattern comparison takes place in which, on the basis of the feature vectors in an analysis window or time slice, an assignment to the pre-trained models is made.
  • A global search strategy, realized e.g. by the Viterbi algorithm, finds the most probable sequence of pre-trained model states that reproduces the incoming sequence of feature vectors [6].
  • Information about pause / non-pause can be tapped at the classifier and fed to a pause detector in another stage.
  • A special hidden Markov model for pause is compared with the incoming feature vectors. If a higher probability results for pause than for other patterns, pause information is passed on, for example, to the feature extraction stage Merk, where it leads to the decision that a pause is currently present. This means that the pause information can also be used to control a pause detector present in the extraction stage so that it is set to pause.
  • This pause decision can, for example, be probability-weighted and is based on a decision that takes into account other sources of knowledge within the method according to the invention.
  • Such other sources of knowledge are, for example, statistics of the measurement signal and the phoneme context of the Viterbi process. Because of the sequential structure of a recognizer, the delay of, for example, one analysis window must be taken into account when the information is fed back to a pause detection stage for noise suppression. If the pause decision of the acoustic-phonetic modeling stage of the speech recognition is combined with common criteria for pause estimation, the pause decision can be improved. If, for example, frame-by-frame recognition of the pauses is added, a further source of knowledge for pause estimation is available in the recognition system.
  • The method according to the invention is contained in a main program delimited by main and end.
  • This main program essentially contains a do loop as the time loop.
  • In a signal_analysis procedure, the measurement signal is transformed into a feature space.
  • A specific time slice of the measurement signal is analyzed, and feature vectors are created from this time slice.
  • The created feature vectors are then analyzed in a subroutine calculate_word_wk.
  • There, the probability for each reference word is calculated, e.g. with hidden Markov models and Viterbi decoding. For example, the joint probability that all previous feature vectors were emitted is calculated.
  • In calculate_pause_wk, the probability for pause is calculated for the last P time steps.
  • The joint probability that the last P feature vectors were emitted by the model for pause is calculated.
  • Pause information is generated if the probability for pause is higher than that for the best word; otherwise no pause information is generated. Here, for example, the probabilities being compared are normalized to the same time span P.
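The time loop described in the last items can be sketched as follows. This is only an illustration under strong assumptions: each model is reduced to a single one-dimensional Gaussian instead of a hidden Markov model decoded with the Viterbi algorithm, and the model parameters, the value of P, and the toy input are invented. Only the names main, signal_analysis, calculate_word_wk and calculate_pause_wk come from the text.

```python
import math

P = 3  # number of most recent time steps used for the pause score (assumed value)

# Hypothetical one-dimensional Gaussian models: (mean, variance).
WORD_MODELS = {"word_a": (5.0, 1.0), "word_b": (9.0, 1.0)}
PAUSE_MODEL = (0.0, 1.0)


def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def signal_analysis(frame):
    """Transform one time slice into a scalar feature: mean absolute amplitude."""
    return sum(abs(s) for s in frame) / len(frame)


def calculate_word_wk(features):
    """Best per-frame joint log probability that a word model emitted all features so far."""
    return max(
        sum(log_gauss(x, mean, var) for x in features) / len(features)
        for mean, var in WORD_MODELS.values()
    )


def calculate_pause_wk(features):
    """Per-frame joint log probability that the pause model emitted the last P features."""
    recent = features[-P:]
    return sum(log_gauss(x, *PAUSE_MODEL) for x in recent) / len(recent)


def main(frames):
    """Time loop over all time slices; returns one pause flag per slice."""
    features, pause_info = [], []
    for frame in frames:
        features.append(signal_analysis(frame))
        # Both scores are per-frame averages, so they refer to the same time span.
        pause_info.append(calculate_pause_wk(features) > calculate_word_wk(features))
    return pause_info


silence, speech = [0.0] * 8, [5.0, -5.0] * 4
flags = main([silence] * 5 + [speech] * 6 + [silence] * 4)
print(flags)
```

Because the pause score is averaged over the last P slices, the flag in this sketch changes a couple of slices after each actual transition, which mirrors the requirement that the analysis interval be longer than short pauses inside the useful signal.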

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
  • Image Analysis (AREA)
  • Radar Systems Or Details Thereof (AREA)

Description

Pattern recognition is becoming increasingly important in many technical processes, since it allows a higher degree of automation to be achieved. Pattern recognition processes can usually be reduced to a time-variable measurement signal that is derived in a suitable manner from the patterns to be recognized. In the automatic analysis of these measurement signals, however, the problem arises that the signals are not present in pure form but are often overlaid by stationary or non-stationary interference signals. When measurement signals derived from naturally spoken language are examined, these interference components can be caused, for example, by background noise, breathing noise, machine noise, or by the recording medium and the transmission path. Because the measurement signal is never present in pure form, it is particularly important to distinguish between the portions of the measurement signal that contain the pattern to be recognized and the portions in which no pattern is present. For better recognition of the patterns it is therefore especially important to know exactly when patterns are present in the measurement signal and when they are not, i.e. when signals not originating from the pattern are present in the measurement signal as pause signals.

Pause detection is also important, for example, for reducing the amount of transmitted data, e.g. on voice communication channels and in satellite transmission, for the general useful-signal/interference-signal decision in signal processing, or for finding the end of an utterance in automatic speech recognition systems. A robust pause detector improves the performance of voice-controlled systems. This applies particularly to speech recognition systems, since the task there is to compare a spoken utterance, as a pattern, with an already existing version. The problem of pause determination, specifically in automatic speech recognition, has been described in detail by Rabiner [1], who also specified an algorithm for pause detection. That algorithm uses information computed directly from the sampled time signal (energy, zero-crossing rate, etc.). This approach is common to all known pause detectors [2]. They generally use a more or less complicated set of rules to classify the pause from the computed features. Alternatively, statistical classifiers have also been used [3]. Because of this approach, all of these methods work only up to a certain interference level; the limit depends on the type of interference. They can no longer be used at low signal-to-noise ratios, because pause detectors are generally threshold-controlled. In noisy environments with very low signal-to-noise ratios, the common threshold-based decision criteria fail. In addition, there are non-stationary disturbances of signal-like character, which are hard to capture.
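The conventional, threshold-controlled approach criticized above can be illustrated with a short sketch. It is not Rabiner's published algorithm: the decision rule, the frame length, both thresholds, and the synthetic test signal are assumptions chosen only to show how a fixed threshold behaves at high and at low signal-to-noise ratio.

```python
import math
import random


def frame_features(samples, frame_len):
    """Return one (energy, zero_crossing_rate) pair per non-overlapping frame."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        features.append((energy, crossings / (frame_len - 1)))
    return features


def threshold_pause_flags(samples, frame_len=80, energy_thr=0.01, zcr_thr=0.25):
    """Rule-based decision: True marks a frame classified as pause.

    A frame counts as signal if its energy exceeds energy_thr, or if it has
    moderate energy together with a high zero-crossing rate. Everything else
    is treated as a pause.
    """
    flags = []
    for energy, zcr in frame_features(samples, frame_len):
        is_signal = energy > energy_thr or (energy > energy_thr / 4 and zcr > zcr_thr)
        flags.append(not is_signal)
    return flags


def synthetic_signal(noise_amplitude, seed=0):
    """30 frames at 8 kHz: noise only, then a 200 Hz tone plus noise, then noise only."""
    rng = random.Random(seed)
    samples = []
    for n in range(30 * 80):
        tone = 0.5 * math.sin(2 * math.pi * 200 * n / 8000) if 800 <= n < 1600 else 0.0
        samples.append(tone + rng.uniform(-noise_amplitude, noise_amplitude))
    return samples


quiet_flags = threshold_pause_flags(synthetic_signal(noise_amplitude=0.01))
noisy_flags = threshold_pause_flags(synthetic_signal(noise_amplitude=0.5))
print(sum(quiet_flags), "pause frames found at high SNR")
print(sum(noisy_flags), "pause frames found at low SNR")
```

With weak noise the fixed thresholds separate the tone from the surrounding pauses; once the noise energy exceeds the threshold, pause frames are reported as signal, which is the failure mode of threshold-based criteria described in the text.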

Previous approaches to speech pause determination use, for example, a local parameter, i.e. one obtained from temporal or spectral frame information, to detect signal and non-signal regions [4,5]. More recently published work is also based primarily on modifications or extensions of this work. Further procedures for pause detection in time-variable measurement signals are not known.

The object underlying the invention is to specify an improved method for detecting pauses between patterns that are present in a measurement signal and that have been modeled with the aid of hidden Markov models.

This object is achieved according to the features of patent claim 1.

Further developments of the invention result from the dependent claims.

One advantage of the method according to the invention is that, for the first time, information that is obtained in different signal processing stages and that occurs successively in time is used for pause detection. That is, the pause information is obtained in a comparison stage by comparing a special pause model with the feature vectors of the measurement signal and is fed back to the feature extraction stage of the pattern recognition, so that the pause state can be taken into account in the measurement signal analysis in a subsequent time slice in the feature extraction stage.

The method according to the invention advantageously exploits the information that certain pattern groups belong together (in the case of words these are groups of phoneme patterns), which ensures that a pause must follow at least after the pattern group. This information is subsequently exploited, advantageously, in the feature extraction stage as the first processing stage of the method.

The method according to the invention also advantageously ensures that a pause must have occurred before the arrival of a pattern sequence to be recognized. This fact is likewise exploited in the pattern recognition.

The method according to the invention can advantageously be combined with known pause detection methods that evaluate properties of the measurement signal in the time domain and in the spectral domain. In this way a higher detection rate can be achieved in the pattern recognition.

Speech patterns, writing patterns or signaling patterns can be analyzed particularly advantageously with the method according to the invention, since they occur in a wide range of technical applications and can be modeled in a suitable manner.

With the method according to the invention it can advantageously be ensured that a pause must be present if no patterns are recognized. This yields a higher detection rate in the pattern recognition, because pause information can thus be made available to the feature extraction stage even more reliably.

The invention is explained in more detail below with reference to figures.

Figure 1 shows a schematic example of a speech recognition system equipped with pause detection.
Figure 2 illustrates the pause detection process using various hidden Markov models.

Figure 1 shows, using an example implemented here as a speech recognition system, how the pause information is detected and passed on, i.e. fed back, by the method according to the invention. The measurement signal, here a speech signal Spr, first enters a feature extraction stage Merk, which corresponds to the first signal processing stage of the method according to the invention. In this first signal processing stage, the spectral features of the speech signal or measurement signal Spr are usually analyzed. These features, which are then output by the feature extraction stage, are designated m in Figure 1. The spectral features m then pass, for example as feature vectors, to a classification stage Klass, in which they are compared with the hidden Markov models HMM. This is where the method according to the invention comes in: the feature vectors obtained from the measurement signals are compared with special hidden Markov models for individual phonemes and for pause states. In the training phase of the hidden Markov models, typical feature vectors are estimated, for example, for the background noise as well as for the useful signal. This makes it possible to distinguish between useful signal and noise signal in every analysis interval during a continuous pattern comparison. Even greater robustness at a very poor signal-to-noise ratio is obtained

  • a) by jointly evaluating many analysis intervals, and
  • b) by recognizing the useful signals, in which case all signals that are not recognized as useful signals can, for example, be assigned to the noise. The invention can advantageously be applied to, and combined with, all known pattern recognition methods.
  • The method according to the invention is based in particular on the fact that the signal states and the feature vectors do not change excessively from one time slice of the analysis interval to the next. Thus, information obtained in the classification stage Klass, for example the finding that the comparison with the hidden Markov models yields a higher probability for pause than for a pattern to be recognized, can be forwarded to the feature extraction stage as pause information Pa. The time slice in which the pause is detected will very probably be followed by a further time slice containing a pause. With this procedure, unwanted disturbances present in the measurement signal can be suppressed with great reliability when the feature vectors are formed, even at a low signal-to-noise ratio. Advantageously, the method according to the invention transmits the knowledge about the pause that is available in the recognition stage in a second time slice to a first signal processing stage. This knowledge can be obtained, for example, from a speech signal via the acoustic-phonetic modeling stage (hidden Markov models), which has already been trained with a set of training data for speech recognition. In phoneme-based systems, the pause is trained along with the other units as the model of a phoneme and thus incorporates the statistics of the training data. The modeling is more refined, and therefore better, when the phoneme context is taken into account, i.e. the knowledge of which phoneme follows another. If, for example, the pause decision of the acoustic-phonetic modeling stage is combined with common criteria for pause estimation, the pause decision can be improved.

    Figure 2 shows the Viterbi paths V1 to V3 for different hidden Markov models. It shows, over time, the relationship between pattern recognition and the presence of a pause between different patterns. First, the measurement signal, which is for example a speech signal, a handwriting signal, or a signal emitted by a signalling method, is transformed into a feature vector space by one or more suitable signal transformations. In a training phase of the pattern recognition method, typical models are estimated, for example, for the background noise and also for the useful signal; these models are subsequently used in the recognition method. For the method according to the invention, the training can be implemented with hidden Markov models, for example. However, the pause recognition method can equally be carried out with other pattern recognition methods, such as dynamic programming or neural networks. If hidden Markov models are used, the distribution functions of the feature vectors can, among other things, be estimated for each recognition unit. In automatic speech recognition, recognition units in this context are speech sounds (phonemes). The method according to the invention was implemented for automatic speech recognition, for example, but it is conceivable that it can be used for any kind of pattern recognition. It only has to be ensured that signal patterns can be provided and that pause states exist in which the interference signals can be determined, so that the hidden Markov models for pause states can be trained. Examples of other pattern recognition applications are the patterns that occur in the form of pressure- or time-dependent writing signals when a document is signed, or signal sequences used in automatic telecommunications signalling methods.

    When the method according to the invention is carried out, a continuous pattern comparison in the recognition phase can, for example, compute the generation probability for each recognition unit in every analysis interval or time slice. A simple solution is to evaluate these probabilities. If the probability for pause, i.e. for the hidden Markov model for pause or its equivalent, is the highest, the analysis interval concerned can be used to re-estimate the distribution functions or for filtering in a noise suppression step.
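The frame-wise rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patent's implementation: the function name `update_noise_estimate`, the model label `"pause"`, the representation of a frame as a list of spectral magnitudes, and the exponential smoothing factor `alpha` are all hypothetical.

```python
def update_noise_estimate(frame_logprobs, frame_spectrum, noise_spectrum, alpha=0.9):
    """Re-estimate the noise spectrum only when the pause model wins the frame.

    frame_logprobs: dict mapping a model name to the log-likelihood of the
                    current frame; the key "pause" is an assumed label.
    frame_spectrum, noise_spectrum: lists of equal length (assumed features).
    alpha: assumed smoothing factor of the running noise estimate.
    Returns (noise_spectrum, is_pause).
    """
    best_model = max(frame_logprobs, key=frame_logprobs.get)
    if best_model != "pause":
        # Useful signal: leave the noise estimate untouched.
        return noise_spectrum, False
    # Pause frame: blend the new frame into the running noise estimate.
    updated = [alpha * n + (1.0 - alpha) * s
               for n, s in zip(noise_spectrum, frame_spectrum)]
    return updated, True
```

The updated estimate could then feed a noise suppression step such as spectral subtraction [4]; which suppression method is used is left open by the text.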

    The method according to the invention becomes even more robust if the result of a pattern recognizer is taken into account as an additional source of knowledge. Assuming, for example, that the pattern recognizer is able to recognize every possible useful signal, the method can exploit this and define all other analysis intervals, which are not classified as useful signal, as pause. Such a time period is designated Tp in Figure 2. If no real-time requirements are placed on the method, as is the case in simulations, for example, this can already be regarded as sufficient for pattern recognition. In practice, the applications mentioned are subject to real-time criteria, and the assignment to useful signal or noise signal must be made as early as possible. The method must therefore be integrated into the recognition process itself, for example. According to the invention, the recognition method is thus extended such that after each analysis step it is evaluated, for example, which of the patterns composed of the recognition units, e.g. words, is the most probable. In addition, the probability that a signal pause is present is computed, for example, over a longer analysis interval. This analysis interval is dimensioned, for example, such that it is in any case longer than short pauses within the useful signal, e.g. plosive pauses. This probability is then compared with that of the most probable pattern, both being referred to a time interval of equal length. The result of this comparison can already be used as a decision.
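The comparison described above, with both probabilities referred to a time interval of equal length, might look as follows. The text does not say how the normalization is done; dividing each joint log-probability by its number of frames is an assumption made here for illustration, and the function and parameter names are hypothetical.

```python
def pause_beats_best_word(word_logprobs, frames_seen, pause_logprob, pause_frames):
    """Compare the pause model with the most probable word on a per-frame basis.

    word_logprobs: dict word -> joint log-probability over all frames so far.
    frames_seen:   number of frames covered by the word scores.
    pause_logprob: joint log-probability of the pause model over its interval.
    pause_frames:  length of that interval in frames (longer than a plosive pause).
    Returns (best_word, pause_is_more_probable).
    """
    best_word = max(word_logprobs, key=word_logprobs.get)
    # Assumed normalization: average log-probability per frame.
    best_per_frame = word_logprobs[best_word] / frames_seen
    pause_per_frame = pause_logprob / pause_frames
    return best_word, pause_per_frame > best_per_frame
```

For instance, a best word with a joint log-probability of -40 over 20 frames averages -2 per frame, so a pause model scoring -5 over its last 5 frames (an average of -1) would win the comparison.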

    Even higher demands are placed on speech recognition systems, for example. There it must be avoided that the recognizer switches off prematurely and consequently outputs a wrong word. In Figure 1 the recognizer is designated Klass. Such cases occur particularly with non-stationary noise. This can be prevented by an additional condition, for example: a signal pause is recognized as the end of a word only if, in addition to the criterion described above, the most probable word has remained the most probable word throughout a certain time span. This time span is designated TST in Figure 2. Combining these two criteria gives high reliability in pause recognition, which is important for the reliable operation of a speech recognizer.

    The basic idea is to exploit, in a pattern recognition system, the knowledge sources available at different levels in the signal processing stages in order to detect a pause. These include, for example,

    • properties of the signal in the time domain, such as zero-crossing rate and level, and
    • properties in the spectral domain, e.g. the power and the correlation measure, including the logarithmic and/or feature domain.
    • In addition, the method according to the invention detects the pause by implementing a feedback path from the recognition stage to the feature extraction stage.
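The two time-domain properties named in the list, zero-crossing rate and level, can be computed per frame as sketched below. The frame representation (a list of samples scaled to about ±1.0), the dB reference, and the function names are assumptions for illustration; the text does not fix any of them.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def level_db(frame, floor=1e-12):
    """Mean power of the frame in dB relative to a full-scale value of 1.0.

    `floor` avoids log(0) for silent or empty frames (assumed safeguard).
    """
    if not frame:
        return 10.0 * math.log10(floor)
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(power, floor))
```

A conventional detector of this kind would compare both values with thresholds estimated from the background noise, in the spirit of the endpoint detection in [1].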

    As a result, the information available in the classifier Klass in the various time slices about the presence of a pause is fed to the feature extraction stage Merk. During recognition, a dynamic pattern comparison takes place, for example, in which the feature vectors in an analysis window or time slice are assigned to the pre-trained models. A global search strategy, such as that implemented by the Viterbi algorithm, finds the most probable sequence of pre-trained model states that reproduces the incoming sequence of feature vectors [6].
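As a reminder of what the Viterbi search computes, here is a compact log-domain sketch for a generic hidden Markov model. It is a textbook formulation, not code from the patent; the argument layout (nested lists of log-probabilities) is an assumption chosen to keep the example self-contained.

```python
import math

def viterbi(log_init, log_trans, log_emit):
    """Most probable state sequence of an HMM, computed in the log domain.

    log_init[s]     : log-probability of starting in state s
    log_trans[r][s] : log-probability of moving from state r to state s
    log_emit[t][s]  : log-likelihood of the feature vector at time t in state s
    Returns (path, log_probability_of_path).
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backpointers = []
    for t in range(1, len(log_emit)):
        prev_best, new_score = [], []
        for s in range(n_states):
            # Best predecessor of state s at time t.
            r_best = max(range(n_states), key=lambda r: score[r] + log_trans[r][s])
            prev_best.append(r_best)
            new_score.append(score[r_best] + log_trans[r_best][s] + log_emit[t][s])
        backpointers.append(prev_best)
        score = new_score
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    best_logprob = score[state]
    path = [state]
    for prev_best in reversed(backpointers):
        state = prev_best[state]
        path.append(state)
    path.reverse()
    return path, best_logprob
```

If one state of the model stands for pause, the frames that the best path assigns to that state are the candidate pause frames.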

    In every time window the pause/non-pause information can thus be tapped at the classifier Klass and fed to a pause detector in another stage. In the method according to the invention this is implemented, for example, by comparing a special hidden Markov model for pause with the incoming feature vectors in the classifier. If a higher probability results for pause than for other patterns, pause information is passed, for example, to the feature extraction stage Merk, where it leads to the decision that a pause is currently present. In other words, this pause information can also be used to drive a pause detector already present in the extraction stage so that it is set to pause. This pause decision can be probability-weighted, for example, and is based on a decision that takes other knowledge sources within the method into account. Such other knowledge sources are, for example, the statistics of the measurement signal and the phoneme context from the Viterbi method. Because of the sequential structure of a recognizer, feeding the information back to a pause detection stage for noise suppression means that, for example, the delay of one analysis window must be taken into account. Combining the pause decision of the acoustic-phonetic modelling stage in speech recognition with common criteria for pause estimation improves the pause decision. If, for example, frame-wise detection of pauses is abandoned altogether, a further knowledge source in the recognition system can be exploited for pause estimation.
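The one-window delay of the feedback path can be made explicit with a small loop. The callables `extract` and `classify` are hypothetical stand-ins for the stages Merk and Klass; only the timing of the pause hint is the point of this sketch.

```python
def run_with_feedback(frames, extract, classify):
    """Feed the classifier's pause decision back to the feature extraction stage.

    extract(frame, pause_hint) -> feature vector (stand-in for stage Merk)
    classify(feature)          -> True if the pause model is the most probable
                                  (stand-in for stage Klass)
    Returns the pause hint that the extraction stage saw in each time slice.
    """
    pause_hint = False  # no classifier knowledge exists before the first slice
    hints_seen = []
    for frame in frames:
        hints_seen.append(pause_hint)
        feature = extract(frame, pause_hint)
        # The decision for this slice reaches the extractor only in the next one.
        pause_hint = classify(feature)
    return hints_seen
```

With a toy classifier that calls every low-valued frame a pause, the hints seen by the extraction stage lag the decisions by exactly one time slice, which is the delay the text asks to be taken into account.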

    For example, various contiguous and related patterns can be detected as a whole, and conclusions can be drawn from them about pauses present in the measurement signal. Such a global pause detector can, for example, provide its information about the entire pattern or pattern sequence to be recognized. In speech recognition, such a pattern sequence would be a word to be recognized, for example. All regions outside this pattern sequence can thus be recognized as pause. This has the advantage that even current disturbances enter into the pause detection. The method therefore still works at very high interference levels and is thus more robust. Inherently, a larger time delay has to be accepted before a decision is available. This global pause detection stage should therefore be used in particular together with buffering of the signal. It is particularly suitable for conditioning the measurement signal and can serve in particular to detect the separating pauses between individual words or pattern sequences to be recognized. In summary, a pattern recognition and pause recognition system according to the invention can be described by the following stages:

  • 1. Consideration of the signal characteristics in the time domain (e.g. zero-crossing rate, level);
  • 2. Additional consideration of the properties in the spectral domain (e.g. power, correlation measure), including the logarithmic and/or feature domain;
  • 3. Additional consideration of the frame-wise pattern comparison with pre-trained pause models;
  • 4. Additional consideration of the feedback of the decision of the pause detector integrated in the global recognition.
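The global pause detector of stage 4 can be illustrated on a buffered signal: once a pattern sequence has been located, every buffered frame outside it is labelled as pause and can be used, for example, to estimate the noise level. The function names, the half-open frame interval, and the use of a simple mean are assumptions for this sketch.

```python
def label_pauses(n_frames, start, end):
    """Mark every buffered frame outside the recognized pattern sequence as pause.

    The pattern sequence is assumed to occupy frames start .. end-1.
    """
    if not 0 <= start <= end <= n_frames:
        raise ValueError("pattern boundaries must lie inside the buffer")
    return [not (start <= t < end) for t in range(n_frames)]

def noise_level_from_pauses(levels, is_pause):
    """Average the per-frame levels over the frames labelled as pause."""
    pause_levels = [lv for lv, p in zip(levels, is_pause) if p]
    if not pause_levels:
        return None  # no pause frame available in the buffer
    return sum(pause_levels) / len(pause_levels)
```

Because the labels are available only after the whole pattern sequence has been recognized, this variant needs the buffering mentioned in the text and trades delay for robustness.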
    For example, one embodiment of the method according to the invention is described by the pseudocode shown in Table 1.

    Figure 00110001
    Figure 00120001

    For example, the method according to the invention is implemented in a main program delimited by main and end. This main program essentially contains a do loop as the time loop. A procedure signal_analyse transforms the measurement signal into a feature domain. For example, one particular time slice of the measurement signal is analysed and feature vectors are created from this time slice. The feature vectors are then analysed in a subroutine berechne_wort_wk. There, for example, the probability is computed for each reference word, e.g. with hidden Markov models and Viterbi decoding. In doing so, the joint probability that all feature vectors so far were emitted is computed, for example. In a further subroutine berechne_pause_wk, the probability for pause is computed for the last P time steps.

    Here too, the joint probability is computed that the last P feature vectors were emitted by the model for pause. In a further subroutine pausedetektor, pause information is generated if the probability for pause is higher than for the best word; otherwise no pause information is generated. Here, for example, the probabilities involved are normalized to the same duration P. In a further test, if (pause && wort_stabil > x) break, the loop is terminated when a pause has been detected by pausedetektor and the best word has been stable without interruption for at least x time steps (wort_stabil). The subroutine ausgabe then outputs the recognized pattern sequence, which in speech recognition is a word.
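Since Table 1 is available only as an image, the following Python sketch reconstructs the described control flow from the prose alone. It is not the patent's pseudocode. The routine names follow the text, but their signatures, the use of log-probabilities, the per-frame normalization in `pausedetektor`, and the way `wort_stabil` is counted are assumptions; the models are passed in as callables, so no particular HMM implementation is implied.

```python
def pausedetektor(best_word_logprob, n_frames, pause_logprob, n_pause_frames):
    """Pause information is generated if pause is more probable than the best word.

    Assumed normalization: both log-probabilities are averaged per frame so that
    they refer to the same duration.
    """
    return pause_logprob / n_pause_frames > best_word_logprob / n_frames

def recognize(frames, signal_analyse, berechne_wort_wk, berechne_pause_wk, P, x):
    """Time loop: stop once a pause is detected and the best word is stable.

    signal_analyse(frame)       -> feature vector of one time slice
    berechne_wort_wk(features)  -> dict word -> joint log-probability so far
    berechne_pause_wk(features) -> joint log-probability of the pause model
                                   for the given (last P) feature vectors
    Returns (best_word, number_of_frames_consumed); the caller plays the role
    of the routine ausgabe.
    """
    features = []
    best_word, wort_stabil = None, 0
    for frame in frames:
        features.append(signal_analyse(frame))
        word_logprobs = berechne_wort_wk(features)
        last = features[-P:]
        pause_logprob = berechne_pause_wk(last)
        current = max(word_logprobs, key=word_logprobs.get)
        # Count how long the same word has been the most probable one.
        wort_stabil = wort_stabil + 1 if current == best_word else 0
        best_word = current
        pause = pausedetektor(word_logprobs[current], len(features),
                              pause_logprob, len(last))
        if pause and wort_stabil > x:
            break
    return best_word, len(features)
```

Passing the scoring routines as parameters keeps the sketch independent of how the word and pause models are trained; in the patent's setting they would be backed by hidden Markov models and Viterbi decoding.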

    Literature

  • [1] Rabiner, L.R. and M. Sambur (1975). An algorithm for determining the endpoints of isolated utterances. The Bell System Technical Journal, 54(2): 297-315.
  • [2] Hansen, J.H. (1991). Speech enhancement employing boundary detection and morphological based spectral constraints. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 901-904, Toronto.
  • [3] Katterfeldt, H. Sprachbestimmung mit Polynom Klassifikatoren. Proceedings Mustererkennung 7, DAGM-Symposium, Erlangen, pp. 180-184.
  • [4] Boll, S. (1979). Suppression of acoustic noise in speech using spectral subtraction. IEEE Transactions on Acoustics, Speech and Signal Processing, 31(3): 678-684.
  • [5] Widrow, B., J. Glover, J. McCool, J. Kaunitz (1975). Adaptive noise cancelling: Principles and applications. Proceedings of the IEEE, 63(12): 1692-1716.
  • [6] Rabiner, L.R. and B.H. Juang (1986). An introduction to hidden Markov models. IEEE Transactions on Acoustics, Speech and Signal Processing, (1): 4-16.
  • Claims (11)

    1. Method for identifying a signal pause between two patterns which are present in a measurement signal that varies with time and are identified with the aid of hidden Markov models, having the following features:
      a) in a first signal processing stage (Merk), feature vectors (m) are formed periodically for pattern identification, which feature vectors (m) describe the signal waveform of the measurement signal (Spr) within a time slice, and a pause detector contained therein does not detect any speech pause in a first time slice on the basis of features of a first feature vector which are present,
      b) in a second signal processing stage (Klass) of the method, and in a second time slice which follows the first time slice, the first feature vector is compared with at least two hidden Markov models (HMM), at least one of which has been trained for a pattern which is to be identified and a further one of which has been trained for a pattern which is characteristic of a pause,
      c) if the comparison of the first feature vector (m) with the hidden Markov models (HMM) results in a greater probability of the presence of a pause, then the information about the presence of a pause, the pause information (Pa), is passed to the pause detector in the first signal processing stage, and the measurement signal is dealt with as a signal pause there, at least in the second time slice.
    2. Method according to Claim 1, in which a defined sequence of patterns, a pattern sequence, is to be identified and in which the pause information is passed on after identifying the pattern sequence over a number of time slices, so that the measurement signal is dealt with as a signal pause, and not as a pattern to be identified, in the first signal processing stage, at least in the time slice following the pattern sequence.
    3. Method according to Claim 2, in which feature vectors are buffer-stored until a pattern sequence has been identified and in which the pause information is passed on after identifying the pattern sequence, so that the measurement signal is dealt with as a signal pause, and not as a pattern to be identified, in the first signal processing stage, at least in the time slice before the pattern sequence.
    4. Method according to one of the preceding claims, in which characteristics of the measurement signal are evaluated in the time domain in the first signal processing stage for pause identification.
    5. Method according to one of the preceding claims, in which characteristics of the measurement signal are evaluated in the frequency domain in the first signal processing stage for pause identification.
    6. Method according to one of the preceding claims, in which context-modelled hidden Markov models are used.
    7. Method according to one of the preceding claims, in which the measurement signal represents spoken voice information.
    8. Method according to Claim 7, in which disturbances in the feature extraction stage of a voice processing system are suppressed.
    9. Method according to one of Claims 7 or 8, in which channel adaptation of a voice channel is carried out.
    10. Method according to one of Claims 1 to 6, in which the measurement signal represents writing movements on a surface.
    11. Method according to one of Claims 1 to 6, in which the measurement signal represents signal sequences of an information signalling method.
    EP96905679A 1995-03-10 1996-03-04 Method of detecting a pause between two signal patterns on a time-variable measurement signal Expired - Lifetime EP0815553B1 (en)

    Applications Claiming Priority (3)

    Application Number Priority Date Filing Date Title
    DE19508711 1995-03-10
    DE19508711A DE19508711A1 (en) 1995-03-10 1995-03-10 Method for recognizing a signal pause between two patterns which are present in a time-variant measurement signal
    PCT/DE1996/000379 WO1996028808A2 (en) 1995-03-10 1996-03-04 Method of detecting a pause between two signal patterns on a time-variable measurement signal

    Publications (2)

    Publication Number Publication Date
    EP0815553A2 EP0815553A2 (en) 1998-01-07
    EP0815553B1 EP0815553B1 (en) 1999-06-02

    Family

    ID=7756346

    Family Applications (1)

    Application Number Title Priority Date Filing Date
    EP96905679A Expired - Lifetime EP0815553B1 (en) 1995-03-10 1996-03-04 Method of detecting a pause between two signal patterns on a time-variable measurement signal

    Country Status (4)

    Country Link
    US (1) US5970452A (en)
    EP (1) EP0815553B1 (en)
    DE (2) DE19508711A1 (en)
    WO (1) WO1996028808A2 (en)

    Families Citing this family (19)

    * Cited by examiner, † Cited by third party
    Publication number Priority date Publication date Assignee Title
    DE19705471C2 (en) * 1997-02-13 1998-04-09 Sican F & E Gmbh Sibet Method and circuit arrangement for speech recognition and for voice control of devices
    DE19824355A1 (en) * 1998-05-30 1999-12-02 Philips Patentverwaltung Apparatus for verifying time dependent user specific signals
    DE19824353A1 (en) * 1998-05-30 1999-12-02 Philips Patentverwaltung Device for verifying signals
    DE19824354A1 (en) * 1998-05-30 1999-12-02 Philips Patentverwaltung Device for verifying signals
    US6418411B1 (en) * 1999-03-12 2002-07-09 Texas Instruments Incorporated Method and system for adaptive speech recognition in a noisy environment
    DE19939102C1 (en) * 1999-08-18 2000-10-26 Siemens Ag Speech recognition method for dictating system or automatic telephone exchange
    DE10033104C2 (en) * 2000-07-07 2003-02-27 Siemens Ag Methods for generating statistics of phone durations and methods for determining the duration of individual phones for speech synthesis
    US20020042709A1 (en) * 2000-09-29 2002-04-11 Rainer Klisch Method and device for analyzing a spoken sequence of numbers
    JP4759827B2 (en) * 2001-03-28 2011-08-31 日本電気株式会社 Voice segmentation apparatus and method, and control program therefor
    WO2003054856A1 (en) * 2001-12-21 2003-07-03 Telefonaktiebolaget Lm Ericsson (Publ) Method and device for voice recognition
    US20080249779A1 (en) * 2003-06-30 2008-10-09 Marcus Hennecke Speech dialog system
    JP3909709B2 (en) * 2004-03-09 2007-04-25 インターナショナル・ビジネス・マシーンズ・コーポレーション Noise removal apparatus, method, and program
    DE102004023824B4 (en) * 2004-05-13 2006-07-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for evaluating a quality class of an object to be tested
    US20070033041A1 (en) * 2004-07-12 2007-02-08 Norton Jeffrey W Method of identifying a person based upon voice analysis
    US20090327036A1 (en) * 2008-06-26 2009-12-31 Bank Of America Decision support systems using multi-scale customer and transaction clustering and visualization
    US8255218B1 (en) * 2011-09-26 2012-08-28 Google Inc. Directing dictation into input fields
    US8543397B1 (en) 2012-10-11 2013-09-24 Google Inc. Mobile device voice activation
    US9473094B2 (en) * 2014-05-23 2016-10-18 General Motors Llc Automatically controlling the loudness of voice prompts
    US11283586B1 (en) 2020-09-05 2022-03-22 Francis Tiong Method to estimate and compensate for clock rate difference in acoustic sensors

    Family Cites Families (12)

    * Cited by examiner, † Cited by third party
    Publication number Priority date Publication date Assignee Title
    US4481593A (en) * 1981-10-05 1984-11-06 Exxon Corporation Continuous speech recognition
    US4587670A (en) * 1982-10-15 1986-05-06 At&T Bell Laboratories Hidden Markov model speech recognition arrangement
    US4713777A (en) * 1984-05-27 1987-12-15 Exxon Research And Engineering Company Speech recognition method having noise immunity
    US4811399A (en) * 1984-12-31 1989-03-07 Itt Defense Communications, A Division Of Itt Corporation Apparatus and method for automatic speech recognition
    FR2581465B1 (en) * 1985-05-03 1988-05-20 Telephonie Ind Commerciale METHOD AND DEVICE FOR CONTROLLING PROCESS BY SOUND PROCESS
    US5226091A (en) * 1985-11-05 1993-07-06 Howell David N L Method and apparatus for capturing information in drawing or writing
    DE3784168T2 (en) * 1987-09-23 1993-09-16 IBM Digital packet switching networks
    JP2573352B2 (en) * 1989-04-10 1997-01-22 富士通株式会社 Voice detection device
    JPH04362698A (en) * 1991-06-11 1992-12-15 Canon Inc Method and device for voice recognition
    US5293452A (en) * 1991-07-01 1994-03-08 Texas Instruments Incorporated Voice log-in using spoken name input
    US5465317A (en) * 1993-05-18 1995-11-07 International Business Machines Corporation Speech recognition system with improved rejection of words and sounds not in the system vocabulary
    JPH06332492A (en) * 1993-05-19 1994-12-02 Matsushita Electric Ind Co Ltd Method and device for voice detection

    Also Published As

    Publication number Publication date
    US5970452A (en) 1999-10-19
    WO1996028808A2 (en) 1996-09-19
    DE59602095D1 (en) 1999-07-08
    DE19508711A1 (en) 1996-09-12
    EP0815553A2 (en) 1998-01-07
    WO1996028808A3 (en) 1996-10-24

    Similar Documents

    Publication Publication Date Title
    EP0815553B1 (en) Method of detecting a pause between two signal patterns on a time-variable measurement signal
    EP0604476B1 (en) Process for recognizing patterns in time-varying measurement signals
    EP1733223B1 (en) Device and method for assessing the quality class of an object to be tested
    DE69816610T2 (en) METHOD AND DEVICE FOR NOISE REDUCTION, ESPECIALLY WITH HEARING AIDS
    DE69433254T2 (en) Method and device for speech detection
    DE69720087T2 (en) Method and device for suppressing background music or noise in the input signal of a speech recognizer
    DE69823954T2 (en) Source-normalizing training for language modeling
    EP1084490B1 (en) Arrangement and method for computer recognition of a predefined vocabulary in spoken language
    EP0076233B1 (en) Method and apparatus for redundancy-reducing digital speech processing
    EP0987683B1 (en) Speech recognition method with confidence measure
    EP1386307B1 (en) Method and device for determining a quality measure for an audio signal
    DE10030105A1 (en) Speech recognition device
    EP0747880B1 (en) System for speech recognition
    WO1996029695A1 (en) Speech recognition process and device for languages containing composite words
    DE60018696T2 (en) ROBUST LANGUAGE PROCESSING OF CHARACTERED LANGUAGE MODELS
    DE102010040553A1 (en) Speech recognition method
    EP0813734B1 (en) Method of recognising at least one defined pattern modelled using hidden markov models in a time-variable test signal on which at least one interference signal is superimposed
    DE102019102414B4 (en) Method and system for detecting fricatives in speech signals
    WO2005069278A1 (en) Method and device for processing a voice signal for robust speech recognition
    EP0817167B1 (en) Speech recognition method and device for carrying out the method
    DE10308611A1 (en) Determination of the likelihood of confusion between vocabulary entries in phoneme-based speech recognition
    EP0540535B1 (en) Process for speaker adaptation in an automatic speech-recognition system
    DE3935308C1 (en) Speech recognition method by digitising microphone signal - using delta modulator to produce continuous of equal value bits for data reduction
    EP0834860B1 (en) Speech recognition method with context dependent hidden Markov models
    DE10244699A1 (en) Voice activity determining method for detecting phrases operates in a portion of an audio signal through phrase detection based on thresholds

    Legal Events

    Date Code Title Description
    PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

    Free format text: ORIGINAL CODE: 0009012

    17P Request for examination filed

    Effective date: 19970902

    AK Designated contracting states

    Kind code of ref document: A2

    Designated state(s): DE FR GB IT

    GRAG Despatch of communication of intention to grant

    Free format text: ORIGINAL CODE: EPIDOS AGRA

    GRAG Despatch of communication of intention to grant

    Free format text: ORIGINAL CODE: EPIDOS AGRA

    GRAH Despatch of communication of intention to grant a patent

    Free format text: ORIGINAL CODE: EPIDOS IGRA

    17Q First examination report despatched

    Effective date: 19980921

    GRAH Despatch of communication of intention to grant a patent

    Free format text: ORIGINAL CODE: EPIDOS IGRA

    GRAA (expected) grant

    Free format text: ORIGINAL CODE: 0009210

    AK Designated contracting states

    Kind code of ref document: B1

    Designated state(s): DE FR GB IT

    REF Corresponds to:

    Ref document number: 59602095

    Country of ref document: DE

    Date of ref document: 19990708

    ET Fr: translation filed
    GBT Gb: translation of ep patent filed (gb section 77(6)(a)/1977)

    Effective date: 19990715

    ITF It: translation for a ep patent filed

    Owner name: STUDIO JAUMANN P. & C. S.N.C.

    PLBE No opposition filed within time limit

    Free format text: ORIGINAL CODE: 0009261

    STAA Information on the status of an ep patent application or granted ep patent

    Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

    26N No opposition filed
    PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

    Ref country code: GB

    Payment date: 20010309

    Year of fee payment: 6

    PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

    Ref country code: FR

    Payment date: 20010327

    Year of fee payment: 6

    REG Reference to a national code

    Ref country code: GB

    Ref legal event code: IF02

    PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

    Ref country code: GB

    Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

    Effective date: 20020304

    GBPC Gb: european patent ceased through non-payment of renewal fee

    Effective date: 20020304

    PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

    Ref country code: FR

    Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

    Effective date: 20021129

    REG Reference to a national code

    Ref country code: FR

    Ref legal event code: ST

    PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

    Ref country code: IT

    Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES;WARNING: LAPSES OF ITALIAN PATENTS WITH EFFECTIVE DATE BEFORE 2007 MAY HAVE OCCURRED AT ANY TIME BEFORE 2007. THE CORRECT EFFECTIVE DATE MAY BE DIFFERENT FROM THE ONE RECORDED.

    Effective date: 20050304

    REG Reference to a national code

    Ref country code: DE

    Ref legal event code: R081

    Ref document number: 59602095

    Country of ref document: DE

    Owner name: LANTIQ DEUTSCHLAND GMBH, DE

    Free format text: FORMER OWNER: INFINEON TECHNOLOGIES AG, 85579 NEUBIBERG, DE

    Effective date: 20110325

    PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

    Ref country code: DE

    Payment date: 20150320

    Year of fee payment: 20

    REG Reference to a national code

    Ref country code: DE

    Ref legal event code: R071

    Ref document number: 59602095

    Country of ref document: DE