EP0110467B1 - Arrangement for the detection of speech intervals - Google Patents


Publication number
EP0110467B1
EP0110467B1 EP83201638A EP83201638A EP0110467B1 EP 0110467 B1 EP0110467 B1 EP 0110467B1 EP 83201638 A EP83201638 A EP 83201638A EP 83201638 A EP83201638 A EP 83201638A EP 0110467 B1 EP0110467 B1 EP 0110467B1
Authority
EP
European Patent Office
Prior art keywords
value
short
arrangement
estimate
time mean
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
EP83201638A
Other languages
German (de)
French (fr)
Other versions
EP0110467A1 (en)
EP0110467B2 (en)
Inventor
Bernd Dipl.-Ing. Selbach
Peter Dr. Ing. Vary
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Philips Kommunikations Industrie AG
Koninklijke Philips NV
Original Assignee
Philips Kommunikations Industrie AG
Philips Gloeilampenfabrieken NV
Koninklijke Philips Electronics NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Family has litigation
First worldwide family litigation filed: https://patents.darts-ip.com/?family=6178780&utm_source=google_patent&utm_medium=platform_link&utm_campaign=public_patent_search&patent=EP0110467(B1) ("Global patent litigation dataset" by Darts-ip is licensed under a Creative Commons Attribution 4.0 International License.)
Application filed by Philips Kommunikations Industrie AG, Philips Gloeilampenfabrieken NV, Koninklijke Philips Electronics NV
Publication of EP0110467A1
Application granted
Publication of EP0110467B1
Publication of EP0110467B2
Anticipated expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L25/87 Detection of discrete points within a voice signal
    • G10L2025/783 Detection of presence or absence of voice signals based on threshold decision
    • G10L2025/786 Adaptive threshold

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Noise Elimination (AREA)
  • Analogue/Digital Conversion (AREA)
  • Telephone Function (AREA)

Description

The invention relates to an arrangement for detecting speech pauses in a speech signal, according to the preamble of claim 1.

Such arrangements are, for example, a prerequisite for suppressing interference signals when telephoning from an acoustically disturbed environment. During a speech pause, characteristic parameters of the interference signal are measured and then used, before transmission, to filter the interference as completely as possible out of the signal to be transmitted by means of adaptive filters.

US-A-4,357,491 discloses a circuit arrangement for detecting speech pauses in a speech signal in which a short-term mean value is determined at specific clock instants of a clock. The circuit arrangement known from it has one fixed threshold and two adaptively tracked thresholds; the sign of the current slope of the speech signal is used in tracking the thresholds. The adaptive noise thresholds are changed by constant amounts, so they are not determined as a function of their own values at preceding clock instants. Such a circuit arrangement is preferably used for detecting speech pauses in a speech signal on which only weak interference signals are superimposed.

DE-A-26 23 025 discloses a method for analysing a signal in which the estimate of the short-term power of the channel signal is compared both with two constant threshold levels ptm and ptl and with the estimate itself delayed by the time Δt.

DE-OS 26 23 025 contains no suggestion of adaptively tracking a noise threshold.

DE-B-2 455 477, column 10, discloses an arrangement in analog technology for detecting speech pauses that operates as follows: the speech signal is divided into sections of equal length, and for each section a voltage value proportional to the mean loudness of the section is obtained by rectification and averaging. Averaging over several speech sections then yields a further voltage value that is proportional to the mean loudness of the conversation. A comparison of the two mean values decides whether or not a section belongs to a speech pause.

Among other things, this pause detection does not take into account that unvoiced sounds, for example, cause a drop in power in the speech signal, so that the speech sections concerned are wrongly regarded as speech pauses. In the known arrangement, such wrong decisions occur more frequently the more strongly the speech signal is overlaid with interference signals.

It is therefore the object of the invention to specify an arrangement for detecting the pauses in a disturbed speech signal in which wrong decisions in the sense explained above are avoided. In addition, the arrangement is to permit speech pause detection even when the mean noise power is not constant but changes slowly.

This object is achieved by the features specified in the characterizing part of claim 1. Advantageous embodiments are specified in the dependent claims.

The invention is explained in more detail with reference to the figures.

The figures show:

  • Figure 1: a block diagram of the arrangement according to the invention
  • Figures 2, 3 and 4: diagrams explaining the operation of the arrangement according to the invention

In the block diagram of Fig. 1, an analog-to-digital converter A/D obtains samples x(k) at sampling instants kT_o from the disturbed speech signal applied to a terminal E, where k is a natural number and 1/T_o is the sampling frequency. The samples are passed on to an averager M.

At all clock instants T(n), spaced mT_o apart, the averager M generates a so-called short-term mean value from the magnitudes of m consecutive samples:

$$G(n) = \frac{1}{m} \sum_{k=(n-1)m+1}^{nm} |x(k)|, \qquad n = 1, 2, 3, \ldots$$

The arithmetic mean of the magnitudes of the samples is used as the mean value because it requires less hardware to compute than, for example, the root mean square. Each short-term mean value G(n) is approximately a measure of the mean power of the disturbed speech signal over a period of about 100 ms. This figure, together with the sampling frequency, also fixes the number m of samples required to determine one short-term mean value G(n). If, for example, the disturbed speech signal is sampled at 10 kHz, m must be about 1000. Each of the quantities G(1), G(2), ... thus results from about a thousand consecutive samples.
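The block averaging described above can be sketched in a few lines of Python. This is a minimal illustration, not the circuit of the patent: the function names `block_length` and `short_term_means` are hypothetical, and dropping a trailing incomplete block is an assumption made here for simplicity.

```python
def block_length(sampling_rate_hz, window_s=0.1):
    """Number of samples m per short-term mean for a window of about 100 ms."""
    return round(sampling_rate_hz * window_s)


def short_term_means(samples, m):
    """Return G(1), G(2), ...: the mean magnitude of each block of m samples."""
    if m <= 0:
        raise ValueError("m must be positive")
    means = []
    # Assumption: only complete blocks are used; a trailing partial block is dropped.
    for start in range(0, len(samples) - m + 1, m):
        block = samples[start:start + m]
        means.append(sum(abs(x) for x in block) / m)
    return means
```

With a sampling frequency of 10 kHz, `block_length(10_000)` gives m = 1000, matching the figure quoted in the text.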

The unit GL in Fig. 1 smooths the sequence of short-term mean values G(n). The purpose and method of smoothing are described further below.

In parallel with the smoothing, block PA in Fig. 1 determines from the short-term mean values an estimate P(n) of the mean noise power, i.e. of the mean power of the interference signal. Details of the estimate P(n) are likewise given further below. A comparator V in Fig. 1 compares a threshold S, which depends on the estimate P(n), with the smoothed short-term mean values GG(n). If the smoothed short-term mean value GG(n) is smaller than the threshold S, a signal is passed to a unit EN. If the unit EN has received such a signal at, for example, two consecutive clock instants T(n-1) and T(n), it in turn indicates the presence of a speech pause by a signal of its own at a terminal A.

Diagram a) of Fig. 2 shows a possible output signal AM of the averager M, i.e. a possible sequence of short-term mean values G(1), G(2), ... In diagram a) the output signal AM is normalized so that its absolute maximum takes the value 1. The amplitude thresholds drawn in are the estimate P(n) (lower threshold, dashed) and the threshold S (upper threshold, solid). Diagram b) shows schematically the associated speech signal S with its true pauses P. If pauses were determined simply from the signal falling below the upper amplitude threshold in diagram a), as depicted in diagram c), a large number of wrong decisions would result, as a comparison of diagrams b) and c) shows. Shifting the upper threshold downward would mean that the power drops in diagram c) that are not due to speech pauses would no longer be indicated, but the information about the pause lengths would then be considerably falsified.

For this reason, the arrangement according to the invention smooths the output signal AM before the pause decision, either with a linear digital filter, which derives one value GG(n) of the smoothed signal from three consecutive short-term mean values G(n), G(n-1) and G(n-2), or with a median filter.

For linear filtering, a filter with the coefficients 1/4, 1/2 and 1/4 has proved favorable.
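As an illustration, the three-tap linear smoothing might look as follows in Python. The function name `smooth_linear` is hypothetical, and the handling of the first two values (repeating G(1) backwards so that the output has the same length as the input) is an assumption; the text does not say how the filter is started.

```python
def smooth_linear(g, coeffs=(0.25, 0.5, 0.25)):
    """GG(n) = c0*G(n) + c1*G(n-1) + c2*G(n-2), a causal three-tap FIR filter."""
    out = []
    for n in range(len(g)):
        # Assumption: before the first value the sequence is extended by
        # repeating g[0], so every output has a full set of three taps.
        taps = [g[max(n - i, 0)] for i in range(len(coeffs))]
        out.append(sum(c * v for c, v in zip(coeffs, taps)))
    return out
```

Because the coefficients sum to one, a constant input passes through unchanged, while a dip lasting a single clock interval is reduced and spread over three output values.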

In median filtering, for example, five consecutive short-term mean values G(n) ... G(n-4) are sorted by magnitude and the middle value is then read out as the output value GG(n) of the filter. Diagram a) of Fig. 3 shows the output signal of the averager M after smoothing with a linear digital filter. Diagram b) again shows schematically the true speech sections and the true pauses of the speech signal, and diagram c) shows the speech sections and speech pauses obtained analogously to diagram c) of Fig. 2. Linear smoothing has considerably reduced the number of wrong decisions, as a comparison of Fig. 2 and Fig. 3 shows. Smoothing with a median filter also reduces the number of wrong decisions, as diagram c) of Fig. 4 shows.
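The median variant can be sketched in the same style. Again, `smooth_median` is a hypothetical name, and padding the start of the sequence with its first value is an assumption not taken from the text.

```python
def smooth_median(g, width=5):
    """GG(n) = median of G(n), G(n-1), ..., G(n-width+1)."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd number")
    out = []
    for n in range(len(g)):
        # Assumption: the sequence is extended backwards by repeating g[0].
        window = sorted(g[max(n - i, 0)] for i in range(width))
        out.append(window[width // 2])  # middle element of the sorted window
    return out
```

A power dip lasting one or two clock intervals never reaches the middle of the sorted five-value window, so it is removed completely instead of being smeared out as with the linear filter.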

A further measure to avoid misinterpreting shorter power drops in the disturbed speech signal as pauses is to regard a power drop as a speech pause only when, for example, the signal has fallen below the upper amplitude threshold in Fig. 2, 3 or 4 twice in succession.

The amplitude thresholds drawn in Figs. 2, 3 and 4 are determined, as already indicated above, by the unit PA in Fig. 1. First, the estimate P(n) of the noise power is determined for each instant T(n). This quantity is intended to be an approximate measure of the mean power of the interference signal, with an averaging time of the order of one second.

Because the estimate P(n) of the noise power is updated to a current value during longer speech pauses (their detection is discussed further below), the arrangement according to the invention still delivers good results when the above-mentioned mean power of the interference signal changes only slowly, i.e. when it can be regarded as stationary over time intervals of one to two seconds.

If the instant T(n) falls within a longer speech pause, the estimate P(n) is redetermined as a linear combination of the preceding estimate P(n-1) and the short-term mean value G(n) according to the equation

$$P(n) = \alpha \, P(n-1) + (1 - \alpha) \, G(n)$$

The value of the constant α in this equation lies between zero and one. A typical value for α is 0.5. If no longer speech pause is present, the preceding estimate is retained, i.e. P(n) = P(n-1) is set. At the start of pause detection, the estimate is set to zero.

To recognize the longer speech pauses, it is continuously checked whether the magnitude of the difference between two successive short-time mean values falls below a threshold D. If, for example, the inequality
|G(n) - G(n-1)| < D
is satisfied K times in succession, this is taken to indicate a longer speech pause, and the new estimate P(n) is determined according to the equation given above. The threshold D is chosen proportional to the short-time mean value G(n), so that the same decisions are reached if, for example, the levels of all signals were doubled. The proportionality factor y and the number K are to be determined experimentally such that the arrangement makes as few wrong decisions as possible. Typical values are
Figure imgb0004
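The pause-gated estimate described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented circuit: the exact update equation and the typical values of y and K are contained in figures that are not reproduced in this text, so the convex combination a*P(n-1) + (1-a)*G(n) and the defaults y=0.1 and k=5 are placeholders chosen for the example. The class and parameter names are hypothetical.

```python
class PauseGatedNoiseEstimator:
    """Noise-power estimate P(n) that is updated only during longer pauses."""

    def __init__(self, a=0.5, y=0.1, k=5):
        self.a = a            # constant between 0 and 1 (0.5 is the typical value in the text)
        self.y = y            # proportionality factor for threshold D (placeholder value)
        self.k = k            # required number of consecutive hits (placeholder value)
        self.p = 0.0          # the estimate starts at zero
        self.prev_g = None    # previous short-time mean value G(n-1)
        self.run = 0          # consecutive instants with |G(n) - G(n-1)| < D

    def update(self, g):
        """Process one short-time mean value G(n) and return P(n)."""
        if self.prev_g is not None:
            d = self.y * g    # D is proportional to G(n), so the test is level independent
            if abs(g - self.prev_g) < d:
                self.run += 1
            else:
                self.run = 0
        self.prev_g = g

        if self.run >= self.k:
            # Longer pause assumed: linear combination of P(n-1) and G(n).
            # ASSUMPTION: this particular weighting is not taken from the source.
            self.p = self.a * self.p + (1.0 - self.a) * g
        # Otherwise P(n) = P(n-1) is retained.
        return self.p
```

Feeding a constant level for at least K + 1 clock instants makes the estimate start to follow that level, while a sudden jump in G(n) resets the counter and freezes the estimate.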

Another way of obtaining the best possible estimate P(n) for a slowly varying noise power is to increase the existing estimate P(n-1) by a fixed amount c at each clock instant T(n) whenever the estimate P(n-1) is smaller than the short-time mean value G(n). Thus, each time the inequality P(n-1) < G(n) is satisfied, P(n) = P(n-1) + c is set.

The constant c is to be chosen such that, if allowed to grow unimpeded, the estimate reaches the modulation (full-scale) limit within one to two seconds. If, on the other hand, the existing estimate P(n-1) lies above the current short-time mean value G(n), the new estimate P(n) is lowered relative to the existing one, according to the equation
Figure imgb0005
which represents the new estimate as a linear combination of the preceding estimate and the current short-time mean value G(n). Values around 0.5 have proven favorable for the constant β.
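A minimal Python sketch of this second estimator is given below. The increment rule P(n) = P(n-1) + c is taken from the text. The decay formula is only assumed to be beta*P(n-1) + (1-beta)*G(n), because the actual equation is in a figure that is not reproduced here. The helper for choosing c, the parameter clock_rate_hz (number of clock instants T(n) per second) and all function names are hypothetical.

```python
def increment_for_ramp(full_scale, clock_rate_hz, ramp_seconds=1.5):
    """Fixed increment c such that an unimpeded ramp starting at zero
    reaches full_scale after ramp_seconds (the text asks for 1 to 2 s)."""
    return full_scale / (clock_rate_hz * ramp_seconds)


def track_noise(p_prev, g, c, beta=0.5):
    """Return the new estimate P(n) from P(n-1) and G(n)."""
    if p_prev < g:
        # Estimate below the current short-time mean: raise it by c.
        return p_prev + c
    # Estimate at or above the current short-time mean: pull it down.
    # ASSUMPTION: this weighting of the linear combination is not from the source.
    return beta * p_prev + (1.0 - beta) * g
```

With 100 clock instants per second and a full-scale value of 1.0, a two-second ramp corresponds to c = 0.005. The slow rise and fast fall mean the estimate settles near the minima of G(n), which are assumed to occur during pauses.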

The threshold S used for the pause decision is proportional to the estimate P(n). A typical relationship between the threshold S and the estimate P(n) is S = 1.1 P(n).
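The pause decision itself can be sketched in Python as shown below, combining the threshold S = 1.1 P(n) with the smoothed short-time mean value GG(n) named in the claims. The three-tap weighted sum is an assumed reading of the smoothing formula (the formula is in a figure not reproduced here, and the default weights are placeholders), the median variant corresponds to the median filter mentioned in the claims, and all function names are hypothetical.

```python
from statistics import median


def smooth_weighted(g_n, g_n1, g_n2, c0=0.5, c1=0.25, c2=0.25):
    """Weighted smoothing of G(n), G(n-1), G(n-2).
    ASSUMPTION: GG(n) = c0*G(n) + c1*G(n-1) + c2*G(n-2), with
    non-negative weights that sum to 1."""
    return c0 * g_n + c1 * g_n1 + c2 * g_n2


def smooth_median(g_n, g_n1, g_n2):
    """Median-filter alternative for the smoothing unit."""
    return median([g_n, g_n1, g_n2])


def is_pause(gg, p, factor=1.1):
    """Comparator: a pause is indicated when GG(n) lies below S = factor * P(n)."""
    return gg < factor * p
```

Smoothing before the comparison suppresses isolated dips or spikes in G(n), so a single outlier is less likely to flip the decision.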

Claims (8)

1. An arrangement for recognizing speech pauses in a speech signal which may have noise signals superposed on it, and in which at each clock instant T(n) of a clock signal a short-time mean-value G(n) is determined by means of a mean-value producer (M), which represents an average of the values or of the square values of all the sampling values of the disturbed speech signal which are located between the clock instants T(n-1) and T(n), as a result of which an estimate P(n) of the noise power produced as a function of the short-time mean-value G(n) can be determined, characterized in that, with the aid of the block (PA) an estimate P(n) of the noise power is determined, which is a function of the short-time mean-value G(n) and the estimate P(n-1) at the preceding clock instant, that a smoothed short-time mean-value GG(n) is taken via a unit (GL) which is obtained from both the instantaneous short-time mean-value G(n) and the preceding short-time mean-values, that at each clock instant T(n) a comparator (V) checks whether the smoothed short-time mean-value GG(n) is below a first threshold (S) and that, when this condition is satisfied once or several times consecutively, a signal indicating the presence of a speech pause is produced.
2. An arrangement as claimed in Claim 1, characterized in that the arithmetic mean-value of the magnitudes of the sampling values is used as a short-time mean-value G(n).
3. An arrangement as claimed in Claim 1, characterized in that the estimate P(n) is only determined in accordance with the equation
Figure imgb0011
where a is a first constant, when the difference between the short-time mean-values G(n) - G(n-1) is, in absolute value, below a second threshold (D) and when this case has occurred uninterruptedly for a number of K preceding clock instants, and that if these conditions are not satisfied, the estimate P(n) is made equal to the preceding estimate P(n-1).
4. An arrangement as claimed in Claim 1, characterized in that the estimate P(n) is only determined in accordance with the equation
P(n) = P(n-1) + c
where c is a second constant, when the inequality
P(n-1) < G(n)
is satisfied, and that if this is not the case the estimate P(n) is chosen with a third constant to form
Figure imgb0014
5. An arrangement as claimed in Claim 1, characterized in that the first threshold (S) is chosen proportionally to the estimate P(n).
6. An arrangement as claimed in Claim 1, characterized in that the unit (GL) provided for the smoothing operation smooths three short-time mean-values G(n), G(n-1) and G(n-2) in accordance with the formula
Figure imgb0015
where the constants C0, C1, C2 are all greater than or equal to zero and their sum has the value 1.
7. An arrangement as claimed in Claim 1, characterized in that the smoothing unit (GL) is in the form of a median filter.
8. An arrangement as claimed in Claim 3, characterized in that the second threshold (D) is chosen proportionally to the short-time mean-value G(n).
EP83201638A 1982-11-23 1983-11-17 Arrangement for the detection of speech intervals Expired - Lifetime EP0110467B2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE19823243231 DE3243231A1 (en) 1982-11-23 1982-11-23 METHOD FOR DETECTING VOICE BREAKS
DE3243231 1982-11-23

Publications (3)

Publication Number Publication Date
EP0110467A1 EP0110467A1 (en) 1984-06-13
EP0110467B1 EP0110467B1 (en) 1987-08-12
EP0110467B2 EP0110467B2 (en) 1991-06-19

Family

ID=6178780

Family Applications (1)

Application Number Title Priority Date Filing Date
EP83201638A Expired - Lifetime EP0110467B2 (en) 1982-11-23 1983-11-17 Arrangement for the detection of speech intervals

Country Status (6)

Country Link
US (1) US4700394A (en)
EP (1) EP0110467B2 (en)
JP (1) JPS59105695A (en)
AU (1) AU561076B2 (en)
CA (1) CA1203627A (en)
DE (2) DE3243231A1 (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IT1160148B (en) * 1983-12-19 1987-03-04 Cselt Centro Studi Lab Telecom SPEAKER VERIFICATION DEVICE
EP0167364A1 (en) * 1984-07-06 1986-01-08 AT&T Corp. Speech-silence detection with subband coding
AU583871B2 (en) * 1984-12-31 1989-05-11 Itt Industries, Inc. Apparatus and method for automatic speech recognition
JPH0748695B2 (en) * 1986-05-23 1995-05-24 株式会社日立製作所 Speech coding system
DE3626862A1 (en) * 1986-08-08 1988-02-11 Philips Patentverwaltung MULTI-STAGE TRANSMITTER ANTENNA COUPLING DEVICE
DE3739681A1 (en) * 1987-11-24 1989-06-08 Philips Patentverwaltung METHOD FOR DETERMINING START AND END POINT ISOLATED SPOKEN WORDS IN A VOICE SIGNAL AND ARRANGEMENT FOR IMPLEMENTING THE METHOD
FR2631147B1 (en) * 1988-05-04 1991-02-08 Thomson Csf METHOD AND DEVICE FOR DETECTING VOICE SIGNALS
JP2573352B2 (en) * 1989-04-10 1997-01-22 富士通株式会社 Voice detection device
US5305422A (en) * 1992-02-28 1994-04-19 Panasonic Technologies, Inc. Method for determining boundaries of isolated words within a speech signal
DE4220524A1 (en) * 1992-06-23 1992-10-22 Matzner Rolf Dipl Ing Separate estimation of power in two superimposed stochastic processes - by sampling and filtering to identify inputs for processing to identify separate signal and noise components
US5459814A (en) * 1993-03-26 1995-10-17 Hughes Aircraft Company Voice activity detector for speech signals in variable background noise
DE4405723A1 (en) * 1994-02-23 1995-08-24 Daimler Benz Ag Method for noise reduction of a disturbed speech signal
DE19730518C1 (en) * 1997-07-16 1999-02-11 Siemens Ag Speech pause recognition method
GB0103242D0 (en) * 2001-02-09 2001-03-28 Radioscape Ltd Method of analysing a compressed signal for the presence or absence of information content
DE10120231A1 (en) * 2001-04-19 2002-10-24 Deutsche Telekom Ag Single-channel noise reduction of speech signals whose noise changes more slowly than speech signals, by estimating non-steady noise using power calculation and time-delay stages
JP4739219B2 (en) * 2003-10-16 2011-08-03 エヌエックスピー ビー ヴィ Voice motion detection with adaptive noise floor tracking
US8543061B2 (en) 2011-05-03 2013-09-24 Suhami Associates Ltd Cellphone managed hearing eyeglasses
CN104658546B (en) * 2013-11-19 2019-02-01 腾讯科技(深圳)有限公司 Recording treating method and apparatus
RU2691603C1 (en) * 2018-08-22 2019-06-14 Акционерное общество "Концерн "Созвездие" Method of separating speech and pauses by analyzing values of interference correlation function and signal and interference mixture

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IT1044353B (en) * 1975-07-03 1980-03-20 Telettra Lab Telefon METHOD AND DEVICE FOR RECOVERY KNOWLEDGE OF THE PRESENCE E. OR ABSENCE OF USEFUL SIGNAL SPOKEN WORD ON PHONE LINES PHONE CHANNELS
US4052568A (en) * 1976-04-23 1977-10-04 Communications Satellite Corporation Digital voice switch
US4025721A (en) * 1976-05-04 1977-05-24 Biocommunications Research Corporation Method of and means for adaptively filtering near-stationary noise from speech
US4028496A (en) * 1976-08-17 1977-06-07 Bell Telephone Laboratories, Incorporated Digital speech detector
FR2451680A1 (en) * 1979-03-12 1980-10-10 Soumagne Joel SPEECH / SILENCE DISCRIMINATOR FOR SPEECH INTERPOLATION
JPS56104399A (en) * 1980-01-23 1981-08-20 Hitachi Ltd Voice interval detection system
JPS56135898A (en) * 1980-03-26 1981-10-23 Sanyo Electric Co Voice recognition device
CA1147071A (en) * 1980-09-09 1983-05-24 Northern Telecom Limited Method of and apparatus for detecting speech in a voice channel signal
US4357491A (en) * 1980-09-16 1982-11-02 Northern Telecom Limited Method of and apparatus for detecting speech in a voice channel signal
JPS5852695A (en) * 1981-09-25 1983-03-28 日産自動車株式会社 Voice detector for vehicle
US4531228A (en) * 1981-10-20 1985-07-23 Nissan Motor Company, Limited Speech recognition system for an automotive vehicle

Also Published As

Publication number Publication date
AU2154583A (en) 1984-05-31
JPS59105695A (en) 1984-06-19
EP0110467A1 (en) 1984-06-13
AU561076B2 (en) 1987-04-30
DE3373037D1 (en) 1987-09-17
DE3243231C2 (en) 1987-07-02
CA1203627A (en) 1986-04-22
US4700394A (en) 1987-10-13
EP0110467B2 (en) 1991-06-19
DE3243231A1 (en) 1984-05-24

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase; Free format text: ORIGINAL CODE: 0009012
AK Designated contracting states; Designated state(s): BE DE FR GB IT SE
17P Request for examination filed; Effective date: 19840718
GRAA (expected) grant; Free format text: ORIGINAL CODE: 0009210
AK Designated contracting states; Kind code of ref document: B1; Designated state(s): BE DE FR GB IT SE
REF Corresponds to: Ref document number: 3373037; Country of ref document: DE; Date of ref document: 19870917
ITF It: translation for a ep patent filed; Owner name: ING. C. GREGORJ S.P.A.
ET Fr: translation filed
PLBI Opposition filed; Free format text: ORIGINAL CODE: 0009260
26 Opposition filed; Opponent name: SIEMENS AKTIENGESELLSCHAFT, BERLIN UND MUENCHEN; Effective date: 19880502
PLAB Opposition data, opponent's data or that of the opponent's representative modified; Free format text: ORIGINAL CODE: 0009299OPPO
R26 Opposition filed (corrected); Opponent name: SIEMENS AKTIENGESELLSCHAFT, BERLIN UND MUENCHEN; Effective date: 19880809
PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]; Ref country code: BE; Payment date: 19891114; Year of fee payment: 7
PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]; Ref country code: FR; Payment date: 19891121; Year of fee payment: 7
PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]; Ref country code: SE; Payment date: 19891128; Year of fee payment: 7
ITTA It: last paid annual fee
PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]; Ref country code: GB; Payment date: 19891130; Year of fee payment: 7
PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]; Ref country code: DE; Payment date: 19900125; Year of fee payment: 7
PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]; Ref country code: GB; Effective date: 19901117
PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]; Ref country code: SE; Effective date: 19901118
PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]; Ref country code: BE; Effective date: 19901130
PUAH Patent maintained in amended form; Free format text: ORIGINAL CODE: 0009272
STAA Information on the status of an ep patent application or granted ep patent; Free format text: STATUS: PATENT MAINTAINED AS AMENDED
BERE Be: lapsed; Owner name: N.V. PHILIPS' GLOEILAMPENFABRIEKEN; Effective date: 19901130
27A Patent maintained in amended form; Effective date: 19910619
AK Designated contracting states; Kind code of ref document: B2; Designated state(s): BE DE FR GB IT SE
GBPC Gb: european patent ceased through non-payment of renewal fee
PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]; Ref country code: FR; Effective date: 19910731
PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]; Ref country code: DE; Effective date: 19910801
REG Reference to a national code; Ref country code: FR; Ref legal event code: ST
EN3 Fr: translation not filed ** decision concerning opposition
EUG Se: european patent has lapsed; Ref document number: 83201638.0; Effective date: 19910705