DE19730518C1

DE19730518C1 - Speech pause recognition method

Info

Publication number: DE19730518C1
Application number: DE1997130518
Authority: DE
Inventors: Gonzalo Dr Ing Lucioni
Original assignee: Siemens AG
Current assignee: Siemens AG
Priority date: 1997-07-16
Filing date: 1997-07-16
Publication date: 1999-02-11
Anticipated expiration: 2017-07-17

Abstract

The method involves recognizing a speech pause within a sound consisting of a mixture of speech ("ZWEI", HG) and background noise. The sound is converted into digital data, which are processed in successive frames (Sk). A sum of the data of at least one frame is determined, and the sums of successive frames are supplied to a first digital filter (14). Taking a first adaptation function into account, this filter determines a pulse value (P) that essentially represents the energy content of the sound within a frame. The sums of successive frames are also supplied to a second digital filter (16), which, taking a second adaptation function into account, determines a background value (H) that essentially represents the mean energy content within a frame. An evaluation signal (VAD) signalling a speech pause is produced if the pulse value is smaller than the background value plus a predetermined threshold.

Description

The invention relates to a method and a device for detecting a speech pause within a sound consisting of a mixture of speech and background noise.

When processing the information contained in a sound, it is often advantageous to separate the sound into a useful sound and a background noise. The information relevant to the processing is represented solely by the useful sound, while the information contained in the background noise need not be processed. This allows the information processing to be carried out efficiently. In mobile communications, for example, the effort required to transmit speech information can be reduced, and in an announcement and answering-machine system the capacity required to store speech information can be reduced. In the following description, speech is regarded as the useful sound, i.e. as the sound component that contains the relevant information, although this is not to be understood as a restriction. Music, for example, can also constitute such a useful sound.

To separate the information-bearing portions of a sound from portions without relevant speech information, the speech pauses within the sound must be identified. Methods and devices for detecting speech pauses are known. However, the transient (settling and decay) behavior of their functionally essential components does not allow speech pauses within the sound to be identified without disturbing delays. These delays cause additional effort when transmitting or storing the data obtained from the sound, because portions of the sound that contain no relevant information are processed as well.

DE 32 43 231 A1 and DE 689 03 872 T2 disclose methods for detecting speech pauses that are based on mean energy values within defined time intervals.

It is therefore an object of the invention to specify a method and a device that allow speech pauses to be identified with the smallest possible delay, thereby providing the basis for processing the sound as efficiently as possible.

The invention achieves this object by the method having the features of claim 1 and by the device having the features of claim 16.

In the method according to the invention, the sound is converted over time into digital data, which are processed further in successive frames. A sum of the data of at least one frame is determined. The sums of successive frames are supplied to a first digital filter, which, taking a first adaptation function into account, determines a pulse value that essentially reflects the energy content of the sound within a frame. The sums of successive frames are also supplied to a second digital filter, which, taking a second adaptation function into account, determines a background value that essentially reflects the mean energy content within a frame. An evaluation signal indicating a speech pause is generated when the pulse value is smaller than the background value plus a predetermined threshold. The method thus classifies successive frames according to whether or not they contain speech information, i.e. whether they fall within a speech pause. For the frame currently being processed, this decision is made on the basis of the determined pulse value and the determined background value, both of which are derived from the sum of the current frame. The pulse value is a measure of the momentary energy content of the sound and thus reflects short-term fluctuations, whereas the background value is a measure of the portion of the energy content averaged over a longer period. Although the background values reflect the long-term behavior of the sound, they are, like the pulse values, determined for each individual frame.

The synchronous determination of the pulse value and the background value of the current frame is made possible by taking the first and second adaptation functions into account. In particular, there is no need to determine the background noise by averaging several successive frames. For each individual frame, the pulse value can therefore be compared with the background value plus the threshold, and the evaluation signal indicating a speech pause can be generated. The delay in detecting speech pauses with this method is consequently on the order of one frame duration. Furthermore, the sums, pulse values and background values need not be determined for individual frames only; they can also be determined for units composed of several frames.

The method according to the invention is independent of the coding method used to compress the speech information. It can be used, for example, for efficient storage of speech information in a central answering-machine and announcement system. In that case, data compressed by the known RPE-LTP method (Regular Pulse Excitation-Long-Term Prediction) are stored at a bit rate of 13 kbit/s. Using the method according to the invention, the bit rate required for storage can be reduced to below 9 kbit/s. The speech pause detection method can also be used in speech recognition, where reliable recognition of a word depends on accurate detection of the word boundaries.

An advantageous development of the invention consists in using, as the sum, the logarithm of the sum of the squares of the individual data of a frame. Alternatively, the logarithm of the sum of the absolute values of the individual data of a frame can be used as the sum. In speech processing, using the logarithm of the sum of the squares or of the absolute values of the individual data has proven advantageous, because the logarithm of the sum is well matched to the dynamic range of the data obtained from speech by analog-to-digital conversion.
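Both variants of the frame sum can be sketched as follows. The dB scaling (10·log10 for the sum of squares, 20·log10 for the sum of absolute values, so that doubling the amplitude raises both by about 6 dB) and the small energy floor that keeps a silent frame from producing log(0) are assumptions for illustration; the text only specifies "the logarithm".

```python
import math
from typing import Sequence

ENERGY_FLOOR = 1e-10  # hypothetical floor so that an all-zero frame does not give log(0)


def log_energy_squares(frame: Sequence[float]) -> float:
    """Logarithm of the sum of squared samples of one frame, scaled as 10*log10 (dB)."""
    return 10.0 * math.log10(max(sum(s * s for s in frame), ENERGY_FLOOR))


def log_energy_abs(frame: Sequence[float]) -> float:
    """Logarithm of the sum of absolute sample values, scaled as 20*log10 (amplitude-like dB)."""
    return 20.0 * math.log10(max(sum(abs(s) for s in frame), ENERGY_FLOOR))
```

The absolute-value variant avoids the multiplications of the squaring step, which may matter on a small fixed-point processor; with the scaling chosen here both variants move by the same number of dB when the signal level changes.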

A further advantageous development of the invention is as follows. The sum processed by the first digital filter is supplied to a first and a second processing branch of the first digital filter. The first processing branch has, connected in series, a first adder, a multiplier and a second adder, which outputs the pulse value. The pulse value is fed back with a delay of at least one frame duration; it is added at the second adder and subtracted at the first adder. The second processing branch has, connected in series, an adder and an adaptation element. At this adder the delayed pulse value is subtracted, yielding a first auxiliary variable. In the adaptation element, a first adaptation variable is determined from the first auxiliary variable according to the predetermined first adaptation function, and this adaptation variable is used as the multiplication factor at the multiplier in the first processing branch. In other words, the pulse value of the previous frame is subtracted from the sum of the current frame.

The first auxiliary variable, the result of this subtraction, is a measure of the change in the pulse values of successive frames, i.e. a measure of the short-term fluctuations of the sound. Depending on the first auxiliary variable, the second processing branch determines the first adaptation variable, which in turn is used in the first processing branch to determine the pulse value of the current frame. The settling time of the first digital filter can be limited to the duration of one frame, so that pulse values can be determined without significant delay after switch-on.
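The two-branch structure just described can be sketched per frame as P(n) = P(n-1) + M1(D1)·D1 with D1 = Elog(n) - P(n-1). The curve `trough_m1` is a hypothetical stand-in for the first adaptation function (small adaptation for small |D1|, full adaptation for large jumps), and initializing with the first frame sum is an assumption; neither is taken from the patent's figures.

```python
from typing import Callable, Iterable, List, Optional


def trough_m1(d1: float, m_min: float = 0.3, m_max: float = 1.0, width_db: float = 10.0) -> float:
    """Hypothetical trough-shaped M1(D1), symmetric about D1 = 0.

    Small |D1| gives m_min (smoothing); the value rises linearly to m_max at |D1| >= width_db,
    so large jumps in frame energy are followed within one frame.
    """
    x = min(abs(d1) / width_db, 1.0)
    return m_min + (m_max - m_min) * x


def pulse_filter(e_log: Iterable[float],
                 m1: Callable[[float], float] = trough_m1) -> List[float]:
    """Pulse values P(n) for a sequence of frame sums Elog(n)."""
    out: List[float] = []
    p_prev: Optional[float] = None  # content of the one-frame delay element
    for e in e_log:
        if p_prev is None:
            p = e                       # assumption: initialize with the first frame sum
        else:
            d1 = e - p_prev             # first auxiliary variable D1(n) = Elog(n) - P(n-1)
            p = p_prev + m1(d1) * d1    # first adder, multiplier and second adder
        out.append(p)
        p_prev = p
    return out
```

With these example values a 20 dB step is tracked completely in one frame, while a 1 dB change moves the pulse value by only 0.37 dB.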

The method is advantageously developed further in that the sum processed by the second digital filter is supplied to a first and a second processing branch of the second digital filter. The first processing branch has, connected in series, a first adder, a multiplier and a second adder, which outputs the background value. The background value is fed back with a delay of at least one frame duration; it is added at the second adder and subtracted at the first adder. The second processing branch has, connected in series, an adder and an adaptation element. At this adder the delayed background value is subtracted, yielding a second auxiliary variable. In the adaptation element, a second adaptation variable is determined from the second auxiliary variable according to the predetermined second adaptation function, and this adaptation variable is used as the multiplication factor at the multiplier in the first processing branch.

In this development, the background values can therefore be determined by a second digital filter that differs from the first digital filter only in that it determines the second adaptation variable on the basis of the second adaptation function. The method can thus use two essentially identical digital filters, so that the circuit complexity is relatively low compared with known methods.
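Because the two filters share one structure, a single filter class parameterized by its adaptation function covers both. The two adaptation functions below are hypothetical examples: `pulse_adapt` mimics a trough shape, and `background_adapt` lets the background estimate fall quickly but rise only slowly, which is a common noise-floor heuristic and not the curve of the second adaptation function in the patent.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AdaptiveFilter:
    """Adaptive first-order filter; pulse and background filters differ only in `adapt`."""
    adapt: Callable[[float], float]  # adaptation function: auxiliary variable D -> adaptation variable M
    state: Optional[float] = None    # previous output, i.e. the content of the one-frame delay

    def step(self, e_log: float) -> float:
        if self.state is None:
            # Assumption: start from the first frame sum so the output is valid after one frame.
            self.state = e_log
            return self.state
        d = e_log - self.state           # auxiliary variable D(n) = Elog(n) - X(n-1)
        self.state += self.adapt(d) * d  # X(n) = X(n-1) + M(D) * D
        return self.state


def pulse_adapt(d: float) -> float:
    """Hypothetical trough-shaped M1: 0.3 near D = 0, reaching 1.0 for |D| >= 10 dB."""
    return min(1.0, 0.3 + 0.07 * abs(d))


def background_adapt(d: float) -> float:
    """Hypothetical M2: follow falling energy quickly, rising energy very slowly."""
    return 0.5 if d < 0 else 0.02


pulse = AdaptiveFilter(pulse_adapt)
background = AdaptiveFilter(background_adapt)
```

Swapping in a different adaptation callable is the software analogue of reusing the same circuit with a different adaptation element.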

Although the background value is essentially a measure of the portion of the sound averaged over a longer period, it is determined for each individual frame, like the pulse value, so that the pulse value and the background value can be compared for every frame. This allows speech pauses to be detected largely without delay.

The threshold by which the background value of the current frame is increased for the comparison with the pulse value is advantageously about 5 dB. Raising the background values by 5 dB increases the reliability with which speech pauses can be detected.
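The per-frame decision reduces to one comparison. This sketch assumes P and H are already in dB; treating the case P = H + threshold as speech is an assumption, since the text only defines the strictly smaller and strictly larger cases.

```python
THRESHOLD_DB = 5.0  # example threshold from the text


def vad_decision(pulse_db: float, background_db: float,
                 threshold_db: float = THRESHOLD_DB) -> int:
    """Return 0 (speech pause) if P < H + threshold, otherwise 1 (speech)."""
    return 0 if pulse_db < background_db + threshold_db else 1
```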

For storage, the frames that fall within a speech pause can advantageously be replaced by a marker, which is stored instead of those frames. The marker can be a symbol that does not otherwise occur in the frames. In this way, the capacity required to store the data can be reduced, because only the frames containing speech information are stored.
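A minimal storage sketch, assuming hypothetical fixed-size coded frames. Instead of reserving a symbol that never occurs inside the frame data, this variant prefixes every record with a one-byte tag, so speech frames need no escaping; that framing choice is swapped in for illustration. `FRAME_BYTES`, `TAG_SPEECH` and `TAG_PAUSE` are hypothetical names and values.

```python
from typing import List, Optional, Sequence

FRAME_BYTES = 33   # hypothetical fixed size of one coded frame
TAG_SPEECH = 0x01  # hypothetical tag: a coded frame follows
TAG_PAUSE = 0x00   # hypothetical marker stored instead of a pause frame


def pack(frames: Sequence[bytes], vad: Sequence[int]) -> bytes:
    """Store speech frames (VAD=1) in full and replace pause frames (VAD=0) by a marker."""
    out = bytearray()
    for frame, flag in zip(frames, vad):
        if flag:
            if len(frame) != FRAME_BYTES:
                raise ValueError("unexpected frame size")
            out.append(TAG_SPEECH)
            out += frame
        else:
            out.append(TAG_PAUSE)  # the pause frame itself is discarded
    return bytes(out)


def unpack(data: bytes) -> List[Optional[bytes]]:
    """Recover the frame sequence; None marks a speech pause."""
    frames: List[Optional[bytes]] = []
    i = 0
    while i < len(data):
        tag = data[i]
        i += 1
        if tag == TAG_SPEECH:
            frames.append(data[i:i + FRAME_BYTES])
            i += FRAME_BYTES
        elif tag == TAG_PAUSE:
            frames.append(None)  # playback can substitute artificial noise here
        else:
            raise ValueError(f"unknown tag {tag:#x}")
    return frames
```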

If the data underlying the speech pauses are not stored, artificial noise can be generated by a noise generator when the speech pauses are played back. The listener then perceives the speech pauses not as unnatural phases of complete silence but as background noise, so that the transition between speech and speech pause sounds "soft" rather than abrupt.
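One way to realize such a noise generator is sketched below: Gaussian noise scaled so that its expected frame energy matches a given background level. Driving the generator with the background value H, and the 10·log10(sum of squares) convention for that level, are assumptions for illustration; the text does not say how the noise level is chosen.

```python
import math
import random
from typing import List, Optional


def comfort_noise(background_db: float, n_samples: int = 160,
                  seed: Optional[int] = None) -> List[float]:
    """Gaussian noise whose expected 10*log10(sum of squares) equals background_db."""
    rng = random.Random(seed)
    target_energy = 10.0 ** (background_db / 10.0)  # linear energy over n_samples samples
    sigma = math.sqrt(target_energy / n_samples)     # E[sum s^2] = n_samples * sigma^2
    return [rng.gauss(0.0, sigma) for _ in range(n_samples)]
```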

Further advantageous developments of the invention are the subject matter of the dependent claims and of the following description.

The invention is explained in more detail below with reference to the figures, in which:

Fig. 1 shows a block diagram of a device for detecting speech pauses,

Fig. 2 shows a circuit arrangement of a pulse filter of the device according to Fig. 1 for determining pulse values,

Fig. 3 shows a first adaptation function, which is taken into account in the pulse filter according to Fig. 2,

Fig. 4 shows the time course of the determined pulse values for a given sound,

Fig. 5 shows a circuit arrangement of a background filter of the device according to Fig. 1 for determining background values,

Fig. 6 shows a second adaptation function, which is taken into account in the background filter according to Fig. 5,

Fig. 7 shows the time course of the determined background values for the given sound,

Fig. 8 shows the time course of an evaluation signal for the given sound.

Fig. 1 shows a device 10 according to the invention that makes it possible to identify speech pauses within a sound consisting of a mixture of speech and background noise. The device 10 can, for example, be part of an answering-machine and announcement system (not shown). In this system, the sound is first sampled at a sampling frequency of, for example, 8 kHz and converted into digital data by an analog-to-digital converter (not shown). The data are then compressed by a known coding method and divided into frames Sk, each consisting of several data words. The RPE-LTP method according to the GSM 6.10 standard, for example, can be used as the coding method; it produces frames of 20 ms duration that form a data stream with a bit rate of 13 kbit/s.
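The example figures fit together as follows: at 8 kHz a 20 ms frame holds 160 samples, and 13 kbit/s over 20 ms gives 260 bits per frame, the frame size of the GSM full-rate codec. The helper `split_frames` is a hypothetical illustration; dropping a trailing partial frame is an assumption.

```python
from typing import List, Sequence

SAMPLE_RATE_HZ = 8000   # example sampling frequency from the text
FRAME_MS = 20           # RPE-LTP frame duration
CODED_RATE_BPS = 13000  # coded bit rate

samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 160 samples
bits_per_frame = CODED_RATE_BPS * FRAME_MS // 1000     # 260 bits
frames_per_second = 1000 // FRAME_MS                   # 50 frames


def split_frames(samples: Sequence[float],
                 frame_len: int = samples_per_frame) -> List[Sequence[float]]:
    """Cut a sample stream into consecutive full frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]
```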

The device 10 contains a summation unit 12, followed by a digital pulse filter 14 (first digital filter) and a digital background filter 16 (second digital filter) arranged in parallel. The pulse filter 14 and the background filter 16 are connected to an evaluation unit 18. In the summation unit 12, the data of the frame currently being processed are summed according to the equation

$$E_{\log} = \log \sum_{k=1}^{F} s(k)^2 \qquad (1)$$

thereby determining the logarithmic sum Elog. In equation (1), k denotes a counting variable that numbers the samples within the current frame, F the total number of samples within the frame, and s(k) the k-th sample. The logarithm in equation (1) can be computed in the summation unit 12 by means of a suitable polynomial approximation or a lookup table. The sum Elog generated by the summation unit 12 thus represents the logarithmic sound energy summed over the duration of one frame.

The logarithmic sound energy Elog is supplied to the digital pulse filter 14 and to the digital background filter 16. From Elog, the pulse filter 14 determines, in a manner described below, a pulse value P of the frame currently being processed. The background filter 16 generates a background value H from Elog. The values P and H assigned to the current frame are supplied to the evaluation unit 18, which compares the pulse value P with the background value H increased by a predetermined threshold. If the pulse value P is smaller than the sum of the background value H and the predetermined threshold, the evaluation unit 18 outputs an evaluation signal VAD=0, indicating a speech pause. If, on the other hand, the pulse value P exceeds the sum of the background value H and the predetermined threshold, the signal VAD=1 indicates that the current frame contains speech information. A value of 5 dB, for example, is suitable as the threshold.

Depending on the evaluation signal VAD, it can now be decided whether the current frame is to be stored in a memory unit (not shown) of the announcement and answering-machine system because it contains speech information (VAD=1), or whether only a marker is to be stored instead of the frame because the current frame falls within a speech pause (VAD=0). In this way, the bit rate of the speech information to be stored can be reduced from 13 kbit/s to below 9 kbit/s. The exact bit rate to be stored depends on the speaking rate: the more slowly the speaker talks, the more effective the method.
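The dependence on the speaking rate can be made concrete with a back-of-the-envelope estimate. The speech fraction and the 8-bit marker cost per pause frame are hypothetical inputs; the sketch does not reproduce the patent's measurements.

```python
def stored_bit_rate(speech_fraction: float, coded_rate_bps: float = 13000.0,
                    frame_ms: float = 20.0, marker_bits: int = 8) -> float:
    """Average stored bit rate when pause frames are replaced by a marker.

    speech_fraction: share of frames with VAD=1 (hypothetical input).
    marker_bits: storage cost of one pause marker (hypothetical).
    """
    frames_per_s = 1000.0 / frame_ms
    speech_bps = speech_fraction * coded_rate_bps
    marker_bps = (1.0 - speech_fraction) * frames_per_s * marker_bits
    return speech_bps + marker_bps
```

Under these assumptions the stored rate drops below 9 kbit/s once fewer than roughly 68 % of the frames contain speech; for example, 65 % speech frames give about 8.6 kbit/s.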

Fig. 2 shows the pulse filter 14, which generates the pulse value P from the logarithmic sound energy Elog of the current frame. The pulse filter 14 contains two processing branches. The first processing branch contains, connected in series, a first adder 20, a multiplier 22 and a second adder 24. The second adder 24 is followed by a delay element 26, whose output is connected to an input of the first adder 20 and to an input of the second adder 24. The delay element 26 produces a signal delay T of one frame duration. In the second processing branch, an adder 28 and an adaptation element 30 are connected in series. An input of the adder 28 is likewise connected to the output of the delay element 26.

The operation of the pulse filter 14 is explained with reference to Fig. 3. Fig. 3 shows a trough-shaped characteristic curve 32 (solid line), symmetric about the zero axis, that characterizes a first adaptation function. This adaptation function gives the relationship between a first auxiliary variable D1 and a first adaptation variable M1. M1 is used in the first processing branch of the pulse filter 14 to determine the pulse value P. As can be seen from Fig. 2, the first auxiliary variable D1 is formed at the adder 28. There, the pulse value P delayed by T (the pulse value of the frame preceding the current frame) is subtracted from the logarithmic noise energy Elog of the current frame. Denote the frame currently being processed by n and the preceding frame, processed one frame duration T earlier, by n-1. Then D1 can be written as

$$D_1(n) = E_{\log}(n) - P(n-1) \tag{2}$$

The auxiliary variable D1 is thus a measure of the change in the pulse value P between two successive frames.

From the auxiliary variable D1 given by relationship (2), the adaptation element 30 generates the adaptation variable M1 according to the characteristic curve 32 of Fig. 3. M1 is fed to the multiplier 22. In the first processing branch, the adder 20 subtracts the pulse value P of the last processed frame n-1 from the logarithmic noise energy Elog of the current frame n. The multiplier 22 multiplies the result of this subtraction by the adaptation variable M1. Finally, the adder 24 adds the result of this multiplication to the pulse value P of the last processed frame n-1 and outputs the pulse value P of the current frame n. This pulse value P is given by equation (3).
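The per-frame signal flow of Fig. 2 can be mirrored in a few lines of Python, with one statement per block element. This is a sketch. The function name, and the choice to pass the adaptation function in as a callable, are illustrative and do not come from the patent. Because Fig. 5 uses the same structure, the same step also serves the background filter when it is given the second adaptation function.

```python
from typing import Callable


def filter_step(elog: float, prev: float, adapt: Callable[[float], float]) -> float:
    """One frame of the adaptive filter of Fig. 2 (same structure as Fig. 5).

    prev is the output of delay element 26, i.e. the filter output of frame n-1.
    """
    d = elog - prev        # adder 28: auxiliary variable D1 (or D2)
    m = adapt(d)           # adaptation element 30: adaptation variable M1 (or M2)
    diff = elog - prev     # adder 20: the same difference, formed in the first branch
    scaled = diff * m      # multiplier 22
    return prev + scaled   # adder 24: new output, fed back through delay element 26
```

The two branches form the same difference. In hardware they are separate adders, so the sketch keeps both statements to stay close to the figure.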

$$P(n) = P(n-1) + \left[E_{\log}(n) - P(n-1)\right] M_1 \tag{3}$$

The pulse value P can be regarded as a measure of the energy content of the current frame n, i.e. of the instantaneous energy content of the noise. Equation (3) in conjunction with Fig. 3 shows what happens for small absolute values of D1, which are quantified below with reference to Fig. 4. In that range, the first adaptation function takes small values close to zero, so the pulse value P of the current frame n remains nearly unchanged from the pulse value P of the preceding frame n-1. For small noise fluctuations, i.e. for steady noise, this smooths the time course of P, because according to equation (3) the pulse values P change only slightly when the absolute value of D1 is small.
For large absolute values of D1, i.e. large noise fluctuations, the adaptation variable M1 takes a large value close to 1, so that the pulse value P of the current frame n follows the logarithmic noise energy Elog. A sharp rise in Elog marks the end of a speech pause, and a sharp fall marks its beginning. Either change is therefore reflected in a corresponding rise or fall of the determined pulse values P.
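The smoothing and tracking behavior can be made concrete with a short derivation. For illustration, assume that the adaptation variable stays at a constant value M while Elog stays at a constant level E. Equation (3) can then be rearranged to show how quickly P approaches E:

$$P(n) - E = (1 - M)\,\left[P(n-1) - E\right] \quad\Longrightarrow\quad P(n+k) - E = (1 - M)^{k}\,\left[P(n) - E\right]$$

With M = 1/32, the remaining deviation after 32 frames is (31/32)^32 ≈ 0.36 of the initial deviation, so noise peaks are averaged over tens of frames. With M = 0.7, the deviation shrinks to 0.3 of its initial size after one frame and to 0.09 after two, so P follows a jump in Elog almost immediately. These are the two values used in the rectangular approximation of Fig. 3.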

The first adaptation function with the characteristic curve 32 of Fig. 3 can be approximated by a rectangular function. This function takes a minimum value, e.g. about 1/32, when the absolute value of D1 is smaller than a predetermined amplitude value s. It takes a maximum value, e.g. about 0.7, when the absolute value of D1 is larger than s. The rectangular function is shown in Fig. 3 as the rectangular characteristic curve 34 (dashed line).
The amplitude value s is to be set so that two conditions hold: the pulse values P remain largely unaffected by the noise in Elog, and they follow Elog at the beginning or end of a speech pause. Fig. 4 illustrates how the logarithmic noise energy Elog and the pulse values P change over time, using the example of the word "ZWEI" ("two") followed by three coughing noises HG. During the speech pauses, the pulse values P stay at a low level and run comparatively uniformly. For the word "ZWEI" and the three coughing noises HG, they follow Elog and take large values. The value s of Fig. 3 is the point at which the rectangular characteristic curve 34 switches from its minimum to its maximum value. For the example of Fig. 4, s is set to about 10. In this example the noise in Elog amounts to about ±10 on a logarithmic amplitude scale, so this setting suppresses it in the pulse values P.
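The pulse filter with the rectangular approximation 34 can be sketched as follows, using s = 10, M1 = 1/32 and M1 = 0.7 from the example above. The following details are assumptions: the starting value p0, the function names, and assigning |D1| = s to the larger adaptation value.

```python
from typing import Iterable, List

S_AMPLITUDE = 10.0    # s, as set for the example of Fig. 4
M1_SMALL = 1.0 / 32   # M1 for |D1| < s
M1_LARGE = 0.7        # M1 for |D1| >= s (boundary case is an assumption)


def m1_rect(d1: float) -> float:
    """Rectangular approximation 34 of the trough-shaped adaptation function 32."""
    return M1_SMALL if abs(d1) < S_AMPLITUDE else M1_LARGE


def pulse_filter(elog: Iterable[float], p0: float = 0.0) -> List[float]:
    """Pulse values P(n) per equations (2) and (3); p0 is an assumed start value."""
    p = p0
    out = []
    for e in elog:
        d1 = e - p                # equation (2)
        p = p + d1 * m1_rect(d1)  # equation (3)
        out.append(p)
    return out
```

Starting from p0 = 20, an input of 60 gives P = 48 after one frame because |D1| = 40 exceeds s. Inputs fluctuating by ±2 around 20 move P by well under 0.1 per frame.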

Fig. 5 shows the digital background filter 16, which generates the background value H from the logarithmic noise energy Elog of the current frame. The background filter 16 has the same structure as the pulse filter 14, so its structure is not described in detail here. Its components in Fig. 5 carry the same reference numerals as the corresponding components of the pulse filter 14 in Fig. 2. The background filter 16 differs from the pulse filter 14 in its adaptation element 30, which determines a second adaptation variable M2 instead of the first adaptation variable M1. M2 is fed to the multiplier 22 in the first processing branch. The second adaptation variable M2 is determined from a second adaptation function, whose characteristic curve 36 is shown as a solid line in Fig. 6. The input to this function is a second auxiliary variable D2. D2 is generated at the adder 28 of the background filter 16 by subtracting the background value H delayed by T from the logarithmic noise energy Elog of the current frame n. D2 can be written as

$$D_2(n) = E_{\log}(n) - H(n-1) \tag{4}$$

As Fig. 6 shows, the characteristic curve 36 falls from a high level close to 1 to a low level close to 0 within a transition region II, which contains mainly negative values of D2. The second adaptation variable M2 determined from the characteristic curve 36 is used in the first processing branch to calculate the background value H of the current frame n. The adder 20 subtracts the background value H of the last processed frame n-1 from the logarithmic noise energy Elog of the current frame n. The multiplier 22 multiplies the result of this subtraction by the second adaptation variable M2. The adder 24 adds the result of this multiplication to the background value H of the last processed frame n-1 and outputs the background value H of the current frame n. This background value H is given by equation (5).

$$H(n) = H(n-1) + \left[E_{\log}(n) - H(n-1)\right] M_2 \tag{5}$$

Equation (5) together with Fig. 6 means the following for positive values of D2, i.e. in range III of Fig. 6. There, M2 takes values close to zero, so the background value H of the current frame n is essentially the background value H of the preceding frame n-1. The background values H therefore remain essentially unchanged when Elog rises suddenly, i.e. at the end of a speech pause. Negative values of D2 that lie outside the transition region II, in range I, mark the beginning of a speech pause. For these values, the second adaptation variable M2 takes values close to 1, so that by equation (5) the background value H of the current frame n approaches the logarithmic noise energy Elog of that frame.
The background values H thus follow the steep drop in Elog that occurs at the beginning of a speech pause. The ranges I, II and III are delimited by a negative value S1 and a positive value S2 of D2. The values of S1 and S2 are given below with reference to Fig. 7.

The second adaptation function can be approximated by a step function. Fig. 6 shows a stepped characteristic curve 38 (dashed line) as an approximation of the characteristic curve 36. For the characteristic curve 38, the adaptation variable M2 takes a value of about 0.9 in range I, about 0.1 in range II and about 0 in range III.

Fig. 7 shows the time course of the background values H, determined using the stepped characteristic curve 38, together with the time course of the logarithmic noise energy Elog. Fig. 7 uses the same example signal as Fig. 4. The time course of the background values H is largely unaffected by the large values of Elog that occur for the word "ZWEI" and the three coughing noises HG. The background values H therefore reliably describe the background noise in the noise mixture. For the example of Fig. 7, the values S1 and S2 of Fig. 6 are chosen as S1 = -20 and S2 = 10.
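The background filter with the stepped characteristic 38 can be sketched in the same way, using S1 = -20, S2 = 10 and the step values 0.9, 0.1 and 0 given above. The following details are assumptions: how values lying exactly on S1 or S2 are assigned, the start value h0, and the function names.

```python
from typing import Iterable, List

S1 = -20.0  # negative boundary between ranges I and II (example of Fig. 7)
S2 = 10.0   # positive boundary between ranges II and III (example of Fig. 7)


def m2_step(d2: float) -> float:
    """Step approximation 38 of the second adaptation function 36."""
    if d2 < S1:
        return 0.9  # range I: steep drop in Elog, H follows quickly
    if d2 <= S2:
        return 0.1  # range II: transition region
    return 0.0      # range III: sharp rise in Elog, H is held


def background_filter(elog: Iterable[float], h0: float) -> List[float]:
    """Background values H(n) per equations (4) and (5); h0 is an assumed start value."""
    h = h0
    out = []
    for e in elog:
        d2 = e - h                # equation (4)
        h = h + d2 * m2_step(d2)  # equation (5)
        out.append(h)
    return out
```

For example, a burst in Elog from 20 to 80 leaves H at 20, because D2 = 60 lies in range III. A drop from 60 to 20 pulls H down to 24 in a single frame.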

After the digital filters 14 and 16 have determined the pulse value P and the background value H of the current frame, the evaluation unit 18 compares P with the sum of H and the predetermined threshold value. The evaluation signal VAD is set to 1 if P exceeds this sum and to 0 otherwise. Fig. 8 shows the results of this comparison, i.e. the determined values of the evaluation signal VAD, for the example signal of Figs. 4 and 7. The speech pauses are identified reliably, since the evaluation signal VAD takes the value 0 during them. The frames assigned VAD = 1 on the basis of the comparison of P and H are stored in the announcement and answering system. The frames assigned VAD = 0 contain no speech information, so only a marker needs to be stored for them, for example a single character that does not otherwise occur in the frames.
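The comparison in the evaluation unit 18 reduces to one test per frame. In this sketch, the threshold of 5 follows claim 12 (about 5 dB), which assumes that Elog, P and H are on a decibel scale. Treating P exactly equal to H plus the threshold as a pause is an arbitrary choice, because the text does not specify that boundary. The function name is illustrative.

```python
from typing import List, Sequence

THRESHOLD = 5.0  # claim 12: about 5 dB (assumes a decibel scale for P and H)


def evaluate(pulse: Sequence[float], background: Sequence[float],
             threshold: float = THRESHOLD) -> List[int]:
    """Evaluation unit 18: VAD = 1 (speech) if P > H + threshold, else 0 (pause)."""
    if len(pulse) != len(background):
        raise ValueError("P and H must be given for the same frames")
    return [1 if p > h + threshold else 0 for p, h in zip(pulse, background)]
```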

When the data processed and stored by the method just described are reproduced, zero values can be output for the speech pauses recognized from the marker. However, this produces silent phases within the noise mixture that sound unnatural to the listener. Such unnatural reproduction can be counteracted by outputting random values instead of zero values during the speech pauses, which produces artificial noise. Such noise can be generated, for example, by a digital noise generator that computes a subsequent random value x(k+1) from a previously determined random value x(k) using the relationship

$$x(k+1) = \operatorname{integer}\left[\left(15305 \cdot x(k) + 12345\right) \cdot 2^{-P}\right], \qquad x(0) = 1 \tag{6}$$

The factor 2^-P is a suitably chosen shift factor, where P here denotes an integer shift exponent rather than the pulse value. The calculated random values x(k+1) can be scaled as a function of H to change the noise level. The noise level can also be adapted to the logarithmic noise energy Elog or the pulse value P of the frame most recently identified as containing speech information. This makes the transition from such a frame to a frame without speech information sound "soft" to the listener.
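Equation (6) does not state a word length. In a fixed-point implementation, the register width determines how the product wraps around. The sketch below makes two assumptions: it uses a 16-bit state that wraps modulo 2^16, read as a two's-complement value, and it applies the shift only to the output sample, not to the fed-back state. This is an interpretation, not the patent's stated implementation, and the function and parameter names are hypothetical. Under these assumptions, the multiplier 15305 and the increment 12345 satisfy the Hull-Dobell conditions for modulus 2^16, so the state cycles through all 65,536 values before repeating. Scaling by H is left out.

```python
from typing import List

A, C = 15305, 12345   # multiplier and increment from equation (6)
MODULUS = 1 << 16     # assumed 16-bit register width (not stated in the text)


def comfort_noise(count: int, shift: int = 4, seed: int = 1) -> List[int]:
    """Pseudo-random samples for speech pauses, loosely following equation (6).

    Assumption: the state wraps modulo 2**16 like a 16-bit register, and the
    shift 2**-shift (the exponent called P in equation (6)) is applied only to
    the output sample.
    """
    x = seed                                            # x(0) = 1 in equation (6)
    out = []
    for _ in range(count):
        x = (A * x + C) % MODULUS                       # linear congruential update
        signed = x - MODULUS if x >= MODULUS // 2 else x  # two's-complement view
        out.append(signed >> shift)                     # arithmetic shift scales down
    return out
```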

Reference list

10 Device for speech pause detection
12 Summation unit
14 Digital pulse filter
16 Digital background filter
18 Evaluation unit
20 Adder
22 Multiplier
24 Adder
26 Delay element
28 Adder
30 Adaptation element
32 First adaptation function
34 Rectangular function
36 Second adaptation function
38 Step function
Sk Frame
P Pulse value
H Background value
VAD Evaluation signal
Elog Logarithmic noise energy
D1 First auxiliary variable
D2 Second auxiliary variable
M1 First adaptation variable
M2 Second adaptation variable
HG Coughing noises
I, II, III Value ranges of D2

Claims

1. Method for recognizing a speech pause within a noise consisting of a mixture of speech ("ZWEI", HG) and background noise,
in which the noise is sampled over time and converted by an analog-to-digital converter into digital data, which are further processed in successive frames (Sk),
a sum of the data of at least one frame (Sk) is determined,
the sums of successive frames (Sk) are fed to a first digital filter (14) which, taking into account a first adaptation function (32), determines a pulse value (P) that essentially reflects the energy content of the noise within a frame (Sk),
the sums of successive frames (Sk) are also fed to a second digital filter (16) which, taking into account a second adaptation function (36), determines a background value (H) that essentially reflects the average energy content of the background noise within a frame (Sk),
and in which an evaluation signal (VAD = 0) indicating the presence of a speech pause is generated if the pulse value (P) is less than the background value (H) plus a predetermined threshold value.

2. The method according to claim 1, characterized in that the logarithm of the sum of the squares of the individual data of a frame (Sk) (Elog) is used as the sum.

3. The method according to claim 1, characterized in that the logarithm of the sum of the absolute values of the individual data of a frame (Sk) is used as the sum.

4. The method according to any one of the preceding claims, characterized in that the sum processed by the first digital filter (14) is fed to a first and a second processing branch of the first digital filter (14),
that the first processing branch has, connected in series, a first addition element (20), a multiplication element (22) and a second addition element (24), which outputs the pulse value (P),
that the pulse value (P) is fed back with a delay (T) of at least one frame duration and is added in the second addition element (24) and subtracted in the first addition element (20),
that the second processing branch has, connected in series, an addition element (28) and an adaptation element (30),
that the delayed pulse value (P) is subtracted at the addition element (28), yielding a first auxiliary variable (D1),
that a first adaptation variable (M1) is determined in the adaptation element (30) from the first auxiliary variable (D1) in accordance with the predetermined first adaptation function (32),
and that this adaptation variable (M1) is used as a multiplier at the multiplication element (22) in the first processing branch.

5. The method according to claim 4, characterized in that the first adaptation function (32) defines a relationship between the first adaptation variable (M1) and the first auxiliary variable (D1) that is characterized by a trough-shaped characteristic curve symmetric about the zero axis.

6. The method according to claim 5, characterized in that the first adaptation function is approximated by a rectangular function (34).

7. The method according to claim 6, characterized in that the first adaptation variable (M1) is approximately 1/32 for absolute values of the first auxiliary variable (D1) below a predetermined amplitude value (s) and approximately 0.7 otherwise.

8. The method according to any one of the preceding claims, characterized in that the sum processed by the second digital filter (16) is fed to a first and a second processing branch of the second digital filter (16),
that the first processing branch has, connected in series, a first addition element (20), a multiplication element (22) and a second addition element (24), which outputs the background value (H),
that the background value (H) is fed back with a delay (T) of at least one frame duration and is added in the second addition element (24) and subtracted in the first addition element (20),
that the second processing branch has, connected in series, an addition element (28) and an adaptation element (30),
that the delayed background value (H) is subtracted at the addition element (28), yielding a second auxiliary variable (D2),
that a second adaptation variable (M2) is determined in the adaptation element (30) from the second auxiliary variable (D2) in accordance with the predetermined second adaptation function (36), and that this adaptation variable (M2) is used as a multiplier at the multiplication element (22) in the first processing branch.

9. The method according to claim 8, characterized in that the second adaptation function (36) defines a relationship between the second adaptation variable (M2) and the second auxiliary variable (D2) that is characterized by a characteristic curve falling from a high level to a low level within a range (II) of the second auxiliary variable (D2) that extends from approximately the zero axis into the negative range.

10. The method according to claim 9, characterized in that the second adaptation function is approximated by a step function (38).

11. The method according to claim 10, characterized in that the second adaptation variable (M2) is approximately 0.9 for values of the second auxiliary variable (D2) below a first predetermined negative auxiliary value (S1), approximately 0.1 for values of D2 above the first auxiliary value (S1) and below a second predetermined positive auxiliary value (S2), and approximately zero for values above the second auxiliary value (S2).

12. The method according to any one of the preceding claims, characterized in that the threshold value is about 5 dB.

13. The method according to any one of the preceding claims, characterized in that, when the data are stored, a marker is generated for the frames (Sk) that lie within a speech pause, and this marker is stored instead of the frame.

14. The method according to claim 13, characterized in that the marker is a character that does not otherwise occur in the frames (Sk).

15. The method according to any one of the preceding claims, characterized in that, for reproducing the speech pauses, an artificial noise (x(k+1)) is generated by a noise generator.

16. Speech pause detection device (10) for performing the method according to any one of the preceding claims,
with a summation element (12) in which a sum of data of at least one frame (Sk) is determined,
with a first digital filter (14) to which the sums of successive frames (Sk) are fed and which, taking into account the first adaptation function, determines a pulse value (P) that essentially reflects the energy content of the noise within a frame (Sk),
with a second digital filter (16) to which the sums of successive frames are also fed and which, taking into account a second adaptation function, determines a background value (H) that essentially reflects the average energy content of the background noise within a frame (Sk), and
with an evaluation unit (18) which generates a signal (VAD = 0) indicating the presence of a speech pause when the pulse value (P) is less than the background value (H) plus a predetermined threshold value (S),
the first digital filter (14) having a first and a second processing branch,
the first processing branch having, connected in series, a first addition element (20), a multiplication element (22) and a second addition element (24), and also a delay element (26) following the second addition element (24), whose output is connected to the input of the first addition element (20) and to the input of the second addition element (24), and
the second processing branch having, connected in series, an addition element (28), whose input is connected to the output of the delay element (26), and an adaptation element (30), whose output is connected to the input of the multiplication element (22).

17. Speech pause detection device (10) for performing the method according to any one of claims 1 to 15,
with a summation element (12) in which a sum of data of at least one frame (Sk) is determined,
with a first digital filter (14) to which the sums of successive frames (Sk) are fed and which, taking into account the first adaptation function, determines a pulse value (P) that essentially reflects the energy content of the noise within a frame (Sk),
with a second digital filter (16) to which the sums of successive frames are also fed and which, taking into account a second adaptation function, determines a background value (H) that essentially reflects the average energy content of the background noise within a frame (Sk), and
with an evaluation unit (18) which generates a signal (VAD = 0) indicating the presence of a speech pause when the pulse value (P) is less than the background value (H) plus a predetermined threshold value (S),
the first digital filter (14) having a first and a second processing branch,
the first processing branch having, connected in series, a first addition element (20), a multiplication element (22) and a second addition element (24), and also a delay element (26), whose output is connected to the input of the first addition element (20) and to the input of the second addition element (24), and
the second processing branch having, connected in series, an addition element (28), whose input is connected to the output of the delay element (26), and an adaptation element (30), whose output is connected to the input of the multiplication element (22).