DE2435654A1

DE2435654A1 - Apparatus for speech analysis and synthesis - applies predictor method with reduced requirement of computer storage

Info

Publication number: DE2435654A1
Application number: DE2435654A
Authority: DE
Inventors: Louis-Sepp Willimann
Original assignee: Gretag AG
Current assignee: Gretag AG
Priority date: 1974-07-24
Filing date: 1974-07-24
Publication date: 1976-02-05
Also published as: DE2435654C2

Abstract

A microphone and an analogue-to-digital converter transform the speech into a sequence of digits; the analysis uses a model of the vocal tract (7). The volume of digits for transmission is reduced by dividing the sequence into successive short sections, each represented by a small set of parameters from which an approximation to the original can be regenerated (synthesized) according to a pre-arranged algorithm. An arbitrary set of parameters is used for a first synthesis and the result is compared with the original. A gradient vector is then calculated to indicate the best direction for a small corrective step, and the corrected parameters are applied to a fresh synthesis; this is repeated in a series of iterations. At each stage the degree of approximation to the original is calculated according to a distance function.

Description

Method and device for the analysis and synthesis of human speech.

When speech signals are transmitted, particularly in digital or pulse-amplitude-modulated form over channels of limited bandwidth, or when speech signals are to be stored as compactly as possible, for example in computers, the problem arises of reducing the volume of speech information by eliminating its redundancy.

Essentially two methods have been proposed to solve this problem; the devices operating according to these methods are known as the "vocoder" and the "predictor".

In the "vocoder", the mutual dependence of the spectral components of a sound is exploited to reduce redundancy. This is possible because voiced sounds, for example the vowels of a speech signal, have a quasi-periodic character. The associated frequency spectrum is therefore a line spectrum, with the individual spectral lines spaced apart by a certain fundamental frequency, the so-called pitch frequency. Unfortunately, the speech signal synthesized by a vocoder is of poor quality.

In the "predictor", redundancy is reduced by exploiting the statistical dependence of successive instantaneous values of the speech information as a function of time: only those instantaneous values are transmitted which are relatively independent of one another and lie outside a certain tolerance interval. To this end, it is determined at the transmitting end, for each instantaneous value to be transmitted, whether it is relatively independent of the preceding instantaneous values already transmitted; at the receiving end, the dependent instantaneous values that were not transmitted are determined or interpolated.

The speech signal synthesized by a predictor is of very good quality, but determining the instantaneous value to be transmitted can, under certain circumstances, require a relatively large effort.

The present invention lies in the latter field and relates to a method for the analysis and synthesis of speech in which, for analysis, the original speech signal is divided into sections and, for each section, three groups of signals representing the respective speech signal are derived. The first group of signals represents the parameters of a synthesis vocal tract model which corresponds functionally to the human vocal tract and is built essentially from a discrete linear filter. The second and third groups of signals represent, for the respective section, the reciprocal of the fundamental frequency (referred to below as the pitch period) and the voicing character of the original speech signal, respectively. For synthesis, the synthesis vocal tract model is set according to the first group of signals and is excited by a sequence of pulses spaced at the pitch period during voiced sections of the original speech signal and by white noise during unvoiced sections, so that an artificial speech signal similar to the original speech signal is produced at the output of the synthesis vocal tract model.

In a method of this kind known from US Patent No. 3,624,302, the first group of signals, the so-called predictor parameters, is calculated arithmetically from the statistical relationship between, for example, 12 successive samples of the original speech signal. Since this requires solving a system of linear equations and determining the zeros of a 12th-degree polynomial, the computational effort is extraordinarily high and can only be handled by a computer.

In addition, this method requires the energy of the original speech signal to be determined for each section as well.

The invention avoids these disadvantages and is characterized in that, during analysis, in order to obtain the signals representing the parameters of the synthesis vocal tract model, an analysis vocal tract model identical to the synthesis vocal tract model is used and is excited by a sequence of pulses spaced at the pitch period during voiced sections of the original speech signal and by white noise during unvoiced sections; in that the output signal of the analysis vocal tract model is compared section by section with the original speech signal and the deviation between the two signals is minimized by varying the parameters of the analysis vocal tract model; and in that those parameters of the analysis vocal tract model for which the deviation falls below a predetermined threshold value are used as the first group of signals.

The invention further relates to a device for carrying out the said method, in which the synthesis part comprises a synthesis vocal tract model and a pulse/noise source, and the analysis part is equipped with means for establishing the parameters of the synthesis vocal tract model, means for determining the pitch period, and means for determining the voicing character of the original speech signal.

This device is characterized in that the means for establishing the parameters of the synthesis vocal tract model are formed by an analysis vocal tract model identical to the latter, a pulse/noise source identical to the pulse/noise source of the synthesis part, a section memory for storing the original speech signal section by section, a comparator for comparing the output signal of the analysis vocal tract model with the signal stored in the section memory, and a parameter calculator for minimizing the deviation between the two signals determined in the comparator.

Since the essential components of the analysis part and the synthesis part are thus identical in the device according to the invention, the device can be used in alternating transmit/receive operation, for example for the transmission of speech signals, without great additional outlay. A further advantage over the device operating according to the known method is that the analysis and synthesis vocal tract models are formed by an arbitrary linear digital filter, so that a filter with low quantization sensitivity can be used. The known device, by contrast, uses one quite specific recursive filter, namely the so-called Frobenius form, in which the feedback consists of a transversal filter. It is known that the coefficients of this form are extremely sensitive to quantization.

The invention is explained in more detail below with reference to an exemplary embodiment illustrated in the figures, in which:
Fig. 1 shows a block diagram of a device for speech analysis and speech synthesis,
Fig. 2a shows a detail of Fig. 1 as a block diagram,
Fig. 2b shows a simplified block diagram of the arrangement of Fig. 2a,
Fig. 3a shows a further detail of Fig. 1 as a block diagram,
Fig. 3b shows a variant of the circuit of Fig. 3a, likewise as a block diagram, and
Fig. 4 shows a further detail of Fig. 1 as a block diagram.

According to Fig. 1, a complete device for speech analysis and speech synthesis consists of an analysis part A and a synthesis part S. As shown, a transmission or storage medium 14, for example a digital transmission channel or a digital memory, lies between the output of the analysis part and the input of the synthesis part.

The analysis part A consists of a speech source 1, a low-pass filter 2, an analog/digital converter 3, a clock source 15 which clocks the entire analysis part A, a pitch detector 4, a section memory 5, a pulse/noise source 6, an analysis vocal tract model 7, a comparator 8, a parameter calculator 9 and an encoder 10.

The synthesis part S consists of a decoder 11, a pulse/noise source 6', a synthesis vocal tract model 7', a digital/analog converter 12, a low-pass filter 2' and a speech sink 13, for example a loudspeaker.

The low-pass filters 2, 2', the pulse/noise sources 6, 6' and the vocal tract models 7, 7' of analysis part A and synthesis part S are each of identical construction. With appropriate means for switching between analysis and synthesis, each of these three devices need be present only once.

ANALYSIS PART

The speech signal to be analyzed passes from the source 1, for example a microphone or an analog store, to the low-pass filter 2. The latter has a certain cutoff frequency fg, for example 3 to 5 kHz. The output signal of the low-pass filter 2 is sampled and digitized in the analog/digital converter 3 at a sampling frequency 2fg, i.e. for example 6 to 10 kHz. The resulting sequence of samples {s_n} passes to the pitch detector 4 on the one hand and to the section memory 5 on the other.

In the section memory 5, a short section of the signal {s_n} to be analyzed is buffered for repeated retrieval. The length of the section is of the order of one to several pitch periods, i.e. about 10 to 30 ms. It need not, however, be an integer multiple of a pitch period.
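As a small worked example of the figures quoted above, the following sketch computes how many samples a section holds when the signal is sampled at twice the cutoff frequency. The function name `section_length_in_samples` is hypothetical and serves only as an illustration.

```python
def section_length_in_samples(cutoff_hz, section_ms):
    """Number of samples in one section when sampling at 2 * fg."""
    sampling_hz = 2 * cutoff_hz  # sampling frequency 2*fg, as stated in the text
    return round(sampling_hz * section_ms / 1000.0)


# Cutoff frequencies of 3 and 5 kHz, section lengths of 10 and 30 ms
for fg in (3000, 5000):
    for ms in (10, 30):
        print(fg, "Hz cutoff,", ms, "ms:", section_length_in_samples(fg, ms), "samples")
```

Under these assumptions the section memory would need to hold between 60 and 300 samples.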

In the pitch detector 4 it is determined by known methods, for example as in vocoders of classical design, whether the respective speech section is voiced or not. If the section is voiced, the length and position of the pitch periods are determined at the same time, a pitch period being understood as the time span between two pressure pulses produced by the vocal cords during voiced sounds. The pitch detector 4 passes its information, namely a signal g representing the voiced/unvoiced decision and, when voiced sections are present, pitch period signals M representing the length and position of the pitch period, directly to the encoder 10 on the one hand and to the pulse/noise source 6 in the analysis part on the other.
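The text does not specify which of the known methods the pitch detector uses. Purely as an illustration, the sketch below uses a normalized autocorrelation, which is an assumption swapped in here and not the method of the patent. The function name `detect_pitch`, the lag range and the threshold of 0.5 are likewise hypothetical, and the sketch estimates only the length of the pitch period, not its position.

```python
import random


def detect_pitch(samples, min_lag, max_lag, threshold=0.5):
    """Return (voiced, lag); lag is the pitch period in samples, or None if unvoiced.

    Assumption: a normalized autocorrelation peak above `threshold`
    is taken as evidence of a voiced section.
    """
    energy = sum(s * s for s in samples)
    if energy == 0.0:
        return False, None
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(samples[n] * samples[n + lag] for n in range(len(samples) - lag))
        r /= energy  # normalize by the section energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_r >= threshold:
        return True, best_lag
    return False, None


# Usage: a pulse train with a period of 50 samples, and a noise section
pulses = [1.0 if n % 50 == 0 else 0.0 for n in range(300)]
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(300)]
print(detect_pitch(pulses, 20, 150))
print(detect_pitch(noise, 20, 150))
```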

The pulse/noise source 6, controlled by the pitch detector 4, emits white noise during unvoiced sections of the speech signal and pulse-shaped signals spaced at the pitch period during voiced sections. The white noise is produced by a pseudo-random generator of known design and has approximately constant power. The pulses emitted by the pulse/noise source 6 during voiced sections are, in the simplest case, simple unit pulses, but they may also have a different shape, for example a triangular shape. The power of the pulse train is likewise approximately constant and equal to that of the white noise.
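The behaviour of the pulse/noise source can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `excitation`, the Gaussian noise distribution, and the scaling of the pulse amplitude so that both excitations have the same mean power are choices made here, not details given in the text.

```python
import math
import random


def excitation(voiced, length, pitch_period=None, power=1.0, seed=0):
    """Excitation sequence x_n: pulse train if voiced, white noise otherwise."""
    if voiced:
        # One pulse per pitch period; amplitude scaled (assumption) so that
        # the mean power of the pulse train equals `power`.
        amplitude = math.sqrt(power * pitch_period)
        return [amplitude if n % pitch_period == 0 else 0.0 for n in range(length)]
    # Software pseudo-random generator standing in for the hardware generator
    rng = random.Random(seed)
    return [rng.gauss(0.0, math.sqrt(power)) for _ in range(length)]


def mean_power(x):
    """Average of the squared samples."""
    return sum(v * v for v in x) / len(x)


print(mean_power(excitation(True, 200, pitch_period=50)))
print(mean_power(excitation(False, 10000)))
```

Both printed values should lie close to the requested power, which mirrors the statement that the pulse train and the noise have approximately equal power.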

The output signal of the pulse/noise source 6, consisting of white noise or of pulses spaced at the pitch period, forms the excitation signal for the analysis vocal tract model 7.

The vocal tract is understood to be the system of tubes of variable cross-sectional area between the larynx and the lips and between the velum and the nostrils. During speech, this vocal tract is excited during vowels by periodic pulses, the pitch pulses, which are produced by the glottis. During consonants, the vocal tract is excited by approximately white noise. The latter is produced by an air stream that is forced through a constriction in the vocal tract; for the consonant f, for example, this is the constriction between the upper teeth and the lower lip.

The model 7 of the human vocal tract is formed by a linear digital filter of arbitrary structure. Linear digital filters are described, for example, in H.W. Schüssler: "Digitale Systeme zur Signalverarbeitung", Springer 1973.

Linear digital filters make it possible to generate an output sequence {y_n} from an input sequence {x_n} according to the following law:

$$\mathbf{u}_{n+1} = A\,\mathbf{u}_n + \mathbf{b}\,x_n \tag{1a}$$

$$y_n = \mathbf{c}^T \mathbf{u}_n + d\,x_n \tag{1b}$$

Here u_n is the n-th state vector of dimension N. The initial state u_0 is prescribed and is in most cases the zero vector. The model is completely described by the N×N matrix A, the two N-dimensional vectors b and c, and the scalar d.
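The recursion of equations (1a) and (1b) can be written out directly in a few lines. The sketch below is an illustration only; the function name `run_filter` and the use of plain Python lists are choices made here.

```python
def run_filter(A, b, c, d, x, u0=None):
    """Apply the state-space recursion (1a), (1b) to the input sequence x."""
    N = len(b)
    u = list(u0) if u0 is not None else [0.0] * N  # u_0 defaults to the zero vector
    y = []
    for xn in x:
        # (1b): y_n = c^T u_n + d x_n
        y.append(sum(c[i] * u[i] for i in range(N)) + d * xn)
        # (1a): u_{n+1} = A u_n + b x_n
        u = [sum(A[i][j] * u[j] for j in range(N)) + b[i] * xn for i in range(N)]
    return y


# Usage: a first-order model (N = 1) excited by a single unit pulse
print(run_filter([[0.5]], [1.0], [1.0], 0.0, [1.0, 0.0, 0.0, 0.0]))
# → [0.0, 1.0, 0.5, 0.25]
```

With d = 0 the pulse appears at the output one clock later and then decays by the factor 0.5 per clock, as expected for a single feedback coefficient.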

As already stated, the input sequence {x_n} is formed by a sequence of pulses spaced at the pitch period during voiced sections of the speech signal and by white noise during unvoiced sections.

The analysis vocal tract model 7, which is explained in more detail in Figs. 2a and 2b, when excited in the manner described passes a first, still crude speech signal {y_n} to the comparator 8, in which this approximate signal is compared with the section of the original speech signal {s_n} stored in the section memory 5.

The comparison criterion, which is a mathematical measure of the deviation between the two sequences {y_n} and {s_n} and whose weighting should resemble the physiological perception of the human ear as closely as possible, can in itself be chosen arbitrarily. A measure that is particularly preferred because of its analytical simplicity is the squared deviation

$$E = \sum_{n=1}^{L} (y_n - s_n)^2 \tag{2}$$

where L is the length of the speech section.

On the basis of the results of this comparison, the parameter calculator 9 determines the required changes to the analysis vocal tract model 7 in such a way that, at the next comparison, the deviation according to formula (2) between the synthetic signal {y_n} and the original speech signal {s_n} is smaller.

For this purpose, the parameter calculator 9 determines the gradient of the error measure with respect to the parameters of the analysis vocal tract model 7. The parameters of the analysis vocal tract model 7 are the set of all components of this model to which the said changes are made, i.e. the variable components. Non-variable components, for example fixed electrical connections, are not changed and are consequently not taken into account when determining the gradient of the error measure. The gradient is a vector which points in the direction of the steepest increase of the error and whose magnitude indicates the local slope in that direction. The calculation of the gradient is explained in more detail below with reference to Figs. 3a and 3b.

Once the gradient has been calculated, the new parameters for the analysis vocal tract model 7 are set such that a small step is taken in the direction opposite to the gradient. In this direction the error naturally decreases most rapidly. If p_k is the vector of all parameters of the analysis vocal tract model 7 after the k-th iteration, the parameters for the next iteration are determined according to the following formula:

$$\mathbf{p}_{k+1} = \mathbf{p}_k - \Delta_k \cdot \operatorname{grad}_k(E) \tag{3}$$

Δ_k is a small positive step size which is either fixed or determined anew each time.

In the iteration procedure according to formula (3), the error decreases at every step. As soon as the comparator 8 determines that the error has fallen below a predetermined threshold value, i.e. has become tolerable, it issues a command signal B to the encoder 10 to take over the current parameters p_j of the analysis vocal tract model 7 and to prepare them, together with the information from the pitch detector 4, i.e. the voiced/unvoiced signal g and, where applicable, the pitch period signals M, for binary transmission or storage.
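The iteration of formula (3), together with the threshold test of the comparator, can be sketched in software as follows. Several points are assumptions made for this illustration only: the gradient is approximated by central finite differences rather than by the circuit of Figs. 3a and 3b, the step size is halved whenever a step fails to reduce the error (one possible way of determining it anew each time), and all function names and the packing of A, b, c and d into one flat parameter list are hypothetical.

```python
def simulate(p, N, x):
    """Order-N filter of (1a), (1b); p packs A (row-major), then b, c and d."""
    A = [p[i * N:(i + 1) * N] for i in range(N)]
    b = p[N * N:N * N + N]
    c = p[N * N + N:N * N + 2 * N]
    d = p[N * N + 2 * N]
    u, y = [0.0] * N, []
    for xn in x:
        y.append(sum(c[i] * u[i] for i in range(N)) + d * xn)
        u = [sum(A[i][j] * u[j] for j in range(N)) + b[i] * xn for i in range(N)]
    return y


def squared_error(p, N, x, s):
    """Squared deviation between the model output and the stored section s."""
    return sum((yn - sn) * (yn - sn) for yn, sn in zip(simulate(p, N, x), s))


def numerical_gradient(p, N, x, s, h=1e-6):
    """Central-difference stand-in for the gradient computation (assumption)."""
    grad = []
    for j in range(len(p)):
        hi = p[:j] + [p[j] + h] + p[j + 1:]
        lo = p[:j] + [p[j] - h] + p[j + 1:]
        grad.append((squared_error(hi, N, x, s) - squared_error(lo, N, x, s)) / (2 * h))
    return grad


def analyse_section(p0, N, x, s, threshold, step=0.1, max_iter=500):
    """Iterate p <- p - delta * grad(E) until the error falls below threshold."""
    p = list(p0)
    err = squared_error(p, N, x, s)
    for _ in range(max_iter):
        if err < threshold:
            break  # error tolerable: parameters would be handed to the encoder
        g = numerical_gradient(p, N, x, s)
        delta = step
        for _ in range(30):
            candidate = [pj - delta * gj for pj, gj in zip(p, g)]
            cand_err = squared_error(candidate, N, x, s)
            if cand_err < err:
                p, err = candidate, cand_err
                break
            delta /= 2  # step too large: try a smaller one
        else:
            break  # no improving step found
    return p, err


# Usage: recover a first-order model from a section it generated itself
x = [1.0 if n % 10 == 0 else 0.0 for n in range(40)]
s = simulate([0.5, 1.0, 1.0, 0.2], 1, x)
p0 = [0.3, 0.8, 0.8, 0.0]
e0 = squared_error(p0, 1, x, s)
p, e = analyse_section(p0, 1, x, s, threshold=1e-4)
print("initial error:", e0, "final error:", e)
```

Because a step is accepted only when it lowers the error, the error in this sketch cannot increase from one iteration to the next.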

From this moment on, the analysis part is ready for the analysis of the next speech section.

According to Fig. 2a, which shows a block diagram of the analysis vocal tract model 7 for the order N = 8, the vocal tract model consists of a memory 21 with 8 storage locations, a feedback matrix 22, a stage 23 with 8 first multipliers, a stage 24 with 8 second multipliers, a multiplier 25, a stage 26 with 8 adders, and a summing element 27. The feedback matrix 22 is built from adders and multipliers.

Stages 23 and 24, the multiplier 25 and the feedback matrix 22 are each assigned an additional memory (not shown), which stores the current parameters of the respective stage, i.e. its variable components $b_i$, $c_i$, $d$ and $A_{ij}$, which together form the parameter set $p_j$ (Fig. 1). The parameters $p_j$ stored in this way can easily be read out of the vocal tract model 7 by the command signal B of the comparator 8 (Fig. 1) and fed into the encoder 10.

As already stated, the vocal tract model is a linear digital filter that obeys the pair of recursive vector equations (1a) and (1b).

$$u_{n+1} = A\,u_n + b\,x_n \tag{1a}$$

$$y_n = c^T u_n + d\,x_n \tag{1b}$$

Written in component form, equations (1a) and (1b) read as follows:

$$u_{n+1,i} = \sum_{j=1}^{N} A_{ij}\,u_{n,j} + b_i\,x_n \quad \text{for all } i \text{ with } 1 \le i \le N \tag{1a'}$$

$$y_n = \sum_{i=1}^{N} c_i\,u_{n,i} + d\,x_n \tag{1b'}$$

The content of the 8 storage locations of memory 21 forms the state vector $u_n$ of the model at the n-th clock cycle. From these 8 stored values $u_1$ to $u_8$, 8 linear combinations are formed by means of the feedback matrix 22. Each of them corresponds to the first summand on the right-hand side of equation (1a) or (1a'). To each of these linear combinations $A_{11} \ldots A_{18}$ to $A_{81} \ldots A_{88}$, the adder stage 26 adds the n-th sample of the excitation sequence $x_n$ multiplied by a component of the input coupling vector $b$. The multiplication of the samples of the excitation sequence $\{x_n\}$ by the components $b_1$ to $b_8$ of the input coupling vector $b$ is carried out by the first multipliers of stage 23. The product of the sample $x_n$ and the component of the input coupling vector $b$, which is added to the linear combinations $A_{11} \ldots A_{18}$ to $A_{81} \ldots A_{88}$, corresponds in each case to the second summand on the right-hand side of equation (1a) or (1a').

The sums resulting from this addition form the new stored values, which are transferred into the state memory 21 at the next, i.e. the (n+1)-th, clock cycle.

The n-th response sample $y_n$ is computed as a linear combination of the values stored in memory 21. The coefficients used form the output coupling vector $c$; the output signals of the individual storage locations of memory 21 are multiplied by its components $c_1$ to $c_8$ in the second multipliers of stage 24. The linear combination of the output signals of the second multipliers of stage 24, which also includes the input signal $x_n$ multiplied by the feedthrough coefficient $d$ in the multiplier stage 25, is formed in the summing element 27.
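The signal flow through memory 21, feedback matrix 22 and stages 23 to 27 can be illustrated in software. The following is only a minimal sketch of the recursion (1a)/(1b), assuming a zero initial state; the function name `simulate_primary` and the second-order numerical values are hypothetical and chosen purely for illustration (the patent uses order N = 8 and does not specify an implementation).

```python
def simulate_primary(A, b, c, d, x):
    """Run u[n+1] = A u[n] + b x[n], y[n] = c^T u[n] + d x[n] from a zero state."""
    N = len(b)
    u = [0.0] * N  # memory 21: state vector u_n (assumed to start at zero)
    y = []
    for xn in x:
        # Stages 24 and 25 feeding summing element 27: c^T u_n + d x_n
        y.append(sum(c[i] * u[i] for i in range(N)) + d * xn)
        # Feedback matrix 22 with stages 23 and 26: A u_n + b x_n
        u = [sum(A[i][j] * u[j] for j in range(N)) + b[i] * xn
             for i in range(N)]
    return y


# Hypothetical second-order example (values are illustrative only).
A = [[0.5, 0.0],
     [1.0, 0.0]]
b = [1.0, 0.0]
c = [0.0, 1.0]
d = 0.0

impulse_response = simulate_primary(A, b, c, d, [1.0, 0.0, 0.0, 0.0])
print(impulse_response)
```

The output for clock cycle n is computed before the state is advanced, which matches the ordering of equations (1b) and (1a).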

The components of the matrix $A$ and of the vectors $b$ and $c$, and possibly also the scalar $d$, can be divided into three groups.

The components of the first group are fixed in advance. They mostly have simple values such as 0, i.e. the corresponding connection does not exist at all, or 1, i.e. the corresponding signal enters the linear combination purely additively without any additional multiplication, or -1, i.e. pure subtraction. The components of this group are therefore not affected by the optimization process. The second group comprises those components that are changed at every optimization step. Finally, the components of the third group are linear combinations of variable and fixed subcomponents. For example, the matrix $A$ may have a component $A_{ij}$ formed from a variable part $p_k$ and the fixed value 1. Here $p_k$ would be changed at every optimization step, while the 1 would represent fixed wiring. The signal path that feeds the i-th component of the n-th state vector $u_n$ back to the j-th component of $u_{n+1}$ would thus consist of a fixed and a variable partial path.

The fixed components, i.e. those of the first group and the fixed parts of the third group, define the structure of the vocal tract model. The variable components, i.e. those of the second group and the variable parts of the third group, form the parameters $p_j$ (Fig. 1) of the vocal tract model that are to be transmitted over the channel 14.

Fig. 2b shows the vocal tract model of Fig. 2a in simplified form, with the individual stages of the circuit labeled only with the corresponding signals or signal components.

Figs. 3a and 3b each show a block diagram of the parameter computer 9 (Fig. 1).

As already stated, at every optimization step the parameter computer 9 has to compute a set of new parameters $p_{k+1}$ according to formula (3):

$$p_{k+1} = p_k - \lambda_k \, \mathrm{grad}_k(E) \tag{3}$$

In this formula, $p_k$ is the vector of the old parameters and $\lambda_k$ is a small positive step size. The step size can be chosen the same for every step, i.e. $\lambda_k = \lambda$ for all k; however, it can also be determined anew for each optimization step.
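The iteration of formula (3), together with the threshold test performed by the comparator 8, can be sketched as a small loop. This is only an illustration under stated assumptions: the function name `optimise`, the callback `error_and_gradient`, the `max_steps` safeguard and the toy quadratic error used in the example are all hypothetical and are not part of the patent, which computes the error and its gradient from the vocal tract model itself.

```python
def optimise(p, error_and_gradient, step, threshold, max_steps=1000):
    """Repeat p <- p - step * grad(E) until E falls below the threshold."""
    E, grad = error_and_gradient(p)
    steps = 0
    while E >= threshold and steps < max_steps:
        # Formula (3) with a fixed step size for every iteration
        p = [pj - step * gj for pj, gj in zip(p, grad)]
        E, grad = error_and_gradient(p)
        steps += 1
    # At this point the comparator would issue command signal B
    return p, E, steps


# Toy error measure standing in for the real one: E = sum (p_j - t_j)^2.
target = [1.0, 2.0]

def toy_error_and_gradient(p):
    diffs = [pj - tj for pj, tj in zip(p, target)]
    return sum(e * e for e in diffs), [2.0 * e for e in diffs]

p_final, E_final, n_steps = optimise([0.0, 0.0], toy_error_and_gradient,
                                     step=0.25, threshold=1e-6)
print(p_final, E_final, n_steps)
```

With a fixed step size the loop corresponds to the simplest variant described above; a step size recomputed per iteration would replace the constant `step` argument.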

The article by L.S. Willimann, "Computation of the Response-Error Gradient of Linear Discrete Filters", IEEE Transactions, vol. ASSP-22, No. 1, February 1974, also shows that the calculation of $\mathrm{grad}_k(E)$ splits into two steps. The first step is very simple and mathematically elementary; it depends only on the type of the error measure E, not on the choice of the structure of the vocal tract model. The second step depends only on the structure of the vocal tract model, not on the error measure.

The cited publication by L.S. Willimann further shows, by means of a duality theorem, that the parameter computer 9 can simultaneously take over the function of the filter and thus of the vocal tract model 7 (Fig. 1).

Fig. 3a shows a first version of a combined parameter computer 9 and vocal tract model 7 according to Fig. 2a or 2b, the order N again being equal to 8.

According to Fig. 3a, the parameter computer 9 consists of a first primary model 29, a module 30, and N = 8 further primary sub-models 31 to 38. The first primary model 29 is identical to the vocal tract model shown in Figs. 2a and 2b, as a comparison of Figs. 2b and 3a shows.

The first primary model 29 is excited by the pulse/noise source 6 (Fig. 1) and supplies, in addition to the synthetic speech signal $\{y_n\}$, the partial derivatives $\partial y_n/\partial c_1 \ldots \partial y_n/\partial c_8$ and $\partial y_n/\partial d$. The derivative $\partial y_n/\partial c_i$ is exactly equal to the i-th component of the state vector $u_n$ (equation 1a). The mathematical justification for this and the following relationships is given in the dissertation mentioned. Furthermore, the derivative (sensitivity) $\partial y_n/\partial d$ of the model output $y_n$ with respect to the feedthrough coefficient $d$ is equal to the corresponding term of the excitation sequence $\{x_n\}$.

The module 30, which is likewise excited by the pulse/noise source 6 (Fig. 1), is part of the so-called dual model of the first primary model 29 and thus of the vocal tract model 7. It can be shown that there is a system of equations (4a) and (4b), equivalent to equations (1a) and (1b), which responds to the same excitation sequence $\{x_n\}$ with the same response sequence $\{y_n\}$ as the primary model:

$$v_{n+1} = A^T v_n + c\,x_n \tag{4a}$$

$$y_n = b^T v_n + d\,x_n \tag{4b}$$

The feedback matrix of the dual model is the transpose $A^T$ of the feedback matrix $A$ of the primary model.

The primary output coupling vector $c$ becomes the input coupling vector in the dual model, and the primary input coupling vector $b$ becomes the output coupling vector. The feedthrough coefficient $d$ is the same in both models.
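The equivalence of the primary model (1a)/(1b) and the dual model (4a)/(4b) can be checked numerically. The sketch below assumes a zero initial state in both models; the helper names `simulate` and `transpose` and all numerical values are hypothetical and serve only as an illustration.

```python
def simulate(A, b, c, d, x):
    """Simulate s[n+1] = A s[n] + b x[n], y[n] = c^T s[n] + d x[n], s[0] = 0."""
    N = len(b)
    state = [0.0] * N
    out = []
    for xn in x:
        out.append(sum(c[i] * state[i] for i in range(N)) + d * xn)
        state = [sum(A[i][j] * state[j] for j in range(N)) + b[i] * xn
                 for i in range(N)]
    return out


def transpose(A):
    """Return the transposed matrix as a list of rows."""
    return [list(col) for col in zip(*A)]


# Illustrative second-order values (not taken from the patent).
A = [[0.5, -0.25],
     [1.0, 0.0]]
b = [1.0, 0.5]
c = [0.25, 1.0]
d = 0.5
x = [1.0, 0.0, -1.0, 0.5, 0.0]

y_primary = simulate(A, b, c, d, x)
# Dual model: A is transposed and the roles of b and c are exchanged.
y_dual = simulate(transpose(A), c, b, d, x)

max_diff = max(abs(p - q) for p, q in zip(y_primary, y_dual))
print(max_diff)
```

Both calls use the same routine; only the arguments differ, which mirrors the statement that the dual model is obtained by transposing the feedback matrix and swapping the two coupling vectors.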

The module 30 represents equation (4a). The components of the state vector $v_n$ of this dual model are the partial derivatives $\partial y_n/\partial b_1 \ldots \partial y_n/\partial b_8$ of the current term $y_n$ of the output sequence with respect to the components $b_1 \ldots b_8$ of the input coupling vector. The components of the state vector $v_n$ of the dual sub-model 30 in turn each excite one of the primary sub-models 31 to 38.
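The statement that the state of the dual sub-model equals the sensitivities of the output with respect to the input coupling vector can be checked by comparing it with difference quotients. The following is a sketch under the assumption of a zero initial state; the helper names `run` and `transpose`, the step `h` and the numerical values are hypothetical.

```python
def run(A, b, c, d, x):
    """Simulate s[n+1] = A s[n] + b x[n], y[n] = c^T s[n] + d x[n], s[0] = 0.

    Returns the outputs y[n] and the state s[n] present at each clock cycle n.
    """
    N = len(b)
    s = [0.0] * N
    ys, states = [], []
    for xn in x:
        states.append(s)
        ys.append(sum(c[i] * s[i] for i in range(N)) + d * xn)
        s = [sum(A[i][j] * s[j] for j in range(N)) + b[i] * xn
             for i in range(N)]
    return ys, states


def transpose(A):
    return [list(col) for col in zip(*A)]


# Illustrative second-order values (not taken from the patent).
A = [[0.5, -0.25],
     [1.0, 0.0]]
b = [1.0, 0.5]
c = [0.25, 1.0]
d = 0.5
x = [1.0, 0.0, -1.0, 0.5, 0.0, 0.25]

y, _ = run(A, b, c, d, x)
# Dual sub-model, equation (4a): state v[n] driven by the same excitation.
_, v = run(transpose(A), c, b, d, x)

h = 0.5  # y[n] is linear in b, so the difference quotient is exact up to rounding
worst = 0.0
for i in range(len(b)):
    b_shifted = list(b)
    b_shifted[i] += h
    y_shifted, _ = run(A, b_shifted, c, d, x)
    for n in range(len(x)):
        quotient = (y_shifted[n] - y[n]) / h
        worst = max(worst, abs(quotient - v[n][i]))
print(worst)
```

The dual state delivers all N sensitivities with a single filter run, whereas the difference-quotient check needs one additional run per component of the input coupling vector.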

The state vectors $u^{I} \ldots u^{VIII}$ of these primary sub-models supply the partial derivatives of the current term of the output sequence with respect to the elements $A_{ij}$ of the feedback matrix $A$ in the manner indicated.

A second, equivalent arrangement is shown in Fig. 3b. Here too, the input sequence $\{x_n\}$ excites a complete primary model 39 and a dual sub-model 40.

In contrast to Fig. 3a, however, the components of the state vector $u_n$ of the primary model are used here to excite N = 8 further dual sub-models 41 to 48. The model response $\{y_n\}$ and the sought partial derivatives with respect to the model parameters, $\partial y_n/\partial A_{ij}$, $\partial y_n/\partial b_i$, $\partial y_n/\partial c_j$ and $\partial y_n/\partial d$, are found at the points entered in the figure.

The partial derivatives $\partial y_n/\partial d$, $\partial y_n/\partial c_i$, $\partial y_n/\partial b_i$ and $\partial y_n/\partial A_{ij}$ available at the output of the parameter computer 9 are fed, as shown in Fig. 4, to a computing stage 49, where they are subjected to an arithmetic operation that depends on the chosen error measure E. The partial derivatives $\partial E/\partial d$, $\partial E/\partial c_i$, $\partial E/\partial b_i$ and $\partial E/\partial A_{ij}$ obtained in this way are fed back from the output of computing stage 49, as indicated in Figs. 3a, 3b and 4, to the corresponding multipliers $d$, $c_i$, $b_i$ and $A_{ij}$ of the parameter computer 9 and thus also of the vocal tract model 7. At each optimization step they change the coefficients of these multipliers as a function of the deviation between the sequences $\{s_n\}$ and $\{y_n\}$ determined in the comparator 8 (Fig. 1).

If the quadratic deviation according to formula (2) is chosen as the error measure, and if the partial derivatives at the output of the parameter computer 9 are denoted by $\partial y_n/\partial p_j$, then the arithmetic operation in stage 49 follows from applying the chain rule to formula (2): each partial derivative $\partial y_n/\partial p_j$ is weighted with the deviation between $y_n$ and $s_n$, and the products are summed over the section.

In this connection, reference is made to the definition of the parameters given above. The parameters represent only a part of all components $d$, $c_i$, $b_i$ and $A_{ij}$ of the parameter computer 9. It goes without saying that only those components which actually represent parameters are changed during the optimization process. Consequently, only those partial derivatives which are assigned to actual parameters need to be fed to stage 49 and to the parameter computer 9. In practice this means that, instead of the 81 possible model parameters (1 parameter $d$ + 8 parameters $c_i$ + 8 parameters $b_i$ + 8 x 8 parameters $A_{ij}$), 15 parameters are sufficient with a suitable model structure.
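The arithmetic operation of stage 49 for the quadratic error measure can be written out as a short derivation. The sketch below assumes that formula (2) is the plain sum of squared deviations over one speech section; a normalization factor in formula (2), if present, would only scale the result by a constant.

$$E = \sum_{n} (y_n - s_n)^2 \quad\Longrightarrow\quad \frac{\partial E}{\partial p_j} = 2 \sum_{n} (y_n - s_n)\,\frac{\partial y_n}{\partial p_j}$$

Only the factor $(y_n - s_n)$ depends on the error measure, while $\partial y_n/\partial p_j$ depends only on the model structure, which is the two-step split of the gradient computation described above.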

It should be mentioned once more that, as Figs. 3a and 3b show, the parameter computer contains a complete vocal tract model. In the practical implementation of the analysis and synthesis device described, the vocal tract model 7 is contained in the parameter computer 9 (Fig. 1). The two elements are shown separately in Fig. 1 only to simplify the description.

SYNTHESIS PART

The decoder 11 (Fig. 1) decomposes its input signal into the signals from which it is composed, i.e. it recovers from the channel signal or from the stored digital signals the model parameters $p_j$, the voicing information g and, if present, the pitch period information M.

The voicing information and the length of the pitch period are used to control the pulse/noise source 6', which is identical to the pulse/noise source 6 of the analysis part. The pulse/noise source 6' supplies the excitation sequence for the synthesis vocal tract model 7', which is likewise identical to the analysis vocal tract model 7. Since the synthesis vocal tract model 7' has the same structure as the analysis vocal tract model 7, is set using the same parameters and is moreover excited by the same excitation sequence $\{x_n\}$, it delivers the same response sequence $\{y_n\}$. Owing to the optimization algorithm applied in the analysis part, this response sequence deviates only insignificantly, i.e. barely perceptibly to the ear, from the original sampled speech signal $\{s_n\}$.

The output sequence $\{y_n\}$ of the synthesis vocal tract model 7' is converted in the digital/analog converter 12 into an analog signal, which is demodulated in the subsequent low-pass filter 2'. The demodulation filter 2' is designed in the same way as the input filter 2 of the analysis part. The speech signal synthesized in this way is fed to the sink 13, which is generally a loudspeaker or an analog memory.

The essential elements of the synthesis part, namely the pulse/noise source 6', the vocal tract model 7' and the filter 2', are thus also contained in identical form in the analysis part. Moreover, since analog/digital converters of common design usually have a digital/analog converter in their feedback loop, the digital/analog converter 12 is also already present in the analysis part. These circumstances allow particularly simple use of the device in half-duplex operation.

Practical tests have shown that the quantities to be transmitted or stored, namely voicing information, pitch period and model parameters, must be redetermined about 30 times per second to obtain synthetic speech of acceptable quality. It has also been found that a model order of N = 8 suffices at a sampling frequency of 6 kHz. Furthermore, with a suitable model structure, 15 model parameters of 8 bits each are sufficient. Taking into account that the voicing information requires 1 bit and allowing 10 bits for the pitch period, one obtains a transmission rate of 30 · (15 · 8 + 10 + 1) bit/s ≈ 4000 bit/s. Compared with conventional PCM transmission, the required channel capacity is thus reduced by about 90 %. A skillful choice of the structure of the vocal tract model can probably reduce the transmission rate still further.
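The transmission-rate figure can be reproduced with a few lines of arithmetic. In the sketch below the function names are hypothetical, and the PCM word length of 7 bits per sample is purely an assumption made for the comparison; the text gives only the 6 kHz sampling frequency and the approximate 90 % saving.

```python
def vocoder_bit_rate(frames_per_s=30, n_params=15, bits_per_param=8,
                     pitch_bits=10, voicing_bits=1):
    """Bits per second for parameters, pitch period and voicing flag."""
    bits_per_frame = n_params * bits_per_param + pitch_bits + voicing_bits
    return frames_per_s * bits_per_frame


def pcm_bit_rate(sample_rate_hz=6000, bits_per_sample=7):
    """PCM rate; bits_per_sample = 7 is an assumption, not a source value."""
    return sample_rate_hz * bits_per_sample


rate = vocoder_bit_rate()            # 30 * (15 * 8 + 10 + 1)
saving = 1.0 - rate / pcm_bit_rate()
print(rate, round(100 * saving, 1))
```

Under these assumptions the exact rate is 3930 bit/s, which the text rounds to about 4000 bit/s.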

Claims

1. Method for the analysis and synthesis of speech, in which, for analysis, the original speech signal is divided into sections and, for each section, three groups of signals representing the respective speech signal are derived, the first group of signals representing the parameters of a synthesis vocal tract model that corresponds functionally to the human vocal tract and is essentially based on a discrete linear filter, and the second and third groups of signals representing, respectively, the reciprocal of the fundamental frequency, referred to below as the pitch period, and the voicing character of the original speech signal for the respective section, and in which, for synthesis, the synthesis vocal tract model is set on the basis of the first group of signals and is excited during voiced sections of the original speech signal by a sequence of pulses spaced apart by the pitch period and during unvoiced sections by white noise, so that an artificial speech signal similar to the original speech signal is produced at the output of the synthesis vocal tract model, characterized in that, in the analysis, an analysis vocal tract model (7) identical to the synthesis vocal tract model is used to obtain the signals representing the parameters of the synthesis vocal tract model (7') and is excited during voiced sections of the original speech signal by a sequence of pulses spaced apart by the pitch period and during unvoiced sections by white noise, that the output signal of the analysis vocal tract model is compared section by section with the original speech signal and the deviation between the two signals is minimized by changing the parameters of the analysis vocal tract model, and that those parameters of the analysis vocal tract model at which the deviation falls below a specified threshold value are used as the first group of signals.

2. The method according to claim 1, characterized in that, to minimize the deviation between the original speech signal and the output signal of the analysis vocal tract model (7), the gradient of the error measure representing the deviation is determined with respect to the parameters of the analysis vocal tract model, and that the parameters of the analysis vocal tract model are changed in the direction opposite to the gradient.

3. The method according to claim 2, characterized in that, after each determination of the error measure representing the deviation between the original speech signal and the output signal of the analysis vocal tract model (7), the parameters of the analysis vocal tract model are changed by one small step.

4. The method according to claim 3, characterized in that the width of the step by which the parameters of the analysis vocal tract model (7) are changed is fixed.

5. The method according to claim 1, characterized in that the signal representing the white noise and the sequence of pulses, with which the analysis vocal tract model (7) is excited during unvoiced and voiced sections of the original speech signal respectively, have approximately constant and approximately equal power.

6. The method according to claim 5, characterized in that the sequence of pulses with which the analysis vocal tract model (7) is excited during voiced sections of the original speech signal is formed by simple unit pulses.

7. Device for carrying out the method according to claim 1, in which the synthesis part has a synthesis vocal tract model and a pulse/noise source, and the analysis part has means for determining the parameters of this synthesis vocal tract model, means for determining the pitch period and means for determining the voicing character of the original speech signal, characterized in that the means for determining the parameters of the synthesis vocal tract model (7') are formed by an analysis vocal tract model (7) identical to the latter, by a pulse/noise source (6) identical to the pulse/noise source (6') of the synthesis part, by a section memory (5) for storing the original speech signal section by section, by a comparator (8) for comparing the output signal of the analysis vocal tract model with the signal stored in the section memory, and by a parameter computer (9) for minimizing the deviations between the two signals.

8. The device according to claim 7, characterized in that the analysis vocal tract model (7) and the synthesis vocal tract model (7') are each formed by a linear digital filter.

9. The device according to claim 7 or 8, characterized in that the parameter computer (9) is designed such that, when excited by the signal of the pulse/noise source (6), its output signal corresponds to the gradient of the error measure representing the deviation determined in the comparator (8).

10. The device according to claim 9, characterized in that the parameter computer (9) and the vocal tract model (7) are formed jointly by a primary model (29) identical to the vocal tract model, by a part (30) of a model dual to this primary model, and by a number of further sub-models (31-38) of the primary model, this number corresponding to the number of components of the state vector of the primary model or of the dual sub-model; that the input of the primary model and that of the dual sub-model are connected to the output of the pulse/noise source (6); and that each of the further primary sub-models is connected by its input to one of those outputs of the dual sub-model which supply the components of the state vector of this dual sub-model.

11. The device according to claim 9, characterized in that the parameter computer (9) and the vocal tract model (7) are formed jointly by a primary model (39) identical to the vocal tract model, by a part (40) of a first model dual to this primary model, and by a number of further dual sub-models (41-48), this number corresponding to the number of components of the state vector of the primary model or of the dual sub-model; that the input of the primary model and that of the first dual sub-model are connected to the output of the pulse/noise source (6); and that each of the further dual sub-models is connected by its input to one of those outputs of the primary model which supply the components of the state vector of this primary model.