FR2741744A1

FR2741744A1 - Energy evaluation method for speech signal in low bit rate vocoder

Info

Publication number: FR2741744A1
Application number: FR9513944A
Authority: FR
Inventors: Pierre Andre Laurent
Original assignee: Thomson CSF SA
Current assignee: Thales SA
Priority date: 1995-11-23
Filing date: 1995-11-23
Publication date: 1997-05-30
Anticipated expiration: 2015-11-23
Also published as: FR2741744B1

Abstract

The method involves linear prediction analysis of the speech signals transmitted between transmission and reception vocoders. On the transmitter side, the speech signal is filtered (3) by a filter whose impulse response is the inverse of that of the receiver synthesis filter. The frequency spectrum of the resulting residual signal is then estimated (4). This estimate allows the filter applied to the excitation signal in the receiver vocoder, preferably a linear phase digital filter, to be set so that its frequency response curve is as close as possible to the spectrum of the residual signal.

Description

The present invention relates to a method and a device for evaluating the energy of the speech signal per sub-band for low bit rate vocoders.

It applies in particular to the production of linear prediction vocoders in which the speech signal is divided into frames of fixed duration, and in which a data packet representative of the speech is transmitted to a reception vocoder during each frame considered.

In these vocoders, the speech signal is synthesized on the receiver side by passing through a synthesis filter an excitation signal obtained by combining, in suitable proportions, white noise and a train of periodic pulses, the whole having a generally flat frequency spectrum.
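The synthesis scheme described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the `voicing` mixing weight, and the sign convention A(z) = 1 + a[0] z^-1 + ... are all assumptions made for the example.

```python
import random


def excitation(n, pitch_period, voicing, seed=0):
    """Mix a periodic pulse train and white noise.

    voicing is a hypothetical weight in [0, 1]: 1.0 gives pulses only,
    0.0 gives noise only.
    """
    rng = random.Random(seed)
    out = []
    for t in range(n):
        pulse = 1.0 if t % pitch_period == 0 else 0.0
        noise = rng.gauss(0.0, 1.0)
        out.append(voicing * pulse + (1.0 - voicing) * noise)
    return out


def synthesize(exc, a):
    """All-pole synthesis filter 1/A(z), with A(z) = 1 + a[0] z^-1 + ... (assumed convention)."""
    y = []
    for t, e in enumerate(exc):
        acc = e
        for j, aj in enumerate(a, start=1):
            if t - j >= 0:
                acc -= aj * y[t - j]
        y.append(acc)
    return y
```

With `voicing=1.0` the excitation is a pure pulse train, so the output of `synthesize` is simply the impulse response of the synthesis filter repeated at the pitch period.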

A disadvantage of these vocoders is that when the bit rate of the speech sample packets is reduced to 2400 bits/s or less, the speech signal reconstituted at synthesis has a "synthetic" sound. This is because the linear prediction analysis used at low bit rates is essentially designed to represent resonances well, and not the anti-resonances produced in particular by the pronunciation of the so-called "nasal" vowels.

The object of the invention is to overcome the aforementioned drawback by proposing a method for generating additional data to modulate the energy of the excitation signal of the synthesis filter, in order to improve the quality of reproduction of the anti-resonances, in particular the spectral dips at low frequencies, where they are most perceptible.

It also relates to a device for implementing the above method.

Other characteristics and advantages of the invention will become apparent from the following description, given with reference to the appended drawings, which represent:

Figures 1 and 2, an illustration of the method according to the invention in the form of flowcharts.

Figure 3, a receiving device for implementing the method according to the invention.

Figure 4, a graph illustrating a distribution of the central frequencies of the different sub-bands making up the filter applied to the excitation signal of the synthesis filter.

Figure 5, a graph illustrating the impulse responses of the elementary filters used in filtering the excitation signal of the synthesis filter.

Figure 6, the compared frequency responses of the excitation signal filter and of the residual signal.

The method according to the invention consists in estimating the frequency spectrum of a prediction error signal in the transmission vocoder, so as to act, on reception, on the coefficients of a filter that filters the excitation signal of the synthesis filter, so that the speech signal reconstituted by the synthesis filter is as close as possible in shape to the speech signal applied to the input of the transmission vocoder.

To reproduce the frequency spectrum well, especially at low frequencies, the method according to the invention predetermines a set of M frequencies gm distributed according to a semi-logarithmic law of the form

with m = 1 ... M, where F0 is a reference frequency set, for example, to 1000 Hz.

The M desired values of the frequency response corresponding to the set of frequencies gm are denoted Dm in what follows.

The excitation signal is filtered by a linear phase filter H(z) with 2p+1 coefficients, whose impulse response H(z) is of the form

$$H(z) = z^{-p}\left(h_p z^{p} + \dots + h_1 z + h_0 + h_1 z^{-1} + \dots + h_p z^{-p}\right) \qquad (2)$$

According to a preferred embodiment of the invention, this filter is constructed as a linear combination of K elementary sub-band filters with linear phase and with the same group delay, of impulse response Hk(z), such that

$$H(z) = \sum_{k=1}^{K} X_k H_k(z) \qquad (3)$$

To obtain an approximately flat frequency response of the filter H(z), for example to within 0.5 dB, each kth filter is dimensioned as a band-pass filter of central frequency Fk and of bandwidth $B_k = F_{k+1/2} - F_{k-1/2}$, the frequencies Fk being themselves distributed on the same semi-logarithmic scale as the frequencies gm and such that

for k = 1 ... K, with $F_{1/2} = 0$ and $F_{K+1/2} = g_M$.

The coefficient hi,k of rank i of each kth filter, which represents the ith sample counted from the centre of the impulse response of the filter, can be defined by the relation

for i = 0 ... p, a relation that generally gives satisfactory results.

The coefficients Xk which define H(z) are calculated so that the frequency response of the filter H(z) is as close as possible, at the frequencies gm, to the desired frequency responses Dm.

Since the filter H(z) has linear phase, its frequency response is of the form

or, applying relation (3),
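As a sketch of what these two relations express, and under the assumption that relation (3) is the weighted sum of the elementary filters with gains X_k, the symmetric coefficients of relation (2) give a response equal to a pure delay of p samples multiplied by a real cosine series. The symbol F_e for the sampling frequency and the normalization used here are assumptions of this sketch, not taken from the original.

```latex
\begin{aligned}
H\!\left(e^{j\omega}\right)
  &= e^{-j\omega p}\left[h_0 + 2\sum_{i=1}^{p} h_i \cos(i\omega)\right],
  \qquad \omega = \frac{2\pi f}{F_e} \\
  &= e^{-j\omega p}\sum_{k=1}^{K} X_k\left[h_{0,k} + 2\sum_{i=1}^{p} h_{i,k}\cos(i\omega)\right]
\end{aligned}
```

The bracketed term is real, which is why the fit to the desired values Dm can be carried out on real quantities only.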

The optimal values of the coefficients Xk are those which minimize the total squared error

(8) between the desired frequency response (Dm) and the effective response of the filter (H(gm)).

This is obtained, for example, by the known least-squares method, which consists in setting to zero all the derivatives of E with respect to each of the K coefficients Xk and solving a system of K equations defined by the relation:

for k = 1 ... K.

Denoting by

the frequency response of the kth filter at the frequency gm, the system of equations becomes

for k = 1 ... K, that is,

for k = 1 ... K.
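The least-squares step can be illustrated with a small sketch. It assumes the error is the plain unweighted sum of squared differences between Dm and the combined response, and it uses hypothetical names (`resp[k][m]` for the response of the kth elementary filter at gm, `desired[m]` for Dm). Setting the derivatives to zero then gives the usual normal equations, solved here by Gaussian elimination.

```python
def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_gains(resp, desired):
    """Least-squares gains X_k such that sum_k X_k * resp[k][m] approximates desired[m]."""
    K = len(resp)
    gram = [[sum(u * v for u, v in zip(resp[k], resp[l])) for l in range(K)]
            for k in range(K)]
    rhs = [sum(u * d for u, d in zip(resp[k], desired)) for k in range(K)]
    return solve(gram, rhs)
```

For example, `fit_gains([[1.0, 0.0], [1.0, 1.0]], [1.0, 2.0])` reproduces the two target values exactly, because two filters are fitted to two frequencies; with M larger than K the result is a best fit instead.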

In order to minimize the computational load, the system of equations defined by the previous relation is solved by transforming that relation into the relation

for k = 1 ... K, which is possible because the coefficients ak,m are constant and depend only on the exact form of the filters Hk(z) and on the reference frequencies gm.
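One way to read this simplification, offered as an assumption since the relations themselves are not legible here: writing H_{k,m} for the response of the kth elementary filter at g_m (a notation introduced for this sketch), the matrix of the normal equations depends only on the filters and on the frequency grid, so its inverse can be folded into constant coefficients a_{k,m} computed once, offline.

```latex
\sum_{l=1}^{K} X_l \sum_{m=1}^{M} H_{k,m} H_{l,m} = \sum_{m=1}^{M} H_{k,m} D_m
\;\Longrightarrow\;
X_k = \sum_{m=1}^{M} a_{k,m} D_m,
\qquad
a_{k,m} = \sum_{l=1}^{K} \left(G^{-1}\right)_{k,l} H_{l,m},
\quad
G_{k,l} = \sum_{m=1}^{M} H_{k,m} H_{l,m}
```

Under this reading, each frame costs only K times M multiply-accumulate operations, with no linear system to solve at run time.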

If the filters are orthogonal, that is to say have disjoint frequency responses, the coefficient Xk assigned to the filter of rank k is simply the average of the desired frequency responses at the frequencies "covered" by the filter.
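A short derivation shows why, under the idealized assumption that the kth filter has a response of 1 at the M_k frequencies it covers and 0 elsewhere (H_{k,m} and M_k are notations introduced for this sketch). The cross terms of the least-squares system vanish, and each gain decouples from the others:

```latex
\sum_{m=1}^{M} H_{k,m} H_{l,m} = 0 \quad (k \neq l)
\;\Longrightarrow\;
X_k = \frac{\sum_{m} H_{k,m} D_m}{\sum_{m} H_{k,m}^{2}}
    = \frac{1}{M_k} \sum_{m \,\in\, \text{band } k} D_m
```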

The coefficients Xk are calculated on the transmission side according to steps 1 to 7 of the method shown in the flowchart of Figure 1. In steps 1 and 2, the coefficients of the synthesis filter are quantized by applying one of the known methods of speech signal analysis by linear prediction, as described for example on pages 107 to 137 of the book by René BOITE and Murat KUNT, "Traitement de la parole", published by Presses Polytechniques Romandes.

Step 3 consists in modelling a filter A(z) of the speech signal using the coefficients of the reception synthesis filter calculated in step 2, so as to reproduce in the transmission vocoder a filter whose impulse response is the inverse of that of the synthesis filter used on reception. The signal resulting from the filtering of the speech signal in step 3 has a frequency spectrum that is, in principle, very close to that of the excitation signal of the synthesis filter of the reception vocoder, whose spectrum is generally flat since it results from a suitable combination of white noise and a periodic pulse train. The difference between the two spectra is exploited in steps 4 to 7: the coefficients Xk of the filter applied to the excitation signal of the synthesis filter are calculated in steps 5, 6 and 7.

Step 4 consists in estimating a smoothed version of the frequency spectrum of the residual signal obtained from step 3, so that the desired frequency responses, denoted Dm in Figure 1, do not accidentally fall into narrow dips or narrow peaks of the frequency spectrum of the residual signal. The desired frequency response is calculated over a determined number M of frequencies by performing a discrete cosine transform defined by the relation

for m = 1 ... M, in which the wi are the coefficients of a weighting window intended to smooth the frequency response, for example

for i = 0 ... r, and Ri is the result of an autocorrelation calculation up to order r on samples of the residual signal, such that

for i = 0 ... r.
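Steps 3 and 4 can be sketched as follows. The relations for Dm, the window and Ri are not legible in this text, so the forms used below are assumptions: an unnormalized autocorrelation, a window supplied by the caller, and a cosine series of the windowed autocorrelation evaluated at the frequencies gm. The names `residual`, `autocorr`, `smoothed_spectrum` and the parameter `fs` (sampling frequency) are hypothetical.

```python
import math


def residual(speech, a):
    """Step 3 (sketch): prediction error through A(z) = 1 + a[0] z^-1 + ... (assumed convention)."""
    return [s + sum(aj * speech[t - j] for j, aj in enumerate(a, 1) if t - j >= 0)
            for t, s in enumerate(speech)]


def autocorr(x, r):
    """Unnormalized autocorrelation R_0 ... R_r of the residual samples."""
    return [sum(x[n] * x[n - i] for n in range(i, len(x))) for i in range(r + 1)]


def smoothed_spectrum(R, window, freqs, fs):
    """Step 4 (sketch): D_m = w_0 R_0 + 2 * sum_i w_i R_i cos(2 pi i g_m / fs), an assumed form."""
    out = []
    for g in freqs:
        acc = window[0] * R[0]
        for i in range(1, len(R)):
            acc += 2.0 * window[i] * R[i] * math.cos(2.0 * math.pi * i * g / fs)
        out.append(acc)
    return out
```

Keeping the order r small and tapering the window towards zero both smooth the estimate, which is the stated purpose of step 4: narrow dips and peaks of the residual spectrum are averaged out before the values Dm are read off.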

The coefficients Xk of the filter H(z) are obtained by solving a system of equations defined by the relation

for k = 1 ... K, where K denotes the number of elementary filters used to construct the filter used on the synthesis side.

The coefficients Xk are then calculated in step 5 by applying relation (13) and quantized in step 7, possibly after normalization in step 6, before being transmitted.

In the reception part of the vocoder, the method is carried out according to steps 8 to 10 of the flowchart of Figure 2, which consist in dequantizing the coefficients Xk in step 8, calculating in step 9 the coefficients of the filter H(z) defined by relations (3) and (5), and filtering in step 10 the excitation signal of the synthesis filter by the filter H(z) thus defined.
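The three receiver steps can be sketched as below. The text does not specify the quantizer, so the uniform scalar `dequantize` with a `step` parameter is purely hypothetical, as are the function names. `elementary[k]` is assumed to hold the half response h_0,k ... h_p,k of the kth elementary filter, computed in advance.

```python
def dequantize(indices, step):
    """Step 8 (sketch): hypothetical uniform scalar dequantizer for the gains X_k."""
    return [i * step for i in indices]


def build_taps(gains, elementary):
    """Step 9 (sketch): h_i = sum_k X_k * h_i,k, expanded to the 2p+1 symmetric taps."""
    p1 = len(elementary[0])
    half = [sum(g * hk[i] for g, hk in zip(gains, elementary)) for i in range(p1)]
    return half[:0:-1] + half  # h_p ... h_1, h_0, h_1 ... h_p


def fir(x, taps):
    """Step 10 (sketch): causal FIR filtering of the excitation signal."""
    return [sum(c * x[t - j] for j, c in enumerate(taps) if t - j >= 0)
            for t in range(len(x))]
```

Because the taps are symmetric, the filter delays the excitation by p samples at all frequencies, so it shapes the spectrum without distorting the pitch pulse positions relative to one another.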

A corresponding reception device for implementing the method according to the invention is shown in Figure 3. This device comprises, in a known manner, a synthesis filter 11 excited, through a gain control device 12, by an excitation signal supplied alternately, through a voiced/unvoiced switch 13, by a noise source 14 and a pulse source 15. To reproduce the voice signal, the response of the synthesis filter 11 is controlled by the coefficients ak obtained in the quantization step 2 shown in Figure 1. According to the invention, a complementary filter H(z) 16 having the characteristics defined above is interposed between the switch 13 and the gain control device 12.

By way of example, for a sampling frequency of 8000 Hz and a frequency gM = 4000 Hz, the filter H(z) can be formed, as shown in the diagram of Figure 4, by a set of K = 6 elementary filters, centred respectively on the frequencies 500, 1000, 1500, 2500 and 3500 Hz. Using a filter H(z) with 2p+1 coefficients as defined above, the impulse response of each of the 6 filters is, for p = 16, that of Figure 5. The results obtained by the different operations are shown in Figure 6. The dotted curve A shows the frequency spectrum of the residual signal. In this example, the spectrum is obtained by computing a fast Fourier transform on N = 256 points.

Curve B, formed of small circles, represents the values Dm of the desired frequency response calculated on M = 30 points, and shows a greater density of points at low frequencies. This frequency response corresponds to a smoothed version of the spectrum of the residual signal.

The 6 points shown as asterisks correspond to the normalized values of the coefficients Xk. They represent, more or less, an exaggerated version of the desired frequency response comprising only 6 points instead of 30.

Finally, the solid curve C is the final frequency response of the filter H(z) obtained. It appears very close to the desired frequency response.

The method according to the invention can be implemented using signal processors suitably programmed according to the steps of the method described above.

Claims

1. Method for evaluating the energy of a speech signal per sub-band between a transmission vocoder and a reception vocoder with linear prediction, characterized in that it consists in performing, on the transmission side, a filtering (3) of the speech signal applied to the input of the vocoder by a filter having an impulse response that is the inverse of the impulse response of the reception synthesis filter, estimating (4) the frequency spectrum of the residual signal obtained, and determining (9) in the reception vocoder the impulse response H(z) of a filter for filtering the excitation signal of the synthesis filter so that its frequency response curve is as close as possible to the frequency spectrum of the residual signal.

2. Method according to claim 1, characterized in that the filter H(z) of the excitation signal of the synthesis filter is a digital linear phase filter.

3. Method according to claim 2, characterized in that the filter H(z) of the excitation signal is constructed as a linear combination of a determined number K of elementary filters with linear phase and with the same group delay, so as to divide the bandwidth of the filter H(z) of the excitation signal into sub-bands.

4. Method according to claim 3, characterized in that the central frequencies Fk of the sub-bands are distributed according to the same semi-logarithmic scale, satisfying a relation of the form

with $F_{1/2} = 0$ and $F_{K+1/2} = g_M$.

5. Method according to claim 4, characterized in that each coefficient hi,k of rank i of each kth filter is determined by a relation of the form

6. Method according to claims 3 to 5, characterized in that the coefficients Xk representing the gains of the elementary filters are determined by minimizing the total squared error between the desired frequency response (Dm), defined by the frequency spectrum of the residual signal, and the effective response (H(gm)) of the filter of the excitation signal.

7. Method according to claim 6, characterized in that the desired frequency response Dm is calculated over a determined number M of frequencies by performing a discrete cosine transform defined by the relation

in which Ri is the result of an autocorrelation calculation up to order r on samples of the residual signal.

8. Method according to claims 3 and 7, characterized in that the coefficients Xk of the filter of the excitation signal are obtained by solving the system of equations defined by the relation

9. Device for implementing the method according to any one of claims 1 to 8, characterized in that it comprises one or more suitably programmed signal processing microprocessors.