CA2083335A1 - Method for the quantification of the energy of the speech signal in a vocoder with very low bit rate

Info

Publication number: CA2083335A1
Application number: CA 2083335
Authority: CA
Inventors: Pierre-Andre Laurent
Original assignee: Individual
Current assignee: Thales SA
Priority date: 1991-11-22
Filing date: 1992-11-19
Publication date: 1993-05-23
Also published as: EP0543700A2; EP0543700A3; FR2684225A1

Abstract

ABSTRACT OF THE DISCLOSURE
The method consists in dividing the speech signal into packets of a determined number of frames of a constant duration by the sampling of a determined number n of energy values in each frame, quantifying the first energy value measured in each first frame of a packet according to a determined number Q0 of bits and the variations of the k - 1 remaining energies in relation to the first value of the energy sampled on a determined number Q1 of bits smaller than Q0, the variations of the k - 1 energies being selected from a table of "slopes" enabling each energy sample k to be assigned the energy "slope" that separates it from the energy of the "k - 1th" previous sample. Application: Vocoders.
Figure 3

Description

METHOD FOR THE QUANTIFICATION OF THE ENERGY OF THE
SPEECH SIGNAL IN A VOCODER WITH VERY LOW BIT RATE
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a method for the quantification of the energy of the speech signal in a vocoder with a very low bit rate.
It can be applied notably to the making of the linear prediction vocoders used for the transmission of speech by radio, similar to those described for example in the Revue Technique THOMSON-CSF (THOMSON-CSF Technical Journal), volume 14, No. 3, September 1982, pp. 715 to 731, in which the speech signal is identified at the output of a digital filter, the input of which receives either a periodic waveform corresponding to the waveforms of its voiced sounds such as the vowels or a random waveform corresponding to the waveforms of its unvoiced sounds such as most of its consonants.

2. Description of the Prior Art
It is known that the auditory quality of linear prediction vocoders depends greatly on the precision with which their predictive filter is quantified, but also on the quality of the restitution of the power profile of the excitation. This is especially true for certain transitory sounds such as many consonants: for example, poor quality restitution does not allow a "d" to be distinguished from a "t" or from a "k".
As a rule, the speech signal is segmented into frames of constant duration, and a single value of power (or energy) is given for each frame.
In vocoders with very low bit rate, one way to lower the bit rate is to increase the duration of the frame, for example from 22.5 ms to 30 ms, as well as to group together the parameters relating to several frames and quantify them only once. This enables the different parameters of synthesis to be renewed less frequently.
Unfortunately, the intelligibility of the restituted speech is diminished, for the transmitting of only one value of energy per frame no longer enables the appropriate restitution of certain transitory sounds.
A first known way to overcome these difficulties consists in grouping the frames together in packets while considering k values of energy per packet, each of which can be represented by the coordinates of a point referenced in a k-dimensional space. A statistical analysis makes it possible to determine the main axes of the cloud of the points observed. The quantification takes place on the coordinates of the points borne by the main axes, each point being quantified on a number of bits depending on the eigenvalue or characteristic value associated with each axis considered. However, the drawback of operating in this way is that it is necessary to plan a procedure of correction at the synthesis filter so that the values of the energies computed are not negative. Furthermore, in this processing operation, no special attention is paid to the fidelity of restitution of the transitory sounds.
According to a second method, also known, which partly follows the procedure of the first method by the grouping of frames in packets and which also takes k values of energy per packet into consideration, the k values of energy are no longer encoded in a scalar way but vectorially by means of a dictionary containing M = 2^Q multiplets of k values each, considering the k values to be quantified on Q bits.
In this case, the difficulties of setting up the system appear from the fact that it is necessary, firstly, to create and store a dictionary and, secondly, to carry out a quantification. Since the dictionary is generally poorly structured and since it is necessary to count at least two bits per value of energy, the encoding of the number Q occupies no less than 22 combinations, which represents very major computing loads for the signal processors of the vocoders.
SUMMARY OF THE INVENTION
It is the aim of the invention to overcome the above-mentioned drawbacks. To this effect, an object of the invention is a method for the quantification of the energy of the speech signal in a vocoder with very low bit rate, said method consisting in dividing (1) the speech signal into packets of a determined number of frames of a constant duration by the sampling of a determined number n of energy values in each frame, quantifying (2, 3, 4) the first energy value measured in each first frame of a packet according to a determined number Q0 of bits and the variations of the k - 1 remaining energies in relation to the first value of the energy sampled on a determined number Q1 of bits smaller than Q0, the variations of the k - 1 energies being selected from a table of "slopes" enabling each energy sample k to be assigned the energy "slope" that separates it from the energy of the "k - 1th" or "k - 1 order" previous sample.
The main advantage of the method according to the invention is that it can be used to obtain high quality energy in each frame of the speech signal while at the same time respecting the energy transitions from frame to frame without thereby affecting the computation load and the necessary memory space in the vocoder.
BRIEF DESCRIPTION OF THE DRAWINGS
Other features and advantages of the invention shall appear from the following description, made with reference to the appended drawings, of which:
- Figures 1 and 2 show two graphs to illustrate the principle of quantification of the energy of a vocoder implemented by the invention;
- Figure 3 is a flow chart illustrating the different steps of the method according to the invention.
MORE DETAILED DESCRIPTION
The method according to the invention consists, in the manner shown in figure 1, in segmenting the speech signal into frames with a constant determined duration ranging, for example, from 22.5 to 30 ms, grouping the frames in packets of a determined number L of frames, measuring a determined number n of energy values of the signal in each frame, and transmitting, in each packet, only the first quantified value of the energy E1 measured in the first frame of a packet as well as the k - 1 values of the differences of the energies existing between the frames that follow, k being equal to n.L. In reception, the differences of the energies received are placed end to end after the first energy value that is received in the first frame of each packet to reconstitute the profile of the quantified values of the energies at emission.
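By way of illustration, the reconstitution carried out at reception can be sketched as follows in Python. This is a minimal sketch and not part of the original disclosure: the function name reconstruct_profile, the choice of dB units and the numerical values are assumptions made for the example.

```python
from itertools import accumulate


def reconstruct_profile(first_level_db, increments_db):
    """Rebuild the k quantified energies of one packet.

    first_level_db: decoded first energy value of the packet (dB, assumed unit).
    increments_db:  the k - 1 decoded energy differences (dB, assumed unit).
    """
    # The differences are placed end to end after the first value,
    # i.e. the profile is the running sum starting from that value.
    return list(accumulate(increments_db, initial=first_level_db))


# Illustrative packet of k = 6 energies (values are hypothetical).
profile = reconstruct_profile(40.0, [7.0, 2.0, 0.0, -3.0, -3.0])
print(profile)
```

Because each value depends only on the previous one and on one increment, the receiver needs no table other than the list of legal increments.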
To do this, in the emission vocoder, a first value of energy is quantified in each first frame k0 of a packet in a determined number Q0 of bits and the variations of the k - 1 remaining energies are quantified with a determined number Q1 of bits smaller than Q0. The 2^Q0 possible initial values include a zero value representing the silences. The other values are distributed according to an almost logarithmic scale which is best suited to following the properties of sensitivity of the ear: the higher the level of the speech signal, the smaller is the quantification step.
Typically, a 3 dB step is adopted for the low levels and a 1 dB step is adopted for the high levels. The m = 2^Q1 other values represent energy increments d_j, also referred to hereinafter as "legal values of energy", the values of which are predetermined to emphasize the transitions. These transitions are chosen for example as being respectively equal to -3 dB, 0 dB, +2 dB and +7 dB if the number Q1 is encoded with only two bits.
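The bit budget that results from this arrangement can be worked out as in the short Python sketch below. It is given for illustration only: the function names and the numerical choices n = 2, L = 3 and Q0 = 5 are assumptions, only Q1 = 2 being taken from the example above.

```python
def packet_energy_bits(n, L, q0, q1):
    """Bits spent on the energy of one packet: Q0 + (k - 1) * Q1, with k = n * L."""
    k = n * L  # number of energy values per packet
    return q0 + (k - 1) * q1


def increment_count(q1):
    """Number m = 2^Q1 of legal increments available per energy value."""
    return 2 ** q1


# Hypothetical configuration: 2 energy values per frame, 3 frames per packet.
bits = packet_energy_bits(n=2, L=3, q0=5, q1=2)
naive_bits = 2 * 3 * 5  # every one of the k values coded on Q0 bits instead
print(bits, naive_bits, increment_count(2))
```

Under these assumed figures the packet carries 15 bits of energy information instead of 30, and four legal increments are available for each of the five remaining values.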
As can be seen in figure 2, the energy increments can be used to make a search, from each quantified value B of a frame k, for the quantified values of the energy in the (k - 1)th preceding frame which could lead to said value B by a legal increment d_j, starting with the zero increment d_0.
The numbers Q0 and Q1 are determined according to the steps 1 to 5 of the method represented by the flow chart of figure 3. The first step referenced 1 in figure 3 groups together the frames in packets of L frames. The values of the energies E1 to Ek are computed at the step 2. These are quantified in the manner shown in figures 1 and 2 between two values Emax and Emin in relation to a scale comprising P graduations which may be identified for convenience's sake with the 2^Q0 possible values of the initial energy E1 measured in the first frame. The quantified values corresponding to the 2^Q0 possible values are designated in figure 2 by e_0, e_1 ... e_(P-1) with e_0 = Emin and e_(P-1) = Emax. The method continues at the step 3 in figure 3 by an initialization stage in which a set of P distances is computed between the first value of energy E1 and the P possible quantified values of this energy.
The corresponding distances D(p) are memorized in the form of a first table (D), not shown, in a memory of the vocoder. The computations take place by squaring the differences between the first energy E1 and the quantified values e_0, e_1 ... e_(P-1) according to the relationship:
D(p) = (E1 - e_p)^2 where p = 0, 1 ... P - 1
The computed distances are all the smaller as the quantified value e_p is closer to the value E1. The next step 4 consists, in a manner similar to the known VITERBI algorithm, in carrying out k - 1 iterations aimed at estimating the distances between all the potential quantification profiles and the real energy profile, and in eliminating the least probable quantification profiles. A second table (D'), not shown and referenced "slope", is prepared. For each of the iterations 1 to k - 1, this second table D' associates a slope or a legal energy increment d_j with each quantified value p of the iteration k. A search for the quantified value of the preceding k - 1th iteration is done by the ticking off, in the "slopes" table, of the "part" or legal increment d_j that can lead directly thereto, beginning with the zero increment d_0. The sequence of the programming instructions to be implemented is the following:

- FOR p = 0 ... P - 1, DO
    /* initialization for a zero incrementation */
    - Let Dmin = D(p - d_0) = D'(p) and let PrecIndex = 0
    /* test of the non-zero incrementations */
    - FOR j = 1 ... m - 1, DO
        - IF p - d_j >= 0 AND p - d_j <= P - 1 THEN /* legal value d_j */
            - IF D'(p - d_j) < Dmin THEN /* shorter distance */
                DO Dmin = D'(p - d_j)
                DO PrecIndex = j
            - END IF
        - END IF
    END DO
    DO SlopeIndex(k, p) = PrecIndex /* memorize the most probable quantified value at the preceding step */
    DO D(p) = Dmin + (Ek - e_p)^2 /* update the distance */
END DO
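The iterations of step 4 may be sketched in Python as shown below. This is an illustrative reading of the listing, not the original implementation. It assumes that D' designates the distances kept from the preceding iteration, that the increments are expressed as index offsets on the scale of quantified values with the zero increment in first position, and that the scale is uniform; the names forward_pass, levels and steps are hypothetical.

```python
def forward_pass(energies, levels, steps):
    """Viterbi-style forward pass over the k energies of one packet.

    energies: measured energies E1 ... Ek.
    levels:   quantified values e_0 ... e_(P-1).
    steps:    legal increments as index offsets (assumption), steps[0] == 0.
    Returns (dist, slope_index): the final cumulated distances D(p) and, for
    each iteration, the index of the best increment arriving at each level.
    """
    P = len(levels)
    # Initialization: D(p) = (E1 - e_p)^2
    dist = [(energies[0] - e) ** 2 for e in levels]
    slope_index = []
    for energy in energies[1:]:
        prev = dist                      # distances of the preceding iteration
        dist = [0.0] * P
        best = [0] * P
        for p in range(P):
            d_min, best_j = prev[p], 0   # zero increment tried first
            for j in range(1, len(steps)):
                q = p - steps[j]         # candidate predecessor level
                if 0 <= q <= P - 1 and prev[q] < d_min:
                    d_min, best_j = prev[q], j
            best[p] = best_j
            dist[p] = d_min + (energy - levels[p]) ** 2
        slope_index.append(best)
    return dist, slope_index


# Hypothetical packet on a 10-level scale with offsets 0, -3, +2, +7.
scale = [float(v) for v in range(10)]
dist, slope_index = forward_pass([2.0, 4.0, 4.0, 1.0], scale, [0, -3, 2, 7])
print(dist[1], slope_index)
```

In this example the measured profile 2, 4, 4, 1 can be followed exactly with the increments +2, 0 and -3, so the cumulated distance at level 1 is zero. The strict comparison keeps the zero increment whenever two predecessors are equally good, as in the listing.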

Thus, at the k - 1th iteration, a table of distances D(p) is prepared. This table, at the position p, contains the cumulated distance between the best quantified profile that arrives at the position p and the original profile. This makes it possible to keep, in memory, a table of slope indices wherein the slope index value (k, p) represents the index of the best possible slope to arrive at the quantified value e_p at the step k. The two tables thus obtained make it possible to arrive at a final decision. To do this, the method entails carrying out a search in the table D(p) for the index Pmin which corresponds to the minimum value. Then it consists in making a trace-back in the slopes table by carrying out k - 1 iterations programmed as follows:
- FOR k = K - 1, K - 2, ..., 1 DO
    - DiffIndex(k) = SlopeIndex(k, Pmin)
    - Pmin = Pmin - d_j with j = DiffIndex(k)
END DO
The index values DiffIndex(1 ... K - 1) are the indices of the best quantified values possible for the slopes d_j. The final value of Pmin then simply designates the most probable quantified value of the first energy E1.
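The final decision and the trace-back may be sketched in Python as follows. The sketch is illustrative and rests on assumptions: the slope table stores increment indices, stepping back is done by the increment that the stored index designates, and the increments are index offsets on the scale. The names trace_back, dist, slope_index and steps, as well as the small hand-made tables, are hypothetical.

```python
def trace_back(dist, slope_index, steps):
    """Recover the first level index and the k - 1 increment indices.

    dist:        final cumulated distances D(p), one per level.
    slope_index: slope_index[i][p] is the index of the increment that best
                 arrives at level p at iteration i + 1.
    steps:       legal increments as index offsets on the scale (assumption).
    """
    # Final decision: level with the minimum cumulated distance (Pmin).
    p = min(range(len(dist)), key=dist.__getitem__)
    diff_index = [0] * len(slope_index)
    # Walk the slope table from the last iteration back to the first.
    for i in range(len(slope_index) - 1, -1, -1):
        j = slope_index[i][p]
        diff_index[i] = j
        p -= steps[j]  # step back to the predecessor level
    return p, diff_index


# Hand-made tables for 4 levels, 2 iterations and offsets 0, -1, +1.
final_dist = [5.0, 0.5, 3.0, 9.0]
slopes = [[0, 0, 2, 2], [0, 1, 0, 2]]
first_level, increments = trace_back(final_dist, slopes, [0, -1, 1])
print(first_level, increments)
```

Here the best final level is 1; it is reached from level 2 by the offset -1, and level 2 is reached from level 1 by the offset +1, so the packet is coded as first level 1 followed by the increment indices 2 and 1.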
The correspondence between the original profile of the values of the energies to be quantified and the final profile after quantification is shown in figure 1. The fact that the algorithm automatically eliminates the aberrant values resulting from a false analysis appears in the fourth value of energy shown in figure 1.
Naturally, the method that has just been described can always be matched to particular characteristics of the system of analysis. In particular, if this system tends to find erroneous values for energy, it is always possible to minimize the influence of the erroneous values through the replacement, for example, of the squaring operations used for the distance measurements by absolute values that enable the profile of the quantified values to be linked with the correct values of energy, provided that they are more numerous than the incorrect values.
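The effect of this replacement can be illustrated by a deliberately simplified Python sketch. It does not reproduce the method itself: it assumes a constant profile, searches for the single level that minimizes the cumulated distance to five measurements of which one is aberrant, and all names and values are hypothetical.

```python
def cost(values, level, power):
    """Cumulated distance between the measurements and one constant level.

    power = 2 gives the squared distance, power = 1 the absolute value.
    """
    return sum(abs(v - level) ** power for v in values)


def best_level(values, levels, power):
    """Level of the scale with the smallest cumulated distance."""
    return min(levels, key=lambda e: cost(values, e, power))


measured = [40, 40, 40, 40, 10]   # the last value is assumed aberrant
scale = range(0, 61)              # hypothetical uniform scale, 1 dB apart

with_squares = best_level(measured, scale, 2)
with_absolute = best_level(measured, scale, 1)
print(with_squares, with_absolute)
```

With squared distances the aberrant value pulls the selected level down to 34, whereas with absolute values the selected level stays at 40, the value shared by the majority of the measurements.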
Furthermore, the operations of matching and fine tuning for the vocoder require only modifications of the quantified starting values (number and values), the increments (number and values), or again the number of iterations.
Finally, the method that has just been described represents only a small computation load since the initialization is done starting with the very first frame, and the kth iteration is done at the (k + 1)th frame. This enables the distribution of the computation load in time, except for the last frame where the final decision is taken, without the arrangement's being costly in terms of computation power.

Claims

1. A method for the quantification of the energy of the speech signal in a vocoder with very low bit rate, said method consisting in dividing the speech signal into packets of a determined number of frames of a constant duration by the sampling of a determined number n of energy values in each frame, quantifying the first energy value measured in each first frame of a packet according to a determined number Q0 of bits and the variations of the k - 1 remaining energies in relation to the first value of the energy sampled on a determined number Q1 of bits smaller than Q0, the variations of the k - 1 energies being selected from a table of "slopes" enabling each energy sample k to be assigned the energy "slope" that separates it from the energy of the "k - 1th" previous sample.

2. A method according to claim 1, consisting in memorizing the energy slopes associated with each energy sample in the order of appearance of the energy samples.

3. A method according to any one of the claims 1 or 2, wherein the first energy value measured in each first frame is quantified according to an almost logarithmic scale of quantification in giving a greater step value to the low levels of energy.

4. A method according to claim 3, wherein the variations of the k - 1 energies are quantified on levels distributed about a zero level of increase.

5. A method according to claim 4, wherein the selection of the parts of energy is done in making a search, in the table of the slopes, for one of the slopes corresponding to the quantification levels, starting with the zero slope increment d_0 which leads, from an energy sample k of a frame, to an energy value closest to the value of the energy of the k - 1th preceding sample.

6. A method according to claim 5, wherein the determination of the variations of the k - 1 energies takes place by the application of the VITERBI algorithm.