EP0482699B1

EP0482699B1 - Method for coding and decoding a sampled analog signal having a repetitive nature and a device for coding and decoding by said method

Info

Publication number: EP0482699B1
Application number: EP91202675A
Authority: EP
Inventors: John Gerard Beerends; Frank Muller; Robertus Lambertus Adrianus Van Ravesteijn
Original assignee: Koninklijke PTT Nederland NV
Current assignee: Koninklijke PTT Nederland NV
Priority date: 1990-10-23
Filing date: 1991-10-16
Publication date: 1997-08-20
Anticipated expiration: 2011-10-16
Also published as: EP0482699A3; NO305188B1; FI914993A; NO914105L; DE69127339D1; CA2053133A1; PT99294A; CA2053133C; ES2106051T3; NO914105D0; JP2958726B2; FI914993A0; EP0482699A2; DK0482699T3; FI105623B; JPH05268098A; NL9002308A; ATE157188T1; DE69127339T2

Abstract

Frequency components are calculated from the STP-filtered speech signal. The amplitudes of these are combined in a manner such that the resultant values are associated with frequencies which are situated equidistantly on a linear Bark scale. Said components are quantised, possibly after scaling. In the decoder, the components are again distributed over the frequency spectrum. In the coder, the fundamental regularity D is determined with an LTP technique, after which it is transmitted. In the decoder the phases of the reconstructed signal at the spacing D in the past are determined. These phases are combined with the amplitudes already present in the frequency spectrum, after which transformation back to the time domain takes place. Inverse STP filtering is then carried out.

Description

The invention relates to a method for coding a sampled analog signal having a repetitive nature, in which the sampled signal is split into consecutive segments each containing a predetermined number of samples; in which a short-term prediction analysis is performed on said segments and in which the coefficients determined in said short-term prediction analysis are transmitted and are also fed to a short-term prediction filter, in which a long-term prediction analysis is performed on a residual signal available at an output of said filter and the information determined in said long-term prediction analysis is also transmitted, and in which the information present in the residual signal is coded and transmitted.
Such a method is known from "An error protected transform coder for cellular mobile radio", by H. Suda et al, disclosed during the IEEE Workshop on speech coding "Advances in speech coding", Vancouver, CA, 5-8 September 1989, pages 81-86. This paper discloses in its figure 1 an encoder in which a short-term prediction analysis and a long-term prediction analysis are performed. This encoder comprises a short-term prediction filter for generating a residual signal and comprises a multiplexing unit for multiplexing and then transmitting (after coding) information present in the residual signal, information determined in said short-term prediction analysis and information determined in said long-term prediction analysis.
This known method is disadvantageous, inter alia, because it transmits information in an inefficient way, i.e. with a large number of bits/second.
It is an object of the invention, inter alia, to provide a method for very efficiently transmitting the information, i.e. with a small number of bits/second, without the quality, experienced by the listener, of the speech reconstructed by a decoding method at the receiving side being impaired.
Thereto, the method according to the invention is characterised in that the residual signal is transformed into the frequency domain, in that the amplitudes of at least a number of the frequency components obtained by the transformation into the frequency domain are combined to form a smaller number of frequency components in a manner such that the frequencies associated with the combined amplitudes are situated equidistantly on a linear Bark scale, and in that a signal is transmitted which is representative of said combined amplitudes.
According to the present invention, the residual signal is coded perceptively, which means that only that information is transmitted which is relevant for differences in the decoded received signal which can be detected by the human ear.
In the first place, use is made for this purpose of the known fact that the human ear is not sensitive to absolute phase values, but only to phase relationships, so that it is not necessary in principle to transmit the phase information from the residual signal to be coded, provided only that it is possible to reconstruct the original phase relationships at the receiving end.
In addition, the present invention makes use of the insight known for some time that human hearing functions in fact as a chain consisting of a number of filters having adjacent frequency bands but having different bandwidths, the so-called critical bands or Barks, the bandwidth of such critical bands being much smaller for low frequencies than for high frequencies. A frequency scale formed in accordance with this insight is referred to as a linear Bark scale. For a further explanation of the principle of the Bark scale, reference is made to B. Scharf and S. Buus, "Stimulus, Physiology, Thresholds" in L. Kaufman, K.R. Boff and J. P. Thomas, editors, Handbook of Perception and Human Performance, chapter 14, pages 1-43, Wiley, New York, 1986.
It is also pointed out that the principle of first transforming a residual signal to be transmitted in speech coding to the frequency domain and then transmitting the information available after this transformation has already been put forward earlier. For this purpose reference can be made, for example, to the paper entitled "Fourier Transform Vector Quantisation for Speech Coding" by P. Chang et al. in IEEE Transactions on Communications, Vol. COM 35, No. 10, pages 1059-1068.
According to this publication, however, after the transformation use is made of vector quantisation and there is no mention of transmitting purely amplitude information.
The invention further relates to a method for decoding a signal coded by the method described above, in which the long-term prediction analysis information received and the other information received from the residual signal are combined and the combined signal, together with the short-term prediction analysis coefficients received, is fed to an inverse short-term prediction filter at whose output a series of samples is delivered which is representative of the sampled analog signal.
The method for decoding is characterised in that original amplitudes in the frequency domain are reconstructed from the combined amplitude values received, in that the information transmitted as a result of the long-term prediction analysis is used to calculate the phase values associated with said amplitudes, and in that the calculated phase values, together with the associated amplitudes are transformed to the time domain.
The invention also relates to an apparatus for coding a sampled analog signal having a repetitive nature, comprising

splitting means for splitting the sampled signal into consecutive segments each containing a predetermined number of samples,
short-term prediction means for performing a short-term prediction analysis on said segments and for generating coefficients,
a short-term prediction filter for receiving the coefficients determined in said short-term prediction analysis,
long-term prediction means for performing a long-term prediction analysis on a residual signal available at an output of said filter,
coding means for coding the information present in the residual signal, which information is to be transmitted, and
an output for transmitting the coefficients, the information determined in said long-term prediction analysis and the information present in the residual signal.

This apparatus is characterised in that the apparatus comprises

transformation means for transforming the residual signal into the frequency domain,
combination means for combining the amplitudes of at least a number of the frequency components obtained by the transformation into the frequency domain to form a smaller number of frequency components in a manner such that the frequencies associated with the combined amplitudes are situated equidistantly on a linear Bark scale, a signal which is representative of said combined amplitudes being transmittable.

The invention further also relates to an apparatus for decoding a coded signal, comprising

an input for receiving long-term prediction analysis information and other information received from the residual signal and short-term prediction analysis coefficients,
combination means for combining the long-term prediction analysis information and the other information received from the residual signal into a combined signal,
an inverse short-term prediction filter for receiving the combined signal and the short-term prediction analysis coefficients and for generating at its output a series of samples which is representative of the sampled analog signal.

This apparatus is characterised in that the apparatus comprises

reconstruction means for reconstructing original amplitudes in the frequency domain from the combined amplitude values received,
calculation means for using the information transmitted as a result of the long-term prediction analysis to calculate the phase values associated with said amplitudes, and
transformation means for transforming the calculated phase values, together with the associated amplitudes, into the time domain.

It should be observed that it is known that analog signals having a strongly consistent nature such as, for example, speech signals can be efficiently coded after sampling by consecutively performing a number of different transformations on consecutive segments of the signal which each have a particular time duration. One of the known transformations for this purpose is linear predictive coding (LPC), for an explanation of which reference can be made to the book entitled "Digital Processing of Speech Signals" by L.R. Rabiner and R.W. Schafer; Prentice Hall, New Jersey; chapter 8. As stated, LPC is always used for signal segments having a particular time duration, in the case of speech signals, for example, 20 ms, and is considered as short-term coding. It is also known to make use not only of a short-term prediction but also a long-term prediction (LTP), a very efficient coding being obtained by a combination of these two techniques. The principle of LTP is described in Frequenz (Frequency), volume 42, no. 2-3, 1988; pages 85-93; P. Vary et al.: "Sprachcodec für das Europäische Funkfernsprechnetz" ("Speech coder/decoder for the European Radiotelephone Network"), while an improved version of the LTP principle is described in the Dutch Patent Application 9001985.
The invention will be explained in greater detail below on the basis of an exemplary embodiment with reference to the drawing, wherein:
Figure 1a shows a block diagram of an exemplary embodiment of a coding unit for the device according to the invention.
Figure 1b shows a block diagram of an exemplary embodiment of a decoding unit for the device according to the invention.
An analog signal delivered by a microphone 1 is limited in bandwidth by a low-pass filter 2 and converted in an analog/digital converter 3 into a series of amplitude and time-discrete samples which are representative of the analog signal. The output signal of the converter 3 is fed to the input of a short-term analysis unit 4 and to the input of a short-term prediction filter 5. These two units cater for the abovementioned short-term prediction (STP) on segments of, for example, 160 samples and the analysis unit 4 provides an output signal in the form of short-term prediction filter coefficients which are quantised, coded and transmitted to the decoder unit shown in Figure 1b. The structure and the function of the filter 5 and the unit 4 are well known to those skilled in the art in the field of speech coding and are of no further importance for the present invention, so that a further explanation can be omitted. The STP-filtered signal is fed to a long-term prediction (LTP) analysis unit 6. In this analysis unit, an LTP analysis is applied twice per segment of 160 samples in a manner such as that described, for example, in Dutch Patent Application 9001985. In such an LTP analysis, for a signal segment to be coded, a search is always made, in accordance with a particular search strategy, for a segment which is as similar as possible in a signal period preceding said segment having a particular duration and a signal is transmitted in coded form which is representative of the number of samples D situated between the starting instant of the segment found and the starting instant of the segment to be coded.
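By way of illustration only, the search for the spacing D can be sketched as an exhaustive search over candidate lags. This is a generic textbook search and not the strategy of Dutch Patent Application 9001985, which is not reproduced here; the function name `find_spacing`, its parameters and the correlation criterion are assumptions made for this sketch.

```python
import math

def find_spacing(signal, start, length, d_min, d_max):
    """Return the lag D whose past segment best predicts signal[start:start+length].

    Assumed criterion: maximise (cross-correlation)^2 / (energy of the past
    segment), which minimises the error of a prediction with an optimal gain.
    Returns None if no lag with positive correlation exists.
    """
    target = signal[start:start + length]
    best_d, best_score = None, -1.0
    for d in range(d_min, min(d_max, start) + 1):
        cand = signal[start - d:start - d + length]
        num = sum(a * b for a, b in zip(target, cand))
        den = sum(b * b for b in cand)
        if den == 0.0 or num <= 0.0:
            continue  # silent or anti-correlated candidate: skip
        score = num * num / den
        if score > best_score:
            best_d, best_score = d, score
    return best_d

# Usage: a sinusoid that repeats every 25 samples
signal = [math.sin(2 * math.pi * i / 25) for i in range(200)]
spacing = find_spacing(signal, start=100, length=40, d_min=20, d_max=40)
```

The lag range and segment length in the usage lines are arbitrary illustration values, not values taken from the embodiment.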
The output signal of the STP filter unit 5 is referred to as the residual signal and, according to the invention, said residual signal is transmitted in coded form in a manner such that only the information which, seen perceptively, is relevant is transmitted. For this purpose, the segments of 160 samples in said residual signal are divided into 8 subsegments of 30 samples in the circuit 7. This is done by first dividing the segment supplied into eight subsegments of 20 samples and then completing these at the leading edge with the ten last samples of the previous subsegment. This implies that the last ten samples of every segment have to be stored in order to also be able to complete the first subsegment of the subsequent segment. Then every subsegment of 30 samples is multiplied in a circuit 8 by a window function such as, for example, a cosine function. The window function is so chosen that, for every sample in the overlapping parts of the subsegments, the sum of the squares of the two multiplication factors is unity. The reason that this has to be the case for the squares is that the multiplication by the window function takes place both in the coding unit and in the decoding unit shown in Figure 1b. A Discrete Fourier Transform (DFT) is performed on the windowed subsegment in a circuit 9, 16 different frequency components being obtained for every subsegment. Of these 16 frequency components, numbered 0 to 15 inclusive, the amplitudes A of the components 1 to 13 inclusive are calculated in a circuit 10. The components 0, 14 and 15 can be ignored because they are situated outside the frequency band of 300 - 3,400 Hz chosen for speech communication. If a greater or a smaller frequency band is relevant, the number of amplitude components taken into consideration can be adjusted accordingly. Starting from the said 13 components, four so-called Bark amplitude components are calculated in a circuit 11. 
These are amplitudes associated with frequencies which are situated equidistantly on a linear Bark scale. The Bark amplitude components B₁ to B₄ inclusive can, for example, be calculated as follows from the DFT amplitudes A₁ to A₁₃ inclusive: $B_{1} = \sqrt{A_{1}^{2} + A_{2}^{2}}$
$B_{2} = \sqrt{A_{3}^{2} + A_{4}^{2} + A_{5}^{2}}$
$B_{3} = \sqrt{A_{6}^{2} + A_{7}^{2} + A_{8}^{2} + A_{9}^{2}}$
$B_{4} = \sqrt{A_{10}^{2} + A_{11}^{2} + A_{12}^{2} + A_{13}^{2}}$
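The combination of the DFT amplitudes into Bark amplitude components can be sketched as follows. This is a minimal illustration under stated assumptions: the names `BARK_GROUPS`, `dft_amplitudes` and `bark_amplitudes` are introduced here, and a direct 30-point DFT of one windowed subsegment is assumed.

```python
import cmath
import math

# Inclusive index ranges of the DFT components forming B_1..B_4 (from the formulae above).
BARK_GROUPS = [(1, 2), (3, 5), (6, 9), (10, 13)]

def dft_amplitudes(subsegment):
    """Amplitudes of the distinct DFT components of a real subsegment.

    For 30 samples this yields the 16 components numbered 0 to 15.
    """
    n = len(subsegment)
    amps = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(subsegment))
        amps.append(abs(acc))
    return amps

def bark_amplitudes(amps):
    """Combine A_1..A_13 into B_1..B_4 as the root of the summed squares per group."""
    return [math.sqrt(sum(amps[k] ** 2 for k in range(lo, hi + 1)))
            for lo, hi in BARK_GROUPS]
```

Components 0, 14 and 15 are computed by `dft_amplitudes` but never read by `bark_amplitudes`, which mirrors their being ignored in the circuit 10.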
If desired, a gain factor G is calculated as a scaling value in circuit 12 from the four Bark amplitude components in accordance with: $G = \sqrt{B_{1}^{2} + B_{2}^{2} + B_{3}^{2} + B_{4}^{2}}$
The application of the scaling value G has the advantage that the scaled amplitudes can be coded more efficiently. The value of G is quantised in a circuit 13 and then transmitted to the decoding unit. If the scale factor G has been calculated, every Bark component is divided by the quantised gain factor Ĝ in a circuit 14. The result of this division is quantised in a circuit 15, coded and then also transmitted to the decoding unit.
If no use is made of a scaling value, the circuits 12, 13 and 14 can be omitted and the four calculated values for the Bark amplitude components can be transmitted directly after quantisation in circuit 15.
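The optional scaling step can be sketched as follows. The quantiser is not specified in this description, so the uniform quantiser `quantise` and its step size are purely hypothetical placeholders, as is the handling of a zero gain; `scale_bark` is likewise a name chosen for this sketch.

```python
import math

def quantise(value, step=0.05):
    """Hypothetical uniform quantiser standing in for the circuits 13 and 15."""
    return round(value / step) * step

def scale_bark(bark, step=0.05):
    """Return the quantised gain G_hat and the Bark components divided by G_hat."""
    gain = math.sqrt(sum(b * b for b in bark))
    gain_hat = quantise(gain, step)
    if gain_hat == 0.0:
        # Assumption: a silent subsegment is sent as all-zero components.
        return 0.0, [0.0] * len(bark)
    return gain_hat, [b / gain_hat for b in bark]
```

Dividing by the quantised value Ĝ rather than by G means that the decoder, which only knows Ĝ, undoes exactly the scaling that was applied in the coder.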
After decoding in a circuit 16 in the decoder unit, the four scaled Bark amplitude components are multiplied in a multiplier 18 by the gain factor Ĝ, decoded in a circuit 17, as a result of which the reconstructed Bark amplitude components B̂₁ to B̂₄ inclusive are obtained. This is of course not applicable if no scaling factor is used in the coding unit. In a circuit 19, the amplitudes in the frequency domain Â₁ to Â₁₃ inclusive (equidistant on the Hz scale) are calculated by means of the following formulae: $\hat{A}_{1} = \hat{A}_{2} = \frac{\hat{B}_{1}}{\sqrt{2}}$
$\hat{A}_{3} = \hat{A}_{4} = \hat{A}_{5} = \frac{\hat{B}_{2}}{\sqrt{3}}$
$\hat{A}_{6} = \hat{A}_{7} = \hat{A}_{8} = \hat{A}_{9} = \frac{\hat{B}_{3}}{2}$
$\hat{A}_{10} = \hat{A}_{11} = \hat{A}_{12} = \hat{A}_{13} = \frac{\hat{B}_{4}}{2}$
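The redistribution performed in the circuit 19 can be sketched as follows; `BARK_GROUPS` and `expand_bark` are names assumed for this illustration. Each reconstructed Bark amplitude is spread evenly over the components of its group, the divisor being the square root of the group size, so that the sum of the squared amplitudes within a group again equals the square of the Bark amplitude.

```python
import math

# Inclusive index ranges of the DFT components belonging to B_1..B_4.
BARK_GROUPS = [(1, 2), (3, 5), (6, 9), (10, 13)]

def expand_bark(bark_hat, n_components=16):
    """Spread each reconstructed Bark amplitude evenly over its DFT components."""
    amps = [0.0] * n_components  # components outside 1..13 remain zero
    for b, (lo, hi) in zip(bark_hat, BARK_GROUPS):
        width = hi - lo + 1      # 2, 3, 4 and 4 components respectively
        for k in range(lo, hi + 1):
            amps[k] = b / math.sqrt(width)
    return amps
```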
In order to be able to transform the 13 frequency components considered in the coder back to the time domain by means of an inverse DFT (IDFT) in the IDFT circuit, the amplitudes and the phases are required.
The phases are determined in the following manner with the aid of the LTP information decoded in a circuit 23 and consisting of the sample spacing D.
The 120 most recent samples of the reconstructed STP residue such as are present at the output of the circuit 22 to be discussed in greater detail below are stored in each case. In a circuit 24, the subsegment is determined which is situated at a spacing of D samples in the past with respect to the present subsegment and this subsegment is multiplied in a circuit 25 by the same window function as was used in the circuit 8 in the coder unit. A DFT is then applied to said subsegment in a circuit 26, after which the phases of the 13 components considered can be calculated in a circuit 27. With the aid of the phases determined in this way and the amplitudes already calculated, an IDFT is performed in the circuit 20, the amplitudes of Â₀, Â₁₄, Â₁₅ and Â₁₆ being set equal to zero.
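The phase determination and the inverse transformation can be sketched as follows. The sketch assumes that the stored residue ends exactly where the present subsegment begins, that D is at least 30, and that the caller multiplies the past subsegment by the window function before calling `phases`; all function names are assumptions made for this illustration.

```python
import cmath
import math

SUB_LEN = 30  # subsegment length used in the exemplary embodiment

def past_subsegment(history, d):
    """Subsegment starting d samples before the present one (30 <= d <= len(history))."""
    if d < SUB_LEN or d > len(history):
        raise ValueError("d must satisfy 30 <= d <= len(history)")
    start = len(history) - d
    return history[start:start + SUB_LEN]

def phases(windowed):
    """Phases of the DFT components 1..13 of an already windowed subsegment."""
    n = len(windowed)
    return {k: cmath.phase(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                               for t, x in enumerate(windowed)))
            for k in range(1, 14)}

def synthesise(amps, phase, n=SUB_LEN):
    """Inverse DFT of a real signal whose only non-zero components are 1..13."""
    return [(2.0 / n) * sum(amps[k] * math.cos(2 * math.pi * k * t / n + phase[k])
                            for k in range(1, 14))
            for t in range(n)]
```

The factor 2/n in `synthesise` accounts for the mirrored negative-frequency components of a real signal, which carry the same amplitudes and the opposite phases.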
At the output of the circuit 20 a reconstruction of the subsegment, 30 samples long, is now available, but this has also been modified by the window function performed in the coder unit. The reconstructed subsegment is therefore multiplied again by the window function in a circuit 21. In the case of the first ten samples of the subsegment now multiplied twice by the window function, the last ten samples, stored for this purpose, of the previous subsegment multiplied twice by the window function are added in a circuit 22. As a result of this, the sum of the multiplication factors in the resultant ten samples is equal to unity.
The last ten samples in this subsegment are stored. The first twenty samples form a portion of the reconstruction of a segment of the STP residue. After eight subsegments have been reconstructed and combined, a completely reconstructed segment of the STP residue is obtained, and this is situated ten samples in the past with respect to the segment on which the STP analysis has been performed in the coding unit.
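The effect of applying the window function twice and then adding the overlapping parts can be checked with a short sketch. The particular window below (a sine-shaped rise over ten samples, unity over ten samples, a cosine-shaped fall over ten samples) is only one possible choice satisfying the sum-of-squares condition and is an assumption, as are the names `window` and `overlap_add`; coding and decoding of the subsegment between the two multiplications is taken to be lossless here.

```python
import math

SUB_LEN, HOP, OVERLAP = 30, 20, 10

def window():
    """One possible window whose squares sum to unity in the overlapping parts."""
    w = []
    for t in range(SUB_LEN):
        if t < OVERLAP:
            w.append(math.sin(0.5 * math.pi * (t + 0.5) / OVERLAP))
        elif t < HOP:
            w.append(1.0)
        else:
            w.append(math.cos(0.5 * math.pi * (t - HOP + 0.5) / OVERLAP))
    return w

def overlap_add(signal):
    """Window every 30-sample subsegment twice and add the overlapping parts.

    out[i] equals signal[i] for i >= OVERLAP, i.e. as soon as a previous
    subsegment is available to complete the first ten samples.
    """
    w = window()
    out = []
    tail = [0.0] * OVERLAP  # stored last ten samples of the previous subsegment
    for start in range(0, len(signal) - SUB_LEN + 1, HOP):
        sub = [signal[start + t] * w[t] * w[t] for t in range(SUB_LEN)]
        out.extend(sub[t] + tail[t] for t in range(OVERLAP))
        out.extend(sub[OVERLAP:HOP])
        tail = sub[HOP:]
    return out
```

A stretch of 170 samples (ten stored samples followed by one segment of 160) yields eight subsegments and 160 output samples.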
An inverse STP filtering is performed on this segment in a filter circuit 28 in a manner known per se with the aid of the STP coefficients received, the filter coefficients from the previous segment being used for the first ten samples.
The output signal of the filter 28 is converted in a digital/analog converter 29 into an analog signal which is fed via a low-pass filter 30 to a loudspeaker 31 which gives a high-fidelity reproduction of the speech signal supplied to the microphone 1, it having been possible to transmit said speech signal in coded form with a low number of bits due to the measures according to the invention.
If desired, a circuit 23' can be included between the circuits 23 and 24 to first subject the value of D received by the decoder additionally to a number of operations in order to obtain an optimum value of D for the reconstruction of the speech signal. These may be three consecutive operations.

1) If the series of values of D received exhibit a trend, the present D received, if it falls outside said trend by a certain margin, is replaced by a value which is in keeping with said trend. Algorithms for determining a trend in a series of consecutive values and for determining a replacement value for a signal which falls outside said trend are well known per se to those skilled in the art.
2) Three intermediate values (I₁, I₂ and I₃) are calculated between two consecutive values of D (D₁ and D₂), possibly adjusted with the aid of such an algorithm, by means of interpolation. This is done, for example, in the following manner: $I_{1} = 0.75 \cdot D_{1} + 0.25 \cdot D_{2}$
$I_{2} = 0.5 \cdot D_{1} + 0.5 \cdot D_{2}$
$I_{3} = 0.25 \cdot D_{1} + 0.75 \cdot D_{2}$
The interpolation is carried out because the spacing D is determined in the coding unit twice per segment. Without interpolation, decoding of four consecutive subsegments would be carried out with the same value of D. If no fundamental regularity is present in the signal in the coding unit, a regularity would consequently wrongly be provided in the decoder during four subsegments. This problem is overcome by the interpolation.
If fundamental regularity is in fact present in the speech signal, the repetition spacing in the signal will in general vary slowly. Due to the interpolation, the variation in the value of D now also has a smooth nature in the decoder.
3) After equalising the values of D by, if necessary, calculating a replacement value and after interpolation, the calculated spacing D corresponds as well as possible with the actual repetition spacing present in the signal. If, however, said spacing D is less than 30, D is multiplied by an integer which is chosen in a manner such that the result is as a minimum equal to 30. This is necessary because all the samples of a subsegment at a spacing of less than 30 with respect to the present segment have not yet been reconstructed, so that they can therefore not be used to calculate the phases.

The reason that spacings D of less than 30 are nevertheless transmitted is that, if the fundamental regularity in the signal encompasses a number of samples less than 30, this prevents the decoded spacing D assuming values which are mutually unequal multiples of the actual repetition spacing. If such unequal multiples were to occur, the equalisation algorithm would have less opportunity of detecting a trend.
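Operations 2) and 3) can be sketched as follows; the function names are assumptions made for this illustration. The description only requires that the integer multiplier make the result at least 30, so taking the smallest such multiple is an assumption, and any rounding of the interpolated values to whole samples is left open because it is not specified here. Operation 1) is not sketched, since the trend algorithm is left to the skilled person.

```python
import math

def interpolate_d(d1, d2):
    """Three intermediate values I_1, I_2, I_3 between consecutive spacings D_1 and D_2."""
    return [0.75 * d1 + 0.25 * d2,
            0.50 * d1 + 0.50 * d2,
            0.25 * d1 + 0.75 * d2]

def usable_spacing(d, minimum=30):
    """Smallest integer multiple of d that is at least `minimum` (assumed choice)."""
    if d <= 0:
        raise ValueError("d must be positive")
    return d * math.ceil(minimum / d)
```

For example, a received spacing of 12 samples would be replaced by 36 for the purpose of selecting the past subsegment, whereas a spacing of 45 is used unchanged.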

Claims

Method for coding a sampled analog signal having a repetitive nature, in which the sampled signal is split into consecutive segments each containing a predetermined number of samples; in which a short-term prediction analysis is performed on said segments and in which the coefficients determined in said short-term prediction analysis are transmitted and are also fed to a short-term prediction filter, in which a long-term prediction analysis is performed on a residual signal available at an output of said filter and the information determined in said long-term prediction analysis is also transmitted, and in which the information present in the residual signal is coded and transmitted, characterised in that the residual signal is transformed into the frequency domain, in that the amplitudes of at least a number of the frequency components obtained by the transformation into the frequency domain are combined to form a smaller number of frequency components in a manner such that the frequencies associated with the combined amplitudes are situated equidistantly on a linear Bark scale, and in that a signal is transmitted which is representative of said combined amplitudes.
Method for decoding a signal coded by the method according to Claim 1, in which the long-term prediction analysis information received and the other information received from the residual signal are combined and the combined signal, together with the short-term prediction analysis coefficients received, is fed to an inverse short-term prediction filter at whose output a series of samples is delivered which is representative of the sampled analog signal, characterised in that original amplitudes in the frequency domain are reconstructed from the received amplitude values which are combined according to claim 1, in that the information transmitted as a result of the long-term prediction analysis is used to calculate the phase values associated with said amplitudes, and in that the calculated phase values, together with the associated amplitudes are transformed to the time domain.
Method according to Claim 1, characterised in that the amplitudes of thirteen frequency components A₁ to A₁₃ inclusive obtained by the transformation into the frequency domain are transformed into amplitudes B₁ to B₄ inclusive of four frequency components situated equidistantly on a Bark scale in accordance with: $B_{1} = \sqrt{A_{1}^{2} + A_{2}^{2}}$
$B_{2} = \sqrt{A_{3}^{2} + A_{4}^{2} + A_{5}^{2}}$
$B_{3} = \sqrt{A_{6}^{2} + A_{7}^{2} + A_{8}^{2} + A_{9}^{2}}$
$B_{4} = \sqrt{A_{10}^{2} + A_{11}^{2} + A_{12}^{2} + A_{13}^{2}}$
and in that these values for B are transmitted after quantisation.
Method according to Claim 3, characterised in that a scaling factor G is calculated for the four frequency components B₁ to B₄ inclusive situated equidistantly on a Bark scale in accordance with: $G = \sqrt{B_{1}^{2} + B_{2}^{2} + B_{3}^{2} + B_{4}^{2}}$
in that this value for G is quantised, and in that the values of B₁ to B₄ inclusive are divided by the quantised scaling factor before they are quantised.
Method according to Claim 2 and 3 or 4, characterised in that combined amplitude values B₁' to B₄' are constructed from the information received, in that amplitude values A₁' to A₁₃' inclusive are obtained therefrom in accordance with: $A'_{1} = A'_{2} = \frac{B'_{1}}{\sqrt{2}}$
$A'_{3} = A'_{4} = A'_{5} = \frac{B'_{2}}{\sqrt{3}}$
$A'_{6} = A'_{7} = A'_{8} = A'_{9} = \frac{B'_{3}}{2}$
$A'_{10} = A'_{11} = A'_{12} = A'_{13} = \frac{B'_{4}}{2}$
and in that the information transmitted as a result of the long-term prediction analysis is representative of the number of samples D which is situated between the starting instant of a group of samples found with the aid of the long-term prediction analysis and transmitted earlier and the starting instant of a group of samples to be decoded.
Method according to Claim 5, characterised in that the group of samples transmitted earlier which is situated at a spacing D with respect to a group of samples to be decoded is transformed to the frequency domain, in that the phase value is determined of at least a number of the frequency components calculated with said transformation, in that said phase values are combined with the amplitude values A₁' to A₁₃' inclusive, and in that these combinations are transformed back into the time domain.
Method according to Claim 5 or 6, characterised in that the variation in the received values of D is equalised according to a predetermined algorithm by, if necessary, calculating a replacement value for a received value of D and in that three intermediate values are calculated for D between two consecutive values of D by means of interpolation.
Method according to Claim 7, characterised in that three intermediate values I₁, I₂ and I₃ are calculated from the known values D₁ and D₂ in accordance with: $I_{1} = 0.75 \cdot D_{1} + 0.25 \cdot D_{2}$
$I_{2} = 0.50 \cdot D_{1} + 0.50 \cdot D_{2}$
$I_{3} = 0.25 \cdot D_{1} + 0.75 \cdot D_{2}$
Apparatus for coding a sampled analog signal having a repetitive nature, comprising
- splitting means for splitting the sampled signal into consecutive segments each containing a predetermined number of samples,

- short-term prediction means for performing a short-term prediction analysis on said segments and for generating coefficients,

- a short-term prediction filter for receiving the coefficients determined in said short-term prediction analysis,

- long-term prediction means for performing a long-term prediction analysis on a residual signal available at an output of said filter,

- coding means for coding the information present in the residual signal, which information is to be transmitted, and

- an output for transmitting the coefficients, the information determined in said long-term prediction analysis and the information present in the residual signal,
characterised in that the apparatus comprises
- transformation means for transforming the residual signal into the frequency domain,

- combination means for combining the amplitudes of at least a number of the frequency components obtained by the transformation into the frequency domain to form a smaller number of frequency components in a manner such that the frequencies associated with the combined amplitudes are situated equidistantly on a linear Bark scale, a signal which is representative of said combined amplitudes being transmittable.
Apparatus for decoding a signal coded by the method according to claim 1, comprising
- an input for receiving long-term prediction analysis information and other information received from the residual signal and short-term prediction analysis coefficients,

- combination means for combining the long-term prediction analysis information and the other information received from the residual signal into a combined signal,

- an inverse short-term prediction filter for receiving the combined signal and the short-term prediction analysis coefficients and for generating at its output a series of samples which is representative of the sampled analog signal,
characterised in that the apparatus comprises
- reconstruction means for reconstructing original amplitudes in the frequency domain from the received amplitude values which are combined according to claim 1,

- calculation means for using the information transmitted as a result of the long-term prediction analysis to calculate the phase values associated with said amplitudes, and

- transformation means for transforming the calculated phase values, together with the associated amplitudes, into the time domain.