CA2053133C - Method for coding and decoding a sampled analog signal having a repetitive nature and a device for coding and decoding by said method - Google Patents
Method for coding and decoding a sampled analog signal having a repetitive nature and a device for coding and decoding by said method
- Publication number
- CA2053133C
- Authority
- CA
- Canada
- Prior art keywords
- amplitudes
- new
- subsegment
- frequencycomponent
- term prediction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/06—Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Reduction Or Emphasis Of Bandwidth Of Signals (AREA)
- Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)
- Electrically Operated Instructional Devices (AREA)
- Analogue/Digital Conversion (AREA)
Abstract
Frequency components are calculated from the STP-filtered speech signal. The amplitudes of these are combined in a manner such that the resultant values are associated with frequencies which are situated equidistantly on a linear Bark scale. Said components are quantised, possibly after scaling. In the decoder, the components are again distributed over the frequency spectrum.
In the coder, the fundamental regularity D is determined with an LTP technique, after which it is transmitted. In the decoder the phases of the reconstructed signal at the spacing D in the past are determined. These phases are combined with the amplitudes already present in the frequency spectrum, after which transformation back to the time domain takes place. Inverse STP filtering is then carried out.
Description
Title: Method for coding and decoding a sampled analog signal having a repetitive nature and a device for coding and decoding by said method.
The invention relates to a method for coding a sampled analog signal having a repetitive nature, in which the sampled signal is split into consecutive segments each containing a predetermined number of samples; in which a short-term prediction analysis is performed on said segments and in which the coefficients determined in said analysis are transmitted and are also fed to a short-term prediction filter, in which a long-term prediction analysis is performed on the residual signal available at the output of said filter and the information determined in said analysis is also transmitted, and in which the information present in the residual signal is coded and transmitted.
The invention also relates to a method for decoding a signal coded in the manner described above, in which the long-term prediction analysis information received and the other information received from the residual signal are combined and the combined signal, together with the short-term prediction analysis coefficients received, is fed to an inverse short-term prediction filter at whose output a series of samples is delivered which forms a reconstruction of the sampled analog signal.
The invention also relates to a device for coding and decoding by the method described above.
It is known that analog signals having a strongly consistent nature such as, for example, speech signals can be efficiently coded after sampling by consecutively performing a number of different transformations on consecutive segments of the signal which each have a particular time duration. One of the known transformations for this purpose is linear predictive coding (LPC), for an explanation of which reference can be made to the book entitled "Digital Processing of Speech Signals" by L.R. Rabiner and R.W. Schafer; Prentice Hall, New Jersey; chapter 8. As stated, LPC is always used for signal segments having a particular time duration, in the case of speech signals, for example, 20 ms, and is considered as short-term coding. It is also known to make use not only of a short-term prediction but also of a long-term prediction (LTP), a very efficient coding being obtained by a combination of these two techniques. The principle of LTP is described in Frequenz (Frequency), volume 42, no. 2-3, 1988; pages 85-93; P. Vary et al.: "Sprachcodec für das Europäische Funkfernsprechnetz" ("Speech codec for the European Radiotelephone Network"), while an improved version of the LTP principle is described in the Dutch Patent Application 9001985.
The object of the invention is to provide a method for very efficiently transmitting, i.e. with a small number of bits/sec, the information relevant to the human ear in the residual signal remaining after applying the STP principle without the quality, experienced by the listener, of the speech reconstructed by the decoder at the receiving side being impaired.
For this purpose, the method for coding according to the invention is characterised in that the residual signal is transformed to the frequency domain, in that the amplitudes of at least a number of the frequency components obtained in transforming to the frequency domain are combined in a manner such that the frequencies associated with the combined amplitudes are situated equidistantly on a linear Bark scale, and in that a signal is transmitted which is representative of said combined amplitudes.
The method for decoding according to the invention is characterised in that the original amplitudes in the frequency domain are reconstructed from the combined amplitude values received, in that the information transmitted as a result of the long-term prediction analysis is used to calculate the phase values associated with said amplitudes, and in that the calculated phase values, together with the associated amplitudes, are transformed to the time domain.
According to the present invention, the residual signal is coded perceptively, which means that only that information is transmitted which is relevant for differences in the decoded received signal which can be detected by the human ear.
In the first place, use is made for this purpose of the known fact that the human ear is not sensitive to absolute phase values, but only to phase relationships, so that it is not necessary in principle to transmit the phase information from the residual signal to be coded, provided only that it is possible to reconstruct the original phase relationships at the receiving end.
In addition, the present invention makes use of the insight known for some time that human hearing functions in fact as a chain consisting of a number of filters having adjacent frequency bands but having different bandwidths, the so-called critical bands or Barks, the bandwidth of such critical bands being much smaller for low frequencies than for high frequencies. A frequency scale formed in accordance with this insight is referred to as a linear Bark scale. For a further explanation of the principle of the Bark scale, reference is made to B. Scharf and S. Buus, "Stimulus, Physiology, Thresholds" in L. Kaufman, K.R. Boff and J.P. Thomas, editors, Handbook of Perception and Human Performance, chapter 14, pages 1-43, Wiley, New York, 1986.
It is also pointed out that the principle of first transforming a residual signal to be transmitted in speech coding to the frequency domain and then transmitting the information available after this transformation has already been put forward earlier. For this purpose reference can be made, for example, to the paper entitled "Fourier Transform Vector Quantisation for Speech Coding" by P. Chang et al. in IEEE Transactions on Communications, Vol. COM 35, No. 10, pages 1059-1068.
According to this publication, however, after the transformation use is made of vector quantisation and there is no mention of transmitting purely amplitude information.
According to a first broad aspect, the invention provides an apparatus for coding an analog signal having a repetitive nature, comprising means for performing a short-term prediction analysis on a quantised sampled analog signal and for providing coefficients determined in the short-term prediction analysis at a first output, a short-term prediction filter for receiving the sampled analog signal and for generating a segmented residual signal, means for dividing the segmented residual signal into subsegments, means for transforming the subsegments from a time domain to a frequency domain and providing several frequency components per subsegment, each frequency component having a frequencycomponent-amplitude, means for calculating a number of new amplitudes by combining the several frequencycomponent-amplitudes, the number of new amplitudes being smaller than the several frequencycomponent-amplitudes, and for providing the new amplitudes at a second output.
According to a second broad aspect, the invention provides an apparatus for decoding a coded signal comprising a first input for receiving coefficients which have been determined in a short-term prediction analysis, a second input for receiving a number of new amplitudes which have been calculated by combining several frequencycomponent-amplitudes, means for calculating several new frequencycomponent-amplitudes on the basis of the number of new amplitudes, the number of new amplitudes being smaller than the several new frequencycomponent-amplitudes, means for inverse transforming the several new frequencycomponent-amplitudes from a frequency domain to a time domain into new subsegments, an inverse short-term prediction filter, having a first filterinput, coupled to the first input, for receiving the coefficients and having a second filterinput, coupled to the means for inverse transforming, for receiving the new subsegments, for generating a series of samples which is representative of a sampled analog signal.
The invention will be explained in greater detail below on the basis of an exemplary embodiment with reference to the drawing, wherein: Figure 1a shows a block diagram of an exemplary embodiment of a coding unit for the device according to the invention.
Figure 1b shows a block diagram of an exemplary embodiment of a decoding unit for the device according to the invention.
An analog signal delivered by a microphone 1 is limited in bandwidth by a low-pass filter 2 and converted in an analog/digital converter 3 into a series of amplitude and time-discrete samples which are representative of the analog signal. The output signal of the converter 3 is fed to the input of a short-term analysis unit 4 and to the input of a short-term prediction filter 5. These two units cater for the abovementioned short-term prediction (STP) on segments of, for example, 160 samples and the analysis unit 4 provides an output signal in the form of short-term prediction filter coefficients which are quantised, coded and transmitted to the decoder unit shown in Figure 1b. The structure and the function of the filter 5 and the unit 4 are well known to those skilled in the art in the field of speech coding and are of no further importance for the essence of the present invention, so that a further explanation can be omitted.
The STP-filtered signal is fed to a long-term prediction (LTP) analysis unit 6. In this analysis unit, an LTP analysis is applied twice per segment of 160 samples in a manner such as that described, for example, in Dutch Patent Application 9001985. In such an LTP analysis, for a signal segment to be coded, a search is always made, in accordance with a particular search strategy, for a segment which is as similar as possible in a signal period preceding said segment having a particular duration and a signal is transmitted in coded form which is representative of the number of samples D situated between the starting instant of the segment found and the starting instant of the segment to be coded.
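The search for the spacing D can be illustrated with a minimal sketch. The source does not specify the search strategy, so the normalised cross-correlation criterion, the function name `find_lag` and the lag limits below are assumptions chosen only for illustration; they are not taken from the patent or from Dutch Patent Application 9001985.

```python
def find_lag(history, target, min_lag, max_lag):
    """Return the spacing D (in samples) of the best-matching past segment.

    `history` holds past residual samples (most recent last) and `target`
    is the segment to be coded, which follows `history` directly.
    Assumption: similarity is measured by normalised cross-correlation.
    """
    n = len(target)
    signal = list(history) + list(target)
    start = len(history)              # index where the target begins
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        begin = start - lag
        if begin < 0:                 # not enough history for this lag
            break
        past = signal[begin:begin + n]
        corr = sum(p * t for p, t in zip(past, target))
        energy = sum(p * p for p in past)
        if corr <= 0 or energy == 0:  # no usable similarity at this lag
            continue
        score = corr * corr / energy  # maximal when past is proportional to target
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

With a residual that repeats every 40 samples, the function returns 40 for a search range of 20 to 120 samples, since the first perfectly matching lag wins ties.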
The output signal of the STP filter unit 5 is referred to as the residual signal and, according to the invention, said residual signal is transmitted in coded form in a manner such that only the information which, seen perceptively, is relevant is transmitted. For this purpose, the segments of 160 samples in said residual signal are divided into 8 subsegments of 30 samples in the circuit 7. This is done by first dividing the segment supplied into eight subsegments of 20 samples and then completing these at the leading edge with the ten last samples of the previous subsegment. This implies that the last ten samples of every segment have to be stored in order to also be able to complete the first subsegment of the subsequent segment. Then every subsegment of 30 samples is multiplied in a circuit 8 by a window function such as, for example, a cosine function. The window function is so chosen that, for every sample in the overlapping parts of the subsegments, the sum of the squares of the two multiplication factors is unity. The reason that this has to be the case for the squares is that the multiplication by the window function takes place both in the coding unit and in the decoding unit shown in Figure 1b. A Discrete Fourier Transform (DFT) is performed on the windowed subsegment in a circuit 9, 16 different frequency components being obtained for every subsegment. Of these 16 frequency components, numbered 0 to 15 inclusive, the amplitudes A of the components 1 to 13 inclusive are calculated in a circuit 10. The components 0, 14 and 15 can be ignored because they are situated outside the frequency band of 300-3,400 Hz chosen for speech communication. If a greater or a smaller frequency band is relevant, the number of amplitude components taken into consideration can be adjusted accordingly. Starting from the said 13 components, four so-called Bark amplitude components are calculated in a circuit 11. These are amplitudes associated with frequencies which are situated equidistantly on a linear Bark scale. The Bark amplitude components B1 to B4 inclusive can, for example, be calculated as follows from the DFT amplitudes A1 to A13 inclusive:
$$B_1 = \sqrt{A_1^2 + A_2^2}$$
$$B_2 = \sqrt{A_3^2 + A_4^2 + A_5^2}$$
$$B_3 = \sqrt{A_6^2 + A_7^2 + A_8^2 + A_9^2}$$
$$B_4 = \sqrt{A_{10}^2 + A_{11}^2 + A_{12}^2 + A_{13}^2}$$
If desired, a gain factor G is calculated as a scaling value in circuit 12 from the four Bark amplitude components in accordance with:
$$G = \sqrt{B_1^2 + B_2^2 + B_3^2 + B_4^2}$$
The application of the scaling value G has the advantage that the scaled amplitudes can be coded more efficiently.
The value of G is quantised in a circuit 13 and then transmitted to the decoding unit. If the scale factor G has been calculated, every Bark component is divided by the quantised gain factor G in a circuit 14. The result of this division is quantised in a circuit 15, coded and then also transmitted to the decoding unit.
If no use is made of a scaling value, the circuits 12, 13 and 14 can be omitted and the four calculated values for the Bark amplitude components can be transmitted directly after quantisation in circuit 15.
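The combination of the 13 DFT amplitudes into four Bark amplitude components and the optional scaling by G can be sketched as follows. This is an illustrative sketch only: it assumes that each Bark amplitude is the square root of the summed squared DFT amplitudes in the groups 1-2, 3-5, 6-9 and 10-13, and that G is formed the same way from B1 to B4. Quantisation is left out, and the names `bark_amplitudes`, `gain` and `scale` are hypothetical.

```python
import math

# Inclusive ranges of DFT components combined into B1..B4 (assumed grouping).
BARK_GROUPS = [(1, 2), (3, 5), (6, 9), (10, 13)]


def bark_amplitudes(amps):
    """Combine DFT amplitudes amps[1..13] into four Bark amplitudes.

    amps[k] is the amplitude A_k; amps[0] is present but not used.
    """
    return [
        math.sqrt(sum(amps[k] ** 2 for k in range(lo, hi + 1)))
        for lo, hi in BARK_GROUPS
    ]


def gain(barks):
    """Scaling value G computed from the four Bark amplitudes."""
    return math.sqrt(sum(b * b for b in barks))


def scale(barks, g):
    """Divide every Bark amplitude by G; leave them unchanged if G is zero."""
    if g == 0:
        return list(barks)
    return [b / g for b in barks]
```

For a flat spectrum with A1 to A13 all equal to 1, this gives Bark amplitudes of √2, √3, 2 and 2, which shows how the wider groups at higher frequencies collect more components per transmitted value.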
After decoding in a circuit 16 in the decoder unit, the four scaled Bark amplitude components are multiplied in a multiplier 18 by the gain factor G, decoded in a circuit 17, as a result of which the reconstructed Bark amplitude components B1 to B4 inclusive are obtained.
This is of course not applicable if no scaling factor is used in the coding unit. In a circuit 19, the amplitudes in the frequency domain A1 to A13 inclusive (equidistant on the Hz scale) are calculated by means of the following formulae:
$$A_1 = A_2 = \frac{B_1}{\sqrt{2}}$$
$$A_3 = A_4 = A_5 = \frac{B_2}{\sqrt{3}}$$
$$A_6 = A_7 = A_8 = A_9 = \frac{B_3}{2}$$
$$A_{10} = A_{11} = A_{12} = A_{13} = \frac{B_4}{2}$$
In order to be able to transform the 13 frequency components considered in the coder back to the time domain by means of an inverse DFT (IDFT) in the IDFT circuit, the amplitudes and the phases are required.
The phases are determined in the following manner with the aid of the LTP information decoded in a circuit 23 and consisting of the sample spacing D.
The 120 most recent samples of the reconstructed STP residue such as are present at the output of the circuit 22 to be discussed in greater detail below are stored in each case. In a circuit 24, the subsegment is determined which is situated at a spacing of D samples in the past with respect to the present subsegment and this subsegment is multiplied in a circuit 25 by the same window function as was used in the circuit 8 in the coder unit. A DFT is then applied to said subsegment in a circuit 26, after which the phases of the 13 components considered can be calculated in a circuit 27. With the aid of the phases determined in this way and the amplitudes already calculated, an IDFT is performed in the circuit 20, the amplitudes of A0, A14, A15 and A16 being set equal to zero.
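The decoder steps described above (spreading the Bark amplitudes over the 13 DFT components, borrowing the phases from the windowed subsegment D samples in the past, and transforming back) can be sketched as below. Two points are assumptions rather than statements of the source: the transform is taken to be a 32-point DFT of the zero-padded 30-sample subsegment, and each Bark amplitude is divided by the square root of the number of components in its group. The function names are hypothetical.

```python
import cmath
import math

N_DFT = 32  # assumption: each 30-sample subsegment is zero-padded to 32 points
GROUPS = [(1, 2), (3, 5), (6, 9), (10, 13)]  # DFT components behind B1..B4


def expand_bark(barks):
    """Spread each Bark amplitude evenly over the DFT components of its group."""
    amps = [0.0] * (N_DFT // 2 + 1)  # components 0, 14, 15 and 16 stay zero
    for bark, (lo, hi) in zip(barks, GROUPS):
        share = bark / math.sqrt(hi - lo + 1)
        for k in range(lo, hi + 1):
            amps[k] = share
    return amps


def dft_half(samples):
    """Components 0..16 of the zero-padded N_DFT-point DFT of a real signal."""
    x = list(samples) + [0.0] * (N_DFT - len(samples))
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / N_DFT) for n in range(N_DFT))
        for k in range(N_DFT // 2 + 1)
    ]


def synthesize(amps, past_windowed, length=30):
    """Inverse DFT using the given amplitudes and the phases of a past subsegment."""
    phases = [cmath.phase(c) for c in dft_half(past_windowed)]
    out = []
    for n in range(length):
        acc = 0.0
        for k in range(1, 14):  # only components 1..13 carry energy
            angle = 2.0 * math.pi * k * n / N_DFT + phases[k]
            acc += 2.0 * amps[k] * math.cos(angle)
        out.append(acc / N_DFT)
    return out
```

Calling `synthesize(expand_bark(barks), past)` yields a 30-sample subsegment whose spectrum carries the transmitted amplitudes and the phases of `past`, where `past` stands for the windowed subsegment taken D samples back in the stored reconstructed residue.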
At the output of the circuit 20 a reconstruction of the subsegment, 30 samples long, is now available, but this has also been modified by the window function performed in the coder unit. The reconstructed subsegment is therefore multiplied again by the window function in a circuit 21. In the case of the first ten samples of the subsegment now multiplied twice by the window function, the last ten samples, stored for this purpose, of the previous subsegment multiplied twice by the window function are added in a circuit 22. As a result of this, the sum of the multiplication factors in the resultant ten samples is equal to unity.
The last ten samples in this subsegment are stored. The first twenty samples form a portion of the reconstruction of a segment of the STP residue. After eight subsegments have been reconstructed and combined, a completely reconstructed segment of the STP residue is obtained, and this is situated ten samples in the past with respect to the segment on which the STP analysis has been performed in the coding unit.
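The role of the window can be checked with a short sketch. It assumes a window that rises over the first ten samples, is flat over the middle ten and falls over the last ten, with a sine-shaped slope; the source only says "for example, a cosine function", so the exact shape and the names `make_window` and `overlap_add` are illustrative assumptions. Applying the window twice (once in the coder, once in the decoder) and adding the overlapping ten samples restores the original values when nothing else is changed in between.

```python
import math

SUB = 30             # subsegment length in samples
HOP = 20             # new samples per subsegment
OVERLAP = SUB - HOP  # samples shared with the previous subsegment


def make_window():
    """Window whose squared slopes sum to one across the overlap (assumed shape)."""
    w = [1.0] * SUB
    for n in range(OVERLAP):
        slope = math.sin(0.5 * math.pi * (n + 0.5) / OVERLAP)
        w[n] = slope            # rising edge
        w[SUB - 1 - n] = slope  # falling edge, mirror image of the rising edge
    return w


def overlap_add(frames):
    """Add 30-sample frames at a spacing of 20 samples."""
    out = [0.0] * (HOP * len(frames) + OVERLAP)
    for i, frame in enumerate(frames):
        for n, value in enumerate(frame):
            out[i * HOP + n] += value
    return out
```

In this sketch the very first ten output samples receive only one windowed contribution, which corresponds to the stored samples of the previous segment that are not modelled here.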
An inverse STP filtering is performed on this segment in a filter circuit 28 in a manner known per se with the aid of the STP coefficients received, the filter coefficients from the previous segment being used for the first ten samples.
The output signal of the filter 28 is converted in a digital/analog converter 29 into an analog signal which is fed via a low-pass filter 30 to a loudspeaker 31 which gives a high-fidelity reproduction of the speech signal supplied to the microphone 1, it having been possible to transmit said speech signal in coded form with a low number of bits due to the measures according to the invention.
If desired, a circuit 23' can be included between the circuits 23 and 24 to first subject the value of D received by the decoder additionally to a number of operations in order to obtain an optimum value of D for the reconstruction of the speech signal. These may be three consecutive operations.
1) If the series of values of D received exhibit a trend, the present D received, if it falls outside said trend by a certain margin, is replaced by a value which is in keeping with said trend. Algorithms for determining a trend in a series of consecutive values and for determining a replacement value for a signal which falls outside said trend are well known per se to those skilled in the art.
2) Three intermediate values (I1, I2 and I3) are calculated between two consecutive values of D (D1 and D2), possibly adjusted with the aid of such an algorithm, by means of interpolation. This is done, for example, in the following manner:
$$I_1 = 0.75 \cdot D_1 + 0.25 \cdot D_2$$
$$I_2 = 0.5 \cdot D_1 + 0.5 \cdot D_2$$
$$I_3 = 0.25 \cdot D_1 + 0.75 \cdot D_2$$
The interpolation is carried out because the spacing D is determined in the coding unit twice per segment.
Without interpolation, decoding of four consecutive subsegments would be carried out with the same value of D. If no fundamental regularity is present in the signal in the coding unit, a regularity would consequently wrongly be provided in the decoder during four subsegments. This problem is overcome by the interpolation.
If fundamental regularity is in fact present in the speech signal, the repetition spacing in the signal will in general vary slowly. Due to the interpolation, the variation in the value of D now also has a smooth nature in the decoder.
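The interpolation of the spacing D can be written out as a small sketch. The weights are those given in the description; the function name `interpolate_lags` is hypothetical, and the results are left unrounded because the source does not say how fractional values are handled.

```python
def interpolate_lags(d1, d2):
    """Return the three intermediate values I1, I2, I3 between D1 and D2."""
    weights = (0.75, 0.5, 0.25)  # weight applied to D1; D2 gets the remainder
    return [w * d1 + (1.0 - w) * d2 for w in weights]
```

For two consecutive received values D1 = 40 and D2 = 48, the subsegments in between would use 42, 44 and 46, so the spacing changes gradually instead of jumping after four subsegments.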
The invention relates to a method for coding a sampled analog signal having a repetitive nature, in which the sampled signal is split into consecutive segments each containing a predetermined number of samples; in which a short-term prediction analysis is performed on said segments and in which the coefficients determined in said analysis are transmitted and are also fed to a short-term prediction filter, in which a long-term prediction analysis is performed on the residual signal available at the output of said filter and the information determined in said analysis is also transmitted, and in which the information present in the residual signal is coded and transmitted.
The invention also relates to a method for decoding a signal coded in the manner described above, in which the long-term prediction analysis information received and the other information received from the residual signal are combined and the combined signal, together with the short-term prediction analysis coefficients received, is fed to an inverse short-term prediction filter at whose output a series of samples is delivered which forms a reconstruction of the sampled analog signal.
The invention also relates to a device for coding and decoding by the method described above.
It is known that analog signals having a strongly consistent nature such as, for example, speech signals can be efficiently coded after sampling by consecutively performing a number of different transformations on consecutive segments of the signal which each have a particular time duration. One of the known transformations for this purpose is linear predictive coding (LPC), for an explanation of which reference can be made to the book entitled "Digital Processing of Speech Signals" by L.R. Rabiner and R.W. Schafer;
Prentice Hall, New Jersey; chapter 8. As stated, LPC is always used for signal segments having a particular time duration, in the case of speech signals, for example, 20 ms, and is considered as short-term coding. It is also known to make use not only of a short-term prediction but also a long-term prediction (LTP) in which a very efficient coding is obtained by a combination of these two techniques. The principle of LTP is described in Frequenz, (Frequency), volume 42, no. 2-3, 1988; pages 85-93; P. Vary et al.: "Sprachcodec fur dass Europaische Funkfernsprechnetz" ("Speech coder/decoder for that European Radiotelephone Network"), while an improved version of the LTP principle is described in the Dutch Patent Application 9001985.
The object of the invention is to provide a method for very efficiently transmitting, i.e. with a small number of bits/sec, the information relevant to the human ear in the residual signal remaining after applying the STP
principle without the quality, experienced by the listener, of the speech reconstructed by the decoder at the receiving side being impaired.
For this purpose, the method for coding according to the invention is characterised in that the residual signal is transformed to the frequency domain, in that the amplitudes of at least a number of the frequency components obtained in transforming to the frequency domain are combined in a manner such that the frequencies associated with the combined amplitudes are situated equidistantly on a linear Bark scale, and in that a signal is transmitted which is representative of said combined amplitudes.
The method for decoding according to the invention is characterised in that the original amplitudes in the frequency domain are reconstructed from the combined amplitude values received, in that the information transmitted as a result of the long-term prediction analysis is used to calculate the phase values associated with said amplitudes, and in that the calculated phase values, together with the associated amplitudes, are transformed to the time domain.
According to the present invention, the residual signal is coded perceptively, which means that only that information is transmitted which is relevant for differences in the decoded received signal which can be detected by the human ear.
In the first place, use is made for this purpose of the known fact that the human ear is not sensitive to absolute phase values, but only to phase relationships, so that it is not necessary in principle to transmit the phase information from the residual signal to be coded, provided only that it is possible to reconstruct the original phase relationships at the receiving end.
In addition, the present invention makes use of the insight known for some time that human hearing functions in fact as a chain consisting of a number of filters having adjacent frequency bands but having different bandwidths, the so-called critical bands or Barks, the bandwidth of such critical bands being much smaller for low frequencies than for high frequencies. A frequency scale formed in accordance with this insight is referred to as a linear Bark scale. For a further explanation of the principle of the Bark scale, reference is made to B.
Scharf and S. Buus, "Stimulus, Physiology, Thresholds"
in L. Kaufman, K.R. Boff and J. P. Thomas, editors, Handbook of Perception and Human Performance, chapter 14, pages 1-43, Wiley, New York, 1986.
It is also pointed out that the principle of first transforming a residual signal to be transmitted in speech coding to the frequency domain and then transmitting the information available after this transformation has already been put forward earlier. For this purpose reference can be made, for example, to the paper entitled "Fourier Transform Vector Quantisation for Speech Coding" by P. Chang et al. in IEEE Transactions on Communications, Vol. COM 35, No. 10, pages 1059-1068.
According to this publication, however, after the transformation use is made of vector quantisation and there is no mention of transmitting purely amplitude information.
According to a first broad aspect, the invention provides an apparatus for coding an analog signal having a repetitive nature, comprising means for performing a short-term prediction analysis on a quantised sampled analog signal and for providing coefficients determined in the short-term prediction analysis at a first output, a short-term prediction filter for receiving the sampled analog signal and for generating a segmented residual signal, means for dividing the segmented residual signal into subsegments, means for transforming the subsegments from a time domain to a frequency domain and providing several frequency components per subsegment, each frequency component having a frequencycomponent-amplitude, means for calculating a number of new amplitudes by combining the several frequencycomponent-amplitudes, the number of new amplitudes being smaller than the several frequencycomponent-amplitudes, and for providing the new amplitudes at a second output.
According to a second broad aspect, the invention provides an apparatus for decoding a coded signal comprising a first input for receiving coefficients which have been determined in a short-term prediction analysis, a second input for receiving a number of new amplitudes which have been calculated by combining several frequencycomponent-amplitudes, means for calculating several new frequency-component-amplitudes at the hand of the number of new 3~ - 4 -~' amplitudes, the number of new amplitudes being smaller than the several new frequencycomponent-amplitudes, means for inverse transforming the several new frequencycomponent-amplitudes from a frequency domain to a time domain into new subsegments, an inverse short-term prediction filter, having a first filterinput, coupled to the first input, for receiving the coefficients and having a second filterinput, coupled to the means for inverse transforming, for receiving the new subsegments, for generating a series of samples which is representative for a sampled analog signal.
The invention will be explained in greater detail below on the basis of an exemplary embodiment with reference to the drawing, wherein: Figure la shows a block diagram of an exemplary embodiment of a coding unit for the device according to the invention.
Figure lb shows a block diagram of an exemplary embodiment of a decoding unit for the device according to the nvent lon .
An analog signal delivered by a microphone 1 is limited in bandwidth by a low-pass filter 2 and converted in an analog/digital converter 3 into a series of amplitude and time-discrete samples which are representative of the analog signal. The output signal of the converter 3 is fed to the input of a short-term analysis unit 4 and to the input of a short-term prediction filter 5. These two units cater for the abovementioned, short-term prediction (STP) on segments of, for example, 160 samples and the analysis unit 4 provides an output signal in the form of short-term prediction filter - 4a -2053 i 33 coefficients which are quantised, coded and transmitted to the decoder unit shown in Figure lb. The structure and the function of the filter 5 and the unit 4 are well known to those skilled in the art in the field of speech coding and are of no further importance for the essence of the present invention, so that a further explanation can be omitted.
The STP-filtered signal is fed to a long-term prediction (LTP) analysis unit 6. In this analysis unit, an LTP analysis is applied twice per segment of 160 samples in a manner such as that described, for example, in Dutch Patent Application 9001985. In such an LTP analysis, for a signal segment to be coded, a search is always made, in accordance with a particular search strategy, for a segment which is as similar as possible in a signal period preceding said segment having a particular duration and a signal is transmitted in coded form which is representative of the number of samples D situated between the starting instant of the segment found and the starting instant of the segment to be coded.
The output signal of the STP filter unit 5 is referred to as the residual signal and, according to the invention, said residual signal is transmitted in coded form in a manner such that only the information which, seen perceptively, is relevant is transmitted. For this purpose, the segments of 160 samples in said residual signal are divided into 8 subsegments of 30 samples in the circuit 7. This is done by first dividing the segment supplied into eight subsegments of 20 samples and then completing these at the leading edge with the ten last samples of the previous subsegment. This implies that the last ten samples of every segment have to be stored in order to also be able to complete the first subsegment of the subsequent segment. Then every subsegment of 30 samples is multiplied in a circuit 8 by a window function such as, for example, a cosine function. The window function is so chosen that, for every sample in the overlapping parts of the subsegments, the sum of the squares of the two multiplication factors is unity. The reason that this has to be the case for the squares is that the multiplication by the window function takes place both in the coding unit and in the decoding unit shown in Figure 1b. A Discrete Fourier Transform (DFT) is performed on the windowed subsegment in a circuit 9, 16 different frequency components being obtained for every subsegment. Of these 16 frequency components, numbered 0 to 15 inclusive, the amplitudes A of the components 1 to 13 inclusive are calculated in a circuit 10. The components 0, 14 and 15 can be ignored because they are situated outside the frequency band of 300-3,400 Hz chosen for speech communication. If a greater or a smaller frequency band is relevant, the number of amplitude components taken into consideration can be adjusted accordingly. Starting from the said 13 components, four so-called Bark amplitude components are calculated in a circuit 11. These are amplitudes associated with frequencies which are situated equidistantly on a linear Bark scale. The Bark amplitude components B1 to B4 inclusive can, for example, be calculated as follows from the DFT amplitudes A1 to A13 inclusive:
$$
\begin{aligned}
B_1 &= \sqrt{A_1^2 + A_2^2} \\
B_2 &= \sqrt{A_3^2 + A_4^2 + A_5^2} \\
B_3 &= \sqrt{A_6^2 + A_7^2 + A_8^2 + A_9^2} \\
B_4 &= \sqrt{A_{10}^2 + A_{11}^2 + A_{12}^2 + A_{13}^2}
\end{aligned}
$$

If desired, a gain factor G is calculated as a scaling value in circuit 12 from the four Bark amplitude components in accordance with:

$$
G = \sqrt{B_1^2 + B_2^2 + B_3^2 + B_4^2}
$$

The application of the scaling value G has the advantage that the scaled amplitudes can be coded more efficiently.
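The grouping of the thirteen DFT amplitudes into four Bark amplitude components and the derivation of the gain factor can be sketched as follows. This is a minimal illustration, assuming that each Bark amplitude is the square root of the summed squares of its group and that G is the square root of the summed squares of B1 to B4; the function names `bark_amplitudes` and `gain_factor` are hypothetical and do not appear in the patent.

```python
import math

# Group boundaries (0-based slices into A1..A13): A1-A2, A3-A5, A6-A9, A10-A13.
GROUPS = ((0, 2), (2, 5), (5, 9), (9, 13))


def bark_amplitudes(a):
    """Combine 13 DFT amplitudes A1..A13 (a[0] is A1) into B1..B4."""
    if len(a) != 13:
        raise ValueError("expected 13 DFT amplitudes")
    # Each Bark amplitude is the root of the summed squares of its group.
    return [math.sqrt(sum(x * x for x in a[lo:hi])) for lo, hi in GROUPS]


def gain_factor(b):
    """Scaling value G computed from the four Bark amplitudes."""
    return math.sqrt(sum(x * x for x in b))


b = bark_amplitudes([1.0] * 13)
g = gain_factor(b)
```

With all thirteen amplitudes equal to one, the groups of two, three, four and four components give Bark amplitudes of √2, √3, 2 and 2, and G equals √13, the total amplitude of the thirteen components.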
The value of G is quantised in a circuit 13 and then transmitted to the decoding unit. If the scale factor G has been calculated, every Bark component is divided by the quantised gain factor G in a circuit 14. The result of this division is quantised in a circuit 15, coded and then also transmitted to the decoding unit.
If no use is made of a scaling value, the circuits 12, 13 and 14 can be omitted and the four calculated values for the Bark amplitude components can be transmitted directly after quantisation in circuit 15.
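The front end of the coding unit described above (circuits 7 to 10) can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: the sine/cosine window shape is only one choice that satisfies the "sum of squares is unity" condition, the DFT is written out directly for the 30-sample subsegment, and the names `make_window`, `split_segment` and `dft_amplitudes` are hypothetical.

```python
import cmath
import math

SEGMENT = 160             # samples per segment
STEP = 20                 # new samples contributed by each subsegment
OVERLAP = 10              # samples repeated from the previous subsegment
SUBLEN = STEP + OVERLAP   # 30 samples per subsegment


def make_window():
    """Assumed window: sine rise, flat middle, cosine fall (squares sum to 1 in the overlap)."""
    rise = [math.sin(math.pi / 2 * (n + 0.5) / OVERLAP) for n in range(OVERLAP)]
    flat = [1.0] * (SUBLEN - 2 * OVERLAP)
    fall = [math.cos(math.pi / 2 * (n + 0.5) / OVERLAP) for n in range(OVERLAP)]
    return rise + flat + fall


def split_segment(segment, tail):
    """Return eight 30-sample subsegments plus the ten samples to keep for the next segment."""
    extended = list(tail) + list(segment)
    subs = [extended[i * STEP:i * STEP + SUBLEN] for i in range(SEGMENT // STEP)]
    return subs, list(segment[-OVERLAP:])


def dft_amplitudes(subsegment, window):
    """Amplitudes of DFT components 1..13 of the windowed subsegment."""
    n = len(subsegment)
    x = [s * w for s, w in zip(subsegment, window)]
    amps = []
    for k in range(1, 14):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        amps.append(abs(coeff))
    return amps
```

The ten-sample tail returned by `split_segment` is what has to be stored between segments so that the first subsegment of the next segment can be completed at its leading edge.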
After decoding in a circuit 16 in the decoder unit, the four scaled Bark amplitude components are multiplied in a multiplier 18 by the gain factor G, decoded in a circuit 17, as a result of which the reconstructed Bark amplitude components B1 to B4 inclusive are obtained.
This is of course not applicable if no scaling factor is used in the coding unit. In a circuit 19, the amplitudes in the frequency domain A1 to A13 inclusive (equidistant on the Hz scale) are calculated by means of the following formulae:

$$
\begin{aligned}
A_1 = A_2 &= \frac{B_1}{\sqrt{2}} \\
A_3 = A_4 = A_5 &= \frac{B_2}{\sqrt{3}} \\
A_6 = A_7 = A_8 = A_9 &= \frac{B_3}{2} \\
A_{10} = A_{11} = A_{12} = A_{13} &= \frac{B_4}{2}
\end{aligned}
$$

In order to be able to transform the 13 frequency components considered in the coder back to the time domain by means of an inverse DFT (IDFT) in the IDFT circuit, the amplitudes and the phases are required.
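The expansion of four Bark amplitude components back into thirteen DFT amplitudes in circuit 19 can be sketched as follows. The sketch assumes that each Bark amplitude is spread evenly over its group, so that every member receives the Bark amplitude divided by the square root of the group size (√2, √3, 2 and 2); the name `expand_bark` is hypothetical.

```python
import math

# Number of DFT components represented by B1, B2, B3 and B4 respectively.
GROUP_SIZES = (2, 3, 4, 4)


def expand_bark(b):
    """Spread B1..B4 evenly over A1..A13 so that the energy of each group is preserved."""
    if len(b) != len(GROUP_SIZES):
        raise ValueError("expected four Bark amplitudes")
    a = []
    for value, size in zip(b, GROUP_SIZES):
        a.extend([value / math.sqrt(size)] * size)
    return a


a = expand_bark([math.sqrt(2), math.sqrt(3), 2.0, 2.0])
```

Under this assumption the summed squares of the expanded amplitudes in each group equal the square of the corresponding Bark amplitude, so only the distribution of amplitude within a group is lost, not its total.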
The phases are determined in the following manner with the aid of the LTP information decoded in a circuit 23 and consisting of the sample spacing D.
The 120 most recent samples of the reconstructed STP residue such as are present at the output of the circuit 22 to be discussed in greater detail below are stored in each case. In a circuit 24, the subsegment is determined which is situated at a spacing of D samples in the past with respect to the present subsegment and this subsegment is multiplied in a circuit 25 by the same window function as was used in the circuit 8 in the coder unit. A DFT is then applied to said subsegment in a circuit 26, after which the phases of the 13 components considered can be calculated in a circuit 27. With the aid of the phases determined in this way and the amplitudes already calculated, an IDFT is performed in the circuit 20, the amplitudes of A0, A14, A15 and A16 being set equal to zero.
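The phase determination and the subsequent IDFT can be sketched as follows. Several details are assumptions made for illustration: the history buffer is taken to end immediately before the first sample of the present subsegment, the IDFT is written as a sum of cosines over components 1 to 13 of a 30-point transform (all other components being zero), and the names `past_subsegment`, `phases_from_history` and `synthesise` are hypothetical.

```python
import cmath
import math

N = 30                # subsegment length
BINS = range(1, 14)   # DFT components 1..13


def past_subsegment(history, d):
    """Subsegment starting d samples before the present one (assumed buffer layout)."""
    if not N <= d <= len(history):
        raise ValueError("spacing must lie between 30 and the history length")
    start = len(history) - d
    return history[start:start + N]


def phases_from_history(history, d, window):
    """Phases of components 1..13 of the windowed past subsegment."""
    x = [s * w for s, w in zip(past_subsegment(history, d), window)]
    phases = []
    for k in BINS:
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / N) for t in range(N))
        phases.append(cmath.phase(coeff))
    return phases


def synthesise(amplitudes, phases):
    """Real IDFT from 13 amplitudes and phases; components 0, 14 and 15 are zero."""
    return [
        (2.0 / N) * sum(a * math.cos(2 * math.pi * k * t / N + p)
                        for k, a, p in zip(BINS, amplitudes, phases))
        for t in range(N)
    ]
```

The lower bound of 30 on the spacing reflects the fact that a subsegment closer than 30 samples would contain samples that have not yet been reconstructed.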
At the output of the circuit 20 a reconstruction of the subsegment, 30 samples long, is now available, but this has also been modified by the window function performed in the coder unit. The reconstructed subsegment is therefore multiplied again by the window function in a circuit 21. In the case of the first ten samples of the subsegment now multiplied twice by the window function, the last ten samples, stored for this purpose, of the previous subsegment multiplied twice by the window function are added in a circuit 22. As a result of this, the sum of the multiplication factors in the resultant ten samples is equal to unity.
The last ten samples in this subsegment are stored. The first twenty samples form a portion of the reconstruction of a segment of the STP residue. After eight subsegments have been reconstructed and combined, a completely reconstructed segment of the STP residue is obtained, and this is situated ten samples in the past with respect to the segment on which the STP analysis has been performed in the coding unit.
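The second windowing and the overlap-add in circuits 21 and 22 can be sketched as follows. The sine/cosine window is an assumed example of a window whose squared values sum to unity in the overlap, and the names `make_window` and `overlap_add` are hypothetical. The input subsegments are taken to have been windowed once already, as happens in the coding unit.

```python
import math

STEP, OVERLAP = 20, 10
SUBLEN = STEP + OVERLAP


def make_window():
    """Assumed window: sine rise, flat middle, cosine fall."""
    rise = [math.sin(math.pi / 2 * (n + 0.5) / OVERLAP) for n in range(OVERLAP)]
    fall = [math.cos(math.pi / 2 * (n + 0.5) / OVERLAP) for n in range(OVERLAP)]
    return rise + [1.0] * (SUBLEN - 2 * OVERLAP) + fall


def overlap_add(subsegments, window, tail=None):
    """Window each subsegment a second time and add the stored tail to its first ten samples."""
    tail = list(tail) if tail is not None else [0.0] * OVERLAP
    out = []
    for sub in subsegments:
        twice = [s * w for s, w in zip(sub, window)]
        head = [twice[n] + tail[n] for n in range(OVERLAP)]
        out.extend(head + twice[OVERLAP:STEP])   # 20 finished samples
        tail = twice[STEP:]                      # last ten are stored
    return out, tail


# Round trip on a test signal: window once, then reconstruct.
w = make_window()
x = [math.sin(0.3 * n) for n in range(90)]
once = [[x[20 * i + n] * w[n] for n in range(SUBLEN)] for i in range(4)]
y, _ = overlap_add(once, w)
```

Apart from the first ten samples, for which no earlier tail exists in this toy example, `y` reproduces `x`, because each overlapping sample is weighted by the two squared window values, which sum to one.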
An inverse STP filtering is performed on this segment in a filter circuit 28 in a manner known per se with the aid of the STP coefficients received, the filter coefficients from the previous segment being used for the first ten samples.
The output signal of the filter 28 is converted in a digital/analog converter 29 into an analog signal which is fed via a low-pass filter 30 to a loudspeaker 31 which gives a high-fidelity reproduction of the speech signal supplied to the microphone 1, it having been possible to transmit said speech signal in coded form with a low number of bits due to the measures according to the invention.
If desired, a circuit 23' can be included between the circuits 23 and 24 to first subject the value of D received by the decoder additionally to a number of operations in order to obtain an optimum value of D for the reconstruction of the speech signal. These may be three consecutive operations.
1) If the series of values of D received exhibit a trend, the present D received, if it falls outside said trend by a certain margin, is replaced by a value which is in keeping with said trend. Algorithms for determining a trend in a series of consecutive values and for determining a replacement value for a signal which falls outside said trend are well known per se to those skilled in the art.
2) Three intermediate values (I1, I2 and I3) are calculated between two consecutive values of D (D1 and D2), possibly adjusted with the aid of such an algorithm, by means of interpolation. This is done, for example, in the following manner:
I1 = 0.75 * D1 + 0.25 * D2
I2 = 0.50 * D1 + 0.50 * D2
I3 = 0.25 * D1 + 0.75 * D2
The interpolation is carried out because the spacing D is determined in the coding unit twice per segment.
Without interpolation, decoding of four consecutive subsegments would be carried out with the same value of D. If no fundamental regularity is present in the signal in the coding unit, a regularity would consequently wrongly be provided in the decoder during four subsegments. This problem is overcome by the interpolation.
If fundamental regularity is in fact present in the speech signal, the repetition spacing in the signal will in general vary slowly. Due to the interpolation, the variation in the value of D now also has a smooth nature in the decoder.
3) After equalising the values of D by, if necessary, calculating a replacement value and after interpolation, the calculated spacing D corresponds as well as possible with the actual repetition spacing present in the signal.
If, however, said spacing D is less than 30, D is multiplied by an integer which is chosen in a manner such that the result is as a minimum equal to 30. This is necessary because all the samples of a subsegment at a spacing of less than 30 with respect to the present segment have not yet been reconstructed, so that they can therefore not be used to calculate the phases.
The reason that spacings D of less than 30 are nevertheless transmitted is that, if the fundamental regularity in the signal encompasses a number of samples less than 30, this prevents the decoded spacing D from assuming values which are mutually unequal multiples of the actual repetition spacing. If such unequal multiples were to occur, the equalisation algorithm would have less opportunity of detecting a trend.
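The second and third operations on D can be sketched as follows. The trend-based replacement of the first operation is left out because the text does not specify an algorithm for it. Two points are assumptions: the interpolated values are left as fractional numbers, since the text does not say whether or how they are rounded, and the integer multiplier is taken to be the smallest one that brings the spacing to at least 30. The names `interpolate_spacing` and `extend_spacing` are hypothetical.

```python
import math

MIN_SPACING = 30   # length of a subsegment in samples


def interpolate_spacing(d1, d2):
    """Three intermediate values I1, I2, I3 between consecutive spacings D1 and D2."""
    return [
        0.75 * d1 + 0.25 * d2,
        0.50 * d1 + 0.50 * d2,
        0.25 * d1 + 0.75 * d2,
    ]


def extend_spacing(d):
    """Multiply d by the smallest integer (assumed) that makes the result at least 30."""
    if d <= 0:
        raise ValueError("spacing must be positive")
    if d >= MIN_SPACING:
        return d
    return d * math.ceil(MIN_SPACING / d)


intermediate = interpolate_spacing(40, 48)   # 42.0, 44.0, 46.0
usable = extend_spacing(12)                  # 36, i.e. three repetition periods
```

Multiplying by an integer keeps the selected past subsegment aligned with the repetition in the signal, because a spacing of several whole periods points at material that is just as similar as a spacing of one period.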
Claims (14)
1. Apparatus for coding an analog signal having a repetitive nature, comprising - means for performing a short-term prediction analysis on a quantised sampled analog signal and for providing coefficients determined in the short-term prediction analysis at a first output, - a short-term prediction filter for receiving the sampled analog signal and for generating a segmented residual signal, - means for dividing the segmented residual signal into subsegments, - means for transforming the subsegments from a time domain to a frequency domain and providing several frequency components per subsegment, each frequency component having a frequencycomponent-amplitude, - means for calculating a number of new amplitudes by combining the several frequencycomponent-amplitudes, the number of new amplitudes being smaller than the several frequencycomponent-amplitudes, and for providing the new amplitudes at a second output.
2. Apparatus according to claim 1, characterised in that the apparatus comprises - means for performing a long-term prediction analysis on the subsegments of the segmented residual signal and for providing coefficients determined in the long-term prediction analysis at a third output.
3. Apparatus according to claim 2, characterised in that the apparatus comprises - means for calculating a gain factor as a scaling value and for dividing each new amplitude by the gain factor and for providing the gain factor at a fourth output.
4. Apparatus according to claim 3, characterised in that the apparatus comprises - means for multiplying each subsegment by a window function.
5. Apparatus according to claim 4, characterised in that the apparatus comprises - means for quantising the new amplitudes.
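6. Apparatus according to claim 5, characterised in that thirteen frequencycomponent-amplitudes A1 to A13 are combined to calculate four new amplitudes B1 to B4 in accordance with

$$
\begin{aligned}
B_1 &= \sqrt{A_1^2 + A_2^2} \\
B_2 &= \sqrt{A_3^2 + A_4^2 + A_5^2} \\
B_3 &= \sqrt{A_6^2 + A_7^2 + A_8^2 + A_9^2} \\
B_4 &= \sqrt{A_{10}^2 + A_{11}^2 + A_{12}^2 + A_{13}^2}
\end{aligned}
$$

and that the gain factor G is calculated in accordance with

$$
G = \sqrt{B_1^2 + B_2^2 + B_3^2 + B_4^2}
$$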
7. Apparatus for decoding a coded signal comprising - a first input for receiving coefficients which have been determined in a short-term prediction analysis, - a second input for receiving a number of new amplitudes which have been calculated by combining several frequencycomponent-amplitudes, - means for calculating several new frequencycomponent-amplitudes at the hand of the number of new amplitudes, the number of new amplitudes being smaller than the several new frequencycomponent-amplitudes, - means for inverse transforming the several new frequencycomponent-amplitudes from a frequency domain to a time domain into new subsegments, - an inverse short-term prediction filter, having a first filterinput, coupled to the first input, for receiving the coefficients and having a second filterinput, coupled to the means for inverse transforming, for receiving the new subsegments, for generating a series of samples which is representative for a sampled analog signal.
8. Apparatus according to claim 7, characterised in that the apparatus comprises - a third input for receiving coefficients which have been determined in a long-term prediction analysis, - means, coupled to the third input and to the means for inverse transforming, for determining a subsegment at a spacing of D samples in the past with respect to a present subsegment, - means for transforming the determined subsegment from a time domain to a frequency domain, - means for calculating phases at the hand of the transformed determined subsegment and for providing these phases at the means for inverse transforming.
9. Apparatus according to claim 8, characterised in that the apparatus comprises - a fourth input for receiving a gain factor as a scaling value, - means for multiplying each of the received new amplitudes by the gain factor.
10. Apparatus according to claim 9, characterised in that the apparatus comprises - means for multiplying each new subsegment by a window function, - means for multiplying each determined subsegment by the window function.
11. Apparatus according to claim 10, characterised in that the apparatus comprises - means, coupled to the second input, for decoding the new amplitudes.
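12. Apparatus according to claim 11, characterised in that thirteen new frequencycomponent-amplitudes A'1 to A'13 are calculated at the hand of four new amplitudes B'1 to B'4 in accordance with

$$
\begin{aligned}
A'_1 = A'_2 &= \frac{B'_1}{\sqrt{2}} \\
A'_3 = A'_4 = A'_5 &= \frac{B'_2}{\sqrt{3}} \\
A'_6 = A'_7 = A'_8 = A'_9 &= \frac{B'_3}{2} \\
A'_{10} = A'_{11} = A'_{12} = A'_{13} &= \frac{B'_4}{2}
\end{aligned}
$$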
13. Apparatus according to claim 8, characterised in that the apparatus comprises - means for equalising a value of the spacing of D samples according to a predetermined algorithm.
14. Apparatus according to claim 8, characterised in that the apparatus comprises - means for calculating three intermediate values I1, I2, I3 for the value of the spacing of D samples between two consecutive values of the spacing of D1 and D2 samples in accordance with
I1 = 0.75 * D1 + 0.25 * D2
I2 = 0.50 * D1 + 0.50 * D2
I3 = 0.25 * D1 + 0.75 * D2
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
NL9002308A NL9002308A (en) | 1990-10-23 | 1990-10-23 | METHOD FOR CODING AND DECODING A SAMPLED ANALOGUE SIGNAL WITH A REPEATING CHARACTER AND AN APPARATUS FOR CODING AND DECODING ACCORDING TO THIS METHOD |
NL9002308 | 1990-10-23 |
Publications (2)
Publication Number | Publication Date |
---|---|
CA2053133A1 CA2053133A1 (en) | 1992-04-24 |
CA2053133C true CA2053133C (en) | 1996-05-21 |
Family
ID=19857866
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CA002053133A Expired - Lifetime CA2053133C (en) | 1990-10-23 | 1991-10-10 | Method for coding and decoding a sampled analog signal having a repetitive nature and a device for coding and decoding by said method |
Country Status (11)
Country | Link |
---|---|
EP (1) | EP0482699B1 (en) |
JP (1) | JP2958726B2 (en) |
AT (1) | ATE157188T1 (en) |
CA (1) | CA2053133C (en) |
DE (1) | DE69127339T2 (en) |
DK (1) | DK0482699T3 (en) |
ES (1) | ES2106051T3 (en) |
FI (1) | FI105623B (en) |
NL (1) | NL9002308A (en) |
NO (1) | NO305188B1 (en) |
PT (1) | PT99294A (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH07261797A (en) * | 1994-03-18 | 1995-10-13 | Mitsubishi Electric Corp | Signal encoding device and signal decoding device |
JPH09127995A (en) * | 1995-10-26 | 1997-05-16 | Sony Corp | Signal decoding method and signal decoder |
JP2000165251A (en) * | 1998-11-27 | 2000-06-16 | Matsushita Electric Ind Co Ltd | Audio signal coding device and microphone realizing the same |
FI116992B (en) | 1999-07-05 | 2006-04-28 | Nokia Corp | Methods, systems, and devices for enhancing audio coding and transmission |
EP1113432B1 (en) * | 1999-12-24 | 2011-03-30 | International Business Machines Corporation | Method and system for detecting identical digital data |
CN114519996B (en) * | 2022-04-20 | 2022-07-08 | 北京远鉴信息技术有限公司 | Method, device and equipment for determining voice synthesis type and storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS5650398A (en) * | 1979-10-01 | 1981-05-07 | Hitachi Ltd | Sound synthesizer |
US4742550A (en) * | 1984-09-17 | 1988-05-03 | Motorola, Inc. | 4800 BPS interoperable relp system |
JP2892462B2 (en) * | 1990-08-27 | 1999-05-17 | 沖電気工業株式会社 | Code-excited linear predictive encoder |
-
1990
- 1990-10-23 NL NL9002308A patent/NL9002308A/en not_active Application Discontinuation
-
1991
- 1991-10-10 CA CA002053133A patent/CA2053133C/en not_active Expired - Lifetime
- 1991-10-16 DK DK91202675.4T patent/DK0482699T3/en active
- 1991-10-16 EP EP91202675A patent/EP0482699B1/en not_active Expired - Lifetime
- 1991-10-16 ES ES91202675T patent/ES2106051T3/en not_active Expired - Lifetime
- 1991-10-16 AT AT91202675T patent/ATE157188T1/en not_active IP Right Cessation
- 1991-10-16 DE DE69127339T patent/DE69127339T2/en not_active Expired - Lifetime
- 1991-10-17 JP JP3332967A patent/JP2958726B2/en not_active Expired - Lifetime
- 1991-10-18 NO NO914105A patent/NO305188B1/en not_active IP Right Cessation
- 1991-10-22 PT PT99294A patent/PT99294A/en not_active Application Discontinuation
- 1991-10-23 FI FI914993A patent/FI105623B/en not_active IP Right Cessation
Also Published As
Publication number | Publication date |
---|---|
NO914105L (en) | 1992-04-24 |
EP0482699A2 (en) | 1992-04-29 |
DE69127339D1 (en) | 1997-09-25 |
PT99294A (en) | 1994-01-31 |
ES2106051T3 (en) | 1997-11-01 |
NO914105D0 (en) | 1991-10-18 |
NO305188B1 (en) | 1999-04-12 |
EP0482699B1 (en) | 1997-08-20 |
DE69127339T2 (en) | 1998-01-29 |
FI914993A (en) | 1992-04-24 |
EP0482699A3 (en) | 1992-08-19 |
CA2053133A1 (en) | 1992-04-24 |
DK0482699T3 (en) | 1998-03-30 |
NL9002308A (en) | 1992-05-18 |
FI105623B (en) | 2000-09-15 |
JP2958726B2 (en) | 1999-10-06 |
JPH05268098A (en) | 1993-10-15 |
FI914993A0 (en) | 1991-10-23 |
ATE157188T1 (en) | 1997-09-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6681204B2 (en) | Apparatus and method for encoding a signal as well as apparatus and method for decoding a signal | |
DK2265040T3 (en) | Advanced processing based on a complex exponential modulated filter bank and adaptive time signaling methods | |
KR100361236B1 (en) | Transmission System Implementing Differential Coding Principle | |
RU2255380C2 (en) | Method and device for reproducing speech signals and method for transferring said signals | |
DE69631728T2 (en) | Method and apparatus for speech coding | |
JPS6326947B2 (en) | ||
JPS6161305B2 (en) | ||
JPS5912186B2 (en) | Predictive speech signal coding with reduced noise influence | |
KR20120095920A (en) | Optimized low-throughput parametric coding/decoding | |
JPH07297726A (en) | Information coding method and device, information decoding method and device and information recording medium and information transmission method | |
US5504832A (en) | Reduction of phase information in coding of speech | |
EP0287741A1 (en) | Process for varying speech speed and device for implementing said process | |
CA2053133C (en) | Method for coding and decoding a sampled analog signal having a repetitive nature and a device for coding and decoding by said method | |
US5687281A (en) | Bark amplitude component coder for a sampled analog signal and decoder for the coded signal | |
KR100303580B1 (en) | Transmitter, Encoding Device and Transmission Method | |
US5588089A (en) | Bark amplitude component coder for a sampled analog signal and decoder for the coded signal | |
JP2004012908A (en) | Voice signal interpolation device and method, and program | |
CA1308193C (en) | Multi-pulse coding system | |
EP1125282B1 (en) | Audio coding | |
EP0475520B1 (en) | Method for coding an analog signal having a repetitive nature and a device for coding by said method | |
KR100205472B1 (en) | Quantizer | |
JPH0784595A (en) | Band dividing and encoding device for speech and musical sound | |
JP3099569B2 (en) | Transmission method of acoustic signal | |
JPS6232800B2 (en) | ||
EP0573103A1 (en) | Transmitter, receiver and record carrier in a digital transmission system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
EEER | Examination request | ||
MKEX | Expiry |