WO1995028770A1 - Adpcm signal encoding/decoding system and method - Google Patents

Info

Publication number
WO1995028770A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
kalman filter
output
bit rate
specialised
Application number
PCT/AU1995/000216
Other languages
French (fr)
Inventor
Craig Robert Watkins
Robert Ronald Bitmead
Salvatore Crisafulli
Original Assignee
Commonwealth Scientific And Industrial Research Organisation
The Australian National University
Application filed by Commonwealth Scientific And Industrial Research Organisation and The Australian National University
Priority to AU22098/95A (patent AU2209895A)
Publication of WO1995028770A1

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40 Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • H03M7/4006 Conversion to or from arithmetic code
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3002 Conversion to or from differential modulation
    • H03M7/3044 Conversion to or from differential modulation with several bits only, i.e. the difference between successive samples being coded by more than one bit, e.g. differential pulse code modulation [DPCM]
    • H03M7/3046 Conversion to or from differential modulation with several bits only, i.e. the difference between successive samples being coded by more than one bit, e.g. differential pulse code modulation [DPCM] adaptive, e.g. adaptive differential pulse code modulation [ADPCM]

Definitions

  • Speech signals have significant amounts of redundancy, or correlation between samples.
  • The encoding/decoding problem is to represent the information contained in the speech signal efficiently in a digital form, for storage or for transmission over a channel. To do this, it is desirable to remove the redundancy from the samples to be stored or transmitted; put another way, it is desirable to remove whatever is predictable from those samples.
  • Speech is typically modelled as a filtered white noise signal, i.e. white noise passed through a filter described mathematically by a set of filter coefficients.
  • Forwards adaptation schemes transmit these filter coefficients to the decoder in a quantized form. Then both the encoder and decoder are able to use the quantized coefficients in the prediction process.
  • In backwards adaptation, as in the present case, these filter coefficients are not transmitted. Instead, backwards adaptation relies on the reconstructed signal being available at the decoder, where it can be used to produce a set of filter coefficients for the prediction process. If the reconstructed signal is close to the input signal, backwards adaptation can reasonably be expected to perform well. It is known that a Kalman filter can provide a good prediction from coarsely quantized measurements. These predictions are based on Kalman filter state estimates, which contain smoothed signal values up to the order of the Kalman filter. However, a filter order of 10 to 50 is typically required, and the computational cost is typically of the order of the cube of the filter order.
  • Consequently, a Kalman filter predictor of adequate order imposes a substantial computational burden.
  • For the 50th-order filter, the computational cost of the update equations, which are run every sample, is in the vicinity of 120 MFLOPS.
  • This computational burden can be reduced by use of a Kalman filter technique with reduced complexity.
  • Such an approach uses smoothed estimates in the predictor only up to a relatively small lag of n samples. It exploits the fact that most of the smoothing gain is obtained in the first few smoothing lags.
  • Smoothing to n lags is mathematically equivalent to assuming that only the top left-hand n × n block of the error covariance matrix is non-zero.
  • The error covariance matrix arises because the calculated coefficients are not identical with the actual signal coefficients.
  • A specialised form of the error covariance sub-matrix can therefore be assumed, and a good approximation obtained simply by updating the n × n error covariance matrix to provide the appropriate smoothing.
  • With a maximum smoothing lag in the specialised Kalman filter as low as 4, a reconstituted speech signal is obtainable which differs from that of the full 50th-order Kalman filter by less than 0.2 dB in signal-to-noise ratio.
  • This requires a computational load of only 0.8 MFLOPS, compared to 120 MFLOPS for the 50th-order Kalman filter.
  • The modified or specialised Kalman filter also results in considerable subjective improvement, as it practically eliminates the high-frequency "hiss" introduced by large quantization errors. This is especially important for low bit rate signal encoding/decoding systems.
  • An arithmetic encoder and corresponding decoder are used in conjunction with the quantizer and dequantizer. This arithmetic encoding/decoding greatly increases the effective number of quantization levels. These levels are entropy coded by the arithmetic coder 8 (and its corresponding arithmetic decoder 18) based on the expected probability of occurrence of each level. The probabilities are calculated from a distribution assumption, for example a Laplacian distribution, and a variance estimate of the prediction difference signal that is input to the quantizer 5.
  • A substantial advantage of an effectively larger number of quantization levels is that overload distortion is practically eliminated, so more information is present in the quantized data to assist predictor adaptation and variance estimation.
  • Because the arithmetic coder 8 and decoder 18 perform entropy coding based on the expected probabilities of occurrence, the silence periods expected within speech utterances can be exploited to reduce the bit rate.
  • The preferred embodiment described above gives good quality reconstituted speech at an average bit rate of 16 kbps. Similarly, good quality speech is achievable at low bit rates in the vicinity of 8 kbps.
  • A plurality of the encoding systems 10 and a plurality of the decoding systems 20 can be arranged in a distributed communications system 30, as shown in Fig. 3.
  • The system 30 has a plurality of sites 25, which can have transmit-only capability, receive-only capability, or a combination of both. Whilst a ring network configuration is shown, other network configurations are contemplated, including a hub configuration.
  • Embodiments of the invention can be implemented as software coding alone.
  • The encoder takes the speech signal as input and produces a variable rate compressed digital signal ($I_k$) for storage or transmission (see Fig. 1).
  • The speech input ($S_k$) to the encoder is in the form of a (high) fixed bit rate digital signal obtained from a microphone in cascade with an analog-to-digital (A/D) converter, or from some other source such as another codec (e.g. PCM).
  • The prediction output of the specialised Kalman filter ($\hat{S}_{k|k-1}$) is subtracted from the speech signal ($S_k$) to produce a difference signal $\tilde{S}_k = S_k - \hat{S}_{k|k-1}$.
  • The difference signal is then quantised by the quantiser, which is represented by the Q block.
  • The quantiser may take many forms, including (but not restricted to) uniform, non-uniform and adaptive quantisers.
  • The output of the quantiser ($Z_k$) is a trivially coded version of the difference signal ($\tilde{S}_k$) and is mathematically specified as $Z_k = Q(\tilde{S}_k)$.
  • The quantiser output is then applied to two other blocks, the arithmetic coder (AC) and the dequantiser ($Q^{-1}$).
  • The dequantiser converts the coded signal $Z_k$ back to a real value ($Y_k$), which represents the quantised difference signal: $Y_k = Q^{-1}(Z_k)$.
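The Q and $Q^{-1}$ blocks can be illustrated with a minimal sketch. It assumes a uniform mid-tread quantiser with a fixed step size; the source also allows non-uniform or adaptive forms, and the function names and step value here are illustrative only. No clipping is applied, which reflects the point above that arithmetic coding makes the number of levels effectively unbounded.

```python
import math


def quantise(diff: float, step: float) -> int:
    """Q block (assumed uniform mid-tread): map the difference signal to an integer index Z_k.

    There is no clipping, so overload distortion cannot occur.
    """
    return math.floor(diff / step + 0.5)


def dequantise(index: int, step: float) -> float:
    """Q^{-1} block: map the index Z_k back to the quantised difference value Y_k."""
    return index * step


step = 0.25  # hypothetical step size
for d in (0.0, 0.1, 0.13, -0.4, 3.7):
    z = quantise(d, step)
    y = dequantise(z, step)
    print(f"diff={d:+.2f}  Z={z:+d}  Y={y:+.2f}  error={d - y:+.3f}")
```

The quantisation error never exceeds half a step, which is the property the Kalman filter later treats as measurement noise.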
  • The quantised difference signal $Y_k$ then drives the specialised Kalman filter (KF), which …
  • The specialised Kalman filter parameters are backwards adapted, in that the adaptation is performed on the basis of the reconstructed speech signal ($\hat{S}_k$).
  • The block ARcalc is driven by the reconstructed speech signal and computes the coefficients of the all-pole or auto-regressive (AR) signal model, which in turn are used to adapt the specialised Kalman filter.
  • The arithmetic coder takes the trivially coded difference signal ($Z_k$) and converts it to a variable rate entropy coded bit stream ($I_k$).
  • The arithmetic coding is adapted to cope with the nonstationary statistics; in this particular application, the adaptation is based on the short-term variance (although it could be based on some other quantity).
  • The block VarEst computes the short-term variance of the quantised difference signal ($Y_k$), and this quantity is used to adapt the arithmetic coder.
  • The specialised Kalman filter is a simplified version of a full Kalman filter based on an all-pole signal model. It is also an extension of the standard linear predictor commonly used in speech coding.
Mathematical Details
  • Both the full Kalman filter and the specialised Kalman filter are based on the all-pole signal model (also referred to as an auto-regressive (AR) signal model).
  • The all-pole speech signal model is specified by
  • Equation (5) can also be written as
  • The state vector $x_k$ consists of N samples, from the kth sample back to the (k − N + 1)th sample.
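The model equations themselves did not survive extraction. For reference, the following is the standard textbook form of an Nth-order all-pole model and its state-space representation, written with the state vector described above. It is an assumption, not the patent's own equations. The coefficient symbols $a_i$ and the noise terms $w_k$ (white excitation) and $v_k$ (quantisation noise with variance $R_k$) are introduced here for illustration. The measurement is the reconstructed sample $\hat{S}_k$, and $H$ and $R_k$ match their use in the Kalman gain equation (11).

```latex
\begin{aligned}
S_k &= \sum_{i=1}^{N} a_i S_{k-i} + w_k \\
x_k &= \begin{bmatrix} S_k & S_{k-1} & \cdots & S_{k-N+1} \end{bmatrix}^T \\
x_k &= F x_{k-1} + G w_k, \qquad
F = \begin{bmatrix}
a_1 & a_2 & \cdots & a_{N-1} & a_N \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{bmatrix}, \qquad
G = \begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix}^T \\
\hat{S}_k &= H x_k + v_k, \qquad H = \begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix}, \qquad \mathrm{E}[v_k^2] = R_k
\end{aligned}
```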
  • $K_k$ is the Kalman gain vector, given by
    $$K_k = P_k H^T \left( H P_k H^T + R_k \right)^{-1} \qquad (11)$$
  • The Kalman filter predicted output is given by
  • The state vector estimate for the Kalman filter is of the form
  • $K_k^{(i)}$ is the ith entry of the Kalman gain vector $K_k$.
  • The Kalman filter takes account of the measurement (quantisation) noise in the signal coding situation by recognising that the reconstructed sample ($\hat{S}_k$) is not a perfect representation of the input speech sample $S_k$.
  • The Kalman filter exploits the correlation between samples, given by the all-pole signal model, to obtain smoothed estimates of the input samples for use in the prediction.
  • Smoothing theory indicates that, for this particular problem, most of the smoothing gain is obtained in the first few smoothing lags. Hence most of the advantage of full Kalman filtering can be obtained by smoothing over only the first few lags (say up to 5), rather than continuing to the full order (e.g. 49 lags for a 50th-order signal model).
  • The reduced-complexity Kalman filter state estimate has the form
  • The sub-optimal smoothed estimate is read out of the state vector; mathematically, this is specified by $\hat{S}_{k-j|k} = H_j \hat{x}_{k|k}$, where $H_j$ is defined as the (j + 1)th row of an N × N identity matrix.
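The reduced-complexity predictor can be sketched as follows. This is a minimal sketch, not the patent's exact update equations. The state holds N samples, but only the top left-hand n × n block of the error covariance is propagated, so only the n most recent samples are ever corrected (smoothed). The sketch makes several assumptions:
- It uses the standard all-pole state-space model with AR coefficients `a` held fixed. In the codec these would come from ARcalc.
- The process-noise variance `q` and quantisation-noise variance `r` are scalars.
- The class and method names are hypothetical.

Because the measurement is the reconstructed sample (prediction plus $Y_k$), the innovation is simply the dequantised difference $Y_k$.

```python
import numpy as np


class ReducedKalmanPredictor:
    """Hypothetical sketch: Kalman predictor that keeps only an n x n covariance block."""

    def __init__(self, a, n, q, r):
        self.a = np.asarray(a, dtype=float)   # AR coefficients a_1..a_N (from ARcalc in the codec)
        self.n = n                            # size of the retained covariance block, n <= N
        self.q = q                            # excitation (process) noise variance
        self.r = r                            # quantisation (measurement) noise variance
        self.x = np.zeros(len(self.a))        # state [S_k, S_{k-1}, ..., S_{k-N+1}]
        self.P = np.zeros((n, n))             # top-left n x n block of the error covariance
        # Top-left n x n block of the companion transition matrix F.
        self.Fn = np.zeros((n, n))
        self.Fn[0, :] = self.a[:n]
        self.Fn[1:, :-1] = np.eye(n - 1)

    def predict(self):
        """Time update. Returns the predicted sample (P/O) that the summer subtracts."""
        s_pred = float(self.a @ self.x)
        self.x = np.concatenate(([s_pred], self.x[:-1]))   # shift samples one lag older
        self.P = self.Fn @ self.P @ self.Fn.T                # covariance pushed past the block is dropped
        self.P[0, 0] += self.q
        return s_pred

    def update(self, y):
        """Measurement update. The innovation is the dequantised difference Y_k."""
        gain = self.P[:, 0] / (self.P[0, 0] + self.r)      # K = P H^T (H P H^T + R)^-1
        self.x[: self.n] += gain * y                        # only the n newest samples are corrected
        self.P = self.P - np.outer(gain, self.P[0, :])      # P = (I - K H) P
        return float(self.x[0])

    def smoothed(self, j):
        """Smoothed estimate of S_{k-j}, i.e. H_j x with H_j the (j+1)th identity row."""
        return float(self.x[j])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = [1.3, -0.6]                 # assumed stable AR(2) "speech" model, for illustration
    step = 0.5                      # assumed quantiser step size
    kf = ReducedKalmanPredictor(a, n=2, q=1.0, r=step ** 2 / 12)
    hist = [0.0, 0.0]
    for k in range(5):
        s = a[0] * hist[0] + a[1] * hist[1] + rng.standard_normal()   # synthetic input S_k
        hist = [s, hist[0]]
        pred = kf.predict()
        y = step * round((s - pred) / step)                            # quantised difference Y_k
        kf.update(y)
        print(f"k={k}  S={s:+.3f}  pred={pred:+.3f}  smoothed S(k-1)={kf.smoothed(1):+.3f}")
```

The per-sample cost is dominated by the n × n covariance products rather than N × N ones, which is where the saving described above comes from.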
  • The use of the specialised Kalman filter in the KF-AC-ADPCM (Kalman filter, arithmetic coding ADPCM) system also results in considerable subjective improvement, as it practically eliminates the high-frequency "hiss" introduced by the large quantisation errors. This is especially important for low bit rate signal coding systems.
  • The subjective improvement is in fact far greater than the objective SNR measures above would indicate.
  • The specialised Kalman filter uses backward adaptation. This involves calculating the filter coefficients from the reconstructed speech signal ($\hat{S}_k$) rather than from the original speech signal ($S_k$), as would be the case in forward adaptation.
  • The block ARcalc is driven by the reconstructed speech signal and computes the coefficients of the all-pole or auto-regressive (AR) signal model, which in turn are used to adapt the specialised Kalman filter.
  • This is a well-known procedure, used in particular in Low Delay Code Excited Linear Prediction (LD-CELP) systems such as the CCITT G.728 16 kbit/s standard.
  • The speech signal is broken up into short segments (20-200 samples), and the signal is assumed to be stationary over each segment, i.e. piecewise stationary.
  • For each segment the "optimal" set of AR coefficients can be computed, and these are used in the specialised Kalman filter for the duration of the segment.
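The text does not say how ARcalc computes the "optimal" coefficients. One common choice, assumed here, is the autocorrelation method solved by the Levinson-Durbin recursion, applied to each segment of the reconstructed (or smoothed) signal. The sketch below runs it per segment. The function names, the segment length and the synthetic test signal are illustrative only.

```python
import math


def autocorrelation(x, order):
    """Biased autocorrelation r[0..order] of one analysis segment."""
    n = len(x)
    return [sum(x[t] * x[t - lag] for t in range(lag, n)) for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for a_1..a_p with S_k ~ sum_i a_i S_{k-i}.

    Returns (coefficients, final prediction-error power). Stops early if the
    error power reaches zero (e.g. an all-zero, silent segment).
    """
    a = [0.0] * (order + 1)           # a[0] is unused so indices match a_1..a_p
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err                 # reflection coefficient
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


# Piecewise-stationary use: one coefficient set per segment of reconstructed speech.
recon = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(480)]
SEGMENT = 160                          # 20 ms at 8 kHz, within the 20-200 sample range
for start in range(0, len(recon), SEGMENT):
    seg = recon[start:start + SEGMENT]
    coeffs, e = levinson_durbin(autocorrelation(seg, 4), 4)
    print([round(c, 3) for c in coeffs], round(e, 4))
```

Since only reconstructed samples enter the calculation, the decoder can repeat it exactly, and no coefficients need to be transmitted.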
  • Arithmetic coding is a practically optimal entropy coding scheme. It is used here to encode the quantiser output with only the number of bits required by information-theoretic considerations, based on the probability of that quantisation level being used.
  • The prime difference between arithmetic coding and the more common Huffman coding is that arithmetic coding does not require each source symbol to be encoded with an integral number of bits. This is particularly advantageous for highly peaked probability distributions such as the one here.
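To illustrate why fractional bits matter, the following sketch computes the probability of each quantiser level and the ideal code length $-\log_2 p$ that an arithmetic coder approaches. It assumes a zero-mean Laplacian distribution with variance $\sigma^2$ and a uniform mid-tread quantiser without clipping. The step size, the variances and the function names are illustrative assumptions.

```python
import math


def laplace_cdf(x, sigma):
    """CDF of a zero-mean Laplacian with variance sigma^2 (scale b = sigma / sqrt(2))."""
    b = sigma / math.sqrt(2.0)
    if x < 0:
        return 0.5 * math.exp(x / b)
    return 1.0 - 0.5 * math.exp(-x / b)


def level_probability(z, step, sigma):
    """P(Z_k = z) for a uniform mid-tread quantiser with no clipping."""
    lo, hi = (z - 0.5) * step, (z + 0.5) * step
    return laplace_cdf(hi, sigma) - laplace_cdf(lo, sigma)


def ideal_bits(z, step, sigma):
    """Information-theoretic code length -log2 P(z), which arithmetic coding approaches."""
    return -math.log2(level_probability(z, step, sigma))


def entropy(step, sigma, max_level=200):
    """Expected bits per sample of the quantiser output."""
    h = 0.0
    for z in range(-max_level, max_level + 1):
        p = level_probability(z, step, sigma)
        if p > 0:
            h -= p * math.log2(p)
    return h


step = 1.0
for sigma in (0.1, 1.0, 10.0):
    print(f"sigma={sigma:5.1f}  H={entropy(step, sigma):.3f} bits/sample  "
          f"cost of Z=0: {ideal_bits(0, step, sigma):.4f} bits")
```

When the variance is small, as during silence, almost every sample falls in level 0. The ideal cost is then a small fraction of a bit per sample, whereas a symbol-by-symbol Huffman code must spend at least one bit per sample.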
  • The arithmetic coder takes the trivially coded difference signal ($Z_k$) and converts it to a variable rate entropy coded bit stream ($I_k$). Mathematically, this can be specified as $I_k = \mathrm{AC}(Z_k)$.
  • The arithmetic coding is adapted to cope with the nonstationary statistics; in this particular application, the adaptation is based on the short-term variance (although it could be based on some other quantity).
  • The block VarEst computes the short-term variance of the quantised difference signal ($Y_k$) and uses this quantity to adapt the arithmetic coder.
  • The quantised difference signal is allocated to one of several bins according to the magnitude of the short-term variance $\sigma^2$.
  • A probability distribution is then either assumed (e.g. Laplacian) or calculated (e.g. by using test sentences to compute a look-up table) for each bin, and this distribution is used to arithmetically code the quantised difference signal ($Y_k$).
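VarEst and the variance binning can be sketched as below. This is a minimal sketch that assumes an exponentially weighted short-term variance and a fixed set of bin edges; the forgetting factor, variance floor and bin boundaries are hypothetical values, since the source does not give them. The bin is chosen before each sample is coded and the estimate is updated afterwards. Because only past dequantised values $Y_k$ are used, the decoder can run the identical computation and select the same probability table.

```python
import bisect


class VarEst:
    """Short-term variance of the dequantised difference signal Y_k (hypothetical parameters)."""

    def __init__(self, alpha=0.9, floor=1e-4, bin_edges=(0.01, 0.1, 1.0, 10.0)):
        self.alpha = alpha                # forgetting factor of the exponential window
        self.floor = floor                # keeps the variance strictly positive during silence
        self.bin_edges = list(bin_edges)  # variance boundaries between probability tables
        self.var = floor

    def current_bin(self):
        """Index of the bin (probability table) used to code the next sample."""
        return bisect.bisect_right(self.bin_edges, self.var)

    def update(self, y):
        """Fold the latest dequantised value into the estimate, after it has been coded."""
        self.var = max(self.floor, self.alpha * self.var + (1.0 - self.alpha) * y * y)


ve = VarEst()
for y in [0.0, 0.0, 2.0, -3.0, 2.5, 0.0]:
    b = ve.current_bin()   # encoder and decoder both select this bin before coding y
    ve.update(y)
    print(b, round(ve.var, 4))
```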
  • Perceptual weighting is a commonly used technique in many speech coding applications. It is used to improve the subjective quality of the decoded (reconstructed) speech by utilising known properties of the human auditory response. Human hearing is known to be less sensitive to coding distortion in frequency bands where the signal energy is greatest, owing to the masking effect of the human ear. Perceptual weighting is found to give significant subjective performance improvements in the KF-AC-ADPCM system.
  • Perceptual weighting is the technique of adding appropriate filtering in order to redistribute the coding distortion energy in approximately the same distribution as the speech signal.
  • The output of the encoder ($I_k$) is a variable rate bit stream produced by the arithmetic coding block. These data are then either transmitted over a digital telecommunications channel (e.g. an Asynchronous Transfer Mode (ATM) network) or stored on some form of digital storage device (e.g. a hard disk).
  • The input bit stream to the decoder will be referred to as $I'_k$.
  • In the ideal case, $I'_k = I_k$.
  • A practical codec must be able to cope with the non-ideal situation that occurs when bit errors are introduced ($I'_k \neq I_k$), and special techniques (such as periodic resynchronisation) are added to achieve this.
  • The encoder output bit stream ($I_k$) is buffered in order to achieve a fixed rate equal to its average value. This necessarily introduces an additional encoding delay.
  • Fig. 2 is a block diagram of the decoder.
  • The decoder converts the received compressed variable rate bit stream ($I'_k$) to the decoder reconstructed speech signal ($\hat{S}'_k$).
  • This reconstructed speech signal is in a similar form (a fixed rate digital signal) to the original speech signal $S_k$.
  • The reconstructed signal ($\hat{S}'_k$) is then connected either to a cascade of a digital-to-analog (D/A) converter and speaker, or to some other device.
  • The decoder first arithmetic decodes the variable rate bit stream ($I'_k$) to produce the trivially coded signal at the decoder ($Z'_k$); this is represented by the arithmetic decoder block ($\mathrm{AC}^{-1}$).
  • The output of the arithmetic decoder ($Z'_k$) is then converted to a decoder quantised difference signal ($Y'_k$) by the dequantiser block $Q^{-1}$, which can be written as $Y'_k = Q^{-1}(Z'_k)$.
  • The dequantiser block is identical to the one at the encoder.
  • The arithmetic decoder is adapted in the same way as the arithmetic coder, i.e. a variance estimate of the decoder quantised difference signal ($Y'_k$) is used to adapt the arithmetic decoder.
  • The VarEst block is identical to that of the encoder.
  • The specialised Kalman filter is identical in form to that of the encoder.
  • The specialised Kalman filter is backward adapted in the same manner as at the encoder.
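Backward adaptation lets the decoder rebuild every adaptive quantity from data it already has, so encoder and decoder stay in lockstep without side information. The sketch below demonstrates this with a deliberately simplified stand-in. A first-order predictor, whose coefficient is re-estimated from past reconstructed samples, replaces the specialised Kalman filter and ARcalc. The arithmetic coder is omitted, so the integer indices $Z_k$ stand in for the bit stream. All names and the step size are hypothetical.

```python
import math

STEP = 0.1   # quantiser step size (assumed)


class BackwardAdaptivePredictor:
    """Stand-in for the specialised Kalman filter + ARcalc (hypothetical, first order only).

    The coefficient is estimated solely from past reconstructed samples, so the
    decoder can compute exactly the same value without side information.
    """

    def __init__(self):
        self.prev = 0.0   # last reconstructed sample
        self.r0 = 0.0     # exponentially weighted sum of prev^2
        self.r1 = 0.0     # exponentially weighted sum of recon * prev

    def predict(self):
        coeff = self.r1 / self.r0 if self.r0 > 1e-12 else 0.0
        return coeff * self.prev

    def update(self, recon):
        self.r0 = 0.99 * self.r0 + self.prev * self.prev
        self.r1 = 0.99 * self.r1 + recon * self.prev
        self.prev = recon


def encode(signal):
    pred = BackwardAdaptivePredictor()
    indices, recon = [], []
    for s in signal:
        p = pred.predict()
        z = math.floor((s - p) / STEP + 0.5)   # Z_k = Q(S_k - prediction)
        r = p + z * STEP                        # reconstructed sample, as the decoder will see it
        pred.update(r)
        indices.append(z)
        recon.append(r)
    return indices, recon


def decode(indices):
    pred = BackwardAdaptivePredictor()          # same initial state as the encoder
    recon = []
    for z in indices:
        p = pred.predict()
        r = p + z * STEP                        # prediction + Y'_k, with Y'_k = Q^{-1}(Z'_k)
        pred.update(r)
        recon.append(r)
    return recon


signal = [math.sin(0.05 * k) for k in range(400)]
indices, enc_recon = encode(signal)
dec_recon = decode(indices)
print("decoder matches encoder:", dec_recon == enc_recon)
print("non-zero indices:", sum(1 for z in indices if z != 0), "of", len(indices))
```

The decoder performs exactly the same arithmetic in the same order on the same data, so its reconstruction is bit-identical to the encoder's. This holds only while $I'_k = I_k$, which is why the resynchronisation techniques mentioned above are needed when bit errors occur.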
  • Table 2: SNR and segSNR comparison between G.728 LD-CELP and KF-AC-ADPCM operating at 16 kb/s.
  • Table 2 shows values of SNR and segSNR for 16 kb/s LD-CELP and for KF-AC-ADPCM operating at an average bit rate of 16 kb/s, when tested on the sentence "Cats and dogs each hate the other".
  • KF-AC-ADPCM at an average of 16 kb/s has higher SNR and segSNR figures than LD-CELP.
  • Informal listening tests confirm that its subjective performance is significantly superior to that of LD-CELP.
  • Informal listening tests also indicate that the KF-AC-ADPCM subjective quality at an average bit rate of 12 kb/s is equal to that of 16 kb/s LD-CELP.
  • The KF-AC-ADPCM subjective quality is slightly inferior to LD-CELP, but is nevertheless still very good.
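The text does not state how SNR and segSNR were computed for Table 2. A common definition is sketched below for reference. SNR is total signal energy over total error energy. Segmental SNR is the mean of per-frame SNRs, here skipping frames that are silent or error-free. The 160-sample frame (20 ms at 8 kHz) and the frame-skipping rule are assumptions.

```python
import math


def snr_db(original, decoded):
    """Overall SNR: total signal energy over total error energy, in dB."""
    sig = sum(s * s for s in original)
    err = sum((s - d) ** 2 for s, d in zip(original, decoded))
    return 10.0 * math.log10(sig / err)


def seg_snr_db(original, decoded, frame=160):
    """Segmental SNR: mean of per-frame SNRs in dB, skipping silent or error-free frames."""
    values = []
    for start in range(0, len(original) - frame + 1, frame):
        s = original[start:start + frame]
        d = decoded[start:start + frame]
        sig = sum(x * x for x in s)
        err = sum((x - y) ** 2 for x, y in zip(s, d))
        if sig > 0 and err > 0:
            values.append(10.0 * math.log10(sig / err))
    return sum(values) / len(values) if values else float("nan")


orig = [1.0, 1.0, 10.0, 10.0]
dec = [0.9, 0.9, 9.9, 9.9]
print(f"SNR = {snr_db(orig, dec):.2f} dB, segSNR = {seg_snr_db(orig, dec, frame=2):.2f} dB")
```

segSNR weights every frame equally, so quiet passages count as much as loud ones. It can therefore differ markedly from the overall SNR, which loud frames dominate.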

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

An ADPCM signal encoding/decoding system is disclosed. A digital input signal (I/P) is input to a summer (3), the output signal of which is quantized (5) and output to an arithmetic coder (8). The quantized signal is dequantized (6) and passed to a specialized Kalman filter (7). An output of the Kalman filter (7) is applied to the summer (3). The Kalman filter (7) is adapted by an autoregressive model coefficient calculator (4). The Kalman filter (7) is specialised in the sense that there is smoothing of the estimates in the predictor up to a relatively small lag of n full-order samples. The arithmetic coder (8) is adapted by a variance estimator (9) so that arithmetic coding is performed using only the number of bits required by information theoretic considerations, based on the probability of that quantization level being used. Decoding of audio signals so encoded operates essentially in a converse manner.

Description

ADPCM Signal Encoding/Decoding System and Method
Technical Field of the Invention
The present invention relates to ADPCM (Adaptive Differential Pulse Code Modulation) signal encoding systems and, in particular, to such systems used to encode speech signals to result in a variable bit rate digitised representation of the speech signals. The invention also relates to the reverse decoding of such variable bit rate digitised representations in order to provide a reconstituted speech signal. The invention also is applicable to other audio signals such as music.
Background Art
The present invention finds applications in the encoding of speech signals in order to enable such speech to be stored. For example, many banks and like financial institutions routinely record telephone conversations of their market dealers in order to provide a permanent record of financial contractual obligations entered into verbally. Substantial savings in the volume of storage can be made if such speech is stored digitally and reconstituted as necessary by the reverse decoding. The coding techniques of the present invention also find application in ATM (Asynchronous Transfer Mode) communications, where a variable bit rate in the coded output is not of great consequence since it is the average bit rate which is important in such applications. ADPCM encoding and decoding systems using a predictor which essentially comprises a Kalman filter are known; however, such systems have not found practical application because of the very substantial computational demands made by the Kalman filter predictor, even though the Kalman filter is known to be "optimal" as far as linear prediction is concerned.
Four aspects must be traded off in implementing a low bit rate speech coding system: reconstructed speech quality, average bit rate, encoding/decoding delay and computational complexity. It would be desirable to realise a system that allows these trade-offs to be made flexibly, even during operation, so that specific application requirements can be met.
Disclosure of the Invention
It is the object of the present invention to provide such an encoding/decoding system, and the corresponding encoding and decoding methods, which enable some of the virtues of Kalman filtering to be obtained, but at very low bit rates and without the vice of substantial computational loads. This is essentially achieved by the use of a sub-optimal Kalman predictor utilising smoothed sample estimates, together with arithmetic coding, to take advantage of the silent periods within speech utterances and to preserve ADPCM stability at low bit rates.
In accordance with the first aspect of the present invention there is disclosed an ADPCM audio encoding system to provide a variable bit rate digitised representation of audio signals, said system comprising a digital input for said audio signals connected to a positive input of a summer, the output of said summer being connected to a cascade connected quantizer and dequantizer, the output of said dequantizer forming an input to a 'specialised' Kalman filter having two outputs, the first of said outputs comprising a predicted output which is connected to a negative input of said summer, the other of said filter outputs comprising a smoothed output which is utilized by an autoregressive calculator means to modify the operation of said 'specialised' Kalman filter; and an arithmetic coder having its input connected to said quantizer output, and controlled by a variance estimator means operating on the output of said dequantizer, the output of said arithmetic coder comprising said variable bit rate digitised representation.
Preferably, the encoding system further comprises an analog-to-digital converter for receiving audio signals in analog form and converting said signals to digital form. There further can be provided audio transducer means to generate said analog audio signals from sound pressure waves.
In accordance with a second aspect of the present invention there is disclosed an ADPCM decoding system to provide a reconstituted audio signal from a variable bit rate digitised representation of an original audio signal, said system comprising a dequantizer having its input connected to receive a decoded form of said variable bit rate digitised representation, the output of said dequantizer being connected to a 'specialised' Kalman filter, the output of which comprises said reconstituted audio signal and is utilized by an autoregressive calculator means to modify the operation of said 'specialised' Kalman filter; and an arithmetic decoder, controlled by a variance estimator means operating on the output of said dequantizer, interposed between said variable bit rate digitised representation and said dequantizer to generate said decoded form of said variable bit rate digitised representation.
Preferably the audio signals are speech signals. The invention further discloses an ADPCM audio encoding/decoding system comprising an encoding system described above, a digital memory storage means for receiving and storing said variable bit rate digitised representation, and a decoding system as described above.
There further can be a plurality of the encoding systems and a plurality of the decoding systems connected to a distributed digital communications network.
The invention yet further discloses a method for ADPCM audio encoding to provide a variable bit rate digitised representation of audio signals, said method comprising the steps of: filtering a digital audio signal by 'specialised' Kalman filter means; quantizing the filtered signal; and arithmetic coding the quantized signal to provide said variable bit rate digitised representation of said audio signal.
The invention yet further discloses a method for ADPCM audio decoding to provide a reconstituted audio signal from a variable bit rate digitised representation of an original audio signal, the method comprising the steps of: arithmetic decoding of said digitised representation; dequantizing of said decoded representation; and filtering said dequantized signal by 'specialised' Kalman filter means to produce said reconstituted audio signal.
Brief Description of the Drawings
A preferred embodiment of the present invention will now be described with reference to the drawings in which:
Fig. 1 is a schematic block diagram of an ADPCM speech encoding system of the preferred embodiment;
Fig. 2 is a schematic block diagram of the ADPCM speech decoding system of the preferred embodiment; and
Fig. 3 shows a distributed communications arrangement.
Detailed Description and Best Mode of Performance
As seen in Fig. 1, the encoding system 10 of the preferred embodiment has a microphone 1 at which speech signals are generated and passed to an A to D converter 2. A summer 3 is provided together with a quantizer 5 and dequantizer 6. The output of the quantizer 5 is input to an arithmetic coder 8 which is controlled, as indicated by dotted lines, by a variance estimator (VarEst) 9 which obtains its input from the dequantizer 6. The output O/P of the arithmetic coder 8 is the encoded bit stream. The output O/P can be connected to a digital data memory store, such as CD-ROM, DAT or a disc drive, or alternatively to a distributed communications network for broadcast. The output of the dequantizer 6 is input to a specialised Kalman filter 7 which has two outputs. One of these outputs is a predicted output P/O which is applied to the negative input of the summer 3. The other output is a smoothed output S/O, which is used by an autoregressive model coefficient calculator (ARcalc) 4, which is in turn used to adapt the specialised Kalman filter 7.
For the decoding system 20 as seen in Fig. 2, the specialised Kalman filter 7 is again used. However, since the input I/P for the decoder constitutes a variable bit rate stream, this is applied to an arithmetic decoder 18 which is again controlled by a variance estimator (VarEst) 9. The input I/P can be from a memory store or from a broadcast network as discussed above. A dequantizer 16 is provided. The specialised Kalman filter 7 is again adapted by the ARcalc 4, and its output O/P is the reconstructed output, which is connected to a loudspeaker 15.
Speech signals have significant amounts of redundancy, or correlation between samples. The encoding/decoding problem is to represent efficiently the information contained in the speech signal in a digital form for storage or transmission over a channel. In order to do this, it is desirable to remove the redundancy from the samples to be stored or transmitted. Put another way, it is desirable to remove that which is predictable from the samples to be stored or transmitted.
For the purposes of analysis, speech is typically modelled as a filtered white noise signal, which is represented mathematically by a set of filter coefficients. Forward adaptation schemes transmit these filter coefficients to the decoder in quantized form, so that both the encoder and the decoder can use the quantized coefficients in the prediction process.
In backward adaptation, as in the present case, these filter coefficients are not transmitted. Instead, backward adaptation relies on the reconstructed signal being available at the decoder, where it can be used to produce a set of filter coefficients for the prediction process. If the reconstructed signal is close to the input signal, backward adaptation can be expected to perform well. It is known that a Kalman filter can provide good prediction from coarsely quantized measurements. These predictions are based on Kalman filter state estimates, which contain smoothed signal values up to the order of the Kalman filter. However, a filter order of 10 to 50 is typically required, and the computational cost typically grows approximately with the cube of the filter order. As a consequence, a Kalman filter predictor of adequate order imposes a substantial computational burden: for a 50th order predictor, for example, running the update equations every sample costs in the vicinity of 120 MFLOPS. This burden can be reduced by a reduced-complexity Kalman filter technique. Although sub-optimal, such an approach uses smoothed estimates in the predictor only up to a relatively small lag of n samples. It exploits the fact that most of the smoothing gain is obtained in the first few smoothing lags.
Smoothing to n lags is mathematically equivalent to assuming that only the top left hand n x n block of the error covariance matrix is non-zero. The error covariance matrix describes the error between the estimated signal samples held in the filter state and the actual signal samples. Thus a specialised form of the error covariance matrix can be assumed, and a good approximation obtained simply by updating the n x n error covariance sub-matrix to provide the appropriate smoothing. With a maximum smoothing lag as low as 4 in the specialised Kalman filter, a reconstituted speech signal differing from that of the full 50th order Kalman filter by less than 0.2 dB in signal to noise ratio is obtainable, at a computational load of only 0.8 MFLOPS compared with 120 MFLOPS for the 50th order Kalman filter.
Furthermore, the modified or specialised Kalman filter also gives a considerable subjective improvement, as it practically eliminates the high frequency "hiss" introduced by large quantization errors. This is especially important for low bit rate signal encoding/decoding systems. To overcome the problems of large quantization errors, an arithmetic encoder and corresponding decoder are used in conjunction with the quantizer and dequantizer. This arithmetic encoding/decoding gives a very large increase in the effective number of quantization levels. These levels are entropy coded by the arithmetic coder 8 (and decoded by the corresponding arithmetic decoder 18) based on the expected probability of occurrence of each level. The probabilities are calculated using a distribution assumption, for example a Laplacian distribution, and a variance estimate of the prediction difference signal which is input to the quantizer 5.
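The level probabilities can be sketched as follows, assuming a uniform mid-tread quantiser with step size `step` and indices limited to plus or minus `max_index`, the two outermost indices absorbing the tails. The step size, index range and function names are illustrative assumptions; the patent specifies only a distribution assumption (for example Laplacian) and a variance estimate. An ideal arithmetic coder spends about -log2(p) bits on a level of probability p.

```python
import math


def laplace_cdf(x, sigma):
    """CDF of a zero-mean Laplacian distribution with standard deviation sigma."""
    b = sigma / math.sqrt(2.0)  # scale parameter, since variance = 2 * b**2
    if x < 0:
        return 0.5 * math.exp(x / b)
    return 1.0 - 0.5 * math.exp(-x / b)


def level_probabilities(sigma, step, max_index):
    """Probability of each index of a uniform mid-tread quantiser.

    Index m covers [(m - 0.5) * step, (m + 0.5) * step); the two outermost
    indices also absorb the tails, so the probabilities sum to one.
    """
    probs = {}
    for m in range(-max_index, max_index + 1):
        p_lo = 0.0 if m == -max_index else laplace_cdf((m - 0.5) * step, sigma)
        p_hi = 1.0 if m == max_index else laplace_cdf((m + 0.5) * step, sigma)
        probs[m] = p_hi - p_lo
    return probs


if __name__ == "__main__":
    # Hypothetical values: unit variance estimate, unit step, 17 levels
    for m, p in level_probabilities(sigma=1.0, step=1.0, max_index=8).items():
        # An ideal entropy coder spends about -log2(p) bits on this level
        print(f"level {m:+d}: p = {p:.5f}, ideal length = {-math.log2(p):.2f} bits")
```

Because the variance estimate changes from sample to sample, the probability table, and hence the number of bits spent per level, changes with it; low-variance passages concentrate probability on the central level.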
A substantial advantage of an effectively larger number of quantization levels is that it practically eliminates overload distortion, so that more information is present in the quantized data to assist predictor adaptation and variance estimation.
Furthermore, because the arithmetic coder 8 and decoder 18 perform entropy coding based on the expected probabilities of occurrence, the silence periods expected within speech utterances can be exploited to advantage.
The preferred embodiment described above gives good quality reconstituted speech at an average bit rate of 16 kbps. Similarly, good quality speech is achievable at low bit rates in the vicinity of 8 kbps.
The foregoing description relates to speech signals; however, the invention is equally applicable to other audio signals, including music.
To aid the foregoing description, a further detailed mathematically-based description follows, referring again to Figs. 1 and 2.
In another embodiment, a plurality of the encoding systems 10 and a plurality of the decoding systems 20 can be arranged in a distributed communications system 30 as shown in Fig. 3. The system 30 has a plurality of sites 25, each of which can have transmit-only or receive-only capability, or both. Whilst a ring network configuration is shown, other network configurations, including a hub configuration, are contemplated.
Embodiments of the invention can also be implemented entirely in software.
Encoder
The encoder takes the speech signal as input and produces a variable rate compressed digital signal ($I_k$) for storage or transmission (see Figure 1). The speech input ($S_k$) to the encoder will be in the form of a (high) fixed bit rate digital signal obtained from a microphone in cascade with an analog-to-digital (A/D) converter, or from some other source such as another codec (eg. PCM). The prediction output of the specialised Kalman filter ($\hat{S}_{k|k-1}$) is subtracted from the speech signal ($S_k$) to produce a difference signal ($\tilde{S}_k$),
$$\tilde{S}_k = S_k - \hat{S}_{k|k-1} \qquad (1)$$
This signal is then applied to the quantiser, which is represented by the Q block. The quantiser may take many forms including (but not restricted to) uniform, non-uniform, adaptive, etc. The output of the quantiser ($Z_k$) is a trivially coded version of the difference signal ($\tilde{S}_k$) and is specified mathematically as
$$Z_k = Q[\tilde{S}_k]. \qquad (2)$$
The quantiser output is then applied to two other blocks, the arithmetic coder (AC) and the dequantiser ($Q^{-1}$). The dequantiser converts the coded signal $Z_k$ back to a real value ($Y_k$) which represents the quantised difference signal,
$$Y_k = Q^{-1}[Z_k] = Q^{-1}\big[Q[\tilde{S}_k]\big]. \qquad (3)$$
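The quantised difference signal $Y_k$ then drives the specialised Kalman filter (KF), which produces two outputs: the predicted output, which is subtracted from the next input sample as in (1), and the smoothed output, which forms the reconstructed speech signal ($\bar{S}_k$).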
The specialised Kalman filter parameters are backwards adapted, in that the adaptation is based on the reconstructed speech signal ($\bar{S}_k$). The block ARcalc is driven by the reconstructed speech signal and computes the coefficients of the all-pole or auto-regressive (AR) signal model, which in turn are used to adapt the specialised Kalman filter.
The arithmetic coder takes the trivially coded difference signal ($Z_k$) and converts it to a variable rate entropy coded bit stream ($I_k$). The arithmetic coding is adapted to cope with the nonstationary statistics; for this particular application, the adaptation was based on the short term variance (although it could be based on some other quantity). The block VarEst computes the short term variance of the quantised difference signal ($Y_k$), and this quantity is used to adapt the arithmetic coder.
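Equations (1) to (3) form a closed loop in which the predictor only ever sees quantised information, so a decoder driven by $Z_k$ alone can track the encoder exactly. The minimal sketch below assumes a uniform quantiser with a hypothetical step size and, purely to keep the loop short, replaces the specialised Kalman filter with a fixed first-order predictor (coefficient `coeff`); it is not the patent's predictor, and the arithmetic coder is omitted.

```python
def quantise(x, step):
    """Uniform quantiser Q: real difference value -> integer index Z_k."""
    return round(x / step)


def dequantise(z, step):
    """Dequantiser Q^-1: integer index -> real value Y_k."""
    return z * step


def encode(samples, step=0.05, coeff=0.9):
    """Encoder loop of equations (1)-(3); returns (Z_k list, reconstruction).

    A fixed first-order predictor with coefficient `coeff` stands in for the
    specialised Kalman filter (illustrative assumption only).
    """
    prediction = 0.0
    indices, recon = [], []
    for s in samples:
        diff = s - prediction          # (1) difference signal
        z = quantise(diff, step)       # (2) Z_k = Q[difference]
        y = dequantise(z, step)        # (3) Y_k = Q^-1[Z_k]
        r = prediction + y             # reconstructed sample
        indices.append(z)
        recon.append(r)
        prediction = coeff * r         # predict the next sample from the reconstruction
    return indices, recon


def decode(indices, step=0.05, coeff=0.9):
    """Decoder: the same dequantiser and predictor, driven only by Z_k."""
    prediction = 0.0
    recon = []
    for z in indices:
        r = prediction + dequantise(z, step)
        recon.append(r)
        prediction = coeff * r
    return recon
```

Because the decoder repeats the encoder's operations on the same indices, `decode(encode(x)[0])` returns exactly the encoder's reconstruction, and with this unbounded quantiser each reconstructed sample lies within half a step of the input.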
Specialised Kalman filter
The specialised Kalman filter is a simplified version of a full Kalman filter based on an all-pole signal model. It is also an extension of the standard linear predictor that is commonly used in speech coding.
Mathematical Details
Both the full Kalman filter and the specialised Kalman filter are based on the all-pole signal model (also referred to as an auto-regressive (AR) signal model). The all-pole speech signal model is specified by
$$S_k = \sum_{i=1}^{N} a_i S_{k-i} + w_k, \qquad (5)$$
where $S_k$ is the speech signal, $w_k$ represents the excitation sequence (white noise), and $a_i$ are the all-pole model coefficients. Equation (5) can also be written as
$$S_k = a_1 S_{k-1} + a_2 S_{k-2} + a_3 S_{k-3} + \cdots + a_N S_{k-N} + w_k, \qquad (6)$$
or in state space form as
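$$x_{k+1} = F x_k + H^T w_{k+1}, \qquad (7)$$
$$S_k = H x_k, \qquad (8)$$
where
$$x_k = \begin{bmatrix} S_k \\ S_{k-1} \\ \vdots \\ S_{k-N+1} \end{bmatrix}, \qquad F = \begin{bmatrix} a_1 & a_2 & \cdots & a_{N-1} & a_N \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix},$$
$$H = [1, 0, \ldots, 0].$$
In this state-space formulation, the state vector $x_k$ consists of N samples, from the kth sample back to the (k - N + 1)th sample.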
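A small sketch of the state-space form: it builds the companion matrix $F$ from hypothetical coefficients and checks that one state update reproduces the recursion (6), with the newest sample placed at the top of $x_k$ and the oldest sample dropped. Plain Python lists are used so that the example needs no third-party packages; the function names are illustrative.

```python
def companion_matrix(a):
    """N x N matrix F of equation (7), built from a_1..a_N."""
    n = len(a)
    F = [[0.0] * n for _ in range(n)]
    F[0] = [float(c) for c in a]        # first row: a_1 ... a_N
    for i in range(1, n):
        F[i][i - 1] = 1.0               # sub-diagonal: shift older samples down
    return F


def mat_vec(M, v):
    """Matrix-vector product for lists of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def state_step(F, x, w):
    """x_{k+1} = F x_k + H^T w_{k+1}: only the newest sample receives w."""
    x_next = mat_vec(F, x)
    x_next[0] += w
    return x_next


def observe(x):
    """S_k = H x_k with H = [1, 0, ..., 0]."""
    return x[0]


if __name__ == "__main__":
    a = [0.5, -0.2, 0.1]                # hypothetical AR(3) coefficients
    x = [3.0, 2.0, 1.0]                 # [S_k, S_{k-1}, S_{k-2}]
    x_next = state_step(companion_matrix(a), x, w=0.25)
    print(x_next)                       # newest sample first, oldest dropped
```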
The full Kalman Filter equations for the above all-pole speech signal model are:
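$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k Y_k, \qquad (9)$$
$$\hat{x}_{k+1|k} = F \hat{x}_{k|k}, \qquad (10)$$
where $K_k$ is the Kalman gain vector, given by
$$K_k = P_k H^T (H P_k H^T + R_k)^{-1}, \qquad (11)$$
and
$$P_{k+1} = F P_k F^T - F P_k H^T (H P_k H^T + R_k)^{-1} H P_k F^T + Q_k \qquad (12)$$
is a Riccati difference equation (RDE) which recursively calculates the error covariance matrix. The quantity
$$R_k = E[v_k^2] = \sigma_{v_k}^2 \qquad (13)$$
is the measurement noise variance, where the measurement noise $v_k = Y_k - \tilde{S}_k$ is the quantisation error in the quantised prediction error signal $Y_k = Q^{-1}[Q[\tilde{S}_k]]$, and
$$Q_k = \begin{bmatrix} \sigma_w^2 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix} = \sigma_w^2 H^T H \qquad (14)$$
is the excitation noise variance matrix, where $\sigma_w^2$ is the variance of $w_k$. The Kalman filter predicted output is given by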
$$\hat{S}_{k|k-1} = H \hat{x}_{k|k-1}. \qquad (15)$$
The state vector estimate for the Kalman Filter is of the form
$$\hat{x}_{k|k} = \begin{bmatrix} \hat{S}_{k|k} \\ \hat{S}_{k-1|k} \\ \vdots \\ \hat{S}_{k-N+1|k} \end{bmatrix}. \qquad (16)$$
Thus the previous speech sample estimates are present in the state vector as fixed-lag smoothed values of various lags. It is then simply a matter of reading the smoothed sample out of this state vector. Mathematically, the smoothed output is specified by
$$\hat{S}_{k-j|k} = H^j \hat{x}_{k|k}, \quad j = 0, 1, \ldots, N-1, \qquad (17)$$
where $H^j$ is defined as the (j + 1)th row of an N x N identity matrix. Another way of writing (9), (10) and (16) is
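$$\hat{S}_{k|k-1} = a_1 \hat{S}_{k-1|k-1} + a_2 \hat{S}_{k-2|k-1} + \cdots + a_N \hat{S}_{k-N|k-1},$$
$$\hat{S}_{k|k} = \hat{S}_{k|k-1} + K_k^1 Y_k,$$
$$\hat{S}_{k-1|k} = \hat{S}_{k-1|k-1} + K_k^2 Y_k,$$
$$\vdots$$
$$\hat{S}_{k-N+1|k} = \hat{S}_{k-N+1|k-1} + K_k^N Y_k,$$
where $K_k^i$ is the ith entry of the Kalman gain vector $K_k$.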
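One sample of the full Kalman filter can be sketched in plain Python as below. Because $H = [1, 0, \ldots, 0]$, the product $P_k H^T$ is simply the first column of $P_k$. The covariance update uses the algebraically equivalent form $P_{k+1} = F(P_k - K_k H P_k)F^T + Q_k$ of (12), with $Q_k$ assumed to hold the excitation variance `q` in its top-left entry only, and the innovation is taken to be $Y_k$, as in the component-wise updates above. Function names and argument conventions are illustrative assumptions.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def kalman_step(F, P, x_pred, y, R, q):
    """Process one quantised difference sample y = Y_k.

    F: companion matrix; P: P_k, the error covariance of x_pred;
    x_pred: predicted state x_{k|k-1}; R: measurement noise variance R_k;
    q: excitation variance (top-left entry of Q_k).
    Returns (x_{k|k}, x_{k+1|k}, P_{k+1}).
    """
    n = len(x_pred)
    s = P[0][0] + R                                      # H P_k H^T + R_k (scalar)
    K = [P[i][0] / s for i in range(n)]                  # gain (11)
    x_filt = [x_pred[i] + K[i] * y for i in range(n)]    # update (9)
    # P_k - K_k H P_k, where H P_k is the first row of P_k
    P_filt = [[P[i][j] - K[i] * P[0][j] for j in range(n)] for i in range(n)]
    P_next = mat_mul(mat_mul(F, P_filt), transpose(F))   # Riccati equation (12), rearranged
    P_next[0][0] += q                                    # + Q_k
    x_next = mat_vec(F, x_filt)                          # prediction (10)
    return x_filt, x_next, P_next
```

Feeding `x_next` and `P_next` back in at the next sample runs the recursion; the predicted output (15) is `x_next[0]`. Every step touches full N x N matrices, which is the cost the specialised filter is designed to avoid.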
The Kalman filter takes account of the measurement (quantisation) noise in the signal coding situation by recognising that the reconstructed sample ($\bar{S}_k$) is not a perfect representation of the input speech sample $S_k$. The Kalman filter exploits the correlation between samples, given by the all-pole signal model, to obtain smoothed estimates of the input samples for use in the prediction.
Smoothing theory specifies that most of the smoothing gain is to be obtained in the first few smoothing lags for this particular problem. Since this is the case, most of the advantage of full Kalman filtering can be obtained by smoothing to the first few n lags (say up to 5 lags), rather than continuing to the full order (eg., 49 lags for a 50th order signal model).
The reduced complexity (and consequently sub-optimal) specialised Kalman Filter approach uses smoothed estimates in the filter up to a relatively small lag of n samples. Our sub-optimal prediction is based on
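$$\hat{S}_{k|k-1} = a_1 \hat{S}_{k-1|k-1} + \cdots + a_n \hat{S}_{k-n|k-1} + a_{n+1} \hat{S}_{k-n-1|k-2} + a_{n+2} \hat{S}_{k-n-2|k-3} + \cdots + a_N \hat{S}_{k-N|k-N+n-1},$$
with only the first n entries of the state updated:
$$\hat{S}_{k|k} = \hat{S}_{k|k-1} + K_k^1 Y_k,$$
$$\hat{S}_{k-1|k} = \hat{S}_{k-1|k-1} + K_k^2 Y_k,$$
$$\vdots$$
$$\hat{S}_{k-n+1|k} = \hat{S}_{k-n+1|k-1} + K_k^n Y_k.$$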
The reduced complexity Kalman Filter state estimate has the form
$$\hat{x}_{k|k} = \begin{bmatrix} \hat{S}_{k|k} \\ \vdots \\ \hat{S}_{k-n+1|k} \\ \hat{S}_{k-n|k-1} \\ \hat{S}_{k-n-1|k-2} \\ \vdots \\ \hat{S}_{k-N+1|k-N+n} \end{bmatrix}.$$
Comparing this with the full Kalman filter state estimate shows that we now have a "hybrid" approach, and this is what we refer to as the specialised Kalman filter. The reconstructed speech signal can be obtained from the sub-optimal smoothed estimate via $\bar{S}_{k-n} = \hat{S}_{k-n|k}$. The sub-optimal smoothed estimate is read out of the state vector; mathematically, this is specified by
$$\hat{S}_{k-j|k} = H^j \hat{x}_{k|k}, \quad j = 0, 1, \ldots, N-1,$$
where $H^j$ is defined as the (j + 1)th row of an N x N identity matrix.
Smoothing to n lags only is equivalent to assuming that only the top left hand n x n block of the error covariance matrix is non-zero. Thus, we have an error covariance matrix of the form
$$P_k = \begin{bmatrix} P_k^{11} & 0 \\ 0 & 0 \end{bmatrix}, \qquad (22)$$
where $P_k^{11}$ is an n x n matrix.
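With $P_k$ of this form, the Riccati equation (12) gives
$$P_{k+1} = \begin{bmatrix} P_{k+1}^{11} & \ast & 0 \\ \ast & \ast & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad (23)$$
where $\ast$ denotes entries that are in general non-zero, and $P_{k+1}^{11}$ is found from the nth order Riccati difference equation:
$$P_{k+1}^{11} = F^{11} P_k^{11} (F^{11})^T - F^{11} P_k^{11} (H^1)^T \big(H^1 P_k^{11} (H^1)^T + R_k\big)^{-1} H^1 P_k^{11} (F^{11})^T + Q_k^{11}, \qquad (24)$$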
Table 1: SNR measures and MFLOPS for the Riccati difference equation
with $F^{11}$ and $Q_k^{11}$ defined as the top left n x n blocks of the $F$ and $Q_k$ matrices, and $H^1$ defined as the vector formed by the first n entries of the $H$ vector. In fact, for $P_k$ as in (22), the top left n x n block of $P_{k+1}$ is exactly $P_{k+1}^{11}$, and the only non-zero elements in $P_{k+1}$ are in the top left (n + 1) x (n + 1) block. If (22) is a good approximation for the $P_k$ obtained during full Kalman filtering, then we would expect the elements outside the n x n top left block of $P_{k+1}$ to be close to zero, and hence
$$P_{k+1} \approx \begin{bmatrix} P_{k+1}^{11} & 0 \\ 0 & 0 \end{bmatrix} \qquad (25)$$
is a good approximation for $P_{k+1}$. We are thus able to reduce the computational cost of full Kalman filtering, through a sub-optimal approach, by simply updating the n x n error covariance matrix $P_k^{11}$.
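The specialised filter can be sketched by keeping the full length-N hybrid state but propagating only the n x n block $P_k^{11}$ through the nth order Riccati equation (24). Only the first n gain entries are non-zero, so only the first n state entries are corrected, while the prediction still uses all N coefficients, so that older samples enter with the last smoothed estimate they received. As before, the names are illustrative and $Q_k^{11}$ is assumed to hold the excitation variance `q` in its top-left entry only; this is a sketch of the equations above under those assumptions, not the authors' implementation.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def specialised_step(a, P11, x_pred, y, R, q):
    """One sample of the reduced-complexity filter.

    a: AR coefficients a_1..a_N; P11: n x n block P_k^{11} (1 <= n <= N);
    x_pred: length-N predicted (hybrid) state; y: Y_k; R: R_k;
    q: excitation variance. Returns (x_{k|k}, x_{k+1|k}, P_{k+1}^{11}).
    """
    n, N = len(P11), len(a)
    if not 1 <= n <= N:
        raise ValueError("need 1 <= n <= N")
    s = P11[0][0] + R                                   # H^1 P^{11} (H^1)^T + R_k
    K = [P11[i][0] / s for i in range(n)]               # non-zero part of K_k
    x_filt = list(x_pred)
    for i in range(n):                                  # correct lags 0..n-1 only
        x_filt[i] += K[i] * y
    P_filt = [[P11[i][j] - K[i] * P11[0][j] for j in range(n)] for i in range(n)]
    F11 = [[0.0] * n for _ in range(n)]                 # top-left n x n block of F
    F11[0] = [float(c) for c in a[:n]]
    for i in range(1, n):
        F11[i][i - 1] = 1.0
    P_next = mat_mul(mat_mul(F11, P_filt), transpose(F11))  # nth order RDE (24)
    P_next[0][0] += q                                   # + Q_k^{11}
    # Predict from the whole hybrid state, then shift it down by one sample
    x_next = [sum(c * v for c, v in zip(a, x_filt))] + x_filt[:-1]
    return x_filt, x_next, P_next
```

With n = N this reduces to the full filter of (9) to (12); with n = 1 it corresponds to the first order Riccati equation of Table 1. The covariance work involves only n x n matrices, while the state prediction still uses all N coefficients.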
Specialised Kalman Filter Results
Simulations using a 50th order (N = 50) linear predictor within an AC-ADPCM speech coding framework have shown that, even with an upper block as small as 4 x 4 in the specialised Kalman filter, the approach can give a reconstructed signal differing from that of the full order Kalman filter approach by less than 0.2 dB in SNR. The corresponding improvement over the standard linear predictor in this situation is 1.0 dB in SNR. These tabulated results correspond to the output of the variable bit rate ADPCM system at an average of 1.5 bits/sample, or 12 kbps (at an 8 kHz sampling rate).
In Table 1 we present the Signal to Noise Ratio (SNR) and the Segmental SNR (segSNR) for a speech sentence along with the computational cost of the Riccati equation in MFLOPS. The notation LP indicates the standard linear predictor, KF denotes the full 50th order Kalman Filter, and KFn indicates the specialised Kalman filter with an n x n upper block in the Riccati difference equation.
From the table, it is clear that Kalman filtering techniques can give a significant performance improvement at extremely low additional complexity. Note in particular the 0.5 dB improvement obtained from a first order Riccati equation, for which the extra computational complexity is negligible.
The use of the Kalman Filter in the KF-AC-ADPCM system also results in considerable subjective improvement, as it practically eliminates the high frequency "hiss" introduced by the large quantisation errors. This is especially important for low bit rate signal coding systems. The subjective improvement is in fact far greater than the objective SNR measures would indicate from above.
Adaptation
The specialised Kalman filter uses backward adaptation. This involves calculating the filter coefficients from the reconstructed speech signal ($\bar{S}_k$) rather than from the original speech signal ($S_k$), as would be the case in forward adaptation. The block ARcalc is driven by the reconstructed speech signal and computes the coefficients of the all-pole or auto-regressive (AR) signal model, which in turn are used to adapt the specialised Kalman filter. This is a well known procedure, particularly in Low-Delay Code-Excited Linear Prediction (LD-CELP) systems such as the CCITT G.728 16 kbit/s standard.
The speech signal is broken up into short segments (20-200 samples), and the signal is assumed to be stationary over each segment, i.e. it is assumed to be piecewise stationary. For each segment, the "optimal" set of $a_i$ coefficients can be computed, and these are used in the specialised Kalman filter for the duration of the segment.
There are many standard approaches to computing these $a_i$ coefficients. A common approach is as follows:
First, a segment of the most recent data (reconstructed samples) is windowed using some form of recursive window. Next, this windowed data is used to compute the auto-correlation coefficients. The Levinson-Durbin algorithm then efficiently computes the $a_i$ coefficients recursively from the auto-correlation coefficients.
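The last two steps can be sketched as follows. For brevity, the segment is used with a plain rectangular window instead of a recursive window; that substitution is an assumption made only to keep the example short. The Levinson-Durbin recursion is written for the sign convention of (5), so the returned list can be used directly as $a_1, \ldots, a_N$.

```python
def autocorrelation(x, max_lag):
    """Biased auto-correlation r[0..max_lag] of a data segment."""
    n = len(x)
    return [sum(x[t] * x[t - lag] for t in range(lag, n)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for a_1..a_order in S_k = sum_i a_i S_{k-i} + w_k.

    Returns (coefficients, prediction error power).
    """
    if r[0] <= 0:
        raise ValueError("segment has no energy")
    a = [0.0] * (order + 1)            # a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err                  # reflection coefficient for order m
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]
        a = new_a
        err *= 1.0 - k * k             # error power shrinks at each order
    return a[1:], err
```

For example, the auto-correlation `[1.0, 0.5, 0.25]` of a first-order process gives coefficients `[0.5, 0.0]`: the second-order term vanishes, as expected. The recursion needs only on the order of N^2 operations per segment rather than a general N x N matrix inversion.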
Arithmetic Coding
Arithmetic coding is a practically optimal entropy coding scheme, used here to encode the quantiser output with only the number of bits required by information theoretic considerations, based on the probability of that quantisation level being used. The prime difference from the more common Huffman coding is that arithmetic coding does not require each source symbol to be encoded with an integral number of bits. This is particularly advantageous for highly peaked probability distributions, as in this case.
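The integral-bit limitation can be illustrated numerically. For a hypothetical, highly peaked three-level distribution, the sketch below compares the Shannon entropy, which an ideal arithmetic coder approaches, with the average length of a Huffman code built for the same probabilities. The probabilities are made up for illustration.

```python
import heapq
import math


def entropy_bits(probs):
    """Shannon entropy in bits/symbol: the rate an ideal arithmetic coder approaches."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def huffman_lengths(probs):
    """Code length, in whole bits, of each symbol in a Huffman code for probs."""
    if len(probs) == 1:
        return [1]
    # Heap items: (probability, unique tie-break counter, symbol indices)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1            # each merge adds one bit to every member
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths


if __name__ == "__main__":
    p = [0.9, 0.05, 0.05]              # hypothetical peaked level distribution
    L = huffman_lengths(p)
    print("Huffman:", sum(pi * li for pi, li in zip(p, L)), "bits/symbol")
    print("Entropy:", round(entropy_bits(p), 3), "bits/symbol")
```

Here the Huffman code needs 1.1 bits/symbol, since no symbol can take less than one bit, while the entropy is about 0.57 bits/symbol.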
The arithmetic coder takes the trivially coded difference signal ($Z_k$) and converts it to a variable rate entropy coded bit stream ($I_k$). Mathematically, this can be specified by
$$I_k = AC[Z_k]. \qquad (26)$$
The arithmetic coding is adapted to cope with the nonstationary statistics; for this particular application, the adaptation was based on the short term variance (although it could be based on some other quantity). The block VarEst computes the short term variance of the quantised difference signal ($Y_k$) and uses this quantity to adapt the arithmetic coder. The short term variance is calculated by the recursion
$$\sigma_{k+1}^2 = a\,\sigma_k^2 + (1 - a)\,Y_k^2, \qquad (27)$$
where $\sigma_k^2$ is a measure of the short term variance and $a$ is the leakage constant, with a value between 0 and 1.
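The short term variance recursion can be sketched as below. The estimate used for sample k depends only on earlier values of $Y_k$, so the decoder, which obtains its quantised difference value only after decoding it, can form exactly the same estimate. The leakage constant, initial value, bin edges and function names are hypothetical; the patent gives no values. The second function maps the estimate to one of the variance bins described in the next paragraph, each of which would select its own probability table.

```python
import bisect


def variance_track(y_values, leak=0.9, initial=1.0):
    """Return the estimate sigma_k^2 in force before each Y_k is coded.

    Implements sigma_{k+1}^2 = a * sigma_k^2 + (1 - a) * Y_k^2 with a = leak.
    """
    if not 0.0 < leak < 1.0:
        raise ValueError("leakage constant must lie between 0 and 1")
    var = initial
    history = []
    for y in y_values:
        history.append(var)                       # used to code Y_k
        var = leak * var + (1.0 - leak) * y * y   # updated once Y_k is known
    return history


# Hypothetical bin edges on the short term variance.
VARIANCE_EDGES = [1e-4, 1e-3, 1e-2, 1e-1]


def variance_bin(sigma2, edges=VARIANCE_EDGES):
    """Index (0 .. len(edges)) of the variance bin that sigma2 falls into."""
    return bisect.bisect_right(edges, sigma2)
```

A leakage constant close to 1 gives a smoother, slower-reacting estimate; a smaller value tracks onsets faster at the cost of a noisier estimate.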
The quantised difference signal is allocated to various bins based on the magnitude of the short term variance $\sigma_k^2$. A probability distribution is then either assumed (eg. Laplacian) or calculated (eg. using test sentences to compute a look-up table) for each bin, and this is then used to arithmetically code the quantised difference signal ($Y_k$).
Perceptual Weighting
Perceptual weighting is a commonly used technique in many speech coding applications. It is used to improve the subjective quality of the decoded (reconstructed) speech by exploiting known properties of the human auditory response: because of the masking effect of the human ear, hearing is less sensitive to coding distortion in frequency bands where the signal energy is greatest. Perceptual weighting is found to give significant subjective performance improvements in the KF-AC-ADPCM system.
Perceptual weighting is the technique of adding appropriate filtering in order to redistribute the coding distortion energy so that its spectrum approximately follows that of the speech signal.
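The patent does not specify the weighting filter. A common choice in CELP coders is the pole-zero filter $W(z) = A(z/\gamma_1)/A(z/\gamma_2)$, where $A(z) = 1 - \sum_i a_i z^{-i}$ is the inverse of the all-pole model (5) and $0 < \gamma_2 < \gamma_1 \le 1$. The sketch below implements only that filter, with example values for $\gamma_1$ and $\gamma_2$; how it would be placed in the KF-AC-ADPCM loop is not described in the patent and is not shown.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): each a_i is scaled by gamma**i."""
    return [c * gamma ** (i + 1) for i, c in enumerate(a)]


def weighting_filter(x, a, gamma1=0.9, gamma2=0.6):
    """Filter x through W(z) = A(z/gamma1) / A(z/gamma2), A(z) = 1 - sum a_i z^-i.

    a: all-pole coefficients in the sign convention of equation (5).
    gamma1, gamma2: example values only (assumptions).
    """
    num = bandwidth_expand(a, gamma1)   # zeros of W(z)
    den = bandwidth_expand(a, gamma2)   # poles of W(z)
    x_hist = [0.0] * len(a)             # x[k-1], x[k-2], ...
    y_hist = [0.0] * len(a)             # y[k-1], y[k-2], ...
    out = []
    for xk in x:
        yk = (xk
              - sum(c * v for c, v in zip(num, x_hist))
              + sum(c * v for c, v in zip(den, y_hist)))
        out.append(yk)
        x_hist = [xk] + x_hist[:-1]
        y_hist = [yk] + y_hist[:-1]
    return out
```

Setting `gamma1 == gamma2` makes the numerator and denominator cancel, so the filter passes its input through unchanged; moving the two values apart strengthens the de-emphasis of the spectral peaks.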
Transmission Channel or Storage
The output of the encoder ($I_k$) is a variable rate bit stream produced by the arithmetic coding block. These data are then either transmitted over a digital telecommunications channel (eg. an Asynchronous Transfer Mode (ATM) network) or stored on some form of digital storage device (eg. a hard disk). Bit errors may always occur during transmission or storage, so the input bit stream to the decoder will be referred to as $I'_k$. In an ideal situation, $I'_k = I_k$. A practical codec must be able to cope with the non-ideal situation that occurs when bit errors are introduced ($I'_k \neq I_k$), and special techniques (such as periodic resynchronisation) are added in order to achieve this. For fixed rate channel applications, the encoder output bit stream ($I_k$) is buffered in order to achieve the fixed average rate. This necessarily introduces additional encoding delay.
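The extra delay introduced by buffering for a fixed rate channel can be estimated with a simple FIFO model, sketched below under assumed frame sizes and channel capacity (both hypothetical). Each frame, the encoder appends a variable number of bits and the channel removes a fixed number; the peak occupancy divided by the channel rate is the time needed to drain the fullest buffer. A practical system would also have to handle underflow (for example by sending fill bits) and bound the buffer size, which this sketch ignores.

```python
def fifo_delay(frame_bits, channel_bits_per_frame):
    """Peak occupancy (bits) and the time to drain it (frames) for a fixed-rate FIFO.

    Each frame, the encoder's bits are appended and then the channel removes
    up to channel_bits_per_frame bits.
    """
    if channel_bits_per_frame <= 0:
        raise ValueError("channel rate must be positive")
    occupancy = 0
    peak = 0
    for bits in frame_bits:
        occupancy += bits                  # variable-rate arithmetic coder output
        peak = max(peak, occupancy)
        occupancy = max(0, occupancy - channel_bits_per_frame)  # fixed-rate drain
    return peak, peak / channel_bits_per_frame


if __name__ == "__main__":
    # Assumed framing: 10 ms frames on a 16 kbit/s channel carry 160 bits each
    peak, frames = fifo_delay([300, 100, 100, 100], 160)
    print(peak, frames)                    # prints: 300 1.875
```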
Decoder
Figure 2 is a block diagram of the decoder. The decoder converts the received compressed variable rate bit stream ($I'_k$) to the decoder reconstructed speech signal ($\bar{S}'_k$). This reconstructed speech signal is in a similar form (fixed rate digital signal) to the original speech signal $S_k$. The reconstructed signal ($\bar{S}'_k$) is then connected either to a cascade of a digital-to-analog (D/A) converter and speaker or to some other device.
The decoder first arithmetic decodes the variable rate bit stream ($I'_k$) to produce the trivially coded signal at the decoder ($Z'_k$); this is represented by the arithmetic decoder block ($AC^{-1}$). Mathematically we have
$$Z'_k = AC^{-1}[I'_k]. \qquad (28)$$
The output of the arithmetic decoder ($Z'_k$) is then converted to a decoder quantised difference signal ($Y'_k$) by the dequantiser block $Q^{-1}$, and this can be written as
$$Y'_k = Q^{-1}[Z'_k]. \qquad (29)$$
Note that the dequantiser block is identical to the one at the encoder. The arithmetic decoder is adapted in an equivalent way to the arithmetic coder, i.e. a variance estimate of the decoder quantised difference signal ($Y'_k$) is used to adapt the arithmetic decoder. The VarEst block is identical to that of the encoder. The decoder quantised difference signal ($Y'_k$) then drives the specialised Kalman filter to produce the decoder reconstructed signal ($\bar{S}'_k$). This reconstructed signal is taken to be the decoder smoothed output. The specialised Kalman filter is identical in form to that of the encoder, and is backward adapted in an equivalent manner to the encoder.
Codec          SNR (dB)   segSNR (dB)
LD-CELP        18.81      16.03
KF-AC-ADPCM    28.03      17.82
Table 2: SNR and segSNR comparison between G.728 LD-CELP and KF-AC-ADPCM operating at 16 kb/s
KF-AC-ADPCM Results
It is difficult to present concrete performance results for speech coding, since the ultimate performance criterion is subjective quality, and this can only be properly evaluated by expensive formal listening tests. The common approach is to use objective measures such as SNR and segSNR in conjunction with informal listening tests. It is also usual to compare with an existing well known standard, eg. the CCITT G.728 LD-CELP 16 kb/s codec.
Table 2 shows values of SNR and segSNR for 16 kb/s LD-CELP and for KF-AC-ADPCM operating at an average bit rate of 16 kb/s, when tested on the sentence "Cats and dogs each hate the other".
Notice that KF-AC-ADPCM at an average of 16 kb/s has higher SNR and segSNR figures than LD-CELP. The informal listening tests confirm that its subjective performance is significantly superior to that of LD-CELP. They also indicate that the KF-AC-ADPCM subjective quality at an average bit rate of 12 kb/s is equal to that of 16 kb/s LD-CELP. At 8 kb/s, the KF-AC-ADPCM subjective quality is slightly inferior to LD-CELP, but is nevertheless still very good.
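The two objective measures can be sketched as below: SNR is computed over the whole utterance, while segSNR averages per-frame SNRs in dB, so quiet frames count as much as loud ones. The 160-sample frame (20 ms at 8 kHz) and the rule of skipping frames with zero signal or zero error are assumptions; published segSNR definitions often also clamp each frame's value to a fixed range, which is not done here.

```python
import math


def snr_db(ref, test):
    """Overall SNR: total signal energy over total error energy, in dB."""
    sig = sum(s * s for s in ref)
    err = sum((s - t) ** 2 for s, t in zip(ref, test))
    return 10.0 * math.log10(sig / err)


def seg_snr_db(ref, test, frame=160):
    """Segmental SNR: mean of per-frame SNRs in dB (frame length assumed)."""
    values = []
    for start in range(0, len(ref) - frame + 1, frame):
        r = ref[start:start + frame]
        t = test[start:start + frame]
        sig = sum(s * s for s in r)
        err = sum((a - b) ** 2 for a, b in zip(r, t))
        if sig > 0 and err > 0:            # skip silent or error-free frames
            values.append(10.0 * math.log10(sig / err))
    if not values:
        raise ValueError("no usable frames")
    return sum(values) / len(values)
```

For a loud frame followed by a quiet frame with the same absolute error, the overall SNR is dominated by the loud frame, while segSNR gives both frames equal weight; this is why segSNR tracks perceived quality more closely for speech.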

Claims

1. An ADPCM audio encoding system to provide a variable bit rate digitised representation of audio signals, said system comprising a digital input for said audio signals connected to a positive input of a summer, the output of said summer being connected to a cascade connected quantizer and dequantizer, the output of said dequantizer forming an input to a 'specialised' Kalman filter having two outputs, the first of said outputs comprising a predicted output which is connected to a negative input of said summer, the other of said filter outputs comprising a smoothed output which is utilized by an autoregressive calculator means to modify the operation of said specialised Kalman filter; and an arithmetic coder having its input connected to said quantizer output, and controlled by a variance estimator means operating on the output of said dequantizer, the output of said arithmetic coder comprising said variable bit rate digitised representation.
2. An encoding system as claimed in claim 1, further comprising an analog-to-digital converter for receiving audio signals in analog form and converting said signals to digital form.
3. An encoding system as claimed in claim 2, further comprising audio transducer means to generate said analog audio signals from sound pressure waves.
4. An encoding system as claimed in any one of the preceding claims, wherein said specialised Kalman filter utilises sub-optimal Kalman prediction by smoothing sample estimates only to a reduced number of lags relative to the full order.
5. An encoding system as claimed in any one of the preceding claims, wherein said arithmetic coder generates a variable rate entropy coded bit stream.
6. An encoding system as claimed in any one of the preceding claims, further comprising digital memory storage means for receiving and storing said variable bit rate digitised representation.
7. An encoding system as claimed in any of the preceding claims, wherein said audio signals are speech signals.
8. An ADPCM decoding system to provide a reconstituted audio signal from a variable bit rate digitised representation of an original audio signal, said system comprising a dequantizer having its input connected to receive a decoded form of said variable bit rate digitised representation, the output of said dequantizer being connected to a 'specialised' Kalman filter, the output of which comprises said reconstituted audio signal and is utilized by an autoregressive calculator means to modify the operation of said 'specialised' Kalman filter, and an arithmetic decoder, controlled by a variance estimator means operating on the output of said dequantizer, interposed between said variable bit rate digitised representation and said dequantizer to generate said decoded form of said variable bit rate digitised representation.
9. A decoding system as claimed in claim 8, further comprising transducer means for receiving said reconstituted audio signal to generate sound pressure waves.
10. A decoding system as claimed in claim 8 or claim 9, wherein said specialised Kalman filter utilises sub-optimal Kalman prediction by smoothing sample estimates only to a reduced number of lags relative to the full order.
11. A decoding system as claimed in any one of claims 8 to 10, wherein said audio signals are speech signals.
12. An ADPCM audio encoding/decoding system comprising: an encoding system as claimed in any one of claims 1 to 5; a digital memory storage means for receiving and storing said variable bit rate digitised representation; and a decoding system as claimed in any one of claims 8 to 10.
13. An ADPCM audio encoding/decoding system comprising: one or more encoding systems as claimed in any one of claims 1 to 5; a distributed digital communications network to which said one or more encoding systems are connected and over which ones of said variable bit rate digitised representations are transmitted; one or more decoding systems as claimed in any one of claims 8 to 10 connected to said network, each said decoding system selectively acting as a destination for said transmitted representations.
14. A method for ADPCM audio encoding to provide a variable bit rate digitised representation of audio signals, said method comprising the steps of: filtering a digital audio signal by 'specialised' Kalman filter means; quantizing the filtered signal; and arithmetic coding the quantized signal to provide said variable bit rate digitised representation of said audio signal.
15. A method as claimed in claim 14, wherein said filtering step includes dequantizing said quantized signal, filtering said dequantized signal by sub-optimal Kalman prediction by smoothing sample estimates only to a reduced number of lags relative to the full order to produce a predicted signal, and subtracting the predicted signal from said digital audio signal.
16. A method as claimed in claim 15, whereby said step of arithmetic coding includes generating a variable rate entropy coded bit stream.
17. A method for ADPCM audio decoding to provide a reconstituted audio signal from a variable bit rate digitised representation of an original audio signal, the method comprising the steps of: arithmetic decoding of said digitised representation; dequantizing of said decoded representation; and filtering said dequantized signal by 'specialised' Kalman filter means to produce said reconstituted audio signal.
18. A method as claimed in claim 17, whereby said step of filtering includes filtering said dequantized signal by sub-optimal Kalman prediction by smoothing sample estimates only to a reduced number of lags relative to the full order.
PCT/AU1995/000216 1994-04-13 1995-04-13 Adpcm signal encoding/decoding system and method WO1995028770A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU22098/95A AU2209895A (en) 1994-04-13 1995-04-13 Adpcm signal encoding/decoding system and method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AUPM5037 1994-04-13
AUPM5037A AUPM503794A0 (en) 1994-04-13 1994-04-13 Adpcm signal encoding/decoding system and method

Publications (1)

Publication Number Publication Date
WO1995028770A1 true WO1995028770A1 (en) 1995-10-26

Family

ID=3779617

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU1995/000216 WO1995028770A1 (en) 1994-04-13 1995-04-13 Adpcm signal encoding/decoding system and method

Country Status (2)

Country Link
AU (1) AUPM503794A0 (en)
WO (1) WO1995028770A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2217583A (en) * 1982-12-10 1984-06-14 Nec Corporation Ad pcm decoder
EP0206273A2 (en) * 1985-06-20 1986-12-30 Fujitsu Limited Adaptive differential pulse code modulation system
EP0206352A2 (en) * 1985-06-28 1986-12-30 Fujitsu Limited Coding transmission equipment for carrying out coding with adaptive quantization
US4654863A (en) * 1985-05-23 1987-03-31 At&T Bell Laboratories Wideband adaptive prediction
EP0288281A2 (en) * 1987-04-21 1988-10-26 Oki Electric Industry Company, Limited ADPCM encoding and decoding systems

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ES2124664A1 (en) * 1996-11-12 1999-02-01 Alsthom Cge Alcatel Method and device for adaptive differential pulse code modulation.
WO2000041313A1 (en) * 1999-01-07 2000-07-13 Koninklijke Philips Electronics N.V. Efficient coding of side information in a lossless encoder
WO2008058692A1 (en) * 2006-11-13 2008-05-22 Global Ip Solutions (Gips) Ab Lossless encoding and decoding of digital data
FR3018942A1 (en) * 2014-03-24 2015-09-25 Orange ESTIMATING CODING NOISE INTRODUCED BY COMPRESSION CODING OF ADPCM TYPE
WO2015145050A1 (en) * 2014-03-24 2015-10-01 Orange Estimation of encoding noise created by compressed micda encoding

Also Published As

Publication number Publication date
AUPM503794A0 (en) 1994-06-09

Similar Documents

Publication Publication Date Title
US6593872B2 (en) Signal processing apparatus and method, signal coding apparatus and method, and signal decoding apparatus and method
JP2005534947A (en) Scale-factor feedforward prediction based on acceptable distortion of noise formed when compressing on a psychoacoustic basis
KR101143792B1 (en) Signal encoding device and method, and signal decoding device and method
EP1096476B1 (en) Speech signal decoding
JP2007504503A (en) Low bit rate audio encoding
JPH1097295A (en) Coding method and decoding method of acoustic signal
US7072830B2 (en) Audio coder
JP3266178B2 (en) Audio coding device
EP2560163A1 (en) Apparatus and method of enhancing quality of speech codec
JPH0590974A (en) Method and apparatus for processing front echo
JP4843142B2 (en) Use of gain-adaptive quantization and non-uniform code length for speech coding
JP3357829B2 (en) Audio encoding / decoding method
JP4359949B2 (en) Signal encoding apparatus and method, and signal decoding apparatus and method
JP2001053869A (en) Voice storing device and voice encoding device
US6678653B1 (en) Apparatus and method for coding audio data at high speed using precision information
JP4281131B2 (en) Signal encoding apparatus and method, and signal decoding apparatus and method
JP2003110429A (en) Coding method and device, decoding method and device, transmission method and device, and storage medium
JP5057334B2 (en) Linear prediction coefficient calculation device, linear prediction coefficient calculation method, linear prediction coefficient calculation program, and storage medium
JP5451603B2 (en) Digital audio signal encoding
JP3265726B2 (en) Variable rate speech coding device
JP3417362B2 (en) Audio signal decoding method and audio signal encoding / decoding method
JP3004664B2 (en) Variable rate coding method
JP3496618B2 (en) Apparatus and method for speech encoding / decoding including speechless encoding operating at multiple rates
JP4409733B2 (en) Encoding apparatus, encoding method, and recording medium therefor

Legal Events

Code Title Description
AK Designated states: Kind code of ref document: A1; Designated state(s): AU CA JP US
AL Designated countries for regional patents: Kind code of ref document: A1; Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 2004-01-01)
121 EP: the EPO has been informed by WIPO that EP was designated in this application
122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: CA