WO1995028770A1

WO1995028770A1 - Adpcm signal encoding/decoding system and method

Info

Publication number: WO1995028770A1
Application number: PCT/AU1995/000216
Authority: WO
Inventors: Craig Robert Watkins; Robert Ronald Bitmead; Salvatore Crisafulli
Original assignee: Commonwealth Scientific And Industrial Research Organisation; The Australian National University
Priority date: 1994-04-13
Filing date: 1995-04-13
Publication date: 1995-10-26
Also published as: AUPM503794A0

Abstract

An ADPCM signal encoding/decoding system is disclosed. A digital input signal (I/P) is input to a summer (3), the output signal of which is quantized (5) and output to an arithmetic coder (8). The quantized signal is dequantized (6) and passed to a specialized Kalman filter (7). An output of the Kalman filter (7) is applied to the summer (3). The Kalman filter (7) is adapted by an autoregressive model coefficient calculator (4). The Kalman filter (7) is specialised in the sense that the estimates in the predictor are smoothed only up to a relatively small lag of n samples, rather than up to the full filter order. The arithmetic coder (8) is adapted by a variance estimator (9) so that arithmetic coding is performed using only the number of bits required by information theoretic considerations, based on the probability of each quantization level being used. Decoding of audio signals so encoded operates essentially in a converse manner.

Description

ADPCM Signal Encoding/Decoding System and Method

Technical Field of the Invention

The present invention relates to ADPCM (Adaptive Differential Pulse Code Modulation) signal encoding systems and, in particular, to such systems used to encode speech signals to result in a variable bit rate digitised representation of the speech signals. The invention also relates to the reverse decoding of such variable bit rate digitised representations in order to provide a reconstituted speech signal. The invention also is applicable to other audio signals such as music.

Background Art

The present invention finds applications in the encoding of speech signals in order to enable such speech to be stored. For example, many banks and like financial institutions routinely record telephone conversations of their market dealers in order to provide a permanent record of financial contractual obligations entered into verbally. Substantial savings in the volume of storage can be made if such speech is stored digitally and reconstituted as necessary by the reverse decoding. The coding techniques of the present invention also find application in ATM (Asynchronous Transfer Mode) communications, where a variable bit rate in the coded output is not of great consequence since it is the average bit rate which is important in such applications. ADPCM encoding and decoding systems using a predictor which essentially comprises a Kalman filter are known. However, such systems have not found practical application because of the very substantial computational demands made by the Kalman filter predictor, even though the Kalman filter is known to be "optimal" as far as linear prediction is concerned.

Four aspects must be traded off in implementing a low bit rate speech coding system: reconstructed speech quality, average bit rate, encoding/decoding delay and computational complexity. It would be desirable to realise a system that allows these trade-offs to be made flexibly, even in the course of operation, so that specific application requirements can be met.

Disclosure of the Invention

It is the object of the present invention to provide such an encoding/decoding system, and the corresponding encoding and decoding methods, which enable some of the virtues of Kalman filtering to be obtained at very low bit rates and without the vice of substantial computational loads. This is essentially achieved by use of a sub-optimal Kalman predictor utilising smoothed sample estimates, together with arithmetic coding to take advantage of the silent periods within speech utterances and to preserve ADPCM stability at low bit rates.

In accordance with the first aspect of the present invention there is disclosed an ADPCM audio encoding system to provide a variable bit rate digitised representation of audio signals, said system comprising a digital input for said audio signals connected to a positive input of a summer, the output of said summer being connected to a cascade connected quantizer and dequantizer, the output of said dequantizer forming an input to a 'specialised' Kalman filter having two outputs, the first of said outputs comprising a predicted output which is connected to a negative input of said summer, the other of said filter outputs comprising a smoothed output which is utilized by an autoregressive calculator means to modify the operation of said 'specialised' Kalman filter; and an arithmetic coder having its input connected to said quantizer output, and controlled by a variance estimator means operating on the output of said dequantizer, the output of said arithmetic coder comprising said variable bit rate digitised representation.

Preferably, the encoding system further comprises an analog-to-digital converter for receiving audio signals in analog form and converting said signals to digital form. There further can be provided audio transducer means to generate said analog audio signals from sound pressure waves.

In accordance with a second aspect of the present invention there is disclosed an ADPCM decoding system to provide a reconstituted audio signal from a variable bit rate digitised representation of an original audio signal, said system comprising a dequantizer having its input connected to receive a decoded form of said variable bit rate digitised representation, the output of said dequantizer being connected to a 'specialised' Kalman filter, the output of which comprises said reconstituted audio signal and is utilized by an autoregressive calculator means to modify the operation of said 'specialised' Kalman filter; and an arithmetic decoder, controlled by a variance estimator means operating on the output of said dequantizer, interposed between said variable bit rate digitised representation and said dequantizer to generate said decoded form of said variable bit rate digitised representation.

Preferably the audio signals are speech signals. The invention further discloses an ADPCM audio encoding/decoding system comprising an encoding system described above, a digital memory storage means for receiving and storing said variable bit rate digitised representation, and a decoding system as described above.

There further can be a plurality of the encoding systems and a plurality of the decoding systems connected to a distributed digital communications network.

The invention yet further discloses a method for ADPCM audio encoding to provide a variable bit rate digitised representation of audio signals, said method comprising the steps of: filtering a digital audio signal by 'specialised' Kalman filter means; quantizing the filtered signal; and arithmetic coding the quantized signal to provide said variable bit rate digitised representation of said audio signal.

The invention yet further discloses a method for ADPCM audio decoding to provide a reconstituted audio signal from a variable bit rate digitised representation of an original audio signal, the method comprising the steps of: arithmetic decoding of said digitised representation; dequantizing of said decoded representation; and filtering said dequantized signal by 'specialised' Kalman filter means to produce said reconstituted audio signal.

Brief Description of the Drawings

A preferred embodiment of the present invention will now be described with reference to the drawings in which:

Fig. 1 is a schematic block diagram of an ADPCM speech encoding system of the preferred embodiment;

Fig. 2 is a schematic block diagram of the ADPCM speech decoding system of the preferred embodiment; and

Fig. 3 shows a distributed communications arrangement.

Detailed Description and Best Mode of Performance

As seen in Fig. 1, the encoding system 10 of the preferred embodiment has a microphone 1 at which speech signals are generated and passed to an A to D converter 2. A summer 3 is provided together with a quantizer 5 and dequantizer 6. The output of the quantizer 5 is input to an arithmetic coder 8 which is controlled, as indicated by dotted lines, by a variance estimator (VarEst) 9 which obtains its input from the dequantizer 6. The output O/P of the arithmetic coder 8 is the encoded bit stream. The output O/P can be connected to a digital data memory store, such as a CD-ROM, DAT or a disc drive, or alternatively to a distributed communications network for broadcast. The output of the dequantizer 6 is input to a specialised Kalman filter 7 which has two outputs. One of these outputs is a predicted output P/O which is applied to the negative input of the summer 3. The other output is a smoothed output S/O. The smoothed output S/O is used by an autoregressive model coefficient calculator (ARcalc) 4 which in turn is used to adapt the specialised Kalman filter 7.

For the decoding system 20 as seen in Fig. 2, the specialised Kalman filter 7 is again used. However, since the input I/P for the decoder constitutes a variable bit rate stream, it is applied to an arithmetic decoder 18 which is again controlled by a variance estimator (VarEst) 9. The input I/P can be from a memory store or from a broadcast network as discussed above. A dequantizer 16 is provided. The specialised Kalman filter 7 is again adapted by the ARcalc 4, and its output O/P is the reconstructed output, which is connected to a loudspeaker 15.

Speech signals have significant amounts of redundancy, or correlation between samples. The encoding/decoding problem is to represent efficiently the information contained in the speech signal in a digital form for storage or transmission over a channel. In order to do this, it is desirable to remove the redundancy from the samples to be stored or transmitted. Put another way, it is desirable to remove that which is predictable from the samples to be stored or transmitted.

For the purposes of analysis, speech is typically modelled as a filtered white noise signal which is mathematically represented in a way which incorporates various filter coefficients. Forwards adaptation schemes transmit these filter coefficients to the decoder in a quantized form. Then both the encoder and decoder are able to use the quantized coefficients in the prediction process.

However, in backwards adaptation as in the present case, these filter coefficients are not transmitted. Instead, backwards adaptation relies on the availability of the reconstructed signal at the decoder, from which a set of filter coefficients for the prediction process can be computed. If the reconstructed signal is close to the input signal, then backwards adaptation can reasonably be expected to perform well. It is known that a Kalman filter can provide a good prediction from coarsely quantized measurements. These predictions are based on Kalman filter state estimates which contain smoothed signal values up to the order of the Kalman filter. However, a filter order of 10 to 50 is typically required, and the computational cost grows approximately as the cube of the filter order. As a consequence, a Kalman filter predictor of adequate order imposes a substantial computational burden. For example, for Kalman filtering with a 50th order predictor, the computational cost of the update equations run every sample is in the vicinity of 120 MFLOPS. This burden can be reduced by a Kalman filter technique of reduced complexity. Although sub-optimal, such an approach uses smoothed estimates in the predictor only up to a relatively small lag of n samples. It exploits the fact that most of the smoothing gain is obtained in the first few smoothing lags.

Smoothing to n lags is mathematically equivalent to assuming that only the top left hand n x n block of the error covariance matrix is non-zero. The error covariance matrix describes the uncertainty in the filter's state estimate, i.e. the extent to which the estimated signal samples differ from the actual signal samples. Thus a specialised form of error covariance matrix can be assumed, and a good approximation obtained by simply updating the n x n error covariance sub-matrix. With a maximum smoothing lag as low as 4 in the specialised Kalman filter, a reconstituted speech signal which differs from that of the full 50th order Kalman filter by less than 0.2 dB in signal-to-noise ratio is obtainable, at a computational load of only 0.8 MFLOPS compared to 120 MFLOPS for the 50th order Kalman filter.

Furthermore, the modified or specialised Kalman filter also results in considerable subjective improvement, as it practically eliminates the high frequency "hiss" introduced by large quantization errors. This is especially important for low bit rate signal encoding/decoding systems. In order to overcome the problems of large quantization errors, an arithmetic encoder and corresponding decoder are used in conjunction with the quantizer and dequantizer. This arithmetic encoding/decoding allows a very large increase in the effective number of quantization levels. These levels are entropy coded via the arithmetic coder 8 (and its corresponding arithmetic decoder 18) based on the expected probability of occurrence of each level. The probabilities are calculated using a distribution assumption, for example a Laplacian distribution, and a variance estimate of the prediction difference signal which is input to the quantizer 5.

A substantial advantage of an effectively larger number of quantization levels is that overload distortion is practically eliminated, so that more information is present in the quantized data to assist predictor adaptation and variance estimation.

Furthermore, because the arithmetic coder 8 and decoder 18 perform entropy coding based on the expected probabilities of occurrence, the expected silence periods within speech utterances can be exploited to advantage.
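To illustrate how a Laplacian assumption and a variance estimate translate into level probabilities, and why silent periods are cheap to code, here is a minimal Python sketch. It assumes a midtread uniform quantiser with a hypothetical step size `step` and `max_level` levels on each side of zero, with the outermost levels absorbing the distribution tails; `laplace_cdf`, `level_probabilities` and `ideal_bits` are illustrative names, not part of the original system. An arithmetic coder can approach the ideal cost of $-\log_2 p$ bits for a level of probability $p$.

```python
import math


def laplace_cdf(x: float, sigma: float) -> float:
    """CDF of a zero-mean Laplacian with standard deviation sigma."""
    b = sigma / math.sqrt(2.0)  # scale parameter: variance = 2 * b**2
    if x < 0:
        return 0.5 * math.exp(x / b)
    return 1.0 - 0.5 * math.exp(-x / b)


def level_probabilities(sigma: float, step: float, max_level: int) -> dict[int, float]:
    """Probability of each quantiser index -max_level..max_level.

    Assumes a midtread uniform quantiser: index j covers ((j - 0.5) * step, (j + 0.5) * step],
    and the two outermost indices absorb the tails so the probabilities sum to one.
    """
    probs = {}
    for j in range(-max_level, max_level + 1):
        lo = -math.inf if j == -max_level else (j - 0.5) * step
        hi = math.inf if j == max_level else (j + 0.5) * step
        probs[j] = laplace_cdf(hi, sigma) - laplace_cdf(lo, sigma)
    return probs


def ideal_bits(p: float) -> float:
    """Information content of an event of probability p, in bits."""
    return -math.log2(p)


# A loud segment versus a near-silent one, with the same (hypothetical) step size.
loud = level_probabilities(sigma=1.0, step=0.5, max_level=20)
quiet = level_probabilities(sigma=0.05, step=0.5, max_level=20)
print(f"index 0 costs {ideal_bits(loud[0]):.2f} bits when loud, {ideal_bits(quiet[0]):.4f} bits when quiet")
```

With the same step size, the most likely index costs well under one bit when the variance estimate is small, which is how silent stretches can be coded very cheaply.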

The above described preferred embodiment gives good quality reconstituted speech at an average bit rate of 16 kbps. Similarly, good quality speech is achievable at low bit rates in the vicinity of 8 kbps.

The foregoing description relates to speech signals, however the invention equally is applicable to other audio signals including music.

To aid the foregoing description, a further detailed mathematically-based description follows, referring again to Figs. 1 and 2.

In another embodiment, a plurality of the encoding systems 10 and a plurality of decoding systems 20 can be arranged in a distributed communications system 30 as shown in Fig. 3. The system 30 has a plurality of sites 25, which can have transmit-only capability, receive-only capability or a combination of both. Whilst a ring network configuration is shown, other network configurations are contemplated, including a hub configuration.

Embodiments of the invention can be implemented as software coding alone.

Encoder

The encoder takes the speech signal as an input and produces a variable rate compressed digital signal ($I_k$) for storage or transmission (see Figure 1). The speech input ($S_k$) to the encoder will be in the form of a (high) fixed bit rate digital signal obtained from a microphone in cascade with an analog-to-digital (A/D) converter, or from some other means such as another codec (eg. PCM). The prediction output of the specialised Kalman filter ($\hat{S}_{k|k-1}$) is subtracted from the speech signal ($S_k$) to produce a difference signal ($\tilde{S}_k$),

$$\tilde{S}_k = S_k - \hat{S}_{k|k-1} \qquad (1)$$

This signal is then applied to the quantiser, which is represented by the Q block. The quantiser may take many forms including (but not restricted to) uniform, non-uniform, adaptive, etc. The output of the quantiser ($Z_k$) is a trivially coded version of the difference signal ($\tilde{S}_k$) and is mathematically specified as

$$Z_k = Q[\tilde{S}_k] \qquad (2)$$

The quantiser output is then applied to two other blocks, the arithmetic coder (AC) and the dequantiser (Q^-1). The dequantiser converts the coded signal Z_k back to a real value (Y_k) which represents the quantised difference signal,

$$Y_k = Q^{-1}[Z_k] = Q^{-1}[Q[\tilde{S}_k]] \qquad (3)$$

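The quantised difference signal $Y_k$ then drives the specialised Kalman filter (KF), which produces both the predicted output $\hat{S}_{k|k-1}$ used in (1) and a smoothed output, which forms the reconstructed speech signal.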

The specialised Kalman filter parameters are backwards adapted, in the sense that the adaptation is performed on the basis of the reconstructed speech signal ($\bar{S}_k$). The block ARcalc is driven by the reconstructed speech signal and computes the coefficients of the all-pole or auto-regressive (AR) signal model, which in turn are used to adapt the specialised Kalman filter.

The arithmetic coder takes the trivially coded difference signal (Z_k) and converts it to a variable rate entropy coded bit stream (I_k). The arithmetic coding is adapted to cope with the nonstationary statistics and for this particular application, the adaptation was performed based on the short term variance (although it could be based on some other quantity). The block VarEst computes the short term variance of the quantised difference signal (Y_k) and this quantity is used to adapt the arithmetic coder.
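To make equations (1) to (3) concrete, the Python sketch below runs the encoder loop with a uniform midtread quantiser (hypothetical step size `step`) and, purely as a stand-in for the specialised Kalman filter, a fixed first-order predictor with a hypothetical coefficient `a1`; in the actual codec the Kalman filter would supply $\hat{S}_{k|k-1}$. The sketch shows the essential backward-adaptive ADPCM property: the predictor works only from quantities the decoder also has, so a matching decoder reproduces the encoder's reconstruction exactly, and the reconstruction error never exceeds half a quantiser step.

```python
def quantise(x: float, step: float) -> int:
    """Q: index of the nearest level of a midtread uniform quantiser."""
    return round(x / step)


def dequantise(z: int, step: float) -> float:
    """Q^-1: map a quantiser index back to a real value."""
    return z * step


def encode(samples, step, a1=0.9):
    """Closed-loop DPCM; a fixed first-order predictor stands in for the Kalman filter."""
    indices, recon = [], []
    prev = 0.0  # last reconstructed sample
    for s in samples:
        pred = a1 * prev              # predicted output
        diff = s - pred               # eq. (1): difference signal
        z = quantise(diff, step)      # eq. (2): trivially coded index Z_k
        y = dequantise(z, step)       # eq. (3): quantised difference Y_k
        prev = pred + y               # reconstruction, also computable at the decoder
        indices.append(z)
        recon.append(prev)
    return indices, recon


def decode(indices, step, a1=0.9):
    """Mirror of encode(): rebuilds the reconstruction from the indices alone."""
    recon, prev = [], 0.0
    for z in indices:
        prev = a1 * prev + dequantise(z, step)
        recon.append(prev)
    return recon


signal = [0.0, 0.8, 1.5, 1.9, 1.7, 1.0, 0.2, -0.6]
idx, enc_recon = encode(signal, step=0.25)
assert decode(idx, step=0.25) == enc_recon  # encoder and decoder stay in step
assert max(abs(s - r) for s, r in zip(signal, enc_recon)) <= 0.125 + 1e-12
```

In the full system, `idx` is what the arithmetic coder would receive as $Z_k$.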

Specialised Kalman filter

The specialised Kalman filter is a simplified version of a full Kalman filter based on an all-pole signal model. It is also an extension of the standard linear predictor that is commonly used in speech coding.

Mathematical Details

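Both the full Kalman filter and the specialised Kalman filter are based on the all-pole signal model (also referred to as an auto-regressive (AR) signal model). The all-pole speech signal model is specified by

$$S_k = \sum_{i=1}^{N} a_i S_{k-i} + w_k, \qquad (5)$$

where $S_k$ is the speech signal, $w_k$ represents the excitation sequence (white noise), and $a_i$ are the all-pole model coefficients. Equation (5) can also be written as

$$S_k = a_1 S_{k-1} + a_2 S_{k-2} + a_3 S_{k-3} + \cdots + a_N S_{k-N} + w_k, \qquad (6)$$

or in state space form as

$$x_k = F x_{k-1} + G w_k, \qquad (7)$$

$$S_k = H x_k, \qquad (8)$$

where

$$F = \begin{bmatrix} a_1 & a_2 & \cdots & a_{N-1} & a_N \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}, \qquad G = [1, 0, \ldots, 0]^T, \qquad H = [1, 0, \ldots, 0].$$

In this state-space formulation, the state vector $x_k$ consists of $N$ samples, from the $k$th sample back to the $(k-N+1)$th sample, i.e. $x_k = [S_k, S_{k-1}, \ldots, S_{k-N+1}]^T$.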
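The state-space form can be checked numerically. The plain-Python sketch below (function names are hypothetical) builds the companion matrix $F$ from $a_1, \ldots, a_N$ and advances the state by one sample: the first entry of the new state reproduces the AR recursion (6), and the remaining entries are the previous samples shifted down by one.

```python
def companion_matrix(a):
    """F for x_k = F x_{k-1} + G w_k, with x_k = [S_k, S_{k-1}, ..., S_{k-N+1}]."""
    n = len(a)
    F = [[0.0] * n for _ in range(n)]
    F[0] = [float(c) for c in a]      # first row: AR coefficients a_1..a_N
    for i in range(1, n):
        F[i][i - 1] = 1.0             # sub-diagonal ones shift the older samples down
    return F


def state_step(F, x, w):
    """One step of x_k = F x_{k-1} + G w_k, where G = [1, 0, ..., 0]^T."""
    n = len(x)
    x_new = [sum(F[i][j] * x[j] for j in range(n)) for i in range(n)]
    x_new[0] += w                     # excitation enters the newest sample only
    return x_new


a = [0.5, -0.25]
x_prev = [1.0, 2.0]                   # [S_{k-1}, S_{k-2}]
x_now = state_step(companion_matrix(a), x_prev, w=0.1)
# First entry matches eq. (6): 0.5*1.0 - 0.25*2.0 + 0.1
assert abs(x_now[0] - (0.5 * 1.0 - 0.25 * 2.0 + 0.1)) < 1e-12
assert x_now[1] == x_prev[0]          # H x_k picks out S_k; the rest is a shift
```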

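The full Kalman Filter equations for the above all-pole speech signal model are:

$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k Y_k, \qquad (9)$$

$$\hat{x}_{k+1|k} = F \hat{x}_{k|k}, \qquad (10)$$

Here the measurement is the reconstructed sample $\hat{S}_{k|k-1} + Y_k$, so the innovation is simply the quantised difference signal $Y_k$.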

where K_k is the Kalman gain vector, given by

$$K_k = P_k H^T (H P_k H^T + R_k)^{-1} \qquad (11)$$

and

$$P_{k+1} = F P_k F^T - F P_k H^T (H P_k H^T + R_k)^{-1} H P_k F^T + Q_k \qquad (12)$$

is a Riccati difference equation (RDE) which recursively calculates the error covariance matrix. The quantity

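$$R_k = E[(Y_k - \tilde{S}_k)^2] = \sigma_{q,k}^2 \qquad (13)$$

is the measurement noise variance related to the quantised prediction error signal, $Y_k = Q^{-1}[Q[\tilde{S}_k]]$, and

$$Q_k = \operatorname{diag}(\sigma_{w,k}^2, 0, \ldots, 0) \qquad (14)$$

is the excitation noise variance matrix, $\sigma_{w,k}^2$ being the variance of the excitation $w_k$. The Kalman filter predicted output is given by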

$$\hat{S}_{k|k-1} = H \hat{x}_{k|k-1} \qquad (15)$$

The state vector estimate for the Kalman Filter is of the form

$$\hat{x}_{k|k} = [\hat{S}_{k|k}, \hat{S}_{k-1|k}, \ldots, \hat{S}_{k-N+1|k}]^T \qquad (16)$$

Thus the previous speech sample estimates are present in the state vector at various fixed smoothing lags. It is then a simple matter to read the smoothed samples out of this state vector. Mathematically, the smoothed output is specified by

$$\hat{S}_{k-j|k} = H^j \hat{x}_{k|k}, \quad j = 0, 1, \ldots, N-1, \qquad (17)$$

where $H^j$ is defined as the $(j+1)$th row of an $N \times N$ identity matrix. Another way of writing (9), (10) and (16) is

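$$\hat{S}_{k|k} = \hat{S}_{k|k-1} + K_k^{(1)} Y_k$$

$$\hat{S}_{k-1|k} = \hat{S}_{k-1|k-1} + K_k^{(2)} Y_k$$

$$\vdots$$

$$\hat{S}_{k-N+1|k} = \hat{S}_{k-N+1|k-1} + K_k^{(N)} Y_k,$$

where $K_k^{(i)}$ is the $i$th entry of the Kalman gain vector $K_k$.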
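As an illustration only, the Python sketch below performs one cycle of equations (9) to (12) for the scalar measurement, exploiting the companion structure of $F$ so that $F A F^T$ is formed by a row and column shift plus two inner products rather than general matrix products. The innovation is taken to be $Y_k$ as in (9); `q` stands for the single non-zero (top-left) entry of $Q_k$ and `R` for $R_k$. Function names are hypothetical. Even with this shortcut, the full update still touches every one of the $N^2$ entries of $P_k$ each sample.

```python
def companion_sandwich(a, A):
    """Return F A F^T for the companion matrix F built from a_1..a_N (A is N x N)."""
    n = len(a)
    # B = F A: first row is a^T A, other rows are rows 0..N-2 of A shifted down
    B = [[sum(a[i] * A[i][j] for i in range(n)) for j in range(n)]] + [row[:] for row in A[:-1]]
    # C = B F^T: first column is B a, other columns are columns 0..N-2 of B shifted right
    return [[sum(B[r][j] * a[j] for j in range(n))] + B[r][:-1] for r in range(n)]


def kalman_step(x_pred, P, a, y, R, q):
    """One cycle of the full Kalman filter for the AR(N) model.

    x_pred: predicted state x_{k|k-1}; P: predicted error covariance P_k;
    a: AR coefficients a_1..a_N; y: quantised difference Y_k (the innovation);
    R: measurement noise variance R_k; q: top-left entry of Q_k.
    Returns (x_filt, x_next, P_next).
    """
    n = len(a)
    denom = P[0][0] + R                                  # H P_k H^T + R_k
    K = [P[i][0] / denom for i in range(n)]              # eq. (11)
    x_filt = [x_pred[i] + K[i] * y for i in range(n)]    # eq. (9)
    x_next = [sum(a[i] * x_filt[i] for i in range(n))] + x_filt[:-1]  # eq. (10)
    # eq. (12) rewritten as F (P - P H^T H P / denom) F^T + Q_k
    P_filt = [[P[i][j] - P[i][0] * P[0][j] / denom for j in range(n)] for i in range(n)]
    P_next = companion_sandwich(a, P_filt)
    P_next[0][0] += q
    return x_filt, x_next, P_next
```

The first entry of `x_next` is the predicted output (15) for the next sample, and `x_filt` holds the smoothed estimates (16).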

The Kalman filter takes account of the measurement (quantisation) noise in the signal coding situation by recognising that the reconstructed sample ($\hat{S}_{k|k-1} + Y_k$) is not a perfect representation of the input speech sample $S_k$. The Kalman Filter exploits the correlation between samples, given by the all-pole signal model, to obtain smoothed estimates of the input samples for use in the prediction.

Smoothing theory indicates that, for this particular problem, most of the smoothing gain is obtained in the first few smoothing lags. Consequently, most of the advantage of full Kalman filtering can be obtained by smoothing only to the first n lags, with n small (say up to 5 lags), rather than continuing to the full order (eg. 49 lags for a 50th order signal model).

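The reduced complexity (and consequently sub-optimal) specialised Kalman Filter approach uses smoothed estimates in the filter up to a relatively small lag of n samples. Our sub-optimal prediction is based on

$$\hat{S}_{k|k-1} = a_1 \hat{S}_{k-1|k-1} + \cdots + a_n \hat{S}_{k-n|k-1} + a_{n+1} \hat{S}_{k-n-1|k-2} + a_{n+2} \hat{S}_{k-n-2|k-3} + \cdots + a_N \hat{S}_{k-N|k-N+n-1}, \qquad (18)$$

where only the n most recent estimates are updated at each sample:

$$\hat{S}_{k-l|k} = \hat{S}_{k-l|k-1} + K_k^{(l+1)} Y_k, \quad l = 0, 1, \ldots, n-1, \qquad (19)$$

with $K_k^{(l+1)}$ the $(l+1)$th entry of the Kalman gain vector $K_k$.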

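The reduced complexity Kalman Filter state estimate has the form

$$\hat{x}_{k|k} = [\hat{S}_{k|k}, \hat{S}_{k-1|k}, \ldots, \hat{S}_{k-n+1|k}, \hat{S}_{k-n|k-1}, \ldots, \hat{S}_{k-N+1|k-N+n}]^T. \qquad (20)$$

Compare this with the full Kalman Filter state estimate above: only the first n entries are smoothed up to the current sample, while the remaining entries retain the estimates they held at lag n-1. We now have a "hybrid" approach, and this is what we refer to as the specialised Kalman filter. The reconstructed speech signal can be obtained from the sub-optimal smoothed estimate via $\bar{S}_{k-n+1} = \hat{S}_{k-n+1|k}$. The sub-optimal smoothed estimate is read out of the state vector and, mathematically, this is specified by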

$$\hat{S}_{k-j|k} = H^j \hat{x}_{k|k}, \quad j = 0, 1, \ldots, n-1, \qquad (21)$$

where $H^j$ is defined as the $(j+1)$th row of an $N \times N$ identity matrix.

Smoothing to n lags only is equivalent to assuming that only the top left hand n x n block of the error covariance is non-zero. Thus, we have an error covariance matrix of the form

$$P_k = \begin{bmatrix} P_k^{11} & 0 \\ 0 & 0 \end{bmatrix}, \qquad (22)$$

where $P_k^{11}$ is an n x n matrix.

With $P_k$ of this form, the Riccati equation (12) gives

$$P_{k+1} = \begin{bmatrix} P_{k+1}^{11} & \ast & 0 \\ \ast & \ast & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad (23)$$

where the entries marked $\ast$ lie in the $(n+1)$th row and column, and $P_{k+1}^{11}$ is found from the nth order Riccati difference equation:

$$P_{k+1}^{11} = F^{11} P_k^{11} (F^{11})^T - F^{11} P_k^{11} (H^1)^T \left(H^1 P_k^{11} (H^1)^T + R_k\right)^{-1} H^1 P_k^{11} (F^{11})^T + Q_k^{11}, \qquad (24)$$

with $F^{11}$ and $Q_k^{11}$ defined as the top left n x n blocks of the $F$ and $Q_k$ matrices, and $H^1$ defined as the vector of size n from the left of the $H$ vector. In fact, for $P_k$ as in (22), the top left n x n block of $P_{k+1}$ is exactly $P_{k+1}^{11}$, and the only non-zero elements in $P_{k+1}$ are in the top left (n + 1) x (n + 1) block. If (22) is a good approximation for $P_k$ obtained during full Kalman Filtering, then we would expect the non-zero elements outside the n x n top left block of $P_{k+1}$ to be close to zero, and hence

$$P_{k+1} \approx \begin{bmatrix} P_{k+1}^{11} & 0 \\ 0 & 0 \end{bmatrix} \qquad (25)$$

is a good approximation for $P_{k+1}$. We are thus able to reduce the computational cost of full Kalman Filtering, through a sub-optimal approach, by simply updating the n x n error covariance matrix $P_k^{11}$.

Table 1: SNR measures and MFLOPS for the Riccati difference equation
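The following Python sketch (hypothetical names, illustration only) mirrors the specialised filter just described: only the n x n block $P_k^{11}$ is stored and propagated with the nth order RDE (24), only the first n state entries receive a gain as in (19), and the shift in the prediction step leaves older entries frozen at lag n-1, giving the hybrid state (20). The prediction uses all N coefficients as in (18), and the (n+1)th row and column of (23) are simply discarded, which is the approximation (25). `companion_sandwich` is repeated so that the sketch stands alone; as before, `q` is assumed to be the top-left entry of $Q_k$.

```python
def companion_sandwich(a, A):
    """Return F A F^T for the companion matrix F built from coefficients a (A is square)."""
    n = len(a)
    B = [[sum(a[i] * A[i][j] for i in range(n)) for j in range(n)]] + [row[:] for row in A[:-1]]
    return [[sum(B[r][j] * a[j] for j in range(n))] + B[r][:-1] for r in range(n)]


def specialised_step(x_pred, P11, a, y, R, q):
    """One cycle of the reduced-complexity filter.

    x_pred: full length-N predicted state; P11: n x n block of P_k (n <= N);
    a: AR coefficients a_1..a_N; y: quantised difference Y_k;
    R: measurement noise variance R_k; q: top-left entry of Q_k.
    Returns (x_filt, x_next, P11_next, prediction, smoothed).
    """
    n, N = len(P11), len(a)
    denom = P11[0][0] + R
    K = [P11[i][0] / denom for i in range(n)]     # gains for the first n entries only
    x_filt = list(x_pred)
    for i in range(n):                            # eq. (19)
        x_filt[i] += K[i] * y
    prediction = sum(a[i] * x_filt[i] for i in range(N))  # eq. (18)
    x_next = [prediction] + x_filt[:-1]           # older entries stay frozen at lag n-1
    # eq. (24) with F^11 = companion matrix of a_1..a_n, then truncation as in (25)
    P_filt = [[P11[i][j] - P11[i][0] * P11[0][j] / denom for j in range(n)] for i in range(n)]
    P11_next = companion_sandwich(a[:n], P_filt)
    P11_next[0][0] += q
    return x_filt, x_next, P11_next, prediction, x_filt[n - 1]
```

The last return value is the most-smoothed estimate $\hat{S}_{k-n+1|k}$; with n = N the step reduces to the full filter.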

Specialised Kalman Filter Results

Simulations using a 50th order (N = 50) linear predictor within an AC-ADPCM speech coding framework have shown that, even with a small n x n upper block in the specialised Kalman filter, the approach can give a reconstructed signal differing from the full order Kalman filter approach by less than 0.2 dB in SNR. The corresponding improvement over the standard linear predictor for this situation is 1.0 dB in SNR. These tabulated results correspond to the output from the variable bit rate ADPCM system at an average of 1.5 bits/sample, or 12 kbps (at an 8 kHz sampling rate).

In Table 1 we present the Signal to Noise Ratio (SNR) and the Segmental SNR (segSNR) for a speech sentence along with the computational cost of the Riccati equation in MFLOPS. The notation LP indicates the standard linear predictor, KF denotes the full 50th order Kalman Filter, and KF_n indicates the specialised Kalman filter with an n x n upper block in the Riccati difference equation.

From the table, it is clear that the use of Kalman Filtering techniques can result in significant performance improvement, for extremely low additional complexity. Take particular note of the 0.5 dB improvement from using a first order Riccati equation. The extra computational complexity needed to obtain this improvement is negligible.

The use of the Kalman Filter in the KF-AC-ADPCM system also results in considerable subjective improvement, as it practically eliminates the high frequency "hiss" introduced by the large quantisation errors. This is especially important for low bit rate signal coding systems. The subjective improvement is in fact far greater than the objective SNR measures would indicate from above.

Adaptation

The specialised Kalman filter utilises backward adaptation in its operation. This involves calculating the filter coefficients from the reconstructed speech signal ($\bar{S}_k$) rather than from the original speech signal ($S_k$), as would be the case in forward adaptation. The block ARcalc is driven by the reconstructed speech signal and computes the coefficients of the all-pole or auto-regressive (AR) signal model, which in turn are used to adapt the specialised Kalman filter. This is a well known procedure, particularly in Low-Delay Code-Excited Linear Prediction (LD-CELP) systems such as the CCITT G.728 16 kbit/s standard.

The speech signal is broken up into short segments (20-200 samples) and the signal is assumed to be stationary over each segment, i.e. it is assumed to be piecewise stationary. For each segment, the "optimal" set of $a_i$ coefficients can be computed, and these are used in the specialised Kalman filter for the duration of the segment.

There are many standard approaches to computing these $a_i$ coefficients. A common approach is as follows:

First, a segment of the most recent data (reconstructed samples) is windowed using some form of recursive window. Next, this windowed data is used to compute the auto-correlation coefficients. The Levinson-Durbin algorithm then efficiently computes the $a_i$ coefficients in a recursive fashion from the auto-correlation coefficients.
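A minimal Python sketch of this procedure follows. The exponential window (newest sample weighted 1, each older sample by a further factor `decay`) is an assumption standing in for whatever recursive window is used in practice, and `decay`, the segment and the order are hypothetical values. The Levinson-Durbin recursion is written directly in terms of predictor coefficients $a_i$, so that $S_k \approx \sum_i a_i S_{k-i}$ as in (5).

```python
import math


def windowed_autocorrelation(samples, order, decay=0.98):
    """Autocorrelation lags 0..order of exponentially windowed recent samples."""
    m = len(samples)
    w = [decay ** (m - 1 - i) * samples[i] for i in range(m)]
    return [sum(w[i] * w[i - lag] for i in range(lag, m)) for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a_1..a_order and final prediction error power from r[0..order]."""
    if r[0] <= 0.0:                        # all-zero segment (e.g. silence): predict nothing
        return [0.0] * order, 0.0
    a = [0.0] * (order + 1)                # a[0] unused
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err                      # reflection coefficient
        prev = a[:]
        a[m] = k
        for i in range(1, m):
            a[i] = prev[i] - k * prev[m - i]
        err *= 1.0 - k * k
        if err <= 0.0:                     # perfectly predictable: stop early
            break
    return a[1:], err


# Hypothetical segment of reconstructed samples.
history = [math.sin(0.3 * t) for t in range(160)]
coeffs, residual = levinson_durbin(windowed_autocorrelation(history, order=2), order=2)
print(coeffs, residual)
```

Because the autocorrelation method is used, the resulting predictor is guaranteed to be stable, which matters when the coefficients are recomputed from coarsely quantised data.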

Arithmetic Coding

Arithmetic coding is a practically optimal entropy coding scheme that is used here to encode the quantiser output with only the number of bits required by information theoretic considerations, based on the probability of that quantisation level being used. The prime difference between arithmetic coding and the more common Huffman coding is that arithmetic coding does not require each source symbol to be encoded with an integral number of bits. This is particularly advantageous for highly peaked probability distributions, as in this case.
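The advantage for peaked distributions can be quantified with a small, illustrative Python sketch (not part of the codec). For a hypothetical three-symbol source with probabilities 0.9, 0.05 and 0.05, a Huffman code must spend at least one bit on every symbol, whereas the entropy, which an arithmetic coder can approach, is well under one bit per symbol.

```python
import heapq
import math


def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given probabilities."""
    heap = [(p, i, [i]) for i, p in enumerate(probs)]  # index breaks ties between equal p
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    next_id = len(probs)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for sym in group1 + group2:
            lengths[sym] += 1          # every merge adds one bit to the merged symbols
        heapq.heappush(heap, (p1 + p2, next_id, group1 + group2))
        next_id += 1
    return lengths


def entropy(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


probs = [0.9, 0.05, 0.05]
huffman_avg = sum(p * n for p, n in zip(probs, huffman_lengths(probs)))
print(f"Huffman: {huffman_avg:.2f} bits/symbol, entropy: {entropy(probs):.2f} bits/symbol")
# prints: Huffman: 1.10 bits/symbol, entropy: 0.57 bits/symbol
```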

The arithmetic coder takes the trivially coded difference signal (Z_k) and converts it to a variable rate entropy coded bit stream (I_k). Mathematically, this can be specified by

$$I_k = AC[Z_k] \qquad (26)$$

The arithmetic coding is adapted to cope with the nonstationary statistics and for this particular application, the adaptation was performed based on the short term variance (although it could be based on some other quantity). The block VarEst computes the short term variance of the quantised difference signal ($Y_k$) and uses this quantity to adapt the arithmetic coder. The short term variance is calculated by the recursion

$$\sigma_{k+1}^2 = a\,\sigma_k^2 + (1 - a)\,Y_k^2, \qquad (27)$$

where $\sigma_k^2$ is a measure of the short term variance and $a$ is the leakage constant, with a value between 0 and 1.
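Recursion (27) is a one-pole leaky average of $Y_k^2$. In the Python sketch below (illustrative; the function name, the leakage value and the zero initial state are assumptions), each sample is coded with the variance estimate formed from the preceding samples only, so that a decoder seeing the same $Y_k$ values can form an identical estimate before it decodes $Y_k$.

```python
def variance_track(y_values, leak=0.9, initial=0.0):
    """Short-term variance per eq. (27): var_{k+1} = leak * var_k + (1 - leak) * y_k**2.

    Returns the list of estimates var_k used for coding each y_k (each formed from
    earlier samples only) and the final estimate after the last sample.
    """
    if not 0.0 < leak < 1.0:
        raise ValueError("leakage constant must lie between 0 and 1")
    var = initial
    used = []
    for y in y_values:
        used.append(var)                          # available to both coder and decoder
        var = leak * var + (1.0 - leak) * y * y   # eq. (27)
    return used, var


used, final = variance_track([0.5, -0.25, 1.0, 0.0], leak=0.9)
```

A leakage constant close to 1 gives a smooth but slowly reacting estimate; a smaller value tracks onsets of speech more quickly at the cost of a noisier estimate.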

The quantised difference signal is allocated to various bins based on the magnitude of the short term variance $\sigma_k^2$. A probability distribution is then either assumed (eg. Laplacian) or calculated (eg. using test sentences to compute a look-up table) for each bin, and this is then used to arithmetically code the quantised difference signal ($Y_k$).

Perceptual Weighting

Perceptual weighting is a commonly used technique in many speech coding applications. It is used to improve the subjective quality of the decoded (reconstructed) speech by utilising known properties of the human auditory response. It is known that human hearing is less sensitive to coding distortion in frequency bands where the energy is greatest, due to the masking effect of the human ear. Perceptual weighting is found to give significant subjective performance improvements in the KF-AC-ADPCM system.

Perceptual weighting is the technique of adding appropriate filtering in order to redistribute the coding distortion energy in approximately the same distribution as the speech signal.

Transmission Channel or Storage

The output of the encoder ($I_k$) is a variable rate bit stream produced by the arithmetic coding block. These data are then either transmitted over a digital telecommunications channel (eg. an Asynchronous Transfer Mode (ATM) network) or stored on some form of digital storage device (eg. a hard disk). Bit errors may always occur during transmission or storage, and so the input bit stream to the decoder will be referred to as $I'_k$. In an ideal situation, $I'_k = I_k$. A practical codec must be able to cope with the non-ideal situation that occurs when bit errors are introduced ($I'_k \neq I_k$), and special techniques (such as periodic resynchronisation) are added in order to achieve this. For fixed rate channel applications, the encoder output bit stream ($I_k$) is buffered in order to achieve the fixed rate average value. This necessarily introduces an additional encoding delay.

Decoder

Figure 2 is a block diagram of the decoder. The decoder converts the received compressed variable rate bit stream ($I'_k$) to the decoder reconstructed speech signal ($\bar{S}'_k$). This reconstructed speech signal is in a similar form (fixed rate digital signal) to the original speech signal $S_k$. The reconstructed signal ($\bar{S}'_k$) is then passed either to a cascade of a digital-to-analog (D/A) converter and speaker or to some other device.

The decoder first arithmetic decodes the variable rate bit stream ($I'_k$) to produce the trivially coded signal at the decoder ($Z'_k$); this is represented by the arithmetic decoder block ($AC^{-1}$). Mathematically we have

$$Z'_k = AC^{-1}[I'_k] \qquad (28)$$

The output of the arithmetic decoder ($Z'_k$) is then converted to a decoder quantised difference signal ($Y'_k$) by the dequantiser block $Q^{-1}$, and this can be written as

$$Y'_k = Q^{-1}[Z'_k] \qquad (29)$$

Note that the dequantiser block is identical to the one at the encoder. The arithmetic decoder is adapted in an equivalent way to the arithmetic coder, i.e. a variance estimate of the decoder quantised difference signal ($Y'_k$) is used to adapt the arithmetic decoder. The VarEst block is identical to that of the encoder. The decoder quantised difference signal ($Y'_k$) then drives the specialised Kalman filter to produce the decoder reconstructed signal ($\bar{S}'_k$). This reconstructed signal is taken to be the decoder smoothed output, i.e. the sub-optimal smoothed estimate read out of the decoder's specialised Kalman filter state vector. The specialised Kalman filter is identical in form to that of the encoder, and it is backward adapted in an equivalent manner.

Codec          SNR (dB)   segSNR (dB)
LD-CELP        18.81      16.03
KF-AC-ADPCM    28.03      17.82

Table 2: SNR and segSNR comparison between G.728 LD-CELP and KF-AC-ADPCM operating at 16 kb/s

KF-AC-ADPCM Results

It is difficult to present concrete performance results for speech coding, since the ultimate performance criterion is subjective quality, and this can only be properly evaluated by expensive formal listening tests. The common approach is to use objective measures such as SNR and segSNR in conjunction with informal listening tests. In addition, it is also usual to compare with an existing well known standard, eg. the CCITT G.728 LD-CELP 16 kb/s codec.

Table 2 shows values of SNR and segSNR for 16 kb/s LD-CELP and KF-AC-ADPCM operating at an average bit rate of 16 kb/s, when tested on the sentence "Cats and dogs each hate the other".

Notice that the KF-AC-ADPCM at an average of 16 kb/s has higher SNR and segSNR figures than LD-CELP. The informal listening tests confirm that its subjective performance is significantly superior to that of LD-CELP. The informal listening tests also indicate that the KF-AC-ADPCM subjective quality at an average bit rate of 12 kb/s is equal to that of 16 kb/s LD-CELP. At 8 kb/s, the KF-AC-ADPCM subjective quality is slightly inferior to LD-CELP, but is nevertheless still very good.

Claims


1. An ADPCM audio encoding system to provide a variable bit rate digitised representation of audio signals, said system comprising a digital input for said audio signals connected to a positive input of a summer, the output of said summer being connected to a cascade connected quantizer and dequantizer, the output of said dequantizer forming an input to a 'specialised' Kalman filter having two outputs, the first of said outputs comprising a predicted output which is connected to a negative input of said summer, the other of said filter outputs comprising a smoothed output which is utilized by an autoregressive calculator means to modify the operation of said specialised Kalman filter; and an arithmetic coder having its input connected to said quantizer output, and controlled by a variance estimator means operating on the output of said dequantizer, the output of said arithmetic coder comprising said variable bit rate digitised representation.

2. An encoding system as claimed in claim 1, further comprising an analog-to-digital converter for receiving audio signals in analog form and converting said signals to digital form.

3. An encoding system as claimed in claim 2, further comprising audio transducer means to generate said analog audio signals from sound pressure waves.

4. An encoding system as claimed in any one of the preceding claims, wherein said specialised Kalman filter utilises sub-optimal Kalman prediction by smoothing sample estimates only to a reduced number of lags relative to the full order.

5. An encoding system as claimed in any one of the preceding claims, wherein said arithmetic coder generates a variable rate entropy coded bit stream.

6. An encoding system as claimed in any one of the preceding claims, further comprising digital memory storage means for receiving and storing said variable bit rate digitised representation.

7. An encoding system as claimed in any of the preceding claims, wherein said audio signals are speech signals.

8. An ADPCM decoding system to provide a reconstituted audio signal from a variable bit rate digitised representation of an original audio signal, said system comprising a dequantizer having its input connected to receive a decoded form of said variable bit rate digitised representation, the output of said dequantizer being connected to a 'specialised' Kalman filter, the output of which comprises said reconstituted audio signal and is utilized by an autoregressive calculator means to modify the operation of said 'specialised' Kalman filter, and an arithmetic decoder, controlled by a variance estimator means operating on the output of said dequantizer, interposed between said variable bit rate digitised representation and said dequantizer to generate said decoded form of said variable bit rate digitised representation.

9. A decoding system as claimed in claim 8, further comprising transducer means for receiving said reconstituted audio signal to generate sound pressure waves.

10. A decoding system as claimed in claim 8 or claim 9, wherein said specialised Kalman filter utilises sub-optimal Kalman prediction by smoothing sample estimates only to a reduced number of lags relative to the full order.

11. A decoding system as claimed in any one of claims 8 to 10, wherein said audio signals are speech signals.

12. An ADPCM audio encoding/decoding system comprising: an encoding system as claimed in any one of claims 1 to 5; a digital memory storage means for receiving and storing said variable bit rate digitised representation; and a decoding system as claimed in any one of claims 8 to 10.

13. An ADPCM audio encoding/decoding system comprising: one or more encoding systems as claimed in any one of claims 1 to 5; a distributed digital communications network to which said one or more encoding systems are connected and over which ones of said variable bit rate digitised representations are transmitted; one or more decoding systems as claimed in any one of claims 8 to 10 connected to said network, each said decoding system selectively acting as a destination for said transmitted representations.

14. A method for ADPCM audio encoding to provide a variable bit rate digitised representation of audio signals, said method comprising the steps of: filtering a digital audio signal by 'specialised' Kalman filter means; quantizing the filtered signal; and arithmetic coding the quantized signal to provide said variable bit rate digitised representation of said audio signal.

15. A method as claimed in claim 14, wherein said filtering step includes dequantizing said quantized signal, filtering said dequantized signal by sub-optimal Kalman prediction by smoothing sample estimates only to a reduced number of lags relative to the full order to produce a predicted signal, and subtracting the predicted signal from said digital audio signal.

16. A method as claimed in claim 15, whereby said step of arithmetic coding includes generating a variable rate entropy coded bit stream.

17. A method for ADPCM audio decoding to provide a reconstituted audio signal from a variable bit rate digitised representation of an original audio signal, the method comprising the steps of: arithmetic decoding of said digitised representation; dequantizing of said decoded representation; and filtering said dequantized signal by 'specialised' Kalman filter means to produce said reconstituted audio signal.

18. A method as claimed in claim 17, whereby said step of filtering includes filtering said dequantized signal by sub-optimal Kalman prediction by smoothing sample estimates only to a reduced number of lags relative to the full order.
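The encoding and decoding loops recited in claims 1, 8, 14 and 17 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation:

* The AR coefficients `a` are fixed and supplied by the caller. The claims instead adapt them with an autoregressive calculator driven by the smoothed output.
* The predictor is a standard full-order Kalman filter, not the 'specialised' reduced-lag filter.
* The quantizer is uniform with a fixed step.
* Quantization noise is modelled as white with variance step²/12.
* The arithmetic coder and variance estimator are omitted. The indices returned by `encode` are what they would consume.

All names are hypothetical.

```python
import numpy as np

def ar_companion(a):
    """Companion matrix F for s[k] = a[0]*s[k-1] + ... + a[p-1]*s[k-p] + w[k],
    with state x[k] = [s[k], s[k-1], ..., s[k-p+1]]."""
    p = len(a)
    F = np.zeros((p, p))
    F[0, :] = a
    F[1:, :-1] = np.eye(p - 1)   # shift older samples down the state vector
    return F

class KalmanPredictor:
    """Standard (full-order) Kalman filter for an AR(p) signal observed through
    additive quantization noise of variance r. Not the patent's reduced-lag filter."""

    def __init__(self, a, q, r):
        a = np.asarray(a, dtype=float)
        p = len(a)
        self.F = ar_companion(a)
        self.Q = np.zeros((p, p))
        self.Q[0, 0] = q          # driving noise enters the newest sample only
        self.r = r
        self.x = np.zeros(p)      # filtered state estimate x(k|k)
        self.P = np.eye(p)        # its error covariance

    def predict(self):
        """Time update; returns the one-step prediction of s[k]."""
        self.x_pred = self.F @ self.x
        self.P_pred = self.F @ self.P @ self.F.T + self.Q
        return self.x_pred[0]

    def update(self, y):
        """Measurement update with y = s[k] + quantization noise. The returned
        state holds smoothed estimates of s[k], s[k-1], ..., s[k-p+1]."""
        s_var = self.P_pred[0, 0] + self.r
        K = self.P_pred[:, 0] / s_var
        self.x = self.x_pred + K * (y - self.x_pred[0])
        self.P = self.P_pred - np.outer(K, self.P_pred[0, :])
        return self.x

def quantize(e, step, levels):
    """Uniform mid-tread quantizer; indices are clipped to +/-(levels // 2)."""
    half = levels // 2
    return int(np.clip(np.round(e / step), -half, half))

def encode(signal, a, q, step, levels=255):
    """Returns quantizer indices and the encoder's local reconstruction."""
    kf = KalmanPredictor(a, q, step ** 2 / 12)   # assumed uniform-quantizer noise
    indices, recon = [], []
    for s in signal:
        pred = kf.predict()                      # to the summer's negative input
        idx = quantize(s - pred, step, levels)   # summer output -> quantizer
        y = pred + idx * step                    # dequantized residual + prediction
        kf.update(y)
        indices.append(idx)
        recon.append(y)
    return indices, np.array(recon)

def decode(indices, a, q, step):
    """Mirrors the encoder loop to rebuild the signal from the indices."""
    kf = KalmanPredictor(a, q, step ** 2 / 12)
    out = []
    for idx in indices:
        pred = kf.predict()
        y = pred + idx * step
        kf.update(y)
        out.append(y)
    return np.array(out)
```

Because the decoder runs the same predictor on the same indices, its output matches the encoder's local reconstruction exactly. While the quantizer is not overloaded, the reconstruction error stays within half a step. The filtered state returned by `update` holds estimates of the last p samples, that is, fixed-lag smoothed estimates up to lag p-1. Claims 4, 10, 15 and 18 describe limiting this smoothing to fewer lags than the full order.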