GB2349054A

GB2349054A - Digital audio signal encoders

Info

Publication number: GB2349054A
Application number: GB9908659A
Authority: GB
Inventors: Jeremy Bennett; Alberto Duenas; Stuart Mcdonald
Original assignee: NDS Ltd; Tandberg Television AS
Current assignee: Synamedia Ltd; Ericsson Television AS
Priority date: 1999-04-16
Filing date: 1999-04-16
Publication date: 2000-10-18
Also published as: GB9908659D0

Abstract

In encoding digital audio signals (for transmission) by a perceptual audio encoding method, involving implementing a psycho-acoustic model (PAM) on a frame of audio data which produces a set of "allowed" distortion for each listening band of the human ear, converting the frame of audio data from the time to the frequency domain and dividing it into the listening bands, and converting the frequency amplitude values into a set of bits, an improved algorithm (Fig.4, not shown) is used to obtain an optimal quantisation factor (QF) for implementing the latter task.

Description

IMPROVEMENTS IN OR RELATING TO ENCODING AND DECODING AUDIO SIGNALS

This invention relates to the encoding and decoding of audio signals, in particular digital audio signals.

Processing of audio signals in the digital domain is well known and for digital television sound signals is defined in the MPEG standard. It is inefficient to transmit an uncoded digital audio signal. Accordingly the audio signal is generally encoded prior to transmission. Figure 1 shows the main encoding tasks for perceptual audio encoding. These are: implementing a psychoacoustic model (PAM) on a frame of audio data which produces a set of allowed distortion for each listening band of the human ear; converting the frame of audio data from the time to frequency domain and dividing it into the listening bands; and converting the frequency amplitude values into a set of bits.

The standard technique for implementing the third task is quantisation, in which the large amplitude values are divided by a large real value called the quantisation factor (QF) to produce a set of small integers. The amplitudes are then reconstructed during decoding by multiplying the integer values with the inverse of the QF. The distortion produced by the process must be less than that allowed by the PAM, which will be called "allowed". Hence, the important process for the quantisation task is the search for a value of the QF which produces distortion less than or equal to "allowed". As the purpose of encoding is to reduce the amount of storage required, the value must also minimise the number of bits required to represent the set of integers. A brute-force search can be used when there are no time restrictions placed on the search. However, in the context of real-time implementation there is insufficient time to operate such a search and a non-optimal but faster method must be used.
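The quantisation and distortion measurement described above can be illustrated with a minimal sketch. It assumes a plain uniform quantiser with rounding to the nearest integer, which is a simplification (the MPEG-2 AAC quantiser is non-linear), and it measures distortion as the sum of squared reconstruction errors. Reconstruction here simply undoes the division. The function names are hypothetical and chosen for illustration only.

```python
import math


def quantise(amplitudes, qf):
    """Divide each amplitude by the quantisation factor and round to the nearest integer."""
    return [math.floor(a / qf + 0.5) for a in amplitudes]


def reconstruct(integers, qf):
    """Undo the division performed by quantise (assumed uniform quantiser)."""
    return [q * qf for q in integers]


def distortion(amplitudes, qf):
    """Energy of the error signal: sum of squared differences after a round trip."""
    rebuilt = reconstruct(quantise(amplitudes, qf), qf)
    return sum((a - r) ** 2 for a, r in zip(amplitudes, rebuilt))
```

A larger `qf` yields smaller integers, and so fewer bits, at the cost of a larger reconstruction error; the search problem is to find the largest `qf` whose distortion stays within "allowed".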

The method suggested in the MPEG-2 AAC algorithm is to use two iteration loops. For this algorithm, the number of bits to code a whole frame is fixed before the quantisation process. A base QF is chosen as the QF for all of the listening bands. The iteration loops are then as follows:

1. The inner loop increases the QF of each band until the number of bits required for the frame is not greater than the allowed number of bits.

2. The outer loop carries out the following steps:
   a. Calling the inner loop;
   b. Calculating the distortion in each band and checking if this quantisation process produced the best results so far. If so, then the results are saved.
   c. If all of the bands have distortion which is too large, then the iterations terminate and the best results are restored.
   d. If all of the bands have distortion less than or equal to allowed, then the iterations terminate and the best results are restored.
   e. Decreasing the QF in each band with distortion greater than allowed.
   f. Repeating the process.

The best results are stored because this method does not readily converge to a solution. Also, while this method does produce a good estimate for the QFs, it is not particularly fast and the number of iterations must be restricted for real-time implementations.

One object of the present invention is to provide a system which overcomes the disadvantages of the known systems.

According to one aspect of the present invention, there is provided a method of encoding a digital audio signal comprising one or more portions in which an allowed distortion level is known, the method comprising representing the or each portion of the signal as one of a predetermined range of factors, selecting the size of the range of factors to be reduced from the predetermined range; iterating within the reduced range to select the optimal factor for the or each portion, and encoding the signal in accordance with the optimal factor.

In this way each of the listening bands is searched separately and the quantisation factor for each is estimated independently to decrease the time and resources required for the quantisation process. In addition, this solution is bracketed and as such a standard non-linear equation solver such as Newton-Raphson can be used, which offers many processing advantages. This method is both deterministic and ensures convergence to a solution.

According to a second aspect of the present invention there is provided apparatus for encoding a digital audio signal comprising one or more portions in which an allowed distortion level is known, the apparatus comprising means for representing the or each portion of the signal as one of a predetermined range of factors; a selector for selecting the size of the range of factors to be reduced from the predetermined range; a selector for searching within the reduced range to select the optimal factor for the or each portion, and an encoder for encoding the signal in accordance with the optimal factor.

Reference will now be made, by way of example, to the accompanying drawings in which:

Figure 1 is a prior art block diagram of a perceptual audio encoding process;

Figure 2 is a graph showing the relationship between scalefactor and number of bits;

Figure 3 is a graph showing the relationship between scalefactor and distortion; and

Figure 4 is a flow chart of the audio encoding process of the present invention.

The ear is a complex organ and has a number of known characteristics. These characteristics are used to generate a model of the ear. This model shows some frequency dependency; for example, if the ear has just experienced a loud noise, it may be some time before it can 'hear' a soft noise. This type of characteristic is exploited in the model used in the present invention.

The number of bits required to encode a band decreases monotonically with increasing scalefactor. This is shown pictorially in Figure 2, which shows the relationship between the value of the QF for a listening band and the number of bits required to code the set of integers for the MPEG-2 audio algorithm. The formula relating the scalefactor shown in Figure 2 to the actual QF shall be called the quantisation equation. Hence, any search for the optimal value of the QF would seek to maximise the value of the QF while ensuring that the distortion is not greater than "allowed". Figure 3 shows the relationship between the value of the QF for the above band and the distortion produced by that value. The dotted horizontal line in Figure 3 represents the allowed distortion produced by the PAM. As can be seen, there is no clear relationship to utilise. To ensure convergence to a reasonable solution, two values for the QF must be chosen which are known to bracket the solution. In other words, quantisation using the chosen QFmin must produce distortion not greater than "allowed", while the distortion produced using the chosen QFmax must be greater than "allowed". Then a search for the solution can be replaced by solving the equation:

distortion = "allowed"

The solution must exist according to the Intermediate Value Theorem (IVT). A standard iterative technique for solving this problem is derived from the IVT and is called the Newton-Raphson method. Here, the estimated value for the next QF is:

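$$QF_{new} = QF_{min} + \frac{(QF_{max} - QF_{min})\,(\text{allowed} - distortion_{min})}{distortion_{max} - distortion_{min}}$$

where distortionmin and distortionmax are the distortions produced by QFmin and QFmax respectively. QFnew is used for the quantisation and if the distortion produced is greater than "allowed", QFmax is replaced by QFnew; otherwise QFmin is replaced by QFnew.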

This process is repeated until QFmin and QFmax converge and the solution has been found.
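One iteration of this update can be sketched as follows, assuming the estimate is a linear interpolation between the two ends of the bracket. Although the text refers to the technique as Newton-Raphson, an interpolation of this form between bracketing points is more commonly known as the false position (regula falsi) method. The function and parameter names are hypothetical.

```python
def next_qf(qf_min, qf_max, dist_min, dist_max, allowed):
    """Interpolate linearly between the bracket ends to estimate where distortion == allowed."""
    return qf_min + (qf_max - qf_min) * (allowed - dist_min) / (dist_max - dist_min)


def update_bracket(qf_min, qf_max, qf_new, dist_new, allowed):
    """Replace whichever end of the bracket keeps the solution enclosed."""
    if dist_new > allowed:
        return qf_min, qf_new   # too much distortion: QFnew becomes the new upper end
    return qf_new, qf_max       # distortion acceptable: QFnew becomes the new lower end
```

Because the bracket requires `dist_min <= allowed < dist_max`, the denominator is always positive and the estimate never falls outside the current bracket.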

The search time is proportional to the difference of the two initial estimates, so the selection of the initial values is important. The maximum value for the QF is chosen as the maximum value for which the quantisation process will produce at least one non-zero integer value. If the QF is increased beyond this value, then the band will be quantised as a set of zeros, which is not valid. This value can be calculated by using the quantisation equation on the maximum amplitude in the band and making the QF the subject of the formula.

The minimum initial value can be determined by examining the process of quantisation. The energy in the error signal is calculated as the sum of the squares of the differences. The largest difference between the reconstructed value and the original is half that of the QF. Hence, the most possible energy contained in the error signal is the number of amplitudes in the band multiplied by the square of QF/2. Making the maximum energy equal to "allowed" and QF the subject of the formula:

$$QF = 2\sqrt{\frac{\text{allowed}}{N}}$$

where "allowed" is the distortion allowed by the PAM; and N is the number of amplitudes in the band.
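The two initial estimates can be sketched as below. The minimum follows directly from setting N·(QF/2)² equal to "allowed". The maximum depends on the quantisation equation, which is not reproduced in this text, so the sketch substitutes an assumed uniform rounding quantiser, for which the largest QF that still yields a non-zero integer is twice the largest absolute amplitude. All names are hypothetical.

```python
import math


def quantise(amplitudes, qf):
    """Assumed uniform quantiser: divide by qf and round to the nearest integer."""
    return [math.floor(a / qf + 0.5) for a in amplitudes]


def distortion(amplitudes, qf):
    """Sum of squared differences between original and reconstructed amplitudes."""
    return sum((a - q * qf) ** 2 for a, q in zip(amplitudes, quantise(amplitudes, qf)))


def initial_qf_min(allowed, n):
    """Worst-case bound: N * (QF/2)**2 == allowed, solved for QF."""
    return 2.0 * math.sqrt(allowed / n)


def initial_qf_max(amplitudes):
    """Largest QF giving at least one non-zero integer (valid for the assumed quantiser only)."""
    return 2.0 * max(abs(a) for a in amplitudes)
```

Since every rounding error is at most QF/2, the distortion at `initial_qf_min` cannot exceed "allowed", which is what makes it a safe lower end for the bracket.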

The process described above can be understood more clearly with reference to the flow chart of Figure 4.

QFmax is calculated and the distortion produced is determined (40). The level of the distortion is compared with the "allowed" level (42). If the distortion is less than "allowed", QFmax (44) is used. If the distortion is greater than "allowed" (48) then QFmin is calculated and the distortion produced is determined (50).

The level of distortion is compared with "allowed" (52). If the distortion is the same (54), QFmin is used (56). If the distortion is not the same (58), further processing occurs. QFnew is calculated from QFmin and QFmax and the distortion is determined (60). Then the distortion is compared to "allowed" (62).

If the distortion is the same (64), QFnew is used (66). If the distortion is more (68) or less (70), one of two iterative loops is started. If the distortion is more, QFmax is set to QFnew (72).

If QFmax = QFmin + 1 (74) (76), QFmin is used (78). If QFmax ≠ QFmin + 1 (80), a new QFnew is calculated (60) and the loop is repeated.

If the distortion is less than "allowed" (70), QFmin is set to QFnew (82). If QFmax = QFmin + 1 (84) (86), QFmin is used (88). If QFmax ≠ QFmin + 1 (90), the loop is repeated starting at (60).
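The flow chart steps can be sketched for a single band as follows. This is an illustration under stated assumptions, not the patented implementation: the QF is taken to be an integer (as the test QFmax = QFmin + 1 suggests), the quantiser is an assumed uniform rounding quantiser, and the interpolated estimate is clamped strictly inside the bracket so that every pass narrows it. The clamp is an addition that the text does not describe. The caller must supply a `qf_min` whose distortion does not exceed `allowed`.

```python
import math


def quantise(amplitudes, qf):
    """Assumed uniform quantiser: divide by qf and round to the nearest integer."""
    return [math.floor(a / qf + 0.5) for a in amplitudes]


def distortion(amplitudes, qf):
    """Sum of squared differences between original and reconstructed amplitudes."""
    return sum((a - q * qf) ** 2 for a, q in zip(amplitudes, quantise(amplitudes, qf)))


def search_qf(amplitudes, allowed, qf_min, qf_max):
    """Bracketed search for the largest integer QF found with distortion <= allowed."""
    dist_max = distortion(amplitudes, qf_max)            # step (40)
    if dist_max <= allowed:                               # step (42)
        return qf_max
    dist_min = distortion(amplitudes, qf_min)            # step (50)
    if dist_min == allowed:                               # step (52)
        return qf_min
    while qf_max > qf_min + 1:
        # Step (60): interpolate, then clamp inside the bracket (assumption).
        estimate = qf_min + (qf_max - qf_min) * (allowed - dist_min) / (dist_max - dist_min)
        qf_new = min(max(int(estimate), qf_min + 1), qf_max - 1)
        dist_new = distortion(amplitudes, qf_new)
        if dist_new == allowed:                           # step (62)
            return qf_new
        if dist_new > allowed:
            qf_max, dist_max = qf_new, dist_new           # step (72)
        else:
            qf_min, dist_min = qf_new, dist_new           # step (82)
    return qf_min                                         # bracket is one step wide
```

For the amplitudes `[10.0, -3.2, 7.7, 0.4]` with `allowed = 4.0`, `qf_min = 2` and `qf_max = 20`, the search settles on a bracket of 3 and 4 and returns 3, the end whose distortion is within the limit.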

The method of the invention can be modified to deal with higher bitrates. The above technique does not necessarily use all the bits allowed for the quantisation, hence reducing the audio quality below that which is achievable for the bitrate. The method can be modified to use more bits if allowed. The number of bits required to encode the whole frame is calculated after each iteration of the QFnew estimator for all of the bands. The QF used for the quantisation process is taken as QFmin. If the number of bits required is less than that allowed for this frame, the quantisation process is terminated and the current results are used.

Claims

1. A method of encoding a digital audio signal comprising one or more portions in which an allowed distortion level is known, the method comprising representing the or each portion of the signal as one of a predetermined range of factors, selecting the size of the range of factors to be reduced from the predetermined range; searching within the reduced range to select the optimal factor for the or each portion, and encoding the signal in accordance with the optimal factor.
2. The method of claim 1, wherein the selecting step comprises selecting a maximum and a minimum factor which bracket the optimal factor.
3. The method of claim 2, wherein the selecting step comprises selecting different maximum and minimum factors for different portions of the signal.
4. The method of claim 2 or claim 3, wherein the selecting step comprises selecting the maximum factor to produce distortion greater than the allowed distortion level and selecting the minimum factor to produce distortion less than the allowed distortion level.
5. The method of any preceding claim, wherein the representing step comprises representing the or each portion of the signal as one of a predetermined range of quantisation factors.
6. Apparatus for encoding a digital audio signal comprising one or more portions in which an allowed distortion level is known, the apparatus comprising means for representing the or each portion of the signal as one of a predetermined range of factors; a selector for selecting the size of the range of factors to be reduced from the predetermined range; a selector for searching within the reduced range to select the optimal factor for the or each portion, and an encoder for encoding the signal in accordance with the optimal factor.