GB2349054A - Digital audio signal encoders - Google Patents
- Publication number
- GB2349054A
- Authority
- GB
- United Kingdom
- Prior art keywords
- allowed
- distortion
- signal
- encoding
- factor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
In encoding digital audio signals (for transmission) by a perceptual audio encoding method, which involves implementing a psycho-acoustic model (PAM) on a frame of audio data to produce an "allowed" distortion for each listening band of the human ear, converting the frame of audio data from the time domain to the frequency domain and dividing it into the listening bands, and converting the frequency amplitude values into a set of bits, an improved algorithm (Fig. 4, not shown) is used to obtain an optimal quantisation factor (QF) for implementing the last of these tasks.
Description
IMPROVEMENTS IN OR RELATING TO ENCODING AND DECODING AUDIO SIGNALS
This invention relates to the encoding and decoding of audio signals, in particular digital audio signals.
Processing of audio signals in the digital domain is well known and, for digital television sound signals, is defined in the MPEG standard. It is inefficient to transmit an uncoded digital audio signal. Accordingly, the audio signal is generally encoded prior to transmission. Figure 1 shows the main encoding tasks for perceptual audio encoding. These are: implementing a psychoacoustic model (PAM) on a frame of audio data, which produces an allowed distortion for each listening band of the human ear; converting the frame of audio data from the time domain to the frequency domain and dividing it into the listening bands; and converting the frequency amplitude values into a set of bits.
The standard technique for implementing the third task is quantisation, where the large amplitude values are divided by a large real value called the quantisation factor (QF) to produce a set of small integers. The amplitudes are then reconstructed during decoding by applying the inverse operation, that is, multiplying the integer values by the QF. The distortion produced by the process must not exceed that allowed by the PAM, which will be called "allowed". Hence, the important process for the quantisation task is the search for a value of the QF which produces distortion less than or equal to "allowed". As the purpose of encoding is to reduce the amount of storage required, the value must also minimise the number of bits required to represent the set of integers. A brute-force search can be used when there are no restrictions placed on the search time. However, in the context of real-time implementation there is insufficient time to carry out such a search and a non-optimal but faster method must be used.
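The quantisation step described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not give the actual quantisation equation, so a plain uniform quantiser (divide by the QF, round half up) is assumed, and the function names `quantise`, `reconstruct` and `distortion` are hypothetical.

```python
import math


def quantise(amplitudes, qf):
    """Divide each amplitude by qf and round half up (assumed uniform quantiser)."""
    return [int(math.copysign(math.floor(abs(a) / qf + 0.5), a)) for a in amplitudes]


def reconstruct(integers, qf):
    """Undo the division by multiplying each integer by qf."""
    return [q * qf for q in integers]


def distortion(amplitudes, qf):
    """Energy of the error signal: sum of squared reconstruction errors."""
    restored = reconstruct(quantise(amplitudes, qf), qf)
    return sum((a - r) ** 2 for a, r in zip(amplitudes, restored))
```

For example, with the amplitudes `[10.0, -7.0, 3.0]` and a QF of `4.0`, this sketch produces the integers `[3, -2, 1]` and an error energy of `6.0`; a QF is acceptable for a band when that energy does not exceed "allowed".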
The method suggested in the MPEG-2 AAC algorithm is to use two iteration loops. For this algorithm, the number of bits available to code a whole frame is fixed before the quantisation process. A base QF is chosen as the QF for all of the listening bands. The iteration loops are then as follows:
1. The inner loop increases the QF of each band until the number of bits required for the frame is not greater than the allowed number of bits.
2. The outer loop carries out the following steps:
a. Calling the inner loop;
b. Calculating the distortion in each band and checking if this quantisation process produced the best results so far. If so, then the results are saved.
c. If all of the bands have distortion which is too large, then the iterations terminate and the best results are restored.
d. If all of the bands have distortion less than or equal to allowed, then the iterations terminate and the best results are restored.
e. Decreasing the QF in each band with distortion greater than allowed.
f. Repeating the process.
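The two loops listed above can be sketched roughly as below. This is not the MPEG-2 AAC reference procedure: the uniform quantiser, the bit-cost estimate in `bits_needed`, the multiplicative `step`, the "total excess distortion" measure used to decide which result is best, and all function names are assumptions made purely for illustration.

```python
import math


def quantise(band, qf):
    """Assumed uniform quantiser: divide by qf and round half up."""
    return [int(math.copysign(math.floor(abs(a) / qf + 0.5), a)) for a in band]


def distortion(band, qf):
    """Energy of the error between a band and its reconstruction."""
    return sum((a - q * qf) ** 2 for a, q in zip(band, quantise(band, qf)))


def bits_needed(integers):
    """Hypothetical cost: magnitude bits plus a sign bit; zeros cost nothing."""
    return sum(abs(q).bit_length() + 1 for q in integers if q != 0)


def frame_bits(bands, qfs):
    """Bits needed to code every band of the frame with the given QFs."""
    return sum(bits_needed(quantise(b, qf)) for b, qf in zip(bands, qfs))


def two_loop_search(bands, allowed, bit_budget, base_qf, step=1.25, max_outer=32):
    qfs = [base_qf] * len(bands)
    best_excess, best_qfs = None, None
    for _ in range(max_outer):  # iteration cap, as needed for real-time use
        # Inner loop: raise every QF until the frame fits the bit budget.
        while frame_bits(bands, qfs) > bit_budget:
            qfs = [qf * step for qf in qfs]
        # Step b: measure distortion and save the best result so far.
        dists = [distortion(b, qf) for b, qf in zip(bands, qfs)]
        too_large = [d > a for d, a in zip(dists, allowed)]
        excess = sum(max(0.0, d - a) for d, a in zip(dists, allowed))
        if best_excess is None or excess < best_excess:
            best_excess, best_qfs = excess, list(qfs)
        # Steps c and d: stop when every band fails or every band passes.
        if all(too_large) or not any(too_large):
            break
        # Step e: lower the QF only in the bands that are too distorted.
        qfs = [qf / step if bad else qf for qf, bad in zip(qfs, too_large)]
    return best_qfs
```

Because a result is only saved directly after the inner loop has run, the QFs returned by this sketch always fit the bit budget, but nothing guarantees that every band meets its allowed distortion.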
The best results are stored because this method does not readily converge to a solution. Also, while this method does produce a good estimate for the QFs, it is not particularly fast and the number of iterations must be restricted for real-time implementations.
One object of the present invention is to provide a system which overcomes the disadvantages of the known systems.
According to one aspect of the present invention, there is provided a method of encoding a digital audio signal comprising one or more portions in which an allowed distortion level is known, the method comprising representing the or each portion of the signal as one of a predetermined range of factors, selecting the size of the range of factors to be reduced from the predetermined range; iterating within the reduced range to select the optimal factor for the or each portion, and encoding the signal in accordance with the optimal factor.
In this way each of the listening bands is searched separately and the quantisation factor for each is estimated independently, to decrease the time and resources required for the quantisation process. In addition, the solution is bracketed and as such a standard non-linear equation solver such as Newton-Raphson can be used, which offers many processing advantages. This method is both deterministic and ensures convergence to a solution.
According to a second aspect of the present invention there is provided apparatus for encoding a digital audio signal comprising one or more portions in which an allowed distortion level is known, the apparatus comprising means for representing the or each portion of the signal as one of a predetermined range of factors; a selector for selecting the size of the range of factors to be reduced from the predetermined range; a selector for searching within the reduced range to select the optimal factor for the or each portion, and an encoder for encoding the signal in accordance with the optimal factor.
Reference will now be made, by way of example, to the accompanying drawings in which:
Figure 1 is a prior art block diagram of a perceptual audio encoding process;
Figure 2 is a graph showing the relationship between scalefactor and number of bits;
Figure 3 is a graph showing the relationship between scalefactor and distortion; and
Figure 4 is a flow chart of the audio encoding process of the present invention.
The ear is a complex organ and has a number of known characteristics. These characteristics are used to generate a model of the ear. This model shows some frequency dependency; for example, if the ear has just experienced a loud noise, it may be some time before it can 'hear' a soft noise. This type of characteristic is exploited in the model used in the present invention.
The number of bits required to encode a band decreases monotonically with increasing scalefactor. This is shown pictorially in Figure 2, which shows the relationship between the value of the QF for a listening band and the number of bits required to code the set of integers for the MPEG-2 audio algorithm. The formula relating the scalefactor shown in Figure 2 to the actual QF shall be called the quantisation equation. Hence, any search for the optimal value of the QF should maximise the value of the QF while ensuring that the distortion is not greater than "allowed". Figure 3 shows the relationship between the value of the QF for the above band and the distortion produced by that value. The dotted horizontal line in Figure 3 represents the allowed distortion produced by the PAM. As can be seen, there is no clear relationship to utilise. To ensure convergence to a reasonable solution, two values for the QF must be chosen which are known to bracket the solution. In other words, quantisation using the chosen QFmin must produce distortion not greater than "allowed", while the distortion produced using the chosen QFmax must be greater than "allowed". Then a search for the solution can be replaced by solving the equation:
distortion = "allowed"
The solution must exist according to the Intermediate Value Theorem (IVT). A standard iterative technique for solving this problem is derived from the IVT and is called the Newton-Raphson method. Here, the estimated value for the next QF is:
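$$QF_{new} = QF_{min} + \frac{(QF_{max} - QF_{min})\,(\text{"allowed"} - distortion_{min})}{distortion_{max} - distortion_{min}}$$
where distortionmin and distortionmax are the distortions produced by QFmin and QFmax respectively. QFnew is used for the quantisation and, if the distortion produced is greater than "allowed", QFmax is replaced by QFnew; otherwise QFmin is replaced by QFnew.
This process is repeated until QFmin and QFmax converge and the solution has been found.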
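A single step of the estimate and bracket update described above might look like the sketch below. The function and parameter names are hypothetical. The estimate is a linear interpolation between the two bracketing points, which has the same form as the update usually called false position (regula falsi); the text's own name for the method is left unchanged.

```python
def next_qf(qf_min, qf_max, distortion_min, distortion_max, allowed):
    """Interpolate between the bracket ends to estimate where distortion == allowed.

    Assumes distortion_min <= allowed < distortion_max, so the denominator
    is strictly positive.
    """
    fraction = (allowed - distortion_min) / (distortion_max - distortion_min)
    return qf_min + (qf_max - qf_min) * fraction


def update_bracket(qf_min, qf_max, qf_new, distortion_new, allowed):
    """Replace QFmax if the new distortion is too large, otherwise replace QFmin."""
    if distortion_new > allowed:
        return qf_min, qf_new
    return qf_new, qf_max
```

For instance, with a bracket of 2.0 to 10.0, distortions of 1.0 and 5.0 at the two ends, and an allowed distortion of 3.0, the estimate lands halfway along the bracket at 6.0.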
The search time is proportional to the difference between the two initial estimates, so the selection of the initial values is important. The maximum value for the QF is chosen as the largest value for which the quantisation process will produce at least one non-zero integer value. If the QF is increased beyond this value, then the band will be quantised as a set of zeros, which is not valid. This value can be calculated by applying the quantisation equation to the maximum amplitude in the band and making the QF the subject of the formula.
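As an illustration of this calculation, suppose the quantisation equation were a plain uniform quantiser that divides by the QF and rounds half up. That equation is an assumption, since the text does not state the real one, and the names below are hypothetical. The largest amplitude then quantises to a non-zero integer exactly when its magnitude divided by the QF is at least one half, which gives a QFmax of twice the largest magnitude.

```python
import math


def quantise(amplitudes, qf):
    """Assumed uniform quantiser: divide by qf and round half up."""
    return [int(math.copysign(math.floor(abs(a) / qf + 0.5), a)) for a in amplitudes]


def initial_qf_max(amplitudes):
    """Largest QF that still leaves one non-zero integer under the assumed quantiser.

    Solving max|a| / QF = 0.5 for QF gives QF = 2 * max|a|.
    """
    return 2.0 * max(abs(a) for a in amplitudes)
```

With the band `[3.0, -10.0, 4.0]` this gives a QFmax of `20.0`, at which only the largest amplitude survives as `-1`; any larger QF quantises the whole band to zeros.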
The minimum initial value can be determined by examining the process of quantisation. The energy in the error signal is calculated as the sum of the squares of the differences between the original and reconstructed values. The largest difference between a reconstructed value and the original is half the QF. Hence, the maximum possible energy contained in the error signal is the number of amplitudes in the band multiplied by the square of QF/2. Setting this maximum energy equal to "allowed" and making QF the subject of the formula gives:
$$QF = 2\sqrt{\frac{\text{"allowed"}}{N}}$$
where "allowed" is the distortion allowed by the PAM; and N is the number of amplitudes in the band.
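The bound follows from setting N multiplied by the square of QF/2 equal to "allowed" and solving for QF. A minimal sketch of that arithmetic, with hypothetical function names, is:

```python
import math


def initial_qf_min(allowed, n_amplitudes):
    """Solve n * (QF / 2) ** 2 == allowed for QF."""
    return 2.0 * math.sqrt(allowed / n_amplitudes)


def worst_case_error_energy(qf, n_amplitudes):
    """Upper bound on error energy when every amplitude is off by qf / 2."""
    return n_amplitudes * (qf / 2.0) ** 2
```

For a band of 4 amplitudes and an allowed distortion of 16.0, the starting QFmin is 4.0, and the worst-case error energy at that QF is exactly 16.0, so the actual distortion cannot exceed "allowed".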
The process described above can be understood more clearly with reference to the flow chart of Figure 4.
QFmax is calculated and the distortion produced is determined (40). The level of the distortion is compared with the "allowed" level (42). If the distortion is less than "allowed", QFmax (44) is used. If the distortion is greater than "allowed" (48), then QFmin is calculated and the distortion produced is determined (50).
The level of distortion is compared with "allowed" (52). If the distortion is the same (54), QFmin is used (56). If the distortion is not the same (58), further processing occurs. QFnew is calculated from QFmin and QFmax and the distortion is determined (60). Then the distortion is compared to "allowed" (62).
If the distortion is the same (64), QFnew is used (66). If the distortion is more (68) or less (70), one of two iterative loops is started. If the distortion is more, QFmax is set to QFnew (72). If QFmax = QFmin + 1 (74) (76), QFmin is used (78). If QFmax ≠ QFmin + 1 (80), a new QFnew is calculated (60) and the loop is repeated.
If the distortion is less than "allowed" (70), QFmin is set to QFnew (82). If QFmax = QFmin + 1 (84) (86), QFmin is used (88). If QFmax ≠ QFmin + 1 (90), the loop is repeated starting at (60).
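The flow chart steps can be sketched as a single search routine for one band. Several details here are assumptions rather than part of the text: QFs are treated as integers (suggested by the QFmax = QFmin + 1 test), `distortion_at` is a hypothetical callback returning the distortion for a given QF, a distortion exactly equal to "allowed" at QFmax is accepted, and the estimate is clamped strictly inside the bracket so that each pass narrows it.

```python
def search_qf(distortion_at, qf_min, qf_max, allowed):
    """Bracketed search for the QF of one band, following steps (40) to (90)."""
    d_max = distortion_at(qf_max)             # (40)
    if d_max <= allowed:                      # (42)
        return qf_max                         # (44)
    d_min = distortion_at(qf_min)             # (50)
    if d_min == allowed:                      # (52), (54)
        return qf_min                         # (56)
    while qf_max > qf_min + 1:
        # (60): interpolate a new estimate from the bracket ends.
        fraction = (allowed - d_min) / (d_max - d_min)
        estimate = qf_min + (qf_max - qf_min) * fraction
        # Assumed safeguard: keep the integer estimate strictly inside the bracket.
        qf_new = min(max(round(estimate), qf_min + 1), qf_max - 1)
        d_new = distortion_at(qf_new)
        if d_new == allowed:                  # (62), (64)
            return qf_new                     # (66)
        if d_new > allowed:                   # (68)
            qf_max, d_max = qf_new, d_new     # (72)
        else:                                 # (70)
            qf_min, d_min = qf_new, d_new     # (82)
    return qf_min                             # (78) or (88)
```

As a toy check, with a distortion that grows as the square of the QF, a bracket of 1 to 20 and an allowed distortion of 50, the routine returns 7, the largest integer QF whose distortion (49) does not exceed 50.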
The method of the invention can be modified to deal with higher bitrates. The above technique does not necessarily use all the bits allowed for the quantisation, and so may reduce the audio quality below that which is achievable for the bitrate. The method can be modified to use more bits if allowed. The number of bits required to encode the whole frame is calculated after each iteration of the QFnew estimator for all of the bands. The QF used for the quantisation process is taken as QFmin. If the number of bits required is less than that allowed for this frame, the quantisation process is terminated and the current results are used.
Claims (6)
- 1. A method of encoding a digital audio signal comprising one or more portions in which an allowed distortion level is known, the method comprising representing the or each portion of the signal as one of a predetermined range of factors, selecting the size of the range of factors to be reduced from the predetermined range; searching within the reduced range to select the optimal factor for the or each portion, and encoding the signal in accordance with the optimal factor.
- 2. The method of claim 1, wherein the selecting step comprises selecting a maximum and a minimum factor which bracket the optimal factor.
- 3. The method of claim 2, wherein the selecting step comprises selecting different maximum and minimum factors for different portions of the signal.
- 4. The method of claim 2 or claim 3, wherein the selecting step comprises selecting the maximum factor to produce distortion greater than the allowed distortion level and selecting the minimum factor to produce distortion less than the allowed distortion level.
- 5. The method of any preceding claim, wherein the representing step comprises representing the or each portion of the signal as one of a predetermined range of quantisation factors.
- 6. Apparatus for encoding a digital audio signal comprising one or more portions in which an allowed distortion level is known, the apparatus comprising means for representing the or each portion of the signal as one of a predetermined range of factors; a selector for selecting the size of the range of factors to be reduced from the predetermined range; a selector for searching within the reduced range to select the optimal factor for the or each portion, and an encoder for encoding the signal in accordance with the optimal factor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB9908659A GB2349054A (en) | 1999-04-16 | 1999-04-16 | Digital audio signal encoders |
Publications (2)
Publication Number | Publication Date |
---|---|
GB9908659D0 (en) | 1999-06-09 |
GB2349054A (en) | 2000-10-18 |
Family
ID=10851609
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB9908659A Withdrawn GB2349054A (en) | 1999-04-16 | 1999-04-16 | Digital audio signal encoders |
Country Status (1)
Country | Link |
---|---|
GB (1) | GB2349054A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0473367A1 (en) * | 1990-08-24 | 1992-03-04 | Sony Corporation | Digital signal encoders |
EP0559348A2 (en) * | 1992-03-02 | 1993-09-08 | AT&T Corp. | Rate control loop processor for perceptual encoder/decoder |
US5301255A (en) * | 1990-11-09 | 1994-04-05 | Matsushita Electric Industrial Co., Ltd. | Audio signal subband encoder |
EP0612159A2 (en) * | 1993-02-19 | 1994-08-24 | Matsushita Electric Industrial Co., Ltd. | An enhancement method for a coarse quantizer in the ATRAC |
US5414795A (en) * | 1991-03-29 | 1995-05-09 | Sony Corporation | High efficiency digital data encoding and decoding apparatus |
WO1996035269A1 (en) * | 1995-05-03 | 1996-11-07 | Sony Corporation | Non-linearly quantizing an information signal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
COOA | Change in applicant's name or ownership of the application | ||
WAP | Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1) |