MXPA06008224A - Audio coding based on block grouping - Google Patents

Audio coding based on block grouping

Info

Publication number
MXPA06008224A
Authority
MX
Mexico
Prior art keywords
groups
blocks
measure
audio information
group
Prior art date
Application number
MXPA/A/2006/008224A
Other languages
Spanish (es)
Inventor
Grant Allen Davidson
Matthew Conrad Fellers
Mark Stuart Vinton
Claus Bauer
Original Assignee
Claus Bauer
Grant Allen Davidson
Dolby Laboratories Licensing Corporation
Matthew Conrad Fellers
Mark Stuart Vinton
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Claus Bauer, Grant Allen Davidson, Dolby Laboratories Licensing Corporation, Matthew Conrad Fellers, Mark Stuart Vinton

Abstract

Blocks of audio information are arranged in groups that share encoding control parameters to reduce the amount of side information needed to convey the control parameters in an encoded signal. The configuration of groups that reduces the distortion of the encoded audio information may be determined by any of several techniques that search for an optimal or near-optimal solution. The techniques include an exhaustive search, a fast optimal search and a greedy merge, which allow the search technique to trade off the reduction in distortion against the bit rate of the encoded signal and/or the computational complexity of the search technique.

Description

AUDIO CODING BASED ON THE GROUPING OF BLOCKS

TECHNICAL FIELD

The present invention relates to optimizing the operation of digital audio encoders of the type that apply a coding process to one or more streams of audio information representing one or more audio channels, where the streams are segmented into frames and each frame comprises one or more blocks of digital audio information. More particularly, the present invention relates to grouping the blocks of audio information arranged in frames in such a way as to optimize a coding process that is applied to the frames.
BACKGROUND ART

Many audio processing systems operate by dividing streams of audio information into frames and by further dividing the frames into sequential blocks of data that represent a portion of the audio information in a particular time interval. Some type of signal processing is applied to each block in the stream. Two examples of audio processing systems that apply a perceptual coding process to each block are systems that conform to the Advanced Audio Coding (AAC) standard, which is described in ISO/IEC 13818-7, "MPEG-2 advanced audio coding, AAC," International Standard, 1997; ISO/IEC JTC1/SC29, "Information technology - very low bitrate audio-visual coding," and ISO/IEC IS-14496 (Part 3, Audio), 1996, and systems commonly called AC-3 that conform to the coding standard described in document A/52A of the Advanced Television Systems Committee (ATSC) entitled "Revision A to Digital Audio Compression (AC-3) Standard," published on August 20, 2001.

One type of signal processing that is applied to blocks in many audio processing systems is a form of perceptual coding that analyzes the audio information in the block to obtain a representation of its spectral components, estimates the perceptual masking effects of the spectral components, quantizes the spectral components in such a way that the resulting quantization noise is inaudible or its audibility is as low as possible, and assembles a representation of the quantized spectral components into an encoded signal that can be transmitted or recorded. A set of control parameters that is needed to recover a block of audio information from the quantized spectral components is also assembled into the encoded signal.

The spectral analysis can be done in a variety of ways, but an analysis that uses a time-domain to frequency-domain transformation is common.
When blocks of audio information are transformed into a frequency-domain representation, the spectral components of the audio information are represented by a sequence of vectors in which each vector represents the spectral components for a respective block. The elements of the vectors are frequency-domain coefficients, and the index of each vector element corresponds to a particular frequency interval. The width of the frequency interval represented by each transform coefficient is either fixed or variable. It is fixed for coefficients generated by a Fourier-based transform such as the Discrete Fourier Transform (DFT) or a Discrete Cosine Transform (DCT). It is variable for coefficients generated by a wavelet or wavelet-packet transform, and typically grows larger with increasing frequency. See, for example, A. Akansu, R. Haddad, "Multiresolution Signal Decomposition: Transforms, Subbands, Wavelets," Academic Press, San Diego, 1992.

One type of signal processing that can be used to recover a block of audio information from the perceptually encoded signal obtains a set of control parameters and a representation of quantized spectral components from the encoded signal and uses this set of parameters to derive spectral components for synthesis into a block of audio information. The synthesis is complementary to the analysis used to generate the encoded signal. A synthesis that uses a frequency-domain to time-domain transformation is common.

In many coding applications, the bandwidth or space that is available to transmit or record an encoded signal is limited, and this limitation imposes severe restrictions on the amount of data that can be used to represent the quantized spectral components.
The data needed to carry sets of control parameters are an overhead that further reduces the amount of data that can be used to represent the quantized spectral components. In some coding systems, one set of control parameters is used to encode each block of audio information. One known technique for reducing the overhead in these types of coding systems is to control the coding process in such a way that only one set of control parameters is needed to recover multiple blocks of audio information from an encoded signal. If the coding process is controlled so that ten blocks share one set of control parameters, for example, the overhead for these parameters is reduced by ninety percent. Unfortunately, audio signals are not stationary, and the efficiency of the coding process for all blocks of audio information in a frame may not be optimal if the control parameters are shared by too many blocks. What is needed is a way to optimize signal processing efficiency by controlling that processing to reduce the overhead needed to carry control parameters.
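The ninety percent figure follows from simple arithmetic: if one parameter set serves n blocks instead of one, the parameter overhead falls to 1/n of its per-block value. A minimal sketch of that calculation, assuming every parameter set costs the same number of bits (the function name `overhead_reduction` is hypothetical and not part of the patent):

```python
def overhead_reduction(blocks_per_set: int) -> float:
    """Fraction of control-parameter overhead saved when one parameter
    set is shared by `blocks_per_set` blocks instead of one set per block.

    Assumes every parameter set has the same size in bits.
    """
    if blocks_per_set < 1:
        raise ValueError("blocks_per_set must be at least 1")
    return 1.0 - 1.0 / blocks_per_set


# Ten blocks sharing one set, as in the example in the text.
print(f"{overhead_reduction(10):.0%}")
```

The saving grows slowly beyond a handful of blocks, which is one reason the grouping has to be weighed against the loss of coding efficiency for non-stationary signals.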
DESCRIPTION OF THE INVENTION

According to the present invention, blocks of audio information arranged in frames are grouped into one or more sets or groups of blocks in such a way that each block is in a respective group. Each group may consist of an individual block or a set of two or more blocks within a frame, and a process that is applied to each block in the group uses a common set of one or more control parameters such as, for example, a set of scale factors. The present invention is directed to controlling the grouping of blocks to optimize the performance of signal processing.

In a coding system, for example, a stream of audio information comprising blocks of audio information is arranged in frames where each frame has one or more groups of blocks. A set of one or more encoding parameters is used to encode the audio information for all blocks within a respective group. The blocks are grouped to optimize some measure of coding performance. For example, a coding system incorporating various aspects of the present invention can control the grouping of blocks to minimize a signal error that represents the distortion of encoded audio information in a frame using shared coding parameters for each group in the frame, as compared to the distortion of an encoded signal for a reference signal in which each block is encoded using its own set of coding parameters.
The various features of the present invention and their preferred embodiments can be better understood by reference to the following description and associated drawings in which like reference numerals refer to like elements in the various figures. The contents of the following description and the drawings are set forth as examples only and should not be construed as representing limitations on the scope of the present invention.
BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram of an audio coding system in which various aspects of the present invention can be incorporated.
Figure 2 is a block diagram of an outer loop in an iterative process to find an optimal number of groups of blocks in a frame.
Figures 3A and 3B are block diagrams of an inner loop in an iterative process to find an optimal grouping of blocks in a frame.
Figure 4 is a block diagram of a Greedy Merge process.
Figure 5 is a conceptual block diagram illustrating an example of a Greedy Merge process applied to four blocks.
Figure 6 is a schematic block diagram of a device that can be used to implement various aspects of the present invention.
MODES FOR CARRYING OUT THE INVENTION

A. Introduction

Figure 1 illustrates an audio coding system in which an encoder 10 receives from the path 5 one or more streams of audio information representing one or more channels of audio signals. The encoder 10 processes the streams of audio information to generate along the path 15 an encoded signal that can be transmitted or recorded. The encoded signal is subsequently received by the decoder 20, which processes the encoded signal to generate along the path 25 a replica of the audio information received from the path 5. The content of the replica may not be identical to the original audio information. If the encoder 10 uses a lossless encoding method to generate the encoded signal, the decoder 20 can in principle recover a replica that is identical to the original streams of audio information. If the encoder 10 uses a lossy coding technique such as perceptual coding, the content of the recovered replica is generally not identical to the content of the original stream but may be perceptually indistinguishable from it.

The encoder 10 encodes the audio information in each block using a coding process that is responsive to a set of one or more process control parameters. For example, the coding process can transform the time-domain information within each block into frequency-domain transform coefficients, represent the transform coefficients in a floating-point form in which one or more floating-point mantissas are associated with a floating-point exponent, and use the floating-point exponents to control the scaling and quantization of the mantissas. This basic approach is used in many audio coding systems, including the AC-3 and AAC systems mentioned above, and is described in more detail in the following paragraphs. It should be understood, however, that scale factors and their use as control parameters are only one example of how the teachings of the present invention can be applied.
In general, the value of each floating-point transform coefficient can be represented more accurately with a given number of bits if each coefficient mantissa is associated with its own exponent, because it is more likely that each mantissa can be normalized; however, it is possible that a complete set of transform coefficients for a block can be represented more accurately with a given number of bits if some of the coefficient mantissas share an exponent. An increase in accuracy may be possible because sharing reduces the number of bits needed to encode the exponents and allows a greater number of bits to be used to represent the mantissas more precisely. Some of the mantissas may no longer be normalized, but if the values of the transform coefficients are similar, the higher precision can result in a more accurate representation of at least some of the mantissas.

The way in which exponents are shared among mantissas can be adapted from block to block, or the sharing arrangement can be constant. If the sharing arrangement is constant, it is common to share exponents in such a way that each exponent and its associated mantissas define a frequency subband that is commensurate with a critical band of the human auditory system. In this scheme, if the frequency interval represented by each transform coefficient is fixed, larger numbers of mantissas share an exponent at higher frequencies than at lower frequencies.

The concept of sharing floating-point exponents among mantissas within a block can be extended to sharing exponents among mantissas in two or more blocks. Exponent sharing reduces the number of bits needed to carry the exponents in an encoded signal, so that additional bits are available to represent the mantissas with greater precision. Depending on the similarity of the transform coefficient values in the blocks, sharing exponents between blocks can increase or decrease the precision with which the mantissas are represented.
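The trade-off described above can be sketched numerically. The following is a minimal illustration, not the exponent scheme of AC-3 or AAC: it assumes coefficient magnitudes below 1, a 4-bit exponent, mantissas rounded uniformly, and a fixed total bit budget. All names (`EXP_BITS`, `exponent`, `quantize`, `encode_shared`, `compare`) are hypothetical.

```python
import math

EXP_BITS = 4  # assumed width of one transmitted exponent


def exponent(x, max_exp=2 ** EXP_BITS - 1):
    """Number of left shifts that normalizes |x| into [0.5, 1)."""
    if x == 0.0:
        return max_exp
    # math.frexp returns (m, e) with x == m * 2**e and 0.5 <= |m| < 1
    _, e = math.frexp(x)
    return min(max(-e, 0), max_exp)


def quantize(x, exp, mantissa_bits):
    """Scale x by 2**exp, round the mantissa, and scale back."""
    steps = 2 ** (mantissa_bits - 1)
    mantissa = round(x * 2 ** exp * steps) / steps
    return mantissa / 2 ** exp


def encode_shared(coeffs, mantissa_bits):
    """One exponent for all coefficients; the largest magnitude sets it."""
    shared = min(exponent(x) for x in coeffs)
    return shared, [quantize(x, shared, mantissa_bits) for x in coeffs]


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def compare(coeffs, budget_bits):
    """Squared error with per-coefficient exponents vs. one shared exponent,
    both under the same total bit budget."""
    n = len(coeffs)
    own_bits = budget_bits // n - EXP_BITS       # each value pays for an exponent
    own = [quantize(x, exponent(x), own_bits) for x in coeffs]
    shared_bits = (budget_bits - EXP_BITS) // n  # a single exponent is paid for
    _, shared = encode_shared(coeffs, shared_bits)
    return squared_error(coeffs, own), squared_error(coeffs, shared)


own_err, shared_err = compare([0.30, 0.28, 0.33, 0.26], budget_bits=32)
print(own_err, shared_err)
```

With four similar coefficients and 32 bits, sharing leaves 7 bits per mantissa instead of 4, so the shared-exponent error is smaller. For coefficients of very different magnitude the small values lose normalization under the shared exponent, and the comparison can go the other way.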
Thus far, the description has referred to the trade-off in the precision of a floating-point representation of transform coefficient values when floating-point exponents are shared. The same trade-off in precision occurs when blocks share parameters that are used to control coding processes such as perceptual coding, which uses perceptual models to control the quantization of coefficient mantissas. The coding processes used in AC-3 and AAC systems, for example, use the floating-point exponents of the transform coefficients to control the allocation of bits for quantizing the transform coefficient mantissas. Sharing exponents between blocks decreases the number of bits needed to represent the exponents, which allows more bits to be used to represent the encoded mantissas. In some cases, sharing exponents between two blocks decreases the precision with which the values of the encoded mantissas are represented. In other cases, sharing between two blocks increases the precision. If sharing exponents between two blocks increases the precision of the mantissas, sharing among three or more blocks may provide further increases in precision.

Various aspects of the present invention can be implemented in an audio encoder by optimizing the number of groups and the boundaries between groups of blocks to minimize the distortion of the encoded signal. A trade-off can be made between the degree of minimization and either the total number of bits used to represent a frame of the encoded signal or the computational complexity of the technique used to optimize the group arrangement. In one implementation, this is done by minimizing a measure of mean square error energy.
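As a rough illustration of optimizing the number of groups and their boundaries, the sketch below tries every way of splitting a frame into contiguous groups and keeps the split with the lowest total cost. It is a sketch under stated assumptions rather than the patent's procedure: the distortion of a group is taken to be the squared error between each block's band energies and the group mean, and a constant `cost_per_extra_group` (a hypothetical parameter) stands in for the bits spent on each additional parameter set. The function names are likewise hypothetical.

```python
from itertools import combinations


def group_error(blocks):
    """Squared error between each block's band energies and the group mean."""
    n, k = len(blocks), len(blocks[0])
    mean = [sum(b[j] for b in blocks) / n for j in range(k)]
    return sum((b[j] - mean[j]) ** 2 for b in blocks for j in range(k))


def best_grouping(energies, cost_per_extra_group):
    """Exhaustively try every split of the frame into contiguous groups.

    Returns (total_cost, bounds), where bounds lists the index of the first
    block of each group followed by the number of blocks in the frame.
    """
    n = len(energies)
    best = None
    for p in range(1, n + 1):                        # p = number of groups
        for cuts in combinations(range(1, n), p - 1):
            bounds = (0,) + cuts + (n,)
            err = sum(group_error(energies[a:b])
                      for a, b in zip(bounds, bounds[1:]))
            total = err + cost_per_extra_group * (p - 1)
            if best is None or total < best[0]:
                best = (total, bounds)
    return best


# Four blocks with two bands each; blocks 0-1 and blocks 2-3 look alike.
frame = [[1.0, 4.0], [1.1, 3.9], [5.0, 0.5], [5.2, 0.4]]
print(best_grouping(frame, cost_per_extra_group=0.5))
```

For a frame of N blocks there are 2^(N-1) contiguous groupings, so an exhaustive search of this kind is only practical for small N; this is the motivation for trading optimality against computational complexity.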
B. Background

The following description relates to ways in which various aspects of the present invention can be incorporated into an audio coding system that optimizes the processing of groups of blocks of audio information arranged in frames. The optimization is first expressed as a numerical minimization problem. This numerical framework is used to develop several implementations that have different levels of computational complexity and provide different levels of optimization.

1. Selection of Groups as a Numerical Minimization Problem

A degree of freedom is provided in the optimization process by allowing a variable number of groups within frames. For the purpose of calculating an optimal grouping configuration, it is assumed that the number of groups and the number of blocks in each group may vary from frame to frame. It is also assumed that a group consists of an individual block or a multiplicity of blocks that are all within an individual frame. The optimization to be performed is to optimize the grouping of blocks within a frame given one or more constraints. These constraints may vary from one application to another and may be expressed as a maximization of excellence in the results of signal processing, such as the fidelity of the encoded signal, or as a minimization of an inverse processing result, such as the distortion of the encoded signal. For example, an audio encoder may have a constraint that requires minimizing the distortion for a given data rate of the encoded signal, or that requires trading off the data rate of the encoded signal against its level of distortion, whereas an analysis/detection/classification system may have a constraint that requires trading off the accuracy of analysis, detection or classification against computational complexity. The signal distortion measures are described later but are only examples of a wide variety of quality measures that can be used.
The techniques described below can be used with measures of signal processing excellence such as the fidelity of the encoded signal, for example, by inverting the comparisons and reversing references to relative quantities such as high and low or maximum and minimum.

It is anticipated that the present invention may be implemented according to any of at least three strategies that differ from one another in their use of time-domain and frequency-domain representations of the audio information. In a first strategy, time-domain information is analyzed to optimize the processing of groups of blocks that carry time-domain information. In a second strategy, frequency-domain information is analyzed to optimize the processing of groups of blocks that carry time-domain information. In a third strategy, frequency-domain information is analyzed to optimize the processing of groups of blocks that carry frequency-domain information. Several implementations according to the third strategy are described below.

In practical implementations of the present invention for encoding audio information for transmission or recording, it is useful to define the terms "distortion" and "secondary value" for the following description.

The term "distortion" is a function of the frequency-domain transform coefficients in the block or blocks that belong to a group, and it maps from the space of groups to the space of non-negative real numbers. A distortion of zero is assigned to a frame that contains exactly N groups, where N is the number of blocks in the frame. In this case, no control parameters are shared between blocks.

The term "secondary value" is a discrete function that maps from the set of non-negative integers to the set of non-negative real numbers. In the following description, it is assumed that the secondary value is a positive linear function of the argument x, where x is equal to p-1 and p is the number of groups in a frame.
A secondary value of zero is assigned to a frame if the number of groups in the frame equals one.

Two techniques for calculating the distortion are described here. One technique calculates the distortion on a band-by-band basis for each of K frequency bands, where each frequency band is a set of one or more contiguous frequency-domain transform coefficients. A second technique calculates a single distortion value for the entire block in a wideband sense across all of its frequency bands. It is useful to define several more terms for the following description.

The term "band distortion" is a vector of values of dimension K, indexed from low frequency to high frequency. Each of the K elements in the vector represents a distortion value for a respective set of one or more transform coefficients in a block.
The term "block distortion" is a scalar value that represents a distortion value for a block.

The term "pre-echo distortion" is a scalar value that expresses a level of the distortion commonly called pre-echo relative to some wideband reference energy threshold of just noticeable difference (JND), where distortion below the JND reference energy threshold is considered unimportant.

The term "temporal support" is the extent of the time-domain samples that correspond to an individual block of transform coefficients. For the Modified Discrete Cosine Transform (MDCT) described in Princen et al., "Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation," ICASSP 1987 Conf. Proc., May 1987, pages 2161-64, any modification to a transform coefficient affects the information that is recovered from two consecutive blocks of transform coefficients because of the 50% overlap of segments in the time domain that is imposed by the transform. The temporal support for this MDCT is the time segment that corresponds only to the first affected block of coefficients.
The term "joint channel coding" is a coding technique by which two or more channels of audio information are combined in some way in the encoder and separated into distinct channels in the decoder. The separate channels obtained by the decoder may not be identical to, or even perceptually indistinguishable from, the original channels. Joint channel coding is used to increase coding efficiency by exploiting the mutual information between the channels.

Pre-echo distortion is a consideration with respect to time-domain masking for a transform audio coding system in which the temporal support of the transform is longer than a pre-masking time interval. Additional information regarding the pre-masking time interval can be obtained from Zwicker et al., "Psychoacoustics - Facts and Models," Springer-Verlag, Berlin 1990. The optimization techniques described below assume that the temporal support is less than the pre-masking interval and, therefore, only objective measures of distortion are considered.

The present invention does not exclude the option of performing the optimization based on a subjective or perceptual measure of distortion as opposed to an objective measure. In particular, if the temporal support is longer than the optimal length for a perceptual encoder, it is possible that a mean square error or other objective measure of distortion does not accurately reflect the level of audible distortion, and that the use of a subjective measure of distortion could select a block grouping configuration that differs from the configuration obtained using an objective measure.

The optimization process can be designed in a variety of ways. One way iterates the value p from 1 to N, where p is the number of groups in a frame, and identifies for each value of p the group configurations for which the sum of the distortions of all blocks in the frame is not greater than a threshold T.
Among these identified configurations, one of the three techniques described below can be used to select the optimal configuration of the groups. Alternatively, the value of p may be determined in some other way such as by means of a two-channel coding process that optimizes the coding gain by adaptively selecting a number of blocks for joint coding of channels. In this case, a common value of p is derived from the individual values of p for each channel.
Given a common value of p for the two channels, the optimal group configuration can be calculated jointly for both channels.

The configuration of groups of blocks in a frame may be frequency dependent, but this requires that the encoded signal carry additional information to specify how the frequency bands are grouped. Various aspects of the present invention can be applied to multi-band implementations by treating bands with common grouping information as separate instances of the wideband implementations described herein.

2. Error Energy as a Distortion Measure

The meaning of "distortion" has been defined in terms of a quantity that drives the optimization, but this distortion has not yet been related to something that can be used by a process to find an optimal grouping of blocks in an audio encoder. What is needed is a measure of encoded signal quality that can direct the optimization process toward an optimal solution. Because the optimization is directed toward the use of a common set of control parameters for each block in a group of blocks, the measure of encoded signal quality should be based on something that applies to each block and can be combined easily into a single representative value or composite measure for all the blocks in the group.

One technique for obtaining a composite measure, described later, is to calculate the average of some value for the blocks in the group, provided that a useful average can be calculated for the value in question. Unfortunately, not all values available in audio coding can be used to calculate a useful average from a plurality of values. An example of an unsuitable value is the phase component of the Discrete Fourier Transform (DFT) for a transform coefficient, because an average of these phase components does not provide any meaningful value. Another technique for obtaining a composite measure is to select the maximum of some value for all the blocks in the group.
In either case, the composite measure is used as a reference value, and the measure of encoded signal quality is inversely related to the distance between this reference value and the value for each block in a group. In other words, the measure of encoded signal quality for a frame can be defined as the inverse of the error between a reference value and the appropriate value for each block in each group, for all groups in the frame. A measure of encoded signal quality as described above can be used to drive the optimization by performing a process that minimizes this error.

Other parameters may be relevant in various coding systems or in other applications. One example is the parameters related to the coding commonly called mid/side coding, which is a common joint channel coding technique in which the "mid" channel is the sum of the left channel and the right channel and the "side" channel is the difference between the left channel and the right channel. Implementations of coding systems that incorporate various aspects of the present invention can use inter-channel correlation instead of energy levels to control the sharing of mid/side coding parameters across blocks.

In general, any audio encoder that assembles blocks into groups, shares coding control parameters among the blocks in a group and transmits the control information to a decoder can benefit from the present invention, which can determine an optimal grouping configuration for the blocks. Without the benefits provided by the present invention, a sub-optimal allocation of bits can result in an overall increase in audible quantization distortion, because bits are diverted from coding the spectral coefficients and cannot be allocated optimally among the various spectral coefficients.

3. Vector Energy vs. Scalar Energy

Implementations of the present invention can use values of either band distortion or block distortion to drive the optimization process. Whether band distortion or block distortion is used depends to a large extent on the variation in band energy from one block to the next. The following definitions are provided:

$u_m$ is a scalar energy value for the total energy in block $m$, and (1a)
$v_{m,j}$ is a vector element that represents the band energy for band $j$ in block $m$. (1b)

If the signal to be encoded is memoryless such that $\mu(v_{m,j}, v_{m+1,j}) = 0$, where $0 \le j \le K-1$ for the $K$ frequency bands and $\mu$ is a measure of the degree of mutual information between adjacent blocks, then a system that uses the scalar energy measure $u_m$ will work as well as a system that uses the band energy values $v_{m,j}$. See Jayant et al., "Digital Coding of Waveforms," Prentice-Hall, New Jersey, 1984. In other words, when successive blocks have little similarity in spectral energy levels, scalar energy works as well as band energy as a measure. On the other hand, as described below, when successive blocks have a high degree of similarity in spectral energy levels, scalar energy may not provide a satisfactory measure to indicate whether parameters can be common to two or more blocks without imposing a serious penalty on coding performance.

The present invention is not restricted to the use of any particular measure. Distortion measures based on logarithmic energies and other signal properties may also be appropriate in various applications.

However, for transitions between blocks having similar spectral content, or $\mu(v_{m,j}, v_{m+1,j}) > 0$, it is still possible that the band-specific energy values $v_{m,j}$ satisfy the following expression:

$$\sum_{j=0}^{K-1} v_{m,j} - \sum_{j=0}^{K-1} v_{m+1,j} = 0 \qquad (2)$$

or that the difference equals a small value close to zero.
This result illustrates the fact that, on a wideband basis, a comparison of total energy between adjacent blocks can conceal differences between the blocks in individual frequency bands. For many signals, a scalar measure of energy is not sufficient to minimize distortion accurately. Because this applies to a wide variety of audio signals, an implementation of the present invention described below uses the vector of band energy values $V_m = (v_{m,0}, \ldots, v_{m,K-1})$ in place of the scalar block energy value $u_m$ to identify the optimal grouping configuration.

4. Identification of Constraints

There are numerous constraints to be considered, depending on the application in which the invention is used. An implementation of the present invention described below is an audio coding system; therefore, the relevant constraints are parameters related to the encoding of audio information. For example, a secondary value constraint arises from the need to transmit control parameters that are common to all blocks in a group. A higher secondary value may allow a signal to be encoded with lower distortion for each block, but the increase in the secondary value may increase the total distortion for all blocks in a frame if a fixed number of bits must be allocated to each frame. Constraints may also be imposed on implementation complexity that favor one particular implementation of the present invention over another.

5. Derivation of the Problem Statement

The following is a definition of a numerical problem for optimizing distortion in an audio coding system. In this particular problem definition, the distortion is a measure of the error energy between the spectral coefficients for a frame in a candidate grouping of blocks and the spectral coefficient energy of the individual blocks in a frame where each block is in its own group.
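The argument in section 3 for preferring band energy vectors over scalar block energies can be made concrete with a small sketch. The band energies below are made-up values chosen so that the two blocks have equal total energy; the function names are hypothetical.

```python
def scalar_difference(block_a, block_b):
    """Difference in total (wideband) energy between two blocks."""
    return sum(block_a) - sum(block_b)


def band_differences(block_a, block_b):
    """Per-band energy differences between two blocks."""
    return [a - b for a, b in zip(block_a, block_b)]


# Illustrative band energies for two adjacent blocks, K = 3 bands.
v_m = [4.0, 1.0, 1.0]       # energy concentrated in the lowest band
v_m_next = [1.0, 1.0, 4.0]  # same total, concentrated in the highest band

print(scalar_difference(v_m, v_m_next))  # the wideband comparison sees no change
print(band_differences(v_m, v_m_next))   # the band-wise comparison does
```

A grouping decision driven by the scalar comparison alone would treat these two blocks as candidates for sharing parameters, although their spectra differ substantially.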
Assume an ordered set of $N$ band energy vectors $V_i$, $0 \le i < N$, where each vector has dimension $K$ with positive real elements, that is, $V_i = (v_{i,0}, \ldots, v_{i,K-1})$. The symbol $V_i$ represents a vector of band energy values, where each element of the vector can correspond to essentially any desired band of transform coefficients. For any ordered set of non-negative integers $0 = s_0 < s_1 < \cdots < s_p = N$, one can define the intervals $I_m$ as $I_m = [s_{m-1}, s_m]$, $\forall m$, $0 < m \le p$. The symbol $s_m$ represents the block index of the first block in each group, and $m$ is the group index. The value $s_p = N$ can be thought of as an index for the first block of the following frame, solely for the purpose of defining an end point for the interval $I_m$. One can define a partition $P(s_0, \ldots, s_p)$ of the set of energy vectors as follows:

$$P(S) = (G_1, \ldots, G_p) \qquad (3)$$

where $S$ is the vector $(s_0, \ldots, s_p)$ and

$$G_m = \{ V_i \mid i \in I_m \} \qquad (4)$$

The symbol $G_m$ represents the blocks in a group.

Several distortion measures can be used in various implementations of the present invention. A maximum distortion measure $M'$ is defined as follows:

$$J_{m,j} = \max_{i \in I_m} (v_{i,j}) \qquad (5)$$

[Equation (6), which defines $J'(m)$, is not recoverable from the source text.]

$$M'(S) = \sum_{m=1}^{p} J'(m) \qquad (7)$$

An average distortion $A$ is defined as follows:

$$K_{m,j} = \frac{1}{s_m - s_{m-1}} \sum_{V_i \in G_m} v_{i,j} \qquad (8)$$

[Equation (9), which defines $K'(m)$, is not recoverable from the source text.]

$$A(S) = \sum_{m=1}^{p} K'(m) \qquad (10)$$

A maximum difference distortion $M''$ is defined as follows:

[Equation (11), which defines $J''(m)$, is not recoverable from the source text.]

$$M''(S) = \sum_{m=1}^{p} J''(m) \qquad (12)$$

The secondary value function for a partition $P(S) = P(s_0, \ldots, s_p)$ is defined as equal to $(p-1)c$, where $c$ is a positive real constant. Two additional distortion functions are defined below:

$$M^*(S) = M(S) + \mathrm{Dist}\{(p-1)c\} \qquad (13)$$

$$A^*(S) = A(S) + \mathrm{Dist}\{(p-1)c\} \qquad (14)$$

where $M(S)$ can be either $M'(S)$ or $M''(S)$, and $\mathrm{Dist}\{\,\}$ is a mapping that expresses the secondary value in the same units as distortion. The function for $M(S)$ can be selected according to the search algorithm used to find an optimal solution. This is described below. The function $\mathrm{Dist}\{\,\}$ is used to map the secondary value into values that are compatible with $M(S)$ and $A(S)$. In some coding systems, a suitable mapping from secondary value to distortion is $\mathrm{Dist}\{C\} = 6.02\ \mathrm{dB} \cdot C$, where $C$ is the secondary value expressed in bits.

The optimization can be formulated as the following numerical problem: determine a vector $S$ with non-negative integer elements $(s_0, s_1, \ldots, s_p)$ that minimizes a particular distortion function $M(S)$, $M^*(S)$, $A(S)$ or $A^*(S)$ over all possible selections of integers $s_0, s_1, \ldots, s_p$ satisfying the relation $0 = s_0 < s_1 < \cdots < s_p = N$, where $1 \le p \le N$. The variable $p$ can be selected in the range from 1 to $N$ to find the vector $S$ that minimizes the desired distortion function.

Alternatively, the optimization can be formulated as a numerical problem using a threshold: determine for all integer values of $p$, $1 \le p \le N$, the vectors $S = (s_0, s_1, \ldots, s_p)$ satisfying the relation $0 = s_0 < s_1 < \cdots < s_p = N$ such that the value of a desired distortion function $M(S)$, $M^*(S)$, $A(S)$ or $A^*(S)$ is less than an assumed threshold value $T$. From these vectors, find a vector $S$ with the minimum value of $p$. An alternative to this approach is to iterate over increasing values of $p$ from 1 to $N$ and select the first vector $S$ that satisfies the threshold constraint. This approach is described in more detail later.

6. Additional Considerations for Multi-Channel Systems

For stereo or multi-channel coding systems that employ joint stereo/multi-channel coding methods, such as the channel coupling used in AC-3 systems and the mid/side stereo coding or intensity stereo coding used in AAC systems, the audio information in all of these channels should be encoded in the appropriate short-block mode for that particular coding system, ensuring that the audio information in all channels has the same number of groups and the same grouping configuration.
This constraint applies because the scale factors, which are the main source of the secondary value, are provided for only one of the coded channels. This implies that all channels have the same grouping configuration, because one set of scale factors applies to all channels.

Optimization can be performed in any of at least three ways in multi-channel coding systems:

One way, referred to as "Joint Channel Optimization," jointly optimizes the number of groups and the group boundaries in a single pass by summing all the error energies, either banded or wideband, across the channels.

Another way, referred to as "Nested Loop Channel Optimization," is a joint-channel optimization implemented as a nested loop process in which the outer loop calculates the optimal number of groups for all channels. Considering both channels in a joint stereo coding mode, for example, the inner loop optimizes the grouping configuration for a given number of groups. The main constraint imposed on this approach is that the process performed in the inner loop uses the same value of p for all coded channels.

Yet another way, referred to as "Individual Channel Optimization," optimizes the grouping configuration for each channel independently of all other channels. No joint-channel coding technique can be used to encode any channel in a frame that has a grouping configuration of its own.

7.
Methods for Performing a Constrained Optimization

The present invention can use essentially any desired method to search for an optimal solution. This document describes three methods.

The "Exhaustive Search Method" is computationally intensive but always finds the optimal solution. One approach calculates the distortion for every possible number of groups and every possible grouping configuration for each number of groups; identifies the grouping configuration with the minimum distortion for each number of groups; and then determines the optimal number of groups by selecting the configuration that has the least distortion. Alternatively, the method can compare the minimum distortion for each number of groups with a threshold and terminate the search after finding the first grouping configuration whose distortion measure is less than the threshold. This alternative implementation reduces the computational complexity of the search for an acceptable solution but cannot ensure that the optimal solution is found.

The "Greedy Merge Method" is not as computationally intensive as the Exhaustive Search Method and cannot ensure that the optimal grouping configuration is found, but it usually finds a configuration that is as good or almost as good as the optimal configuration. According to this method, adjacent blocks are combined into groups iteratively while taking the secondary value into account.

The "Fast Optimal Method" has a computational complexity that is intermediate between the complexity of the other two methods described above. This iterative method avoids considering certain group configurations based on distortion calculations made in earlier iterations. As in the Exhaustive Search Method, all group configurations are considered, but some configurations can be eliminated from consideration in subsequent iterations in view of the earlier calculations.

8.
Parameters Affecting Secondary Value

Preferably, an implementation of the present invention accounts for changes in the secondary value as it searches for an optimal grouping configuration. The main component of the secondary value for AAC systems is the information needed to represent the values of scale factors. Because scale factors are shared by all the blocks in a group, adding a new group in an AAC encoder increases the secondary value by the amount of additional information needed to represent the additional scale factors. If an implementation of the present invention in an AAC encoder does account for changes in the secondary value, it must use an estimate, because the scale factor values cannot be known until the rate-distortion loop calculation is completed, which must be done after the grouping configuration is established. Scale factors in AAC systems are highly variable and their values are closely tied to the quantization resolution of the spectral coefficients, which is determined in the nested rate/distortion loops. Scale factors in AAC are also entropy coded, which further contributes to the non-deterministic nature of their secondary value.

Other forms of secondary value are possible depending on the specific coding processes that are used to encode the audio information. In AC-3 systems, for example, channel coupling coordinates can be shared across blocks in a manner that favors grouping the coordinates according to a common energy value.

Several aspects of the present invention are applicable to the process in AC-3 systems that selects the "exponent encoding strategy" used to carry the exponents of transform coefficients in a coded signal.
Because the AC-3 exponents are taken as a maximum of the power spectral density values for all the spectral lines that share a given exponent, the optimization process can operate using a maximum error criterion instead of the mean square error criterion used in AAC. In an AC-3 system, the secondary value is the amount of information needed to carry exponents for each new block that does not reuse exponents from the previous block. The exponent encoding strategy, which also determines how many coefficients share exponents across frequency, affects the secondary value if the exponent strategy depends on the grouping configuration. The process needed to estimate the secondary value of the exponents in AC-3 systems is less complex than the process needed to provide an estimate for the scale factors in AAC systems, because the exponent values are calculated early in the encoding process as part of the psychoacoustic model.

C. Detailed Descriptions of the Search Methods

1. Exhaustive Search Method

The exhaustive search method can be implemented using a threshold to limit the number of grouping configurations and the number of groups that are tested. This technique can be simplified by relying exclusively on the threshold value to establish the actual value of p. This can be done by setting the threshold to some number between 0.0 and 1.0 and iterating over the possible numbers of groups p. The optimal group configuration and the resulting distortion function are calculated starting with p = 1, and p is incremented by one after each comparison against T. The resulting distortion is compared to T, and the first value of p for which the distortion function is smaller than T is selected as the optimal number of groups. By setting the value of the threshold T empirically, it is possible to achieve a Gaussian distribution of p across a large sample of short-window frames for a wide variety of different input signals.
This Gaussian distribution can be shifted by changing the value of T, thereby allowing a higher or lower average value of p over a wide variety of input signals.

This process is shown in the block diagram of Figure 2, which shows a process in an outer loop for finding an optimal number of groups. Processes suitable for the inner loop are shown in Figures 3A and 3B and are described below. Any of the distortion functions described in this document can be used, including the functions M(S), M*(S), A(S) and A*(S).

For a given value of p, as determined by the iteration of the outer loop, the inner loop calculates the optimal grouping configuration $S = (s_0, s_1, \ldots, s_p)$ that achieves the lowest mean square error distortion. For small values of N, on the order of 10 or less, it is possible to construct a set of table entries containing all possible ways to divide the N blocks into p groups. The length of each table entry is the number of combinations of 7 items taken (p-1) at a time, written below as "7 selected p-1". There is a separate table entry for every value of p except p = 0, which is undefined, and p = N, which produces the distortion-free solution in which each group contains exactly one block. For 0 < p < N, a preferred implementation of the table stores the partition values for $S = \{s_0, s_1, \ldots, s_p\}$ as bit fields in a table TAB, and the processing in the inner combinatorial loop masks the TAB bit field values to arrive at the absolute values for each $s_m$. The partition values for the bit fields for 0 < p < N are as follows:

Table 1. All Possible Combinations of the Groupings for N = 8

Each entry or row in the table corresponds to a different value of p, for 0 < p < N, N = 8. This table can be used in an iterative process such as those shown in the logic block diagrams of Figures 3A and 3B, which is the inner loop of the process shown in Figure 2.
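The bit-field table described above can be sketched as follows. This is a minimal illustration, not the layout of Table 1 itself (whose contents are not reproduced here): it assumes that each configuration for N = 8 is a 7-bit field in which bit k, counted from the least significant bit, marks a group boundary before block k+1, and that each group covers blocks $s_{m-1}$ through $s_m - 1$. The names `bitfields_for` and `boundaries`, and the dictionary used for `TAB`, are hypothetical.

```python
from itertools import combinations

N = 8  # blocks per frame, as in Table 1


def bitfields_for(p, n=N):
    """Return every (n-1)-bit field with exactly p-1 bits set.

    Assumed layout: bit k set means a new group starts at block k+1.
    """
    fields = []
    for cuts in combinations(range(n - 1), p - 1):
        field = 0
        for k in cuts:
            field |= 1 << k
        fields.append(field)
    return fields


def boundaries(field, n=N):
    """Mask the bits of a field to recover S = (s_0, ..., s_p)."""
    s = [0]
    for k in range(n - 1):
        if field & (1 << k):
            s.append(k + 1)
    s.append(n)
    return s


# Hypothetical stand-in for TAB: TAB[p][r] is the r-th bit field for p groups.
TAB = {p: bitfields_for(p) for p in range(1, N)}

print(len(TAB[3]))            # "7 selected 2" configurations for p = 3
print(boundaries(TAB[3][0]))  # boundary vector S of the first configuration
```

Row p of this table has "7 selected p-1" entries, so an inner loop over r visits every grouping configuration for the value of p supplied by the outer loop.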
This inner loop iterates over all possible group configurations, which are (7 selected p-1) in number. As shown by the notation TAB[p, r] in the block diagrams, the value p provided by the outer loop indexes the table row and the value r indexes the bit field for a particular grouping combination. For each inner loop iteration, the average distortion measure A(S) shown in Figure 3A or, alternatively, the maximum difference distortion M''(S) shown in Figure 3B is calculated in accordance with equation 10 or 12, respectively. The total distortion across all blocks and bands is summed to obtain a single scalar value $A_{sav}$ or, alternatively, $M_{sav}$.

The Exhaustive Search Method can use a variety of distortion measures. For example, the implementation described above uses an L1 norm, but L2 norm or L-infinity norm measures can be used instead. See R. M. Gray, A. Buzo, A. H. Gray, Jr., "Distortion Measures for Speech Processing," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-28, No. 4, August 1980.

2. Fast Optimal Method

The fast optimal method uses the average maximum distortion M'(S) defined above in equation 7. This method obtains an optimal grouping configuration without having to search exhaustively through all possible solutions. As a result, it is not as computationally intensive as the exhaustive search method described above.

a) Definitions

A partition $P(s_0, \ldots, s_p)$ is said to be a partition of level p if it consists of p groups. The dimension d of a group is the number of blocks in that group. Groups with a dimension greater than 1 are referred to as positive groups. The definition of a group $G_m$ expressed in equation 4 is rewritten as $G_m = G(s_{m-1}, s_{m-1}+1, \ldots, s_m)$.

b) Mathematical Preliminaries

A group that has a dimension $d \ge 3$ can be divided into two subgroups that have exactly one block in common. For example, if $G_m = G(s_{m-1}, s_{m-1}+1, \ldots, s_m)$, then the group $G_m$ can be divided into two subgroups $G_{ma} = G(s_{m-1}, s_{m-1}+1, \ldots, s_{m-1}+k)$ and $G_{mb} = G(s_{m-1}+k, \ldots, s_m)$, both of which contain the block that has the index $s_{m-1}+k$. By definition, these two subgroups cannot be part of the same partition. The procedure for dividing a group into two overlapping positive subgroups can be generalized into a procedure that divides a given group into two or more overlapping positive subgroups.

The distortion measure J'(m) defined above in equation 6 always satisfies the following assertion:

$J'(m) \ge J'(ma) + J'(mb)$  (15)

where $G_{ma}$ and $G_{mb}$ are overlapping subgroups of the group $G_m$. This can be proved by showing that $J_{m,j} = \max(J_{ma,j}, J_{mb,j})$ is true for all j, $1 \le j \le K$. By inserting this relation into the definition of J'(m) shown in equation 6, it can be seen that the assertion in expression 15 follows.

c) Description of the Essential Process

The principles underlying the fast optimal method can be understood by first assuming a given partition $P_p$ of level p that minimizes $M'(S) = M'(s_0, \ldots, s_p)$ over all the vectors $(s_0, \ldots, s_p)$ that define a partition of level p. There are partitions F of level p-1 that, independently of the specific values of the spectral coefficients, cannot be the only partition $P_{p-1}$ of level p-1 that minimizes M'(S) over all the vectors S that define a partition of level p-1. In other words, if one of these partitions F minimizes M'(S) over all the vectors S that define a partition of level p-1, then there is at least one other partition that also minimizes M'(S) over all the vectors S that define a partition of level p-1. One can define a subset of these partitions F, represented as X(p, P), which contains particular partitions at level p-1 that can be excluded from some of the processing needed to find an optimal solution, as described in more detail later.
The subset X(p, P) is defined as follows:

(1) Assume that a partition F of level p-1 has n positive groups and that m, $0 < m \le n$, positive groups of this partition can each be replaced by another positive group of the same dimension, and that after the replacement the partition F is transformed into a partition G of level p-1 that has no overlapping groups. If the positive groups of the partition P are a subset of the positive groups of the partition G but not of the partition F, then F belongs to X(p, P).

(2) Assume that a partition F of level p-1 has n positive groups and that m, $0 < m \le n$, positive groups of F can be divided into two or more positive groups. Assume further that one or more of these positive groups can be replaced by a group of the same dimension so as to transform the partition F into a valid partition G of level p-1 that has no overlapping groups. If the positive groups of the partition P are a subset of the positive groups of the partition G but not of the partition F, then according to the assertion made in expression 15, F belongs to X(p, P).

It may be useful to note that, by construction, the set X(p, P) cannot be identical to the set of all partitions of level p-1.

d) Generalized Case (Arbitrary N)

The fast optimal method begins by dividing the N blocks of a frame into p = N groups and calculates the average maximum distortion function, M'(S) or M*(S). This partition is represented as $P_N$. The method then calculates the average maximum distortion function for all N-1 possible ways of partitioning the N blocks into p = N-1 groups. The particular partition among these N-1 partitions that minimizes the average maximum distortion function is represented as $P_{N-1}$. The partitions belonging to the set $X(N-1, P_{N-1})$ are identified as described above. The method then calculates the average maximum distortion function for all possible ways of partitioning the N blocks into N-2 groups that do not belong to the set $X(N-1, P_{N-1})$. The partition that minimizes the average maximum distortion function is represented as $P_{N-2}$. The fast optimal method iterates this process for the remaining levels down to level 1, using the set $X(p, P_p)$ at each level p to reduce the number of partitions at level p-1 that are analyzed as a possible solution. The fast optimal method concludes by finding the partition P among the partitions $P_1, \ldots, P_N$ that minimizes the average maximum distortion function M'(S) or M*(S).

e) Example

The following example is provided to help explain the fast optimal method and to set out the features of one possible implementation. In this example, each frame contains six blocks, or N = 6. A set of control tables can be used to simplify the processing required to determine whether a partition should be added to the set $X(p, P_p)$ as described above. For this example, one set of tables is shown, Tables 2A through 2C.

The notation D(a,b) is used in these tables to identify specific partitions. A partition consists of one or more groups of blocks and can be specified uniquely by the positive groups it contains. For example, a partition of six blocks consisting of four groups, in which the first group contains blocks 1 and 2, the second group contains blocks 3 and 4, the third group contains block 5 and the fourth group contains block 6, can be expressed as (1,2) (3,4) (5) (6) and is shown in the tables as D(1,2) + D(3,4).

Each table provides information that can be used to determine whether a particular partition at level p-1 belongs to the set $X(p, P_p)$ when processing a particular partition $P_p$ at level p.
Table 2A, for example, provides information for determining whether a partition at level 4 belongs to the set $X(5, P_5)$ for each partition of level 5 shown in the top row of the table. The top row of Table 2A lists partitions that consist of five groups. Not all partitions are listed. In this example, the partitions that consist of five groups are D(1,2), D(2,3), D(3,4), D(4,5) and D(5,6). Only the partitions D(1,2), D(2,3) and D(3,4) are shown in the top row of the table. The missing partitions D(4,5) and D(5,6) are symmetric to the partitions D(2,3) and D(1,2), respectively, and can be derived from them. The left column of Table 2A shows partitions that consist of four groups.

The symbols "Y" and "N" shown in each table indicate whether ("N") or not ("Y") the partition at level p-1 shown in the left column should be excluded from further processing for the respective partition $P_p$ shown in the top row of the table in that column. With reference to Table 2A, for example, the level 5 partition D(1,2) has an "N" entry in the row for the level 4 partition D(2,3,4), which indicates that the partition D(2,3,4) belongs to the set X(5, D(1,2)) and should be excluded from further processing. The level 5 partition D(2,3) has a "Y" entry in the row for the level 4 partition D(2,3,4), which indicates that this level 4 partition does not belong to the set X(5, D(2,3)).

In this example, a process that implements the fast optimal method divides the six blocks of a frame into six groups and calculates the average maximum distortion. This partition is represented as $P_6$.

The process calculates the average maximum distortion for the five possible ways of dividing the six blocks into five groups. The particular partition among the five partitions that minimizes the average maximum distortion is represented as $P_5$.

The process refers to Table 2A and selects the column whose top entry specifies the grouping configuration of partition $P_5$.
The process calculates the average maximum distortion for all possible ways of partitioning the six blocks into four groups that have a "Y" entry in the selected column. The partition that minimizes the average maximum distortion is represented as $P_4$.

The process uses Table 2B and selects the column whose top entry specifies the grouping configuration of partition $P_4$. The process calculates the average maximum distortion for all possible ways of partitioning the six blocks into three groups that have a "Y" entry in the selected column. The partition that minimizes the average maximum distortion is represented as $P_3$.

The process uses Table 2C and selects the column whose top entry specifies the grouping configuration of partition $P_3$. The process calculates the average maximum distortion for all possible ways of partitioning the six blocks into two groups that have a "Y" entry in the selected column. The partition that minimizes the average maximum distortion is represented as $P_2$.

The process calculates the average maximum distortion for the partition that consists of one group. This partition is represented as $P_1$.

The process identifies the partition P among the partitions $P_1, \ldots, P_6$ that has the smallest average maximum distortion. This partition P provides the optimal grouping configuration.
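The level-by-level structure of this example can be sketched in code. The sketch below is only an illustration under stated assumptions: the contents of Tables 2A through 2C are not available here, so no partitions are pruned and every partition at each level is evaluated; equation 6 is not legible, so `mean_max_distortion` assumes that J'(m) sums, over the blocks and bands of a group, the gap between the per-band maximum of equation 5 and each block's band energy; each group is assumed to cover blocks $s_{m-1}$ through $s_m - 1$; and Dist{ } is taken to be the identity, so the penalty is simply (p-1)c. All function names and the sample energies are hypothetical.

```python
from itertools import combinations


def mean_max_distortion(vectors, s):
    """Assumed form of M'(S): per group, sum of (band maximum - band energy)."""
    total = 0.0
    for start, end in zip(s[:-1], s[1:]):   # group covers blocks start..end-1
        group = vectors[start:end]
        for band in zip(*group):            # energies of one band across the group
            peak = max(band)                # J_{m,j} of equation 5
            total += sum(peak - v for v in band)
    return total


def best_partition_per_level(vectors, distortion=mean_max_distortion):
    """Return {p: (distortion, S)} holding the best partition P_p at each level."""
    n = len(vectors)
    best = {}
    for p in range(n, 0, -1):               # P_N first, then P_{N-1}, ..., P_1
        candidates = []
        for cuts in combinations(range(1, n), p - 1):
            s = (0, *cuts, n)
            candidates.append((distortion(vectors, s), s))
        best[p] = min(candidates)           # no table-based pruning in this sketch
    return best


def select_partition(vectors, c=0.0):
    """Pick the level that minimizes distortion plus the penalty (p - 1) * c."""
    best = best_partition_per_level(vectors)
    p = min(best, key=lambda level: best[level][0] + (level - 1) * c)
    return best[p][1]


# Six blocks with two bands each; the energies are made up for illustration.
frame = [(1.0, 1.0), (1.0, 1.0), (5.0, 5.0), (5.0, 5.0), (5.0, 5.0), (2.0, 2.0)]
print(select_partition(frame, c=0.5))
```

A table-driven implementation would skip, at each level, the candidates marked for exclusion in the column selected by the previous level's winner, which is where the saving over this unpruned sketch would come from.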
Table 2A. Fast Optimal Group Elimination Table for p = 5

Table 2B. Fast Optimal Group Elimination Table for p = 4

Table 2C. Fast Optimal Group Elimination Table for p = 3

3. Description of the Greedy Merge Process

The Greedy Merge process provides a simplified technique for dividing the blocks in a frame into groups. While the Greedy Merge method does not guarantee that the optimal grouping configuration will be found, for most practical applications the reduction in computational complexity provided by this method may outweigh a possible loss of optimality.

The Greedy Merge method can use a wide variety of distortion measurement functions, including those described above. A preferred implementation uses the function shown in expression 11.

Figure 4 shows a block diagram of a suitable Greedy Merge method that operates as follows: the band energy vectors $V_i$ are calculated for each block i. A set of N groups is created, each containing one block. The method then tests all N-1 adjacent pairs of groups and finds the two adjacent groups g and g+1 that minimize equation 11. The minimum value of J'' from equation 11 is represented as q. The minimum value q is then compared to a distortion threshold T. If the minimum value is greater than the threshold T, the method ends with the current grouping configuration identified as the optimal or near-optimal configuration. If the minimum value is less than the threshold T, the two groups g and g+1 are merged into a new group that contains the band energy vectors of the two groups g and g+1. The method iterates until the distortion measure J'' for all pairs of adjacent groups exceeds the distortion threshold T or until all the blocks have been merged into one group.

An example of how this method operates on a frame of four blocks is shown in Figure 5. In this example, the four blocks are initially arranged into four groups a, b, c and d that each contain one block.
The method then finds the two adjacent groups that minimize equation 11. In the first iteration, the method finds that groups b and c minimize equation 11 with a distortion measure J'' that is less than the distortion threshold T; therefore, the method merges groups b and c into a new group to obtain three groups a, bc and d. In the second iteration, the method finds that the two adjacent groups a and bc minimize equation 11 and that the distortion measure J'' for this pair of groups is less than the threshold T. Groups a and bc are merged into a new group to give a total of two groups, abc and d. In the third iteration, the method finds that the distortion measure J'' for the only remaining pair of groups is greater than the distortion threshold T; therefore, the method ends, leaving the two final groups abc and d as the optimal or near-optimal grouping configuration.

The actual order of computational complexity for the Greedy Merge method depends on the number of times the method must iterate before the threshold is exceeded; however, the number of iterations is bounded between 1 and ½ · N · (N-1).

D. Implementation

Devices incorporating various aspects of the present invention can be implemented in a variety of ways, including as a computer program for execution by a computer or by some other device that includes more specialized components such as digital signal processor (DSP) circuitry coupled to components similar to those found in a general-purpose computer. Figure 6 is a schematic block diagram of a device 70 that can be used to implement aspects of the present invention. The DSP 72 provides computing resources. The RAM 73 is system random access memory (RAM) used by the DSP 72 for processing. The ROM 74 represents some form of persistent storage, such as read-only memory (ROM), for storing programs needed to operate the device 70 and possibly to carry out various aspects of the present invention.
The I/O control 75 represents the interface circuitry for receiving and transmitting signals via communication channels 76, 77. In the embodiment shown, all the main components of the system are connected to the bus 71, which can represent more than one physical or logical bus; however, a bus architecture is not required to implement the present invention.

In embodiments implemented by a general-purpose computer system, additional components may be included for interfacing with devices such as a keyboard or mouse and a display, and for controlling a storage device having a storage medium such as a magnetic tape or disk or an optical medium. The storage medium can be used to record programs of instructions for operating systems, utilities and applications, and can include programs that implement various aspects of the present invention.

The functions required to practice various aspects of the present invention can be performed by components that are implemented in a wide variety of ways, including discrete logic components, integrated circuits, one or more ASICs and/or program-controlled processors. The manner in which these components are implemented is not important to the present invention.

Computer program implementations of the present invention can be conveyed by a variety of machine-readable media, such as baseband or modulated communication paths across the spectrum from supersonic to ultraviolet frequencies, or storage media that convey information using essentially any recording technology, including magnetic tapes, cards or disks, optical cards or disks, and detectable markings on media including paper.
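As one illustration of a computer program implementation, the Greedy Merge method of section C.3 might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of Figure 4: equation 11 is not legible in the source text, so `pair_cost` is a hypothetical stand-in for J'' (the largest L1 distance between any two band energy vectors in the merged group), and the case in which the minimum equals the threshold, which the description leaves open, is treated here as a stop condition. The function names and sample energies are invented for the example.

```python
def pair_cost(group_a, group_b):
    """Hypothetical stand-in for J'': largest L1 distance inside the merged group."""
    merged = group_a + group_b
    return max(
        sum(abs(x - y) for x, y in zip(u, v))
        for u in merged
        for v in merged
    )


def greedy_merge(vectors, threshold, cost=pair_cost):
    """Merge adjacent groups while the cheapest merge stays below the threshold."""
    groups = [[v] for v in vectors]          # start with one block per group
    while len(groups) > 1:
        # Test every adjacent pair of groups g and g+1.
        costs = [cost(groups[g], groups[g + 1]) for g in range(len(groups) - 1)]
        q = min(costs)
        if q >= threshold:                   # assumption: equality also stops
            break
        g = costs.index(q)
        groups[g:g + 2] = [groups[g] + groups[g + 1]]
    return groups


# Four blocks a, b, c, d with two bands each, loosely mirroring Figure 5.
frame = [(1.0, 1.0), (2.0, 2.0), (2.0, 2.5), (9.0, 9.0)]
print([len(g) for g in greedy_merge(frame, threshold=4.0)])
```

With these made-up energies the sketch first merges b and c, then merges a with bc, and stops with the groups abc and d, which is the sequence described for Figure 5.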

Claims (27)

  1. A method for processing blocks of audio information arranged in frames, each block having content representing a respective time interval of audio information, characterized in that the method comprises: (a) receiving an input signal carrying the blocks of audio information; (b) obtaining two or more measures of quality such that: (1) each set in a plurality of sets of groups of the blocks in a respective frame has an associated measure of quality, (2) each group has one or more blocks, (3) each set of groups includes all the blocks in the respective frame and no block is included in more than one group in each set, and (4) the measure of quality represents excellence in results obtainable by processing each block in a respective group according to an associated set of one or more control parameters; (c) analyzing the measures of quality to identify a selected set of groups having a minimum number of groups such that a measure of processing performance obtained at least in part from the associated measure of quality is higher than a threshold; and (d) processing each group of blocks in the selected set of groups according to the associated set of one or more control parameters to generate an output signal representing contents of the input signal and representing the associated set of control parameters for each group in the selected set.
  2. The method according to claim 1, characterized in that the blocks comprise time domain samples of the audio information.
  3. The method according to claim 1, characterized in that the blocks comprise frequency domain coefficients of the audio information.
  4. The method according to claim 1, characterized in that at least one pair of blocks in the groups that have more than one block has content representing the audio information in time intervals that are adjacent to each other or overlap each other.
  5. The method according to claim 1, characterized in that it comprises: obtaining two or more measures of value, each measure of value affiliated with a set of groups of blocks, wherein the measure of value represents an amount of resources needed to process the blocks in the affiliated set according to the associated set of control parameters; wherein the measure of processing performance is obtained in part from the measure of value affiliated with the selected set.
  6. The method according to claim 1 or 5, characterized in that the analysis is performed in one or more iterations of an iterative process to determine one or more sets of groups that are not candidates for the selected set and excludes the analysis of one or more of these sets in subsequent iterations of the process.
  7. The method according to claim 1 or 5, characterized in that the selected set is identified by an iterative process comprising: determining a second measure of processing performance for the pairs of groups in an initial set of groups; merging the pair of groups having the highest second measure of processing performance to form a revised set of groups, provided that the highest second measure of processing performance is greater than a threshold, and determining the second measure of processing performance for the pairs of groups in the revised set of groups; and continuing the merging until no pair of groups in the revised set of groups has a second measure of processing performance that is greater than the threshold, wherein the revised set of groups is the selected set.
  8. The method according to claim 5, characterized in that the measures of value are responsive to the amounts of data needed to represent the sets of control parameters in the encoded signal.
  9. The method according to claim 5, characterized in that the measures of value are responsive to the amounts of computational resources needed to process the blocks of audio information.
  10. An apparatus for processing blocks of audio information arranged in frames, each block having content representing a respective time interval of audio information, characterized in that the apparatus comprises: means for receiving an input signal carrying the blocks of audio information; means for obtaining two or more measures of quality such that: (1) each set in a plurality of sets of groups of the blocks in a respective frame has an associated measure of quality, (2) each group has one or more blocks, (3) each set of groups includes all the blocks in the respective frame and no block is included in more than one group in each set, and (4) the measure of quality represents excellence in results obtainable by processing each block in a respective group according to an associated set of one or more control parameters; means for analyzing the measures of quality to identify a selected set of groups having a minimum number of groups such that a measure of processing performance obtained at least in part from the associated measure of quality is higher than a threshold; and means for processing each group of blocks in the selected set of groups according to the associated set of one or more control parameters to generate an output signal representing contents of the input signal and representing the associated set of control parameters for each group in the selected set.
  11. The apparatus according to claim 10, characterized in that the blocks comprise time domain samples of the audio information.
  12. The apparatus according to claim 10, characterized in that the blocks comprise frequency domain coefficients of the audio information.
  13. The apparatus according to claim 10, characterized in that at least one pair of blocks in the groups having more than one block has content representing the audio information in time intervals that are adjacent to each other or overlap each other.
  14. The apparatus according to claim 10, characterized in that it comprises: means for obtaining two or more measures of value, each measure of value affiliated with a set of groups of blocks, wherein the measure of value represents a quantity of resources needed to process the blocks in the affiliated set according to the associated set of control parameters; wherein the measure of processing performance is obtained in part from the measure of value affiliated with the selected set.
  15. The apparatus according to claim 10 or 14, characterized in that the means for analyzing performs its analysis iteratively to determine one or more sets of groups that are not candidates for the selected set and excludes one or more of these sets from the analysis in subsequent iterations.
  16. The apparatus according to claim 10 or 14, characterized in that the means for analyzing performs its analysis by: determining a second measure of processing performance for the pairs of groups in an initial set of groups; merging the pair of groups having the highest second measure of processing performance to form a revised set of groups, provided that the highest second measure of processing performance is greater than a threshold, and determining the second measure of processing performance for the pairs of groups in the revised set of groups; and continuing the merging until no pair of groups in the revised set of groups has a second measure of processing performance that is greater than the threshold, wherein the revised set of groups is the selected set.
  17. The apparatus according to claim 14, characterized in that the measures of value are responsive to the amounts of data needed to represent the sets of control parameters in the encoded signal.
  18. The apparatus according to claim 14, characterized in that the measures of value are responsive to the amounts of computational resources needed to process the blocks of audio information.
  19. A medium that carries a program of instructions that is executable by a device to carry out a method for processing blocks of audio information arranged in frames, each block having content representing a respective time interval of audio information, characterized in that the method comprises: (a) receiving an input signal carrying the blocks of audio information; (b) obtaining two or more quality measures such that: (1) each set in a plurality of sets of groups of the blocks in a respective frame has an associated measure of quality, (2) each group has one or more blocks, (3) each set of groups includes all the blocks in the respective frame and no block is included in more than one group in each set, and (4) the quality measure represents excellence in the results that can be obtained by processing each block in a respective group according to an associated set of one or more control parameters; (c) analyzing the quality measures to identify a selected set of groups having a minimum number of groups such that a measure of processing performance obtained at least in part from the associated measure of quality is higher than a threshold; and (d) processing each group of blocks in the selected set of groups according to the associated set of one or more control parameters to generate an output signal representing the contents of the input signal and representing the associated set of control parameters for each group in the selected set.
  20. The medium according to claim 19, characterized in that the blocks comprise time domain samples of the audio information.
  21. The medium according to claim 19, characterized in that the blocks comprise frequency domain coefficients of the audio information.
  22. The medium according to claim 19, characterized in that at least one pair of blocks in the groups having more than one block has content representing the audio information in time intervals that are adjacent to each other or overlap each other.
  23. The medium according to claim 19, characterized in that the method comprises: obtaining two or more measures of value, each measure of value affiliated with a set of groups of blocks, wherein the measure of value represents a quantity of resources needed to process the blocks in the affiliated set according to the associated set of control parameters; wherein the measure of processing performance is obtained in part from the measure of value affiliated with the selected set.
  24. The medium according to claim 19 or 23, characterized in that the analysis is performed in one or more iterations of an iterative process to determine one or more sets of groups that are not candidates for the selected set and excludes one or more of these sets from the analysis in subsequent iterations of the process.
  25. The medium according to claim 19 or 23, characterized in that the selected set is identified by an iterative process comprising: determining a second measure of processing performance for the pairs of groups in an initial set of groups; merging the pair of groups having the highest second measure of processing performance to form a revised set of groups, provided that the highest second measure of processing performance is greater than a threshold, and determining the second measure of processing performance for the pairs of groups in the revised set of groups; and continuing the merging until no pair of groups in the revised set of groups has a second measure of processing performance that is greater than the threshold, wherein the revised set of groups is the selected set.
  26. The medium according to claim 23, characterized in that the measures of value are responsive to the amounts of data needed to represent the sets of control parameters in the encoded signal.
  27. The medium according to claim 23, characterized in that the measures of value are responsive to the amounts of computational resources needed to process the blocks of audio information.
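
The iterative merging recited in claims 7, 16 and 25 can be illustrated with a minimal Python sketch. It is not the implementation described in the patent: the names greedy_merge and make_toy_measure, the restriction to merging only adjacent groups, and the toy pair measure (a fixed side-information saving minus the added squared error incurred by sharing one parameter value) are all assumptions introduced here for illustration.

```python
from typing import Callable, List, Sequence

Group = List[int]  # indices of the blocks that share one set of control parameters


def greedy_merge(
    num_blocks: int,
    pair_measure: Callable[[Group, Group], float],
    threshold: float,
) -> List[Group]:
    """Merge groups pairwise while the best pair's measure exceeds the threshold.

    Assumption: only time-adjacent groups are candidates for merging.
    """
    # Initial set of groups: every block is its own group.
    groups: List[Group] = [[i] for i in range(num_blocks)]
    while len(groups) > 1:
        # "Second measure of processing performance" for each adjacent pair.
        scores = [
            pair_measure(groups[i], groups[i + 1])
            for i in range(len(groups) - 1)
        ]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] <= threshold:
            break  # no pair exceeds the threshold: current set is selected
        # Replace the best pair with its union to form the revised set.
        groups[best:best + 2] = [groups[best] + groups[best + 1]]
    return groups


def make_toy_measure(
    params: Sequence[float], side_info_saving: float
) -> Callable[[Group, Group], float]:
    """Hypothetical pair measure: saving from sending one parameter set
    instead of two, minus the squared error added by sharing the mean."""

    def sse(values: List[float]) -> float:
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    def measure(a: Group, b: Group) -> float:
        va = [params[i] for i in a]
        vb = [params[i] for i in b]
        added_distortion = sse(va + vb) - sse(va) - sse(vb)
        return side_info_saving - added_distortion

    return measure


if __name__ == "__main__":
    block_params = [1.0, 1.0, 1.0, 5.0, 5.0, 9.0]  # made-up per-block values
    measure = make_toy_measure(block_params, side_info_saving=1.0)
    print(greedy_merge(len(block_params), measure, threshold=0.0))
```

In this sketch, raising the threshold or lowering side_info_saving yields more groups (less sharing), while a large saving relative to the added distortion drives the result toward a single group per frame.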
MXPA/A/2006/008224A 2004-01-20 2006-07-20 Audio coding based on block grouping MXPA06008224A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US60/537,984 2004-01-20

Publications (1)

Publication Number Publication Date
MXPA06008224A MXPA06008224A (en) 2007-04-10

Similar Documents

Publication Publication Date Title
US7840410B2 (en) Audio coding based on block grouping
EP1905011B1 (en) Modification of codewords in dictionary used for efficient coding of digital media spectral data
US7630882B2 (en) Frequency segmentation to obtain bands for efficient coding of digital media
EP2293293B1 (en) Adaptive hybrid transform for signal analysis and synthesis
CN101971253B (en) Encoding device, decoding device, and method thereof
EP2487681A1 (en) Multi-stage quantization method and device
JP2007523366A5 (en)
CN101180675A (en) Predictive encoding of a multi channel signal
CN111968655A (en) Signal encoding method and apparatus, and signal decoding method and apparatus
Chan et al. High fidelity audio transform coding with vector quantization
CA2551281A1 (en) Voice/musical sound encoding device and voice/musical sound encoding method
US7117053B1 (en) Multi-precision technique for digital audio encoder
US20110135007A1 (en) Entropy-Coded Lattice Vector Quantization
EP2525354B1 (en) Encoding device and encoding method
US6775587B1 (en) Method of encoding frequency coefficients in an AC-3 encoder
MXPA06008224A (en) Audio coding based on block grouping
JP5799824B2 (en) Audio encoding apparatus, audio encoding method, and audio encoding computer program
JP2842276B2 (en) Wideband signal encoding device
Goodwin Multichannel matching pursuit and applications to spatial audio coding
KR20190069192A (en) Method and device for predicting channel parameter of audio signal
EP2192577B1 (en) Optimization of MP3 encoding with complete decoder compatibility
AU2012247062B2 (en) Adaptive Hybrid Transform for Signal Analysis and Synthesis
Chan et al. High Fidelity Audio Coding with Generalized Product Code VQ
Zagursky et al. Testing the methods of sound signal compression
Mikkonen et al. Soft-decision decoding of binary block codes in CELP speech coding