WO2006113921A1 - Quantization of speech and audio coding parameters using partial information on atypical subsequences - Google Patents

Info

Publication number
WO2006113921A1
WO2006113921A1 PCT/US2006/015251 US2006015251W WO2006113921A1 WO 2006113921 A1 WO2006113921 A1 WO 2006113921A1 US 2006015251 W US2006015251 W US 2006015251W WO 2006113921 A1 WO2006113921 A1 WO 2006113921A1
Authority
WO
WIPO (PCT)
Prior art keywords
subsequences
information
quantization
fidelity
bit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2006/015251
Other languages
English (en)
French (fr)
Inventor
Sean A. Ramprashad
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NTT Docomo Inc
Original Assignee
NTT Docomo Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NTT Docomo Inc
Priority to DE602006009495T (patent DE602006009495D1, de)
Priority to JP2008507957A (patent JP4963498B2, ja)
Priority to AT06751085T (patent ATE444550T1, de)
Priority to EP06751085A (patent EP1872363B1, en)
Publication of WO2006113921A1 (en)

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/032Quantisation or dequantisation of spectral components
    • G10L19/028Noise substitution, i.e. substituting non-tonal spectral components by noisy source

Definitions

  • the present invention relates to the field of information coding; more particularly, the present invention relates to quantization of data using information on atypical behavior of subsequences within the sequence of data to be quantized.
  • Speech and audio coders typically encode signals by a combination of statistical redundancy removal and perceptual irrelevancy removal followed by quantization (encoding) of the remaining normalized parameters.
  • the stages of redundancy and irrelevancy removal must be efficient.
  • the stages of redundancy and irrelevancy removal may be made efficient using a Linear Predictive Coefficient (LPC) Model of the gross (short-term) shape of the signal spectrum.
  • This model is a highly compact representation that is used in many designs, e.g. in Code Excited Linear Predictive Coders, Sinusoidal Coders, and other coders like the TWIN-VQ and Transform Predictive Coders.
  • the LPC model itself can be efficiently encoded using various state of the art techniques, e.g., vector quantization and predictive quantization of Line Spectral Pair parameters, etc.
  • Another example of how the stages of redundancy and irrelevancy removal may be made efficient is using compact specifications of the harmonic or pitch structure in the signal. These structures represent redundant structure in the frequency domain or (long-term) redundant structure in the time domain. Common techniques often use a parameter specifying the periodicity of such structures, e.g., the distance between spectral peaks of frequency domain representations or the distance between quasi-stationary time-domain waveforms, using classic parameters such as a pitch delay (time domain) or a "delta-f" (frequency domain).
  • An additional example of how the stages of redundancy and irrelevancy removal may be made efficient is using gain factors to explicitly encode the approximate value of signal energy in different time and/or frequency domain regions. Various techniques for encoding these gains can be used, including scalar or vector quantization of gains or parametric techniques such as the use of the LPC model mentioned above. These gains are often then used to normalize the signal in different areas before further encoding.
  • a target noise/quantization level for different time/frequency regions.
  • the levels are calculated by analyzing the spectral and time characteristics of the input signal.
  • the level can be specified by many techniques including explicitly through a bit-allocation or a noise-level parameter (such as a quantization step size) known at the encoder and at the decoder or implicitly through the variable-length quantization of parameters in the encoder.
  • the target levels themselves are often perceptually relevant and form the basis for some of the irrelevancy removal. Often these levels are specified in a gross manner with a single target level applying to a given region (group of parameters) in time or frequency.
  • an MPEG-type coder takes a vector of MDCT coefficients, analyzes the input signal, and produces fidelity criteria for different groups of MDCT coefficients. Generally, a group of coefficients spans a certain support area in time and frequency. Coders like the transform predictive coder and basic transform coders use information on signal energy in a given subband to infer a bit-allocation for that band.
  • the creation of criteria is the basis for most speech and audio coding schemes that adapt to the signal.
  • the criteria's creation is the function of earlier stages of the coding algorithm dealing with redundancy removal and irrelevancy removal. These stages produce fidelity criteria for each target sequence "x" of parameters.
  • a single target "x" could represent a single subband or scale-factor band in coders. In general, there are many such "x" in a given frame of speech or audio, each "x" having its own fidelity criteria.
  • These fidelity criteria themselves can be functions of the gross statistical and irrelevancy variations noted by earlier schemes.
  • variable-length quantization e.g. Huffman codes.
  • the codeword assigned to each target vector during quantization is represented by a variable-length code.
  • the code used tends to be longer for codewords that are used less frequently, and shorter for codewords that are used more frequently. Essentially, the situation can be that "typical" codewords are represented more efficiently and "atypical" codewords less efficiently.
  • the number of bits used to describe codewords is less than if a fixed-length code (a fixed number of bits) is used to represent codeword indices.
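The variable-length coding idea described above can be sketched with a small Huffman construction. This is an illustrative example only, not the coder of any particular standard; the function name `huffman_code_lengths` and the index stream are assumptions made for the illustration.

```python
import heapq
from collections import Counter

def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code built from counts."""
    # Heap entries: (total count, unique tie-breaker, {symbol: depth so far}).
    heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single symbol still needs one bit in this sketch.
        return {sym: 1 for sym in freqs}
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]

# Hypothetical stream of codeword indices: index 0 is "typical", 2 and 3 are "atypical".
indices = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
lengths = huffman_code_lengths(Counter(indices))
total_variable = sum(lengths[i] for i in indices)  # 28 bits
total_fixed = 2 * len(indices)                     # 32 bits with a 2-bit fixed-length code
```

In this toy stream the frequent index gets a 1-bit code and the rare indices get 3-bit codes, so the total drops from 32 to 28 bits.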
  • a method and apparatus for quantizing parameters using partial information on atypical subsequences. In one embodiment, the method comprises partially classifying a first plurality of subsequences in a target vector into a number of selected groups, creating a refined fidelity criterion for each subsequence of the first plurality of subsequences based on information derived from classification, dividing a target vector into a second plurality of subsequences, and encoding the second plurality of subsequences, which includes quantizing the second plurality of subsequences, given the refined fidelity criterion. In another embodiment, the first and second plurality can be the same.
  • Figure 1 is a flow diagram of one embodiment of a quantization process.
  • Figure 2 is a flow diagram of one embodiment of an inverse quantization process.
  • Figure 3 illustrates a flow diagram of one embodiment of an encoding process.
  • Figure 4 is a flow diagram of one embodiment of the decoding process.
  • Figure 5 illustrates a flow diagram of one embodiment of an encoding process having an additional perceptual enhancement to the bit allocation.
  • Figure 6 illustrates a flow diagram of one embodiment of a decoding process having an additional perceptual enhancement to the bit allocation.
  • Figure 7 illustrates a flow diagram of one embodiment of a decoding process having a noise-fill operation.
  • Figure 8 illustrates a flow diagram of one embodiment of an encoding process having adaptive quantization.
  • Figure 9 is a block diagram of one embodiment of a computer system.
  • DETAILED DESCRIPTION OF THE PRESENT INVENTION
  • the quantization is performed under practical constraints of a limited quantizer dimension and operates at low bit rates.
  • the techniques described herein also have properties that naturally allow them to take advantage of perceptual considerations and irrelevancy removal.
  • a sequence of parameters that can no longer benefit from classic statistical redundancy removal techniques is divided into smaller pieces (subsequences).
  • a subset, or a number of subsets, of these subsequences are tagged as containing a statistical variation.
  • This variation is referred to herein as an "atypical" behavior and such tagged sequences are termed "atypical" sequences. That is, from a vector of parameters for which there is no assumed statistical structure, partial (incomplete) information is created about actual (generally random) variations that do exist between subsequences of parameters contained within that vector. The information to be used is partial because it is not a complete specification of the statistical variations.
  • the partial information is used by both the encoder and decoder to modify their handling of the entire sequence of parameters.
  • the decoder and encoder do not require complete knowledge of which sequences are "atypical", or complete information on the types of variations.
  • the partial information is encoded into the bitstream and sent to the decoder with a lower overhead than if complete information had been encoded and sent. A number of approaches on how to specify this information and on how to modify coder behavior based on this information are described below.
  • the new method takes in a target vector, in this case only one of the types of "x" mentioned above in the prior art, further divides this "x" into multiple subsequences, and produces refined fidelity criteria for each subsequence.
  • the fidelity criteria are implemented in terms of bit assignments for the subsequences.
  • bit assignments across the subsequences are created as a function of the partial information. Furthermore, and optionally, these operations include creating purposeful patterns in the bit-assignment to improve perceptual performance given the partial information yet also within the remaining uncertainty not covered by the partial information.
  • a procedure encourages the increasing of the number of areas (subsequences) in the vector effectively receiving zero-bit assignments.
  • This embodiment can further take advantage of this approach by using noise-fill to create a usable signal for the areas receiving zero-bit assignments.
  • This joint procedure is effective for very low bit-rates.
  • the noise-fill itself can adapt based on the exact pattern or during the quantization process. For example, the energy of the noise-fill may be adapted.
  • the operations also include quantizing (encoding) and inverse-quantizing (decoding) the entire target using the bit-allocation and noise-fill to produce a coded version of the vector of parameters.
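The pairing of zero-bit assignments with noise-fill can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `noise_fill`, the Gaussian noise source, and the single `noise_rms` level are hypothetical choices, not details given in the text.

```python
import random

def noise_fill(decoded, bits, noise_rms, seed=0):
    """Replace every subsequence that received zero bits with scaled pseudo-random noise.

    decoded   : list of decoded subsequences w(1..n); zero-bit ones carry no information
    bits      : bit assignment f(1..n) known to both encoder and decoder
    noise_rms : assumed target RMS level for the fill (could itself be adapted)
    """
    rng = random.Random(seed)  # a shared seed keeps the fill reproducible
    out = []
    for w, b in zip(decoded, bits):
        if b == 0:
            raw = [rng.gauss(0.0, 1.0) for _ in w]
            rms = (sum(v * v for v in raw) / len(raw)) ** 0.5 or 1.0
            w = [v * noise_rms / rms for v in raw]  # rescale to the target level
        out.append(list(w))
    return out

filled = noise_fill([[0.3, -0.2], [0.0, 0.0]], bits=[4, 0], noise_rms=0.5)
```

Subsequences with a non-zero bit assignment pass through unchanged; only the zero-bit areas receive the fill.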
  • the techniques described herein do not rely on any predictable or structured statistical variation across subsequences.
  • the techniques work even when the components of the sequence come from an independent and identically distributed statistical source.
  • the techniques do not need to provide information for all subsequences, or complete information on any given subsequence.
  • only partial and possibly imprecise information is provided on the presence and nature of atypical subsequences. This is beneficial as it reduces the amount of information that is transmitted for such information.
  • the fact that the information is partial means that within the uncertainty not specified by the information one can select permutations (quantization options) that have known or potential perceptual advantages. Without any partial information the uncertainty is too great to create or distinguish permutations, and with complete information there is no uncertainty.
  • the partial information is simply encoded into a numeric symbol "V".
  • the original criteria "C" and "V" together directly generate refined criteria.
  • the refined criteria can consist of a pattern of a number of sub-criteria that together conform to "C".
  • subsequences are arranged in groups, and each group represents a certain classification of a variation of interest.
  • a subsequence's membership in a group implies that the subsequence is more likely to have (not necessarily has) this noted variation.
  • the embodiment allows for a balance between perfect membership information and imprecise membership information. Imprecise membership information simply conveys that a given type of information (classification) is more likely. For example, subsequence "k" may be assigned a membership to group "j", simply because it takes less information than assigning subsequence "k” to another group.
  • one of the groups used signifies that no classification is being conveyed about members of that group, only the information implicit from not being a member of other groups. Again, this is an example of partial information.
  • the type of information can adapt, that is, the number and definition of groups can be selected from multiple possibilities. The possibility selected for a given "x" is indicated as part of the encoded information.
  • This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer.
  • a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus.
  • a machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer).
  • a machine-readable medium includes read only memory ("ROM”); random access memory (“RAM”); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other form of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.); etc.
  • the encoder and decoder adjust their coding strategy to improve objective performance, e.g., to improve the expected mean square error, and to take advantage of perceptual effects of quantization.
  • a variation from an expected behavior can signify that subsequences with such variations should have either preferential or non-preferential (even detrimental) treatment.
  • This variation in treatment can be done by creating a non-trivial pattern of bit allocations across a group of target vectors (e.g., groups of such i.i.d. vectors).
  • a bit allocation signifies how precisely a target vector (subsequence) is to be represented.
  • the underlying base methodology is to create this partial information (information that is not necessarily based on any statistical structure), to use the partial information to create non-trivial patterns of bit assignments, and to use these patterns effectively and purposefully with noise-fill and perceptual masking techniques.
  • Figure 1 is a flow diagram of one embodiment of a quantization process. The process is performed by processing logic at the encoder.
  • processing logic may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both.
  • The process begins with an input of a target vector "x" 120 to be encoded as well as a target global fidelity criterion "B" 121.
  • the global criterion is simply the criterion (or resource in bits) that is to be applied to the total vector. Both the target and the global criterion are assumed to be generated in earlier coding stages of redundancy and irrelevancy removal.
  • Target vector "x" 120 consists of a sequence of "M” symbols.
  • Target global fidelity "B" 121 is known by the decoder, pre-determined and/or noted from information (bits) sent in the bitstream from earlier coding stages.
  • Processing logic initially interleaves the target vector (processing block 101). This is optional.
  • the interleaving is done by an interleaving function.
  • information "i" specifying this function (represented as a sequence of bits) is packed into the bitstream and sent to the decoder.
  • If the interleaving function "i" is fixed or known a priori at the decoder, for example as assumed for "B" above, no information needs to be sent to the decoder.
  • the interleaving has many uses, one being to potentially randomize blocking (localized area) effects of quantization.
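The optional interleaving step and its inverse at the decoder can be sketched as a permutation and its undoing. This is a hedged illustration: deriving the permutation from a shared seed is an assumption made here to stand in for the information "i"; the text does not prescribe how the function is represented.

```python
import random

def make_interleaver(M, seed):
    """Build a permutation of 0..M-1; the seed plays the role of the information 'i'."""
    perm = list(range(M))
    random.Random(seed).shuffle(perm)
    return perm

def interleave(x, perm):
    """Encoder side: output position pos takes the symbol from input position perm[pos]."""
    return [x[j] for j in perm]

def deinterleave(z, perm):
    """Decoder side: put each symbol back at its original position."""
    out = [None] * len(z)
    for pos, j in enumerate(perm):
        out[j] = z[pos]
    return out

x = [10, 11, 12, 13, 14, 15, 16, 17]
perm = make_interleaver(len(x), seed=1234)
restored = deinterleave(interleave(x, perm), perm)
```

If the permutation is fixed in advance at both ends, nothing needs to be transmitted; otherwise only a compact description of it (here, the seed) would be sent.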
  • Processing logic then divides the target vector 120 into a number of subsequences. This division is referred to herein as "Division 1."
  • this division is a function, at least in part, of the fidelity criteria "B."
  • the length of subsequences and the number of subsequences can be a function of "B".
  • the division is a function, at least in part, of the dimension "M" of the target 120.
  • the division is a function of any other side- information from previous coding stages. Note that the division need not be a function of any of them. Regardless, it is assumed that the decoder knows all relevant information and thus can recreate information on the parsing of Division 1.
  • Division 1 can also be a function of another division referred to herein as “Division 2,” which is described below and used when quantizing (encoding) the subsequences.
  • Processing logic analyzes these subsequences to determine if any subsequence represents and/or contains a variation in behavior that is of interest (processing block 103). Such "atypical" subsequences, subsequences with “atypical” variations, are noted and the indices of some are selected for inclusion in the partial information that is sent to the decoder. Note, subsequences that do not have the behavior of interest may also be selected for such classification.
  • Processing logic encodes information on the indices of "atypical" subsequences and possibly the type of variation they represent into a parameter "V" (processing block 104).
  • This parameter is represented by a sequence of bits to be packed into the bitstream.
  • this parameter defines the membership of the subsequences in different groups. It is not necessary that all subsequences are assigned to a group. It is not necessary that subsequences in a group actually have or represent the same "atypical" variations. Membership in a group only indicates that one can treat these subsequences as if they had such a variation. For example, it may be more efficient to give more subsequences preferential treatment than to spend resources specifying and limiting which subsequences receive preferential treatment.
  • processing logic also divides the target into subsequences y(1), ..., y(n) (processing block 106).
  • This division (referred to herein as “Division 2") does not have to be the same as the division (Division 1) used in analyzing the variations within target vector 120.
  • Division 2 is a function of "B" and "M" or any other side information sent from previous coding stages. In one embodiment, Division 2 is a function of "V". For simplicity of illustration, it is assumed that these subsequences are each of "p" symbols.
  • Processing logic uses the fidelity target "B" and the partial information parameter represented by "V" to generate refined fidelity criteria f(1), ..., f(n) for the target subsequences in Division 2, where f(k) applies to the target y(k) (processing block 105).
  • Perceptual enhancements can be implicitly represented in the fidelity criteria f(1), ..., f(n) by further refinements (permutations on the assignments) as discussed below.
  • processing logic tests whether there is new information to further refine the criteria (processing block 108) and, if so, determines whether the quantization information obtained as the quantization process proceeds (part of the information that is sent to processing block 115) can actually refine the criteria (processing block 109). If so, processing logic sends the information to processing block 105.
  • This optional iterative step may improve performance in some cases.
  • the quantized version of y(k)'s can directly be used to change the quantization for future y(k)'s.
  • the quantized versions of y(k)'s are recovered in the same order as at encoding, and so the process can be repeated exactly at the decoder.
  • One adaptation is simply to use the quantized y(k)'s known at a given time to estimate the actual energy of the original y(k)'s. This provides information possibly about the energy of the remaining y(k)'s and thus this information can be used to adapt quantization techniques. Often the entire vector "x" has a given total expected energy due to the original statistical normalization process from earlier encoding steps. This makes such an estimation possible.
  • the estimated energy of prior y(k)'s can indicate the potential perceptual significance, or perceptual relevance, of future y(k)'s.
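The energy-based adaptation described above can be sketched as simple bookkeeping that the encoder and decoder can both run. This is a minimal sketch under the assumption that the whole vector "x" has a known expected total energy after normalization; the function name and the even split across remaining subsequences are illustrative choices.

```python
def remaining_energy_per_subsequence(total_energy, decoded_so_far, n):
    """Estimate the average energy left for the subsequences not yet quantized.

    total_energy   : assumed expected energy of the whole vector "x"
    decoded_so_far : quantized versions of y(1..k), available at both encoder and decoder
    n              : total number of subsequences in Division 2
    """
    used = sum(v * v for w in decoded_so_far for v in w)
    left = n - len(decoded_so_far)
    if left == 0:
        return 0.0
    # Quantization error can make 'used' overshoot, so clamp at zero.
    return max(total_energy - used, 0.0) / left

# Two of four subsequences decoded so far: energies 8 and 2 out of an expected 16.
estimate = remaining_energy_per_subsequence(16.0, [[2.0, 2.0], [1.0, 1.0]], n=4)  # 3.0
```

Because the estimate depends only on already-quantized values, the decoder can reproduce it exactly and adapt in step with the encoder.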
  • Processing logic quantizes the subsequences y(1), ..., y(n) in Division 2.
  • the classic techniques map a subsequence "y(k)" to an index in a codebook.
  • the codebook design for example the number of entries in the codebook and its members, is a function of f(k).
  • the index specifies the unique entry in the codebook that should be used to represent an approximate version of the subsequence "y(k)".
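The codebook mapping described above can be sketched with a nearest-neighbor search in which the codebook size depends on f(k). The codebooks below are hypothetical toy values chosen for illustration (2^f(k) entries for p = 2); a real coder would use trained codebooks.

```python
def nearest_index(y, codebook):
    """Return the index of the codebook entry closest to y in squared error."""
    def sq_err(entry):
        return sum((a - b) ** 2 for a, b in zip(y, entry))
    return min(range(len(codebook)), key=lambda i: sq_err(codebook[i]))

# Hypothetical codebooks keyed by the bit assignment f(k): 2**f(k) entries each.
CODEBOOKS = {
    0: [(0.0, 0.0)],                       # zero bits: nothing is transmitted
    1: [(1.0, 1.0), (-1.0, -1.0)],
    2: [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)],
}

def quantize(y, f_k):
    """Encoder: map subsequence y(k) to an index that fits in f(k) bits."""
    return nearest_index(y, CODEBOOKS[f_k])

def dequantize(index, f_k):
    """Decoder: look up the approximate version w(k) of y(k)."""
    return CODEBOOKS[f_k][index]

idx = quantize((0.9, -1.2), 2)   # 1
w = dequantize(idx, 2)           # (1.0, -1.0)
```

Since both sides derive f(k) from the same information, they select the same codebook without any extra signalling.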
  • Figure 2 is a flow diagram of one embodiment of an inverse quantization process. The process is performed by processing logic at the decoder. The processing logic may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both. Note that this scheme does not have perceptual enhancements.
  • processing logic in the decoder receives the transmitted bitstream from the encoder (processing block 201).
  • the processing logic may receive parameters from earlier coding stages that may (or may not) be necessary, e.g. "B” and "M”.
  • Processing logic extracts the parameter "V" from the bitstream and uses this parameter (and possibly others like "B" from earlier decoding stages) to generate the fidelity criteria f(1), ..., f(n) (e.g., the bit allocation) used at the encoder (processing block 204).
  • processing logic uses this fidelity criteria along with the parameters
  • processing logic uses the estimated quantization information to test whether there is new information to further refine the fidelity criteria (processing block 220). If so, processing logic tests whether the information can further refine the fidelity criteria (processing block 211). An iterative procedure for doing that is described in paragraph 0060 above. If the information can refine the criteria, processing logic sends the quantization information to processing block 204, which refines the fidelity criteria (e.g., the bit allocation) and modifies the extraction of future quantization indices accordingly.
  • Using Division 2, assumed known at both the encoder and decoder (and possibly a function of other parameters), processing logic assembles w(1), ..., w(n) into a decoded vector of length "M" (processing block 205).
  • Processing logic optionally de-interleaves this decoded vector, if necessary (if interleaving is done by the encoder), and this produces inverse quantized vector "w" 230, which is an "M"-dimensional quantized version of the target "x" (processing block 206).
  • Figure 3 illustrates a flow diagram of one embodiment of an encoding process that uses partial information. The process is performed by processing logic at the encoder.
  • the processing logic may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both.
  • the process begins by processing logic optionally interleaving a target vector 302 of dimension "M" 302 (processing block
  • Interleaving function (I) 303 is represented by bits. That is, "I” represents the bits required to describe completely the interleaving function (which can be 0).
  • no interleaving function is used, and the fidelity criteria "B" specifies the number of bits that is to be used to encode the target x. It can be assumed without loss of generality that "B" is equivalent to specifying that "B" bits are to be used to encode target vector 302.
  • the target "x" consists of "M” symbols.
  • each symbol itself represents a vector. In the simplest case, a single symbol is a real or complex valued scalar (number).
  • After optionally interleaving, processing logic performs Division 1.
  • processing logic breaks the vector 302 into subsequences (processing block 312), detects and classifies variations (processing block 313) and encodes partial information on the variations in response to information regarding dimension
  • sub-sequences in Division 1 are non-overlapping and defined simply as consecutive sub-sequences each consisting of "m" symbols.
  • Processing logic decodes the partial information and the variations
  • processing logic creates the new fidelity criteria for each of the "p" dimensional subsequences using the target global fidelity criteria to encode the vector, B 301, the dimension M, the result of decoding the partial information of variations from decode partial information block 315 and an output of processing block 320.
  • processing logic performs Division 2 which includes selecting a method to divide (interleave) target vector 302 into subsequences for encoding.
  • processing logic breaks the vector into subsequences for encoding based on the method selected at processing block 320.
  • the sequences for encoding are subsequences of dimension "p".
  • the subsequences are referred to as y(1), ..., y(n).
  • In response to the outputs of processing blocks 321 and 316, processing logic encodes the subsequences (processing block 330).
  • the encoded subsequences are each described by parameters (e.g., quantization indices) that collectively comprise the information "Q". This "Q", along with the bits required to describe completely the partial information "V", is output and sent to mux and packing logic 340.
  • Multiplexing and packing logic 340 receives the bits required to describe completely the interleaving function, "I", the bits required to describe completely the partial information, "V", and the bits "Q" required to describe completely the quantization, which can be interpreted given "V" (and possibly "I"). In response thereto, these bits are multiplexed and packed into a bitstream by logic 340. The output of mux and packing logic 340 is sent to mux and packing logic 341, which multiplexes and packs the information along with parameters from earlier stages 304 into a bitstream 350.
  • Figure 4 is a flow diagram of one embodiment of the decoding process.
  • the process is performed by processing logic in the decoder.
  • the process is performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both.
  • bitstream 401 is received by demux and unpacking logic 411 which produces a bitstream 420 and parameters for earlier stages (e.g., M and B) 402.
  • Bitstream 420 is input into demux and unpacking logic 412 which performs de-multiplexing and unpacking of the bitstream to produce I, V, and Q, where I are the bits required to describe completely the interleaving function, V are the bits required to describe completely the partial information, and Q are the bits required to describe completely the quantization given V.
  • the V bits are sent to processing block 403 where processing logic decodes the partial information on variations in response to an input M that represents the dimensionality of the target vector.
  • processing block 404 uses the results of the decoding to create a new fidelity criteria for each of the "p" dimensional subsequences in response to target global fidelity criteria B and the dimension M of the target vector.
  • the new fidelity is also created in response to the selection of the method used to divide the target vector into subsequences for encoding that is specified by processing block 405.
  • the new fidelity criteria, represented as f(1), ..., f(n), are sent to processing block 406.
  • processing logic decodes the information represented in "Q" from demux and unpacking logic 412 relating to each of the subsequences in response to the fidelity criteria specified by processing block 404.
  • processing logic assembles the retrieved subsequences into a decoded sequence of dimension M.
  • processing logic assembles the subsequences in response to the method to divide (interleave) target X into subsequences as specified by processing block 405.
  • Thereafter, processing logic performs any necessary deinterleaving (processing block 408). This is done in response to the interleaving function specified by I output from demux and unpacking logic 412.
  • the output of processing block 408 is the M dimensional decoded version of target X.
  • a measure of variation is computed for each of the "m" dimensional vectors x(1), ..., x(q).
  • the measure has to match the perceptual criteria and quantization scheme that is used.
  • the quantization scheme is based on fixed-rate vector quantizers, and the criterion is the energy of each subsequence.
  • Processing logic decides on a discrete number "D" of categories in which to classify the subsequences based on the measure.
  • Members of each category represent vectors that deviate from the typical behavior in some sense.
  • a single category is used in which the subsequence with the maximum variation in the measure, e.g. energy, is noted.
  • the category has a single member.
  • two categories are used: the first category being the "d" vectors with the highest energies and the second category being the "h” vectors with the lowest energy. In this case, the first group has "d” members and the second group has "h” members.
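The two-category example above can be sketched by ranking subsequences on their energy. This is an illustrative sketch only: the function name `classify_by_energy` and the sample values are assumptions, and it presumes d + h does not exceed the number of subsequences q.

```python
def classify_by_energy(subsequences, d, h):
    """Return (indices of the d highest-energy, indices of the h lowest-energy) subsequences."""
    energies = [sum(v * v for v in s) for s in subsequences]
    order = sorted(range(len(energies)), key=lambda k: energies[k])  # ascending energy
    lowest = sorted(order[:h])
    highest = sorted(order[len(order) - d:]) if d else []
    return highest, lowest

# q = 4 subsequences of m = 2 symbols each; energies are 2, 9, 0.01 and 8.
x_parts = [[1.0, 1.0], [3.0, 0.0], [0.0, 0.1], [2.0, 2.0]]
highest, lowest = classify_by_energy(x_parts, d=1, h=1)  # ([1], [2])
```

Only the category memberships are kept; the energy values themselves are not conveyed, which is what keeps the information partial.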
  • the categories that are used often do not provide precise information on the value of the measure under consideration, e.g. the energy value of the subsequences. In fact, they do not necessarily provide information at the granularity of Division 2, as in this case when "a">1.
  • All that is necessary is that the variation differentiates one or more subsequences from the rest within the group of sequences under consideration. That is, categories are for subsequences which are "atypical" given the limited samplings representative of such vectors at low dimension when compared to other subsequences.
  • the examples above represent categories that are being used in practice. In one embodiment, the categories are fixed. In another embodiment, the categories are a function of information from earlier coding stages, e.g.
  • log2(q(q-1)) bits are sufficient to describe the membership in the two categories of interest. This would constitute the information "F" in Figure 3 and Figure 4.
  • q-2 subsequences are implicitly in this example included in a third category for which no information is given, besides that these subsequences are not in the two categories of interest.
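The log2(q(q-1)) count follows because there are q choices for the maximum-energy subsequence and q-1 remaining choices for the minimum-energy one. The sketch below works through that arithmetic and shows one possible way to map the ordered pair to a single index. The index mapping and the function names are assumptions for illustration; the source only states the bit count.

```python
import math


def membership_bits(q):
    """Bits needed to identify one max and one min subsequence among q."""
    if q < 2:
        raise ValueError("need at least two subsequences")
    return math.log2(q * (q - 1))


def encode_membership(q, max_idx, min_idx):
    """Map the ordered pair (max_idx, min_idx) to a code in [0, q(q-1))."""
    if max_idx == min_idx:
        raise ValueError("max and min must be different subsequences")
    # Rank of min_idx among the q-1 indices that remain once max_idx is removed.
    rank = min_idx if min_idx < max_idx else min_idx - 1
    return max_idx * (q - 1) + rank


def decode_membership(q, code):
    """Invert encode_membership."""
    max_idx, rank = divmod(code, q - 1)
    min_idx = rank if rank < max_idx else rank + 1
    return max_idx, min_idx
```

For q = 8 the count is log2(56), roughly 5.81, so a fixed-length field of 6 bits would be enough under this mapping.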
  • An example of partial information comprises a definition of the "D" categories, membership in the "D" categories, and the fact that many sequences may not be put into an "atypical" category.
  • the (B-V) bits assigned to the target vector "x" are initially divided in a way that is considered equal among the "q" "m"-dimensional subsequences x(1), ..., x(q) of Division 1. This would make sense in the case that there is no partial information since the earlier coding stages assume, or by nature and design try to make, the subsequences to be all statistically equal and the target vector "x" to have no structure.
  • the additional partial information enables one to do better, particularly at low bit rates.
  • the bit allocation is modified to create an unequal assignment across the q subsequences. This creates a coarse initial unequal bit allocation F(1),...,F(q) across the "q" m-dimensional subsequences. For example, if there are two categories, Category 1 being the subsequence with maximum energy and Category 2 being the subsequence with minimum energy, an algorithm could simply remove a given number of bits from the subsequence in Category 2 and give them to the subsequence in Category 1. The number of bits that is to be transferred is referred to herein as the "skew".
  • When bits are removed from many other vectors that are not differentiated by the partial information, as in this second example, the bits are removed as uniformly as possible across these vectors to make up the skew.
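The two skew strategies described above can be sketched as follows. Both functions are illustrative assumptions: the names, the clamping of the skew so that no allocation goes negative, and the round-robin order used to take bits "as uniformly as possible" are choices made for this example rather than details given in the source.

```python
def skewed_allocation(total_bits, q, max_idx, min_idx, skew):
    """Start from a near-equal split of total_bits over q subsequences,
    then move `skew` bits from the min-energy to the max-energy one."""
    if max_idx == min_idx:
        raise ValueError("max and min must be different subsequences")
    base, extra = divmod(total_bits, q)
    alloc = [base + (1 if k < extra else 0) for k in range(q)]
    moved = min(skew, alloc[min_idx])  # never drive an allocation negative
    alloc[min_idx] -= moved
    alloc[max_idx] += moved
    return alloc


def take_uniformly(alloc, favored, skew):
    """Move `skew` bits to `favored`, taking them one at a time in
    round-robin order from all other subsequences."""
    out = list(alloc)
    donors = [k for k in range(len(out)) if k != favored]
    moved = 0
    while moved < skew and any(out[k] > 0 for k in donors):
        for k in donors:
            if moved == skew:
                break
            if out[k] > 0:
                out[k] -= 1
                out[favored] += 1
                moved += 1
    return out
```

In both cases the total number of bits is unchanged, so the global budget of (B-V) bits is still respected.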
  • the "a" Division 2 subsequences x(k,1),...,x(k,a) within a subsequence x(k) are treated as equally as possible within the group.
  • the partial information that is available does not apply at a refinement of bit assignments within any subsequence x(k) and so equal treatment is logical and achieved by dividing the bits up as equally as possible between the "a" subsequences.
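A small sketch of this refinement step: each coarse allocation F(k) is split as equally as possible over the "a" Division 2 subsequences of x(k), giving the fine allocation f(1),...,f(n) with n = q·a. The function name and the rule that any remainder bits go to the first subsequences of each group are assumptions made for the example.

```python
def refine_allocation(coarse, a):
    """Split each coarse allocation F(k) as equally as possible among
    the `a` Division 2 subsequences of x(k).

    Returns the flat list f(1), ..., f(n) with n = len(coarse) * a.
    """
    if a < 1:
        raise ValueError("a must be at least 1")
    fine = []
    for bits in coarse:
        base, extra = divmod(bits, a)
        # The first `extra` subsequences receive one additional bit.
        fine.extend(base + 1 if j < extra else base for j in range(a))
    return fine
```

For example, a coarse allocation of 13 and 7 bits with a = 3 yields 5, 4, 4 and 3, 2, 2 bits respectively.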
  • the actual quantization based on a bit assignment to any given x(k,j) is done using classic quantization techniques, as previously described, e.g., scalar or vector quantization.
  • the encoding scheme of Figure 3 and decoding scheme of Figure 4 are modified to add the ability to make perceptual refinements. These perceptual refinements include patterned bit-assignments and/or noise-fill.
  • assign f(i), f(j), f(l) to subsequences within the same category i.e. to subsequences within the same x(k) or subsequences of different x(k) that are in the same category
  • the partial information does not distinguish such vectors from one another by definition.
  • Figure 5 illustrates the modification of Figure 3 where perceptual enhancement block 501 examines the output of the newly created fidelity for each of the subsequences and for each of the groups representing the same partial information in V. Processing logic then re-orders f(1), ..., f(n) to have better perceptual effect. The reordered assignment is sent to encoding block 530, which encodes the subsequences as they are produced. A similar modification applies in Figure 6.
  • [00100] One embodiment of the incorporation of permutation is given below.
  • the choice of which option to use can depend on other signal characteristics (information) encoded (represented) in previous stages as well as the actual values of f(k). That is, the permutation is entirely implicit in existing information.
  • the targets are quantized. Sometimes it is advantageous to do so in an order in which those receiving the maximum bit allocation are quantized first. Note, this information is packed first into the bitstream in Q.
  • [00103] Based on the values of g(j), ..., g(j+s) and possibly the quantized indices in Q, the perceptual masking properties of the decoded vectors w(j),...,w(j+s) are evaluated.
  • noise-fill processing block 701 generates a random sequence at a prescribed energy for subsequences with no information in Q.
  • Noise-fill effectively increases the variability in potential decoded patterns, often at the expense of increased mean square error. The increased variability is perceptually more pleasing and is created by generating random patterns, at a given noise energy level, for areas in which there are zero bit assignments.
  • the noise fill is simply generated at a selected level for subsequences receiving zero-bit assignments.
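A minimal sketch of this simple form of noise-fill is given below. It assumes Gaussian noise scaled so that each filled subsequence has exactly the prescribed energy; the source only specifies "a random sequence at a prescribed energy", so the noise distribution, the function name and the optional `rng` parameter are illustrative choices.

```python
import math
import random


def noise_fill(decoded, bit_assignments, noise_energy, rng=None):
    """Replace subsequences that received zero bits with random noise
    scaled to `noise_energy`; leave all other subsequences unchanged."""
    if rng is None:
        rng = random.Random()
    filled = []
    for sub, bits in zip(decoded, bit_assignments):
        if bits > 0:
            filled.append(list(sub))
            continue
        # Assumption: Gaussian samples, then rescaled to the target energy.
        noise = [rng.gauss(0.0, 1.0) for _ in sub]
        raw_energy = sum(s * s for s in noise)
        scale = math.sqrt(noise_energy / raw_energy) if raw_energy > 0 else 0.0
        filled.append([s * scale for s in noise])
    return filled
```

An adaptive variant could pass a different `noise_energy` per region, depending on the bit-assignment pattern around each zero-bit subsequence.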
  • If the scheme adapts to the exact pattern g(1),...,g(n), it can do so by changing the energy level of the noise fill in different areas.
  • the decoder may decide not to use any noise-fill in that area, or to decrease the energy of the noise-fill.

Performance Enhancements to the Embodiment
  • the first is to adapt the quantizer used to code a subsequence based on the subsequence's category. This is shown in Figure 8. To implement this scheme in the case where straight-forward vector quantizers (of dimension "p") are used, the scheme would simply have different codebooks for different categories. The codebooks are trained based on classified training data.
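The category-dependent codebook idea can be sketched with a plain nearest-neighbour vector quantizer. The dictionary-of-codebooks layout, the category labels, and the squared-error distance are assumptions for illustration; the source only states that different codebooks, trained on classified data, are used for different categories.

```python
def nearest_index(codebook, vector):
    """Index of the codeword closest to `vector` in squared error."""
    def distance(i):
        return sum((c - s) ** 2 for c, s in zip(codebook[i], vector))
    return min(range(len(codebook)), key=distance)


def encode_with_category(codebooks, category, vector):
    """Quantize `vector` with the codebook assigned to its category."""
    return nearest_index(codebooks[category], vector)


def decode_with_category(codebooks, category, index):
    """Look up the codeword; the decoder knows the category from the
    partial information, so no extra bits are needed to select it."""
    return codebooks[category][index]
```

Because the category is already conveyed by the partial information, switching codebooks in this way adds no side information beyond what the scheme already transmits.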
  • a second enhancement is to use two or more embodiments of the scheme simultaneously, e.g. use different "m", different "p", different categories, etc., for each of the embodiments, encode using each embodiment, and then select information from only one embodiment for transmission to the decoder. If "r" different embodiments are tested, then an additional log2(r) bits of side-information is sent to the decoder to signal which embodiment has been selected and sent.
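One way to picture this multi-embodiment selection is sketched below. Each embodiment is modelled as a hypothetical callable returning a payload and its reconstruction, and the winner is chosen by squared error. Both the callable interface and the squared-error selection rule are assumptions, since the source does not say how the selection is made.

```python
import math


def select_embodiment(target, embodiments):
    """Encode `target` with every embodiment and keep the best one.

    embodiments: list of callables, each returning (payload, reconstruction).
    Returns (selected_index, payload, side_info_bits).
    """
    best = None
    for index, encode in enumerate(embodiments):
        payload, reconstruction = encode(target)
        # Assumption: squared error is the selection criterion.
        error = sum((t - r) ** 2 for t, r in zip(target, reconstruction))
        if best is None or error < best[0]:
            best = (error, index, payload)
    r = len(embodiments)
    # log2(r) bits signal the choice, rounded up to a whole number of bits.
    side_info_bits = math.ceil(math.log2(r)) if r > 1 else 0
    return best[1], best[2], side_info_bits
```

Only the selected payload and the index are transmitted; the other encodings are discarded at the encoder.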
  • the subsequences in Division 1 are overlapping.
  • the overlapping itself can be used to increase the resolution of information provided by the categories. For example, if two overlapping subsequences are members of the same category, then it could be likely that the overlap region (common to the two subsequences) is the area that is creating the atypical variation. Recall, to balance the information between the "V" bits to describe the category and the "(B-V)" bits to do the quantization it could be that subsequences in a group may not in fact have the variation that the group is trying to signify.
  • the target fidelity criteria "B" can be specified in means other than bits.
  • the target fidelity criteria "B” represents a bound on the error for each target vector.
  • the value "m" is a function of information from earlier stages, e.g. "M" and "B". It may be advantageous to provide additional adaptation in this value through use of additional side information and/or use of other parameters. For example, one such scheme uses two potential values of "m" and signals the final choice used for a given sequence to the decoder using 1 bit.
  • the interleaver is fixed or a function of information from earlier coding stages (requiring no side information) or variable (requiring side information).
  • the new fidelity criteria on "p" subsequences do not conform to the global fidelity criteria "B". For example, it could be that the additional partial information is enough to motivate a change in the "B" criteria calculated from earlier stages.
  • the process of generating new perceptual patterns g(l),...,g(n) is not an incremental process that occurs as quantization is being done.
  • the pattern g(1),...,g(n) can be generated directly from f(1),...,f(n) without any information from Q. This increases the resilience of the encoding to bit-errors.
  • FIG. 9 is a block diagram of an exemplary computer system that may perform one or more of the operations described herein.
  • computer system 900 may comprise an exemplary client or server computer system.
  • Computer system 900 comprises a communication mechanism or bus 911 for communicating information, and a processor 912 coupled with bus 911 for processing information.
  • Processor 912 includes a microprocessor, but is not limited to a microprocessor, such as, for example, Pentium™, PowerPC™, Alpha™, etc.
  • System 900 further comprises a random access memory (RAM), or other dynamic storage device 904 (referred to as main memory) coupled to bus 911 for storing information and instructions to be executed by processor 912.
  • Main memory 904 also may be used for storing temporary variables or other intermediate information during execution of instructions by processor 912.
  • Computer system 900 also comprises a read only memory (ROM) and/or other static storage device 906 coupled to bus 911 for storing static information and instructions for processor 912, and a data storage device 907, such as a magnetic disk or optical disk and its corresponding disk drive. Data storage device 907 is coupled to bus 911 for storing information and instructions.
  • Computer system 900 may further be coupled to a display device 921, such as a cathode ray tube (CRT) or liquid crystal display (LCD), coupled to bus 911 for displaying information to a computer user.
  • An alphanumeric input device 922 may also be coupled to bus 911 for communicating information and command selections to processor 912.
  • cursor control 923 such as a mouse, trackball, trackpad, stylus, or cursor direction keys, coupled to bus 911 for communicating direction information and command selections to processor 912, and for controlling cursor movement on display 921.
  • Another device that may be coupled to bus 911 is hard copy device
  • Another device that may be coupled to bus 911 is a wired/wireless communication capability 925 for communicating with a phone or handheld palm device.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
PCT/US2006/015251 2005-04-20 2006-04-20 Quantization of speech and audio coding parameters using partial information on atypical subsequences Ceased WO2006113921A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
DE602006009495T DE602006009495D1 (de) 2005-04-20 2006-04-20 Quantisierung von parametern zur sprach- und audiokodierung mittels teilinformationen über atypische untersequenzen
JP2008507957A JP4963498B2 (ja) 2005-04-20 2006-04-20 非典型的な部分系列に関する部分情報を用いた音声及びオーディオ符号化パラメータの量子化
AT06751085T ATE444550T1 (de) 2005-04-20 2006-04-20 Quantisierung von parametern zur sprach- und audiokodierung mittels teilinformationen über atypische untersequenzen
EP06751085A EP1872363B1 (en) 2005-04-20 2006-04-20 Quantization of speech and audio coding parameters using partial information on atypical subsequences

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US67340905P 2005-04-20 2005-04-20
US60/673,409 2005-04-20
US11/408,125 US7885809B2 (en) 2005-04-20 2006-04-19 Quantization of speech and audio coding parameters using partial information on atypical subsequences
US11/408,125 2006-04-19

Publications (1)

Publication Number Publication Date
WO2006113921A1 true WO2006113921A1 (en) 2006-10-26

Family

ID=36658834

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2006/015251 Ceased WO2006113921A1 (en) 2005-04-20 2006-04-20 Quantization of speech and audio coding parameters using partial information on atypical subsequences

Country Status (6)

Country Link
US (1) US7885809B2 (en)
EP (1) EP1872363B1 (en)
JP (1) JP4963498B2 (en)
AT (1) ATE444550T1 (en)
DE (1) DE602006009495D1 (en)
WO (1) WO2006113921A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010003618A3 (en) * 2008-07-11 2010-03-25 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Providing a time warp activation signal and encoding an audio signal therewith
US9025777B2 (en) 2008-07-11 2015-05-05 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio signal decoder, audio signal encoder, encoded multi-channel audio signal representation, methods and computer program
CN107533847A (zh) * 2015-03-09 2018-01-02 弗劳恩霍夫应用研究促进协会 音频编码器、音频解码器、用于编码音频信号的方法及用于解码经编码的音频信号的方法

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006085243A2 (en) * 2005-02-10 2006-08-17 Koninklijke Philips Electronics N.V. Sound synthesis
US7873514B2 (en) * 2006-08-11 2011-01-18 Ntt Docomo, Inc. Method for quantizing speech and audio through an efficient perceptually relevant search of multiple quantization patterns
US7461106B2 (en) * 2006-09-12 2008-12-02 Motorola, Inc. Apparatus and method for low complexity combinatorial coding of signals
US20080243518A1 (en) * 2006-11-16 2008-10-02 Alexey Oraevsky System And Method For Compressing And Reconstructing Audio Files
AU2007332508B2 (en) * 2006-12-13 2012-08-16 Iii Holdings 12, Llc Encoding device, decoding device, and method thereof
JPWO2008072733A1 (ja) * 2006-12-15 2010-04-02 パナソニック株式会社 符号化装置および符号化方法
US7885819B2 (en) 2007-06-29 2011-02-08 Microsoft Corporation Bitstream syntax for multi-process audio decoding
MX2010001394A (es) 2007-08-27 2010-03-10 Ericsson Telefon Ab L M Frecuencia de transicion adaptiva entre llenado de ruido y extension de anchura de banda.
US8576096B2 (en) * 2007-10-11 2013-11-05 Motorola Mobility Llc Apparatus and method for low complexity combinatorial coding of signals
US8209190B2 (en) * 2007-10-25 2012-06-26 Motorola Mobility, Inc. Method and apparatus for generating an enhancement layer within an audio coding system
WO2009084918A1 (en) 2007-12-31 2009-07-09 Lg Electronics Inc. A method and an apparatus for processing an audio signal
US20090234642A1 (en) * 2008-03-13 2009-09-17 Motorola, Inc. Method and Apparatus for Low Complexity Combinatorial Coding of Signals
US8639519B2 (en) * 2008-04-09 2014-01-28 Motorola Mobility Llc Method and apparatus for selective signal coding based on core encoder performance
ES3032483T3 (en) 2008-07-11 2025-07-21 Fraunhofer Ges Forschung Method for decoding an audio signal and computer program
US8219408B2 (en) * 2008-12-29 2012-07-10 Motorola Mobility, Inc. Audio signal decoder and method for producing a scaled reconstructed audio signal
US8140342B2 (en) * 2008-12-29 2012-03-20 Motorola Mobility, Inc. Selective scaling mask computation based on peak detection
US8200496B2 (en) * 2008-12-29 2012-06-12 Motorola Mobility, Inc. Audio signal decoder and method for producing a scaled reconstructed audio signal
US8175888B2 (en) * 2008-12-29 2012-05-08 Motorola Mobility, Inc. Enhanced layered gain factor balancing within a multiple-channel audio coding system
US8423355B2 (en) * 2010-03-05 2013-04-16 Motorola Mobility Llc Encoder for audio signal including generic audio and speech frames
US8428936B2 (en) * 2010-03-05 2013-04-23 Motorola Mobility Llc Decoder for audio signal including generic audio and speech frames
US9020818B2 (en) * 2012-03-05 2015-04-28 Malaspina Labs (Barbados) Inc. Format based speech reconstruction from noisy signals
US9129600B2 (en) 2012-09-26 2015-09-08 Google Technology Holdings LLC Method and apparatus for encoding an audio signal
ES2628127T3 (es) 2013-04-05 2017-08-01 Dolby International Ab Cuantificador avanzado
MX395108B (es) * 2013-10-18 2025-03-24 Ericsson Telefon Ab L M Codificacion y decodificacion de posiciones de picos espectrales.

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2874363B2 (ja) * 1991-01-30 1999-03-24 日本電気株式会社 適応符号化・復号化方式
EP0525774B1 (en) * 1991-07-31 1997-02-26 Matsushita Electric Industrial Co., Ltd. Digital audio signal coding system and method therefor
US5394508A (en) * 1992-01-17 1995-02-28 Massachusetts Institute Of Technology Method and apparatus for encoding decoding and compression of audio-type data
US5517511A (en) * 1992-11-30 1996-05-14 Digital Voice Systems, Inc. Digital transmission of acoustic signals over a noisy communication channel
CA2135415A1 (en) * 1993-12-15 1995-06-16 Sean Matthew Dorward Device and method for efficient utilization of allocated transmission medium bandwidth
KR960012475B1 (ko) * 1994-01-18 1996-09-20 대우전자 주식회사 디지탈 오디오 부호화장치의 채널별 비트 할당 장치
MY130167A (en) * 1994-04-01 2007-06-29 Sony Corp Information encoding method and apparatus, information decoding method and apparatus, information transmission method and information recording medium
US5602961A (en) * 1994-05-31 1997-02-11 Alaris, Inc. Method and apparatus for speech compression using multi-mode code excited linear predictive coding
EP0721257B1 (en) * 1995-01-09 2005-03-30 Daewoo Electronics Corporation Bit allocation for multichannel audio coder based on perceptual entropy
JP3297238B2 (ja) * 1995-01-20 2002-07-02 大宇電子株式會▲社▼ 適応的符号化システム及びビット割当方法
CA2246532A1 (en) * 1998-09-04 2000-03-04 Northern Telecom Limited Perceptual audio coding

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KULDIP K PALIWAL ET AL: "EFFICIENT VECTOR QUANTIZATION OF LPC PARAMETERS AT 24 BITS/FRAME", IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, IEEE SERVICE CENTER, NEW YORK, NY, US, vol. 1, no. 1, January 1993 (1993-01-01), pages 3 - 14, XP000358435, ISSN: 1063-6676 *
OMAR NIAMUT, RICHARD HEUSDENS: "RD OPTIMAL TIME SEGMENTATIONS FOR THE TIME-VARYING MDCT", PROCEEDINGS EUSIPCO 2004 (EUROPEAN SIGNAL PROCESSING CONFERENCE), 6 September 2004 (2004-09-06), pages 1649 - 1652, XP002391769, Retrieved from the Internet <URL:http://www.eurasip.org/content/Eusipco/2004/defevent/papers/cr1699.pdf> [retrieved on 20060724] *
PRANDOM P ET AL: "Optimal time segmentation for signal modeling and compression", ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 1997. ICASSP-97., 1997 IEEE INTERNATIONAL CONFERENCE ON MUNICH, GERMANY 21-24 APRIL 1997, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, vol. 3, 21 April 1997 (1997-04-21), pages 2029 - 2032, XP010226332, ISBN: 0-8186-7919-0 *

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010003618A3 (en) * 2008-07-11 2010-03-25 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Providing a time warp activation signal and encoding an audio signal therewith
CN102150201A (zh) * 2008-07-11 2011-08-10 弗劳恩霍夫应用研究促进协会 提供时间扭曲激活信号以及使用该时间扭曲激活信号对音频信号编码
EP2410519A1 (en) * 2008-07-11 2012-01-25 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Time warp activation signal provider, audio signal encoder, method for providing a time warp activation signal, method for encoding an audio signal and computer programs
CN103000178A (zh) * 2008-07-11 2013-03-27 弗劳恩霍夫应用研究促进协会 提供时间扭曲激活信号以及使用该时间扭曲激活信号对音频信号编码
CN102150201B (zh) * 2008-07-11 2013-04-17 弗劳恩霍夫应用研究促进协会 提供时间扭曲激活信号以及使用该时间扭曲激活信号对音频信号编码
AU2009267433B2 (en) * 2008-07-11 2013-06-13 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Providing a time warp activation signal and encoding an audio signal therewith
KR101360456B1 (ko) 2008-07-11 2014-02-07 프라운호퍼 게젤샤프트 쭈르 푀르데룽 데어 안겐반텐 포르슝 에. 베. 시간 워프 활성 신호의 제공 및 이를 이용한 오디오 신호의 인코딩
KR101400588B1 (ko) 2008-07-11 2014-05-28 프라운호퍼 게젤샤프트 쭈르 푀르데룽 데어 안겐반텐 포르슝 에. 베. 시간 워프 활성 신호의 제공 및 이를 이용한 오디오 신호의 인코딩
KR101400513B1 (ko) * 2008-07-11 2014-05-28 프라운호퍼 게젤샤프트 쭈르 푀르데룽 데어 안겐반텐 포르슝 에. 베. 시간 워프 활성 신호의 제공 및 이를 이용한 오디오 신호의 인코딩
KR101400484B1 (ko) * 2008-07-11 2014-05-28 프라운호퍼 게젤샤프트 쭈르 푀르데룽 데어 안겐반텐 포르슝 에. 베. 시간 워프 활성 신호의 제공 및 이를 이용한 오디오 신호의 인코딩
KR101400535B1 (ko) 2008-07-11 2014-05-28 프라운호퍼 게젤샤프트 쭈르 푀르데룽 데어 안겐반텐 포르슝 에. 베. 시간 워프 활성 신호의 제공 및 이를 이용한 오디오 신호의 인코딩
US9015041B2 (en) 2008-07-11 2015-04-21 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Time warp activation signal provider, audio signal encoder, method for providing a time warp activation signal, method for encoding an audio signal and computer programs
US9025777B2 (en) 2008-07-11 2015-05-05 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio signal decoder, audio signal encoder, encoded multi-channel audio signal representation, methods and computer program
US9043216B2 (en) 2008-07-11 2015-05-26 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio signal decoder, time warp contour data provider, method and computer program
US9263057B2 (en) 2008-07-11 2016-02-16 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Time warp activation signal provider, audio signal encoder, method for providing a time warp activation signal, method for encoding an audio signal and computer programs
US9293149B2 (en) 2008-07-11 2016-03-22 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Time warp activation signal provider, audio signal encoder, method for providing a time warp activation signal, method for encoding an audio signal and computer programs
US9299363B2 (en) 2008-07-11 2016-03-29 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Time warp contour calculator, audio signal encoder, encoded audio signal representation, methods and computer program
US9431026B2 (en) 2008-07-11 2016-08-30 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Time warp activation signal provider, audio signal encoder, method for providing a time warp activation signal, method for encoding an audio signal and computer programs
US9466313B2 (en) 2008-07-11 2016-10-11 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Time warp activation signal provider, audio signal encoder, method for providing a time warp activation signal, method for encoding an audio signal and computer programs
US9502049B2 (en) 2008-07-11 2016-11-22 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Time warp activation signal provider, audio signal encoder, method for providing a time warp activation signal, method for encoding an audio signal and computer programs
US9646632B2 (en) 2008-07-11 2017-05-09 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Time warp activation signal provider, audio signal encoder, method for providing a time warp activation signal, method for encoding an audio signal and computer programs
RU2621965C2 (ru) * 2008-07-11 2017-06-08 Фраунхофер-Гезелльшафт цур Фёрдерунг дер ангевандтен Форшунг Е.Ф. Передатчик сигнала активации с деформацией по времени, кодер звукового сигнала, способ преобразования сигнала активации с деформацией по времени, способ кодирования звукового сигнала и компьютерные программы
CN107533847A (zh) * 2015-03-09 2018-01-02 弗劳恩霍夫应用研究促进协会 音频编码器、音频解码器、用于编码音频信号的方法及用于解码经编码的音频信号的方法
JP2018511821A (ja) * 2015-03-09 2018-04-26 フラウンホーファー−ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン オーディオエンコーダ、オーディオデコーダ、オーディオ信号を符号化する方法、および符号化されたオーディオ信号を復号化する方法
JP2020038380A (ja) * 2015-03-09 2020-03-12 フラウンホーファー−ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン オーディオエンコーダ、オーディオデコーダ、オーディオ信号を符号化する方法、および符号化されたオーディオ信号を復号化する方法
US10600428B2 (en) 2015-03-09 2020-03-24 Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschug e.V. Audio encoder, audio decoder, method for encoding an audio signal and method for decoding an encoded audio signal
CN107533847B (zh) * 2015-03-09 2021-09-10 弗劳恩霍夫应用研究促进协会 音频编码器和音频解码器及对应的方法
JP7078592B2 (ja) 2015-03-09 2022-05-31 フラウンホーファー-ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン オーディオエンコーダ、オーディオデコーダ、オーディオ信号を符号化する方法、および符号化されたオーディオ信号を復号化する方法
US12112765B2 (en) 2015-03-09 2024-10-08 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder, audio decoder, method for encoding an audio signal and method for decoding an encoded audio signal

Also Published As

Publication number Publication date
EP1872363B1 (en) 2009-09-30
US7885809B2 (en) 2011-02-08
JP2008538619A (ja) 2008-10-30
DE602006009495D1 (de) 2009-11-12
EP1872363A1 (en) 2008-01-02
US20060241940A1 (en) 2006-10-26
JP4963498B2 (ja) 2012-06-27
ATE444550T1 (de) 2009-10-15

Similar Documents

Publication Publication Date Title
US7885809B2 (en) Quantization of speech and audio coding parameters using partial information on atypical subsequences
JP5658307B2 (ja) ディジタルメディアの効率的コーディング用のバンドを入手するための周波数セグメント化
EP2282310B1 (en) Entropy coding by adapting coding between level and run-length/level modes
JP5456310B2 (ja) ディジタル・メディア・スペクトル・データの効率的コーディングに使用される辞書内のコードワードの変更
US7433824B2 (en) Entropy coding by adapting coding between level and run-length/level modes
CN1906855B (zh) 空间矢量和可变分辨率量化
US7873514B2 (en) Method for quantizing speech and audio through an efficient perceptually relevant search of multiple quantization patterns
CN101160621A (zh) 使用关于非典型子序列的部分信息的语音和音频编码参数的量化
Ramprashad Efficient quantization of statistically normalized vectors using multi-option partial-order bit-assignment schemes
HK1154302B (en) Entropy coding by adapting coding between level and run-length/level modes
HK1152790B (en) Entropy coding by adapting coding between level and run-length/level modes

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200680012440.2

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2006751085

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2008507957

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

NENP Non-entry into the national phase

Ref country code: RU