WO2009077950A1 - Adaptive time/frequency-based audio encoding method - Google Patents

Adaptive time/frequency-based audio encoding method

Info

Publication number
WO2009077950A1
Authority
WO
WIPO (PCT)
Prior art keywords
frequency
peakedness
domain signal
time
encoding
Prior art date
Application number
PCT/IB2008/055244
Other languages
English (en)
Inventor
Albertus C. Den Brinker
Steven L. J. D. E. Van De Par
Original Assignee
Koninklijke Philips Electronics N.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N.V.
Publication of WO2009077950A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/22 Mode decision, i.e. based on audio signal content versus external parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/20 Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding

Definitions

  • the invention relates to an adaptive time/frequency-based audio encoding method for encoding an input signal that is divided into a plurality of frequency-domain signals into an output data stream, said encoding method comprising encoding each frequency-domain signal in one of a time-based encoding mode or a frequency-based encoding mode. Further the invention relates to an adaptive time/frequency-based audio encoder for encoding an input signal into an output data stream, and a computer program product.
  • speech codecs work well for speech, and audio codecs work well for audio. From a coding efficiency point of view, speech signals are characterized by the voiced speech parts that are generated by an excitatory signal that is filtered by the vocal tract. Speech coders effectively regenerate the excitation signal and the filtering. The parameters for this regeneration process form a very efficient representation of the speech signal. The signal is effectively represented as a time domain signal corresponding nicely to the speech production process. Therefore, speech coders are often termed time domain coders (TDC). Audio signals, on the other hand, vary relatively slowly over time, in contrast to speech, and often consist of tonal components that are stable across longer temporal intervals.
  • TDC: time domain coders
  • the excitatory nature of the speech signal is represented in the neural encoding of an audio signal and any modification, i.e. temporal smearing, is highly perceptible. This is likely to occur due to the quantization in the spectral domain that is performed by the audio coder.
  • Speech coders are not very suitable for encoding audio because of the constant spectral lines that occur in tonal music signals. The spectral resolution of speech coders at low frequencies is too poor to capture these components well. In addition, the structure of spectral lines at higher frequencies can create a characteristic modulation pattern due to beating between adjacent components, which cannot be modeled well by the excitatory-based speech coder. To benefit from the advantages of both audio coders and speech coders, systems for joint audio and speech coding have been proposed. One such system has been disclosed in the patent application US2007/0106502.
  • an adaptive time/frequency-based audio encoder which obtains high compression efficiency by making efficient use of encoding gains of two encoding methods in which a frequency-domain transform is performed on input audio data such that time-based encoding is performed on a band of the audio data suitable for voice compression and frequency-based encoding is performed on remaining bands of the audio data.
  • the proposed encoder comprises a transformation and mode determination unit to divide an input audio signal into a plurality of frequency-domain signals and to select a time-based encoding mode or frequency-based encoding mode for each respective frequency-domain signal, an encoding unit to encode each frequency-domain signal in the respective encoding modes selected by the transformation and mode determination unit, and a bitstream output unit to output encoded data, division information, and encoding mode information for each encoded frequency-domain signal.
  • the encoding mode to be used for the respective frequency-domain signal is determined based on at least one of a linear coding gain, a spectral change between linear prediction filters of adjacent frames, a predicted pitch delay, and a predicted long-term prediction gain.
  • spectral measures comprise: a linear predictive coding gain, a spectral change between linear predictive filters of consecutive frames, or a spectral tilt (first reflection coefficient).
  • energy measures comprise: signal energy, or a change in signal energy between sub frames.
  • Said long-term prediction estimates comprise: an estimated pitch delay, or estimated prediction gains.
  • This object is achieved by selecting the time-based encoding mode or the frequency-based encoding mode for the respective frequency-domain signal based on a peakedness of said respective frequency-domain signal. Said peakedness relates much better to the strengths of the time-based and frequency-based encoding than spectral measures, energy measures, or long-term prediction measures do. Therefore, the resulting speech/audio quality is improved. In other words, the choice of the encoding mode is better tailored to the actual content of the frequency-domain signal. In particular, if a signal has a high peakedness, these peaks are typically perceptually relevant and the quantized signal is associated with a low bit rate, since the majority of values are quantized to the zero level.
  • the proposed measure has a direct relation to the essential property of the coder and provides a high quality at a low bit rate.
  • the measures used so far are more indirect, involve more parameters, require complex inference rules and are thus more prone to non-optimal decisions.
  • the peakedness of the frequency-domain signal is a spectral peakedness.
  • the frequency domain signal may be the absolute value of the Fourier transform of a signal, the discrete cosine transform DCT or related representations like the MDCT or MLT.
  • the strong tonal components appear as large peaks in the representation and a high spectral peakedness is thus indicative of steady tonal music which can most efficiently be encoded by a frequency domain coder.
  • the spectral peakedness can be established for different bands of the entire frequency spectrum.
  • the frequency-based encoding mode is selected when the spectral peakedness exceeds a predetermined threshold. If the spectral peakedness of a particular frequency band is sufficiently high, the decision can be made to use a frequency domain encoding method for this band.
  • the advantage of using spectral peakedness is its direct coupling to the efficiency of a frequency-domain encoding method.
  • the predetermined threshold takes on a value of a temporal peakedness corresponding to a time-representation of said respective frequency-domain signal.
  • the decision about using one or another encoding mode is made according to the most dominant peakedness measure. Thus, if the spectral peakedness is larger than the temporal peakedness, the frequency-based encoding mode is chosen; otherwise the time-based encoding is chosen.
  • the peakedness of the frequency-domain signal is a temporal peakedness corresponding to a time-representation of said frequency-domain signal.
  • the temporal peakedness measure is determined from the time-domain representation of the frequency domain signal corresponding to a frequency band.
  • This time-domain representation can be further pre-processed.
  • spectral flattening yields a good signal on which the temporal peakedness can best be established.
  • the flattening stage may be omitted.
  • the time-based encoding mode is selected when the temporal peakedness exceeds a predetermined threshold.
  • the predetermined threshold takes on a value of a spectral peakedness of said respective frequency-domain signal. In such a case, one can consider whether a time-domain peakedness or the frequency-domain peakedness is highest and apply the appropriate mode of encoding to the components of this frequency band. In doing so, the information concerning both measures is balanced in the final mode decision thus arriving at the optimal decision.
  • selecting a time based encoding mode or a frequency-based encoding mode for each respective frequency-domain signal is further based on at least one of spectral measures, energy measures, or long-term prediction estimates. Since there are inevitably signal pieces where the temporal and spectral peakedness are almost equal, it is of advantage to use other sources of information to arrive at a better decision concerning the mode. This may be done by using in these cases the more indirect decision criteria as have been disclosed in S. A. Ramprashad, The multimode transform predictive coding paradigm, IEEE Trans. Speech Audio Process, 11 (2): 117-129, March 2003.
  • the division information is comprised in the output data stream. This is especially advantageous when the division into the plurality of frequency-domain signals varies over time and needs to be communicated to the decoder in order to allow an appropriate decoding.
  • the invention further provides an encoder as well as a computer program product enabling a programmable device to perform the encoding and/or decoding method according to the invention.
  • Fig. 1 shows a flow chart for an adaptive time/frequency-based audio encoding method for encoding an input signal into an output data stream in accordance with the invention
  • Fig. 2 shows a representation of a plurality of frequency-domain signals and their corresponding time-domain representations, together with the peakedness measure corresponding to each of the respective frequency-domain signals;
  • Fig. 3 shows an example architecture of an adaptive time/frequency-based audio encoder for encoding an input signal into an output data stream in accordance with the invention
  • Fig. 4 shows an example block diagram illustrating transformation and mode determination unit of the adaptive time/frequency-based audio encoder
  • Fig. 5 schematically shows an example of an encoding mode determination unit that determines whether time-based encoding mode or a frequency-based encoding mode is to be used for the respective frequency-domain signal.
  • Fig. 1 shows a flow chart for an adaptive time/frequency-based audio encoding method for encoding an input signal into an output data stream in accordance with the invention.
  • the proposed encoding method can be summarized to comprise the following steps.
  • First step 110 comprises dividing an input audio signal into a plurality of frequency-domain signals and selecting a time-based encoding mode or a frequency-based encoding mode for each respective frequency-domain signal.
  • Second step 120 comprises encoding each frequency-domain signal in the respective encoding mode.
  • Third step 130 comprises combining encoded data and encoding mode information of each respective frequency domain signal into an output data stream.
  • selecting a time-based encoding mode or a frequency-based encoding mode for each respective frequency-domain signal is based on the peakedness of said respective frequency-domain signal.
  • the input signal is transformed into the frequency domain. Subsequently, it is divided into a plurality of frequency-domain signals.
  • Said frequency-domain signals correspond to e.g. frequency bands.
  • Said frequency bands can be fixed, i.e. the thresholds separating the bands are fixed.
  • the division in bands is preferably logarithmic or linear.
  • the band sizes can vary from band to band, or they can be of the same size.
  • the number of bins can be arbitrary.
  • the number of bands should be determined depending on the actual audio content.
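  • As an illustration of the logarithmic division mentioned above, the following is a minimal sketch of one way to place band boundaries on a geometric grid of bin indices. The function name `log_band_edges`, the `first_edge` parameter (which leaves the lowest bins, e.g. the DC bin, out of the bands) and the rounding rule are assumptions made for this example; the text does not prescribe how the boundaries are computed.

```python
def log_band_edges(num_bins, num_bands, first_edge=1):
    """Return num_bands + 1 bin indices spaced roughly logarithmically.

    Band b covers bins edges[b] .. edges[b + 1] - 1. Hypothetical helper:
    the spacing and rounding rules are illustrative assumptions.
    """
    if num_bands < 1 or first_edge < 1 or num_bins < first_edge + num_bands:
        raise ValueError("not enough bins for the requested number of bands")
    ratio = (num_bins / first_edge) ** (1.0 / num_bands)
    edges = [first_edge]
    for i in range(1, num_bands + 1):
        target = int(round(first_edge * ratio ** i))  # geometric spacing
        lowest = edges[-1] + 1                        # at least one bin per band
        highest = num_bins - (num_bands - i)          # leave room for later bands
        edges.append(min(max(target, lowest), highest))
    edges[-1] = num_bins  # the last band ends at the top of the spectrum
    return edges
```

  • For example, `log_band_edges(256, 14)` yields 14 bands whose widths grow with frequency, in the spirit of the 14-band example discussed for Fig. 2.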
  • Coding of audio or speech data is typically performed in frames of input data in order to be able to track or adapt to the possibly time-varying character of the input signal.
  • Said frame sizes can be fixed or they can vary over time.
  • the invention focuses on the issue of selecting the encoding mode that is suitable for the frequency-domain signal.
  • Said frequency domain signal is defined as the frequency signal corresponding to a frequency band of e.g. a frame.
  • the peakedness of an audio signal relates much better to the strengths of the time-based and frequency-based encoding than do the spectral measures, energy measures, or long-term prediction measures indicated in S. A. Ramprashad, The multimode transform predictive coding paradigm, IEEE Trans. Speech Audio Process, 11 (2):117-129, March 2003 or in the patent application US2007/0106502. Therefore, the resulting speech/audio quality and the corresponding encoding efficiency are improved. In other words, the choice of the encoding mode is better tailored to the actual content of the frequency-domain signal.
  • each of the plurality of frequency-domain signals is encoded according to the encoding mode selected for it.
  • the encoded data for each of the plurality of frequency-domain signals is combined with its respective encoding mode into the output data stream.
  • the peakedness of the frequency-domain signal is a spectral peakedness.
  • the spectral measure essentially determines the amount of samples largely deviating from a zero or a mean value relative to the amount of samples close to the zero or mean value for a frequency-domain representation of the input signal or a preprocessed version thereof. If only a small number of large values are present in the signal, these are perceptually very relevant and the remaining large number of small values can be efficiently compressed thus arriving at a high-quality efficient code. It is then advantageous to encode such frequency-domain signal with the frequency-based encoding.
  • the constituent encoders i.e. time-based encoding and frequency-based encoding, operate by segmenting the input signal into frames and encoding the resulting signal frames. It is efficient to determine the peakedness by considering intervals defined within the frame of the input signal and calculating the spectral peakedness for the entire frame from the results obtained for said intervals.
  • the regular intervals make it possible to closely follow the dynamics of the input signal. From the point of view of the implementation, it is preferable that the regular intervals coincide with the shortest update that occurs in the frequency and time domain encoding. However, the intervals over which the spectral peakedness is to be calculated could be even smaller than the shortest frames used by the time and frequency encoding.
  • the spectral peakedness is expressed as: [equation not reproduced in this text] whereby X(k) is the respective frequency-domain signal for a frequency interval with a width F.
  • Said spectral-domain representation is real valued and comprises e.g. MDCT coefficients or absolute values of complex amplitude values resulting from a Discrete Fourier Transform or any other frequency-domain representation like MDCT or MLT. The advantage of this particular measure is its simplicity of calculation.
  • Peakedness refers here to any measure that correlates with the degree of presence of peaks in a signal (spectral or temporal).
  • Various measures are known to be used for this purpose.
  • the normalized fourth moment that is used to measure the degree of fluctuations in an envelope signal (cf. Hartmann and Pumplin, "Noise power fluctuations and masking of sine signals," 1988, J. Acoust. Soc. Am., Vol. 83, pp. 2277-2289) can be used.
  • This is also the basis for the formula for the spectral peakedness given above.
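  • The formula itself is not reproduced in this text. As a minimal sketch, assuming the plain textbook form of the normalized fourth moment (mean fourth power divided by the squared mean square), a peakedness value for a real-valued signal could be computed as below. The function name and the handling of an all-zero signal are assumptions, and the scaling may differ from the formula of the original document, so the example thresholds quoted in the text should not be assumed to apply to this sketch unchanged.

```python
def normalized_fourth_moment(values):
    """Mean of v**4 divided by the square of the mean of v**2.

    Assumed textbook form; equals 1.0 for a constant-magnitude signal and
    len(values) for a signal with a single non-zero sample.
    """
    n = len(values)
    if n == 0:
        raise ValueError("empty signal")
    mean_square = sum(v * v for v in values) / n
    if mean_square == 0.0:
        return 0.0  # assumption: an all-zero signal is treated as not peaked
    mean_fourth = sum(v ** 4 for v in values) / n
    return mean_fourth / (mean_square * mean_square)
```

  • The same function can be applied to a band of real-valued spectral coefficients (spectral peakedness) or to a time-domain representation of that band (temporal peakedness).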
  • the kurtosis measure can be used as a measure of spectral peakedness.
  • a common spectral flatness measure is the ratio of the geometric mean of bins of the magnitude spectrum to the arithmetic mean of the same bins.
  • Various other statistical methods are feasible to define a measure for spectral peakedness.
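  • The spectral flatness measure mentioned above can be sketched as follows. Note that flatness runs opposite to peakedness: values near 1 indicate a flat band and values near 0 a peaked one. The small `floor` that keeps the logarithm finite for zero-valued bins is an assumption of this example.

```python
import math


def spectral_flatness(magnitudes, floor=1e-12):
    """Geometric mean of the magnitude bins divided by their arithmetic mean."""
    if not magnitudes:
        raise ValueError("empty band")
    # Clamp tiny magnitudes so that log() stays finite (illustrative choice).
    mags = [max(abs(m), floor) for m in magnitudes]
    arithmetic = sum(mags) / len(mags)
    geometric = math.exp(sum(math.log(m) for m in mags) / len(mags))
    return geometric / arithmetic
```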
  • the frequency-based encoding mode is selected when the spectral peakedness exceeds a predetermined threshold. If the spectral peakedness of a particular frequency band is sufficiently high, the decision is made to use a frequency domain encoding method for this band, as it results in a high audio/speech quality at a high encoding efficiency.
  • a value of 1.65 is an example of the predetermined threshold. This value has been found to give good results in tests; however, other values are also possible.
  • the predetermined threshold takes on a value of a temporal peakedness corresponding to a time-representation of said respective frequency-domain signal.
  • the decision about using one or another encoding mode is made according to the most dominant peakedness measure. Thus, if the spectral peakedness is larger than the temporal peakedness, the frequency-based encoding mode is chosen; otherwise the time-based encoding is chosen.
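  • The two threshold variants just described (a fixed threshold such as 1.65, or the temporal peakedness taking the place of the threshold) can be written as a single comparison. This is a minimal sketch; the function name and the string labels for the modes are assumptions of the example.

```python
def select_mode_spectral(spectral_peakedness, temporal_peakedness=None,
                         threshold=1.65):
    """Return "frequency" or "time" for one frequency-domain signal."""
    # When a temporal peakedness is supplied it takes the place of the
    # fixed threshold, so the larger of the two measures decides.
    limit = threshold if temporal_peakedness is None else temporal_peakedness
    return "frequency" if spectral_peakedness > limit else "time"
```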
  • the peakedness of the frequency-domain signal is a temporal peakedness corresponding to a time-representation of said frequency-domain signal.
  • said frequency-domain signal is transformed to the time domain.
  • the temporal peakedness is determined.
  • the temporal measure essentially determines the amount of samples largely deviating from a zero or a mean value relative to the amount of samples close to the zero or mean value for said time-domain representation of the frequency-domain signal.
  • the temporal peakedness is expressed as: [equation not reproduced in this text]
  • x_w(n) is a temporal signal representation of the respective frequency-domain signal across the interval, where N is the number of samples of the associated time-domain signal.
  • the size of the interval can vary. However, it is advantageous to determine the peakedness measure across a number of short, overlapping intervals within a frame rather than calculating the peakedness measure across the frame. It is also possible to divide the temporal signal representation into subintervals, and subsequently to determine the peakedness per subinterval and combine (e.g. a max operator) these peakedness values into a single peakedness value for the entire temporal signal representation.
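  • The subinterval strategy described above (compute the peakedness on short, overlapping intervals and combine the results with a max operator) might look like the sketch below. The interval length, the hop size, and the default measure (an assumed textbook normalized fourth moment) are illustrative choices, not values taken from the text.

```python
def fourth_moment_ratio(values):
    """Assumed peakedness measure: mean(v**4) / mean(v**2)**2."""
    mean_square = sum(v * v for v in values) / len(values)
    if mean_square == 0.0:
        return 0.0
    return (sum(v ** 4 for v in values) / len(values)) / mean_square ** 2


def interval_peakedness(samples, interval, hop, measure=fourth_moment_ratio):
    """Apply `measure` on overlapping intervals and keep the maximum."""
    if interval <= 0 or hop <= 0:
        raise ValueError("interval and hop must be positive")
    if not samples:
        raise ValueError("empty signal")
    if len(samples) <= interval:
        return measure(samples)
    # Trailing samples that do not fill a complete interval are ignored.
    starts = range(0, len(samples) - interval + 1, hop)
    return max(measure(samples[s:s + interval]) for s in starts)
```

  • Choosing `hop` smaller than `interval` gives the overlap mentioned in the text; other combination rules than the maximum could be substituted in the last line.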
  • the temporal peakedness measure is determined from the time-domain representation of the frequency-domain signal.
  • Said frequency-domain signal preferably corresponds to a frequency band of the input signal.
  • This time-domain representation can be a pre-processed signal obtained by transforming the respective frequency-domain signal to the time domain.
  • spectral flattening yields a good signal on which the temporal peakedness can best be established.
  • the flattening stage may be omitted.
  • the time-based encoding mode is selected when the temporal peakedness exceeds a predetermined threshold. If the temporal peakedness of a particular frequency band is sufficiently high, the decision is made to use a time-based encoding mode for this band, as it results in a high audio/speech quality at a high encoding efficiency.
  • A value of 1.7 is an example of the predetermined threshold. This value has been found to give good results in tests; however, other values are also possible.
  • the predetermined threshold takes on a value of a spectral peakedness of said respective frequency-domain signal.
  • the decision about using one or another encoding mode is made according to the most dominant peakedness measure.
  • If the temporal peakedness is larger than the spectral peakedness, the time-based encoding mode is chosen; otherwise the frequency-based encoding is chosen.
  • Fig. 2 shows a representation of a plurality of frequency-domain signals and their corresponding time-domain representations, together with the peakedness measure corresponding to each of the respective frequency-domain signals.
  • the top plot 210 in Fig. 2 depicts the frequency domain representation of the input signal.
  • Said frequency domain representation is obtained by e.g. a Fourier-based transform or a filter bank applied to the input signal.
  • a chunk of 512 samples of the input signal has been windowed and transformed using a critically sampled filter bank.
  • Said frequency domain representation of said chunk of the input signal has been divided into 14 frequency-domain signals each of which corresponds to a respective frequency band.
  • Said frequency bands are determined in a logarithmic manner, resulting in a small band size for lower frequencies and a large band size for higher frequencies.
  • RMS: Root-Mean-Square
  • the normalized fourth moment correlation is calculated as a spectral measure.
  • the spectral measures per frequency band are multiplied by two, and they are indicated in the top plot 210 of Fig. 2 by a circle or a star.
  • the bottom plot 220 in Fig. 2 depicts a time domain representation of the respective frequency-domain signals.
  • the real valued data within each band i.e. for each frequency-domain signal
  • the first component of the vector corresponds to real valued first data point in said frequency-domain signal.
  • the real part of the second vector component is the second data point, and the imaginary part is the third data point.
  • the real part of the third vector component is the fourth data point, and the imaginary part is fifth data point, etc.
  • the complex vector created in this way is assumed to represent the positive frequency half of a real valued signal. Alternatively other ways of constructing said vector could be used.
  • said vector is transformed to the time domain using an inverse Fourier transform.
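  • The construction described above can be sketched as follows: the real-valued band data d0, d1, d2, ... is packed into the complex vector d0, d1 + j·d2, d3 + j·d4, ..., which is treated as the positive-frequency half of a real signal and inverse-transformed. The odd output length (which avoids a separate Nyquist bin), the zero imaginary part used when the data runs out, and the naive O(N²) inverse DFT are assumptions made to keep the example self-contained.

```python
import cmath


def pack_band(data):
    """Pack real values d0, d1, d2, ... into d0, d1 + j*d2, d3 + j*d4, ..."""
    if not data:
        raise ValueError("empty band")
    half = [complex(data[0], 0.0)]
    for i in range(1, len(data), 2):
        # Assumption: a missing final imaginary part is taken as zero.
        imag = data[i + 1] if i + 1 < len(data) else 0.0
        half.append(complex(data[i], imag))
    return half


def band_to_time(data):
    """Real time-domain representation of one frequency-domain signal."""
    half = pack_band(data)
    n = 2 * len(half) - 1  # assumed odd length, so no Nyquist bin is needed
    # Mirror the positive half with complex conjugates (Hermitian symmetry)
    # so that the inverse transform is real valued.
    spectrum = half + [c.conjugate() for c in reversed(half[1:])]
    out = []
    for t in range(n):
        acc = sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                  for k in range(n))
        out.append(acc.real / n)
    return out
```

  • With this packing, a flat band such as `[1.0, 1.0, 0.0, 1.0, 0.0]` becomes a single impulse in time, which illustrates why a band that is not peaked spectrally can be strongly peaked temporally.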
  • the time domain representation is more peaked than its corresponding frequency domain representation.
  • a star is used to indicate the temporal peakedness in the bottom plot and the circle is used to indicate the spectral peakedness in the corresponding band in the top plot.
  • the decision about which of the encoding modes should be used is based on the relation between the spectral and temporal peakedness for said frequency-domain signal.
  • If the spectral peakedness is larger than the temporal peakedness for the respective frequency-domain signal, the frequency-based encoding is used.
  • If the temporal peakedness is larger than the spectral peakedness for the respective frequency-domain signal, the time-based encoding is used.
  • the frequency-domain signal has a single pronounced peak in this band, while the time-domain representation of this frequency-domain signal has a rather balanced variation of values approximately around 0.8. Since the spectral peakedness is clearly larger than the temporal peakedness for said band, the frequency-based encoding is used to encode the frequency-domain signal corresponding to this band.
  • the time-based encoding is used to encode the frequency-domain signal when the temporal peakedness is larger than the spectral peakedness for said frequency-domain signal and when additionally the temporal peakedness is larger than the predetermined threshold taking a value of e.g. 1.7. Otherwise, the frequency-based encoding is used to encode the frequency-domain signal.
  • selecting a time based encoding mode or a frequency-based encoding mode for each respective frequency-domain signal is further based on at least one of spectral measures, energy measures, or long-term prediction estimates.
  • spectral measures comprise: LP coding gain, the spectral change between LP filters of consecutive frames, or the like.
  • the energy measures comprise: the signal energy, the change in signal energy between subframes, or the like.
  • the long-term prediction estimates comprise: estimated pitch delay, estimated long-term prediction gains, or the like. Said measures are extensively discussed in S. A. Ramprashad, The multimode transform predictive coding paradigm, IEEE Trans. Speech Audio Process, 11 (2): 117-129, March 2003.
  • the decision about the use of a specific encoding mode is then made as follows.
  • the time-based encoding is used to encode the frequency-domain signal when a combination (e.g. a weighted sum) of the temporal peakedness and the pitch corresponding to a frequency-domain signal is larger than the spectral peakedness for said frequency-domain signal and when additionally the temporal peakedness is larger than the predetermined threshold taking a value of e.g. 1.7. Otherwise, the frequency-based encoding is used to encode the frequency-domain signal.
  • a combination e.g. a weighted sum
  • the frequency-based encoding is used to encode the frequency-domain signal.
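  • The decision rules described above can be captured in one small function. This is a hedged sketch: the text only says that a combination "e.g. a weighted sum" of the temporal peakedness and a pitch-related quantity is used, so the parameter `pitch_measure` and the weights `w_temporal` and `w_pitch` are hypothetical. With the default weights the function reduces to the simpler rule (temporal peakedness larger than both the spectral peakedness and the example threshold of 1.7).

```python
def select_mode(temporal, spectral, pitch_measure=0.0,
                w_temporal=1.0, w_pitch=0.0, threshold=1.7):
    """Return "time" or "frequency" for one frequency-domain signal.

    The weights and `pitch_measure` are illustrative assumptions.
    """
    combined = w_temporal * temporal + w_pitch * pitch_measure
    if combined > spectral and temporal > threshold:
        return "time"
    return "frequency"
```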
  • Various other options can be used to improve the encoding mode selection.
  • constraints are imposed on e.g. tilt, or energy of frequency-domain signal.
  • Said constraint takes a form of e.g. a threshold limitation or other more sophisticated form.
  • Said mode determining means can also be used to determine the optimal division of the input signal into the plurality of the frequency-domain signals. For instance, considering the method where the encoding mode is determined by the absolute or relative difference between the spectral and temporal peakedness of the divided frequency-domain signal, the division can be determined as one which maximizes in some sense said difference.
  • an indicator for the division information is comprised in the output data stream. Said indicator is for example a code for specific division information of the input signal, or an address of a device, e.g. a server on the Internet, wherefrom said division information can be retrieved.
  • a decoder based on said indicator can be configured to operate according to the division information used to produce the data stream to be decoded.
  • Fig. 3 shows example architecture of an adaptive time/frequency-based audio encoder 300 for encoding an input signal into an output data stream in accordance with the invention.
  • Said encoder comprises a transformation and mode determination unit 310, an encoding unit 320, and a merger 330.
  • the transformation and mode determination unit 310 divides the input audio signal into a plurality of frequency-domain signals and selects the time-based encoding mode or the frequency-based encoding mode for each respective frequency-domain signal. Then, the transformation and mode determination unit 310 outputs a frequency-domain signal 321 determined to be encoded in the time-based encoding mode, a frequency-domain signal 322 determined to be encoded in the frequency-based encoding mode, and encoding mode information 331 for each frequency-domain signal.
  • dividing the input signal and selecting the encoding mode are performed in a single unit 310; however, a separate functional unit (implemented in hardware or software) could be assigned to perform each of these functions.
  • the encoding unit 320 encodes each frequency-domain signal in the respective encoding modes selected by the transformation and mode determination unit 310.
  • the unit 320 performs time-based encoding on the frequency-domain signal 321 and performs frequency-based encoding on the frequency-domain signal 322.
  • the encoding unit 320 outputs encoded data 333 on which the time-based encoding has been performed and encoded data 334 on which the frequency-based encoding has been performed.
  • the merger 330 combines encoded data 333 and 334, and encoding mode information 331 for each respective encoded frequency-domain signal to produce the output data stream 341.
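  • The data flow through units 310, 320 and 330 can be summarized by the sketch below. The three callables (`choose_mode`, `encode_time`, `encode_frequency`) stand in for the mode decision and the two constituent coders, and the list-of-dictionaries output is a placeholder for the output data stream; all of these names and the stream layout are assumptions of this example.

```python
def encode_frame(bands, choose_mode, encode_time, encode_frequency):
    """Encode one frame, given its frequency-domain signals (bands)."""
    stream = []
    for band in bands:
        mode = choose_mode(band)  # mode decision, as in unit 310
        encoder = encode_time if mode == "time" else encode_frequency
        payload = encoder(band)   # per-band encoding, as in unit 320
        # Combine encoded data and mode information, as in merger 330.
        stream.append({"mode": mode, "data": payload})
    return stream
```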
  • Fig. 4 shows an example block diagram illustrating transformation and mode determination unit 310 of the adaptive time/frequency-based audio encoder 300.
  • the transformation and mode determination unit 310 comprises a frequency-domain transform unit 400 and an encoding mode determination unit 410.
  • the frequency-domain transform unit 400 transforms the input audio signal 311 into a full frequency-domain signal 421.
  • Said full frequency-domain signal 421 has a frequency spectrum such as e.g. the one illustrated in the top plot 210 of Fig. 2.
  • Said frequency-domain representation is obtained by using e.g. a Fourier-based transform or a filter bank applied to the input signal.
  • the encoding mode determination unit 410 divides the full frequency-domain signal 421 into a plurality of frequency-domain signals according to a preset standard and selects either the time-based encoding mode or the frequency-based encoding mode for each frequency-domain signal based on peakedness of said frequency-domain signal.
  • the encoding mode determination unit 410 outputs the frequency-domain signal 321 determined to be encoded in the time-based encoding mode, the frequency-domain signal 322 determined to be encoded in the frequency-based encoding mode, the encoding mode information 331, and when required the division information 332 for each frequency-domain signal.
  • Fig. 5 schematically shows an example of an encoding mode determination unit 410 that determines whether time-based encoding mode or a frequency-based encoding mode is to be used for the respective frequency-domain signal. Said unit 410 determines an encoding mode based on a spectral peakedness and temporal peakedness of the input signal 421.
  • The input signal 421 is fed into a signal selector 511, which outputs the frequency-domain signal 422 corresponding to, e.g., the selected frequency band.
  • Said frequency-domain signal 422 is further fed into unit 514, which calculates the spectral peakedness of said frequency-domain signal 422 based, e.g., on the normalized fourth moment correlation (as discussed before).
  • The frequency-domain signal 422 is also fed into an inverse transform unit 512 that transforms said frequency-domain signal into a time-domain signal.
  • Unit 513 derives a temporal peakedness measure for the signal provided by unit 512.
  • Said temporal peakedness is calculated using, e.g., the normalized fourth moment correlation (as discussed before).
  • Said temporal peakedness is determined across a number of short, overlapping intervals within a frame rather than across the whole frame at once.
  • The temporal and spectral peakedness measures obtained from units 513 and 514 are fed into a unit 515, which combines these two measures to make a decision about the value of the predetermined frequency.
  • The predetermined frequency can be determined in many ways, by means of a formula or by means of heuristics. Alternatively, other formula-based or heuristic ways of assessing the dominance of the spectral or temporal peakedness can be used. One heuristic that can be used to determine the encoding mode is described below.
  • The time-based encoding is used to encode the frequency-domain signal when the temporal peakedness is larger than the spectral peakedness for said frequency-domain signal and when, additionally, the temporal peakedness is larger than a predetermined threshold of, e.g., 1.7. Otherwise, the frequency-based encoding is used to encode the frequency-domain signal.
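
The decision path of Fig. 5 described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed in the application. The peakedness formula (mean fourth power divided by squared mean power) is an assumed form of the normalized fourth moment, whose exact definition is given elsewhere in the description, so the example threshold of 1.7 is only meaningful relative to that original definition. The function names, the window and hop lengths, the use of a one-sided FFT spectrum, and the choice of the maximum over the short intervals are likewise hypothetical.

```python
import numpy as np


def normalized_fourth_moment(x):
    """Assumed peakedness measure: mean(|x|^4) / mean(|x|^2)^2."""
    power = np.abs(np.asarray(x)) ** 2
    mean_power = np.mean(power)
    if mean_power == 0.0:
        return 0.0  # silent input: treat as not peaked
    # Normalize before squaring to avoid underflow on very small values.
    return float(np.mean((power / mean_power) ** 2))


def temporal_peakedness(x, win=64, hop=32):
    """Peakedness over short, overlapping intervals within a frame.

    Taking the maximum over the intervals is an assumption; the text
    only says that several short, overlapping intervals are used.
    """
    x = np.asarray(x)
    if len(x) <= win:
        return normalized_fourth_moment(x)
    starts = range(0, len(x) - win + 1, hop)
    return max(normalized_fourth_moment(x[s:s + win]) for s in starts)


def select_mode(spectrum, lo=0, hi=None, threshold=1.7):
    """Return "time" or "frequency" for bins lo..hi-1 of a one-sided spectrum."""
    spectrum = np.asarray(spectrum, dtype=complex)
    hi = len(spectrum) if hi is None else hi

    # Signal selector 511: keep only the selected frequency band.
    band = np.zeros_like(spectrum)
    band[lo:hi] = spectrum[lo:hi]

    # Unit 514: spectral peakedness of the band coefficients.
    spectral = normalized_fourth_moment(spectrum[lo:hi])

    # Units 512 and 513: inverse transform, then temporal peakedness.
    temporal = temporal_peakedness(np.fft.irfft(band))

    # Unit 515: heuristic from the text (threshold e.g. 1.7).
    if temporal > spectral and temporal > threshold:
        return "time"
    return "frequency"
```

Under these assumptions, a frame containing a single click has a flat spectrum and a sharply peaked waveform, so it is routed to the time-based mode, whereas a stationary sinusoid has a single dominant spectral line and is routed to the frequency-based mode.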

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to an improved adaptive time/frequency audio encoding method for encoding an input signal into an output data stream, which achieves high-efficiency encoding with high audio and speech quality. The invention consists in selecting either the time-based encoding mode or the frequency-based encoding mode for the respective frequency-domain signal of the plurality of frequency-domain signals belonging to the input signal, based on the peakedness of the respective frequency-domain signal.
PCT/IB2008/055244 2007-12-18 2008-12-12 Procede de codage audio temporel/frequentiel adaptatif WO2009077950A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP07123456.1 2007-12-18
EP07123456 2007-12-18

Publications (1)

Publication Number Publication Date
WO2009077950A1 true WO2009077950A1 (fr) 2009-06-25

Family

ID=40316955

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2008/055244 WO2009077950A1 (fr) 2007-12-18 2008-12-12 Procede de codage audio temporel/frequentiel adaptatif

Country Status (1)

Country Link
WO (1) WO2009077950A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103915100A (zh) * 2013-01-07 2014-07-09 中兴通讯股份有限公司 一种编码模式切换方法和装置、解码模式切换方法和装置

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6233550B1 (en) * 1997-08-29 2001-05-15 The Regents Of The University Of California Method and apparatus for hybrid coding of speech at 4kbps
US7039581B1 (en) * 1999-09-22 2006-05-02 Texas Instruments Incorporated Hybrid speed coding and system
US20070106502A1 (en) * 2005-11-08 2007-05-10 Junghoe Kim Adaptive time/frequency-based audio encoding and decoding apparatuses and methods
WO2007120316A2 (fr) * 2005-12-05 2007-10-25 Qualcomm Incorporated Systèmes, procédés et appareil de détection de composantes tonales

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SEAN A RAMPRASHAD: "The Multimode Transform Predictive Coding Paradigm", IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, IEEE SERVICE CENTER, NEW YORK, NY, US, vol. 11, no. 2, 1 March 2003 (2003-03-01), XP011079700, ISSN: 1063-6676 *

Similar Documents

Publication Publication Date Title
JP6682683B2 (ja) 復号方法、コンピュータプログラム及び復号システム
RU2485606C2 (ru) Схема кодирования/декодирования аудио сигналов с низким битрейтом с применением каскадных переключений
JP5688852B2 (ja) オーディオコーデックポストフィルタ
CN104321815B (zh) 用于带宽扩展的高频编码/高频解码方法和设备
EP2491555B1 (fr) Audio multimode codec
CN107077858B (zh) 使用具有全带隙填充的频域处理器以及时域处理器的音频编码器和解码器
CA2833868C (fr) Appareil de quantification de coefficients de codage predictif lineaire, appareil de codage de son, appareil de dequantification de coefficients de codage predictif lineaire, appa reil de decodage de son et dispositif electronique s'y rapportant
CN106796800B (zh) 音频编码器、音频解码器、音频编码方法和音频解码方法
CA2833874C (fr) Procede de quantification de coefficients de codage predictif lineaire, procede de codage de son, procede de dequantification de coefficients de codage predictif lineaire, procede de decodage de son et support d'enregistrement
US10706865B2 (en) Apparatus and method for selecting one of a first encoding algorithm and a second encoding algorithm using harmonics reduction
EP2502230B1 (fr) Extension de largeur de bande de signal d'excitation amélioré
CN105122357B (zh) 频域中基于lpc进行编码的低频增强
JP2013528836A (ja) 広帯域音声コーディングのためのシステム、方法、装置、およびコンピュータプログラム製品
EP2193348A1 (fr) Procédé et dispositif pour une quantification efficace d'informations de transformée dans un codec de parole et d'audio incorporé
JP2016505902A (ja) 第1の符号化アルゴリズム及び第2の符号化アルゴリズムのうちの1つを選択するための装置及び方法
CA2983813C (fr) Codeur audio et procede de codage d'un signal audio
WO2009077950A1 (fr) Procede de codage audio temporel/frequentiel adaptatif
CN105122358B (zh) 用于处理编码信号的装置和方法与用于产生编码信号的编码器和方法
CA2910878C (fr) Appareil et methode destines a selectionner un d'un premier algorithme de codage et d'un deuxieme algorithme de codage a l'aide de reduction d'harmonique
WO2004097795A2 (fr) Amelioration vocale adaptatvie pour codage audio a faible debit binaire
KR20080034817A (ko) 부호화/복호화 장치 및 방법

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08861201

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08861201

Country of ref document: EP

Kind code of ref document: A1