WO2007055507A1 - Adaptive time/frequency-based audio encoding and decoding apparatuses and methods - Google Patents

Adaptive time/frequency-based audio encoding and decoding apparatuses and methods

Info

Publication number
WO2007055507A1
Authority
WO
WIPO (PCT)
Prior art keywords: frequency, domain, encoding, time, signal
Prior art date
Application number
PCT/KR2006/004655
Other languages
French (fr)
Inventor
Jung-Hoe Kim
Eun-Mi Oh
Chang-Yong Son
Ki-Hyun Choo
Original Assignee
Samsung Electronics Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co., Ltd. filed Critical Samsung Electronics Co., Ltd.
Priority to EP06812491A priority Critical patent/EP1952400A4/en
Priority to CN2006800415925A priority patent/CN101305423B/en
Publication of WO2007055507A1 publication Critical patent/WO2007055507A1/en


Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00 Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10 Digital recording or reproducing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/20 Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders

Definitions

  • the present general inventive concept relates to audio encoding and decoding apparatuses and methods, and more particularly, to adaptive time/frequency-based audio encoding and decoding apparatuses and methods which can obtain high compression efficiency by making efficient use of encoding gains of two encoding methods in which a frequency-domain transform is performed on input audio data such that time-based encoding is performed on a band of the audio data suitable for voice compression and frequency-based encoding is performed on remaining bands of the audio data.
  • Audio codec algorithms, such as aacPlus, compress a frequency-domain signal and apply a psychoacoustic model. Assuming that the audio codec and the voice codec compress voice signals having an equal amount of data, the audio codec algorithm outputs sound having a significantly lower quality than the voice codec algorithm. In particular, the quality of sound output from the audio codec algorithm is more adversely affected by an attack signal.
  • Voice codec algorithms, such as the adaptive multi-rate wideband codec (AMR-WB), compress a time-domain signal and apply a voicing model. Assuming that the voice codec and the audio codec compress audio signals having an equal amount of data, the voice codec algorithm outputs sound having a significantly lower quality than the audio codec algorithm.
  • An AMR-WB plus algorithm considers the above characteristics of the conventional voice/music compression algorithms to perform voice/music compression efficiently. In the AMR-WB plus algorithm, an algebraic code excited linear prediction (ACELP) algorithm is used as a voice compression algorithm and a transform coded excitation (TCX) algorithm is used as an audio compression algorithm.
  • the AMR-WB plus algorithm determines whether to apply the ACELP algorithm or the TCX algorithm to each processing unit, for example, each frame on a time axis, and then performs encoding accordingly. This makes the AMR-WB plus algorithm effective in compressing what is close to a voice signal. However, when the AMR-WB plus algorithm is used to compress what is close to an audio signal, the sound quality or compression rate deteriorates, since the AMR-WB plus algorithm performs encoding in processing units.
  • the present general inventive concept provides adaptive time/frequency-based audio encoding and decoding apparatuses and methods which can obtain high compression efficiency by making efficient use of encoding gains of two encoding methods in which a frequency-domain transform is performed on input audio data such that time-based encoding is performed on a band of the audio data suitable for voice compression and frequency-based encoding is performed on remaining bands of the audio data.
  • an adaptive time/frequency-based audio encoding apparatus including a transformation & mode determination unit to divide an input audio signal into a plurality of frequency-domain signals and to select a time-based encoding mode or a frequency-based encoding mode for each respective frequency-domain signal, an encoding unit to encode each frequency-domain signal in the respective encoding modes selected by the transformation & mode determination unit, and a bitstream output unit to output encoded data, division information, and encoding mode information for each respective encoded frequency-domain signal.
  • the transformation & mode determination unit may include a frequency-domain transform unit to transform the input audio signal into a full frequency-domain signal, and an encoding mode determination unit to divide the full frequency-domain signal into the frequency-domain signals according to a preset standard and to determine the time-based encoding mode or the frequency-based encoding mode for each respective frequency-domain signal.
  • the full frequency-domain signal may be divided into the frequency-domain signals suitable for the time-based encoding mode or the frequency-based encoding mode based on at least one of a spectral tilt, a size of signal energy of each frequency domain, a change in signal energy between sub-frames and a voicing level determination, and the respective encoding mode for each frequency-domain signal is determined accordingly.
  • the encoding unit may include a time-based encoding unit to perform an inverse frequency-domain transform on a first frequency-domain signal determined to be encoded in the time-based encoding mode and to perform time-based encoding on the first frequency-domain signal on which the inverse frequency-domain transform has been performed, and a frequency-based encoding unit to perform frequency-based encoding on a second frequency-domain signal determined to be encoded in the frequency-based encoding mode.
  • the time-based encoding unit may select the encoding mode for the first frequency-domain signal based on at least one of a linear prediction coding gain, a spectral change between linear prediction filters of adjacent frames, a predicted pitch delay, and a predicted long-term prediction gain, continue to perform the time-based encoding on the first frequency-domain signal when the time-based encoding unit determines that the time-based encoding mode is suitable for the first frequency-domain signal, and stop performing the time-based encoding on the first frequency-domain signal and transmit a mode conversion control signal to the transformation & mode determination unit when the time-based encoding unit determines that the frequency-based encoding mode is suitable for the first frequency-domain signal, and the transformation & mode determination unit may output the first frequency-domain signal, which was provided to the time-based encoding unit, to the frequency-based encoding unit in response to the mode conversion control signal.
  • the frequency-domain transform unit may perform the frequency-domain transform using a frequency-varying modulated lapped transform (MLT).
  • the time-based encoding unit may quantize a residual signal obtained from linear prediction and dynamically allocate bits to the quantized residual signal according to importance.
  • the time-based encoding unit may transform the residual signal obtained from the linear prediction into a frequency-domain signal, quantize the frequency-domain signal, and dynamically allocate the bits to the quantized signal according to importance. The importance may be determined based on a voicing model.
  • the frequency-based encoding unit may determine a quantization step size of an input frequency-domain signal according to a psychoacoustic model and quantize the frequency-domain signal.
  • the frequency-based encoding unit may extract important frequency components from an input frequency-domain signal according to the psychoacoustic model, encode the extracted important frequency components, and encode the remaining signals using noise modeling.
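As a rough, hypothetical illustration of the "important components plus noise modeling" idea in the preceding item, the Python sketch below keeps the largest-magnitude spectral coefficients and summarizes everything else with a single noise gain. The peak count and the white-noise substitute are invented stand-ins; the patent leaves both the psychoacoustic model and the noise model unspecified.

```python
# Sketch only: peak picking stands in for a psychoacoustic importance measure,
# and a single RMS gain stands in for a real noise model.
import numpy as np

def split_peaks_and_noise(coeffs: np.ndarray, num_peaks: int = 20):
    """Keep the largest-magnitude coefficients; summarize the rest as noise."""
    idx = np.argsort(np.abs(coeffs))[-num_peaks:]        # "important" components
    peaks = [(int(i), float(coeffs[i])) for i in idx]
    residual = coeffs.astype(float).copy()
    residual[idx] = 0.0
    noise_gain = float(np.sqrt(np.mean(residual ** 2)))  # coarse noise model
    return peaks, noise_gain

def resynthesize(n: int, peaks, noise_gain, rng=np.random.default_rng(0)):
    """Decoder side: coded peaks plus gain-scaled noise for the remainder."""
    out = rng.normal(scale=noise_gain, size=n)
    for i, v in peaks:
        out[i] = v
    return out
```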
  • the residual signal may be obtained using a code excited linear prediction (CELP) algorithm.
  • an audio data encoding apparatus including a transformation and mode determination unit to divide a frame of audio data into first audio data and second audio data, and an encoding unit to encode the first audio data in a time domain and to encode the second audio data in a frequency domain.
  • an adaptive time/frequency-based audio decoding apparatus including a bitstream sorting unit to extract encoded data for each frequency band, division information, and encoding mode information for each frequency band from an input bitstream, a decoding unit to decode the encoded data for each frequency domain based on the division information and the respective encoding mode information, and a collection & inverse transform unit to collect decoded data in a frequency domain and to perform an inverse frequency-domain transform on the collected data.
  • the decoding unit may include a time-based decoding unit to perform time-based decoding on first encoded data based on the division information and respective first encoding mode information, and a frequency-based decoding unit to perform frequency-based decoding on second encoded data based on the division information and respective second encoding mode information.
  • the collection & inverse transform unit may perform envelope smoothing on the decoded data in the frequency domain and then perform the inverse frequency-domain transform on the decoded data such that the decoded data maintains continuity in the frequency domain.
  • an audio data decoding apparatus including a bitstream sorting unit to extract encoded audio data of a frame, and a decoding unit to decode the audio data of the frame into first audio data in a time domain and second audio data in a frequency domain.
  • an adaptive time/frequency-based audio encoding method including dividing an input audio signal into a plurality of frequency-domain signals and selecting a time-based encoding mode or a frequency-based encoding mode for each respective frequency-domain signal, encoding each frequency-domain signal in the respective encoding mode, and outputting encoded data, division information, and encoding mode information of each respective frequency-domain signal.
  • an audio data encoding method including dividing a frame of audio data into first audio data and second audio data, and encoding the first audio data in a time domain and encoding the second audio data in a frequency domain.
  • an adaptive time/frequency-based audio decoding method including extracting encoded data, division information, and encoding mode information for each respective frequency band from an input bitstream, decoding the encoded data for each frequency domain based on the division information and the respective encoding mode information, and collecting decoded data in a frequency domain and performing an inverse frequency-domain transform on the collected data.
  • FIG. 1 is a block diagram illustrating an adaptive time/frequency-based audio encoding apparatus according to an embodiment of the present general inventive concept;
  • FIG. 2 is a conceptual diagram illustrating a method of dividing a signal on which a frequency-domain transform has been performed and determining an encoding mode using a transformation & mode determination unit of the adaptive time/frequency-based audio encoding apparatus of FIG. 1, according to an embodiment of the present general inventive concept;
  • FIG. 3 is a detailed block diagram illustrating the transformation & mode determination unit of the adaptive time/frequency-based audio encoding apparatus of FIG. 1;
  • FIG. 4 is a detailed block diagram illustrating an encoding unit of the adaptive time/frequency-based audio encoding apparatus of FIG. 1;
  • FIG. 5 is a block diagram of an adaptive time/frequency-based audio encoding apparatus having a time-based encoding unit of FIG. 4 with a function to confirm a determined encoding mode, according to another embodiment of the present general inventive concept;
  • FIG. 6 is a conceptual diagram illustrating a frequency-varying modulated lapped transform (MLT), which is an example of a frequency-domain transform method according to an embodiment of the present general inventive concept;
  • FIG. 7A is a conceptual diagram illustrating detailed operations of the time-based encoding unit and a frequency-based encoding unit of the adaptive time/frequency-based audio encoding apparatus of FIG. 5, according to an embodiment of the present general inventive concept;
  • FIG. 7B is a conceptual diagram illustrating detailed operations of the time-based encoding unit and the frequency-based encoding unit of the adaptive time/frequency-based audio encoding apparatus of FIG. 5, according to another embodiment of the present general inventive concept;
  • FIG. 8 is a block diagram of an adaptive time/frequency-based audio decoding apparatus according to an embodiment of the present general inventive concept;
  • FIG. 9 is a flowchart illustrating an adaptive time/frequency-based audio encoding method according to an embodiment of the present general inventive concept; and
  • FIG. 10 is a flowchart illustrating an adaptive time/frequency-based audio decoding method according to an embodiment of the present general inventive concept.
  • the present general inventive concept selects a time-based encoding method or a frequency-based encoding method for each frequency band of an input audio signal and encodes each frequency band of the input audio signal using the selected encoding method.
  • when a prediction gain obtained from linear prediction is great or when the input audio signal is a highly pitched signal, such as a voice signal, the time-based encoding method is more effective.
  • when the input audio signal is a sinusoidal signal, when a high-frequency signal is included in the input audio signal, or when a masking effect between signals is great, the frequency-based encoding method is more effective.
  • the time-based encoding method denotes a voice compression algorithm, such as a code excited linear prediction (CELP) algorithm, which performs compression on a time axis.
  • the frequency-based encoding method denotes an audio compression algorithm, such as a transform coded excitation (TCX) algorithm and an advanced audio coding (AAC) algorithm, which performs compression on a frequency axis.
  • the embodiments of the present general inventive concept divide a frame of audio data, which is typically used as a unit for processing (e.g., encoding, decoding, compressing, decompressing, filtering, compensating, etc.) audio data, into sub-frames, bands, or frequency-domain signals within the frame, such that first audio data of the frame can be effectively encoded as voice audio data in the time domain while second audio data of the frame can be effectively encoded as non-voice audio data in the frequency domain.
  • FIG. 1 is a block diagram illustrating an adaptive time/frequency-based audio encoding apparatus according to an embodiment of the present general inventive concept.
  • the apparatus includes a transformation & mode determination unit 100, an encoding unit 110, and a bitstream output unit 120.
  • the transformation & mode determination unit 100 divides an input audio signal IN into a plurality of frequency-domain signals and selects a time-based encoding mode or a frequency-based encoding mode for each frequency-domain signal. Then, the transformation & mode determination unit 100 outputs a frequency-domain signal S1 determined to be encoded in the time-based encoding mode, a frequency-domain signal S2 determined to be encoded in the frequency-based encoding mode, and division information S3 and encoding mode information S4 for each frequency-domain signal.
  • when a decoding end does not require the division information S3, the division information S3 need not be output through the bitstream output unit 120.
  • the encoding unit 110 performs time-based encoding on the frequency-domain signal S1 and performs frequency-based encoding on the frequency-domain signal S2.
  • the encoding unit 110 outputs data S5 on which the time-based encoding has been performed and data S6 on which the frequency-based encoding has been performed.
  • the bitstream output unit 120 collects the data S5 and S6, the division information S3, and the encoding mode information S4, and outputs them as a bitstream OUT. A data compression process, such as an entropy-encoding process, may be performed on the bitstream OUT.
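The patent does not define a bitstream syntax, so the sketch below invents a minimal layout purely for illustration: one byte for the band count (which doubles as the division information here), one byte of per-band mode flags, then length-prefixed band payloads. The entropy-encoding step mentioned above is omitted.

```python
# Hypothetical serialization of what the bitstream output unit 120 collects;
# the field sizes and layout are invented, not taken from the patent.
import struct

def pack_bitstream(mode_flags, band_payloads):
    """mode_flags: 0 = time-based, 1 = frequency-based, one per band (max 8)."""
    flags_byte = sum(flag << i for i, flag in enumerate(mode_flags))
    header = struct.pack("<BB", len(band_payloads), flags_byte)
    body = b"".join(struct.pack("<I", len(p)) + p for p in band_payloads)
    return header + body
```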
  • FIG. 2 is a conceptual diagram illustrating a method of dividing a signal on which a frequency-domain transform has been performed, and determining an encoding mode using the transformation & mode determination unit 100 of FIG. 1, according to an embodiment of the present general inventive concept.
  • an input audio signal (e.g., the input audio signal IN) has a frequency band of 22,000 Hz and is divided into five frequency bands (e.g., corresponding to five frequency-domain signals).
  • the time-based encoding mode, the frequency-based encoding mode, the time-based encoding mode, the frequency-based encoding mode, and the frequency-based encoding mode are respectively determined for the five frequency bands in the order of lowest to highest frequency band.
  • the input audio signal is an audio frame for a predetermined period of time, for example, 20 ms.
  • FIG. 2 is a graph illustrating the audio frame on which the frequency-domain transform has been performed.
  • the audio frame is divided into five sub-frames sf1, sf2, sf3, sf4 and sf5 corresponding to five frequency domains (i.e., bands), respectively.
  • in order to divide the input audio signal into the five frequency bands and determine the corresponding encoding mode for each band as illustrated in FIG. 2, a spectral measuring method, an energy measuring method, a long-term prediction estimation method, and a voicing level determination method that distinguishes a voiced sound from a voiceless sound may be used.
  • examples of the spectral measuring method include dividing and determining based on a linear prediction coding gain, a spectral change between linear prediction filters of adjacent frames, and a spectral tilt.
  • examples of the energy measuring method include dividing and determining based on the size of signal energy of each band and a change in signal energy between bands.
  • examples of the long-term prediction estimation method include dividing and determining based on a predicted pitch delay and a predicted long-term prediction gain.
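The sketch below shows how such per-band measurements might be computed on a transformed frame and combined into an open-loop mode decision. The band edges (matching the five bands of FIG. 2), the tilt estimate, and all thresholds are illustrative assumptions; the patent publishes neither formulas nor threshold values.

```python
# Hypothetical per-band measurement and mode decision; every constant here is
# an assumption made for illustration.
import numpy as np

BAND_EDGES_HZ = [0, 4400, 8800, 13200, 17600, 22000]  # five bands, as in FIG. 2

def band_features(spectrum: np.ndarray, bandwidth_hz: float = 22000.0):
    """Energy and a crude spectral tilt for each band of one transformed frame."""
    freqs = np.linspace(0.0, bandwidth_hz, len(spectrum), endpoint=False)
    feats = []
    for lo, hi in zip(BAND_EDGES_HZ[:-1], BAND_EDGES_HZ[1:]):
        mags = np.abs(spectrum[(freqs >= lo) & (freqs < hi)])
        energy = float(np.sum(mags ** 2))
        # tilt: slope of the log-magnitude envelope across the band
        tilt = float(np.polyfit(np.arange(len(mags)), np.log(mags + 1e-12), 1)[0])
        feats.append((energy, tilt))
    return feats

def choose_mode(tilt, energy_change, voicing_level,
                tilt_thresh=-0.05, change_thresh=4.0, voicing_thresh=0.5):
    """Open-loop decision: voice-like bands go to the time-based encoder."""
    voice_like = (voicing_level > voicing_thresh
                  or energy_change > change_thresh   # attack-like transient
                  or tilt < tilt_thresh)             # strongly falling spectrum
    return "time" if voice_like else "frequency"
```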
  • FIG. 3 is a detailed block diagram illustrating an exemplary embodiment of the transformation & mode determination unit 100 of FIG. 1.
  • the transformation & mode determination unit 100, as illustrated in FIG. 3, includes a frequency-domain transform unit 300 and an encoding mode determination unit 310.
  • the frequency-domain transform unit 300 transforms the input audio signal IN into a full frequency-domain signal S7 having a frequency spectrum as illustrated in FIG. 2.
  • the frequency-domain transform unit 300 may use a modulated lapped transform (MLT) as a frequency-domain transform method.
  • the encoding mode determination unit 310 divides the full frequency-domain signal S7 into the plurality of frequency-domain signals according to a preset standard and selects either the time-based encoding mode or the frequency-based encoding mode for each frequency-domain signal based on the preset standard and/or a linear prediction coding gain, a spectral change between linear prediction filters of adjacent frames, a spectral tilt, the size of signal energy of each band, a change in signal energy between bands, a predicted pitch delay, or a predicted long-term prediction gain. That is, the encoding mode can be selected for each of the frequency-domain signals based on approximations, predictions, and/or estimations of frequency characteristics thereof.
  • these approximations, predictions, and/or estimations of the frequency characteristics can estimate which ones of the frequency-domain signals should be encoded using the time-based encoding mode such that remaining ones of the frequency-domain signals can be encoded in the frequency-based encoding mode.
  • the selected encoding mode (e.g., the time-based encoding mode) can subsequently be confirmed based on data generated during the encoding process such that the encoding process can be performed efficiently.
  • the encoding mode determination unit 310 outputs the frequency-domain signal S1 determined to be encoded in the time-based encoding mode, the frequency-domain signal S2 determined to be encoded in the frequency-based encoding mode, the division information S3, and the encoding mode information S4 for each frequency-domain signal.
  • the preset standard may be what can be determined in a frequency domain among the criteria for selecting the encoding mode described above. That is, the preset standard may be the spectral tilt, the size of signal energy of each frequency domain, the change in signal energy between sub-frames, or the voicing level determination.
  • the present general inventive concept is not limited thereto.
  • FIG. 4 is a detailed block diagram illustrating an exemplary embodiment of the encoding unit 110 of FIG. 1.
  • the encoding unit 110 as illustrated in FIG. 4 includes a time-based encoding unit 400 and a frequency-based encoding unit 410.
  • the time-based encoding unit 400 performs time-based encoding on the frequency-domain signal S1 using, for example, a linear prediction method.
  • an inverse frequency-domain transform is performed on the frequency-domain signal S1 before the time-based encoding, such that the time-based encoding is performed once the frequency-domain signal S1 has been converted to the time domain.
  • the frequency-based encoding unit 410 performs the frequency-based encoding on the frequency-domain signal S2.
  • since the time-based encoding unit 400 uses an encoding component of a previous frame, the time-based encoding unit 400 includes a buffer (not illustrated) that stores the encoding component of the previous frame.
  • the time-based encoding unit 400 receives an encoding component S8 of a current frame from the frequency-based encoding unit 410, stores the encoding component S8 of the current frame in the buffer, and uses the stored encoding component S8 of the current frame to encode a next frame. This process will now be described in detail with reference to FIG. 2.
  • for example, a linear predictive coding (LPC) coefficient of the third sub-frame sf3 of the previous frame is used to perform the time-based encoding on the third sub-frame sf3 of the current frame.
  • the LPC coefficient is the encoding component S8 of the current frame, which is provided to the time-based encoding unit 400 and stored therein.
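The buffering just described can be pictured with the minimal class below. The patent only states that a buffer holds the previous frame's encoding component (for example, an LPC coefficient) per band, so this layout is an assumption.

```python
# Assumed shape of the per-band buffer in the time-based encoding unit 400.
class ComponentBuffer:
    def __init__(self, num_bands: int):
        self._prev = [None] * num_bands       # one slot per sub-frame/band

    def fetch(self, band: int):
        """Previous frame's component, used when time-encoding this band."""
        return self._prev[band]

    def store(self, band: int, component):
        """Current frame's component S8, kept for encoding the next frame."""
        self._prev[band] = component
```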
  • FIG. 5 is a block diagram illustrating an adaptive time/frequency-based audio encoding apparatus including a time-based encoding unit 510 (similar to the time-based encoding unit 400 of FIG. 4) with a function used to confirm a determined encoding mode, according to another embodiment of the present general inventive concept.
  • the apparatus includes a transformation & mode determination unit 500, the time-based encoding unit 510, a frequency-based encoding unit 520, and a bitstream output unit 530.
  • the frequency-based encoding unit 520 and the bitstream output unit 530 operate and function as described above.
  • the time-based encoding unit 510 performs the time-based encoding, as described above. In addition, the time-based encoding unit 510 determines whether the time-based encoding mode is suitable for the received frequency-domain signal S1 based on intermediate data values obtained during the time-based encoding. In other words, the time-based encoding unit 510 confirms the encoding mode determined by the transformation & mode determination unit 500 for the received frequency-domain signal S1. That is, the time-based encoding unit 510 confirms, during the time-based encoding and based on the intermediate data values, that the time-based encoding is appropriate for the received frequency-domain signal S1.
  • if the time-based encoding unit 510 determines that the frequency-based encoding mode is suitable for the frequency-domain signal S1, the time-based encoding unit 510 stops performing time-based encoding on the frequency-domain signal S1 and provides a mode conversion control signal S9 back to the transformation & mode determination unit 500. If the time-based encoding unit 510 determines that the time-based encoding mode is suitable for the frequency-domain signal S1, the time-based encoding unit 510 continues to perform the time-based encoding on the frequency-domain signal S1.
  • the time-based encoding unit 510 determines whether the time-based encoding mode or the frequency-based encoding mode is suitable for the frequency-domain signal S1 based on at least one of a linear prediction coding gain, a spectral change between linear prediction filters of adjacent frames, a predicted pitch delay, and a predicted long-term prediction gain, all of which are obtained from the encoding process.
  • the transformation & mode determination unit 500 converts a current encoding mode of the frequency-domain signal S1 in response to the mode conversion control signal S9.
  • the frequency-based encoding is then performed on the frequency-domain signal S1, which was initially determined to be encoded in the time-based encoding mode.
  • the encoding mode information S4 is changed from the time-based encoding mode to the frequency-based encoding mode.
  • the changed encoding mode information S4, that is, information indicating the frequency-based encoding mode, is transmitted to the decoding end.
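Of the criteria just listed, the linear prediction coding gain is straightforward to estimate during encoding. The sketch below does so with the Levinson-Durbin recursion and keeps or abandons the time-based mode accordingly; the LPC order, the 6 dB threshold, and the use of this single criterion on its own are all illustrative assumptions.

```python
# Hypothetical in-loop confirmation of the time-based mode via LPC gain.
import numpy as np

def lpc_prediction_gain(x: np.ndarray, order: int = 10) -> float:
    """Prediction gain in dB from the Levinson-Durbin recursion."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0] + 1e-12
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]   # update reflection of coefficients
        a[i] = k
        err *= 1.0 - k * k                    # shrink the prediction error
    return 10.0 * np.log10(r[0] / max(err, 1e-12))

def keep_time_mode(band_signal: np.ndarray, gain_thresh_db: float = 6.0) -> bool:
    """False means: stop, and send the mode conversion control signal S9."""
    return lpc_prediction_gain(band_signal) >= gain_thresh_db
```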
  • FIG. 6 is a conceptual diagram illustrating a frequency-varying MLT (modulated lapped transform), which is an example of the frequency-domain transform method according to an embodiment of the present general inventive concept.
  • the frequency-domain transform method uses the MLT. Specifically, the frequency-domain transform method applies the frequency-varying MLT, in which the MLT is performed on a portion of the entire frequency band.
  • the frequency-varying MLT is described in detail in 'A New Orthonormal Wavelet Packet Decomposition for Audio Coding Using Frequency-Varying Modulated Lapped Transform' by M. Purat and P. Noll, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, October 1995, which is incorporated herein in its entirety.
  • an input signal x(n) is MLTed and then represented as N frequency components.
  • M1 frequency components and M2 frequency components are inverse MLTed and then represented as time-domain signals y1(n) and y2(n), respectively.
  • the remaining frequency components are represented as a signal y3(n).
  • time-based encoding is performed on the time-domain signals y1(n) and y2(n), and frequency-based encoding is performed on the signal y3(n).
  • for decoding, time-based decoding and then the MLT are performed on the time-domain signals y1(n) and y2(n), and frequency-based decoding is performed on the signal y3(n).
  • the MLTed signals y1(n) and y2(n) and the signal y3(n) on which the frequency-based decoding was performed are inverse MLTed. Consequently, the input signal x(n) is restored to a signal x'(n).
  • in FIG. 6, the encoding and decoding processes are not illustrated; only the transform process is illustrated.
  • the encoding and decoding processes are performed in stages indicated by the signals yl(n), y2(n), and y3(n).
  • the signals y1(n), y2(n), and y3(n) have resolutions of M1, M2, and N-M1-M2 frequency bands, respectively.
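The following sketch reproduces the structure of FIG. 6, with an orthonormal DCT-IV standing in for the MLT (a true MLT adds 50%-overlapped windowing between frames, omitted here for brevity). The grouping is assumed contiguous from the lowest band: the first M1 and the next M2 coefficients come back to the time domain as y1(n) and y2(n), while the remaining N-M1-M2 stay in the frequency domain as y3.

```python
# Simplified frequency-varying split; DCT-IV is a stand-in for the MLT.
import numpy as np
from scipy.fft import dct, idct

def split_bands(x: np.ndarray, m1: int, m2: int):
    """x(n) -> y1(n), y2(n) in the time domain and y3 in the frequency domain."""
    coeffs = dct(x, type=4, norm="ortho")            # N frequency components
    y1 = idct(coeffs[:m1], type=4, norm="ortho")     # first M1 -> time domain
    y2 = idct(coeffs[m1:m1 + m2], type=4, norm="ortho")
    y3 = coeffs[m1 + m2:]                            # kept as coefficients
    return y1, y2, y3

def merge_bands(y1, y2, y3):
    """Inverse of split_bands: restores x'(n) from the three parts."""
    coeffs = np.concatenate([dct(y1, type=4, norm="ortho"),
                             dct(y2, type=4, norm="ortho"), y3])
    return idct(coeffs, type=4, norm="ortho")
```

Because the DCT-IV is orthonormal, merge_bands(*split_bands(x, m1, m2)) reproduces x exactly, mirroring the x(n) to x'(n) round trip of FIG. 6.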
  • FIG. 7A is a conceptual diagram illustrating detailed operations of the time-based encoding unit 510 and the frequency-based encoding unit 520 of FIG. 5, according to an embodiment of the present general inventive concept.
  • FIG. 7A illustrates a case in which a residual signal (r′) of the time-based encoding unit 510 is quantized in the time domain.
  • an inverse frequency-domain transform is performed on the frequency-domain signal S1 output from the transformation & mode determination unit 500.
  • a linear predictive coding (LPC) analysis is performed on the frequency-domain signal S1, which has been transformed to the time domain, using a restored LPC coefficient (a′) received from an operation of the frequency-based encoding unit 410 (as described above).
  • next, an open-loop selection is made. In other words, it is determined whether the time-based encoding mode is suitable for the frequency-domain signal S1.
  • the open-loop selection is made based on at least one of a linear prediction coding gain, a spectral change between linear prediction filters of adjacent frames, a predicted pitch delay, and a predicted long-term prediction gain, all of which are obtained from the time-based encoding process.
  • the open-loop selection is made in the time-based encoding process. If it is determined that the time-based encoding mode is suitable for the frequency-domain signal S1, the time-based encoding continues to be performed on the frequency-domain signal S1. As a result, data on which the time-based encoding was performed is output, including a long-term filter coefficient, a short-term filter coefficient, and an excitation signal 'e.' If it is determined that the frequency-based encoding mode is suitable for the frequency-domain signal S1, the mode conversion control signal S9 is transmitted to the transformation & mode determination unit 500.
  • the transformation & mode determination unit 500 then determines the frequency-domain signal S1 to be encoded in the frequency-based encoding mode and outputs the frequency-domain signal S2 determined to be encoded in the frequency-based encoding mode. Then, frequency-domain encoding is performed on the frequency-domain signal S2. In other words, the transformation & mode determination unit 500 outputs the frequency-domain signal S1 again, as S2, to the frequency-based encoding unit 520 so that the frequency-domain signal can be encoded in the frequency-based encoding mode (instead of the time-based encoding mode).
  • the frequency-domain signal S2 output from the transformation & mode determination unit 500 is quantized in the frequency domain, and quantized data is output as data on which frequency-based encoding was performed.
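Condensing the FIG. 7A description above into code: LPC analysis on the band signal brought back to the time domain, a residual obtained by inverse filtering, and uniform quantization of that residual in the time domain. Solving the normal equations directly and using a plain scalar quantizer are simplifications; a CELP-style coder would instead search a codebook for the excitation 'e'.

```python
# Hypothetical time-based path of FIG. 7A, heavily simplified.
import numpy as np
from scipy.signal import lfilter

def lpc(x: np.ndarray, order: int = 10) -> np.ndarray:
    """A(z) = 1 + a1*z^-1 + ... via the autocorrelation normal equations."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:order + 1])
    return np.concatenate([[1.0], a])

def time_domain_encode(band_signal: np.ndarray, order: int = 10, step: float = 0.02):
    a = lpc(band_signal, order)
    residual = lfilter(a, [1.0], band_signal)           # inverse filter A(z)
    q_residual = np.round(residual / step).astype(int)  # uniform quantizer
    return a, q_residual
```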
  • FIG. 7B is a conceptual diagram illustrating detailed operations of the time-based encoding unit 510 and the frequency-based encoding unit 520 of FIG. 5, according to another embodiment of the present general inventive concept.
  • FIG. 7B illustrates a case in which a residual signal of the time-based encoding unit 510 is quantized in the frequency domain.
  • the open-loop selection and the time-based encoding are performed on the frequency-domain signal S1 output from the transformation & mode determination unit 500, as described with reference to FIG. 7A.
  • the residual signal is frequency-domain-transformed and then quantized in the frequency domain.
  • the restored LPC coefficient (a′) of the previous frame and the residual signal (r′) are used.
  • a process of restoring the LPC coefficient (a′) is identical to the process illustrated in FIG. 7A, but a process of restoring the residual signal (r′) is different.
  • when the frequency-based encoding was performed on the corresponding frequency domain of the previous frame, the data quantized in the frequency domain is inverse frequency-domain-transformed and added to an output of a long-term filter, whereby the residual signal (r′) is restored.
  • when the time-based encoding was performed on the corresponding frequency domain of the previous frame, the data quantized in the frequency domain goes through the inverse frequency-domain transform, the LPC analysis, and the short-term filter.
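The FIG. 7B variant, sketched below under the same caveats: the LPC residual is transformed and quantized in the frequency domain, with finer quantizer steps where coefficients are deemed more important. The linear low-frequency emphasis is an invented importance weighting; the patent only says that importance may be determined based on a voicing model.

```python
# Hypothetical frequency-domain residual quantization (FIG. 7B path).
import numpy as np
from scipy.fft import dct

def quantize_residual_in_frequency(residual: np.ndarray, base_step: float = 0.05):
    coeffs = dct(residual, type=2, norm="ortho")
    importance = np.linspace(1.0, 0.25, len(coeffs))  # assumed: favor low band
    steps = base_step / importance                    # important -> finer step
    return np.round(coeffs / steps).astype(int), steps
```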
  • FIG. 8 is a block diagram illustrating an adaptive time/frequency-based audio decoding apparatus, according to an embodiment of the present general inventive concept.
  • the apparatus includes a bitstream sorting unit 800, a decoding unit 810, and a collection & inverse transform unit 820.
  • for each frequency band (i.e., domain) of an input bitstream IN1, the bitstream sorting unit 800 extracts encoded data S10, division information S11, and encoding mode information S12.
  • the decoding unit 810 decodes the encoded data S10 for each frequency band based on the extracted division information S11 and the encoding mode information S12.
  • the decoding unit 810 includes a time-based decoding unit (not shown), which performs time-based decoding on the encoded data S10 based on the division information S11 and the encoding mode information S12, and a frequency-based decoding unit (not shown).
  • the collection & inverse transform unit 820 collects decoded data S13 in the frequency domain, performs an inverse frequency-domain transform on the collected data S13, and outputs audio data OUT1.
  • data on which time-based decoding is performed is frequency-domain-transformed before being collected in the frequency domain.
  • when the decoded data S13 for each frequency band is collected in the frequency domain, similar to the frequency spectrum of FIG. 2, an envelope mismatch between two adjacent frequency bands (i.e., sub-frames) may occur.
  • to prevent such a mismatch, the collection & inverse transform unit 820 performs envelope smoothing on the decoded data S13 before collecting it.
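One possible reading of this smoothing step, sketched below: around each interior band boundary, coefficient magnitudes are blended toward the local mean so that adjacent bands meet without an envelope step. The window width and blend strength are invented parameters; the patent does not specify the smoothing filter.

```python
# Hypothetical envelope smoothing at the band boundaries before collection.
import numpy as np

def smooth_boundaries(coeffs, band_edges, width: int = 8, strength: float = 0.5):
    out = np.asarray(coeffs, dtype=float).copy()
    for edge in band_edges[1:-1]:                     # interior boundaries only
        lo, hi = max(edge - width, 0), min(edge + width, len(out))
        seg = out[lo:hi]
        mags = np.abs(seg)
        target = mags.mean()                          # local envelope level
        scale = np.ones_like(mags)
        nz = mags > 1e-12
        scale[nz] = ((1 - strength) * mags[nz] + strength * target) / mags[nz]
        out[lo:hi] = seg * scale                      # blend magnitudes, keep signs
    return out
```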
  • FIG. 9 is a flowchart illustrating an adaptive time/frequency-based audio encoding method, according to an embodiment of the present general inventive concept.
  • the method of FIG. 9 may be performed by the adaptive time/frequency-based audio encoding apparatuses of FIG. 1 and/or FIG. 5. Accordingly, for illustration purposes, the method of FIG. 9 is described below with reference to FIGS. 1 to 7B. Referring to FIGS. 1 to 7B, and 9, the input audio signal IN is transformed by the frequency-domain transform unit 300 into a full frequency-domain signal (operation 900).
  • the full frequency-domain signal is divided into the plurality of frequency-domain signals (corresponding to the bands) by the encoding mode determination unit 310 according to the preset standard, and the encoding mode suitable for each respective frequency-domain signal is determined (operation 910).
  • the full frequency-domain signal is divided into the frequency-domain signals suitable for the time-based encoding mode or the frequency-based encoding mode based on at least one of the spectral tilt, the size of signal energy of each frequency domain, the change in signal energy between the sub-frames, and the voicing level determination. Then, the encoding mode suitable for each respective frequency-domain signal is determined according to the preset standard and the division of the full frequency-domain signal.
  • Each frequency-domain signal is encoded by the encoding unit 110 in the determined encoding mode (operation 920).
  • the time-based encoding unit 400 (and 510) performs the time-based encoding on the frequency-domain signal S1 determined to be encoded in the time-based encoding mode, and the frequency-based encoding unit 410 (and 520) performs the frequency-based encoding on the frequency-domain signal S2 determined to be encoded in the frequency-based encoding mode.
  • the frequency-domain signal S2 may be a different frequency band from the band of the frequency-domain signal S1, or the bands may be the same when the time-based encoding unit 400 (and 510) determines that the time-based encoding is not suitable for encoding the frequency-domain signal S1.
  • the time-based encoded data S5, the frequency-based encoded data S6, the division information S3, and the determined encoding mode information S4 are collected by the bitstream output unit 120 and output as the bitstream OUT (operation 930).
  • FIG. 10 is a flowchart illustrating an adaptive time/frequency-based audio decoding method, according to an embodiment of the present general inventive concept.
  • the method of FIG. 10 may be performed by the adaptive time/frequency-based audio decoding apparatus of FIG. 8. Accordingly, for illustration purposes, the method of FIG. 10 is described below with reference to FIG. 8.
  • the encoded data S10 for each frequency band (i.e., domain), the division information S11, and the encoding mode information S12 of each respective frequency band are extracted by the bitstream sorting unit 800 from the input bitstream IN1 (operation 1000).
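For symmetry with the packing sketch given earlier for the encoder side, here is a parser for the same invented layout; as before, the real bitstream syntax is not defined in the patent.

```python
# Counterpart to the hypothetical pack_bitstream() sketch above.
import struct

def unpack_bitstream(blob: bytes):
    n, flags = struct.unpack_from("<BB", blob, 0)
    modes = [(flags >> i) & 1 for i in range(n)]      # 0 = time, 1 = frequency
    payloads, off = [], 2
    for _ in range(n):
        (length,) = struct.unpack_from("<I", blob, off)
        off += 4
        payloads.append(blob[off:off + length])
        off += length
    return modes, payloads
```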
  • the encoded data S10 is decoded by the decoding unit 810 based on the extracted division information S11 and the encoding mode information S12 (operation 1010).
  • the decoded data S13 is collected in the frequency domain by the collection & inverse transform unit 820 (operation 1020).
  • the envelope smoothing may be additionally performed on the collected data S13 to prevent the envelope mismatch in the frequency domain.
  • the inverse frequency-domain transform is performed on the collected data S13 by the collection & inverse transform unit 820, and the result is output as the audio data OUT1, which is a time-based signal (operation 1030).
  • in the apparatuses and methods described above, acoustic characteristics and a voicing model are simultaneously applied to a frame, which is an audio compression processing unit.
  • a compression method effective for both music and voice can be produced, and the compression method can be used for mobile terminals that require audio compression at a low bit rate.
  • the present general inventive concept can also be implemented as computer- readable code on a computer-readable recording medium.
  • the computer-readable recording medium may be any data storage device that can store data which can be thereafter read by a computer system. Examples of the computer-readable recording medium include read-only memory (ROM), random-access memory (RAM), CD- ROMs, magnetic tapes, floppy disks, optical data storage devices, and carrier waves (such as data transmission through the Internet).
  • the computer-readable recording medium can also be distributed over network- coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion. Also, functional programs, code, and code segments for accomplishing the present general inventive concept can be easily construed by programmers skilled in the art to which the present general inventive concept pertains.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

Adaptive time/frequency-based audio encoding and decoding apparatuses and methods. The encoding apparatus includes a transformation & mode determination unit to divide an input audio signal into a plurality of frequency-domain signals and to select a time-based encoding mode or a frequency-based encoding mode for each respective frequency-domain signal, an encoding unit to encode each frequency-domain signal in the respective encoding mode, and a bitstream output unit to output encoded data, division information, and encoding mode information for each respective frequency-domain signal. In the apparatuses and methods, acoustic characteristics and a voicing model are simultaneously applied to a frame, which is an audio compression processing unit. As a result, a compression method effective for both music and voice can be produced, and the compression method can be used for mobile terminals that require audio compression at a low bit rate.

Description

ADAPTIVE TIME/FREQUENCY-BASED AUDIO ENCODING AND DECODING APPARATUSES AND METHODS
Technical Field
[1] This application claims priority from Korean Patent Application No.
10-2005-0106354, filed on November 8, 2005, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein in its entirety by reference.
[2] The present general inventive concept relates to audio encoding and decoding apparatuses and methods, and more particularly, to adaptive time/frequency-based audio encoding and decoding apparatuses and methods which can obtain high compression efficiency by making efficient use of encoding gains of two encoding methods in which a frequency-domain transform is performed on input audio data such that time-based encoding is performed on a band of the audio data suitable for voice compression and frequency-based encoding is performed on remaining bands of the audio data.
Background Art
[3] Conventional voice/music compression algorithms can be broadly classified into audio codec algorithms and voice codec algorithms. Audio codec algorithms, such as aacPlus, compress a frequency-domain signal and apply a psychoacoustic model. Assuming that the audio codec and the voice codec compress voice signals having an equal amount of data, the audio codec algorithm outputs sound having a significantly lower quality than the voice codec algorithm. In particular, the quality of sound output from the audio codec algorithm is more adversely affected by an attack signal.
[4] Voice codec algorithms, such as an adaptive multi-rate wideband codec
(AMR-WB), compress a time-domain signal and apply a voicing model. Assuming that the voice codec and the audio codec compress audio signals having an equal amount of data, the voice codec algorithm outputs sound having a significantly lower quality than the audio codec algorithm.
Disclosure of Invention
Technical Problem
[5] An AMR-WB plus algorithm considers the above characteristics of the conventional voice/music compression algorithms to perform voice/music compression efficiently. In the AMR-WB plus algorithm, an algebraic code excited linear prediction (ACELP) algorithm is used as a voice compression algorithm and a transform coded excitation (TCX) algorithm is used as an audio compression algorithm. In particular, the AMR-WB plus algorithm determines whether to apply the ACELP algorithm or the TCX algorithm to each processing unit, for example, each frame on a time axis, and then performs encoding accordingly. In this case, the AMR-WB plus algorithm is effective in compressing what is close to a voice signal. However, when the AMR-WB plus algorithm is used to compress what is close to an audio signal, the sound quality or compression rate deteriorates since the AMR-WB plus algorithm performs encoding in processing units.
Technical Solution
[6] The present general inventive concept provides adaptive time/frequency-based audio encoding and decoding apparatuses and methods which can obtain high compression efficiency by making efficient use of encoding gains of two encoding methods in which a frequency-domain transform is performed on input audio data such that time-based encoding is performed on a band of the audio data suitable for voice compression and frequency-based encoding is performed on remaining bands of the audio data.
[7] Additional aspects of the present general inventive concept will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the general inventive concept.
[8] The foregoing and/or other aspects and utilities of the present general inventive concept are achieved by providing an adaptive time/frequency-based audio encoding apparatus including a transformation & mode determination unit to divide an input audio signal into a plurality of frequency-domain signals and to select a time-based encoding mode or a frequency-based encoding mode for each respective frequency-domain signal, an encoding unit to encode each frequency-domain signal in the respective encoding modes selected by the transformation & mode determination unit, and a bitstream output unit to output encoded data, division information, and encoding mode information for each respective encoded frequency-domain signal.
[9] The transformation & mode determination unit may include a frequency-domain transform unit to transform the input audio signal into a full frequency-domain signal, and an encoding mode determination unit to divide the full frequency-domain signal into the frequency-domain signals according to a preset standard and to determine the time-based encoding mode or the frequency-based encoding mode for each respective frequency-domain signal.
[10] The full frequency-domain signal may be divided into the frequency-domain signals suitable for the time-based encoding mode or the frequency-based encoding mode based on at least one of a spectral tilt, a size of signal energy of each frequency domain, a change in signal energy between sub-frames and a voicing level determination, and the respective encoding mode for each frequency-domain signal is determined accordingly.
[11] The encoding unit may include a time-based encoding unit to perform an inverse frequency-domain transform on a first frequency-domain signal determined to be encoded in the time-based encoding mode and to perform time-based encoding on the first frequency-domain signal on which the inverse frequency-domain transform has been performed, and a frequency-based encoding unit to perform frequency-based encoding on a second frequency-domain signal determined to be encoded in the frequency-based encoding mode.
[12] The time-based encoding unit may select the encoding mode for the first frequency-domain signal based on at least one of a linear prediction coding gain, a spectral change between linear prediction filters of adjacent frames, a predicted pitch delay, and a predicted long-term prediction gain, continue to perform the time-based encoding on the first frequency-domain signal when the time-based encoding unit determines that the time-based encoding mode is suitable for the first frequency-domain signal, and stop performing the time-based encoding on the first frequency-domain signal and transmit a mode conversion control signal to the transformation & mode determination unit when the time-based encoding unit determines that the frequency-based encoding mode is suitable for the first frequency-domain signal, and the transformation & mode determination unit may output the first frequency-domain signal, which was provided to the time-based encoding unit, to the frequency-based encoding unit in response to the mode conversion control signal.
[13] The frequency-domain transform unit may perform the frequency-domain transform using a frequency-varying modulated lapped transform (MLT). The time-based encoding unit may quantize a residual signal obtained from linear prediction and dynamically allocate bits to the quantized residual signal according to importance. The time-based encoding unit may transform the residual signal obtained from the linear prediction into a frequency-domain signal, quantize the frequency-domain signal, and dynamically allocate the bits to the quantized signal according to importance. The importance may be determined based on a voicing model.
[14] The frequency-based encoding unit may determine a quantization step size of an input frequency-domain signal according to a psychoacoustic model and quantize the frequency-domain signal. The frequency-based encoding unit may extract important frequency components from an input frequency-domain signal according to the psychoacoustic model, encode the extracted important frequency components, and encode the remaining signals using noise modeling.
[15] The residual signal may be obtained using a code excited linear prediction (CELP) algorithm.
[16] The foregoing and/or other aspects and utilities of the present general inventive concept are also achieved by providing an audio data encoding apparatus, including a transformation and mode determination unit to divide a frame of audio data into first audio data and second audio data, and an encoding unit to encode the first audio data in a time domain and to encode the second audio data in a frequency domain.
[17] The foregoing and/or other aspects and utilities of the present general inventive concept are also achieved by providing an adaptive time/frequency-based audio decoding apparatus including a bitstream sorting unit to extract encoded data for each frequency band, division information, and encoding mode information for each frequency band from an input bitstream, a decoding unit to decode the encoded data for each frequency domain based on the division information and the respective encoding mode information, and a collection & inverse transform unit to collect decoded data in a frequency domain and to perform an inverse frequency-domain transform on the collected data.
[18] The decoding unit may include a time-based decoding unit to perform time-based decoding on first encoded data based on the division information and respective first encoding mode information, and a frequency-based decoding unit to perform frequency-based decoding on second encoded data based on the division information and respective second encoding mode information.
[19] The collection & inverse transform unit may perform envelope smoothing on the decoded data in the frequency domain and then perform the inverse frequency-domain transform on the decoded data such that the decoded data maintains continuity in the frequency domain.
[20] The foregoing and/or other aspects and utilities of the present general inventive concept are also achieved by providing an audio data decoding apparatus, including a bitstream sorting unit to extract encoded audio data of a frame, and a decoding unit to decode the audio data of the frame into first audio data in a time domain and second audio data in a frequency domain.
[21] The foregoing and/or other aspects and utilities of the present general inventive concept are also achieved by providing an adaptive time/frequency-based audio encoding method including dividing an input audio signal into a plurality of frequency-domain signals and selecting a time-based encoding mode or a frequency-based encoding mode for each respective frequency-domain signal, encoding each frequency-domain signal in the respective encoding mode, and outputting encoded data, division information, and encoding mode information of each respective frequency-domain signal.
[22] The foregoing and/or other aspects and utilities of the present general inventive concept are also achieved by providing an audio data encoding method, including dividing a frame of audio data into first audio data and second audio data, and encoding the first audio data in a time domain and encoding the second audio data in a frequency domain.
[23] The foregoing and/or other aspects and utilities of the present general inventive concept are also achieved by providing an adaptive time/frequency-based audio decoding method including extracting encoded data, division information, and encoding mode information for each respective frequency band from an input bitstream, decoding the encoded data for each frequency domain based on the division information and the respective encoding mode information, and collecting decoded data in a frequency domain and performing an inverse frequency-domain transform on the collected data.
Advantageous Effects
[24] According to the apparatuses and methods described above, acoustic characteristics and a voicing model are simultaneously applied to a frame, which is an audio compression processing unit. As a result, a compression method effective for both music and voice can be produced, and the compression method can be used for mobile terminals that require audio compression at a low bit rate.
Description of Drawings
[25] These and/or other aspects of the present general inventive concept will become apparent and more readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
[26] FIG. 1 is a block diagram illustrating an adaptive time/frequency-based audio encoding apparatus according to an embodiment of the present general inventive concept;
[27] FIG. 2 is a conceptual diagram illustrating a method of dividing a signal on which a frequency-domain transform has been performed and determining an encoding mode using a transformation & mode determination unit of the adaptive time/frequency-based audio encoding apparatus of FIG. 1, according to an embodiment of the present general inventive concept;
[28] FIG. 3 is a detailed block diagram illustrating the transformation & mode determination unit of the adaptive time/frequency-based audio encoding apparatus of FIG. 1;
[29] FIG. 4 is a detailed block diagram illustrating an encoding unit of the adaptive time/frequency-based audio encoding apparatus of FIG. 1;
[30] FIG. 5 is a block diagram of an adaptive time/frequency-based audio encoding apparatus having a time-based encoding unit of FIG. 4 with a function to confirm a determined encoding mode, according to another embodiment of the present general inventive concept;
[31] FIG. 6 is a conceptual diagram illustrating a frequency-varying modulated lapped transform (MLT), which is an example of a frequency-domain transform method according to an embodiment of the present general inventive concept;
[32] FIG. 7A is a conceptual diagram illustrating detailed operations of the time-based encoding unit and a frequency-based encoding unit of the adaptive time/frequency-based audio encoding apparatus of FIG. 5, according to an embodiment of the present general inventive concept;
[33] FIG. 7B is a conceptual diagram illustrating detailed operations of the time-based encoding unit and the frequency-based encoding unit of the adaptive time/frequency-based audio encoding apparatus of FIG. 5, according to another embodiment of the present general inventive concept;
[34] FIG. 8 is a block diagram of an adaptive time/frequency-based audio decoding apparatus according to an embodiment of the present general inventive concept;
[35] FIG. 9 is a flowchart illustrating an adaptive time/frequency-based audio encoding method according to an embodiment of the present general inventive concept; and
[36] FIG. 10 is a flowchart illustrating an adaptive time/frequency-based audio decoding method according to an embodiment of the present general inventive concept.
Best Mode
Mode for Invention
[38] The present general inventive concept will now be described more fully with reference to the accompanying drawings, in which exemplary embodiments of the general inventive concept are illustrated. The general inventive concept may, however, be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein; rather, these embodiments are provided so that this description will be thorough and complete, and will fully convey the aspects and utilities of the general inventive concept to those skilled in the art.
[39] The present general inventive concept selects a time-based encoding method or a frequency-based encoding method for each frequency band of an input audio signal and encodes each frequency band of the input audio signal using the selected encoding method. When a prediction gain obtained from linear prediction is great or when the input audio signal is a high-pitched signal, such as a voice signal, the time-based encoding method is more effective. When the input audio signal is a sinusoidal signal, when a high-frequency signal is included in the input audio signal, or when a masking effect between signals is great, the frequency-based encoding method is more effective.
[40] In the present general inventive concept, the time-based encoding method denotes a voice compression algorithm, such as a code excited linear prediction (CELP) algorithm, which performs compression on a time axis. In addition, the frequency-based encoding method denotes an audio compression algorithm, such as a transform coded excitation (TCX) algorithm or an advanced audio coding (AAC) algorithm, which performs compression on a frequency axis.
[41] Additionally, the embodiments of the present general inventive concept divide a frame of audio data, which is typically used as a unit for processing (e.g., encoding, decoding, compressing, decompressing, filtering, compensating, etc.) audio data, into sub-frames, bands, or frequency-domain signals within the frame, such that first audio data of the frame, which can be effectively encoded as voice audio data, is encoded in the time domain, while second audio data of the frame, which can be effectively encoded as non-voice audio data, is encoded in the frequency domain.
[42] FIG. 1 is a block diagram illustrating an adaptive time/frequency-based audio encoding apparatus according to an embodiment of the present general inventive concept. The apparatus includes a transformation & mode determination unit 100, an encoding unit 110, and a bitstream output unit 120.
[43] The transformation & mode determination unit 100 divides an input audio signal IN into a plurality of frequency-domain signals and selects a time-based encoding mode or a frequency-based encoding mode for each frequency-domain signal. Then, the transformation & mode determination unit 100 outputs a frequency-domain signal S1 determined to be encoded in the time-based encoding mode, a frequency-domain signal S2 determined to be encoded in the frequency-based encoding mode, and division information S3 and encoding mode information S4 for each frequency-domain signal. When the input audio signal IN is consistently divided, a decoding end may not require the division information S3. In this case, the division information S3 may not need to be output through the bitstream output unit 120.
[44] The encoding unit 110 performs time-based encoding on the frequency-domain signal S1 and performs frequency-based encoding on the frequency-domain signal S2. The encoding unit 110 outputs data S5 on which the time-based encoding has been performed and data S6 on which the frequency-based encoding has been performed.
[45] The bitstream output unit 120 collects the data S5 and S6, the division information S3, and the encoding mode information S4 of each frequency-domain signal, and outputs a bitstream OUT. Here, the bitstream OUT may have a data compression process performed thereon, such as an entropy-encoding process.
[46] FIG. 2 is a conceptual diagram illustrating a method of dividing a signal on which a frequency-domain transform has been performed, and determining an encoding mode using the transformation & mode determination unit 100 of FIG. 1, according to an embodiment of the present general inventive concept.
[47] Referring to FIG. 2, an input audio signal (e.g., the input audio signal IN) includes frequency components up to 22,000 Hz and is divided into five frequency bands (e.g., corresponding to five frequency-domain signals). The time-based encoding mode, the frequency-based encoding mode, the time-based encoding mode, the frequency-based encoding mode, and the frequency-based encoding mode are respectively determined for the five frequency bands in the order of lowest to highest frequency band. The input audio signal is an audio frame covering a predetermined period of time, for example, 20 ms. In other words, FIG. 2 is a graph illustrating the audio frame on which the frequency-domain transform has been performed. The audio frame is divided into five sub-frames sf1, sf2, sf3, sf4, and sf5 corresponding to the five frequency domains (i.e., bands), respectively.
[48] In order to divide the input audio signal into the five frequency bands and determine the corresponding encoding mode for each band as illustrated in FIG. 2, a spectral measuring method, an energy measuring method, a long-term prediction estimation method, or a voicing level determination method that distinguishes a voiced sound from an unvoiced sound may be used. Examples of the spectral measuring method include dividing and determining based on a linear prediction coding gain, a spectral change between linear prediction filters of adjacent frames, and a spectral tilt. Examples of the energy measuring method include dividing and determining based on the size of the signal energy of each band and a change in signal energy between bands. In addition, examples of the long-term prediction estimation method include dividing and determining based on a predicted pitch delay and a predicted long-term prediction gain.
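As an illustrative sketch of such criteria (not part of the patent disclosure; the band edges, threshold, and all names below are our assumptions), the following Python fragment measures per-band energy and spectral tilt on a frequency-domain frame and picks a tentative mode for each band:

```python
import numpy as np

def choose_band_modes(spectrum, band_edges, tilt_threshold=-0.5):
    """For each band of a frequency-domain frame, measure signal energy and
    spectral tilt (slope of the log-magnitude spectrum) and tentatively pick
    'time' or 'freq'.  The threshold is illustrative, not the patent's."""
    modes = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band = np.abs(spectrum[lo:hi])
        energy = float(np.sum(band ** 2))          # energy criterion
        # spectral tilt: slope of a straight line fitted to log|X(k)|
        tilt = np.polyfit(np.arange(len(band)), np.log(band + 1e-12), 1)[0]
        # a steeply falling band spectrum suggests an LPC-friendly signal
        modes.append("time" if tilt < tilt_threshold and energy > 0.0 else "freq")
    return modes

# e.g., five bands over 1024 bins, mirroring the five sub-frames of FIG. 2
modes = choose_band_modes(np.random.randn(1024), [0, 64, 128, 256, 512, 1024])
```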
[49] FIG. 3 is a detailed block diagram illustrating an exemplary embodiment of the transformation & mode determination unit 100 of FIG. 1. The transformation & mode determination unit 100, as illustrated in FIG. 3, includes a frequency-domain transform unit 300 and an encoding mode determination unit 310.
[50] The frequency-domain transform unit 300 transforms the input audio signal IN into a full frequency-domain signal S7 having a frequency spectrum as illustrated in FIG. 2. The frequency-domain transform unit 300 may use a modulated lapped transform (MLT) as a frequency-domain transform method.
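For orientation, the MLT is a lapped, MDCT-type transform with a sine window; a minimal single-frame forward transform (a direct O(N²) form for clarity, not a codec-grade implementation) might look like this:

```python
import numpy as np

def mlt_forward(frame):
    """One analysis frame of a modulated lapped transform: 2N windowed
    time samples -> N frequency coefficients (sine window)."""
    N = len(frame) // 2
    n = np.arange(2 * N)
    window = np.sin(np.pi * (n + 0.5) / (2 * N))
    k = np.arange(N)
    basis = np.cos(np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5))
    return (frame * window) @ basis
```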
[51] The encoding mode determination unit 310 divides the full frequency-domain signal S7 into the plurality of frequency-domain signals according to a preset standard and selects either the time-based encoding mode or the frequency-based encoding mode for each frequency-domain signal based on the preset standard and/or a linear prediction coding gain, a spectral change between linear prediction filters of adjacent frames, a spectral tilt, the size of the signal energy of each band, a change in signal energy between bands, a predicted pitch delay, or a predicted long-term prediction gain. That is, the encoding mode can be selected for each of the frequency-domain signals based on approximations, predictions, and/or estimations of frequency characteristics thereof. These approximations, predictions, and/or estimations of the frequency characteristics can estimate which ones of the frequency-domain signals should be encoded using the time-based encoding mode such that the remaining ones of the frequency-domain signals can be encoded in the frequency-based encoding mode. As described below, the selected encoding mode (e.g., the time-based encoding mode) can subsequently be confirmed based on data generated during the encoding process such that the encoding process can be efficiently performed.
[52] Then, the encoding mode determination unit 310 outputs the frequency-domain signal S1 determined to be encoded in the time-based encoding mode, the frequency-domain signal S2 determined to be encoded in the frequency-based encoding mode, the division information S3, and the encoding mode information S4 for each frequency-domain signal. The preset standard may be any of the above-described criteria for selecting the encoding mode that can be evaluated in the frequency domain. That is, the preset standard may be the spectral tilt, the size of the signal energy of each frequency domain, the change in signal energy between sub-frames, or the voicing level determination. However, the present general inventive concept is not limited thereto.
[53] FIG. 4 is a detailed block diagram illustrating an exemplary embodiment of the encoding unit 110 of FIG. 1. The encoding unit 110, as illustrated in FIG. 4, includes a time-based encoding unit 400 and a frequency-based encoding unit 410.
[54] The time-based encoding unit 400 performs time-based encoding on the frequency-domain signal S1 using, for example, a linear prediction method. Here, an inverse frequency-domain transform is performed on the frequency-domain signal S1 before the time-based encoding such that the time-based encoding is performed once the frequency-domain signal S1 is converted to the time domain.
[55] The frequency-based encoding unit 410 performs the frequency-based encoding on the frequency-domain signal S2.
[56] Since the time-based encoding unit 400 uses an encoding component of a previous frame, the time-based encoding unit 400 includes a buffer (not illustrated) that stores the encoding component of the previous frame. The time-based encoding unit 400 receives an encoding component S8 of a current frame from the frequency-based encoding unit 410, stores the encoding component S8 of the current frame in the buffer, and uses the stored encoding component S8 of the current frame to encode a next frame. This process will now be described in detail with reference to FIG. 2.
[57] In particular, if the third sub-frame sf3 of the current frame is to be encoded by the time-based encoding unit 400 and frequency-based encoding has been performed on the third sub-frame sf3 of the previous frame, a linear predictive coding (LPC) coefficient of the third sub-frame sf3 of the previous frame is used to perform the time-based encoding on the third sub-frame sf3 of the current frame. The LPC coefficient is the encoding component S8 of the current frame, which is provided to the time-based encoding unit 400 and stored therein.
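The buffering described in paragraphs [56] and [57] can be pictured as a small per-band cache; in the sketch below (the class and method names are ours, not the patent's), either encoding unit deposits the encoding component of each sub-frame so the time-based encoder can fetch it one frame later:

```python
class EncodingComponentBuffer:
    """Holds the encoding component (e.g., LPC coefficients) of each
    sub-frame of the previous frame, whichever encoder produced it."""

    def __init__(self, num_bands):
        self.prev = [None] * num_bands

    def fetch(self, band):
        # component of the same band in the previous frame; None on the
        # very first frame, when no history exists yet
        return self.prev[band]

    def store(self, band, component):
        # called after a band of the current frame is encoded, so the
        # next frame's time-based encoding can reuse the component
        self.prev[band] = component
```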
[58] FIG. 5 is a block diagram illustrating an adaptive time/frequency-based audio encoding apparatus including a time-based encoding unit 510 (similar to the time-based encoding unit 400 of FIG. 4) with a function used to confirm a determined encoding mode, according to another embodiment of the present general inventive concept. The apparatus includes a transformation & mode determination unit 500, the time-based encoding unit 510, a frequency-based encoding unit 520, and a bitstream output unit 530.
[59] The frequency-based encoding unit 520 and the bitstream output unit 530 operate and function as described above.
[60] The time-based encoding unit 510 performs the time-based encoding, as described above. In addition, the time-based encoding unit 510 determines whether the time-based encoding mode is suitable for the received frequency-domain signal S1 based on intermediate data values obtained during the time-based encoding. In other words, the time-based encoding unit 510 confirms the encoding mode determined by the transformation & mode determination unit 500 for the received frequency-domain signal S1. That is, the time-based encoding unit 510 confirms, based on the intermediate data values, that the time-based encoding is appropriate for the received frequency-domain signal S1 during the time-based encoding.
[61] If the time-based encoding unit 510 determines that the frequency-based encoding mode is suitable for the frequency-domain signal S1, the time-based encoding unit 510 stops performing time-based encoding on the frequency-domain signal S1 and provides a mode conversion control signal S9 back to the transformation & mode determination unit 500. If the time-based encoding unit 510 determines that the time-based encoding mode is suitable for the frequency-domain signal S1, the time-based encoding unit 510 continues to perform the time-based encoding on the frequency-domain signal S1. The time-based encoding unit 510 determines whether the time-based encoding mode or the frequency-based encoding mode is suitable for the frequency-domain signal S1 based on at least one of a linear coding gain, a spectral change between linear prediction filters of adjacent frames, a predicted pitch delay, and a predicted long-term prediction gain, all of which are obtained from the encoding process.
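A hedged sketch of this confirmation step follows; the decision feature (a first-order prediction gain) and the threshold are stand-ins chosen for brevity, not the patent's actual criteria, and the returned flag plays the role of the mode conversion control signal S9:

```python
import numpy as np

def time_encode_band(samples, min_gain=4.0):
    """Attempt time-based encoding of one band.  Returns (ok, payload);
    ok=False tells the caller to re-route the band to the frequency-based
    encoder, standing in for the mode conversion control signal S9."""
    # stand-in decision feature: gain of a first-order linear predictor
    pred = np.dot(samples[1:], samples[:-1]) / max(np.dot(samples[:-1], samples[:-1]), 1e-12)
    residual = samples[1:] - pred * samples[:-1]
    gain = np.dot(samples, samples) / max(np.dot(residual, residual), 1e-12)
    if gain < min_gain:
        return False, None          # prediction is poor: stop time-based encoding
    return True, residual           # continue (the CELP core itself is omitted)
```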
[62] When the mode conversion control signal S9 is generated, the transformation & mode determination unit 500 converts a current encoding mode of the frequency-domain signal S1 in response to the mode conversion control signal S9. As a result, the frequency-based encoding is performed on the frequency-domain signal S1, which was initially determined to be encoded in the time-based encoding mode. Accordingly, the encoding mode information S4 is changed from the time-based encoding mode to the frequency-based encoding mode. Then, the changed encoding mode information S4, that is, information indicating the frequency-based encoding mode, is transmitted to the decoding end.
[63] FIG. 6 is a conceptual diagram illustrating a frequency-varying MLT (modulated lapped transform), which is an example of the frequency-domain transform method according to an embodiment of the present general inventive concept.
[64] As described above, the frequency-domain transform method according to the present general inventive concept uses the MLT. Specifically, the frequency-domain transform method applies the frequency-varying MLT, in which the MLT is performed on a portion of the entire frequency band. The frequency-varying MLT is described in detail in 'A New Orthonormal Wavelet Packet Decomposition for Audio Coding Using Frequency-Varying Modulated Lapped Transforms' by M. Purat and P. Noll, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, October 1995, which is incorporated herein in its entirety.
[65] Referring to FIG. 6, an input signal x(n) is MLTed and then represented as N frequency components. Of the N frequency components, M1 frequency components and M2 frequency components are inverse MLTed and then represented as time-domain signals y1(n) and y2(n), respectively. The remaining frequency components are represented as a signal y3(n). Time-based encoding is performed on the time-domain signals y1(n) and y2(n), and frequency-based encoding is performed on the signal y3(n). Conversely, at the decoding end, time-based decoding and then the MLT are performed on the time-domain signals y1(n) and y2(n), and frequency-based decoding is performed on the signal y3(n). The MLTed signals y1(n) and y2(n) and the signal y3(n) on which the frequency-based decoding was performed are inverse MLTed. Consequently, the input signal x(n) is restored as a signal x'(n). In FIG. 6, the encoding and decoding processes are not illustrated, and only the transform process is illustrated. The encoding and decoding processes are performed in the stages indicated by the signals y1(n), y2(n), and y3(n). The signals y1(n), y2(n), and y3(n) have resolutions of frequency bands M1, M2, and N-M1-M2, respectively.
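Read as data flow, FIG. 6 amounts to partitioning the N MLT coefficients and inverse-transforming only the two low groups; the sketch below keeps that structure, with an orthonormal DCT-IV used as a stand-in for the proper lapped inverse so the example stays self-contained:

```python
import numpy as np

def split_for_coding(X, M1, M2):
    """Partition N MLT bins as in FIG. 6: the first M1 and next M2 bins are
    taken back to the time domain (y1, y2) for time-based encoding, and the
    remaining N-M1-M2 bins stay in the frequency domain (y3)."""
    def small_inverse(coeffs):
        # DCT-IV synthesis as a stand-in for the inverse MLT of a sub-band
        M = len(coeffs)
        if M == 0:
            return coeffs
        n = np.arange(M)
        basis = np.cos(np.pi / M * (n[:, None] + 0.5) * (n[None, :] + 0.5))
        return (basis @ coeffs) * np.sqrt(2.0 / M)

    y1 = small_inverse(X[:M1])          # time-domain signal y1(n)
    y2 = small_inverse(X[M1:M1 + M2])   # time-domain signal y2(n)
    y3 = X[M1 + M2:]                    # stays spectral, signal y3(n)
    return y1, y2, y3
```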
[66] FIG. 7A is a conceptual diagram illustrating detailed operations of the time-based encoding unit 510 and the frequency-based encoding unit 520 of FIG. 5, according to an embodiment of the present general inventive concept. FIG. 7A illustrates a case in which a residual signal (r′) of the time-based encoding unit 510 is quantized in the time domain.
[67] Referring to FIG. 7A, an inverse frequency-domain transform is performed on the frequency-domain signal S1 output from the transformation & mode determination unit 500. A linear prediction coefficient (LPC) analysis is performed on the frequency-domain signal S1, which has been transformed to the time domain, using a restored LPC coefficient (a′) received from an operation of the frequency-based encoding unit 410 (as described above). After the LPC analysis and the long-term filter (LTF) analysis, an open-loop selection is made. In other words, it is determined whether the time-based encoding mode is suitable for the frequency-domain signal S1. The open-loop selection is made based on at least one of a linear coding gain, a spectral change between linear prediction filters of adjacent frames, a predicted pitch delay, and a predicted long-term prediction gain, all of which are obtained from the time-based encoding process.
[68] The open-loop selection is made in the time-based encoding process. If it is determined that the time-based encoding mode is suitable for the frequency-domain signal S1, the time-based encoding continues to be performed on the frequency-domain signal S1. As a result, data on which the time-based encoding was performed is output, including a long-term filter coefficient, a short-term filter coefficient, and an excitation signal 'e'. If it is determined that the frequency-based encoding mode is suitable for the frequency-domain signal S1, the mode conversion control signal S9 is transmitted to the transformation & mode determination unit 500. In response to the mode conversion control signal S9, the transformation & mode determination unit 500 determines the frequency-domain signal S1 to be encoded in the frequency-based encoding mode and outputs the frequency-domain signal S2 determined to be encoded in the frequency-based encoding mode. Then, frequency-domain encoding is performed on the frequency-domain signal S2. In other words, the transformation & mode determination unit 500 outputs the frequency-domain signal S1 again, as S2, to the frequency-based encoding unit 410 such that the frequency-domain signal can be encoded in the frequency-based encoding mode (instead of the time-based encoding mode).
[69] The frequency-domain signal S2 output from the transformation & mode determination unit 500 is quantized in the frequency domain, and the quantized data is output as data on which frequency-based encoding was performed.
[70] FIG. 7B is a conceptual diagram illustrating detailed operations of the time-based encoding unit 510 and the frequency-based encoding unit 520 of FIG. 5, according to another embodiment of the present general inventive concept. FIG. 7B illustrates a case in which a residual signal of the time-based encoding unit 510 is quantized in the frequency domain.
[71] Referring to FIG. 7B, the open-loop selection and the time-based encoding are performed on the frequency-domain signal S1 output from the transformation & mode determination unit 500, as described with reference to FIG. 7A. However, in the time-based encoding of the present embodiment, the residual signal is frequency-domain-transformed and then quantized in the frequency domain.
[72] In order to perform the time-based encoding on the current frame, the restored LPC coefficient (a′) of the previous frame and the residual signal (r′) are used. In this case, the process of restoring the LPC coefficient a′ is identical to the process illustrated in FIG. 7A. However, the process of restoring the residual signal (r′) is different. When the frequency-based encoding is performed on a corresponding frequency domain of the previous frame, data quantized in the frequency domain is inverse frequency-domain-transformed and added to an output of a long-term filter. As a result, the residual signal r′ is restored. When the time-based encoding is performed on the frequency domain of the previous frame, the data quantized in the frequency domain goes through the inverse frequency-domain transform, the LPC analysis, and the short-term filter.
[73] FIG. 8 is a block diagram illustrating an adaptive time/frequency-based audio decoding apparatus, according to an embodiment of the present general inventive concept. Referring to FIG. 8, the apparatus includes a bitstream sorting unit 800, a decoding unit 810, and a collection & inverse transform unit 820.
[74] For each frequency band (i.e., domain) of an input bitstream IN1, the bitstream sorting unit 800 extracts encoded data S10, division information S11, and encoding mode information S12.
[75] The decoding unit 810 decodes the encoded data S10 for each frequency band based on the extracted division information S11 and the encoding mode information S12. The decoding unit 810 includes a time-based decoding unit (not shown), which performs time-based decoding on the encoded data S10 based on the division information S11 and the encoding mode information S12, and a frequency-based decoding unit (not shown).
[76] The collection & inverse transform unit 820 collects decoded data S13 in the frequency domain, performs an inverse frequency-domain transform on the collected data S13, and outputs audio data OUT1. In particular, data on which time-based decoding is performed is frequency-domain-transformed before being collected in the frequency domain. When the decoded data S13 for each frequency band is collected in the frequency domain, similar to the frequency spectrum of FIG. 2, an envelope mismatch between two adjacent frequency bands (i.e., sub-frames) may occur. In order to prevent the envelope mismatch in the frequency domain, the collection & inverse transform unit 820 performs envelope smoothing on the decoded data S13 before collecting the same.
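The patent does not give a smoothing formula, so the following is purely illustrative: a short 3-tap low-pass run over the bins around each internal band boundary before the inverse transform, with the width and pass count chosen arbitrarily:

```python
import numpy as np

def smooth_band_edges(spec, band_edges, width=3, passes=2):
    """Attenuate envelope mismatch by smoothing the bins within `width`
    of each internal band boundary with a simple 1/4-1/2-1/4 kernel."""
    out = spec.astype(float).copy()
    for _ in range(passes):
        for edge in band_edges[1:-1]:             # skip the outer edges
            lo = max(edge - width, 1)
            hi = min(edge + width, len(out) - 1)
            for i in range(lo, hi):
                out[i] = 0.5 * out[i] + 0.25 * (out[i - 1] + out[i + 1])
    return out
```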
[77] FIG. 9 is a flowchart illustrating an adaptive time/frequency-based audio encoding method, according to an embodiment of the present general inventive concept. The method of FIG. 9 may be performed by the adaptive time/frequency-based audio encoding apparatuses of FIG. 1 and/or FIG. 5. Accordingly, for illustration purposes, the method of FIG. 9 is described below with reference to FIGS. 1 to 7B. Referring to FIGS. 1 to 7B and 9, the input audio signal IN is transformed by the frequency-domain transform unit 300 into a full frequency-domain signal (operation 900).
[78] The full frequency-domain signal is divided into the plurality of frequency-domain signals (corresponding to the bands) by the encoding mode determination unit 310 according to the preset standard, and the encoding mode suitable for each respective frequency-domain signal is determined (operation 910). As described above, the full frequency-domain signal is divided into the frequency-domain signals suitable for the time-based encoding mode or the frequency-based encoding mode based on at least one of the spectral tilt, the size of the signal energy of each frequency domain, the change in signal energy between the sub-frames, and the voicing level determination. Then, the encoding mode suitable for each respective frequency-domain signal is determined according to the preset standard and the division of the full frequency-domain signal.
[79] Each frequency-domain signal is encoded by the encoding unit 110 in the determined encoding mode (operation 920). In other words, the time-based encoding unit 400 (and 510) performs the time-based encoding on the frequency-domain signal S1 determined to be encoded in the time-based encoding mode, and the frequency-based encoding unit 410 (and 520) performs the frequency-based encoding on the frequency-domain signal S2 determined to be encoded in the frequency-based encoding mode. The frequency-domain signal S2 may be in a different frequency band from the band of the frequency-domain signal S1, or the bands may be the same when the time-based encoding unit 400 (and 510) determines that the time-based encoding is not suitable for encoding the frequency-domain signal S1.
[80] The time-based encoded data S5, the frequency-based encoded data S6, the division information S3, and the determined encoding mode information S4 are collected by the bitstream output unit 120 and output as the bitstream OUT (operation 930).
[81] FIG. 10 is a flowchart illustrating an adaptive time/frequency-based audio decoding method, according to an embodiment of the present general inventive concept. The method of FIG. 10 may be performed by the adaptive time/frequency-based audio decoding apparatus of FIG. 8. Accordingly, for illustration purposes, the method of FIG. 10 is described below with reference to FIG. 8. Referring to FIG. 10, the encoded data S10 for each frequency band (i.e., domain), the division information S11, and the encoding mode information S12 of each respective frequency band are extracted by the bitstream sorting unit 800 from the input bitstream IN1 (operation 1000).
[82] The encoded data S10 is decoded by the decoding unit 810 based on the extracted division information S11 and the encoding mode information S12 (operation 1010).
[83] The decoded data S13 is collected in the frequency domain by the collection & inverse transform unit 820 (operation 1020). The envelope smoothing may additionally be performed on the collected data S13 to prevent the envelope mismatch in the frequency domain.
[84] The inverse frequency-domain transform is performed on the collected data S13 by the collection & inverse transform unit 820, and the result is output as the audio data OUT1, which is a time-based signal (operation 1030).
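Assembling operations 1000 through 1030, the decoder's control flow can be summarized as below; `inverse_transform` is a DCT-IV stand-in for the inverse MLT, and every name here is a placeholder for the units of FIG. 8 rather than an API from any real codec:

```python
import numpy as np

def inverse_transform(spectrum):
    """Stand-in for the inverse frequency-domain transform (orthonormal
    DCT-IV for brevity; the embodiments use an inverse MLT)."""
    M = len(spectrum)
    n = np.arange(M)
    basis = np.cos(np.pi / M * (n[:, None] + 0.5) * (n[None, :] + 0.5))
    return (basis @ spectrum) * np.sqrt(2.0 / M)

def decode_frame(band_spectra, band_edges):
    """band_spectra: per-band coefficient arrays already produced by the
    decoding unit 810, with time-decoded bands assumed to have been
    transformed back to the frequency domain (paragraph [76])."""
    spectrum = np.zeros(band_edges[-1])
    for coeffs, lo, hi in zip(band_spectra, band_edges[:-1], band_edges[1:]):
        spectrum[lo:hi] = coeffs      # operation 1020: collect in frequency domain
    # envelope smoothing could be applied here (see the earlier sketch)
    return inverse_transform(spectrum)  # operation 1030: back to the time domain
```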
[85] According to the embodiments of the present general inventive concept, acoustic characteristics and a voicing model are simultaneously applied to a frame, which is an audio compression processing unit. As a result, a compression method effective for both music and voice can be produced, and the compression method can be used for mobile terminals that require audio compression at a low bit rate.
[86] The present general inventive concept can also be implemented as computer- readable code on a computer-readable recording medium. The computer-readable recording medium may be any data storage device that can store data which can be thereafter read by a computer system. Examples of the computer-readable recording medium include read-only memory (ROM), random-access memory (RAM), CD- ROMs, magnetic tapes, floppy disks, optical data storage devices, and carrier waves (such as data transmission through the Internet).
[87] The computer-readable recording medium can also be distributed over network- coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion. Also, functional programs, code, and code segments for accomplishing the present general inventive concept can be easily construed by programmers skilled in the art to which the present general inventive concept pertains.
[88] Although a few embodiments of the present general inventive concept have been shown and described, it will be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the general inventive concept, the scope of which is defined in the appended claims and their equivalents.
Industrial Applicability
Sequence List Text

Claims
[1] 1. An adaptive time/frequency-based audio encoding apparatus, comprising: a transformation & mode determination unit to divide an input audio signal into a plurality of frequency-domain signals and to select a time-based encoding mode or a frequency-based encoding mode for each respective frequency-domain signal; an encoding unit to encode each frequency-domain signal in the respective encoding modes selected by the transformation & mode determination unit; and a bitstream output unit to output encoded data, division information, and encoding mode information for each respective encoded frequency-domain signal.
[2] 2. The apparatus of claim 1, wherein the transformation & mode determination unit comprises: a frequency-domain transform unit to transform the input audio signal into a full frequency-domain signal; and an encoding mode determination unit to divide the full frequency-domain signal into the frequency-domain signals according to a preset standard and to determine the time-based encoding mode or the frequency-based encoding mode for each respective frequency-domain signal.
[3] 3. The apparatus of claim 2, wherein the full frequency-domain signal is divided into the frequency-domain signals suitable for the time-based encoding mode or the frequency-based encoding mode based on at least one of a spectral tilt, a size of signal energy of each frequency domain, a change in signal energy between sub-frames, and a voicing level determination, and the respective encoding mode for each frequency-domain signal is determined accordingly.
[4] 4. The apparatus of claim 1, wherein the encoding unit comprises: a time-based encoding unit to perform an inverse frequency-domain transform on a first frequency-domain signal determined to be encoded in the time-based encoding mode and to perform time-based encoding on the first frequency-domain signal on which the inverse frequency-domain transform is performed; and a frequency-based encoding unit to perform frequency-based encoding on a second frequency-domain signal determined to be encoded in the frequency-based encoding mode.
[5] 5. The apparatus of claim 4, wherein the time-based encoding unit selects the encoding mode for the first input frequency-domain signal based on at least one of a linear coding gain, a spectral change between linear prediction filters of adjacent frames, a predicted pitch delay, and a predicted long-term prediction gain, continues to perform the time-based encoding on the first frequency-domain signal when the time-based encoding unit determines that the time-based encoding mode is suitable for the first frequency-domain signal, and stops performing the time-based encoding on the first frequency-domain signal and transmits a mode conversion control signal to the transformation & mode determination unit when the time-based encoding unit determines that the frequency-based encoding mode is suitable for the first frequency-domain signal, and the transformation & mode determination unit outputs the first frequency-domain signal again, which was provided to the time-based encoding unit, to the frequency-based encoding unit in response to the mode conversion control signal.
[6] 6. The apparatus of claim 2, wherein the frequency-domain transform unit performs the frequency-domain transform using a frequency-varying modulated lapped transform (MLT).
[7] 7. The apparatus of claim 4, wherein the time-based encoding unit quantizes a residual signal obtained from linear prediction and dynamically allocates bits to the quantized residual signal according to importance.
[8] 8. The apparatus of claim 4, wherein the time-based encoding unit transforms a residual signal obtained from a linear prediction into a frequency-domain signal, quantizes the frequency-domain signal, and dynamically allocates bits to the quantized signal according to importance.
[9] 9. The apparatus of claim 7, wherein the importance is determined based on a voicing model.
[10] 10. The apparatus of claim 8, wherein the importance is determined based on a voicing model.
[11] 11. The apparatus of claim 4, wherein the frequency-based encoding unit determines a quantization step size of an input frequency-domain signal according to a psychoacoustic model and quantizes the frequency-domain signal.
[12] 12. The apparatus of claim 4, wherein the frequency-based encoding unit extracts important frequency components from an input frequency-domain signal according to a psychoacoustic model, encodes the extracted important frequency components, and encodes remaining signals using noise modeling.
[13] 13. The apparatus of claim 8, wherein the residual signal is obtained using a code excited linear prediction (CELP) algorithm.
[14] 14. An audio data encoding apparatus, comprising: a transformation and mode determination unit to divide a frame of audio data into first audio data and second audio data; and an encoding unit to encode the first audio data in a time domain and to encode the second audio data in a frequency domain.
[15] 15. The apparatus of claim 14, wherein the transformation and mode determination unit divides the frame into voice audio data and non-voice audio data as the first and second audio data, respectively, based on estimations of frequency characteristics of the first and second audio data in the frame.
[16] 16. The apparatus of claim 15, wherein the encoding unit performs a time-based encoding operation on the voice data, determines whether the estimation of the first audio data as the voice data is accurate during the time-based encoding operation by evaluating intermediate data generated during the time-based encoding operation, continues the time-based encoding operation on the first audio data when the estimation of the first audio data as the voice data is accurate, and stops the time-based encoding operation when the estimation of the first audio data as the voice data is not accurate.
[17] 17. The apparatus of claim 16, wherein the encoding unit performs a frequency-based encoding operation on the first audio data when the estimation of the first audio data as the voice data is not accurate.
[18] 18. The apparatus of claim 14, wherein the transformation and mode determination unit approximates the first audio data in a frame as voice audio data and approximates the second audio data in the frame as non-voice audio data.
[19] 19. The apparatus of claim 18, wherein the transformation and mode determination unit performs the approximation using at least one of a spectral measuring method, an energy measuring method, a long-term prediction estimation method, and a voicing level determination method.
[20] 20. The apparatus of claim 14, further comprising: a collection and inverse transform unit to transform one of the first and second audio data such that the first and second audio data are in the same one of the time and frequency domains and to combine the first and second audio data into an output bitstream.
[21] 21. The apparatus of claim 20, wherein the output bitstream comprises at least one of encoding mode information about encoding processes used to encode the first and second audio data and division information about how the first and second audio data are divided with respect to each other in the same frame.
[22] 22. The apparatus of claim 14, wherein the first and second audio data each includes one or more sub-frames that correspond to different frequency bands within a full frequency domain of the same frame.
[23] 23. An adaptive time/frequency-based audio decoding apparatus, comprising: a bitstream sorting unit to extract encoded data for each frequency band, division information, and encoding mode information for each frequency band from an input bitstream; a decoding unit to decode the encoded data for each frequency domain based on the division information and the respective encoding mode information; and a collection & inverse transform unit to collect decoded data in a frequency domain and to perform an inverse frequency-domain transform on the collected data.
[24] 24. The apparatus of claim 23, wherein the decoding unit comprises: a time-based decoding unit to perform time-based decoding on first encoded data based on the division information and respective first encoding mode information; and a frequency-based decoding unit to perform frequency-based decoding on second encoded data based on the division information and respective second encoding mode information.
[25] 25. The apparatus of claim 24, wherein the time-based decoding unit decodes the first encoded data using a CELP algorithm.
[26] 26. The apparatus of claim 23, wherein the collection & inverse transform unit performs envelope smoothing on the decoded data in the frequency domain and then performs the inverse frequency-domain transform on the decoded data such that the decoded data maintains continuity in the frequency domain.
[27] 27. The apparatus of claim 23, wherein a final audio signal is generated using a frequency-varying MLT after the decoded data is collected in the frequency domain.
[28] 28. An audio data decoding apparatus, comprising: a bitstream sorting unit to extract encoded audio data of a frame; and a decoding unit to decode the audio data of the frame into first audio data in a time domain and second audio data in a frequency domain.
[29] 29. An adaptive time/frequency-based audio encoding method, comprising: dividing an input audio signal into a plurality of frequency-domain signals and selecting a time-based encoding mode or a frequency-based encoding mode for each respective frequency-domain signal; encoding each frequency-domain signal in the respective encoding mode; and outputting encoded data, division information, and encoding mode information of each respective frequency-domain signal.
[30] 30. The method of claim 29, wherein the division of the input audio signal comprises: transforming the input audio signal into a full frequency-domain signal; and dividing the full frequency-domain signal into the frequency-domain signals according to a preset standard and selecting the time-based encoding mode or the frequency-based encoding mode for each respective frequency-domain signal.
[31] 31. The method of claim 30, wherein the division of the full frequency-domain signal comprises: dividing the full frequency-domain signal into the frequency-domain signals suitable for the time-based encoding mode or the frequency-based encoding mode based on at least one of a spectral tilt, a size of signal energy of each frequency domain, a change in signal energy between sub-frames and a voicing level determination; and selecting the encoding mode for each respective frequency-domain signal.
[32] 32. The method of claim 29, wherein the encoding of each frequency-domain signal comprises: performing the time-based encoding on a first frequency-domain signal determined to be encoded in the time-based encoding mode; and performing frequency-based encoding on a second frequency-domain signal determined to be encoded in the frequency-based encoding mode.
[33] 33. An audio data encoding method, comprising: dividing a frame of audio data into first audio data and second audio data; and encoding the first audio data in a time domain and encoding the second audio data in a frequency domain.
[34] 34. An adaptive time/frequency-based audio decoding method, comprising: extracting, from an input bitstream, encoded data, division information, and encoding mode information for each frequency band; decoding the encoded data for each frequency domain based on the division information and the respective encoding mode information; and collecting decoded data in a frequency domain and performing an inverse frequency-domain transform on the collected data.
[35] 35. A computer-readable recording medium having a software program to execute an adaptive time/frequency-based audio encoding method, the method comprising: dividing an input audio signal into a plurality of frequency-domain signals and selecting a time-based encoding mode or a frequency-based encoding mode for each respective frequency-domain signal; encoding each frequency-domain signal in the respective encoding mode; and outputting encoded data, division information, and encoding mode information of each respective frequency-domain signal.
PCT/KR2006/004655 2005-11-08 2006-11-08 Adaptive time/frequency-based audio encoding and decoding apparatuses and methods WO2007055507A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP06812491A EP1952400A4 (en) 2005-11-08 2006-11-08 Adaptive time/frequency-based audio encoding and decoding apparatuses and methods
CN2006800415925A CN101305423B (en) 2005-11-08 2006-11-08 Adaptive time/frequency-based audio encoding and decoding apparatuses and methods

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2005-0106354 2005-11-08
KR1020050106354A KR100647336B1 (en) 2005-11-08 2005-11-08 Apparatus and method for adaptive time/frequency-based encoding/decoding

Publications (1)

Publication Number Publication Date
WO2007055507A1 true WO2007055507A1 (en) 2007-05-18

Family

ID=37712834

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2006/004655 WO2007055507A1 (en) 2005-11-08 2006-11-08 Adaptive time/frequency-based audio encoding and decoding apparatuses and methods

Country Status (5)

Country Link
US (2) US8548801B2 (en)
EP (1) EP1952400A4 (en)
KR (1) KR100647336B1 (en)
CN (3) CN101305423B (en)
WO (1) WO2007055507A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9224403B2 (en) 2010-07-02 2015-12-29 Dolby International Ab Selective bass post filter

Families Citing this family (46)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100647336B1 (en) * 2005-11-08 2006-11-23 삼성전자주식회사 Apparatus and method for adaptive time/frequency-based encoding/decoding
US9583117B2 (en) * 2006-10-10 2017-02-28 Qualcomm Incorporated Method and apparatus for encoding and decoding audio signals
MX2009006201A (en) 2006-12-12 2009-06-22 Fraunhofer Ges Forschung Encoder, decoder and methods for encoding and decoding data segments representing a time-domain data stream.
KR101379263B1 (en) 2007-01-12 2014-03-28 삼성전자주식회사 Method and apparatus for decoding bandwidth extension
KR101149449B1 (en) * 2007-03-20 2012-05-25 삼성전자주식회사 Method and apparatus for encoding audio signal, and method and apparatus for decoding audio signal
KR101377667B1 (en) 2007-04-24 2014-03-26 삼성전자주식회사 Method for encoding audio/speech signal in Time Domain
US8630863B2 (en) 2007-04-24 2014-01-14 Samsung Electronics Co., Ltd. Method and apparatus for encoding and decoding audio/speech signal
KR101393300B1 (en) * 2007-04-24 2014-05-12 삼성전자주식회사 Method and Apparatus for decoding audio/speech signal
KR101403340B1 (en) * 2007-08-02 2014-06-09 삼성전자주식회사 Method and apparatus for transcoding
CA2702669C (en) * 2007-10-15 2015-03-31 Lg Electronics Inc. A method and an apparatus for processing a signal
KR101455648B1 (en) * 2007-10-29 2014-10-30 삼성전자주식회사 Method and System to Encode/Decode Audio/Speech Signal for Supporting Interoperability
WO2009077950A1 (en) * 2007-12-18 2009-06-25 Koninklijke Philips Electronics N.V. An adaptive time/frequency-based audio encoding method
ATE500588T1 (en) * 2008-01-04 2011-03-15 Dolby Sweden Ab AUDIO ENCODERS AND DECODERS
US8880410B2 (en) * 2008-07-11 2014-11-04 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating a bandwidth extended signal
MX2011000375A (en) * 2008-07-11 2011-05-19 Fraunhofer Ges Forschung Audio encoder and decoder for encoding and decoding frames of sampled audio signal.
WO2010003545A1 (en) * 2008-07-11 2010-01-14 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e. V. An apparatus and a method for decoding an encoded audio signal
EP2144230A1 (en) 2008-07-11 2010-01-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Low bitrate audio encoding/decoding scheme having cascaded switches
CA2729971C (en) 2008-07-11 2014-11-04 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. An apparatus and a method for calculating a number of spectral envelopes
BR122017003818B1 (en) * 2008-07-11 2024-03-05 Fraunhofer-Gesellschaft zur Föerderung der Angewandten Forschung E.V. INSTRUMENT AND METHOD FOR GENERATING EXTENDED BANDWIDTH SIGNAL
USRE47180E1 (en) 2008-07-11 2018-12-25 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating a bandwidth extended signal
KR101381513B1 (en) * 2008-07-14 2014-04-07 광운대학교 산학협력단 Apparatus for encoding and decoding of integrated voice and music
KR101756834B1 (en) * 2008-07-14 2017-07-12 삼성전자주식회사 Method and apparatus for encoding and decoding of speech and audio signal
KR101261677B1 (en) * 2008-07-14 2013-05-06 광운대학교 산학협력단 Apparatus for encoding and decoding of integrated voice and music
KR20100007738A (en) * 2008-07-14 2010-01-22 한국전자통신연구원 Apparatus for encoding and decoding of integrated voice and music
KR101622950B1 (en) * 2009-01-28 2016-05-23 삼성전자주식회사 Method of coding/decoding audio signal and apparatus for enabling the method
US20110087494A1 (en) * 2009-10-09 2011-04-14 Samsung Electronics Co., Ltd. Apparatus and method of encoding audio signal by switching frequency domain transformation scheme and time domain transformation scheme
KR101826331B1 (en) * 2010-09-15 2018-03-22 삼성전자주식회사 Apparatus and method for encoding and decoding for high frequency bandwidth extension
US8868432B2 (en) * 2010-10-15 2014-10-21 Motorola Mobility Llc Audio signal bandwidth extension in CELP-based speech coder
TWI425502B (en) * 2011-03-15 2014-02-01 Mstar Semiconductor Inc Audio time stretch method and associated apparatus
ES2689072T3 (en) * 2012-05-23 2018-11-08 Nippon Telegraph And Telephone Corporation Encoding an audio signal
CN109448745B (en) * 2013-01-07 2021-09-07 中兴通讯股份有限公司 Coding mode switching method and device and decoding mode switching method and device
RU2625561C2 (en) * 2013-01-29 2017-07-14 Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. Principle for coding mode switch compensation
CN104995680B (en) 2013-04-05 2018-04-03 杜比实验室特许公司 The companding apparatus and method of quantizing noise are reduced using advanced spectrum continuation
TWI615834B (en) * 2013-05-31 2018-02-21 Sony Corp Encoding device and method, decoding device and method, and program
US9349196B2 (en) 2013-08-09 2016-05-24 Red Hat, Inc. Merging and splitting data blocks
KR101457897B1 (en) * 2013-09-16 2014-11-04 삼성전자주식회사 Method and apparatus for encoding and decoding bandwidth extension
FR3013496A1 (en) * 2013-11-15 2015-05-22 Orange TRANSITION FROM TRANSFORMED CODING / DECODING TO PREDICTIVE CODING / DECODING
CN107452391B (en) * 2014-04-29 2020-08-25 华为技术有限公司 Audio coding method and related device
KR20180095123A (en) * 2014-05-15 2018-08-24 텔레폰악티에볼라겟엘엠에릭슨(펍) Audio signal classification and coding
US9685166B2 (en) * 2014-07-26 2017-06-20 Huawei Technologies Co., Ltd. Classification between time-domain coding and frequency domain coding
EP2980801A1 (en) * 2014-07-28 2016-02-03 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method for estimating noise in an audio signal, noise estimator, audio encoder, audio decoder, and system for transmitting audio signals
CN106297812A (en) * 2016-09-13 2017-01-04 深圳市金立通信设备有限公司 A kind of data processing method and terminal
EP3644313A1 (en) 2018-10-26 2020-04-29 Fraunhofer Gesellschaft zur Förderung der Angewand Perceptual audio coding with adaptive non-uniform time/frequency tiling using subband merging and time domain aliasing reduction
CN110265043B (en) * 2019-06-03 2021-06-01 同响科技股份有限公司 Adaptive lossy or lossless audio compression and decompression calculation method
CN111476137B (en) * 2020-04-01 2023-08-01 北京埃德尔黛威新技术有限公司 Novel pipeline leakage early warning online relevant positioning data compression method and device
CN111554322A (en) * 2020-05-15 2020-08-18 腾讯科技(深圳)有限公司 Voice processing method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020035470A1 (en) * 2000-09-15 2002-03-21 Conexant Systems, Inc. Speech coding system with time-domain noise attenuation
WO2004070706A1 (en) * 2003-01-08 2004-08-19 France Telecom Method for encoding and decoding audio at a variable rate
US20040174984A1 (en) * 2002-10-25 2004-09-09 Dilithium Networks Pty Ltd. Method and apparatus for DTMF detection and voice mixing in the CELP parameter domain
US20050192797A1 (en) * 2004-02-23 2005-09-01 Nokia Corporation Coding model selection
WO2005093717A1 * 2004-03-12 2005-10-06 Nokia Corporation Synthesizing a mono audio signal based on an encoded multichannel audio signal

Family Cites Families (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5774837A (en) * 1995-09-13 1998-06-30 Voxware, Inc. Speech coding system and method using voicing probability determination
US6233550B1 (en) * 1997-08-29 2001-05-15 The Regents Of The University Of California Method and apparatus for hybrid coding of speech at 4kbps
US5999897A (en) * 1997-11-14 1999-12-07 Comsat Corporation Method and apparatus for pitch estimation using perception based analysis by synthesis
US6064955A (en) * 1998-04-13 2000-05-16 Motorola Low complexity MBE synthesizer for very low bit rate voice messaging
JP4308345B2 (en) * 1998-08-21 2009-08-05 パナソニック株式会社 Multi-mode speech encoding apparatus and decoding apparatus
US6496797B1 (en) * 1999-04-01 2002-12-17 Lg Electronics Inc. Apparatus and method of speech coding and decoding using multiple frames
US6691082B1 (en) * 1999-08-03 2004-02-10 Lucent Technologies Inc Method and system for sub-band hybrid coding
US6912496B1 (en) * 1999-10-26 2005-06-28 Silicon Automation Systems Preprocessing modules for quality enhancement of MBE coders and decoders for signals having transmission path characteristics
US6377916B1 (en) * 1999-11-29 2002-04-23 Digital Voice Systems, Inc. Multiband harmonic transform coder
JP4907826B2 (en) * 2000-02-29 2012-04-04 クゥアルコム・インコーポレイテッド Closed-loop multimode mixed-domain linear predictive speech coder
US6584438B1 (en) * 2000-04-24 2003-06-24 Qualcomm Incorporated Frame erasure compensation method in a variable rate speech coder
US7363219B2 (en) * 2000-09-22 2008-04-22 Texas Instruments Incorporated Hybrid speech coding and system
DE10102154C2 (en) * 2001-01-18 2003-02-13 Fraunhofer Ges Forschung Method and device for generating a scalable data stream and method and device for decoding a scalable data stream taking into account a bit savings bank function
US6658383B2 (en) * 2001-06-26 2003-12-02 Microsoft Corporation Method for coding speech and music signals
US6912495B2 (en) * 2001-11-20 2005-06-28 Digital Voice Systems, Inc. Speech model and analysis, synthesis, and quantization methods
EP1493146B1 (en) * 2002-04-11 2006-08-02 Matsushita Electric Industrial Co., Ltd. Encoding and decoding devices, methods and programs
WO2004082288A1 (en) * 2003-03-11 2004-09-23 Nokia Corporation Switching between coding schemes
FI118834B (en) * 2004-02-23 2008-03-31 Nokia Corp Classification of audio signals
US7596486B2 (en) * 2004-05-19 2009-09-29 Nokia Corporation Encoding an audio signal using different audio coder modes
US20070147518A1 (en) * 2005-02-18 2007-06-28 Bruno Bessette Methods and devices for low-frequency emphasis during audio compression based on ACELP/TCX
US7177804B2 (en) * 2005-05-31 2007-02-13 Microsoft Corporation Sub-band voice codec with multi-stage codebooks and redundant coding
WO2007040349A1 (en) * 2005-10-05 2007-04-12 Lg Electronics Inc. Method and apparatus for signal processing
KR100647336B1 (en) * 2005-11-08 2006-11-23 삼성전자주식회사 Apparatus and method for adaptive time/frequency-based encoding/decoding
KR20070077652A (en) * 2006-01-24 2007-07-27 삼성전자주식회사 Apparatus for deciding adaptive time/frequency-based encoding mode and method of deciding encoding mode for the same
US8527265B2 (en) * 2007-10-22 2013-09-03 Qualcomm Incorporated Low-complexity encoding/decoding of quantized MDCT spectrum in scalable speech and audio codecs

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020035470A1 (en) * 2000-09-15 2002-03-21 Conexant Systems, Inc. Speech coding system with time-domain noise attenuation
US20040174984A1 (en) * 2002-10-25 2004-09-09 Dilithium Networks Pty Ltd. Method and apparatus for DTMF detection and voice mixing in the CELP parameter domain
WO2004070706A1 (en) * 2003-01-08 2004-08-19 France Telecom Method for encoding and decoding audio at a variable rate
US20050192797A1 (en) * 2004-02-23 2005-09-01 Nokia Corporation Coding model selection
WO2005093717A1 * 2004-03-12 2005-10-06 Nokia Corporation Synthesizing a mono audio signal based on an encoded multichannel audio signal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP1952400A4 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9224403B2 (en) 2010-07-02 2015-12-29 Dolby International Ab Selective bass post filter
US9343077B2 (en) 2010-07-02 2016-05-17 Dolby International Ab Pitch filter for audio signals
US9396736B2 (en) 2010-07-02 2016-07-19 Dolby International Ab Audio encoder and decoder with multiple coding modes
US9552824B2 (en) 2010-07-02 2017-01-24 Dolby International Ab Post filter
US9558754B2 (en) 2010-07-02 2017-01-31 Dolby International Ab Audio encoder and decoder with pitch prediction
US9558753B2 (en) 2010-07-02 2017-01-31 Dolby International Ab Pitch filter for audio signals
US9595270B2 (en) 2010-07-02 2017-03-14 Dolby International Ab Selective post filter
US9830923B2 (en) 2010-07-02 2017-11-28 Dolby International Ab Selective bass post filter
US9858940B2 (en) 2010-07-02 2018-01-02 Dolby International Ab Pitch filter for audio signals
US10236010B2 (en) 2010-07-02 2019-03-19 Dolby International Ab Pitch filter for audio signals
US10811024B2 (en) 2010-07-02 2020-10-20 Dolby International Ab Post filter for audio signals
US11183200B2 (en) 2010-07-02 2021-11-23 Dolby International Ab Post filter for audio signals
US11610595B2 (en) 2010-07-02 2023-03-21 Dolby International Ab Post filter for audio signals
US11996111B2 (en) 2010-07-02 2024-05-28 Dolby International Ab Post filter for audio signals

Also Published As

Publication number Publication date
CN101305423A (en) 2008-11-12
US8862463B2 (en) 2014-10-14
US20140032213A1 (en) 2014-01-30
CN101305423B (en) 2013-06-05
KR100647336B1 (en) 2006-11-23
CN103325377B (en) 2016-01-20
EP1952400A4 (en) 2011-02-09
CN103258541B (en) 2017-04-12
EP1952400A1 (en) 2008-08-06
US20070106502A1 (en) 2007-05-10
US8548801B2 (en) 2013-10-01
CN103258541A (en) 2013-08-21
CN103325377A (en) 2013-09-25

Similar Documents

Publication Publication Date Title
US8862463B2 (en) Adaptive time/frequency-based audio encoding and decoding apparatuses and methods
JP5978218B2 (en) General audio signal coding with low bit rate and low delay
CN101496100B (en) Systems, methods, and apparatus for wideband encoding and decoding of inactive frames
JP5165559B2 (en) Audio codec post filter
CN101523484B (en) Systems, methods and apparatus for frame erasure recovery
CN101180676B (en) Methods and apparatus for quantization of spectral envelope representation
US9418666B2 (en) Method and apparatus for encoding and decoding audio/speech signal
JP5123173B2 (en) Subband speech codec with multi-stage codebook and redundant coding technology field
RU2485606C2 (en) Low bitrate audio encoding/decoding scheme using cascaded switches
JP6692948B2 (en) Method, encoder and decoder for linear predictive coding and decoding of speech signals with transitions between frames having different sampling rates
US8744841B2 (en) Adaptive time and/or frequency-based encoding mode determination apparatus and method of determining encoding mode of the apparatus
JP5241701B2 (en) Encoding apparatus and encoding method
CN101496101A (en) Systems, methods, and apparatus for gain factor limiting
US20100268542A1 (en) Apparatus and method of audio encoding and decoding based on variable bit rate
JP4281131B2 (en) Signal encoding apparatus and method, and signal decoding apparatus and method
JP3353852B2 (en) Audio encoding method
JP2000132195A (en) Signal encoding device and method therefor
KR20070106662A (en) Apparatus for deciding adaptive time/frequency-based encoding mode and method of deciding encoding mode for the same

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200680041592.5

Country of ref document: CN

DPE2 Request for preliminary examination filed before expiration of 19th month from priority date (pct application filed from 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2006812491

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE