US8959015B2 - Apparatus for encoding and decoding of integrated speech and audio - Google Patents

Apparatus for encoding and decoding of integrated speech and audio

Publication number
US8959015B2
Authority
US
United States
Prior art keywords
module
encoding
decoding
speech
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US13/054,377
Other versions
US20110119054A1 (en
Inventor
Tae Jin Lee
Seung Kwon Beack
Minje Kim
Dae Young Jang
Kyeongok Kang
Jin Woo Hong
Hochong Park
Young-Cheol Park
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electronics and Telecommunications Research Institute ETRI
Original Assignee
Electronics and Telecommunications Research Institute ETRI
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electronics and Telecommunications Research Institute ETRI
Assigned to ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE reassignment ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: BEACK, SEUNG KWON, HONG, JIN WOO, JANG, DAE YOUNG, KANG, KYEONGOK, KIM, MINJE, LEE, TAE JIN, PARK, YOUNG-CHEOL, PARK, HOCHONG
Publication of US20110119054A1
Application granted
Publication of US8959015B2

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/20 Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0212 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders

Definitions

  • The present invention relates to an apparatus and method for integrally encoding and decoding a speech signal and an audio signal. More particularly, it concerns a codec that includes at least two encoding/decoding modules with different structures and that selects and operates one of them for each frame according to a characteristic of the input. When the selected module changes from one frame to the next, signal distortion may result; the apparatus and method address this problem so that the module can be changed without distortion.
  • Speech signals and audio signals have different characteristics. Therefore, speech codecs and audio codecs have been researched independently, each exploiting the unique characteristics of its signal type, and separate standard codecs have been developed for each.
  • An aspect of the present invention provides an apparatus and method for integrally encoding and decoding a speech signal and an audio signal that may combine a speech codec module and an audio codec module and selectively apply a codec module according to a characteristic of an input signal to thereby enhance a performance.
  • Another aspect of the present invention also provides an apparatus and method for integrally encoding and decoding a speech signal and an audio signal that may use information of a previous module when a selected codec module is changed over time to thereby solve distortion occurring due to discontinuous module operations.
  • Another aspect of the present invention also provides an apparatus and method for integrally encoding and decoding a speech signal and an audio signal that may use an additional scheme when previous module information for overlapping is not provided from a Modified Discrete Cosine Transform (MDCT) module demanding a time-domain aliasing cancellation (TDAC) operation to thereby enable the TDAC operation and perform a normal MDCT-based codec operation.
  • MDCT Modified Discrete Cosine Transform
  • TDAC time-domain aliasing cancellation
  • An encoding apparatus for integrally encoding a speech signal and an audio signal may include: a module selection unit to analyze a characteristic of an input signal and to select a first encoding module for encoding a first frame of the input signal; a speech encoding unit to encode the input signal according to a selection of the module selection unit and to generate a speech bitstream; an audio encoding unit to encode the input signal according to the selection of the module selection unit and to generate an audio bitstream; and a bitstream generation unit to generate an output bitstream from the speech encoding unit or the audio encoding unit according to the selection of the module selection unit.
  • the encoding apparatus may further include: a module buffer to store a module identifier (ID) of the selected first encoding module, and to transmit information of a second encoding module corresponding to a previous frame of the first frame to the speech encoding unit and the audio encoding unit; and an input buffer to store the input signal and to output a previous input signal that is an input signal of the previous frame.
  • the bitstream generation unit may combine the module ID of the selected first encoding module and a bitstream thereof to generate the output bitstream.
  • the module selection unit may extract the module ID of the selected first encoding module to transfer the extracted module ID to the module buffer and the bitstream generation unit.
  • the speech encoding unit may include: a first speech encoder to encode the input signal to a Code Excitation Linear Prediction (CELP) structure when the first encoding module is identical to the second encoding module; and an encoding initialization unit to determine an initial value for encoding of the first speech encoder when the first encoding module is different from the second encoding module.
  • CELP Code Excitation Linear Prediction
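  • When the first encoding module is identical to the second encoding module, the first speech encoder may encode the input signal using an internal initial value of the first speech encoder.
  • When the first encoding module is different from the second encoding module, the first speech encoder may encode the input signal using an initial value that is determined by the encoding initialization unit.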
  • the encoding initialization unit may include: a Linear Predictive Coder (LPC) analyzer to calculate an LPC coefficient with respect to the previous input signal; a Linear Spectrum Pair (LSP) converter to convert the calculated LPC coefficient to an LSP value; an LPC residual signal calculator to calculate an LPC residual signal using the previous input signal and the LPC coefficient; and an encoding initial value decision unit to determine the initial value for encoding of the first speech encoder using the LPC coefficient, the LSP value, and the LPC residual signal.
  • LPC Linear Predictive Coder
  • LSP Linear Spectrum Pair
  • the audio encoding unit may include: a first audio encoder to encode the input signal through a Modified Discrete Cosine Transform (MDCT) operation when the first encoding module is identical to the second encoding module; a second speech encoder to encode the input signal to a CELP structure when the first encoding module is different from the second encoding module; a second audio encoder to encode the input signal through the MDCT operation when the first encoding module is different from the second encoding module; and a multiplexer to select one of an output of the first audio encoder, an output of the second speech encoder, and an output of the second audio encoder to generate the output bitstream.
  • The second speech encoder may encode an input signal corresponding to a front 1/2 sample of the first frame.
  • The second audio encoder may include: a zero input response calculator to calculate a zero input response with respect to an LPC filter after terminating an encoding operation of the second speech encoder; a first converter to convert, to zero, an input signal corresponding to a front 1/2 sample of the first frame; and a second converter to subtract the zero input response from an input signal corresponding to a rear 1/2 sample of the first frame.
  • the second audio encoder may encode a converted signal of the first converter and a converted signal of the second converter.
  • A decoding apparatus for integrally decoding a speech signal and an audio signal may include: a module selection unit to analyze a characteristic of an input bitstream and to select a first decoding module for decoding a first frame of the input bitstream; a speech decoding unit to decode the input bitstream according to a selection of the module selection unit and to generate the speech signal; an audio decoding unit to decode the input bitstream according to the selection of the module selection unit and to generate the audio signal; and an output generation unit to select one of the speech signal of the speech decoding unit and the audio signal of the audio decoding unit according to the selection of the module selection unit and to output an output signal.
  • the decoding apparatus may further include: a module buffer to store a module ID of the selected first decoding module, and to transmit information of a second decoding module corresponding to a previous frame of the first frame to the speech decoding unit and the audio decoding unit; and an output buffer to store the output signal and to output a previous output signal that is an output signal of the previous frame.
  • the audio decoding unit may include: a first audio decoder to decode the input bitstream through an Inverse MDCT (IMDCT) operation when the first decoding module is identical to the second decoding module; a second speech decoder to decode the input bitstream to a CELP structure when the first decoding module is different from the second decoding module; a second audio decoder to decode the input bitstream through the IMDCT operation when the first decoding module is different from the second decoding module; and a signal restoration unit to calculate a final output from an output of the second speech decoder and an output of the second audio decoder; and an output selector to select and output one of an output of the signal restoration unit and an output of the first audio decoder.
  • IMDCT Inverse MDCT
  • an apparatus and method for integrally encoding and decoding a speech signal and an audio signal may combine a speech codec module and an audio codec module and selectively apply a codec module according to a characteristic of an input signal to thereby enhance a performance.
  • An apparatus and method for integrally encoding and decoding a speech signal and an audio signal may use information of a previous module when a selected codec module is changed over time to thereby solve distortion occurring due to discontinuous module operations.
  • an apparatus and method for integrally encoding and decoding a speech signal and an audio signal may use an additional scheme when previous module information for overlapping is not provided from a Modified Discrete Cosine Transform (MDCT) module demanding a time-domain aliasing cancellation (TDAC) operation to thereby enable the TDAC operation and perform a normal MDCT-based codec operation.
  • FIG. 1 is a block diagram illustrating an encoding apparatus for integrally encoding a speech signal and an audio signal according to an embodiment of the present invention;
  • FIG. 2 is a block diagram illustrating an example of a speech encoding unit of FIG. 1 ;
  • FIG. 3 is a block diagram illustrating an example of an audio encoding unit of FIG. 1 ;
  • FIG. 4 is a diagram for describing an operation of the audio encoding unit of FIG. 3 ;
  • FIG. 5 is a block diagram illustrating a decoding apparatus for integrally decoding a speech signal and an audio signal according to an embodiment of the present invention;
  • FIG. 6 is a block diagram illustrating an example of a speech decoding unit of FIG. 5 ;
  • FIG. 7 is a block diagram illustrating an example of an audio decoding unit of FIG. 5 ;
  • FIG. 8 is a diagram for describing an operation of the audio decoding unit of FIG. 7 ;
  • FIG. 9 is a flowchart illustrating an encoding method of integrally encoding a speech signal and an audio signal according to an embodiment of the present invention.
  • FIG. 10 is a flowchart illustrating a decoding method of integrally decoding a speech signal and an audio signal according to an embodiment of the present invention.
  • a unified codec includes two encoding modules and two decoding modules, where a speech encoding module and a speech decoding module are in a Code Excitation Linear Prediction (CELP) structure, and an audio encoding module and an audio decoding module perform a Modified Discrete Cosine Transform (MDCT) operation.
  • FIG. 1 is a block diagram illustrating an encoding apparatus 100 for integrally encoding a speech signal and an audio signal according to an embodiment of the present invention.
  • the encoding apparatus 100 may include a module selection unit 110 , a speech encoding unit 130 , an audio encoding unit 140 , and a bitstream generation unit 150 .
  • the encoding apparatus 100 may further include a module buffer 120 and an input buffer 160 .
  • the module selection unit 110 may analyze a characteristic of an input signal to select a first encoding module for encoding a first frame of the input signal.
  • the first frame may be a current frame of the input signal.
  • the module selection unit 110 may analyze the input signal to determine a module identifier (ID) for encoding the current frame, and may transfer the input signal to the selected first encoding module and input the module ID into the bitstream generation unit 150 .
  • ID module identifier
  • the module buffer 120 may store a module ID of the selected first encoding module, and transmit information of a second encoding module corresponding to a previous frame of the first frame to the speech encoding unit 130 and the audio encoding unit 140 .
  • the input buffer 160 may store the input signal and output a previous input signal that is an input signal of the previous frame. Specifically, the input buffer 160 may store the input signal and output the previous input signal one frame prior to the current frame.
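The module buffer and the input buffer both behave as one-frame delays: each stores the current item and hands back the item from the previous frame. A minimal sketch of that behavior follows; the class name `OneFrameDelay`, the string module IDs, and the sample values are assumptions made for illustration and do not come from the patent.

```python
class OneFrameDelay:
    """Holds one item and returns the previously stored item on each push."""

    def __init__(self, initial=None):
        self._previous = initial

    def push(self, item):
        # Store the new item and return what was stored one frame earlier.
        previous, self._previous = self._previous, item
        return previous


module_buffer = OneFrameDelay()   # stores module IDs
input_buffer = OneFrameDelay()    # stores input frames

# Hypothetical per-frame module decisions and samples, for illustration only.
frames = [("CELP", [0.1, 0.2]), ("MDCT", [0.3, 0.4]), ("MDCT", [0.5, 0.6])]

history = []
for module_id, samples in frames:
    previous_module = module_buffer.push(module_id)
    previous_input = input_buffer.push(samples)
    history.append((module_id, previous_module, previous_input))
```

In this sketch the first frame sees `None` as its previous module; how the apparatus treats the very first frame is not stated in the text.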
  • the speech encoding unit 130 may encode the input signal according to a selection of the module selection unit 110 to generate a speech bitstream.
  • the speech encoding unit 130 will be described in detail with reference to FIG. 2 .
  • FIG. 2 is a block diagram illustrating an example of the speech encoding unit 130 of FIG. 1 .
  • the speech encoding unit 130 may include an encoding initialization unit 210 and a first speech encoder 220 .
  • the encoding initialization unit 210 may determine an initial value for encoding of the first speech encoder 220 . Specifically, the encoding initialization unit 210 may receive a previous module and determine the initial value for the first speech encoder 220 only when a previous frame has performed an MDCT operation.
  • the encoding initialization unit 210 may include a Linear Predictive Coder (LPC) analyzer 211 , a Linear Spectrum Pair (LSP) converter 212 , an LPC residual signal calculator 213 , and an encoding initial value decision unit 214 .
  • The LPC analyzer 211 may calculate an LPC coefficient with respect to the previous input signal. Specifically, the LPC analyzer 211 may receive the previous input signal to perform an LPC analysis using the same scheme as the first speech encoder 220 and thereby calculate and output the LPC coefficient corresponding to the previous input signal.
  • the LSP converter 212 may convert the calculated LPC coefficient to an LSP value.
  • the LPC residual signal calculator 213 may calculate an LPC residual signal using the previous input signal and the LPC coefficient.
  • The encoding initial value decision unit 214 may determine the initial value for encoding of the first speech encoder 220 using the LPC coefficient, the LSP value, and the LPC residual signal. Specifically, the encoding initial value decision unit 214 may determine and output the initial value in the form required by the first speech encoder 220, using the LPC coefficient, the LSP value, the LPC residual signal, and the like.
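The initialization path can be illustrated with a short sketch that derives LPC coefficients and an LPC residual from the previous input frame. This is only an illustration under stated assumptions: it uses the autocorrelation method with the Levinson-Durbin recursion, whereas the patent only requires "the same scheme as the first speech encoder"; the LSP conversion is omitted; and the function names and the dictionary layout of the returned state are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return a[0..order] with a[0] = 1, so that A(z) = sum_k a[k] z^-k."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # silent frame: keep the trivial predictor
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err          # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a


def lpc_residual(x, a):
    """Residual e[n] = sum_k a[k] x[n-k], with samples before the frame taken as 0."""
    order = len(a) - 1
    return [sum(a[k] * x[n - k] for k in range(order + 1) if n - k >= 0)
            for n in range(len(x))]


def encoder_initial_state(previous_input, order=10):
    """Hypothetical container for the values handed to the speech encoder."""
    r = autocorrelation(previous_input, order)
    a = levinson_durbin(r, order)
    return {"lpc": a, "residual": lpc_residual(previous_input, a)}


# Example: a decaying exponential is predicted almost perfectly by one tap.
state = encoder_initial_state([0.5 ** n for n in range(40)], order=1)
```

A real CELP encoder would map these values onto its own memories (for example filter states and the adaptive codebook); that mapping depends on the specific speech codec and is not shown here.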
  • the first speech encoder 220 may encode the input signal to a CELP structure.
  • When the first encoding module is identical to the second encoding module, the first speech encoder 220 may encode the input signal using an internal initial value of the first speech encoder 220.
  • When the first encoding module is different from the second encoding module, the first speech encoder 220 may encode the input signal using an initial value that is determined by the encoding initialization unit 210.
  • The first speech encoder 220 may receive the previous module, that is, the module that performed encoding for the previous frame, one frame prior to the current frame.
  • When the previous frame has performed a CELP operation, the first speech encoder 220 may encode an input signal corresponding to the current frame using a CELP scheme. In this case, the first speech encoder 220 may perform a consecutive CELP operation and thus continue with the encoding operation using internally provided previous information to generate a bitstream.
  • When the previous frame has performed an MDCT operation, the first speech encoder 220 may erase all the previous information for CELP encoding and perform the encoding operation using the initial value provided from the encoding initialization unit 210 to generate the bitstream.
  • the audio encoding unit 140 may encode the input signal according to the selection of the module selection unit 110 to generate an audio bitstream.
  • the audio encoding unit 140 will be further described in detail with reference to FIGS. 3 and 4 .
  • FIG. 3 is a block diagram illustrating an example of the audio encoding unit 140 of FIG. 1 .
  • the audio encoding unit 140 may include a second speech encoder 310 , a second audio encoder 320 , a first audio encoder 330 , and a multiplexer 340 .
  • the first audio encoder 330 may encode the input signal through an MDCT operation. Specifically, the first audio encoder 330 may receive a previous module. When the previous frame has performed the MDCT operation, the first audio encoder 330 may encode an input signal corresponding to a current frame using the MDCT operation to thereby generate a bitstream. The generated bitstream may be input into the multiplexer 340 .
  • In FIG. 4, X denotes an input signal of a current frame 412.
  • x1 and x2 denote signals that are generated by bisecting the input signal X by a 1/2 frame length.
  • The MDCT operation of the current frame 412 may be applied to signals X and Y, where signal Y corresponds to a subsequent frame 413.
  • The MDCT may be executed after multiplying signals X and Y by windows w1 w2 w3 w4 420.
  • w1, w2, w3, and w4 denote window pieces that are generated by dividing the entire window by a 1/2 frame length.
  • When the previous frame has performed a CELP operation, the first audio encoder 330 may not perform any operation.
  • the second speech encoder 310 may encode the input signal to a CELP structure.
  • the second speech encoder 310 may receive the previous module.
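  • When the previous frame has performed the CELP operation, the second speech encoder 310 may encode signal x1 to output the bitstream, and may input the bitstream into the multiplexer 340.
  • Since the previous frame 411 has performed the CELP operation, the second speech encoder 310 may be consecutively connected to the previous frame 411 and thus perform the encoding operation without initialization.
  • When the previous frame has performed the MDCT operation, the second speech encoder 310 may not perform any operation.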
  • the second audio encoder 320 may encode the input signal through the MDCT operation.
  • the second audio encoder 320 may receive the previous module.
  • the second audio encoder 320 may encode the input signal using any one of the following first through third schemes.
  • the first scheme may encode the input signal according to the existing MDCT operation.
  • a signal restoration operation of an audio decoding module (not shown) may be determined depending on a scheme adopted by the second audio encoder 320 . When the previous frame has performed the MDCT operation, the second audio encoder 320 may not perform any operation.
  • The second audio encoder 320 may include a zero input response calculator (not shown) to calculate a zero input response with respect to an LPC filter after terminating an encoding operation of the second speech encoder 310, a first converter (not shown) to convert, to zero, an input signal corresponding to a front 1/2 sample of the first frame, and a second converter (not shown) to subtract the zero input response from an input signal corresponding to a rear 1/2 sample of the first frame.
  • the second audio encoder 320 may encode a converted signal of the first converter and a converted signal of the second converter.
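The zero input response calculator and the two converters can be sketched as follows. The sketch assumes an all-pole LPC synthesis filter 1/A(z) with coefficients `a = [1, a1, ..., ap]` and assumes that the filter memory is the tail of the signal synthesized for the front half of the frame; the function names and the example values are hypothetical.

```python
def zero_input_response(a, past_outputs, length):
    """Ringing of the synthesis filter 1/A(z) when the excitation is zero.

    a            -- [1, a1, ..., ap], coefficients of A(z)
    past_outputs -- most recent synthesized samples, oldest first (>= p samples)
    length       -- number of response samples to produce
    """
    order = len(a) - 1
    memory = list(past_outputs[-order:])
    response = []
    for _ in range(length):
        # y[n] = -sum_k a[k] * y[n-k], with no excitation term
        y = -sum(a[k] * memory[-k] for k in range(1, order + 1))
        response.append(y)
        memory.append(y)
    return response


def prepare_transition_frame(frame, zir):
    """First converter: zero the front half. Second converter: rear half minus ZIR."""
    half = len(frame) // 2
    front = [0.0] * half
    rear = [s - z for s, z in zip(frame[half:], zir)]
    return front + rear


# Example with a one-tap filter A(z) = 1 - 0.5 z^-1 and a last output of 1.0.
zir = zero_input_response([1.0, -0.5], [1.0], 2)
converted = prepare_transition_frame([1.0, 2.0, 3.0, 4.0], zir)
```

The converted frame is what the MDCT-based encoder would then window and transform; the transform itself is not shown here.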
  • The multiplexer 340 may select one of an output of the first audio encoder 330, an output of the second speech encoder 310, and an output of the second audio encoder 320 to generate an output bitstream.
  • When the previous frame has performed the CELP operation, the multiplexer 340 may combine the bitstreams of the second speech encoder 310 and the second audio encoder 320 to generate a final bitstream.
  • When the previous frame has performed the MDCT operation, the final bitstream may be the same as the output bitstream of the first audio encoder 330.
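Which sub-units are active for a frame depends only on the module used for the previous frame and the module selected for the current one. The lookup below summarizes the cases described for the speech encoding unit 130 and the audio encoding unit 140; the function name and the string labels are hypothetical, and the behavior for the very first frame (no previous module) is not covered because the text does not describe it.

```python
def active_encoder_units(previous_module, current_module):
    """Return the encoder sub-units that operate for the current frame."""
    table = {
        # Speech frame after a speech frame: CELP continues with its own state.
        ("CELP", "CELP"): ["first speech encoder 220"],
        # Speech frame after an audio frame: CELP restarts from computed initial values.
        ("MDCT", "CELP"): ["encoding initialization unit 210",
                           "first speech encoder 220"],
        # Audio frame after an audio frame: ordinary MDCT coding.
        ("MDCT", "MDCT"): ["first audio encoder 330"],
        # Audio frame after a speech frame: CELP codes x1, a modified MDCT codes the rest.
        ("CELP", "MDCT"): ["second speech encoder 310",
                           "second audio encoder 320"],
    }
    try:
        return table[(previous_module, current_module)]
    except KeyError:
        raise ValueError("unsupported module pair: %r"
                         % ((previous_module, current_module),))


transition = active_encoder_units("CELP", "MDCT")
steady = active_encoder_units("MDCT", "MDCT")
```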
  • the bitstream generation unit 150 may combine the module ID of the selected first encoding module and the bitstream of the selected first encoding module to generate the output bitstream.
  • the bitstream generation unit 150 may combine the module ID and a bitstream corresponding to the module ID to thereby generate the final bitstream.
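As an illustration of combining a module ID with the matching bitstream, the sketch below prefixes each frame payload with a one-byte module ID and a two-byte payload length. This framing, the numeric IDs, and the function names are assumptions made for the example; the patent does not specify a bitstream syntax.

```python
import struct

MODULE_IDS = {"CELP": 0, "MDCT": 1}                 # hypothetical numbering
MODULE_NAMES = {value: name for name, value in MODULE_IDS.items()}
HEADER = struct.Struct(">BH")                        # module ID, payload length


def pack_frame(module_name, payload):
    """Prefix the encoded payload with its module ID and length."""
    return HEADER.pack(MODULE_IDS[module_name], len(payload)) + payload


def unpack_frame(data):
    """Read back the module name and payload from one packed frame."""
    module_id, length = HEADER.unpack_from(data, 0)
    start = HEADER.size
    return MODULE_NAMES[module_id], data[start:start + length]


packed = pack_frame("MDCT", b"\x01\x02\x03")
module_name, payload = unpack_frame(packed)
```

A decoder-side module selection unit could read the leading ID in the same way before routing the payload to the matching decoding module.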
  • FIG. 5 is a block diagram illustrating a decoding apparatus 500 for integrally decoding a speech signal and an audio signal according to an embodiment of the present invention.
  • the decoding apparatus 500 may include a module selection unit 510 , a speech decoding unit 530 , an audio decoding unit 540 , and an output generation unit 550 . Also, the decoding apparatus 500 may further include a module buffer 520 and an output buffer 560 .
  • The module selection unit 510 may analyze a characteristic of an input bitstream to select a first decoding module for decoding a first frame of the input bitstream. Specifically, the module selection unit 510 may analyze the module ID transmitted in the input bitstream, output the module ID, and transfer the input bitstream to the corresponding decoding module.
  • the speech decoding unit 530 may decode the input bitstream according to a selection of the module selection unit 510 to generate a speech signal. Specifically, the speech decoding unit 530 may perform a CELP-based speech decoding operation. Hereinafter, the speech decoding unit 530 will be further described in detail with reference to FIG. 6 .
  • FIG. 6 is a block diagram illustrating an example of the speech decoding unit 530 of FIG. 5 .
  • the speech decoding unit 530 may include a decoding initialization unit 610 and a first speech decoder 620 .
  • the decoding initialization unit 610 may determine an initial value for decoding of the first speech decoder 620 . Specifically, the decoding initialization unit 610 may receive a previous module. Only when a previous frame has performed an MDCT operation may the decoding initialization unit 610 determine the initial value to be provided for the first speech decoder 620 .
  • the decoding initialization unit 610 may include an LPC analyzer 611 , an LSP converter 612 , an LPC residual signal calculator 613 , and a decoding initial value decision unit 614 .
  • the LPC analyzer 611 may calculate an LPC coefficient with respect to the previous output signal. Specifically, the LPC analyzer 611 may receive the previous output signal to perform an LPC analysis using the same scheme as the first speech decoder 620 and thereby calculate and output an LPC coefficient corresponding to the previous output signal.
  • the LSP converter 612 may convert the calculated LPC coefficient to an LSP value.
  • the LPC residual signal calculator 613 may calculate an LPC residual signal using the previous output signal and the LPC coefficient.
  • The decoding initial value decision unit 614 may determine the initial value for decoding of the first speech decoder 620 using the LPC coefficient, the LSP value, and the LPC residual signal. Specifically, the decoding initial value decision unit 614 may determine and output the initial value in the form required by the first speech decoder 620, using the LPC coefficient, the LSP value, the LPC residual signal, and the like.
  • the first speech decoder 620 may decode the input bitstream to a CELP structure.
  • When the first decoding module is identical to the second decoding module, the first speech decoder 620 may decode the input bitstream using an internal initial value of the first speech decoder 620.
  • When the first decoding module is different from the second decoding module, the first speech decoder 620 may decode the input bitstream using an initial value that is determined by the decoding initialization unit 610.
  • The first speech decoder 620 may receive the previous module, that is, the module that performed decoding for the previous frame, one frame prior to the current frame.
  • When the previous frame has performed a CELP operation, the first speech decoder 620 may decode the input bitstream corresponding to the current frame using a CELP scheme. In this case, the first speech decoder 620 may perform a consecutive CELP operation and thus continue with the decoding operation using internally provided previous information to generate an output signal. When the previous frame has performed an MDCT operation, the first speech decoder 620 may erase all the previous information for CELP decoding and perform the decoding operation using the initial value provided from the decoding initialization unit 610 to generate the output signal.
  • the audio decoding unit 540 may decode the input bitstream according to the selection of the module selection unit 510 to generate an audio signal.
  • the audio decoding unit 540 will be further described in detail with reference to FIGS. 7 and 8 .
  • FIG. 7 is a block diagram illustrating an example of the audio decoding unit 540 of FIG. 5 .
  • the audio decoding unit 540 may include a second speech decoder 710 , a second audio decoder 720 , a first audio decoder 730 , a signal restoration unit 740 , and an output selector 750 .
  • the first audio decoder 730 may decode the input bitstream through an Inverse MDCT (IMDCT) operation. Specifically, the first audio decoder 730 may receive a previous module. When a previous frame has performed the IMDCT operation, the first audio decoder 730 may decode an input bitstream corresponding to the current frame using the IMDCT operation to thereby generate an output signal. Specifically, the first audio decoder 730 may receive an input bitstream of the current frame, perform the IMDCT operation according to an existing technology, apply a window to thereby perform a time-domain aliasing cancellation (TDAC) operation, and output a final output signal. When the previous frame performs a CELP operation, the first audio decoder 730 may not perform any operation.
  • The second speech decoder 710 may decode the input bitstream to a CELP structure. Specifically, the second speech decoder 710 may receive the previous module. When the previous frame has performed the CELP operation, the second speech decoder 710 may decode the input bitstream according to an existing speech decoding scheme to generate an output signal. Here, the output signal of the second speech decoder 710 may be x4 820 and have a 1/2 frame length. Since the previous frame has performed the CELP operation, the second speech decoder 710 may be consecutively connected to the previous frame and thus perform the decoding operation without initialization.
  • the second audio decoder 720 may decode the input bitstream through the IMDCT operation.
  • the second audio decoder 720 may apply only a window and obtain an output signal without performing the TDAC operation.
  • ab 830 may denote the output signal of the second audio decoder 720 .
  • a and b may be defined as signals having a ½ frame length.
  • the signal restoration unit 740 may calculate a final output from an output of the second speech decoder 710 and an output of the second audio decoder 720. Also, the signal restoration unit 740 may obtain a final output signal of the current frame and define the output signal as gh 850 as shown in FIG. 8. Here, g and h may be defined as signals having a ½ frame length.
  • a first scheme may obtain h according to the following Equation 1. Here, a general window operation is assumed. In the following Equation 1, R denotes a time-axis rotation of a signal based on a ½ frame length.
  • h denotes the output signal corresponding to a rear ½ sample of the first frame
  • b denotes an output signal of the second audio decoder 720
  • x4 denotes an output signal of the second speech decoder 710
  • w1 and w2 denote windows
  • w1R denotes a signal that is generated by performing a time-axis rotation for w1 based on a ½ frame length
  • x4R denotes a signal that is generated by performing the time-axis rotation for x4 based on a ½ frame length.
  • a second scheme may obtain h according to the following Equation 2:
  • h denotes the output signal corresponding to the rear ½ sample of the first frame
  • b denotes the output signal of the second audio decoder 720
  • w2 denotes a window
  • a third scheme may obtain h according to the following Equation 3:
  • h denotes the output signal corresponding to the rear ½ sample of the first frame
  • b denotes the output signal of the second audio decoder 720
  • w2 denotes a window
  • x5 840 denotes a zero input response with respect to an LPC filter after decoding the output signal of the second speech decoder 710.
  • When the previous frame has performed the MDCT operation, the second speech decoder 710, the second audio decoder 720, and the signal restoration unit 740 may not perform any operation.
  • the output selector 750 may select and output one of an output of the signal restoration unit 740 and an output of the first audio decoder 730 .
  • the output generation unit 550 may select one of the speech signal of the speech decoding unit 530 and the audio signal of the audio decoding unit 540 according to the selection of the module selection unit 510 to generate the output signal. Specifically, the output generation unit 550 may select the output signal according to the module ID to output the selected output signal as the final output signal.
  • the module buffer 520 may store a module ID of the selected first decoding module, and transmit information of a second decoding module corresponding to a previous frame of the first frame to the speech decoding unit 530 and the audio decoding unit 540 . Specifically, the module buffer 520 may store the module ID to output a previous module corresponding to a previous module ID that is one frame prior to a current frame.
  • the output buffer 560 may store the output signal and output a previous output signal that is an output signal of the previous frame.
  • FIG. 9 is a flowchart illustrating an encoding method of integrally encoding a speech signal and an audio signal according to an embodiment of the present invention.
  • the encoding method may analyze an input signal to determine a module type of an encoding module for encoding a current frame, and buffer the input signal to prepare a previous frame input signal, and may store a module type of the current frame to prepare a module type of a previous frame.
  • the encoding method may determine whether the determined module type is a speech module or an audio module.
  • the encoding method may determine whether the module type is changed in operation 930.
  • the encoding method may perform a CELP encoding operation according to an existing technology in operation 950. Conversely, when the module type is changed in operation 930, the encoding method may perform an initialization according to an operation of the encoding initialization module to determine an initial value, and perform the CELP encoding operation using the initial value in operation 960.
  • the encoding method may determine whether the module type is changed in operation 940.
  • the encoding method may perform an additional encoding process in operation 970.
  • the encoding method may perform a CELP-based encoding for an input signal corresponding to a ½ frame length and perform a second audio encoding operation for the entire frame length.
  • the encoding method may perform an MDCT-based encoding operation according to an existing technology in operation 980.
  • the encoding method may select and output a final bitstream according to the module type and depending on whether the module type is changed.
  • FIG. 10 is a flowchart illustrating a decoding method of integrally decoding a speech signal and an audio signal according to an embodiment of the present invention.
  • the decoding method may determine a module type of a decoding module of a current frame based on input bitstream information to prepare a previous frame output signal, and store the module type of the current frame to prepare a module type of a previous frame.
  • the decoding method may determine whether the determined module type is a speech module or an audio module.
  • the decoding method may determine whether the module type is changed in operation 1003.
  • the decoding method may perform a CELP decoding operation according to an existing technology in operation 1005. Conversely, when the module type is changed in operation 1003, the decoding method may perform an initialization according to an operation of the decoding initialization module to obtain an initial value, and perform the CELP decoding operation using the initial value in operation 1006.
  • the decoding method may determine whether the module type is changed in operation 1004.
  • the decoding method may perform an additional decoding process in operation 1007.
  • the decoding method may perform a CELP-based decoding for the input bitstream to obtain an output signal corresponding to a ½ frame length, and perform a second audio decoding operation for the input bitstream.
  • the decoding method may perform an MDCT-based decoding operation according to an existing technology in operation 1008.
  • the decoding method may perform a signal restoration operation to obtain an output signal.
  • the decoding method may select and output a final signal according to the module type and depending on whether the module type is changed.
  • an apparatus and method for integrally encoding and decoding a speech signal and an audio signal may unify a speech codec module and an audio codec module, selectively apply a codec module according to a characteristic of an input signal, and thereby may enhance a performance.
  • the TDAC operation may be enabled to thereby perform a normal MDCT-based codec operation.
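
The three restoration schemes described above for the signal restoration unit 740 can be illustrated with a short Python sketch. It rests on an assumption that is not stated in this text: that the windowed, non-overlapped output of the second audio decoder 720 has the usual MDCT time-aliased form b = w2·(w2·x2 − w1R·x1R). The aliasing model, the sign convention, and every function name below are illustrative only, and the patent's Equations 1 to 3 may be written differently.

```python
import math

def rotate(v):
    """Time-axis rotation R (assumed here to be a reversal of a half-frame signal)."""
    return v[::-1]

def restore_scheme1(b, x4, w1, w2):
    # Cancel the aliasing term with the CELP-decoded front half x4, then undo the window.
    w1r, x4r = rotate(w1), rotate(x4)
    return [(b[n] + w2[n] * w1r[n] * x4r[n]) / (w2[n] * w2[n]) for n in range(len(b))]

def restore_scheme2(b, w2):
    # The encoder forced x1 = 0, so no aliasing term exists; only undo the window.
    return [b[n] / (w2[n] * w2[n]) for n in range(len(b))]

def restore_scheme3(b, w2, x5):
    # The encoder forced x1 = 0 and subtracted a zero input response; add x5 back.
    return [b[n] / (w2[n] * w2[n]) + x5[n] for n in range(len(b))]

def aliased_rear_half(x1, x2, w1, w2):
    """ASSUMED model of b: the rear half of the windowed IMDCT output without TDAC."""
    w1r, x1r = rotate(w1), rotate(x1)
    return [w2[n] * (w2[n] * x2[n] - w1r[n] * x1r[n]) for n in range(len(x2))]

half = 4  # samples per 1/2 frame (toy size)
rising = [math.sin(math.pi * (n + 0.5) / (4 * half)) for n in range(2 * half)]
w1, w2 = rising[:half], rising[half:]
x1 = [0.3, -0.1, 0.4, 0.2]    # front half of the frame (CELP-coded)
x2 = [0.5, 0.25, -0.6, 0.1]   # rear half of the frame (to be restored as h)
x3 = [0.05, 0.02, 0.01, 0.0]  # stand-in zero input response
zeros = [0.0] * half

h1 = restore_scheme1(aliased_rear_half(x1, x2, w1, w2), x1, w1, w2)
h2 = restore_scheme2(aliased_rear_half(zeros, x2, w1, w2), w2)
x2_minus_zir = [s - z for s, z in zip(x2, x3)]
h3 = restore_scheme3(aliased_rear_half(zeros, x2_minus_zir, w1, w2), w2, x3)
```

In this toy setting x4 equals x1 exactly and x5 equals x3 exactly, so each scheme returns x2 up to rounding. In a real codec the CELP output is quantized, so the restored h would only approximate the original rear half.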

Abstract

Provided is an apparatus for integrally encoding and decoding a speech signal and an audio signal. An encoding apparatus for integrally encoding a speech signal and an audio signal may include: a module selection unit to analyze a characteristic of an input signal and to select a first encoding module for encoding a first frame of the input signal; a speech encoding unit to encode the input signal according to a selection of the module selection unit and to generate a speech bitstream; an audio encoding unit to encode the input signal according to the selection of the module selection unit and to generate an audio bitstream; and a bitstream generation unit to generate an output bitstream from the speech encoding unit or the audio encoding unit according to the selection of the module selection unit.

Description

CROSS REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of International Application No. PCT/KR2009/003854, filed Jul. 14, 2009, and claims the benefit of Korean Application No. 10-2008-0068370, filed Jul. 14, 2008, and Korean Application No. 10-2009-0061607, filed Jul. 7, 2009, the disclosures of all of which are incorporated herein by reference.
TECHNICAL FIELD
The present invention relates to an apparatus and method for integrally encoding and decoding a speech signal and an audio signal. More particularly, the present invention relates to an apparatus and method that, when a codec includes at least two encoding/decoding modules operating with different structures and selects and operates one of them for each frame according to an input characteristic, may solve the signal distortion that results from a change of the selected module as frames progress, so that the module can be changed without distortion.
BACKGROUND ART
Speech signals and audio signals have different characteristics. Therefore, speech codecs for the speech signals and audio codecs for the audio signals have been independently researched using unique characteristics of speech signals and audio signals, and standard codecs have been developed for each of the speech codecs and the audio codecs.
Currently, as a communication service and a broadcasting service are integrated or converged, there is a need to integrally process a speech signal and an audio signal having various types of characteristics, using a single codec. However, existing speech codecs or audio codecs may not provide a performance demanded of a unified codec. Specifically, an audio codec having the best performance may not provide a satisfactory performance with respect to a speech signal, and a speech codec having the best performance may not provide a satisfactory performance with respect to an audio signal. Therefore, the existing codecs are not used for the unified speech/audio codec.
Accordingly, there is a need for a technology that may select a corresponding module according to a characteristic of an input signal to optimally encode and decode a corresponding signal.
DISCLOSURE OF INVENTION
Technical Goals
An aspect of the present invention provides an apparatus and method for integrally encoding and decoding a speech signal and an audio signal that may combine a speech codec module and an audio codec module and selectively apply a codec module according to a characteristic of an input signal to thereby enhance a performance.
Another aspect of the present invention also provides an apparatus and method for integrally encoding and decoding a speech signal and an audio signal that may use information of a previous module until a selected codec module is changed over time to thereby solve distortion occurring due to discontinuous module operations.
Another aspect of the present invention also provides an apparatus and method for integrally encoding and decoding a speech signal and an audio signal that may use an additional scheme when previous module information for overlapping is not provided from a Modified Discrete Cosine Transform (MDCT) module demanding a time-domain aliasing cancellation (TDAC) operation to thereby enable the TDAC operation and perform a normal MDCT-based codec operation.
Technical Solutions
According to an aspect of the present invention, there is provided an encoding apparatus for integrally encoding a speech signal and an audio signal, the encoding apparatus including: a module selection unit to analyze a characteristic of an input signal and to select a first encoding module for encoding a first frame of the input signal; a speech encoding unit to encode the input signal according to a selection of the module selection unit and to generate a speech bitstream; an audio encoding unit to encode the input signal according to the selection of the module selection unit and to generate an audio bitstream; and a bitstream generation unit to generate an output bitstream from the speech encoding unit or the audio encoding unit according to the selection of the module selection unit.
In this instance, the encoding apparatus may further include: a module buffer to store a module identifier (ID) of the selected first encoding module, and to transmit information of a second encoding module corresponding to a previous frame of the first frame to the speech encoding unit and the audio encoding unit; and an input buffer to store the input signal and to output a previous input signal that is an input signal of the previous frame. The bitstream generation unit may combine the module ID of the selected first encoding module and a bitstream thereof to generate the output bitstream.
Also, the module selection unit may extract the module ID of the selected first encoding module to transfer the extracted module ID to the module buffer and the bitstream generation unit.
Also, the speech encoding unit may include: a first speech encoder to encode the input signal to a Code-Excited Linear Prediction (CELP) structure when the first encoding module is identical to the second encoding module; and an encoding initialization unit to determine an initial value for encoding of the first speech encoder when the first encoding module is different from the second encoding module.
Also, when the first encoding module is identical to the second encoding module, the first speech encoder may encode the input signal using an internal initial value of the first speech encoder. When the first encoding module is different from the second encoding module, the first speech encoder may encode the input signal using an initial value that is determined by the encoding initialization unit.
Also, the encoding initialization unit may include: a Linear Predictive Coding (LPC) analyzer to calculate an LPC coefficient with respect to the previous input signal; a Line Spectral Pair (LSP) converter to convert the calculated LPC coefficient to an LSP value; an LPC residual signal calculator to calculate an LPC residual signal using the previous input signal and the LPC coefficient; and an encoding initial value decision unit to determine the initial value for encoding of the first speech encoder using the LPC coefficient, the LSP value, and the LPC residual signal.
Also, the audio encoding unit may include: a first audio encoder to encode the input signal through a Modified Discrete Cosine Transform (MDCT) operation when the first encoding module is identical to the second encoding module; a second speech encoder to encode the input signal to a CELP structure when the first encoding module is different from the second encoding module; a second audio encoder to encode the input signal through the MDCT operation when the first encoding module is different from the second encoding module; and a multiplexer to select one of an output of the first audio encoder, an output of the second speech encoder, and an output of the second audio encoder to generate the output bitstream.
Also, when the first encoding module is different from the second encoding module, the second speech encoder may encode an input signal corresponding to a front ½ sample of the first frame.
Also, the second audio encoder may include: a zero input response calculator to calculate a zero input response with respect to an LPC filter after terminating an encoding operation of the second speech encoder; a first converter to convert, to zero, an input signal corresponding to a front ½ sample of the first frame; and a second converter to subtract the zero input response from an input signal corresponding to a rear ½ sample of the first frame. The second audio encoder may encode a converted signal of the first converter and a converted signal of the second converter.
According to another aspect of the present invention, there is provided a decoding apparatus for integrally decoding a speech signal and an audio signal, the decoding apparatus including: a module selection unit to analyze a characteristic of an input bitstream and to select a first decoding module for decoding a first frame of the input bitstream; a speech decoding unit to decode the input bitstream according to a selection of the module selection unit and to generate the speech signal; an audio decoding unit to decode the input bitstream according to the selection of the module selection unit and to generate the audio signal; and an output generation unit to select one of the speech signal of the speech decoding unit and the audio signal of the audio decoding unit according to the selection of the module selection unit and to output an output signal.
In this instance, the decoding apparatus may further include: a module buffer to store a module ID of the selected first decoding module, and to transmit information of a second decoding module corresponding to a previous frame of the first frame to the speech decoding unit and the audio decoding unit; and an output buffer to store the output signal and to output a previous output signal that is an output signal of the previous frame.
Also, the audio decoding unit may include: a first audio decoder to decode the input bitstream through an Inverse MDCT (IMDCT) operation when the first decoding module is identical to the second decoding module; a second speech decoder to decode the input bitstream to a CELP structure when the first decoding module is different from the second decoding module; a second audio decoder to decode the input bitstream through the IMDCT operation when the first decoding module is different from the second decoding module; a signal restoration unit to calculate a final output from an output of the second speech decoder and an output of the second audio decoder; and an output selector to select and output one of an output of the signal restoration unit and an output of the first audio decoder.
Advantageous Effects
According to example embodiments, there is provided an apparatus and method for integrally encoding and decoding a speech signal and an audio signal that may combine a speech codec module and an audio codec module and selectively apply a codec module according to a characteristic of an input signal to thereby enhance a performance.
According to example embodiments, there is provided an apparatus and method for integrally encoding and decoding a speech signal and an audio signal that may use information of a previous module until a selected codec module is changed over time to thereby solve distortion occurring due to discontinuous module operations.
According to example embodiments, there is provided an apparatus and method for integrally encoding and decoding a speech signal and an audio signal that may use an additional scheme when previous module information for overlapping is not provided from a Modified Discrete Cosine Transform (MDCT) module demanding a time-domain aliasing cancellation (TDAC) operation to thereby enable the TDAC operation and perform a normal MDCT-based codec operation.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1 is a block diagram illustrating an encoding apparatus for integrally encoding a speech signal and an audio signal according to an embodiment of the present invention;
FIG. 2 is a block diagram illustrating an example of a speech encoding unit of FIG. 1;
FIG. 3 is a block diagram illustrating an example of an audio encoding unit of FIG. 1;
FIG. 4 is a diagram for describing an operation of the audio encoding unit of FIG. 3;
FIG. 5 is a block diagram illustrating a decoding apparatus for integrally decoding a speech signal and an audio signal according to an embodiment of the present invention;
FIG. 6 is a block diagram illustrating an example of a speech decoding unit of FIG. 5;
FIG. 7 is a block diagram illustrating an example of an audio decoding unit of FIG. 5;
FIG. 8 is a diagram for describing an operation of the audio decoding unit of FIG. 7;
FIG. 9 is a flowchart illustrating an encoding method of integrally encoding a speech signal and an audio signal according to an embodiment of the present invention; and
FIG. 10 is a flowchart illustrating a decoding method of integrally decoding a speech signal and an audio signal according to an embodiment of the present invention.
BEST MODE FOR CARRYING OUT THE INVENTION
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments are described below in order to explain the present invention by referring to the figures.
Here, it is assumed that a unified codec includes two encoding modules and two decoding modules, where a speech encoding module and a speech decoding module are in a Code-Excited Linear Prediction (CELP) structure, and an audio encoding module and an audio decoding module perform a Modified Discrete Cosine Transform (MDCT) operation.
FIG. 1 is a block diagram illustrating an encoding apparatus 100 for integrally encoding a speech signal and an audio signal according to an embodiment of the present invention.
Referring to FIG. 1, the encoding apparatus 100 may include a module selection unit 110, a speech encoding unit 130, an audio encoding unit 140, and a bitstream generation unit 150.
Also, the encoding apparatus 100 may further include a module buffer 120 and an input buffer 160.
The module selection unit 110 may analyze a characteristic of an input signal to select a first encoding module for encoding a first frame of the input signal. Here, the first frame may be a current frame of the input signal. Also, the module selection unit 110 may analyze the input signal to determine a module identifier (ID) for encoding the current frame, and may transfer the input signal to the selected first encoding module and input the module ID into the bitstream generation unit 150.
The module buffer 120 may store a module ID of the selected first encoding module, and transmit information of a second encoding module corresponding to a previous frame of the first frame to the speech encoding unit 130 and the audio encoding unit 140.
The input buffer 160 may store the input signal and output a previous input signal that is an input signal of the previous frame. Specifically, the input buffer 160 may store the input signal and output the previous input signal one frame prior to the current frame.
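Both the module buffer 120 and the input buffer 160 behave as one-frame delays: each stores the current value and hands out the value from one frame earlier. A minimal Python sketch of that behavior follows. The class name is hypothetical, and returning `None` for the very first frame is an assumption, since the text does not say what the buffers output before any frame has been stored.

```python
class FrameDelayBuffer:
    """Stores the current frame's value and returns the previous frame's value."""

    def __init__(self, initial=None):
        # ASSUMPTION: nothing is known about the frame before the first one.
        self._previous = initial

    def push(self, current):
        previous = self._previous
        self._previous = current
        return previous

module_buffer = FrameDelayBuffer()
previous_modules = [module_buffer.push(m) for m in ["audio", "audio", "speech"]]

input_buffer = FrameDelayBuffer()
input_buffer.push([0.1, 0.2])
previous_input = input_buffer.push([0.3, 0.4])
```

With one such buffer for module IDs and one for input frames, the speech encoding unit 130 and the audio encoding unit 140 can compare the current module with the previous one and, when needed, analyze the previous input signal.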
The speech encoding unit 130 may encode the input signal according to a selection of the module selection unit 110 to generate a speech bitstream. Hereinafter, the speech encoding unit 130 will be described in detail with reference to FIG. 2.
FIG. 2 is a block diagram illustrating an example of the speech encoding unit 130 of FIG. 1.
Referring to FIG. 2, the speech encoding unit 130 may include an encoding initialization unit 210 and a first speech encoder 220.
When the first encoding module is different from the second encoding module, the encoding initialization unit 210 may determine an initial value for encoding of the first speech encoder 220. Specifically, the encoding initialization unit 210 may receive a previous module and determine the initial value for the first speech encoder 220 only when a previous frame has performed an MDCT operation. Here, the encoding initialization unit 210 may include a Linear Predictive Coding (LPC) analyzer 211, a Line Spectral Pair (LSP) converter 212, an LPC residual signal calculator 213, and an encoding initial value decision unit 214.
The LPC analyzer 211 may calculate an LPC coefficient with respect to the previous input signal. Specifically, the LPC analyzer 211 may receive the previous input signal to perform an LPC analysis using the same scheme as the first speech encoder 220 and thereby calculate and output the LPC coefficient corresponding to the previous input signal.
The LSP converter 212 may convert the calculated LPC coefficient to an LSP value.
The LPC residual signal calculator 213 may calculate an LPC residual signal using the previous input signal and the LPC coefficient.
The encoding initial value decision unit 214 may determine the initial value for encoding of the first speech encoder 220 using the LPC coefficient, the LSP value, and the LPC residual signal. Specifically, the encoding initial value decision unit 214 may determine and output the initial value in the form required by the first speech encoder 220, using the LPC coefficient, the LSP value, the LPC residual signal, and the like.
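The initialization path can be sketched in Python as follows. This is a rough illustration only: the text requires the LPC analysis to use the same scheme as the first speech encoder 220, which is not specified here, so the autocorrelation method with the Levinson-Durbin recursion, the default order of 10, the dictionary layout, and all function names are assumptions. The LPC-to-LSP conversion is omitted for brevity.

```python
def autocorrelation(signal, max_lag):
    return [sum(signal[n] * signal[n - k] for n in range(k, len(signal)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Predictor coefficients a[1..order], with x_hat[n] = sum_k a[k] * x[n-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate input: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:]

def lpc_residual(signal, coeffs):
    """Residual e[n] = x[n] - sum_k a[k] * x[n-k], with zeros before the frame."""
    residual = []
    for n, sample in enumerate(signal):
        prediction = sum(a * signal[n - k]
                         for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        residual.append(sample - prediction)
    return residual

def encoding_initial_value(previous_input, order=10):
    # Hypothetical container for what a CELP encoder might need as its past state.
    coeffs = levinson_durbin(autocorrelation(previous_input, order), order)
    return {
        "lpc": coeffs,
        "residual": lpc_residual(previous_input, coeffs),
        "filter_memory": previous_input[-order:],
    }

# Toy check: a decaying exponential is (nearly) a first-order autoregressive signal.
x = [0.9 ** n for n in range(200)]
state = encoding_initial_value(x, order=2)
```

For the toy signal, the first coefficient comes out close to 0.9 and the second close to zero, and the residual is essentially a single impulse at the first sample.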
When the first encoding module is identical to the second encoding module, the first speech encoder 220 may encode the input signal to a CELP structure. Here, when the first encoding module is identical to the second encoding module, the first speech encoder 220 may encode the input signal using an internal initial value of the first speech encoder 220. When the first encoding module is different from the second encoding module, the first speech encoder 220 may encode the input signal using an initial value that is determined by the encoding initialization unit 210. For example, the first speech encoder 220 may receive a previous module having performed encoding for a previous frame one frame prior to a current frame. When the previous frame has performed a CELP operation, the first speech encoder 220 may encode an input signal corresponding to the current frame using a CELP scheme. In this case, the first speech encoder 220 may perform a consecutive CELP operation and thus continue with an encoding operation using internally provided previous information to generate a bitstream. When the previous frame has performed an MDCT operation, the first speech encoder 220 may erase all the previous information for CELP encoding, and perform the encoding operation using the initial value, provided from the encoding initialization unit 210, to generate the bitstream.
Referring again to FIG. 1, the audio encoding unit 140 may encode the input signal according to the selection of the module selection unit 110 to generate an audio bitstream. Hereinafter, the audio encoding unit 140 will be further described in detail with reference to FIGS. 3 and 4.
FIG. 3 is a block diagram illustrating an example of the audio encoding unit 140 of FIG. 1.
Referring to FIG. 3, the audio encoding unit 140 may include a second speech encoder 310, a second audio encoder 320, a first audio encoder 330, and a multiplexer 340.
When the first encoding module is identical to the second encoding module, the first audio encoder 330 may encode the input signal through an MDCT operation. Specifically, the first audio encoder 330 may receive a previous module. When the previous frame has performed the MDCT operation, the first audio encoder 330 may encode an input signal corresponding to a current frame using the MDCT operation to thereby generate a bitstream. The generated bitstream may be input into the multiplexer 340.
Referring to FIG. 4, X denotes an input signal of a current frame 412. x1 and x2 denote signals that are generated by bisecting the input signal X by a ½ frame length. An MDCT operation of the current frame 412 may be applied to signals X and Y including signal Y corresponding to a subsequent frame 413. MDCT may be executed after multiplying windows w1w2w3w4 420 by signals X and Y. Here, w1, w2, w3, and w4 denote window pieces that are generated by dividing the entire window by a ½ frame length. When the previous frame 411 has performed a CELP operation, the first audio encoder 330 may not perform any operation.
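To make the role of the window w1w2w3w4 and of the overlap with the neighboring frame concrete, the following Python sketch runs a direct-form MDCT and IMDCT over two overlapping blocks and checks that overlap-add recovers the shared frame X. The sine window, the 2/N scaling of the inverse transform, the toy frame length, and the function names are assumptions chosen for illustration; the patent only refers to an existing MDCT technology.

```python
import math

def sine_window(n_frame):
    """Window spanning two frames; its four quarters play the role of w1, w2, w3, w4."""
    length = 2 * n_frame
    return [math.sin(math.pi * (t + 0.5) / length) for t in range(length)]

def mdct(block):
    n = len(block) // 2
    return [sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for t in range(2 * n)) for k in range(n)]

def imdct(coeffs):
    n = len(coeffs)
    # 2/n is the scaling that pairs with a power-complementary analysis/synthesis window.
    return [2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                          for k in range(n)) for t in range(2 * n)]

def analyze_synthesize(first_frame, second_frame, window):
    """Window, MDCT, IMDCT, window again; the result still contains time aliasing."""
    block = [w * s for w, s in zip(window, first_frame + second_frame)]
    return [w * s for w, s in zip(window, imdct(mdct(block)))]

n_frame = 8
window = sine_window(n_frame)
P = [math.sin(0.3 * t) for t in range(n_frame)]              # previous frame 411
X = [math.cos(0.7 * t) + 0.1 * t for t in range(n_frame)]    # current frame 412
Y = [0.5 * math.sin(1.1 * t) for t in range(n_frame)]        # subsequent frame 413

out_prev = analyze_synthesize(P, X, window)  # block covering P and X
out_cur = analyze_synthesize(X, Y, window)   # block covering X and Y

# TDAC: the aliasing in the two overlapping halves cancels when they are added.
reconstructed = [out_prev[n_frame + t] + out_cur[t] for t in range(n_frame)]
tdac_error = max(abs(a - b) for a, b in zip(reconstructed, X))

# Without the previous block, the front half of out_cur alone does not equal X.
aliased_error = max(abs(a - b) for a, b in zip(out_cur[:n_frame], X))
```

The second measurement is the situation that arises when the previous frame was coded by CELP: no overlapping MDCT block exists for the current frame, so the aliasing must be removed by other means.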
When the first encoding module is different from the second encoding module, the second speech encoder 310 may encode the input signal to a CELP structure. Here, the second speech encoder 310 may receive the previous module. When the previous frame 411 has performed a CELP operation, the second speech encoder 310 may encode signal x1 to output the bitstream, and may input the bitstream into the multiplexer 340. When the previous frame 411 has performed the CELP operation, the second speech encoder 310 may be consecutively connected to the previous frame 411 and thus perform the encoding operation without initialization. When the previous frame 411 has performed the MDCT operation, the second speech encoder 310 may not perform any operation.
When the first encoding module is different from the second encoding module, the second audio encoder 320 may encode the input signal through the MDCT operation. Here, the second audio encoder 320 may receive the previous module. When the previous frame 411 has performed the CELP operation, the second audio encoder 320 may encode the input signal using any one of the following first through third schemes. The first scheme may encode the input signal according to the existing MDCT operation. The second scheme may modify the input signal to be x1=0, and encode the result using a scheme according to the existing MDCT operation. The third scheme may calculate a zero input response x3 430 with respect to an LPC filter obtained after the second speech encoder 310 terminates the encoding operation of signal x1, and may modify signal x2 according to x2=x2−x3 and modify the input signal based on x1=0, and encode the result according to the existing MDCT operation. A signal restoration operation of an audio decoding module (not shown) may be determined depending on a scheme adopted by the second audio encoder 320. When the previous frame has performed the MDCT operation, the second audio encoder 320 may not perform any operation.
For the above encoding operation, the second audio encoder 320 may include a zero input response calculator (not shown) to calculate a zero input response with respect to an LPC filter after terminating an encoding operation of the second speech encoder 310, a first converter (not shown) to convert, to zero, an input signal corresponding to a front ½ sample of the first frame, and a second converter (not shown) to subtract the zero input response from an input signal corresponding to a rear ½ sample of the first frame. The second audio encoder 320 may encode a converted signal of the first converter and a converted signal of the second converter.
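The input modification performed before the MDCT in the three schemes can be sketched in Python as below. The function names, the predictor sign convention y[n] = Σ a[k]·y[n−k] for the LPC synthesis filter, and the way the filter memory is passed in are assumptions for illustration; the actual filter state would come from the second speech encoder 310.

```python
def zero_input_response(lpc_coeffs, filter_memory, length):
    """Output of the LPC synthesis filter with zero excitation.

    filter_memory holds the most recent filter outputs, newest last, and must
    contain at least len(lpc_coeffs) samples.
    """
    history = list(filter_memory)
    response = []
    for _ in range(length):
        y = sum(a * history[-k] for k, a in enumerate(lpc_coeffs, start=1))
        response.append(y)
        history.append(y)
    return response

def prepare_second_audio_input(x1, x2, scheme, zir=None):
    """Return the (front half, rear half) pair that is handed to the MDCT."""
    if scheme == 1:
        return list(x1), list(x2)          # unchanged input
    if scheme == 2:
        return [0.0] * len(x1), list(x2)   # x1 = 0
    if scheme == 3:
        if zir is None:
            raise ValueError("scheme 3 needs the zero input response x3")
        # x1 = 0 and x2 = x2 - x3
        return [0.0] * len(x1), [s - z for s, z in zip(x2, zir)]
    raise ValueError("scheme must be 1, 2, or 3")

x3 = zero_input_response([0.5], [1.0], 3)
front, rear = prepare_second_audio_input([0.2, 0.4, 0.6], [1.0, 1.0, 1.0], 3, zir=x3)
```

Whichever scheme the encoder uses determines what the decoder has to undo, which is why the signal restoration operation depends on the scheme adopted by the second audio encoder 320.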
The multiplexer 340 may select one of an output of the first audio encoder 330, an output of the second speech encoder 310, and an output of the second audio encoder 320 to generate an output bitstream. Here, the multiplexer 340 may combine bitstreams to generate a final bitstream. When the previous frame performed the MDCT operation, the final bitstream may be the same as the output bitstream of the first audio encoder 330.
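Taken together, the descriptions of FIGS. 2 and 3 imply a small decision table: which sub-encoders run depends only on the current module and the previous module. A Python sketch of that table follows. The string labels and the function name are hypothetical, and the handling of the very first frame (where no previous module exists) is left out because the text does not describe it.

```python
SPEECH, AUDIO = "speech", "audio"

def active_encoders(current_module, previous_module):
    """List the units that operate for the current frame."""
    for module in (current_module, previous_module):
        if module not in (SPEECH, AUDIO):
            raise ValueError(f"unknown module: {module!r}")
    if current_module == SPEECH:
        if previous_module == SPEECH:
            # Consecutive CELP frames: continue with the internal state.
            return ["first_speech_encoder_220"]
        # MDCT -> CELP: derive an initial value from the previous input first.
        return ["encoding_initialization_unit_210", "first_speech_encoder_220"]
    if previous_module == AUDIO:
        # Consecutive MDCT frames: normal MDCT encoding.
        return ["first_audio_encoder_330"]
    # CELP -> MDCT: CELP-code the front half, then MDCT-code the frame.
    return ["second_speech_encoder_310", "second_audio_encoder_320"]

transition = active_encoders(AUDIO, SPEECH)
steady_audio = active_encoders(AUDIO, AUDIO)
```

In the CELP-to-MDCT case the multiplexer 340 would combine both partial bitstreams, whereas in the steady MDCT case it would pass the first audio encoder's bitstream through unchanged.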
Referring again to FIG. 1, the bitstream generation unit 150 may combine the module ID of the selected first encoding module and the bitstream of the selected first encoding module to generate the output bitstream. The bitstream generation unit 150 may combine the module ID and a bitstream corresponding to the module ID to thereby generate the final bitstream.
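As a purely illustrative example of combining a module ID with a module bitstream, the Python sketch below prefixes each frame's payload with a one-byte ID and shows the matching split on the receiving side. The one-byte layout, the numeric ID values, and the function names are hypothetical; the patent does not define the bitstream syntax in this text.

```python
MODULE_IDS = {"speech": 0, "audio": 1}  # hypothetical numeric IDs
MODULE_NAMES = {value: name for name, value in MODULE_IDS.items()}

def pack_frame(module_name, payload):
    """Prefix the module's bitstream with its ID (hypothetical one-byte header)."""
    return bytes([MODULE_IDS[module_name]]) + bytes(payload)

def unpack_frame(frame):
    """Split a packed frame back into (module name, module bitstream)."""
    return MODULE_NAMES[frame[0]], frame[1:]

packed = pack_frame("audio", b"\x10\x20\x30")
module_name, payload = unpack_frame(packed)
```

Carrying the ID in every frame is what lets a decoder route each frame to the matching decoding module and detect a module change by comparing against the previous frame's ID.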
FIG. 5 is a block diagram illustrating a decoding apparatus 500 for integrally decoding a speech signal and an audio signal according to an embodiment of the present invention.
Referring to FIG. 5, the decoding apparatus 500 may include a module selection unit 510, a speech decoding unit 530, an audio decoding unit 540, and an output generation unit 550. Also, the decoding apparatus 500 may further include a module buffer 520 and an output buffer 560.
The module selection unit 510 may analyze a characteristic of an input bitstream to select a first decoding module for decoding a first frame of the input bitstream. Specifically, the module selection unit 510 may analyze module information transmitted in the input bitstream to output a module ID and to transfer the input bitstream to a corresponding decoding module.
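The pairing of a module ID with its bitstream on the encoder side, and the corresponding routing step in the module selection unit 510, may be sketched as follows. The one-byte header layout, the numeric ID values, and the function names `build_frame` and `select_module` are hypothetical; the specification does not define a bitstream syntax.

```python
SPEECH_MODULE = 0  # hypothetical ID for the CELP-based speech module
AUDIO_MODULE = 1   # hypothetical ID for the MDCT-based audio module


def build_frame(module_id, payload):
    """Encoder side: prepend the module ID to the selected module's bitstream."""
    if module_id not in (SPEECH_MODULE, AUDIO_MODULE):
        raise ValueError("unknown module ID")
    return bytes([module_id]) + bytes(payload)


def select_module(frame):
    """Decoder side: read the module ID and return it with the remaining payload."""
    if not frame:
        raise ValueError("empty frame")
    module_id = frame[0]
    if module_id not in (SPEECH_MODULE, AUDIO_MODULE):
        raise ValueError("unknown module ID")
    return module_id, frame[1:]
```

A decoder built on this sketch would pass the returned payload to the speech decoding unit or the audio decoding unit according to the returned module ID.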
The speech decoding unit 530 may decode the input bitstream according to a selection of the module selection unit 510 to generate a speech signal. Specifically, the speech decoding unit 530 may perform a CELP-based speech decoding operation. Hereinafter, the speech decoding unit 530 will be further described in detail with reference to FIG. 6.
FIG. 6 is a block diagram illustrating an example of the speech decoding unit 530 of FIG. 5.
Referring to FIG. 6, the speech decoding unit 530 may include a decoding initialization unit 610 and a first speech decoder 620.
When the first decoding module is different from the second decoding module, the decoding initialization unit 610 may determine an initial value for decoding of the first speech decoder 620. Specifically, the decoding initialization unit 610 may receive a previous module. Only when a previous frame has performed an MDCT operation may the decoding initialization unit 610 determine the initial value to be provided for the first speech decoder 620. Here, the decoding initialization unit 610 may include an LPC analyzer 611, an LSP converter 612, an LPC residual signal calculator 613, and a decoding initial value decision unit 614.
The LPC analyzer 611 may calculate an LPC coefficient with respect to the previous output signal. Specifically, the LPC analyzer 611 may receive the previous output signal to perform an LPC analysis using the same scheme as the first speech decoder 620 and thereby calculate and output an LPC coefficient corresponding to the previous output signal.
The LSP converter 612 may convert the calculated LPC coefficient to an LSP value.
The LPC residual signal calculator 613 may calculate an LPC residual signal using the previous output signal and the LPC coefficient.
The decoding initial value decision unit 614 may determine the initial value for decoding of the first speech decoder 620 using the LPC coefficient, the LSP value, and the LPC residual signal. Specifically, the decoding initial value decision unit 614 may determine and output the initial value in a form, required by the first speech decoder 620, using the LPC coefficient, the LSP value, the LPC residual signal, and the like.
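The LPC analysis and residual calculation performed on the previous output signal may be sketched as follows. The autocorrelation method with the Levinson-Durbin recursion is used here only as one common way to obtain LPC coefficients; the specification states merely that the analysis uses the same scheme as the first speech decoder 620. The function names are hypothetical, and the LSP conversion of the LSP converter 612 is omitted.

```python
def autocorrelation(signal, max_lag):
    """Autocorrelation values r[0..max_lag] of the signal."""
    return [sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] with x_hat[n] = sum(a[k] * x[n-k])."""
    coeffs = [0.0] * order
    error = r[0]
    for i in range(order):
        if error <= 0.0:
            break  # degenerate input: leave the remaining coefficients at zero
        acc = r[i + 1] - sum(coeffs[j] * r[i - j] for j in range(i))
        k = acc / error  # reflection coefficient of this stage
        updated = coeffs[:]
        updated[i] = k
        for j in range(i):
            updated[j] = coeffs[j] - k * coeffs[i - 1 - j]
        coeffs = updated
        error *= 1.0 - k * k
    return coeffs


def lpc_residual(signal, coeffs):
    """Prediction error e[n] = x[n] - sum(a[k] * x[n-k]), assuming zero history."""
    residual = []
    for n, sample in enumerate(signal):
        prediction = sum(a * signal[n - k]
                         for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        residual.append(sample - prediction)
    return residual
```

In this sketch the coefficients and the residual would then be arranged into whatever initial state the CELP decoder expects, which is the role of the decoding initial value decision unit 614.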
When the first decoding module is identical to the second decoding module, the first speech decoder 620 may decode the input bitstream to a CELP structure. Here, when the first decoding module is identical to the second decoding module, the first speech decoder 620 may decode the input bitstream using an internal initial value of the first speech decoder 620. When the first decoding module is different from the second decoding module, the first speech decoder 620 may decode the input bitstream using an initial value that is determined by the decoding initialization unit 610. Specifically, the first speech decoder 620 may receive a previous module having performed decoding for a previous frame one frame prior to a current frame. When the previous frame has performed a CELP operation, the first speech decoder 620 may decode the input bitstream corresponding to the current frame using a CELP scheme. In this case, the first speech decoder 620 may perform a consecutive CELP operation and thus continue with a decoding operation using internally provided previous information to generate an output signal. When the previous frame has performed an MDCT operation, the first speech decoder 620 may erase all the previous information for CELP decoding, and perform the decoding operation using the initial value, provided from the decoding initialization unit 610, to generate the output signal.
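The choice between the internal state and the externally supplied initial value may be sketched as follows. The class name `SpeechDecoderMemory`, the string labels for the modules, and the dictionary used to hold the decoder state are hypothetical and serve only to illustrate the switching rule.

```python
CELP = "CELP"
MDCT = "MDCT"


class SpeechDecoderMemory:
    """Holds the CELP decoder history across frames."""

    def __init__(self, internal_default):
        self.state = dict(internal_default)

    def prepare(self, previous_module, initial_value=None):
        """Return the state from which the current frame is decoded."""
        if previous_module == CELP:
            # Consecutive CELP frames: continue with the internal history.
            return self.state
        if initial_value is None:
            raise ValueError("an initial value is required after an MDCT frame")
        # Previous frame used MDCT: erase the history and load the value
        # supplied by the decoding initialization unit.
        self.state = dict(initial_value)
        return self.state
```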
Referring again to FIG. 5, the audio decoding unit 540 may decode the input bitstream according to the selection of the module selection unit 510 to generate an audio signal. Hereinafter, the audio decoding unit 540 will be further described in detail with reference to FIGS. 7 and 8.
FIG. 7 is a block diagram illustrating an example of the audio decoding unit 540 of FIG. 5.
Referring to FIG. 7, the audio decoding unit 540 may include a second speech decoder 710, a second audio decoder 720, a first audio decoder 730, a signal restoration unit 740, and an output selector 750.
When the first decoding module is identical to the second decoding module, the first audio decoder 730 may decode the input bitstream through an Inverse MDCT (IMDCT) operation. Specifically, the first audio decoder 730 may receive a previous module. When a previous frame has performed the IMDCT operation, the first audio decoder 730 may decode an input bitstream corresponding to the current frame using the IMDCT operation to thereby generate an output signal. Specifically, the first audio decoder 730 may receive an input bitstream of the current frame, perform the IMDCT operation according to an existing technology, apply a window to thereby perform a time-domain aliasing cancellation (TDAC) operation, and output a final output signal. When the previous frame performs a CELP operation, the first audio decoder 730 may not perform any operation.
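The IMDCT, windowing, and overlap-add steps that yield time-domain aliasing cancellation may be sketched as follows. The sine window, the 2/N scaling of the inverse transform, and all function names are assumptions chosen for this example; the specification refers only to an existing technology and does not fix these details.

```python
import math


def sine_window(half):
    """Sine window of length 2*half; satisfies w[n]^2 + w[n+half]^2 = 1."""
    size = 2 * half
    return [math.sin(math.pi * (n + 0.5) / size) for n in range(size)]


def _kernel(n, k, half):
    return math.cos(math.pi / half * (n + 0.5 + half / 2.0) * (k + 0.5))


def mdct(block):
    """Forward MDCT: 2*half input samples to half coefficients."""
    half = len(block) // 2
    return [sum(block[n] * _kernel(n, k, half) for n in range(2 * half))
            for k in range(half)]


def imdct(coeffs):
    """Inverse MDCT: half coefficients to 2*half aliased samples."""
    half = len(coeffs)
    return [(2.0 / half) * sum(coeffs[k] * _kernel(n, k, half) for k in range(half))
            for n in range(2 * half)]


def analyze(block, window):
    """Encoder side: window the block, then transform it."""
    return mdct([w * s for w, s in zip(window, block)])


def synthesize(coeffs, window):
    """Decoder side: inverse transform, then apply the window."""
    return [w * y for w, y in zip(window, imdct(coeffs))]


def overlap_add(previous_block_out, current_block_out):
    """TDAC: add the rear half of the previous block to the front half of the current one."""
    half = len(current_block_out) // 2
    return [p + c for p, c in zip(previous_block_out[half:], current_block_out[:half])]
```

In this sketch a single synthesized block still contains aliasing; the aliasing cancels only when two consecutive overlapping blocks are added, which is why an MDCT frame that follows a CELP frame lacks the information needed for the normal TDAC operation.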
Referring to FIG. 8, when the first decoding module is different from the second decoding module, the second speech decoder 710 may decode the input bitstream to a CELP structure. Specifically, the second speech decoder 710 may receive the previous module. When the previous frame has performed the CELP operation, the second speech decoder 710 may decode the input bitstream according to an existing speech decoding scheme to generate an output signal. Here, the output signal of the second speech decoder 710 may be x4 820 and have a ½ frame length. Since the previous frame has performed the CELP operation, the second speech decoder 710 may be consecutively connected to the previous frame and thus perform the decoding operation without initialization.
When the first decoding module is different from the second decoding module, the second audio decoder 720 may decode the input bitstream through the IMDCT operation. Here, after the IMDCT operation, the second audio decoder 720 may apply only a window and obtain an output signal without performing the TDAC operation. Also, in FIG. 8, ab 830 may denote the output signal of the second audio decoder 720. a and b may be defined as signals having a ½ frame length.
The signal restoration unit 740 may calculate a final output from an output of the second speech decoder 710 and an output of the second audio decoder 720. Also, the signal restoration unit 740 may obtain a final output signal of the current frame and define the output signal as gh 850 as shown in FIG. 8. Here, g and h may be defined as signals having a ½ frame length. The signal restoration unit 740 may define g=x4 at all times and decode signal h using one of the following schemes according to an operation of the second audio encoder. A first scheme may obtain h according to the following Equation 1. Here, a general window operation is assumed. In the following Equation 1, R denotes time-axis rotating a signal based on a ½ frame length.
$$h = \frac{b + w_2 \, w_1^R \, x_4^R}{w_2 \, w_2} \qquad \text{[Equation 1]}$$
wherein h denotes the output signal corresponding to a rear ½ sample of the first frame, b denotes an output signal of the second audio decoder 720, x4 denotes an output signal of the second speech decoder 710, w1 and w2 denote windows, $w_1^R$ denotes a signal that is generated by performing a time-axis rotation for w1 based on a ½ frame length, and $x_4^R$ denotes a signal that is generated by performing the time-axis rotation for x4 based on a ½ frame length.
A second scheme may obtain h according to the following Equation 2:
$$h = \frac{b}{w_2 \, w_2} \qquad \text{[Equation 2]}$$
where h denotes the output signal corresponding to the rear ½ sample of the first frame, b denotes the output signal of the second audio decoder 720, and w2 denotes a window.
A third scheme may obtain h according to the following Equation 3:
$$h = \frac{b}{w_2 \, w_2} + x_5 \qquad \text{[Equation 3]}$$
where h denotes the output signal corresponding to the rear ½ sample of the first frame, b denotes the output signal of the second audio decoder 720, w2 denotes a window, and x5 840 denotes a zero input response with respect to an LPC filter after decoding the output signal of the second speech decoder 710.
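The three restoration schemes may be sketched as sample-wise operations on half-frame signals, as follows. The function name `restore_rear_half` is hypothetical, the operation R is interpreted here as a time reversal within the ½ frame (an assumption about the meaning of the time-axis rotation), and every sample of w2 is assumed to be nonzero so that the division is defined.

```python
def restore_rear_half(b, w2, scheme, w1=None, x4=None, x5=None):
    """Recover the rear half-frame signal h from the windowed IMDCT output b.

    b, w1, w2, x4 and x5 are lists of equal (half-frame) length.
    """
    if scheme == 1:
        # Equation 1: h = (b + w2 * w1^R * x4^R) / (w2 * w2)
        w1_r = list(reversed(w1))
        x4_r = list(reversed(x4))
        return [(bi + wi * w1i * x4i) / (wi * wi)
                for bi, wi, w1i, x4i in zip(b, w2, w1_r, x4_r)]
    if scheme == 2:
        # Equation 2: h = b / (w2 * w2)
        return [bi / (wi * wi) for bi, wi in zip(b, w2)]
    if scheme == 3:
        # Equation 3: h = b / (w2 * w2) + x5
        return [bi / (wi * wi) + x5i for bi, wi, x5i in zip(b, w2, x5)]
    raise ValueError("scheme must be 1, 2, or 3")
```

Under the further assumption that the windowed decoder output of the rear half equals w2·(w2·x2 − w1^R·x1^R), Equation 1 returns x2 exactly when x4 equals x1; the second and third schemes correspond to the encoder having set x1 = 0, with or without subtraction of the zero input response.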
When the previous frame has performed the MDCT operation, the second speech decoder 710, the second audio decoder 720, and the signal restoration unit 740 may not perform any operation.
The output selector 750 may select and output one of an output of the signal restoration unit 740 and an output of the first audio decoder 730.
Referring again to FIG. 5, the output generation unit 550 may select one of the speech signal of the speech decoding unit 530 and the audio signal of the audio decoding unit 540 according to the selection of the module selection unit 510 to generate the output signal. Specifically, the output generation unit 550 may select the output signal according to the module ID to output the selected output signal as the final output signal.
The module buffer 520 may store a module ID of the selected first decoding module, and transmit information of a second decoding module corresponding to a previous frame of the first frame to the speech decoding unit 530 and the audio decoding unit 540. Specifically, the module buffer 520 may store the module ID to output a previous module corresponding to a previous module ID that is one frame prior to a current frame.
The output buffer 560 may store the output signal and output a previous output signal that is an output signal of the previous frame.
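Both buffers behave as a one-frame delay: each stores the value of the current frame and returns the value stored for the previous frame. A minimal sketch, in which the class name `OneFrameDelay` and the use of `None` for "no previous frame" are assumptions, may look as follows.

```python
class OneFrameDelay:
    """Stores one value per frame and returns the value of the previous frame."""

    def __init__(self, initial=None):
        self._stored = initial

    def push(self, value):
        """Store the current frame's value and return the previous frame's value."""
        previous = self._stored
        self._stored = value
        return previous
```

In this sketch one instance would hold module IDs (module buffer 520) and another would hold decoded output signals (output buffer 560).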
FIG. 9 is a flowchart illustrating an encoding method of integrally encoding a speech signal and an audio signal according to an embodiment of the present invention.
Referring to FIG. 9, in operation 910, the encoding method may analyze an input signal to determine a module type of an encoding module for encoding a current frame, and buffer the input signal to prepare a previous frame input signal, and may store a module type of the current frame to prepare a module type of a previous frame.
In operation 920, the encoding method may determine whether the determined module type is a speech module or an audio module.
When the determined module type is the speech module in operation 920, the encoding method may determine whether the module type is changed in operation 930.
When the module type is not changed in operation 930, the encoding method may perform a CELP encoding operation according to an existing technology in operation 950. Conversely, when the module type is changed in operation 930, the encoding method may perform an initialization according to an operation of the encoding initialization module to determine an initial value, and perform the CELP encoding operation using the initial value in operation 960.
When the determined module type is the audio module in operation 920, the encoding method may determine whether the module type is changed in operation 940.
When the module type is changed in operation 940, the encoding method may perform an additional encoding process in operation 970. During the additional encoding process, the encoding method may perform a CELP-based encoding for an input signal corresponding to a ½ frame length and perform a second audio encoding operation for the entire frame length. Conversely, when the module type is not changed in operation 940, the encoding method may perform an MDCT-based encoding operation according to an existing technology in operation 980.
In operation 990, the encoding method may select and output a final bitstream according to the module type and depending on whether the module type is changed.
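The branching of FIG. 9 may be summarized as a small lookup that maps the current and previous module types to the operation number described above. The string labels and the function name `encoding_operation` are hypothetical, and a first frame without a previous module type is treated here as a change of module, which is an assumption.

```python
SPEECH = "speech"
AUDIO = "audio"


def encoding_operation(module_type, previous_type):
    """Return the FIG. 9 operation number that encodes the current frame."""
    changed = module_type != previous_type
    if module_type == SPEECH:
        # 950: ordinary CELP encoding; 960: initialization followed by CELP encoding.
        return 960 if changed else 950
    if module_type == AUDIO:
        # 970: additional encoding process; 980: ordinary MDCT-based encoding.
        return 970 if changed else 980
    raise ValueError("unknown module type")
```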
FIG. 10 is a flowchart illustrating a decoding method of integrally decoding a speech signal and an audio signal according to an embodiment of the present invention.
Referring to FIG. 10, in operation 1001, the decoding method may determine a module type of a decoding module of a current frame based on input bitstream information to prepare a previous frame output signal, and store the module type of the current frame to prepare a module type of a previous frame.
In operation 1002, the decoding method may determine whether the determined module type is a speech module or an audio module.
When the determined module type is the speech module in operation 1002, the decoding method may determine whether the module type is changed in operation 1003.
When the module type is not changed in operation 1003, the decoding method may perform a CELP decoding operation according to an existing technology in operation 1005. Conversely, when the module type is changed in operation 1003, the decoding method may perform an initialization according to an operation of the decoding initialization module to obtain an initial value, and perform the CELP decoding operation using the initial value in operation 1006.
When the determined module type is the audio module in operation 1002, the decoding method may determine whether the module type is changed in operation 1004.
When the module type is changed in operation 1004, the decoding method may perform an additional decoding process in operation 1007. During the additional decoding process, the decoding method may perform a CELP-based decoding for the input bitstream to obtain an output signal corresponding to a ½ frame length, and perform a second audio decoding operation for the input bitstream.
Conversely, when the module type is not changed in operation 1004, the decoding method may perform an MDCT-based decoding operation according to an existing technology in operation 1008.
In operation 1009, the decoding method may perform a signal restoration operation to obtain an output signal. In operation 1010, the decoding method may select and output a final signal according to the module type and depending on whether the module type is changed.
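The branching of FIG. 10 may be sketched in the same manner. The string labels and the function name `decoding_operations` are hypothetical. The sketch assumes that the signal restoration of operation 1009 follows only the additional decoding process of operation 1007, which is consistent with the signal restoration unit performing no operation when the previous frame used the MDCT operation.

```python
SPEECH = "speech"
AUDIO = "audio"


def decoding_operations(module_type, previous_type):
    """Return the sequence of FIG. 10 operation numbers for the current frame."""
    changed = module_type != previous_type
    if module_type == SPEECH:
        # 1005: ordinary CELP decoding; 1006: initialization followed by CELP decoding.
        steps = [1006 if changed else 1005]
    elif module_type == AUDIO:
        # 1007 + 1009: additional decoding and signal restoration; 1008: ordinary MDCT decoding.
        steps = [1007, 1009] if changed else [1008]
    else:
        raise ValueError("unknown module type")
    steps.append(1010)  # select and output the final signal
    return steps
```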
As described above, according to embodiments of the present invention, there may be provided an apparatus and method for integrally encoding and decoding a speech signal and an audio signal that may unify a speech codec module and an audio codec module, selectively apply a codec module according to a characteristic of an input signal, and thereby may enhance performance.
Also, according to embodiments of the present invention, when a selected codec module is changed over time, information associated with a previous module may be used. Through this, it is possible to solve distortion occurring due to a discontinuous module operation. In addition, when previous module information for overlapping is not provided from an MDCT module demanding a TDAC operation, an additional scheme may be adopted. Accordingly, the TDAC operation may be enabled to thereby perform a normal MDCT-based codec operation.
Although a few embodiments of the present invention have been shown and described, the present invention is not limited to the described embodiments. Instead, it would be appreciated by those skilled in the art that changes may be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (15)

The invention claimed is:
1. An encoding apparatus for integrally encoding a speech signal and an audio signal, the encoding apparatus comprising:
a module selection unit to analyze a characteristic of an input signal and to select a first encoding module for encoding a current frame of the input signal;
a speech encoding unit to encode the input signal according to a selection of the module selection unit and to generate a speech bitstream;
an audio encoding unit to encode the input signal according to the selection of the module selection unit and to generate an audio bitstream;
a module buffer to transmit information of a second encoding module corresponding to a previous frame of the current frame to the speech encoding unit and the audio encoding unit; and
a bitstream generation unit to generate an output bitstream from the speech encoding unit or the audio encoding unit according to the selection of the module selection unit,
wherein, when an overlap operation between the previous frame and the current frame occurs, the speech encoding unit encodes a half sample of the previous frame having a speech characteristic as additional information to decode a current frame having an audio characteristic according to MDCT(Modified Discrete Cosine Transform) at a decoding apparatus,
wherein the bitstream generation unit generates the output bitstream including module information for the current frame selected by the module selection unit, the speech bitstream generated from the speech encoding unit and the audio bitstream generated from the audio encoding unit.
2. The encoding apparatus of claim 1, wherein the module selection unit extracts the module information of the selected first encoding module and transmits the module information to the bitstream generation unit.
3. The encoding apparatus of claim 1, wherein the speech encoding unit comprises:
a first speech encoder to encode the input signal to a Code Excitation Linear Prediction (CELP) structure when the first encoding module is identical to the second encoding module; and
an encoding initialization unit to determine an initial value for encoding of the first speech encoder when the first encoding module is different from the second encoding module.
4. The encoding apparatus of claim 3, wherein:
when the first encoding module is identical to the second encoding module, the first speech encoder encodes the input signal using an internal initial value of the first speech encoder, and
when the first encoding module is different from the second encoding module, the first speech encoder encodes the input signal using an initial value that is determined by the encoding initialization unit.
5. The encoding apparatus of claim 3, wherein the encoding initialization unit comprises:
a Linear Predictive Coder (LPC) analyzer to calculate an LPC coefficient with respect to the previous input signal;
a Linear Spectrum Pair (LSP) converter to convert the calculated LPC coefficient to an LSP value;
an LPC residual signal calculator to calculate an LPC residual signal using the previous input signal and the LPC coefficient; and
an encoding initial value decision unit to determine the initial value for encoding of the first speech encoder using the LPC coefficient, the LSP value, and the LPC residual signal.
6. The encoding apparatus of claim 1, wherein the audio encoding unit comprises:
a first audio encoder to encode the input signal through a Modified Discrete Cosine Transform (MDCT) operation when the first encoding module is identical to the second encoding module;
a second speech encoder to encode the input signal to a CELP structure when the first encoding module is different from the second encoding module;
a second audio encoder to encode the input signal through the MDCT operation when the first encoding module is different from the second encoding module; and
a multiplexer to select one of an output of the first audio encoder, an output of the second speech encoder, and an output of the second audio encoder to generate the output bitstream.
7. The encoding apparatus of claim 6, wherein, when the first encoding module is different from the second encoding module, the second speech encoder encodes an input signal corresponding to a front half sample of the current frame.
8. The encoding apparatus of claim 6, wherein the second audio encoder comprises:
a zero input response calculator to calculate a zero input response with respect to an LPC filter after terminating an encoding operation of the second speech encoder;
a first converter to convert, to zero, an input signal corresponding to a front ½ sample of the current frame; and
a second converter to subtract the zero input response from an input signal corresponding to a rear half sample of the current frame, wherein
the second audio encoder encodes a converted signal of the first converter and a converted signal of the second converter.
9. A decoding apparatus for integrally decoding a speech signal and an audio signal, the decoding apparatus comprising:
a module selection unit to analyze a characteristic of an input bitstream and to select a first decoding module for decoding a current frame of the input bitstream;
a speech decoding unit to decode the input bitstream according to a selection of the module selection unit and to generate a speech signal;
an audio decoding unit to decode the input bitstream according to the selection of the module selection unit and to generate an audio signal;
a module buffer to transmit information of a second decoding module corresponding to a previous frame of the current frame to the speech decoding unit and the audio decoding unit; and
an output generation unit to select one of the speech signal of the speech decoding unit and the audio signal of the audio signal according to the selection of the module selection unit and to output an output signal,
wherein the speech decoding unit decodes a half sample of a previous frame having a speech characteristic as additional information,
wherein, when an overlap operation between the previous frame and the current frame occurs, the audio decoding unit decodes a current frame according to MDCT(Modified Discrete Cosine Transform) by compensating the current frame based on the additional information.
10. The decoding apparatus of claim 9, wherein the speech decoding unit comprises:
a first speech decoder to decode the input stream to a CELP structure when the first decoding module is identical to the second decoding module; and
a decoding initialization unit to determine an initial value for decoding of the first speech decoder when the first decoding module is different from the second decoding module.
11. The decoding apparatus of claim 10, wherein:
when the first decoding module is identical to the second decoding module, the first speech decoder decodes the input bitstream using an internal initial value of the first speech decoder, and
when the first decoding module is different from the second decoding module, the first speech decoder decodes the input bitstream using an initial value that is determined by the decoding initialization unit.
12. The decoding apparatus of claim 9, wherein the decoding initialization unit comprises:
an LPC analyzer to calculate an LPC coefficient with respect to the previous output signal;
an LSP converter to convert the calculated LPC coefficient to an LSP value;
an LPC residual signal calculator to calculate an LPC residual signal using the previous output signal and the LPC coefficient; and
a decoding initial value decision unit to determine the initial value for decoding of the first speech decoder using the LPC coefficient, the LSP value, and the LPC residual signal.
13. The decoding apparatus of claim 9, wherein the audio decoding unit comprises:
a first audio decoder to decode the input bitstream through an Inverse MDCT (IMDCT) operation when the first decoding module is identical to the second decoding module;
a second speech decoder to decode the input bitstream to a CELP structure when the first decoding module is different from the second decoding module;
a second audio decoder to decode the input bitstream through the IMDCT operation when the first decoding module is different from the second decoding module; and
a signal restoration unit to calculate a final output from an output of the second speech decoder and an output of the second audio decoder; and
an output selector to select and output one of an output of the signal restoration unit and an output of the first audio decoder.
14. The decoding apparatus of claim 13, wherein, when the first decoding module is different from the second decoding module, the second speech decoder decodes an input bitstream corresponding to a front half sample of the current frame to output an input signal.
15. The decoding apparatus of claim 13, wherein the signal restoration unit determines the output of the second speech decoder as an output signal corresponding to a front half sample of the current frame.

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
KR20080068370 2008-07-14
KR10-2008-0068370 2008-07-14
KR10-2009-0061607 2009-07-07
KR1020090061607A KR20100007738A (en) 2008-07-14 2009-07-07 Apparatus for encoding and decoding of integrated voice and music
PCT/KR2009/003854 WO2010008175A2 (en) 2008-07-14 2009-07-14 Apparatus for encoding and decoding of integrated speech and audio

Publications (2)

Publication Number Publication Date
US20110119054A1 US20110119054A1 (en) 2011-05-19
US8959015B2 true US8959015B2 (en) 2015-02-17

Family

ID=41816650

Family Applications (1)

Application Number Title Priority Date Filing Date
US13/054,377 Active 2031-09-16 US8959015B2 (en) 2008-07-14 2009-07-14 Apparatus for encoding and decoding of integrated speech and audio

Country Status (6)

Country Link
US (1) US8959015B2 (en)
EP (2) EP2302623B1 (en)
JP (1) JP2011528134A (en)
KR (1) KR20100007738A (en)
CN (1) CN102150205B (en)
WO (1) WO2010008175A2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130124215A1 (en) * 2010-07-08 2013-05-16 Fraunhofer-Gesellschaft Zur Foerderung der angewanen Forschung e.V. Coder using forward aliasing cancellation
US11276413B2 (en) 2018-10-26 2022-03-15 Electronics And Telecommunications Research Institute Audio signal encoding method and audio signal decoding method, and encoder and decoder performing the same

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ES2558229T3 (en) 2008-07-11 2016-02-02 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio encoder and decoder for encoding frames of sampled audio signals
US9767822B2 (en) * 2011-02-07 2017-09-19 Qualcomm Incorporated Devices for encoding and decoding a watermarked signal
CN102779518B (en) * 2012-07-27 2014-08-06 深圳广晟信源技术有限公司 Coding method and system for dual-core coding mode
KR101383915B1 (en) * 2013-03-21 2014-04-17 한국전자통신연구원 A digital audio receiver having united speech and audio decoder
WO2014148851A1 (en) * 2013-03-21 2014-09-25 전자부품연구원 Digital audio transmission system and digital audio receiver provided with united speech and audio decoder
IL294836B1 (en) * 2013-04-05 2024-06-01 Dolby Int Ab Audio encoder and decoder
WO2015115798A1 (en) * 2014-01-29 2015-08-06 Samsung Electronics Co., Ltd. User terminal device and secured communication method thereof
KR102092756B1 (en) * 2014-01-29 2020-03-24 삼성전자주식회사 User terminal Device and Method for secured communication therof
EP2980795A1 (en) 2014-07-28 2016-02-03 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio encoding and decoding using a frequency domain processor, a time domain processor and a cross processor for initialization of the time domain processor
EP2980797A1 (en) * 2014-07-28 2016-02-03 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio decoder, method and computer program using a zero-input-response to obtain a smooth transition
EP2980796A1 (en) 2014-07-28 2016-02-03 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method and apparatus for processing an audio signal, audio decoder, and audio encoder
EP2980794A1 (en) 2014-07-28 2016-02-03 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio encoder and decoder using a frequency domain processor and a time domain processor
EP3196877A4 (en) * 2014-09-08 2018-02-28 Sony Corporation Coding device and method, decoding device and method, and program
KR20210003514A (en) 2019-07-02 2021-01-12 한국전자통신연구원 Encoding method and decoding method for high band of audio, and encoder and decoder for performing the method
KR20210003507A (en) 2019-07-02 2021-01-12 한국전자통신연구원 Method for processing residual signal for audio coding, and aduio processing apparatus

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11175098A (en) 1997-12-12 1999-07-02 Nec Corp Voice and music encoding system
US6658383B2 (en) * 2001-06-26 2003-12-02 Microsoft Corporation Method for coding speech and music signals
US20050187759A1 (en) * 2001-10-04 2005-08-25 At&T Corp. System for bandwidth extension of narrow-band speech
KR100614496B1 (en) 2003-11-13 2006-08-22 한국전자통신연구원 An apparatus for coding of variable bit-rate wideband speech and audio signals, and a method thereof
US20070106502A1 (en) 2005-11-08 2007-05-10 Junghoe Kim Adaptive time/frequency-based audio encoding and decoding apparatuses and methods
WO2007083931A1 (en) 2006-01-18 2007-07-26 Lg Electronics Inc. Apparatus and method for encoding and decoding signal
JP2007538283A (en) 2004-05-19 2007-12-27 ノキア コーポレイション Audio coder mode switching support
US20080010062A1 (en) 2006-07-08 2008-01-10 Samsung Electronics Co., Ld. Adaptive encoding and decoding methods and apparatuses
US20080027719A1 (en) 2006-07-31 2008-01-31 Venkatesh Kirshnan Systems and methods for modifying a window with a frame associated with an audio signal
WO2008045846A1 (en) 2006-10-10 2008-04-17 Qualcomm Incorporated Method and apparatus for encoding and decoding audio signals
CN101202042A (en) 2006-12-14 2008-06-18 中兴通讯股份有限公司 Expandable digital audio encoding frame and expansion method thereof
US7860709B2 (en) * 2004-05-17 2010-12-28 Nokia Corporation Audio encoding with different coding frame lengths
US7876966B2 (en) * 2003-03-11 2011-01-25 Spyder Navigations L.L.C. Switching between coding schemes
US8069034B2 (en) * 2004-05-17 2011-11-29 Nokia Corporation Method and apparatus for encoding an audio signal using multiple coders with plural selection models
US8244525B2 (en) * 2004-04-21 2012-08-14 Nokia Corporation Signal encoding a frame in a communication system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6134518A (en) * 1997-03-04 2000-10-17 International Business Machines Corporation Digital audio signal coding using a CELP coder and a transform coder
US20070147518A1 (en) * 2005-02-18 2007-06-28 Bruno Bessette Methods and devices for low-frequency emphasis during audio compression based on ACELP/TCX

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11175098A (en) 1997-12-12 1999-07-02 Nec Corp Voice and music encoding system
US6658383B2 (en) * 2001-06-26 2003-12-02 Microsoft Corporation Method for coding speech and music signals
US20050187759A1 (en) * 2001-10-04 2005-08-25 At&T Corp. System for bandwidth extension of narrow-band speech
US7876966B2 (en) * 2003-03-11 2011-01-25 Spyder Navigations L.L.C. Switching between coding schemes
KR100614496B1 (en) 2003-11-13 2006-08-22 한국전자통신연구원 An apparatus for coding of variable bit-rate wideband speech and audio signals, and a method thereof
US8244525B2 (en) * 2004-04-21 2012-08-14 Nokia Corporation Signal encoding a frame in a communication system
US8069034B2 (en) * 2004-05-17 2011-11-29 Nokia Corporation Method and apparatus for encoding an audio signal using multiple coders with plural selection models
US7860709B2 (en) * 2004-05-17 2010-12-28 Nokia Corporation Audio encoding with different coding frame lengths
US7596486B2 (en) * 2004-05-19 2009-09-29 Nokia Corporation Encoding an audio signal using different audio coder modes
JP2007538283A (en) 2004-05-19 2007-12-27 ノキア コーポレイション Audio coder mode switching support
US20070106502A1 (en) 2005-11-08 2007-05-10 Junghoe Kim Adaptive time/frequency-based audio encoding and decoding apparatuses and methods
WO2007083931A1 (en) 2006-01-18 2007-07-26 Lg Electronics Inc. Apparatus and method for encoding and decoding signal
US20080010062A1 (en) 2006-07-08 2008-01-10 Samsung Electronics Co., Ltd. Adaptive encoding and decoding methods and apparatuses
WO2008016945A2 (en) 2006-07-31 2008-02-07 Qualcomm Incorporated Systems and methods for modifying a window with a frame associated with an audio signal
US20080027719A1 (en) 2006-07-31 2008-01-31 Venkatesh Krishnan Systems and methods for modifying a window with a frame associated with an audio signal
WO2008045846A1 (en) 2006-10-10 2008-04-17 Qualcomm Incorporated Method and apparatus for encoding and decoding audio signals
CN101202042A (en) 2006-12-14 2008-06-18 中兴通讯股份有限公司 Expandable digital audio encoding frame and expansion method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jari Mäkinen et al., "AMR-WB+: A New Audio Coding Standard for 3rd Generation Mobile Audio Services", Proceedings. ICASSP 2005, IEEE, Mar. 18-23, 2005, pp. II 1109-II 1112.

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130124215A1 (en) * 2010-07-08 2013-05-16 Fraunhofer-Gesellschaft Zur Foerderung der angewandten Forschung e.V. Coder using forward aliasing cancellation
US9257130B2 (en) * 2010-07-08 2016-02-09 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoding/decoding with syntax portions using forward aliasing cancellation
US11276413B2 (en) 2018-10-26 2022-03-15 Electronics And Telecommunications Research Institute Audio signal encoding method and audio signal decoding method, and encoder and decoder performing the same

Also Published As

Publication number Publication date
KR20100007738A (en) 2010-01-22
EP3706122A1 (en) 2020-09-09
WO2010008175A3 (en) 2010-03-18
EP2302623B1 (en) 2020-04-01
EP2302623A2 (en) 2011-03-30
CN102150205B (en) 2013-03-27
US20110119054A1 (en) 2011-05-19
WO2010008175A2 (en) 2010-01-21
JP2011528134A (en) 2011-11-10
CN102150205A (en) 2011-08-10
EP2302623A4 (en) 2016-04-13

Similar Documents

Publication Publication Date Title
US8959015B2 (en) Apparatus for encoding and decoding of integrated speech and audio
US11705137B2 (en) Apparatus for encoding and decoding of integrated speech and audio
KR101664434B1 (en) Method of coding/decoding audio signal and apparatus for enabling the method
JP6173288B2 (en) Multi-mode audio codec and CELP coding adapted thereto
US9218817B2 (en) Low-delay sound-encoding alternating between predictive encoding and transform encoding
US8862480B2 (en) Audio encoding/decoding with aliasing switch for domain transforming of adjacent sub-blocks before and subsequent to windowing
EP2849180B1 (en) Hybrid audio signal encoder, hybrid audio signal decoder, method for encoding audio signal, and method for decoding audio signal
US20120209600A1 (en) Integrated voice/audio encoding/decoding device and method whereby the overlap region of a window is adjusted based on the transition interval
US11062718B2 (en) Encoding apparatus and decoding apparatus for transforming between modified discrete cosine transform-based coder and different coder
CN103594090A (en) Low-complexity spectral analysis/synthesis using selectable time resolution
WO2013061584A1 (en) Hybrid sound-signal decoder, hybrid sound-signal encoder, sound-signal decoding method, and sound-signal encoding method
JPWO2011158485A1 (en) Audio hybrid encoding apparatus and audio hybrid decoding apparatus
US9620139B2 (en) Adaptive linear predictive coding/decoding
Quackenbush MPEG Audio Compression Future

Legal Events

Date Code Title Description
AS Assignment

Owner name: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTIT

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LEE, TAE JIN;BEACK, SEUNG KWON;KIM, MINJE;AND OTHERS;SIGNING DATES FROM 20101214 TO 20101220;REEL/FRAME:025749/0802

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551)

Year of fee payment: 4

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE UNDER 1.28(C) (ORIGINAL EVENT CODE: M1559); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FEPP Fee payment procedure

Free format text: PETITION RELATED TO MAINTENANCE FEES GRANTED (ORIGINAL EVENT CODE: PTGR); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8