US20190158833A1 - Signal encoding method and apparatus and signal decoding method and apparatus - Google Patents
- Publication number: US20190158833A1 (application No. US16/259,341)
- Authority
- US
- United States
- Prior art keywords
- unit
- band
- signal
- decoding
- bits
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H04N19/124—Quantisation
- H04N19/196—Methods or arrangements for coding digital video signals using adaptive coding, specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
- H04N19/40—Methods or arrangements for coding digital video signals using video transcoding, i.e. partial or full decoding of a coded input stream followed by re-encoding of the decoded output stream
- G10L19/0017—Lossless audio signal coding; perfect reconstruction of coded audio signal by transmission of coding error
- G10L19/0204—Speech or audio coding using spectral analysis with subband decomposition
- G10L19/0212—Speech or audio coding using spectral analysis with orthogonal transformation
- G10L19/032—Quantisation or dequantisation of spectral components
- G10L19/035—Scalar quantisation
- G10L19/038—Vector quantisation, e.g. TwinVQ audio
- G10L19/167—Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
- G10L19/24—Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
- H03M13/05—Error detection or forward error correction by redundancy in data representation, using block codes
- H03M13/156—Encoding or decoding using time-frequency transformations, e.g. fast Fourier transformation
- H03M13/31—Coding for error detection or correction combined with efficient use of the spectrum
Definitions
- One or more exemplary embodiments relate to audio or speech signal encoding and decoding, and more particularly, to a method and apparatus for encoding or decoding a spectral coefficient in a frequency domain.
- Quantizers of various schemes have been proposed to efficiently encode spectral coefficients in a frequency domain, for example trellis coded quantization (TCQ), uniform scalar quantization (USQ), factorial pulse coding (FPC), algebraic vector quantization (AVQ), and pyramid vector quantization (PVQ), and a lossless encoder optimized for each quantizer may be implemented together.
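To make the TCQ idea named above concrete, the sketch below implements a minimal 4-state trellis coded quantizer with a Viterbi search. The trellis transitions and subset labeling follow a common textbook (Ungerboeck-style) set-partitioning layout, not the tables of any particular codec, and the step size is an illustrative assumption.

```python
# Minimal 4-state TCQ sketch. A scalar codebook with spacing `step` is
# split into four interleaved subsets; each trellis branch is labeled
# with one subset, and a Viterbi search picks the bit sequence (branch
# choices) that minimizes total squared error.
NEXT_STATE = {0: (0, 2), 1: (0, 2), 2: (1, 3), 3: (1, 3)}  # state -> next state for bit 0/1
BRANCH_SET = {0: (0, 2), 1: (2, 0), 2: (1, 3), 3: (3, 1)}  # state -> subset label for bit 0/1

def nearest_in_subset(x, d, step=1.0):
    # Subset d contains the points d*step + 4*step*k for integer k.
    k = round((x - d * step) / (4 * step))
    return d * step + 4 * step * k

def tcq_quantize(xs, step=1.0):
    # Track, per trellis state, the cheapest path of branch bits and
    # its reconstruction levels. The search starts in state 0.
    INF = float("inf")
    cost = [0.0, INF, INF, INF]
    path = [[], [], [], []]
    recon = [[], [], [], []]
    for x in xs:
        new_cost, new_path, new_recon = [INF] * 4, [None] * 4, [None] * 4
        for s in range(4):
            if cost[s] == INF:
                continue
            for bit in (0, 1):
                ns = NEXT_STATE[s][bit]
                y = nearest_in_subset(x, BRANCH_SET[s][bit], step)
                c = cost[s] + (x - y) ** 2
                if c < new_cost[ns]:
                    new_cost[ns] = c
                    new_path[ns] = path[s] + [bit]
                    new_recon[ns] = recon[s] + [y]
        cost, path, recon = new_cost, new_path, new_recon
    best = min(range(4), key=lambda s: cost[s])
    return path[best], recon[best]
```

Note how the trellis constrains the choice: only one bit per sample is spent on selecting a subset, while the subset spacing (4x the base step) keeps per-sample codewords short.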
- One or more exemplary embodiments include a method and apparatus for encoding or decoding a spectral coefficient adaptively to various bit rates or various sub-band sizes in a frequency domain.
- One or more exemplary embodiments include a computer-readable recording medium having recorded thereon a computer-readable program for executing a signal encoding or decoding method.
- One or more exemplary embodiments include a multimedia device employing a signal encoding or decoding apparatus.
- a spectrum encoding method includes quantizing spectral data of a current band based on a first quantization scheme, generating a lower bit of the current band using the spectral data and the quantized spectral data, quantizing a sequence of lower bits including the lower bit of the current band based on a second quantization scheme, and generating a bitstream based on an upper bit, obtained by excluding N bits (where N is 1 or greater) from the quantized spectral data, and the quantized sequence of lower bits.
- a spectrum encoding apparatus includes a processor configured to quantize spectral data of a current band based on a first quantization scheme, generate a lower bit of the current band using the spectral data and the quantized spectral data, quantize a sequence of lower bits including the lower bit of the current band based on a second quantization scheme, and generate a bitstream based on an upper bit, obtained by excluding N bits (where N is 1 or greater) from the quantized spectral data, and the quantized sequence of lower bits.
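As a simplified, hedged reading of the encoding steps above: quantize each band with USQ, then peel the N lowest bits off each quantized magnitude so that a second-stage quantizer (e.g. TCQ) can code the lower-bit sequence jointly. The step size, N = 1, and all helper names below are illustrative assumptions, not the patent's actual procedure.

```python
def usq(x, step):
    # First quantization scheme: uniform scalar quantization
    # (index of the nearest reconstruction level).
    return round(x / step)

def encode_band_sequence(spectral, step=1.0, n_lower=1):
    # Split each quantized magnitude into an upper part (kept as-is)
    # and its N lowest bits (collected for second-stage quantization).
    upper_bits, lower_bits, signs = [], [], []
    for x in spectral:
        q = usq(abs(x), step)             # quantize the current band's data
        lower = q & ((1 << n_lower) - 1)  # lower bit(s) of the current band
        upper = q >> n_lower              # upper bits, excluding N bits
        upper_bits.append(upper)
        lower_bits.append(lower)
        signs.append(0 if x >= 0 else 1)
    return upper_bits, lower_bits, signs
```

In a full encoder the `lower_bits` sequence would then go through TCQ, and `upper_bits` plus signs would be losslessly coded into the bitstream.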
- a spectrum decoding method includes receiving a bitstream, decoding a sequence of lower bits by extracting TCQ path information, decoding the number, positions, and signs of important spectral components (ISCs) by extracting ISC information, extracting and decoding the remaining bits other than the lower bits, and reconstructing spectrum components based on the decoded sequence of lower bits and the decoded remaining bits.
- a spectrum decoding apparatus includes a processor configured to receive a bitstream, decode a sequence of lower bits by extracting TCQ path information, decode the number, positions, and signs of ISCs by extracting ISC information, extract and decode the remaining bits other than the lower bits, and reconstruct spectrum components based on the decoded sequence of lower bits and the decoded remaining bits.
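The reconstruction step can be sketched as the mirror image of the encoder-side split: merge each decoded lower bit back under the remaining upper bits, then reapply the ISC sign and the quantization step. Again, N = 1, the step size, and the function name are assumptions for illustration.

```python
def reconstruct_spectrum(upper_bits, lower_bits, signs, step=1.0, n_lower=1):
    # Recombine upper bits (from the lossless-decoded stream) with the
    # TCQ-decoded lower bits, then restore sign and scale.
    out = []
    for upper, lower, sign in zip(upper_bits, lower_bits, signs):
        q = (upper << n_lower) | lower
        out.append(-q * step if sign else q * step)
    return out
```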
- a spectral coefficient can be encoded by means of joint USQ and TCQ, using a bit rate control module designed into a codec supporting multiple bit rates. In this case, the respective advantages of both quantization methods can be maximized.
- FIGS. 1A and 1B are block diagrams of an audio encoding apparatus and an audio decoding apparatus according to an exemplary embodiment, respectively.
- FIGS. 2A and 2B are block diagrams of an audio encoding apparatus and an audio decoding apparatus according to another exemplary embodiment, respectively.
- FIGS. 3A and 3B are block diagrams of an audio encoding apparatus and an audio decoding apparatus according to another exemplary embodiment, respectively.
- FIGS. 4A and 4B are block diagrams of an audio encoding apparatus and an audio decoding apparatus according to another exemplary embodiment, respectively.
- FIG. 5 is a block diagram of a frequency domain audio encoding apparatus according to an exemplary embodiment.
- FIG. 6 is a block diagram of a frequency domain audio decoding apparatus according to an exemplary embodiment.
- FIG. 7 is a block diagram of a spectrum encoding apparatus according to an exemplary embodiment.
- FIG. 8 illustrates sub-band segmentation.
- FIG. 9 is a block diagram of a spectrum quantization apparatus according to an exemplary embodiment.
- FIG. 10 is a block diagram of a spectrum encoding apparatus according to an exemplary embodiment.
- FIG. 11 is a block diagram of an ISC encoding apparatus according to an exemplary embodiment.
- FIG. 12 is a block diagram of an ISC information encoding apparatus according to an exemplary embodiment.
- FIG. 13 is a block diagram of a spectrum encoding apparatus according to another exemplary embodiment.
- FIG. 14 is a block diagram of a spectrum encoding apparatus according to another exemplary embodiment.
- FIG. 15 illustrates a concept of an ISC collection and encoding process according to an exemplary embodiment.
- FIG. 16 illustrates a second joint scheme combining USQ and TCQ.
- FIG. 17 is a block diagram of a spectrum encoding apparatus according to another exemplary embodiment.
- FIG. 18 is a block diagram of a second quantization unit of FIG. 17 according to an exemplary embodiment.
- FIG. 19 illustrates a method of generating residual data.
- FIG. 20 illustrates an example of TCQ.
- FIG. 21 is a block diagram of a frequency domain audio decoding apparatus according to an exemplary embodiment.
- FIG. 22 is a block diagram of a spectrum decoding apparatus according to an exemplary embodiment.
- FIG. 23 is a block diagram of a spectrum inverse-quantization apparatus according to an exemplary embodiment.
- FIG. 24 is a block diagram of a spectrum decoding apparatus according to an exemplary embodiment.
- FIG. 25 is a block diagram of an ISC decoding apparatus according to an exemplary embodiment.
- FIG. 26 is a block diagram of an ISC information decoding apparatus according to an exemplary embodiment.
- FIG. 27 is a block diagram of a spectrum decoding apparatus according to another exemplary embodiment.
- FIG. 28 is a block diagram of a spectrum decoding apparatus according to another exemplary embodiment.
- FIG. 29 is a block diagram of a spectrum decoding apparatus according to another exemplary embodiment.
- FIG. 30 is a block diagram of a third decoding unit of FIG. 29 according to another exemplary embodiment.
- FIG. 31 is a block diagram of a multimedia device according to an exemplary embodiment.
- FIG. 32 is a block diagram of a multimedia device according to another exemplary embodiment.
- FIG. 33 is a block diagram of a multimedia device according to another exemplary embodiment.
- FIG. 34 is a flowchart illustrating a spectrum encoding method according to an exemplary embodiment.
- FIG. 35 is a flowchart illustrating a spectrum decoding method according to an exemplary embodiment.
- FIG. 36 is a block diagram of a bit allocation apparatus according to an exemplary embodiment.
- FIG. 37 is a block diagram of a coding mode determination apparatus according to an exemplary embodiment.
- FIG. 38 illustrates a state machine used in a correction unit of FIG. 37 according to an exemplary embodiment.
- As the inventive concept may have diverse modified embodiments, preferred embodiments are illustrated in the drawings and described in the detailed description. However, this does not limit the inventive concept to the specific embodiments, and it should be understood that the inventive concept covers all modifications, equivalents, and replacements within its idea and technical scope. Moreover, detailed descriptions of well-known functions or configurations are omitted so as not to unnecessarily obscure the subject matter of the inventive concept.
- FIGS. 1A and 1B are block diagrams of an audio encoding apparatus and an audio decoding apparatus according to an exemplary embodiment, respectively.
- the audio encoding apparatus 110 shown in FIG. 1A may include a pre-processor 112 , a frequency domain coder 114 , and a parameter coder 116 .
- the components may be integrated in at least one module and may be implemented as at least one processor (not shown).
- the pre-processor 112 may perform filtering, down-sampling, or the like for an input signal, but is not limited thereto.
- the input signal may include a speech signal, a music signal, or a mixed signal of speech and music.
- the input signal is referred to as an audio signal.
- the frequency domain coder 114 may perform a time-frequency transform on the audio signal provided by the pre-processor 112 , select a coding tool in correspondence with the number of channels, a coding band, and a bit rate of the audio signal, and encode the audio signal by using the selected coding tool.
- the time-frequency transform may use a modified discrete cosine transform (MDCT), a modulated lapped transform (MLT), or a fast Fourier transform (FFT), but is not limited thereto.
- When the audio signal is a stereo or multi-channel signal, encoding is performed for each channel, and if the number of given bits is not sufficient, a down-mixing scheme may be applied.
- An encoded spectral coefficient is generated by the frequency domain coder 114 .
- the parameter coder 116 may extract a parameter from the encoded spectral coefficient provided from the frequency domain coder 114 and encode the extracted parameter.
- the parameter may be extracted, for example, for each sub-band, which is a unit of grouping spectral coefficients, and a sub-band may have a uniform or non-uniform length reflecting a critical band. When the sub-bands have non-uniform lengths, a sub-band in a low frequency band may be relatively short compared with a sub-band in a high frequency band.
- The number and lengths of the sub-bands included in one frame vary according to the codec algorithm and may affect the encoding performance.
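The non-uniform grouping described above can be sketched as follows; the concrete band sizes are made-up illustrative values (shorter bands at low frequencies, wider at high frequencies, loosely mimicking critical-band widths), not taken from any codec's tables.

```python
def split_subbands(num_coeffs, band_sizes):
    # Partition `num_coeffs` spectral coefficients into contiguous
    # sub-bands of the given (possibly non-uniform) sizes, returning
    # (start, end) index pairs per sub-band.
    bands, start = [], 0
    for size in band_sizes:
        bands.append((start, start + size))
        start += size
    assert start == num_coeffs, "band sizes must cover the whole spectrum"
    return bands
```

Per-sub-band parameters such as a Norm or average energy would then be computed over each `(start, end)` slice.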
- the parameter may include, for example, a scale factor, power, average energy, or Norm, but is not limited thereto.
- Spectral coefficients and parameters obtained as an encoding result form a bitstream, and the bitstream may be stored in a storage medium or may be transmitted in a form of, for example, packets through a channel.
- the audio decoding apparatus 130 shown in FIG. 1B may include a parameter decoder 132 , a frequency domain decoder 134 , and a post-processor 136 .
- the frequency domain decoder 134 may include a frame error concealment algorithm or a packet loss concealment algorithm.
- the components may be integrated in at least one module and may be implemented as at least one processor (not shown).
- the parameter decoder 132 may decode parameters from a received bitstream and check whether an error such as erasure or loss has occurred in frame units from the decoded parameters.
- Various well-known methods may be used for the error check, and information on whether a current frame is a good frame or an erasure or loss frame is provided to the frequency domain decoder 134.
- the erasure or loss frame is referred to as an error frame.
- the frequency domain decoder 134 may generate synthesized spectral coefficients by performing decoding through a general transform decoding process.
- the frequency domain decoder 134 may generate synthesized spectral coefficients by repeating the spectral coefficients of a previous good frame (PGF) onto the error frame, or by scaling the spectral coefficients of the PGF through a regression analysis and repeating the scaled coefficients onto the error frame, through a frame error concealment algorithm or a packet loss concealment algorithm.
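A minimal sketch of this concealment step, under stated assumptions: the gain is derived by fitting a line to recent frame energies and extrapolating one frame ahead. The least-squares fit and the [0, 1] clamp are stand-ins for the patent's regression analysis, not its actual procedure, and all names are hypothetical.

```python
def concealment_gain(past_energies):
    # Simple least-squares line through the last few frame energies
    # (needs at least two points); extrapolate one frame ahead and take
    # the ratio to the last energy, clamped to [0, 1] so concealment
    # never amplifies.
    n = len(past_energies)
    mean_x = (n - 1) / 2
    mean_y = sum(past_energies) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(past_energies))
    den = sum((x - mean_x) ** 2 for x in range(n))
    slope = num / den
    predicted = past_energies[-1] + slope
    return max(0.0, min(1.0, predicted / past_energies[-1]))

def conceal_error_frame(pgf_spectrum, past_energies):
    # Repeat the previous good frame's spectral coefficients onto the
    # error frame, scaled by the regression-derived gain.
    g = concealment_gain(past_energies)
    return [g * c for c in pgf_spectrum]
```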
- the frequency domain decoder 134 may generate a time domain signal by performing a frequency-time transform on the synthesized spectral coefficients.
- the post-processor 136 may perform filtering, up-sampling, or the like for sound quality improvement with respect to the time domain signal provided from the frequency domain decoder 134 , but is not limited thereto.
- the post-processor 136 provides a reconstructed audio signal as an output signal.
- FIGS. 2A and 2B are block diagrams of an audio encoding apparatus and an audio decoding apparatus, according to another exemplary embodiment, respectively, which have a switching structure.
- the audio encoding apparatus 210 shown in FIG. 2A may include a pre-processor 212 , a mode determiner 213 , a frequency domain coder 214 , a time domain coder 215 , and a parameter coder 216 .
- the components may be integrated in at least one module and may be implemented as at least one processor (not shown).
- the mode determiner 213 may determine a coding mode by referring to a characteristic of an input signal.
- the mode determiner 213 may determine according to the characteristic of the input signal whether a coding mode suitable for a current frame is a speech mode or a music mode and may also determine whether a coding mode efficient for the current frame is a time domain mode or a frequency domain mode.
- the characteristic of the input signal may be perceived by using a short-term characteristic of a frame or a long-term characteristic of a plurality of frames, but is not limited thereto.
- if the input signal corresponds to a speech signal, the coding mode may be determined as the speech mode or the time domain mode, and if the input signal corresponds to a signal other than a speech signal, i.e., a music signal or a mixed signal, the coding mode may be determined as the music mode or the frequency domain mode.
- the mode determiner 213 may provide an output signal of the pre-processor 212 to the frequency domain coder 214 when the characteristic of the input signal corresponds to the music mode or the frequency domain mode and may provide an output signal of the pre-processor 212 to the time domain coder 215 when the characteristic of the input signal corresponds to the speech mode or the time domain mode.
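The routing above can be sketched with a toy mode decision. The zero-crossing-style speechiness measure, the thresholds, and the use of recent mode history as a crude long-term characteristic are all assumptions for illustration; the patent does not specify these criteria.

```python
def determine_coding_mode(frame, prev_modes, speech_threshold=0.5):
    # Short-term characteristic: fraction of sign changes in the frame
    # (a placeholder measure; speech tends to have more high-frequency
    # activity than sustained music in this toy model).
    signs = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    zcr = signs / max(len(frame) - 1, 1)
    # Long-term characteristic: majority mode over recent frames lowers
    # the bar slightly, a crude form of decision hysteresis.
    recent_speech = prev_modes.count("speech/time-domain") > len(prev_modes) // 2
    is_speech = zcr > speech_threshold or (recent_speech and zcr > speech_threshold * 0.8)
    return "speech/time-domain" if is_speech else "music/frequency-domain"
```

A frame classified as "speech/time-domain" would then be routed to the time domain (CELP) coder 215, and "music/frequency-domain" to the frequency domain coder 214.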
- Since the frequency domain coder 214 is substantially the same as the frequency domain coder 114 of FIG. 1A , the description thereof is not repeated.
- the time domain coder 215 may perform code excited linear prediction (CELP) coding for an audio signal provided from the pre-processor 212 .
- algebraic CELP may be used for the CELP coding, but the CELP coding is not limited thereto.
- An encoded spectral coefficient is generated by the time domain coder 215 .
- the parameter coder 216 may extract a parameter from the encoded spectral coefficient provided from the frequency domain coder 214 or the time domain coder 215 and encode the extracted parameter. Since the parameter coder 216 is substantially the same as the parameter coder 116 of FIG. 1A , the description thereof is not repeated. Spectral coefficients and parameters obtained as an encoding result may form a bitstream together with coding mode information, and the bitstream may be transmitted in a form of packets through a channel or may be stored in a storage medium.
- the audio decoding apparatus 230 shown in FIG. 2B may include a parameter decoder 232 , a mode determiner 233 , a frequency domain decoder 234 , a time domain decoder 235 , and a post-processor 236 .
- Each of the frequency domain decoder 234 and the time domain decoder 235 may include a frame error concealment algorithm or a packet loss concealment algorithm in each corresponding domain.
- the components may be integrated in at least one module and may be implemented as at least one processor (not shown).
- the parameter decoder 232 may decode parameters from a bitstream transmitted in a form of packets and check whether an error has occurred in frame units from the decoded parameters.
- Various well-known methods may be used for the error check, and information on whether a current frame is a good frame or an error frame is provided to the frequency domain decoder 234 or the time domain decoder 235 .
- the mode determiner 233 may check coding mode information included in the bitstream and provide a current frame to the frequency domain decoder 234 or the time domain decoder 235 .
- the frequency domain decoder 234 may operate when a coding mode is the music mode or the frequency domain mode and generate synthesized spectral coefficients by performing decoding through a general transform decoding process when the current frame is a good frame.
- the frequency domain decoder 234 may generate synthesized spectral coefficients by repeating the spectral coefficients of a previous good frame (PGF) onto the error frame, or by scaling the spectral coefficients of the PGF through a regression analysis and repeating the scaled coefficients onto the error frame, through a frame error concealment algorithm or a packet loss concealment algorithm.
- the frequency domain decoder 234 may generate a time domain signal by performing a frequency-time transform on the synthesized spectral coefficients.
- the time domain decoder 235 may operate when the coding mode is the speech mode or the time domain mode and generate a time domain signal by performing decoding through a general CELP decoding process when the current frame is a normal frame.
- the time domain decoder 235 may perform a frame error concealment algorithm or a packet loss concealment algorithm in the time domain.
- the post-processor 236 may perform filtering, up-sampling, or the like for the time domain signal provided from the frequency domain decoder 234 or the time domain decoder 235 , but is not limited thereto.
- the post-processor 236 provides a reconstructed audio signal as an output signal.
- FIGS. 3A and 3B are block diagrams of an audio encoding apparatus and an audio decoding apparatus according to another exemplary embodiment, respectively.
- the audio encoding apparatus 310 shown in FIG. 3A may include a pre-processor 312 , a linear prediction (LP) analyzer 313 , a mode determiner 314 , a frequency domain excitation coder 315 , a time domain excitation coder 316 , and a parameter coder 317 .
- the components may be integrated in at least one module and may be implemented as at least one processor (not shown).
- the LP analyzer 313 may extract LP coefficients by performing LP analysis for an input signal and generate an excitation signal from the extracted LP coefficients.
- the excitation signal may be provided to one of the frequency domain excitation coder 315 and the time domain excitation coder 316 according to a coding mode.
- the mode determiner 314 is substantially the same as the mode determiner 213 of FIG. 2A , the description thereof is not repeated.
- the frequency domain excitation coder 315 may operate when the coding mode is the music mode or the frequency domain mode, and since the frequency domain excitation coder 315 is substantially the same as the frequency domain coder 114 of FIG. 1A except that an input signal is an excitation signal, the description thereof is not repeated.
- the time domain excitation coder 316 may operate when the coding mode is the speech mode or the time domain mode, and since the time domain excitation coder 316 is substantially the same as the time domain coder 215 of FIG. 2A , the description thereof is not repeated.
- the parameter coder 317 may extract a parameter from an encoded spectral coefficient provided from the frequency domain excitation coder 315 or the time domain excitation coder 316 and encode the extracted parameter. Since the parameter coder 317 is substantially the same as the parameter coder 116 of FIG. 1A , the description thereof is not repeated. Spectral coefficients and parameters obtained as an encoding result may form a bitstream together with coding mode information, and the bitstream may be transmitted in a form of packets through a channel or may be stored in a storage medium.
- the audio decoding apparatus 330 shown in FIG. 3B may include a parameter decoder 332 , a mode determiner 333 , a frequency domain excitation decoder 334 , a time domain excitation decoder 335 , an LP synthesizer 336 , and a post-processor 337 .
- Each of the frequency domain excitation decoder 334 and the time domain excitation decoder 335 may include a frame error concealment algorithm or a packet loss concealment algorithm in each corresponding domain.
- the components may be integrated in at least one module and may be implemented as at least one processor (not shown).
- the parameter decoder 332 may decode parameters from a bitstream transmitted in a form of packets and check whether an error has occurred in frame units from the decoded parameters.
- Various well-known methods may be used for the error check, and information on whether a current frame is a good frame or an error frame is provided to the frequency domain excitation decoder 334 or the time domain excitation decoder 335 .
- the mode determiner 333 may check coding mode information included in the bitstream and provide a current frame to the frequency domain excitation decoder 334 or the time domain excitation decoder 335 .
- the frequency domain excitation decoder 334 may operate when a coding mode is the music mode or the frequency domain mode and generate synthesized spectral coefficients by performing decoding through a general transform decoding process when the current frame is a good frame.
- through a frame error concealment algorithm or a packet loss concealment algorithm, the frequency domain excitation decoder 334 may generate synthesized spectral coefficients by repeating spectral coefficients of a previous good frame (PGF) onto the error frame, or by scaling the spectral coefficients of the PGF through a regression analysis before repeating them onto the error frame.
- the frequency domain excitation decoder 334 may generate an excitation signal that is a time domain signal by performing a frequency-time transform on the synthesized spectral coefficients.
- the time domain excitation decoder 335 may operate when the coding mode is the speech mode or the time domain mode and generate an excitation signal that is a time domain signal by performing decoding through a general CELP decoding process when the current frame is a good frame.
- the time domain excitation decoder 335 may perform a frame error concealment algorithm or a packet loss concealment algorithm in the time domain.
- the LP synthesizer 336 may generate a time domain signal by performing LP synthesis for the excitation signal provided from the frequency domain excitation decoder 334 or the time domain excitation decoder 335 .
- the post-processor 337 may perform filtering, up-sampling, or the like for the time domain signal provided from the LP synthesizer 336 , but is not limited thereto.
- the post-processor 337 provides a reconstructed audio signal as an output signal.
- FIGS. 4A and 4B are block diagrams of an audio encoding apparatus and an audio decoding apparatus according to another exemplary embodiment, respectively, which have a switching structure.
- the audio encoding apparatus 410 shown in FIG. 4A may include a pre-processor 412 , a mode determiner 413 , a frequency domain coder 414 , an LP analyzer 415 , a frequency domain excitation coder 416 , a time domain excitation coder 417 , and a parameter coder 418 .
- the components may be integrated in at least one module and may be implemented as at least one processor (not shown). Since it can be considered that the audio encoding apparatus 410 shown in FIG. 4A is obtained by combining the audio encoding apparatus 210 of FIG. 2A and the audio encoding apparatus 310 of FIG. 3A , the description of operations of common parts is not repeated, and an operation of the mode determiner 413 will now be described.
- the mode determiner 413 may determine a coding mode of an input signal by referring to a characteristic and a bit rate of the input signal.
- the mode determiner 413 may determine the coding mode as a CELP mode or another mode based on whether a current frame is the speech mode or the music mode according to the characteristic of the input signal and based on whether a coding mode efficient for the current frame is the time domain mode or the frequency domain mode.
- the mode determiner 413 may determine the coding mode as the CELP mode when the characteristic of the input signal corresponds to the speech mode, determine the coding mode as the frequency domain mode when the characteristic of the input signal corresponds to the music mode and a high bit rate, and determine the coding mode as an audio mode when the characteristic of the input signal corresponds to the music mode and a low bit rate.
- the mode determiner 413 may provide the input signal to the frequency domain coder 414 when the coding mode is the frequency domain mode, provide the input signal to the frequency domain excitation coder 416 via the LP analyzer 415 when the coding mode is the audio mode, and provide the input signal to the time domain excitation coder 417 via the LP analyzer 415 when the coding mode is the CELP mode.
- the frequency domain coder 414 may correspond to the frequency domain coder 114 in the audio encoding apparatus 110 of FIG. 1A or the frequency domain coder 214 in the audio encoding apparatus 210 of FIG. 2A .
- the frequency domain excitation coder 416 or the time domain excitation coder 417 may correspond to the frequency domain excitation coder 315 or the time domain excitation coder 316 in the audio encoding apparatus 310 of FIG. 3A .
- the audio decoding apparatus 430 shown in FIG. 4B may include a parameter decoder 432 , a mode determiner 433 , a frequency domain decoder 434 , a frequency domain excitation decoder 435 , a time domain excitation decoder 436 , an LP synthesizer 437 , and a post-processor 438 .
- Each of the frequency domain decoder 434 , the frequency domain excitation decoder 435 , and the time domain excitation decoder 436 may include a frame error concealment algorithm or a packet loss concealment algorithm in each corresponding domain.
- the components may be integrated in at least one module and may be implemented as at least one processor (not shown). Since it can be considered that the audio decoding apparatus 430 shown in FIG. 4B is obtained by combining the audio decoding apparatus 230 of FIG. 2B and the audio decoding apparatus 330 of FIG. 3B , the description of operations of common parts is not repeated, and an operation of the mode determiner 433 will now be described.
- the mode determiner 433 may check coding mode information included in a bitstream and provide a current frame to the frequency domain decoder 434 , the frequency domain excitation decoder 435 , or the time domain excitation decoder 436 .
- the frequency domain decoder 434 may correspond to the frequency domain decoder 134 in the audio decoding apparatus 130 of FIG. 1B or the frequency domain decoder 234 in the audio decoding apparatus 230 of FIG. 2B .
- the frequency domain excitation decoder 435 or the time domain excitation decoder 436 may correspond to the frequency domain excitation decoder 334 or the time domain excitation decoder 335 in the audio decoding apparatus 330 of FIG. 3B .
- FIG. 5 is a block diagram of a frequency domain audio encoding apparatus according to an exemplary embodiment.
- the frequency domain audio encoding apparatus 510 shown in FIG. 5 may include a transient detector 511 , a transformer 512 , a signal classifier 513 , an energy coder 514 , a spectrum normalizer 515 , a bit allocator 516 , a spectrum coder 517 , and a multiplexer 518 .
- the components may be integrated in at least one module and may be implemented as at least one processor (not shown).
- the frequency domain audio encoding apparatus 510 may perform all functions of the frequency domain coder 214 and partial functions of the parameter coder 216 shown in FIG. 2A .
- the frequency domain audio encoding apparatus 510 may be replaced by a configuration of an encoder disclosed in the ITU-T G.719 standard except for the signal classifier 513 , and the transformer 512 may use a transform window having an overlap duration of 50%.
- the frequency domain audio encoding apparatus 510 may be replaced by a configuration of an encoder disclosed in the ITU-T G.719 standard except for the transient detector 511 and the signal classifier 513 .
- a noise level estimation unit may be further included at a rear end of the spectrum coder 517 as in the ITU-T G.719 standard to estimate a noise level for a spectral coefficient to which a bit is not allocated in a bit allocation process and insert the estimated noise level into a bitstream.
- the transient detector 511 may detect a duration exhibiting a transient characteristic by analyzing an input signal and generate transient signaling information for each frame in response to a result of the detection.
- Various well-known methods may be used for the detection of a transient duration.
- the transient detector 511 may primarily determine whether a current frame is a transient frame and secondarily verify the current frame that has been determined as a transient frame.
- the transient signaling information may be included in a bitstream by the multiplexer 518 and may be provided to the transformer 512 .
- the transformer 512 may determine a window size to be used for a transform according to a result of the detection of a transient duration and perform a time-frequency transform based on the determined window size. For example, a short window may be applied to a sub-band from which a transient duration has been detected, and a long window may be applied to a sub-band from which a transient duration has not been detected. As another example, a short window may be applied to a frame including a transient duration.
- the signal classifier 513 may analyze a spectrum provided from the transformer 512 in frame units to determine whether each frame corresponds to a harmonic frame. Various well-known methods may be used for the determination of a harmonic frame. According to an exemplary embodiment, the signal classifier 513 may divide the spectrum provided from the transformer 512 into a plurality of sub-bands and obtain a peak energy value and an average energy value for each sub-band. Thereafter, the signal classifier 513 may obtain the number of sub-bands of which a peak energy value is greater than an average energy value by a predetermined ratio or above for each frame and determine, as a harmonic frame, a frame in which the obtained number of sub-bands is greater than or equal to a predetermined value. The predetermined ratio and the predetermined value may be determined in advance through experiments or simulations. Harmonic signaling information may be included in the bitstream by the multiplexer 518 .
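The harmonic-frame decision described above can be sketched as follows; the ratio and count thresholds are placeholders for the experimentally determined values mentioned in the text:

```python
def is_harmonic_frame(subband_energies, ratio=4.0, min_count=3):
    """subband_energies: list of (peak_energy, average_energy) per sub-band.
    Count the sub-bands whose peak energy exceeds the average energy by the
    predetermined ratio or more; the frame is classified as harmonic when
    that count reaches the predetermined value. `ratio` and `min_count` are
    illustrative; the text leaves them to experiments or simulations."""
    count = sum(1 for peak, avg in subband_energies if peak >= ratio * avg)
    return count >= min_count
```

A frame with several strongly peaked sub-bands is flagged harmonic; a flat spectrum is not.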
- the energy coder 514 may obtain energy in each sub-band unit and quantize and lossless-encode the energy. According to an embodiment, a Norm value corresponding to average spectral energy in each sub-band unit may be used as the energy and a scale factor or a power may also be used, but the energy is not limited thereto.
- the Norm value of each sub-band may be provided to the spectrum normalizer 515 and the bit allocator 516 and may be included in the bitstream by the multiplexer 518 .
- the spectrum normalizer 515 may normalize the spectrum by using the Norm value obtained in each sub-band unit.
- the bit allocator 516 may allocate bits in integer units or fraction units by using the Norm value obtained in each sub-band unit.
- the bit allocator 516 may calculate a masking threshold by using the Norm value obtained in each sub-band unit and estimate the perceptually required number of bits, i.e., the allowable number of bits, by using the masking threshold.
- the bit allocator 516 may limit the allocated number of bits so that it does not exceed the allowable number of bits for each sub-band.
- the bit allocator 516 may sequentially allocate bits from a sub-band having a larger Norm value and weigh the Norm value of each sub-band according to perceptual importance of each sub-band to adjust the allocated number of bits, so that more bits are allocated to a perceptually important sub-band.
- the quantized Norm value provided from the energy coder 514 to the bit allocator 516 may be used for the bit allocation after being adjusted in advance to consider psychoacoustic weighting and a masking effect as in the ITU-T G.719 standard.
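A hedged sketch of the Norm-based bit allocation described above: bits are granted greedily to the sub-band with the largest weighted Norm, and per-band caps stand in for the allowable number of bits derived from the masking threshold. The greedy rule, weights, and caps are assumptions, not the patent's exact procedure:

```python
def allocate_bits(norms, total_bits, weights=None, caps=None):
    """Greedily grant one bit at a time to the sub-band whose perceptually
    weighted Norm, penalised by the bits it already holds, is largest,
    never exceeding the per-band cap (the allowable number of bits)."""
    n = len(norms)
    weights = weights or [1.0] * n
    caps = caps or [total_bits] * n
    alloc = [0] * n
    for _ in range(total_bits):
        # priority: weighted Norm reduced as bits accumulate; capped bands excluded
        best = max(range(n),
                   key=lambda k: weights[k] * norms[k] - alloc[k]
                   if alloc[k] < caps[k] else float('-inf'))
        if alloc[best] >= caps[best]:
            break  # every band is at its allowable limit
        alloc[best] += 1
    return alloc
```

Sub-bands with larger Norm values receive more bits, mirroring the sequential allocation from the largest Norm downward.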
- the spectrum coder 517 may quantize the normalized spectrum by using the allocated number of bits of each sub-band and lossless-encode a result of the quantization. For example, TCQ, USQ, FPC, AVQ and PVQ or a combination thereof and a lossless encoder optimized for each quantizer may be used for the spectrum encoding. In addition, trellis coding may also be used for the spectrum encoding, but the spectrum encoding is not limited thereto. Moreover, a variety of spectrum encoding methods may also be used according to either environments in which a corresponding codec is embodied or a user's need. Information on the spectrum encoded by the spectrum coder 517 may be included in the bitstream by the multiplexer 518 .
- FIG. 6 is a block diagram of a frequency domain audio encoding apparatus according to an exemplary embodiment.
- the frequency domain audio encoding apparatus 600 shown in FIG. 6 may include a pre-processor 610 , a frequency domain coder 630 , a time domain coder 650 , and a multiplexer 670 .
- the frequency domain coder 630 may include a transient detector 631 , a transformer 633 and a spectrum coder 635 .
- the components may be integrated in at least one module and may be implemented as at least one processor (not shown).
- the pre-processor 610 may perform filtering, down-sampling, or the like for an input signal, but is not limited thereto.
- the pre-processor 610 may determine a coding mode according to a signal characteristic.
- the pre-processor 610 may determine according to a signal characteristic whether a coding mode suitable for a current frame is a speech mode or a music mode and may also determine whether a coding mode efficient for the current frame is a time domain mode or a frequency domain mode.
- the signal characteristic may be perceived by using a short-term characteristic of a frame or a long-term characteristic of a plurality of frames, but is not limited thereto.
- the coding mode may be determined as the speech mode or the time domain mode, and if the input signal corresponds to a signal other than a speech signal, i.e., a music signal or a mixed signal, the coding mode may be determined as the music mode or the frequency domain mode.
- the pre-processor 610 may provide an input signal to the frequency domain coder 630 when the signal characteristic corresponds to the music mode or the frequency domain mode and may provide an input signal to the time domain coder 650 when the signal characteristic corresponds to the speech mode or the time domain mode.
- the frequency domain coder 630 may process an audio signal provided from the pre-processor 610 based on a transform coding scheme.
- the transient detector 631 may detect a transient component from the audio signal and determine whether a current frame corresponds to a transient frame.
- the transformer 633 may determine a length or a shape of a transform window based on a frame type, i.e., transient information provided from the transient detector 631 and may transform the audio signal into a frequency domain based on the determined transform window.
- as a transform tool, a modified discrete cosine transform (MDCT), a fast Fourier transform (FFT), or a modulated lapped transform (MLT) may be used.
- a short transform window may be applied to a frame including a transient component.
- the spectrum coder 635 may perform encoding on the audio spectrum transformed into the frequency domain. The spectrum coder 635 will be described below in more detail with reference to FIGS. 7 and 9 .
- the time domain coder 650 may perform code excited linear prediction (CELP) coding on an audio signal provided from the pre-processor 610 .
- algebraic CELP may be used for the CELP coding, but the CELP coding is not limited thereto.
- the multiplexer 670 may multiplex spectral components or signal components and variable indices generated as a result of encoding in the frequency domain coder 630 or the time domain coder 650 so as to generate a bitstream.
- the bitstream may be stored in a storage medium or may be transmitted in a form of packets through a channel.
- FIG. 7 is a block diagram of a spectrum encoding apparatus according to an exemplary embodiment.
- the spectrum encoding apparatus shown in FIG. 7 may correspond to the spectrum coder 635 of FIG. 6 , may be included in another frequency domain encoding apparatus, or may be implemented independently.
- the spectrum encoding apparatus shown in FIG. 7 may include an energy estimator 710 , an energy quantizing and coding unit 720 , a bit allocator 730 , a spectrum normalizer 740 , a spectrum quantizing and coding unit 750 and a noise filler 760 .
- the energy estimator 710 may divide original spectral coefficients into a plurality of sub-bands and estimate energy, for example, a Norm value for each sub-band.
- Each sub-band may have a uniform length in a frame.
- the number of spectral coefficients included in a sub-band may be increased from a low frequency to a high frequency band.
- the energy quantizing and coding unit 720 may quantize and encode an estimated Norm value for each sub-band.
- the Norm value may be quantized by means of variable tools such as vector quantization (VQ), scalar quantization (SQ), trellis coded quantization (TCQ), lattice vector quantization (LVQ), etc.
- the energy quantizing and coding unit 720 may additionally perform lossless coding for further increasing coding efficiency.
- the bit allocator 730 may allocate bits required for coding in consideration of allowable bits of a frame, based on the quantized Norm value for each sub-band.
- the spectrum normalizer 740 may normalize the spectrum based on the Norm value obtained for each sub-band.
- the spectrum quantizing and coding unit 750 may quantize and encode the normalized spectrum based on allocated bits for each sub-band.
- the noise filler 760 may add noises into a component quantized to zero due to constraints of allowable bits in the spectrum quantizing and coding unit 750 .
- FIG. 8 illustrates sub-band segmentation.
- when an input signal uses a sampling frequency of 48 KHz and has a frame size of 20 ms, the number of samples to be processed for each frame becomes 960. That is, when the input signal is transformed by using MDCT with 50% overlapping, 960 spectral coefficients are obtained.
- a ratio of overlapping may be variably set according to a coding scheme.
- a band up to 24 KHz may be theoretically processed and a band up to 20 KHz may be represented in consideration of an audible range.
- in a band of 0 to 3.2 KHz, a sub-band comprises 8 spectral coefficients.
- in a band of 3.2 to 6.4 KHz, a sub-band comprises 16 spectral coefficients.
- in a band of 6.4 to 13.6 KHz, a sub-band comprises 24 spectral coefficients. In a band of 13.6 to 20 KHz, a sub-band comprises 32 spectral coefficients. For a predetermined band set in an encoding apparatus, coding based on a Norm value may be performed, and for a high band above the predetermined band, coding based on variable schemes such as band extension may be applied.
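The segmentation above can be checked arithmetically: a 960-sample frame at 48 KHz yields 40 MDCT coefficients per kHz, so the sketch below rebuilds the sub-band widths and confirms that 800 coefficients cover the band up to 20 KHz. The band edges below 6.4 KHz are inferred from the coefficient counts and are assumptions:

```python
def subband_sizes():
    """Sub-band layout sketch: 8-coefficient sub-bands up to 3.2 kHz,
    16 up to 6.4 kHz, 24 up to 13.6 kHz, and 32 up to 20 kHz,
    at 40 MDCT coefficients per kHz."""
    layout = [(3.2, 8), (6.4, 16), (13.6, 24), (20.0, 32)]
    sizes, prev = [], 0.0
    for upper_khz, width in layout:
        coeffs = round((upper_khz - prev) * 40)  # coefficients in this region
        sizes += [width] * (coeffs // width)     # region splits evenly into sub-bands
        prev = upper_khz
    return sizes
```

Summing the widths gives 128 + 128 + 288 + 256 = 800 coefficients, i.e., the represented band up to 20 KHz out of the 960 coefficients covering 24 KHz.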
- FIG. 9 is a block diagram illustrating a configuration of a spectrum quantization apparatus according to an exemplary embodiment.
- the apparatus shown in FIG. 9 may include a quantizer selecting unit 910 , a USQ 930 , and a TCQ 950 .
- the quantizer selecting unit 910 may select the most efficient quantizer from among various quantizers according to the characteristic of a signal to be quantized, i.e. an input signal.
- as the characteristic of the input signal, bit allocation information for each band, band size information, and the like are usable.
- the signal to be quantized may be provided to one of the USQ 930 and the TCQ 950 so that corresponding quantization is performed.
- the input signal may be a normalized MDCT spectrum.
- the bandwidth of the input signal may be either a narrow band (NB) or a wide band (WB).
- the coding mode of the input signal may be a normal mode.
- FIG. 10 is a block diagram illustrating a configuration of a spectrum encoding apparatus according to an exemplary embodiment.
- the apparatus shown in FIG. 10 may correspond to the spectrum quantizing and encoding unit 750 of FIG. 7 , may be included in another frequency domain encoding apparatus, or may be independently implemented.
- the apparatus shown in FIG. 10 may include an encoding method selecting unit 1010 , a zero encoding unit 1020 , a scaling unit 1030 , an ISC encoding unit 1040 , a quantized component restoring unit 1050 , and an inverse scaling unit 1060 .
- the quantized component restoring unit 1050 and the inverse scaling unit 1060 may be optionally provided.
- the encoding method selecting unit 1010 may select an encoding method by taking into account an input signal characteristic.
- the input signal characteristic may include at least one of a bandwidth and bits allocated for each band.
- a normalized spectrum may be provided to the zero encoding unit 1020 or the scaling unit 1030 based on an encoding scheme selected for each band.
- when the corresponding band is determined to be of high importance, USQ may be used for that band, and TCQ may be used for all the other bands.
- the average number of bits may be determined by taking into account a band length or a band size.
- the selected encoding method may be set using a one-bit flag.
- when the bandwidth is either a super wide band (SWB) or a full band (FB), a joint USQ and TCQ method may be used.
- the zero encoding unit 1020 may encode all samples to zero (0) for bands of which allocated bits are zero.
- the scaling unit 1030 may adjust a bit rate by scaling a spectrum based on bits allocated to bands. In this case, a normalized spectrum may be used.
- the scaling unit 1030 may perform scaling by taking into account the average number of bits allocated to each sample, i.e., a spectral coefficient, included in a band. For example, the greater the average number of bits is, the more scaling may be performed.
- the scaling unit 1030 may determine an appropriate scaling value according to bit allocation for each band.
- the number of pulses for a current band may be estimated using a band length and bit allocation information.
- the pulses may indicate unit pulses.
- bits (b) actually needed for the current band may be calculated based on Equation 1.
- n denotes a band length
- m denotes the number of pulses
- i denotes the number of non-zero positions, i.e., positions having an important spectral component (ISC).
- the number of non-zero positions may be obtained based on, for example, a probability by Equation 2.
- the number of pulses may be selected by a value b having the closest value to bits allocated to each band.
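Equation 1 itself is not reproduced in this excerpt; the sketch below uses the standard count of signed pulse configurations, the sum over i of C(n,i)·C(m−1,i−1)·2^i, as a stand-in for it, and then picks the pulse count whose bit demand is closest to the bits allocated to the band:

```python
from math import comb, log2

def bits_for_pulses(n: int, m: int) -> float:
    """Bits needed to index every placement of m signed unit pulses on n
    positions, summed over i = number of non-zero positions. This counting
    formula is an assumption standing in for Equation 1."""
    total = sum(comb(n, i) * comb(m - 1, i - 1) * 2 ** i
                for i in range(1, min(n, m) + 1))
    return log2(total)

def select_pulses(n: int, allocated_bits: float, max_pulses: int = 64) -> int:
    """Pick the pulse count m whose bit demand b is closest to the bits
    allocated to the band, per the selection rule described above."""
    return min(range(1, max_pulses + 1),
               key=lambda m: abs(bits_for_pulses(n, m) - allocated_bits))
```

For example, a single signed pulse on a 4-coefficient band has 8 configurations, i.e., exactly 3 bits.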
- an initial scaling factor may be determined by the estimation of the number of pulses obtained for each band and an absolute value of an input signal.
- the input signal may be scaled by the initial scaling factor. If a sum of the numbers of pulses for a scaled original signal, i.e., a quantized signal, is not the same as the estimated number of pulses, pulse redistribution processing may be performed using an updated scaling factor. According to the pulse redistribution processing, if the number of pulses selected for the current band is less than the estimated number of pulses obtained for each band, the number of pulses is increased by decreasing the scaling factor, whereas if the number of pulses selected for the current band is greater than the estimated number of pulses obtained for each band, the number of pulses is decreased by increasing the scaling factor. In this case, the scaling factor may be increased or decreased by a predetermined value by selecting a position where distortion of an original signal is minimized.
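The redistribution loop above can be sketched as follows, assuming the scaling factor acts as a divisor, so that decreasing it yields more pulses, matching the description; the step size and iteration cap are illustrative:

```python
def redistribute_pulses(spectrum, target_pulses, g, step=0.05, max_iter=200):
    """Quantize |x| / g to integer pulse counts and nudge g until the pulse
    total matches the estimate: decrease g when pulses fall short, increase
    it when they overshoot. Returns the final factor and pulse total."""
    for _ in range(max_iter):
        pulses = sum(round(abs(x) / g) for x in spectrum)
        if pulses == target_pulses:
            break
        g *= (1 - step) if pulses < target_pulses else (1 + step)
    return g, sum(round(abs(x) / g) for x in spectrum)
```

The distortion-minimizing choice of where the extra or removed pulses land is handled separately (see the single-pulse search that follows in the text).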
- the distortion function for TCQ may be obtained as a sum of squared distances between a quantized value and an un-quantized value in each band, as shown in Equation 4.
- p i denotes an actual value
- q i denotes a quantized value
- a distortion function for USQ may use a Euclidean distance to determine a best quantized value.
- a modified equation including a scaling factor may be used to minimize computational complexity, and the distortion function may be calculated by Equation 5.
- a predetermined number of pulses may need to be increased or decreased while maintaining a minimal metric. This may be performed in an iterative manner by adding or deleting a single pulse and then repeating until the number of pulses reaches the required value.
- n distortion values need to be obtained to select the optimum distortion value.
- a distortion value j may correspond to addition of a pulse to a jth position in a band as shown in Equation 6.
- instead of directly evaluating Equation 6, a deviation may be used, as shown in Equation 7.
- Equation 7 uses the accumulated terms Σ_{i=1}^{n} q_i^2 , Σ_{i=1}^{n} q_i p_i , and Σ_{i=1}^{n} p_i^2 .
- n denotes a band length, i.e., the number of coefficients in a band
- p denotes an original signal, i.e., an input signal of a quantizer
- q denotes a quantized signal
- g denotes a scaling factor.
- a position j where a distortion d is minimized may be selected, thereby updating q_j .
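The single-pulse update above can be written directly as a search over positions; magnitudes are assumed non-negative and sign handling is omitted, and the squared-error form stands in for Equations 6 and 7:

```python
def add_best_pulse(p, q, g):
    """Try adding one unit pulse at each position j of the quantized signal q
    and keep the placement minimizing the squared error between the original
    signal p and the scaled quantized signal g*q. Mutates q; returns j."""
    def dist(qv):
        return sum((pi - g * qi) ** 2 for pi, qi in zip(p, qv))
    best_j = min(range(len(q)),
                 key=lambda j: dist(q[:j] + [q[j] + 1] + q[j + 1:]))
    q[best_j] += 1
    return best_j
```

Deleting a pulse follows the same pattern with `q[j] - 1`, and the loop repeats until the pulse count reaches the required value.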
- encoding may be performed by using a scaled spectral coefficient and selecting an appropriate ISC.
- a spectral component for quantization may be selected using bit allocation for each band.
- the spectral component may be selected based on various combinations according to distribution and variance of spectral components.
- actual non-zero positions may be calculated.
- a non-zero position may be obtained by analyzing an amount of scaling and a redistribution operation, and such a selected non-zero position may be referred to as an ISC.
- an optimal scaling factor and non-zero position information corresponding to ISCs may be obtained by analyzing a magnitude of a signal which has undergone a scaling and redistribution process.
- the non-zero position information indicates the number and locations of non-zero positions. If the number of pulses is not controlled through the scaling and redistribution process, selected pulses may be quantized through a TCQ process, and surplus bits may be adjusted using a result of the quantization. This process may be illustrated as follows.
- surplus bits may be adjusted through actual TCQ quantization.
- a TCQ quantization process is first performed to adjust surplus bits.
- depending on a result of the quantization, the scaling factor determined before the actual TCQ quantization is increased by multiplying it by a value, e.g., 1.1, greater than 1, or is decreased by multiplying it by a value, e.g., 0.9, less than 1.
- surplus bits are updated by calculating bits used in the actual TCQ quantization process.
- a non-zero position obtained by this process may correspond to an ISC.
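The surplus-bit loop above can be sketched as follows. The TCQ quantizer and its bit counter are passed in as caller-supplied stand-ins, and, following the divisor convention assumed for the scaling factor earlier, remaining surplus bits shrink the factor by 0.9 to spend them while an overshoot grows it by 1.1 (which multiplier applies to which case depends on the quantizer's scaling convention and is an assumption here):

```python
def adjust_surplus_bits(spectrum, g, allocated_bits, quantize, count_bits,
                        max_iter=10):
    """Iteratively quantize, count the bits actually consumed, and adjust
    the scaling factor by the 1.1 / 0.9 multipliers from the text until
    the surplus reaches zero or the iteration cap is hit."""
    for _ in range(max_iter):
        q = quantize(spectrum, g)          # stand-in for actual TCQ quantization
        used = count_bits(q)               # stand-in for the lossless coder's cost
        surplus = allocated_bits - used
        if surplus == 0:
            break
        g *= 0.9 if surplus > 0 else 1.1   # more pulses when bits remain
    return g, surplus
```

With a toy quantizer `round(|x| / g)` and bits approximated by the pulse total, the loop converges on the allocated budget.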
- the ISC encoding unit 1040 may encode information on the number of finally selected ISCs and information on non-zero positions. In this process, lossless encoding may be applied to enhance encoding efficiency.
- the ISC encoding unit 1040 may perform encoding using a selected quantizer for a non-zero band of which allocated bits are non zero.
- the ISC encoding unit 1040 may select ISCs for each band with respect to a normalized spectrum and encode information about the selected ISCs based on number, position, magnitude, and sign. In this case, an ISC magnitude may be encoded in a manner other than number, position, and sign.
- the ISC magnitude may be quantized using one of USQ and TCQ and arithmetic-coded, whereas the number, positions, and signs of the ISCs may be arithmetic-coded.
- one of TCQ and USQ may be selected based on a signal characteristic.
- a first joint scheme in which a quantizer is selected by additionally performing secondary bit allocation processing on surplus bits from a previously coded band in addition to original bit allocation information for each band may be used.
- the secondary bit allocation processing in the first joint scheme may distribute the surplus bits from the previously coded band and may detect two bands that will be encoded separately.
- the signal characteristic may include a bit allocated to each band or a band length.
- if the average number of bits allocated to each sample included in a band is greater than or equal to a threshold value, e.g., 0.75, it may be determined that the corresponding band includes very important information, and thus USQ may be used; otherwise, TCQ may be used. Even in a case of a low band having a short band length, USQ may be used in accordance with circumstances.
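The quantizer selection rule above can be sketched as a small helper. The 0.75 threshold on average bits per sample is taken from the text; the function name and the exact comparison are illustrative assumptions, and the short-low-band exception is omitted.

```python
def select_quantizer(allocated_bits, band_length, threshold=0.75):
    """Sketch of the quantizer choice described in the text: when the
    average number of bits allocated per sample in a band reaches the
    threshold (0.75 in the example), the band is treated as carrying very
    important information and USQ is selected; otherwise TCQ is used."""
    bits_per_sample = allocated_bits / band_length
    return "USQ" if bits_per_sample >= threshold else "TCQ"
```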
- the bandwidth of an input signal is an NB or a WB
- the first joint scheme may be used.
- the second joint scheme in which all bands may be coded by using USQ and TCQ is used for a least significant bit (LSB).
- the quantized component restoring unit 1050 may restore an actual quantized component by adding ISC position, magnitude, and sign information to a quantized component.
- zero may be allocated to a spectral coefficient of a zero position, i.e., a spectral coefficient encoded to zero.
- the inverse scaling unit 1060 may output a quantized spectral coefficient of the same level as that of a normalized input spectrum by inversely scaling the restored quantized component.
- the scaling unit 1030 and the inverse scaling unit 1060 may use the same scaling factor.
- FIG. 11 is a block diagram illustrating a configuration of an ISC encoding apparatus according to an exemplary embodiment.
- the apparatus shown in FIG. 11 may include an ISC selecting unit 1110 and an ISC information encoding unit 1130 .
- the apparatus of FIG. 11 may correspond to the ISC encoding unit 1040 of FIG. 10 or may be implemented as an independent apparatus.
- the ISC selecting unit 1110 may select ISCs based on a predetermined criterion from a scaled spectrum to adjust a bit rate.
- the ISC selecting unit 1110 may obtain actual non-zero positions by analyzing a degree of scaling from the scaled spectrum.
- the ISCs may correspond to actual non-zero spectral coefficients before scaling.
- the ISC selecting unit 1110 may select spectral coefficients to be encoded, i.e., non-zero positions, by taking into account distribution and variance of spectral coefficients based on bits allocated for each band. TCQ may be used for the ISC selection.
- the ISC information encoding unit 1130 may encode ISC information, i.e., number information, position information, magnitude information, and signs of the ISCs based on the selected ISCs.
- FIG. 12 is a block diagram illustrating a configuration of an ISC information encoding apparatus according to an exemplary embodiment.
- the apparatus shown in FIG. 12 may include a position information encoding unit 1210 , a magnitude information encoding unit 1230 , and a sign encoding unit 1250 .
- the position information encoding unit 1210 may encode position information of the ISCs selected by the ISC selection unit ( 1110 of FIG. 11 ), i.e., position information of the non-zero spectral coefficients.
- the position information may include the number and positions of the selected ISCs.
- Arithmetic coding may be used for the encoding on the position information.
- a new buffer may be configured by collecting the selected ISCs. For the ISC collection, zero bands and non-selected spectra may be excluded.
- the magnitude information encoding unit 1230 may encode magnitude information of the newly configured ISCs.
- quantization may be performed by selecting one of TCQ and USQ, and arithmetic coding may be additionally performed in succession.
- non-zero position information and the number of ISCs may be used.
- the sign information encoding unit 1250 may encode sign information of the selected ISCs. Arithmetic coding may be used for the encoding on the sign information.
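The ISC information split described above (number, positions, magnitudes, signs) can be sketched as follows. This is a minimal illustration of how the selected non-zero coefficients might be separated into the four pieces of information; the function name and representation (sign as 0/1) are assumptions, and the arithmetic coding stage is not shown.

```python
def split_isc_info(spectrum, selected_positions):
    """Collects the selected ISCs from the spectrum and separates them
    into the information the encoder handles: the number of ISCs, their
    positions, their magnitudes, and their signs (0 for non-negative,
    1 for negative).  Zero bands and non-selected spectral lines are
    excluded, as described in the text."""
    iscs = [spectrum[p] for p in selected_positions]
    magnitudes = [abs(c) for c in iscs]
    signs = [0 if c >= 0 else 1 for c in iscs]
    return len(iscs), list(selected_positions), magnitudes, signs
```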
- FIG. 13 is a block diagram illustrating a configuration of a spectrum encoding apparatus according to another exemplary embodiment.
- the apparatus shown in FIG. 13 may correspond to the spectrum quantizing and encoding unit 750 of FIG. 7 or may be included in another frequency domain encoding apparatus or independently implemented.
- the apparatus shown in FIG. 13 may include a scaling unit 1330 , an ISC encoding unit 1340 , a quantized component restoring unit 1350 , and an inverse scaling unit 1360 .
- as compared with FIG. 10, an operation of each component is the same except that the zero encoding unit 1020 and the encoding method selection unit 1010 are omitted, and the ISC encoding unit 1340 uses TCQ.
- FIG. 14 is a block diagram illustrating a configuration of a spectrum encoding apparatus according to another exemplary embodiment.
- the apparatus shown in FIG. 14 may correspond to the spectrum quantizing and encoding unit 750 of FIG. 7 or may be included in another frequency domain encoding apparatus or independently implemented.
- the apparatus shown in FIG. 14 may include an encoding method selection unit 1410 , a scaling unit 1430 , an ISC encoding unit 1440 , a quantized component restoring unit 1450 , and an inverse scaling unit 1460 .
- as compared with FIG. 10, an operation of each component is the same except that the zero encoding unit 1020 is omitted.
- FIG. 15 illustrates a concept of an ISC collecting and encoding process, according to an exemplary embodiment.
- zero bands, i.e., bands to be quantized to zero, may be excluded, and a new buffer may be configured by using ISCs selected from among spectral components existing in non-zero bands.
- quantization may be performed on the newly configured ISCs in a band unit by using the first or the second joint scheme combining USQ and TCQ, and corresponding lossless encoding may be performed.
- FIG. 16 illustrates a second joint scheme combining USQ and TCQ.
- quantization may be performed on spectral data in a band unit by using USQ.
- Each quantized spectral data that is greater than one (1) may contain an LSB which is zero or one.
- a sequence of LSBs may be obtained and then be quantized by using TCQ to find the best match between the sequence of LSBs and available trellis paths.
- the advantages of both quantizers, i.e., USQ and TCQ, may be used in one scheme, and the path limitation may be excluded from TCQ.
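The LSB-splitting step of the second joint scheme can be sketched as follows. Per the text, quantized spectral data greater than one contains an LSB of zero or one; here that LSB is stripped into a sequence destined for TCQ while the upper bits remain with USQ. The function name and the exact rule for which magnitudes contribute an LSB are illustrative assumptions.

```python
def split_magnitudes_for_joint_coding(quantized):
    """Second joint scheme sketch: for each USQ-quantized magnitude
    greater than one, the least significant bit is collected into the
    sequence later quantized by TCQ, and the remaining upper bits are
    kept for arithmetic coding of the magnitude information."""
    upper_bits, lsb_sequence = [], []
    for m in quantized:
        if m > 1:                       # only magnitudes > 1 carry an LSB, per the text
            lsb_sequence.append(m & 1)  # LSB goes to the TCQ input sequence
            upper_bits.append(m >> 1)   # upper bits stay with the USQ path
        else:
            upper_bits.append(m)
    return upper_bits, lsb_sequence
```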
- FIG. 17 is a block diagram of a spectrum encoding apparatus according to another exemplary embodiment.
- the apparatus shown in FIG. 17 may correspond to the ISC encoding unit 1040 of FIG. 10 or independently implemented.
- the spectrum encoding apparatus shown in FIG. 17 may include a first quantization unit 1710 , a second quantization unit 1730 , a first lossless coding unit 1750 , a second lossless coding unit 1760 , a third lossless coding unit 1770 and a bitstream generating unit 1790 .
- the components may be integrated in at least one processor.
- the first quantization unit 1710 may quantize spectral data of a band, i.e. a non-zero band by using USQ.
- the number of bits allocated for quantization of each band may be determined in advance. In this case, the number of bits which will be used for TCQ in the second quantization unit 1730 may be extracted from each non-zero band evenly, and then USQ may be performed on the band by using the remaining number of bits in the non-zero band.
- the spectral data may be norms or normalized spectral data.
- the second quantization unit 1730 may quantize a lower bit of a quantized spectral data from the first quantization unit 1710 , by using TCQ.
- the lower bit may be an LSB.
- the lower bit i.e. residual data may be collected and then TCQ may be performed on the residual data.
- residual data may be collected as the difference between the quantized and un-quantized spectral data. If some frequencies are quantized as zero in a non-zero band, they may not be included into residual data.
- the residual data may construct an array.
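The residual collection described above can be sketched as a short helper. Per the text, residual data is the difference between the un-quantized and quantized spectral data, gathered into an array, with frequencies quantized to zero in a non-zero band left out; the function name and signature are illustrative.

```python
def build_residual_array(original, quantized):
    """Collects residual data as the difference between the original
    (un-quantized) and quantized spectral data for a non-zero band.
    Frequencies quantized to zero are not included in the residual
    array, as stated in the text."""
    return [o - q for o, q in zip(original, quantized) if q != 0]
```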
- the first lossless coding unit 1750 may perform lossless coding on information about ISCs included in a band, e.g. a number, a position and a sign of the ISCs. According to an embodiment, arithmetic coding may be used.
- the second lossless coding unit 1760 may perform lossless coding on magnitude information which is constructed by the remaining bit except for the lower bit in the quantized spectral data. According to an embodiment, arithmetic coding may be used.
- the third lossless coding unit 1770 may perform lossless coding on TCQ information, i.e. trellis path data obtained from a quantization result of the second quantization unit 1730 .
- arithmetic coding may be used.
- the trellis path data may be encoded as equi-probable symbols.
- the trellis path data is a binary sequence and may be encoded using an arithmetic encoder with a uniform probability model.
- the bitstream generating unit 1790 may generate a bitstream by using data provided from the first to third lossless coding units 1750 , 1760 and 1770 .
- FIG. 18 is a block diagram of a second quantization unit of FIG. 17 according to an exemplary embodiment.
- the second quantization unit shown in FIG. 18 may include a lower bit obtaining unit 1810 , a residual data generating unit 1830 and a TCQ unit 1850 .
- the components may be integrated in at least one processor.
- the lower bit obtaining unit 1810 may extract residual data based on the difference between the quantized non-zero spectral data provided from the first quantization unit 1710 and original non-zero spectral data.
- the residual data may correspond to a lower bit of the quantized non-zero spectral data, e.g. an LSB.
- the residual data generating unit 1830 may construct a residual array by collecting the difference between the quantized non-zero spectral data and the original non-zero spectral data for all non-zero bands.
- FIG. 19 illustrates a method of generating the residual data.
- the TCQ unit 1850 may perform TCQ on the residual array provided from the residual data generating unit 1830 .
- the residual array may be quantized by TCQ with a known code rate 1/2 (7,5)₈ convolutional code.
- FIG. 20 illustrates an example of TCQ having four states.
- quantization using TCQ may be performed for the first 2×TCQ_AMP magnitudes.
- the constant TCQ_AMP is defined as 10, which allows up to 20 magnitudes per frame to be encoded.
- path metrics may be checked and the best one may be selected. For lossless coding, data for the best trellis path may be stored in a separate array while a trace back procedure is performed.
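The trellis search described above can be illustrated with a small Viterbi-style sketch over the four-state rate-1/2 code with octal generators (7,5). Each trellis step emits two code bits that are matched against two collected LSBs; path metrics are compared, the best path is selected, and its input bits are recovered by trace-back for lossless coding. This is an illustration of the technique under stated assumptions, not the bit-exact patented algorithm.

```python
def tcq_encode_lsbs(lsbs):
    """Viterbi-style search over a 4-state (7,5) octal trellis: each step
    consumes two target LSBs, each branch emits two code bits, and the
    path with the smallest Hamming cost is traced back.  Returns the
    trellis path (input) bits, the emitted bit sequence, and the cost."""
    n = len(lsbs) // 2                        # two LSBs per trellis step
    INF = float("inf")
    metrics = [0.0, INF, INF, INF]            # start in state 0
    history = []
    for t in range(n):
        target = (lsbs[2 * t], lsbs[2 * t + 1])
        new_metrics = [INF] * 4
        step = [None] * 4                     # (prev_state, input_bit, out_bits)
        for s in range(4):
            if metrics[s] == INF:
                continue
            s1, s0 = (s >> 1) & 1, s & 1
            for u in (0, 1):
                c1 = u ^ s1 ^ s0              # generator 7 = 111 (octal)
                c2 = u ^ s0                   # generator 5 = 101 (octal)
                ns = (u << 1) | s1            # shift the input into the register
                cost = metrics[s] + (c1 != target[0]) + (c2 != target[1])
                if cost < new_metrics[ns]:
                    new_metrics[ns] = cost
                    step[ns] = (s, u, (c1, c2))
        metrics = new_metrics
        history.append(step)
    # trace back from the best final state, storing path data for lossless coding
    state = min(range(4), key=lambda s: metrics[s])
    best_cost = metrics[state]
    path_bits, emitted = [], []
    for step in reversed(history):
        prev, u, out = step[state]
        path_bits.append(u)
        emitted[:0] = out                     # prepend this edge's two bits
        state = prev
    path_bits.reverse()
    return path_bits, emitted, best_cost
```

When the LSB sequence happens to coincide with a codeword of the trellis, the cost is zero and the sequence is represented exactly by the path bits alone.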
- FIG. 21 is a block diagram illustrating a configuration of a frequency domain audio decoding apparatus according to an exemplary embodiment.
- a frequency domain audio decoding apparatus 2100 shown in FIG. 21 may include a frame error detecting unit 2110 , a frequency domain decoding unit 2130 , a time domain decoding unit 2150 , and a post-processing unit 2170 .
- the frequency domain decoding unit 2130 may include a spectrum decoding unit 2131 , a memory update unit 2133 , an inverse transform unit 2135 , and an overlap and add (OLA) unit 2137 .
- Each component may be integrated in at least one module and implemented by at least one processor (not shown).
- the frame error detecting unit 2110 may detect whether a frame error has occurred from a received bitstream.
- the frequency domain decoding unit 2130 may operate when an encoding mode is a music mode or a frequency domain mode, enable an FEC or PLC algorithm when a frame error has occurred, and generate a time domain signal through a general transform decoding process when no frame error has occurred.
- the spectrum decoding unit 2131 may synthesize a spectral coefficient by performing spectrum decoding using a decoded parameter. The spectrum decoding unit 2131 will be described in more detail with reference to FIGS. 19 and 20.
- the memory update unit 2133 may update a synthesized spectral coefficient for a current frame that is a normal frame, information obtained using a decoded parameter, the number of continuous error frames till the present, a signal characteristic of each frame, frame type information, or the like for a subsequent frame.
- the signal characteristic may include a transient characteristic and a stationary characteristic
- the frame type may include a transient frame, a stationary frame, or a harmonic frame.
- the inverse transform unit 2135 may generate a time domain signal by performing time-frequency inverse transform on the synthesized spectral coefficient.
- the OLA unit 2137 may perform OLA processing by using a time domain signal of a previous frame, generate a final time domain signal for a current frame as a result of the OLA processing, and provide the final time domain signal to the post-processing unit 2170 .
- the time domain decoding unit 2150 may operate when the encoding mode is a voice mode or a time domain mode, enable the FEC or PLC algorithm when a frame error has occurred, and generate a time domain signal through a general CELP decoding process when no frame error has occurred.
- the post-processing unit 2170 may perform filtering or up-sampling on the time domain signal provided from the frequency domain decoding unit 2130 or the time domain decoding unit 2150 but is not limited thereto.
- the post-processing unit 2170 may provide a restored audio signal as an output signal.
- FIG. 22 is a block diagram illustrating a configuration of a spectrum decoding apparatus according to an exemplary embodiment.
- the apparatus 2200 shown in FIG. 22 may correspond to the spectrum decoding unit 2131 of FIG. 21 or may be included in another frequency domain decoding apparatus or independently implemented.
- a spectrum decoding apparatus 2200 shown in FIG. 22 may include an energy decoding and inverse quantizing unit 2210 , a bit allocator 2230 , a spectrum decoding and inverse quantizing unit 2250 , a noise filler 2270 , and a spectrum shaping unit 2290 .
- the noise filler 2270 may be located at a rear end of the spectrum shaping unit 2290 .
- Each component may be integrated in at least one module and implemented by at least one processor (not shown).
- the energy decoding and inverse quantizing unit 2210 may lossless-decode energy such as a parameter for which lossless encoding has been performed in an encoding process, e.g., a Norm value, and inverse-quantize the decoded Norm value.
- the inverse quantization may be performed using a scheme corresponding to a quantization scheme for the Norm value in the encoding process.
- the bit allocator 2230 may allocate bits of a number required for each sub-band based on a quantized Norm value or the inverse-quantized Norm value.
- the number of bits allocated for each sub-band may be the same as the number of bits allocated in the encoding process.
- the spectrum decoding and inverse quantizing unit 2250 may generate a normalized spectral coefficient by lossless-decoding an encoded spectral coefficient using the number of bits allocated for each sub-band and performing an inverse quantization process on the decoded spectral coefficient.
- the noise filler 2270 may fill noise into portions requiring noise filling for each sub-band among the normalized spectral coefficients.
- the spectrum shaping unit 2290 may shape the normalized spectral coefficient by using the inverse-quantized Norm value. A finally decoded spectral coefficient may be obtained through a spectral shaping process.
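The shaping step above can be sketched as follows. The multiplicative form (scaling each normalized coefficient by its sub-band's inverse-quantized Norm) and the band-boundary representation are illustrative assumptions.

```python
def shape_spectrum(normalized_coeffs, norms, band_bounds):
    """Spectrum shaping sketch: each normalized spectral coefficient is
    scaled by the inverse-quantized Norm value of the sub-band it
    belongs to, producing the finally decoded spectral coefficients.
    band_bounds gives (start, end) index pairs per sub-band."""
    shaped = []
    for (start, end), norm in zip(band_bounds, norms):
        shaped.extend(c * norm for c in normalized_coeffs[start:end])
    return shaped
```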
- FIG. 23 is a block diagram illustrating a configuration of a spectrum inverse-quantization apparatus according to an exemplary embodiment.
- the apparatus shown in FIG. 23 may include an inverse quantizer selecting unit 2310 , a USQ 2330 , and a TCQ 2350 .
- the inverse quantizer selecting unit 2310 may select the most efficient inverse quantizer from among various inverse quantizers according to characteristics of an input signal, i.e., a signal to be inverse-quantized. Bit allocation information for each band, band size information, and the like are usable as the characteristics of the input signal. According to a result of the selection, the signal to be inverse-quantized may be provided to one of the USQ 2330 and the TCQ 2350 so that corresponding inverse quantization is performed.
- FIG. 23 may correspond to the second joint scheme.
- FIG. 24 is a block diagram illustrating a configuration of a spectrum decoding apparatus according to an exemplary embodiment.
- the apparatus shown in FIG. 24 may correspond to the spectrum decoding and inverse quantizing unit 2250 of FIG. 22 or may be included in another frequency domain decoding apparatus or independently implemented.
- the apparatus shown in FIG. 24 may include a decoding method selecting unit 2410 , a zero decoding unit 2430 , an ISC decoding unit 2450 , a quantized component restoring unit 2470 , and an inverse scaling unit 2490 .
- the quantized component restoring unit 2470 and the inverse scaling unit 2490 may be optionally provided.
- the decoding method selecting unit 2410 may select a decoding method based on bits allocated for each band.
- a normalized spectrum may be provided to the zero decoding unit 2430 or the ISC decoding unit 2450 based on the decoding method selected for each band.
- the zero decoding unit 2430 may decode all samples to zero for bands of which allocated bits are zero.
- the ISC decoding unit 2450 may decode bands of which allocated bits are not zero, by using a selected inverse quantizer.
- the ISC decoding unit 2450 may obtain information about important frequency components for each band of an encoded spectrum and decode the information about the important frequency components obtained for each band, based on number, position, magnitude, and sign.
- An important frequency component magnitude may be decoded in a manner other than number, position, and sign.
- the important frequency component magnitude may be arithmetic-decoded and inverse-quantized using one of USQ and TCQ, whereas the number, positions, and signs of the important frequency components may be arithmetic-decoded.
- the selection of an inverse quantizer may be performed using the same result as in the ISC encoding unit 1040 shown in FIG. 10 .
- the ISC decoding unit 2450 may inverse-quantize the bands of which allocated bits are not zero, based on the first joint scheme or the second joint scheme.
- the quantized component restoring unit 2470 may restore actual quantized components based on position, magnitude, and sign information of restored ISCs.
- zero may be allocated to zero positions, i.e., non-quantized portions which are spectral coefficients decoded to zero.
- the inverse scaling unit (not shown) may be further included to inversely scale the restored quantized components to output quantized spectral coefficients of the same level as the normalized spectrum.
- FIG. 25 is a block diagram illustrating a configuration of an ISC decoding apparatus according to an exemplary embodiment.
- the apparatus shown in FIG. 25 may include a pulse-number estimation unit 2510 and an ISC information decoding unit 2530 .
- the apparatus shown in FIG. 25 may correspond to the ISC decoding unit 2450 of FIG. 24 or may be implemented as an independent apparatus.
- the pulse-number estimation unit 2510 may determine an estimated value of the number of pulses required for a current band by using a band size and bit allocation information. That is, since bit allocation information of a current frame is the same as that of an encoder, decoding is performed by using the same bit allocation information to derive the same estimated value of the number of pulses.
- the ISC information decoding unit 2530 may decode ISC information, i.e., number information, position information, magnitude information, and signs of ISCs based on the estimated number of pulses.
- FIG. 26 is a block diagram illustrating a configuration of an ISC information decoding apparatus according to an exemplary embodiment.
- the apparatus shown in FIG. 26 may include a position information decoding unit 2610 , a magnitude information decoding unit 2630 , and a sign decoding unit 2650 .
- the position information decoding unit 2610 may restore the number and positions of ISCs by decoding an index related to position information, which is included in a bitstream. Arithmetic decoding may be used to decode the position information.
- the magnitude information decoding unit 2630 may arithmetic-decode an index related to magnitude information, which is included in the bitstream, and inverse-quantize the decoded index based on the first joint scheme or the second joint scheme. To increase efficiency of the arithmetic decoding, non-zero position information and the number of ISCs may be used.
- the sign decoding unit 2650 may restore signs of the ISCs by decoding an index related to sign information, which is included in the bitstream. Arithmetic decoding may be used to decode the sign information. According to an embodiment, the number of pulses required for a non-zero band may be estimated and used to decode the position information, the magnitude information, or the sign information.
- FIG. 27 is a block diagram illustrating a configuration of a spectrum decoding apparatus according to another exemplary embodiment.
- the apparatus shown in FIG. 27 may correspond to the spectrum decoding and inverse quantizing unit 2250 of FIG. 22 or may be included in another frequency domain decoding apparatus or independently implemented.
- the apparatus shown in FIG. 27 may include an ISC decoding unit 2750 , a quantized component restoring unit 2770 , and an inverse scaling unit 2790 .
- as compared with FIG. 24, an operation of each component is the same except that the decoding method selecting unit 2410 and the zero decoding unit 2430 are omitted, and the ISC decoding unit 2750 uses TCQ.
- FIG. 28 is a block diagram illustrating a configuration of a spectrum decoding apparatus according to another exemplary embodiment.
- the apparatus shown in FIG. 28 may correspond to the spectrum decoding and inverse quantizing unit 2250 of FIG. 22 or may be included in another frequency domain decoding apparatus or independently implemented.
- the apparatus shown in FIG. 28 may include a decoding method selection unit 2810 , an ISC decoding unit 2850 , a quantized component restoring unit 2870 , and an inverse scaling unit 2890 . As compared with FIG. 24 , an operation of each component is the same except that the zero decoding unit 2430 is omitted.
- FIG. 29 is a block diagram of a spectrum decoding apparatus according to another exemplary embodiment.
- the apparatus shown in FIG. 29 may correspond to the ISC decoding unit 2450 of FIG. 24 , or may be independently implemented.
- the apparatus shown in FIG. 29 may include a first decoding unit 2910 , a second decoding unit 2930 , a third decoding unit 2950 and a spectrum component restoring unit 2970 .
- the first decoding unit 2910 may extract ISC information of a band from a bitstream and may decode number, position and sign of ISCs. The remaining bits except for a lower bit may be extracted and then be decoded.
- the decoded ISC information may be provided to the spectrum component restoring unit 2970 and position information of ISCs may be provided to the second decoding unit 2930 .
- the second decoding unit 2930 may decode the remaining bits except for a lower bit from the spectral data for each band, based on the position information of the decoded ISCs provided from the first decoding unit 2910 and bit allocation of each band.
- the surplus bits corresponding to a difference between the allocated bits of a band and the actually used bits of the band may be accumulated and then be used for a next band.
- the third decoding unit 2950 may restore a TCQ residual array corresponding to the sequence of lower bits by decoding the TCQ path information extracted from the bitstream.
- the spectrum component restoring unit 2970 may reconstruct spectrum components based on data provided from the first decoding unit 2910 , the second decoding unit 2930 and the third decoding unit 2950 .
- the first to third decoding units 2910 , 2930 and 2950 may use arithmetic decoding for lossless decoding.
- FIG. 30 is a block diagram of a third decoding unit of FIG. 29 according to another exemplary embodiment.
- the third decoding unit shown in FIG. 30 may include a TCQ path decoding unit 3010 and a TCQ residual restoring unit 3030 .
- the TCQ path decoding unit 3010 may decode TCQ path information obtained from the bitstream.
- the TCQ residual restoring unit 3030 may restore TCQ residual data based on the decoded TCQ path information.
- the residual data, i.e., a residual array, may be reconstructed according to a decoded trellis state. From each path bit, two LSB bits may be generated in the residual array. This process may be represented by the following pseudo code.
- the decoder may move through the trellis using decoded dpath bits, and may extract two bits corresponding to the current trellis edge.
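The pseudo code referenced above is not reproduced in this excerpt; the following is a hedged reconstruction of the described process. The decoder walks the four-state trellis using the decoded path bits and, for each trellis edge, extracts the two output bits into the residual array. The (7,5) octal generators and state-update rule are assumptions consistent with the rate-1/2 code mentioned earlier.

```python
def tcq_reconstruct_residual(path_bits):
    """Reconstructs the residual (LSB) array from decoded TCQ path bits:
    starting from trellis state 0, each path bit selects an edge whose
    two output bits are appended to the residual array, after which the
    decoder moves to the next trellis state."""
    state, residual = 0, []
    for u in path_bits:
        s1, s0 = (state >> 1) & 1, state & 1
        residual.append(u ^ s1 ^ s0)   # first edge output bit (generator 7, octal)
        residual.append(u ^ s0)        # second edge output bit (generator 5, octal)
        state = (u << 1) | s1          # move through the trellis
    return residual
```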
- the configurations of FIGS. 29 and 30 may have a reversible relationship to the configurations of FIGS. 17 and 18.
- FIG. 31 is a block diagram of a multimedia device including an encoding module, according to an exemplary embodiment.
- the multimedia device 3100 may include a communication unit 3110 and the encoding module 3130 .
- the multimedia device 3100 may further include a storage unit 3150 for storing an audio bitstream obtained as a result of encoding according to the usage of the audio bitstream.
- the multimedia device 3100 may further include a microphone 3170 . That is, the storage unit 3150 and the microphone 3170 may be optionally included.
- the multimedia device 3100 may further include an arbitrary decoding module (not shown), e.g., a decoding module for performing a general decoding function or a decoding module according to an exemplary embodiment.
- the encoding module 3130 may be implemented by at least one processor (not shown) by being integrated with other components (not shown) included in the multimedia device 3100 as one body.
- the communication unit 3110 may receive at least one of an audio signal or an encoded bitstream provided from the outside or may transmit at least one of a reconstructed audio signal or an encoded bitstream obtained as a result of encoding in the encoding module 3130 .
- the communication unit 3110 is configured to transmit and receive data to and from an external multimedia device or a server through a wireless network, such as wireless Internet, wireless intranet, a wireless telephone network, a wireless Local Area Network (LAN), Wi-Fi, Wi-Fi Direct (WFD), third generation (3G), fourth generation (4G), Bluetooth, Infrared Data Association (IrDA), Radio Frequency Identification (RFID), Ultra WideBand (UWB), Zigbee, or Near Field Communication (NFC), or a wired network, such as a wired telephone network or wired Internet.
- the encoding module 3130 may quantize spectral data of a current band based on a first quantization scheme, generate a lower bit of the current band using the spectral data and the quantized spectral data, quantize a sequence of lower bits including the lower bit of the current band based on a second quantization scheme, and generate a bitstream based on an upper bit excluding N bits, where N is 1 or greater, from the quantized spectral data and the quantized sequence of lower bits.
- the storage unit 3150 may store the encoded bitstream generated by the encoding module 3130 . In addition, the storage unit 3150 may store various programs required to operate the multimedia device 3100 .
- the microphone 3170 may provide an audio signal from a user or the outside to the encoding module 3130 .
- FIG. 32 is a block diagram of a multimedia device including a decoding module, according to an exemplary embodiment.
- the multimedia device 3200 may include a communication unit 3210 and a decoding module 3230 .
- the multimedia device 3200 may further include a storage unit 3250 for storing the reconstructed audio signal.
- the multimedia device 3200 may further include a speaker 3270 . That is, the storage unit 3250 and the speaker 3270 may be optionally included.
- the multimedia device 3200 may further include an encoding module (not shown), e.g., an encoding module for performing a general encoding function or an encoding module according to an exemplary embodiment.
- the decoding module 3230 may be implemented by at least one processor (not shown) by being integrated with other components (not shown) included in the multimedia device 3200 as one body.
- the communication unit 3210 may receive at least one of an audio signal or an encoded bitstream provided from the outside or may transmit at least one of a reconstructed audio signal obtained as a result of decoding in the decoding module 3230 or an audio bitstream obtained as a result of encoding.
- the communication unit 3210 may be implemented substantially similarly to the communication unit 3110 of FIG. 31.
- the decoding module 3230 may receive a bitstream provided via the communication unit 3210 , decode a sequence of lower bits by extracting TCQ path information, decode number, position and sign of ISCs by extracting ISC information, extract and decode a remaining bit except for a lower bit, and reconstruct spectrum components based on the decoded sequence of lower bits and the decoded remaining bit except for the lower bit.
- the storage unit 3250 may store the reconstructed audio signal generated by the decoding module 3230 .
- the storage unit 3250 may store various programs required to operate the multimedia device 3200 .
- the speaker 3270 may output the reconstructed audio signal generated by the decoding module 3230 to the outside.
- FIG. 33 is a block diagram of a multimedia device including an encoding module and a decoding module, according to an exemplary embodiment.
- the multimedia device 3300 may include a communication unit 3310 , an encoding module 3320 , and a decoding module 3330 .
- the multimedia device 3300 may further include a storage unit 3340 for storing an audio bitstream obtained as a result of encoding or a reconstructed audio signal obtained as a result of decoding according to the usage of the audio bitstream or the reconstructed audio signal.
- the multimedia device 3300 may further include a microphone 3350 and/or a speaker 3360 .
- the encoding module 3320 and the decoding module 3330 may be implemented by at least one processor (not shown) by being integrated with other components (not shown) included in the multimedia device 3300 as one body.
- since the components of the multimedia device 3300 shown in FIG. 33 correspond to the components of the multimedia device 3100 shown in FIG. 31 or the components of the multimedia device 3200 shown in FIG. 32, a detailed description thereof is omitted.
- Each of the multimedia devices 3100 , 3200 , and 3300 shown in FIGS. 31, 32, and 33 may include a voice communication dedicated terminal, such as a telephone or a mobile phone, a broadcasting or music dedicated device, such as a TV or an MP3 player, or a hybrid terminal device of a voice communication dedicated terminal and a broadcasting or music dedicated device but are not limited thereto.
- each of the multimedia devices 3100, 3200, and 3300 may be used as a client, a server, or a transducer disposed between a client and a server.
- when the multimedia device 3100, 3200, or 3300 is, for example, a mobile phone, it may further include a user input unit, such as a keypad, a display unit for displaying information processed through a user interface of the mobile phone, and a processor for controlling the functions of the mobile phone.
- the mobile phone may further include a camera unit having an image pickup function and at least one component for performing a function required for the mobile phone.
- when the multimedia device 3100, 3200, or 3300 is, for example, a TV, it may further include a user input unit, such as a keypad, a display unit for displaying received broadcasting information, and a processor for controlling all functions of the TV.
- the TV may further include at least one component for performing a function of the TV.
- FIG. 34 is a flowchart illustrating a spectrum encoding method according to an exemplary embodiment.
- spectral data of a current band may be quantized by using a first quantization scheme.
- the first quantization scheme may be a scalar quantizer.
- the USQ having a uniform quantization step size may be used.
- a lower bit of the current band may be generated based on the spectral data and the quantized spectral data.
- the lower bit may be obtained based on a difference between the spectral data and the quantized spectral data.
- the second quantization scheme may be the TCQ.
- a sequence of the lower bits including the lower bit of the current band may be quantized by using the second quantization scheme.
- a bitstream may be generated based on the upper bits (excluding N bits, where N is a value greater than or equal to 1) of the quantized spectral data and on the quantized sequence of the lower bits.
- the bandwidth of the spectral data related to the spectrum encoding method of FIG. 34 may be a super-wide band (SWB) or a full band (FB).
- the spectral data may be obtained by performing MDCT on an input audio signal and may be coded in a normal mode.
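The flow of FIG. 34 can be sketched as follows. This is a minimal illustration, not the exact codec: the step size, the function names, and the treatment of the lower bits are assumptions, and the TCQ stage for the lower-bit sequence is omitted.

```python
import numpy as np

STEP = 0.5  # assumed uniform quantization step size for the USQ

def encode_band(spectral_data, n=1):
    # Operation 1: quantize the spectral data with a uniform scalar quantizer (USQ).
    q = np.round(np.abs(spectral_data) / STEP).astype(int)
    # Operation 2: take the N lower bits of each quantized magnitude; in the
    # described scheme these are generated from the difference between the
    # spectral data and the quantized data and then coded with TCQ (omitted here).
    lower_bits = q & ((1 << n) - 1)
    # Operation 4: the upper bits, excluding the N lower bits, go to the bitstream.
    upper = q >> n
    return upper, lower_bits
```

For example, for a band [1.3, -0.2] with N=1, the quantized magnitudes are [3, 0], the lower bits [1, 0], and the upper bits [1, 0].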
- FIG. 35 is a flowchart illustrating a spectrum decoding method according to an exemplary embodiment.
- ISC information may be extracted from the bitstream, and the number, positions, and signs of ISCs may be decoded. The remaining bits, except for a lower bit, may then be extracted and decoded.
- the sequence of the lower bits may be decoded by extracting TCQ path information from the bitstream.
- spectral components may be reconstructed based on the decoded remaining bits except for the lower bit by operation 3510 and the decoded sequence of the lower bits by operation 3530 .
- Some functions in respective components of the above decoding apparatus may be added into respective operations of FIG. 35 , according to circumstances or user's need.
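A minimal sketch of the reconstruction step, assuming the decoded lower bit re-enters as the least significant bit of the quantized magnitude and assuming a uniform step size of 0.5; both assumptions are illustrative, not the codec's exact rule.

```python
def decode_line(upper, lower_bit, n=1, step=0.5):
    # Reattach the decoded lower bit(s) as the least significant bit(s)
    # of the quantized magnitude, then undo the uniform quantization.
    q = (upper << n) | lower_bit
    return q * step
```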
- FIG. 36 is a block diagram of a bit allocation apparatus according to an exemplary embodiment.
- the apparatus shown in FIG. 36 may correspond to the bit allocator 516 of FIG. 5 , the bit allocator 730 of FIG. 7 or the bit allocation unit 2230 of FIG. 22 , or may be independently implemented.
- a bit allocation apparatus shown in FIG. 36 may include a bit estimation unit 3610 , a re-distributing unit 3630 and an adjusting unit 3650 , which may be integrated into at least one processor.
- for bit allocation used in spectrum quantization, fractional bit allocation may be used.
- bit allocation with fractional parts of, e.g., 3 bits may be permitted, and thus it is possible to perform a finer bit allocation.
- the fractional bit allocation may be used.
- the bit estimation unit 3610 may estimate initially allocated bits for each band based on average energy of a band, e.g. norms.
- the initially allocated bits R 0 (p,0) of a band may be estimated by Equation 8.
- L M (p) indicates the number of bits that corresponds to 1 bit/sample in a band p, and if a band includes 10 samples, L M (p) becomes 10 bits.
- TB is a total bit budget and Î M (i) indicates quantized norms of a band i.
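The norms above are per-band average energies of the spectral lines. A minimal sketch of computing such norms, with an assumed band layout and function name:

```python
import numpy as np

def band_norms(spectrum, band_sizes):
    # Norm of a band = average energy of the spectral lines in that band.
    norms, start = [], 0
    for size in band_sizes:
        band = spectrum[start:start + size]
        norms.append(float(np.mean(band ** 2)))
        start += size
    return norms
```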
- the re-distributing unit 3630 may re-distribute the initially allocated bits of each band based on a predetermined criterion.
- the fully allocated bits may be calculated as a starting point and the first-stage iterations may be done to re-distribute the allocated bits to the bands with non-zero bits until the number of fully allocated bits is equal to the total bit budget TB, which is represented by Equation 9.
- NSL 0 (k ⁇ 1) is the number of spectral lines in all bands with allocated bits after k iterations.
- the first minimum bit may consist of constant values depending on the band index and bit-rate.
- the re-distribution of bits may be done again to allocate bits to the bands with more than L M (p) bits.
- the value of L M (p) bits may correspond to the second minimum bits required for each band.
- the allocated bits R 1 (p,0) may be calculated based on the result of the first-stage iteration and the first and second minimum bit for each band, which is represented by Equation 10, as an example.
- R(p) is the allocated bits after the first-stage iterations
- bs is 2 at 24.4 kbps and 3 at 32 kbps, but is not limited thereto.
- TB may be updated by subtracting the number of bits in bands with L M (p) bits, and the band index p may be updated to p′ which indicates the band indices with higher bits than L M (p) bits.
- N bands may also be updated to N′ bands which is the number of bands for p′.
- the second-stage iterations may be then done until the updated TB (TB′) is equal to the number of bits in bands with more than L M (p′) bits, which is represented by Equation 11, as an example.
- NSL 1 (k ⁇ 1) denotes the number of spectral lines in all bands with more than L M (p′) bits after k iterations.
- the bits in bands with non-zero allocated bits from the highest bands may be set to zero until TB′ is equal to zero.
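A hedged sketch of this truncation step, zeroing allocations from the highest band downward until the total fits the budget (the description expresses the exact stopping rule in terms of TB′; the loop below is a simplification):

```python
def trim_to_budget(bits, budget):
    # Zero out allocated bits starting from the highest band until the
    # total allocation no longer exceeds the budget.
    bits = list(bits)
    p = len(bits) - 1
    while sum(bits) > budget and p >= 0:
        bits[p] = 0
        p -= 1
    return bits
```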
- a final re-distribution of over-allocated bits and under-allocated bits may be performed.
- the final re-distribution may be performed based on a predetermined reference value.
- the adjusting unit 3650 may adjust the fractional parts of the bit allocation result to be a predetermined bit.
- the fractional parts of the bit allocation result may be adjusted to have three bits, which may be represented by Equations 12.
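Keeping three fractional bits means each allocation becomes a multiple of 2^-3 = 1/8 bit. A minimal sketch of such an adjustment (not the exact Equation 12):

```python
def adjust_fractional(bits, frac_bits=3):
    # Round each allocation to a multiple of 2**-frac_bits (1/8 bit here).
    scale = 1 << frac_bits
    return [round(b * scale) / scale for b in bits]
```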
- FIG. 37 is a block diagram of a coding mode determination apparatus according to an exemplary embodiment.
- a coding mode determination apparatus shown in FIG. 37 may include a speech/music classifying unit 3710 and a correction unit 3730 .
- the apparatus shown in FIG. 37 may be included in the mode determiner 213 of FIG. 2A , the mode determiner 314 of FIG. 3A or the mode determiner 413 of FIG. 4A .
- the apparatus shown in FIG. 37 may be further included in the time domain coder 215 of FIG. 2A , the time domain excitation coder 316 of FIG. 3A or the time domain excitation coder 417 of FIG. 4A , or may be independently implemented.
- the components may be integrated into at least one module and implemented as at least one processor (not shown) except for a case where it is needed to be implemented to separate pieces of hardware.
- an audio signal may indicate a music signal, a speech signal, or a mixed signal of music and speech.
- the speech/music classifying unit 3710 may classify whether an audio signal corresponds to a music signal or a speech signal, based on various initial classification parameters.
- An audio signal classification process may include at least one operation.
- the audio signal may be classified as a music signal or a speech signal based on signal characteristics of a current frame and a plurality of previous frames.
- the signal characteristics may include at least one of a short-term characteristic and a long-term characteristic.
- the signal characteristics may include at least one of a time domain characteristic and a frequency domain characteristic.
- the audio signal may be coded using a transform coder.
- the transform coder may be, for example, a modified discrete cosine transform (MDCT) coder but is not limited thereto.
- an audio signal classification process may include a first operation of classifying an audio signal as a speech signal or a generic audio signal, i.e., a music signal, according to whether the audio signal has a speech characteristic, and a second operation of determining whether the generic audio signal is suitable for a generic signal audio coder (GSC). Whether the audio signal can be classified as a speech signal or a music signal may be determined by combining a classification result of the first operation and a classification result of the second operation. When the audio signal is classified as a speech signal, the audio signal may be encoded by a CELP-type coder.
- the CELP-type coder may include a plurality of modes among an unvoiced coding (UC) mode, a voiced coding (VC) mode, a transient coding (TC) mode, and a generic coding (GC) mode according to a bit rate or a signal characteristic.
- a generic signal audio coding (GSC) mode may be implemented by a separate coder or included as one mode of the CELP-type coder.
- When the audio signal is classified as a music signal, the audio signal may be encoded using the transform coder or a CELP/transform hybrid coder.
- the transform coder may be applied to a music signal
- the CELP/transform hybrid coder may be applied to a non-music signal, which is not a speech signal, or a signal in which music and speech are mixed.
- all of the CELP-type coder, the CELP/transform hybrid coder, and the transform coder may be used, or the CELP-type coder and the transform coder may be used.
- the CELP-type coder and the transform coder may be used for a narrow-band (NB), and the CELP-type coder, the CELP/transform hybrid coder, and the transform coder may be used for a wide-band (WB), a super-wide-band (SWB), and a full band (FB).
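The bandwidth-dependent coder sets just described can be tabulated directly; the string labels are illustrative only:

```python
def available_coders(bandwidth):
    # Coder set by bandwidth: NB uses the CELP-type and transform coders;
    # WB/SWB/FB additionally allow the CELP/transform hybrid (GSC).
    if bandwidth == "NB":
        return ["CELP", "transform"]
    if bandwidth in ("WB", "SWB", "FB"):
        return ["CELP", "CELP/transform hybrid", "transform"]
    raise ValueError("unknown bandwidth: " + bandwidth)
```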
- the CELP/transform hybrid coder is obtained by combining an LP-based coder which operates in a time domain and a transform domain coder, and may be also referred to as a generic signal audio coder (GSC).
- the signal classification of the first operation may be based on a Gaussian mixture model (GMM).
- Various signal characteristics may be used for the GMM. Examples of the signal characteristics may include open-loop pitch, normalized correlation, spectral envelope, tonal stability, signal's non-stationarity, LP residual error, spectral difference value, and spectral stationarity but are not limited thereto.
- Examples of signal characteristics used for the signal classification of the second operation may include spectral energy variation characteristic, tilt characteristic of LP analysis residual energy, high-band spectral peakiness characteristic, correlation characteristic, voicing characteristic, and tonal characteristic but are not limited thereto.
- the characteristics used for the first operation may be used to determine whether the audio signal has a speech characteristic or a non-speech characteristic in order to determine whether the CELP-type coder is suitable for encoding
- the characteristics used for the second operation may be used to determine whether the audio signal has a music characteristic or a non-music characteristic in order to determine whether the GSC is suitable for encoding.
- one set of frames classified as a music signal in the first operation may be changed to a speech signal in the second operation and then encoded by one of the CELP modes. That is, when the audio signal is a signal of large correlation or an attack signal while having a large pitch period and high stability, the audio signal may be changed from a music signal to a speech signal in the second operation.
- a coding mode may be changed according to a result of the signal classification described above.
- the correction unit 3730 may correct the classification result of the speech/music classifying unit 3710 based on at least one correction parameter.
- the correction unit 3730 may correct the classification result of the speech/music classifying unit 3710 based on a context. For example, when a current frame is classified as a speech signal, the current frame may be corrected to a music signal or maintained as the speech signal, and when the current frame is classified as a music signal, the current frame may be corrected to a speech signal or maintained as the music signal.
- characteristics of a plurality of frames including the current frame may be used. For example, eight frames may be used, but the embodiment is not limited thereto.
- the correction parameter may include a combination of at least one of characteristics such as tonality, linear prediction error, voicing, and correlation.
- the tonality may include tonality ton2 of a range of 1-2 kHz and tonality ton3 of a range of 2-4 kHz, which may be defined by Equations 13 and 14, respectively.
- tonality2 [−1] denotes the tonality of the 1-2 kHz range of the previous frame.
- It_tonality may denote full-band long-term tonality.
- a linear prediction error LP err may be defined by Equation 15.
- sfa i and sfb i may vary according to types of feature parameters and bandwidths and are used to approximate each feature parameter to a range of [0;1].
- FV9 = log(E(13)/E(1)) + log(E[−1](13)/E[−1](1))  (16)
- E(1) denotes energy of a first LP coefficient
- E(13) denotes energy of a 13 th LP coefficient
- C norm [.] denotes a normalized correlation in a first or second half frame.
- M cor denotes a correlation map of a frame.
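The spectral-tilt feature of Equation 16 transcribes directly into code. The log base is not specified above, so the natural logarithm is assumed here.

```python
import math

def fv9(e13, e1, e13_prev, e1_prev):
    # FV9 = log(E(13)/E(1)) + log(E[-1](13)/E[-1](1)), per Equation 16,
    # where E(k) is the energy of the k-th LP coefficient and [-1] marks
    # the previous frame.
    return math.log(e13 / e1) + math.log(e13_prev / e1_prev)
```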
- a correction parameter including at least one of conditions 1 through 4 may be generated using the plurality of feature parameters, taken alone or in combination.
- the conditions 1 and 2 may indicate conditions by which a speech state SPEECH_STATE can be changed
- the conditions 3 and 4 may indicate conditions by which a music state MUSIC_STATE can be changed.
- the condition 1 enables the speech state SPEECH_STATE to be changed from 0 to 1
- the condition 2 enables the speech state SPEECH_STATE to be changed from 1 to 0.
- the condition 3 enables the music state MUSIC_STATE to be changed from 0 to 1
- the condition 4 enables the music state MUSIC_STATE to be changed from 1 to 0.
- the speech state SPEECH_STATE of 1 may indicate that a speech probability is high, that is, CELP-type coding is suitable, and the speech state SPEECH_STATE of 0 may indicate that non-speech probability is high.
- the music state MUSIC_STATE of 1 may indicate that transform coding is suitable, and the music state MUSIC_STATE of 0 may indicate that CELP/transform hybrid coding, i.e., GSC, is suitable.
- the music state MUSIC_STATE of 1 may indicate that transform coding is suitable, and the music state MUSIC_STATE of 0 may indicate that CELP-type coding is suitable.
- the condition 1 (cond A ) may be defined, for example, as follows. That is, when d vcor >0.4 AND d ft ⁇ 0.1 AND FV s (1)>(2*FV s (7)+0.12) AND ton 2 ⁇ d vcor AND ton 3 ⁇ d vcor AND ton LT ⁇ d vcor AND FV s (7) ⁇ d vcor AND FV s (1)>d vcor AND FV s (1)>0.76, cond A may be set to 1.
- condition 2 (cond B ) may be defined, for example, as follows. That is, when d vcor ⁇ 0.4, cond B may be set to 1.
- condition 3 may be defined, for example, as follows. That is, when 0.26 ⁇ ton 2 ⁇ 0.54 AND ton 3 >0.22 AND 0.26 ⁇ ton LT ⁇ 0.54 AND LP err >0.5, cond C may be set to 1.
- condition 4 may be defined, for example, as follows. That is, when ton 2 ⁇ 0.34 AND ton 3 ⁇ 0.26 AND 0.26 ⁇ ton LT ⁇ 0.45, cond D may be set to 1.
- a feature or a set of features used to generate each condition is not limited thereto.
- each constant value is only illustrative and may be set to an optimal value according to an implementation method.
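The four conditions above can be transcribed as predicates, taking the constants at their illustrative values; the argument names follow the feature symbols in the text (dvcor, dft, FVs(1), FVs(7), ton2, ton3, tonLT, LPerr) and are otherwise assumptions.

```python
def cond_a(d_vcor, d_ft, fv1, fv7, ton2, ton3, ton_lt):
    # Condition 1: enables SPEECH_STATE to change from 0 to 1.
    return (d_vcor > 0.4 and d_ft < 0.1 and fv1 > 2 * fv7 + 0.12
            and ton2 < d_vcor and ton3 < d_vcor and ton_lt < d_vcor
            and fv7 < d_vcor and fv1 > d_vcor and fv1 > 0.76)

def cond_b(d_vcor):
    # Condition 2: enables SPEECH_STATE to change from 1 to 0.
    return d_vcor < 0.4

def cond_c(ton2, ton3, ton_lt, lp_err):
    # Condition 3: enables MUSIC_STATE to change from 0 to 1.
    return (0.26 < ton2 < 0.54 and ton3 > 0.22
            and 0.26 < ton_lt < 0.54 and lp_err > 0.5)

def cond_d(ton2, ton3, ton_lt):
    # Condition 4: enables MUSIC_STATE to change from 1 to 0.
    return ton2 < 0.34 and ton3 < 0.26 and 0.26 < ton_lt < 0.45
```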
- the correction unit 3730 may correct errors in the initial classification result by using two independent state machines, for example, a speech state machine and a music state machine.
- Each state machine has two states, and hangover may be used in each state to prevent frequent transitions.
- the hangover may include, for example, six frames.
- FIG. 38 illustrates a state machine used in a correction unit 3730 of FIG. 37 according to an exemplary embodiment.
- a left side shows a state machine suitable for a CELP core, i.e. a state machine for context-based correction in a speech state, according to an embodiment.
- correction on a classification result may be applied according to a music state determined by the music state machine and a speech state determined by the speech state machine. For example, when an initial classification result is set to a music signal, the music signal may be changed to a speech signal based on correction parameters.
- when a classification result of the first operation of the initial classification result indicates a music signal and the speech state is 1, both the classification result of the first operation and a classification result of the second operation may be changed to a speech signal. In this case, it may be determined that there is an error in the initial classification result, thereby correcting the classification result.
- the correction parameters, e.g., the condition 1 and the condition 2, may be received.
- hangover information of the speech state machine may be received.
- An initial classification result may also be received.
- the initial classification result may be provided from the speech/music classifying unit 3710 .
- It may be determined whether the initial classification result, i.e., the speech state, is 0, the condition 1 (cond A) is 1, and the hangover hang sp of the speech state machine is 0. If so, the speech state may be changed to 1, and the hangover may be initialized to 6.
- It may be determined whether the initial classification result, i.e., the speech state, is 1, the condition 2 (cond B) is 1, and the hangover hang sp of the speech state machine is 0. If so, the speech state may be changed to 0, and the hangover hang sp may be initialized to 6. Otherwise, a hangover update for decreasing the hangover by 1 may be performed.
- a right side shows a state machine suitable for a high quality (HQ) core, i.e. a state machine for context-based correction in a music state, according to an embodiment.
- correction on a classification result may be applied according to a music state determined by the music state machine and a speech state determined by the speech state machine. For example, when an initial classification result is set to a speech signal, the speech signal may be changed to a music signal based on correction parameters.
- when a classification result of the first operation of the initial classification result indicates a speech signal and the music state is 1, both the classification result of the first operation and a classification result of the second operation may be changed to a music signal.
- Conversely, the music signal may be changed to a speech signal based on the correction parameters. In either case, it may be determined that there is an error in the initial classification result, thereby correcting the classification result.
- the correction parameters, e.g., the condition 3 and the condition 4, may be received.
- hangover information of the music state machine may be received.
- An initial classification result may also be received.
- the initial classification result may be provided from the speech/music classifying unit 3710 .
- It may be determined whether the initial classification result, i.e., the music state, is 0, the condition 3 (cond C) is 1, and the hangover hang mus of the music state machine is 0. If so, the music state may be changed to 1, and the hangover may be initialized to 6.
- It may be determined whether the initial classification result, i.e., the music state, is 1, the condition 4 (cond D) is 1, and the hangover hang mus of the music state machine is 0. If so, the music state may be changed to 0, and the hangover hang mus may be initialized to 6. Otherwise, a hangover update for decreasing the hangover by 1 may be performed.
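Both state machines follow the same two-state pattern with a six-frame hangover, so a single update routine can sketch them; the clamp of the hangover at zero is an assumption.

```python
HANGOVER = 6  # frames of hangover initialized after each transition

def update_state(state, hang, cond_up, cond_down):
    # Generic two-state machine used for both SPEECH_STATE (conditions 1/2)
    # and MUSIC_STATE (conditions 3/4): a transition is allowed only when
    # the hangover has counted down to zero; otherwise it is decremented.
    if state == 0 and cond_up and hang == 0:
        return 1, HANGOVER
    if state == 1 and cond_down and hang == 0:
        return 0, HANGOVER
    return state, max(hang - 1, 0)
```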
- the above-described exemplary embodiments may be written as computer-executable programs and may be implemented in general-use digital computers that execute the programs by using a non-transitory computer-readable recording medium.
- data structures, program instructions, or data files, which can be used in the embodiments can be recorded on a non-transitory computer-readable recording medium in various ways.
- the non-transitory computer-readable recording medium is any data storage device that can store data which can be thereafter read by a computer system.
- non-transitory computer-readable recording medium examples include magnetic storage media, such as hard disks, floppy disks, and magnetic tapes, optical recording media, such as CD-ROMs and DVDs, magneto-optical media, such as optical disks, and hardware devices, such as ROM, RAM, and flash memory, specially configured to store and execute program instructions.
- the non-transitory computer-readable recording medium may also be a transmission medium for transmitting signals specifying program instructions, data structures, or the like.
- the program instructions may include not only machine language codes created by a compiler but also high-level language codes executable by a computer using an interpreter or the like.
Description
- This application is a continuation application of U.S. application Ser. No. 15/500,292, filed Jan. 30, 2017, which is a National Stage of International Application No. PCT/KR2015/007901, filed Jul. 28, 2015, which claims the benefit of U.S. Patent Application No. 62/029,736, filed Jul. 28, 2014, the disclosures of which are incorporated herein in their entirety by reference.
- One or more exemplary embodiments relate to audio or speech signal encoding and decoding, and more particularly, to a method and apparatus for encoding or decoding a spectral coefficient in a frequency domain.
- Quantizers of various schemes have been proposed to efficiently encode spectral coefficients in a frequency domain. For example, there are trellis coded quantization (TCQ), uniform scalar quantization (USQ), factorial pulse coding (FPC), algebraic VQ (AVQ), pyramid VQ (PVQ), and the like, and a lossless encoder optimized for each quantizer may be implemented together.
- One or more exemplary embodiments include a method and apparatus for encoding or decoding a spectral coefficient adaptively to various bit rates or various sub-band sizes in a frequency domain.
- One or more exemplary embodiments include a computer-readable recording medium having recorded thereon a computer-readable program for executing a signal encoding or decoding method.
- One or more exemplary embodiments include a multimedia device employing a signal encoding or decoding apparatus.
- According to one or more exemplary embodiments, a spectrum encoding method includes quantizing spectral data of a current band based on a first quantization scheme, generating a lower bit of the current band using the spectral data and the quantized spectral data, quantizing a sequence of lower bits including the lower bit of the current band based on a second quantization scheme, and generating a bitstream based on an upper bit, excluding N bits, where N is 1 or greater, from the quantized spectral data and the quantized sequence of lower bits.
- According to one or more exemplary embodiments, a spectrum encoding apparatus includes a processor configured to quantize spectral data of a current band based on a first quantization scheme, generate a lower bit of the current band using the spectral data and the quantized spectral data, quantize a sequence of lower bits including the lower bit of the current band based on a second quantization scheme, and generate a bitstream based on an upper bit, excluding N bits, where N is 1 or greater, from the quantized spectral data and the quantized sequence of lower bits.
- According to one or more exemplary embodiments, a spectrum decoding method includes receiving a bitstream, decoding a sequence of lower bits by extracting TCQ path information, decoding the number, positions, and signs of ISCs by extracting ISC information, extracting and decoding the remaining bits except for a lower bit, and reconstructing spectrum components based on the decoded sequence of lower bits and the decoded remaining bits except for the lower bit.
- According to one or more exemplary embodiments, a spectrum decoding apparatus includes a processor configured to receive a bitstream, decode a sequence of lower bits by extracting TCQ path information, decode the number, positions, and signs of ISCs by extracting ISC information, extract and decode the remaining bits except for a lower bit, and reconstruct spectrum components based on the decoded sequence of lower bits and the decoded remaining bits except for the lower bit.
- Encoding and decoding of a spectral coefficient adaptive to various bit rates and various sub-band sizes can be performed. In addition, a spectral coefficient can be encoded by means of joint USQ and TCQ by using a bit rate control module designed in a codec supporting multiple rates. In this case, the respective advantages of both quantization methods can be maximized.
-
FIGS. 1A and 1B are block diagrams of an audio encoding apparatus and an audio decoding apparatus according to an exemplary embodiment, respectively. -
FIGS. 2A and 2B are block diagrams of an audio encoding apparatus and an audio decoding apparatus according to another exemplary embodiment, respectively. -
FIGS. 3A and 3B are block diagrams of an audio encoding apparatus and an audio decoding apparatus according to another exemplary embodiment, respectively. -
FIGS. 4A and 4B are block diagrams of an audio encoding apparatus and an audio decoding apparatus according to another exemplary embodiment, respectively. -
FIG. 5 is a block diagram of a frequency domain audio encoding apparatus according to an exemplary embodiment. -
FIG. 6 is a block diagram of a frequency domain audio decoding apparatus according to an exemplary embodiment. -
FIG. 7 is a block diagram of a spectrum encoding apparatus according to an exemplary embodiment. -
FIG. 8 illustrates sub-band segmentation. -
FIG. 9 is a block diagram of a spectrum quantization apparatus according to an exemplary embodiment. -
FIG. 10 is a block diagram of a spectrum encoding apparatus according to an exemplary embodiment. -
FIG. 11 is a block diagram of an ISC encoding apparatus according to an exemplary embodiment. -
FIG. 12 is a block diagram of an ISC information encoding apparatus according to an exemplary embodiment. -
FIG. 13 is a block diagram of a spectrum encoding apparatus according to another exemplary embodiment. -
FIG. 14 is a block diagram of a spectrum encoding apparatus according to another exemplary embodiment. -
FIG. 15 illustrates a concept of an ISC collection and encoding process according to an exemplary embodiment. -
FIG. 16 illustrates a second joint scheme combining USQ and TCQ. -
FIG. 17 is a block diagram of a spectrum encoding apparatus according to another exemplary embodiment. -
FIG. 18 is a block diagram of a second quantization unit of FIG. 17 according to an exemplary embodiment. -
FIG. 19 illustrates a method of generating residual data. -
FIG. 20 illustrates an example of TCQ. -
FIG. 21 is a block diagram of a frequency domain audio decoding apparatus according to an exemplary embodiment. -
FIG. 22 is a block diagram of a spectrum decoding apparatus according to an exemplary embodiment. -
FIG. 23 is a block diagram of a spectrum inverse-quantization apparatus according to an exemplary embodiment. -
FIG. 24 is a block diagram of a spectrum decoding apparatus according to an exemplary embodiment. -
FIG. 25 is a block diagram of an ISC decoding apparatus according to an exemplary embodiment. -
FIG. 26 is a block diagram of an ISC information decoding apparatus according to an exemplary embodiment. -
FIG. 27 is a block diagram of a spectrum decoding apparatus according to another exemplary embodiment. -
FIG. 28 is a block diagram of a spectrum decoding apparatus according to another exemplary embodiment. -
FIG. 29 is a block diagram of a spectrum decoding apparatus according to another exemplary embodiment. -
FIG. 30 is a block diagram of a third decoding unit of FIG. 29 according to another exemplary embodiment. -
FIG. 31 is a block diagram of a multimedia device according to an exemplary embodiment. -
FIG. 32 is a block diagram of a multimedia device according to another exemplary embodiment. -
FIG. 33 is a block diagram of a multimedia device according to another exemplary embodiment. -
FIG. 34 is a flowchart illustrating a spectrum encoding method according to an exemplary embodiment. -
FIG. 35 is a flowchart illustrating a spectrum decoding method according to an exemplary embodiment. -
FIG. 36 is a block diagram of a bit allocation apparatus according to an exemplary embodiment. -
FIG. 37 is a block diagram of a coding mode determination apparatus according to an exemplary embodiment. -
FIG. 38 illustrates a state machine used in a correction unit of FIG. 37 according to an exemplary embodiment. - Since the inventive concept may have diverse modified embodiments, preferred embodiments are illustrated in the drawings and are described in the detailed description of the inventive concept. However, this does not limit the inventive concept within specific embodiments and it should be understood that the inventive concept covers all the modifications, equivalents, and replacements within the idea and technical scope of the inventive concept. Moreover, detailed descriptions related to well-known functions or configurations will be ruled out in order not to unnecessarily obscure subject matters of the inventive concept.
- It will be understood that although the terms of first and second are used herein to describe various elements, these elements should not be limited by these terms. Terms are only used to distinguish one component from other components.
- In the following description, the technical terms are used only to explain specific exemplary embodiments and do not limit the inventive concept. Terms used in the inventive concept have been selected as general terms which are widely used at present, in consideration of the functions of the inventive concept, but may be altered according to the intent of one of ordinary skill in the art, conventional practice, or the introduction of new technology. Some terms may also have been arbitrarily selected by the applicant in specific cases, in which case their meanings will be described in detail in the corresponding description portions of the inventive concept. Therefore, the terms should be defined on the basis of the entire content of this specification instead of the simple name of each term.
- The terms of a singular form may include plural forms unless indicated to the contrary. The meaning of ‘comprise’, ‘include’, or ‘have’ specifies a property, a region, a fixed number, a step, a process, an element, and/or a component but does not exclude other properties, regions, fixed numbers, steps, processes, elements, and/or components.
- Hereinafter, exemplary embodiments will be described in detail with reference to the accompanying drawings.
-
FIGS. 1A and 1B are block diagrams of an audio encoding apparatus and an audio decoding apparatus according to an exemplary embodiment, respectively. - The
audio encoding apparatus 110 shown in FIG. 1A may include a pre-processor 112, a frequency domain coder 114, and a parameter coder 116. The components may be integrated in at least one module and may be implemented as at least one processor (not shown). - In
FIG. 1A, the pre-processor 112 may perform filtering, down-sampling, or the like for an input signal, but is not limited thereto. The input signal may include a speech signal, a music signal, or a mixed signal of speech and music. Hereinafter, for convenience of explanation, the input signal is referred to as an audio signal. - The
frequency domain coder 114 may perform a time-frequency transform on the audio signal provided by the pre-processor 112, select a coding tool in correspondence with the number of channels, a coding band, and a bit rate of the audio signal, and encode the audio signal by using the selected coding tool. The time-frequency transform may use a modified discrete cosine transform (MDCT), a modulated lapped transform (MLT), or a fast Fourier transform (FFT), but is not limited thereto. When the number of given bits is sufficient, a general transform coding scheme may be applied to the whole bands, and when the number of given bits is not sufficient, a bandwidth extension scheme may be applied to partial bands. When the audio signal is a stereo or multi-channel signal, encoding is performed for each channel if the number of given bits is sufficient, and a down-mixing scheme may be applied if it is not. An encoded spectral coefficient is generated by the frequency domain coder 114. - The
parameter coder 116 may extract a parameter from the encoded spectral coefficient provided from the frequency domain coder 114 and encode the extracted parameter. The parameter may be extracted, for example, for each sub-band, which is a unit of grouping spectral coefficients, and may have a uniform or non-uniform length by reflecting a critical band. When each sub-band has a non-uniform length, a sub-band existing in a low frequency band may have a relatively short length compared with a sub-band existing in a high frequency band. The number and lengths of sub-bands included in one frame vary according to codec algorithms and may affect the encoding performance. The parameter may include, for example, a scale factor, power, average energy, or Norm, but is not limited thereto. Spectral coefficients and parameters obtained as an encoding result form a bitstream, and the bitstream may be stored in a storage medium or may be transmitted in a form of, for example, packets through a channel. - The
audio decoding apparatus 130 shown in FIG. 1B may include a parameter decoder 132, a frequency domain decoder 134, and a post-processor 136. The frequency domain decoder 134 may include a frame error concealment algorithm or a packet loss concealment algorithm. The components may be integrated in at least one module and may be implemented as at least one processor (not shown). - In
FIG. 1B, the parameter decoder 132 may decode parameters from a received bitstream and check whether an error such as erasure or loss has occurred in frame units from the decoded parameters. Various well-known methods may be used for the error check, and information on whether a current frame is a good frame or an erasure or loss frame is provided to the frequency domain decoder 134. Hereinafter, for convenience of explanation, the erasure or loss frame is referred to as an error frame. - When the current frame is a good frame, the
frequency domain decoder 134 may generate synthesized spectral coefficients by performing decoding through a general transform decoding process. When the current frame is an error frame, the frequency domain decoder 134 may generate synthesized spectral coefficients through a frame error concealment algorithm or a packet loss concealment algorithm, by repeating spectral coefficients of a previous good frame (PGF) onto the error frame or by scaling the spectral coefficients of the PGF through a regression analysis and then repeating them onto the error frame. The frequency domain decoder 134 may generate a time domain signal by performing a frequency-time transform on the synthesized spectral coefficients. - The post-processor 136 may perform filtering, up-sampling, or the like for sound quality improvement with respect to the time domain signal provided from the
frequency domain decoder 134, but is not limited thereto. The post-processor 136 provides a reconstructed audio signal as an output signal. -
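The PGF-based concealment described for the frequency domain decoder 134 can be sketched as below. This is a minimal illustration under stated assumptions, not the patent's exact algorithm: the function name and the fixed 0.8 attenuation are placeholders, whereas the text obtains the scaling from a regression analysis over previous good frames.

```python
def conceal_error_frame(prev_good_spectrum, scale=0.8):
    """Synthesize spectral coefficients for an error frame by repeating
    the previous good frame's (PGF) coefficients, optionally attenuated.

    scale=0.8 is an illustrative placeholder; the text derives the
    scaling from a regression analysis rather than a fixed constant.
    """
    return [scale * c for c in prev_good_spectrum]
```

A frequency-time transform of the returned coefficients would then yield the concealed time domain signal, as the text describes.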
FIGS. 2A and 2B are block diagrams of an audio encoding apparatus and an audio decoding apparatus, according to another exemplary embodiment, respectively, which have a switching structure. - The
audio encoding apparatus 210 shown in FIG. 2A may include a pre-processor 212, a mode determiner 213, a frequency domain coder 214, a time domain coder 215, and a parameter coder 216. The components may be integrated in at least one module and may be implemented as at least one processor (not shown). - In
FIG. 2A, since the pre-processor 212 is substantially the same as the pre-processor 112 of FIG. 1A, the description thereof is not repeated. - The
mode determiner 213 may determine a coding mode by referring to a characteristic of an input signal. The mode determiner 213 may determine according to the characteristic of the input signal whether a coding mode suitable for a current frame is a speech mode or a music mode and may also determine whether a coding mode efficient for the current frame is a time domain mode or a frequency domain mode. The characteristic of the input signal may be perceived by using a short-term characteristic of a frame or a long-term characteristic of a plurality of frames, but is not limited thereto. For example, if the input signal corresponds to a speech signal, the coding mode may be determined as the speech mode or the time domain mode, and if the input signal corresponds to a signal other than a speech signal, i.e., a music signal or a mixed signal, the coding mode may be determined as the music mode or the frequency domain mode. The mode determiner 213 may provide an output signal of the pre-processor 212 to the frequency domain coder 214 when the characteristic of the input signal corresponds to the music mode or the frequency domain mode and may provide an output signal of the pre-processor 212 to the time domain coder 215 when the characteristic of the input signal corresponds to the speech mode or the time domain mode. - Since the
frequency domain coder 214 is substantially the same as the frequency domain coder 114 of FIG. 1A, the description thereof is not repeated. - The
time domain coder 215 may perform code excited linear prediction (CELP) coding for an audio signal provided from the pre-processor 212. In detail, algebraic CELP may be used for the CELP coding, but the CELP coding is not limited thereto. An encoded spectral coefficient is generated by the time domain coder 215. - The
parameter coder 216 may extract a parameter from the encoded spectral coefficient provided from the frequency domain coder 214 or the time domain coder 215 and encode the extracted parameter. Since the parameter coder 216 is substantially the same as the parameter coder 116 of FIG. 1A, the description thereof is not repeated. Spectral coefficients and parameters obtained as an encoding result may form a bitstream together with coding mode information, and the bitstream may be transmitted in a form of packets through a channel or may be stored in a storage medium. - The
audio decoding apparatus 230 shown in FIG. 2B may include a parameter decoder 232, a mode determiner 233, a frequency domain decoder 234, a time domain decoder 235, and a post-processor 236. Each of the frequency domain decoder 234 and the time domain decoder 235 may include a frame error concealment algorithm or a packet loss concealment algorithm in each corresponding domain. The components may be integrated in at least one module and may be implemented as at least one processor (not shown). - In
FIG. 2B, the parameter decoder 232 may decode parameters from a bitstream transmitted in a form of packets and check whether an error has occurred in frame units from the decoded parameters. Various well-known methods may be used for the error check, and information on whether a current frame is a good frame or an error frame is provided to the frequency domain decoder 234 or the time domain decoder 235. - The
mode determiner 233 may check coding mode information included in the bitstream and provide a current frame to the frequency domain decoder 234 or the time domain decoder 235. - The
frequency domain decoder 234 may operate when a coding mode is the music mode or the frequency domain mode and generate synthesized spectral coefficients by performing decoding through a general transform decoding process when the current frame is a good frame. When the current frame is an error frame, and a coding mode of a previous frame is the music mode or the frequency domain mode, the frequency domain decoder 234 may generate synthesized spectral coefficients through a frame error concealment algorithm or a packet loss concealment algorithm, by repeating spectral coefficients of a previous good frame (PGF) onto the error frame or by scaling the spectral coefficients of the PGF through a regression analysis and then repeating them onto the error frame. The frequency domain decoder 234 may generate a time domain signal by performing a frequency-time transform on the synthesized spectral coefficients. - The
time domain decoder 235 may operate when the coding mode is the speech mode or the time domain mode and generate a time domain signal by performing decoding through a general CELP decoding process when the current frame is a good frame. When the current frame is an error frame, and the coding mode of the previous frame is the speech mode or the time domain mode, the time domain decoder 235 may perform a frame error concealment algorithm or a packet loss concealment algorithm in the time domain. - The post-processor 236 may perform filtering, up-sampling, or the like for the time domain signal provided from the
frequency domain decoder 234 or the time domain decoder 235, but is not limited thereto. The post-processor 236 provides a reconstructed audio signal as an output signal. -
FIGS. 3A and 3B are block diagrams of an audio encoding apparatus and an audio decoding apparatus according to another exemplary embodiment, respectively. - The
audio encoding apparatus 310 shown in FIG. 3A may include a pre-processor 312, a linear prediction (LP) analyzer 313, a mode determiner 314, a frequency domain excitation coder 315, a time domain excitation coder 316, and a parameter coder 317. The components may be integrated in at least one module and may be implemented as at least one processor (not shown). - In
FIG. 3A, since the pre-processor 312 is substantially the same as the pre-processor 112 of FIG. 1A, the description thereof is not repeated. - The
LP analyzer 313 may extract LP coefficients by performing LP analysis for an input signal and generate an excitation signal from the extracted LP coefficients. The excitation signal may be provided to one of the frequency domain excitation coder 315 and the time domain excitation coder 316 according to a coding mode. - Since the
mode determiner 314 is substantially the same as the mode determiner 213 of FIG. 2A, the description thereof is not repeated. - The frequency
domain excitation coder 315 may operate when the coding mode is the music mode or the frequency domain mode, and since the frequency domain excitation coder 315 is substantially the same as the frequency domain coder 114 of FIG. 1A except that an input signal is an excitation signal, the description thereof is not repeated. - The time
domain excitation coder 316 may operate when the coding mode is the speech mode or the time domain mode, and since the time domain excitation coder 316 is substantially the same as the time domain coder 215 of FIG. 2A, the description thereof is not repeated. - The
parameter coder 317 may extract a parameter from an encoded spectral coefficient provided from the frequency domain excitation coder 315 or the time domain excitation coder 316 and encode the extracted parameter. Since the parameter coder 317 is substantially the same as the parameter coder 116 of FIG. 1A, the description thereof is not repeated. Spectral coefficients and parameters obtained as an encoding result may form a bitstream together with coding mode information, and the bitstream may be transmitted in a form of packets through a channel or may be stored in a storage medium. - The
audio decoding apparatus 330 shown in FIG. 3B may include a parameter decoder 332, a mode determiner 333, a frequency domain excitation decoder 334, a time domain excitation decoder 335, an LP synthesizer 336, and a post-processor 337. Each of the frequency domain excitation decoder 334 and the time domain excitation decoder 335 may include a frame error concealment algorithm or a packet loss concealment algorithm in each corresponding domain. The components may be integrated in at least one module and may be implemented as at least one processor (not shown). - In
FIG. 3B, the parameter decoder 332 may decode parameters from a bitstream transmitted in a form of packets and check whether an error has occurred in frame units from the decoded parameters. Various well-known methods may be used for the error check, and information on whether a current frame is a good frame or an error frame is provided to the frequency domain excitation decoder 334 or the time domain excitation decoder 335. - The
mode determiner 333 may check coding mode information included in the bitstream and provide a current frame to the frequency domain excitation decoder 334 or the time domain excitation decoder 335. - The frequency
domain excitation decoder 334 may operate when a coding mode is the music mode or the frequency domain mode and generate synthesized spectral coefficients by performing decoding through a general transform decoding process when the current frame is a good frame. When the current frame is an error frame, and a coding mode of a previous frame is the music mode or the frequency domain mode, the frequency domain excitation decoder 334 may generate synthesized spectral coefficients through a frame error concealment algorithm or a packet loss concealment algorithm, by repeating spectral coefficients of a previous good frame (PGF) onto the error frame or by scaling the spectral coefficients of the PGF through a regression analysis and then repeating them onto the error frame. The frequency domain excitation decoder 334 may generate an excitation signal that is a time domain signal by performing a frequency-time transform on the synthesized spectral coefficients. - The time
domain excitation decoder 335 may operate when the coding mode is the speech mode or the time domain mode and generate an excitation signal that is a time domain signal by performing decoding through a general CELP decoding process when the current frame is a good frame. When the current frame is an error frame, and the coding mode of the previous frame is the speech mode or the time domain mode, the time domain excitation decoder 335 may perform a frame error concealment algorithm or a packet loss concealment algorithm in the time domain. - The
LP synthesizer 336 may generate a time domain signal by performing LP synthesis for the excitation signal provided from the frequency domain excitation decoder 334 or the time domain excitation decoder 335. - The post-processor 337 may perform filtering, up-sampling, or the like for the time domain signal provided from the
LP synthesizer 336, but is not limited thereto. The post-processor 337 provides a reconstructed audio signal as an output signal. -
FIGS. 4A and 4B are block diagrams of an audio encoding apparatus and an audio decoding apparatus according to another exemplary embodiment, respectively, which have a switching structure. - The
audio encoding apparatus 410 shown in FIG. 4A may include a pre-processor 412, a mode determiner 413, a frequency domain coder 414, an LP analyzer 415, a frequency domain excitation coder 416, a time domain excitation coder 417, and a parameter coder 418. The components may be integrated in at least one module and may be implemented as at least one processor (not shown). Since it can be considered that the audio encoding apparatus 410 shown in FIG. 4A is obtained by combining the audio encoding apparatus 210 of FIG. 2A and the audio encoding apparatus 310 of FIG. 3A, the description of operations of common parts is not repeated, and an operation of the mode determiner 413 will now be described. - The
mode determiner 413 may determine a coding mode of an input signal by referring to a characteristic and a bit rate of the input signal. The mode determiner 413 may determine the coding mode as a CELP mode or another mode based on whether a current frame is the speech mode or the music mode according to the characteristic of the input signal and based on whether a coding mode efficient for the current frame is the time domain mode or the frequency domain mode. The mode determiner 413 may determine the coding mode as the CELP mode when the characteristic of the input signal corresponds to the speech mode, determine the coding mode as the frequency domain mode when the characteristic of the input signal corresponds to the music mode and a high bit rate, and determine the coding mode as an audio mode when the characteristic of the input signal corresponds to the music mode and a low bit rate. The mode determiner 413 may provide the input signal to the frequency domain coder 414 when the coding mode is the frequency domain mode, provide the input signal to the frequency domain excitation coder 416 via the LP analyzer 415 when the coding mode is the audio mode, and provide the input signal to the time domain excitation coder 417 via the LP analyzer 415 when the coding mode is the CELP mode. - The
frequency domain coder 414 may correspond to the frequency domain coder 114 in the audio encoding apparatus 110 of FIG. 1A or the frequency domain coder 214 in the audio encoding apparatus 210 of FIG. 2A, and the frequency domain excitation coder 416 or the time domain excitation coder 417 may correspond to the frequency domain excitation coder 315 or the time domain excitation coder 316 in the audio encoding apparatus 310 of FIG. 3A. - The
audio decoding apparatus 430 shown in FIG. 4B may include a parameter decoder 432, a mode determiner 433, a frequency domain decoder 434, a frequency domain excitation decoder 435, a time domain excitation decoder 436, an LP synthesizer 437, and a post-processor 438. Each of the frequency domain decoder 434, the frequency domain excitation decoder 435, and the time domain excitation decoder 436 may include a frame error concealment algorithm or a packet loss concealment algorithm in each corresponding domain. The components may be integrated in at least one module and may be implemented as at least one processor (not shown). Since it can be considered that the audio decoding apparatus 430 shown in FIG. 4B is obtained by combining the audio decoding apparatus 230 of FIG. 2B and the audio decoding apparatus 330 of FIG. 3B, the description of operations of common parts is not repeated, and an operation of the mode determiner 433 will now be described. - The
mode determiner 433 may check coding mode information included in a bitstream and provide a current frame to the frequency domain decoder 434, the frequency domain excitation decoder 435, or the time domain excitation decoder 436. - The
frequency domain decoder 434 may correspond to the frequency domain decoder 134 in the audio decoding apparatus 130 of FIG. 1B or the frequency domain decoder 234 in the audio decoding apparatus 230 of FIG. 2B, and the frequency domain excitation decoder 435 or the time domain excitation decoder 436 may correspond to the frequency domain excitation decoder 334 or the time domain excitation decoder 335 in the audio decoding apparatus 330 of FIG. 3B. -
FIG. 5 is a block diagram of a frequency domain audio encoding apparatus according to an exemplary embodiment. - The frequency domain
audio encoding apparatus 510 shown in FIG. 5 may include a transient detector 511, a transformer 512, a signal classifier 513, an energy coder 514, a spectrum normalizer 515, a bit allocator 516, a spectrum coder 517, and a multiplexer 518. The components may be integrated in at least one module and may be implemented as at least one processor (not shown). The frequency domain audio encoding apparatus 510 may perform all functions of the frequency domain coder 214 and partial functions of the parameter coder 216 shown in FIG. 2A. The frequency domain audio encoding apparatus 510 may be replaced by a configuration of an encoder disclosed in the ITU-T G.719 standard except for the signal classifier 513, and the transformer 512 may use a transform window having an overlap duration of 50%. In addition, the frequency domain audio encoding apparatus 510 may be replaced by a configuration of an encoder disclosed in the ITU-T G.719 standard except for the transient detector 511 and the signal classifier 513. In each case, although not shown, a noise level estimation unit may be further included at a rear end of the spectrum coder 517 as in the ITU-T G.719 standard to estimate a noise level for a spectral coefficient to which a bit is not allocated in a bit allocation process and insert the estimated noise level into a bitstream. - Referring to
FIG. 5, the transient detector 511 may detect a duration exhibiting a transient characteristic by analyzing an input signal and generate transient signaling information for each frame in response to a result of the detection. Various well-known methods may be used for the detection of a transient duration. According to an exemplary embodiment, the transient detector 511 may primarily determine whether a current frame is a transient frame and secondarily verify the current frame that has been determined as a transient frame. The transient signaling information may be included in a bitstream by the multiplexer 518 and may be provided to the transformer 512. - The
transformer 512 may determine a window size to be used for a transform according to a result of the detection of a transient duration and perform a time-frequency transform based on the determined window size. For example, a short window may be applied to a sub-band from which a transient duration has been detected, and a long window may be applied to a sub-band from which a transient duration has not been detected. As another example, a short window may be applied to a frame including a transient duration. - The
signal classifier 513 may analyze a spectrum provided from the transformer 512 in frame units to determine whether each frame corresponds to a harmonic frame. Various well-known methods may be used for the determination of a harmonic frame. According to an exemplary embodiment, the signal classifier 513 may divide the spectrum provided from the transformer 512 into a plurality of sub-bands and obtain a peak energy value and an average energy value for each sub-band. Thereafter, the signal classifier 513 may obtain, for each frame, the number of sub-bands whose peak energy value is greater than the average energy value by a predetermined ratio or more and determine, as a harmonic frame, a frame in which the obtained number of sub-bands is greater than or equal to a predetermined value. The predetermined ratio and the predetermined value may be determined in advance through experiments or simulations. Harmonic signaling information may be included in the bitstream by the multiplexer 518. - The
energy coder 514 may obtain energy in each sub-band unit and quantize and lossless-encode the energy. According to an embodiment, a Norm value corresponding to average spectral energy in each sub-band unit may be used as the energy, and a scale factor or a power may also be used, but the energy is not limited thereto. The Norm value of each sub-band may be provided to the spectrum normalizer 515 and the bit allocator 516 and may be included in the bitstream by the multiplexer 518. - The
spectrum normalizer 515 may normalize the spectrum by using the Norm value obtained in each sub-band unit. - The bit allocator 516 may allocate bits in integer units or fraction units by using the Norm value obtained in each sub-band unit. In addition, the bit allocator 516 may calculate a masking threshold by using the Norm value obtained in each sub-band unit and estimate the perceptually required number of bits, i.e., the allowable number of bits, by using the masking threshold. The bit allocator 516 may limit the allocated number of bits so that it does not exceed the allowable number of bits for each sub-band. The bit allocator 516 may allocate bits sequentially, starting from a sub-band having a larger Norm value, and may weight the Norm value of each sub-band according to the perceptual importance of each sub-band to adjust the allocated number of bits so that more bits are allocated to a perceptually important sub-band. The quantized Norm value provided from the
energy coder 514 to the bit allocator 516 may be used for the bit allocation after being adjusted in advance to consider psychoacoustic weighting and a masking effect as in the ITU-T G.719 standard. - The
spectrum coder 517 may quantize the normalized spectrum by using the allocated number of bits of each sub-band and lossless-encode a result of the quantization. For example, TCQ, USQ, FPC, AVQ, and PVQ, or a combination thereof, and a lossless encoder optimized for each quantizer may be used for the spectrum encoding. In addition, trellis coding may also be used for the spectrum encoding, but the spectrum encoding is not limited thereto. Moreover, a variety of spectrum encoding methods may also be used according to the environment in which the codec is implemented or a user's needs. Information on the spectrum encoded by the spectrum coder 517 may be included in the bitstream by the multiplexer 518. -
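The harmonic-frame decision made by the signal classifier 513 above can be sketched as follows. The function name, the peak-to-average ratio, and the sub-band count threshold are illustrative assumptions, since the text leaves the actual values to experiments or simulations.

```python
def is_harmonic_frame(spectrum, num_subbands=20,
                      peak_to_avg_ratio=4.0, min_harmonic_bands=6):
    """Classify a frame as harmonic when enough sub-bands have a peak
    energy exceeding their average energy by a given ratio.

    peak_to_avg_ratio and min_harmonic_bands are placeholders; the
    text says such values are fixed in advance through experiments.
    """
    band_len = len(spectrum) // num_subbands
    count = 0
    for b in range(num_subbands):
        band = spectrum[b * band_len:(b + 1) * band_len]
        energies = [c * c for c in band]
        peak, avg = max(energies), sum(energies) / len(energies)
        if avg > 0 and peak >= peak_to_avg_ratio * avg:
            count += 1  # this sub-band looks tonal/harmonic
    return count >= min_harmonic_bands
```

A spectrum with a strong peak in every sub-band is classified as harmonic, while a flat, noise-like spectrum is not.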
FIG. 6 is a block diagram of a frequency domain audio encoding apparatus according to an exemplary embodiment. - The frequency domain
audio encoding apparatus 600 shown in FIG. 6 may include a pre-processor 610, a frequency domain coder 630, a time domain coder 650, and a multiplexer 670. The frequency domain coder 630 may include a transient detector 631, a transformer 633, and a spectrum coder 635. The components may be integrated in at least one module and may be implemented as at least one processor (not shown). - Referring to
FIG. 6, the pre-processor 610 may perform filtering, down-sampling, or the like for an input signal, but is not limited thereto. The pre-processor 610 may determine a coding mode according to a signal characteristic. The pre-processor 610 may determine according to a signal characteristic whether a coding mode suitable for a current frame is a speech mode or a music mode and may also determine whether a coding mode efficient for the current frame is a time domain mode or a frequency domain mode. The signal characteristic may be perceived by using a short-term characteristic of a frame or a long-term characteristic of a plurality of frames, but is not limited thereto. For example, if the input signal corresponds to a speech signal, the coding mode may be determined as the speech mode or the time domain mode, and if the input signal corresponds to a signal other than a speech signal, i.e., a music signal or a mixed signal, the coding mode may be determined as the music mode or the frequency domain mode. The pre-processor 610 may provide an input signal to the frequency domain coder 630 when the signal characteristic corresponds to the music mode or the frequency domain mode and may provide an input signal to the time domain coder 650 when the signal characteristic corresponds to the speech mode or the time domain mode. - The
frequency domain coder 630 may process an audio signal provided from the pre-processor 610 based on a transform coding scheme. In detail, the transient detector 631 may detect a transient component from the audio signal and determine whether a current frame corresponds to a transient frame. The transformer 633 may determine a length or a shape of a transform window based on a frame type, i.e., transient information provided from the transient detector 631, and may transform the audio signal into a frequency domain based on the determined transform window. As an example of a transform tool, a modified discrete cosine transform (MDCT), a fast Fourier transform (FFT), or a modulated lapped transform (MLT) may be used. In general, a short transform window may be applied to a frame including a transient component. The spectrum coder 635 may perform encoding on the audio spectrum transformed into the frequency domain. The spectrum coder 635 will be described below in more detail with reference to FIGS. 7 and 9. - The
time domain coder 650 may perform code excited linear prediction (CELP) coding on an audio signal provided from the pre-processor 610. In detail, algebraic CELP may be used for the CELP coding, but the CELP coding is not limited thereto. - The
multiplexer 670 may multiplex spectral components or signal components and various indices generated as a result of encoding in the frequency domain coder 630 or the time domain coder 650 so as to generate a bitstream. The bitstream may be stored in a storage medium or may be transmitted in a form of packets through a channel. -
FIG. 7 is a block diagram of a spectrum encoding apparatus according to an exemplary embodiment. The spectrum encoding apparatus shown in FIG. 7 may correspond to the spectrum coder 635 of FIG. 6, may be included in another frequency domain encoding apparatus, or may be implemented independently. - The spectrum encoding apparatus shown in
FIG. 7 may include an energy estimator 710, an energy quantizing and coding unit 720, a bit allocator 730, a spectrum normalizer 740, a spectrum quantizing and coding unit 750, and a noise filler 760. - Referring to
FIG. 7, the energy estimator 710 may divide original spectral coefficients into a plurality of sub-bands and estimate energy, for example, a Norm value, for each sub-band. Each sub-band may have a uniform length in a frame. When each sub-band has a non-uniform length, the number of spectral coefficients included in a sub-band may increase from a low frequency band to a high frequency band. - The energy quantizing and
coding unit 720 may quantize and encode the estimated Norm value for each sub-band. The Norm value may be quantized by various tools such as vector quantization (VQ), scalar quantization (SQ), trellis coded quantization (TCQ), lattice vector quantization (LVQ), etc. The energy quantizing and coding unit 720 may additionally perform lossless coding to further increase coding efficiency. - The bit allocator 730 may allocate bits required for coding in consideration of allowable bits of a frame, based on the quantized Norm value for each sub-band.
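The Norm estimation and bit allocation steps performed by the energy estimator 710 and the bit allocator 730 can be sketched roughly as below. This is an assumption-laden illustration: the Norm is taken as the RMS of each sub-band (one reading of "average spectral energy"), and the greedy integer allocation only mimics the "larger Norm first" ordering; G.719-style codecs use more refined fractional allocation, and the function names are my own.

```python
import math

def subband_norms(coeffs, band_sizes):
    """Per-sub-band Norm, taken here as the RMS of the band's
    coefficients (the exact definition is codec-specific)."""
    norms, start = [], 0
    for size in band_sizes:
        band = coeffs[start:start + size]
        norms.append(math.sqrt(sum(c * c for c in band) / size))
        start += size
    return norms

def allocate_bits(norms, total_bits, allowable):
    """Greedy sketch: grant one bit at a time to the band with the
    largest remaining priority, never exceeding each band's allowable
    number of bits; each granted bit halves the band's priority."""
    alloc = [0] * len(norms)
    priority = list(norms)
    for _ in range(total_bits):
        candidates = [i for i in range(len(norms)) if alloc[i] < allowable[i]]
        if not candidates:
            break  # every band is already at its allowable limit
        best = max(candidates, key=lambda i: priority[i])
        alloc[best] += 1
        priority[best] /= 2.0
    return alloc
```

Bands with larger Norm values receive bits first, and the per-band cap models the perceptually allowable number of bits described in the text.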
- The
spectrum normalizer 740 may normalize the spectrum based on the Norm value obtained for each sub-band. - The spectrum quantizing and
coding unit 750 may quantize and encode the normalized spectrum based on allocated bits for each sub-band. - The
noise filler 760 may add noise into components quantized to zero due to constraints on allowable bits in the spectrum quantizing and coding unit 750. -
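The noise filler 760's behavior can be sketched as below. How the noise level is obtained and the exact noise shape are codec details not given here, so the uniform noise, the fixed seed, and the function name are illustrative assumptions.

```python
import random

def fill_noise(dequantized, noise_level, seed=0):
    """Replace spectral bins that were quantized to zero with low-level
    random noise scaled by a noise level; non-zero bins pass through
    unchanged."""
    rng = random.Random(seed)  # deterministic for this sketch
    return [c if c != 0.0 else noise_level * rng.uniform(-1.0, 1.0)
            for c in dequantized]
```

Filling the zeroed bins avoids audible spectral holes in bands that received too few bits.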
FIG. 8 illustrates sub-band segmentation. - Referring to
FIG. 8, when an input signal uses a sampling frequency of 48 kHz and has a frame size of 20 ms, the number of samples to be processed for each frame becomes 960. That is, when the input signal is transformed by using the MDCT with 50% overlapping, 960 spectral coefficients are obtained. The ratio of overlapping may be variably set according to a coding scheme. In the frequency domain, a band up to 24 kHz may theoretically be processed, and a band up to 20 kHz may be represented in consideration of the audible range. In the low band of 0 to 3.2 kHz, a sub-band comprises 8 spectral coefficients. In the band of 3.2 to 6.4 kHz, a sub-band comprises 16 spectral coefficients. In the band of 6.4 to 13.6 kHz, a sub-band comprises 24 spectral coefficients. In the band of 13.6 to 20 kHz, a sub-band comprises 32 spectral coefficients. For a predetermined band set in an encoding apparatus, coding based on a Norm value may be performed, and for a high band above the predetermined band, coding based on various schemes such as bandwidth extension may be applied. -
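Under the figures above (960 MDCT bins spanning 24 kHz, i.e., 25 Hz per bin), the sub-band layout can be reproduced as follows. This yields 44 sub-bands covering the first 800 coefficients (up to 20 kHz), consistent with a G.719-style band structure; the function and constant names are my own.

```python
# 960 MDCT bins span 24 kHz, so each bin covers 25 Hz.
BAND_PLAN = [  # (upper edge in Hz, spectral coefficients per sub-band)
    (3200, 8),
    (6400, 16),
    (13600, 24),
    (20000, 32),
]

def build_subbands(bin_hz=25.0):
    """Return (start, end) coefficient indices of each sub-band."""
    bands, start = [], 0
    for upper_hz, width in BAND_PLAN:
        region_end = int(upper_hz / bin_hz)  # first bin beyond this region
        while start < region_end:
            bands.append((start, start + width))
            start += width
    return bands
```

Coefficients 800 to 959 (20 to 24 kHz) fall outside the plan, matching the statement that bands above the predetermined set may instead be coded with schemes such as bandwidth extension.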
FIG. 9 is a block diagram illustrating a configuration of a spectrum quantization apparatus according to an exemplary embodiment. - The apparatus shown in
FIG. 9 may include a quantizer selecting unit 910, a USQ 930, and a TCQ 950. - In
FIG. 9 , the quantizer selecting unit 910 may select the most efficient quantizer from among various quantizers according to the characteristic of a signal to be quantized, i.e., an input signal. As the characteristic of the input signal, bit allocation information for each band, band size information, and the like are usable. According to a result of the selection, the signal to be quantized may be provided to one of the USQ 930 and the TCQ 950 so that corresponding quantization is performed. The input signal may be a normalized MDCT spectrum. The bandwidth of the input signal may be either a narrow band (NB) or a wide band (WB). The coding mode of the input signal may be a normal mode. -
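Such a selection rule can be sketched with the bits-per-sample criterion described below for the normal mode (a band whose average bit allocation per sample reaches 0.75 is treated as important and quantized with USQ, all others with TCQ). The function name and signature are illustrative assumptions.

```python
# Hedged sketch of the USQ/TCQ selection: a band is deemed important when
# its average number of bits per spectral coefficient reaches a threshold
# (0.75 in the description), in which case USQ is chosen over TCQ.
def select_quantizer(band_bits, band_length, threshold=0.75):
    # Average bits per spectral coefficient in the band.
    return "USQ" if band_bits / band_length >= threshold else "TCQ"
```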
FIG. 10 is a block diagram illustrating a configuration of a spectrum encoding apparatus according to an exemplary embodiment. The apparatus shown in FIG. 10 may correspond to the spectrum quantizing and encoding unit 750 of FIG. 7 , may be included in another frequency domain encoding apparatus, or may be independently implemented. - The apparatus shown in
FIG. 10 may include an encoding method selecting unit 1010, a zero encoding unit 1020, a scaling unit 1030, an ISC encoding unit 1040, a quantized component restoring unit 1050, and an inverse scaling unit 1060. Herein, the quantized component restoring unit 1050 and the inverse scaling unit 1060 may be optionally provided. - In
FIG. 10 , the encoding method selection unit 1010 may select an encoding method by taking into account an input signal characteristic. The input signal characteristic may include at least one of a bandwidth and bits allocated for each band. A normalized spectrum may be provided to the zero encoding unit 1020 or the scaling unit 1030 based on an encoding scheme selected for each band. According to an embodiment, in a case that the bandwidth is either the narrow band or the wide band, when the average number of bits allocated to each sample of a band is greater than or equal to a predetermined value, e.g., 0.75, USQ may be used for the corresponding band by determining that the corresponding band is of high importance, and TCQ may be used for all the other bands. Herein, the average number of bits may be determined by taking into account a band length or a band size. The selected encoding method may be set using a one-bit flag. According to another embodiment, in a case that the bandwidth is either a super wide band (SWB) or a full band (FB), a joint USQ and TCQ method may be used. - The zero
encoding unit 1020 may encode all samples to zero (0) for bands of which allocated bits are zero. - The
scaling unit 1030 may adjust a bit rate by scaling a spectrum based on bits allocated to bands. In this case, a normalized spectrum may be used. The scaling unit 1030 may perform scaling by taking into account the average number of bits allocated to each sample, i.e., a spectral coefficient, included in a band. For example, the greater the average number of bits, the more scaling may be performed. - According to an embodiment, the
scaling unit 1030 may determine an appropriate scaling value according to bit allocation for each band. - In detail, first, the number of pulses for a current band may be estimated using a band length and bit allocation information. Herein, the pulses may indicate unit pulses. Before the estimation, bits (b) actually needed for the current band may be calculated based on
Equation 1. -
- b = log2( Σ_{i=1}^{min(m,n)} 2^i·C_n^i·C_{m−1}^{i−1} )  (1) - where n denotes a band length, m denotes the number of pulses, and i denotes the number of non-zero positions having the important spectral component (ISC).
- The number of non-zero positions may be obtained based on, for example, a probability by
Equation 2. -
pNZP(i) = 2^i·C_n^i·C_{m−1}^{i−1}, i ∈ {1, . . . , min(m,n)}  (2) - In addition, the number of bits needed for the non-zero positions may be estimated by
Equation 3. -
b_nzp = log2(pNZP(i))  (3) - Finally, the number of pulses may be selected as the value whose required bits b are closest to the bits allocated to each band.
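The pulse-count estimate of Equations 1 to 3 can be sketched as follows. This is a hedged reconstruction: the combinatorial count 2^i·C(n,i)·C(m−1,i−1), the summation bound, and all function names are assumptions consistent with the surrounding definitions, not the patent's verified formulas.

```python
from math import comb, log2

# Hedged sketch: bits needed to code m unit pulses over a band of length n,
# summing the count of configurations over the possible numbers i of
# non-zero positions (sign bit 2^i, position choices C(n,i), pulse
# partitions C(m-1,i-1)). Names and exact form are assumptions.
def bits_for_pulses(n, m):
    total = sum((2 ** i) * comb(n, i) * comb(m - 1, i - 1)
                for i in range(1, min(m, n) + 1))
    return log2(total)

def estimate_pulses(n, allocated_bits, max_pulses=64):
    # Pick the pulse count whose bit demand is closest to the allocation.
    return min(range(1, max_pulses + 1),
               key=lambda m: abs(bits_for_pulses(n, m) - allocated_bits))
```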
- Next, an initial scaling factor may be determined from the estimated number of pulses obtained for each band and an absolute value of an input signal. The input signal may be scaled by the initial scaling factor. If the sum of the numbers of pulses for the scaled original signal, i.e., a quantized signal, is not the same as the estimated number of pulses, pulse redistribution processing may be performed using an updated scaling factor. According to the pulse redistribution processing, if the number of pulses selected for the current band is less than the estimated number of pulses obtained for each band, the number of pulses is increased by decreasing the scaling factor, whereas if the number of pulses selected for the current band is greater than the estimated number of pulses obtained for each band, the number of pulses is decreased by increasing the scaling factor. In this case, the scaling factor may be increased or decreased by a predetermined value by selecting a position where distortion of the original signal is minimized.
- Since a distortion function for TCQ requires a relative size rather than an accurate distance, the distortion function for TCQ may be obtained as a sum of squared distances between quantized and un-quantized values in each band as shown in
Equation 4. -
- d = Σ_i (p_i − q_i)^2  (4) - where p_i denotes an actual value, and q_i denotes a quantized value.
- A distortion function for USQ may use a Euclidean distance to determine a best quantized value. In this case, a modified equation including a scaling factor may be used to minimize computational complexity, and the distortion function may be calculated by Equation 5, e.g., d = Σ_i (p_i − g·q_i)^2, where g denotes the scaling factor.
-
- If the number of pulses for each band does not match a required value, a predetermined number of pulses may need to be increased or decreased while maintaining a minimal metric. This may be performed iteratively by adding or deleting a single pulse and repeating until the number of pulses reaches the required value.
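The distortion measures and the iterative single-pulse adjustment just described can be sketched as follows. The quadratic forms and all function names are hedged, illustrative assumptions consistent with the surrounding description, not the patent's exact code.

```python
# Hedged sketches of the distortion measures and the single-pulse search.
def tcq_distortion(p, q):
    # Sum of squared distances between un-quantized and quantized values.
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def usq_distortion(p, q, g):
    # Euclidean-style distortion including the scaling factor g.
    return sum((pi - g * qi) ** 2 for pi, qi in zip(p, q))

def adjust_pulse_count(p, q, g, target):
    # Add or delete one unit pulse at the position j that keeps the
    # distortion minimal, repeating until the pulse count matches `target`.
    q = list(q)
    while sum(q) != target:
        delta = 1 if sum(q) < target else -1
        candidates = [j for j in range(len(q)) if q[j] + delta >= 0]
        j = min(candidates,
                key=lambda j: sum((p[i] - g * (q[i] + (delta if i == j else 0))) ** 2
                                  for i in range(len(p))))
        q[j] += delta
    return q
```

Each iteration evaluates n candidate distortions (one per position), which is the n-fold evaluation that the deviation form of Equation 7 is meant to avoid.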
- To add or delete one pulse, n distortion values need to be obtained to select the optimal one. For example, a distortion value d_j may correspond to addition of a pulse to the jth position in a band as shown in
Equation 6. -
- To avoid
Equation 6 from being performed n times, a deviation may be used as shown inEquation 7. -
- In
Equation 7, -
- may be calculated just once. In addition, n denotes a band length, i.e., the number of coefficients in a band, p denotes an original signal, i.e., an input signal of a quantizer, q denotes a quantized signal, and g denotes a scaling factor. Finally, a position j where a distortion d is minimized may be selected, thereby updating qj.
- To control a bit rate, encoding may be performed by using a scaled spectral coefficient and selecting an appropriate ISC. In detail, a spectral component for quantization may be selected using bit allocation for each band. In this case, the spectral component may be selected based on various combinations according to the distribution and variance of spectral components. Next, actual non-zero positions may be calculated. A non-zero position may be obtained by analyzing an amount of scaling and a redistribution operation, and such a selected non-zero position may be referred to as an ISC. In summary, an optimal scaling factor and non-zero position information corresponding to ISCs may be obtained by analyzing the magnitude of a signal which has undergone the scaling and redistribution process. Herein, the non-zero position information indicates the number and locations of non-zero positions. If the number of pulses is not controlled through the scaling and redistribution process, selected pulses may be quantized through a TCQ process, and surplus bits may be adjusted using a result of the quantization. This process may be illustrated as follows.
- For conditions that the number of non-zero positions is not the same as the estimated number of pulses for each band and is greater than a predetermined value, e.g., 1, and quantizer selection information indicates TCQ, surplus bits may be adjusted through actual TCQ quantization. In detail, in a case corresponding to the conditions, a TCQ quantization process is first performed to adjust surplus bits. If the real number of pulses of a current band obtained through the TCQ quantization is smaller than the estimated number of pulses previously obtained for each band, a scaling factor is increased by multiplying a scaling factor determined before the TCQ quantization by a value, e.g., 1.1, greater than 1, otherwise a scaling factor is decreased by multiplying the scaling factor determined before the actual TCQ quantization by a value, e.g., 0.9, less than 1. When the estimated number of pulses obtained for each band is the same as the number of pulses of the current band, which is obtained through the TCQ quantization by repeating this process, surplus bits are updated by calculating bits used in the actual TCQ quantization process. A non-zero position obtained by this process may correspond to an ISC.
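The scale-update step of that surplus-bit adjustment can be sketched as follows, using the 1.1 and 0.9 multipliers given above; the function name is an illustrative assumption.

```python
# Hedged sketch of the scale update after an actual TCQ pass: too few pulses
# grow the scaling factor (x1.1 per the text), too many shrink it (x0.9).
def adjust_scale_after_tcq(scale, actual_pulses, target_pulses,
                           up=1.1, down=0.9):
    if actual_pulses < target_pulses:
        return scale * up
    if actual_pulses > target_pulses:
        return scale * down
    return scale
```

Repeating this update until the TCQ pulse count matches the per-band estimate lets the encoder then recompute the surplus bits from the bits actually spent.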
- The
ISC encoding unit 1040 may encode information on the number of finally selected ISCs and information on non-zero positions. In this process, lossless encoding may be applied to enhance encoding efficiency. The ISC encoding unit 1040 may perform encoding using a selected quantizer for a non-zero band of which allocated bits are non-zero. In detail, the ISC encoding unit 1040 may select ISCs for each band with respect to a normalized spectrum and encode information about the selected ISCs based on number, position, magnitude, and sign. In this case, an ISC magnitude may be encoded in a manner other than number, position, and sign. For example, the ISC magnitude may be quantized using one of USQ and TCQ and arithmetic-coded, whereas the number, positions, and signs of the ISCs may be arithmetic-coded. According to an embodiment, one of TCQ and USQ may be selected based on a signal characteristic. In addition, a first joint scheme may be used, in which a quantizer is selected by additionally performing secondary bit allocation processing on surplus bits from a previously coded band in addition to the original bit allocation information for each band. The secondary bit allocation processing in the first joint scheme may distribute the surplus bits from the previously coded band and may detect two bands that will be encoded separately. Herein, the signal characteristic may include the bits allocated to each band or a band length. For example, if it is determined that a specific band includes very important information, USQ may be used; otherwise, TCQ may be used. If the average number of bits allocated to each sample included in a band is greater than or equal to a threshold value, e.g., 0.75, it may be determined that the corresponding band includes very important information, and thus USQ may be used. Even in a case of a low band having a short band length, USQ may be used in accordance with circumstances.
When the bandwidth of an input signal is an NB or a WB, the first joint scheme may be used. According to another embodiment, a second joint scheme may be used, in which all bands are coded by using USQ while TCQ is applied to a least significant bit (LSB). When the bandwidth of an input signal is an SWB or an FB, the second joint scheme may be used. - The quantized
component restoring unit 1050 may restore an actual quantized component by adding ISC position, magnitude, and sign information to a quantized component. Herein, zero may be allocated to a spectral coefficient of a zero position, i.e., a spectral coefficient encoded to zero. - The
inverse scaling unit 1060 may output a quantized spectral coefficient of the same level as that of a normalized input spectrum by inversely scaling the restored quantized component. The scaling unit 1030 and the inverse scaling unit 1060 may use the same scaling factor. -
FIG. 11 is a block diagram illustrating a configuration of an ISC encoding apparatus according to an exemplary embodiment. - The apparatus shown in
FIG. 11 may include an ISC selecting unit 1110 and an ISC information encoding unit 1130. The apparatus of FIG. 11 may correspond to the ISC encoding unit 1040 of FIG. 10 or may be implemented as an independent apparatus. - In
FIG. 11 , the ISC selecting unit 1110 may select ISCs from a scaled spectrum based on a predetermined criterion to adjust a bit rate. The ISC selecting unit 1110 may obtain actual non-zero positions by analyzing a degree of scaling from the scaled spectrum. Herein, the ISCs may correspond to actual non-zero spectral coefficients before scaling. The ISC selecting unit 1110 may select spectral coefficients to be encoded, i.e., non-zero positions, by taking into account the distribution and variance of spectral coefficients based on bits allocated for each band. TCQ may be used for the ISC selection. - The ISC
information encoding unit 1130 may encode ISC information, i.e., number information, position information, magnitude information, and signs of the ISCs, based on the selected ISCs. -
FIG. 12 is a block diagram illustrating a configuration of an ISC information encoding apparatus according to an exemplary embodiment. - The apparatus shown in
FIG. 12 may include a position information encoding unit 1210, a magnitude information encoding unit 1230, and a sign encoding unit 1250. - In
FIG. 12 , the position information encoding unit 1210 may encode position information of the ISCs selected by the ISC selection unit (1110 of FIG. 11 ), i.e., position information of the non-zero spectral coefficients. The position information may include the number and positions of the selected ISCs. Arithmetic coding may be used for the encoding of the position information. A new buffer may be configured by collecting the selected ISCs. For the ISC collection, zero bands and non-selected spectra may be excluded. - The magnitude
information encoding unit 1230 may encode magnitude information of the newly configured ISCs. In this case, quantization may be performed by selecting one of TCQ and USQ, and arithmetic coding may be additionally performed in succession. To increase efficiency of the arithmetic coding, non-zero position information and the number of ISCs may be used. - The sign
information encoding unit 1250 may encode sign information of the selected ISCs. Arithmetic coding may be used for the encoding on the sign information. -
FIG. 13 is a block diagram illustrating a configuration of a spectrum encoding apparatus according to another exemplary embodiment. The apparatus shown in FIG. 13 may correspond to the spectrum quantizing and encoding unit 750 of FIG. 7 or may be included in another frequency domain encoding apparatus or independently implemented. - The apparatus shown in
FIG. 13 may include a scaling unit 1330, an ISC encoding unit 1340, a quantized component restoring unit 1350, and an inverse scaling unit 1360. As compared with FIG. 10 , an operation of each component is the same except that the zero encoding unit 1020 and the encoding method selection unit 1010 are omitted, and the ISC encoding unit 1340 uses TCQ. -
FIG. 14 is a block diagram illustrating a configuration of a spectrum encoding apparatus according to another exemplary embodiment. The apparatus shown in FIG. 14 may correspond to the spectrum quantizing and encoding unit 750 of FIG. 7 or may be included in another frequency domain encoding apparatus or independently implemented. - The apparatus shown in
FIG. 14 may include an encoding method selection unit 1410, a scaling unit 1430, an ISC encoding unit 1440, a quantized component restoring unit 1450, and an inverse scaling unit 1460. As compared with FIG. 10 , an operation of each component is the same except that the zero encoding unit 1020 is omitted. -
FIG. 15 illustrates a concept of an ISC collecting and encoding process, according to an exemplary embodiment. First, zero bands, i.e., bands to be quantized to zero, are omitted. Next, a new buffer may be configured by using ISCs selected from among spectral components existing in non-zero bands. Quantization may be performed on the newly configured ISCs by using the first or the second joint scheme combining USQ and TCQ, in a band unit and corresponding lossless encoding may be performed. -
FIG. 16 illustrates a second joint scheme combining USQ and TCQ. - Referring to
FIG. 16 , quantization may be performed on spectral data in a band unit by using USQ. Each quantized spectral data value that is greater than one (1) may contain an LSB which is zero or one. For each band, a sequence of LSBs may be obtained and then quantized by using TCQ to find the best match between the sequence of LSBs and the available trellis paths. In terms of a Signal-to-Noise Ratio (SNR) criterion, errors may occur in the quantized sequence. Instead, at the cost of some errors in the quantized sequence, the length of the sequence may be decreased. - According to the second joint scheme, the advantages of both quantizers, i.e., USQ and TCQ, may be used in one scheme, and the path limitation may be excluded from TCQ.
-
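The LSB handling of the second joint scheme can be sketched as follows. This is a hedged illustration: the "greater than one" boundary is taken from the description above, and the function name and exact bit split are assumptions.

```python
# Illustrative LSB extraction for the second joint scheme: USQ magnitudes
# greater than 1 contribute their least significant bits (handed to TCQ),
# while the remaining upper bits form the magnitude information.
def extract_lsbs(quantized_magnitudes):
    lsbs = [m & 1 for m in quantized_magnitudes if m > 1]
    upper = [m >> 1 if m > 1 else m for m in quantized_magnitudes]
    return lsbs, upper
```

The `upper` values correspond to the magnitude information coded by the second lossless coding unit, and `lsbs` to the residual sequence quantized by TCQ.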
FIG. 17 is a block diagram of a spectrum encoding apparatus according to another exemplary embodiment. The apparatus shown in FIG. 17 may correspond to the ISC encoding unit 1040 of FIG. 10 or may be independently implemented. - The spectrum encoding apparatus shown in
FIG. 17 may include a first quantization unit 1710, a second quantization unit 1730, a first lossless coding unit 1750, a second lossless coding unit 1760, a third lossless coding unit 1770, and a bitstream generating unit 1790. The components may be integrated in at least one processor. - Referring to
FIG. 17 , the first quantization unit 1710 may quantize spectral data of a band, i.e., a non-zero band, by using USQ. The number of bits allocated for quantization of each band may be determined in advance. In this case, the number of bits which will be used for TCQ in the second quantization unit 1730 may be extracted from each non-zero band evenly, and then USQ may be performed on the band by using the remaining number of bits in the non-zero band. The spectral data may be norms or normalized spectral data. - The
second quantization unit 1730 may quantize a lower bit of the quantized spectral data from the first quantization unit 1710, by using TCQ. The lower bit may be an LSB. In this case, for all bands, the lower bits, i.e., residual data, may be collected and then TCQ may be performed on the residual data. For all bands that have non-zero data after quantization, residual data may be collected as the difference between the quantized and un-quantized spectral data. If some frequencies are quantized to zero in a non-zero band, they may not be included in the residual data. The residual data may be arranged into an array. - The first
lossless coding unit 1750 may perform lossless coding on information about ISCs included in a band, e.g., the number, positions, and signs of the ISCs. According to an embodiment, arithmetic coding may be used. - The second
lossless coding unit 1760 may perform lossless coding on magnitude information, which is constructed from the remaining bits except for the lower bit of the quantized spectral data. According to an embodiment, arithmetic coding may be used. - The third
lossless coding unit 1770 may perform lossless coding on TCQ information, i.e., trellis path data obtained from a quantization result of the second quantization unit 1730. According to an embodiment, arithmetic coding may be used. The trellis path data may be encoded as equi-probable symbols. The trellis path data is a binary sequence and may be encoded using an arithmetic encoder with a uniform probability model. - The
bitstream generating unit 1790 may generate a bitstream by using data provided from the first to third lossless coding units 1750, 1760, and 1770. -
FIG. 18 is a block diagram of a second quantization unit ofFIG. 17 according to an exemplary embodiment. - The second quantization unit shown in
FIG. 18 may include a lower bit obtaining unit 1810, a residual data generating unit 1830, and a TCQ unit 1850. The components may be integrated in at least one processor. - Referring to
FIG. 18 , the lower bit obtaining unit 1810 may extract residual data based on the difference between the quantized non-zero spectral data provided from the first quantization unit 1710 and the original non-zero spectral data. The residual data may correspond to a lower bit of the quantized non-zero spectral data, e.g., an LSB. - The residual
data generating unit 1830 may construct a residual array by collecting the difference between the quantized non-zero spectral data and the original non-zero spectral data for all non-zero bands. FIG. 19 illustrates a method of generating the residual data. - The
TCQ unit 1850 may perform TCQ on the residual array provided from the residual data generating unit 1830. The residual array may be quantized by TCQ with the known rate-1/2 (7,5)8 code. FIG. 20 illustrates an example of TCQ having four states. According to an embodiment, quantization using TCQ may be performed for the first 2·TCQ_AMP magnitudes. The constant TCQ_AMP is defined as 10, which allows up to 20 magnitudes per frame to be encoded. After quantization, path metrics may be checked and the best one may be selected. For lossless coding, data for the best trellis path may be stored in a separate array while a trace-back procedure is performed. -
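The residual path above (collect residuals, then search a small trellis for the best path) can be sketched as follows. The 4-state trellis and its branch labeling are toy assumptions for illustration, not the codec's actual (7,5)8 tables, and all names are hypothetical.

```python
# Illustrative residual collection and a toy 4-state trellis search.
def build_residual_array(bands):
    # `bands` holds (original, quantized) pairs for each non-zero band;
    # coefficients quantized to zero are excluded from the residual.
    residual = []
    for original, quantized in bands:
        residual.extend(o - q for o, q in zip(original, quantized) if q != 0)
    return residual

def tcq_search(samples, levels=(-1.5, -0.5, 0.5, 1.5)):
    # Viterbi search over a toy 4-state trellis: branch b from state s emits
    # levels[2*b + (s & 1)] and moves to state ((s << 1) | b) & 3.
    INF = float("inf")
    cost = [0.0] + [INF] * 3          # start in state 0
    paths = [[] for _ in range(4)]
    for x in samples:
        new_cost, new_paths = [INF] * 4, [None] * 4
        for s in range(4):
            if cost[s] == INF:
                continue
            for b in (0, 1):
                lvl = levels[2 * b + (s & 1)]
                ns = ((s << 1) | b) & 3
                c = cost[s] + (x - lvl) ** 2
                if c < new_cost[ns]:
                    new_cost[ns] = c
                    new_paths[ns] = paths[s] + [lvl]
        cost, paths = new_cost, new_paths
    best = min(range(4), key=lambda s: cost[s])
    return paths[best], cost[best]
```

The returned path of reproduction levels plays the role of the best trellis path that is stored during trace-back for lossless coding.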
FIG. 21 is a block diagram illustrating a configuration of a frequency domain audio decoding apparatus according to an exemplary embodiment. - A frequency domain
audio decoding apparatus 2100 shown in FIG. 21 may include a frame error detecting unit 2110, a frequency domain decoding unit 2130, a time domain decoding unit 2150, and a post-processing unit 2170. The frequency domain decoding unit 2130 may include a spectrum decoding unit 2131, a memory update unit 2133, an inverse transform unit 2135, and an overlap and add (OLA) unit 2137. Each component may be integrated in at least one module and implemented by at least one processor (not shown). - Referring to
FIG. 21 , the frame error detecting unit 2110 may detect whether a frame error has occurred from a received bitstream. - The frequency
domain decoding unit 2130 may operate when an encoding mode is a music mode or a frequency domain mode, enable an FEC or PLC algorithm when a frame error has occurred, and generate a time domain signal through a general transform decoding process when no frame error has occurred. In detail, the spectrum decoding unit 2131 may synthesize a spectral coefficient by performing spectrum decoding using a decoded parameter. The spectrum decoding unit 2131 will be described in more detail with reference to FIGS. 19 and 20 . - The
memory update unit 2133 may update a synthesized spectral coefficient for a current frame that is a normal frame, information obtained using a decoded parameter, the number of continuous error frames till the present, a signal characteristic of each frame, frame type information, or the like for a subsequent frame. Herein, the signal characteristic may include a transient characteristic and a stationary characteristic, and the frame type may include a transient frame, a stationary frame, or a harmonic frame. - The
inverse transform unit 2135 may generate a time domain signal by performing time-frequency inverse transform on the synthesized spectral coefficient. - The
OLA unit 2137 may perform OLA processing by using a time domain signal of a previous frame, generate a final time domain signal for a current frame as a result of the OLA processing, and provide the final time domain signal to the post-processing unit 2170. - The time
domain decoding unit 2150 may operate when the encoding mode is a voice mode or a time domain mode, enable the FEC or PLC algorithm when a frame error has occurred, and generate a time domain signal through a general CELP decoding process when no frame error has occurred. - The
post-processing unit 2170 may perform filtering or up-sampling on the time domain signal provided from the frequency domain decoding unit 2130 or the time domain decoding unit 2150, but is not limited thereto. The post-processing unit 2170 may provide a restored audio signal as an output signal. -
FIG. 22 is a block diagram illustrating a configuration of a spectrum decoding apparatus according to an exemplary embodiment. The apparatus 2200 shown in FIG. 22 may correspond to the spectrum decoding unit 2131 of FIG. 21 or may be included in another frequency domain decoding apparatus or independently implemented. - A
spectrum decoding apparatus 2200 shown in FIG. 22 may include an energy decoding and inverse quantizing unit 2210, a bit allocator 2230, a spectrum decoding and inverse quantizing unit 2250, a noise filler 2270, and a spectrum shaping unit 2290. Herein, the noise filler 2270 may be located at a rear end of the spectrum shaping unit 2290. Each component may be integrated in at least one module and implemented by at least one processor (not shown). - Referring to
FIG. 22 , the energy decoding andinverse quantizing unit 2210 may lossless-decode energy such as a parameter for which lossless encoding has been performed in an encoding process, e.g., a Norm value, and inverse-quantize the decoded Norm value. The inverse quantization may be performed using a scheme corresponding to a quantization scheme for the Norm value in the encoding process. - The bit allocator 2230 may allocate bits of a number required for each sub-band based on a quantized Norm value or the inverse-quantized Norm value. In this case, the number of bits allocated for each sub-band may be the same as the number of bits allocated in the encoding process.
- The spectrum decoding and
inverse quantizing unit 2250 may generate a normalized spectral coefficient by lossless-decoding an encoded spectral coefficient using the number of bits allocated for each sub-band and performing an inverse quantization process on the decoded spectral coefficient. - The
noise filler 2270 may fill noise in portions requiring noise filling for each sub-band among the normalized spectral coefficient. - The
spectrum shaping unit 2290 may shape the normalized spectral coefficient by using the inverse-quantized Norm value. A finally decoded spectral coefficient may be obtained through a spectral shaping process. -
FIG. 23 is a block diagram illustrating a configuration of a spectrum inverse-quantization apparatus according to an exemplary embodiment. - The apparatus shown in
FIG. 23 may include an inverse quantizer selecting unit 2310, a USQ 2330, and a TCQ 2350. - In
FIG. 23 , the inverse quantizer selecting unit 2310 may select the most efficient inverse quantizer from among various inverse quantizers according to characteristics of an input signal, i.e., a signal to be inverse-quantized. Bit allocation information for each band, band size information, and the like are usable as the characteristics of the input signal. According to a result of the selection, the signal to be inverse-quantized may be provided to one of the USQ 2330 and the TCQ 2350 so that corresponding inverse quantization is performed. FIG. 23 may correspond to the second joint scheme. -
FIG. 24 is a block diagram illustrating a configuration of a spectrum decoding apparatus according to an exemplary embodiment. The apparatus shown inFIG. 24 may correspond to the spectrum decoding andinverse quantizing unit 2250 ofFIG. 22 or may be included in another frequency domain decoding apparatus or independently implemented. - The apparatus shown in
FIG. 24 may include a decoding method selecting unit 2410, a zero decoding unit 2430, an ISC decoding unit 2450, a quantized component restoring unit 2470, and an inverse scaling unit 2490. Herein, the quantized component restoring unit 2470 and the inverse scaling unit 2490 may be optionally provided. - In
FIG. 24 , the decoding method selecting unit 2410 may select a decoding method based on bits allocated for each band. A normalized spectrum may be provided to the zero decoding unit 2430 or the ISC decoding unit 2450 based on the decoding method selected for each band. - The zero
decoding unit 2430 may decode all samples to zero for bands of which allocated bits are zero. - The
ISC decoding unit 2450 may decode bands of which allocated bits are not zero, by using a selected inverse quantizer. The ISC decoding unit 2450 may obtain information about important frequency components for each band of an encoded spectrum and decode the information about the important frequency components obtained for each band, based on number, position, magnitude, and sign. An important frequency component magnitude may be decoded in a manner other than number, position, and sign. For example, the important frequency component magnitude may be arithmetic-decoded and inverse-quantized using one of USQ and TCQ, whereas the number, positions, and signs of the important frequency components may be arithmetic-decoded. The selection of an inverse quantizer may be performed using the same result as in the ISC encoding unit 1040 shown in FIG. 10 . The ISC decoding unit 2450 may inverse-quantize the bands of which allocated bits are not zero, based on the first joint scheme or the second joint scheme. - The quantized
component restoring unit 2470 may restore actual quantized components based on position, magnitude, and sign information of restored ISCs. Herein, zero may be allocated to zero positions, i.e., non-quantized portions which are spectral coefficients decoded to zero. - The inverse scaling unit (not shown) may be further included to inversely scale the restored quantized components to output quantized spectral coefficients of the same level as the normalized spectrum.
-
FIG. 25 is a block diagram illustrating a configuration of an ISC decoding apparatus according to an exemplary embodiment. - The apparatus shown in
FIG. 25 may include a pulse-number estimation unit 2510 and an ISC information decoding unit 2530. The apparatus shown in FIG. 25 may correspond to the ISC decoding unit 2450 of FIG. 24 or may be implemented as an independent apparatus. - In
FIG. 25 , the pulse-number estimation unit 2510 may determine an estimated value of the number of pulses required for a current band by using a band size and bit allocation information. That is, since bit allocation information of a current frame is the same as that of the encoder, decoding is performed by using the same bit allocation information to derive the same estimated value of the number of pulses. - The ISC
information decoding unit 2530 may decode ISC information, i.e., number information, position information, magnitude information, and signs of ISCs based on the estimated number of pulses. -
FIG. 26 is a block diagram illustrating a configuration of an ISC information decoding apparatus according to an exemplary embodiment. - The apparatus shown in
FIG. 26 may include a position information decoding unit 2610, a magnitude information decoding unit 2630, and a sign decoding unit 2650. - In
FIG. 26 , the position information decoding unit 2610 may restore the number and positions of ISCs by decoding an index related to position information, which is included in a bitstream. Arithmetic decoding may be used to decode the position information. The magnitude information decoding unit 2630 may arithmetic-decode an index related to magnitude information, which is included in the bitstream, and inverse-quantize the decoded index based on the first joint scheme or the second joint scheme. To increase efficiency of the arithmetic decoding, non-zero position information and the number of ISCs may be used. The sign decoding unit 2650 may restore signs of the ISCs by decoding an index related to sign information, which is included in the bitstream. Arithmetic decoding may be used to decode the sign information. According to an embodiment, the number of pulses required for a non-zero band may be estimated and used to decode the position information, the magnitude information, or the sign information. -
FIG. 27 is a block diagram illustrating a configuration of a spectrum decoding apparatus according to another exemplary embodiment. The apparatus shown in FIG. 27 may correspond to the spectrum decoding and inverse quantizing unit 2250 of FIG. 22 or may be included in another frequency domain decoding apparatus or independently implemented. - The apparatus shown in
FIG. 27 may include an ISC decoding unit 2750, a quantized component restoring unit 2770, and an inverse scaling unit 2790. As compared with FIG. 24 , an operation of each component is the same except that the decoding method selecting unit 2410 and the zero decoding unit 2430 are omitted, and the ISC decoding unit 2450 uses TCQ. -
FIG. 28 is a block diagram illustrating a configuration of a spectrum decoding apparatus according to another exemplary embodiment. The apparatus shown in FIG. 28 may correspond to the spectrum decoding and inverse quantizing unit 2250 of FIG. 22 or may be included in another frequency domain decoding apparatus or independently implemented. - The apparatus shown in
FIG. 28 may include a decoding method selecting unit 2810, an ISC decoding unit 2850, a quantized component restoring unit 2870, and an inverse scaling unit 2890. As compared with FIG. 24 , an operation of each component is the same except that the zero decoding unit 2430 is omitted. -
FIG. 29 is a block diagram of a spectrum decoding apparatus according to another exemplary embodiment. The apparatus shown in FIG. 29 may correspond to the ISC decoding unit 2450 of FIG. 24 , or may be independently implemented. - The apparatus shown in
FIG. 29 may include a first decoding unit 2910, a second decoding unit 2930, a third decoding unit 2950, and a spectrum component restoring unit 2970. - In
FIG. 29 , the first decoding unit 2910 may extract ISC information of a band from a bitstream and may decode the number, positions, and signs of ISCs. The remaining bits except for a lower bit may be extracted and then decoded. The decoded ISC information may be provided to the spectrum component restoring unit 2970, and the position information of ISCs may be provided to the second decoding unit 2930. - The
second decoding unit 2930 may decode the remaining bits except for a lower bit from the spectral data for each band, based on the position information of the decoded ISCs provided from the first decoding unit 2910 and the bit allocation of each band. The surplus bits corresponding to a difference between the allocated bits of a band and the actually used bits of the band may be accumulated and then used for a next band. - The
third decoding unit 2950 may restore a TCQ residual array corresponding to the sequence of lower bits by decoding the TCQ path information extracted from the bitstream. - The spectrum
component restoring unit 2970 may reconstruct spectrum components based on data provided from the first decoding unit 2910, the second decoding unit 2930, and the third decoding unit 2950. - The first to
third decoding units 2910, 2930, and 2950 may be integrated into at least one module and may be implemented as at least one processor (not shown). -
FIG. 30 is a block diagram of a third decoding unit of FIG. 29 according to another exemplary embodiment. - The third decoding unit shown in
FIG. 30 may include a TCQ path decoding unit 3010 and a TCQ residual restoring unit 3030. - In
FIG. 30 , the TCQ path decoding unit 3010 may decode TCQ path information obtained from the bitstream. - The TCQ residual restoring
unit 3030 may restore TCQ residual data based on the decoded TCQ path information. In detail, the residual data, i.e., a residual array, may be reconstructed according to a decoded trellis state. From each path bit, two LSBs may be generated in the residual array. This process may be represented by the following pseudo code. -
for (state = 0, i = 0; i < bcount; i++)
{
    residualbuffer[2*i] = dec_LSB[state][dpath[i]] & 0x1;
    residualbuffer[2*i + 1] = dec_LSB[state][dpath[i]] & 0x2;
    state = trellis_nextstate[state][dpath[i]];
}
- Starting from
state 0, the decoder may move through the trellis using decoded dpath bits, and may extract two bits corresponding to the current trellis edge. - The configurations of
FIGS. 29 and 30 may have a reversible relationship to the configurations of FIGS. 17 and 18. -
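As an illustration of the trellis traversal in the pseudo code above, the following self-contained sketch uses a hypothetical 4-state trellis. The dec_LSB and trellis_nextstate tables below are invented for illustration only and do not reproduce the codec's actual tables.

```c
#include <assert.h>

/* Hypothetical 4-state trellis tables, for illustration only; the actual
 * dec_LSB and trellis_nextstate tables are codec-specific. */
static const int dec_LSB[4][2] = { {0, 3}, {1, 2}, {2, 1}, {3, 0} };
static const int trellis_nextstate[4][2] = { {0, 1}, {2, 3}, {0, 1}, {2, 3} };

/* Reconstruct 2*bcount residual entries from bcount decoded path bits,
 * starting from trellis state 0, mirroring the pseudo code above. */
void tcq_restore_residual(const int *dpath, int bcount, int *residualbuffer)
{
    int state = 0;
    for (int i = 0; i < bcount; i++) {
        residualbuffer[2 * i]     = dec_LSB[state][dpath[i]] & 0x1; /* first LSB */
        residualbuffer[2 * i + 1] = dec_LSB[state][dpath[i]] & 0x2; /* second-LSB flag (0 or 2) */
        state = trellis_nextstate[state][dpath[i]];                 /* follow the trellis edge */
    }
}
```

Each dpath bit thus both selects the two output bits of the current trellis edge and determines the next state, which is what makes the traversal deterministic on the decoder side.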
FIG. 31 is a block diagram of a multimedia device including an encoding module, according to an exemplary embodiment. - Referring to
FIG. 31 , the multimedia device 3100 may include a communication unit 3110 and the encoding module 3130. In addition, the multimedia device 3100 may further include a storage unit 3150 for storing an audio bitstream obtained as a result of encoding according to the usage of the audio bitstream. Moreover, the multimedia device 3100 may further include a microphone 3170. That is, the storage unit 3150 and the microphone 3170 may be optionally included. The multimedia device 3100 may further include an arbitrary decoding module (not shown), e.g., a decoding module for performing a general decoding function or a decoding module according to an exemplary embodiment. The encoding module 3130 may be implemented by at least one processor (not shown) by being integrated with other components (not shown) included in the multimedia device 3100 as one body. - The
communication unit 3110 may receive at least one of an audio signal or an encoded bitstream provided from the outside or may transmit at least one of a reconstructed audio signal or an encoded bitstream obtained as a result of encoding in the encoding module 3130. - The
communication unit 3110 is configured to transmit and receive data to and from an external multimedia device or a server through a wireless network, such as wireless Internet, wireless intranet, a wireless telephone network, a wireless Local Area Network (LAN), Wi-Fi, Wi-Fi Direct (WFD), third generation (3G), fourth generation (4G), Bluetooth, Infrared Data Association (IrDA), Radio Frequency Identification (RFID), Ultra WideBand (UWB), Zigbee, or Near Field Communication (NFC), or a wired network, such as a wired telephone network or wired Internet. - According to an exemplary embodiment, the
encoding module 3130 may quantize spectral data of a current band based on a first quantization scheme, generate a lower bit of the current band using the spectral data and the quantized spectral data, quantize a sequence of lower bits including the lower bit of the current band based on a second quantization scheme, and generate a bitstream based on upper bits of the quantized spectral data, excluding N bits (where N is 1 or greater), and the quantized sequence of lower bits. - The
storage unit 3150 may store the encoded bitstream generated by the encoding module 3130. In addition, the storage unit 3150 may store various programs required to operate the multimedia device 3100. - The
microphone 3170 may provide an audio signal from a user or the outside to the encoding module 3130. -
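The split performed by the encoding module between the upper bits placed in the bitstream and the N lower bit(s) passed to the second quantization scheme can be sketched as follows; the function name and interface are illustrative, not the encoding module 3130's actual API.

```c
#include <assert.h>

/* Illustrative sketch: separate a quantized magnitude into the upper bits
 * (N LSBs excluded) and the N lower bit(s) handed to the TCQ stage. */
void split_upper_lower(int quantized, int n, int *upper, int *lower)
{
    *upper = quantized >> n;              /* upper bits written to the bitstream */
    *lower = quantized & ((1 << n) - 1);  /* N lower bit(s) for the second stage */
}
```

With N = 1, as in the embodiments above, each band contributes exactly one lower bit to the sequence that is jointly quantized by TCQ.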
FIG. 32 is a block diagram of a multimedia device including a decoding module, according to an exemplary embodiment. - Referring to
FIG. 32 , the multimedia device 3200 may include a communication unit 3210 and a decoding module 3230. In addition, according to the usage of a reconstructed audio signal obtained as a result of decoding, the multimedia device 3200 may further include a storage unit 3250 for storing the reconstructed audio signal. In addition, the multimedia device 3200 may further include a speaker 3270. That is, the storage unit 3250 and the speaker 3270 may be optionally included. The multimedia device 3200 may further include an encoding module (not shown), e.g., an encoding module for performing a general encoding function or an encoding module according to an exemplary embodiment. The decoding module 3230 may be implemented by at least one processor (not shown) by being integrated with other components (not shown) included in the multimedia device 3200 as one body. - The communication unit 3210 may receive at least one of an audio signal or an encoded bitstream provided from the outside or may transmit at least one of a reconstructed audio signal obtained as a result of decoding in the
decoding module 3230 or an audio bitstream obtained as a result of encoding. The communication unit 3210 may be implemented substantially and similarly to the communication unit 3110 of FIG. 31 . - According to an exemplary embodiment, the
decoding module 3230 may receive a bitstream provided via the communication unit 3210, decode a sequence of lower bits by extracting TCQ path information, decode the number, positions, and signs of ISCs by extracting ISC information, extract and decode the remaining bits except for a lower bit, and reconstruct spectrum components based on the decoded sequence of lower bits and the decoded remaining bits except for the lower bit. - The
storage unit 3250 may store the reconstructed audio signal generated by the decoding module 3230. In addition, the storage unit 3250 may store various programs required to operate the multimedia device 3200. - The
speaker 3270 may output the reconstructed audio signal generated by the decoding module 3230 to the outside. -
FIG. 33 is a block diagram of a multimedia device including an encoding module and a decoding module, according to an exemplary embodiment. - Referring to
FIG. 33 , the multimedia device 3300 may include a communication unit 3310, an encoding module 3320, and a decoding module 3330. In addition, the multimedia device 3300 may further include a storage unit 3340 for storing an audio bitstream obtained as a result of encoding or a reconstructed audio signal obtained as a result of decoding according to the usage of the audio bitstream or the reconstructed audio signal. In addition, the multimedia device 3300 may further include a microphone 3350 and/or a speaker 3360. The encoding module 3320 and the decoding module 3330 may be implemented by at least one processor (not shown) by being integrated with other components (not shown) included in the multimedia device 3300 as one body. - Since the components of the
multimedia device 3300 shown in FIG. 33 correspond to the components of the multimedia device 3100 shown in FIG. 31 or the components of the multimedia device 3200 shown in FIG. 32 , a detailed description thereof is omitted. - Each of the
multimedia devices 3100, 3200, and 3300 shown in FIGS. 31, 32, and 33 may include a voice communication dedicated terminal, such as a telephone or a mobile phone, a broadcasting or music dedicated device, such as a TV or an MP3 player, or a hybrid terminal device of a voice communication dedicated terminal and a broadcasting or music dedicated device but are not limited thereto. In addition, each of the multimedia devices 3100, 3200, and 3300 may be used as a client, a server, or a transducer disposed between a client and a server. - When the multimedia device 3100, 3200, or 3300 is, for example, a mobile phone, although not shown, the multimedia device may further include a user input unit, such as a keypad, a display unit for displaying information processed through a user interface or by the mobile phone, and a processor for controlling the functions of the mobile phone. - When the multimedia device 3100, 3200, or 3300 is, for example, a TV, although not shown, the multimedia device may further include a user input unit, such as a keypad, a display unit for displaying received broadcasting information, and a processor for controlling all functions of the TV. -
FIG. 34 is a flowchart illustrating a spectrum encoding method according to an exemplary embodiment. - Referring to
FIG. 34 , in operation 3410, spectral data of a current band may be quantized by using a first quantization scheme. The first quantization scheme may be scalar quantization; as an example, the USQ, which has a uniform quantization step size, may be used. - In
operation 3430, a lower bit of the current band may be generated based on the spectral data and the quantized spectral data. The lower bit may be obtained based on a difference between the spectral data and the quantized spectral data. The second quantization scheme may be the TCQ. - In
operation 3450, a sequence of the lower bits including the lower bit of the current band may be quantized by using the second quantization scheme. - In
operation 3470, a bitstream may be generated based on upper bits of the quantized spectral data, except for N bits (where N is a value greater than or equal to 1), and the quantized sequence of the lower bits. - The bandwidth of spectral data related to a spectrum encoding method of
FIG. 34 may be an SWB or an FB. In addition, the spectral data may be obtained by performing MDCT on an input audio signal and may be coded in a normal mode. - Some functions in respective components of the above encoding apparatus may be added into respective operations of
FIG. 34 , according to circumstances or a user's needs. -
FIG. 35 is a flowchart illustrating a spectrum decoding method according to an exemplary embodiment. - Referring to
FIG. 35 , in operation 3510, ISC information may be extracted from a bitstream and the number, positions, and signs of ISCs may be decoded. The remaining bits except for a lower bit may be extracted and then decoded. - In
operation 3530, the sequence of the lower bits may be decoded by extracting TCQ path information from the bitstream. - In
operation 3550, spectral components may be reconstructed based on the decoded remaining bits except for the lower bit in operation 3510 and the decoded sequence of the lower bits in operation 3530. - Some functions in respective components of the above decoding apparatus may be added into respective operations of
FIG. 35 , according to circumstances or a user's needs. -
FIG. 36 is a block diagram of a bit allocation apparatus according to an exemplary embodiment. The apparatus shown in FIG. 36 may correspond to the bit allocator 516 of FIG. 5 , the bit allocator 730 of FIG. 7 or the bit allocation unit 2230 of FIG. 22 , or may be independently implemented. - A bit allocation apparatus shown in
FIG. 36 may include a bit estimation unit 3610, a re-distributing unit 3630, and an adjusting unit 3650, which may be integrated into at least one processor. For bit allocation used in spectrum quantization, fractional bit allocation may be used. According to the fractional bit allocation, bit allocation with fractional parts of, e.g., 3 bits is permitted, and thus a finer bit allocation is possible. In a generic mode, the fractional bit allocation may be used. - In
FIG. 36 , the bit estimation unit 3610 may estimate initially allocated bits for each band based on average energy of a band, e.g., norms. - The initially allocated bits R0(p,0) of a band may be estimated by
Equation 8. -
- where LM(p) indicates the number of bits that corresponds to 1 bit/sample in a band p, and if a band includes 10 samples, LM(p) becomes 10 bits. TB is a total bit budget and ÎM(i) indicates quantized norms of a band i.
- The
re-distributing unit 3630 may re-distribute the initially allocated bits of each band, based on a predetermined criterion. - The fully allocated bits may be calculated as a starting point, and the first-stage iterations may be performed to re-distribute the allocated bits to the bands with non-zero bits until the number of fully allocated bits is equal to the total bit budget TB, which is represented by Equation 9.
-
- where NSL0(k−1) is the number of spectral lines in all bands with allocated bits after k iterations.
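A hedged sketch of the first-stage re-distribution loop follows. Since Equation 9 is not reproduced in the text, the per-line uniform update below is only an assumption consistent with the description: the surplus or deficit relative to TB is spread over the NSL0 spectral lines of bands that currently have non-zero bits.

```c
#include <assert.h>
#include <math.h>

/* Hedged sketch of the first-stage re-distribution (Equation 9 assumed):
 * the gap between the total bit budget TB and the currently allocated
 * total is spread uniformly over the NSL0 spectral lines of bands with
 * non-zero allocated bits. */
void first_stage_redistribute(double *R, const int *LM, int nbands,
                              double TB, int iterations)
{
    for (int k = 0; k < iterations; k++) {
        double allocated = 0.0;
        int nsl = 0;                       /* NSL0: lines in bands with bits */
        for (int p = 0; p < nbands; p++) {
            allocated += R[p];
            if (R[p] > 0.0)
                nsl += LM[p];              /* LM(p): bits at 1 bit/sample in band p */
        }
        if (nsl == 0)
            break;
        double per_line = (TB - allocated) / nsl;
        for (int p = 0; p < nbands; p++)
            if (R[p] > 0.0)
                R[p] += LM[p] * per_line;  /* move toward sum(R) == TB */
    }
}
```

With this uniform update the allocation sums to TB after a single pass; the actual codec iterates because bands may be clamped between passes.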
- If too few bits are allocated, this can cause a quality degradation due to the reduced SNR. To avoid this problem, a minimum bit limitation may be applied to the allocated bits. The first minimum bit may consist of constant values depending on the band index and bit-rate. As an example, the first minimum bit LNB(p) may be determined as 3 for a band p=0 to 15, 4 for a band p=16 to 23, and 5 for a band p=24 to Nbands−1.
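The example first minimum-bit values above translate directly into a small lookup; only the three example thresholds from the text are encoded here.

```c
#include <assert.h>

/* First minimum bit LNB(p) per band index, using the example values from
 * the text: 3 bits for bands 0-15, 4 for bands 16-23, and 5 above. */
int first_minimum_bits(int p)
{
    if (p <= 15) return 3;
    if (p <= 23) return 4;
    return 5;
}
```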
- In the second-stage iterations, the re-distribution of bits may be done again to allocate bits to the bands with more than LM(p) bits. The value of LM(p) bits may correspond to the second minimum bits required for each band.
- Initially, the allocated bits R1(p,0) may be calculated based on the result of the first-stage iteration and the first and second minimum bit for each band, which is represented by
Equation 10, as an example. -
- where R(p) is the allocated bits after the first-stage iterations, and bs is 2 at 24.4 kbps and 3 at 32 kbps, but is not limited thereto.
- TB may be updated by subtracting the number of bits in bands with LM(p) bits, and the band index p may be updated to p′ which indicates the band indices with higher bits than LM(p) bits. Nbands may also be updated to N′bands which is the number of bands for p′.
- The second-stage iterations may then be performed until the updated TB (TB′) is equal to the number of bits in bands with more than LM(p′) bits, which is represented by
Equation 11, as an example. -
- where NSL1(k−1) denotes the number of spectral lines in all bands with more than LM(p′) bits after k iterations.
- During the second-stage iterations, if there are no bands with more than LM(p′) bits, the bits in bands with non-zero allocated bits from the highest bands may be set to zero until TB′ is equal to zero.
- Then, a final re-distribution of over-allocated bits and under-allocated bits may be performed. In this case, the final re-distribution may be performed based on a predetermined reference value.
- The
adjusting unit 3650 may adjust the fractional parts of the bit allocation result to a predetermined number of bits. As an example, the fractional parts of the bit allocation result may be adjusted to have three bits, which may be represented by Equation 12. -
R(p)=⌊R(p)*8⌋/8 for p=0, . . . , Nbands−1 (12)
FIG. 37 is a block diagram of a coding mode determination apparatus according to an exemplary embodiment. - A coding mode determination apparatus shown in
FIG. 37 may include a speech/music classifying unit 3710 and a correction unit 3730. The apparatus shown in FIG. 37 may be included in the mode determiner 213 of FIG. 2A , the mode determiner 314 of FIG. 3A or the mode determiner 413 of FIG. 4A . Also, the apparatus shown in FIG. 37 may be further included in the time domain coder 215 of FIG. 2A , the time domain excitation coder 316 of FIG. 3A or the time domain excitation coder 417 of FIG. 4A , or may be independently implemented. Herein, the components may be integrated into at least one module and implemented as at least one processor (not shown) except for a case where it is needed to be implemented as separate pieces of hardware. In addition, an audio signal may indicate a music signal, a speech signal, or a mixed signal of music and speech. - Referring to
FIG. 37 , the speech/music classifying unit 3710 may classify whether an audio signal corresponds to a music signal or a speech signal, based on various initial classification parameters. An audio signal classification process may include at least one operation. - According to an embodiment, the audio signal may be classified as a music signal or a speech signal based on signal characteristics of a current frame and a plurality of previous frames. The signal characteristics may include at least one of a short-term characteristic and a long-term characteristic. In addition, the signal characteristics may include at least one of a time domain characteristic and a frequency domain characteristic. Herein, if the audio signal is classified as a speech signal, the audio signal may be coded using a code excited linear prediction (CELP)-type coder. If the audio signal is classified as a music signal, the audio signal may be coded using a transform coder. The transform coder may be, for example, a modified discrete cosine transform (MDCT) coder but is not limited thereto.
- According to another exemplary embodiment, an audio signal classification process may include a first operation of classifying an audio signal as a speech signal and a generic audio signal, i.e., a music signal, according to whether the audio signal has a speech characteristic and a second operation of determining whether the generic audio signal is suitable for a generic signal audio coder (GSC). Whether the audio signal can be classified as a speech signal or a music signal may be determined by combining a classification result of the first operation and a classification result of the second operation. When the audio signal is classified as a speech signal, the audio signal may be encoded by a CELP-type coder. The CELP-type coder may include a plurality of modes among an unvoiced coding (UC) mode, a voiced coding (VC) mode, a transient coding (TC) mode, and a generic coding (GC) mode according to a bit rate or a signal characteristic. A generic signal audio coding (GSC) mode may be implemented by a separate coder or included as one mode of the CELP-type coder. When the audio signal is classified as a music signal, the audio signal may be encoded using the transform coder or a CELP/transform hybrid coder. In detail, the transform coder may be applied to a music signal, and the CELP/transform hybrid coder may be applied to a non-music signal, which is not a speech signal, or a signal in which music and speech are mixed. According to an embodiment, according to bandwidths, all of the CELP-type coder, the CELP/transform hybrid coder, and the transform coder may be used, or the CELP-type coder and the transform coder may be used. For example, the CELP-type coder and the transform coder may be used for a narrow-band (NB), and the CELP-type coder, the CELP/transform hybrid coder, and the transform coder may be used for a wide-band (WB), a super-wide-band (SWB), and a full band (FB). 
The CELP/transform hybrid coder is obtained by combining an LP-based coder, which operates in the time domain, with a transform-domain coder, and may also be referred to as a generic signal audio coder (GSC).
- The signal classification of the first operation may be based on a Gaussian mixture model (GMM). Various signal characteristics may be used for the GMM. Examples of the signal characteristics may include open-loop pitch, normalized correlation, spectral envelope, tonal stability, signal's non-stationarity, LP residual error, spectral difference value, and spectral stationarity but are not limited thereto. Examples of signal characteristics used for the signal classification of the second operation may include spectral energy variation characteristic, tilt characteristic of LP analysis residual energy, high-band spectral peakiness characteristic, correlation characteristic, voicing characteristic, and tonal characteristic but are not limited thereto. The characteristics used for the first operation may be used to determine whether the audio signal has a speech characteristic or a non-speech characteristic in order to determine whether the CELP-type coder is suitable for encoding, and the characteristics used for the second operation may be used to determine whether the audio signal has a music characteristic or a non-music characteristic in order to determine whether the GSC is suitable for encoding. For example, one set of frames classified as a music signal in the first operation may be changed to a speech signal in the second operation and then encoded by one of the CELP modes. That is, when the audio signal is a signal of large correlation or an attack signal while having a large pitch period and high stability, the audio signal may be changed from a music signal to a speech signal in the second operation. A coding mode may be changed according to a result of the signal classification described above.
- The
correction unit 3730 may correct the classification result of the speech/music classifying unit 3710 based on at least one correction parameter. The correction unit 3730 may correct the classification result of the speech/music classifying unit 3710 based on a context. For example, when a current frame is classified as a speech signal, the current frame may be corrected to a music signal or maintained as the speech signal, and when the current frame is classified as a music signal, the current frame may be corrected to a speech signal or maintained as the music signal. To determine whether there is an error in a classification result of the current frame, characteristics of a plurality of frames including the current frame may be used. For example, eight frames may be used, but the embodiment is not limited thereto. - The correction parameter may include a combination of at least one of characteristics such as tonality, linear prediction error, voicing, and correlation. Herein, the tonality may include tonality ton2 of a range of 1-2 kHz and tonality ton3 of a range of 2-4 kHz, which may be defined by
Equations 13 and 14. -
- where a superscript [-j] denotes a previous frame. For example, tonality2[−1] denotes tonality of a range of 1-2 KHz of a one-frame previous frame.
- Low-band long-term tonality tonLT may be defined as tonLT=0.2* log10[It_tonality]. Herein, It_tonality may denote full-band long-term tonality.
- A difference dft between tonality ton2 of a range of 1-2 KHz and tonality ton3 of a range of 2-4 KHz in an nth frame may be defined as dft=0.2* {log10(tonality2(n))-log10(tonality3(n))).
- Next, a linear prediction error LPerr may be defined by
Equation 15. -
- where FVs(9) is defined as FVs(i)=sfai·FVi+sfbi (i=0, . . . , 11) and corresponds to a value obtained by scaling an LP residual log-energy ratio feature parameter defined by
Equation 16 among feature parameters used for the speech/music classifying unit 3710. In addition, sfai and sfbi may vary according to types of feature parameters and bandwidths and are used to approximate each feature parameter to a range of [0;1]. -
- where E(1) denotes energy of a first LP coefficient, and E(13) denotes energy of a 13th LP coefficient.
- Next, a difference dvcor between a value FVs(1) obtained by scaling a normalized correlation feature or a voicing feature FV1, which is defined by
Equation 17 among the feature parameters used for the speech/music classifying unit 3710, based on FVs(i)=sfai·FVi+sfbi (i=0, . . . , 11), and a value FVs(7) obtained by scaling a correlation map feature FV7, which is defined by Equation 18, based on the same scaling, may be defined as dvcor=max(FVs(1)−FVs(7),0). -
FV1=Cnorm[.] (17) - where Cnorm[.] denotes a normalized correlation in a first or second half frame.
-
- where Mcor denotes a correlation map of a frame.
- A correction parameter including at least one of
conditions 1 through 4 may be generated using the plurality of feature parameters, taken alone or in combination. Herein, the conditions 1 and 2 may be used for the speech state machine, and the conditions 3 and 4 may be used for the music state machine. In detail, the condition 1 enables the speech state SPEECH_STATE to be changed from 0 to 1, and the condition 2 enables the speech state SPEECH_STATE to be changed from 1 to 0. In addition, the condition 3 enables the music state MUSIC_STATE to be changed from 0 to 1, and the condition 4 enables the music state MUSIC_STATE to be changed from 1 to 0. The speech state SPEECH_STATE of 1 may indicate that a speech probability is high, that is, CELP-type coding is suitable, and the speech state SPEECH_STATE of 0 may indicate that non-speech probability is high. As an example, the music state MUSIC_STATE of 1 may indicate that transform coding is suitable, and the music state MUSIC_STATE of 0 may indicate that CELP/transform hybrid coding, i.e., GSC, is suitable. As another example, the music state MUSIC_STATE of 1 may indicate that transform coding is suitable, and the music state MUSIC_STATE of 0 may indicate that CELP-type coding is suitable. - The condition 1 (condA) may be defined, for example, as follows. That is, when dvcor>0.4 AND dft<0.1 AND FVs(1)>(2*FVs(7)+0.12) AND ton2<dvcor AND ton3<dvcor AND tonLT<dvcor AND FVs(7)<dvcor AND FVs(1)>dvcor AND FVs(1)>0.76, condA may be set to 1.
- The condition 2 (condB) may be defined, for example, as follows. That is, when dvcor<0.4, condB may be set to 1.
- The condition 3 (condC) may be defined, for example, as follows. That is, when 0.26<ton2<0.54 AND ton3>0.22 AND 0.26<tonLT<0.54 AND LPerr>0.5, condC may be set to 1.
- The condition 4 (condD) may be defined, for example, as follows. That is, when ton2<0.34 AND ton3<0.26 AND 0.26<tonLT<0.45, condD may be set to 1.
- A feature or a set of features used to generate each condition is not limited thereto. In addition, each constant value is only illustrative and may be set to an optimal value according to an implementation method.
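The four conditions enumerated above translate directly into boolean predicates; the feature container below is an illustrative struct, not part of the codec, and the predicates encode the example thresholds exactly as given.

```c
#include <assert.h>

/* Illustrative container for the scaled features used by conditions 1-4. */
typedef struct {
    double dvcor, dft;     /* voicing/correlation difference, tonality diff */
    double fvs1, fvs7;     /* scaled voicing and correlation map features   */
    double ton2, ton3;     /* tonality in 1-2 kHz and 2-4 kHz               */
    double tonLT, LPerr;   /* long-term tonality, linear prediction error   */
} Features;

/* Conditions 1-4 exactly as enumerated in the text. */
int condA(const Features *f) {
    return f->dvcor > 0.4 && f->dft < 0.1
        && f->fvs1 > 2.0 * f->fvs7 + 0.12
        && f->ton2 < f->dvcor && f->ton3 < f->dvcor && f->tonLT < f->dvcor
        && f->fvs7 < f->dvcor && f->fvs1 > f->dvcor && f->fvs1 > 0.76;
}
int condB(const Features *f) { return f->dvcor < 0.4; }
int condC(const Features *f) {
    return f->ton2 > 0.26 && f->ton2 < 0.54 && f->ton3 > 0.22
        && f->tonLT > 0.26 && f->tonLT < 0.54 && f->LPerr > 0.5;
}
int condD(const Features *f) {
    return f->ton2 < 0.34 && f->ton3 < 0.26
        && f->tonLT > 0.26 && f->tonLT < 0.45;
}
```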
- According to an embodiment, the correction
unit 3730 may correct errors in the initial classification result by using two independent state machines, for example, a speech state machine and a music state machine. Each state machine has two states, and a hangover may be used in each state to prevent frequent transitions. The hangover may include, for example, six frames. When a hangover variable in the speech state machine is indicated by hangsp, and a hangover variable in the music state machine is indicated by hangmus, if a classification result is changed in a given state, each variable is initialized to 6, and thereafter, the hangover decreases by 1 for each subsequent frame. A state change may occur only when the hangover has decreased to zero. In each state machine, a correction parameter generated by combining at least one feature extracted from the audio signal may be used.
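The hangover-guarded transition logic just described can be sketched as one per-frame update that works for either state machine; variable and function names are illustrative, not the codec's.

```c
#include <assert.h>

#define HANGOVER_FRAMES 6

/* Per-frame update of one state machine (speech or music), as described
 * above: a state may flip only when its hangover has counted down to 0,
 * each flip re-arms the hangover to 6, and otherwise the hangover is
 * decreased by 1. */
void update_state(int *state, int *hang, int cond_set, int cond_clear)
{
    if (*state == 0 && cond_set && *hang == 0) {
        *state = 1;
        *hang = HANGOVER_FRAMES;
    } else if (*state == 1 && cond_clear && *hang == 0) {
        *state = 0;
        *hang = HANGOVER_FRAMES;
    } else if (*hang > 0) {
        (*hang)--;   /* hangover update */
    }
}
```

For the speech state machine, cond_set and cond_clear would correspond to condA and condB; for the music state machine, to condC and condD.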
FIG. 38 illustrates a state machine used in the correction unit 3730 of FIG. 37 according to an exemplary embodiment. - Referring to
FIG. 38 , a left side shows a state machine suitable for a CELP core, i.e., a state machine for context-based correction in a speech state, according to an embodiment. In the correction unit 3730, correction on a classification result may be applied according to a music state determined by the music state machine and a speech state determined by the speech state machine. For example, when an initial classification result is set to a music signal, the music signal may be changed to a speech signal based on correction parameters. In detail, when a classification result of a first operation of the initial classification result indicates a music signal, and the speech state is 1, both the classification result of the first operation and a classification result of a second operation may be changed to a speech signal. In this case, it may be determined that there is an error in the initial classification result, thereby correcting the classification result.
- First, the correction parameters, e.g., the
condition 1 and the condition 2, may be received. In addition, hangover information of the speech state machine may be received. An initial classification result may also be received. The initial classification result may be provided from the speech/music classifying unit 3710. - It may be determined whether the initial classification result, i.e., the speech state, is 0, the condition 1 (condA) is 1, and the hangover hangsp of the speech state machine is 0. If it is determined that the initial classification result, i.e., the speech state, is 0, the
condition 1 is 1, and the hangover hangsp of the speech state machine is 0, the speech state may be changed to 1, and the hangover may be initialized to 6. - Meanwhile, it may be determined whether the initial classification result, i.e., the speech state, is 1, the condition 2 (condB) is 1, and the hangover hangsp of the speech state machine is 0. If it is determined that the speech state is 1, the
condition 2 is 1, and the hangover hangsp of the speech state machine is 0, the speech state may be changed to 0, and the hangover hangsp may be initialized to 6. If the speech state is not 1, the condition 2 is not 1, or the hangover hangsp of the speech state machine is not 0, a hangover update for decreasing the hangover by 1 may be performed. - Referring to
FIG. 38 , a right side shows a state machine suitable for a high quality (HQ) core, i.e., a state machine for context-based correction in a music state, according to an embodiment. In the correction unit 3730, correction on a classification result may be applied according to a music state determined by the music state machine and a speech state determined by the speech state machine. For example, when an initial classification result is set to a speech signal, the speech signal may be changed to a music signal based on correction parameters. In detail, when a classification result of a first operation of the initial classification result indicates a speech signal, and the music state is 1, both the classification result of the first operation and a classification result of a second operation may be changed to a music signal. When the initial classification result is set to a music signal, the music signal may be changed to a speech signal based on correction parameters. In this case, it may be determined that there is an error in the initial classification result, thereby correcting the classification result.
- First, the correction parameters, e.g., the condition 3 and the condition 4, may be received. In addition, hangover information of the music state machine may be received. An initial classification result may also be received. The initial classification result may be provided from the speech/music classifying unit 3710.
- It may be determined whether the initial classification result, i.e., the music state, is 0, the condition 3 (condC) is 1, and the hangover hangmus of the music state machine is 0. If the music state is 0, the condition 3 is 1, and the hangover hangmus of the music state machine is 0, the music state may be changed to 1, and the hangover may be initialized to 6.
- It may be determined whether the initial classification result, i.e., the music state, is 1, the condition 4 (condD) is 1, and the hangover hangmus of the music state machine is 0. If the music state is 1, the condition 4 is 1, and the hangover hangmus of the music state machine is 0, the music state may be changed to 0, and the hangover hangmus may be initialized to 6. If the music state is not 1, the condition 4 is not 1, or the hangover hangmus of the music state machine is not 0, a hangover update for decreasing the hangover by 1 may be performed.
- The above-described exemplary embodiments may be written as computer-executable programs and may be implemented in general-use digital computers that execute the programs by using a non-transitory computer-readable recording medium. In addition, data structures, program instructions, or data files that can be used in the embodiments can be recorded on a non-transitory computer-readable recording medium in various ways. The non-transitory computer-readable recording medium is any data storage device that can store data which can thereafter be read by a computer system. Examples of the non-transitory computer-readable recording medium include magnetic storage media, such as hard disks, floppy disks, and magnetic tapes; optical recording media, such as CD-ROMs and DVDs; magneto-optical media, such as optical disks; and hardware devices, such as ROM, RAM, and flash memory, specially configured to store and execute program instructions. In addition, the non-transitory computer-readable recording medium may be a transmission medium for transmitting signals specifying program instructions, data structures, or the like. Examples of the program instructions include not only machine language code created by a compiler but also high-level language code executable by a computer using an interpreter or the like.
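As an illustration, the speech and music state-machine updates described above share the same structure and can be sketched in a minimal Python function. The hangover length of 6 and the role of the conditions (condA/condB for the speech state machine, condC/condD for the music state machine) follow the text; the function and parameter names are hypothetical.

```python
def update_state_machine(state, cond_enter, cond_leave, hangover):
    """One update of the speech (or music) correction state machine.

    state      -- current state, 0 or 1
    cond_enter -- condition for switching 0 -> 1 (condA for speech, condC for music)
    cond_leave -- condition for switching 1 -> 0 (condB for speech, condD for music)
    hangover   -- remaining hangover frames; a transition is allowed only at 0
    Returns the updated (state, hangover) pair.
    """
    HANGOVER_INIT = 6  # the hangover is initialized to 6 after every transition

    if state == 0 and cond_enter and hangover == 0:
        return 1, HANGOVER_INIT  # e.g., the speech state is changed to 1
    if state == 1 and cond_leave and hangover == 0:
        return 0, HANGOVER_INIT  # e.g., the speech state is changed to 0
    # otherwise, a hangover update decreasing the hangover by 1 is performed
    return state, max(hangover - 1, 0)


# Example: the speech state machine enters the speech state when condA
# holds and the previous hangover has expired.
state, hang = update_state_machine(state=0, cond_enter=True, cond_leave=False, hangover=0)
```

Both state machines run the same update each frame with their own conditions and hangover counters; the hangover prevents the classification from toggling on consecutive frames.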
- While the exemplary embodiments have been particularly shown and described, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the inventive concept as defined by the appended claims. The exemplary embodiments described herein should be considered in a descriptive sense only and not for purposes of limitation. Descriptions of features or aspects within each exemplary embodiment should typically be considered as available for other similar features or aspects in other exemplary embodiments.
Claims (8)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/259,341 US10827175B2 (en) | 2014-07-28 | 2019-01-28 | Signal encoding method and apparatus and signal decoding method and apparatus |
US17/030,466 US11616954B2 (en) | 2014-07-28 | 2020-09-24 | Signal encoding method and apparatus and signal decoding method and apparatus |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201462029736P | 2014-07-28 | 2014-07-28 | |
PCT/KR2015/007901 WO2016018058A1 (en) | 2014-07-28 | 2015-07-28 | Signal encoding method and apparatus and signal decoding method and apparatus |
US201715500292A | 2017-01-30 | 2017-01-30 | |
US16/259,341 US10827175B2 (en) | 2014-07-28 | 2019-01-28 | Signal encoding method and apparatus and signal decoding method and apparatus |
Related Parent Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/500,292 Continuation US10194151B2 (en) | 2014-07-28 | 2015-07-28 | Signal encoding method and apparatus and signal decoding method and apparatus |
PCT/KR2015/007901 Continuation WO2016018058A1 (en) | 2014-07-28 | 2015-07-28 | Signal encoding method and apparatus and signal decoding method and apparatus |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/030,466 Continuation US11616954B2 (en) | 2014-07-28 | 2020-09-24 | Signal encoding method and apparatus and signal decoding method and apparatus |
Publications (2)
Publication Number | Publication Date |
---|---|
US20190158833A1 true US20190158833A1 (en) | 2019-05-23 |
US10827175B2 US10827175B2 (en) | 2020-11-03 |
Family
ID=58587219
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/259,341 Active US10827175B2 (en) | 2014-07-28 | 2019-01-28 | Signal encoding method and apparatus and signal decoding method and apparatus |
US17/030,466 Active 2035-12-10 US11616954B2 (en) | 2014-07-28 | 2020-09-24 | Signal encoding method and apparatus and signal decoding method and apparatus |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/030,466 Active 2035-12-10 US11616954B2 (en) | 2014-07-28 | 2020-09-24 | Signal encoding method and apparatus and signal decoding method and apparatus |
Country Status (5)
Country | Link |
---|---|
US (2) | US10827175B2 (en) |
EP (2) | EP4293666A3 (en) |
JP (2) | JP6763849B2 (en) |
KR (2) | KR20170037970A (en) |
CN (3) | CN111968656B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220101868A1 (en) * | 2019-06-17 | 2022-03-31 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio encoder with a signal-dependent number and precision control, audio decoder, and related methods and computer programs |
WO2023285630A1 (en) * | 2021-07-14 | 2023-01-19 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Integral band-wise parametric audio coding |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3614381A1 (en) * | 2013-09-16 | 2020-02-26 | Samsung Electronics Co., Ltd. | Signal encoding method and device and signal decoding method and device |
CN106233112B (en) * | 2014-02-17 | 2019-06-28 | 三星电子株式会社 | Coding method and equipment and signal decoding method and equipment |
WO2015122752A1 (en) | 2014-02-17 | 2015-08-20 | 삼성전자 주식회사 | Signal encoding method and apparatus, and signal decoding method and apparatus |
CN111968656B (en) | 2014-07-28 | 2023-11-10 | 三星电子株式会社 | Signal encoding method and device and signal decoding method and device |
JP7257975B2 (en) | 2017-07-03 | 2023-04-14 | ドルビー・インターナショナル・アーベー | Reduced congestion transient detection and coding complexity |
CN107657958B (en) * | 2017-09-13 | 2020-06-23 | 厦门声连网信息科技有限公司 | Music identification system, device, music management server and method |
CN116260799B (en) * | 2023-05-16 | 2023-07-21 | 北京庭宇科技有限公司 | Method for adjusting network state and electronic equipment |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10194151B2 (en) * | 2014-07-28 | 2019-01-29 | Samsung Electronics Co., Ltd. | Signal encoding method and apparatus and signal decoding method and apparatus |
Family Cites Families (52)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA1251276A (en) | 1985-03-20 | 1989-03-14 | Toshio Koga | Method and arrangement of coding digital image signals utilizing interframe correlation |
US5297170A (en) | 1990-08-21 | 1994-03-22 | Codex Corporation | Lattice and trellis-coded quantization |
US5255339A (en) | 1991-07-19 | 1993-10-19 | Motorola, Inc. | Low bit rate vocoder means and method |
US5369724A (en) | 1992-01-17 | 1994-11-29 | Massachusetts Institute Of Technology | Method and apparatus for encoding, decoding and compression of audio-type data using reference coefficients located within a band of coefficients |
JP3093458B2 (en) | 1992-07-23 | 2000-10-03 | 株式会社東芝 | Variable rate codec / decoder |
US5727484A (en) | 1996-05-13 | 1998-03-17 | Childs; Robert C. | Soil penetrating applicator and method |
US6125149A (en) | 1997-11-05 | 2000-09-26 | At&T Corp. | Successively refinable trellis coded quantization |
KR100335611B1 (en) | 1997-11-20 | 2002-10-09 | 삼성전자 주식회사 | Scalable stereo audio encoding/decoding method and apparatus |
US6192158B1 (en) * | 1998-03-30 | 2001-02-20 | Motorola, Inc. | Wavelet image coder using trellis-coded quantization |
JP3808241B2 (en) * | 1998-07-17 | 2006-08-09 | 富士写真フイルム株式会社 | Data compression method and apparatus, and recording medium |
US7003171B1 (en) | 1998-07-17 | 2006-02-21 | Fuji Photo Film Co., Ltd. | Method, apparatus and recording medium for data compression |
US6504877B1 (en) | 1999-12-14 | 2003-01-07 | Agere Systems Inc. | Successively refinable Trellis-Based Scalar Vector quantizers |
US7020335B1 (en) * | 2000-11-21 | 2006-03-28 | General Dynamics Decision Systems, Inc. | Methods and apparatus for object recognition and compression |
DE60209888T2 (en) | 2001-05-08 | 2006-11-23 | Koninklijke Philips Electronics N.V. | CODING AN AUDIO SIGNAL |
KR100486732B1 (en) | 2003-02-19 | 2005-05-03 | 삼성전자주식회사 | Block-constrained TCQ method and method and apparatus for quantizing LSF parameter employing the same in speech coding system |
KR100565308B1 (en) | 2003-11-24 | 2006-03-30 | 엘지전자 주식회사 | Video code and decode apparatus for snr scalability |
ES2476992T3 (en) | 2004-11-05 | 2014-07-15 | Panasonic Corporation | Encoder, decoder, encoding method and decoding method |
RU2404506C2 (en) | 2004-11-05 | 2010-11-20 | Панасоник Корпорэйшн | Scalable decoding device and scalable coding device |
KR100851970B1 (en) | 2005-07-15 | 2008-08-12 | 삼성전자주식회사 | Method and apparatus for extracting ISCImportant Spectral Component of audio signal, and method and appartus for encoding/decoding audio signal with low bitrate using it |
US7693709B2 (en) | 2005-07-15 | 2010-04-06 | Microsoft Corporation | Reordering coefficients for waveform coding or decoding |
US7562021B2 (en) | 2005-07-15 | 2009-07-14 | Microsoft Corporation | Modification of codewords in dictionary used for efficient coding of digital media spectral data |
US20070168197A1 (en) | 2006-01-18 | 2007-07-19 | Nokia Corporation | Audio coding |
EP1989707A2 (en) | 2006-02-24 | 2008-11-12 | France Telecom | Method for binary coding of quantization indices of a signal envelope, method for decoding a signal envelope and corresponding coding and decoding modules |
KR100728056B1 (en) * | 2006-04-04 | 2007-06-13 | 삼성전자주식회사 | Method of multi-path trellis coded quantization and multi-path trellis coded quantizer using the same |
US7414549B1 (en) | 2006-08-04 | 2008-08-19 | The Texas A&M University System | Wyner-Ziv coding based on TCQ and LDPC codes |
JPWO2008047795A1 (en) * | 2006-10-17 | 2010-02-25 | パナソニック株式会社 | Vector quantization apparatus, vector inverse quantization apparatus, and methods thereof |
FR2912249A1 (en) | 2007-02-02 | 2008-08-08 | France Telecom | Time domain aliasing cancellation type transform coding method for e.g. audio signal of speech, involves determining frequency masking threshold to apply to sub band, and normalizing threshold to permit spectral continuity between sub bands |
KR100903110B1 (en) | 2007-04-13 | 2009-06-16 | 한국전자통신연구원 | The Quantizer and method of LSF coefficient in wide-band speech coder using Trellis Coded Quantization algorithm |
US20090232242A1 (en) * | 2007-09-28 | 2009-09-17 | Zixiang Xiong | Nested Turbo Code Design for the Costa Problem |
US8527265B2 (en) | 2007-10-22 | 2013-09-03 | Qualcomm Incorporated | Low-complexity encoding/decoding of quantized MDCT spectrum in scalable speech and audio codecs |
US8515767B2 (en) | 2007-11-04 | 2013-08-20 | Qualcomm Incorporated | Technique for encoding/decoding of codebook indices for quantized MDCT spectrum in scalable speech and audio codecs |
US20090135946A1 (en) | 2007-11-26 | 2009-05-28 | Eric Morgan Dowling | Tiled-building-block trellis decoders |
KR101671005B1 (en) | 2007-12-27 | 2016-11-01 | 삼성전자주식회사 | Method and apparatus for quantization encoding and de-quantization decoding using trellis |
KR101485339B1 (en) | 2008-09-29 | 2015-01-26 | 삼성전자주식회사 | Apparatus and method for lossless coding and decoding |
KR101622950B1 (en) | 2009-01-28 | 2016-05-23 | 삼성전자주식회사 | Method of coding/decoding audio signal and apparatus for enabling the method |
CN101615910B (en) * | 2009-05-31 | 2010-12-22 | 华为技术有限公司 | Method, device and equipment of compression coding and compression coding method |
WO2011048099A1 (en) | 2009-10-20 | 2011-04-28 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio encoder, audio decoder, method for encoding an audio information, method for decoding an audio information and computer program using a region-dependent arithmetic coding mapping rule |
KR101826331B1 (en) | 2010-09-15 | 2018-03-22 | 삼성전자주식회사 | Apparatus and method for encoding and decoding for high frequency bandwidth extension |
AU2011350143B9 (en) | 2010-12-29 | 2015-05-14 | Samsung Electronics Co., Ltd. | Apparatus and method for encoding/decoding for high-frequency bandwidth extension |
RU2571561C2 (en) | 2011-04-05 | 2015-12-20 | Ниппон Телеграф Энд Телефон Корпорейшн | Method of encoding and decoding, coder and decoder, programme and recording carrier |
CN103620675B (en) * | 2011-04-21 | 2015-12-23 | 三星电子株式会社 | To equipment, acoustic coding equipment, equipment linear forecast coding coefficient being carried out to inverse quantization, voice codec equipment and electronic installation thereof that linear forecast coding coefficient quantizes |
CA2833874C (en) | 2011-04-21 | 2019-11-05 | Ho-Sang Sung | Method of quantizing linear predictive coding coefficients, sound encoding method, method of de-quantizing linear predictive coding coefficients, sound decoding method, and recording medium |
KR102053900B1 (en) | 2011-05-13 | 2019-12-09 | 삼성전자주식회사 | Noise filling Method, audio decoding method and apparatus, recoding medium and multimedia device employing the same |
RU2464649C1 (en) | 2011-06-01 | 2012-10-20 | Корпорация "САМСУНГ ЭЛЕКТРОНИКС Ко., Лтд." | Audio signal processing method |
CN103718240B (en) | 2011-09-09 | 2017-02-15 | 松下电器(美国)知识产权公司 | Encoding device, decoding device, encoding method and decoding method |
KR102048076B1 (en) | 2011-09-28 | 2019-11-22 | 엘지전자 주식회사 | Voice signal encoding method, voice signal decoding method, and apparatus using same |
TWI671736B (en) * | 2011-10-21 | 2019-09-11 | 南韓商三星電子股份有限公司 | Apparatus for coding envelope of signal and apparatus for decoding thereof |
CN108831501B (en) * | 2012-03-21 | 2023-01-10 | 三星电子株式会社 | High frequency encoding/decoding method and apparatus for bandwidth extension |
US10205961B2 (en) | 2012-04-23 | 2019-02-12 | Qualcomm Incorporated | View dependency in multi-view coding and 3D coding |
CN103023550A (en) * | 2012-12-14 | 2013-04-03 | 西北农林科技大学 | EGT-and MRC-based phase TCQ (trellis coded quantization) method for MISO (multiple input single output) wireless system |
CN106233112B (en) | 2014-02-17 | 2019-06-28 | 三星电子株式会社 | Coding method and equipment and signal decoding method and equipment |
CN111968656B (en) * | 2014-07-28 | 2023-11-10 | 三星电子株式会社 | Signal encoding method and device and signal decoding method and device |
2015
- 2015-07-28 CN CN202010872923.1A patent/CN111968656B/en active Active
- 2015-07-28 KR KR1020177002772A patent/KR20170037970A/en not_active IP Right Cessation
- 2015-07-28 CN CN202010872921.2A patent/CN111968655B/en active Active
- 2015-07-28 CN CN201580052356.2A patent/CN107077855B/en active Active
- 2015-07-28 JP JP2017504669A patent/JP6763849B2/en active Active
- 2015-07-28 EP EP23204701.9A patent/EP4293666A3/en active Pending
- 2015-07-28 KR KR1020237015080A patent/KR20230066137A/en not_active Application Discontinuation
- 2015-07-28 EP EP15828104.8A patent/EP3176780A4/en not_active Ceased

2019
- 2019-01-28 US US16/259,341 patent/US10827175B2/en active Active

2020
- 2020-09-10 JP JP2020152313A patent/JP6980871B2/en active Active
- 2020-09-24 US US17/030,466 patent/US11616954B2/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN111968655A (en) | 2020-11-20 |
KR20230066137A (en) | 2023-05-12 |
JP6763849B2 (en) | 2020-09-30 |
US20210051325A1 (en) | 2021-02-18 |
CN111968656A (en) | 2020-11-20 |
EP3176780A1 (en) | 2017-06-07 |
EP4293666A3 (en) | 2024-03-06 |
KR20170037970A (en) | 2017-04-05 |
CN107077855B (en) | 2020-09-22 |
JP2020204784A (en) | 2020-12-24 |
EP3176780A4 (en) | 2018-01-17 |
EP4293666A2 (en) | 2023-12-20 |
JP6980871B2 (en) | 2021-12-15 |
JP2017528751A (en) | 2017-09-28 |
CN107077855A (en) | 2017-08-18 |
US11616954B2 (en) | 2023-03-28 |
CN111968655B (en) | 2023-11-10 |
US10827175B2 (en) | 2020-11-03 |
CN111968656B (en) | 2023-11-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11616954B2 (en) | Signal encoding method and apparatus and signal decoding method and apparatus | |
US10194151B2 (en) | Signal encoding method and apparatus and signal decoding method and apparatus | |
US11705142B2 (en) | Signal encoding method and device and signal decoding method and device | |
US10902860B2 (en) | Signal encoding method and apparatus, and signal decoding method and apparatus | |
CN110176241B (en) | Signal encoding method and apparatus, and signal decoding method and apparatus | |
KR20210131926A (en) | Signal encoding method and apparatus and signal decoding method and apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| STPP | Information on status: patent application and granting procedure in general | Free format text: APPLICATION DISPATCHED FROM PREEXAM, NOT YET DOCKETED |
| STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 4 |