CN106233112B - Signal encoding method and apparatus, and signal decoding method and apparatus - Google Patents


Info

Publication number: CN106233112B
Application number: CN201580020096.0A
Authority: CN (China)
Prior art keywords: encoding, decoding, frequency band, band, frequency
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Other versions: CN106233112A
Inventors: 成昊相, 康斯坦丁·奥斯波夫, 吕毅
Current and original assignee: Samsung Electronics Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Application filed by: Samsung Electronics Co Ltd
Priority to: CN201910495957.0A (CN110176241B)
Priority claimed from: PCT/KR2015/001668 (WO2015122752A1)
Publication of application: CN106233112A
Application granted; publication of: CN106233112B

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/02: using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L 19/0212: using orthogonal transformation
    • G10L 19/032: Quantisation or dequantisation of spectral components
    • G10L 19/002: Dynamic bit allocation
    • G10L 19/04: using predictive techniques
    • G10L 19/16: Vocoder architecture
    • G10L 19/18: Vocoders using multiple modes
    • G10L 19/24: Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Quality & Reliability (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention relates to a method and apparatus for encoding and decoding spectral coefficients in the frequency domain. The spectrum encoding method may include: selecting an encoding method based on bit allocation information for each band; performing zero encoding on zero bands; and encoding information about the significant frequency components selected for each non-zero band. The method enables encoding and decoding of spectral coefficients adapted to various bit rates and various sub-band sizes. In addition, in a codec supporting multiple rates, the spectrum can be encoded by a TCQ method at a fixed bit rate by using a bit rate control module. By performing encoding at an accurate target bit rate with the high performance of TCQ, the encoding performance of the codec can be maximized.

Description

Signal encoding method and apparatus, and signal decoding method and apparatus
Technical Field
One or more exemplary embodiments relate to audio or speech signal encoding and decoding, and more particularly, to a method and apparatus for encoding or decoding spectral coefficients in the frequency domain.
Background
Various quantizer schemes have been proposed to efficiently encode spectral coefficients in the frequency domain. Examples include trellis-coded quantization (TCQ), uniform scalar quantization (USQ), factorial pulse coding (FPC), algebraic VQ (AVQ), and pyramid VQ (PVQ), and a lossless encoder optimized for each quantizer may be implemented together with it.
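Of the quantizers listed above, uniform scalar quantization is the simplest to illustrate. The sketch below is a minimal USQ with a mid-tread reconstruction grid; the step size and the idea of passing integer indices to a lossless coder are illustrative assumptions, not the quantizer tables of any particular codec.

```python
import numpy as np

def usq_encode(coeffs, step):
    """Uniform scalar quantization: round each coefficient to the nearest
    multiple of `step` and return integer indices for lossless coding."""
    return np.round(np.asarray(coeffs, dtype=float) / step).astype(int)

def usq_decode(indices, step):
    """Reconstruct coefficients from quantization indices."""
    return np.asarray(indices, dtype=float) * step
```

By construction, the reconstruction error per coefficient is bounded by half the step size, which is the property a bit allocator trades off against the rate of the index stream.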
Disclosure of Invention
Technical problem
One or more exemplary embodiments include a method and apparatus for encoding or decoding spectral coefficients in the frequency domain adapted to various bit rates or various subband sizes.
One or more exemplary embodiments include a computer readable recording medium having recorded thereon a computer readable program for executing a signal encoding or decoding method.
One or more exemplary embodiments include a multimedia device employing a signal encoding or decoding apparatus.
Solution scheme
According to one or more exemplary embodiments, a spectrum encoding method includes: selecting an encoding method based on at least bit allocation information for each band; performing zero encoding on zero bands; and encoding information about the significant frequency components selected for each non-zero band.
According to one or more exemplary embodiments, a spectrum decoding method includes: selecting a decoding method based on at least bit allocation information for each band; performing zero decoding on zero bands; and decoding information about the significant frequency components obtained for each non-zero band.
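The band-level flow above (zero-code bands with no bits, keep only significant components elsewhere) can be sketched as follows. The number of kept components per band and the selection by magnitude are illustrative assumptions; the actual embodiments derive these from the bit allocation.

```python
import numpy as np

def classify_and_select(spectrum, band_bounds, bits_per_band, n_keep=4):
    """For each band: mark it for zero encoding when no bits are allocated;
    otherwise keep the positions of the `n_keep` largest-magnitude
    coefficients (the 'significant' components) for further encoding."""
    result = []
    for (lo, hi), bits in zip(band_bounds, bits_per_band):
        band = spectrum[lo:hi]
        if bits == 0:
            result.append(("zero", None))
        else:
            order = np.argsort(-np.abs(band))[:n_keep]
            result.append(("coded", sorted(lo + i for i in order)))
    return result
```

The decoder mirrors this: zero bands are filled with zeros (possibly noise-filled later), and for coded bands the positions, signs, and magnitudes of the selected components are read back from the bitstream.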
Advantageous Effects of the Invention
Encoding and decoding of spectral coefficients adapted to various bit rates and various sub-band sizes may be performed. In addition, in a codec supporting multiple rates, the spectrum can be encoded at a fixed bit rate with TCQ by using a bit rate control module. In this case, by performing encoding at an accurate target bit rate while retaining the high performance of TCQ, the encoding performance of the codec can be maximized.
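The TCQ referred to above can be sketched as a Viterbi search over a small trellis, in the spirit of Marcellin and Fischer. The 8-level codebook, the coset labeling, and the 4-state trellis tables below are illustrative assumptions for exposition, not the tables of the actual codec.

```python
import numpy as np

# 8-level codebook split into four cosets D0..D3 by set partitioning;
# the trellis below is one plausible 4-state assignment (illustrative).
CODEBOOK = np.arange(-3.5, 4.0, 1.0)              # -3.5, -2.5, ..., 3.5
SUBSETS = [CODEBOOK[i::4] for i in range(4)]      # D0..D3, intra-coset spacing 4
NEXT_STATE = [(0, 2), (0, 2), (1, 3), (1, 3)]     # next state per (branch 0, branch 1)
BRANCH_SUBSET = [(0, 2), (2, 0), (1, 3), (3, 1)]  # coset index per branch

def tcq_quantize(x):
    """Viterbi search over the trellis: for each sample, each surviving
    branch quantizes to the best point of the coset attached to it, and
    the path with minimum total squared error is kept."""
    INF = float("inf")
    cost = [0.0, INF, INF, INF]                   # start in state 0
    path = [[], [], [], []]
    for sample in x:
        new_cost = [INF] * 4
        new_path = [None] * 4
        for s in range(4):
            if cost[s] == INF:
                continue
            for b in (0, 1):
                ns = NEXT_STATE[s][b]
                coset = SUBSETS[BRANCH_SUBSET[s][b]]
                q = coset[np.argmin(np.abs(coset - sample))]
                c = cost[s] + (sample - q) ** 2
                if c < new_cost[ns]:
                    new_cost[ns] = c
                    new_path[ns] = path[s] + [q]
        cost, path = new_cost, new_path
    return np.array(path[int(np.argmin(cost))])
```

Because each state offers two cosets whose union has spacing 2, the per-sample error of the best path never exceeds 1.0 for inputs inside the codebook range, while the branch bits themselves carry one bit per sample, which is what makes the rate easy to control.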
Drawings
Fig. 1a and 1b are block diagrams of an audio encoding apparatus and an audio decoding apparatus, respectively, according to an exemplary embodiment.
Fig. 2a and 2b are block diagrams of an audio encoding apparatus and an audio decoding apparatus, respectively, according to another exemplary embodiment.
Fig. 3a and 3b are block diagrams of an audio encoding apparatus and an audio decoding apparatus, respectively, according to another exemplary embodiment.
Fig. 4a and 4b are block diagrams of an audio encoding apparatus and an audio decoding apparatus, respectively, according to another exemplary embodiment.
Fig. 5 is a block diagram of a frequency domain audio encoding apparatus according to an exemplary embodiment.
Fig. 6 is a block diagram of a frequency domain audio decoding apparatus according to an exemplary embodiment.
Fig. 7 is a block diagram of a spectral encoding apparatus according to an exemplary embodiment.
Fig. 8 shows subband splitting.
Fig. 9 is a block diagram of a spectral quantization apparatus according to an exemplary embodiment.
Fig. 10 is a block diagram of a spectrum encoding apparatus according to an exemplary embodiment.
Fig. 11 is a block diagram of an ISC encoding apparatus according to an exemplary embodiment.
Fig. 12 is a block diagram of an ISC information encoding apparatus according to an exemplary embodiment.
Fig. 13 is a block diagram of a spectrum encoding apparatus according to another exemplary embodiment.
Fig. 14 is a block diagram of a spectrum encoding apparatus according to another exemplary embodiment.
Fig. 15 illustrates the concept of ISC collection and encoding process according to an exemplary embodiment.
Fig. 16 illustrates the concept of ISC collection and encoding process according to another exemplary embodiment.
Fig. 17 illustrates a TCQ according to an exemplary embodiment.
Fig. 18 is a block diagram of a frequency domain audio decoding apparatus according to an exemplary embodiment.
Fig. 19 is a block diagram of a spectrum decoding apparatus according to an exemplary embodiment.
Fig. 20 is a block diagram of a spectral dequantization apparatus according to an exemplary embodiment.
Fig. 21 is a block diagram of a spectrum decoding apparatus according to an exemplary embodiment.
Fig. 22 is a block diagram of an ISC decoding apparatus according to an exemplary embodiment.
Fig. 23 is a block diagram of an ISC information decoding apparatus according to an exemplary embodiment.
Fig. 24 is a block diagram of a spectrum decoding apparatus according to another exemplary embodiment.
Fig. 25 is a block diagram of a spectrum decoding apparatus according to another exemplary embodiment.
Fig. 26 is a block diagram of an ISC information encoding apparatus according to another exemplary embodiment.
Fig. 27 is a block diagram of an ISC information decoding apparatus according to another exemplary embodiment.
Fig. 28 is a block diagram of a multimedia device according to an exemplary embodiment.
Fig. 29 is a block diagram of a multimedia device according to another exemplary embodiment.
Fig. 30 is a block diagram of a multimedia device according to another exemplary embodiment.
Fig. 31 is a flowchart of a spectrum encoding method according to an exemplary embodiment.
Fig. 32 is a flowchart of a spectrum decoding method according to an exemplary embodiment.
Detailed Description
Since the inventive concept can have a variety of modified embodiments, preferred embodiments are shown in the drawings and described in the detailed description of the inventive concept. However, this does not limit the inventive concept to the specific embodiments, and it should be understood that the inventive concept covers all modifications, equivalents, and alternatives falling within the spirit and scope of the inventive concept. Furthermore, detailed descriptions related to well-known functions or constructions are excluded so as not to unnecessarily obscure the subject matter of the inventive concepts.
It will be understood that, although the terms first and second are used herein to describe various elements, these elements should not be limited by these terms. The terminology is used only to distinguish one component from another.
In the following description, technical terms are used only to explain specific exemplary embodiments, and do not limit the inventive concept. Terms used in the inventive concept have been selected as general terms that are currently widely used in consideration of functions of the inventive concept, but may be modified according to intentions of those of ordinary skill in the art, conventional implementations, or introduction of new technologies. Further, if there is a term arbitrarily selected by the applicant in a specific case, the meaning of the term in that case will be described in detail in the corresponding description section of the inventive concept. Therefore, terms should be defined based on the entire contents of the present specification, not only on the name of each term.
Terms in the singular may include the plural unless the context clearly indicates otherwise. The terms "comprising", "including", and "having" specify the presence of an attribute, region, fixed number, step, process, element, and/or component, but do not exclude other attributes, regions, fixed numbers, steps, processes, elements, and/or components.
Hereinafter, exemplary embodiments will be described in detail with reference to the accompanying drawings.
Fig. 1a and 1b are block diagrams of an audio encoding apparatus and an audio decoding apparatus, respectively, according to an exemplary embodiment.
The audio encoding apparatus 110 shown in fig. 1a may include a preprocessor 112, a frequency domain encoder 114, and a parameter encoder 116. The components may be integrated in at least one module and may be implemented as at least one processor (not shown).
In fig. 1a, the preprocessor 112 may perform filtering, down-sampling, etc., on the input signal, but is not limited thereto. The input signal may comprise a speech signal, a music signal or a mixture of speech and music. Hereinafter, for convenience of explanation, the input signal is referred to as an audio signal.
The frequency domain encoder 114 may perform a time-frequency transform on the audio signal provided by the preprocessor 112, select an encoding tool corresponding to the number of channels, the encoding band, and the bit rate of the audio signal, and encode the audio signal by using the selected encoding tool. The time-frequency transform may use the modified discrete cosine transform (MDCT), the modulated lapped transform (MLT), or the fast Fourier transform (FFT), but is not limited thereto. When the given number of bits is sufficient, a general transform coding scheme may be applied to the entire frequency band; when it is insufficient, a bandwidth extension scheme may be applied to a partial frequency band. When the audio signal is a stereo or multichannel signal, encoding is performed for each channel if the given number of bits is sufficient, and a down-mixing scheme may be applied if it is insufficient. The encoded spectral coefficients are generated by the frequency domain encoder 114.
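Of the transforms named above, the MDCT is the one most commonly used in this kind of codec. The sketch below uses a sine window (one window satisfying the Princen-Bradley condition; the actual window and frame length of the embodiments are not specified here) and shows the time-domain aliasing cancellation property: overlap-adding two inverse-transformed frames reconstructs the overlapped samples exactly.

```python
import numpy as np

def mdct(frame):
    """Forward MDCT of a 2N-sample frame -> N coefficients (sine window)."""
    n2 = len(frame)
    N = n2 // 2
    n = np.arange(n2)
    w = np.sin(np.pi * (n + 0.5) / n2)            # sine window
    k = np.arange(N)
    basis = np.cos(np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5))
    return (w * frame) @ basis

def imdct(coeffs):
    """Inverse MDCT -> 2N windowed samples; 50% overlap-add with the
    neighboring frames cancels the time-domain aliasing."""
    N = len(coeffs)
    n = np.arange(2 * N)
    w = np.sin(np.pi * (n + 0.5) / (2 * N))
    k = np.arange(N)
    basis = np.cos(np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5))
    return w * (basis @ coeffs) * (2.0 / N)
```

The 2:1 decimation (2N samples in, N coefficients out) is what makes the lapped transform critically sampled despite the 50% frame overlap.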
The parameter encoder 116 may extract parameters from the encoded spectral coefficients provided from the frequency domain encoder 114 and encode the extracted parameters. For example, a parameter may be extracted for each sub-band (i.e., a unit in which spectral coefficients are grouped), and the parameter may have a uniform length or a non-uniform length by reflecting a critical band. When each sub-band has a non-uniform length, the sub-band existing in the low frequency band may have a relatively short length with respect to the sub-band existing in the high frequency band. The number and length of subbands included in one frame may vary according to a codec algorithm and may affect encoding performance. The parameters may include, for example, scaling factors, power, average energy, or norm, but are not limited thereto. The spectral coefficients and parameters obtained as a result of the encoding form a bitstream, and the bitstream may be stored in a storage medium or may be transmitted over a channel in the form of, for example, packets.
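For the norm parameter mentioned above, one simple per-subband statistic is the RMS of the coefficients in each band; this is an illustrative choice, since the embodiments also allow scaling factors, power, or average energy.

```python
import numpy as np

def band_norms(spectrum, band_bounds):
    """Per-band norm (RMS) of the spectral coefficients; these norms are
    the side parameters that would be quantized and transmitted."""
    return np.array([np.sqrt(np.mean(spectrum[lo:hi] ** 2))
                     for lo, hi in band_bounds])
```

At the decoder the dequantized norms restore the per-band scale, so the spectral coefficients themselves can be coded in normalized form.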
The audio decoding apparatus 130 shown in fig. 1b may include a parameter decoder 132, a frequency domain decoder 134, and a post-processor 136. The frequency domain decoder 134 may include a frame error concealment algorithm or a packet loss concealment algorithm. The components may be integrated in at least one module and may be implemented as at least one processor (not shown).
In fig. 1b, the parameter decoder 132 may decode parameters from the received bitstream and check whether an error such as an erasure or loss occurs in units of frames from the decoded parameters. Various known methods may be used for error checking and information about whether the current frame is a good frame or an erased or lost frame is provided to the frequency domain decoder 134. Hereinafter, for convenience of explanation, an erasure frame or a lost frame is referred to as an error frame.
When the current frame is a good frame, the frequency domain decoder 134 may generate the synthesized spectral coefficients by performing decoding through a general transform decoding process. When the current frame is an error frame, the frequency domain decoder 134 may generate the synthesized spectral coefficients via a frame error concealment algorithm or a packet loss concealment algorithm by repeating the spectral coefficients of a Previous Good Frame (PGF) for the error frame or by scaling the spectral coefficients of a PGF via a regression analysis and then repeating the spectral coefficients of the PGF for the error frame. The frequency domain decoder 134 may generate a time domain signal by performing a frequency-to-time transform on the synthesized spectral coefficients.
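One simple form of the "scale the PGF via regression" idea above is to fit a line to the recent per-frame norms, extrapolate the norm for the erased frame, and use the ratio as a gain on the previous good frame's coefficients. The window length and gain clamping below are illustrative assumptions.

```python
import numpy as np

def conceal_frame(pgf_coeffs, norm_history):
    """Replace an erased frame by the previous good frame's spectral
    coefficients, scaled by a gain extrapolated from recent frame norms
    via a first-order linear fit."""
    t = np.arange(len(norm_history))
    slope, intercept = np.polyfit(t, norm_history, 1)
    predicted = slope * len(norm_history) + intercept
    gain = min(1.0, max(0.0, predicted / max(norm_history[-1], 1e-12)))
    return gain * np.asarray(pgf_coeffs, dtype=float)
```

With a flat energy history the gain stays at 1 (plain repetition); with decaying energy the regression attenuates the repeated frame, which avoids artificial energy bursts during concealment.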
The post-processor 136 may perform filtering, up-sampling, etc. on the time domain signal provided from the frequency domain decoder 134 for sound quality improvement, but is not limited thereto. The post-processor 136 provides the reconstructed audio signal as an output signal.
Fig. 2a and 2b are block diagrams of an audio encoding apparatus and an audio decoding apparatus having a switching structure according to another exemplary embodiment, respectively.
The audio encoding apparatus 210 shown in fig. 2a may include a preprocessor 212, a mode determiner 213, a frequency domain encoder 214, a time domain encoder 215, and a parameter encoder 216. The components may be integrated in at least one module and may be implemented as at least one processor (not shown).
In fig. 2a, since the preprocessor 212 is substantially the same as the preprocessor 112 of fig. 1a, a description thereof will not be repeated.
The mode determiner 213 may determine the encoding mode by referring to characteristics of the input signal. The mode determiner 213 may determine whether an encoding mode suitable for the current frame is a speech mode or a music mode according to characteristics of an input signal, and may also determine whether an encoding mode efficient for the current frame is a time-domain mode or a frequency-domain mode. The characteristics of the input signal may be sensed by using short-term characteristics of a frame or long-term characteristics of a plurality of frames, but are not limited thereto. For example, if the input signal corresponds to a speech signal, the encoding mode may be determined as a speech mode or a time domain mode, and if the input signal corresponds to a signal other than the speech signal (e.g., a music signal or a mixed signal), the encoding mode may be determined as a music mode or a frequency domain mode. The mode determiner 213 may provide the output signal of the preprocessor 212 to the frequency domain encoder 214 when the characteristics of the input signal correspond to a music mode or a frequency domain mode, and the mode determiner 213 may provide the output signal of the preprocessor 212 to the time domain encoder 215 when the characteristics of the input signal correspond to a speech mode or a time domain mode.
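The paragraph above leaves the classifier open; a toy decision using two short-term features is sketched below. The zero-crossing rate, the sub-block energy variation, and both thresholds are purely illustrative assumptions, not the mode determiner 213's actual criteria.

```python
import numpy as np

def decide_mode(frame, zcr_threshold=0.15, var_threshold=0.5):
    """Toy speech/music decision: speech-like frames tend to combine a
    noticeable zero-crossing rate with strong temporal envelope
    variation; thresholds here are illustrative only."""
    signs = np.sign(frame)
    zcr = np.mean(signs[1:] != signs[:-1])        # fraction of sign flips
    sub = frame.reshape(4, -1)                    # 4 sub-blocks per frame
    env = np.sqrt(np.mean(sub ** 2, axis=1))      # sub-block RMS envelope
    variation = np.std(env) / (np.mean(env) + 1e-12)
    if zcr > zcr_threshold and variation > var_threshold:
        return "time_domain"                      # speech mode
    return "frequency_domain"                     # music mode
```

A stationary tone has a flat envelope and is routed to the frequency-domain path, while a transient, sign-alternating burst is routed to the time-domain path.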
Since the frequency-domain encoder 214 is substantially identical to the frequency-domain encoder 114 of fig. 1a, a description thereof will not be repeated.
The time domain encoder 215 may perform code-excited linear prediction (CELP) encoding on the audio signal provided from the preprocessor 212. In detail, algebraic CELP (ACELP) may be used for the CELP encoding, but the CELP encoding is not limited thereto. The encoded spectral coefficients are generated by the time domain encoder 215.
The parameter encoder 216 may extract parameters from the encoded spectral coefficients provided from the frequency domain encoder 214 or the time domain encoder 215 and encode the extracted parameters. Since the parameter encoder 216 is substantially identical to the parameter encoder 116 of fig. 1a, a description thereof will not be repeated. The spectral coefficients and parameters obtained as a result of the encoding may form a bitstream along with the encoding mode information, and the bitstream may be transmitted through a channel in the form of packets or may be stored in a storage medium.
The audio decoding apparatus 230 shown in fig. 2b may include a parameter decoder 232, a mode determiner 233, a frequency domain decoder 234, a time domain decoder 235, and a post-processor 236. Each of the frequency domain decoder 234 and the time domain decoder 235 may include a frame error concealment algorithm or a packet loss concealment algorithm in each respective domain. The components may be integrated into at least one module and may be implemented as at least one processor (not shown).
In fig. 2b, the parameter decoder 232 may decode parameters from a bitstream transmitted in the form of packets and check whether an error occurs in a frame unit from the decoded parameters. Various known methods may be used for error checking and information about whether the current frame is a good frame or an error frame is provided to the frequency domain decoder 234 or the time domain decoder 235.
The mode determiner 233 may check encoding mode information included in the bitstream and may provide the current frame to the frequency domain decoder 234 or the time domain decoder 235.
The frequency domain decoder 234 may operate when the encoding mode is a music mode or a frequency domain mode, and when the current frame is a good frame, the frequency domain decoder 234 may generate synthesized spectral coefficients by performing decoding through a general transform decoding process. When the current frame is an error frame and the encoding mode of the previous frame is a music mode or a frequency domain mode, the frequency domain decoder 234 may generate the synthesized spectral coefficients by a frame error concealment algorithm or a packet loss concealment algorithm, by repeating the spectral coefficients of a Previous Good Frame (PGF) for the error frame, or by scaling the spectral coefficients of a PGF via regression analysis to then repeat the spectral coefficients of the PGF for the error frame. The frequency domain decoder 234 may generate a time domain signal by performing a frequency-to-time transform on the synthesized spectral coefficients.
The time domain decoder 235 may operate when the encoding mode is a speech mode or a time domain mode, and generate a time domain signal by performing decoding through a general CELP decoding process when the current frame is a good frame. When the current frame is an error frame and the encoding mode of the previous frame is a speech mode or a time domain mode, the time domain decoder 235 may perform a frame error concealment algorithm or a packet loss concealment algorithm in the time domain.
The post-processor 236 may perform filtering, upsampling, etc., on the time domain signal provided from the frequency domain decoder 234 or the time domain decoder 235, but is not limited thereto. The post-processor 236 provides the reconstructed audio signal as an output signal.
Fig. 3a and 3b are block diagrams of an audio encoding apparatus and an audio decoding apparatus, respectively, according to another exemplary embodiment.
The audio encoding apparatus 310 shown in fig. 3a may include a pre-processor 312, a Linear Prediction (LP) analyzer 313, a mode determiner 314, a frequency-domain excitation encoder 315, a time-domain excitation encoder 316, and a parameter encoder 317. The components may be integrated in at least one module and may be implemented as at least one processor (not shown).
In fig. 3a, since the preprocessor 312 is substantially the same as the preprocessor 112 of fig. 1a, a description thereof will not be repeated.
The LP analyzer 313 may extract LP coefficients by performing LP analysis on the input signal, and may generate an excitation signal from the extracted LP coefficients. The excitation signal may be provided to one of a frequency-domain excitation encoder 315 and a time-domain excitation encoder 316 according to an encoding mode.
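The LP analysis step above can be sketched with the standard autocorrelation method: Levinson-Durbin recursion yields the LP coefficients, and inverse filtering with A(z) yields the excitation (residual). The prediction order and the use of an unwindowed autocorrelation are illustrative simplifications.

```python
import numpy as np

def lp_analysis(x, order):
    """Autocorrelation method: Levinson-Durbin gives LP coefficients
    a = [1, a1, ..., ap]; inverse filtering A(z) gives the excitation."""
    r = np.array([np.dot(x[: len(x) - i], x[i:]) for i in range(order + 1)])
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err   # reflection coeff.
        a[1:i] += k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k                                  # prediction error
    excitation = np.convolve(a, x)[: len(x)]                # e[n] = A(z) x[n]
    return a, excitation
```

For a strongly correlated input the excitation carries much less energy than the signal itself, which is why the excitation, rather than the signal, is handed to the frequency-domain or time-domain excitation encoder.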
Since the mode determiner 314 is substantially identical to the mode determiner 213 of fig. 2a, a description thereof will not be repeated.
The frequency-domain excitation encoder 315 is operable when the encoding mode is a music mode or a frequency-domain mode, and since the frequency-domain excitation encoder 315 is substantially identical to the frequency-domain encoder 114 of fig. 1a except that the input signal is an excitation signal, a description thereof will not be repeated.
The time-domain excitation encoder 316 may operate when the encoding mode is a speech mode or a time-domain mode, and since the time-domain excitation encoder 316 is substantially identical to the time-domain encoder 215 of fig. 2a, a description thereof will not be repeated.
The parameter encoder 317 may extract parameters from the encoded spectral coefficients provided from the frequency-domain excitation encoder 315 or the time-domain excitation encoder 316 and encode the extracted parameters. Since the parameter encoder 317 is substantially the same as the parameter encoder 116 of fig. 1a, a description thereof will not be repeated. The spectral coefficients and parameters obtained as a result of the encoding may form a bitstream along with the encoding mode information, and the bitstream may be transmitted in the form of packets through a channel or may be stored in a storage medium.
The audio decoding apparatus 330 shown in fig. 3b may comprise a parameter decoder 332, a mode determiner 333, a frequency-domain excitation decoder 334, a time-domain excitation decoder 335, an LP synthesizer 336 and a post-processor 337. Each of the frequency domain excitation decoder 334 and the time domain excitation decoder 335 may include a frame error concealment algorithm or a packet loss concealment algorithm in each respective domain. The components may be integrated in at least one module and may be implemented as at least one processor (not shown).
In fig. 3b, the parameter decoder 332 may decode parameters from a bitstream transmitted in the form of packets, and may check whether an error occurs in units of frames from the decoded parameters. Various known methods may be used for error checking and information about whether the current frame is a good frame or an error frame is provided to either the frequency-domain excitation decoder 334 or the time-domain excitation decoder 335.
The mode determiner 333 may check encoding mode information included in the bitstream and may provide the current frame to the frequency-domain excitation decoder 334 or the time-domain excitation decoder 335.
The frequency-domain excitation decoder 334 may operate when the encoding mode is a music mode or a frequency-domain mode, and when the current frame is a good frame, the frequency-domain excitation decoder 334 may generate the synthesized spectral coefficients by performing decoding through a general transform decoding process. When the current frame is an error frame and the encoding mode of the previous frame is a music mode or a frequency domain mode, the frequency domain excitation decoder 334 may generate the synthesized spectral coefficients by a frame error concealment algorithm or a packet loss concealment algorithm, by repeating the spectral coefficients of a Previous Good Frame (PGF) for the error frame, or by scaling the spectral coefficients of a PGF via regression analysis to then repeat the spectral coefficients of a PGF for the error frame. The frequency-domain excitation decoder 334 may generate an excitation signal as a time-domain signal by performing a frequency-to-time transform on the synthesized spectral coefficients.
The time-domain excitation decoder 335 may operate when the encoding mode is a speech mode or a time-domain mode, and generate an excitation signal as a time-domain signal by performing decoding through a general CELP decoding process when the current frame is a good frame. When the current frame is an error frame and the encoding mode of the previous frame is a speech mode or a time domain mode, the time domain excitation decoder 335 may perform a frame error concealment algorithm or a packet loss concealment algorithm in the time domain.
The LP synthesizer 336 may generate a time domain signal by performing LP synthesis on the excitation signal provided from the frequency domain excitation decoder 334 or the time domain excitation decoder 335.
Post-processor 337 may perform filtering, upsampling, etc., on the time domain signal provided from LP synthesizer 336, but is not limited thereto. The post-processor 337 provides the reconstructed audio signal as an output signal.
Fig. 4a and 4b are block diagrams of an audio encoding apparatus and an audio decoding apparatus having a switching structure according to another exemplary embodiment, respectively.
The audio encoding apparatus 410 shown in fig. 4a may comprise a pre-processor 412, a mode determiner 413, a frequency domain encoder 414, an LP analyzer 415, a frequency domain excitation encoder 416, a time domain excitation encoder 417 and a parameter encoder 418. The components may be integrated in at least one module and may be implemented as at least one processor (not shown). Since it can be considered that the audio encoding apparatus 410 illustrated in fig. 4a can be obtained by combining the audio encoding apparatus 210 of fig. 2a and the audio encoding apparatus 310 of fig. 3a, a description of the operation of the common components will not be repeated, and the operation of the mode determiner 413 will now be described.
The mode determiner 413 may determine the encoding mode of the input signal by referring to the characteristics and the bit rate of the input signal. The mode determiner 413 may determine the encoding mode as the CELP mode or another mode based on whether the current frame is a speech mode or a music mode according to characteristics of the input signal and based on whether an encoding mode efficient for the current frame is a time domain mode or a frequency domain mode. The mode determiner 413 may determine the encoding mode as a CELP mode when the characteristics of the input signal correspond to a speech mode, determine the encoding mode as a frequency domain mode when the characteristics of the input signal correspond to a music mode and a high bit rate, and determine the encoding mode as an audio mode when the characteristics of the input signal correspond to a music mode and a low bit rate. The mode determiner 413 may provide the input signal to the frequency domain encoder 414 when the encoding mode is the frequency domain mode, to the frequency domain excitation encoder 416 via the LP analyzer 415 when the encoding mode is the audio mode, and to the time domain excitation encoder 417 via the LP analyzer 415 when the encoding mode is the CELP mode.
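The three-way routing just described reduces to a small decision function. The bit rate threshold separating "high" from "low" rates is an illustrative assumption; the embodiments do not fix a particular value.

```python
def select_encoding_mode(is_speech, is_music, bitrate_bps,
                         high_rate_threshold=24000):
    """Route a frame as in the mode determiner described above;
    the rate threshold is illustrative only."""
    if is_speech:
        return "CELP"        # time-domain excitation coding via LP analysis
    if is_music and bitrate_bps >= high_rate_threshold:
        return "FD"          # direct frequency-domain coding
    return "AUDIO"           # LP analysis + frequency-domain excitation coding
```

The returned label determines whether the input goes straight to the frequency domain encoder 414 or passes through the LP analyzer 415 first.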
The frequency-domain encoder 414 may correspond to the frequency-domain encoder 114 in the audio encoding apparatus 110 of fig. 1a or the frequency-domain encoder 214 in the audio encoding apparatus 210 of fig. 2a, and the frequency-domain excitation encoder 416 or the time-domain excitation encoder 417 may correspond to the frequency-domain excitation encoder 315 or the time-domain excitation encoder 316 in the audio encoding apparatus 310 of fig. 3a.
The audio decoding apparatus 430 shown in fig. 4b may include a parameter decoder 432, a mode determiner 433, a frequency domain decoder 434, a frequency domain excitation decoder 435, a time domain excitation decoder 436, an LP synthesizer 437, and a post-processor 438. Each of the frequency domain decoder 434, the frequency domain excitation decoder 435, and the time domain excitation decoder 436 may include a frame error concealment algorithm or a packet loss concealment algorithm in its respective domain. The components may be integrated in at least one module and may be implemented as at least one processor (not shown). Since the audio decoding apparatus 430 shown in fig. 4b may be regarded as a combination of the audio decoding apparatus 230 of fig. 2b and the audio decoding apparatus 330 of fig. 3b, descriptions of the operations of the common components are not repeated, and the operation of the mode determiner 433 will now be described.
The mode determiner 433 may check encoding mode information included in the bitstream and may provide the current frame to the frequency domain decoder 434, the frequency domain excitation decoder 435, or the time domain excitation decoder 436.
The frequency domain decoder 434 may correspond to the frequency domain decoder 134 in the audio decoding apparatus 130 of fig. 1b or the frequency domain decoder 234 in the audio decoding apparatus 230 of fig. 2b, and the frequency domain excitation decoder 435 or the time domain excitation decoder 436 may correspond to the frequency domain excitation decoder 334 or the time domain excitation decoder 335 in the audio decoding apparatus 330 of fig. 3b.
Fig. 5 is a block diagram of a frequency domain audio encoding apparatus according to an exemplary embodiment.
The frequency domain audio encoding apparatus 510 shown in fig. 5 may include a transient detector 511, a transformer 512, a signal classifier 513, an energy encoder 514, a spectrum normalizer 515, a bit allocator 516, a spectrum encoder 517, and a multiplexer 518. The components may be integrated in at least one module and may be implemented as at least one processor (not shown). The frequency-domain audio encoding apparatus 510 may perform all functions of the frequency-domain audio encoder 214 and part of the functions of the parameter encoder 216 shown in fig. 2a. Except for the signal classifier 513, the frequency domain audio encoding apparatus 510 may be replaced by the configuration of the encoder disclosed in the ITU-T G.719 standard, and the transformer 512 may use a transform window having an overlap duration of 50%. Furthermore, except for the transient detector 511 and the signal classifier 513, the frequency domain audio encoding apparatus 510 may be replaced by the configuration of the encoder disclosed in the ITU-T G.719 standard. In each case, although not shown, a noise level estimator may be further included downstream of the spectrum encoder 517 to estimate the noise level of spectral coefficients to which no bits are allocated in the bit allocation process, as in the ITU-T G.719 standard, and to insert the estimated noise level into the bitstream.
Referring to fig. 5, the transient detector 511 may detect a duration exhibiting transient characteristics by analyzing an input signal, and generate transient signaling information for each frame in response to the detection result. Various known methods may be used to detect the transient duration. According to an exemplary embodiment, the transient detector 511 may first determine whether the current frame is a transient frame, and then verify the current frame that has been determined to be a transient frame. The transient signaling information may be included in the bitstream by multiplexer 518 and may be provided to transformer 512.
The transformer 512 may determine a window size to be used for transformation according to the detection result of the transient duration, and perform time-frequency transformation based on the determined window size. For example, a short window may be applied to sub-bands for which a transient duration has been detected, and a long window may be applied to sub-bands for which a transient duration has not been detected. As another example, a short window may be applied to a frame that includes a transient duration.
The signal classifier 513 may analyze the spectrum provided from the transformer 512 in units of frames to determine whether each frame corresponds to a harmonic frame. Various known methods may be used to determine the harmonic frame. According to an exemplary embodiment, the signal classifier 513 may divide the spectrum provided from the transformer 512 into a plurality of sub-bands and obtain a peak energy value and an average energy value for each sub-band. Thereafter, the signal classifier 513 may obtain the number of sub-bands of which the peak energy value per frame is greater than the average energy value by a predetermined ratio or more, and determine a frame of which the obtained number of sub-bands is greater than or equal to a predetermined value as a harmonic frame. The predetermined ratio and the predetermined value may be predetermined by experiment or simulation. The harmonic signaling information may be included in the bitstream by multiplexer 518.
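The harmonic-frame decision described above can be sketched as follows. The 4.0 peak-to-average ratio and the minimum count of 3 are illustrative placeholders; the text states only that the actual ratio and count are predetermined by experiment or simulation.

```python
import numpy as np

def is_harmonic_frame(spectrum, num_subbands=16, peak_to_avg=4.0, min_count=3):
    """Count sub-bands whose peak energy exceeds the average energy by a
    predetermined ratio, then compare that count to a predetermined value.
    peak_to_avg and min_count are illustrative assumptions."""
    energies = np.abs(np.asarray(spectrum, dtype=float)) ** 2
    count = 0
    for sb in np.array_split(energies, num_subbands):
        if sb.size and sb.mean() > 0 and sb.max() >= peak_to_avg * sb.mean():
            count += 1
    return count >= min_count
```

A spectrum with strong, isolated peaks in most sub-bands is classified as harmonic, while a near-flat noise spectrum is not.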
The energy encoder 514 may obtain energy in units of each sub-band, and may quantize and lossless-encode the energy. According to an embodiment, a norm value corresponding to an average spectral energy in units of each sub-band may be used as the energy, and a scaling factor or power may also be used, but the energy is not limited thereto. The norm value for each sub-band may be provided to a spectrum normalizer 515 and a bit allocator 516 and may be included in the bit stream by a multiplexer 518.
The spectrum normalizer 515 may normalize the spectrum by using a norm value obtained in units of each sub-band.
The bit allocator 516 may allocate bits in integer units or in fractional units by using the norm value obtained for each subband. Further, the bit allocator 516 may calculate a masking threshold by using the norm value obtained for each subband, and estimate the perceptually required number of bits (i.e., the allowable number of bits) by using the masking threshold. The bit allocator 516 may limit the number of allocated bits so that it does not exceed the allowable number of bits for each sub-band. The bit allocator 516 may sequentially allocate bits from subbands having larger norm values, and may weight the norm value of each subband according to its perceptual importance so that a greater number of bits is allocated to perceptually important subbands. The quantized norm values provided from the energy encoder 514 to the bit allocator 516 may be pre-adjusted to account for psychoacoustic weighting and masking effects, as in the ITU-T G.719 standard, before being used for bit allocation.
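The sequential allocation from larger norm values can be sketched as a greedy loop. The 3 dB norm decrement per allocated bit and the per-band cap handling are illustrative assumptions, not the adjustment tables of the G.719 standard:

```python
import numpy as np

def allocate_bits(norms_db, total_bits, max_bits):
    """Greedy bit allocation sketch: repeatedly give one bit to the sub-band
    with the largest remaining norm, then lower that norm so other bands
    catch up. max_bits caps each band at its allowable number of bits."""
    norms = np.asarray(norms_db, dtype=float).copy()
    bits = np.zeros(len(norms), dtype=int)
    remaining = total_bits
    while remaining > 0 and np.isfinite(norms).any():
        k = int(np.argmax(norms))
        if bits[k] >= max_bits[k]:
            norms[k] = -np.inf        # band is full; stop considering it
            continue
        bits[k] += 1
        norms[k] -= 3.0               # assumed log-domain drop per spent bit
        remaining -= 1
    return bits
```

With norms of 30, 20, and 10 dB, 10 bits, and a cap of 4 bits per band, the two loudest bands are filled first and the remainder goes to the quietest band.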
The spectrum encoder 517 may quantize the normalized spectrum by using the allocated number of bits per subband and losslessly encode the quantization result. For example, TCQ, USQ, FPC, AVQ, and PVQ, or a combination thereof, and a lossless encoder optimized for each quantizer may be used for spectral encoding. Further, the trellis encoding may also be used for the spectrum encoding, but the spectrum encoding is not limited thereto. In addition, various spectrum coding methods may be used according to an environment or user requirements in which the corresponding codec is implemented. Information on the spectrum encoded by the spectrum encoder 517 may be included in the bitstream by the multiplexer 518.
Fig. 6 is a block diagram of a frequency domain audio encoding apparatus according to an exemplary embodiment.
The frequency domain audio encoding apparatus 600 illustrated in fig. 6 may include a preprocessor 610, a frequency domain encoder 630, a time domain encoder 650, and a multiplexer 670. The frequency domain encoder 630 may include a transient detector 631, a transformer 633, and a spectral encoder 635. The components may be integrated in at least one module and may be implemented as at least one processor (not shown).
Referring to fig. 6, the preprocessor 610 may perform filtering, down-sampling, etc., on the input signal, but is not limited thereto. The preprocessor 610 may determine an encoding mode according to the signal characteristics. The preprocessor 610 may determine whether an encoding mode suitable for the current frame is a speech mode or a music mode, and may also determine whether an encoding mode efficient for the current frame is a time-domain mode or a frequency-domain mode. The signal characteristics may be perceived by using short-term characteristics of the frame or long-term characteristics of a plurality of frames, but are not limited thereto. For example, if the input signal corresponds to a speech signal, the encoding mode may be determined as the speech mode or the time domain mode, and if the input signal corresponds to a signal other than a speech signal (i.e., a music signal or a mixed signal), the encoding mode may be determined as the music mode or the frequency domain mode. The preprocessor 610 may provide the input signal to the frequency domain encoder 630 when the signal characteristics correspond to the music mode or the frequency domain mode, and to the time domain encoder 650 when the signal characteristics correspond to the speech mode or the time domain mode.
The frequency domain encoder 630 may process the audio signal provided from the preprocessor 610 based on a transform coding scheme. In detail, the transient detector 631 may detect a transient component from the audio signal and determine whether the current frame corresponds to a transient frame. The transformer 633 may determine the length or shape of a transform window based on the frame type (i.e., the transient information provided from the transient detector 631), and may transform the audio signal to the frequency domain based on the determined transform window. As an example of a transform tool, a Modified Discrete Cosine Transform (MDCT), a Modulated Lapped Transform (MLT), or a Fast Fourier Transform (FFT) may be used. In general, a short transform window may be applied to frames that include transient components. The spectrum encoder 635 may perform encoding on the audio spectrum transformed to the frequency domain. The spectral encoder 635 will be described in more detail with reference to figs. 7 and 9.
The time domain encoder 650 may perform Code Excited Linear Prediction (CELP) encoding on the audio signal provided from the pre-processor 610. In detail, algebraic CELP may be used for CELP coding, but CELP coding is not limited thereto.
The multiplexer 670 may multiplex the spectral or signal components with a variable index generated as a result of encoding in the frequency domain encoder 630 or the time domain encoder 650 to generate a bitstream. The bitstream may be stored in a storage medium or may be transmitted in the form of packets through a channel.
Fig. 7 is a block diagram of a spectrum encoding apparatus according to an exemplary embodiment. The spectral encoding apparatus illustrated in fig. 7 may correspond to the spectral encoder 635 of fig. 6, may be included in another frequency domain encoding apparatus, or may be independently implemented.
The spectral encoding apparatus shown in fig. 7 may include an energy estimator 710, an energy quantizing and encoding unit 720, a bit allocator 730, a spectral normalizer 740, a spectral quantizing and encoding unit 750, and a noise filler 760.
Referring to fig. 7, the energy estimator 710 may divide the original spectral coefficients into a plurality of sub-bands and estimate energy (e.g., a norm value of each sub-band). Each sub-band may have a uniform length in the frame. When each sub-band has a non-uniform length, the number of spectral coefficients included in the sub-band may increase from a low frequency band to a high frequency band.
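The per-sub-band norm estimation can be sketched as below, taking the norm as the root-mean-square of the spectral coefficients in each sub-band (the text allows other energy measures such as a scaling factor or power):

```python
import numpy as np

def subband_norms(coeffs, band_sizes):
    """Split the spectral coefficients into sub-bands of the given sizes and
    return the norm (RMS energy) of each sub-band. band_sizes may be uniform
    or grow from the low band to the high band, as described in the text."""
    norms, start = [], 0
    for size in band_sizes:
        band = coeffs[start:start + size]
        norms.append(np.sqrt(np.mean(band ** 2)))
        start += size
    return np.array(norms)
```

A band containing coefficients 3 and 4 has an RMS norm of sqrt(12.5); an all-zero band has norm 0.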
The energy quantization and encoding unit 720 may quantize and encode the estimated norm value of each subband. The norm values may be quantized by various tools, such as Vector Quantization (VQ), Scalar Quantization (SQ), Trellis Coded Quantization (TCQ), Lattice Vector Quantization (LVQ), and the like. The energy quantization and encoding unit 720 may additionally perform lossless encoding to further improve encoding efficiency.
The bit allocator 730 may allocate bits required for encoding in consideration of allowable bits of a frame based on the quantized norm value of each subband.
The spectrum normalizer 740 may normalize the spectrum based on the norm values obtained for each subband.
The spectrum quantization and encoding unit 750 may quantize and encode the normalized spectrum based on the allocated bits of each sub-band.
The noise filler 760 may add noise to components quantized to zero due to the constraint of allowable bits in the spectral quantization and coding unit 750.
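A minimal sketch of noise filling is shown below. The uniform noise model, the fixed seed, and the single noise level per frame are illustrative assumptions; the text only states that noise is added to components quantized to zero.

```python
import numpy as np

def fill_noise(dequantized, noise_level, seed=0):
    """Replace zero-quantized coefficients with low-level random noise so the
    decoded spectrum does not sound spectrally hollow. The noise model and
    seed are assumptions for illustration."""
    rng = np.random.default_rng(seed)
    out = dequantized.astype(float).copy()
    zeros = out == 0
    out[zeros] = noise_level * rng.uniform(-1.0, 1.0, zeros.sum())
    return out
```

Non-zero quantized coefficients are preserved exactly; only the zero positions receive noise bounded by the noise level.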
Fig. 8 shows subband splitting.
Referring to fig. 8, when the input signal uses a sampling frequency of 48 kHz and has a frame size of 20 ms, the number of samples to be processed per frame is 960. That is, when the input signal is transformed by using an MDCT with 50% overlap, 960 spectral coefficients are obtained. The overlap ratio may be set differently according to the coding scheme. In the frequency domain, a band of up to 24 kHz can theoretically be processed, and a band of up to 20 kHz may be represented in consideration of the audible range. In the low band of 0 to 3.2 kHz, each sub-band includes 8 spectral coefficients. In the band of 3.2 kHz to 6.4 kHz, each sub-band includes 16 spectral coefficients. In the band of 6.4 kHz to 13.6 kHz, each sub-band includes 24 spectral coefficients. In the band of 13.6 kHz to 20 kHz, each sub-band includes 32 spectral coefficients. For a predetermined frequency band set in the encoding apparatus, encoding based on a norm value may be performed, and for a high frequency band above the predetermined frequency band, encoding based on various schemes (such as bandwidth extension) may be applied.
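The split above can be checked arithmetically: 960 coefficients over 24 kHz gives 40 coefficients per kHz, so the 0-20 kHz range holds 800 norm-coded coefficients. The sub-band counts per region (16, 8, 12, 8) are derived from these figures, not stated explicitly in the text:

```python
def subband_layout():
    """Sub-band split of fig. 8 for a 48 kHz, 20 ms frame: each region is a
    (bandwidth_hz, coefficients_per_subband) pair; 40 coefficients per kHz."""
    regions = [(3200, 8),    # 0-3.2 kHz
               (3200, 16),   # 3.2-6.4 kHz
               (7200, 24),   # 6.4-13.6 kHz
               (6400, 32)]   # 13.6-20 kHz
    sizes = []
    for bandwidth_hz, sb_size in regions:
        n_coeffs = bandwidth_hz * 40 // 1000   # 40 coefficients per kHz
        sizes += [sb_size] * (n_coeffs // sb_size)
    return sizes
```

This yields 44 sub-bands covering exactly 800 of the 960 coefficients; the remaining high-band coefficients are handled by schemes such as bandwidth extension.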
Fig. 9 is a block diagram illustrating a configuration of a spectrum quantizing device according to an exemplary embodiment.
The apparatus shown in fig. 9 may include a quantizer selection unit 910, a USQ 930, and a TCQ 950.
In fig. 9, the quantizer selection unit 910 may select the most efficient quantizer among various quantizers according to the characteristics of a signal to be quantized (i.e., an input signal). As the characteristics of the input signal, bit allocation information for each band, band size information, and the like can be used. Depending on the result of the selection, the signal to be quantized may be provided to one of USQ 930 and TCQ 950 such that a corresponding quantization is performed.
Fig. 10 is a block diagram of a configuration of a spectrum encoding apparatus according to an exemplary embodiment. The apparatus illustrated in fig. 10 may correspond to the spectral quantization and encoding unit 750 of fig. 7, may be included in another frequency domain encoding apparatus, or may be independently implemented.
The apparatus shown in fig. 10 may include a coding method selection unit 1010, a zero coding unit 1020, a scaling unit 1030, an ISC coding unit 1040, a quantized component restoration unit 1050, and an inverse scaling unit 1060. Here, the quantized component restoring unit 1050 and the inverse scaling unit 1060 may be optionally provided.
In fig. 10, the encoding method selection unit 1010 may select an encoding method by considering the input signal characteristics. The input signal characteristics may include the bits allocated to each frequency band. The normalized spectrum may be provided to the zero encoding unit 1020 or the scaling unit 1030 based on the coding scheme selected for each band. According to an embodiment, when the average number of bits allocated to each sample of a band is greater than or equal to a predetermined value (e.g., 0.75), the band may be determined to be important and USQ may be used for it, while TCQ may be used for all other bands. Here, the average number of bits may be determined in consideration of the band length or band size. A one-bit flag may be used to signal the selected encoding method.
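The selection rule reduces to a one-line threshold test; the 0.75 value follows the text, and returning quantizer names as strings is just for illustration:

```python
def select_quantizer(bits_for_band, band_length, threshold=0.75):
    """Select USQ for perceptually important bands (average bits per sample
    at or above the threshold) and TCQ otherwise, as described in the text."""
    avg_bits_per_sample = bits_for_band / band_length
    return "USQ" if avg_bits_per_sample >= threshold else "TCQ"
```

For a 16-sample band, 24 allocated bits (1.5 bits per sample) selects USQ, while 8 bits (0.5 bits per sample) selects TCQ.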
The zero encoding unit 1020 may encode all samples as zero (0) for a frequency band in which the allocated bits are zero.
The scaling unit 1030 may adjust the bit rate by scaling the spectrum based on the bits allocated to each band. In this case, the normalized spectrum may be used. The scaling unit 1030 may perform scaling by considering the average number of bits allocated to each sample (i.e., spectral coefficient) included in the band. For example, the larger the average number of bits, the greater the scaling that may be applied.
According to an embodiment, the scaling unit 1030 may determine an appropriate scaling value according to the bit allocation for each frequency band.
In detail, first, the number of pulses of the current band may be estimated using the band length and the bit allocation information. Here, a pulse indicates a unit pulse. Before the estimation is performed, the number of bits (b) actually required for the current band may be calculated based on equation 1.
Where n denotes a band length, m denotes a number of pulses, and i denotes a number of non-zero positions having significant spectral coefficients (ISCs).
The number of non-zero positions may be obtained based on a probability according to equation 2, for example.
Further, the number of bits required for the non-zero position can be estimated by equation 3.
Finally, the number of pulses may be selected such that the resulting value of b is closest to the number of bits allocated to each band.
Next, an initial scaling factor may be determined by an estimate of the number of pulses obtained for each frequency band and the absolute value of the input signal. The input signal may be scaled by an initial scaling factor. If the sum of the number of pulses for the scaled original signal (i.e., the quantized signal) is not the same as the estimated number of pulses, the pulse redistribution process may be performed using the updated scaling factor. According to the pulse redistribution process, if the number of pulses selected for the current band is less than the estimated number of pulses obtained for each band, the number of pulses is increased by decreasing the scaling factor, otherwise, if the number of pulses selected for the current band is greater than the estimated number of pulses obtained for each band, the number of pulses is decreased by increasing the scaling factor. In this case, the scaling factor may be increased or decreased by a predetermined value by selecting a position where distortion of the original signal is minimized.
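The redistribution loop above can be sketched as follows. The division convention (a larger scaling factor yields fewer pulses, matching "decrease the factor to increase the pulses") and the 0.9/1.1 step factors are assumptions for illustration; the text says only that the factor moves by a predetermined amount.

```python
import numpy as np

def match_pulse_count(band, target_pulses, g, max_iter=200):
    """Pulse redistribution sketch: quantize band/g to unit pulses and nudge
    the scaling factor g until the pulse count matches the per-band estimate.
    The step factors 0.9/1.1 are illustrative assumptions."""
    for _ in range(max_iter):
        q = np.rint(band / g).astype(int)
        if int(np.abs(q).sum()) == target_pulses:
            return g, q
        # too few pulses -> decrease g (values grow); too many -> increase g
        g *= 0.9 if np.abs(q).sum() < target_pulses else 1.1
    return g, np.rint(band / g).astype(int)
```

Starting from g = 1.0, a band of [0.5, 1.0, 1.5, 2.0] quantizes to 5 pulses; one downward nudge of the factor reaches the target of 6.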
Since the distortion function of the TCQ requires a relative size rather than an exact distance, it may be obtained as the sum of squared distances between the quantized and unquantized values in each frequency band, as shown in equation 4.
where p_i denotes the actual value and q_i denotes the quantized value.
For USQ, the distortion function may use the Euclidean distance to determine the optimal quantization value. In this case, an improved equation including the scaling factor may be used to minimize the computational complexity, and the distortion function may be calculated by equation 5.
If the number of pulses for a band does not match the required value, a certain number of pulses needs to be added or removed while maintaining a minimum distortion metric. This can be performed iteratively, adding or deleting one pulse at a time until the number of pulses reaches the desired value.
To add or delete a pulse, n distortion values need to be obtained in order to select the optimal one. For example, as shown in equation 6, the j-th distortion value corresponds to adding a pulse at the j-th position in the frequency band.
To avoid equation 6 being performed n times, a bias may be used as shown in equation 7.
In equation 7, the term that does not depend on j may be calculated only once. Furthermore, n denotes the band length (i.e., the number of coefficients in the band), p denotes the original signal (i.e., the input signal of the quantizer), q denotes the quantized signal, and g denotes the scaling factor. Finally, the position j at which the distortion d is minimized may be selected, and q_j may be updated accordingly.
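A direct sketch of this single-pulse update is shown below. It searches all n positions explicitly (the role equations 6-7 fill more cheaply via the precomputed bias term), and the assumption that a new pulse grows the quantized value toward the sign of the original coefficient is an illustrative choice:

```python
import numpy as np

def add_pulse(p, q, g):
    """Add one unit pulse to the quantized band q at the position that
    minimizes the squared error against the original band p scaled by g.
    The sign convention (grow toward the signal) is an assumption."""
    best_j, best_d = 0, np.inf
    for j in range(len(p)):
        trial = q.copy()
        trial[j] += 1 if p[j] >= 0 else -1
        d = np.sum((p - g * trial) ** 2)   # squared-distance distortion
        if d < best_d:
            best_j, best_d = j, d
    q[best_j] += 1 if p[best_j] >= 0 else -1
    return q
```

For p = [2.0, 0.1, -1.0] and q = [1, 0, -1] with g = 1, the extra pulse lands on the first coefficient, where it removes the most distortion.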
To control the bit rate, encoding may be performed by using the scaled spectral coefficients and selecting an appropriate ISC. In detail, the spectral components for quantization may be selected using a bit allocation for each frequency band. In this case, the spectral components may be selected based on various combinations according to the distribution and variance of the spectral components. Next, the actual non-zero position may be calculated. The non-zero position may be obtained by analyzing the scaling amount and the redistribution operation, and the non-zero position thus selected may be referred to as an ISC. In summary, an optimal scaling factor and non-zero position information corresponding to the ISCs are obtained by analyzing the amplitude of the signal that has undergone the scaling and redistribution processes. Here, the non-zero position information indicates the number and positions of the non-zero positions. If the number of pulses is not controlled by the scaling and redistribution processes, the selected pulses may be quantized by the TCQ process and the quantization result may be used to adjust the excess bits. This process is shown below.
When the number of non-zero positions is not the same as the estimated number of pulses for each band, is greater than a predetermined value (e.g., 1), and the quantizer selection information indicates TCQ, the excess bits may be adjusted by actual TCQ quantization. In detail, when these conditions are met, the TCQ quantization process is first performed to adjust the excess bits. If the actual number of pulses of the current band obtained by TCQ quantization is smaller than the estimated number of pulses previously obtained for each band, the scaling factor is increased by multiplying the scaling factor determined before TCQ quantization by a value greater than 1 (e.g., 1.1); otherwise, the scaling factor is decreased by multiplying it by a value smaller than 1 (e.g., 0.9). When, by repeating this process, the estimated number of pulses obtained for each band becomes the same as the number of pulses of the current band obtained by TCQ quantization, the excess bits are updated by calculating the bits used in the actual TCQ quantization process. The non-zero positions obtained by this process may correspond to the ISCs.
The ISC encoding unit 1040 may encode information on the number of finally selected ISCs and information on the non-zero positions. In this process, lossless coding may be applied to improve coding efficiency. The ISC encoding unit 1040 may perform encoding using the selected quantizer for non-zero frequency bands, i.e., bands for which the allocated bits are non-zero. In detail, the ISC encoding unit 1040 may select ISCs for each frequency band of the normalized spectrum, and encode information about the selected ISCs based on their number, positions, amplitudes, and signs. In this case, the ISC amplitudes may be encoded in a manner different from the number, positions, and signs. For example, the ISC amplitudes may be quantized using one of USQ and TCQ and then arithmetic-coded, while the number, positions, and signs of the ISCs may be arithmetic-coded directly. If it is determined that a particular band includes important information, USQ may be used; otherwise, TCQ may be used. According to an embodiment, one of TCQ and USQ may be selected based on signal characteristics. Here, the signal characteristics may include the bits allocated to each band or the band length. If the average number of bits allocated to each sample included in a band is greater than or equal to a threshold value (e.g., 0.75), it may be determined that the corresponding band includes very important information, and thus USQ may be used. Even for a low frequency band having a short band length, USQ may be used as the case requires. According to another embodiment, one of a first and a second joint scheme may be used according to the bandwidth.
For example, for NB and WB, a first joint scheme may be used, in which a quantizer is selected by additionally using a secondary bit-allocation process that distributes excess bits from previously encoded bands, in addition to the original bit allocation information of each band; for SWB and FB, a second joint scheme may be used, in which TCQ is applied to the least significant bits (LSBs) of a band determined to use USQ. In the first joint scheme, two bands may be selected for the secondary bit-allocation process by distributing the excess bits from the previously encoded bands. In the second joint scheme, USQ may be used for the remaining bits.
The quantized component restoring unit 1050 may restore the actual quantized component by adding the ISC position, amplitude, and sign information to the quantized component. Here, zero may be assigned to the spectral coefficient at the zero position (the spectral coefficient encoded as zero).
The inverse scaling unit 1060 may output quantized spectral coefficients of the same level as the level of the normalized input spectrum by inversely scaling the restored quantized components. The scaling unit 1030 and the inverse scaling unit 1060 may use the same scaling factor.
Fig. 11 is a block diagram illustrating a configuration of an ISC encoding apparatus according to an exemplary embodiment.
The apparatus shown in fig. 11 may include an ISC selection unit 1110 and an ISC information coding unit 1130. The apparatus of fig. 11 may correspond to the ISC coding unit 1040 of fig. 10, or may be implemented as a separate apparatus.
In fig. 11, the ISC selection unit 1110 may select ISCs from the scaled spectrum based on a predetermined criterion to adjust a bit rate. The ISC selection unit 1110 may obtain the actual non-zero position by analyzing the degree of scaling from the scaled spectrum. Here, the ISC may correspond to an actual non-zero spectral coefficient before scaling. The ISC selection unit 1110 may select a spectral coefficient to be encoded (i.e., a non-zero position) by considering the distribution and variance of the spectral coefficients based on bits allocated for each band. TCQ may be used for ISC selection.
The ISC information encoding unit 1130 encodes ISC information (i.e., number information, position information, amplitude information, and sign of ISCs) based on the selected ISCs.
Fig. 12 is a block diagram illustrating a configuration of an ISC information encoding apparatus according to an exemplary embodiment.
The apparatus shown in fig. 12 may include a position information encoding unit 1210, an amplitude information encoding unit 1230, and a sign information encoding unit 1250.
In fig. 12, the position information encoding unit 1210 may encode position information (i.e., position information of non-zero spectral coefficients) of the ISCs selected by the ISC selection unit (1110 of fig. 11). The location information may include the number and location of the selected ISCs. Arithmetic coding may be used to encode the location information. A new buffer may be configured by collecting the selected ISCs. For ISC collection, zero bands and unselected spectrum may be excluded.
The amplitude information encoding unit 1230 may encode the amplitude information of the newly configured ISCs. In this case, quantization may be performed by selecting one of TCQ and USQ, and arithmetic coding may then be additionally performed. To improve the efficiency of the arithmetic coding, the non-zero position information and the number of ISCs may be used.
The sign information encoding unit 1250 may encode the sign information of the selected ISCs. Arithmetic coding may be used to encode the sign information.
Fig. 13 is a block diagram showing a configuration of a spectrum encoding apparatus according to another exemplary embodiment. The apparatus illustrated in fig. 13 may correspond to the spectral quantizing and encoding unit 750 of fig. 7, or may be included in another frequency-domain encoding apparatus, or may be independently implemented.
The apparatus shown in fig. 13 may include a scaling unit 1330, an ISC coding unit 1340, a quantized component recovery unit 1350, and an inverse scaling unit 1360. Compared to fig. 10, the operation of each component is the same except that the zero coding unit 1020 and the coding method selection unit 1010 are omitted and the ISC coding unit 1340 uses the TCQ.
Fig. 14 is a block diagram showing a configuration of a spectrum encoding apparatus according to another exemplary embodiment. The apparatus illustrated in fig. 14 may correspond to the spectral quantizing and encoding unit 750 of fig. 7, or may be included in another frequency-domain encoding apparatus, or may be independently implemented.
The apparatus shown in fig. 14 may include a coding method selection unit 1410, a scaling unit 1430, an ISC coding unit 1440, a quantized component recovery unit 1450, and an inverse scaling unit 1460. Compared to fig. 10, the operation of each component is the same except that the zero encoding unit 1020 is omitted.
Fig. 15 illustrates the concept of ISC collection and encoding process according to an exemplary embodiment. First, the zero band (i.e., the band to be quantized to zero) is omitted. Next, a new buffer may be configured by using the ISCs selected from the spectral components existing in the non-zero band. The TCQ and the corresponding lossless coding may be performed on the newly configured ISCs in units of frequency bands.
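The collection step of fig. 15 can be sketched as below: zero bands are skipped entirely, and a new buffer is built from only the non-zero (selected) coefficients of the remaining bands. Returning the (band, index) positions alongside the buffer is an illustrative choice, standing in for the position information that is losslessly coded later.

```python
def collect_iscs(band_spectra, zero_band_flags):
    """Build the new quantization buffer: omit zero bands, then gather the
    non-zero spectral components (ISCs) of the non-zero bands, remembering
    each component's (band, index) position."""
    buffer, positions = [], []
    for b, (band, is_zero) in enumerate(zip(band_spectra, zero_band_flags)):
        if is_zero:
            continue
        for i, c in enumerate(band):
            if c != 0:
                buffer.append(c)
                positions.append((b, i))
    return buffer, positions
```

A band that is not flagged as zero but contains no non-zero coefficients simply contributes nothing to the buffer.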
Fig. 16 illustrates the concept of an ISC collection and encoding process according to another exemplary embodiment. First, the zero bands (i.e., bands to be quantized to zero) are omitted. Next, a new buffer may be configured by using the ISCs selected from the spectral components existing in the non-zero bands. USQ or TCQ and the corresponding lossless coding may be performed on the newly configured ISCs in units of frequency bands.
Fig. 17 illustrates a TCQ according to an exemplary embodiment, corresponding to an eight-state quad-coset trellis structure having two zero levels. A detailed description of the corresponding TCQ is disclosed in US Patent No. 7,605,725.
Fig. 18 is a block diagram illustrating a configuration of a frequency domain audio decoding apparatus according to an exemplary embodiment.
The frequency domain audio decoding apparatus 1800 shown in fig. 18 may include a frame error detection unit 1810, a frequency domain decoding unit 1830, a time domain decoding unit 1850, and a post-processing unit 1870. The frequency domain decoding unit 1830 may include a spectrum decoding unit 1831, a memory updating unit 1833, an inverse transform unit 1835, and an overlap and add (OLA) unit 1837. Each component may be integrated in at least one module and may be implemented by at least one processor (not shown).
Referring to fig. 18, the frame error detection unit 1810 may detect whether a frame error occurs from a received bitstream.
The frequency domain decoding unit 1830 may operate when the encoding mode is a music mode or a frequency domain mode, enable an FEC algorithm or a PLC algorithm when a frame error occurs, and generate a time domain signal through a general transform decoding process when no frame error occurs. In detail, the spectral decoding unit 1831 may synthesize spectral coefficients by performing spectral decoding using the decoded parameters. The spectrum decoding unit 1831 will be described in more detail with reference to fig. 19 and 20.
When the current frame is a normal frame, the memory updating unit 1833 may update, for use in the subsequent frame, the synthesized spectral coefficients of the current frame, information obtained using the decoded parameters, the number of consecutive error frames so far, the signal characteristics of each frame, frame type information, and the like. Here, the signal characteristics may include transient characteristics and static characteristics, and the frame type may include a transient frame, a static frame, or a harmonic frame.
The inverse transform unit 1835 may generate a time-domain signal by performing an inverse time-frequency transform on the synthesized spectral coefficients.
The OLA unit 1837 may generate a final time domain signal for the current frame as a result of the OLA processing by performing the OLA processing using the time domain signal of the previous frame and provide the final time domain signal to the post-processing unit 1870.
The time domain decoding unit 1850 may operate when the encoding mode is a voice mode or a time domain mode, enable an FEC or PLC algorithm when a frame error occurs, and generate a time domain signal through a general CELP decoding process when a frame error does not occur.
The post-processing unit 1870 may perform filtering or upsampling on the time-domain signal provided from the frequency-domain decoding unit 1830 or the time-domain decoding unit 1850, but is not limited thereto. The post-processing unit 1870 may provide the recovered audio signal as an output signal.
Fig. 19 is a block diagram illustrating a configuration of a spectrum decoding apparatus according to an exemplary embodiment. The apparatus illustrated in fig. 19 may correspond to the spectrum decoding unit 1831 of fig. 18, or may be included in another frequency domain decoding apparatus, or implemented independently.
The spectrum decoding apparatus 1900 shown in fig. 19 may include an energy decoding and dequantizing unit 1910, a bit allocator 1930, a spectrum decoding and dequantizing unit 1950, a noise filler 1970, and a spectrum shaping unit 1990. Here, the noise filler 1970 may be located at the rear end of the spectrum shaping unit 1990. Each component may be integrated in at least one module and may be implemented by at least one processor (not shown).
Referring to fig. 19, the energy decoding and dequantizing unit 1910 may losslessly decode an energy parameter, e.g., a norm value, on which lossless encoding was performed in the encoding process, and dequantize the decoded norm value. The dequantization may be performed using a scheme corresponding to the quantization scheme applied to the norm values in the encoding process.
The bit allocator 1930 may allocate the number of bits required for each subband based on the quantized norm value or the dequantized norm value. In this case, the number of bits allocated for each sub-band may be the same as the number of bits allocated in the encoding process.
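As an illustration of norm-driven allocation, the sketch below uses a greedy rule of thumb (roughly 6 dB of priority per allocated bit). This is an assumption for exposition only, not the codec's actual allocation rule; whatever rule is used, the decoder must reproduce the encoder's allocation bit-for-bit, which is why it runs on the same (dequantized) norm values.

```python
def allocate_bits(norms, total_bits, bits_per_step=1):
    """Greedy norm-driven allocation: repeatedly give one step of bits to
    the band with the highest remaining priority.  Illustrative only."""
    alloc = [0] * len(norms)
    priority = list(norms)                 # start from the dequantized norms
    for _ in range(total_bits // bits_per_step):
        b = max(range(len(norms)), key=lambda i: priority[i])
        alloc[b] += bits_per_step
        priority[b] -= 6.0                 # ~6 dB per bit rule of thumb
    return alloc

# Three bands with norms in dB-like units; 8 bits to distribute.
alloc = allocate_bits([30.0, 12.0, 3.0], total_bits=8)
```

Because the procedure is deterministic in its inputs, encoder and decoder agree on the result without transmitting the allocation itself.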
The spectral decoding and inverse quantization unit 1950 may generate normalized spectral coefficients by losslessly decoding the encoded spectral coefficients using the number of bits allocated for each subband and performing an inverse quantization process on the decoded spectral coefficients.
The noise filler 1970 may fill noise in a portion of each sub-band among the normalized spectral coefficients, which requires noise filling.
The spectral shaping unit 1990 may shape the normalized spectral coefficients by using the dequantized norm values. The finally decoded spectral coefficients may be obtained by a spectral shaping process.
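A minimal sketch of the shaping step, assuming per-band gains (the dequantized norms) and explicit band boundaries; both are hypothetical names introduced here for illustration:

```python
def shape_spectrum(norm_coeffs, norms, band_bounds):
    """Scale each band of the normalized spectrum by its dequantized norm
    to recover the finally decoded spectral coefficients."""
    out = list(norm_coeffs)
    for (start, end), g in zip(band_bounds, norms):
        for i in range(start, end):
            out[i] = norm_coeffs[i] * g
    return out

# Two bands of two samples each, with dequantized norms 2.0 and 4.0.
decoded = shape_spectrum([0.5, -1.0, 0.25, 0.0], [2.0, 4.0], [(0, 2), (2, 4)])
```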
Fig. 20 is a block diagram illustrating a configuration of a spectral dequantization apparatus according to an exemplary embodiment.
The apparatus shown in fig. 20 may include an inverse quantizer selection unit 2010, a USQ 2030, and a TCQ 2050.
In fig. 20, the inverse quantizer selection unit 2010 may select the most efficient inverse quantizer among various inverse quantizers according to the characteristics of an input signal (i.e., a signal to be inversely quantized). Bit allocation information, bit size information, etc. for each frequency band may be used as characteristics of the input signal. Depending on the result of the selection, the signal to be dequantized may be provided to one of USQ 2030 and TCQ 2050, such that a corresponding dequantization is performed.
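A sketch of such a selection, assuming the criterion is the average bit budget per spectral sample in the band; the 0.75 bits/sample threshold is purely illustrative, and the real criterion must mirror the encoder's selection exactly:

```python
def select_dequantizer(band_bits, band_size, threshold=0.75):
    """Pick USQ when the average bit budget per spectral sample is high
    (fine uniform steps are affordable), TCQ otherwise.  The threshold
    value is an assumption for illustration only."""
    avg_bits = band_bits / band_size
    return "USQ" if avg_bits >= threshold else "TCQ"

choice = select_dequantizer(band_bits=12, band_size=8)   # 1.5 bits/sample
```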
Fig. 21 is a block diagram illustrating a configuration of a spectrum decoding apparatus according to an exemplary embodiment. The apparatus shown in fig. 21 may correspond to the spectral decoding and inverse quantization unit 1950 of fig. 19, or may be included in another frequency domain decoding apparatus, or implemented independently.
The apparatus shown in fig. 21 may include a decoding method selection unit 2110, a zero decoding unit 2130, an ISC decoding unit 2150, a quantized component restoration unit 2170, and an inverse scaling unit 2190. Here, the quantized component restoring unit 2170 and the inverse scaling unit 2190 may be selectively provided.
In fig. 21, the decoding method selecting unit 2110 may select a decoding method based on bits allocated for each frequency band. The normalized spectrum may be provided to the zero decoding unit 2130 or the ISC decoding unit 2150 based on the decoding method selected for each frequency band.
The zero decoding unit 2130 may decode all samples to zeros for a frequency band where the allocated bits are zeros.
The ISC decoding unit 2150 may decode a frequency band in which the allocated bits are not zero by using the selected inverse quantizer. The ISC decoding unit 2150 may obtain information on the important frequency components of each band of the encoded spectrum and decode the obtained information based on the number, position, amplitude, and sign. The amplitudes of the important frequency components may be decoded by a scheme different from that used for the number, position, and sign. For example, the amplitudes of the important frequency components may be arithmetically decoded and inversely quantized by using one of USQ and TCQ, while the number, positions, and signs of the important frequency components may be arithmetically decoded. The selection of the inverse quantizer may be performed using the same result as in the ISC encoding unit 1040 illustrated in fig. 10. The ISC decoding unit 2150 may perform inverse quantization on a frequency band in which the allocated bits are not zero by using one of TCQ and USQ.
The quantized component restoring unit 2170 may restore an actual quantized component based on the position, amplitude, and sign information of the restored ISCs. Here, zeros may be assigned to zero positions (i.e., as non-quantized portions of the spectral coefficients that are decoded to zeros).
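The restoration step can be sketched as a scatter of signed magnitudes back into a zero-initialized spectrum; the positions, magnitudes, and signs below are illustrative values:

```python
def restore_components(length, positions, magnitudes, signs):
    """Scatter the decoded ISC magnitudes (with their signs) back to their
    spectral positions; every other position is a zero (non-quantized)
    sample, as in the zero-position assignment described above."""
    spectrum = [0] * length
    for p, m, s in zip(positions, magnitudes, signs):
        spectrum[p] = m if s >= 0 else -m
    return spectrum

restored = restore_components(8, [1, 4, 6], [2, 3, 1], [+1, -1, +1])
```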
The inverse scaling unit 2190 may inversely scale the restored quantized components to output quantized spectral coefficients at the same level as that of the normalized spectrum.
Fig. 22 is a block diagram illustrating a configuration of an ISC decoding apparatus according to an exemplary embodiment.
The apparatus shown in fig. 22 may include a pulse number estimation unit 2210 and an ISC information decoding unit 2230. The apparatus illustrated in fig. 22 may correspond to the ISC decoding unit of fig. 21, or may be implemented as a separate apparatus.
In fig. 22, the pulse number estimation unit 2210 may determine an estimated value of the number of pulses required for the current band by using the band size and the bit allocation information. That is, since the bit allocation information of the current frame is the same as that of the encoder, decoding can be performed to obtain the same estimated value of the number of pulses by using the same bit allocation information.
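As one illustration of how a pulse-count estimate can be derived deterministically from the band size and bit allocation alone, the sketch below uses PVQ-style codebook counting; this counting rule is an assumption for illustration, not necessarily the codec's own formula. What matters is that encoder and decoder run the identical procedure on identical inputs and therefore obtain the identical estimate.

```python
from functools import lru_cache
from math import log2

@lru_cache(maxsize=None)
def pvq_vectors(n, k):
    """Number of length-n integer vectors whose absolute values sum to k
    (counting signs) -- the classic PVQ codebook-size recursion."""
    if k == 0:
        return 1
    if n == 0:
        return 0
    return pvq_vectors(n - 1, k) + pvq_vectors(n, k - 1) + pvq_vectors(n - 1, k - 1)

def estimate_pulses(band_size, allocated_bits):
    """Largest pulse count whose codebook still fits the bit budget."""
    k = 0
    while log2(pvq_vectors(band_size, k + 1)) <= allocated_bits:
        k += 1
    return k

k = estimate_pulses(band_size=8, allocated_bits=10)
```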
The ISC information decoding unit 2230 may decode the ISC information (i.e., the number, position, amplitude, and sign information of the ISCs) based on the estimated number of pulses.
Fig. 23 is a block diagram illustrating a configuration of an ISC information decoding apparatus according to an exemplary embodiment.
The apparatus shown in fig. 23 may include a position information decoding unit 2310, an amplitude information decoding unit 2330, and a symbol decoding unit 2350.
In fig. 23, the position information decoding unit 2310 may restore the number and positions of the ISCs by decoding the index related to the position information included in the bitstream. The position information may be decoded using arithmetic decoding. The amplitude information decoding unit 2330 may arithmetically decode the index related to the amplitude information included in the bitstream and inversely quantize the decoded index by selecting one of TCQ and USQ. To improve the efficiency of arithmetic decoding, the non-zero position information and the number of ISCs may be used. The symbol decoding unit 2350 may restore the signs of the ISCs by decoding the index related to the sign information included in the bitstream. The sign information may be decoded using arithmetic decoding. According to an embodiment, the number of pulses required for a non-zero band may be estimated and used to decode the position, amplitude, or sign information.
Fig. 24 is a block diagram showing a configuration of a spectrum decoding apparatus according to another exemplary embodiment. The apparatus shown in fig. 24 may correspond to the spectral decoding and dequantizing unit 1950 of fig. 19, or may be included in another frequency domain decoding apparatus, or may be independently implemented.
The apparatus shown in fig. 24 may include an ISC decoding unit 2150, a quantized component restoring unit 2170, and an inverse scaling unit 2490. Compared to fig. 21, the operation of each element is the same except that the decoding method selection unit 2110 and the zero decoding unit 2130 are omitted, and the ISC decoding unit 2150 uses TCQ.
Fig. 25 is a block diagram illustrating a configuration of a spectrum decoding apparatus according to another exemplary embodiment. The apparatus shown in fig. 25 may correspond to the spectral decoding and inverse quantization unit 1950 of fig. 19, or may be included in another frequency domain decoding apparatus, or may be independently implemented.
The apparatus shown in fig. 25 may include a decoding method selection unit 2510, an ISC decoding unit 2550, a quantized component restoration unit 2570, and an inverse scaling unit 2590. Compared to fig. 21, the operation of each element is the same except that the zero decoding unit 2130 is omitted.
Fig. 26 is a block diagram illustrating a configuration of an ISC information encoding apparatus according to another exemplary embodiment.
The apparatus of fig. 26 may include a probability calculation unit 2610 and a lossless encoding unit 2630.
In fig. 26, the probability calculation unit 2610 may calculate a probability value for amplitude coding according to equation 8 and equation 9 by using the number of ISCs, the number of pulses, and the TCQ information.
In equations 8 and 9, the first quantity represents the number of ISCs remaining after encoding among the ISCs to be transmitted for each frequency band, and the second quantity denotes the number of pulses remaining after encoding among the pulses to be transmitted for each frequency band. M_s represents the set of amplitudes existing in trellis state S, and j represents the amplitude of the currently encoded pulse.
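Equations 8 and 9 themselves are not reproduced in this text. As a purely illustrative stand-in, the probability that the current pulse is the last one of its ISC can be modeled from the remaining counts alone:

```python
def last_pulse_probability(remaining_iscs, remaining_pulses):
    """Illustrative stand-in for the probabilities of equations 8-9 (not
    reproduced in this text): with n ISCs and m >= n pulses still to
    encode, model the chance that the current pulse terminates its ISC's
    amplitude run as n / m.  The codec's actual formula may differ."""
    assert 0 < remaining_iscs <= remaining_pulses
    p_last = remaining_iscs / remaining_pulses
    p_more = 1.0 - p_last
    return p_last, p_more

p_last, p_more = last_pulse_probability(remaining_iscs=3, remaining_pulses=12)
```

Probabilities of this kind feed the arithmetic coder: the sharper the model, the fewer bits the amplitude information costs.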
The lossless encoding unit 2630 may losslessly encode the TCQ amplitude information (i.e., the amplitude and path information) by using the obtained probability values. The number of pulses for each amplitude is encoded by using two probability values: one represents the probability of the last pulse of the previous amplitude, and the other represents the probability corresponding to pulses other than the last pulse. Finally, an index encoded by using the obtained probability values is output.
Fig. 27 is a block diagram of a configuration of an ISC information decoding apparatus according to another exemplary embodiment.
The apparatus of fig. 27 may include a probability calculation unit 2710 and a lossless decoding unit 2730.
In fig. 27, the probability calculation unit 2710 may calculate a probability value for amplitude decoding by using the ISC information (i.e., the number and positions of the ISCs), the TCQ information, the number of pulses, and the band size. For this, the required bit information b may be obtained by using the previously obtained number of pulses and band size. Thereafter, a probability value for amplitude decoding may be calculated based on equations 8 and 9 by using the obtained bit information b, the number of ISCs, the ISC positions, and the TCQ information.
The lossless decoding unit 2730 may losslessly decode the TCQ amplitude information (i.e., the amplitude and path information) by using the probability information obtained in the same manner as in the encoding apparatus together with the transmitted index information. For this, an arithmetic coding model for the number information is first obtained using the probability values, and the TCQ amplitude information is then arithmetically decoded by using the obtained model. In detail, the number of pulses for each amplitude is decoded by using two probability values: one represents the probability of the last pulse of the previous amplitude, and the other represents the probability corresponding to pulses other than the last pulse. Finally, the TCQ amplitude information (i.e., the amplitude information and path information) decoded by using the obtained probability values is output.
Fig. 28 is a block diagram of a multimedia device including an encoding module according to an exemplary embodiment.
Referring to fig. 28, the multimedia device 2800 may include a communication unit 2810 and an encoding module 2830. In addition, the multimedia device 2800 may further include a storage unit 2850 to store an audio bitstream obtained as a result of encoding according to the use of the audio bitstream. In addition, the multimedia device 2800 may also include a microphone 2870. That is, the storage unit 2850 and the microphone 2870 may be selectively included. The multimedia device 2800 may further include an arbitrary decoding module (not shown), for example, a decoding module for performing a general decoding function or a decoding module according to an exemplary embodiment. The encoding module 2830 may be implemented by at least one processor (not shown) integrated with other components (not shown) included in the multimedia device 2800.
The communication unit 2810 may receive at least one of an audio signal or an encoded bitstream provided from the outside or may transmit at least one of a reconstructed audio signal or an encoded bitstream obtained as a result of encoding in the encoding module 2830.
The communication unit 2810 is configured to transmit and receive data to and from an external multimedia device or server through a wireless network such as a wireless internet, a wireless intranet, a wireless phone network, a wireless Local Area Network (LAN), Wi-Fi direct (WFD), third generation (3G), fourth generation (4G), bluetooth, infrared data association (IrDA), Radio Frequency Identification (RFID), Ultra Wideband (UWB), Zigbee, or Near Field Communication (NFC), or a wired network such as a wired phone network or a wired internet.
According to an exemplary embodiment, the encoding module 2830 may select ISCs in units of frequency bands from the normalized spectrum and encode information on the selected important spectral components of each frequency band based on the number, position, amplitude, and sign. The amplitudes of the important spectral components may be encoded by a scheme different from that used for the number, position, and sign. For example, the amplitudes of the important spectral components may be quantized and arithmetically encoded by using one selected from USQ and TCQ, and the number, positions, and signs of the important spectral components may be arithmetically encoded. According to an exemplary embodiment, the encoding module 2830 may scale the normalized spectrum based on the bit allocation of each band and select the ISCs from the scaled spectrum.
The storage unit 2850 may store the encoded bitstream generated by the encoding module 2830. In addition, the storage unit 2850 may store various programs required to operate the multimedia device 2800.
The microphone 2870 may provide an audio signal from a user or the outside to the encoding module 2830.
Fig. 29 is a block diagram of a multimedia device including a decoding module according to an exemplary embodiment.
Referring to fig. 29, the multimedia device 2900 may include a communication unit 2910 and a decoding module 2930. In addition, according to the use of the reconstructed audio signal obtained as a result of the decoding, the multimedia device 2900 may further include a storage unit 2950 for storing the reconstructed audio signal. In addition, the multimedia device 2900 may also include speakers 2970. That is, the storage unit 2950 and the speaker 2970 may also be selectively included. The multimedia device 2900 may also include an encoding module (not shown), such as an encoding module for performing general encoding functions or an encoding module according to an example embodiment. The decoding module 2930 may be implemented by at least one processor (not shown) integrated with other components (not shown) included in the multimedia device 2900.
The communication unit 2910 may receive at least one of an audio signal or an encoded bitstream provided from the outside, or may transmit at least one of a reconstructed audio signal obtained as a result of decoding in the decoding module 2930 or an audio bitstream obtained as a result of encoding. The communication unit 2910 may be implemented substantially similarly to the communication unit 2810 of fig. 28.
According to an exemplary embodiment, the decoding module 2930 may receive a bitstream provided through the communication unit 2910, obtain information of important spectral components in units of frequency bands with respect to an encoded spectrum, and decode the obtained information of the important spectral components based on the number, position, amplitude, and sign. The amplitudes of the significant spectral components may be decoded in a different scheme than the scheme used to decode the number, location and symbols. For example, the magnitudes of the important spectral components may be arithmetically decoded and dequantized by using one selected from USQ and TCQ, and the arithmetic decoding may be performed on the number, positions, and signs of the important spectral components.
The storage unit 2950 may store the reconstructed audio signal generated by the decoding module 2930. In addition, the storage unit 2950 may store various programs required to operate the multimedia device 2900.
The speaker 2970 may output the reconstructed audio signal generated by the decoding module 2930 to the outside.
Fig. 30 is a block diagram of a multimedia device including an encoding module and a decoding module according to an exemplary embodiment.
Referring to fig. 30, the multimedia device 3000 may include a communication unit 3010, an encoding module 3020, and a decoding module 3030. In addition, the multimedia device 3000 may further include a storage unit 2850 to store an audio bitstream obtained as a result of encoding or a reconstructed audio signal obtained as a result of decoding according to audio bitstream usage. Further, the multimedia device 3000 may also include a microphone 3050 and/or a speaker 3060. The encoding module 3020 and the decoding module 3030 may be implemented by at least one processor (not shown) integrated with other components (not shown) included in the multimedia device 3000.
Since components of the multimedia device 3000 shown in fig. 30 correspond to components of the multimedia device 2800 shown in fig. 28 or components of the multimedia device 2900 shown in fig. 29, detailed descriptions thereof are omitted.
Each of the multimedia devices 2800, 2900, and 3000 shown in fig. 28, 29, and 30 may include a voice communication-dedicated terminal such as a telephone or a mobile phone, a broadcast or music-dedicated device such as a TV or MP3 player, or a hybrid terminal device of a voice communication-dedicated terminal and a broadcast or music-dedicated device, but is not limited thereto. Further, each of the multimedia devices 2800, 2900, and 3000 may function as a client, a server, or a transducer disposed between a client and a server.
When the multimedia devices 2800, 2900, and 3000 are, for example, mobile phones, although not shown, the multimedia devices 2800, 2900, and 3000 may further include a user input unit such as a keypad, a display unit for displaying information processed through a user interface or the mobile phones, and a processor for controlling the functions of the mobile phones. In addition, the mobile phone may further include a camera unit having an image pickup function and at least one component for performing a function required for the mobile phone.
When the multimedia devices 2800, 2900, and 3000 are, for example, TVs, although not shown, the multimedia devices 2800, 2900, and 3000 may further include a user input unit such as a keyboard, a display unit for displaying received broadcast information, and a processor for controlling all functions of the TVs. Further, the TV may further include at least one component for performing a function of the TV.
Fig. 31 is a flowchart illustrating an operation of a method of encoding a spectral refinement structure according to an exemplary embodiment.
Referring to fig. 31, in operation 3110, an encoding method may be selected. For this, information on each frequency band and bit allocation information may be used. Here, the encoding method may include a quantization scheme.
In operation 3130, it is determined whether the current frequency band is a frequency band in which the allocated bits are zero (i.e., a zero band). If the current frequency band is a zero band, the method proceeds to operation 3150; otherwise, if the current frequency band is a non-zero band, the method proceeds to operation 3170.
In operation 3150, all samples in the zero band may be encoded as zeros.
In operation 3170, a frequency band that is a non-zero band may be encoded based on the selected quantization scheme. According to an embodiment, the final number of pulses may be determined by estimating the number of pulses for each frequency band using the band length and the bit allocation information, determining the number of non-zero positions, and estimating the number of bits required for the non-zero positions. Next, an initial scaling factor may be determined based on the number of pulses of each frequency band and the absolute values of the input signal, and the scaling factor may be updated through a scaling and pulse redistribution process based on the initial scaling factor. The spectral coefficients are scaled using the finally updated scaling factor, suitable ISCs may be selected by using the scaled spectral coefficients, and the spectral components to be quantized may be selected based on the bit allocation information of each band. Next, the amplitudes of the collected ISCs may be quantized and arithmetically coded by a joint USQ and TCQ scheme. Here, to improve the efficiency of arithmetic coding, the number of non-zero positions and the number of ISCs may be used. The joint USQ and TCQ scheme may include a first joint scheme and a second joint scheme according to bandwidth. The first joint scheme, which enables the quantizer to be selected by using a secondary bit allocation process for redundant bits from a previous band, may be used for NB and WB; the second joint scheme, in which TCQ is used for the LSBs and USQ for the other bits of a band determined to use USQ, may be used for SWB and FB. The sign information of the selected ISCs may be arithmetically coded with the same probability for positive and negative signs.
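The scaling-and-redistribution step described above can be sketched as follows. This is a simplified Python illustration: the codec's exact error metric, rounding, and redistribution order may differ, but the shape of the procedure — derive an initial scaling factor from the pulse budget and the absolute sum of the input, round, then nudge single pulses until the total matches — is as described.

```python
def scale_and_quantize(coeffs, target_pulses):
    """Scale spectral coefficients to a target pulse count, then
    redistribute single pulses until the count matches exactly."""
    abs_sum = sum(abs(c) for c in coeffs) or 1.0
    scale = target_pulses / abs_sum          # initial scaling factor
    scaled = [round(c * scale) for c in coeffs]
    while sum(abs(s) for s in scaled) < target_pulses:
        # add a pulse where the magnitude was most under-quantized
        i = max(range(len(coeffs)),
                key=lambda j: abs(coeffs[j]) * scale - abs(scaled[j]))
        scaled[i] += 1 if coeffs[i] >= 0 else -1
    while sum(abs(s) for s in scaled) > target_pulses:
        # remove a pulse where the magnitude was most over-quantized
        i = max((j for j in range(len(coeffs)) if scaled[j] != 0),
                key=lambda j: abs(scaled[j]) - abs(coeffs[j]) * scale)
        scaled[i] += -1 if scaled[i] > 0 else 1
    return scaled, scale

scaled, scale = scale_and_quantize([0.9, -0.3, 0.1, 0.0], target_pulses=4)
```

The returned scaling factor is what the decoder-side inverse scaling must undo to bring the quantized components back to the level of the normalized spectrum.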
After operation 3170, an operation of restoring the quantized components and an operation of inversely scaling the frequency bands may be further included. To recover the actual quantized components, position, sign and amplitude information may be added to the quantized components. Zeros may be assigned to zero positions. The inverse scaling factor may be extracted using the same scaling factor as used for scaling, and the recovered actual quantized component may be inversely scaled. The inversely scaled signal may have the same level as the level of the normalized spectrum (i.e., the input signal).
The operation of each component of the above-described encoding apparatus may be further added to the operation of fig. 31 according to circumstances.
Fig. 32 is a flowchart illustrating an operation of a method of decoding a spectrum refinement structure according to an exemplary embodiment. According to the operation of fig. 32, in order to dequantize the refinement structure of the normalized spectrum, the ISCs of each frequency band and information on the selected ISCs may be decoded based on the position, number, sign, and magnitude. Here, the amplitude information may be decoded through arithmetic decoding and a joint USQ and TCQ scheme, and the position, number, and sign information may be decoded through arithmetic decoding.
In detail, referring to fig. 32, in operation 3210, a decoding method may be selected. For this, information on each frequency band and bit allocation information may be used. Here, the decoding method may include an inverse quantization scheme. The inverse quantization scheme may be selected through the same process as the quantization scheme selection applied to the above-described encoding apparatus.
At operation 3230, it is determined whether the current frequency band is a frequency band in which bits are allocated to zero (i.e., a zero frequency band), and if the current frequency band is the zero frequency band, the method proceeds to operation 3250, otherwise, if the current frequency band is a non-zero frequency band, the method proceeds to operation 3270.
In operation 3250, all samples in the zero band may be decoded to zeros.
In operation 3270, a frequency band that is a non-zero band may be decoded based on the selected inverse quantization scheme. According to an embodiment, the number of pulses for each frequency band may be estimated or determined by using the band length and the bit allocation information. This may be performed by the same process as the scaling applied to the above-described encoding apparatus. Next, the position information of the ISCs (i.e., the number and positions of the ISCs) may be restored. This may be handled similarly to the above-described encoding apparatus, and the same probability values may be used for proper decoding. Next, the magnitudes of the collected ISCs may be decoded by arithmetic decoding and inversely quantized by a joint USQ and TCQ scheme. Here, the number of non-zero positions and the number of ISCs may be used for arithmetic decoding. The joint USQ and TCQ scheme may include a first joint scheme and a second joint scheme according to bandwidth. The first joint scheme, which enables the inverse quantizer to be selected by additionally using a secondary bit allocation process for redundant bits from a previous band, may be used for NB and WB; the second joint scheme, in which TCQ is used for the LSBs and USQ for the other bits of a band determined to use USQ, may be used for SWB and FB. The sign information of the selected ISCs may be arithmetically decoded with the same probability for positive and negative signs.
After operation 3270, an operation of restoring the quantized components and an operation of inversely scaling the frequency bands may be further included. To restore the actual quantized components, position, sign, and amplitude information may be added to the quantized components. Frequency bands having no data to be transmitted may be padded with zeros. Next, the number of pulses in a non-zero band may be estimated, and position information including the number and positions of the ISCs may be decoded based on the estimated number of pulses. The amplitude information may be decoded by lossless decoding and a joint USQ and TCQ scheme. For non-zero amplitude values, the signs and quantized components may be finally restored. For the restored actual quantized components, inverse scaling may be performed using the transmitted norm information.
The operation of each component of the above-described decoding apparatus may be further added to the operation of fig. 32 according to circumstances.
The above-described exemplary embodiments can be written as computer-executable programs and can be implemented in general-use digital computers that execute the programs by using a non-transitory computer-readable recording medium. In addition, a data structure, program instructions, or data files that may be used in embodiments may be recorded on a non-transitory computer-readable recording medium in various ways. The non-transitory computer-readable recording medium is any data storage device that can store data which can be thereafter read by a computer system. Examples of the non-transitory computer-readable recording medium include magnetic media (such as hard disks, floppy disks, and magnetic tapes), optical recording media (such as CD-ROMs and DVDs), magneto-optical media (such as floptical disks), and hardware devices (such as ROMs, RAMs, and flash memories) specially configured to store and execute program instructions. Further, the non-transitory computer-readable recording medium may be a transmission medium for transmitting a signal specifying the program instructions, the data structures, and the like. Examples of the program instructions may include not only machine language code created by a compiler, but also high-level language code that may be executed by a computer using an interpreter or the like.
While exemplary embodiments have been particularly shown and described, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the inventive concept as defined by the following claims. It should be understood that the exemplary embodiments described herein should be considered in a descriptive sense only and not for purposes of limitation. The description of features or aspects within each exemplary embodiment should generally be considered as applicable to other similar features or aspects in other exemplary embodiments.

Claims (5)

1. A method of spectral coding, comprising:
selecting an encoding method of a frequency band based on bit allocation information of the frequency band;
encoding spectral components in the frequency band to zeros if the selected encoding method of the frequency band is a zero encoding method;
if the selected encoding method of the frequency band is not a zero encoding method, the amplitudes of the spectral components in the frequency band are encoded by using one of Uniform Scalar Quantization (USQ) and trellis encoding quantization (TCQ) based on the average number of bits allocated to the spectral components of the frequency band.
2. The spectral coding method of claim 1, further comprising: encoding information about the spectral components if the selected encoding method is not a zero encoding method, wherein the information about the spectral components includes the number, positions, amplitudes, and signs of the spectral components.
3. The spectral coding method of claim 2, wherein the amplitudes of the spectral components are encoded according to a scheme different from the scheme used to encode the number, positions, and signs of the spectral components.
4. The spectral encoding method of claim 1, wherein the encoding of the amplitudes of the spectral components comprises: scaling a normalized spectrum based on the bits allocated to the frequency band,
wherein the spectral components are selected from the scaled spectrum.
5. A method of spectral decoding, comprising:
selecting a decoding method of a frequency band based on bit allocation information of the frequency band;
decoding spectral components in the frequency band to zeros if the selected decoding method of the frequency band is a zero decoding method;
if the selected decoding method of the frequency band is not a zero decoding method, decoding the amplitudes of the spectral components in the frequency band by using one of Uniform Scalar Quantization (USQ) and Trellis Coded Quantization (TCQ) based on an average number of bits allocated to the spectral components of the frequency band.
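As an editorial illustration (not part of the patent text), the band-wise method selection recited in claims 1 and 5 can be sketched as follows. The decision threshold, the dictionary result shape, and the helper names are assumptions for illustration only; the claims do not specify them.

```python
def select_encoding_method(bits_allocated: int) -> str:
    """Select a band's encoding method from its bit allocation.

    A band with no allocated bits uses the zero encoding method;
    any other band quantizes its spectral component amplitudes.
    """
    return "zero" if bits_allocated == 0 else "quantize"


def encode_band(spectrum, bits_allocated, usq_threshold=0.75):
    """Encode one frequency band's spectral components.

    Under the zero encoding method, every component is coded as zero.
    Otherwise the average number of bits per spectral component chooses
    between uniform scalar quantization (USQ) and trellis coded
    quantization (TCQ). The 0.75-bit threshold is an illustrative
    assumption, not a value taken from the patent.
    """
    if select_encoding_method(bits_allocated) == "zero":
        return {"method": "zero", "components": [0] * len(spectrum)}
    avg_bits = bits_allocated / len(spectrum)
    method = "USQ" if avg_bits >= usq_threshold else "TCQ"
    return {"method": method, "avg_bits": avg_bits}
```

For example, a 10-component band with 2 allocated bits averages 0.2 bits per component and would fall to TCQ under this sketch, while the same band with 10 allocated bits would use USQ.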
CN201580020096.0A 2014-02-17 2015-02-17 Coding method and equipment and signal decoding method and equipment Active CN106233112B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910495957.0A CN110176241B (en) 2014-02-17 2015-02-17 Signal encoding method and apparatus, and signal decoding method and apparatus

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201461940798P 2014-02-17 2014-02-17
US61/940,798 2014-02-17
PCT/KR2015/001668 WO2015122752A1 (en) 2014-02-17 2015-02-17 Signal encoding method and apparatus, and signal decoding method and apparatus

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201910495957.0A Division CN110176241B (en) 2014-02-17 2015-02-17 Signal encoding method and apparatus, and signal decoding method and apparatus

Publications (2)

Publication Number Publication Date
CN106233112A CN106233112A (en) 2016-12-14
CN106233112B true CN106233112B (en) 2019-06-28

Family

ID=57257234

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201910495957.0A Active CN110176241B (en) 2014-02-17 2015-02-17 Signal encoding method and apparatus, and signal decoding method and apparatus
CN201580020096.0A Active CN106233112B (en) 2014-02-17 2015-02-17 Coding method and equipment and signal decoding method and equipment

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201910495957.0A Active CN110176241B (en) 2014-02-17 2015-02-17 Signal encoding method and apparatus, and signal decoding method and apparatus

Country Status (4)

Country Link
EP (1) EP3109611A4 (en)
JP (1) JP6633547B2 (en)
KR (3) KR102625143B1 (en)
CN (2) CN110176241B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110634495B (en) 2013-09-16 2023-07-07 三星电子株式会社 Signal encoding method and device and signal decoding method and device
KR102625143B1 (en) * 2014-02-17 2024-01-15 삼성전자주식회사 Signal encoding method and apparatus, and signal decoding method and apparatus
EP4293666A3 (en) 2014-07-28 2024-03-06 Samsung Electronics Co., Ltd. Signal encoding method and apparatus and signal decoding method and apparatus
WO2019091576A1 (en) * 2017-11-10 2019-05-16 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio encoders, audio decoders, methods and computer programs adapting an encoding and decoding of least significant bits
JP7173134B2 (en) 2018-04-13 2022-11-16 日本電信電話株式会社 Encoding device, decoding device, encoding method, decoding method, program, and recording medium
CN110992963B (en) * 2019-12-10 2023-09-29 腾讯科技(深圳)有限公司 Network communication method, device, computer equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001050613A1 (en) * 2000-01-05 2001-07-12 Motorola, Inc. Multi-rate, channel-optimized trellis-coded quantization
US6847684B1 (en) * 2000-06-01 2005-01-25 Hewlett-Packard Development Company, L.P. Zero-block encoding
CN1905010A (en) * 2005-07-29 2007-01-31 索尼株式会社 Apparatus and method for encoding audio data, and apparatus and method for decoding audio data
CN103106902A (en) * 2005-07-15 2013-05-15 三星电子株式会社 Low bit-rate audio signal coding and/or decoding method

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5369724A (en) * 1992-01-17 1994-11-29 Massachusetts Institute Of Technology Method and apparatus for encoding, decoding and compression of audio-type data using reference coefficients located within a band of coefficients
JP3685823B2 (en) * 1993-09-28 2005-08-24 ソニー株式会社 Signal encoding method and apparatus, and signal decoding method and apparatus
WO2002091363A1 (en) * 2001-05-08 2002-11-14 Koninklijke Philips Electronics N.V. Audio coding
EP2752843A1 (en) * 2004-11-05 2014-07-09 Panasonic Corporation Encoder, decoder, encoding method, and decoding method
US8527265B2 (en) * 2007-10-22 2013-09-03 Qualcomm Incorporated Low-complexity encoding/decoding of quantized MDCT spectrum in scalable speech and audio codecs
ATE518224T1 (en) 2008-01-04 2011-08-15 Dolby Int Ab AUDIO ENCODERS AND DECODERS
AU2012276367B2 (en) * 2011-06-30 2016-02-04 Samsung Electronics Co., Ltd. Apparatus and method for generating bandwidth extension signal
US9472199B2 (en) * 2011-09-28 2016-10-18 Lg Electronics Inc. Voice signal encoding method, voice signal decoding method, and apparatus using same
EP2772909B1 (en) * 2011-10-27 2018-02-21 LG Electronics Inc. Method for encoding voice signal
EP2830062B1 (en) 2012-03-21 2019-11-20 Samsung Electronics Co., Ltd. Method and apparatus for high-frequency encoding/decoding for bandwidth extension
KR102625143B1 (en) * 2014-02-17 2024-01-15 삼성전자주식회사 Signal encoding method and apparatus, and signal decoding method and apparatus
EP4293666A3 (en) * 2014-07-28 2024-03-06 Samsung Electronics Co., Ltd. Signal encoding method and apparatus and signal decoding method and apparatus


Also Published As

Publication number Publication date
JP6633547B2 (en) 2020-01-22
CN110176241B (en) 2023-10-31
KR20240008413A (en) 2024-01-18
EP3109611A1 (en) 2016-12-28
KR102386738B1 (en) 2022-04-14
KR20160122160A (en) 2016-10-21
KR102625143B1 (en) 2024-01-15
CN106233112A (en) 2016-12-14
CN110176241A (en) 2019-08-27
EP3109611A4 (en) 2017-08-30
JP2017506771A (en) 2017-03-09
KR20220051028A (en) 2022-04-25

Similar Documents

Publication Publication Date Title
US11616954B2 (en) Signal encoding method and apparatus and signal decoding method and apparatus
US10194151B2 (en) Signal encoding method and apparatus and signal decoding method and apparatus
US11705142B2 (en) Signal encoding method and device and signal decoding method and device
CN106233112B (en) Coding method and equipment and signal decoding method and equipment
KR102386737B1 (en) Signal encoding method and apparatus and signal decoding method and apparatus
US10902860B2 (en) Signal encoding method and apparatus, and signal decoding method and apparatus

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant