WO2015133795A1 - High frequency decoding method and apparatus for bandwidth extension - Google Patents

High frequency decoding method and apparatus for bandwidth extension

Info

Publication number
WO2015133795A1
Authority
WO
WIPO (PCT)
Prior art keywords
low frequency
spectrum
frequency spectrum
excitation
high frequency
Prior art date
Application number
PCT/KR2015/002045
Other languages
English (en)
Korean (ko)
Inventor
주기현
오은미
황선호
Original Assignee
삼성전자 주식회사
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 삼성전자 주식회사
Priority to CN202010101660.4A (CN111312277B)
Priority to JP2016555511A (JP6383000B2)
Priority to US15/123,897 (US10410645B2)
Priority to EP15759308.8A (EP3115991A4)
Priority to CN201580022645.8A (CN106463143B)
Priority to CN202010101692.4A (CN111312278B)
Publication of WO2015133795A1
Priority to US16/538,427 (US10803878B2)
Priority to US17/030,104 (US11676614B2)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/012 Comfort noise or silence coding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/167 Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/038 Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques

Definitions

  • the present invention relates to audio encoding and decoding, and more particularly, to a high frequency decoding method and apparatus for bandwidth extension.
  • the coding scheme of G.719 was developed and standardized for teleconferencing, and performs frequency domain transformation using the Modified Discrete Cosine Transform (MDCT).
  • MDCT Modified Discrete Cosine Transform
  • Non-stationary frames are modified to account for temporal characteristics by changing the time domain aliasing order.
  • the spectrum obtained for the non-stationary frame may be interleaved so that it is configured similarly to that of a stationary frame, which allows the codec to process both frame types in the same framework.
  • Quantization is performed after the spectrum configured as described above is normalized by its energy. In general, the energy is expressed as an RMS value. The number of bits necessary for each band is determined for the normalized spectrum through energy-based bit allocation, and a bitstream is generated through quantization and lossless coding based on the bit allocation information for each band.
  • the decoder reverses the coding scheme: it dequantizes the energy from the bitstream, generates bit allocation information based on the dequantized energy, and dequantizes the spectrum to generate a normalized dequantized spectrum.
  • a noise filling method is applied in which a noise codebook is generated based on the low frequency dequantized spectrum and noise is generated in accordance with the transmitted noise level.
  • a bandwidth extension technique for generating a high frequency signal by folding a low frequency signal for a band above a specific frequency is applied.
  • An object of the present invention is to provide a high frequency decoding method and apparatus for bandwidth extension that can improve reconstructed sound quality, and a multimedia device employing the same.
  • a high frequency decoding method for bandwidth extension comprises: decoding an excitation class; modifying the decoded low frequency spectrum based on the excitation class; and generating a high frequency excitation spectrum based on the modified low frequency spectrum.
  • a high frequency decoding apparatus for bandwidth extension may include at least one processor that decodes an excitation class, modifies the decoded low frequency spectrum based on the excitation class, and generates a high frequency excitation spectrum based on the modified low frequency spectrum.
  • the high frequency excitation spectrum is generated by modifying the restored low frequency spectrum, thereby improving the reconstructed sound quality without excessively increasing the complexity.
  • FIG. 1 is a diagram illustrating an example of a subband configuration of a low frequency band and a high frequency band according to an embodiment.
  • FIGS. 2A to 2C are diagrams illustrating the R0 and R1 bands divided into R2, R3, R4, and R5 according to a selected coding scheme, according to an embodiment.
  • FIG. 3 is a diagram illustrating an example of a subband configuration of a high frequency band according to an embodiment.
  • FIG. 4 is a block diagram illustrating a configuration of an audio encoding apparatus according to an embodiment.
  • FIG. 5 is a block diagram illustrating a configuration of a BWE parameter generator according to an embodiment.
  • FIG. 6 is a block diagram illustrating a configuration of an audio decoding apparatus according to an embodiment.
  • FIG. 7 is a block diagram illustrating a configuration of a high frequency decoding apparatus according to an embodiment.
  • FIG. 8 is a block diagram illustrating a configuration of a low frequency spectrum modifying unit according to an exemplary embodiment.
  • FIG. 9 is a block diagram illustrating a configuration of a low frequency spectrum modifying unit according to another exemplary embodiment.
  • FIG. 10 is a block diagram illustrating a configuration of a low frequency spectrum modifying unit according to another exemplary embodiment.
  • FIG. 11 is a block diagram illustrating a configuration of a low frequency spectrum modifying unit according to another embodiment.
  • FIG. 12 is a block diagram illustrating a configuration of a dynamic range controller according to an embodiment.
  • FIG. 13 is a block diagram illustrating a configuration of a high frequency excitation spectrum generator according to an exemplary embodiment.
  • FIG. 14 is a diagram for explaining a smoothing process on a weight at a band boundary.
  • FIG. 15 is a diagram illustrating a weight that is a contribution used to reconstruct a spectrum existing in an overlapping region according to an embodiment.
  • FIG. 16 is a block diagram illustrating a configuration of a multimedia apparatus including a decoding module according to an embodiment.
  • FIG. 17 is a block diagram illustrating a configuration of a multimedia apparatus including an encoding module and a decoding module, according to an embodiment.
  • FIG. 18 is a flowchart illustrating an operation of a high frequency decoding method according to an embodiment.
  • FIG. 19 is a flowchart illustrating an operation of a method for modifying a low frequency spectrum according to an embodiment.
  • first and second may be used to describe various components, but the components are not limited by the terms. The terms are only used to distinguish one component from another.
  • the sampling rate is 32 kHz
  • 640 MDCT spectral coefficients are configured as 22 bands, specifically, 17 bands for the low frequency band and 5 bands for the high frequency band.
  • the start frequency of the high frequency band is the 241st spectral coefficient
  • the spectral coefficients from 0 to 240 may be defined as R0 as a region coded by a low frequency coding scheme, that is, a core coding scheme.
  • the spectral coefficients from 241 to 639 may be defined as R1 as a high frequency band through which bandwidth extension (BWE) is performed.
  • BWE bandwidth extension
  • a band coded by a low frequency coding scheme may also exist in the R1 region according to bit allocation information.
  • FIG. 2A to 2C are diagrams illustrating R0 and R1 regions of FIG. 1 divided into R2, R3, R4, and R5 according to a selected coding scheme.
  • the R1 region which is a BWE region
  • the R0 region which is a low frequency coding region
  • R2 represents a band including a signal that is quantized and lossless coded by a low frequency coding scheme, for example, a frequency domain coding scheme
  • R3 represents a band without a signal coded by the low frequency coding scheme.
  • R4 denotes a band in which the low frequency signal is not coded because there is no bit margin, or to which bits are allocated but noise is added because the bit margin is insufficient. Therefore, R4 and R5 may be distinguished by whether noise is added, which may be determined by the ratio of the number of low frequency coded spectral coefficients in the band, or based on in-band pulse allocation information when FPC is used. Since the R4 and R5 bands can be distinguished when noise is added in the decoding process, they may not be clearly distinguished in the encoding process.
  • the R2 to R5 bands not only have different information to be encoded, but may have different decoding schemes.
  • two bands from 170 to 240 in the low frequency coding region R0 are R4, to which noise is added, and two bands in the BWE region R1, 241 to 350 and 427 to 639, are R2, coded with the low frequency coding scheme.
  • one band from 202 to 240 in the low frequency coding region R0 is R4, to which noise is added, and all five bands from 241 to 639 in the BWE region R1 are R2, coded with the low frequency coding scheme.
  • three bands from 144 to 240 in the low frequency coding region R0 are R4, to which noise is added, and no R2 is present in the BWE region R1.
  • In the low frequency coding region R0, R4 is generally located in the high frequency portion, whereas in the BWE region R1, R2 is not limited to a specific frequency portion.
  • FIG. 3 is a diagram for explaining an example of a subband configuration of a high frequency band of a wide band (WB) according to one embodiment.
  • the sampling rate is 32 kHz
  • 640 MDCT spectral coefficients may be configured with 14 bands for the mid-high frequency band.
  • 100 Hz contains four spectral coefficients, so the first band of 400 Hz may contain 16 spectral coefficients.
  • Reference numeral 310 denotes a subband configuration for a high frequency band of 6.4 to 14.4 kHz
  • reference numeral 330 denotes a subband configuration for a high frequency band of 8.0 to 16.0 kHz.
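The relation between spectral coefficients and frequency stated above can be checked with a short sketch. It assumes that the 640 MDCT coefficients span 0 Hz up to the Nyquist frequency (half of the 32 kHz sampling rate); the helper names `coeffs_in` and `coeff_index` are hypothetical and are not taken from the text.

```python
SAMPLING_RATE_HZ = 32_000
NUM_COEFFS = 640  # MDCT spectral coefficients per frame

# Assumption: the coefficients cover 0 Hz up to the Nyquist frequency.
HZ_PER_COEFF = (SAMPLING_RATE_HZ / 2) / NUM_COEFFS


def coeffs_in(bandwidth_hz: float) -> int:
    """Number of spectral coefficients covering a band of the given width."""
    return round(bandwidth_hz / HZ_PER_COEFF)


def coeff_index(freq_hz: float) -> int:
    """Index of the spectral coefficient located at the given frequency."""
    return round(freq_hz / HZ_PER_COEFF)


if __name__ == "__main__":
    print(HZ_PER_COEFF, coeffs_in(100), coeffs_in(400))
```

Under this assumption each coefficient covers 25 Hz, which agrees with the statement that 100 Hz contains four spectral coefficients and a 400 Hz band contains 16.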
  • FIG. 4 is a block diagram illustrating a configuration of an audio encoding apparatus according to an embodiment.
  • the audio encoding apparatus shown in FIG. 4 may include a BWE parameter generator 410, a low frequency encoder 430, a high frequency encoder 450, and a multiplexer 470. Each component may be integrated into at least one module and implemented as at least one processor (not shown).
  • the input signal may be music, voice, or a mixed signal of music and voice, and may be broadly divided into a voice signal and other general signals.
  • hereinafter, these are collectively referred to as an audio signal.
  • the BWE parameter generator 410 may generate a BWE parameter for bandwidth extension.
  • the BWE parameter may correspond to an excitation class.
  • the BWE parameter may include parameters different from the excitation class.
  • the BWE parameter generator 410 may generate an excitation class based on signal characteristics on a frame basis. Specifically, it may be determined whether the input signal has a voice characteristic or a tonal characteristic, and one of the plurality of excitation classes may be determined based on the determination result.
  • the plurality of excitation classes may include excitation classes related to voice, excitation classes related to tonal music, and excitation classes related to non-tonal music.
  • the determined excitation class may be included in the bitstream and transmitted.
  • the low frequency encoder 430 may generate an encoded spectral coefficient by performing encoding on the low band signal. Also, the low frequency encoder 430 may encode information related to energy of the low band signal. According to an embodiment, the low frequency encoder 430 may convert the low band signal into the frequency domain to generate a low frequency spectrum, and quantize the low frequency spectrum to generate quantized spectral coefficients. Modified Discrete Cosine Transform (MDCT) may be used for domain transformation, but is not limited thereto. PVQ (Pyramid Vector Quantization) may be used for quantization, but is not limited thereto.
  • PVQ Pyramid Vector Quantization
  • the high frequency encoder 450 may perform encoding on the high band signal to generate a parameter for bandwidth extension or a parameter for bit allocation in the decoder.
  • Parameters required for bandwidth extension may include information related to energy of the high band signal and additional information.
  • energy may be expressed as an envelope, scale factor, average power or Norm.
  • the additional information is information about a band including an important frequency component in a high band, and may be information related to a frequency component included in a specific high frequency band.
  • the high frequency encoder 450 may generate a high frequency spectrum by converting a high band signal into a frequency domain, and may quantize information related to energy of the high frequency spectrum. MDCT may be used for domain conversion, but is not limited thereto.
  • Vector quantization may be used for quantization, but is not limited thereto.
  • the multiplexer 470 may generate a bitstream including a BWE parameter, that is, an excitation class, a parameter for bandwidth extension or a parameter for bit allocation, and a coded spectral coefficient of a low band.
  • the bitstream can be transmitted or stored.
  • the frequency domain BWE scheme may be applied in combination with a time domain coding part.
  • the CELP scheme may be mainly used for time domain coding; in that case the low band may be coded with the CELP scheme and combined with a BWE scheme in the time domain rather than with BWE in the frequency domain.
  • the coding scheme can be selectively applied based on the adaptive coding scheme determination between the time domain coding and the frequency domain coding as a whole.
  • Signal classification is required in order to select an appropriate coding scheme, and according to an embodiment, the excitation class for each frame may be determined by using the signal classification result first.
  • FIG. 5 is a block diagram illustrating a configuration of the BWE parameter generator 410 of FIG. 4, and may include a signal classifier 510 and an excitation class generator 530.
  • the signal classifier 510 may analyze signal characteristics on a frame basis to classify whether a current frame is a voice signal and determine an excitation class according to the classification result.
  • Signal classification processing can be performed using various known methods, for example, short-term and / or long-term characteristics.
  • the short term characteristic and / or long term characteristic may be a frequency domain characteristic or a time domain characteristic.
  • a method of allocating a fixed type of excitation class may help to improve sound quality, rather than a method based on a characteristic of a high band signal.
  • the signal classification process may be performed on the current frame without considering the classification result of the previous frame.
  • even if the current frame may finally be determined to use frequency domain coding in consideration of hangover, a fixed excitation class may be allocated when the current frame itself is classified as suited to time domain coding. For example, if the current frame is classified as a speech signal for which time domain coding is appropriate, the excitation class may be set to a first excitation class related to the speech characteristic.
  • the excitation class generator 530 may determine the excitation class using at least one threshold. According to an embodiment, when the current frame is not classified as a voice signal by the signal classifier 510, the excitation class generator 530 calculates a high band tonality value and determines the excitation class by comparing the tonality value with a threshold. A plurality of thresholds may be used according to the number of excitation classes. When one threshold is used, the frame may be classified as a tonal music signal when the tonality value is greater than the threshold, and as a non-tonal music signal, for example a noisy signal, when the tonality value is smaller than the threshold. When the current frame is classified as a tonal music signal, the excitation class may be determined as a second excitation class related to the tonal characteristic; when it is classified as a noisy signal, the excitation class may be determined as a third excitation class related to the non-tonal characteristic.
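The single-threshold decision described above may be sketched as follows. This is only an illustration under stated assumptions: the enum and function names are hypothetical, the tonality value and threshold are taken as given inputs (the text does not specify how they are computed), and a tonality value exactly equal to the threshold is treated as noisy, which the text leaves open.

```python
from enum import Enum


class ExcitationClass(Enum):
    SPEECH = 1  # first excitation class, related to the speech characteristic
    TONAL = 2   # second excitation class, related to the tonal characteristic
    NOISY = 3   # third excitation class, related to the non-tonal characteristic


def determine_excitation_class(is_speech: bool, tonality: float,
                               threshold: float) -> ExcitationClass:
    """Pick an excitation class for one frame using a single threshold."""
    if is_speech:
        # A frame classified as speech receives the fixed first class.
        return ExcitationClass.SPEECH
    if tonality > threshold:
        return ExcitationClass.TONAL
    # Assumption: equality with the threshold falls into the noisy class.
    return ExcitationClass.NOISY


if __name__ == "__main__":
    print(determine_excitation_class(False, 0.9, 0.5))
```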
  • FIG. 6 is a block diagram illustrating a configuration of an audio decoding apparatus according to an embodiment.
  • the audio decoding apparatus illustrated in FIG. 6 may include a demultiplexer 610, a BWE parameter decoder 630, a low frequency decoder 650, and a high frequency decoder 670. Although not shown, the audio decoding apparatus may further include a spectrum combiner and an inverse transform unit. Each component may be integrated into at least one module and implemented as at least one processor (not shown).
  • the input signal may be music, voice, or a mixed signal of music and voice, and may be broadly divided into a voice signal and other general signals.
  • hereinafter, these are collectively referred to as an audio signal.
  • the demultiplexer 610 may generate a parameter necessary for decoding by parsing a received bitstream.
  • the BWE parameter decoder 630 may decode the BWE parameter from the bitstream.
  • the BWE parameter may correspond to an excitation class. Meanwhile, the BWE parameter may include parameters other than the excitation class.
  • the low frequency decoder 650 may generate a low frequency spectrum by decoding the encoded spectral coefficients of the low band from the bitstream. Meanwhile, the low frequency decoder 650 may decode information related to energy of the low band signal.
  • the high frequency decoder 670 may generate a high frequency excitation spectrum by using the decoded low frequency spectrum and the excitation class. According to another embodiment, the high frequency decoder 670 may decode a parameter for bandwidth extension or a parameter for bit allocation from the bitstream, and apply that parameter and the decoded information related to the energy of the low band signal to the high frequency excitation spectrum.
  • Parameters required for bandwidth extension may include information related to energy of the high band signal and additional information.
  • the additional information is information about a band including an important frequency component in a high band, and may be information related to a frequency component included in a specific high frequency band.
  • Information related to the energy of the highband signal may be vector dequantized.
  • the spectrum combiner may combine the spectrum provided from the low frequency decoder 650 with the spectrum provided from the high frequency decoder 670.
  • the inverse transform unit (not shown) may inversely convert the combined spectrum into the time domain.
  • IMDCT Inverse MDCT
  • the high frequency decoding apparatus of FIG. 7 may include a low frequency spectrum modifying unit 710 and a high frequency excitation spectrum generator 730. Although not shown, the apparatus may further include a receiver configured to receive the decoded low frequency spectrum.
  • the low frequency spectrum modifying unit 710 may modify the decoded low frequency spectrum based on the excitation class.
  • the decoded low frequency spectrum may be a noise-filled spectrum.
  • the decoded low frequency spectrum may be an anti-sparseness processed spectrum, in which coefficients having a constant amplitude and a random sign are inserted into the portions that remain zero after the noise filling process.
  • the high frequency excitation spectrum generator 730 may generate a high frequency excitation spectrum from the modified low frequency spectrum. Additionally, a gain may be applied to the generated high frequency excitation spectrum such that its energy matches the dequantized energy.
  • FIG. 8 is a block diagram illustrating a configuration of the low frequency spectrum modifying unit 710 of FIG. 7 according to an embodiment, which may include an operation unit 810.
  • the operation unit 810 may generate a modified low frequency spectrum by performing a predetermined operation on the decoded low frequency spectrum based on the excitation class.
  • the decoded low frequency spectrum may correspond to a noise-filled spectrum, an anti-sparseness processed spectrum, or a dequantized low frequency spectrum to which no noise has been added.
  • the predetermined operation may be a process of determining a weight according to the excitation class and mixing the decoded low frequency spectrum and random noise based on the determined weight.
  • the predetermined operation may include a multiplication process and an addition process. Random noise may be generated in a variety of known manners, for example, using a random seed.
  • the operation unit 810 may further perform, prior to the predetermined operation, a process of matching the levels of the whitened low frequency spectrum and the random noise to a similar level.
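The weighted mixing with random noise can be illustrated with a minimal sketch. The weight values, the complementary weight `1 - w` applied to the noise, the uniform noise distribution, and RMS-based level matching are all assumptions made for illustration; the text states only that a weight depends on the excitation class, that the operation involves multiplication and addition, and that the levels are matched beforehand.

```python
import math
import random

# Hypothetical weights for the decoded spectrum; these values are not from the text.
SPECTRUM_WEIGHT = {"speech": 0.7, "tonal": 0.9, "noisy": 0.3}


def rms(values):
    """Root-mean-square level of a non-empty sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def mix_with_noise(lf_spectrum, excitation_class, seed=0):
    """Mix a low frequency spectrum with level-matched random noise."""
    weight = SPECTRUM_WEIGHT[excitation_class]
    rng = random.Random(seed)  # noise generated from a random seed
    noise = [rng.uniform(-1.0, 1.0) for _ in lf_spectrum]
    # Level matching (assumed RMS-based): scale the noise to the spectrum level.
    noise_rms = rms(noise)
    scale = rms(lf_spectrum) / noise_rms if noise_rms > 0.0 else 0.0
    # Multiplication and addition: weighted sum of spectrum and noise.
    return [weight * s + (1.0 - weight) * scale * n
            for s, n in zip(lf_spectrum, noise)]


if __name__ == "__main__":
    print(mix_with_noise([1.0, -2.0, 3.0, 0.5], "noisy", seed=1))
```

With a fixed seed the output is reproducible, which is convenient when the same noise must be regenerated for comparison.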
  • FIG. 9 is a block diagram illustrating a configuration of the low frequency spectrum modifying unit 710 of FIG. 7 according to another embodiment, which may include a whitening unit 910, an operation unit 930, and a level adjustment unit 950.
  • the level adjusting unit 950 may be provided as an option.
  • the whitening unit 910 may perform whitening on the decoded low frequency spectrum.
  • noise may have been added, by the noise filling process or the anti-sparseness process, to the portions of the decoded low frequency spectrum that remained zero.
  • the noise addition may be selectively performed in units of subbands.
  • the whitening process performs normalization based on envelope information of a low frequency spectrum, and various known methods can be applied. Specifically, the normalization process may correspond to calculating an envelope from the low frequency spectrum and dividing the low frequency spectrum by the envelope. The whitening process can be performed so that the shape of the spectrum is flat but the fine structure of the internal frequency is maintained.
  • the window size for normalization processing may be determined according to the signal characteristics.
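The whitening step, calculating an envelope and dividing the spectrum by it, might look like the following sketch. The moving average of absolute values used as the envelope, the default window of 7 coefficients, and the small `eps` guard against division by zero are assumptions; the text says only that various known methods can be applied and that the window size may depend on signal characteristics.

```python
def whiten(lf_spectrum, window=7, eps=1e-12):
    """Flatten the spectral shape while keeping signs and fine structure.

    The envelope is a moving average of absolute values (an assumed choice).
    """
    n = len(lf_spectrum)
    half = window // 2
    whitened = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        envelope = sum(abs(v) for v in lf_spectrum[lo:hi]) / (hi - lo)
        # Dividing by the local envelope normalizes the level around bin i.
        whitened.append(lf_spectrum[i] / (envelope + eps))
    return whitened


if __name__ == "__main__":
    print(whiten([4.0, -4.0, 4.0, -4.0]))
```

Because each coefficient is divided by a local average, scaling the whole input by a constant leaves the whitened output essentially unchanged, while the sign pattern of the input is preserved.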
  • the operation unit 930 may generate a modified low frequency spectrum by performing a predetermined operation on the whitened low frequency spectrum based on the excitation class.
  • the predetermined operation may be a process of determining a weight according to the excitation class and mixing the whitened low frequency spectrum and random noise based on the determined weight.
  • the operation unit 930 may operate in the same manner as the operation unit 810 of FIG. 8.
  • FIG. 10 is a block diagram illustrating a configuration of a low frequency spectrum modifying unit 710 of FIG. 7 according to another exemplary embodiment, and may include a dynamic range controller 1010.
  • the dynamic range controller 1010 may generate the modified low frequency spectrum by controlling the dynamic range of the decoded low frequency spectrum based on the excitation class.
  • the dynamic range may mean spectral amplitude.
  • FIG. 11 is a block diagram illustrating a configuration of a low frequency spectrum modifying unit 710 of FIG. 7 according to another exemplary embodiment, and may include a whitening unit 1110 and a dynamic range control unit 1130.
  • the whitening unit 1110 may operate in the same manner as the whitening unit 910 of FIG. 9. That is, the whitening unit 1110 may perform whitening on the decoded low frequency spectrum.
  • noise may have been added, by the noise filling process or the anti-sparseness process, to the portions of the decoded low frequency spectrum that remained zero.
  • the noise addition may be selectively performed in units of subbands.
  • the whitening process performs normalization based on envelope information of a low frequency spectrum, and various known methods can be applied. Specifically, the normalization process may correspond to calculating an envelope from the low frequency spectrum and dividing the low frequency spectrum by the envelope.
  • the whitening process can be performed so that the shape of the spectrum is flat but the fine structure of the internal frequency is maintained.
  • the window size for normalization processing may be determined according to the signal characteristics.
  • the dynamic range controller 1130 may generate the modified low frequency spectrum by controlling the dynamic range of the whitened low frequency spectrum based on the excitation class.
  • FIG. 12 is a block diagram illustrating a configuration of the dynamic range control unit 1130 of FIG. 11, which may include a sign separator 1210, a control parameter determiner 1230, an amplitude adjuster 1250, a random sign generator 1270, and a sign applying unit 1290.
  • the random sign generator 1270 may be integrated with the sign applying unit 1290.
  • the sign separator 1210 may generate an amplitude, that is, an absolute value spectrum, by removing the sign from the decoded low frequency spectrum.
  • the control parameter determiner 1230 may determine the control parameter based on the excitation class. Since the excitation class is information related to tonal or flat characteristics, a control parameter that adjusts the amplitude of the absolute value spectrum can be determined based on the excitation class. The amplitude of the absolute value spectrum can be expressed as a dynamic range or peak-valley interval. According to an embodiment, the control parameter determiner 1230 may determine control parameters having different values corresponding to the excitation class. For example, 0.2 for an excitation class related to a voice characteristic, 0.05 for an excitation class related to a tonal characteristic, and 0.8 for an excitation class related to a noisy characteristic may be allocated as control parameters. Accordingly, the degree of amplitude adjustment is largest for a frame having noisy characteristics in the high frequency band.
  • the amplitude adjuster 1250 may adjust the amplitude of the low frequency spectrum, that is, the dynamic range, based on the control parameter determined by the control parameter determiner 1230. In this case, the larger the value of the control parameter, the more the dynamic range is adjusted. According to one embodiment, the dynamic range can be adjusted by adding an amplitude of a predetermined magnitude to the original absolute value spectrum.
  • the amplitude of the predetermined magnitude may correspond to a value obtained by multiplying a control parameter with respect to a difference value between an amplitude of each frequency bin of a specific band of the absolute value spectrum and an average amplitude of the corresponding band.
  • the amplitude adjuster 1250 may process the low frequency spectrum in bands of equal size.
  • each band may include 16 spectral coefficients.
  • the average amplitude is calculated for each band, and the amplitude of each frequency bin included in each band may be adjusted based on the average amplitude of the band and the control parameter. For example, a frequency bin having an amplitude greater than the average amplitude of the band has its amplitude decreased, and a frequency bin having an amplitude less than the average amplitude of the band has its amplitude increased.
  • the degree of adjustment of the dynamic range may vary depending on the excitation class. Specifically, the dynamic range control may be performed according to Equation 1 below.
  • each amplitude may represent an absolute value.
  • the dynamic range control may be performed in the spectral coefficient of the band, that is, the frequency bin.
  • the average amplitude is calculated in bands, and the control parameter may be applied in units of frames.
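Equation 1 itself is not reproduced in this text. A form consistent with the surrounding description (each amplitude is moved toward the band average by the control parameter times its distance from that average) would be S'[i] = S[i] - a * (S[i] - m[k]), where S[i] is the amplitude of frequency bin i, m[k] is the average amplitude of band k, and a is the control parameter. This form is inferred from the prose and should be read as an assumption. The sketch below applies it with the control parameter values quoted above; the function name is hypothetical, and the bands are counted from the first element of the supplied list.

```python
# Control parameter values quoted in the text for each excitation class.
CONTROL_PARAM = {"speech": 0.2, "tonal": 0.05, "noisy": 0.8}
BAND_SIZE = 16  # spectral coefficients per band


def control_dynamic_range(abs_spectrum, excitation_class, band_size=BAND_SIZE):
    """Move each amplitude toward its band average (inferred form of Eq. 1)."""
    alpha = CONTROL_PARAM[excitation_class]  # applied per frame
    adjusted = []
    for start in range(0, len(abs_spectrum), band_size):
        band = abs_spectrum[start:start + band_size]
        mean = sum(band) / len(band)  # average amplitude, computed per band
        # Bins above the mean decrease, bins below the mean increase.
        adjusted.extend(a - alpha * (a - mean) for a in band)
    return adjusted


if __name__ == "__main__":
    print(control_dynamic_range([0.0, 2.0] * 8, "noisy")[:2])
```

A larger control parameter compresses the peak-valley interval more strongly, and the band average itself is left unchanged by the adjustment.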
  • each band may be configured based on a start frequency at which the transposition is to be performed.
  • each band may be configured to include 16 frequency bins starting from transposition frequency bin 2.
  • For super wideband (SWB), there are 9 bands ending at frequency bin 145 at 24.4 kbps, and 8 bands ending at frequency bin 129 at 32 kbps.
  • For full band (FB), there are 19 bands ending at frequency bin 305 at 24.4 kbps, and 18 bands ending at frequency bin 289 at 32 kbps.
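  • The band counts and end bins quoted above follow from the band size of 16 and the start bin of 2. The short Python check below makes that arithmetic explicit; the helper name `band_edges` and the dictionary layout are illustrative assumptions, while the numbers are taken from the description.

```python
BAND_SIZE = 16  # frequency bins per band, from the description
START_BIN = 2   # first transposition frequency bin, from the description


def band_edges(num_bands, start_bin=START_BIN, band_size=BAND_SIZE):
    """Return (first_bin, last_bin) for each band, both inclusive."""
    return [(start_bin + k * band_size, start_bin + (k + 1) * band_size - 1)
            for k in range(num_bands)]


# Band counts from the description, keyed by (bandwidth, bitrate in kbps).
BAND_COUNTS = {("SWB", 24.4): 9, ("SWB", 32): 8, ("FB", 24.4): 19, ("FB", 32): 18}

for key, count in BAND_COUNTS.items():
    print(key, "last bin:", band_edges(count)[-1][1])
```

  • The last bins come out as 145, 129, 305, and 289, matching the values given for SWB and FB at 24.4 kbps and 32 kbps.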
  • The random sign generator 1270 may generate a random sign when it is determined, based on the excitation class, that a random sign is necessary.
  • The random sign may be generated in units of frames.
  • A random sign may be applied for an excitation class related to a noisy characteristic.
  • The sign applying unit 1290 may generate a modified low frequency spectrum by applying either a random sign or the original sign to the low frequency spectrum whose dynamic range has been adjusted.
  • The original sign may be the sign removed by the sign separator 1210.
  • A random sign may be applied for the excitation class related to the noisy characteristic, and the original sign may be applied for the excitation class related to the tonal characteristic or the excitation class related to the speech characteristic.
  • In other words, a random sign may be applied to a frame determined to be noisy, and the original sign may be applied to a frame determined to be tonal or a frame determined to be a speech signal.
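  • The choice between the original sign and a random sign can be sketched as follows. This is an illustrative Python fragment, not the decoder's implementation: the function names `separate_sign` and `apply_sign` and the class labels are assumptions, and drawing an independent random sign for every bin of the frame is one possible reading of "generated in units of frames".

```python
import numpy as np


def separate_sign(spectrum):
    """Split a spectrum into its absolute value spectrum and its per-bin sign."""
    spectrum = np.asarray(spectrum, dtype=float)
    sign = np.where(spectrum < 0, -1.0, 1.0)
    return np.abs(spectrum), sign


def apply_sign(adjusted_abs, original_sign, excitation_class, rng=None):
    """Re-apply a sign to the amplitude-adjusted absolute value spectrum."""
    adjusted_abs = np.asarray(adjusted_abs, dtype=float)
    if excitation_class == "noisy":
        # Noisy frame: use a random sign (assumed here to be drawn per bin).
        rng = np.random.default_rng() if rng is None else rng
        sign = rng.choice([-1.0, 1.0], size=adjusted_abs.shape)
    else:
        # Tonal or speech frame: restore the sign that was removed earlier.
        sign = np.asarray(original_sign, dtype=float)
    return adjusted_abs * sign
```

  • For a tonal or speech frame the output keeps the phase structure of the decoded low band, whereas a noisy frame keeps only the magnitudes, which decorrelates the patched high band from the low band.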
  • FIG. 13 is a block diagram illustrating a configuration of the high frequency excitation spectrum generator 730 of FIG. 7 according to an embodiment, which may include a spectrum patching unit 1310 and a spectrum adjusting unit 1330.
  • The spectrum adjusting unit 1330 may be provided as an option.
  • The spectrum patching unit 1310 may fill the empty high band with a spectrum by patching the modified low frequency spectrum into the high band, for example by transposing, copying, mirroring, or folding it.
  • For example, the modified spectrum in the source band of 50 to 3250 Hz may be copied into the 8000 to 11200 Hz band, and the modified spectrum in the same source band of 50 to 3250 Hz may be copied into the 11200 to 14400 Hz band.
  • a high frequency excitation spectrum can be generated from the modified low frequency spectrum.
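  • The copy-based patching in the example above can be sketched as follows. The Python fragment assumes a bin spacing of 25 Hz (640 coefficients covering 0 to 16 kHz at a 32 kHz sampling rate), so that 50 to 3250 Hz corresponds to bins 2 to 129 and 8000 Hz to bin 320; the function names and default arguments are illustrative, and only copying is shown, not mirroring or folding.

```python
import numpy as np

BIN_HZ = 25.0  # assumed bin spacing: 640 coefficients covering 0-16 kHz


def hz_to_bin(freq_hz):
    """Convert a frequency in Hz to a bin index under the assumed spacing."""
    return int(round(freq_hz / BIN_HZ))


def patch_high_band(modified_low, src_hz=(50, 3250),
                    dst_starts_hz=(8000, 11200), num_bins=640):
    """Copy the source band of the modified low spectrum into each target band."""
    src_lo, src_hi = hz_to_bin(src_hz[0]), hz_to_bin(src_hz[1])
    source = np.asarray(modified_low, dtype=float)[src_lo:src_hi]
    excitation = np.zeros(num_bins)
    for start_hz in dst_starts_hz:
        dst = hz_to_bin(start_hz)
        # The same source band is reused for every target band.
        excitation[dst:dst + len(source)] = source
    return excitation
```

  • Under these assumptions the source band holds 128 bins, that is, 8 bands of 16 bins, and the two copies fill bins 320 to 575 of the high frequency excitation spectrum.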
  • The spectrum adjusting unit 1330 may adjust the high frequency excitation spectrum provided from the spectrum patching unit 1310 in order to resolve the spectral discontinuity at the boundaries between the bands patched by the spectrum patching unit 1310. According to an embodiment, the spectral coefficients around the boundary positions of the high frequency excitation spectrum provided from the spectrum patching unit 1310 may be utilized.
  • The generated high frequency excitation spectrum or the adjusted high frequency excitation spectrum may be combined with the decoded low frequency spectrum, and the combined spectrum may be converted into a time domain signal through an inverse transform process.
  • Alternatively, the inverse transform may first be performed separately on the high frequency excitation spectrum and on the decoded low frequency spectrum, and the results may then be combined.
  • An inverse modified discrete cosine transform (IMDCT) may be used as the inverse transform.
  • The overlapped frequency bands may be restored through overlap add processing.
  • Alternatively, the overlapped portion of the frequency band may be restored based on information transmitted through the bitstream.
  • The overlap add process and the process based on the transmitted information may be applied selectively according to the environment of the receiver, or the overlapped portion may be restored based on a weight.
  • FIG. 14 is a diagram for explaining a smoothing process on a weight at a band boundary.
  • Smoothing is not performed in the K + 1 band; it is performed only in the K + 2 band.
  • The reason is that the weight in the K + 1 band, Ws(K + 1), is 0. If smoothing were performed in the K + 1 band, Ws(K + 1) would become non-zero, and random noise would then have to be considered in the K + 1 band as well.
  • A weight of 0 indicates that random noise is not considered in that band when the high frequency excitation spectrum is generated. This corresponds to an extremely tonal signal, and the intent is to prevent random noise from being inserted into the valley sections of a harmonic signal.
  • The processing can be configured so that the last band of the low frequency coding region R0 and the start band of the BWE region R1 overlap.
  • The bands of the BWE region R1 may also be configured in another manner so as to obtain a more compact band allocation structure.
  • For example, the last band of the low frequency coding region R0 may extend up to 8.2 kHz, while the start band of the BWE region R1 may start from 8 kHz.
  • In this case, an overlapping region is generated between the low frequency coding region R0 and the BWE region R1.
  • As a result, two decoded spectra can be generated in the overlapping region.
  • One is a spectrum generated by the low frequency decoding method, and the other is a spectrum generated by the high frequency decoding method.
  • An overlap add method may be applied to smooth the transition between the two spectra, that is, the low frequency spectrum and the high frequency spectrum.
  • Toward the low frequency side of the overlapping region, the contribution of the spectrum generated by the low frequency method is increased, and toward the high frequency side, the contribution of the spectrum generated by the high frequency method is increased.
  • For example, when the last band of the low frequency coding region R0 extends up to 8.2 kHz, the start band of the BWE region R1 starts at 8 kHz, and a spectrum of 640 samples is formed at a 32 kHz sampling rate, the overlapping region corresponds to spectral coefficients 320 to 327.
  • That is, eight spectral coefficients overlap, and these eight coefficients may be generated as in Equation 2 below.
  • In Equation 2, one term is the spectrum decoded by the low frequency method and the other is the spectrum decoded by the high frequency method, L0 is the starting spectral position of the high frequency, L0 to L1 is the overlapped region, and w_O is the contribution.
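  • The weighted combination in the overlapping region can be sketched as follows. Equation 2 is not reproduced in this text, so the Python fragment below is only an assumed form: a per-bin weighted sum in which `w_low` is taken to be the contribution of the spectrum decoded by the low frequency method, with a linear ramp as a hypothetical default. The 25 Hz bin spacing follows from 640 coefficients at a 32 kHz sampling rate; the function names are illustrative.

```python
import numpy as np

BIN_HZ = 25.0  # assumed bin spacing: 640 coefficients covering 0-16 kHz


def overlap_bins(bwe_start_hz=8000.0, lf_end_hz=8200.0):
    """Return the first and last bin (inclusive) of the overlapping region."""
    l0 = int(round(bwe_start_hz / BIN_HZ))
    l1 = int(round(lf_end_hz / BIN_HZ)) - 1
    return l0, l1


def blend_overlap(spec_low, spec_high, l0, l1, w_low=None):
    """Blend the two decoded spectra over bins l0..l1 (inclusive)."""
    n = l1 - l0 + 1
    if w_low is None:
        # Hypothetical linear ramp: near 1 at l0, near 0 at l1.
        w_low = (n - np.arange(n) - 0.5) / n
    w_low = np.asarray(w_low, dtype=float)
    low = np.asarray(spec_low, dtype=float)[l0:l1 + 1]
    high = np.asarray(spec_high, dtype=float)[l0:l1 + 1]
    # The low frequency method dominates near l0, the high frequency method near l1.
    return w_low * low + (1.0 - w_low) * high
```

  • With the default arguments, `overlap_bins()` yields bins 320 to 327, the eight coefficients mentioned in the text. A different weight vector can be passed as `w_low` to favor one decoding method over the other.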
  • FIG. 15 is a diagram for explaining a contribution used to reconstruct a spectrum existing in an overlapping region after a BWE process according to an embodiment.
  • As w_O(k), either w_O0(k) or w_O1(k) may be selectively applied. w_O0(k) applies the same weight to the low frequency and high frequency decoding methods, and w_O1(k) applies a greater weight to the high frequency decoding method.
  • Various criteria may be used to select between the two w_O(k); one example is the presence or absence of a pulse in the overlapping band of the low frequency. When a pulse has been selected and coded in the overlapping band of the low frequency, w_O0(k) is used so that the contribution of the spectrum generated by the low frequency method remains valid up to the vicinity of L1 and the contribution of the high frequency is reduced.
  • Basically, a spectrum generated by an actual coding scheme may be closer to the original signal than a spectrum generated through BWE.
  • FIG. 16 is a block diagram showing a configuration of a multimedia device including a decoding module according to an embodiment of the present invention.
  • The multimedia device 1600 illustrated in FIG. 16 may include a communication unit 1610 and a decoding module 1630.
  • Depending on the use of the restored audio signal obtained as a result of decoding, the multimedia device 1600 may further include a storage unit 1650 for storing the restored audio signal.
  • The multimedia device 1600 may further include a speaker 1670. That is, the storage unit 1650 and the speaker 1670 may be provided as options.
  • the multimedia apparatus 1600 illustrated in FIG. 16 may further include an arbitrary encoding module (not shown), for example, an encoding module for performing a general encoding function or an encoding module according to an embodiment of the present invention.
  • The decoding module 1630 may be integrated with other components (not shown) included in the multimedia device 1600 and implemented as at least one processor (not shown).
  • The communication unit 1610 may receive at least one of an encoded bitstream and an audio signal provided from the outside, or may transmit at least one of a restored audio signal obtained as a result of decoding by the decoding module 1630 and an audio bitstream obtained as a result of encoding.
  • The communication unit 1610 is configured to transmit and receive data to and from an external multimedia device through a wireless network such as wireless Internet, wireless intranet, a wireless telephone network, wireless LAN, Wi-Fi, Wi-Fi Direct (WFD), 3G (3rd Generation), 4G (4th Generation), Bluetooth, Infrared Data Association (IrDA), Radio Frequency Identification (RFID), Ultra WideBand (UWB), Zigbee, or Near Field Communication (NFC), or through a wired network such as a wired telephone network or wired Internet.
  • the decoding module 1630 may receive a bitstream provided through the communication unit 1610 and perform decoding on an audio spectrum included in the bitstream.
  • the decoding process may be performed using the above-described decoding apparatus or a decoding method to be described later, but is not limited thereto.
  • the storage unit 1650 may store the restored audio signal generated by the decoding module 1630. Meanwhile, the storage unit 1650 may store various programs necessary for operating the multimedia apparatus 1600.
  • the speaker 1670 may output the restored audio signal generated by the decoding module 1630 to the outside.
  • FIG. 17 is a block diagram illustrating a configuration of a multimedia apparatus including an encoding module and a decoding module according to an embodiment of the present invention.
  • the multimedia device 1700 illustrated in FIG. 17 may include a communication unit 1710, an encoding module 1720, and a decoding module 1730.
  • Depending on the use of the audio bitstream obtained as a result of encoding or of the restored audio signal obtained as a result of decoding, the multimedia device 1700 may further include a storage unit 1740 for storing the audio bitstream or the restored audio signal.
  • the multimedia device 1700 may further include a microphone 1750 or a speaker 1760.
  • the encoding module 1720 and the decoding module 1730 may be integrated with other components (not shown) included in the multimedia device 1700 to be implemented as at least one processor (not shown).
  • the encoding module 1720 may perform encoding on an audio signal of a time domain provided through the communication unit 1710 or the microphone 1750.
  • the encoding process may be performed using the above-described encoding apparatus, but is not limited thereto.
  • The microphone 1750 may provide an audio signal from a user or from the outside to the encoding module 1720.
  • The multimedia devices 1600 and 1700 may include a voice communication terminal such as a telephone or a mobile phone, a broadcast or music dedicated device such as a TV or an MP3 player, or a fusion terminal device combining a voice communication terminal with a broadcast or music dedicated device, but are not limited thereto.
  • the multimedia device 1600, 1700 may be used as a client, a server, or a transducer disposed between the client and the server.
  • When the multimedia device 1600 or 1700 is, for example, a mobile phone, it may further include, although not shown, a user input unit such as a keypad, a display unit for displaying a user interface or information processed by the mobile phone, and a processor for controlling the overall functions of the mobile phone.
  • The mobile phone may further include a camera unit having an imaging function and at least one component that performs a function required by the mobile phone.
  • When the multimedia device 1600 or 1700 is, for example, a TV, it may further include, although not shown, a user input unit such as a keypad, a display unit for displaying received broadcast information, and a processor for controlling the overall functions of the TV.
  • the TV may further include at least one or more components that perform a function required by the TV.
  • FIG. 18 is a flowchart illustrating an operation of a high frequency decoding method according to an embodiment. The method illustrated in FIG. 18 may be performed by the high frequency decoder 670 of FIG. 6 or by a separate processor.
  • an excitation class is decoded.
  • the excitation class may be generated at the encoder stage and transmitted to the decoder stage as a bitstream.
  • the excitation class can be generated and used separately in the decoder stage.
  • The excitation class may be obtained for each frame.
  • A low frequency spectrum decoded from the quantization index of the low frequency spectrum included in the bitstream may be received.
  • the quantization index may be, for example, an interband difference index except for the lowest frequency band.
  • the quantization index of the low frequency spectrum can be vector dequantized, for example.
  • As a vector dequantization method, Pyramid Vector Quantization (PVQ) may be used, but the method is not limited thereto.
  • A noise filling process may be performed on the dequantization result to generate the decoded low frequency spectrum.
  • The noise filling process fills the gaps that exist in the spectrum as a result of quantization to zero. Pseudo random noise may be inserted into the gaps.
  • the frequency bin section in which the noise filling process is processed may be preset.
  • the amount of noise inserted into the gap can be controlled by a parameter transmitted in the bitstream.
  • the low frequency spectrum subjected to the noise filling may be further denormalized.
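  • The noise filling step can be sketched as follows. This Python fragment is only an illustration of the idea stated above (zero-quantized bins inside a preset bin section are replaced with pseudo random noise whose amount is set by a transmitted parameter); the function name, the uniform noise distribution, and the way `noise_level` scales the noise are assumptions rather than details from the description.

```python
import numpy as np


def noise_fill(dequantized, noise_level, start_bin=0, end_bin=None, seed=0):
    """Fill zero-quantized bins in [start_bin, end_bin) with pseudo random noise."""
    spec = np.array(dequantized, dtype=float)  # work on a copy
    end_bin = len(spec) if end_bin is None else end_bin
    rng = np.random.default_rng(seed)
    region = spec[start_bin:end_bin]  # view into the preset bin section
    gaps = region == 0.0              # bins that were quantized to zero
    # Assumed scaling: uniform noise in [-noise_level, noise_level).
    region[gaps] = noise_level * rng.uniform(-1.0, 1.0, size=int(gaps.sum()))
    return spec
```

  • Bins that carry a decoded value, and bins outside the preset section, are returned unchanged.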
  • Anti-sparseness processing may additionally be performed on the noise-filled low frequency spectrum.
  • For the anti-sparseness processing, a coefficient having a random sign and a constant amplitude may be inserted at the positions where the coefficient remains zero in the noise-filled low frequency spectrum.
  • The energy of the low frequency spectrum subjected to the anti-sparseness processing may additionally be adjusted based on the dequantized envelope of the low band.
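  • Anti-sparseness processing can be sketched in the same spirit. The Python fragment below assumes that the inserted coefficient has a random sign and a fixed amplitude, and that every bin still equal to zero receives one; the function name and the `amplitude` argument are illustrative, and the subsequent energy adjustment is not shown.

```python
import numpy as np


def anti_sparseness(spectrum, amplitude, seed=0):
    """Replace remaining zero coefficients with +/- amplitude at random."""
    spec = np.array(spectrum, dtype=float)  # work on a copy
    rng = np.random.default_rng(seed)
    zeros = spec == 0.0  # coefficients still zero after noise filling
    signs = rng.choice([-1.0, 1.0], size=int(zeros.sum()))
    spec[zeros] = amplitude * signs
    return spec
```

  • After this step no coefficient of the spectrum is exactly zero, while the coefficients that were already non-zero keep their values.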
  • the decoded low frequency spectrum may be modified based on the excitation class.
  • The decoded low frequency spectrum may be one of a dequantized spectrum, a noise-filled spectrum, and a spectrum subjected to anti-sparseness processing.
  • the amplitude of the decoded low frequency spectrum can be adjusted by the excitation class. For example, the amplitude reduction can be determined by the excitation class.
  • a high frequency excitation spectrum may be generated using the modified low frequency spectrum.
  • the modified low frequency spectrum may be patched to a high band required for bandwidth extension to generate a high frequency excitation spectrum.
  • An example of a patching method is to copy or fold a predetermined section into the high band.
  • FIG. 19 is a flowchart illustrating an operation of a method for modifying low frequency spectrum according to an embodiment.
  • the method illustrated in FIG. 19 may correspond to step 1850 of FIG. 18 or may be independently implemented. Meanwhile, the method illustrated in FIG. 19 may be performed by the low frequency spectrum modifying unit 710 of FIG. 7 or may be performed by a separate processor.
  • the degree of amplitude adjustment may be determined based on an excitation class.
  • a control parameter may be generated based on the excitation class to determine the degree of amplitude adjustment.
  • The value of the control parameter may be determined depending on whether the excitation class represents a speech characteristic, a tonal characteristic, or a non-tonal characteristic.
  • the amplitude of the low frequency spectrum may be adjusted based on the determined degree of amplitude adjustment.
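  • When the excitation class represents a non-tonal characteristic, a control parameter with a larger value is generated, so the amplitude reduction can be large.
  • As an example of the amplitude adjustment, the amplitude of each frequency bin may be reduced by the value obtained by multiplying the control parameter by the difference between the amplitude of the frequency bin, for example its Norm value, and the average Norm value of the corresponding band.
  • A sign may be applied to the low frequency spectrum whose amplitude has been adjusted.
  • Either the original sign or a random sign may be applied.
  • The original sign may be applied when the excitation class represents a tonal characteristic or a speech characteristic, and a random sign may be applied when the excitation class represents a non-tonal characteristic.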
  • the low frequency spectrum to which the sign is applied may be generated as the modified low frequency spectrum in operation 1950.
  • the method according to the embodiments can be written in a computer executable program and can be implemented in a general-purpose digital computer operating the program using a computer readable recording medium.
  • data structures, program instructions, or data files that can be used in the above-described embodiments of the present invention can be recorded on a computer-readable recording medium through various means.
  • The computer-readable recording medium may include all kinds of storage devices in which data readable by a computer system is stored. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory.
  • the computer-readable recording medium may also be a transmission medium for transmitting a signal specifying a program command, a data structure, or the like.
  • Examples of program instructions may include high-level language code that can be executed by a computer using an interpreter as well as machine code such as produced by a compiler.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Spectroscopy & Molecular Physics (AREA)

Abstract

The invention relates to a high frequency decoding method and apparatus for bandwidth extension. The high frequency decoding method for bandwidth extension comprises the steps of: decoding an excitation class; modifying a decoded low frequency spectrum based on the excitation class; and generating a high frequency excitation spectrum based on the modified low frequency spectrum. The high frequency decoding method and apparatus for bandwidth extension according to an embodiment can modify a restored low frequency spectrum and generate a high frequency excitation spectrum, thereby improving the quality of the restored sound without an excessive increase in complexity.
PCT/KR2015/002045 2014-03-03 2015-03-03 Procédé et appareil de décodage haute fréquence pour une extension de bande passante WO2015133795A1 (fr)

Priority Applications (8)

Application Number Priority Date Filing Date Title
CN202010101660.4A CN111312277B (zh) 2014-03-03 2015-03-03 用于带宽扩展的高频解码的方法及设备
JP2016555511A JP6383000B2 (ja) 2014-03-03 2015-03-03 帯域幅拡張のための高周波復号方法及びその装置
US15/123,897 US10410645B2 (en) 2014-03-03 2015-03-03 Method and apparatus for high frequency decoding for bandwidth extension
EP15759308.8A EP3115991A4 (fr) 2014-03-03 2015-03-03 Procédé et appareil de décodage haute fréquence pour une extension de bande passante
CN201580022645.8A CN106463143B (zh) 2014-03-03 2015-03-03 用于带宽扩展的高频解码的方法及设备
CN202010101692.4A CN111312278B (zh) 2014-03-03 2015-03-03 用于带宽扩展的高频解码的方法及设备
US16/538,427 US10803878B2 (en) 2014-03-03 2019-08-12 Method and apparatus for high frequency decoding for bandwidth extension
US17/030,104 US11676614B2 (en) 2014-03-03 2020-09-23 Method and apparatus for high frequency decoding for bandwidth extension

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201461946985P 2014-03-03 2014-03-03
US61/946,985 2014-03-03
US201461969368P 2014-03-24 2014-03-24
US61/969,368 2014-03-24

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US15/123,897 A-371-Of-International US10410645B2 (en) 2014-03-03 2015-03-03 Method and apparatus for high frequency decoding for bandwidth extension
US16/538,427 Continuation US10803878B2 (en) 2014-03-03 2019-08-12 Method and apparatus for high frequency decoding for bandwidth extension

Publications (1)

Publication Number Publication Date
WO2015133795A1 true WO2015133795A1 (fr) 2015-09-11

Family

ID=54055542

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2015/002045 WO2015133795A1 (fr) 2014-03-03 2015-03-03 Procédé et appareil de décodage haute fréquence pour une extension de bande passante

Country Status (2)

Country Link
KR (2) KR102386736B1 (fr)
WO (1) WO2015133795A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SG10201808274UA (en) 2014-03-24 2018-10-30 Samsung Electronics Co Ltd High-band encoding method and device, and high-band decoding method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20060051298A (ko) * 2004-09-17 2006-05-19 하만 베커 오토모티브 시스템즈 게엠베하 대역 제한 오디오 신호의 대역폭 확장
US20070282599A1 (en) * 2006-06-03 2007-12-06 Choo Ki-Hyun Method and apparatus to encode and/or decode signal using bandwidth extension technology
WO2012108680A2 (fr) * 2011-02-08 2012-08-16 엘지전자 주식회사 Procédé et dispositif d'extension de largeur de bande
KR20130007485A (ko) * 2011-06-30 2013-01-18 삼성전자주식회사 대역폭 확장신호 생성장치 및 방법
WO2013141638A1 (fr) * 2012-03-21 2013-09-26 삼성전자 주식회사 Procédé et appareil de codage/décodage de haute fréquence pour extension de largeur de bande

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101375582B1 (ko) * 2006-11-17 2014-03-20 삼성전자주식회사 대역폭 확장 부호화 및 복호화 방법 및 장치
CN101965612B (zh) * 2008-03-03 2012-08-29 Lg电子株式会社 用于处理音频信号的方法和装置
PL2273493T3 (pl) * 2009-06-29 2013-07-31 Fraunhofer Ges Forschung Kodowanie i dekodowanie z rozszerzaniem szerokości pasma

Also Published As

Publication number Publication date
KR102491177B1 (ko) 2023-01-20
KR20150103643A (ko) 2015-09-11
KR20220051317A (ko) 2022-04-26
KR102386736B1 (ko) 2022-04-14

Similar Documents

Publication Publication Date Title
KR102248252B1 (ko) 대역폭 확장을 위한 고주파수 부호화/복호화 방법 및 장치
JP5539203B2 (ja) 改良された音声及びオーディオ信号の変換符号化
JP6438056B2 (ja) 無損失符号化装置
JP2019168699A (ja) ビット割り当て装置
WO2013183928A1 (fr) Procédé et dispositif de codage audio, procédé et dispositif de décodage audio, et dispositif multimédia les employant
JP6616316B2 (ja) 高帯域符号化方法及びその装置、並びに高帯域復号方法及びその装置
US11676614B2 (en) Method and apparatus for high frequency decoding for bandwidth extension
WO2015065137A1 (fr) Procédé et appareil de génération de signal à large bande, et dispositif les employant
KR102625143B1 (ko) 신호 부호화방법 및 장치와 신호 복호화방법 및 장치
WO2015037969A1 (fr) Procédé et dispositif de codage de signal et procédé et dispositif de décodage de signal
WO2015037961A1 (fr) Procédé et dispositif de codage sans perte d'énergie, procédé et dispositif de codage de signal, procédé et dispositif de décodage sans perte d'énergie et procédé et dispositif de décodage de signal
KR102491177B1 (ko) 대역폭 확장을 위한 고주파 복호화 방법 및 장치
WO2015034115A1 (fr) Procédé et appareil de codage et de décodage d'un signal audio

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 15759308; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2016555511; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
WWE Wipo information: entry into national phase (Ref document number: 15123897; Country of ref document: US)
WPC Withdrawal of priority claims after completion of the technical preparations for international publication (Ref document number: 61/969,368; Country of ref document: US; Date of ref document: 20160901; Free format text: WITHDRAWN AFTER TECHNICAL PREPARATION FINISHED)
REEP Request for entry into the european phase (Ref document number: 2015759308; Country of ref document: EP)
WWE Wipo information: entry into national phase (Ref document number: 2015759308; Country of ref document: EP)